Introduction to data.table
The data.table
package is a high-performance extension of R’s data.frame
that provides a concise syntax for data manipulation. It’s particularly efficient for large datasets.
Getting Started with data.table
First, let’s install and load the package:
# Install if needed (uncomment to run)
# install.packages("data.table")
# Load the package
library (data.table)
# Convert the built-in mtcars dataset to a data.table
dt_cars <- as.data.table (mtcars, keep.rownames = TRUE )
setnames (dt_cars, "rn" , "model" ) # Rename the rownames column
dt_cars
model mpg cyl disp hp drat wt qsec vs am
<char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1
2: Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1
3: Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1
4: Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0
5: Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0
6: Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0
7: Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0
8: Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0
9: Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0
10: Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0
11: Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0
12: Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0
13: Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0
14: Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0
15: Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0
16: Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0
17: Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0
18: Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1
19: Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1
20: Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1
21: Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0
22: Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0
23: AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0
24: Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0
25: Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0
26: Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1
27: Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1
28: Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1
29: Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1
30: Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1
31: Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1
32: Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1
model mpg cyl disp hp drat wt qsec vs am
gear carb
<num> <num>
1: 4 4
2: 4 4
3: 4 1
4: 3 1
5: 3 2
6: 3 1
7: 3 4
8: 4 2
9: 4 2
10: 4 4
11: 4 4
12: 3 3
13: 3 3
14: 3 3
15: 3 4
16: 3 4
17: 3 4
18: 4 1
19: 4 2
20: 4 1
21: 3 1
22: 3 2
23: 3 2
24: 3 4
25: 3 2
26: 4 1
27: 5 2
28: 5 2
29: 5 4
30: 5 6
31: 5 8
32: 4 2
gear carb
Basic Syntax: [i, j, by]
data.table uses a concise syntax based on [i, j, by]
: - i
: Subset rows (WHERE) - j
: Compute on columns (SELECT) - by
: Group by columns (GROUP BY)
Subsetting Rows (i)
# Select cars with 6 cylinders
dt_cars[cyl == 6 ]
model mpg cyl disp hp drat wt qsec vs am gear
<char> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4
2: Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4
3: Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3
4: Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3
5: Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4
6: Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4
7: Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5
carb
<num>
1: 4
2: 4
3: 1
4: 1
5: 4
6: 4
7: 6
# Multiple conditions
dt_cars[cyl == 6 & mpg > 20 ]
model mpg cyl disp hp drat wt qsec vs am gear
<char> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
2: Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4
3: Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3
carb
<num>
1: 4
2: 4
3: 1
# Select specific rows by position
dt_cars[1 : 5 ]
model mpg cyl disp hp drat wt qsec vs am
<char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1
2: Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1
3: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1
4: Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0
5: Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0
gear carb
<num> <num>
1: 4 4
2: 4 4
3: 4 1
4: 3 1
5: 3 2
Selecting and Computing on Columns (j)
# Select specific columns
dt_cars[, .(mpg, hp, cyl)]
mpg hp cyl
<num> <num> <num>
1: 21.0 110 6
2: 21.0 110 6
3: 22.8 93 4
4: 21.4 110 6
5: 18.7 175 8
6: 18.1 105 6
7: 14.3 245 8
8: 24.4 62 4
9: 22.8 95 4
10: 19.2 123 6
11: 17.8 123 6
12: 16.4 180 8
13: 17.3 180 8
14: 15.2 180 8
15: 10.4 205 8
16: 10.4 215 8
17: 14.7 230 8
18: 32.4 66 4
19: 30.4 52 4
20: 33.9 65 4
21: 21.5 97 4
22: 15.5 150 8
23: 15.2 150 8
24: 13.3 245 8
25: 19.2 175 8
26: 27.3 66 4
27: 26.0 91 4
28: 30.4 113 4
29: 15.8 264 8
30: 19.7 175 6
31: 15.0 335 8
32: 21.4 109 4
mpg hp cyl
# Compute new values
dt_cars[, .(kpl = mpg * 0.425 )]
kpl
<num>
1: 8.9250
2: 8.9250
3: 9.6900
4: 9.0950
5: 7.9475
6: 7.6925
7: 6.0775
8: 10.3700
9: 9.6900
10: 8.1600
11: 7.5650
12: 6.9700
13: 7.3525
14: 6.4600
15: 4.4200
16: 4.4200
17: 6.2475
18: 13.7700
19: 12.9200
20: 14.4075
21: 9.1375
22: 6.5875
23: 6.4600
24: 5.6525
25: 8.1600
26: 11.6025
27: 11.0500
28: 12.9200
29: 6.7150
30: 8.3725
31: 6.3750
32: 9.0950
kpl
# Select and compute multiple columns
dt_cars[, .(model, kpl = mpg * 0.425 , hp_per_cyl = hp/ cyl)]
model kpl hp_per_cyl
<char> <num> <num>
1: Mazda RX4 8.9250 18.33333
2: Mazda RX4 Wag 8.9250 18.33333
3: Datsun 710 9.6900 23.25000
4: Hornet 4 Drive 9.0950 18.33333
5: Hornet Sportabout 7.9475 21.87500
6: Valiant 7.6925 17.50000
7: Duster 360 6.0775 30.62500
8: Merc 240D 10.3700 15.50000
9: Merc 230 9.6900 23.75000
10: Merc 280 8.1600 20.50000
11: Merc 280C 7.5650 20.50000
12: Merc 450SE 6.9700 22.50000
13: Merc 450SL 7.3525 22.50000
14: Merc 450SLC 6.4600 22.50000
15: Cadillac Fleetwood 4.4200 25.62500
16: Lincoln Continental 4.4200 26.87500
17: Chrysler Imperial 6.2475 28.75000
18: Fiat 128 13.7700 16.50000
19: Honda Civic 12.9200 13.00000
20: Toyota Corolla 14.4075 16.25000
21: Toyota Corona 9.1375 24.25000
22: Dodge Challenger 6.5875 18.75000
23: AMC Javelin 6.4600 18.75000
24: Camaro Z28 5.6525 30.62500
25: Pontiac Firebird 8.1600 21.87500
26: Fiat X1-9 11.6025 16.50000
27: Porsche 914-2 11.0500 22.75000
28: Lotus Europa 12.9200 28.25000
29: Ford Pantera L 6.7150 33.00000
30: Ferrari Dino 8.3725 29.16667
31: Maserati Bora 6.3750 41.87500
32: Volvo 142E 9.0950 27.25000
model kpl hp_per_cyl
# Apply functions
dt_cars[, .(avg_mpg = mean (mpg), max_hp = max (hp))]
avg_mpg max_hp
<num> <num>
1: 20.09062 335
Grouping (by)
# Group by cylinder and calculate statistics
dt_cars[, .(count = .N, avg_mpg = mean (mpg)), by = cyl]
cyl count avg_mpg
<num> <int> <num>
1: 6 7 19.74286
2: 4 11 26.66364
3: 8 14 15.10000
# Multiple grouping variables
dt_cars[, .(count = .N, avg_mpg = mean (mpg)), by = .(cyl, gear)]
cyl gear count avg_mpg
<num> <num> <int> <num>
1: 6 4 4 19.750
2: 4 4 8 26.925
3: 6 3 2 19.750
4: 8 3 12 15.050
5: 4 3 1 21.500
6: 4 5 2 28.200
7: 8 5 2 15.400
8: 6 5 1 19.700
# Grouping with expressions
dt_cars[, .(count = .N), by = .(cyl, high_mpg = mpg > 20 )]
cyl high_mpg count
<num> <lgcl> <int>
1: 6 TRUE 3
2: 4 TRUE 11
3: 8 FALSE 14
4: 6 FALSE 4
Special Symbols in data.table
data.table provides special symbols for common operations:
# .N: number of rows
dt_cars[, .N]
cyl N
<num> <int>
1: 6 7
2: 4 11
3: 8 14
# .SD: Subset of Data
dt_cars[, lapply (.SD, mean), by = cyl, .SDcols = c ("mpg" , "hp" , "wt" )]
cyl mpg hp wt
<num> <num> <num> <num>
1: 6 19.74286 122.28571 3.117143
2: 4 26.66364 82.63636 2.285727
3: 8 15.10000 209.21429 3.999214
# .I: Row numbers
dt_cars[, .I[1 : 2 ], by = cyl] # First two row numbers for each cyl group
cyl V1
<num> <int>
1: 6 1
2: 6 2
3: 4 3
4: 4 8
5: 8 5
6: 8 7
Modifying Data
data.table allows efficient in-place modifications:
# Create a copy to avoid modifying the original
dt_copy <- copy (dt_cars)
# Add a new column
dt_copy[, efficiency : = mpg/ wt]
dt_copy
Index: <cyl>
model mpg cyl disp hp drat wt qsec vs am
<char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1
2: Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1
3: Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1
4: Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0
5: Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0
6: Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0
7: Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0
8: Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0
9: Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0
10: Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0
11: Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0
12: Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0
13: Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0
14: Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0
15: Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0
16: Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0
17: Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0
18: Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1
19: Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1
20: Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1
21: Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0
22: Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0
23: AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0
24: Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0
25: Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0
26: Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1
27: Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1
28: Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1
29: Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1
30: Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1
31: Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1
32: Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1
model mpg cyl disp hp drat wt qsec vs am
gear carb efficiency
<num> <num> <num>
1: 4 4 8.015267
2: 4 4 7.304348
3: 4 1 9.827586
4: 3 1 6.656299
5: 3 2 5.436047
6: 3 1 5.231214
7: 3 4 4.005602
8: 4 2 7.648903
9: 4 2 7.238095
10: 4 4 5.581395
11: 4 4 5.174419
12: 3 3 4.029484
13: 3 3 4.638070
14: 3 3 4.021164
15: 3 4 1.980952
16: 3 4 1.917404
17: 3 4 2.750234
18: 4 1 14.727273
19: 4 2 18.823529
20: 4 1 18.474114
21: 3 1 8.722110
22: 3 2 4.403409
23: 3 2 4.425036
24: 3 4 3.463542
25: 3 2 4.993498
26: 4 1 14.108527
27: 5 2 12.149533
28: 5 2 20.092531
29: 5 4 4.984227
30: 5 6 7.111913
31: 5 8 4.201681
32: 4 2 7.697842
gear carb efficiency
# Update existing values
dt_copy[cyl == 4 , mpg : = mpg * 1.1 ] # Increase mpg by 10% for 4-cylinder cars
dt_copy
Index: <cyl>
model mpg cyl disp hp drat wt qsec vs am
<char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: Mazda RX4 21.00 6 160.0 110 3.90 2.620 16.46 0 1
2: Mazda RX4 Wag 21.00 6 160.0 110 3.90 2.875 17.02 0 1
3: Datsun 710 25.08 4 108.0 93 3.85 2.320 18.61 1 1
4: Hornet 4 Drive 21.40 6 258.0 110 3.08 3.215 19.44 1 0
5: Hornet Sportabout 18.70 8 360.0 175 3.15 3.440 17.02 0 0
6: Valiant 18.10 6 225.0 105 2.76 3.460 20.22 1 0
7: Duster 360 14.30 8 360.0 245 3.21 3.570 15.84 0 0
8: Merc 240D 26.84 4 146.7 62 3.69 3.190 20.00 1 0
9: Merc 230 25.08 4 140.8 95 3.92 3.150 22.90 1 0
10: Merc 280 19.20 6 167.6 123 3.92 3.440 18.30 1 0
11: Merc 280C 17.80 6 167.6 123 3.92 3.440 18.90 1 0
12: Merc 450SE 16.40 8 275.8 180 3.07 4.070 17.40 0 0
13: Merc 450SL 17.30 8 275.8 180 3.07 3.730 17.60 0 0
14: Merc 450SLC 15.20 8 275.8 180 3.07 3.780 18.00 0 0
15: Cadillac Fleetwood 10.40 8 472.0 205 2.93 5.250 17.98 0 0
16: Lincoln Continental 10.40 8 460.0 215 3.00 5.424 17.82 0 0
17: Chrysler Imperial 14.70 8 440.0 230 3.23 5.345 17.42 0 0
18: Fiat 128 35.64 4 78.7 66 4.08 2.200 19.47 1 1
19: Honda Civic 33.44 4 75.7 52 4.93 1.615 18.52 1 1
20: Toyota Corolla 37.29 4 71.1 65 4.22 1.835 19.90 1 1
21: Toyota Corona 23.65 4 120.1 97 3.70 2.465 20.01 1 0
22: Dodge Challenger 15.50 8 318.0 150 2.76 3.520 16.87 0 0
23: AMC Javelin 15.20 8 304.0 150 3.15 3.435 17.30 0 0
24: Camaro Z28 13.30 8 350.0 245 3.73 3.840 15.41 0 0
25: Pontiac Firebird 19.20 8 400.0 175 3.08 3.845 17.05 0 0
26: Fiat X1-9 30.03 4 79.0 66 4.08 1.935 18.90 1 1
27: Porsche 914-2 28.60 4 120.3 91 4.43 2.140 16.70 0 1
28: Lotus Europa 33.44 4 95.1 113 3.77 1.513 16.90 1 1
29: Ford Pantera L 15.80 8 351.0 264 4.22 3.170 14.50 0 1
30: Ferrari Dino 19.70 6 145.0 175 3.62 2.770 15.50 0 1
31: Maserati Bora 15.00 8 301.0 335 3.54 3.570 14.60 0 1
32: Volvo 142E 23.54 4 121.0 109 4.11 2.780 18.60 1 1
model mpg cyl disp hp drat wt qsec vs am
gear carb efficiency
<num> <num> <num>
1: 4 4 8.015267
2: 4 4 7.304348
3: 4 1 9.827586
4: 3 1 6.656299
5: 3 2 5.436047
6: 3 1 5.231214
7: 3 4 4.005602
8: 4 2 7.648903
9: 4 2 7.238095
10: 4 4 5.581395
11: 4 4 5.174419
12: 3 3 4.029484
13: 3 3 4.638070
14: 3 3 4.021164
15: 3 4 1.980952
16: 3 4 1.917404
17: 3 4 2.750234
18: 4 1 14.727273
19: 4 2 18.823529
20: 4 1 18.474114
21: 3 1 8.722110
22: 3 2 4.403409
23: 3 2 4.425036
24: 3 4 3.463542
25: 3 2 4.993498
26: 4 1 14.108527
27: 5 2 12.149533
28: 5 2 20.092531
29: 5 4 4.984227
30: 5 6 7.111913
31: 5 8 4.201681
32: 4 2 7.697842
gear carb efficiency
# Delete columns
dt_copy[, c ("carb" , "vs" ) : = NULL ]
dt_copy
Index: <cyl>
model mpg cyl disp hp drat wt qsec am gear
<char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: Mazda RX4 21.00 6 160.0 110 3.90 2.620 16.46 1 4
2: Mazda RX4 Wag 21.00 6 160.0 110 3.90 2.875 17.02 1 4
3: Datsun 710 25.08 4 108.0 93 3.85 2.320 18.61 1 4
4: Hornet 4 Drive 21.40 6 258.0 110 3.08 3.215 19.44 0 3
5: Hornet Sportabout 18.70 8 360.0 175 3.15 3.440 17.02 0 3
6: Valiant 18.10 6 225.0 105 2.76 3.460 20.22 0 3
7: Duster 360 14.30 8 360.0 245 3.21 3.570 15.84 0 3
8: Merc 240D 26.84 4 146.7 62 3.69 3.190 20.00 0 4
9: Merc 230 25.08 4 140.8 95 3.92 3.150 22.90 0 4
10: Merc 280 19.20 6 167.6 123 3.92 3.440 18.30 0 4
11: Merc 280C 17.80 6 167.6 123 3.92 3.440 18.90 0 4
12: Merc 450SE 16.40 8 275.8 180 3.07 4.070 17.40 0 3
13: Merc 450SL 17.30 8 275.8 180 3.07 3.730 17.60 0 3
14: Merc 450SLC 15.20 8 275.8 180 3.07 3.780 18.00 0 3
15: Cadillac Fleetwood 10.40 8 472.0 205 2.93 5.250 17.98 0 3
16: Lincoln Continental 10.40 8 460.0 215 3.00 5.424 17.82 0 3
17: Chrysler Imperial 14.70 8 440.0 230 3.23 5.345 17.42 0 3
18: Fiat 128 35.64 4 78.7 66 4.08 2.200 19.47 1 4
19: Honda Civic 33.44 4 75.7 52 4.93 1.615 18.52 1 4
20: Toyota Corolla 37.29 4 71.1 65 4.22 1.835 19.90 1 4
21: Toyota Corona 23.65 4 120.1 97 3.70 2.465 20.01 0 3
22: Dodge Challenger 15.50 8 318.0 150 2.76 3.520 16.87 0 3
23: AMC Javelin 15.20 8 304.0 150 3.15 3.435 17.30 0 3
24: Camaro Z28 13.30 8 350.0 245 3.73 3.840 15.41 0 3
25: Pontiac Firebird 19.20 8 400.0 175 3.08 3.845 17.05 0 3
26: Fiat X1-9 30.03 4 79.0 66 4.08 1.935 18.90 1 4
27: Porsche 914-2 28.60 4 120.3 91 4.43 2.140 16.70 1 5
28: Lotus Europa 33.44 4 95.1 113 3.77 1.513 16.90 1 5
29: Ford Pantera L 15.80 8 351.0 264 4.22 3.170 14.50 1 5
30: Ferrari Dino 19.70 6 145.0 175 3.62 2.770 15.50 1 5
31: Maserati Bora 15.00 8 301.0 335 3.54 3.570 14.60 1 5
32: Volvo 142E 23.54 4 121.0 109 4.11 2.780 18.60 1 4
model mpg cyl disp hp drat wt qsec am gear
efficiency
<num>
1: 8.015267
2: 7.304348
3: 9.827586
4: 6.656299
5: 5.436047
6: 5.231214
7: 4.005602
8: 7.648903
9: 7.238095
10: 5.581395
11: 5.174419
12: 4.029484
13: 4.638070
14: 4.021164
15: 1.980952
16: 1.917404
17: 2.750234
18: 14.727273
19: 18.823529
20: 18.474114
21: 8.722110
22: 4.403409
23: 4.425036
24: 3.463542
25: 4.993498
26: 14.108527
27: 12.149533
28: 20.092531
29: 4.984227
30: 7.111913
31: 4.201681
32: 7.697842
efficiency
Keys and Indexing
Setting keys enables fast subsetting and joins:
# Set a key
setkey (dt_copy, cyl)
dt_copy
Key: <cyl>
model mpg cyl disp hp drat wt qsec am gear
<char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: Datsun 710 25.08 4 108.0 93 3.85 2.320 18.61 1 4
2: Merc 240D 26.84 4 146.7 62 3.69 3.190 20.00 0 4
3: Merc 230 25.08 4 140.8 95 3.92 3.150 22.90 0 4
4: Fiat 128 35.64 4 78.7 66 4.08 2.200 19.47 1 4
5: Honda Civic 33.44 4 75.7 52 4.93 1.615 18.52 1 4
6: Toyota Corolla 37.29 4 71.1 65 4.22 1.835 19.90 1 4
7: Toyota Corona 23.65 4 120.1 97 3.70 2.465 20.01 0 3
8: Fiat X1-9 30.03 4 79.0 66 4.08 1.935 18.90 1 4
9: Porsche 914-2 28.60 4 120.3 91 4.43 2.140 16.70 1 5
10: Lotus Europa 33.44 4 95.1 113 3.77 1.513 16.90 1 5
11: Volvo 142E 23.54 4 121.0 109 4.11 2.780 18.60 1 4
12: Mazda RX4 21.00 6 160.0 110 3.90 2.620 16.46 1 4
13: Mazda RX4 Wag 21.00 6 160.0 110 3.90 2.875 17.02 1 4
14: Hornet 4 Drive 21.40 6 258.0 110 3.08 3.215 19.44 0 3
15: Valiant 18.10 6 225.0 105 2.76 3.460 20.22 0 3
16: Merc 280 19.20 6 167.6 123 3.92 3.440 18.30 0 4
17: Merc 280C 17.80 6 167.6 123 3.92 3.440 18.90 0 4
18: Ferrari Dino 19.70 6 145.0 175 3.62 2.770 15.50 1 5
19: Hornet Sportabout 18.70 8 360.0 175 3.15 3.440 17.02 0 3
20: Duster 360 14.30 8 360.0 245 3.21 3.570 15.84 0 3
21: Merc 450SE 16.40 8 275.8 180 3.07 4.070 17.40 0 3
22: Merc 450SL 17.30 8 275.8 180 3.07 3.730 17.60 0 3
23: Merc 450SLC 15.20 8 275.8 180 3.07 3.780 18.00 0 3
24: Cadillac Fleetwood 10.40 8 472.0 205 2.93 5.250 17.98 0 3
25: Lincoln Continental 10.40 8 460.0 215 3.00 5.424 17.82 0 3
26: Chrysler Imperial 14.70 8 440.0 230 3.23 5.345 17.42 0 3
27: Dodge Challenger 15.50 8 318.0 150 2.76 3.520 16.87 0 3
28: AMC Javelin 15.20 8 304.0 150 3.15 3.435 17.30 0 3
29: Camaro Z28 13.30 8 350.0 245 3.73 3.840 15.41 0 3
30: Pontiac Firebird 19.20 8 400.0 175 3.08 3.845 17.05 0 3
31: Ford Pantera L 15.80 8 351.0 264 4.22 3.170 14.50 1 5
32: Maserati Bora 15.00 8 301.0 335 3.54 3.570 14.60 1 5
model mpg cyl disp hp drat wt qsec am gear
efficiency
<num>
1: 9.827586
2: 7.648903
3: 7.238095
4: 14.727273
5: 18.823529
6: 18.474114
7: 8.722110
8: 14.108527
9: 12.149533
10: 20.092531
11: 7.697842
12: 8.015267
13: 7.304348
14: 6.656299
15: 5.231214
16: 5.581395
17: 5.174419
18: 7.111913
19: 5.436047
20: 4.005602
21: 4.029484
22: 4.638070
23: 4.021164
24: 1.980952
25: 1.917404
26: 2.750234
27: 4.403409
28: 4.425036
29: 3.463542
30: 4.993498
31: 4.984227
32: 4.201681
efficiency
# Fast subsetting using key
dt_copy[.(6 )] # All rows where cyl == 6
Key: <cyl>
model mpg cyl disp hp drat wt qsec am gear
<char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 1 4
2: Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 1 4
3: Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 0 3
4: Valiant 18.1 6 225.0 105 2.76 3.460 20.22 0 3
5: Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 0 4
6: Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 0 4
7: Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 1 5
efficiency
<num>
1: 8.015267
2: 7.304348
3: 6.656299
4: 5.231214
5: 5.581395
6: 5.174419
7: 7.111913
# Multiple keys
setkey (dt_copy, cyl, gear)
dt_copy[.(6 , 4 )] # All rows where cyl == 6 and gear == 4
Key: <cyl, gear>
model mpg cyl disp hp drat wt qsec am gear
<char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 1 4
2: Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 1 4
3: Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 0 4
4: Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 0 4
efficiency
<num>
1: 8.015267
2: 7.304348
3: 5.581395
4: 5.174419
Joins in data.table
data.table provides efficient joins using keys:
# Create sample data.tables
manufacturers <- data.table (
make = c ("Honda" , "Toyota" , "Ford" , "BMW" , "Mercedes" ),
country = c ("Japan" , "Japan" , "USA" , "Germany" , "Germany" )
)
cars <- data.table (
model = c ("Civic" , "Corolla" , "Focus" , "3 Series" , "Fiesta" ),
make = c ("Honda" , "Toyota" , "Ford" , "BMW" , "Ford" )
)
# Set keys for joining
setkey (manufacturers, make)
setkey (cars, make)
# Inner join
cars[manufacturers]
Key: <make>
model make country
<char> <char> <char>
1: 3 Series BMW Germany
2: Focus Ford USA
3: Fiesta Ford USA
4: Civic Honda Japan
5: <NA> Mercedes Germany
6: Corolla Toyota Japan
# Left join
manufacturers[cars, nomatch= NA ]
Key: <make>
make country model
<char> <char> <char>
1: BMW Germany 3 Series
2: Ford USA Focus
3: Ford USA Fiesta
4: Honda Japan Civic
5: Toyota Japan Corolla
# Non-equi joins
dt_cars[dt_cars[, .(max_mpg = max (mpg)), by = cyl], on = .(mpg = max_mpg, cyl)]
model mpg cyl disp hp drat wt qsec vs am gear
<char> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1: Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3
2: Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4
3: Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3
carb
<num>
1: 1
2: 1
3: 2
Reshaping data
data.table provides functions for reshaping data:
# Create a sample data.table
dt <- data.table (
id = rep (1 : 3 , each = 2 ),
variable = rep (c ("height" , "weight" ), 3 ),
value = c (170 , 68 , 155 , 52 , 182 , 75 )
)
dt
id variable value
<int> <char> <num>
1: 1 height 170
2: 1 weight 68
3: 2 height 155
4: 2 weight 52
5: 3 height 182
6: 3 weight 75
# Wide to long
dt_wide <- dcast (dt, id ~ variable, value.var = "value" )
dt_wide
Key: <id>
id height weight
<int> <num> <num>
1: 1 170 68
2: 2 155 52
3: 3 182 75
# Long to wide
dt_long <- melt (dt_wide, id.vars = "id" , variable.name = "measure" , value.name = "value" )
dt_long
id measure value
<int> <fctr> <num>
1: 1 height 170
2: 2 height 155
3: 3 height 182
4: 1 weight 68
5: 2 weight 52
6: 3 weight 75