Introduction to data.table

data-manipulation
performance
Learn the basics of fast data manipulation with the data.table package in R
Published

March 10, 2023

Introduction to data.table

The data.table package is a high-performance extension of R’s data.frame that provides a concise syntax for data manipulation. It’s particularly efficient for large datasets.

Getting Started with data.table

First, let’s install and load the package:

# Install if needed (uncomment to run)
# install.packages("data.table")

# Load the package
library(data.table)

# Convert the built-in mtcars dataset to a data.table
dt_cars <- as.data.table(mtcars, keep.rownames = TRUE)
setnames(dt_cars, "rn", "model")  # Rename the rownames column
dt_cars
                  model   mpg   cyl  disp    hp  drat    wt  qsec    vs    am
                 <char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
 1:           Mazda RX4  21.0     6 160.0   110  3.90 2.620 16.46     0     1
 2:       Mazda RX4 Wag  21.0     6 160.0   110  3.90 2.875 17.02     0     1
 3:          Datsun 710  22.8     4 108.0    93  3.85 2.320 18.61     1     1
 4:      Hornet 4 Drive  21.4     6 258.0   110  3.08 3.215 19.44     1     0
 5:   Hornet Sportabout  18.7     8 360.0   175  3.15 3.440 17.02     0     0
 6:             Valiant  18.1     6 225.0   105  2.76 3.460 20.22     1     0
 7:          Duster 360  14.3     8 360.0   245  3.21 3.570 15.84     0     0
 8:           Merc 240D  24.4     4 146.7    62  3.69 3.190 20.00     1     0
 9:            Merc 230  22.8     4 140.8    95  3.92 3.150 22.90     1     0
10:            Merc 280  19.2     6 167.6   123  3.92 3.440 18.30     1     0
11:           Merc 280C  17.8     6 167.6   123  3.92 3.440 18.90     1     0
12:          Merc 450SE  16.4     8 275.8   180  3.07 4.070 17.40     0     0
13:          Merc 450SL  17.3     8 275.8   180  3.07 3.730 17.60     0     0
14:         Merc 450SLC  15.2     8 275.8   180  3.07 3.780 18.00     0     0
15:  Cadillac Fleetwood  10.4     8 472.0   205  2.93 5.250 17.98     0     0
16: Lincoln Continental  10.4     8 460.0   215  3.00 5.424 17.82     0     0
17:   Chrysler Imperial  14.7     8 440.0   230  3.23 5.345 17.42     0     0
18:            Fiat 128  32.4     4  78.7    66  4.08 2.200 19.47     1     1
19:         Honda Civic  30.4     4  75.7    52  4.93 1.615 18.52     1     1
20:      Toyota Corolla  33.9     4  71.1    65  4.22 1.835 19.90     1     1
21:       Toyota Corona  21.5     4 120.1    97  3.70 2.465 20.01     1     0
22:    Dodge Challenger  15.5     8 318.0   150  2.76 3.520 16.87     0     0
23:         AMC Javelin  15.2     8 304.0   150  3.15 3.435 17.30     0     0
24:          Camaro Z28  13.3     8 350.0   245  3.73 3.840 15.41     0     0
25:    Pontiac Firebird  19.2     8 400.0   175  3.08 3.845 17.05     0     0
26:           Fiat X1-9  27.3     4  79.0    66  4.08 1.935 18.90     1     1
27:       Porsche 914-2  26.0     4 120.3    91  4.43 2.140 16.70     0     1
28:        Lotus Europa  30.4     4  95.1   113  3.77 1.513 16.90     1     1
29:      Ford Pantera L  15.8     8 351.0   264  4.22 3.170 14.50     0     1
30:        Ferrari Dino  19.7     6 145.0   175  3.62 2.770 15.50     0     1
31:       Maserati Bora  15.0     8 301.0   335  3.54 3.570 14.60     0     1
32:          Volvo 142E  21.4     4 121.0   109  4.11 2.780 18.60     1     1
                  model   mpg   cyl  disp    hp  drat    wt  qsec    vs    am
     gear  carb
    <num> <num>
 1:     4     4
 2:     4     4
 3:     4     1
 4:     3     1
 5:     3     2
 6:     3     1
 7:     3     4
 8:     4     2
 9:     4     2
10:     4     4
11:     4     4
12:     3     3
13:     3     3
14:     3     3
15:     3     4
16:     3     4
17:     3     4
18:     4     1
19:     4     2
20:     4     1
21:     3     1
22:     3     2
23:     3     2
24:     3     4
25:     3     2
26:     4     1
27:     5     2
28:     5     2
29:     5     4
30:     5     6
31:     5     8
32:     4     2
     gear  carb

Basic Syntax: [i, j, by]

data.table uses a concise syntax based on [i, j, by]: - i: Subset rows (WHERE) - j: Compute on columns (SELECT) - by: Group by columns (GROUP BY)

Subsetting Rows (i)

# Select cars with 6 cylinders
dt_cars[cyl == 6]
            model   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear
           <char> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1:      Mazda RX4  21.0     6 160.0   110  3.90 2.620 16.46     0     1     4
2:  Mazda RX4 Wag  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4
3: Hornet 4 Drive  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3
4:        Valiant  18.1     6 225.0   105  2.76 3.460 20.22     1     0     3
5:       Merc 280  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4
6:      Merc 280C  17.8     6 167.6   123  3.92 3.440 18.90     1     0     4
7:   Ferrari Dino  19.7     6 145.0   175  3.62 2.770 15.50     0     1     5
    carb
   <num>
1:     4
2:     4
3:     1
4:     1
5:     4
6:     4
7:     6
# Multiple conditions
dt_cars[cyl == 6 & mpg > 20]
            model   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear
           <char> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1:      Mazda RX4  21.0     6   160   110  3.90 2.620 16.46     0     1     4
2:  Mazda RX4 Wag  21.0     6   160   110  3.90 2.875 17.02     0     1     4
3: Hornet 4 Drive  21.4     6   258   110  3.08 3.215 19.44     1     0     3
    carb
   <num>
1:     4
2:     4
3:     1
# Select specific rows by position
dt_cars[1:5]
               model   mpg   cyl  disp    hp  drat    wt  qsec    vs    am
              <char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1:         Mazda RX4  21.0     6   160   110  3.90 2.620 16.46     0     1
2:     Mazda RX4 Wag  21.0     6   160   110  3.90 2.875 17.02     0     1
3:        Datsun 710  22.8     4   108    93  3.85 2.320 18.61     1     1
4:    Hornet 4 Drive  21.4     6   258   110  3.08 3.215 19.44     1     0
5: Hornet Sportabout  18.7     8   360   175  3.15 3.440 17.02     0     0
    gear  carb
   <num> <num>
1:     4     4
2:     4     4
3:     4     1
4:     3     1
5:     3     2

Selecting and Computing on Columns (j)

# Select specific columns
dt_cars[, .(mpg, hp, cyl)]
      mpg    hp   cyl
    <num> <num> <num>
 1:  21.0   110     6
 2:  21.0   110     6
 3:  22.8    93     4
 4:  21.4   110     6
 5:  18.7   175     8
 6:  18.1   105     6
 7:  14.3   245     8
 8:  24.4    62     4
 9:  22.8    95     4
10:  19.2   123     6
11:  17.8   123     6
12:  16.4   180     8
13:  17.3   180     8
14:  15.2   180     8
15:  10.4   205     8
16:  10.4   215     8
17:  14.7   230     8
18:  32.4    66     4
19:  30.4    52     4
20:  33.9    65     4
21:  21.5    97     4
22:  15.5   150     8
23:  15.2   150     8
24:  13.3   245     8
25:  19.2   175     8
26:  27.3    66     4
27:  26.0    91     4
28:  30.4   113     4
29:  15.8   264     8
30:  19.7   175     6
31:  15.0   335     8
32:  21.4   109     4
      mpg    hp   cyl
# Compute new values
dt_cars[, .(kpl = mpg * 0.425)]
        kpl
      <num>
 1:  8.9250
 2:  8.9250
 3:  9.6900
 4:  9.0950
 5:  7.9475
 6:  7.6925
 7:  6.0775
 8: 10.3700
 9:  9.6900
10:  8.1600
11:  7.5650
12:  6.9700
13:  7.3525
14:  6.4600
15:  4.4200
16:  4.4200
17:  6.2475
18: 13.7700
19: 12.9200
20: 14.4075
21:  9.1375
22:  6.5875
23:  6.4600
24:  5.6525
25:  8.1600
26: 11.6025
27: 11.0500
28: 12.9200
29:  6.7150
30:  8.3725
31:  6.3750
32:  9.0950
        kpl
# Select and compute multiple columns
dt_cars[, .(model, kpl = mpg * 0.425, hp_per_cyl = hp/cyl)]
                  model     kpl hp_per_cyl
                 <char>   <num>      <num>
 1:           Mazda RX4  8.9250   18.33333
 2:       Mazda RX4 Wag  8.9250   18.33333
 3:          Datsun 710  9.6900   23.25000
 4:      Hornet 4 Drive  9.0950   18.33333
 5:   Hornet Sportabout  7.9475   21.87500
 6:             Valiant  7.6925   17.50000
 7:          Duster 360  6.0775   30.62500
 8:           Merc 240D 10.3700   15.50000
 9:            Merc 230  9.6900   23.75000
10:            Merc 280  8.1600   20.50000
11:           Merc 280C  7.5650   20.50000
12:          Merc 450SE  6.9700   22.50000
13:          Merc 450SL  7.3525   22.50000
14:         Merc 450SLC  6.4600   22.50000
15:  Cadillac Fleetwood  4.4200   25.62500
16: Lincoln Continental  4.4200   26.87500
17:   Chrysler Imperial  6.2475   28.75000
18:            Fiat 128 13.7700   16.50000
19:         Honda Civic 12.9200   13.00000
20:      Toyota Corolla 14.4075   16.25000
21:       Toyota Corona  9.1375   24.25000
22:    Dodge Challenger  6.5875   18.75000
23:         AMC Javelin  6.4600   18.75000
24:          Camaro Z28  5.6525   30.62500
25:    Pontiac Firebird  8.1600   21.87500
26:           Fiat X1-9 11.6025   16.50000
27:       Porsche 914-2 11.0500   22.75000
28:        Lotus Europa 12.9200   28.25000
29:      Ford Pantera L  6.7150   33.00000
30:        Ferrari Dino  8.3725   29.16667
31:       Maserati Bora  6.3750   41.87500
32:          Volvo 142E  9.0950   27.25000
                  model     kpl hp_per_cyl
# Apply functions
dt_cars[, .(avg_mpg = mean(mpg), max_hp = max(hp))]
    avg_mpg max_hp
      <num>  <num>
1: 20.09062    335

Grouping (by)

# Group by cylinder and calculate statistics
dt_cars[, .(count = .N, avg_mpg = mean(mpg)), by = cyl]
     cyl count  avg_mpg
   <num> <int>    <num>
1:     6     7 19.74286
2:     4    11 26.66364
3:     8    14 15.10000
# Multiple grouping variables
dt_cars[, .(count = .N, avg_mpg = mean(mpg)), by = .(cyl, gear)]
     cyl  gear count avg_mpg
   <num> <num> <int>   <num>
1:     6     4     4  19.750
2:     4     4     8  26.925
3:     6     3     2  19.750
4:     8     3    12  15.050
5:     4     3     1  21.500
6:     4     5     2  28.200
7:     8     5     2  15.400
8:     6     5     1  19.700
# Grouping with expressions
dt_cars[, .(count = .N), by = .(cyl, high_mpg = mpg > 20)]
     cyl high_mpg count
   <num>   <lgcl> <int>
1:     6     TRUE     3
2:     4     TRUE    11
3:     8    FALSE    14
4:     6    FALSE     4

Special Symbols in data.table

data.table provides special symbols for common operations:

# .N: number of rows
dt_cars[, .N]
[1] 32
dt_cars[, .N, by = cyl]
     cyl     N
   <num> <int>
1:     6     7
2:     4    11
3:     8    14
# .SD: Subset of Data
dt_cars[, lapply(.SD, mean), by = cyl, .SDcols = c("mpg", "hp", "wt")]
     cyl      mpg        hp       wt
   <num>    <num>     <num>    <num>
1:     6 19.74286 122.28571 3.117143
2:     4 26.66364  82.63636 2.285727
3:     8 15.10000 209.21429 3.999214
# .I: Row numbers
dt_cars[, .I[1:2], by = cyl]  # First two row numbers for each cyl group
     cyl    V1
   <num> <int>
1:     6     1
2:     6     2
3:     4     3
4:     4     8
5:     8     5
6:     8     7

Modifying Data

data.table allows efficient in-place modifications:

# Create a copy to avoid modifying the original
dt_copy <- copy(dt_cars)

# Add a new column
dt_copy[, efficiency := mpg/wt]
dt_copy
Index: <cyl>
                  model   mpg   cyl  disp    hp  drat    wt  qsec    vs    am
                 <char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
 1:           Mazda RX4  21.0     6 160.0   110  3.90 2.620 16.46     0     1
 2:       Mazda RX4 Wag  21.0     6 160.0   110  3.90 2.875 17.02     0     1
 3:          Datsun 710  22.8     4 108.0    93  3.85 2.320 18.61     1     1
 4:      Hornet 4 Drive  21.4     6 258.0   110  3.08 3.215 19.44     1     0
 5:   Hornet Sportabout  18.7     8 360.0   175  3.15 3.440 17.02     0     0
 6:             Valiant  18.1     6 225.0   105  2.76 3.460 20.22     1     0
 7:          Duster 360  14.3     8 360.0   245  3.21 3.570 15.84     0     0
 8:           Merc 240D  24.4     4 146.7    62  3.69 3.190 20.00     1     0
 9:            Merc 230  22.8     4 140.8    95  3.92 3.150 22.90     1     0
10:            Merc 280  19.2     6 167.6   123  3.92 3.440 18.30     1     0
11:           Merc 280C  17.8     6 167.6   123  3.92 3.440 18.90     1     0
12:          Merc 450SE  16.4     8 275.8   180  3.07 4.070 17.40     0     0
13:          Merc 450SL  17.3     8 275.8   180  3.07 3.730 17.60     0     0
14:         Merc 450SLC  15.2     8 275.8   180  3.07 3.780 18.00     0     0
15:  Cadillac Fleetwood  10.4     8 472.0   205  2.93 5.250 17.98     0     0
16: Lincoln Continental  10.4     8 460.0   215  3.00 5.424 17.82     0     0
17:   Chrysler Imperial  14.7     8 440.0   230  3.23 5.345 17.42     0     0
18:            Fiat 128  32.4     4  78.7    66  4.08 2.200 19.47     1     1
19:         Honda Civic  30.4     4  75.7    52  4.93 1.615 18.52     1     1
20:      Toyota Corolla  33.9     4  71.1    65  4.22 1.835 19.90     1     1
21:       Toyota Corona  21.5     4 120.1    97  3.70 2.465 20.01     1     0
22:    Dodge Challenger  15.5     8 318.0   150  2.76 3.520 16.87     0     0
23:         AMC Javelin  15.2     8 304.0   150  3.15 3.435 17.30     0     0
24:          Camaro Z28  13.3     8 350.0   245  3.73 3.840 15.41     0     0
25:    Pontiac Firebird  19.2     8 400.0   175  3.08 3.845 17.05     0     0
26:           Fiat X1-9  27.3     4  79.0    66  4.08 1.935 18.90     1     1
27:       Porsche 914-2  26.0     4 120.3    91  4.43 2.140 16.70     0     1
28:        Lotus Europa  30.4     4  95.1   113  3.77 1.513 16.90     1     1
29:      Ford Pantera L  15.8     8 351.0   264  4.22 3.170 14.50     0     1
30:        Ferrari Dino  19.7     6 145.0   175  3.62 2.770 15.50     0     1
31:       Maserati Bora  15.0     8 301.0   335  3.54 3.570 14.60     0     1
32:          Volvo 142E  21.4     4 121.0   109  4.11 2.780 18.60     1     1
                  model   mpg   cyl  disp    hp  drat    wt  qsec    vs    am
     gear  carb efficiency
    <num> <num>      <num>
 1:     4     4   8.015267
 2:     4     4   7.304348
 3:     4     1   9.827586
 4:     3     1   6.656299
 5:     3     2   5.436047
 6:     3     1   5.231214
 7:     3     4   4.005602
 8:     4     2   7.648903
 9:     4     2   7.238095
10:     4     4   5.581395
11:     4     4   5.174419
12:     3     3   4.029484
13:     3     3   4.638070
14:     3     3   4.021164
15:     3     4   1.980952
16:     3     4   1.917404
17:     3     4   2.750234
18:     4     1  14.727273
19:     4     2  18.823529
20:     4     1  18.474114
21:     3     1   8.722110
22:     3     2   4.403409
23:     3     2   4.425036
24:     3     4   3.463542
25:     3     2   4.993498
26:     4     1  14.108527
27:     5     2  12.149533
28:     5     2  20.092531
29:     5     4   4.984227
30:     5     6   7.111913
31:     5     8   4.201681
32:     4     2   7.697842
     gear  carb efficiency
# Update existing values
dt_copy[cyl == 4, mpg := mpg * 1.1]  # Increase mpg by 10% for 4-cylinder cars
dt_copy
Index: <cyl>
                  model   mpg   cyl  disp    hp  drat    wt  qsec    vs    am
                 <char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
 1:           Mazda RX4 21.00     6 160.0   110  3.90 2.620 16.46     0     1
 2:       Mazda RX4 Wag 21.00     6 160.0   110  3.90 2.875 17.02     0     1
 3:          Datsun 710 25.08     4 108.0    93  3.85 2.320 18.61     1     1
 4:      Hornet 4 Drive 21.40     6 258.0   110  3.08 3.215 19.44     1     0
 5:   Hornet Sportabout 18.70     8 360.0   175  3.15 3.440 17.02     0     0
 6:             Valiant 18.10     6 225.0   105  2.76 3.460 20.22     1     0
 7:          Duster 360 14.30     8 360.0   245  3.21 3.570 15.84     0     0
 8:           Merc 240D 26.84     4 146.7    62  3.69 3.190 20.00     1     0
 9:            Merc 230 25.08     4 140.8    95  3.92 3.150 22.90     1     0
10:            Merc 280 19.20     6 167.6   123  3.92 3.440 18.30     1     0
11:           Merc 280C 17.80     6 167.6   123  3.92 3.440 18.90     1     0
12:          Merc 450SE 16.40     8 275.8   180  3.07 4.070 17.40     0     0
13:          Merc 450SL 17.30     8 275.8   180  3.07 3.730 17.60     0     0
14:         Merc 450SLC 15.20     8 275.8   180  3.07 3.780 18.00     0     0
15:  Cadillac Fleetwood 10.40     8 472.0   205  2.93 5.250 17.98     0     0
16: Lincoln Continental 10.40     8 460.0   215  3.00 5.424 17.82     0     0
17:   Chrysler Imperial 14.70     8 440.0   230  3.23 5.345 17.42     0     0
18:            Fiat 128 35.64     4  78.7    66  4.08 2.200 19.47     1     1
19:         Honda Civic 33.44     4  75.7    52  4.93 1.615 18.52     1     1
20:      Toyota Corolla 37.29     4  71.1    65  4.22 1.835 19.90     1     1
21:       Toyota Corona 23.65     4 120.1    97  3.70 2.465 20.01     1     0
22:    Dodge Challenger 15.50     8 318.0   150  2.76 3.520 16.87     0     0
23:         AMC Javelin 15.20     8 304.0   150  3.15 3.435 17.30     0     0
24:          Camaro Z28 13.30     8 350.0   245  3.73 3.840 15.41     0     0
25:    Pontiac Firebird 19.20     8 400.0   175  3.08 3.845 17.05     0     0
26:           Fiat X1-9 30.03     4  79.0    66  4.08 1.935 18.90     1     1
27:       Porsche 914-2 28.60     4 120.3    91  4.43 2.140 16.70     0     1
28:        Lotus Europa 33.44     4  95.1   113  3.77 1.513 16.90     1     1
29:      Ford Pantera L 15.80     8 351.0   264  4.22 3.170 14.50     0     1
30:        Ferrari Dino 19.70     6 145.0   175  3.62 2.770 15.50     0     1
31:       Maserati Bora 15.00     8 301.0   335  3.54 3.570 14.60     0     1
32:          Volvo 142E 23.54     4 121.0   109  4.11 2.780 18.60     1     1
                  model   mpg   cyl  disp    hp  drat    wt  qsec    vs    am
     gear  carb efficiency
    <num> <num>      <num>
 1:     4     4   8.015267
 2:     4     4   7.304348
 3:     4     1   9.827586
 4:     3     1   6.656299
 5:     3     2   5.436047
 6:     3     1   5.231214
 7:     3     4   4.005602
 8:     4     2   7.648903
 9:     4     2   7.238095
10:     4     4   5.581395
11:     4     4   5.174419
12:     3     3   4.029484
13:     3     3   4.638070
14:     3     3   4.021164
15:     3     4   1.980952
16:     3     4   1.917404
17:     3     4   2.750234
18:     4     1  14.727273
19:     4     2  18.823529
20:     4     1  18.474114
21:     3     1   8.722110
22:     3     2   4.403409
23:     3     2   4.425036
24:     3     4   3.463542
25:     3     2   4.993498
26:     4     1  14.108527
27:     5     2  12.149533
28:     5     2  20.092531
29:     5     4   4.984227
30:     5     6   7.111913
31:     5     8   4.201681
32:     4     2   7.697842
     gear  carb efficiency
# Delete columns
dt_copy[, c("carb", "vs") := NULL]
dt_copy
Index: <cyl>
                  model   mpg   cyl  disp    hp  drat    wt  qsec    am  gear
                 <char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
 1:           Mazda RX4 21.00     6 160.0   110  3.90 2.620 16.46     1     4
 2:       Mazda RX4 Wag 21.00     6 160.0   110  3.90 2.875 17.02     1     4
 3:          Datsun 710 25.08     4 108.0    93  3.85 2.320 18.61     1     4
 4:      Hornet 4 Drive 21.40     6 258.0   110  3.08 3.215 19.44     0     3
 5:   Hornet Sportabout 18.70     8 360.0   175  3.15 3.440 17.02     0     3
 6:             Valiant 18.10     6 225.0   105  2.76 3.460 20.22     0     3
 7:          Duster 360 14.30     8 360.0   245  3.21 3.570 15.84     0     3
 8:           Merc 240D 26.84     4 146.7    62  3.69 3.190 20.00     0     4
 9:            Merc 230 25.08     4 140.8    95  3.92 3.150 22.90     0     4
10:            Merc 280 19.20     6 167.6   123  3.92 3.440 18.30     0     4
11:           Merc 280C 17.80     6 167.6   123  3.92 3.440 18.90     0     4
12:          Merc 450SE 16.40     8 275.8   180  3.07 4.070 17.40     0     3
13:          Merc 450SL 17.30     8 275.8   180  3.07 3.730 17.60     0     3
14:         Merc 450SLC 15.20     8 275.8   180  3.07 3.780 18.00     0     3
15:  Cadillac Fleetwood 10.40     8 472.0   205  2.93 5.250 17.98     0     3
16: Lincoln Continental 10.40     8 460.0   215  3.00 5.424 17.82     0     3
17:   Chrysler Imperial 14.70     8 440.0   230  3.23 5.345 17.42     0     3
18:            Fiat 128 35.64     4  78.7    66  4.08 2.200 19.47     1     4
19:         Honda Civic 33.44     4  75.7    52  4.93 1.615 18.52     1     4
20:      Toyota Corolla 37.29     4  71.1    65  4.22 1.835 19.90     1     4
21:       Toyota Corona 23.65     4 120.1    97  3.70 2.465 20.01     0     3
22:    Dodge Challenger 15.50     8 318.0   150  2.76 3.520 16.87     0     3
23:         AMC Javelin 15.20     8 304.0   150  3.15 3.435 17.30     0     3
24:          Camaro Z28 13.30     8 350.0   245  3.73 3.840 15.41     0     3
25:    Pontiac Firebird 19.20     8 400.0   175  3.08 3.845 17.05     0     3
26:           Fiat X1-9 30.03     4  79.0    66  4.08 1.935 18.90     1     4
27:       Porsche 914-2 28.60     4 120.3    91  4.43 2.140 16.70     1     5
28:        Lotus Europa 33.44     4  95.1   113  3.77 1.513 16.90     1     5
29:      Ford Pantera L 15.80     8 351.0   264  4.22 3.170 14.50     1     5
30:        Ferrari Dino 19.70     6 145.0   175  3.62 2.770 15.50     1     5
31:       Maserati Bora 15.00     8 301.0   335  3.54 3.570 14.60     1     5
32:          Volvo 142E 23.54     4 121.0   109  4.11 2.780 18.60     1     4
                  model   mpg   cyl  disp    hp  drat    wt  qsec    am  gear
    efficiency
         <num>
 1:   8.015267
 2:   7.304348
 3:   9.827586
 4:   6.656299
 5:   5.436047
 6:   5.231214
 7:   4.005602
 8:   7.648903
 9:   7.238095
10:   5.581395
11:   5.174419
12:   4.029484
13:   4.638070
14:   4.021164
15:   1.980952
16:   1.917404
17:   2.750234
18:  14.727273
19:  18.823529
20:  18.474114
21:   8.722110
22:   4.403409
23:   4.425036
24:   3.463542
25:   4.993498
26:  14.108527
27:  12.149533
28:  20.092531
29:   4.984227
30:   7.111913
31:   4.201681
32:   7.697842
    efficiency

Keys and Indexing

Setting keys enables fast subsetting and joins:

# Set a key
setkey(dt_copy, cyl)
dt_copy
Key: <cyl>
                  model   mpg   cyl  disp    hp  drat    wt  qsec    am  gear
                 <char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
 1:          Datsun 710 25.08     4 108.0    93  3.85 2.320 18.61     1     4
 2:           Merc 240D 26.84     4 146.7    62  3.69 3.190 20.00     0     4
 3:            Merc 230 25.08     4 140.8    95  3.92 3.150 22.90     0     4
 4:            Fiat 128 35.64     4  78.7    66  4.08 2.200 19.47     1     4
 5:         Honda Civic 33.44     4  75.7    52  4.93 1.615 18.52     1     4
 6:      Toyota Corolla 37.29     4  71.1    65  4.22 1.835 19.90     1     4
 7:       Toyota Corona 23.65     4 120.1    97  3.70 2.465 20.01     0     3
 8:           Fiat X1-9 30.03     4  79.0    66  4.08 1.935 18.90     1     4
 9:       Porsche 914-2 28.60     4 120.3    91  4.43 2.140 16.70     1     5
10:        Lotus Europa 33.44     4  95.1   113  3.77 1.513 16.90     1     5
11:          Volvo 142E 23.54     4 121.0   109  4.11 2.780 18.60     1     4
12:           Mazda RX4 21.00     6 160.0   110  3.90 2.620 16.46     1     4
13:       Mazda RX4 Wag 21.00     6 160.0   110  3.90 2.875 17.02     1     4
14:      Hornet 4 Drive 21.40     6 258.0   110  3.08 3.215 19.44     0     3
15:             Valiant 18.10     6 225.0   105  2.76 3.460 20.22     0     3
16:            Merc 280 19.20     6 167.6   123  3.92 3.440 18.30     0     4
17:           Merc 280C 17.80     6 167.6   123  3.92 3.440 18.90     0     4
18:        Ferrari Dino 19.70     6 145.0   175  3.62 2.770 15.50     1     5
19:   Hornet Sportabout 18.70     8 360.0   175  3.15 3.440 17.02     0     3
20:          Duster 360 14.30     8 360.0   245  3.21 3.570 15.84     0     3
21:          Merc 450SE 16.40     8 275.8   180  3.07 4.070 17.40     0     3
22:          Merc 450SL 17.30     8 275.8   180  3.07 3.730 17.60     0     3
23:         Merc 450SLC 15.20     8 275.8   180  3.07 3.780 18.00     0     3
24:  Cadillac Fleetwood 10.40     8 472.0   205  2.93 5.250 17.98     0     3
25: Lincoln Continental 10.40     8 460.0   215  3.00 5.424 17.82     0     3
26:   Chrysler Imperial 14.70     8 440.0   230  3.23 5.345 17.42     0     3
27:    Dodge Challenger 15.50     8 318.0   150  2.76 3.520 16.87     0     3
28:         AMC Javelin 15.20     8 304.0   150  3.15 3.435 17.30     0     3
29:          Camaro Z28 13.30     8 350.0   245  3.73 3.840 15.41     0     3
30:    Pontiac Firebird 19.20     8 400.0   175  3.08 3.845 17.05     0     3
31:      Ford Pantera L 15.80     8 351.0   264  4.22 3.170 14.50     1     5
32:       Maserati Bora 15.00     8 301.0   335  3.54 3.570 14.60     1     5
                  model   mpg   cyl  disp    hp  drat    wt  qsec    am  gear
    efficiency
         <num>
 1:   9.827586
 2:   7.648903
 3:   7.238095
 4:  14.727273
 5:  18.823529
 6:  18.474114
 7:   8.722110
 8:  14.108527
 9:  12.149533
10:  20.092531
11:   7.697842
12:   8.015267
13:   7.304348
14:   6.656299
15:   5.231214
16:   5.581395
17:   5.174419
18:   7.111913
19:   5.436047
20:   4.005602
21:   4.029484
22:   4.638070
23:   4.021164
24:   1.980952
25:   1.917404
26:   2.750234
27:   4.403409
28:   4.425036
29:   3.463542
30:   4.993498
31:   4.984227
32:   4.201681
    efficiency
# Fast subsetting using key
dt_copy[.(6)]  # All rows where cyl == 6
Key: <cyl>
            model   mpg   cyl  disp    hp  drat    wt  qsec    am  gear
           <char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1:      Mazda RX4  21.0     6 160.0   110  3.90 2.620 16.46     1     4
2:  Mazda RX4 Wag  21.0     6 160.0   110  3.90 2.875 17.02     1     4
3: Hornet 4 Drive  21.4     6 258.0   110  3.08 3.215 19.44     0     3
4:        Valiant  18.1     6 225.0   105  2.76 3.460 20.22     0     3
5:       Merc 280  19.2     6 167.6   123  3.92 3.440 18.30     0     4
6:      Merc 280C  17.8     6 167.6   123  3.92 3.440 18.90     0     4
7:   Ferrari Dino  19.7     6 145.0   175  3.62 2.770 15.50     1     5
   efficiency
        <num>
1:   8.015267
2:   7.304348
3:   6.656299
4:   5.231214
5:   5.581395
6:   5.174419
7:   7.111913
# Multiple keys
setkey(dt_copy, cyl, gear)
dt_copy[.(6, 4)]  # All rows where cyl == 6 and gear == 4
Key: <cyl, gear>
           model   mpg   cyl  disp    hp  drat    wt  qsec    am  gear
          <char> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1:     Mazda RX4  21.0     6 160.0   110  3.90 2.620 16.46     1     4
2: Mazda RX4 Wag  21.0     6 160.0   110  3.90 2.875 17.02     1     4
3:      Merc 280  19.2     6 167.6   123  3.92 3.440 18.30     0     4
4:     Merc 280C  17.8     6 167.6   123  3.92 3.440 18.90     0     4
   efficiency
        <num>
1:   8.015267
2:   7.304348
3:   5.581395
4:   5.174419

Joins in data.table

data.table provides efficient joins using keys:

# Create sample data.tables
manufacturers <- data.table(
  make = c("Honda", "Toyota", "Ford", "BMW", "Mercedes"),
  country = c("Japan", "Japan", "USA", "Germany", "Germany")
)

cars <- data.table(
  model = c("Civic", "Corolla", "Focus", "3 Series", "Fiesta"),
  make = c("Honda", "Toyota", "Ford", "BMW", "Ford")
)

# Set keys for joining
setkey(manufacturers, make)
setkey(cars, make)

# Inner join
cars[manufacturers]
Key: <make>
      model     make country
     <char>   <char>  <char>
1: 3 Series      BMW Germany
2:    Focus     Ford     USA
3:   Fiesta     Ford     USA
4:    Civic    Honda   Japan
5:     <NA> Mercedes Germany
6:  Corolla   Toyota   Japan
# Left join
manufacturers[cars, nomatch=NA]
Key: <make>
     make country    model
   <char>  <char>   <char>
1:    BMW Germany 3 Series
2:   Ford     USA    Focus
3:   Ford     USA   Fiesta
4:  Honda   Japan    Civic
5: Toyota   Japan  Corolla
# Non-equi joins
dt_cars[dt_cars[, .(max_mpg = max(mpg)), by = cyl], on = .(mpg = max_mpg, cyl)]
              model   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear
             <char> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
1:   Hornet 4 Drive  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3
2:   Toyota Corolla  33.9     4  71.1    65  4.22 1.835 19.90     1     1     4
3: Pontiac Firebird  19.2     8 400.0   175  3.08 3.845 17.05     0     0     3
    carb
   <num>
1:     1
2:     1
3:     2

Reshaping data

data.table provides functions for reshaping data:

# Create a sample data.table
dt <- data.table(
  id = rep(1:3, each = 2),
  variable = rep(c("height", "weight"), 3),
  value = c(170, 68, 155, 52, 182, 75)
)
dt
      id variable value
   <int>   <char> <num>
1:     1   height   170
2:     1   weight    68
3:     2   height   155
4:     2   weight    52
5:     3   height   182
6:     3   weight    75
# Wide to long
dt_wide <- dcast(dt, id ~ variable, value.var = "value")
dt_wide
Key: <id>
      id height weight
   <int>  <num>  <num>
1:     1    170     68
2:     2    155     52
3:     3    182     75
# Long to wide
dt_long <- melt(dt_wide, id.vars = "id", variable.name = "measure", value.name = "value")
dt_long
      id measure value
   <int>  <fctr> <num>
1:     1  height   170
2:     2  height   155
3:     3  height   182
4:     1  weight    68
5:     2  weight    52
6:     3  weight    75

Performance Benefits

data.table is designed for performance:

# Create a larger dataset for demonstration
set.seed(123)
n <- 1e5
dt_large <- data.table(
  id = 1:n,
  x = sample(1:100, n, replace = TRUE),
  y = sample(letters[1:5], n, replace = TRUE)
)

# Measure time for a grouped operation
system.time(dt_large[, .(mean_x = mean(x)), by = y])
   user  system elapsed 
      0       0       0 
# Compare with equivalent dplyr operation (if dplyr is installed)
if (requireNamespace("dplyr", quietly = TRUE)) {
  df_large <- as.data.frame(dt_large)
  system.time(
    dplyr::summarise(dplyr::group_by(df_large, y), mean_x = mean(x))
  )
}
   user  system elapsed 
   0.01    0.00    0.03 

data.table is particularly valuable when working with large datasets due to its efficient memory usage and optimized C implementation.