Rows with distinct combinations of columns
An instance of ExprBuilder.
Which columns to use to determine uniqueness.
See details below.
Indices of rows to return for each unique combination of the chosen columns. See details.
Logical. Whether to apply rlang::parse_expr() to obtain the expressions.
If .keep = TRUE (the default), the columns not mentioned in ... are also kept. However, if a new column is created in one of the expressions therein, .keep can also be set to a character vector containing the names of all the columns that should be in the result in addition to the ones mentioned in ... (see the examples).
The value of .n is only relevant when .keep is not FALSE. It is used to subset .SD in the built data.table expression. For example, we could get 2 rows per combination by setting .n to 1:2, or get the last row instead of the first by using .N. If more than one index is used and not enough rows are found, some rows will have NA. Do note that, at least as of version 1.12.2 of data.table, only expressions with single indices are internally optimized.
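The effect of subsetting .SD per group can be sketched with plain data.table. This is only an illustration of the idiom described above, not the exact expression that table.express builds; the object names (dt, first_two, last_row, small, padded) are made up for the example.

```r
library(data.table)

dt <- as.data.table(mtcars)

# Two rows per (am, vs) combination: roughly what .n = 1:2 asks for
first_two <- dt[, .SD[1:2], by = .(am, vs)]

# Last row per combination instead of the first: roughly what .n = .N asks for
last_row <- dt[, .SD[.N], by = .(am, vs)]

# Hypothetical toy table: group "b" has a single row, so asking for
# indices 1:2 pads that group with an NA row
small <- data.table(g = c("a", "a", "b"), x = 1:3)
padded <- small[, .SD[1:2], by = g]
print(padded)
```

Every (am, vs) combination in mtcars has at least two rows, so first_two contains no padding. The toy table shows the NA case mentioned above.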
To see more examples, check the vignette, or the table.express-package entry.
data("mtcars")
# compare with .keep = TRUE
data.table::as.data.table(mtcars) %>%
    distinct(amvs = am + vs, .keep = names(mtcars))
#> amvs mpg cyl disp hp drat wt qsec vs am gear carb
#> 1: 1 21.0 6 160 110 3.90 2.62 16.46 0 1 4 4
#> 2: 2 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
#> 3: 0 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2