Rows with distinct combinations of columns

# S3 method for ExprBuilder
distinct(
  .data,
  ...,
  .keep = TRUE,
  .n = 1L,
  .parse = getOption("table.express.parse", FALSE)
)

# S3 method for data.table
distinct(.data, ...)

Arguments

.data

An instance of ExprBuilder.

...

Which columns to use to determine uniqueness.

.keep

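Either a logical or a character vector that controls which columns are kept in addition to those mentioned in .... See details below.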

.n

Indices of rows to return for each unique combination of the chosen columns. See details.

.parse

Logical. Whether to apply rlang::parse_expr() to obtain the expressions.

Details

If .keep = TRUE (the default), the columns not mentioned in ... are also kept. If one of the expressions in ... creates a new column, .keep can instead be set to a character vector containing the names of all the columns that should be in the result in addition to the ones mentioned in .... See the examples.
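As a minimal sketch of the two logical settings, the calls below use the data.table method on mtcars with plain column names. It assumes that table.express and data.table are installed and that the data.table method forwards .keep in the same way as the ExprBuilder method; the variable names kept and only_keys are only illustrative.

```r
library(table.express)

dt <- data.table::as.data.table(mtcars)

# .keep = TRUE (the default): one row per (am, vs) combination,
# with the columns not mentioned in ... kept as well
kept <- dt %>%
    distinct(am, vs)

# .keep = FALSE: only the columns that determine uniqueness are of interest
only_keys <- dt %>%
    distinct(am, vs, .keep = FALSE)
```

Printing kept and only_keys side by side shows which columns each setting retains.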

The value of .n is only relevant when .keep is not FALSE. It is used to subset .SD in the built data.table expression. For example, setting .n to 1:2 returns 2 rows per combination, and setting it to .N returns the last row instead of the first. If more than one index is used and a combination has fewer rows than requested, the missing rows are filled with NA. Note that, at least as of version 1.12.2 of data.table, only expressions with a single index are internally optimized.
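The effect of the different .n values can be illustrated with plain data.table, without table.express. The sketch below subsets .SD by group in the way the paragraph above describes; it is not necessarily the exact expression that distinct() builds, and the toy table and variable names are made up for illustration.

```r
library(data.table)

# Toy data: group "a" has three rows, group "b" has only one
dt <- data.table(g = c("a", "a", "a", "b"), x = 1:4)

# Analogous to .n = 1L: first row of each group
first_rows <- dt[, .SD[1L], by = g]

# Analogous to .n = .N: last row of each group
last_rows <- dt[, .SD[.N], by = g]

# Analogous to .n = 1:2: two rows per group;
# group "b" has a single row, so its second row is NA
two_rows <- dt[, .SD[1:2], by = g]
```

Here two_rows has four rows, and the x value in the last one is NA because group "b" cannot supply a second row.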

To see more examples, check the vignette, or the table.express-package entry.

Examples


library(table.express)

data("mtcars")

# amvs is a new column, so .keep names the columns to retain;
# compare with the result of using .keep = TRUE
data.table::as.data.table(mtcars) %>%
    distinct(amvs = am + vs, .keep = names(mtcars))
#>    amvs  mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> 1:    1 21.0   6  160 110 3.90 2.62 16.46  0  1    4    4
#> 2:    2 22.8   4  108  93 3.85 2.32 18.61  1  1    4    1
#> 3:    0 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2