
How to Sample Without Replacement in R with sample()

To sample without replacement in R, call sample() with replace = FALSE (the default). Each element of the population can be selected at most once, so no element is drawn twice. This is the usual choice for train/test splits and random subset selection, where drawing the same observation twice would distort the result. For data frames, use dplyr::slice_sample() or subset rows by sampled indices.

# Sample 3 numbers from 1 to 10 without replacement
sample(1:10, size = 3)
# [1] 4 9 2

# Shuffle all elements
sample(1:10)
# [1]  3  9  5 10  6  1  2  4  8  7

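For data frames, use dplyr::slice_sample(df, n = 5) or subset by sampled row indices: df[sample(nrow(df), 5), ]. Call set.seed() before sampling to make the result reproducible. R 3.6.0 did not change the default random number generator itself, but it did change the default method sample() uses to turn random numbers into integers (sample.kind went from "Rounding" to "Rejection"), so the same seed can give different draws in older versions. Record the R version alongside your seed; to reproduce pre-3.6.0 results, use RNGversion("3.5.0") or set.seed(seed, sample.kind = "Rounding").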
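
As a minimal sketch of row sampling, the snippet below uses the built-in mtcars dataset and an arbitrary seed of 123; both are illustrative choices rather than anything required. The dplyr call is guarded so the example still runs when dplyr is not installed.

```r
# Reproducible row sampling from a data frame (base R).
# mtcars (32 rows) ships with R and is used here only for illustration.
set.seed(123)  # arbitrary seed

# Draw 5 distinct row indices, then subset by them
row_ids <- sample(nrow(mtcars), size = 5)
sampled <- mtcars[row_ids, ]

# Resetting the same seed reproduces the same draw
set.seed(123)
row_ids_again <- sample(nrow(mtcars), size = 5)
identical(row_ids, row_ids_again)
# [1] TRUE

# dplyr equivalent, run only if dplyr is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  sampled_dplyr <- dplyr::slice_sample(mtcars, n = 5)
}
```

Keeping the sampled indices in their own variable (row_ids) makes it easy to recover the rows that were not selected later on.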

# Create train/test indices without replacement
set.seed(42)
indices <- sample(100, size = 70)
train <- indices
test  <- setdiff(1:100, indices)
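
The same indices can be applied directly to a data frame. The sketch below assumes a hypothetical 100-row data frame named df with an id column; negative indexing is used as an alternative to setdiff() for selecting the held-out rows.

```r
# Split a data frame into train and test rows without overlap.
set.seed(42)

# Hypothetical data: 100 rows with an id and one numeric feature
df <- data.frame(id = 1:100, x = rnorm(100))

# 70 distinct row indices for training
train_idx <- sample(nrow(df), size = 70)

train_df <- df[train_idx, ]
test_df  <- df[-train_idx, ]  # negative indices drop the training rows

# Sanity checks: sizes add up and no row appears in both sets
nrow(train_df)
# [1] 70
nrow(test_df)
# [1] 30
length(intersect(train_df$id, test_df$id))
# [1] 0
```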

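Weighted sampling uses the prob argument: sample(x, size, prob = weights), where weights is a vector of non-negative weights the same length as x. The weights do not need to sum to one; R normalises them automatically. Without replacement, the weights are applied sequentially: each draw is made with probability proportional to the weights of the elements still remaining. When replace = FALSE, sample() requires size <= length(x), and requesting more elements than the population contains raises an error.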
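
A short sketch of both behaviours follows. The vector x, the weights, and the seed are made up for illustration, and tryCatch() is just one way to capture the error instead of halting the script.

```r
# Weighted sampling without replacement, plus the oversized-request error.
set.seed(1)  # arbitrary seed

x <- c("a", "b", "c", "d")
weights <- c(5, 3, 1, 1)  # unnormalised: sums to 10, not 1

# Two distinct elements, with "a" and "b" favoured
picked <- sample(x, size = 2, prob = weights)
anyDuplicated(picked) == 0
# [1] TRUE

# Asking for 5 elements from a population of 4 fails when replace = FALSE;
# here the error message is captured as a string
result <- tryCatch(
  sample(x, size = 5),
  error = function(e) conditionMessage(e)
)
is.character(result)
# [1] TRUE
```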

See also

  • sample(), Base R sampling function
  • rep(), Repeat values
  • setdiff(), Set difference operations