How to Sample Random Rows from Data Frames in R
Sample random rows from data frames to create train/test splits, bootstrap resamples, or randomise data order. Use dplyr::slice_sample(), base R’s sample(), or data.table indexing. The three approaches cover different workflows; pick the one that matches your existing pipeline. Call set.seed() before sampling so that results are reproducible across runs.
library(dplyr)
# Sample 5 rows without replacement
df %>% slice_sample(n = 5)
# Sample 10% of rows
df %>% slice_sample(prop = 0.1)
# Sample with replacement (bootstrap)
df %>% slice_sample(n = nrow(df), replace = TRUE)
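The data.table indexing mentioned in the introduction follows the same pattern as the dplyr calls. The following is a minimal sketch, assuming the data.table package is installed; the built-in mtcars data stands in for df, and the names dt_n, dt_prop and dt_boot are illustrative only.

```r
library(data.table)

set.seed(42)                      # fix the seed so the draws are repeatable
dt <- as.data.table(mtcars)       # mtcars (32 rows) stands in for your own data

# Inside [ ], .N is the number of rows, so sample(.N, 5) gives 5 random row indices
dt_n <- dt[sample(.N, 5)]

# Sample roughly 10% of rows (rounded down)
dt_prop <- dt[sample(.N, floor(0.1 * .N))]

# Bootstrap resample: same size as the original, drawn with replacement
dt_boot <- dt[sample(.N, .N, replace = TRUE)]
```

Because the sampling happens on row indices, this is the same idea as the base R approach, written in data.table syntax.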
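Base R uses sample() on row indices: df[sample(nrow(df), 5), ]. For stratified sampling, group with group_by() first: df %>% group_by(category) %>% slice_sample(n = 5) draws 5 rows from each category. A category with fewer than 5 rows contributes all of its rows, because slice_sample() silently truncates n to the group size when sampling without replacement. Stratifying is useful when you need balanced representation across groups.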
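To make the stratified draw concrete, here is a small sketch using the built-in iris data, where Species plays the role of the category column. It assumes dplyr 1.0 or later; the base R variant at the end is one possible equivalent, not the only way to do it.

```r
library(dplyr)

set.seed(42)

# dplyr: draw 5 rows from each of the three species (iris has 50 rows per species)
strat <- iris %>%
  group_by(Species) %>%
  slice_sample(n = 5) %>%
  ungroup()

# Base R: split row indices by group, then sample positions within each group.
# sample.int(length(i), 5) avoids the sample(x) pitfall when x has length 1.
idx <- unlist(lapply(
  split(seq_len(nrow(iris)), iris$Species),
  function(i) i[sample.int(length(i), 5)]
))
strat_base <- iris[idx, ]

table(strat$Species)        # 5 rows per species
table(strat_base$Species)   # 5 rows per species
```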
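# 70/30 train/test split with base R
set.seed(123)
idx <- sample(nrow(df))                # shuffled row indices
n_train <- floor(0.7 * nrow(df))       # number of training rows (70% of the data)
train <- df[idx[seq_len(n_train)], ]
test <- df[idx[-seq_len(n_train)], ]   # assumes n_train >= 1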
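The split can also be wrapped in a small function so that the split point is computed from a proportion and the seed travels with the call. This is a sketch only: split_rows() is a hypothetical helper, not part of base R or any package, and mtcars stands in for df.

```r
# Hypothetical helper: shuffle row indices and cut them at a given proportion
split_rows <- function(data, prop = 0.7, seed = 123) {
  set.seed(seed)                          # same seed gives the same shuffle
  idx <- sample(nrow(data))               # shuffled row indices
  n_train <- floor(prop * nrow(data))     # size of the training set
  train_idx <- idx[seq_len(n_train)]
  list(
    train = data[train_idx, , drop = FALSE],
    test  = data[setdiff(idx, train_idx), , drop = FALSE]
  )
}

s1 <- split_rows(mtcars)
s2 <- split_rows(mtcars)

nrow(s1$train)                  # floor(0.7 * 32) = 22
nrow(s1$test)                   # 32 - 22 = 10
identical(s1$train, s2$train)   # TRUE: same seed, same split
length(intersect(rownames(s1$train), rownames(s1$test)))   # 0: no row is in both sets
```

Note that set.seed() inside the function resets the global random number stream, which may or may not be what you want in a longer script.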
slice_sample() superseded the older sample_n() and sample_frac() in dplyr 1.0.0. The rsample package from tidymodels provides initial_split(), training(), and testing() for structured train/test splits with optional stratification, and vfold_cv() for cross-validation. The weight_by argument in slice_sample() accepts a vector of non-negative weights (typically a column of the data frame) for weighted sampling, such as probability-proportional-to-size designs; the weights are standardised to sum to 1. For results that are reproducible across R sessions, call set.seed() with a fixed integer before any sampling call.
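The weighted and rsample-based approaches can be sketched as follows. This assumes dplyr (1.0 or later) and rsample are installed; the units data frame and its size column are made up for illustration, and iris stands in for real data.

```r
library(dplyr)
library(rsample)

set.seed(42)

# Hypothetical data: `size` acts as the sampling weight for each row
units <- data.frame(id = 1:6, size = c(0, 0, 1, 5, 10, 20))

# Rows with larger `size` are more likely to be drawn; zero-weight rows never are
weighted <- units %>% slice_sample(n = 2, weight_by = size)

# Stratified 70/30 split: Species proportions are roughly preserved in both parts
split <- initial_split(iris, prop = 0.7, strata = Species)
train <- training(split)
test  <- testing(split)

# 5-fold cross-validation resamples, also stratified by Species
folds <- vfold_cv(iris, v = 5, strata = Species)
```

Compared with the manual index split, rsample keeps the train and test rows tied to one split object, which tends to be convenient when the split feeds into a larger tidymodels workflow.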
See also
- sample(), Base R sampling function
- dplyr::slice(), dplyr row selection functions
- head(), Select first rows