How to Sample Random Rows from Data Frames in R

March 14, 2026 · 2 min read · Updated May 28, 2026 · beginner


Sample random rows from data frames to create train/test splits, bootstrap resamples, or a shuffled row order. You can use dplyr::slice_sample(), base R's sample() on row indices, or data.table indexing. Each approach fits a different workflow, so pick the one that matches your existing pipeline. Call set.seed() before sampling so the results are reproducible across runs.
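A quick way to see what set.seed() buys you: resetting the same seed before a call reproduces exactly the same draw. This base R sketch samples row indices from the built-in mtcars data; the seed value 2026 is arbitrary.

```r
# Same seed before each call -> identical row indices
set.seed(2026)
first <- sample(nrow(mtcars), 5)

set.seed(2026)
second <- sample(nrow(mtcars), 5)

identical(first, second)  # TRUE: the two draws match
```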

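library(dplyr)

set.seed(42)   # make the draws reproducible
df <- mtcars   # example data; any data frame works

# Sample 5 rows without replacement
df %>% slice_sample(n = 5)

# Sample 10% of rows (rounded down)
df %>% slice_sample(prop = 0.1)

# Sample with replacement (bootstrap): same number of rows as the original
df %>% slice_sample(n = nrow(df), replace = TRUE)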
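The bootstrap line above draws a single resample. To estimate the uncertainty of a statistic you repeat that draw many times. A minimal base R sketch, assuming mtcars as example data, 1000 resamples, and the mean of mpg as the statistic (all illustrative choices):

```r
set.seed(42)
df <- mtcars  # example data

# Each replicate resamples row indices with replacement and recomputes the mean
boot_means <- replicate(1000, {
  rows <- sample(nrow(df), replace = TRUE)  # nrow(df) indices, repeats allowed
  mean(df$mpg[rows])
})

# Percentile bootstrap 95% interval for the mean of mpg
quantile(boot_means, c(0.025, 0.975))
```

The percentile interval is the simplest bootstrap interval; packages such as boot offer refined variants.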

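Base R uses sample() on row indices: df[sample(nrow(df), 5), ]. For stratified sampling, group with group_by() first: df %>% group_by(category) %>% slice_sample(n = 5) draws 5 rows from each level of category (or every row of a group that has fewer than 5). This is useful when you need balanced representation across groups. The result is still grouped, so add ungroup() if later steps should treat it as a single table.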
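To confirm that grouped sampling gives a fixed count per group, here is a minimal sketch using the built-in iris data, with Species standing in for the category column (an illustrative choice):

```r
library(dplyr)

set.seed(1)

# Draw 5 rows from each Species, then drop the grouping
strat <- iris %>%
  group_by(Species) %>%
  slice_sample(n = 5) %>%
  ungroup()

# Each of the three species should appear exactly 5 times
count(strat, Species)
```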

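# Train/test split with base R (70% train, 30% test)
set.seed(123)
idx <- sample(nrow(df))              # shuffled row indices
n_train <- floor(0.7 * nrow(df))     # size of the training set
train <- df[idx[seq_len(n_train)], ]
test  <- df[idx[-seq_len(n_train)], ]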
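The data.table indexing mentioned in the introduction follows the same idea, using .N (the number of rows) inside [. A minimal sketch, assuming the data.table package is installed and using mtcars as example data; the 70/30 split mirrors the base R code:

```r
library(data.table)

set.seed(123)
dt <- as.data.table(mtcars)  # example data

# Sample 5 rows without replacement
five <- dt[sample(.N, 5)]

# Shuffle the row order
shuffled <- dt[sample(.N)]

# 70/30 train/test split on shuffled row indices
idx <- sample(nrow(dt))
n_train <- floor(0.7 * nrow(dt))
train <- dt[idx[seq_len(n_train)]]
test  <- dt[idx[-seq_len(n_train)]]
```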

slice_sample() superseded the older sample_n() and sample_frac() in dplyr 1.0.0. For modelling workflows, the rsample package from tidymodels provides initial_split() (with a strata argument for stratified splits), training(), and testing(), plus vfold_cv() for cross-validation folds.

The weight_by argument of slice_sample() takes a column or expression of sampling weights, so rows with larger weights are more likely to be drawn. For reproducible results across R sessions, call set.seed() with a fixed integer before any sampling call.
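A small sketch of weighted sampling with weight_by, using mtcars and its hp column as the weights (illustrative choices). Cars with more horsepower are more likely to be picked, which shows up when you compare many weighted and unweighted draws:

```r
library(dplyr)

set.seed(7)

# One weighted draw of 5 rows, weighted by hp (without replacement)
heavy <- mtcars %>% slice_sample(n = 5, weight_by = hp)

# Repeat weighted and unweighted draws and compare the average hp picked
weighted_hp   <- replicate(500, mean(slice_sample(mtcars, n = 5, weight_by = hp)$hp))
unweighted_hp <- replicate(500, mean(slice_sample(mtcars, n = 5)$hp))

mean(weighted_hp)
mean(unweighted_hp)
```

Weights must be non-negative; rows with a weight of zero are never drawn.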

See also