Sentiment Analysis in R

· 4 min read · Updated March 16, 2026 · intermediate
text-mining sentiment nlp r tidytext

Sentiment analysis assigns emotional scores to text, revealing whether opinions are positive, negative, or neutral. This tutorial covers lexicon-based sentiment analysis using the tidytext package, the most common approach for getting started with text emotion classification in R.

Prerequisites

You should be familiar with tidytext basics—tokenization, stop word removal, and word frequency analysis. If you need a refresher, work through the Tidytext Basics tutorial first.

Sentiment Lexicons

The tidytext package provides access to several sentiment lexicons. Each word receives a score indicating its emotional tone.

Getting Started with Lexicons

library(tidytext)
library(tidyverse)

# get_sentiments() loads one lexicon at a time ("bing" is the default).
# The AFINN and NRC lexicons are distributed through the textdata
# package and are downloaded (after a confirmation prompt) on first use.

# Load the AFINN lexicon (scores from -5 to +5)
afinn <- get_sentiments("afinn")

# Load the bing lexicon (binary positive/negative)
bing <- get_sentiments("bing")

# Load the nrc lexicon (emotions)
nrc <- get_sentiments("nrc")

Understanding Lexicon Structure

Each lexicon structures sentiment differently:

# AFINN: numeric scores
afinn %>% 
  head(10)
# # A tibble: 10 × 2
#   word      value
# 1 abandon      -2
# 2 abandoned    -2
# 3 abandons     -2
# ...

# Bing: binary classification
bing %>% 
  head(10)
# # A tibble: 10 × 2
#   word    sentiment
# 1 2-faced negative
# 2 2-faces negative
# ...

# NRC: multiple emotions
nrc %>% 
  distinct(sentiment)
# # A tibble: 10 × 1
#   sentiment
# 1 anger
# 2 anticipation
# 3 disgust
# 4 fear
# 5 joy
# 6 sadness
# 7 surprise
# 8 trust
# 9 positive
# 10 negative

Basic Sentiment Analysis Workflow

The core workflow joins sentiment scores to your tokenized text:

# Start with tokenized text
text_data <- tibble(
  id = 1:3,
  text = c(
    "I love this product, it is amazing and wonderful!",
    "This is terrible, I hate it and would not recommend.",
    "The product works as expected, nothing special."
  )
)

# Tokenize
tidy_text <- text_data %>%
  unnest_tokens(word, text)

# Join with sentiment lexicon
tidy_text %>%
  inner_join(bing, by = "word")

This gives you sentiment labels for each word.
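
Note that inner_join() silently drops every token the lexicon does not know. A left_join() keeps all tokens, which lets you check how much of your text the lexicon actually covers. A small sketch, assuming the `tidy_text` and `bing` objects from above:

```r
# Measure lexicon coverage: what share of tokens received a label?
coverage <- tidy_text %>%
  left_join(bing, by = "word") %>%
  group_by(id) %>%
  summarise(
    total_words   = n(),
    scored_words  = sum(!is.na(sentiment)),
    coverage_rate = scored_words / total_words
  )

print(coverage)
```

Low coverage is a warning sign: a document score based on only a handful of matched words is easy to over-interpret.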

Sentiment Scoring Methods

Method 1: AFINN Scores

AFINN provides numeric scores—the most intuitive for overall sentiment:

# Calculate sentiment score per document
sentiment_scores <- tidy_text %>%
  inner_join(afinn, by = "word") %>%
  group_by(id) %>%
  summarise(
    sentiment_score = sum(value),
    word_count = n()
  )

print(sentiment_scores)
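
A raw sum grows with document length, so longer documents look more emotionally extreme. Dividing by the number of scored words gives a length-independent average; a small extension of the code above:

```r
# Average AFINN value per scored word controls for document length.
sentiment_scores %>%
  mutate(avg_per_word = sentiment_score / word_count)
```
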

Method 2: Bing Binary Classification

Use bing for simple positive/negative counts:

sentiment_counts <- tidy_text %>%
  inner_join(bing, by = "word") %>%
  count(id, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(
    net_sentiment = positive - negative,
    total_sentiment_words = positive + negative
  )

print(sentiment_counts)

Method 3: NRC Emotion Categories

NRC lets you analyze specific emotions:

emotion_counts <- tidy_text %>%
  inner_join(nrc, by = "word") %>%
  count(id, sentiment) %>%
  filter(!sentiment %in% c("positive", "negative"))

print(emotion_counts)
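
Because NRC tags words with individual emotions, you can also filter the lexicon to a single emotion before joining. A sketch using "joy" (the other emotions work the same way), assuming the `tidy_text` and `nrc` objects from above:

```r
# Keep only the words NRC tags as "joy", then count them per document.
joy_words <- nrc %>%
  filter(sentiment == "joy")

tidy_text %>%
  inner_join(joy_words, by = "word") %>%
  count(id, word, sort = TRUE)
```
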

Practical Example: Pride and Prejudice

Let us apply these concepts to a real text, using the janeaustenr package:

library(janeaustenr)

# Get text from Pride and Prejudice
text <- austen_books() %>%
  filter(book == "Pride & Prejudice") %>%
  mutate(linenumber = row_number()) %>%
  unnest_tokens(word, text) %>%
  filter(!word %in% stop_words$word)

# Calculate sentiment over the book
sentiment_by_section <- text %>%
  inner_join(afinn, by = "word") %>%
  mutate(
    section = floor(linenumber / 80)
  ) %>%
  group_by(section) %>%
  summarise(
    score = sum(value),
    words = n()
  )

print(sentiment_by_section)

Visualizations reveal emotional arcs in text:

library(ggplot2)

# Plot sentiment over the narrative
ggplot(sentiment_by_section, aes(section, score, fill = score > 0)) +
  geom_col(show.legend = FALSE) +
  scale_fill_manual(values = c("TRUE" = "steelblue", "FALSE" = "coral")) +
  labs(
    x = "Section of Book",
    y = "Sentiment Score",
    title = "Emotional Arc in Pride and Prejudice"
  ) +
  theme_minimal()
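
Beyond the overall arc, it often helps to see which individual words drive the score. A sketch using the Bing lexicon and the tokenized `text` object from above:

```r
# Top positive and negative words by frequency
word_contributions <- text %>%
  inner_join(bing, by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  group_by(sentiment) %>%
  slice_max(n, n = 10) %>%
  ungroup()

ggplot(word_contributions, aes(n, reorder(word, n), fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ sentiment, scales = "free_y") +
  labs(x = "Occurrences", y = NULL,
       title = "Top Sentiment Words in Pride and Prejudice")
```

If a single ambiguous word dominates (a classic example is "miss" in Austen, which Bing treats as negative), consider adding it to a custom stop word list.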

Comparing Sentiments Between Texts

Compare sentiment across different sources:

# Compare two books
books <- austen_books() %>%
  filter(book %in% c("Pride & Prejudice", "Emma"))

book_sentiment <- books %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  inner_join(afinn, by = "word") %>%
  group_by(book) %>%
  summarise(
    avg_sentiment = mean(value),
    total_score = sum(value),
    sentiment_words = n()
  )

print(book_sentiment)

Handling Negation

Words like “not” or “never” flip sentiment polarity:

# Custom negation words
negation_words <- c("not", "no", "never", "neither", "nobody", "nothing")

# Find negation + sentiment word pairs
text_data <- tibble(
  id = 1,
  text = "This is not good, I am not happy with this."
) %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ")

# Find negated sentiments and flip their scores
negated <- text_data %>%
  filter(word1 %in% negation_words) %>%
  inner_join(afinn, by = c("word2" = "word")) %>%
  mutate(value = -value)  # Flip the sentiment

print(negated)

This technique reverses sentiment scores for negated words.
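
The bigram trick can be folded into one reusable helper that scores whole documents with negation handling. This is a sketch, not part of tidytext; `score_with_negation` is a hypothetical name, and it assumes a tibble with `id` and `text` columns plus the `afinn` object from above:

```r
# Hypothetical helper: AFINN-score each document, flipping the sign of
# any sentiment word that directly follows a negator.
score_with_negation <- function(df,
                                lexicon = afinn,
                                negators = c("not", "no", "never", "neither",
                                             "nobody", "nothing")) {
  df %>%
    unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
    separate(bigram, c("word1", "word2"), sep = " ") %>%
    inner_join(lexicon, by = c("word2" = "word")) %>%
    mutate(value = if_else(word1 %in% negators, -value, value)) %>%
    group_by(id) %>%
    summarise(sentiment_score = sum(value))
}

score_with_negation(tibble(id = 1, text = "This is not good, I am not happy."))
```

Because every word except the first token appears exactly once as `word2` of a bigram, each sentiment word is scored once, with its preceding word available for the negation check.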

What You Have Learned

Method  Best For
AFINN   Overall sentiment score (numeric)
Bing    Simple positive/negative classification
NRC     Detailed emotion analysis

Key Takeaways

  1. Join tokenized text with sentiment lexicons using inner_join()
  2. Aggregate word-level scores to document level
  3. Visualize sentiment trends to find emotional arcs
  4. Handle negation for more accurate analysis
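
Put together, the whole pipeline fits in a few lines. A sketch assuming a tibble of raw texts with `id` and `text` columns:

```r
reviews <- tibble(
  id = 1:2,
  text = c("An amazing, wonderful experience.",
           "A terrible waste of time.")
)

reviews %>%
  unnest_tokens(word, text) %>%            # tokenize
  anti_join(stop_words, by = "word") %>%   # drop stop words
  inner_join(get_sentiments("afinn"), by = "word") %>%  # attach scores
  group_by(id) %>%
  summarise(sentiment_score = sum(value))  # aggregate to documents
```
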

Next Steps

Continue your text mining journey:

  • Topic Modeling with LDA in R — Uncover latent topics in document collections
  • Text Classification in R — Build supervised models to categorize text