Sentiment Analysis in R
Sentiment analysis assigns emotional scores to text, revealing whether opinions are positive, negative, or neutral. This tutorial covers lexicon-based sentiment analysis using the tidytext package, the most common approach for getting started with text emotion classification in R.
Prerequisites
You should be familiar with tidytext basics—tokenization, stop word removal, and word frequency analysis. If you need a refresher, work through the Tidytext Basics tutorial first.
Sentiment Lexicons
The tidytext package provides several sentiment lexicons. Each word receives a score indicating its emotional tone.
Getting Started with Lexicons
library(tidytext)
library(tidyverse)
# Load a lexicon by name with get_sentiments(); "bing" ships with tidytext,
# while "afinn" and "nrc" are downloaded via the textdata package on first use
# Load the AFINN lexicon (scores from -5 to +5)
afinn <- get_sentiments("afinn")
# Load the bing lexicon (binary positive/negative)
bing <- get_sentiments("bing")
# Load the nrc lexicon (emotions)
nrc <- get_sentiments("nrc")
Understanding Lexicon Structure
Each lexicon structures sentiment differently:
# AFINN: numeric scores
afinn %>%
  head(10)
# # A tibble: 10 × 2
# word      value
# 1 abandon     -2
# 2 abandoned   -2
# 3 abandons    -2
# Bing: binary classification
bing %>%
  head(10)
# # A tibble: 10 × 2
# word sentiment
# 1 2-faced negative
# 2 2-faces negative
# NRC: multiple emotions
nrc %>%
  distinct(sentiment)
# # A tibble: 10 × 1
# sentiment
# 1 anger
# 2 anticipation
# 3 disgust
# 4 fear
# 5 joy
# 6 sadness
# 7 surprise
# 8 trust
# 9 positive
# 10 negative
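Because NRC stores one row per word/emotion pair (a single word can carry several emotions), you can filter it down to one emotion before joining. A minimal sketch pulling just the "joy" words (the nrc lexicon is fetched via the textdata package the first time you request it):

```r
library(tidytext)
library(dplyr)

# Keep only the words tagged with a single emotion, here "joy"
nrc_joy <- get_sentiments("nrc") %>%
  filter(sentiment == "joy")

head(nrc_joy)
```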
Basic Sentiment Analysis Workflow
The core workflow joins sentiment scores to your tokenized text:
# Start with some raw example text
text_data <- tibble(
  id = 1:3,
  text = c(
    "I love this product, it is amazing and wonderful!",
    "This is terrible, I hate it and would not recommend.",
    "The product works as expected, nothing special."
  )
)
# Tokenize into one word per row
tidy_text <- text_data %>%
  unnest_tokens(word, text)
# Join with a sentiment lexicon
tidy_text %>%
  inner_join(bing, by = "word")
This gives you sentiment labels for each word.
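Before aggregating, it helps to see which words the lexicon actually matched. A minimal, self-contained sketch counting the matched sentiment words (using bing, which is bundled with tidytext):

```r
library(tidytext)
library(dplyr)

text_data <- tibble(
  id = 1:3,
  text = c(
    "I love this product, it is amazing and wonderful!",
    "This is terrible, I hate it and would not recommend.",
    "The product works as expected, nothing special."
  )
)

# Count each matched sentiment word, most frequent first
word_sentiments <- text_data %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(word, sentiment, sort = TRUE)

print(word_sentiments)
```

Words that never appear in the lexicon are silently dropped by the inner join, which is worth remembering when a document scores as "neutral".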
Sentiment Scoring Methods
Method 1: AFINN Scores
AFINN provides numeric scores—the most intuitive for overall sentiment:
# Calculate sentiment score per document
sentiment_scores <- tidy_text %>%
  inner_join(afinn, by = "word") %>%
  group_by(id) %>%
  summarise(
    sentiment_score = sum(value),
    word_count = n()
  )
print(sentiment_scores)
Method 2: Bing Binary Classification
Use bing for simple positive/negative counts:
sentiment_counts <- tidy_text %>%
  inner_join(bing, by = "word") %>%
  count(id, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(
    net_sentiment = positive - negative,
    total_sentiment_words = positive + negative
  )
print(sentiment_counts)
Method 3: NRC Emotion Categories
NRC lets you analyze specific emotions:
emotion_counts <- tidy_text %>%
  inner_join(nrc, by = "word") %>%
  count(id, sentiment) %>%
  filter(!sentiment %in% c("positive", "negative"))
print(emotion_counts)
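Long-format emotion counts can be pivoted into one column per emotion, giving each document an emotion profile. A sketch, assuming dplyr 1.1 or later for the relationship argument (the many-to-many join is expected here, since NRC maps one word to several emotions):

```r
library(tidytext)
library(dplyr)
library(tidyr)

text_data <- tibble(
  id = 1:2,
  text = c(
    "I love this wonderful, amazing product!",
    "This is terrible and I hate it."
  )
)

# One row per document, one column per NRC emotion
emotion_profile <- text_data %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("nrc"), by = "word",
             relationship = "many-to-many") %>%  # words carry several emotions
  filter(!sentiment %in% c("positive", "negative")) %>%
  count(id, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0)

print(emotion_profile)
```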
Practical Example: Movie Reviews
Let us apply these concepts to a real dataset:
library(janeaustenr)
# Get text from Pride and Prejudice
text <- austen_books() %>%
  filter(book == "Pride & Prejudice") %>%
  mutate(linenumber = row_number()) %>%
  unnest_tokens(word, text) %>%
  filter(!word %in% stop_words$word)
# Calculate sentiment over the book
sentiment_by_section <- text %>%
  inner_join(afinn, by = "word") %>%
  mutate(section = floor(linenumber / 80)) %>%
  group_by(section) %>%
  summarise(
    score = sum(value),
    words = n()
  )
print(sentiment_by_section)
Visualizing Sentiment Trends
Visualizations reveal emotional arcs in text:
library(ggplot2)
# Plot sentiment over the narrative
ggplot(sentiment_by_section, aes(section, score, fill = score > 0)) +
  geom_col(show.legend = FALSE) +
  scale_fill_manual(values = c("TRUE" = "steelblue", "FALSE" = "coral")) +
  labs(
    x = "Section of Book",
    y = "Sentiment Score",
    title = "Emotional Arc in Pride and Prejudice"
  ) +
  theme_minimal()
Comparing Sentiments Between Texts
Compare sentiment across different sources:
# Compare two books
books <- austen_books() %>%
  filter(book %in% c("Pride & Prejudice", "Emma"))
book_sentiment <- books %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  inner_join(afinn, by = "word") %>%
  group_by(book) %>%
  summarise(
    avg_sentiment = mean(value),
    total_score = sum(value),
    sentiment_words = n()
  )
print(book_sentiment)
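Lexicons can disagree, so it is worth cross-checking them on the same texts. A sketch that computes a bing net count alongside the AFINN total for each book (afinn is fetched via the textdata package on first use):

```r
library(janeaustenr)
library(tidytext)
library(dplyr)
library(tidyr)

tokens <- austen_books() %>%
  filter(book %in% c("Pride & Prejudice", "Emma")) %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Net sentiment under bing: positive minus negative word counts
bing_net <- tokens %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(book, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(bing_net = positive - negative)

# Summed AFINN scores for the same books
afinn_total <- tokens %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  group_by(book) %>%
  summarise(afinn_total = sum(value))

left_join(bing_net, afinn_total, by = "book")
```

If the two lexicons rank the books differently, that is a signal to inspect which words each lexicon matched rather than trust either number outright.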
Handling Negation
Words like “not” or “never” flip sentiment polarity:
# Custom negation words
negation_words <- c("not", "no", "never", "neither", "nobody", "nothing")
# Build negation + following-word pairs as bigrams
bigrams <- tibble(
  id = 1,
  text = "This is not good, I am not happy with this."
) %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ")
# Find negated sentiment words and flip their scores
negated <- bigrams %>%
  filter(word1 %in% negation_words) %>%
  inner_join(afinn, by = c("word2" = "word")) %>%
  mutate(value = -value)
print(negated)
This technique reverses sentiment scores for negated words.
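To fold this correction into a document-level score, note that each negated word was counted with the wrong sign in the naive unigram sum, so the total is off by twice that word's value. A self-contained sketch under that assumption:

```r
library(tidytext)
library(dplyr)
library(tidyr)

negation_words <- c("not", "no", "never", "neither", "nobody", "nothing")
afinn <- get_sentiments("afinn")

docs <- tibble(id = 1, text = "This is not good, I am not happy with this.")

# Naive unigram score: treats "good" and "happy" as positive
naive <- docs %>%
  unnest_tokens(word, text) %>%
  inner_join(afinn, by = "word") %>%
  group_by(id) %>%
  summarise(naive_score = sum(value))

# Each negated word contributes -value instead of +value,
# so the naive total must be adjusted by -2 * value per negated word
correction <- docs %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(word1 %in% negation_words) %>%
  inner_join(afinn, by = c("word2" = "word")) %>%
  group_by(id) %>%
  summarise(adjustment = sum(-2 * value))

scores <- naive %>%
  left_join(correction, by = "id") %>%
  mutate(corrected_score = naive_score + coalesce(adjustment, 0))

print(scores)
```

For this sentence, the naive score comes out positive while the corrected score is negative, which matches how a human would read it.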
What You Have Learned
| Method | Best For |
|---|---|
| AFINN | Overall sentiment score (numeric) |
| Bing | Simple positive/negative classification |
| NRC | Detailed emotion analysis |
Key Takeaways
- Join tokenized text with sentiment lexicons using inner_join()
- Aggregate word-level scores to document level
- Visualize sentiment trends to find emotional arcs
- Handle negation for more accurate analysis
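The takeaways above fit into one short pipeline, from raw text to a per-document net score (a minimal sketch using the bundled bing lexicon):

```r
library(tidytext)
library(dplyr)
library(tidyr)

reviews <- tibble(
  id = 1:2,
  text = c("An excellent, delightful read.", "A dull and disappointing mess.")
)

review_sentiment <- reviews %>%
  unnest_tokens(word, text) %>%                       # tokenize
  inner_join(get_sentiments("bing"), by = "word") %>% # attach labels
  count(id, sentiment) %>%                            # aggregate
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(net = positive - negative)                   # document score

print(review_sentiment)
```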
See Also
- dplyr::filter() — Filter rows after tokenization
- dplyr::count() — Essential for word frequency analysis
Next Steps
Continue your text mining journey:
- Topic Modeling with LDA in R — Uncover latent topics in document collections
- Text Classification in R — Build supervised models to categorize text