City of Raleigh Budget Sentiment Analysis
Posted on April 29, 2018 | 1 minute read
Package Import
Load necessary packages and set one global option.
library(tidyverse)
library(pdftools)
library(tidytext)
library(knitr)
library(kableExtra)
Retrieve File
Download the adopted budget PDF from the City of Raleigh website, read it in as a character vector with one element per page, and delete the downloaded file from the working directory.
download.file("https://cityofraleigh0drupal.blob.core.usgovcloudapi.net/drupal-prod/COR11/FY2018AdoptedBudget20160612.pdf",
              "FY2018AdoptedBudget.pdf",
              mode = "wb")
txt = pdf_text("FY2018AdoptedBudget.pdf")
unlink("FY2018AdoptedBudget.pdf")
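The download, read, and delete steps can also be wrapped so that the file never lands in the working directory. The following is only a sketch: `read_remote_file` is a hypothetical helper that is not part of the original post, and it assumes the reader function takes a file path as its first argument.

```r
# Hypothetical helper (not from the original post): download `url` to a
# temporary file, apply `reader` to it, and delete the file afterwards.
read_remote_file <- function(url, reader, fileext = "") {
  tmp <- tempfile(fileext = fileext)
  on.exit(unlink(tmp), add = TRUE)  # cleanup runs even if `reader` fails
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  reader(tmp)
}
```

With pdftools loaded and the budget URL stored in a variable, a call such as `read_remote_file(budget_url, pdf_text, fileext = ".pdf")` would be expected to return the same per-page character vector as above (`budget_url` is an assumed name).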
Create Data Frame
Create a character vector of page numbers, combine it with the extracted text in a data frame, and finally “unnest” the page text into individual words, one word per row.
page = as.character(seq_along(txt))
df = data.frame(page, txt, stringsAsFactors = FALSE)
budget_words = df %>%
  unnest_tokens(word, txt)
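To see what the tokenization step produces, here is a small sketch on made-up text. The two "pages" below are invented stand-ins for the output of `pdf_text()`, not content from the budget.

```r
library(tibble)
library(tidytext)

# Two made-up "pages" standing in for the output of pdf_text()
toy_txt <- c("The City adopts the budget.", "Debt service and fees.")

toy_df <- tibble(page = seq_along(toy_txt), txt = toy_txt)

# One row per word; by default unnest_tokens() lowercases the text
# and strips punctuation
toy_words <- unnest_tokens(toy_df, word, txt)
toy_words
```

Each row keeps the page number it came from, which is what makes the per-page counts later on possible.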
Cleaning
Remove stop words and save the result as a clean object, join the sentiment lexicon to that object, then group by page and sentiment and count the words in each group.
cleaned = budget_words %>%
  anti_join(stop_words, by = "word")
sentiment = cleaned %>%
  inner_join(get_sentiments("nrc"), by = "word")
sent_count = sentiment %>%
  group_by(page, sentiment) %>%
  summarise(sent_count = n()) %>%
  ungroup() %>%
  mutate(page = as.integer(as.character(page)))
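The same three steps can be traced on a toy example. In this sketch `toy_stop` and `toy_lexicon` are hypothetical stand-ins for `stop_words` and the NRC lexicon, and the sentiment labels are made up for illustration. It also uses `count()` as a shorthand for the group, summarise, and ungroup sequence.

```r
library(dplyr)
library(tibble)

# Toy one-word-per-row data in the same shape as budget_words
toy_words <- tibble(
  page = c(1L, 1L, 1L, 2L, 2L, 2L),
  word = c("the", "budget", "debt", "debt", "and", "grant")
)

# Hypothetical stand-ins for stop_words and the NRC lexicon
toy_stop <- tibble(word = c("the", "and"))
toy_lexicon <- tibble(
  word      = c("budget", "debt", "grant"),
  sentiment = c("trust", "negative", "trust")
)

toy_counts <- toy_words %>%
  anti_join(toy_stop, by = "word") %>%      # drop stop words
  inner_join(toy_lexicon, by = "word") %>%  # keep only words with a sentiment
  count(page, sentiment, name = "sent_count")
toy_counts
```

Words without an entry in the lexicon drop out at the `inner_join()` step, so the counts describe only the matched vocabulary, not every word on the page.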
Visualize
Negative Word Table
Word | Word Count |
---|---|
APPROPRIATION | 77 |
BONDS | 50 |
DEBT | 219 |
EMERGENCY | 67 |
EXPENDITURE | 123 |
FEE | 136 |
INCOME | 65 |
RISK | 31 |
TAX | 153 |
WASTE | 122 |
Trust Word Table
Word | Word Count |
---|---|
BUDGET | 400 |
CENTER | 196 |
COUNCIL | 173 |
GRANT | 85 |
IMPROVEMENT | 93 |
MANAGEMENT | 165 |
ORDINANCE | 88 |
PLANNING | 77 |
RESOURCES | 105 |
SYSTEM | 101 |
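
The post does not show the code behind these tables. One way they might be built is sketched below, assuming the tables list the most frequent words for a sentiment in alphabetical order. `top_words` is a hypothetical helper and `toy_sentiment` is invented data in the shape of the `sentiment` object.

```r
library(dplyr)
library(tibble)
library(knitr)

# Toy stand-in for `sentiment` (one row per word/sentiment match)
toy_sentiment <- tibble(
  word      = c("debt", "debt", "tax", "debt", "tax", "fee", "budget"),
  sentiment = c("negative", "negative", "negative", "negative",
                "negative", "negative", "trust")
)

# Hypothetical helper: the n most frequent words for one sentiment,
# listed alphabetically with upper-case labels
top_words <- function(data, which_sentiment, n = 10) {
  data %>%
    filter(sentiment == which_sentiment) %>%
    count(word, name = "Word Count", sort = TRUE) %>%
    head(n) %>%
    arrange(word) %>%
    transmute(Word = toupper(word), `Word Count`)
}

kable(top_words(toy_sentiment, "negative", n = 2))
```

Applied to the real data, `top_words(sentiment, "negative")` and `top_words(sentiment, "trust")` would give one table per sentiment, though the exact rows depend on how ties are broken.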
Tags: R Markdown, budget, sentiment