Using Twitter to Better Understand the Spatiotemporal Patterns of Public Sentiment: A Case Study in Massachusetts, USA
This study aimed to investigate mobility patterns of Twitter users across space and time and understand users’ opinions and sentiments about different public spaces. Building on previous studies of quality of life in urban spaces, Cao et al. categorized Twitter data by sentiment to try to isolate more subjective quality of life factors. They focused more on the location of each tweet than on its textual content, emphasizing the land use and time of day associated with the tweet and assigning a simple sentiment score to rate its content. For their analysis, they used over 800,000 georeferenced tweets collected within Massachusetts, along with high-resolution land use data, to ultimately reveal the spatiotemporal variations of public sentiment. This study is more inductive, because the researchers do not start out with a theory- or hypothesis-driven approach. While they do cite other literature on creating positive urban spaces and quality of life factors, this research is more investigative than deductive.
The authors collected Twitter data from November 31, 2012 to June 3, 2013. They collected data within a one-mile radius of all private and public schools in Massachusetts, restricting the data to geotagged tweets accurate to a mobile phone’s location. Cao et al. explain that they used a one-mile radius around schools because it encompasses essentially all urban and suburban areas in the state. Unlike our own data collection, they did not use a keyword search.
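To make the one-mile radius criterion concrete, here is a minimal Python sketch of a distance filter. It is only an illustration and not necessarily how the authors collected their data: the haversine formula, the function names, and the coordinates are all assumptions introduced for this example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def near_any_school(tweet, schools, radius_miles=1.0):
    """True if the tweet's (lat, lon) lies within radius_miles of any school."""
    return any(
        haversine_miles(tweet[0], tweet[1], s[0], s[1]) <= radius_miles
        for s in schools
    )


# Hypothetical coordinates, not taken from the paper.
schools = [(42.3601, -71.0589)]
tweets = [(42.3650, -71.0600), (42.5000, -71.3000)]

# Keep only tweets that fall inside the one-mile buffer of some school.
kept = [t for t in tweets if near_any_school(t, schools)]
```

With many schools and hundreds of thousands of tweets, a spatial index would be a more practical choice than this pairwise check.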
They then classified each tweet into a land use type, based on its geolocation, and a temporal category, based on the period of the day. They used the IBM Watson Alchemy API to quantify the sentiment of tweets on a large scale. Although the authors describe these steps clearly, this case study is not reproducible because of the limitations on accessing historical Twitter data.
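The temporal classification described above can be sketched in a few lines of Python. The period boundaries and labels below are hypothetical, since this review does not record the exact cut-offs the authors used.

```python
from datetime import datetime

# Hypothetical period boundaries (start hour, label); the paper's actual
# categories may differ.
PERIODS = [(0, "night"), (6, "morning"), (12, "afternoon"), (18, "evening")]


def time_period(ts):
    """Return the label of the last period whose start hour is <= ts.hour."""
    label = PERIODS[0][1]
    for start_hour, name in PERIODS:
        if ts.hour >= start_hour:
            label = name
    return label


# Example with a made-up timestamp inside the study window.
example = time_period(datetime(2013, 1, 15, 8, 30))
```

The land use classification would work on the same principle, but it needs a point-in-polygon lookup against the land use layer rather than a simple hour comparison.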
While this study may not be reproducible, it is replicable. The authors clearly define the computational sentiment analysis they conducted to quantify users’ polarity, in the form of sentiment scores, across the collected tweets on a large scale. They also provide the multivariate linear mixed-effects model that they used to statistically reveal the prevalence of users’ sentiment across different geographical locations and time periods. For these analyses, they used R version 3.4.0 with several packages, stating, “The ‘plyr’ package was used to count the number of users across land use categories and time periods. The heat maps were plotted out by using the ‘ggplot2’ package, and further polished by using the ‘ggthemes’, ‘scales’ and ‘viridis’ packages.” They also used the “nlme” package in their regression model. I think the methods in the paper are well-defined, so a researcher could conduct a similar case study today on different data. They found that on average, users’ sentiments tended to be skewed slightly negative, with the most negative sentiments in areas of farmland, industry, and transportation. The overall polarity of public areas tended to be neutral, with positive and negative tweet scores distributed evenly throughout. I think that a similar study today, using more recent data, might find more negative sentiments overall, but I still think the concepts and methods in this case study are replicable.
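As a rough illustration of the counting step quoted above, the sketch below tallies users per land use category and time period. It is a Python stand-in for the authors' R “plyr” workflow and not their code: the record layout, the user IDs, and the decision to count distinct users rather than tweets are assumptions.

```python
from collections import defaultdict


def count_users(records):
    """Count distinct users for each (land_use, period) pair."""
    users = defaultdict(set)
    for user_id, land_use, period in records:
        users[(land_use, period)].add(user_id)
    return {key: len(ids) for key, ids in users.items()}


# Made-up records in the form (user_id, land_use, period).
records = [
    ("u1", "farmland", "morning"),
    ("u1", "farmland", "morning"),  # repeat tweet by the same user
    ("u2", "farmland", "morning"),
    ("u2", "industry", "evening"),
]

counts = count_users(records)
```

A table of this shape, with one count per land use and time period, is the kind of input a heat map would be drawn from.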
Reference: Cao, X., MacNaughton, P., Deng, Z., Yin, J., Zhang, X., & Allen, J. (2018). Using Twitter to better understand the spatiotemporal patterns of public sentiment: A case study in Massachusetts, USA. International Journal of Environmental Research and Public Health, 15(2), 250.