ELXN43 and #CdnPoli: Exploring Sentiment on Political Twitter

File it under “Fun Things To Do During An Election.” We’re going to track the sentiment scores of tweets on the ‘official’ Canadian politics hashtag on Twitter, #CdnPoli, first and foremost just to gather the data, but also to see if we can learn anything about the political process on Twitter. There are any number of research questions that one could use to approach this data set, but for right now we’re just going to use what I refer to only half-jokingly as the Armstrong Method: start with the data and explore it to see if any patterns leap out at you, before you start making assumptions about it. It’s a much more inductive process, and one that this type of data set seems to really need.

So how does this work? First we scrape #CdnPoli, of course. I have a standard script I like to use for that, built on the twitteR package in R. I’m aware that rtweet is newer, better and (more importantly) maintained, and I use it as well, but for some reason I can’t get rtweet scripts to schedule properly using Windows Task Scheduler. So for right now the workflow is: scrape tweets initially using twitteR, then run them through a loop that re-grabs each tweet with rtweet. This is obviously not ideal and takes slightly longer than if I could just get rtweet to schedule properly, but it does have the added bonus of eliminating duplicate tweets without an extra step, so maybe it washes out.

Tweets get scraped on an hourly basis and then stitched together at the end of the day into one CSV file. The Hu & Liu sentiment lexicon is used to provide a sentiment score for each tweet. The Hu & Liu lexicon is a dictionary of around 6,800 English words that are coded to denote positive or negative sentiment. If a tweet contains a word in the lexicon, the sentiment score of the tweet is adjusted up for a positive word or down for a negative one. This is a fairly basic application of sentiment analysis, but it does allow for some numerical representation of the sentiment of a tweet.
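For anyone who wants to see the scoring idea in miniature, here is a rough sketch. It is written in Python rather than the R used for the actual project, and it is not my production script. The two word sets are tiny hypothetical stand-ins for the real Hu & Liu lists, and the function names (tokenize, sentiment_score) are made up for illustration.

```python
import re

# Hypothetical stand-in word lists. The real Hu & Liu lexicon has
# roughly 6,800 entries split into positive and negative files.
POSITIVE = {"good", "great", "win", "strong", "support"}
NEGATIVE = {"bad", "scandal", "lie", "weak", "fail"}


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return (count of positive words) minus (count of negative words)."""
    score = 0
    for word in tokenize(text):
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    return score
```

A tweet with no lexicon words simply scores zero, which is why a lot of tweets end up neutral under this approach.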

Next, regular expressions are used to identify tweets referring to each of the five ‘parties of interest’ in the 2019 Canadian federal election and to each of the leaders of those parties. Separating each of these out allows us to determine a mean score for tweets mentioning that party or party leader. We can then track that sentiment day-by-day (I mean, we could do it hour by hour but come on, I have grad work I’m supposed to be doing).
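As a rough illustration of the matching step, here is a Python sketch (again, the real work is done in R). The patterns below are hypothetical and much cruder than anything you would want to run on real data, and mean_scores is an invented name. It assumes each tweet has already been given a score.

```python
import re
from collections import defaultdict

# Hypothetical patterns for illustration only; not the ones used in the project.
PATTERNS = {
    "Liberal": r"\b(liberals?|lpc)\b",
    "Conservative": r"\b(conservatives?|cpc)\b",
    "NDP": r"\b(ndp|new democrats?)\b",
    "Green": r"\b(green party|greens|gpc)\b",
    "People's": r"\b(people['’]?s party|ppc)\b",
    "Trudeau": r"\btrudeau\b",
    "Scheer": r"\bscheer\b",
    "Singh": r"\bsingh\b",
    # A bare "may" would match the month and the verb, so require the first name.
    "May": r"\belizabeth may\b",
    "Bernier": r"\bbernier\b",
}
COMPILED = {label: re.compile(p, re.IGNORECASE) for label, p in PATTERNS.items()}


def mean_scores(scored_tweets):
    """Take (text, score) pairs and return {label: mean score of matching tweets}."""
    totals = defaultdict(lambda: [0, 0])  # label -> [sum of scores, tweet count]
    for text, score in scored_tweets:
        for label, pattern in COMPILED.items():
            if pattern.search(text):
                totals[label][0] += score
                totals[label][1] += 1
    return {label: total / count for label, (total, count) in totals.items()}
```

Note that in this sketch a tweet mentioning two parties counts toward both means, and labels with no matching tweets are left out of the result entirely.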

Just as a visual example of what this is, here’s a bar chart for Day 1 (September 11th, 2019):

[Figure: day1 (bar chart of sentiment scores for Day 1)]

So, on the first day of the campaign, tweets using the #CdnPoli hashtag were quite negative with regard to both the Liberal Party and Prime Minister Justin Trudeau; they were also negative (although not quite as negative) with regard to the Conservative Party and CPC leader Andrew Scheer. Positive opinions were characteristic of tweets regarding the New Democratic Party, the Green Party, the People’s Party, and their respective leaders. Overall, the sentiment on #CdnPoli for Day 1 was slightly negative.

One of the possibilities for the sentiment scores we’ll see over the election is that they may be a measure of how energetic each campaign is. About 90% of Canada is online, and 61% of Canadians use social media on a daily basis. 77% of Canadians use Facebook, whereas 26% use Twitter. Despite this, Twitter makes for a more interesting data source for two reasons: first, Twitter posts are publicly available, whereas Facebook posts are jealously guarded by Facebook; second, natural limitations on Twitter posting make for a more level playing field on which to analyze posts. Posts cannot be longer than 280 characters, which eliminates the scaling problem that typically occurs when you have texts of wildly differing lengths to analyze. Everyone’s posts are around the same length, so we can compare one tweet to the next in terms of the sentiment scores derived from them. The use of hashtags as organizing containers on Twitter also makes it easy to track specific phenomena of interest, which means that data gathering can be accomplished with relative ease.

The rationale for sentiment scores being a function of campaign energy lies in the relatively low use of Twitter by Canadians. If only 26% of Canadians are using Twitter, then the ones who are both using the platform and engaging on an explicitly political hashtag are likely to be more politically engaged than the average person. They are more likely to be partisan supporters of a specific party (or at least of a set of parties within a certain ideological range). Thus, we can think of these users as performing one of two activities: boosting support for their chosen party by expressing positive sentiment, or attacking an opposing party by expressing negative sentiment. If supporters can manage to boost their chosen party more than opponents can manage to bring it down, then sentiment for that party will rise day-over-day. This represents an energized campaign. If the opposite occurs, and supporters cannot express enough positive sentiment to overcome the negative sentiment of their opposite numbers, then sentiment will fall day-over-day. This represents a de-energized campaign.
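To make the day-over-day idea concrete, here is a minimal Python sketch (not the R code behind the charts). The function names are invented, the "energized" labels are just shorthand for the theory above, and any numbers you feed it in testing are made up rather than taken from the data set.

```python
def day_over_day(daily_means):
    """Given mean scores in date order, return the change between consecutive days."""
    return [later - earlier for earlier, later in zip(daily_means, daily_means[1:])]


def label_trend(change):
    """Map a day-over-day change onto the campaign-energy reading described above."""
    if change > 0:
        return "energized"
    if change < 0:
        return "de-energized"
    return "flat"
```

Three days of means give two changes, so for a party whose (made-up) daily means were -0.5, -0.25 and 0.25, both changes are positive and both days would read as "energized" under this theory.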

Granted, there are also users of #CdnPoli who are not hard supporters of one party or another; these can be thought of as ‘undecided’ voters reacting to acute or aggregate events during the campaign. These will also affect sentiment scores on a daily basis; if a particular event causes negative sentiment outside of social media toward a party, then its supporters will have a harder time boosting the party through positive sentiment tweets.

At least, that’s a theory. A proper academic treatment of the subject after the fact will obviously require some grounding in the literature, but that’s at least something to hang our hats on with regard to tracking it during the campaign. In the end, we’ll see what the data says.

As an example of tracking change day-over-day, here’s each party by day for the first three days of the campaign:

[Figure: day1to3 (sentiment by party, Days 1 to 3)]

and the same, but for party leaders:

[Figure: day1to3leaders (sentiment by party leader, Days 1 to 3)]

Note the upswing for May and Singh and their respective parties after day 2; the first debate occurred on that day and we can see a sharp effect for them and a slight upswing for Scheer. These three leaders participated in the debate, whereas Trudeau chose not to take part (Bernier was not invited). Despite this, we see a downward trend on that day for Scheer’s Conservative Party and a sharp upward trend for the ruling Liberal Party. Part of this likely has to do with perceptions gained from the debate; I’ll post a bar chart of a separate analysis of just the #FirstDebate hashtag later, but the general idea is that Jagmeet Singh won the sentiment battle there, while Scheer underperformed and in fact garnered less positive sentiment than Prime Minister Trudeau, who wasn’t even there.

At any rate, that’s the idea. Let’s see what the data tells us from here on out.
