Text Mining: Intro + Carrie


As mentioned in my previous post, I’m examining Stephen King texts through the magic of text mining, using a number of tools in the R language, but especially the tidytext package by Julia Silge and David Robinson. Their book, Text Mining With R: A Tidy Approach, was a godsend in explaining the process of using tidy data formats to store and analyze text-as-data. I will roughly summarize the basics to give you an idea of what’s involved, but there is a great deal more that can be done than I am covering here.

To begin, I start with the actual document: in this case, Carrie, Stephen King’s name-making 1974 debut novel. The original document was in epub format and was converted into a raw text file – .txt – using Calibre, the open-source e-book manager. This text file is then read into R, converted into a data frame, and divided into trackable chapters. The text is then “tokenized”, that is, divided into separate words, or tokens. These tokens are then compared with a list of “stop words” – common mechanics of the English language like “the”, “and”, “or”, etc. – and any matches are removed. I add some custom stop words in order to take the proper names of characters out: “Miss Desjardins”, “Carrie White”, “Sue Snell”, etc. Part of this is so that most-used word lists and term frequency-inverse document frequency (tf-idf) calculations aren’t thrown off, and part of it is because a number of sentiment words – like “grace”, “don”, “carol”, “miss”, “sue”, “white”, etc. – are also proper names, so including them would throw sentiment counts off. The token lists are then run against sentiment dictionaries – AFINN and NRC – in order to provide overall sentiment scores on a scale of positive to negative. The sentiment scores are then tracked chapter by chapter and compared with events to show that emotional sentiment scores can provide the same sort of plot trajectory that a traditional summary would, at base.
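To make those steps concrete, here is a minimal sketch in plain Python. It is not the R/tidytext code behind this post. The stop word sets are tiny illustrative stand-ins, and the `***` chapter marker and the function names are assumptions made up for the example.

```python
import re

# Illustrative stand-ins only: a real run would use a full stop word list.
STOP_WORDS = {"the", "and", "or", "a", "of", "to", "in", "her", "was", "is"}
# Custom stop words: character names that would otherwise skew the counts.
CUSTOM_STOP_WORDS = {"carrie", "white", "sue", "snell", "miss", "desjardins"}


def split_chapters(text, marker="***"):
    """Split raw text on a (hypothetical) chapter-break marker."""
    return [part.strip() for part in text.split(marker) if part.strip()]


def tokenize(chapter):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", chapter.lower())


def tidy_tokens(text):
    """Return one (chapter_number, word) row per token that survives filtering."""
    rows = []
    for number, chapter in enumerate(split_chapters(text), start=1):
        for word in tokenize(chapter):
            if word not in STOP_WORDS and word not in CUSTOM_STOP_WORDS:
                rows.append((number, word))
    return rows


sample = "Carrie White was alone in the shower. *** Sue Snell felt guilt and shame."
rows = tidy_tokens(sample)
for row in rows:
    print(row)
```

The one-row-per-token shape is the point: once every word carries its chapter number, counting, filtering, and joining against a sentiment dictionary are all simple table operations.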

Carrie (1974)


King’s first novel is fairly short, at around 61,000 words, but it’s one that has etched itself into the mainstream world’s tapestry of uneasy images. It’s primarily a treatise on the horrible things teenagers can do to each other, but there’s a feminist reading as well that reveals Carrie to be a rather more sympathetic character, finding her power and using it to tear down the oppressive system that put the question of her sexual self on display and judged it as wanting, from its position of power and privilege. Carrie’s destruction of her high school prom and of the town as a whole becomes in this reading a revolutionary act, a rebellion against deeply ingrained societal norms. Like all revolutions, it is messy and bloody. Like all revolutions, there are irreversible consequences; Carrie’s death is preordained because there is no structure to her actions, only the spontaneous, violent reaction of a nameless being when provoked to the breaking point by those who would seek to keep her voice from being heard in society.

There’s a certain dour old-world superstition vibe going on in Carrie that I really enjoy. The conceit is that witches are real, but they’re the result of hereditary genetic abnormalities; the witch further up the White line is mentioned briefly as cackling mad and capable of performing strange, inexplicable acts while in the middle of fits of early onset dementia. The book ends with a letter from a distant relative of Carrie’s, detailing more instances of this genetic witchery. The Biblical subtext is clear: “Thou shalt not suffer a witch to live”, although the hierarchies of society play pretty fast and loose with what’s considered a witch. Women acting against the oppressive norms of society have always been labeled as witches, though, in one sense or another. Margaret White, a deeply, fundamentally religious woman, is determined to ensure that her daughter conforms to an extreme vision of that norm. For her, “thou shalt not suffer a witch to live” is literal.


Here we have presented the sentiment ranges per chapter in Carrie, using the NRC dictionary list of positive and negative sentiments. This is a useful graph to look at “at a glance” to get the overall idea of what the plot looks like. It represents the overall “volume” of the book – where it is emotionally loaded and where it falls quiet for sentiment. In a way it’s much like how a waveform gives you a quick look at the loud and quiet parts of a sound file; it shows where the emphasis of the book is. If a book were to be considered as a single act of speech, you can tell through this graph where the emphatic syllables would be located. In a rough sense you can see the path of the “beats” that Carrie follows. The big incident at the beginning of the novel – the infamous shower scene where Carrie menstruates for the first time and is humiliated by her classmates – occurs in that first big splash. The book gradually echoes off and gets quieter as Carrie is accepted (on false pretenses) into the upper crust of high school society. This ends with a series of louder emotional pulses as the pig’s blood trick is set up and executed, followed by pulses that trend quite negative – these would be the scenes wherein the fire burns the high school to the ground and then spreads and kills a whole lot more people, before a final pulse where Carrie kills her ultimate tormentors and then dies.
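For the curious, the numbers behind a “volume” graph like this are just per-chapter counts of positive and negative words. A rough Python sketch of that counting follows. The lexicon is a made-up handful of words standing in for the real NRC list, and the chapter rows are invented.

```python
from collections import Counter

# Made-up stand-in for the NRC positive/negative word list (not real NRC entries).
TOY_POLARITY = {
    "love": "positive", "beautiful": "positive", "kind": "positive",
    "blood": "negative", "fell": "negative", "shame": "negative",
}


def polarity_counts(rows, lexicon=TOY_POLARITY):
    """Count positive and negative words per chapter from (chapter, word) rows."""
    counts = {}
    for chapter, word in rows:
        label = lexicon.get(word)
        if label is None:
            continue  # word carries no sentiment in this lexicon
        counts.setdefault(chapter, Counter())[label] += 1
    return counts


# Invented rows in the one-token-per-row shape.
rows = [(1, "blood"), (1, "shame"), (1, "kind"),
        (2, "love"), (2, "beautiful"), (2, "prom")]
counts = polarity_counts(rows)

# Crude text version of the graph: negatives to the left, positives to the right.
for chapter in sorted(counts):
    neg_bar = "-" * counts[chapter]["negative"]
    pos_bar = "+" * counts[chapter]["positive"]
    print(f"{chapter:>2} {neg_bar:>5}|{pos_bar}")
```

The longer the bars on either side of the centre line, the “louder” the chapter.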


The glance graph of Carrie using the AFINN dictionary is similar to what is given by the NRC dictionary, but with a seemingly greater emphasis on negative scores. Also from an aesthetic standpoint it looks a little more like rivulets of blood running down from spatter.
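One difference worth knowing between the two dictionaries: NRC simply labels a word as positive or negative, while AFINN gives each word an integer weight from -5 to +5, so strongly negative words count for more. Whether that explains the heavier negative emphasis here is a guess on my part. The sketch below, in Python with invented weights rather than real AFINN entries, shows the weighted sum.

```python
# Invented weights on AFINN's -5..+5 scale (not actual AFINN values).
TOY_WEIGHTS = {"love": 3, "beautiful": 3, "fell": -2, "blood": -3}


def chapter_scores(rows, lexicon=TOY_WEIGHTS):
    """Sum the weights of the scored words in each chapter."""
    scores = {}
    for chapter, word in rows:
        if word in lexicon:
            scores[chapter] = scores.get(chapter, 0) + lexicon[word]
    return scores


# Invented (chapter, word) rows.
rows = [(1, "blood"), (1, "fell"), (1, "love"),
        (2, "beautiful"), (2, "love"), (2, "gym")]
scores = chapter_scores(rows)
print(scores)
```

In this toy data chapter 1 has two negative words and one positive one, and the weights decide how far below zero it lands.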


Here we see the top words that contributed to both positive and negative sentiment. “Fell” is by far the biggest contributor to negative sentiment, contributing well over double what its leading counterpart in positive sentiment, “love”, does.
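A contribution chart like this comes from tallying how often each sentiment word appears and grouping the tallies by polarity. Here is a small Python sketch of that tally. The word list and lexicon are invented for illustration and the function name is my own.

```python
from collections import Counter

# Invented polarity labels, standing in for a real sentiment dictionary.
TOY_POLARITY = {"love": "positive", "kind": "positive",
                "fell": "negative", "blood": "negative"}


def top_contributors(words, lexicon=TOY_POLARITY, n=3):
    """Return the n most frequent sentiment words for each polarity."""
    tallies = {"positive": Counter(), "negative": Counter()}
    for word in words:
        label = lexicon.get(word)
        if label in tallies:
            tallies[label][word] += 1
    return {label: tally.most_common(n) for label, tally in tallies.items()}


# Invented token stream with names and stop words already removed.
words = ["fell", "love", "fell", "blood", "fell",
         "fell", "fell", "love", "kind", "blood", "prom"]
top = top_contributors(words)
print(top["negative"])
print(top["positive"])
```

This is also why the custom stop words matter: a plain count like this has no way of knowing that “white” or “sue” is a surname or a first name rather than a sentiment word.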


This graph is a point-by-point visualization of the overall sentiment score for each chapter mapped out over the course of the book. This is the real meat of the examination – the highs and lows can be tied to specific matching moments. With the help of some rudimentary Paint skills, we can examine some of the more telling points that help to establish the accuracy of the sentiment graphing.
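I picked the points to highlight by eye, but the same highs and lows could be pulled out programmatically. A minimal Python sketch, assuming the per-chapter scores are already in a list; the numbers are invented, not the real Carrie scores.

```python
def turning_points(scores):
    """Return (chapter, score, kind) for each interior local high or low.

    Chapters are numbered from 1. The first and last chapters are skipped
    because they have only one neighbour to compare against.
    """
    points = []
    for i in range(1, len(scores) - 1):
        prev, cur, nxt = scores[i - 1], scores[i], scores[i + 1]
        if cur > prev and cur > nxt:
            points.append((i + 1, cur, "high"))
        elif cur < prev and cur < nxt:
            points.append((i + 1, cur, "low"))
    return points


# Invented per-chapter sentiment scores.
scores = [-5, -12, 4, -9, -3, -15, 2]
points = turning_points(scores)
for chapter, score, kind in points:
    print(chapter, score, kind)
```

Each turning point is then a candidate chapter to reread and match against what actually happens in the story.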


This point, marked off as chapter 4 in the break system, is the scene where Miss Desjardins confronts Carrie after the incident where she is pelted with tampons in the shower. More so than the scene before it, where the actual tampon pelting occurs, this chapter outlines Carrie as a rather pathetic creature, slathered in her own menstrual fluids, unaware of what’s happening to her. It’s a moment of visceral body horror, and the emotional low point of this section bears that out. The high point that follows it, meanwhile – one of the more positive moments in an overall negative book – is the scene where Miss Desjardins comforts Carrie, brings her to the principal, and sees that she is let out for the rest of the day; she then discusses it gently and kindly with the principal. It’s a calmer, gentler scene than the ones that came before it, and this is borne out by its higher sentiment score.


The indicated sequence here is the lowest sentiment region of the early part of the book. In 7, the lowest sentiment score, we are introduced to the awful, overbearing relationship between Carrie and her mother in the form of a flashback to a time when Carrie was a young child and her mother had a publicly abusive episode. Carrie’s psychic powers are alluded to here and the chapter is actually quite uncomfortable, given the borderline child abuse that is displayed. 8 has a slightly higher sentiment score, which can be attributed to the fact that while 7 features Carrie and her mother, 8 is just Carrie alone, dealing with the body horror and shaming that just happened to her and planning revenge in an ephemeral, completely unstructured way. The sentiment score leaps to just under 0 in the next chapter – why the swing in emotional sentiment? The narrative switches in 9 to Carrie’s other protagonist, Sue Snell, the only girl in the cool clique who seems to have a conscience and a sense of guilt. 9 features Sue Snell and her boyfriend Tommy making love and then discussing how Sue feels about what they did to Carrie; it ends with Tommy formally asking Sue to prom (after making love again, of course). The sentiment difference between 7-8 and 9 is marked, and here is why: we switch from bloody, scared, hurt Carrie White to popular, poised, and sexually active Sue Snell. The sentiment embedded in the language bears this narrative switch out.

When we dip back down again, in 10 and 11, it’s because we have switched back: 10 is where Carrie and her mother get into an argument about Carrie getting her period, and 11 is where Miss Desjardins tears into the kids who pelted Carrie with tampons. In 10 Margaret White unleashes the dour, hard-edged religion that she uses to keep her daughter in line, but Carrie fights back and the negativity of the scene is increased by the presence of both of them. 11, meanwhile, is also a fight, this time between Miss Desjardins and Chris Hargensen, one of the main antagonists. 12, however (which I didn’t highlight in this section, but you should be able to see it; it’s the biggest positive peak in the entire book), is the scene where the principal, a sort of meek-and-mild type of man, faces down Chris Hargensen’s scummy lawyer father and threatens to counter-sue. It’s a huge outburst of righteous anger, and while it involves a lot of acrimonious wrangling, the overall sentiment is highly positive and it fits the scene quite well. The Hargensens are obviously used to getting their own way and the principal’s lambasting of the elder Hargensen is a moment of exultation for the reader.


16-18 – the low sentiment region highlighted – is actually when Tommy asks Carrie to prom and Carrie tells Margaret that a boy has asked her. Carrie’s uncertainty as to being asked to prom in the first place shows in the sentiment cues used in the language, and of course Margaret White’s reaction to Carrie having a normal, potentially sexual teenage life is less than enthusiastic. The last line of this region is great, too:

“Upstairs, Momma continued to whisper. It was not the Lord’s Prayer. It was the Prayer of Exorcism from Deuteronomy.”

Again, more of that “thou shalt not suffer a witch to live” dark Christianity that we’ll see so much more of in King’s early work.

21 – the highlighted high-sentiment point above – is the chapter where Chris nominates Carrie and Tommy for Prom King & Queen. She has ulterior motives – it will eventually be interesting to train the collected King corpus data and see if we can start picking out sentiment-reversals like this – but it’s still a bright, shiny moment as far as Carrie knows and the language reveals this. Days before she was the butt of awful adolescent jokes, and now she is going to prom with someone she had a huge crush on and is being nominated for King & Queen. It’s a big moment for Carrie and a positive high point in the text for her.


A couple of interesting low points here. The first is 22, where Billy (Chris’ JD boyfriend) slaughters a pig and catches its blood in a bucket. No real surprise as to the negative sentiment of that chapter. The other low point, 25, is where Billy and Chris have rough, borderline-abusive sex and plot out the awful prank they’re going to play on Carrie. Again, no surprise there. 23-24 are where Carrie again gets in a fight with her mother about prom and then Tommy shows up to take her to prom. The elevated sentiment for 24 isn’t surprising since Carrie is full of good feelings (if still uncertain), but the fact that 23 is higher bears some examination. Carrie and her Mom fight again, as I mentioned, and Margaret as always gets very Biblical with her arguments. Carrie, however, is in her prom dress and she feels absolutely beautiful.

“Wearing it gave her a weird, dreamy feeling that was half shame and half defiant excitement.”

Despite the fact that the fight between Carrie and Margaret here gets physically bloody (which probably accounts for the fact that it’s still sub-zero in terms of sentiment score), the language about Carrie’s own buoyant feelings about herself and going to prom carries it to a higher place.  She also refuses to give in to her own misgivings about her mother:

“I love you, Momma,” Carrie said steadily. “I’m sorry.”

That admission of love for her overbearing mother is likely what raises the sentiment score slightly above the next scene.


The highlighted high point of sentiment – the last such point for the rest of the novel – is when Carrie arrives at prom and is struck with how beautiful and glamourous everything is. She comes and people are nice to her; no one is being mean or tricking her, and Carrie actually starts feeling good about the situation. The rest of the “quiet section” of the plot is scenes from prom, scenes from Sue Snell’s house, where she is at home wondering if she’s pregnant with Tommy’s child, and scenes where Billy and Chris are going to the prom and getting the prank set up. The more negative scenes are Billy and Chris, of course, although the first negative spike after Carrie’s arrival is Carrie feeling wildly uncertain and having visions of blood. You’ll notice that the sentiment scores slowly get more negative throughout this section and that’s a testament to King’s ability to really ratchet up the tension through subtle means before making his big strike.


If you’ve read the book or seen the movie and are following along at home you can probably guess what this scene is. Hint: it involves a bucket of blood. Also, it involves Carrie losing her cool and consigning everyone in the gym to death (although to be fair, it’s not like she specifically sought out the death of everyone at first, it’s just that’s what happens when you combine lots of water with unshielded Seventies-era musical equipment, as Stone The Crows could tell you).

The two higher-scored chapters are interesting, in that people die in them but it’s mostly implied and off-camera. They are epistolary accounts of what happened after Carrie locked the gym doors and turned all the sprinklers on, and their matter-of-fact style, delivered as testimony, keeps them from being as viscerally negative as the rest of the book tends to be.

Carrie end

We can uncover two final plot-pillars here with these last negative spikes. The first is the scene where Chris and Billy have fallen asleep after celebrating their awfulness with brutal, animalistic sex. They awake in a panic to find Carrie has arrived to kill them, which she does. The second spike is where Carrie dies in Sue’s arms, confused and hurt to the end. It ends with Carrie dying and Sue getting her period; it’s what we in the business call “a fucking downer ending.”

Carrie epilogue

We can see the epilogue here as well. The tail is rather interesting – because King returns to the epistolary passages to tell the story of “what happens after”, the sentiment scores return to near-neutral; there’s a slight but definite downturn in sentiment right at the end, which is the letter where it’s shown that even with Carrie dead that fey genetic witchery lives on – again, a pretty downer ending, for what it’s worth.

So there’s definitely a correlation between the sentiment peaks and the major plot points in the novel: by and large, the higher sentiment peaks line up with the more positively-oriented emotional moments, and the lows with the more negative ones. For this one novel, at least, emotional sentiment scores track the actual plot trajectory, but whether this is a singular phenomenon or indicative of a potentially predictive pattern remains to be seen.





