Text Mining: Cujo


When it comes to my least favourite King novels, Cujo is third. Why? It’s disjointed, for one; a lot of the book is taken up by the foibles of the Sharp Cereal Professor, and honestly I can’t bring myself to care enough about the dying art of marketing kids’ cereals in the early 1980s. Also, the Trentons are not sympathetic characters. Look, I’ve written elsewhere about how your characters don’t necessarily need to be likable. I’ve gone off at length about how needing your characters to be the reader’s best friend is just a trap that encourages an immature fanbase that will rise up and kidnap you when you decide to kill those characters off…

Wait, actually, I think that was Misery.

Anyway, the Trentons are middle-class assholes. Vic somehow seems to be completely oblivious to the fact that his wife is feeling trapped in their bourgeois life of comfort, and Donna spins in place, rebelling against all that awful success that has her feeling like youth has passed her by. Rather than, I don’t know, taking up a hobby or volunteering in her community, she decides to lie, by having an affair with the dirtbag painter/poet who seems more than a little nuts. I mean, if you want to have sex with someone else because you’re feeling bored, maybe communicate some of those feelings to your significant other. So many of the problems in this book could have been solved if the characters would just talk to each other about more than petty surface-level bullshit. Meanwhile, little Tad is having the psychic equivalent of a nervous breakdown but his parents are too self-absorbed to even notice.

The Camber family, meanwhile, is much more sympathetic, despite being the nominal antagonists, sort of. Cujo is the antagonist, sure, because he kills a lot of people (including Sheriff Bannerman from The Dead Zone) but he’s just a rabid dog. Poor guy. Joe Camber should have gotten him his rabies shots, but in the end Cujo’s rabies is the product of a very tragic, very archetypal human story. Joe is a solid, working-class man who’s been bitten both by the drink and by the bigotry of low expectations. He’s an alcoholic, sure, but his place in rural Maine society is such that no one expects anything else out of him. He drinks, fucks off work, fishes with the boys, and lives that good ol’ boy life that marks him out as a certain sort of person. His son Brett is following quickly in his footsteps, and that’s why Charity Camber is desperate to get him away from his father’s influence. When her lottery win comes along, it’s the perfect chance to do just that, and herein is a far more interesting tale than the Trentons’ narcissistic story. Will the boy become the man? Will the son become the father? Can Charity make sure her son grows up right, and not become the latest in a long line of Maine tosspots?

Plus, King breaks the cardinal rule of mainstream horror, which is that if one of your main characters is a kid, they have plot armour. When Tad dies, I prefer to think of it as the cocaine talking, rather than King, because he should know better. He’s seen all the movies. I know, I’ve read Danse Macabre.

Anyway Cujo looks like this:


So we have here a book whose major emotional volume seems to occur at the beginning and through the middle; the end tends to be quite a bit quieter.


Unlike a lot of King’s previous books, the sentiment in Cujo seems more evenly distributed. Also of note is that nothing goes over the +20 mark. The full stats range looks like this:

Min: -61

Max: 19

Median: -9

Mean: -12.36

Fairly negative, as far as King books go.
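For anyone who wants to reproduce summary numbers like these, here is a minimal sketch in Python (not the R workflow used for these posts). It assumes you already have one net sentiment score per section; the list below is made up for illustration and is not the Cujo data. A mean that sits below the median, as it does above, usually means a few deeply negative sections are dragging the average down.

```python
from statistics import mean, median

def summarize(scores):
    """Return min, max, median and mean of per-section net sentiment scores."""
    return {
        "min": min(scores),
        "max": max(scores),
        "median": median(scores),
        "mean": round(mean(scores), 2),
    }

# Hypothetical per-section scores for illustration only, NOT real book data.
example_scores = [-40, -15, -8, -10, 12, -3, -22, 6]
summary = summarize(example_scores)
print(summary)
```

Here too the mean lands below the median because of the single -40 section.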


There are a lot of negative spikes here, but three stand out as guideposts for the book: 10, 45, and 77. They divide the book into three basic parts, with the last one of course being quite short (which seems to be a King hallmark). 10 kicks off Cujo’s downward spiral, 45 is where Donna and Tad arrive at Chez Camber only to find a rabid dog prowling the yard, and 77 is where Vic realizes that Tad and Donna are missing and the whole “Tad dreaming about dying” thing comes crashing down on him.

Note that the negative sentiment peak isn’t even where Tad dies. It isn’t even 80, the little spike that comes after; that’s where Cujo dies. Tad’s death is relegated to the muckery that comes near the end of the book, close to the zero mark. Like Wilson in The Naked and the Dead, Tad Trenton’s death occurs between one moment and the next, and his passing isn’t even surrounded with sentiment cues.

Which brings up my ultimate beef with this book: Tad exists in this book just to die. We don’t focus on him much, his story gets lost among everyone else’s and he exists in the car just to provide a foil for Donna to try to keep going. He’s a sketch of a character and his death is used as a token in the ongoing saga of the Trenton Marriage.

So there’s that.

Finally, the word contributions:


Note that “hot” is quite misleading here; Tad dies of thirst in a hot car, but “hot” is counted as positive here, meaning the book is probably more negative than the mean indicates.
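To see how a single miscoded word can tilt a total, here is a rough Python sketch of word contributions. The six-word lexicon is a toy stand-in (real sentiment dictionaries are far larger), and the `exclude` parameter is a hypothetical convenience for dropping a word like “hot” that means the opposite in context.

```python
from collections import Counter
import re

# Toy lexicon for illustration only; polarity is +1 or -1.
TOY_LEXICON = {"hot": 1, "love": 1, "good": 1, "dead": -1, "scream": -1, "blood": -1}

def word_contributions(text, lexicon, exclude=()):
    """Count each lexicon word in the text and multiply by its polarity."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in lexicon and t not in exclude)
    return {word: n * lexicon[word] for word, n in counts.items()}

sample = "The car was hot, so hot. Blood on the seat. The good dog was dead."
with_hot = word_contributions(sample, TOY_LEXICON)
without_hot = word_contributions(sample, TOY_LEXICON, exclude={"hot"})

# Net score with and without the misleading word.
print(sum(with_hot.values()), sum(without_hot.values()))
```

In this toy sample the net score flips from positive to negative once “hot” is left out, which is the same effect described above on a much smaller scale.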


Text Mining: Roadwork



The third Bachman novel, Roadwork, is another portrait of a seethingly angry man acting out against his grievances with society. In Rage, the protagonist dealt with his anti-social angst by taking his classroom hostage and killing two teachers. In The Long Walk, the protagonist deals with it by joining a ghastly game show that runs people down to their deaths. Roadwork is a little less kinetic than either; the protagonist here, George Dawes, simply gives in to inertia and refuses to progress along with everyone else. A highway extension is slated to destroy an old suburban neighbourhood and Dawes is in charge of finding both a new house to live in and a new location for the industrial laundry he works for. In an act of rebellion against the inherent unfairness of the situation, he decides to do neither. He refuses to vacate his property, and ends up getting shot and killed in a stand-off with the police.

In other words, it’s a very American story of the late 1970s, despite being set in the earlier part of the decade. Everyone else in George’s life has decided to go along with the change – say goodbye to the old neighbourhood, the old workplace, the old way of life. Things are getting meaner, and the little guy doesn’t have anyone to stand up for him anymore, not really. The oil crisis is settling in, the recession is hitting hard, and, as Bruce Springsteen would point out around the same time Roadwork was published, the good jobs were gone and they weren’t coming back. George lost his young son to brain cancer; now he’s losing his house and his workplace. Had he been faced with this crisis 43 years later, he would have been a Trump voter; like Trump, George just wanted to burn the whole thing down.

Here’s what it looks like:


What leaps out immediately is that it’s very front-loaded. All of the emotional heavy lifting seems to occur in the first half of the book. 25 is the last section of any emotional weight until just before the end, and it’s coincidentally the beginning of the last third of the book, covering the events of January 1973 and the end of George’s life.

The line graph shows it even more dramatically:



The first half is dominated by big negative spikes and then, after the December-January changeover of 24-25, it settles into an even keel that trends slightly downward, in a muttering sort of way. George’s final days start with the highest positive peak of the book before settling down into their inevitable violent end.

As for the negative peaks, chapter 8 is where he meets up with mobster/car dealer Sal, who sets in motion the events that will eventually lead to George’s standoff with the police. 17 is where George takes a call and finds out his work family is shattered: his old boss is trying to find out if George was embezzling and his old co-worker Arnie killed himself. Also of interest is that the prologue of the book starts off highly negative, which makes sense since Dawes gets man-on-the-street interviewed and calls the developments “a piece of shit” and there isn’t much positive language to offset the negative aspects.

40, of course, is the final standoff, set to Let It Bleed.

The smooth line graph shows the same thing:


George’s life bottoms out around chapter 15 – the chapter where he meets up with his wife Mary and she lays on him in no uncertain terms the cold fact that wishing the construction work away won’t make their house continue standing. Our emotional reading of the situation rises from there, as George continues to refuse to deal with reality. This height tops off around chapter 26-27 – the beginning of that last January, as mentioned above – before hurtling rapidly downhill.


There’s the distribution for Roadwork. 70% or so of the book falls between -20 and 20, but there are a lot of instances outside that band as well, making a map that has those big spike points but also a whole third of the book that stays colouring within the lines.
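A figure like that 70% is just the share of sections whose score falls inside a band. A small Python sketch of the calculation, using invented scores rather than the Roadwork data and a hypothetical helper name:

```python
def share_within(scores, low=-20, high=20):
    """Fraction of sections whose score lies in the closed range [low, high]."""
    if not scores:
        raise ValueError("scores must not be empty")
    inside = sum(1 for s in scores if low <= s <= high)
    return inside / len(scores)

# Hypothetical per-section scores, NOT the real Roadwork numbers.
example = [-85, -30, -12, -5, 0, 3, 8, 15, 25, 40]
print(f"{share_within(example):.0%} of sections fall between -20 and 20")
```

Widening the band (say, `share_within(example, -50, 50)`) gives the equivalent figure for a calmer-looking book.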

The stats on this one look like this:

MIN: -85

MAX: 40


MEAN: -2.476

Interesting that, despite such deep spikes of sentiment, the mean sentiment score for the book is quite close to zero. It actually has the most positive mean score of any of the Bachman books. That seems odd, but then consider the other Bachman books.

Finally, the word scores, for anyone who actually cares about them:


(These will be more interesting when compared with each other and with the overall corpus of King’s work.)

Text Mining: Firestarter


Firestarter: another classic King tale of a troubled young girl who develops strange psychic powers and uses them to literally burn people alive. Charlie and her dad are chased by a mysterious U.S. alphabet agency bent on weaponizing the intersection of science and paranormal research. Half the book is the chase; the other half is the catch, and that combination makes for some interesting results, as we’ll see.

The stats:

MIN: -237

MAX: 28


MEAN: -5.734

Just as a quick aside, the min/max values themselves aren’t going to show much in the end, as a lot of it depends on the length of chapters and how King divides them up (or, in some cases, how I divide them up, like when there are no traditional chapter breaks). I include them for the sake of completion but they’re not neatly comparable across texts. The median is a little more useful but the mean value will likely be the most useful statistic that comes out of the NRC sentiment analysis.
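One possible way around the chapter-length problem, offered as an assumption rather than anything done in these posts, is to scale each section’s net score by its word count. A short Python sketch with hypothetical numbers:

```python
def per_thousand_words(net_score, word_count):
    """Scale a section's net sentiment score to a per-1,000-words rate."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 1000 * net_score / word_count

# Hypothetical sections given as (net score, word count); not real book data.
short_section = per_thousand_words(-24, 2000)
long_section = per_thousand_words(-240, 30000)
print(short_section, long_section)
```

In this made-up case the long section has a raw score ten times lower, yet per thousand words it is the milder of the two.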

So, Firestarter looks like this:


Notice the values on the Y axis. Negative scores range down to -1000, although only on a couple of occasions. There are two bombs dropped in terms of emotional sentiment in this book, and, interestingly, not a lot of major activity for large parts of the book.

There’s some mild activity right at the beginning, as we go in medias res to Andy and Charlie’s first escape from the Shop. Then there’s nothing until the early 30s when the showdown at the farm happens. Then the twin bombs, without much between or after them.


In fact, 90% of the book is ranged between 50 and -50, which is pretty calm overall.


Based on where most of the distribution falls, these are the peaks. Note there is movement along the line but it remains within a definite range, briefly going down in a very minor way in 31 and 37 – the former where we meet psychotic antagonist John Rainbird, and the latter where Charlie blows the shit out of the Shop men who’ve come to the farm to abduct her.

In fact, except for 53 and 66, it keeps an even keel. 53 and 66, incidentally, hit those peaks because they are a lot longer – all the big action happens in them. 53 is where Charlie and her dad finally get picked up by the Shop. 66 is where everyone confronts each other and the Shop gets ripped apart and Charlie’s dad finally finishes the job of dying.

This whole pattern is interesting because of a Goodreads review I came across. GR user Councillor panned it with a one-star rating and his thoughts on it can be summed up like this:

“Firestarter was one of the most boring, long-stretched, boring, uninspiring, badly-written and – did I mention this already? – boring novels I have read for a long time.”

At one point he bemoans the idea that there is nothing to sink yourself into for 350 pages and guess what? The sentiment analysis bears this out. There are a number of stretches of this book where nothing happens, emotionally speaking. The line graph above shows what looks more like a slow slog than anything else.

Look, I liked Firestarter, but there’s definitely a whole lot of nothing going on for long parts of it. When it hits, it hits hard, but a lot of it is Charlie being a scared kid and Andy trying to soothe her while worrying about blowing out his brain, intentionally or accidentally.


The smooth line graph shows this quite well, I feel. From 0 until about 50 or so there’s nothing going on in terms of an up or down movement in sentiment. That’s 62.5% of the book. Councillor is really on to something here: unlike a lot of other King books, this one looks stretched out, staid, and not very interesting if examining it solely on a statistical basis. Some objective empirical evidence to support a feeling about a book is always a result of interest.
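For a feel of what smoothing does to a spiky series, here is a Python sketch using a plain centered moving average. That is a stand-in technique chosen for simplicity, not necessarily the smoother behind the graph above, and the scores are invented.

```python
def moving_average(scores, window=5):
    """Centered moving average; the window shrinks at the edges of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        # Take up to `half` neighbours on each side of position i.
        chunk = scores[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical scores: mostly flat with two sharp drops.
example = [0, 0, -30, 0, 0, 0, -90, 0]
smoothed = moving_average(example, window=3)
print(smoothed)
```

The two drops get spread across their neighbours, and the flat stretch between them stays flat, which is the "nothing going on" shape described above.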

To cap it off, here are the word contribution scores:




Text Mining: The Dead Zone


You want to talk about an out-there outlier for what we’ve seen of Stephen King’s bibliography so far, let’s talk about The Dead Zone.

A quick run-down: John Smith suffers a head injury as a kid but comes out mostly ok. Greg Stillson is a crazy but wildly charismatic traveling salesman. Johnny becomes a teacher, falls in love, and then is driven into a coma by a car accident. When he emerges he has wild psychic powers where he can touch people and know both their secrets and their future. He endures some tabloid celebrity, solves murders, tries to keep teaching and being normal, saves some kids from dying, and then discovers that Stillson, now running for office, is going to win and eventually become President briefly before destroying the world in a nuclear holocaust. Johnny becomes a would-be assassin, dying but also revealing Stillson to be a huge coward and an electoral loser after he grabs a kid as a human shield. It’s a timely examination of the American hunger for an end to the seemingly endless corrupt two-party circus and a bit of a satire of the then-blossoming American Tabloid market.


Text Mining: The Long Walk


Now that we’ve established that there is a link between key scenes in the plot progress of a Stephen King novel and mapped sentiment peaks coded from the text, we can spend significantly less time on analyzing each peak to show this. This will allow us to go through books with a little less ponderous text.

The Long Walk (1979)

The Long Walk is another short Bachman novel about sexually frustrated young men. This time it’s about the contestants of a gruelling, cruel national sport instituted after America’s loss in the Second World War and the institution of military rule by “The Squads.” The backdrop is only briefly described, but it is evocative whenever it is mentioned. At any rate, the protagonist is one of 100 contestants who start the Long Walk. They have to keep walking at a certain speed or they are shot by soldiers who are driving around beside them. They get three warnings to get their speed back up, otherwise the guns ring out and down goes another contestant. It’s a pretty horrifying idea when it comes right down to it, if only for how weirdly plausible it is given the modern love of both spectacle and fascism. It’s also pretty psychologically taxing, especially once the weakest contestants die off and it becomes a game to walk your opponents into the ground.


Text Mining: Salem’s Lot



SALEM’S LOT (1975)

Alright, now that we’ve established there’s some preliminary evidence of a link between emotional sentiment peaks and the plot progress of a Stephen King novel, let’s keep going so we can start to see if there are patterns, and also to generate a corpus of King material that we can use for topic modeling and other fun supervised/unsupervised machine learning stuff.

So, let’s go to the Lot as it slowly turns into a vampire colony.


Literary Fun With Text Mining


My wife is doing her PhD in political science on the topic of political interest groups and how they use social media to disseminate information and reach new audiences, and how they utilize this new(ish, wow, we’re old) medium to affect voting behaviour. Part of this has meant learning how to mine Twitter data and analyze it through the R programming language; in order to provide technical support and to have someone to troubleshoot coding issues, I’ve also been learning to use R to mine and analyze texts. What I’ve been concentrating on, in order to learn the language and the processes, is using it to mine and visualize data gathered from fictional texts, specifically the bibliography of Stephen King. What I want to do is to analyze plot trajectories drawn from sentiment data – quantitative measures of emotional sentiment words based on established dictionaries used for that sort of thing. Research questions on this would include things like: is there a pattern that King has for his plots, based on emotional language cues? Is this pattern, if any, different from other well-known horror writers? Furthermore, are there established “archetypal” emotional plot patterns for horror books, and do these patterns differ when you switch genres – say, to fantasy, military science fiction, paranormal romance, etc. etc. down the fracture lines of human experience?
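To make the idea of a plot trajectory concrete, here is a bare-bones sketch in Python (the posts themselves use R). It assumes the text is already split into sections, and it uses two tiny made-up word lists in place of an established sentiment dictionary. Each section’s score is positive-word hits minus negative-word hits, and the list of scores in order is the trajectory.

```python
import re

# Toy word lists standing in for a real sentiment dictionary (assumption).
POSITIVE = {"love", "happy", "safe", "good"}
NEGATIVE = {"dead", "fear", "scream", "rabid"}

def net_sentiment(section):
    """Positive-word count minus negative-word count for one section of text."""
    tokens = re.findall(r"[a-z']+", section.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg

def trajectory_of(sections):
    """One net score per section, in reading order."""
    return [net_sentiment(s) for s in sections]

# Invented sentences for illustration; not quotations from any novel.
sections = [
    "A good dog, happy and safe in the yard.",
    "The rabid dog; fear, then a scream.",
    "Good news came too late. He was dead.",
]
trajectory = trajectory_of(sections)
print(trajectory)
```

Plot those numbers against section index and you have the kind of chart the research questions above are asking about.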
