Text Mining: The Shining


I completely skipped The Shining somehow, so we’ll circle back and do that one now.

The Shining (1977)

Stephen King’s third novel finds him cycling through doing his own take on all the classic horror bits: the avenging revenant of Carrie, updating Bram Stoker’s Dracula to the modern (in 1976) age in Salem’s Lot, and now the Haunted House – in this case, a whole haunted hotel. There’s an element of Shirley Jackson’s The Haunting Of Hill House in Salem’s Lot as well; the house that the villain Barlow moves into in the Lot is a long-time haunted house inhabited by cursed individuals. The Overlook Hotel has been the destination of rich, shady people since its inception, and by the time full-time alcoholic/on-his-last-chance writer Jack Torrance comes around to be its winter caretaker, it’s charged with their energies: the awful, unspeakable emotions that were left behind and whose ghosts now bestow a strong, malevolent force of will upon the hotel.

Making matters worse, Jack’s sweet son Danny has psychic powers called “shining” or “the shine.” These powers lend him a sensitivity to the problems with both the hotel and his father; he’s taught something of them by Magic-Negro-In-Chief Dick Hallorann. After the winter storms come to the mountains of Colorado, Jack, Wendy, and Danny Torrance are locked away in the hotel, dealing with boredom, dread, and madness. As you might imagine (or as you probably saw in Kubrick’s adaptation), murder ensues, although in the book it doesn’t actually ensue. In comparison to King’s other novels there’s very little outright death – funny, given it’s a haunted house novel full of ghosts. The only person who dies is Jack himself, though; Wendy, Danny, and (despite what you saw on film) Dick all survive to relax by a pool in the novel’s last chapter. The dead that haunt the Overlook are the manifestations of dead memories; the people they depict didn’t necessarily die at the hotel, save of course for the bathtub suicide. In this sense, Kubrick’s film misses the point entirely.

The difference between the book and the movie goes like this:

The Shining, the Stephen King novel, is a book about a man struggling to overcome his addictions and his own innate nature in order to be the best husband and father he can be for his family. It’s a book about how the past is a millstone hung around our necks by our parents, and about how that millstone will try to drag us down whenever we let our guard down. Jack Torrance’s father was a sodden, abusive drunk and Jack Torrance is afraid that he’s really no different. He knows the hotel gig in Colorado is his last chance; if he fucks this up, his wife and kid are leaving him and he’s probably going to kill himself. He struggles so hard against it but eventually it drags him under; he manages to defeat that base nature in the end and saves his family, although he destroys both himself and the hotel. The ghosts in the hotel are a part of this metaphor – they are the negative, awful emotions of the past lingering to trip Jack up and make him into what he most fears.

The Shining, the Stanley Kubrick film, is a movie about a haunted hotel.


This is actually one of those instances where the initial soundwave graph can be a little misleading. Looking at this, one might think that the second and third quarters are where the peaks are, but they’re really just showing the overall volume of those chapters, in terms of emotional sentiment. Other graphs show some different takes, which reveal something that brings us back to Salem’s Lot.


Yes, that’s right ladies and gentlemen, just like Salem’s Lot, The Shining has a statistically significant negative linear relationship of sentiment over time. The coefficient here is -0.5648 – that is, for every chapter that goes on we get an average drop of about 0.56 in emotional sentiment scores (P=0.0179). Something interesting I found, in terms of its similarity to Salem’s Lot, lies in a question asked on Goodreads. GR user Beesarahlee asked:

This is my favorite Stephen King book,I love the how it slowly stresses you out! Anyone on this site know of any other books that are very similar to this?! I love anything to do with witches and vampires! Thanks for the help!

Well Beesarahlee, you are exactly correct. It slowly gets more negative over time in terms of its sentiment scores, and so it does in fact slowly stress you out. Excellent observation, and one that we can back up empirically. How much fun is that?

Anyway, The Shining does the same thing. It starts off near neutral and carefully structures the positive and the negative so that it creeps toward the bottom at an even pace. Rather than striking big emotional chords at strategic times, it slowly terrifies you, like a boa constrictor, or that proverbial frog in a pot of water that’s really just a metaphor for human extinction.
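For anyone who wants to see what that coefficient means mechanically, here is a minimal sketch of the slope calculation. It is written in Python rather than the R I actually used, the chapter scores are invented toy numbers (not the book's data), and `ols_slope` is a hypothetical helper name. It only computes the least-squares slope and intercept; the p-value needs a t-test on top of this, which a stats package will give you.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Toy chapter-level sentiment scores, invented for illustration only.
chapters = [1, 2, 3, 4, 5, 6]
scores = [4, 1, -3, -2, -8, -10]

slope, intercept = ols_slope(chapters, scores)
print(f"average change in sentiment per chapter: {slope:.3f}")
```

A negative slope is the "slowly stresses you out" shape: each extra chapter is, on average, that many points lower than the one before it.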


The smooth-line graph shows much the same, with the interesting caveat that there is a brief reprieve somewhere in the mid-30s that doesn’t last long before the descent picks up speed again.
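If you want a feel for what smoothing does to a jagged chapter-by-chapter series, a centered moving average is the simplest stand-in. To be clear, this is not necessarily the smoother behind the graph above (plotting libraries often use something like loess), and both the `moving_average` name and the toy numbers are mine, purely for illustration.

```python
def moving_average(values, window=5):
    """Centered moving average; the window shrinks near the two ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        # Clamp the window so it never runs off either end of the list.
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

# Toy scores, invented for illustration: a slide, a brief reprieve, then a drop.
toy_scores = [2, -4, -9, -15, -6, -3, -18, -30, -41]
print(moving_average(toy_scores, window=3))
```

A wider window irons out more of the chapter-to-chapter noise, at the cost of flattening short-lived features like that brief reprieve.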


The distribution histogram, though, shows that a good third of the book is actually positive sentiment, which is strange to see given the basic stats for it:

Min: -94

Max: 47

Median: -14

Mean: -19.28

That’s quite a low mean sentiment score, given our other examples. The Stand and The Long Walk are the only ones with lower mean sentiment scores, including two books I have the stats and graphs for but haven’t done write-ups on yet. Yet a third of the book is above the zero line.
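The apparent contradiction (a third of the chapters positive, yet a mean well below zero) comes down to the negative chapters being much further from zero than the positive ones. Here is a rough sketch with invented toy scores, not the book's numbers; `summarize` is a hypothetical helper.

```python
from statistics import mean, median

def summarize(scores):
    """Basic stats plus the share of chapters scoring above zero."""
    positive = sum(1 for s in scores if s > 0)
    return {
        "min": min(scores),
        "max": max(scores),
        "median": median(scores),
        "mean": mean(scores),
        "share_positive": positive / len(scores),
    }

# Toy chapter scores, invented for illustration: three mildly positive
# chapters and six negative ones with a long tail.
toy = [10, 6, 3, -5, -8, -12, -20, -30, -52]
summary = summarize(toy)
for name, value in summary.items():
    print(name, value)
```

In the toy data a third of the chapters sit above zero, but the mean still lands below the median because a handful of very negative chapters drag it down. The book's own mean (-19.28) sitting below its median (-14) fits the same pattern.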


Here are the positive sentiment peaks for the book, as well as a look at the heartbeat/line graph for the overall piece. Chapters 9-12 (with 11 to break up the flow) look like a plateau before everything starts to go to hell; if I had to guess without examining the text, I would think it’s the part where they first move into the hotel and Jack/Wendy/Danny get lulled into thinking that everything’s going to be okay, that they’ll pull through the winter, Jack will finish his play, and life will get better from there.

Examining the text, it turns out I’m partially correct, just moved slightly “to the left”. 9 is where the family meets with Ullman, technically Jack’s boss and the manager of the Overlook Hotel. 10 introduces Hallorann, 11 is actually the introduction to the concept of “shining” (and boy are there some disturbing things embedded in there, mirrored in the sentiment) and 12 is the “grand tour” of the Overlook. Later, on the other side of the crash, at 44, Jack dances and drinks and has a grand old time at the Hotel’s Party For The Dead – the glitter and glamour of the glammer outweighs the sheer amount of mentions of “blood” in that chapter, just to show you how shiny and glorious it seems to Jack.


More negative peaks, of course, but a few things of interest. 16 and 46 are the big negative spikes, and 50-57 is an entire valley of negative sentiment.

16 is where Danny starts falling into weird trances because of the hotel – dreaming about a crashing madman coming through the hallways, getting stung by creepy ghost-wasps, and first croaking “REDRUM”, which everyone knows and loves from the film adaptation. It also is the first real kickstart of Jack’s drinking-without-drinking downfall. 46, on the other end of a lot of negative emotion, is where Wendy and Danny play cat-and-mouse with Jack in the hotel amidst a screaming blizzard, and manage to lock him in a room. They make good guideposts for the progression of the novel: 16 is where the hotel actively starts trying to do harm to the family, and 46 is where the hotel has to ramp up its efforts to kill Wendy and Danny. The sentiment scores between 16 and 46 are low but higher than those between 50 and 57, where the hotel attempts its endgame and it’s only through Jack’s honest love of his family that he saves their lives, if not his own. It’s interesting, then, to see that the sentiment scores here match the severity of the antagonist’s efforts to do harm to the protagonists.

I guess that while both Salem’s Lot and The Shining have that negative linear relationship between sentiment and time, they have it for different reasons. In Salem’s Lot it’s because, as Beesarahlee mentions in her Goodreads question, it is structured to slowly stress you out and get more negative over time. In The Shining it’s because the book is structured in a way that brings to mind shifting gears: it starts at one level, a big spike happens, and then it goes on for a time at a lower sentiment level, until it bottoms out right near the end. Not the end, of course, because the final chapter features Wendy, Danny, and Dick relaxing and planning out how to get their lives back in order and it shoots us back up to positive – a happy ending, which is not always in the cards for later King books.

To finish off, the word contributions:





Text Mining: The Long Walk


Now that we’ve established that there is a link between key scenes in the plot progress of a Stephen King novel and mapped sentiment peaks coded from the text, we can spend significantly less time on analyzing each peak to show this. This will allow us to go through books with a little less ponderous text.

The Long Walk (1979)

The Long Walk is another short Bachman novel about sexually frustrated young men. This time it’s about the contestants of a gruelling, cruel national sport instituted after America’s loss in the Second World War and the institution of military rule by “The Squads.” The backdrop is only briefly described, but it is evocative whenever it is mentioned. At any rate, the protagonist is one of 100 contestants who start the Long Walk. They have to keep walking at a certain speed or they are shot by soldiers who are driving around beside them. They get three warnings to get their speed back up, otherwise the guns ring out and down goes another contestant. It’s a pretty horrifying idea when it comes right down to it, if only for how weirdly plausible it is given the modern love of both spectacle and fascism. It’s also pretty psychologically taxing, especially once the weakest contestants die off and it becomes a game to walk your opponents into the ground.


Text Mining: The Stand


The Stand (1978)

So…it may behoove you to know that The Stand, King’s gigantic, bloated, sprawling epic, was picked by American adults in 2008 as their fifth-favourite book of all time. The Bible was #1 – this is America that was being polled, after all – but The Stand kept company with other books you may be familiar with: Gone With The Wind, The Lord Of The Rings, and the Harry Potter series. Generational touchstones, in other words. As a further fact, Generation X picked it as their #1 favourite (again, behind the Bible). That’s some big company, so an examination of this one should yield some interesting results.


Text Mining: Rage


Rage (1977)

Today we turn our attention to the first Richard Bachman book, Rage, a book that lives up to its name in as pure a fashion as you could imagine. If you haven’t found a copy of this yet, you might want to get on that: they aren’t making any more of them, at the behest of the author. As the events depicted in the book came into depressing vogue in the 21st Century, King feared that the portrayal of Charlie Decker would give aid and comfort to others in similarly desperate emotional situations.

It’s about a school shooter, you see.


Text Mining: Salem’s Lot



SALEM’S LOT (1975)

Alright, now that we’ve established there’s some preliminary evidence of a link between emotional sentiment peaks and the plot progress of a Stephen King novel, let’s keep going so we can start to see if there are patterns, and also to generate a corpus of King material that we can use for topic modeling and other fun supervised/unsupervised machine learning stuff.

So, let’s go to the Lot as it slowly turns into a vampire colony.


Text Mining: Intro + Carrie


As mentioned in my previous post I’m examining Stephen King texts through the magic of text mining, using a number of tools in the R language, but especially through Julia Silge’s tidytext package. The book Text Mining With R: A Tidy Approach by Julia Silge and David Robinson was a godsend in explaining the process of using tidy data formats to store and analyze text-as-data. I will roughly summarize the basics to give you an idea as to what’s involved but there is a great deal more that can be done than I am covering here.
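To give a flavour of the basic idea before getting into the R details, here is a rough, language-agnostic sketch of dictionary-based scoring written in Python. It is not the tidytext workflow itself: the tiny lexicon and its values are made up for illustration (a real analysis would use an established sentiment dictionary), and `tokenize`, `chapter_score`, and `word_contributions` are hypothetical helper names.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> sentiment value. These values are
# invented for illustration and are not taken from any real dictionary.
TOY_LEXICON = {"love": 3, "happy": 3, "good": 2,
               "dread": -2, "blood": -3, "murder": -4}

def tokenize(text):
    """Lowercase the text and split it into one-word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def chapter_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon values of every scored word in a chapter."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))

def word_contributions(text, lexicon=TOY_LEXICON):
    """Per-word contribution: occurrences times the word's lexicon value."""
    counts = Counter(t for t in tokenize(text) if t in lexicon)
    return {word: n * lexicon[word] for word, n in counts.items()}

sample = "Blood, blood and dread; but love!"
print(chapter_score(sample))
print(word_contributions(sample))
```

Score every chapter this way and you get one number per chapter, which is the series that all of the graphs and summary stats are built from.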


Literary Fun With Text Mining


My wife is doing her PhD in political science on the topic of political interest groups and how they use social media to disseminate information and reach new audiences, and how they utilize this new(ish, wow, we’re old) medium to affect voting behaviour. Part of this has meant learning how to mine Twitter data and analyze it through the R programming language; in order to provide technical support and to have someone to troubleshoot coding issues, I’ve also been learning to use R to mine and analyze texts. What I’ve been concentrating on, in order to learn the language and the processes, is using it to mine and visualize data gathered from fictional texts, specifically the bibliography of Stephen King. What I want to do is to analyze plot trajectories drawn from sentiment data – quantitative measures of emotional sentiment words based on established dictionaries used for that sort of thing. Research questions on this would include things like: is there a pattern that King has for his plots, based on emotional language cues? Is this pattern, if any, different from other well-known horror writers? Furthermore, are there established “archetypal” emotional plot patterns for horror books, and do these patterns differ when you switch genres – say, to fantasy, military science fiction, paranormal romance, etc. etc. down the fracture lines of human experience?
