Literary Fun With Text Mining

My wife is doing her PhD in political science on the topic of political interest groups and how they use social media to disseminate information and reach new audiences, and how they utilize this new(ish, wow, we’re old) medium to affect voting behaviour. Part of this has meant learning how to mine Twitter data and analyze it through the R programming language; in order to provide technical support and give her someone to troubleshoot coding issues with, I’ve also been learning to use R to mine and analyze texts. What I’ve been concentrating on, in order to learn the language and the processes, is using it to mine and visualize data gathered from fictional texts, specifically the bibliography of Stephen King. What I want to do is to analyze plot trajectories drawn from sentiment data – quantitative measures of emotionally charged words, scored against the established dictionaries used for that sort of thing. Research questions on this would include things like: is there a pattern that King has for his plots, based on emotional language cues? Is this pattern, if any, different from other well-known horror writers? Furthermore, are there established “archetypal” emotional plot patterns for horror books, and do these patterns differ when you switch genres – say, to fantasy, military science fiction, paranormal romance, etc. etc. down the fracture lines of human experience?

So to start I’ll be going book by book through the King bibliography and presenting what are basically preliminary findings based on the sentiment dictionaries included in the quanteda package for R: Afinn, Bing et al, and the NRC emotional sentiment dictionary. Ultimately none of these will be ideal; a custom dictionary for emotional sentiment specifically in literature would be necessary to really capture a more accurate picture, but this is where linguistics comes in and I don’t have much formal training in that area. My on-paper expertise is in English literature and political science, and while Stuart Soroka’s Lexicoder program is what I’ll likely use to build and code the custom dictionary, the Lexicoder topic dictionaries that exist to date are meant to examine political speech rather than literary texts. Building my own will take a lot of time and research.
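For anyone wondering what dictionary-based sentiment scoring by chapter looks like in practice, here is a minimal sketch. It is written in Python rather than R purely for illustration, and it is not the workflow used for these posts. The tiny lexicon is invented for the example (the real AFINN, Bing, and NRC lists are far larger), and the function names are hypothetical.

```python
import re

# Toy AFINN-style lexicon: word -> integer score. These few entries are
# invented for illustration and are NOT taken from the real AFINN list.
TOY_LEXICON = {"love": 3, "happy": 3, "good": 2, "blood": -2, "scream": -2, "dead": -3}

def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def chapter_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of every token in one chapter."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))

def plot_trajectory(chapters, lexicon=TOY_LEXICON):
    """Return one net sentiment score per chapter, in reading order."""
    return [chapter_score(chapter, lexicon) for chapter in chapters]

chapters = [
    "She was happy and in love.",
    "Then came the blood, and a scream.",
    "Everyone was dead.",
]
print(plot_trajectory(chapters))
```

Plotting those per-chapter scores against chapter number gives the kind of plot trajectory described above. Words missing from the lexicon simply score zero, which is one reason a general-purpose dictionary can miss a lot in fiction.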

This should be fairly quick work up until about 1989 or so – The Dark Half, at any rate. I happen to think it’s the absolute nadir of his oeuvre, a self-indulgent author-insert story used to deal with the professional regret of outing his pseudonym and failing his dream of being Donald Westlake/Richard Stark. From Carrie to Tommyknockers, though, I’m familiar enough with the books that I can look at specific chapters pointed out on the graphs and quickly grasp what they mean in the story as a whole. After that…I’m going to end up having to read a number of King books I haven’t actually read yet, like Dolores Claiborne, The Girl Who Loved Tom Gordon, and any of the newer crime trilogy books he’s written. I mean, I guess it’s as good an excuse as any, right? I plan on running the data through the process and then reading, to see what kind of predictive power is embedded into the visualized data.

There is minimal pre-processing being done to the texts. They are epubs that are being converted to .txt files through calibre. They are then trimmed to get rid of all the extraneous matter – the list of other books, the reviews of other books, the acknowledgements, the endless introductions, and in some cases the other stories tacked on after the main story is finished. An example of this is the inclusion of “One For The Road” and “Jerusalem’s Lot” at the end of “Salem’s Lot”. Both stories are also included in short fiction collections, so cutting them here isn’t much of a concern. I also go through the texts to make the chapter breaks reasonably consistent; this is necessary since chapter breaks are how I am tracking plot progress as the x variable. Carrie, for example, has no chapter breaks; breaks were inserted at the beginning of each epistolary passage, since those marked natural breaks in the story. Salem’s Lot meanwhile has sixteen chapters, but each of those chapters has multiple sub-chapters within it; these were all used as breaks, after converting each one to a “Chapter (n)” format. Rage and The Stand, the two other books I have processed to date, have luckily been blessed with a more normal chapter break format. One other that I know off the top of my head will require greater pre-processing is The Running Man, since it has that weird (n) And Counting chapter heading.
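Once every heading has been normalized to the “Chapter (n)” format, splitting a book into chapters is mostly a matter of one regular expression. The following is a rough Python sketch of that step, not the actual R code behind these posts; the heading pattern and the `split_chapters` name are assumptions made for the example.

```python
import re

# Matches a normalized heading that sits alone on its own line, e.g. "Chapter 12".
CHAPTER_HEADING = re.compile(r"^Chapter[ \t]+(\d+)[ \t]*$", re.MULTILINE)

def split_chapters(text):
    """Return a list of (chapter_number, chapter_text) tuples.

    Anything before the first heading (leftover front matter) is dropped.
    """
    parts = CHAPTER_HEADING.split(text)
    # re.split with one capture group yields:
    # [front_matter, number, body, number, body, ...]
    numbers = parts[1::2]
    bodies = parts[2::2]
    return [(int(n), body.strip()) for n, body in zip(numbers, bodies)]

sample = "Also by the author\nChapter 1\nIt began.\nChapter 2\nIt ended.\n"
for number, body in split_chapters(sample):
    print(number, body)
```

The chapter number then serves as the x variable, and whatever gets measured in each chapter body becomes the y.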

Some explanation of the text mining process and a number of glowing recommendations of the work of Julia Silge will follow, along with the data visualization of Carrie.

Interstitial Burn-Boy Blues

Stuart watched the kid shake and mutter to himself in the seat across the aisle. His skin looked waxy in the dingy interior bus lights, and Stuart was sure that if he reached across and caressed the kid’s forehead with the back of his hand that skin would be near to scalding. He ran his tongue along the back of his teeth and watched the kid carefully. No one else in the general vicinity seemed to be concerned. Stuart noticed an old man dozing in the seat behind the kid, and a young couple murmuring to each other beneath a blanket in the seat ahead of him.

“Scourge of the panhandle,” the kid muttered, and Stuart looked away. He stared out of the window into the emptiness of the night. There was absolutely nothing to see; there was no moon in the sky and nothing to illuminate beyond the arid brush and gravel that lay on either side of the road to Flagstaff. Blackness rushed by like a hurricane wind and only the occasional light shining wanly from far off allowed for the recognition of motion.

When the bus passed the exit to Twin Arrows the kid moved violently in his seat, thrashing like a person trying to get comfortable when assailed by pains in every joint. By the time the exit sign for Winona passed the bus window, wreathed in shadows, the kid began moaning in a low animal tone. Stuart watched the others to see if they would notice and take action but the old man continued to snore softly and the couple in front of him continued to murmur and giggle lightly. The man in that seat had begun to breathe in quick, short bursts, and Stuart didn’t have to think very hard to figure out what was going on. Grimacing, he leaned slowly across the aisle and gingerly put the back of his hand against the kid’s forehead.

As he suspected, the kid’s flesh was burning to the touch and uncomfortably dry. The kid’s moaning grew louder, and Stuart drew his hand back with a hiss. He retreated back into his seat and ran his shaking fingers through his thinning hair.

“Faster,” the boy in the seat in front of him whispered loudly, and Stuart leapt out of his seat and strode up the narrow aisle. He approached the driver, a heavyset man with a fuzzed-out crewcut and a hypertensive tinge to his complexion.

“Do you have any aspirin?” Stuart asked. The driver kept his eyes on the road.

“Sorry, I drive the bus. You want a pharmacy, we should be stopping and getting off for a minute in Flagstaff.”

“There’s a kid back there who’s burning up,” Stuart confessed. “I think he might need a doctor.”

“We’ll be stopping in Flagstaff before long. You can take him to a doctor there.”

“You’ll wait?”

“Lord, no. The stopover in Flagstaff is only for an hour. After that we’re heading out again. I have a schedule to meet.”

“Is there another bus behind this one?”

The bus driver said nothing for a moment. The cracked and weathered visage of the old Route 66 slid by under the hard glare of the headlights.

“Word on the radio is that no, there won’t be, at least for a while. The governor is extending martial law out to the Okie border. He wants to stem the tide of ’em coming over and making trouble on their way to California.”

The driver stole a glance at Stuart, and Stuart shifted his weight from one foot to the other.

“I’m from New York,” he said, the words falling flat as they left his tongue.

“Don’t really care,” the driver replied. “Just letting you know that it’ll be a while before the next bus comes along. Long enough that you’ll have to either settle or move on some other way.”

Stuart returned to his seat without saying anything further to the driver. The kid was breathing and seemed to finally be deep asleep, but it was hard to tell. Stuart quickly checked the kid’s temperature and found that, while it hadn’t abated, it hadn’t gotten any worse in the meantime. He shrank back into his seat and pulled out his phone. There was still some room in his data allowance, so he searched for pharmacies in Flagstaff and found one that was near to the bus station, nestled in a Wal-Mart. He slid the phone back into his pocket and waited, watching the kid out of the corner of his eye.

When old Route 66 separated from the I-40, the bus followed Route 66 into Flagstaff. Like the highway before, there was little to see out of Stuart’s window once in town. What buildings there were crouched close to the ground, well back from the road, creeping like rats in the distance. Eventually a shopping mall ran past the window, but in the dead of night it looked patched and forlorn. When the bus eventually slowed and came to a stop, Stuart was confused as to where they were.

After the driver called out their stopover in Flagstaff, Stuart rushed out the door and into the cold Arizona night. He was shocked to see his breath in the glare of the bus lights and rubbed at his shirtsleeves. His phone reported that the Wal-Mart was on the other side of the road, set on the far side of a sprawling black parking lot. There was no traffic although the parking lot was populated with cars. Inside the store, a few midnight shoppers ambled down the aisles, their ruddy, wrinkled faces kept firmly towards the floor.

There was no pharmacist on duty, so Stuart picked up aspirin so he could at least bring the kid’s fever down. It was more expensive than he’d initially thought, and he mentioned it to the cashier, who shrugged and said that the cost had gone up around the time the army had been called to the border. Stuart weighed his options and put the charge on his sole remaining credit card.

Across the street, the bus had been driven beyond the gate that allowed entrance to the station. A pair of guards loitered on either side of the gate and came to attention as Stuart approached. They demanded his ticket and, when presented with it, continued to eye Stuart suspiciously even as the gate opened behind them. The space between his shoulder blades crawled as he walked up the laneway toward the bus. The driver was scrolling through something on his phone and the other passengers were either sleeping or engaged in the same activity.

The kid was breathing evenly through his mouth. His face was turned up toward the overhead light, and his eyes were closed. Stuart retrieved a plastic bottle of water from his carry-on bag and moved across the aisle. After a bit of shaking, he managed to wake the kid up enough to acknowledge his presence.

“Sal?” the kid asked. “Sal, you’ve lost weight.”

The kid’s eyes were unfocused, like he’d taken too many hits to the head. Stuart popped a couple of aspirin into his palm and unscrewed the lid from the bottle of water. He motioned to the kid to take them.

“It’s poison, Sal,” the kid raved. He looked away and shook his head. “I’m the last one; I won’t take it. I’ve seen them all take it already.”

“No,” Stuart said firmly, “it’s medicine. You’re burning up, you need to take it.”

The kid looked at him, his expression uncomprehending. “I’m on the bus,” he said, blinking rapidly. “Who are you?”

“Introductions later,” Stuart said. “Just take this and relax. Don’t worry, it’s just aspirin.”

The kid stared at Stuart and then took the tablets from his hand.

[Interstitial Burn-Boy Blues is available on Amazon in ebook and paperback, as well as from the Across The Margin site directly]

50 Days Of Soundcloud #12

“Formula Modernia”

BUY SELL BUY SLEEP

Feel free to check out some books: today’s featured titles include Disappearance (only 99 cents), which you’ll enjoy if you like the action bits in books and apocalypse fiction; What You See Is What You Get, which manages to combine the specter of ag-gag laws with criminal trials that look more like reality TV than anything else; and 9th Street Blues, about a kid delivering cobbled-together drugs in the near-future ruins of Woodward, OK (which is also the jumping-off point for my new serial novel, coming soon from ATM Publishing).

Soon To Be Featured On Dirty Little Bookers!

A rather excellent artist I know gave me some advice the last time I saw him, and it was to the effect that art announcements should only be made a week or so in advance, so people don’t have time to forget them. To that end, I’m proud to announce that the November spot on literature blog Dirty Little Bookers’ “Calling All Indies” feature was won by yours truly, and I’ll be featured over there starting some time in the next week (as they work a month behind or so). So, like voting in Chicago elections, visit early and visit often: www.dirtylittlebookers.com

Rappers Wordier Than Shakespeare

The Largest Vocabulary in Hip hop.

NYC coder Matt Daniels recently teamed up with Red Bull Music Academy to do an interesting sort of data analysis on hip hop. The benchmark of English literature for vocabulary usage is typically the Bard, although Daniels points out through this study that Herman Melville also had quite the vocabulary. What about hip hop though – specifically the rappers? They make their name entirely on their words, so it seems only natural to compare them to the aforementioned benchmarks.

Mr. Daniels took the first 35,000 words of each rapper’s lyrics and counted the unique words used. He then did the same with the first 5,000 words of each of Shakespeare’s 7 most popular plays (35,000 words in total) and with the first 35,000 words of Moby Dick, and compared the results.
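The basic measurement is easy to sketch. The Python snippet below is only an approximation of the idea, not Mr. Daniels’ actual code: the tokenizing rule (lowercase runs of letters and apostrophes) and the `unique_words` name are assumptions made for illustration.

```python
import re

def unique_words(text, limit=35000):
    """Count distinct tokens among the first `limit` tokens of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens[:limit]))

sample = "the cat sat on the mat the end"
print(unique_words(sample))           # distinct words in the whole sample
print(unique_words(sample, limit=5))  # distinct words in the first five tokens only
```

Capping every text at the same number of words is what makes the comparison fair, since a longer body of work would otherwise rack up more unique words simply by being longer.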

The result?  Wu-Tang Clan is, in fact, nothin’ to fuck with.  There are 16 rappers identified that have bigger purported vocabularies than the Bard, and 5 of them are Clan – including the Clan itself.  There are, however, only three that have bigger vocabularies than Melville:  Kool Keith (no surprise considering he throws in words like “moosebumps” whenever he can), the GZA (the Genius!), and, by an extremely wide margin, Aesop Rock, whose lyrics often come across like post-modernist literature.

At the bottom?  DMX.  Although, as one astute commenter on /r/music pointed out, Mr. Daniels obviously didn’t count the sixteen different variations on barking that DMX pulls out.

Gone From The Charts But Not From Our Hearts

…is how they usually introduce an early rock ‘n’ roll radio show but in this case the line is apt for literature.  Off The Shelf is a new site set up by publishing giant Simon & Schuster to allow business insiders (editors, agents, authors, etc.) the room to reminisce and review books that they’ve loved that are at the very least one year old (my first novel, for example, is a year old now – how time flies).

Let’s face it, even with a downturn in the industry there are a lot of books flowing through the stores on any given day. The bestseller lists and the review pages in the papers are full of books that you would love to read, but maybe you don’t have the time when they’re out or you’re already committed to another book or series of books. The ones you notice tend to slip away; you’ll remember them months or years later when you catch a reference to them, or maybe you’ll never think of them again. Off The Shelf is for those books – books that the insiders feel didn’t get their fair share of attention when they were fresh and new. It’s a neat idea and I commend S & S for setting it up.

Hello New Visitors!

I’m glad that so many people like the guide to GBV; there will be more discographies in the future. I like writing them and people seem to enjoy reading them. I already have two ready to go, so look for those in the coming days.

Also, since you’re here, consider buying a book. People tend to like those too, I find.

A Paranoid New Novella for the Price of a Cup of Coffee

WYSIWY3G

New novella for the price of a cup of coffee. If you’ve been keeping up to speed with things like the filming of factory farms in the U.S., the altered perceptions of reality in the digital age, or the rather paranoid theory that the Powers That Be can do away with you just by planting things on your computer, it’ll probably seem familiar.

Novellas are notoriously difficult to sell to traditional markets. They’re too long for the short story markets (who get leery of anything over 7,500 words) and too short for the traditional publishing houses (who prefer you start at about 60,000 words). They are, however, perfect for the Amazon market, which is primarily Kindle users who don’t have a great deal of time on their hands. So, while I’m shopping my second novel around, expect to see some more of these, since I have more than a few of them knocking about.

Get It Here:  http://www.amazon.com/dp/B00IL0HODW