Seeing the Forest for the Algorithm: a Review of Past Edit Notes and a Hard Truth


In which the author makes an embarrassing confession

One of my little secrets is that, in between projects, I’ll sometimes read a book about writing. It’s always useful to reinforce the basics, and seeing how other writers approach the blank page gives me insights into my own work.

Sometimes I get the impression that I’m supposed to be past all that, but I’m not. I’ve never really felt that I’ve mastered this craft. Some aspects of it, maybe, but I still struggle.

And since I’m brainstorming something new, I took another writer’s casual mention of his favorite book about writing, Stephen King’s On Writing, and borrowed it from the library. I had barely started when I heard a discussion of a different book on the radio. You can listen to that here, if you really feel you have to. It’s not an interview with either of the authors, and the interviewee’s Wired article is more interesting and informative, with fewer dopey questions.

The book is The Bestseller Code by Jodie Archer and Matthew Jockers. Maybe you remember when it came out last year, or maybe the title is enough to guess what the book’s about. The authors created an algorithm to analyze a variety of modern novels, then ran all sorts of books through it: bestsellers, non-bestsellers, midlist books, self-published, the whole deal. The algorithm noted the differences, then sorted out the ones that were strongly predictive of bestsellers. According to the authors, their “bestseller-ometer” was capable of predicting whether a book would be a bestseller with 80% accuracy.

It’s correlation, sure, but the authors found nearly 2800 factors that were present in books that made the NY Times list but not present in the ones that didn’t. Yes, the NYT has issues with the way it manages its list, and it’s not a true sales meritocracy, but it is a powerful cultural signifier, and Archer, a former Penguin UK editor, wanted to better understand the differences.

The Bestseller Code is an exercise in finding meaning in those differences.

I don’t know if you remember when the book came out last year, but I do. I scoffed at it. Computer analysis? Of a creative endeavor? Please.

But that interview, flawed as it was, piqued my interest, so I borrowed a copy from the library.
It turned out to be interesting stuff.

What we talk about when we talk about luck

First, I want to say that the technology Archer and Jockers deployed—sentiment analysis, topic modeling, and more—was pretty impressive. The field is more advanced than I would have guessed.

Second, it turned out that the way they applied those tools, and the conclusions they drew from them, were entirely unremarkable. (Bolded because I want folks to take note.)

It’s common for folks to talk about success in the arts as part skill, part talent and part luck. I’ve talked at length on this blog about my opinion of “talent”, and at a little less length about “luck.” The effects of luck have been proven experimentally.

My question is always: what are they calling “luck”? What confluence of choices and incidents brought about this fortunate outcome? Because, to me, “luck” is what you call a series of events you don’t understand well enough to predict or control.

But what if we had the tools to look at things more closely? What if we had a better understanding of the differences between what people want to read and what we’re offering? What if we could narrow that gap?

Data doesn’t frighten me. Nihil veritas erubescit.

Besides, I’m a published writer with starred reviews and even, if you can believe it, fans. I already have the skills I need to break through to a larger readership, don’t I?

This is where my agent comes in

As I was reading The Bestseller Code, I kept thinking, “My agent could have written this.”

Let me take an example. Using topic modeling, the algorithm breaks down what each book is “about.” A certain percentage of a book might be concerned with crime and police work, and a smaller percentage with domestic matters. The next smallest percentage might cover, say, hospitals and medical care.
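For the curious, here’s a toy sketch of what a breakdown like that looks like in practice. To be clear, this is not what Archer and Jockers did: real topic modeling discovers its topics from the text itself, while this version just counts words from fixed keyword lists. The topic names, the keyword lists, and the `topic_shares` function are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword buckets standing in for learned topics. Real topic
# modeling discovers its topics from the text instead of using fixed lists.
TOPICS = {
    "crime": {"detective", "police", "murder", "suspect"},
    "domestic": {"kitchen", "dinner", "family", "home"},
    "medical": {"hospital", "doctor", "nurse", "surgery"},
}


def topic_shares(text):
    """Return each topic's share of all topic-keyword hits, largest first."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = Counter()
    for word in words:
        for topic, keywords in TOPICS.items():
            if word in keywords:
                hits[topic] += 1
    total = sum(hits.values())
    if total == 0:
        return {}
    # most_common() orders topics from most hits to fewest.
    return {topic: count / total for topic, count in hits.most_common()}
```

Feed it a chapter and you get back something like “half crime, a third domestic, the rest medical,” which is roughly the kind of summary the book describes, just produced in a much cruder way.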

It seemed weird to me that algorithms are sophisticated enough to manage this task until I remembered Pat Rothfuss talking about programs that could handle the task five years ago.

Anyway, the books that sold well had fewer topics (around four), and those topics offered opportunities for dramatic contrast. Books that didn’t sell as well had more topics (around six, if I remember correctly). The subjects were more wide-ranging, less unified.

What’s more, one of the most important predictors of success was that a book devoted a certain amount of time to human interaction and connectedness. If one of the four topics was characters being with the people they cared about, living their lives and dealing with each other, that was a strong indicator of good sales.

Guys, my agent has been trying to teach me these lessons for years. For my whole career, I’ve been trying to establish relationships between characters the way a movie would: with a single, significant gesture or remark. She has been telling me, book after book, to give them more time on the page. To let them relate to each other. To let them bond. It turns out that human interaction in fiction is incredibly powerful, and I’ve been giving it short shrift.

She has also told me, many times, that I need to simplify. Oftentimes I have too many storylines, plot turns, or characters. Especially characters. Too many “topics.” Maybe my work would reach a larger audience if it were more unified.

Another thing the algorithm does is generate plot curves through sentiment analysis. When the language of the book is full of upbeat words, like succeed, kindness, rest, and peace, the plot trends upwards. When it’s full of words like loss, failed, grief, and pain, it trends downward.
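Here’s a minimal sketch of that idea, assuming the simplest possible approach: chop the text into segments and, for each one, subtract the count of downbeat words from the count of upbeat words. The word lists are just the eight examples above, and the `plot_curve` function is my own invention. The authors’ actual sentiment analysis is surely more sophisticated than this.

```python
import re

# Tiny illustrative word lists taken from the examples above. These are
# assumptions for the sketch, not the lexicon the book's authors used.
UPBEAT = {"succeed", "kindness", "rest", "peace"}
DOWNBEAT = {"loss", "failed", "grief", "pain"}


def plot_curve(text, segments=4):
    """Score consecutive runs of words as upbeat count minus downbeat count.

    The text is cut into at most `segments` runs of equal size (the last
    run may be shorter). A positive score means the run leans upbeat.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return []
    size = max(1, -(-len(words) // segments))  # ceiling division
    curve = []
    for start in range(0, len(words), size):
        chunk = words[start:start + size]
        up = sum(w in UPBEAT for w in chunk)
        down = sum(w in DOWNBEAT for w in chunk)
        curve.append(up - down)
    return curve
```

Plot those scores in order and you get a line that rises through the hopeful stretches and dips through the grim ones, which is the “plot curve” in its crudest form.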

What surprised me is that, when the algorithm studied bestsellers, it produced plot curves very similar to the ones writers see all the time. One is quite similar to Freytag’s Pyramid; others matched different but fairly common models.

I’ll admit that I was startled to see a computer pull the old tried-and-true plot diagrams out of bestselling books, and to see how flat the non-bestsellers looked by comparison. It made me question how well I manage the rise and fall of a plot curve and whether the language I use is appropriate for it.

There were other findings beyond those, obviously. The data was all descriptive, and it covered books that were popular but critically derided as well as books that were popular and prize-winning. Except for a few surprises, like the need for scenes of human connection and a general distaste for sex (::shakes fist at America::), it’s standard stuff. Create a character who really wants something. Have them go after it. Make the plot turns powerful. Keep things focused. Write in a naturalistic style. Hook them in the first few pages.

Honestly, my agent could have written this advice, and as I was coming to the end of the book, it occurred to me that she sort of already had.

In which I step back from my edit notes to examine my edit notes

Just last week, my agent got back to me about a book I’d sent her. The news was bad, I don’t mind admitting, and of course she had some notes to give me.

As I was thinking about how closely the advice in Archer and Jockers’ book matched what my agent told me, I got the idea to go back through all her editorial notes for all of my books and look for patterns.

I’ve been happy to take her input (I signed with her, in part, because I knew she’d help make my work better), but I’ve been looking at her notes case by case. Book by book. It never occurred to me to look for trends.

To be fair, there was usually a year in between each new book, and sometimes more, and I’m a forgetful, disorganized person. It’s easy for me to carefully study a bunch of trees without once considering the forest.

So I opened all my old emails from my agent to review the notes she’d given me. My first thought was that past-me really needed to be more practical with his subject lines. My second thought was that I’d always thought of myself as a slam-bang thriller writer, a guy who could spin out an exciting story. It occurred to me that I wasn’t being exciting enough, because that self-conception wasn’t matched by outside reality. The work I was doing was earning fans and selling books (by my estimation, The Way into Chaos, which was self-published, has sold a little over 13k copies, which would be a fine, fine number for most NY genre publishers) but I wasn’t breaking through to the larger world.

What if I had placed myself in the “Good But Not Good Enough” category, and was missing out because I wasn’t really addressing persistent flaws in my creative choices?

So what were those persistent flaws? Obviously, each book had its unique problems, but there were several that popped up over and over.

Here they are:

  • Book started too slowly
  • Too many characters/plot complications/names
  • Characters not sympathetic enough/don’t have time for personal bonding

The hook must come sooner. More unity. More time for the characters’ relationships.

Honestly, I thought I’d already learned all the skills The Bestseller Code suggested I would need. I thought I was already working at that level. It’s pretty clear that I’m not.

The nice thing is that I’ll have a chance to be mindful of these persistent issues as I start a new book. Will it help? Shit, I hope so. I have ambitions, you guys, and I’m not meeting them.

My agent will still have notes for me, but maybe she won’t have to tell me the same old things she’s had to tell me every other time.

There’s more to say on the subject of computer analysis and the services various tech companies offer publishers, but that’ll have to be next time.
