When I began my work with the Viral Texts Project, I was tasked with coming up with a computational way to assign genres to our texts. It seems foolish to admit now, a year into the project, that back then I imagined a kind of “one-click” solution. I believed my own elevator pitch explanation of the work, that I could show a few examples of each genre to the computer, and then send it off on its way to find me the rest. It has not been that easy.
In a forthcoming paper that I’ll be presenting at the Keystone DH conference at the University of Pittsburgh later this month, I explain my process, and I plan to post that paper here after the conference, so I’ll forgo the details of its inner workings. Suffice it to say that the real innovation (with due credit to Ben Schmidt) is that rather than use words as the classifiers, I’m using topics derived from topic modelling the corpus.
Here I want to focus on the assumptions that went into my early conception of the project as a self-contained, auto-magical genre classifier. Reflecting on this recently, it occurred to me that I was operating on assumptions common to those who are new to DH. That is, I was trying to build a tool.
But the cracks in this idea began to show early. One of the first indications that a ready-made classification tool would never materialize came while reading Ted Underwood’s report “Understanding Genre in a Collection of a Million Volumes.” He writes that though he’s made his code public, “it definitely is not a tool that could simply be pointed at collections from other libraries, languages, or periods to map them. Our methods can be adapted to other problems. But the process of mapping genres involves too many domain-specific aspects to be packaged as a ‘tool.’” This was a bummer because, had he created such a tool, I most definitely would’ve simply pointed it at my data.
Here, however, Underwood was saying that not only did he not create a tool, such a tool could probably not be created. Starting there, and throughout my own work as my messy, definitely-not-a-tool process began to materialize, I’ve been trying to articulate why this exploratory (and frustrating) method ultimately seemed more meaningful than creating a tool or pointing a pre-made tool at my data. But it wasn’t until I finally got around to reading Kieran Healy’s excellent (and brilliantly titled) essay “Fuck Nuance” that I started to formulate an idea as to what seemed problematic about ready-made tools.
If you haven’t read Healy’s essay, you should, but by way of summary, the object of Healy’s ire, if somehow it’s not completely obvious, is nuance. Specifically, he takes issue with the way the idea of nuance has been forcefully inserted into sociological theory. He provides a helpful visualization that shows, unmistakably, the rise of nuance in sociological academic journals from the 1980s onward–it’s a steep incline. His thesis is that “demanding more nuance typically obstructs the development of theory that is intellectually interesting, empirically generative, or practically successful.” Any academic from any number of disciplines should recognize, if not the problem, then at least the symptoms of it. We’ve all been to conferences where at least one questioner wishes to push back against the presenter by asking if she’s ever considered this or that theory or author–usually the object of study of the questioner. I had never heard the word “problematize” until I started my PhD program, and now I hear it far more often than I care to.
So what does this have to do with tools in DH? My sense is that a ready-made tool is like an overly-nuanced theory, imbued with what Healy calls “Actually-Existing Nuance,” which he defines as “the act of making—or the call to make—some bit of theory ‘richer’ or ‘more sophisticated’ by adding complexity to it, usually by way of some additional dimension, level, or aspect, but in the absence of any strong means of disciplining or specifying the relationship between the new elements and the existing ones.”
This is kind of what I imagined my tool might be, a fully realized genre classifier that takes into account all the complexities, levels, and aspects of whatever genre meant in nineteenth-century newspapers. I fell into the trap that many software developers fall into, trying to build something that does everything (open iTunes recently?). But the reality is, a comprehensive tool never materialized, and will never materialize. The concept of genre, it turns out, is mutable over time, space, audience, writer. What actually ended up working is letting the texts define themselves and creating a model tailored to the data. Even the idea to use topics rather than tokens as classifiers is an effort to reduce dimensionality in the model, to do without nuance. It doesn’t work all the time, and requires a lot of paging back and forth across R script files in RStudio. It doesn’t definitively answer every question about genre that I had when approaching the texts.
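To make the topics-rather-than-tokens idea concrete, here is a toy sketch in Python. It is not the project's R code, and everything in it is an assumption made up for illustration: the word-to-topic table stands in for the output of a real topic model (which assigns words to topics probabilistically, not one-to-one), the genre labels and example texts are invented, and the nearest-centroid classifier is just the simplest thing that shows the shape of the approach. The point is only that each text becomes a short vector of topic proportions instead of a long vector of word counts.

```python
from collections import Counter
import math

# Hypothetical stand-in for a trained topic model: each word is mapped to a
# single topic. Real topic models are probabilistic; this is a simplification.
WORD_TO_TOPIC = {
    "senate": 0, "bill": 0, "vote": 0, "congress": 0,
    "heart": 1, "love": 1, "moon": 1, "sigh": 1,
    "flour": 2, "butter": 2, "boil": 2, "stir": 2,
}
NUM_TOPICS = 3


def topic_vector(text):
    """Represent a text as topic proportions: one number per topic, not per word."""
    counts = Counter(
        WORD_TO_TOPIC[w] for w in text.lower().split() if w in WORD_TO_TOPIC
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * NUM_TOPICS
    return [counts[t] / total for t in range(NUM_TOPICS)]


def centroid(vectors):
    """Average a list of equal-length vectors, dimension by dimension."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train(labeled_texts):
    """Compute one average topic vector per genre from (text, genre) pairs."""
    by_genre = {}
    for text, genre in labeled_texts:
        by_genre.setdefault(genre, []).append(topic_vector(text))
    return {genre: centroid(vecs) for genre, vecs in by_genre.items()}


def classify(text, centroids):
    """Assign the genre whose centroid is closest in topic space."""
    vec = topic_vector(text)
    return min(centroids, key=lambda g: math.dist(vec, centroids[g]))


# Invented examples; the genre names are illustrative, not the project's categories.
EXAMPLES = [
    ("the senate will vote on the bill", "news"),
    ("congress delayed the vote", "news"),
    ("my heart will sigh beneath the moon", "poetry"),
    ("love and the moon", "poetry"),
    ("stir the butter into the flour", "recipe"),
    ("boil then stir", "recipe"),
]

model = train(EXAMPLES)
print(classify("a bill before congress", model))
```

Here twelve vocabulary words collapse into three dimensions. On a real corpus the reduction is from tens of thousands of distinct words to perhaps a few dozen topics, which is exactly where the simplification, and the loss of nuance, comes from.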
In short, lots of things are left out in order to make my model work well for other things. At present, it’s really good at seeing top-level genres, but not so great at digging down into them where the differences between, say, a vignette and an advice piece are trickier to discern. But, as Underwood writes reflecting on the call for nuance in DH, “It’s okay to simplify the world in order to investigate a specific question.” He continues, “Something will always be left out, but I don’t think distant reading needs to be pushed toward even more complexity and completism.” Franco Moretti had something to say about this as well in his book Graphs, Maps, Trees: “Problems without a solution are exactly what we need in a field like ours, where we are used to asking only those questions for which we already have an answer.”
It’s okay if a theory or, in this case, a model, raises problems without a solution, or ends up asking more questions than it answers; we aren’t in the business of looking for definitive answers. That’s not how the humanities work. Julia Flanders reminds us to value “Methods and tools that combine what has been gained in power and scale with a real measure of scholarly effort and engagement.” She continues, “the intellectual outcomes will not be judged by their power or speed, but by the same criteria used in humanities scholarship all along: does it make us think? does it make us keep thinking?”
Early in my process I was driven to create a tool with power and speed; eventually, I remembered to stop and think. From my classification efforts, a theory of genre in periodicals of the era is developing. Like the classification experiments that birth it, it won’t be complete, or accountable to all the potential complexities that such a theory might be imagined to contain. It won’t be terribly nuanced. But, with any luck, it will be interesting and generative. It will offer some answers and raise additional questions. It will be complicated, but not over-complicated.
Though, honestly, at this point in my work, I just hope it won’t be a tool.