
Stanford Literary Lab – Reflection 2

While the evolving New Yorker project (February 13 meeting) seemed like the perfect opportunity to begin the internship and follow the entire development of a DH project in literature, sorting out the different interests and subgenre plans is proving to be a complex and time-consuming endeavour. Thus, at this point, my opportunity as an intern to work on a task in this project has been delayed.

However, the leading group of professors involved in other projects suggested that I sit in on the Microgenre project, which has been ongoing for some time. The project explores the “discursive inter-disciplinarity of novels, using machine learning to identify points at which authors incorporate the language and style of other contemporary disciplines into their narratives.” The team is looking for moments in a wide range of novels across genres and time periods to determine the way authors signal the shift between narrative and history, philosophy, or natural science. Some of the questions they are hoping to answer: “Do these signaling practices change with time or with discipline? Akin to what Bakhtin terms “heteroglossia,” these stylistic shifts indicate not only the historically contingent ways that novels are assembled from heterogeneous discourses, but they also shed light on the practices of disciplinary knowledge itself.” Since the number of disciplines at universities exploded after 1870, the project is examining novels and journals between 1880 and 1930.

The first meeting I attended, on February 14, looked at an extensive spreadsheet containing the disciplinary breakdown of journals found in JSTOR’s database. The goal was to narrow down the number of active journals in each discipline during the target years. The discussion and some quick on-the-spot Internet research yielded results for science, literary, religious, philosophy, and psychology journals. Besides JSTOR’s metadata, Wikipedia’s was also considered in the search. At the end of the meeting, I was asked to research law as a discipline in the period 1880-1930. Specifically, I was to identify the premier journals in the field, among those in the JSTOR holdings, in both the U.S. and the U.K., and to give a brief, qualitative sense of the field at this time: the top schools, the difficulty of obtaining a degree, the major questions in the field. My research results are contained in the PDF file “Law as a Discipline” and were briefly discussed in the following meeting on February 28, along with the other major journals. During that meeting, duplicate journals were discounted because they span multiple genres and could confuse the data read by the DFA classifier.

The discriminant function analysis (DFA) program was created by Mark Algee-Hewitt. In it, the groups are the various disciplines (anthropology, philosophy, history, etc.), and the team is training and running the classifier on 100-sentence excerpts from the corpus of texts. As J. D. Porter explained to me, classification in literary DH is often done with words, but the team is trying to capture style and avoid “aboutness”, so for the variables they mostly used part-of-speech tags instead, plus a few other things such as sentence length, number of clauses, and numbers of named people and places. The results are therefore unusual, in that the classifier doesn’t know any semantic content. Nonetheless, it performs well above chance, and for some categories remarkably well (e.g., it correctly identified >60% of the passages in the history, novels, and psychology categories, where chance would have gotten ~12% of them right). This seemed a bit confusing to me, since I have neither seen the previous results nor heard much about this tool.
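To make the idea of "style without aboutness" more concrete for myself, here is a minimal sketch of what content-blind variables might look like. It is not the Lab's program: the real classifier relies on part-of-speech tags from a tagger, while this sketch uses only surface proxies (sentence length, and commas and semicolons as a rough stand-in for clause counts). The function names `split_sentences`, `style_features`, and `chance_accuracy` are my own hypothetical choices. The chance figure is also an inference on my part: ~12% would fit roughly eight equally sized categories (1/8 = 12.5%), but the number of categories was not stated in the meeting.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def style_features(chunk):
    """Return a few surface-level style measures that ignore what the words mean."""
    sentences = split_sentences(chunk)
    n_sentences = len(sentences)
    # Count alphabetic tokens per sentence; the words themselves are never stored.
    n_words = sum(len(re.findall(r"[A-Za-z']+", s)) for s in sentences)
    # Commas and semicolons serve as a crude proxy for clause boundaries.
    clause_marks = chunk.count(",") + chunk.count(";")
    return {
        "n_sentences": n_sentences,
        "mean_sentence_length": n_words / n_sentences if n_sentences else 0.0,
        "clause_marks_per_sentence": clause_marks / n_sentences if n_sentences else 0.0,
    }


def chance_accuracy(n_classes):
    """Expected accuracy of random guessing when all classes are equally likely."""
    return 1.0 / n_classes


features = style_features("I came. I saw, and I won.")
baseline = chance_accuracy(8)  # eight categories is an assumption, not a reported figure
```

A table of such feature rows, one per excerpt and labeled by discipline, is the kind of input a discriminant analysis would then be trained on.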

Other items discussed in the 2/28 meeting included the following:

  • How to measure fiction:
    • Sample sizes and what will they signal (big corpus vs. small corpus)
  • Article length (what should be a median length)
  • Journalism Corpus:
    • articles vs. newspapers
    • where to find newspapers
    • British Periodicals
    • anthology of yellow journalism
  • Literary Criticism
  • Book reviews
  • Reviews of Pedagogical practices
  • What journals should be included in Politics
  • Discipline of Theology/Religion
  • Do disciplines cohere?
  • Classification Model
  • Outlier Slices

The Microgenre Project meeting on March 7 was highly anticipated because Mark promised to bring and share the newly run novel chunks and their colorful bar graphs indicating the disciplinary breakdown of genres and, hopefully, pointing to shifts within the writing.

A Study in Scarlet by Arthur Conan Doyle

Features of the DH element:

  • graph of disciplines
  • 73% success rate for 100 chunks
  • Values(100)/discipline
  • The bigger the chunk the better the classifier
  • Just parts of speech! – a continuous surprise
  • Microgenres Master feature
  • smaller chunks – division points
  • Posterior Probability/Position
  • 10 most accurate in each discipline/50 sentence chunks
  • Random samples

The meeting concluded with the agreement of sampling about 200 sentence chunks and examining them on a sliding scale. In addition, the already graphed novel chunks will be reviewed by the members of the project and matched with the prediction of the graphs to look for shifts.
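As I understood the plan, the "sliding scale" means overlapping windows of sentences moved through a novel, with the classifier's label for each window compared to its neighbours. The sketch below is only my own illustration of that idea: the window size, the step, and the names `sliding_chunks` and `shift_points` are assumptions, and the labels in the example are made up rather than taken from the project's results.

```python
def sliding_chunks(sentences, size, step):
    """Yield (start_index, chunk) pairs: windows of `size` sentences, advancing by `step`.

    If the text has fewer than `size` sentences, a single shorter chunk is yielded.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    last_start = max(len(sentences) - size, 0)
    for start in range(0, last_start + 1, step):
        yield start, sentences[start:start + size]


def shift_points(labels):
    """Return the positions where the predicted label differs from the previous one."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Toy example: ten placeholder "sentences", windows of four, moving two at a time.
sentences = [f"sentence {i}" for i in range(10)]
windows = list(sliding_chunks(sentences, size=4, step=2))

# Invented labels, one per window, standing in for classifier output.
labels = ["novel", "novel", "history", "novel"]
shifts = shift_points(labels)
```

A smaller step gives a finer view of where a shift begins, at the cost of more overlapping chunks to classify and review.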

My Questions:

  • Which novels are included?
  • Could I have access to any of the sheets/graphs/novel chunks?

While it was helpful to finally see the actual graphs depicting novel chunks and the interplay of various other disciplines within the writing, at the same time the amount of data was overwhelming to take in without the actual reference. I would have liked to go over some of the graphs beforehand to study these elements. I was familiar with most of the novels mentioned in the study; however, I do not have the descriptive list of novels included. Once again, I left the meeting without any particular task to complete. Because of the formally closed nature of the projects, I do not have access to the data or findings of the project before publication. Thus, it seems futile to attend the upcoming meeting on March 15, since I am not able to study the data or provide further research or input in any part of the project. It has been fascinating to learn about the numerous ways the Literary Lab examines literature and prepares for its Computational Criticism in the field. However, as far as the DH internship is concerned, I feel that I need to search for a new assignment where I am able to participate and learn about the use of tools and strategies applied in the field of Digital Humanities.

Stanford Literary Lab – Reflection 1

The internship at Stanford University’s Literary Lab began with a round-table discussion event I attended. The event explored, as Mark Algee-Hewitt, director of the Literary Lab, phrased it, “the relationship between Literary Studies and the Digital Humanities, specifically that associated with text mining or quantitative analysis. In what ways have we been successful in integrating the two fields to produce new methodologies for studying Literary Criticism and History? Where do the fault lines between the fields still exist and what work might be necessary to synthesize the methodologies of close reading and computation? And are there fundamental incompatibilities between the humanistic study of literature and the Digital Humanities that we may not be able to solve? With four very different perspectives, our round-table participants will lay out the stakes of this compatibility and engage the audience in a larger conversation about the future(s) of the field.”

The discussion revolved around the democratic nature of the projects, the question of graphs and Ngrams analyzing language, and the practicality of these methods in literary criticism. For instance, the prospect of close reading being replaced, or in some ways supplemented, by a nonhuman analytical tool is a touchy subject for humanities scholars. While many views and more inquiries were raised, answers were scarce in light of the proposed ideas. As this field continues to evolve, the methodologies of research, the development of critical questions, the tools applied, and the emerging criticism are all subject to change and interpretation. In addition, one of the puzzling notions at the crossroads of DH and Literary Studies is the question of where it leads and what answers we can gain from “studying” literature through these tools that haven’t already been considered. An additional missing piece of the puzzle is the interest of and connection to the general public, which probes the democratic nature of Literary DH projects and how, if at all, they may reach beyond the academic realm of readers. For an intern just beginning to become familiar with the mission and the projects of the Lit. Lab, the event offered an impressively involved, intimidatingly intriguing approach to the function of DH in Literary Studies.

A few days after the event, and following an introductory meeting with the Literary Lab’s director, Algee-Hewitt, assistant director, J. D. Porter, and coordinator, Erik Fredner, I was able to attend a project meeting. While the Lab has several ongoing projects, I was lucky enough to sit in on the first meeting for the New Yorker Project. As I found out, when an opportunity presents itself in the form of a corpus becoming available, DH enthusiasts gather to brainstorm possible analytical avenues and the methodologies through which these ideas may unfold and become visible representations. Thus, the New Yorker Project set out to examine 4,695 issues of the magazine published between February 1925 and July 2017: 559,924 pages and 617,088,848 words. Some possible suggestions for the study of this corpus were:

  • how to identify fiction/poetry
  • breakdown by editors
  • predictable genre in the New Yorker
  • examining page-level geography, XML
  • geography of ads
  • cartoon captions
  • viewed Vector analysis of the word “inflation”
  • What changed? What didn’t exist?
  • portion of ads in comparison to articles/fiction over time
  • short story index
  • Is there a New Yorker genre?
  • timeliness vs. timelessness
  • signals of the sense of the century
  • comparison of cultural artifacts with literary focus
  • tracing gender pronouns
  • ethnic breakdown of names
  • readership: upper bourgeois vs. academia
  • literary titles over time
  • When did photographs come into the publication?
  • Vector model analysis of discursive span within: variation in the number of pages published – 1940-1960 top increase, largest volume 2/21/2000 (75th anniversary issue)

The overall aim and result of the meeting was to share areas of interest in the research and analysis of the New Yorker and then break into focus groups. The upcoming meeting will begin by assembling the groups and tagging the corpus for markers of short story, poetry, reviews, and text vs. non-text.

On the wall of the conference room, all present and future projects are listed, indicating the phase of each. I will try to attend the Microgenre project meeting to get an idea of the development of a given project and the different stages of this “organic” collaboration in the Lab. The Migrant Discourse Project (examining migrations in South America and mapping literature along the way) also intrigued me; however, it does not have a set date for its first meeting, so it might be some time before it even begins.

As much as the focus of the Literary Lab is on applying computational criticism to the study of literature, I find an inevitable historical component in the process.