A new product, Tesserae (described in HASTAC), “aims to provide a flexible and robust web interface for exploring intertextual parallels.” Texts (such as books, plays, or poem cycles) become grist for analysis: what is studied is their meanings and implications, not the character sequence or word count.
It starts with classical authors (Plautus, Ovid, Catullus, Vergil, Horace), but it is expanding to English prose.
This is another good example of what digital humanities (DH) are, in that understanding texts for their language and thought expression (as opposed to phenomena described in words) is a core concern of humanistic scholarship.
What is noteworthy, especially from the point of view of finding crossover points or ways in which the humanities can open new windows of analysis of science, is the concern over copyright. As soon as text becomes data, it brings along issues of copyright. Of course experimental data can be copyrighted as well, but that is either liable to be waived, or the level of concern over originality and remuneration is likely to be lower than in artistic or literary communication. As DH grows, scientists, clinicians, and social scientists can learn to address issues they consider “supporting” or outside their disciplines, and can award such problems the attention they deserve, since their colleagues in the humanities are supplying the expertise.
Jer Thorp, Data Artist in Residence at the New York Times, develops data visualizations answering humanistic questions (modeling sharing, questions during conversations, looking for narrative structures, laying out names in a 9/11 memorial according to relationships among the people). By creating “human contexts” for primitive data points (latitude and longitude of landing in New York for the first time, where one met one’s girlfriend, etc.), he attempts to bring more participants into “dialogs” about the data points (or chains of events or consequences), which, by widening the scope of additional viewpoints, can enhance creativity or at least address needs or mitigate hazards (by giving data stories, or creating empathy). In a TED Talk, he invokes the role of artists and poets to work at the convergence of science, art, and design, add meanings and promote a deeper relation between humans and data. OpenPaths, a site for uploading and sharing (thus “owning”) one’s own location data, is an example.
Medical informatics has been around for decades, and with the rise of data availability and interest, it is natural that there would be a “medical flavor” to data files and stewardship. There doesn’t seem to be a society dedicated to it yet, but there is an annual conference, the Meaningful Use of Complex Medical Data Symposium, now in its second year. Programs include not only clinical models for decision making (a long-standing instance of medical informatics, including performance measurement by comparison to protocols, now informed by data as well as expert opinion), but also mechanisms for collaboration and crowdsourcing.
The sciences have led the data revolution because of their very nature, and the humanities’ “data” started out as mostly collections of digital creative works; but there are legitimate “pure data” endeavors that lie (mostly) outside science. The Council on Library and Information Resources has just released a report, “One Culture. Computationally Intensive Research in the Humanities and Social Sciences: A Report on the Experiences of First Respondents to the Digging Into Data Challenge,” based on interviews with recipients of grants through the Digging into Data program, led by the NEH, which partnered with JISC in the UK, SSHRC in Canada, and NSF. The table of contents, listing the cases studied, is informative:
- Using Zotero and TAPOR on the Old Bailey Proceedings: Data Mining with Criminal Intent (DMCI)
- Digging into the Enlightenment: Mapping the Republic of Letters
- Towards Dynamic Variorum Editions (DVE)
- Mining a Year of Speech
- Harvesting Speech Datasets for Linguistic Research on the Web
- Structural Analysis of Large Amounts of Music Information (SALAMI)
- Digging into Image Data to Answer Authorship Related Questions (DID-ARQ)
- Railroads and the Making of Modern America
Directors of the highest-level US agencies (OSTP, NSF, NIH, DOE, DOD, DARPA, and USGS) met on 29 March 2012 for a panel discussion at AAAS, “Challenges and Opportunities in Big Data,” moderated by Steve Lohr, a reporter from the New York Times. They declared their support for (or at least activity in) “big” (i.e., supported by large or wealthy institutions) data. Reactions came from academia and industry. What was significant is that so many institutions are sitting up and taking notice that “data” is not an ancillary activity distributed among pockets throughout the world. It was an interdisciplinary meeting where the participants truly had something in common.
Topics mentioned included education for data skills, business demand for people with such skills, programs in subject domains addressing data (e.g., IDASH, a training and infrastructure program in the biomedical domain; NASA and its international collaborations; the convergence of physics/astronomy and biomedicine in discovering they are using similar techniques), using data to enhance innovation and competitiveness, the decentralization of ownership of data, privacy and patient-donated data, and economic discrimination in access to data (interesting, in that Open Data combats precisely that). One observation was the need to train young scientists in statistical thinking, beyond facility in their own domains: not only data mining but taking a broader, higher perspective.
One attendee from RPI remarked on the need to instill “social side” behaviors of collaboration, marketing, foreign languages, and non-technical tasks. A scientist at Google was quoted as saying the hardest positions to fill are those for statisticians.
Another attendee, from the National Council of Teachers of Mathematics, made the interesting observation that the education field generates not so much data as information, and asked how educators could join the data community.
An archive of this webcast will be available on nsf.gov within two days of this event.