Digital literary studies: does text mining turn texts into data?

A new product, Tesserae (described in HASTAC), “aims to provide a flexible and robust web interface for exploring intertextual parallels.”  Texts (such as books, plays, or poem cycles) become grist for analysis: the meanings and implications, not just the character sequences or word counts.

It starts with classical authors (Plautus, Ovid, Catullus, Vergil, Horace) but is expanding to English prose.
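
For readers wondering what “exploring intertextual parallels” looks like computationally, here is a toy sketch in Python of the general idea (not Tesserae’s actual algorithm, which involves lemmatization and frequency-weighted scoring): flag sentence pairs from two texts that share two or more uncommon words.

# Toy sketch of intertextual matching: flag sentence pairs that share
# two or more uncommon words. This only illustrates the idea.
import re
from itertools import product

STOPWORDS = {"the", "and", "of", "a", "to", "in", "i", "is", "from"}

def tokens(sentence):
    # Lowercased word set, minus very common words.
    return set(re.findall(r"[a-z]+", sentence.lower())) - STOPWORDS

def parallels(text_a, text_b, min_shared=2):
    # Yield sentence pairs from two texts sharing at least min_shared words.
    sents_a = [s for s in re.split(r"[.;?!]\s*", text_a) if s]
    sents_b = [s for s in re.split(r"[.;?!]\s*", text_b) if s]
    for sa, sb in product(sents_a, sents_b):
        shared = tokens(sa) & tokens(sb)
        if len(shared) >= min_shared:
            yield sa, sb, sorted(shared)

for sa, sb, shared in parallels(
        "Arms and the man I sing. He fled the shores of Troy.",
        "I sing of arms; the hero fled from Troy to distant shores."):
    print(shared, "|", sa, "<->", sb)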

This is another good example of what the digital humanities (DH) are: understanding texts for their language and the thought they express (as opposed to the phenomena the words describe) is a core concern of humanistic scholarship.

What is noteworthy, especially from the point of view of finding crossover points where the humanities can open new windows onto the analysis of science, is the concern over copyright.  As soon as text becomes data, it brings copyright issues along with it.  Experimental data can of course be copyrighted as well, but those rights are likely either to be waived, or the concern over originality and remuneration is lower than in artistic or literary communication.  As DH grows, scientists, clinicians, and social scientists can learn to address issues they consider “supporting” or outside their disciplines, and can give such problems the attention they deserve, since their colleagues in the humanities are supplying the expertise.

Google Analytics less useful

According to a 16 November 2012 story in econtentmag, Google began encrypting searches by signed-in users over SSL, so the keywords behind “organic search” visits can no longer be captured.

The problem may be greater for commercial entities concerned with “search engine optimization” (SEO), i.e., tuning a page’s meta tags, headers, and other ranking features for different search services to gain higher placement on results pages, or otherwise driving searchers to a “conversion” (the viewer decides to buy something on the page and thus transmutes into a customer).  As the article says, “The change to SSL has also made it impossible to deliver to targeted landing pages based on organic keyword searches. But Google still allows advertisers to see data related to paid search terms ‘to enable advertisers to measure the effectiveness of their campaigns and to improve the ads and offers they present to you.’ In other words, Google wants you to fork over some cash.”
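
As a concrete illustration of what changed (a Python sketch, not Google’s or any analytics vendor’s actual code): before the switch, a destination site could read the keyword out of the referrer URL’s query string; afterwards, the referrer arrives stripped.

# The keyword used to arrive in the referrer URL's "q" parameter;
# encrypted search withholds it. The URLs below are illustrative,
# not captured traffic.
from urllib.parse import urlparse, parse_qs

def organic_keyword(referrer):
    terms = parse_qs(urlparse(referrer).query).get("q")
    return terms[0] if terms else "(not provided)"

# Pre-SSL referrer: keyword visible to the destination site's analytics.
print(organic_keyword("http://www.google.com/search?q=open+access+journals"))
# -> open access journals

# Post-SSL referrer: query string withheld, keyword unrecoverable.
print(organic_keyword("https://www.google.com/"))
# -> (not provided)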

This is a lesson academic authors might draw from the commercial world, even though they don’t usually try to increase readership by such devious means, and their pages are less likely to have the more sophisticated features a commercial page uses to shape the “user experience.”  But it is worth noting, just to keep up with what is going on, and possibly available, in the field of persuasive Web page design.

Alternative peer review model: keep best of openness, but use moderators

David Stern, now at Illinois State University after scientific information positions at Yale and Brown, presents a modified or partial peer review model (“post e-print”) in an SLA professional development presentation, “Alternatives to journal subscriptions,” a detailed analysis of alternative journal subscription models.  After considering the finer pricing points of editing, peer review, discovery, and subscriptions (useful to libraries considering their own e-journal platform efforts), he proposes a moderation model (moderated peer review) in which junk is filtered out, submissions are placed in an open repository, and the needed level of peer review services is determined.  Such moderation would be a volunteer activity.
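
A rough Python sketch of that moderation workflow as a data model; the state names and fields here are illustrative, not taken from Stern’s presentation.

# Volunteer moderator pass: filter junk, deposit, set a review level.
from dataclasses import dataclass
from enum import Enum, auto

class ReviewLevel(Enum):
    NONE = auto()    # posted as an e-print, no further review
    LIGHT = auto()   # moderator check only
    FULL = auto()    # conventional peer review requested

@dataclass
class Submission:
    title: str
    is_junk: bool = False
    in_repository: bool = False
    review_level: ReviewLevel = ReviewLevel.NONE

def moderate(sub, wants_full_review):
    if sub.is_junk:
        return sub                 # filtered out, never deposited
    sub.in_repository = True       # placed in the open repository
    sub.review_level = ReviewLevel.FULL if wants_full_review else ReviewLevel.LIGHT
    return sub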

“What do publishers do?” video clips

The International Association of Scientific, Technical & Medical Publishers has just released the winning entries from its recent video competition.  The clips (only a few minutes long) range from amusing to staid, but delicately avoid the obvious answer, “Make money!”  Open Access publishers probably wouldn’t have produced anything with skimpier production values.  For-profit publishers do put money into glitz, but they apply more of their business acumen (also not mentioned in the videos) to directing resources to classier displays in conference exhibit areas.

“Make data more human”

Jer Thorp, Data Artist in Residence at the New York Times, develops data visualizations that answer humanistic questions (modeling sharing, tracking questions during conversations, looking for narrative structures, laying out the names on a 9/11 memorial according to relationships among the people).  By creating “human contexts” for primitive data points (the latitude and longitude of landing in New York for the first time, where one met one’s girlfriend, etc.), he tries to bring more participants into “dialogs” about the data points (or chains of events or consequences), which, by widening the range of viewpoints, can enhance creativity, or at least address needs and mitigate hazards, by giving data stories and creating empathy.  In a TED Talk, he invokes artists and poets to work at the convergence of science, art, and design, to add meaning and to promote a deeper relationship between humans and data.  OpenPaths, a site for uploading and sharing (and thus “owning”) one’s own location data, is an example.
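
A tiny Python sketch of that shift from primitive data points to human contexts; the coordinates and annotations below are invented for illustration (OpenPaths itself stores and shares real, user-owned location histories).

# Attach a personal annotation ("human context") to bare location records.
from dataclasses import dataclass

@dataclass
class LocationPoint:
    lat: float
    lon: float
    timestamp: str
    memory: str = ""   # the human context: what happened here

history = [
    LocationPoint(40.6413, -73.7781, "2009-06-14T15:02",
                  "landed in New York for the first time"),
    LocationPoint(40.7306, -73.9866, "2010-02-03T19:40", "met my girlfriend"),
    LocationPoint(40.7128, -74.0060, "2011-08-21T09:15"),  # bare point, no story yet
]

for p in history:
    print(f"{p.timestamp}  ({p.lat:.4f}, {p.lon:.4f})  {p.memory or '(no story attached)'}")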