Jun 10

Humanities Publishing & Data Curation: Eternal Life & Eternal Damnation

Here are the slides and notes for a presentation by me and Patrick Alexander, Director, Penn State Press.  This is to be delivered at Valued Resources: Roles and Responsibilities of Digital Curators and Publishers, the 4th Bloomsbury Conference on E-Publishing and E-Publications, sponsored by the Department of Information Studies, University College, London.



Subject to change.  YMMV, and provided here under a Creative Commons license:  Attribution-Noncommercial-No Derivative Works 3.0 Unported.

May 10


I hate the word “repository” because it obscures the variety of problems we are attempting to address through the development of repositories, and in turn constrains our thinking about what may be possible through the services they can enable. Modifiers such as “institutional,” “central,” “digital,” “open,” “collections” repository (or some torturous combination of these) do not help because each of these variations implies a singular technological solution to a set of complex changes in the way research is conducted and information is communicated. The term “repository” carries with it many connotations, some of them rather unfortunate. In general it describes a place where things lie, not where things are happening. According to the Oxford English Dictionary, a repository could be “A vessel, receptacle, chamber, etc., in which things are or may be placed, deposited, or stored” (1.a). Definition 5–“A person to whom some matter is entrusted or confided”–is a less common usage, but one that certainly resonates with the institutional mission and responsibilities that libraries hold for their collections. Yet it is also hard to overlook definition 2.b: “A place in which a dead body is deposited; a vault or sepulchre.” [1]

The early energy surrounding institutional repositories (IR) centered on a hope that promoting open access could serve as a countermeasure to commercial publishing power and its ability to distort the market for knowledge. “Taking control” of our institution’s research by providing the ability to distribute this information to the world in an open access mode seemed to be an inevitable outcome of the Internet. Here is a brief history of institutional repository hype. In July 2002, The Chronicle of Higher Education reported, “‘Superarchives’ Could Hold All Scholarly Output: Online collections by institutions may challenge the role of journal publishers.” [2] Also in 2002, a SPARC position paper declared:

“Institutional repositories–digital collections capturing and preserving the intellectual output of a single or multi-university community….[p]rovide a critical component in reforming the system of scholarly communication–a component that expands access to research, reasserts control over scholarship by the academy, increases competition and reduces the monopoly power of journals, and brings economic relief and heightened relevance to the institutions and libraries that support them.” [3]

But in 2004 The Chronicle provided an update: “Papers Wanted: Online archives run by universities struggle to attract material.” [4] IRs soon became the butt of jokes, even inside the community of practitioners. In March 2006, Dorothea Salo, an institutional repository manager, rechristened herself in her blog. “I have a new title. Innkeeper of the Roach Motel,” she wrote, describing her repository as a site where data goes in but doesn’t come out. [5] By November 2008, attendees at the SPARC Repositories Conference worried openly about how faculty could be persuaded to use the institutional repositories on their campuses and how these services were going to survive the worst economic crisis in decades if they didn’t.

Many of my publishing colleagues have warned me that if institutional repositories are successful, they will go out of business and eventually the entire scholarly communication system will start to break down. I can assure my friends that their jobs are quite safe. While IRs have generally had limited success, many publishers have adapted their policies to allow authors to distribute pre- or post-print versions of articles in open access forms. Those changes are at least partly related to funder and public pressure and the availability of repository outlets. Some institutions have begun to have luck negotiating with publishers for the rights to deposit their faculty’s articles in those same repositories. The emphasis on opening access has been driven heavily by institutional (library) hopes, not the needs of our users, whose work is changing, and who require new services to keep pace in their fields. Archiving single articles didn’t make much sense to them in that context. Continuing to focus on IR “deposit” by faculty and students–which sounds like a one-way proposition for the information–will not carry us forward. I am also not very hopeful about local, campus-level “mandates” for open access, like those coming out of Harvard, MIT, and even the University of Kansas. It is hard work to establish a campus-wide policy that defaults all researcher publishing to “open access,” and it’s easy to fail. Such mandates are great PR, but are they really enforceable at a local level? Are they really worth the time it takes to evangelize, combat falsehoods, smooth feathers and win converts? Will they really change the way that scholarship is communicated? In the end they are the right thing to do, but they don’t really challenge anyone’s scholarly norms–in fact they go out of their way not to do so in order to win the political battle.

Repository tools and many related programs have been developed with a potential scope of use broader than that implied by the institutional repository hype, and may yet serve, as Clifford Lynch and Joan Lippincott wrote, as “general-purpose infrastructure within the context of changing scholarly practice.” [6]    Deployment has varied.  Some libraries have focused first on “the intellectual output of the institution,” while others have focused on particular disciplines or user groups, while still others have attempted to better manage and provide access to  digitized versions of the physical collection of the library.  Libraries are also using these services to manage “born-digital” resources acquired by the library from a variety of sources, including vendors and publishers.   None of these activities are mutually exclusive, and it is likely that libraries will end up working with all of these materials simultaneously.

So what is it that we think we are talking about when we talk about repositories in research libraries today? Are repositories things? If so, they are more like conglomerate rock than uniform applications and programs. Are they places, like the open stacks or the closed archives? If so, they are Victorian follies–an aggregation of features, not all of them fully functional, offering none of the transparency of Philip Johnson’s Glass House. In the widest possible sense, when we are talking about repositories, we are talking about a set of organized methods for content management, not about specific applications or even specific access points online. Managing and providing access to diverse digital content requires many different processes, methodologies, policies and technologies, just like a physical library collection. Collectively, we are today determining how to manage digital data as smoothly and with the same degree of certainty as we do physical collections. [7] Repository-based content management can and must serve many functions at once, and successful implementations will recognize this and move beyond our early narrow focus. So where do we begin? One potential answer is provided by Catherine Mitchell, who at the 2008 SPARC conference gave a presentation titled “Let’s stop talking about repositories,” arguing instead for talking about services. [8] That is a small, but critically important rhetorical shift.

After all the hype, today it is most critical to identify the content-driven services that can be offered through “repositories,” and which of these our libraries need to offer to our clients, however we define them. I don’t believe that all libraries should offer such services. We are well past the days when all collections needed to reside physically on each campus, and we are approaching times when replication of similar technology services on each campus may prove to be economically impossible. If content management and delivery services have a limited audience on a given campus, it may be better to partner with others to offer or to rent the needed technology. That is heresy to many, because it contradicts our philosophy of retaining control over “our” materials. But scale matters, and if we cannot achieve it on our own we will risk poorly managing services that have limited use.

No library should implement a digital repository program without examining the role it will play in its broader strategy for collection development, stewardship, and providing access to its primary constituencies.   That strategy should be based on a clear understanding of the community’s needs, and the requirements for long-term stewardship of the data collected.  Most importantly, it should include a critical assessment of the library’s ability to fully meet those needs, including funding, the skills of its staff, and the benefit of the service relative to the cost of operating it. [9]    We cannot do everything, especially now, and we should be willing to walk away from that which doesn’t work for us.  As an administrator, I appreciate that this is much more easily said than done, and I have witnessed and been complicit in many situations where it has been necessary and expedient for political and “reputational” reasons to continue walking into the big muddy.  We learn from failures, but institutions have a terrible time admitting them.  Here’s to hoping that one effect of the Great Recession will be a greater willingness to walk away.

We tend to build silos for our collections and services, whether because of organizational politics, convenience, or feasibility, or just because we are predisposed to think about fitting things into buckets. Some libraries that are offering significant services for original publications such as journals, for articles such as pre- or post-prints, and for large collections of reformatted or born-digital materials, operate some or all of these services through different software and different operational divisions of that library. Heterogeneous content and heterogeneous communities require heterogeneous services, but a coherent organizational strategy and economies of scale should underlie these.

When we talk about repositories, or better, the services we offer through them, we are discussing the sociological side of technology and its adoption. Repository programs are still exotic, or even scary, to too many of our librarian colleagues, and to make things worse, most librarians were never trained to make the sale for experimental services or projects. But those programs must be integrated into the rest of the library’s services. Public services librarians meet students every day in the classroom, in the library, or online, and, in spite of their slight reluctance to pay us a visit in the library, faculty still call upon us. All of us have a responsibility to query our teaching and research colleagues to divine the needs that they didn’t know they had, and try to match those to the services we can provide. Asking questions that you don’t have answers to is the best way to start learning. That, in turn, requires more active communication across the divisions of our libraries to ensure that the programs we offer are integrated into instruction, reference and collection development.

Many researchers, perhaps scientists especially, cannot imagine why or how the library could do anything but subscribe to journals, even as they struggle to document and organize their work.   We have huge obstacles to overcome, but the library remains a trusted brand, and our partners are out there and talking.   Johanna Drucker wrote recently in the Chronicle of Higher Education:

“The design of new [online] environments for performing scholarly work cannot be left to the technical staff and to library professionals. The library is a crucial partner in planning and envisioning the future of preserving, using, even creating scholarly resources. So are the technology professionals. But in an analogy with building construction, they are the architects and the contractors. The creation of archives, analytic tools, and statistical analyses of aggregate data in the humanities (and in some other scholarly fields) requires the combined expertise of technical, professional, and scholarly personnel.” [10]

In other words, we have to engage and guide researchers, but we must also let them lead us, possibly where we might not have expected, or maybe even wouldn’t want them to go.  We can’t assume we know best, or the library will end up running a repository, i.e., “a place in which a dead body is deposited; a vault or sepulchre.”

[This essay began as a longer piece titled “What We Talk About When We Talk About Repositories,” which appeared in Reference and User Services Quarterly, September 2009, 49.1: 18-23.  It is addressed primarily to the librarians among us.]


[1] OED Online, s.v. “Repository,” www.oed.com.

[2] Jeffrey R. Young, “‘Superarchives’ Could Hold All Scholarly Output: Online Collections by Institutions May Challenge the Role of Journal Publishers,” Chronicle of Higher Education, July 5, 2002.

[3] Raym Crow, “The Case for Institutional Repositories: A SPARC Position Paper,”  (Washington, DC: The Scholarly Publishing & Academic Resources Coalition, 2002).

[4] Andrea L. Foster, “Papers Wanted: Online Archives Run by Universities Struggle to Attract Material,” Chronicle of Higher Education, June 25, 2004.

[5] Dorothea Salo, “Unappetizing Metaphors.”

[6] Clifford Lynch and Joan Lippincott, “Institutional Repository Deployment in the United States as of Early 2005,” D-Lib Magazine 11, no. 9 (2005).

[7] Throughout this essay, I use the term “data” broadly to refer to just about anything that is in digital form and of enduring interest to scholars or librarians.

[8] Catherine Mitchell, “Let’s Stop Talking About Repositories: A Study in Perceived Use-Value, Communication and Publishing Services,” in SPARC Digital Repositories Meeting 2008 (Baltimore, MD: 2008).

[9] Dorothea Salo has quite effectively written about the failure of institutional repository programs, attributing much of it to a failure of vision and leadership that results in a poor alignment of resources with the program goals. See Dorothea Salo, “Innkeeper at the Roach Motel,” Library Trends 57, no. 2 (2008).

[10] Johanna Drucker, “Blind Spots:  Humanists Must Plan Their Digital Future,” The Chronicle of Higher Education, April 3, 2009.

[NOTE:  This post was edited on May 14, 2014 to update the link to the original essay.]