Reliable and Confidential Archiving of U.S. Supreme Court Web Cites

Published by Nicholas Taylor on 17 September 2014

Link rot in legal materials has been studied for some time (PDF), but it only recently surfaced in the mainstream press, after two separate law journal articles reported a high incidence of broken links in U.S. Supreme Court opinions. It's a problem that the Court has made some effort to mitigate, albeit in a disjointed way. Staff working for the Reporter of Decisions pre-check the citations in all-but-published opinions and save cited web content to PDF for inclusion in the files, an approach consistent with 2009 guidance (PDF) issued by the U.S. Federal Courts. The Ninth Circuit Library uses a similar technique but goes one better by actually posting the PDFs online.

The current setup isn't ideal, though. A cited link in a PDF opinion provides a false sense of confidence that the resource it once represented still exists at the same location and, more critically, that it is unchanged. What might it take to put in place a solution for more trustworthy archiving?

Let's unpack the problem a bit more.

PDFs of web cites (adopting the Ninth Circuit Library's handy neologism) are created late in the process, so there's no guarantee that the represented resource is the same as what the opinion drafter intended, potentially many months prior. This was cheekily pointed out by the new owner of the host http://ssnat.com/ after Justice Alito cited it in his concurring opinion for Brown v. Entertainment Merchants Association. Though presumably the Reporter's Office stores a PDF that more accurately reflects that particular web cite, the Court can make no general assurances that the stale web cites within the opinions themselves are still accurate or even functional.

Archiving links at the same time they're cited would, ideally, allow the web addresses of archived versions to be placed alongside the original web cites in the final opinions, but it presents challenges. The archiving process would need to be simple, so as not to burden opinion drafters. While creating a PDF from a web page is trivial, creating a workflow to aggregate and manage the many resulting PDFs would be hard. Protecting the confidentiality of the Court's research imposes additional requirements. The archiving requests of the different chambers would need to be firewalled from each other and the rest of the Court. If WARC rather than PDF were the format of choice, automated archiving tools like Heritrix or wget would need to be carefully disguised or proxied through a third-party service provider, so as not to leak hints about which way the Court may be leaning or along what lines of argument.
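To make the WARC option a little more concrete, here is a minimal sketch of how a capture request might be assembled for wget, with each chamber's output kept in its own directory and traffic routed through a proxy. The function name, directory layout, proxy address, and user-agent string are all assumptions for illustration, not a description of any system the Court actually runs. The wget options used (`--warc-file`, `--page-requisites`, `--delete-after`, `--no-directories`, `--user-agent`) and the `http_proxy`/`https_proxy` environment variables are standard wget features.

```python
import hashlib
import os
from pathlib import Path


def build_capture_command(url, chamber, out_root, proxy=None,
                          user_agent="Mozilla/5.0"):
    """Return (argv, env) for a wget run that writes a WARC of `url`.

    Hypothetical layout: one subdirectory per chamber under `out_root`,
    so that captures from different chambers never share a location.
    """
    chamber_dir = Path(out_root) / chamber
    # Name the WARC after a hash of the URL so the filename itself
    # doesn't reveal what was cited.
    prefix = chamber_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]

    argv = [
        "wget",
        "--warc-file=" + str(prefix),   # wget appends .warc.gz
        "--page-requisites",            # also fetch images, CSS, scripts
        "--delete-after",               # keep only the WARC, not loose files
        "--no-directories",
        "--user-agent=" + user_agent,   # generic UA rather than wget's default
        url,
    ]

    env = dict(os.environ)
    if proxy:
        # wget honors these variables; the proxy hides the requester's
        # own IP address from the cited site.
        env["http_proxy"] = proxy
        env["https_proxy"] = proxy
    return argv, env
```

To run it, one would create the chamber directory and pass the result to `subprocess.run(argv, env=env, check=True)`. Directory separation by itself isn't a firewall, of course; real isolation would also need file permissions or separate hosts per chamber.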

Strong objections would need to be overcome to employ a third-party service provider for a purpose so close to the Court's research. On the other hand, recent transparency-focused applications of government agency IP address ranges (e.g., @congressedits) might make outsourcing more appealing. Reed Archives would be a logical candidate, as they are a subsidiary of LexisNexis, a company that the Court already trusts. Perma.cc would be another candidate. Perma.cc's multi-institutional storage redundancy is likely to be seen as more of a liability than a feature in this context, though, and provisions would need to be established to ensure that copyright claims didn't trump the Court's prevailing interest in providing persistent access to the bases for its jurisprudence.

With its browser integration, click-to-capture ease-of-use, and focus on citation metadata management, Zotero would both readily handle the data capture requirements and integrate with the research workflow. Data captured to individual Zotero libraries could be aggregated by means of WebDAV syncing to an internal server. Unfortunately, WebDAV syncing isn't supported for Zotero Group libraries, which would likely be essential for the Reporter's staff to corroborate synced files with their originating web cites and might otherwise be used to good effect for flexible, discrete, and discreet collaborations within and/or beyond a given justice's chambers. Even if this were a feature, all groups must be registered and managed through the Center for History and New Media, posing additional overhead and possible confidentiality issues.

If a solution for near-contemporaneous web archiving could be configured, then the resulting archived web cites could be made accessible via permalinks pointing either to PDFs or, in the case of WARCs, to Wayback access points hosted by the Court, and embargoed from public access until the opinions were published. Foreknowledge of the "permanent" location of the archived web resources would allow the Reporter's Office to amend the opinion web cites with the permalinks prior to publication. To be sure, many more solutions would be available if archiving took place after the opinions were published, but these web cites couldn't be treated as authoritative. With consideration of some of the challenges outlined above, I hope the Court can progress toward ensuring that its opinion web cites are as reliable as anything else referenced in its opinions, without compromising the confidentiality of the research process.
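The permalink-plus-embargo idea can be sketched in a few lines. This is only an illustration under stated assumptions: the base address `archive.example.gov`, the URL pattern, and both function names are hypothetical. The point is that the permalink is computed at capture time from information already in hand, so it can be written into the draft opinion well before the archived copy becomes publicly reachable.

```python
import hashlib
from datetime import date

# Hypothetical base address for a Court-hosted access point.
BASE = "https://archive.example.gov/cites"


def permalink_for(url, captured_on):
    """Derive a stable permalink from the cited URL and capture date.

    The same inputs always yield the same link, so the address is
    known as soon as the capture is made.
    """
    key = f"{captured_on.isoformat()} {url}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:12]
    return f"{BASE}/{captured_on:%Y%m%d}/{digest}"


def is_public(release_date, today):
    """Embargo check: serve the capture only once the opinion is out.

    `release_date` is None while the opinion is still unpublished.
    """
    return release_date is not None and today >= release_date
```

An access point would call `is_public` before serving any capture, returning an error for anything still under embargo.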

Crossposted to Legal Information Preservation Alliance Blog