A Brief Primer on Using Web-Archived Evidence

Published by Nicholas Taylor on 7 June 2023

One would suppose that Elon Musk had adopted Meta's (née Facebook's) famous motto, "move fast and break things," since his takeover of Twitter in October 2022. A steady stream of news articles has chronicled the numerous and not entirely successful experiments that he has undertaken with the company and its service. We need not even take journalists' word for it, or rely on the continued accessibility and discoverability of those articles; many of the changes themselves can be observed by anyone through indelible web archives, most notably the Internet Archive Wayback Machine (IAWM).

Through IAWM, we could effectively and non-exhaustively document the following changes:

The elimination of legacy blue check marks;
The deletion of the official Twitter @Policy account;
Updates to the Hateful Conduct policy;
Updates to the State-Affiliated Media policy; and
Updates to the Inactive Account policy.

It is too soon to say whether any of these changes or others may bring about legal consequences (the company is notably under an FTC consent decree for taking user e-mail addresses and phone numbers that were provided for account security purposes and using them instead for targeted advertising), but if they do, it is easy to see how the archived web content would be probative. The conspicuous fact of a readily accessible, archived documentary record appears to be little deterrent to abrupt or even post hoc changes to longstanding policies.

Twitter's vicissitudes are but one recent and highly visible example of how web archaeology (that is, the use of web archives like IAWM and other tools to discover and authoritatively characterize how particular web content has changed over time) can furnish instrumental evidence for litigation. The primacy of the web surely means that the incidence of cases that turn on what information appeared online, and when, will only increase. It thus behooves attorneys to gain some understanding of web archaeology, its methods, and its concerns. This article provides related background, reviews key techniques and tools, and examines strategies for securing the authentication and admission of IAWM evidence in particular.

Background

It is paradoxically said of the web both that it is ephemeral (constantly turning over with little assurance of the persistence of its history) and that it "never forgets" (that it affords the possibility that some content, once uploaded, will inexorably circulate in perpetuity). In between these extremes, it is probably most reasonable to say that the web's memory is selective and inconstant, and that, even without dipping into web archives, the "contemporary" web that we experience is in fact a palimpsest of many eras of web history.

This can be keenly illustrated with a couple of websites that one could go visit right now. The New York Times website home page will surely present content no more than minutes old whenever it is visited. Meanwhile, the website for the 1996 film Space Jam has been continuously available in its original form since 1997. These websites both belong to the contemporary web in that they are as easily accessible as one another — just enter either web address in a browser and submit — but that belies their distinct temporal provenance.

That is all just to say that the web has a temporal topography; it exists in time as well as space. And the temporal dimensions of web content — when it was published, updated, available, or discoverable on the web — may be material to any number of legal concerns: patent disclosures, use of marks or copyrighted materials, terms of service, advertised claims regarding products or services, defamatory statements, etc. There are hundreds of legal proceedings, at least, over the last two decades in which IAWM has been employed for these kinds of matters.

Considerations for Attorneys

Techniques for ascertaining the temporal dimensions of web content vary. IAWM and other public web archives are the most obvious tools, as they can re-present snapshots of historical web content. The contents of such archives, however, are far from comprehensive, in terms of which web resources they have preserved, with what frequency, over what time span, and with what resulting fidelity on replay. And, of course, they only contain web content that is (or was) publicly accessible. Private or authentication-protected web content (e.g., some social media, collaboration platforms, intranet websites, subscription media, etc.) must instead be the subject of electronic discovery negotiations.

What publicly accessible web archives exist are the best-effort work of cultural heritage and governmental organizations, and what they archive is shaped by their collecting policies and mandates, which may or may not conform to the content that most frequently turns out to be of consequence to a legal proceeding. Invariably occurring gaps in archival coverage must be carefully interpreted in order to draw appropriate conclusions. For example, it is generally most reasonable to assume that the intermediate (but not directly observed) state of an historical webpage is the same as the archival snapshots that bound it on either end if those snapshots are identical. However, the length of time between the two captures, the frequency with which the webpage or website is observed to change over time, and trace metadata may be mitigating considerations.
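The bounding-snapshot heuristic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Capture` record, its `timestamp` and `digest` fields, and the function names are hypothetical, loosely modeled on the kind of capture index (a 14-digit capture time plus a content hash) that web archives commonly expose. The sample data is invented for the example.

```python
from bisect import bisect_right
from datetime import datetime
from typing import NamedTuple


class Capture(NamedTuple):
    """Hypothetical capture record: one archived snapshot of a single URL."""
    timestamp: str  # assumed 14-digit capture time, YYYYMMDDhhmmss
    digest: str     # assumed content hash recorded for the capture


def parse_ts(ts):
    """Convert a 14-digit timestamp string into a datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def bounding_captures(captures, target_ts):
    """Return (before, after): the nearest capture at or before the target
    time and the nearest capture strictly after it (None where absent)."""
    ordered = sorted(captures, key=lambda c: c.timestamp)
    keys = [c.timestamp for c in ordered]
    i = bisect_right(keys, target_ts)
    before = ordered[i - 1] if i > 0 else None
    after = ordered[i] if i < len(ordered) else None
    return before, after


def assess_gap(captures, target_ts):
    """Report whether the target date is bounded by two captures, whether
    those captures have identical content, and how many days separate them."""
    before, after = bounding_captures(captures, target_ts)
    if before is None or after is None:
        return {"bounded": False, "unchanged": None, "gap_days": None}
    gap = parse_ts(after.timestamp) - parse_ts(before.timestamp)
    return {
        "bounded": True,
        "unchanged": before.digest == after.digest,
        "gap_days": gap.days,
    }


# Invented sample data: three captures of the same page.
sample = [
    Capture("20230101000000", "AAA"),
    Capture("20230301000000", "AAA"),
    Capture("20230601000000", "BBB"),
]
print(assess_gap(sample, "20230201000000"))
```

An "unchanged" result supports, but does not prove, the inference that the page was the same on the date in question; the reported gap length is exactly the kind of mitigating consideration that should temper how strongly that inference is stated.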

Moreover, attorneys should understand that webpages accessed through IAWM and most other public web archives are essentially qualified reconstructions rather than necessarily faithful historical facsimiles. The assets that make up an individual archived webpage, to say nothing of an entire archived website, may have been archived at (sometimes dramatically) staggered points in time, but IAWM-like replay engines typically co-present them in the interest of completeness. This can result in temporally incoherent composites that may not stand up to expert scrutiny.

Categorical statements like, "this is what this webpage looked like on this date, according to IAWM" invite authenticity challenges, e.g., for co-presented images that may be demonstrably shown not to have been displayed on the webpage at the time. Or claims of when a particular image (e.g., a graphic trademark or copyrighted photo) was first used on a given webpage may be discordant with the capture timestamp of the webpage itself (i.e., the image itself has a different capture date, or other trace metadata suggests that it was available online earlier or later). Attorneys should mind the inherently reconstructive nature of web archives when assessing this kind of evidence and make assertions based on their contents that are no more expansive than they need to be to support their case.
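One way to check for the staggered capture times discussed above is to compare the timestamp embedded in an archived page's replay URL with the timestamps in the replay URLs of its images and other assets. The sketch below assumes the familiar Wayback-style URL layout, in which a 14-digit capture time (optionally followed by a short modifier such as `im_`) precedes the original address. The function names, the one-day tolerance, and the example.com URLs are illustrative assumptions, not a standard.

```python
import re
from datetime import datetime, timedelta

# Assumed replay URL layout: .../web/<14-digit time>[<modifier>_]/<original URL>
WAYBACK_RE = re.compile(r"/web/(\d{14})(?:[a-z]{2}_)?/(.+)$")


def capture_time(replay_url):
    """Return (capture datetime, original URL) parsed from a replay URL."""
    match = WAYBACK_RE.search(replay_url)
    if match is None:
        raise ValueError(f"not a recognized replay URL: {replay_url}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S"), match.group(2)


def temporal_drift(page_url, asset_urls, tolerance=timedelta(days=1)):
    """For each asset, report its original URL, how far its capture time
    is from the page's capture time, and whether that exceeds tolerance."""
    page_time, _ = capture_time(page_url)
    report = []
    for url in asset_urls:
        asset_time, original = capture_time(url)
        drift = asset_time - page_time
        report.append((original, drift, abs(drift) > tolerance))
    return report


# Invented example: one asset captured seconds later, one captured years earlier.
page = "https://web.archive.org/web/20230607120000/https://example.com/"
assets = [
    "https://web.archive.org/web/20230607120005im_/https://example.com/logo.png",
    "https://web.archive.org/web/20210115080000im_/https://example.com/banner.jpg",
]
for original, drift, flagged in temporal_drift(page, assets):
    print(original, drift, "FLAGGED" if flagged else "ok")
```

A flagged asset is not necessarily wrong for the page, since an image may simply not have changed between captures, but it marks a spot where a narrower assertion, or corroboration from another source, is prudent.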

IAWM and other web archives are not the only tools for discovering the temporal dimensions of web content. Attorneys can also use other techniques, such as examining HTML source code or embedded image metadata, prospecting timestamped in-links (e.g., from social media or, even, IAWM), leveraging search engine indexes and temporal operators, and other open-source intelligence (OSINT) strategies. These approaches underscore that there are many more sources of information about the temporal dimensions of web content than what is visibly self-declared (e.g., footer text indicating the date that a webpage was updated). Some of that information may be latent in the content itself, or it may be discoverable through third-party sources. IAWM can be a complement to any of these techniques.
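As a small illustration of examining HTML source code for latent temporal information, the following Python sketch collects date-bearing `meta` tags and `time` elements from a page's markup using only the standard library. The set of metadata keys is an illustrative assumption drawn from common publishing conventions; real pages vary widely, and any value found this way is self-declared by the publisher and should be corroborated.

```python
from html.parser import HTMLParser

# Illustrative, non-exhaustive set of metadata keys that often carry dates.
DATE_META_KEYS = {
    "article:published_time",
    "article:modified_time",
    "og:updated_time",
    "date",
    "last-modified",
}


class DateHintParser(HTMLParser):
    """Collect (source, value) pairs for date hints found in HTML source."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None for bare attributes; normalize to "".
        a = {name: (value or "") for name, value in attrs}
        if tag == "meta":
            key = (a.get("property") or a.get("name") or a.get("http-equiv") or "").lower()
            if key in DATE_META_KEYS and a.get("content"):
                self.hints.append((key, a["content"]))
        elif tag == "time" and a.get("datetime"):
            self.hints.append(("time[datetime]", a["datetime"]))


def extract_date_hints(html):
    """Return the date hints found in an HTML string, in document order."""
    parser = DateHintParser()
    parser.feed(html)
    parser.close()
    return parser.hints


# Invented sample markup for demonstration.
sample_html = (
    '<html><head>'
    '<meta property="article:published_time" content="2023-06-07T09:00:00Z">'
    '<meta name="viewport" content="width=device-width">'
    '</head><body><time datetime="2023-06-08">June 8</time></body></html>'
)
print(extract_date_hints(sample_html))
```

The same function can be run against both the live page and its archived captures; disagreement between the two is itself a lead worth investigating.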

Authentication and Admissibility

The next challenge for attorneys is the authentication and admissibility of web-archived evidence in a court of law. Under Federal Rule of Evidence (FRE) 901, in order for proffered information to be considered as evidence, there must be a sufficiently persuasive showing to the judge that it is what it purports to be. Archived web content is a special case and tends to face a higher bar for authentication than other web content, for a few reasons: an archived webpage accessed through IAWM may look or behave conspicuously differently than its “live web” version would; the Internet Archive disclaims the necessary completeness or accuracy of IAWM contents; and the Internet Archive lacks the authority of a governmental organization (whose websites typically benefit from a greater presumption of authenticity).

Courts do deny admission for these reasons, but they should not. Archived webpages accessed through IAWM naturally behave differently than live webpages, on account of temporal navigation, versioning, an encapsulated web context, etc.; this can be qualified and explained as just how web archives work. The Internet Archive makes disclaimers about IAWM because its archive of the web is necessarily incomplete, and because the previously mentioned co-presentation of assets captured at different times may result in historically inaccurate composites as part of normal IAWM functioning. Lastly, while the Internet Archive is not a governmental organization, it is a non-profit digital library focused on the preservation of digital information of all types.

In all but those jurisdictions with a strong predisposition towards the authenticity of IAWM evidence, it will nonetheless be necessary to provide some foundation for its authentication and admission. Strategies for overcoming predictable concerns include:

Requesting judicial notice under FRE 201, which provides for judicial discretion in the admission of evidence whose authority or veracity is so well established as to not be subject to reasonable dispute;
Securing a standard legal affidavit from the Internet Archive attesting that proffered screenshots are authentic copies of IAWM records;
Presenting testimony of a witness with personal knowledge of the web content in question under FRE 901(b)(1);
Submitting the testimony of an expert witness under FRE 702; and
Achieving mutual stipulation by parties to the proceeding.

A comprehensive review of the treatment of IAWM evidence in the U.S. Federal Courts suggests that the Internet Archive affidavit and expert witnesses are the most reliable strategies; judicial notice is the most used strategy, though its success varies by judge and circuit; a witness with personal knowledge of the web content can work; and mutual stipulation is uncommon.

Conclusion

The enduring primacy of the web points towards the increasing relevance of web content for legal proceedings for all sorts of matters. The longevity of the web and the ways in which the temporal dimensions of web content are critical to particular claims and cases also suggest that the ability to authoritatively ascertain these details will be increasingly important. To that end, attorneys would do well to better acquaint themselves with web archaeology and what possibilities it affords for making sense of, and making use of, the temporal web.

Crossposted to Law360