Caveats on the Internet Archive Standard Affidavit


The Internet Archive (IA) standard affidavit has for some time functioned as a routine, reliable mechanism for the authentication and admission of Internet Archive Wayback Machine (IAWM) evidence in legal proceedings. While the prevailing acceptance of the IA standard affidavit for this purpose is probably reasonable, it is important to be mindful of IA's own stated qualifications on its attestation. The gap between authentic copies of IA's records and faithful re-presentations of historical webpages may be significant and legally material.

I'll unpack further what I mean by these caveats, but let's start by reviewing the performance of the IA standard affidavit. Surveying U.S. federal court cases, it is overwhelmingly common that judges approve of the application of the IA standard affidavit for the authentication and admission of IAWM evidence.

"Paperwork" by Chris Betcher under CC BY-SA 2.0
I've discovered only one case so far where a correctly applied IA standard affidavit failed to effect the admission of IAWM evidence. In Murj, Inc. v. Rhythm Mgmt. Grp., 5:21-cv-00072-EJD (N.D. Cal. Aug. 22, 2022), the defendant requested judicial notice of IAWM evidence on a motion to dismiss, backed by an IA standard affidavit. The judge concluded that acceding to the request would have been tantamount to adjudicating disputed facts, which was inappropriate for a motion to dismiss, and consequently denied admission of the IAWM evidence.

There were a couple of other, older cases where the IA standard affidavit was misapplied, again resulting in the proffered IAWM evidence being denied admission.

In St. Luke's Cataract Laser Institute, P.A. v. Sanderson, 8:06-cv-223-T-MSS (M.D. Fla. May 12, 2006), the party pointed to an IA standard affidavit applied in another, entirely separate case, Telewizja Polska USA, Inc. v. Echostar Satellite Corporation, 02 C 3293 (N.D. Ill. Oct. 14, 2004). The judge indicated that an IA standard affidavit pertaining to the IAWM evidence in the present case could yet be provided and would then suffice for authentication.

In Sam's Riverside, Inc. v. Intercon Solutions, Inc., 790 F. Supp. 2d 965 (S.D. Iowa 2011), the party produced the IA standard affidavit but didn't include the screenshots themselves with its submission. The judge indicated that the IA standard affidavit would have been sufficient but that the IAWM evidence could not be authenticated without the designated screenshots attached.

There are, meanwhile, numerous U.S. federal court cases in which the IA standard affidavit has effected the authentication and admission of IAWM evidence.

There are also a number of U.S. federal court cases in which the judge either implied or explicitly indicated that applying the IA standard affidavit likely would have resulted in the authentication and admission of IAWM evidence (i.e., where the evidence had been denied admission under an alternative approach).

It is harder to benchmark performance comprehensively in jurisdictions outside of the federal context. It should be noted, though, that the governing rules of evidence in diverse jurisdictions do not universally support the application of the IA standard affidavit for the purpose of authentication and admission of IAWM evidence.

Moving from the cataloging of its performance to a review of its actual contents, the IA standard affidavit includes an explanation of what, precisely, is being re-presented in IAWM and the scope of the attestation that IA's agent is making with respect to the indicated screenshots. There are two particular sections of the example affidavit that I want to highlight; the first of those is:
"The date indicated by an extended URL applies to a preserved instance of a file for a given URL, but not necessarily to any other files linked therein. Thus, in the case of a page constituted by a primary HTML file and other separate files (e.g., files with images, audio, multimedia, design elements, or other embedded content) linked within that primary HTML file, the primary HTML file and the other files will each have their own respective extended URLs and may not have been archived on the same dates."
We're accustomed to webpages being a composite of various constituent resources — e.g., hypertext markup language, or HTML; images, such as GIFs, JPEGs, or PNGs; styling and layout instructions, or CSS; and executable software code, like JavaScript. Webpages in IAWM are composites in this sense, but they are also temporal composites, as each of those constituent resources may have been captured at (marginally, or dramatically) staggered moments in time.
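
To make the idea of per-file extended URLs concrete, here is a minimal Python sketch that pulls the capture timestamp and the original URL out of an extended URL. It assumes the commonly seen shape `/web/<14-digit timestamp><optional modifier>/<original URL>`; the function name, the regular expression, and the example.com URLs are all illustrative assumptions, not anything taken from the affidavit or from IA's own tooling.

```python
import re
from datetime import datetime

# Assumed shape of an extended URL:
#   .../web/<YYYYMMDDhhmmss><optional modifier such as "im_">/<original URL>
EXTENDED_URL = re.compile(r"/web/(\d{14})([a-z]{2}_)?/(.+)$")


def parse_extended_url(url):
    """Split an extended URL into (capture datetime, original URL)."""
    match = EXTENDED_URL.search(url)
    if match is None:
        raise ValueError(f"not an extended URL: {url!r}")
    stamp, _modifier, original = match.groups()
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), original


# Hypothetical extended URLs for a page and for an image embedded in it.
page_captured, page_url = parse_extended_url(
    "https://web.archive.org/web/20041209120000/http://www.example.com/"
)
image_captured, image_url = parse_extended_url(
    "https://web.archive.org/web/20050103080000im_/http://www.example.com/logo.gif"
)

# Each file carries its own capture date; here they differ by several weeks.
print(page_url, page_captured)
print(image_url, image_captured)
```

The point of the sketch is only that the date lives in each file's own extended URL, so the date on the page's URL says nothing by itself about the image.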

On the replay side, when a user goes to access a historical webpage, IAWM uses a temporal proximity heuristic to assemble the re-presented composite. For a given snapshot of an archived webpage, it will select for the snapshots of embedded resources — e.g., images, scripts, style and layout instructions — with the capture timestamps closest to when the webpage itself was archived.
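
A nearest-timestamp selection of this kind can be sketched in a few lines of Python. This is an illustration of the heuristic as described above, not IA's actual replay code; the function name and all of the capture timestamps are hypothetical.

```python
from datetime import datetime


def closest_capture(target, captures):
    """Return the capture timestamp nearest to target, whether earlier or later."""
    return min(captures, key=lambda captured: abs(captured - target))


# Hypothetical capture time of the archived page itself.
page_captured = datetime(2004, 12, 9, 12, 0, 0)

# Hypothetical capture times of one image embedded in that page.
image_captures = [
    datetime(2004, 11, 1, 8, 30, 0),
    datetime(2005, 1, 3, 8, 0, 0),
    datetime(2005, 6, 15, 0, 0, 0),
]

chosen = closest_capture(page_captured, image_captures)
print(chosen)
```

In this made-up example the nearest capture of the image postdates the page capture by more than three weeks, so the replayed page would pair 2004 markup with a 2005 image.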

2004-12-09 IAWM capture of www.wunderground.com
2004-12-09 IAWM capture of www.wunderground.com webpage for Varina, Iowa
Sometimes this results in webpages being re-presented that never existed as such. Using various timestamping techniques, it is even possible in some instances to establish demonstrably that the reconstruction of a given webpage capture is ahistorical. Scott Ainsworth found a great example of this in the course of his research on web archive temporal coherence, which I illustrated in a series of slides in a recent presentation.

To IA's credit, the "About this capture" link in the IAWM overlay banner summarizes the temporal spread of a webpage capture's constituent resources. However, just because an asset was archived temporally near or far from the root webpage that links to it doesn't necessarily mean that it either was or wasn't the specific version of the asset available at the time that the root webpage was archived. The temporal coherence of specific assets with their root webpage must typically be ascertained through additional analysis.
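
As a rough illustration of what a temporal spread summary involves, the following Python sketch computes the signed offset of each embedded resource's capture from the root page's capture. The function name, URLs, and timestamps are hypothetical, and this is not how the "About this capture" feature is implemented.

```python
from datetime import datetime, timedelta


def temporal_spread(root_captured, resource_captures):
    """Map each resource URL to its signed offset from the root capture.

    A negative offset means the resource was captured before the root page.
    """
    return {
        url: captured - root_captured
        for url, captured in resource_captures.items()
    }


# Hypothetical root capture and embedded-resource captures.
root = datetime(2004, 12, 9, 12, 0, 0)
resources = {
    "http://www.example.com/style.css": datetime(2004, 12, 9, 12, 0, 5),
    "http://www.example.com/logo.gif": datetime(2005, 1, 3, 8, 0, 0),
    "http://www.example.com/map.png": datetime(2004, 3, 1, 0, 0, 0),
}

offsets = temporal_spread(root, resources)
widest = max(offsets.values(), key=abs)

# List resources from closest to farthest from the root capture.
for url, offset in sorted(offsets.items(), key=lambda item: abs(item[1])):
    direction = "after" if offset > timedelta(0) else "before"
    print(f"{url}: {abs(offset)} {direction} the root capture")
```

The offsets only describe when IA captured each file. They do not, on their own, show which version of a file the live page was serving when the root page was archived.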

In any case, this context is missing when the archived webpage is saved as a PDF and/or printed, as is required for its use with the IA standard affidavit and submission to a court's filing system. The temporal topography of the archived webpage is flattened into a screenshot that gives the misimpression that "this is what the webpage looked like at the time and date that it was archived," which may or may not be true. Temporal incoherence, moreover, is not the only reason that an archived webpage can't always be taken at face value, but in practice it is likely to be the most common reason.

The second key excerpt from the IA standard affidavit is:
"Attached hereto as Exhibit A are true and accurate copies of browser printouts of the Internet Archive's records of the archived files for the URLs and the dates specified..."
I'm pretty sure that this is a categorical attestation; IA isn't performing any individuated analysis of submitted URLs. That is to say that IA likely wouldn't hesitate to attest that even the most conspicuously temporally incoherent webpage was a "true and accurate [copy]" of its records. And that would be entirely accurate and reasonable, since any content in IAWM should axiomatically be an authentic copy of IA's records; it's just that care must be exercised not to draw incorrect conclusions.

IA appears to recognize the potential for misinterpretation, judging by this FAQ about the IA standard affidavit:
"Does the Internet Archive's affidavit mean that the printout was actually the page posted on the Web at the recorded time?

The Internet Archive's affidavit only affirms that the printed document is a true and correct copy of our records. It remains your burden to convince the finder of fact further regarding the presence of past online material."
Notwithstanding Ainsworth's finding that only a minority of randomly sampled IAWM webpages were demonstrably temporally coherent, I suspect that, in most cases, a closer examination of the temporal coherence of historical webpages re-presented in IAWM wouldn't have been material to the legal claims at hand. Ainsworth didn't explore the magnitude or meaning of identified instances of temporal incoherence and, besides, the IAWM evidence of interest in many cases is text on the archived webpage itself, for which the temporal coherence of other embedded assets doesn't matter.

That said, if there haven't been already, there eventually will be cases where the validity of certain claims turns on the difference between what a PDF screenshot of an IAWM webpage shows and what the temporal characteristics of key embedded assets reveal. This quite real possibility of a discrepancy shouldn't be cause to fundamentally doubt IAWM evidence, but it is an opportunity for the legal community to get more sophisticated in how it considers that evidence.

Creative Commons Attribution-ShareAlike 4.0 International License