Using Wayback Machine for Research


Prompted by questions from Library of Congress staff on how to more effectively use web archives to answer research questions, I recently gave a presentation on "Using Wayback Machine for Research" (PDF). I thought that readers of The Signal might be interested in this topic as well. This post covers the outline of the presentation.


The Immeasurable Library of Congress


Online photo platforms increasingly support the precise positioning and browsing of user-submitted images in a three-dimensional space. Unsurprisingly, the geographic locations that are most thoroughly blanketed with photos correspond to popular tourist attractions; so many photos are taken there that they can be assembled into "complete" digital facsimiles. These digitized versions are never really complete, though, because no number of photos could perfectly capture and represent every aspect of the originals. It is for much the same reason that the data stored in the Library of Congress won't fit on a 10-terabyte hard drive.


Harvesting and Preserving the Future Web: Replay and Scale Challenges


This is the second part of a two-post recap of the "Harvesting and Preserving the Future Web" workshop at the recent International Internet Preservation Consortium General Assembly.

The session was divided into three topics:
  1. Capture: challenges in acquiring web content;
  2. Replay: challenges in recreating the user experience from the archived content; and
  3. Scale: challenges in doing both of these at web scale.
Having covered the topic of capture previously, this post addresses replay and scale.


Harvesting and Preserving the Future Web: Content Capture Challenges


Following our earlier summary of the recent International Internet Preservation Consortium General Assembly, I thought I'd share some of the insights from the workshop, "Harvesting and Preserving the Future Web".

The workshop was divided into three topics:
  1. Capture: challenges in acquiring web content;
  2. Replay: challenges in recreating the user experience from the archived content; and
  3. Scale: challenges in doing both of these at web scale.
I'll be talking about capture here, leaving replay and scale for a second post.


The Value of a Broken Link


What is the value of a broken link? For understandable reasons, many would say, "not much." While the destination-unaware nature of hyperlinks has facilitated the decentralized growth of the web, it has also greatly contributed to the perceived ephemerality of its contents. A recent literature review of the extent of link rot in academic publications, for example, demonstrated broken link rates of 39-83%. Concerned, in particular, with this reality, the Modern Language Association prominently dispensed with the requirement that URLs be included in works-cited lists in the most recent revision of its handbook. It might be said that if links are the currency of the web, then broken links are paper notes in a defunct coinage.


Designing Preservable Websites, Redux


As much as we can do to preserve archived websites once we have them, the challenges we encounter are always already determined by how those websites were originally constructed. In the interest of giving us and others the best possible chance of preserving your online content, I wanted to follow up on an excellent blog post by Robin Davis of the Smithsonian Institution Archives on the topic of designing preservable websites. Here are some best practices to keep in mind:

