blog - nullhandle.org

Using Wayback Machine for Research

Published by Nicholas Taylor on 26 October 2012

Prompted by questions from Library of Congress staff on how to more effectively use web archives to answer research questions, I recently gave a presentation on "Using Wayback Machine for Research" (PDF). I thought that readers of The Signal might be interested in this topic as well. This post covers the outline of the presentation.

The Immeasurable Library of Congress

Published by Nicholas Taylor on 6 August 2012

Online photo platforms increasingly support the precise positioning and browsing of user-submitted images in a three-dimensional space. Unsurprisingly, the geographic locations that are most thoroughly blanketed with photos correspond to popular tourist attractions; so many photos are taken as to construct "complete" digital facsimiles. These digitized versions are never really complete, though, because no number of photos could perfectly capture and represent every aspect of the originals. It is for much the same reason that the data stored in the Library of Congress won't fit on a 10-terabyte hard drive.

Harvesting and Preserving the Future Web: Replay and Scale Challenges

Published by Nicholas Taylor on 18 June 2012

This is the second part of a two-post recap of the "Harvesting and Preserving the Future Web" workshop at the recent International Internet Preservation Consortium General Assembly.

The session was divided into three topics:

Capture: challenges in acquiring web content;
Replay: challenges in recreating the user experience from the archived content; and
Scale: challenges in doing both of these at web scale.

Having covered the topic of capture previously, this post addresses replay and scale.

Harvesting and Preserving the Future Web: Content Capture Challenges

Published by Nicholas Taylor on 1 June 2012

Following our earlier summary of the recent International Internet Preservation Consortium General Assembly, I thought I'd share some of the insights from the workshop, "Harvesting and Preserving the Future Web".

The workshop was divided into three topics:

Capture: challenges in acquiring web content;
Replay: challenges in recreating the user experience from the archived content; and
Scale: challenges in doing both of these at web scale.

I'll be talking about capture here, leaving replay and scale for a second post.

The Value of a Broken Link

Published by Nicholas Taylor on 28 March 2012

What is the value of a broken link? For understandable reasons, many would say, "not much." While the destination-unaware nature of hyperlinks has facilitated the decentralized growth of the web, it has also greatly contributed to the perceived ephemerality of its contents. A recent literature review of the extent of link rot in academic publications, for example, demonstrated broken link rates of 39-83%. Concerned, in particular, with this reality, the Modern Language Association prominently dispensed with the requirement that URLs be included in works-cited lists in the most recent revision of its handbook. It might be said that if links are the currency of the web, then broken links are paper notes in a defunct coinage.

Designing Preservable Websites, Redux

Published by Nicholas Taylor on 6 February 2012

As much as we can do to preserve archived websites once we have them, the challenges we encounter are always already determined by how those websites were originally constructed. In the interest of giving us and others the best possible chance of preserving your online content, I wanted to follow up on an excellent blog post by Robin Davis of the Smithsonian Institution Archives on the topic of designing preservable websites. Here are some best practices to keep in mind:
