I previously examined and documented the behavior of the Google date operators as a tool for approximating what content would have been presented to a user at specific points in time. Another web-scale search engine that permits date filtering is Microsoft Bing. While it has never been nearly as popular as Google, it can nonetheless be useful as an additional source of information about which webpages may have been discoverable, and when.
To better understand how Bing's date filtering works and to probe any particularities, I adapted, repeated, and documented some of the tests I'd also conducted with Google. It turns out that Bing is generally both more conservative and more transparent in how it handles dated web content, making its utility for approximating historical web discovery more limited.
How Microsoft explains Bing date filtering
As with the Google date operators, let's start by reviewing Microsoft's explanation of how Bing date filtering works.
Conveniently, the Bing API documentation indicates support for filtering to fixed windows relative to the present, reaching back at most 30 days, as well as for limiting results to a specific time frame. At least for the API, no open-ended after date is supported, though presumably one could be emulated by using today's date as the end of the range.
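For illustration, here is a minimal sketch of how a date-range request to the API might be assembled. The endpoint `https://api.bing.microsoft.com/v7.0/search`, the `freshness` parameter with its `YYYY-MM-DD..YYYY-MM-DD` range syntax, and the `Ocp-Apim-Subscription-Key` header reflect my understanding of the v7 Web Search API reference and are assumptions to check against the current documentation; the function name is hypothetical. The sketch only builds the request and does not send it.

```python
from datetime import date
from urllib.parse import urlencode
from urllib.request import Request

# Assumed v7 Web Search endpoint; verify against the current Bing API documentation.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def build_bing_range_request(query: str, start: date, end: date, api_key: str) -> Request:
    """Build (but do not send) a Web Search request limited to start..end.

    Assumes the `freshness` parameter accepts a closed range written as
    YYYY-MM-DD..YYYY-MM-DD. With no open-ended "after" form, an "after"
    search is approximated by passing today's date as `end`.
    """
    if start > end:
        raise ValueError("start date must not be after end date")
    params = {"q": query, "freshness": f"{start.isoformat()}..{end.isoformat()}"}
    url = f"{BING_ENDPOINT}?{urlencode(params)}"
    # The header name for the subscription key is likewise an assumption to verify.
    return Request(url, headers={"Ocp-Apim-Subscription-Key": api_key})
```

For example, `build_bing_range_request("site:supremecourt.gov", date(2010, 3, 20), date.today(), "YOUR-KEY")` describes an "after 20 March 2010" search; passing the result to `urllib.request.urlopen` would actually send it.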
The date filter keys off of the Webpage object's datePublished property, which is described as,
The time that webpage published. The date is in the form, YYYY-MM-DDTHH:MM:SS.
Example: 2015-04-13T05:23:39.
with a string value type.
Google is vaguer about the meaning of its recorded dates, indicating that they may also reflect the date that a webpage was "significantly" updated. If Bing's documentation is correct, Bing shouldn't care about changes made to a webpage after its publication; the original publication date will be used for the purpose of date filtering.
Dated indexed web resources
Let's repeat some of the searches from the previous investigation and see what Bing returns. I will again include screenshots for temporally-dependent example queries or webpages, recognizing that these may change over time.
Like Google, Bing supports a site: operator that constrains the results to a given domain or web address path. A search for any and all content on the Supreme Court of the United States (SCOTUS) website returns 2,960 results (screenshot). We know from previous exploration in the Internet Archive Wayback Machine (IAWM) that the first capture of supremecourt.gov dates to 20 March 2010.
Based on this date, let's further constrain the Bing site: search to the time frame 20 March 2010 to today's date (i.e., 12 June 2024) (screenshot). The results count isn't indicated on the first page of results, though, curiously, it appears at the top after advancing to the second page of results; there are 519 results (screenshot).
The web address syntax is more opaque than Google's. Whereas Google has an easily manipulable parameter for date-limiting (e.g., "after%3A2010-03-20"), Bing's (e.g., "%26filters%3dex1%253a%2522ez5_14688_19886%2522", which reads &filters=ex1:"ez5_14688_19886" once URL-decoded twice) isn't immediately intelligible. Through further investigation, it turns out that the underscore-separated number-only strings (e.g., "14688_19886") represent the number of days elapsed since the Unix epoch (i.e., 1 January 1970) for the after and before filter dates, respectively: 14688 days after the epoch is 20 March 2010, and 19886 days after it is 12 June 2024.
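The day-count encoding is easy to reproduce. Below is a minimal sketch that converts between calendar dates and these epoch day counts, builds a filters value, and recovers the dates from a (possibly multiply) URL-encoded fragment. The "ex1" and "ez5" tokens are copied verbatim from observed Bing URLs and their meaning is unknown. Whether Bing honors a hand-built filters value without its other search parameters is an assumption. All function names are hypothetical.

```python
import re
from datetime import date, timedelta
from urllib.parse import unquote, urlencode

UNIX_EPOCH = date(1970, 1, 1)


def to_day_count(d: date) -> int:
    """Days elapsed since 1 January 1970."""
    return (d - UNIX_EPOCH).days


def from_day_count(n: int) -> date:
    return UNIX_EPOCH + timedelta(days=n)


def make_filters_value(start: date, end: date) -> str:
    # "ex1" and "ez5" are copied from observed URLs; their meaning is not known.
    return f'ex1:"ez5_{to_day_count(start)}_{to_day_count(end)}"'


def parse_filters_value(fragment: str) -> tuple[date, date]:
    """Recover (after, before) dates from a possibly multiply URL-encoded fragment."""
    decoded = unquote(fragment)
    while decoded != fragment:  # keep decoding until the string stops changing
        fragment, decoded = decoded, unquote(decoded)
    match = re.search(r"ez5_(\d+)_(\d+)", decoded)
    if match is None:
        raise ValueError("no ez5 date range found")
    return from_day_count(int(match.group(1))), from_day_count(int(match.group(2)))
```

For example, `urlencode({"q": "site:supremecourt.gov", "filters": make_filters_value(date(2010, 3, 20), date(2024, 6, 12))})` yields `q=site%3Asupremecourt.gov&filters=ex1%3A%22ez5_14688_19886%22`, and `to_day_count(date(1991, 8, 6))` gives 7887, the start value for the search that follows.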
Let's double-check that results aren't being excluded because the range starts on 20 March 2010 (i.e., in case Bing indexed content from before that date that the Internet Archive didn't capture) by trying the date of publication of the very first website, 6 August 1991. Here is the search (screenshot). Clicking through to the second page of results, we see that 16 results were returned (screenshot).
Taking a closer look at one of the results with the earliest date, "Our Democratic Constitution", with a Bing-recorded date of 22 October 2001, we can see that date displayed conspicuously in the body text. My suspicion is that Bing derived the date from the body text and that it doesn't reflect when the webpage was actually available at this web address.
IAWM corroborates this hunch, with a first capture on 12 October 2014. This doesn't in and of itself establish that the webpage wasn't available earlier, but given that no web resources for supremecourt.gov were captured earlier than 20 March 2010, it is very unlikely that the speech webpage was available in 2001, as Bing would seem to suggest.
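IAWM first-capture lookups like this one can be scripted. Here is a minimal sketch using the Wayback Machine's CDX server, assuming its `url`, `output=json`, `fl`, and `limit` parameters and its oldest-first ordering of captures for a single URL; the function names are my own. A first capture only shows that a page existed by then; it cannot rule out earlier availability.

```python
import json
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_first_capture_url(page_url: str) -> str:
    # Assumes captures of a single URL come back oldest-first, so limit=1 is the first one.
    params = {"url": page_url, "output": "json", "fl": "timestamp", "limit": 1}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_first_capture(rows: list) -> Optional[datetime]:
    """rows: decoded JSON from the CDX server, whose first row is a header."""
    if len(rows) < 2:
        return None  # no captures recorded
    ts_index = rows[0].index("timestamp")
    return datetime.strptime(rows[1][ts_index], "%Y%m%d%H%M%S")


def first_capture(page_url: str) -> Optional[datetime]:
    """Network call: ask the CDX server for the earliest capture of page_url."""
    with urlopen(cdx_first_capture_url(page_url), timeout=30) as resp:
        body = resp.read().decode("utf-8")
    return parse_first_capture(json.loads(body)) if body.strip() else None
```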
As an aside, a quick check on Google shows that it, too, has recorded the date 22 October 2001 for this webpage (screenshot). The takeaway is that both Bing- and Google-recorded dates may reflect dates in the page's content rather than when the webpage itself became available; IAWM remains a more reliable ground truth, when relevant captures are available.
Circling back to the results counts: the domain-constrained Bing search without date filtering (2,960 results; screenshot) and the same search filtered to the full effective time frame that the website has existed at supremecourt.gov (519 results; screenshot) are ostensibly functionally equivalent. The discrepancy between them strongly suggests that not all web resources in the Bing index have an associated datePublished property, and furthermore that those lacking this property aren't returned for any sort of date-filtered search.
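The inference can be stated as a small model. In this sketch, records are plain dicts standing in for index entries (a hypothetical structure, not Bing's), the date range is assumed to be inclusive at both ends, and any record without a parseable `datePublished` is dropped, however wide the range.

```python
from datetime import date, datetime
from typing import Optional


def published_day(record: dict) -> Optional[date]:
    """Calendar day of a record's datePublished, or None if missing or malformed."""
    value = record.get("datePublished")
    if not value:
        return None
    try:
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S").date()
    except ValueError:
        return None


def date_filtered(records: list[dict], start: date, end: date) -> list[dict]:
    """Keep records whose datePublished falls within start..end (inclusive).

    Records with no usable datePublished are dropped outright, modeling the
    behavior inferred from the results counts (an inference, not documented).
    """
    kept = []
    for record in records:
        day = published_day(record)
        if day is not None and start <= day <= end:
            kept.append(record)
    return kept
```

Under this model, no choice of start and end dates brings undated pages back, which would explain why even a range spanning the site's entire existence returns far fewer results than the unfiltered site: search.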
Contemporary website leakage
That webpages indexed by Bing without recorded dates are excluded from date-filtered searches makes some of the subsequent tests I tried with Google superfluous. Let's therefore jump ahead to examining whether Bing returns relevance-matched results based on content added to a webpage outside of a specified date-filtered time frame.
The previous investigation used a SCOTUSblog post from 2 June 2021 about the retirement of the Court's Public Information Officer.
Comparing the earliest (2 June 2021) and latest (31 March 2023) IAWM captures of the webpage, we can see that they're highly similar. One notable difference is that Emergency Docket appears in the top navigation bar in the latter capture. Let's use this difference to tease out what Bing searches against.
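Comparisons like this can be partly scripted. Here is a rough sketch, assuming the Wayback Machine's `id_` URL flag (which returns the archived page without the Wayback toolbar) and that a partial timestamp resolves to the nearest capture. The tag stripping is deliberately crude and the function names are hypothetical.

```python
import re
from html import unescape
from urllib.request import urlopen


def wayback_raw_url(timestamp: str, page_url: str) -> str:
    # "id_" asks for the archived bytes without the Wayback toolbar or link rewriting.
    return f"https://web.archive.org/web/{timestamp}id_/{page_url}"


def fetch(url: str) -> str:
    """Network call: download a page as text."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def visible_text(html: str) -> str:
    """Very rough tag stripper; adequate for spotting phrases, not for rendering."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", unescape(text)).strip()


def phrase_diff(phrase: str, earlier_html: str, later_html: str) -> tuple[bool, bool]:
    """Report whether phrase appears (case-insensitively) in each capture."""
    p = phrase.lower()
    return p in visible_text(earlier_html).lower(), p in visible_text(later_html).lower()
```

Fetching the two captures with `fetch(wayback_raw_url("20210602", post_url))` and `fetch(wayback_raw_url("20230331", post_url))`, where `post_url` stands in for the post's address, and passing both to `phrase_diff("Emergency Docket", ...)` should flag the navigation label as present only in the later capture.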
We can learn more about Bing's indexing of a given webpage by looking at its cache (for now, anyway). Bing's cached version of the SCOTUSblog post is from 11 June 2024 (screenshot) and predictably includes Emergency Docket in the top navigation bar.
A search limited to the web address of the post (without date filtering) also returns it as a result, with a datePublished of 2 June 2021 (screenshot), consistent with the publication date indicated on the webpage.
Now let's add the keywords emergency docket. Unsurprisingly, the result still shows up (screenshot).
We know from IAWM that the text "Emergency Docket" wasn't part of the top navigation bar as of 2 June 2021. However, if we repeat the search for emergency docket limited to that web address and additionally filter it to 20 March 2010 through 3 June 2021 (i.e., one day after the publication date), we still get the search result (screenshot).
As a control, the same search for a keyword that is not on the webpage, jackson, does not return it as a result (screenshot).
The upshot is that Bing, like Google, is susceptible to contemporary "leakage" — i.e., relevance matching takes place against the most recently indexed version of the webpage, not a cached, historical version of the webpage from the time of its publication or index-recorded date.
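The observed behavior can be summarized with a toy model: the date filter is evaluated against a stored date that never changes, while term matching runs against whatever text was most recently crawled. This is a hypothetical simplification for illustration (plain substring matching, no ranking), not a description of Bing's internals.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IndexedPage:
    """Hypothetical index entry: one fixed date, one continually refreshed text."""
    url: str
    date_published: date  # set once, from the (apparent) publication date
    latest_text: str      # replaced whenever the page is recrawled


def recrawl(page: IndexedPage, new_text: str) -> None:
    # Overwrites the text but leaves date_published untouched: the source of "leakage".
    page.latest_text = new_text


def search(index: list[IndexedPage], terms: list[str], start: date, end: date) -> list[str]:
    """Return URLs whose date_published is within start..end (inclusive) and
    whose latest_text contains every term (case-insensitive substring match)."""
    hits = []
    for page in index:
        if not (start <= page.date_published <= end):
            continue  # the date filter only ever looks at the stored date
        text = page.latest_text.lower()
        if all(term.lower() in text for term in terms):
            hits.append(page.url)  # matching, however, uses the newest crawl
    return hits
```

In this model, once a recrawl adds "Emergency Docket" to a page dated 2 June 2021, a search for emergency docket filtered to 20 March 2010 through 3 June 2021 matches it, just as in the screenshots, while jackson still does not.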
Conclusions
In summary, Bing appears to be both more conservative and more transparent than Google in how it assigns dates.
This conservatism, combined with the complete exclusion of undated webpages from date-filtered searches, makes Bing a less useful tool for approximating historical search results.
While Bing is still useful for identifying historically discoverable webpages, it may not provide date information that isn't already evident elsewhere.
As with Google, and especially given the observed discrepancy between dates in webpage content and the dates webpages actually became available, IAWM remains an important complement for validating dates recorded by Bing.