Update 2022-09-17: Added the 50 million missing 2007-2008 /b/ posts! Read about this and other changes here.

Welcome to the new Oldfriend Archive, hosting over 160M text-only 4chan posts from 2005-2008.

History

Site History

Updated 2022-09-17

V1

The first version of this site was created in October 2021. This version served about 114M posts/7.5M threads across 44 boards dating from 2005 to 2008.

Initially I was not able to serve all of the posts which had originally been scraped. The "Archive Ten Billion" post dump that the data was sourced from stored the same data in three formats, each of which had some flaws:

  • The MyISAM database files version (chanarchive.tar.gz) was corrupt.
  • The HTML version was missing post numbers and timestamps, and so could not be imported into FoolFuuka without faking some data. The HTML /b/ gzip was corrupt.
  • The XML version had post numbers and timestamps, but was missing gzips for /b/, /p/ and /con/ (i.e. about 50% of the posts in the archive).

The site initially used the XML version of the dump, so it was missing /b/, /p/ and /con/. A few days after bringing the site up, I was able to retrieve some of the posts for /b/ and /p/ from the part of the MyISAM gzip that decompresses cleanly before the corruption begins. Then a few days after that, I retrieved some more posts for /b/ and /p/ from the very end of the database, where the effects of the gzip corruption had mostly trailed off.

This version of the site had about 36 million /b/ posts, and was missing about 50 million. Most other boards were roughly complete.

V2

This version of the site replaced the old server on 2022-09-17. This was mainly a content update, and it increased the size of the archive to 162M posts/10.8M threads (roughly +40%) across 45 boards (with the addition of /con/).

In summer 2022, I figured out a way to locate and repair the damage to chanarchive.tar.gz (just one flipped bit: because of the way gzip works, a single incorrect bit can corrupt all the data that follows). The HTML dump provided necessary contextual clues for this process, so in the end Ten Billion's weird triplicate format saved the day.
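For the curious, here is a minimal sketch of the general idea behind repairing a single flipped bit in a gzip file. It is not the actual procedure used on chanarchive.tar.gz: the function name `find_flipped_bit` and the byte window `start`/`end` are hypothetical, and the sketch assumes the file is small enough to decompress thousands of times. It leans on the fact that every gzip member ends with a CRC-32 and length of the uncompressed data, so a candidate repair that decompresses without error is very likely the right one.

```python
import gzip
import zlib


def find_flipped_bit(damaged: bytes, start: int, end: int):
    """Brute-force search for one flipped bit in damaged[start:end].

    Flips each bit in the window in turn and tries a full decompression.
    gzip.decompress() verifies the CRC-32 trailer, so the first candidate
    that decompresses without an exception is returned as
    (byte_offset, bit_index, decompressed_data). Returns None if no
    single-bit flip in the window repairs the stream.
    """
    buf = bytearray(damaged)
    for offset in range(start, min(end, len(buf))):
        for bit in range(8):
            buf[offset] ^= 1 << bit          # try this candidate repair
            try:
                data = gzip.decompress(bytes(buf))
            except (OSError, EOFError, zlib.error):
                data = None                  # still corrupt
            buf[offset] ^= 1 << bit          # undo before the next try
            if data is not None:
                return offset, bit, data
    return None
```

On a multi-gigabyte archive the expensive part is narrowing the window to a few bytes first, which is where outside clues about what the decompressed data should look like become important.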

In September 2022, I migrated the data to the FoolFuuka/Asagi format. Here are the update highlights:

  • Added about 50 million more /b/ posts, covering a period of massive traffic growth in 2007 and 2008.
  • Migrated to a bigger server (needed more space for the new posts).
  • Added previously missing /p/ posts from the same date range as the "new" /b/ posts.
  • Found content from /con/ (Conferences), a board which was frequently reset or locked between events.
  • Split up /n/, /sp/ and /con/ into numbered "board iterations" so that same-numbered posts from before and after board resets could all coexist in FoolFuuka. Otherwise, some thread links would be ambiguous, since post/thread numbers overlap across board resets, and some threads would contain irrelevant posts made years apart. This breaks some external links to the old version of the site, but the content should still exist (replace /sp/ with /sp1/, /sp2/, etc.).
  • Fortunes fixed.
  • HTML stripped from subject fields.
  • A handful of under-the-hood tweaks.
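The "board iterations" split in the list above can be illustrated with a small sketch. This is only an assumption about how such a split could be done, not a description of the migration itself: the function `assign_iterations` is hypothetical, and it assumes posts arrive sorted by a reliable timestamp, so that a post number which fails to increase signals a board reset.

```python
def assign_iterations(board, posts):
    """Label each post with a numbered board iteration.

    `posts` is an iterable of (timestamp, post_number) pairs sorted by
    timestamp. Whenever a post number is not higher than the highest one
    seen so far, a reset is assumed and a new iteration begins.
    Returns a list of (iteration_board_name, post_number) pairs.
    """
    labelled = []
    iteration = 1
    highest = 0
    for _timestamp, number in posts:
        if number <= highest:
            # Numbering started over: assume the board was reset.
            iteration += 1
            highest = 0
        highest = max(highest, number)
        labelled.append((f"{board}{iteration}", number))
    return labelled
```

With this labelling, post 1 from before a reset and post 1 from after it end up on different boards (for example sp1 and sp2), so their thread links no longer collide.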

Known issues

  • Bold text is handled improperly (replaced with "1").
  • There are some irrelevant options in the search form.