Update 2024-03-27: Greatly expanded the "Samples" page and renamed it to "Glossary".
Update 2024-04-04: Added 5 million mid-2011 posts from the k47 post dump. Browse (mostly) those posts here.
Update 2024-04-07: Added ~400 October 2003 posts from 4chan.net. Browse them here.

Welcome to Oldfriend Archive, hosting ~170M text-only 2003-2014 4chan posts (mostly 2006-2008).

History

Site History

Updated 2024-04-07

Initial "Archive Ten Billion" archive

The first version of this site was created in October 2021. This version served about 114M posts/7.5M threads across 44 boards dating from 2005 to 2008.

Initially, I was not able to serve all of the posts that had originally been scraped. The data came from the "Archive Ten Billion" post dump, which stored the same data in three formats, each with its own flaws:

  • The MyISAM database files version (chanarchive.tar.gz) was corrupt.
  • The HTML version was missing post numbers and timestamps, so it could not be imported into FoolFuuka without faking some data. Its /b/ gzip was also corrupt.
  • The XML version had post numbers and timestamps, but was missing gzips for /b/, /p/ and /con/ (i.e. about 50% of the posts in the archive).

The site initially used the XML version of the dump, so it was missing /b/, /p/ and /con/. A few days after bringing the site up, I was able to retrieve some of the /b/ and /p/ posts from the part of the MyISAM files that decompressed cleanly before the gzip corruption began. A few days after that, I retrieved some more /b/ and /p/ posts from the very end of the database, where the effects of the gzip corruption had mostly trailed off.

This version of the site had about 36 million /b/ posts, and was missing about 50 million. Most other boards were roughly complete.

More Ten Billion content

The site moved to new hosting on 2022-09-17 to accommodate more content. This was mainly a content update, and it increased the size of the archive to 162M posts/10.8M threads (+40%) across 45 boards (with the addition of /con/).

In summer 2022, I figured out a way to locate and repair the damage to chanarchive.tar.gz. The damage turned out to be just one flipped bit; because of the way gzip works, a single incorrect bit can corrupt all the data that follows. The HTML dump provided necessary contextual clues for this process, so in the end Ten Billion's weird triplicate format saved the day.
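
To illustrate why one flipped bit is so destructive, and how such damage can in principle be located, here is a minimal Python sketch. It is not the method actually used for chanarchive.tar.gz (that relied on contextual clues from the HTML dump). It simply brute-forces every bit in a given window and accepts the first candidate that decompresses with a valid gzip checksum. The function names and the search window are hypothetical, and on a multi-gigabyte archive the window would first have to be narrowed to the region where decompression starts failing.

```python
import gzip
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with the bit at bit_index inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def try_decompress(blob: bytes):
    """Return the decompressed bytes, or None if the gzip stream is invalid."""
    try:
        return gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad header, CRC or length mismatch).
        return None


def repair_single_bit(blob: bytes, start_bit: int, end_bit: int):
    """Flip each bit in [start_bit, end_bit) and return (bit_index, data)
    for the first flip that yields a stream passing the gzip CRC check.
    Returns None if no single flip in the window repairs the stream."""
    for bit_index in range(start_bit, end_bit):
        data = try_decompress(flip_bit(blob, bit_index))
        if data is not None:
            return bit_index, data
    return None
```

For example, compressing some text with gzip.compress, flipping one bit in the middle of the result, and calling repair_single_bit over the bits after the 10-byte gzip header recovers the original text. The CRC-32 in the gzip trailer is what makes a correct repair distinguishable from a merely decodable one.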

In September 2022, I migrated the data to the FoolFuuka/Asagi format. Here are the update highlights:

  • Added about 50 million more /b/ posts, covering a period of massive traffic growth in 2007 and 2008.
  • Migrated to a bigger server (needed more space for the new posts).
  • Added previously missing /p/ posts from the same date range as the "new" /b/ posts.
  • Found content from /con/ (Conferences), a board which was frequently reset or locked between events.
  • Split up /n/, /sp/ and /con/ into numbered "board iterations" so that same-numbered posts from before and after board resets can all coexist in FoolFuuka. Without the split, some thread links would be ambiguous, since post and thread numbers overlap across board resets, and some threads would contain irrelevant posts made years apart. This breaks some external links to the old version of the site, but the content should still exist (replace /sp/ with /sp1/ or /sp2/, etc.).
  • Fixed fortunes.
  • Stripped HTML from subject fields.
  • Made a handful of under-the-hood tweaks.
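
The board-iteration split from the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script used for the site: it assumes posts are available as (timestamp, post number) pairs, and it treats a large drop in post number between chronologically adjacent posts as a board reset. The function name and the tolerance parameter are hypothetical.

```python
from typing import Iterable, List, Tuple


def assign_iterations(
    board: str,
    posts: Iterable[Tuple[int, int]],
    tolerance: int = 1000,
) -> List[Tuple[str, int]]:
    """Label each (timestamp, post_number) pair with a numbered board name.

    Posts are processed in timestamp order. When a post number falls more
    than `tolerance` below the highest number seen in the current iteration,
    the board is assumed to have been reset and a new iteration begins.
    """
    iteration = 1
    highest = 0
    labelled = []
    for _timestamp, number in sorted(posts):
        if number + tolerance < highest:
            iteration += 1
            highest = 0
        highest = max(highest, number)
        labelled.append((f"{board}{iteration}", number))
    return labelled
```

With this labelling, post 3 from before a reset and post 3 from after it end up on different boards (for example sp1 and sp2), so they no longer collide.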

Known issues

  • Bold text is handled improperly (replaced with "1").
  • There are some irrelevant options in the search form.

Progscrape textboard update

The 2014 ghonsonb3120 progscrape dump was added on 2023-06-21 (about 2 million 2004-2014 posts).

chanarchive.org Wayback Machine scrapes update

Scrapes of chanarchive.org from the Wayback Machine were processed and added to the site on 2023-07-10 (about 3.7 million "new" posts, with some overlap with the Archive Ten Billion content).

Glossary update

The outdated "samples" article was replaced with a larger "glossary" page on 2024-03-27, mainly cataloguing original content from late-2000s to early-2010s 4chan.

k47 5M 2011 post dump update

The mid-2011 5M post dump sourced from k47 of k47.cz was added on 2024-04-04. This update also changed post-Ten Billion posts to display placeholder images as thumbnails at the original images’ aspect ratios, even when this would cause them to appear distorted. It also restored the "n images omitted" text in the board index view.

4chan.net 2003 Wayback Machine crawls

This content was added on 2024-04-07.