Update 2022-09-17: Added the 50 million missing 2007-2008 /b/ posts! Read about this and other changes here.

Welcome to the new Oldfriend Archive, hosting over 160M text-only 2005-2008 4Chan posts.

About

Oldfriend Archive

Updated 2022-09-26

This is a searchable archive of old 4Chan posts. Currently it hosts the "Archive Ten Billion" dump of 2005-2008 era 4Chan threads. Among these are the very earliest posts for many boards: /fa/, /fit/, /hc/, /jp/, /n/ (the Transportation reset), /r9k/, /tg/, /toy/, /trv/, /x/. It includes posts from almost all* 4Chan boards of the era - about 162 million posts/10.8 million threads in all, mostly from mid-2006 to November 2008 (see the stats page). Few of these posts, as of this time, are replicated in other sources. Exceptions include /jp/, which was archived by Fuuka archivers from day 1, and /a/ through most of 2008.

*missing /f/ (flash), /j/ (the sekrit club) and /yg/.

The archive is no longer missing large chunks of /b/ and /p/. See History for more.

There are no NSFW images hosted here, but plenty of NSFW language.

For a different way to browse some of the contents of this archive, check out 4museum (unaffiliated, est. 2019).

Q&A

What's the point of this site?
This site exists to be an easily accessible repository of late 2000s internet culture. I hope that it will help people researching internet history (such as the first SCPs), nostalgic oldfriends, and newfriends wanting to pose as oldfriends.

Who is this "Four Chan"?
For some time in the late 2000s (the period covered by this archive), 4Chan was a nexus of internet culture. In that period, it generated countless memes and other original content which propagated out to the rest of the internet.

It was also one of several major gathering points for anonymous online agitators (i.e. the internet hate machine), who would organize raids (concerted trolling) against various targets. By 2007, this activity had earned them enough notoriety to be recognized as "hackers on steroids" in an infamous Fox News broadcast. By 2008, with Project Chanology, Anonymous had exploded into real life, and normies everywhere lived in fear of hackers cosplaying as 17th-century terrorists blowing up their computers.

Of course, the vast majority of the content was just inane bullshit, and that is reflected here.

Where did you get this data?
I sourced the data from the Archive Ten Billion dump, an Internet Archive item uploaded by Jason Scott in 2018. He had originally acquired the data in 2009. The original scraper was not identified (it probably wasn't Jason).

The IA upload is borked in a few ways, and the data needed non-trivial repair work before it could be fully migrated to a searchable online archive format. See History for more.

Who originally scraped the data?
The archive of database files in the Ten Billion IA item is named chanarchive.tar.gz, which might lead someone to assume that the dump comes from 4chanarchive/chanarchive, the best-known archive from this early in 4Chan's history (4chanarchive ran from 2006 to 2013). But I think this is a coincidence.

The data might originate from a site called Rapidsearch, which was an index of links to uploads on file-sharing sites like Rapidshare. This would better explain why the first posts in the dump come from 4Chan's /r/ (where file-sharing links would frequently be shared), why the dump contains threads from other sites besides 4Chan (which match sites covered by Rapidsearch), and why the scrape coverage is comprehensive instead of selective like 4chanarchive (which only served about 20,000 saved threads at its peak).

Whoever it was, though, thanks for sharing.

How can I trust the data here?
I compared sample posts/threads to the GETs encyclopedia, the 2008 /a/ overlap with Desuarchive (with a timestamp offset - early Desu /a/ posts are in UTC instead of New York/"4Chan" time), and 4chanarchive threads on the Wayback Machine (e.g. compare the first thread here to this). Most of the posts here can't be found anywhere else, but those that I've been able to test match other sources.
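
For anyone wanting to repeat the Desuarchive comparison, here is a minimal Python sketch of the timestamp offset. It assumes "4Chan time" means the America/New_York zone with daylight saving applied. The function names are made up for illustration, and this is not the script used for the checks above.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: "4Chan time" is US Eastern time, including daylight saving.
FOURCHAN_TZ = ZoneInfo("America/New_York")


def utc_to_4chan_time(utc_naive: datetime) -> datetime:
    """Treat a naive timestamp as UTC and return naive New York wall-clock time."""
    aware = utc_naive.replace(tzinfo=timezone.utc)
    return aware.astimezone(FOURCHAN_TZ).replace(tzinfo=None)


def same_minute(a: datetime, b: datetime) -> bool:
    """Compare two timestamps after truncating both to the minute."""
    return a.replace(second=0, microsecond=0) == b.replace(second=0, microsecond=0)


# Hypothetical example: a UTC timestamp from one source, a New York one from another.
desu_utc = datetime(2008, 7, 1, 16, 30, 45)
local = datetime(2008, 7, 1, 12, 30)
print(same_minute(utc_to_4chan_time(desu_utc), local))
```

Comparing at minute precision is deliberate, since most boards in the source are only timestamped to the minute.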

Where are the images?
All image data except a binary "post has image" indicator is missing from the source. This includes the thumbnail/full-size image files, original filenames, file sizes and MD5 hashes. A placeholder image is used here (with dummy metadata).

Were any posts removed from the source archive?
About 10k posts with odd characteristics had to be removed to get FoolFuuka working.

  • The 7777777 /b/ GET appears to have been interfered with. Many threads with this number are in the source archive - it looks like every new thread displayed the thread number 7777777 for a while, but had the true thread number in the URL. Replies to these threads might also have had the number 7777777. All 7777777 /b/ posts were dropped.
  • There were a few thousand posts with duplicate post numbers that I couldn't easily explain.
  • There were a few thousand posts missing their threads' OPs.
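
The three cases above could be filtered roughly like this. This is a sketch rather than the actual cleanup script: the field names (board, num, thread_num) are hypothetical, and dropping every copy of a duplicated post number (instead of keeping one) is an assumption.

```python
from collections import Counter


def filter_posts(posts):
    """Return the posts that survive the three removal rules listed above.

    Each post is a dict with hypothetical keys: board, num, thread_num.
    """
    # Rule 1: drop every /b/ post tied to the number 7777777.
    posts = [
        p for p in posts
        if not (p["board"] == "b" and 7777777 in (p["num"], p["thread_num"]))
    ]
    # Rule 2 (assumption): drop all copies of a post number seen more than once.
    counts = Counter((p["board"], p["num"]) for p in posts)
    posts = [p for p in posts if counts[(p["board"], p["num"])] == 1]
    # Rule 3: drop replies whose thread OP (num == thread_num) is not present.
    ops = {(p["board"], p["num"]) for p in posts if p["num"] == p["thread_num"]}
    return [p for p in posts if (p["board"], p["thread_num"]) in ops]


sample = [
    {"board": "b", "num": 7777777, "thread_num": 7777777},  # the interfered GET
    {"board": "b", "num": 100, "thread_num": 100},
    {"board": "b", "num": 101, "thread_num": 100},
    {"board": "a", "num": 5, "thread_num": 5},
    {"board": "a", "num": 5, "thread_num": 5},   # duplicate post number
    {"board": "a", "num": 6, "thread_num": 5},   # orphaned once its OP is dropped
    {"board": "x", "num": 9, "thread_num": 8},   # OP never captured
]
kept = filter_posts(sample)
print([(p["board"], p["num"]) for p in kept])
```

The rules run in this order so that replies to a dropped OP get caught by the orphan check at the end.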


What other data is missing from the posts?

  • Posts were only timestamped to the minute in the source, except on /b/ for some reason. This means that posts ordered by timestamp are often out of order by post number, since many will share the same timestamp.
  • No data regarding post/image deletion status or time was preserved in the source.
  • Capcodes are missing from the source.
  • Some ban messages seem to be missing from the source. Others are preserved.
  • All HTML tags have been stripped (some were stripped in the source).
  • Oekaki post data (drawing duration etc.) has been stripped.
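
If you work with the data yourself, the minute-resolution timestamps are easy to work around by using the post number as a tie-breaker. A minimal sketch, assuming post numbers increase in posting order within a board (the dict keys are hypothetical):

```python
from datetime import datetime


def order_posts(posts):
    """Sort by timestamp, falling back to post number when timestamps tie."""
    return sorted(posts, key=lambda p: (p["timestamp"], p["num"]))


# Three hypothetical posts; two share the same minute.
thread = [
    {"num": 1203, "timestamp": datetime(2007, 6, 1, 12, 0)},
    {"num": 1201, "timestamp": datetime(2007, 6, 1, 12, 0)},
    {"num": 1200, "timestamp": datetime(2007, 6, 1, 11, 59)},
]
print([p["num"] for p in order_posts(thread)])
```

Within a single board, sorting by post number alone should give the same order under that assumption.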


What's with the numbered boards (/n1-2/, /sp1-2/, /con1-4/)?
During the period covered by this archive, some of the boards were reset (cleared, with the post number reset to 1). This means that in some cases, post ranges in a board would overlap with those of an earlier iteration of the board. To prevent collisions with thread links and avoid confusing the software, affected boards have been split up into numbered sections.

  • /n/ switched from 'News' to 'Transportation' on 2008-02-19, so posts/threads from both of these iterations of /n/ are in the archive. Before switching to 'News' in 2006, /n/ was 'Nature' in 2005 and 'Trains' in 2004 (neither covered in this archive). The post count was reset when /n/ transitioned to Transportation.
  • /sp/ was killed in 2006 and was brought back with a post number reset on the same day as /n/ became Transportation, 2008-02-19.
  • /con/ was reset three times during this period, between different cons.
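
To illustrate the idea, here is a rough Python sketch of assigning a post to a numbered section by its date. Only the 2008-02-19 reset date comes from the text above. The RESETS table, the function name, and the idea that sections are numbered in chronological order (n1 before the reset, n2 after) are assumptions for illustration, not a description of how this site actually did the split.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical table of reset dates per board, in ascending order.
# Other reset boards would need their own dates, which are not given here.
RESETS = {"n": [date(2008, 2, 19)]}


def section_name(board: str, posted_on: date) -> str:
    """Return e.g. 'n1' or 'n2' for reset boards, or the plain name otherwise."""
    resets = RESETS.get(board)
    if not resets:
        return board
    # Number of resets on or before the post date, plus 1, is the section index.
    return f"{board}{bisect_right(resets, posted_on) + 1}"


print(section_name("n", date(2007, 5, 1)))   # before the reset
print(section_name("n", date(2008, 3, 1)))   # after the reset
print(section_name("a", date(2008, 3, 1)))   # never reset
```

A date-level cutoff is a simplification: posts from both iterations could fall on the reset day itself, so a real split would need the reset time or the post-number ranges.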


Could I get a dump of the data?
Here's a repaired version of chanarchive.tar.gz in .sql format. I'd like to upload the Asagi version of the dump (the format used by this site) too sometime.

Reports / Contact

For personal information takedown requests, valid links to illegal content, or bugged post data, use the report function. To report other problems or for other inquiries email admin[@]sage.moe.