Update 2024-03-27: Greatly expanded the "Samples" page and renamed it to "Glossary".
Update 2024-04-04: Added 5 million mid-2011 posts from the k47 post dump. Browse (most of) them here.
Update 2024-04-07: Added ~400 October 2003 posts from 4chan.net. Browse them here.

Welcome to Oldfriend Archive, hosting ~170M text-only 2003-2014 4chan posts (mostly 2006-2008).

About

Oldfriend Archive

Updated 2024-04-07

This is a searchable archive of old (late 2000s to early 2010s) 4chan posts. Currently it hosts the "Archive Ten Billion" dump of 2005-2008 era 4chan threads, a complete scrape of the textboards, most of the text content from chanarchive.org, another dump of mid-2011 posts, and about 400 2003 posts from 4chan.net. Among these are the very earliest posts for many boards: /fa/, /fit/, /hc/, /jp/, /n/ (the Transportation reset), /r9k/, /tg/, /toy/, /trv/, /x/. It includes posts from almost all* 4chan boards of the era - about 173 million posts/11.2 million threads in all, ranging from 2003 to 2014, but mostly from mid-2006 to November 2008 (see the stats page). Most of this content, as of this time, is not hosted elsewhere.

*missing /f/ (flash), /j/ (the sekrit club) and /yg/.

There are no NSFW images hosted here, but plenty of NSFW language.

For a different way to browse some of the contents of this archive, check out 4museum (unaffiliated, est. 2019).

Reports / Contact

For personal information takedown requests, valid links to illegal content, or bugged post data, use the report function. To report other problems or for other inquiries email admin[@]sage.moe.

Q&A

What's the point of this site?
This site exists to be an easily-accessible repository of late 2000s - early 2010s internet culture. The era before Facebook/Twitter took over. I hope that it will help people researching internet history (such as the first SCPs), nostalgic oldfriends, and newfriends wanting to RP as oldfriends.

Who is this "Four Chan"?
For some time in the late 2000s (the main period covered by this archive), 4chan was a nexus of internet culture. In that period, it generated countless memes and other original content which propagated out to the rest of the internet.

It was also one of several major gathering points for anonymous online agitators (i.e. the internet hate machine), who would organize raids (concerted trolling) against various targets. In 2007, this activity had earned them enough notoriety to be recognized as "hackers on steroids" in an infamous Fox News broadcast. By 2008 with Project Chanology, Anonymous had exploded into real life and normies everywhere lived in fear of masked hackers blowing up their computers.

Of course, the vast majority of the content was just inane bullshit, and that is reflected here.

Where did you get this data?
I sourced most of the data from the Archive Ten Billion dump Internet Archive item, which was uploaded by Jason Scott in 2018; he had originally acquired the data in 2009. The original scraper was not identified (it probably wasn't Jason). The textboards content also came from an anon on archive.org, the chanarchive.org and 4chan.net content was recovered in HTML form from waybackmachine scrapes, and the 2011 dump was hosted on a filesharing site.

The original IA upload was borked in a few ways, and the data needed non-trivial repair work before it could be fully migrated to a searchable online archive format. See History for more.

Who originally scraped the Archive Ten Billion data?
The archive of database files in the Ten Billion IA item is named chanarchive.tar.gz, which might lead someone to assume that the dump comes from 4chanarchive/chanarchive, the best-known archive from this early in 4chan's history (4chanarchive ran from 2006 to 2013). But this is a coincidence.

The data might originate from a site called Rapidsearch, which was an index of links to uploads on file-sharing sites like Rapidshare. This would better explain why the first posts in the dump come from 4chan's /r/ (where file-sharing links would frequently be shared), why the dump contains threads from other sites besides 4chan (which match sites covered by Rapidsearch), and why the scrape coverage is comprehensive instead of selective like 4chanarchive (which served under 20,000 saved threads at its peak).

Whoever it was though, thanks for sharing.

How can I trust the data here?
I compared sample posts/threads to the GETs encyclopedia, the 2008 /a/ overlap with Desuarchive, and 4chanarchive threads on waybackmachine (e.g. compare the first thread here to this). Most of the posts here can't be found anywhere else, but those that I've been able to test match other sources.

Where are the images?
For the Archive Ten Billion posts, all image data except a binary "post has image" indicator is missing from the source. This includes the thumbnail/full-size image files, original filenames, file sizes and md5 hashes. A placeholder image is used here (with dummy metadata).

For the early 2010s posts, most image metadata was kept in the source, but most of the images themselves were not scraped. The placeholder thumbnails for these posts will often appear distorted, since the original aspect ratios have been maintained.

Were any posts removed from the source archive?
About 10k posts with odd characteristics had to be removed from the Ten Billion dump to get FoolFuuka working. Quite a few of the textboard dump and chanarchive.org posts were malformed in one way or another and also had to be discarded.


What other data is missing from the posts?

See the section for each source archive.

  • Posts were only timestamped to the minute in the Archive Ten Billion source, except on /b/ for some reason. This means that posts ordered by timestamp are often disordered by number, since many will share the same timestamp. The same is true for some of the chanarchive.org (>2008) posts.
  • Capcodes are missing from the source.
  • Some ban messages seem to be missing from the source. Others are preserved.
  • All HTML tags have been stripped (some were stripped in the source).
  • Oekaki post data (drawing duration etc.) has been stripped.
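
The ordering problem in the first bullet can be worked around when reading a dump by using the post number as a tie-breaker. This is a minimal sketch, assuming each post is a record with `num` and `timestamp` fields; those field names and the sample values are made up for illustration and are not the archive's actual schema.

```python
from operator import itemgetter

# Hypothetical posts from one board iteration. Timestamps are epoch seconds
# truncated to the minute, so several posts share the same value.
posts = [
    {"num": 1003, "timestamp": 1199145660},
    {"num": 1001, "timestamp": 1199145600},
    {"num": 1002, "timestamp": 1199145600},
    {"num": 1004, "timestamp": 1199145660},
]


def in_posting_order(posts):
    """Sort by timestamp first, then by post number to break ties."""
    return sorted(posts, key=itemgetter("timestamp", "num"))


ordered_nums = [post["num"] for post in in_posting_order(posts)]
```

This only makes sense within a single board iteration, where post numbers increase with time.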


What's with the numbered boards (/n1-2/, /sp1-2/, /r9k1-2/, /hc1-2/, /con1-4/)?
During the period covered by this archive, some of the boards were reset (cleared, post number reset to 1). This means that in some cases, post ranges in a board would overlap with those of an earlier iteration of the board. To prevent collisions with thread links and avoid confusing the software, affected boards have been split up into numbered sections.

  • /n/ switched from 'News' to 'Transportation' on 2008-02-19, so posts/threads from both of these iterations of /n/ are in the archive. Before switching to 'News' in 2006, /n/ was 'Nature' in 2005 and 'Trains' in 2004 (neither covered in this archive). The post count was reset when /n/ transitioned to Transportation.
  • /sp/ was killed in 2006 and was brought back with a post number reset on the same day as /n/ became Transportation, 2008-02-19.
  • /r9k/ was killed in early 2011 and brought back in late 2011.
  • /hc/ was killed in 2009 and brought back in 2011.
  • /con/ was reset 3 times during this period between different cons.
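
One way to picture the split is as a lookup from (board, post date) to a numbered section. The sketch below is only an illustration: the method actually used for this archive isn't described above, the `RESETS` table and `numbered_board` function are hypothetical names, and only the 2008-02-19 date for /n/ and /sp/ is taken from the list. Treating posts made on the reset day itself as belonging to the new iteration is also an assumption.

```python
from bisect import bisect_right
from datetime import date

# Reset dates per board (only the ones given exactly in the list above).
RESETS = {
    "n": [date(2008, 2, 19)],
    "sp": [date(2008, 2, 19)],
}


def numbered_board(board, posted):
    """Return the numbered section (e.g. 'n1', 'n2') for a post date.

    Boards with no known reset keep their plain name.
    """
    resets = RESETS.get(board)
    if not resets:
        return board
    # Number of resets on or before the post date, plus one.
    iteration = bisect_right(resets, posted) + 1
    return f"{board}{iteration}"
```

For example, `numbered_board("n", date(2007, 5, 1))` gives `"n1"` (News) and `numbered_board("n", date(2008, 3, 1))` gives `"n2"` (Transportation).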


What time zone does the archive use/what's with the timestamps?
Ignore the displayed date; this will be 4-5 hours off the correct time for your time zone. Treat the "4chan Time" date in the alt-text (mouse over the date) as correct UTC (regardless of your local time zone, and with the exception of some annual 1-hour periods on New York DST start dates, where 3am-3:59am happens twice). In most newer archives, "4chan Time" is ET/New York time, not UTC. Credit to this guy for noticing the discrepancy.

I might adjust the "4chan Time" to ET in the future, which would fix the displayed (localized) date as well. This would involve doing some ugly things in the database. The "4chan Time" issue affects other 4chan archives in various ways and has roots in the fact that early scrapers did not have access to a UTC field in the HTML.
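
To see where the 4-5 hour figure could come from, here is a minimal Python sketch. It assumes (as an interpretation, not something confirmed above) that the stored times are true UTC but the archive software reads their wall-clock digits as New York time before localizing. The function names are illustrative.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")


def misread_as_eastern(true_utc):
    """Reinterpret a UTC wall-clock reading as New York time.

    Returns the (wrong) UTC instant that results from the misreading.
    """
    wall_clock = true_utc.replace(tzinfo=NEW_YORK)  # same digits, new zone
    return wall_clock.astimezone(timezone.utc)


def display_error_hours(true_utc):
    """How many hours the misread time is ahead of the true time."""
    shifted = misread_as_eastern(true_utc)
    return (shifted - true_utc).total_seconds() / 3600


winter = datetime(2008, 1, 15, 12, 0, tzinfo=timezone.utc)
summer = datetime(2008, 7, 15, 12, 0, tzinfo=timezone.utc)
winter_error = display_error_hours(winter)  # standard time: 5 hours
summer_error = display_error_hours(summer)  # daylight time: 4 hours

# On a DST start date, 02:00-02:59 does not exist in New York, so a
# misread 02:30 and a misread 03:30 land on the same instant.
gap_a = misread_as_eastern(datetime(2008, 3, 9, 2, 30, tzinfo=timezone.utc))
gap_b = misread_as_eastern(datetime(2008, 3, 9, 3, 30, tzinfo=timezone.utc))
```

Under this assumption the error is 5 hours in winter and 4 hours in summer, and the collision on DST start dates would be why the 3am hour appears to happen twice.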

Could I get a dump of the data?
You can download a dump of the posts database including the Ten Billion, textboard and chanarchive.org content here. You can download a repaired version of the "Ten Billion" data here.

Source data

Archive Ten Billion (2006-2008 full archive)

This is the 90% complete 2006-2008 dump which makes up the majority of the archive. Tons of memes and other original content make their first appearances here.

Boards: /a/, /an/, /b/, /c/, /cgl/, /ck/, /cm/, /co/, /con/, /d/, /e/, /fa/, /fit/, /g/, /gif/, /h/, /hc/, /hr/, /i/, /ib/, /ic/, /ip/, /jp/, /k/, /m/, /mu/, /n/, /o/, /p/, /po/, /r/, /r9k/, /s/, /sp/, /t/, /tg/, /toy/, /trv/, /tv/, /u/, /v/, /w/, /wg/, /x/, /y/ (every board which existed between mid-2006 and mid-2008 except /f/, /j/ and /yg/)
Date range: Jan. 2005 - Dec. 2008 (but mid-2006 to late 2008 for most boards)
Coverage: Full (roughly 90% of all posts from mid-2006 to mid-2008, plus some of 2005 /r/)
Original archiver: Anonymous donation to Jason Scott of archive.org (probably from the "rapidsearch" filesharing link aggregator)
Source data: (Original upload) (Repaired version)
Post/Thread Count: 162M posts/10.8M threads

More info earlier on this page and in the History page.

Kludges

  • This dump was missing all image data except for a "has image" bool. All image data in the database has been faked, and the image files replaced with placeholders.
  • Some boards were reset (possibly multiple times) in this period, so the different iterations have been numbered (/con1-4/, /n1-2/, /sp1-2/).

2014 textboard progscrape scrape (2004-2014 full textboard archive)

This is a mostly-complete scrape of the 4chan text boards made in 2014, before their closure in 2015. The most notable content would probably be the programming discussion on /prog/ (companion board to /g/ and by far the largest board). It also goes back just a little further than the oldest posts in the Ten Billion dump (to December 2004).

Boards: /anime/, /book/, /carcom/, /comp/, /food/, /games/, /img/, /lang/, /lounge/, /music/, /newnew/, /newpol/, /prog/, /sci/, /sjis/, /sports/, /tech/, /tele/, /vip/ (all 4chan text boards extant in 2014)
Date range: Dec. 2004 - Apr. 2014 (from inception to dis.4chan posting shutdown for most boards)
Coverage: Full (~4% or ~90k posts lost to corruption, mostly /prog/)
Original archiver: Anonymous progrider.org user (ghonsonb3120 on archive.org)
Source data: (Original uploads)
Post/Thread Count: 2.2M posts/147k threads

From 2004 up until 2014/2015, 4chan hosted a number of text-only boards. In 2014, after the textboards were made read-only, an archiver ran progscrape, an archiving tool developed by a member of the /prog/ community (Xarn), on all textboards. The discussion is archived here, and the dumps can be downloaded from here. The archiver (ghonsonb3120 on archive.org) noted at the time that some of the posts (about 10% by my count) on /prog/ had malformed/invalid post dates or were otherwise borked. The same content is also available here, and it's probably better to use that site for citations since the content will be better formatted there. There's also this site for search, but the links no longer work. archive.today has a few snaps from the original site.

A few boards were deleted forever or reset before Dec. 2004, and those boards/board iterations are not present in the dump. These include the pilot /amh/ and /bbs/ boards added in February 2004, and the /dis/ and /sug/ site discussion/suggestions boards opened a few months later on newer software in April 2004. Also missing is an old /ascii/ art board (companion to /sjis/) and possibly others.

Kludges
This content suffers the most damage in translation of any content here due to some fundamental incompatibilities between imageboard and textboard content. The biggest difference is in post and thread numbering: on the text boards, posts are enumerated within the thread rather than having a post number with a scope global to the board. For example, the first 3 posts in a thread will be numbered 1, 2, 3. To reconcile this with the model used by the archive software running here, post numbers were faked/mocked. There were a million other issues concerning the different BBCode implementations. I tried to preserve the original post format as much as possible.

  • Post numbers have been faked for archive software compatibility (post numbers within a board need to be unique). The faked post numbers are in ascending time order.
  • Backlinks do not work for the reason above.
  • /sci/ and /vip/ have been renamed to /dissci/ and /disvip/ to disambiguate from the imageboard versions of those boards.
  • For reply posts, the original post numbers are in the subject line.
  • For OP posts, the original thread ID is in the subject line before the original topic. This is so that you can easily verify the thread content against another archive (archive.tinychan or the archive.org dumps), or search for a thread by its original ID (say if you had an old world4ch/dis.4chan bookmark).
  • Quotes are in the textboard blockquote style, as opposed to the usual imageboard >greentext. This was done with [code] tags.
  • Various other BBCode incompatibilities.
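
The renumbering described above can be sketched roughly as follows. This is an illustration under assumptions, not the script used for this archive: the field names (`thread_id`, `local_num`, `timestamp`, `subject`), the sample values and the exact subject-line format are all hypothetical.

```python
def assign_fake_numbers(posts, start=1):
    """Give textboard posts board-wide unique numbers in ascending time order.

    The original per-thread number (or the thread ID, for OPs) is kept in
    the subject line so the post can still be matched against other archives.
    """
    ordered = sorted(
        posts,
        key=lambda p: (p["timestamp"], p["thread_id"], p["local_num"]),
    )
    numbered = []
    for fake_num, post in enumerate(ordered, start=start):
        if post["local_num"] == 1:
            # OP: original thread ID, then the original topic.
            subject = f'{post["thread_id"]} {post["subject"]}'.strip()
        else:
            # Reply: original post number within the thread.
            subject = str(post["local_num"])
        numbered.append({**post, "num": fake_num, "subject": subject})
    return numbered


# Two made-up threads whose posts interleave in time.
posts = [
    {"thread_id": 1100000002, "local_num": 2, "timestamp": 250, "subject": ""},
    {"thread_id": 1100000001, "local_num": 1, "timestamp": 100, "subject": "Lisp"},
    {"thread_id": 1100000001, "local_num": 2, "timestamp": 200, "subject": ""},
    {"thread_id": 1100000002, "local_num": 1, "timestamp": 150, "subject": "Haskell"},
]
numbered = assign_fake_numbers(posts)
```

Because the per-thread numbers 1, 2, 3 are replaced by board-wide ones, any in-post reference to "post 2" no longer points at a real post number, which is why backlinks break.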

chanarchive.org waybackmachine scrape (2006-2013 partial archive)

waybackmachine scraped almost all the text content of chanarchive.org (formerly 4chanarchive.org) before it closed in 2013. These threads were added to the archive via a voting system, so only threads considered particularly noteworthy at the time made it in. The content is notable for bridging a period of time not covered by other archives, containing a lot of "nostalgic" OC, and being a "best of" collection (for what it's worth).

Boards: /3/, /a/, /adv/, /an/, /b/, /c/, /cgl/, /ck/, /cm/, /co/, /d/, /diy/, /e/, /fa/, /fit/, /g/, /gif/, /h/, /hc/, /hr/, /ic/, /int/, /jp/, /k/, /lit/, /m/, /mlp/, /mu/, /n/, /new/, /o/, /p/, /po/, /pol/, /r9k/, /s/, /sci/, /soc/, /sp/, /t/, /tg/, /toy/, /trv/, /tv/, /u/, /v/, /vg/, /vp/, /w/, /wg/, /wsg/, /x/, /y/ (most boards, missing /r/)
Date range: Nov. 2006 - July 2013
Coverage: Partial/selective (users would submit threads and then vote on which ones to keep)
Original archiver: Scraped by 4chanarchive.org/chanarchive.org, initially admined by capsized and later Edgeworth E. Euler, itself archived on waybackmachine/archive.org
Source data: (4chanarchive.org on waybackmachine) (chanarchive.org on waybackmachine) (no dump available)
Post/Thread Count: 4.1M posts/17k threads

There's some overlap with the Ten Billion dump here. I used a script to download the earliest version of each thread it could find. Most of the images are missing from waybackmachine. This scrape was taken from chanarchive.org snapshots, but chanarchive.org merged in all 4chanarchive content when the latter closed.

By my estimate, waybackmachine scraped around 95% of threads on the site before it closed down. This is based on a count of 18131 threads a month before shutdown, of which waybackmachine captured some 17600, and I recovered about 17400. There were a lot of little bugs in the HTML and my recovery script, which resulted in a loss of maybe 1-2% of posts on top of that.
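
The arithmetic behind that estimate can be checked with the counts given above. This is just a worked calculation; the variable names are illustrative, and applying the 1-2% post loss as a flat factor assumes lost posts are spread evenly across threads.

```python
threads_on_site = 18131       # count a month before shutdown
threads_snapshotted = 17600   # captured by waybackmachine
threads_recovered = 17400     # recovered from those snapshots

snapshot_rate = threads_snapshotted / threads_on_site
recovery_rate = threads_recovered / threads_on_site

# Further 1-2% post loss from HTML and recovery-script bugs.
effective_low = recovery_rate * (1 - 0.02)
effective_high = recovery_rate * (1 - 0.01)

print(f"snapshotted: {snapshot_rate:.1%}")  # 97.1%
print(f"recovered:   {recovery_rate:.1%}")  # 96.0%
print(f"after post loss: {effective_low:.1%} to {effective_high:.1%}")
```

So roughly 97% of threads were snapshotted and 96% recovered, which lands at about 94-95% of content once the extra post loss is counted.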

The majority of images on chanarchive.org were not saved by waybackmachine, and I won't import any.

Kludges

  • /3/ renamed to /3/ because otherwise the full-text search won't work (this is also an issue for other archivers)
  • /r9k/ split into /r9k1/ and /r9k2/, /hc/ split into /hc1/ and /hc2/
  • Some emails left out due to chanarchive.org's adoption of an email-hiding service from Cloudflare
  • Some usernames missing from posts with Cloudflare-hidden emails
  • Some posts lost due to malformed and unrecoverable timestamps
  • A handful of missing thread OP names assigned to "Anonymous"
  • Some files missing md5s (unlike the Ten Billion dataset, most of the image data is still there)
  • Ad posts needed to be stripped out (fake posts made to look like ads)

k47 5M post dump (mid-2011 ~full archive)

This is a mass scrape of ~5 million posts from ~all boards from a short interval in mid-2011. It comes from a blogger by the name of k47.

Boards: /a/, /adv/, /an/, /b/, /c/, /cgl/, /ck/, /co/, /d/, /e/, /fa/, /fit/, /g/, /gif/, /h/, /hr/, /ic/, /int/, /jp/, /k/, /lit/, /m/, /mu/, /n/, /o/, /p/, /po/, /r/, /s/, /sci/, /soc/, /sp/, /t/, /tg/, /toy/, /trv/, /tv/, /u/, /v/, /vp/, /w/, /wg/, /x/, /y/ (all contemporaneous boards except /cm/ and /i/)
Date range: 2011-04-20 to 2011-05-02
Coverage: About 75% of all posts over the scraping period
Original archiver: k47 of k47.cz
Source data: (archive.org reupload) (original schema upload)
Post/Thread Count: 4.95M posts/258k threads

The release of the Ten Billion archive in 2018 put the 4chan fossil record in a strange state. Suddenly, the late 2000s were better-covered by archives than the early 2010s. I think that this collection might more thoroughly answer the question "What was 4chan like in the early 2010s?" than any other, even if it is just a week or two’s worth of posts. Other archives from this time only cover certain boards, or offer a hand-picked (possibly unrepresentative) selection of threads.

This dump was uploaded as an initial sample of an archiving project by the Czech-language blogger k47 in 2011 to the filesharing site uloz.to, linked from an article on his blog. The link to the data expired sometime in late 2023 or early 2024.

The interval covered was roughly 2011-04-20 to 2011-05-02 (example search), with a handful of posts from before (stickies) and after these dates. The coverage of this date range is around 75%. Images were not saved/uploaded.

Kludges

  • Some image data was present, some was missing. md5s, full-size image dimensions, file sizes were present. Original filenames, 4chan-generated filenames, thumbnail dimensions were missing.
  • All images have been assigned the filename "unknown.jpg", and thumbnail dimensions have been synthesized to match the original image aspect ratios.
  • Timestamps on some boards are only accurate to the minute.
  • The original archiver already stripped out all the HTML, so some things like fortunes will look a little different.
  • Some comments long enough for a "Comment too long" message were truncated.
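
Synthesizing thumbnail dimensions from the full-size dimensions, as in the second bullet, amounts to scaling the image into a bounding box while keeping its aspect ratio. A minimal sketch follows; the 250-pixel default box is an assumption on my part, and the function is hypothetical rather than the code used for this archive.

```python
def thumbnail_size(width, height, max_side=250):
    """Scale (width, height) to fit a max_side square, keeping aspect ratio.

    Images already smaller than the box are left at their original size.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = min(1.0, max_side / max(width, height))
    # Never let a very thin image collapse to zero pixels.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, `thumbnail_size(1000, 500)` gives `(250, 125)`, and `thumbnail_size(100, 80)` is returned unchanged as `(100, 80)`.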

4chan.net 2003 waybackmachine crawls

This is a very small collection of posts taken from all 4chan.net board pages captured on waybackmachine in early October 2003, starting within 3 days of the site's launch.

Boards: /b/, /y/
Date range: 2003-10-04 to 2003-10-10
Coverage: ~10% of the first ~3000 posts on /b/, a few from /y/
Original archiver: waybackmachine (or one of their own sources)
Source data: (4chan.net waybackmachine links pre-domain change)
Post/Thread Count: 383 posts/121 threads

4chan was initially hosted at the domain "4chan.net". It only lasted there for a few months, until February 2004, before moot was forced to change hosting and moved the site to 4chan.org. While all historical snapshots of 4chan.org by waybackmachine are missing/hidden due to the site's robots.txt, this is not true of 4chan.net. It so happens that waybackmachine, or one of its sources, crawled some pages on 4chan.net shortly after it launched.

In terms of user-generated content, this crawl captured a couple of images and the HTML of 12 pages of /b/ and 1 page of /y/. It also captured a page of the oekaki board, which ran on different software. I integrated the /b/ and /y/ text content here. Between /b/ and /y/, it only comes out to under 400 posts, but it's interesting that this content from the first few days of the site persists considering how scarce 2003 to mid-2006 content is. This collection includes a few early posts from shut and moot. In sharp contrast to later years, almost every post was posted pseudonymously (93% of posts had custom names, and 61% had tripcodes) rather than anonymously.

You can check out the content here. 3000 GET.

Kludges

  • Some image data was present, some was missing. Thumbnail dimensions, file sizes, 4chan-generated thumbnail and full-size image filenames were present. Original filenames, image dimensions and image md5s were missing.
  • All images have been assigned the original filename "unknown.jpg".
  • Timestamps are only accurate to the minute. Posts posted in the same minute may appear out-of-order in search.