| 00:00:06 | <sudesu> | ah, thank you! but I need historical data ^_^; |
| 00:03:37 | <sudesu> | I was hoping to find an updated version of https://boards.4chan.org/t/thread/1153106#p1153107 |
| 00:08:06 | <sudesu> | are there any guides to running your own 4chan archive around here? I have the storage and resources to do so |
| 00:27:13 | | etnguyen03 quits [Client Quit] |
| 00:27:36 | | arch quits [Ping timeout: 256 seconds] |
| 00:36:50 | | Terbium quits [Quit: No Ping reply in 180 seconds.] |
| 00:38:43 | <@JAA> | There's a large page about 4chan on the wiki. |
| 00:38:55 | <@JAA> | Including mentions of mirroring tools etc., IIRC. |
| 00:42:57 | | arch (arch) joins |
| 00:45:12 | | Terbium joins |
| 00:51:47 | | opl3 (opl) joins |
| 00:53:29 | | opl quits [Ping timeout: 272 seconds] |
| 00:53:29 | | opl3 is now known as opl |
| 00:58:37 | <klea> | afaik it's not tools but AT archiving archives :p |
| 00:58:59 | <sudesu> | anon it's really hard to read through the wiki, I don't even know what to search for, I am just looking for a data dump (*/_\) |
| 00:59:01 | <klea> | oh there are tools sorry |
| 00:59:18 | <klea> | afaik you'd have to make it yourself? |
| 00:59:43 | <sudesu> | surely every archive isn't making their own dataset, right? @_@ |
| 01:01:33 | <klea> | https://github.com/eksopl/fuuka? |
| 01:01:38 | | etnguyen03 (etnguyen03) joins |
| 01:02:09 | | Webuser874861 joins |
| 01:02:24 | | Webuser874861 quits [Client Quit] |
| 01:02:29 | <sudesu> | mostly I am just looking for some tar file with all the posts ever posted to 4chan, so I can then search through it |
| 01:09:41 | <hexagonwin> | sudesu: maybe https://archive.4plebs.org/_/articles/credits/ can help? |
| 01:10:36 | <hexagonwin> | 4plebs has data dumps on IA so at least you can get /pol/ i believe |
| 01:27:24 | | nimaje1 joins |
| 01:27:57 | | nimaje quits [Read error: Connection reset by peer] |
| 01:30:29 | <nicolas17> | I wouldn't even expect "archive of all posts ever posted to 4chan" to be a thing that exists |
| 01:32:12 | <nicolas17> | I remember some website that archived /b/ threads (individual threads specifically requested by users) which had special measures to refresh the archive every *second* when the thread was close to expiration in order to catch the last posts |
| 01:37:11 | <pabs> | arkiver: btw, would getting wiki diff traffic on project channels be feasible/good? |
| 01:37:15 | <pabs> | guess the script would have to parse the template, and somehow make sure not to spam channels it shouldn't |
| 01:40:07 | <nicolas17> | I'm getting another file listing of the parti bucket to diff it later |
| 01:40:58 | <nicolas17> | so far it looks like the channels directory had many new videos as people keep streaming and many deleted videos as they expire, but the ivs directory (listing still in progress) had zero changes |
| 01:44:50 | | khaoohs joins |
| 01:48:35 | | Island joins |
| 02:05:55 | | guest1 joins |
| 02:09:13 | <sudesu> | hexagonwin: thanks for the link |
| 02:09:13 | <sudesu> | I was hoping there was something more recently updated ^_^; |
| 02:09:13 | <sudesu> | I found https://archive.org/details/4plebsimagedump has "data dump 2024" |
| 02:09:13 | <sudesu> | I suppose the best way to get my hands on the data is just asking the people running the archives? |
| 02:09:13 | <sudesu> | nicolas17: tbh I was expecting all the posts to already be concatenated into a single archive @_@ |
| 02:09:13 | <sudesu> | is the current state of 4chan archives really so uncoordinated? |
| 02:10:58 | <nicolas17> | ¯\_(ツ)_/¯ I didn't even know of the ones listed in the archiveteam wiki |
| 02:11:24 | <nicolas17> | and there doesn't seem to be any current 4chan archival project as part of archiveteam? |
| 02:12:36 | <nicolas17> | seems kind of hard with our infrastructure given the thread expiration thing |
| 02:13:55 | | arch quits [Ping timeout: 272 seconds] |
| 02:13:58 | <nicolas17> | with a normal forum, when we archive everything, years-old threads aren't gonna change anymore and we save them in full, and recent threads may get new posts after archiving and we'll lose those posts |
| 02:14:34 | <nicolas17> | with 4chan, archiving a thread too early would miss new posts but *also* archiving too late means it could expire |
| 02:15:08 | <nicolas17> | and our DPoS stuff has no guarantees of how long it may take for something to get archived once added to the queue... |
| 02:16:33 | <nicolas17> | doing "continuous" archival of 4chan would take quite some effort |
| 02:19:07 | <sudesu> | ah, I see... sorry, I didn't realize archiveteam had so much on its plate. I honestly thought it was only focused on 4chan ^_^; my bad lol |
| 02:23:57 | | arch (arch) joins |
| 02:26:42 | | hexagonwin_ joins |
| 02:28:18 | | hexagonwin quits [Ping timeout: 256 seconds] |
| 02:29:35 | <klea> | it'd be interesting to try to make the warrior thingy allow working on more than one project at a time, allowing some kind of language to make tasks be able to be delegated to specific systems based on things like if they have a lot of storage, where they're going out to the internet from, etc (basically allow individually picking warriors to use for specific projects) |
| 03:00:42 | <nicolas17> | sudesu: #archivebot is a fun place to watch for a while :P |
| 03:01:49 | <Guest> | adding to this ^^ - i think the simplest way is running a cronjob every x hours to update "warrior characteristics" (what you're describing), and having it pull the latest AT warrior allocation characteristics ("WAC" i guess). then, the warrior filters through all of those and decides the best project to work on. so most of the work is client-side besides fetching and maintaining the WAC from AT servers. cc: klea |
| 03:02:31 | <Guest> | WAC is a cool name |
| 03:02:46 | <nicolas17> | wireless auto config |
| 03:03:50 | <Guest> | isnt everything wireless now |
| 03:04:10 | <nicolas17> | >wireless device |
| 03:04:11 | <nicolas17> | >look inside |
| 03:04:13 | <nicolas17> | >wires |
| 03:05:09 | | guest1 quits [Client Quit] |
| 03:05:12 | <Guest> | or "warrior auto config", but it isnt necessarily a config |
| 03:23:00 | | Guest58 quits [Quit: My Mac has gone to sleep. ZZZzzz…] |
| 03:30:56 | <klea> | it's even more fun to connect to ws://archivebot.archivingyoursh.it/stream and see it move |
| 03:31:54 | | etnguyen03 quits [Client Quit] |
| 03:37:29 | <that_lurker> | katia++ |
| 03:37:30 | <eggdrop> | [karma] 'katia' now has 111 karma! |
| 03:37:48 | | etnguyen03 (etnguyen03) joins |
| 03:51:21 | <h2ibot> | Cooljeanius edited Adobe Aero (+41, /* Archival progress */ linkify): https://wiki.archiveteam.org/?diff=58460&oldid=58226 |
| 03:53:02 | | etnguyen03 quits [Remote host closed the connection] |
| 04:13:17 | | Island quits [Read error: Connection reset by peer] |
| 04:15:23 | | Wohlstand (Wohlstand) joins |
| 04:38:06 | <nicolas17> | parti bucket grew from 197M files to 208M files |
| 04:38:55 | | DogsRNice quits [Read error: Connection reset by peer] |
| 04:41:50 | | Guest58 joins |
| 04:47:33 | | Guest58 quits [Client Quit] |
| 05:05:30 | | andrewnyr quits [Quit: Ping timeout (120 seconds)] |
| 05:07:03 | | Guest58 joins |
| 05:08:19 | | cyanbox joins |
| 05:10:12 | | andrewnyr joins |
| 05:14:42 | | Guest58 quits [Client Quit] |
| 05:31:35 | | Guest58 joins |
| 05:33:27 | | Guest58 quits [Client Quit] |
| 05:34:04 | | Guest58 joins |
| 05:40:10 | | Guest58 quits [Client Quit] |
| 05:42:30 | | Guest58 joins |
| 05:43:56 | | HackMii (hacktheplanet) joins |
| 05:47:17 | | SootBector quits [Remote host closed the connection] |
| 05:48:24 | | SootBector (SootBector) joins |
| 05:51:25 | | Guest58 quits [Client Quit] |
| 05:53:53 | | Guest58 joins |
| 05:57:39 | | ericgallager joins |
| 05:58:07 | | cooljeanius quits [Ping timeout: 272 seconds] |
| 06:07:25 | | Guest58 quits [Client Quit] |
| 06:24:28 | | Guest58 joins |
| 06:27:20 | | Guest58 quits [Client Quit] |
| 06:31:19 | | nexussfan quits [Quit: Konversation terminated!] |
| 06:31:19 | | Guest58 joins |
| 06:33:33 | | Guest58 quits [Client Quit] |
| 06:34:22 | | Guest58 joins |
| 06:36:07 | | agtsmith quits [Ping timeout: 272 seconds] |
| 06:36:22 | | Guest58 quits [Client Quit] |
| 06:40:04 | | Guest58 joins |
| 06:43:00 | | Guest58 quits [Client Quit] |
| 06:44:59 | | ericgallager quits [Ping timeout: 272 seconds] |
| 06:50:53 | | Guest58 joins |
| 06:51:34 | | agtsmith joins |
| 06:54:49 | | Guest58 quits [Client Quit] |
| 06:58:33 | | Guest58 joins |
| 07:00:16 | | Guest58 quits [Client Quit] |
| 07:29:26 | | benjins3 quits [Read error: Connection reset by peer] |
| 08:08:07 | | Boppen_ quits [Read error: Connection reset by peer] |
| 08:12:17 | | HackMii quits [Remote host closed the connection] |
| 08:12:34 | | HackMii (hacktheplanet) joins |
| 08:13:28 | | Boppen (Boppen) joins |
| 08:30:59 | | Guest58 joins |
| 08:40:21 | | PredatorIWD25 quits [Read error: Connection reset by peer] |
| 08:40:32 | | PredatorIWD25 joins |
| 08:56:25 | | Webuser445398 joins |
| 08:59:29 | | Dada joins |
| 09:47:06 | | Webuser445398 leaves |
| 10:51:43 | | BornOn420_ (BornOn420) joins |
| 10:51:59 | | BornOn420 quits [Ping timeout: 272 seconds] |
| 10:56:42 | <cruller> | https://archiveready.com/ 's concept shares similarities with that of https://wiki.archiveteam.org/index.php/Obstacles |
| 10:57:21 | <cruller> | I guess this tool focuses on obstacles not intentionally created by the site owner and is similar in functionality to an SEO checker. |
| 11:15:45 | | tertu (tertu) joins |
| 11:18:16 | | Webuser356513 joins |
| 11:18:42 | | tertu2 quits [Ping timeout: 256 seconds] |
| 11:19:10 | | Webuser356513 quits [Client Quit] |
| 11:51:38 | <h2ibot> | Manu edited Distributed recursive crawls (+70, Candidates: Add www.artinliverpool.com): https://wiki.archiveteam.org/?diff=58462&oldid=58218 |
| 11:57:41 | | sudesu quits [Quit: Ooops, wrong browser tab.] |
| 12:14:55 | <cruller> | Probably a stupid question: Why are custom downloaders like "foo-grab" usually executed via DPoS? |
| 12:17:56 | <cruller> | There are many tasks that don't require DPoS manpower but require custom downloaders. |
| 12:36:31 | | benjins3 joins |
| 12:46:42 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 13:31:32 | <masterx244|m> | those that require custom stuff but not a DPoS are usually run by core AT members directly. reference any [J]AA qwarc shenanigans |
| 13:52:43 | | Shyy46 quits [Quit: The Lounge - https://thelounge.chat] |
| 13:52:56 | | Shyy46 joins |
| 13:53:21 | | Shyy46 quits [Client Quit] |
| 13:53:39 | | Shyy46 joins |
| 13:58:47 | | Shyy46 quits [Client Quit] |
| 13:59:01 | | Shyy46 joins |
| 14:34:43 | | Wohlstand quits [Quit: Wohlstand] |
| 14:34:49 | | sec^nd quits [Ping timeout: 276 seconds] |
| 14:40:33 | | sec^nd (second) joins |
| 14:48:28 | | sec^nd quits [Ping timeout: 276 seconds] |
| 14:58:30 | <cruller> | masterx244: Yeah, I know a few such examples too. Those make sense. |
| 15:01:58 | | TheEnbyperor quits [Ping timeout: 256 seconds] |
| 15:02:02 | <cruller> | To be honest, I myself don't have a clear idea about how such tasks should be handled... |
| 15:02:47 | | TheEnbyperor_ quits [Ping timeout: 272 seconds] |
| 15:03:01 | | sec^nd (second) joins |
| 15:03:35 | <justauser> | https://archiveready.com/ and other sites of the owner look pretty abandoned... |
| 15:03:57 | <justauser> | With a bit of irony, I should probably feed them to AB. |
| 15:09:25 | | Cuphead2527480 (Cuphead2527480) joins |
| 15:13:36 | <cruller> | koichi: When discussing facts rather than ideals, I guess many people archive them independently of AT. |
| 15:19:11 | <cruller> | justauser: Fun fact: The owner, Vangelis Banos, is the author of SPN2 API Docs. |