00:00:06<sudesu>ah, thank you! but I need historical data ^_^;
00:03:37<sudesu>I was hoping to find an updated version of https://boards.4chan.org/t/thread/1153106#p1153107
00:08:06<sudesu>are there any guides to running your own 4chan archive around here? I have the storage and resources to do so
00:27:13etnguyen03 quits [Client Quit]
00:27:36arch quits [Ping timeout: 256 seconds]
00:36:50Terbium quits [Quit: No Ping reply in 180 seconds.]
00:38:43<@JAA>There's a large page about 4chan on the wiki.
00:38:55<@JAA>Including mentions of mirroring tools etc., IIRC.
00:42:57arch (arch) joins
00:45:12Terbium joins
00:51:47opl3 (opl) joins
00:53:29opl quits [Ping timeout: 272 seconds]
00:53:29opl3 is now known as opl
00:58:37<klea>afaik it's not tools but AT archiving archives :p
00:58:59<sudesu>anon it's really hard to read through the wiki, I don't even know what to search for, I am just looking for a data dump (*/_\)
00:59:01<klea>oh there are tools sorry
00:59:18<klea>afaik you'd have to make it yourself?
00:59:43<sudesu>surely every archive isn't making their own dataset, right? @_@
01:01:33<klea>https://github.com/eksopl/fuuka?
01:01:38etnguyen03 (etnguyen03) joins
01:02:09Webuser874861 joins
01:02:24Webuser874861 quits [Client Quit]
01:02:29<sudesu>mostly I am just looking for some tar file with all the posts ever posted to 4chan, so I can then search through it
01:09:41<hexagonwin>sudesu: maybe https://archive.4plebs.org/_/articles/credits/ can help?
01:10:36<hexagonwin>4plebs has data dumps on IA so at least you can get /pol/ i believe
01:27:24nimaje1 joins
01:27:57nimaje quits [Read error: Connection reset by peer]
01:30:29<nicolas17>I wouldn't even expect "archive of all posts ever posted to 4chan" to be a thing that exists
01:32:12<nicolas17>I remember some website that archived /b/ threads (individual threads specifically requested by users) which had special measures to refresh the archive every *second* when the thread was close to expiration in order to catch the last posts
01:37:11<pabs>arkiver: btw, would getting wiki diff traffic on project channels be feasible/good?
01:37:15<pabs>guess the script would have to parse the template, and somehow make sure not to spam channels it shouldn't
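A minimal sketch of what pabs describes: poll the wiki's recent changes and route diff links to per-project channels. The MediaWiki query parameters are real; the `api.php` endpoint path, the page-to-channel map, and the channel names are assumptions (a real version would parse the map out of the wiki template, as pabs notes).

```python
# Sketch: route wiki diff traffic to project channels (hypothetical map).
import requests

API = "https://wiki.archiveteam.org/api.php"  # assumed standard MediaWiki path

# Hypothetical allowlist: only pages explicitly mapped here get announced,
# which is one way to "make sure not to spam channels it shouldn't".
PAGE_TO_CHANNEL = {
    "Adobe Aero": "#example-project",
    "Distributed recursive crawls": "#example-dev",
}

def recent_changes(limit=50):
    params = {
        "action": "query", "list": "recentchanges",
        "rcprop": "title|ids|user", "rclimit": limit, "format": "json",
    }
    r = requests.get(API, params=params, timeout=30)
    r.raise_for_status()
    return r.json()["query"]["recentchanges"]

for rc in recent_changes():
    channel = PAGE_TO_CHANNEL.get(rc["title"])
    if channel:
        diff = f"https://wiki.archiveteam.org/?diff={rc['revid']}&oldid={rc['old_revid']}"
        print(f"{channel}: {rc['user']} edited {rc['title']}: {diff}")
```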
01:40:07<nicolas17>I'm getting another file listing of the parti bucket to diff it later
01:40:58<nicolas17>so far it looks like the channels directory had many new videos as people keep streaming and many deleted videos as they expire, but the ivs directory (listing still in progress) had zero changes
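For the listing diff nicolas17 mentions, set arithmetic over two snapshots is enough; a minimal sketch, assuming plain-text listings with one object key per line (the file names are hypothetical):

```python
# Diff two bucket listings taken at different times.
def load(path):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

before = load("parti-listing-old.txt")
after = load("parti-listing-new.txt")

print("added:  ", len(after - before))   # e.g. newly streamed videos
print("removed:", len(before - after))   # e.g. expired videos
```

At the scale mentioned later in the log (roughly 200M keys), sorted files fed to `comm -3` would be far gentler on memory than in-memory sets.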
01:44:50khaoohs joins
01:48:35Island joins
02:05:55guest1 joins
02:09:13<sudesu>hexagonwin: thanks for the link
02:09:13<sudesu>I was hoping there was something more recently updated ^_^;
02:09:13<sudesu>I found https://archive.org/details/4plebsimagedump has "data dump 2024"
02:09:13<sudesu>I suppose the best way to get my hands on the data is just asking the people running the archives?
02:09:13<sudesu>nicolas17: tbh I was expecting all the posts to already be concatenated into a single archive @_@
02:09:13<sudesu>is the current state of 4chan archives really so uncoordinated?
02:10:58<nicolas17>¯\_(ツ)_/¯ I didn't even know of the ones listed in the archiveteam wiki
02:11:24<nicolas17>and there doesn't seem to be any current 4chan archival project as part of archiveteam?
02:12:36<nicolas17>seems kind of hard with our infrastructure given the thread expiration thing
02:13:55arch quits [Ping timeout: 272 seconds]
02:13:58<nicolas17>with a normal forum, when we archive everything, years-old threads aren't gonna change anymore and we save them in full, and recent threads may get new posts after archiving and we'll lose those posts
02:14:34<nicolas17>with 4chan, archiving a thread too early would miss new posts but *also* archiving too late means it could expire
02:15:08<nicolas17>and our DPoS stuff has no guarantees of how long it may take for something to get archived once added to the queue...
02:16:33<nicolas17>doing "continuous" archival of 4chan would take quite some effort
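A minimal sketch of the timing problem, in the spirit of the /b/ archiver nicolas17 recalls above: poll slowly while a thread is healthy, then tighten the interval once it looks close to expiring. The `a.4cdn.org` JSON endpoint and the `bumplimit`/`archived` flags are part of 4chan's public read-only API; the schedule, the "near death" heuristic, and `save_snapshot` are assumptions.

```python
import time
import requests

def fetch_thread(board, no):
    r = requests.get(f"https://a.4cdn.org/{board}/thread/{no}.json", timeout=30)
    if r.status_code == 404:
        return None  # thread expired or was deleted
    r.raise_for_status()
    return r.json()

def save_snapshot(board, no, thread):
    pass  # placeholder: write the JSON somewhere durable (WARC, DB, ...)

def poll_until_gone(board, no):
    while True:
        thread = fetch_thread(board, no)
        if thread is None:
            return  # we got everything we're going to get
        save_snapshot(board, no, thread)
        op = thread["posts"][0]
        # Hypothetical heuristic: past the bump limit or already archived
        # means the thread is about to fall off, so poll much faster.
        near_death = op.get("bumplimit") or op.get("archived")
        time.sleep(1 if near_death else 60)
```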
02:19:07<sudesu>ah, I see... sorry, I didn't realize archiveteam had so much on its plate. I honestly thought it was only focused on 4chan ^_^; my bad lol
02:23:57arch (arch) joins
02:26:42hexagonwin_ joins
02:28:18hexagonwin quits [Ping timeout: 256 seconds]
02:29:35<klea>it'd be interesting to try to make the warrior thingy allow working on more than one project at a time, allowing some kind of language to make tasks be able to be delegated to specific systems based on things like if they have a lot of storage, where they're going out to the internet from, etc (basically allow individually picking warriors to use for specific projects)
03:00:42<nicolas17>sudesu: #archivebot is a fun place to watch for a while :P
03:01:49<Guest>adding to this ^^ - i think the simplest way is running a cronjob every x hours to update "warrior characteristics" (what you're describing), and having it pull the latest AT warrior allocation characteristics ("WAC" i guess). then, the warrior filters through all of those and decides the best project to work on. so most of the work is client-side besides fetching and maintaining the WAC from AT servers. cc: klea
03:02:31<Guest>WAC is a cool name
03:02:46<nicolas17>wireless auto config
03:03:50<Guest>isn't everything wireless now
03:04:10<nicolas17>>wireless device
03:04:11<nicolas17>>look inside
03:04:13<nicolas17>>wires
03:05:09guest1 quits [Client Quit]
03:05:12<Guest>or "warrior auto config", but it isn't necessarily a config
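A minimal sketch of the scheme klea and Guest are describing: the warrior periodically fetches the "WAC" and picks the highest-priority project whose requirements it can satisfy. The endpoint, schema, traits, and scoring are all hypothetical; nothing like this exists in the warrior today.

```python
import requests

WAC_URL = "https://example.archiveteam.org/wac.json"  # hypothetical endpoint
MY_TRAITS = {"free_disk_gb": 500, "region": "EU"}     # hypothetical traits

def pick_project(wac, traits):
    best, best_priority = None, -1
    for project in wac["projects"]:
        need = project.get("requires", {})
        if need.get("min_disk_gb", 0) > traits["free_disk_gb"]:
            continue  # hard requirement not met
        if "region" in need and need["region"] != traits["region"]:
            continue
        if project.get("priority", 0) > best_priority:
            best, best_priority = project["name"], project["priority"]
    return best

wac = requests.get(WAC_URL, timeout=30).json()
print("working on:", pick_project(wac, MY_TRAITS))
```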
03:23:00Guest58 quits [Quit: My Mac has gone to sleep. ZZZzzz…]
03:30:56<klea>it's even more fun to connect to ws://archivebot.archivingyoursh.it/stream and see it move
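Watching that stream takes only a few lines; a minimal sketch using the third-party `websockets` package, printing frames raw since their schema isn't documented in this log:

```python
import asyncio
import websockets

async def watch(url="ws://archivebot.archivingyoursh.it/stream"):
    async with websockets.connect(url) as ws:
        async for message in ws:  # one frame per ArchiveBot event
            print(message)

asyncio.run(watch())
```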
03:31:54etnguyen03 quits [Client Quit]
03:37:29<that_lurker>katia++
03:37:30<eggdrop>[karma] 'katia' now has 111 karma!
03:37:48etnguyen03 (etnguyen03) joins
03:51:21<h2ibot>Cooljeanius edited Adobe Aero (+41, /* Archival progress */ linkify): https://wiki.archiveteam.org/?diff=58460&oldid=58226
03:53:02etnguyen03 quits [Remote host closed the connection]
04:13:17Island quits [Read error: Connection reset by peer]
04:15:23Wohlstand (Wohlstand) joins
04:38:06<nicolas17>parti bucket grew from 197M files to 208M files
04:38:55DogsRNice quits [Read error: Connection reset by peer]
04:41:50Guest58 joins
04:47:33Guest58 quits [Client Quit]
05:05:30andrewnyr quits [Quit: Ping timeout (120 seconds)]
05:07:03Guest58 joins
05:08:19cyanbox joins
05:10:12andrewnyr joins
05:14:42Guest58 quits [Client Quit]
05:31:35Guest58 joins
05:33:27Guest58 quits [Client Quit]
05:34:04Guest58 joins
05:40:10Guest58 quits [Client Quit]
05:42:30Guest58 joins
05:43:56HackMii (hacktheplanet) joins
05:47:17SootBector quits [Remote host closed the connection]
05:48:24SootBector (SootBector) joins
05:51:25Guest58 quits [Client Quit]
05:53:53Guest58 joins
05:57:39ericgallager joins
05:58:07cooljeanius quits [Ping timeout: 272 seconds]
06:07:25Guest58 quits [Client Quit]
06:24:28Guest58 joins
06:27:20Guest58 quits [Client Quit]
06:31:19nexussfan quits [Quit: Konversation terminated!]
06:31:19Guest58 joins
06:33:33Guest58 quits [Client Quit]
06:34:22Guest58 joins
06:36:07agtsmith quits [Ping timeout: 272 seconds]
06:36:22Guest58 quits [Client Quit]
06:40:04Guest58 joins
06:43:00Guest58 quits [Client Quit]
06:44:59ericgallager quits [Ping timeout: 272 seconds]
06:50:53Guest58 joins
06:51:34agtsmith joins
06:54:49Guest58 quits [Client Quit]
06:58:33Guest58 joins
07:00:16Guest58 quits [Client Quit]
07:29:26benjins3 quits [Read error: Connection reset by peer]
08:08:07Boppen_ quits [Read error: Connection reset by peer]
08:12:17HackMii quits [Remote host closed the connection]
08:12:34HackMii (hacktheplanet) joins
08:13:28Boppen (Boppen) joins
08:30:59Guest58 joins
08:40:21PredatorIWD25 quits [Read error: Connection reset by peer]
08:40:32PredatorIWD25 joins
08:56:25Webuser445398 joins
08:59:29Dada joins
09:47:06Webuser445398 leaves
10:51:43BornOn420_ (BornOn420) joins
10:51:59BornOn420 quits [Ping timeout: 272 seconds]
10:56:42<cruller>The concept of https://archiveready.com/ is similar to that of https://wiki.archiveteam.org/index.php/Obstacles
10:57:21<cruller>I guess this tool focuses on obstacles not intentionally created by the site owner and is similar in functionality to an SEO checker.
11:15:45tertu (tertu) joins
11:18:16Webuser356513 joins
11:18:42tertu2 quits [Ping timeout: 256 seconds]
11:19:10Webuser356513 quits [Client Quit]
11:51:38<h2ibot>Manu edited Distributed recursive crawls (+70, Candidates: Add www.artinliverpool.com): https://wiki.archiveteam.org/?diff=58462&oldid=58218
11:57:41sudesu quits [Quit: Ooops, wrong browser tab.]
12:14:55<cruller>Probably a stupid question: Why are custom downloaders like "foo-grab" usually executed via DPoS?
12:17:56<cruller>There are many tasks that don't require DPoS manpower but require custom downloaders.
12:36:31benjins3 joins
12:46:42HP_Archivist quits [Read error: Connection reset by peer]
13:31:32<masterx244|m>those that require custom stuff but not a DPoS are usually run by core AT members directly; see any [J]AA qwarc shenanigans for reference
13:52:43Shyy46 quits [Quit: The Lounge - https://thelounge.chat]
13:52:56Shyy46 joins
13:53:21Shyy46 quits [Client Quit]
13:53:39Shyy46 joins
13:58:47Shyy46 quits [Client Quit]
13:59:01Shyy46 joins
14:34:43Wohlstand quits [Quit: Wohlstand]
14:34:49sec^nd quits [Ping timeout: 276 seconds]
14:40:33sec^nd (second) joins
14:48:28sec^nd quits [Ping timeout: 276 seconds]
14:58:30<cruller>masterx244: Yeah, I know a few such examples too. Those make sense.
15:01:58TheEnbyperor quits [Ping timeout: 256 seconds]
15:02:02<cruller>To be honest, I myself don't have a clear idea about how such tasks should be handled...
15:02:47TheEnbyperor_ quits [Ping timeout: 272 seconds]
15:03:01sec^nd (second) joins
15:03:35<justauser>https://archiveready.com/ and other sites of the owner look pretty abandoned...
15:03:57<justauser>With a bit of irony, I should probably feed them to AB.
15:09:25Cuphead2527480 (Cuphead2527480) joins
15:13:36<cruller>koichi: When discussing facts rather than ideals, I guess many people archive them independently of AT.
15:19:11<cruller>justauser: Fun fact: The owner, Vangelis Banos, is the author of SPN2 API Docs.