00:09:40yawkat` quits [Ping timeout: 255 seconds]
00:10:56yawkat (yawkat) joins
00:25:45HP_Archivist quits [Client Quit]
00:29:15Unholy23619246453 quits [Ping timeout: 265 seconds]
00:35:07leo60228 quits [Quit: ZNC 1.8.2 - https://znc.in]
00:35:57leo60228 (leo60228) joins
00:42:01dxrt_ is now known as dxrt
00:42:01dxrt quits [Changing host]
00:42:01dxrt (dxrt) joins
00:42:01@ChanServ sets mode: +o dxrt
00:58:21<pabs>c3manu: also https://wiki.archiveteam.org/index.php/Mastodon
01:29:52<@JAA>arkiver: TL;DR on that is: there are three forums for different regions (forum.worldoftanks.{com,eu,asia}), all closing 2024-05-20 09:00, JS challenge can be passed by finding the smallest integer i (from 0) for which `md5(tpc + '::' + i)` starts with `chk` and setting two cookies to tpc and i (should be done in pipeline.py probably, but also failing the item if a response has the challenge later),
01:29:58<@JAA>strict throttling to 0.5 req/IP/s. URLs without slugs redirect to the canonical URL iff there are no special characters in the slug, else that needs to be found from the HTML. I can share my qwarc script if that helps.
01:30:34<nicolas17>what's tpc?
01:30:58<@JAA>No idea what it stands for. It's a variable in the JS code.
01:31:12<nicolas17>I mean does it come from the server?
01:31:16<@JAA>Yes
01:31:19<@JAA>And chk, too
01:31:31<@JAA>I've only seen chk = '0001', but maybe that varies.
01:31:40<nicolas17>yeah I was going to ask about chk next :p
01:32:16<nicolas17>if they used sha256(sha256(x)) we could have asked crypto miners for help /s
01:32:58<nicolas17>I thought that said 'chk' rather than `chk` and I was like "huh, you mean the first 3 bytes of md5 being ascii 'c' 'h' 'k'?"
01:33:10<@JAA>I mean, solving the challenge takes 25 ms with my non-optimised Python code.
01:33:33<@JAA>Largest i I've seen was under 300k.
01:35:50<@JAA>Specifically the hex digest starts with the value of the `chk` variable, yeah.
01:36:16<@JAA>E.g. 95c8dc44a288bde976f4a952d73e5b4c::8965
01:36:23<@JAA>→ 000128a595486a61820c22e5b7b11328
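[Editor's sketch] The challenge JAA describes above (find the smallest integer i, counting from 0, for which the hex MD5 digest of `tpc + '::' + i` starts with `chk`) could look roughly like this in Python. This is not JAA's actual code: the function name `solve_challenge`, the UTF-8 encoding, and the `max_tries` cap are assumptions made for illustration.

```python
import hashlib
from typing import Optional


def solve_challenge(tpc: str, chk: str, max_tries: int = 10_000_000) -> Optional[int]:
    """Return the smallest i >= 0 whose md5(tpc + '::' + i) hex digest starts with chk.

    tpc and chk are the server-supplied values from the challenge page.
    max_tries is a hypothetical safety cap so the loop cannot run forever.
    """
    for i in range(max_tries):
        # Assumption: the JS hashes the plain string, encoded here as UTF-8.
        digest = hashlib.md5(f"{tpc}::{i}".encode("utf-8")).hexdigest()
        if digest.startswith(chk):
            return i
    return None  # No solution found within the cap.
```

If the example in the log is the minimal solution, `solve_challenge('95c8dc44a288bde976f4a952d73e5b4c', '0001')` would be expected to return 8965. A four-hex-digit prefix needs about 65,536 hashes on average, which fits the small i values reported here.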
01:36:39<nicolas17>do we have md5 readily available in wget-lua?
01:37:09<@JAA>It's almost certainly easier to do this in pipeline.py.
01:37:47<nicolas17>I'm not awake enough for this
01:37:53<nicolas17>yes, of course, pipeline.py
02:01:40<fireonlive>some nice piping 🙂‍↕️
02:31:20Island_ joins
02:35:53Island quits [Ping timeout: 265 seconds]
02:58:24<pabs>does AT have any sort of webring archiving going on?
02:58:27<pabs>for eg I see "The sausage champs webring!" on http://meatspace.debian.net/planet/
02:58:42<pabs>but neither the previous/next URLs link to any webring stuff
03:02:48<fireonlive>not that i know of
04:20:42Campbellh joins
04:41:33<@arkiver>JAA: how long is the solved challenge usable?
04:45:16<@arkiver>JAA: do they allow for attachments or embedded images?
04:51:23<@JAA>arkiver: Hours, but I don't know exactly.
04:51:30<@JAA>Yes, there are image uploads.
04:51:58<@JAA>This has one: https://forum.worldoftanks.com/index.php?/topic/671288-misinformation/
04:52:36<@JAA>Also embedded Imgur images, e.g. https://forum.worldoftanks.com/index.php?/topic/671359-theyre-all-over-the-place/
04:54:30<fireonlive>i'd guess we'd have to enforce concurrency as well
04:54:54<@JAA>The challenge value seems to depend on the IP, so probably can't just hardcode it.
04:55:00<nicolas17>aww
04:55:04<nicolas17>was about to ask that
04:55:32<nicolas17>put the challenge response on a downloaded file like the dictionary and let everyone use the same and update it hourly or so :P
04:55:33<@JAA>Enforcing the concurrency might not be needed. The throttling happens server-side. You'll still average one request per 2 seconds.
04:55:50<nicolas17>but if it's per IP then nope
04:56:11<fireonlive>ah they don't ban/429 you or such?
04:56:23<nicolas17>they delay the response?
04:57:15<@JAA>Yeah, response gets delayed. The ban I saw was a connection timeout.
04:57:37<@JAA>That ban was when I tried 50 connections, I think.
04:57:51<@JAA>10 was fine but 20 seconds average response time.
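[Editor's sketch] The 20-second figure fits the 0.5 req/IP/s throttle mentioned earlier: if the server releases one response every 2 seconds per IP, then N requests in flight each wait about N / 0.5 seconds. The model below is a simplification that ignores network latency and assumes the server queues requests at a strictly fixed rate; the function name is hypothetical.

```python
def expected_response_time(connections: int, rate_per_second: float = 0.5) -> float:
    """Rough per-request wait in seconds when `connections` requests share one throttle.

    Assumes the server serves exactly `rate_per_second` requests per second per IP
    and delays everything else, as described in the log.
    """
    return connections / rate_per_second


# One connection averages 2 s per request; ten connections average 20 s each,
# so overall throughput stays at 0.5 requests per second either way.
print(expected_response_time(1), expected_response_time(10))
```

Under this model, adding connections from one IP does not raise throughput; only more IPs do.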
04:57:52<fireonlive>ahh
04:57:58Campbellh quits [Remote host closed the connection]
04:58:13<nicolas17>maybe not a ban
04:58:22<fireonlive>so those with warriors and 6 shouldn't be too hard pressed
04:58:24<nicolas17>maybe they just delayed you so much that it exceeded a timeout elsewhere :P
04:58:54<@JAA>Well, it lasted for 10-15 minutes after I stopped it, so...
04:59:17<nicolas17>oh ok yeah
05:00:03<@JAA>Yeah, challenge solution isn't reusable from another IP.
05:01:24<@JAA>Doesn't appear to be UA-dependent though.
05:01:52<@JAA>It *is* reusable between the three sites.
05:03:41<@JAA>(Or more specifically, the .com challenge solution works on .eu for me.)
05:04:18<nicolas17>if it takes milliseconds, and it can be done from the python script, I guess it doesn't actually matter much
05:04:59<@JAA>Yeah, I restricted my solver loop to 10 million hashes, which takes a few seconds. Realistically, it should never hit that.
05:05:58<@JAA>But we'd probably have separate items per site, and having to only solve the challenge once and then use the cookies for the entire multiitem is definitely useful.
05:06:40<@JAA>Or I guess keep the cookies in memory in pipeline.py, check whether they're still valid by making one request, solve the challenge if expired, or something along those lines.
05:06:57<nicolas17>a global var that spans more than one multiitem yeah
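[Editor's sketch] The idea above (keep the solved cookies in memory, check them with one request, re-solve only when they have expired) could be structured roughly as follows. The class name and both callables are hypothetical; how validity is checked and how the cookies are named are not stated in the log, so they are left to the caller.

```python
from typing import Callable, Dict, Optional

Cookies = Dict[str, str]


class ChallengeCookieCache:
    """Holds challenge cookies across items and refreshes them only when needed."""

    def __init__(
        self,
        is_valid: Callable[[Cookies], bool],  # e.g. one test request with the cookies
        solve: Callable[[], Cookies],         # fetches the challenge and solves it
    ) -> None:
        self._is_valid = is_valid
        self._solve = solve
        self._cookies: Optional[Cookies] = None

    def get(self) -> Cookies:
        # Solve on first use, or again if the stored cookies no longer pass the check.
        if self._cookies is None or not self._is_valid(self._cookies):
            self._cookies = self._solve()
        return self._cookies
```

A single module-level instance would act as the global variable that outlives one multiitem. Because a solution is not reusable from another IP, each worker would need its own instance.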
05:07:29<nicolas17>if it lasts hours, just renew it every 15/30 mins or so?
05:08:37<@JAA>Looks like the solution I got at 18:45 is still valid.
05:08:50<@JAA>So over 10 hours
05:09:10<nicolas17>...ok what is this even supposed to protect? :D
05:09:17<@JAA>I suspect they might rotate it at a fixed time though.
05:09:27<@JAA>E.g. once per day or whatever.
05:09:34<@JAA>Low-effort spammers, I imagine.
05:10:05<@JAA>The kind you instantly get when operating a public forum with open registration.
05:10:43etnguyen03 quits [Remote host closed the connection]
05:25:17JaffaCakes118 quits [Remote host closed the connection]
05:28:58JaffaCakes118 (JaffaCakes118) joins
06:05:05BlueMaxima quits [Read error: Connection reset by peer]
06:16:11<@arkiver>is pad.riseup.net suddenly gone?
06:17:40<fireonlive>hmm. 404
06:18:31<@JAA>I'd think that's unintentional.
06:18:46<@arkiver>pad.riseup.net is still listed at https://riseup.net/accounts
06:18:55<@arkiver>anyone with a contact there that could ask?
06:24:28<@arkiver>i created a ticket
06:31:36DopefishJustin quits [Remote host closed the connection]
06:41:07Island_ quits [Read error: Connection reset by peer]
06:41:08DopefishJustin joins
06:48:16Island joins
07:06:10Unholy23619246453 (Unholy2361) joins
07:41:34Island quits [Read error: Connection reset by peer]
08:19:26Island joins
08:43:56<h2ibot>Manu edited Mailman/2 (+39, /* http://linuxbox.org/pipermail lost */): https://wiki.archiveteam.org/?diff=52261&oldid=52234
08:48:30Island quits [Read error: Connection reset by peer]
08:49:57<h2ibot>Manu edited Mailman/2 (+98, /* https://lists.linuxfromscratch.org/ saved */): https://wiki.archiveteam.org/?diff=52262&oldid=52261
09:00:05Bleo182600722719 quits [Client Quit]
09:01:24Bleo182600722719 joins
09:04:59<h2ibot>Manu edited Mailman/2 (+15, /* correction:…): https://wiki.archiveteam.org/?diff=52263&oldid=52262
09:10:29loug joins
09:34:05nic8693 quits [Read error: Connection reset by peer]
09:38:09techdude3000 joins
09:38:26techdude3000 quits [Client Quit]
09:44:07nic8693 (nic) joins
10:04:45JaffaCakes118 quits [Remote host closed the connection]
10:22:59JaffaCakes118 (JaffaCakes118) joins
10:26:08Wohlstand (Wohlstand) joins
10:30:04eroc1990 quits [Client Quit]
10:34:48eroc1990 (eroc1990) joins
11:01:43Gereon quits [Ping timeout: 272 seconds]
11:37:01Notrealname1234 (Notrealname1234) joins
11:50:07JaffaCakes118_2 (JaffaCakes118) joins
11:53:55JaffaCakes118 quits [Ping timeout: 255 seconds]
12:05:37Gereon0 (Gereon) joins
12:06:04Notrealname1234 quits [Read error: Connection reset by peer]
12:06:45driib quits [Client Quit]
12:07:31driib (driib) joins
12:16:59Notrealname1234 (Notrealname1234) joins
12:17:18Notrealname1234 quits [Client Quit]
12:19:31driib quits [Client Quit]
12:26:17driib (driib) joins
12:34:19JaffaCakes118_2 quits [Read error: Connection reset by peer]
12:35:08JaffaCakes118 (JaffaCakes118) joins
12:43:12driib quits [Client Quit]
12:45:00driib (driib) joins
12:47:37lunik11 quits [Quit: :x]
12:47:54Wohlstand quits [Remote host closed the connection]
12:48:15lunik11 joins
12:52:23Wohlstand (Wohlstand) joins
13:20:32Wohlstand quits [Client Quit]
13:35:33MrMcNuggets (MrMcNuggets) joins
14:25:34Arcorann quits [Ping timeout: 255 seconds]
14:38:00<@arkiver>JAA: i'm not completely sure i'll be able to get a project running before the deadline
14:38:13<@arkiver>JAA: how does one trigger the js challenge page?
14:39:29<@arkiver>rewby: we have a short time deadline unfortunately. i'm not sure i'll be able to get a project running in time, but perhaps a target can be created just in case
14:39:33<@arkiver>for
14:39:43<@arkiver>archiveteam_worldoftanksforum_
14:39:54<@arkiver>worldoftanksforum_
14:40:01<@arkiver>Archive Team World of Tanks forum:
14:43:47MrMcNuggets quits [Ping timeout: 265 seconds]
14:45:23etnguyen03 (etnguyen03) joins
14:48:13etnguyen03 quits [Remote host closed the connection]
14:50:36etnguyen03 (etnguyen03) joins
14:53:22MrMcNuggets (MrMcNuggets) joins
14:53:56MrMcNuggets quits [Client Quit]
15:07:25parfait quits [Ping timeout: 255 seconds]
15:17:56<fuzzy8021>JAA how many ips are you thinking you would need on a box?
15:18:17<katia>how many you got? 👀
15:19:32<fuzzy8021>borrowing a few atm
15:19:55<fuzzy8021>nothing like what some of the others here have access to
15:22:03<h2ibot>Manu edited Mailman/2 (+76, /* http://linuxmafia.com/pipermail/ archived */): https://wiki.archiveteam.org/?diff=52264&oldid=52263
15:39:03<@JAA>arkiver: Accessing the site with cleared cookies returns the JS challenge.
15:40:21<@JAA>fuzzy8021: A /27 probably. Maybe a /28 would do.
15:43:55<@JAA>I should have a complete-ish copy of .asia's important pages now. Homepage, forums, topics, no images etc.
15:44:07<@JAA>(.asia is far smaller than the other two.)
15:44:33<@JAA>There's a chance .com will finish in time with my current setup. .eu seems very unlikely.
16:01:57<fuzzy8021>JAA i can spin up a box with a /27 if you want it. give me an hour
16:02:24nicolas17 quits [Remote host closed the connection]
16:02:45nicolas17 joins
16:05:18<@JAA>fuzzy8021: Sounds great!
16:06:58<fuzzy8021>ubuntu ok?
16:09:19<@JAA>I'd prefer Debian, but I can make Ubuntu work.
16:24:59pokechu22 quits [Read error: Connection reset by peer]
16:25:03pokechu22 (pokechu22) joins
17:02:51nulldata quits [Changing host]
17:02:51nulldata (nulldata) joins
17:27:41evanim_ joins
17:29:37evanim quits [Ping timeout: 255 seconds]
17:29:37evanim_ is now known as evanim
17:42:04<nicolas17>Matter 1.3 specifications are out, not sure if AB or #// is the best way to archive the PDFs https://transfer.archivete.am/inline/cUcYO/matter1.3.txt
17:45:45<fireonlive>ab
17:47:15Notrealname1234 (Notrealname1234) joins
18:16:15Notrealname1234 quits [Client Quit]
18:48:22tzt quits [Ping timeout: 255 seconds]
18:49:19nicolas17 quits [Ping timeout: 265 seconds]