00:00:07<h2ibot>JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=51953&oldid=51930
00:00:42thecommentarchiver quits [Client Quit]
00:01:59Ruthalas59 quits [Quit: END OF LINE]
00:13:45Ruthalas59 (Ruthalas) joins
00:20:10<h2ibot>Pokechu22 edited Deathwatch (+520, /* 2024 */ monster hunter now forum also closing): https://wiki.archiveteam.org/?diff=51954&oldid=51952
00:24:44BearFortress joins
00:32:55<archivst>> they seem to detect us based on TLS fingerprinting?
00:33:48<archivst>what does that mean and is there an effort to get around it? (i know what tls means but i am unfamiliar with tls fingerprinting)
00:36:59<imer>a solution is in the works as far as I know, yes
00:38:03<thuban>icedice: i will look at scraping mangaupdates and mangadex for release group sites and feeding the blogspot urls to #frogger, thanks for the suggestion. i cannot access the vatoto groups page; does it require login?
00:38:04<imer>"what does that mean" different TLS (the thing used to encrypt https) implementations act slightly differently, so you can "fingerprint" specific ones (like major browsers)
00:38:05<immibis>TLS fingerprinting is identifying something based on how it uses TLS e.g. which ciphers it supports. If you support this cipher but not that cipher, you must be a terrorist.
00:38:14<thuban>(nb: mangaupdates' group list seems to go only to page 100 and cut off in the middle of 'P'. i can use the 'by letter' pages to get _most_ of the rest, but their collation order seems to put non-ascii characters at the end so there's no way to find eg https://www.mangaupdates.com/group/ago2peh/al-yans-kustarnikov except by brute-forcing or search. bad)
00:38:48<imer>and if you see an unknown fingerprint do lots of requests, and know it's not a browser, you block it
00:38:56<immibis>you're saying all someone has to do is run the archiver scripts through Tor and reddit will block tor access
00:39:17<imer>does tor actually work?
00:39:27<nicolas17>well no, reddit would still see the origin client's TLS handshake
00:39:29<immibis>reddit allows you to read it through tor
00:41:00<immibis>but you're saying if someone read it a lot with the wrong fingerprint, they could be made to automatically ban all tor users
00:42:41<archivst>How long are these bans? Are they just minute/hour level throttling, or do they last longer?
00:46:01<@JAA>→ #shreddit
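The fingerprinting idea imer and immibis describe above can be sketched roughly as follows. This assumes a JA3-style scheme (an MD5 hash over the TLS version, cipher list, extensions, curves and point formats a client offers in its ClientHello); the log does not say which scheme reddit actually uses. The class name, the numeric values and the request threshold are all made up for illustration.

```python
import hashlib
from collections import Counter


def ja3_style_fingerprint(tls_version, ciphers, extensions, curves, point_formats):
    """Hash the parameters a client offers in its ClientHello.

    JA3-style layout: five fields joined by ",", values inside a field
    joined by "-", then MD5 over the resulting string.
    """
    fields = [
        str(tls_version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    return hashlib.md5(",".join(fields).encode("ascii")).hexdigest()


class FingerprintLimiter:
    """Hypothetical server-side policy: known browsers pass, unknown
    fingerprints are blocked once they exceed a request budget."""

    def __init__(self, known_browsers, max_requests):
        self.known_browsers = set(known_browsers)
        self.max_requests = max_requests
        self.counts = Counter()

    def allow(self, fingerprint):
        if fingerprint in self.known_browsers:
            return True
        self.counts[fingerprint] += 1
        return self.counts[fingerprint] <= self.max_requests


# Illustrative values only; not taken from any real client.
browser = ja3_style_fingerprint(771, [4865, 4866, 49195], [0, 10, 11], [29, 23], [0])
scraper = ja3_style_fingerprint(771, [49195, 4865], [0, 10], [23], [0])

limiter = FingerprintLimiter(known_browsers=[browser], max_requests=3)
decisions = [limiter.allow(scraper) for _ in range(5)]
```

Because the hash depends on the order and content of what the client offers, two TLS libraries that support slightly different cipher sets end up with different fingerprints even when the HTTP requests they send are identical. As nicolas17 notes, the handshake comes from the origin client, so routing through a proxy layer such as Tor would not change it.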
00:46:27archivst quits [Client Quit]
00:46:32archivst joins
00:56:49programmerq (programmerq) joins
01:09:33<thuban>(mangadex also limits its group pagination :/ max 10000 results, search enabled on group names only. they seem cool so we might be able to get a complete list (of sites/of blogspot sites) if we ask nicely, but i would have to get on... discord...)
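One generic way to work around a capped result window like the 10000-result limit thuban mentions is to split the query into name prefixes until each slice fits under the cap. The sketch below assumes the search behaves like a prefix match on group names, which the log does not confirm; `count_results` is a hypothetical callable standing in for whatever request would report the number of hits, and the names in the usage example are invented.

```python
import string


def plan_prefixes(count_results, cap, alphabet=string.ascii_lowercase,
                  prefix="", max_depth=4):
    """Return a list of prefixes whose result counts each fit under `cap`.

    count_results(prefix) is a hypothetical function returning how many
    names start with `prefix`. A prefix is split one character deeper
    whenever it still exceeds the cap, up to `max_depth` characters.
    """
    total = count_results(prefix)
    if total == 0:
        return []
    if total <= cap or len(prefix) >= max_depth:
        return [prefix]
    plan = []
    for ch in alphabet:
        plan.extend(
            plan_prefixes(count_results, cap, alphabet, prefix + ch, max_depth)
        )
    return plan


# Invented stand-in data: an in-memory list instead of a real site.
names = ["aa", "ab", "ac", "ba"]
counter = lambda p: sum(name.startswith(p) for name in names)
plan = plan_prefixes(counter, cap=2)
```

Two caveats apply to this approach: a name that is exactly equal to a prefix that gets split is dropped from the plan, and names starting with characters outside `alphabet` are never visited, so anything non-ASCII would still need a separate pass.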
01:13:23<h2ibot>Blankie edited Fandom (-1, /* Download */ Fix link to more information…): https://wiki.archiveteam.org/?diff=51955&oldid=49560
01:13:24<h2ibot>IDKhowToEdit edited Deathwatch (+301, Add marketplace comment deprecation for roblox): https://wiki.archiveteam.org/?diff=51956&oldid=51954
01:13:25<h2ibot>Dango360 edited Roblox (+7342, added roblox comments removal section): https://wiki.archiveteam.org/?diff=51957&oldid=49854
01:13:26<h2ibot>IDKhowToEdit edited Roblox (+384, Added marketplace comment removal): https://wiki.archiveteam.org/?diff=51958&oldid=51957
01:15:24<h2ibot>JustAnotherArchivist edited Roblox (-369, Remove duplicate content, datetimeify): https://wiki.archiveteam.org/?diff=51959&oldid=51958
02:08:49JaffaCakes118 quits [Remote host closed the connection]
02:09:13JaffaCakes118 (JaffaCakes118) joins
02:28:18<fireonlive>https://news.ycombinator.com/item?id=39852219 < is openai going to get mad about this and lock things down lol
02:30:10<@JAA>TIL /raw/ on Discourse
02:34:49<@JAA>> Raw data was gathered into a single JSONL file by automating a browser using Playwright.
02:34:57<@JAA>Running a full browser to fetch some JSON...
02:36:33<immibis>is an effective way to bypass any check that is looking for non-approved browsers
02:37:18<@JAA>Obviously, but as far as I can see, there isn't such a check here.
02:37:44<@JAA>Or at least not one that would excessively limit the retrieval rate.
03:34:16<immibis>it's also an effective way to run all the arcane bloated SPA JS code to fetch the data for you
03:40:04Island quits [Read error: Connection reset by peer]
03:44:40zhongfu (zhongfu) joins
03:57:49archivst quits [Client Quit]
04:14:26oddline leaves
04:24:54<h2ibot>Petchea edited Bilibili (+128): https://wiki.archiveteam.org/?diff=51960&oldid=51859
05:01:49archivst joins
05:35:32JohnnyJ quits [Quit: Ping timeout (120 seconds)]
05:35:49JohnnyJ joins
05:36:18eroc19904 (eroc1990) joins
05:36:44eroc1990 quits [Read error: Connection reset by peer]
05:47:12qwertyasdfuiopghjkl quits [Quit: Ping timeout (120 seconds)]
05:47:12archivst quits [Client Quit]
05:48:53<icedice><thuban> icedice: i will look at scraping mangaupdates and mangadex for release group sites and feeding the blogspot urls to #frogger, thanks for the suggestion. i cannot access the vatoto groups page; does it require login?
05:49:03<icedice>Vatoto works for me
05:49:16<icedice>It has groups under letter categories
05:49:22<icedice><thuban> (mangadex also limits its group pagination :/ max 10000 results, search enabled on group names only. they seem cool so we might be able to get a complete list (of sites/of blogspot sites) if we ask nicely, but i would have to get on... discord...)
05:49:28Guest85 quits [Ping timeout: 258 seconds]
05:49:38<icedice>I've chatted with MangaDex staff in the past
05:49:54<icedice>I can handle it if you want
05:51:04<thuban>icedice: that sounds good, thank you!
05:51:30<icedice>No problem
05:51:45qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
05:59:25<icedice><thuban> (nb: mangaupdates' group list seems to go only to page 100 and cut off in the middle of 'P'. i can use the 'by letter' pages to get _most_ of the rest, but their collation order seems to put non-ascii characters at the end so there's no way to find eg https://www.mangaupdates.com/group/ago2peh/al-yans-kustarnikov except by brute-forcing or search. bad)
05:59:46<icedice>Mangaupdates has an IRC channel at #baka-updates@irc.irchighway.net
06:00:18<icedice>They handed over Imgur links from their forums to me in the past
06:01:00<icedice>However, iirc they ignored me for probably like a week at least until I poked them again and they went "here's the list, now piss off"
06:01:06<icedice>Or something along those lines
06:01:19<icedice>I think that was them, at least
06:01:31<thuban>hmmm
06:04:40<thuban>i did use search to do some spot-checking with other cyrillic characters and didn't find any results other than that group, and with cjk and didn't find anything i hadn't already seen in 'all', so paging by letter is probably Good Enough™?
06:05:29<thuban>i'm much less confident in saying that about cjk/other character sets than about cyrillic, but
06:05:59<thuban>good place to start unless/until one of us talks to them about it
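The spot-checking thuban describes can be made a bit more systematic by bucketing already-collected group names by their first character, so it is obvious which scripts the 'by letter' pages cannot reach. This is a minimal sketch; the function name is hypothetical and the sample names are invented, not taken from mangaupdates.

```python
import unicodedata
from collections import defaultdict


def bucket_by_first_char(names):
    """Group names by where a by-letter index would (or would not) file them.

    ASCII letters go under their uppercase letter, other ASCII characters
    under "#", and everything else under the first word of the character's
    Unicode name (e.g. "CYRILLIC", "CJK"), which approximates its script.
    """
    buckets = defaultdict(list)
    for name in names:
        stripped = name.strip()
        if not stripped:
            continue
        first = stripped[0]
        if first.isascii() and first.isalpha():
            key = first.upper()
        elif first.isascii():
            key = "#"
        else:
            key = unicodedata.name(first, "UNKNOWN").split()[0]
        buckets[key].append(name)
    return dict(buckets)


# Invented sample names for illustration.
buckets = bucket_by_first_char(["Alpha Scans", "алый", "漫画組", "9lives"])
unreachable = sorted(k for k in buckets if len(k) > 1)
```

Any bucket whose key is not a single letter or "#" marks a script that needs a search-based check instead of paging by letter.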
06:08:07BlueMaxima quits [Read error: Connection reset by peer]
08:25:25qwertyasdfuiopghjkl quits [Client Quit]
08:44:53JaffaCakes118 quits [Remote host closed the connection]
08:45:16JaffaCakes118 (JaffaCakes118) joins
08:56:27BornOn420 quits [Ping timeout: 272 seconds]
09:00:01Bleo182600 quits [Client Quit]
09:01:26Bleo182600 joins
09:08:50BornOn420 (BornOn420) joins
09:28:26f_ (funderscore) joins
09:30:47f_ quits [Remote host closed the connection]
09:31:32f_ (funderscore) joins
09:33:18f_ quits [Remote host closed the connection]
09:36:53f_ (funderscore) joins
10:08:18qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
10:13:35zhongfu quits [Client Quit]
10:14:03Hackerpcs quits [Quit: Hackerpcs]
10:16:52Hackerpcs (Hackerpcs) joins
10:53:07pedantic-darwin quits [Client Quit]
10:53:24pedantic-darwin joins
10:57:06pedantic-darwin7 joins
10:58:07pedantic-darwin quits [Ping timeout: 255 seconds]
10:58:07pedantic-darwin7 is now known as pedantic-darwin
11:33:25jacksonchen666 (jacksonchen666) joins
11:33:26Wohlstand (Wohlstand) joins
11:35:39SootBector quits [Ping timeout: 255 seconds]
11:36:28zhongfu (zhongfu) joins
11:37:54sec^nd quits [Ping timeout: 255 seconds]
11:38:04SootBector (SootBector) joins
11:38:10zhongfu quits [Client Quit]
11:41:05zhongfu (zhongfu) joins
11:44:42sec^nd (second) joins
12:02:51KiviStone joins
12:36:42SootBector quits [Remote host closed the connection]
12:37:06SootBector (SootBector) joins
12:56:31icedice quits [Client Quit]
13:02:06SootBector quits [Remote host closed the connection]
13:02:31SootBector (SootBector) joins
13:49:43JaffaCakes118_2 (JaffaCakes118) joins
13:50:49JaffaCakes118_2 quits [Read error: Connection reset by peer]
13:51:46JaffaCakes118_2 (JaffaCakes118) joins
13:53:10JaffaCakes118 quits [Ping timeout: 255 seconds]
14:02:56icedice (icedice) joins
14:14:45Guest88 joins
14:15:13Arcorann quits [Ping timeout: 255 seconds]
14:20:51JaffaCakes118 (JaffaCakes118) joins
14:24:13JaffaCakes118_2 quits [Ping timeout: 255 seconds]
14:37:28sec^nd quits [Remote host closed the connection]
14:37:49sec^nd (second) joins
14:41:46Ruthalas59 quits [Ping timeout: 255 seconds]
14:47:08HP_Archivist (HP_Archivist) joins
14:48:28<HP_Archivist>Happened upon https://narkive.com/ - doesn't look like it's been crawled in length previously
15:07:25Wohlstand quits [Ping timeout: 255 seconds]
15:08:19thecommentarchiver joins
15:23:45jo70 joins
15:24:41<jo70>how to use itunes content and how to search on specific topic
15:38:49kiryu_ quits [Remote host closed the connection]
15:41:35kiryu joins
15:41:35kiryu quits [Changing host]
15:41:35kiryu (kiryu) joins
15:43:32jo70 quits [Client Quit]
16:08:48lunik1 quits [Client Quit]
16:09:10lunik1 joins
16:12:35lunik1 quits [Client Quit]
16:12:59lunik1 joins
16:47:52rappet quits [Quit: https://quassel-irc.org - Komfortabler Chat. Überall.]
16:49:50rappet (rappet) joins
16:57:05rachel joins
16:57:44rachel quits [Client Quit]
17:10:16Perk quits [Ping timeout: 255 seconds]
17:12:55<c3manu>can anyone tell me whether it's a good idea to grab a mailman instance using AB? the wiki page mentions a few tools, but doesn't say anything about AB
17:34:58PredatorIWD quits [Read error: Connection reset by peer]
17:41:05PredatorIWD joins
17:49:01Perk joins
17:50:37pixel leaves
17:50:38pixel (pixel) joins
18:18:46HP_Archivist quits [Client Quit]
18:32:58decky_e joins
18:35:19decky quits [Ping timeout: 255 seconds]
18:55:51<pokechu22>c3manu: pretty sure most of them have been done via AB?
19:02:10Lord_Nightmare quits [Quit: ZNC - http://znc.in]
19:05:16Island joins
19:28:43d10n_ quits [Quit: why all the #hashtags]
19:29:24d10n joins
19:41:11andrew quits [Quit: ]
19:41:49<c3manu>pokechu22: idk, that's why i'm asking ^^
19:47:00andrew (andrew) joins
19:58:55emberquill080 quits [Quit: The Lounge - https://thelounge.chat]
19:59:41emberquill080 (emberquill) joins
20:12:34adamus1red quits [Quit: SigTerm]
20:14:33<thuban>c3manu: yes, people have been doing it with archivebot (https://hackint.logs.kiska.pw/archiveteam-bs/20230616#c352608). from what i've heard mailman 2 and mailman 3 both work ok (https://hackint.logs.kiska.pw/archiveteam-bs/20230621#c353873)
20:14:35<thuban>https://wiki.archiveteam.org/index.php/Mailman/2 has ab tips for 2
20:15:43adamus1red (adamus1red) joins
20:33:23<c3manu>thuban: oh nice, thanks. i indeed do have a 2.19 here
20:33:35<c3manu>eeh 2.1.29
20:33:52emberquill080 quits [Client Quit]
20:46:41<h2ibot>Manu edited Mailman/2 (+26, add https://lists.metalab.at/ to archived list): https://wiki.archiveteam.org/?diff=51963&oldid=51875
21:50:15aninternettroll quits [Remote host closed the connection]
21:52:51aninternettroll (aninternettroll) joins
22:23:29JaffaCakes118_2 (JaffaCakes118) joins
22:26:54Church (Church) joins
22:27:31JaffaCakes118 quits [Ping timeout: 255 seconds]
22:29:30JaffaCakes118_2 quits [Remote host closed the connection]
22:37:54JaffaCakes118 (JaffaCakes118) joins
22:41:04thecommentarchiver quits [Client Quit]
23:13:04KiviStone quits [Client Quit]
23:24:00rktk quits [Quit: ZNC - https://znc.in]
23:33:44Arcorann (Arcorann) joins