00:00:45<h2ibot>JAABot edited CurrentWarriorProject (+2): https://wiki.archiveteam.org/?diff=51228&oldid=51227
00:19:50kitonthe1et quits [Ping timeout: 240 seconds]
00:20:23kitonthenet joins
00:26:29kitonthenet quits [Ping timeout: 272 seconds]
00:29:20coretx quits [Ping timeout: 240 seconds]
00:36:54kitonthe1et joins
00:37:08coretx joins
00:37:20rohvani quits [Ping timeout: 240 seconds]
00:41:50kitonthe1et quits [Ping timeout: 240 seconds]
00:45:43<fireonlive>smol change: twitter2nitter/transferinliner/and karma system now ignore lines starting with !; so it won't go off if you're using a bot command (thanks project10); also 'known-bots' (h2ibot, botifico, and Aramaki) are skipped from them
00:50:21kitonthenet joins
00:54:07rohvani joins
00:57:31kitonthenet quits [Ping timeout: 272 seconds]
00:58:17ArcticCircleSys quits [Ping timeout: 265 seconds]
00:59:45rohvani quits [Client Quit]
01:02:14rohvani joins
01:12:01balrog_ quits [Quit: Bye]
01:19:40balrog (balrog) joins
01:31:50Barto quits [Ping timeout: 240 seconds]
01:32:38kitonthe2et joins
01:34:29Barto (Barto) joins
01:46:55kitonthe2et quits [Ping timeout: 272 seconds]
01:47:38kitonthe2et joins
01:59:35kitonthe2et quits [Ping timeout: 272 seconds]
02:15:09kitonthe2et joins
02:23:20MetaNova quits [Ping timeout: 240 seconds]
02:28:00MetaNova (MetaNova) joins
03:00:40datechnoman5 (datechnoman) joins
03:02:20datechnoman quits [Ping timeout: 240 seconds]
03:02:21datechnoman5 is now known as datechnoman
03:08:20datechnoman quits [Ping timeout: 240 seconds]
03:19:47nyany (nyany) joins
03:30:20@OrIdow6 quits [Ping timeout: 240 seconds]
03:32:16OrIdow6 (OrIdow6) joins
03:32:16@ChanServ sets mode: +o OrIdow6
05:21:50kitonthe2et quits [Ping timeout: 240 seconds]
05:31:27kitonthenet joins
05:35:50kitonthenet quits [Ping timeout: 240 seconds]
05:48:55kitonthe1et joins
05:52:18BlueMaxima quits [Read error: Connection reset by peer]
06:33:15<@JAA>Sanqui: Just a brief update, all of those linked Webzdarma jobs are 4.5 TiB, so it'll take a while to download them all, even at the 60 MB/s I'm getting from right next to IA.
06:35:55Island quits [Read error: Connection reset by peer]
06:36:10<h2ibot>Arctic Circle System edited Alive... OR ARE THEY (+383, /* Endangered */ Added Kirby's Rainbow Resort): https://wiki.archiveteam.org/?diff=51229&oldid=51031
07:02:01DogsRNice quits [Read error: Connection reset by peer]
07:23:50bladem quits [Ping timeout: 240 seconds]
07:25:46bladem (bladem) joins
07:44:20kitonthe1et quits [Ping timeout: 240 seconds]
07:57:22kitonthenet joins
07:58:19Arcorann (Arcorann) joins
08:04:23kitonthenet quits [Ping timeout: 272 seconds]
08:16:43kitonthe2et joins
08:37:01icedice (icedice) joins
08:37:30icedice quits [Remote host closed the connection]
08:37:54icedice (icedice) joins
08:39:13kitonthe2et quits [Ping timeout: 272 seconds]
08:50:10kitonthe2et joins
09:01:23kitonthe2et quits [Ping timeout: 272 seconds]
09:12:29kitonthenet joins
09:51:43<@Sanqui>Thanks JAA. Problem is the ones that had offsite, sadly not enough foresight there. In the long term we will be making and keeping our own copies
10:00:02Bleo1826 quits [Client Quit]
10:01:20Bleo1826 joins
10:03:36icedice quits [Client Quit]
10:05:50kitonthenet quits [Ping timeout: 240 seconds]
10:08:09datechnoman (datechnoman) joins
10:17:09kitonthe1et joins
10:42:43kitonthe1et quits [Ping timeout: 272 seconds]
10:54:06kitonthe1et joins
11:10:12icedice (icedice) joins
12:22:12BearFortress quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
12:30:50Arcorann quits [Ping timeout: 240 seconds]
12:40:02icedice2 (icedice) joins
12:43:03icedice quits [Ping timeout: 272 seconds]
12:43:12icedice (icedice) joins
12:45:35icedice2 quits [Ping timeout: 272 seconds]
12:50:12supercar99 joins
12:50:20kitonthe1et quits [Ping timeout: 240 seconds]
12:50:24supercar99 quits [Remote host closed the connection]
12:50:33supercar99 joins
12:53:00supercar99 quits [Remote host closed the connection]
12:54:39kitonthenet joins
13:01:20kitonthenet quits [Ping timeout: 240 seconds]
13:01:43kitonthenet joins
13:23:32icedice quits [Client Quit]
13:24:13kitonthenet quits [Ping timeout: 272 seconds]
13:25:45kitonthe1et joins
13:28:11gfhh quits [Client Quit]
13:44:53eroc19905 quits [Quit: The Lounge - https://thelounge.chat]
13:49:27eroc1990 (eroc1990) joins
13:53:59katocala quits [Ping timeout: 272 seconds]
13:54:21katocala joins
13:55:01icedice (icedice) joins
14:03:45Wohlstand (Wohlstand) joins
14:04:10<imer>Sanqui: would you like me to run that through the common crawl cdx? I have that lying around and from a quick spot check there are some matching links in there
14:21:54gfhh joins
14:51:30<@Sanqui>imer: yes please, ^https?://(www.)?(uloz.to|ulozto.cz|ulozto.sk|ulozto.net|zachowajto.pl)
14:58:31<kiska>I could try the fdns data set I have
14:59:13kitonthe1et quits [Ping timeout: 272 seconds]
14:59:57kitonthe2et joins
15:09:59kitonthe2et quits [Ping timeout: 272 seconds]
15:20:50kitonthe2et joins
15:48:27icedice quits [Client Quit]
15:54:19kitonthe2et quits [Ping timeout: 272 seconds]
15:58:41kitonthenet joins
15:59:58Wohlstand quits [Client Quit]
16:00:54<imer>Sanqui: ack, will be a few days to run through it all
16:06:02Island joins
16:26:01<@Sanqui>imer: deadline is tomorrow, so probably no need then
16:26:08<@Sanqui>thanks though
16:26:19<@Sanqui>maybe if it's possible to run on a subset of .cz sites
16:26:22<@Sanqui>(and .sk)
16:26:24<@Sanqui>it would make sense
16:27:02<imer>oh. oops
16:27:18<imer>i'll toss you over the partial results then as I get them
16:31:23ArcticCircleSys joins
17:24:17ArcticCircleSys quits [Ping timeout: 265 seconds]
17:28:18<@JAA>Sanqui: Sometime in the future, all AB jobs' databases should be kept, and then this wouldn't be an issue. wpull still extracts all links when running with --no-offsite-links, it just then ignores them silently, so they only appear in the DB.
17:32:52parfait (kdqep) joins
17:35:25<Vokun>Can these sorts of links be put into AB? This person passed away and if possible i'd like to have these pages saved. Also, can AB grab a youtube channel? Just the pages, not videos. I already put it into downthetube
17:35:25<Vokun>https://www.instagram.com/chesyarts
17:35:25<Vokun>https://ko-fi.com/chesyarts
17:35:25<Vokun>https://www.tiktok.com/@chesyarts0w0
17:35:25<Vokun>https://www.youtube.com/@chesyarts1691
17:37:11<pokechu22>Vokun: I don't think any of those work properly in AB, no; all of those sites have strict rate-limiting and are JS-based, and AB will only get 429s
17:37:56<Vokun>rip
17:39:32<fireonlive>youtube can go to #down-the-tube as long as it's in scope https://wiki.archiveteam.org/index.php/YouTube#Scope (someone dying is)
17:47:20kitonthenet quits [Ping timeout: 240 seconds]
17:48:46<Vokun>I put it in. Thanks
17:50:00<fireonlive>:)
17:54:40kitonthenet joins
18:15:33aninternettroll quits [Ping timeout: 272 seconds]
18:15:50aninternettroll (aninternettroll) joins
18:16:22aninternettroll quits [Remote host closed the connection]
18:18:29aninternettroll (aninternettroll) joins
18:22:47Lej joins
18:25:02Lej leaves
18:30:45<h2ibot>Pokechu22 edited DokuWiki (+472, mention taskrunner): https://wiki.archiveteam.org/?diff=51230&oldid=51010
18:33:17kitonthenet quits [Ping timeout: 272 seconds]
18:44:37kitonthe1et joins
18:49:58katocala quits [Remote host closed the connection]
18:52:17HP_Archivist quits [Ping timeout: 272 seconds]
19:06:00BlueMaxima joins
19:08:02polduran joins
19:12:09IDK (IDK) joins
19:14:06<Pedrosso>the archiveteam wikipage on bluesky is very short, has anything been done about that?
19:20:50kitonthe1et quits [Ping timeout: 240 seconds]
19:22:39<polduran>hello everyone. I might have something for the archivebot if anyone has time to put it in the queue: https://www.summoners-inn.de is the biggest and probably one of the oldest german league of legends news website with articles back to 2013. today, they announced the end of Summoner's Inn after their parent company Freaks4U lost their partnership
19:22:40<polduran>to host the official german League of Legends broadcast.
19:25:05<pokechu22>polduran: I've queued it, not sure how well it'll run though as they don't seem to have a sitemap
19:26:10<pokechu22>I also queued https://www.freaks4u.de
19:30:31<polduran>let's hope for the best^^ thank you. and yeah, good idea ^-^" maybe also the german LoL-league? https://www.primeleague.gg/ not sure if there is anything interesting on there and how and if the situation also affects this, but the website is hosted and copyrighted by freaks4u
19:31:20coretx quits [Ping timeout: 240 seconds]
19:32:08kitonthe1et joins
19:32:24<pokechu22>Alright
19:33:05<polduran>thanks again and have a nice day :D
19:36:05Megame (Megame) joins
19:37:17polduran quits [Remote host closed the connection]
19:43:50kitonthe1et quits [Ping timeout: 240 seconds]
19:45:01<sdomi>continuing on the discussion from #//; imer: what would be the best way to handle this JS mess?
19:45:27<sdomi>I can probably write a scraper that'll generate a list of URLs from these downloaders; there isn't much metadata to be saved anyways, so IMO saving just the ZIPs is a good starting point
19:47:20kitonthenet joins
19:48:09<sdomi>imer: hey, also, can you verify if the downloader3.html still works? I.. think I crashed it
19:48:27BornOn420_ (BornOn420) joins
19:48:28<masterX244>did you check with devtools how the EULA acceptance is handled?
19:48:30<sdomi>checked from two IPs and several browsers, no dice
19:48:39<sdomi>masterX244: on some of them there's no EULA at all
19:48:48<masterX244>with some luck that can be faked with some headers/constant request stuff
19:48:49<sdomi>so I'm focusing on that right now
19:49:57<masterX244>had a site once that had an ad-intercept on first download under a session, fooled that by "wasting" that with a url-parametered URL before the real crawl started
19:50:27<sdomi>https://f.sakamoto.pl/UwUMicKuA.png ,_,
19:51:01<masterX244>2 "wasted" requests in the WARC but better than a lost one. POST sucks for archivebot though
19:52:04<sdomi>masterX244: no, no; i'm not getting any responses anymore
19:52:07<sdomi>oh, it's back now
19:52:26<sdomi>so what I did was.. I tried a wildcard instead of the version number, just to check what would happen
19:52:27BornOn420 quits [Ping timeout: 272 seconds]
19:52:41<masterX244>ahh, poking around for shortcuts
19:52:43<sdomi>and it seems that it crashed their entire API for a solid minute
19:52:56<sdomi>so. uh. we need to be careful around this one XD
19:54:28<masterX244>cockroach-infested area :(, that sucks
19:55:39<sdomi>btw, how does WARC work? I know that I can run a mitm proxy for myself, but how would I go about handing it over to IA? what are the steps/precautions/who do I need to talk to...? :p
19:55:54<nicolas17>"you don't"
19:56:44<nicolas17>you can upload WARC files to archive.org, but they won't be used by web.archive.org, because there's no way to know if they actually match the website you mirrored or if you messed with the content (accidentally or intentionally)
19:56:54<sdomi>yes, that I know
19:57:25<sdomi>i was more asking about... what steps do I take to actually get the content preserved with y'alls help?
20:04:00<fireonlive> a project/mini-project proposal let’s say :3
20:05:55<sdomi>figured out how the EULA stuff works! it's a static JS function that takes params from the current URL
20:06:05<sdomi>so this is very much possible to automate
20:06:27<sdomi>function in question: https://pastebin.com/9bsxLDLu
20:09:20kitonthenet quits [Ping timeout: 240 seconds]
20:11:10katocala joins
20:14:45kitonthenet joins
20:38:02<imer>sdomi: sorry, stepped away for a bit, I have not the slightest idea how to do this - although I am probably not the person to ask haha
20:38:15<sdomi>imer: writing a scraper as we speak :p
20:38:18<imer>nice
20:40:50kitonthenet quits [Ping timeout: 240 seconds]
20:52:00kitonthe1et joins
20:54:10Webuser533 joins
20:57:31aninternettroll quits [Read error: Connection reset by peer]
20:57:34aninternettroll_ (aninternettroll) joins
20:57:41aninternettroll_ is now known as aninternettroll
20:57:52<Webuser533>could you help me find an archive of this video https://www.youtube.com/watch?v=V3gbrP2U10A ?
21:05:05<that_lurker>#youtubearchive would be a fitting channel for that question
21:06:26<Webuser533>alright thank you !
21:07:18Webuser533 leaves
21:37:10Naruyoko5 joins
21:39:29Naruyoko quits [Ping timeout: 272 seconds]
21:52:29HP_Archivist (HP_Archivist) joins
21:54:45DogsRNice joins
22:01:39Megame quits [Client Quit]
22:13:41kitonthe1et quits [Ping timeout: 272 seconds]
22:13:59kitonthenet joins
22:31:12Dango360_ joins
22:32:58_Dango360 joins
22:34:50Dango360 quits [Ping timeout: 240 seconds]
22:37:07Dango360_ quits [Ping timeout: 272 seconds]
22:43:20kitonthenet quits [Ping timeout: 240 seconds]
22:56:02kitonthenet joins
23:22:56<sdomi>https://pastebin.com/gAwF2bwc URLs
23:34:55<sdomi>https://f.sakamoto.pl/nvidia_rescue.tar.gz here's the code I wrote
23:36:49<sdomi>turns out that most docs URLs are completely dead already, or point to generic sites that have likely been archived for ages. i'm downloading real "data" locally right now, gonna upload as an item onto IA later ^-^
23:37:55kitonthenet quits [Ping timeout: 272 seconds]
23:40:22kitonthe2et joins
23:45:50kitonthe2et quits [Ping timeout: 240 seconds]
23:46:55aninternettroll quits [Read error: Connection reset by peer]
23:47:38aninternettroll (aninternettroll) joins
23:52:09kitonthenet joins
23:53:06_Dango360 quits [Client Quit]
23:53:26Dango360 (Dango360) joins