00:00:45 | <h2ibot> | JAABot edited CurrentWarriorProject (+2): https://wiki.archiveteam.org/?diff=51228&oldid=51227 |
00:19:50 | | kitonthe1et quits [Ping timeout: 240 seconds] |
00:20:23 | | kitonthenet joins |
00:26:29 | | kitonthenet quits [Ping timeout: 272 seconds] |
00:29:20 | | coretx quits [Ping timeout: 240 seconds] |
00:36:54 | | kitonthe1et joins |
00:37:08 | | coretx joins |
00:37:20 | | rohvani quits [Ping timeout: 240 seconds] |
00:41:50 | | kitonthe1et quits [Ping timeout: 240 seconds] |
00:45:43 | <fireonlive> | smol change: twitter2nitter/transferinliner/and karma system now ignore lines starting with !; so it won't go off if you're using a bot command (thanks project10); also 'known-bots' (h2ibot, botifico, and Aramaki) are skipped from them |
00:50:21 | | kitonthenet joins |
00:54:07 | | rohvani joins |
00:57:31 | | kitonthenet quits [Ping timeout: 272 seconds] |
00:58:17 | | ArcticCircleSys quits [Ping timeout: 265 seconds] |
00:59:45 | | rohvani quits [Client Quit] |
01:02:14 | | rohvani joins |
01:12:01 | | balrog_ quits [Quit: Bye] |
01:19:40 | | balrog (balrog) joins |
01:31:50 | | Barto quits [Ping timeout: 240 seconds] |
01:32:38 | | kitonthe2et joins |
01:34:29 | | Barto (Barto) joins |
01:46:55 | | kitonthe2et quits [Ping timeout: 272 seconds] |
01:47:38 | | kitonthe2et joins |
01:59:35 | | kitonthe2et quits [Ping timeout: 272 seconds] |
02:15:09 | | kitonthe2et joins |
02:23:20 | | MetaNova quits [Ping timeout: 240 seconds] |
02:28:00 | | MetaNova (MetaNova) joins |
03:00:40 | | datechnoman5 (datechnoman) joins |
03:02:20 | | datechnoman quits [Ping timeout: 240 seconds] |
03:02:21 | | datechnoman5 is now known as datechnoman |
03:08:20 | | datechnoman quits [Ping timeout: 240 seconds] |
03:19:47 | | nyany (nyany) joins |
03:30:20 | | @OrIdow6 quits [Ping timeout: 240 seconds] |
03:32:16 | | OrIdow6 (OrIdow6) joins |
03:32:16 | | @ChanServ sets mode: +o OrIdow6 |
05:21:50 | | kitonthe2et quits [Ping timeout: 240 seconds] |
05:31:27 | | kitonthenet joins |
05:35:50 | | kitonthenet quits [Ping timeout: 240 seconds] |
05:48:55 | | kitonthe1et joins |
05:52:18 | | BlueMaxima quits [Read error: Connection reset by peer] |
06:33:15 | <@JAA> | Sanqui: Just a brief update, all of those linked Webzdarma jobs are 4.5 TiB, so it'll take a while to download them all, even at the 60 MB/s I'm getting from right next to IA. |
06:35:55 | | Island quits [Read error: Connection reset by peer] |
06:36:10 | <h2ibot> | Arctic Circle System edited Alive... OR ARE THEY (+383, /* Endangered */ Added Kirby's Rainbow Resort): https://wiki.archiveteam.org/?diff=51229&oldid=51031 |
07:02:01 | | DogsRNice quits [Read error: Connection reset by peer] |
07:23:50 | | bladem quits [Ping timeout: 240 seconds] |
07:25:46 | | bladem (bladem) joins |
07:44:20 | | kitonthe1et quits [Ping timeout: 240 seconds] |
07:57:22 | | kitonthenet joins |
07:58:19 | | Arcorann (Arcorann) joins |
08:04:23 | | kitonthenet quits [Ping timeout: 272 seconds] |
08:16:43 | | kitonthe2et joins |
08:37:01 | | icedice (icedice) joins |
08:37:30 | | icedice quits [Remote host closed the connection] |
08:37:54 | | icedice (icedice) joins |
08:39:13 | | kitonthe2et quits [Ping timeout: 272 seconds] |
08:50:10 | | kitonthe2et joins |
09:01:23 | | kitonthe2et quits [Ping timeout: 272 seconds] |
09:12:29 | | kitonthenet joins |
09:51:43 | <@Sanqui> | Thanks JAA. Problem is the ones that had offsite, sadly not enough foresight there. In the long term we will be making and keeping our own copies |
10:00:02 | | Bleo1826 quits [Client Quit] |
10:01:20 | | Bleo1826 joins |
10:03:36 | | icedice quits [Client Quit] |
10:05:50 | | kitonthenet quits [Ping timeout: 240 seconds] |
10:08:09 | | datechnoman (datechnoman) joins |
10:17:09 | | kitonthe1et joins |
10:42:43 | | kitonthe1et quits [Ping timeout: 272 seconds] |
10:54:06 | | kitonthe1et joins |
11:10:12 | | icedice (icedice) joins |
12:22:12 | | BearFortress quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
12:30:50 | | Arcorann quits [Ping timeout: 240 seconds] |
12:40:02 | | icedice2 (icedice) joins |
12:43:03 | | icedice quits [Ping timeout: 272 seconds] |
12:43:12 | | icedice (icedice) joins |
12:45:35 | | icedice2 quits [Ping timeout: 272 seconds] |
12:50:12 | | supercar99 joins |
12:50:20 | | kitonthe1et quits [Ping timeout: 240 seconds] |
12:50:24 | | supercar99 quits [Remote host closed the connection] |
12:50:33 | | supercar99 joins |
12:53:00 | | supercar99 quits [Remote host closed the connection] |
12:54:39 | | kitonthenet joins |
13:01:20 | | kitonthenet quits [Ping timeout: 240 seconds] |
13:01:43 | | kitonthenet joins |
13:23:32 | | icedice quits [Client Quit] |
13:24:13 | | kitonthenet quits [Ping timeout: 272 seconds] |
13:25:45 | | kitonthe1et joins |
13:28:11 | | gfhh quits [Client Quit] |
13:44:53 | | eroc19905 quits [Quit: The Lounge - https://thelounge.chat] |
13:49:27 | | eroc1990 (eroc1990) joins |
13:53:59 | | katocala quits [Ping timeout: 272 seconds] |
13:54:21 | | katocala joins |
13:55:01 | | icedice (icedice) joins |
14:03:45 | | Wohlstand (Wohlstand) joins |
14:04:10 | <imer> | Sanqui: would you like me to run that through the common crawl cdx? I have that lying around and from a quick spot check there are some matching links in there |
14:11:29 | | katocala is now authenticated as katocala |
14:21:54 | | gfhh joins |
14:51:30 | <@Sanqui> | imer: yes please, ^https?://(www.)?(uloz.to|ulozto.cz|ulozto.sk|ulozto.net|zachowajto.pl) |
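A minimal sketch of how a URL list could be filtered with the pattern from the message above, assuming the input is one URL per line. The pattern as posted leaves the dots unescaped, so `.` matches any character; the variant below escapes them and adds a trailing lookahead. Both changes are assumptions about the intent, not part of the original message, and the names `ULOZ_RE` and `matching_urls` are hypothetical.

```python
import re

# Escaped variant of the pattern quoted above. The trailing lookahead is an
# added assumption so that hosts such as "uloz.to.example.com" are rejected.
ULOZ_RE = re.compile(
    r"^https?://(www\.)?"
    r"(uloz\.to|ulozto\.cz|ulozto\.sk|ulozto\.net|zachowajto\.pl)"
    r"(?=[/:?#]|$)",
    re.IGNORECASE,
)


def matching_urls(urls):
    """Yield only the URLs whose host is one of the listed domains."""
    for url in urls:
        url = url.strip()
        if ULOZ_RE.match(url):
            yield url
```

Restricting the input to a subset of sites first (for example only `.cz` and `.sk` sources) would shrink the work, since the filter itself is a single regex match per line.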
14:58:31 | <kiska> | I could try the fdns data set I have |
14:59:13 | | kitonthe1et quits [Ping timeout: 272 seconds] |
14:59:57 | | kitonthe2et joins |
15:09:59 | | kitonthe2et quits [Ping timeout: 272 seconds] |
15:20:50 | | kitonthe2et joins |
15:48:27 | | icedice quits [Client Quit] |
15:54:19 | | kitonthe2et quits [Ping timeout: 272 seconds] |
15:58:41 | | kitonthenet joins |
15:59:58 | | Wohlstand quits [Client Quit] |
16:00:54 | <imer> | Sanqui: ack, will be a few days to run through it all |
16:06:02 | | Island joins |
16:26:01 | <@Sanqui> | imer: deadline is tomorrow, so probably no need then |
16:26:08 | <@Sanqui> | thanks though |
16:26:19 | <@Sanqui> | maybe if it's possible to run on a subset of .cz sites |
16:26:22 | <@Sanqui> | (and .sk) |
16:26:24 | <@Sanqui> | it would make sense |
16:27:02 | <imer> | oh. oops |
16:27:18 | <imer> | i'll toss you over the partial results then as I get them |
16:31:23 | | ArcticCircleSys joins |
17:24:17 | | ArcticCircleSys quits [Ping timeout: 265 seconds] |
17:28:18 | <@JAA> | Sanqui: Sometime in the future, all AB jobs' databases should be kept, and then this wouldn't be an issue. wpull still extracts all links when running with --no-offsite-links, it just then ignores them silently, so they only appear in the DB. |
17:32:52 | | parfait (kdqep) joins |
17:35:25 | <Vokun> | Can these sorts of links be put into AB? This person passed away and if possible i'd like to have these pages saved. Also, can AB grab a youtube channel? Just the pages, not videos. I already put it into downthetube |
17:35:25 | <Vokun> | https://www.instagram.com/chesyarts |
17:35:25 | <Vokun> | https://ko-fi.com/chesyarts |
17:35:25 | <Vokun> | https://www.tiktok.com/@chesyarts0w0 |
17:35:25 | <Vokun> | https://www.youtube.com/@chesyarts1691 |
17:37:11 | <pokechu22> | Vokun: I don't think any of those work properly in AB, no; all of those sites have strict rate-limiting and are JS-based, and AB will only get 429s |
17:37:56 | <Vokun> | rip |
17:39:32 | <fireonlive> | youtube can go to #down-the-tube as long as it's in scope https://wiki.archiveteam.org/index.php/YouTube#Scope (someone dying is) |
17:47:20 | | kitonthenet quits [Ping timeout: 240 seconds] |
17:48:46 | <Vokun> | I put it in. Thanks |
17:50:00 | <fireonlive> | :) |
17:54:40 | | kitonthenet joins |
18:15:33 | | aninternettroll quits [Ping timeout: 272 seconds] |
18:15:50 | | aninternettroll (aninternettroll) joins |
18:16:22 | | aninternettroll quits [Remote host closed the connection] |
18:18:29 | | aninternettroll (aninternettroll) joins |
18:22:47 | | Lej joins |
18:25:02 | | Lej leaves |
18:30:45 | <h2ibot> | Pokechu22 edited DokuWiki (+472, mention taskrunner): https://wiki.archiveteam.org/?diff=51230&oldid=51010 |
18:33:17 | | kitonthenet quits [Ping timeout: 272 seconds] |
18:44:37 | | kitonthe1et joins |
18:49:58 | | katocala quits [Remote host closed the connection] |
18:52:17 | | HP_Archivist quits [Ping timeout: 272 seconds] |
19:06:00 | | BlueMaxima joins |
19:08:02 | | polduran joins |
19:12:09 | | IDK (IDK) joins |
19:14:06 | <Pedrosso> | the archiveteam wikipage on bluesky is very short, has anything been done about that? |
19:20:50 | | kitonthe1et quits [Ping timeout: 240 seconds] |
19:22:39 | <polduran> | hello everyone. I might have something for the archivebot if anyone has time to put it in the queue: https://www.summoners-inn.de is the biggest and probably one of the oldest german league of legends news website with articles back to 2013. today, they announced the end of Summoner's Inn after their parent company Freaks4U lost their partnership |
19:22:40 | <polduran> | to host the official german League of Legends broadcast. |
19:25:05 | <pokechu22> | polduran: I've queued it, not sure how well it'll run though as they don't seem to have a sitemap |
19:26:10 | <pokechu22> | I also queued https://www.freaks4u.de |
19:30:31 | <polduran> | let's hope for the best^^ thank you. and yeah, good idea ^-^" maybe also the german LoL-league? https://www.primeleague.gg/ not sure if there is anything interesting on there and how and if the situation also affects this, but the website is hosted and copyrighted by freaks4u |
19:31:20 | | coretx quits [Ping timeout: 240 seconds] |
19:32:08 | | kitonthe1et joins |
19:32:24 | <pokechu22> | Alright |
19:33:05 | <polduran> | thanks again and have a nice day :D |
19:36:05 | | Megame (Megame) joins |
19:37:17 | | polduran quits [Remote host closed the connection] |
19:43:50 | | kitonthe1et quits [Ping timeout: 240 seconds] |
19:45:01 | <sdomi> | continuing on the discussion from #//; imer: what would be the best way to handle this JS mess? |
19:45:27 | <sdomi> | I can probably write a scraper that'll generate a list of URLs from these downloaders; there isn't much metadata to be saved anyways, so IMO saving just the ZIPs is a good starting point |
19:47:20 | | kitonthenet joins |
19:48:09 | <sdomi> | imer: hey, also, can you verify if the downloader3.html still works? I.. think I crashed it |
19:48:27 | | BornOn420_ (BornOn420) joins |
19:48:28 | <masterX244> | did you check with devtools how the EULA acceptance is handled? |
19:48:30 | <sdomi> | checked from two IPs and several browsers, no dice |
19:48:39 | <sdomi> | masterX244: on some of them there's no EULA at all |
19:48:48 | <masterX244> | with some luck that can be faked with some headers/constant request stuff |
19:48:49 | <sdomi> | so I'm focusing on that right now |
19:49:57 | <masterX244> | had a site once that had an ad-intercept on first download under a session, fooled that by "wasting" that with a url-parametered URL before the real crawl started |
19:50:27 | <sdomi> | https://f.sakamoto.pl/UwUMicKuA.png ,_, |
19:51:01 | <masterX244> | 2 "wasted" requests in the WARC but better than a lost one. POST sucks for archivebot though |
19:52:04 | <sdomi> | masterX244: no, no; i'm not getting any responses anymore |
19:52:07 | <sdomi> | oh, it's back now |
19:52:26 | <sdomi> | so what I did was.. I tried a wildcard instead of the version number, just to check what would happen |
19:52:27 | | BornOn420 quits [Ping timeout: 272 seconds] |
19:52:41 | <masterX244> | ahh, poking around for shortcuts |
19:52:43 | <sdomi> | and it seems that it crashed their entire API for a solid minute |
19:52:56 | <sdomi> | so. uh. we need to be careful around this one XD |
19:54:28 | <masterX244> | cockroach-infested area :(, that sucks |
19:55:39 | <sdomi> | btw, how does WARC work? I know that I can run a mitm proxy for myself, but how would I go about handing it over to IA? what are the steps/precautions/who do I need to talk to...? :p |
19:55:54 | <nicolas17> | "you don't" |
19:56:44 | <nicolas17> | you can upload WARC files to archive.org, but they won't be used by web.archive.org, because there's no way to know if they actually match the website you mirrored or if you messed with the content (accidentally or intentionally) |
19:56:54 | <sdomi> | yes, that I know |
19:57:25 | <sdomi> | i was more asking about... what steps do I take to actually get the content preserved with y'alls help? |
20:04:00 | <fireonlive> | a project/mini-project proposal let’s say :3 |
20:05:55 | <sdomi> | figured out how the EULA stuff works! it's a static JS function that takes params from the current URL |
20:06:05 | <sdomi> | so this is very much possible to automate |
20:06:27 | <sdomi> | function in question: https://pastebin.com/9bsxLDLu |
20:09:20 | | kitonthenet quits [Ping timeout: 240 seconds] |
20:11:10 | | katocala joins |
20:14:28 | | katocala is now authenticated as katocala |
20:14:45 | | kitonthenet joins |
20:38:02 | <imer> | sdomi: sorry, stepped away for a bit, I have not the slightest idea how to do this - although I am probably not the person to ask haha |
20:38:15 | <sdomi> | imer: writing a scraper as we speak :p |
20:38:18 | <imer> | nice |
20:40:50 | | kitonthenet quits [Ping timeout: 240 seconds] |
20:52:00 | | kitonthe1et joins |
20:54:10 | | Webuser533 joins |
20:57:31 | | aninternettroll quits [Read error: Connection reset by peer] |
20:57:34 | | aninternettroll_ (aninternettroll) joins |
20:57:41 | | aninternettroll_ is now known as aninternettroll |
20:57:52 | <Webuser533> | could you help me find an archive of this video https://www.youtube.com/watch?v=V3gbrP2U10A ? |
21:05:05 | <that_lurker> | #youtubearchive would be a fitting channel for that question |
21:06:26 | <Webuser533> | alright thank you ! |
21:07:18 | | Webuser533 leaves |
21:37:10 | | Naruyoko5 joins |
21:39:29 | | Naruyoko quits [Ping timeout: 272 seconds] |
21:52:29 | | HP_Archivist (HP_Archivist) joins |
21:54:45 | | DogsRNice joins |
22:01:39 | | Megame quits [Client Quit] |
22:13:41 | | kitonthe1et quits [Ping timeout: 272 seconds] |
22:13:59 | | kitonthenet joins |
22:31:12 | | Dango360_ joins |
22:32:58 | | _Dango360 joins |
22:34:50 | | Dango360 quits [Ping timeout: 240 seconds] |
22:37:07 | | Dango360_ quits [Ping timeout: 272 seconds] |
22:43:20 | | kitonthenet quits [Ping timeout: 240 seconds] |
22:56:02 | | kitonthenet joins |
23:22:56 | <sdomi> | https://pastebin.com/gAwF2bwc URLs |
23:34:55 | <sdomi> | https://f.sakamoto.pl/nvidia_rescue.tar.gz here's the code I wrote |
23:36:49 | <sdomi> | turns out that most docs URLs are completely dead already, or point to generic sites that have likely been archived for ages. i'm downloading real "data" locally right now, gonna upload as an item onto IA later ^-^ |
23:37:55 | | kitonthenet quits [Ping timeout: 272 seconds] |
23:40:22 | | kitonthe2et joins |
23:45:50 | | kitonthe2et quits [Ping timeout: 240 seconds] |
23:46:55 | | aninternettroll quits [Read error: Connection reset by peer] |
23:47:38 | | aninternettroll (aninternettroll) joins |
23:52:09 | | kitonthenet joins |
23:53:06 | | _Dango360 quits [Client Quit] |
23:53:26 | | Dango360 (Dango360) joins |