00:02:25 | | monoxane8 (monoxane) joins |
00:03:09 | | monoxane quits [Ping timeout: 272 seconds] |
00:03:09 | | monoxane8 is now known as monoxane |
00:03:47 | | etnguyen03 quits [Ping timeout: 272 seconds] |
00:08:30 | | ealia joins |
00:09:47 | | ealia quits [Remote host closed the connection] |
00:11:41 | | etnguyen03 (etnguyen03) joins |
00:15:55 | | monoxane quits [Client Quit] |
00:16:18 | | Megame (Megame) joins |
00:23:07 | | kitonthenet joins |
00:24:03 | | tzt quits [Ping timeout: 272 seconds] |
00:25:41 | | HP_Archivist quits [Read error: Connection reset by peer] |
00:27:44 | | kitonthenet quits [Ping timeout: 265 seconds] |
00:27:49 | | tzt (tzt) joins |
00:28:29 | | systwi quits [Ping timeout: 272 seconds] |
00:34:00 | <h2ibot> | JustAnotherArchivist edited Blogger (+182, Fix source and tracker links, update status): https://wiki.archiveteam.org/?diff=51174&oldid=51148 |
00:37:09 | | monoxane (monoxane) joins |
00:45:42 | | systwi (systwi) joins |
01:00:37 | | atphoenix_ quits [Remote host closed the connection] |
01:01:20 | | atphoenix_ (atphoenix) joins |
01:18:31 | | etnguyen03 quits [Ping timeout: 272 seconds] |
01:19:18 | | kitonthenet joins |
01:36:33 | | etnguyen03 (etnguyen03) joins |
01:49:25 | | kitonthenet quits [Ping timeout: 265 seconds] |
01:56:04 | | Megame quits [Client Quit] |
01:58:44 | | DogsRNice_ quits [Read error: Connection reset by peer] |
01:58:58 | | DogsRNice joins |
02:09:06 | <phuz-test> | Anyone wanna archive the Questionable Content forums? (https://forums.questionablecontent.net) It's a webforum with approximately 900k posts, about 20 years old. Mostly webcomic related content. |
02:09:09 | | phuz-test is now known as phuzion |
02:09:59 | <nicolas17> | why though? is it at risk of dying? |
02:10:40 | <phuzion> | Yeah, the comic's server is showing 503s a lot, and it's likely that they're going to move to a new host. The forum was locked some time ago, and no new posts or registrations are allowed |
02:10:49 | <phuzion> | Locked as of Jan 1 this year. |
02:11:12 | <nicolas17> | oh :| I didn't know of that |
02:11:40 | <phuzion> | The current speculation is that Jeph (comic author) and his tech team might opt not to migrate the forums because of the additional complexity in doing so |
02:11:53 | | nicolas17 hasn't even read the comics in a few years |
02:14:15 | | etnguyen03 quits [Ping timeout: 272 seconds] |
02:14:48 | | dumbgoy joins |
02:15:24 | | Pedrosso didn't know of its existence |
02:15:41 | | Pedrosso wants it saved anyway, of course |
02:18:03 | | dumbgoy__ quits [Ping timeout: 272 seconds] |
02:21:36 | <pokechu22> | phuzion: I've queued an archivebot job for them - 900k posts is large but should be doable |
02:22:36 | <phuzion> | pokechu22: the forums are behind cloudflare, so I'd check to make sure that it's working properly at some point. |
02:23:52 | | etnguyen03 (etnguyen03) joins |
02:25:55 | | CandidSparrow2 joins |
02:26:53 | <pokechu22> | Yeah - there's other stuff running in archivebot right now so it's not started yet, but if it finishes abnormally quickly then I'll know it's cloudflare at least (not sure what we could do about it for something that large though) |
02:28:11 | | CandidSparrow quits [Ping timeout: 272 seconds] |
02:28:11 | | CandidSparrow2 is now known as CandidSparrow |
02:37:18 | <@JAA> | ohai |
02:37:39 | <@JAA> | Looks like their Buttflare config isn't very aggressive, so should be possible even if it doesn't work with AB. |
02:40:14 | | BlueMaxima joins |
02:42:11 | <Pedrosso> | should DeviantArt's sitemap be grabbed proactively? I'm surprised it hasn't been hidden from public yet and it's very big |
02:43:11 | <@JAA> | Seems to be running fine, albeit with timeouts and general sluggishness. |
02:54:29 | | dumbgoy_ joins |
02:57:57 | | dumbgoy quits [Ping timeout: 272 seconds] |
03:03:57 | | kitonthe1et joins |
03:19:17 | <phuzion> | JAA: Yeah that server is creaky right now. The main comic page doesn't load about 1/3 of the time, it seems. |
03:24:24 | | systwi_ joins |
03:28:21 | | Barto quits [Ping timeout: 272 seconds] |
03:34:03 | | kitonthe1et quits [Ping timeout: 272 seconds] |
03:35:44 | <@JAA> | Barto pls |
03:35:47 | <@JAA> | Worst SLA ever |
03:35:52 | <fireonlive> | xD |
03:38:56 | | BlueMaxima_ joins |
03:39:25 | | CraftByte quits [Client Quit] |
03:39:25 | | BlueMaxima quits [Remote host closed the connection] |
03:39:25 | | atphoenix_ quits [Remote host closed the connection] |
03:39:25 | | DigitalDragons quits [Client Quit] |
03:39:25 | | AK quits [Client Quit] |
03:39:25 | | CandidSparrow quits [Client Quit] |
03:39:25 | | DogsRNice quits [Remote host closed the connection] |
03:39:26 | | DogsRNice joins |
03:39:30 | | CraftByte (DragonSec|CraftByte) joins |
03:39:36 | | CandidSparrow joins |
03:39:37 | | DigitalDragons (DigitalDragons) joins |
03:39:51 | | atphoenix_ (atphoenix) joins |
03:39:51 | | AK (AK) joins |
04:04:12 | | kitonthenet joins |
04:07:08 | | dumbgoy__ joins |
04:10:04 | | dumbgoy_ quits [Ping timeout: 265 seconds] |
04:12:58 | | kitonthenet quits [Ping timeout: 265 seconds] |
04:27:15 | | dumbgoy__ quits [Ping timeout: 272 seconds] |
04:55:17 | | BlueMaxima_ quits [Read error: Connection reset by peer] |
05:01:39 | | imer quits [Remote host closed the connection] |
05:02:49 | | imer (imer) joins |
05:03:21 | | Ruthalas59 quits [Ping timeout: 272 seconds] |
05:05:46 | | imer quits [Remote host closed the connection] |
05:07:07 | | imer (imer) joins |
05:10:10 | | imer quits [Remote host closed the connection] |
05:11:24 | | imer (imer) joins |
05:12:38 | | imer quits [Remote host closed the connection] |
05:41:05 | <project10> | JAA: are the logs from an AB job saved/accessible anywhere? |
05:45:31 | | imer (imer) joins |
05:46:04 | <@JAA> | project10: Yes, they're in the *-meta.warc.gz file. For aborted or crashed jobs, there's a -wpull.log.gz file instead, though that isn't indexed by the viewer; it should normally be in the same item as the *.json file. |
05:46:43 | <project10> | cool, I kinda had an inkling it might be saved in the data uploaded to IA itself. Throw nothing away and all that |
05:47:20 | <fireonlive> | =] |
05:47:34 | <@JAA> | Yeah, we do currently throw away the DB file though, which has some data that's hard to extract otherwise and is much more suitable for many analysis things. |
05:48:17 | <fireonlive> | ah like a quick sweep for failed urls or certain outlinks i suppose |
05:48:19 | <@JAA> | There's a... uh... three years old issue about it: https://github.com/ArchiveTeam/ArchiveBot/issues/465 |
05:49:19 | <@JAA> | Yeah. And some links get indexed by wpull but silently ignored. They only appear in the raw responses and in the DB. |
05:50:19 | | Barto (Barto) joins |
05:50:20 | <fireonlive> | ahh |
05:55:08 | | c3manu (c3manu) joins |
06:08:08 | | Wohlstand quits [Client Quit] |
06:14:17 | | etnguyen03 quits [Client Quit] |
06:14:44 | | Island quits [Read error: Connection reset by peer] |
06:23:06 | | c3manu quits [Remote host closed the connection] |
06:28:02 | | Ruthalas59 (Ruthalas) joins |
06:29:02 | | DogsRNice quits [Read error: Connection reset by peer] |
06:33:00 | | hitgrr8 joins |
06:45:19 | | Earendil7 quits [Ping timeout: 272 seconds] |
06:45:52 | | Earendil7 (Earendil7) joins |
07:09:06 | | fireonlive quits [Killed (NickServ (GHOST command used by fireonlive5))] |
07:09:49 | | fireonlive (fireonlive) joins |
07:14:50 | | sec^nd quits [*.net *.split] |
07:31:39 | | Arcorann (Arcorann) joins |
07:35:20 | | sonick (sonick) joins |
07:42:12 | <sonick> | Has anyone already mentioned dotup.org and its light version, light.dotup.org, the websites that will be shut down on November 30? |
07:43:00 | <sonick> | These sites are relatively simple and could be done by AB. |
07:45:02 | <sonick> | The light version and the normal version seem to have different size limits for uploading and different uploaded content. |
08:00:21 | | nfriedly quits [Remote host closed the connection] |
08:18:04 | | Dango360 quits [Read error: Connection reset by peer] |
08:35:54 | | nicolas17 quits [Ping timeout: 265 seconds] |
08:39:34 | | nicolas17 joins |
08:41:57 | <pabs> | sonick: JAA did the non-lite version on 20231103 |
08:42:37 | <pabs> | https://archive.fart.website/archivebot/viewer/?q=dotup.org |
08:44:59 | <pabs> | stuck light one in AB now |
08:45:32 | <pabs> | hmm, file uploads are still enabled |
08:46:03 | <pabs> | it's in Deathwatch so I guess someone will do another save near the deadline |
08:59:34 | <sonick> | ok, thanks. |
09:24:40 | | Vokun quits [*.net *.split] |
09:24:41 | | that_lurker|m quits [*.net *.split] |
09:24:41 | | M--mlv|m quits [*.net *.split] |
09:24:41 | | hillow596|m quits [*.net *.split] |
09:24:41 | | sonst-was|m quits [*.net *.split] |
09:24:41 | | qq44|m quits [*.net *.split] |
09:24:41 | | Misty|m quits [*.net *.split] |
09:24:41 | | username675f|m quits [*.net *.split] |
09:24:41 | | AntoninDelFabbro|m quits [*.net *.split] |
09:24:41 | | Peetz0r|m quits [*.net *.split] |
09:24:41 | | marius851000 quits [*.net *.split] |
09:24:41 | | EmeraldSnorlax|m quits [*.net *.split] |
09:24:41 | | ram|m quits [*.net *.split] |
09:24:41 | | kaz__|m quits [*.net *.split] |
09:24:41 | | Passiing|m quits [*.net *.split] |
09:24:41 | | noxious quits [*.net *.split] |
09:24:41 | | EvanBoehs|m quits [*.net *.split] |
09:24:41 | | Maakuth|m quits [*.net *.split] |
09:24:41 | | trumad|m quits [*.net *.split] |
09:24:41 | | NickS|m quits [*.net *.split] |
09:24:41 | | haha-whered-it-go|m quits [*.net *.split] |
09:24:41 | | joepie91|m quits [*.net *.split] |
09:24:41 | | yetanotherarchiver|m quits [*.net *.split] |
09:24:41 | | gwetchen|m quits [*.net *.split] |
09:24:42 | | superusercode quits [*.net *.split] |
09:24:42 | | noobirc|m quits [*.net *.split] |
09:24:42 | | lasdkfj|m quits [*.net *.split] |
09:24:42 | | GRBaset quits [*.net *.split] |
09:24:42 | | nyuuzyou quits [*.net *.split] |
09:24:42 | | Cydog|m quits [*.net *.split] |
09:24:42 | | will|m quits [*.net *.split] |
09:24:42 | | JC|m quits [*.net *.split] |
09:24:42 | | pannekoek11|m quits [*.net *.split] |
09:24:42 | | jwoglom|m quits [*.net *.split] |
09:24:42 | | gungagungagunga|m quits [*.net *.split] |
09:24:42 | | jevinskie quits [*.net *.split] |
09:24:42 | | coro quits [*.net *.split] |
09:24:43 | | t3chler|m quits [*.net *.split] |
09:24:43 | | qyxojzh|m quits [*.net *.split] |
09:24:43 | | hlgs|m quits [*.net *.split] |
09:24:43 | | thermospheric quits [*.net *.split] |
09:24:43 | | akaibu|m quits [*.net *.split] |
09:24:43 | | cmostracker|m quits [*.net *.split] |
09:24:43 | | Max|m12 quits [*.net *.split] |
09:24:43 | | Video quits [*.net *.split] |
09:24:43 | | nosamu|m quits [*.net *.split] |
09:24:43 | | masterx244|m quits [*.net *.split] |
09:24:43 | | voltagex|m quits [*.net *.split] |
09:24:43 | | mikolaj|m quits [*.net *.split] |
09:24:43 | | iCesenberk|m quits [*.net *.split] |
09:24:43 | | octylFractal|m quits [*.net *.split] |
09:24:43 | | wrangle|m quits [*.net *.split] |
09:24:43 | | tech234a|m quits [*.net *.split] |
09:24:44 | | Roki_100|m quits [*.net *.split] |
09:24:44 | | Ruk8 quits [*.net *.split] |
09:24:44 | | finalti|m quits [*.net *.split] |
09:24:44 | | jackt1365|m quits [*.net *.split] |
09:24:44 | | saouroun|m quits [*.net *.split] |
09:24:44 | | moe-a-m|m quits [*.net *.split] |
09:24:44 | | schwarzkatz|m quits [*.net *.split] |
09:24:44 | | alexshpilkin quits [*.net *.split] |
09:24:44 | | madpro|m quits [*.net *.split] |
09:24:45 | | Minkafighter|m quits [*.net *.split] |
09:24:45 | | yzqzss quits [*.net *.split] |
09:24:45 | | x9fff00 quits [*.net *.split] |
09:24:45 | | Exorcism quits [*.net *.split] |
09:24:45 | | phaeton quits [*.net *.split] |
09:24:45 | | Tom|m1 quits [*.net *.split] |
09:24:45 | | vexr quits [*.net *.split] |
09:24:45 | | Froxcey|m quits [*.net *.split] |
09:24:45 | | manu|m quits [*.net *.split] |
09:24:45 | | CrispyAlice2 quits [*.net *.split] |
09:24:45 | | s-crypt|m quits [*.net *.split] |
09:24:45 | | flashfire42|m quits [*.net *.split] |
09:24:45 | | Fletcher quits [*.net *.split] |
09:24:45 | | ragu|m quits [*.net *.split] |
09:24:45 | | MinePlayersPEMyNey|m quits [*.net *.split] |
09:24:45 | | Thibaultmol quits [*.net *.split] |
09:24:45 | | nstrom|m quits [*.net *.split] |
09:24:45 | | Hans5958 quits [*.net *.split] |
09:24:45 | | theblazehen|m quits [*.net *.split] |
09:24:45 | | rewby|m quits [*.net *.split] |
09:24:45 | | mpeter|m quits [*.net *.split] |
09:24:45 | | tomodachi94 quits [*.net *.split] |
09:24:45 | | audrooku|m quits [*.net *.split] |
09:24:45 | | xxia|m quits [*.net *.split] |
09:24:45 | | britmob|m quits [*.net *.split] |
09:24:45 | | andrewvieyra|m quits [*.net *.split] |
09:24:45 | | DigitalDragon quits [*.net *.split] |
09:24:45 | | mind_combatant quits [*.net *.split] |
09:24:45 | | @Sanqui|m quits [*.net *.split] |
09:24:45 | | igneousx quits [*.net *.split] |
09:24:45 | | Ajay quits [*.net *.split] |
09:28:19 | | marius851000 joins |
09:28:19 | | sonst-was|m joins |
09:28:19 | | that_lurker|m joins |
09:28:19 | | alexshpilkin joins |
09:28:19 | | qq44|m joins |
09:28:19 | | hillow596|m joins |
09:28:19 | | Misty|m joins |
09:28:19 | | username675f|m joins |
09:28:19 | | AntoninDelFabbro|m joins |
09:28:19 | | EmeraldSnorlax|m joins |
09:28:19 | | Peetz0r|m joins |
09:28:19 | | ram|m joins |
09:28:19 | | Passiing|m joins |
09:28:19 | | kaz__|m joins |
09:28:19 | | yzqzss joins |
09:28:19 | | Vokun joins |
09:28:19 | | tomodachi94 joins |
09:28:19 | | moe-a-m|m joins |
09:28:19 | | Sanqui|m joins |
09:28:19 | | Thibaultmol joins |
09:28:19 | | Tom|m1 joins |
09:28:19 | | Exorcism joins |
09:28:19 | | nstrom|m joins |
09:28:19 | | CrispyAlice2 joins |
09:28:19 | | Max|m12 joins |
09:28:19 | | thermospheric joins |
09:28:19 | | DigitalDragon joins |
09:28:19 | | GRBaset joins |
09:28:19 | | Froxcey|m joins |
09:28:19 | | JC|m joins |
09:28:19 | | x9fff00 joins |
09:28:19 | | iCesenberk|m joins |
09:28:19 | | phaeton joins |
09:28:19 | | octylFractal|m joins |
09:28:19 | | Roki_100|m joins |
09:28:19 | | schwarzkatz|m joins |
09:28:19 | | Minkafighter|m joins |
09:28:19 | | britmob|m joins |
09:28:19 | | mind_combatant joins |
09:28:19 | | Hans5958 joins |
09:28:19 | | xxia|m joins |
09:28:19 | | flashfire42|m joins |
09:28:19 | | qyxojzh|m joins |
09:28:19 | | noxious joins |
09:28:19 | | EvanBoehs|m joins |
09:28:19 | | coro joins |
09:28:19 | | Video joins |
09:28:19 | | yetanotherarchiver|m joins |
09:28:19 | | noobirc|m joins |
09:28:19 | | gungagungagunga|m joins |
09:28:19 | | cmostracker|m joins |
09:28:19 | | trumad|m joins |
09:28:19 | | Cydog|m joins |
09:28:19 | | nosamu|m joins |
09:28:19 | | s-crypt|m joins |
09:28:19 | | jwoglom|m joins |
09:28:19 | | superusercode joins |
09:28:19 | | nyuuzyou joins |
09:28:19 | | vexr joins |
09:28:19 | | wrangle|m joins |
09:28:19 | | will|m joins |
09:28:19 | | jevinskie joins |
09:28:19 | | gwetchen|m joins |
09:28:19 | | voltagex|m joins |
09:28:19 | | Fletcher joins |
09:28:19 | | manu|m joins |
09:28:19 | | finalti|m joins |
09:28:19 | | lasdkfj|m joins |
09:28:19 | | masterx244|m joins |
09:28:19 | | NickS|m joins |
09:28:20 | | akaibu|m joins |
09:28:20 | | igneousx joins |
09:28:20 | | audrooku|m joins |
09:28:20 | | Ajay joins |
09:28:20 | | ragu|m joins |
09:28:20 | | Maakuth|m joins |
09:28:20 | | madpro|m joins |
09:28:20 | | jackt1365|m joins |
09:28:20 | | t3chler|m joins |
09:28:20 | | haha-whered-it-go|m joins |
09:28:20 | | saouroun|m joins |
09:28:20 | | andrewvieyra|m joins |
09:28:20 | | theblazehen|m joins |
09:28:20 | | hlgs|m joins |
09:28:20 | | mpeter|m joins |
09:28:20 | | mikolaj|m joins |
09:28:20 | | Ruk8 joins |
09:28:20 | | MinePlayersPEMyNey|m joins |
09:28:20 | | tech234a|m joins |
09:28:20 | | pannekoek11|m joins |
09:28:20 | | joepie91|m joins |
09:28:20 | | rewby|m joins |
10:00:03 | | Bleo182 quits [Client Quit] |
10:01:23 | | Bleo182 joins |
10:47:21 | | s-crypt2 is now known as s-crypt |
10:52:35 | | s-crypt|m is now known as s-crypt|m|m |
10:56:45 | | MetaNova quits [Ping timeout: 272 seconds] |
11:01:29 | | MetaNova (MetaNova) joins |
11:25:55 | | atphoenix_ quits [Remote host closed the connection] |
11:26:18 | | atphoenix_ (atphoenix) joins |
11:34:24 | | DigitalDragons quits [Client Quit] |
11:34:38 | | DigitalDragons (DigitalDragons) joins |
11:48:25 | | nfriedly joins |
11:51:51 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
11:51:52 | | DigitalDragons quits [Client Quit] |
11:52:06 | | DigitalDragons (DigitalDragons) joins |
12:23:24 | | yzqzss quits [Client Quit] |
12:23:40 | | yzqzss (yzqzss) joins |
12:28:02 | | razul quits [Quit: Bye -] |
12:29:14 | | razul joins |
12:34:55 | | Arcorann quits [Ping timeout: 272 seconds] |
12:53:15 | | kitonthenet joins |
12:54:03 | | Megame (Megame) joins |
12:58:21 | | kitonthenet quits [Ping timeout: 272 seconds] |
12:59:04 | | kitonthenet joins |
13:03:40 | | kitonthenet quits [Ping timeout: 265 seconds] |
13:37:20 | | kitonthenet joins |
13:42:03 | | kitonthenet quits [Ping timeout: 272 seconds] |
13:55:06 | | Chris5010 (Chris5010) joins |
14:26:22 | | sec^nd (second) joins |
14:30:12 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
14:38:29 | | kitonthenet joins |
14:38:46 | | etnguyen03 (etnguyen03) joins |
14:47:35 | | kitonthenet quits [Ping timeout: 265 seconds] |
15:01:02 | | kitonthe1et joins |
15:05:35 | | Exorcism is now authenticated as exorcism |
15:05:35 | | Exorcism quits [Changing host] |
15:05:35 | | Exorcism (exorcism) joins |
15:08:49 | | kitonthe1et quits [Ping timeout: 272 seconds] |
15:46:01 | <@JAA> | sonick, pabs: That job only grabbed the most recent files because the pagination's limited, but I intend to do another run bruteforcing the older files (extensions have to be guessed). |
15:46:04 | | M--mlv|m joins |
15:57:23 | <@JAA> | If anyone is able to access https://javiermilei.com/ , a grab-site crawl would be great. I tried from machines in 9 countries and got blocked everywhere. It might need to be run in Argentina. |
15:57:39 | | HP_Archivist (HP_Archivist) joins |
15:58:16 | <anarchat> | i'm blocked by CF there as well |
16:00:17 | | ehmry joins |
16:03:28 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
16:06:04 | | Island joins |
16:19:14 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
16:24:13 | | bf_ joins |
16:32:33 | | bf_ quits [Remote host closed the connection] |
16:41:10 | <nicolas17> | JAA: yeah works for me, I think he had some kind of raffle a while ago so they had to stop people using bots from foreign IPs? |
16:41:40 | <nicolas17> | give me a tl;dr for grab-site |
16:51:53 | <@JAA> | nicolas17: Nice. The way I run grab-site is without the web interface and stuff, one container per target site: https://gitea.arpa.li/JustAnotherArchivist/grab-site-docker |
16:52:59 | | bf_ joins |
16:54:28 | <@JAA> | Since I can't look at the site, no idea what options are required here. :-| |
17:03:27 | | dumbgoy__ joins |
17:07:52 | <that_lurker> | Do you use docker exec for ignores and such or just yolo it without |
17:10:48 | <@JAA> | nano as root |
17:11:01 | <@JAA> | No risk, no fun. |
17:11:06 | | bf_ quits [Remote host closed the connection] |
17:11:34 | <@JAA> | But since it's a mount, it should be fine. |
17:12:11 | <fireonlive> | no vim for JAA :o? |
17:12:18 | | sonick quits [Client Quit] |
17:12:41 | <project10> | no emacs for JAA? |
17:13:50 | <@JAA> | I'd normally use a magnetised needle, but that's kind of hard to do remotely. |
17:14:01 | <fireonlive> | true true |
17:14:09 | <fireonlive> | we need to get you one of those surgeon robots |
17:14:11 | <@JAA> | Butterflies would work though, I suppose. |
17:14:58 | | bf_ joins |
17:15:02 | <kpcyrd> | I wish we could archive the editor jokes eventually |
17:16:30 | | @JAA archives kpcyrd. |
17:16:39 | <that_lurker> | I like to rearrange the 1s and 0s with an electromagnet. Might edit the text or might kill the disk, but at least I'm not running as root so I'm safe |
17:16:44 | | kpcyrd .zip |
17:17:55 | <nicolas17> | JAA: do I need to build the container or is it in some repo already? |
17:18:17 | <that_lurker> | the build command is in the readme |
17:18:43 | <@JAA> | ^ |
17:18:55 | <@JAA> | Haven't pushed it anywhere, no. |
17:18:55 | <nicolas17> | yes |
17:19:02 | <nicolas17> | that_lurker: I'm asking if I have to :P |
17:19:49 | | kitonthenet joins |
17:25:16 | <nicolas17> | ugh |
17:25:30 | <nicolas17> | JAA: does grab-site run as an unprivileged user inside the container? |
17:27:05 | <nicolas17> | [Errno 13] Permission denied: '/data/javiermilei.com-2023-11-20-e125cf6c' |
17:27:31 | | kitonthenet quits [Ping timeout: 272 seconds] |
17:27:38 | <@JAA> | nicolas17: Yes: https://gitea.arpa.li/JustAnotherArchivist/grab-site-docker/src/commit/398726f73e84233a584fe096d916799fa3c90006/Dockerfile#L48 |
17:28:05 | <nicolas17> | so I guess I need to make the data dir world writable |
17:28:31 | <nicolas17> | RuntimeError: html5-parser and lxml are using different versions of libxml2. This happens commonly when using pip installed versions of lxml. Use pip install --no-binary lxml lxml instead. libxml2 versions: html5-parser: (2, 9, 14) != lxml: (2, 10, 3) |
17:28:32 | <@JAA> | Hmm, yeah, there might be room for improvement. |
17:28:49 | <@JAA> | Ugh |
17:29:14 | <nicolas17> | non-reproducible build tsk tsk |
17:29:58 | <fireonlive> | you developers and your chmod 777 😾 |
17:32:44 | <@JAA> | I'll take a look at the libxml2 issue in a sec. |
17:33:58 | <project10> | . |
17:34:02 | | Island quits [Remote host closed the connection] |
17:34:02 | <AK> | my builds are reproducible, they will fail every time 🤷 |
17:34:12 | | Island joins |
17:36:40 | <fireonlive> | xD |
17:37:31 | <h2ibot> | Megame edited Deathwatch (+219, /* 2023 */ Okada Books - Nov 30): https://wiki.archiveteam.org/?diff=51175&oldid=51170 |
17:38:42 | | bf_ quits [Remote host closed the connection] |
17:40:41 | | c3manu (c3manu) joins |
17:42:42 | | icedice (icedice) joins |
17:43:28 | | bf_ joins |
18:00:21 | | scurvy__dog__ joins |
18:00:25 | <@JAA> | Is there an equivalent to https://snapshot.debian.org/ for Alpine, such that you can 'install packages as they were at a specific datetime'? |
18:02:10 | <nicolas17> | forcing it to alpine 3.13 isn't enough? there's breaking changes to packages within the same alpine version? bleh |
18:05:31 | <kpcyrd> | JAA: no, but please let me know if you find one |
18:06:21 | <@JAA> | nicolas17: I mean, it might be, but my point is rather about how to do reproducible builds with Alpine. |
18:06:35 | <kpcyrd> | sad news: you can't |
18:06:38 | <@JAA> | Welp |
18:06:55 | <kpcyrd> | the error is likely related to python dependencies and unrelated to alpine tho? |
18:07:00 | <@JAA> | Yeah |
18:07:10 | <@JAA> | I was thinking more broadly about reproducibility. |
18:07:12 | <nicolas17> | https://gitlab.alpinelinux.org/alpine/abuild/-/issues/9996 |
18:07:56 | <@JAA> | I was not aware they outright delete old packages. Oof. |
18:08:17 | <nicolas17> | so does debian |
18:08:23 | <nicolas17> | hence having a separate snapshot service :P |
18:08:44 | <kpcyrd> | the other problem with alpine is the build environments are not really documented, even if you have all old packages it's difficult to tell which ones you need to pick to re-create the original build environment |
18:10:04 | <kpcyrd> | other distros solve this with buildinfo files (the OG sbom basically), but Alpine is also stuck in this apk2-apk3 migration thing |
18:10:14 | <kpcyrd> | so they decided against adding buildinfo files to apk2 |
18:10:25 | <@JAA> | Well, yeah, but snapshot is part of the Debian project. So bit different, I think. (Although I believe snapshot.d.o might sometimes miss things if there are rapid uploads? I've heard something like that at least.) |
18:10:54 | <kpcyrd> | the only "proper" archive I'm aware of is https://archive.archlinux.org/ |
18:11:05 | <nicolas17> | snapshot.debian.org becoming an Official Part of the Project is relatively recent, it used to be snapshot.debian.net |
18:11:07 | <@JAA> | Yeah, Arch seems to do a good job at this. |
18:11:16 | <@JAA> | Ah, interesting. |
18:17:14 | <@JAA> | > The official recommendation is to keep your own mirror / repository with all the specific package and their versions that you may want to use. |
18:17:27 | <@JAA> | For Alpine. Ok then... |
18:18:03 | <kpcyrd> | 🤷 |
18:19:59 | <nicolas17> | but yeah I bet your problem is not pinning Python dep versions |
18:20:49 | | phaeton is now authenticated as phaeton |
18:20:49 | | phaeton quits [Changing host] |
18:20:49 | | phaeton (phaeton) joins |
18:21:18 | <@JAA> | Yeah, but indirectly. grab-site doesn't directly depend on html5-parser or lxml. |
18:22:16 | <nicolas17> | or you could push your working image somewhere :p |
18:22:40 | | scurvy__dog__ quits [Ping timeout: 265 seconds] |
18:25:47 | | etnguyen03 quits [Ping timeout: 272 seconds] |
18:26:29 | | bf_ quits [Remote host closed the connection] |
18:30:08 | | etnguyen03 (etnguyen03) joins |
18:37:24 | <kpcyrd> | JAA: the python ecosystem is very silly compared to other languages. Ideally you would have something like package-lock.json, Cargo.lock or composer.lock that records your dependency graph. |
18:38:36 | <fireonlive> | hmm you can pin requirements in the .txt can’t you |
18:38:44 | <fireonlive> | but i guess that’s also annoying |
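A minimal sketch of the pinning idea raised above, assuming a plain requirements.txt: it lists entries that lack an exact `==` version. The function name `unpinned` and the regular expression are illustrative assumptions, not part of pip or grab-site. Pinning only the listed packages does not lock transitive dependencies such as html5-parser or lxml, which is the gap lock files are meant to close.

```python
import re

# A requirement counts as pinned only if it uses an exact '==' specifier
# with no wildcard. This pattern is a simplification for illustration.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s;#*]+(?=\s|;|$)"
)


def unpinned(requirements_text):
    """Return the requirement lines that are not pinned to an exact version."""
    result = []
    for raw in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank lines, comments, and pip options such as -r / -e
        if not PINNED.match(line):
            result.append(line)
    return result


if __name__ == "__main__":
    sample = "lxml==4.9.3\nhtml5-parser\nrequests>=2.0  # loose\n"
    for entry in unpinned(sample):
        print("not pinned:", entry)
```

Running the script on the sample text reports `html5-parser` and `requests>=2.0` as not pinned.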
18:39:24 | | icedice quits [Client Quit] |
18:40:07 | <kpcyrd> | tl;dr "yeah idk lol" https://stackoverflow.com/questions/52665596/equivalent-of-package-json-and-package-lock-json-for-pip |
18:42:26 | <kpcyrd> | "python is supposed to be easy, can we have easy dependency management too?" - "we have easy dependency management at home" |
18:42:34 | <kpcyrd> | dependency management at home: https://stackoverflow.com/questions/58218592/feature-comparison-between-npm-pip-pipenv-and-poetry-package-managers |
18:47:19 | | etnguyen03 quits [Ping timeout: 272 seconds] |
18:47:25 | <fireonlive> | 😅 |
18:49:22 | | etnguyen03 (etnguyen03) joins |
18:52:28 | | Dango360 (Dango360) joins |
19:00:18 | <nicolas17> | kpcyrd: yet pipenv and poetry seem to do exactly what you say? |
19:01:00 | <fireonlive> | but who uses those |
19:03:47 | | kiska5 quits [Ping timeout: 272 seconds] |
19:03:47 | | Ryz quits [Ping timeout: 272 seconds] |
19:08:35 | | Megame quits [Client Quit] |
19:08:47 | <@JAA> | There's also pip-tools. But I don't disagree. |
19:09:32 | <@JAA> | On the other hand, those packages need to be *constantly* updated for bug or security fixes anywhere in the dependency tree, which is also very silly. |
19:09:40 | <@JAA> | those package lists* |
19:14:11 | | kitonthenet joins |
19:14:33 | | etnguyen03 quits [Ping timeout: 272 seconds] |
19:20:11 | | kitonthenet quits [Ping timeout: 265 seconds] |
19:38:18 | | me joins |
19:38:41 | | me quits [Remote host closed the connection] |
19:40:32 | | IDK_ joins |
19:40:59 | | Ryz (Ryz) joins |
19:48:12 | | Ryz quits [Excess Flood] |
19:52:30 | | kiska5 joins |
19:53:04 | | Ryz (Ryz) joins |
19:55:02 | <fireonlive> | hmm yeah |
19:55:43 | | nicolas17 quits [Ping timeout: 272 seconds] |
19:58:07 | | ScenarioPlanet (ScenarioPlanet) joins |
19:58:11 | | nicolas17 joins |
20:02:29 | | Gooshka joins |
20:02:54 | <Gooshka> | https://www.forbes.ru/biznes/494353-andeks-zadumalsa-o-prodaze-svoego-biznesa-v-izraile - Yandex thinks about selling its business located in Israel. |
20:03:19 | <Gooshka> | https://www.golosameriki.com/a/yandex-can-sell-its-entire-business-in-russia/7355003.html - Yandex can sell its entire business in Russia |
20:03:20 | <fireonlive> | oh interesting, thanks Gooshka. what sites/fronts does it have there? |
20:03:47 | <Gooshka> | I sent some links in AB channel. |
20:03:51 | <fireonlive> | oh wow; all of yandex |
20:04:00 | <fireonlive> | Gooshka: ah! i missed that. thanks as always :) |
20:05:44 | <Gooshka> | https://github.com/yandex/ , https://github.com/yandex-cloud/ , https://huggingface.co/yandex , https://yandex.ru/company/ , https://yandex.ru/legal/ , https://yandex.ru/support/ |
20:05:49 | <Gooshka> | etc. |
20:06:49 | | lumidify_ quits [Quit: leaving] |
20:09:37 | <Gooshka> | https://toloka.ai/ , https://toloka.ai/tolokers/ru/ (formerly https://toloka.yandex.ru/ ), has page on WKP: https://en.wikipedia.org/wiki/Toloka . |
20:11:27 | <Gooshka> | https://yandex.ru/dev/ - technologies of Yandex. |
20:15:15 | <Gooshka> | https://yatalks2023.com/ , https://yatalks.yandex.ru/ , I can't find YaTalks before 2023 on sites like yatalks2023.com, only pages like this: https://yatalks2023.com/2022/ru . |
20:15:37 | <nicolas17> | JAA: got a working grab-site yet? :P |
20:17:46 | | lumidify (lumidify) joins |
20:36:00 | <Gooshka> | https://habr.com/ru/companies/yandex/ - blog of Yandex team. |
20:36:50 | <Gooshka> | https://shedevrum.ai/ - AI by Yandex creates beautiful pictures of animals and people. |
20:41:35 | <Gooshka> | https://yandex.ru/lab/countries - game in which you guess what country is on photo. It follows Russian laws, so Abkhazia is not part of Georgia according to this. Player 2 is Alisa, AI by Yandex. Some other goods under /lab/ directory. |
20:50:50 | | Megame (Megame) joins |
21:02:09 | | dumbgoy joins |
21:02:16 | | dumbgoy quits [Read error: Connection reset by peer] |
21:03:18 | | Gooshka quits [Remote host closed the connection] |
21:05:23 | | dumbgoy__ quits [Ping timeout: 272 seconds] |
21:05:46 | | DigitalDragons quits [Client Quit] |
21:05:46 | | Megame quits [Remote host closed the connection] |
21:05:46 | | ScenarioPlanet quits [Remote host closed the connection] |
21:05:46 | | Island quits [Remote host closed the connection] |
21:05:48 | | Island joins |
21:05:50 | | Megame (Megame) joins |
21:05:52 | | ScenarioPlanet (ScenarioPlanet) joins |
21:05:59 | | DigitalDragons (DigitalDragons) joins |
21:15:40 | | dumbgoy joins |
21:21:01 | | BlueMaxima joins |
21:54:50 | <fireonlive> | hii; so i'm very crudely™ monitoring urls that archivebot hits until something more betterer is in place - so far i'm looking for blogger/blogspot and imgur.. any others I should look for? |
21:55:16 | <fireonlive> | imgur because most pipelines just get a 429 from imgur right away |
21:55:19 | <fireonlive> | (it seems) |
21:57:41 | | bf_ joins |
22:00:30 | | ThetaDev_ quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
22:00:38 | | ThetaDev joins |
22:01:45 | | wickedplayer494 quits [Ping timeout: 272 seconds] |
22:02:31 | | wickedplayer494 joins |
22:04:32 | | kitonthenet joins |
22:06:48 | <thuban> | fireonlive: https://wiki.archiveteam.org/index.php/Category:Projects_requiring_URL_lists mediafire |
22:07:07 | <fireonlive> | ah yes :) |
22:07:10 | <fireonlive> | thanks |
22:08:39 | | bf_ quits [Remote host closed the connection] |
22:09:05 | | bf_ joins |
22:09:21 | | kitonthenet quits [Ping timeout: 272 seconds] |
22:16:20 | | abirkill- (abirkill) joins |
22:18:51 | | abirkill quits [Ping timeout: 272 seconds] |
22:18:51 | | abirkill- is now known as abirkill |
22:25:56 | | icedice (icedice) joins |
22:28:39 | | c3manu quits [Client Quit] |
22:35:53 | | jacksonchen666 (jacksonchen666) joins |
22:37:04 | <@JAA> | fireonlive: Telegram, perhaps? |
22:37:34 | <@JAA> | You'll want to filter out the share links though. |
22:38:26 | <fireonlive> | ah right! |
22:43:56 | <thuban> | ah, someone should add the 'do you have a list' template to the telegram wiki page |
22:44:49 | <thuban> | idk exactly what the regex would be |
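A rough sketch of how such a URL sweep might classify hosts, including skipping Telegram share links. The host list, the `classify` helper, and the treatment of `/share` paths are assumptions for illustration, not ArchiveBot's or the tracker's actual logic; country-specific blogspot domains are not covered.

```python
import re
from urllib.parse import urlsplit

# Hypothetical host patterns for the services mentioned in the discussion.
# Each pattern matches the bare domain or any subdomain of it.
WATCHED = {
    "blogger": re.compile(r"(^|\.)(blogger\.com|blogspot\.com)$"),
    "imgur": re.compile(r"(^|\.)imgur\.com$"),
    "mediafire": re.compile(r"(^|\.)mediafire\.com$"),
    "telegram": re.compile(r"(^|\.)(t\.me|telegram\.me)$"),
}


def classify(url):
    """Return the watched service name for a URL, or None if it is not watched."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for name, pattern in WATCHED.items():
        if pattern.search(host):
            # Assumption: share links (t.me/share/url?url=...) only wrap
            # another URL, so they are skipped.
            if name == "telegram" and (
                parts.path == "/share" or parts.path.startswith("/share/")
            ):
                return None
            return name
    return None


if __name__ == "__main__":
    for candidate in (
        "https://i.imgur.com/abc.png",
        "https://t.me/share/url?url=https%3A%2F%2Fexample.com",
        "https://t.me/s/durov",
    ):
        print(candidate, "->", classify(candidate))
```

Matching on the parsed hostname rather than the raw URL string avoids false hits on lookalike domains or on URLs that merely mention a watched host in their query string.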
22:45:30 | | benjins2_ quits [Read error: Connection reset by peer] |
22:48:21 | <@JAA> | The reason it isn't there is that we don't currently have a bot that takes arbitrary URLs and extracts items for the tracker from it, like we do for Imgur and MediaFire. |
22:49:22 | | Megame1_ (Megame) joins |
22:53:49 | | Megame quits [Ping timeout: 265 seconds] |
22:55:56 | | ScenarioPlanet quits [Client Quit] |
22:57:55 | <@JAA> | Added it, but we won't be able to make full use of the lists easily yet. |
22:58:35 | <h2ibot> | JustAnotherArchivist edited Telegram (+113, Add URL list CTA): https://wiki.archiveteam.org/?diff=51176&oldid=50298 |
23:03:37 | | ScenarioPlanet (ScenarioPlanet) joins |
23:04:34 | <fireonlive> | JAA++ |
23:04:34 | <eggdrop> | [karma] 'JAA' now has 4 karma! |
23:05:11 | | bf_ quits [Remote host closed the connection] |
23:05:12 | | bf_ joins |
23:06:21 | | HP_Archivist quits [Ping timeout: 272 seconds] |
23:16:06 | | ScenarioPlanet quits [Client Quit] |
23:19:03 | | benjins2 joins |
23:21:06 | <thuban> | ah, fair |
23:31:33 | | jacksonchen666 quits [Client Quit] |
23:33:59 | | kitonthenet joins |
23:45:37 | | ymgve quits [Ping timeout: 272 seconds] |