00:04:42<driib>I have now progressed to trying https://github.com/internetarchive/warctools but it seems they are quite broken with Python 3.10, and from the bug tracker it seems I'd have to roll back with pyenv quite far, to 3.5 (and even then, there seems to be a large PR fixing lots of things for Python 3.5). Those tools look completely unmaintained. Is it because there is another set of tools that is more prevalent, or did the repo just languish, or is everyone in the archiving world still using Python 2? Please forgive my ignorance.
00:04:58<driib>I then found https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem#Tools (thanks!) and took 'warcat' for a spin as the first tool on the list that should work with Python 3. I just tried to put the decompressed WARC through it and it complains on every record: https://pastebin.com/9KaZKygx I guess my next question is: which of the tools listed on AT's wiki are compatible with the megawarcs produced by AT?
00:07:36<@JAA>The existing tooling for working with WARCs is horrid. I've been working on something to improve that, but other things keep coming up.
00:08:50<@JAA>For now, when I work with WARCs, it usually consists of pretty dumb shell pipelines. I wrote some helper tools to dump the contents of response records, process HTTP responses with transfer encoding, and crudely process HTML (last one still kind of WIP). You can find those in little-things as well.
00:09:58<@JAA>So I might do `curl -sL ${internetArchiveWarcUrl} | zstdwarccat | warc-dump-responses | http-response-bodies | grep ...`.
00:11:09<@JAA>There's also a `warc-tiny` which helps with a few things, and sometimes I modify that or `warc-dump-responses` for my specific needs (e.g. only dumping responses for particular URL patterns).
00:13:37<@JAA>But your impression is entirely correct: only a few tools exist to read WARCs in the first place, and they're basically all lacking maintenance or have awful bugs.
00:15:17<@JAA>The software that ingests WARCs into the Wayback Machine is also buggy and essentially impossible to reuse outside of that context.
00:17:19ThreeHM quits [Ping timeout: 252 seconds]
00:17:42ThreeHM (ThreeHeadedMonkey) joins
00:23:59<@JAA>If you only read WARCs and don't mind the minor corruption mentioned in the GitHub issues, you could use warcio I suppose.
00:24:55<@JAA>(Minor when it comes to reading and processing, usually. It's still a non-starter for any WARC writing.)
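For readers following the pipeline idea above, here is a minimal sketch of reading an already-decompressed WARC and dumping response bodies, using only the Python standard library. It is not the code behind `warc-dump-responses`, `http-response-bodies`, or warcio; the function names `iter_warc_records` and `response_bodies` are hypothetical. It assumes each record has a version line, headers, a blank line, and a block of exactly `Content-Length` bytes, and it does not handle folded header lines or HTTP transfer encoding.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, block) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line in (b"\r\n", b"\n"):
            continue  # blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the WARC header section
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers["content-length"])
        block = stream.read(length)
        if len(block) != length:
            raise ValueError("truncated record")
        yield headers, block


def response_bodies(stream: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (target URI, raw HTTP body) for every 'response' record."""
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        # The block is a raw HTTP response; the body follows the first blank line.
        # Chunked or compressed bodies are returned as-is, not decoded.
        _, _, body = block.partition(b"\r\n\r\n")
        yield headers.get("warc-target-uri", ""), body
```

Something like `for uri, body in response_bodies(open("file.warc", "rb")): ...` would then stand in for the first stages of the shell pipeline, with the filtering done in Python instead of `grep`.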
00:55:39TastyWiener95 (TastyWiener95) joins
00:57:11Mateon2 joins
00:58:12Mateon1 quits [Ping timeout: 252 seconds]
00:58:12Mateon2 is now known as Mateon1
01:07:25<lennier1>It's a little inconsistent, but I was able to get Scweet running with Chrome. I didn't even get through as many of @verified's followed accounts as with Firefox, though. I think it's just not possible to scroll through 420K accounts before it stops loading. Maybe if someone has access to the paid Twitter API they could get it. Not sure if there are any other options. If someone has already compiled a list of verified accounts, I couldn't find it.
01:12:33BlueMaxima joins
01:48:43<dumbgoy_>you guys been tracking archive.org's legal battle? seems like things shouldn't be archived to there if we want them for permanent use
01:49:59<dumbgoy_>but i've got no alternatives
01:56:50<@OrIdow6^2>See topic in #archiveteam for future reference for channel
01:57:07<@OrIdow6^2>And to make a short case for them... the fact that they can be sued and will comply is to an extent a good sign
01:57:33<@OrIdow6^2>Most of the "alternatives" are really shoddy things that steal storage from Google by using education accounts or whatever
02:02:22@OrIdow6^2 quits [Ping timeout: 252 seconds]
02:07:08<dumbgoy_>christ, i didn't read the topic in #archiveteam , that's my bad, oh, he quit
02:12:34OrIdow6 (OrIdow6) joins
02:12:34@ChanServ sets mode: +o OrIdow6
02:25:02Lord_Nightmare quits [Quit: ZNC - http://znc.in]
02:28:35Lord_Nightmare (Lord_Nightmare) joins
02:37:07__19100 quits [Client Quit]
04:00:01treora quits [Quit: blub blub.]
04:01:23treora joins
04:07:40umgr036 quits [Read error: Connection reset by peer]
04:10:33Lord_Nightmare quits [Client Quit]
04:10:53Lord_Nightmare (Lord_Nightmare) joins
04:15:34umgr036 joins
04:16:13umgr036 quits [Read error: Connection reset by peer]
04:16:34umgr036 joins
04:36:04michaelblob quits [Read error: Connection reset by peer]
04:38:29michaelblob (michaelblob) joins
05:03:40pabs quits [Client Quit]
05:04:01hackbug quits [Ping timeout: 265 seconds]
05:05:51pabs (pabs) joins
05:12:21Island quits [Read error: Connection reset by peer]
05:22:34katocala quits [Ping timeout: 252 seconds]
05:23:12katocala joins
05:32:37BlueMaxima quits [Read error: Connection reset by peer]
05:39:50hitgrr8 joins
06:01:26tzt quits [Ping timeout: 252 seconds]
06:13:20tzt (tzt) joins
06:29:50qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
08:14:17zhongfu quits [Quit: cya losers]
08:21:55zhongfu (zhongfu) joins
09:07:38TastyWiener95 quits [Ping timeout: 265 seconds]
09:29:40Jake quits [Remote host closed the connection]
09:29:53Jake (Jake) joins
09:30:58Jake quits [Remote host closed the connection]
09:31:10Jake (Jake) joins
09:31:57Jake quits [Remote host closed the connection]
09:32:15Jake (Jake) joins
09:32:59Jake quits [Remote host closed the connection]
09:33:12Jake (Jake) joins
09:33:58Jake quits [Remote host closed the connection]
09:34:10Jake (Jake) joins
09:34:56Jake quits [Remote host closed the connection]
09:35:08Jake (Jake) joins
09:35:53Jake quits [Remote host closed the connection]
09:36:05Jake (Jake) joins
09:36:35Minkafighter7225 quits [Client Quit]
09:36:51Jake quits [Read error: Connection reset by peer]
09:36:53Minkafighter7225 joins
09:37:03Jake (Jake) joins
09:37:48Jake quits [Remote host closed the connection]
09:38:01Jake (Jake) joins
09:38:45Jake quits [Remote host closed the connection]
09:38:57Jake (Jake) joins
09:39:41Jake quits [Remote host closed the connection]
09:39:53Jake (Jake) joins
09:40:40Jake quits [Remote host closed the connection]
09:40:52Jake (Jake) joins
09:41:37Jake quits [Remote host closed the connection]
09:41:49Jake (Jake) joins
09:42:34Jake quits [Remote host closed the connection]
09:42:46Jake (Jake) joins
09:43:31Jake quits [Remote host closed the connection]
09:43:43Jake (Jake) joins
09:44:29Jake quits [Remote host closed the connection]
09:44:41Jake (Jake) joins
09:45:30Jake quits [Read error: Connection reset by peer]
09:45:41Jake (Jake) joins
09:46:26Jake quits [Remote host closed the connection]
09:46:39Jake (Jake) joins
09:47:23Jake quits [Remote host closed the connection]
09:47:36Jake (Jake) joins
09:48:23Jake quits [Remote host closed the connection]
09:48:35Jake (Jake) joins
09:49:21Jake quits [Remote host closed the connection]
09:49:33Jake (Jake) joins
09:50:17Jake quits [Remote host closed the connection]
09:50:29Jake (Jake) joins
09:51:15Jake quits [Remote host closed the connection]
09:51:27Jake (Jake) joins
10:56:20Jonboy345 joins
11:21:10yawkat quits [Ping timeout: 252 seconds]
11:57:41hackbug (hackbug) joins
12:30:49<bilboed>just noticed that "SendDoneToTracker" seems to continuously go down in the warrior (whereas it didn't before). Issue with tracker not confirming reception?
12:41:29Minkafighter7225 quits [Client Quit]
12:41:30michaelblob quits [Remote host closed the connection]
12:41:41michaelblob (michaelblob) joins
12:41:43Minkafighter7225 joins
13:08:12BearFortress quits [Read error: Connection reset by peer]
13:09:27BearFortress joins
13:10:33Jake7 (Jake) joins
13:10:35Minkafighter72255 joins
13:10:38Jake quits [Client Quit]
13:10:38Minkafighter7225 quits [Read error: Connection reset by peer]
13:10:38qwertyasdfuiopghjkl quits [Remote host closed the connection]
13:10:38Minkafighter72255 is now known as Minkafighter7225
13:10:38Jake7 is now known as Jake
13:20:13yawkat (yawkat) joins
13:23:02hitgrr8 quits [Client Quit]
13:26:01eroc1990 quits [Quit: The Lounge - https://thelounge.chat]
13:29:11<spirit>https://twitter.com/PlanetPhillip/status/1623995091709751297 @PlanetPhillip: "Hey peps, so I got an email from Mega saying I needed to login to my account or I would lose the files. It seems I stored some very old FPS games there."
13:32:04Arcorann quits [Ping timeout: 252 seconds]
13:34:22eroc1990 (eroc1990) joins
13:57:31umgr036 quits [Read error: Connection reset by peer]
14:04:48umgr036 joins
14:05:27umgr036 quits [Read error: Connection reset by peer]
14:05:49umgr036 joins
14:22:33<driib>JAA: I will look into the rest of your tools under `little-things`, thank you for the guidance and for open-sourcing them!
14:32:44<@JAA>FOSS all the way. The thing I hinted at for better WARC processing will also be FOSS, naturally. :-)
14:34:59Island joins
14:49:44<hexa->JAA++
14:51:06<h2ibot>JustAnotherArchivist edited In The Media (+195, Add Hackaday article about DPReview): https://wiki.archiveteam.org/?diff=49622&oldid=49336
14:51:07<h2ibot>JustAnotherArchivist edited In The Media (-1): https://wiki.archiveteam.org/?diff=49623&oldid=49622
15:00:08<h2ibot>JAABot edited Main Page/In The Media (-117): https://wiki.archiveteam.org/?diff=49624&oldid=49337
15:14:27qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
15:17:42<Ryz>spirit, that's...that's seriously not good :c
15:19:55<Ryz>!ig elb0matk17dhb9oy8nwfci9t0 ^https?://soundcloud\.com/
15:20:00<Ryz>Oops
15:38:47hitgrr8 joins
16:12:27<nfriedly>JAA: Sorry, when they started with a / they were showing up red and not working for me
16:13:43<@JAA>nfriedly: Ah, might've been a caching issue then.
16:17:16abirkill quits [Quit: Let us prepare to grapple with the ineffable itself, and see if we may not eff it after all.]
16:29:44lol joins
16:31:13lol quits [Remote host closed the connection]
16:42:00thuban quits [Ping timeout: 252 seconds]
16:49:07Island_ joins
16:49:18Island quits [Remote host closed the connection]
16:52:36qwertyasdfuiopghjkl quits [Remote host closed the connection]
17:19:35qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
17:39:04TastyWiener95 (TastyWiener95) joins
18:03:24Jonboy345 quits [Ping timeout: 252 seconds]
18:06:11Jonboy345 joins
18:39:07michaelblob quits [Read error: Connection reset by peer]
18:46:08qwertyasdfuiopghjkl quits [Remote host closed the connection]
19:02:47abirkill (abirkill) joins
19:02:51qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
19:07:46zhongfu quits [Client Quit]
19:07:46yawkat quits [Client Quit]
19:07:46qwertyasdfuiopghjkl quits [Client Quit]
19:07:56TastyWiener95 quits [Ping timeout: 252 seconds]
19:07:57zhongfu (zhongfu) joins
19:08:25yawkat (yawkat) joins
19:25:36<@JAA>So it turns out that https://source.codeaurora.org/ is gone in entirety, not just the QUIC parts. :-|
19:25:58<@JAA>Domain fails to resolve since at least a few hours ago.
19:28:42balrog quits [Quit: Bye]
19:34:00balrog (balrog) joins
19:45:08qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
20:13:53Pichu0102 quits [Remote host closed the connection]
20:29:36sec^nd quits [Ping timeout: 245 seconds]
20:30:30thuban joins
20:38:52Sluggs quits [Excess Flood]
20:42:56Sluggs joins
20:51:00tattoo453 joins
20:53:10spirit quits [Ping timeout: 252 seconds]
20:54:52tattoo453 quits [Remote host closed the connection]
20:57:07TheTechRobo (TheTechRobo) joins
21:10:03hitgrr8 quits [Client Quit]
21:16:47<@JAA>https://bye.codeaurora.org/
21:16:50<@JAA>...
21:17:08<@JAA>Great.
21:17:54<@JAA>Most of what was left apart from QUIC seems to have been mirrors though, so probably not that bad.
21:17:59<@JAA>But still, ugh.
21:18:52<@JAA>'CodeAurora.org is now archived.' You keep using that word. It does not mean what you think it means.
21:21:00<hexa->> Qualcomm Innovation Center
21:21:11<hexa->I did have other associations with that acronym, but oh well
21:25:11<@JAA>Heh, yeah, I've been working on this for too long, so the association has changed.
21:41:12dumbgoy_ quits [Ping timeout: 252 seconds]
21:43:59dumbgoy joins
21:56:57michaelblob (michaelblob) joins
22:01:42sec^nd (second) joins
22:01:44ehmry quits [Ping timeout: 252 seconds]
22:02:59ehmry joins
22:51:50BearFortress quits [Client Quit]
23:43:49<@arkiver>JAA: do we have the rootsweb mailing lists preserved?
23:59:42<@JAA>arkiver: I didn't do anything. hook54321 said 'we have i believe a partial copy' last month.