| 00:04:42 | <driib> | I now progressed to trying https://github.com/internetarchive/warctools but seems like they are quite broken with Python 3.10 and from the bug tracker it seems I'd have to roll back with pyenv quite far back to 3.5 (and then still, there seems to be a large PR fixing lots of things for Python 3.5). Those tools look completely unmaintained, is it because there is another set of tools that is more prevalent, or did the repo just languish or is everyone in archiving world still using Python 2? Please forgive my ignorance. |
| 00:04:58 | <driib> | I then found https://wiki.archiveteam.org/index.php/The_WARC_Ecosystem#Tools (thanks!) and took 'warcat' for a spin as a first tool on the list that should work with Python 3. I just tried to put the decompressed WARC through it and it complains on every record: https://pastebin.com/9KaZKygx I guess my next question is: which of the tools listed on AT's wiki are compatible with the megawarcs produced by AT? |
| 00:07:36 | <@JAA> | The existing tooling for working with WARCs is horrid. I've been working on something to improve that, but other things keep coming up. |
| 00:08:50 | <@JAA> | For now, when I work with WARCs, it usually consists of pretty dumb shell pipelines. I wrote some helper tools to dump the contents of response records, process HTTP responses with transfer encoding, and crudely process HTML (last one still kind of WIP). You can find those in little-things as well. |
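WARC response records typically keep the HTTP response as it was transferred, so a body extractor like the `http-response-bodies` step has to undo HTTP/1.1 chunked transfer encoding. A minimal Python sketch of that idea, assuming the input is one raw HTTP response (status line, headers, body) and ignoring chunk extensions, trailer fields and `Content-Encoding` such as gzip. `decode_chunked` and `http_response_body` are hypothetical names, not JAA's code:

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body: hex size line, CRLF, data, CRLF, ... 0 CRLF."""
    out = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The chunk size is hex; anything after ';' is a chunk extension, ignored here.
        size = int(data[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; any trailer fields after it are ignored
        out += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its terminating CRLF
    return bytes(out)


def http_response_body(raw: bytes) -> bytes:
    """Return the body of a raw HTTP response, de-chunking it if the headers say so."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no end of HTTP headers found")
    chunked = False
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"transfer-encoding" and b"chunked" in value.lower():
            chunked = True
    # Note: Content-Encoding (e.g. gzip) is NOT undone; that would be a separate step.
    return decode_chunked(body) if chunked else body
```

Keeping de-chunking separate from content decoding mirrors the pipeline style above, where each stage does one job.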
| 00:09:58 | <@JAA> | So I might do `curl -sL ${internetArchiveWarcUrl} | zstdwarccat | warc-dump-responses | http-response-bodies | grep ...`. |
| 00:11:09 | <@JAA> | There's also a `warc-tiny` which helps with a few things, and sometimes I modify that or `warc-dump-responses` for my specific needs (e.g. only dumping responses for particular URL patterns). |
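To illustrate what a `warc-dump-responses`-style stage involves, here is a minimal Python sketch (not JAA's tool) that walks an *uncompressed* WARC record by record and yields the blocks of `response` records, optionally only for target URIs matching a regex, like the URL-pattern tweak mentioned above. `iter_warc_records` and `dump_responses` are hypothetical names; the sketch assumes well-formed records with a `Content-Length` header and skips folded header lines and digest checks:

```python
from __future__ import annotations

import re
from typing import BinaryIO, Iterator, Optional


def iter_warc_records(stream: BinaryIO) -> Iterator[tuple[dict[str, str], bytes]]:
    """Yield (headers, block) for each record; header names are lower-cased."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of input
        if line in (b"\r\n", b"\n"):
            continue  # the CRLF CRLF that terminates the previous record
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line[:40]!r}")
        headers: dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the record header
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers["content-length"])
        block = stream.read(length)
        if len(block) != length:
            raise ValueError("truncated record block")
        yield headers, block


def dump_responses(stream: BinaryIO, url_pattern: Optional[str] = None) -> Iterator[tuple[str, bytes]]:
    """Yield (target URI, block) for response records, optionally filtered by URI regex."""
    pattern = re.compile(url_pattern) if url_pattern else None
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        uri = headers.get("warc-target-uri", "")
        if pattern is not None and not pattern.search(uri):
            continue
        yield uri, block
```

Each yielded block is the raw HTTP response, so a small wrapper that loops over `dump_responses(sys.stdin.buffer, pattern)` and writes the blocks to `sys.stdout.buffer` could sit after `zstdwarccat` in a pipeline like the one above.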
| 00:13:37 | <@JAA> | But your impression is entirely correct: only few tools exist to read WARCs in the first place, and they're basically all lacking maintenance or have awful bugs. |
| 00:15:17 | <@JAA> | The software that ingests WARCs into the Wayback Machine is also buggy and essentially impossible to reuse outside of that context. |
| 00:17:19 | | ThreeHM quits [Ping timeout: 252 seconds] |
| 00:17:42 | | ThreeHM (ThreeHeadedMonkey) joins |
| 00:23:59 | <@JAA> | If you only read WARCs and don't mind the minor corruption mentioned in the GitHub issues, you could use warcio I suppose. |
| 00:24:55 | <@JAA> | (Minor when it comes to reading and processing, usually. It's still a non-starter for any WARC writing.) |
| 00:55:39 | | TastyWiener95 (TastyWiener95) joins |
| 00:57:11 | | Mateon2 joins |
| 00:58:12 | | Mateon1 quits [Ping timeout: 252 seconds] |
| 00:58:12 | | Mateon2 is now known as Mateon1 |
| 01:07:25 | <lennier1> | It's a little inconsistent, but I was able to get Scweet running with Chrome. I didn't even get through as many of @verified's followed accounts as with Firefox, though. I think it's just not possible to scroll through 420K accounts before it stops loading. Maybe if someone has access to the paid Twitter API they could get it. Not sure if there are any other options. If someone has already compiled a list of verified accounts, I couldn't find it. |
| 01:12:33 | | BlueMaxima joins |
| 01:48:43 | <dumbgoy_> | you guys been tracking archive.org's legal battle? seems like things shouldn't be archived to there if we want them for permanent use |
| 01:49:59 | <dumbgoy_> | but i've got no alternatives |
| 01:56:50 | <@OrIdow6^2> | See topic in #archiveteam for future reference for channel |
| 01:57:07 | <@OrIdow6^2> | And to make a short case for them... the fact that they can be sued and will comply is to an extent a good sign |
| 01:57:33 | <@OrIdow6^2> | Most of the "alternatives" are really shoddy things that steal storage from Google by using education accounts or whatever |
| 02:02:22 | | @OrIdow6^2 quits [Ping timeout: 252 seconds] |
| 02:07:08 | <dumbgoy_> | christ, i didn't read the topic in #archiveteam , that's my bad, oh, he quit |
| 02:12:34 | | OrIdow6 (OrIdow6) joins |
| 02:12:34 | | @ChanServ sets mode: +o OrIdow6 |
| 02:25:02 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 02:28:35 | | Lord_Nightmare (Lord_Nightmare) joins |
| 02:37:07 | | __19100 quits [Client Quit] |
| 04:00:01 | | treora quits [Quit: blub blub.] |
| 04:01:23 | | treora joins |
| 04:07:40 | | umgr036 quits [Read error: Connection reset by peer] |
| 04:10:33 | | Lord_Nightmare quits [Client Quit] |
| 04:10:53 | | Lord_Nightmare (Lord_Nightmare) joins |
| 04:15:34 | | umgr036 joins |
| 04:16:13 | | umgr036 quits [Read error: Connection reset by peer] |
| 04:16:34 | | umgr036 joins |
| 04:36:04 | | michaelblob quits [Read error: Connection reset by peer] |
| 04:38:29 | | michaelblob (michaelblob) joins |
| 05:03:40 | | pabs quits [Client Quit] |
| 05:04:01 | | hackbug quits [Ping timeout: 265 seconds] |
| 05:05:51 | | pabs (pabs) joins |
| 05:12:21 | | Island quits [Read error: Connection reset by peer] |
| 05:22:34 | | katocala quits [Ping timeout: 252 seconds] |
| 05:23:12 | | katocala joins |
| 05:32:37 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 05:39:50 | | hitgrr8 joins |
| 06:01:26 | | tzt quits [Ping timeout: 252 seconds] |
| 06:13:20 | | tzt (tzt) joins |
| 06:29:50 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 08:14:17 | | zhongfu quits [Quit: cya losers] |
| 08:21:55 | | zhongfu (zhongfu) joins |
| 09:07:38 | | TastyWiener95 quits [Ping timeout: 265 seconds] |
| 09:29:40 | | Jake quits [Remote host closed the connection] |
| 09:29:53 | | Jake (Jake) joins |
| 09:30:58 | | Jake quits [Remote host closed the connection] |
| 09:31:10 | | Jake (Jake) joins |
| 09:31:57 | | Jake quits [Remote host closed the connection] |
| 09:32:15 | | Jake (Jake) joins |
| 09:32:59 | | Jake quits [Remote host closed the connection] |
| 09:33:12 | | Jake (Jake) joins |
| 09:33:58 | | Jake quits [Remote host closed the connection] |
| 09:34:10 | | Jake (Jake) joins |
| 09:34:56 | | Jake quits [Remote host closed the connection] |
| 09:35:08 | | Jake (Jake) joins |
| 09:35:53 | | Jake quits [Remote host closed the connection] |
| 09:36:05 | | Jake (Jake) joins |
| 09:36:35 | | Minkafighter7225 quits [Client Quit] |
| 09:36:51 | | Jake quits [Read error: Connection reset by peer] |
| 09:36:53 | | Minkafighter7225 joins |
| 09:37:03 | | Jake (Jake) joins |
| 09:37:48 | | Jake quits [Remote host closed the connection] |
| 09:38:01 | | Jake (Jake) joins |
| 09:38:45 | | Jake quits [Remote host closed the connection] |
| 09:38:57 | | Jake (Jake) joins |
| 09:39:41 | | Jake quits [Remote host closed the connection] |
| 09:39:53 | | Jake (Jake) joins |
| 09:40:40 | | Jake quits [Remote host closed the connection] |
| 09:40:52 | | Jake (Jake) joins |
| 09:41:37 | | Jake quits [Remote host closed the connection] |
| 09:41:49 | | Jake (Jake) joins |
| 09:42:34 | | Jake quits [Remote host closed the connection] |
| 09:42:46 | | Jake (Jake) joins |
| 09:43:31 | | Jake quits [Remote host closed the connection] |
| 09:43:43 | | Jake (Jake) joins |
| 09:44:29 | | Jake quits [Remote host closed the connection] |
| 09:44:41 | | Jake (Jake) joins |
| 09:45:30 | | Jake quits [Read error: Connection reset by peer] |
| 09:45:41 | | Jake (Jake) joins |
| 09:46:26 | | Jake quits [Remote host closed the connection] |
| 09:46:39 | | Jake (Jake) joins |
| 09:47:23 | | Jake quits [Remote host closed the connection] |
| 09:47:36 | | Jake (Jake) joins |
| 09:48:23 | | Jake quits [Remote host closed the connection] |
| 09:48:35 | | Jake (Jake) joins |
| 09:49:21 | | Jake quits [Remote host closed the connection] |
| 09:49:33 | | Jake (Jake) joins |
| 09:50:17 | | Jake quits [Remote host closed the connection] |
| 09:50:29 | | Jake (Jake) joins |
| 09:51:15 | | Jake quits [Remote host closed the connection] |
| 09:51:27 | | Jake (Jake) joins |
| 10:56:20 | | Jonboy345 joins |
| 11:21:10 | | yawkat quits [Ping timeout: 252 seconds] |
| 11:57:41 | | hackbug (hackbug) joins |
| 12:30:49 | <bilboed> | just noticed that "SendDoneToTracker" seems to continuously go down in the warrior (whereas it didn't before). Issue with tracker not confirming reception ? |
| 12:41:29 | | Minkafighter7225 quits [Client Quit] |
| 12:41:30 | | michaelblob quits [Remote host closed the connection] |
| 12:41:41 | | michaelblob (michaelblob) joins |
| 12:41:43 | | Minkafighter7225 joins |
| 13:08:12 | | BearFortress quits [Read error: Connection reset by peer] |
| 13:09:27 | | BearFortress joins |
| 13:10:33 | | Jake7 (Jake) joins |
| 13:10:35 | | Minkafighter72255 joins |
| 13:10:38 | | Jake quits [Client Quit] |
| 13:10:38 | | Minkafighter7225 quits [Read error: Connection reset by peer] |
| 13:10:38 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 13:10:38 | | Minkafighter72255 is now known as Minkafighter7225 |
| 13:10:38 | | Jake7 is now known as Jake |
| 13:17:27 | | katocala is now authenticated as katocala |
| 13:20:13 | | yawkat (yawkat) joins |
| 13:23:02 | | hitgrr8 quits [Client Quit] |
| 13:26:01 | | eroc1990 quits [Quit: The Lounge - https://thelounge.chat] |
| 13:29:11 | <spirit> | https://twitter.com/PlanetPhillip/status/1623995091709751297 @PlanetPhillip: "Hey peps, so I got an email from Mega saying I needed to login to my account or I would lose the files. It seems I stored some very old FPS games there." |
| 13:32:04 | | Arcorann quits [Ping timeout: 252 seconds] |
| 13:34:22 | | eroc1990 (eroc1990) joins |
| 13:57:31 | | umgr036 quits [Read error: Connection reset by peer] |
| 14:04:48 | | umgr036 joins |
| 14:05:27 | | umgr036 quits [Read error: Connection reset by peer] |
| 14:05:49 | | umgr036 joins |
| 14:22:33 | <driib> | JAA: I will look into the rest of your tools under `little-things`, thank you for the guidance and for open-sourcing them! |
| 14:32:44 | <@JAA> | FOSS all the way. The thing I hinted at for better WARC processing will also be FOSS, naturally. :-) |
| 14:34:59 | | Island joins |
| 14:49:44 | <hexa-> | JAA++ |
| 14:51:06 | <h2ibot> | JustAnotherArchivist edited In The Media (+195, Add Hackaday article about DPReview): https://wiki.archiveteam.org/?diff=49622&oldid=49336 |
| 14:51:07 | <h2ibot> | JustAnotherArchivist edited In The Media (-1): https://wiki.archiveteam.org/?diff=49623&oldid=49622 |
| 15:00:08 | <h2ibot> | JAABot edited Main Page/In The Media (-117): https://wiki.archiveteam.org/?diff=49624&oldid=49337 |
| 15:14:27 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 15:17:42 | <Ryz> | spirit, that's...that's seriously not good :c |
| 15:19:55 | <Ryz> | !ig elb0matk17dhb9oy8nwfci9t0 ^https?://soundcloud\.com/ |
| 15:20:00 | <Ryz> | Oops |
| 15:38:47 | | hitgrr8 joins |
| 16:12:27 | <nfriedly> | JAA: Sorry, when they started with a / they were showing up red and not working for me |
| 16:13:43 | <@JAA> | nfriedly: Ah, might've been a caching issue then. |
| 16:17:16 | | abirkill quits [Quit: Let us prepare to grapple with the ineffable itself, and see if we may not eff it after all.] |
| 16:29:44 | | lol joins |
| 16:31:13 | | lol quits [Remote host closed the connection] |
| 16:42:00 | | thuban quits [Ping timeout: 252 seconds] |
| 16:49:07 | | Island_ joins |
| 16:49:18 | | Island quits [Remote host closed the connection] |
| 16:52:36 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 17:19:35 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 17:39:04 | | TastyWiener95 (TastyWiener95) joins |
| 18:03:24 | | Jonboy345 quits [Ping timeout: 252 seconds] |
| 18:06:11 | | Jonboy345 joins |
| 18:39:07 | | michaelblob quits [Read error: Connection reset by peer] |
| 18:46:08 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 19:02:47 | | abirkill (abirkill) joins |
| 19:02:51 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 19:07:46 | | zhongfu quits [Client Quit] |
| 19:07:46 | | yawkat quits [Client Quit] |
| 19:07:46 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 19:07:56 | | TastyWiener95 quits [Ping timeout: 252 seconds] |
| 19:07:57 | | zhongfu (zhongfu) joins |
| 19:08:25 | | yawkat (yawkat) joins |
| 19:25:36 | <@JAA> | So it turns out that https://source.codeaurora.org/ is gone in entirety, not just the QUIC parts. :-| |
| 19:25:58 | <@JAA> | Domain fails to resolve since at least a few hours ago. |
| 19:28:42 | | balrog quits [Quit: Bye] |
| 19:34:00 | | balrog (balrog) joins |
| 19:45:08 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 20:13:53 | | Pichu0102 quits [Remote host closed the connection] |
| 20:29:36 | | sec^nd quits [Ping timeout: 245 seconds] |
| 20:30:30 | | thuban joins |
| 20:38:52 | | Sluggs quits [Excess Flood] |
| 20:42:56 | | Sluggs joins |
| 20:51:00 | | tattoo453 joins |
| 20:53:10 | | spirit quits [Ping timeout: 252 seconds] |
| 20:54:52 | | tattoo453 quits [Remote host closed the connection] |
| 20:57:07 | | TheTechRobo (TheTechRobo) joins |
| 21:10:03 | | hitgrr8 quits [Client Quit] |
| 21:16:47 | <@JAA> | https://bye.codeaurora.org/ |
| 21:16:50 | <@JAA> | ... |
| 21:17:08 | <@JAA> | Great. |
| 21:17:54 | <@JAA> | Most of what was left apart from QUIC seem to have been mirrors though, so probably not that bad. |
| 21:17:59 | <@JAA> | But still, ugh. |
| 21:18:52 | <@JAA> | 'CodeAurora.org is now archived.' You keep using that word. It does not mean what you think it means. |
| 21:21:00 | <hexa-> | > Qualcomm Innovation Center |
| 21:21:11 | <hexa-> | I did have other associations with that acronym, but oh well |
| 21:25:11 | <@JAA> | Heh, yeah, I've been working on this for too long, so the association has changed. |
| 21:41:12 | | dumbgoy_ quits [Ping timeout: 252 seconds] |
| 21:43:59 | | dumbgoy joins |
| 21:56:57 | | michaelblob (michaelblob) joins |
| 22:01:42 | | sec^nd (second) joins |
| 22:01:44 | | ehmry quits [Ping timeout: 252 seconds] |
| 22:02:59 | | ehmry joins |
| 22:51:50 | | BearFortress quits [Client Quit] |
| 23:43:49 | <@arkiver> | JAA: do we have the rootsweb mailing lists preserved? |
| 23:59:42 | <@JAA> | arkiver: I didn't do anything. hook54321 said 'we have i believe a partial copy' last month. |