00:04:57sec^nd quits [Remote host closed the connection]
00:06:19sec^nd (second) joins
00:22:46hackbug quits [Remote host closed the connection]
00:30:00Sluggs quits [Read error: Connection reset by peer]
00:30:58Sluggs joins
00:46:41<pokechu22>myself: https://archive.org/details/wiki-wavesharecom_w - this has all of the files and page history. The big trick I used to deal with the rate limiting is that wikiteam tools can export 50 revisions at a time... but I still had to deal with the 12-second wait when downloading files (I made sure to leave a minimum gap of 12 seconds, but I included the time spent downloading
00:46:44<pokechu22>the file, and some files took way longer (e.g. Game-hat-3D-drawing.7z which is 15 MB took 72 seconds to download)). Some files are missing as one was detected as malware and others are victims of case-insensitive filesystems though (I should look into fixing the latter problem at some point). I am running the files through archivebot as well, so those should still end up
00:46:46<pokechu22>being saved (but it'll take even longer)
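A minimal sketch of the minimum-gap approach pokechu22 describes above, assuming the 12 seconds is measured from the start of one file request to the start of the next, so that time spent downloading counts toward the gap. `MinGapLimiter` and its parameters are hypothetical names used for illustration; they are not part of wikiteam tools or archivebot.

```python
import time


class MinGapLimiter:
    """Enforce a minimum gap between the starts of consecutive requests.

    Hypothetical helper: time spent on the previous download counts toward
    the gap, so a slow download may need no extra waiting at all.
    """

    def __init__(self, min_gap, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = min_gap      # seconds, e.g. 12 for the limit mentioned above
        self._clock = clock         # injectable so the logic can be tested without real delays
        self._sleep = sleep
        self._last_start = None     # start time of the previous request, if any

    def wait(self):
        """Block until at least min_gap seconds have passed since the last start."""
        now = self._clock()
        if self._last_start is not None:
            remaining = self.min_gap - (now - self._last_start)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_start = now
```

Calling `limiter.wait()` immediately before each file request would sleep only for whatever part of the 12 seconds the previous download did not already use up; a download that takes 72 seconds would be followed by no extra delay.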
00:52:10AlsoTheTechRobo quits [Remote host closed the connection]
00:52:51AlsoTheTechRobo (TheTechRobo) joins
00:59:21<AlsoTheTechRobo>thuban: I just modified an existing script to add the post argument /shrug
00:59:36AlsoTheTechRobo quits [Remote host closed the connection]
01:01:08TheTechRobo (TheTechRobo) joins
01:46:31Justin[home] quits [Ping timeout: 252 seconds]
01:58:26<pabs>is there a project for GitLab archiving? this just got removed from the Mozilla add-ons store https://gitlab.com/magnolia1234/bypass-paywalls-clean-filters
01:58:40<pabs>https://gitlab.com/magnolia1234/bypass-paywalls-firefox-clean/-/issues/905
01:58:46<pabs>https://nits.readthefinemanual.net/Magnolia1234B/status/1624091040570306561
01:58:58<pabs>https://www.ghacks.net/2023/02/13/mozilla-removes-bypass-paywalls-clean-extension-from-its-add-ons-repository/
02:04:21<@JAA>I wonder if the .xpi is still accessible.
02:05:16<@JAA>Nope, they purged those, too.
02:05:54<tomodachi94>JAA: They're in the GitLab releases, but no telling if they're the same as the Store: https://gitlab.com/magnolia1234/bypass-paywalls-firefox-clean/-/releases
02:06:12<@JAA>Yeah, I meant the ones signed by Mozilla.
02:08:19Icyelut (Icyelut) joins
02:08:21<@JAA>Archiving GitLab is a pain because of all the scripting. I don't think we have any tooling for that currently.
02:11:21<pabs>the glab tool is useful for command-line API access: https://gitlab.com/gitlab-org/cli
02:12:54<@JAA>Yeah, not the web interface into the WBM though.
02:15:17umgr036 joins
02:27:25<pabs>yep. I guess SPN is the only JS-to-WBM thing there is?
02:27:59pabs wonders if the WBM will ever get DOM dumps in addition to screenshots etc
02:34:59lennier1 quits [Client Quit]
02:35:14lennier1 (lennier1) joins
02:37:28<@JAA>Kind of, yeah. SPN uses Brozzler, which is essentially a WARC-writing MITM proxy and a Chromium browser (plus machinery around it). And that kind of approach is really the only thing that's sensible for heavily interactive sites. Unfortunately, WARC supports neither HTTP/2 (or /3) nor WebSockets, so it still breaks down quickly.
02:37:50<@JAA>And of course, this is extremely resource-intensive compared to the crawls we normally do.
02:48:12<@JAA>FWIW, I've dumped a bundle of the Git repo itself onto IA.
02:49:21<pokechu22>Hmm, I thought I saw websockets work before? I think with a ws_ suffix on the timestamp?
02:49:26DiscantX joins
02:49:40<@JAA>The WARC format does not support WS at all.
02:50:01<@JAA>It'd be an unofficial extension with no public specification (that I'm aware of).
02:50:20<@JAA>Not that this stops people. There are also people writing fake HTTP/1.1 responses for HTTP/2 traffic.
02:51:18<TheTechRobo>I love the idea of WARC, but it's horrendously outdated and the tooling is meh
02:51:45<@JAA>Yeah, it's certainly slow-moving.
02:51:50<TheTechRobo>Storing raw data sent and received is a great idea, but it doesn't matter if you can't store any modern protocol.
02:52:00<TheTechRobo>Eventually people will abandon HTTP/1.1.
02:52:12<TheTechRobo>Might take ages, but it's going to happen.
02:52:27<TheTechRobo>And a lot of servers already don't support it.
02:52:43<@JAA>You could store pcaps + SSL pre-master keys. That would be the most generic capture format possible. It'd be even worse to work with though.
02:52:50<TheTechRobo>WebSockets are also super important, and we don't have that.
02:53:16<TheTechRobo>JAA: Yeah, that'd make an annoying-to-use format just dumb to use. Nobody's going to use WARC if it's that difficult to extract a damn webpage.
02:53:24DiscantX quits [Client Quit]
02:53:39<@JAA>I haven't come across many HTTP servers that didn't accept 1.1 connections. In fact, I can't think of any right now. Buttflare's 1.1 implementation is flawed, but that's about it.
02:53:42DiscantX joins
02:54:03DiscantX quits [Client Quit]
02:54:23<TheTechRobo>JAA: It will happen though, it's just a matter of time. Websites are so awfully bloated that HTTP/1.1 won't be able to keep up.
02:54:36<@JAA>The problem is that people have very different ideas of what 'a webpage' even is.
02:55:03<@JAA>And that's especially true for interactive pages.
02:55:08<TheTechRobo>^
02:55:33<pokechu22>Ah, what I was thinking of is that e.g. https://web.archive.org/web/20230214025333/https://en.js.cx/article/websocket/chat/ redirects wss://web.archive.org/web/20230214025333ws_/wss://javascript.info/article/websocket/chat/ws to https://web.archive.org/web/20221010222504ws_/http://javascript.info/article/websocket/chat/ws (which doesn't work here)
02:57:22<@JAA>I've had that little idea in my head for a while of modifying a browser in a way to make things deterministic. E.g. random seed values, current timestamp, and whatnot. Mocking all external side effects, essentially. Then, on playback, you could in theory play things back exactly as they were captured. However, this wouldn't necessarily be what people actually want anyway, and it'd be a *lot* of work.
02:57:44<TheTechRobo>Yeah
02:58:31<TheTechRobo>I just want to go back to the geocities era, where this wasn't nearly as much of a problem. Sign my guestbook, I guess...
02:58:59<@JAA>Agreed, but also delete JavaScript engines from the browser, thank.
03:01:18<@JAA>Well, and ActiveX and Flash and all that nonsense. :-)
03:01:34<TheTechRobo>And Java!
03:01:44<TheTechRobo>I wasn't even on the Internet during that era and I miss it.
03:01:44<@JAA>Oh god yeah, Java applets...
03:02:37<@JAA>Just HTML and CSS. If you squint, it's even Turing-complete, so it's not like it limits what you can do with the page. :-)
03:02:44<TheTechRobo>hah
03:03:03<TheTechRobo>Just make it all server-side and be done with it.
03:03:18BlueMaxima quits [Read error: Connection reset by peer]
03:03:40<TheTechRobo>Maybe add an html feature to add a loading screen to the page while something's loading, e.g. a link or form submit, but other than that, no client-side scripting kthxbai.
03:03:49<@JAA>Oh, also, POST forms for navigation are banned.
03:19:10pabs hugs the URL extraction of yt-dlp yt-dlp --verbose --dump-json
03:51:20Ketchup901 quits [Ping timeout: 276 seconds]
03:57:12Ketchup901 (Ketchup901) joins
04:38:23dan_a quits [Remote host closed the connection]
05:19:19monoxane7 (monoxane) joins
05:21:53monoxane quits [Ping timeout: 265 seconds]
05:21:54monoxane7 is now known as monoxane
05:36:51sonick quits [Client Quit]
05:52:48Ketchup902 (Ketchup901) joins
05:52:53Ketchup901 quits [Ping timeout: 276 seconds]
06:04:26DopefishJustin joins
06:05:11Ketchup902 quits [Remote host closed the connection]
06:05:30Ketchup901 (Ketchup901) joins
06:29:48<Barto>not sure if throwing https://community.mycroft.ai/ into AB will be working, it's running discourse.
06:30:00<pabs>that was done recently already
06:33:02<Barto>:-)
07:16:02hackbug (hackbug) joins
07:17:31sec^nd quits [Remote host closed the connection]
07:18:31sec^nd (second) joins
07:35:13hitgrr8 joins
07:36:57hackbug quits [Client Quit]
07:42:50benjinsm joins
07:43:45hackbug (hackbug) joins
07:44:48Island quits [Read error: Connection reset by peer]
07:46:13benjins quits [Ping timeout: 252 seconds]
07:53:05Arcorann (Arcorann) joins
08:01:52hackbug quits [Ping timeout: 265 seconds]
08:08:11umgr036 quits [Remote host closed the connection]
08:12:21umgr036 joins
08:16:28LeGoupil joins
08:27:18@dxrt quits [Quit: ZNC - http://znc.sourceforge.net]
08:27:40dxrt joins
08:27:42dxrt quits [Changing host]
08:27:42dxrt (dxrt) joins
08:27:42@ChanServ sets mode: +o dxrt
09:51:26HackMii_ quits [Ping timeout: 276 seconds]
09:53:28HackMii_ (hacktheplanet) joins
09:54:26datechnoman quits [Quit: The Lounge - https://thelounge.chat]
09:55:32datechnoman (datechnoman) joins
10:01:20benjinsmi joins
10:04:49benjinsm quits [Ping timeout: 252 seconds]
10:05:01umgr036 quits [Remote host closed the connection]
10:05:16umgr036 joins
11:12:28xkey quits [Client Quit]
11:13:04KateBush joins
11:13:25KateBush quits [Remote host closed the connection]
11:13:41StrangePhenomena joins
11:13:46<StrangePhenomena>Hello all
11:15:10StrangePhenomena quits [Remote host closed the connection]
11:16:33xkey (xkey) joins
11:23:51dan_a (dan_a) joins
11:54:16LeGoupil quits [Ping timeout: 252 seconds]
12:29:19benjinsmi is now known as benjins
12:42:00LeGoupil joins
12:56:13Arcorann quits [Ping timeout: 265 seconds]
12:57:31hackbug (hackbug) joins
13:18:16daxxy quits [Quit: bye]
13:18:30daxxy (daxxy) joins
13:20:37qwertyasdfuiopghjkl quits [Remote host closed the connection]
13:22:16lennier1 quits [Ping timeout: 252 seconds]
13:22:27lennier1 (lennier1) joins
13:24:36lennier2_ joins
13:26:40lennier1 quits [Ping timeout: 252 seconds]
13:28:45Icyelut quits [Client Quit]
13:28:52lennier2_ quits [Ping timeout: 252 seconds]
13:31:43lennier2_ joins
13:31:48lennier2_ is now known as lennier1
13:32:28Icyelut (Icyelut) joins
13:35:39lennier2 joins
13:38:02lennier1 quits [Ping timeout: 252 seconds]
13:39:16lennier2_ joins
13:39:36lennier2_ quits [Read error: Connection reset by peer]
13:39:51lennier2_ joins
13:39:51lennier2_ is now known as lennier1
13:42:08lennier2 quits [Ping timeout: 265 seconds]
13:56:47eroc1990 quits [Quit: The Lounge - https://thelounge.chat]
13:57:29eroc1990 (eroc1990) joins
14:12:41Chris5010 quits [Quit: Ping timeout (120 seconds)]
14:12:41franga2000 quits [Quit: Ping timeout (120 seconds)]
14:12:41nomad-geek quits [Client Quit]
14:12:41Arachnophine quits [Quit: Ping timeout (120 seconds)]
14:12:41Jake quits [Client Quit]
14:12:41birdjj quits [Quit: Ping timeout (120 seconds)]
14:12:41eroc1990 quits [Client Quit]
14:12:41CraftByte quits [Quit: Ping timeout (120 seconds)]
14:12:41umgr036 quits [Remote host closed the connection]
14:12:41VerifiedJ quits [Quit: Ping timeout (120 seconds)]
14:12:41ave quits [Quit: Ping timeout (120 seconds)]
14:12:41flashfire42 quits [Quit: Ping timeout (120 seconds)]
14:12:41CraftByte6 (DragonSec|CraftByte) joins
14:12:41nomad-geek1 joins
14:12:41Jake7 (Jake) joins
14:12:41birdjj3 joins
14:12:41flashfire423 (flashfire42) joins
14:12:41VerifiedJ7 (VerifiedJ) joins
14:12:42CraftByte6 is now known as CraftByte
14:12:42VerifiedJ7 is now known as VerifiedJ
14:12:42nomad-geek1 is now known as nomad-geek
14:12:42Jake7 is now known as Jake
14:12:42flashfire423 is now known as flashfire42
14:12:42birdjj3 is now known as birdjj
14:12:43umgr036 joins
14:12:43franga2000 joins
14:12:45Arachnophine (Arachnophine) joins
14:12:47Chris5010 (Chris5010) joins
14:12:49eroc1990 (eroc1990) joins
14:12:49ave (ave) joins
14:13:43Nulo joins
14:14:31Sluggs quits [Ping timeout: 241 seconds]
14:15:15Sluggs joins
14:28:49lennier1 quits [Ping timeout: 252 seconds]
14:29:45lennier1 (lennier1) joins
14:37:14benjinsm joins
14:40:22benjins quits [Ping timeout: 252 seconds]
14:42:33lennier1 quits [Ping timeout: 265 seconds]
14:44:30lennier1 (lennier1) joins
14:44:39sonick (sonick) joins
14:56:05lennier1 quits [Ping timeout: 265 seconds]
14:56:42lennier1 (lennier1) joins
15:06:06lennier2 joins
15:07:52lennier1 quits [Ping timeout: 252 seconds]
15:07:57lennier2 is now known as lennier1
15:25:00lennier2 joins
15:25:32sec^nd quits [Ping timeout: 276 seconds]
15:25:55HackMii_ quits [Remote host closed the connection]
15:26:29HackMii_ (hacktheplanet) joins
15:27:40lennier1 quits [Ping timeout: 252 seconds]
15:27:41lennier2 is now known as lennier1
15:29:49sec^nd (second) joins
15:34:16lennier1 quits [Ping timeout: 252 seconds]
15:35:27lennier1 (lennier1) joins
15:36:35Island joins
15:55:05benjinsm is now known as benjins
15:56:39jabagawee joins
16:20:05jabagawee quits [Client Quit]
16:22:11lennier2 joins
16:23:13lennier1 quits [Ping timeout: 252 seconds]
16:23:16lennier2 is now known as lennier1
16:54:00LeGoupil quits [Client Quit]
16:54:34lennier1 quits [Ping timeout: 252 seconds]
17:05:43lennier1 (lennier1) joins
17:07:05qwertyasdfuiopghjkl joins
17:12:10lennier1 quits [Ping timeout: 252 seconds]
17:12:57lennier1 (lennier1) joins
17:50:19umgr036 quits [Remote host closed the connection]
17:54:31Barto quits [Ping timeout: 252 seconds]
18:01:12lennier2 joins
18:01:18lennier1 quits [Ping timeout: 252 seconds]
18:01:22lennier2 is now known as lennier1
18:56:48Barto (Barto) joins
19:37:08<h2ibot>JustAnotherArchivist edited Deathwatch (+595, /* 2023 */ Add Terminal Boredom): https://wiki.archiveteam.org/?diff=49462&oldid=49461
20:21:44edisondotme quits [Remote host closed the connection]
20:24:24benjins quits [Remote host closed the connection]
20:24:24franga2000 quits [Client Quit]
20:24:24Chris5010 quits [Client Quit]
20:24:24benjinsm joins
20:24:24eroc1990 quits [Client Quit]
20:24:24franga20000 joins
20:24:24ave quits [Client Quit]
20:24:25franga20000 is now known as franga2000
20:24:30ave (ave) joins
20:24:30Chris5010 (Chris5010) joins
20:24:44eroc1990 (eroc1990) joins
21:48:56<Barto>pabs: thanks for doing it. Sorry i missed your work earlier.
22:46:22hitgrr8 quits [Client Quit]
23:11:33BlueMaxima joins
23:32:04Atom-- quits [Read error: Connection reset by peer]
23:48:51lennier1 quits [Client Quit]
23:49:16lennier1 (lennier1) joins
23:51:20lennier1 quits [Client Quit]
23:57:39benjinsm is now known as benjins