| 00:12:44 | | rewby quits [Ping timeout: 268 seconds] |
| 00:16:57 | | rewby (rewby) joins |
| 01:00:55 | <pabs> | is there a list of archiving tools that are allowed to send WARCs to the WBM? wondering if warcprox is on the list yet |
| 01:04:56 | <@JAA> | It's about the origin of the WARCs, not the tools used to create them. |
| 01:16:51 | | DogsRNice_ quits [Read error: Connection reset by peer] |
| 01:35:57 | <TheTechRobo> | Also, warcprox is used in prod at IA, so it's almost certainly considered OK for use in the WBM |
| 01:36:56 | <TheTechRobo> | well, hard to say that from an outside perspective, but brozzler's tooling is very much built around it so I'd be very surprised if they aren't using it |
| 01:55:43 | | rewby quits [Ping timeout: 268 seconds] |
| 01:57:22 | | rewby (rewby) joins |
| 04:20:27 | | BearFortress quits [] |
| 04:54:52 | | BearFortress joins |
| 05:16:50 | | atphoenix__ (atphoenix) joins |
| 05:19:50 | | atphoenix_ quits [Ping timeout: 268 seconds] |
| 05:38:07 | | Starchives_ (Starchives) joins |
| 05:42:02 | | Starchives__ quits [Ping timeout: 268 seconds] |
| 06:13:59 | <klea> | "Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; http://archive.org/details/archive.org_bot), 1.1 warcprox" and "Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; http://archive.org/details/archive.org_bot), 1.1 warcprox" do sure make it seem like warcprox is used. |
| 07:05:14 | | SootBector quits [Ping timeout: 260 seconds] |
| 07:06:55 | | SootBector (SootBector) joins |
| 09:56:39 | | Grzesiek11 quits [Read error: Connection reset by peer] |
| 09:56:42 | | Grzesiek11 (Grzesiek11) joins |
| 15:03:55 | <cruller> | It appears that https://web.archive.org/web/20260217030013/https://www.youtube.com/watch?v=Hnh8SufJgz0 was captured using Zeno. |
| 15:03:56 | <cruller> | > "userAgent":"Mozilla/5.0 (compatible; archive.org_bot +http://archive.org/details/archive.org_bot) Zeno/0344175 warc/v0.8.97,gzip(gfe)" |
| 15:03:57 | <cruller> | Also, > x-archive-src: SPNOUTLINKS-20260217005153258-00010-zeno-k8s-spn-crawl-b7h7s/SPNOUTLINKS-20260217024118128-00017-zeno-k8s-spn-crawl-7665ddb6c9.warc.gz |
| 15:58:30 | <cruller> | https://web.archive.org/web/20231207103832/https://www.youtube.com/watch?v=WM9kEnEePZk (= https://archive.org/download/daily_perma_cc_2023-12-07/25YS-QQGG.warc.gz ) was captured by Perma.cc using Scoop 0.6.2 |
| 15:59:06 | <klea> | Huh. |
| 15:59:32 | <klea> | TIL YouTube puts their UA back. |
| 15:59:44 | <klea> | Yeah, if they start Zeno it works differently, I believe. |
| 16:11:13 | <cruller> | It’s interesting that while Scoop supports WACZ, the developers themselves don’t use it. |
| 16:11:15 | <cruller> | WARC++ |
| 16:11:15 | <eggdrop> | [karma] 'WARC' now has 2 karma! |
| 16:13:22 | <klea> | I suppose cruller also meant to link to the perma link :p https://perma.cc/25YS-QQGG |
| 16:17:37 | <cruller> | Yeah, I should have done that :D |
| 16:17:56 | <klea> | I think the one will like to at least !ao something :p |
| 16:44:00 | <nicolas17> | what's the file size limit of IA-generated torrents? |
| 16:44:49 | <nicolas17> | IIRC if an item is larger than X then archive.org does not make a torrent for it |
| 16:46:59 | <nicolas17> | heh, looks like the way derive.php generates video thumbnails, it has to decode the entire video stream, even though it's only making about 1 thumb per hour, so it would be faster to seek to those positions |
| 18:15:47 | | Matthww3 quits [Quit: Ping timeout (120 seconds)] |
| 18:16:47 | | Matthww3 joins |
| 18:30:17 | <klea> | Huh, how much swap do petaboxes have? |
| 18:32:33 | <nicolas17> | oh I don't think this takes much RAM, it's still reading the file incrementally |
| 18:34:55 | <pokechu22> | I previously filed https://github.com/traceypooh/deriver-archive/issues/1 which I *think* is the repo that handles that derivation (or at least it looked like it at the time) |
| 18:37:55 | <nicolas17> | I don't think it's encoding JPEGs for every frame, but it is certainly decoding every frame, yeah |
| 18:45:54 | | @hook54321 quits [Ping timeout: 633 seconds] |
| 18:48:09 | | hook54321 (hook54321) joins |
| 18:48:09 | | @ChanServ sets mode: +o hook54321 |
| 19:09:08 | <nicolas17> | pokechu22: https://archive.org/download/nasa-artemis-ii-primarystream/nasa-artemis-ii-primarystream.thumbs/ look at those timestamps |
| 19:09:19 | <nicolas17> | it took 11 hours |
| 19:52:25 | <klea> | Huh, I wonder how the deriver broke after https://archive.org/metadata/WJZ_20100122_000141_CBS_Evening_News_With_Katie_Couric/ got ARId. |
| 21:18:08 | | cm quits [Ping timeout: 268 seconds] |
| 21:19:27 | | Matthww3 quits [Ping timeout: 268 seconds] |
| 21:21:00 | | Matthww3 joins |
| 21:29:33 | | cm joins |
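
A rough sketch of the seek-based approach nicolas17 describes at 16:46:59: instead of decoding the whole stream, compute one offset per interval and seek straight to each. This is not how derive.php works (the log only says it decodes every frame). The function names and the `thumb_NNNNNN.jpg` naming are hypothetical, and ffmpeg is assumed here purely as a stand-in decoder. The code only builds the command lines and does not run them.

```python
from typing import List


def thumbnail_offsets(duration_s: float, interval_s: float = 3600.0) -> List[float]:
    """Return the timestamps (in seconds) at which one thumbnail each is wanted."""
    if duration_s <= 0 or interval_s <= 0:
        return []
    offsets = []
    t = 0.0
    while t < duration_s:
        offsets.append(t)
        t += interval_s
    return offsets


def ffmpeg_seek_command(src: str, offset_s: float, out_path: str) -> List[str]:
    """Build an ffmpeg argument list that grabs a single frame at offset_s.

    Placing -ss before -i requests input seeking, so ffmpeg jumps near the
    target position instead of decoding everything that precedes it.
    """
    return [
        "ffmpeg",
        "-ss", f"{offset_s:.3f}",  # seek position, given before the input
        "-i", src,
        "-frames:v", "1",          # emit exactly one video frame
        "-y",                      # overwrite the output file if present
        out_path,
    ]


def plan_thumbnails(src: str, duration_s: float, interval_s: float = 3600.0) -> List[List[str]]:
    """One command per thumbnail; hypothetical output naming scheme."""
    return [
        ffmpeg_seek_command(src, t, f"thumb_{i:06d}.jpg")
        for i, t in enumerate(thumbnail_offsets(duration_s, interval_s))
    ]
```

Each returned list could be passed to `subprocess.run` by a caller who has ffmpeg installed. Whether seeking is actually faster depends on the container having a usable index and on keyframe spacing.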