00:12:44rewby quits [Ping timeout: 268 seconds]
00:16:57rewby (rewby) joins
01:00:55<pabs>is there a list of archiving tools that are allowed to send WARCs to the WBM? wondering if warcprox is on the list yet
01:04:56<@JAA>It's about the origin of the WARCs, not the tools used to create them.
01:16:51DogsRNice_ quits [Read error: Connection reset by peer]
01:35:57<TheTechRobo>Also, warcprox is used in prod at IA, so it's almost certainly considered OK for use in the WBM
01:36:56<TheTechRobo>well, hard to say that from an outside perspective, but brozzler's tooling is very much built around it so I'd be very surprised if they aren't using it
01:55:43rewby quits [Ping timeout: 268 seconds]
01:57:22rewby (rewby) joins
04:20:27BearFortress quits []
04:54:52BearFortress joins
05:16:50atphoenix__ (atphoenix) joins
05:19:50atphoenix_ quits [Ping timeout: 268 seconds]
05:38:07Starchives_ (Starchives) joins
05:42:02Starchives__ quits [Ping timeout: 268 seconds]
06:13:59<klea>"Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; http://archive.org/details/archive.org_bot), 1.1 warcprox" and "Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; http://archive.org/details/archive.org_bot), 1.1 warcprox" sure do make it seem like warcprox is used.
07:05:14SootBector quits [Ping timeout: 260 seconds]
07:06:55SootBector (SootBector) joins
09:56:39Grzesiek11 quits [Read error: Connection reset by peer]
09:56:42Grzesiek11 (Grzesiek11) joins
15:03:55<cruller>It appears that https://web.archive.org/web/20260217030013/https://www.youtube.com/watch?v=Hnh8SufJgz0 was captured using Zeno.
15:03:56<cruller>> "userAgent":"Mozilla/5.0 (compatible; archive.org_bot +http://archive.org/details/archive.org_bot) Zeno/0344175 warc/v0.8.97,gzip(gfe)"
15:03:57<cruller>Also, > x-archive-src: SPNOUTLINKS-20260217005153258-00010-zeno-k8s-spn-crawl-b7h7s/SPNOUTLINKS-20260217024118128-00017-zeno-k8s-spn-crawl-7665ddb6c9.warc.gz
15:58:30<cruller>https://web.archive.org/web/20231207103832/https://www.youtube.com/watch?v=WM9kEnEePZk (= https://archive.org/download/daily_perma_cc_2023-12-07/25YS-QQGG.warc.gz ) was captured by Perma.cc using Scoop 0.6.2
15:59:06<klea>Huh.
15:59:32<klea>TIL YouTube puts their UA back.
15:59:44<klea>Yeah, if they start Zeno, it works differently, I believe.
16:11:13<cruller>It’s interesting that while Scoop supports WACZ, the developers themselves don’t use it.
16:11:15<cruller>WARC++
16:11:15<eggdrop>[karma] 'WARC' now has 2 karma!
16:13:22<klea>I suppose cruller also meant to link to the perma link :p https://perma.cc/25YS-QQGG
16:17:37<cruller>Yeah, I should have done that :D
16:17:56<klea>I think the one will like to at least !ao something :p
16:44:00<nicolas17>what's the file size limit of IA-generated torrents?
16:44:49<nicolas17>IIRC if an item is larger than X then archive.org does not make a torrent for it
16:46:59<nicolas17>heh, looks like derive.php has to decode the entire video stream to generate video thumbnails, even though it's only making like 1 thumb per hour, so it would be faster to seek to those positions
18:15:47Matthww3 quits [Quit: Ping timeout (120 seconds)]
18:16:47Matthww3 joins
18:30:17<klea>Huh, how much swap do petaboxes have?
18:32:33<nicolas17>oh I don't think this takes much RAM, it's still reading the file incrementally
18:34:55<pokechu22>I previously filed https://github.com/traceypooh/deriver-archive/issues/1 which I *think* is the repo that handles that derivation (or at least it looked like it at the time)
18:37:55<nicolas17>I don't think it's encoding JPEGs for every frame, but it is certainly decoding every frame yeah
18:45:54@hook54321 quits [Ping timeout: 633 seconds]
18:48:09hook54321 (hook54321) joins
18:48:09@ChanServ sets mode: +o hook54321
19:09:08<nicolas17>pokechu22: https://archive.org/download/nasa-artemis-ii-primarystream/nasa-artemis-ii-primarystream.thumbs/ look at those timestamps
19:09:19<nicolas17>it took 11 hours
19:52:25<klea>Huh, I wonder how the deriver broke after https://archive.org/metadata/WJZ_20100122_000141_CBS_Evening_News_With_Katie_Couric/ got ARId.
21:18:08cm quits [Ping timeout: 268 seconds]
21:19:27Matthww3 quits [Ping timeout: 268 seconds]
21:21:00Matthww3 joins
21:29:33cm joins