00:24:25irisfreckles13 quits [Read error: Connection reset by peer]
00:38:07etnguyen03 quits [Client Quit]
00:47:47RadRooster joins
00:50:46<pokechu22>klea: that bucket is 41 TB
00:51:32&lt;RadRooster>Hey I'm trying to help someone back up a ton of intros/outros from Turner Classic Movies on a DVR website. They're tied to longer movies and don't start/end right on time most of the time. Right now I'm just screen recording them, anyone have any better suggestions?
00:52:21<pokechu22>RadRooster: you might want to check the network tab in the browser console (F12) to see what's happening when they load
00:53:17<RadRooster>Yeah they're DRM protected and I haven't been able to crack it yet. But I'm not sure downloading so many movies and mass cutting them would even take less time tbh since they don't start/end consistently
00:54:46<RadRooster>Seems like I'd have to wait for them all to download if I don't get banned first, then go back in and cut them manually anyways.
01:02:34BennyOtt quits [Ping timeout: 256 seconds]
01:08:41etnguyen03 (etnguyen03) joins
01:15:44@imer quits [Killed (NickServ (GHOST command used by imer5))]
01:15:56Sk1d quits [Read error: Connection reset by peer]
01:15:58imer (imer) joins
01:15:58@ChanServ sets mode: +o imer
01:19:37<katia>pokechu22: biggest single item is the only bottleneck though
01:19:46<katia>s/item/file/
01:20:12<pokechu22>I'm just going to save the marker= list since I don't think most of what's in the bucket is interesting
01:23:24lukash984 quits [Quit: The Lounge - https://thelounge.chat]
01:25:19lukash984 joins
01:38:27Wohlstand quits [Client Quit]
01:38:48Mateon1 quits [Quit: Mateon1]
01:50:35lukash984 quits [Client Quit]
01:51:57lukash984 joins
01:58:43linuxgemini8 (linuxgemini) joins
01:59:48linuxgemini quits [Ping timeout: 256 seconds]
01:59:48linuxgemini8 is now known as linuxgemini
02:06:32etnguyen03 quits [Client Quit]
02:08:01Mateon1 joins
02:34:43ducky quits [Ping timeout: 272 seconds]
02:53:57<nicolas17>what bucket
02:54:03<nicolas17>oh creatorspace-public
02:54:20<nicolas17>is that from bento.me?
02:56:16<nicolas17>I tried "rclone ncdu" and it OOM'd less than a minute in :D
03:01:26<nicolas17>two 100MB+ gifs x_x
03:03:19<nicolas17>weird
03:03:26<nicolas17>https://storage.googleapis.com/creatorspace-public/richdata/behance/posts/aHR0cHM6Ly9taXItczMtY2RuLWNmLmJlaGFuY2UubmV0L3Byb2plY3RzLzQwNC85MGU2MTA3MDI1NTIyOS5ZM0p2Y0N3eE5qRTJMREV5TmpRc01Dd3cuanBn.jpeg
03:03:27<nicolas17>https://mir-s3-cdn-cf.behance.net/projects/404/90e61070255229.Y3JvcCwxNjE2LDEyNjQsMCww.jpg
03:03:29<nicolas17>same file
03:03:48<nicolas17>the base64 in the GCS filename is the behance URL
03:03:52<nicolas17>so this seems like a cache?
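[editor's note: the filename-to-URL mapping nicolas17 describes can be checked with a few lines of Python. The GCS object name appears to be the standard-base64 encoding of the original Behance URL, with a `.jpeg` extension appended after encoding. A minimal sketch; `decode_cache_name` is a hypothetical helper, not part of any tooling mentioned here:]

```python
import base64

def decode_cache_name(object_name: str) -> str:
    """Recover the original URL from a creatorspace-public cache filename."""
    stem = object_name.rsplit(".", 1)[0]   # drop the appended ".jpeg"
    stem += "=" * (-len(stem) % 4)         # restore any stripped base64 padding
    return base64.b64decode(stem).decode("utf-8")

name = ("aHR0cHM6Ly9taXItczMtY2RuLWNmLmJlaGFuY2UubmV0L3Byb2plY3RzLzQwNC85"
        "MGU2MTA3MDI1NTIyOS5ZM0p2Y0N3eE5qRTJMREV5TmpRc01Dd3cuanBn.jpeg")
print(decode_cache_name(name))
# https://mir-s3-cdn-cf.behance.net/projects/404/90e61070255229.Y3JvcCwxNjE2LDEyNjQsMCww.jpg
```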
03:04:22ducky (ducky) joins
03:04:48nicolas17 .oO if only we had that deduplicating archival thing we talked about
03:08:38etnguyen03 (etnguyen03) joins
03:24:45nepeat quits [Ping timeout: 272 seconds]
03:25:40nepeat (nepeat) joins
03:36:09<nicolas17>yeah ok I don't think we care to archive the entire bucket
03:37:07<nicolas17>https://bento.me/josh has an <img> with https://storage.googleapis.com/creatorspace-public/sites%2Fogimages%2FaHR0cHM6Ly9jZG4uc2FuaXR5LmlvL2ltYWdlcy81OTlyNmh0Yy9yZWdpb25hbGl6ZWQvMDlmMDk3YzU3MDZjZjEyNGZhMTY3NmQ1MzhhMTE0ZmZhOTg2MWM4YS0yNDAweDEyNTYucG5nP3c9MTIwMCZxPTcwJmZpdD1tYXgmYXV0bz1mb3JtYXQ%3D.jpeg
03:37:13<nicolas17>we'll get the image when we archive the profile
03:37:55<nicolas17>I think if someone embeds their instagram in bento, it shows their latest instagram post, and if there's a new post, the old one stays in the bucket but isn't actually shown anywhere
03:37:55<pokechu22>bento.me doesn't have a sitemap though so it's not clear if we could discover all of those
03:38:14<nicolas17>so now the bucket has thousands and thousands of images that won't actually show up in any profile
03:39:15<pokechu22>FWIW I deleted the 15GB of URLs I downloaded, but the archivebot job listing it finished so the same data is available there
03:40:49<nicolas17>15GB damn
03:43:35<nicolas17>ok we can enumerate all users based on the bucket contents
03:43:39<nicolas17>but I think there's 832996 users
03:44:40<nicolas17>https://storage.googleapis.com/creatorspace-public/og/clcnhp1wp06fwjp0ykolymk82/og-default.png -> https://api.bento.me/v1/users/clcnhp1wp06fwjp0ykolymk82
03:47:13<nicolas17>oh my smol VPS will run out of disk space before I get all 15GBs
03:47:35<pokechu22>https://storage.googleapis.com/creatorspace-public/?prefix=og
03:47:53<nicolas17>yeah I did get all of og
03:49:43<nicolas17>I could sacrifice my VPS IP and figure out if there's API rate limits :p
03:50:41<pokechu22>The AB job was also getting some 403s (though not in the API), so that's probably worth checking
03:50:52<pokechu22>(though not in the API == we didn't try the API)
03:51:00<nicolas17>AB job starting at bento.me homepage?
03:51:27<pokechu22>Yes
03:54:32<nicolas17>many of these IDs return 404 not found
04:00:45<nicolas17>got some gateway timeouts that succeeded on wget's automatic retry
04:01:28<nicolas17>465/700 IDs succeeded
04:02:36<nicolas17>wonder if I should try making my first wget-lua script...
04:04:23<nicolas17>if we do DPoS we don't need to scrape the API ahead of time, just make items for the user IDs collected from the bucket
04:10:03<nicolas17>2099/3124 IDs succeeded
04:15:03tripleshrimp joins
04:20:56etnguyen03 quits [Client Quit]
04:25:04etnguyen03 (etnguyen03) joins
04:28:40<nicolas17>5619/8331 IDs succeeded, 28m25s
04:44:47Arcorann joins
05:01:21<Doranwen>Wonder if this means bad news for some Vimeo videos - got a friend eyeing it slightly worried: https://techcrunch.com/2026/01/25/what-is-bending-spoons-everything-to-know-about-aols-acquirer/
05:01:26etnguyen03 quits [Remote host closed the connection]
05:03:01DopefishJustin quits [Remote host closed the connection]
05:04:07DogsRNice quits [Read error: Connection reset by peer]
05:04:32n9nes quits [Ping timeout: 256 seconds]
05:06:06n9nes joins
05:15:28DopefishJustin joins
05:28:17Webuser164222 joins
05:28:28Webuser164222 quits [Client Quit]
05:46:58v01d quits [Remote host closed the connection]
06:21:10Justin[home] joins
06:21:15DopefishJustin quits [Killed (NickServ (GHOST command used by Justin[home]))]
06:21:18Justin[home] is now known as DopefishJustin
06:28:50nexussfan quits [Quit: Konversation terminated!]
06:49:41BennyOtt (BennyOtt) joins
07:42:10Island quits [Read error: Connection reset by peer]
07:46:14<cruller>nicolas17: (re: hytale) Running `hdiffz pwr_dir/ pwr.warc output.hdiff` unsurprisingly gave me a tiny file (13640 bytes).
07:47:21<cruller>pwr.warc (3GB) can be regenerated from 0/3.pwr (1.5GB), your patch (20MB), and output.hdiff (14KB).
07:47:46<cruller>Note: Both pwr_dir/ and pwr.warc contain 0/3.pwr and 0/4.pwr
07:51:09<cruller>I wonder if this could be another solution for https://irclogs.archivete.am/archiveteam-bs/2025-10-26#l82865c9f
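[editor's note: the regeneration cruller describes would look roughly like the following, assuming HDiffPatch's companion patcher `hpatchz` (argument order: oldPath diffFile outNewPath) and the filenames from the messages above. A sketch under those assumptions, not a tested pipeline:]

```shell
# pwr_dir/ holds 0/3.pwr and 0/4.pwr; output.hdiff is the ~14 KB diff
# produced earlier with: hdiffz pwr_dir/ pwr.warc output.hdiff
hpatchz pwr_dir/ output.hdiff pwr.warc   # rebuilds the 3 GB pwr.warc
```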
07:53:24<h2ibot>Pez edited Restoring (+8, Previous project is no longer maintained): https://wiki.archiveteam.org/?diff=60335&oldid=59619
07:54:24<h2ibot>SydWikiAccount edited List of websites excluded from the Wayback Machine (+103): https://wiki.archiveteam.org/?diff=60337&oldid=60231
07:56:34Webuser270860 joins
07:56:39Webuser270860 quits [Client Quit]
08:04:21<cruller>Or rather, this is also a kind of “shaking.”
08:04:35<cruller>* shucking
08:06:06<cruller>I think the advantage of "shucking" is deduplication when multiple warc files contain the same content.
08:17:48rohvani quits [Quit: Ping timeout (120 seconds)]
08:17:59rohvani joins
08:26:25hexagonwin_ joins
08:27:29hexagonwin quits [Ping timeout: 272 seconds]
10:37:50<klea>that's too big I suppose?
10:38:25rohvani quits [Client Quit]
10:38:34rohvani joins
10:40:55oxtyped quits [Read error: Connection reset by peer]
10:49:27oxtyped joins
10:54:46pabs quits [Read error: Connection reset by peer]
10:55:35pabs (pabs) joins
10:58:08rohvani quits [Ping timeout: 256 seconds]
11:19:02oxtyped quits [Read error: Connection reset by peer]
11:19:04oxtyped joins
11:27:49Kotomind joins
11:33:03anarcat quits [Ping timeout: 272 seconds]
11:38:09anarcat (anarcat) joins
12:00:06Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:02:51Bleo182600722719623455222 joins
12:10:10starg2|m joins
12:25:54twiswist quits [Read error: Connection reset by peer]
12:26:14twiswist (twiswist) joins
12:31:51HP_Archivist quits [Quit: Leaving]
12:32:06HP_Archivist (HP_Archivist) joins