00:24:25irisfreckles13 quits [Read error: Connection reset by peer]
00:38:07etnguyen03 quits [Client Quit]
00:47:47RadRooster joins
00:50:46<pokechu22>klea: that bucket is 41 TB
00:51:32<RadRooster>Hey, I'm trying to help someone back up a ton of intros/outros from Turner Classic Movies on a DVR website. They're tied to longer movies and most of the time don't start/end right on cue. Right now I'm just screen recording them; anyone have better suggestions?
00:52:21<pokechu22>RadRooster: you might want to check the network tab in the browser console (F12) to see what's happening when they load
00:53:17<RadRooster>Yeah they're DRM protected and I haven't been able to crack it yet. But I'm not sure downloading so many movies and mass cutting them would even take less time tbh since they don't start/end consistently
00:54:46<RadRooster>Seems like I'd have to wait for them all to download if I don't get banned first, then go back in and cut them manually anyways.
01:02:34BennyOtt quits [Ping timeout: 256 seconds]
01:08:41etnguyen03 (etnguyen03) joins
01:15:44@imer quits [Killed (NickServ (GHOST command used by imer5))]
01:15:56Sk1d quits [Read error: Connection reset by peer]
01:15:58imer (imer) joins
01:15:58@ChanServ sets mode: +o imer
01:19:37<katia>pokechu22: biggest single file is the only bottleneck though
01:20:12<pokechu22>I'm just going to save the marker= list since I don't think most of what's in the bucket is interesting
01:23:24lukash984 quits [Quit: The Lounge - https://thelounge.chat]
01:25:19lukash984 joins
01:38:27Wohlstand quits [Client Quit]
01:38:48Mateon1 quits [Quit: Mateon1]
01:50:35lukash984 quits [Client Quit]
01:51:57lukash984 joins
01:58:43linuxgemini8 (linuxgemini) joins
01:59:48linuxgemini quits [Ping timeout: 256 seconds]
01:59:48linuxgemini8 is now known as linuxgemini
02:06:32etnguyen03 quits [Client Quit]
02:08:01Mateon1 joins
02:34:43ducky quits [Ping timeout: 272 seconds]
02:53:57<nicolas17>what bucket
02:54:03<nicolas17>oh creatorspace-public
02:54:20<nicolas17>is that from bento.me?
02:56:16<nicolas17>I tried "rclone ncdu" and it OOM'd less than a minute in :D
03:01:26<nicolas17>two 100MB+ gifs x_x
03:03:19<nicolas17>weird
03:03:26<nicolas17>https://storage.googleapis.com/creatorspace-public/richdata/behance/posts/aHR0cHM6Ly9taXItczMtY2RuLWNmLmJlaGFuY2UubmV0L3Byb2plY3RzLzQwNC85MGU2MTA3MDI1NTIyOS5ZM0p2Y0N3eE5qRTJMREV5TmpRc01Dd3cuanBn.jpeg
03:03:27<nicolas17>https://mir-s3-cdn-cf.behance.net/projects/404/90e61070255229.Y3JvcCwxNjE2LDEyNjQsMCww.jpg
03:03:29<nicolas17>same file
03:03:48<nicolas17>the base64 in the GCS filename is the behance URL
03:03:52<nicolas17>so this seems like a cache?
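The observation above is easy to verify: the GCS object name (minus the path prefix and `.jpeg` suffix) is just the base64 encoding of the original Behance URL. A minimal sketch, using the exact object name from the log:

```python
import base64

# base64 segment of the creatorspace-public object name, path prefix and
# trailing ".jpeg" stripped
name = ("aHR0cHM6Ly9taXItczMtY2RuLWNmLmJlaGFuY2UubmV0L3Byb2plY3RzLzQwNC85"
        "MGU2MTA3MDI1NTIyOS5ZM0p2Y0N3eE5qRTJMREV5TmpRc01Dd3cuanBn")

# Pad to a multiple of 4 in case the encoder dropped '=' padding
padded = name + "=" * (-len(name) % 4)
url = base64.b64decode(padded).decode("utf-8")
print(url)  # the original Behance CDN URL
```

This prints the same `mir-s3-cdn-cf.behance.net` URL pasted above, which supports the "it's a cache keyed by source URL" reading.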
03:04:22ducky (ducky) joins
03:04:48nicolas17 .oO if only we had that deduplicating archival thing we talked about
03:08:38etnguyen03 (etnguyen03) joins
03:24:45nepeat quits [Ping timeout: 272 seconds]
03:25:40nepeat (nepeat) joins
03:36:09<nicolas17>yeah ok I don't think we care to archive the entire bucket
03:37:07<nicolas17>https://bento.me/josh has an <img> with https://storage.googleapis.com/creatorspace-public/sites%2Fogimages%2FaHR0cHM6Ly9jZG4uc2FuaXR5LmlvL2ltYWdlcy81OTlyNmh0Yy9yZWdpb25hbGl6ZWQvMDlmMDk3YzU3MDZjZjEyNGZhMTY3NmQ1MzhhMTE0ZmZhOTg2MWM4YS0yNDAweDEyNTYucG5nP3c9MTIwMCZxPTcwJmZpdD1tYXgmYXV0bz1mb3JtYXQ%3D.jpeg
03:37:13<nicolas17>we'll get the image when we archive the profile
03:37:55<nicolas17>I think if someone embeds their instagram in bento, it shows their latest instagram post, and if there's a new post, the old one stays in the bucket but isn't actually shown anywhere
03:37:55<pokechu22>bento.me doesn't have a sitemap though so it's not clear if we could discover all of those
03:38:14<nicolas17>so now the bucket has thousands and thousands of images that won't actually show up in any profile
03:39:15<pokechu22>FWIW I deleted the 15GB of URLs I downloaded, but the archivebot job listing it finished so the same data is available there
03:40:49<nicolas17>15GB damn
03:43:35<nicolas17>ok we can enumerate all users based on the bucket contents
03:43:39<nicolas17>but I think there's 832996 users
03:44:40<nicolas17>https://storage.googleapis.com/creatorspace-public/og/clcnhp1wp06fwjp0ykolymk82/og-default.png -> https://api.bento.me/v1/users/clcnhp1wp06fwjp0ykolymk82
03:47:13<nicolas17>oh my smol VPS will run out of disk space before I get all 15GBs
03:47:35<pokechu22>https://storage.googleapis.com/creatorspace-public/?prefix=og
03:47:53<nicolas17>yeah I did get all of og
03:49:43<nicolas17>I could sacrifice my VPS IP and figure out if there's API rate limits :p
03:50:41<pokechu22>The AB job was also getting some 403s (though not in the API), so that's probably worth checking
03:50:52<pokechu22>(though not in the API == we didn't try the API)
03:51:00<nicolas17>AB job starting at bento.me homepage?
03:51:27<pokechu22>Yes
03:54:32<nicolas17>many of these IDs return 404 not found
04:00:45<nicolas17>got some gateway timeouts that succeeded on wget's automatic retry
04:01:28<nicolas17>465/700 IDs succeeded
04:02:36<nicolas17>wonder if I should try making my first wget-lua script...
04:04:23<nicolas17>if we do DPoS we don't need to scrape the API ahead of time, just make items for the user IDs collected from the bucket
04:10:03<nicolas17>2099/3124 IDs succeeded
04:15:03tripleshrimp joins
04:20:56etnguyen03 quits [Client Quit]
04:25:04etnguyen03 (etnguyen03) joins
04:28:40<nicolas17>5619/8331 IDs succeeded, 28m25s
04:44:47Arcorann joins
05:01:21<Doranwen>Wonder if this means bad news for some Vimeo videos - got a friend eyeing it slightly worried: https://techcrunch.com/2026/01/25/what-is-bending-spoons-everything-to-know-about-aols-acquirer/
05:01:26etnguyen03 quits [Remote host closed the connection]
05:03:01DopefishJustin quits [Remote host closed the connection]
05:04:07DogsRNice quits [Read error: Connection reset by peer]
05:04:32n9nes quits [Ping timeout: 256 seconds]
05:06:06n9nes joins
05:15:28DopefishJustin joins
05:28:17Webuser164222 joins
05:28:28Webuser164222 quits [Client Quit]
05:46:58v01d quits [Remote host closed the connection]
06:21:10Justin[home] joins
06:21:15DopefishJustin quits [Killed (NickServ (GHOST command used by Justin[home]))]
06:21:18Justin[home] is now known as DopefishJustin
06:28:50nexussfan quits [Quit: Konversation terminated!]
06:49:41BennyOtt (BennyOtt) joins
07:42:10Island quits [Read error: Connection reset by peer]
07:46:14<cruller>nicolas17: (re: hytale) Running `hdiffz pwr_dir/ pwr.warc output.hdiff` unsurprisingly gave me a tiny file (13640 bytes).
07:47:21<cruller>pwr.warc (3GB) can be regenerated from 0/3.pwr (1.5GB), your patch (20MB), and output.hdiff (14KB).
07:47:46<cruller>Note: Both pwr_dir/ and pwr.warc contain 0/3.pwr and 0/4.pwr
07:51:09<cruller>I wonder if this could be another solution for https://irclogs.archivete.am/archiveteam-bs/2025-10-26#l82865c9f
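The round-trip cruller describes can be sketched as a CLI recipe, assuming HDiffPatch's `hdiffz`/`hpatchz` binaries are installed; filenames match the log:

```shell
# Diff the extracted directory against the original WARC. The output is
# tiny because the WARC's payload bytes already exist inside pwr_dir/.
hdiffz pwr_dir/ pwr.warc output.hdiff

# Later, regenerate a byte-identical WARC from the directory plus patch.
hpatchz pwr_dir/ output.hdiff pwr.warc.restored
cmp pwr.warc pwr.warc.restored && echo "round-trip OK"
```

So the storage cost is the extracted payloads plus a small patch, instead of payloads duplicated inside the WARC container.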
07:53:24<h2ibot>Pez edited Restoring (+8, Previous project is no longer maintained): https://wiki.archiveteam.org/?diff=60335&oldid=59619
07:54:24<h2ibot>SydWikiAccount edited List of websites excluded from the Wayback Machine (+103): https://wiki.archiveteam.org/?diff=60337&oldid=60231
07:56:34Webuser270860 joins
07:56:39Webuser270860 quits [Client Quit]
08:04:21<cruller>Or rather, this is also a kind of “shucking.”
08:06:06<cruller>I think the advantage of "shucking" is deduplication when multiple warc files contain the same content.
08:17:48rohvani quits [Quit: Ping timeout (120 seconds)]
08:17:59rohvani joins
08:26:25hexagonwin_ joins
08:27:29hexagonwin quits [Ping timeout: 272 seconds]
10:37:50<klea>that's too big I suppose?
10:38:25rohvani quits [Client Quit]
10:38:34rohvani joins
10:40:55oxtyped quits [Read error: Connection reset by peer]
10:49:27oxtyped joins
10:54:46pabs quits [Read error: Connection reset by peer]
10:55:35pabs (pabs) joins
10:58:08rohvani quits [Ping timeout: 256 seconds]
11:19:02oxtyped quits [Read error: Connection reset by peer]
11:19:04oxtyped joins
11:27:49Kotomind joins
11:33:03anarcat quits [Ping timeout: 272 seconds]
11:38:09anarcat (anarcat) joins
12:00:06Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:02:51Bleo182600722719623455222 joins
12:10:10starg2|m joins
12:25:54twiswist quits [Read error: Connection reset by peer]
12:26:14twiswist (twiswist) joins
12:31:51HP_Archivist quits [Quit: Leaving]
12:32:06HP_Archivist (HP_Archivist) joins
12:49:14<justauser>Doranwen: We already know about Vimeo troubles but don't have a plan yet.
12:50:27<@arkiver>thanks Doranwen
12:57:52etnguyen03 (etnguyen03) joins
13:04:41Webuser243832 joins
13:05:26Webuser2438324 joins
13:05:55Webuser243832 quits [Client Quit]
13:05:55Webuser2438324 quits [Client Quit]
13:16:58Arcorann quits [Ping timeout: 256 seconds]
13:22:37Kotomind quits [Ping timeout: 272 seconds]
13:27:17Webuser183385 joins
13:27:35Ointment8862 quits [Quit: Lost terminal]
13:27:49Webuser183385 quits [Client Quit]
14:06:19TunaLobster quits [Ping timeout: 272 seconds]
14:07:37Dada joins
14:08:44Dada quits [Remote host closed the connection]
14:10:37Dada joins
14:30:09etnguyen03 quits [Client Quit]
14:31:08etnguyen03 (etnguyen03) joins
14:34:51chrismrtn quits [Quit: leaving]
14:41:01<kiska>pokechu22: It simply says that the general publication will cease on 17th Jan 2026 and that breaking news will continue until the end of the month
14:43:46<kiska>Or well that is my interpretation. The translation is: "Ming Pao Canada website will cease updates on January 17, 2026. Breaking news will continue to be updated until January 31, 2026. We sincerely thank everyone for 33 years of support!"
15:03:03chrismrtn (chrismrtn) joins
15:22:02pseudorizer (pseudorizer) joins
16:13:14Wohlstand (Wohlstand) joins
16:48:30chrismrtn quits [Client Quit]
16:58:53twiswist quits [Read error: Connection reset by peer]
16:59:56twiswist (twiswist) joins
17:00:29chrismrtn (chrismrtn) joins
17:05:43<h2ibot>Klea edited Bugzilla (+23, Add bugs.sysrq.in): https://wiki.archiveteam.org/?diff=60338&oldid=57630
17:20:43Dada quits [Remote host closed the connection]
17:22:33Wohlstand quits [Client Quit]
17:26:47BornOn420 quits [Read error: Connection reset by peer]
17:28:13Webuser220730 joins
17:28:20Webuser220730 quits [Client Quit]
17:34:47<h2ibot>Manu edited Discourse/archived (+95, HP_Archivist queued forum.rclone.org): https://wiki.archiveteam.org/?diff=60339&oldid=60325
17:37:44Webuser482213 joins
17:39:26<klea>I've noticed URLs like https://abs-0.twimg.com/emoji/v2/svg/1f308.svg in that twitter thing; should we grab all twimg.com/emoji/v2/svg/ urls?
17:39:30Island joins
17:40:12BornOn420 (BornOn420) joins
17:40:28<klea>a slightly cleaned up version: <https://transfer.archivete.am/x50jY/urls-from-twitter-jobs-for-PC.txt>
17:40:28<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/x50jY/urls-from-twitter-jobs-for-PC.txt>
17:42:17<justauser>I think the whole Twitter emoji set is published somewhere...
17:46:41<PC>it's on github (Twemoji)
17:46:59sec^nd quits [Remote host closed the connection]
17:47:39<PC>(also, thank you klea!)
17:47:48<h2ibot>Justauser edited GeoCities Japan (+45, Added data to the infobox): https://wiki.archiveteam.org/?diff=60340&oldid=58056
17:47:55nicolas17_ (nicolas17) joins
17:48:09sec^nd (second) joins
17:48:49<h2ibot>Justauser edited GeoCities (+187, /* External links */ Updates): https://wiki.archiveteam.org/?diff=60341&oldid=58439
17:50:31nicolas17 quits [Ping timeout: 272 seconds]
17:57:50<h2ibot>Justauser edited FortuneCity (+185, Added link to another mirror): https://wiki.archiveteam.org/?diff=60342&oldid=59211
18:03:19second (second) joins
18:03:33sec^nd quits [Remote host closed the connection]
18:03:34second is now known as sec^nd
18:12:47sg72 joins
18:13:19sg-72 quits [Ping timeout: 272 seconds]
18:30:55etnguyen03 quits [Quit: Konversation terminated!]