00:24:25irisfreckles13 quits [Read error: Connection reset by peer]
00:38:07etnguyen03 quits [Client Quit]
00:47:47RadRooster joins
00:50:46<pokechu22>klea: that bucket is 41 TB
00:51:32<RadRooster>Hey, I'm trying to help someone back up a ton of intros/outros from Turner Classic Movies on a DVR website. They're tied to longer movies and don't start/end right on time most of the time. Right now I'm just screen recording them; does anyone have any better suggestions?
00:52:21<pokechu22>RadRooster: you might want to check the network tab in the browser console (F12) to see what's happening when they load
00:53:17<RadRooster>Yeah they're DRM protected and I haven't been able to crack it yet. But I'm not sure downloading so many movies and mass cutting them would even take less time tbh since they don't start/end consistently
00:54:46<RadRooster>Seems like I'd have to wait for them all to download if I don't get banned first, then go back in and cut them manually anyways.
01:02:34BennyOtt quits [Ping timeout: 256 seconds]
01:08:41etnguyen03 (etnguyen03) joins
01:15:44@imer quits [Killed (NickServ (GHOST command used by imer5))]
01:15:56Sk1d quits [Read error: Connection reset by peer]
01:15:58imer (imer) joins
01:15:58@ChanServ sets mode: +o imer
01:19:37<katia>pokechu22: biggest single item is the only bottleneck though
01:19:46<katia>s/item/file/
01:20:12<pokechu22>I'm just going to save the marker= list since I don't think most of what's in the bucket is interesting
01:23:24lukash984 quits [Quit: The Lounge - https://thelounge.chat]
01:25:19lukash984 joins
01:38:27Wohlstand quits [Client Quit]
01:38:48Mateon1 quits [Quit: Mateon1]
01:50:35lukash984 quits [Client Quit]
01:51:57lukash984 joins
01:58:43linuxgemini8 (linuxgemini) joins
01:59:48linuxgemini quits [Ping timeout: 256 seconds]
01:59:48linuxgemini8 is now known as linuxgemini
02:06:32etnguyen03 quits [Client Quit]
02:08:01Mateon1 joins
02:34:43ducky quits [Ping timeout: 272 seconds]
02:53:57<nicolas17>what bucket
02:54:03<nicolas17>oh creatorspace-public
02:54:20<nicolas17>is that from bento.me?
02:56:16<nicolas17>I tried "rclone ncdu" and it OOM'd less than a minute in :D
03:01:26<nicolas17>two 100MB+ gifs x_x
03:03:19<nicolas17>weird
03:03:26<nicolas17>https://storage.googleapis.com/creatorspace-public/richdata/behance/posts/aHR0cHM6Ly9taXItczMtY2RuLWNmLmJlaGFuY2UubmV0L3Byb2plY3RzLzQwNC85MGU2MTA3MDI1NTIyOS5ZM0p2Y0N3eE5qRTJMREV5TmpRc01Dd3cuanBn.jpeg
03:03:27<nicolas17>https://mir-s3-cdn-cf.behance.net/projects/404/90e61070255229.Y3JvcCwxNjE2LDEyNjQsMCww.jpg
03:03:29<nicolas17>same file
03:03:48<nicolas17>the base64 in the GCS filename is the behance URL
03:03:52<nicolas17>so this seems like a cache?
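
The observation above (the GCS object name is the base64 of the original URL, plus an extension) can be checked with a minimal sketch like the following. The function name decode_cached_source is hypothetical, and it assumes the last path segment, minus its extension, is base64 (standard or URL-safe alphabet) whose padding may be missing or percent-encoded; none of that is confirmed beyond the two URLs quoted in the log.

```python
import base64
from urllib.parse import unquote, urlsplit


def decode_cached_source(gcs_url: str) -> str:
    """Recover the original URL encoded in a creatorspace-public object name.

    Assumption: the final path segment is "<base64 of source URL>.<ext>".
    """
    # Percent-decode so names like "sites%2Fogimages%2F<b64>%3D.jpeg" work too.
    path = unquote(urlsplit(gcs_url).path)
    name = path.rsplit("/", 1)[-1]
    # Drop the trailing extension (e.g. ".jpeg").
    stem = name.rsplit(".", 1)[0]
    # Normalise padding: strip any "=" and re-add exactly what base64 needs.
    stem = stem.rstrip("=")
    padded = stem + "=" * (-len(stem) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")
```

Calling it on the richdata/behance/posts/... URL above should give back the behance.net URL if the assumption holds; a name that is not base64 will raise an exception rather than return garbage silently.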
03:04:22ducky (ducky) joins
03:04:48nicolas17 .oO if only we had that deduplicating archival thing we talked about
03:08:38etnguyen03 (etnguyen03) joins
03:24:45nepeat quits [Ping timeout: 272 seconds]
03:25:40nepeat (nepeat) joins
03:36:09<nicolas17>yeah ok I don't think we care to archive the entire bucket
03:37:07<nicolas17>https://bento.me/josh has an <img> with https://storage.googleapis.com/creatorspace-public/sites%2Fogimages%2FaHR0cHM6Ly9jZG4uc2FuaXR5LmlvL2ltYWdlcy81OTlyNmh0Yy9yZWdpb25hbGl6ZWQvMDlmMDk3YzU3MDZjZjEyNGZhMTY3NmQ1MzhhMTE0ZmZhOTg2MWM4YS0yNDAweDEyNTYucG5nP3c9MTIwMCZxPTcwJmZpdD1tYXgmYXV0bz1mb3JtYXQ%3D.jpeg
03:37:13<nicolas17>we'll get the image when we archive the profile
03:37:55<nicolas17>I think if someone embeds their instagram in bento, it shows their latest instagram post, and if there's a new post, the old one stays in the bucket but isn't actually shown anywhere
03:37:55<pokechu22>bento.me doesn't have a sitemap though so it's not clear if we could discover all of those
03:38:14<nicolas17>so now the bucket has thousands and thousands of images that won't actually show up in any profile
03:39:15<pokechu22>FWIW I deleted the 15GB of URLs I downloaded, but the archivebot job listing it finished so the same data is available there
03:40:49<nicolas17>15GB damn
03:43:35<nicolas17>ok we can enumerate all users based on the bucket contents
03:43:39<nicolas17>but I think there's 832996 users
03:44:40<nicolas17>https://storage.googleapis.com/creatorspace-public/og/clcnhp1wp06fwjp0ykolymk82/og-default.png -> https://api.bento.me/v1/users/clcnhp1wp06fwjp0ykolymk82
03:47:13<nicolas17>oh my smol VPS will run out of disk space before I get all 15GBs
03:47:35<pokechu22>https://storage.googleapis.com/creatorspace-public/?prefix=og
03:47:53<nicolas17>yeah I did get all of og
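
The og/<id>/og-default.png -> api.bento.me/v1/users/<id> mapping mentioned above can be sketched as a small pure function over object keys from the bucket listing. The name api_urls_from_keys is hypothetical, and it assumes every user ID appears as the first path component under og/; fetching the listing itself is left out.

```python
import re

# Assumption: user IDs are the path component directly under "og/".
OG_KEY = re.compile(r"^og/([^/]+)/")


def api_urls_from_keys(keys):
    """Turn bucket object keys into de-duplicated bento API user URLs."""
    seen = set()
    urls = []
    for key in keys:
        match = OG_KEY.match(key)
        if match is None:
            continue  # not under og/, e.g. richdata/... or sites/...
        user_id = match.group(1)
        if user_id not in seen:
            seen.add(user_id)
            urls.append(f"https://api.bento.me/v1/users/{user_id}")
    return urls
```

The resulting list could be fed to wget or used as item names; as noted in the log, many of these IDs may still return 404.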
03:49:43<nicolas17>I could sacrifice my VPS IP and figure out if there's API rate limits :p
03:50:41<pokechu22>The AB job was also getting some 403s (though not in the API), so that's probably worth checking
03:50:52<pokechu22>(though not in the API == we didn't try the API)
03:51:00<nicolas17>AB job starting at bento.me homepage?
03:51:27<pokechu22>Yes
03:54:32<nicolas17>many of these IDs return 404 not found
04:00:45<nicolas17>got some gateway timeouts that succeeded on wget's automatic retry
04:01:28<nicolas17>465/700 IDs succeeded
04:02:36<nicolas17>wonder if I should try making my first wget-lua script...
04:04:23<nicolas17>if we do DPoS we don't need to scrape the API ahead of time, just make items for the user IDs collected from the bucket
04:10:03<nicolas17>2099/3124 IDs succeeded
04:15:03tripleshrimp joins
04:20:56etnguyen03 quits [Client Quit]
04:25:04etnguyen03 (etnguyen03) joins
04:28:40<nicolas17>5619/8331 IDs succeeded, 28m25s
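
For a rough sense of scale, the numbers quoted above can be extrapolated. This is only a back-of-the-envelope sketch: it assumes the 28m25s covers all 8331 attempts, that the request rate stays constant, and that the estimate of 832996 users is the number of IDs to try. The function name summarize_run is hypothetical.

```python
def summarize_run(succeeded, attempted, elapsed_seconds, total_ids):
    """Return (success ratio, attempts per second, estimated hours for all IDs)."""
    success_ratio = succeeded / attempted
    ids_per_second = attempted / elapsed_seconds
    eta_hours = total_ids / ids_per_second / 3600
    return success_ratio, ids_per_second, eta_hours


# Figures taken from the log: 5619/8331 IDs in 28m25s, ~832996 users in total.
ratio, rate, eta = summarize_run(5619, 8331, 28 * 60 + 25, 832996)
print(f"{ratio:.1%} succeeded, {rate:.2f} IDs/s, ~{eta:.0f} h for all IDs")
```

Under those assumptions roughly two thirds of the IDs resolve, and a single client at this pace would need about two days for the full set, before accounting for any rate limiting.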
04:44:47Arcorann joins
05:01:21<Doranwen>Wonder if this means bad news for some Vimeo videos - got a friend eyeing it slightly worried: https://techcrunch.com/2026/01/25/what-is-bending-spoons-everything-to-know-about-aols-acquirer/
05:01:26etnguyen03 quits [Remote host closed the connection]
05:03:01DopefishJustin quits [Remote host closed the connection]
05:04:07DogsRNice quits [Read error: Connection reset by peer]
05:04:32n9nes quits [Ping timeout: 256 seconds]
05:06:06n9nes joins
05:15:28DopefishJustin joins
05:28:17Webuser164222 joins
05:28:28Webuser164222 quits [Client Quit]
05:46:58v01d quits [Remote host closed the connection]
06:21:10Justin[home] joins
06:21:15DopefishJustin quits [Killed (NickServ (GHOST command used by Justin[home]))]
06:21:18Justin[home] is now known as DopefishJustin
06:28:50nexussfan quits [Quit: Konversation terminated!]
06:49:41BennyOtt (BennyOtt) joins
07:42:10Island quits [Read error: Connection reset by peer]
07:46:14<cruller>nicolas17: (re: hytale) Running `hdiffz pwr_dir/ pwr.warc output.hdiff` unsurprisingly gave me a tiny file (13640 bytes).
07:47:21<cruller>pwr.warc (3GB) can be regenerated from 0/3.pwr (1.5GB), your patch (20MB), and output.hdiff (14KB).
07:47:46<cruller>Note: Both pwr_dir/ and pwr.warc contain 0/3.pwr and 0/4.pwr
07:51:09<cruller>I wonder if this could be another solution for https://irclogs.archivete.am/archiveteam-bs/2025-10-26#l82865c9f
07:53:24<h2ibot>Pez edited Restoring (+8, Previous project is no longer maintained): https://wiki.archiveteam.org/?diff=60335&oldid=59619
07:54:24<h2ibot>SydWikiAccount edited List of websites excluded from the Wayback Machine (+103): https://wiki.archiveteam.org/?diff=60337&oldid=60231
07:56:34Webuser270860 joins
07:56:39Webuser270860 quits [Client Quit]
08:04:21<cruller>Or rather, this is also a kind of “shaking.”
08:04:35<cruller>* shucking
08:06:06<cruller>I think the advantage of "shucking" is deduplication when multiple warc files contain the same content.
08:17:48rohvani quits [Quit: Ping timeout (120 seconds)]
08:17:59rohvani joins
08:26:25hexagonwin_ joins
08:27:29hexagonwin quits [Ping timeout: 272 seconds]
10:37:50<klea>that's too big I suppose?
10:38:25rohvani quits [Client Quit]
10:38:34rohvani joins
10:40:55oxtyped quits [Read error: Connection reset by peer]
10:49:27oxtyped joins
10:54:46pabs quits [Read error: Connection reset by peer]
10:55:35pabs (pabs) joins
10:58:08rohvani quits [Ping timeout: 256 seconds]
11:19:02oxtyped quits [Read error: Connection reset by peer]
11:19:04oxtyped joins
11:27:49Kotomind joins
11:33:03anarcat quits [Ping timeout: 272 seconds]
11:38:09anarcat (anarcat) joins
12:00:06Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:02:51Bleo182600722719623455222 joins
12:10:10starg2|m joins
12:25:54twiswist quits [Read error: Connection reset by peer]
12:26:14twiswist (twiswist) joins
12:31:51HP_Archivist quits [Quit: Leaving]
12:32:06HP_Archivist (HP_Archivist) joins
12:49:14<justauser>Doranwen: We already know about Vimeo troubles but don't have a plan yet.
12:50:27<@arkiver>thanks Doranwen
12:57:52etnguyen03 (etnguyen03) joins
13:04:41Webuser243832 joins
13:05:26Webuser2438324 joins
13:05:55Webuser243832 quits [Client Quit]
13:05:55Webuser2438324 quits [Client Quit]
13:16:58Arcorann quits [Ping timeout: 256 seconds]
13:22:37Kotomind quits [Ping timeout: 272 seconds]
13:27:17Webuser183385 joins
13:27:35Ointment8862 quits [Quit: Lost terminal]
13:27:49Webuser183385 quits [Client Quit]
14:06:19TunaLobster quits [Ping timeout: 272 seconds]
14:07:37Dada joins
14:08:44Dada quits [Remote host closed the connection]
14:10:37Dada joins
14:30:09etnguyen03 quits [Client Quit]
14:31:08etnguyen03 (etnguyen03) joins
14:34:51chrismrtn quits [Quit: leaving]
14:41:01<kiska>pokechu22: It simply says that the general publication will cease on 17th Jan 2026 and that breaking news will continue until the end of the month
14:43:46<kiska>Or well that is my interpretation. The translation is: "Ming Pao Canada website will cease updates on January 17, 2026. Breaking news will continue to be updated until January 31, 2026. We sincerely thank everyone for 33 years of support!"
15:03:03chrismrtn (chrismrtn) joins
15:22:02pseudorizer (pseudorizer) joins
16:13:14Wohlstand (Wohlstand) joins
16:48:30chrismrtn quits [Client Quit]
16:58:53twiswist quits [Read error: Connection reset by peer]
16:59:56twiswist (twiswist) joins
17:00:29chrismrtn (chrismrtn) joins
17:05:43<h2ibot>Klea edited Bugzilla (+23, Add bugs.sysrq.in): https://wiki.archiveteam.org/?diff=60338&oldid=57630
17:20:43Dada quits [Remote host closed the connection]
17:22:33Wohlstand quits [Client Quit]
17:26:47BornOn420 quits [Read error: Connection reset by peer]
17:28:13Webuser220730 joins
17:28:20Webuser220730 quits [Client Quit]
17:34:47<h2ibot>Manu edited Discourse/archived (+95, HP_Archivist queued forum.rclone.org): https://wiki.archiveteam.org/?diff=60339&oldid=60325
17:37:44Webuser482213 joins
17:39:26<klea>I've noticed URLs like https://abs-0.twimg.com/emoji/v2/svg/1f308.svg in that twitter thing, should we grab all twimg.com/emoji/v2/svg/ URLs?
17:39:30Island joins
17:40:12BornOn420 (BornOn420) joins
17:40:28<klea>a slightly cleaned up version: <https://transfer.archivete.am/x50jY/urls-from-twitter-jobs-for-PC.txt>
17:40:28<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/x50jY/urls-from-twitter-jobs-for-PC.txt
17:42:17<justauser>I think the whole Twitter emoji set is published somewhere...
17:46:41<PC>it's on github (Twemoji)
17:46:59sec^nd quits [Remote host closed the connection]
17:47:39<PC>(also, thank you klea!)
17:47:48<h2ibot>Justauser edited GeoCities Japan (+45, Added data to the infobox): https://wiki.archiveteam.org/?diff=60340&oldid=58056
17:47:55nicolas17_ (nicolas17) joins
17:48:09sec^nd (second) joins
17:48:49<h2ibot>Justauser edited GeoCities (+187, /* External links */ Updates): https://wiki.archiveteam.org/?diff=60341&oldid=58439
17:50:31nicolas17 quits [Ping timeout: 272 seconds]
17:57:50<h2ibot>Justauser edited FortuneCity (+185, Added link to another mirror): https://wiki.archiveteam.org/?diff=60342&oldid=59211
18:03:19second (second) joins
18:03:33sec^nd quits [Remote host closed the connection]
18:03:34second is now known as sec^nd
18:12:47sg72 joins
18:13:19sg-72 quits [Ping timeout: 272 seconds]
18:30:55etnguyen03 quits [Quit: Konversation terminated!]
18:44:59etnguyen03 (etnguyen03) joins
18:51:00Wohlstand (Wohlstand) joins
19:15:03emily (pseudorizer) joins
19:15:06pseudorizer quits [Ping timeout: 256 seconds]
19:17:10Ryz quits [Read error: Connection reset by peer]
19:18:48Ryz (Ryz) joins
19:19:11@dxrt quits [Ping timeout: 272 seconds]
19:19:38dxrt joins
19:19:40dxrt quits [Changing host]
19:19:40dxrt (dxrt) joins
19:19:40@ChanServ sets mode: +o dxrt
19:19:49ell7 quits [Ping timeout: 272 seconds]
19:20:03irisfreckles13 joins
19:32:24<klea>PC: you're welcome, if you want make a crib and modify the urls however you want, else tell me what to do with them, and i can make a new url list for AB
19:33:14LddPotato_ joins
19:34:29<PC>crib?
19:36:01<PC>i think for the images, would just be good to get the orig versions perhaps if those haven't been gotten already. but the actual largest images should already be live on the WBM under either large or 4096x4096 urls
19:36:08<h2ibot>KleaBot made 2 bot changes: https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260126193551&limit=2&namespace=2&wpfilters[]=nsInvert&wpfilters[]=associated
19:36:08<PC>so may not matter really
19:36:17LddPotato quits [Ping timeout: 272 seconds]
19:36:26LddPotato_ is now known as LddPotato
19:37:59<PC>(anyone else getting a lot of 503s on the WBM today? i don't think it's my IP since i'm getting them from another IP address too, just wondering what's up)
19:40:37<pokechu22>Same
19:42:25<hexagonwin_>same, it's *very* slow
19:42:44<PC>RIP, yeah i've never seen it this slow before. hope their servers are okay
19:43:16<klea>PC: yes, as in filter whichever urls you want, modify the existing urls however you want.
19:43:38<PC>sure! i'll put something together
19:46:30<IDK>Hi, are there plans for uploading data from #down-the-tube for late November/early December? During some misconfiguration
19:47:18<IDK>I was looking for some video that was presumably downloaded somewhere back then, no pressure tho!
20:06:59<c3manu>pabs: do you think throwing the flickr 429s into #// would be a good alternative to save them despite the rate-limiting? or would that just get people blocked longterm?
20:07:43<c3manu>like this i gotta watch out for every other job that might link heavily to flickr in case it messes with one of my requeue lists
20:21:51<PC>klea: almost done, but while doing this i found another URL that's not on the WBM, would appreciate getting it crawled too <3 https://twitter.com/YuGiOh_OCG_INFO/status/1560589899769708546
20:21:52<eggdrop>nitter: https://nitter.net/YuGiOh_OCG_INFO/status/1560589899769708546
20:23:34<PC>actually, maybe this one too? it's hard to tell with the WBM on the fritz but i guess better safe than sorry https://twitter.com/YuGiOh_OCG_INFO/status/1750851075840581816
20:23:35<eggdrop>nitter: https://nitter.net/YuGiOh_OCG_INFO/status/1750851075840581816
20:23:35DogsRNice joins
20:23:57<PC>(and there was also https://twitter.com/zunkome2/status/1713581981420618067, just in case that was missed)
20:23:58<eggdrop>nitter: https://nitter.net/zunkome2/status/1713581981420618067
20:24:19<klea>PC: btw join #jseater :)
20:24:29<PC>o7
21:35:39Chris5010 quits [Quit: ]
21:42:52Chris5010 (Chris5010) joins
21:53:46irisfreckles13 quits [Ping timeout: 256 seconds]
21:59:22<nicolas17_>IDK: I didn't know we had an upload backlog in youtube
22:18:30<h2ibot>Cooljeanius edited Twitter (+9, /* Archives */ use "as of" template): https://wiki.archiveteam.org/?diff=60346&oldid=60333
22:21:22nicolas17_ is now known as nicolas17
22:21:31<h2ibot>Cooljeanius edited Twitter (+104, /* Backup Tools */ misc. updates): https://wiki.archiveteam.org/?diff=60347&oldid=60346
22:27:47Webuser587265 joins
22:27:57Webuser587265 quits [Client Quit]
22:40:34<IDK>I checked back on some of the videos that I completed this December, and they have not been indexed yet; not sure if they were uploaded
22:58:53HP_Archivist quits [Quit: Leaving]
23:01:31nexussfan (nexussfan) joins
23:03:18HP_Archivist (HP_Archivist) joins
23:04:49SootBector quits [Remote host closed the connection]
23:05:57SootBector (SootBector) joins
23:12:12PC quits [Quit: PC]
23:12:36irisfreckles13 joins
23:18:04PC joins
23:18:54Arcorann joins
23:21:32Mateon1 quits [Remote host closed the connection]
23:22:17Mateon1 joins
23:27:20Mateon1 quits [Client Quit]
23:27:29Mateon1 joins
23:30:51Mateon1 quits [Remote host closed the connection]
23:31:39Mateon1 joins
23:36:43Webuser463996 joins
23:37:09Webuser463996 quits [Client Quit]
23:45:27<@JAA>Hmm, why still []?
23:54:43<Yakov>RIP RakNet's hosting: http://www.raknet.com/, along with its forum, seems to be gone
23:54:47<Yakov>> The hosting account for www.raknet.com expired.
23:57:45<pokechu22>I did an archivebot job for it early December - the site was up but the forum was broken by then