| 00:24:25 | | irisfreckles13 quits [Read error: Connection reset by peer] |
| 00:38:07 | | etnguyen03 quits [Client Quit] |
| 00:47:47 | | RadRooster joins |
| 00:50:46 | <pokechu22> | klea: that bucket is 41 TB |
| 00:51:32 | <RadRooster> | Hey, I'm trying to help someone back up a ton of intros/outros from Turner Classic Movies on a DVR website. They're tied to longer movies and usually don't start/end right on time. Right now I'm just screen recording them; does anyone have any better suggestions? |
| 00:52:21 | <pokechu22> | RadRooster: you might want to check the network tab in the browser console (F12) to see what's happening when they load |
| 00:53:17 | <RadRooster> | Yeah they're DRM protected and I haven't been able to crack it yet. But I'm not sure downloading so many movies and mass cutting them would even take less time tbh since they don't start/end consistently |
| 00:54:46 | <RadRooster> | Seems like I'd have to wait for them all to download if I don't get banned first, then go back in and cut them manually anyways. |
| 01:02:34 | | BennyOtt quits [Ping timeout: 256 seconds] |
| 01:08:41 | | etnguyen03 (etnguyen03) joins |
| 01:15:44 | | @imer quits [Killed (NickServ (GHOST command used by imer5))] |
| 01:15:56 | | Sk1d quits [Read error: Connection reset by peer] |
| 01:15:58 | | imer (imer) joins |
| 01:15:58 | | @ChanServ sets mode: +o imer |
| 01:19:37 | <katia> | pokechu22: biggest single item is the only bottleneck though |
| 01:19:46 | <katia> | s/item/file/ |
| 01:20:12 | <pokechu22> | I'm just going to save the marker= list since I don't think most of what's in the bucket is interesting |
| 01:23:24 | | lukash984 quits [Quit: The Lounge - https://thelounge.chat] |
| 01:25:19 | | lukash984 joins |
| 01:38:27 | | Wohlstand quits [Client Quit] |
| 01:38:48 | | Mateon1 quits [Quit: Mateon1] |
| 01:50:35 | | lukash984 quits [Client Quit] |
| 01:51:57 | | lukash984 joins |
| 01:58:43 | | linuxgemini8 (linuxgemini) joins |
| 01:59:48 | | linuxgemini quits [Ping timeout: 256 seconds] |
| 01:59:48 | | linuxgemini8 is now known as linuxgemini |
| 02:06:32 | | etnguyen03 quits [Client Quit] |
| 02:08:01 | | Mateon1 joins |
| 02:34:43 | | ducky quits [Ping timeout: 272 seconds] |
| 02:53:57 | <nicolas17> | what bucket |
| 02:54:03 | <nicolas17> | oh creatorspace-public |
| 02:54:20 | <nicolas17> | is that from bento.me? |
| 02:56:16 | <nicolas17> | I tried "rclone ncdu" and it OOM'd less than a minute in :D |
| 03:01:26 | <nicolas17> | two 100MB+ gifs x_x |
| 03:03:19 | <nicolas17> | weird |
| 03:03:26 | <nicolas17> | https://storage.googleapis.com/creatorspace-public/richdata/behance/posts/aHR0cHM6Ly9taXItczMtY2RuLWNmLmJlaGFuY2UubmV0L3Byb2plY3RzLzQwNC85MGU2MTA3MDI1NTIyOS5ZM0p2Y0N3eE5qRTJMREV5TmpRc01Dd3cuanBn.jpeg |
| 03:03:27 | <nicolas17> | https://mir-s3-cdn-cf.behance.net/projects/404/90e61070255229.Y3JvcCwxNjE2LDEyNjQsMCww.jpg |
| 03:03:29 | <nicolas17> | same file |
| 03:03:48 | <nicolas17> | the base64 in the GCS filename is the behance URL |
| 03:03:52 | <nicolas17> | so this seems like a cache? |
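Editorial note: the observation above (the object name in the bucket is the base64 of the original URL) can be checked with a short sketch. The function name `decode_cached_source` is hypothetical, and the handling of URL-escaped paths and missing `=` padding is an assumption based only on the two bucket URLs quoted in this log, not on any documented bento.me behaviour.

```python
import base64
from urllib.parse import unquote, urlsplit


def decode_cached_source(gcs_url: str) -> str:
    """Recover the source URL encoded in a bucket object name (hypothetical helper)."""
    # Undo percent-encoding such as %2F and %3D in the object path.
    path = unquote(urlsplit(gcs_url).path)
    # Keep only the last path component and drop the file extension.
    name = path.rsplit("/", 1)[-1]
    stem = name.rsplit(".", 1)[0]
    # Assumption: padding may be missing, so pad to a multiple of 4.
    padded = stem + "=" * (-len(stem) % 4)
    # urlsafe_b64decode accepts both the standard and the URL-safe alphabet.
    return base64.urlsafe_b64decode(padded).decode("utf-8")


if __name__ == "__main__":
    url = (
        "https://storage.googleapis.com/creatorspace-public/richdata/behance/posts/"
        "aHR0cHM6Ly9taXItczMtY2RuLWNmLmJlaGFuY2UubmV0L3Byb2plY3RzLzQwNC85MGU2MTA3"
        "MDI1NTIyOS5ZM0p2Y0N3eE5qRTJMREV5TmpRc01Dd3cuanBn.jpeg"
    )
    print(decode_cached_source(url))
```

Run against the first URL quoted above, this should print the behance.net URL from the following line, which is consistent with the bucket acting as a cache of remote images.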
| 03:04:22 | | ducky (ducky) joins |
| 03:04:48 | | nicolas17 .oO if only we had that deduplicating archival thing we talked about |
| 03:08:38 | | etnguyen03 (etnguyen03) joins |
| 03:24:45 | | nepeat quits [Ping timeout: 272 seconds] |
| 03:25:40 | | nepeat (nepeat) joins |
| 03:36:09 | <nicolas17> | yeah ok I don't think we care to archive the entire bucket |
| 03:37:07 | <nicolas17> | https://bento.me/josh has an <img> with https://storage.googleapis.com/creatorspace-public/sites%2Fogimages%2FaHR0cHM6Ly9jZG4uc2FuaXR5LmlvL2ltYWdlcy81OTlyNmh0Yy9yZWdpb25hbGl6ZWQvMDlmMDk3YzU3MDZjZjEyNGZhMTY3NmQ1MzhhMTE0ZmZhOTg2MWM4YS0yNDAweDEyNTYucG5nP3c9MTIwMCZxPTcwJmZpdD1tYXgmYXV0bz1mb3JtYXQ%3D.jpeg |
| 03:37:13 | <nicolas17> | we'll get the image when we archive the profile |
| 03:37:55 | <nicolas17> | I think if someone embeds their instagram in bento, it shows their latest instagram post, and if there's a new post, the old one stays in the bucket but isn't actually shown anywhere |
| 03:37:55 | <pokechu22> | bento.me doesn't have a sitemap though so it's not clear if we could discover all of those |
| 03:38:14 | <nicolas17> | so now the bucket has thousands and thousands of images that won't actually show up in any profile |
| 03:39:15 | <pokechu22> | FWIW I deleted the 15GB of URLs I downloaded, but the archivebot job listing it finished so the same data is available there |
| 03:40:49 | <nicolas17> | 15GB damn |
| 03:43:35 | <nicolas17> | ok we can enumerate all users based on the bucket contents |
| 03:43:39 | <nicolas17> | but I think there's 832996 users |
| 03:44:40 | <nicolas17> | https://storage.googleapis.com/creatorspace-public/og/clcnhp1wp06fwjp0ykolymk82/og-default.png -> https://api.bento.me/v1/users/clcnhp1wp06fwjp0ykolymk82 |
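Editorial note: a minimal sketch of the mapping shown in the line above, from an `og/<id>/og-default.png` object in the bucket to the corresponding API URL. The helper name `api_url_for_og_image` and the regular expression are hypothetical; the ID format (lowercase letters and digits) is an assumption drawn from the single example in this log.

```python
import re
from typing import Optional

# Assumption: user IDs consist of lowercase letters and digits, as in the one example seen.
OG_KEY = re.compile(r"/og/([a-z0-9]+)/og-default\.png$")


def api_url_for_og_image(og_url: str) -> Optional[str]:
    """Map a bucket og-image URL to the users API URL, or None if it does not match."""
    match = OG_KEY.search(og_url)
    if match is None:
        return None
    return f"https://api.bento.me/v1/users/{match.group(1)}"


if __name__ == "__main__":
    print(api_url_for_og_image(
        "https://storage.googleapis.com/creatorspace-public/og/"
        "clcnhp1wp06fwjp0ykolymk82/og-default.png"
    ))
```

Applied to every key under the `og` prefix, this would give the candidate list of user IDs; as noted later in the log, a good share of them may return 404.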
| 03:47:13 | <nicolas17> | oh my smol VPS will run out of disk space before I get all 15GBs |
| 03:47:35 | <pokechu22> | https://storage.googleapis.com/creatorspace-public/?prefix=og |
| 03:47:53 | <nicolas17> | yeah I did get all of og |
| 03:49:43 | <nicolas17> | I could sacrifice my VPS IP and figure out if there's API rate limits :p |
| 03:50:41 | <pokechu22> | The AB job was also getting some 403s (though not in the API), so that's probably worth checking |
| 03:50:52 | <pokechu22> | (though not in the API == we didn't try the API) |
| 03:51:00 | <nicolas17> | AB job starting at bento.me homepage? |
| 03:51:27 | <pokechu22> | Yes |
| 03:54:32 | <nicolas17> | many of these IDs return 404 not found |
| 04:00:45 | <nicolas17> | got some gateway timeouts that succeeded on wget's automatic retry |
| 04:01:28 | <nicolas17> | 465/700 IDs succeeded |
| 04:02:36 | <nicolas17> | wonder if I should try making my first wget-lua script... |
| 04:04:23 | <nicolas17> | if we do DPoS we don't need to scrape the API ahead of time, just make items for the user IDs collected from the bucket |
| 04:10:03 | <nicolas17> | 2099/3124 IDs succeeded |
| 04:15:03 | | tripleshrimp joins |
| 04:20:56 | | etnguyen03 quits [Client Quit] |
| 04:25:04 | | etnguyen03 (etnguyen03) joins |
| 04:28:40 | <nicolas17> | 5619/8331 IDs succeeded, 28m25s |
| 04:44:47 | | Arcorann joins |
| 05:01:21 | <Doranwen> | Wonder if this means bad news for some Vimeo videos - got a friend eyeing it slightly worried: https://techcrunch.com/2026/01/25/what-is-bending-spoons-everything-to-know-about-aols-acquirer/ |
| 05:01:26 | | etnguyen03 quits [Remote host closed the connection] |
| 05:03:01 | | DopefishJustin quits [Remote host closed the connection] |
| 05:04:07 | | DogsRNice quits [Read error: Connection reset by peer] |
| 05:04:32 | | n9nes quits [Ping timeout: 256 seconds] |
| 05:06:06 | | n9nes joins |
| 05:15:28 | | DopefishJustin joins |
| 05:15:28 | | DopefishJustin is now authenticated as DopefishJustin |
| 05:28:17 | | Webuser164222 joins |
| 05:28:28 | | Webuser164222 quits [Client Quit] |
| 05:46:58 | | v01d quits [Remote host closed the connection] |
| 06:21:10 | | Justin[home] joins |
| 06:21:10 | | Justin[home] is now authenticated as DopefishJustin |
| 06:21:15 | | DopefishJustin quits [Killed (NickServ (GHOST command used by Justin[home]))] |
| 06:21:18 | | Justin[home] is now known as DopefishJustin |
| 06:28:50 | | nexussfan quits [Quit: Konversation terminated!] |
| 06:49:41 | | BennyOtt (BennyOtt) joins |
| 07:42:10 | | Island quits [Read error: Connection reset by peer] |
| 07:46:14 | <cruller> | nicolas17: (re: hytale) Running `hdiffz pwr_dir/ pwr.warc output.hdiff` unsurprisingly gave me a tiny file (13640 bytes). |
| 07:47:21 | <cruller> | pwr.warc (3GB) can be regenerated from 0/3.pwr (1.5GB), your patch (20MB), and output.hdiff (14KB). |
| 07:47:46 | <cruller> | Note: Both pwr_dir/ and pwr.warc contain 0/3.pwr and 0/4.pwr |
| 07:51:09 | <cruller> | I wonder if this could be another solution for https://irclogs.archivete.am/archiveteam-bs/2025-10-26#l82865c9f |
| 07:53:24 | <h2ibot> | Pez edited Restoring (+8, Previous project is no longer maintained): https://wiki.archiveteam.org/?diff=60335&oldid=59619 |
| 07:54:24 | <h2ibot> | SydWikiAccount edited List of websites excluded from the Wayback Machine (+103): https://wiki.archiveteam.org/?diff=60337&oldid=60231 |
| 07:56:34 | | Webuser270860 joins |
| 07:56:39 | | Webuser270860 quits [Client Quit] |
| 08:04:21 | <cruller> | Or rather, this is also a kind of “shaking.” |
| 08:04:35 | <cruller> | * shucking |
| 08:06:06 | <cruller> | I think the advantage of "shucking" is deduplication when multiple warc files contain the same content. |
| 08:17:48 | | rohvani quits [Quit: Ping timeout (120 seconds)] |
| 08:17:59 | | rohvani joins |
| 08:26:25 | | hexagonwin_ joins |
| 08:27:29 | | hexagonwin quits [Ping timeout: 272 seconds] |
| 10:37:50 | <klea> | that's too big I suppose? |
| 10:38:25 | | rohvani quits [Client Quit] |
| 10:38:34 | | rohvani joins |
| 10:40:55 | | oxtyped quits [Read error: Connection reset by peer] |
| 10:49:27 | | oxtyped joins |
| 10:54:46 | | pabs quits [Read error: Connection reset by peer] |
| 10:55:35 | | pabs (pabs) joins |
| 10:58:08 | | rohvani quits [Ping timeout: 256 seconds] |
| 11:19:02 | | oxtyped quits [Read error: Connection reset by peer] |
| 11:19:04 | | oxtyped joins |
| 11:27:49 | | Kotomind joins |
| 11:33:03 | | anarcat quits [Ping timeout: 272 seconds] |
| 11:38:09 | | anarcat (anarcat) joins |
| 12:00:06 | | Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat] |
| 12:02:51 | | Bleo182600722719623455222 joins |
| 12:10:10 | | starg2|m joins |
| 12:25:54 | | twiswist quits [Read error: Connection reset by peer] |
| 12:26:14 | | twiswist (twiswist) joins |
| 12:31:51 | | HP_Archivist quits [Quit: Leaving] |
| 12:32:06 | | HP_Archivist (HP_Archivist) joins |
| 12:49:14 | <justauser> | Doranwen: We already know about Vimeo troubles but don't have a plan yet. |
| 12:50:27 | <@arkiver> | thanks Doranwen |
| 12:57:52 | | etnguyen03 (etnguyen03) joins |
| 13:04:41 | | Webuser243832 joins |
| 13:05:26 | | Webuser2438324 joins |
| 13:05:55 | | Webuser243832 quits [Client Quit] |
| 13:05:55 | | Webuser2438324 quits [Client Quit] |
| 13:16:58 | | Arcorann quits [Ping timeout: 256 seconds] |
| 13:22:37 | | Kotomind quits [Ping timeout: 272 seconds] |
| 13:27:17 | | Webuser183385 joins |
| 13:27:35 | | Ointment8862 quits [Quit: Lost terminal] |
| 13:27:49 | | Webuser183385 quits [Client Quit] |
| 14:06:19 | | TunaLobster quits [Ping timeout: 272 seconds] |
| 14:07:37 | | Dada joins |
| 14:08:44 | | Dada quits [Remote host closed the connection] |
| 14:10:37 | | Dada joins |
| 14:30:09 | | etnguyen03 quits [Client Quit] |
| 14:31:08 | | etnguyen03 (etnguyen03) joins |
| 14:34:51 | | chrismrtn quits [Quit: leaving] |
| 14:41:01 | <kiska> | pokechu22: It simply says that the general publication will cease on 17th Jan 2026 and that breaking news will continue until the end of the month |
| 14:43:46 | <kiska> | Or well that is my interpretation. The translation is: "Ming Pao Canada website will cease updates on January 17, 2026. Breaking news will continue to be updated until January 31, 2026. We sincerely thank everyone for 33 years of support!" |
| 15:03:03 | | chrismrtn (chrismrtn) joins |
| 15:22:02 | | pseudorizer (pseudorizer) joins |
| 16:13:14 | | Wohlstand (Wohlstand) joins |
| 16:48:30 | | chrismrtn quits [Client Quit] |
| 16:58:53 | | twiswist quits [Read error: Connection reset by peer] |
| 16:59:56 | | twiswist (twiswist) joins |
| 17:00:29 | | chrismrtn (chrismrtn) joins |
| 17:05:43 | <h2ibot> | Klea edited Bugzilla (+23, Add bugs.sysrq.in): https://wiki.archiveteam.org/?diff=60338&oldid=57630 |
| 17:20:43 | | Dada quits [Remote host closed the connection] |
| 17:22:33 | | Wohlstand quits [Client Quit] |
| 17:26:47 | | BornOn420 quits [Read error: Connection reset by peer] |
| 17:28:13 | | Webuser220730 joins |
| 17:28:20 | | Webuser220730 quits [Client Quit] |
| 17:34:47 | <h2ibot> | Manu edited Discourse/archived (+95, HP_Archivist queued forum.rclone.org): https://wiki.archiveteam.org/?diff=60339&oldid=60325 |
| 17:37:44 | | Webuser482213 joins |
| 17:39:26 | <klea> | I've noticed URLs like https://abs-0.twimg.com/emoji/v2/svg/1f308.svg in that twitter thing, should we grab all twimg.com/emoji/v2/svg/ URLs? |
| 17:39:30 | | Island joins |
| 17:40:12 | | BornOn420 (BornOn420) joins |
| 17:40:28 | <klea> | a slightly cleaned up version: <https://transfer.archivete.am/x50jY/urls-from-twitter-jobs-for-PC.txt> |
| 17:40:28 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/x50jY/urls-from-twitter-jobs-for-PC.txt |
| 17:42:17 | <justauser> | I think the whole Twitter emoji set is published somewhere... |
| 17:46:41 | <PC> | it's on github (Twemoji) |
| 17:46:59 | | sec^nd quits [Remote host closed the connection] |
| 17:47:39 | <PC> | (also, thank you klea!) |
| 17:47:48 | <h2ibot> | Justauser edited GeoCities Japan (+45, Added data to the infobox): https://wiki.archiveteam.org/?diff=60340&oldid=58056 |
| 17:47:55 | | nicolas17_ (nicolas17) joins |
| 17:48:09 | | sec^nd (second) joins |
| 17:48:49 | <h2ibot> | Justauser edited GeoCities (+187, /* External links */ Updates): https://wiki.archiveteam.org/?diff=60341&oldid=58439 |
| 17:50:31 | | nicolas17 quits [Ping timeout: 272 seconds] |
| 17:57:50 | <h2ibot> | Justauser edited FortuneCity (+185, Added link to another mirror): https://wiki.archiveteam.org/?diff=60342&oldid=59211 |
| 18:03:19 | | second (second) joins |
| 18:03:33 | | sec^nd quits [Remote host closed the connection] |
| 18:03:34 | | second is now known as sec^nd |
| 18:12:47 | | sg72 joins |
| 18:13:19 | | sg-72 quits [Ping timeout: 272 seconds] |
| 18:30:55 | | etnguyen03 quits [Quit: Konversation terminated!] |