| 00:24:25 | | irisfreckles13 quits [Read error: Connection reset by peer] |
| 00:38:07 | | etnguyen03 quits [Client Quit] |
| 00:47:47 | | RadRooster joins |
| 00:50:46 | <pokechu22> | klea: that bucket is 41 TB |
| 00:51:32 | <RadRooster> | Hey I'm trying to help someone back up a ton of intros/outros from turner classic movies on a DVR website. They're tied to longer movies and don't start/end right on time most of the time. Right now I'm just screen recording them, anyone have any better suggestions? |
| 00:52:21 | <pokechu22> | RadRooster: you might want to check the network tab in the browser console (F12) to see what's happening when they load |
| 00:53:17 | <RadRooster> | Yeah they're DRM protected and I haven't been able to crack it yet. But I'm not sure downloading so many movies and mass cutting them would even take less time tbh since they don't start/end consistently |
| 00:54:46 | <RadRooster> | Seems like I'd have to wait for them all to download if I don't get banned first, then go back in and cut them manually anyways. |
| 01:02:34 | | BennyOtt quits [Ping timeout: 256 seconds] |
| 01:08:41 | | etnguyen03 (etnguyen03) joins |
| 01:15:44 | | @imer quits [Killed (NickServ (GHOST command used by imer5))] |
| 01:15:56 | | Sk1d quits [Read error: Connection reset by peer] |
| 01:15:58 | | imer (imer) joins |
| 01:15:58 | | @ChanServ sets mode: +o imer |
| 01:19:37 | <katia> | pokechu22: biggest single item is the only bottleneck though |
| 01:19:46 | <katia> | s/item/file/ |
| 01:20:12 | <pokechu22> | I'm just going to save the marker= list since I don't think most of what's in the bucket is interesting |
| 01:23:24 | | lukash984 quits [Quit: The Lounge - https://thelounge.chat] |
| 01:25:19 | | lukash984 joins |
| 01:38:27 | | Wohlstand quits [Client Quit] |
| 01:38:48 | | Mateon1 quits [Quit: Mateon1] |
| 01:50:35 | | lukash984 quits [Client Quit] |
| 01:51:57 | | lukash984 joins |
| 01:58:43 | | linuxgemini8 (linuxgemini) joins |
| 01:59:48 | | linuxgemini quits [Ping timeout: 256 seconds] |
| 01:59:48 | | linuxgemini8 is now known as linuxgemini |
| 02:06:32 | | etnguyen03 quits [Client Quit] |
| 02:08:01 | | Mateon1 joins |
| 02:34:43 | | ducky quits [Ping timeout: 272 seconds] |
| 02:53:57 | <nicolas17> | what bucket |
| 02:54:03 | <nicolas17> | oh creatorspace-public |
| 02:54:20 | <nicolas17> | is that from bento.me? |
| 02:56:16 | <nicolas17> | I tried "rclone ncdu" and it OOM'd less than a minute in :D |
| 03:01:26 | <nicolas17> | two 100MB+ gifs x_x |
| 03:03:19 | <nicolas17> | weird |
| 03:03:26 | <nicolas17> | https://storage.googleapis.com/creatorspace-public/richdata/behance/posts/aHR0cHM6Ly9taXItczMtY2RuLWNmLmJlaGFuY2UubmV0L3Byb2plY3RzLzQwNC85MGU2MTA3MDI1NTIyOS5ZM0p2Y0N3eE5qRTJMREV5TmpRc01Dd3cuanBn.jpeg |
| 03:03:27 | <nicolas17> | https://mir-s3-cdn-cf.behance.net/projects/404/90e61070255229.Y3JvcCwxNjE2LDEyNjQsMCww.jpg |
| 03:03:29 | <nicolas17> | same file |
| 03:03:48 | <nicolas17> | the base64 in the GCS filename is the behance URL |
| 03:03:52 | <nicolas17> | so this seems like a cache? |
| 03:04:22 | | ducky (ducky) joins |
| 03:04:48 | | nicolas17 .oO if only we had that deduplicating archival thing we talked about |
| 03:08:38 | | etnguyen03 (etnguyen03) joins |
| 03:24:45 | | nepeat quits [Ping timeout: 272 seconds] |
| 03:25:40 | | nepeat (nepeat) joins |
| 03:36:09 | <nicolas17> | yeah ok I don't think we care to archive the entire bucket |
| 03:37:07 | <nicolas17> | https://bento.me/josh has an <img> with https://storage.googleapis.com/creatorspace-public/sites%2Fogimages%2FaHR0cHM6Ly9jZG4uc2FuaXR5LmlvL2ltYWdlcy81OTlyNmh0Yy9yZWdpb25hbGl6ZWQvMDlmMDk3YzU3MDZjZjEyNGZhMTY3NmQ1MzhhMTE0ZmZhOTg2MWM4YS0yNDAweDEyNTYucG5nP3c9MTIwMCZxPTcwJmZpdD1tYXgmYXV0bz1mb3JtYXQ%3D.jpeg |
| 03:37:13 | <nicolas17> | we'll get the image when we archive the profile |
| 03:37:55 | <nicolas17> | I think if someone embeds their instagram in bento, it shows their latest instagram post, and if there's a new post, the old one stays in the bucket but isn't actually shown anywhere |
| 03:37:55 | <pokechu22> | bento.me doesn't have a sitemap though so it's not clear if we could discover all of those |
| 03:38:14 | <nicolas17> | so now the bucket has thousands and thousands of images that won't actually show up in any profile |
| 03:39:15 | <pokechu22> | FWIW I deleted the 15GB of URLs I downloaded, but the archivebot job listing it finished so the same data is available there |
| 03:40:49 | <nicolas17> | 15GB damn |
| 03:43:35 | <nicolas17> | ok we can enumerate all users based on the bucket contents |
| 03:43:39 | <nicolas17> | but I think there's 832996 users |
| 03:44:40 | <nicolas17> | https://storage.googleapis.com/creatorspace-public/og/clcnhp1wp06fwjp0ykolymk82/og-default.png -> https://api.bento.me/v1/users/clcnhp1wp06fwjp0ykolymk82 |
| 03:47:13 | <nicolas17> | oh my smol VPS will run out of disk space before I get all 15GBs |
| 03:47:35 | <pokechu22> | https://storage.googleapis.com/creatorspace-public/?prefix=og |
| 03:47:53 | <nicolas17> | yeah I did get all of og |
| 03:49:43 | <nicolas17> | I could sacrifice my VPS IP and figure out if there's API rate limits :p |
| 03:50:41 | <pokechu22> | The AB job was also getting some 403s (though not in the API), so that's probably worth checking |
| 03:50:52 | <pokechu22> | (though not in the API == we didn't try the API) |
| 03:51:00 | <nicolas17> | AB job starting at bento.me homepage? |
| 03:51:27 | <pokechu22> | Yes |
| 03:54:32 | <nicolas17> | many of these IDs return 404 not found |
| 04:00:45 | <nicolas17> | got some gateway timeouts that succeeded on wget's automatic retry |
| 04:01:28 | <nicolas17> | 465/700 IDs succeeded |
| 04:02:36 | <nicolas17> | wonder if I should try making my first wget-lua script... |
| 04:04:23 | <nicolas17> | if we do DPoS we don't need to scrape the API ahead of time, just make items for the user IDs collected from the bucket |
| 04:10:03 | <nicolas17> | 2099/3124 IDs succeeded |
| 04:15:03 | | tripleshrimp joins |
| 04:20:56 | | etnguyen03 quits [Client Quit] |
| 04:25:04 | | etnguyen03 (etnguyen03) joins |
| 04:28:40 | <nicolas17> | 5619/8331 IDs succeeded, 28m25s |
| 04:44:47 | | Arcorann joins |
| 05:01:21 | <Doranwen> | Wonder if this means bad news for some Vimeo videos - got a friend eyeing it slightly worried: https://techcrunch.com/2026/01/25/what-is-bending-spoons-everything-to-know-about-aols-acquirer/ |
| 05:01:26 | | etnguyen03 quits [Remote host closed the connection] |
| 05:03:01 | | DopefishJustin quits [Remote host closed the connection] |
| 05:04:07 | | DogsRNice quits [Read error: Connection reset by peer] |
| 05:04:32 | | n9nes quits [Ping timeout: 256 seconds] |
| 05:06:06 | | n9nes joins |
| 05:15:28 | | DopefishJustin joins |
| 05:15:28 | | DopefishJustin is now authenticated as DopefishJustin |
| 05:28:17 | | Webuser164222 joins |
| 05:28:28 | | Webuser164222 quits [Client Quit] |
| 05:46:58 | | v01d quits [Remote host closed the connection] |
| 06:21:10 | | Justin[home] joins |
| 06:21:10 | | Justin[home] is now authenticated as DopefishJustin |
| 06:21:15 | | DopefishJustin quits [Killed (NickServ (GHOST command used by Justin[home]))] |
| 06:21:18 | | Justin[home] is now known as DopefishJustin |
| 06:28:50 | | nexussfan quits [Quit: Konversation terminated!] |
| 06:49:41 | | BennyOtt (BennyOtt) joins |
| 07:42:10 | | Island quits [Read error: Connection reset by peer] |
| 07:46:14 | <cruller> | nicolas17: (re: hytale) Running `hdiffz pwr_dir/ pwr.warc output.hdiff` unsurprisingly gave me a tiny file (13640 bytes). |
| 07:47:21 | <cruller> | pwr.warc (3GB) can be regenerated from 0/3.pwr (1.5GB), your patch (20MB), and output.hdiff (14KB). |
| 07:47:46 | <cruller> | Note: Both pwr_dir/ and pwr.warc contain 0/3.pwr and 0/4.pwr |
| 07:51:09 | <cruller> | I wonder if this could be another solution for https://irclogs.archivete.am/archiveteam-bs/2025-10-26#l82865c9f |
| 07:53:24 | <h2ibot> | Pez edited Restoring (+8, Previous project is no longer maintained): https://wiki.archiveteam.org/?diff=60335&oldid=59619 |
| 07:54:24 | <h2ibot> | SydWikiAccount edited List of websites excluded from the Wayback Machine (+103): https://wiki.archiveteam.org/?diff=60337&oldid=60231 |
| 07:56:34 | | Webuser270860 joins |
| 07:56:39 | | Webuser270860 quits [Client Quit] |
| 08:04:21 | <cruller> | Or rather, this is also a kind of “shaking.” |
| 08:04:35 | <cruller> | * shucking |
| 08:06:06 | <cruller> | I think the advantage of "shucking" is deduplication when multiple warc files contain the same content. |
| 08:17:48 | | rohvani quits [Quit: Ping timeout (120 seconds)] |
| 08:17:59 | | rohvani joins |
| 08:26:25 | | hexagonwin_ joins |
| 08:27:29 | | hexagonwin quits [Ping timeout: 272 seconds] |
| 10:37:50 | <klea> | that's too big I suppose? |
| 10:38:25 | | rohvani quits [Client Quit] |
| 10:38:34 | | rohvani joins |
| 10:40:55 | | oxtyped quits [Read error: Connection reset by peer] |
| 10:49:27 | | oxtyped joins |
| 10:54:46 | | pabs quits [Read error: Connection reset by peer] |
| 10:55:35 | | pabs (pabs) joins |
| 10:58:08 | | rohvani quits [Ping timeout: 256 seconds] |
| 11:19:02 | | oxtyped quits [Read error: Connection reset by peer] |
| 11:19:04 | | oxtyped joins |
| 11:27:49 | | Kotomind joins |
| 11:33:03 | | anarcat quits [Ping timeout: 272 seconds] |
| 11:38:09 | | anarcat (anarcat) joins |
| 12:00:06 | | Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat] |
| 12:02:51 | | Bleo182600722719623455222 joins |
| 12:10:10 | | starg2|m joins |
| 12:25:54 | | twiswist quits [Read error: Connection reset by peer] |
| 12:26:14 | | twiswist (twiswist) joins |
| 12:31:51 | | HP_Archivist quits [Quit: Leaving] |
| 12:32:06 | | HP_Archivist (HP_Archivist) joins |
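
The two observations nicolas17 made about the creatorspace-public bucket (the object filename is the base64 of the upstream URL, and `og/<id>/og-default.png` maps to `https://api.bento.me/v1/users/<id>`) can be sketched roughly as below. This is only an illustration under assumptions: the function names `decode_cached_url` and `api_url_for_og_key` are hypothetical, the URL-safe base64 alphabet and the "strip the last extension" rule are guesses generalized from the two examples in the log, and nothing here is confirmed about how bento.me actually generates these keys.

```python
import base64
from typing import Optional
from urllib.parse import unquote


def decode_cached_url(object_key: str) -> str:
    """Recover the upstream URL embedded in a bucket object key.

    Assumption: the last path segment is base64(url) plus a file extension,
    as in the richdata/behance example from the log.
    """
    # Percent-decode first, since some keys appear as sites%2Fogimages%2F...%3D.jpeg
    name = unquote(object_key).rsplit("/", 1)[-1]
    # Drop the trailing extension (.jpeg in both examples from the log)
    stem = name.rsplit(".", 1)[0]
    # Re-add '=' padding in case it was stripped from the filename
    padded = stem + "=" * (-len(stem) % 4)
    # urlsafe_b64decode also accepts the standard alphabet's '+' and '/'
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def api_url_for_og_key(object_key: str) -> Optional[str]:
    """Map an og/<id>/... key to the user API URL seen in the log.

    Returns None for keys that are not under the og/ prefix.
    """
    parts = object_key.split("/")
    if len(parts) >= 3 and parts[0] == "og" and parts[1]:
        return f"https://api.bento.me/v1/users/{parts[1]}"
    return None


if __name__ == "__main__":
    key = (
        "richdata/behance/posts/"
        "aHR0cHM6Ly9taXItczMtY2RuLWNmLmJlaGFuY2UubmV0L3Byb2plY3RzLzQwNC85MGU2"
        "MTA3MDI1NTIyOS5ZM0p2Y0N3eE5qRTJMREV5TmpRc01Dd3cuanBn.jpeg"
    )
    print(decode_cached_url(key))
    print(api_url_for_og_key("og/clcnhp1wp06fwjp0ykolymk82/og-default.png"))
```

Run over a saved bucket listing, the second function would yield candidate user IDs to turn into items; as noted in the log, many of those IDs returned 404, so each one still needs to be checked.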