| 00:24:25 | | irisfreckles13 quits [Read error: Connection reset by peer] |
| 00:38:07 | | etnguyen03 quits [Client Quit] |
| 00:47:47 | | RadRooster joins |
| 00:50:46 | <pokechu22> | klea: that bucket is 41 TB |
| 00:51:32 | <RadRooster> | Hey, I'm trying to help someone back up a ton of intros/outros from Turner Classic Movies on a DVR website. They're tied to longer movies and most of the time don't start/end right on time. Right now I'm just screen recording them; does anyone have any better suggestions? |
| 00:52:21 | <pokechu22> | RadRooster: you might want to check the network tab in the browser console (F12) to see what's happening when they load |
| 00:53:17 | <RadRooster> | Yeah they're DRM protected and I haven't been able to crack it yet. But I'm not sure downloading so many movies and mass cutting them would even take less time tbh since they don't start/end consistently |
| 00:54:46 | <RadRooster> | Seems like I'd have to wait for them all to download if I don't get banned first, then go back in and cut them manually anyways. |
| 01:02:34 | | BennyOtt quits [Ping timeout: 256 seconds] |
| 01:08:41 | | etnguyen03 (etnguyen03) joins |
| 01:15:44 | | @imer quits [Killed (NickServ (GHOST command used by imer5))] |
| 01:15:56 | | Sk1d quits [Read error: Connection reset by peer] |
| 01:15:58 | | imer (imer) joins |
| 01:15:58 | | @ChanServ sets mode: +o imer |
| 01:19:37 | <katia> | pokechu22: biggest single item is the only bottleneck though |
| 01:19:46 | <katia> | s/item/file/ |
| 01:20:12 | <pokechu22> | I'm just going to save the marker= list since I don't think most of what's in the bucket is interesting |
| 01:23:24 | | lukash984 quits [Quit: The Lounge - https://thelounge.chat] |
| 01:25:19 | | lukash984 joins |
| 01:38:27 | | Wohlstand quits [Client Quit] |
| 01:38:48 | | Mateon1 quits [Quit: Mateon1] |
| 01:50:35 | | lukash984 quits [Client Quit] |
| 01:51:57 | | lukash984 joins |
| 01:58:43 | | linuxgemini8 (linuxgemini) joins |
| 01:59:48 | | linuxgemini quits [Ping timeout: 256 seconds] |
| 01:59:48 | | linuxgemini8 is now known as linuxgemini |
| 02:06:32 | | etnguyen03 quits [Client Quit] |
| 02:08:01 | | Mateon1 joins |
| 02:34:43 | | ducky quits [Ping timeout: 272 seconds] |
| 02:53:57 | <nicolas17> | what bucket |
| 02:54:03 | <nicolas17> | oh creatorspace-public |
| 02:54:20 | <nicolas17> | is that from bento.me? |
| 02:56:16 | <nicolas17> | I tried "rclone ncdu" and it OOM'd less than a minute in :D |
| 03:01:26 | <nicolas17> | two 100MB+ gifs x_x |
| 03:03:19 | <nicolas17> | weird |
| 03:03:26 | <nicolas17> | https://storage.googleapis.com/creatorspace-public/richdata/behance/posts/aHR0cHM6Ly9taXItczMtY2RuLWNmLmJlaGFuY2UubmV0L3Byb2plY3RzLzQwNC85MGU2MTA3MDI1NTIyOS5ZM0p2Y0N3eE5qRTJMREV5TmpRc01Dd3cuanBn.jpeg |
| 03:03:27 | <nicolas17> | https://mir-s3-cdn-cf.behance.net/projects/404/90e61070255229.Y3JvcCwxNjE2LDEyNjQsMCww.jpg |
| 03:03:29 | <nicolas17> | same file |
| 03:03:48 | <nicolas17> | the base64 in the GCS filename is the behance URL |
| 03:03:52 | <nicolas17> | so this seems like a cache? |
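A small helper makes the observation above checkable for any object in the bucket. This is a sketch that assumes, based only on the two example URLs in this log and not on any documentation, that the last path component of a creatorspace-public object is the standard-base64 encoding of the source URL followed by an image extension such as `.jpeg`. The function name `decode_cache_key` is hypothetical. It percent-decodes first, so the `sites%2Fogimages%2F...%3D.jpeg` form from bento.me profiles works too; it would break if an encoded name ever contained a literal `/` from the base64 alphabet.

```python
import base64
from urllib.parse import unquote, urlsplit


def decode_cache_key(gcs_url: str) -> str:
    """Recover the source URL encoded in a creatorspace-public object name.

    Assumption (inferred from examples, not documented): the last path
    component is base64(source URL) + "." + image extension.
    """
    # Percent-decode so "%2F" and "%3D" in the object name become "/" and "=".
    path = unquote(urlsplit(gcs_url).path)
    name = path.rsplit("/", 1)[-1]          # e.g. "aHR0cHM6...anBn.jpeg"
    encoded = name.rsplit(".", 1)[0]        # strip the ".jpeg" suffix
    encoded += "=" * (-len(encoded) % 4)    # re-add padding if it was dropped
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    example = (
        "https://storage.googleapis.com/creatorspace-public/richdata/behance/posts/"
        "aHR0cHM6Ly9taXItczMtY2RuLWNmLmJlaGFuY2UubmV0L3Byb2plY3RzLzQwNC85MGU2MTA3MDI1NTIy"
        "OS5ZM0p2Y0N3eE5qRTJMREV5TmpRc01Dd3cuanBn.jpeg"
    )
    print(decode_cache_key(example))  # the behance.net URL quoted above
```

Running it over a full bucket listing would show how much of the bucket is just a cache of third-party images.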
| 03:04:22 | | ducky (ducky) joins |
| 03:04:48 | | nicolas17 .oO if only we had that deduplicating archival thing we talked about |
| 03:08:38 | | etnguyen03 (etnguyen03) joins |
| 03:24:45 | | nepeat quits [Ping timeout: 272 seconds] |
| 03:25:40 | | nepeat (nepeat) joins |
| 03:36:09 | <nicolas17> | yeah ok I don't think we care to archive the entire bucket |
| 03:37:07 | <nicolas17> | https://bento.me/josh has an <img> with https://storage.googleapis.com/creatorspace-public/sites%2Fogimages%2FaHR0cHM6Ly9jZG4uc2FuaXR5LmlvL2ltYWdlcy81OTlyNmh0Yy9yZWdpb25hbGl6ZWQvMDlmMDk3YzU3MDZjZjEyNGZhMTY3NmQ1MzhhMTE0ZmZhOTg2MWM4YS0yNDAweDEyNTYucG5nP3c9MTIwMCZxPTcwJmZpdD1tYXgmYXV0bz1mb3JtYXQ%3D.jpeg |
| 03:37:13 | <nicolas17> | we'll get the image when we archive the profile |
| 03:37:55 | <nicolas17> | I think if someone embeds their instagram in bento, it shows their latest instagram post, and if there's a new post, the old one stays in the bucket but isn't actually shown anywhere |
| 03:37:55 | <pokechu22> | bento.me doesn't have a sitemap though so it's not clear if we could discover all of those |
| 03:38:14 | <nicolas17> | so now the bucket has thousands and thousands of images that won't actually show up in any profile |
| 03:39:15 | <pokechu22> | FWIW I deleted the 15GB of URLs I downloaded, but the archivebot job listing it finished so the same data is available there |
| 03:40:49 | <nicolas17> | 15GB damn |
| 03:43:35 | <nicolas17> | ok we can enumerate all users based on the bucket contents |
| 03:43:39 | <nicolas17> | but I think there's 832996 users |
| 03:44:40 | <nicolas17> | https://storage.googleapis.com/creatorspace-public/og/clcnhp1wp06fwjp0ykolymk82/og-default.png -> https://api.bento.me/v1/users/clcnhp1wp06fwjp0ykolymk82 |
| 03:47:13 | <nicolas17> | oh my smol VPS will run out of disk space before I get all 15GBs |
| 03:47:35 | <pokechu22> | https://storage.googleapis.com/creatorspace-public/?prefix=og |
| 03:47:53 | <nicolas17> | yeah I did get all of og |
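The enumeration idea above (list the `og/` prefix with `marker=` pagination, take the ID from `og/<id>/...`, and map it to `https://api.bento.me/v1/users/<id>`) can be sketched as below. Assumptions: the bucket answers the anonymous XML listing shown in pokechu22's URL; a truncated page carries `IsTruncated` and usually `NextMarker` (the sketch falls back to the last key if not); and every `og/<id>/` folder belongs to one user. The helper names are hypothetical. The resulting list is also what could seed DPoS items, one per user ID.

```python
from __future__ import annotations

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BUCKET_URL = "https://storage.googleapis.com/creatorspace-public/"
API_BASE = "https://api.bento.me/v1/users/"


def _local(tag: str) -> str:
    """Drop the XML namespace, e.g. '{ns}Key' -> 'Key'."""
    return tag.rsplit("}", 1)[-1]


def parse_listing(xml_text: bytes | str) -> tuple[list[str], str | None]:
    """Return (object keys on this page, marker for the next page or None)."""
    root = ET.fromstring(xml_text)
    keys: list[str] = []
    next_marker = None
    truncated = False
    for el in root.iter():
        name = _local(el.tag)
        if name == "Key" and el.text:
            keys.append(el.text)
        elif name == "NextMarker" and el.text:
            next_marker = el.text
        elif name == "IsTruncated":
            truncated = (el.text or "").strip().lower() == "true"
    if not truncated:
        return keys, None
    # Fall back to the last key if the server omitted NextMarker.
    return keys, next_marker or (keys[-1] if keys else None)


def list_all(prefix: str = "og/"):
    """Yield every key under `prefix`, following marker= pagination (needs network)."""
    marker = None
    while True:
        params = {"prefix": prefix}
        if marker:
            params["marker"] = marker
        url = BUCKET_URL + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url, timeout=60) as resp:
            keys, marker = parse_listing(resp.read())
        yield from keys
        if marker is None:
            break


def user_ids(keys: list[str]) -> list[str]:
    """Unique IDs from keys shaped like 'og/<id>/<file>', in first-seen order."""
    ids = (k.split("/")[1] for k in keys if k.startswith("og/") and k.count("/") >= 2)
    return list(dict.fromkeys(i for i in ids if i))


if __name__ == "__main__":
    for uid in user_ids(list(list_all())):
        print(API_BASE + uid)
```

Using the prefix `og/` rather than `og` is a deliberate choice so that unrelated keys that merely start with "og" are not pulled in.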
| 03:49:43 | <nicolas17> | I could sacrifice my VPS IP and figure out if there's API rate limits :p |
| 03:50:41 | <pokechu22> | The AB job was also getting some 403s (though not in the API), so that's probably worth checking |
| 03:50:52 | <pokechu22> | (though not in the API == we didn't try the API) |
| 03:51:00 | <nicolas17> | AB job starting at bento.me homepage? |
| 03:51:27 | <pokechu22> | Yes |
| 03:54:32 | <nicolas17> | many of these IDs return 404 not found |
| 04:00:45 | <nicolas17> | got some gateway timeouts that succeeded on wget's automatic retry |
| 04:01:28 | <nicolas17> | 465/700 IDs succeeded |
| 04:02:36 | <nicolas17> | wonder if I should try making my first wget-lua script... |
| 04:04:23 | <nicolas17> | if we do DPoS we don't need to scrape the API ahead of time, just make items for the user IDs collected from the bucket |
| 04:10:03 | <nicolas17> | 2099/3124 IDs succeeded |
| 04:15:03 | | tripleshrimp joins |
| 04:20:56 | | etnguyen03 quits [Client Quit] |
| 04:25:04 | | etnguyen03 (etnguyen03) joins |
| 04:28:40 | <nicolas17> | 5619/8331 IDs succeeded, 28m25s |
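The numbers above mix two kinds of failure: IDs that return 404 (gone, retrying is pointless) and gateway timeouts that went away on a retry. A minimal sketch of a fetcher that only retries transient errors and tallies final statuses is below. Which codes count as transient (`RETRYABLE`), the backoff delays, and treating network errors as status 0 are all assumptions, not observed API behaviour; the `get_status` and `sleep` parameters exist only so the logic can be exercised without hitting the API.

```python
import time
import urllib.error
import urllib.request
from collections import Counter

# Assumed transient statuses; 0 stands for a network-level failure.
RETRYABLE = {0, 429, 500, 502, 503, 504}


def http_status(url: str) -> int:
    """Fetch `url` and return its HTTP status code (needs network access)."""
    try:
        with urllib.request.urlopen(url, timeout=60) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # URLError, timeouts, connection resets
        return 0


def fetch_with_retry(url, get_status=http_status, tries=3, delay=2.0, sleep=time.sleep):
    """Return the final status for `url`, retrying only transient errors."""
    status = get_status(url)
    for attempt in range(1, tries):
        if status not in RETRYABLE:
            break  # success or a permanent error such as 404
        sleep(delay * 2 ** (attempt - 1))  # exponential backoff: 2s, 4s, ...
        status = get_status(url)
    return status


def tally(urls, **kwargs) -> Counter:
    """Count final statuses, e.g. Counter({200: ..., 404: ..., 504: ...})."""
    return Counter(fetch_with_retry(u, **kwargs) for u in urls)
```

Keeping 404s separate from exhausted retries makes it easier to tell "user deleted" from "API was overloaded" when deciding what to requeue.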
| 04:44:47 | | Arcorann joins |
| 05:01:21 | <Doranwen> | Wonder if this means bad news for some Vimeo videos - got a friend eyeing it slightly worried: https://techcrunch.com/2026/01/25/what-is-bending-spoons-everything-to-know-about-aols-acquirer/ |
| 05:01:26 | | etnguyen03 quits [Remote host closed the connection] |
| 05:03:01 | | DopefishJustin quits [Remote host closed the connection] |
| 05:04:07 | | DogsRNice quits [Read error: Connection reset by peer] |
| 05:04:32 | | n9nes quits [Ping timeout: 256 seconds] |
| 05:06:06 | | n9nes joins |
| 05:15:28 | | DopefishJustin joins |
| 05:15:28 | | DopefishJustin is now authenticated as DopefishJustin |
| 05:28:17 | | Webuser164222 joins |
| 05:28:28 | | Webuser164222 quits [Client Quit] |
| 05:46:58 | | v01d quits [Remote host closed the connection] |
| 06:21:10 | | Justin[home] joins |
| 06:21:10 | | Justin[home] is now authenticated as DopefishJustin |
| 06:21:15 | | DopefishJustin quits [Killed (NickServ (GHOST command used by Justin[home]))] |
| 06:21:18 | | Justin[home] is now known as DopefishJustin |
| 06:28:50 | | nexussfan quits [Quit: Konversation terminated!] |
| 06:49:41 | | BennyOtt (BennyOtt) joins |
| 07:42:10 | | Island quits [Read error: Connection reset by peer] |
| 07:46:14 | <cruller> | nicolas17: (re: hytale) Running `hdiffz pwr_dir/ pwr.warc output.hdiff` unsurprisingly gave me a tiny file (13640 bytes). |
| 07:47:21 | <cruller> | pwr.warc (3GB) can be regenerated from 0/3.pwr (1.5GB), your patch (20MB), and output.hdiff (14KB). |
| 07:47:46 | <cruller> | Note: Both pwr_dir/ and pwr.warc contain 0/3.pwr and 0/4.pwr |
| 07:51:09 | <cruller> | I wonder if this could be another solution for https://irclogs.archivete.am/archiveteam-bs/2025-10-26#l82865c9f |
| 07:53:24 | <h2ibot> | Pez edited Restoring (+8, Previous project is no longer maintained): https://wiki.archiveteam.org/?diff=60335&oldid=59619 |
| 07:54:24 | <h2ibot> | SydWikiAccount edited List of websites excluded from the Wayback Machine (+103): https://wiki.archiveteam.org/?diff=60337&oldid=60231 |
| 07:56:34 | | Webuser270860 joins |
| 07:56:39 | | Webuser270860 quits [Client Quit] |
| 08:04:21 | <cruller> | Or rather, this is also a kind of “shaking.” |
| 08:04:35 | <cruller> | * shucking |
| 08:06:06 | <cruller> | I think the advantage of "shucking" is deduplication when multiple warc files contain the same content. |
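To make the deduplication point concrete: if payloads are stored once by content hash and each WARC only keeps a reference, a file like `0/3.pwr` that appears in several WARCs costs its size only once. The toy sketch below illustrates that idea with SHA-256 keys; the class and method names are hypothetical, and it is neither the hdiffz approach above nor the WARC format's own `revisit` records, which are the standard way to express duplicate payloads inside WARCs.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupStore:
    """Toy content-addressed store: each distinct payload is kept once."""

    blobs: dict[str, bytes] = field(default_factory=dict)
    # (warc name, target URI, payload digest) for every record seen
    manifest: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, warc_name: str, target_uri: str, payload: bytes) -> str:
        digest = hashlib.sha256(payload).hexdigest()
        self.blobs.setdefault(digest, payload)  # keep only the first copy
        self.manifest.append((warc_name, target_uri, digest))
        return digest

    def restore(self, digest: str) -> bytes:
        """Return the payload so a record can be rebuilt."""
        return self.blobs[digest]

    def bytes_saved(self) -> int:
        """Logical size of all records minus what is actually stored."""
        logical = sum(len(self.blobs[d]) for _, _, d in self.manifest)
        stored = sum(len(b) for b in self.blobs.values())
        return logical - stored
```

A real "shucked" archive would also have to keep the WARC headers and record order so the original files can be regenerated byte for byte.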
| 08:17:48 | | rohvani quits [Quit: Ping timeout (120 seconds)] |
| 08:17:59 | | rohvani joins |
| 08:26:25 | | hexagonwin_ joins |
| 08:27:29 | | hexagonwin quits [Ping timeout: 272 seconds] |
| 10:37:50 | <klea> | that's too big I suppose? |
| 10:38:25 | | rohvani quits [Client Quit] |
| 10:38:34 | | rohvani joins |
| 10:40:55 | | oxtyped quits [Read error: Connection reset by peer] |
| 10:49:27 | | oxtyped joins |
| 10:54:46 | | pabs quits [Read error: Connection reset by peer] |
| 10:55:35 | | pabs (pabs) joins |
| 10:58:08 | | rohvani quits [Ping timeout: 256 seconds] |
| 11:19:02 | | oxtyped quits [Read error: Connection reset by peer] |
| 11:19:04 | | oxtyped joins |
| 11:27:49 | | Kotomind joins |
| 11:33:03 | | anarcat quits [Ping timeout: 272 seconds] |
| 11:38:09 | | anarcat (anarcat) joins |
| 12:00:06 | | Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat] |
| 12:02:51 | | Bleo182600722719623455222 joins |
| 12:10:10 | | starg2|m joins |
| 12:25:54 | | twiswist quits [Read error: Connection reset by peer] |
| 12:26:14 | | twiswist (twiswist) joins |
| 12:31:51 | | HP_Archivist quits [Quit: Leaving] |
| 12:32:06 | | HP_Archivist (HP_Archivist) joins |
| 12:49:14 | <justauser> | Doranwen: We already know about Vimeo troubles but don't have a plan yet. |
| 12:50:27 | <@arkiver> | thanks Doranwen |
| 12:57:52 | | etnguyen03 (etnguyen03) joins |
| 13:04:41 | | Webuser243832 joins |
| 13:05:26 | | Webuser2438324 joins |
| 13:05:55 | | Webuser243832 quits [Client Quit] |
| 13:05:55 | | Webuser2438324 quits [Client Quit] |
| 13:16:58 | | Arcorann quits [Ping timeout: 256 seconds] |
| 13:22:37 | | Kotomind quits [Ping timeout: 272 seconds] |
| 13:27:17 | | Webuser183385 joins |
| 13:27:35 | | Ointment8862 quits [Quit: Lost terminal] |
| 13:27:49 | | Webuser183385 quits [Client Quit] |
| 14:06:19 | | TunaLobster quits [Ping timeout: 272 seconds] |
| 14:07:37 | | Dada joins |
| 14:08:44 | | Dada quits [Remote host closed the connection] |
| 14:10:37 | | Dada joins |
| 14:30:09 | | etnguyen03 quits [Client Quit] |
| 14:31:08 | | etnguyen03 (etnguyen03) joins |
| 14:34:51 | | chrismrtn quits [Quit: leaving] |
| 14:41:01 | <kiska> | pokechu22: It simply says that the general publication will cease on 17th Jan 2026 and that breaking news will continue until the end of the month |
| 14:43:46 | <kiska> | Or well that is my interpretation. The translation is: "Ming Pao Canada website will cease updates on January 17, 2026. Breaking news will continue to be updated until January 31, 2026. We sincerely thank everyone for 33 years of support!" |
| 15:03:03 | | chrismrtn (chrismrtn) joins |
| 15:22:02 | | pseudorizer (pseudorizer) joins |
| 16:13:14 | | Wohlstand (Wohlstand) joins |
| 16:48:30 | | chrismrtn quits [Client Quit] |
| 16:58:53 | | twiswist quits [Read error: Connection reset by peer] |
| 16:59:56 | | twiswist (twiswist) joins |
| 17:00:29 | | chrismrtn (chrismrtn) joins |
| 17:05:43 | <h2ibot> | Klea edited Bugzilla (+23, Add bugs.sysrq.in): https://wiki.archiveteam.org/?diff=60338&oldid=57630 |
| 17:20:43 | | Dada quits [Remote host closed the connection] |
| 17:22:33 | | Wohlstand quits [Client Quit] |
| 17:26:47 | | BornOn420 quits [Read error: Connection reset by peer] |
| 17:28:13 | | Webuser220730 joins |
| 17:28:20 | | Webuser220730 quits [Client Quit] |
| 17:34:47 | <h2ibot> | Manu edited Discourse/archived (+95, HP_Archivist queued forum.rclone.org): https://wiki.archiveteam.org/?diff=60339&oldid=60325 |
| 17:37:44 | | Webuser482213 joins |
| 17:39:26 | <klea> | I've noticed URLs like https://abs-0.twimg.com/emoji/v2/svg/1f308.svg in that Twitter thing; should we grab all twimg.com/emoji/v2/svg/ URLs? |
| 17:39:30 | | Island joins |
| 17:40:12 | | BornOn420 (BornOn420) joins |
| 17:40:28 | <klea> | a slightly cleaned up version: <https://transfer.archivete.am/x50jY/urls-from-twitter-jobs-for-PC.txt> |
| 17:40:28 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/x50jY/urls-from-twitter-jobs-for-PC.txt |
| 17:42:17 | <justauser> | I think the whole Twitter emoji set is published somewhere... |
| 17:46:41 | <PC> | it's on github (Twemoji) |
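If the full set is taken from Twemoji, a URL list for the twimg path above could be generated from emoji sequences rather than scraped. The sketch assumes that `abs-0.twimg.com/emoji/v2/svg/` uses the same file names as Twemoji's SVG assets (lowercase hex codepoints joined by `-`, with U+FE0F dropped unless the sequence contains a zero-width joiner, which is the rule Twemoji's own JavaScript applies). Whether twimg actually serves every Twemoji file is an assumption worth spot-checking; the function names are hypothetical.

```python
ZWJ = "\u200d"   # zero-width joiner
VS16 = "\ufe0f"  # variation selector-16 (emoji presentation)
BASE = "https://abs-0.twimg.com/emoji/v2/svg/"  # path seen in the job URLs


def twemoji_name(emoji: str) -> str:
    """Twemoji-style file stem: lowercase hex codepoints joined by '-'."""
    # Twemoji drops U+FE0F unless the sequence contains a ZWJ.
    if ZWJ not in emoji:
        emoji = emoji.replace(VS16, "")
    return "-".join(f"{ord(ch):x}" for ch in emoji)


def twimg_svg_url(emoji: str) -> str:
    return BASE + twemoji_name(emoji) + ".svg"


if __name__ == "__main__":
    print(twimg_svg_url("\U0001F308"))  # rainbow, U+1F308
```

Feeding it every sequence from the Twemoji asset list would give a candidate URL list for AB without crawling tweets for them.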
| 17:46:59 | | sec^nd quits [Remote host closed the connection] |
| 17:47:39 | <PC> | (also, thank you klea!) |
| 17:47:48 | <h2ibot> | Justauser edited GeoCities Japan (+45, Added data to the infobox): https://wiki.archiveteam.org/?diff=60340&oldid=58056 |
| 17:47:55 | | nicolas17_ (nicolas17) joins |
| 17:48:09 | | sec^nd (second) joins |
| 17:48:49 | <h2ibot> | Justauser edited GeoCities (+187, /* External links */ Updates): https://wiki.archiveteam.org/?diff=60341&oldid=58439 |
| 17:50:31 | | nicolas17 quits [Ping timeout: 272 seconds] |
| 17:57:50 | <h2ibot> | Justauser edited FortuneCity (+185, Added link to another mirror): https://wiki.archiveteam.org/?diff=60342&oldid=59211 |
| 18:03:19 | | second (second) joins |
| 18:03:33 | | sec^nd quits [Remote host closed the connection] |
| 18:03:34 | | second is now known as sec^nd |
| 18:12:47 | | sg72 joins |
| 18:13:19 | | sg-72 quits [Ping timeout: 272 seconds] |
| 18:30:55 | | etnguyen03 quits [Quit: Konversation terminated!] |
| 18:44:59 | | etnguyen03 (etnguyen03) joins |
| 18:51:00 | | Wohlstand (Wohlstand) joins |
| 19:15:03 | | emily (pseudorizer) joins |
| 19:15:06 | | pseudorizer quits [Ping timeout: 256 seconds] |
| 19:17:10 | | Ryz quits [Read error: Connection reset by peer] |
| 19:18:48 | | Ryz (Ryz) joins |
| 19:19:11 | | @dxrt quits [Ping timeout: 272 seconds] |
| 19:19:38 | | dxrt joins |
| 19:19:40 | | dxrt is now authenticated as dxrt |
| 19:19:40 | | dxrt quits [Changing host] |
| 19:19:40 | | dxrt (dxrt) joins |
| 19:19:40 | | @ChanServ sets mode: +o dxrt |
| 19:19:49 | | ell7 quits [Ping timeout: 272 seconds] |
| 19:20:03 | | irisfreckles13 joins |
| 19:32:24 | <klea> | PC: you're welcome. If you want, make a crib and modify the URLs however you want; otherwise tell me what to do with them and I can make a new URL list for AB |
| 19:33:14 | | LddPotato_ joins |
| 19:34:29 | <PC> | crib? |
| 19:36:01 | <PC> | i think for the images, would just be good to get the orig versions perhaps if those haven't been gotten already. but the actual largest images should already be live on the WBM under either large or 4096x4096 urls |
| 19:36:08 | <h2ibot> | KleaBot made 2 bot changes: https://wiki.archiveteam.org/index.php?title=Special:Contributions/KleaBot&offset=20260126193551&limit=2&namespace=2&wpfilters[]=nsInvert&wpfilters[]=associated |
| 19:36:08 | <PC> | so may not matter really |
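For the "orig versions" idea, the image URLs in a list could be rewritten to ask for the original size. This sketch assumes the two pbs.twimg.com URL shapes commonly seen in the wild: the query form `...?format=jpg&name=large` and the older suffix form `....jpg:large`; both get their size token replaced with `orig`, and anything else is returned unchanged. The function name and the media ID `ABC` in the examples are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_orig(url: str) -> str:
    """Rewrite a pbs.twimg.com image URL to request the 'orig' size variant."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if any(key == "name" for key, _ in query):
        # Query style: .../media/<id>?format=jpg&name=large
        query = [(k, "orig" if k == "name" else v) for k, v in query]
        return urlunsplit(parts._replace(query=urlencode(query)))
    head, sep, size = parts.path.rpartition(":")
    if sep and "/" not in size:
        # Legacy suffix style: .../media/<id>.jpg:large
        return urlunsplit(parts._replace(path=head + ":orig"))
    return url  # unknown shape: leave it alone


if __name__ == "__main__":
    print(to_orig("https://pbs.twimg.com/media/ABC?format=jpg&name=4096x4096"))
```

Leaving unrecognised URLs untouched means the rewritten list can simply be deduplicated against the original before queueing.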
| 19:36:17 | | LddPotato quits [Ping timeout: 272 seconds] |
| 19:36:26 | | LddPotato_ is now known as LddPotato |
| 19:37:59 | <PC> | (anyone else getting a lot of 503s on the WBM today? i don't think it's my IP since i'm getting them from another IP address too, just wondering what's up) |
| 19:40:37 | <pokechu22> | Same |
| 19:42:25 | <hexagonwin_> | same, it's *very* slow |
| 19:42:44 | <PC> | RIP, yeah i've never seen it this slow before. hope their servers are okay |
| 19:43:16 | <klea> | PC: yes, as in filter whichever urls you want, modify the existing urls however you want. |
| 19:43:38 | <PC> | sure! i'll put something together |
| 19:46:30 | <IDK> | Hi, are there plans to upload data from #down-the-tube for late November/early December? It was during some misconfiguration |
| 19:47:18 | <IDK> | I was looking for a video that was presumably downloaded somewhere back then, no pressure though! |
| 20:06:59 | <c3manu> | pabs: do you think throwing the flickr 429s into #// would be a good alternative to save them despite the rate-limiting? or would that just get people blocked longterm? |
| 20:07:43 | <c3manu> | like this i gotta watch out for every other job that might link heavily to flickr in case it messes with one of my requeue lists |
| 20:21:51 | <PC> | klea: almost done, but while doing this i found another URL that's not on the WBM, would appreciate getting it crawled too <3 https://twitter.com/YuGiOh_OCG_INFO/status/1560589899769708546 |
| 20:21:52 | <eggdrop> | nitter: https://nitter.net/YuGiOh_OCG_INFO/status/1560589899769708546 |
| 20:23:34 | <PC> | actually, maybe this one too? it's hard to tell with the WBM on the fritz but i guess better safe than sorry https://twitter.com/YuGiOh_OCG_INFO/status/1750851075840581816 |
| 20:23:35 | <eggdrop> | nitter: https://nitter.net/YuGiOh_OCG_INFO/status/1750851075840581816 |
| 20:23:35 | | DogsRNice joins |
| 20:23:57 | <PC> | (and there was also https://twitter.com/zunkome2/status/1713581981420618067, just in case that was missed) |
| 20:23:58 | <eggdrop> | nitter: https://nitter.net/zunkome2/status/1713581981420618067 |
| 20:24:19 | <klea> | PC: btw join #jseater :) |
| 20:24:29 | <PC> | o7 |
| 21:35:39 | | Chris5010 quits [Quit: ] |
| 21:42:52 | | Chris5010 (Chris5010) joins |
| 21:53:46 | | irisfreckles13 quits [Ping timeout: 256 seconds] |
| 21:59:22 | <nicolas17_> | IDK: I didn't know we had an upload backlog in youtube |
| 22:18:30 | <h2ibot> | Cooljeanius edited Twitter (+9, /* Archives */ use "as of" template): https://wiki.archiveteam.org/?diff=60346&oldid=60333 |
| 22:21:22 | | nicolas17_ is now known as nicolas17 |
| 22:21:31 | <h2ibot> | Cooljeanius edited Twitter (+104, /* Backup Tools */ misc. updates): https://wiki.archiveteam.org/?diff=60347&oldid=60346 |
| 22:27:47 | | Webuser587265 joins |
| 22:27:57 | | Webuser587265 quits [Client Quit] |
| 22:40:34 | <IDK> | I checked back on some of the videos I completed this December, and they have not been indexed yet; not sure if they were uploaded |
| 22:58:53 | | HP_Archivist quits [Quit: Leaving] |
| 23:01:31 | | nexussfan (nexussfan) joins |
| 23:03:18 | | HP_Archivist (HP_Archivist) joins |
| 23:04:49 | | SootBector quits [Remote host closed the connection] |
| 23:05:57 | | SootBector (SootBector) joins |
| 23:12:12 | | PC quits [Quit: PC] |
| 23:12:36 | | irisfreckles13 joins |
| 23:18:04 | | PC joins |
| 23:18:54 | | Arcorann joins |
| 23:21:32 | | Mateon1 quits [Remote host closed the connection] |
| 23:22:17 | | Mateon1 joins |
| 23:27:20 | | Mateon1 quits [Client Quit] |
| 23:27:29 | | Mateon1 joins |
| 23:30:51 | | Mateon1 quits [Remote host closed the connection] |
| 23:31:39 | | Mateon1 joins |
| 23:36:43 | | Webuser463996 joins |
| 23:37:09 | | Webuser463996 quits [Client Quit] |
| 23:45:27 | <@JAA> | Hmm, why still []? |
| 23:54:43 | <Yakov> | Rip raknet its hosting, http://www.raknet.com/ along with its forum seems to be gone |
| 23:54:47 | <Yakov> | > The hosting account for www.raknet.com expired. |
| 23:57:45 | <pokechu22> | I did an archivebot job for it early December - the site was up but the forum was broken by then |