05:04:57 n9nes quits [Ping timeout: 272 seconds]
05:05:12 n9nes joins
10:25:28 jesterjunk quits [Read error: Connection reset by peer]
10:25:46 jesterjunk joins
13:47:40 kansei (kansei) joins
13:56:53 Webuser2076318 joins
13:57:07 <Webuser2076318> Hi, I was exploring UrlteamWebCrawls on archive.org and noticed public WARC uploads stopped around Aug 2025.
13:57:07 <Webuser2076318> Are bulk releases paused or moved to a different dataset?
13:58:56 <justauser> TL;DR a release is done once enough links are accumulated.
13:59:11 <justauser> With the goo.gl shutdown, the process slowed down a lot.
13:59:59 <Webuser2076318> Thanks
14:00:49 <justauser> You can look at the interactive tracker: it takes tens of seconds to minutes to get a single link.
14:09:29 <Webuser2076318> Got it — thanks for the clarification! I was using the public UrlteamWebCrawls dumps for historical research and thought I was querying something incorrectly. Appreciate the explanation
14:11:30 <Webuser2076318> One more question — do any ArchiveTeam public collections (besides URLTeam) end up capturing a lot of Google-hosted pages, like content on docs.google.com, sites.google.com, or blogspot? I’m mainly looking for historical preservation sources on archive.org.
14:19:07 <justauser> URLTeam only captures the redirects themselves, but I think this data feeds URLs aka #//.
14:19:25 <justauser> docs.google.com is hard to capture.
14:20:10 <justauser> Blogspot has a dedicated project and should be covered well - if you find a website that isn't, visit #frogger.
14:22:12 <justauser> For a list of what is archived, try interrogating the archive.org CDX API.
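[Editor's note: the CDX API mentioned above is the public Wayback Machine CDX endpoint at web.archive.org/cdx/search/cdx. A minimal sketch of building such a query — the `sites.google.com/*` pattern and the `limit` value are illustrative, not from the conversation:]

```python
from urllib.parse import urlencode

# Public Wayback Machine CDX API endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(url_pattern, **params):
    """Build a CDX query URL listing captures matching url_pattern.

    output=json makes the API return rows as JSON arrays
    (first row is the header: urlkey, timestamp, original, ...).
    """
    query = {"url": url_pattern, "output": "json", **params}
    return CDX_ENDPOINT + "?" + urlencode(query)

# e.g. list up to 10 captures of pages under sites.google.com;
# fetch the resulting URL with any HTTP client to see what is archived.
print(cdx_query_url("sites.google.com/*", limit=10))
```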
14:29:16 Webuser2076318 quits [Client Quit]
16:13:36 zhongfu (zhongfu) joins
18:52:37 MrAureliusR quits [Quit: ZNC - https://znc.in]
18:53:49 MrAureliusR (MrAureliusR) joins
19:07:16 zhongfu quits [Read error: Connection reset by peer]
19:07:19 zhongfu_ (zhongfu) joins
22:32:29 n9nes quits [Ping timeout: 272 seconds]
22:35:06 n9nes joins
22:40:47 n9nes quits [Ping timeout: 268 seconds]
22:41:17 n9nes joins