| 02:22:14 | | BearFortress_ joins |
| 02:24:27 | | BearFortress quits [Ping timeout: 272 seconds] |
| 02:25:05 | | nicolas17 quits [Ping timeout: 272 seconds] |
| 02:25:20 | | nicolas17 (nicolas17) joins |
| 02:27:22 | | BearFortress joins |
| 02:31:25 | | BearFortress_ quits [Ping timeout: 272 seconds] |
| 02:54:10 | | tzt quits [Read error: Connection reset by peer] |
| 02:55:05 | | tzt (tzt) joins |
| 06:17:58 | | BearFortress_ joins |
| 06:19:26 | | BearFortress__ joins |
| 06:20:19 | | BearFortress___ joins |
| 06:21:57 | | BearFortress quits [Ping timeout: 272 seconds] |
| 06:23:13 | | Pedrosso quits [Ping timeout: 272 seconds] |
| 06:23:13 | | balrog quits [Ping timeout: 272 seconds] |
| 06:23:20 | | nicolas17 quits [Remote host closed the connection] |
| 06:23:44 | | nicolas17 (nicolas17) joins |
| 06:23:47 | | Pedrosso joins |
| 06:23:51 | | BearFortress_ quits [Ping timeout: 272 seconds] |
| 06:24:30 | | BearFortress__ quits [Ping timeout: 272 seconds] |
| 06:27:15 | | balrog (balrog) joins |
| 06:49:06 | | DogsRNice quits [Read error: Connection reset by peer] |
| 07:27:49 | | zhongfu quits [Ping timeout: 272 seconds] |
| 07:31:28 | | Sluggs quits [Ping timeout: 256 seconds] |
| 07:33:19 | | zhongfu (zhongfu) joins |
| 09:50:46 | | tzt quits [Quit: tzt] |
| 09:50:59 | | tzt (tzt) joins |
| 10:01:43 | | X-Scale quits [Ping timeout: 272 seconds] |
| 10:13:25 | | zhongfu_ (zhongfu) joins |
| 10:13:30 | | zhongfu quits [Read error: Connection reset by peer] |
| 11:08:31 | | X-Scale joins |
| 11:44:51 | <datechnoman> | Hey all. Got a simple one. What's the most efficient way to query the WBM CDX for all URLs for a specific site? For example, let's use imgur. "https://web.archive.org/cdx/search/cdx?url=https://imgur.com/gallery*&output=json&limit=2500&from=20250201&to=20250230&filter=statuscode:200&collapse=urlkey" |
| 11:45:21 | <datechnoman> | I could create a script that runs "chunks" of dates, but it's very slow to query. Is this the most efficient way, or should I be using the IA command line tool? |
| 11:51:26 | <Jake> | datechnoman: I believe JAA's utility works rather well https://gitea.arpa.li/JustAnotherArchivist/little-things/src/branch/master/ia-cdx-search |
| 12:39:19 | <justauser|m> | https://wiki.archiveteam.org/index.php/Site_exploration has a different snippet, but the same basic idea. I'll probably link the JAA one. |
| 12:52:06 | | AK quits [Quit: AK] |
| 15:47:19 | | Dango360 quits [Quit: The Lounge - https://thelounge.chat] |
| 15:54:11 | | Dango360 (Dango360) joins |
| 19:45:42 | | SootBector quits [Remote host closed the connection] |
| 19:46:30 | | SootBector (SootBector) joins |
| 20:51:06 | <@JAA> | datechnoman: I'd suggest dropping from/to and listing the whole range in one go. It's purely an output filter, so if you split it up, the server ends up reading the same underlying data over and over. |
| 20:52:35 | <@JAA> | `ia-cdx-search --concurrency 4 --tries 10 'url=https://imgur.com/gallery*&filter=statuscode:200&collapse=urlkey'` |
| 20:52:50 | <@JAA> | If you want only the URL, you can also add `fl=original`. |
| 20:54:29 | <@JAA> | Oh yeah, also, `collapse=urlkey` will collapse case collisions. You probably don't want that on Imgur. |
| 20:55:43 | <@JAA> | E.g. https://web.archive.org/cdx/search/cdx?url=https://i.imgur.com/fPIGA.* vs https://web.archive.org/cdx/search/cdx?url=https://i.imgur.com/fPIGA.*&collapse=urlkey |
| 21:37:45 | | zhongfu_ quits [Ping timeout: 272 seconds] |
| 21:39:19 | | zhongfu (zhongfu) joins |
| 21:47:42 | | DogsRNice joins |
| 22:22:06 | <datechnoman> | Thank you very much Jake, JAA and justauser|m |
| 22:22:11 | <datechnoman> | Exactly what I'm after :) |
| 23:28:34 | | DopefishJustin quits [Ping timeout: 256 seconds] |
| 23:29:53 | <pabs> | wow "The capture will start in ~9 hours, 34 minutes because our service is currently overloaded." |
| 23:30:19 | <@JAA> | Yeah, way down from yesterday :-P |
| 23:35:14 | <pabs> | huh, must have been quite some backlog |
| 23:35:31 | <pabs> | wonder if someone was spamming the save API or something |
| 23:50:13 | | DopefishJustin joins |
| 23:50:13 | | DopefishJustin is now authenticated as DopefishJustin |
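
For reference, JAA's CDX advice above (drop `from`/`to`, request only `fl=original`, and avoid `collapse=urlkey` where case matters) could be sketched in Python roughly as follows. This is a minimal illustration and not how `ia-cdx-search` works internally. The helper names `build_cdx_query`, `fetch_rows`, and `unique_urls` are hypothetical, the `.jpg` URLs in the usage comment are made up, and the sketch assumes that `output=json` returns a list of rows whose first row is the field-name header. It issues a single request and does not handle paging or retries, so for a large site the dedicated tool is likely the better choice. Deduplicating exact URLs on the client is a substitute suggested here for `collapse=urlkey`, not something proposed in the channel.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url_pattern, fields=("original",), status="200"):
    """Build one CDX query URL covering the whole capture range.

    No from/to parameters (per the log they only filter output) and no
    collapse=urlkey (which would merge URLs differing only in case).
    """
    params = [
        ("url", url_pattern),
        ("output", "json"),
        ("filter", f"statuscode:{status}"),
        ("fl", ",".join(fields)),
    ]
    # Keep ':', '/' and '*' readable instead of percent-encoding them.
    return f"{CDX_ENDPOINT}?{urlencode(params, safe=':/*')}"


def fetch_rows(query_url, timeout=60):
    """Fetch and decode the JSON response (performs a network request)."""
    with urlopen(query_url, timeout=timeout) as resp:
        return json.load(resp)


def unique_urls(rows):
    """Drop exact duplicate URLs, keeping case-distinct ones and input order.

    Assumes rows[0] is the header row and the URL is the first column.
    """
    seen = set()
    result = []
    for row in rows[1:]:
        url = row[0]
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Usage (the fetch line needs network access, so it is left commented out):
#   rows = fetch_rows(build_cdx_query("https://imgur.com/gallery*"))
#   urls = unique_urls(rows)
```

Because the comparison in `unique_urls` is case-sensitive, two hypothetical captures such as `https://i.imgur.com/fPIGA.jpg` and `https://i.imgur.com/fpiga.jpg` both survive, whereas per JAA's note `collapse=urlkey` would merge them.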