| 02:52:22 | | pabs quits [Ping timeout: 265 seconds] |
| 02:55:39 | | pabs (pabs) joins |
| 05:26:50 | | pabs quits [Ping timeout: 252 seconds] |
| 05:27:50 | | pabs (pabs) joins |
| 06:03:47 | <Ryz> | Heya folks, uhh, is there a way, when checking something like https://web.archive.org/web/*/http://tournaments.peliliiga.fi/*, to just fetch when a link under the domain was last grabbed? I'd simply like to know when it was last grabbed |
| 06:04:41 | <Ryz> | Asking this because it would make it a lot faster to cross-examine whether a website is alive or dead, and to decide on ignores more easily in #archivebot |
| 06:04:47 | | Arcorann (Arcorann) joins |
| 06:07:20 | <audrooku|m> | web/2/{url}, though the CDX API may be better suited; there are docs on GitHub |
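A minimal sketch of the web/2/{url} trick for a single URL, assuming the Wayback Machine answers with a redirect whose Location header carries the 14-digit capture timestamp (a domain-wide variant via the CDX API is sketched further down):

```python
import re
import requests

# Requesting web/2/<url> makes the Wayback Machine redirect to the latest
# capture of that one URL; the timestamp sits in the redirect target.
url = "http://tournaments.peliliiga.fi/"
resp = requests.get(f"https://web.archive.org/web/2/{url}", allow_redirects=False)
location = resp.headers.get("Location", "")

# Location looks like https://web.archive.org/web/20230405123456/http://...
m = re.search(r"/web/(\d{14})", location)
print(m.group(1) if m else "no capture found")
```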
| 06:24:33 | | SF quits [Ping timeout: 265 seconds] |
| 06:24:49 | | SF joins |
| 06:39:23 | <pokechu22> | https://web.archive.org/web/*/http://tournaments.peliliiga.fi/* is implemented using the CDX API (more or less; I think it uses an internal undocumented version that behaves the same) |
| 06:46:33 | <Ryz> | Just don't like having to wait for the whole list to be loaded, the ones that go up to 10,000 links |
| 06:46:40 | <Ryz> | (It used to be 100,000 links loaded actually) |
| 07:00:27 | <pokechu22> | Yeah, the same would apply to the CDX server more or less |
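A sketch of the same check against the CDX API, for the domain-wide case Ryz describes. As pokechu22 says, the server still walks the full index, but requesting only the timestamp column keeps those 10,000-row responses small; note that CDX rows are sorted by URL key, not by date, so the newest capture has to be computed client-side:

```python
import requests

params = {
    "url": "tournaments.peliliiga.fi/",
    "matchType": "prefix",  # same scope as the */http://tournaments.peliliiga.fi/* view
    "fl": "timestamp",      # one 14-digit timestamp per capture, nothing else
}
resp = requests.get("https://web.archive.org/cdx/search/cdx", params=params)

timestamps = resp.text.split()
# 14-digit YYYYMMDDhhmmss strings compare chronologically as plain strings.
if timestamps:
    print("last capture:", max(timestamps))
else:
    print("no captures")
```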
| 09:36:37 | | systwi_ joins |
| 10:26:35 | | DLoader quits [Ping timeout: 252 seconds] |
| 10:29:52 | | DLoader joins |
| 10:33:00 | | DLoader_ joins |
| 10:33:39 | | DLoader_ quits [Read error: Connection reset by peer] |
| 10:34:26 | | DLoader quits [Ping timeout: 265 seconds] |
| 10:35:07 | | DLoader_ joins |
| 10:35:07 | | DLoader_ is now known as DLoader |
| 12:11:10 | | DLoader quits [Read error: Connection reset by peer] |
| 12:12:18 | | threedeeitguy39 quits [Client Quit] |
| 12:25:38 | | DLoader joins |
| 12:57:50 | | Arcorann quits [Ping timeout: 252 seconds] |
| 13:25:57 | <Parchivist> | is there some way to search for a filename across all sites that are archived? not even a text search of all archived contents, just the filenames themselves |
| 13:39:09 | | threedeeitguy39 (threedeeitguy) joins |
| 14:29:30 | | DLoader_ joins |
| 14:30:47 | | DLoader quits [Ping timeout: 265 seconds] |
| 14:30:47 | | DLoader_ quits [Read error: Connection reset by peer] |
| 14:33:32 | | DLoader_ joins |
| 14:33:33 | | DLoader_ is now known as DLoader |
| 14:51:34 | | HP_Archivist quits [Ping timeout: 265 seconds] |
| 14:51:34 | | DLoader quits [Read error: Connection reset by peer] |
| 14:52:07 | | DLoader joins |
| 17:11:31 | | HP_Archivist (HP_Archivist) joins |
| 18:20:41 | | HP_Archivist quits [Ping timeout: 252 seconds] |
| 18:25:06 | | DLoader quits [Read error: Connection reset by peer] |
| 18:28:36 | <audrooku|m> | Nothing of the sort. There is no way to search any of the page contents, even partially. If you have a list of domains, you can download the complete CDX data for each domain and search its URLs for ones matching your filename, but that's the closest you can get |
| 18:34:38 | <audrooku|m> | But this only works if the pages were actually archived by the Wayback Machine |
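A sketch of that closest-you-can-get approach; the domain list and target filename here are placeholders:

```python
import requests

domains = ["example.com", "example.org"]  # your own list of domains (placeholder)
filename = "report.pdf"                   # the filename you are hunting for (placeholder)

for domain in domains:
    params = {
        "url": domain,
        "matchType": "domain",  # include subdomains of each listed domain
        "fl": "original",       # only the archived URLs, not full CDX rows
        "collapse": "urlkey",   # one row per distinct URL instead of per capture
    }
    resp = requests.get("https://web.archive.org/cdx/search/cdx", params=params)
    for archived_url in resp.text.splitlines():
        # Compare the last path segment, ignoring any query string.
        path = archived_url.split("?", 1)[0]
        if path.rstrip("/").rsplit("/", 1)[-1] == filename:
            print(archived_url)
```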
| 19:06:05 | | DLoader joins |
| 19:08:52 | | DLoader_ joins |
| 19:10:18 | | HP_Archivist (HP_Archivist) joins |
| 19:11:17 | | DLoader quits [Ping timeout: 252 seconds] |
| 19:11:23 | | DLoader_ is now known as DLoader |
| 19:23:03 | | @arkiver is now known as @arkiver2 |
| 19:23:13 | | @arkiver2 is now known as @arkiver |
| 19:26:18 | | @arkiver is now known as @notark |
| 19:26:43 | | @notark is now known as @arkiver |
| 19:31:14 | | pabs quits [Read error: Connection reset by peer] |
| 19:33:41 | | pabs (pabs) joins |
| 19:52:12 | | DLoader quits [Ping timeout: 265 seconds] |
| 19:53:15 | | DLoader joins |
| 22:00:24 | | DopefishJustin quits [Read error: Connection reset by peer] |
| 22:03:18 | | Parchivist quits [Remote host closed the connection] |
| 22:15:17 | | Parchivist joins |
| 22:31:02 | | DopefishJustin joins |
| 22:31:02 | | DopefishJustin is now authenticated as DopefishJustin |
| 22:33:56 | <HP_Archivist> | Grr. What's wrong with my arg? ia search --itemlist 'uploader:email@email.com collection:audio' | xargs -P 8 -n 1 ia download --exclude-source=derivative |
| 22:36:38 | <HP_Archivist> | Ah, nvm, got it. |
| 22:44:10 | <HP_Archivist> | Welp. No, that didn't work either. How do I only grab items from an account's audio collection? |
| 22:49:56 | <@JAA> | 'an account's audio collection'? |
| 22:51:30 | <HP_Archivist> | The uploader has items whose collection field includes audio. How do I only grab those items, JAA? |
| 22:51:36 | <HP_Archivist> | e.g. only items marked as audio |
| 22:53:10 | <@JAA> | 'Marked as audio' could mean mediatype rather than collection, but otherwise, the command above should work. |
| 22:54:09 | <HP_Archivist> | It's working, JAA. But it's also grabbing non-audio mediatypes. Videos. |
| 22:54:23 | <HP_Archivist> | That's why I thought something was wrong with my command |
| 22:54:57 | <HP_Archivist> | *And those videos are definitely marked as video and in video collections. I checked the mediatype under metadata |
| 22:55:41 | <@JAA> | Are they perhaps *also* in the audio collection? |
| 22:57:01 | <@JAA> | If you want me to take a look at the specific case, feel free to PM with the details. |
| 22:58:56 | <HP_Archivist> | Will do |
| 23:12:01 | <@JAA> | Solved. Some of the items are in opensource_audio, which itself is part of audio, but the 'Community' collections are all a bit ... special, so maybe that's why the items aren't found. The other items are in different collections altogether. |
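For reference, a sketch of the mediatype angle JAA raised: querying on mediatype:audio instead of collection:audio sidesteps nested and 'special' collections. search_items is the documented Python counterpart of `ia search`; the printed identifiers can be fed to the same `xargs ... ia download --exclude-source=derivative` pipeline as before. The uploader address is the placeholder from HP_Archivist's command:

```python
from internetarchive import search_items

# mediatype is a top-level metadata field, so this can't be thrown off by
# items sitting in nested collections like opensource_audio.
for result in search_items("uploader:email@email.com mediatype:audio"):
    print(result["identifier"])
```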
| 23:19:43 | | nicolas17 joins |