02:52:22pabs quits [Ping timeout: 265 seconds]
02:55:39pabs (pabs) joins
05:26:50pabs quits [Ping timeout: 252 seconds]
05:27:50pabs (pabs) joins
06:03:47<Ryz>Heya folks, uhh, is there a way, when checking something like https://web.archive.org/web/*/http://tournaments.peliliiga.fi/*, to just fetch the last time a link under the domain was grabbed? I would just like to know when it was last grabbed;
06:04:41<Ryz>Asking this because it would make it a lot faster to cross-examine whether a website is alive or dead, and to decide on ignores more easily in #archivebot
06:04:47Arcorann (Arcorann) joins
06:07:20<audrooku|m>web/2/{url}, though the cdx api may be better suited, there are docs on github
06:24:33SF quits [Ping timeout: 265 seconds]
06:24:49SF joins
06:39:23<pokechu22>https://web.archive.org/web/*/http://tournaments.peliliiga.fi/* is implemented using the CDX API (more or less; I think it uses an internal undocumented version that behaves the same)
06:46:33<Ryz>Just don't like having to wait for the whole list to be loaded, the ones that go up to 10,000 links
06:46:40<Ryz>(It used to be 100,000 links loaded actually)
07:00:27<pokechu22>Yeah, the same would apply to the CDX server more or less
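A minimal sketch of the CDX approach mentioned above, for a single URL rather than a whole domain. It assumes the public CDX endpoint at https://web.archive.org/cdx/search/cdx and the `limit=-1` parameter described in the CDX server docs on GitHub, which asks for only the last result. The function names are made up for illustration. As far as I understand, prefix or domain queries are sorted by URL key first, so `limit=-1` there would not necessarily give the most recent capture across the whole site.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public CDX endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_latest_capture_query(url):
    """Build a CDX query asking for only the last capture of one exact URL."""
    params = {
        "url": url,
        "output": "json",
        "fl": "timestamp,original",
        "limit": "-1",  # negative limit: return the last N results
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_latest(rows):
    """Turn CDX JSON rows (header row first) into a dict for the last row.

    Returns None when there are no data rows.
    """
    if len(rows) < 2:
        return None
    header, last = rows[0], rows[-1]
    return dict(zip(header, last))


def fetch_latest_capture(url):
    """Query the CDX server (network access required) for the latest capture."""
    with urlopen(build_latest_capture_query(url), timeout=30) as resp:
        body = resp.read().decode("utf-8")
    if not body.strip():
        return None
    return parse_latest(json.loads(body))
```

Calling `fetch_latest_capture("http://tournaments.peliliiga.fi/")` would return a dict with `timestamp` and `original` keys if the URL has been captured. The `web/2/{url}` form mentioned above is the quicker manual check, since it redirects to the newest capture.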
09:36:37systwi_ joins
10:26:35DLoader quits [Ping timeout: 252 seconds]
10:29:52DLoader joins
10:33:00DLoader_ joins
10:33:39DLoader_ quits [Read error: Connection reset by peer]
10:34:26DLoader quits [Ping timeout: 265 seconds]
10:35:07DLoader_ joins
10:35:07DLoader_ is now known as DLoader
12:11:10DLoader quits [Read error: Connection reset by peer]
12:12:18threedeeitguy39 quits [Client Quit]
12:25:38DLoader joins
12:57:50Arcorann quits [Ping timeout: 252 seconds]
13:25:57<Parchivist>is there some way to search for a filename across all sites that are archived? not even a text search of all archived contents, just the filenames themselves
13:39:09threedeeitguy39 (threedeeitguy) joins
14:29:30DLoader_ joins
14:30:47DLoader quits [Ping timeout: 265 seconds]
14:30:47DLoader_ quits [Read error: Connection reset by peer]
14:33:32DLoader_ joins
14:33:33DLoader_ is now known as DLoader
14:51:34HP_Archivist quits [Ping timeout: 265 seconds]
14:51:34DLoader quits [Read error: Connection reset by peer]
14:52:07DLoader joins
17:11:31HP_Archivist (HP_Archivist) joins
18:20:41HP_Archivist quits [Ping timeout: 252 seconds]
18:25:06DLoader quits [Read error: Connection reset by peer]
18:28:36<audrooku|m>Nothing of the sort. There is no way to search any of the page contents, even partially. If you have a list of domains, you can download the complete CDX data for each domain and search each domain's list of URLs for ones that match your filename, but that's the closest you can get
18:34:38<audrooku|m>But this also only works if the pages are actually archived by the wayback machine
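A rough sketch of the per-domain workaround described above: pull the list of archived URLs for one domain from the CDX server, then match on the last path segment. The `matchType=domain`, `fl` and `collapse=urlkey` parameters are taken from the CDX server docs; the function names and the case-insensitive comparison are assumptions made for illustration, not an established tool.

```python
import posixpath
from urllib.parse import urlencode, urlsplit

# Assumed public CDX endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_domain_query(domain):
    """Build a CDX query listing each archived URL under a domain once."""
    params = {
        "url": domain,
        "matchType": "domain",  # include subdomains
        "output": "json",
        "fl": "original",       # only the original URL column
        "collapse": "urlkey",   # one row per distinct URL
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def filename_of(url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    return posixpath.basename(urlsplit(url).path)


def urls_with_filename(urls, filename):
    """Keep the URLs whose last path segment equals filename (case-insensitive)."""
    wanted = filename.lower()
    return [u for u in urls if filename_of(u).lower() == wanted]
```

For large domains the CDX response can be very big, so in practice the query would need paging, and the filter would be run over each page of results.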
19:06:05DLoader joins
19:08:52DLoader_ joins
19:10:18HP_Archivist (HP_Archivist) joins
19:11:17DLoader quits [Ping timeout: 252 seconds]
19:11:23DLoader_ is now known as DLoader
19:23:03@arkiver is now known as @arkiver2
19:23:13@arkiver2 is now known as @arkiver
19:26:18@arkiver is now known as @notark
19:26:43@notark is now known as @arkiver
19:31:14pabs quits [Read error: Connection reset by peer]
19:33:41pabs (pabs) joins
19:52:12DLoader quits [Ping timeout: 265 seconds]
19:53:15DLoader joins
22:00:24DopefishJustin quits [Read error: Connection reset by peer]
22:03:18Parchivist quits [Remote host closed the connection]
22:15:17Parchivist joins
22:31:02DopefishJustin joins
22:33:56<HP_Archivist>Grr. What's wrong with my arg? ia search --itemlist 'uploader:email@email.com collection:audio' | xargs -P 8 -n 1 ia download --exclude-source=derivative
22:36:38<HP_Archivist>Ah, nvm, got it.
22:44:10<HP_Archivist>Welp. No, that didn't work either. How do I only grab items from an account's audio collection?
22:49:56<@JAA>'an account's audio collection'?
22:51:30<HP_Archivist>The uploader has items that are under the audio collection field. How do I only grab those items, JAA?
22:51:36<HP_Archivist>e.g. only items marked as audio
22:53:10<@JAA>'Marked as audio' could mean mediatype rather than collection, but otherwise, the command above should work.
22:54:09<HP_Archivist>It's working, JAA. But it's also grabbing non-audio mediatypes, too. Videos.
22:54:23<HP_Archivist>That's why I thought something was wrong with my command
22:54:57<HP_Archivist>*And those videos are definitely marked as video and in video collections. I checked the mediatype under metadata
22:55:41<@JAA>Are they perhaps *also* in the audio collection?
22:57:01<@JAA>If you want me to take a look at the specific case, feel free to PM with the details.
22:58:56<HP_Archivist>Will do
23:12:01<@JAA>Solved. Some of the items are in opensource_audio, which itself is part of audio, but the 'Community' collections are all a bit ... special, so maybe that's why the items aren't found. The other items are in different collections altogether.
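A small sketch of the mediatype-versus-collection distinction discussed above, using the `internetarchive` Python library instead of the `ia` CLI. The helper names are hypothetical, and filtering by `mediatype:audio` is only the alternative JAA hinted at, not a confirmed fix for this case. It assumes `search_items` from the `internetarchive` package yields dicts with an `identifier` key.

```python
def build_audio_query(uploader, by="mediatype"):
    """Build a search query for an uploader's audio items.

    by="mediatype" filters on the item's mediatype field;
    by="collection" filters on membership in the 'audio' collection.
    """
    if by not in ("mediatype", "collection"):
        raise ValueError("by must be 'mediatype' or 'collection'")
    return f"uploader:{uploader} AND {by}:audio"


def iter_identifiers(query):
    """Yield item identifiers matching the query (needs network access)."""
    # Imported lazily so build_audio_query works without the package installed.
    from internetarchive import search_items

    for result in search_items(query, fields=["identifier"]):
        yield result["identifier"]
```

The identifiers from `iter_identifiers(build_audio_query("email@email.com"))` could then be written one per line and fed to `xargs ... ia download` as in the command above.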
23:19:43nicolas17 joins