| 02:52:22 | | pabs quits [Ping timeout: 265 seconds] |
| 02:55:39 | | pabs (pabs) joins |
| 05:26:50 | | pabs quits [Ping timeout: 252 seconds] |
| 05:27:50 | | pabs (pabs) joins |
| 06:03:47 | <Ryz> | Heya folks, uhh, is there a way, when checking something like https://web.archive.org/web/*/http://tournaments.peliliiga.fi/*, to just fetch when a link under the domain was last grabbed? I'd simply like to know when it was last grabbed |
| 06:04:41 | <Ryz> | Asking this because it would make it a lot faster to cross-examine whether a website is alive or dead, and to decide on ignores more easily in #archivebot |
| 06:04:47 | | Arcorann (Arcorann) joins |
| 06:07:20 | <audrooku|m> | web/2/{url}, though the CDX API may be better suited; there are docs on GitHub |
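A minimal sketch of the web/2/{url} trick for a single URL, assuming the Wayback Machine answers with a redirect whose Location header carries the 14-digit capture timestamp (a domain-wide variant via the CDX API is sketched further down):

```python
import re
import requests

# Requesting web/2/<url> makes the Wayback Machine redirect to the latest
# capture of that one URL; the timestamp sits in the redirect target.
url = "http://tournaments.peliliiga.fi/"
resp = requests.get(f"https://web.archive.org/web/2/{url}", allow_redirects=False)
location = resp.headers.get("Location", "")

# Location looks like https://web.archive.org/web/20230405123456/http://...
m = re.search(r"/web/(\d{14})", location)
print(m.group(1) if m else "no capture found")
```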
| 06:24:33 | | SF quits [Ping timeout: 265 seconds] |
| 06:24:49 | | SF joins |
| 06:39:23 | <pokechu22> | https://web.archive.org/web/*/http://tournaments.peliliiga.fi/* is implemented using the CDX API (more or less; I think it uses an internal undocumented version that behaves the same) |
| 06:46:33 | <Ryz> | Just don't like having to wait for the whole list to be loaded, the ones that go up to 10,000 links |
| 06:46:40 | <Ryz> | (It used to be 100,000 links loaded actually) |
| 07:00:27 | <pokechu22> | Yeah, the same would apply to the CDX server more or less |
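A sketch of the same check against the CDX API, for the domain-wide case Ryz describes. As pokechu22 says, the server still walks the full index, but requesting only the timestamp column keeps those 10,000-row responses small; note that CDX rows are sorted by URL key, not by date, so the newest capture has to be computed client-side:

```python
import requests

params = {
    "url": "tournaments.peliliiga.fi/",
    "matchType": "prefix",  # same scope as the */http://tournaments.peliliiga.fi/* view
    "fl": "timestamp",      # one 14-digit timestamp per capture, nothing else
}
resp = requests.get("https://web.archive.org/cdx/search/cdx", params=params)

timestamps = resp.text.split()
# 14-digit YYYYMMDDhhmmss strings compare chronologically as plain strings.
if timestamps:
    print("last capture:", max(timestamps))
else:
    print("no captures")
```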
| 09:36:37 | | systwi_ joins |
| 10:26:35 | | DLoader quits [Ping timeout: 252 seconds] |
| 10:29:52 | | DLoader joins |
| 10:33:00 | | DLoader_ joins |
| 10:33:39 | | DLoader_ quits [Read error: Connection reset by peer] |
| 10:34:26 | | DLoader quits [Ping timeout: 265 seconds] |
| 10:35:07 | | DLoader_ joins |
| 10:35:07 | | DLoader_ is now known as DLoader |
| 12:11:10 | | DLoader quits [Read error: Connection reset by peer] |
| 12:12:18 | | threedeeitguy39 quits [Client Quit] |
| 12:25:38 | | DLoader joins |
| 12:57:50 | | Arcorann quits [Ping timeout: 252 seconds] |
| 13:25:57 | <Parchivist> | is there some way to search for a filename across all sites that are archived? not even a text search of all archived contents, just the filenames themselves |
| 13:39:09 | | threedeeitguy39 (threedeeitguy) joins |
| 14:29:30 | | DLoader_ joins |
| 14:30:47 | | DLoader quits [Ping timeout: 265 seconds] |
| 14:30:47 | | DLoader_ quits [Read error: Connection reset by peer] |
| 14:33:32 | | DLoader_ joins |
| 14:33:33 | | DLoader_ is now known as DLoader |
| 14:51:34 | | HP_Archivist quits [Ping timeout: 265 seconds] |
| 14:51:34 | | DLoader quits [Read error: Connection reset by peer] |
| 14:52:07 | | DLoader joins |
| 17:11:31 | | HP_Archivist (HP_Archivist) joins |
| 18:20:41 | | HP_Archivist quits [Ping timeout: 252 seconds] |
| 18:25:06 | | DLoader quits [Read error: Connection reset by peer] |
| 18:28:36 | <audrooku|m> | Nothing of the sort. There is no way to search any of the page contents, even partially. If you have a list of domains, you can download the complete CDX data for each domain and search its URLs for ones matching your filename, but that's the closest you can get |
| 18:34:38 | <audrooku|m> | But this only works if the pages were actually archived by the Wayback Machine |
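A sketch of that closest-you-can-get approach; the domain list and target filename here are placeholders:

```python
import requests

domains = ["example.com", "example.org"]  # your own list of domains (placeholder)
filename = "report.pdf"                   # the filename you are hunting for (placeholder)

for domain in domains:
    params = {
        "url": domain,
        "matchType": "domain",  # include subdomains of each listed domain
        "fl": "original",       # only the archived URLs, not full CDX rows
        "collapse": "urlkey",   # one row per distinct URL instead of per capture
    }
    resp = requests.get("https://web.archive.org/cdx/search/cdx", params=params)
    for archived_url in resp.text.splitlines():
        # Compare the last path segment, ignoring any query string.
        path = archived_url.split("?", 1)[0]
        if path.rstrip("/").rsplit("/", 1)[-1] == filename:
            print(archived_url)
```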
| 19:06:05 | | DLoader joins |
| 19:08:52 | | DLoader_ joins |
| 19:10:18 | | HP_Archivist (HP_Archivist) joins |
| 19:11:17 | | DLoader quits [Ping timeout: 252 seconds] |
| 19:11:23 | | DLoader_ is now known as DLoader |
| 19:23:03 | | @arkiver is now known as @arkiver2 |
| 19:23:13 | | @arkiver2 is now known as @arkiver |
| 19:26:18 | | @arkiver is now known as @notark |
| 19:26:43 | | @notark is now known as @arkiver |
| 19:31:14 | | pabs quits [Read error: Connection reset by peer] |
| 19:33:41 | | pabs (pabs) joins |
| 19:52:12 | | DLoader quits [Ping timeout: 265 seconds] |
| 19:53:15 | | DLoader joins |
| 22:00:24 | | DopefishJustin quits [Read error: Connection reset by peer] |
| 22:03:18 | | Parchivist quits [Remote host closed the connection] |
| 22:15:17 | | Parchivist joins |
| 22:31:02 | | DopefishJustin joins |
| 22:31:02 | | DopefishJustin is now authenticated as DopefishJustin |
| 22:33:56 | <HP_Archivist> | Grr. What's wrong with my arg? ia search --itemlist 'uploader:email@email.com collection:audio' | xargs -P 8 -n 1 ia download --exclude-source=derivative |
| 22:36:38 | <HP_Archivist> | Ah, nvm, got it. |
| 22:44:10 | <HP_Archivist> | Welp. No, that didn't work either. How do I only grab items from an account's audio collection? |
| 22:49:56 | <@JAA> | 'an account's audio collection'? |
| 22:51:30 | <HP_Archivist> | The uploader has items whose collection field includes audio. How do I only grab those items, JAA? |
| 22:51:36 | <HP_Archivist> | e.g. only items marked as audio |
| 22:53:10 | <@JAA> | 'Marked as audio' could mean mediatype rather than collection, but otherwise, the command above should work. |
| 22:54:09 | <HP_Archivist> | It's working, JAA. But it's also grabbing non-audio mediatypes. Videos. |
| 22:54:23 | <HP_Archivist> | That's why I thought something was wrong with my command |
| 22:54:57 | <HP_Archivist> | *And those videos are definitely marked as video and in video collections. I checked the mediatype under metadata |
| 22:55:41 | <@JAA> | Are they perhaps *also* in the audio collection? |
| 22:57:01 | <@JAA> | If you want me to take a look at the specific case, feel free to PM with the details. |
| 22:58:56 | <HP_Archivist> | Will do |
| 23:12:01 | <@JAA> | Solved. Some of the items are in opensource_audio, which itself is part of audio, but the 'Community' collections are all a bit ... special, so maybe that's why the items aren't found. The other items are in different collections altogether. |
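For reference, a sketch of the mediatype angle JAA raised: querying on mediatype:audio instead of collection:audio sidesteps nested and 'special' collections. search_items is the documented Python counterpart of `ia search`; the printed identifiers can be fed to the same `xargs ... ia download --exclude-source=derivative` pipeline as before. The uploader address is the placeholder from HP_Archivist's command:

```python
from internetarchive import search_items

# mediatype is a top-level metadata field, so this can't be thrown off by
# items sitting in nested collections like opensource_audio.
for result in search_items("uploader:email@email.com mediatype:audio"):
    print(result["identifier"])
```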
| 23:19:43 | | nicolas17 joins |