00:00:55<thalia>Is there a way to order PDFs in the book viewer when there are multiple, besides just filename lexical ordering?
00:01:23<thalia>And is there some metadata to mark that a PDF should be opened in the viewer with the 1-page view by default?
01:15:04xkey quits [Quit: WeeChat 4.7.2]
01:16:16xkey (xkey) joins
02:39:05nulldata-alt9 (nulldata) joins
02:40:51nulldata-alt quits [Ping timeout: 272 seconds]
02:40:51nulldata-alt9 is now known as nulldata-alt
03:40:24AlsoHP_Archivist joins
03:40:30PredatorIWD255 joins
03:42:18PredatorIWD25 quits [Ping timeout: 256 seconds]
03:42:18PredatorIWD255 is now known as PredatorIWD25
03:44:34HP_Archivist quits [Ping timeout: 256 seconds]
03:55:55AlsoHP_Archivist quits [Client Quit]
03:56:10HP_Archivist (HP_Archivist) joins
04:15:19<pabs>anyone know what could cause SPN2 (email API) to give "Error! Capture timed out" for a URL?
04:23:39<nicolas17>I just uploaded a zip to archive.org that I didn't mean to upload compressed, and I used --delete so it was gone locally
04:24:20<nicolas17>I re-downloaded it from IA, unzipped it, deleted it from IA using 'ia rm --no-backup', and uploaded the folder
04:24:39<nicolas17>which uploaded the unzipped contents *and the zip again* because I forgot to delete it after unzip :/
05:15:22DogsRNice quits [Read error: Connection reset by peer]
07:06:32DopefishJustin quits [Remote host closed the connection]
07:14:44DopefishJustin joins
07:17:46dendory3 joins
07:19:31dendory quits [Ping timeout: 272 seconds]
12:16:18<cruller>According to https://help.archive.org/help/using-the-wayback-machine/, “Alexa Internet, in cooperation with the Internet Archive, has designed a three dimensional index that allows browsing of web documents over multiple time periods, and turned this unique feature into the Wayback Machine.”
12:16:24<cruller>What exactly are these “three dimensions”?
12:29:37<vics>I suppose that two dimensions are browsing a web site as usual, and the third one is time.
12:42:31pabs quits [Ping timeout: 272 seconds]
12:47:25<cruller>You mean vertical, horizontal, and time? I also think that's the most plausible theory, but I'm not certain.
12:49:51<cruller>At least, time must be included.
13:06:37pabs (pabs) joins
13:37:31lexikiq quits [Quit: Ping timeout (120 seconds)]
13:37:56lexikiq joins
14:03:47dendory3 is now known as dendory
14:03:56dendory quits [Changing host]
14:03:56dendory (dendory) joins
14:47:08<nstrom|m>I mean it could be a VR virtualization where you fly through file folders like in Hackers. but I doubt it
15:08:12SootBector quits [Remote host closed the connection]
15:09:24SootBector (SootBector) joins
15:54:35rewby quits [Quit: WeeChat 4.4.2]
16:30:44rewby (rewby) joins
21:11:09<datechnoman>JAA - Is this still the best way to query / list subdomains using IA's CDX? - https://gitea.arpa.li/JustAnotherArchivist/little-things/raw/branch/master/ia-cdx-search-subdomains
21:11:29<datechnoman>I haven't tried configuring the requirements yet. Wasn't sure if this is still valid or not
21:16:19<@JAA>datechnoman: Hmm, I haven't actually used that script in a long while, only ia-cdx-search directly.
21:16:42<@JAA>But I think it should work, yeah.
21:24:54<datechnoman>Got an easier way / one liner otherwise?
21:25:21<datechnoman>All I want to do is be able to query IA's CDX for all subdomains of a domain, eg: *.foo.com
21:25:56<datechnoman>Reading the CDX documentation, https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server, it should be a simple command but I can't get it to work the way it should...
21:26:00<datechnoman>It's most likely just me :/
21:27:30<datechnoman>ChatGPT keeps telling me to run the same things and they don't seem to work either :/
21:30:25<@JAA>Do you want the list of domains or the list of URLs?
21:31:08<pokechu22>I'm not aware of a way that's better than listing all URLs and then collapsing it to a list of subdomains afterwards. The CDX server *does* let you collapse to a substring of URLs, but that doesn't match with subdomains directly - you need to pick a length that you expect substrings to be under, and you'll still get multiple results for most subdomains
21:32:34<datechnoman>Happy to do cleanup post pulling the data
21:32:53<datechnoman>In this case I want the domains and don't care about the URLs, BUT it would be nice to know how to use that in the future
21:33:17<datechnoman>Let's say a list of URLs, as I can then process it down to just the subdomains and have both capabilities
21:34:56<pokechu22>I think ia-cdx-search does that. The way I personally do it (which has some other limitations regarding rate-limiting/getting multiple pages at a time) is plugging it in to https://web.archive.org/cdx/search/cdx?url=example.com&collapse=urlkey&matchType=domain&fl=original&limit=100000&showResumeKey=true&resumeKey= and then as needed plugging the resume key at the end of the list back into that, but I generally do that by hand and it's kinda slow
21:38:29<pokechu22>ia-cdx-search is probably faster/more reliable for scripting, and it looks like ia-cdx-search-subdomains just uses ia-cdx-search and then post-processes it to a list of subdomains. The way I do it works but sometimes gets 502s (I think, might be 503s?)
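[Editor's note] The by-hand resume-key loop pokechu22 describes could be scripted roughly as below. This is only a sketch, not how ia-cdx-search works. The function names (build_cdx_url, split_page, iter_urls) are made up for illustration. It assumes that with showResumeKey=true the response ends with a blank line followed by the resume key whenever more results remain, and that the key can be appended to the URL verbatim, as in the manual approach. There is no retry handling for the 502/503 responses mentioned above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain, resume_key="", limit=100000):
    """Build the same query URL as in the chat, with an optional resume key."""
    params = {
        "url": domain,
        "collapse": "urlkey",
        "matchType": "domain",  # matches the domain and its subdomains
        "fl": "original",
        "limit": limit,
        "showResumeKey": "true",
    }
    # Assumption: the key from the previous page is pasted back unchanged.
    return CDX_ENDPOINT + "?" + urlencode(params) + "&resumeKey=" + resume_key


def split_page(body):
    """Split one response body into (urls, resume_key).

    Assumption: if more results remain, the body ends with a blank line
    followed by the resume key; otherwise it contains only result lines.
    """
    lines = body.splitlines()
    while lines and not lines[-1].strip():
        lines.pop()  # ignore trailing blank lines
    if len(lines) >= 2 and not lines[-2].strip():
        return [l for l in lines[:-2] if l.strip()], lines[-1].strip()
    return [l for l in lines if l.strip()], ""


def iter_urls(domain):
    """Yield original URLs page by page until no resume key is returned."""
    resume_key = ""
    while True:
        with urlopen(build_cdx_url(domain, resume_key)) as resp:
            body = resp.read().decode("utf-8", errors="replace")
        urls, resume_key = split_page(body)
        yield from urls
        if not resume_key:
            break
```

Something like `for url in iter_urls("example.com"): print(url)` would then stream the URL list. As noted in the discussion, the root-domain URLs come first, so subdomains may take a while to appear.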
21:41:27<datechnoman>Only issue is that it is only listing urls matching "example.com*" where I want "subdomain1.example.com*", "subdomain2.example.com*" if you know what I mean? - https://web.archive.org/cdx/search/cdx?url=example.com&collapse=urlkey&matchType=domain&fl=original&limit=100000&showResumeKey=true&resumeKey=
21:42:00<@JAA>matchType=domain matches subdomains, too.
21:43:34<datechnoman>Hmmm maybe I need to let it run longer, results are streaming through. See if any subdomains start popping up
21:45:23<@JAA>url=example.org&matchType=domain&collapse=urlkey&fl=original is what I normally use for this use case. And I post-process it into a list of domains later when needed.
21:45:52<@JAA>Which is exactly what ia-cdx-search-subdomains does as well, just with the post-processing immediately.
21:46:13<@JAA>It does strip ports as well, in case that matters.
21:46:20<@JAA>So really only domains, not hosts.
21:46:41<pokechu22>Note that you'll get both www and non-www at the start because the urlkey suppresses www; other subdomains appear after those
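[Editor's note] The post-processing step JAA mentions (turning the list of original URLs into a list of domains, with ports stripped) might look something like this. It is a minimal sketch and not the actual code of ia-cdx-search-subdomains. The helper name domains_from_urls is hypothetical, and it assumes the input is one URL per line, as returned with fl=original.

```python
from urllib.parse import urlsplit


def domains_from_urls(urls):
    """Reduce an iterable of URLs to a sorted list of unique domains."""
    domains = set()
    for url in urls:
        url = url.strip()
        if not url:
            continue
        if "://" not in url:
            # Assumption: scheme-less entries are treated as http URLs.
            url = "http://" + url
        try:
            # .hostname is lowercased and excludes any port or userinfo.
            host = urlsplit(url).hostname
        except ValueError:
            continue  # skip entries that cannot be parsed
        if host:
            domains.add(host.rstrip("."))
    return sorted(domains)


if __name__ == "__main__":
    sample = [
        "http://example.org/",
        "https://www.example.org:8080/a",
        "http://Sub.Example.org:80/x?y",
    ]
    print("\n".join(domains_from_urls(sample)))
```

Because www and non-www variants both show up in the URL list, both appear as separate entries in the result unless they are merged afterwards.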
22:45:56<datechnoman>Ahhh so that's working. Turns out it was always working, I just needed to let it output the root domain URLs first.... *sigh*
22:46:39<datechnoman>Thank you very much all
22:52:38<pokechu22>Is there more information on the indexing issue (mentioned in #archiveteam-bs recently)? My understanding is that it's related to indexing but I'd appreciate something public/official I can point to
23:16:54atphoenix_ (atphoenix) joins
23:19:39atphoenix__ quits [Ping timeout: 272 seconds]