02:31:15 DogsRNice joins
02:41:19 qw3rty__ quits [Ping timeout: 272 seconds]
02:47:48 qw3rty__ joins
03:09:15 pabs quits [Read error: Connection reset by peer]
03:10:00 pabs (pabs) joins
03:56:03 DogsRNice quits [Read error: Connection reset by peer]
03:57:34 fuzzy80211 quits [Read error: Connection reset by peer]
03:58:05 fuzzy80211 (fuzzy80211) joins
04:15:13 fuzzy80211 quits [Read error: Connection reset by peer]
04:15:43 fuzzy80211 (fuzzy80211) joins
05:01:52 fuzzy80211 quits [Read error: Connection reset by peer]
05:02:22 fuzzy80211 (fuzzy80211) joins
05:51:41 Sanqui joins
05:51:43 Sanqui quits [Changing host]
05:51:43 Sanqui (Sanqui) joins
06:07:01 JaffaCakes118 quits [Remote host closed the connection]
06:07:24 JaffaCakes118 (JaffaCakes118) joins
06:24:24 nulldata1 (nulldata) joins
06:25:25 nulldata quits [Ping timeout: 255 seconds]
06:25:25 nulldata1 is now known as nulldata
06:51:30 <IDK> I think they are tweaking something, now elements of some pages are returning 503
08:49:09 Arcorann (Arcorann) joins
09:39:57 nulldata quits [Ping timeout: 272 seconds]
09:48:49 Arcorann quits [Ping timeout: 272 seconds]
09:50:21 nulldata (nulldata) joins
09:55:12 <@arkiver> IDK: got examples?
09:55:15 <@arkiver> examples are always important
10:02:28 MrMcNuggets (MrMcNuggets) joins
10:02:47 MrMcNuggets quits [Remote host closed the connection]
10:02:58 MrMcNuggets (MrMcNuggets) joins
10:11:35 MrMcNuggets quits [Remote host closed the connection]
10:12:01 MrMcNuggets (MrMcNuggets) joins
11:53:23 <IDK> Not anymore, for both cases
11:54:09 <IDK> It's like one day some go down and come back in a few hours, and then others go down
12:35:49 MrMcNugg1 (MrMcNuggets) joins
12:36:13 MrMcNugg1 quits [Client Quit]
12:39:49 MrMcNuggets quits [Ping timeout: 272 seconds]
12:40:05 MrMcNuggets (MrMcNuggets) joins
13:58:34 BearFortress_ quits [Ping timeout: 255 seconds]
14:32:46 f_ quits [Ping timeout: 255 seconds]
14:47:23 f_ (funderscore) joins
15:28:20 MrMcNugg1 (MrMcNuggets) joins
15:32:05 MrMcNuggets quits [Ping timeout: 272 seconds]
15:54:47 MrMcNugg1 quits [Remote host closed the connection]
15:55:37 MrMcNuggets (MrMcNuggets) joins
15:56:11 <audrooku|m> interesting
15:58:46 <audrooku|m> is there a working way to dump the cdx pages for a domain at the moment?
15:59:02 <audrooku|m> I need all the urls but shownumpages is still broken and I'm getting lots of 504 errors
16:18:48 MrMcNuggets quits [Client Quit]
16:29:53 <Dango360> audR
16:30:06 <Dango360> oops, had ahk on
16:30:55 <Dango360> audrooku|m: you can use the cdx api: "https://web.archive.org/cdx/search?url=http://example.com/&matchType=prefix&collapse=urlkey&filter=statuscode%3A200&fl=original"
16:31:34 <Dango360> oh wait, are you already using this?
16:33:45 <Dango360> if it's not working the way it should, you can try contacting info@archive.org and see if they can fix it
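A minimal sketch of building the kind of CDX query Dango360 quotes above, using only the parameters visible in the chat (endpoint and parameter names are taken from the URLs in this log; the helper itself is hypothetical):

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def build_cdx_url(target, match_type="prefix"):
    """Build a CDX search URL like the one quoted above.

    filter=statuscode:200 keeps only OK captures; fl=original
    returns just the original-URL column; collapse=urlkey
    deduplicates captures of the same URL.
    """
    params = {
        "url": target,
        "matchType": match_type,
        "collapse": "urlkey",
        "filter": "statuscode:200",
        "fl": "original",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

print(build_cdx_url("http://example.com/"))
```

`urlencode` percent-encodes the `:` in the filter the same way the quoted URL does (`statuscode%3A200`).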
16:36:48 <audrooku|m> I am sure ia is well aware; a lot of people have brought it up and it's been broken for at least a month
16:36:54 <audrooku|m> But sure, I will email them
16:46:55 <audrooku|m> I believe ark.iver has been mentioned a few times about this, but I'm not sure what he said or if he passed it along
16:58:14 BearFortress joins
17:12:11 <audrooku|m> hmm ok I don't think he actually saw it: arkiver
17:12:29 <audrooku|m> example url: https://web.archive.org/cdx/search/cdx?url=https://goo.gl/*&matchType=domain&showNumPages=true
17:41:26 <audrooku|m> *the asterisk causes the query to not return any results normally, nix that part
17:55:42 DLoader quits [Quit: The Lounge - https://thelounge.chat]
18:06:53 <audrooku|m> huh weird, this query gave me 1.17GiB of urls (7.6M) for the first page, maybe the issue is with pagination? https://web.archive.org/cdx/search/cdx?format=json&page=0&url=soundcloud.app.goo.gl&matchType=domain
18:08:38 DLoader (DLoader) joins
18:18:42 <@JAA> The problem is with the page-based pagination, yes.
18:19:21 <@JAA> I didn't poke the resume key pagination much, but my attempts simply returned all results instead of a key.
18:21:38 <audrooku|m> Yes, I'm noticing this as well. If I tweak the page size then shownumpages will simply return 1
18:24:28 <audrooku|m> I think I have a script behaving with this query format https://web.archive.org/cdx/search/cdx?url=soundcloud.app.goo.gl&matchType=domain&output=json&pageSize=1&page=3
18:24:43 <audrooku|m> but I will update and share the script when I have the data
18:26:10 <audrooku|m> at least this will work for dumping all urls, I can live without the server side filtering
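The page-based scheme audrooku|m is describing could be sketched like this; the parameter names mirror the query URLs above, while the page count would come from a prior showNumPages request and the actual fetching is left out since the endpoint's behavior was in flux:

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def page_urls(domain, num_pages, page_size=50):
    """Yield one CDX URL per result page, using the explicit
    pageSize/page parameters that were observed to work above."""
    for page in range(num_pages):
        params = {
            "url": domain,
            "matchType": "domain",
            "output": "json",
            "pageSize": page_size,
            "page": page,
        }
        yield f"{CDX_ENDPOINT}?{urlencode(params)}"

# Example: the first three page URLs for soundcloud.app.goo.gl;
# each would then be fetched in turn to dump all captured URLs.
for u in page_urls("soundcloud.app.goo.gl", 3):
    print(u)
```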
18:29:34 <@JAA> Yeah, the resume key thing was somewhat broken with server-side filtering anyway last time I tried. That's why my script uses the page pagination.
18:30:44 <audrooku|m> I didn't even know there was actual resume key pagination, odd
18:30:51 DLoader quits [Client Quit]
18:30:59 <audrooku|m> https://web.archive.org/cdx/search/cdx?url=soundcloud.app.goo.gl&matchType=domain&output=json&pageSize=1&page=6&showNumPages=1
18:31:01 <audrooku|m> this works
18:31:11 <audrooku|m> I'm wondering if maybe this is related to the order of the query string?
18:32:00 <audrooku|m> or maybe ia saw my email 😅... apologies for the stream of consciousness by the way
18:36:04 DLoader (DLoader) joins
18:36:06 <audrooku|m> setting pagesize to 1 and/or 50 seems to cause it to work, maybe it will fail on larger domains
18:37:31 <audrooku|m> https://web.archive.org/cdx/search/cdx?url=youtube.com&matchType=domain&pageSize=50&showNumPages=1
18:37:31 <audrooku|m> `52479`
18:38:13 <audrooku|m> I scraped this previously and there were >200k pages, but maybe the default pagesize was not 50 (the docs say it is)
18:59:09 <OrIdow6> If you use curl -v
18:59:31 <OrIdow6> it complains that "x-archive-wayback-runtime-error: positive pageSize is required"
19:02:13 <audrooku|m> it sounds to me like they deleted or mangled the default config
19:02:39 <audrooku|m> but thanks for pointing that out; that's something I didn't notice and definitely a really relevant error message
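A small sketch for surfacing that diagnostic without curl, assuming the server sets the `x-archive-wayback-runtime-error` response header exactly as `curl -v` shows (the helper is hypothetical; it just scans a header mapping, so it also works on headers captured elsewhere):

```python
def wayback_runtime_error(headers):
    """Return the CDX server's runtime-error header value, if present.

    HTTP header names are case-insensitive, so compare lowercased;
    returns None when the response carried no such header.
    """
    for name, value in headers.items():
        if name.lower() == "x-archive-wayback-runtime-error":
            return value
    return None

# e.g. headers captured from `curl -v` or any HTTP client:
print(wayback_runtime_error(
    {"X-Archive-Wayback-Runtime-Error": "positive pageSize is required"}
))
```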
19:04:10 <OrIdow6> If someone wants to dig through a very enterprise Java program, I believe an old version of the CDX server source is online somewhere
19:05:00 <audrooku|m> https://github.com/internetarchive/wayback
19:11:46 <audrooku|m> I wrote my own ia-cdx-search script and have finished validating that it behaves as properly as it can until this gets fully sorted out; JAA's should work by just specifying pageSize=50 in the query, though I haven't tested that https://github.com/tntmod54321/audrey-ia-cdx-search
19:17:04 ifconfig joins
19:53:22 ifconfig quits [Client Quit]
20:02:10 BearFortress quits [Ping timeout: 255 seconds]
21:06:28 BearFortress joins
21:29:47 yarrow quits [Read error: Connection reset by peer]
21:31:29 yarrow (yarrow) joins
22:06:19 DogsRNice joins
22:20:00 yarr0w joins
22:22:34 yarrow quits [Ping timeout: 255 seconds]
22:25:31 yarr0w quits [Client Quit]
22:25:52 yarrow (yarrow) joins
22:57:13 tzt quits [Ping timeout: 255 seconds]
23:49:36 tzt (tzt) joins
23:54:19 qwertyasdfuiopghjkl quits [Ping timeout: 272 seconds]