00:57:38systwi quits [Ping timeout: 252 seconds]
01:05:25systwi (systwi) joins
05:26:57pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
05:29:05pabs (pabs) joins
06:46:53pabs quits [Client Quit]
06:49:15pabs (pabs) joins
06:49:51pabs quits [Client Quit]
07:03:53pabs (pabs) joins
07:51:10Exorcism (exorcism) joins
08:22:01Arcorann (Arcorann) joins
12:13:07HP_Archivist quits [Client Quit]
13:14:35Exorcism quits [Client Quit]
13:18:12Exorcism (exorcism) joins
13:30:47Exorcism quits [Client Quit]
13:30:55Exorcism (exorcism) joins
13:41:54pabs quits [Client Quit]
13:45:37pabs (pabs) joins
13:47:38Arcorann quits [Ping timeout: 252 seconds]
14:13:38<Nemo_bis>Is archivebot able to archive URLs behind hcaptcha like https://iopscience.iop.org/article/10.1086/301260/pdf ? https://archive.is/r5vO5 seems to manage while WBM just gives me "job failed".
14:15:25<@JAA>I'm not getting any captcha there, so there's likely an IP-based component to it.
14:15:35<Nemo_bis>Yes
14:15:45<@JAA>But no, AB doesn't have captcha stuff.
14:15:54<@JAA>So if its IPs are shitlisted, welp.
14:16:29<Nemo_bis>In which case, how can people help with additional IP addresses and slow crawls?
14:16:46<@JAA>Since we're in #internetarchive here, I'm guessing the on-topic question would be whether the WBM could gain support for captchas. My hunch is no, or it'd be very difficult at least.
14:17:25<Nemo_bis>Not really; in my opinion the on-topic question is how people can contribute items which would show up in the WBM and therefore in scholar.archive.org. :)
14:18:14<Nemo_bis>I can manually make WARCs and try to get them in the right collections but I'm not sure that's the best way.
14:18:37<@JAA>'manually'?
14:19:04<Nemo_bis>With simplistic wget-based crawls for example.
14:19:08<@JAA>Right
14:21:12<@JAA>The specific captcha thingy on that domain is 'ShieldSquare Captcha' apparently.
14:23:18<@JAA>SPN gets an HTTP 400 somehow‽
14:24:05<Nemo_bis>Maybe they additionally blocked IA networks at a lower level?
14:24:20<@JAA>Yeah, possibly.
14:24:46<Nemo_bis>HTTP headers are trollish: iop_licence_id: NO_LICENSE_IN_THE_NEW_WORLD
14:33:49<@JAA>Hmm, that seemed to work fine, actually: https://web.archive.org/web/20230821142328/https://iopscience.iop.org/article/10.1086/301260/pdf
14:33:55<@JAA>Despite the HTTP 400 message on SPN.
14:34:29<Nemo_bis>Curious
14:35:09<@JAA>The timestamp is later though, so maybe it retried after the 400.
14:38:37<@JAA>To be clear because 'SPN' can mean at least three things nowadays: I pasted the PDF URL into the form on https://web.archive.org/save/ with 'Save error pages' checked (not logged in, so no other checkboxes).
14:46:26<Nemo_bis>Ok. That's actually different from what happens if you load directly https://web.archive.org/save/https://iopscience.iop.org/article/10.1086/301260/pdf
14:46:30<Nemo_bis>isn't it
14:46:32<@JAA>Yep
14:46:48<@JAA>/save/URL fails on 4xx and 5xx, for one.
14:46:56<@JAA>As in, no snapshot is created.
14:53:08<Nemo_bis>Unrelatedly, https://fatcat.wiki/release/mpd2d4kyxfgp3nyz7irvvyuzfi is marked as dark preservation only even though https://web.archive.org/web/20200215180852/http://downloads.hindawi.com/archive/2012/375843.pdf is available from https://archive.org/details/OA-DOI-CRAWL-2020-02 . I guess nobody from IA is working on it any more though.
14:54:01<Nemo_bis>But it shows that yet another project would be to double-check which PDF URLs are already available in the WBM and not yet mapped in fatcat (before embarking on any major crawl).
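[Editor's sketch, not part of the conversation: one possible way to check whether a PDF URL already has a successful capture in the WBM before crawling it again. It assumes the public Wayback CDX endpoint at https://web.archive.org/cdx/search/cdx with its `url`, `output=json`, `filter` and `limit` parameters. The helper names `build_cdx_query` and `replay_urls` are made up for illustration, and nothing here touches fatcat; comparing the result against fatcat's file mappings would be a separate step.]

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: the public Wayback Machine CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url: str, limit: int = 1) -> str:
    """Build a CDX query asking for HTTP 200 captures of `url`."""
    params = [
        ("url", url),
        ("output", "json"),             # JSON rows, first row is the header
        ("filter", "statuscode:200"),   # ignore error captures and redirects
        ("limit", str(limit)),
    ]
    return CDX_ENDPOINT + "?" + urlencode(params)


def replay_urls(cdx_json: str) -> list[str]:
    """Turn a CDX JSON response body into Wayback replay URLs.

    An empty body or a header-only response means no capture matched.
    """
    rows = json.loads(cdx_json) if cdx_json.strip() else []
    if len(rows) < 2:
        return []
    header, *captures = rows
    ts = header.index("timestamp")
    original = header.index("original")
    return [
        f"https://web.archive.org/web/{row[ts]}/{row[original]}"
        for row in captures
    ]


# To run it for real, fetch build_cdx_query(pdf_url) with any HTTP client
# (slowly, and with a descriptive User-Agent) and pass the response text
# to replay_urls(). An empty list means the URL is a candidate for crawling.
```

[A non-empty result only says that some capture with status 200 exists; whether that capture is the actual PDF rather than a captcha or landing page would still need a check on the `mimetype` column or the file itself.]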
14:59:12<@JAA>TIL fatcat, neat.
15:41:55AK quits [Quit: Ping timeout (120 seconds)]
15:42:03Craigle quits [Quit: Ping timeout (120 seconds)]
15:42:13that_lurker quits [Quit: Ping timeout (120 seconds)]
15:42:22systwi_ quits [Quit: Ping timeout (120 seconds)]
15:42:27nothere quits [Quit: Leaving]
15:43:04@arkiver quits [Quit: .]
16:08:06arkiver (arkiver) joins
16:08:06@ChanServ sets mode: +o arkiver
16:08:06nothere joins
16:08:07nothere quits [Max SendQ exceeded]
16:08:43nothere joins
16:08:44nothere quits [Max SendQ exceeded]
16:08:57that_lurker (that_lurker) joins
16:09:14AK (AK) joins
16:09:26Craigle (Craigle) joins
16:10:43nothere joins
16:10:44nothere quits [Max SendQ exceeded]
16:11:58nothere joins
16:11:59nothere quits [Max SendQ exceeded]
16:14:29nothere joins
16:14:30nothere quits [Max SendQ exceeded]
16:18:07AK quits [Client Quit]
16:18:07that_lurker quits [Client Quit]
16:18:07Craigle quits [Client Quit]
16:19:22@arkiver quits [Client Quit]
16:24:19arkiver (arkiver) joins
16:24:20@ChanServ sets mode: +o arkiver
16:24:30that_lurker (that_lurker) joins
16:25:12that_lurker quits [Client Quit]
16:25:35that_lurker (that_lurker) joins
16:26:11AK (AK) joins
16:26:12Craigle (Craigle) joins
16:26:14nothere joins
16:26:15nothere quits [Max SendQ exceeded]
16:27:27nothere joins
16:27:28nothere quits [Max SendQ exceeded]
16:29:46nothere joins
16:29:47nothere quits [Max SendQ exceeded]
16:31:33nothere joins
16:31:33nothere quits [Max SendQ exceeded]
16:32:17nothere joins
16:32:18nothere quits [Max SendQ exceeded]
16:34:02nothere joins
16:34:03nothere quits [Max SendQ exceeded]
16:36:30nothere joins
16:36:31nothere quits [Max SendQ exceeded]
16:37:48nothere joins
16:37:49nothere quits [Max SendQ exceeded]
16:38:31nothere joins
16:38:32nothere quits [Max SendQ exceeded]
16:40:18nothere joins
16:40:19nothere quits [Max SendQ exceeded]
16:41:03nothere joins
16:41:04nothere quits [Max SendQ exceeded]
16:42:18nothere joins
16:42:19nothere quits [Max SendQ exceeded]
16:43:00nothere joins
16:43:01nothere quits [Max SendQ exceeded]
16:44:02nothere joins
16:44:03nothere quits [Max SendQ exceeded]
16:45:15nothere joins
16:45:16nothere quits [Max SendQ exceeded]
16:48:35nothere joins
16:48:36nothere quits [Max SendQ exceeded]
16:52:32nothere joins
16:52:33nothere quits [Max SendQ exceeded]
16:53:48nothere joins
16:53:49nothere quits [Max SendQ exceeded]
16:54:19nothere joins
16:54:20nothere quits [Max SendQ exceeded]
16:56:50nothere joins
16:56:50nothere quits [Max SendQ exceeded]
17:07:19nothere joins
17:07:20nothere quits [Max SendQ exceeded]
18:09:30nothere joins
19:41:44andrew (andrew) joins
19:50:38andrew quits [Ping timeout: 252 seconds]
19:52:53andrew (andrew) joins
20:37:13systwi_ joins
20:44:32nicolas17 quits [Ping timeout: 252 seconds]
20:49:17nicolas17 joins
21:35:33DLoader_ joins
21:38:07DLoader quits [Ping timeout: 258 seconds]
21:41:26DLoader_ quits [Ping timeout: 265 seconds]
21:49:53DLoader_ joins
21:49:53DLoader_ is now known as DLoader
22:39:26DLoader quits [Ping timeout: 265 seconds]
22:47:21DLoader joins
23:36:08Barto quits [Ping timeout: 252 seconds]
23:40:42Exorcism quits [Client Quit]
23:41:39Barto (Barto) joins