| 00:57:38 | | systwi quits [Ping timeout: 252 seconds] |
| 01:05:25 | | systwi (systwi) joins |
| 05:26:57 | | pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.] |
| 05:29:05 | | pabs (pabs) joins |
| 06:46:53 | | pabs quits [Client Quit] |
| 06:49:15 | | pabs (pabs) joins |
| 06:49:51 | | pabs quits [Client Quit] |
| 07:03:53 | | pabs (pabs) joins |
| 07:51:10 | | Exorcism (exorcism) joins |
| 08:22:01 | | Arcorann (Arcorann) joins |
| 12:13:07 | | HP_Archivist quits [Client Quit] |
| 13:14:35 | | Exorcism quits [Client Quit] |
| 13:18:12 | | Exorcism (exorcism) joins |
| 13:30:47 | | Exorcism quits [Client Quit] |
| 13:30:55 | | Exorcism (exorcism) joins |
| 13:41:54 | | pabs quits [Client Quit] |
| 13:45:37 | | pabs (pabs) joins |
| 13:47:38 | | Arcorann quits [Ping timeout: 252 seconds] |
| 14:13:38 | <Nemo_bis> | Is archivebot able to archive URLs behind hcaptcha like https://iopscience.iop.org/article/10.1086/301260/pdf ? https://archive.is/r5vO5 seems to manage while WBM just gives me "job failed". |
| 14:15:25 | <@JAA> | I'm not getting any captcha there, so there's likely an IP-based component to it. |
| 14:15:35 | <Nemo_bis> | Yes |
| 14:15:45 | <@JAA> | But no, AB doesn't have captcha stuff. |
| 14:15:54 | <@JAA> | So if its IPs are shitlisted, welp. |
| 14:16:29 | <Nemo_bis> | In which case, how can one help with additional IP addresses and slow crawls? |
| 14:16:46 | <@JAA> | Since we're in #internetarchive here, I'm guessing the on-topic question would be whether the WBM could gain support for captchas. My hunch is no, or it'd be very difficult at least. |
| 14:17:25 | <Nemo_bis> | Not really, in my opinion the on-topic question is how people can contribute items which would show up in the WBM and therefore in scholar.archive.org. :) |
| 14:18:14 | <Nemo_bis> | I can manually make WARCs and try to get them in the right collections but I'm not sure that's the best way. |
| 14:18:37 | <@JAA> | 'manually'? |
| 14:19:04 | <Nemo_bis> | With simplistic wget-based crawls for example. |
| 14:19:08 | <@JAA> | Right |
| 14:21:12 | <@JAA> | The specific captcha thingy on that domain is 'ShieldSquare Captcha' apparently. |
| 14:23:18 | <@JAA> | SPN gets an HTTP 400 somehow‽ |
| 14:24:05 | <Nemo_bis> | Maybe they additionally blocked IA networks at a lower level? |
| 14:24:20 | <@JAA> | Yeah, possibly. |
| 14:24:46 | <Nemo_bis> | HTTP headers are trollish: iop_licence_id: NO_LICENSE_IN_THE_NEW_WORLD |
| 14:33:49 | <@JAA> | Hmm, that seemed to work fine, actually: https://web.archive.org/web/20230821142328/https://iopscience.iop.org/article/10.1086/301260/pdf |
| 14:33:55 | <@JAA> | Despite the HTTP 400 message on SPN. |
| 14:34:29 | <Nemo_bis> | Curious |
| 14:35:09 | <@JAA> | The timestamp is later though, so maybe it retried after the 400. |
| 14:38:37 | <@JAA> | To be clear because 'SPN' can mean at least three things nowadays: I pasted the PDF URL into the form on https://web.archive.org/save/ with 'Save error pages' checked (not logged in, so no other checkboxes). |
| 14:46:26 | <Nemo_bis> | Ok. That's actually different from what happens if you load directly https://web.archive.org/save/https://iopscience.iop.org/article/10.1086/301260/pdf |
| 14:46:30 | <Nemo_bis> | isn't it |
| 14:46:32 | <@JAA> | Yep |
| 14:46:48 | <@JAA> | /save/URL fails on 4xx and 5xx, for one. |
| 14:46:56 | <@JAA> | As in, no snapshot is created. |
| 14:53:08 | <Nemo_bis> | Unrelatedly, https://fatcat.wiki/release/mpd2d4kyxfgp3nyz7irvvyuzfi is marked as dark preservation only even though https://web.archive.org/web/20200215180852/http://downloads.hindawi.com/archive/2012/375843.pdf is available from https://archive.org/details/OA-DOI-CRAWL-2020-02 . I guess nobody from IA is working on it any more though. |
| 14:54:01 | <Nemo_bis> | But it shows how yet another project would be to double-check which PDF URLs are already available in the WBM and not yet mapped in fatcat (before embarking on any major crawl). |
| 14:59:12 | <@JAA> | TIL fatcat, neat. |
| 15:41:55 | | AK quits [Quit: Ping timeout (120 seconds)] |
| 15:42:03 | | Craigle quits [Quit: Ping timeout (120 seconds)] |
| 15:42:13 | | that_lurker quits [Quit: Ping timeout (120 seconds)] |
| 15:42:22 | | systwi_ quits [Quit: Ping timeout (120 seconds)] |
| 15:42:27 | | nothere quits [Quit: Leaving] |
| 15:43:04 | | @arkiver quits [Quit: .] |
| 16:08:06 | | arkiver (arkiver) joins |
| 16:08:06 | | @ChanServ sets mode: +o arkiver |
| 16:08:06 | | nothere joins |
| 16:08:07 | | nothere quits [Max SendQ exceeded] |
| 16:08:43 | | nothere joins |
| 16:08:44 | | nothere quits [Max SendQ exceeded] |
| 16:08:57 | | that_lurker (that_lurker) joins |
| 16:09:14 | | AK (AK) joins |
| 16:09:26 | | Craigle (Craigle) joins |
| 16:10:43 | | nothere joins |
| 16:10:44 | | nothere quits [Max SendQ exceeded] |
| 16:11:58 | | nothere joins |
| 16:11:59 | | nothere quits [Max SendQ exceeded] |
| 16:14:29 | | nothere joins |
| 16:14:30 | | nothere quits [Max SendQ exceeded] |
| 16:18:07 | | AK quits [Client Quit] |
| 16:18:07 | | that_lurker quits [Client Quit] |
| 16:18:07 | | Craigle quits [Client Quit] |
| 16:19:22 | | @arkiver quits [Client Quit] |
| 16:24:19 | | arkiver (arkiver) joins |
| 16:24:20 | | @ChanServ sets mode: +o arkiver |
| 16:24:30 | | that_lurker (that_lurker) joins |
| 16:25:12 | | that_lurker quits [Client Quit] |
| 16:25:35 | | that_lurker (that_lurker) joins |
| 16:26:11 | | AK (AK) joins |
| 16:26:12 | | Craigle (Craigle) joins |
| 16:26:14 | | nothere joins |
| 16:26:15 | | nothere quits [Max SendQ exceeded] |
| 16:27:27 | | nothere joins |
| 16:27:28 | | nothere quits [Max SendQ exceeded] |
| 16:29:46 | | nothere joins |
| 16:29:47 | | nothere quits [Max SendQ exceeded] |
| 16:31:33 | | nothere joins |
| 16:31:33 | | nothere quits [Max SendQ exceeded] |
| 16:32:17 | | nothere joins |
| 16:32:18 | | nothere quits [Max SendQ exceeded] |
| 16:34:02 | | nothere joins |
| 16:34:03 | | nothere quits [Max SendQ exceeded] |
| 16:36:30 | | nothere joins |
| 16:36:31 | | nothere quits [Max SendQ exceeded] |
| 16:37:48 | | nothere joins |
| 16:37:49 | | nothere quits [Max SendQ exceeded] |
| 16:38:31 | | nothere joins |
| 16:38:32 | | nothere quits [Max SendQ exceeded] |
| 16:40:18 | | nothere joins |
| 16:40:19 | | nothere quits [Max SendQ exceeded] |
| 16:41:03 | | nothere joins |
| 16:41:04 | | nothere quits [Max SendQ exceeded] |
| 16:42:18 | | nothere joins |
| 16:42:19 | | nothere quits [Max SendQ exceeded] |
| 16:43:00 | | nothere joins |
| 16:43:01 | | nothere quits [Max SendQ exceeded] |
| 16:44:02 | | nothere joins |
| 16:44:03 | | nothere quits [Max SendQ exceeded] |
| 16:45:15 | | nothere joins |
| 16:45:16 | | nothere quits [Max SendQ exceeded] |
| 16:48:35 | | nothere joins |
| 16:48:36 | | nothere quits [Max SendQ exceeded] |
| 16:52:32 | | nothere joins |
| 16:52:33 | | nothere quits [Max SendQ exceeded] |
| 16:53:48 | | nothere joins |
| 16:53:49 | | nothere quits [Max SendQ exceeded] |
| 16:54:19 | | nothere joins |
| 16:54:20 | | nothere quits [Max SendQ exceeded] |
| 16:56:50 | | nothere joins |
| 16:56:50 | | nothere quits [Max SendQ exceeded] |
| 17:07:19 | | nothere joins |
| 17:07:20 | | nothere quits [Max SendQ exceeded] |
| 18:09:30 | | nothere joins |
| 19:41:44 | | andrew (andrew) joins |
| 19:50:38 | | andrew quits [Ping timeout: 252 seconds] |
| 19:52:53 | | andrew (andrew) joins |
| 20:37:13 | | systwi_ joins |
| 20:44:32 | | nicolas17 quits [Ping timeout: 252 seconds] |
| 20:49:17 | | nicolas17 joins |
| 21:35:33 | | DLoader_ joins |
| 21:38:07 | | DLoader quits [Ping timeout: 258 seconds] |
| 21:41:26 | | DLoader_ quits [Ping timeout: 265 seconds] |
| 21:49:53 | | DLoader_ joins |
| 21:49:53 | | DLoader_ is now known as DLoader |
| 22:39:26 | | DLoader quits [Ping timeout: 265 seconds] |
| 22:47:21 | | DLoader joins |
| 23:36:08 | | Barto quits [Ping timeout: 252 seconds] |
| 23:40:42 | | Exorcism quits [Client Quit] |
| 23:41:39 | | Barto (Barto) joins |
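
Nemo_bis's suggestion at 14:54:01, checking which PDF URLs already have a WBM capture before planning a crawl, could be sketched roughly as below. This is a minimal sketch, not anything used in the channel: it assumes the public Wayback availability endpoint at `https://archive.org/wayback/available` and its `archived_snapshots.closest` response shape, and the helper names `availability_query` and `closest_snapshot` are hypothetical. It only builds the query URL and parses a response body; fetching, rate limiting, and the comparison against fatcat are left out.

```python
import json
from urllib.parse import urlencode

# Assumption: the public Wayback availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the availability lookup URL for one target URL (hypothetical helper)."""
    params = {"url": url}
    if timestamp:
        # Optional YYYYMMDDhhmmss hint for which snapshot counts as closest.
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text):
    """Return the closest snapshot URL from a response body, or None if absent."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


if __name__ == "__main__":
    print(availability_query("http://downloads.hindawi.com/archive/2012/375843.pdf"))
```

A URL for which `closest_snapshot` returns a snapshot would be a candidate for mapping rather than re-crawling; whether that snapshot is actually the right PDF still needs checking separately.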