| 00:42:58 | | testttt quits [Ping timeout: 265 seconds] |
| 01:47:44 | | Naruyoko quits [Ping timeout: 265 seconds] |
| 01:57:55 | | Exorcism quits [Remote host closed the connection] |
| 01:59:09 | | Exorcism (exorcism) joins |
| 02:02:00 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 04:13:49 | <@arkiver> | up and running again! |
| 04:18:05 | <thuban> | arkiver: returning nil from queue_all if the site slug matches *\.(mairie|assoc|ecole) fails to queue any of those sites (in any variant). that's not the intended behavior, is it? |
| 04:19:43 | <@arkiver> | no |
| 04:19:52 | <@arkiver> | see where the queue_all function is called |
| 04:20:25 | <@arkiver> | hmm |
| 04:20:32 | <@arkiver> | actually i see |
| 04:20:45 | <@arkiver> | let me just strip it |
| 04:20:48 | <thuban> | yeah |
| 04:20:53 | <@arkiver> | good point |
| 04:29:46 | <@arkiver> | thuban: done |
| 04:31:11 | <@arkiver> | DLoader: project10: fuzzy8021: FYI we're back up! |
| 04:31:25 | <@arkiver> | according to findings by imer limits will be lowered tomorrow |
| 04:31:40 | <project10> | ty, updating ;) |
| 04:32:03 | <@arkiver> | project10: you have no auto-update stuff in place right? |
| 04:32:55 | <project10> | I have a few machines that were building the image manually, most with watchtower |
| 04:32:56 | <@arkiver> | i'll be off now |
| 04:33:02 | <@arkiver> | project10: alright! |
| 04:33:19 | <@arkiver> | so i'll ping you then in case of important updates |
| 04:33:39 | <project10> | nah, I'll just switch to mainstream updates. I just did that to jump the gun before the drone/image-poker got poked. |
| 04:33:51 | <@arkiver> | ah! i see |
| 04:34:04 | <project10> | (I guess warrior can git pull the grab project itself? People's warriors seemed to start working right away) |
| 04:34:23 | <@arkiver> | yeah, it auto updates |
| 04:34:31 | <@arkiver> | pulls the repo actually, not the image |
| 04:35:08 | <thuban> | arkiver: do we want to restrict stripping to the case where sub == "pagespro-orange"? theoretically possible that people could use those suffixes on the other domains |
| 04:35:59 | <thuban> | ah yep, there's at least one online http://kerglaw.ecole.pagesperso-orange.fr/ |
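A minimal sketch of the behavior discussed above, assuming "strip" means removing the `.mairie`/`.assoc`/`.ecole` suffix from the site slug instead of returning nil, and only when `sub` is `pagespro-orange`. The function name `normalize_slug` and its signature are hypothetical, not taken from the project's grab script.

```python
import re

# Suffixes named in the discussion; anchored at the end of the slug.
SUFFIX = re.compile(r"\.(mairie|assoc|ecole)$")

def normalize_slug(slug: str, sub: str) -> str:
    """Strip the category suffix only for pagespro-orange slugs (assumed rule)."""
    if sub == "pagespro-orange":
        return SUFFIX.sub("", slug)
    # Other domains may legitimately use these words in the slug,
    # e.g. kerglaw.ecole.pagesperso-orange.fr, so leave them untouched.
    return slug

print(normalize_slug("mairie-ballainvilliers.mairie", "pagespro-orange"))
print(normalize_slug("kerglaw.ecole", "pagesperso-orange"))
```

The point of the `sub` check is that the slug is still queued in every case; only the pagespro-orange form is rewritten.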
| 04:37:02 | <project10> | arkiver: thx for your efforts getting this going again |
| 04:41:15 | <@arkiver> | project10: thanks |
| 04:41:19 | <@arkiver> | thuban: yeah |
| 04:50:23 | <@arkiver> | thuban: also in |
| 04:50:29 | <@arkiver> | project10: updated again :) |
| 04:51:40 | <project10> | just as all the containers finally spun up ;) |
| 04:51:47 | <@arkiver> | ohno :P |
| 04:51:53 | <thuban> | lgtm! |
| 04:51:56 | <project10> | (back on watchtower mainline) |
| 04:52:00 | <thuban> | thanks arkiver :) |
| 04:52:10 | <@arkiver> | thanks for the watchful eye thuban |
| 04:52:24 | <@arkiver> | project10: so do i still need to ping in case of an update? |
| 04:53:14 | <project10> | nope, thanks for the ping. Maybe you could ping if you are making a major adjustment to the rates/limits such that people with weird setups might get banned :) |
| 04:53:30 | <@arkiver> | alright! |
| 04:53:31 | <@arkiver> | and yeah |
| 04:53:33 | <project10> | ref imer's findings, not sure what that's about |
| 04:53:46 | <@arkiver> | we'll put that in tomorrow |
| 04:54:34 | <project10> | were the (mairie|assoc|ecole) items removed from backfeed queue? or did my eyes deceive, and they were never there? |
| 04:56:12 | <@arkiver> | they're being moved away |
| 05:07:13 | <project10> | 3=200 https://mairie-ballainvilliers.mairie.pagespro-orange.fr/style.css |
| 05:07:17 | <project10> | cool :) |
| 05:55:34 | | yts98 leaves |
| 05:55:53 | | yts98 joins |
| 05:59:49 | | Exorcism quits [Remote host closed the connection] |
| 06:01:50 | | Exorcism (exorcism) joins |
| 06:51:52 | | Exorcism2 (exorcism) joins |
| 06:53:27 | | Exorcism quits [Read error: Connection reset by peer] |
| 06:53:27 | | Exorcism2 is now known as Exorcism |
| 07:40:34 | | wickedplayer494 quits [Ping timeout: 265 seconds] |
| 09:13:21 | | toss (toss) joins |
| 09:35:39 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 09:38:21 | | Exorcism quits [Remote host closed the connection] |
| 09:39:07 | | Exorcism (exorcism) joins |
| 10:11:53 | | Exorcism quits [Remote host closed the connection] |
| 10:12:39 | | Exorcism (exorcism) joins |
| 10:30:39 | | Hans5958 (Hans5958) joins |
| 11:06:14 | | Peroniko (Peroniko) joins |
| 11:25:42 | | kallemarc joins |
| 11:27:26 | | kallemarc quits [Remote host closed the connection] |
| 11:27:47 | | toss quits [Read error: Connection reset by peer] |
| 11:38:02 | | levomi joins |
| 12:06:53 | | sonick (sonick) joins |
| 12:39:40 | | Maturion joins |
| 12:57:20 | | Peroniko quits [Ping timeout: 252 seconds] |
| 12:58:05 | | Peroniko (Peroniko) joins |
| 14:03:33 | <nulldata> | found item url:https://sibel.reunion.pagesperso-orange.fr///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////parapharmacie/parapharmacie_kln.html |
| 14:04:05 | <nulldata> | How many slashes could one need? lol |
| 14:05:29 | <imer> | Yes |
| 14:23:19 | <imer> | 4-5 day eta at current speed, so gotta go a bit faster yet ;) |
| 14:30:59 | <@arkiver> | nulldata: ouch - i'll add a little extra rule for that |
| 14:56:28 | | Exorcism quits [Remote host closed the connection] |
| 14:57:58 | | Exorcism (exorcism) joins |
| 15:29:25 | <@arkiver> | imer: with your research, what would be safe sleeping times? |
| 15:29:29 | <@arkiver> | we now sleep 2 seconds on 200 |
| 15:29:33 | <@arkiver> | and 6 seconds on non-200 |
| 15:33:00 | <imer> | arkiver: I'd probably keep the 6s for error redirects (if you have access to that info), but everything else should be fine with barely any sleeping, maybe half a second or less (I was doing 10 req/s without seeing bans)? |
| 15:33:13 | <imer> | Just checking my banned ips, those are unbanned again |
| 15:33:24 | <imer> | bit late, meant to check those at the 24h mark |
| 15:34:30 | <@arkiver> | i can reduce the sleep on 200 to 1 second |
| 15:34:48 | <@arkiver> | if we keep the non-200 sleep on 6 seconds, it won't change much since the majority we go through is non-200 |
| 15:35:04 | <imer> | can you sleep depending on if it's an error redirect or do you not have the info there? |
| 15:35:22 | <@arkiver> | i have that info! |
| 15:35:25 | <@arkiver> | so yes we can do that |
| 15:35:40 | <imer> | let's do that then :) |
| 15:36:07 | <imer> | 6s only for error redirect and then everything else low (1s for a start I guess? although I think we can go lower) |
| 15:36:13 | <@arkiver> | alright! |
| 15:54:49 | | Exorcism quits [Remote host closed the connection] |
| 15:54:56 | | yts98 leaves |
| 15:55:11 | | yts98 joins |
| 15:55:32 | | Exorcism (exorcism) joins |
| 15:59:40 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 16:25:52 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 16:38:56 | | threedeeitguy397 (threedeeitguy) joins |
| 16:40:38 | | threedeeitguy39 quits [Ping timeout: 252 seconds] |
| 16:45:02 | | threedeeitguy397 quits [Ping timeout: 252 seconds] |
| 16:52:21 | | threedeeitguy39 (threedeeitguy) joins |
| 17:17:47 | <pokechu22> | ArchiveBot reduces all repeated slashes to just 1 slash so we probably never noticed that (archivebot's behavior is bad for URLs that contain other URLs in them since it converts https://example.com/proxy/https://example.com into https://example.com/proxy/https:/example.com which some things don't like, but that's probably not a concern here) |
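To illustrate the slash-collapsing behavior described above, here is a rough sketch that reduces runs of slashes in the URL path to a single slash. It is not ArchiveBot's actual code; `collapse_slashes` is a hypothetical helper used only to reproduce the two cases mentioned in the channel.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def collapse_slashes(url: str) -> str:
    """Collapse repeated slashes in the path component only."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))

# Helpful for the sibel.reunion case ...
print(collapse_slashes(
    "https://sibel.reunion.pagesperso-orange.fr////parapharmacie/parapharmacie_kln.html"))
# ... but it damages a URL embedded in the path.
print(collapse_slashes("https://example.com/proxy/https://example.com"))
```

The second call shows the downside pokechu22 mentions: the embedded `https://` loses a slash.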
| 17:29:01 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 17:54:57 | | flashfire42 joins |
| 17:56:03 | | kiska (kiska) joins |
| 17:58:29 | <BornOn420> | 20 separate docker instances here for 4 hours already without bans. |
| 17:59:02 | <pokechu22> | BornOn420: On the same IP? |
| 18:01:37 | <pokechu22> | and actually separate instances, as in they don't know about each other, instead of using the standard concurrency mechanism? (Using the standard one adjusts the delays for this project to compensate) |
| 18:05:59 | <BornOn420> | yes, one IP, all concurrency=1 |
| 18:06:47 | <pokechu22> | Hmm |
| 18:11:26 | | kiska5 joins |
| 18:13:51 | <imer> | 40 resulted in a ban, so don't go too high :D |
| 18:36:50 | <BornOn420> | I do see an increase in 'bad response' answers, so no ban, but not all roses either. |
| 19:11:25 | <@arkiver> | update is in |
| 19:11:28 | <@arkiver> | to reduce sleep times |
| 19:11:42 | <@arkiver> | project10: DLoader: FYI ^ |
| 19:12:47 | <@arkiver> | 6 second sleep for redirect to 404 |
| 19:12:49 | <@arkiver> | else 1 second |
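The sleep rule announced above can be sketched as follows. How the project actually detects a "redirect to 404" is not stated in the log, so `is_error_redirect` is a hypothetical stand-in (a 3xx response whose `Location` contains `404`); only the 6 second and 1 second values come from the discussion.

```python
from typing import Optional

ERROR_REDIRECT_SLEEP = 6.0  # seconds, for a redirect to the 404 page
DEFAULT_SLEEP = 1.0         # seconds, for everything else

def is_error_redirect(status_code: int, location: Optional[str]) -> bool:
    # Assumption: an error redirect is a 3xx whose target mentions "404".
    return 300 <= status_code < 400 and location is not None and "404" in location

def sleep_time(status_code: int, location: Optional[str] = None) -> float:
    """Return how long to wait after a response under the announced rule."""
    if is_error_redirect(status_code, location):
        return ERROR_REDIRECT_SLEEP
    return DEFAULT_SLEEP

print(sleep_time(200))
print(sleep_time(302, "https://example.com/404.html"))
```

Compared with the earlier rule (2 seconds on 200, 6 seconds on any non-200), only the error redirects keep the long delay, which is why throughput rises when most responses are non-200.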
| 19:28:40 | | wickedplayer494 joins |
| 19:28:42 | | wickedplayer494 is now authenticated as wickedplayer494 |
| 19:37:23 | <imer> | the speeeed :D |
| 19:37:35 | <@arkiver> | :) |
| 19:42:11 | | Peroniko quits [Ping timeout: 265 seconds] |
| 19:42:39 | | Peroniko (Peroniko) joins |
| 19:42:39 | | Peroniko quits [Max SendQ exceeded] |
| 19:48:28 | | tzt quits [Ping timeout: 265 seconds] |
| 19:51:20 | | tzt (tzt) joins |
| 20:08:48 | | flashfire42 is now authenticated as flashfire42 |
| 20:08:48 | | @ChanServ sets mode: +o flashfire42 |
| 20:51:48 | | sonick quits [Client Quit] |
| 21:09:21 | <imer> | 2.5day eta at current rate, more like it |
| 21:20:03 | <levomi> | arkiver : ////// url's still show up in my logs : Archiving item url:http://pagesperso-orange.fr/sibel.reunion//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// |
| 21:20:03 | <levomi> | //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////images/gammes_produits/gam_carasa.jpg |
| 21:25:34 | <BornOn420> | Item url:https://pagesperso-orange.fr/sibel.reunion////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////images |
| 21:25:34 | <BornOn420> | /logos/logo_oligo.jpg is aborted. |
| 21:26:12 | <pokechu22> | If they're immediately aborted that seems plausible |
| 21:26:29 | | Carnildo joins |
| 21:29:54 | <levomi> | Ah, i assumed that they would be filtered/corrected at the tracker side |
| 21:29:58 | <BornOn420> | Aha of course. That's the output of 'Starting SetBadUrl' |
| 21:45:27 | <@JAA> | Maybe they should be filtered out tracker-side instead? |
| 21:45:43 | <@arkiver> | i'll add that in a bit, finishing up something else |
| 21:54:52 | | Exorcism quits [Remote host closed the connection] |
| 21:55:37 | | Exorcism (exorcism) joins |
| 21:57:36 | <@arkiver> | they're being filtered out now |
| 21:57:45 | <@arkiver> | periodically |
| 21:57:52 | <@arkiver> | at //// |
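A small sketch of what filtering "at ////" could look like, assuming it means dropping any queued item whose URL contains four or more consecutive slashes. The tracker's real filter is not shown in the log; `filter_items` is a hypothetical name.

```python
SLASH_RUN = "////"  # threshold mentioned in the channel

def filter_items(items: list[str]) -> list[str]:
    """Keep only items without a run of four or more slashes."""
    return [item for item in items if SLASH_RUN not in item]

queue = [
    "url:https://sibel.reunion.pagesperso-orange.fr/parapharmacie/parapharmacie_kln.html",
    "url:https://sibel.reunion.pagesperso-orange.fr//////parapharmacie/parapharmacie_kln.html",
]
print(filter_items(queue))
```

The `://` after the scheme has only two slashes, so ordinary URLs are not affected by this check.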
| 22:09:36 | <pokechu22> | Oh, are you doing anything to normalize for sites that use backslashes instead of front slashes? |
| 22:09:58 | <pokechu22> | e.g. https://hansi.pagespro-orange.fr/ |
| 22:10:14 | <pokechu22> | IIRC the pages don't work if you use backslashes, only if you use forward slashes, but I'm not 100% sure of that |
| 22:14:07 | | kalle joins |
| 22:32:19 | | kalle quits [Remote host closed the connection] |
| 22:38:02 | | tarsubo joins |
| 22:39:22 | | tarsubo quits [Remote host closed the connection] |
| 22:50:11 | | Exorcism quits [Remote host closed the connection] |
| 22:50:52 | | Exorcism (exorcism) joins |
| 23:29:11 | <thuban> | i believe the repeated slashes are due to ~malformed paths in the page source |
| 23:29:16 | <thuban> | eg, on sibel.reunion, a bunch of the img srcs begin with "..//" |
| 23:29:22 | <thuban> | as a result the second slash is captured as part of the `rest` group |
| 23:29:26 | <thuban> | and as it's collapsed by the browser but distinct to the deduper, the crawl just keeps bouncing back and forth adding more slashes |
| 23:30:21 | <thuban> | replacing the `rest`-preceding '/' with '/+' should fix this i think |
| 23:31:17 | <thuban> | ("by the browser"--well, you know what i mean) |
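thuban's suggested fix can be illustrated with a pair of regular expressions. The project's real pattern is not quoted in the log, so both patterns below are hypothetical; they only show how a single `/` before the `rest` group lets an extra slash leak into the capture, and how `/+` absorbs it.

```python
import re
from typing import Optional

# Hypothetical patterns: one slash vs. one-or-more slashes before `rest`.
STRICT = re.compile(r"^https?://(?P<site>[^/]+)/(?P<rest>.*)$")
LENIENT = re.compile(r"^https?://(?P<site>[^/]+)/+(?P<rest>.*)$")

def rest_of(pattern: re.Pattern, url: str) -> Optional[str]:
    """Return the captured `rest` group, or None if the URL does not match."""
    m = pattern.match(url)
    return m.group("rest") if m else None

url = "http://site.example//images/logo.jpg"  # as produced by a "..//" img src
print(rest_of(STRICT, url))   # keeps the stray leading slash
print(rest_of(LENIENT, url))  # stray slash absorbed before `rest`
```

With the strict pattern every extra slash yields a new, distinct `rest` value for the deduper, which matches the ever-growing slash runs seen in the logs.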
| 23:35:49 | | Maturion quits [Remote host closed the connection] |
| 23:53:45 | <@arkiver> | since when does Pages Perso Orange exist? and what is a reference for this? |