00:00:44Chris5010 quits [Remote host closed the connection]
00:25:50<@flashfire42>My warriors are spinning and ready
00:53:39<fireonlive>🥷🏻
01:17:48qwertyasdfuiopghjkl quits [Remote host closed the connection]
01:54:04pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
01:55:04pabs (pabs) joins
02:24:21<@arkiver>alright i think the changes as given by pokechu22 are in
02:30:15Peroniko quits [Ping timeout: 265 seconds]
02:30:43Peroniko (Peroniko) joins
02:31:34<thuban>arkiver: i don't see them on github
02:32:52<@arkiver>doing final test
02:33:08<thuban>oic
02:38:01<@arkiver>seems to be working well
02:47:58<@arkiver>thuban: pushed
02:50:49<@flashfire42>tracker limited seems slightly better than no item received
02:51:23<@arkiver>i paused it
02:51:27<@arkiver>items are being queued
02:51:47<project10>will JAA need to poke something for a docker image build?
02:52:20<pokechu22>Looks good. Theoretically we still don't need to try both protocols but doing both is still fine
02:53:04<pokechu22>arkiver: err, " local maxtries = tries - 1"?
02:53:59<pokechu22>I don't think that makes sense, as `tries > maxtries` will always be true (barring integer overflow maybe, which shouldn't matter). Though for maxtries = 1 it's probably fine
02:56:18<@arkiver>i know
02:56:25<@arkiver>it's just to not retry
02:56:38<@arkiver>while keeping original code somewhat in place for if we want to start doing more stuff with that
02:56:50<pokechu22>Does maxtries = 1 not do the same thing?
02:58:18<@arkiver>no because tries will be 1 and the condition after is a > condition
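The retry check discussed above can be sketched as follows. This is a Python rendering for illustration only (the project script is Lua), and the function name `should_give_up` is hypothetical; only `tries`, `maxtries`, and the strict `>` comparison come from the conversation.

```python
def should_give_up(tries: int, maxtries: int) -> bool:
    """Hypothetical stand-in for the `tries > maxtries` check quoted in the chat."""
    return tries > maxtries


tries = 1  # first attempt

# maxtries = tries - 1: the check is already true on the first attempt, so nothing is retried.
print(should_give_up(tries, tries - 1))

# maxtries = 1: a strict `>` does not trip while tries == 1, so one retry would still happen.
print(should_give_up(tries, 1))
```

Under these assumptions, `maxtries = tries - 1` disables retries while leaving the surrounding retry code in place, which matches the stated intent.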
02:59:49<@flashfire42>looks like we are loaded
03:09:06<thuban>i'm unsure about skipping the working redirect domains (*.orange.fr, monsite.wanadoo.fr)
03:09:09<thuban>they are low-value, but they do preserve discoverability in the wbm
03:09:40<thuban>might it be better to 'pass through' urls we receive (either manually queued or through backfeed) at the redirect domains, but not generate them from other formats?
03:09:48<@flashfire42>could queue them if we have time?
03:10:00<@flashfire42>We are already on quite borrowed time here
03:11:48<thuban>we could, but
03:16:48<thuban>manually generating and queuing an e.g. monsite.orange version of every monsite-orange url is equivalent to (but less convenient than) the original queue_all_versions logic
03:16:55<thuban>and it fails to distinguish between monsite.orange urls actually discovered in the wild (for which preserving the redirect may be useful) and those we merely conjecture (less so)
03:21:39<thuban>my previous suggestion was to put the old domains in a secondary queue, but idk whether the tracker can actually do pattern-based sorting of backfed items
03:21:51<nstrom|m>so I'm around for a little bit if this gets started within the next 45 mins or so but otherwise will need to get to bed
03:24:19<@JAA>arkiver: So, do I need to poke Drone?
03:44:12<@flashfire42>Atm it looks like its still queueing items
03:48:00<nstrom|m>oh well I'll check back in tomorrow, gl
04:00:45kiryu quits [Client Quit]
04:01:45<@arkiver>JAA: yes please
04:03:29<@arkiver>thuban: i'll add them
04:12:46<@arkiver>it's running
04:12:57<@arkiver>for now i'll keep it running on a low rate, i'm off to bed
04:14:54<thuban>sounds good!
04:15:06<@arkiver>so 1000 items/min
04:15:10<@arkiver>we'll optimize that tomorrow
04:15:26<thuban>thanks for your hard work, good night :)
04:17:58<@arkiver>well sorry for the delays
04:18:01<@arkiver>but i think we'll make it
04:18:04<@arkiver>and thanks!
04:18:10<@arkiver>good day/night to you too
04:44:46<@flashfire42>https://server8.kiska.pw/uploads/8ce843fb94aa421f/image.png
04:55:31<@flashfire42>https://server8.kiska.pw/uploads/0699e0ade83c7397/image.png
04:55:41<pokechu22>I'm seeing items on 9=0 http://perso.wanadoo.fr/haikal/_themes/zero/zerbul1a.gif - that doesn't seem particularly useful since perso.wanadoo.fr always times out
04:56:17<fireonlive>docker: Error response from daemon: manifest for atdr.meo.ws/archiveteam/pagespersoorange-grab:latest not found: manifest unknown: manifest unknown.
04:56:19<fireonlive>UwU
05:03:31kiryu (kiryu) joins
05:04:49<@flashfire42>https://transfer.archivete.am/QrrL3/Failed%20SetBadUrls%20for%20Item%20urlhttpp.txt
05:07:26<@flashfire42>Traceback (most recent call last):
05:07:26<@flashfire42> File "/usr/local/lib/python3.9/site-packages/seesaw/task.py", line 88, in enqueue
05:07:26<@flashfire42> self.process(item)
05:07:26<@flashfire42> File "<string>", line 158, in process
05:07:26<@flashfire42>ValueError: 'url:http://perso.orange.fr/imagetransfert/mur du son/mur du sonframeset-1.htm/robots.txt' is not in list
05:07:43<@flashfire42>Seems its freaking out when the URL isnt in the list?
05:12:45<pokechu22>My guess is it's the spaces maybe?
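A minimal sketch of how spaces could produce an "is not in list" error, assuming (this is not confirmed by the traceback) that one side of the comparison holds the percent-encoded item name and the other holds the raw one. The item name, the list, and `normalize` are all hypothetical.

```python
from urllib.parse import quote

# Hypothetical list of item names, stored percent-encoded.
items = ["url:http://perso.orange.fr/example/mur%20du%20son/robots.txt"]

# The same item reported back with literal spaces.
reported = "url:http://perso.orange.fr/example/mur du son/robots.txt"

try:
    items.index(reported)
except ValueError as exc:
    # list.index raises ValueError when no element compares equal.
    print("lookup failed:", exc)


def normalize(name: str) -> str:
    """Percent-encode the URL part so both spellings compare equal (an assumed fix)."""
    prefix, url = name.split(":", 1)
    # Keep ':' '/' and '%' as-is so already-encoded names are not encoded twice.
    return prefix + ":" + quote(url, safe=":/%")


print(items.index(normalize(reported)))
```

Normalizing both sides before the lookup is one possible remedy; the actual cause in the pipeline may differ.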
05:14:18<pokechu22>Here's a bigger problem:
05:14:21<pokechu22>Archiving item url:http://pagesperso-orange.fr/closdominant/copains.htm
05:14:23<pokechu22>5=301 http://pagesperso-orange.fr/closdominant/copains.htm
05:14:25<pokechu22>Server returned bad response. Skipping.
05:14:27<pokechu22>Aborting item url:http://pagesperso-orange.fr/closdominant/copains.htm.
05:15:53<pokechu22>http://pagesperso-orange.fr/xxxx will ALWAYS redirect to xxxx.pagesperso-orange.fr, but we're aborting those redirects. There's no point in grabbing the page in that case. This might also affect on-site links that redirect (I don't have any examples of these, but maybe ones to an open directory where the slash is added would have that?) (Aborting for redirects makes some sense
05:15:53<@flashfire42>at an educated guess anything "aborted" goes into the backlog
05:15:56<pokechu22>for redirects to the redirect to the 404 page though)
05:16:50<@flashfire42>and yeah I would say its the spaces making it freak out
05:17:26<pokechu22>Yeah, but we still want to save that redirect, otherwise there's no point in downloading it in the first place :)
05:19:32<thuban>https://github.com/ArchiveTeam/pagespersoorange-grab/blob/aaeb1a7b35f02c3d56d944651cc4dd7c655f9553/pagespersoorange.lua#L221 yeah, whoops
05:23:50<@flashfire42>wait so they wont be retried later?
05:23:53<@flashfire42>I am hella confused
05:49:19<@flashfire42>No fair project10 we dont all have a fleet XD
05:50:01<project10>:3
05:50:24<fireonlive>💪
05:50:35<fireonlive>did you just build the image yourself
05:50:38<project10>I think arkiver said 1k items/min is the throttle rate? we're doing like 10/min
05:50:43<project10>I did yes
05:50:49<fireonlive>ah :3
05:50:55<project10>well via docker-compose build directive
05:59:25<project10>ETA ~1000d
07:02:02magmaus3 (magmaus3) joins
07:11:51<@flashfire42>Traceback (most recent call last):
07:11:51<@flashfire42> File "/usr/local/lib/python3.9/site-packages/seesaw/task.py", line 88, in enqueue
07:11:51<@flashfire42> self.process(item)
07:11:51<@flashfire42> File "<string>", line 156, in process
07:11:51<@flashfire42> File "/usr/local/lib/python3.9/codecs.py", line 322, in decode
07:11:51<@flashfire42> (result, consumed) = self._buffer_decode(data, self.errors, final)
07:11:51<@flashfire42>UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 290: invalid continuation byte
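Byte 0xea is "ê" in Latin-1, so one plausible reading (an assumption, since the failing data is not shown) is that some output contains Latin-1 encoded French text. A minimal sketch of a lenient decoder; `decode_lenient` and the sample filename are hypothetical and not part of seesaw.

```python
def decode_lenient(data: bytes) -> str:
    """Try UTF-8 first, then fall back to Latin-1, which maps every byte to a character."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")


# "ê" encodes to the single byte 0xea in Latin-1; followed by "t" it is invalid UTF-8.
raw = "fenêtre.htm".encode("latin-1")
print(decode_lenient(raw))
```

An alternative with the same effect on the crash is `data.decode("utf-8", errors="replace")`, at the cost of losing the original characters.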
07:59:29Peroniko quits [Client Quit]
08:47:52qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
11:55:27<phaeton>was a docker container built for this?
11:55:49<phaeton>image
12:03:11Exorcism quits [Read error: Connection reset by peer]
12:06:06Exorcism (exorcism) joins
12:26:43<nulldata>phaeton - doesn't seem so yet
12:33:09<imer>Im sure we'll speed up plenty with docker and once this is set to default warrior project
12:35:16<phaeton>looks like a high percentage of my items are getting dumped due to that line 88 error. Seems to be unable to match spacing in one direction or another
12:36:49<phaeton>https://transfer.archivete.am/3AZPk/line88.txt
12:44:47Exorcism quits [Remote host closed the connection]
12:47:10Exorcism (exorcism) joins
13:32:41<phaeton>Same line, different error. Looks similar to one earlier in the chat. https://transfer.archivete.am/FFij7/line88UnicodeDecodeError
13:36:42<imer>arkiver: ^
13:44:26Maturion joins
13:47:21Maturion quits [Remote host closed the connection]
15:01:26<project10>Archiving item url:http://pagesperso-orange.fr/philippe.dornbusch/analyses/games/1?1/=2<30444=4243>4b.<.a-,.-?*=...<+,=+)<
15:07:15<imer>JAA: reminder to poke ci :)
15:07:45<@JAA>imer: I did, but it's not working correctly.
15:07:59<imer>ok, thanks!
15:08:20<imer>I assume you've tried poking it harder?
15:08:24<imer>longer stick?
15:08:51<@JAA>I think I need a sharper one instead.
15:08:59<imer>ooh, good idea
15:20:35<nulldata>Maybe needs to be slapped around a bit with a large trout?
15:45:09<@arkiver>pokechu22: so perso.wanadoo.fr is completely gone? what else is completely gone?
15:45:45<@arkiver>fixes coming
16:15:57<project10>arkiver: pro.wanadoo.fr is also NXDOMAIN
16:25:32<pokechu22>arkiver: perso.wanadoo.fr is completely gone, and I guess pro is too, but http://monsite.wanadoo.fr/ *isn't*. It was like this before the original deadline as well. (Though the content that was on perso.wanadoo.fr is still on pagesperso-orange.fr.)
16:26:34<pokechu22>The last capture IA has for perso.wanadoo.fr is in May 2023
16:33:43Exorcism0 (exorcism) joins
16:34:27Exorcism quits [Read error: Connection reset by peer]
16:34:27Exorcism0 is now known as Exorcism
16:50:34<Exorcism>https://thelounge.exorcism.repl.co/uploads/97722943a80c6a32/image.png 😭
17:03:38<nulldata>Exorcism - yeah, JA A is finding a sharper stick to poke drone with as of this morning
17:03:58VoynichCR (VoynichCR) joins
17:07:25<fireonlive>🔪
17:32:43<Exorcism>👹
17:45:10VoynichCR quits [Remote host closed the connection]
18:07:59<@arkiver>pokechu22: "i guess pro is too" - can you be specific?
18:08:09<@arkiver>pro.wanadoo.fr?
18:08:13<@arkiver>because i'll filter these out now
18:09:16<pokechu22>Yeah, http://pro.wanadoo.fr/ and http://perso.wanadoo.fr/ should be filtered out
18:09:46<pokechu22>They're already marked as "skip" in the script though, so I guess that's just not working?
18:10:54<pokechu22>The other thing I noticed is that the script aborted http://pagesperso-orange.fr/closdominant/copains.htm because it gave a 301, but it was a 301 to https://closdominant.pagesperso-orange.fr/copains.htm which is the kind of 301 we definitely want to be saving (otherwise there's no point in doing requests to http://pagesperso-orange.fr/xxx as all of those will redirect)
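The redirect described above maps the path form onto the subdomain form. A minimal sketch of that mapping, assuming the site name is always the first path segment; `path_to_subdomain` is a hypothetical helper, not a function from the project script, and query strings are ignored for brevity.

```python
from urllib.parse import urlsplit


def path_to_subdomain(url: str, scheme: str = "https") -> str:
    """Rewrite http://pagesperso-orange.fr/<site>/<rest> to <site>.pagesperso-orange.fr/<rest>."""
    parts = urlsplit(url)
    # First path segment is taken as the site name, the remainder as the page path.
    site, _, rest = parts.path.lstrip("/").partition("/")
    if parts.netloc != "pagesperso-orange.fr" or not site:
        raise ValueError("not a path-form URL: " + url)
    return f"{scheme}://{site}.{parts.netloc}/{rest}"


print(path_to_subdomain("http://pagesperso-orange.fr/closdominant/copains.htm"))
```

The target scheme is a parameter because the log shows redirects to both `http://` and `https://` subdomain URLs.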
18:11:23@flashfire42 quits [Client Quit]
18:11:23kiska quits [Client Quit]
18:14:12flashfire42 joins
18:15:26kiska (kiska) joins
18:24:30flashfire42 quits [Client Quit]
18:24:31kiska quits [Client Quit]
18:50:23Peroniko (Peroniko) joins
18:56:35flashfire42 joins
18:57:37kiska (kiska) joins
19:03:43Exorcism quits [Remote host closed the connection]
19:04:31Exorcism (exorcism) joins
19:28:12Exorcism quits [Remote host closed the connection]
19:28:50Exorcism (exorcism) joins
19:29:08Exorcism quits [Remote host closed the connection]
19:29:51Exorcism (exorcism) joins
20:13:44Chris5010 (Chris5010) joins
20:15:51Chris5010 quits [Client Quit]
20:25:20Chris5010 (Chris5010) joins
21:03:33<@arkiver>pokechu22: the "skip" is for something else
21:03:44<@arkiver>so we have directly queued URLs from your dumps, and URLs discovered through others
21:03:55<@arkiver>those marked "skip" will not be queued from other URLs
21:04:08<@arkiver>(while still seen as 'being part' of the project)
21:04:28<pokechu22>Alright, that makes sense
21:05:39<pokechu22>Are pages that don't exist/aren't valid still recorded into the WARC (e.g. https://aquilon.pagesperso-orange.fr/)? It feels like it would be useful to save that into the WBM especially if you queued all of the stuff in my files already (recording that a file existed in the past but not now is still interesting)
21:08:40<@arkiver>pokechu22: to confirm - a ban is not a 301 to some "ban page" right?
21:08:49<pokechu22>Right
21:08:49<@arkiver>if i remember correctly you said they just don't respond
21:09:15<pokechu22>The 301s to error pages are normal behavior for 404s and 401s/403s
21:09:24<pokechu22>it just won't respond or will refuse connections when banned
21:09:28<@arkiver>alright we'll accept all 301s as being fine
21:09:31<@arkiver>good
21:10:24<pokechu22>There's also 302s, not 100% sure what the pattern is between 301 and 302
21:12:01<pokechu22>might just be that 301 goes to a new location that's valid (or that's the page for the site having moved to a new location), while 302s go to the 404/403 page (example of a 302 to the 403 page: http://paroisse.wambrechies.pagesperso-orange.fr/Files/Image/Nouveau%20dossiervie%20tous%20les%20jours/presbytere%20izi.jpg)
21:12:27<@arkiver>yeah
21:12:31<@arkiver>we'll accept both 301 and 302
21:14:17colona quits [Ping timeout: 252 seconds]
21:14:33colona (colona) joins
21:20:55<@arkiver>pokechu22: do we still need to queue all versions if there is a redirect to the 404 page?
21:21:21<pokechu22>No, that's probably not necessary
21:21:26<@arkiver>alright
21:23:18<@arkiver>pokechu22: do all types go to https://r.orange.fr/r/Oerreur_404 in case of a 404?
21:23:39<pokechu22>I'm pretty sure they do
21:23:47<pokechu22>though there's a complication to that
21:23:50<pokechu22>of course
21:24:11<@arkiver>of course :P
21:25:13<pokechu22>http://pagesperso-orange.fr/convoi/css/index.css doesn't exist, but http://perso.orange.fr/convoi/css/index.css redirects to http://convoi.perso.orange.fr/ (but interestingly, http://convoi.perso.orange.fr/css/index.css does just give a redirect to the 404 directly? Going to try a second one to get more data)
21:26:10<pokechu22>I guess relatedly: http://18-25ans.perso.orange.fr/ existing does not in any way imply that https://18-25ans.pagesperso-orange.fr/ exists :|
21:28:19<pokechu22>ok this is dumb: http://ovine.sngtv.pagesperso-orange.fr/Enterites%20infectieuses.pdf exists
21:28:22<pokechu22>http://pagesperso-orange.fr/ovine.sngtv/Enterites%20infectieuses.pdf redirect to http://ovine.sngtv.pagesperso-orange.fr/Enterites%20infectieuses.pdf
21:28:24<pokechu22>http://perso.orange.fr/ovine.sngtv/Enterites%20infectieuses.pdf redirect to http://ovine.sngtv.perso.orange.fr/
21:28:26<pokechu22>http://ovine.sngtv.perso.orange.fr/Enterites%20infectieuses.pdf 404 despite existing on http://pagesperso-orange.fr/
21:31:17<@arkiver>pokechu22: so... reading that we should still queue all versions even if we get a redirect to a 404
21:31:37<pokechu22>Yeah :|
21:32:05<pokechu22>at least for ones on perso.orange.fr
21:33:00<pokechu22>It's probably not super useful to queue URLs on perso.orange.fr if the site doesn't explicitly link into those forms, but those that have already been found need to be queued into the other forms even if they give a 404 or redirect on perso.orange.fr
21:33:25<pokechu22>err, "give a 404" won't happen, rather if they redirect to the front page or if they redirect to the 404 page
21:34:30<pokechu22>But just doing everything is fine too, assuming we have time, which may or may not be the case :|
21:40:26<@arkiver>what a mess
21:43:05<nstrom|m>what's the docker image for this one called? the one I tried didn't work
21:43:17<@arkiver>that is unfortunately still having problems :/
21:43:23<@arkiver>JAA: we did not find a solution yet right?
21:44:48<nstrom|m>I can build it myself, just wasn't sure how the warrior users were getting it but I guess that doesn't use docker, it gets w git
21:44:58<@arkiver>excactly, yes
21:45:04<@arkiver>exactly*
21:45:06<@arkiver>update is coming
21:50:16<@JAA>arkiver: Nope :-|
21:50:32<@JAA>Maybe another commit will fix it.
21:50:42<@JAA>Which it seems is coming anyway.
21:51:15<@arkiver>yep
21:58:39<@arkiver>JAA: it's building
21:58:48<@JAA>:-)
21:59:06<@arkiver>this is now the warrior default
22:04:33<@arkiver>imer: do you think 4 seconds pause on a non-200 is safe?
22:06:01<@arkiver>pokechu22: i'm thinking of not queuing all versions anymore
22:06:34<imer>arkiver: i'd go higher maybe, not sure how many error redirects we're expecting though
22:06:51<imer>start off high and lower it once we know how things are going; that's safer than getting everyone banned :D
22:07:08<@arkiver>right i'm at 6 second now for non-200
22:07:11<@arkiver>2 second for 200
22:07:28<pokechu22>arkiver: that's probably fine, queueing all versions isn't super important, but queueing the right version from a URL in an older format is pretty important
22:07:44<pokechu22>e.g. going from http://ovine.sngtv.perso.orange.fr/Enterites%20infectieuses.pdf to http://ovine.sngtv.pagesperso-orange.fr/Enterites%20infectieuses.pdf is important but not the other way around
22:07:59<@arkiver>hmm okey
22:08:12anewarchiverlol2 joins
22:08:47<pokechu22>and as for http versus https, I'm still fairly confident in the rule of whether or not a dot is present determining whether the site redirects to or from https, but some of the pagespro stuff complicates that
22:09:02<@arkiver>pokechu22: in https://github.com/ArchiveTeam/pagespersoorange-grab/blob/master/pagespersoorange.lua#L140-L159 can you please indicate which i should mark "skip"?
22:09:13<@arkiver>"skip" means we will not queue this URL if another version of it is found
22:10:05<pokechu22>Probably "perso.orange", "monsite.orange", and "pro.orange"/"pros.orange"
22:10:19<pokechu22>... and "monsite.wanadoo"
22:10:45<@arkiver>okey then we're left with queuing _to_
22:10:50<@JAA>(Permalink for future reference: https://github.com/ArchiveTeam/pagespersoorange-grab/blob/48fc3b422ca345fb3fb14c855304efc0e96bfd69/pagespersoorange.lua#L140-L159 )
22:10:56<@arkiver>pagesperso-orange
22:11:01<@arkiver>monsite-orange
22:11:06<@arkiver>pagespro-orange
22:11:11<@arkiver>and the three *.pagespro-orange
22:11:14<pokechu22>Yeah
22:11:18<@arkiver>JAA: what key do i have to press again to get that?
22:11:21<pokechu22>y
22:11:22<@arkiver>that URL
22:11:25<@arkiver>ah
22:11:26<@arkiver>thanks
22:11:27<nstrom|m>does it make sense to run this one at any higher than 1 concurrency w the throttling?
22:11:35<@arkiver>pokechu22: alright that change is rolling out now
22:11:37<@arkiver>nstrom|m: no
22:11:41<@arkiver>nstrom|m: we need IPs here
22:11:45<nstrom|m>thought so
22:12:16<@arkiver>update is in for that
22:12:34<@arkiver>nstrom|m: if you have multiple concurrent this will just increase the sleep time per request
22:12:58<thuban>arkiver: with this change, will we still retrieve (eg) monsite.orange urls queued manually and/or through backfeed?
22:13:15<@arkiver>thuban: yes
22:13:38<@arkiver>but we will not during archiving queue the domains pokechu22 just listed
22:13:41<thuban>ok, cool
22:13:49<@arkiver>(we will still accept them though as being part of the project)
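The "skip" versus "queue" behaviour agreed above can be sketched like this. The labels are the ones quoted in the chat; the table layout and the `should_queue` helper are hypothetical, the real table lives in pagespersoorange.lua and may differ, and the three *.pagespro-orange entries are left out because their exact names are not given at this point.

```python
# Labels as quoted in the chat; "queue" formats may be generated from other versions of a URL.
FORMATS = {
    "pagesperso-orange": "queue",
    "monsite-orange": "queue",
    "pagespro-orange": "queue",
    "perso.orange": "skip",
    "monsite.orange": "skip",
    "pro.orange": "skip",
    "pros.orange": "skip",
    "monsite.wanadoo": "skip",
}


def should_queue(fmt: str, discovered_directly: bool) -> bool:
    """Directly queued or backfed URLs are always accepted; derived ones only for "queue" formats."""
    if fmt not in FORMATS:
        return False  # not part of the project at all
    return discovered_directly or FORMATS[fmt] == "queue"


print(should_queue("monsite.orange", discovered_directly=True))
print(should_queue("monsite.orange", discovered_directly=False))
```

This keeps URLs found in the wild on the old domains while avoiding conjectured ones, which is the distinction thuban asked about.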
22:16:48<@arkiver>pokechu22: since they opened it up again for us, could we ask them to lift the rate limiting?
22:17:18<pokechu22>Maybe? Let me double check who contacted support...
22:17:30<@arkiver>imer: on the 200, how confident are you that 1 second is fine?
22:18:12<thuban>pokechu22: it was plcp
22:19:30<@arkiver>project10: nice speed :)
22:19:56<imer>arkiver: not very, I can retest if you'd like
22:20:36<@arkiver>we can give it a day and see how this progresses before we do risky stuff
22:20:37<imer>I did get one ip banned if I remember right, although the 2nd at 1s run didnt
22:20:49<@arkiver>i have a 6 second sleep at non-200
22:20:53<@arkiver>do you think that should be safe?
22:21:12<imer>Maybe? I don't think I tested higher than 4 for error redirects
22:21:24<@arkiver>and you were banned at 4?
22:21:28<imer>yeah
22:21:34<imer>thats only hitting error redirects though
22:21:42<@arkiver>hmm okey
22:21:48<@arkiver>we're hitting a few of those
22:22:03<imer>I can also test non-error redirects, could only delay for error ones if they are the limit
22:22:13<@arkiver>yes please!
22:22:21<@arkiver>if you can that would be greatly appreciated
22:22:27<@arkiver>we do have the most of that i believe
22:22:36<imer>sure. will report back in an hour or so, give this time to run
22:22:51<@arkiver>thanks a lot!
22:29:46sepro (sepro) joins
22:30:47BornOn420 (BornOn420) joins
22:31:34@JAA sets the topic to: Pages Perso Orange: 1 concurrency recommended, strict rate limits, needs lots of IPs | Finding ISP web hosting services before the Grim Reaper finds them. | https://archiveteam.org/index.php?title=ISP_Hosting
22:35:04<Ryz>o#o;
22:36:21<fireonlive>owo
22:37:13<Ryz>Any idea when the Pages Perso Orange stuff is gonna shut down? Apparently it's supposed to shut down on 2023 September 05, but it didn't happen yet?
22:37:38<imer>Ryz: there was an extension until the 5th? of october
22:37:41<@arkiver>pokechu22: how important are the mairie, assoc, and ecole URLs?
22:37:53<@arkiver>i wonder if we can skip them too
22:38:04<@arkiver>if yes, we have only one URL per 'type' left that will be queued
22:38:20<pokechu22>I've seen them all in the wild
22:38:51<pokechu22>we don't necessarily need to queue between them, but sites using one form should probably have all of the pages in that form
22:38:53<@arkiver>right, but should we convert URLs _to_ them as well?
22:39:18<@arkiver>all of the pages - i guess those would be linked from each other right? if yes, they would already be found and queued that way
22:39:20<pokechu22>It's probably not needed
22:39:42<pokechu22>on the other hand there's a lot less content in the pros section in the first place so it's probably not *that* important
22:39:57<@arkiver>shall i set in that third 'type' on pagespro-orange to "queue" and the other 3 after that to "skip"?
22:40:19<@arkiver>(sorry that was a badly written sentence - maybe still clear)
22:40:22<pokechu22>Yeah, that's probably fine. Maybe come back and do the other ones if we have more time?
22:40:30<@arkiver>that might be problematic
22:40:41<project10>arkiver: you awoke the sleeping DLoader
22:40:50<@arkiver>project10: indeed :)
22:40:56<@arkiver>now we're all doomed
22:41:17<pokechu22>The main reason why mairie/assoc/ecole matter is for determining whether to use http versus https, which we're just doing both of anyways so that doesn't really matter I think
22:41:19<@arkiver>almost 4k items/min now :)
22:41:19<DLoader>:D
22:41:23<imer>guess you'll have to scale up more project10, the battle is on >:D
22:41:55<DLoader>I'm missing a /26 :/
22:41:56<@arkiver>the battle of biggest IP ranges :P
22:42:12<fireonlive>time to start bgp hijacking
22:42:18<@arkiver>DLoader: did you check your pockets?
22:42:42<project10>it's nice to see the ETA less than two years
22:42:50<DLoader>lol
22:42:55<DLoader>will get it online tomorrow
22:43:27<imer>so it _was_ in your pockets?
22:43:46<DLoader>maybe
22:43:47<@arkiver>:P
22:43:50<fireonlive>*checks where i am on the leaderboard*
22:43:51<fireonlive>oof
22:44:17<fireonlive>need me an /8
22:44:37<project10>fireonlive: https://kagi.com/proxy/ignorance-is-bliss-cypher.gif
22:44:41<project10>err :/
22:44:46<project10>https://c.tenor.com/RZ1wlnUXbskAAAAC/ignorance-is-bliss-cypher.gif
22:44:56<fireonlive>true :D
22:48:27<anewarchiverlol2>Sorry if I'm in the wrong spot to ask (I just started today): if I need to stop running the Warrior appliance, I should press "shutdown" and wait for it to say it is finished before stopping, right? I think I understand this, but I want to check so I do it right.
22:48:46<@arkiver>anewarchiverlol2: that is the nice way yeah!
22:49:07<@arkiver>anewarchiverlol2: if you make it stop immediately though, it will not be too bad
22:49:35<imer>^ uncompleted items will get retried eventually
22:49:49<imer>no bans so far, which is odd. I expected at least one
22:50:01<@arkiver>imer: sounds good
22:50:37<@arkiver>so we have 3 or 4 types of URLs:
22:50:38<project10>I didn't see the topic, am running conc 20, not seeing anything odd
22:50:38<@arkiver> - 200
22:50:41<anewarchiverlol2>Thank you, arkiver
22:50:50<@arkiver> - 3xx to 404
22:50:51<@JAA>project10: It'll throttle itself automatically.
22:51:01<@arkiver> - 3xx to https version from http
22:51:12<project10>JAA: throttle on what status code though, if any?
22:51:15<@arkiver> - 3xx from x.domain.fr to domain.fr/x/ version
22:51:41<@arkiver>project10: 2*concurrency on 200
22:51:51<@arkiver>project10: 6*concurrency on non-200
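The throttle described above scales the sleep with the concurrency. A minimal sketch, assuming the factors 2 and 6 are in seconds (as the earlier "6 second" and "2 second" remarks suggest); both function names are hypothetical.

```python
def sleep_seconds(status_code: int, concurrency: int) -> int:
    """Sleep after each request: 2 * concurrency on a 200, 6 * concurrency otherwise."""
    factor = 2 if status_code == 200 else 6
    return factor * concurrency


def requests_per_second_per_ip(status_code: int, concurrency: int) -> float:
    """Approximate combined rate of all workers on one IP, ignoring request duration."""
    return concurrency / sleep_seconds(status_code, concurrency)


for conc in (1, 20):
    print(conc, sleep_seconds(200, conc), requests_per_second_per_ip(200, conc))
```

Because the concurrency cancels out, the per-IP rate stays at roughly one request every 2 seconds (200) or 6 seconds (non-200) regardless of concurrency, which is why more IPs help and more concurrency does not.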
22:52:04<@arkiver>imer: if checking if we can lower the factor we multiply with the concurrency
22:52:18<imer>the inverse as well http://pagesperso-orange.fr/dupui/wow/trombinoscope_wow_038.htm redirects to http://dupui.pagesperso-orange.fr/... (and then 404, but thats beside the point :D)
22:52:25<@arkiver>oh yeah!
22:52:45<@arkiver>so for all those types i could add separate sleeping times if we know what the rates are there
22:54:31<imer>there's two(?) error pages I've seen https://r.orange.fr/r/Oerreur_40X redirects to https://e.orange.fr/error40X.html (replacing X with the code)
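The response types listed above, together with the two error-page locations, could be told apart roughly as follows. This is a sketch under assumptions: the category names and `classify` are hypothetical, and the prefixes are derived from the error URLs mentioned in the chat.

```python
from urllib.parse import urlsplit

# Error-page locations as described in the chat (X replaced by the status digit).
ERROR_PREFIXES = ("https://r.orange.fr/r/Oerreur_", "https://e.orange.fr/error")


def classify(status: int, url: str, location: str = "") -> str:
    """Sort a response into one of the cases that might get its own sleep time."""
    if status == 200:
        return "ok"
    if not 300 <= status < 400:
        return "other"
    if location.startswith(ERROR_PREFIXES):
        return "redirect-to-error"
    src, dst = urlsplit(url), urlsplit(location)
    if src.hostname == dst.hostname and src.scheme == "http" and dst.scheme == "https":
        return "http-to-https"
    if src.hostname != dst.hostname:
        return "host-change"  # e.g. path form <-> subdomain form
    return "other-redirect"


print(classify(301, "http://pagesperso-orange.fr/closdominant/copains.htm",
               "https://closdominant.pagesperso-orange.fr/copains.htm"))
```

Separate categories would allow a separate sleep per case once the tolerated rates are known.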
22:55:45<pokechu22>I don't think I've seen any examples of 3xx from x.domain.fr to domain.fr/x/ version, only the other direction, but not 100% sure
22:55:59<imer>https://r.orange.fr/r/Oerreur_404 doesnt seem to like curl, always redirects me to 403, but when I open that in browser I end up on 404
22:56:01<project10>is it just my eyes, or is todo:backfeed growing faster than todo queue is emptying?
22:56:29<imer>project10: yes, ~2.6x faster currently
22:58:15<@arkiver>imer: yeah on the two error pages
23:03:35<fireonlive>alert: the NotAlexes are multiplying 😱
23:05:03<imer>current status, all unbanned, testing:
23:05:03<imer>http://fabienne.oreb.pagesperso-orange.fr/photos/andalousie/af_andalousie.html 200 on (req/min) 90, 70, 60, 50
23:05:03<imer>http://imagesdeparfums.perso.orange.fr/Cartier/TN_SoPretty98.JPG redirect to 404 on 12, 10, 6, 4
23:05:03<imer>http://pagesperso-orange.fr/dupui/wow/trombinoscope_wow_038.htm normal redirect (not following the 2nd/3rd one to 404 here) on 90, 70, 60, 30
23:06:27<imer>gonna try an error page on a high rate to see if bans are still working..
23:10:44<@arkiver>well that sounds pretty promising
23:16:37<imer>5=400 http://baggio%20.monsite-orange.fr/ yep, thats bad request indeed :D
23:16:57<@arkiver>whoops..
23:16:59<imer>well, when I tested last I would have been banned by now
23:17:09<imer>so, uh, not sure what that means
23:17:20<pokechu22>Yeah, the list of URLs includes some junk like that :|
23:17:32<@arkiver>pokechu22: i tried to filter some junk out
23:17:36<@arkiver>but there may still be left
23:17:49<pokechu22>I think I did include ones with %20 like that filtered out but I'm not 100% sure
23:18:19<imer>don't have any other 400 in my logs so should be fine
23:18:43<pokechu22>Yeah, https://baggio.monsite-orange.fr is in some of the lists
23:18:48<@arkiver>well interesting, maybe increased the allowed rate
23:23:08<imer>been trying the error page on 5req/s for over 15min now and nothing, took ~7min at 1req/s last time
23:23:28<imer>going to stop that experiment now I think, is a bit rude
23:24:08<imer>yeah, no bans. weird
23:25:01<@arkiver>imer: please feel free to continue the experiment
23:25:13<imer>unless they do ip reputation stuff, not sure which ones I used last time around
23:25:24<@arkiver>if we increase the limits on our side, they'll get more requests anyway
23:26:07<imer>anything you'd like me to try in particular?
23:26:22<@arkiver>i guess just the 4 different cases
23:26:34<@arkiver>i could try lowering the timeout if you see no bans at all...
23:31:04<imer>bans *are* still working, my experimental vm just got banned (running 40x conc 1 containers on one ip)
23:32:05<imer>took about 20min
23:39:12Exorcism quits [Remote host closed the connection]
23:40:00Exorcism (exorcism) joins
23:44:19octylFractal|m joins
23:45:42<imer>running (different urls) for 200 in req/min: 600, 300, 150, 60. 404 redirect 120, 60, 30, 15. normal redirect: 600 300 200 60
23:46:09<imer>got a ban on 2/s 404 redirect for https://tatudream.pagesperso-orange.fr
23:46:59<imer>after 460s
23:47:58<project10>imer: what do bans look like here? 429, dropped SYNs, or RST?
23:48:04<imer>connection timeouts
23:48:33<project10>thanx
23:49:24<imer>1/s error got banned after 680s
23:51:41<project10>too bad they aren't listening on ipv6 >:P
23:54:37<@arkiver>:P
23:58:47<imer>0.5/s error got banned after 1200s
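The three ban timings reported for error redirects can be turned into approximate request counts. A small worked calculation using only the figures quoted above; the helper name is hypothetical, and three samples are too few to pin down the actual limit.

```python
def requests_before_ban(rate_per_second: float, seconds_until_ban: float) -> float:
    """Approximate number of requests sent before the ban, assuming a steady rate."""
    return rate_per_second * seconds_until_ban


# (requests per second, seconds until ban) as reported in the chat
observations = [(2.0, 460), (1.0, 680), (0.5, 1200)]

for rate, seconds in observations:
    print(rate, seconds, requests_before_ban(rate, seconds))
```

The counts come out at roughly 920, 680 and 600 requests, so the ban does not look like a simple fixed request budget; slower rates last longer in time but are still banned eventually.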