00:07:06Exorcism quits [Remote host closed the connection]
00:09:33Exorcism (exorcism) joins
00:21:05Chris5010 quits [Read error: Connection reset by peer]
00:23:07Megame (Megame) joins
00:35:20<@arkiver>sure
00:35:35<@arkiver>let me see if everything reported in here was queued (it should be)
01:03:58<@arkiver>little bit more is going through
01:46:18<@arkiver>thuban: https://transfer.archivete.am/ysWNb/pagespersoorange_claims.txt.gz
01:48:03<thuban>arkiver: cool, will do!
01:48:42<thuban>meanwhile, what about euthanizing bad items?
01:49:45<@arkiver>well sure
01:49:53<@arkiver>not sure if it matters much
01:50:05<@arkiver>whether they're in claims or the unretrievable list
01:51:24<thuban>maybe not, but it might make finding any remaining valid items quicker
01:51:53<@arkiver>perhaps
01:52:01<@arkiver>we've gone through those many times now
01:56:05<thuban>that can't quite be the case, because they spent a long time trickling in really slowly (and each new item completed was presumably processed for the first time)
02:02:27<@flashfire42>Not really because somehow we are still getting new stuff coming in
02:06:56<thuban>the only possible source of which is backfeed from those same successfully completed items, yeah?
02:07:46<@flashfire42>My presumption is some of them are being picked up by workers who are banned and just haven't switched so they keep going round and round until OH LOOK SOMEONE WHO ISN'T BANNED GOT ONE
02:08:17<thuban>well, when i do manage to get an item, i can see that most of the urls are bad
02:10:04<thuban>so i fear that it's some combination of that, and the probability of pulling an actually-valid url getting lower and lower (since the bad urls are never removed)
02:10:50decky_e joins
02:10:50systwi quits [Read error: Connection reset by peer]
02:11:42systwi (systwi) joins
02:12:45decky_e_ quits [Ping timeout: 265 seconds]
02:13:59<@flashfire42>Whats the unretrievable limit set to? maybe set it to like 1000? if a url goes around 1000 times and its still invalid its likely invalid
02:14:14<@flashfire42>once everything runs into unretrievable we queue it again until the time is over?
02:18:40<thuban>arkiver: a bunch of these are of the form 'pages.perso.rec.orange.fr/php/compteur.php?', which i'm guessing is a counter (and to which access requires authorization).
02:18:55<thuban>lua script treats them as onsite because the patterns at https://github.com/ArchiveTeam/pagespersoorange-grab/blob/bc158f9d77b085d680d079410b12f0541752f9e9/pagespersoorange.lua#L218-L226 are pretty lenient
02:19:28<thuban>should i change them to the homepages of the corresponding sites? that's likely where we got them in the first place, but if so it shouldn't cause them to get queued again, right?
02:38:44kiska quits [Ping timeout: 252 seconds]
02:38:44@flashfire42 quits [Ping timeout: 252 seconds]
02:55:24<thuban>ok, here's a funny one: 'http://mac.land.pagesperso-orange.fr/mobiland/son/Aqua(Barbiegirl)v2.0.emy%0D' is actually valid, despite the trailing cr. but the corresponding 'http://pagesperso-orange.fr/mac.land/mobiland/son/Aqua(Barbiegirl)v2.0.emy%0D' is a 500 server error
03:12:07<@arkiver>thuban: see what i wrote a little earlier - some more stuff was getting queued
03:12:42<thuban>i meant before that
03:13:36<@arkiver>setting a maxtries here
03:13:42<@arkiver>of value 25
03:14:02<@arkiver>done
03:14:24<@arkiver>thuban: if there is more stuff i should queue, please ping me and i'll put it in when i get up tomorrow
03:21:10<thuban>arkiver: deadline given is 1000 (presumably french time, so ~4.5 hours from now)
03:24:16<project10>the message reads almost like a standard maintenance outage
03:24:44<thuban>yeah, it's odd
03:25:24<thuban>do you want me to just dump what i have right now?
03:28:31<@arkiver>thuban: yes
03:29:07<@arkiver>thuban: possible in a minute or so?
03:29:30<thuban>sure
03:33:45<thuban>arkiver: https://transfer.archivete.am/ki1sJ/pagespersoorange_claims_partiallyscrubbed.txt
03:33:52<thuban>messy, not deduped, etc
03:33:53<@arkiver>thanks
03:34:32<thuban>is JAA or another tracker admin going to be around, or should i check out?
03:35:22<pokechu22>... huh, http://clubalm.pagesperso-orange.fr/1252022/t�l�chargement.jpg - this is an actual 404 onsite?
03:35:45<@arkiver>thuban: you have an invalid continuation byte in there?
03:35:50<pokechu22>interesting: anything on that site that's invalid gives a real 404
03:36:10<@arkiver>ignoring that byte for now
03:36:21<@arkiver>thuban: probably not many people around the next few hours no
03:36:31<pokechu22>(that one should be https://clubalm.pagesperso-orange.fr/1252022/t%C3%A9l%C3%A9chargement.jpg)
03:36:34<@arkiver>but just place it here for me and will queue it anyway
03:36:58<@arkiver>thuban: your list is queued. 76882 items
03:37:09<thuban>i said it was messy! dunno how that happened
03:37:11<@arkiver>they're going through nicely :)
03:37:43@arkiver is afk
03:38:00<@arkiver>thanks for all the effort here! this is also a successful project :)
03:38:35<thuban>pokechu22: yeah, a lot of them are real errors that don't correspond to anything currently accessible at all. sure makes it harder to figure out working patterns
03:38:52<thuban>arkiver: ditto!
03:39:04<fireonlive>Saved! ^_^
03:39:09<fireonlive>y’all are awesome
03:39:20<thuban>i have some errands to run just at the moment, but i'll get back to scrubbing in a bit
03:39:26<pokechu22>Most of the � ones are probably fixable, but it would require knowing french to determine what the right words should be
03:39:45<fireonlive>maybe Exorcism can help
03:40:16<fireonlive>if she's around
03:54:01<project10>well the "chargement" is "téléchargement", "download/downloading"
04:01:55<pokechu22>Yeah, that one is téléchargement - my guess is that in some situations accented characters got messed up
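[editor's note] The fix pokechu22 describes (turning the broken "téléchargement" URL into its `%C3%A9` form) can be sketched roughly as below. This is only a sketch under assumptions: the function name `reencode_path_as_utf8` is made up for illustration, Latin-1 as the original encoding is a guess, and it only helps when the original bytes survive as percent-escapes (e.g. `%E9`). Once a byte has already been replaced by �, the information is gone and someone has to guess the word.

```python
from urllib.parse import quote, unquote_to_bytes, urlsplit, urlunsplit


def reencode_path_as_utf8(url: str, fallback: str = "latin-1") -> str:
    """Re-encode a URL path as UTF-8 percent-escapes.

    Hypothetical helper: if the escaped bytes are not valid UTF-8, they are
    assumed (not known) to be in the `fallback` encoding.
    """
    parts = urlsplit(url)
    raw = unquote_to_bytes(parts.path)  # undo %XX escapes, keep raw bytes
    try:
        text = raw.decode("utf-8")      # already UTF-8: nothing to repair
    except UnicodeDecodeError:
        text = raw.decode(fallback)     # guess: legacy single-byte encoding
    # quote() emits UTF-8 escapes; note it also escapes characters such as
    # "(" and ")", so the result can differ from the input in other places.
    return urlunsplit(parts._replace(path=quote(text, safe="/")))


if __name__ == "__main__":
    print(reencode_path_as_utf8(
        "https://clubalm.pagesperso-orange.fr/1252022/t%E9l%E9chargement.jpg"
    ))
```

Whether the server actually answers on the repaired URL still has to be checked per site; as noted above, some hosts serve the escaped legacy form and some do not.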
04:05:38<fireonlive>project10: someone took french in grades 6-12
04:05:41<fireonlive>:p
04:06:47<project10>k-10, I still remember ALT-130 = é, ALT-138 = è :D
04:07:29<fireonlive>ah! :D
05:08:24sonick (sonick) joins
05:26:41<Exorcism>yep it's téléchargement
05:27:13<Exorcism>don't forget that we also use: ê and à
05:28:28<project10>ç :)
05:28:54<Exorcism>yep too
06:04:29<flashfire42|m>Um should I be worried about the amount of 4KB items? That are close to the same size and that my status=0 are being accepted by the tracker?
06:05:49<BornOn420>the complete list: ç é â ê ô û à è ù ë ï ü
06:06:13<pokechu22>Is the oe ligature still used nowadays?
06:06:55<thuban>flashfire42|m: i assume that's related to the maxtries=25 now set
06:07:01Exorcism quits [Remote host closed the connection]
06:07:39<flashfire42|m>Ok so are they gonna go into unretrievable?
06:08:11<thuban>i dunno, they appear to be counted as done >:?
06:08:24<thuban>i saw some stuff going into redo at first but there's nothing there now
06:08:58<flashfire42|m>So is it possible we are gonna miss some stuff?
06:09:23<thuban>not likely, but idk
06:11:24<flashfire42|m>Actually it looks like these are malformed urls
06:12:04<flashfire42|m>http://famille.delaye%20-%20ce%20site%20vous%20propose%20une%20decouverte%20des%20fleurs%20dans%20l'art%20et%20l'ikebana,%20un%20chemin%20de%20spiritualite%20et%20de%20theologie,%20l'evangile%20et%20les%20psaumes%20et%20des%20voyages%20en%20inde,%20a%20malte,%20a%20saint-petersbourg%20et%20dans%20le%20desert%20tunisien.pagesperso-orange.fr/
06:14:20flashfire42 joins
06:15:16<fireonlive>wb flashy
06:15:26kiska (kiska) joins
06:18:19Exorcism (exorcism) joins
06:20:19<thuban>that's what i mean--the bad ones hit maxtries
06:23:51@ChanServ sets mode: +o flashfire42
06:24:06<@flashfire42>So anyway kiska your lounge server is fried XD
06:27:14<fireonlive>flashfire42: PM me your nickserv password and i'll fix it ;)
06:27:18<fireonlive>:P
06:27:38<fireonlive>jkjkofc
06:28:34<project10>1=0 https://boussier-larequille.pagesperso-orange.fr/images/familial/mona/div_photos/therese22.jpg anyone else get a 401 for this?
06:29:27<fireonlive>401 here
06:29:47<fireonlive>/ loads fine
06:30:04<project10>seeing a number of these items, they show in the logs as =0 errors, but trying them out I see 401s
06:30:12<fireonlive>weird
06:30:26<project10>private family photos and such it looks like based on the url fragments
06:30:54<thuban>maybe more of the workers are banned than anticipated?
06:31:44<project10>https://Commisions+r%e9gionales.pagesperso-orange.fr/
06:32:10<project10>cursed hostname
06:32:43<fireonlive>my 401 is from my home connection
06:32:46<fireonlive>er
06:32:49<fireonlive>401 confirmation
06:32:57<fireonlive>which doesn't run any AT things atm
06:33:32<project10>another 0 in the logs, but 401 from home https://news.pagespro-orange.fr/l-europe-se-reunit-avec-zelensky-en-moldavie-attaque-russe-sur-kiev-CNT0000024ko6x.html
06:33:44<fireonlive>ye 401 here too
06:35:00<project10>it's basically all invalid hosts/nxdomain or 401s now from what I can see
06:36:36<project10>whole site is 401s: https://mfp64.pagesperso-orange.fr/ https://barbotiere.monsite-orange.fr/
06:38:02<project10>1=0 http://telephone..pagesperso-orange.fr/
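[editor's note] Hostnames like the ones pasted above (an empty label in `telephone..`, a `+` and a percent-escape in the "Commisions" one) could in principle be filtered out before queuing. The check below is a minimal sketch, not what the tracker or the Lua script does: `is_plausible_hostname` is a hypothetical name, and the rule used (dot-separated labels of 1 to 63 letters, digits or hyphens, no leading or trailing hyphen, at most 253 characters overall) is the usual conservative hostname syntax. It is purely syntactic and says nothing about whether the host resolves.

```python
import re

# One DNS label: 1-63 letters/digits/hyphens, not starting or ending with "-".
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_plausible_hostname(host: str) -> bool:
    """Return True if `host` looks like a syntactically valid hostname.

    Hypothetical helper: a trailing dot or an underscore is rejected here,
    which may be stricter than some real-world sites need.
    """
    if not host or len(host) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in host.split("."))


if __name__ == "__main__":
    for h in (
        "mfp64.pagesperso-orange.fr",
        "telephone..pagesperso-orange.fr",
        "Commisions+r%e9gionales.pagesperso-orange.fr",
    ):
        print(h, is_plausible_hostname(h))
```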
06:40:02<BornOn420>pokechu22: I guess so, as in œil. Disclaimer: just did a conversation course, not spelling
07:30:05Megame quits [Read error: Connection reset by peer]
07:30:14fuzzy8021 quits [Ping timeout: 252 seconds]
07:31:26fuzzy8021 (fuzzy8021) joins
07:34:50Exorcism5 (exorcism) joins
07:35:44qwertyasdfuiopghjkl quits [Remote host closed the connection]
07:36:40Exorcism quits [Read error: Connection reset by peer]
07:36:40Exorcism5 is now known as Exorcism
07:45:31levomi quits [Remote host closed the connection]
08:01:42Exorcism5 (exorcism) joins
08:02:26Exorcism quits [Read error: Connection reset by peer]
08:27:27<thuban>arkiver: https://transfer.archivete.am/13d1cG/pagespersoorange_claims_partiallyscrubbed.txt (ugh, a mess still)
08:38:01<thuban>aaand it's down
08:39:23<plcp>today was the last day iirc
08:39:27<thuban>yep
08:45:01<@flashfire42>0s and 400s across the board
09:54:58BornOn420 quits [Client Quit]
09:56:19Exorcism5 quits [Remote host closed the connection]
09:57:21Exorcism5 (exorcism) joins
10:57:02mrfooooo quits [Ping timeout: 252 seconds]
11:49:40mrfooooo joins
11:56:07sonick quits [Client Quit]
12:18:44Peroniko (Peroniko) joins
12:32:11mrfooooo quits [Ping timeout: 252 seconds]
12:51:46mrfooooo joins
14:37:02<@arkiver>yep, project done!
14:37:41<thuban>does this qualify for the full "Saved!"?
14:37:45<@arkiver>yes
14:39:59<thuban>:)
14:41:20<DLoader>nice, gj!
14:43:36<imer>well done everyone! 🎉
14:47:39<@arkiver>plcp: you were in contact with Orange about this right?
14:48:09<@arkiver>perhaps we can let them know that making a copy of the websites was successful, and that the web pages will now be available through the Wayback Machine :)
14:48:22<@arkiver>project paused.
14:50:57<phaeton>nicely done all! I assume the auto warrior project should be updated?
14:51:54<@arkiver>oh right!
14:52:10<@arkiver>done, thanks :)
15:06:34fuzzy8021 quits [Ping timeout: 265 seconds]
15:07:46fuzzy8021 (fuzzy8021) joins
15:12:42jacksonchen666 (jacksonchen666) joins
15:19:30Exorcism5 quits [Remote host closed the connection]
15:21:41Exorcism5 (exorcism) joins
15:26:17jacksonchen666 leaves
15:30:25<fireonlive>😃
15:34:32<Exorcism5>gg everyone hehe
15:34:37Exorcism5 is now known as Exorcism
15:45:49Maturion joins
16:00:30Maturion quits [Remote host closed the connection]
16:02:14Maturion joins
16:19:22<plcp>arkiver: I was in contact with someone internally, yes, someone that survived the company going private in the 2000s and sympathetic to the "archiving stuff" side
16:20:26<plcp>(given that he is the author of one of the largest "pages persos orange" on french telco history, he sure is passionate about this :D)
16:20:51<plcp>he has been informed and very happy to have pulled the strings he could :)
16:22:43<plcp>(but for what it's worth, Orange is a very large company and I think that most of their employees weren't aware that pagespersos even existed, and IMHO at best making this more official would worry someone about copyright shit & co)
16:43:23qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
17:35:12Exorcism6 (exorcism) joins
17:35:57Exorcism quits [Read error: Connection reset by peer]
18:20:48yts98 joins
18:29:50BornOn420 (BornOn420) joins
18:30:22Exorcism6 quits [Client Quit]
18:31:02Exorcism (exorcism) joins
19:19:51<@arkiver>plcp: that is very nice to hear and great to hear about his enthusiastic reactions :)
19:20:03<@arkiver>a large part of Orange history will live on in the Wayback Machine!
19:20:09<@arkiver>thank you plcp :)
19:31:50Bespork leaves
19:35:17Maturion quits [Remote host closed the connection]
21:19:16BornOn420 quits [Client Quit]
21:32:57BornOn420 (BornOn420) joins
22:25:50decky joins
22:28:49decky_e quits [Ping timeout: 265 seconds]
22:48:43decky quits [Read error: Connection reset by peer]
23:40:20decky_e joins