| 00:07:06 | | Exorcism quits [Remote host closed the connection] |
| 00:09:33 | | Exorcism (exorcism) joins |
| 00:21:05 | | Chris5010 quits [Read error: Connection reset by peer] |
| 00:23:07 | | Megame (Megame) joins |
| 00:35:20 | <@arkiver> | sure |
| 00:35:35 | <@arkiver> | let me see if everything reported in here was queued (it should be) |
| 01:03:58 | <@arkiver> | little bit more is going through |
| 01:46:18 | <@arkiver> | thuban: https://transfer.archivete.am/ysWNb/pagespersoorange_claims.txt.gz |
| 01:48:03 | <thuban> | arkiver: cool, will do! |
| 01:48:42 | <thuban> | meanwhile, what about euthanizing bad items? |
| 01:49:45 | <@arkiver> | well sure |
| 01:49:53 | <@arkiver> | not sure if it matters much |
| 01:50:05 | <@arkiver> | whether they're in claims or the unretrievable list |
| 01:51:24 | <thuban> | maybe not, but it might make finding any remaining valid items quicker |
| 01:51:53 | <@arkiver> | perhaps |
| 01:52:01 | <@arkiver> | we've gone through those many times now |
| 01:56:05 | <thuban> | that can't quite be the case, because they spent a long time trickling in really slowly (and each new item completed was presumably processed for the first time) |
| 02:02:27 | <@flashfire42> | Not really because somehow we are still getting new stuff coming in |
| 02:06:56 | <thuban> | the only possible source of which is backfeed from those same successfully completed items, yeah? |
| 02:07:46 | <@flashfire42> | My presumption is some of them are being picked up by workers who are banned and just haven't switched so they keep going round and round until OH LOOK SOMEONE WHO ISN'T BANNED GOT ONE |
| 02:08:17 | <thuban> | well, when i do manage to get an item, i can see that most of the urls are bad |
| 02:10:04 | <thuban> | so i fear that it's some combination of that, and the probability of pulling an actually-valid url getting lower and lower (since the bad urls are never removed) |
| 02:10:50 | | decky_e joins |
| 02:10:50 | | systwi quits [Read error: Connection reset by peer] |
| 02:11:42 | | systwi (systwi) joins |
| 02:12:45 | | decky_e_ quits [Ping timeout: 265 seconds] |
| 02:13:59 | <@flashfire42> | What's the unretrievable limit set to? maybe set it to like 1000? if a url goes around 1000 times and it's still invalid it's likely invalid |
| 02:14:14 | <@flashfire42> | once everything runs into unretrievable we queue it again until the time is over? |
| 02:18:40 | <thuban> | arkiver: a bunch of these are of the form 'pages.perso.rec.orange.fr/php/compteur.php?', which i'm guessing is a counter (and to which access requires authorization). |
| 02:18:55 | <thuban> | lua script treats them as onsite because the patterns at https://github.com/ArchiveTeam/pagespersoorange-grab/blob/bc158f9d77b085d680d079410b12f0541752f9e9/pagespersoorange.lua#L218-L226 are pretty lenient |
| 02:19:28 | <thuban> | should i change them to the homepages of the corresponding sites? that's likely where we got them in the first place, but if so it shouldn't cause them to get queued again, right? |
| 02:38:44 | | kiska quits [Ping timeout: 252 seconds] |
| 02:38:44 | | @flashfire42 quits [Ping timeout: 252 seconds] |
| 02:55:24 | <thuban> | ok, here's a funny one: 'http://mac.land.pagesperso-orange.fr/mobiland/son/Aqua(Barbiegirl)v2.0.emy%0D' is actually valid, despite the trailing cr. but the corresponding 'http://pagesperso-orange.fr/mac.land/mobiland/son/Aqua(Barbiegirl)v2.0.emy%0D' is a 500 server error |
| 03:12:07 | <@arkiver> | thuban: see what i wrote a little earlier - some more stuff was getting queued |
| 03:12:42 | <thuban> | i meant before that |
| 03:13:36 | <@arkiver> | setting a maxtries here |
| 03:13:42 | <@arkiver> | of value 25 |
| 03:14:02 | <@arkiver> | done |
| 03:14:24 | <@arkiver> | thuban: if there is more stuff i should queue, please ping me and i'll put it in when i get up tomorrow |
| 03:21:10 | <thuban> | arkiver: deadline given is 1000 (presumably french time, so ~4.5 hours from now) |
| 03:24:16 | <project10> | the message reads almost like a standard maintenance outage |
| 03:24:44 | <thuban> | yeah, it's odd |
| 03:25:24 | <thuban> | do you want me to just dump what i have right now? |
| 03:28:31 | <@arkiver> | thuban: yes |
| 03:29:07 | <@arkiver> | thuban: possible in a minute or so? |
| 03:29:30 | <thuban> | sure |
| 03:33:45 | <thuban> | arkiver: https://transfer.archivete.am/ki1sJ/pagespersoorange_claims_partiallyscrubbed.txt |
| 03:33:52 | <thuban> | messy, not deduped, etc |
| 03:33:53 | <@arkiver> | thanks |
| 03:34:32 | <thuban> | is JAA or another tracker admin going to be around, or should i check out? |
| 03:35:22 | <pokechu22> | ... huh, http://clubalm.pagesperso-orange.fr/1252022/t�l�chargement.jpg - this is an actual 404 onsite? |
| 03:35:45 | <@arkiver> | thuban: you have an invalid continuation byte in there? |
| 03:35:50 | <pokechu22> | interesting: anything on that site that's invalid gives a real 404 |
| 03:36:10 | <@arkiver> | ignoring that byte for now |
| 03:36:21 | <@arkiver> | thuban: probably not many people around the next few hours no |
| 03:36:31 | <pokechu22> | (that one should be https://clubalm.pagesperso-orange.fr/1252022/t%C3%A9l%C3%A9chargement.jpg) |
| 03:36:34 | <@arkiver> | but just place it here for me and will queue it anyway |
| 03:36:58 | <@arkiver> | thuban: your list is queued. 76882 items |
| 03:37:09 | <thuban> | i said it was messy! dunno how that happened |
| 03:37:11 | <@arkiver> | they're going through nicely :) |
| 03:37:43 | | @arkiver is afk |
| 03:38:00 | <@arkiver> | thanks for all the effort here! this is also a successful project :) |
| 03:38:35 | <thuban> | pokechu22: yeah, a lot of them are real errors that don't correspond to anything currently accessible at all. sure makes it harder to figure out working patterns |
| 03:38:52 | <thuban> | arkiver: ditto! |
| 03:39:04 | <fireonlive> | Saved! ^_^ |
| 03:39:09 | <fireonlive> | y’all are awesome |
| 03:39:20 | <thuban> | i have some errands to run just at the moment, but i'll get back to scrubbing in a bit |
| 03:39:26 | <pokechu22> | Most of the � ones are probably fixable, but it would require knowing french to determine what the right words should be |
| 03:39:45 | <fireonlive> | maybe Exorcism can help |
| 03:40:16 | <fireonlive> | if she's around |
| 03:54:01 | <project10> | well the "chargement" is "téléchargement", "download/downloading" |
| 04:01:55 | <pokechu22> | Yeah, that one is téléchargement - my guess is that in some situations accented characters got messed up |
| 04:05:38 | <fireonlive> | project10: someone took french in grades 6-12 |
| 04:05:41 | <fireonlive> | :p |
| 04:06:47 | <project10> | k-10, I still remember ALT-130 = é, ALT-138 = è :D |
| 04:07:29 | <fireonlive> | ah! :D |
| 05:08:24 | | sonick (sonick) joins |
| 05:26:41 | <Exorcism> | yep it's téléchargement |
| 05:27:13 | <Exorcism> | don't forget that we also use: ê and à |
| 05:28:28 | <project10> | ç :) |
| 05:28:54 | <Exorcism> | yep too |
| 06:04:29 | <flashfire42|m> | Um should I be worried about the amount of 4KB items? That are close to the same size and that my status=0 are being accepted by the tracker? |
| 06:05:49 | <BornOn420> | the complete list: ç é â ê ô û à è ù ë ï ü |
| 06:06:13 | <pokechu22> | Is the oe ligature still used nowadays? |
| 06:06:55 | <thuban> | flashfire42|m: i assume that's related to the maxtries=25 now set |
| 06:07:01 | | Exorcism quits [Remote host closed the connection] |
| 06:07:39 | <flashfire42|m> | Ok so are they gonna go into unretrievable? |
| 06:08:11 | <thuban> | i dunno, they appear to be counted as done >:? |
| 06:08:24 | <thuban> | i saw some stuff going into redo at first but there's nothing there now |
| 06:08:58 | <flashfire42|m> | So is it possible we are gonna miss some stuff? |
| 06:09:23 | <thuban> | not likely, but idk |
| 06:11:24 | <flashfire42|m> | Actually it looks like these are malformed urls |
| 06:12:04 | <flashfire42|m> | http://famille.delaye%20-%20ce%20site%20vous%20propose%20une%20decouverte%20des%20fleurs%20dans%20l'art%20et%20l'ikebana,%20un%20chemin%20de%20spiritualite%20et%20de%20theologie,%20l'evangile%20et%20les%20psaumes%20et%20des%20voyages%20en%20inde,%20a%20malte,%20a%20saint-petersbourg%20et%20dans%20le%20desert%20tunisien.pagesperso-orange.fr/ |
| 06:14:20 | | flashfire42 joins |
| 06:15:16 | <fireonlive> | wb flashy |
| 06:15:26 | | kiska (kiska) joins |
| 06:18:19 | | Exorcism (exorcism) joins |
| 06:20:19 | <thuban> | that's what i mean--the bad ones hit maxtries |
| 06:23:51 | | flashfire42 is now authenticated as flashfire42 |
| 06:23:51 | | @ChanServ sets mode: +o flashfire42 |
| 06:24:06 | <@flashfire42> | So anyway kiska your lounge server is fried XD |
| 06:27:14 | <fireonlive> | flashfire42: PM me your nickserv password and i'll fix it ;) |
| 06:27:18 | <fireonlive> | :P |
| 06:27:38 | <fireonlive> | jkjkofc |
| 06:28:34 | <project10> | 1=0 https://boussier-larequille.pagesperso-orange.fr/images/familial/mona/div_photos/therese22.jpg anyone else get a 401 for this? |
| 06:29:27 | <fireonlive> | 401 here |
| 06:29:47 | <fireonlive> | / loads fine |
| 06:30:04 | <project10> | seeing a number of these items, they show in the logs as =0 errors, but trying them out I see 401s |
| 06:30:12 | <fireonlive> | weird |
| 06:30:26 | <project10> | private family photos and such it looks like based on the url fragments |
| 06:30:54 | <thuban> | maybe more of the workers are banned than anticipated? |
| 06:31:44 | <project10> | https://Commisions+r%e9gionales.pagesperso-orange.fr/ |
| 06:32:10 | <project10> | cursed hostname |
| 06:32:43 | <fireonlive> | my 401 is from my home connection |
| 06:32:46 | <fireonlive> | er |
| 06:32:49 | <fireonlive> | 401 confirmation |
| 06:32:57 | <fireonlive> | which doesn't run any AT things atm |
| 06:33:32 | <project10> | another 0 in the logs, but 401 from home https://news.pagespro-orange.fr/l-europe-se-reunit-avec-zelensky-en-moldavie-attaque-russe-sur-kiev-CNT0000024ko6x.html |
| 06:33:44 | <fireonlive> | ye 401 here too |
| 06:35:00 | <project10> | it's basically all invalid hosts/nxdomain or 401s now from what I can see |
| 06:36:36 | <project10> | whole site is 401s: https://mfp64.pagesperso-orange.fr/ https://barbotiere.monsite-orange.fr/ |
| 06:38:02 | <project10> | 1=0 http://telephone..pagesperso-orange.fr/ |
| 06:40:02 | <BornOn420> | pokechu22: I guess so, as in œil. Disclaimer: just did a conversation course, not spelling |
| 07:30:05 | | Megame quits [Read error: Connection reset by peer] |
| 07:30:14 | | fuzzy8021 quits [Ping timeout: 252 seconds] |
| 07:31:26 | | fuzzy8021 (fuzzy8021) joins |
| 07:34:50 | | Exorcism5 (exorcism) joins |
| 07:35:44 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 07:36:40 | | Exorcism quits [Read error: Connection reset by peer] |
| 07:36:40 | | Exorcism5 is now known as Exorcism |
| 07:45:31 | | levomi quits [Remote host closed the connection] |
| 08:01:42 | | Exorcism5 (exorcism) joins |
| 08:02:26 | | Exorcism quits [Read error: Connection reset by peer] |
| 08:27:27 | <thuban> | arkiver: https://transfer.archivete.am/13d1cG/pagespersoorange_claims_partiallyscrubbed.txt (ugh, a mess still) |
| 08:38:01 | <thuban> | aaand it's down |
| 08:39:23 | <plcp> | today was the last day iirc |
| 08:39:27 | <thuban> | yep |
| 08:45:01 | <@flashfire42> | 0s and 400s across the board |
| 09:54:58 | | BornOn420 quits [Client Quit] |
| 09:56:19 | | Exorcism5 quits [Remote host closed the connection] |
| 09:57:21 | | Exorcism5 (exorcism) joins |
| 10:57:02 | | mrfooooo quits [Ping timeout: 252 seconds] |
| 11:49:40 | | mrfooooo joins |
| 11:56:07 | | sonick quits [Client Quit] |
| 12:18:44 | | Peroniko (Peroniko) joins |
| 12:32:11 | | mrfooooo quits [Ping timeout: 252 seconds] |
| 12:51:46 | | mrfooooo joins |
| 14:37:02 | <@arkiver> | yep, project done! |
| 14:37:41 | <thuban> | does this qualify for the full "Saved!"? |
| 14:37:45 | <@arkiver> | yes |
| 14:39:59 | <thuban> | :) |
| 14:41:20 | <DLoader> | nice, gj! |
| 14:43:36 | <imer> | well done everyone! 🎉 |
| 14:47:39 | <@arkiver> | plcp: you were in contact with Orange about this right? |
| 14:48:09 | <@arkiver> | perhaps we can let them know that making a copy of the websites was successful, and that the web pages will now be available through the Wayback Machine :) |
| 14:48:22 | <@arkiver> | project paused. |
| 14:50:57 | <phaeton> | nicely done all! I assume the auto warrior project should be updated? |
| 14:51:54 | <@arkiver> | oh right! |
| 14:52:10 | <@arkiver> | done, thanks :) |
| 15:06:34 | | fuzzy8021 quits [Ping timeout: 265 seconds] |
| 15:07:46 | | fuzzy8021 (fuzzy8021) joins |
| 15:12:42 | | jacksonchen666 (jacksonchen666) joins |
| 15:19:30 | | Exorcism5 quits [Remote host closed the connection] |
| 15:21:41 | | Exorcism5 (exorcism) joins |
| 15:26:17 | | jacksonchen666 leaves |
| 15:30:25 | <fireonlive> | 😃 |
| 15:34:32 | <Exorcism5> | gg everyone hehe |
| 15:34:37 | | Exorcism5 is now known as Exorcism |
| 15:45:49 | | Maturion joins |
| 16:00:30 | | Maturion quits [Remote host closed the connection] |
| 16:02:14 | | Maturion joins |
| 16:19:22 | <plcp> | arkiver: I was in contact with someone internally, yes, someone that survived the company going private in the 2000s and sympathetic to the "archiving stuff" side |
| 16:20:26 | <plcp> | (given that he is the author of one of the largest "pages persos orange" on french telco history, he sure is passionate about this :D) |
| 16:20:51 | <plcp> | he has been informed and very happy to have pulled the strings he could :) |
| 16:22:43 | <plcp> | (but for what it's worth, Orange is a very large company and I think that most of their employees weren't aware that pagespersos even existed, and IMHO at best making this more official would worry someone about copyright shit & co) |
| 16:43:23 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 17:35:12 | | Exorcism6 (exorcism) joins |
| 17:35:57 | | Exorcism quits [Read error: Connection reset by peer] |
| 18:20:48 | | yts98 joins |
| 18:29:50 | | BornOn420 (BornOn420) joins |
| 18:30:22 | | Exorcism6 quits [Client Quit] |
| 18:31:02 | | Exorcism (exorcism) joins |
| 19:19:51 | <@arkiver> | plcp: that is very nice to hear and great to hear about his enthusiastic reactions :) |
| 19:20:03 | <@arkiver> | a large part of Orange history will live on in the Wayback Machine! |
| 19:20:09 | <@arkiver> | thank you plcp :) |
| 19:31:50 | | Bespork leaves |
| 19:35:17 | | Maturion quits [Remote host closed the connection] |
| 21:19:16 | | BornOn420 quits [Client Quit] |
| 21:32:57 | | BornOn420 (BornOn420) joins |
| 22:25:50 | | decky joins |
| 22:28:49 | | decky_e quits [Ping timeout: 265 seconds] |
| 22:48:43 | | decky quits [Read error: Connection reset by peer] |
| 23:40:20 | | decky_e joins |
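
A rough sketch of the kind of scrubbing discussed in the log above (malformed hostnames such as `telephone..pagesperso-orange.fr`, and paths whose accented characters needed UTF-8 percent-encoding). This is not the script anyone in the channel used; the function names `hostname_is_plausible` and `encode_path_utf8` are hypothetical, and the hostname rule is an assumed, simplified reading of DNS label syntax.

```python
import re
from urllib.parse import quote, urlsplit

# Assumed rule for one DNS label: 1-63 letters, digits or hyphens,
# with no hyphen at either end.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def hostname_is_plausible(url: str) -> bool:
    """Return True if every dot-separated label of the host looks well formed."""
    host = urlsplit(url).hostname
    if not host:
        return False
    # An empty label (two dots in a row) or a label containing characters
    # such as '+', '%' or spaces fails the pattern.
    return all(_LABEL.match(label) for label in host.split("."))


def encode_path_utf8(path: str) -> str:
    """Percent-encode non-ASCII path characters as UTF-8 bytes.

    '/' and '%' are left alone so existing escapes are not double-encoded.
    """
    return quote(path, safe="/%")


if __name__ == "__main__":
    for candidate in (
        "https://clubalm.pagesperso-orange.fr/1252022/x.jpg",
        "http://telephone..pagesperso-orange.fr/",
        "https://Commisions+r%e9gionales.pagesperso-orange.fr/",
    ):
        print(hostname_is_plausible(candidate), candidate)
    print(encode_path_utf8("/1252022/téléchargement.jpg"))
```

A check like this only catches hostnames that could never resolve; it says nothing about 401s or real 404s, which still need an actual request to detect.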