00:00:10sec^nd quits [Remote host closed the connection]
00:00:56sec^nd (second) joins
00:28:55Umbire quits [Remote host closed the connection]
00:29:24Umbire (Umbire) joins
00:34:10xarph quits [Quit: ZNC 1.8.2+deb2ubuntu0.1 - https://znc.in]
00:36:08xarph joins
02:48:14lemuria (lemuria) joins
02:51:12archiveDrill quits [Quit: The Lounge - https://thelounge.chat]
03:17:41<pabs>https://discourse.ubuntu.com/t/introducing-debcrafters/63674
03:20:18<nicolas17>pabs: sounds like an easier way to get paid to work on Ubuntu than Canonical's hiring process :D
03:20:58<pabs>sounds like you'd still have to go through that process to get paid?
03:21:06<nicolas17>ah I misread the "paid to contribute one day per week" part
03:21:20<pabs>just with the extra requirement of having to already be a volunteer Ubuntu contributor first
03:21:21<nicolas17>that's "apart from their regular job doing Ubuntu stuff"
03:22:08<pabs>did you see this? https://dustri.org/b/my-experience-with-canonicals-interview-process.html
03:22:44<nicolas17>yes
03:22:47<nicolas17>and others
03:22:54<nicolas17>I have yet to see anyone talk about a positive experience
03:23:27<pabs>I hear people already there enjoy the job
03:26:26<nicolas17>yeah you just have to subject yourself to that hazing process to maybe get in
03:28:02<nicolas17>it sounds like Mark Shuttleworth should be tied to the same rocket as Elon Musk and Mark Zuckerberg and aimed in the general direction of Mars
03:29:00<BlankEclair>he's named shuttleworth of all things
03:31:34HackMii quits [Ping timeout: 264 seconds]
03:31:52<pabs>TBH, if I could skip working on the Ubuntu Archive, and only do FOSS work upstream on Debian and other projects, the hazing might be tempting...
03:32:05HackMii (hacktheplanet) joins
03:36:42<BlankEclair>> So I exercised my GDPR rights, and asked to be communicated everything pertaining to my interviews.
03:36:45<BlankEclair>holy shit that's nice
03:45:32<pabs>https://www.nullpt.rs/reversing-botid
03:52:14Umbire quits [Ping timeout: 260 seconds]
04:11:37Guest58 joins
04:12:53Guest58 quits [Client Quit]
04:13:51HackMii quits [Remote host closed the connection]
04:13:51HackMii_ (hacktheplanet) joins
04:40:48Guest58 joins
04:53:29jinn6 quits [Ping timeout: 260 seconds]
04:55:26jinn6 joins
04:58:49Lord_Nightmare quits [Quit: ZNC - http://znc.in]
05:02:11Lord_Nightmare (Lord_Nightmare) joins
05:16:06archiveDrill joins
05:25:36Guest58 quits [Client Quit]
05:44:08Umbire (Umbire) joins
05:46:14Guest58 joins
06:09:19Umbire quits [Ping timeout: 260 seconds]
06:18:14Guest58 quits [Client Quit]
06:19:33Guest58 joins
06:26:10trix quits [Quit: o7 lain.ripe.net]
06:30:39trix (trix) joins
06:43:20Umbire (Umbire) joins
07:05:13Guest58 quits [Client Quit]
07:05:54Guest58 joins
07:07:41Umbire quits [Ping timeout: 276 seconds]
07:10:34Guest58 quits [Ping timeout: 260 seconds]
08:04:30Guest58 joins
08:29:07Guest58 quits [Client Quit]
08:31:59Guest58 joins
09:34:39ducky quits [Ping timeout: 260 seconds]
09:35:01ducky (ducky) joins
09:40:45Dada joins
11:00:01Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
11:02:56Bleo182600722719623455222 joins
11:14:39Guest58 quits [Client Quit]
11:18:47Guest58 joins
12:42:26yasomi quits [Ping timeout: 276 seconds]
12:46:06yasomi (yasomi) joins
12:46:06Medowar quits [Read error: Connection reset by peer]
12:46:48Medowar joins
12:51:50<@JAA>I received a 'reaction' email from GMail for the first time. Thunderbird classified it as spam.
12:57:24<jinn6>sounds about right
12:57:35<jinn6>also no topic set in here
12:57:54katia pulls hexa_'s tail
12:58:14<jinn6>or in #archiveteam-bs for that matter, lol
13:05:23Guest58 quits [Client Quit]
13:06:42<nulldata>https://blog.cloudflare.com/introducing-pay-per-crawl/
13:07:03steering wonders which server(s) lost the topic
13:07:24<katia>aaaaaaaaaaaaaaaaaaaaaaaaa clownflare
13:30:01<masterx244|m>s/clownflare/buttflare/g
13:31:45Guest58 joins
13:36:54Dada quits [Remote host closed the connection]
13:38:10<jinn6>clownflare is great
13:38:33<jinn6>apparently I'm on vindobona.hackint.org and that doesn't have topic
13:42:03Guest58 quits [Client Quit]
13:46:57ducky quits [Remote host closed the connection]
13:47:55ducky (ducky) joins
13:48:16ducky quits [Read error: Connection reset by peer]
13:50:39ducky (ducky) joins
13:52:01ducky quits [Read error: Connection reset by peer]
13:53:48ducky (ducky) joins
13:53:49Guest58 joins
13:56:20Umbire (Umbire) joins
14:04:48Guest58 quits [Client Quit]
14:19:15HackMii_ quits [Remote host closed the connection]
14:19:35HackMii (hacktheplanet) joins
14:42:04<IDK>https://www.wired.com/story/cloudflare-blocks-ai-crawlers-default/
14:42:17<IDK>The end for SPN? :(
14:45:58<katia>fck cld
14:47:51<IDK>--🤡🔥
14:48:23<IDK>🤡🔥--
14:48:24<eggdrop>[karma] '🤡🔥' now has -1 karma!
14:48:30<IDK>there
15:12:49archiveDrill2 joins
15:15:12archiveDrill quits [Ping timeout: 276 seconds]
15:15:12archiveDrill2 is now known as archiveDrill
15:17:56@Fusl quits [Quit: K-Lined]
15:18:13Fusl (Fusl) joins
15:18:13@ChanServ sets mode: +o Fusl
15:30:09<jinn6>"You've run out of free articles." ah yes
15:40:26<Umbire>I mean the scraping has also been outright causing various sites to buckle from the traffic
15:40:40<Umbire>including a wiki I admin (which also offers downloads of its contents to boot)
15:41:39<Umbire>jinn6, https://archive.is/IT0Jj
15:42:33wessel15126 joins
15:44:33<jinn6>thx
15:44:54wessel15126 quits [Client Quit]
15:46:20wessel15126 joins
15:47:01<jinn6>you could argue that those sites were poorly designed to begin with, tbh, like, the expensive dynamically generated endpoints, like "show page edit history", or "show version diff" should be allowed for accepted users only, or such.....but yeah, the bots don't care
15:47:58<Umbire>yeah I don't think sites should like, explode because they were poorly designed
15:48:11<Umbire>and exploded sites tend to be a bit trickier to make design changes to
15:48:15wessel15126 quits [Client Quit]
15:48:41wessel1512 joins
15:50:04<Umbire>in the same vein one could also argue for much more sensible scraping (or just avoiding it altogether, in the case of sites offering their own archives and such)
15:50:15<Umbire>none of this said with any hostility, just to be clear
15:50:30<Umbire>tend to come off combative when I'm not aiming to
15:51:20<jinn6>something something robots.txt
15:52:28<jinn6>and because of a few (dozen) bad actors, now people end up blocking like, "every cloud hosting IP", and stuff
15:53:21<Umbire>nothing about it's ideal lol
15:53:41<Umbire>and I'm not a site designer so idk how feasible robots.txt is as an option for a given type of site
15:53:44<Umbire>in either direction
15:59:33<jinn6>quite feasible, but robots.txt is just a guideline that only well-behaved bots care for
15:59:53<Umbire>If it isn't already clear, I'd rather bots used sensibly for archival purposes not be caught in the crossfire (I'd also rather it be done somewhat sensibly, but then I think most archival bots from teams that even halfway know what they're doing fall under that)
16:00:57<Umbire>and for further clarity I don't have anything to do with the wiki site's own hosting, I just admin the wiki itself
16:03:01<jinn6>yeah, most bots are fine, problem is the LLM-trainers just get like a gazillion IPs all hammering on stuff, so you cannot even block them with normal tools, since a given IP might only hit a few pages a minute, and when blocked, just switches to another IP...which is why I can't walk 10 steps without another captcha, anubis, turnstile, etc hitting me in the face, nowadays >:(
16:03:37<Umbire>yeah and I can't really blame most people for blanket blocking as much as I also think it fucking Sucks
16:04:07<Umbire>(even not factoring my own personal beefs with Cloudflare, which seems to be mandatory to chat in here :P)
16:06:36<jinn6>at least anubis by default lets stuff through that doesn't pretend to be mozilla-like (and as such can just be blocked based on user-agent if needed)...but most people just captcha/cloudflare...
16:09:43<Umbire>yeah another editor suggested anubis also, might bring it up again
16:09:56<Umbire>ideally while there's still a lull in the scraping hits for us
16:17:16<jinn6>it's not ideal, but at least it lets text-only browsers through
16:17:54<Umbire>mhmhmmm
16:20:09<jinn6>there's a bunch of other thingies by now too, but I have no idea what their defaults are like
16:21:59<jinn6>go-away, powxy, apparently even a css based one, I've heard that even just checking e.g. whether the requester sends a referer header is a surprisingly decent way to block bots, but I don't have any personal experience with all that jazz
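A minimal sketch of the Referer check mentioned above, assuming a site that only challenges deep links arriving without a Referer header. The function name `should_challenge` and the `LANDING_PATHS` set are hypothetical, and a missing Referer is only a heuristic: privacy-minded browsers and bookmarks also omit it.

```python
from typing import Mapping

# Hypothetical set of entry pages that people commonly open directly.
LANDING_PATHS = {"/", "/index.html"}


def should_challenge(path: str, headers: Mapping[str, str]) -> bool:
    """Return True when a request for a non-landing page has no Referer."""
    referer = ""
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "referer":
            referer = value.strip()
            break
    if path in LANDING_PATHS:
        return False
    return referer == ""
```

A flagged request would typically get a challenge page rather than a hard block, since legitimate visitors can also arrive without a Referer.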
16:23:21<Umbire>I see, will look into
16:23:36<Umbire>thanks for the chat, gonna make lunch now
16:26:41anonymoususer852 quits [Ping timeout: 276 seconds]
16:27:21<jinn6>a funny idea I've thought about is using server-side imagemaps: every browser that is able to display images supports them, down to ncsa mosaic, all it does is append the clicked coordinates to the url, but it could probably be used with an ai-poisoned-image as a crude DIY captcha
16:27:53<jinn6>(it wouldn't be accessible to blind people, and it wouldn't work in text browsers without images, though)
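A rough sketch of the server side of that imagemap idea, assuming the image is wrapped in a link with the `ismap` attribute so the browser requests the link target with `?x,y` appended. The function names and the target rectangle are hypothetical; this only shows coordinate parsing and a hit test, not a complete captcha.

```python
from typing import Optional, Tuple


def parse_ismap_query(query: str) -> Optional[Tuple[int, int]]:
    """Parse the 'x,y' query string sent for a server-side imagemap click."""
    parts = query.split(",")
    if len(parts) != 2:
        return None
    try:
        x, y = int(parts[0]), int(parts[1])
    except ValueError:
        return None
    if x < 0 or y < 0:
        return None
    return x, y


def click_in_region(query: str, region: Tuple[int, int, int, int]) -> bool:
    """Check whether the click falls inside (left, top, right, bottom)."""
    point = parse_ismap_query(query)
    if point is None:
        return False
    x, y = point
    left, top, right, bottom = region
    return left <= x <= right and top <= y <= bottom
```

The server would pick the region per generated image and keep it secret, so a client that cannot interpret the image has to guess the coordinates.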
16:28:06anonymoususer852 (anonymoususer852) joins
16:48:54grill (grill) joins
17:22:45<nicolas17>it's unfortunate that any whitelist to let us archivers through would be easy for LLM scrapers to abuse too
17:25:00<Umbire>yeah
17:26:34grill quits [Ping timeout: 260 seconds]
17:28:26grill (grill) joins
17:32:13Dada joins
17:50:18Umbire quits [Quit: [brb]]
18:08:49HP_Archivist quits [Quit: Leaving]
18:13:14grill quits [Ping timeout: 260 seconds]
18:15:05grill (grill) joins
18:17:34Umbire (Umbire) joins
18:31:42Umbire quits [Remote host closed the connection]
18:32:11Umbire (Umbire) joins
18:55:40pixel (pixel) joins
19:00:44HP_Archivist (HP_Archivist) joins
19:08:11<yzqzss>> apparently even a css based one
19:08:11<yzqzss>You mean https://github.com/yzqzss/csswaf ? :)
19:09:37<yzqzss>> using server-side imagemaps
19:09:57<yzqzss>Imagemap captchas are common on Tor
19:20:34Umbire quits [Remote host closed the connection]
19:21:03Umbire (Umbire) joins
19:43:26<jinn6>I've never seen one
19:43:39<jinn6>in fact, I've never seen anyone use server-side imagemaps anywhere, ever
19:46:54HP_Archivist quits [Client Quit]
19:48:08<nicolas17>I have seen it for a JS-free IE5-era image editor
19:48:11<nicolas17>you could like
19:48:13<nicolas17>crop images
19:48:52<nicolas17>by clicking a corner of the rectangle you want to crop, it would send the coordinates to the server with an <input type="image"> and return a new html page where you click the second corner
19:49:33<jinn6>that's cool
19:50:11<nicolas17>idr if it was JS-free
19:50:27<jinn6>only by very explicitly searching for them, have I found a handful of sites demoing them, but I've never seen one be actually used, and all the software for dealing with it seems to be so ancient that the web servers themselves don't even exist
19:50:30<nicolas17>maybe it was one of those "if IE then use document.all, if Netscape then use document.layers"
19:58:20<steering>< nicolas17> it's unfortunate that any whitelist to let us archivers through would be easy for LLM scrapers to abuse too
19:58:55<steering>I would love to see someone start an "IP safelist" that would just compile IPs from/of various reputable sources (archive.org, whatever IPs AT chose to share)
19:59:34<nicolas17>IPs would be the safest way yeah
20:00:06<nicolas17>I just mean we can't ask to pretty please whitelist the SPN user agent, AI scrapers would figure it out pretty quick
20:00:13<steering>yeah for sure
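A small sketch of how a site might consume such an IP safelist, assuming it is published as a list of CIDR ranges. The ranges below are reserved documentation prefixes used as placeholders, not real archive.org or Archive Team addresses, and the function names are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_safelist(cidrs: Iterable[str]) -> List[Network]:
    """Parse CIDR strings, skipping blank entries."""
    return [
        ipaddress.ip_network(c.strip(), strict=False)
        for c in cidrs
        if c.strip()
    ]


def is_safelisted(client_ip: str, safelist: Iterable[Network]) -> bool:
    """Return True if client_ip falls inside any safelisted network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False
    # Membership is simply False when address and network versions differ.
    return any(addr in net for net in safelist)


# Placeholder ranges (RFC 5737 / RFC 3849 documentation prefixes).
EXAMPLE_SAFELIST = load_safelist(["192.0.2.0/24", "2001:db8::/32"])
```

Matching on source IP rather than user agent is the point made above: a user-agent string can be copied by any scraper, while the source address of a connection is much harder to fake.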
20:00:39<nicolas17>ffs
20:00:46<nicolas17>https://apps.microsoft.com/detail/9nf052js0rj4 it seems this app includes an entire copy of Okular and is then getting people angry at KDE because the so-called PDF editor doesn't actually edit
20:01:07<nicolas17>GPL violation too, of course
20:19:14grill quits [Ping timeout: 260 seconds]
20:21:19<jinn6>as long as they don't claim to be okular themselves, it should be somewhat fine-ish
20:22:13<jinn6>just like blender's had long-running trouble too, people just take it and rebrand slightly, and make money (or don't rebrand and just insert ads into the original)
20:23:35<masterx244|m>did the blender devs get that pest squashed somehow?
20:23:46<nicolas17>jinn6: well I don't see any signs of attribution, license, source code, etc
20:23:57<jinn6>yeahhhhhh
20:24:10<nicolas17>and they're emailing KDE asking for a refund
20:24:19<nicolas17>because KDE is what's mentioned in the About window
20:25:31<jinn6>that's sucky, yeah
20:26:17<jinn6>I think the "sayoapps" has 4 copies of the same-ish program even, lul
20:26:24<nicolas17>(also lol seems it's laughably easy to download an .appx from the microsoft store without buying the app, that's how I'm looking at the package contents)
20:27:43<jinn6>lol, is it?
20:29:10<nicolas17>I'm scraping https://store.rg-adguard.net/ to monitor when certain free apps get updated, a friend is doing similar monitoring by using the actual Microsoft Store APIs directly
20:29:19<nicolas17>and I didn't *expect* it to work for paid ones but it seems it does :|
20:49:34riteo quits [Ping timeout: 260 seconds]
20:59:51BornOn420 quits [Remote host closed the connection]
21:00:28BornOn420 (BornOn420) joins
21:02:33BornOn420 quits [Max SendQ exceeded]
21:03:05BornOn420 (BornOn420) joins
21:05:53BennyOtt_ joins
21:07:04BennyOtt quits [Ping timeout: 260 seconds]
21:07:04BennyOtt_ is now known as BennyOtt
21:15:36<@JAA>jinn6: Yes, topics missing from some servers are a known network-wide issue.
21:16:01<jinn6>it shouldn't be hard to like, make a bot to re-set them, I'd think?
21:17:57<@JAA>Sure, but this has been happening in various constellations for some time, so it'd be better to fix the underlying issue first.
21:50:00wickedplayer494 quits [Remote host closed the connection]
21:56:52wickedplayer494 joins
22:21:43Dada quits [Remote host closed the connection]
22:22:14Guest58 joins
22:32:27Barto quits [Quit: WeeChat 4.6.3]
22:36:38Shjosan quits [Quit: Am sleepy (-, – )…zzzZZZ]
22:37:04Barto (Barto) joins
22:37:39Shjosan (Shjosan) joins
23:19:22Guest58 quits [Client Quit]
23:24:53Guest58 joins
23:31:09Umbire quits [Ping timeout: 260 seconds]
23:34:44Umbire (Umbire) joins
23:34:55Umbire quits [Remote host closed the connection]
23:35:23Umbire (Umbire) joins