00:00:10sec^nd quits [Remote host closed the connection]
00:00:56sec^nd (second) joins
00:28:55Umbire quits [Remote host closed the connection]
00:29:24Umbire (Umbire) joins
00:34:10xarph quits [Quit: ZNC 1.8.2+deb2ubuntu0.1 - https://znc.in]
00:36:08xarph joins
02:48:14lemuria (lemuria) joins
02:51:12archiveDrill quits [Quit: The Lounge - https://thelounge.chat]
03:17:41<pabs>https://discourse.ubuntu.com/t/introducing-debcrafters/63674
03:20:18<nicolas17>pabs: sounds like an easier way to get paid to work on Ubuntu than Canonical's hiring process :D
03:20:58<pabs>sounds like you'd still have to go through that process to get paid?
03:21:06<nicolas17>ah I misread the "paid to contribute one day per week" part
03:21:20<pabs>just with the extra requirement of having to already be a volunteer Ubuntu contributor first
03:21:21<nicolas17>that's "apart from their regular job doing Ubuntu stuff"
03:22:08<pabs>did you see this? https://dustri.org/b/my-experience-with-canonicals-interview-process.html
03:22:44<nicolas17>yes
03:22:47<nicolas17>and others
03:22:54<nicolas17>I have yet to see anyone talk about a positive experience
03:23:27<pabs>I hear people already there enjoy the job
03:26:26<nicolas17>yeah you just have to subject yourself to that hazing process to maybe get in
03:28:02<nicolas17>it sounds like Mark Shuttleworth should be tied to the same rocket as Elon Musk and Mark Zuckerberg and aimed in the general direction of Mars
03:29:00<BlankEclair>he's named shuttleworth of all things
03:31:34HackMii quits [Ping timeout: 264 seconds]
03:31:52<pabs>TBH, if I could skip working on the Ubuntu Archive, and only do FOSS work upstream on Debian and other projects, the hazing might be tempting...
03:32:05HackMii (hacktheplanet) joins
03:36:42<BlankEclair>> So I exercised my GDPR rights, and asked to be communicated everything pertaining to my interviews.
03:36:45<BlankEclair>holy shit that's nice
03:45:32<pabs>https://www.nullpt.rs/reversing-botid
03:52:14Umbire quits [Ping timeout: 260 seconds]
04:11:37Guest58 joins
04:12:53Guest58 quits [Client Quit]
04:13:51HackMii quits [Remote host closed the connection]
04:13:51HackMii_ (hacktheplanet) joins
04:40:48Guest58 joins
04:53:29jinn6 quits [Ping timeout: 260 seconds]
04:55:26jinn6 joins
04:58:49Lord_Nightmare quits [Quit: ZNC - http://znc.in]
05:02:11Lord_Nightmare (Lord_Nightmare) joins
05:16:06archiveDrill joins
05:25:36Guest58 quits [Client Quit]
05:44:08Umbire (Umbire) joins
05:46:14Guest58 joins
06:09:19Umbire quits [Ping timeout: 260 seconds]
06:18:14Guest58 quits [Client Quit]
06:19:33Guest58 joins
06:26:10trix quits [Quit: o7 lain.ripe.net]
06:30:39trix (trix) joins
06:43:20Umbire (Umbire) joins
07:05:13Guest58 quits [Client Quit]
07:05:54Guest58 joins
07:07:41Umbire quits [Ping timeout: 276 seconds]
07:10:34Guest58 quits [Ping timeout: 260 seconds]
08:04:30Guest58 joins
08:29:07Guest58 quits [Client Quit]
08:31:59Guest58 joins
09:34:39ducky quits [Ping timeout: 260 seconds]
09:35:01ducky (ducky) joins
09:40:45Dada joins
11:00:01Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
11:02:56Bleo182600722719623455222 joins
11:14:39Guest58 quits [Client Quit]
11:18:47Guest58 joins
12:42:26yasomi quits [Ping timeout: 276 seconds]
12:46:06yasomi (yasomi) joins
12:46:06Medowar quits [Read error: Connection reset by peer]
12:46:48Medowar joins
12:51:50<@JAA>I received a 'reaction' email from GMail for the first time. Thunderbird classified it as spam.
12:57:24<jinn6>sounds about right
12:57:35<jinn6>also no topic set in here
12:57:54katia pulls hexa_'s tail
12:58:14<jinn6>or in #archiveteam-bs for that matter, lol
13:05:23Guest58 quits [Client Quit]
13:06:42<nulldata>https://blog.cloudflare.com/introducing-pay-per-crawl/
13:07:03steering wonders which server(s) lost the topic
13:07:24<katia>aaaaaaaaaaaaaaaaaaaaaaaaa clownflare
13:30:01<masterx244|m>s/clownflare/buttflare/g
13:31:45Guest58 joins
13:36:54Dada quits [Remote host closed the connection]
13:38:10<jinn6>clownflare is great
13:38:33<jinn6>apparently I'm on vindobona.hackint.org and that doesn't have topic
13:42:03Guest58 quits [Client Quit]
13:46:57ducky quits [Remote host closed the connection]
13:47:55ducky (ducky) joins
13:48:16ducky quits [Read error: Connection reset by peer]
13:50:39ducky (ducky) joins
13:52:01ducky quits [Read error: Connection reset by peer]
13:53:48ducky (ducky) joins
13:53:49Guest58 joins
13:56:20Umbire (Umbire) joins
14:04:48Guest58 quits [Client Quit]
14:19:15HackMii_ quits [Remote host closed the connection]
14:19:35HackMii (hacktheplanet) joins
14:42:04<IDK>https://www.wired.com/story/cloudflare-blocks-ai-crawlers-default/
14:42:17<IDK>The end for SPN? :(
14:45:58<katia>fck cld
14:47:51<IDK>--🤡🔥
14:48:23<IDK>🤡🔥--
14:48:24<eggdrop>[karma] '🤡🔥' now has -1 karma!
14:48:30<IDK>there
15:12:49archiveDrill2 joins
15:15:12archiveDrill quits [Ping timeout: 276 seconds]
15:15:12archiveDrill2 is now known as archiveDrill
15:17:56@Fusl quits [Quit: K-Lined]
15:18:13Fusl (Fusl) joins
15:18:13@ChanServ sets mode: +o Fusl
15:30:09<jinn6>"You've run out of free articles." ah yes
15:40:26<Umbire>I mean the scraping has also been outright causing various sites to buckle from the traffic
15:40:40<Umbire>including a wiki I admin (which also offers downloads of its contents to boot)
15:41:39<Umbire>jinn6, https://archive.is/IT0Jj
15:42:33wessel15126 joins
15:44:33<jinn6>thx
15:44:54wessel15126 quits [Client Quit]
15:46:20wessel15126 joins
15:47:01<jinn6>you could argue that those sites were poorly designed to begin with, tbh, like, the expensive dynamically generated endpoints, like "show page edit history", or "show version diff" should be allowed for accepted users only, or such.....but yeah, the bots don't care
15:47:58<Umbire>yeah I don't think sites should like, explode because they were poorly designed
15:48:11<Umbire>and exploded sites tend to be a bit trickier to make design changes to
15:48:15wessel15126 quits [Client Quit]
15:48:41wessel1512 joins
15:50:04<Umbire>in the same vein one could also argue for much more sensible scraping (or just avoiding it altogether, in the case of sites offering their own archives and such)
15:50:15<Umbire>none of this said with any hostility, just to be clear
15:50:30<Umbire>tend to come off combative when I'm not aiming to
15:51:20<jinn6>something something robots.txt
15:52:28<jinn6>and because of a few (dozen) bad actors, now people end up blocking like, "every cloud hosting IP", and stuff
15:53:21<Umbire>nothing about it's ideal lol
15:53:41<Umbire>and I'm not a site designer so idk how feasible robots.txt is as an option for a given type of site
15:53:44<Umbire>in either direction
15:59:33<jinn6>quite feasible, but robots.txt is just a guideline that only well-behaved bots care for
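[editor's note] The point above, that robots.txt is only a guideline honoured by well-behaved bots, can be sketched from the crawler's side. This is a minimal illustration using Python's standard `urllib.robotparser`; the robots.txt contents, the bot names (`SomeBot`, `ExampleArchiveBot`) and the paths are assumptions made up for the example, not taken from any site mentioned in the log.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a wiki: keep generic bots away from the
# expensive dynamic endpoint, but let one named archival bot in everywhere.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php
Crawl-delay: 10

User-agent: ExampleArchiveBot
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before every request; nothing forces it to.
history_url = "/index.php?title=Main_Page&action=history"
generic_may_fetch = parser.can_fetch("SomeBot", history_url)
archiver_may_fetch = parser.can_fetch("ExampleArchiveBot", history_url)
generic_delay = parser.crawl_delay("SomeBot")
```

The check happens entirely inside the client, which is why a scraper that never calls `can_fetch` is unaffected by whatever the file says.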
15:59:53<Umbire>If it isn't already clear I'd rather bots used sensibly for archival purposes be not caught in the crossfire (also rather it be done somewhat sensibly but then I think most archival bots from teams that even halfway know what they're doing fall under that)
16:00:57<Umbire>and for further clarity I don't have anything to do with the wiki site's own hosting, I just admin the wiki itself
16:03:01<jinn6>yeah, most bots are fine, problem is the LLM-trainers just get like a gazillion IPs all hammering on stuff, so you cannot even block them with normal tools, since a given IP might only hit a few pages a minute, and when blocked, just switches to another IP...which is why I can't walk 10 steps without another captcha, anubis, turnstile, etc hitting me in the face, nowadays >:(
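[editor's note] A rough sketch of why per-IP limits struggle against the pattern described above. The limiter below is a generic sliding-window counter, not any particular tool's implementation; the class name, the limit of 10 requests per 60 seconds, and the documentation-range IP addresses are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class PerIpLimiter:
    """Sliding-window limiter: at most max_requests per window per IP."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of allowed hits

    def allow(self, ip: str, now: float) -> bool:
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


# Case 1: one IP sends 600 requests within a minute.
single = PerIpLimiter(max_requests=10, window_seconds=60)
single_allowed = sum(
    single.allow("198.51.100.1", now=i * 0.1) for i in range(600)
)

# Case 2: the same 600 requests spread over 200 IPs, 3 requests each.
spread = PerIpLimiter(max_requests=10, window_seconds=60)
spread_allowed = sum(
    spread.allow(f"203.0.113.{ip}", now=ip * 0.3 + n * 0.1)
    for ip in range(200)
    for n in range(3)
)
```

In the first case the limiter lets through only the first 10 requests; in the second every request passes, because no single address ever comes near the threshold, even though the total load on the server is the same.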
16:03:37<Umbire>yeah and I can't really blame most people for blanket blocking as much as I also think it fucking Sucks
16:04:07<Umbire>(even not factoring my own personal beefs with Cloudflare, which seems to be mandatory to chat in here :P)
16:06:36<jinn6>at least anubis by default lets stuff through that doesn't pretend to be mozilla-like (and as such can just be blocked based on user-agent if needed)...but most people just captcha/cloudflare...
16:09:43<Umbire>yeah another editor suggested anubis also, might bring it up again
16:09:56<Umbire>ideally while there's still a lull in the scraping hits for us
16:17:16<jinn6>it's not ideal, but at least it lets text-only browsers through
16:17:54<Umbire>mhmhmmm
16:20:09<jinn6>there's a bunch of other thingies by now too, but I have no idea what their defaults are like
16:21:59<jinn6>go-away, powxy, apparently even a css based one, I've heard that even just checking e.g. whether the requester sends a referer header is a surprisingly decent way to block bots, but I don't have any personal experience with all that jazz
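[editor's note] The Referer heuristic mentioned above could look roughly like the following. This is only a sketch of the idea as relayed in the chat, not a tested defence: the function name, the `entry_paths` allowlist and the example paths are hypothetical, and real visitors can legitimately arrive without a Referer (bookmarks, privacy settings), which is why landing pages are exempted here.

```python
from typing import Iterable, Mapping


def should_challenge(
    path: str,
    headers: Mapping[str, str],
    entry_paths: Iterable[str] = ("/", "/wiki/Main_Page"),
) -> bool:
    """Return True if a request to a deep page carries no Referer header.

    Assumption: requests to entry pages are always let through, since
    people type or bookmark those URLs directly.
    """
    if path in entry_paths:
        return False
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return not normalized.get("referer", "").strip()


no_referer = should_challenge("/index.php?action=history", {})
with_referer = should_challenge(
    "/index.php?action=history",
    {"Referer": "https://wiki.example/wiki/Main_Page"},
)
landing_page = should_challenge("/", {})
```

A scraper can of course forge the header, so this would only filter out clients that do not bother to.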
16:23:21<Umbire>I see, will look into
16:23:36<Umbire>thanks for the chat, gonna make lunch now
16:26:41anonymoususer852 quits [Ping timeout: 276 seconds]
16:27:21<jinn6>a funny idea I've thought about is using server-side imagemaps: every browser that is able to display images supports them, down to NCSA Mosaic, all it does is append the clicked coordinates to the url, but it could probably be used with an ai-poisoned-image as a crude DIY captcha
16:27:53<jinn6>(it wouldn't be accessible to blind people, and it wouldn't work in text browsers without images, though)
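[editor's note] A sketch of the server side of the imagemap idea above. With markup such as `<a href="/verify"><img src="challenge.png" ismap></a>`, a click makes the browser request `/verify?x,y`, where x and y are the click coordinates within the image. The `/verify` path, the image name, the target rectangle and both function names are assumptions for illustration; how the image itself would be made hard for bots is left open, as it is in the chat.

```python
from typing import Optional, Tuple

# Hypothetical "click here" region in image pixels: left, top, right, bottom.
TARGET = (100, 40, 160, 80)


def parse_ismap_query(query: str) -> Optional[Tuple[int, int]]:
    """Parse the 'x,y' query string a browser appends for an ismap click."""
    parts = query.split(",")
    if len(parts) != 2:
        return None
    try:
        x, y = int(parts[0]), int(parts[1])
    except ValueError:
        return None
    if x < 0 or y < 0:
        return None
    return x, y


def click_passes(query: str, target: Tuple[int, int, int, int] = TARGET) -> bool:
    """Return True if the click landed inside the target rectangle."""
    point = parse_ismap_query(query)
    if point is None:
        return False
    x, y = point
    left, top, right, bottom = target
    return left <= x < right and top <= y < bottom


inside = click_passes("120,45")
outside = click_passes("10,10")
malformed = click_passes("not-a-click")
```

A request that follows the link without sending coordinates, as a crawler would, arrives with an empty or malformed query and fails the check.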
16:28:06anonymoususer852 (anonymoususer852) joins
16:48:54grill (grill) joins
17:22:45<nicolas17>it's unfortunate that any whitelist to let us archivers through would be easy for LLM scrapers to abuse too
17:25:00<Umbire>yeah
17:26:34grill quits [Ping timeout: 260 seconds]
17:28:26grill (grill) joins
17:32:13Dada joins
17:50:18Umbire quits [Quit: [brb]]
18:08:49HP_Archivist quits [Quit: Leaving]
18:13:14grill quits [Ping timeout: 260 seconds]
18:15:05grill (grill) joins
18:17:34Umbire (Umbire) joins
18:31:42Umbire quits [Remote host closed the connection]
18:32:11Umbire (Umbire) joins