00:00:10 | | sec^nd quits [Remote host closed the connection] |
00:00:56 | | sec^nd (second) joins |
00:28:55 | | Umbire quits [Remote host closed the connection] |
00:29:24 | | Umbire (Umbire) joins |
00:34:10 | | xarph quits [Quit: ZNC 1.8.2+deb2ubuntu0.1 - https://znc.in] |
00:36:08 | | xarph joins |
02:48:14 | | lemuria (lemuria) joins |
02:51:12 | | archiveDrill quits [Quit: The Lounge - https://thelounge.chat] |
03:17:41 | <pabs> | https://discourse.ubuntu.com/t/introducing-debcrafters/63674 |
03:20:18 | <nicolas17> | pabs: sounds like an easier way to get paid to work on Ubuntu than Canonical's hiring process :D |
03:20:58 | <pabs> | sounds like you'd still have to go through that process to get paid? |
03:21:06 | <nicolas17> | ah I misread the "paid to contribute one day per week" part |
03:21:20 | <pabs> | just with the extra requirement of having to already be a volunteer Ubuntu contributor first |
03:21:21 | <nicolas17> | that's "apart from their regular job doing Ubuntu stuff" |
03:22:08 | <pabs> | did you see this? https://dustri.org/b/my-experience-with-canonicals-interview-process.html |
03:22:44 | <nicolas17> | yes |
03:22:47 | <nicolas17> | and others |
03:22:54 | <nicolas17> | I have yet to see anyone talk about a positive experience |
03:23:27 | <pabs> | I hear people already there enjoy the job |
03:26:26 | <nicolas17> | yeah you just have to subject yourself to that hazing process to maybe get in |
03:28:02 | <nicolas17> | it sounds like Mark Shuttleworth should be tied to the same rocket as Elon Musk and Mark Zuckerberg and aimed in the general direction of Mars |
03:29:00 | <BlankEclair> | he's named shuttleworth of all things |
03:31:34 | | HackMii quits [Ping timeout: 264 seconds] |
03:31:52 | <pabs> | TBH, if I could skip working on the Ubuntu Archive, and only do FOSS work upstream on Debian and other projects, the hazing might be tempting... |
03:32:05 | | HackMii (hacktheplanet) joins |
03:36:42 | <BlankEclair> | > So I exercised my GDPR rights, and asked to be communicated everything pertaining to my interviews. |
03:36:45 | <BlankEclair> | holy shit that's nice |
03:45:32 | <pabs> | https://www.nullpt.rs/reversing-botid |
03:52:14 | | Umbire quits [Ping timeout: 260 seconds] |
04:11:37 | | Guest58 joins |
04:12:53 | | Guest58 quits [Client Quit] |
04:13:51 | | HackMii quits [Remote host closed the connection] |
04:13:51 | | HackMii_ (hacktheplanet) joins |
04:40:48 | | Guest58 joins |
04:53:29 | | jinn6 quits [Ping timeout: 260 seconds] |
04:55:26 | | jinn6 joins |
04:58:49 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
05:02:11 | | Lord_Nightmare (Lord_Nightmare) joins |
05:16:06 | | archiveDrill joins |
05:25:36 | | Guest58 quits [Client Quit] |
05:44:08 | | Umbire (Umbire) joins |
05:46:14 | | Guest58 joins |
06:09:19 | | Umbire quits [Ping timeout: 260 seconds] |
06:18:14 | | Guest58 quits [Client Quit] |
06:19:33 | | Guest58 joins |
06:26:10 | | trix quits [Quit: o7 lain.ripe.net] |
06:30:39 | | trix (trix) joins |
06:43:20 | | Umbire (Umbire) joins |
07:05:13 | | Guest58 quits [Client Quit] |
07:05:54 | | Guest58 joins |
07:07:41 | | Umbire quits [Ping timeout: 276 seconds] |
07:10:34 | | Guest58 quits [Ping timeout: 260 seconds] |
08:04:30 | | Guest58 joins |
08:29:07 | | Guest58 quits [Client Quit] |
08:31:59 | | Guest58 joins |
09:34:39 | | ducky quits [Ping timeout: 260 seconds] |
09:35:01 | | ducky (ducky) joins |
09:40:45 | | Dada joins |
11:00:01 | | Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat] |
11:02:56 | | Bleo182600722719623455222 joins |
11:14:39 | | Guest58 quits [Client Quit] |
11:18:47 | | Guest58 joins |
12:42:26 | | yasomi quits [Ping timeout: 276 seconds] |
12:46:06 | | yasomi (yasomi) joins |
12:46:06 | | Medowar quits [Read error: Connection reset by peer] |
12:46:48 | | Medowar joins |
12:47:47 | | Medowar is now authenticated as Medowar |
12:51:50 | <@JAA> | I received a 'reaction' email from GMail for the first time. Thunderbird classified it as spam. |
12:54:00 | | katocala is now authenticated as katocala |
12:57:24 | <jinn6> | sounds about right |
12:57:35 | <jinn6> | also no topic set in here |
12:57:54 | | katia pulls hexa_'s tail |
12:58:14 | <jinn6> | or in #archiveteam-bs for that matter, lol |
13:05:23 | | Guest58 quits [Client Quit] |
13:06:42 | <nulldata> | https://blog.cloudflare.com/introducing-pay-per-crawl/ |
13:07:03 | | steering wonders which server(s) lost the topic |
13:07:24 | <katia> | aaaaaaaaaaaaaaaaaaaaaaaaa clownflare |
13:30:01 | <masterx244|m> | s/clownflare/buttflare/g |
13:31:45 | | Guest58 joins |
13:36:54 | | Dada quits [Remote host closed the connection] |
13:38:10 | <jinn6> | clownflare is great |
13:38:33 | <jinn6> | apparently I'm on vindobona.hackint.org and that doesn't have topic |
13:42:03 | | Guest58 quits [Client Quit] |
13:46:57 | | ducky quits [Remote host closed the connection] |
13:47:55 | | ducky (ducky) joins |
13:48:16 | | ducky quits [Read error: Connection reset by peer] |
13:50:39 | | ducky (ducky) joins |
13:52:01 | | ducky quits [Read error: Connection reset by peer] |
13:53:48 | | ducky (ducky) joins |
13:53:49 | | Guest58 joins |
13:56:20 | | Umbire (Umbire) joins |
14:04:48 | | Guest58 quits [Client Quit] |
14:19:15 | | HackMii_ quits [Remote host closed the connection] |
14:19:35 | | HackMii (hacktheplanet) joins |
14:42:04 | <IDK> | https://www.wired.com/story/cloudflare-blocks-ai-crawlers-default/ |
14:42:17 | <IDK> | The end for SPN? :( |
14:45:58 | <katia> | fck cld |
14:47:51 | <IDK> | --🤡🔥 |
14:48:23 | <IDK> | 🤡🔥-- |
14:48:24 | <eggdrop> | [karma] '🤡🔥' now has -1 karma! |
14:48:30 | <IDK> | there |
15:12:49 | | archiveDrill2 joins |
15:15:12 | | archiveDrill quits [Ping timeout: 276 seconds] |
15:15:12 | | archiveDrill2 is now known as archiveDrill |
15:17:56 | | @Fusl quits [Quit: K-Lined] |
15:18:13 | | Fusl (Fusl) joins |
15:18:13 | | @ChanServ sets mode: +o Fusl |
15:30:09 | <jinn6> | "You've run out of free articles." ah yes |
15:40:26 | <Umbire> | I mean the scraping has also been outright causing various sites to buckle from the traffic |
15:40:40 | <Umbire> | including a wiki I admin (which also offers downloads of its contents to boot) |
15:41:39 | <Umbire> | jinn6, https://archive.is/IT0Jj |
15:42:33 | | wessel15126 joins |
15:44:33 | <jinn6> | thx |
15:44:54 | | wessel15126 quits [Client Quit] |
15:46:20 | | wessel15126 joins |
15:47:01 | <jinn6> | you could argue that those sites were poorly designed to begin with, tbh, like, the expensive dynamically generated endpoints, like "show page edit history", or "show version diff" should be allowed for accepted users only, or such.....but yeah, the bots don't care |
15:47:58 | <Umbire> | yeah I don't think sites should like, explode because they were poorly designed |
15:48:11 | <Umbire> | and exploded sites tend to be a bit trickier to make design changes to |
15:48:15 | | wessel15126 quits [Client Quit] |
15:48:41 | | wessel1512 joins |
15:50:04 | <Umbire> | in the same vein one could also argue for much more sensible scraping (or just avoiding it altogether, in the case of sites offering their own archives and such) |
15:50:15 | <Umbire> | none of this said with any hostility, just to be clear |
15:50:30 | <Umbire> | tend to come off combative when I'm not aiming to |
15:51:20 | <jinn6> | something something robots.txt |
15:52:28 | <jinn6> | and because of a few (dozen) bad actors, now people end up blocking like, "every cloud hosting IP", and stuff |
15:53:21 | <Umbire> | nothing about it's ideal lol |
15:53:41 | <Umbire> | and I'm not a site designer so idk how feasible robots.txt is as an option for a given type of site |
15:53:44 | <Umbire> | in either direction |
15:59:33 | <jinn6> | quite feasible, but robots.txt is just a guideline that only well-behaved bots care for |
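For reference, this is roughly what the "well-behaved bot" side of robots.txt looks like: the crawler fetches the rules and checks each URL before requesting it. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt contents, the `wiki.example` host, and the `examplebot` user agent are made up for illustration, and nothing forces a crawler to run a check like this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a wiki: keep crawlers off the expensive
# dynamically generated pages (edit history, version diffs).
ROBOTS_TXT = """\
User-agent: *
Disallow: /history/
Disallow: /diff/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("examplebot", "https://wiki.example/page/Main_Page"))
    print(allowed("examplebot", "https://wiki.example/history/Main_Page"))
```

The check runs entirely on the crawler's side, which is why it only helps against bots that choose to honor it.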
15:59:53 | <Umbire> | If it isn't already clear I'd rather bots used sensibly for archival purposes not be caught in the crossfire (also rather it be done somewhat sensibly but then I think most archival bots from teams that even halfway know what they're doing fall under that) |
16:00:57 | <Umbire> | and for further clarity I don't have anything to do with the wiki site's own hosting, I just admin the wiki itself |
16:03:01 | <jinn6> | yeah, most bots are fine, problem is the LLM-trainers just get like a gazillion IPs all hammering on stuff, so you cannot even block them with normal tools, since a given IP might only hit a few pages a minute, and when blocked, just switches to another IP...which is why I can't walk 10 steps without another captcha, anubis, turnstile, etc hitting me in the face, nowadays >:( |
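The point about per-IP blocking can be illustrated with a toy simulation. This is a sketch, not any particular tool's implementation: `PerIPLimiter`, `simulate`, the limit of 10 hits per 60 seconds, and the documentation-range IP addresses are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


class PerIPLimiter:
    """Sliding-window limiter: at most max_hits requests per IP per window."""

    def __init__(self, max_hits: int, window_s: float) -> None:
        self.max_hits = max_hits
        self.window_s = window_s
        self.hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        recent = self.hits[ip]
        # Forget accepted requests that have aged out of the window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False
        recent.append(now)
        return True


def simulate(n_ips: int, hits_per_ip: int, limiter: PerIPLimiter) -> int:
    """Count requests let through when each of n_ips addresses sends
    hits_per_ip requests spread evenly over one minute."""
    passed = 0
    for i in range(n_ips):
        ip = f"198.51.100.{i}"  # documentation address range, illustrative only
        for k in range(hits_per_ip):
            if limiter.allow(ip, now=k * (60.0 / hits_per_ip)):
                passed += 1
    return passed


if __name__ == "__main__":
    # One address sending 1000 requests a minute is cut off at the limit...
    print(simulate(1, 1000, PerIPLimiter(max_hits=10, window_s=60.0)))
    # ...but the same 1000 requests spread over 200 addresses all get through.
    print(simulate(200, 5, PerIPLimiter(max_hits=10, window_s=60.0)))
```

Each address stays under the threshold, so the limiter never fires even though the total load on the site is the same.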
16:03:37 | <Umbire> | yeah and I can't really blame most people for blanket blocking as much as I also think it fucking Sucks |
16:04:07 | <Umbire> | (even not factoring my own personal beefs with Cloudflare, which seems to be mandatory to chat in here :P) |
16:06:36 | <jinn6> | at least anubis by default lets stuff through that doesn't pretend to be mozilla-like (and as such can just be blocked based on user-agent if needed)...but most people just captcha/cloudflare... |
16:09:43 | <Umbire> | yeah another editor suggested anubis also, might bring it up again |
16:09:56 | <Umbire> | ideally while there's still a lull in the scraping hits for us |
16:17:16 | <jinn6> | it's not ideal, but at least it lets text-only browsers through |
16:17:54 | <Umbire> | mhmhmmm |
16:20:09 | <jinn6> | there's a bunch of other thingies by now too, but I have no idea what their defaults are like |
16:21:59 | <jinn6> | go-away, powxy, apparently even a css based one, I've heard that even just checking e.g. whether the requester sends a referer header is a surprisingly decent way to block bots, but I don't have any personal experience with all that jazz |
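The Referer idea mentioned here could look something like the sketch below. It is an assumption about how one might apply the heuristic, not a description of any of the tools named above: the `EXPENSIVE_PREFIXES` paths and the `should_block` helper are hypothetical, and the check is limited to expensive pages so that direct visits and bookmarks to ordinary pages still work.

```python
# Hypothetical path prefixes for the expensive, dynamically generated pages.
EXPENSIVE_PREFIXES = ("/history/", "/diff/")


def should_block(path: str, headers: dict) -> bool:
    """Block requests to expensive pages that arrive with no Referer header.

    HTTP header names are case-insensitive, so normalize them first.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    has_referer = bool(normalized.get("referer", "").strip())
    return path.startswith(EXPENSIVE_PREFIXES) and not has_referer


if __name__ == "__main__":
    print(should_block("/history/Main_Page", {"User-Agent": "Mozilla/5.0"}))
    print(should_block("/history/Main_Page",
                       {"Referer": "https://wiki.example/page/Main_Page"}))
```

A scraper can trivially send a fake Referer, so this only filters out clients that do not bother to.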
16:22:51 | | wessel1512 is now authenticated as wessel1512 |
16:23:21 | <Umbire> | I see, will look into |
16:23:36 | <Umbire> | thanks for the chat, gonna make lunch now |
16:26:41 | | anonymoususer852 quits [Ping timeout: 276 seconds] |
16:27:21 | <jinn6> | a funny idea I've thought about, is using server-side imagemaps, every browser that is able to display images, supports it, down to ncsa mosaic, all it does is append clicked coordinates to the url, but it could probably be used with an ai-poisoned-image as a crude DIY captcha |
16:27:53 | <jinn6> | (it wouldn't be accessible to blind people, and it wouldn't work in text browsers without images, though) |
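A rough sketch of the server side of that imagemap idea. With markup like `<a href="/gate"><img src="challenge.png" ismap></a>`, the browser requests `/gate?x,y` with the click coordinates; the `/gate` path, the `TARGET` rectangle, and the helper names below are all hypothetical, and the "which part of the image to click" logic is reduced to a single fixed rectangle.

```python
from urllib.parse import urlsplit

# Hypothetical region of the challenge image a human is asked to click:
# (left, top, right, bottom) in pixels.
TARGET = (40, 20, 120, 80)


def click_coords(url: str):
    """Parse the 'x,y' query string a browser appends for an ismap image.

    Returns (x, y) as integers, or None if the query is not in that form.
    """
    query = urlsplit(url).query
    x_str, sep, y_str = query.partition(",")
    if not sep:
        return None
    for part in (x_str, y_str):
        if not (part.isascii() and part.isdigit()):
            return None
    return int(x_str), int(y_str)


def passes(url: str, target=TARGET) -> bool:
    """Return True if the click landed inside the target rectangle."""
    coords = click_coords(url)
    if coords is None:
        return False
    x, y = coords
    left, top, right, bottom = target
    return left <= x < right and top <= y < bottom


if __name__ == "__main__":
    print(passes("https://wiki.example/gate?64,48"))
    print(passes("https://wiki.example/gate?5,5"))
```

A real deployment would need to vary the image and target per visitor; a fixed rectangle like this one could simply be replayed.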
16:28:06 | | anonymoususer852 (anonymoususer852) joins |
16:48:54 | | grill (grill) joins |
17:22:45 | <nicolas17> | it's unfortunate that any whitelist to let us archivers through would be easy for LLM scrapers to abuse too |
17:25:00 | <Umbire> | yeah |
17:26:34 | | grill quits [Ping timeout: 260 seconds] |
17:28:26 | | grill (grill) joins |
17:32:13 | | Dada joins |
17:50:18 | | Umbire quits [Quit: [brb]] |
18:08:49 | | HP_Archivist quits [Quit: Leaving] |
18:13:14 | | grill quits [Ping timeout: 260 seconds] |
18:15:05 | | grill (grill) joins |
18:17:34 | | Umbire (Umbire) joins |
18:31:42 | | Umbire quits [Remote host closed the connection] |
18:32:11 | | Umbire (Umbire) joins |