00:04:18Dada quits [Remote host closed the connection]
00:59:39<Doranwen>I've got most of the regex for this figured out but I'm stuck on how to do a first occurrence match of *either* of two characters. Anyone know what I should be searching for?
01:00:11<Doranwen>I want to match everything between a " and either " or ?, whichever comes first.
01:00:58<Doranwen>My searches aren't turning up useful results, alas.
01:22:17beardicus4 (beardicus) joins
01:22:47<jinn6>I'd just do like "[^"?]+
01:23:25<jinn6>which matches quote, and one or more characters that are not quote or question mark
01:23:51<jinn6>(and I guess add a . after to match the quote or question mark after that)
01:23:58<jinn6>Doranwen: ^ ?
01:24:40beardicus quits [Ping timeout: 256 seconds]
01:24:40beardicus4 is now known as beardicus
01:24:44<jinn6>regex is also fun because there's so many flavours
01:25:00<jinn6>if you're using pcre, you can also look into negative lookahead
01:25:10<jinn6>my way just uses character negation
01:25:27<jinn6>positive lookahead also exists I guess
01:43:09<Doranwen>jinn6: I am such a newbie to it, lol.
01:43:58<Doranwen>I've mostly used it with a lot of commands I've been handed, particularly manipulating stuff with sed.
01:45:22<Doranwen>I tested that and it includes the first quote marks in the match string.
01:49:03<Doranwen>Ugh, realized it's more complicated than I thought.
01:49:48<Doranwen>(And I had forgotten to change my example sentence, so I'm sure your regex worked for what I'd said.)
01:50:25<Doranwen>I'm trying to get *just* the final string of numbers + extension for each of these: https://farm3.static.flickr.com/2800/4135375742_9e9eb8cb63.jpg and https://static.flickr.com/39/105042573_4603a0937b.jpg?v=0
01:52:02<Doranwen>The actual filename, so to speak.
01:52:21<steering>https://participate.whatwg.org/agreement-update Loading the script 'https://participate.whatwg.org/cdn-cgi/scripts/5c5dd728/cloudflare-static/email-decode.min.js' violates the following Content Security Policy directive: "script-src 'nonce-e2b553d0-3c1f-4265-b17a-f7f333c91d86'". Note that 'script-src-elem' was not explicitly set, so 'script-src' is used as a fallback. The action has been blocked.
01:52:22<jinn6>idk how you can access the matching groups
01:52:30<steering>cloudflare--
01:52:31<eggdrop>[karma] 'cloudflare' now has -19 karma!
01:52:44<steering>lol
01:52:55<Doranwen>I'm trying to find one string of commands that'll do that for both of those sorts of urls.
01:52:57<jinn6>but "([^"?]+) would let you use \1 or however you specify "first group match" to not have the quote, Doranwen
01:52:59<jinn6>lol
01:53:11<Doranwen>I'll poke at it.
01:53:28<Doranwen>I'm starting to understand this a bit more than I did before.
01:53:36<jinn6>what is the "final string", 4135375742_9e9eb8cb63.jpg or 9e9eb8cb63.jpg?
01:54:10<steering>Doranwen: you want a non-greedy match, which just means putting a ? after your number-of-times metacharacter (probably a * or a +)
01:54:18<steering>so for example
01:55:25<jinn6>^ depends on the regex flavour
01:55:27<jinn6>lol
01:55:28<steering>processing '"foo" bar"' with regex /".*"/ would match the whole '"foo" bar"'
01:55:29<Doranwen>I am confusing *myself*. I'm trying to simultaneously work on two separate regex things and forgetting what I'm doing. /o\
01:55:32<jinn6>pcre can do a lot of fancy stuff
01:55:34<steering>processing '"foo" bar"' with regex /".*?"/ would match only '"foo"'
01:55:45<jinn6>most non-pcre flavours can't do nearly as much
01:55:49<steering>jinn6: not really, all the common regex engines people use can do non-greedy
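
The greedy versus non-greedy difference steering describes can be checked with a minimal sketch. Python's `re` module is assumed here purely for illustration (the chat itself is about grep); the input string is the one from the example above, and the variable names are made up.

```python
import re

text = '"foo" bar"'

# Greedy: .* takes as much as it can, so the match runs to the last quote.
greedy = re.search(r'".*"', text).group(0)

# Non-greedy: .*? takes as little as it can, so the match ends at the next quote.
lazy = re.search(r'".*?"', text).group(0)

# A capture group returns only the text between the quotes.
inner = re.search(r'"(.*?)"', text).group(1)

print(greedy)
print(lazy)
print(inner)
```

Note that the full match includes the quote characters in both cases; only the capture group leaves them out.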
01:55:59<Doranwen>One is stripping the " and ?, that's what I asked about first. But of course when I returned to my keyboard, I forgot about that and looked at the filename extension one. *facepalm*
01:56:58<Doranwen>One at a time (I tell myself) lol
01:57:16<steering>they're pretty much all RE2, except for posix
01:57:32<jinn6>posix ftw >;P
01:58:00<Doranwen>Lol
01:58:04<steering>if its posix then youre missing at least a few backslashes in yours :P
01:58:38<jinn6>not if it's ERE
01:59:07<Doranwen>So my command this is going in is `grep -oP 'src="\K[^"]+' "$f"` (where $f is an html file in a loop of a directory full of them).
01:59:31<Doranwen>That regex needs to be switched out to handle urls like this one with a ? where I want to stop at the ? and not just the "
02:00:31<steering>go plug it into regex101.com and try stuff until it works :P
02:00:44<@JAA>You can just add a question mark in that character group then.
02:00:46<steering>you can just make it [^"?]
02:00:52<@JAA>'src="\K[^"?]+'
02:00:55<steering>ye
02:00:55<Doranwen>I've been testing stuff in regex101 trying to learn more and it's telling me that "([^"?]+) has a match 1 and a group 1 - but I suspect grep is going to go off the match? am I totally getting this wrong?
02:01:02<Doranwen>Ahh
02:01:05<steering>yeah grep doesn't care about groups
02:01:19<jinn6>there's also the option to not do everything in one regex, and process afterward ;P
02:01:38<@JAA>Literally easier to not do that though. :-P
02:01:49<Doranwen>Ah, I think I understand how this is working now.
02:02:10<jinn6>idk, would be way easier for me to first grab url, then chop off the ? if any, and then get the last portion of the path, lol
02:02:23<@JAA>It matches src=", then resets the start of the match due to \K (PCRE exclusive), then matches one or more characters that aren't " or ?.
02:02:54Doranwen is slowly improving in her understanding of this.
02:02:57<@JAA>So what you get is anything between src=" and the next " or ?, whichever comes sooner.
02:03:19<@JAA>Or the end of line, if there is no " or ?.
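
A small sketch of the negated character class JAA describes, assuming Python for illustration. Python's standard `re` module has no `\K`, so a capture group stands in for it here; the HTML snippet is hypothetical and simply wraps the two URLs mentioned earlier in the chat.

```python
import re

# Hypothetical HTML built from the two URLs mentioned in the chat.
html = (
    '<img src="https://farm3.static.flickr.com/2800/4135375742_9e9eb8cb63.jpg">\n'
    '<img src="https://static.flickr.com/39/105042573_4603a0937b.jpg?v=0">\n'
)

# src=" anchors the match; the group then takes one or more characters that
# are neither " nor ?, so it stops at whichever of the two comes first.
urls = re.findall(r'src="([^"?]+)', html)

for url in urls:
    print(url)
```

Because the class excludes both characters, no alternation or lookahead is needed to express "whichever comes first".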
02:03:20<Doranwen>When I started working on the Yahoo Groups project, regex was arcane wizardry in a foreign language I'd never heard before, lol.
02:03:41<@JAA>Yeah, no, that's accurate.
02:03:43<Doranwen>(At least, that's how it felt.)
02:04:05<jinn6>I would do inefficient pipelines myself, way quicker to come up with, lol, like: grep -oE '"[^"?]+'|rev|cut -d / -f 1|rev
02:04:33<Doranwen>Only in the last week or two am I starting to be able to do some stuff with it without copy/pasting everything from StackOverflow posts.
02:04:34<jinn6>well you'd want src=" not just " but still
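
jinn6's step-by-step idea (grab the URL, drop the query string, keep the last path segment) might look like the following sketch. It assumes Python and its standard `urllib.parse` and `posixpath` modules rather than a shell pipeline; `filename_from_tag` is a hypothetical helper name.

```python
import re
from posixpath import basename
from urllib.parse import urlsplit


def filename_from_tag(line):
    """Return the file name of the first src URL in line, or None."""
    # Step 1: grab the whole URL, query string included.
    match = re.search(r'src="([^"]+)', line)
    if match is None:
        return None
    # Step 2: keep only the path, which drops any ?query part.
    path = urlsplit(match.group(1)).path
    # Step 3: keep the last path segment.
    return basename(path)


print(filename_from_tag('<img src="https://static.flickr.com/39/105042573_4603a0937b.jpg?v=0">'))
```

Each step does one thing, which is slower to run than a single regex but easier to debug.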
02:04:45<@JAA>`grep -oE 'src="[^"?]+' | sed 's,^.*["/],,'`
02:04:52<@JAA>Now you have two problems. :-P
02:05:00<jinn6>you could sed too, lol
02:05:09<Doranwen>Lol
02:05:14<@JAA>With PCRE, it's a bit cleaner.
02:05:31<@JAA>`grep -Po 'src="\K[^"?]+' | sed 's,^.*/,,'`
02:05:39<jinn6>i haven't benchmarked it, but sed would be "more expensive" resource-wise than just cut, I've heard, but idk if two revs make the cut just as "expensive" lol
02:05:58<jinn6>obviously modern computers have any of these be a few milliseconds
02:06:19<jinn6>but if you get to high volumes of stuff, it may matter
02:06:19<@JAA>Yeah, would only matter on very large input size.
02:06:39<@JAA>And in that case, you may not want to use a regex for matching in the first place.
02:06:45<jinn6>lol yeah
02:07:20<@JAA>Either prefilter with `grep -F 'src="'` or even something entirely custom.
02:09:38<@JAA>The `sed` is unlikely to be the bottleneck unless you get very long matches.
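
The grep-then-sed pipeline, together with the fixed-string prefilter JAA suggests, can be approximated as below. This is a sketch in Python, not the commands from the chat: a plain substring test plays the role of `grep -F`, and `rsplit` plays the role of `sed 's,^.*/,,'`. The names `SRC_RE`, `filenames` and `sample` are made up for the example.

```python
import re

SRC_RE = re.compile(r'src="([^"?]+)')


def filenames(lines):
    """Yield the last path segment of every src URL found in lines."""
    for line in lines:
        # Cheap fixed-string check first, similar in spirit to grep -F.
        if 'src="' not in line:
            continue
        for url in SRC_RE.findall(line):
            # Drop everything up to and including the last slash.
            yield url.rsplit("/", 1)[-1]


sample = [
    '<p>no image here</p>',
    '<img src="https://farm3.static.flickr.com/2800/4135375742_9e9eb8cb63.jpg">',
    '<img src="https://static.flickr.com/39/105042573_4603a0937b.jpg?v=0">',
]
result = list(filenames(sample))
print(result)
```

The prefilter only pays off on large inputs where most lines contain no `src="` at all.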
02:11:00<steering>Doranwen: fyi, () in regex do two things - the usual thing where they indicate precedence/grouping, so a(b|c) means something different than ab|c, and "capture groups" where you can say "the same thing again". mostly useful in programming languages (where you can ask it to tell you what any of the capture groups contained) but you can also use them in grep patterns technically.
02:12:18<Doranwen>I'm only starting to understand some of this, but I'll get there!
02:12:25<steering>grep -P "([0-9a-f]+)-\1" (or indeed I think that would work with just grep -E too) for example would match strings with some hex, a hyphen, and then *the same string of hex repeated*
02:12:45<Doranwen>I just modified my matching for the filename grabbing bit and was able to successfully strip off everything except for the filename so I'm pleased with that progress.
02:12:51<@JAA>That use is called 'back-reference'. Weirdly, it's supported in BRE and PCRE but not ERE.
02:12:55<steering>so that's all it means when it says a "group"
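
Both roles of parentheses that steering mentions can be seen in a short sketch. Python's `re` is assumed for illustration; its `\1` back-reference behaves like the PCRE one in the `grep -P` example, and the test strings are invented.

```python
import re

# Grouping: a(b|c) means "a followed by b or c"; ab|c means "ab, or c alone".
grouped_ac = re.fullmatch(r'a(b|c)', 'ac')
ungrouped_ac = re.fullmatch(r'ab|c', 'ac')
ungrouped_c = re.fullmatch(r'ab|c', 'c')

# Capturing: \1 must repeat exactly what the first group matched.
repeat = re.compile(r'([0-9a-f]+)-\1')
same = repeat.search('beef-beef')
different = repeat.search('dead-beef')

print(grouped_ac is not None, ungrouped_ac is not None, ungrouped_c is not None)
print(same.group(0), same.group(1))
print(different)
```

In a programming language the captured text is available through calls such as `group(1)`, which is the use steering calls "mostly useful in programming languages".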
02:13:01<steering>oh, ew
02:13:14<@JAA>GNU grep will be fine with it, but it won't be portable under ERE.
02:13:20<steering>so grep "\([0-9a-f]+\)-\1" then :P
02:13:31<steering>(i hate BRE)
02:13:32<@JAA>Well, you had -P, which is PCRE and also fine.
02:13:45<steering>yeah
02:13:55<@JAA>No, that's not it either.
02:14:05<@JAA>No + in BRE.
02:14:08<steering>oh yeah
02:14:10<steering>\+
02:14:12<@JAA>And \+ is implementation-defined.
02:14:18<steering>lol really
02:14:22<steering>\{1,\}
02:14:25<@JAA>Nope
02:14:50<@JAA>Oh, yeah, no, that one does work.
02:14:58<@JAA>It's \{,3\} that doesn't exist for ??? reasons.
02:15:03<steering>lmao
02:15:17<@JAA>But not in ERE either, so...
02:15:32<@JAA>Yeah, it's a mess.
02:15:56<steering>people are sometimes confused why I've never bothered to really "learn" BRE or ERE well enough to even use them with grep.
02:16:01<steering>and... why would I? they're terrible.
02:16:30<@JAA>Well, the issue is that they're trying to standardise something after decades of proliferation.
02:16:41<@JAA>Same with AWK, sh, and everything else in POSIX.
02:17:21<@JAA>Every implementation has quirks, so they're trying to find common ground and sort out minor differences, but that doesn't mean things will actually follow all parts of the spec or won't extend it.
02:17:52<@JAA>{,n} is entirely undefined, for example.
02:18:06<steering>no no they're just terrible to use
02:18:21<@JAA>ERE is fine, I think.
02:18:28<steering>ERE is *better*
02:18:31<@JAA>Well, ERE with backrefs, like GNU's implementation.
02:18:55<steering>BRE has the problem of randomly "this is backslashed, this isn't"
02:18:58<@JAA>I don't recall whether that's just PCRE with a flag to restrict it to ERE or not.
02:19:38<steering>(although to be fair about the only thing that's *not* backslashed in BRE is [], but that's like the weirdest possible choice of one-thing-to-not-backslash)
02:20:43<@JAA>.[\*^$ are all non-ordinary characters in BRE.
02:21:16<steering>yeah, but ^$ are still literals when out of position even there, right?
02:21:44<@JAA>Yeah, there are some weird cases with those.
02:22:02<steering>really just .[\*
02:22:28<steering>. and * are obvious choices, \ is as good an escape character as any other, but then they throw in [
02:22:43<@JAA>I mean, if you argue like that, you'll have to exclude * as well, because it's matched literally when used at the start of an expression etc.
02:22:52<steering>oh really?
02:23:04<steering>sane regex engines just call that an error :P
02:23:38<@JAA>Also after a \(. And also after a \| if the implementation defines it to behave like ERE |.
02:24:03<steering>well yeah, same way as ^ could still be a metachar after each of those
02:24:19<@JAA>The one advantage of BRE is that it's relatively portable.
02:25:10<steering>my response to that is "so's pcre2 or even re2, why doesn't your platform have -P"
02:25:17<steering>:P
02:25:34<@JAA>That'll exclude lots of low-power or older systems.
02:26:34<steering>i'm sure it would, but why?
02:28:03<steering>like... BRE is portable *because* its specified in POSIX
02:28:38<steering>otherwise you wouldn't be able to assume grep takes any particular flavor of regex, right?
02:28:46<@JAA>Yeah
02:28:51<steering>(or indeed even exists)
02:29:58<@JAA>I mean, I mostly agree with you. I personally use -P all the time, just like I use Bash for my scripts even if I could theoretically write a lot of them as sh scripts. I use Bash virtually everywhere, so why would I torture myself by rewriting them sh-compatible?
02:30:02<steering>but why is POSIX the magical standard anyway? why should that be what anyone shoots for?
02:30:39<steering>realistically, it's not hard to get (or make) a grep with a -P that works well enough
02:32:50<steering>(a good, efficient one is of course another question :P)
02:36:35<@JAA>Yeah, why POSIX is a good question. It could be something different, but that's what emerged from the 'document how UNIX systems work' approach. POSIX and the Austin Group don't drive development really; they just document common ground between implementations.
02:36:48<@JAA>As for why you'd want to use it, well, precisely because it is the common ground.
02:37:40<@JAA>If you write an sh script with grep BRE or ERE, you *know* it will work on almost anything out there.
02:39:20<@JAA>But obviously, the common ground has to be technologically inferior to a more powerful implementation.
02:40:06<@JAA>And also, making sure your stuff is actually compliant with the specs isn't easy. Most implementations have some sort of extensions, sometimes even the same ones.
02:40:17<@JAA>Bash in POSIX mode still supports a lot of Bashisms.
02:40:43<@JAA>Some of those will also be present in ksh or zsh or others run in POSIX mode.
02:41:51<@JAA>IIRC, even dash goes beyond pure sh, but I haven't used it much.
03:40:42PredatorIWD258 joins
03:42:56PredatorIWD25 quits [Ping timeout: 256 seconds]
03:42:56PredatorIWD258 is now known as PredatorIWD25
04:01:57steering sighs in his gpg-agent forwarding not working suddenly for one system
05:06:39<nicolas17>>using gpg
05:30:34DogsRNice quits [Read error: Connection reset by peer]
05:34:21pabs quits [Ping timeout: 272 seconds]
05:43:31nexussfan quits [Quit: Konversation terminated!]
06:07:42pabs (pabs) joins
06:56:26<pabs>https://crimethinc.com/2026/01/07/iran-an-uprising-besieged-from-within-and-without-three-perspectives
07:58:24SootBector quits [Remote host closed the connection]
07:59:33SootBector (SootBector) joins
08:30:01SootBector quits [Remote host closed the connection]
08:30:22SootBector (SootBector) joins
09:23:22Juest quits [Read error: Connection reset by peer]
09:24:43Juest (Juest) joins
09:40:20grill (grill) joins
10:07:40nine quits [Quit: See ya!]
10:07:53nine joins
10:07:53nine quits [Changing host]
10:07:53nine (nine) joins
11:05:09Dada joins
11:09:23grill quits [Ping timeout: 272 seconds]
11:52:14Dada quits [Remote host closed the connection]
12:00:02Bleo182600722719623455222 quits [Quit: The Lounge - https://thelounge.chat]
12:02:48Bleo182600722719623455222 joins
12:59:43<justauser>Annoying Webuser has evolved a non-default nick "admin".
13:06:40<justauser>Perhaps banning (or quieting?) *!~webuser@*.mobile.vf-ua.net would stop this?
13:07:00<justauser>#archivebot-chat is fairly unusable.
15:45:50nulldata-alt1 (nulldata) joins
16:43:26grill (grill) joins
16:48:13<klea>c3manu: can you temporarily mute *!~webuser@*.mobile.vf-ua.net on #archivebot?
16:48:26klea agrees with justauser
16:50:12<justauser>And it's sockpuppetting, because why not?
16:50:43<klea>yeah, both even seem from the same ip.
16:51:29<c3manu>klea: i don’t know my way around IRC that much, and last time i trusted you blindly (with the mastodon list) it didn’t go well. i’m gonna leave that to a more experienced operator.
16:51:40<klea>oh ok, sorry
17:11:56<justauser>Yay. I can't see the ban in the list, but seemingly it worked.
17:26:41<justauser>Joined #39c3-offtopic and got instantly opped by ChanServ.
17:27:05<justauser>Apparently I have power to ban people, but not power to know why I have it in first place.
17:28:06<justauser>Everybody NickServ-identified apparently?
17:31:48lucifer_sam joins
17:39:13DogsRNice joins
17:45:44<f_>yep
17:50:06ducky quits [Ping timeout: 256 seconds]
17:50:11ducky (ducky) joins
17:55:21ducky quits [Ping timeout: 272 seconds]
17:57:25HackMii quits [Remote host closed the connection]
17:57:44HackMii (hacktheplanet) joins
17:57:51Dada joins
18:07:17ducky (ducky) joins
18:27:52<klea>justauser: JAA banned them via the control center switcher called #archiveteam
18:28:55<justauser>I didn't initially see them here either.
18:32:16<@JAA>Why is there on-topic discussion in the off-topic channel?
18:32:56nine quits [Quit: See ya!]
18:33:09nine joins
18:33:09nine quits [Changing host]
18:33:09nine (nine) joins
18:37:28<klea>sorry
18:51:08Chris5010 quits [Quit: ]
18:51:25Chris5010 (Chris5010) joins
18:58:48<justauser>I thought that me whining about Webusers is not related to archiving, thus off-topic.
19:17:41lucifer_sam quits [Ping timeout: 272 seconds]
19:25:55<klea>i guess it becomes on topic the moment taking administrative actions is thought of
19:50:16ATinySpaceMarine quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
19:50:50ATinySpaceMarine joins
19:56:57grill quits [Ping timeout: 272 seconds]
20:02:22grill (grill) joins
20:14:03superkuh quits [Ping timeout: 272 seconds]
20:15:27HP_Archivist quits [Read error: Connection reset by peer]
20:19:27superkuh joins
20:24:25HP_Archivist (HP_Archivist) joins
20:53:28superkuh_ joins
20:55:51superkuh quits [Ping timeout: 272 seconds]
21:31:12<steering>"oh sorry i thought ot stood for *on*-topic, my bad"
21:32:51<klea>NF885: i think it doesn't load [[User:Nintendofan885/common.js]], but im not sure :p
21:59:26grill quits [Ping timeout: 256 seconds]
22:00:19nexussfan (nexussfan) joins
22:01:05grill (grill) joins
22:09:34sec^nd quits [Remote host closed the connection]
22:10:01sec^nd (second) joins
22:19:37<Flashfire42>https://server8.kiska.pw/uploads/76c4f45a920a6cdd/image.png lmao what
22:20:40<@JAA>I wonder what malware this is trying to get you to run. :-)
22:21:34<nexussfan>xD
22:22:17<klea>huh, what UA are you using Flashfire42?
22:22:31<klea>or what page were you on?
22:22:38<klea>i want to try to reverse engineer it somewhat
22:23:00<nexussfan>I wonder what happens if you set your UA to BSD, then it doesn't have a malware to give to you
22:23:04<TheTechRobo>Would probably be worth archiving whatever malware it is
22:24:02<Flashfire42>Salamvancouver.com
22:24:05<klea>the domain mentioned in the image doesn't seem to be the one serving it for me when I query with UA: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:145.0) Gecko/20100101 Firefox/145.0
22:24:11<klea>i should install ua switcher on my warcprox profile
22:24:27<Flashfire42>I am using Edge
22:24:47<klea>can you go to https://ip.envs.net/ua?
22:24:54<klea>and/or tell me the URI you were trying to access?
22:25:08<Flashfire42>Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36 Edg/143.0.0.0
22:25:36<klea>thx
22:28:39<klea>Flashfire42: are you sure you visited https://salamvancouver.com/?
22:30:26<Flashfire42>I am, and strangely now it's not showing up when I click on it
22:30:41<klea>do you have something on your clipboard?
22:30:48<klea>if so can you upload it as a txt file to transfer?
22:31:03<TheTechRobo>Windows also might have a clipboard history if not?
22:31:10<klea>iirc yes
22:31:58<Flashfire42>No I dont have anything on my clipboard but if you tell me how to access clipboard history I can take a look
22:32:22<Flashfire42>Oh I dont have it turned on
22:32:27<klea>oh
22:32:30<klea>thanks anyways :)
22:32:41<Flashfire42>Sorry
22:32:49<klea>don't worry
22:44:09grill quits [Ping timeout: 272 seconds]
23:38:42<klea>google news is crap?
23:38:54<klea>https://news.google.com/topics/CAAqIQgKIhtDQkFTRGdvSUwyMHZNRE56YUhBU0FuUnlLQUFQAQ?hl=en-US&gl=US&ceid=US:en
23:39:07<Guest>klea: https://www.youtube.com/watch?v=Wac7YCOv7nk
23:39:15<Guest>> How this Captcha scam steals your data
23:40:02<klea>Guest: yes, i wanted to have AB archive the blobs from it anyways
23:40:14<Guest>there is a sponsor segment from 2:20-3:32, i recommend you get this extension if you havent already: https://sponsor.ajay.app/
23:40:21<Guest>oh ok
23:40:29<klea>i use sponsorblock through mpv iirc
23:57:52yasomi quits [Ping timeout: 256 seconds]