00:06:30sralracer quits [Quit: Ooops, wrong browser tab.]
00:58:17IRC2DC joins
01:02:00<pabs>!remind arkiver 40d did you fix the pabs account problem?
01:02:01<eggdrop>[remind] ok, i'll remind arkiver at 2025-01-15T01:02:01Z
02:26:38<TheTechRobo>I'm guessing there isn't any sort of firehose for the CDX API, right?
02:26:53TheTechRobo is trying to figure out how to keep his index up to date
02:46:32<nicolas17>index of what?
02:46:37<nicolas17>are you watching specific URLs or domains?
04:00:32<TheTechRobo>nicolas17: see my messages from yesterday; I'm trying to build an index of YouTube oembed metadata URLs (and possibly others in the future). oembed URLs can have multiple formats that I've seen and the URL format in the ?url= query parameter appears to vary as well. I'm guessing the `filter` parameter will be too inefficient for this (not that I can get it to work; I keep getting HTTP 400), hence my idea to build an index.
04:02:05<TheTechRobo>URLs generally look like http://www.youtube.com/oembed?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DETMvpdfS1rU%26&format=json
04:02:14<TheTechRobo>however I am also seeing some outliers like https://youtube.com/oembed?url
04:02:14<TheTechRobo>=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Du3r_FY9RfRk%26feature%3Demb_title%26ab_channel%3D20buckspinl
04:02:14<TheTechRobo>abel&format=json%5C
04:02:19<TheTechRobo>(thanks TheLounge)
04:03:19<TheTechRobo>I guess one idea would be to only index the outliers and to do a CDX API call each time for the usual format.
04:05:21<@JAA>The Lounge--
04:05:22<eggdrop>[karma] 'The Lounge' now has -47 karma!
04:05:52<@JAA>I doubt there's a firehose-like thing for CDX.
04:07:16<TheTechRobo>I figured, yeah
04:07:48<TheTechRobo>CDX sorts query parameters, right?
04:10:32<@JAA>Yeah
04:21:31<datechnoman>The sheer amount of data and bandwidth to stream it would be insane lol
06:01:33atphoenix quits [Read error: Connection reset by peer]
06:04:26atphoenix (atphoenix) joins
06:24:46<pokechu22>How does wpull work when <a href="image.png"><img src="image.png" /></a> appears in an offsite page? Will it always download image.png as an embedded image, or is it random as image.png might first be seen as a link in an offsite page?
06:25:29<@JAA>Did you mean #archiveteam-dev ?
07:15:10<pokechu22>Yes
08:49:48BornOn420 quits [Remote host closed the connection]
08:50:16BornOn420 (BornOn420) joins
09:15:22nulldata quits [Quit: So long and thanks for all the fish!]
09:16:52nulldata (nulldata) joins
09:40:16BornOn420 quits [Remote host closed the connection]
09:40:56BornOn420 (BornOn420) joins
09:59:49driib quits [Quit: The Lounge - https://thelounge.chat]
10:00:09driib (driib) joins
11:09:33sralracer (sralracer) joins
13:25:04th3z0l4 joins
15:28:55BornOn420 quits [Remote host closed the connection]
15:30:22BornOn420 (BornOn420) joins
15:50:27BornOn420 quits [Remote host closed the connection]
15:50:51BornOn420 (BornOn420) joins
16:13:37BornOn420 quits [Remote host closed the connection]
16:14:09BornOn420 (BornOn420) joins
16:32:22th3z0l4 quits [Ping timeout: 260 seconds]
16:32:44th3z0l4 joins
17:58:41that_lurker quits [Remote host closed the connection]
17:58:46that_lurker (that_lurker) joins
19:09:00HP_Archivist quits [Read error: Connection reset by peer]
19:53:38imer quits [Quit: Oh no]
19:54:42imer (imer) joins
20:03:44imer quits [Changing host]
20:03:44imer (imer) joins
20:09:58Jinx-Pi2W joins
20:17:19<Jinx-Pi2W>is signing up for a new account at archive.org working for anyone else? or is it disabled for some reason?
20:18:27<Jinx-Pi2W>i can't seem to sign up at https://archive.org/account/signup
20:18:53<Jinx-Pi2W>there is a message on top of the --Or-- bar that says "The Google library failed to load. Please refresh and try again"
20:19:09<Jinx-Pi2W>but --Or-- seems to imply the bottom should still work?
20:19:23<@JAA>That message should only be relevant if you want to use a Google account, yeah.
20:19:58<@JAA>Hmm, or do they employ reCAPTCHA now?
20:22:10<@JAA>It looks like they might be blocking the reCAPTCHA script with the Content Security Policy.
20:43:00<Jinx-Pi2W>so... it's not my browser blocking it but their own policy??
20:44:20<nicolas17>l o l
20:44:54<@JAA>It's very secure. Not a single new account has been compromised since the measure was taken!
20:46:34<nicolas17>how long has this been broken
20:47:40<Jinx-Pi2W>lol
20:49:11<Jinx-Pi2W>from searching it seems possibly for months :(
22:00:51sralracer quits [Quit: Ooops, wrong browser tab.]