00:06:30 | | sralracer quits [Quit: Ooops, wrong browser tab.] |
00:58:17 | | IRC2DC joins |
01:02:00 | <pabs> | !remind arkiver 40d did you fix the pabs account problem? |
01:02:01 | <eggdrop> | [remind] ok, i'll remind arkiver at 2025-01-15T01:02:01Z |
02:26:38 | <TheTechRobo> | I'm guessing there isn't any sort of firehose for the CDX API, right? |
02:26:53 | | TheTechRobo is trying to figure out how to keep his index up to date |
02:46:32 | <nicolas17> | index of what? |
02:46:37 | <nicolas17> | are you watching specific URLs or domains? |
04:00:32 | <TheTechRobo> | nicolas17: see my messages from yesterday; I'm trying to build an index of YouTube oembed metadata URLs (and possibly others in the future). oembed URLs can have multiple formats that I've seen and the URL format in the ?url= query parameter appears to vary as well. I'm guessing the `filter` parameter will be too inefficient for this (not that I |
04:00:32 | <TheTechRobo> | can get it to work; I keep getting HTTP 400), hence my idea to build an index. |
04:02:05 | <TheTechRobo> | URLs generally look like http://www.youtube.com/oembed?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DETMvpdfS1rU%26&format=json |
04:02:14 | <TheTechRobo> | however I am also seeing some outliers like https://youtube.com/oembed?url |
04:02:14 | <TheTechRobo> | =https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Du3r_FY9RfRk%26feature%3Demb_title%26ab_channel%3D20buckspinl |
04:02:14 | <TheTechRobo> | abel&format=json%5C |
04:02:19 | <TheTechRobo> | (thanks TheLounge) |
04:03:19 | <TheTechRobo> | I guess one idea would be to only index the outliers and to do a CDX API call each time for the usual format. |
04:05:21 | <@JAA> | The Lounge-- |
04:05:22 | <eggdrop> | [karma] 'The Lounge' now has -47 karma! |
04:05:52 | <@JAA> | I doubt there's a firehose-like thing for CDX. |
04:07:16 | <TheTechRobo> | I figured, yeah |
04:07:48 | <TheTechRobo> | CDX sorts query parameters, right? |
04:10:32 | <@JAA> | Yeah |
04:21:31 | <datechnoman> | The sheer amount of data and bandwidth to stream it would be insane lol |
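
Editor's note: the indexing idea TheTechRobo describes above (group oembed captures by video, regardless of which URL variant was archived) can be sketched roughly as follows. This is a minimal illustration, not TheTechRobo's actual code: the function names `video_id_from_oembed` and `build_index` are hypothetical, and it assumes the inner `?url=` value is always a `watch?v=...` URL, which is all the log shows. Other inner formats (short links, for instance) are simply skipped.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def video_id_from_oembed(oembed_url):
    """Return the YouTube video ID carried in an oembed URL, or None.

    Assumes (as in the examples from the log) that the oembed URL has a
    `url` query parameter whose decoded value is a watch URL with `v=`.
    """
    outer_query = parse_qs(urlsplit(oembed_url).query)
    inner_values = outer_query.get("url")
    if not inner_values:
        return None
    # parse_qs already percent-decodes the value, so the inner URL can be
    # split again to reach its own query string.
    inner_query = parse_qs(urlsplit(inner_values[0]).query)
    ids = inner_query.get("v")
    return ids[0] if ids else None


def build_index(captured_urls):
    """Map video ID -> list of captured oembed URLs that refer to it.

    `captured_urls` is any iterable of original URLs, e.g. a column pulled
    from CDX results; URLs without a recognisable video ID are ignored.
    """
    index = defaultdict(list)
    for url in captured_urls:
        video_id = video_id_from_oembed(url)
        if video_id is not None:
            index[video_id].append(url)
    return dict(index)
```

Keying on the video ID sidesteps differences in host (`www.youtube.com` vs `youtube.com`), extra inner parameters such as `feature` or `ab_channel`, and trailing junk after `format=json`, since none of those take part in the lookup.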
06:01:33 | | atphoenix quits [Read error: Connection reset by peer] |
06:04:26 | | atphoenix (atphoenix) joins |
06:24:46 | <pokechu22> | How does wpull work when <a href="image.png"><img src="image.png" /></a> appears in an offsite page? Will it always download image.png as an embedded image, or is it random as image.png might first be seen as a link in an offsite page? |
06:25:29 | <@JAA> | Did you mean #archiveteam-dev ? |
07:15:10 | <pokechu22> | Yes |
08:49:48 | | BornOn420 quits [Remote host closed the connection] |
08:50:16 | | BornOn420 (BornOn420) joins |
09:15:22 | | nulldata quits [Quit: So long and thanks for all the fish!] |
09:16:52 | | nulldata (nulldata) joins |
09:40:16 | | BornOn420 quits [Remote host closed the connection] |
09:40:56 | | BornOn420 (BornOn420) joins |
09:59:49 | | driib quits [Quit: The Lounge - https://thelounge.chat] |
10:00:09 | | driib (driib) joins |
11:09:33 | | sralracer (sralracer) joins |
13:25:04 | | th3z0l4 joins |
15:28:55 | | BornOn420 quits [Remote host closed the connection] |
15:30:22 | | BornOn420 (BornOn420) joins |
15:50:27 | | BornOn420 quits [Remote host closed the connection] |
15:50:51 | | BornOn420 (BornOn420) joins |
16:13:37 | | BornOn420 quits [Remote host closed the connection] |
16:14:09 | | BornOn420 (BornOn420) joins |
16:32:22 | | th3z0l4 quits [Ping timeout: 260 seconds] |
16:32:44 | | th3z0l4 joins |
17:58:41 | | that_lurker quits [Remote host closed the connection] |
17:58:46 | | that_lurker (that_lurker) joins |
19:09:00 | | HP_Archivist quits [Read error: Connection reset by peer] |
19:53:38 | | imer quits [Quit: Oh no] |
19:54:42 | | imer (imer) joins |
20:03:44 | | imer quits [Changing host] |
20:03:44 | | imer (imer) joins |
20:09:58 | | Jinx-Pi2W joins |
20:17:19 | <Jinx-Pi2W> | is signing up for a new account at archive.org working for anyone else? or is it disabled for some reason? |
20:18:27 | <Jinx-Pi2W> | i can't seem to sign-up at https://archive.org/account/signup |
20:18:53 | <Jinx-Pi2W> | there is a message on top of the --Or-- bar that says "The Google library failed to load. Please refresh and try again" |
20:19:09 | <Jinx-Pi2W> | but --Or-- seems to imply the bottom should still work? |
20:19:23 | <@JAA> | That message should only be relevant if you want to use a Google account, yeah. |
20:19:58 | <@JAA> | Hmm, or do they employ reCAPTCHA now? |
20:22:10 | <@JAA> | It looks like they might be blocking the reCAPTCHA script with the Content Security Policy. |
20:43:00 | <Jinx-Pi2W> | so... it's not my browser blocking it but their own policy?? |
20:44:20 | <nicolas17> | l o l |
20:44:54 | <@JAA> | It's very secure. Not a single new account has been compromised since the measure was taken! |
20:46:34 | <nicolas17> | how long has this been broken |
20:47:40 | <Jinx-Pi2W> | lol |
20:49:11 | <Jinx-Pi2W> | from searching it seems possibly for months :( |
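
Editor's note: JAA's guess above is that the site's own Content Security Policy blocks the reCAPTCHA script. A rough way to check that kind of thing is to test a script host against the `script-src` source list of a policy header. The sketch below is deliberately simplified and is not a full CSP implementation: it only compares host names (with a leading `*.` wildcard), ignores schemes, ports, paths, nonces, hashes, keywords such as `'self'`, scheme-only sources such as `https:`, and `script-src-elem`. The function names are hypothetical, and any policy string you feed it must come from the real response header, which this log does not quote.

```python
def parse_csp(header_value):
    """Parse a Content-Security-Policy header value into {directive: [sources]}."""
    directives = {}
    for part in header_value.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        # Directive names are case-insensitive; a repeated directive is
        # ignored, so the first occurrence wins.
        directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def script_host_allowed(header_value, host):
    """Return True if `host` matches a host source in script-src.

    Falls back to default-src when script-src is absent, and treats a
    policy with neither directive as not restricting scripts at all.
    """
    directives = parse_csp(header_value)
    sources = directives.get("script-src", directives.get("default-src"))
    if sources is None:
        return True
    for source in sources:
        if source == "*":
            return True
        if source.startswith("'"):
            continue  # keyword, nonce, or hash source: not handled here
        # Strip an optional scheme and path, leaving the host part.
        allowed = source.split("://", 1)[-1].split("/", 1)[0]
        if allowed.startswith("*."):
            if host.endswith(allowed[1:]):
                return True
        elif allowed == host:
            return True
    return False
```

For example, with a made-up policy of `script-src 'self' https://archive.org`, a call like `script_host_allowed(policy, "www.google.com")` returns False, which is the situation JAA suspects for the reCAPTCHA loader.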
22:00:51 | | sralracer quits [Quit: Ooops, wrong browser tab.] |