00:00:41lennier1 quits [Client Quit]
00:35:05<TheTechRobo>JAA, arkiver: Sounds good!
00:50:24<h2ibot>JustAnotherArchivist edited Issuu (+1697): https://wiki.archiveteam.org/?diff=49478&oldid=30144
01:03:00Pingerfowder quits [Client Quit]
01:03:12Pingerfowder (Pingerfowder) joins
01:05:01monoxane quits [Ping timeout: 252 seconds]
01:05:14monoxane (monoxane) joins
01:42:33useretail joins
02:14:46<pabs>once the site comes back online, clarkesworldmagazine.com should probably get archived, as they are inundated with ChatGPT spam https://twitter.com/clarkesworld/status/1627711728245960704 https://news.ycombinator.com/item?id=34887478
02:17:11<pabs>hmm, https works but http doesn't...
02:18:27<pabs>oh, now their entire site is 403 with some sort of firewall enabled
02:18:42<pabs>aw "Your IP address was temporarily blocked by our IDS."
02:19:06<pabs>probably because I did wget without a U-A
02:20:18qwertyasdfuiopghjkl quits [Remote host closed the connection]
02:24:39<pabs>now they are back, going to AB
02:26:53fuzzy8021 quits [Read error: Connection reset by peer]
02:27:13fuzzy8021 (fuzzy8021) joins
02:27:18qwertyasdfuiopghjkl joins
02:29:28ArchivalEfforts quits [Read error: Connection reset by peer]
02:30:16sidpatchy quits [Ping timeout: 252 seconds]
02:30:30ArchivalEfforts joins
02:30:38anarcat quits [Ping timeout: 252 seconds]
02:31:01sidpatchy joins
02:31:11anarcat (anarcat) joins
02:34:01<TheTechRobo>Maybe we should add another "Project status" thing for when a service is partially going down, cutting functionality, or adding paywalls (like with Issuu)? "Special case" to me sounds like "the website is perfectly fine but we're archiving it anyway"
02:44:17leo60228- quits [Quit: ZNC 1.8.2 - https://znc.in]
02:44:39leo60228 (leo60228) joins
02:45:12drin joins
02:45:40geezabiscuit quits [Ping timeout: 252 seconds]
02:46:03drin is now known as geezabiscuit
02:51:00fishingforsoup_ quits [Read error: Connection reset by peer]
02:51:22fishingforsoup_ joins
02:51:26fuzzy8021 quits [Read error: Connection reset by peer]
02:51:57ave quits [Quit: Ping timeout (120 seconds)]
02:51:57lun4 quits [Quit: Ping timeout (120 seconds)]
02:53:06nepeat quits [Quit: ZNC - https://znc.in]
02:53:22fuzzy8021 (fuzzy8021) joins
02:54:30ave (ave) joins
02:56:49lun4 (lun4) joins
02:59:31nepeat (nepeat) joins
03:05:52lennier1 (lennier1) joins
03:13:03fishingforsoup__ joins
03:16:28fishingforsoup_ quits [Ping timeout: 252 seconds]
03:21:07ell quits [Client Quit]
03:22:12ell (ell) joins
03:43:49Ketchup901 quits [Client Quit]
03:47:04Ketchup901 (Ketchup901) joins
04:41:53Island quits [Read error: Connection reset by peer]
04:43:30user_ quits [Remote host closed the connection]
04:50:18umgr036 joins
05:13:44useretail_ joins
05:13:46wyatt8750 joins
05:13:46useretail quits [Remote host closed the connection]
05:13:46wyatt8740 quits [Client Quit]
06:21:15Arcorann (Arcorann) joins
07:40:22hitgrr8 joins
07:46:26<pabs>https://techcrunch.com/2023/02/21/soylent-acquired-starco-brands-nutrition/
08:35:06treora quits [Remote host closed the connection]
08:35:06treora joins
09:26:11Gereon6200 quits [Client Quit]
09:26:11useretail_ quits [Remote host closed the connection]
09:26:11ell quits [Client Quit]
09:26:11Arcorann quits [Remote host closed the connection]
09:26:11Gereon62005 (Gereon) joins
09:26:11Gereon62005 is now known as Gereon6200
09:26:17treora quits [Client Quit]
09:26:18qwertyasdfuiopghjkl quits [Client Quit]
09:27:17treora joins
09:30:44Arcorann (Arcorann) joins
09:48:37raxxy-137409 quits [Ping timeout: 252 seconds]
09:48:40raxxy-137409 joins
10:21:18LeGoupil joins
11:56:46LeGoupil quits [Ping timeout: 252 seconds]
12:49:36LeGoupil joins
12:51:13Arcorann quits [Ping timeout: 252 seconds]
13:08:58<audrooku|m>It's possible to grab warcs of individual pages directly from the wayback machine, right? How do you do that?
13:56:50HP_Archivist (HP_Archivist) joins
14:58:55HP_Archivist quits [Client Quit]
15:35:17Island joins
15:57:29<@arkiver>kaz: HCross: are you able to reach rewby regarding issuu?
16:18:47<hitgrr8>Is there a Flash archive for website banners and such that were on websites during the early days?
16:19:13<hitgrr8>I hate that archive.org wasn't able to archive flash files in older websites :(
16:44:11LeGoupil quits [Client Quit]
16:52:42sonick quits [Client Quit]
16:52:43nstrom joins
17:30:04Gereon6200 quits [Ping timeout: 252 seconds]
17:33:01Gereon6200 (Gereon) joins
17:46:42sec^nd quits [Remote host closed the connection]
17:46:51charles joins
18:03:37lennier1 quits [Ping timeout: 252 seconds]
18:04:51lennier1 (lennier1) joins
18:05:38lennier2 joins
18:06:17umgr036 quits [Remote host closed the connection]
18:09:18lennier1 quits [Ping timeout: 252 seconds]
18:09:20lennier2 is now known as lennier1
18:11:09umgr036 joins
18:41:14fl0w joins
18:43:13fl0w_ quits [Ping timeout: 252 seconds]
19:19:53nstrom quits [Client Quit]
19:23:55wyatt8750 quits [Ping timeout: 252 seconds]
19:25:10wyatt8740 joins
19:51:43<@JAA>audrooku|m: No, that's not generally possible. A lot of data isn't publicly accessible.
19:52:23<@JAA>But you can see the item and WARC name in the response headers, and if it's in an open collection, you can access it that way. You'd still need to figure out where in the WARC the record is using the CDX.
20:24:03dan_a quits [Quit: webootsesit]
20:25:40dan_a (dan_a) joins
20:31:49qwertyasdfuiopghjkl joins
20:40:12<audrooku|m>..oh ;-;
20:40:44<audrooku|m>You can view the original webpage without the urls being changed to wayback versions at least though, right?
20:41:22<pokechu22>Yes
20:42:28<@arkiver>yes
20:42:37<pokechu22>https://web.archive.org/web/20230222093528/https://example.com/ for wayback toolbar, https://web.archive.org/web/20230222093528if_/https://example.com/ for no wayback toolbar but still links rewritten ("f" = frame, this is embedded by the other version), https://web.archive.org/web/20230222093528id_/https://example.com/ or
20:42:39<pokechu22>https://web.archive.org/web/20230222093528im_/https://example.com/ for no changing ("d" = data, "im" = image), there's a few other variants too
20:42:55<@arkiver>(was just about to write that, what pokechu22 says yes)
20:43:20<@arkiver>id_ is the general way of getting the original data
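The modifier scheme described above can be sketched as a small URL builder. This is a minimal illustration, not an official client: the function name `wayback_url` and the `MODIFIERS` table are hypothetical, and only the four variants mentioned in the chat are listed (there are others).

```python
# Hypothetical helper: builds Wayback Machine snapshot URLs using the
# modifiers mentioned in the chat. The descriptions paraphrase the chat.
MODIFIERS = {
    "": "wayback toolbar, links rewritten",
    "if_": "no toolbar, links still rewritten (frame)",
    "id_": "original data, nothing changed",
    "im_": "original data, intended for images",
}


def wayback_url(timestamp: str, url: str, modifier: str = "") -> str:
    """Return the snapshot URL for `url` at `timestamp` (YYYYMMDDhhmmss).

    `modifier` is appended directly to the timestamp, e.g. "id_".
    Other modifiers exist, so unknown values are not rejected here.
    """
    return f"https://web.archive.org/web/{timestamp}{modifier}/{url}"


if __name__ == "__main__":
    for mod, meaning in MODIFIERS.items():
        print(wayback_url("20230222093528", "https://example.com/", mod), "#", meaning)
```

For scraping the page as it was originally served, the `id_` form is the one to request, per arkiver's note.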
20:43:43<audrooku|m>Ooh, ok
20:45:11<pokechu22>Note that the link rewriting is important for things like CSS and images; compare https://web.archive.org/web/20230222092154if_/https://en.wikipedia.org/wiki/Main_Page with https://web.archive.org/web/20230222092154id_/https://en.wikipedia.org/wiki/Main_Page (I'm not 100% sure why *any* images work tbh)
20:45:50<pokechu22>Oh, right, also relative links - <a href="/wiki/Wikipedia" title="Wikipedia">Wikipedia</a> isn't going to work nicely
20:47:34<@JAA>The WBM does some weird magic for absolute URLs. That's why the images get redirected to snapshots.
20:47:55<@arkiver>*weird magic*
20:48:06<@arkiver>WBM is voodoo basically
20:48:57<@JAA>E.g. https://web.archive.org/static/images/icons/wikipedia.png redirects to https://web.archive.org/web/20230222092154/https://en.wikipedia.org/static/images/icons/wikipedia.png
20:49:14<@arkiver>oh *that* weird magic
20:49:18<@JAA>It's not referrer-based either.
20:49:24<pokechu22>For relative URLs, not absolute ones, right? Absolute ones are blocked by the content security policy or something like that?
20:50:36<@JAA>I mean relative URLs that are relative to the document root, yeah.
20:50:51<@JAA>My brain mixed that up with absolute paths. :-)
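The root-relative link problem discussed above can be reproduced with plain URL resolution. A minimal sketch using Python's standard `urllib.parse.urljoin`; the function name `resolve_href` is hypothetical, and this only shows how a browser would resolve the href, not what the Wayback Machine does with the resulting request.

```python
from urllib.parse import urljoin

# Unrewritten (id_) snapshot URL and the original page URL from the chat.
SNAPSHOT = (
    "https://web.archive.org/web/20230222092154id_/"
    "https://en.wikipedia.org/wiki/Main_Page"
)
ORIGINAL = "https://en.wikipedia.org/wiki/Main_Page"


def resolve_href(base: str, href: str) -> str:
    """Resolve `href` against `base` the way a browser would."""
    return urljoin(base, href)


if __name__ == "__main__":
    href = "/wiki/Wikipedia"  # relative to the document root
    # On the live site the link stays on en.wikipedia.org.
    print(resolve_href(ORIGINAL, href))
    # In the unrewritten snapshot it resolves against web.archive.org,
    # dropping the /web/<timestamp>/ prefix and the original host.
    print(resolve_href(SNAPSHOT, href))
```

The second result has the same shape as the `https://web.archive.org/static/images/icons/wikipedia.png` example JAA gives, which is the kind of request the Wayback Machine then redirects to a snapshot.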
20:56:34<@JAA>Despite CSP, I've had WBM snapshots try to access external stuff before, by the way. uMatrix to the rescue.
21:21:21sec^nd (second) joins
21:24:09<audrooku|m>pokechu22: > Note that the link rewriting is important for things like CSS and images
21:24:09<audrooku|m>Yes I understand, I'm interested in scraping some metadata from page archives
22:24:48BlueMaxima joins
22:33:10p65 joins
22:33:14p65 quits [Remote host closed the connection]
22:33:26<h2ibot>Arkiver uploaded File:Issuu-icon.png: https://wiki.archiveteam.org/?title=File%3AIssuu-icon.png
22:34:00hitgrr8 quits [Client Quit]
23:04:58lennier1 quits [Client Quit]
23:06:06lennier1 (lennier1) joins