| 00:00:41 | | lennier1 quits [Client Quit] |
| 00:35:05 | <TheTechRobo> | JAA, arkiver: Sounds good! |
| 00:50:24 | <h2ibot> | JustAnotherArchivist edited Issuu (+1697): https://wiki.archiveteam.org/?diff=49478&oldid=30144 |
| 01:03:00 | | Pingerfowder quits [Client Quit] |
| 01:03:12 | | Pingerfowder (Pingerfowder) joins |
| 01:05:01 | | monoxane quits [Ping timeout: 252 seconds] |
| 01:05:14 | | monoxane (monoxane) joins |
| 01:42:33 | | useretail joins |
| 02:14:46 | <pabs> | once the site comes back online, clarkesworldmagazine.com should probably get archived, as they are inundated with ChatGPT spam https://twitter.com/clarkesworld/status/1627711728245960704 https://news.ycombinator.com/item?id=34887478 |
| 02:17:11 | <pabs> | hmm, https works but http doesn't... |
| 02:18:27 | <pabs> | oh, now their entire site is 403 with some sort of firewall enabled |
| 02:18:42 | <pabs> | aw "Your IP address was temporarily blocked by our IDS." |
| 02:19:06 | <pabs> | probably because I did wget without a U-A |
| 02:20:18 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 02:24:39 | <pabs> | now they are back, going to AB |
| 02:26:53 | | fuzzy8021 quits [Read error: Connection reset by peer] |
| 02:27:13 | | fuzzy8021 (fuzzy8021) joins |
| 02:27:18 | | qwertyasdfuiopghjkl joins |
| 02:29:28 | | ArchivalEfforts quits [Read error: Connection reset by peer] |
| 02:30:16 | | sidpatchy quits [Ping timeout: 252 seconds] |
| 02:30:30 | | ArchivalEfforts joins |
| 02:30:38 | | anarcat quits [Ping timeout: 252 seconds] |
| 02:31:01 | | sidpatchy joins |
| 02:31:11 | | anarcat (anarcat) joins |
| 02:34:01 | <TheTechRobo> | Maybe we should add another "Project status" thing for when a service is partially going down, cutting functionality, or adding paywalls (like with Issuu)? "Special case" to me sounds like "the website is perfectly fine but we're archiving it anyway" |
| 02:44:17 | | leo60228- quits [Quit: ZNC 1.8.2 - https://znc.in] |
| 02:44:39 | | leo60228 (leo60228) joins |
| 02:45:12 | | drin joins |
| 02:45:40 | | geezabiscuit quits [Ping timeout: 252 seconds] |
| 02:46:03 | | drin is now known as geezabiscuit |
| 02:51:00 | | fishingforsoup_ quits [Read error: Connection reset by peer] |
| 02:51:22 | | fishingforsoup_ joins |
| 02:51:26 | | fuzzy8021 quits [Read error: Connection reset by peer] |
| 02:51:57 | | ave quits [Quit: Ping timeout (120 seconds)] |
| 02:51:57 | | lun4 quits [Quit: Ping timeout (120 seconds)] |
| 02:53:06 | | nepeat quits [Quit: ZNC - https://znc.in] |
| 02:53:22 | | fuzzy8021 (fuzzy8021) joins |
| 02:54:30 | | ave (ave) joins |
| 02:56:49 | | lun4 (lun4) joins |
| 02:59:31 | | nepeat (nepeat) joins |
| 03:05:52 | | lennier1 (lennier1) joins |
| 03:13:03 | | fishingforsoup__ joins |
| 03:16:28 | | fishingforsoup_ quits [Ping timeout: 252 seconds] |
| 03:21:07 | | ell quits [Client Quit] |
| 03:22:12 | | ell (ell) joins |
| 03:43:49 | | Ketchup901 quits [Client Quit] |
| 03:47:04 | | Ketchup901 (Ketchup901) joins |
| 04:41:53 | | Island quits [Read error: Connection reset by peer] |
| 04:43:30 | | user_ quits [Remote host closed the connection] |
| 04:50:18 | | umgr036 joins |
| 05:13:44 | | useretail_ joins |
| 05:13:46 | | wyatt8750 joins |
| 05:13:46 | | useretail quits [Remote host closed the connection] |
| 05:13:46 | | wyatt8740 quits [Client Quit] |
| 06:21:15 | | Arcorann (Arcorann) joins |
| 07:40:22 | | hitgrr8 joins |
| 07:46:26 | <pabs> | https://techcrunch.com/2023/02/21/soylent-acquired-starco-brands-nutrition/ |
| 08:35:06 | | treora quits [Remote host closed the connection] |
| 08:35:06 | | treora joins |
| 09:26:11 | | Gereon6200 quits [Client Quit] |
| 09:26:11 | | useretail_ quits [Remote host closed the connection] |
| 09:26:11 | | ell quits [Client Quit] |
| 09:26:11 | | Arcorann quits [Remote host closed the connection] |
| 09:26:11 | | Gereon62005 (Gereon) joins |
| 09:26:11 | | Gereon62005 is now known as Gereon6200 |
| 09:26:17 | | treora quits [Client Quit] |
| 09:26:18 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 09:27:17 | | treora joins |
| 09:30:44 | | Arcorann (Arcorann) joins |
| 09:48:37 | | raxxy-137409 quits [Ping timeout: 252 seconds] |
| 09:48:40 | | raxxy-137409 joins |
| 10:21:18 | | LeGoupil joins |
| 11:56:46 | | LeGoupil quits [Ping timeout: 252 seconds] |
| 12:49:36 | | LeGoupil joins |
| 12:51:13 | | Arcorann quits [Ping timeout: 252 seconds] |
| 13:08:58 | <audrooku|m> | It's possible to grab warcs of individual pages directly from the wayback machine, right? How do you do that? |
| 13:56:50 | | HP_Archivist (HP_Archivist) joins |
| 14:58:55 | | HP_Archivist quits [Client Quit] |
| 15:35:17 | | Island joins |
| 15:57:29 | <@arkiver> | kaz: HCross: are you able to reach rewby regarding issuu? |
| 16:18:47 | <hitgrr8> | Is there a Flash archive for website banners and such that were on websites during the early days? |
| 16:19:13 | <hitgrr8> | I hate that archive.org wasn't able to archive flash files in older websites :( |
| 16:44:11 | | LeGoupil quits [Client Quit] |
| 16:52:42 | | sonick quits [Client Quit] |
| 16:52:43 | | nstrom joins |
| 17:30:04 | | Gereon6200 quits [Ping timeout: 252 seconds] |
| 17:33:01 | | Gereon6200 (Gereon) joins |
| 17:46:42 | | sec^nd quits [Remote host closed the connection] |
| 17:46:51 | | charles joins |
| 18:03:37 | | lennier1 quits [Ping timeout: 252 seconds] |
| 18:04:51 | | lennier1 (lennier1) joins |
| 18:05:38 | | lennier2 joins |
| 18:06:17 | | umgr036 quits [Remote host closed the connection] |
| 18:09:18 | | lennier1 quits [Ping timeout: 252 seconds] |
| 18:09:20 | | lennier2 is now known as lennier1 |
| 18:11:09 | | umgr036 joins |
| 18:41:14 | | fl0w joins |
| 18:43:13 | | fl0w_ quits [Ping timeout: 252 seconds] |
| 19:19:53 | | nstrom quits [Client Quit] |
| 19:23:55 | | wyatt8750 quits [Ping timeout: 252 seconds] |
| 19:25:10 | | wyatt8740 joins |
| 19:51:43 | <@JAA> | audrooku|m: No, that's not generally possible. A lot of data isn't publicly accessible. |
| 19:52:23 | <@JAA> | But you can see the item and WARC name in the response headers, and if it's in an open collection, you can access it that way. You'd still need to figure out where in the WARC the record is using the CDX. |
| 20:24:03 | | dan_a quits [Quit: webootsesit] |
| 20:25:40 | | dan_a (dan_a) joins |
| 20:31:49 | | qwertyasdfuiopghjkl joins |
| 20:40:12 | <audrooku|m> | ..oh ;-; |
| 20:40:44 | <audrooku|m> | You can view the original webpage without the urls being changed to wayback versions at least though, right? |
| 20:41:22 | <pokechu22> | Yes |
| 20:42:28 | <@arkiver> | yes |
| 20:42:37 | <pokechu22> | https://web.archive.org/web/20230222093528/https://example.com/ for wayback toolbar, https://web.archive.org/web/20230222093528if_/https://example.com/ for no wayback toolbar but still links rewritten ("f" = frame, this is embedded by the other version), https://web.archive.org/web/20230222093528id_/https://example.com/ or |
| 20:42:39 | <pokechu22> | https://web.archive.org/web/20230222093528im_/https://example.com/ for no changing ("d" = data, "im" = image), there's a few other variants too |
| 20:42:55 | <@arkiver> | (was just about to write that, what pokechu22 says yes) |
| 20:43:20 | <@arkiver> | id_ is the general way of getting the original data |
| 20:43:43 | <audrooku|m> | Ooh, ok |
| 20:45:11 | <pokechu22> | Note that the link rewriting is important for things like CSS and images; compare https://web.archive.org/web/20230222092154if_/https://en.wikipedia.org/wiki/Main_Page with https://web.archive.org/web/20230222092154id_/https://en.wikipedia.org/wiki/Main_Page (I'm not 100% sure why *any* images work tbh) |
| 20:45:50 | <pokechu22> | Oh, right, also relative links - <a href="/wiki/Wikipedia" title="Wikipedia">Wikipedia</a> isn't going to work nicely |
| 20:47:34 | <@JAA> | The WBM does some weird magic for absolute URLs. That's why the images get redirected to snapshots. |
| 20:47:55 | <@arkiver> | *weird magic* |
| 20:48:06 | <@arkiver> | WBM is voodoo basically |
| 20:48:57 | <@JAA> | E.g. https://web.archive.org/static/images/icons/wikipedia.png redirects to https://web.archive.org/web/20230222092154/https://en.wikipedia.org/static/images/icons/wikipedia.png |
| 20:49:14 | <@arkiver> | oh *that* weird magic |
| 20:49:18 | <@JAA> | It's not referrer-based either. |
| 20:49:24 | <pokechu22> | For relative URLs, not absolute ones, right? Absolute ones are blocked by the content security policy or something like that? |
| 20:50:36 | <@JAA> | I mean relative URLs that are relative to the document root, yeah. |
| 20:50:51 | <@JAA> | My brain mixed that up with absolute paths. :-) |
| 20:56:34 | <@JAA> | Despite CSP, I've had WBM snapshots try to access external stuff before, by the way. uMatrix to the rescue. |
| 21:21:21 | | sec^nd (second) joins |
| 21:24:09 | <audrooku|m> | pokechu22: "Note that the link rewriting is important for things like CSS and images" |
| 21:24:09 | <audrooku|m> | Yes, I understand. I'm interested in scraping some metadata from page archives |
| 22:24:48 | | BlueMaxima joins |
| 22:33:10 | | p65 joins |
| 22:33:14 | | p65 quits [Remote host closed the connection] |
| 22:33:26 | <h2ibot> | Arkiver uploaded File:Issuu-icon.png: https://wiki.archiveteam.org/?title=File%3AIssuu-icon.png |
| 22:34:00 | | hitgrr8 quits [Client Quit] |
| 23:04:58 | | lennier1 quits [Client Quit] |
| 23:06:06 | | lennier1 (lennier1) joins |
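
The Wayback Machine URL variants discussed around 20:42 (no suffix, `if_`, `id_`, `im_`) and the relative-link caveat from 20:45 can be sketched roughly as below. This is a minimal illustration, not an official client: the helper names `wayback_url` and `resolve_link` and the `MODIFIERS` table are hypothetical, only the four variants named in the log are covered, and nothing here contacts archive.org.

```python
from urllib.parse import urljoin

WAYBACK_PREFIX = "https://web.archive.org/web/"

# Suffixes mentioned in the log; the log notes that other variants exist too.
# The dictionary keys are illustrative labels, not official names.
MODIFIERS = {
    "toolbar": "",    # normal view with the Wayback toolbar
    "frame": "if_",   # no toolbar, links still rewritten
    "data": "id_",    # original data, nothing rewritten
    "image": "im_",   # image variant, also unmodified
}


def wayback_url(timestamp: str, original_url: str, mode: str = "data") -> str:
    """Build a snapshot URL for a 14-digit timestamp and an original URL."""
    return f"{WAYBACK_PREFIX}{timestamp}{MODIFIERS[mode]}/{original_url}"


def resolve_link(timestamp: str, page_url: str, href: str, mode: str = "data") -> str:
    """Resolve an href found in unrewritten (id_) HTML against the page's
    original URL, then point the result back at the same snapshot time."""
    absolute = urljoin(page_url, href)
    return wayback_url(timestamp, absolute, mode)


if __name__ == "__main__":
    print(wayback_url("20230222093528", "https://example.com/"))
    print(resolve_link(
        "20230222092154",
        "https://en.wikipedia.org/wiki/Main_Page",
        "/wiki/Wikipedia",
    ))
```

When scraping metadata from `id_` responses, resolving each `href` against the original page URL first (rather than against `web.archive.org`) avoids depending on the redirect behaviour for root-relative paths that JAA describes.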