| 01:26:33 | | rewby quits [Ping timeout: 268 seconds] |
| 01:28:19 | | rewby (rewby) joins |
| 01:51:18 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 01:54:56 | | Lord_Nightmare (Lord_Nightmare) joins |
| 02:39:52 | <TheTechRobo> | (by AT, do you mean archive.today?) |
| 02:48:19 | | DogsRNice quits [Read error: Connection reset by peer] |
| 03:59:29 | | grill quits [Ping timeout: 268 seconds] |
| 03:59:43 | | grill (grill) joins |
| 08:27:46 | <@arkiver> | yeah on Archive Team channels, AT means Archive Team. so if you mean archive.today, spell it out please |
| 08:38:06 | | s-crypt5 quits [Quit: Ping timeout (120 seconds)] |
| 08:38:23 | | s-crypt (s-crypt) joins |
| 08:49:19 | | kdy quits [Remote host closed the connection] |
| 08:49:35 | | kdy (kdy) joins |
| 09:24:36 | | SootBector quits [Remote host closed the connection] |
| 09:25:56 | | SootBector (SootBector) joins |
| 10:57:11 | | SootBector quits [Remote host closed the connection] |
| 10:58:18 | | SootBector (SootBector) joins |
| 11:13:05 | | Dango360 quits [Ping timeout: 268 seconds] |
| 12:15:37 | | Dango360 (Dango360) joins |
| 12:22:50 | | SootBector quits [Remote host closed the connection] |
| 12:23:58 | | SootBector (SootBector) joins |
| 12:30:05 | | Dango360 quits [Ping timeout: 268 seconds] |
| 12:35:38 | | Dango360 (Dango360) joins |
| 13:31:36 | | SootBector quits [Remote host closed the connection] |
| 13:32:56 | | SootBector (SootBector) joins |
| 14:12:06 | <klea> | This channel isn't an AT channel technically. |
| 14:12:23 | <klea> | Or the modes aren't set up to grant group members access. |
| 14:13:26 | <justauser|m> | WDYM? |
| 14:13:30 | <justauser|m> | They seem to be. |
| 14:46:17 | <klea> | Some people with access to !archiveteam-core or !archiveteam-ops (which you can check by going and seeing they're opped on every other AT channel) aren't opped here. |
| 14:48:04 | <klea> | This channel also doesn't seem to have +cC which is included in the modes from the guide. https://wiki.archiveteam.org/index.php/Archiveteam:IRC#Creating_a_channel |
| 16:11:55 | | DogsRNice joins |
| 16:27:03 | | grill_ (grill) joins |
| 16:30:40 | | grill quits [Ping timeout: 268 seconds] |
| 16:33:42 | | grill_ is now known as grill |
| 17:33:19 | | Chris50103 (Chris5010) joins |
| 17:35:20 | | Chris5010 quits [Ping timeout: 268 seconds] |
| 17:35:20 | | Chris50103 is now known as Chris5010 |
| 17:37:40 | | balrog quits [Quit: Bye] |
| 17:43:31 | | balrog (balrog) joins |
| 18:57:15 | | Webuser4615581 joins |
| 18:57:21 | <Webuser4615581> | does anybody know how to save an archive.today page on the wayback machine? it always says it fails to resolve |
| 19:14:15 | <klea> | Webuser4615581: That's a known issue, at least here. I saw that too, and Yakov apparently did too. |
| 19:51:30 | | k02exuY0y8 joins |
| 19:51:32 | <k02exuY0y8> | Hi. I am trying to access a restricted ArchiveTeam CuriousCat item for narrow personal research. Would anyone know whether IA is likely to grant temporary access to one item, or search the restricted indexes on request? The item I am looking at is archiveteam_curiouscat_20240930231834_fd035d71. |
| 19:51:37 | | Webuser877077 joins |
| 19:57:13 | <klea> | (user was sent here from #archiveteam-bs) |
| 19:57:18 | <pokechu22> | You can search some public CDX, e.g. https://web.archive.org/cdx/search/cdx?url=curiouscat.live&collapse=urlkey&matchType=domain&limit=100000&showResumeKey=true&resumeKey= and https://web.archive.org/web/*/https://curiouscat.live/TsarofMeats* https://web.archive.org/web/*/https://curiouscat.live/LandsharkRides* |
| 19:57:52 | <pokechu22> | Assuming the site works by all posts being under that prefix, that might be all that's saved (though https://wiki.archiveteam.org/index.php/CuriousCat does mention a few domains being in use) |
| 19:57:55 | <klea> | how to resumekey? |
| 19:58:26 | <klea> | Yeah, I suppose that's why k02exuY0y8 wanted to check cdx of item. |
| 19:58:39 | <pokechu22> | Plug in the value at the end (eJzLySxL1UkuLcrMLy1OTizR1DcwNDQystQvyC8u0Tc0MjM0NzE0NrRQMDIwMjIwNLIwtDA1NTIGANGIDrU) to the URL, i.e. https://web.archive.org/cdx/search/cdx?url=curiouscat.live&collapse=urlkey&matchType=domain&limit=100000&showResumeKey=true&resumeKey=eJzLySxL1UkuLcrMLy1OTizR1DcwNDQystQvyC8u0Tc0MjM0NzE0NrRQMDIwMjIwNLIwtDA1NTIGANGIDrU |
| 19:58:45 | <pokechu22> | then slowly repeat |
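
The resumeKey loop pokechu22 describes can be sketched roughly as below. This is a minimal sketch, not an official client. It assumes that with `showResumeKey=true` the response ends with a blank line followed by the key, and that the key can be passed back as returned. The names `build_cdx_url`, `split_resume_key`, `iter_records`, and the 5-second delay are hypothetical choices for illustration.

```python
import time
import urllib.request
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(resume_key=""):
    """Build the same query as in the log, with an optional resume key."""
    params = {
        "url": "curiouscat.live",
        "collapse": "urlkey",
        "matchType": "domain",
        "limit": 100000,
        "showResumeKey": "true",
        "resumeKey": resume_key,
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def split_resume_key(body):
    """Split a response body into (record_lines, resume_key_or_None).

    Assumption: when more results remain, the body ends with an empty
    line and then the resume key on the final line.
    """
    lines = body.rstrip("\n").split("\n")
    if len(lines) >= 2 and lines[-2] == "":
        return lines[:-2], lines[-1]
    return [ln for ln in lines if ln], None


def _http_get(url):
    """Fetch a URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")


def iter_records(fetch=None, delay=5.0):
    """Yield CDX record lines, following resume keys until none is returned.

    `fetch` is injectable so the loop can be exercised without network
    access; `delay` is the pause between requests ("slowly repeat").
    """
    fetch = fetch or _http_get
    key = ""
    while key is not None:
        records, key = split_resume_key(fetch(build_cdx_url(key)))
        yield from records
        if key is not None:
            time.sleep(delay)
```

Calling `build_cdx_url()` with no key reproduces the first URL from the log, and `for line in iter_records(): ...` would then walk the remaining pages one request at a time.
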
| 19:59:08 | <klea> | TIL it had resume support. |
| 19:59:56 | <pokechu22> | There's also the pagination system. IIRC resumeKey interacts poorly with filter= and can give incomplete results (which isn't an issue with pagination), but if you're just looking at all results it's fine. |
| 20:01:17 | | TU1gvxEUrr joins |
| 20:01:20 | <TU1gvxEUrr> | What query params should I use for the older pagination system on CDX here? page= with showNumPages=true on a collapsed domain query? |
| 20:01:23 | <pokechu22> | k02exuY0y8: I guess it's also worth noting that although the warcs+CDX in archiveteam_curiouscat_20240930231834_fd035d71 are restricted from downloading, the individual pages are still indexed by and accessible on web.archive.org |
| 20:03:42 | <pokechu22> | https://gitea.arpa.li/JustAnotherArchivist/little-things/src/branch/master/ia-cdx-search uses page= and pageSize= and showNumPages= |
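
For the `page=` / `pageSize=` / `showNumPages=` approach mentioned above, a rough sketch of the URL construction follows. It only builds URLs and does not fetch anything. It assumes that a `showNumPages=true` request returns a bare integer and that pages are numbered from 0; whether `collapse=urlkey` behaves as expected together with pagination is not confirmed in the log. The helper names `num_pages_url`, `page_url`, and `all_page_urls` are hypothetical and are not taken from the ia-cdx-search script.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# Same collapsed domain query as discussed in the log.
BASE_PARAMS = {
    "url": "curiouscat.live",
    "matchType": "domain",
    "collapse": "urlkey",
}


def _with_page_size(params, page_size):
    """Add pageSize only when the caller asked for one."""
    if page_size is not None:
        params["pageSize"] = page_size
    return params


def num_pages_url(page_size=None):
    """URL that asks the server how many pages the query has."""
    params = _with_page_size(dict(BASE_PARAMS, showNumPages="true"), page_size)
    return CDX_ENDPOINT + "?" + urlencode(params)


def page_url(page, page_size=None):
    """URL for a single page of results (assumed 0-based)."""
    params = _with_page_size(dict(BASE_PARAMS, page=page), page_size)
    return CDX_ENDPOINT + "?" + urlencode(params)


def all_page_urls(num_pages_body, page_size=None):
    """Turn the (assumed integer) showNumPages response into page URLs."""
    count = int(num_pages_body.strip())
    return [page_url(i, page_size) for i in range(count)]
```

The idea would be to fetch `num_pages_url()` once, pass the response text to `all_page_urls()`, and then request each resulting URL in turn, using the same `pageSize` for the count and for the pages.
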
| 20:08:45 | | k02exuY0y8_ joins |
| 20:08:48 | <k02exuY0y8_> | Thanks. I am paging through public CDX now. First 200k collapsed curiouscat.live URLs did not include either target username. Does "individual pages are still indexed" usually include the API captures too, or mostly just the HTML page URLs? |
| 20:09:17 | | Webuser4615581 quits [Client Quit] |
| 20:09:55 | <klea> | Probably only HTML, unless the project had enough time that it wasn't in too much of a hurry. |
| 20:13:11 | <klea> | Apparently this project did grab /api/ endpoints. https://github.com/ArchiveTeam/curiouscat-grab/blob/master/curiouscat.lua#L277 |
| 20:16:55 | | k02exuY0y8__ joins |
| 20:18:33 | | TU1gvxEUrr quits [Remote host closed the connection] |
| 20:18:33 | | k02exuY0y8_ quits [Remote host closed the connection] |
| 20:18:33 | | k02exuY0y8 quits [Remote host closed the connection] |
| 20:18:33 | | k02exuY0y8__ quits [Remote host closed the connection] |
| 20:19:28 | | k02exuY0y8__ joins |
| 20:19:31 | <k02exuY0y8__> | Following up on the CuriousCat point: since curiouscat.lua grabbed /api/ endpoints, should those API URLs still show up in public web.archive.org CDX if I page broadly enough, or can they be effectively invisible unless you have the raw restricted item? And if they are not public, is info@archive.org for a staff-side search the right next step? |
| 20:20:17 | | k02exuY0y8__ quits [Remote host closed the connection] |
| 20:20:27 | | k02exuY0y8___ joins |
| 20:20:30 | <k02exuY0y8___> | pokechu22, klea: do you know whether restricted ArchiveTeam raw items can hide API captures from public CDX entirely, even when the project grabbed them? I am trying to tell whether more public paging is worthwhile or whether the next real step is IA-side access/search. |
| 20:21:16 | | k02exuY0y8___ quits [Remote host closed the connection] |
| 20:21:31 | <klea> | There are WBM exclusions, but if it has WBM exclusions, you're **very** unlikely to get access to the item. |
| 20:21:36 | | k02exuY0y8____ joins |
| 20:21:39 | <k02exuY0y8____> | Does public web.archive.org CDX omit URL records that exist only inside access-restricted ArchiveTeam items? |
| 20:21:57 | <klea> | As far as I know, by default no. |
| 20:22:24 | | k02exuY0y8_____ joins |
| 20:22:24 | | k02exuY0y8____ quits [Remote host closed the connection] |
| 20:22:25 | <k02exuY0y8_____> | Thanks. Then if exact-account CuriousCat /api queries return nothing in public CDX, is the likelier conclusion that those URLs were never captured at all, rather than hidden only because the raw ArchiveTeam item is restricted? |
| 20:22:56 | <klea> | I suppose. |
| 20:23:11 | | k02exuY0y8_____ quits [Remote host closed the connection] |
| 20:26:18 | | k02exuY0y8______ joins |
| 20:26:22 | <k02exuY0y8______> | Thanks, that helps. |
| 20:26:28 | | k02exuY0y8______ quits [Remote host closed the connection] |
| 21:13:47 | | Webuser877077 quits [Client Quit] |
| 21:35:09 | | angenieux2 quits [Read error: Connection reset by peer] |
| 21:36:10 | | angenieux2 (angenieux) joins |
| 22:11:57 | | DogsRNice_ joins |
| 22:12:22 | <klea> | WBM doesn't handle having web.archive.org links properly inside captures: https://web.archive.org/web/20250107145729mp_/https://cohost.org/ticky/post/15513-mods-are-asleep-post |
| 22:12:31 | <klea> | has a link to https://web.archive.org/web/20250107145729/https://web.archive.org/web/19990827174523/http://www.apple.com/main/maps/navbar2.map |
| 22:16:00 | | DogsRNice quits [Ping timeout: 268 seconds] |
| 22:17:04 | <@JAA> | For the future, questions about accessing our past project data are fine in -bs. |
| 22:18:25 | <klea> | Thanks JAA for the heads up. |
| 22:18:26 | <klea> | JAA++ |
| 22:18:27 | <eggdrop> | [karma] 'JAA' now has 351 karma! |