01:26:33rewby quits [Ping timeout: 268 seconds]
01:28:19rewby (rewby) joins
01:51:18Lord_Nightmare quits [Quit: ZNC - http://znc.in]
01:54:56Lord_Nightmare (Lord_Nightmare) joins
02:39:52<TheTechRobo>(by AT, do you mean archive.today?)
02:48:19DogsRNice quits [Read error: Connection reset by peer]
03:59:29grill quits [Ping timeout: 268 seconds]
03:59:43grill (grill) joins
08:27:46<@arkiver>yeah on Archive Team channels, AT means Archive Team. so if you mean archive.today, spell it out please
08:38:06s-crypt5 quits [Quit: Ping timeout (120 seconds)]
08:38:23s-crypt (s-crypt) joins
08:49:19kdy quits [Remote host closed the connection]
08:49:35kdy (kdy) joins
09:24:36SootBector quits [Remote host closed the connection]
09:25:56SootBector (SootBector) joins
10:57:11SootBector quits [Remote host closed the connection]
10:58:18SootBector (SootBector) joins
11:13:05Dango360 quits [Ping timeout: 268 seconds]
12:15:37Dango360 (Dango360) joins
12:22:50SootBector quits [Remote host closed the connection]
12:23:58SootBector (SootBector) joins
12:30:05Dango360 quits [Ping timeout: 268 seconds]
12:35:38Dango360 (Dango360) joins
13:31:36SootBector quits [Remote host closed the connection]
13:32:56SootBector (SootBector) joins
14:12:06<klea>This channel isn't an AT channel technically.
14:12:23<klea>Or the modes aren't set up to grant group members access.
14:13:26<justauser|m>WDYM?
14:13:30<justauser|m>They seem to be.
14:46:17<klea>Some people with access to !archiveteam-core or !archiveteam-ops (which you can check by going and seeing they're opped on every other AT channel) aren't opped here.
14:48:04<klea>This channel also doesn't seem to have +cC which is included in the modes from the guide. https://wiki.archiveteam.org/index.php/Archiveteam:IRC#Creating_a_channel
16:11:55DogsRNice joins
16:27:03grill_ (grill) joins
16:30:40grill quits [Ping timeout: 268 seconds]
16:33:42grill_ is now known as grill
17:33:19Chris50103 (Chris5010) joins
17:35:20Chris5010 quits [Ping timeout: 268 seconds]
17:35:20Chris50103 is now known as Chris5010
17:37:40balrog quits [Quit: Bye]
17:43:31balrog (balrog) joins
18:57:15Webuser4615581 joins
18:57:21<Webuser4615581>does anybody know how to save an archive.today page on the wayback machine? it always says it fails to resolve
19:14:15<klea>Webuser4615581: Known at least here, I saw that too, and Yakov apparently did too.
19:51:30k02exuY0y8 joins
19:51:32<k02exuY0y8>Hi. I am trying to access a restricted ArchiveTeam CuriousCat item for narrow personal research. Would anyone know whether IA is likely to grant temporary access to one item, or search the restricted indexes on request? The item I am looking at is archiveteam_curiouscat_20240930231834_fd035d71.
19:51:37Webuser877077 joins
19:57:13<klea>(user was sent here from #archiveteam-bs)
19:57:18<pokechu22>You can search some public CDX, e.g. https://web.archive.org/cdx/search/cdx?url=curiouscat.live&collapse=urlkey&matchType=domain&limit=100000&showResumeKey=true&resumeKey= and https://web.archive.org/web/*/https://curiouscat.live/TsarofMeats* https://web.archive.org/web/*/https://curiouscat.live/LandsharkRides*
19:57:52<pokechu22>Assuming the site works by all posts being under that prefix, that might be all that's saved (though https://wiki.archiveteam.org/index.php/CuriousCat does mention a few domains being in use)
19:57:55<klea>how to resumekey?
19:58:26<klea>Yeah, I suppose that's why k02exuY0y8 wanted to check cdx of item.
19:58:39<pokechu22>Plug in the value at the end (eJzLySxL1UkuLcrMLy1OTizR1DcwNDQystQvyC8u0Tc0MjM0NzE0NrRQMDIwMjIwNLIwtDA1NTIGANGIDrU) to the URL, i.e. https://web.archive.org/cdx/search/cdx?url=curiouscat.live&collapse=urlkey&matchType=domain&limit=100000&showResumeKey=true&resumeKey=eJzLySxL1UkuLcrMLy1OTizR1DcwNDQystQvyC8u0Tc0MjM0NzE0NrRQMDIwMjIwNLIwtDA1NTIGANGIDrU
19:58:45<pokechu22>then slowly repeat
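The resumeKey loop pokechu22 describes could be sketched roughly as below. This is a minimal illustration, not an official client: the query parameters are copied from the URLs above, but the function names (build_cdx_url, split_resume_key, iter_records, http_get) are hypothetical, and the parsing assumes the response ends with a blank line followed by the resume key when more results remain. It also assumes the key can be sent back after ordinary URL encoding.

```python
import time
import urllib.request
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(resume_key: str = "") -> str:
    """Build the collapsed domain query from the log, with an optional resume key."""
    params = {
        "url": "curiouscat.live",
        "collapse": "urlkey",
        "matchType": "domain",
        "limit": "100000",
        "showResumeKey": "true",
        "resumeKey": resume_key,
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def split_resume_key(body: str) -> Tuple[List[str], Optional[str]]:
    """Split a response body into CDX records and the trailing resume key.

    Assumed layout: records, one blank line, then the key on the last line.
    If that layout is absent, every non-empty line is treated as a record.
    """
    lines = body.rstrip("\n").split("\n")
    if len(lines) >= 2 and lines[-2] == "" and lines[-1]:
        return [line for line in lines[:-2] if line], lines[-1]
    return [line for line in lines if line], None


def iter_records(fetch: Callable[[str], str], delay: float = 0.0) -> Iterator[str]:
    """Yield records batch by batch, feeding each resume key into the next request."""
    key = ""
    while True:
        records, next_key = split_resume_key(fetch(build_cdx_url(key)))
        yield from records
        if not next_key:
            break
        key = next_key
        time.sleep(delay)  # "slowly repeat": pause between requests


def http_get(url: str) -> str:
    """Plain HTTP fetcher; any callable returning the body as text works instead."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

Something like `for record in iter_records(http_get, delay=5): ...` would then walk the whole result set; the 5-second delay is an arbitrary choice.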
19:59:08<klea>TIL it had resume support.
19:59:56<pokechu22>There's also the pagination system. IIRC resumeKey interacts poorly with filter= and can give incomplete results (which isn't an issue with pagination), but if you're just looking at all results it's fine.
20:01:17TU1gvxEUrr joins
20:01:20<TU1gvxEUrr>What query params should I use for the older pagination system on CDX here? page= with showNumPages=true on a collapsed domain query?
20:01:23<pokechu22>k02exuY0y8: I guess it's also worth noting that although the WARCs+CDX in archiveteam_curiouscat_20240930231834_fd035d71 are restricted from downloading, the individual pages are still indexed by and accessible on web.archive.org
20:03:42<pokechu22>https://gitea.arpa.li/JustAnotherArchivist/little-things/src/branch/master/ia-cdx-search uses page= and pageSize= and showNumPages=
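As a rough sketch of the page-based approach (not the code of the linked ia-cdx-search script): the snippet below first asks for the page count with showNumPages=true, then requests each page with page=. The helper names are hypothetical. It assumes the showNumPages response body is a single integer and that pages are numbered from 0; pageSize is only sent if a value is supplied, and no particular value is recommended here.

```python
import urllib.request
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_page_url(url: str, match_type: str = "domain",
                   page_size: Optional[int] = None, **extra: str) -> str:
    """Build a CDX query URL; extra carries page= or showNumPages=."""
    params = {"url": url, "matchType": match_type}
    if page_size is not None:
        params["pageSize"] = str(page_size)
    params.update(extra)
    return CDX_ENDPOINT + "?" + urlencode(params)


def iter_pages(fetch: Callable[[str], str], url: str,
               match_type: str = "domain",
               page_size: Optional[int] = None) -> Iterator[str]:
    """Yield CDX records from every page of the query, in page order."""
    # Assumption: with showNumPages=true the body is just the page count.
    count_url = build_page_url(url, match_type, page_size, showNumPages="true")
    num_pages = int(fetch(count_url).strip())
    # Assumption: pages are numbered 0 .. num_pages - 1.
    for page in range(num_pages):
        body = fetch(build_page_url(url, match_type, page_size, page=str(page)))
        yield from (line for line in body.splitlines() if line)


def http_get(url: str) -> str:
    """Plain HTTP fetcher; any callable returning the body as text works instead."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

Usage would look like `for record in iter_pages(http_get, "curiouscat.live"): ...`. Since filter= is reported above to interact poorly with resumeKey, this variant may be the safer one for filtered queries.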
20:08:45k02exuY0y8_ joins
20:08:48<k02exuY0y8_>Thanks. I am paging through public CDX now. First 200k collapsed curiouscat.live URLs did not include either target username. Does "individual pages are still indexed" usually include the API captures too, or mostly just the HTML page URLs?
20:09:17Webuser4615581 quits [Client Quit]
20:09:55<klea>Probably only HTML, unless the project had enough time that it wasn't in too much of a hurry.
20:13:11<klea>Apparently this project did grab /api/ endpoints. https://github.com/ArchiveTeam/curiouscat-grab/blob/master/curiouscat.lua#L277
20:16:55k02exuY0y8__ joins
20:18:33TU1gvxEUrr quits [Remote host closed the connection]
20:18:33k02exuY0y8_ quits [Remote host closed the connection]
20:18:33k02exuY0y8 quits [Remote host closed the connection]
20:18:33k02exuY0y8__ quits [Remote host closed the connection]
20:19:28k02exuY0y8__ joins
20:19:31<k02exuY0y8__>Following up on the CuriousCat point: since curiouscat.lua grabbed /api/ endpoints, should those API URLs still show up in public web.archive.org CDX if I page broadly enough, or can they be effectively invisible unless you have the raw restricted item? And if they are not public, is info@archive.org for a staff-side search the right next step?
20:20:17k02exuY0y8__ quits [Remote host closed the connection]
20:20:27k02exuY0y8___ joins
20:20:30<k02exuY0y8___>pokechu22, klea: do you know whether restricted ArchiveTeam raw items can hide API captures from public CDX entirely, even when the project grabbed them? I am trying to tell whether more public paging is worthwhile or whether the next real step is IA-side access/search.
20:21:16k02exuY0y8___ quits [Remote host closed the connection]
20:21:31<klea>There are WBM exclusions, but if it has WBM exclusions, you're **very** unlikely to get access to the item.
20:21:36k02exuY0y8____ joins
20:21:39<k02exuY0y8____>Does public web.archive.org CDX omit URL records that exist only inside access-restricted ArchiveTeam items?
20:21:57<klea>As far as I know, by default no.
20:22:24k02exuY0y8_____ joins
20:22:24k02exuY0y8____ quits [Remote host closed the connection]
20:22:25<k02exuY0y8_____>Thanks. Then if exact-account CuriousCat /api queries return nothing in public CDX, is the likelier conclusion that those URLs were never captured at all, rather than hidden only because the raw ArchiveTeam item is restricted?
20:22:56<klea>I suppose.
20:23:11k02exuY0y8_____ quits [Remote host closed the connection]
20:26:18k02exuY0y8______ joins
20:26:22<k02exuY0y8______>Thanks, that helps.
20:26:28k02exuY0y8______ quits [Remote host closed the connection]
21:13:47Webuser877077 quits [Client Quit]
21:35:09angenieux2 quits [Read error: Connection reset by peer]
21:36:10angenieux2 (angenieux) joins
22:11:57DogsRNice_ joins
22:12:22<klea>WBM doesn't handle having web.archive.org links properly inside captures: https://web.archive.org/web/20250107145729mp_/https://cohost.org/ticky/post/15513-mods-are-asleep-post
22:12:31<klea>has a link to https://web.archive.org/web/20250107145729/https://web.archive.org/web/19990827174523/http://www.apple.com/main/maps/navbar2.map
22:16:00DogsRNice quits [Ping timeout: 268 seconds]
22:17:04<@JAA>For the future, questions about accessing our past project data are fine in -bs.
22:18:25<klea>Thanks JAA for the heads up.
22:18:26<klea>JAA++
22:18:27<eggdrop>[karma] 'JAA' now has 351 karma!