#internetarchive log for 2026-03-24

01:26:33		rewby quits [Ping timeout: 268 seconds]
01:28:19		rewby (rewby) joins
01:51:18		Lord_Nightmare quits [Quit: ZNC - http://znc.in]
01:54:56		Lord_Nightmare (Lord_Nightmare) joins
02:39:52	<TheTechRobo>	(by AT, do you mean archive.today?)
02:48:19		DogsRNice quits [Read error: Connection reset by peer]
03:59:29		grill quits [Ping timeout: 268 seconds]
03:59:43		grill (grill) joins
08:27:46	<@arkiver>	yeah on Archive Team channels, AT means Archive Team. so if you mean archive.today, spell it out please
08:38:06		s-crypt5 quits [Quit: Ping timeout (120 seconds)]
08:38:23		s-crypt (s-crypt) joins
08:49:19		kdy quits [Remote host closed the connection]
08:49:35		kdy (kdy) joins
09:24:36		SootBector quits [Remote host closed the connection]
09:25:56		SootBector (SootBector) joins
10:57:11		SootBector quits [Remote host closed the connection]
10:58:18		SootBector (SootBector) joins
11:13:05		Dango360 quits [Ping timeout: 268 seconds]
12:15:37		Dango360 (Dango360) joins
12:22:50		SootBector quits [Remote host closed the connection]
12:23:58		SootBector (SootBector) joins
12:30:05		Dango360 quits [Ping timeout: 268 seconds]
12:35:38		Dango360 (Dango360) joins
13:31:36		SootBector quits [Remote host closed the connection]
13:32:56		SootBector (SootBector) joins
14:12:06	<klea>	This channel isn't an AT channel technically.
14:12:23	<klea>	Or the modes aren't set up to grant group members access.
14:13:26	<justauser|m>	WDYM?
14:13:30	<justauser|m>	They seem to be.
14:46:17	<klea>	Some people with access to !archiveteam-core or !archiveteam-ops (which you can check by going and seeing they're opped on every other AT channel) aren't opped here.
14:48:04	<klea>	This channel also doesn't seem to have +cC which is included in the modes from the guide. https://wiki.archiveteam.org/index.php/Archiveteam:IRC#Creating_a_channel
16:11:55		DogsRNice joins
16:27:03		grill_ (grill) joins
16:30:40		grill quits [Ping timeout: 268 seconds]
16:33:42		grill_ is now known as grill
17:33:19		Chris50103 (Chris5010) joins
17:35:20		Chris5010 quits [Ping timeout: 268 seconds]
17:35:20		Chris50103 is now known as Chris5010
17:37:40		balrog quits [Quit: Bye]
17:43:31		balrog (balrog) joins
18:57:15		Webuser4615581 joins
18:57:21	<Webuser4615581>	does anybody know how to save an archive.today page on the wayback machine? it always says it fails to resolve
19:14:15	<klea>	Webuser4615581: Known at least here, I saw that too, and Yakov apparently did too.
19:51:30		k02exuY0y8 joins
19:51:32	<k02exuY0y8>	Hi. I am trying to access a restricted ArchiveTeam CuriousCat item for narrow personal research. Would anyone know whether IA is likely to grant temporary access to one item, or search the restricted indexes on request? The item I am looking at is archiveteam_curiouscat_20240930231834_fd035d71.
19:51:37		Webuser877077 joins
19:57:13	<klea>	(user was sent here from #archiveteam-bs)
19:57:18	<pokechu22>	You can search some public CDX, e.g. https://web.archive.org/cdx/search/cdx?url=curiouscat.live&collapse=urlkey&matchType=domain&limit=100000&showResumeKey=true&resumeKey= and https://web.archive.org/web//https://curiouscat.live/TsarofMeats https://web.archive.org/web//https://curiouscat.live/LandsharkRides
19:57:52	<pokechu22>	Assuming the site works by all posts being under that prefix, that might be all that's saved (though https://wiki.archiveteam.org/index.php/CuriousCat does mention a few domains being in use)
19:57:55	<klea>	how to resumekey?
19:58:26	<klea>	Yeah, I suppose that's why k02exuY0y8 wanted to check cdx of item.
19:58:39	<pokechu22>	Plug in the value at the end (eJzLySxL1UkuLcrMLy1OTizR1DcwNDQystQvyC8u0Tc0MjM0NzE0NrRQMDIwMjIwNLIwtDA1NTIGANGIDrU) to the URL, i.e. https://web.archive.org/cdx/search/cdx?url=curiouscat.live&collapse=urlkey&matchType=domain&limit=100000&showResumeKey=true&resumeKey=eJzLySxL1UkuLcrMLy1OTizR1DcwNDQystQvyC8u0Tc0MjM0NzE0NrRQMDIwMjIwNLIwtDA1NTIGANGIDrU
19:58:45	<pokechu22>	then slowly repeat
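
The resumeKey loop described above can be sketched roughly as follows. This is a minimal illustration, not a reference client: the helper names (`build_cdx_url`, `split_resume_key`, `iter_cdx_records`) and the 5-second delay are made up for this example, and it assumes that with showResumeKey=true the key arrives as the last line of the response, preceded by a blank line.

```python
import time
import urllib.request
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain, resume_key="", limit=100000):
    """Build the same query as in the chat, with an optional resume key."""
    params = {
        "url": domain,
        "collapse": "urlkey",
        "matchType": "domain",
        "limit": limit,
        "showResumeKey": "true",
        "resumeKey": resume_key,
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def split_resume_key(body):
    """Split a response body into (record_lines, resume_key or None).

    Assumption: the key is the final line, separated from the records
    by one blank line. No trailing key means there is nothing to resume.
    """
    lines = body.rstrip("\n").split("\n")
    if len(lines) >= 2 and lines[-2] == "":
        return [line for line in lines[:-2] if line], lines[-1]
    return [line for line in lines if line], None


def fetch_text(url):
    """Plain HTTP GET; swap in anything else that returns the body as str."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def iter_cdx_records(domain, fetch=fetch_text, delay=5.0):
    """Yield CDX lines, following resume keys until none is returned."""
    resume_key = ""
    while True:
        body = fetch(build_cdx_url(domain, resume_key))
        records, resume_key = split_resume_key(body)
        yield from records
        if not resume_key:
            break
        time.sleep(delay)  # "slowly repeat": pause between requests
```

Something like `for line in iter_cdx_records("curiouscat.live"): print(line)` would then walk the whole listing one chunk at a time.
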
19:59:08	<klea>	TIL it had resume support.
19:59:56	<pokechu22>	There's also the pagination system. IIRC resumeKey interacts poorly with filter= and can give incomplete results (which isn't an issue with pagination), but if you're just looking at all results it's fine.
20:01:17		TU1gvxEUrr joins
20:01:20	<TU1gvxEUrr>	What query params should I use for the older pagination system on CDX here? page= with showNumPages=true on a collapsed domain query?
20:01:23	<pokechu22>	k02exuY0y8: I guess it's also worth noting that while the warcs+CDX in archiveteam_curiouscat_20240930231834_fd035d71 are restricted from downloading, the individual pages are still indexed by and accessible on web.archive.org
20:03:42	<pokechu22>	https://gitea.arpa.li/JustAnotherArchivist/little-things/src/branch/master/ia-cdx-search uses page= and pageSize= and showNumPages=
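
For the page-based approach, a rough sketch of building the URLs with the three parameters named above (page=, pageSize=, showNumPages=). The function names are hypothetical and this is not how ia-cdx-search itself is written. It assumes that showNumPages=true returns a bare integer and that pages are numbered from 0; the meaning of pageSize is not spelled out here, so it is left optional. Whether collapse=urlkey behaves well across page boundaries is not settled by the discussion.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_page_url(domain, page=None, page_size=None):
    """Build a paginated CDX query.

    With page=None the URL asks for the page count (showNumPages=true);
    otherwise it asks for that page of results.
    """
    params = {"url": domain, "matchType": "domain", "collapse": "urlkey"}
    if page_size is not None:
        params["pageSize"] = page_size
    if page is None:
        params["showNumPages"] = "true"
    else:
        params["page"] = page
    return CDX_ENDPOINT + "?" + urlencode(params)


def page_urls(domain, num_pages_text, page_size=None):
    """Turn the showNumPages response text into one URL per page.

    Assumption: the response is a single integer and pages start at 0.
    """
    num_pages = int(num_pages_text.strip())
    return [
        build_page_url(domain, page=i, page_size=page_size)
        for i in range(num_pages)
    ]
```

The idea would be to fetch `build_page_url("curiouscat.live")` once, pass the response text to `page_urls`, and then fetch each resulting URL with a pause in between.
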
20:08:45		k02exuY0y8_ joins
20:08:48	<k02exuY0y8_>	Thanks. I am paging through public CDX now. First 200k collapsed curiouscat.live URLs did not include either target username. Does "individual pages are still indexed" usually include the API captures too, or mostly just the HTML page URLs?
20:09:17		Webuser4615581 quits [Client Quit]
20:09:55	<klea>	Probably only HTML, unless the project had enough time that it wasn't in too much of a hurry.
20:13:11	<klea>	Apparently this project did grab /api/ endpoints. https://github.com/ArchiveTeam/curiouscat-grab/blob/master/curiouscat.lua#L277
20:16:55		k02exuY0y8__ joins
20:18:33		TU1gvxEUrr quits [Remote host closed the connection]
20:18:33		k02exuY0y8_ quits [Remote host closed the connection]
20:18:33		k02exuY0y8 quits [Remote host closed the connection]
20:18:33		k02exuY0y8__ quits [Remote host closed the connection]
20:19:28		k02exuY0y8__ joins
20:19:31	<k02exuY0y8__>	Following up on the CuriousCat point: since curiouscat.lua grabbed /api/ endpoints, should those API URLs still show up in public web.archive.org CDX if I page broadly enough, or can they be effectively invisible unless you have the raw restricted item? And if they are not public, is info@archive.org for a staff-side search the right next step?
20:20:17		k02exuY0y8__ quits [Remote host closed the connection]
20:20:27		k02exuY0y8___ joins
20:20:30	<k02exuY0y8___>	pokechu22, klea: do you know whether restricted ArchiveTeam raw items can hide API captures from public CDX entirely, even when the project grabbed them? I am trying to tell whether more public paging is worthwhile or whether the next real step is IA-side access/search.
20:21:16		k02exuY0y8___ quits [Remote host closed the connection]
20:21:31	<klea>	There's wbm exclusions, but if it has WBM exclusions, you're very unlikely to get access to the item.
20:21:36		k02exuY0y8____ joins
20:21:39	<k02exuY0y8____>	Does public web.archive.org CDX omit URL records that exist only inside access-restricted ArchiveTeam items?
20:21:57	<klea>	As far as I know, by default no.
20:22:24		k02exuY0y8_____ joins
20:22:24		k02exuY0y8____ quits [Remote host closed the connection]
20:22:25	<k02exuY0y8_____>	Thanks. Then if exact-account CuriousCat /api queries return nothing in public CDX, is the likelier conclusion that those URLs were never captured at all, rather than hidden only because the raw ArchiveTeam item is restricted?
20:22:56	<klea>	I suppose.
20:23:11		k02exuY0y8_____ quits [Remote host closed the connection]
20:26:18		k02exuY0y8______ joins
20:26:22	<k02exuY0y8______>	Thanks, that helps.
20:26:28		k02exuY0y8______ quits [Remote host closed the connection]
21:13:47		Webuser877077 quits [Client Quit]
21:35:09		angenieux2 quits [Read error: Connection reset by peer]
21:36:10		angenieux2 (angenieux) joins
22:11:57		DogsRNice_ joins
22:12:22	<klea>	WBM doesn't handle having web.archive.org links properly inside captures: https://web.archive.org/web/20250107145729mp_/https://cohost.org/ticky/post/15513-mods-are-asleep-post
22:12:31	<klea>	has a link to https://web.archive.org/web/20250107145729/https://web.archive.org/web/19990827174523/http://www.apple.com/main/maps/navbar2.map
22:16:00		DogsRNice quits [Ping timeout: 268 seconds]
22:17:04	<@JAA>	For the future, questions about accessing our past project data are fine in -bs.
22:18:25	<klea>	Thanks JAA for the heads up.
22:18:26	<klea>	JAA++
22:18:27	<eggdrop>	[karma] 'JAA' now has 351 karma!
