#archiveteam-bs log for 2025-04-25

Home Search Previous day Next day

00:01:28		hexa_ is now known as hexa
00:04:22	<Medaka>	Thanks a lot!
00:04:43		ericgallager joins
00:06:37		dabs joins
00:08:29	<Medaka>	I randomly selected about 30 URLs from piyo.fc2.com and attempted to archive them using the Wayback Machine—all of them were successfully archived without any issues.
00:09:36	<pabs>	ok https://piyo.fc2.com/20070618/ worked on tantalus, nice
00:10:03	<pabs>	in Australia https://piyo.fc2.com/20070618/ gives me a 403 though
00:12:33	<pabs>	Medaka: old-fc2web-urls.txt is now running, and getting 403s sometimes
00:12:50	<pabs>	different 403s to the ones from piyo though
00:14:22	<pabs>	started !a < piyo_fc2_urls.txt -u firefox -p tantalus
00:15:19	<pokechu22>	http://dorops.fc2web.com/ redirects to http://error.fc2.com/web/403.html
00:15:37	<pabs>	and the piyo ones for me redirect to https://error.fc2.com/other/forbidden.html
00:16:39	<pabs>	I'm guessing the 403.html ones are due to a missing index.html or similar
00:16:59	<pabs>	and the forbidden.html ones are geo restrictions
00:17:37		etnguyen03 quits [Client Quit]
00:17:48		NeonGlitch quits [Client Quit]
00:17:49	<pabs>	Medaka: piyo_fc2_urls.txt is now running, no 403s thus far
00:19:22	<Medaka>	pokechu22: Regarding http://dorops.fc2web.com/, I’m also getting redirected to a 403 error page. I believe the page was likely deleted.
00:19:52	<pabs>	you can watch the jobs here btw http://archivebot.com/
00:20:09	<pokechu22>	Yeah, that's my guess too. Nothing shows up for duckduckgo or google, and web.archive.org has no captures of anything on there
00:20:26	<pokechu22>	nothing on search.yahoo.co.jp either
00:21:01	<pabs>	ah, I probably should have done the piyo.fc2.com ones as https://piyo.fc2.com/20070618 instead of https://piyo.fc2.com/20070618/ because of the no-parent thing
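The no-parent concern above can be illustrated with a simplified, wget-style rule. The seed's "directory" is everything up to the last `/` in its path, and only URLs under that prefix are followed. This is a hypothetical sketch that shows why dropping the trailing slash widens the scope. It is not ArchiveBot's actual implementation, and the function names are made up for illustration.

```python
from urllib.parse import urlsplit

def parent_prefix(seed_url: str) -> tuple[str, str]:
    """Return (host, directory prefix) of a seed URL.

    Simplified wget-style rule (an assumption, not ArchiveBot's code):
    the directory is the path up to and including its last '/'.
    """
    parts = urlsplit(seed_url)
    path = parts.path or "/"
    return parts.netloc.lower(), path[: path.rfind("/") + 1]

def within_parent(seed_url: str, candidate_url: str) -> bool:
    """True if candidate is on the same host and at or below the seed's directory.

    The scheme is ignored, so http/https variants are treated alike.
    """
    host, prefix = parent_prefix(seed_url)
    cand = urlsplit(candidate_url)
    return cand.netloc.lower() == host and (cand.path or "/").startswith(prefix)
```

With the slash, a link to another user's page such as /afesacvu/ falls outside `/20070618/`. Without it, the prefix becomes `/` and the whole host is in scope, which is the behaviour pabs was after.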
00:21:17	<pabs>	just in case sites link to unknown sites
00:21:26		pabs abort/restart
00:21:46	<pokechu22>	Yeah, that's probably worth doing
00:21:57	<pokechu22>	there's also the sitemap trick but if they're all just dates like that it's not necessary
00:22:32	<pabs>	they aren't dates, just usernames it seems https://piyo.fc2.com/afesacvu/
00:23:33	<pabs>	Medaka: can you tell if https://piyo.fc2.com/afesacvu is identical to https://piyo.fc2.com/afesacvu/ ? or is there a redirect or?
00:24:04	<pokechu22>	oh, I can load it
00:24:17	<pabs>	hmm, not a redirect in AB
00:24:17	<pokechu22>	Identical, but no redirect
00:24:58	<pabs>	hmm, wonder if it would be problematic to have urls without / in the WBM
00:25:04	<pokechu22>	but curl https://piyo.fc2.com/afesacvu/ | sha1sum and https://piyo.fc2.com/afesacvu | sha1sum both give the same hash (ced54fe06750c1d5fc04d9f7516df05e30778289)
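pokechu22's check above, which hashes both the slash and no-slash responses, can be reproduced in Python with `hashlib` and `urllib.request`. This is a minimal sketch with illustrative function names. `fetch` also returns the final URL after any redirects `urlopen` followed, which is one way to see whether a redirect happened.

```python
import hashlib
import urllib.request

def sha1_hex(body: bytes) -> str:
    """Hex SHA-1 digest, the same value `sha1sum` prints for that data."""
    return hashlib.sha1(body).hexdigest()

def fetch(url: str, timeout: float = 30.0) -> tuple[str, bytes]:
    """Return (final URL after any redirects urlopen followed, response body).

    Raises urllib.error.HTTPError on 4xx/5xx responses, e.g. the 403s seen earlier.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.geturl(), resp.read()

def same_body(a: bytes, b: bytes) -> bool:
    """Compare two response bodies by their SHA-1 digests."""
    return sha1_hex(a) == sha1_hex(b)
```

For example, if `fetch("https://piyo.fc2.com/afesacvu")` returns a final URL equal to its input, no redirect was followed. Comparing its body against the slash version with `same_body` mirrors the curl check.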
00:25:35	<pabs>	is there an index on https://piyo.fc2.com/ or something?
00:25:43	<Medaka>	pabs: Thanks for the archivebot link!
00:25:51	<pokechu22>	Both pages link to the slash version
00:26:06	<pabs>	so I guess we need the sitemap trick?
00:26:21	<Medaka>	Both https://piyo.fc2.com/afesacvu/ and https://piyo.fc2.com/afesacvu point to the same page, and there is no redirect between them.
00:26:22	<pokechu22>	https://piyo.fc2.com/contents/search/?usersearch=&area%5B%5D=0&ca%5B%5D=0&action=search&mode=composition is a thing
00:26:23		NeonGlitch (NeonGlitch) joins
00:26:23		pabs not familiar with it
00:27:01	<pokechu22>	uh, they might actually have a sitemap already actually
00:27:26	<pokechu22>	http://piyo.fc2.com/sitemap_index.xml has <loc>//piyo.fc2.com/sitemap_1.xml</loc> which has entries like <loc>//piyo.fc2.com/sours/</loc>
00:27:45	<pabs>	hmm, will that work without http: ?
00:28:02	<pokechu22>	I'm not sure if archivebot will accept // links like that in sitemaps (those are standard for <a href="//"> though), but it's probably best to try
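If ArchiveBot does not accept protocol-relative `<loc>` values like `//piyo.fc2.com/sours/`, one workaround is to rewrite them into absolute URLs before building a new list. Below is a sketch using `xml.etree.ElementTree` and `urllib.parse.urljoin`. The `https://` base is an assumption, since the sitemap itself was served over `http://`. The exact markup wasn't shown, so the function matches `<loc>` with or without the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# <loc> with no namespace, or in the standard sitemaps.org namespace
LOC_TAGS = {"loc", "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"}

def sitemap_locs(xml_data: bytes, base: str = "https://piyo.fc2.com/") -> list[str]:
    """Return every <loc> in a sitemap or sitemap index as an absolute URL.

    urljoin() gives protocol-relative values like //host/path the base's scheme.
    """
    root = ET.fromstring(xml_data)
    return [
        urljoin(base, el.text.strip())
        for el in root.iter()
        if el.tag in LOC_TAGS and el.text and el.text.strip()
    ]
```

The same function works on sitemap_index.xml (yielding the sitemap_N.xml URLs) and on each individual sitemap (yielding user pages).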
00:28:20	<pabs>	Medaka: can you compare the sitemaps to your list and see if yours has any more than theirs?
00:28:44	<pabs>	in the meantime let's start one based on the sitemap
00:29:09	<Medaka>	pokechu22: Yes, I created the list of piyo.fc2.com URLs from there. https://piyo.fc2.com/contents/search/?usersearch=&area%5B%5D=0&ca%5B%5D=0&action=search&mode=composition
00:29:28	<pokechu22>	75210 entries in the sitemap
00:31:17	<pabs>	a naive job for http://piyo.fc2.com/ didn't find the sitemap :/
00:31:23	<pabs>	trying again with the right URL
00:31:46	<pokechu22>	It's listed in robots.txt
00:31:56	<pokechu22>	but probably better to just start it with the sitemap directly
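You can read the sitemaps declared in robots.txt with the standard library's `urllib.robotparser`. Its `RobotFileParser.site_maps()` method (Python 3.8+) returns the `Sitemap:` entries, or `None` if there are none. This is a small sketch. The robots.txt text in the check is a made-up example, not piyo.fc2.com's actual file.

```python
from urllib.robotparser import RobotFileParser

def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the Sitemap: URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none
```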
00:33:16	<pokechu22>	hmm, there are a few users in the list but not in the sitemap, e.g. https://piyo.fc2.com/kukuku/
00:33:17	<pabs>	ok, AB does not like the sitemap URLs :/
00:33:34	<Medaka>	pabs: Got it, I'll give it a try.
00:34:00	<pabs>	Medaka: we are going to need a new list combining your stuff and the sitemap
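Combining Medaka's list with the sitemap mostly comes down to normalizing both into one URL form. You then take set unions and differences, which also surfaces entries like https://piyo.fc2.com/kukuku/ that appear only in the list. This sketch assumes that the http/https and slash/no-slash variants of a user page are equivalent, as observed for afesacvu above. `canonical` and `compare` are hypothetical helper names.

```python
from urllib.parse import urlsplit

def canonical(url: str) -> str:
    """Map http/https, // and slash/no-slash variants to one https form (assumption)."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") + "/"
    return f"https://{parts.netloc.lower()}{path}"

def compare(list_urls, sitemap_urls):
    """Return (merged, only_in_list, only_in_sitemap), each sorted."""
    a = {canonical(u) for u in list_urls}
    b = {canonical(u) for u in sitemap_urls}
    return sorted(a | b), sorted(a - b), sorted(b - a)
```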
00:34:11	<pabs>	and we are going to need pokechu22's custom sitemap trick
00:34:17	<pokechu22>	If they already have a sitemap it's pretty easy to create new sitemaps just by editing theirs
00:34:39	<pabs>	looks like they have a whole bunch of sitemaps
00:35:27	<pabs>	https://transfer.archivete.am/SkDqU/piyo.f2c.com-sitemaps.txt
00:35:28	<eggdrop>	inline (for browser viewing): https://transfer.archivete.am/inline/SkDqU/piyo.f2c.com-sitemaps.txt
00:35:58	<pokechu22>	oh, and sitemap_mob is links like http://piyo.fc2.com/m/kars3zjt/ and http://piyo.fc2.com/m/idakoeru/ which don't seem obviously useful to me. Medaka, can you tell what those are supposed to do?
00:36:52	<pabs>	I think we should include http://piyo.fc2.com/ in the job too, since it finds a bunch of category/search stuff, which might be useful for navigation in the WBM
00:38:07	<pabs>	what's the content of the /m/ URLs compared to the non-/m/ equivalent?
00:39:29	<pokechu22>	Sitemaps are the same users in the same order, just with /m/ added
00:39:45	<Medaka>	pokechu22: They are likely meant to serve a role similar to a Twitter timeline, where the posts of many users are displayed in chronological order.
00:39:57	<pokechu22>	The actual page shows http://piyo.fc2.com/start/1/
00:41:16	<pabs>	also, let's include that search URL above in the custom sitemap?
00:42:06	<pokechu22>	Yeah, I can do that (and even include all pages in it)
00:42:42	<pabs>	would have assumed just the first page would be enough, others would be discovered?
00:42:45	<pabs>	but ack
00:44:05	<pokechu22>	Yeah, but might as well put them all in at once so we don't have to worry about it discovering pages at the end of the job
00:44:24	<pabs>	ah good point
00:44:28	<pokechu22>	oh, http://piyo.fc2.com/m/kars3zjt/ matches https://piyo.fc2.com/ exactly
00:45:37	<pabs>	Google does not know about http://piyo.fc2.com/m/ but does have http://piyo.fc2.com/ URLs, so /m/ is probably not important
00:46:09	<pabs>	for both site:http://piyo.fc2.com/m/ and "http://piyo.fc2.com/m/"
00:47:11	<pabs>	btw pokechu22, would be great to get a writeup of that custom sitemap trick :)
00:47:58		dabs quits [Read error: Connection reset by peer]
00:51:13	<pokechu22>	https://transfer.archivete.am/inline/eGpPV/piyo.fc2.com_seed_urls.txt - all there really is to the trick is to create an XML sitemap, and do an !a < list job with that in your list. You could just as easily create an HTML page with links but sitemaps are a bit cleaner
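The trick has three steps. Write a plain XML sitemap of the seed URLs, upload it (e.g. to transfer.archivete.am), and run `!a <` on a list that contains the sitemap URL plus https://piyo.fc2.com/ so the job recurses over the site. Below is a minimal generator sketch. Any `&` in URLs such as the search URL must be escaped as `&amp;`. The sitemaps.org protocol caps a single file at 50,000 URLs, so the sketch splits larger lists. Whether ArchiveBot actually enforces that limit is unknown, and the helper names are hypothetical.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # sitemaps.org protocol limit per file

def build_sitemap(urls: list[str]) -> str:
    """Render one <urlset> sitemap; escape() turns & < > into XML entities."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"

def build_sitemaps(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> list[str]:
    """Split a long URL list into several sitemap documents."""
    return [build_sitemap(urls[i:i + size]) for i in range(0, len(urls), size)]
```

For example, the 75,210 users in piyo.fc2.com's sitemap would yield two files. Each uploaded file's URL then goes into the `!a <` list alongside https://piyo.fc2.com/.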
00:53:37	<pabs>	so the first URL sets the parent for the job?
00:54:26	<pabs>	going to add this to the wiki
00:54:50	<pokechu22>	I think the sitemap is treated as the parent, but because https://piyo.fc2.com/ is also in the list, it's willing to recurse over that site as well (even if it's not the parent). My understanding (untested though) is that if you didn't have https://piyo.fc2.com/ in the list, then it'd save one level of outlinks (still avoiding --no-parent issues for redirects) but not recurse
00:54:52	<pokechu22>	further
00:55:40	<pabs>	hmm, but if the sitemap is the parent, then https://transfer.archivete.am/ becomes the parent?
00:56:03	<pabs>	hmm ok
00:56:43	<pokechu22>	Yeah, https://transfer.archivete.am/ is treated as the parent. So things wouldn't behave nicely if you had https://transfer.archivete.am/ URLs in your sitemap (or, I guess, if the targeted site links to https://transfer.archivete.am/) but neither of those are likely
00:56:44	<eggdrop>	inline (for browser viewing): https://transfer.archivete.am/inline/ https://transfer.archivete.am/inline/ https://transfer.archivete.am/inline/)
00:57:11	<pokechu22>	(note that when I generate a list, I put the python script in the list directly, not in the generated sitemap)
00:58:23	<pabs>	ah, you have a script for this?
00:59:12	<pokechu22>	Not for generic sitemaps, but I did generate sitemaps for paper.comac.cc and ipaper.comac.cc
00:59:41		bilboed0 quits [Ping timeout: 260 seconds]
00:59:41	<pokechu22>	because those were awkward combinations of XML+javascript-based calendars and HTML articles
01:02:32		Wohlstand (Wohlstand) joins
01:03:07	<pabs>	whoa, are all these using the trick? https://archive.fart.website/archivebot/viewer/?q=_seed_urls.txt
01:04:30	<pokechu22>	Some are, some are just lists of subdomains (though I've been naming those _subdomains.txt instead lately), some are just URLs not containing slashes
01:07:45	<pabs>	so with the trick, will it recurse on http://piyo.fc2.com/ urls that aren't in the custom sitemaps?
01:08:45	<pokechu22>	Yes
01:12:27	<h2ibot>	PaulWise edited ArchiveBot (+1016, document the custom sitemap trick by pokechu22): https://wiki.archiveteam.org/?diff=55473&oldid=55077
01:12:38	<pabs>	pokechu22++
01:12:39	<pabs>	pokechu22++
01:12:39	<eggdrop>	[karma] 'pokechu22' now has 138 karma!
01:12:41	<eggdrop>	[karma] 'pokechu22' now has 139 karma!
01:17:37		NeonGlitch quits [Client Quit]
01:33:31		Megame quits [Quit: Leaving]
01:40:02	<Medaka>	I'm watching the crawl progress. Thank you so much!
01:41:52		ericgallager quits [Client Quit]
01:43:42		Medaka quits [Quit: Leaving]
01:58:36		emphatic quits [Ping timeout: 260 seconds]
02:50:53		NeonGlitch (NeonGlitch) joins
02:51:25		NeonGlitch quits [Client Quit]
02:53:22		NeonGlitch (NeonGlitch) joins
02:53:48		emphatic joins
03:04:18	<pabs>	!tell Medaka hmm are URLs like https://pr.fc2.com/lengthyst/ on your radar?
03:04:18	<eggdrop>	[tell] ok, I'll tell Medaka when they join next
03:07:37	<pabs>	!tell Medaka where did the URLs in old-fc2web-urls.txt come from btw?
03:07:38	<eggdrop>	[tell] ok, I'll tell Medaka when they join next
03:14:30	<pabs>	hmm, lots of other domains popping up too
03:48:03		Wohlstand quits [Remote host closed the connection]
04:08:48		NeonGlitch quits [Client Quit]
04:17:48		eroc19906 (eroc1990) joins
04:19:11		eroc1990 quits [Ping timeout: 260 seconds]
04:25:50		HP_Archivist quits [Quit: Leaving]
04:27:50		HP_Archivist (HP_Archivist) joins
04:28:17		HP_Archivist quits [Remote host closed the connection]
04:28:44		HP_Archivist (HP_Archivist) joins
04:30:03		HP_Archivist quits [Client Quit]
04:30:22		HP_Archivist (HP_Archivist) joins
05:06:13		NeonGlitch (NeonGlitch) joins
05:06:40		ericgallager joins
05:13:59		DogsRNice_ quits [Read error: Connection reset by peer]
05:26:55		NeonGlitch quits [Client Quit]
05:32:06		nicolas17 quits [Ping timeout: 260 seconds]
06:04:49	<BlankEclair>	envs.net running out of funds: https://catgirl.center/notes/a6yrph70akwz2ie4
06:06:13		ericgallager quits [Client Quit]
06:08:20		lennier2 quits [Ping timeout: 258 seconds]
06:08:31		lennier2 joins
06:20:55		Webuser608106 joins
06:24:23		Webuser608106 quits [Client Quit]
06:30:57		pokechu22 quits [Ping timeout: 258 seconds]
06:33:05		pokechu22 (pokechu22) joins
06:56:41		G4te_Keep3r34924156 quits [Ping timeout: 260 seconds]
07:18:59		Island quits [Read error: Connection reset by peer]
07:42:49	<@arkiver>	is fc2web being handled through AB?
07:43:09	<@arkiver>	i've been a bit less around this week due to a vacation, should be around more again next week and fully the week after
07:49:34		BornOn420 (BornOn420) joins
08:07:43	<pabs>	there are two AB jobs running for a couple of subdomains, but looking at the log so far there are other subdomains that may have been missed
08:25:57		leo60228- quits [Ping timeout: 258 seconds]
08:26:20		leo60228 (leo60228) joins
08:30:02		lys8 joins
08:30:36		lys quits [Read error: Connection reset by peer]
08:30:37		lys8 is now known as lys
08:51:59		G4te_Keep3r34924156 joins
09:16:45		midou quits [Remote host closed the connection]
09:17:02		midou joins
09:17:55		ducky (ducky) joins
09:40:39		pedantic-darwin joins
09:42:12		sec^nd quits [Remote host closed the connection]
09:42:39		sec^nd (second) joins
10:01:57		Lunarian1 (LunarianBunny1147) joins
10:05:06		LunarianBunny1147 quits [Ping timeout: 260 seconds]
10:37:06	<chrismrtn>	Is there a working method for getting info on a set of Twitter user IDs? I used to use the `UsersByRestIds` endpoint (as did snscrape), but that started returning 404 as of a few days ago...
11:00:04		Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat]
11:02:47		Bleo18260072271962345 joins
11:07:23	<chrismrtn>	Ah, it looks like it still works, but may need some special header in the request.
11:07:44		cyanbox joins
11:17:56	<h2ibot>	Himond000 edited Deathwatch (+152, /* 2025 */ add ojiji.net): https://wiki.archiveteam.org/?diff=55475&oldid=55472
11:23:08		ericgallager joins
11:43:41		nine quits [Ping timeout: 260 seconds]
11:45:37		nine joins
11:45:37		nine is now authenticated as nine
11:45:37		nine quits [Changing host]
11:45:37		nine (nine) joins
11:46:37		nine quits [Client Quit]
11:51:06		ericgallager quits [Read error: Connection reset by peer]
11:51:24		ericgallager joins
12:02:28		ericgallager quits [Client Quit]
12:34:26		Webuser586003 joins
12:34:55		Webuser586003 quits [Client Quit]
12:35:25		Webuser516190 joins
12:35:39		Webuser516190 quits [Client Quit]
12:36:14		Webuser055756 joins
12:36:35		Webuser055756 quits [Client Quit]
13:29:25	<jacksonchen666>	i know AT≠IA but i'm looking to upload my ~440+ GB of dead/unlisted youtube videos to IA (structured as channelID/videoID/ with .info.json, hopefully), could i get pointers/help with metadata for IA, collection choice and uploading method?
13:34:09		FiTheArchiver joins
13:35:55		Wohlstand (Wohlstand) joins
13:53:03		NeonGlitch (NeonGlitch) joins
14:10:30		ducky quits [Read error: Connection reset by peer]
14:10:35		ducky (ducky) joins
14:15:56		FiTheArchiver quits [Client Quit]
14:33:06		Cronfox quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
14:33:20		Cronfox (Cronfox) joins
14:45:49	<Hans5958>	<jacksonchen666> "i know AT≠IA but i'm looking..." <- Maybe you can follow how TubeUp is doing: https://archive.org/details/youtube-2KttA3p799A. The identifier (youtube-2KttA3p799A) has a format that Wayback Machine can use, and other metadata is included in the description and other metadata stuff
14:46:08	<Hans5958>	AFAIK, maybe someone can fill me in, especially with such volume
14:54:39		nicolas17 joins
14:58:22	<@JAA>	I'm pretty sure that is not integrated into the Wayback Machine.
14:58:38	<@JAA>	(I'd hope it isn't.)
14:58:45	<h2ibot>	Hans5958 edited Twitch.tv (+6): https://wiki.archiveteam.org/?diff=55476&oldid=55393
14:58:59	<@JAA>	But yes, following that format is probably not a bad idea.
15:02:16		cyanbox quits [Read error: Connection reset by peer]
15:11:11	<Hans5958>	<JAA> "I'm pretty sure that is not..." <- Ah, I must've either confused or remembered things wrong. Well, at least with the format it can be found relatively easily (e.g. https://findyoutubevideo.thetechrobo.ca/)
15:12:48	<Hans5958>	(ah, I often forget that I can't turn off mentions when using the reply feature on Matrix; Discord habits, apologies for the ping)
15:36:21		Lambro_D joins
15:40:19		nine joins
15:40:19		nine is now authenticated as nine
15:40:19		nine quits [Changing host]
15:40:19		nine (nine) joins
15:47:13		nine quits [Client Quit]
15:50:18	<TheTechRobo>	Yes, that is the most typical format I see. I also see youtube_CHANNELID items sometimes. If you do the latter (or some other method), please let me know so I can index them :-)
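For a channelID/videoID/ tree, following the TubeUp-style `youtube-VIDEOID` identifier mostly means reading each `.info.json` and mapping a few fields. This sketch assumes yt-dlp-style `.info.json` files with `id`, `title`, `description`, and `upload_date` as YYYYMMDD. The conversation didn't settle the metadata keys beyond `mediatype` or the choice of collection, so the sketch leaves them open. The function names are hypothetical.

```python
import json
from pathlib import Path

def ia_item_from_info(info: dict) -> tuple[str, dict]:
    """Return (identifier, metadata) for one video's .info.json contents."""
    video_id = info["id"]  # YouTube IDs use [A-Za-z0-9_-], all valid in IA identifiers
    metadata = {
        "mediatype": "movies",
        "title": info.get("title") or video_id,
        "description": info.get("description") or "",
    }
    upload_date = info.get("upload_date")  # yt-dlp format: YYYYMMDD
    if upload_date and len(upload_date) == 8 and upload_date.isdigit():
        metadata["date"] = f"{upload_date[:4]}-{upload_date[4:6]}-{upload_date[6:]}"
    return f"youtube-{video_id}", metadata

def iter_items(root: str):
    """Yield (identifier, metadata, video_dir) for root/channelID/videoID/*.info.json."""
    for info_path in sorted(Path(root).glob("*/*/*.info.json")):
        info = json.loads(info_path.read_text(encoding="utf-8"))
        yield (*ia_item_from_info(info), info_path.parent)
```

You could then pass each result to the `internetarchive` package's `upload(identifier, files=..., metadata=...)`, adding a `collection` key once one has been chosen.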
15:57:02		NeonGlitch quits [Client Quit]
16:02:38		NeonGlitch (NeonGlitch) joins
16:02:39		nine joins
16:02:39		nine is now authenticated as nine
16:02:39		nine quits [Changing host]
16:02:39		nine (nine) joins
16:03:10		NeonGlitch quits [Client Quit]
16:03:37		nine quits [Client Quit]
16:05:07		NeonGlitch (NeonGlitch) joins
16:05:46		NeonGlitch quits [Client Quit]
16:10:12	<steering>	TheTechRobo++
16:10:13	<eggdrop>	[karma] 'TheTechRobo' now has 14 karma!
16:13:56		NeonGlitch (NeonGlitch) joins
16:14:44		Megame (Megame) joins
16:57:56		BennyOtt quits [Quit: ZNC 1.9.1 - https://znc.in]
16:59:06		NeonGlitch quits [Client Quit]
16:59:22		BennyOtt (BennyOtt) joins
17:02:26		BennyOtt quits [Remote host closed the connection]
17:03:38		NeonGlitch (NeonGlitch) joins
17:04:16		NeonGlitch quits [Client Quit]
17:05:04		BennyOtt (BennyOtt) joins
17:08:06	<h2ibot>	HadeanEon edited Deaths in 2000 (+383, BOT - Updating page: {{saved}} (117),…): https://wiki.archiveteam.org/?diff=55477&oldid=55430
17:08:07	<h2ibot>	HadeanEon edited Deaths in 2000/list (+21, BOT - Updating list): https://wiki.archiveteam.org/?diff=55478&oldid=55431
17:14:04		NeonGlitch (NeonGlitch) joins
17:14:39		NeonGlitch quits [Client Quit]
17:17:06		riteo quits [Remote host closed the connection]
17:20:48		NeonGlitch (NeonGlitch) joins
17:21:21		NeonGlitch quits [Client Quit]
17:28:09	<h2ibot>	HadeanEon edited Deaths in 2004 (-420, BOT - Updating page: {{saved}} (6),…): https://wiki.archiveteam.org/?diff=55479&oldid=55435
17:28:10	<h2ibot>	HadeanEon edited Deaths in 2004/list (-33, BOT - Updating list): https://wiki.archiveteam.org/?diff=55480&oldid=55325
17:30:57		ljcool2006 joins
17:51:13	<h2ibot>	HadeanEon edited Deaths in 2008 (+379, BOT - Updating page: {{saved}} (3),…): https://wiki.archiveteam.org/?diff=55481&oldid=55440
17:51:14	<h2ibot>	HadeanEon edited Deaths in 2008/list (+22, BOT - Updating list): https://wiki.archiveteam.org/?diff=55482&oldid=55125
17:51:17		NeonGlitch (NeonGlitch) joins
17:54:43		dabs joins
18:02:27		riteo (riteo) joins
18:06:38		grill (grill) joins
18:06:58		grill quits [Client Quit]
18:07:12		grill (grill) joins
18:11:16	<h2ibot>	HadeanEon edited Deaths in 2010 (-371, BOT - Updating page: {{saved}} (204),…): https://wiki.archiveteam.org/?diff=55483&oldid=55442
18:11:17	<h2ibot>	HadeanEon edited Deaths in 2010/list (-30, BOT - Updating list): https://wiki.archiveteam.org/?diff=55484&oldid=55443
18:14:22		NeonGlitch quits [Client Quit]
18:21:18	<h2ibot>	HadeanEon edited Deaths in 2011 (-391, BOT - Updating page: {{saved}} (204),…): https://wiki.archiveteam.org/?diff=55485&oldid=55444
18:21:19	<h2ibot>	HadeanEon edited Deaths in 2011/list (-26, BOT - Updating list): https://wiki.archiveteam.org/?diff=55486&oldid=55334
18:28:24		SootBector quits [Remote host closed the connection]
18:29:36		SootBector (SootBector) joins
18:31:20		NeonGlitch (NeonGlitch) joins
18:34:20	<h2ibot>	HadeanEon edited Deaths in 2012 (+442, BOT - Updating page: {{saved}} (194),…): https://wiki.archiveteam.org/?diff=55487&oldid=55445
18:34:21	<h2ibot>	HadeanEon edited Deaths in 2012/list (+34, BOT - Updating list): https://wiki.archiveteam.org/?diff=55488&oldid=55446
18:44:22	<h2ibot>	HadeanEon edited Deaths in 2013 (+375, BOT - Updating page: {{saved}} (211),…): https://wiki.archiveteam.org/?diff=55489&oldid=55447
18:44:23	<h2ibot>	HadeanEon edited Deaths in 2013/list (+24, BOT - Updating list): https://wiki.archiveteam.org/?diff=55490&oldid=55336
18:53:06		nine joins
18:53:06		nine is now authenticated as nine
18:53:06		nine quits [Changing host]
18:53:06		nine (nine) joins
19:07:56		Lunarian1 is now known as LunarianBunny1147
19:16:21		grill quits [Ping timeout: 260 seconds]
19:16:27	<h2ibot>	HadeanEon edited Deaths in 2016/list (+44, BOT - Updating list): https://wiki.archiveteam.org/?diff=55491&oldid=55273
19:33:30		NeonGlitch quits [Client Quit]
19:34:37		NeonGlitch (NeonGlitch) joins
19:38:42		tzt quits [Ping timeout: 258 seconds]
19:43:01		tzt (tzt) joins
19:46:16		makeworld3 joins
19:46:22		makeworld quits [Ping timeout: 258 seconds]
19:46:22		makeworld3 is now known as makeworld
19:55:29		IDK (IDK) joins
19:59:39	<h2ibot>	HadeanEon edited Deaths in 2017 (-302, BOT - Updating page: {{saved}} (373),…): https://wiki.archiveteam.org/?diff=55492&oldid=55453
19:59:40	<h2ibot>	HadeanEon edited Deaths in 2017/list (-76, BOT - Updating list): https://wiki.archiveteam.org/?diff=55493&oldid=55454
20:04:34		APOLLO03 joins
20:15:13		NeonGlitch quits [Client Quit]
20:16:58		ericgallager joins
20:28:38		Island joins
20:35:33		DigitalDragons quits [Read error: Connection reset by peer]
20:35:41		Exorcism quits [Ping timeout: 260 seconds]
20:38:03		ShakespeareFan00 joins
20:38:08	<ShakespeareFan00>	Hi,
20:38:34	<ShakespeareFan00>	Is there an effort to back up important resources from the Internet Archive to other sites?
20:39:14	<pokechu22>	quoting from my own message, just to get things started: I'm not aware of any specific project for those (there's https://wiki.archiveteam.org/index.php/INTERNETARCHIVE.BAK but nothing ever really came of that). I don't think anything that's in public domain/digitized on behalf of the library of congress (which I believe includes Catalog of Copyright Entries) is at risk though
20:39:49	<ShakespeareFan00>	My specific concern was the Catalog of Copyright Entries scans ( which I've not found as scans on sites other than the Internet Archive)
20:39:57	<Flashfire42>	I do believe IA has out-of-country backups, tho I am not sure how up to date they are per se
20:40:35	<ShakespeareFan00>	Some of the volumes were mirrored to Wikimedia Commons in 2020 (during the last scare about IA's existence, but I'm not sure on the quality.)
20:41:06	<ShakespeareFan00>	It eventually formed the starting point for - https://commons.wikimedia.org/wiki/Commons:IA_books
20:41:25	<ShakespeareFan00>	It would of course be nice not to have to rely on only 2 sites hosting the volumes
20:42:02	<ShakespeareFan00>	@pokechu22: The resources may well be on the LOC site directly, but I hadn't found them previously, myself..
20:42:52	<ShakespeareFan00>	I could not add a project suggestion, to the Wiki, because of a disagreement or typo I made over a decade ago, meaning I am still blocked on the wiki.
20:44:36	<ShakespeareFan00>	It would of course be appreciated if any Archiveteam Warriors (official or unofficial) started mirroring Federal works and those in the public domain (by expiry or non-renewal) to sites other than the Internet Archive. One dedicated contributor to Wikimedia Commons was doing this in 2020-21, but single-handedly 😭
20:44:46	<pokechu22>	I think some are on google books as well, but google books scans aren't great
20:44:52	<ShakespeareFan00>	Quite.
20:45:22	<ShakespeareFan00>	I cannot use the wiki, but I would support archival of PDF or DJVU of scanned books to Wikimedia Commons ...
20:47:53	<ShakespeareFan00>	Commons or Wikisource can use either format. The works don't even have to be in English. The only requirement is that they are Federal (US), in the public domain (by expiry, no-notice etc.). And full academic works under licenses such as Creative Commons Share Alike and CC-BY are also appreciated on Commons, generally..
20:49:47	<ShakespeareFan00>	I am surprised not to have seen a more active effort to mirror works from IA to Wikimedia Commons.. Perhaps the Commons project page I linked can be revived if people start mirroring old works.. Finds from IA now on Wikisource have included 18th century Statute Collections, obscure text on Color, and a US Federal Work giving a set of vector
20:49:47	<ShakespeareFan00>	fonts, as well as 19th century and early 20th works of fiction. ...
20:50:02	<ShakespeareFan00>	too numerous to mention ..
20:50:27	<ShakespeareFan00>	PLEASE upload the public domain to Wikimedia Commons.. Please!
20:51:10	<ShakespeareFan00>	I am of course not aware of any other 'asteroid impact' backups at the moment for public domain resources on IA.
20:52:47	<ShakespeareFan00>	pokechu22 : Well I've said my 2 cents.. I look forward to seeing plenty of Public Domain works appearing on Wikimedia Commons in the next few weeks.
20:53:08	<ShakespeareFan00>	That includes higher quality versions of scans Wikimedia Commons already has. :)
20:53:22	<ShakespeareFan00>	Apologies for the wall of text.
20:53:54	<pokechu22>	I don't have much experience uploading to commons, but I don't see anything stopping someone from uploading it, other than needing to sort through and catalog them
20:54:34	<pokechu22>	It does look like a lot of them are listed on https://en.wikisource.org/wiki/Catalog_of_Copyright_Entries
20:54:42	<ShakespeareFan00>	At present, the focus is on getting stuff uploaded. I've found on the whole that once it's on the site, it gets categorised very quickly..
20:55:19	<ShakespeareFan00>	See the project link I gave as well.. I won't mention it a second time, unless asked directly.
20:55:55	<pokechu22>	What stops you from uploading them? Seems like https://ia-upload.wmcloud.org/ seems to exist at least
20:56:24	<ShakespeareFan00>	I will also note here that English Wikisource ALWAYS needs transcribers. - https://en.wikisource.org/wiki/Wikisource:About
20:57:08	<ShakespeareFan00>	pokechu22 : Limited bandwidth.. But I see your sentiment :)
20:57:43	<ShakespeareFan00>	I'm also behind a firewall on my system.
20:58:15	<ShakespeareFan00>	When I can, I've certainly used the tool you just linked.
20:58:37	<ShakespeareFan00>	However, if 100 people were moving important works..
20:59:17	<pokechu22>	Ah, that makes a bit more sense at least
20:59:43	<@JAA>	That tool seems to transfer things directly, maybe?
20:59:50	<ShakespeareFan00>	Nothing stops anyone from using the tool for as many public domain works as possible :) (Generally public domain stuff doesn't get removed on Commons, unless Commons already has it, or an item proves to be in copyright, which is rare.)
21:00:44	<ShakespeareFan00>	@JAA: yes, and it can also make a DjVu directly from the JP2 scans at IA, if a DjVu/PDF doesn't yet exist or is low quality.. (I've used that option once or twice myself.)
21:01:21	<ShakespeareFan00>	https://en.wikisource.org/wiki/Help:Internet_Archive
21:01:55	<ShakespeareFan00>	Mirroring of resources, as I said, was being undertaken in 2020-21 but stalled.
21:02:16	<ShakespeareFan00>	With enough 'warriors' though..
21:03:24	<@JAA>	Yeah, with enough devs, our software would be in a better state, too. And with more resources, we could archive more of the stuff being lit on fire every day.
21:04:23	<ShakespeareFan00>	I also encourage people to check out Wikisource.. It's kind of like Distributed Proofreaders, but to me a lot more friendly ;)..
21:04:39	<@JAA>	IA does have a second location in Canada. I don't know what fraction of the data has been migrated there yet though.
21:05:00	<ShakespeareFan00>	@JAA : If only, If only (re never having the resources) 😁
21:05:16	<@JAA>	Personally, I don't see IA in immediate danger, and so I'll rather focus on archiving things that are.
21:05:33	<@JAA>	More copies are always good though.
21:05:37	<steering>	they'll probably both get nuked when WW3 kicks off anyway.
21:05:38	<steering>	:P
21:06:04	<pokechu22>	Wikisource is nice and I did a fair bit of contribution to it in the past; I've just been focused more recently on things that I have unique skills for
21:07:01	<ShakespeareFan00>	IA is at risk as I see it (Are you aware of the dispute over old sound recordings? 700 million isn't damages, it's a "ruin-em" approach. And IA closing wouldn't just remove the disputed sound recordings.)
21:07:40	<ShakespeareFan00>	https://blog.archive.org/2025/04/17/take-action-defend-the-internet-archive/
21:08:14	<ShakespeareFan00>	That's why "asteroid-mitigation" measures became advisable...
21:08:21	<ShakespeareFan00>	🤣
21:09:19	<ShakespeareFan00>	As I said I strongly suggest mass mirroring of the public domain to Wikimedia Commons :)
21:09:32	<ShakespeareFan00>	(or other platforms as well..)
21:10:50	<ShakespeareFan00>	BTW My current handle is not necessarily the one I've used on Commons and Wikisource ...
21:13:01		etnguyen03 (etnguyen03) joins
21:13:10		NeonGlitch (NeonGlitch) joins
21:13:40		NeonGlitch quits [Client Quit]
21:17:01	<ShakespeareFan00>	This IRC is logged, so hopefully someone sees my 2 cents and acts accordingly :)
21:19:29	<ShakespeareFan00>	Also, despite the typo or issue that got me blocked on the wiki, I have undertaken some archival efforts of my own.. Thanks to using a specific tool suggested here, the entirety of 8bs.com got backed up to Wayback.. including entire scan runs of publications that the Internet Archive did not have!
21:19:39	<ShakespeareFan00>	(Mostly Acorn computer related)
21:22:10	<ShakespeareFan00>	I have to go, but PLEASE keep archiving, especially the public domain :)
21:22:14		ShakespeareFan00 quits [Client Quit]
21:23:52	<h2ibot>	HadeanEon edited Deaths in 2019 (+275, BOT - Updating page: {{saved}} (491),…): https://wiki.archiveteam.org/?diff=55494&oldid=55455
21:23:53	<h2ibot>	HadeanEon edited Deaths in 2019/list (+35, BOT - Updating list): https://wiki.archiveteam.org/?diff=55495&oldid=55279
21:26:55	<@JAA>	<aatt.png>
21:34:07		etnguyen03 quits [Client Quit]
21:55:24		Webuser862912 joins
22:05:30	<szczot3k|m>	moin
22:11:33		NeonGlitch (NeonGlitch) joins
22:12:03		NeonGlitch quits [Client Quit]
22:14:00	<h2ibot>	HadeanEon edited Deaths in 2020/list (+0, BOT - Updating list): https://wiki.archiveteam.org/?diff=55496&oldid=55340
22:15:02		etnguyen03 (etnguyen03) joins
22:42:01		DogsRNice joins
22:42:49		Webuser529564 quits [Quit: Ooops, wrong browser tab.]
22:43:16		Megame quits [Quit: Leaving]
22:45:08		etnguyen03 quits [Client Quit]
22:54:07	<h2ibot>	HadeanEon edited Deaths in 2021 (+318, BOT - Updating page: {{saved}} (53),…): https://wiki.archiveteam.org/?diff=55497&oldid=55457
22:54:08	<h2ibot>	HadeanEon edited Deaths in 2021/list (+21, BOT - Updating list): https://wiki.archiveteam.org/?diff=55498&oldid=55342
23:10:14		ericgallager quits [Client Quit]
23:16:42		Bleo18260072271962345 quits [Quit: Ping timeout (120 seconds)]
23:16:54		kiska52 quits [Quit: Ping timeout (120 seconds)]
23:16:56		tek_dmn quits [Quit: ZNC - https://znc.in]
23:17:06		Ryz quits [Quit: Ping timeout (120 seconds)]
23:17:12		kiska52 joins
23:17:17		Bleo18260072271962345 joins
23:18:09		Ryz (Ryz) joins
23:18:14		tek_dmn (tek_dmn) joins
23:27:30		Webuser862912 quits [Client Quit]
