00:01:28hexa_ is now known as hexa
00:04:22<Medaka>Thanks a lot!
00:04:43ericgallager joins
00:06:37dabs joins
00:08:29<Medaka>I randomly selected about 30 URLs from piyo.fc2.com and attempted to archive them using the Wayback Machine—all of them were successfully archived without any issues.
00:09:36<pabs>ok https://piyo.fc2.com/20070618/ worked on tantalus, nice
00:10:03<pabs>in Australia https://piyo.fc2.com/20070618/ gives me a 403 though
00:12:33<pabs>Medaka: old-fc2web-urls.txt is now running, and getting 403s sometimes
00:12:50<pabs>different 403s to the ones from piyo though
00:14:22<pabs>started !a < piyo_fc2_urls.txt -u firefox -p tantalus
00:15:19<pokechu22>http://dorops.fc2web.com/ redirects to http://error.fc2.com/web/403.html
00:15:37<pabs>and the piyo ones for me redirect to https://error.fc2.com/other/forbidden.html
00:16:39<pabs>I'm guessing the 403.html ones are due to a missing index.html or similar
00:16:59<pabs>and the forbidden.html ones are geo restrictions
00:17:37etnguyen03 quits [Client Quit]
00:17:48NeonGlitch quits [Client Quit]
00:17:49<pabs>Medaka: piyo_fc2_urls.txt is now running, no 403s thus far
00:19:22<Medaka>pokechu22: Regarding http://dorops.fc2web.com/, I’m also getting redirected to a 403 error page. I believe the page was likely deleted.
00:19:52<pabs>you can watch the jobs here btw http://archivebot.com/
00:20:09<pokechu22>Yeah, that's my guess too. Nothing shows up for duckduckgo or google, and web.archive.org has no captures of anything on there
00:20:26<pokechu22>nothing on search.yahoo.co.jp either
00:21:01<pabs>ah, I probably should have done the piyo.fc2.com ones as https://piyo.fc2.com/20070618 instead of https://piyo.fc2.com/20070618/ because of the no-parent thing
00:21:17<pabs>just in case sites link to unknown sites
00:21:26pabs abort/restart
00:21:46<pokechu22>Yeah, that's probably worth doing
00:21:57<pokechu22>there's also the sitemap trick but if they're all just dates like that it's not necessary
00:22:32<pabs>they aren't dates, just usernames it seems https://piyo.fc2.com/afesacvu/
00:23:33<pabs>Medaka: can you tell if https://piyo.fc2.com/afesacvu is identical to https://piyo.fc2.com/afesacvu/ ? or is there a redirect or?
00:24:04<pokechu22>oh, I can load it
00:24:17<pabs>hmm, not a redirect in AB
00:24:17<pokechu22>Identical, but no redirect
00:24:58<pabs>hmm, wonder if it would be problematic to have urls without / in the WBM
00:25:04<pokechu22>but curl https://piyo.fc2.com/afesacvu/ | sha1sum and https://piyo.fc2.com/afesacvu | sha1sum both give the same hash (ced54fe06750c1d5fc04d9f7516df05e30778289)
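The check pokechu22 did with `curl ... | sha1sum` can be sketched in Python with `hashlib`; a minimal sketch (calling `sha1_of_url` on the log's URLs requires network access, so it is shown but not run):

```python
import hashlib
from urllib.request import urlopen

def sha1_hex(data: bytes) -> str:
    """SHA-1 hex digest of a byte string."""
    return hashlib.sha1(data).hexdigest()

def sha1_of_url(url: str) -> str:
    """Fetch a URL and hash the response body, like `curl URL | sha1sum`."""
    with urlopen(url) as resp:
        return sha1_hex(resp.read())

# If the two variants serve identical bytes, the digests match:
# sha1_of_url("https://piyo.fc2.com/afesacvu/") == sha1_of_url("https://piyo.fc2.com/afesacvu")
```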
00:25:35<pabs>is there an index on https://piyo.fc2.com/ or something?
00:25:43<Medaka>pabs: Thanks for the archivebot link!
00:25:51<pokechu22>Both pages link to the slash version
00:26:06<pabs>so I guess we need the sitemap trick?
00:26:21<Medaka>Both https://piyo.fc2.com/afesacvu/ and https://piyo.fc2.com/afesacvu point to the same page, and there is no redirect between them.
00:26:22<pokechu22>https://piyo.fc2.com/contents/search/?usersearch=&area%5B%5D=0&ca%5B%5D=0&action=search&mode=composition is a thing
00:26:23NeonGlitch (NeonGlitch) joins
00:26:23pabs not familiar with it
00:27:01<pokechu22>uh, they might actually have a sitemap already
00:27:26<pokechu22>http://piyo.fc2.com/sitemap_index.xml has <loc>//piyo.fc2.com/sitemap_1.xml</loc> which has entries like <loc>//piyo.fc2.com/sours/</loc>
00:27:45<pabs>hmm, will that work without http: ?
00:28:02<pokechu22>I'm not sure if archivebot will accept // links like that in sitemaps (those are standard for <a href="//"> though), but it's probably best to try
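If ArchiveBot won't take the protocol-relative `<loc>` entries, they can be normalized to absolute URLs first; a minimal sketch, assuming the sitemap XML has already been downloaded (a regex is used here rather than a full XML parser, which is fine for flat sitemap files):

```python
import re

def extract_locs(sitemap_xml: str, scheme: str = "http") -> list[str]:
    """Pull <loc> values out of sitemap XML, prefixing protocol-relative
    entries (//host/path) with an explicit scheme."""
    locs = re.findall(r"<loc>(.*?)</loc>", sitemap_xml)
    return [scheme + ":" + loc if loc.startswith("//") else loc for loc in locs]

sample = "<urlset><url><loc>//piyo.fc2.com/sours/</loc></url></urlset>"
# extract_locs(sample) -> ["http://piyo.fc2.com/sours/"]
```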
00:28:20<pabs>Medaka: can you compare the sitemaps to your list and see if yours has any more than theirs?
00:28:44<pabs>in the meantime lets start one based on the sitemap
00:29:09<Medaka>pokechu22: Yes, I created the list of piyo.fc2.com URLs from there. https://piyo.fc2.com/contents/search/?usersearch=&area%5B%5D=0&ca%5B%5D=0&action=search&mode=composition
00:29:28<pokechu22>75210 entries in the sitemap
00:31:17<pabs>a naive job for http://piyo.fc2.com/ didn't find the sitemap :/
00:31:23<pabs>trying again with the right URL
00:31:46<pokechu22>It's listed in robots.txt
00:31:56<pokechu22>but probably better to just start it with the sitemap directly
00:33:16<pokechu22>hmm, there are a few users in the list but not in the sitemap, e.g. https://piyo.fc2.com/kukuku/
00:33:17<pabs>ok, AB does *not* like the sitemap URLs :/
00:33:34<Medaka>pabs: Got it, I'll give it a try.
00:34:00<pabs>Medaka: we are going to need a new list combining your stuff and the sitemap
00:34:11<pabs>and we are going to need pokechu22's custom sitemap trick
00:34:17<pokechu22>If they already have a sitemap it's pretty easy to create new sitemaps just by editing theirs
00:34:39<pabs>looks like they have a whole bunch of sitemaps
00:35:27<pabs>https://transfer.archivete.am/SkDqU/piyo.f2c.com-sitemaps.txt
00:35:28<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/SkDqU/piyo.f2c.com-sitemaps.txt
00:35:58<pokechu22>oh, and sitemap_mob is links like http://piyo.fc2.com/m/kars3zjt/ and http://piyo.fc2.com/m/idakoeru/ which don't seem obviously useful to me. Medaka, can you tell what those are supposed to do?
00:36:52<pabs>I think we should include http://piyo.fc2.com/ in the job too, since it finds a bunch of category/search stuff, which might be useful for navigation in the WBM
00:38:07<pabs>whats the content of the /m/ URLs compared to the non-/m/ equivalent?
00:39:29<pokechu22>Sitemaps are the same users in the same order, just with /m/ added
00:39:45<Medaka>pokechu22: They are likely meant to serve a role similar to a Twitter timeline, where the posts of many users are displayed in chronological order.
00:39:57<pokechu22>The actual page shows http://piyo.fc2.com/start/1/
00:41:16<pabs>also, lets include that search URL above in the custom sitemap?
00:42:06<pokechu22>Yeah, I can do that (and even include all pages in it)
00:42:42<pabs>would have assumed just the first page would be enough, others would be discovered?
00:42:45<pabs>but ack
00:44:05<pokechu22>Yeah, but might as well put them all in at once so we don't have to worry about it discovering pages at the end of the job
00:44:24<pabs>ah good point
00:44:28<pokechu22>oh, http://piyo.fc2.com/m/kars3zjt/ matches https://piyo.fc2.com/ exactly
00:45:37<pabs>Google does not know about http://piyo.fc2.com/m/ but does have http://piyo.fc2.com/ URLs, so /m/ is probably not important
00:46:09<pabs>for both site:http://piyo.fc2.com/m/ and "http://piyo.fc2.com/m/"
00:47:11<pabs>btw pokechu22, would be great to get a writeup of that custom sitemap trick :)
00:47:58dabs quits [Read error: Connection reset by peer]
00:51:13<pokechu22>https://transfer.archivete.am/inline/eGpPV/piyo.fc2.com_seed_urls.txt - all there really is to the trick is to create an XML sitemap, and do an !a < list job with that in your list. You could just as easily create an HTML page with links but sitemaps are a bit cleaner
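The trick as described — wrap a URL list in a sitemap and run an `!a <` list job with it — can be sketched like this (a minimal sketch; the upload destination and job syntax are as described in the log, and XML-escaping is included since seed URLs like the search page contain `&`):

```python
from xml.sax.saxutils import escape

def urls_to_sitemap(urls: list[str]) -> str:
    """Wrap a list of URLs in a minimal sitemaps.org-style XML document."""
    entries = "\n".join(f"  <url><loc>{escape(url)}</loc></url>" for url in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

# Upload the result somewhere public (e.g. transfer.archivete.am), then put
# that sitemap URL in the list passed to an `!a <` job.
```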
00:53:37<pabs>so the first URL sets the parent for the job?
00:54:26<pabs>going to add this to the wiki
00:54:50<pokechu22>I think the sitemap is treated as the parent, but because https://piyo.fc2.com/ is also in the list, it's willing to recurse over that site as well (even if it's not the parent). My understanding (untested though) is that if you didn't have https://piyo.fc2.com/ in the list, then it'd save one level of outlinks (still avoiding --no-parent issues for redirects) but not recurse
00:54:52<pokechu22>further
00:55:40<pabs>hmm, but if the sitemap is the parent, then https://transfer.archivete.am/ becomes the parent?
00:56:03<pabs>hmm ok
00:56:43<pokechu22>Yeah, https://transfer.archivete.am/ is treated as the parent. So things wouldn't behave nicely if you had https://transfer.archivete.am/ URLs in your sitemap (or, I guess, if the targeted site links to https://transfer.archivete.am/) but neither of those are likely
00:56:44<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/ https://transfer.archivete.am/inline/ https://transfer.archivete.am/inline/)
00:57:11<pokechu22>(note that when I generate a list, I put the python script in the list directly, not in the generated sitemap)
00:58:23<pabs>ah, you have a script for this?
00:59:12<pokechu22>Not for generic sitemaps, but I did generate sitemaps for paper.comac.cc and ipaper.comac.cc
00:59:41bilboed0 quits [Ping timeout: 260 seconds]
00:59:41<pokechu22>because those were awkward combinations of XML+javascript-based calendars and HTML articles
01:02:32Wohlstand (Wohlstand) joins
01:03:07<pabs>whoa, are all these using the trick? https://archive.fart.website/archivebot/viewer/?q=_seed_urls.txt
01:04:30<pokechu22>Some are, some are just lists of subdomains (though I've been naming those _subdomains.txt instead lately), some are just URLs not containing slashes
01:07:45<pabs>so with the trick, will it recurse on http://piyo.fc2.com/ urls that aren't in the custom sitemaps?
01:08:45<pokechu22>Yes
01:12:27<h2ibot>PaulWise edited ArchiveBot (+1016, document the custom sitemap trick by pokechu22): https://wiki.archiveteam.org/?diff=55473&oldid=55077
01:12:38<pabs>pokechu22++
01:12:39<pabs>pokechu22++
01:12:39<eggdrop>[karma] 'pokechu22' now has 138 karma!
01:12:41<eggdrop>[karma] 'pokechu22' now has 139 karma!
01:17:37NeonGlitch quits [Client Quit]
01:33:31Megame quits [Quit: Leaving]
01:40:02<Medaka>I'm watching the crawl progress. Thank you so much!
01:41:52ericgallager quits [Client Quit]
01:43:42Medaka quits [Quit: Leaving]
01:58:36emphatic quits [Ping timeout: 260 seconds]
02:50:53NeonGlitch (NeonGlitch) joins
02:51:25NeonGlitch quits [Client Quit]
02:53:22NeonGlitch (NeonGlitch) joins
02:53:48emphatic joins
03:04:18<pabs>!tell Medaka hmm are URLs like https://pr.fc2.com/lengthyst/ on your radar?
03:04:18<eggdrop>[tell] ok, I'll tell Medaka when they join next
03:07:37<pabs>!tell Medaka where did the URLs in old-fc2web-urls.txt come from btw?
03:07:38<eggdrop>[tell] ok, I'll tell Medaka when they join next
03:14:30<pabs>hmm, lots of other domains popping up too
03:48:03Wohlstand quits [Remote host closed the connection]
04:08:48NeonGlitch quits [Client Quit]
04:17:48eroc19906 (eroc1990) joins
04:19:11eroc1990 quits [Ping timeout: 260 seconds]
04:25:50HP_Archivist quits [Quit: Leaving]
04:27:50HP_Archivist (HP_Archivist) joins
04:28:17HP_Archivist quits [Remote host closed the connection]
04:28:44HP_Archivist (HP_Archivist) joins
04:30:03HP_Archivist quits [Client Quit]
04:30:22HP_Archivist (HP_Archivist) joins
05:06:13NeonGlitch (NeonGlitch) joins
05:06:40ericgallager joins
05:13:59DogsRNice_ quits [Read error: Connection reset by peer]
05:26:55NeonGlitch quits [Client Quit]
05:32:06nicolas17 quits [Ping timeout: 260 seconds]
06:04:49<BlankEclair>envs.net running out of funds: https://catgirl.center/notes/a6yrph70akwz2ie4
06:06:13ericgallager quits [Client Quit]
06:08:20lennier2 quits [Ping timeout: 258 seconds]
06:08:31lennier2 joins
06:20:55Webuser608106 joins
06:24:23Webuser608106 quits [Client Quit]
06:30:57pokechu22 quits [Ping timeout: 258 seconds]
06:33:05pokechu22 (pokechu22) joins
06:56:41G4te_Keep3r34924156 quits [Ping timeout: 260 seconds]
07:18:59Island quits [Read error: Connection reset by peer]
07:42:49<@arkiver>is fc2web being handled through AB?
07:43:09<@arkiver>i've been a bit less around this week due to a vacation, should be around more again next week and fully the week after
07:49:34BornOn420 (BornOn420) joins
08:07:43<pabs>there are two AB jobs running for a couple of subdomains, but looking at the log so far there are other subdomains that may have been missed
08:25:57leo60228- quits [Ping timeout: 258 seconds]
08:26:20leo60228 (leo60228) joins
08:30:02lys8 joins
08:30:36lys quits [Read error: Connection reset by peer]
08:30:37lys8 is now known as lys
08:51:59G4te_Keep3r34924156 joins
09:16:45midou quits [Remote host closed the connection]
09:17:02midou joins
09:17:55ducky (ducky) joins
09:40:39pedantic-darwin joins
09:42:12sec^nd quits [Remote host closed the connection]
09:42:39sec^nd (second) joins
10:01:57Lunarian1 (LunarianBunny1147) joins
10:05:06LunarianBunny1147 quits [Ping timeout: 260 seconds]
10:37:06<chrismrtn>Is there a working method for getting info on a set of Twitter user IDs? I used to use the `UsersByRestIds` endpoint (as did snscrape), but that started returning 404 as of a few days ago...
11:00:04Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat]
11:02:47Bleo18260072271962345 joins
11:07:23<chrismrtn>Ah, it looks like it still works, but may need some special header in the request.
11:07:44cyanbox joins
11:17:56<h2ibot>Himond000 edited Deathwatch (+152, /* 2025 */ add ojiji.net): https://wiki.archiveteam.org/?diff=55475&oldid=55472
11:23:08ericgallager joins
11:43:41nine quits [Ping timeout: 260 seconds]
11:45:37nine joins
11:45:37nine quits [Changing host]
11:45:37nine (nine) joins
11:46:37nine quits [Client Quit]
11:51:06ericgallager quits [Read error: Connection reset by peer]
11:51:24ericgallager joins
12:02:28ericgallager quits [Client Quit]
12:34:26Webuser586003 joins
12:34:55Webuser586003 quits [Client Quit]
12:35:25Webuser516190 joins
12:35:39Webuser516190 quits [Client Quit]
12:36:14Webuser055756 joins
12:36:35Webuser055756 quits [Client Quit]
13:29:25<jacksonchen666>i know AT≠IA but i'm looking to upload my ~440+ GB of dead/unlisted youtube videos to IA (structured as channelID/videoID/ with .info.json, hopefully), could i get pointers/help with metadata for IA, collection choice and uploading method?
13:34:09FiTheArchiver joins
13:35:55Wohlstand (Wohlstand) joins
13:53:03NeonGlitch (NeonGlitch) joins
14:10:30ducky quits [Read error: Connection reset by peer]
14:10:35ducky (ducky) joins
14:15:56FiTheArchiver quits [Client Quit]
14:33:06Cronfox quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
14:33:20Cronfox (Cronfox) joins
14:45:49<Hans5958><jacksonchen666> "i know AT≠IA but i'm looking..." <- Maybe you can follow how TubeUp is doing: https://archive.org/details/youtube-2KttA3p799A. The identifier (youtube-2KttA3p799A) has a format that Wayback Machine can use, and other metadata is included in the description and other metadata stuff
14:46:08<Hans5958>AFAIK, maybe someone can fill me in, especially with such volume
14:54:39nicolas17 joins
14:58:22<@JAA>I'm pretty sure that is not integrated into the Wayback Machine.
14:58:38<@JAA>(I'd hope it isn't.)
14:58:45<h2ibot>Hans5958 edited Twitch.tv (+6): https://wiki.archiveteam.org/?diff=55476&oldid=55393
14:58:59<@JAA>But yes, following that format is probably not a bad idea.
15:02:16cyanbox quits [Read error: Connection reset by peer]
15:11:11<Hans5958><JAA> "I'm pretty sure that is not..." <- Ah, I must've either confused or remembered things wrong. Well, at least with the format it can be found relatively easily (e.g. https://findyoutubevideo.thetechrobo.ca/)
15:12:48<Hans5958>(ah, I often forget that I can't turn off mentions when using the reply feature on Matrix; Discord habits, apologies for the ping)
15:36:21Lambro_D joins
15:40:19nine joins
15:40:19nine quits [Changing host]
15:40:19nine (nine) joins
15:47:13nine quits [Client Quit]
15:50:18<TheTechRobo>Yes, that is the most typical format I see. I also see youtube_CHANNELID items sometimes. If you do the latter (or some other method), please let me know so I can index them :-)
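A sketch of building a TubeUp-style identifier and metadata dict from a yt-dlp `.info.json`: the `youtube-VIDEOID` identifier format is from the log, but the metadata field names on the right are assumptions based on common `.info.json` keys, and the actual upload (e.g. via the `internetarchive` library's `upload()`) is left out:

```python
import json

def ia_identifier(video_id: str) -> str:
    """TubeUp-style item identifier: youtube-<VIDEOID>."""
    return f"youtube-{video_id}"

def ia_metadata(info: dict) -> dict:
    """Build an IA metadata dict from a yt-dlp .info.json payload.
    The .info.json key names here are common ones; adjust to your dumps."""
    return {
        "mediatype": "movies",
        "title": info.get("title", ""),
        "creator": info.get("uploader", ""),
        "date": info.get("upload_date", ""),  # YYYYMMDD as yt-dlp emits it
        "description": info.get("description", ""),
        "originalurl": info.get("webpage_url", ""),
    }

# Hypothetical usage:
# info = json.load(open("channelID/videoID/video.info.json"))
# internetarchive.upload(ia_identifier(info["id"]), files=[...],
#                        metadata=ia_metadata(info))
```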
15:57:02NeonGlitch quits [Client Quit]
16:02:38NeonGlitch (NeonGlitch) joins
16:02:39nine joins
16:02:39nine quits [Changing host]
16:02:39nine (nine) joins
16:03:10NeonGlitch quits [Client Quit]
16:03:37nine quits [Client Quit]
16:05:07NeonGlitch (NeonGlitch) joins
16:05:46NeonGlitch quits [Client Quit]
16:10:12<steering>TheTechRobo++
16:10:13<eggdrop>[karma] 'TheTechRobo' now has 14 karma!
16:13:56NeonGlitch (NeonGlitch) joins
16:14:44Megame (Megame) joins
16:57:56BennyOtt quits [Quit: ZNC 1.9.1 - https://znc.in]
16:59:06NeonGlitch quits [Client Quit]
16:59:22BennyOtt (BennyOtt) joins
17:02:26BennyOtt quits [Remote host closed the connection]
17:03:38NeonGlitch (NeonGlitch) joins
17:04:16NeonGlitch quits [Client Quit]
17:05:04BennyOtt (BennyOtt) joins
17:08:06<h2ibot>HadeanEon edited Deaths in 2000 (+383, BOT - Updating page: {{saved}} (117),…): https://wiki.archiveteam.org/?diff=55477&oldid=55430
17:08:07<h2ibot>HadeanEon edited Deaths in 2000/list (+21, BOT - Updating list): https://wiki.archiveteam.org/?diff=55478&oldid=55431
17:14:04NeonGlitch (NeonGlitch) joins
17:14:39NeonGlitch quits [Client Quit]
17:17:06riteo quits [Remote host closed the connection]
17:20:48NeonGlitch (NeonGlitch) joins
17:21:21NeonGlitch quits [Client Quit]
17:28:09<h2ibot>HadeanEon edited Deaths in 2004 (-420, BOT - Updating page: {{saved}} (6),…): https://wiki.archiveteam.org/?diff=55479&oldid=55435
17:28:10<h2ibot>HadeanEon edited Deaths in 2004/list (-33, BOT - Updating list): https://wiki.archiveteam.org/?diff=55480&oldid=55325
17:30:57ljcool2006 joins
17:51:13<h2ibot>HadeanEon edited Deaths in 2008 (+379, BOT - Updating page: {{saved}} (3),…): https://wiki.archiveteam.org/?diff=55481&oldid=55440
17:51:14<h2ibot>HadeanEon edited Deaths in 2008/list (+22, BOT - Updating list): https://wiki.archiveteam.org/?diff=55482&oldid=55125
17:51:17NeonGlitch (NeonGlitch) joins
17:54:43dabs joins
18:02:27riteo (riteo) joins
18:06:38grill (grill) joins
18:06:58grill quits [Client Quit]
18:07:12grill (grill) joins
18:11:16<h2ibot>HadeanEon edited Deaths in 2010 (-371, BOT - Updating page: {{saved}} (204),…): https://wiki.archiveteam.org/?diff=55483&oldid=55442
18:11:17<h2ibot>HadeanEon edited Deaths in 2010/list (-30, BOT - Updating list): https://wiki.archiveteam.org/?diff=55484&oldid=55443
18:14:22NeonGlitch quits [Client Quit]
18:21:18<h2ibot>HadeanEon edited Deaths in 2011 (-391, BOT - Updating page: {{saved}} (204),…): https://wiki.archiveteam.org/?diff=55485&oldid=55444
18:21:19<h2ibot>HadeanEon edited Deaths in 2011/list (-26, BOT - Updating list): https://wiki.archiveteam.org/?diff=55486&oldid=55334
18:28:24SootBector quits [Remote host closed the connection]
18:29:36SootBector (SootBector) joins
18:31:20NeonGlitch (NeonGlitch) joins
18:34:20<h2ibot>HadeanEon edited Deaths in 2012 (+442, BOT - Updating page: {{saved}} (194),…): https://wiki.archiveteam.org/?diff=55487&oldid=55445
18:34:21<h2ibot>HadeanEon edited Deaths in 2012/list (+34, BOT - Updating list): https://wiki.archiveteam.org/?diff=55488&oldid=55446
18:44:22<h2ibot>HadeanEon edited Deaths in 2013 (+375, BOT - Updating page: {{saved}} (211),…): https://wiki.archiveteam.org/?diff=55489&oldid=55447
18:44:23<h2ibot>HadeanEon edited Deaths in 2013/list (+24, BOT - Updating list): https://wiki.archiveteam.org/?diff=55490&oldid=55336
18:53:06nine joins
18:53:06nine quits [Changing host]
18:53:06nine (nine) joins
19:07:56Lunarian1 is now known as LunarianBunny1147
19:16:21grill quits [Ping timeout: 260 seconds]
19:16:27<h2ibot>HadeanEon edited Deaths in 2016/list (+44, BOT - Updating list): https://wiki.archiveteam.org/?diff=55491&oldid=55273
19:33:30NeonGlitch quits [Client Quit]
19:34:37NeonGlitch (NeonGlitch) joins
19:38:42tzt quits [Ping timeout: 258 seconds]
19:43:01tzt (tzt) joins
19:46:16makeworld3 joins
19:46:22makeworld quits [Ping timeout: 258 seconds]
19:46:22makeworld3 is now known as makeworld
19:55:29IDK (IDK) joins
19:59:39<h2ibot>HadeanEon edited Deaths in 2017 (-302, BOT - Updating page: {{saved}} (373),…): https://wiki.archiveteam.org/?diff=55492&oldid=55453
19:59:40<h2ibot>HadeanEon edited Deaths in 2017/list (-76, BOT - Updating list): https://wiki.archiveteam.org/?diff=55493&oldid=55454
20:04:34APOLLO03 joins
20:15:13NeonGlitch quits [Client Quit]
20:16:58ericgallager joins
20:28:38Island joins
20:35:33DigitalDragons quits [Read error: Connection reset by peer]
20:35:41Exorcism quits [Ping timeout: 260 seconds]
20:38:03ShakespeareFan00 joins
20:38:08<ShakespeareFan00>Hi,
20:38:34<ShakespeareFan00>Is there an effort to backup important resources from Internet Archive to other sites?
20:39:14<pokechu22>quoting from my own message, just to get things started: I'm not aware of any specific project for those (there's https://wiki.archiveteam.org/index.php/INTERNETARCHIVE.BAK but nothing ever really came of that). I don't think anything that's in public domain/digitized on behalf of the library of congress (which I believe includes Catalog of Copyright Entries) is at risk though
20:39:49<ShakespeareFan00>My specific concern was the Catalog of Copyright Entries scans ( which I've not found as scans on sites other than the Internet Archive)
20:39:57<Flashfire42>I do believe IA has out of country backups tho I am not sure how up to date they are per se
20:40:35<ShakespeareFan00>Some of the volumes were mirrored to Wikimedia Commons in 2020 (during the last scare about IA's existence), but I'm not sure of the quality.
20:41:06<ShakespeareFan00>It eventually formed the starting point for - https://commons.wikimedia.org/wiki/Commons:IA_books
20:41:25<ShakespeareFan00>It would of course be nice not to have to rely on only 2 sites hosting the volumes
20:42:02<ShakespeareFan00>@pokechu22: The resources may well be on the LOC site directly, but I hadn't found them previously, myself..
20:42:52<ShakespeareFan00>I could not add a project suggestion, to the Wiki, because of a disagreement or typo I made over a decade ago, meaning I am still blocked on the wiki.
20:44:36<ShakespeareFan00>It would of course be appreciated if any Archiveteam Warriors (official or unofficial) started mirroring Federal works and those in the public domain (by expiry or non-renewal) to sites other than Internet Archive. One dedicated contributor to Wikimedia Commons was doing this in 2020-21, but single-handedly 😭
20:44:46<pokechu22>I think some are on google books as well, but google books scans aren't great
20:44:52<ShakespeareFan00>Quite.
20:45:22<ShakespeareFan00>I cannot use the wiki, but I would support archival of PDF or DJVU of scanned books to Wikimedia Commons ...
20:47:53<ShakespeareFan00>Commons or Wikisource can use either format. The works don't have to be in English, even. The only requirement is that they are Federal (US) or in the public domain (by expiry, no-notice etc.). And full academic works under licenses such as Creative Commons Share Alike and CC-BY are also appreciated on Commons, generally..
20:49:47<ShakespeareFan00>I am surprised not to have seen a more active effort to mirror works from IA to Wikimedia Commons.. Perhaps the Commons project page I linked can be revived if people start mirroring old works.. Finds from IA now on Wikisource have included 18th century Statute Collections, obscure texts on Color, and a US Federal Work giving a set of vector
20:49:47<ShakespeareFan00>fonts, as well as 19th century and early 20th century works of fiction. ...
20:50:02<ShakespeareFan00>too numerous to mention ..
20:50:27<ShakespeareFan00>PLEASE upload the Public domain to Wikimedia Commons.. Please!
20:51:10<ShakespeareFan00>I am of course not aware of any other 'asteroid impact' backups at the moment for public domain resources on IA.
20:52:47<ShakespeareFan00>pokechu22 : Well I've said my 2 cents.. I look forward to seeing plenty of Public Domain works appearing on Wikimedia Commons in the next few weeks.
20:53:08<ShakespeareFan00>That includes higher quality versions of scans Wikimedia Commons already has. :)
20:53:22<ShakespeareFan00>Apologies for the wall of text.
20:53:54<pokechu22>I don't have much experience uploading to commons, but I don't see anything stopping someone from uploading it, other than needing to sort through and catalog them
20:54:34<pokechu22>It does look like a lot of them are listed on https://en.wikisource.org/wiki/Catalog_of_Copyright_Entries
20:54:42<ShakespeareFan00>At present, the focus is on getting stuff uploaded. I've found on the whole that once it's on the site, it gets categorised very quickly..
20:55:19<ShakespeareFan00>See the project link I gave as well.. I won't mention it a second time, unless asked directly.
20:55:55<pokechu22>What stops you from uploading them? Seems like https://ia-upload.wmcloud.org/ exists, at least
20:56:24<ShakespeareFan00>I will also note here that English Wikisource ALWAYS needs transcribers. - https://en.wikisource.org/wiki/Wikisource:About
20:57:08<ShakespeareFan00>pokechu22 : Limited bandwidth.. But I see your sentiment :)
20:57:43<ShakespeareFan00>I'm also behind a firewall on my system.
20:58:15<ShakespeareFan00>When I can, I've certainly used the tool you just linked.
20:58:37<ShakespeareFan00>However, if 100 people were moving important works..
20:59:17<pokechu22>Ah, that makes a bit more sense at least
20:59:43<@JAA>That tool seems to transfer things directly, maybe?
20:59:50<ShakespeareFan00>Nothing stops anyone from using the tool for as many public domain works as possible :) (Generally public domain stuff doesn't get removed on Commons, unless Commons already has it, or an item proves to be in copyright, which is rare.)
21:00:44<ShakespeareFan00>@JAA: yes, and it can also make a DJVU directly from the JP2 scans at IA, if a DJVU/PDF doesn't yet exist or is low quality.. (I've used that option once or twice myself.)
21:01:21<ShakespeareFan00>https://en.wikisource.org/wiki/Help:Internet_Archive
21:01:55<ShakespeareFan00>Mirroring of resources, as I said, was being undertaken in 2020-21 but stalled.
21:02:16<ShakespeareFan00>With enough 'warriors' though..
21:03:24<@JAA>Yeah, with enough devs, our software would be in a better state, too. And with more resources, we could archive more of the stuff being lit on fire every day.
21:04:23<ShakespeareFan00>I also encourage people to check out Wikisource.. It's kind of like Distributed Proofreaders, but to me a lot more friendly ;)..
21:04:39<@JAA>IA does have a second location in Canada. I don't know what fraction of the data has been migrated there yet though.
21:05:00<ShakespeareFan00>@JAA : If only, If only (re never having the resources) 😁
21:05:16<@JAA>Personally, I don't see IA in immediate danger, and so I'll rather focus on archiving things that are.
21:05:33<@JAA>More copies are always good though.
21:05:37<steering>they'll probably both get nuked when WW3 kicks off anyway.
21:05:38<steering>:P
21:06:04<pokechu22>Wikisource is nice and I did a fair bit of contribution to it in the past; I've just been focused more on things that I have unique skills for more recently
21:07:01<ShakespeareFan00>IA is at risk as I see it (You're aware of the dispute over old sound recordings? - 700 million isn't damages, it's a "ruin-em" approach. And IA closing wouldn't just remove the disputed sound recordings.)
21:07:40<ShakespeareFan00>https://blog.archive.org/2025/04/17/take-action-defend-the-internet-archive/
21:08:14<ShakespeareFan00>That's why "asteroid-mitigation" measures became advisable...
21:08:21<ShakespeareFan00>🤣
21:09:19<ShakespeareFan00>As I said I strongly suggest mass mirroring of the public domain to Wikimedia Commons :)
21:09:32<ShakespeareFan00>(or other platforms as well..)
21:10:50<ShakespeareFan00>BTW My current handle is not necessarily the one I've used on Commons and Wikisource ...
21:13:01etnguyen03 (etnguyen03) joins
21:13:10NeonGlitch (NeonGlitch) joins
21:13:40NeonGlitch quits [Client Quit]
21:17:01<ShakespeareFan00>This IRC is logged, so hopefully people see my 2 cents and act accordingly :)
21:19:29<ShakespeareFan00>Also, despite the typo or issue that got me blocked on the wiki, I have undertaken some archival efforts of my own.. Thanks to using a specific tool suggested here, the entirety of 8bs.com got backed up to Wayback.. including entire scan runs of publications that the Internet Archive did not have!
21:19:39<ShakespeareFan00>(Mostly Acorn computer related)
21:22:10<ShakespeareFan00>I have to go, but PLEASE keep archiving , especially the public domain :)
21:22:14ShakespeareFan00 quits [Client Quit]
21:23:52<h2ibot>HadeanEon edited Deaths in 2019 (+275, BOT - Updating page: {{saved}} (491),…): https://wiki.archiveteam.org/?diff=55494&oldid=55455
21:23:53<h2ibot>HadeanEon edited Deaths in 2019/list (+35, BOT - Updating list): https://wiki.archiveteam.org/?diff=55495&oldid=55279
21:26:55<@JAA><aatt.png>
21:34:07etnguyen03 quits [Client Quit]
21:55:24Webuser862912 joins
22:05:30<szczot3k|m>moin
22:11:33NeonGlitch (NeonGlitch) joins
22:12:03NeonGlitch quits [Client Quit]
22:14:00<h2ibot>HadeanEon edited Deaths in 2020/list (+0, BOT - Updating list): https://wiki.archiveteam.org/?diff=55496&oldid=55340
22:15:02etnguyen03 (etnguyen03) joins
22:42:01DogsRNice joins
22:42:49Webuser529564 quits [Quit: Ooops, wrong browser tab.]
22:43:16Megame quits [Quit: Leaving]
22:45:08etnguyen03 quits [Client Quit]
22:54:07<h2ibot>HadeanEon edited Deaths in 2021 (+318, BOT - Updating page: {{saved}} (53),…): https://wiki.archiveteam.org/?diff=55497&oldid=55457
22:54:08<h2ibot>HadeanEon edited Deaths in 2021/list (+21, BOT - Updating list): https://wiki.archiveteam.org/?diff=55498&oldid=55342
23:10:14ericgallager quits [Client Quit]
23:16:42Bleo18260072271962345 quits [Quit: Ping timeout (120 seconds)]
23:16:54kiska52 quits [Quit: Ping timeout (120 seconds)]
23:16:56tek_dmn quits [Quit: ZNC - https://znc.in]
23:17:06Ryz quits [Quit: Ping timeout (120 seconds)]
23:17:12kiska52 joins
23:17:17Bleo18260072271962345 joins
23:18:09Ryz (Ryz) joins
23:18:14tek_dmn (tek_dmn) joins
23:27:30Webuser862912 quits [Client Quit]