00:01:28 | | hexa_ is now known as hexa |
00:04:22 | <Medaka> | Thanks a lot! |
00:04:43 | | ericgallager joins |
00:06:37 | | dabs joins |
00:08:29 | <Medaka> | I randomly selected about 30 URLs from piyo.fc2.com and tried archiving them with the Wayback Machine; all of them were archived without any issues. |
00:09:36 | <pabs> | ok https://piyo.fc2.com/20070618/ worked on tantalus, nice |
00:10:03 | <pabs> | in Australia https://piyo.fc2.com/20070618/ gives me a 403 though |
00:12:33 | <pabs> | Medaka: old-fc2web-urls.txt is now running, and getting 403s sometimes |
00:12:50 | <pabs> | different 403s to the ones from piyo though |
00:14:22 | <pabs> | started !a < piyo_fc2_urls.txt -u firefox -p tantalus |
00:15:19 | <pokechu22> | http://dorops.fc2web.com/ redirects to http://error.fc2.com/web/403.html |
00:15:37 | <pabs> | and the piyo ones for me redirect to https://error.fc2.com/other/forbidden.html |
00:16:39 | <pabs> | I'm guessing the 403.html ones are due to a missing index.html or similar |
00:16:59 | <pabs> | and the forbidden.html ones are geo restrictions |
00:17:37 | | etnguyen03 quits [Client Quit] |
00:17:48 | | NeonGlitch quits [Client Quit] |
00:17:49 | <pabs> | Medaka: piyo_fc2_urls.txt is now running, no 403s thus far |
00:19:22 | <Medaka> | pokechu22: Regarding http://dorops.fc2web.com/, I’m also getting redirected to a 403 error page. I believe the page was likely deleted. |
00:19:52 | <pabs> | you can watch the jobs here btw http://archivebot.com/ |
00:20:09 | <pokechu22> | Yeah, that's my guess too. Nothing shows up for duckduckgo or google, and web.archive.org has no captures of anything on there |
00:20:26 | <pokechu22> | nothing on search.yahoo.co.jp either |
00:21:01 | <pabs> | ah, I probably should have done the piyo.fc2.com ones as https://piyo.fc2.com/20070618 instead of https://piyo.fc2.com/20070618/ because of the no-parent thing |
00:21:17 | <pabs> | just in case sites link to unknown sites |
00:21:26 | | pabs abort/restart |
00:21:46 | <pokechu22> | Yeah, that's probably worth doing |
00:21:57 | <pokechu22> | there's also the sitemap trick but if they're all just dates like that it's not necessary |
00:22:32 | <pabs> | they aren't dates, just usernames it seems https://piyo.fc2.com/afesacvu/ |
00:23:33 | <pabs> | Medaka: can you tell if https://piyo.fc2.com/afesacvu is identical to https://piyo.fc2.com/afesacvu/ ? or is there a redirect or? |
00:24:04 | <pokechu22> | oh, I can load it |
00:24:17 | <pabs> | hmm, not a redirect in AB |
00:24:17 | <pokechu22> | Identical, but no redirect |
00:24:58 | <pabs> | hmm, wonder if it would be problematic to have urls without / in the WBM |
00:25:04 | <pokechu22> | but `curl https://piyo.fc2.com/afesacvu/ | sha1sum` and `curl https://piyo.fc2.com/afesacvu | sha1sum` both give the same hash (ced54fe06750c1d5fc04d9f7516df05e30778289) |
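The hash comparison above can be sketched in Python as follows. This is only an illustration, not what was run in the channel: `fetch_body`, `sha1_hex` and `same_content` are hypothetical helper names, and because `urlopen` follows redirects silently, equal hashes here say nothing about whether a redirect happened.

```python
import hashlib
from urllib.request import urlopen


def fetch_body(url: str) -> bytes:
    """Download the response body of a URL (redirects are followed)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of some bytes as a hex string, like sha1sum."""
    return hashlib.sha1(data).hexdigest()


def same_content(url_a: str, url_b: str, fetch=fetch_body) -> bool:
    """True when both URLs return byte-identical bodies.

    `fetch` is injectable so the comparison can be tried without network access.
    """
    return sha1_hex(fetch(url_a)) == sha1_hex(fetch(url_b))


if __name__ == "__main__":
    # Assumed example pair from the discussion: with and without trailing slash.
    print(same_content("https://piyo.fc2.com/afesacvu/",
                       "https://piyo.fc2.com/afesacvu"))
```

Pages that embed a timestamp or a per-request token would hash differently even when they are the same page, so a mismatch needs a manual look before drawing conclusions.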
00:25:35 | <pabs> | is there an index on https://piyo.fc2.com/ or something? |
00:25:43 | <Medaka> | pabs: Thanks for the archivebot link! |
00:25:51 | <pokechu22> | Both pages link to the slash version |
00:26:06 | <pabs> | so I guess we need the sitemap trick? |
00:26:21 | <Medaka> | Both https://piyo.fc2.com/afesacvu/ and https://piyo.fc2.com/afesacvu point to the same page, and there is no redirect between them. |
00:26:22 | <pokechu22> | https://piyo.fc2.com/contents/search/?usersearch=&area%5B%5D=0&ca%5B%5D=0&action=search&mode=composition is a thing |
00:26:23 | | NeonGlitch (NeonGlitch) joins |
00:26:23 | | pabs not familiar with it |
00:27:01 | <pokechu22> | uh, they might actually have a sitemap already |
00:27:26 | <pokechu22> | http://piyo.fc2.com/sitemap_index.xml has <loc>//piyo.fc2.com/sitemap_1.xml</loc> which has entries like <loc>//piyo.fc2.com/sours/</loc> |
00:27:45 | <pabs> | hmm, will that work without http: ? |
00:28:02 | <pokechu22> | I'm not sure if archivebot will accept // links like that in sitemaps (those are standard for <a href="//"> though), but it's probably best to try |
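Since it is unclear whether a crawler will accept scheme-less `//host/path` entries, one workaround is to resolve them to absolute URLs before using them. A minimal sketch, assuming the sitemap has already been downloaded as text; `extract_locs` is a hypothetical helper name, and it matches `<loc>` elements whether or not the file declares the sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def extract_locs(sitemap_xml, base_url: str) -> list:
    """Return every <loc> value as an absolute URL.

    Protocol-relative values such as //piyo.fc2.com/sours/ inherit the
    scheme of `base_url` (the URL the sitemap itself was fetched from).
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}loc"; plain ones are just "loc".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(urljoin(base_url, element.text.strip()))
    return urls


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>//piyo.fc2.com/sours/</loc></url>"
        "</urlset>"
    )
    for url in extract_locs(sample, "https://piyo.fc2.com/sitemap_1.xml"):
        print(url)
```

The same function works on a sitemap index, where the `<loc>` values point at further sitemap files instead of pages.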
00:28:20 | <pabs> | Medaka: can you compare the sitemaps to your list and see if yours has any more than theirs? |
00:28:44 | <pabs> | in the meantime lets start one based on the sitemap |
00:29:09 | <Medaka> | pokechu22: Yes, I created the list of piyo.fc2.com URLs from there. https://piyo.fc2.com/contents/search/?usersearch=&area%5B%5D=0&ca%5B%5D=0&action=search&mode=composition |
00:29:28 | <pokechu22> | 75210 entries in the sitemap |
00:31:17 | <pabs> | a naive job for http://piyo.fc2.com/ didn't find the sitemap :/ |
00:31:23 | <pabs> | trying again with the right URL |
00:31:46 | <pokechu22> | It's listed in robots.txt |
00:31:56 | <pokechu22> | but probably better to just start it with the sitemap directly |
00:33:16 | <pokechu22> | hmm, there are a few users in the list but not in the sitemap, e.g. https://piyo.fc2.com/kukuku/ |
00:33:17 | <pabs> | ok, AB does *not* like the sitemap URLs :/ |
00:33:34 | <Medaka> | pabs: Got it, I'll give it a try. |
00:34:00 | <pabs> | Medaka: we are going to need a new list combining your stuff and the sitemap |
00:34:11 | <pabs> | and we are going to need pokechu22's custom sitemap trick |
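Combining a hand-made URL list with the sitemap entries could look roughly like the sketch below. It assumes that scheme (`http`, `https` or none) and a trailing slash do not distinguish users, which matches the observations above but is still an assumption; `user_key`, `missing_from_sitemap` and `combine` are hypothetical names.

```python
from urllib.parse import urlsplit


def user_key(url: str) -> str:
    """Reduce a URL to host + path with a trailing slash, ignoring the scheme."""
    parts = urlsplit(url.strip())
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return parts.netloc.lower() + path


def missing_from_sitemap(list_urls, sitemap_urls):
    """URLs from the manual list that have no counterpart in the sitemap."""
    known = {user_key(url) for url in sitemap_urls}
    return [url for url in list_urls if user_key(url) not in known]


def combine(list_urls, sitemap_urls):
    """Union of both sources, keeping the first spelling seen for each key."""
    seen = set()
    merged = []
    for url in list(sitemap_urls) + list(list_urls):
        key = user_key(url)
        if key not in seen:
            seen.add(key)
            merged.append(url)
    return merged


if __name__ == "__main__":
    manual = ["https://piyo.fc2.com/kukuku/", "https://piyo.fc2.com/sours"]
    sitemap = ["//piyo.fc2.com/sours/"]
    print(missing_from_sitemap(manual, sitemap))
    print(combine(manual, sitemap))
```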
00:34:17 | <pokechu22> | If they already have a sitemap it's pretty easy to create new sitemaps just by editing theirs |
00:34:39 | <pabs> | looks like they have a whole bunch of sitemaps |
00:35:27 | <pabs> | https://transfer.archivete.am/SkDqU/piyo.f2c.com-sitemaps.txt |
00:35:28 | <eggdrop> | inline (for browser viewing): https://transfer.archivete.am/inline/SkDqU/piyo.f2c.com-sitemaps.txt |
00:35:58 | <pokechu22> | oh, and sitemap_mob is links like http://piyo.fc2.com/m/kars3zjt/ and http://piyo.fc2.com/m/idakoeru/ which don't seem obviously useful to me. Medaka, can you tell what those are supposed to do? |
00:36:52 | <pabs> | I think we should include http://piyo.fc2.com/ in the job too, since it finds a bunch of category/search stuff, which might be useful for navigation in the WBM |
00:38:07 | <pabs> | whats the content of the /m/ URLs compared to the non-/m/ equivalent? |
00:39:29 | <pokechu22> | Sitemaps are the same users in the same order, just with /m/ added |
00:39:45 | <Medaka> | pokechu22: They are likely meant to serve a role similar to a Twitter timeline, where the posts of many users are displayed in chronological order. |
00:39:57 | <pokechu22> | The actual page shows http://piyo.fc2.com/start/1/ |
00:41:16 | <pabs> | also, lets include that search URL above in the custom sitemap? |
00:42:06 | <pokechu22> | Yeah, I can do that (and even include all pages in it) |
00:42:42 | <pabs> | would have assumed just the first page would be enough, others would be discovered? |
00:42:45 | <pabs> | but ack |
00:44:05 | <pokechu22> | Yeah, but might as well put them all in at once so we don't have to worry about it discovering pages at the end of the job |
00:44:24 | <pabs> | ah good point |
00:44:28 | <pokechu22> | oh, http://piyo.fc2.com/m/kars3zjt/ matches https://piyo.fc2.com/ exactly |
00:45:37 | <pabs> | Google does not know about http://piyo.fc2.com/m/ but does have http://piyo.fc2.com/ URLs, so /m/ is probably not important |
00:46:09 | <pabs> | for both site:http://piyo.fc2.com/m/ and "http://piyo.fc2.com/m/" |
00:47:11 | <pabs> | btw pokechu22, would be great to get a writeup of that custom sitemap trick :) |
00:47:58 | | dabs quits [Read error: Connection reset by peer] |
00:51:13 | <pokechu22> | https://transfer.archivete.am/inline/eGpPV/piyo.fc2.com_seed_urls.txt - all there really is to the trick is to create an XML sitemap, and do an !a < list job with that in your list. You could just as easily create an HTML page with links but sitemaps are a bit cleaner |
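As described above, the trick amounts to producing an XML sitemap and listing its URL in the file given to an `!a <` job. A minimal sketch of the generation step, assuming a plain list of absolute URLs as input; `build_sitemap` is a hypothetical name and this is not pokechu22's actual script. Uploading the result and adding its URL to the job list remain manual steps.

```python
from xml.sax.saxutils import escape


def build_sitemap(urls) -> str:
    """Render a minimal sitemaps.org <urlset> document for the given URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        # escape() turns & into &amp; so query strings stay valid XML.
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_sitemap([
        "https://piyo.fc2.com/sours/",
        "https://piyo.fc2.com/contents/search/?action=search&mode=composition",
    ]))
```

An HTML page of links would serve the same purpose, as noted above; the sitemap form just keeps the seed document free of anything but URLs.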
00:53:37 | <pabs> | so the first URL sets the parent for the job? |
00:54:26 | <pabs> | going to add this to the wiki |
00:54:50 | <pokechu22> | I think the sitemap is treated as the parent, but because https://piyo.fc2.com/ is also in the list, it's willing to recurse over that site as well (even if it's not the parent). My understanding (untested though) is that if you didn't have https://piyo.fc2.com/ in the list, then it'd save one level of outlinks (still avoiding --no-parent issues for redirects) but not recurse further |
00:55:40 | <pabs> | hmm, but if the sitemap is the parent, then https://transfer.archivete.am/ becomes the parent? |
00:56:03 | <pabs> | hmm ok |
00:56:43 | <pokechu22> | Yeah, https://transfer.archivete.am/ is treated as the parent. So things wouldn't behave nicely if you had https://transfer.archivete.am/ URLs in your sitemap (or, I guess, if the targeted site links to https://transfer.archivete.am/) but neither of those is likely |
00:57:11 | <pokechu22> | (note that when I generate a list, I put the python script in the list directly, not in the generated sitemap) |
00:58:23 | <pabs> | ah, you have a script for this? |
00:59:12 | <pokechu22> | Not for generic sitemaps, but I did generate sitemaps for paper.comac.cc and ipaper.comac.cc |
00:59:41 | | bilboed0 quits [Ping timeout: 260 seconds] |
00:59:41 | <pokechu22> | because those were awkward combinations of XML+javascript-based calendars and HTML articles |
01:02:32 | | Wohlstand (Wohlstand) joins |
01:03:07 | <pabs> | whoa, are all these using the trick? https://archive.fart.website/archivebot/viewer/?q=_seed_urls.txt |
01:04:30 | <pokechu22> | Some are, some are just lists of subdomains (though I've been naming those _subdomains.txt instead lately), some are just URLs not containing slashes |
01:07:45 | <pabs> | so with the trick, will it recurse on http://piyo.fc2.com/ urls that aren't in the custom sitemaps? |
01:08:45 | <pokechu22> | Yes |
01:12:27 | <h2ibot> | PaulWise edited ArchiveBot (+1016, document the custom sitemap trick by pokechu22): https://wiki.archiveteam.org/?diff=55473&oldid=55077 |
01:12:38 | <pabs> | pokechu22++ |
01:12:39 | <pabs> | pokechu22++ |
01:12:39 | <eggdrop> | [karma] 'pokechu22' now has 138 karma! |
01:12:41 | <eggdrop> | [karma] 'pokechu22' now has 139 karma! |
01:17:37 | | NeonGlitch quits [Client Quit] |
01:33:31 | | Megame quits [Quit: Leaving] |
01:40:02 | <Medaka> | I'm watching the crawl progress. Thank you so much! |
01:41:52 | | ericgallager quits [Client Quit] |
01:43:42 | | Medaka quits [Quit: Leaving] |
01:58:36 | | emphatic quits [Ping timeout: 260 seconds] |
02:50:53 | | NeonGlitch (NeonGlitch) joins |
02:51:25 | | NeonGlitch quits [Client Quit] |
02:53:22 | | NeonGlitch (NeonGlitch) joins |
02:53:48 | | emphatic joins |
03:04:18 | <pabs> | !tell Medaka hmm are URLs like https://pr.fc2.com/lengthyst/ on your radar? |
03:04:18 | <eggdrop> | [tell] ok, I'll tell Medaka when they join next |
03:07:37 | <pabs> | !tell Medaka where did the URLs in old-fc2web-urls.txt come from btw? |
03:07:38 | <eggdrop> | [tell] ok, I'll tell Medaka when they join next |
03:14:30 | <pabs> | hmm, lots of other domains popping up too |
03:48:03 | | Wohlstand quits [Remote host closed the connection] |
04:08:48 | | NeonGlitch quits [Client Quit] |
04:17:48 | | eroc19906 (eroc1990) joins |
04:19:11 | | eroc1990 quits [Ping timeout: 260 seconds] |
04:25:50 | | HP_Archivist quits [Quit: Leaving] |
04:27:50 | | HP_Archivist (HP_Archivist) joins |
04:28:17 | | HP_Archivist quits [Remote host closed the connection] |
04:28:44 | | HP_Archivist (HP_Archivist) joins |
04:30:03 | | HP_Archivist quits [Client Quit] |
04:30:22 | | HP_Archivist (HP_Archivist) joins |
05:06:13 | | NeonGlitch (NeonGlitch) joins |
05:06:40 | | ericgallager joins |
05:13:59 | | DogsRNice_ quits [Read error: Connection reset by peer] |
05:26:55 | | NeonGlitch quits [Client Quit] |
05:32:06 | | nicolas17 quits [Ping timeout: 260 seconds] |
06:04:49 | <BlankEclair> | envs.net running out of funds: https://catgirl.center/notes/a6yrph70akwz2ie4 |
06:06:13 | | ericgallager quits [Client Quit] |
06:08:20 | | lennier2 quits [Ping timeout: 258 seconds] |
06:08:31 | | lennier2 joins |
06:20:55 | | Webuser608106 joins |
06:24:23 | | Webuser608106 quits [Client Quit] |
06:30:57 | | pokechu22 quits [Ping timeout: 258 seconds] |
06:33:05 | | pokechu22 (pokechu22) joins |
06:56:41 | | G4te_Keep3r34924156 quits [Ping timeout: 260 seconds] |
07:18:59 | | Island quits [Read error: Connection reset by peer] |
07:42:49 | <@arkiver> | is fc2web being handled through AB? |
07:43:09 | <@arkiver> | i've been a bit less around this week due to a vacation, should be around more again next week and fully the week after |
07:49:34 | | BornOn420 (BornOn420) joins |
08:07:43 | <pabs> | there are two AB jobs running for a couple of subdomains, but looking at the log so far there are other subdomains that may have been missed |
08:25:57 | | leo60228- quits [Ping timeout: 258 seconds] |
08:26:20 | | leo60228 (leo60228) joins |
08:30:02 | | lys8 joins |
08:30:36 | | lys quits [Read error: Connection reset by peer] |
08:30:37 | | lys8 is now known as lys |
08:51:59 | | G4te_Keep3r34924156 joins |
09:16:45 | | midou quits [Remote host closed the connection] |
09:17:02 | | midou joins |
09:17:55 | | ducky (ducky) joins |
09:40:39 | | pedantic-darwin joins |
09:42:12 | | sec^nd quits [Remote host closed the connection] |
09:42:39 | | sec^nd (second) joins |
10:01:57 | | Lunarian1 (LunarianBunny1147) joins |
10:05:06 | | LunarianBunny1147 quits [Ping timeout: 260 seconds] |
10:37:06 | <chrismrtn> | Is there a working method for getting info on a set of Twitter user IDs? I used to use the `UsersByRestIds` endpoint (as did snscrape), but that started returning 404 as of a few days ago... |
11:00:04 | | Bleo18260072271962345 quits [Quit: The Lounge - https://thelounge.chat] |
11:02:47 | | Bleo18260072271962345 joins |
11:07:23 | <chrismrtn> | Ah, it looks like it still works, but may need some special header in the request. |
11:07:44 | | cyanbox joins |
11:17:56 | <h2ibot> | Himond000 edited Deathwatch (+152, /* 2025 */ add ojiji.net): https://wiki.archiveteam.org/?diff=55475&oldid=55472 |
11:23:08 | | ericgallager joins |
11:43:41 | | nine quits [Ping timeout: 260 seconds] |
11:45:37 | | nine joins |
11:45:37 | | nine is now authenticated as nine |
11:45:37 | | nine quits [Changing host] |
11:45:37 | | nine (nine) joins |
11:46:37 | | nine quits [Client Quit] |
11:51:06 | | ericgallager quits [Read error: Connection reset by peer] |
11:51:24 | | ericgallager joins |
12:02:28 | | ericgallager quits [Client Quit] |
12:34:26 | | Webuser586003 joins |
12:34:55 | | Webuser586003 quits [Client Quit] |
12:35:25 | | Webuser516190 joins |
12:35:39 | | Webuser516190 quits [Client Quit] |
12:36:14 | | Webuser055756 joins |
12:36:35 | | Webuser055756 quits [Client Quit] |
13:29:25 | <jacksonchen666> | i know AT≠IA but i'm looking to upload my ~440+ GB of dead/unlisted youtube videos to IA (structured as channelID/videoID/ with .info.json, hopefully), could i get pointers/help with metadata for IA, collection choice and uploading method? |
13:34:09 | | FiTheArchiver joins |
13:35:55 | | Wohlstand (Wohlstand) joins |
13:53:03 | | NeonGlitch (NeonGlitch) joins |
14:10:30 | | ducky quits [Read error: Connection reset by peer] |
14:10:35 | | ducky (ducky) joins |
14:15:56 | | FiTheArchiver quits [Client Quit] |
14:33:06 | | Cronfox quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
14:33:20 | | Cronfox (Cronfox) joins |
14:45:49 | <Hans5958> | <jacksonchen666> "i know AT≠IA but i'm looking..." <- Maybe you can follow how TubeUp is doing: https://archive.org/details/youtube-2KttA3p799A. The identifier (youtube-2KttA3p799A) has a format that Wayback Machine can use, and other metadata is included in the description and other metadata stuff |
14:46:08 | <Hans5958> | That's AFAIK; maybe someone can fill in the details, especially for such a volume |
14:54:39 | | nicolas17 joins |
14:58:22 | <@JAA> | I'm pretty sure that is not integrated into the Wayback Machine. |
14:58:38 | <@JAA> | (I'd hope it isn't.) |
14:58:45 | <h2ibot> | Hans5958 edited Twitch.tv (+6): https://wiki.archiveteam.org/?diff=55476&oldid=55393 |
14:58:59 | <@JAA> | But yes, following that format is probably not a bad idea. |
15:02:16 | | cyanbox quits [Read error: Connection reset by peer] |
15:11:11 | <Hans5958> | <JAA> "I'm pretty sure that is not..." <- Ah, I must've either confused things or remembered wrong. Well, at least with the format it can be found relatively easily (e.g. https://findyoutubevideo.thetechrobo.ca/) |
15:12:48 | <Hans5958> | (ah, I often forget that I can't turn off mentions when using the reply feature on Matrix; Discord habits, apologies for the ping) |
15:36:21 | | Lambro_D joins |
15:40:19 | | nine joins |
15:40:19 | | nine is now authenticated as nine |
15:40:19 | | nine quits [Changing host] |
15:40:19 | | nine (nine) joins |
15:47:13 | | nine quits [Client Quit] |
15:50:18 | <TheTechRobo> | Yes, that is the most typical format I see. I also see youtube_CHANNELID items sometimes. If you do the latter (or some other method), please let me know so I can index them :-) |
15:57:02 | | NeonGlitch quits [Client Quit] |
16:02:38 | | NeonGlitch (NeonGlitch) joins |
16:02:39 | | nine joins |
16:02:39 | | nine is now authenticated as nine |
16:02:39 | | nine quits [Changing host] |
16:02:39 | | nine (nine) joins |
16:03:10 | | NeonGlitch quits [Client Quit] |
16:03:37 | | nine quits [Client Quit] |
16:05:07 | | NeonGlitch (NeonGlitch) joins |
16:05:46 | | NeonGlitch quits [Client Quit] |
16:10:12 | <steering> | TheTechRobo++ |
16:10:13 | <eggdrop> | [karma] 'TheTechRobo' now has 14 karma! |
16:13:56 | | NeonGlitch (NeonGlitch) joins |
16:14:44 | | Megame (Megame) joins |
16:57:56 | | BennyOtt quits [Quit: ZNC 1.9.1 - https://znc.in] |
16:59:06 | | NeonGlitch quits [Client Quit] |
16:59:22 | | BennyOtt (BennyOtt) joins |
17:02:26 | | BennyOtt quits [Remote host closed the connection] |
17:03:38 | | NeonGlitch (NeonGlitch) joins |
17:04:16 | | NeonGlitch quits [Client Quit] |
17:05:04 | | BennyOtt (BennyOtt) joins |
17:08:06 | <h2ibot> | HadeanEon edited Deaths in 2000 (+383, BOT - Updating page: {{saved}} (117),…): https://wiki.archiveteam.org/?diff=55477&oldid=55430 |
17:08:07 | <h2ibot> | HadeanEon edited Deaths in 2000/list (+21, BOT - Updating list): https://wiki.archiveteam.org/?diff=55478&oldid=55431 |
17:14:04 | | NeonGlitch (NeonGlitch) joins |
17:14:39 | | NeonGlitch quits [Client Quit] |
17:17:06 | | riteo quits [Remote host closed the connection] |
17:20:48 | | NeonGlitch (NeonGlitch) joins |
17:21:21 | | NeonGlitch quits [Client Quit] |
17:28:09 | <h2ibot> | HadeanEon edited Deaths in 2004 (-420, BOT - Updating page: {{saved}} (6),…): https://wiki.archiveteam.org/?diff=55479&oldid=55435 |
17:28:10 | <h2ibot> | HadeanEon edited Deaths in 2004/list (-33, BOT - Updating list): https://wiki.archiveteam.org/?diff=55480&oldid=55325 |
17:30:57 | | ljcool2006 joins |
17:51:13 | <h2ibot> | HadeanEon edited Deaths in 2008 (+379, BOT - Updating page: {{saved}} (3),…): https://wiki.archiveteam.org/?diff=55481&oldid=55440 |
17:51:14 | <h2ibot> | HadeanEon edited Deaths in 2008/list (+22, BOT - Updating list): https://wiki.archiveteam.org/?diff=55482&oldid=55125 |
17:51:17 | | NeonGlitch (NeonGlitch) joins |
17:54:43 | | dabs joins |
18:02:27 | | riteo (riteo) joins |
18:06:38 | | grill (grill) joins |
18:06:58 | | grill quits [Client Quit] |
18:07:12 | | grill (grill) joins |
18:11:16 | <h2ibot> | HadeanEon edited Deaths in 2010 (-371, BOT - Updating page: {{saved}} (204),…): https://wiki.archiveteam.org/?diff=55483&oldid=55442 |
18:11:17 | <h2ibot> | HadeanEon edited Deaths in 2010/list (-30, BOT - Updating list): https://wiki.archiveteam.org/?diff=55484&oldid=55443 |
18:14:22 | | NeonGlitch quits [Client Quit] |
18:21:18 | <h2ibot> | HadeanEon edited Deaths in 2011 (-391, BOT - Updating page: {{saved}} (204),…): https://wiki.archiveteam.org/?diff=55485&oldid=55444 |
18:21:19 | <h2ibot> | HadeanEon edited Deaths in 2011/list (-26, BOT - Updating list): https://wiki.archiveteam.org/?diff=55486&oldid=55334 |
18:28:24 | | SootBector quits [Remote host closed the connection] |
18:29:36 | | SootBector (SootBector) joins |
18:31:20 | | NeonGlitch (NeonGlitch) joins |
18:34:20 | <h2ibot> | HadeanEon edited Deaths in 2012 (+442, BOT - Updating page: {{saved}} (194),…): https://wiki.archiveteam.org/?diff=55487&oldid=55445 |
18:34:21 | <h2ibot> | HadeanEon edited Deaths in 2012/list (+34, BOT - Updating list): https://wiki.archiveteam.org/?diff=55488&oldid=55446 |
18:44:22 | <h2ibot> | HadeanEon edited Deaths in 2013 (+375, BOT - Updating page: {{saved}} (211),…): https://wiki.archiveteam.org/?diff=55489&oldid=55447 |
18:44:23 | <h2ibot> | HadeanEon edited Deaths in 2013/list (+24, BOT - Updating list): https://wiki.archiveteam.org/?diff=55490&oldid=55336 |
18:53:06 | | nine joins |
18:53:06 | | nine is now authenticated as nine |
18:53:06 | | nine quits [Changing host] |
18:53:06 | | nine (nine) joins |
19:07:56 | | Lunarian1 is now known as LunarianBunny1147 |
19:16:21 | | grill quits [Ping timeout: 260 seconds] |
19:16:27 | <h2ibot> | HadeanEon edited Deaths in 2016/list (+44, BOT - Updating list): https://wiki.archiveteam.org/?diff=55491&oldid=55273 |
19:33:30 | | NeonGlitch quits [Client Quit] |
19:34:37 | | NeonGlitch (NeonGlitch) joins |
19:38:42 | | tzt quits [Ping timeout: 258 seconds] |
19:43:01 | | tzt (tzt) joins |
19:46:16 | | makeworld3 joins |
19:46:22 | | makeworld quits [Ping timeout: 258 seconds] |
19:46:22 | | makeworld3 is now known as makeworld |
19:55:29 | | IDK (IDK) joins |
19:59:39 | <h2ibot> | HadeanEon edited Deaths in 2017 (-302, BOT - Updating page: {{saved}} (373),…): https://wiki.archiveteam.org/?diff=55492&oldid=55453 |
19:59:40 | <h2ibot> | HadeanEon edited Deaths in 2017/list (-76, BOT - Updating list): https://wiki.archiveteam.org/?diff=55493&oldid=55454 |
20:04:34 | | APOLLO03 joins |
20:15:13 | | NeonGlitch quits [Client Quit] |
20:16:58 | | ericgallager joins |
20:28:38 | | Island joins |
20:35:33 | | DigitalDragons quits [Read error: Connection reset by peer] |
20:35:41 | | Exorcism quits [Ping timeout: 260 seconds] |
20:38:03 | | ShakespeareFan00 joins |
20:38:08 | <ShakespeareFan00> | Hi, |
20:38:34 | <ShakespeareFan00> | Is there an effort to backup important resources from Internet Archive to other sites? |
20:39:14 | <pokechu22> | quoting from my own message, just to get things started: I'm not aware of any specific project for those (there's https://wiki.archiveteam.org/index.php/INTERNETARCHIVE.BAK but nothing ever really came of that). I don't think anything that's in public domain/digitized on behalf of the library of congress (which I believe includes Catalog of Copyright Entries) is at risk though |
20:39:49 | <ShakespeareFan00> | My specific concern was the Catalog of Copyright Entries scans (which I've not found as scans on sites other than the Internet Archive) |
20:39:57 | <Flashfire42> | I do believe IA has out of country backups tho I am not sure how up to date they are per se |
20:40:35 | <ShakespeareFan00> | Some of the volumes were mirrored to Wikimedia Commons in 2020 (during the last scare about IA's existence), but I'm not sure about the quality. |
20:41:06 | <ShakespeareFan00> | It eventually formed the starting point for - https://commons.wikimedia.org/wiki/Commons:IA_books |
20:41:25 | <ShakespeareFan00> | It would of course be nice not to have to rely on only 2 sites hosting the volumes |
20:42:02 | <ShakespeareFan00> | @pokechu22: The resources may well be on the LOC site directly, but I hadn't found them previously, myself.. |
20:42:52 | <ShakespeareFan00> | I could not add a project suggestion to the Wiki, because of a disagreement or typo I made over a decade ago, meaning I am still blocked on the wiki. |
20:44:36 | <ShakespeareFan00> | It would of course be appreciated if any Archiveteam Warriors (official or unofficial) started mirroring Federal works and those in the public domain (by expiry or non-renewal) to sites other than Internet Archive. One dedicated contributor to Wikimedia Commons was doing this in 2020-21, but single-handedly 😭 |
20:44:46 | <pokechu22> | I think some are on google books as well, but google books scans aren't great |
20:44:52 | <ShakespeareFan00> | Quite. |
20:45:22 | <ShakespeareFan00> | I cannot use the wiki, but I would support archival of PDF or DJVU of scanned books to Wikimedia Commons ... |
20:47:53 | <ShakespeareFan00> | Commons or Wikisource can use either format. The works don't even have to be in English. The only requirement is that they are Federal (US), in the public domain (by expiry, no-notice etc.). And full academic works under licenses such as Creative Commons Share Alike and CC-BY are also appreciated on Commons, generally.. |
20:49:47 | <ShakespeareFan00> | I am surprised not to have seen a more active effort to mirror works from IA to Wikimedia Commons.. Perhaps the Commons project page I linked can be revived if people start mirroring old works.. Finds from IA now on Wikisource have included 18th century Statute Collections, an obscure text on Color, and a US Federal Work giving a set of vector fonts, as well as 19th century and early 20th century works of fiction. ... |
20:50:02 | <ShakespeareFan00> | too numerous to mention .. |
20:50:27 | <ShakespeareFan00> | PLEASE upload the public domain to Wikimedia Commons.. Please! |
20:51:10 | <ShakespeareFan00> | I am of course not aware of any other 'asteroid impact' backups at the moment for public domain resources on IA. |
20:52:47 | <ShakespeareFan00> | pokechu22 : Well I've said my 2 cents.. I look forward to seeing plenty of Public Domain works appearing on Wikimedia Commons in the next few weeks. |
20:53:08 | <ShakespeareFan00> | That includes higher quality versions of scans Wikimedia Commons already has. :) |
20:53:22 | <ShakespeareFan00> | Apologies for the wall of text. |
20:53:54 | <pokechu22> | I don't have much experience uploading to commons, but I don't see anything stopping someone from uploading it, other than needing to sort through and catalog them |
20:54:34 | <pokechu22> | It does look like a lot of them are listed on https://en.wikisource.org/wiki/Catalog_of_Copyright_Entries |
20:54:42 | <ShakespeareFan00> | At present, the focus is on getting stuff uploaded. I've found on the whole that once it's on the site, it gets categorised very quickly.. |
20:55:19 | <ShakespeareFan00> | See the project link I gave as well.. I won't mention it a second time, unless asked directly. |
20:55:55 | <pokechu22> | What stops you from uploading them? https://ia-upload.wmcloud.org/ seems to exist at least |
20:56:24 | <ShakespeareFan00> | I will also note here that English Wikisource ALWAYS needs transcribers. - https://en.wikisource.org/wiki/Wikisource:About |
20:57:08 | <ShakespeareFan00> | pokechu22 : Limited bandwidth.. But I see your sentiment :) |
20:57:43 | <ShakespeareFan00> | I'm also behind a firewall on my system. |
20:58:15 | <ShakespeareFan00> | When I can, I've certainly used the tool you just linked. |
20:58:37 | <ShakespeareFan00> | However, if 100 people were moving important works.. |
20:59:17 | <pokechu22> | Ah, that makes a bit more sense at least |
20:59:43 | <@JAA> | That tool seems to transfer things directly, maybe? |
20:59:50 | <ShakespeareFan00> | Nothing stops anyone from using the tool for as many public domain works as possible :) (Generally public domain stuff doesn't get removed on Commons, unless Commons already has it, or an item proves to be in copyright, which is rare.) |
21:00:44 | <ShakespeareFan00> | @JAA: yes, and it can also make a DJVU directly from the JP2 scans at IA, if a DJVU/PDF doesn't yet exist, or is low quality.. (I've used that option once or twice myself.) |
21:01:21 | <ShakespeareFan00> | https://en.wikisource.org/wiki/Help:Internet_Archive |
21:01:55 | <ShakespeareFan00> | Mirroring of resources, as I said, was being undertaken in 2020-21 but stalled. |
21:02:16 | <ShakespeareFan00> | With enough 'warriors' though.. |
21:03:24 | <@JAA> | Yeah, with enough devs, our software would be in a better state, too. And with more resources, we could archive more of the stuff being lit on fire every day. |
21:04:23 | <ShakespeareFan00> | I also encourage people to check out Wikisource.. It's kind of like Distributed Proofreaders, but to me a lot more friendly ;).. |
21:04:39 | <@JAA> | IA does have a second location in Canada. I don't know what fraction of the data has been migrated there yet though. |
21:05:00 | <ShakespeareFan00> | @JAA : If only, If only (re never having the resources) 😁 |
21:05:16 | <@JAA> | Personally, I don't see IA in immediate danger, and so I'll rather focus on archiving things that are. |
21:05:33 | <@JAA> | More copies are always good though. |
21:05:37 | <steering> | they'll probably both get nuked when WW3 kicks off anyway. |
21:05:38 | <steering> | :P |
21:06:04 | <pokechu22> | Wikisource is nice and I did a fair bit of contribution to it in the past; I just have been focused more on things that I have unique skills for more recently |
21:07:01 | <ShakespeareFan00> | IA is at risk as I see it ( You aware of the dispute over old sound recordings? - 700 million isn't damages it's a "ruin-em" approach. And IA closing wouldn't just remove the disputed sound recordings. ) |
21:07:40 | <ShakespeareFan00> | https://blog.archive.org/2025/04/17/take-action-defend-the-internet-archive/ |
21:08:14 | <ShakespeareFan00> | That's why "asteroid-mitigation" measures became advisable... |
21:08:21 | <ShakespeareFan00> | 🤣 |
21:09:19 | <ShakespeareFan00> | As I said I strongly suggest mass mirroring of the public domain to Wikimedia Commons :) |
21:09:32 | <ShakespeareFan00> | (or other platforms as well..) |
21:10:50 | <ShakespeareFan00> | BTW My current handle is not necessarily the one I've used on Commons and Wikisource ... |
21:13:01 | | etnguyen03 (etnguyen03) joins |
21:13:10 | | NeonGlitch (NeonGlitch) joins |
21:13:40 | | NeonGlitch quits [Client Quit] |
21:17:01 | <ShakespeareFan00> | This IRC is logged, so hopefully someone sees my 2 cents and acts accordingly :) |
21:19:29 | <ShakespeareFan00> | Also despite the typo or issue that got me blocked on the wiki I have undertaken some archival efforts of my own.. Thanks to using a specific tool suggested here, the entirety of 8bs.com got backed up to Wayback.. including entire scan runs of publications that the Internet Archive did not have! |
21:19:39 | <ShakespeareFan00> | (Mostly Acorn computer related) |
21:22:10 | <ShakespeareFan00> | I have to go, but PLEASE keep archiving , especially the public domain :) |
21:22:14 | | ShakespeareFan00 quits [Client Quit] |
21:23:52 | <h2ibot> | HadeanEon edited Deaths in 2019 (+275, BOT - Updating page: {{saved}} (491),…): https://wiki.archiveteam.org/?diff=55494&oldid=55455 |
21:23:53 | <h2ibot> | HadeanEon edited Deaths in 2019/list (+35, BOT - Updating list): https://wiki.archiveteam.org/?diff=55495&oldid=55279 |
21:26:55 | <@JAA> | <aatt.png> |
21:34:07 | | etnguyen03 quits [Client Quit] |
21:55:24 | | Webuser862912 joins |
22:05:30 | <szczot3k|m> | moin |
22:11:33 | | NeonGlitch (NeonGlitch) joins |
22:12:03 | | NeonGlitch quits [Client Quit] |
22:14:00 | <h2ibot> | HadeanEon edited Deaths in 2020/list (+0, BOT - Updating list): https://wiki.archiveteam.org/?diff=55496&oldid=55340 |
22:15:02 | | etnguyen03 (etnguyen03) joins |
22:42:01 | | DogsRNice joins |
22:42:49 | | Webuser529564 quits [Quit: Ooops, wrong browser tab.] |
22:43:16 | | Megame quits [Quit: Leaving] |
22:45:08 | | etnguyen03 quits [Client Quit] |
22:54:07 | <h2ibot> | HadeanEon edited Deaths in 2021 (+318, BOT - Updating page: {{saved}} (53),…): https://wiki.archiveteam.org/?diff=55497&oldid=55457 |
22:54:08 | <h2ibot> | HadeanEon edited Deaths in 2021/list (+21, BOT - Updating list): https://wiki.archiveteam.org/?diff=55498&oldid=55342 |
23:10:14 | | ericgallager quits [Client Quit] |
23:16:42 | | Bleo18260072271962345 quits [Quit: Ping timeout (120 seconds)] |
23:16:54 | | kiska52 quits [Quit: Ping timeout (120 seconds)] |
23:16:56 | | tek_dmn quits [Quit: ZNC - https://znc.in] |
23:17:06 | | Ryz quits [Quit: Ping timeout (120 seconds)] |
23:17:12 | | kiska52 joins |
23:17:17 | | Bleo18260072271962345 joins |
23:18:09 | | Ryz (Ryz) joins |
23:18:14 | | tek_dmn (tek_dmn) joins |
23:27:30 | | Webuser862912 quits [Client Quit] |