00:02:10Ruthalas quits [Ping timeout: 258 seconds]
00:05:14gayspaghetti joins
00:05:47<gayspaghetti>lol, the main #archiveteam topic
00:06:02<gayspaghetti>literally came here to share some wikia stuff >.>
00:07:51<gayspaghetti>anyways. found something out about their API! they don't restrict the limit parameter on the article listing endpoint, and it's actually pretty responsive - so running a scrape to get all article links is pretty doable
00:08:34<gayspaghetti>i've gotten about 8.7 million urls and still have ~1k urls to go - would the article url list be useful here?
00:08:49<gayspaghetti>~1k wikis*
00:09:11<Barto>at least someone reads the topic :-)
00:09:34<gayspaghetti>please, the topic is the best part of an IRC channel
00:10:29<gayspaghetti>you can feel the years of people popping up with random questions like the rings of a tree trunk
00:11:35<wizards>extremely well spoken; i laughed
00:12:46gayspaghetti bows
00:19:55TheTechRobo quits [Ping timeout: 252 seconds]
00:21:39TheTechRobo (TheTechRobo) joins
01:00:01dm4v quits [Client Quit]
01:01:35dm4v joins
01:01:38dm4v quits [Changing host]
01:01:38dm4v (dm4v) joins
01:08:09monoxane quits [Remote host closed the connection]
01:09:49BlueMaxima_ joins
01:11:59psyl quits [Remote host closed the connection]
01:12:16monoxane (monoxane) joins
01:13:51BlueMaxima quits [Ping timeout: 258 seconds]
01:20:12Hackerpcs quits [Quit: Hackerpcs]
01:23:36Hackerpcs (Hackerpcs) joins
01:25:53<gayspaghetti>aight, scrape done, 3605 wiki domains
01:26:02<gayspaghetti>14,708,113 article urls
01:34:07<gayspaghetti>alright if i post the file with 'em here? (wikia article URLs)
01:38:03<ThreeHM>If you're talking about the regular mediawiki API, we already have tools to dump wikis through their API (https://wiki.archiveteam.org/index.php/WikiTeam)
01:38:31<gayspaghetti>oh, no
01:38:36<gayspaghetti>i'm talking about fandom.com
01:38:56<gayspaghetti>/ wikia.com
01:39:41<gayspaghetti>which iirc doesn't expose the classic MW api?
01:39:49<gayspaghetti>unless i'm utterly blind and missed it
01:40:00<ThreeHM>It does (example: https://minecraft.fandom.com/api.php)
01:40:54<ThreeHM>We've dumped a few NSFW wikis with shutdown notices using the wikiteam software
01:41:06<gayspaghetti>oh, god, i'm utterly blind!
01:41:19<gayspaghetti>knew i should've gotten my glasses checked again
01:41:28<gayspaghetti>i think the /w/ prefix got me >.>
01:41:44<@JAA>A list of wikis would be great though. I'm sure we have one, but can't hurt to have another. https://transfer.archivete.am/
01:42:00<gayspaghetti>sure! i scraped them off waybackurls
01:42:33<gayspaghetti>https://transfer.archivete.am/11AIGI/wikia-domains.txt
01:42:43<ThreeHM>Yeah, it'd be nice if we could search for NSFW stuff specifically since that would likely be at risk now
01:44:44<gayspaghetti>yeah :/ if only they had an NSFW label in the API - though i guess if they did, they'd be the kind of company we wouldn't have to worry about
01:45:45<@JAA>Thanks
01:45:51<gayspaghetti>np!
01:46:43<gayspaghetti>'bout 3.6k of them in the file, a few of them don't exist anymore tho
02:02:53dm4v quits [Client Quit]
02:03:14dm4v joins
02:03:16dm4v quits [Changing host]
02:03:16dm4v (dm4v) joins
02:50:24<mgrandi>There can't be that many wikis on fandom, might be easier to do just the entire thing
02:50:53<mgrandi>I think we have 2 weeks if that twitter post of that one wiki admin is correct
02:52:12<@JAA>'385k+ communities' is what their homepage claims.
02:52:27<@JAA>And 50M+ wiki pages.
02:52:34<@JAA>Revisions might be in the billions.
02:53:06<@JAA>Not sure whether it includes the discussion stuff either.
02:53:38<mgrandi>385k communities? What the hecko
02:54:00<@JAA>There is a global sitemap here: https://community.fandom.com/wiki/Sitemap
02:54:05<mgrandi>I guess anyone can just create what they want huh
02:54:07<@JAA>No clue whether that's up-to-date and complete.
02:54:24<@JAA>Yeah, I think so.
02:54:58<mgrandi>Well, I can try and get a list of urls from that tonight, see how many show up
02:56:02<mgrandi>Is fandom a full mediawiki service? Can we access that weird XML dump special page that gets all the revisions in one API call?
02:58:02<@JAA>Many wikis actually have dumps linked on the Special:Statistics page.
02:59:02<@JAA>But Special:Export exists as well. I guess that's what you mean?
03:00:15<mgrandi>Yeah, the AT wiki page on fandom says that it creates dumps ...sometimes
03:00:17<mgrandi>"Dumps are produced only on demand, hence – normally – for a small minority of wikis (in a typical month, about 50 random wikis are dumped)."
03:01:19<@JAA>Ah, right.
03:01:25<mgrandi>And yeah, special:export is what I've used in the past: https://m.mediawiki.org/wiki/Special:Export/Help:Export
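A hedged sketch of what that Special:Export usage can look like, per the help page linked above. The parameter names (pages, history, action=submit) follow Manual:Parameters_to_Special:Export; whether a given Fandom wiki actually permits full-history export, and any size caps, are not verified here:

```python
import requests

def export_xml(wiki_base, titles):
    """POST to Special:Export and return the MediaWiki XML dump,
    requesting all revisions rather than only the current one."""
    resp = requests.post(
        f"{wiki_base}/wiki/Special:Export",
        data={
            "pages": "\n".join(titles),  # one title per line
            "history": "1",              # include full revision history
            "action": "submit",
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.text

# e.g. export_xml("https://minecraft.fandom.com", ["Creeper"])
```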
03:04:53<Jake>I took another search through their global search for 'MediaWiki:Sitenotice' and 'MediaWiki:Anonnotice'; nothing new outside of the 5 we found.
03:06:24<mgrandi>As in nsfw wikis being taken down?
03:06:51<mgrandi>Also, lol the special:export seems pretty fraught with bugs, glad wiki has regular dumps at least https://m.mediawiki.org/wiki/Manual:Parameters_to_Special:Export
03:07:07<Jake>Yes, NSFW 'notice' wise.
03:07:19gayspaghetti quits [Ping timeout: 258 seconds]
03:07:31gayspaghetti joins
03:07:51<mgrandi>Were those made by the admins of said community or admin notices from the site itself
03:08:02<mgrandi>Cause if it's per community they might not be exhaustive
03:08:06<@JAA>Taking a quick stab at the sitemap.
03:10:05fuzzy8021 quits [Killed (NickServ (GHOST command used by fuzzy802!~fuzzy8021@173-224-26-244.ptcnet.net))]
03:10:12fuzzy8021 (fuzzy8021) joins
03:10:13<Ryz>!ig b5stoyrw9yks1omuh6h53fq42 ^https?://app\.powerbi\.com/
03:10:15<Ryz>Oops
03:10:45<Jake>All 5 were from the Fandom admin.
03:13:20HP_Archivist quits [Read error: Connection reset by peer]
03:13:21gayspaghetti quits [Read error: Connection reset by peer]
03:13:39HP_Archivist (HP_Archivist) joins
03:13:43gayspaghetti joins
03:14:17<@JAA>Looks like the sitemap either immediately crumbles under load or is broken: https://community.fandom.com/wiki/Sitemap?level=2&from=birdencyclopedia&to=chickipedia
03:18:28<gayspaghetti>if you have a list of wikis
03:18:34<gayspaghetti>it's pretty easy to get articles from them
03:19:02<gayspaghetti>a list of them, that is
03:19:25<gayspaghetti>curl 'http://whatever.fandom.com/api/v1/Articles/List?limit=arbitrarily_long_number'
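A minimal sketch of the scrape gayspaghetti describes, assuming the /api/v1/Articles/List response is JSON with an "items" array of relative "url" fields plus a "basepath" field; that response shape is inferred from the chat, not verified against Fandom's docs:

```python
import requests

def list_article_urls(wiki_domain, limit=10**7):
    """Fetch all article URLs from one Fandom wiki in a single call,
    relying on the reportedly uncapped limit parameter."""
    resp = requests.get(
        f"https://{wiki_domain}/api/v1/Articles/List",
        params={"limit": limit},
        timeout=60,
    )
    resp.raise_for_status()
    data = resp.json()
    base = data.get("basepath", f"https://{wiki_domain}")
    return [base + item["url"] for item in data.get("items", [])]

# e.g. for url in list_article_urls("minecraft.fandom.com"): print(url)
```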
03:19:38<@JAA>Yeah, it's the list of wikis I'm trying to get.
03:19:43<gayspaghetti>yeahh
03:19:43<@JAA>As in, the entire 385k or whatever.
03:19:58<@JAA>But uh, surprise surprise, Fandom sucks.
03:19:59<gayspaghetti>that's the hard part - they had a wiki list api according to some docs but it seems to have been taken down
03:20:12<gayspaghetti>https://web.archive.org/web/20200809084858/https://www.wikia.com/api/v1/#!/Mercury/getWikiData_get_0
03:20:25<gayspaghetti>this is the most recent documentation i could find
03:20:32<@JAA>Well, https://community.fandom.com/wiki/Sitemap still exists, but it breaks as mentioned above.
03:23:26<gayspaghetti>oh hey
03:23:28<@JAA>I've set up a monitor, let's see if it recovers sometime or is truly broken.
03:23:31<gayspaghetti>think i found somethin' JAA
03:23:36<gayspaghetti>https://community.fandom.com/wiki/Special:NewWikis
03:24:19<@JAA>Hmm, neat.
03:24:55<@JAA>Nice find. Hold my beer, breaking that page as well. :-P
03:25:22<gayspaghetti>lmao
03:25:36<@JAA>The offsets are probably wiki IDs.
03:25:49<@JAA>Goes to 2836767 as of right now.
03:29:43<@JAA>This is very useful because it also includes the language code.
03:30:07<gayspaghetti>oh, neat
03:31:00<@JAA>Doesn't seem to be complete though.
03:31:47<@JAA>E.g. https://community.fandom.com/wiki/Special:NewWikis?start=monster+girl doesn't list https://monstergirlencyclopedia.fandom.com/pt-br/ or https://monster-girl-encyclopedia.fandom.com/ru/
03:34:02<gayspaghetti>they've done a remarkable job of taking down every wiki-list scraping method we could use
03:40:59<@JAA>Found a way to enumerate wikis through the API.
03:41:07<gayspaghetti>oh?
03:41:36<@JAA>https://www.wikia.com/api/v1/Wikis/Details?ids=2836768,2836767
03:41:44<gayspaghetti>oh
03:41:48<gayspaghetti>oh are the ids sequential
03:41:56<@JAA>Haven't tried yet how many IDs you can supply.
03:42:46<@JAA>But even if it's 10, qwarc will happily shred through 28k URLs in a few minutes if their servers can handle it.
03:42:59<gayspaghetti>monch
03:43:16<@JAA>Er, 280k, and probably about half an hour.
03:43:49<@JAA>Although I've done 2k req/s before with it, so...
03:49:05<gayspaghetti>"\__ subdomains found: 43973"
03:49:08<gayspaghetti>[sickos.jpg]
03:50:15<@JAA>The API is quite slow. Looks like you can request 250 IDs at once.
03:50:38<Jake>seems to be a theme with Fandom
03:50:48<@JAA>If you request more, you just get the reply for the first 250 IDs, no error.
03:50:52<gayspaghetti>it is a miracle this site has not fallen apart
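A sketch of the enumeration JAA describes: walking sequential wiki IDs against the Details endpoint in batches of 250, the observed per-request cap. The maximum ID comes from Special:NewWikis earlier in the log; the response shape (an "items" object keyed by ID) is an assumption:

```python
import requests

API = "https://www.wikia.com/api/v1/Wikis/Details"
MAX_ID = 2836767  # highest wiki ID observed on Special:NewWikis

def enumerate_wikis(batch=250):
    """Yield wiki detail records for every ID from 1 to MAX_ID."""
    for start in range(1, MAX_ID + 1, batch):
        ids = ",".join(str(i) for i in range(start, min(start + batch, MAX_ID + 1)))
        resp = requests.get(API, params={"ids": ids}, timeout=60)
        resp.raise_for_status()
        # assumed shape: {"items": {"<id>": {...wiki details...}, ...}}
        yield from resp.json().get("items", {}).values()
```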
04:00:30<@JAA>Special:NewWikis retrieval is a bit over half-way done. Once it finishes, I'll run the API thing.
04:10:21qw3rty_ joins
04:14:01qw3rty__ quits [Ping timeout: 258 seconds]
04:28:15<gayspaghetti>eyo
04:28:20<gayspaghetti>got another list of urls
04:29:22<gayspaghetti>again no guarantee that any of these are still valid; got these through scraping
04:29:24<gayspaghetti>https://transfer.archivete.am/4s1W9/more-fandom-wikis.txt
04:31:18<@JAA>List from Special:NewWikis: https://transfer.archivete.am/15HuJB/fandom-newwikis.zst
04:31:29<@JAA>265542 are listed there.
04:31:42<gayspaghetti>that's a fair few
04:33:27<@JAA>Fetching the API data now, but that will take a bit. Currently getting an average response time of a bit over 1 second.
04:33:41<gayspaghetti>what lmfao
04:37:58<@JAA>If you think that's bad, it was 4.5 seconds across the first 15 requests.
04:38:45<@JAA>Anyway, yeah, 3.5 hours or so.
04:49:15<Frogging101>I wouldn't put it past Wikia/Fandom to make this difficult on purpose
04:49:22<Frogging101>Their management is hostile and vindictive
05:30:58BlueMaxima_ quits [Read error: Connection reset by peer]
05:59:23starship_8601 quits [Quit: starship_8601]
06:00:00<@OrIdow6>Took me a bit to realize that that didn't mean you were getting 1 b/s
06:00:06starship_8601 (starship_8601) joins
06:02:30HP_Archivist quits [Ping timeout: 258 seconds]
06:03:33<gayspaghetti>22427 fandom.com subdomains
06:03:35<gayspaghetti>https://transfer.archivete.am/i7jIF/even-more-fandom-wikis.txt
06:04:47<gayspaghetti>and w/that i'm heading off for the night, gnight all and have fun with this trainwreck of a website/company
06:04:49gayspaghetti quits [Client Quit]
06:05:24<@JAA>Oh, heh, yeah.
06:05:29<@JAA>Not quite that bad. lol
06:07:58<@JAA>Fun: https://www.wikia.com/api/v1/Wikis/Details?ids=1342509
06:13:47<@OrIdow6>Beautiful
06:30:43pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
06:33:57pabs (pabs) joins
06:49:42<h2ibot>IDKhowToEdit edited Roblox (+171, Roblox server update): https://wiki.archiveteam.org/?diff=47772&oldid=47763
06:55:23wolfin quits [Quit: ZNC - https://znc.in]
06:56:09wolfin (wolfin) joins
08:30:38<duce1337>https://www.wsj.com/articles/tpg-backed-fandom-buys-gaming-e-commerce-platform-11614186324
08:30:54<duce1337>this might result in a lot of wikis going down imo
08:50:16jspiros quits [Client Quit]
08:55:49jspiros (jspiros) joins
09:18:58alihandro joins
09:21:21<alihandro>hi! I joined today because I learned that a forum where I get help from time to time is closing
09:21:28<alihandro>I created a wiki page for it at https://wiki.archiveteam.org/index.php?title=Win_Raid_Forum
09:23:17<alihandro>It is an old and somewhat large forum, and I can't gauge the extent of it, as it has hundreds of pages per discussion and may have images and linked files, etc.
09:23:37<alihandro>any help from the archive team would be appreciated
09:23:38<alihandro> thanks
09:28:09<@OrIdow6>Is there a notice on the wiki for people who get spam-filtered?
09:28:23<@OrIdow6>Spam-filter-queued or whatever it is
09:33:25<@OrIdow6>Anyhow, to summarize, 8k topic, 147k web forum closing "probably at the end of this year" according to JS popup, further details "will" be given at some indeterminate point in the future ^
09:33:44<@OrIdow6>Will wait till this page gets approved before adding to DW
09:33:56<alihandro>thanks
09:34:40<alihandro>sorry, didn't realize you couldn't see the page before approval, but yeah, not many details at this point anyway
09:37:46<@OrIdow6>Any info besides what someone unfamiliar with the forum can tell from the message that pops up?
09:40:21Mateon2 joins
09:40:41alihandro quits [Ping timeout: 244 seconds]
09:41:23Mateon1 quits [Ping timeout: 258 seconds]
09:41:23Mateon2 is now known as Mateon1
09:43:11alihandro joins
09:48:26alihandro quits [Ping timeout: 244 seconds]
10:12:44Terbium quits [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
10:13:08Terbium joins
10:16:01pabs quits [Ping timeout: 265 seconds]
10:38:41HackMii_ is now known as HackMii
11:18:38pabs (pabs) joins
12:22:50jspiros quits [Client Quit]
12:24:29jspiros (jspiros) joins
13:41:16katocala quits [Ping timeout: 252 seconds]
13:42:05katocala joins
14:04:47qwertyasdfuiopghjkl joins
14:43:23yoshinomjm joins
14:44:54<yoshinomjm>update on the wikia?
14:58:05yoshinomjm quits [Remote host closed the connection]
15:06:23monoxane2 (monoxane) joins
15:07:36monoxane quits [Ping timeout: 258 seconds]
15:07:36monoxane2 is now known as monoxane
15:10:08HP_Archivist (HP_Archivist) joins
15:22:56monoxane quits [Ping timeout: 258 seconds]
15:25:34monoxane (monoxane) joins
15:33:31alihandro joins
15:35:38alihandro quits [Remote host closed the connection]
16:19:00<duce1337>all wikis on wikia fandom will most likely shut down
16:20:26<@JAA>Source?
16:23:30<h2ibot>Alihandro created Win Raid Forum (+675, created page): https://wiki.archiveteam.org/?title=Win%20Raid%20Forum
16:23:31<h2ibot>Alihandro uploaded File:Winraid-closing.png: https://wiki.archiveteam.org/?title=File%3AWinraid-closing.png
16:30:28<@JAA>OrIdow6: There is no spam filter. Non-automodded users get a message that their edit needs to be approved, specifically https://wiki.archiveteam.org/index.php/MediaWiki:Moderation-edit-queued .
16:30:46<@JAA>The 403 error on certain edits got fixed. :-)
16:44:34<h2ibot>JustAnotherArchivist edited Friendster (+19, Update infobox): https://wiki.archiveteam.org/?diff=47775&oldid=40979
16:45:34<h2ibot>JustAnotherArchivist edited Spanish Revolution (-66, Update infobox): https://wiki.archiveteam.org/?diff=47776&oldid=27602
16:45:35<h2ibot>JustAnotherArchivist edited DNS History (+45, Update infobox): https://wiki.archiveteam.org/?diff=47777&oldid=46654
16:46:34<h2ibot>JustAnotherArchivist edited GeoCities Japan (+45, Update infobox): https://wiki.archiveteam.org/?diff=47778&oldid=36082
16:46:35<h2ibot>JustAnotherArchivist edited URLTeam (-24, Update infobox): https://wiki.archiveteam.org/?diff=47779&oldid=47001
17:46:09<@JAA>Roughly half of the Tor network has now upgraded to Tor 0.4.6, which no longer supports onion v2: https://metrics.torproject.org/versions.html . There are still roughly 100k v2 sites currently, but they're dropping rapidly since the release of Tor Browser 11.0 a week ago: https://metrics.torproject.org/hidserv-dir-onions-seen.html
17:46:30<@JAA>In other words, if there are any significant v2 sites we want to archive, we should do so very soon.
17:50:42<@JAA>Actually, correction, 0.4.5.11 also doesn't support v2 addresses. The versions.html page unfortunately doesn't display it in that much detail, but yeah, likely less than half of the network still supports them then.
17:51:47<h2ibot>Jake edited 4chan/4plebs (+38, Add link to collection (I believe this is where…): https://wiki.archiveteam.org/?diff=47780&oldid=30041
17:52:41<@JAA>Same with 0.3.5.17
17:57:48<h2ibot>Jake edited DailyBooth (+49, Add link to collection, fix search page.): https://wiki.archiveteam.org/?diff=47781&oldid=27468
17:58:48<h2ibot>JustAnotherArchivist edited Deathwatch (+1133, Dead sites are dead, reanimated ones are on…): https://wiki.archiveteam.org/?diff=47782&oldid=47768
18:03:49<h2ibot>JustAnotherArchivist edited CodePlex (+126, Dead): https://wiki.archiveteam.org/?diff=47783&oldid=47709
18:06:50<h2ibot>Jake edited 8tracks (+47, Add link to collection): https://wiki.archiveteam.org/?diff=47784&oldid=47580
18:08:01<@JAA>My Fandom/Wikia API retrieval finished and found 294242 wikis, by the way.
18:08:29<@JAA>Missing a few from that broken one I linked last night, but should otherwise be complete I guess.
18:20:29<@HCross>JAA: the problem is discovery
18:20:34<@HCross>For those tor sites
18:25:24<@HCross>I have the software setup to grab them, but no list
18:36:38<@JAA>Yeah :-/
18:45:31<@OrIdow6>JAA: oh
18:45:46<@OrIdow6>duce1337: Why do you think this is going to result in Wikia being shut down?
18:50:49<duce1337>im just guessing
18:50:58<duce1337>didn't wikia fandom get bought by some company?
18:58:39<@OrIdow6>The article was about it acquiring another company, not the other way around.
18:59:12<@OrIdow6>It did discuss, as background, its being acquired in 2018.
19:01:43<@OrIdow6>Either way, I don't think being acquired is likely to see the site shut down or have huge content removals, unless it's unprofitable or nearing there
19:03:33<@OrIdow6>HCross: If you wanted to grab all such sites you could set up something like a conventional web crawler
19:03:40<@OrIdow6>If you're talking about Tor
19:03:57<@HCross>I have that bit working
19:04:04<@HCross>But it’s more getting a list of sites
19:06:32<@OrIdow6>I mean a conventional crawler a la Googlebot or Heritrix that discovers new sites from outlinks of existing ones
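A minimal sketch of that outlink-based discovery idea: a breadth-first walk over onion pages through a local Tor SOCKS proxy (assumed to be at 127.0.0.1:9050; needs requests[socks] installed). Seed selection, robots handling, and rate limiting are left out; this only illustrates the discovery loop:

```python
import re
from collections import deque
import requests

PROXIES = {"http": "socks5h://127.0.0.1:9050",
           "https": "socks5h://127.0.0.1:9050"}
# v2 onion addresses are 16 base32 chars, v3 are 56
ONION_RE = re.compile(r"\b(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b")

def discover(seeds, max_pages=1000):
    """BFS over onion hosts, collecting every .onion address found in page bodies."""
    found, seen, queue = set(seeds), set(), deque(seeds)
    while queue and len(seen) < max_pages:
        host = queue.popleft()
        if host in seen:
            continue
        seen.add(host)
        try:
            html = requests.get(f"http://{host}/", proxies=PROXIES, timeout=60).text
        except requests.RequestException:
            continue  # dead or unreachable onion
        for onion in ONION_RE.findall(html):
            if onion not in found:
                found.add(onion)
                queue.append(onion)
    return found
```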
20:05:04TheTechRobo quits [Ping timeout: 258 seconds]
20:18:40TheTechRobo (TheTechRobo) joins
20:23:01Megame (Megame) joins
20:25:27TheTechRobo quits [Remote host closed the connection]
20:39:00<pcr>https://github.com/ahmia/ahmia-crawler/blob/f700b41ff87eadd8f7fda64f47b4467806f13fa2/ahmia/ahmia/settings.py#L70 has a couple starting points for onion crawling
21:27:37Hass joins
21:28:41Hass quits [Remote host closed the connection]
21:47:15alihandro joins
22:00:02<IDK>When is fandom shutting down
22:06:23<@OrIdow6>IDK: Why do you think that is happening?
22:06:57<IDK>OrIdow6: idk anything about wikia here
22:07:29<@OrIdow6>Wikia is just removing some NSFW wikis
22:07:33<@OrIdow6>Not shutting down
22:08:13<IDK>Oh I see
22:15:19alihandro quits [Remote host closed the connection]
22:15:47<IDK>Youtube Dislikes API will be removed on december 13th
22:15:49<IDK>https://support.google.com/youtube/thread/134791097/update-to-youtube-dislike-counts?hl=en
22:16:55<IDK>Developers: If you’re using the YouTube API for dislikes, you will no longer have access to public dislike data beginning on December 13th.
22:45:22BlueMaxima joins
22:53:44HP_Archivist quits [Ping timeout: 258 seconds]
23:16:39onetruth joins
23:50:51mutantmnky quits [Ping timeout: 258 seconds]
23:51:16mutantmnky (mutantmonkey) joins
23:55:08Megame quits [Client Quit]