| 00:00:47 | | sunlight leaves |
| 00:04:58 | | BlueMaxima joins |
| 00:12:18 | | testing joins |
| 00:12:28 | | testing quits [Remote host closed the connection] |
| 00:33:41 | | ram|m is now known as ramOld|m |
| 00:37:54 | | Arcorann (Arcorann) joins |
| 00:44:37 | | ramOld|m leaves |
| 01:20:39 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 01:24:11 | | Lord_Nightmare (Lord_Nightmare) joins |
| 02:42:55 | | nicolas17 joins |
| 02:46:12 | | nicolas17 is now authenticated as nicolas17 |
| 02:55:27 | | railen69 quits [Remote host closed the connection] |
| 02:55:47 | | railen69 joins |
| 02:56:00 | | railen69 quits [Remote host closed the connection] |
| 02:56:14 | | railen69 joins |
| 03:56:21 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 03:56:38 | | AmAnd0A joins |
| 04:22:10 | | killsushi joins |
| 04:54:51 | | BlueMaxima quits [Ping timeout: 258 seconds] |
| 05:12:21 | | BlueMaxima joins |
| 05:14:29 | | orbweaver quits [Remote host closed the connection] |
| 05:23:30 | | nicolas17 quits [Ping timeout: 265 seconds] |
| 06:09:21 | | hitgrr8 joins |
| 06:15:43 | | dumbgoy_ quits [Ping timeout: 265 seconds] |
| 06:33:23 | | railen69 quits [Remote host closed the connection] |
| 06:36:27 | | railen63 joins |
| 06:42:57 | | MactasticMendez (MactasticMendez) joins |
| 06:42:58 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 07:07:09 | | Perk4 joins |
| 07:07:29 | | Perk quits [Ping timeout: 258 seconds] |
| 07:07:40 | | Perk4 is now known as Perk |
| 08:21:47 | <@OrIdow6> | Sent an email to the Wysp contact email since it looks like it is/was run by one person |
| 08:44:27 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 08:44:27 | | yts98 leaves |
| 08:44:35 | | yts98 joins |
| 08:44:38 | | AmAnd0A joins |
| 08:51:01 | <Barto> | looks like youtube-grab was filling my root partition like a crazy |
| 08:52:44 | | MactasticMendez quits [Client Quit] |
| 09:28:14 | | nfriedly quits [Ping timeout: 252 seconds] |
| 09:28:31 | | nfriedly joins |
| 09:51:30 | <h2ibot> | PaulWise edited Mailman2 (+31, tetaneutral lists not yet archived): https://wiki.archiveteam.org/?diff=50007&oldid=49982 |
| 09:51:30 | | lflare is now authenticated as * |
| 09:51:30 | | lflare is now known as RJHacker90154 |
| 09:51:30 | | RJHacker90154 quits [Killed (nuke.hackint.org (Nickname regained by services))] |
| 09:51:31 | | lflare (lflare) joins |
| 09:52:59 | | W7RFa6AbNFz joins |
| 09:56:16 | | Ketchup901 quits [Ping timeout: 245 seconds] |
| 10:00:02 | | railen63 quits [Remote host closed the connection] |
| 10:00:18 | | railen63 joins |
| 10:07:41 | | Ketchup901 (Ketchup901) joins |
| 11:13:19 | | W7RFa6AbNFz_ joins |
| 11:16:02 | | W7RFa6AbNFz quits [Ping timeout: 252 seconds] |
| 11:16:03 | | W7RFa6AbNFz_ quits [Read error: Connection reset by peer] |
| 11:17:00 | | W7RFa6AbNFz joins |
| 11:18:03 | | W7RFa6AbNFz_ joins |
| 11:21:37 | | W7RFa6AbNFz_ quits [Read error: Connection reset by peer] |
| 11:21:38 | | W7RFa6AbNFz quits [Ping timeout: 258 seconds] |
| 11:21:40 | | W7RFa6AbNFz joins |
| 11:24:03 | | W7RFa6AbNFz quits [Remote host closed the connection] |
| 11:24:31 | | W7RFa6AbNFz joins |
| 11:25:10 | | W7RFa6AbNFz_ joins |
| 11:27:06 | | W7RFa6AbNFz quits [Read error: Connection reset by peer] |
| 11:32:31 | | razul quits [Quit: Bye -] |
| 12:03:32 | | justmolamola joins |
| 12:12:41 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 12:13:59 | | justmolamola quits [Remote host closed the connection] |
| 12:22:23 | | VickoSaviour joins |
| 12:41:48 | <@JAA> | Over 80% of the Knowledge Adventure CDN is done. ETA is still early hours of the 28th. |
| 12:43:21 | | sec^nd quits [Ping timeout: 245 seconds] |
| 12:43:40 | | sec^nd (second) joins |
| 13:04:00 | | Iki joins |
| 13:33:07 | | balrog quits [Ping timeout: 265 seconds] |
| 13:34:05 | | Arcorann quits [Ping timeout: 252 seconds] |
| 13:40:46 | | Arcorann (Arcorann) joins |
| 13:48:23 | | Arcorann quits [Ping timeout: 252 seconds] |
| 14:08:28 | | dumbgoy_ joins |
| 14:20:49 | <Dallas> | Is there a script floating about that I can dump a list of warrior IPs into to get some stats out of? I've got all the OS metrics but I'm working on visibility of the warriors atm, and it occurred to me it may be a solved problem |
| 14:24:55 | <@JAA> | Maybe, but I haven't heard of it. Most powerusers don't run the warrior, so that's one thing to keep in mind. Docker log aggregation is a solved problem in a much more general sense than AT, of course. Loki comes to mind (and is used by some people here), but there are countless others. |
| 14:26:18 | <Dallas> | Oh yeah I have grafana up I could use loki good point |
| 14:26:49 | <Dallas> | I'm sure I'll get a bit of a perf boost dropping docker but for scaling it makes it too easy atm lol, cheers JAA |
| 14:27:44 | | killsushi quits [Ping timeout: 265 seconds] |
| 14:28:18 | <@JAA> | I'm not suggesting dropping Docker. I'm distinguishing the warrior (either VM or Docker image, controlled via web interface, yadda yadda) from the project images. The latter is what virtually all powerusers run. |
| 14:28:45 | <@JAA> | Bare metal is too easy to mess up and produce weird or bad data, so it's strongly recommended against. |
| 14:29:23 | <@JAA> | And with modern containerisation, it probably doesn't even matter that much for performance anymore. |
| 14:29:50 | <@JAA> | (Unless you're on Windows/macOS, where Docker is basically a Linux VM as I understand it.) |
| 14:31:54 | <Dallas> | This is all on digital ocean, hetz and aws so yeah the docker overhead won’t be much, dw no plans to go bare metal, I’m just working on a script to let me control all those warrior instances at the same time and wanted to see if I was duplicating work |
| 14:36:48 | <VickoSaviour> | i suggest you edit the main page, cuz it has not been edited since the 24th of April. Also, after we sort out the Egloos and LINE BLOG data, we should focus on some other short-term projects, like Bedrock Automation. |
| 14:37:02 | <VickoSaviour> | on Wiki, ofc |
| 14:39:34 | <BigBrain> | VickoSaviour: every account can edit wiki, edit needs to be accepted by op |
| 14:52:01 | <@JAA> | VickoSaviour: The main page was last edited a week ago. You're not looking in the right place. |
| 14:54:45 | | VickoSaviour quits [Remote host closed the connection] |
| 15:15:43 | | Megame (Megame) joins |
| 15:35:05 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 15:35:12 | | AmAnd0A joins |
| 15:47:34 | | balrog (balrog) joins |
| 15:58:09 | | Dango360 quits [Read error: Connection reset by peer] |
| 16:00:29 | | Dango360 (Dango360) joins |
| 16:02:12 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 16:02:29 | | AmAnd0A joins |
| 16:14:19 | | lk quits [Quit: lk] |
| 16:40:50 | | dumbgoy_ quits [Read error: Connection reset by peer] |
| 17:03:40 | | lk (lk) joins |
| 17:32:31 | | sec^nd quits [Ping timeout: 245 seconds] |
| 17:36:18 | | pseudorizer (pseudorizer) joins |
| 17:37:45 | | sec^nd (second) joins |
| 17:51:57 | | Megame quits [Client Quit] |
| 18:17:48 | | programmerq quits [Ping timeout: 265 seconds] |
| 18:19:22 | | programmerq (programmerq) joins |
| 18:40:34 | | upintheairsheep joins |
| 18:41:17 | <upintheairsheep> | This Discord server will go unmoderated: https://discord.gg/5Zq8MVq2SW https://www.tiktok.com/t/ZT8Jch5cw/ |
| 18:41:33 | | upintheairsheep quits [Remote host closed the connection] |
| 18:54:52 | <pokechu22> | balrog: I have attempted to download sourceforge wikis using wikiteam tools but I'm not 100% sure I got all of them since there's no good index (and it seems I didn't find https://mldonkey.sourceforge.net/Main_Page). I'm downloading that now though |
| 18:55:49 | <balrog> | pokechu22: it's not only wikis but anything hosted that's affected (*.sourceforge.net) |
| 18:56:12 | | Naruyoko quits [Read error: Connection reset by peer] |
| 18:56:20 | <balrog> | I'm not sure if there's a way to get an index of all SF projects and then generate an index? |
| 18:56:27 | <pokechu22> | Yeah, just pointing out my previous attempt at downloading wikis |
| 18:56:38 | <balrog> | e.g. for mldonkey there's https://mldonkey.sourceforge.net/forums/ as well |
| 18:56:47 | <balrog> | and wikiteam tools won't get that |
| 18:57:05 | <balrog> | now, a lot of projects have merely static sites, not PHP-driven |
| 18:57:17 | <balrog> | but if they're forcing update to PHP7, I suspect a lot of them will just break |
| 18:57:25 | <balrog> | a lot of the PHP-driven ones* |
| 18:57:43 | <balrog> | anyway I wanted to bring this to the attention of Archive Team, since this was not properly announced... |
| 18:58:07 | <pokechu22> | I believe my main technique was stuff like googling `site:sourceforge.net wiki` and then trying to get stuff to show up; I also did `site:sourceforge.net inurl:Main_Page` etc. But that isn't complete. There's also a lot of ones that seem to have been already lost or moved to the non-mediawiki system (e.g. see https://axiomengine.sourceforge.net/, or |
| 18:58:09 | <pokechu22> | https://tilemaster.sourceforge.net/ which links to http://sourceforge.net/apps/mediawiki/tilemaster/ which is dead) |
| 18:59:04 | <balrog> | might be able to scrape project names from https://sourceforge.net/directory/ |
| 18:59:52 | <balrog> | that tilemaster one looks like an SF-run mediawiki system that's dead, as opposed to the project maintainers installing mediawiki themselves |
| 19:00:18 | <pokechu22> | oh yeah, one other thing to note is that running the tools directly on https://mldonkey.sourceforge.net/Main_Page will fail because it detects http://mldonkey.sourceforge.net/mediawiki/api.php and http://mldonkey.sourceforge.net/mediawiki/index.php but that redirects to https. This breaks the POST requests wikiteam tools use (causing only 1 revision to be exported for each |
| 19:00:20 | <pokechu22> | page). You need to specify --api and --index for it to work right. |
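The workaround above (passing the HTTPS endpoints explicitly through `--api` and `--index`) can be sketched as a small helper that derives both URLs from a wiki page URL. This is only an illustration: the function name `explicit_endpoints` and the default `/mediawiki` script path are assumptions taken from the mldonkey example, and other wikis may keep their scripts elsewhere.

```python
from urllib.parse import urlsplit, urlunsplit


def explicit_endpoints(wiki_url, script_path="/mediawiki"):
    """Build https api.php/index.php URLs for a wiki (hypothetical helper).

    The scheme is forced to https so that POST requests are not sent to an
    http URL that only redirects. script_path is an assumption and has to
    be checked per wiki.
    """
    host = urlsplit(wiki_url).netloc
    base = urlunsplit(("https", host, script_path.rstrip("/"), "", ""))
    return {"api": base + "/api.php", "index": base + "/index.php"}


if __name__ == "__main__":
    for name, url in explicit_endpoints("http://mldonkey.sourceforge.net/Main_Page").items():
        print(f"--{name} {url}")
```

The printed values are what would be handed to the `--api` and `--index` options mentioned above.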
| 19:00:56 | <pokechu22> | Yeah, that's what I gathered, but it still makes finding valid wikis difficult since there's a bunch of dead links in pages that still say "wiki" |
| 19:01:16 | <balrog> | An appropriate approach might be: 1. collect list of all SF projects; 2. use wget-warc/wpull/archivebot to ingest PROJECTNAME.sourceforge.net but do not recursively follow redirects to https://sourceforge.net/*; 3. review the scraped contents for actual MediaWiki |
| 19:02:01 | <balrog> | (and for 2, do not recursively follow redirects to different domains, again to reduce scope) |
| 19:02:01 | <pokechu22> | I initially scraped wikiapiary for this which is where I got my big list, but the query I used gives different results now: |
| 19:02:03 | <pokechu22> | https://wikiapiary.com/w/index.php?title=Special:Ask&limit=500&q=%5B%5BCategory%3AWebsite%5D%5D+%5B%5BHas+farm%3A%3AFarm%3ASourceForge%5D%5D+%5B%5BIs+defunct%3A%3Afalse%5D%5D&p=mainlabel%3D-2D%2Fformat%3Dtable&po=%3F%3DWiki%0A%3FHas+pages+count%3DPages%0A%3FHas+edit+count%3DEdits%0A%3FHas+API+URL%3DAPI%0A%3FHas+URL%3DURL%0A%3FHas+Internet+Archive+added+date%3DIA%0A%3FHas+images+count%3DFiles%0A%3FCapture+date%3DLast+sample%0A |
| 19:02:13 | <balrog> | pokechu22: my point is that this isn't only wikis here |
| 19:02:14 | | nicolas17 joins |
| 19:02:24 | <balrog> | this is HTTPS sites, whatever the maintainers chose to run |
| 19:02:26 | <pokechu22> | Yeah, I understand that |
| 19:02:57 | <balrog> | I suspect that 99%+ of them redirect to https://sourceforge.net/projectname or so |
| 19:03:09 | <balrog> | and/or the maintainers never uploaded files |
| 19:03:18 | <pokechu22> | Does every project have a subdomain like that? (I also remember seeing some things on sourceforge.io and I don't know what the difference was) |
| 19:04:23 | <balrog> | I believe yes all do have a subdomain but most don't have content, and those subdomains redirect to https://sourceforge.net/projects/projectname for those that don't have content there |
| 19:05:28 | <balrog> | yeah that's the behavior that I see. For some the "content" there is just a redirect to another website |
| 19:05:50 | <pokechu22> | My old notes mention https://bibdesk.sourceforge.io/mediawiki as something broken, and it looks like https://bibdesk.sourceforge.net/ redirects to https://bibdesk.sourceforge.io/. I've definitely seen other ones that just redirect off site (e.g. to github) |
| 19:06:20 | <pokechu22> | another .io redirect: https://lynkeos.sourceforge.net/ -> https://lynkeos.sourceforge.io/ - this one does have a working wiki |
| 19:06:57 | <pokechu22> | ... and on the other hand, https://alphaplot.sourceforge.net/wiki redirects to http://www.alphaplotwiki.com/, which is now dead (but I did previously save it) |
| 19:07:05 | <fireonlive> | directory; sorting by name and paginating though only lets you go to 999 before it errors out |
| 19:07:06 | <balrog> | .io is a separate newer infra apparently https://sourceforge.net/p/forge/documentation/Project%20Web%20Services/#php-version-and-io-domain |
| 19:07:10 | <fireonlive> | https://sourceforge.net/directory/?sort=name&page=999 vs https://sourceforge.net/directory/?sort=name&page=1000 |
| 19:07:28 | <fireonlive> | 999 ends on "Boy on Riddlin" so it's not everything |
| 19:08:06 | <pokechu22> | Looks like https://micro-os-plus.sourceforge.io/ redirects to https://micro-os-plus.sourceforge.net/ so .net vs .io doesn't matter apart from making AB's on-site handling a bit of a mess |
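The redirect behaviours described above (to the main SourceForge project page, between the project's .net and .io subdomains, or off site) could be sorted with a small classifier when reviewing a crawl. This is a rough sketch, not an existing tool: the function name and the category labels are made up for illustration, and it only looks at the redirect target's hostname.

```python
from urllib.parse import urlsplit


def classify_redirect(project, location):
    """Classify where PROJECT.sourceforge.net redirects to (hypothetical labels)."""
    host = (urlsplit(location).hostname or "").lower()
    project = project.lower()
    if host in ("sourceforge.net", "www.sourceforge.net"):
        # No project web content: lands on the main SourceForge site.
        return "sf-project-page"
    if host == f"{project}.sourceforge.io":
        return "project-io-site"
    if host == f"{project}.sourceforge.net":
        return "project-net-site"
    # Anything else, e.g. a GitHub page or a separately hosted wiki.
    return "off-site"


if __name__ == "__main__":
    print(classify_redirect("bibdesk", "https://bibdesk.sourceforge.io/"))
    print(classify_redirect("alphaplot", "http://www.alphaplotwiki.com/"))
```

Only the two `project-*` categories would stay in scope under the "do not follow redirects to other domains" rule suggested earlier.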
| 19:08:09 | <balrog> | fireonlive: sigh. and their UI shows buttons for 1000+ |
| 19:08:31 | <fireonlive> | yeah :/ |
| 19:08:46 | <balrog> | > Projects web space is a subdomain under .sourceforge.io and uses PHP 7. Please be prepared for PHP 8 upgrades which are expected later in the year. |
| 19:08:46 | <balrog> | > Projects registered before Nov 2016 started on an older service using PHP 5.4 and subdomain of sourceforge.net. If you have an older project, you can switch your project web over at any point using the project web settings under Admin -> Project Web Hosting -> PHP Version. |
| 19:09:00 | <fireonlive> | looks like you can search aa* ab* etc (but not a*) |
| 19:09:04 | <balrog> | so .sourceforge.net are PHP 5.4 (and at risk), while .sourceforge.io are PHP7 (and not yet at risk) |
| 19:10:16 | <fireonlive> | "We based our search system on the Lucene search engine. We expose a sub-set of the Lucene Query Parser Syntax so you can try some advanced queries." (https://sourceforge.net/p/forge/documentation/Finding%20Software/) |
| 19:10:58 | <pokechu22> | I should also note that I get a 429 when using wikiteam tools after ~400-500 pages (and then wikiteam tools immediately stop rather than retrying). This happens both with no delay specified and with a 1-second delay between each request. Dunno what causes it exactly. |
| 19:11:55 | | Naruyoko joins |
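For the 429 responses mentioned above, a retrying client would usually wait before trying again instead of stopping. The sketch below only computes the wait time. It is not how wikiteam tools behave (the log says they stop immediately), and the base delay, the cap, and whether SourceForge sends a `Retry-After` header at all are assumptions.

```python
def retry_delay(attempt, retry_after=None, base=5.0, cap=300.0):
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    Uses a numeric Retry-After header value when one is given; otherwise
    falls back to capped exponential backoff. base and cap are arbitrary
    illustration values, not measured limits.
    """
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            # Retry-After can also be an HTTP date; this sketch ignores that form.
            pass
    return min(cap, base * 2 ** attempt)


if __name__ == "__main__":
    for attempt in range(5):
        print(attempt, retry_delay(attempt))
```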
| 19:12:10 | <fireonlive> | name:/title:/project:aa* don't seem to work though; so i guess rough search is all you can get |
| 19:16:26 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 19:16:29 | | AmAnd0A joins |
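Since the directory search reportedly accepts two-letter prefixes such as `aa*` and `ab*` but not `a*`, one way around the 999-page limit would be to enumerate prefixes and paginate each one separately. A minimal sketch of generating those search terms follows; including digits in the alphabet is an assumption, and nothing here guarantees that every prefix stays under the page limit.

```python
from itertools import product
from string import ascii_lowercase, digits


def search_prefixes(length=2, alphabet=ascii_lowercase + digits):
    """Return wildcard search terms like 'aa*', 'ab*', ... (hypothetical helper)."""
    if length < 2:
        # Single-character wildcards such as 'a*' were reported not to work.
        raise ValueError("length must be at least 2")
    return ["".join(chars) + "*" for chars in product(alphabet, repeat=length)]


if __name__ == "__main__":
    terms = search_prefixes()
    print(len(terms), terms[:3])
```

A prefix that still runs into the page limit could be split further by calling the helper with `length=3`.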
| 19:17:00 | <pokechu22> | russss: looks like https://audioscrobbler.sourceforge.net/wiki/ is broken (which doesn't surprise me but is still unfortunate). https://audioscrobbler.sourceforge.net/wiki/index.php is borked and https://audioscrobbler.sourceforge.net/wiki/api.php doesn't exist at all |
| 19:17:59 | <fireonlive> | https://gitlab.softwareheritage.org/swh/meta/-/issues/735 |
| 19:18:03 | <fireonlive> | interesting issue here |
| 19:18:37 | <fireonlive> | looks like all projects might be in the sitemaps? |
| 19:18:54 | <russss> | pokechu22: I looked and I'm not sure there was ever anything in that wiki |
| 19:19:11 | <fireonlive> | found via: https://forge.softwareheritage.org/T735 |
| 19:19:16 | <russss> | I think that was from some separate Sourceforge wiki product |
| 19:19:31 | <pokechu22> | https://web.archive.org/web/*/https://audioscrobbler.sourceforge.net/* makes it look like there is something... |
| 19:19:44 | <pokechu22> | ah, not mediawiki, though: https://web.archive.org/web/20030415211152/http://audioscrobbler.sourceforge.net:80/wiki/index.php/Bob's%20Happy%20Fun%20Page |
| 19:19:52 | <fireonlive> | oh gitlab has the same content |
| 19:23:06 | | thenes quits [Quit: WeeChat 2.8] |
| 19:24:34 | <russss> | it was a huge blast from the past to get the email from Sourceforge saying it would be shut down, I had no idea it was still there |
| 19:24:43 | | thenes (thenes) joins |
| 19:25:11 | <russss> | and that PHP version is massively long out of support. someone said that the version they're upgrading to is also out of support! |
| 19:25:37 | <russss> | so kudos to SF for keeping it up this long I guess? |
| 19:26:31 | <fireonlive> | i ran https://forge.softwareheritage.org/source/snippets/browse/master/listers/sourceforge/sourceforge-ls-projects.py and got 108,911 results: https://transfer.archivete.am/inline/13m9To/sourceforge-projects.txt |
| 19:26:49 | <fireonlive> | seems low, though. unless sourceforge just isn't that popular |
| 19:27:10 | <fireonlive> | though.. the tool does exclude the projects namespace assuming /p/ exists too hm |
| 19:36:22 | <fireonlive> | oh doesn't actually seem to be any '.net/projects/' links in those sitemaps |
| 19:36:32 | <fireonlive> | gotta go for a bit; gl :3 |
| 19:55:20 | | razul joins |
| 19:59:38 | | gfhh quits [Ping timeout: 252 seconds] |
| 20:29:50 | | pseudorizer quits [Client Quit] |
| 20:30:33 | | pseudorizer (pseudorizer) joins |
| 21:00:41 | | TheTechRobo quits [Ping timeout: 252 seconds] |
| 21:02:46 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 21:03:25 | | AmAnd0A joins |
| 21:05:08 | | TheTechRobo (TheTechRobo) joins |
| 21:13:44 | <fireonlive> | on 2020-10-22 it was reported there were 480,711 projects and in 2021 317,973 (via https://gitlab.softwareheritage.org/swh/meta/-/issues/735) and now (maybe?) 108,911? i can't see anything about them purging inactive projects on a quick search... 209,062 less projects in 2 years? unless sitemap doesn't list everything anymore or there's a bug in |
| 21:13:44 | <fireonlive> | that code i didn't quickly spot :p |
| 21:15:26 | <fireonlive> | (or that's the old school system and the others are on .io?) |
| 21:15:30 | <fireonlive> | (not sure) |
| 21:16:57 | | AmAnd0A quits [Ping timeout: 258 seconds] |
| 21:17:40 | | AmAnd0A joins |
| 21:21:10 | | graham joins |
| 21:21:31 | | Dango360 quits [Read error: Connection reset by peer] |
| 21:31:17 | <@JAA> | https://sourceforge.net/directory/?clear says there should be at least 192k projects. |
| 21:31:49 | <@JAA> | Some of the filters are overlapping (e.g. projects supporting multiple OS), others don't seem to cover all projects (e.g. Status), but 192k hits for Windows projects, so it should be at least that. |
| 21:33:18 | <fireonlive> | hmm so 108,911 is deffo undercounting then |
| 21:34:10 | <@JAA> | Also, #sourceforget exists, let's use it. |
| 21:35:53 | | lukash quits [Ping timeout: 252 seconds] |
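The project counts quoted above can be checked with a few lines of arithmetic. The numbers are taken from the messages; the 192k directory figure is treated as a rough lower bound of 192,000, which is an approximation.

```python
# Project counts as quoted in the discussion above.
reported_2020 = 480_711
reported_2021 = 317_973
sitemap_now = 108_911
# "192k hits for Windows projects", rounded to 192,000 (an approximation).
directory_lower_bound = 192_000

drop_2020_to_2021 = reported_2020 - reported_2021
drop_since_2021 = reported_2021 - sitemap_now
# Projects the sitemap listing must be missing if the directory figure is right.
minimum_missing = directory_lower_bound - sitemap_now

print(f"2020 -> 2021 drop: {drop_2020_to_2021:,}")
print(f"2021 -> sitemap drop: {drop_since_2021:,}")
print(f"sitemap is short by at least: {minimum_missing:,}")
```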
| 22:08:48 | <betamax> | JAA: how is progress on the school of dragons archive? The game shuts down on Friday. |
| 22:11:25 | <@JAA> | betamax: 12:41:48 <@JAA> Over 80% of the Knowledge Adventure CDN is done. ETA is still early hours of the 28th. |
| 22:13:50 | <nicolas17> | betamax: this is one of the files that appeared in the last few weeks, sad http://media.schoolofdragons.com/Content/DWAPromos/en-US/SoD-061623_ClosingSale.jpg |
| 22:17:33 | | hitgrr8 quits [Client Quit] |
| 22:24:09 | <betamax> | thanks! (I assume Knowledge Adventure CDN == School of Dragons?) |
| 22:24:26 | <betamax> | (sorry I'm a bit out of the loop) |
| 22:24:56 | <nicolas17> | yes, origin.ka.cdn |
| 22:38:39 | <fireonlive> | closing sale?? |
| 22:38:45 | <fireonlive> | for a game that won't be playable? |
| 22:38:49 | <fireonlive> | o_o |
| 22:39:03 | <fireonlive> | s/playable/playable very shortly after/ |
| 22:39:26 | <FireFly> | that's kind of a weird concept yeah |
| 22:39:30 | <FireFly> | unless it's like, physical merch |
| 22:39:33 | <@JAA> | 'We'll slaughter your beloved game next week. Now give us monies please.' |
| 22:39:39 | <FireFly> | (but I guess not) |
| 22:40:10 | <nicolas17> | might be the price of things within the game, in in-game money |
| 22:41:51 | <@JAA> | Could be, their original announcement from a few weeks ago did say that purchases would be disabled and to use in-game currency before the shutdown. |
| 22:42:30 | <fireonlive> | ahh |
| 22:43:30 | <fireonlive> | hmm.. lol. i guess if in app purchases are disabled... but at that point just make everything free i guess for one last hurrah lol |
| 22:43:54 | <@JAA> | Yeah, that's what other games have done. No reason not to, really. |
| 22:44:53 | <@JAA> | Regarding the CDN/bucket, when the current download completes, I'll rerun the bucket listing and grab anything that was missed. Then there are a few objects that weren't downloaded correctly due to URL encoding reasons (question marks in names...). |
| 22:45:18 | <@JAA> | s/that was missed/that's new/ I guess |
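The "question marks in names" problem mentioned above is the usual one where an unescaped `?` in an object name is read as the start of a query string. One plausible way to build download URLs from a bucket listing is to percent-encode each key first. The helper name and the base URL below are placeholders, not the actual CDN setup.

```python
from urllib.parse import quote


def object_url(base_url, key):
    """Join a base URL and an object key, percent-encoding the key.

    '/' is kept as the path separator; '?', '#', spaces and other
    reserved characters are escaped so they stay part of the path.
    """
    return base_url.rstrip("/") + "/" + quote(key, safe="/")


if __name__ == "__main__":
    # Hypothetical base URL and key, for illustration only.
    print(object_url("https://cdn.example.invalid/", "Content/what is this?.jpg"))
```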
| 22:57:50 | | Doranwen quits [Ping timeout: 252 seconds] |
| 23:02:09 | | Doranwen (Doranwen) joins |
| 23:06:18 | | BlueMaxima joins |
| 23:07:33 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 23:07:42 | | BlueMaxima joins |
| 23:18:05 | <h2ibot> | Manu edited ArchiveTeam Warrior (+255, Add info that an rsync max connection error is…): https://wiki.archiveteam.org/?diff=50013&oldid=49883 |
| 23:21:03 | <Ryz> | Anyone trying to tackle Microsoft Language Portal? It's shutting down on 2023 June 30 as per https://old.reddit.com/r/Archiveteam/comments/1488sdy/microsoft_language_portal_will_be_removed_on_june/ |
| 23:25:45 | <fireonlive> | afaik no |
| 23:26:24 | <fireonlive> | it says some or all? of it will be moved to microsoft learn but... we all know how careful companies are |
| 23:32:53 | <mikolaj|m> | there's a large Polish-language phpBB forum farm at https://www.fora.pl/ that has existed since 2005. A large portion of the forums are dead, and there was also a pruning done in 2016 by the server owners |
| 23:34:38 | <mikolaj|m> | thinking about archiving it. Is this doable for ArchiveBot? I presume that yall need some discussion or reason to throw larger sites into ArchiveBot |
| 23:53:06 | <fireonlive> | you could ask in #archivebot there seem to be a few people pumping away commands at the moment in there |
| 23:55:02 | | imer quits [Ping timeout: 252 seconds] |
| 23:55:10 | <pokechu22> | I don't know polish, but those look like large numbers in the statistics section at the bottom... |
| 23:55:34 | <Ryz> | Hmm, mikolaj|m, it looks like it's not just https://www.fora.pl/ to grab, since it's just a hub, but also the countless subdomains that represent the forums they are hosting, like http://www.naturalnemetody.fora.pl/ - coming from https://www.fora.pl/?file=cat&md=index&cid=15 |
| 23:55:47 | | imer (imer) joins |
| 23:55:52 | <flashfire42> | I was about to throw it in until you said that ryz |
| 23:55:54 | <pokechu22> | 82 078 forums, 5 616 613 users, 159 344 842 "statements" (not sure if those are threads or posts) |
| 23:56:13 | <flashfire42> | Time to start chipping away at the individual forums? |
| 23:56:27 | <Ryz> | 82 thousand forums?! Oo;;; |
| 23:56:29 | <flashfire42> | I can start queueing those 1 at a time until I forget what I am doing |
| 23:56:30 | <pokechu22> | I like how http://www.random.fora.pl/ seems to be an actual forum in addition to a redirect site |
| 23:57:01 | <pokechu22> | I've done long lists, but those were usually a few hundred at most; 82k is probably too much to manually handle |
| 23:57:12 | <Ryz> | ...82 thousand is holy shit, a lot Oo; |
| 23:57:35 | | wickedplayer494 quits [Ping timeout: 265 seconds] |
| 23:57:37 | <flashfire42> | I once scraped a forum by hand for links to other sites. I will start if I get given the OK |
| 23:58:08 | <pokechu22> | It looks like the list is at https://img.mruczek.trade/fora.txt which is "only" 64790 |
| 23:58:09 | <Ryz> | First things first, need to find where all 82k forums are listed, if it's more than just what the example source link covers~ |
| 23:58:48 | <pokechu22> | That's from line 279 of view-source:http://www.random.fora.pl/ |