00:02:02hitgrr8_ quits [Client Quit]
00:11:08Sanqui_ joins
00:13:58@Sanqui quits [Ping timeout: 252 seconds]
01:20:25Sanqui_ quits [Client Quit]
01:23:11Sanqui joins
01:23:13Sanqui quits [Changing host]
01:23:13Sanqui (Sanqui) joins
01:23:13@ChanServ sets mode: +o Sanqui
02:10:07<michaelblob_>https://archive.org/details/wiki-arwikishianet https://archive.org/details/wiki-eswikishianet https://archive.org/details/wiki-ruwikishianet https://archive.org/details/wiki-idwikishianet https://archive.org/details/wiki-dewikishianet https://archive.org/details/wiki-frwikishianet https://archive.org/details/wiki-bnwikishianet
02:10:33<michaelblob_>https://archive.org/details/wiki-tgwikishianet https://archive.org/details/wiki-azwikishianet https://archive.org/details/wiki-ptwikishianet https://archive.org/details/wiki-itwikishianet https://archive.org/details/wiki-hawikishianet https://archive.org/details/wiki-pswikishianet https://archive.org/details/wiki-thwikishianet
02:10:47<michaelblob_>https://archive.org/details/wiki-mywikishianet https://archive.org/details/wiki-commonswikishianet https://archive.org/details/wiki-urwikishianet
02:12:06<michaelblob_>got internal api error with {fa|en|zh|hi|sw}.wikishia.net apis, tr.wikishia.net downloaded xml fine but ran into incorrect header check when downloading the first couple images
02:12:34<michaelblob_>still running tcrf.net generasia.com and linkedopendata.eu
02:17:37michaelblob_ quits [Read error: Connection reset by peer]
03:05:00<yzqzss|m>https://archive.org/details/www.dokuwiki.org-20230323
03:05:00<yzqzss|m>Can the wikidump for DokuWiki go into the WikiTeam collection? dokuWikiUploader is ready. :-D
04:49:07michaelblob (michaelblob) joins
04:50:34<michaelblob>yzqzss|m: would be nice if the identifier generation followed the same format as the wikiteam format for consistency
05:06:54<yzqzss|m>michaelblob: Do I just need to add the "wiki-" prefix?
05:17:42<michaelblob>see https://github.com/mediawiki-client-tools/mediawiki-scraper/blob/934161b6f9f987250ef656f66abeb35c97c203e2/wikiteam3/utils/domain.py#L5
05:19:21<michaelblob>the actual identifier uses that prefix here to generate the wikiname https://github.com/mediawiki-client-tools/mediawiki-scraper/blob/934161b6f9f987250ef656f66abeb35c97c203e2/wikiteam3/uploader.py#L99
05:19:40<michaelblob>then the identifier is just adding 'wiki-' in front https://github.com/mediawiki-client-tools/mediawiki-scraper/blob/934161b6f9f987250ef656f66abeb35c97c203e2/wikiteam3/uploader.py#L112
05:20:23<michaelblob>i see you have the date appended by default, which i believe should be the proper way to differentiate dumps (see https://github.com/mediawiki-client-tools/mediawiki-scraper/blob/934161b6f9f987250ef656f66abeb35c97c203e2/wikiteam3/uploader.py#L115)
05:21:27<michaelblob>kinda just being picky here :) but it does keep things consistent across wikiteam-related projects
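A minimal sketch of the identifier scheme michaelblob describes: derive a wikiname from the domain, put 'wiki-' in front, and append the dump date. The function names `domain_to_prefix` and `make_identifier` are hypothetical, and the flattening rule (drop dots, replace every other non-alphanumeric character with "_") is an assumption inferred from the examples in this log, not a copy of the linked wikiteam3 code.

```python
import re
from typing import Optional


def domain_to_prefix(url: str) -> str:
    """Hypothetical helper: flatten a wiki URL's host into a wikiname."""
    # Strip the scheme and anything after the host.
    host = re.sub(r"^https?://", "", url.strip().lower()).split("/", 1)[0]
    # Assumed rule: drop dots, then map every remaining character
    # outside a-z and 0-9 to an underscore.
    return re.sub(r"[^a-z0-9]", "_", host.replace(".", ""))


def make_identifier(url: str, date: Optional[str] = None) -> str:
    """Hypothetical helper: 'wiki-' + wikiname, plus '-YYYYMMDD' if given."""
    identifier = "wiki-" + domain_to_prefix(url)
    if date:
        # Appending the dump date keeps repeated dumps of one wiki distinct.
        identifier += "-" + date
    return identifier


if __name__ == "__main__":
    print(make_identifier("https://www.dokuwiki.org/", "20230323"))
```

Under these assumptions, `make_identifier("https://www.dokuwiki.org/", "20230323")` gives `wiki-wwwdokuwikiorg-20230323`, whereas the item uploaded above keeps the dots and has no `wiki-` prefix.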
05:24:43vitzli (vitzli) joins
05:27:02vitzli quits [Client Quit]
05:29:40eroc1990 quits [Ping timeout: 252 seconds]
06:06:24<yzqzss|m>michaelblob: I know the process of wikiname and identifier. I just forgot to add "wiki-" as the identifier prefix in dokuwikiuploader. lol
06:06:31<yzqzss|m>In the dokuwikidumper version of domain2prefix(), I didn't remove the dots "." from the domain name. This makes the wikidump's directory names more recognizable. (If this is not correct, I can just remove all the dots in the identifier) https://github.com/saveweb/dokuwiki-dumper/blob/93e7e32e7f20ea64c80f9d427660a2059bd1eb8a/dokuWikiDumper/utils/util.py#L100
06:06:56<yzqzss|m>Other than that, most of the process is not too different from wikiteam/mw-scraper. (Since DokuWiki is relatively simple, I didn't design the launcher.py, so I put the 7z compression process in uploader.py)
06:18:50<michaelblob>there's not really a "correct" way to do it but i personally prefer to use only "-" and "_" in the names, like what wikiteam's naming convention currently uses
06:20:30<michaelblob>i would just remove the dots and keep the identifier format aligned with what wikiteam uses
06:22:39<@JAA>I find the wikiteam identifiers without dots very ugly.
06:22:52<@JAA>¯\_(ツ)_/¯
06:23:42<yzqzss|m>JAA: Haha, you said what I wanted to say.
06:35:05<pokechu22>Yeah, the ones without dots and with - replaced with _ are a bit annoying, especially when it introduces ambiguity
07:04:01hitgrr8 joins
07:30:00<yzqzss|m>For domains that are Punycode-encoded or written in Unicode, this is an absolute disaster.
07:30:00<yzqzss|m>"http://你好.信息/" -> "____" (all replaced!!!!)
07:30:00<yzqzss|m>"http://xn--6qq79v.xn--vuq861b/" -> "xn__6qq79vxn__vuq861b"
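The two problems raised here (ambiguity once dots are removed, and non-ASCII domains collapsing to underscores) can be reproduced with a short sketch. `flatten` assumes the rule implied by the examples above; `keep_dots` is a hypothetical alternative that IDNA-encodes the host with Python's built-in `idna` codec and keeps dots and hyphens. Neither function is taken from wikiteam or dokuwiki-dumper.

```python
import re


def flatten(domain: str) -> str:
    """Assumed current rule: drop dots, map other non-alphanumerics to '_'."""
    return re.sub(r"[^a-z0-9]", "_", domain.lower().replace(".", ""))


def keep_dots(domain: str) -> str:
    """Hypothetical alternative: convert to ASCII via IDNA, keep '.' and '-'."""
    ascii_domain = domain.lower().encode("idna").decode("ascii")
    return re.sub(r"[^a-z0-9.-]", "_", ascii_domain)


if __name__ == "__main__":
    # Two different hosts become indistinguishable once dots are gone.
    print(flatten("ab.c.com"), flatten("a.bc.com"))
    # With dots kept, they stay distinct.
    print(keep_dots("ab.c.com"), keep_dots("a.bc.com"))
    # A Unicode domain collapses entirely under the flattening rule ...
    print(flatten("你好.信息"))
    # ... but survives as an ASCII "xn--" form when IDNA-encoded first.
    print(keep_dots("你好.信息"))
```

Keeping dots avoids the collision in the first pair, and encoding before sanitizing means a Unicode domain and its Punycode form map to the same readable name instead of a run of underscores.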
11:06:49Sanqui|m quits [Changing host]
11:06:49Sanqui|m (Sanqui) joins
11:06:49@ChanServ sets mode: +o Sanqui|m
14:03:46Matthww1 quits [Quit: Ping timeout (120 seconds)]
14:07:10qwertyasdfuiopghjkl quits [Remote host closed the connection]
14:10:13<michaelblob>fair enough, i suppose we could rework the identifier generation for mediawiki while we're at it then
14:14:51qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
14:16:53Matthww1 joins
15:39:43Bedivere joins
15:41:49Sir_Bedivere quits [Ping timeout: 252 seconds]
16:11:25<yzqzss|m>requests.exceptions.HTTPError: error uploading acwiki.xyz-20230326-dumpMeta/config.json to wiki-acwiki.xyz-20230326, Please reduce your request rate. - Your upload of wiki-acwiki.xyz-20230326 from username yzqzss@yandex.com appears to be spam. If you believe this is a mistake, contact info@archive.org and include this entire message in your email. Strange, I can't upload acwiki.xyz (DokuWiki). Before I send an email, does anyone have a suggestion about this?
16:15:25<yzqzss|m>Oh, I accidentally clicked Enter and the message turned into a pile.( \n before "Strange, I can't ..." )
16:37:50Matthww1 quits [Client Quit]
16:37:50hitgrr8 quits [Client Quit]
16:37:50qwertyasdfuiopghjkl quits [Client Quit]
16:37:57hitgrr8 joins
16:38:37Matthww1 joins
16:41:01Matthww1 quits [Client Quit]
16:41:51Matthww1 joins
16:44:49Matthww1 quits [Client Quit]
16:45:38Matthww1 joins
16:57:52<pokechu22>yzqzss|m: I think there's a limit of 10 uploads per day initially, which info@archive.org can resolve (though that might be separate from their spam check)
17:05:25kdqep quits [Ping timeout: 252 seconds]
17:50:24Sanqui_ joins
17:50:43@Sanqui quits [Read error: Connection reset by peer]
17:56:55Sanqui_ quits [Changing host]
17:56:55Sanqui_ (Sanqui) joins
17:56:55@ChanServ sets mode: +o Sanqui_
17:57:03@Sanqui_ is now known as @Sanqui
20:00:15kdqep (kdqep) joins
20:25:27Matthww1 quits [Client Quit]
20:26:21Matthww1 joins
20:56:17Sir_Bedivere joins
20:57:02Sir_Bedivere quits [Remote host closed the connection]
20:57:25Sir_Bedivere joins
20:58:37Bedivere quits [Ping timeout: 252 seconds]
21:38:30Kuatrero joins
21:42:51Sir_Bedivere quits [Ping timeout: 265 seconds]
21:46:46hitgrr8 quits [Client Quit]
21:47:17Gooshka joins
21:47:32<Gooshka>https://database.factgrid.de/wiki/Main_Page - wiki for historians
21:47:44<Gooshka>It is based on wikibase
21:57:21Gooshka quits [Ping timeout: 265 seconds]
22:05:43TheTechRobo quits [Quit: bye]
22:07:49TheTechRobo (TheTechRobo) joins
22:12:56TheTechRobo quits [Remote host closed the connection]
23:28:28Iki joins
23:28:57Iki1 joins
23:32:48Iki quits [Ping timeout: 252 seconds]
23:38:51TheTechRobo (TheTechRobo) joins