00:02:02 | | hitgrr8_ quits [Client Quit] |
00:11:08 | | Sanqui_ joins |
00:13:58 | | @Sanqui quits [Ping timeout: 252 seconds] |
01:20:25 | | Sanqui_ quits [Client Quit] |
01:23:11 | | Sanqui joins |
01:23:13 | | Sanqui is now authenticated as Sanqui |
01:23:13 | | Sanqui quits [Changing host] |
01:23:13 | | Sanqui (Sanqui) joins |
01:23:13 | | @ChanServ sets mode: +o Sanqui |
02:10:07 | <michaelblob_> | https://archive.org/details/wiki-arwikishianet https://archive.org/details/wiki-eswikishianet https://archive.org/details/wiki-ruwikishianet https://archive.org/details/wiki-idwikishianet https://archive.org/details/wiki-dewikishianet https://archive.org/details/wiki-frwikishianet https://archive.org/details/wiki-bnwikishianet |
02:10:33 | <michaelblob_> | https://archive.org/details/wiki-tgwikishianet https://archive.org/details/wiki-azwikishianet https://archive.org/details/wiki-ptwikishianet https://archive.org/details/wiki-itwikishianet https://archive.org/details/wiki-hawikishianet https://archive.org/details/wiki-pswikishianet https://archive.org/details/wiki-thwikishianet |
02:10:47 | <michaelblob_> | https://archive.org/details/wiki-mywikishianet https://archive.org/details/wiki-commonswikishianet https://archive.org/details/wiki-urwikishianet |
02:12:06 | <michaelblob_> | got internal api error with {fa|en|zh|hi|sw}.wikishia.net apis, tr.wikishia.net downloaded xml fine but ran into incorrect header check when downloading the first couple images |
02:12:34 | <michaelblob_> | still running tcrf.net generasia.com and linkedopendata.eu |
02:17:37 | | michaelblob_ quits [Read error: Connection reset by peer] |
03:05:00 | <yzqzss|m> | https://archive.org/details/www.dokuwiki.org-20230323 |
03:05:00 | <yzqzss|m> | Can the wikidump for DokuWiki go into the WikiTeam collection? dokuWikiUploader is ready. :-D |
04:49:07 | | michaelblob (michaelblob) joins |
04:50:34 | <michaelblob> | yzqzss|m: would be nice if the identifier generation followed the same format as the wikiteam format for consistency |
05:06:54 | <yzqzss|m> | michaelblob: Do I just need to add the "wiki-" prefix? |
05:17:42 | <michaelblob> | see https://github.com/mediawiki-client-tools/mediawiki-scraper/blob/934161b6f9f987250ef656f66abeb35c97c203e2/wikiteam3/utils/domain.py#L5 |
05:19:21 | <michaelblob> | the actual identifier uses that prefix here to generate the wikiname https://github.com/mediawiki-client-tools/mediawiki-scraper/blob/934161b6f9f987250ef656f66abeb35c97c203e2/wikiteam3/uploader.py#L99 |
05:19:40 | <michaelblob> | then the identifier is just adding 'wiki-' in front https://github.com/mediawiki-client-tools/mediawiki-scraper/blob/934161b6f9f987250ef656f66abeb35c97c203e2/wikiteam3/uploader.py#L112 |
05:20:23 | <michaelblob> | i see you have the date appended by default, which i believe should be the proper way to differentiate dumps (see https://github.com/mediawiki-client-tools/mediawiki-scraper/blob/934161b6f9f987250ef656f66abeb35c97c203e2/wikiteam3/uploader.py#L115) |
05:21:27 | <michaelblob> | kinda just being picky here :) but it does keep things consistent across wikiteam-related projects |
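The identifier scheme described above ("wiki-" in front of the wiki name, with the dump date appended) can be sketched as follows. This is only an illustration of the format discussed in the channel, not the uploader's actual code; the function name `build_identifier` and its parameters are hypothetical.

```python
from datetime import date


def build_identifier(prefix: str, dump_date: date) -> str:
    """Hypothetical helper: 'wiki-' + wiki name + '-' + YYYYMMDD dump date."""
    return f"wiki-{prefix}-{dump_date:%Y%m%d}"


if __name__ == "__main__":
    # The prefix keeps its dots here, as the DokuWiki dumper does.
    print(build_identifier("acwiki.xyz", date(2023, 3, 26)))
```

Appending the date keeps repeated dumps of the same wiki from colliding on one identifier.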
05:24:43 | | vitzli (vitzli) joins |
05:27:02 | | vitzli quits [Client Quit] |
05:29:40 | | eroc1990 quits [Ping timeout: 252 seconds] |
06:06:24 | <yzqzss|m> | michaelblob: I know the process of wikiname and identifier. I just forgot to add "wiki-" as identifier prefix in dokuwikiuploader. lol |
06:06:31 | <yzqzss|m> | In the dokuwikidumper version of domain2prefix(), I didn't remove the dots "." from the domain name. This makes the wikidump's directory names more recognizable. (If this is not correct, I can just remove all the dots in the identifier) https://github.com/saveweb/dokuwiki-dumper/blob/93e7e32e7f20ea64c80f9d427660a2059bd1eb8a/dokuWikiDumper/utils/util.py#L100 |
06:06:56 | <yzqzss|m> | Other than that, most of the process is not too different from wikiteam/mw-scraper. (Since DokuWiki is relatively simple, I didn't design the launcher.py, so I put the 7z compression process in uploader.py) |
06:18:50 | <michaelblob> | there's not really a "correct" way to do it but i personally prefer to use only "-" and "_" in the names, like what wikiteam's naming convention currently uses |
06:20:30 | <michaelblob> | i would just remove the dots and keep the identifier format aligned with what wikiteam uses |
06:22:39 | <@JAA> | I find the wikiteam identifiers without dots very ugly. |
06:22:52 | <@JAA> | ¯\_(ツ)_/¯ |
06:23:42 | <yzqzss|m> | JAA: Haha, you said what I wanted to say. |
06:35:05 | <pokechu22> | Yeah, the ones without dots and with - replaced with _ are a bit annoying, especially when it introduces ambiguity |
07:04:01 | | hitgrr8 joins |
07:30:00 | <yzqzss|m> | For domains that are punycoded or unicoded, this is an absolute disaster. |
07:30:00 | <yzqzss|m> | "http://你好.信息/" -> "____" (all replaced!!!!) |
07:30:00 | <yzqzss|m> | "http://xn--6qq79v.xn--vuq861b/" -> "xn__6qq79vxn__vuq861b" |
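The two results quoted above can be reproduced with a short sketch. It assumes the prefix is built by lowercasing the URL, stripping the scheme and trailing slash, turning "/" into "_", deleting dots, and replacing every remaining character outside a-z and 0-9 with "_". These steps are inferred from the outputs in the chat; the name `sketch_prefix` is hypothetical and this is not the wikiteam implementation itself.

```python
import re


def sketch_prefix(url: str) -> str:
    """Hypothetical approximation of a dot-stripping domain-to-prefix rule."""
    domain = url.lower()
    domain = re.sub(r"^https?://", "", domain)  # drop the scheme
    domain = domain.rstrip("/")                 # drop trailing slashes
    domain = domain.replace("/", "_")           # path separators become "_"
    domain = domain.replace(".", "")            # dots are deleted outright
    return re.sub(r"[^a-z0-9]", "_", domain)    # everything else becomes "_"


if __name__ == "__main__":
    print(sketch_prefix("http://你好.信息/"))                 # ____
    print(sketch_prefix("http://xn--6qq79v.xn--vuq861b/"))  # xn__6qq79vxn__vuq861b
    # Deleting dots also makes distinct hosts collide:
    print(sketch_prefix("http://wiki.example.org/") == sketch_prefix("http://wikiexample.org/"))
```

Under these assumptions a Unicode domain loses all of its characters, and hosts that differ only in where their dots sit map to the same prefix, which is the ambiguity mentioned earlier in the discussion.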
11:06:49 | | Sanqui|m is now authenticated as Sanqui |
11:06:49 | | Sanqui|m quits [Changing host] |
11:06:49 | | Sanqui|m (Sanqui) joins |
11:06:49 | | @ChanServ sets mode: +o Sanqui|m |
14:03:46 | | Matthww1 quits [Quit: Ping timeout (120 seconds)] |
14:07:10 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
14:10:13 | <michaelblob> | fair enough, i suppose we could rework the identifier generation for mediawiki while we're at it then |
14:14:51 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
14:16:53 | | Matthww1 joins |
15:39:43 | | Bedivere joins |
15:41:49 | | Sir_Bedivere quits [Ping timeout: 252 seconds] |
16:11:25 | <yzqzss|m> | requests.exceptions.HTTPError: error uploading acwiki.xyz-20230326-dumpMeta/config.json to wiki-acwiki.xyz-20230326, Please reduce your request rate. - Your upload of wiki-acwiki.xyz-20230326 from username yzqzss@yandex.com appears to be spam. If you believe this is a mistake, contact info@archive.org and include this entire message in your email. |
16:11:25 | <yzqzss|m> | Strange, I can't upload acwiki.xyz (DokuWiki). Before I send an email, does anyone have a suggestion about this? |
16:15:25 | <yzqzss|m> | Oh, I accidentally clicked Enter and the message turned into a pile.( \n before "Strange, I can't ..." ) |
16:37:50 | | Matthww1 quits [Client Quit] |
16:37:50 | | hitgrr8 quits [Client Quit] |
16:37:50 | | qwertyasdfuiopghjkl quits [Client Quit] |
16:37:57 | | hitgrr8 joins |
16:38:37 | | Matthww1 joins |
16:41:01 | | Matthww1 quits [Client Quit] |
16:41:51 | | Matthww1 joins |
16:44:49 | | Matthww1 quits [Client Quit] |
16:45:38 | | Matthww1 joins |
16:57:52 | <pokechu22> | yzqzss|m: I think there's a limit of 10 uploads per day initially, which info@archive.org can resolve (though that might be separate from their spam check) |
17:05:25 | | kdqep quits [Ping timeout: 252 seconds] |
17:50:24 | | Sanqui_ joins |
17:50:43 | | @Sanqui quits [Read error: Connection reset by peer] |
17:56:55 | | Sanqui_ is now authenticated as Sanqui |
17:56:55 | | Sanqui_ quits [Changing host] |
17:56:55 | | Sanqui_ (Sanqui) joins |
17:56:55 | | @ChanServ sets mode: +o Sanqui_ |
17:57:03 | | @Sanqui_ is now known as @Sanqui |
20:00:15 | | kdqep (kdqep) joins |
20:25:27 | | Matthww1 quits [Client Quit] |
20:26:21 | | Matthww1 joins |
20:56:17 | | Sir_Bedivere joins |
20:57:02 | | Sir_Bedivere quits [Remote host closed the connection] |
20:57:25 | | Sir_Bedivere joins |
20:58:37 | | Bedivere quits [Ping timeout: 252 seconds] |
21:38:30 | | Kuatrero joins |
21:42:51 | | Sir_Bedivere quits [Ping timeout: 265 seconds] |
21:46:46 | | hitgrr8 quits [Client Quit] |
21:47:17 | | Gooshka joins |
21:47:32 | <Gooshka> | https://database.factgrid.de/wiki/Main_Page - wiki for historians |
21:47:44 | <Gooshka> | It is based on wikibase |
21:57:21 | | Gooshka quits [Ping timeout: 265 seconds] |
22:05:43 | | TheTechRobo quits [Quit: bye] |
22:07:49 | | TheTechRobo (TheTechRobo) joins |
22:12:56 | | TheTechRobo quits [Remote host closed the connection] |
23:28:28 | | Iki joins |
23:28:57 | | Iki1 joins |
23:32:48 | | Iki quits [Ping timeout: 252 seconds] |
23:38:51 | | TheTechRobo (TheTechRobo) joins |