01:02:43 | <pabs> | cloudflare turnstile on https://akvopedia.org/ |
01:07:27 | <pabs> | 403 in wikibot and with curl locally https://www.appropedia.org/ |
01:08:37 | <pabs> | hmm, not TLS fingerprinting though |
01:09:35 | <BlankEclair> | trying to dump, but also trying to figure out the http cookie file format |
01:10:27 | <BlankEclair> | oh i forgot expiry |
01:13:32 | <pabs> | BlankEclair: these curl params seem to be needed: -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:128.0) Gecko/20100101 Firefox/128.0' -H 'Pragma: no-cache' |
01:13:55 | <BlankEclair> | for cf_clearance, you need to send the cookie (duh), same ip, and same user agent |
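
The cookie file being discussed is presumably the Netscape format that curl reads with `-b`: one cookie per line, seven tab-separated fields, with the expiry as the fifth. A minimal sketch of writing a `cf_clearance` entry is below; the helper names, the placeholder value, and the expiry timestamp are hypothetical, and the cookie is still only accepted from the same IP and with the same User-Agent that solved the challenge.

```python
def netscape_cookie_line(domain, name, value, expiry, path="/", secure=True):
    """Return one line of a Netscape-format cookie file (7 tab-separated fields)."""
    # Second field: whether subdomains also receive the cookie.
    include_subdomains = "TRUE" if domain.startswith(".") else "FALSE"
    return "\t".join([
        domain,
        include_subdomains,
        path,
        "TRUE" if secure else "FALSE",
        str(int(expiry)),  # Unix timestamp; the field that is easy to forget
        name,
        value,
    ])


def cookie_file(lines):
    """Join cookie lines under the conventional header comment."""
    return "# Netscape HTTP Cookie File\n" + "\n".join(lines) + "\n"


# Placeholder value and expiry, for illustration only.
line = netscape_cookie_line(".akvopedia.org", "cf_clearance", "PLACEHOLDER", 1786000000)
text = cookie_file([line])
```

The resulting text can be saved to a file and passed to curl with `-b`, alongside a matching `-H 'User-Agent: ...'`.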
01:14:14 | <BlankEclair> | now dumping |
01:25:29 | | pabs quits [Ping timeout: 260 seconds] |
01:36:05 | | pabs (pabs) joins |
01:40:39 | | davispuh quits [Ping timeout: 260 seconds] |
01:43:44 | <pabs> | the params were for https://www.appropedia.org/ |
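
For reference, the two headers pabs found necessary for appropedia.org can be expressed as a Python request object. This is only a sketch of the equivalent of the curl invocation; the `build_request` helper is hypothetical, and whether the site keeps accepting these headers is not guaranteed.

```python
import urllib.request

# Header values copied from the curl parameters quoted in the log.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Pragma": "no-cache",
}


def build_request(url):
    """Build (but do not send) a GET request carrying both headers."""
    return urllib.request.Request(url, headers=HEADERS)


req = build_request("https://www.appropedia.org/")
# urllib.request.urlopen(req) would actually send it.
```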
01:48:18 | <BlankEclair> | https://archive.org/details/wiki-akvopedia.org_s_wiki-20250806 |
01:48:21 | <BlankEclair> | ah okay |
01:51:43 | | pabs needs to get curlmin setup... |
01:52:19 | | nepeat quits [Ping timeout: 260 seconds] |
01:57:54 | | nepeat (nepeat) joins |
02:25:46 | <pabs> | cool, curlmin works great |
02:25:50 | <pabs> | curlmin++ |
02:25:51 | <eggdrop> | [karma] 'curlmin' now has 1 karma! |
02:25:57 | <pabs> | dh-make-golang++ |
02:25:58 | <eggdrop> | [karma] 'dh-make-golang' now has 1 karma! |
02:26:27 | <pabs> | hmm, curlmin does need a mode for using args instead of stdin or a string... |
02:28:14 | <pabs> | time to learn Golang :/ |
04:46:22 | | katia leaves |
04:56:24 | | katia (katia) joins |
06:10:39 | | Matthww quits [Quit: The Lounge - https://thelounge.chat] |
08:26:04 | <@arkiver> | i remember talking about this before, but can't find it in my logs |
08:26:42 | <@arkiver> | is there a way to list the domains used for hosting images for a wiki? |
08:26:48 | <@arkiver> | i don't see it in the siteinfo API data |
08:28:31 | <@arkiver> | i do see there is an "externalimages" field in the siteinfo data |
08:39:55 | <@arkiver> | DigitalDragon: i'm just going to dump some more questions in here |
08:40:33 | <@arkiver> | are there cases of pages that have an API page like https://howtotrainyourdragon.fandom.com/api.php?action=query&prop=info&inprop=url&titles=Toothless_(Franchise) but no HTML rendered version of that page? |
08:42:27 | <@arkiver> | i also hope to have a look at the API responses for every _type_ of wiki supported by the wikiteam tools (and perhaps some that are not supported?) |
08:44:51 | <@arkiver> | are there cases of wikis that do not have their API publicly accessible? |
09:08:52 | | davispuh joins |
09:15:04 | | davispuh quits [Ping timeout: 260 seconds] |
11:22:18 | | Matthww joins |
13:59:18 | <DigitalDragons> | for images: yes, see https://mediawiki.org/wiki/API:Filerepoinfo ex. https://howtotrainyourdragon.fandom.com/api.php?action=query&meta=filerepoinfo |
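
A rough sketch of how the filerepoinfo answer could be turned into a list of image domains follows. The helper names and the sample response are hypothetical; the `url`, `thumbUrl`, and `rootUrl` keys are assumed from the API:Filerepoinfo documentation, and a real wiki may return a different subset of them.

```python
from urllib.parse import urlencode, urlsplit


def filerepoinfo_url(api_url):
    """Build the meta=filerepoinfo query URL for a wiki's api.php."""
    params = {"action": "query", "meta": "filerepoinfo", "format": "json"}
    return api_url + "?" + urlencode(params)


def image_hosts(response):
    """Collect hostnames from the repo URLs in a decoded filerepoinfo response."""
    hosts = set()
    for repo in response.get("query", {}).get("repos", []):
        for key in ("url", "thumbUrl", "rootUrl"):
            value = repo.get(key)
            if not value:
                continue
            # Relative paths such as "/images" have no hostname and are skipped.
            host = urlsplit(value).hostname
            if host:
                hosts.add(host)
    return sorted(hosts)


# Hypothetical response shape, not captured from a real wiki.
sample = {"query": {"repos": [
    {"name": "local", "url": "https://static.example.org/wiki/images",
     "thumbUrl": "//thumbs.example.org/wiki/images/thumb"},
    {"name": "shared", "rootUrl": "https://commons.example.net", "url": "/images"},
]}}
hosts = image_hosts(sample)
```

Since the repo list is per wiki rather than per page, fetching it once per wiki and reusing the result would keep the request count low.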
14:00:54 | <@arkiver> | DigitalDragons: tank you |
14:00:56 | <@arkiver> | thank* |
14:30:52 | <DigitalDragons> | I've seen examples of "pages" on Fandom that redirect to non-mediawiki discussion posts, and also some wikis in other places with "private" pages that still appear in the API |
14:31:23 | <DigitalDragons> | Some wikis do have apis disabled yes |
14:34:59 | <@arkiver> | DigitalDragons: do you think it would be problematic for sites if we load the filerepoinfo page for each wiki page? |
14:35:26 | <@arkiver> | we could also not do it, but it'll be a bit less clean |
14:51:47 | <@arkiver> | for discovery, this will only discover in-wiki links |
14:58:54 | | leo60228 quits [Quit: ZNC 1.8.2 - https://znc.in] |
14:59:44 | | leo60228 (leo60228) joins |
15:10:57 | <DigitalDragons> | would it be substantially less clean? i'm not sure how expensive that call specifically is, but it's probably good to keep requests lower |
15:13:25 | <DigitalDragons> | we've had Fandom pop into #wikibot and ask us to chill for scraping more than one or two wikis at the same time |
18:58:36 | | DogsRNice joins |
19:11:40 | | DogsRNice quits [Client Quit] |
20:41:06 | | taavi quits [Remote host closed the connection] |
20:42:53 | | taavi (taavi) joins |
21:08:43 | | nulldata-alt (nulldata) joins |
21:41:27 | | useretail joins |
22:39:28 | | TheTechRobo quits [Quit: Ping timeout (120 seconds)] |
22:42:58 | | TheTechRobo (TheTechRobo) joins |
23:09:49 | | nulldata-alt quits [Ping timeout: 260 seconds] |