| 00:04:15 | | wyatt8740 joins |
| 00:45:04 | | britmob quits [Read error: Connection reset by peer] |
| 00:54:26 | | MaxG-1 quits [Remote host closed the connection] |
| 00:55:19 | | britmob joins |
| 01:02:20 | | dm4v quits [Client Quit] |
| 01:03:05 | | dm4v joins |
| 01:03:07 | | dm4v is now authenticated as dm4v |
| 01:03:07 | | dm4v quits [Changing host] |
| 01:03:07 | | dm4v (dm4v) joins |
| 01:30:04 | | britmob quits [Read error: Connection reset by peer] |
| 01:42:30 | | britmob joins |
| 01:44:23 | | Edsavoie_srv quits [Ping timeout: 244 seconds] |
| 02:01:55 | | Iki quits [Read error: Connection reset by peer] |
| 02:57:45 | | britmob quits [Read error: Connection reset by peer] |
| 02:59:34 | | ThreeHM quits [Ping timeout: 250 seconds] |
| 03:01:35 | | ThreeHM (ThreeHeadedMonkey) joins |
| 03:05:42 | <atphoenix> | this may or may not be at risk: https://viking.tv/ . To my knowledge it originated as part of Viking's response to the covid pandemic, and was regularly featured during pre-show ad rolls on PBS Masterpiece episodes. It has been replaced with a new pre-show ad roll. |
| 03:22:24 | | Krownest quits [Read error: Connection reset by peer] |
| 03:25:11 | | Krownest (Krownest) joins |
| 03:27:31 | | teej (teej) joins |
| 03:40:02 | | teej quits [Client Quit] |
| 03:52:11 | | qw3rty_ joins |
| 03:55:54 | | qw3rty__ quits [Ping timeout: 250 seconds] |
| 03:58:30 | | DogsRNice quits [Read error: Connection reset by peer] |
| 04:21:08 | | save_fn joins |
| 04:21:53 | | save_fn quits [Client Quit] |
| 04:22:02 | | AntiLiberal joins |
| 04:23:23 | | CrasherTN joins |
| 04:24:01 | | CrasherTN quits [Remote host closed the connection] |
| 04:24:13 | | AntiLiberal is now known as save_fn |
| 05:12:21 | | rsn quits [Ping timeout: 258 seconds] |
| 05:38:06 | | forkwhilefork quits [Quit: The Lounge - https://irc.rekt.app] |
| 06:13:03 | | nertzy_ joins |
| 06:14:08 | | nertzy__ quits [Ping timeout: 250 seconds] |
| 06:50:07 | | sec^nd quits [Ping timeout: 255 seconds] |
| 06:50:49 | | sec^nd (second) joins |
| 07:43:07 | | HP_Archivist (HP_Archivist) joins |
| 07:45:36 | | wyatt8750 joins |
| 07:46:00 | | wyatt8740 quits [Ping timeout: 250 seconds] |
| 07:54:26 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 09:44:33 | | wizards_ joins |
| 09:47:20 | | wizards quits [Ping timeout: 251 seconds] |
| 09:54:01 | | LeGoupil joins |
| 11:59:19 | | LeGoupil quits [Client Quit] |
| 12:39:55 | | yano quits [Quit: WeeChat, the better IRC client, https://weechat.org/] |
| 12:40:03 | | yanome quits [Quit: The Lounge - https://thelounge.chat] |
| 12:40:14 | | ThreeHM quits [Ping timeout: 250 seconds] |
| 12:41:12 | | yanome (yano) joins |
| 12:41:49 | | yano (yano) joins |
| 12:46:59 | | ThreeHM (ThreeHeadedMonkey) joins |
| 12:50:40 | | KRG joins |
| 12:50:40 | | KRG is now authenticated as KRG |
| 13:17:18 | | KRG` joins |
| 13:17:39 | | KRG quits [Ping timeout: 258 seconds] |
| 13:18:59 | <jodizzle> | (from #archiveteam) https://hk.appledaily.com/ has been running in AB for a few days, and I've now added some other domains |
| 13:19:35 | <jodizzle> | One important detail from the reddit post: "Many articles contain videos, but youtube-dl doesn't seem to work. I'm out of ideas on how to get them." (I haven't verified this myself.) |
| 13:21:49 | | KRG` is now known as KRG |
| 13:22:06 | | KRG is now authenticated as KRG |
| 13:23:25 | <Jake> | The videos on the pages are just m3u8s from a JS script tag |
| 13:26:57 | | KRG quits [Remote host closed the connection] |
| 13:31:39 | | achivarin (achivarin) joins |
| 13:38:37 | | ats (ats) joins |
| 13:48:05 | <jodizzle> | Okay, thanks. Looks like it should be possible to script something to assemble the .ts files. |
| 13:49:37 | <@OrIdow6> | If it's normally set up, you should just be able to pass the m3u8s into ffmpeg |
| 13:52:22 | <@OrIdow6> | Well, give it the url |
| 13:54:41 | <Jake> | It is normally set up ;) |
| 13:56:34 | | KRG joins |
| 13:56:34 | | KRG is now authenticated as KRG |
| 13:57:30 | <jodizzle> | Trying it and it does seem to work. Thanks! |
| 13:58:27 | <jodizzle> | I usually like to get the raw files in AB as well, though, but that's not too much work. |
| 13:59:04 | <jodizzle> | One problem might be iterating all the articles. Sitemaps don't seem to be complete. |
| 13:59:32 | | Mateon1 quits [Ping timeout: 250 seconds] |
| 14:05:46 | <Jake> | (I also wonder if the videos on the site are also on YouTube?) |
| 14:08:24 | | ragu__ joins |
| 14:09:03 | <jodizzle> | Ah, maybe. |
| 14:11:02 | | Jonboy345 quits [Read error: Connection reset by peer] |
| 14:12:05 | | ragu_ quits [Ping timeout: 258 seconds] |
| 14:12:53 | | ragu_ joins |
| 14:14:00 | | ragu__ quits [Ping timeout: 258 seconds] |
| 14:14:42 | | Mateon1 joins |
| 14:14:54 | | ragu__ joins |
| 14:15:18 | | Jonboy345 joins |
| 14:18:59 | | ragu_ quits [Ping timeout: 258 seconds] |
| 14:22:54 | | Iki joins |
| 14:30:05 | | britmob joins |
| 14:41:31 | <@EggplantN> | hkdaily is AB job 2u3qbx8mpv42jxi76wq27fbb2 |
| 14:41:37 | <@EggplantN> | but it seems quite slow |
| 14:48:11 | <@EggplantN> | *cracks knuckles* |
| 14:48:15 | <@EggplantN> | sends IA 6Gbit >_> |
| 14:55:17 | <rewby> | Is this where we need to acquire some china telecom transit? |
| 14:56:55 | <@EggplantN> | i think it's just that AB isn't the tool for the job |
| 14:57:04 | <@EggplantN> | or needs some more fine tuning |
| 14:57:46 | <@OrIdow6> | If you want to go really fast, 2 options I know of are Qwarc, and hackish backfeed warrior recursion |
| 14:59:28 | <rewby> | qwarc would go fast. But we'd need somewhere with good china connectivity. That said, I agree AB isn't the right tool for the job |
| 14:59:56 | <@EggplantN> | rewby sir? |
| 15:00:00 | <@EggplantN> | its akamai? |
| 15:00:02 | <@EggplantN> | not CT |
| 15:00:14 | <rewby> | Oh it's akamai? |
| 15:00:18 | <rewby> | I thought it was hosted in china |
| 15:00:22 | <rewby> | Did I look up the wrong domain |
| 15:00:27 | <@EggplantN> | no, even if it was it would be HK |
| 15:00:31 | <@EggplantN> | and HK is weird |
| 15:00:38 | <rewby> | That's a fair point |
| 15:01:18 | <@EggplantN> | does it have a usable sitemap or a way to find all the articles |
| 15:01:20 | <rewby> | If it's akamai, then yeah throw qwarc at the problem. I personally don't have amazing throughput to them but I'm sure someone here has |
| 15:02:06 | <rewby> | It appears to have a sitemap https://hk.appledaily.com/robots.txt |
| 15:02:52 | <@OrIdow6> | curl 'https://hk.appledaily.com/sitemap002.xml' | grep loc | wc -l gets me 14664, scale seems reasonable |
| 15:03:28 | | Jonboy345 quits [Read error: Connection reset by peer] |
| 15:03:35 | <rewby> | OrIdow6: try the sitemap-index |
| 15:03:41 | <rewby> | It's got a ton of additional sitemaps listed |
| 15:04:17 | <@OrIdow6> | I noticed |
| 15:04:40 | <@OrIdow6> | You know, if we really want to panic grab |
| 15:04:55 | <@OrIdow6> | We can generate a URL list from the sitemaps and feed them into #// |
| 15:05:41 | <rewby> | I have scripts to do this... |
| 15:05:58 | <@OrIdow6> | To get the sitemaps? |
| 15:06:03 | | Jonboy345 joins |
| 15:06:05 | <rewby> | Yeah, to take sitemaps and turn them into urls |
| 15:06:16 | <rewby> | Including recursive sitemaps |
| 15:06:22 | <@OrIdow6> | Nice |
| 15:06:50 | <@OrIdow6> | Do we have an idea for the timescale for this? |
| 15:07:01 | <rewby> | It's not very fast, only single threaded. But give it like an hour or two and it'll extract the urls |
| 15:07:06 | <rewby> | *from a map this size ish |
| 15:07:09 | <@OrIdow6> | For the shutdown, I mean |
| 15:07:12 | <@OrIdow6> | Hm |
| 15:07:49 | <@OrIdow6> | Would it work to extract from the big list with grep, and then do the others in parallel from a bash script or something? |
| 15:08:06 | <@OrIdow6> | Cheap parallelization with &, I mean |
| 15:08:15 | <@EggplantN> | if you grabbed all onsite content via a warrior project + offsite links to #// ? |
| 15:08:21 | <@EggplantN> | we could scale up to like |
| 15:08:24 | <@EggplantN> | infinity |
| 15:08:31 | <@EggplantN> | have this done quick AF |
| 15:08:36 | <rewby> | OrIdow6: Not quite with grep. The format is a bit weird so you have to actually python parse it |
| 15:08:40 | <@OrIdow6> | Warrior projects still take a while to set up |
| 15:08:42 | <@OrIdow6> | rewby: Oh |
| 15:08:50 | <rewby> | I do actual xml parsing because spacing is not consistent |
| 15:08:58 | <rewby> | And sometimes there's multiple items on one line |
| 15:09:02 | <rewby> | And other times there's weird encodings |
| 15:09:11 | <rewby> | Better to just let lxml deal with it |
| 15:09:40 | <@OrIdow6> | EggplantN: Maybe do an initial pass of the sitemap with #//, and then set up something more complicated? |
| 15:09:58 | <rewby> | The problem with #// is that it selects items from the queue randomly, doesn't it? |
| 15:10:08 | <rewby> | so we wouldn't guarantee it actually finishes those urls in time |
| 15:10:11 | <@OrIdow6> | The last-~hour increase in people talking about this sounds to me like social media panic, but you never know |
| 15:10:47 | <@EggplantN> | uh |
| 15:10:55 | <@EggplantN> | rewby we can make it do it quickly-ish |
| 15:10:55 | <@OrIdow6> | I haven't been paying much attention to #// recently, is it at capacity? |
| 15:11:03 | <@EggplantN> | no but its not in an amazing state |
| 15:11:26 | <rewby> | Can we just duplicate the urls project code and run it on its own tracker with just these urls? |
| 15:11:32 | <rewby> | That's an "easy" way to scale to the moon quickly |
| 15:11:45 | <achivarin> | Hi. For the sitemap idea, check out my reddit thread for more information: https://www.reddit.com/r/DataHoarder/comments/o4r4jv/help_wanted_hong_kongs_prodemocracy_newspaper_in/ |
| 15:12:31 | <rewby> | Oh hm paywall. |
| 15:12:38 | <rewby> | Well, we can just put in the right cookie and that's fine |
| 15:13:01 | <@EggplantN> | oh |
| 15:13:06 | <@EggplantN> | i wonder if thats fucked AB over |
| 15:13:18 | <rewby> | I think it depends on whether you have the cookie or not? |
| 15:13:21 | <rewby> | Not sure |
| 15:13:39 | <@EggplantN> | it is a 200 response code |
| 15:13:49 | <rewby> | Oh I hate websites that do this |
| 15:15:22 | <achivarin> | Maybe try the googlebot user agent? It's also mentioned in the reddit thread. |
| 15:15:31 | <@EggplantN> | yeah that exact one didnt work for me |
| 15:15:37 | <@EggplantN> | plus it looks like a JS based paywall |
| 15:16:09 | <@OrIdow6> | So the correct text still gets sent in the response? |
| 15:16:28 | <rewby> | EggplantN: If I have adblock turned on I don't get paywalled |
| 15:16:36 | <@EggplantN> | i have adblock |
| 15:17:03 | <@OrIdow6> | Yeah, browsing around with Noscript gets me the right material |
| 15:17:28 | <achivarin> | Strange. In my testing I found that reading mode in Firefox just bypasses the wall, and if you go into developer console to delete the paywall box you can read the whole article. |
| 15:17:41 | <rewby> | Yeah so as long as we don't store cookies we should be fine |
| 15:17:46 | <@OrIdow6> | achivarin: Yes, that's what would be expected from a JS paywall |
| 15:17:55 | <rewby> | Heck, I think most of our tools ignore JS anyway |
| 15:18:10 | <achivarin> | Oh my bad |
| 15:19:07 | <@arkiver> | i believe JAA is able to archive lists of URLs at high speed into WARCs |
| 15:19:18 | <@OrIdow6> | Looks like the cookie may be set through JS anyway |
| 15:19:25 | | Jonboy345 quits [Read error: Connection reset by peer] |
| 15:19:35 | <@OrIdow6> | Though I didn't really check that thoroughly |
| 15:20:23 | <@OrIdow6> | arkiver: We were talking about doing it with #// |
| 15:20:30 | <@arkiver> | right |
| 15:20:37 | <@OrIdow6> | For a rough pass in case it shuts down next few hrs |
| 15:20:48 | <@arkiver> | how do they check if you read the first article without paywall? |
| 15:20:51 | <@arkiver> | do they set a cookie? |
| 15:21:13 | <@EggplantN> | yes but the paywall is JS based anyway |
| 15:21:24 | <@arkiver> | right so content is there anyway |
| 15:21:31 | <@arkiver> | in that case - what is the cookie talk about/ |
| 15:21:31 | <@arkiver> | ? |
| 15:22:04 | <rewby> | I'm enumerating the urls, FYI |
| 15:22:09 | <rewby> | From the sitemap, at least |
| 15:22:09 | <@arkiver> | rewby: thanks |
| 15:22:13 | <@OrIdow6> | Well, first we had to figure out if it was JS-based |
| 15:22:21 | | Jonboy345 joins |
| 15:22:24 | <@OrIdow6> | But now it's just been reduced to a playback concern |
| 15:22:27 | <@arkiver> | rewby: make sure to get the sitemap URLs themselves as well :) |
| 15:22:37 | | Arcorann quits [Ping timeout: 258 seconds] |
| 15:22:55 | <rewby> | arkiver: Working on it! |
| 15:24:33 | <rewby> | There's a lot of submaps |
| 15:24:41 | <rewby> | So my scripts are having to send a ton of requests |
| 15:30:25 | <achivarin> | After this fire is hopefully put out, other independent media outlets are in the crosshairs too: https://thestandnews.com/ https://www.hkcnews.com/ https://hongkongfp.com/ |
| 15:30:42 | <achivarin> | And they also have large YouTube channels |
| 15:33:56 | <@JAA> | arkiver: qwarc isn't currently very good at archiving lists of URLs because it quickly gets bogged down by the DB locking. Can't go much beyond a couple thousand items per minute or so, and one item per URL would be the most reasonable approach with URL lists. Can be worked around though. |
| 15:34:05 | <rewby> | If these urls are structured the way I'm expecting, it's somewhere in 2008's sitemaps, working its way up to today |
| 15:34:17 | <rewby> | Doing about a month every 4 seconds |
| 15:34:53 | | Mateon1 quits [Ping timeout: 258 seconds] |
| 15:38:42 | | Mateon1 joins |
| 15:41:38 | <@EggplantN> | is there much documentation on qwarc |
| 15:42:13 | <rewby> | I'm midway through 2016 with extracting things |
| 15:42:29 | <@EggplantN> | just extracting for now? |
| 15:42:36 | <rewby> | Just extracting URLs from sitemaps |
| 15:42:39 | <@JAA> | EggplantN: How much is zero? |
| 15:43:58 | <@EggplantN> | ok so everything i've made |
| 15:54:29 | | missmega joins |
| 15:54:32 | <missmega> | Yo |
| 15:54:40 | <missmega> | So the megaupload archive was a failure? |
| 15:55:09 | <missmega> | I'm trying to find some content linked in this video at this moment: https://youtu.be/fD7X9SCn0To?t=398 It's all megaupload links. Sadly I can't find it with archive.org |
| 15:55:35 | <Jake> | https://wiki.archiveteam.org/index.php/MegaUpload The status is listed as Lost, so I imagine we didn't do a project. |
| 15:56:05 | <missmega> | That really sucks |
| 15:56:07 | <missmega> | :| |
| 16:00:19 | | ragu_ joins |
| 16:03:28 | <missmega> | OrIdow6 |
| 16:03:30 | <missmega> | I am here |
| 16:04:01 | | ragu__ quits [Ping timeout: 258 seconds] |
| 16:04:12 | <@OrIdow6> | Oh |
| 16:04:24 | | ragu__ joins |
| 16:05:09 | <@OrIdow6> | I didn't notice you were the same person |
| 16:05:10 | | ragu_ quits [Ping timeout: 258 seconds] |
| 16:06:58 | <@OrIdow6> | Sorry |
| 16:08:09 | <@OrIdow6> | EggplantN: "everything I've made"? |
| 16:13:45 | | Jonboy345 quits [Remote host closed the connection] |
| 16:14:01 | | Jonboy345 joins |
| 16:20:34 | <Jake> | I believe it was a joke on a lack of documentation on what he codes |
| 16:23:02 | <rewby> | OrIdow6, EggplantN, arkiver: I've parsed the hk.appledaily.com sitemaps. Here's the urls extracted: https://transfer.archivete.am/fzhO1/sorted_urls.txt and here's the urls of the sitemaps I pulled them from: https://transfer.archivete.am/14ZEQ8/sitemaps.txt |
| 16:23:13 | <rewby> | I've got 3221709 urls from that |
| 16:23:16 | <rewby> | That's quite a lot |
| 16:28:42 | <@EggplantN> | rewby feeding into urls now |
| 16:30:31 | <@EggplantN> | backfeed go brrrrr |
| 16:31:18 | <@JAA> | It looks like the AB job for AppleDaily will be incomplete. I'm seeing countless parsing warnings in the log, for example: 2021-06-20 21:53:40,365 - wpull.scraper.html - WARNING - Failed to read document at ‘https://hk.appledaily.com/racing/20190505/JX6MZ2JBWZR4BXTPPXLS6A4DLQ/’: 'utf-8' codec can't decode byte 0xe8 in position 46: unexpected end of data |
| 16:31:38 | <@EggplantN> | yeah we're doing an emergency via #// |
| 16:31:49 | <rewby> | EggplantN: Are you somehow giving the urls priority over the rest of the queue? |
| 16:31:52 | <@EggplantN> | nope |
| 16:32:03 | <@EggplantN> | i'm just gonna scale up instead |
| 16:32:39 | <rewby> | Mkay |
| 16:32:42 | <rewby> | Good luck |
| 16:32:44 | <@EggplantN> | hrm |
| 16:32:45 | <@EggplantN> | okay |
| 16:32:49 | <@EggplantN> | i've found an issue perhaps |
| 16:32:57 | <rewby> | Oh no |
| 16:33:40 | <@JAA> | 502 |
| 16:33:44 | <@JAA> | ? |
| 16:33:48 | <@EggplantN> | nah |
| 16:33:53 | <@EggplantN> | its backfeed related |
| 16:34:14 | <@JAA> | Ah, AB Job started 502ing a bit the same moment you said you'd start queueing. lol |
| 16:34:35 | <rewby> | Are you hug-of-death-ing it Eggplant |
| 16:34:56 | <@JAA> | Yay, hugs. |
| 16:35:27 | <@JAA> | But yeah, let's try not to murder it. |
| 16:35:39 | <@EggplantN> | FYI |
| 16:35:49 | <@EggplantN> | i've also removed the :maxtries from #// |
| 16:35:57 | <@EggplantN> | until i can verify we've grabbed everything |
| 16:37:34 | <missmega> | So |
| 16:38:03 | <missmega> | megaupload's data is gone |
| 16:38:17 | <@EggplantN> | link? |
| 16:39:10 | <@arkiver> | rewby: does this include embedded images? |
| 16:40:22 | <Jake> | missmega: I believe so, yes. Unless someone out there has a copy. |
| 16:40:44 | <@arkiver> | from what i see no embedded images from pages |
| 16:41:04 | <@arkiver> | EggplantN: i can quickly turn on getting embedded images |
| 16:41:12 | <@EggplantN> | sure if you want |
| 16:41:24 | <@EggplantN> | i forgot that was enabled now |
| 16:41:29 | <@EggplantN> | 🤦 |
| 16:41:37 | <rewby> | arkiver: No, I think it's just links to stories. |
| 16:42:15 | <rewby> | Also, I'm seeing 502s while I'm trying to tweak my scripts. So we're going a smidge too fast for them I think |
| 16:42:23 | <@arkiver> | EggplantN: its not enabled now |
| 16:42:28 | <@arkiver> | i'm enabling it now |
| 16:42:30 | <@EggplantN> | sorry *supported |
| 16:42:34 | <@EggplantN> | wrong word |
| 16:44:56 | <@arkiver> | this site doesnt embed images like most sites do |
| 16:45:12 | <@arkiver> | Wget-AT cant easily extract them (without us parsing the HTML) |
| 16:45:22 | <@arkiver> | that is custom extraction |
| 16:46:16 | <@arkiver> | let's just finish this run now |
| 16:46:25 | <@arkiver> | we can maybe do a second run later to get the images |
| 16:48:14 | <@arkiver> | that would mean duplicating HTML pages, but thats fine with me |
| 17:05:14 | <@EggplantN> | yeah that was my thinking |
| 17:05:17 | <@EggplantN> | lets just get what we can |
| 17:08:59 | <rewby> | arkiver: In case it's useful to you, I put my sitemap scraper on github. https://github.com/rewbycraft/sitemap-enumerator |
| 17:09:06 | <rewby> | It's mostly a wrapper/partial reimplementation of a library |
| 17:09:10 | <rewby> | But this one can go nyooom |
| 17:13:46 | | britm0b joins |
| 17:15:32 | | Webuser431 joins |
| 17:15:42 | | britmob quits [Ping timeout: 258 seconds] |
| 17:19:15 | <Ryz> | On the subject of looking out for other Hong Kong journalist/news websites, https://en.wikipedia.org/wiki/List_of_newspapers_in_Hong_Kong could be a good starting point |
| 17:24:45 | <@JAA> | Looks like en.appledaily.com needs some work as well. It was run through AB twice (once a few days ago, once today), but those jobs finished surprisingly quickly. It has infinite scrolling, and the sitemap only seems to cover the past month. :-/ |
| 17:25:16 | | Daloader joins |
| 17:32:11 | <achivarin> | Ryz: You have the right idea but most papers on that list are pro-Beijing rags. Reposting what I said above: |
| 17:32:16 | <achivarin> | After this fire is hopefully put out, other independent media outlets are in the crosshairs too: https://thestandnews.com/ https://www.hkcnews.com/ https://hongkongfp.com/ And they also have large YouTube channels |
| 18:29:16 | | CZ joins |
| 18:36:01 | | CZ quits [Remote host closed the connection] |
| 18:36:23 | | Cz joins |
| 19:10:25 | | Jonboy345 quits [Read error: Connection reset by peer] |
| 19:13:46 | | Jonboy345 joins |
| 19:16:27 | | Jonboy345 quits [Read error: Connection reset by peer] |
| 19:18:52 | | Jonboy345 joins |
| 19:20:12 | | Daloader quits [Ping timeout: 250 seconds] |
| 19:29:14 | | missmega quits [Ping timeout: 244 seconds] |
| 19:30:10 | | Cz quits [Remote host closed the connection] |
| 19:34:12 | | Jonboy3451 joins |
| 19:34:12 | | Jonboy345 quits [Read error: Connection reset by peer] |
| 19:51:52 | | AlsoHP_Archivist joins |
| 19:51:52 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 20:03:48 | | lennier1 quits [Quit: Going offline, see ya! (www.adiirc.com)] |
| 20:04:49 | | lennier1 (lennier1) joins |
| 20:29:58 | | ThreeHM quits [Ping timeout: 250 seconds] |
| 20:33:16 | | ThreeHM (ThreeHeadedMonkey) joins |
| 20:47:44 | | lunik1 quits [Ping timeout: 250 seconds] |
| 21:05:30 | | lunik1 joins |
| 21:18:06 | | EdSavoie joins |
| 21:20:42 | | AlsoHP_Archivist quits [Read error: Connection reset by peer] |
| 21:21:10 | | AlsoHP_Archivist joins |
| 21:38:46 | <@JAA> | Bethesda forum and mod comment API data retrieval is running. Specifically, I'm fetching /community/api/topic/TOPICID + /community/api/topic/SLUG?page=PAGE for topics and https://api.bethesda.net/mods/ugc-workshop/content/get?content_id=MODID + /community/comments/get/mods/mods_MODID/0 (and .../1 etc. for pagination) for mod comments. /community/* URLs are on bethesda.net. |
| 21:39:35 | <@JAA> | The list of mods also comes from the API, namely https://api.bethesda.net/mods/ugc-workshop/list/?number_results=20&order=desc&page=PAGE&platform=&product=&sort=published&text= |
| 21:49:55 | | @EggplantN quits [Quit: Ping timeout (120 seconds)] |
| 21:50:15 | | EggplantN joins |
| 21:53:14 | <@JAA> | Uh, looks like I'm getting zero mod comments. Maybe they already disabled that part without another announcement. :-( |
| 21:53:41 | <@JAA> | https://bethesda.net/en/mods/fallout4/mod-detail/911793 had a lot of comments as of a month ago, for example. |
| 22:37:14 | | DogsRNice (Webuser299) joins |
| 22:42:44 | | EggplantN is now authenticated as EggplantN |
| 22:42:44 | | EggplantN quits [Changing host] |
| 22:42:44 | | EggplantN (EggplantN) joins |
| 22:42:44 | | @ChanServ sets mode: +o EggplantN |
| 22:44:09 | <@EggplantN> | when tf did my connection die |
| 22:44:23 | <@JAA> | 21:49:55 |
| 22:44:36 | <@EggplantN> | is it 22:44 now for you |
| 22:44:47 | <@JAA> | One time zone to rule them all (UTC). Yes |
| 22:45:02 | <@EggplantN> | that |
| 22:45:05 | <@EggplantN> | kinda doesnt make sense |
| 22:45:07 | <@EggplantN> | but ok |
| 22:45:10 | | Muad_Dib quits [Ping timeout: 250 seconds] |
| 22:55:36 | | Muad-Dib joins |
| 23:21:47 | <@JAA> | The forums part of the Bethesda crawl is done. I'm sending them an email about the mod comments. |
| 23:23:40 | | AlsoHP_Archivist quits [Client Quit] |
| 23:23:56 | | HP_Archivist (HP_Archivist) joins |
| 23:42:40 | | BlueMaxima joins |
| 23:52:05 | | Specular joins |
| 23:53:04 | | Arcorann (Arcorann) joins |
| 23:55:21 | | Specular quits [Client Quit] |
| 23:55:39 | | nerdguy1138 quits [Quit: Leaving.] |
| 23:58:25 | <@JAA> | Bethesda forum stats in my WARC: 297121 topics, 363778 topic pages, 2851308 posts. The topic numbers on the forum homepage add up to 299359; close enough. Post IDs go to 3.3 million, but there are plenty of deleted topics (IDs go to just over 456789), so that sounds good enough as well. |
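
The sitemap discussion above (recursive sitemap indexes, inconsistent spacing, several entries on one line, and the request to keep the sitemap URLs themselves) can be sketched roughly as follows. This is not rewby's sitemap-enumerator; it is a minimal single-threaded illustration that assumes the standard sitemaps.org layout and uses the standard library's ElementTree in place of lxml. The names `parse_sitemap`, `enumerate_sitemaps`, and `http_fetch` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple
from urllib.request import urlopen


def local_name(tag: str) -> str:
    """Strip the XML namespace, e.g. '{ns}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_bytes: bytes) -> Tuple[List[str], List[str]]:
    """Return (child sitemap URLs, page URLs) from one sitemap document.

    A real XML parser is used instead of grep so that odd spacing or
    several entries on one line do not matter.
    """
    root = ET.fromstring(xml_bytes)
    is_index = local_name(root.tag) == "sitemapindex"
    locs = []
    for entry in root:          # <sitemap> or <url> elements
        for child in entry:     # <loc>, <lastmod>, ...
            if local_name(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    return (locs, []) if is_index else ([], locs)


def http_fetch(url: str) -> bytes:
    """Hypothetical default fetcher; swap in anything that returns bytes."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def enumerate_sitemaps(
    start_url: str, fetch: Callable[[str], bytes] = http_fetch
) -> Tuple[List[str], List[str]]:
    """Walk sitemap indexes recursively.

    Returns (sorted sitemap URLs visited, sorted page URLs found), so the
    sitemap URLs themselves can be archived alongside the pages.
    """
    seen, pages = set(), set()
    queue = [start_url]
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        children, locs = parse_sitemap(fetch(url))
        queue.extend(children)
        pages.update(locs)
    return sorted(seen), sorted(pages)
```

Passing a custom `fetch` makes it possible to try the logic on local files first, or to add rate limiting, which matters given the 502s seen once queueing started.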