00:00:23<@OrIdow6>https://docs.framasoft.org/fr/framasite/rattachement-nom-de-domaine.html is their page on custom domains, might be able to get a few from the Rapid7 data - I will run this myself when I get back, if no one else has
00:00:54dm4v quits [Read error: Connection reset by peer]
00:01:39dm4v joins
00:01:41dm4v quits [Changing host]
00:01:41dm4v (dm4v) joins
00:03:42<@OrIdow6>JAA (or anyone else with a means of queueing them to AB): Can you run those lists I put in above?
00:05:53<@JAA>OrIdow6: Yep, will set it up shortly.
00:06:03<@JAA>Also checking Rapid7 now.
00:15:42tzt quits [Ping timeout: 250 seconds]
00:18:39<@JAA>framasite_subdomains_from_wbm_cdx is running through queueh2ibot now.
00:21:32<@arkiver>JAA: this is nice
00:21:59<@arkiver>looks very good on the dashboard as well
00:22:12@arkiver didn't know we have auto queuing for AB now
00:22:16<@JAA>:-) Thanks.
00:22:28<@JAA>Yeah, only really for big lists of stuff that can be done mostly unsupervised though.
00:23:13<@JAA>It has been deployed three times before: US elections, UK elections, and pttk.pl.
00:23:49<@JAA>It has a configurable limit and checks against http://dashboard.at.ninjawedding.org/status constantly to not build up a huge queue.
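The throttling JAA describes could look roughly like the sketch below. This is not queueh2ibot's actual code: `feed_queue`, `pending_jobs`, `submit`, and `poll_seconds` are hypothetical names, and how the pending count is derived from the dashboard status page is left to the caller.

```python
import time
from collections import deque
from typing import Callable, Iterable


def feed_queue(
    urls: Iterable[str],
    pending_jobs: Callable[[], int],   # hypothetical: returns the current queue size
    submit: Callable[[str], None],     # hypothetical: queues one URL with ArchiveBot
    limit: int,
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Submit URLs one at a time, pausing whenever the pending count is at the limit."""
    backlog = deque(urls)
    submitted = 0
    while backlog:
        if pending_jobs() >= limit:
            # Queue is full: wait and check the status again before submitting more.
            sleep(poll_seconds)
            continue
        submit(backlog.popleft())
        submitted += 1
    return submitted
```

Passing the status check and the submit action in as callables keeps the loop testable without touching the real dashboard or bot.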
00:28:48<@JAA>OrIdow6: 118 frama.site subdomains in Project Sonar FDNS (2021-06-25-1624579787-fdns_a.json.gz): https://transfer.archivete.am/5vETi/sonar_fdns_a_frama.site
00:34:39wyatt8740 quits [Remote host closed the connection]
00:43:03<@JAA>OrIdow6: And only 20 .wiki: https://transfer.archivete.am/cbsnZ/sonar_fdns_a_frama.wiki
00:47:09wyatt8740 joins
01:02:35dm4v_ joins
01:04:14dm4v quits [Ping timeout: 250 seconds]
01:04:14dm4v_ is now known as dm4v
01:04:14dm4v quits [Changing host]
01:04:14dm4v (dm4v) joins
01:15:48t3chler quits [Client Quit]
01:15:48masterX244 quits [Remote host closed the connection]
01:18:22t3chler joins
01:18:29masterX244 (masterX244) joins
01:24:24Ruthalas quits [Ping timeout: 258 seconds]
01:28:31SCSi quits [Ping timeout: 250 seconds]
01:33:57BlueMaxima joins
01:35:33BPCZ quits [Ping timeout: 258 seconds]
01:36:09qw3rty__ joins
01:37:03BPCZ joins
01:37:24pabs (pabs) joins
01:37:54<@JAA>Yeah, I saw somewhere that a date for the static site transformation was supposed to be announced last month.
01:37:57<@JAA>(That's on GNOME Bugzilla.)
01:38:14<pabs>JAA: good to hear that GNOME Bugzilla will become a static site, the comments on bug closing make it sound like they will just shut down the server
01:38:54qw3rty_ quits [Ping timeout: 250 seconds]
01:39:45<@JAA>Yeah, they want to shut down the Bugzilla-specific infrastructure by the sound of it. I.e. just throw a static copy on something they're running anyway, then get rid of that.
01:39:45<@JAA>But let's archive it anyway because such migrations are always a nice source of errors and issues.
01:41:46<pabs>agreed
01:47:42Ruthalas (Ruthalas) joins
02:03:34HP_Archivist quits [Client Quit]
02:37:59AntiLiberal joins
02:45:58jacobk joins
02:48:10tzt joins
02:58:12AntiLiberal quits [Ping timeout: 249 seconds]
03:31:01DogsRNice quits [Read error: Connection reset by peer]
03:35:12qw3rty_ joins
03:38:53qw3rty__ quits [Ping timeout: 258 seconds]
03:43:16monoxane quits [Quit: estoy fuera]
03:43:41monoxane (monoxane) joins
03:44:31<@JAA>OrIdow6: The 48 sites that appear in Rapid7's list but not in the CDX one are running through now. Will deal with the wikis afterwards, but that's messier due to DokuWiki requiring extra ignores.
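Finding the sites that appear in one list but not the other is a set difference. A minimal sketch, assuming both lists are plain hostnames, one per entry; `only_in_first` is a hypothetical helper, not the tooling actually used here.

```python
from typing import Iterable, List


def only_in_first(first: Iterable[str], second: Iterable[str]) -> List[str]:
    """Return hostnames from `first` that are absent from `second`, in original order."""
    def norm(host: str) -> str:
        # Compare case-insensitively and ignore a trailing root dot.
        return host.strip().rstrip(".").lower()

    known = {norm(h) for h in second}
    result = []
    emitted = set()
    for host in first:
        h = norm(host)
        if h and h not in known and h not in emitted:
            emitted.add(h)
            result.append(h)
    return result
```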
04:23:08Ajay_m quits [Ping timeout: 250 seconds]
04:33:02Ajay_m joins
04:46:53<@JAA>And now the 83 Framawikis from CDX + Rapid7 are running through AB.
04:57:01<@OrIdow6>Thanks
05:02:28<@OrIdow6>JAA: Did you do A and AAAA in addition to CNAME?
05:02:47<@OrIdow6>Since that's how they instruct people to do custom domains
05:03:16<@JAA>OrIdow6: I did A, not CNAME. See specific filename above.
05:03:54<@JAA>I'm guessing Rapid7 does an A lookup and puts all answer records into the A file regardless of whether they're actually A records.
05:04:45<@JAA>Also, this was essentially a `grep .frama.site`, so I'm guessing it should've caught custom domains unless they weren't using a CNAME.
05:05:07<@JAA>Oh, actually, no.
05:07:03<@JAA>Running another scan without the leading dot, i.e. frama.site and frama.wiki; that's how the CNAMEs show up.
05:09:56<@JAA>(Sorry for not wanting to parse 2+ billion JSON objects and run the scan properly. lol)
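For comparison, a "proper" scan would parse each record instead of grepping the raw text. The sketch below assumes every line of the FDNS file is a JSON object with `name` and `value` fields; that layout and the function names are assumptions. It matches a hostname that either equals the base domain or ends with a dot plus the base domain, which is why a CNAME pointing at bare `frama.site` is caught here but missed by a pattern with a leading dot.

```python
import json
from typing import Iterable, Iterator

BASE_DOMAINS = ("frama.site", "frama.wiki")


def is_under(host: str) -> bool:
    """True if host is one of the base domains or a subdomain of one."""
    host = host.rstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in BASE_DOMAINS)


def scan(lines: Iterable[str]) -> Iterator[str]:
    """Yield the record name when either the name or the answer value matches."""
    for line in lines:
        rec = json.loads(line)
        name = rec.get("name", "")
        value = rec.get("value", "")
        if is_under(name) or is_under(value):
            yield name
```

Since the matching is per field, both base domains are checked in one pass over the data.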
05:22:21<@OrIdow6>JAA: I mean, A to 144\.76\.131\.210 and AAAA to 2a01:4f8:141:3421::210
05:22:32<@JAA>Ah
05:22:39<@JAA>No, did not.
05:22:52<@OrIdow6>Mine for that may have finished
05:23:25<@OrIdow6>Yes it did
05:23:27<Barto>OrIdow6: i guess we just don't live on the same side of the world, is it? :-)
05:23:37<@JAA>But I did find a few custom domains with frama.site.
05:23:52<@OrIdow6>Barto: So it would seem, haha
05:24:16<@JAA>Apparently zstdgrep doesn't support multiple patterns, so it didn't search for wikis. :-|
05:25:41<Barto>OrIdow6: their page to describe which service they'll keep and which they'll drop sounds complete. Anything else needed from my end?
05:26:17<@JAA>OrIdow6: https://transfer.archivete.am/6MXkK/sonar_fdns_a_frama.site_custom
05:26:41<@OrIdow6>JAA: Here is A to raw address: https://transfer.archivete.am/dV14w/framasite_framawiki_rapid7_a_to_raw_address
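Matching on the raw address is a comparison of the answer value against the two addresses quoted above. A rough sketch, not OrIdow6's actual script; `points_at_framasoft` is a hypothetical name. Parsing with `ipaddress` means differently written forms of the same IPv6 address still compare equal.

```python
import ipaddress

# The A and AAAA targets mentioned in the discussion above.
TARGETS = {
    ipaddress.ip_address("144.76.131.210"),
    ipaddress.ip_address("2a01:4f8:141:3421::210"),
}


def points_at_framasoft(value: str) -> bool:
    """True if a DNS answer value is one of the target addresses."""
    try:
        return ipaddress.ip_address(value.strip()) in TARGETS
    except ValueError:
        # Not an IP address at all (for example a CNAME target hostname).
        return False
```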
05:27:09<@OrIdow6>Thanks for this new one
05:27:47<@JAA>Guess we need to filter out their own sites first. Can you prepare a combined list of these two so I can throw them into queueh2ibot?
05:28:07<@OrIdow6>Barto: What we would like to have is a list of user-created domains, links, etc.
05:28:11<@OrIdow6>Within the services
05:28:42<@OrIdow6>JAA: Alright; do you have something automatically doing dedup on your end?
05:28:56<@JAA>I can handle dedup, yeah.
05:28:59<@OrIdow6>OK
05:29:16<@JAA>I just realised I'll also have to detect DokuWikis on these though. :-|
05:31:31<@JAA>Shame that they don't use distinct IPs for sites and wikis.
05:35:08monoxane quits [Client Quit]
05:35:17<@OrIdow6>Yeah
05:35:34monoxane (monoxane) joins
05:35:57<@OrIdow6>Would make this a bit easier as well
05:38:25<@OrIdow6>Going to strip out non-www if www is present
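That clean-up step can be sketched as below: drop a bare hostname whenever its `www.` counterpart is also in the list, and keep everything else once. `prefer_www` is a hypothetical helper, and the exact processing OrIdow6 applied may differ.

```python
from typing import Iterable, List


def prefer_www(hosts: Iterable[str]) -> List[str]:
    """Remove bare hostnames whose www. variant is present; dedupe, keeping order."""
    cleaned = [h.strip().rstrip(".").lower() for h in hosts if h.strip()]
    present = set(cleaned)
    result = []
    emitted = set()
    for host in cleaned:
        if not host.startswith("www.") and "www." + host in present:
            continue  # the www. variant will represent this site
        if host not in emitted:
            emitted.add(host)
            result.append(host)
    return result
```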
05:42:20<@OrIdow6>JAA: https://transfer.archivete.am/5Yb9w/framasite_framawiki_rapid7_cname_a_combined_and_processed_a_bit.txt
05:43:53<@OrIdow6>There's at least one (https://mta-sts.b0c.asso.st/) that seems to be misconfigured in some way, so all it gives is a cert error and then (if you bypass it) a site not found error
05:44:23<@OrIdow6>But do not think that should cause problems
05:46:14<@JAA>Yeah, there are 25 of those.
05:46:24<Barto>OrIdow6: i see. Shall I try to ask them kindly on libera or is it a lost cause?
05:46:26<@JAA>Or well, 25 certificate issues, didn't check further.
05:52:17Ajay_m quits [Ping timeout: 258 seconds]
05:53:45<@OrIdow6>Barto: I think there's still a chance
05:55:02<@OrIdow6>Since they do very much seem to be a "community" organization - the thing I can see shifting it the other way would be privacy concerns, hence my focus on Framasite, Framawiki, and Framalink, which seem to be the most "public" of the ones going down
05:55:39<@OrIdow6>Be aware that hook54321 asked ca. 35 minutes ago in that channel, and there has been no response nor any other activity since
05:56:31<@JAA>List of AB commands is ready, just waiting for the current batch to finish.
05:57:51<Barto>their irc channel is unofficial, will try another way
05:59:26<@OrIdow6>Oh
05:59:28<@OrIdow6>Do what you think is best
06:01:33<Barto>i'm just seeing there's no ops in this channel and the title explicitly mentions it's "unofficial" in their description
06:03:26<Barto>i'll try to throw them a message via https://contact.framasoft.org/ asking how they do their framasite counter, and if they're allowed to prove this number
06:04:32Megame quits [Ping timeout: 250 seconds]
06:04:41<Barto>and i'll ask if i can check that none of my friends are affected by this closure, how can i find that ;)
06:07:40<@OrIdow6>I'd rather not be deceitful about it
06:11:58balrog quits [Ping timeout: 250 seconds]
06:11:58<Barto>do you want me to be direct and ask them if they have a list of sites?
06:12:14Ajay_m joins
06:12:24<Barto>just so we do throw archivebot on it and slam their bandwidth?
06:14:16<@JAA>I'd suggest something along the lines of: 'Hi, we heard you're shutting down some services and would like to preserve them indefinitely at the Internet Archive. Would you be willing to work with us to make this possible?' (But clear enough that we aren't IA etc.)
06:15:18<@OrIdow6>Yeah
06:15:38<@OrIdow6>"just so we do throw archivebot on it and slam their bandwidth?" - I don't think it's using up very much of their BW at present, if that's what this is asking?
06:16:23<Barto>alright, i'll try something like that
06:16:31<Barto>i was worried they'd close the door shut if i were too explicit
06:17:11<@JAA>Yeah, I haven't noticed any slowdown even with 40+ AB jobs in parallel. But that's also part of the 'working with us' basically. Acceptable request rate limits etc.
06:18:49<@OrIdow6>Even if lying is more effective, I'd rather not do it
06:19:37<@JAA>Yeah, fully agreed.
06:22:12balrog (balrog) joins
06:24:39<@JAA>OrIdow6: That last list is running through now.
06:38:36<FalconK>whois 64.71.160.46
06:38:39<FalconK>argh
06:42:14balrog quits [Ping timeout: 250 seconds]
06:55:46balrog (balrog) joins
07:00:08<thuban>is there a way to confirm (on current infra) whether an archivebot job finished normally?
07:01:25<thuban>oh nvm, it's in the json metadata
07:06:36<thuban>it looks like no new ab jobs have been submitted for the hong kong media sites, even though several of the first round of jobs have finished and there are more sites on the list.
07:06:50h3ndr1k quits [Quit: No Ping reply in 180 seconds.]
07:07:04h3ndr1k (h3ndr1k) joins
07:07:34<thuban>i've just updated https://wiki.archiveteam.org/index.php/Hong_Kong_media ; can someone put in a new round? (we're not waiting on pipeline capacity, are we?)
07:11:13jspiros quits []
07:11:27jspiros (jspiros) joins
07:12:30h3ndr1k quits [Client Quit]
07:15:05fuzzy8021 quits [Ping timeout: 258 seconds]
07:18:12h3ndr1k (h3ndr1k) joins
07:22:50<thuban>have also just added the twitter links nuroten dug up for potential snscrape jobs; thanks again nuroten
07:23:12<thuban>(are you sure about the social media for hkpeanut and the twitter for memehk? they seem unrelated to me)
08:01:53<@HCross>arkiver: you were right
08:09:25<thuban>i've also just updated the youtube and youtubearchive video counts (diff: https://wiki.archiveteam.org/index.php?title=Hong_Kong_media&type=revision&diff=46929&oldid=46928).
08:09:38<thuban>if someone with yta privileges could have a look, that would be really helpful--some channels we didn't and still don't have complete copies of, some channels we did have complete copies of but have since published more news, and some channels' video counts have dropped precipitously
08:11:05<thuban>last group is tvmost and d100, which we seem to have had most of, and i-cable, which we definitely didn't
08:22:11<AK>thuban, good point, I need to go through and add them in
08:22:24<AK>I'll get the rest of the HK media stuff added in today
08:23:35<thuban>thank you! i'm taking a quick break before i add the political parties and other stuff from the rest of the etherpad
08:27:25<thuban>let me know whether you start submitting jobs for stuff that isn't on the wiki page yet, and if so i'll make sure i link them
09:00:53mutantmonkey quits [Ping timeout: 258 seconds]
09:12:55mutantmonkey (mutantmonkey) joins
09:16:14godane1 joins
09:18:40godane2 quits [Ping timeout: 250 seconds]
09:30:47BlueMaxima quits [Client Quit]
09:33:43godane2 joins
09:36:09godane1 quits [Ping timeout: 258 seconds]
09:48:13godane1 joins
09:50:43godane2 quits [Ping timeout: 250 seconds]
10:52:03HackMii quits [Ping timeout: 258 seconds]
10:53:47HackMii (hacktheplanet) joins
11:01:56vela quits [Client Quit]
11:02:55vela (vela) joins
11:04:22ave quits [Quit: Ping timeout (120 seconds)]
11:04:51lun4 quits [Quit: Ping timeout (120 seconds)]
11:05:56igloo22225 quits [Quit: Ping timeout (120 seconds)]
11:06:14Eighty quits [Ping timeout: 258 seconds]
11:06:43igloo22225 (igloo22225) joins
11:09:08ave (ave) joins
11:09:43lun4 (lun4) joins
11:10:00@dxrt quits [Quit: ZNC - http://znc.sourceforge.net]
11:10:23dxrt joins
11:10:26dxrt quits [Changing host]
11:10:26dxrt (dxrt) joins
11:10:26@ChanServ sets mode: +o dxrt
11:21:51Eighty (Eighty) joins
11:57:44<Iki>Highly speculative, but maybe worth proactively covering Taiwan media in the next couple years?
11:57:47<Iki>https://www.msn.com/en-in/news/world/china-tells-taiwan-it-cannot-rely-on-united-states-future-lies-in-reunification/ar-AALo9H3
11:57:54<Iki>Given the recent HK project
12:49:24yanome quits [Quit: The Lounge - https://thelounge.chat]
12:49:34yanome (yano) joins
12:53:33<lunik1>what about Macau too, I don't know what the situation is there
12:54:30LeGoupil joins
13:00:54noteness quits [Remote host closed the connection]
13:01:12noteness (noteness) joins
13:21:31mls (mls) joins
13:52:47Matthww8 quits [Quit: The Lounge - https://thelounge.chat]
14:17:36Megame (Megame) joins
14:17:43Gereon (Gereon) joins
14:23:27Matthww8 joins
14:34:17KRG joins
14:34:17KRG quits [Changing host]
14:34:17KRG (KRG) joins
15:11:00KRG` joins
15:13:29KRG quits [Ping timeout: 258 seconds]
15:16:36nuroten joins
15:22:48KRG joins
15:22:48KRG quits [Changing host]
15:22:48KRG (KRG) joins
15:26:08KRG` quits [Ping timeout: 258 seconds]
15:27:17Arcorann quits [Ping timeout: 258 seconds]
15:28:15spirit joins
15:40:03thuban quits [Read error: Connection reset by peer]
15:40:30thuban joins
15:57:02SCSi (SCSi) joins
16:03:27<nuroten>thuban: thanks for updating the wiki page with the links. some of the associations from the parties section have since disbanded and deactivated their FB pages, so feel free to remove in that case. not sure if it will be useful to note the dead ones in a separate section, mostly if someone asks if they have been saved
16:06:03<nuroten>they're falling off at a quicker pace now, unfortunately I can't keep track of them all daily
16:08:01<Barto>OrIdow6: no answers yet, we'll see if someone will answer this evening
16:14:49Doran (Doranwen) joins
16:15:35Doranwen quits [Ping timeout: 258 seconds]
16:33:13HackMii quits [Ping timeout: 258 seconds]
16:33:31<Ryz>Don't think you should remove those Facebook links, just because of future reference stuff; just mark them as dead and not delete the entries
16:35:04HackMii (hacktheplanet) joins
16:38:33<nuroten>okay. a few are crossed out in place as they might have other accounts or websites that are still up, some only have Facebook, not much to archive otherwise?
16:39:15<nuroten>1-2 I moved to a dead section, but yeah, whatever makes most sense to people
17:10:08t3chler quits [Client Quit]
17:11:48Ryz quits [Remote host closed the connection]
17:12:24Ryz (Ryz) joins
17:14:48KRG quits [Remote host closed the connection]
17:15:08KRG joins
17:15:08KRG quits [Changing host]
17:15:08KRG (KRG) joins
17:36:06<thuban>nuroten: do the neo democrats still have a facebook page? i guess it doesn't matter that much if we can't archive it anyway, but the news story about the disbandment mentions it
17:49:03<@JAA>AK: Your wiki account is automoderated now.
17:49:10<AK>Uh oh
17:49:13<AK>That sounds dangerous
17:49:26<@JAA>No more manual approval of your edits. :-)
17:51:39<thuban>is there a way to get the full archivebot job id for a finished job? the json file doesn't include it, and https://archive.fart.website/archivebot/viewer/ only displays a truncated version
17:53:52<@JAA>Not easily, no.
17:54:18<@JAA>It's in the -meta.warc.gz in the command line arguments and log messages.
17:54:34flashmeow quits [Quit: ZNC 1.8.2 - https://znc.in]
17:54:54flashmeow (flashmeow) joins
17:55:43<thuban>good enough for me. thanks!
17:57:02DogsRNice (Webuser299) joins
18:04:00<Iki>tai
18:04:42<Iki>oops
18:23:59<nuroten>thuban: probably not anymore, it's not coming up for me in searches
18:26:31<nuroten>individual party members may still have Facebook accounts (likely managed by friends/relatives), also tried looking at the WBM snapshots of their website, didn't see one
18:28:00duce1337 joins
18:29:27KRG` joins
18:32:26KRG quits [Ping timeout: 258 seconds]
18:33:20leo60228 quits [Ping timeout: 250 seconds]
18:33:43leo60228 (leo60228) joins
18:43:07spirit quits [Client Quit]
19:03:07mutantmonkey quits [Remote host closed the connection]
19:04:19mutantmonkey (mutantmonkey) joins
19:07:09<nuroten>thuban: the wiki links to hkpeanut.com which is a different website. hkpeanuts.com is down
19:10:55<thuban>that would explain it, thanks
19:11:07<nuroten>the domain redirects to simcast ... facebook link updated, I haven't found twitter yet (the wrong links were initially grabbed from that other website)
19:11:18<thuban>fixed on etherpad & wiki page
19:13:30<thuban>(job cyko5axvbovcbec1vahgyz2xm was started on the wrong site, but it looks like it's finished already)
19:17:06<nuroten>I can't check if their Facebook is still active, site keeps prompting me to log in, but twitter is gone
19:17:47<nuroten>ah well, hkparenting might be eventually useful haha
19:20:21<nuroten>they have an updated tumblr though
19:21:39duce1337 quits [Changing host]
19:21:39duce1337 (duce1337) joins
19:24:07<thuban>thanks. i'm gonna add the last of the stuff from the etherpad sometime in the next few hours, so if you haven't already i'll edit the link in then
19:24:33<nuroten>great, thanks :)
19:30:25<nuroten>regarding Taiwan and Macau, I'm unfamiliar with the situation there, at this time I would probably put Taiwanese media in the "healthy" category. last I heard Macau already passed a version of NSL a few years ago, so protests there are rare now
19:32:16lennier1 quits [Ping timeout: 250 seconds]
19:38:31HP_Archivist (HP_Archivist) joins
19:41:18KRG` quits [Remote host closed the connection]
19:41:41<nuroten>(but I agree it might not be a bad idea to start covering Taiwan eventually, if only to have a head start when things go south fast)
19:58:27ats quits [Quit: quieter GPU]
20:37:43ats (ats) joins
20:53:37HP_Archivist quits [Read error: Connection reset by peer]
21:04:09LeGoupil quits [Client Quit]
21:08:43lennier1 (lennier1) joins
22:19:45mutantmonkey quits [Ping timeout: 258 seconds]
22:20:24mutantmonkey (mutantmonkey) joins
22:25:41<@OrIdow6>Thanks Barto
22:27:54<AK>At the very least, a list of possible Taiwanese sites would make the job of capturing them easier if/when it's needed
22:29:09<@OrIdow6>In the event of an invasion, it is unlikely that only the Taiwanese media would be threatened
22:29:49<@OrIdow6>We have (the old) domains-grab
22:30:00<@arkiver>yeah
22:30:08<AK>Good point
22:30:10<AK>It'd be a big one
22:30:10<@arkiver>hopefully #Y will be ready before that happens
22:30:16<@arkiver>(i think it will)
22:30:23<AK>eu-domains was one of the first projects I did
23:14:11Ajay_m quits [Ping timeout: 258 seconds]
23:14:43Ajay_m joins
23:33:07Arcorann (Arcorann) joins