| 00:27:48 | | HP_Archivist (HP_Archivist) joins |
| 00:38:23 | <@OrIdow6> | I had a dream last night that Wikidot shut down and we only got paid wikis |
| 00:39:11 | <@OrIdow6> | In related news I am going to try to see if there's a better way to structure my regexes besides copying and pasting them |
| 00:42:32 | <@OrIdow6> | arkiver: What is the function of the item-name:// argument? |
| 00:48:46 | | phiresky quits [Quit: Ping timeout (120 seconds)] |
| 00:48:51 | | phiresky joins |
| 01:02:23 | | dm4v quits [Read error: Connection reset by peer] |
| 01:03:07 | | dm4v joins |
| 01:03:09 | | dm4v is now authenticated as dm4v |
| 01:03:09 | | dm4v quits [Changing host] |
| 01:03:09 | | dm4v (dm4v) joins |
| 01:04:31 | | Arcorann (Arcorann) joins |
| 01:06:53 | <@jrwr> | Alert, Freenode is going kaboom |
| 01:06:59 | <@jrwr> | someone should scrape chanserv |
| 01:07:07 | <@jrwr> | old services are online at 212.102.48.58 |
| 01:07:18 | <@jrwr> | NickServ/ChanServ meta |
| 01:08:13 | | lun4 quits [Ping timeout: 258 seconds] |
| 01:08:24 | | ave quits [Ping timeout: 250 seconds] |
| 01:08:36 | | igloo22225 quits [Ping timeout: 258 seconds] |
| 01:08:50 | | RJHacker79772 quits [Ping timeout: 250 seconds] |
| 01:09:35 | <nico> | freenode's global notice https://pastebin.com/jgbwucCr |
| 01:09:53 | <nico> | jrwr: what is 212.102.48.58? |
| 01:10:02 | <nico> | no http/https service there |
| 01:10:08 | <@jrwr> | one of the last irc servers online there |
| 01:10:13 | <@jrwr> | with old services |
| 01:10:49 | <nico> | i am connected to |
| 01:10:49 | <nico> | 03:09 [FreeNode] -%- server : hostsailor.freenode.net [EU] |
| 01:10:54 | | wizards joins |
| 01:11:40 | <nico> | 03:10 -NickServ(NickServ@services.)- Information on nico_32 (account nico_32): |
| 01:11:43 | <nico> | 03:10 -NickServ(NickServ@services.)- Registered : Oct 25 23:36:44 2015 (5y 33w 3d ago) |
| 01:12:01 | <@jrwr> | now try chat.freenode.net chat.freenode.net |
| 01:12:07 | <@jrwr> | and watch all that gone |
| 01:12:48 | <nico> | i'm still connected to freenode to keep a channel closed |
| 01:13:03 | <nico> | so no big loss |
| 01:13:39 | <nico> | you want to grab channel metadata? |
| 01:13:47 | <@jrwr> | ya, chanserv infos |
| 01:14:13 | | wizards_ joins |
| 01:14:16 | <nico> | doing a /list then /msg chanserv info #channelname? |
| 01:14:33 | | wizards leaves |
| 01:14:55 | | wizards_ is now known as wizards |
| 01:15:30 | | vela quits [Ping timeout: 258 seconds] |
| 01:16:00 | <@jrwr> | yep |
| 01:19:29 | <nico> | if you have an irssi script, i would be happy to run it |
| 01:20:01 | <nico> | i see that /list isn't logged by irssi |
| 01:21:01 | <Mateon1> | Seems that /list isn't logged by a lot of clients, then. I tried in mine and it wasn't logged |
| 01:23:10 | <@JAA> | nico: Try using the raw log. You'll likely have to increase the buffer size though. |
| 01:23:40 | <@jrwr> | and I would hurry up |
| 01:23:43 | <wizards> | is there a list of subdomains that still go to the old server(s)? |
| 01:23:44 | <@jrwr> | its getting crazy |
| 01:23:51 | <@jrwr> | there are none at the moment |
| 01:23:58 | <thuban> | weechat also logs server buffers by default, fwiw |
| 01:24:07 | <@jrwr> | you just have to figure out which nodes they have not moved over yet |
| 01:24:32 | <nico> | it worked |
| 01:24:33 | <@JAA> | kornbluth.freenode.net is still the old as of right now. |
| 01:25:06 | <nico> | so now I need to spam chanserv |
| 01:25:12 | <wizards> | ah, okay, then i guess i should repeat the subdomains i saw being passed around in ##freenode@libera |
| 01:25:51 | <thuban> | might also be worthwhile collecting /names and nickserv info, tho depending on implementation/config it may be necessary to join all the channels first |
| 01:29:36 | <wizards> | 'happytree', 'rinnegan', and 'adams' are said to still go to old freenode (not tested) |
| 01:32:33 | <@JAA> | I can't see NickServ or ChanServ from kornbluth, FWIW. |
| 01:32:46 | <@JAA> | As in, no such nick. |
| 01:33:04 | <@jrwr> | better change your nick to chanserv |
| 01:34:44 | <nico> | try 185.198.56.38 / hostsailor.freenode.net |
| 01:39:06 | <nico> | i set rawlog_lines to 999999999 |
| 01:39:11 | <nico> | it isn't enough |
| 01:40:11 | <nico> | https://internalexception.byme.at/freenode_chan_list.txt |
| 01:40:42 | <nico> | ~ 6100 channels |
| 01:41:12 | <thuban> | hold on, i'll see if i can connect to one of the working servers |
| 01:42:09 | | pekster (pekster) joins |
| 01:42:40 | <thuban> | ok, got it |
| 01:42:58 | <thuban> | lemme just clip it out |
| 01:44:45 | <pekster> | So, not sure how useful the data is at this point, but I'm using a channel list from Freenode taken mid-day on 6/11, and just set a query script loose to work through the ~2.5k unique non-secret channels and grab info. |
| 01:48:39 | <SCSi> | man |
| 01:48:52 | <SCSi> | new freenode services are hosed |
| 01:48:55 | <SCSi> | sasl isnt working |
| 01:48:58 | <SCSi> | emails arent being sent |
| 01:49:00 | <SCSi> | its great |
| 01:49:05 | <thuban> | pekster: wonderful! i have the /list output i just took; please let me know if you want it and/or need it processed down to the channel names |
| 01:49:30 | <@jrwr> | run it a few times with filters to ensure you get everything |
| 01:49:59 | <@JAA> | Let's please keep general freenode discussion in -ot and only the data archival part here. |
| 01:51:35 | <pekster> | thuban: I doubt anything of significance is there that wasn't on ALIS 3 days ago, but thanks. At this point it still needs another hour to 90 minutes just to work through the info, then if the thing hasn't burnt down at that point, I'll set a flags cycle too, though many channels have hidden those. |
| 01:52:40 | <thuban> | pekster: i'm showing almost 6k channels |
| 01:53:12 | <pekster> | In your list? Interesting. Can I PM you an email for it? I'll combine it with mine for a 2nd-round script-scan then. |
| 01:53:32 | <thuban> | i'll just upload it, it's not exactly secret |
| 01:53:36 | <pekster> | Sure :) |
| 01:54:12 | <pekster> | Oh, I stopped at -min 10, so my list excludes channels with less than that; perhaps that accounted for half the list. |
| 01:54:22 | <thuban> | ah, very probably |
| 01:54:52 | <thuban> | https://transfer.archivete.am/UgbUa/freenode.txt is just the channel names; i do have the user counts if you want those |
| 01:55:10 | <pekster> | Nope, I'll de-dupe against my list, and after I run through all of these, I'll update for the uniques on yours. |
| 01:55:23 | <pekster> | Thanks much, glad to add some of the less-populated items to my crawler script! |
| 01:56:56 | <thuban> | sounds good! would it be practical to parallelize this in some way (perhaps by starting a separate run for flags or smaller-channel info)? |
| 01:57:26 | <thuban> | also, what info are you grabbing? just the chanserv output? |
| 01:57:56 | <pekster> | Possibly; I have 2 alts, though I'm running them at 3/s, which isn't awful. In theory "old" FN is open for a bit, but I don't trust that a ton. I can probably send the de-duped copy to one of my alts. |
| 01:58:24 | <pekster> | 'info $chanName' now, then I'll re-run with flags too, to get any non-hidden access list info. |
| 01:59:56 | <thuban> | some of the people in here can probably run more if you share your script |
| 02:00:49 | <thuban> | before you joined i suggested also getting /names and nickserv info, although that is definitely more of an undertaking |
| 02:02:08 | <pekster> | Yea, though if someone has a clever way to populate that, my script is trivial to port to that. I'm invoking it via /exec in irssi, so all it does is read an INI file in the Perl source, then open a filename & work through a loop w/ whatever command is needed. I can toss that on gist.gh.c shortly if someone wants to re-tool it for nickserv. |
| 02:06:04 | <yanmaani> | pekster: why not just do /list ? |
| 02:06:07 | <thuban> | population is the hard part; you'd probably have to join the channels. might be easier to re-run the channel script grabbing /names, process & dedupe everything with shell tools, and feed the results into a modified nick script, than to join, dedupe, and get nicks on the fly |
| 02:06:17 | <yanmaani> | /list will give you everything, not just -min 10 |
| 02:06:21 | <yanmaani> | and is faster, I think |
| 02:06:53 | <yanmaani> | (also, alis doesn't have all channels) |
| 02:07:01 | <thuban> | (although the latter would have the benefit of returning partial results) |
| 02:07:45 | <thuban> | yanmaani: it's been done, the remaining channels will be run afterwards and/or concurrently |
| 02:08:15 | <pekster> | When I ran ALIS a few days ago, I did it in descending order of channel population, and at the time intended to ignore low population channels; I wasn't thinking in terms of an archive then :P |
| 02:08:41 | <thuban> | ^ i have only slight familiarity with both irssi and perl though :< |
| 02:10:36 | | vela (vela) joins |
| 02:11:25 | | lun4 (lun4) joins |
| 02:11:33 | | nepeat joins |
| 02:11:35 | | nepeat is now authenticated as nepeat |
| 02:12:56 | | ave (ave) joins |
| 02:16:18 | <thuban> | i've got to get going; is there any way i can be of use at the moment? if not i'll be back early tomorrow |
| 02:16:57 | <pekster> | Not that I can think of, that list is plenty help; thanks again. |
| 02:17:46 | <thuban> | you're very welcome! |
| 02:29:12 | <@arkiver> | OrIdow6: you're copying your regexes? |
| 02:30:27 | <@arkiver> | OrIdow6: item-name:// will set the current item name, it'll be added as WARC header to the records as well (header X-Wget-AT-Project-Item-Name) |
| 02:30:34 | <@OrIdow6> | arkiver: I mean I have the same regex ^https?://[^%./]+%.wikidot%.com in like 15 places, and that makes it difficult to adapt to custom domains (which I can do, it's just a bit tedious) |
| 02:30:53 | <@arkiver> | 15? |
| 02:31:05 | <@arkiver> | why not do some check in the `allowed` function |
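A minimal sketch of the idea being discussed here: keep the host check in one helper so that custom domains only need to be added in one place. It is written in Python for illustration only. The grab script itself is not shown in this log, so the names `WIKIDOT_HOST`, `CUSTOM_DOMAINS` and `is_wiki_url`, and the example custom domain, are all assumptions. The regex is a direct translation of the Lua pattern quoted above.

```python
import re

# Python translation of the Lua pattern ^https?://[^%./]+%.wikidot%.com
# (Lua escapes the dot with %, Python with a backslash).
WIKIDOT_HOST = re.compile(r"^https?://[^./]+\.wikidot\.com")

# Hypothetical custom domain; real ones would be discovered per wiki.
CUSTOM_DOMAINS = {"www.example-wiki.org"}


def is_wiki_url(url: str) -> bool:
    """Single place that decides whether a URL belongs to the wiki being grabbed."""
    if WIKIDOT_HOST.match(url):
        return True
    # Fall back to an exact host comparison for custom domains.
    m = re.match(r"^https?://([^/]+)", url)
    return bool(m) and m.group(1).lower() in CUSTOM_DOMAINS
```

Every other check can then call `is_wiki_url(url)` instead of repeating the pattern, and supporting a custom domain becomes a one-line change to the set.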
| 02:31:06 | <@OrIdow6> | I didn't count, that's just how it feels |
| 02:31:11 | <@arkiver> | right |
| 02:31:19 | <@arkiver> | single item, single wiki? |
| 02:31:54 | <@OrIdow6> | That's how it's set up now, yes |
| 02:32:03 | <@arkiver> | alright |
| 02:32:14 | <@arkiver> | last time i checked the largest wiki wasnt that large right |
| 02:32:16 | <@arkiver> | so should be fine |
| 02:33:07 | <@arkiver> | ah maybe it is large |
| 02:33:15 | <@arkiver> | turns out these were just from the last 7 days |
| 02:33:57 | <@arkiver> | well i'm off - ping me when you have something online :) |
| 02:34:16 | <@OrIdow6> | There is an alternate item scheme with revision numbers, it would be complicated but it's possible |
| 02:34:18 | <@OrIdow6> | Ok |
| 02:34:36 | <@OrIdow6> | Will do |
| 02:35:10 | <@arkiver> | thanks, good day to you! |
| 02:35:52 | <@OrIdow6> | You too |
| 03:56:39 | | DogsRNice quits [Read error: Connection reset by peer] |
| 03:58:51 | | qw3rty_ joins |
| 04:02:36 | | qw3rty__ quits [Ping timeout: 250 seconds] |
| 04:14:09 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 04:14:33 | | HP_Archivist (HP_Archivist) joins |
| 05:29:39 | | katocala quits [Ping timeout: 258 seconds] |
| 05:38:22 | | bradp quits [Ping timeout: 250 seconds] |
| 05:39:06 | | bradp joins |
| 06:03:46 | | c00k13 quits [Ping timeout: 258 seconds] |
| 06:07:44 | | sec^nd quits [Remote host closed the connection] |
| 06:08:04 | | sec^nd (second) joins |
| 06:42:09 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 06:42:38 | | HP_Archivist (HP_Archivist) joins |
| 06:44:12 | | igloo22225 (igloo22225) joins |
| 07:29:59 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 07:46:51 | | c00k13 joins |
| 08:19:59 | | c00k13 quits [Client Quit] |
| 08:40:45 | | c00k13 joins |
| 09:01:10 | | leo60228 quits [Read error: Connection reset by peer] |
| 09:01:22 | | leo60228 (leo60228) joins |
| 09:13:23 | | grawity quits [Remote host closed the connection] |
| 09:13:46 | | grawity (grawity) joins |
| 09:18:30 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 09:44:09 | | Sylirana quits [Ping timeout: 244 seconds] |
| 09:44:36 | | Sylirana (Sylirana) joins |
| 09:51:54 | | Sylirana quits [Ping timeout: 244 seconds] |
| 09:52:45 | | Sylirana (Sylirana) joins |
| 09:58:40 | | berndj joins |
| 10:37:02 | | katocala joins |
| 10:37:14 | | katocala is now authenticated as katocala |
| 12:34:19 | | sec^nd quits [Remote host closed the connection] |
| 12:37:01 | | Hackerpcs quits [Quit: Hackerpcs] |
| 12:38:15 | | Hackerpcs (Hackerpcs) joins |
| 12:39:22 | | Mateon1 quits [Ping timeout: 258 seconds] |
| 12:46:39 | | Mateon1 joins |
| 13:25:56 | | nothere quits [Ping timeout: 250 seconds] |
| 13:26:42 | | nothere joins |
| 13:30:20 | <@EggplantN> | 2hrs 🥳 nearly there |
| 13:36:52 | | mutantmonkey quits [Ping timeout: 258 seconds] |
| 13:46:36 | <Jake> | best of luck to IA! hope it all goes well! |
| 13:49:50 | | mutantmonkey (mutantmonkey) joins |
| 13:59:54 | | sonick (sonick) joins |
| 14:09:45 | | qwertyasdfuiopghjkl joins |
| 14:34:09 | <sembiance> | Jake: ? |
| 14:34:30 | <Jake> | https://twitter.com/internetarchive/status/1404489931251077126 more in #archiveteam |
| 14:35:06 | <sembiance> | ahhh, gotcha |
| 14:37:20 | <sembiance> | yes, I hope it goes well! *fingers crossed* |
| 14:40:23 | | yanmaani1 (yanmaani) joins |
| 14:41:16 | | yanmaani quits [Ping timeout: 258 seconds] |
| 15:05:38 | | LeGoupil joins |
| 15:06:59 | <IDK> | So s3 is the main server |
| 15:07:06 | <@EggplantN> | no |
| 15:07:20 | <@EggplantN> | s3 is their upload interface. |
| 15:07:25 | <IDK> | Oh |
| 15:07:31 | <@EggplantN> | s3 is what we use to upload our data |
| 15:07:38 | <IDK> | Ok |
| 15:07:51 | <IDK> | Then when is wbm and the library going to be down |
| 15:08:25 | <@EggplantN> | See the tweets in #archiveteam |
| 15:09:39 | | Sylirana quits [Ping timeout: 244 seconds] |
| 15:10:06 | | Sylirana (Sylirana) joins |
| 15:32:18 | <@EggplantN> | Confirmed can see IA boxes are now going offline :) |
| 15:33:07 | <IDK> | WBM, open library, upload down |
| 15:35:42 | | Arcorann quits [Ping timeout: 258 seconds] |
| 15:40:41 | | yanmaani1 quits [Ping timeout: 258 seconds] |
| 15:46:51 | | qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds] |
| 15:48:26 | | yanmaani1 (yanmaani) joins |
| 15:55:38 | | Sylirana quits [Ping timeout: 244 seconds] |
| 15:58:35 | | qwertyasdfuiopghjkl joins |
| 16:08:22 | | Sylirana (Sylirana) joins |
| 16:12:41 | | sec^nd (second) joins |
| 16:28:08 | <pekster> | thuban: No rush at all, but a copy of that same /list output with topics would be useful to include, since I only have that now for my smaller ALIS list. A more raw file format is fine (I can parse no problem.) |
| 16:29:10 | <thuban> | pekster: sure, one min |
| 16:32:11 | <thuban> | pekster: https://transfer.archivete.am/zTLwE/freenode_topics.txt tsv |
| 16:35:15 | <pekster> | Got it, thanks! I ran my list plus all of the new entries from yours last night, so that new info will let me include uniform topics too. I'm planning to do another round of parsing and toss the useful bits into a sqlite db, plus include the raw files and see where those need to go when the data is a bit more than 3 piles from my VPSes :D |
| 16:40:32 | <thuban> | cool cool. did you just get info or have you also made an attempt at flags and/or names? |
| 16:42:01 | <Jake> | (hopefully when the power comes back everything turns on fine!) |
| 16:42:08 | | qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds] |
| 16:43:25 | <pekster> | I got access (flags are default-hidden, but the 'ACCESS' command gives much the same on old-freenode.) I also have a partial list of users starting with my ALIS batch of founders & access that's wrapping up its query scan now, and will redo with your list from my other 2 VPSes shortly. |
| 16:43:44 | <pekster> | At least, that is for accounts that weren't dropped, but an informal skimming suggests many old accounts of founders & access-holders weren't. |
| 16:51:13 | <thuban> | are you planning on getting /names or just sticking to the users listed in access? (i have no idea whether join/part spam would be likely to attract banhammer) |
| 16:51:35 | <SCSi> | im sure if you went in, nabbed, left, sleep rand(), repeat |
| 16:51:40 | <SCSi> | you wouldnt raise any red flags |
| 16:52:46 | <grawity> | I'm not sure I see the point of backing up access lists and member lists, of all things |
| 16:52:54 | <pekster> | I wasn't planning on that, notably because at this point the old-FN network is in complete shambles. |
| 16:53:17 | <pekster> | It's in multiple pieces at the moment, notwithstanding the new ircd :\ |
| 16:54:19 | <SCSi> | i think the boat already sailed on the freenode project |
| 16:54:26 | <SCSi> | its too much of a mess now |
| 16:54:45 | <pekster> | Right. I think the channel list (not so much topics as any serious project updated those to warn users) and some of the founder/access accounts I have that are quite old are useful. |
| 16:55:06 | <pekster> | Granted even that is only a snapshot of things at the very end, but better than nothing. |
| 16:55:18 | <thuban> | surely the point of archiveteam is that we back stuff up just in case someone else thinks of specific points in the future? historical value, &c |
| 16:55:43 | <pekster> | I'll have a more useful summary once I parse this into sqlite and can run some queries, like distribution of accounts & how many older than 5/10/15/20 yrs. |
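The kind of age query mentioned here could look roughly like the sketch below. The schema is not described in this log, so the `accounts` table, its `name` and `registered` columns, and the choice to store timestamps as `YYYY-MM-DD HH:MM:SS` text are all assumptions made for illustration.

```python
import sqlite3
from datetime import datetime
from typing import Dict, Iterable


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per account, registration time as ISO-style text.
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, registered TEXT)")


def count_older_than(conn: sqlite3.Connection, as_of: datetime,
                     years_list: Iterable[int] = (5, 10, 15, 20)) -> Dict[int, int]:
    """Count accounts registered at least N years before `as_of`, for each N."""
    result = {}
    for years in years_list:
        # Build the cutoff as text; ISO-style strings sort chronologically,
        # and this avoids date arithmetic problems on Feb 29.
        cutoff = f"{as_of.year - years:04d}-{as_of:%m-%d %H:%M:%S}"
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM accounts WHERE registered <= ?", (cutoff,)
        ).fetchone()
        result[years] = n
    return result
```

Storing the timestamps as sortable text keeps the query a plain string comparison; a real import might prefer integer epoch seconds instead.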
| 16:56:29 | <pekster> | Sure, I mean, if someone else wanted to script a bot that went into channels for usernames, I can de-dupe against my userlists and run scans, or get the de-dupe & query code on GitHub (it's low-brow, just enough to get the job done really.) |
| 16:57:18 | <pekster> | code intended to run in irssi's /exec, but it'll work anywhere a client can send command output to a directed nick. It just sends 'info $nickname' and sleeps for a predefined time, default 3s. |
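The loop described above (send one `info` query per name, then sleep, with a 3s default) can be sketched as follows. The original is a Perl script run through irssi's /exec and is not shown in this log, so this Python version is only an illustration: `run_queries`, the `send` callback and the injectable `sleep` are assumptions.

```python
import time
from typing import Callable, Iterable


def run_queries(targets: Iterable[str],
                send: Callable[[str], None],
                command: str = "info",
                delay: float = 3.0,
                sleep: Callable[[float], None] = time.sleep) -> int:
    """Send '<command> <target>' for each target, pausing `delay` seconds after each."""
    count = 0
    for target in targets:
        target = target.strip()
        if not target:
            continue  # skip blank lines from the input file
        send(f"{command} {target}")
        count += 1
        sleep(delay)
    return count
```

Passing `send` and `sleep` in as callables keeps the loop independent of any particular client: `send` could write to a socket or print a line for the client to forward, and the replies are parsed later from logs, as described above.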
| 17:01:31 | | qwertyasdfuiopghjkl joins |
| 17:03:59 | <thuban> | could you actually do channel interaction with /exec? seems to me from the docs that you can't (& would have to use the plugin api), but i haven't tried to script irssi in a decade |
| 17:04:43 | <thuban> | (er, by "channel interaction" i mean join/names/part; obviously you can send to a channel) |
| 17:06:14 | <pekster> | I don't think so, unless I've missed something. /exec will either print (privately to the client) in the active window, or to a target, be it the active window or a named nick/channel. Scripting joins would need lower level code or a script plugin. |
| 17:06:42 | <pekster> | In my case, all I've done so far is send messages to chanserv/nickserv, and I don't even parse them until later from logs. As I said, very low-brow scripting, but it's enough to harvest some data. |
| 17:09:09 | <pekster> | For nick harvesting you really want a context-aware bot or script interface that can move through a list of channels (ideally in size and/or importance preference to start scans partway through generation) and 1) join 2) collect nicks 3) dump that somewhere 4) part 5) pause 6) repeat. |
| 17:10:15 | <grawity> | tbh |
| 17:10:17 | <pekster> | And while that'll certainly catch more users, many of note have already left, so I question its value, but am happy to provide interim work (lists & code) sooner if someone had interest in making it happen. |
| 17:10:33 | <grawity> | if you intend on collecting this, I'd say don't forget the channel creation time in /mode |
| 17:11:03 | <pekster> | Already got the registered date in the chanserv 'info' queries. |
| 17:11:15 | <pekster> | That's for the entier ALIS & the /list I was given last night. |
| 17:11:16 | <grawity> | hmm |
| 17:11:19 | <pekster> | entire* |
| 17:11:32 | | HackMii quits [Ping timeout: 258 seconds] |
| 17:12:09 | <pekster> | eg, a completely random sample from what happened to be on my screen a moment ago: Registered : Apr 03 10:14:08 2014 (7y 10w 4d ago) |
| 17:12:12 | <grawity> | I mean some channels got registered a few years after they were created |
| 17:12:26 | <pekster> | Ah, true. |
| 17:12:46 | <grawity> | e.g. "Registered : Aug 21 11:12:33 2013" vs "Channel created on Wed, 14 Apr 2010 03:58:39" |
| 17:13:13 | <grawity> | (I already dropped this one earlier in the morning, it was kind of the last straw) |
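For later parsing of the logged replies, the `Registered` lines quoted above can be turned into timestamps with a sketch like this one. It assumes the format is exactly as shown in the two samples (`Mon DD HH:MM:SS YYYY`) and that the month abbreviations are English; the timezone is not stated in the output, so the result is a naive `datetime`. `parse_registered` is a hypothetical helper name.

```python
import re
from datetime import datetime
from typing import Optional

# Matches e.g. "Registered : Apr 03 10:14:08 2014 (7y 10w 4d ago)".
REGISTERED = re.compile(
    r"Registered\s*:\s*(\w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4})"
)


def parse_registered(line: str) -> Optional[datetime]:
    """Return the registration time from a services info line, or None if absent."""
    m = REGISTERED.search(line)
    if m is None:
        return None
    # %b is locale-dependent; this assumes an English/C locale.
    return datetime.strptime(m.group(1), "%b %d %H:%M:%S %Y")
```

The "Channel created on ..." line uses a different layout and would need its own pattern.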
| 17:15:20 | | HackMii (hacktheplanet) joins |
| 18:04:13 | <thuban> | ok, i have a channel-joiner working |
| 18:06:05 | <thuban> | unfortunately the weechat plugin api has a rather hairy attitude toward callbacks, and i ended up attaching it to a timer rather than incorporating a random sleep :X |
| 18:06:26 | <thuban> | what do you guys think i should set the interval to |
| 18:06:50 | <grawity> | oh I thought you were using irssi |
| 18:06:52 | | Sylirana quits [Ping timeout: 244 seconds] |
| 18:07:10 | | Sylirana (Sylirana) joins |
| 18:07:31 | <thuban> | not me, just pekster. i have a lot more experience scripting weechat, so i thought, if i want to get this done today... |
| 18:08:14 | <grawity> | if it's old freenode then just go with 1s or whatever, nobody is going to care anymore |
| 18:09:07 | <thuban> | 1s it is |
| 18:11:01 | <grawity> | is /names going to be just a source of user lists? |
| 18:11:43 | <thuban> | yeah |
| 18:11:48 | <grawity> | ok |
| 18:27:32 | <grawity> | I thought I'd get some logs from my backups but ugh, Borg really doesn't deal well with `ls */home` |
| 18:39:12 | | C4K3 quits [Remote host closed the connection] |
| 18:48:43 | | Sylirana quits [Ping timeout: 244 seconds] |
| 18:49:32 | <IDK> | is ia80* and 90* down |
| 18:53:44 | <IDK> | I need a wayback machine of wayback machine |
| 19:01:34 | | LeGoupil quits [Client Quit] |
| 19:03:15 | <@EggplantN> | No you don’t |
| 19:03:15 | <@EggplantN> | sir |
| 19:03:42 | | qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds] |
| 19:06:35 | | DogsRNice (Webuser299) joins |
| 19:07:45 | <IDK> | https://monitor.archive.org/weathermap/weathermap.html |
| 19:07:51 | <IDK> | A lot have 0 traffic |
| 19:12:36 | <@EggplantN> | They do |
| 19:12:48 | <@EggplantN> | But that’s only what the weathermap says |
| 19:15:18 | <rewby> | I don't trust that the weathermap is accurate |
| 19:15:31 | <rewby> | Given that the cacti graphs appear to be 50x-ing |
| 19:47:24 | <IDK> | I think docker changed their color for the container |
| 19:50:33 | | rbraun joins |
| 19:58:55 | | kilo4 joins |
| 19:59:12 | <kilo4> | hi, website is going down 18th june, and i would like to archive it, but it needs a login (which i have). can someone help? |
| 19:59:41 | <@EggplantN> | Firstly whats the site |
| 19:59:48 | <kilo4> | thesource2.to |
| 20:00:06 | <@EggplantN> | shutdown notice? |
| 20:00:33 | <@JAA> | 'on THE SOURCE legit sellers can sell real exclusive unreleased music and serious buyers can purchase them.' |
| 20:00:43 | <@JAA> | Uhh yeah, I'm sure that's all very legal. |
| 20:01:22 | <kilo4> | it's not but i really would like to archive it |
| 20:02:15 | <kilo4> | https://imgur.com/a/KsiXjtI here the notice |
| 20:02:41 | <@JAA> | Direct link: https://i.imgur.com/GWf4WAj.png |
| 20:02:48 | <IDK> | kilo4: Is that in a DM |
| 20:02:56 | <IDK> | or email |
| 20:03:12 | <kilo4> | it's basically a forum |
| 20:03:39 | <IDK> | Can we have the direct link for the forum post |
| 20:03:44 | <IDK> | kilo4 |
| 20:03:50 | <kilo4> | no, you need a login |
| 20:04:07 | <@arkiver> | if it needs login - no we're not archiving it |
| 20:04:26 | <kilo4> | ok, thanks anyways |
| 20:04:40 | <thuban> | you can make an archive for your personal use with https://github.com/archiveteam/grab-site |
| 20:04:47 | <IDK> | Bot / Projects cannot archive login restricted Data |
| 20:04:50 | <@arkiver> | we archive only public data |
| 20:05:04 | <thuban> | but it won't go in the wayback machine, etc |
| 20:05:28 | <kilo4> | okok, no problem i just wanted to have it saved somewhere locally or online |
| 20:05:49 | <kilo4> | https://github.com/archiveteam/grab-site works with login? |
| 20:06:04 | <@JAA> | You can give cookies to grab-site, then it does. |
| 20:06:08 | <thuban> | yes, ctrl-f "Website requiring login / cookies" for instructions on how to get it working |
| 20:06:24 | <@JAA> | https://github.com/archiveteam/grab-site#website-requiring-login--cookies |
| 20:06:36 | <thuban> | oh, right :) |
| 20:06:39 | <@JAA> | :-) |
| 20:07:03 | <kilo4> | thank you guys |
| 20:07:14 | <kilo4> | does it work on wsl? |
| 20:07:28 | <thuban> | https://github.com/archiveteam/grab-site#install-on-windows-10-experimental |
| 20:09:14 | <kilo4> | thank you, damnnn y'all fast replying ahahahh |
| 20:12:38 | <kilo4> | another question, can i bulk submit links to wayback machine? |
| 20:14:01 | <@JAA> | You can't really nowadays, I think. They severely rate-limit Save Page Now because it's constantly overloaded. |
| 20:14:50 | <@JAA> | If it's something within our scope, we can run it through ArchiveBot though. |
| 20:16:10 | <kilo4> | it's archivebot private right? |
| 20:16:23 | <thuban> | hm? |
| 20:17:21 | | HP_Archivist (HP_Archivist) joins |
| 20:17:36 | <kilo4> | archivebot is just for the archiveteam team ? |
| 20:17:42 | <kilo4> | or is it public |
| 20:17:50 | <@EggplantN> | ArchiveBot requires op/voice to use it here. The code is public however |
| 20:17:55 | <@EggplantN> | if you wish to run your own |
| 20:18:57 | <kilo4> | ok thank you bye thanks for your help :) |
| 20:20:04 | | kilo4 quits [Remote host closed the connection] |
| 20:24:43 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 20:25:10 | | HP_Archivist (HP_Archivist) joins |
| 20:32:56 | <thuban> | whoops, my freenode crawler is now getting refused connections |
| 20:33:15 | <russss> | apparently it's gone |
| 20:33:21 | <russss> | he killed "Freenode Classic" |
| 20:33:38 | <rewby> | I'm connected? |
| 20:33:45 | <thuban> | we're aware; i was talking about some of the old servers which are still providing services |
| 20:33:54 | <rewby> | I'm talking about classic too |
| 20:34:07 | <rewby> | Currently logged in to datapacket.freenode.net |
| 20:34:09 | <russss> | datapacket.freenode.net seems up still |
| 20:34:44 | <thuban> | are they working with ssl? (some channels require it) |
| 20:35:02 | <rewby> | Yep |
| 20:35:03 | <pekster> | datapacket up (for now) and I think ace, but only by IP: 104.129.24.66. |
| 20:36:03 | <thuban> | and it's back, thanks guys |
| 20:36:08 | <pekster> | I finished my full userscan shortly before happytree went down, which was anything on the founder + access lists from chanserv using the combined ALIS output I took plus the /list channels I was sent. I could reload the nickserv info query, but at this point I don't trust any of the remaining servers to remain up or have stable services for much longer. |
| 20:36:47 | <rewby> | I fully expect freenode classic to be offline by the time I wake up tomorrow |
| 20:36:50 | <pekster> | To do anything remotely useful, I think I'd have to get a new user from what the crawler has, then de-dupe against the lists I've already run. |
| 20:37:17 | <russss> | seems like he's threatened to shut it down about 4 times today |
| 20:37:43 | <thuban> | i didn't bother implementing very smart error handling, lol, but i can just retry all the failures |
| 20:37:59 | <pekster> | Well, at this point it would just be to catch "more" stuff that wasn't on the founder+access list. |
| 20:38:23 | <thuban> | pekster: want the ~7k nicks i already have? |
| 20:38:45 | <pekster> | Sure, I'll run it through my de-duper and can fire it up on at least 2 VPSes. |
| 20:38:59 | <thuban> | sounds good |
| 20:39:00 | <pekster> | Worst case the servers go down a few dozen in, but maybe it'll rack up hundreds/thousands more. |
| 20:40:03 | <thuban> | a wrinkle: i wrote this in a way which i realized belatedly didn't disambiguate prefixes |
| 20:40:30 | <pekster> | prefixes? Like the leading symbols? That's trivial to handle. |
| 20:40:55 | <pekster> | I have to run this through a de-duplication so I don't waste time querying what I have, so I can just regex any surrounding junk off. |
| 20:41:11 | <pekster> | Already have the de-dupe script, so adding a PCRE is easy ;) |
| 20:42:35 | <thuban> | er, yes and no? i'm not aware of a canonical list for freenode, and i know a lot of networks didn't follow the nick spec (permitting more characters, etc) |
| 20:43:34 | <pekster> | I don't catch your meaning of "disambiguate prefixes" then. If you mean something like @opedNick or +voicedNick, I can remove those. |
| 20:45:12 | <pekster> | I'm doing nothing fancy, but can easily parse things before de-duping in the loops. Trivial print-uniques-from-2nd-file script: https://pastebin.com/kMkLyjwK |
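A rough Python equivalent of the print-uniques-from-2nd-file step, with the prefix stripping mentioned above folded in. The pastebin script is not reproduced in this log, so this is not that script. Treating only `@` and `+` as prefixes and comparing nicks with plain lowercasing are assumptions (IRC servers may use a different case mapping), and `strip_prefix` and `new_entries` are hypothetical names.

```python
from typing import Iterable, List

PREFIXES = "@+"  # assumption: only op and voice markers need stripping


def strip_prefix(nick: str) -> str:
    """Remove leading channel-status symbols such as @ or + from a nick."""
    return nick.lstrip(PREFIXES)


def new_entries(already_seen: Iterable[str], candidates: Iterable[str]) -> List[str]:
    """Return candidates not present in already_seen, without duplicates, in order."""
    seen = {strip_prefix(n.strip()).lower() for n in already_seen}
    out = []
    for raw in candidates:
        nick = strip_prefix(raw.strip())
        key = nick.lower()
        if nick and key not in seen:
            seen.add(key)
            out.append(nick)
    return out
```

Feeding the first list from one file and the candidates from another reproduces the two-file workflow, and only the returned nicks need to be queried.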
| 20:45:45 | <rbraun> | pekster: did you at least get a lot of chanserv info? |
| 20:46:09 | <pekster> | Yup, tons of it, and I seeded userlists from the chanserv founder+access. Thousands of both. |
| 20:46:37 | <pekster> | I've got to write a parser and remove some dupes (I forgot the -s to sort once, so queried a bunch of dupes on 1 run.) Then parse that into a sqlitedb to make the data marginally useful. |
| 20:49:35 | <rbraun> | cool! |
| 20:49:51 | <rbraun> | i assume you trawled alis for channel names too? |
| 20:50:15 | <thuban> | i mean, i'm not sure what other characters might have been used as prefixes. eg: there seems to be a lot of '^', which isn't allowed per spec, but it's a usage i'm not at all familiar with. without either a whitelist of known freenode prefixes or a blacklist of known characters not allowed in freenode nicknames, i can't _guarantee_ parsing that or possible other unusual |
| 20:50:18 | <thuban> | prefixes correctly |
| 20:50:25 | <pekster> | That's how I started, though I only went down to -min 10 (>=10 users) on my 6/11 scan. thuban was kind enough to provide me his /list from earlier, so I de-duped & scanned that too, then users from all of them. |
| 20:51:23 | <thuban> | (sorry for slow responses; running this on my main instance is maybe not the best decision i've ever made...) |
| 20:52:27 | <pekster> | Oh, micro-tier VPS? Yea, I'm using GCP+AWS stuff here, though irssi is pretty thin compared to some clients. It's about not to matter anyway for the extra users as there's only a single client-node left, so what I got before is likely all I'm going to get at this point. |
| 20:52:45 | <thuban> | 15:52:15 freenode -!- datapacket.freenode.net: Server Terminating. rasengan[~rasengan@freenode/staff/rasengan] |
| 20:52:47 | <thuban> | rude |
| 20:53:30 | <pekster> | Yup, there it goes. Game over, but I'm happy my original lists finished earlier without disconnect or netsplit/services disruption. |
| 20:53:36 | <thuban> | pekster: https://transfer.archivete.am/3rJhf/freenode_nicks.txt here's what i had... though that may be academic at this point. (prefixes unstripped.) |
| 20:53:52 | <rewby> | And classic. Now points to the main new network |
| 20:54:19 | <pekster> | Thanks, appreciate the work anyway. I'll grab it just in case in 2 days it's back online after another Lee-inspired decision, but I think it's moot now. |
| 20:54:25 | <@JAA> | thuban: The most rude thing here is that your timestamps aren't UTC. :-) |
| 20:54:31 | <rbraun> | lol |
| 20:55:13 | <thuban> | wow hurtful ;_; |
| 20:56:17 | <thuban> | (https://github.com/weechat/weechat/issues/886 hm) |
| 20:56:36 | <@JAA> | export TZ=UTC |
| 20:56:39 | <@JAA> | There, fixed it. :-) |
| 20:57:22 | <thuban> | i want local time in my display! this may surprise you but i have friends i know in meatspace |
| 20:57:35 | | Doranwen quits [Remote host closed the connection] |
| 20:57:55 | | Doranwen (Doranwen) joins |
| 20:58:34 | <@JAA> | Just messing with you. :-) I also have a clock on my screen in local time. All systems are configured to UTC though. Anyway, getting too off-topicky. |
| 21:00:50 | <pekster> | I'll continue to lurk, but will post both my raw chanserv/nickserv data (once I nuke the dupes due to rushed commands) & perhaps more usefully RDBMS files of the data. May take a day or several depending on #LifeStuff. |
| 21:01:39 | <thuban> | thanks again :) |
| 21:17:30 | <h3ndr1k> | Let's just invent ArchiveTeam Coin and People will throw storage space at us. Like with chia, but we could actually use it for something... |
| 21:27:09 | <nico> | 23:40 -!- Your host is capone.freenode.net[104.237.198.130/6667], running version ircd-seven-1.1.9 |
| 21:27:38 | <nico> | 23:40 -!- NickServ [~fuckyou@198.52.153.176 |
| 21:27:39 | <nico> | lol |
| 21:27:40 | <pekster> | I don't think that's linked to services, from the Libera/##freenode chatter. |
| 21:27:43 | <pekster> | Yea. |
| 21:28:50 | <thuban> | are all the servers with old services definitely gone, then? |
| 21:29:13 | <nico> | trying the list one by one |
| 21:29:23 | <nico> | a lot have the irc port down |
| 21:34:45 | <nico> | done the whole list, everything points to the new freenode |
| 21:35:36 | <rbraun> | try 6667 and 6697 both too, but i can't find any that work |
| 21:35:52 | <rbraun> | h3ndr1k: that sounds good... |
| 21:36:15 | <rbraun> | they somehow lost the ability to log into capone iirc, so they delinked it |
| 21:36:27 | <nico> | lol |
| 21:36:59 | <nico> | and nobody could do an /oper then /quote die? |
| 21:37:08 | <rbraun> | idk |
| 21:37:09 | <rbraun> | i mean |
| 21:37:12 | <rbraun> | they are clueless, maybe they could have |
| 21:56:04 | | sonick_ (sonick) joins |
| 21:56:21 | | sonick quits [Remote host closed the connection] |
| 21:56:21 | | sonick_ is now known as sonick |
| 22:15:36 | | nico quits [Quit: leaving] |
| 22:21:10 | | nico joins |
| 22:42:06 | <berndj> | i used to connect on port 8001 |
| 23:07:28 | | BlueMaxima joins |
| 23:13:04 | | MrRadar quits [Quit: Rebooting] |
| 23:15:42 | | MrRadar (MrRadar) joins |
| 23:21:15 | | raftl joins |
| 23:21:43 | <raftl> | hey, im having an error installing grab-site on lubuntu |
| 23:23:00 | <raftl> | and yes it's me from hours ago i just forgot what my username was |
| 23:23:17 | <@JAA> | Welcome back, 'kilo4'. :-) |
| 23:23:36 | <raftl> | https://i.imgur.com/73iYJaF.png |
| 23:23:40 | <raftl> | this is the error i'm having |
| 23:23:58 | <raftl> | it's in the last step of the installation |
| 23:25:12 | <@JAA> | Hmm, interesting. |
| 23:25:17 | <raftl> | https://i.imgur.com/YLqWh90.png at the end |
| 23:25:18 | <thuban> | it looks like you haven't included the full error message. you should log the entire output to a file and link us to it |
| 23:25:27 | <thuban> | oh |
| 23:25:28 | <raftl> | ok |
| 23:25:36 | <@JAA> | cryptography requires wheel for building, but that is listed in pyproject.toml, so it should be handled automatically. |
| 23:26:02 | <@JAA> | In any case, try `.../pip install wheel` I think. |
| 23:26:17 | <@JAA> | And then again the command from the install instructions. |
| 23:26:24 | <raftl> | ok |
| 23:26:27 | <thuban> | (btw do we have a recommended pastebin? afaik transfer.archivete.am doesn't do inline for text files) |
| 23:26:54 | <@JAA> | thuban: It does. Just insert 'inline/' before the file ID. |
| 23:27:32 | <@EggplantN> | paste.ee is cool, run by a friend (cats) |
| 23:28:18 | <thuban> | JAA: huh, i've tried that in the past (with a .sql file) and not had it work. extension-based? |
| 23:28:57 | <raftl> | i installed wheel and even updated pip and same error |
| 23:29:10 | <@JAA> | Yeah. Or well, whatever MIME type it detects. It works for text/plain and some others, possibly text/*? Not sure. |
| 23:31:19 | <raftl> | here are the logs |
| 23:31:19 | <raftl> | https://paste.ee/p/SmXTi |
| 23:32:40 | <@JAA> | Uh, that output indicates that you don't have wheel installed nor updated pip. Or is that from before? |
| 23:33:18 | <raftl> | its from now |
| 23:33:21 | <raftl> | after |
| 23:34:14 | <@JAA> | You ran `~/gs-venv/bin/pip install wheel` and `~/gs-venv/bin/pip install --upgrade pip` ? |
| 23:34:19 | <raftl> | https://paste.ee/p/Ql9Yq |
| 23:34:31 | <raftl> | ohhh ill try wait |
| 23:34:42 | <@JAA> | Yeah, it needs to be in that venv. |
| 23:36:22 | <raftl> | no such file or directory |
| 23:37:30 | <raftl> | apparently "pip" isnt a directory |
| 23:38:06 | <raftl> | ~/gs-venv/bin/pip |
| 23:38:17 | <@JAA> | No, it's an executable file. But I don't really understand what you're doing. That's the path you used according to your other paste. |
| 23:39:09 | <@JAA> | EggplantN: Meh, '[email protected]' bullshit from Cloudflare. |
| 23:40:22 | <raftl> | i mean, it worked with wsl but i had some errors when archiving so i tried on lubuntu |
| 23:41:18 | <raftl> | https://paste.ee/p/H16uw this is the error on windows wsl |
| 23:41:36 | <raftl> | i'd prefer using wsl instead if it weren't for the error |
| 23:42:51 | <@JAA> | Not particularly surprised by that, but I have no experience with the dupes DB. |
| 23:43:42 | <raftl> | windows 10 is experimental so yea i'm not worried about it |
| 23:44:20 | <raftl> | well, i have to sleep |
| 23:44:21 | <raftl> | d |
| 23:44:31 | <raftl> | how do i save this login? |
| 23:47:07 | | yanome quits [Quit: The Lounge - https://thelounge.chat] |
| 23:47:17 | | yanome (yano) joins |
| 23:47:45 | | raftl leaves |
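
JAA's tip at 23:26:54 (insert 'inline/' before the file ID of a transfer.archivete.am link) can be sketched as a small URL rewrite. This is only an illustration: the function name `inline_url` is made up here, and it assumes the file ID is the first path segment, as in the link thuban posted at 20:53:36. Whether the server then actually renders the file inline depends on the MIME type it detects, as discussed above.

```python
from urllib.parse import urlsplit, urlunsplit


def inline_url(url: str) -> str:
    """Return the URL with 'inline/' inserted before the file ID.

    Hypothetical helper: assumes the path looks like /<file ID>/<filename>.
    """
    parts = urlsplit(url)
    # Leave URLs that already carry the inline prefix untouched.
    if parts.path.startswith("/inline/"):
        return url
    # Prepend the 'inline' segment to the existing path.
    return urlunsplit(parts._replace(path="/inline" + parts.path))


if __name__ == "__main__":
    print(inline_url("https://transfer.archivete.am/3rJhf/freenode_nicks.txt"))
```

The rewrite is idempotent, so passing an already-inline link through it again leaves it unchanged.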