| 07:17:40 | | dunger quits [Ping timeout: 265 seconds] |
| 09:12:01 | | chrismeller (chrismeller) joins |
| 10:09:23 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 10:20:56 | | Minkafighter quits [Client Quit] |
| 10:22:33 | | Minkafighter joins |
| 13:15:28 | | Barto quits [Ping timeout: 265 seconds] |
| 13:15:43 | | Barto (Barto) joins |
| 13:46:33 | | chrismeller quits [Client Quit] |
| 14:50:30 | <@phuzion> | Hey Ryz, you around? |
| 14:50:41 | <Ryz> | Yo, what's up? o: |
| 14:50:58 | <@phuzion> | Question for ya. Are you interested in adding shorteners to the tracker? |
| 14:51:34 | <Ryz> | I would prefer that, rather than having to wait for my requests to be taken up~ o.o; |
| 14:51:45 | <Ryz> | At least self-service kind of thing |
| 14:52:22 | <Ryz> | Alternatively, I was gonna ask if someone can add the URL shorteners, then I would add 'em to the wiki as encouragement |
| 14:52:29 | <@phuzion> | So we really only have one way to allow people to do that, and that's by giving them a tracker admin account. Is that something you'd be interested in? |
| 14:55:29 | <Ryz> | That is something I have interest in, since from time to time, I've been posting these URL shorteners here and tried to add 'em to the wiki, but there hasn't been much activity so I shied away from doing a lot of diggery |
| 14:55:47 | <@phuzion> | Alright cool. |
| 14:56:01 | <@phuzion> | Do you have Discord? It would probably be easier to explain the admin UI over a voice call |
| 14:56:41 | <Ryz> | I did it again because I decided to dig into a random set of lists from #// and found some URL shorteners~ |
| 14:57:10 | <Ryz> | Uhh, I think I'd rather have it in text form, because I'm not as strong when learning via voice or speech~ |
| 14:57:19 | <@phuzion> | Ok, no problem. |
| 14:57:34 | <Ryz> | Would not hesitate to ask a bunch of questions for more clarity |
| 14:58:50 | <@phuzion> | Ok, so I just DM'd you your credentials. |
| 14:59:22 | <@phuzion> | If you'd like to change them, you have the power to do so in the admin UI. However, because your account is an admin account, please keep the password secure. |
| 15:00:26 | <@phuzion> | Ok, so there are really three things that you can do that affect the tracker. Add/change projects (shorteners), ban clients, and handle error reports. |
| 15:01:55 | <@phuzion> | Handling error reports is easy. If people are complaining that the tracker is full of error reports and people aren't getting new jobs, you can delete the error reports under the "Error reports" link and then click the "Delete all" button at the bottom of the page. Don't just incessantly do this if it keeps filling up though, because it could be an indicator of an issue with the project. |
| 15:02:16 | <@phuzion> | These haven't been filling up as much lately, so it's not something you should need to do often. |
| 15:02:44 | <@phuzion> | However, if you're going to be adding new projects, you may periodically run into an issue where the error reports are filling up because of a misconfiguration on a new project. |
| 15:04:05 | <@phuzion> | If you do notice that all of the error reports are filling up on a single project, you can open the project from the "Projects" link and disable it there. If you do so on a bigger project (yahoo, goo-gl, etc), make a note in here so that the rest of us can take a peek into what's going on with it. |
| 15:05:18 | <@phuzion> | Banning clients is very rare, it's mostly a feature that exists just in case. If someone tampers with their urlteam client and makes it report nothing but their onlyfans or something dumb like that, we can ban their client. In general, I'd ask for advice in here from others before jumping to a client ban. |
| 15:05:29 | <Somebody2> | Yay, a new tracker admin is born! Sorry I've been totally absent for so long, but glad to see we're getting a new person! |
| 15:05:51 | <@phuzion> | Somebody2: Yeah, JAA and I discussed it the other day. |
| 15:06:01 | <Somebody2> | nice |
| 15:06:09 | <Somebody2> | I also trust Ryz |
| 15:06:24 | <@phuzion> | Glad they get a :+1: from you as well. :) |
| 15:07:51 | <@phuzion> | Ok, so, now the tab that you're probably gonna be spending the most time in, Projects. This is where you add shorteners, and modify settings on existing shorteners. When you're adding a new shortener, you'll want to name it basically "something-tld", so if the shorturl is https://foobar.xyz, you would name it foobar-xyz. Obviously check and make sure that the project doesn't already exist before adding it. |
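The "something-tld" naming convention described above can be sketched in a few lines of Python. This is only an illustration: `project_name` is a hypothetical helper, not part of the tracker, and it assumes the project name is simply the hostname with dots replaced by hyphens.

```python
from urllib.parse import urlsplit


def project_name(short_url: str) -> str:
    """Derive a 'something-tld' style project name from a shortener URL.

    Hypothetical helper: the tracker itself just takes whatever name
    the admin types in.
    """
    host = urlsplit(short_url).hostname or ""
    return host.lower().replace(".", "-")


print(project_name("https://foobar.xyz"))
```

Checking that the resulting name does not already exist in the Projects list is still a manual step.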
| 15:09:59 | <@phuzion> | Ryz: Everything make sense so far? |
| 15:10:35 | <Ryz> | Trying to take it all in, re-reading the parts to make sure I understand |
| 15:10:47 | <Ryz> | Since well, it's a different process from doing ArchiveBot stuff all these years |
| 15:11:11 | <@phuzion> | Right. |
| 15:11:41 | <Ryz> | When I see the Projects section, I thought there would be a lot more URL shorteners, until realizing the projects can have multiple URL shorteners to cover |
| 15:11:58 | <@phuzion> | We have a LARGE backlog of shorteners to add. |
| 15:12:40 | <Ryz> | Oh my goodness, when was the last time this was worked on? oo;;; |
| 15:12:54 | <@phuzion> | I just added one today to re-familiarize myself with the project lol |
| 15:13:26 | <@phuzion> | But I haven't done much with it in a while. |
| 15:14:07 | <Somebody2> | probably a couple years ago |
| 15:14:18 | <Somebody2> | AFAIK, it's been ticking along quietly the whole time |
| 15:14:26 | <Somebody2> | but without *new* shorteners being added |
| 15:14:38 | <@phuzion> | Yeah, afaik we've only got like 5-6 shorteners enabled right now. |
| 15:15:10 | <Somebody2> | most of the existing ones we've already gone thru |
| 15:15:23 | <Somebody2> | it's a long tail sort of situation -- |
| 15:15:29 | <@phuzion> | Yeah. |
| 15:15:33 | <Somebody2> | bit.ly is gigantic and we probably won't ever finish |
| 15:15:54 | <Somebody2> | then there are hundreds of ones with only 4 character shortcodes that we go thru in a few hours |
| 15:17:04 | <@phuzion> | Ryz: Ok, so here's what I want you to do for your first project. Pick a shortener. Figure out what type of redirection it has for a valid shorturl, and an invalid shorturl, and demonstrate to me how you figured those things out. Think you can do that? |
| 15:17:38 | <Ryz> | Gonna try a really simple URL shortener, let's see from what I posted above... |
| 15:17:49 | <@phuzion> | I've already added amp.gs, so not that one. |
| 15:17:54 | <@phuzion> | And I added t.ly from the wiki page. |
| 15:18:11 | <Ryz> | http://amp.gs/ would be one, http://amp.gs/jpWAq with 5 character letters |
| 15:18:19 | <Ryz> | Oh, heheh |
| 15:19:32 | <Ryz> | The next would be https://stuf.in/ - sample URL: https://stuf.in/b95jak - then |
| 15:20:38 | <@phuzion> | And what type of redirection does that shortener use? |
| 15:22:18 | <Ryz> | Uhh, how can I identify the type of redirection? |
| 15:22:49 | <Ryz> | Tried to do view-source:http://amp.gs/jpWAq first to see if it's that other kind of redirect |
| 15:23:15 | <@phuzion> | In general, it's an HTTP status redirect. The way I find it is by using `curl -v https://stuf.in/b95jak` and look for HTTP 301 or HTTP 302 or something. |
| 15:24:27 | <@phuzion> | You can use wheregoes.com if you don't have curl, or if you find it to be easier to use a web tool. https://wheregoes.com/ |
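For anyone who would rather script the check than read `curl -v` output, here is a minimal Python sketch that does roughly the same thing: it sends a single request, does not follow the redirect, and reports the status code and `Location` header. The function name `probe` and its defaults are assumptions made for this example; it uses only the standard library's `http.client`.

```python
import http.client
from urllib.parse import urlsplit


def probe(url: str, method: str = "HEAD", timeout: float = 10.0):
    """Send one request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request(method, path)
        resp = conn.getresponse()
        resp.read()  # drain the (possibly empty) body before closing
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Example call (performs a real network request, so it is left commented out):
# status, location = probe("https://stuf.in/b95jak")
```

A 301 or 302 with a `Location` header is the "standard" case discussed here; trying a gibberish shortcode the same way shows what the shortener returns for an invalid shorturl.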
| 15:25:05 | <Ryz> | Okay, seeing that, it is a 301 Redirect |
| 15:26:12 | <@phuzion> | And what happens if you try to visit an invalid shorturl? |
| 15:26:48 | <Ryz> | ...A 200 followed by a 200, uhh~ |
| 15:27:07 | <Ryz> | I did something gibberish like https://stuf.in/sdffggggrwrfsfef |
| 15:27:33 | <@phuzion> | Awesome. So based on what you’ve found, it looks like this is a fairly standard shortener. |
| 15:28:04 | <@phuzion> | (I’ve been verifying everything you’ve been saying so I know it’s correct by the way) |
| 15:28:29 | <@phuzion> | So, the first step is to create the project in the admin UI. What are we gonna call this project? |
| 15:28:34 | <Ryz> | Ah, I see I'm veering towards similar territory where I might encounter really bullshit URL shorteners like I've been encountering in ArchiveBot, finding bizarre websites that trip up ArchiveBot from time to time <#>; |
| 15:29:10 | <Ryz> | Based on the naming scheme on http://amp.gs/ with amp-gs - with https://stuf.in/ - I'll go with stuf-in |
| 15:29:39 | <Ryz> | Gonna type it in~ |
| 15:29:50 | <@phuzion> | Perfect. Go ahead and create that project and let me know when you’ve done that. Just create it, but don’t make any changes once you’ve created it. |
| 15:30:14 | <Ryz> | I do notice '-' was used in place of '.' - not all of 'em, some don't have '-' |
| 15:30:23 | <Ryz> | Okay, I created the 'stuf-in' entry |
| 15:31:07 | <Somebody2> | Yeah, some of the early ones didn't use the - for . convention |
| 15:31:59 | <@phuzion> | Ok, great. Now, go to shortener settings |
| 15:32:40 | <Ryz> | Okay, I'm in 'Shortener Settings'; whoa, looks like 12 things to be able to fill in Oo; |
| 15:32:56 | <@phuzion> | For the most part, you can generally ignore "minimum library version" and "minimum pipeline version". Those generally tend to only get messed with if we are working on a custom shortener that does something strange like javascript or something that requires client-side work. |
| 15:33:20 | <Ryz> | Mm, which I'll need help for when I encounter something BS like that >#<; |
| 15:33:57 | <@phuzion> | Alphabet, this is the list of characters (case sensitive) that can be supplied as potential shorturls. So if you have a shortener where the shorturls are basically https://short.url/123 https://short.url/548 https://short.url/12612 https://short.url/123521 https://short.url/662323, you could change the alphabet to 1234567890 |
| 15:34:15 | <@phuzion> | If you know that the alphabet is lowercase only, you can remove the capitals, etc. |
| 15:35:14 | <Ryz> | Aaaah, would need to find a way to see if it uses any other character letters or there are ones that need to be removed~ |
| 15:35:30 | <@phuzion> | URL template is where you change the actual shorturl in the project. I'd recommend checking if the shortener supports https, and if it does, putting that into the shorturl template, because that would save one redirect for each attempt if they automatically redirect you from HTTP to HTTPS. |
| 15:35:44 | <@phuzion> | So what do you think you should enter for the URL template here? |
| 15:36:19 | <Ryz> | Before moving on to that, for me, one way of checking a sampling of available used URLs is https://web.archive.org/web/*/https://stuf.in/* |
| 15:36:35 | <@phuzion> | That's a decent way to do it, sure. |
| 15:36:50 | <@phuzion> | Another thing you could do is try creating a few URLs using their interface if it's public and see what URLs they give you back. |
| 15:37:00 | <Ryz> | There's no '_', but then again, that function got nerfed from 200,000 max shown links to a mere 10,000 links... |
| 15:37:15 | <Ryz> | Or I think it was previously 100,000 links |
| 15:37:58 | <Ryz> | Checking https://stuf.in/ - it both supports HTTPS and HTTP - for some reason, HTTP doesn't redirect to HTTPS |
| 15:39:24 | <Ryz> | I guess in that case, in the 'URL template' entry, changing from 'http://example.com/{shortcode}' (without the ') to 'https://stuf.in/{shortcode}' |
| 15:39:32 | <@phuzion> | Yep. |
| 15:39:50 | <@phuzion> | I'd recommend still going with HTTPS because if they DO decide to turn on HTTPS redirection at some point, you'll already have handled that there. |
| 15:40:09 | <Ryz> | Definitely, keeping it on HTTPS |
| 15:42:52 | <@phuzion> | Ok, so we have the URL template set. Ignore the next two, "time between requests" and "http method". You'll know if you need to change those. |
| 15:43:05 | <Ryz> | Mhm o: |
| 15:43:59 | <@phuzion> | Redirect status codes. This is where you tell the client "this is what a successful redirect looks like". We basically keep all of the "HTTP Redirect" options in there by default. Since you found that this shortener uses 301s, we don't need to change anything here. |
| 15:45:50 | <@phuzion> | No redirect status codes. In general, you don't need to change this. If the shortener does something weird, you can change it, but it's not necessary in this case. |
| 15:47:04 | <@phuzion> | Unavailable status codes, this is where that 200 goes. When you type a bad shorturl like "https://stuf.in/jkd2kl3jksldxjk34" it gives you an HTTP 200 basically saying "This isn't a valid URL" Since we've already got that, you can just ignore it and leave it at the default. |
| 15:47:26 | <@phuzion> | Banned, this is what happens if we hit the shortener too aggressively and they decide to say "Fuck you, you're not allowed to hit us." |
| 15:47:43 | <@phuzion> | Content body regular expression and location header reject regular expression aren't things you need to worry about. |
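The four status-code settings walked through above can be pictured as a simple lookup. The Python sketch below is an illustration under assumptions, not the tracker's code: the log only confirms that the redirect list covers the usual HTTP redirects by default, that 200 is a default "unavailable" code, and that 404 is a default "no redirect" code. The banned codes and the exact redirect set here are placeholders.

```python
from typing import Optional

# Placeholder values for illustration; check the admin UI for the real defaults.
REDIRECT_CODES = {301, 302, 303, 307}   # assumed set of "HTTP redirect" codes
NO_REDIRECT_CODES = {404}               # 404 is mentioned as a default in the log
UNAVAILABLE_CODES = {200}               # 200 is mentioned as a default in the log
BANNED_CODES = {403, 420, 429}          # assumption: not stated in the log


def classify(status: int, location: Optional[str] = None) -> str:
    """Map one response to the category a client might report for it."""
    if status in BANNED_CODES:
        return "banned"
    if status in REDIRECT_CODES and location:
        return "redirect"
    if status in NO_REDIRECT_CODES:
        return "no_redirect"
    if status in UNAVAILABLE_CODES:
        return "unavailable"
    # Anything unexpected (for example a 500) would surface as an error report.
    return "error"


print(classify(301, "https://example.org/"))
print(classify(200))
```

Under this picture, adding a code to one of the lists moves responses with that code out of the "error" bucket.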
| 15:47:54 | <Ryz> | Ah, there's a distinct difference between 'No redirect status codes' and 'Unavailable status codes'; thought it sounded the same at first |
| 15:48:37 | <Ryz> | As for "HTTP method (get/head)" entry, how can you tell whether to use one of the two? Since even using https://wheregoes.com/ - it doesn't seem to say whether the action is a 'get' or a 'head' |
| 15:49:04 | <@phuzion> | Almost every shortener is a HEAD. |
| 15:51:07 | <@phuzion> | Ok, so go ahead and save your settings for the shortener and let me know when you've done so, that way I can verify |
| 15:51:17 | <Ryz> | Ah, how can I tell if it's a 'GET'? |
| 15:51:34 | <Ryz> | Ah, hold on, was gonna ask whether to go for it, since the only thing that's changed is the 'URL template' entry |
| 15:52:18 | <Ryz> | I see a blue box saying 'Settings saved.' at the top |
| 15:53:12 | <@phuzion> | Ok, that all looks good. |
| 15:53:40 | <@phuzion> | Now, you'll go to queue settings, and turn on AutoQueue, and hit the blue apply button at the bottom of the page |
| 15:54:37 | <Ryz> | Getting to the 'Queue Settings'; ah, the AutoQueue is a checkbox option that's not filled in right now, |
| 15:54:55 | <@phuzion> | Yep, turn that on and click apply. |
| 15:55:19 | <Ryz> | Hmm, what about having to set up what's the length of the URL shortener code? oo; |
| 15:55:26 | <Ryz> | Or is that after? |
| 15:55:49 | <@phuzion> | We just start at 0, which would be a, b, c, d... aa, ab, ac, etc. |
| 15:55:57 | <Ryz> | Aaah, okay~ |
| 15:56:02 | <Ryz> | Pressing 'Apply' |
| 15:56:13 | <Ryz> | 'Settings saved.' via blue box |
| 15:56:13 | <@phuzion> | A lot of shorteners start there, so we just start there as a default. |
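The "a, b, c, d... aa, ab, ac" ordering can be sketched as follows: generate every one-character code from the alphabet, then every two-character code, and so on, and substitute each into the URL template. This is a simplified model for illustration; the tracker's real sequence numbering may map integers to shortcodes differently, and `shortcodes` is a hypothetical helper.

```python
from itertools import count, islice, product
from typing import Iterator


def shortcodes(alphabet: str) -> Iterator[str]:
    """Yield candidate shortcodes, shortest first, in alphabet order."""
    for length in count(1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)


# Tiny three-letter alphabet so the wrap-around to two characters is visible.
print(list(islice(shortcodes("abc"), 5)))

# Filling the URL template from the shortener settings.
template = "https://stuf.in/{shortcode}"
print([template.format(shortcode=c) for c in islice(shortcodes("abc"), 3)])
```

With a 62-character alphabet the number of candidates grows as 62 to the power of the code length, which is why short-code shorteners finish quickly and very large ones may never finish.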
| 15:56:32 | <@phuzion> | Alright, now click the "Enabled" checkbox, and click the apply button immediately below it. |
| 15:56:51 | <@phuzion> | Once you do that, jobs will be released to the clients, who will start hitting the shortener. |
| 15:56:57 | <Ryz> | Done and done |
| 15:57:04 | <Ryz> | 'Enabled', the blue box says |
| 15:57:18 | <@phuzion> | Alright, now you can click on "Claims" to see the jobs that are released out to clients. |
| 15:57:56 | <Ryz> | Oh, that's a lot more entries Oo; |
| 15:58:04 | <@phuzion> | So, you can see the first queue, lower seq num 0 ranges from 0 to N, so 0, 1, 2, 3, etc, a, b, c, d, etc, A, B, C, D, all the way through to L, M, N. |
| 15:58:28 | <@phuzion> | Each client will hit all of those shorturls, and return results when they're done. |
| 15:58:45 | <@phuzion> | You can click results and watch them come in |
| 15:59:00 | <Ryz> | Definitely wanna see the results, seeing the fruits of my labor O: |
| 15:59:39 | <Ryz> | Oh look, another potlink source for me to mine it and archive it via ArchiveBot xD |
| 16:00:33 | <Ryz> | I'm assuming these URLs get sent to #// ? |
| 16:00:52 | <@phuzion> | The shorturls that we find? |
| 16:01:20 | <Ryz> | The resulting URL from going through the URL shortener link |
| 16:02:03 | <@phuzion> | No, we don't automatically post these URLs anywhere. |
| 16:02:14 | <@phuzion> | They're archived and stored on IA. |
| 16:02:41 | <Ryz> | Oh, I'm surprised it's not being sent to #// for further processing~ |
| 16:04:51 | <Ryz> | Checking the 'Error Reports' section, seeing entries, unsure what to do or leave it alone |
| 16:05:24 | <@phuzion> | We're probably overloading the shortener, I'm gonna edit the settings down a bit. |
| 16:06:13 | <@phuzion> | Alright, I reduced the amount of queues we have, and increased the time between checking URLs. |
| 16:06:43 | <Ryz> | I guess the 500s errors will be retried again at some point |
| 16:08:11 | <Somebody2> | It would be great if someone (you?) wanted to mine the shorturls we find and pick relevant ones to send thru Archivebot |
| 16:08:17 | <Somebody2> | but we haven't done so yet |
| 16:08:45 | <Somebody2> | Also, as for the distinction between HEAD and GET ... |
| 16:08:51 | <Ryz> | Yeah, that's definitely an option, but would wanna focus on the other potlinks I've been building up over the years; so many potlinks x_x; |
| 16:08:57 | <Somebody2> | heheh |
| 16:09:19 | <Somebody2> | HEAD is requesting just the headers from the server, while GET requests the whole page |
| 16:09:31 | <@phuzion> | Anyways, I'm about to step away for the day. Ryz, if you wanna add another shortener, go ahead, maybe have Somebody2 verify that it all looks good before enabling it? |
| 16:09:41 | <Somebody2> | sure, happy to check things over |
| 16:09:55 | <@phuzion> | Somebody2: do you have a tracker admin account? |
| 16:10:00 | <Somebody2> | yep |
| 16:10:04 | <Somebody2> | let me make sure it's working |
| 16:10:05 | <@phuzion> | cool |
| 16:10:08 | <Ryz> | That would be very helpful for someone to check for errors~ >#<; |
| 16:10:13 | <@phuzion> | yeah I can reset your password if needed |
| 16:10:49 | <Somebody2> | yep, it works |
| 16:10:52 | <@phuzion> | cool |
| 16:11:19 | <@phuzion> | Yeah you seem to know your way around the project a bit better than I do, but I figured I'd get ryz up to speed with "tracker admin 101" |
| 16:11:31 | <Somebody2> | and I'm very grateful! |
| 16:12:11 | <Somebody2> | we should always default to trying HEAD first (because it's less data), but there are a few reasons why we might need to switch to GET |
| 16:12:26 | <Somebody2> | 1) The server is weird, and explicitly refuses to respond to HEAD |
| 16:12:46 | <Ryz> | Meanwhile, checking the 'amp-gs' project out of curiosity, checking 'Error Reports', no reports at all, woo o: |
| 16:12:55 | <Somebody2> | 2) The redirection is weird and the target URL is only *present* in the body |
| 16:12:58 | <Ryz> | Unless they're cleaned out... oo; |
| 16:12:58 | <Somebody2> | that's basically it |
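The two reasons for switching to GET can be written down as a small decision rule. The sketch below assumes you have already made a HEAD request for a shortcode you know is valid and have its status code and `Location` header in hand; `needs_get` and the specific "refused" codes (405, 501) are assumptions for this example.

```python
from typing import Optional


def needs_get(head_status: int, head_location: Optional[str]) -> bool:
    """Decide whether a known-valid shortcode should be re-checked with GET.

    Reason 1: the server refuses HEAD outright (405 Method Not Allowed or
              501 Not Implemented are typical ways of saying so).
    Reason 2: HEAD works but carries no Location header, so the target
              URL may only be present in the response body.
    """
    if head_status in (405, 501):
        return True
    if not head_location:
        return True
    return False


print(needs_get(301, "https://example.org/"))
print(needs_get(405, None))
```

If neither condition holds, HEAD stays the better default because it transfers less data.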
| 16:13:20 | <Somebody2> | Not having any error reports isn't that odd. The main question is if there are recent *results* |
| 16:13:32 | <@phuzion> | I cleared the errors on the project that was erroring, Ryz |
| 16:13:42 | <@phuzion> | I think it was stuf-in |
| 16:13:46 | <Ryz> | Aaaah >#<; |
| 16:13:52 | <Ryz> | I was checking 'amp-gs', |
| 16:14:01 | <Somebody2> | and amp-gs is returning data, yes |
| 16:14:36 | <Somebody2> | BTW, you can get directly to the results and error pages for a shortener by clicking on the Found and Scanned columns on the main page |
| 16:14:49 | <Somebody2> | I added that, a while back, because it was pissing me off |
| 16:14:53 | <@phuzion> | Also I checked, and I don't believe a 500 is a banned code for stuf.in, I hammered it with a bunch of requests (while true; do curl -v stuf.in/8x4; done) and it didn't give me a 500. |
| 16:15:00 | <Somebody2> | having to click thru twice |
| 16:15:16 | <@phuzion> | So anyways, I'm gonna step away for the day. Ryz, have fun, and let us know if you have any issues. |
| 16:15:53 | <Somebody2> | See ya, phuzion ! |
| 16:16:01 | <Somebody2> | Ryz: want to try adding another shortener? |
| 16:16:48 | <Ryz> | Yeah, definitely, this time it's going to be http://toi.in/ - which immediately, there's a '_' in the shortened ID |
| 16:17:39 | <Somebody2> | nice -- just put that in the alphabet |
| 16:18:01 | <Ryz> | Mhm, trying to do it in order, and then having to slot it in the wiki afterwards |
| 16:18:46 | <Somebody2> | also, feel free to refactor the wiki page if/when you think of a better way to arrange things |
| 16:18:46 | <Ryz> | Okay, that one up there, since the attempts of doing projects like this in general isn't recorded automatically or done like ArchiveBot ><; |
| 16:19:01 | <Ryz> | New project, toi-in |
| 16:19:05 | <Somebody2> | looking now |
| 16:19:33 | <Somebody2> | I'm turning off ow-ly since it doesn't seem to have returned results in a while |
| 16:19:41 | <Ryz> | Checking in 'Shortener Settings', obviously I have to add the '_' in the alphabet~ |
| 16:20:01 | <Somebody2> | yep |
| 16:20:56 | <Ryz> | Changed URL template to https://toi.in/{shortcode} since HTTPS overpowers HTTP |
| 16:21:08 | <Ryz> | ...I don't know why I got that HTTP link before, probably because a really old link |
| 16:21:19 | <Somebody2> | could be |
| 16:21:35 | <Ryz> | Time to check what kind of URL redirect is this... |
| 16:22:22 | <Ryz> | Hmm, uhh, https://wheregoes.com/trace/20222457585/ |
| 16:22:28 | <Ryz> | It's a 301, then a 301, and then a 200 |
| 16:23:31 | <Ryz> | Then there are URLs like https://toi.in/XpjP_a70+ that redirect to https://toi.in/micron/analytics.html?str=XpjP_a70+ - which seem to be broken, but since the alphabet doesn't have '+', don't need to worry |
| 16:23:32 | <Somebody2> | Ah, just put in the second redirect |
| 16:23:58 | <Ryz> | In the URL template? |
| 16:24:07 | <Somebody2> | i.e. the one with micron in it -- yeah, in the URL template |
| 16:24:15 | <Somebody2> | confirm that it works, though |
| 16:25:10 | <Ryz> | Was able to go through http://toi.in/micron/redirect.html?str=XpjP_a70 fine |
| 16:26:13 | <Ryz> | I was gonna say shouldn't we also get http://toi.in/XpjP_a70 ? Because when sending the stuff via WBM, people would think http://toi.in/XpjP_a70 wouldn't exist while http://toi.in/micron/redirect.html?str=XpjP_a70 is archived |
| 16:26:13 | <Somebody2> | perfect |
| 16:26:23 | <Somebody2> | this doesn't directly go into the WBM |
| 16:26:25 | <Somebody2> | sadly |
| 16:26:48 | <Somebody2> | it just gets saved as basically CSV files that map from the short *code* to the long one |
| 16:27:16 | <Somebody2> | actually saving WARCs of the redirections is another neat project we haven't got to |
| 16:28:00 | <Ryz> | I wonder if that's the reason there hasn't been much activity here to add more and more URL shorteners... s: |
| 16:28:23 | <Somebody2> | quite possible! |
| 16:28:35 | <Ryz> | Okay, so there's a 301 Redirect, now what's the unavailable one... |
| 16:28:50 | <Somebody2> | although I think the biggest reason is just that people don't think saving shorturls is that exciting or interesting |
| 16:29:14 | <Ryz> | https://wheregoes.com/trace/20222457603/ - it just redirects to http://toi.in/micron/error.html - which is a 200 |
| 16:29:40 | <Ryz> | Also https://wheregoes.com/trace/20222457611/ |
| 16:30:16 | <@phuzion> | One thing I’ve kinda mused about was a multi-function browser extension that simultaneously resolves shorturls for live links, and fixes dead shorteners by providing the original link |
| 16:30:20 | <Ryz> | For me, I think it's yet another way for me to find more obscure and odd material, potlink material |
| 16:30:34 | <Somebody2> | what's a potlink, btw? |
| 16:31:03 | <Ryz> | For me, it's my own personal term for finding even more websites and links to look over, hence a pot full of links |
| 16:31:37 | <Ryz> | Think those lists of sections under 'Blogroll' for example |
| 16:31:42 | <Somebody2> | ah, got it |
| 16:32:02 | <Somebody2> | pot 'o links |
| 16:32:18 | <Ryz> | Okay, so there's nothing else to alter, the unavailable status codes is still 200 after checking it |
| 16:32:29 | <Ryz> | Somebody2, check it o: |
| 16:35:09 | <Somebody2> | will do |
| 16:35:54 | <Somebody2> | looks good, open it up |
| 16:36:42 | <Ryz> | 'Open it up'? Like run it? |
| 16:36:46 | <Somebody2> | yep |
| 16:36:57 | <Somebody2> | turn on the auto-queue and enable it |
| 16:37:56 | <Ryz> | Did both of those, one after another |
| 16:39:25 | <Somebody2> | ah, we need to add one more setting to avoid including errors |
| 16:39:37 | <Somebody2> | it looks like it redirects to /micron/error.html on error |
| 16:39:44 | <Ryz> | Uh-oh~ |
| 16:39:58 | <Somebody2> | so you can ignore those by adding that to the Location header reject regular expression |
| 16:40:05 | <Somebody2> | which I'm going to do now |
| 16:40:14 | <Ryz> | Alright alright ><; |
| 16:40:45 | <Ryz> | Would have to figure out what to do if I'm alone after some time getting accustomed to this |
| 16:40:50 | <Somebody2> | heh, no worries |
| 16:44:41 | <Somebody2> | and looks like the results are coming in nicely |
| 16:45:06 | | JackThompson05 quits [Ping timeout: 265 seconds] |
| 16:45:24 | <Ryz> | After the initial clear up, yup o: |
| 16:45:46 | | JackThompson05 joins |
| 16:46:35 | <Somebody2> | :-) |
| 16:47:02 | <Somebody2> | Wanna put in another one? |
| 16:47:33 | <Ryz> | I'm trying to see what's going on with the Error Reports in 'stuf-in' or if it's unfortunately something that is inherent to that URL shortener |
| 16:48:03 | <Somebody2> | You can probably just add HTTP 500 as an "Unavailable" code |
| 16:48:14 | <Somebody2> | that'll remove them from the Error Reports |
| 16:49:56 | <Ryz> | Hmm, asking considering phuzion's thoughts on the 500 stuff that they spoke out |
| 16:50:21 | <Somebody2> | makes sense |
| 16:51:15 | <Somebody2> | yeah, stuf-in does seem to be having problems |
| 16:51:20 | <Somebody2> | no recent results either |
| 16:51:20 | <Ryz> | As for adding more, well, there's 2 more left above that need to be added and I guess I'm done for now (besides having to punch 'em in the wiki) |
| 16:51:41 | <Ryz> | Being https://cstu.io/ and https://chng.it/ |
| 16:51:43 | <Somebody2> | heh, there are lots more to research and add listed on the wiki page |
| 16:51:59 | <Somebody2> | but doing just those two is fine |
| 16:52:08 | <Somebody2> | thanks for stepping up! |
| 16:52:28 | <Ryz> | Mhm >#<; |
| 16:52:45 | <Ryz> | I do really wish the resulting URLs get fed back to #// |
| 16:53:17 | <Ryz> | Onto https://cstu.io/ - new project: cstu-io |
| 16:54:47 | <Ryz> | Hmm, checking https://web.archive.org/web/*/https://cstu.io/* - it looks like there's no upper case character letters |
| 16:54:52 | <@phuzion> | Ryz: the amount of URLs that we collect, even from a single shortener, would overwhelm that channel in a matter of seconds. |
| 16:55:22 | <Somebody2> | Although, we could likely do some filtering and put the *interesting* ones thru |
| 16:55:32 | <Somebody2> | there is a lot of absolute garbage |
| 16:55:36 | <@phuzion> | Yeah |
| 16:55:44 | <Ryz> | Wait a minute, https://web.archive.org/web/2021*/https://cstu.io/081a49 - hmm, this has been archived before, but there doesn't seem to be such a thing as a cstu-io project |
| 16:56:01 | <@phuzion> | Ryz: It might have gotten crawled otherwise |
| 16:56:05 | <Ryz> | Archived under the archiveteam_urls collection |
| 16:56:06 | <Somebody2> | Yeah, there are other ways that shorteners get into the WBM |
| 16:56:21 | <Somebody2> | also, it might have gotten lost somewhere |
| 16:57:21 | <Ryz> | Welp, oh well, I guess more of a potlink source then for me, though would really wanna mine out those goodies |
| 16:57:40 | <Ryz> | Anyway, alphabet, only lowercase and numbers since checking https://web.archive.org/web/*/https://cstu.io/* |
| 16:57:46 | <Ryz> | URL template is https://cstu.io/{shortcode} |
| 16:58:18 | <Ryz> | It's a 302 Redirect as per https://wheregoes.com/trace/20222457833/ |
| 16:59:03 | <Ryz> | But the redirect status codes already got that by default |
| 17:00:00 | <@phuzion> | Ryz: Yeah, a 301 is permanent. A 302 is temporary. |
| 17:00:09 | <Ryz> | For unavailable URL, it's a 200 as per https://wheregoes.com/trace/20222457844/ |
| 17:00:36 | <Ryz> | It just redirects back to https://contentstudio.io/ |
| 17:01:01 | <Somebody2> | mining them is a great idea -- it just takes someone to do the work |
| 17:01:40 | <Somebody2> | that all seems right, go ahead with starting it up |
| 17:01:41 | <Ryz> | If mining is a thing, one good way to go through stuff is filtering out URLs that I already saw so I don't resee 'em again and get my time wasted |
| 17:01:49 | <Somebody2> | yep |
| 17:02:04 | <Ryz> | Saved the settings for cstu.io - check in case Somebody2 |
| 17:02:22 | <Somebody2> | will look now |
| 17:02:45 | <Somebody2> | looks good! |
| 17:03:14 | <Ryz> | And then autoqueue followed by enable~ |
| 17:03:28 | <Somebody2> | Also, something that would be good to do is try and figure out how to get tinyurl and bit.ly working again |
| 17:05:26 | <Ryz> | Uhh, the results uhh, they're just going back to https://contentstudio.io/ - help? |
| 17:06:31 | <Ryz> | Somebody2? |
| 17:08:05 | <Ryz> | phuzion? |
| 17:08:22 | <@phuzion> | disable the project |
| 17:08:50 | <@phuzion> | I just disabled it |
| 17:08:55 | <Ryz> | Turn off autoqueue and queue? |
| 17:08:57 | <Ryz> | Oh |
| 17:10:22 | <Ryz> | Would it be possible to just reject https://contentstudio.io/ similar to how Somebody2 did with the previous project? |
| 17:12:51 | <@phuzion> | I just did that. |
| 17:13:17 | <@phuzion> | So if it includes "contentstudio.io" in the results, it'll ignore it |
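The "Location header reject regular expression" setting used for both toi-in and cstu-io can be pictured as a filter on the redirect target. The patterns below are guesses based on what the log describes (an error page at /micron/error.html, and a fallback redirect to contentstudio.io); the exact expressions entered in the admin UI are not shown in the log, and whether the tracker searches or anchors the pattern is an assumption here.

```python
import re

# Hypothetical per-project reject patterns, reconstructed from the discussion.
REJECT_PATTERNS = {
    "toi-in": re.compile(r"/micron/error\.html"),
    "cstu-io": re.compile(r"contentstudio\.io"),
}


def keep_result(project: str, location: str) -> bool:
    """Return False when the redirect target matches the project's reject pattern."""
    pattern = REJECT_PATTERNS.get(project)
    if pattern is not None and pattern.search(location):
        return False
    return True


print(keep_result("toi-in", "http://toi.in/micron/error.html"))
print(keep_result("cstu-io", "https://example.com/some/article"))
```

One trade-off of a broad pattern like the cstu-io one is that it would also drop genuine shortlinks whose target happens to be on contentstudio.io.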
| 17:13:38 | <Ryz> | Ah, I think Somebody2 didn't notice me saying any invalid URL just redirects back to https://contentstudio.io/ |
| 17:13:45 | <@phuzion> | Possibly. |
| 17:13:47 | <Ryz> | And I went with the go-ahead, welp ><; |
| 17:13:53 | <@phuzion> | But keep an eye on it for a bit |
| 17:14:11 | <Ryz> | Mhm, in case of errors and such |
| 17:14:15 | <@phuzion> | In the future if you get into a place where you're not sure whether you should keep going, or if something is broken, just disable the project. |
| 17:14:30 | <@phuzion> | Disable it, take a note to work on it later, and continue on to the next one. |
| 17:14:43 | <@phuzion> | Anyways, I'm actually heading out now. I'll catch you later. |
| 17:15:01 | <Ryz> | Ah, I have one more before being done but alrighty >#<; |
| 17:15:16 | <Ryz> | The last one for the time being which is https://chng.it/ |
| 17:15:20 | <Somebody2> | yeah, I missed that |
| 17:15:22 | <Somebody2> | sorry! |
| 17:15:46 | <Somebody2> | Yeah, it's always harmless to disable a project |
| 17:17:52 | <Ryz> | As for the results in general, I can kinda see why some people think it's kinda boring, because some of the fruits of labor aren't readily apparent when it's starting from the smallest code length to the longest~ |
| 17:18:03 | <Somebody2> | yep :-) |
| 17:18:11 | <Ryz> | I guess I got lucky with some of 'em I got that immediately gave results |
| 17:18:21 | <Ryz> | Don't really mind, it's like planting a tree :p |
| 17:18:39 | <Ryz> | Just more concerned whether I may be overloading the URLTeam project ><; |
| 17:18:46 | <Somebody2> | although for a lot of shorteners, they also start from the shortest codes |
| 17:18:56 | <Somebody2> | it's pretty much impossible to overload this project :-P |
| 17:19:27 | <Somebody2> | easier to overload individual shorteners, though |
| 17:21:52 | <Ryz> | New project: chng-it |
| 17:22:08 | <Somebody2> | nice |
| 17:23:04 | <Ryz> | No need to touch alphabet; URL template is https://chng.it/{shortcode} - redirect type is 301 as per https://wheregoes.com/trace/20222458015/ |
| 17:23:10 | <Somebody2> | perfect |
| 17:23:30 | <Ryz> | As for unavailable URL, this time it's a 404 as per https://wheregoes.com/trace/20222458029/ |
| 17:24:15 | <Ryz> | So for 'Unavailable status codes', I should add 404 next to 200, or remove 200 to replace it with 404? |
| 17:24:22 | <Somebody2> | add it |
| 17:24:43 | <Ryz> | Huh, even though in 'No redirect status codes', there's a 404 there? |
| 17:25:51 | <Somebody2> | ah, in that case, it's probably fine |
| 17:26:07 | <Somebody2> | honestly, I'm not sure what the distinction between No redirect and Unavailable is, exactly |
| 17:26:16 | <Somebody2> | I think they do the same thing |
| 17:26:24 | <Ryz> | Could just wait for phuzion when they come back |
| 17:26:30 | <Somebody2> | Unavailable may be used for redirects that *used* to exist, but were removed |
| 17:26:41 | <Somebody2> | but I'm pretty sure the code treats them the same |
| 17:27:09 | <Ryz> | I'll leave 'Unavailable status codes' entry alone, just leaving 200 alone |
| 17:27:33 | <Ryz> | Settings saved, check it please Somebody2 |
| 17:28:03 | <Ryz> | Oops, I somehow left a 404 there still |
| 17:28:26 | <Somebody2> | heh, will look |
| 17:28:30 | <Ryz> | Interesting, separated by a space, not something like ',' |
| 17:29:52 | <Somebody2> | looks good |
| 17:30:07 | <Somebody2> | yeah there are a bunch of weird style things about this codebase |
| 17:30:16 | <Ryz> | autoqueue and then queue time~ o: |
| 17:31:16 | <Somebody2> | go go go! :-) |
| 17:31:53 | <Ryz> | Yeah, I think that's it for now |
| 17:32:03 | <Ryz> | Gonna try and add the ones I added via URLTeam tracker into the wiki |
| 17:32:58 | <Somebody2> | excellent |
| 17:33:03 | <Somebody2> | let me know if you run into any problem |
| 17:34:07 | <Ryz> | Hmm, a bit confused on how to add the URL shorteners in https://wiki.archiveteam.org/index.php/URLTeam |
| 17:34:15 | <Ryz> | I think I only know how to add the dead ones... |
| 17:36:50 | <Ryz> | Somebody2? |
| 17:37:09 | <Somebody2> | looking |
| 17:37:48 | <Somebody2> | Enter them under the "Warrior projects" section |
| 17:38:07 | <Somebody2> | Using the template, urlteam blank warrior entry, mentioned at the top of the table |
| 17:38:32 | <Somebody2> | The three pieces of info are just project name, an example URL, and any relevant comments you want to make |
| 17:43:15 | <Ryz> | I think I'm a lot more confused than I thought :c |
| 17:44:11 | <Somebody2> | no worries |
| 17:44:27 | <Somebody2> | have you opened the edit view for the page? :-) |
| 17:44:59 | <Somebody2> | https://wiki.archiveteam.org/index.php?title=URLTeam&action=edit&section=8 |
| 17:45:14 | <Ryz> | Yeah, which is part of the confusing part |
| 17:45:27 | <Ryz> | Like, I was more or less thinking of filling in an entry in a table, but via a template...? |
| 17:45:33 | <Somebody2> | so, once you do, you'll see the wikitext |
| 17:45:47 | <Somebody2> | find the project name that your new one goes after |
| 17:45:52 | <Somebody2> | alphabetically |
| 17:47:02 | <Ryz> | Here's what I have so far what I edited: |
| 17:47:04 | <Ryz> | {{subst:urlteam blank warrior entry|amp-gs|https://amp.gs/jpWAq|https://amp.gs/ redirects to https://amplifr.com/}} |
| 17:47:08 | <Somebody2> | perfect! |
| 17:47:26 | <Ryz> | I was initially thinking of shoving it in table, but that looked really really wrong |
| 17:47:27 | <Somebody2> | You can actually just put that at the top or bottom of the table and I or someone else will move it into alphabetical order |
| 17:48:20 | <Ryz> | Huh, that's an odd way to do this Oo; |
| 17:48:35 | <Somebody2> | eh, it's shorter (but more confusing) |
| 17:48:43 | <Ryz> | Yeah, I guess I'll cover the other ones that I mentioned earlier that are running right now |
| 17:50:17 | <Somebody2> | save the page? |
| 17:50:34 | <Somebody2> | and I can look at it? |
| 17:50:38 | <Ryz> | Hold on, I'm doing all 7 of these-- |
| 17:50:47 | <Somebody2> | that works too |
| 17:50:48 | <Ryz> | Oh, okay, I'll do mine and one of phuzion's |
| 17:50:59 | <Ryz> | Back to original plan then xD |
| 17:51:08 | <Somebody2> | :-P |
| 18:02:51 | <Ryz> | I did it Somebody2, check it o: |
| 18:03:12 | <Somebody2> | looking |
| 18:03:41 | <Somebody2> | cool, let me put it in the table |
| 18:05:01 | <Somebody2> | ah, my account isn't auto approved |
| 18:05:14 | <Ryz> | Ah, welp |
| 18:42:24 | <Ryz> | Hmm, Somebody2, looking at chng-it - it looks like there's 503s being hit |
| 18:42:42 | <Ryz> | Instead of 404s, when encountering links like https://chng.it/1qm4 |
| 18:43:51 | <Ryz> | It's in sprinkles~ |
| 18:46:45 | <Ryz> | Would like to try and continue on this, wanna go through the stuff I posted here over the years that may or may not have been added into the wiki |
| 18:48:14 | <Ryz> | For instance, https://on.belk.com/ - sample URL: https://on.belk.com/2Ln8a6h - however, may be danger territory since it's a Bit.ly URL shortener for https://belk.com/ |
| 18:53:20 | <@JAA> | Yeah, we don't want any bit.ly aliases here since it just duplicates work with the existing bit.ly scrape. |
| 18:54:14 | <Ryz> | Oh, huh, can I add it to the wiki then still? |
| 18:55:06 | <@JAA> | Hmm, actually, it appears that bit.ly no longer resolves codes belonging to an alias under bit.ly. |
| 18:55:30 | <@JAA> | There's a section for bit.ly aliases on the page. |
| 18:56:08 | <Ryz> | Ah yeah, putting it down there~ |
| 18:58:11 | <Ryz> | Next one is http://rlu.ru/ - which is uhh, definitely not a normal URL shortener at all - http://rlu.ru/2KGu |
| 18:58:31 | <Ryz> | Found it from https://vadiqusite.wordpress.com/2014/09/24/%d0%b1%d0%be%d0%b9%d0%ba%d0%be%d1%82-%d0%b8%d1%80%d0%b8%d1%88%d0%ba%d0%b0/ |
| 19:10:17 | <Ryz> | This looks like there's a lot more work involved with this one... |
| 19:14:43 | <@JAA> | Yeah, that one's weird. Another code redirected me directly on the first attempt but then gave a similar page. |
| 19:14:55 | <@JAA> | A code resolving to another shortener produces this: http://rlu.ru/2YdDt |
| 19:19:48 | <Ryz> | Ugh, meanwhile something like https://4fun.tw/ isn't better, seeing links like https://4fun.tw/iedi |
| 19:31:11 | <Somebody2> | Sorry, didn't see the message till now |
| 19:31:46 | <Ryz> | There's some occasional sprinkles of 503s, Somebody2, the last time I saw it |
| 19:32:28 | <Somebody2> | occasional 503s are fine |
| 19:32:56 | <Somebody2> | the only concern is if the *same* shortcode consistently 503s |
| 19:33:05 | <Ryz> | Hmm... oo; |
| 19:33:47 | <Somebody2> | because we'll retry |
| 19:36:00 | <Ryz> | Ah, well, the 2 URL shorteners up there that I wanna work on are stupidly non-standard >#<; |
| 19:36:22 | <Ryz> | Meanwhile I'm grinding through putting down dead URL shorteners, some of them have actually died since mentioning them X_x; |
| 19:38:14 | <Ryz> | That's one of the other reasons for wanting access >#<; |
| 19:42:40 | <Somebody2> | yeah, that's a good reason to jump on them -- they die like mayflies |
| 19:43:05 | <Ryz> | Considering my endless looting adventuring, woooooo <#>; |
| 19:44:03 | <Ryz> | And having to deal with more complicated URL shorteners |
| 19:44:17 | <Ryz> | ...And probably one more that's user agent specific <_>; |
| 19:46:19 | <Somebody2> | heheheh |
| 19:59:49 | <Ryz> | Like, it's mobile device useragent specific x_x; |
| 20:05:05 | <Somebody2> | Yeah, we don't support useragent changes without custom code (that we *do* support) |
| 20:26:41 | <Ryz> | JAA, adding in the 3 dead URL shorteners as per https://wiki.archiveteam.org/index.php?title=URLTeam&oldid=48552#Dead_or_Broken - a bit surprised there isn't a bot that would auto-sort the links alphabetically like https://wiki.archiveteam.org/index.php/List_of_websites_excluded_from_the_Wayback_Machine |
| 20:26:57 | <Ryz> | Especially as the dead list will grow longer and longer |
| 20:28:46 | <Ryz> | Hmm, probably gonna do http://www.man.ac.uk/ - sample URL: http://www.man.ac.uk/3g7Xni - since it's as simple as the other projects I done |
| 20:29:07 | <Ryz> | Problem is, whether to have it as http://www.man.ac.uk/3g7Xni or http://man.ac.uk/3g7Xni - thoughts, Somebody2? |
| 20:31:45 | <Ryz> | Oh interesting, it didn't like HTTPS at all, https://man.ac.uk/3g7Xni - it would just instead give a "Not Found" followed by "The requested URL /3g7Xni was not found on this server." |
| 20:32:18 | <Ryz> | Meanwhile, https://wheregoes.com/trace/20222459121/ - redirect type, it's just 302 Redirect |
| 20:33:05 | <Ryz> | Meanwhile, for unavailable IDs, as per https://wheregoes.com/trace/20222459135/ - it's a 404 |
| 20:33:30 | <Ryz> | Or "No redirect status codes", a flippy floppy |
| 20:34:23 | <Somebody2> | generally using a shorter base URL is better, but it doesn't matter much |
| 20:34:48 | <Somebody2> | there likely could be a bot to sort the list, but someone has to write it |
| 20:37:29 | <Ryz> | Hmm, this is a bit interesting and hesitate-y, checking https://web.archive.org/web/*/http://man.ac.uk/* - it uses a URL shortener but there's non-URL shortener content there s: |
| 20:38:33 | <Ryz> | Unsure if need further instructions or a go-ahead |
| 20:38:53 | <Ryz> | Since I set up the 'Shortener Settings', nothing unusual initially |
| 20:39:21 | <Somebody2> | eh, go ahead |
| 20:40:20 | <Ryz> | Yeah, another project running through |
| 20:40:27 | <Somebody2> | ? |
| 20:40:37 | <Ryz> | That one was requested by flashfire |
| 20:40:44 | <Ryz> | A long time ago~ |
| 20:40:50 | <Somebody2> | another project where? |
| 20:41:02 | <Ryz> | Just man-ac-uk - http://www.man.ac.uk/ |
| 20:41:27 | <Ryz> | http://www.man.ac.uk/3g7Xni again being the sample URL |
| 20:41:58 | <Somebody2> | I don't see another project on the urlteam tracker...? |
| 20:42:01 | <Ryz> | Uhh, is there something wrong? |
| 20:42:09 | <Somebody2> | just the one you created just now |
| 20:42:18 | <Ryz> | Oh, I didn't do the Queue enable somehow |
| 20:42:30 | <Somebody2> | yep :-) |
| 20:42:39 | <Somebody2> | not exactly sure why we have the two steps |
| 20:43:02 | <Ryz> | I guess to reduce accidents? |
| 20:43:08 | <Somebody2> | probably |
| 20:43:29 | <Ryz> | Anyway, all I have left is 3 non-standard URL shorteners in possession that I mentioned here years ago x_x |
| 20:44:49 | <Ryz> | I stopped it because I just realized that http://man.ac.uk/2pW redirects to http://www.man.ac.uk/2pW - but is seen as valid and captured |
| 20:46:01 | <Ryz> | So I'll have to keep it as http://www.man.ac.uk/ then Somebody2, unless I would have to make an ignore |
| 20:48:22 | <Somebody2> | oy |
| 20:48:27 | <Somebody2> | makes sense |
| 20:48:31 | <Somebody2> | we can look at it again later |
| 20:48:44 | <Somebody2> | I'm going to be AFK for a while now, but I'll read the logs when I get back |
| 20:48:53 | <Ryz> | Okay >#<; |
| 23:30:12 | <@JAA> | Ryz: It's certainly possible to have a bot do that. Someone would just have to write it. |
| 23:30:58 | <Ryz> | Your JAABot seems to do the same alphabetical sorting task; would it be possible to have it cover the dead URL shorteners? |
| 23:31:28 | <@JAA> | It's the same but not the same really. |
| 23:32:45 | <@JAA> | For starters, this one would be only a section, not the entire page, which already makes it a bit more complicated. |
| 23:42:08 | <Somebody2> | I'm totally in favor of separating that page into multiple pages -- this could be a good opportunity to do that. |
| 23:42:53 | <@JAA> | The one nice thing about having everything on one page is that you can easily ^F to check whether a shortener is known. But other than that, yeah, I agree. |