07:17:40dunger quits [Ping timeout: 265 seconds]
09:12:01chrismeller (chrismeller) joins
10:09:23qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
10:20:56Minkafighter quits [Client Quit]
10:22:33Minkafighter joins
13:15:28Barto quits [Ping timeout: 265 seconds]
13:15:43Barto (Barto) joins
13:46:33chrismeller quits [Client Quit]
14:50:30<@phuzion>Hey Ryz, you around?
14:50:41<Ryz>Yo, what's up? o:
14:50:58<@phuzion>Question for ya. Are you interested in adding shorteners to the tracker?
14:51:34<Ryz>I'd prefer that rather than having to wait for my requests to be taken up~ o.o;
14:51:45<Ryz>At least self-service kind of thing
14:52:22<Ryz>Alternatively, I was gonna ask if someone can add the URL shorteners, then I would add 'em to the wiki as encouragement
14:52:29<@phuzion>So we really only have one way to allow people to do that, and that's by giving them a tracker admin account. Is that something you'd be interested in?
14:55:29<Ryz>That is something I have interest in, since from time to time, I've been posting these URL shorteners here and tried to add 'em to the wiki, but there hasn't been much activity so I shied away from doing a lot of diggery
14:55:47<@phuzion>Alright cool.
14:56:01<@phuzion>Do you have Discord? The admin UI would probably be easier to explain over a voice call
14:56:41<Ryz>I did it again because I decided to dig into a random set of lists from #// and found some URL shorteners~
14:57:10<Ryz>Uhh, I think I'd rather have it in text form, because I'm not as strong at learning via voice or speech~
14:57:19<@phuzion>Ok, no problem.
14:57:34<Ryz>I won't hesitate to ask a bunch of questions for clarity
14:58:50<@phuzion>Ok, so I just DM'd you your credentials.
14:59:22<@phuzion>If you'd like to change them, you have the power to do so in the admin UI. However, because your account is an admin account, please keep the password secure.
15:00:26<@phuzion>Ok, so there are really three things that you can do that affect the tracker. Add/change projects (shorteners), ban clients, and handle error reports.
15:01:55<@phuzion>Handling error reports is easy. If people are complaining that the tracker is full of error reports and people aren't getting new jobs, you can open the "Error reports" link and click the "Delete all" button at the bottom of the page. Don't just keep doing this if it keeps filling up, though, because that could be an indicator of an issue with the project.
15:02:16<@phuzion>These haven't been filling up as much lately, so it's not something you should need to do often.
15:02:44<@phuzion>However, if you're going to be adding new projects, you may periodically run into an issue where the error reports are filling up because of a misconfiguration on a new project.
15:04:05<@phuzion>If you do notice that all of the error reports are filling up on a single project, you can open the project from the "Projects" link and disable it there. If you do so on a bigger project (yahoo, goo-gl, etc), make a note in here so that the rest of us can take a peek into what's going on with it.
15:05:18<@phuzion>Banning clients is very rare, it's mostly a feature that exists just in case. If someone tampers with their urlteam client and makes it report nothing but their onlyfans or something dumb like that, we can ban their client. In general, I'd ask for advice in here from others before jumping to a client ban.
15:05:29<Somebody2>Yay, a new tracker admin is born! Sorry I've been totally absent for so long, but glad to see we're getting a new person!
15:05:51<@phuzion>Somebody2: Yeah, JAA and I discussed it the other day.
15:06:01<Somebody2>nice
15:06:09<Somebody2>I also trust Ryz
15:06:24<@phuzion>Glad they get a :+1: from you as well. :)
15:07:51<@phuzion>Ok, so, now the tab that you're probably gonna be spending the most time in, Projects. This is where you add shorteners, and modify settings on existing shorteners. When you're adding a new shortener, you'll want to name it basically "something-tld", so if the shorturl is https://foobar.xyz, you would name it foobar-xyz. Obviously check and make sure that the project doesn't already exist before adding it.
15:09:59<@phuzion>Ryz: Everything make sense so far?
15:10:35<Ryz>Trying to take it all in, re-reading the parts to make sure I understand
15:10:47<Ryz>Since well, it's a different process from doing ArchiveBot stuff all these years
15:11:11<@phuzion>Right.
15:11:41<Ryz>When I see the Projects section, I thought there would be a lot more URL shorteners, until realizing the projects can have multiple URL shorteners to cover
15:11:58<@phuzion>We have a LARGE backlog of shorteners to add.
15:12:40<Ryz>Oh my goodness, when was the last time this was worked on? oo;;;
15:12:54<@phuzion>I just added one today to re-familiarize myself with the project lol
15:13:26<@phuzion>But I haven't done much with it in a while.
15:14:07<Somebody2>probably a couple years ago
15:14:18<Somebody2>AFAIK, it's been ticking along quietly the whole time
15:14:26<Somebody2>but without *new* shorteners being added
15:14:38<@phuzion>Yeah, afaik we've only got like 5-6 shorteners enabled right now.
15:15:10<Somebody2>most of the existing ones we've already gone thru
15:15:23<Somebody2>it's a long tail sort of situation --
15:15:29<@phuzion>Yeah.
15:15:33<Somebody2>bit.ly is gigantic and we probably won't ever finish
15:15:54<Somebody2>then there are hundreds of ones with only 4 character shortcodes that we go thru in a few hours
15:17:04<@phuzion>Ryz: Ok, so here's what I want you to do for your first project. Pick a shortener. Figure out what type of redirection it has for a valid shorturl, and an invalid shorturl, and demonstrate to me how you figured those things out. Think you can do that?
15:17:38<Ryz>Gonna try a really simple URL shortener, let's see from what I posted above...
15:17:49<@phuzion>I've already added amp.gs, so not that one.
15:17:54<@phuzion>And I added t.ly from the wiki page.
15:18:11<Ryz>http://amp.gs/ would be one, http://amp.gs/jpWAq with a 5-character code
15:18:19<Ryz>Oh, heheh
15:19:32<Ryz>The next would be https://stuf.in/ - sample URL: https://stuf.in/b95jak - then
15:20:38<@phuzion>And what type of redirection does that shortener use?
15:22:18<Ryz>Uhh, how can I identify the type of redirection?
15:22:49<Ryz>Tried to do view-source:http://amp.gs/jpWAq first to see if it's that other kind of redirect
15:23:15<@phuzion>In general, it's an HTTP status redirect. The way I find it is by running `curl -v https://stuf.in/b95jak` and looking for HTTP 301 or HTTP 302 or something.
15:24:27<@phuzion>You can use wheregoes.com if you don't have curl, or if you find it to be easier to use a web tool. https://wheregoes.com/
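A minimal Python sketch of the same check, for anyone who prefers a script to curl or wheregoes: it sends a single request with the standard library's `http.client` (which never follows redirects) and returns the status code plus the `Location` header, the two things to look for in `curl -v` output. The name `probe` and the User-Agent string are made up for illustration; this is not how the URLTeam client does it.

```python
import http.client
from urllib.parse import urlsplit


def probe(url, method="HEAD", timeout=10.0):
    """Send one request to `url` without following redirects.

    Returns (status_code, location_header_or_None).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    # Rebuild the request target (path plus optional query string).
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request(method, target, headers={"User-Agent": "shortener-probe-sketch"})
        resp = conn.getresponse()
        resp.read()  # drain the body (empty for HEAD) so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For a valid stuf.in code this should report a redirect status with the long URL in the Location header, assuming the shortener still behaves as described in this log.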
15:25:05<Ryz>Okay, seeing that, it is a 301 Redirect
15:26:12<@phuzion>And what happens if you try an invalid shorturl?
15:26:48<Ryz>...A 200 followed by a 200, uhh~
15:27:07<Ryz>I did something gibberish like https://stuf.in/sdffggggrwrfsfef
15:27:33<@phuzion>Awesome. So based on what you’ve found, it looks like this is a fairly standard shortener.
15:28:04<@phuzion>(I’ve been verifying everything you’ve been saying so I know it’s correct by the way)
15:28:29<@phuzion>So, the first step is to create the project in the admin UI. What are we gonna call this project?
15:28:34<Ryz>Ah, I see I'm veering into similar territory, where I might encounter really bullshit URL shorteners like the bizarre websites I keep finding that trip up ArchiveBot from time to time <#>;
15:29:10<Ryz>Based on the naming scheme on http://amp.gs/ with amp-gs - with https://stuf.in/ - I'll go with stuf-in
15:29:39<Ryz>Gonna type it in~
15:29:50<@phuzion>Perfect. Go ahead and create that project and let me know when you’ve done that. Just create it, but don’t make any changes once you’ve created it.
15:30:14<Ryz>I do notice '-' was used in place of '.' - not all of 'em, some don't have '-'
15:30:23<Ryz>Okay, I created the 'stuf-in' entry
15:31:07<Somebody2>Yeah, some of the early ones didn't use the - for . convention
15:31:59<@phuzion>Ok, great. Now, go to shortener settings
15:32:40<Ryz>Okay, I'm in 'Shortener Settings'; whoa, looks like 12 things to be able to fill in Oo;
15:32:56<@phuzion>For the most part, you can generally ignore "minimum library version" and "minimum pipeline version". Those generally tend to only get messed with if we are working on a custom shortener that does something strange like javascript or something that requires client-side work.
15:33:20<Ryz>Mm, which I'll need help for when I encounter something BS like that >#<;
15:33:57<@phuzion>Alphabet: this is the list of characters (case sensitive) that can appear in potential shortcodes. So if you have a shortener where the shorturls are basically https://short.url/123 https://short.url/548 https://short.url/12612 https://short.url/123521 https://short.url/662323, you could change the alphabet to 1234567890
15:34:15<@phuzion>If you know that the alphabet is lowercase only, you can remove the capitals, etc.
15:35:14<Ryz>Aaaah, would need to find a way to see if it uses any other characters or if there are ones that need to be removed~
15:35:30<@phuzion>URL template is where you change the actual shorturl in the project. I'd recommend checking if the shortener supports https, and if it does, putting that into the shorturl template, because that would save one redirect for each attempt if they automatically redirect you from HTTP to HTTPS.
15:35:44<@phuzion>So what do you think you should enter for the URL template here?
15:36:19<Ryz>Before moving on to that, for me, one way of checking a sampling of available used URLs is https://web.archive.org/web/*/https://stuf.in/*
15:36:35<@phuzion>That's a decent way to do it, sure.
15:36:50<@phuzion>Another thing you could do is try creating a few URLs using their interface if it's public and see what URLs they give you back.
15:37:00<Ryz>There's no '_', but then again, that function got nerfed from 200,000 max shown links to a mere 10,000 links...
15:37:15<Ryz>Or I think it was previously 100,000 links
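A rough way to turn a sample of known shorturls (for example, ones copied from the Wayback Machine listing above) into a candidate alphabet is to collect every character that appears in their codes. This Python sketch does that; the function names are hypothetical, and a sample can only show which characters do occur, not prove that others never do.

```python
import string
from urllib.parse import urlsplit


def codes_from_urls(urls):
    """Extract the shortcode (path without slashes) from each shorturl, skipping bare domains."""
    codes = []
    for url in urls:
        code = urlsplit(url).path.strip("/")
        if code:
            codes.append(code)
    return codes


def infer_alphabet(codes):
    """Return every character seen in `codes`: digits, lowercase, uppercase, then anything else."""
    seen = set("".join(codes))
    conventional = string.digits + string.ascii_lowercase + string.ascii_uppercase
    ordered = [c for c in conventional if c in seen]
    extras = sorted(seen - set(conventional))  # e.g. '_' or '-'
    return "".join(ordered) + "".join(extras)
```

The larger the sample, the more trustworthy the result; a missing uppercase letter in a handful of codes proves very little.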
15:37:58<Ryz>Checking https://stuf.in/ - it supports both HTTPS and HTTP - for some reason, HTTP doesn't redirect to HTTPS
15:39:24<Ryz>I guess in that case, in the 'URL template' entry, changing from 'http://example.com/{shortcode}' (without the ') to 'https://stuf.in/{shortcode}'
15:39:32<@phuzion>Yep.
15:39:50<@phuzion>I'd recommend still going with HTTPS because if they DO decide to turn on HTTPS redirection at some point, you'll already have handled that there.
15:40:09<Ryz>Definitely, keeping it on HTTPS
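The URL template is presumably filled in by swapping each shortcode into the `{shortcode}` placeholder. A minimal sketch of that substitution, assuming plain string replacement (the tracker's actual mechanism isn't shown in this log); the optional `quote_code` flag is a hypothetical extra for templates where the code lands in a query string, where a character like `+` would otherwise be read as a space.

```python
from urllib.parse import quote


def expand_template(template, shortcode, quote_code=False):
    """Substitute `shortcode` into a URL template containing `{shortcode}`."""
    if "{shortcode}" not in template:
        raise ValueError("template has no {shortcode} placeholder")
    if quote_code:
        # Percent-encode everything outside the unreserved set (e.g. '+' becomes '%2B').
        shortcode = quote(shortcode, safe="")
    return template.replace("{shortcode}", shortcode)
```

For stuf.in, `expand_template("https://stuf.in/{shortcode}", "b95jak")` gives `https://stuf.in/b95jak`.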
15:42:52<@phuzion>Ok, so we have the URL template set. Ignore the next two, "time between requests" and "http method". You'll know if you need to change those.
15:43:05<Ryz>Mhm o:
15:43:59<@phuzion>Redirect status codes. This is where you tell the client "this is what a successful redirect looks like". We basically keep all of the "HTTP Redirect" options in there by default. Since you found that this shortener uses 301s, we don't need to change anything here.
15:45:50<@phuzion>No redirect status codes. In general, you don't need to change this. If the shortener does something weird, you can change it, but it's not necessary in this case.
15:47:04<@phuzion>Unavailable status codes, this is where that 200 goes. When you type a bad shorturl like "https://stuf.in/jkd2kl3jksldxjk34" it gives you an HTTP 200 basically saying "This isn't a valid URL" Since we've already got that, you can just ignore it and leave it at the default.
15:47:26<@phuzion>Banned, this is what happens if we hit the shortener too aggressively and they decide to say "Fuck you, you're not allowed to hit us."
15:47:43<@phuzion>Content body regular expression and location header reject regular expression aren't things you need to worry about.
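To make the status-code fields concrete, here is a hedged Python sketch of how a single response might be sorted into the buckets described above. The values in `EXAMPLE_SETTINGS` are illustrative guesses, not the tracker's real defaults (this log only mentions 200 under Unavailable and 404 under No redirect), and the fields are parsed as space-separated lists because that is how the admin UI shows them. Treating No redirect and Unavailable identically follows Somebody2's reading of the code later in this log and is also an assumption.

```python
def parse_codes(field):
    """Parse a space-separated status-code field such as "301 302" into a set of ints."""
    return {int(token) for token in field.split()}


# Hypothetical settings; check the real values on the project's Shortener Settings page.
EXAMPLE_SETTINGS = {
    "redirect": parse_codes("301 302 303 307 308"),
    "no_redirect": parse_codes("404"),
    "unavailable": parse_codes("200"),
    "banned": parse_codes("420 429"),
}


def classify(status, settings):
    """Map an HTTP status code to an outcome bucket for one shortcode."""
    if status in settings["redirect"]:
        return "found"      # a real shorturl; the Location header holds the target
    if status in settings["banned"]:
        return "banned"     # the shortener is telling us to back off
    if status in settings["no_redirect"] or status in settings["unavailable"]:
        return "not found"  # no such shortcode (or it was removed)
    return "error"          # anything unexpected would presumably become an error report
```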
15:47:54<Ryz>Ah, there's a distinct difference between 'No redirect status codes' and 'Unavailable status codes'; thought it sounded the same at first
15:48:37<Ryz>As for "HTTP method (get/head)" entry, how can you tell whether to use one of the two? Since even using https://wheregoes.com/ - it doesn't seem to say whether the action is a 'get' or a 'head'
15:49:04<@phuzion>Almost every shortener is a HEAD.
15:51:07<@phuzion>Ok, so go ahead and save your settings for the shortener and let me know when you've done so, that way I can verify
15:51:17<Ryz>Ah, how can I tell if it's a 'GET'?
15:51:34<Ryz>Ah, hold on, was gonna ask whether to go for it, since the only thing that's changed is the 'URL template' entry
15:52:18<Ryz>I see a blue box saying 'Settings saved.' at the top
15:53:12<@phuzion>Ok, that all looks good.
15:53:40<@phuzion>Now, you'll go to queue settings, and turn on AutoQueue, and hit the blue apply button at the bottom of the page
15:54:37<Ryz>Getting to the 'Queue Settings'; ah, AutoQueue is a checkbox option that's not checked right now
15:54:55<@phuzion>Yep, turn that on and click apply.
15:55:19<Ryz>Hmm, what about having to set up what's the length of the URL shortener code? oo;
15:55:26<Ryz>Or is that after?
15:55:49<@phuzion>We just start at 0, which would be a, b, c, d... aa, ab, ac, etc.
15:55:57<Ryz>Aaah, okay~
15:56:02<Ryz>Pressing 'Apply'
15:56:13<Ryz>'Settings saved.' via blue box
15:56:13<@phuzion>A lot of shorteners start there, so we just start there as a default.
15:56:32<@phuzion>Alright, now click the "Enabled" checkbox, and click the apply button immediately below it.
15:56:51<@phuzion>Once you do that, jobs will be released to the clients, who will start hitting the shortener.
15:56:57<Ryz>Done and done
15:57:04<Ryz>'Enabled', the blue box says
15:57:18<@phuzion>Alright, now you can click on "Claims" to see the jobs that are released out to clients.
15:57:56<Ryz>Oh, that's a lot more entries Oo;
15:58:04<@phuzion>So, you can see the first queue, lower seq num 0 ranges from 0 to N, so 0, 1, 2, 3, etc, a, b, c, d, etc, A, B, C, D, all the way through to L, M, N.
15:58:28<@phuzion>Each client will hit all of those shorturls, and return results when they're done.
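The ordering described here (every single-character code first, then two-character codes, and so on) can be modelled as bijective base-N numbering over the alphabet. This Python sketch assumes the default alphabet is digits, then lowercase, then uppercase; under that assumption a claim covering sequence numbers 0 to 49 runs from `0` through `N`, which lines up with what phuzion read off the Claims page. The function names are hypothetical and the tracker may implement this differently.

```python
import string

# Assumed default alphabet: 0-9, then a-z, then A-Z (62 characters).
DEFAULT_ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def seq_to_shortcode(seq, alphabet=DEFAULT_ALPHABET):
    """Map a sequence number to a shortcode: all 1-char codes, then all 2-char codes, etc."""
    base = len(alphabet)
    length, count = 1, base
    # Skip past every shorter length until `seq` falls inside the current length's block.
    while seq >= count:
        seq -= count
        length += 1
        count = base ** length
    chars = []
    for _ in range(length):
        seq, digit = divmod(seq, base)
        chars.append(alphabet[digit])
    return "".join(reversed(chars))


def codes_for_range(lower, upper, alphabet=DEFAULT_ALPHABET):
    """List the shortcodes for an inclusive range of sequence numbers."""
    return [seq_to_shortcode(s, alphabet) for s in range(lower, upper + 1)]
```

With a two-letter alphabet `"ab"`, sequence numbers 0 through 6 give `a, b, aa, ab, ba, bb, aaa`, which is why short shorteners are exhausted quickly.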
15:58:45<@phuzion>You can click results and watch them come in
15:59:00<Ryz>Definitely wanna see the results, seeing the fruits of my labor O:
15:59:39<Ryz>Oh look, another potlink source for me to mine it and archive it via ArchiveBot xD
16:00:33<Ryz>I'm assuming these URLs get sent to #// ?
16:00:52<@phuzion>The shorturls that we find?
16:01:20<Ryz>The resulting URL from going through the URL shortener link
16:02:03<@phuzion>No, we don't automatically post these URLs anywhere.
16:02:14<@phuzion>They're archived and stored on IA.
16:02:41<Ryz>Oh, I'm surprised it's not being sent to #// for further processing~
16:04:51<Ryz>Checking the 'Error Reports' section, seeing entries, unsure whether to do something or leave them alone
16:05:24<@phuzion>We're probably overloading the shortener, I'm gonna edit the settings down a bit.
16:06:13<@phuzion>Alright, I reduced the amount of queues we have, and increased the time between checking URLs.
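A back-of-the-envelope Python sketch of why fewer queues plus a longer delay eases the load: if each queue is worked by one client at a time that waits the configured delay between requests (an assumption about how "time between requests" applies), the combined rate hitting the shortener is roughly the number of queues divided by the length of each request cycle.

```python
def estimated_request_rate(queues, seconds_between_requests, avg_response_seconds=0.0):
    """Approximate requests per second against one shortener.

    Assumes every queue is worked serially, with one request per
    (delay + response time) cycle, and all queues running at once.
    """
    cycle = seconds_between_requests + avg_response_seconds
    if cycle <= 0:
        raise ValueError("cycle time must be positive")
    return queues / cycle
```

With made-up numbers: 10 queues, a 0.5 s delay and 0.5 s responses come to about 10 requests per second, while 4 queues with a 2 s delay and the same response time drop that to 1.6.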
16:06:43<Ryz>I guess the 500 errors will be retried at some point
16:08:11<Somebody2>It would be great if someone (you?) wanted to mine the shorturls we find and pick relevant ones to send thru Archivebot
16:08:17<Somebody2>but we haven't done so yet
16:08:45<Somebody2>Also, as for the distinction between HEAD and GET ...
16:08:51<Ryz>Yeah, that's definitely an option, but would wanna focus on the other potlinks I've been building up over the years; so many potlinks x_x;
16:08:57<Somebody2>heheh
16:09:19<Somebody2>HEAD is requesting just the headers from the server, while GET requests the whole page
16:09:31<@phuzion>Anyways, I'm about to step away for the day. Ryz, if you wanna add another shortener, go ahead, maybe have Somebody2 verify that it all looks good before enabling it?
16:09:41<Somebody2>sure, happy to check things over
16:09:55<@phuzion>Somebody2: do you have a tracker admin account?
16:10:00<Somebody2>yep
16:10:04<Somebody2>let me make sure it's working
16:10:05<@phuzion>cool
16:10:08<Ryz>That would be very helpful for someone to check for errors~ >#<;
16:10:13<@phuzion>yeah I can reset your password if needed
16:10:49<Somebody2>yep, it works
16:10:52<@phuzion>cool
16:11:19<@phuzion>Yeah you seem to know your way around the project a bit better than I do, but I figured I'd get ryz up to speed with "tracker admin 101"
16:11:31<Somebody2>and I'm very grateful!
16:12:11<Somebody2>we should always default to trying HEAD first (because it's less data), but there are a few reasons why we might need to switch to GET
16:12:26<Somebody2>1) The server is weird, and explicitly refuses to respond to HEAD
16:12:46<Ryz>Meanwhile, checking the 'amp-gs' project out of curiosity, checking 'Error Reports', no reports at all, woo o:
16:12:55<Somebody2>2) The redirection is weird and the target URL is only *present* in the body
16:12:58<Ryz>Unless they're cleaned out... oo;
16:12:58<Somebody2>that's basically it
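Somebody2's two reasons for switching from HEAD to GET can be expressed as a small decision helper. The sketch below is hypothetical: it treats 405 (Method Not Allowed) and 501 (Not Implemented) as "refuses HEAD", flags a redirect status that arrives without a `Location` header, and includes a deliberately simple regex for one common body-only redirect, an HTML meta refresh. A body-only redirect generally has to be spotted by hand when testing a sample, and shorteners that redirect via JavaScript would need more than this.

```python
import re

# Simple pattern for <meta http-equiv="refresh" content="N; url=...">; attribute order matters.
META_REFRESH = re.compile(
    r"""<meta[^>]+http-equiv=["']?refresh["']?[^>]*content=["']?\s*\d+\s*;\s*url=([^"'>]+)""",
    re.IGNORECASE,
)


def should_retry_with_get(head_status, head_location, redirect_codes):
    """Decide whether a HEAD result suggests re-requesting the shortcode with GET."""
    if head_status in (405, 501):
        return True  # reason 1: the server refuses HEAD outright
    if head_status in redirect_codes and not head_location:
        return True  # redirect status but no target in the headers
    return False


def extract_meta_refresh(body):
    """Reason 2: pull a target URL out of an HTML meta refresh, if there is one."""
    match = META_REFRESH.search(body)
    return match.group(1).strip() if match else None
```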
16:13:20<Somebody2>Not having any error reports isn't that odd. The main question is if there are recent *results*
16:13:32<@phuzion>I cleared the errors on the project that was erroring, Ryz
16:13:42<@phuzion>I think it was stuf-in
16:13:46<Ryz>Aaaah >#<;
16:13:52<Ryz>I was checking 'amp-gs',
16:14:01<Somebody2>and amp-gs is returning data, yes
16:14:36<Somebody2>BTW, you can get directly to the results and error pages for a shortener by clicking on the Found and Scanned columns on the main page
16:14:49<Somebody2>I added that, a while back, because it was pissing me off
16:14:53<@phuzion>Also I checked, and I don't believe a 500 is a banned code for stuf.in, I hammered it with a bunch of requests (while true; do curl -v stuf.in/8x4; done) and it didn't give me a 500.
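A bounded Python version of that kind of spot check might look like the sketch below: it makes a fixed number of attempts, optionally pauses between them, and tallies the status codes, so a stray 500 (or a sudden run of 429s) stands out without an endless loop. `probe_fn` is a placeholder for whatever zero-argument callable performs one request and returns its status code; nothing here is part of the tracker.

```python
import time
from collections import Counter


def burst_status_counts(probe_fn, attempts, delay_seconds=0.0):
    """Call probe_fn `attempts` times and count the status codes it returns."""
    counts = Counter()
    for i in range(attempts):
        counts[probe_fn()] += 1
        if delay_seconds and i < attempts - 1:
            time.sleep(delay_seconds)  # be polite between attempts
    return counts
```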
16:15:00<Somebody2>having to click thru twice
16:15:16<@phuzion>So anyways, I'm gonna step away for the day. Ryz, have fun, and let us know if you have any issues.
16:15:53<Somebody2>See ya, phuzion !
16:16:01<Somebody2>Ryz: want to try adding another shortener?
16:16:48<Ryz>Yeah, definitely, this time it's going to be http://toi.in/ - and right away, there's a '_' in the shortened ID
16:17:39<Somebody2>nice -- just put that in the alphabet
16:18:01<Ryz>Mhm, trying to do it in order, and then having to slot it in the wiki afterwards
16:18:46<Somebody2>also, feel free to refactor the wiki page if/when you think of a better way to arrange things
16:18:46<Ryz>Okay, I'll do that one up there, since attempts at projects like this generally aren't recorded automatically the way ArchiveBot jobs are ><;
16:19:01<Ryz>New project, toi-in
16:19:05<Somebody2>looking now
16:19:33<Somebody2>I'm turning off ow-ly since it doesn't seem to have returned results in a while
16:19:41<Ryz>Checking in 'Shortener Settings', obviously I have to add the '_' in the alphabet~
16:20:01<Somebody2>yep
16:20:56<Ryz>Changed URL template to https://toi.in/{shortcode} since HTTPS takes priority over HTTP
16:21:08<Ryz>...I don't know why I got that HTTP link before, probably because it was a really old link
16:21:19<Somebody2>could be
16:21:35<Ryz>Time to check what kind of URL redirect is this...
16:22:22<Ryz>Hmm, uhh, https://wheregoes.com/trace/20222457585/
16:22:28<Ryz>It's a 301, then a 301, and then a 200
16:23:31<Ryz>Then there are URLs like https://toi.in/XpjP_a70+ that redirect to https://toi.in/micron/analytics.html?str=XpjP_a70+ - which seem to be broken, but since the alphabet doesn't have '+', don't need to worry
16:23:32<Somebody2>Ah, just put in the second redirect
16:23:58<Ryz>In the URL template?
16:24:07<Somebody2>i.e. the one with micron in it -- yeah, in the URL template
16:24:15<Somebody2>confirm that it works, though
16:25:10<Ryz>Was able to go through http://toi.in/micron/redirect.html?str=XpjP_a70 fine
16:26:13<Ryz>I was gonna ask: shouldn't we also get http://toi.in/XpjP_a70 ? Because if this goes into the WBM, people would think http://toi.in/XpjP_a70 doesn't exist when only http://toi.in/micron/redirect.html?str=XpjP_a70 is archived
16:26:13<Somebody2>perfect
16:26:23<Somebody2>this doesn't directly go into the WBM
16:26:25<Somebody2>sadly
16:26:48<Somebody2>it just gets saved as basically CSV files that map from the short *code* to the long one
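If you want to work with those mapping files (for example, to mine them for ArchiveBot material), parsing is straightforward. This Python sketch assumes a plain-text layout with one `shortcode|url` pair per line and header lines starting with `#`; check the actual files before relying on that, since this log only describes them as "basically CSV".

```python
def load_mapping(lines):
    """Build {shortcode: long_url} from lines of 'shortcode|url' text."""
    mapping = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        # Split on the first '|' only; the long URL itself may contain '|'.
        code, sep, url = line.partition("|")
        if sep:
            mapping[code] = url
    return mapping
```

An open text file can be passed in directly, since file objects iterate line by line.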
16:27:16<Somebody2>actually saving WARCs of the redirections is another neat project we haven't got to
16:28:00<Ryz>I wonder if that's the reason there hasn't been much activity here to add more and more URL shorteners... s:
16:28:23<Somebody2>quite possible!
16:28:35<Ryz>Okay, so there's a 301 Redirect, now what's the unavailable one...
16:28:50<Somebody2>although I think the biggest reason is just that people don't think saving shorturls is that exciting or interesting
16:29:14<Ryz>https://wheregoes.com/trace/20222457603/ - it just redirects to http://toi.in/micron/error.html - which is a 200
16:29:40<Ryz>Also https://wheregoes.com/trace/20222457611/
16:30:16<@phuzion>One thing I’ve kinda mused about was a multi-function browser extension that simultaneously resolves shorturls for live links, and fixes dead shorteners by providing the original link
16:30:20<Ryz>For me, I think it's yet another way for me to find more obscure and odd material, potlink material
16:30:34<Somebody2>what's a potlink, btw?
16:31:03<Ryz>For me, it's my own personal term for finding even more websites and links to look over, hence a pot full of links
16:31:37<Ryz>Think those lists of sections under 'Blogroll' for example
16:31:42<Somebody2>ah, got it
16:32:02<Somebody2>pot 'o links
16:32:18<Ryz>Okay, so there's nothing else to alter, the unavailable status code is still 200 after checking it
16:32:29<Ryz>Somebody2, check it o:
16:35:09<Somebody2>will do
16:35:54<Somebody2>looks good, open it up
16:36:42<Ryz>'Open it up'? Like run it?
16:36:46<Somebody2>yep
16:36:57<Somebody2>turn on the auto-queue and enable it
16:37:56<Ryz>Did both of those, one after another
16:39:25<Somebody2>ah, we need to add one more setting to avoid including errors
16:39:37<Somebody2>it looks like it redirects to /micron/error.html on error
16:39:44<Ryz>Uh-oh~
16:39:58<Somebody2>so you can ignore those by adding that to the Location header reject regular expression
16:40:05<Somebody2>which I'm going to do now
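The effect of a Location header reject regular expression can be sketched in Python as below. It assumes the tracker searches for the pattern anywhere in the Location value (phuzion later describes the contentstudio.io rule as ignoring anything that "includes" that string); the helper name and the `TOI_IN_REJECT` constant are hypothetical.

```python
import re


def is_rejected(location, reject_pattern):
    """True if a redirect target matches the project's reject pattern and should be discarded."""
    if not location or not reject_pattern:
        return False
    return re.search(reject_pattern, location) is not None


# Invalid toi.in codes redirect to this error page, so treat it as "not a real result".
TOI_IN_REJECT = r"/micron/error\.html"
```

Escaping the dot (`\.`) keeps the pattern from matching any character in that position.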
16:40:14<Ryz>Alright alright ><;
16:40:45<Ryz>Would have to figure out what to do if I'm alone after some time getting accustomed to this
16:40:50<Somebody2>heh, no worries
16:44:41<Somebody2>and looks like the results are coming in nicely
16:45:06JackThompson05 quits [Ping timeout: 265 seconds]
16:45:24<Ryz>After the initial clear up, yup o:
16:45:46JackThompson05 joins
16:46:35<Somebody2>:-)
16:47:02<Somebody2>Wanna put in another one?
16:47:33<Ryz>I'm trying to see what's going on with the Error Reports in 'stuf-in', or whether it's unfortunately something inherent to that URL shortener
16:48:03<Somebody2>You can probably just add HTTP 500 as an "Unavailable" code
16:48:14<Somebody2>that'll remove them from the Error Reports
16:49:56<Ryz>Hmm, asking because of what phuzion said earlier about the 500s
16:50:21<Somebody2>makes sense
16:51:15<Somebody2>yeah, stuf-in does seem to be having problems
16:51:20<Somebody2>no recent results either
16:51:20<Ryz>As for adding more, well, there are 2 more left from above that need to be added, and then I guess I'm done for now (besides having to punch 'em into the wiki)
16:51:41<Ryz>Being https://cstu.io/ and https://chng.it/
16:51:43<Somebody2>heh, there are lots more to research and add listed on the wiki page
16:51:59<Somebody2>but doing just those two is fine
16:52:08<Somebody2>thanks for stepping up!
16:52:28<Ryz>Mhm >#<;
16:52:45<Ryz>I do really wish the resulting URLs get fed back to #//
16:53:17<Ryz>Onto https://cstu.io/ - new project: cstu-io
16:54:47<Ryz>Hmm, checking https://web.archive.org/web/*/https://cstu.io/* - it looks like there are no uppercase letters
16:54:52<@phuzion>Ryz: the amount of URLs that we collect, even from a single shortener, would overwhelm that channel in a matter of seconds.
16:55:22<Somebody2>Although, we could likely do some filtering and put the *interesting* ones thru
16:55:32<Somebody2>there is a lot of absolute garbage
16:55:36<@phuzion>Yeah
16:55:44<Ryz>Wait a minute, https://web.archive.org/web/2021*/https://cstu.io/081a49 - hmm, this has been archived before, but there doesn't seem to be such a thing as a cstu-io project
16:56:01<@phuzion>Ryz: It might have gotten crawled otherwise
16:56:05<Ryz>Archived under the archiveteam_urls collection
16:56:06<Somebody2>Yeah, there are other ways that shorteners get into the WBM
16:56:21<Somebody2>also, it might have gotten lost somewhere
16:57:21<Ryz>Welp, oh well, I guess more of a potlink source then for me, though would really wanna mine out those goodies
16:57:40<Ryz>Anyway, alphabet, only lowercase and numbers since checking https://web.archive.org/web/*/https://cstu.io/*
16:57:46<Ryz>URL template is https://cstu.io/{shortcode}
16:58:18<Ryz>It's a 302 Redirect as per https://wheregoes.com/trace/20222457833/
16:59:03<Ryz>But the redirect status codes already got that by default
17:00:00<@phuzion>Ryz: Yeah, a 301 is permanent. A 302 is temporary.
17:00:09<Ryz>For unavailable URL, it's a 200 as per https://wheregoes.com/trace/20222457844/
17:00:36<Ryz>It just redirects back to https://contentstudio.io/
17:01:01<Somebody2>mining them is a great idea -- it just takes someone to do the work
17:01:40<Somebody2>that all seems right, go ahead with starting it up
17:01:41<Ryz>If mining is a thing, one good way to go through stuff is filtering out URLs that I already saw so I don't see 'em again and waste my time
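A minimal Python sketch of that filtering, assuming "already seen" means the same URL after lower-casing the scheme and host and dropping any `#fragment`; you could just as easily key on the host alone to skip whole sites you have already gone through. The function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lower-case scheme and host, drop the fragment; keep path and query as-is."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, ""))


def unseen(urls, seen):
    """Return URLs not yet in `seen` (in input order) and record them in `seen`."""
    fresh = []
    for url in urls:
        key = normalize(url)
        if key in seen:
            continue
        seen.add(key)
        fresh.append(url)
    return fresh
```

Keeping `seen` in a file between sessions (one normalized URL per line) would let the filter carry over from one batch of results to the next.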
17:01:49<Somebody2>yep
17:02:04<Ryz>Saved the settings for cstu.io - check in case, Somebody2
17:02:22<Somebody2>will look now
17:02:45<Somebody2>looks good!
17:03:14<Ryz>And then autoqueue followed by enable~
17:03:28<Somebody2>Also, something that would be good to do is try and figure out how to get tinyurl and bit.ly working again
17:05:26<Ryz>Uhh, the results uhh, they're just going back to https://contentstudio.io/ - help?
17:06:31<Ryz>Somebody2?
17:08:05<Ryz>phuzion?
17:08:22<@phuzion>disable the project
17:08:50<@phuzion>I just disabled it
17:08:55<Ryz>Turn off autoqueue and queue?
17:08:57<Ryz>Oh
17:10:22<Ryz>Would it be possible to just reject https://contentstudio.io/ similar to how Somebody2 did with the previous project?
17:12:51<@phuzion>I just did that.
17:13:17<@phuzion>So if it includes "contentstudio.io" in the results, it'll ignore it
17:13:38<Ryz>Ah, I think Somebody2 didn't notice me saying any invalid URL just redirects back to https://contentstudio.io/
17:13:45<@phuzion>Possibly.
17:13:47<Ryz>And I went with the go-ahead, welp ><;
17:13:53<@phuzion>But keep an eye on it for a bit
17:14:11<Ryz>Mhm, in case of errors and such
17:14:15<@phuzion>In the future if you get into a place where you're not sure whether you should keep going, or if something is broken, just disable the project.
17:14:30<@phuzion>Disable it, take a note to work on it later, and continue on to the next one.
17:14:43<@phuzion>Anyways, I'm actually heading out now. I'll catch you later.
17:15:01<Ryz>Ah, I have one more before being done but alrighty >#<;
17:15:16<Ryz>The last one for the time being which is https://chng.it/
17:15:20<Somebody2>yeah, I missed that
17:15:22<Somebody2>sorry!
17:15:46<Somebody2>Yeah, it's always harmless to disable a project
17:17:52<Ryz>As for the results in general, I can kinda see why some people think it's kinda boring, because the fruits of the labor aren't readily apparent when it's working from the shortest code length up to the longest~
17:18:03<Somebody2>yep :-)
17:18:11<Ryz>I guess I got lucky with some of 'em I got that immediately gave results
17:18:21<Ryz>Don't really mind, it's like planting a tree :p
17:18:39<Ryz>Just more concerned whether I may be overloading the URLTeam project ><;
17:18:46<Somebody2>although for a lot of shorteners, they also start from the shortest codes
17:18:56<Somebody2>it's pretty much impossible to overload this project :-P
17:19:27<Somebody2>easier to overload individual shorteners, though
17:21:52<Ryz>New project: chng-it
17:22:08<Somebody2>nice
17:23:04<Ryz>No need to touch alphabet; URL template is https://chng.it/{shortcode} - redirect type is 301 as per https://wheregoes.com/trace/20222458015/
17:23:10<Somebody2>perfect
17:23:30<Ryz>As for unavailable URL, this time it's a 404 as per https://wheregoes.com/trace/20222458029/
17:24:15<Ryz>So for 'Unavailable status codes', I should add 404 next to 200, or remove 200 to replace it with 404?
17:24:22<Somebody2>add it
17:24:43<Ryz>Huh, even though in 'No redirect status codes', there's a 404 there?
17:25:51<Somebody2>ah, in that case, it's probably fine
17:26:07<Somebody2>honestly, I'm not sure what the distinction between No redirect and Unavailable is, exactly
17:26:16<Somebody2>I think they do the same thing
17:26:24<Ryz>Could just wait for phuzion when they come back
17:26:30<Somebody2>Unavailable may be used for redirects that *used* to exist, but were removed
17:26:41<Somebody2>but I'm pretty sure the code treats them the same
17:27:09<Ryz>I'll leave 'Unavailable status codes' entry alone, just leaving 200 alone
17:27:33<Ryz>Settings saved, check it please Somebody2
17:28:03<Ryz>Oops, I somehow left a 404 there still
17:28:26<Somebody2>heh, will look
17:28:30<Ryz>Interesting, separated by a space, not something like ','
17:29:52<Somebody2>looks good
17:30:07<Somebody2>yeah there are a bunch of weird style things about this codebase
17:30:16<Ryz>autoqueue and then queue time~ o:
17:31:16<Somebody2>go go go! :-)
17:31:53<Ryz>Yeah, I think that's it for now
17:32:03<Ryz>Gonna try and add the ones I added to the URLTeam tracker into the wiki
17:32:58<Somebody2>excellent
17:33:03<Somebody2>let me know if you run into any problem
17:34:07<Ryz>Hmm, a bit confused on how to add the URL shorteners in https://wiki.archiveteam.org/index.php/URLTeam
17:34:15<Ryz>I think I only know how to add the dead ones...
17:36:50<Ryz>Somebody2?
17:37:09<Somebody2>looking
17:37:48<Somebody2>Enter them under the "Warrior projects" section
17:38:07<Somebody2>Using the template, urlteam blank warrior entry, mentioned at the top of the table
17:38:32<Somebody2>The three pieces of info are just project name, an example URL, and any relevant comments you want to make
17:43:15<Ryz>I think I'm a lot more confused than I thought :c
17:44:11<Somebody2>no worries
17:44:27<Somebody2>have you opened the edit view for the page? :-)
17:44:59<Somebody2>https://wiki.archiveteam.org/index.php?title=URLTeam&action=edit&section=8
17:45:14<Ryz>Yeah, which is part of the confusing part
17:45:27<Ryz>Like, I was more or less thinking of filling in an entry in a table, but via a template...?
17:45:33<Somebody2>so, once you do, you'll see the wikitext
17:45:47<Somebody2>find the project name that your new one goes after
17:45:52<Somebody2>alphabetically
17:47:02<Ryz>Here's what I have so far what I edited:
17:47:04<Ryz>{{subst:urlteam blank warrior entry|amp-gs|https://amp.gs/jpWAq|https://amp.gs/ redirects to https://amplifr.com/}}
17:47:08<Somebody2>perfect!
17:47:26<Ryz>I was initially thinking of shoving it in table, but that looked really really wrong
17:47:27<Somebody2>You can actually just put that at the top or bottom of the table and I or someone else will move it into alphabetical order
17:48:20<Ryz>Huh, that's an odd way to do this Oo;
17:48:35<Somebody2>eh, it's shorter (but more confusing)
17:48:43<Ryz>Yeah, I guess I'll cover the other ones I mentioned earlier that are running right now
17:50:17<Somebody2>save the page?
17:50:34<Somebody2>and I can look at it?
17:50:38<Ryz>Hold on, I'm doing all 7 of these--
17:50:47<Somebody2>that works too
17:50:48<Ryz>Oh, okay, I'll do mine and one of phuzion's
17:50:59<Ryz>Back to original plan then xD
17:51:08<Somebody2>:-P
18:02:51<Ryz>I did it Somebody2, check it o:
18:03:12<Somebody2>looking
18:03:41<Somebody2>cool, let me put it in the table
18:05:01<Somebody2>ah, my account isn't auto approved
18:05:14<Ryz>Ah, welp
18:42:24<Ryz>Hmm, Somebody2, looking at chng-it - it looks like there's 503s being hit
18:42:42<Ryz>Instead of 404s, when encountering links like https://chng.it/1qm4
18:43:51<Ryz>It's in sprinkles~
18:46:45<Ryz>Would like to try and continue on this, wanna go through the stuff I've posted here over the years that may or may not have been added to the wiki
18:48:14<Ryz>For instance, https://on.belk.com/ - sample URL: https://on.belk.com/2Ln8a6h - however, may be danger territory since it's a Bit.ly URL shortener for https://belk.com/
18:53:20<@JAA>Yeah, we don't want any bit.ly aliases here since it just duplicates work with the existing bit.ly scrape.
18:54:14<Ryz>Oh, huh, can I add it to the wiki then still?
18:55:06<@JAA>Hmm, actually, it appears that bit.ly no longer resolves codes belonging to an alias under bit.ly.
18:55:30<@JAA>There's a section for bit.ly aliases on the page.
18:56:08<Ryz>Ah yeah, putting it down there~
18:58:11<Ryz>Next one is http://rlu.ru/ - which is uhh, definitely not a normal URL shortener at all - http://rlu.ru/2KGu
18:58:31<Ryz>Found it from https://vadiqusite.wordpress.com/2014/09/24/%d0%b1%d0%be%d0%b9%d0%ba%d0%be%d1%82-%d0%b8%d1%80%d0%b8%d1%88%d0%ba%d0%b0/
19:10:17<Ryz>This looks like there's a lot more work involved with this one...
19:14:43<@JAA>Yeah, that one's weird. Another code redirected me directly on the first attempt but then gave a similar page.
19:14:55<@JAA>A code resolving to another shortener produces this: http://rlu.ru/2YdDt
19:19:48<Ryz>Ugh, meanwhile something like https://4fun.tw/ isn't better, seeing links like https://4fun.tw/iedi
19:31:11<Somebody2>Sorry, didn't see the message till now
19:31:46<Ryz>There were occasional sprinkles of 503s, Somebody2, the last time I saw it
19:32:28<Somebody2>occasional 503s are fine
19:32:56<Somebody2>the only concern is if the *same* shortcode consistently 503s
19:33:05<Ryz>Hmm... oo;
19:33:47<Somebody2>because we'll retry
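Somebody2's rule of thumb (sporadic 503s are harmless because the tracker retries; only a shortcode that 503s every time is a concern) can be sketched as a filter over retry results. The input format of `(shortcode, status)` pairs, the function name `persistent_failures`, and the threshold of three attempts are all assumptions for illustration, not the tracker's actual retry policy.

```python
from collections import defaultdict


def persistent_failures(attempts, status=503, min_attempts=3):
    """Return shortcodes whose every recorded attempt returned `status`.

    `attempts` is an iterable of (shortcode, http_status) pairs. A code that
    503s once and then resolves is fine; only codes that fail on all of at
    least `min_attempts` tries are flagged (threshold is an assumption).
    """
    seen = defaultdict(list)
    for code, http_status in attempts:
        seen[code].append(http_status)
    return sorted(
        code
        for code, statuses in seen.items()
        if len(statuses) >= min_attempts and all(s == status for s in statuses)
    )


log = [("1qm4", 503), ("1qm4", 302), ("abcd", 503), ("abcd", 503), ("abcd", 503)]
print(persistent_failures(log))
```

Here `1qm4` recovers on retry and is ignored, while `abcd` fails on every attempt and would be worth a closer look.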
19:36:00<Ryz>Ah, well, here are 2 URL shorteners up there that I wanna work on, and they're stupidly non-standard >#<;
19:36:22<Ryz>Meanwhile I'm grinding through putting down dead URL shorteners, some of them have actually died since mentioning them X_x;
19:38:14<Ryz>That's one of the other reasons for wanting access >#<;
19:42:40<Somebody2>yeah, that's a good reason to jump on them -- they die like mayflies
19:43:05<Ryz>Considering my endless looting adventuring, woooooo <#>;
19:44:03<Ryz>And having to deal with more complicated URL shorteners
19:44:17<Ryz>...And probably one more that's user agent specific <_>;
19:46:19<Somebody2>heheheh
19:59:49<Ryz>Like, it's mobile device useragent specific x_x;
20:05:05<Somebody2>Yeah, we don't support useragent changes without custom code (that we *do* support)
20:26:41<Ryz>JAA, adding in the 3 dead URL shorteners as per https://wiki.archiveteam.org/index.php?title=URLTeam&oldid=48552#Dead_or_Broken - a bit surprised there isn't a bot that would auto-sort the links alphabetically like https://wiki.archiveteam.org/index.php/List_of_websites_excluded_from_the_Wayback_Machine
20:26:57<Ryz>Especially as the dead list will grow longer and longer
20:28:46<Ryz>Hmm, probably gonna do http://www.man.ac.uk/ - sample URL: http://www.man.ac.uk/3g7Xni - since it's as simple as the other projects I've done
20:29:07<Ryz>Problem is, whether to have it as http://www.man.ac.uk/3g7Xni or http://man.ac.uk/3g7Xni - thoughts, Somebody2?
20:31:45<Ryz>Oh interesting, it didn't like HTTPS at all, https://man.ac.uk/3g7Xni - it would just instead give a "Not Found" followed by "The requested URL /3g7Xni was not found on this server."
20:32:18<Ryz>Meanwhile, https://wheregoes.com/trace/20222459121/ - redirect type, it's just 302 Redirect
20:33:05<Ryz>Meanwhile, for unavailable IDs, as per https://wheregoes.com/trace/20222459135/ - it's a 404
20:33:30<Ryz>Or "No redirect status codes", a flippy floppy
20:34:23<Somebody2>generally using a shorter base URL is better, but it doesn't matter much
20:34:48<Somebody2>there likely could be a bot to sort the list, but someone has to write it
20:37:29<Ryz>Hmm, this is a bit interesting and hesitate-y, checking https://web.archive.org/web/*/http://man.ac.uk/* - it uses a URL shortener but there's non-URL-shortener content there s:
20:38:33<Ryz>Unsure if I need further instructions or a go-ahead
20:38:53<Ryz>Since I set up the 'Shortener Settings', nothing unusual initially
20:39:21<Somebody2>eh, go ahead
20:40:20<Ryz>Yeah, another project running through
20:40:27<Somebody2>?
20:40:37<Ryz>That one was requested by flashfire
20:40:44<Ryz>A long time ago~
20:40:50<Somebody2>another project where?
20:41:02<Ryz>Just man-ac-uk - http://www.man.ac.uk/
20:41:27<Ryz>http://www.man.ac.uk/3g7Xni again being the sample URL
20:41:58<Somebody2>I don't see another project on the urlteam tracker...?
20:42:01<Ryz>Uhh, is there something wrong?
20:42:09<Somebody2>just the one you created just now
20:42:18<Ryz>Oh, I didn't do the Queue enable somehow
20:42:30<Somebody2>yep :-)
20:42:39<Somebody2>not exactly sure why we have the two steps
20:43:02<Ryz>I guess to reduce accidents?
20:43:08<Somebody2>probably
20:43:29<Ryz>Anyway, all I have left are 3 non-standard URL shorteners that I mentioned here years ago x_x
20:44:49<Ryz>I stopped it because I just realized that http://man.ac.uk/2pW redirects to http://www.man.ac.uk/2pW - but is seen as valid and captured
20:46:01<Ryz>So I'll have to keep it as http://www.man.ac.uk/ then Somebody2, unless I make an ignore
20:48:22<Somebody2>oy
20:48:27<Somebody2>makes sense
20:48:31<Somebody2>we can look at it again later
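The man.ac.uk problem is that a request to the bare host is answered with a redirect to the same path on the `www.` host, which looks like a successful resolution. A minimal sketch of the check such an ignore would need to express, assuming that a redirect which only adds or removes a leading `www.` (same scheme, path, and query) should not count as a real hit. The function name `is_host_normalization` is hypothetical, and this is not the tracker's ignore syntax.

```python
from urllib.parse import urlsplit


def _strip_www(host: str) -> str:
    return host[4:] if host.startswith("www.") else host


def is_host_normalization(source: str, location: str) -> bool:
    """True if `location` differs from `source` only by a leading 'www.' on the host."""
    src, dst = urlsplit(source), urlsplit(location)
    # .hostname is lowercased and excludes any port; None if there is no host.
    src_host, dst_host = src.hostname or "", dst.hostname or ""
    return (
        src_host != dst_host
        and _strip_www(src_host) == _strip_www(dst_host)
        and src.scheme == dst.scheme
        and src.path == dst.path
        and src.query == dst.query
    )


print(is_host_normalization("http://man.ac.uk/2pW", "http://www.man.ac.uk/2pW"))
```

Using the `www.` base URL directly, as Ryz decided, sidesteps the need for this check entirely.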
20:48:44<Somebody2>I'm going to be AFK for a while now, but I'll read the logs when I get back
20:48:53<Ryz>Okay >#<;
23:30:12<@JAA>Ryz: It's certainly possible to have a bot do that. Someone would just have to write it.
23:30:58<Ryz>Your JAABot seems to do the same alphabetical sorting task; would it be possible to have it cover the dead URL shorteners?
23:31:28<@JAA>It's the same but not the same really.
23:32:45<@JAA>For starters, this one would be only a section, not the entire page, which already makes it a bit more complicated.
23:42:08<Somebody2>I'm totally in favor of separating that page into multiple pages -- this could be a good opportunity to do that.
23:42:53<@JAA>The one nice thing about having everything on one page is that you can easily ^F to check whether a shortener is known. But other than that, yeah, I agree.
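On JAA's point that a sorting bot for the dead-shortener list would have to work on one section rather than the whole page, a minimal sketch of such a section-scoped sort could look like this. It assumes the section is a flat `* ` bulleted list under a heading such as `== Dead or Broken ==` and that it ends at the next heading of any level; the real page may use a table instead, and `sort_section_list` is a hypothetical name, not part of JAABot.

```python
import re

# Matches MediaWiki headings like "== Dead or Broken ==" (any level).
HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def sort_section_list(wikitext: str, heading: str) -> str:
    """Alphabetically sort the '* ' list items in the section titled `heading`."""
    lines = wikitext.split("\n")
    start = None
    for i, line in enumerate(lines):
        m = HEADING_RE.match(line)
        if m and m.group(2) == heading:
            start = i + 1
            break
    if start is None:
        raise ValueError(f"section {heading!r} not found")
    # The section ends at the next heading of any level (an assumption).
    end = start
    while end < len(lines) and not HEADING_RE.match(lines[end]):
        end += 1
    # Sort only the bullet lines; intro text and blank lines stay where they are.
    idx = [i for i in range(start, end) if lines[i].startswith("* ")]
    items = sorted((lines[i] for i in idx), key=str.casefold)
    for i, item in zip(idx, items):
        lines[i] = item
    return "\n".join(lines)
```

Restricting the edit to one section keeps the rest of the page, and its ^F-friendliness, untouched.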