| 00:05:32 | <Ryz> | Hey phuzion, is it possible to change 'Shortener Settings' while the project is running? |
| 00:05:50 | <Ryz> | Somebody2, I don't think you read my concerns regarding ed-gr... s: |
| 00:10:09 | <Somebody2> | Ryz: huh, did I miss them? |
| 00:10:42 | <Somebody2> | I admit, my attention is ... somewhat lacking sometimes. :-) |
| 00:10:44 | <Ryz> | The reason I did not immediately run ed-gr is that, for instance, https://ed.gr/nc9Z and https://ed.gr/nc9z are considered the same URL |
| 00:10:59 | <Somebody2> | Eh, that's not actually a big problem. |
| 00:11:18 | <Ryz> | Uppercase and lowercase character letters are interchangeable and not unique |
| 00:11:31 | <Somebody2> | It'll slow things a bit (which is fine), and we'll get duplication, but <shrug> it's not that much data. |
| 00:11:42 | <Somebody2> | Or we could just strip out the uppercase from the alphabet. |
| 00:12:06 | <Ryz> | I was asking if we should get both versions regardless or strip the uppercase from the alphabet |
| 00:12:16 | <Somebody2> | Ah, I'd go with both versions |
| 00:12:32 | <Somebody2> | At least, until we get thru 4 characters. |
| 00:12:41 | <Somebody2> | Maybe for 5 and 6 we'll just do lowercase |
| 00:12:45 | <Somebody2> | to speed it up |
| 00:13:25 | <Somebody2> | Like, we've already made it to n |
| 00:13:36 | <Somebody2> | We should be done with the 4 character ones in a day or two |
| 00:15:49 | <Somebody2> | Added HTTP 502 to git-io Unavailable codes |
| 00:19:59 | <Ryz> | Hey folks, I have https://alme.re/3rh5rOI - when checking https://alme.re/ - it reveals itself to be a Bit.ly thingy (even more so when doing https://alme.re/3rh5rOI+ ), |
| 00:20:19 | <Ryz> | I was informed that I should not add it as a project because it would share the same ID as Bit.ly in general, |
| 00:20:49 | <Ryz> | But when I did it as https://bit.ly/3rh5rOI it doesn't work |
| 00:21:17 | <Ryz> | Am I able to run this as a separate project? |
| 00:21:51 | <@JAA> | Yeah, that's what I meant the other day, apparently custom site codes no longer work on bit.ly and vice-versa. |
| 00:22:50 | <@JAA> | Which would mean we'd have to scan the full keyspace on every bit.ly alias for proper coverage, but bit.ly alone is already too big to fully cover... |
| 00:42:48 | | tzt (tzt) joins |
| 02:02:46 | <Ryz> | So JAA and/or phuzion, on chng.it - I'm pondering whether to just not process https://chng.it/ IDs anymore (the project already stopped), because sampling on https://web.archive.org/web/*/https://chng.it/* - it's an 8 character letter ID, which is oof, might take too long and be out of reach |
| 02:03:26 | <@JAA> | 'might' :-) |
| 02:04:42 | <@JAA> | Eight characters of alphanumeric digits is 218 trillion combinations. Not even 1000 times more than all snapshots in the WBM. :-) |
| 02:05:48 | <Ryz> | Yeah s: |
| 02:06:07 | <@JAA> | Basically, six digits is doable but takes a long time, and anything beyond is impossible. That's for 0-9a-zA-Z. |
| 02:07:06 | <Ryz> | If you put it that way, welp~ |
| 02:07:52 | <Ryz> | Would it be better to just delete the project in that case? It looks like it would have to be stored in the 'Alive' section of https://wiki.archiveteam.org/index.php/URLTeam almost forever |
| 02:08:04 | <Ryz> | ...There's even ones that have 10 character letter IDs |
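The keyspace figures JAA quotes can be checked with a quick calculation. This is a minimal sketch, assuming the 62-character alphabet 0-9a-zA-Z mentioned above; the function name `keyspace` is illustrative and not part of any tracker code.

```python
import string

# Alphabet from the discussion: 0-9, a-z, A-Z (62 characters).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def keyspace(length, alphabet=ALPHABET):
    """Number of distinct shortcodes of exactly `length` characters."""
    return len(alphabet) ** length


# Print the keyspace size for the ID lengths that come up in the log.
for n in (4, 6, 8, 10):
    print(n, f"{keyspace(n):,}")
```

Each extra character multiplies the work by 62, which is why six characters is slow but feasible and eight is not.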
| 02:27:28 | <Ryz> | Gonna probably work on a couple of URL shorteners |
| 02:28:23 | <Ryz> | To go for better workflow, I will post what I'm doing all in one piece; gonna type it all in the text file, that's my workspace, and then post it all here |
| 02:28:41 | <Ryz> | I will say this: Starting project zc-vg ( https://zc.vg/0GMqC ) |
| 02:32:00 | <Ryz> | Checked https://web.archive.org/web/*/https://zc.vg/* - usual alphabet, no - or _ weirdness |
| 02:32:00 | <Ryz> | URL template is https://zc.vg/{shortcode} - there's a HTTP version, but HTTPS will override it |
| 02:32:00 | <Ryz> | For redirect method, checking https://wheregoes.com/trace/20222521977/ - it's a 302 Redirect |
| 02:32:00 | <Ryz> | For when there's an invalid ID, checking https://wheregoes.com/trace/20222522010/ - it gives a 302 Redirect and instead redirects to https://campaigns.zoho.com/InvalidShortURL.zc - which means I'll have to type up a "Location header reject regular expression" |
| 02:33:13 | <Ryz> | Entry on Location header reject regular expression: https?://campaigns\.zoho\.com/InvalidShortURL\.zc$ |
| 02:34:05 | <Ryz> | Hmm, reading the error page, "Our server can not find the page you requested. You might have typed the link incorrectly or used an outdated link. Please check the link once again.", the phrase 'outdated link', wondering if that means the URL ID can be deactivated or something... |
| 02:35:57 | <Ryz> | Running it~ |
| 02:37:26 | <Ryz> | It's looking good, the "Location header reject regular expression" thing I filled in seems to be followed properly |
| 02:37:50 | <Ryz> | Woo, nice, I see results, happy happy o: |
| 02:38:13 | <Ryz> | More of a trickle but still valid and good o: |
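A rough sketch of how a Location-header reject pattern like the one Ryz entered could be applied. The pattern is taken from the log; how the tracker actually applies it (search versus full match) is an assumption, and `is_valid_redirect` is a hypothetical helper name.

```python
import re

# Pattern Ryz entered for zc-vg (copied from the log).
REJECT = re.compile(r"https?://campaigns\.zoho\.com/InvalidShortURL\.zc$")


def is_valid_redirect(location):
    """Treat a redirect as a real result unless its target matches the reject pattern.

    Assumption: the pattern is applied with a search over the Location value.
    """
    return REJECT.search(location) is None


# example.com stands in for a real destination URL.
print(is_valid_redirect("https://example.com/landing"))
print(is_valid_redirect("https://campaigns.zoho.com/InvalidShortURL.zc"))
```

The `https?` prefix covers both the HTTP and HTTPS variants of the error page, and the trailing `$` keeps longer URLs that merely start with that path from being rejected.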
| 03:11:12 | <@phuzion> | Ryz: Yes it's possible to change the settings while it's running, but any outstanding claims will still be out with the old settings. |
| 03:11:43 | | AK quits [Remote host closed the connection] |
| 03:13:24 | | AK (AK) joins |
| 03:13:26 | | AK quits [Remote host closed the connection] |
| 03:15:03 | <Ryz> | Hey phuzion, you available? I have 2 URL shorteners that I need some help with |
| 03:15:24 | <@phuzion> | sup |
| 03:15:31 | <Ryz> | Or rather one URL shortener with 2 variants |
| 03:15:40 | <@phuzion> | alright, give em to me |
| 03:15:55 | <Ryz> | mpg-smh-re ( https://mpg.smh.re/2jC3 ) |
| 03:15:58 | <Ryz> | For valid, https://wheregoes.com/trace/20222522124/ - uhh, interesting, it's a 200, which means it's not the usual redirect thing at all; it's supposed to redirect to https://jobs.morganphilips.com/es-es/contable-barcelona-133329?utm_campaign=611bbe77e785860001cdddbf&utm_content=625006aed5d5300001c509af&utm_medium=smarpshare&utm_source=twitter |
| 03:16:02 | <Ryz> | This is the offending part S: |
| 03:16:44 | <@phuzion> | ok so |
| 03:17:06 | <@phuzion> | what appears to be happening is a redirect in the client, likely using javascript |
| 03:17:11 | <@phuzion> | rather than using an HTTP redirect |
| 03:17:15 | <Ryz> | If I'm able to do this, I can also finish up eu-smh-re ( https://eu.smh.re/008p ) |
| 03:18:43 | <Ryz> | Oh, so I'll have to use "Content body regular expression" then s: |
| 03:18:49 | <@phuzion> | Yeah sounds like it |
| 03:18:58 | <@phuzion> | Assuming that the destination URL is served in the content body. |
| 03:19:18 | <@phuzion> | If it's not, and it's dependent on javascript hitting another endpoint or something, it might require custom code. |
| 03:19:24 | <Ryz> | Checking the view source of https://mpg.smh.re/2jC3 |
| 03:19:28 | <Ryz> | <link rel="prefetch prerender canonical" href="https://jobs.morganphilips.com/es-es/contable-barcelona-133329?utm_campaign=611bbe77e785860001cdddbf&utm_content=625006aed5d5300001c509af&utm_medium=smarpshare&utm_source=twitter"> |
| 03:19:44 | <Ryz> | Clean one it seems, no '\' infecting it |
| 03:20:26 | <@phuzion> | Yeah that should work then. Assuming that <link rel> tag is consistent, you should be able to snag that with a content body regex. |
| 03:21:10 | <Ryz> | What would be the suggested text for this? |
| 03:21:18 | <@phuzion> | Are you familiar with regular expressions? |
| 03:22:19 | <@JAA> | Does that handle the HTML entity encoding, e.g. &amp;? |
| 03:22:43 | <Ryz> | Uhhh, basic or from me doing a lot of ignores at ArchiveBot, with the help of the original dashboard's premade ignores |
| 03:42:53 | <Ryz> | phuzion? oo; |
| 03:43:42 | <@phuzion> | Ryz: Basically, what you want to do is write a regex that will capture the URL and only the URL from the HTML body. |
| 03:46:03 | <Ryz> | Hmm, well, I'm not too sure how to wrap my head around that since I do ignores in ArchiveBot jobs |
| 03:46:35 | <Ryz> | The dumbest thing I can think of is: <link rel="prefetch prerender canonical" href=".*"> >_>; |
| 03:48:29 | <@JAA> | phuzion: Good luck, I've been trying to get Ryz to learn regex for years now over in #archivebot. :-P |
| 03:48:57 | <Ryz> | Sssssssh, at least I'm learning in pieces, even if it's so dumb <.<; |
| 03:49:12 | <@phuzion> | JAA: Me teaching someone regex would be like a blind driving instructor whose only concept of a car is "i think it has wheels and i know it goes vroom" |
| 03:49:32 | <@phuzion> | Ryz: regex101.com is a good tool that you can use to experiment with regex and learn what you're doing and why it's doing what it's doing. |
| 03:51:43 | <Ryz> | Trying to wrap it around in the context of having the bot find something rather than ignore a link~ |
| 03:59:38 | <Ryz> | Am I supposed to do a match or a substitution for this? |
| 04:00:00 | | treora quits [Quit: blub blub.] |
| 04:01:14 | | treora joins |
| 04:03:32 | <Ryz> | phuzion? |
| 04:03:48 | <@phuzion> | Not sure, looking now. |
| 04:04:01 | <@phuzion> | My guess would be a match but don't quote me on that yet |
| 04:04:45 | <Ryz> | Because uhh, trying to deal with regex mixed with HTML, uhh, that's steps above the usual ignoring at ArchiveBot |
| 04:08:31 | <@phuzion> | Ryz: I'm not 100% sure but this might work? <link rel=\"prefetch prerender canonical\" href=\"(.+)\"> |
| 04:09:13 | <@phuzion> | I used the dlvr-it project as a reference, and roughly based my regex off of how that one looks. |
| 04:09:57 | <Ryz> | Checking via https://regex101.com/ - it seems to work; I know .+, but I've never seen it used when it came to being sealed by ( ) at all O_o; |
| 04:10:39 | <Ryz> | The test string being: |
| 04:10:40 | <Ryz> | <link rel="prefetch prerender canonical" href="https://jobs.morganphilips.com/es-es/contable-barcelona-133329?utm_campaign=611bbe77e785860001cdddbf&utm_content=625006aed5d5300001c509af&utm_medium=smarpshare&utm_source=twitter"> |
| 04:13:25 | <@phuzion> | Ryz: Try a few different ones, and see how they work. |
| 04:13:52 | <Ryz> | Oh, different examples of the URLs to extract from? |
| 04:13:59 | <@phuzion> | Yeah |
| 04:14:57 | <Ryz> | Let's see... |
| 04:19:55 | <Ryz> | http://mpg.smh.re/0Tz3 - https://mpg.smh.re/0zAZ - https://mpg.smh.re/1-Hn - https://mpg.smh.re/1_9Q - https://mpg.smh.re/1HA3 - https://mpg.smh.re/225d |
| 04:20:16 | <Ryz> | They all seem to be contained as per the regex you made phuzion |
| 04:20:24 | <Ryz> | The links I sampled from https://web.archive.org/web/*/https://mpg.smh.re/* |
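The body regex phuzion suggested can be tried outside regex101 as well. This is a minimal sketch: the HTML body is a made-up stand-in (example.com, with `&amp;` in the href to illustrate JAA's entity-encoding question), and `extract_target` is a hypothetical helper, not the tracker's code.

```python
import html
import re

# Pattern from the log, with the quotes left unescaped.
PATTERN = re.compile(r'<link rel="prefetch prerender canonical" href="(.+)">')

# Hypothetical page body; the real smh.re pages may differ.
BODY = """<html><head>
<link rel="prefetch prerender canonical" href="https://example.com/job?utm_medium=share&amp;utm_source=twitter">
</head></html>"""


def extract_target(body):
    """Return the captured destination URL with HTML entities decoded, or None."""
    match = PATTERN.search(body)
    if match is None:
        return None
    # The parentheses in the pattern form capture group 1: the URL only.
    return html.unescape(match.group(1))


print(extract_target(BODY))
```

The `( )` around `.+` is what turns "match this tag" into "match this tag and hand back the URL inside it". `.+` is greedy but does not cross newlines, so it behaves as long as the tag sits on its own line; `[^"]+` would be a stricter alternative.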
| 04:21:44 | <@phuzion> | Set it up, enable it for a moment to let a few jobs go out, disable it again, and watch the results come in. |
| 04:21:56 | <@phuzion> | Ryz: Did you set up the alphabet properly on that project? |
| 04:22:03 | <Ryz> | Gimme a moment o: |
| 04:22:15 | <Ryz> | I haven't set up the project yet |
| 04:22:34 | <Ryz> | I typed the instructions on what to do on Notepad++ in order to increase workflow |
| 04:23:16 | <Ryz> | While I punch in the instructions, do you think I should delete off the chng-it project? Since talking to JAA, I found out it's a 8 character letter ID (sometimes 10) |
| 04:23:37 | <@phuzion> | Leave it disabled for now. It's set up, but outside of the scope of what we can check. |
| 04:23:50 | <Ryz> | Mhm~ |
| 04:24:01 | <@flashfire42> | Dont delete any projects guys..... |
| 04:24:12 | <@flashfire42> | Disable it and mark it on the wiki as dead if its dead |
| 04:24:19 | <Ryz> | I was gonna also say let's do a review on all the existing projects to see if they should be resumed or not |
| 04:24:44 | <Ryz> | The projects don't seem to have notes, which makes it hard to tell if they're supposed to be considered finished |
| 04:25:17 | <Ryz> | Here's what I typed up for the latest project: |
| 04:25:17 | <Ryz> | Starting mpg-smh-re ( https://mpg.smh.re/2jC3 ) |
| 04:25:17 | <Ryz> | Checking https://web.archive.org/web/*/https://mpg.smh.re/* - alphabet is the usual default, wait a minute, I see - and _ - welp, time to change it |
| 04:25:17 | <Ryz> | https://mpg.smh.re/{shortcode} |
| 04:25:17 | <Ryz> | For valid, https://wheregoes.com/trace/20222522124/ - uhh, interesting, it's a 200, which means it's not the usual redirect thing at all; it's supposed to redirect to https://jobs.morganphilips.com/es-es/contable-barcelona-133329?utm_campaign=611bbe77e785860001cdddbf&utm_content=625006aed5d5300001c509af&utm_medium=smarpshare&utm_source=twitter |
| 04:25:17 | <Ryz> | For invalid, https://wheregoes.com/trace/20222522159/ - it's a 400 |
| 04:25:18 | <Ryz> | (((in progress))) |
| 04:26:25 | <Ryz> | So for unavailable status codes, I would have to leave it blank, delete the 200, since that's for the redirect |
| 04:28:37 | <Ryz> | phuzion, I set up the project, please check the "Shortener Settings" if I did something wrong with this |
| 04:28:50 | <@phuzion> | Gimme a few minutes and I’ll look. |
| 04:42:44 | <@phuzion> | Ryz: Looks good so far, I enabled it for a second to see if we get any results. |
| 04:43:03 | <@phuzion> | Yeah the regex failed for some reason. |
| 04:44:08 | <@phuzion> | flashfire42: You around to troubleshoot a regex? |
| 04:44:19 | <Ryz> | Does that mean we have to extend to everything else in the redirect page? |
| 04:44:31 | <@phuzion> | No, we probably just need to tweak the regex. |
| 04:44:32 | <Ryz> | The only other thing I can think of that may trip the project is those two space bars |
| 04:44:52 | <Ryz> | That's a thing in every one of the JS redirects |
| 04:50:55 | <@phuzion> | Figured it out. |
| 04:50:59 | <@flashfire42> | Oh I am sorry I am awake whats up |
| 04:51:11 | <@flashfire42> | regex aint my expertise but I have a rough idea |
| 04:51:11 | <@phuzion> | I think we figured out the issue, gonna fire off a test real quick |
| 04:51:31 | <@phuzion> | Ryz: It looks like you used / instead of \ |
| 04:51:49 | <@phuzion> | wait |
| 04:52:07 | <@flashfire42> | https://linkjust.com/ |
| 04:52:16 | <@flashfire42> | https://exe.io/ |
| 04:52:49 | <Ryz> | Huh? I used the one you gave me exactly phuzion |
| 04:52:59 | <@phuzion> | Ryz: Nevermind, the tracker converted \ to / |
| 04:53:14 | <@phuzion> | and it looks like we don't need to escape " in the tracker |
| 04:53:59 | <Ryz> | That's so weird s: |
| 04:54:51 | <@flashfire42> | I come across all sorts of link shorteners from various sketchy sources |
| 04:54:54 | | eroc1990 quits [Client Quit] |
| 04:55:25 | <@phuzion> | I think it would be useful to have a bot in here where we could be like !shortener https://foo.bar and it adds it to the wiki page or something. |
| 04:55:56 | <@flashfire42> | I wish. I just dump them here and occasionally go back through logs and add them to the unsorted page |
| 04:56:20 | <Ryz> | flashfire42, sketchy loot o#o; |
| 04:56:22 | <@flashfire42> | Collecting URL shorteners was a side project of mine for a short time. As was the FTP list before they removed FTP support from most browsers |
| 04:57:05 | <Ryz> | Mmm :C |
| 04:57:06 | <Ryz> | Lost loot |
| 04:57:25 | <@flashfire42> | Not lost just not easy to access anymore |
| 04:57:31 | <@flashfire42> | any standard FTP client can still do it |
| 04:57:51 | <Ryz> | More difficult loot then ;-; |
| 04:57:51 | <@flashfire42> | WGET just hates FTP. ~~I think thats the one it is might be WPULL~~ |
| 04:58:45 | <Ryz> | Are we good to go phyzion? |
| 04:58:49 | <Ryz> | *phuzion |
| 04:58:52 | <@phuzion> | No |
| 04:59:21 | <Ryz> | Oh? |
| 04:59:32 | <@phuzion> | Working on it |
| 04:59:34 | <@phuzion> | Fixing the regex |
| 05:00:27 | <@phuzion> | flashfire42: is delete queue safe to use on a project that hasn't technically started? |
| 05:00:33 | <Ryz> | Aaaaah, okay~ |
| 05:00:34 | <@phuzion> | in the sense that we don't have any results |
| 05:00:40 | <@flashfire42> | Yes |
| 05:00:44 | <@flashfire42> | should be. |
| 05:01:11 | <@phuzion> | Ok. I keep getting "AttributeError: 'HTMLParser' object has no attribute 'unescape'" |
| 05:01:23 | <@phuzion> | In the error reports |
| 05:01:28 | <Ryz> | If we can manage to make this work, not only that and https://eu.smh.re/ would be processed, but any URL shortener under https://smh.re/ would also be covered |
| 05:02:50 | <@flashfire42> | which project is it phuzion |
| 05:03:03 | <@phuzion> | https://tracker.archiveteam.org:1338/project/mpg-smh-re/settings |
| 05:03:11 | <@phuzion> | oh hang on |
| 05:03:28 | <@phuzion> | would it be because we have 200 as a redirect status code? |
| 05:03:39 | <@flashfire42> | you wanna start it from the very start? |
| 05:03:54 | <@phuzion> | Well, I wanna figure out what's going on first before we wipe the project |
| 05:04:29 | <@flashfire42> | 301 302 303 307 200 are all of these valid redirect codes? |
| 05:04:56 | | eroc1990 (eroc1990) joins |
| 05:05:08 | <@flashfire42> | You check using Curl which are actually in use then when you start it you wait til it runs into an error to see if there are other status codes that it encounters |
| 05:06:07 | <@phuzion> | Well, a 200 is what you get with a redirect. But it happens with javascript, so we're not looking at the redirect header. |
| 05:06:12 | <@phuzion> | Similar to how dlvr-it works |
| 05:06:26 | <@phuzion> | And that project has 200 in the unavailable status codes box |
| 05:06:58 | <@flashfire42> | Oh lord Javascript redirects? this is why I only ever added very simple shorteners |
| 05:06:58 | <Ryz> | Yeah, I removed 200 from "Unavailable status codes" because otherwise it would get skipped or discarded |
| 05:07:07 | <@flashfire42> | there is a reason we never did adfly |
| 05:07:32 | <@phuzion> | flashfire42: It's very simple, the destination URL is in the HTML of the page you load. |
| 05:07:57 | <@phuzion> | Nice, lttr-ai's queue is starting at "Jedi" as of the last time I reloaded the projects page |
| 05:08:07 | <@phuzion> | https://phuzion.s-ul.eu/w4STOqkK.png |
| 05:08:18 | <Ryz> | lol nice |
| 05:08:45 | <@phuzion> | I'm gonna retry this with 200 in the unavailable status codes and see if it picks things up that way. |
| 05:10:38 | <@JAA> | flashfire42: Both wget and wpull have poor FTP support. wpull is the one that crashes on any error though. |
| 05:15:40 | <@phuzion> | Yeah I'm not gonna lie, I'm stumped on this one Ryz |
| 05:16:02 | <Ryz> | Awww this really sucks... |
| 05:16:22 | <Ryz> | Are there other projects that had to deal with javascript? |
| 05:16:31 | <Ryz> | Not the ones with custom code necessarily |
| 05:17:19 | <@phuzion> | I mean, dlvr-it doesn't do an HTTP redirect, but gives the URL to redirect to directly in the HTTP body, so you can use that as a starting point. |
| 05:24:51 | <Ryz> | Hmm, it's hard to tell how exactly it works since it's not a Javascript redirect anymore for me to look into it easier |
| 05:26:33 | <Ryz> | Hmm, maybe I can fix prefetch prerender canonical\" href=\"(.+)\"> into prefetch prerender canonical" href="(.+)"> |
| 05:26:43 | <Ryz> | Since dlvr-it had it like that for some weird reason |
| 05:27:54 | <Ryz> | Gonna run it again phuzion >#<; |
| 05:28:55 | <@phuzion> | Go for it |
| 05:29:39 | <@phuzion> | Nope, no luck |
| 05:29:46 | <Ryz> | Nope, it still didn't work... |
| 05:29:58 | <@phuzion> | We're getting two errors: UnexpectedNoResult: Unexpectedly did not get a body result for u'https://mpg.smh.re/wQ' |
| 05:30:06 | <@phuzion> | AttributeError: 'HTMLParser' object has no attribute 'unescape' |
| 05:30:22 | <@phuzion> | Ryz: I think it's time to put this one down until someone who understands things a bit better can chime in. |
| 05:30:29 | <Ryz> | Is this supposed to be this finicky? :c |
| 05:30:32 | <Ryz> | Oof, probably... |
| 05:31:17 | <Ryz> | Well, I guess one URL shortener I can do is pst-cr ( https://pst.cr/AqsBL ) |
| 05:32:11 | <Ryz> | phuzion, you think it's worth reviewing the other projects to see if it's worth running again? |
| 05:32:35 | <@phuzion> | Not yet. |
| 05:33:39 | <Ryz> | Hmm oo; |
| 05:34:20 | <@phuzion> | exe.io has a recaptcha so that's probably a no-go |
| 05:34:22 | <Ryz> | Checking https://web.archive.org/web/*/https://pst.cr/* - usual alphabet; though curious on always seeing the IDs in 5 character letters |
| 05:35:13 | <@JAA> | As far as I can tell, `AttributeError: 'HTMLParser' object has no attribute 'unescape'` is a bug in the terroroftinytown code. |
| 05:35:23 | <Ryz> | For valid, https://wheregoes.com/trace/20222523410/ - 301 Redirect |
| 05:35:29 | <@JAA> | Specifically, it's using a method of HTMLParser that was never part of the public API and removed at some point. |
| 05:35:53 | <@JAA> | https://github.com/python/cpython/blob/v2.7.18/Lib/HTMLParser.py#L445 |
| 05:36:28 | <@phuzion> | JAA: Any idea why we keep triggering it on mpg-smh-re? |
| 05:36:32 | <@JAA> | It was removed a couple years ago, first released in 3.9: https://github.com/python/cpython/commit/fae0ed5099de594a9204071d555cb8b76368cbf4 |
| 05:36:42 | <@JAA> | So it gets triggered on any worker that uses Python 3.9 or higher. |
| 05:37:00 | <@phuzion> | huh |
| 05:37:38 | <Ryz> | Although, I do notice it seems to redirect to https://byq6z.app.goo.gl/AqsBL - checked another one https://wheregoes.com/trace/20222523424/ - https://byq6z.app.goo.gl/14ZLC - this is unusual |
| 05:37:45 | <@JAA> | That commit claims that it was 'undocumented and deprecated since Python 3.4', but I can't find it in the 3.3 docs either, and it was always marked as internal in a comment in the code... |
| 05:37:50 | <@JAA> | ¯\_(ツ)_/¯ |
| 05:37:57 | <Ryz> | Do all of https://pst.cr/ redirect to https://byq6z.app.goo.gl/ ? |
| 05:38:27 | <Ryz> | When doing invalid as per https://wheregoes.com/trace/20222523437/ - giving a 404; yes, yes it does |
| 05:38:38 | <@JAA> | But yeah, that HTML unescaping thing needs a fix in terroroftinytown to use html.unescape instead. |
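The kind of fix JAA describes could look roughly like the import shim below. This is only a sketch of the general approach, not the actual terroroftinytown patch: it prefers the public `html.unescape` (available since Python 3.4) and falls back to the old internal method only on Python 2.

```python
try:
    # Public API on Python 3.4 and later.
    from html import unescape
except ImportError:
    # Python 2 fallback: the internal method the old code relied on.
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

# Entity-encoded query string, as it would appear inside an href attribute.
print(unescape("utm_medium=share&amp;utm_source=twitter"))
```

With a shim like this, workers on Python 3.9 or higher never touch the removed `HTMLParser.unescape` method.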
| 05:39:26 | <@phuzion> | Ryz: Seems like it's a double redirect. |
| 05:41:53 | <Ryz> | Yeah, we could cover https://byq6z.app.goo.gl/ instead, but that would make https://pst.cr/ unreachable, if one is covered over the other~ |
| 05:42:09 | <Ryz> | Like, it looks like there isn't a way to do a double redirect validation |
| 05:45:22 | <Ryz> | Sigh, looks like 3 more URL shorteners into the 'Alive' pile~ |
| 05:56:31 | <Ryz> | It looks like I have 3 more smh.re types |
| 05:56:39 | <Ryz> | In the text file that is |
| 05:56:49 | <Ryz> | I guess I'll group them together as one entry I suppose ><; |
| 05:59:27 | <datechnoman> | At least we have plenty of bandwidth (server) to process through them all! |
| 05:59:52 | <datechnoman> | But no one likes shorteners tbh. Only good on presentations for viewers |
| 05:59:56 | <datechnoman> | sigh.... |
| 06:03:03 | <Ryz> | shorturl-at ( https://www.shorturl.at/dorGH ) seems to be a lot simpler... |
| 06:05:53 | <Ryz> | phuzion, what are your thoughts on handling upper and lowercase character letters being interchangeable and not unique? |
| 06:06:04 | <Ryz> | For instance: https://www.shorturl.at/dorGH and https://www.shorturl.at/dorgh are the same URL |
| 06:06:36 | <Ryz> | Should we archive only either uppercase or lowercase, or still cover both? |
| 06:07:36 | <@phuzion> | Ryz: I'm sure that someone has documented a previous shortener on the wiki that is case-insensitive. Why not check and see what we did in the past there. |
| 06:09:34 | <Ryz> | phuzion, in one case, being kas-pr, it mentions this: "Codes are case-insensitive, so only scanning lower-case." |
| 06:09:52 | <Ryz> | All the other ones that mention case-insensitive have no notes on what was done with 'em |
| 06:10:20 | <Ryz> | Well, actually one other project job, qt-catbox-moe - the others were just in the 'Alive' and not projects |
| 06:13:21 | <Ryz> | Hmm, https://shorturl.at/dorGH redirects to https://www.shorturl.at/dorGH - which redirects to https://www.instagram.com/accounts/login/ |
| 06:13:52 | <Ryz> | Shouldn't the project be named www-shorturl-at instead of shorturl-at, phuzion? |
| 06:14:16 | <@phuzion> | Dunno, I'm going to bed for the night. |
| 06:14:29 | <Ryz> | To clarify, https://wheregoes.com/trace/20222523882/ - it's a 301 followed by a 302 |
| 06:14:38 | <Ryz> | Ah, nini~ |
| 06:15:46 | <@JAA> | I'd run it against www and only lower-case, I think. |
| 06:16:49 | <Ryz> | Against? Can you clarify on that? |
| 06:16:58 | <Ryz> | Because not having www would be a double redirect |
| 06:17:33 | <Ryz> | As for invalid, https://wheregoes.com/trace/20222523707/ - it's a 302 Redirect that goes back to https://www.shorturl.at/ |
| 06:17:42 | <Ryz> | So https?://www\.shorturl\.at/$ for the "Location header reject regular expression" it is |
| 06:18:27 | <@JAA> | As in, make requests to www.shorturl.at, not shorturl.at (and follow the predictable redirect to www). |
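JAA's suggestion (request the www host directly, lower-case codes only) can be illustrated with a small generator. This is an illustrative sketch only: the real tracker hands out ranges of codes to workers rather than enumerating URLs like this, and `shortcode_urls` is a hypothetical name. The `{shortcode}` placeholder follows the URL template format used earlier in the log.

```python
import itertools
import string

# Lower-case-only alphabet, since the codes are case-insensitive.
LOWER_ALPHABET = string.digits + string.ascii_lowercase
# Request the www host directly to avoid the extra redirect hop.
URL_TEMPLATE = "https://www.shorturl.at/{shortcode}"


def shortcode_urls(length, alphabet=LOWER_ALPHABET, template=URL_TEMPLATE):
    """Lazily yield every URL for shortcodes of exactly `length` characters."""
    for chars in itertools.product(alphabet, repeat=length):
        yield template.format(shortcode="".join(chars))


# Peek at the first few 5-character URLs without generating the whole keyspace.
first = list(itertools.islice(shortcode_urls(5), 3))
print(first)
print(len(LOWER_ALPHABET) ** 5, "codes instead of", 62 ** 5)
```

Dropping the upper-case letters shrinks the alphabet from 62 to 36 characters, which cuts the 5-character keyspace by more than an order of magnitude.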
| 06:19:02 | <Ryz> | Welp, meanwhile "Export in progress" is going on, so stuck for a bit |
| 06:21:56 | <Ryz> | The double redirect JAA? |
| 06:25:38 | <Ryz> | Think phuzion told me if something like that happens, redirect with https://www.shorturl.at/dorGH and not https://shorturl.at/dorGH |
| 06:37:01 | | michaelblob quits [Read error: Connection reset by peer] |
| 06:40:12 | <Ryz> | Mmm, twas gonna run it now but was wondering whether to make it target 5 character letter IDs only instead of starting from 1 character letter IDs and working up |
| 07:24:06 | | michaelblob (michaelblob) joins |
| 08:55:33 | | Gereon62 (Gereon) joins |
| 09:42:04 | | Atom quits [Read error: Connection reset by peer] |
| 11:51:34 | | AK (AK) joins |
| 15:42:16 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 15:45:11 | | Gereon620 (Gereon) joins |
| 15:45:29 | | Gereon62 quits [Client Quit] |
| 15:45:29 | | eroc1990 quits [Client Quit] |
| 15:45:29 | | Gereon620 is now known as Gereon62 |
| 15:46:04 | | eroc1990 (eroc1990) joins |
| 16:10:06 | | knecht420 quits [Quit: The Lounge - https://thelounge.chat] |
| 16:10:58 | | knecht420 (knecht420) joins |
| 18:45:40 | <h2ibot> | [AT] URLTeam tracker https://tracker.archiveteam.org:1338/api/health is down (507,HTTP status returned: 507) |
| 19:34:09 | | TheTechRobo quits [Remote host closed the connection] |
| 19:35:10 | | TheTechRobo (TheTechRobo) joins |
| 19:35:33 | | TheTechRobo quits [Remote host closed the connection] |
| 19:35:55 | | TheTechRobo (TheTechRobo) joins |
| 20:05:32 | <Somebody2> | t-ly was freaking out. Cleared the errors, turned off auto-queue, we'll see what happens. |
| 20:05:40 | | qwertyasdfuiopghjkl joins |
| 20:05:45 | <h2ibot> | [AT] URLTeam tracker https://tracker.archiveteam.org:1338/api/health is up (200,Success) |
| 20:05:46 | <Somebody2> | (also increased timeout between requests from 0.5 sec to 1.5 sec) |
| 20:50:08 | | TheTechRobo quits [Remote host closed the connection] |
| 20:50:25 | | TheTechRobo (TheTechRobo) joins |
| 22:30:27 | | HackMii quits [Ping timeout: 252 seconds] |
| 22:33:18 | | HackMii (hacktheplanet) joins |
| 22:38:42 | | HackMii quits [Ping timeout: 252 seconds] |
| 23:07:49 | | HackMii (hacktheplanet) joins |
| 23:18:39 | | mgrandi quits [*.net *.split] |
| 23:18:39 | | @JAA quits [*.net *.split] |
| 23:18:39 | | h2ibot quits [*.net *.split] |
| 23:18:39 | | Dj-Wawa quits [*.net *.split] |
| 23:18:39 | | Hecz quits [*.net *.split] |
| 23:18:53 | | Hecz joins |
| 23:18:54 | | Hecz is now authenticated as Hecz |
| 23:18:54 | | Hecz quits [Changing host] |
| 23:18:54 | | Hecz (Hecz) joins |
| 23:19:15 | | mgrandi (mgrandi) joins |
| 23:19:35 | | Dj-Wawa joins |
| 23:19:36 | | Dj-Wawa is now authenticated as Dj-Wawa |
| 23:19:36 | | JAA (JAA) joins |
| 23:19:36 | | @ChanServ sets mode: +o JAA |
| 23:19:43 | | h2ibot (h2ibot) joins |
| 23:29:03 | | AK quits [Remote host closed the connection] |
| 23:30:03 | | AK (AK) joins |
| 23:30:33 | | AK quits [Remote host closed the connection] |