00:25:59sidpatchy joins
02:25:43threedeeitguy quits [Client Quit]
02:46:16threedeeitguy (threedeeitguy) joins
03:05:06nulldata quits [Quit: The Lounge - https://thelounge.chat]
03:13:16nulldata joins
03:32:19IDK quits [Client Quit]
04:34:32eroc1990 quits [Client Quit]
04:35:29eroc1990 (eroc1990) joins
05:12:58Ryz quits [Ping timeout: 265 seconds]
05:15:59qw3rty quits [Ping timeout: 252 seconds]
05:24:39<@phuzion>albertlarsan68: Generally, you can either add it to the wiki page or just report the URL shortener here, and someone will get to adding it eventually.
05:24:53<@phuzion>albertlarsan68: What is the shortener you'd like to add?
05:39:07Ryz (Ryz) joins
05:51:48<albertlarsan68>I was thinking of adding the sk.mu shortener, already documented on the wiki in the Skyblog page.
05:53:37<albertlarsan68>I'm not sure it is a good idea, since it seems to be deterministic, and we can know where it points to just by looking at it.
06:02:44Ryz quits [Ping timeout: 252 seconds]
06:09:38Ryz (Ryz) joins
06:15:52<@phuzion>albertlarsan68: Yeah, looking at the info about how the URLs are generated, it's unlikely that sk.mu would be a good candidate to run in terroroftinytown
06:16:56<albertlarsan68>Maybe finding a way to generate the redirects could be a way to archive those URLs nonetheless?
06:21:26<@phuzion>albertlarsan68: I'd say that it's probably better for the project to do this, because the project is going to have a starting point of some sort. URLTeam projects tend to be better suited when there's a relatively small URL space to work with. If the shortener slugs were say, 6 or 7 characters long, I'd probably fire it off tonight. But with sluts that are a-zA-Z0-9, 11 characters long, we're looking at an ENORMOUS scope of URLs to run, and
06:21:26<@phuzion>if we were to start absolutely slamming their site with 404 requests, tens of thousands of times per second, they'd ban our user agent in a heartbeat.
06:21:41<@phuzion>slugs, wow, what a typo
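For scale, here is a quick count of the slug space phuzion describes: 11 characters drawn from a-zA-Z0-9, i.e. 62 symbols. It is shown next to the 6- and 7-character lengths he mentions as feasible. This is a sketch that assumes every position can hold any of the 62 characters. As albertlarsan68 notes, not all IDs are actually filled, so the populated space is smaller.

```python
# Slug-space size for fixed-length slugs over a-zA-Z0-9.
# Assumption: every character position may be any of the 62 symbols.
ALPHABET_SIZE = 26 + 26 + 10  # a-z, A-Z, 0-9

def keyspace(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of distinct slugs of exactly `length` characters."""
    return alphabet_size ** length

# Compare the lengths mentioned in the discussion.
for n in (6, 7, 11):
    print(f"{n} chars: {keyspace(n):,} slugs")
```

This prints about 5.7 × 10^10 slugs for 6 characters, 3.5 × 10^12 for 7, and 5.2 × 10^19 for 11.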
06:26:44<albertlarsan68>I was proposing generating the potential response you would get if you asked the server, to keep the short links working. I agree it would be better to do this in a cross-project effort, i.e. maybe sending the slugs over and collecting them here. I'm not sure the range is the whole of the a-zA-Z0-9 * 11 spectrum, because not all IDs are filled, but it would
06:26:44<albertlarsan68>still be a huge space IMO.
06:27:09<albertlarsan68>Its channel is #bowlofpetunias, FYI
06:30:04<@phuzion>albertlarsan68: It's late, and it's entirely possible I'm lacking some information to be able to fully understand what you're talking about, but I fail to see why the shortlinks need to be crawled if they're deterministic per the wiki page?
06:31:09<albertlarsan68>They do not, that's why I propose we **generate ourselves** the 302/301 that occurs.
06:31:19<albertlarsan68>Not asking the server.
06:32:29<@flashfire42>You are suggesting archiving the results of the shortlinks, which will be long links of Skyblog that we plan to grab anyway?
06:33:27<albertlarsan68>Yep
06:35:00datechnoman quits [Quit: The Lounge - https://thelounge.chat]
06:35:50<@phuzion>albertlarsan68: So, let me make sure I am on the same page as you. Your suggestion is to generate a list of short URLs to scrape, and capture the 301/302, right?
06:36:26<albertlarsan68>My suggestion is to have a list of short URLs, and create the 302/301/... out of thin air.
06:36:34datechnoman (datechnoman) joins
06:36:45<@phuzion>albertlarsan68: Where would we get this list of short URLs?
06:37:33<albertlarsan68>Thin air??? Or from the Skyblog project: generate them from the blog and post IDs.
06:37:52<albertlarsan68>The goal would be for IA to be able to keep the short links alive.
06:39:23flashfire42|m joins
06:39:47<@phuzion>So you're suggesting that we brute force 100 billion post IDs, convert them to shortURLs, and then cram those into IA even though 99.99% of them will be 404s?
06:40:26<albertlarsan68>We might even (not backed by testing or research) be able to see if a post is put in "secret" mode, depending on the implementation: whether it redirects to the page but the page does not work, or just the shortlink does not work.
06:41:27<@phuzion>Also, we tend not to generate data ourselves. Even if we know in principle how the shortener system works, we don't know for certain that there's not a bug that affects certain URLs, or other URLs break the rule of how the shortener works with manual overrides or something.
06:42:07<albertlarsan68>We would not cram all of those into IA, but the valid ones yes. Since we gather valid blog+post IDs in the skyblog project, we could transfer them to urlteam.
06:42:38<@flashfire42>I am not sure you understand how any of this works?
06:42:43@flashfire42 sets mode: +o flashfire42|m
06:43:37<albertlarsan68>flashfire42: Who are you talking to?
06:43:56<@flashfire42>You. Phuzion has been here longer than me if memory serves
06:44:49<@phuzion>I've had an account on the Wiki for almost 9 years, for whatever that's worth.
06:45:32<albertlarsan68>Anyway, it was just a kinda random thought that I developed while talking with you all, no problem if it won't work.
06:46:19<albertlarsan68>It is not something that I am attached to, I just want to be sure that no big part of the French internet history disappears.
06:46:52<@flashfire42>As it is, the shortlinks don't get directly archived at all. They are stored in a text document and put on IA. There are a few people slowly grepping them and putting them into the URL project, but yeah, that idea is not how any of this works
06:48:18<@phuzion>It's not that it won't work. Your idea of pulling post IDs from the project grab might be feasible. I was just confused earlier when you were talking about pulling a list of post IDs out of thin air, because if that was the case, we would have to scrape at 25,000 URLs per second until the shutdown, and that's a super quick way to get banned.
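Here is a back-of-the-envelope check of the rate phuzion mentions. It uses the 100 billion post IDs from his earlier message and 25,000 requests per second. The function name is illustrative. The shutdown date is not stated in this log, so the sketch only computes how long the scan itself would take.

```python
SECONDS_PER_DAY = 86_400

def days_to_scan(num_urls: int, requests_per_second: float) -> float:
    """Days of continuous requests needed to try num_urls at a fixed rate."""
    return num_urls / requests_per_second / SECONDS_PER_DAY

# 100 billion candidate IDs at 25,000 requests per second.
print(f"{days_to_scan(100_000_000_000, 25_000):.1f} days")
```

At those figures the scan takes 4,000,000 seconds, about 46.3 days of uninterrupted requests against the site.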
06:50:03<albertlarsan68>We should test what happens when a "secret" post is created and its shortlink is accessed while not logged in: a) a normal 404, b) a redirect to the post page, but the post page errors, or c) something else.
06:50:31<albertlarsan68>Since we don't know what blog an ID belongs to, this is what we should bruteforce.
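The test albertlarsan68 describes could be recorded with a small helper like the hypothetical `classify_shortlink` below. It assumes the shortlink status was fetched by a logged-out client that does not follow redirects, and that the redirect target was then fetched separately. Nothing here reflects observed sk.mu behaviour. It only buckets results into the a/b/c outcomes listed above.

```python
from typing import Optional

REDIRECT_CODES = {301, 302, 303, 307, 308}

def classify_shortlink(short_status: int, target_status: Optional[int]) -> str:
    """Hypothetical helper: bucket a logged-out shortlink test into outcomes a/b/c.

    short_status: HTTP status of the shortlink itself, fetched without following redirects.
    target_status: HTTP status of the redirect target, or None if no redirect happened.
    """
    if short_status == 404:
        return "a) normal 404"
    if short_status in REDIRECT_CODES and target_status is not None and target_status >= 400:
        return "b) redirect to the post page, but the post page errors"
    # Anything else, including a redirect to a working page, falls into c).
    return "c) something else"

print(classify_shortlink(302, 404))
```

A redirect to a page that loads normally is deliberately left in "something else", since the chat only singles out the 404 and the broken-target cases.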
06:50:38<@phuzion>Honestly, this sounds more and more like something that should be discussed in the project channel.
06:51:26<@phuzion>The URLTeam project is fairly scope-limited. We have a lot of URL shorteners that don't get scraped because we don't have the resources to develop custom code for each one of them.
06:51:48<@phuzion>But if someone is going to write a seesaw pipeline for this project, your idea about checking the URL shorteners could be implemented as a step.
06:52:10<albertlarsan68>OK, I'll try to move the ideas to the #bowlofpetunias channel.
07:03:40IDK (IDK) joins
07:11:32someone1 joins
07:13:37qw3rty joins
07:26:42masterX244 (masterX244) joins
07:47:47nulldata quits [Ping timeout: 252 seconds]
07:56:42nulldata joins
11:45:55Ryz quits [Ping timeout: 265 seconds]
12:13:28Ryz (Ryz) joins
12:38:29PredatorIWD quits [Read error: Connection reset by peer]
12:41:58PredatorIWD joins
13:51:45W7RFa6AbNFz joins
13:51:51W7RFa6AbNFz quits [Remote host closed the connection]
14:00:38atphoenix__ quits [Remote host closed the connection]
14:01:19atphoenix__ (atphoenix) joins
14:02:47atphoenix__ is now known as atphoenix
15:07:09nulldata quits [Ping timeout: 258 seconds]
15:17:25nulldata joins
15:24:42VerifiedJ quits [Quit: The Lounge - https://thelounge.chat]
15:25:22VerifiedJ (VerifiedJ) joins
16:01:47driib quits [Client Quit]
16:03:22driib (driib) joins
16:18:03driib quits [Client Quit]
16:19:56driib (driib) joins
17:15:12T31M quits [Read error: Connection reset by peer]
17:15:31T31M joins
17:51:35TheTechRobo quits [Client Quit]
17:51:55TheTechRobo (TheTechRobo) joins
17:54:16TheTechRobo quits [Remote host closed the connection]
17:54:33TheTechRobo (TheTechRobo) joins
18:09:27atphoenix quits [Read error: Connection reset by peer]
18:10:09atphoenix (atphoenix) joins
18:50:11nulldata quits [Client Quit]
18:50:52nulldata joins
19:00:22IDK quits [Client Quit]
19:31:41threedeeitguy quits [Client Quit]
19:32:32threedeeitguy (threedeeitguy) joins
20:28:56someone1 quits [Client Quit]
21:58:04driib quits [Client Quit]
21:58:04kiska quits [Quit: Ping timeout (120 seconds)]
21:58:04@flashfire42 quits [Quit: Ping timeout (120 seconds)]
21:58:04VerifiedJ quits [Client Quit]
21:58:04andrew quits [Quit: Ping timeout (120 seconds)]
21:58:04Matthww1 quits [Quit: Ping timeout (120 seconds)]
21:58:04ave quits [Quit: Ping timeout (120 seconds)]
21:58:26VerifiedJ (VerifiedJ) joins
21:58:26ave (ave) joins
21:58:31driib (driib) joins
21:58:39Matthww1 joins
21:58:54andrew (andrew) joins
21:59:09flashfire42 (flashfire42) joins
21:59:09@ChanServ sets mode: +o flashfire42
22:00:03kiska (kiska) joins