| 00:00:09 | <@JAA> | j.mp is a special case I believe. It acts like an alias but is operated by Bitly itself, not by a customer. |
| 00:15:34 | <flashfire42> | There are edge cases where Bit.ly-owned stuff doesn't resolve to bit.ly; the Purina URL shortener, for example, has some of its own custom URLs. For the most part it's not worth it, though. |
| 00:36:26 | <aarchi> | Do you know of a reverse IP lookup tool that could find domains that point to 67.199.248.[10-13]? |
| 00:40:32 | <aarchi> | Sweet! I found two bitly aliases with https://hackertarget.com/reverse-ip-lookup/ |
| 00:40:48 | <aarchi> | hsgk.in |
| 00:40:48 | <aarchi> | myepg.online |
| 00:41:21 | <aarchi> | That's obviously a very incomplete list, so I'm looking for a better tool |
| 00:43:51 | <@JAA> | Project Sonar may be useful here. |
| 00:46:04 | <@hook54321> | a ridiculous number of the cokeurl-com results are coke rewards URLs |
| 00:46:46 | <@hook54321> | still almost no results for a-ll-st |
| 00:46:55 | <@JAA> | I was going to suggest crt.sh, but looks like they use Let's Encrypt with no indication in the cert that it's Bitly. |
| 00:48:08 | <aarchi> | What range has been scanned for a-ll-st? I could compare that to the shortcodes that I queried from IA |
| 00:50:47 | <@hook54321> | aarchi: it's at 916708131 right now |
| 00:51:14 | <aarchi> | I'll check the code for how to convert that to a shortcode |
| 00:51:31 | <aarchi> | Unless there's an easy way |
| 00:52:39 | <@hook54321> | aarchi: 102pF1 |
| 00:53:04 | <aarchi> | How'd you do it so fast? A base conversion utility? |
| 00:53:14 | <@JAA> | https://tracker.archiveteam.org:1338/calculator |
| 00:53:26 | <@hook54321> | ye |
| 00:53:29 | <@hook54321> | that |
| 00:54:19 | <aarchi> | There's only these 7 shortcodes on IA before 102pF1: 0EtKQc 0F6eQJ 0XwTh5 0gzJD4 0k6igy 0nsico 0tH7gU |
| 00:55:05 | <aarchi> | https://www.irccloud.com/pastebin/NgfMXydb/a-ll-st.ia_shortcodes.txt |
| 00:56:29 | <aarchi> | Definitely not sequential for a [0-9A-Za-z] alphabet. The alphabet is probably permuted or generation isn't sequential |
| 00:58:34 | <aarchi> | I generated that list with https://github.com/andrewarchi/urlhero/blob/main/shorteners/a-ll-st/allst.go . I'm working on generalizing it for more shorteners |
| 01:07:18 | <flashfire42> | aarchi if you want more shorteners check any of the unsorted stuff here https://wiki.archiveteam.org/index.php/URLTeam/unsorted |
| 01:07:38 | <flashfire42> | https://wiki.archiveteam.org/index.php/URLTeam/unsorted#Flashfire.27s_dump I alone have dumped a bunch there because I was lazy |
| 01:58:08 | <@hook54321> | According to curl, the Location header for invalid shortcodes on s.uconn.edu is empty. Rejecting $ should work, right? (It returns the same status code for valid and invalid shortcodes.) |
| 01:59:03 | <@JAA> | $ will match every value. |
| 01:59:12 | <@hook54321> | ah |
| 01:59:16 | <@JAA> | ^$ I guess, but not sure. |
| 02:05:36 | <@hook54321> | Note on s.uconn.edu: shortcodes aren't case-sensitive. |
| 02:50:51 | <@hook54321> | s-uconn-edu running |
| 03:40:01 | | qw3rty_ joins |
| 03:43:41 | | qw3rty__ quits [Ping timeout: 258 seconds] |
| 04:55:44 | <@hook54321> | JAA: it's angry |
| 05:07:09 | <@hook54321> | seems to be bitly issues |
| 05:11:14 | <@hook54321> | some items seem to be having issues going out too |
| 05:22:35 | <@hook54321> | ah i see it stops when there's too many errors |
| 05:25:23 | <aarchi> | hook54321a: s.uconn.edu doesn't look like a bit.ly alias |
| 05:25:39 | <aarchi> | The IP is 137.99.146.52, not a bitly one |
| 06:31:59 | <aarchi> | The tracker looks like it's down |
| 06:32:01 | <aarchi> | Error communicating with tracker: 507 Server Error: The tracker needs an operator for manual maintenance. Try again later. for url: https://tracker.archiveteam.org:1338/api/get. |
| 08:25:12 | <flashfire42> | aarchi sorry only saw this now I will kick it |
| 08:27:01 | <flashfire42> | Ok I have paused red-ht |
| 08:27:14 | <flashfire42> | I have no idea what the error is but it seems to be all related to that shortener for the most part |
| 08:28:00 | <flashfire42> | Traceback (most recent call last): |
| 08:28:01 | <flashfire42> | File "scraper.py", line 44, in main |
| 08:28:01 | <flashfire42> | result = scraper_client.run() |
| 08:28:01 | <flashfire42> | File "/home/archiveteam/terroroftinytown-client-grab/terroroftinytown/terroroftinytown/client/scraper.py", line 57, in run |
| 08:28:01 | <flashfire42> | .format(repr(item), shortcode) |
| 08:28:01 | <flashfire42> | ScraperError: Number of attempts exceeded for |
| 08:28:06 | <flashfire42> | That's the error |
| 08:29:06 | <flashfire42> | https://s.uconn.edu/ got tripped up by a few things |
| 08:31:07 | <flashfire42> | I am pausing the autoqueue for the cokeurls as it's getting 500s |
| 08:31:11 | <flashfire42> | I don't know what else to do there |
| 08:31:30 | <aarchi> | I'm getting a different stacktrace. I'll paste it here, in case it's helpful: |
| 08:31:33 | <aarchi> | Traceback (most recent call last): |
| 08:31:33 | <aarchi> | File "/grab/terroroftinytown/terroroftinytown/client/tracker.py", line 28, in wrapper |
| 08:31:33 | <aarchi> | return func(*args, **kwargs) |
| 08:31:33 | <aarchi> | File "/grab/terroroftinytown/terroroftinytown/client/tracker.py", line 67, in get_item |
| 08:31:34 | <aarchi> | response.raise_for_status() |
| 08:31:34 | <aarchi> | File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status |
| 08:31:35 | <aarchi> | raise HTTPError(http_error_msg, response=self) |
| 08:31:35 | <aarchi> | requests.exceptions.HTTPError: 507 Server Error: The tracker needs an operator for manual maintenance. Try again later. for url: https://tracker.archiveteam.org:1338/api/get |
| 08:31:36 | <aarchi> | The above exception was the direct cause of the following exception: |
| 08:31:37 | <aarchi> | Traceback (most recent call last): |
| 08:31:37 | <aarchi> | File "/grab/scraper.py", line 78, in <module> |
| 08:31:38 | <aarchi> | main() |
| 08:31:38 | <aarchi> | File "/grab/scraper.py", line 34, in main |
| 08:31:39 | <aarchi> | item_info = try_with_tracker(tracker_client.get_item) |
| 08:31:53 | <flashfire42> | Yeah I am assuming you don't have access to the tracker admin |
| 08:32:01 | <flashfire42> | I do |
| 08:32:06 | <flashfire42> | I am just not opped here |
| 08:32:17 | <flashfire42> | Because I am lazy and do the bare minimum lol |
| 08:32:23 | <aarchi> | Yeah, I'm not a tracker admin |
| 08:32:30 | | Jasdemi quits [Remote host closed the connection] |
| 08:33:22 | <aarchi> | I don't know who does the approvals, but I'd be willing to help admin the tracker |
| 08:34:03 | <flashfire42> | There is something wrong with the coke url stuff specifically: it keeps trying to access https://cokeurl.com/4whlz and it fails because 500 isn't listed as an expected status code |
| 08:34:35 | <flashfire42> | I am not sure what's wrong with the redhat one. Usually stuff just feeds back into the pool and is tried again if it fails from too many attempts |
| 08:35:02 | <flashfire42> | cokeurl also broke on https://cokeurl.com/8p2bw |
| 08:35:07 | <flashfire42> | I have no idea why those specific urls |
| 08:36:43 | <aarchi> | Oh yeah that's weird |
| 09:42:02 | <aarchi> | Well that was disappointing. I scanned the entirety of the Project Sonar reverse DNS lookup table and didn't get any aliases. Just this: |
| 09:42:07 | <aarchi> | https://www.irccloud.com/pastebin/KWHWS9T0/ |
| 11:02:47 | | HackMii quits [Remote host closed the connection] |
| 11:03:18 | | HackMii (hacktheplanet) joins |
| 14:20:05 | | Daloader_ joins |
| 14:53:24 | <@JAA> | aarchi: You want the FDNS dataset, not rDNS. |
| 14:56:22 | <@JAA> | Yeah, most errors are from red-ht, which was already paused, but something else then pushed it over the threshold. |
| 15:24:44 | <@JAA> | Well, also lots of errors from bitly_6, probably because some clients got banned due to red-ht. |
| 15:25:10 | <@JAA> | Anyway, we're back in business. |
| 15:47:51 | <@hook54321> | thanks |
| 18:10:00 | | Atom quits [Ping timeout: 250 seconds] |
| 18:28:12 | | Daloader_ quits [Ping timeout: 250 seconds] |
| 20:08:22 | <aarchi> | How long do bans typically last? Permanent? |
| 20:16:51 | <aarchi> | Does anyone have a copy of these qr-cx dumps? |
| 20:16:51 | <aarchi> | http://qr.cx/dataset/qr.cx_dataset_9da2a85d-c842-4e7b-8350-7c53f9576f34.7z |
| 20:16:51 | <aarchi> | http://qr.cx/dataset/qr.cx_dataset_401fe2f2-f280-48f4-aedf-3e57a8d6fa7f.7z |
| 20:16:51 | <aarchi> | Only this one is on IA: |
| 20:16:51 | <aarchi> | http://qr.cx/dataset/qrcx_all_06eec9b9-1f29-4860-bd91-49c2d517d87d.7z |
| 21:23:28 | | Zerote joins |
| 21:31:35 | <aarchi> | I just grabbed the Project Sonar FDNS data and unfortunately, it's a mapping only to the immediate forward pointer, not from hostname to IP. Most are multiple redirects until the IP, so I can't just parse it in one pass and will need to throw it into a db. I might come back to this later, but there will be so many that I don't know what I'd do with it. |
| 21:32:49 | <aarchi> | I guess I could periodically archive the list of bit.ly aliases found via FDNS, so that if one dies or is repurposed, we can still recognize the old links as bit.ly |
| 21:50:28 | <aarchi> | Yeah, as cool as it would be to engineer a system that regularly retrieves and stores the FDNS records in a graph database, if I go down that route, knowing the archivist in me, I'd end up keeping the FDNS records even after processing, so that would eat up 160GB/yr if monthly or 700GB/yr if weekly in just gzip-compressed raw data. The graph database would be significantly larger. |
| 21:51:13 | <aarchi> | And of course, once I'd get that working, I'd branch off into other tangential projects, which are less related to my main project. |
| 22:26:37 | <@JAA> | aarchi: Uh, I don't understand. Bitly aliases are direct A records to their IPs...? |
| 22:28:44 | <@JAA> | So from fdns_a.json.gz, you can just take lines with a value in the relevant IP range. Or if you're lazy like me: grep -F '"value":"67.199.248.1' |
| 22:40:26 | <aarchi> | JAA: I saw very few that were direct, so I assumed that was also true for bit.ly. I'll run the query now then |
| 22:41:21 | <@JAA> | May be true for other shorteners, but I've never seen a Bitly alias that didn't directly resolve to those IPs. |
| 22:41:51 | <@JAA> | The rDNS is cname.bitly.com, but it's not actually a CNAME because fuck logic. |
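
Editor's note on the grep one-liner at 22:28:44: the fixed-string prefix `"value":"67.199.248.1` also matches addresses such as 67.199.248.1 or 67.199.248.130, so an exact check on the parsed value may be worth adding. The following is only a sketch. It assumes each line of `fdns_a.json.gz` is a JSON object with `name` and `value` keys, and that the Bitly alias addresses are the 67.199.248.10 to 67.199.248.13 range asked about at 00:36:26. The function names are hypothetical and not part of any tool mentioned in the log.

```python
import gzip
import ipaddress
import json

# Assumption: the Bitly alias IPs are 67.199.248.10-13, as asked about in the log.
BITLY_IPS = {ipaddress.ip_address(f"67.199.248.{n}") for n in range(10, 14)}


def bitly_alias_names(lines):
    """Yield hostnames whose A record value is exactly one of BITLY_IPS.

    Each line is assumed to be a JSON object with "name" and "value" keys.
    """
    for line in lines:
        # Cheap substring pre-filter, the same idea as the grep -F one-liner.
        if "67.199.248.1" not in line:
            continue
        try:
            record = json.loads(line)
            address = ipaddress.ip_address(record["value"])
        except (ValueError, KeyError, TypeError):
            # Skip malformed lines and records without a usable IP value.
            continue
        if address in BITLY_IPS:
            yield record["name"]


def scan_file(path):
    """Stream a gzip-compressed, line-delimited JSON file through the filter."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from bitly_alias_names(handle)
```

Calling `scan_file("fdns_a.json.gz")` would stream the file line by line, so nothing needs to be loaded into a database for this direct A record case.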
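
Editor's note on the shortcode conversion at 00:50:47 to 00:53:14: the tracker's calculator does this conversion, and a plain base-62 conversion reproduces the pair quoted in the log. The sketch below assumes an alphabet of digits, then lowercase, then uppercase letters. That ordering is inferred only from the one quoted pair (916708131 and 102pF1), not from the tracker's source, and the function names are hypothetical.

```python
import string

# Assumption: digits, then lowercase, then uppercase. Inferred from the single
# pair quoted in the log, not taken from the tracker's code.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def int_to_shortcode(number, alphabet=ALPHABET):
    """Convert a non-negative integer to a shortcode in the given alphabet."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    # Digits were produced least significant first.
    return "".join(reversed(digits))


def shortcode_to_int(shortcode, alphabet=ALPHABET):
    """Convert a shortcode back to its integer position."""
    base = len(alphabet)
    number = 0
    for character in shortcode:
        number = number * base + alphabet.index(character)
    return number


print(int_to_shortcode(916708131))  # prints 102pF1
```

A shortener with a permuted alphabet, as suspected at 00:56:29, could be handled by passing a different `alphabet` string.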