00:00:09<@JAA>j.mp is a special case I believe. It acts like an alias but is operated by Bitly itself, not by a customer.
00:15:34<flashfire42>there are edge cases where Bitly-owned stuff doesn't resolve to bit.ly; the Purina URL shortener, for example, has some of its own custom URLs. But for the most part it's not worth it
00:36:26<aarchi>Do you know of a reverse IP lookup tool that could find domains that point to 67.199.248.[10-13]?
00:40:32<aarchi>Sweet! I found two bitly aliases with https://hackertarget.com/reverse-ip-lookup/
00:40:48<aarchi>hsgk.in
00:40:48<aarchi>myepg.online
00:41:21<aarchi>That's obviously a very incomplete list, so I'm looking for a better tool
00:43:51<@JAA>Project Sonar may be useful here.
00:46:04<@hook54321>a ridiculous number of the cokeurl-com results are coke rewards URLs
00:46:46<@hook54321>still almost no results for a-ll-st
00:46:55<@JAA>I was going to suggest crt.sh, but looks like they use Let's Encrypt with no indication in the cert that it's Bitly.
00:48:08<aarchi>What range has been scanned for a-ll-st? I could compare that to the shortcodes that I queried from IA
00:50:47<@hook54321>aarchi: it's at 916708131 right now
00:51:14<aarchi>I'll check the code for how to convert that to a shortcode
00:51:31<aarchi>Unless there's an easy way
00:52:39<@hook54321>aarchi: 102pF1
00:53:04<aarchi>How'd you do it so fast? A base conversion utility?
00:53:14<@JAA>https://tracker.archiveteam.org:1338/calculator
00:53:26<@hook54321>ye
00:53:29<@hook54321>that
00:54:19<aarchi>There's only these 7 shortcodes on IA before 102pF1: 0EtKQc 0F6eQJ 0XwTh5 0gzJD4 0k6igy 0nsico 0tH7gU
00:55:05<aarchi>https://www.irccloud.com/pastebin/NgfMXydb/a-ll-st.ia_shortcodes.txt
00:56:29<aarchi>Definitely not sequential for a [0-9A-Za-z] alphabet. The alphabet is probably permuted or generation isn't sequential
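The conversion between a tracker counter and a shortcode discussed above is a plain base conversion. A minimal sketch follows, assuming the alphabet is digits, then lowercase, then uppercase. That ordering is not stated in the log; it is chosen because it reproduces the 916708131 and 102pF1 pair quoted there. The function names are illustrative and are not taken from the tracker's code.

```python
import string

# Assumed alphabet: digits, then lowercase, then uppercase. This ordering is a
# guess that reproduces the 916708131 <-> 102pF1 pair quoted in the log.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(number: int, alphabet: str = ALPHABET) -> str:
    """Convert a non-negative integer to a shortcode in the given alphabet."""
    if number < 0:
        raise ValueError("number must be non-negative")
    base = len(alphabet)
    digits = []
    while True:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
        if number == 0:
            break
    # Digits were collected least significant first, so reverse them.
    return "".join(reversed(digits))


def decode(shortcode: str, alphabet: str = ALPHABET) -> int:
    """Convert a shortcode back to its integer position."""
    base = len(alphabet)
    value = 0
    for char in shortcode:
        value = value * base + alphabet.index(char)
    return value


print(encode(916708131))  # 102pF1
print(decode("102pF1"))   # 916708131
```

With a different alphabet ordering, such as [0-9A-Za-z], the same counter maps to a different shortcode, so the ordering has to be checked against a known pair before comparing ranges.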
00:58:34<aarchi>I generated that list with https://github.com/andrewarchi/urlhero/blob/main/shorteners/a-ll-st/allst.go . I'm working on generalizing it for more shorteners
01:07:18<flashfire42>aarchi if you want more shorteners check any of the unsorted stuff here https://wiki.archiveteam.org/index.php/URLTeam/unsorted
01:07:38<flashfire42>https://wiki.archiveteam.org/index.php/URLTeam/unsorted#Flashfire.27s_dump I alone have dumped a bunch there because I was lazy
01:58:08<@hook54321>According to curl, the location header for invalid shortcodes on s.uconn.edu is empty. Rejecting $ should work, right? (It returns the same status code for valid and invalid shortcodes.)
01:59:03<@JAA>$ will match every value.
01:59:12<@hook54321>ah
01:59:16<@JAA>^$ I guess, but not sure.
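The difference between the two patterns can be checked directly. This is a small sketch using Python's `re` module; it assumes the pattern is applied with search semantics (a match anywhere in the value), which the log does not confirm for the tracker. The example URL is a placeholder.

```python
import re


def search_matches(pattern: str, value: str) -> bool:
    """Return True if the pattern matches anywhere in the value."""
    return re.search(pattern, value) is not None


empty_location = ""
real_location = "https://example.com/target"  # placeholder value

# "$" matches the end of any string, so it matches every value.
print(search_matches("$", empty_location))   # True
print(search_matches("$", real_location))    # True

# "^$" requires start and end to coincide, so only the empty value matches.
print(search_matches("^$", empty_location))  # True
print(search_matches("^$", real_location))   # False
```

If the pattern were instead anchored at the start of the value (as with `re.match`), a bare `$` would already match only the empty string, which is why the matching mode matters here.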
02:05:36<@hook54321>Note on s.uconn.edu: shortcodes aren't case-sensitive.
02:50:51<@hook54321>s-uconn-edu running
03:40:01qw3rty_ joins
03:43:41qw3rty__ quits [Ping timeout: 258 seconds]
04:55:44<@hook54321>JAA: it's angry
05:07:09<@hook54321>seems to be bitly issues
05:11:14<@hook54321>some items seem to be having issues going out too
05:22:35<@hook54321>ah i see it stops when there's too many errors
05:25:23<aarchi>hook54321: s.uconn.edu doesn't look like a bit.ly alias
05:25:39<aarchi>The IP is 137.99.146.52, not a bitly one
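The check described above (does a hostname resolve into the Bitly range?) can be sketched as follows. The range 67.199.248.10 to 67.199.248.13 is the one quoted earlier in the log; whether it covers all Bitly addresses is an assumption, and the function names are illustrative.

```python
import ipaddress
import socket

# Range quoted earlier in the log as 67.199.248.[10-13]; assumed complete here.
BITLY_FIRST = ipaddress.IPv4Address("67.199.248.10")
BITLY_LAST = ipaddress.IPv4Address("67.199.248.13")


def is_bitly_ip(ip: str) -> bool:
    """Return True if the IPv4 address lies inside the assumed Bitly range."""
    return BITLY_FIRST <= ipaddress.IPv4Address(ip) <= BITLY_LAST


def looks_like_bitly_alias(hostname: str) -> bool:
    """Resolve the hostname (needs network access) and test its IPv4 addresses."""
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return any(is_bitly_ip(address) for address in addresses)


print(is_bitly_ip("137.99.146.52"))  # False
print(is_bitly_ip("67.199.248.10"))  # True
```

`looks_like_bitly_alias` is left uncalled because its result depends on live DNS.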
06:31:59<aarchi>The tracker looks like it's down
06:32:01<aarchi>Error communicating with tracker: 507 Server Error: The tracker needs an operator for manual maintenance. Try again later. for url: https://tracker.archiveteam.org:1338/api/get.
08:25:12<flashfire42>aarchi sorry only saw this now I will kick it
08:27:01<flashfire42>Ok I have paused red-ht
08:27:14<flashfire42>I have no idea what the error is but it seems to be all related to that shortener for the most part
08:28:00<flashfire42>Traceback (most recent call last):
08:28:01<flashfire42> File "scraper.py", line 44, in main
08:28:01<flashfire42> result = scraper_client.run()
08:28:01<flashfire42> File "/home/archiveteam/terroroftinytown-client-grab/terroroftinytown/terroroftinytown/client/scraper.py", line 57, in run
08:28:01<flashfire42> .format(repr(item), shortcode)
08:28:01<flashfire42>ScraperError: Number of attempts exceeded for
08:28:06<flashfire42>thats the error
08:29:06<flashfire42>https://s.uconn.edu/ got tripped up by a few things
08:31:07<flashfire42>I am pausing the autoqueue for the cokeurls as it's getting 500s
08:31:11<flashfire42>I don't know what else to do there
08:31:30<aarchi>I'm getting a different stacktrace. I'll paste it here, in case it's helpful:
08:31:33<aarchi>Traceback (most recent call last):
08:31:33<aarchi>File "/grab/terroroftinytown/terroroftinytown/client/tracker.py", line 28, in wrapper
08:31:33<aarchi>return func(*args, **kwargs)
08:31:33<aarchi>File "/grab/terroroftinytown/terroroftinytown/client/tracker.py", line 67, in get_item
08:31:34<aarchi>response.raise_for_status()
08:31:34<aarchi>File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 943, in raise_for_status
08:31:35<aarchi>raise HTTPError(http_error_msg, response=self)
08:31:35<aarchi>requests.exceptions.HTTPError: 507 Server Error: The tracker needs an operator for manual maintenance. Try again later. for url: https://tracker.archiveteam.org:1338/api/get
08:31:36<aarchi>The above exception was the direct cause of the following exception:
08:31:37<aarchi>Traceback (most recent call last):
08:31:37<aarchi>File "/grab/scraper.py", line 78, in <module>
08:31:38<aarchi>main()
08:31:38<aarchi>File "/grab/scraper.py", line 34, in main
08:31:39<aarchi>item_info = try_with_tracker(tracker_client.get_item)
08:31:53<flashfire42>Yeah I am assuming you don't have access to the tracker admin
08:32:01<flashfire42>I do
08:32:06<flashfire42>I am just not opped here
08:32:17<flashfire42>Because I am lazy and do the bare minimum lol
08:32:23<aarchi>Yeah, I'm not a tracker admin
08:32:30Jasdemi quits [Remote host closed the connection]
08:33:22<aarchi>I don't know who does the approvals, but I'd be willing to help admin the tracker
08:34:03<flashfire42>There is something wrong with the cokeurl stuff specifically: it keeps trying to access https://cokeurl.com/4whlz and fails because 500 isn't listed as an expected status code
08:34:35<flashfire42>I am not sure what's wrong with the redhat one. Usually stuff just feeds back into the pool and is tried again if it fails from too many attempts
08:35:02<flashfire42>cokeurl also broke on https://cokeurl.com/8p2bw
08:35:07<flashfire42>I have no idea why those specific urls
08:36:43<aarchi>Oh yeah that's weird
09:42:02<aarchi>Well that was disappointing. I scanned the entirety of the Project Sonar reverse DNS lookup table and didn't get any aliases. Just this:
09:42:07<aarchi>https://www.irccloud.com/pastebin/KWHWS9T0/
11:02:47HackMii quits [Remote host closed the connection]
11:03:18HackMii (hacktheplanet) joins
14:20:05Daloader_ joins
14:53:24<@JAA>aarchi: You want the FDNS dataset, not rDNS.
14:56:22<@JAA>Yeah, most errors are from red-ht, which was already paused, but something else then pushed it over the threshold.
15:24:44<@JAA>Well, also lots of errors from bitly_6, probably because some clients got banned due to red-ht.
15:25:10<@JAA>Anyway, we're back in business.
15:47:51<@hook54321>thanks
18:10:00Atom quits [Ping timeout: 250 seconds]
18:28:12Daloader_ quits [Ping timeout: 250 seconds]
20:08:22<aarchi>How long do bans typically last? Permanent?
20:16:51<aarchi>Does anyone have a copy of these qr-cx dumps?
20:16:51<aarchi>http://qr.cx/dataset/qr.cx_dataset_9da2a85d-c842-4e7b-8350-7c53f9576f34.7z
20:16:51<aarchi>http://qr.cx/dataset/qr.cx_dataset_401fe2f2-f280-48f4-aedf-3e57a8d6fa7f.7z
20:16:51<aarchi>Only this one is on IA:
20:16:51<aarchi>http://qr.cx/dataset/qrcx_all_06eec9b9-1f29-4860-bd91-49c2d517d87d.7z
21:23:28Zerote joins
21:31:35<aarchi>I just grabbed the Project Sonar FDNS data and unfortunately, it's a mapping only to the immediate forward pointer, not from hostname to IP. Most are multiple redirects until the IP, so I can't just parse it in one pass and will need to throw it into a db. I might come back to this later, but there will be so many that I don't know what I'd do with it.
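Following a chain of forward pointers until an IP address is reached can be sketched like this. It assumes the records have already been loaded into a dictionary mapping each name to a single target (a simplification, since a real name can have several records); the hostnames and the hop limit are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def resolve(name: str, pointers: Dict[str, str], max_hops: int = 16) -> Optional[str]:
    """Follow name -> target pointers until an IP is reached, or give up."""
    seen = set()
    current = name
    # Stop on unknown names, loops, or overly long chains.
    while current in pointers and current not in seen and len(seen) < max_hops:
        seen.add(current)
        current = pointers[current]
    try:
        ipaddress.ip_address(current)
    except ValueError:
        return None  # chain ended on something that is not an IP
    return current


# Hypothetical records: an alias pointing at a host that has an address.
pointers = {
    "short.example": "lb.example",
    "lb.example": "192.0.2.1",
}
print(resolve("short.example", pointers))  # 192.0.2.1
```

An in-memory dictionary only works for a subset of the data; for the full dataset the same lookup would have to run against a database, as the message above suggests.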
21:32:49<aarchi>I guess I could periodically archive the list of bit.ly aliases found via FDNS, so that if one dies or is repurposed, we can still recognize the old links as bit.ly
21:50:28<aarchi>Yeah, as cool as it would be to engineer a system that regularly retrieves and stores the FDNS records in a graph database, if I go down that route, knowing the archivist in me, I'd end up keeping the FDNS records even after processing. That would eat up 160GB/yr if monthly or 700GB/yr if weekly in just gzip-compressed raw data. The graph database would be significantly larger.
21:51:13<aarchi>And of course, once I'd get that working, I'd branch off into other tangential projects, which are less related to my main project.
22:26:37<@JAA>aarchi: Uh, I don't understand. Bitly aliases are direct A records to their IPs...?
22:28:44<@JAA>So from fdns_a.json.gz, you can just take lines with a value in the relevant IP range. Or if you're lazy like me: grep -F '"value":"67.199.248.1'
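A stricter version of the grep above can be sketched in Python. It assumes each line of fdns_a.json.gz is a JSON object with a "value" field (as in the grep) and a "name" field for the hostname; the "name" key is an assumption and is not confirmed in the log. The range is the 67.199.248.[10-13] one quoted earlier.

```python
import gzip
import ipaddress
import json
from typing import Iterable, Iterator

BITLY_FIRST = ipaddress.IPv4Address("67.199.248.10")
BITLY_LAST = ipaddress.IPv4Address("67.199.248.13")


def bitly_aliases(lines: Iterable[str]) -> Iterator[str]:
    """Yield hostnames whose record value falls inside the assumed Bitly range."""
    for line in lines:
        # Cheap substring prefilter, in the spirit of the grep -F in the log.
        if "67.199.248.1" not in line:
            continue
        record = json.loads(line)
        try:
            address = ipaddress.IPv4Address(record.get("value", ""))
        except ValueError:
            continue  # value is not an IPv4 address
        if BITLY_FIRST <= address <= BITLY_LAST:
            yield record["name"]  # "name" key is an assumption


def read_fdns(path: str) -> Iterator[str]:
    """Stream a gzip-compressed file line by line without unpacking it."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from bitly_aliases(handle)
```

Unlike the bare grep prefix, the range check rejects addresses such as 67.199.248.100 that merely share the prefix. A call like `for host in read_fdns("fdns_a.json.gz"): print(host)` would list the candidates.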
22:40:26<aarchi>JAA: I saw very few that were direct, so I assumed that was also true for bit.ly. I'll run the query now then
22:41:21<@JAA>May be true for other shorteners, but I've never seen a Bitly alias that didn't directly resolve to those IPs.
22:41:51<@JAA>The rDNS is cname.bitly.com, but it's not actually a CNAME because fuck logic.