00:02:47Unholy236131 (Unholy2361) joins
00:05:59Unholy23613 quits [Ping timeout: 252 seconds]
00:05:59Unholy236131 is now known as Unholy23613
00:11:49BlueMaxima joins
00:30:10decky_e_ quits [Read error: Connection reset by peer]
00:32:15lk quits [Ping timeout: 265 seconds]
00:34:33lk (lk) joins
00:36:30ats (ats) joins
00:41:02DV180 joins
00:42:17lk quits [Ping timeout: 252 seconds]
00:42:36lk (lk) joins
00:47:29Mateon2 joins
00:48:53Mateon1 quits [Ping timeout: 252 seconds]
00:48:53Mateon2 is now known as Mateon1
00:49:42DV180 quits [Remote host closed the connection]
01:07:02Hackerpcs quits [Quit: Hackerpcs]
01:07:13<fireonlive>JAA: you looking at forward dns right?
01:08:59Hackerpcs (Hackerpcs) joins
01:12:17<@JAA>fireonlive: No, because Ryz isn't either.
01:12:36<@JAA>(Based on the AB jobs, at least.)
01:12:50<@JAA>But good point. :-)
01:12:52<Ryz>oo;
01:14:39<fireonlive>:3
01:15:11<Ryz>Forward DNS? Can you two clarify?
01:15:18<fireonlive>the lack of linking directly to that page doesn't help
01:15:48<fireonlive>hm, reverse DNS is like what you see when people connect to IRC
01:15:52<fireonlive>like the hostname of an IP
01:15:59<fireonlive>forward DNS is like www.google.com to an IP
01:16:34<fireonlive>some services like bgp.tools report all the things they come across among their travels of what hostnames resolve to what IPs and provide 'forward dns' lookups
01:16:41<fireonlive>or 'what's hosted on this IP'
01:17:01<Ryz>Yeah, unfortunately I was a bit miffed on that and had to make those links access for me; then again, I was given what flashfire42 provided
01:17:13<flashfire42>Sorry
01:17:43<fireonlive>google.com is 142.250.179.206 for my server but 142.250.179.206's 'hostname' (or reverse DNS) is ams15s42-in-f14.1e100.net
01:17:57<fireonlive>so if you just looked up 142.250.179.206 you'd only see the latter but not necessarily the former
01:18:13za3k joins
01:18:14<fireonlive>(unless you used a special service)
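A minimal Python sketch of the forward/reverse distinction described above, using only the standard library. The function names are illustrative and not from the log; the two lookup functions need network access, and their results depend on the resolver used.

```python
import ipaddress
import socket
from typing import List, Optional


def reverse_pointer_name(ip: str) -> str:
    """Return the DNS name whose PTR record holds the rDNS hostname of `ip`."""
    return ipaddress.ip_address(ip).reverse_pointer


def forward_lookup(hostname: str) -> List[str]:
    """Forward DNS: hostname -> IPv4 addresses (needs network access)."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def reverse_lookup(ip: str) -> Optional[str]:
    """Reverse DNS: IP -> hostname from its PTR record, or None if there is none."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return hostname


# The PTR record for the example IP above lives under in-addr.arpa:
print(reverse_pointer_name("142.250.179.206"))
```

Nothing ties the two directions together: a reverse lookup only returns whatever the owner of the IP block put in the PTR record, so it cannot list the domains that point at that IP.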
01:21:04<@JAA>The rDNS section is also easy to check: you just do a bunch of DNS queries for each IP.
01:21:25<@JAA>Forward DNS requires knowledge of the domains that resolve to an IP in the block, which is exactly what we're looking for.
01:21:49<@JAA>The rDNS section data is old, might be worth rerunning that.
01:21:54<fireonlive>indeed :)
01:23:05<@JAA>I see a bunch of domains there that no longer resolve to FutureQuest IPs.
01:23:11<fireonlive>there's various sources to do fDNS but it's much harder to populate
01:23:19<fireonlive>ye, also rDNS has no verification
01:23:34<fireonlive>i could set my rDNS to free-bitcoin.google.com and that would be 100% ok
01:23:38<@JAA>I suspect it's just outdated data in this case, but yeah.
01:23:52<@JAA>I used to set my rDNS to a .invalid domain on a provider that let me. :-)
01:23:52<fireonlive>that too :)
01:23:55<fireonlive>ah :D
01:24:10<fireonlive>or maybe they migrated but didn't bother updating rDNS on the way out the door
01:24:19<fireonlive>and the going out of business host didn't bother either
01:24:29<@JAA>Yeah, that's what I'm thinking.
01:24:45<@JAA>Well, or it did get updated, but bgp.tools never refetched it.
01:24:50<fireonlive>mm
01:24:51<@JAA>> Data Age between: 2022-10-15T11:32:55Z UTC and 2020-06-16T10:52:28Z UTC
01:24:56<fireonlive>ah there you go
01:25:17<@JAA>So who wants to do a few thousand rDNS queries? :-)
01:31:33<fireonlive>ok uwu
01:31:55<fireonlive>8192 * 2 potential dns queries since i'll also check forward dns lol
01:32:04<@JAA>Thanks
01:32:17<fireonlive>:)
01:32:20lk quits [Ping timeout: 252 seconds]
01:32:34lk (lk) joins
01:32:34<fireonlive>i didn't add a sleep or anything but it's like DNS so should be ok lol
01:34:29<fireonlive>well a rough check
01:34:39<fireonlive>fdns = 69.5.* lol
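A rough sketch of what such a scan could look like in Python; this is not fireonlive's actual script. It assumes the block is 69.5.0.0/19 (the prefix in the bgp.tools link posted later in this log), which holds 8192 addresses, and that "checking forward DNS" means resolving each PTR hostname and seeing whether it lands back in the block. The function names and the optional `delay` parameter are illustrative.

```python
import ipaddress
import socket
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Assumed prefix, taken from the bgp.tools URL in this log.
PREFIX = ipaddress.ip_network("69.5.0.0/19")


def ptr_lookup(ip: str) -> Optional[str]:
    """Reverse DNS query; None if the IP has no PTR record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def a_lookup(hostname: str) -> List[str]:
    """Forward DNS query; empty list if the name does not resolve."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def scan(
    ips: Iterable[str],
    reverse: Callable[[str], Optional[str]] = ptr_lookup,
    forward: Callable[[str], List[str]] = a_lookup,
    prefix=PREFIX,
    delay: float = 0.0,
) -> Dict[str, Tuple[str, bool]]:
    """Map each IP with a PTR record to (hostname, hostname resolves into prefix)."""
    results = {}
    for ip in ips:
        name = reverse(ip)
        if name is not None:
            in_range = any(ipaddress.ip_address(a) in prefix for a in forward(name))
            results[ip] = (name, in_range)
        if delay:
            time.sleep(delay)  # optional pacing between queries
    return results


print(PREFIX.num_addresses)  # one rDNS query per address, plus forward checks
```

A full run would be `scan(str(ip) for ip in PREFIX)`; the resolvers are parameters so the loop can be tried with stub functions before sending real queries.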
01:37:51decky_e joins
01:38:49tzt quits [Ping timeout: 258 seconds]
01:40:13dumbgoy quits [Read error: Connection reset by peer]
01:43:00tzt (tzt) joins
01:47:44lk quits [Ping timeout: 252 seconds]
01:49:24lk (lk) joins
02:00:45<fireonlive>979 matches so far
02:01:01<fireonlive>(on x.x.22.227 atm)
02:13:36<fireonlive>leela.futurequest.net genesis.futurequest.net. neon.futurequest.net. evangelion.futurequest.net. eva.futurequest.net.
02:13:39<fireonlive>nerds :P
02:14:53<fireonlive>here's what I got cc JAA: https://transfer.archivete.am/inline/Acexz/rdns-matches.txt
02:16:52icedice quits [Client Quit]
02:22:05<@JAA>Thanks!
02:24:28<@JAA>1103 domains in there after excluding *.futurequest.net, which seem to be noise.
02:26:39<fireonlive>:)
02:26:44<fireonlive>JAA: there seems to be a lot from securecnc.net as well
02:26:55<fireonlive>mqs0042.securecnc.net mqs0043.securecnc.net etc
02:27:22<@JAA>Hmm yeah
02:27:29<fireonlive>seem to share some names with futurequest as well
02:27:39<fireonlive>leela exists in both, genesis does too
02:27:50<@JAA>1009 after kicking those out.
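The filtering step above (dropping the hosting provider's own hostnames) could be sketched as follows. The two suffixes come from the conversation; the function names and the choice to lowercase and deduplicate are assumptions, and `example.org` is a placeholder.

```python
from typing import Iterable, List, Tuple

# Infrastructure domains treated as noise in the discussion above.
NOISE_SUFFIXES: Tuple[str, ...] = ("futurequest.net", "securecnc.net")


def normalise(hostname: str) -> str:
    """Lowercase and drop the trailing root dot that PTR answers often carry."""
    return hostname.strip().rstrip(".").lower()


def is_noise(hostname: str, suffixes: Tuple[str, ...] = NOISE_SUFFIXES) -> bool:
    """True if the hostname is one of the noise domains or a subdomain of one."""
    name = normalise(hostname)
    return any(name == s or name.endswith("." + s) for s in suffixes)


def keep_customer_domains(hostnames: Iterable[str]) -> List[str]:
    """Return the sorted, deduplicated hostnames that are not noise."""
    return sorted({normalise(h) for h in hostnames if not is_noise(h)})


sample = ["leela.futurequest.net", "neon.futurequest.net.", "mqs0042.securecnc.net", "example.org."]
print(keep_customer_domains(sample))
```

Matching on `"." + suffix` rather than a bare `endswith` avoids discarding unrelated domains that merely end in the same letters.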
02:28:06<fireonlive>=]
02:31:53mr_sarge quits [Read error: Connection reset by peer]
02:34:25<Ryz>Yeah, I'm actually maybe getting a bit sick of running constant jobs and checking the content myself, oof; I'm pondering queueh2ibot, but the problem is knowing which ones are the real links, since I had to manually check whether they were actually dead or HTTP-only :/
02:35:01mr_sarge (sarge) joins
02:35:33lk quits [Ping timeout: 258 seconds]
02:35:52lk (lk) joins
02:35:55<Ryz>I think that's part of the reason I wasn't too sure on using that bot, JAA
02:36:26za3k quits [Client Quit]
02:36:32<@JAA>HTTP vs HTTPS check can easily be automated.
02:36:55<Ryz>WWW and non-WWW too?
02:36:56<fireonlive>curl -m or something i suppose
02:37:18<Ryz>And I'm assuming a combination of the 4?
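The "combination of the 4" mentioned above is the cross product of two schemes and two host forms. A small sketch of generating those candidates; the function name is made up and `example.org` is a placeholder domain.

```python
from typing import List


def candidate_urls(domain: str) -> List[str]:
    """Return the four scheme/www variants to probe for one domain."""
    bare = domain.strip().rstrip(".").lower()
    if bare.startswith("www."):
        bare = bare[len("www."):]
    return [
        f"{scheme}://{host}/"
        for scheme in ("https", "http")
        for host in (bare, "www." + bare)
    ]


for url in candidate_urls("www.example.org."):
    print(url)
```

Each candidate can then be probed with a timeout (for instance `curl -m` as suggested above) to see which variants actually respond.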
02:39:42dumbgoy joins
02:44:40<Ryz>JAA, unfortunately the only other reason I went for the non-automated route at the time is spotting the jobs that failed because they don't work on one pipeline but could on another, since I don't think the bot can detect something like that at all
02:44:58<Ryz>The occasional 'Connection closed' and 'Connection refused' entries on some of the jobs S:
02:45:10<@JAA>Correct, someone would need to watch the dashboard and requeue those manually.
02:45:44beario_ joins
02:46:54<@JAA>It's obviously still less work if the queueing happens automatically.
02:47:08<@JAA>Also, ain't nobody got time for manually queueing a thousand websites.
02:47:26beario quits [Ping timeout: 258 seconds]
02:47:46<Ryz>Mhm, even me eventually, since I would wanna spend more leisure time, or at least make jobs finish faster myself since that part isn't easily automated~
02:47:59<Ryz>I'll give you the list of what I have left
02:48:14dumbgoy quits [Ping timeout: 252 seconds]
02:48:47<nicolas17>at least you need to batch them... send a hundred and *then* check how they go, instead of pasting them into IRC one by one and switching windows to the pipeline status for each and every one etc
02:48:57<Ryz>Each automated job should have concurrency 2, for leeway reasons, and ignoreset badvideos, because annoyingly, in some of the jobs I thought would be safe from New York Times videos...they reared their ugly head :/
02:52:02lk quits [Ping timeout: 258 seconds]
02:52:22<Ryz>Here is the remaining stuff (except for the really big jobs, which I put in a separate, small list): https://transfer.archivete.am/zr086/remaining-list - again, this is based on https://bgp.tools/prefix/69.5.0.0/19#dns that flashfire42 provided; I did some cleanup, mainly the '.' stuff at the end of the URL, although there are still some of them
02:52:43<Ryz>I removed the futurequest.net domains because I did a sampling of a few and...they don't respond or exist S:
02:53:41lk (lk) joins
02:54:10killsushi joins
02:55:03<fireonlive>if you have vim you can type :%s/\.$//<enter> to remove the dots at the end
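For lists handled in a script rather than in vim, the same substitution can be sketched in Python. The pattern is the one from the vim command above; the function name and sample text are illustrative.

```python
import re


def strip_trailing_dots(text: str) -> str:
    """Remove a single '.' at the end of every line, like :%s/\\.$// in vim."""
    return re.sub(r"\.$", "", text, flags=re.MULTILINE)


sample = "neon.futurequest.net.\nexample.org\n"
print(strip_trailing_dots(sample), end="")
```

`re.MULTILINE` makes `$` match at the end of each line instead of only at the end of the whole string, which mirrors how the vim command applies to every line in the buffer.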
02:55:54<Ryz>Heh, I went around maybe 400-500 links before I burnt out, lol
02:56:13<fireonlive>ah 😅
03:00:24pseudorizer quits [Quit: ZNC 1.8.2 - https://znc.in]
03:01:07pseudorizer (pseudorizer) joins
03:05:05<@JAA>Ugh, their TLS is pretty broken.
03:05:23<@JAA>All kinds of weak key and signature errors.
03:06:09<@JAA>Or well, I guess it's the customers and their ancient setups, but same difference.
03:06:32<fireonlive>:(
03:07:20<@JAA>Insecure OPENSSL_CONF time...
03:19:45nulldata quits [Client Quit]
03:20:55nulldata (nulldata) joins
03:28:04lk quits [Ping timeout: 258 seconds]
03:28:23lk (lk) joins
03:32:07<fireonlive>if they don't use TLSv1.3 it's not worth archiving
03:32:11<fireonlive>:p
03:47:00@JAA offers fireonlive an SSLv2 server with an MD5 signature.
03:47:03<@JAA>Best I can do.
03:47:29<fireonlive>*dies*
03:47:47<@JAA>Scan finished, need to process it into something that can be queued, but too tired for that now.
03:48:13<fireonlive>would you like some coffee
03:48:29<fireonlive>re: oceangate; looks like every url except images/logo-offwhite-600.png returns the same HTML
03:48:45<fireonlive>wow it's literally named oceangate i just realized
03:48:51<fireonlive>anything ending in gate is just doomed
03:49:03<fireonlive>anyways was just checking if a sitemap or something existed still :D
03:49:48<fireonlive>oh, some other files too. (manifest, etc) i guess they just rewrote all 404s
03:50:02benjins quits [Read error: Connection reset by peer]
03:53:48Icyelut (Icyelut) joins
03:56:59lk quits [Ping timeout: 252 seconds]
03:57:44lk (lk) joins
03:57:53Icyelut|2 (Icyelut) joins
03:58:02benjins joins
04:02:01Icyelut quits [Ping timeout: 265 seconds]
04:02:30lk quits [Ping timeout: 265 seconds]
04:02:52lk (lk) joins
04:10:40etnguyen03 (etnguyen03) joins
04:10:48katocala quits [Remote host closed the connection]
04:11:23lk quits [Ping timeout: 258 seconds]
04:11:46lk (lk) joins
04:14:16nicolas17 quits [Client Quit]
04:15:33nulldata quits [Client Quit]
04:15:58nulldata (nulldata) joins
04:18:21cobertos quits [Remote host closed the connection]
04:19:04cobertos joins
04:26:26etnguyen03 quits [Client Quit]
04:33:21cobertos quits [Remote host closed the connection]
04:33:55lk quits [Ping timeout: 265 seconds]
04:33:59cobertos joins
04:34:42lk (lk) joins
04:44:04benjins2 quits [Read error: Connection reset by peer]
04:44:21cobertos quits [Remote host closed the connection]
04:44:39cobertos joins
04:44:48chessnoob280 joins
04:50:37Island quits [Read error: Connection reset by peer]
04:54:12yts98 joins
04:55:00katocala joins
04:55:21killsushi quits [Client Quit]
04:55:39sec^nd quits [Remote host closed the connection]
04:56:12sec^nd (second) joins
05:00:02HP_Archivist quits [Read error: Connection reset by peer]
05:33:12BlueMaxima quits [Read error: Connection reset by peer]
05:59:11cobertos_ joins
05:59:57cobertos quits [Ping timeout: 265 seconds]
06:10:17hitgrr8 joins
06:11:30Carnildo_again is now known as Carnildo
06:27:44Justin[home] quits [Remote host closed the connection]
06:47:00DopefishJustin joins
07:17:58yts98 leaves
07:24:49<pabs>hmm, does AB not upload meta WARCs any more? https://archive.fart.website/archivebot/viewer/job/202306040627542dvm3
07:27:08<pabs>or is the viewer not showing them up
07:28:15<pabs>ah, indeed the meta warc is on https://archive.fart.website/archivebot/viewer/item/archiveteam_archivebot_go_20230606040557_ca293687
07:31:07Arcorann (Arcorann) joins
07:39:28<h2ibot>PaulWise edited Bugzilla (+22, kde bugzilla): https://wiki.archiveteam.org/?diff=50178&oldid=50163
07:42:10yts98 joins
07:43:33BigBrain_ (bigbrain) joins
07:45:22yts98 leaves
07:47:06BigBrain quits [Ping timeout: 245 seconds]
08:43:32Naruyoko quits [Ping timeout: 252 seconds]
08:56:24W7RFa6AbNFz_ quits [Read error: Connection reset by peer]
08:56:37W7RFa6AbNFz_ joins
09:03:43mls (mls) joins
09:09:14hitgrr8 quits [Client Quit]
09:44:07pseudorizer quits [Ping timeout: 258 seconds]
09:44:11emily (pseudorizer) joins
09:46:49chessnoob280 quits [Remote host closed the connection]
09:49:29yts98 joins
10:04:06mls quits [Client Quit]
10:18:58<h2ibot>PaulWise edited Mailman2 (+57, afrinic lists): https://wiki.archiveteam.org/?diff=50179&oldid=50159
10:23:58<h2ibot>PaulWise edited Mailman2 (+11, twisted legacy archives): https://wiki.archiveteam.org/?diff=50180&oldid=50179
10:36:15hitgrr8 joins
10:38:01<h2ibot>OrIdow6 edited Wysp (+571, Initial remarks on grab): https://wiki.archiveteam.org/?diff=50181&oldid=50167
10:58:05<h2ibot>OrIdow6 edited Wysp (+473, On auth): https://wiki.archiveteam.org/?diff=50182&oldid=50181
10:59:05<h2ibot>OrIdow6 edited Wysp (+2): https://wiki.archiveteam.org/?diff=50183&oldid=50182
11:07:30chessnoob280 joins
11:23:29<imer>just going to repost this here so it doesn't get lost in the depths of #archivebot
11:23:29<imer>10:25 <mexat2> does archivebot have space for 7891 mini-blog entries? they're hosted on a site that has had no/very little activity for the past 3 years and may shut down anytime
11:23:29<imer>10:26 <imer> mexat2: maybe! do you have a list of urls/sites ready? (kindly upload to https://transfer.archivete.am if you do)
11:23:29<imer>10:27 <imer> someone with permission (= not me) will look at it, might take a bit for someone to get to it though
11:23:29<imer>10:27 <mexat2> https://transfer.archivete.am/qzcm2/mexatblog
11:23:30<imer>10:29 <mexat2> the whole forum needs to be archived as it's one of the few remaining giants of the Arabic web. the forum already has a sitemap up and ready for crawling.
11:23:30<imer>10:30 <mexat2> I try to archive whatever I can, but it takes forever using the Wayback Machine browser extension.
11:24:37<flashfire42>I will start on it on sunday
11:25:10<imer>ah, great
11:26:40VickoSaviour joins
11:27:02yano quits [Quit: WeeChat, the better IRC client, https://weechat.org/]
11:27:21yano (yano) joins
11:29:43<VickoSaviour>damn banciyuan images progress is rolling!
11:33:29pabs quits [Ping timeout: 252 seconds]
11:46:40pabs (pabs) joins
12:15:52<anarcat>pabs: not much, i'm afraid
12:42:04<@rewby>Ryz, JAA: Is the list I provided not useful? (It's forward dns and not reverse.)
12:53:47AmAnd0A quits [Ping timeout: 252 seconds]
12:54:23AmAnd0A joins
12:55:20etnguyen03 (etnguyen03) joins
12:56:58benjins2 joins
13:00:15AmAnd0A quits [Read error: Connection reset by peer]
13:00:31AmAnd0A joins
13:10:50Unholy23613 quits [Ping timeout: 252 seconds]
13:29:02VickoSaviour leaves
13:53:10driib quits [Quit: The Lounge - https://thelounge.chat]
13:54:21driib (driib) joins
13:59:08katocala quits [Remote host closed the connection]
14:06:56Arcorann quits [Ping timeout: 252 seconds]
14:18:22katocala joins
14:48:52<h2ibot>Yts98 edited Games/Engines, Platforms and Hostings (+429, Added ノベăƒȘă‚č): https://wiki.archiveteam.org/?diff=50184&oldid=50168
14:51:37ave9 (ave) joins
14:51:47yano1 (yano) joins
14:51:56TastyWiener95 quits [Client Quit]
14:51:56ave quits [Quit: Ping timeout (120 seconds)]
14:51:56yano quits [Remote host closed the connection]
14:51:56chrismeller3 quits [Client Quit]
14:51:56datechnoman quits [Quit: Ping timeout (120 seconds)]
14:51:56Mateon1 quits [Remote host closed the connection]
14:51:56IDK_ quits [Quit: Ping timeout (120 seconds)]
14:51:56ave9 is now known as ave
14:51:58chrismeller35 (chrismeller) joins
14:52:01TastyWiener951 (TastyWiener95) joins
14:52:02Mateon1 joins
14:52:09datechnoman (datechnoman) joins
14:52:10IDK_ joins
14:57:50<@JAA>rewby: No, your list is useful, I was just confused because I didn't notice the rDNS/fDNS switch on the page. I'll combine the rDNS list from fireonlive with yours, filter out what Ryz did, and run that through AB.
15:05:55<h2ibot>Yts98 edited Discourse (+14, Format adjustment): https://wiki.archiveteam.org/?diff=50185&oldid=50161
15:09:59chessnoob280 quits [Ping timeout: 265 seconds]
15:13:56<@rewby|backup>JAA: Ah yeah, that switch trips people up. The fdns data is usually only (fully) available via login and even then not designed for scraping. And logins can only really be gotten by network administrators. I have a full login, but I just asked the ben to do a backend lookup for me.
15:14:42<@rewby|backup>It's at least partially based on CT logs
15:16:16<pokechu22>imer: re https://transfer.archivete.am/inline/qzcm2/mexatblog - looks like that's suitable as just an !ao < list job, but the s= parameter is a session ID, so it'd probably be better to do !a http://www.mexat.com/vb?archiveteam instead and I'm pretty sure it'd find everything
15:16:42<pokechu22>Fortunately everything is all on the same domain, so flashfire42 doesn't need to manually run through a list of 7891 posts
15:17:01<pokechu22>I think -i forums should cover everything fairly well
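To illustrate why the `s=` session parameter matters: the same page under different session IDs looks like different URLs unless the parameter is dropped. A hedged sketch using the standard library; the path `example.php` and the parameter values are hypothetical, and this is not something ArchiveBot is stated to do in this log.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_session_id(url: str, param: str = "s") -> str:
    """Return the URL with the given query parameter removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical URLs: the same entry seen under two different session IDs.
a = drop_session_id("http://www.mexat.com/vb/example.php?s=abc123&b=42")
b = drop_session_id("http://www.mexat.com/vb/example.php?s=zzz999&b=42")
print(a == b)
```

Starting a recursive job from the forum root, as suggested above, sidesteps the issue because the crawl does not depend on a list of session-tagged URLs.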
15:21:37driib quits [Client Quit]
15:24:46Unholy23613 (Unholy2361) joins
15:31:20yts98 leaves
15:31:21yts98 joins
15:36:16nostalgebraist joins
15:37:34driib (driib) joins
16:06:07<h2ibot>Yts98 uploaded File:ZOWA-icon.png: https://wiki.archiveteam.org/?title=File%3AZOWA-icon.png
16:07:07<h2ibot>Yts98 uploaded File:ZOWA-logo.png: https://wiki.archiveteam.org/?title=File%3AZOWA-logo.png
16:08:07<h2ibot>Yts98 created ZOWA (+1353, Create ZOWA): https://wiki.archiveteam.org/?title=ZOWA
16:11:08<h2ibot>Yts98 edited Deathwatch (-22, Update ZOWA): https://wiki.archiveteam.org/?diff=50189&oldid=50152
16:22:06yts98 leaves
16:22:08yts98 joins
16:24:01icedice (icedice) joins
17:08:02nostalgebraist quits [Client Quit]
17:46:06railen63 joins
17:48:27myself quits [Quit: The Lounge - https://thelounge.chat]
17:48:51nicolas17 joins
17:48:53myself (myself) joins
18:03:30icedice quits [Ping timeout: 265 seconds]
18:22:06sec^nd quits [Ping timeout: 245 seconds]
18:29:14dumbgoy joins
18:51:25spirit joins
18:52:06T31M quits [Remote host closed the connection]
18:52:26T31M joins
18:53:54sec^nd (second) joins
19:37:29beario_ quits [Ping timeout: 252 seconds]
19:41:42Island joins
19:58:32HP_Archivist (HP_Archivist) joins
21:00:07lennier2 joins
21:03:00lennier1 quits [Ping timeout: 258 seconds]
21:03:09lennier2 is now known as lennier1
21:03:25hitgrr8 quits [Client Quit]
21:04:02<lennier1>Wow, Twitter is actually suing data scrapers. It sounds like they had a script to automatically sign up for new accounts. https://www.theverge.com/2023/7/13/23794163/elon-musk-lawsuit-data-scraping-twitter-x-corp
21:04:43upintheairsheep joins
21:05:12<nicolas17>if they don't know their identities how do they know they profited?
21:05:52<upintheairsheep>Have you heard: https://cdn.discordapp.com/attachments/1002873478980046858/1003203199513149510/Screenshot_20220731-103049_Reddit.jpg
21:06:08<Jake>also laughing that somehow 4 IPs can overload Twitter's servers.
21:07:46<@JAA>upintheairsheep: That Reddit thread is from almost three years ago. https://old.reddit.com/r/DataHoarder/comments/js7rou/meganz_will_delete_your_files_now/
21:08:00<upintheairsheep>Thanks.
21:08:08upintheairsheep leaves
21:08:17<nicolas17>the IPs in question are from linode
21:08:22<@JAA>And as the comments explain, it was nothing new then either.
21:08:30<@JAA>Ugh, why do they leave immediately?
21:08:57Terbium quits [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
21:09:02<nicolas17>block them if they keep doing that (?)
21:09:47<Jake>looks like they are joining from web, so they're probably closing the tab
21:10:04<@JAA>No, they parted the channel rather than closed the connection.
21:10:24<@JAA>Still connected, in fact.
21:11:21pilford joins
21:16:24<TheTechRobo>JAA: couldn't the connection have timed out? i don't think they're connected anymore
21:17:41Terbium joins
21:17:50<@JAA>TheTechRobo: Possible, but they did explicitly /part this channel (or however you do that in the web UI with clicky things).
21:18:05<nicolas17>and it's not the first time
21:18:13<@JAA>^
21:18:26<@JAA>They constantly appear, drop something, and leave again.
21:18:36<TheTechRobo>yeah
21:30:24icedice (icedice) joins
21:34:26nicolas17 quits [Ping timeout: 258 seconds]
21:38:25nicolas17 joins
21:41:28sec^nd quits [Remote host closed the connection]
21:42:07sec^nd (second) joins
21:47:56pilford quits [Remote host closed the connection]
21:57:38pilford joins
22:15:55pilford quits [Remote host closed the connection]
22:55:47chessnoob280 joins
23:07:05<@JAA>rewby: FYI, quite a few of the domains in your list do not in fact resolve to FutureQuest IPs. I don't know if that many sites were migrated in the past week, but yeah, skipping anything that isn't still in that range.
23:08:41<flashfire42>That's a good and smart idea. I love grabbing random sites, but when on a deadline it's best to put them in the do-later pile
23:10:14<@JAA>Getting about 851 out of 1555 still in 69.5.0.0/16 (too lazy to restrict it properly to /19).
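Restricting the check to the /19 rather than the whole /16 is cheap with Python's `ipaddress` module. A minimal sketch; the constant and function names are made up, and the sample addresses are only illustrative.

```python
import ipaddress
from typing import Iterable

FUTUREQUEST_19 = ipaddress.ip_network("69.5.0.0/19")  # 69.5.0.0 - 69.5.31.255
LOOSE_16 = ipaddress.ip_network("69.5.0.0/16")        # the looser check used above


def still_in_block(addresses: Iterable[str], block=FUTUREQUEST_19) -> bool:
    """True if any of a domain's resolved addresses falls inside the block."""
    return any(ipaddress.ip_address(a) in block for a in addresses)


print(still_in_block(["69.5.22.227"]))            # inside the /19
print(still_in_block(["69.5.32.1"]))              # outside the /19 ...
print(still_in_block(["69.5.32.1"], LOOSE_16))    # ... but inside the /16
```

The /16 check can only over-count: every address in the /19 is also in the /16, so the looser filter keeps a superset of the strict one.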
23:10:35will1|m is now known as will|m
23:11:51<fireonlive>(same lol)
23:18:56Nick joins
23:19:33Nick quits [Remote host closed the connection]
23:25:40IDK (IDK) joins
23:30:12<@JAA>So, quick explanation about what I'm doing: I curl'd each domain with HTTP/HTTPS and non-www/www, and I'm ignoring anything for now that doesn't return HTTP 200. Then I'm filtering out any domain which has been done by Ryz and taking the 'best' remaining URL for each domain, preferring HTTPS over HTTP and non-www over www.
23:30:40qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
23:31:24<@JAA>Where the starting point for 'each domain' is the combination of fireonlive's rDNS list and rewby's fDNS list (filtered to domains that still resolve to there now).
23:31:38<@JAA>This is crude and will miss some things, but it's enough work that it should keep AB busy for a while.
23:32:30railen69 joins
23:33:42<@JAA>Results in a total of 1022 jobs to run.
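The selection step described above can be sketched as follows; this is not JAA's actual script. It assumes the probe results are available as a mapping from URL to HTTP status, and that the scheme preference outranks the www preference when the two conflict (the log does not say which wins). All names and the sample data are illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple
from urllib.parse import urlsplit


def preference(url: str) -> Tuple[bool, bool]:
    """Sort key: HTTPS before HTTP, then non-www before www (False sorts first)."""
    parts = urlsplit(url)
    return (parts.scheme != "https", (parts.hostname or "").startswith("www."))


def best_url(statuses: Dict[str, int]) -> Optional[str]:
    """Pick the preferred URL among those that returned HTTP 200, if any."""
    ok = [url for url, status in statuses.items() if status == 200]
    return min(ok, key=preference) if ok else None


def plan_jobs(probes: Dict[str, Dict[str, int]], already_done: Set[str]) -> List[str]:
    """One URL per domain, skipping domains that were already archived."""
    jobs = []
    for domain, statuses in sorted(probes.items()):
        if domain in already_done:
            continue
        url = best_url(statuses)
        if url is not None:
            jobs.append(url)
    return jobs


probes = {
    "example.org": {"https://example.org/": 301, "https://www.example.org/": 200,
                    "http://example.org/": 200, "http://www.example.org/": 200},
    "example.net": {"http://example.net/": 404},
    "example.com": {"https://example.com/": 200},
}
print(plan_jobs(probes, already_done={"example.com"}))
```

As noted in the conversation, ignoring everything that is not a 200 is crude: a domain whose only working entry point is a redirect is dropped by this filter.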
23:35:33railen63 quits [Ping timeout: 265 seconds]
23:45:54<@JAA>queueh2ibot has been unleashed.
23:52:57Megaweapon quits [Ping timeout: 265 seconds]
23:52:59Arcorann (Arcorann) joins