00:00:16<Jonimus>There are some reports of some people still being able to access it, so the server may still be up and it's only the domain that has reverted.
00:00:24<nicolas17>ugh
00:00:27<@JAA>Jonimus: Do you know the server's IP?
00:00:32<nicolas17>yeah if you figure out what IP address it used to have
00:01:03<@JAA>Ah, it's in DNS History.
00:01:06<Jonimus>I'll check with the discord I found out about the issue from.
00:01:39<@JAA><html><head><title>rrpicturesarchives.net</title></head><body><h1>rrpicturesarchives.net</h1><p>Coming soon.</p></body></html>
00:01:42<@JAA>Welp
00:01:56<nicolas17>/o\
00:02:05<Jonimus>ahh mybad
00:02:10<Jonimus>rrpicturearchives.net
00:02:29<Jonimus>My copy paste was failing on my IRC client :(
00:02:59<@JAA>Ok, that looks better, and yep, still up.
00:03:03<@JAA>208.69.231.186
00:03:22<@JAA>Pretty slow though
00:04:08jtagcat quits [Quit: Bye!]
00:04:24jtagcat (jtagcat) joins
00:04:29<Jonimus>From what people in the railfan discord I got the info from said, the site had rate limiting in later years due to server load.
00:04:50<nicolas17>well if there's multiple people trying to archive stuff at the same time, it's going to make things worse
00:05:21<Jonimus>I don't believe they are trying to archive it, most of the people in that discord can't access it anymore and aren't trying.
00:05:35<Jonimus>They would like to, but most of them are not tech savvy.
00:05:40<@JAA>Ew, ASP.NET
00:05:47<Jonimus>also that.
00:06:18<@JAA>148k albums with over 6 million pictures, apparently.
00:06:34<nicolas17>oh my
00:06:50<@JAA>And wpull won't be able to handle it correctly because some links have backslashes.
00:06:59<@JAA><img src="/pictures\147919\thumbnails\IMG_9205.JPG"
00:07:16<nicolas17>Microsoft IIS moment
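Editor's sketch of the backslash problem mentioned above: a minimal Python example, assuming the server accepts the same path with forward slashes (a forward-slash picture URL is shown working later in this log). The function name `normalize_src` and the page URL are hypothetical; only the `src` value comes from the chat.

```python
from urllib.parse import urljoin


def normalize_src(page_url: str, src: str) -> str:
    """Resolve an <img src> against its page URL, treating '\\' as '/'.

    Assumption: the server serves the same file for the forward-slash path.
    """
    return urljoin(page_url, src.replace("\\", "/"))


# Hypothetical page URL; the src value is the one quoted in the chat.
page = "http://rrpicturearchives.net/archivethumbs.aspx?id=147919"
print(normalize_src(page, r"/pictures\147919\thumbnails\IMG_9205.JPG"))
```

A crawler that extracts links without this step would request the backslash form literally (or percent-encode it), which is roughly the failure JAA describes for wpull.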
00:08:51<@JAA>At least it looks like the ASP.NET form crap isn't actually used.
00:09:47<@JAA>Oh, nevermind, it is on some pages.
00:09:54<@JAA>E.g. archiveList.aspx
00:10:05<@JAA>But the images can be retrieved without it, I think.
00:10:29<Jonimus>The images would be the main desirable thing I believe.
00:10:29<nicolas17>yeah and I guess we can grab archivethumbs.aspx?id=$number for all numbers without needing to discover them by crawling
00:10:31<nicolas17>easy to enumerate
00:10:42<@JAA>Yeah
00:12:35<Jonimus>Is there anything else I can help with/ask of the regular users of the site? Or should I just let you see what you can do?
00:13:56<nicolas17>fullpwnmedia: do you think that dynabook ftp actually has a chance of changing soon? it was put into archivebot, and I archived it too, but keeping a *local* copy to sync and check for changes is burning a hole in my hard disk atm
00:14:02<@JAA>Hmm, I see comments as well.
00:14:43<nicolas17>I think I'll delete the yahoo groups stuff too (deleting 500k files will take a while even on SSD...)
00:19:00<@JAA>http://rrpicturearchives.net/showPicture.aspx?id=260097 has 30 comments and no pagination. Can't find a bigger one right now.
00:19:47<@JAA>I see that they also had 'contributor sites' on subdomains, e.g. http://railfanblaise05.rrpicturearchives.net/
00:19:56<nicolas17>same IP?
00:19:59<@JAA>Yeah
00:20:37<@JAA>Same content too, just filtered down to uploads by that user, it seems.
00:21:03<@JAA>But all served through the subdomain. :-|
00:21:49thedudedude quits [Client Quit]
00:24:34<pabs>immibis: I threw it into ArchiveBot
00:25:28<nicolas17>ok I added an 'at' job to delete my "tb2b" in 24h if nobody stops me before then :P
00:27:18<pabs>immibis: sadly the AB job completely failed, need an op to expire and retry
00:27:26<pabs>joepie91|m: ^
00:34:11<immibis>how does AB handle rate limits?
00:34:43<immibis>curseforge is very much an operation to make lots of money with ads and user data, so I doubt there are no rate limits
00:34:52<pabs>it has a concurrency setting and a request delay setting, might also handle 429, not sure
00:35:18<@JAA>429s aren't handled specially.
00:35:32<pabs>do they get retried at least?
00:35:37<@JAA>Yes
00:40:27<pabs>immibis: from #archivebot, it is apparently cloudflare and hard to archive. JAA rescheduled it on a pipeline where it might work.
00:41:20<@JAA>Yeah, I tried to grab it in December, when they announced the deprecation and someone brought it up here, and that failed.
01:29:50pabs quits [Ping timeout: 265 seconds]
01:43:52pabs (pabs) joins
01:50:24icedice2 joins
01:51:50icedice quits [Ping timeout: 252 seconds]
02:10:10<Jonimus>JAA: is that rrpicturearchives.net site a doable project, or is there anything else I can do to help? The estimates I am seeing are about 10TB of photos.
02:17:15<@JAA>Jonimus: I don't suppose there's any hint at how long the server will last?
02:17:31<Jake>(re: curseforge, I will try again, but I believe it didn't work last time.)
02:17:48Unholy2361 quits [Quit: The Lounge - https://thelounge.chat]
02:18:08Unholy2361 (Unholy2361) joins
02:18:25<Jonimus>Nope, apparently the admin passed a little over a year ago.
02:19:52<Jonimus>I can see if anyone in the railfan groups I know of knows the family or could reach out, but I'm hesitant to do that.
02:22:08<Jonimus>I didn't personally know of the site until I saw people mentioning it as down and thought it might be good to bring it up to ya
02:22:14<Jonimus>y'all
02:22:50<@JAA>Definitely, seems like a very nice resource and a shame to lose.
02:23:18<nicolas17>the images are in S3
02:23:26<@JAA>Some, but not all of them.
02:23:46<nicolas17>oh yeah just saw some that aren't
02:24:42<nicolas17>guess I'll have to learn to use wget-at lua scripting
02:26:23HP_Archivist quits [Ping timeout: 265 seconds]
02:27:27<TheTechRobo>nicolas17: It's not too difficult, FWIW: https://github.com/ArchiveTeam/wget-lua/wiki
02:34:11<pokechu22>Re curseforge it looks like https://authors-old.curseforge.com/forums is also going away
02:37:46<nicolas17>TheTechRobo: JAA: http://rrpicturearchives.net/archivethumbs.aspx?id=147904 this album had new pictures added *right now*
02:38:38lennier1 quits [Client Quit]
02:39:10lennier1 (lennier1) joins
02:45:29<Jonimus>The DNS going down hasn't propagated to everyone yet.
02:45:42<Jonimus>Some users may not be aware it's going down.
02:46:13<Jonimus>Also some people are still accessing via the IP trying to grab their own photos or their favorites etc.
02:46:29<nicolas17>view counters are going up on some images too
02:49:03<Jonimus>Yeah, that doesn't surprise me, there is likely a number of people trying to grab specific albums etc.
02:49:26<@JAA>Yeah, you can access the site directly via IP as well, don't even need /etc/hosts et al.
02:51:05<Jonimus>Does that IP tell you anything about how it was hosted? The site's owner worked in security, possibly cyber, so for all we know it's a box he had in the corner at work or similar.
02:51:26<nicolas17>oh I didn't check
02:52:07<nicolas17>the whole IP address block is registered under his name :|
02:52:11<@JAA>Apparently hosted at https://dartpoints.com/
02:52:39<nicolas17>CIDR: 208.69.231.184/29
02:52:41<nicolas17>NetName: TIMHU001
02:52:42<nicolas17>OriginAS: AS15085
02:52:44<nicolas17>Customer: Tim Huemmer (C03350945)
02:54:07<Jonimus>The guy who passed was named mike so that must be who owns dartpoints
02:54:23<nicolas17>I was going off the page footer
02:54:25<nicolas17>"Site Design ©2001-2020 Tim Huemmer"
02:54:51<@JAA>It does look like a 2001 design. :-)
02:55:16<Jonimus>https://www.legacy.com/us/obituaries/mywebtimes/name/michael-maskel-obituary?id=32138722
02:55:58<Jonimus>That's the obit for the guy who ran it from what I've seen.
02:56:28BlueMaxima quits [Read error: Connection reset by peer]
03:15:14<nicolas17>welp
03:15:16<nicolas17>JAA: https://transfer.archivete.am/KAuSj/response.txt
03:16:34<nicolas17>looks like we're gonna need IPs
03:18:38hackbug quits [Remote host closed the connection]
03:19:31<nicolas17>images aren't affected, only aspx
03:19:56fishingforsoup_ joins
03:20:25<myself>or can you just write to the Tim guy and explain you're trying to preserve the site and does he have any knobs to turn?
03:20:49hackbug (hackbug) joins
03:21:54<nicolas17>I *did* do an excessive number of requests :P
03:23:09<nicolas17>myself: I'm getting 503 Service Unavailable now
03:23:42<nicolas17>including for pictures
03:24:00<nicolas17>can you reproduce?
03:24:12<nicolas17>or did my IP get blocked at another level now?
03:24:14fishingforsoup quits [Ping timeout: 252 seconds]
03:24:59<nicolas17>ok tried from VPS, 503 there too... did I kill the site?
03:25:22fishingforsoup__ joins
03:25:57<nicolas17>now connection refused on port 80
03:26:06<nicolas17>this looks a lot like someone actively messing with the server
03:27:02<nicolas17>sorry Tim Huemmer, I won't do that again, plz bring site back
03:28:50fishingforsoup_ quits [Ping timeout: 252 seconds]
03:32:13<nicolas17>if *my* requests caused high load and raised some alert that made someone go "oh this server is causing it, isn't this the customer that hasn't paid in a year?" and turn it off I'm going to die of guilt
03:33:37<myself>if that's all it took, nobody was gonna be able to archive it anyway
03:36:15<@JAA>Oof
03:38:36<myself>"Hey if a few railfans pool a few bucks to pay this guy's hosting bills, can you stand the server back up long enough to archive it?"
03:43:43<nicolas17>myself: "nobody was gonna be able to archive it anyway" I downloaded a thousand /archivethumbs.aspx... multiple times, concurrency 10 each time, I could have certainly been more subtle about it 😓 I didn't expect anyone would be watching
03:44:00<myself>lmao
03:45:45<Terbium>it'll be funny if there was a firewall or IDS in front that treated the high level of requests as a DDOS and null routed the server
03:46:26<nicolas17>Terbium: port 80 is giving an active "connection refused", and port 443 is giving a sonicwall firewall login like it was before
03:49:03<nicolas17>also, first I got this https://transfer.archivete.am/KAuSj/response.txt (almost certainly per IP and automated), then I got "503 Service Unavailable" (at home and at my VPS), *then* it escalated to "connection refused", sure looked like manual intervention
03:50:18<Jonimus>Do you think it would be better if someone from archiveteam or archive.org tried to contact Tim or if someone from the railfan community?
03:50:50<Jake>As of May 9th, that response was being returned for some people already. https://webcache.googleusercontent.com/search?q=cache:LdFpYwBRe5IJ:https://www.tuugo.in/Companies/rr-technosoft/0150008336586&cd=10&hl=en&ct=clnk&gl=us
03:51:11decky_e quits [Ping timeout: 252 seconds]
03:51:49decky_e (decky_e) joins
03:56:48<@JAA>I did see that 'Excessive Usage Error' on some search results as well.
03:57:18<nicolas17>yeah I'm sure that was automated and affecting my IP alone, and the limit has been there for a while
03:57:32<nicolas17>the error page had "Last-Modified: Fri, 10 Dec 2010"
03:58:39<nicolas17>but it feels like after I was already blocked, someone took manual action
04:00:12decky_e quits [Remote host closed the connection]
04:00:30<@JAA>It's timing out for me now.
04:01:08<nicolas17>I still get connection refused, but sometimes it takes several seconds
04:02:07<@JAA>And now I'm able to connect, but the server doesn't respond to the HTTP request.
04:02:47<nicolas17>http://rrpicturearchives.net/pictures/147000/thumbnails/20221231_154825.jpg here's a picture link, which should bypass the ASP.NET crap
04:02:49<@JAA>Ah there we go, got a response again.
04:02:57<Jonimus>It just worked for me, so I suspect the issue is multiple people were trying to download stuff and the firewall or manual intervention is happening.
04:04:05<nicolas17>everything works again now
04:04:12<nicolas17>I'm *not* going to do that request rate again
04:04:16<nicolas17>geez
04:06:10<nicolas17>okay WOW
04:06:25<nicolas17>JAA:
04:06:27<nicolas17>- <img src="/pictures\147904\thumbnails\050921 Perry (41).JPG" border="0" alt="UP 5488">
04:06:28<nicolas17>+ <img src="http://s3.amazonaws.com/rrpa_photos/147904/thumbnails/050921 Perry (41).JPG" border="0" alt="UP 5488">
04:07:09<Jonimus>Wait is it actively being moved to s3 or something?
04:07:26<@JAA>Huh
04:08:05<@JAA>Yeah, looks like it.
04:08:29<Jonimus>Or maybe some sort of s3 using caching setup is being used?
04:09:23<pokechu22>Better local to s3 than the other way around :P
04:09:28<@JAA>This would be a weird way of doing it but certainly not the weirdest.
04:09:57<@JAA>Shedding load would be another possibility.
04:10:04<@JAA>But again, weird.
04:10:16<@JAA>Serving static files is one of the easiest things a web server can do.
04:10:30<@JAA>Maybe IIS sucks at that though, who knows. It's Microsoft, after all.
04:11:48<nicolas17>well
04:12:16<nicolas17>the S3 image above
04:12:22<nicolas17>Date: Mon, 22 May 2023 04:12:08 GMT
04:12:24<nicolas17>Last-Modified: Mon, 22 May 2023 04:03:01 GMT
04:12:33<nicolas17>sure looks like it was recently uploaded to S3
04:13:40<Jonimus>The dartpoints or whatever that owns the IP is some cloudy "edge colocation"
04:14:05<Jonimus>service, it could easily be some system they have doing the load shedding or whatever.
04:14:39dumbgoy quits [Ping timeout: 265 seconds]
04:16:01<Jake>That's... weird...
04:18:39<Jonimus>Yeah it is weird to move things from local to s3. That said, it does mean the paths now use forward slashes, which I think is better for your tools, isn't it?
04:18:53<nicolas17>not *all* were moved
04:19:03<nicolas17>in fact I even see albums with a mix
04:19:12<@JAA>The other possibility is that they're actively migrating everything to a new site or similar.
04:19:59<nicolas17>JAA: from the initial description of the situation, I didn't expect to find someone alive to do that
04:20:07<@JAA>Yeah
04:25:16<Jonimus>Unless the tim guy is doing it, he may not have had access to the domain to renew it but may still be trying to keep the site up, and messing with it for that reason?
04:25:36<nicolas17>yeah probably him
04:26:07<nicolas17>if he's aware the domain expired *and* that some people are still accessing (someone uploaded new pictures a few hours ago!), he should put some notice on the front page...
04:27:01<Jonimus>You'd think he'd have an admin account since he designed it but maybe he doesn't for reasons.
04:28:00<Jonimus>Like depending on how the site was built in 2001 it may not be that easy to just update the homepage.
04:28:41<@JAA>'Please install Microsoft FrontPage 2000'
04:29:11<Jonimus>Wait the main pages "updated photo albums" all list today.
04:29:52<Jonimus>http://208.69.231.186/archiveList.aspx?Sort=dtUpdateDate
04:30:38<nicolas17>Jonimus: http://208.69.231.186/archivethumbs.aspx?id=147904 this album got new pictures *after* you told us about the site and I started looking into it
04:31:22<Jonimus>Yeah so either people are uploading photos by connecting via the IP or their DNS hadn't updated.
04:31:42<Jonimus>I don't think there were any like phone apps or similar.
04:32:01<nicolas17>if there was a phone app, I'm sure it would depend on the domain working...
04:32:24<Jonimus>You'd think.
05:18:00decky_e joins
05:31:22woans (WOANS) joins
05:54:50<@arkiver>nicolas17: what is tb2b?
05:55:04<@arkiver>JAA: is rrpicturearchives.net something for archivebot?
05:55:07<nicolas17>arkiver: https://uk.dynabook.com/generic/general-new-ftp-and-software-guide-sheets/ this FTP
05:55:40<@arkiver>nicolas17: do you perhaps have a tl;dr of the above conversation?
05:55:55<nicolas17>regarding rrpicturearchives?
05:56:26<@JAA>arkiver: rrpicturearchives.net uses backslashes in its image URLs, which fail on AB. Also, the domain expired, so DNS trickery is needed if we want it under the domain rather than the IP.
05:56:36<@arkiver>yeah
05:56:50<@arkiver>it clearly shows a godaddy page on http://rrpicturearchives.net/archivethumbs.aspx?id=147904
05:57:02<@arkiver>(without altering DNS results)
05:57:35<nicolas17>yeah what JAA said
05:57:40<@arkiver>nicolas17: if you have the only copy of that, please upload it to IA
05:57:52<@arkiver>on rrpicturearchives.net - what exactly did it hold?
05:58:04<@JAA>Millions of train photos
05:58:12<nicolas17>arkiver: of the tb2b FTP? no, it was archivebot'd successfully
05:58:18<@arkiver>nicolas17: ah good
05:58:35<@JAA>I can throw http://208.69.231.186/ into AB and then deal with the backslashes when it finishes, I guess.
05:58:51<@JAA>Unless we want to do DNS fuckery and archive it under the expired domain.
05:58:51<nicolas17>and when I remember I "rclone sync" against my local copy and I haven't seen the contents actually change
05:58:52<fireonlive>1-2 registrars allow anyone to renew any domain but sadly not godaddy it seems
05:59:24<@arkiver>we _could_ archive it under the expired domain, but that will not go into the Wayback Machine
05:59:25<@JAA>fireonlive: I'd like to know which ones, so I can add them to my 'never, ever use' list of registrars.
05:59:36<@arkiver>JAA: nicolas17: i'd say, get http://208.69.231.186/
05:59:44<@arkiver>as that IP, not under the old domain.
05:59:59<fireonlive>from what i quickly found it's just Hover
06:00:05<@arkiver>archives of that IP (without DNS trickery) can go into the Wayback Machine
06:00:06<@JAA>arkiver: There is precedent for such archives going into the WBM. But it's not ideal, sure.
06:00:08<@arkiver>DNS trickery cannot
06:00:37<fireonlive>customers were like 'i'm locked out of my account' or 'person X is unavailable' so they were like 'sure we'll take the money but the owner retains control'
06:00:45<@arkiver>JAA: is it small (and easy) enough for archivebot?
06:01:15<nicolas17>there may be 6 million photos in rrpicturearchives
06:01:33<@JAA>5.6M is what the homepage says, yeah.
06:01:48<@arkiver>oh they have the easy sequential IDs
06:02:05<@JAA>The content can mostly be gotten easily with AB, yeah.
06:02:17<@arkiver>can we just !a < a list of these URLs in archivebot together with the main page?
06:02:21<@JAA>Some of the navigation is ASP.NET POST nonsense.
06:02:21<nicolas17>JAA: I was going off a photo ID being 6025791, but there may be gaps I guess
06:03:14<@JAA>Yeah, the IDs go higher. Probably some deleted stuff etc.
06:03:34<@JAA>arkiver: Yeah, will do that shortly.
06:04:57<@JAA>What's the highest locomotive ID?
06:05:58<nicolas17>as in locoPicture.aspx?
06:06:02<@JAA>Yeah
06:06:13<@JAA>Oh nice, capitalisation bullshit from Microsoft, of course.
06:06:37<@JAA>Locopicture.aspx and LocoPicture.aspx appear in links across the site.
06:06:42<nicolas17>/o\
06:06:51<@arkiver>luckily the Wayback Machine handles that :P
06:07:02@JAA slaps arkiver around a bit with a large trout
06:07:17<@JAA>AB doesn't, anyway, so it will retrieve those things multiple times.
06:07:42<@arkiver>maybe we should fix AB? :P
06:08:09<@arkiver>(that was a joke - to be clear)
06:08:29<fishingforsoup__> I need help finding some YouTube videos.
06:08:35<fishingforsoup__>https://www.youtube.com/watch?v=JDyCsTDoKc0, https://www.youtube.com/watch?v=yj8MTPX8zDE, https://www.youtube.com/watch?v=2L5dfunuF6g, and https://www.youtube.com/watch?v=XFMS0Hr1Ub4.
06:08:41<@JAA>I mean, we should, URL rewriting has been on my wishlist for a while.
06:11:07<@JAA>Looks like the highest loco ID is slightly above 265700.
06:11:29<@JAA>Nope, exactly that.
06:12:51<nicolas17>I just tried 265700..265800 and 265700 was the only successful one
06:12:54<nicolas17>so yes
06:13:26<@arkiver>i need to open source my ID range scanning thing some time soon
06:13:30<@arkiver>it can be used for this stuff
06:13:43<nicolas17>binary search?
06:13:47<@arkiver>though basically it's the same as is in the telegram-grab Lua code
06:13:53<@arkiver>nicolas17: what?
06:14:12<@arkiver>and the same thing that is used in #telegrab to find highest ID for channels that don't have a public index
06:14:13<nicolas17>I guess you do something like binary search to find the last valid ID?
06:14:20<@JAA>I'll first do the photo pages in random order, then albums, then locomotives.
06:14:30<@arkiver>nicolas17: sort of
06:14:32<@arkiver>not exactly
06:14:44<@JAA>nicolas17: You can't do a binary search if you don't know the possible upper end. And you can't simply start at a gazillion because it takes too long then.
06:14:46<nicolas17>yeah, if you have to cope with potential gaps it's trickier
06:14:57<@JAA>Gaps as well, yeah.
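Editor's sketch of the search problem discussed above: a minimal Python example of finding the highest valid ID when the upper end is unknown and IDs may have gaps. It combines a doubling probe with a binary search and is not arkiver's tool, which the chat does not describe. The `exists` callback and the `window` parameter are hypothetical; in practice `exists` would issue an HTTP request, so every probe costs up to `window` requests.

```python
from typing import Callable


def find_highest_id(exists: Callable[[int], bool], window: int = 100) -> int:
    """Estimate the highest valid ID, tolerating gaps shorter than `window`.

    A position n counts as "live" if any ID in [n, n + window) exists.
    Returns 0 if nothing exists in the first window.
    """
    def live(n: int) -> bool:
        return any(exists(i) for i in range(n, n + window))

    if not live(1):
        return 0
    # Phase 1: double the probe until it lands past the populated range.
    lo, hi = 1, 2
    while live(hi):
        lo, hi = hi, hi * 2
    # Phase 2: binary search for the last live position (live(lo), not live(hi)).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if live(mid):
            lo = mid
        else:
            hi = mid
    # live(lo) holds but live(lo + 1) does not, so ID lo itself must exist.
    return lo


# Simulated ID space with a deleted run and a missing ID near the top.
valid = set(range(1, 265701)) - set(range(1000, 1050)) - {265699}
print(find_highest_id(lambda i: i in valid))
```

If a gap is longer than `window`, the search can stop early, so the window should be chosen generously relative to the gaps seen on the site.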
06:21:17<@JAA>The 'Excessive Usage Error' will be annoying since it's served with HTTP 200.
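Editor's sketch of why the HTTP 200 matters: a crawler that trusts status codes would store the block page as if it were content. A minimal Python check, assuming the phrase quoted in the chat appears literally in the response body (the exact page markup is not shown in this log). The name `classify` and the retry status list are hypothetical.

```python
# Phrase quoted in the chat; its exact placement in the page is an assumption.
BLOCK_MARKER = "Excessive Usage Error"


def classify(status: int, body: str) -> str:
    """Classify a response, catching block pages served with HTTP 200."""
    if status == 200 and BLOCK_MARKER in body:
        return "blocked"  # soft error: looks successful, carries no content
    if status == 200:
        return "ok"
    if status in (429, 503):
        return "retry"  # 503 was observed in this log; 429 is a common case
    return "error"


print(classify(200, "<h1>Excessive Usage Error</h1>"))
print(classify(200, "<img src='/pictures/1/thumbnails/a.jpg'>"))
```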
06:22:10<@JAA>Anyway, AB job is started.
06:25:15Island quits [Read error: Connection reset by peer]
06:36:45<masterX244>JAA: ASP post request pagination is the pest... had that shit once, too when grabbing the tm-exchange sites. Bonus: bugged server where you get a 500 in the middle and unloadable pages + a ipban. Had to resort to TOR for bruteforcing id-s to do a crawl from a fresh IP
06:38:34<@JAA>For the record: `{ echo http://208.69.231.186/; seq 6025791 | shuf | sed 's,^,http://208.69.231.186/showPicture.aspx?id=,'; seq 147920 | shuf | sed 's,^,http://208.69.231.186/archiveThumbs.aspx?id=,'; seq 265700 | shuf | sed 's,^,http://208.69.231.186/locoPicture.aspx?id=,'; } | zstd -10`
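Editor's sketch: a rough Python equivalent of the shell pipeline above, for readers less used to `seq`, `shuf` and `sed`. It builds the homepage URL followed by each section's ID URLs in random order within that section. The function name `url_list` and the `seed` parameter are hypothetical, and the zstd compression step is left out.

```python
import random

BASE = "http://208.69.231.186/"


def url_list(sections, base=BASE, seed=None):
    """Return the homepage URL followed by each section's URLs, shuffled per section."""
    rng = random.Random(seed)
    urls = [base]
    for page, max_id in sections:
        ids = list(range(1, max_id + 1))  # like `seq max_id`
        rng.shuffle(ids)                  # like `shuf`
        urls.extend(f"{base}{page}?id={i}" for i in ids)
    return urls


# Small demo; the chat used 6025791, 147920 and 265700 as the upper IDs.
for url in url_list([("showPicture.aspx", 3), ("archiveThumbs.aspx", 2)], seed=0):
    print(url)
```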
06:48:44<nicolas17>does archivebot use warriors or otherwise parallel requests across IPs, or is it like 1 machine?
06:51:59<Maakuth|m>there is a handful of servers sharing the load
06:57:41<@JAA>Each job is a single process on a single machine.
06:59:31Arcorann (Arcorann) joins
07:00:02nfriedly quits [Remote host closed the connection]
07:05:39<masterx244|m>the warrior is the "big gun", we only use it on major targets
07:05:51<masterx244|m>usually big sites that can bear the load
07:24:23vantec quits [Read error: Connection reset by peer]
07:31:04<@JAA>nicolas17: Welp, the AB job is also in 'Excessive Usage Error' hell now.
07:31:59<@JAA>I wonder whether it really is a daily limit or not.
07:39:25<@JAA>And now it's getting ECONNRESET.
07:45:16<masterx244|m>drats...
07:45:51<masterx244|m>sometimes sites are more triggerhappy if you poke on too many 404 or 403s
07:51:15decky_e quits [Read error: Connection reset by peer]
08:02:15decky_e (decky_e) joins
08:13:36vantec (vantec) joins
08:19:54<@JAA>It happened after pretty much exactly 10k requests.
08:27:50woans quits [Ping timeout: 252 seconds]
08:28:24<masterx244|m>603 chunks on showPicture if splitting it in 10k blocks; 15 chunks on archiveThumbs and 27 chunks on locoPicture; maybe someone with the crazy clusters can move the stuff around between his IPs so after each IP gets burned it gets continued at a fresh one
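Editor's sketch of the chunk arithmetic above: a minimal Python example that splits each ID range into blocks of 10k, using the upper IDs from the URL list earlier in this log. The helper name `chunk_ranges` is hypothetical.

```python
# Upper IDs as used in the URL list earlier in the log.
MAX_IDS = {"showPicture": 6025791, "archiveThumbs": 147920, "locoPicture": 265700}
CHUNK = 10_000  # requests one IP managed before being blocked


def chunk_ranges(max_id: int, size: int = CHUNK):
    """Split 1..max_id into inclusive (first, last) ranges of at most `size` IDs."""
    return [(start, min(start + size - 1, max_id))
            for start in range(1, max_id + 1, size)]


for page, max_id in MAX_IDS.items():
    print(page, len(chunk_ranges(max_id)))
```

Each range could then be handed to a different IP, which is the hand-over scheme masterx244|m suggests.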
08:57:08Dango360_ quits [Read error: Connection reset by peer]
09:17:48nfriedly joins
10:37:39icedice2 quits [Client Quit]
10:38:10icedice (icedice) joins
10:40:34trumad|m joins
10:43:37<trumad|m>apologies for posting non-urgent stuff in the main channel. I'm still getting used to how things work
11:04:35PredatorIWD_ joins
11:04:35Ivan22 joins
11:04:52pikabluu joins
11:04:52superkuh_ joins
11:04:53vantec_ joins
11:04:56rohvani9 joins
11:05:02marto_3 joins
11:05:03fullpwn joins
11:05:10lflare quits [Killed (ing.hackint.org (Nickname regained by services))]
11:05:11lflare (lflare) joins
11:05:12Letur6 joins
11:05:20wyatt8750 joins
11:05:29monoxane quits [Client Quit]
11:05:29marto_ quits [Client Quit]
11:05:29Letur quits [Quit: Ping timeout (120 seconds)]
11:05:29fullpwnmedia quits [Remote host closed the connection]
11:05:29rr quits [Quit: Ping timeout (120 seconds)]
11:05:29tbc1887 quits [Remote host closed the connection]
11:05:29s-crypt quits [Quit: Ping timeout (120 seconds)]
11:05:29wyatt8740 quits [Quit: ZNC got killed or something else has gone wrong, probably.]
11:05:29kiska quits [Quit: Ping timeout (120 seconds)]
11:05:29apache2 quits [Remote host closed the connection]
11:05:29pikablu quits [Remote host closed the connection]
11:05:29nic quits [Quit: Ping timeout (120 seconds)]
11:05:29Ivan226 quits [Remote host closed the connection]
11:05:29vantec quits [Remote host closed the connection]
11:05:29Ryz2 quits [Quit: Ping timeout (120 seconds)]
11:05:29superkuh quits [Remote host closed the connection]
11:05:29PredatorIWD quits [Remote host closed the connection]
11:05:29rohvani quits [Quit: Ping timeout (120 seconds)]
11:05:29Letur6 is now known as Letur
11:05:29marto_3 is now known as marto_
11:05:29rohvani9 is now known as rohvani
11:05:48apache2 joins
11:05:54tbc1887 (tbc1887) joins
11:06:13kiska (kiska) joins
11:13:46Arcorann quits [Ping timeout: 252 seconds]
11:33:34dumbgoy joins
11:53:21pikabluu quits [Read error: Connection reset by peer]
12:57:20HP_Archivist (HP_Archivist) joins
13:32:55killsushi joins
13:39:49<h2ibot>Bzc6p edited EOldal (+353, /* Archiving */ Archives finished uploading): https://wiki.archiveteam.org/?diff=49817&oldid=49430
13:41:49<h2ibot>Bzc6p edited EOldal (+72, Add link to archives to infobox): https://wiki.archiveteam.org/?diff=49818&oldid=49817
13:42:30hitgrr8 joins
13:45:48<Jonimus>JAA: apparently people got ahold of the Tim and he has taken over the site and apparently gotten ahold of the domain.
13:46:13<Jonimus>So crisis averted I guess? Though a good backup might still be worth while.
13:55:52<h2ibot>Bzc6p edited Kepfeltoltes.eu (+126, Added links to archives): https://wiki.archiveteam.org/?diff=49819&oldid=49447
14:01:39aismallard quits [Remote host closed the connection]
14:01:39phuzion quits [Remote host closed the connection]
14:02:43phuzion (phuzion) joins
14:02:49aismallard joins
14:27:30<@JAA>Jonimus: Nice, but yeah, agreed.
14:28:08<Jonimus>At least now you can just use the domain instead of hitting the IP directly.
14:29:42<@JAA>Well, once the DNS propagates, at least.
14:30:41<@JAA>The 10k reqs/day limit is still going to be a pain though.
14:35:44Island joins
15:24:11nicolas17 quits [Ping timeout: 252 seconds]
15:27:07spirit quits [Client Quit]
15:30:31nostalgebraist joins
15:30:47decky_e quits [Ping timeout: 252 seconds]
15:34:20nicolas17 joins
16:00:13<h2ibot>JAABot edited CurrentWarriorProject (+6): https://wiki.archiveteam.org/?diff=49820&oldid=49760
16:04:52decky_e (decky_e) joins
16:24:21datechnoman quits [Quit: Ping timeout (120 seconds)]
16:34:24datechnoman (datechnoman) joins
17:21:47chrismeller quits [Client Quit]
17:21:52chrismeller6 (chrismeller) joins
17:22:09chrismeller6 is now known as chrismeller
17:28:39rhodez joins
17:37:15<pokechu22>rhodez: logs at https://hackint.logs.kiska.pw/archiveteam-bs/20230522
17:37:17icedice quits [Ping timeout: 252 seconds]
17:37:22<rhodez>Thank you
17:41:11Tom|m12 joins
17:47:40icedice (icedice) joins
18:15:00<nicolas17>JAA: I'm still blocked so I'm pretty sure rrpicturearchive's block really is daily
18:15:28<@JAA>:-|
18:15:47<nicolas17>"It happened after pretty much exactly 10k requests." that's good to know, I thought of burning my VPS IP doing requests to figure out what the limit was, now I won't have to :P
18:16:02<@JAA>Yeah, not going to happen with AB then obviously.
18:22:23<Jake>There's an impressive amount of spam on the Curseforge forums
18:32:00rageear joins
18:49:19<Jake>Some content also appears to be behind a login wall? 🤔 https://minecraft.curseforge.com/forums/modding-java-edition/modpacks/modpack-discussion/magic-farm-3-harvest/questions/general/794-can-you-plant-magic-beans
19:02:56Ketchup901 quits [Ping timeout: 245 seconds]
19:03:19Ketchup901 (Ketchup901) joins
19:30:11nostalgebraist quits [Client Quit]
19:43:42spirit joins
19:51:11<nicolas17>(still blocked on rrpa)
20:19:19sonick quits [Client Quit]
20:24:13rhodez quits [Ping timeout: 265 seconds]
20:29:49rhodez joins
20:41:21superkuh_ quits [Client Quit]
20:42:25Lambro_D joins
20:42:53Unholy2361 quits [Remote host closed the connection]
20:43:41Unholy2361 (Unholy2361) joins
20:48:48katocala quits [Remote host closed the connection]
21:00:32ThetaDev quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
21:05:37hitgrr8 quits [Client Quit]
21:06:27woans (WOANS) joins
21:08:37<icedice>Are shallow WARCs what you get from running !ao ?
21:09:16<@JAA>Yep
21:13:06<nicolas17>still blocked on rrpa, will it reset at midnight UTC or another timezone? we'll see
21:14:30<nicolas17>10k/day means ~9 sec between requests per IP
21:14:51<nicolas17>harold-pain.png
21:14:53<@JAA>Yeah, we need 650 IP-days to grab it.
21:15:06<nicolas17>(lol IP-days)
21:15:18<@JAA>So if someone has a /24, it can be done in a couple days.
21:15:31<@JAA>Assuming they only have bans per IP, anyway.
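Editor's sketch of the arithmetic behind the figures above, in Python. It assumes one request per ID plus the homepage, a limit of 10k requests per IP per day, and 256 addresses in a /24; pagination and image downloads are ignored.

```python
import math

LIMIT_PER_DAY = 10_000                  # observed per-IP limit
TOTAL = 6025791 + 147920 + 265700 + 1   # photo, album and loco pages, plus homepage
ADDRESSES_IN_SLASH_24 = 256

seconds_between_requests = 86_400 / LIMIT_PER_DAY
ip_days = math.ceil(TOTAL / LIMIT_PER_DAY)
days_with_slash_24 = math.ceil(ip_days / ADDRESSES_IN_SLASH_24)

print(f"{seconds_between_requests:.2f} s between requests per IP")
print(f"{ip_days} IP-days, {days_with_slash_24} days with a /24")
```

This gives 8.64 seconds between requests and 644 IP-days, in line with the "~9 sec" and "650 IP-days" estimates in the chat.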
21:16:38<nicolas17>I suspect (but I haven't tested it) that the limit is on aspx and doesn't apply to jpg downloads
21:17:48<nicolas17>and if you archive albums alone, it's enough to get the image URLs (albums have thumbnails, but you can easily infer the full-image URL from that), so in theory we could do that to get albums and jpg files, and get showPicture.aspx later
21:18:11<@JAA>Can confirm, images don't get blocked.
21:18:36<nicolas17>however I don't know how many requests are needed to get all albums; the highest album ID was 147920 yesterday, but some are paginated, so it's more than that
21:18:41<@JAA>Or at least not at that draconian limit.
21:29:08<chrismeller>is there a warrior project for this yet?
21:30:27<nicolas17>chrismeller: no
21:30:48<nicolas17>we should archive it, but today it turned less urgent
21:32:56<chrismeller>less urgent?
21:33:14<nicolas17><Jonimus> apparently people got ahold of the Tim and he has taken over the site and apparently gotten ahold of the domain.
21:33:27<chrismeller>oohhh, ok. well that's good.
21:33:43<chrismeller>i was looking at the number of posts per day and it was crazy :D
21:33:51<chrismeller>glad someone will be maintaining it
21:34:36<nicolas17>I wonder if he had to overpay for the domain...
21:35:32<chrismeller>the domain is only a piece of the puzzle, though
21:35:49<nicolas17>I think this Tim guy already had control over the server?
21:36:15<chrismeller>ah, ok. i thought the original maintainer had died
21:36:58<nicolas17>me too, until yesterday during the scraping I saw stuff that definitely looked like a human actively messing with the server
21:37:14<nicolas17>(scraping = exploration, I don't have anything of archival quality)
21:38:10<chrismeller>well i'm not a train guy, but i really found their corpus of train imagery amazing
21:38:40<nicolas17>these people obsessed with specific topics make the internet go round
21:39:18<chrismeller>"nerds on the internet" as they say :)
21:39:45<nicolas17>I say as I carefully curate the contents of https://theapplewiki.com/wiki/Firmware/iPhone/16.x
21:39:46<nicolas17>~~when I should be studying~~
21:40:28<chrismeller>what are you studying for?
21:41:16<nicolas17>I signed up for an online DevOps course and I'm not giving it as much time and attention as I should
21:41:43<chrismeller>yadda yadda yadda kubernetes
21:43:18<nicolas17>yadda yadda devops is not just a set of tech tools, please get 'dev' and 'ops' people to talk to each other
21:43:41decky_e quits [Ping timeout: 252 seconds]
21:44:06decky_e (decky_e) joins
22:01:25katocala joins
22:18:18<fireonlive>instead of screaming matches
22:33:17Dango360 (Dango360) joins
22:43:31Iki1 joins
22:44:11decky_e quits [Ping timeout: 252 seconds]
22:46:56AnotherIki quits [Ping timeout: 252 seconds]
22:47:24<fireonlive>does AT have an official view of cloudflare/ddos-guard and the like :p
22:47:54<nicolas17>afaik cloudflare is well known for being a pain in the ass for archiving
22:48:51<icedice>If you have a sympathetic site admin they can whitelist ArchiveBot IPs in CloudFlare
22:49:07<icedice>One of the sites we're archiving is such a case
22:58:46<andrew>I've been running a grab-site against a Cloudflare-protected site for a few days by now, I guess a lot of it is down to how the owner configured it
23:00:32<icedice>Yeah, and probably the ASN that the IP of your server/VPS is on
23:01:27<nicolas17>there's an "I'm under attack" mode in cloudflare that really tunes up the restrictions
23:01:31<icedice>If you're on a highly abused ASN like Frantech (BuyVM), OVH, or ColoCrossing, they're probably not going to be as nice
23:02:01<icedice>M247 also has a pretty bad rep, I think
23:02:02<Doranwen>nicolas17: "these people obsessed with specific topics" is like all of Yahoo Groups, lol - I've lost track of the weird and/or highly specific groups I've stumbled across so far while sorting the metadata
23:02:27<icedice>And Reddit
23:02:33<icedice>There's a subreddit for everything
23:03:20<Doranwen>I think Yahoo Groups was even more so. There are some *really* weird and esoteric groups out there. (Plus a few just plain brain-splodey "what on EARTH" types.)
23:03:29<nicolas17>someone said wikipedia is powered by neurodiversity...
23:03:43<Doranwen>(Or "were" might be the better verb tense, but since I'm looking at them all the time, it feels present.)
23:10:39<@JAA>Yes, it's all about the site owner's config on Buttflare. Plenty of sites use it and cause no issues, but the ones with more aggressive configs are annoying.
23:16:16lflare quits [Killed (nuke.hackint.org (Nickname regained by services))]
23:16:17lflare (lflare) joins
23:17:48<fireonlive>ah that makes sense
23:18:04<fireonlive>a couple things i tried had them cranked to 11 i guess :/
23:19:46<@JAA>There are three broad levels: anything goes, 'I'm under attack' mode (= JS challenge), and the hardcore mode with captchas. There's a very wide variety of options to finetune this though.
23:36:35BlueMaxima joins
23:39:15<Jake>(for anyone wondering, the Minecraft section of CurseForge is running. Will run the author one afterwards. Everything seems to be going fine this time)
23:39:36<@JAA>Nice!
23:39:42Hackerpcs quits [Quit: Hackerpcs]
23:42:22Hackerpcs (Hackerpcs) joins
23:42:31nicolas17 quits [Client Quit]
23:43:37Lambro_D quits [Read error: Connection reset by peer]
23:44:19woans quits [Ping timeout: 265 seconds]