| 00:00:16 | <Jonimus> | There are some reports of some people still being able to access it, so the server may still be up and it's only the domain that has reverted. |
| 00:00:24 | <nicolas17> | ugh |
| 00:00:27 | <@JAA> | Jonimus: Do you know the server's IP? |
| 00:00:32 | <nicolas17> | yeah if you figure out what IP address it used to have |
| 00:01:03 | <@JAA> | Ah, it's in DNS History. |
| 00:01:06 | <Jonimus> | I'll check with the discord I found out about the issue from. |
| 00:01:39 | <@JAA> | <html><head><title>rrpicturesarchives.net</title></head><body><h1>rrpicturesarchives.net</h1><p>Coming soon.</p></body></html> |
| 00:01:42 | <@JAA> | Welp |
| 00:01:56 | <nicolas17> | /o\ |
| 00:02:05 | <Jonimus> | ahh mybad |
| 00:02:10 | <Jonimus> | rrpicturearchives.net |
| 00:02:29 | <Jonimus> | My copy paste was failing on my IRC client :( |
| 00:02:59 | <@JAA> | Ok, that looks better, and yep, still up. |
| 00:03:03 | <@JAA> | 208.69.231.186 |
| 00:03:22 | <@JAA> | Pretty slow though |
| 00:04:08 | | jtagcat quits [Quit: Bye!] |
| 00:04:24 | | jtagcat (jtagcat) joins |
| 00:04:29 | <Jonimus> | According to people in the railfan Discord I got the info from, the site had rate limiting in later years due to server load. |
| 00:04:50 | <nicolas17> | well if there's multiple people trying to archive stuff at the same time, it's going to make things worse |
| 00:05:21 | <Jonimus> | I don't believe they are trying to archive it, most of the people in that discord can't access it anymore and aren't trying. |
| 00:05:35 | <Jonimus> | They would like to, but most of them are not tech-savvy. |
| 00:05:40 | <@JAA> | Ew, ASP.NET |
| 00:05:47 | <Jonimus> | also that. |
| 00:06:18 | <@JAA> | 148k albums with over 6 million pictures, apparently. |
| 00:06:34 | <nicolas17> | oh my |
| 00:06:50 | <@JAA> | And wpull won't be able to handle it correctly because some links have backslashes. |
| 00:06:59 | <@JAA> | <img src="/pictures\147919\thumbnails\IMG_9205.JPG" |
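Browsers that follow the WHATWG URL Standard treat `\` as `/` in http(s) URLs, which is why these links work in a browser but trip up wpull. One workaround is to pre-normalize the `src` values before handing them to a crawler. A minimal sketch; `normalize_img_src` is a hypothetical helper, not part of wpull or ArchiveBot:

```python
from urllib.parse import urljoin


def normalize_img_src(src: str, base: str = "http://rrpicturearchives.net/") -> str:
    """Mimic browser behaviour: turn backslashes into slashes, then resolve against the site root."""
    return urljoin(base, src.replace("\\", "/"))


print(normalize_img_src("/pictures\\147919\\thumbnails\\IMG_9205.JPG"))
```

Spaces in file names (e.g. `050921 Perry (41).JPG`) are left as-is here; percent-encoding them is a separate step.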
| 00:07:16 | <nicolas17> | Microsoft IIS moment |
| 00:08:51 | <@JAA> | At least it looks like the ASP.NET form crap isn't actually used. |
| 00:09:47 | <@JAA> | Oh, nevermind, it is on some pages. |
| 00:09:54 | <@JAA> | E.g. archiveList.aspx |
| 00:10:05 | <@JAA> | But the images can be retrieved without it, I think. |
| 00:10:29 | <Jonimus> | The images would be the main desirable thing, I believe. |
| 00:10:29 | <nicolas17> | yeah and I guess we can grab archivethumbs.aspx?id=$number for all numbers without needing to discover them by crawling |
| 00:10:31 | <nicolas17> | easy to enumerate |
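A minimal sketch of that enumeration, assuming album IDs run from 1 up to a known maximum (IDs that don't exist will just come back as error or empty pages). `album_urls` is a hypothetical helper:

```python
def album_urls(max_id: int, base: str = "http://rrpicturearchives.net/"):
    """Yield an archivethumbs.aspx URL for every album ID from 1 to max_id inclusive."""
    for album_id in range(1, max_id + 1):
        yield f"{base}archivethumbs.aspx?id={album_id}"


for url in album_urls(3):
    print(url)
```

Shuffling the resulting list (for example with `random.shuffle`) spreads requests across the ID space instead of hammering one region in order.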
| 00:10:42 | <@JAA> | Yeah |
| 00:12:35 | <Jonimus> | Is there anything else I can help with/ask of the regular users of the site? Or should I just let you see what you can do? |
| 00:13:56 | <nicolas17> | fullpwnmedia: do you think that dynabook ftp actually has a chance of changing soon? it was put into archivebot, and I archived it too, but keeping a *local* copy to sync and check for changes is burning a hole in my hard disk atm |
| 00:14:02 | <@JAA> | Hmm, I see comments as well. |
| 00:14:43 | <nicolas17> | I think I'll delete the yahoo groups stuff too (deleting 500k files will take a while even on SSD...) |
| 00:19:00 | <@JAA> | http://rrpicturearchives.net/showPicture.aspx?id=260097 has 30 comments and no pagination. Can't find a bigger one right now. |
| 00:19:47 | <@JAA> | I see that they also had 'contributor sites' on subdomains, e.g. http://railfanblaise05.rrpicturearchives.net/ |
| 00:19:56 | <nicolas17> | same IP? |
| 00:19:59 | <@JAA> | Yeah |
| 00:20:37 | <@JAA> | Same content too, just filtered down to uploads by that user, it seems. |
| 00:21:03 | <@JAA> | But all served through the subdomain. :-| |
| 00:21:49 | | thedudedude quits [Client Quit] |
| 00:24:34 | <pabs> | immibis: I threw it into ArchiveBot |
| 00:25:28 | <nicolas17> | ok I added an 'at' job to delete my "tb2b" in 24h if nobody stops me before then :P |
| 00:27:18 | <pabs> | immibis: sadly the AB job completely failed, need an op to expire and retry |
| 00:27:26 | <pabs> | joepie91|m: ^ |
| 00:34:11 | <immibis> | how does AB handle rate limits? |
| 00:34:43 | <immibis> | curseforge is very much an operation to make lots of money with ads and user data, so I doubt there are no rate limits |
| 00:34:52 | <pabs> | it has a concurrency setting and a request delay setting, might also handle 429, not sure |
| 00:35:18 | <@JAA> | 429s aren't handled specially. |
| 00:35:32 | <pabs> | do they get retried at least? |
| 00:35:37 | <@JAA> | Yes |
| 00:40:27 | <pabs> | immibis: from #archivebot, it is apparently cloudflare and hard to archive. JAA rescheduled it on a pipeline where it might work. |
| 00:41:20 | <@JAA> | Yeah, I tried to grab it in December, when they announced the deprecation and someone brought it up here, and that failed. |
| 01:29:50 | | pabs quits [Ping timeout: 265 seconds] |
| 01:43:52 | | pabs (pabs) joins |
| 01:50:24 | | icedice2 joins |
| 01:51:50 | | icedice quits [Ping timeout: 252 seconds] |
| 02:10:10 | <Jonimus> | JAA: is that rrpicturearchives.net site a doable project, or is there anything else I can do to help? The estimates I'm seeing are about 10 TB of photos. |
| 02:17:15 | <@JAA> | Jonimus: I don't suppose there's any hint at how long the server will last? |
| 02:17:31 | <Jake> | (re: curseforge, I will try again, but I believe it didn't work last time.) |
| 02:17:48 | | Unholy2361 quits [Quit: The Lounge - https://thelounge.chat] |
| 02:18:08 | | Unholy2361 (Unholy2361) joins |
| 02:18:25 | <Jonimus> | Nope, apparently the admin passed a little over a year ago. |
| 02:19:52 | <Jonimus> | I can see if anyone in the railfan groups I know of knows the family or could reach out, but I'm hesitant to do that. |
| 02:22:08 | <Jonimus> | I didn't personally know of the site until I saw people mentioning it as down and thought it might be good to bring it up to ya |
| 02:22:14 | <Jonimus> | y'all |
| 02:22:50 | <@JAA> | Definitely, seems like a very nice resource and a shame to lose. |
| 02:23:18 | <nicolas17> | the images are in S3 |
| 02:23:26 | <@JAA> | Some, but not all of them. |
| 02:23:46 | <nicolas17> | oh yeah just saw some that aren't |
| 02:24:42 | <nicolas17> | guess I'll have to learn to use wget-at lua scripting |
| 02:26:23 | | HP_Archivist quits [Ping timeout: 265 seconds] |
| 02:27:27 | <TheTechRobo> | nicolas17: It's not too difficult, FWIW: https://github.com/ArchiveTeam/wget-lua/wiki |
| 02:34:11 | <pokechu22> | Re curseforge it looks like https://authors-old.curseforge.com/forums is also going away |
| 02:37:46 | <nicolas17> | TheTechRobo: JAA: http://rrpicturearchives.net/archivethumbs.aspx?id=147904 this album had new pictures added *right now* |
| 02:38:38 | | lennier1 quits [Client Quit] |
| 02:39:10 | | lennier1 (lennier1) joins |
| 02:45:29 | <Jonimus> | The DNS going down hasn't propagated to everyone yet. |
| 02:45:42 | <Jonimus> | Some users may not be aware it's going down. |
| 02:46:13 | <Jonimus> | Also some people are still accessing via the IP trying to grab their own photos or their favorites etc. |
| 02:46:29 | <nicolas17> | view counters are going up on some images too |
| 02:49:03 | <Jonimus> | Yeah, that doesn't surprise me; there are likely a number of people trying to grab specific albums etc. |
| 02:49:26 | <@JAA> | Yeah, you can access the site directly via IP as well, don't even need /etc/hosts et al. |
| 02:51:05 | <Jonimus> | Does that IP tell you anything about how it was hosted? The site's owner worked in security, possibly cyber, so for all we know it's a box he had in the corner at work or similar. |
| 02:51:26 | <nicolas17> | oh I didn't check |
| 02:52:07 | <nicolas17> | the whole IP address block is registered under his name :| |
| 02:52:11 | <@JAA> | Apparently hosted at https://dartpoints.com/ |
| 02:52:39 | <nicolas17> | CIDR: 208.69.231.184/29 |
| 02:52:41 | <nicolas17> | NetName: TIMHU001 |
| 02:52:42 | <nicolas17> | OriginAS: AS15085 |
| 02:52:44 | <nicolas17> | Customer: Tim Huemmer (C03350945) |
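The whois record can be sanity-checked with Python's standard `ipaddress` module: a /29 holds 8 addresses, and the server IP found in DNS history falls inside it. This only confirms the arithmetic, not who operates the machine:

```python
import ipaddress

block = ipaddress.ip_network("208.69.231.184/29")   # CIDR from the whois output
server = ipaddress.ip_address("208.69.231.186")     # IP found via DNS history

print(block.num_addresses)                  # total addresses in the /29
print(server in block)                      # whether the server IP is in the block
print([str(h) for h in block.hosts()])      # usable hosts (network/broadcast excluded)
```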
| 02:54:07 | <Jonimus> | The guy who passed was named Mike, so that must be who owns dartpoints |
| 02:54:23 | <nicolas17> | I was going off the page footer |
| 02:54:25 | <nicolas17> | "Site Design ©2001-2020 Tim Huemmer" |
| 02:54:51 | <@JAA> | It does look like a 2001 design. :-) |
| 02:55:16 | <Jonimus> | https://www.legacy.com/us/obituaries/mywebtimes/name/michael-maskel-obituary?id=32138722 |
| 02:55:58 | <Jonimus> | That's the obit for the guy who ran it, from what I've seen. |
| 02:56:28 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 03:15:14 | <nicolas17> | welp |
| 03:15:16 | <nicolas17> | JAA: https://transfer.archivete.am/KAuSj/response.txt |
| 03:16:34 | <nicolas17> | looks like we're gonna need IPs |
| 03:18:38 | | hackbug quits [Remote host closed the connection] |
| 03:19:31 | <nicolas17> | images aren't affected, only aspx |
| 03:19:56 | | fishingforsoup_ joins |
| 03:20:25 | <myself> | or can you just write to the Tim guy and explain you're trying to preserve the site and does he have any knobs to turn? |
| 03:20:49 | | hackbug (hackbug) joins |
| 03:21:54 | <nicolas17> | I *did* do an excessive number of requests :P |
| 03:23:09 | <nicolas17> | myself: I'm getting 503 Service Unavailable now |
| 03:23:42 | <nicolas17> | including for pictures |
| 03:24:00 | <nicolas17> | can you reproduce? |
| 03:24:12 | <nicolas17> | or did my IP get blocked at another level now? |
| 03:24:14 | | fishingforsoup quits [Ping timeout: 252 seconds] |
| 03:24:59 | <nicolas17> | ok tried from VPS, 503 there too... did I kill the site? |
| 03:25:22 | | fishingforsoup__ joins |
| 03:25:57 | <nicolas17> | now connection refused on port 80 |
| 03:26:06 | <nicolas17> | this looks a lot like someone actively messing with the server |
| 03:27:02 | <nicolas17> | sorry Tim Huemmer, I won't do that again, plz bring site back |
| 03:28:50 | | fishingforsoup_ quits [Ping timeout: 252 seconds] |
| 03:32:13 | <nicolas17> | if *my* requests caused high load and raised some alert that made someone go "oh this server is causing it, isn't this the customer that hasn't paid in a year?" and turn it off I'm going to die of guilt |
| 03:33:37 | <myself> | if that's all it took, nobody was gonna be able to archive it anyway |
| 03:36:15 | <@JAA> | Oof |
| 03:38:36 | <myself> | "Hey if a few railfans pool a few bucks to pay this guy's hosting bills, can you stand the server back up long enough to archive it?" |
| 03:43:43 | <nicolas17> | myself: "nobody was gonna be able to archive it anyway" I downloaded a thousand /archivethumbs.aspx... multiple times, concurrency 10 each time, I could have certainly been more subtle about it 😓 I didn't expect anyone would be watching |
| 03:44:00 | <myself> | lmao |
| 03:45:45 | <Terbium> | it'll be funny if there was a firewall or IDS in front that treated the high level of requests as a DDOS and null routed the server |
| 03:46:26 | <nicolas17> | Terbium: port 80 is giving an active "connection refused", and port 443 is giving a sonicwall firewall login like it was before |
| 03:49:03 | <nicolas17> | also, first I got this https://transfer.archivete.am/KAuSj/response.txt (almost certainly per IP and automated), then I got "503 Service Unavailable" (at home and at my VPS), *then* it escalated to "connection refused", sure looked like manual intervention |
| 03:50:18 | <Jonimus> | Do you think it would be better if someone from archiveteam or archive.org tried to contact Tim or if someone from the railfan community? |
| 03:50:50 | <Jake> | As of May 9th, that response was being returned for some people already. https://webcache.googleusercontent.com/search?q=cache:LdFpYwBRe5IJ:https://www.tuugo.in/Companies/rr-technosoft/0150008336586&cd=10&hl=en&ct=clnk&gl=us |
| 03:51:11 | | decky_e quits [Ping timeout: 252 seconds] |
| 03:51:49 | | decky_e (decky_e) joins |
| 03:56:48 | <@JAA> | I did see that 'Excessive Usage Error' on some search results as well. |
| 03:57:18 | <nicolas17> | yeah I'm sure that was automated and affecting my IP alone, and the limit has been there for a while |
| 03:57:32 | <nicolas17> | the error page had "Last-Modified: Fri, 10 Dec 2010" |
| 03:58:39 | <nicolas17> | but it feels like after I was already blocked, someone took manual action |
| 04:00:12 | | decky_e quits [Remote host closed the connection] |
| 04:00:30 | <@JAA> | It's timing out for me now. |
| 04:01:08 | <nicolas17> | I still get connection refused, but sometimes it takes several seconds |
| 04:02:07 | <@JAA> | And now I'm able to connect, but the server doesn't respond to the HTTP request. |
| 04:02:47 | <nicolas17> | http://rrpicturearchives.net/pictures/147000/thumbnails/20221231_154825.jpg here's a picture link, which should bypass the ASP.NET crap |
| 04:02:49 | <@JAA> | Ah there we go, got a response again. |
| 04:02:57 | <Jonimus> | It just worked for me, so I suspect the issue is multiple people were trying to download stuff and the firewall or manual intervention is happening. |
| 04:04:05 | <nicolas17> | everything works again now |
| 04:04:12 | <nicolas17> | I'm *not* going to do that request rate again |
| 04:04:16 | <nicolas17> | geez |
| 04:06:10 | <nicolas17> | okay WOW |
| 04:06:25 | <nicolas17> | JAA: |
| 04:06:27 | <nicolas17> | - <img src="/pictures\147904\thumbnails\050921 Perry (41).JPG" border="0" alt="UP 5488"> |
| 04:06:28 | <nicolas17> | + <img src="http://s3.amazonaws.com/rrpa_photos/147904/thumbnails/050921 Perry (41).JPG" border="0" alt="UP 5488"> |
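Since albums now contain a mix of local (backslash) and S3 image links, a scraper has to handle both forms. A minimal sketch using the standard-library `HTMLParser`; it assumes S3 links can be recognised by an `amazonaws.com` hostname, and `classify_images` is a hypothetical helper:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def classify_images(html, base="http://rrpicturearchives.net/"):
    """Split image URLs into (locally served, S3-hosted), normalizing backslashes first."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    local, s3 = [], []
    for src in parser.srcs:
        url = urljoin(base, src.replace("\\", "/"))
        host = urlparse(url).hostname or ""
        (s3 if host.endswith("amazonaws.com") else local).append(url)
    return local, s3
```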
| 04:07:09 | <Jonimus> | Wait is it actively being moved to s3 or something? |
| 04:07:26 | <@JAA> | Huh |
| 04:08:05 | <@JAA> | Yeah, looks like it. |
| 04:08:29 | <Jonimus> | Or maybe some sort of s3 using caching setup is being used? |
| 04:09:23 | <pokechu22> | Better local to s3 than the other way around :P |
| 04:09:28 | <@JAA> | This would be a weird way of doing it but certainly not the weirdest. |
| 04:09:57 | <@JAA> | Shedding load would be another possibility. |
| 04:10:04 | <@JAA> | But again, weird. |
| 04:10:16 | <@JAA> | Serving static files is one of the easiest things a web server can do. |
| 04:10:30 | <@JAA> | Maybe IIS sucks at that though, who knows. It's Microsoft, after all. |
| 04:11:48 | <nicolas17> | well |
| 04:12:16 | <nicolas17> | the S3 image above |
| 04:12:22 | <nicolas17> | Date: Mon, 22 May 2023 04:12:08 GMT |
| 04:12:24 | <nicolas17> | Last-Modified: Mon, 22 May 2023 04:03:01 GMT |
| 04:12:33 | <nicolas17> | sure looks like it was recently uploaded to S3 |
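The by-eye comparison of the two headers can be done with `email.utils.parsedate_to_datetime` from the standard library. This only measures how long before the response the S3 object was last written; it says nothing about who wrote it:

```python
from email.utils import parsedate_to_datetime


def object_age_seconds(date_header: str, last_modified_header: str) -> float:
    """Seconds between an object's Last-Modified time and the response Date."""
    served = parsedate_to_datetime(date_header)
    modified = parsedate_to_datetime(last_modified_header)
    return (served - modified).total_seconds()


age = object_age_seconds("Mon, 22 May 2023 04:12:08 GMT", "Mon, 22 May 2023 04:03:01 GMT")
print(age)  # 547.0, i.e. uploaded about 9 minutes before the request
```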
| 04:13:40 | <Jonimus> | The dartpoints or whatever that owns the IP is some cloudy "edge colocation |
| 04:14:05 | <Jonimus> | service", it could easily be some system they have doing the load shedding or whatever. |
| 04:14:39 | | dumbgoy quits [Ping timeout: 265 seconds] |
| 04:16:01 | <Jake> | That's... weird... |
| 04:18:39 | <Jonimus> | Yeah, it is weird to move things from local to S3. That said, it does mean the paths now use forward slashes, which I think is better for your tools, isn't it? |
| 04:18:53 | <nicolas17> | not *all* were moved |
| 04:19:03 | <nicolas17> | in fact I even see albums with a mix |
| 04:19:12 | <@JAA> | The other possibility is that they're actively migrating everything to a new site or similar. |
| 04:19:59 | <nicolas17> | JAA: from the initial description of the situation, I didn't expect to find someone alive to do that |
| 04:20:07 | <@JAA> | Yeah |
| 04:25:16 | <Jonimus> | Unless the Tim guy is doing it; he may not have had access to the domain to renew it, but may still be trying to keep the site up and messing with it for that reason? |
| 04:25:36 | <nicolas17> | yeah probably him |
| 04:26:07 | <nicolas17> | if he's aware the domain expired *and* that some people are still accessing (someone uploaded new pictures a few hours ago!), he should put some notice on the front page... |
| 04:27:01 | <Jonimus> | You'd think he'd have an admin account since he designed it but maybe he doesn't for reasons. |
| 04:28:00 | <Jonimus> | Like, depending on how the site was built in 2001, it may not be that easy to just update the homepage. |
| 04:28:41 | <@JAA> | 'Please install Microsoft FrontPage 2000' |
| 04:29:11 | <Jonimus> | Wait, the main page's "updated photo albums" all list today. |
| 04:29:52 | <Jonimus> | http://208.69.231.186/archiveList.aspx?Sort=dtUpdateDate |
| 04:30:38 | <nicolas17> | Jonimus: http://208.69.231.186/archivethumbs.aspx?id=147904 this album got new pictures *after* you told us about the site and I started looking into it |
| 04:31:22 | <Jonimus> | Yeah, so either people are uploading photos by connecting via the IP, or their DNS hasn't updated. |
| 04:31:42 | <Jonimus> | I don't think there were any like phone apps or similar. |
| 04:32:01 | <nicolas17> | if there was a phone app, I'm sure it would depend on the domain working... |
| 04:32:24 | <Jonimus> | You'd think. |
| 05:18:00 | | decky_e joins |
| 05:31:22 | | woans (WOANS) joins |
| 05:54:50 | <@arkiver> | nicolas17: what is tb2b? |
| 05:55:04 | <@arkiver> | JAA: is rrpicturearchives.net something for archivebot? |
| 05:55:07 | <nicolas17> | arkiver: https://uk.dynabook.com/generic/general-new-ftp-and-software-guide-sheets/ this FTP |
| 05:55:40 | <@arkiver> | nicolas17: do you perhaps have a tl;dr of the above conversation? |
| 05:55:55 | <nicolas17> | regarding rrpicturearchives? |
| 05:56:26 | <@JAA> | arkiver: rrpicturearchives.net uses backslashes in its image URLs, which fail on AB. Also, the domain expired, so DNS trickery is needed if we want it under the domain rather than the IP. |
| 05:56:36 | <@arkiver> | yeah |
| 05:56:50 | <@arkiver> | it clearly shows a godaddy page on http://rrpicturearchives.net/archivethumbs.aspx?id=147904 |
| 05:57:02 | <@arkiver> | (without altering DNS results) |
| 05:57:35 | <nicolas17> | yeah what JAA said |
| 05:57:40 | <@arkiver> | nicolas17: if you have the only copy of that, please upload it to IA |
| 05:57:52 | <@arkiver> | on rrpicturearchives.net - what exactly did it hold? |
| 05:58:04 | <@JAA> | Millions of train photos |
| 05:58:12 | <nicolas17> | arkiver: of the tb2b FTP? no, it was archivebot'd successfully |
| 05:58:18 | <@arkiver> | nicolas17: ah good |
| 05:58:35 | <@JAA> | I can throw http://208.69.231.186/ into AB and then deal with the backslashes when it finishes, I guess. |
| 05:58:51 | <@JAA> | Unless we want to do DNS fuckery and archive it under the expired domain. |
| 05:58:51 | <nicolas17> | and when I remember I "rclone sync" against my local copy and I haven't seen the contents actually change |
| 05:58:52 | <fireonlive> | 1-2 registrars allow anyone to renew any domain but sadly not godaddy it seems |
| 05:59:24 | <@arkiver> | we _could_ archive it under the expired domain, but that will not go into the Wayback Machine |
| 05:59:25 | <@JAA> | fireonlive: I'd like to know which ones, so I can add them to my 'never, ever use' list of registrars. |
| 05:59:36 | <@arkiver> | JAA: nicolas17: i'd say, get http://208.69.231.186/ |
| 05:59:44 | <@arkiver> | as that IP, not under the old domain. |
| 05:59:59 | <fireonlive> | from what i quickly found it's just Hover |
| 06:00:05 | <@arkiver> | archives of that IP (without DNS trickery) can go into the Wayback Machine |
| 06:00:06 | <@JAA> | arkiver: There is precedent for such archives going into the WBM. But it's not ideal, sure. |
| 06:00:08 | <@arkiver> | DNS trickery cannot |
| 06:00:37 | <fireonlive> | customers were like 'i'm locked out of my account' or 'person X is unavailable' so they were like 'sure we'll take the money but the owner retains control' |
| 06:00:45 | <@arkiver> | JAA: is it small (and easy) enough for archivebot? |
| 06:01:15 | <nicolas17> | there may be 6 million photos in rrpicturearchives |
| 06:01:33 | <@JAA> | 5.6M is what the homepage says, yeah. |
| 06:01:48 | <@arkiver> | oh they have the easy sequential IDs |
| 06:02:05 | <@JAA> | The content can mostly be gotten easily with AB, yeah. |
| 06:02:17 | <@arkiver> | can we just !a < a list of these URLs in archivebot together with the main page? |
| 06:02:21 | <@JAA> | Some of the navigation is ASP.NET POST nonsense. |
| 06:02:21 | <nicolas17> | JAA: I was going off a photo ID being 6025791, but there may be gaps I guess |
| 06:03:14 | <@JAA> | Yeah, the IDs go higher. Probably some deleted stuff etc. |
| 06:03:34 | <@JAA> | arkiver: Yeah, will do that shortly. |
| 06:04:57 | <@JAA> | What's the highest locomotive ID? |
| 06:05:58 | <nicolas17> | as in locoPicture.aspx? |
| 06:06:02 | <@JAA> | Yeah |
| 06:06:13 | <@JAA> | Oh nice, capitalisation bullshit from Microsoft, of course. |
| 06:06:37 | <@JAA> | Locopicture.aspx and LocoPicture.aspx appear in links across the site. |
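A crawler could collapse these duplicates by comparing URLs with a case-folded path. A minimal sketch, assuming the IIS host serves paths case-insensitively (the usual IIS default); it should not be applied to the S3 image URLs, because S3 keys are case-sensitive. The query string is left untouched since its values might matter. `canonical_url` and `dedupe` are hypothetical helpers; per JAA, ArchiveBot does not do this:

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Lower-case scheme, host and path so Locopicture.aspx and LocoPicture.aspx compare equal."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path.lower(),
                       parts.query, parts.fragment))


def dedupe(urls):
    """Keep the first URL seen for each canonical form, preserving order."""
    seen, out = set(), []
    for url in urls:
        key = canonical_url(url)
        if key not in seen:
            seen.add(key)
            out.append(url)
    return out
```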
| 06:06:42 | <nicolas17> | /o\ |
| 06:06:51 | <@arkiver> | luckily the Wayback Machine handles that :P |
| 06:07:02 | | @JAA slaps arkiver around a bit with a large trout |
| 06:07:17 | <@JAA> | AB doesn't, anyway, so it will retrieve those things multiple times. |
| 06:07:42 | <@arkiver> | maybe we should fix AB? :P |
| 06:08:09 | <@arkiver> | (that was a joke - to be clear) |
| 06:08:29 | <fishingforsoup__> | I need help finding some YouTube videos. |
| 06:08:35 | <fishingforsoup__> | https://www.youtube.com/watch?v=JDyCsTDoKc0, https://www.youtube.com/watch?v=yj8MTPX8zDE, https://www.youtube.com/watch?v=2L5dfunuF6g, and https://www.youtube.com/watch?v=XFMS0Hr1Ub4. |
| 06:08:41 | <@JAA> | I mean, we should, URL rewriting has been on my wishlist for a while. |
| 06:11:07 | <@JAA> | Looks like the highest loco ID is slightly above 265700. |
| 06:11:29 | <@JAA> | Nope, exactly that. |
| 06:12:51 | <nicolas17> | I just tried 265700..265800 and 265700 was the only successful one |
| 06:12:54 | <nicolas17> | so yes |
| 06:13:26 | <@arkiver> | i need to open source my ID range scanning thing some time soon |
| 06:13:30 | <@arkiver> | it can be used for this stuff |
| 06:13:43 | <nicolas17> | binary search? |
| 06:13:47 | <@arkiver> | though basically it's the same as is in the telegram-grab Lua code |
| 06:13:53 | <@arkiver> | nicolas17: what? |
| 06:14:12 | <@arkiver> | and the same thing that is used in #telegrab to find highest ID for channels that don't have a public index |
| 06:14:13 | <nicolas17> | I guess you do something like binary search to find the last valid ID? |
| 06:14:20 | <@JAA> | I'll first do the photo pages in random order, then albums, then locomotives. |
| 06:14:30 | <@arkiver> | nicolas17: sort of |
| 06:14:32 | <@arkiver> | not exactly |
| 06:14:44 | <@JAA> | nicolas17: You can't do a binary search if you don't know the possible upper end. And you can't simply start at a gazillion because it takes too long then. |
| 06:14:46 | <nicolas17> | yeah, if you have to cope with potential gaps it's trickier |
| 06:14:57 | <@JAA> | Gaps as well, yeah. |
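One common way around both problems is galloping (exponential) search: double a probe ID until it fails to get an upper bound, then binary-search between the last success and the first failure. Gaps are absorbed by treating a probe as successful if any ID in a small window exists. This is a swapped-in standard technique, not arkiver's unreleased tool. It assumes IDs start at 1 and that no run of missing IDs is as long as `window`; `exists` is a hypothetical callback (e.g. one HTTP request checking whether `locoPicture.aspx?id=N` returns a real page). Each probe can cost up to `window` requests, which matters under a daily request limit:

```python
from typing import Callable


def find_highest_id(exists: Callable[[int], bool], window: int = 100) -> int:
    """Return the highest ID for which exists() is True (0 if none near the start).

    Assumes IDs start at 1 and no run of missing IDs is `window` long or longer.
    """

    def probe(n: int) -> bool:
        # True if any ID in [n, n + window) exists; under the gap assumption this is
        # True exactly for n <= highest ID, so it is monotone and searchable.
        return any(exists(i) for i in range(n, n + window))

    if not probe(1):
        return 0
    lo, hi = 1, 2
    # Galloping phase: double hi until a probe fails, giving an upper bound.
    while probe(hi):
        lo, hi = hi, hi * 2
    # Binary search; invariant: probe(lo) is True and probe(hi) is False.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid
    return lo
```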
| 06:21:17 | <@JAA> | The 'Excessive Usage Error' will be annoying since it's served with HTTP 200. |
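Because the error page comes back as HTTP 200, status codes alone can't separate it from real content; the body has to be checked. A minimal sketch, assuming the page reliably contains the text "Excessive Usage Error" (the exact markup has not been verified); the helpers are hypothetical:

```python
# Text reported in the chat; the exact error-page markup is an assumption.
SOFT_BLOCK_MARKER = "Excessive Usage Error"


def is_soft_block(status: int, body: str) -> bool:
    """Flag responses that look successful but actually carry the rate-limit page."""
    return status == 200 and SOFT_BLOCK_MARKER in body


def split_results(results):
    """Partition (url, status, body) tuples into URLs to keep and URLs to retry later."""
    keep, retry = [], []
    for url, status, body in results:
        (retry if is_soft_block(status, body) else keep).append(url)
    return keep, retry
```

URLs in the retry list would need to be fetched again later, ideally from a different IP.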
| 06:22:10 | <@JAA> | Anyway, AB job is started. |
| 06:25:15 | | Island quits [Read error: Connection reset by peer] |
| 06:36:45 | <masterX244> | JAA: ASP POST-request pagination is a pest... had to deal with that once too when grabbing the tm-exchange sites. Bonus: a buggy server where you get a 500 in the middle, unloadable pages, and an IP ban. Had to resort to Tor for brute-forcing IDs to do a crawl from a fresh IP |
| 06:38:34 | <@JAA> | For the record: `{ echo http://208.69.231.186/; seq 6025791 | shuf | sed 's,^,http://208.69.231.186/showPicture.aspx?id=,'; seq 147920 | shuf | sed 's,^,http://208.69.231.186/archiveThumbs.aspx?id=,'; seq 265700 | shuf | sed 's,^,http://208.69.231.186/locoPicture.aspx?id=,'; } | zstd -10` |
| 06:48:44 | <nicolas17> | does archivebot use warriors or otherwise parallel requests across IPs, or is it like 1 machine? |
| 06:51:59 | <Maakuth|m> | there is a handful of servers sharing the load |
| 06:57:41 | <@JAA> | Each job is a single process on a single machine. |
| 06:59:31 | | Arcorann (Arcorann) joins |
| 07:00:02 | | nfriedly quits [Remote host closed the connection] |
| 07:05:39 | <masterx244|m> | the warrior is the "big gun", we only use it on major targets |
| 07:05:51 | <masterx244|m> | usually big sites that can bear the load |
| 07:24:23 | | vantec quits [Read error: Connection reset by peer] |
| 07:31:04 | <@JAA> | nicolas17: Welp, the AB job is also in 'Excessive Usage Error' hell now. |
| 07:31:59 | <@JAA> | I wonder whether it really is a daily limit or not. |
| 07:39:25 | <@JAA> | And now it's getting ECONNRESET. |
| 07:45:16 | <masterx244|m> | drats... |
| 07:45:51 | <masterx244|m> | sometimes sites are more trigger-happy if you poke too many 404s or 403s |
| 07:51:15 | | decky_e quits [Read error: Connection reset by peer] |
| 08:02:15 | | decky_e (decky_e) joins |
| 08:13:36 | | vantec (vantec) joins |
| 08:19:54 | <@JAA> | It happened after pretty much exactly 10k requests. |
| 08:27:50 | | woans quits [Ping timeout: 252 seconds] |
| 08:28:24 | <masterx244|m> | 603 chunks on showPicture if splitting it in 10k blocks; 15 chunks on archiveThumbs and 27 chunks on locoPicture; maybe someone with the crazy clusters can move the stuff around between his IPs so after each IP gets burned it gets continued at a fresh one |
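The chunk counts above are ceil(max ID / 10,000) for each page type. A minimal sketch that produces the actual ID ranges, so each range could be handed to a different IP; it assumes the ~10k-request limit observed in the chat and the highest IDs found earlier:

```python
# Observed in the chat: the .aspx block kicked in after roughly 10,000 requests per IP.
LIMIT_PER_IP = 10_000


def chunk_ranges(max_id: int, size: int = LIMIT_PER_IP) -> list[tuple[int, int]]:
    """Split IDs 1..max_id into inclusive (first, last) ranges of at most `size` IDs."""
    return [(start, min(start + size - 1, max_id)) for start in range(1, max_id + 1, size)]


# Highest IDs as found in the discussion above.
targets = {"showPicture": 6025791, "archiveThumbs": 147920, "locoPicture": 265700}
for page, max_id in targets.items():
    chunks = chunk_ranges(max_id)
    print(page, len(chunks), chunks[-1])
```

Paginated albums need extra requests beyond one per album ID, so the archiveThumbs count is a lower bound.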
| 08:57:08 | | Dango360_ quits [Read error: Connection reset by peer] |
| 09:17:48 | | nfriedly joins |
| 10:37:39 | | icedice2 quits [Client Quit] |
| 10:38:10 | | icedice (icedice) joins |
| 10:40:34 | | trumad|m joins |
| 10:43:37 | <trumad|m> | apologies for posting non-urgent stuff in the main channel. I'm still getting used to how things work |
| 11:04:35 | | PredatorIWD_ joins |
| 11:04:35 | | Ivan22 joins |
| 11:04:52 | | pikabluu joins |
| 11:04:52 | | superkuh_ joins |
| 11:04:53 | | vantec_ joins |
| 11:04:56 | | rohvani9 joins |
| 11:05:02 | | marto_3 joins |
| 11:05:03 | | fullpwn joins |
| 11:05:10 | | lflare is now authenticated as * |
| 11:05:10 | | lflare quits [Killed (ing.hackint.org (Nickname regained by services))] |
| 11:05:11 | | lflare (lflare) joins |
| 11:05:12 | | Letur6 joins |
| 11:05:20 | | wyatt8750 joins |
| 11:05:29 | | monoxane quits [Client Quit] |
| 11:05:29 | | marto_ quits [Client Quit] |
| 11:05:29 | | Letur quits [Quit: Ping timeout (120 seconds)] |
| 11:05:29 | | fullpwnmedia quits [Remote host closed the connection] |
| 11:05:29 | | rr quits [Quit: Ping timeout (120 seconds)] |
| 11:05:29 | | tbc1887 quits [Remote host closed the connection] |
| 11:05:29 | | s-crypt quits [Quit: Ping timeout (120 seconds)] |
| 11:05:29 | | wyatt8740 quits [Quit: ZNC got killed or something else has gone wrong, probably.] |
| 11:05:29 | | kiska quits [Quit: Ping timeout (120 seconds)] |
| 11:05:29 | | apache2 quits [Remote host closed the connection] |
| 11:05:29 | | pikablu quits [Remote host closed the connection] |
| 11:05:29 | | nic quits [Quit: Ping timeout (120 seconds)] |
| 11:05:29 | | Ivan226 quits [Remote host closed the connection] |
| 11:05:29 | | vantec quits [Remote host closed the connection] |
| 11:05:29 | | Ryz2 quits [Quit: Ping timeout (120 seconds)] |
| 11:05:29 | | superkuh quits [Remote host closed the connection] |
| 11:05:29 | | PredatorIWD quits [Remote host closed the connection] |
| 11:05:29 | | rohvani quits [Quit: Ping timeout (120 seconds)] |
| 11:05:29 | | Letur6 is now known as Letur |
| 11:05:29 | | marto_3 is now known as marto_ |
| 11:05:29 | | rohvani9 is now known as rohvani |
| 11:05:48 | | apache2 joins |
| 11:05:54 | | tbc1887 (tbc1887) joins |
| 11:06:13 | | kiska (kiska) joins |
| 11:13:46 | | Arcorann quits [Ping timeout: 252 seconds] |
| 11:33:34 | | dumbgoy joins |
| 11:53:21 | | pikabluu quits [Read error: Connection reset by peer] |
| 12:57:20 | | HP_Archivist (HP_Archivist) joins |
| 13:32:55 | | killsushi joins |
| 13:39:49 | <h2ibot> | Bzc6p edited EOldal (+353, /* Archiving */ Archives finished uploading): https://wiki.archiveteam.org/?diff=49817&oldid=49430 |
| 13:41:49 | <h2ibot> | Bzc6p edited EOldal (+72, Add link to archives to infobox): https://wiki.archiveteam.org/?diff=49818&oldid=49817 |
| 13:42:30 | | hitgrr8 joins |
| 13:45:48 | <Jonimus> | JAA: apparently people got ahold of Tim, and he has taken over the site and apparently gotten ahold of the domain. |
| 13:46:13 | <Jonimus> | So crisis averted, I guess? Though a good backup might still be worthwhile. |
| 13:55:52 | <h2ibot> | Bzc6p edited Kepfeltoltes.eu (+126, Added links to archives): https://wiki.archiveteam.org/?diff=49819&oldid=49447 |
| 14:01:39 | | aismallard quits [Remote host closed the connection] |
| 14:01:39 | | phuzion quits [Remote host closed the connection] |
| 14:02:43 | | phuzion (phuzion) joins |
| 14:02:49 | | aismallard joins |
| 14:27:30 | <@JAA> | Jonimus: Nice, but yeah, agreed. |
| 14:28:08 | <Jonimus> | At least now you can just use the domain instead of hitting the IP directly. |
| 14:29:42 | <@JAA> | Well, once the DNS propagates, at least. |
| 14:30:41 | <@JAA> | The 10k reqs/day limit is still going to be a pain though. |
| 14:35:44 | | Island joins |
| 15:24:11 | | nicolas17 quits [Ping timeout: 252 seconds] |
| 15:27:07 | | spirit quits [Client Quit] |
| 15:30:31 | | nostalgebraist joins |
| 15:30:47 | | decky_e quits [Ping timeout: 252 seconds] |
| 15:34:20 | | nicolas17 joins |
| 16:00:13 | <h2ibot> | JAABot edited CurrentWarriorProject (+6): https://wiki.archiveteam.org/?diff=49820&oldid=49760 |
| 16:04:52 | | decky_e (decky_e) joins |
| 16:24:21 | | datechnoman quits [Quit: Ping timeout (120 seconds)] |
| 16:34:24 | | datechnoman (datechnoman) joins |
| 17:21:47 | | chrismeller quits [Client Quit] |
| 17:21:52 | | chrismeller6 (chrismeller) joins |
| 17:22:09 | | chrismeller6 is now known as chrismeller |
| 17:28:39 | | rhodez joins |
| 17:37:15 | <pokechu22> | rhodez: logs at https://hackint.logs.kiska.pw/archiveteam-bs/20230522 |
| 17:37:17 | | icedice quits [Ping timeout: 252 seconds] |
| 17:37:22 | <rhodez> | Thank you |
| 17:41:11 | | Tom|m12 joins |
| 17:47:40 | | icedice (icedice) joins |
| 18:15:00 | <nicolas17> | JAA: I'm still blocked so I'm pretty sure rrpicturearchive's block really is daily |
| 18:15:28 | <@JAA> | :-| |
| 18:15:47 | <nicolas17> | "It happened after pretty much exactly 10k requests." that's good to know, I thought of burning my VPS IP doing requests to figure out what the limit was, now I won't have to :P |
| 18:16:02 | <@JAA> | Yeah, not going to happen with AB then obviously. |
| 18:22:23 | <Jake> | There's an impressive amount of spam on the Curseforge forums |
| 18:32:00 | | rageear joins |
| 18:42:27 | | rageear is now authenticated as rageear |
| 18:49:19 | <Jake> | Some content also appears to be behind a login wall? 🤔 https://minecraft.curseforge.com/forums/modding-java-edition/modpacks/modpack-discussion/magic-farm-3-harvest/questions/general/794-can-you-plant-magic-beans |
| 19:02:56 | | Ketchup901 quits [Ping timeout: 245 seconds] |
| 19:03:19 | | Ketchup901 (Ketchup901) joins |
| 19:30:11 | | nostalgebraist quits [Client Quit] |
| 19:43:42 | | spirit joins |
| 19:51:11 | <nicolas17> | (still blocked on rrpa) |
| 20:19:19 | | sonick quits [Client Quit] |
| 20:24:13 | | rhodez quits [Ping timeout: 265 seconds] |
| 20:29:49 | | rhodez joins |
| 20:41:21 | | superkuh_ quits [Client Quit] |
| 20:42:25 | | Lambro_D joins |
| 20:42:53 | | Unholy2361 quits [Remote host closed the connection] |
| 20:43:41 | | Unholy2361 (Unholy2361) joins |
| 20:48:48 | | katocala quits [Remote host closed the connection] |
| 21:00:32 | | ThetaDev quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
| 21:05:37 | | hitgrr8 quits [Client Quit] |
| 21:06:27 | | woans (WOANS) joins |
| 21:08:37 | <icedice> | Are shallow WARCs what you get from running !ao ? |
| 21:09:16 | <@JAA> | Yep |
| 21:13:06 | <nicolas17> | still blocked on rrpa, will it reset at midnight UTC or another timezone? we'll see |
| 21:14:30 | <nicolas17> | 10k/day means ~9 sec between requests per IP |
| 21:14:51 | <nicolas17> | harold-pain.png |
| 21:14:53 | <@JAA> | Yeah, we need 650 IP-days to grab it. |
| 21:15:06 | <nicolas17> | (lol IP-days) |
| 21:15:18 | <@JAA> | So if someone has a /24, it can be done in a couple days. |
| 21:15:31 | <@JAA> | Assuming they only have bans per IP, anyway. |
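The "IP-days" estimate and the /24 remark can be checked with the rough totals quoted earlier in the channel (about 148k albums and 6 million pictures). The sketch assumes one request per album and one per picture page, and it ignores album pagination. That yields about 615 IP-days, which is consistent with the ~650 figure once some headroom for paginated albums is added. A /24 has 256 addresses, so 650 IP-days spread across it comes to roughly two and a half days, provided bans really are per IP.

```python
# Rough counts quoted in the channel. Assumption: one request per album page and one
# per picture page; paginated albums are ignored, so the real total is higher.
ALBUMS = 148_000
PICTURES = 6_000_000
DAILY_LIMIT_PER_IP = 10_000


def ip_days(total_requests: int, per_ip_per_day: int = DAILY_LIMIT_PER_IP) -> float:
    """Number of single-IP days needed to issue `total_requests` under the daily cap."""
    return total_requests / per_ip_per_day


def days_on(total_ip_days: float, addresses: int) -> float:
    """Wall-clock days if the work is split evenly and bans are strictly per IP."""
    return total_ip_days / addresses


print(f"{ip_days(ALBUMS + PICTURES):.0f} IP-days")  # 615 IP-days
print(f"{days_on(650, 256):.1f} days on a /24")     # 2.5 days on a /24
```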
| 21:16:38 | <nicolas17> | I suspect (but I haven't tested it) that the limit is on aspx and doesn't apply to jpg downloads |
| 21:17:48 | <nicolas17> | and if you archive albums alone, it's enough to get the image URLs (albums have thumbnails, but you can easily infer the full-image URL from that), so in theory we could do that to get albums and jpg files, and get showPicture.aspx later |
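The log does not say how the full-image URL is inferred from a thumbnail. The sketch below assumes the simplest plausible mapping, based on the thumbnail `src` seen earlier (`/pictures\147919\thumbnails\IMG_9205.JPG`): drop the `thumbnails` path segment. It also turns the IIS-style backslashes into forward slashes, since wpull was noted not to handle them. Both the mapping and the function names are hypothetical and should be checked against a few real album pages before use.

```python
def normalize_path(src: str) -> str:
    """IIS accepts backslashes in paths; most crawlers do not, so map them to '/'."""
    return src.replace("\\", "/")


def full_image_path(thumb_src: str) -> str:
    """Hypothetical mapping: remove the 'thumbnails' segment to get the full-size image."""
    parts = normalize_path(thumb_src).split("/")
    return "/".join(p for p in parts if p.lower() != "thumbnails")


# Raw string so that '\1...' is not read as an escape sequence.
print(full_image_path(r"/pictures\147919\thumbnails\IMG_9205.JPG"))
# /pictures/147919/IMG_9205.JPG
```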
| 21:18:11 | <@JAA> | Can confirm, images don't get blocked. |
| 21:18:36 | <nicolas17> | however I don't know how many requests are needed to get all albums; the highest album ID was 147920 yesterday, but some are paginated, so it's more than that |
| 21:18:41 | <@JAA> | Or at least not at that draconian limit. |
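Since the block seems to apply to the `.aspx` pages and not to the image files, a crawler could keep two queues: pages that go through the slow per-IP throttle, and images that do not. The following is a minimal sketch of that split. The `.aspx` suffix rule comes from the observation above. The example URLs use a placeholder host and are illustrative only. Because images were only shown not to hit *that* limit, the image queue may still need some politeness delay.

```python
from urllib.parse import urlsplit


def needs_throttle(url: str) -> bool:
    """Observed in the channel: .aspx pages hit the ~10k/day block, image files did not."""
    return urlsplit(url).path.lower().endswith(".aspx")


def split_queues(urls):
    """Return (slow, fast): pages for the throttled queue, everything else for the fast one."""
    slow, fast = [], []
    for url in urls:
        (slow if needs_throttle(url) else fast).append(url)
    return slow, fast


slow, fast = split_queues([
    "http://example.com/showPicture.aspx?id=1",        # placeholder query string
    "http://example.com/pictures/147919/IMG_9205.JPG",
])
print(len(slow), len(fast))  # 1 1
```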
| 21:29:08 | <chrismeller> | is there a warrior project for this yet? |
| 21:29:31 | | fishingforsoup__ is now authenticated as fishingforsoup |
| 21:30:27 | <nicolas17> | chrismeller: no |
| 21:30:48 | <nicolas17> | we should archive it, but today it turned less urgent |
| 21:32:56 | <chrismeller> | less urgent? |
| 21:33:14 | <nicolas17> | <Jonimus> apparently people got ahold of the Tim and he has taken over the site and apparently gotten ahold of the domain. |
| 21:33:27 | <chrismeller> | oohhh, ok. well that's good. |
| 21:33:43 | <chrismeller> | i was looking at the number of posts per day and it was crazy :D |
| 21:33:51 | <chrismeller> | glad someone will be maintaining it |
| 21:34:36 | <nicolas17> | I wonder if he had to overpay for the domain... |
| 21:35:32 | <chrismeller> | the domain is only a piece of the puzzle, though |
| 21:35:49 | <nicolas17> | I think this Tim guy already had control over the server? |
| 21:36:15 | <chrismeller> | ah, ok. i thought the original maintainer had died |
| 21:36:58 | <nicolas17> | me too, until yesterday during the scraping I saw stuff that definitely looked like a human actively messing with the server |
| 21:37:14 | <nicolas17> | (scraping = exploration, I don't have anything of archival quality) |
| 21:38:10 | <chrismeller> | well i'm not a train guy, but i really found their corpus of train imagery amazing |
| 21:38:40 | <nicolas17> | these people obsessed with specific topics make the internet go round |
| 21:39:18 | <chrismeller> | "nerds on the internet" as they say :) |
| 21:39:45 | <nicolas17> | I say as I carefully curate the contents of https://theapplewiki.com/wiki/Firmware/iPhone/16.x |
| 21:39:46 | <nicolas17> | ~~when I should be studying~~ |
| 21:40:28 | <chrismeller> | what are you studying for? |
| 21:41:16 | <nicolas17> | I signed up for an online DevOps course and I'm not giving it as much time and attention as I should |
| 21:41:43 | <chrismeller> | yadda yadda yadda kubernetes |
| 21:43:18 | <nicolas17> | yadda yadda devops is not just a set of tech tools, please get 'dev' and 'ops' people to talk to each other |
| 21:43:41 | | decky_e quits [Ping timeout: 252 seconds] |
| 21:44:06 | | decky_e (decky_e) joins |
| 22:01:25 | | katocala joins |
| 22:02:07 | | katocala is now authenticated as katocala |
| 22:18:18 | <fireonlive> | instead of screaming matches |
| 22:33:17 | | Dango360 (Dango360) joins |
| 22:43:31 | | Iki1 joins |
| 22:44:11 | | decky_e quits [Ping timeout: 252 seconds] |
| 22:46:56 | | AnotherIki quits [Ping timeout: 252 seconds] |
| 22:47:24 | <fireonlive> | does AT have an official view of cloudflare/ddos-guard and the like :p |
| 22:47:54 | <nicolas17> | afaik cloudflare is well known for being a pain in the ass for archiving |
| 22:48:51 | <icedice> | If you have a sympathetic site admin they can whitelist ArchiveBot IPs in CloudFlare |
| 22:49:07 | <icedice> | One of the sites we're archiving is such a case |
| 22:58:46 | <andrew> | I've been running a grab-site against a Cloudflare-protected site for a few days by now, I guess a lot of it is down to how the owner configured it |
| 23:00:32 | <icedice> | Yeah, and probably the ASN and IP of your server/VPS is on |
| 23:01:27 | <nicolas17> | there's an "I'm under attack" mode in cloudflare that really tunes up the restrictions |
| 23:01:31 | <icedice> | If you're on a highly abused ASN like Frantech (BuyVM), OVH, or ColoCrossing, they're probably not going to be as nice |
| 23:02:01 | <icedice> | M247 also has a pretty bad rep, I think |
| 23:02:02 | <Doranwen> | nicolas17: "these people obsessed with specific topics" is like all of Yahoo Groups, lol - I've lost track of the weird and/or highly specific groups I've stumbled across so far while sorting the metadata |
| 23:02:27 | <icedice> | And Reddit |
| 23:02:33 | <icedice> | There's a subreddit for everything |
| 23:03:20 | <Doranwen> | I think Yahoo Groups was even more so. There are some *really* weird and esoteric groups out there. (Plus a few just plain brain-splodey "what on EARTH" types.) |
| 23:03:29 | <nicolas17> | someone said wikipedia is powered by neurodiversity... |
| 23:03:43 | <Doranwen> | (Or "were" might be the better verb tense, but since I'm looking at them all the time, it feels present.) |
| 23:10:39 | <@JAA> | Yes, it's all about the site owner's config on Buttflare. Plenty of sites use it and cause no issues, but the ones with more aggressive configs are annoying. |
| 23:16:16 | | lflare is now authenticated as * |
| 23:16:16 | | lflare quits [Killed (nuke.hackint.org (Nickname regained by services))] |
| 23:16:17 | | lflare (lflare) joins |
| 23:17:48 | <fireonlive> | ah that makes sense |
| 23:18:04 | <fireonlive> | a couple things i tried had them cranked to 11 i guess :/ |
| 23:19:46 | <@JAA> | There are three broad levels: anything goes, 'I'm under attack' mode (= JS challenge), and the hardcore mode with captchas. There's a very wide variety of options to fine-tune this though. |
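When a crawl runs into the stricter modes, it helps to notice the challenge responses so they are not stored as real content. The sketch below assumes Cloudflare marks challenge responses with a `cf-mitigated: challenge` header, as its documentation describes. It also uses a crude fallback: a 403 or 503 carrying `Server: cloudflare`. That fallback is an assumption and will also flag ordinary origin errors served through Cloudflare. The function cannot tell a JS challenge from a captcha. It only says "this is probably not the page you asked for."

```python
from typing import Mapping


def looks_like_cloudflare_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic check for a Cloudflare challenge page (JS challenge or captcha)."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if h.get("cf-mitigated", "").lower() == "challenge":
        return True
    # Crude fallback (assumption): challenge pages usually come back as 403 or 503
    # served by Cloudflare itself. This also matches some genuine origin errors.
    return status in (403, 503) and h.get("server", "").lower() == "cloudflare"
```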
| 23:36:35 | | BlueMaxima joins |
| 23:39:15 | <Jake> | (for anyone wondering, the Minecraft section of CurseForge is running. Will run the author one afterwards. Everything seems to be going fine this time) |
| 23:39:36 | <@JAA> | Nice! |
| 23:39:42 | | Hackerpcs quits [Quit: Hackerpcs] |
| 23:42:22 | | Hackerpcs (Hackerpcs) joins |
| 23:42:31 | | nicolas17 quits [Client Quit] |
| 23:43:37 | | Lambro_D quits [Read error: Connection reset by peer] |
| 23:44:19 | | woans quits [Ping timeout: 265 seconds] |