| 00:00:02 | <nuroten> | thanks for considering |
| 00:00:42 | | dm4v quits [Read error: Connection reset by peer] |
| 00:01:40 | <nuroten> | also, does anyone know the status of a potential backup of the June 4th Museum exhibit website https://8964museum.com/ ? |
| 00:02:10 | | dm4v joins |
| 00:02:13 | | dm4v is now authenticated as dm4v |
| 00:02:13 | | dm4v quits [Changing host] |
| 00:02:13 | | dm4v (dm4v) joins |
| 00:04:29 | <nuroten> | some pro-Beijing media have opined that online records of the June 4th Museum and vigils should be made illegal ... and the current government has developed a tendency to take instructions from such media |
| 00:11:32 | <@JAA> | neon: It should all be available in the Wayback Machine. |
| 00:14:03 | <@JAA> | nuroten: I'll run https://collection.news/ through ArchiveBot, but it's JS-heavy and likely won't work properly. |
| 00:14:43 | <@JAA> | The same thing applies to https://8964museum.com/ I suppose. Not aware of any prior attempts, but since it's even more useless than collection.news without JS, I highly doubt it'll grab much. |
| 00:15:29 | <nuroten> | JAA: thanks. that's a blow, though I expected that might be the case |
| 00:16:09 | <neon> | JAA: oh so everything is just ingested into the waybackmachine. that makes sense, ty. |
| 00:16:14 | | neon leaves |
| 00:16:47 | <nuroten> | though also hoped someone might have ideas how to get as much as possible out of them :) |
| 00:16:53 | <nuroten> | the group behind the physical museum dissolved on the 23rd, also possibly charged under NSL for foreign collusion |
| 00:17:29 | <nuroten> | so at this time it's a toss-up how long it can stay up |
| 00:17:43 | <@JAA> | Yeah, it'd probably have to be something based on a full browser. brozzler or similar. Not sure if anyone here has a working setup for that. |
| 00:18:25 | <@JAA> | Perhaps SPN is also an option, but depends on how large the site is. |
| 00:20:48 | <nuroten> | okay :) thanks for anything you can save from them |
| 00:40:41 | | lukash7 joins |
| 00:41:05 | | nuroten quits [Remote host closed the connection] |
| 00:46:29 | | archzz_ quits [Quit: leaving] |
| 01:03:14 | | dm4v quits [Read error: Connection reset by peer] |
| 01:03:34 | | dm4v joins |
| 01:03:36 | | dm4v is now authenticated as dm4v |
| 01:03:36 | | dm4v quits [Changing host] |
| 01:03:36 | | dm4v (dm4v) joins |
| 01:09:35 | | Arcorann (Arcorann) joins |
| 02:15:11 | | heart_ quits [Quit: Connection closed for inactivity] |
| 02:16:59 | <@JAA> | Eurogamer apparently handles over a billion page views per year. Means they should easily be able to handle qwarc. I'll look into that soon. |
| 02:17:09 | <@JAA> | Ryz: Please add it to the wiki. Thanks. |
| 02:17:30 | <Ryz> | Onto Deathwatch? |
| 02:17:36 | <@JAA> | Yep |
| 02:17:45 | <Ryz> | ...Also, I don't think I'm automoderated <#>; |
| 02:20:56 | <Ryz> | That kinda demotivated me from doing impulse edits from time to time~ |
| 02:22:27 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 02:22:59 | <@JAA> | Well uh, if you never make edits, you don't get automodded either. lol |
| 02:23:57 | | qwertyasdfuiopghjkl joins |
| 02:24:56 | <Ryz> | Uhh, I made an account and made edits onto the wiki before oo; |
| 02:25:28 | <@JAA> | Your last content edit was over a year ago though. |
| 02:26:36 | <@JAA> | We don't preemptively make users automodded. We do it when good edits from the same users end up in the moderation queue over and over. |
| 02:39:49 | <h2ibot> | Ryz edited Deathwatch (+160, /* 2021 */ Added forum of Eurogamer.net entry): https://wiki.archiveteam.org/?diff=47087&oldid=47076 |
| 02:44:53 | | hexa- quits [Ping timeout: 612 seconds] |
| 02:44:57 | <Ryz> | Oh, I'm also reminded why I didn't edit as much, having to check in-between what's previewed and what's edited in the wiki text stuff ><; |
| 02:45:02 | <Ryz> | That's actually a bit rough... |
| 02:51:56 | | hexa- (hexa-) joins |
| 03:00:01 | | swebb joins |
| 03:02:52 | <Ryz> | I'm gonna have to edit this entirely in Notepad++ for flexibility reasons~ |
| 03:09:35 | | hexa- quits [Ping timeout: 612 seconds] |
| 03:11:46 | | hexa- (hexa-) joins |
| 03:30:10 | | Nay quits [Quit: this quit message is a social construct] |
| 03:35:15 | <@OrIdow6> | Yeah, having to wait for it to be approved really does take motivation away |
| 03:36:53 | | hexa- quits [Ping timeout: 612 seconds] |
| 03:37:32 | | Nay (JeDa) joins |
| 03:38:45 | | Nay quits [Client Quit] |
| 03:39:27 | | Nay (JeDa) joins |
| 03:43:28 | | Nay quits [Client Quit] |
| 03:44:22 | | Nay (JeDa) joins |
| 03:44:32 | | hexa- (hexa-) joins |
| 03:44:48 | | Nay quits [Client Quit] |
| 03:45:13 | | Nay (JeDa) joins |
| 03:47:43 | | qw3rty_ joins |
| 03:51:26 | | qw3rty__ quits [Ping timeout: 250 seconds] |
| 04:47:44 | | hexa- quits [Ping timeout: 612 seconds] |
| 04:47:59 | | DogsRNice quits [Read error: Connection reset by peer] |
| 05:02:25 | | Jonboy345 joins |
| 05:03:28 | | Jonboy3451 quits [Ping timeout: 244 seconds] |
| 05:07:21 | | hexa- (hexa-) joins |
| 06:13:32 | | hexa- quits [Ping timeout: 612 seconds] |
| 06:15:25 | | hexa- (hexa-) joins |
| 06:29:08 | | hexa- quits [Ping timeout: 612 seconds] |
| 06:30:22 | | gazorpazorp quits [Remote host closed the connection] |
| 06:30:34 | | gazorpazorp (gazorpazorp) joins |
| 06:31:42 | | hexa- (hexa-) joins |
| 06:38:12 | <Ryz> | flashfire42, if talking about https://community.eurogamer.net/ - JAA is gonna qwarc it~ >#<; |
| 06:39:08 | | nicolas17 quits [Ping timeout: 250 seconds] |
| 07:40:01 | | benjins quits [Ping timeout: 244 seconds] |
| 07:41:48 | | benjins joins |
| 08:07:58 | | benjins quits [Ping timeout: 250 seconds] |
| 08:15:44 | | benjins joins |
| 08:24:16 | | BlueMaxima quits [Client Quit] |
| 08:34:47 | | benjins quits [Ping timeout: 244 seconds] |
| 08:35:42 | | benjins joins |
| 08:50:53 | | spirit joins |
| 09:41:00 | <pabs> | an electronics startup that just died: https://blog.kobol.io/2021/08/25/we-are-pulling-the-plug/ |
| 09:43:10 | <pabs> | (small site, kobol.io and 3 subdomains) |
| 09:51:44 | <AK> | Looks like Ryz and Megame ran it through yesterday. Thanks for letting us know though pabs |
| 10:02:43 | | Terbium quits [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.] |
| 10:03:08 | | Terbium joins |
| 10:14:13 | <pabs> | ah good |
| 10:23:48 | | qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds] |
| 10:28:32 | | qwertyasdfuiopghjkl joins |
| 10:43:26 | <@arkiver> | OrIdow6: thanks! i'm back from vacation later today and will then check that |
| 10:56:21 | | qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds] |
| 11:19:40 | | benjins is now authenticated as benjins |
| 11:44:03 | | Megame quits [Client Quit] |
| 12:53:38 | | fuzzy8021 quits [Ping timeout: 244 seconds] |
| 12:59:21 | | fuzzy8021 (fuzzy8021) joins |
| 13:11:33 | | Daloader joins |
| 14:07:02 | | Iki joins |
| 14:37:29 | | Iki quits [Ping timeout: 244 seconds] |
| 14:42:44 | | Daloader quits [Ping timeout: 250 seconds] |
| 15:12:23 | | nicolas17 joins |
| 15:24:56 | | IDK quits [Client Quit] |
| 15:34:35 | | IDK (IDK) joins |
| 15:34:48 | <IDK> | Wait a minute... |
| 15:34:50 | <IDK> | http://archive.today/ |
| 15:34:53 | <IDK> | is down? |
| 15:36:15 | <[42]> | redirects me to https://archive.ph/ |
| 15:36:29 | <IDK> | All of the domains are down |
| 15:36:38 | <IDK> | DNS_PROBE_FINISHED_NXDOMAIN |
| 15:37:03 | <IDK> | Down for more than a week??? |
| 15:37:31 | <[42]> | archive.ph loads fine for me |
| 15:37:44 | <@OrIdow6> | IDK: archive.unreliable blocks Cloudflare DNS because the owner has a grudge against them |
| 15:37:55 | <IDK> | https://archive.li/ |
| 15:37:59 | <IDK> | This is up |
| 15:40:39 | <[42]> | supposedly the owner wants to do geo balancing via dns, which is not reliable through cloudflare dns as they don't pass information (or pass just wrong information) about the origin of the query, so they got blocked by archive.today |
| 15:41:19 | <[42]> | probably more to the story, here's some statement from them https://twitter.com/archiveis/status/1018691421182791680 |
| 15:51:38 | | Arcorann quits [Ping timeout: 250 seconds] |
| 16:31:10 | <@JAA> | Ryz: It should still be run through AB as well though. As always, I'll only grab the actual thread pages, no images, avatars, etc. with qwarc. |
| 16:59:28 | | sec^nd quits [Remote host closed the connection] |
| 16:59:49 | | sec^nd (second) joins |
| 17:16:45 | | dserve joins |
| 17:17:14 | <dserve> | hey, do we have any status on anyone working on demodrop? |
| 17:17:35 | <dserve> | would love to help, wrote a (small scale and simple) proof of concept that downloading works |
| 17:17:36 | <dserve> | https://paste.ee/p/EsfS1 |
| 17:17:51 | <@JAA> | I threw it into ArchiveBot a few days ago, but it got banned almost immediately and was still banned as of a few hours ago. |
| 17:18:23 | <dserve> | why? |
| 17:18:29 | <dserve> | :-/ |
| 17:23:40 | <@JAA> | Also, I'm getting various TLS errors from stream.demodrop.com. |
| 17:26:54 | | Daloader joins |
| 17:27:12 | | Daloader quits [Remote host closed the connection] |
| 17:31:48 | <Ryz> | JAA, in that case, need to find a good holding pipeline, might be ananiel <#>; |
| 17:33:09 | <@JAA> | Ryz: It's not that big really. There aren't any per-post links, for example. |
| 17:33:21 | <Ryz> | O#o; |
| 17:33:34 | <@JAA> | Well, there are on the profile pages, but those are not on the community subdomain. |
| 17:33:41 | <Ryz> | Hmm... |
| 17:33:52 | <@JAA> | And only for the most recent few posts from each user. |
| 17:49:04 | | balrog quits [Ping timeout: 250 seconds] |
| 17:57:55 | | nertzy_ joins |
| 18:12:54 | | nertzy_ quits [Client Quit] |
| 18:12:57 | | systwi_ (systwi) joins |
| 18:13:46 | | systwi quits [Ping timeout: 250 seconds] |
| 18:50:59 | <Ryz> | IDK brought up something interesting on whether to archive scam websites in ArchiveBot~ |
| 18:51:51 | <IDK> | Like they are 10/11 going to be deleted at some point |
| 18:52:13 | <IDK> | But its intention |
| 18:53:18 | <Ryz> | That's true; in that case, it's more on how to find them, which seems it might be spam websites or something unsavory oo; |
| 18:53:51 | <IDK> | Like they get sent through discord |
| 18:54:08 | <IDK> | And people who check back in 5 days thinking it's real |
| 18:54:21 | <IDK> | Go on wayback machine, check the archive |
| 18:54:54 | <IDK> | And found the contact and ended up DMing them about the winning stuff |
| 18:55:06 | <IDK> | So im not sure on that one |
| 18:55:35 | <Ryz> | Also would count archiving new websites that pop up, earliest grab possible, the beginnings o: |
| 18:56:07 | <IDK> | Im not sure |
| 18:58:10 | <IDK> | socialbot: snscrape twitter-user D4T4ENJYR |
| 19:09:09 | | bsmith093 joins |
| 19:09:09 | | bsmith093 is now authenticated as bsmith093 |
| 19:13:05 | | bsmith093 quits [Client Quit] |
| 19:16:40 | | Iki joins |
| 19:25:07 | | balrog (balrog) joins |
| 19:35:23 | <AK> | Another option is look at tweets to namecheap+other hosts |
| 19:35:31 | <AK> | They get a lot of "Delete this domain, it's spam" tweets |
| 19:46:27 | | Iki quits [Ping timeout: 244 seconds] |
| 20:06:31 | | Megame (Megame) joins |
| 20:14:46 | | C4K3 joins |
| 20:14:46 | | C4K3 is now authenticated as C4K3 |
| 20:58:14 | | Iki joins |
| 21:20:36 | <spirit> | https://www.4players.de/ will shut down soon, in case it is not known already |
| 21:20:41 | <spirit> | good old gaming site |
| 21:32:14 | | Lord_Nightmare quits [Ping timeout: 250 seconds] |
| 21:32:57 | | Lord_Nightmare (Lord_Nightmare) joins |
| 21:52:40 | | DogsRNice (Webuser299) joins |
| 21:54:32 | <dserve> | JAA The TLS errors on stream.demodrop.com seem to not matter as it's AWS/Amazon hosting |
| 21:54:50 | <dserve> | If we could decipher the ID:s they use to store the files it would make things much easier (bypass the download.php on demodrop.com) |
| 21:55:05 | <dserve> | But the download.php currently leads to those ID:s/folder paths using the ID |
| 21:55:41 | <@JAA> | dserve: I don't mean the expired certificate. I mean actual TLS alerts. |
| 21:57:58 | <@JAA> | E.g. on https://demodrop.com/download.php?track_id=608866 I get (after the redirect) this with `curl -skvL`: * TLSv1.2 (IN), TLS alert, close notify (256): |
| 21:59:11 | <dserve> | Oh yeah, I saw those while doing my test-crawl (mass-downloading) |
| 21:59:13 | <@JAA> | Also, we can't bypass download.php since it uses signed S3 URLs. You get a 403 without the signature. |
| 21:59:25 | <dserve> | the site is partially broken so I guess it can be explained by force majeure (broken site) lol |
| 21:59:34 | <dserve> | Oh gotcha |
| 22:00:46 | <@JAA> | Yeah, guess so, although it's weird since some tracks work fine and others break like that, but all of them are new. |
| 22:01:02 | <@JAA> | On another note, I get permission errors on most download.php requests. |
| 22:01:03 | | balrog quits [Client Quit] |
| 22:01:16 | <@JAA> | Specifically this: <script>top.alert('Sorry, it seems like you don't have permission to download this track, make sure you're logged in on the right account');</script> |
| 22:01:17 | <dserve> | You need to set a cookie in order to download |
| 22:01:39 | <dserve> | Basically, make an account, and copy the cookie's content and set it using curl either a file's content or in-line |
| 22:01:48 | <dserve> | Example: |
| 22:01:59 | <@JAA> | Well, some tracks work fine for me with cookies disabled in uMatrix. |
| 22:02:20 | <dserve> | curl -L -k "https://demodrop.com/download.php?track_id=607632" -o 607632.mp3 -b "__stripe_mid=123;__stripe_sid=123;access_code=123;user_name=JAA" |
| 22:03:50 | <@JAA> | I like that their help/FAQ link is dead, by the way. :-) |
| 22:03:57 | <dserve> | Those which work without cookies are public downloads, which is like 1/5 of the public songs |
| 22:04:11 | <@JAA> | I see. |
| 22:07:45 | <@JAA> | Looks like streaming works for way more tracks without an account. POST request to the API, then a simple GET to stream.demodrop.com for the MP3. |
| 22:08:01 | <dserve> | Yeah, the site is broken |
| 22:08:06 | <dserve> | It's shutting down <2 months |
| 22:08:11 | <dserve> | Hence why I'm here trying to solve this |
| 22:08:19 | <@JAA> | I saw on the API that it has separate privacy settings for listening and downloading. |
| 22:08:27 | <dserve> | How do you check that? |
| 22:08:37 | <@JAA> | 607632 is "listen": "public", "download": "shared", for example. |
| 22:08:43 | <dserve> | oh yeah |
| 22:08:47 | <@JAA> | https://demodrop.docs.apiary.io/ |
| 22:08:53 | <@JAA> | The /v1.0/tracks/id endpoint. |
| 22:09:10 | <@JAA> | But the API requires auth as well. |
| 22:11:03 | <@JAA> | Hmm, there's a sample key in the docs, and it works. I wonder what its rate limits are... :-) |
| 22:12:04 | <dserve> | It has no rate limits |
| 22:12:16 | <dserve> | I accidentally crashed the site when I tried downloading using 500 workers earlier |
| 22:12:20 | <dserve> | It went down for 15 min |
| 22:12:40 | <dserve> | It's a nightmare setup-wise so I have to limit my own servers to not overload it |
| 22:12:42 | <@JAA> | Uh, download.php or the API? |
| 22:12:49 | <dserve> | Download.php (the site demodrop.com) |
| 22:13:04 | <@JAA> | Yeah right, I'm talking about the API, to get song metadata. |
| 22:13:05 | <dserve> | it went 503 after it basically overloaded the web server I presume |
| 22:13:07 | <dserve> | ah |
| 22:13:28 | <@JAA> | Which would also reveal which tracks can be downloaded directly and which require an account. |
| 22:13:34 | <@JAA> | Plus which of the latter can be streamed. |
| 22:13:38 | <dserve> | The API (stream.demodrop.com) presumably doesn't have any limits |
| 22:13:42 | <dserve> | developer pays for amazon s3 |
| 22:13:52 | <dserve> | True. |
| 22:14:10 | <@JAA> | I'm talking of api.demodrop.com, and the sample key in the documentation really should have rate limits unless they're very incompetent. |
| 22:14:32 | <dserve> | I'd guess the site has been going down a lot (the frontend can't stream / download songs) because of the lack of rate limits |
| 22:14:52 | <dserve> | it has been completely down in periods the last months too, so I'd unfortunately guess it's a project that never went outside its Docker shell |
| 22:15:03 | <dserve> | I used it very actively, it's a great site and a very big loss if not archived |
| 22:15:04 | | qwertyasdfuiopghjkl joins |
| 22:15:35 | <dserve> | So, api.demodrop.com is probably not limited in any way |
| 22:15:46 | <dserve> | I'd guess it's down atm since it doesn't work to stream / download through frontend but download.php works |
| 22:17:08 | <dserve> | I started archiving from ID 603500 upwards as a test and I'm hmm |
| 22:17:09 | <@JAA> | Looks like the website's on AWS EC2, the API's behind Cloudfront, and stream.demodrop.com is colo'd at TransIP (‽). |
| 22:17:26 | <dserve> | I'm currently up in 13GB downloaded complete mp3s |
| 22:17:41 | <dserve> | Seems correct JAA |
| 22:18:26 | <@JAA> | Well, I'll throw some load at the API later and see how that goes. I can easily go for hundreds of requests per second, so my side isn't the issue. :-P |
| 22:18:33 | <dserve> | How much storage do you have? |
| 22:18:39 | <dserve> | I'd guess this is in the TB:s |
| 22:18:45 | <dserve> | Lovely btw |
| 22:18:53 | | balrog (balrog) joins |
| 22:19:12 | <dserve> | I own two datacenter dedicated servers which could house the data, they have 8TB combined |
| 22:19:21 | <dserve> | The issue is more the method to scrape the data |
| 22:19:59 | <@JAA> | My storage situation is complicated and messy. In any case, that won't be the limiting factor. And there's still plenty of time at over 2 months, so it can be retrieved slowly as well. |
| 22:20:59 | <@JAA> | There are up to about 600k tracks, so yeah, should be a few TB probably. That's fine. |
| 22:23:31 | | Iki quits [Ping timeout: 244 seconds] |
| 22:27:51 | <nicolas17> | so all these old Xcode versions I uploaded to archive.org, they're under the 'open_source_software' IA collection because it didn't give me any better option; is that a problem, and can I do anything to change it? |
| 22:28:59 | <dserve> | JAA If you want to collab on saving this down, I'm up |
| 23:15:03 | | nertzy_ joins |
| 23:18:13 | | AntiLiberal joins |
| 23:24:44 | | nertzy__ joins |
| 23:27:04 | | nertzy_ quits [Ping timeout: 250 seconds] |
| 23:48:13 | | Arcorann (Arcorann) joins |
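
The demodrop discussion above (separate "listen" and "download" privacy settings per track, with download.php needing an account cookie for non-public downloads) could be sketched roughly as follows. This is only an illustration: the field names `listen` and `download` and the values `public` and `shared` are the ones quoted in the log, but the shape of the metadata dict, the function names, and the fallback order are assumptions, not the actual API contract.

```python
from typing import Mapping


def retrieval_method(track: Mapping[str, str]) -> str:
    """Guess how a track might be fetched from its privacy fields.

    `track` is a hypothetical metadata dict; only the "listen" and
    "download" keys mentioned in the chat are inspected.
    """
    if track.get("download") == "public":
        # Public downloads reportedly work via download.php without cookies.
        return "download"
    if track.get("listen") == "public":
        # Streaming reportedly works for more tracks without an account.
        return "stream"
    # Anything else would presumably need a logged-in session cookie.
    return "account"


def download_url(track_id: int) -> str:
    """Build the download.php URL in the form used in the chat."""
    return f"https://demodrop.com/download.php?track_id={track_id}"


if __name__ == "__main__":
    # Track 607632 was quoted as listen=public, download=shared.
    example = {"listen": "public", "download": "shared"}
    print(retrieval_method(example), download_url(607632))
```

Sorting track IDs into these three buckets first would let a crawler avoid hitting download.php for tracks that would only return the permission error, which matters given how easily the site was reported to fall over.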