00:14:27<nulldata>It might be good to throw https://www.saintsrowmods.com into AB. It came to light today that the site's owner took the money raised a few years back for development of an official SR2 PC patch and ran. The site now seems to be partially broken, and in response to the allegations the owner says he wants to transfer ownership of the site.
00:14:28<nulldata>https://www.saintsrowmods.com/forum/threads/re-flippys-video.20745/
00:35:25jasons (jasons) joins
01:21:19BlueMaxima quits [Read error: Connection reset by peer]
01:25:50krum110487 joins
01:26:05<krum110487>Here is the list of godot games https://transfer.archivete.am/14sEl4/message.txt
01:26:06<eggdrop>inline (for browser viewing): https://transfer.archivete.am/inline/14sEl4/message.txt
01:26:57<krum110487>and just in case this wasn't made clear based on the other channel
01:27:13DLoader_ (DLoader) joins
01:27:22<krum110487>A website, gotm.io, is shutting down in 6 days (just announced). It seems fairly easy to scrape. It has online godot games.
01:27:22<krum110487>https://api.gotm.io/games?expand=icon,owner,pack&gameHandle=perfect-direction&icon/inlineFormat=medi...
01:28:29<@JAA>Ew, this site is script hell.
01:29:06<krum110487>yeah, it does some strange things with iframe srcdoc
01:29:15<@JAA>Is there a shutdown announcement somewhere?
01:29:22<krum110487>yeah on the site itself
01:29:40<krum110487>on the bottom right
01:29:44<krum110487>there is a toast announcement
01:29:50DLoader quits [Ping timeout: 240 seconds]
01:29:52DLoader_ is now known as DLoader
01:29:55<krum110487>"Warning: Gotm will shut down permanently on the 16th of February 2024."
01:30:11<@JAA>Huh, not showing here, weird.
01:30:39<krum110487>strange...
01:30:47<krum110487>It was posted in their discord too
01:30:50jasons quits [Ping timeout: 240 seconds]
01:31:22<@JAA>Posting it in the news section would be far too reasonable, I guess.
01:32:08<krum110487>Yeah I suppose it is strange it is not there, I can assure you it is closing.
01:32:50<krum110487>their API is cumbersome, but the method I posted above should get all of the data for every page, as blobs.
01:32:52<@JAA>Yeah, not doubting it, was just looking for something to link on Deathwatch. Oh well...
01:33:35<thuban>JAA: shall we just use that api endpoint to generate all the asset urls and `!ao <` them?
01:34:11DLoader_ (DLoader) joins
01:34:52<h2ibot>JustAnotherArchivist edited Deathwatch (+268, /* 2024 */ Add gotm.io): https://wiki.archiveteam.org/?diff=51689&oldid=51686
01:37:01DLoader quits [Ping timeout: 272 seconds]
01:37:06<thuban>plus the game pages (and possibly the js assets there; they seem to be common between games but i'm not sure whether ab would extract them)
01:37:11DLoader_ is now known as DLoader
01:38:24<krum110487>yeah, they are mostly the same; some pages have different js runtimes depending on the version of godot used
01:38:35<krum110487>but a lot of overlap.
01:39:56<@JAA>thuban: Yeah, probably.
01:40:20<@JAA>Not sure how much effort it would be to get this properly working in the WBM, but I have a feeling it'd be hard.
01:41:24<thuban>yeah, realistically we might be going for 'theoretically reconstitutable' here
01:41:43<@JAA>Yeah
01:41:46<thuban>krum110487: is there a way to download the games or are they online-only?
01:42:04<krum110487>no way I am aware of. I just found out about the site.
01:42:24<krum110487>there is a "downloadURL" or something like that in the meta, but it is under "icon" object
01:42:30<krum110487>so I am pretty sure that is not it
01:42:55<krum110487>the "main" blob is the entire game as generated from godot
01:43:15<krum110487>JSONobject.data[].pack.main
01:44:04<thuban>https://gotm.io/faq doesn't look like it
01:44:35<thuban>ok, i'll start generating those urls
01:44:56<krum110487>I think the main blobs are just wasm files (best guess)
01:50:04<krum110487>they might be pck files, I am not 100% sure how godot files work to be completely honest
01:50:55<thuban>ugh, hold on, i gotta figure out how this actually works
01:51:32<krum110487>ok I found this in the godot docs
01:51:33<krum110487>The other exported files are served as they are, next to the .html file, names unchanged. The .wasm file is a binary WebAssembly module implementing the engine. The .pck file is the Godot main pack containing your game. The .js file contains start-up code and is used by the .html file to access the engine. The .png file contains the boot splash image. It is not used in the default HTML page, but is included for custom HTML pages.
01:51:44<krum110487>the wasm is the engine itself, the game is pck file
01:51:55<krum110487>everything else is glue to make those work together.
01:52:39<krum110487>The .pck file is binary, usually delivered with the MIME-type application/octet-stream. The .wasm file is delivered as application/wasm.
01:52:51<krum110487>https://docs.godotengine.org/en/stable/tutorials/export/exporting_for_web.html
01:58:50DLoader quits [Ping timeout: 240 seconds]
01:58:57DLoader (DLoader) joins
02:00:27<krum110487>actually JSONobject.data[].pack.main might be the wasm, the JSONobject.data[].pack.path has a url path like this: "gamePacks/aPtTj94V2TpWl3T6lIBk"
02:00:38<krum110487>which I am 99% sure is the pck file now.
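The JSON structure krum110487 pieces together above (`data[].pack.main` for the engine blob, `data[].pack.path` for what is likely the .pck) could be turned into asset URLs with a small helper. This is a hedged sketch: the field names come from the messages above, but the asset base URL is an assumption that would need to be verified against real API responses.

```python
# Sketch: pull the two pack blob references out of one gotm.io API item.
# Field names (pack.main, pack.path) follow the structure described above;
# ASSET_BASE is an assumed host, not confirmed against the real site.
ASSET_BASE = "https://api.gotm.io/"

def pack_urls(item: dict) -> list[str]:
    """Return candidate blob URLs (engine .wasm and game .pck) for one game."""
    urls = []
    pack = item.get("pack") or {}
    for key in ("main", "path"):  # main: likely the wasm engine; path: the .pck
        ref = pack.get(key)
        if isinstance(ref, str) and ref:
            # Absolute URLs pass through; bare paths like
            # "gamePacks/aPtTj94V2TpWl3T6lIBk" get joined to the assumed host.
            urls.append(ref if ref.startswith("http") else ASSET_BASE + ref)
    return urls
```

The resulting list could then be fed to `!ao <` as thuban suggests.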
02:06:15<thuban>JAA: i'm not sure we can even get 'theoretically reconstitutable', inasmuch as the script hell makes it impractical to get from the game page to that api call (if you don't know that it already exists)
02:06:35<@JAA>:-|
02:07:37<thuban>i'm gonna generate all those api calls, then all the assets listed in the api responses, then (recursively) all of the js 'chunks' that import from one another
02:08:39<thuban>which probably still won't work in the wbm but which hypothetically might work in some other playback mechanism
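The 'recursively collect the js chunks that import from one another' step thuban describes could be sketched as a scan over each fetched chunk for static import specifiers, queueing unseen ones. This is only an illustrative sketch; the regex catches simple `import ... from "..."` and `import("...")` forms and would miss anything constructed dynamically.

```python
import re

# Hypothetical helper for the recursive chunk walk: given one JS chunk's
# source, return the module specifiers it statically imports. A crawler
# would resolve these against the chunk's URL and fetch any not yet seen.
IMPORT_RE = re.compile(r'''import\s*\(?\s*(?:[^'"]*?from\s*)?['"]([^'"]+)['"]''')

def chunk_refs(js_source: str) -> set[str]:
    """Return the set of module specifiers statically imported by js_source."""
    return set(IMPORT_RE.findall(js_source))
```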
02:17:53<@JAA>I grabbed all topic pages and attachments from the Corel forums with qwarc.
02:18:03<krum110487>I might end up just using selenium to load each page and just grab the files that way.
02:34:27jasons (jasons) joins
02:37:03tachymelia joins
02:37:30<tachymelia>heyo, anyone here know about how to archive xenforo forum threads with threadmarks properly?
02:38:30<tachymelia>the two main things tripping me up are a per-page token attached to the query, and a post request to retrieve the full threadmark listing
02:39:09<@JAA>Do you have an example?
02:41:50<tachymelia>https://forums.spacebattles.com/threads/defeat-the-godmodder.568723/ the threadmark list has a lil ellipsis w 320 hidden, clicking to expand does a post req to threadmarks-local-range. doesn't load if it doesn't go through
02:42:25<@JAA>Ah, fun
02:42:37<@JAA>Well, since it's POST, it will never work in the Wayback Machine.
02:43:05<tachymelia>fun stuff
02:43:23<@JAA>I suppose at least the marks all appear in the posts themselves as well, so they could be indexed from that.
02:43:43<@JAA>And each marked post has links to the previous and next marked post.
02:43:55<tachymelia>reader mode does work, which lets you view the marked posts as a forum thread, which is something too
02:44:06<@JAA>That, too.
02:45:24<@JAA>Archiving the POST request would still be possible with special tooling. I've done similar things with qwarc before (but can't recommend that since it's undocumented and has its share of pitfalls).
02:46:00<tachymelia>ooo yeah I'm distributing outside wayback as well so special tooling is 100% alright w me
02:46:28<tachymelia>anything you'd rec instead of qwarc then?
02:49:21<@JAA>Negative. I mean, qwarc wouldn't really make this easy either. It just provides a flexible framework that can be used to archive 'anything'. You need to collect the relevant information (here, the token, for example) and assemble it all into a request yourself.
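The 'collect the relevant information and assemble it into a request yourself' step JAA describes could be sketched roughly as below. The `_xfToken`/`_xfResponseType` field names match stock XenForo conventions, but the exact parameters threadmarks-local-range expects are assumptions that would need checking in the browser's network tab.

```python
import re
import urllib.parse
import urllib.request

def extract_xf_token(page_html: str) -> "str | None":
    """Pull the per-page CSRF token out of a rendered XenForo page."""
    m = re.search(r'data-csrf="([^"]+)"', page_html) or \
        re.search(r'name="_xfToken"\s+value="([^"]+)"', page_html)
    return m.group(1) if m else None

def build_threadmark_request(thread_url: str, token: str) -> urllib.request.Request:
    """Assemble the POST that expands the hidden threadmark range.

    The endpoint name comes from the discussion above; any extra range
    parameters the real call needs are omitted here and would have to be
    copied from an observed request.
    """
    data = urllib.parse.urlencode({
        "_xfToken": token,
        "_xfResponseType": "json",
    }).encode()
    return urllib.request.Request(
        thread_url.rstrip("/") + "/threadmarks-local-range", data=data)
```

A qwarc-style tool would record both this POST and its JSON response into the WARC.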
02:50:45<@JAA>Someone here started writing a tool for downloading forums, but I don't recall whether it's usable yet, and I doubt it has support for this specifically.
02:51:21<tachymelia>forum-dl right? doesn't do threadmarks as far as my source code analysis got me
02:52:06<@JAA>Name sounds right, yeah. And yeah, as expected.
02:54:01<nicolas17>1200 opensource.samsung.com items uploaded
02:55:01<nicolas17>out of ~2640
02:56:39<tachymelia>alright well, thanks JAA!
02:58:28<@JAA>Ah, I see that 'New threadmarks' are also linked on the header. That's nice for checking whether they exist on a forum.
03:09:20tachymelia quits [Ping timeout: 240 seconds]
03:15:16<h2ibot>JustAnotherArchivist edited Vbox7 (+0, In progress): https://wiki.archiveteam.org/?diff=51690&oldid=51667
03:19:17<h2ibot>FireonLive edited Current Projects (-18, vbox7 has launched!): https://wiki.archiveteam.org/?diff=51691&oldid=51663
03:31:39jasons quits [Ping timeout: 272 seconds]
03:36:21<nulldata>Now it seems the saintsrowmods.com site might be compromised? https://twitter.com/xFL1PPYx/status/1756473355908034708
03:36:22<eggdrop>nitter: https://farside.link/nitter/xFL1PPYx/status/1756473355908034708
03:37:18<fireonlive>o_O
04:23:11<monika>i'm not sure if this is well known yet but Discord's CDN has been enforcing URL signatures for a bit now
04:24:16<monika>the exact cutoff date is 2024-02-01T18:00:00Z, according to my discord chat dumps
04:24:34<monika>signature enforcement is only for new images uploaded AFTER that date
04:25:06<monika>caught me off guard since my personal canary only checks for old images/links
04:25:40<@JAA>Yes, they announced that. Enforced for new uploads since the 1st, and will get enforced for old on the 22nd.
04:25:51<@JAA>But also → #discard
04:26:09<monika>ah alright
04:26:21<monika>just want to post in this channel for more visibility lol
04:27:14<@JAA>Yeah, good opportunity for a general reminder: if you have any lists of old Discord CDN URLs, get them archived before the 22nd. They can be run through #// or AB.
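When triaging lists of old Discord CDN URLs for that deadline, one could separate links that already carry the new signature parameters (`ex`/`is`/`hm`: expiry, issued-at, and HMAC) from unsigned ones. A minimal sketch, assuming those three query keys are the marker of a signed link:

```python
from urllib.parse import parse_qs, urlparse

# Signed Discord CDN links carry expiry (ex), issued-at (is), and an HMAC (hm)
# as query parameters. Unsigned links to old uploads are the ones worth
# pushing through #// or AB before enforcement starts for old uploads.
SIG_PARAMS = {"ex", "is", "hm"}

def is_signed(cdn_url: str) -> bool:
    """True if the URL carries all three Discord signature parameters."""
    query = parse_qs(urlparse(cdn_url).query)
    return SIG_PARAMS <= query.keys()
```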
04:31:20lflare quits [Ping timeout: 240 seconds]
04:34:38jasons (jasons) joins
04:50:23krum110487 quits [Remote host closed the connection]
04:51:08PredatorIWD joins
04:57:39lflare (lflare) joins
05:21:38<h2ibot>Pokechu22 edited Google Drive (+102, /* Notes */ htmlview): https://wiki.archiveteam.org/?diff=51692&oldid=51469
05:35:47jasons quits [Ping timeout: 272 seconds]
05:38:13JohnnyJ joins
05:46:45hackbug quits [Remote host closed the connection]
05:47:05hackbug (hackbug) joins
05:56:14pedantic-darwin quits [Quit: The Lounge - https://thelounge.chat]
06:17:19parfait (kdqep) joins
06:38:43jasons (jasons) joins
06:49:32hackbug quits [Remote host closed the connection]
06:56:49datechnoman quits [Quit: The Lounge - https://thelounge.chat]
06:59:07datechnoman (datechnoman) joins
07:36:50jasons quits [Ping timeout: 240 seconds]
07:39:10BearFortress joins
07:39:56aninternettroll quits [Remote host closed the connection]
07:41:43aninternettroll (aninternettroll) joins
08:26:45pedantic-darwin joins
08:40:06jasons (jasons) joins
08:43:57Ruthalas59 (Ruthalas) joins
09:38:20jasons quits [Ping timeout: 240 seconds]
10:00:03Bleo18260 quits [Client Quit]
10:01:25Bleo18260 joins
10:18:12aninternettroll_ (aninternettroll) joins
10:18:40Island quits [Read error: Connection reset by peer]
10:18:50aninternettroll quits [Ping timeout: 240 seconds]
10:18:50aninternettroll_ is now known as aninternettroll
10:23:42BearFortress quits [Client Quit]
10:42:03jasons (jasons) joins
10:44:23aninternettroll quits [Remote host closed the connection]
10:46:37aninternettroll (aninternettroll) joins
11:06:24ScenarioPlanet (ScenarioPlanet) joins
11:06:43ScenarioPlanet quits [Remote host closed the connection]
11:09:08<eightthree>hi, aside from joining each respective #room on hackint and/or reading the room history/log, how can I most quickly find out why some projects like youtube, reddit, and github, which most likely still have tons of stuff left to archive, don't have enough urls in the pipeline for me (or others) to contribute when selecting them in e.g. the warrior vm? like why does the tracker for those projects show 0 or close to it remaining to be fetched a lot of the time?
11:09:38<eightthree>is there a simple table with the explanation for each somewhere?
11:10:41<h2ibot>Bzc6p edited Noob.hu (+15, status clarification that the images themselves…): https://wiki.archiveteam.org/?diff=51693&oldid=51673
11:35:07BearFortress joins
11:43:07jasons quits [Ping timeout: 272 seconds]
11:48:41Elijah joins
12:10:29aninternettroll quits [Remote host closed the connection]
12:12:18aninternettroll (aninternettroll) joins
12:32:48tyler joins
12:33:04tyler quits [Remote host closed the connection]
12:34:29<pabs>eightthree: probably it depends a lot on the project
12:34:33<pabs>like I know from #down-the-tube that there are large lists that are fed in slowly, due to the data quantity
12:35:00<pabs>also #gitgud is very much reactive and not proactive like Software Heritage is
12:35:24<pabs>I expect a lot of the reasons aren't written down anywhere either
12:37:45<kiska>Reddit is due to them blocking us, so there is that
12:38:14<kiska>Youtube is too large for us to archive all the content so we grab a selected subsection of it
12:38:34<kiska>github: we completed our initial crawl months ago, so like pabs said it's more reactive than proactive
12:42:26<pabs>what was the initial github crawl about?
12:46:06jasons (jasons) joins
12:54:03Arcorann quits [Ping timeout: 272 seconds]
13:14:43Elijah quits [Client Quit]
13:16:59qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
13:43:20jasons quits [Ping timeout: 240 seconds]
14:00:47Darken quits [Read error: Connection reset by peer]
14:13:15driib quits [Client Quit]
14:17:12<h2ibot>Barto created Votes in Switzerland/2024-03-03 (+3578, Page creation): https://wiki.archiveteam.org/?title=Votes%20in%20Switzerland/2024-03-03
14:17:22<Barto>JAA: ^
14:18:17<Barto>probably there's something smarter to do about the order of stuffs
14:19:16<Barto>only the twitter link isnt saved, but it is in the pad :-)
14:35:23driib (driib) joins
14:47:10jasons (jasons) joins
15:40:35hackbug (hackbug) joins
15:45:41jasons quits [Ping timeout: 272 seconds]
16:05:40<@JAA>Barto: Thanks! :-)
16:08:19<Barto>was a bit sick yesterday. That happens a bit too often recently
16:08:31<@JAA>:-(
16:10:32<h2ibot>Barto edited Votes in Switzerland/2024-03-03 (+0, Moving twitter link to the right category): https://wiki.archiveteam.org/?diff=51695&oldid=51694
16:10:48<Barto>^ small fix, twitter link was in the incorrect section
16:20:34<h2ibot>Rexma edited List of websites excluded from the Wayback Machine (+47, adding waxtube.com, a porn site that now is a…): https://wiki.archiveteam.org/?diff=51696&oldid=51604
16:20:35<h2ibot>JustAnotherArchivist changed the user rights of User:Rexma
16:28:11DogsRNice joins
16:45:49lflare quits [Client Quit]
16:46:41lflare (lflare) joins
16:48:38jasons (jasons) joins
17:00:41<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=51697&oldid=51696
17:02:47jacksonchen666 (jacksonchen666) joins
17:30:53Deewiant quits [Remote host closed the connection]
17:31:09jacksonchen666 quits [Ping timeout: 255 seconds]
17:31:58Deewiant (Deewiant) joins
17:48:50jasons quits [Ping timeout: 240 seconds]
18:13:14bf_ quits [Remote host closed the connection]
18:40:04jacksonchen666 (jacksonchen666) joins
18:52:46jasons (jasons) joins
18:54:20jacksonchen666 quits [Client Quit]
19:13:10jacksonchen666 (jacksonchen666) joins
19:17:15jacksonchen666 quits [Remote host closed the connection]
19:18:14ThreeHM_ (ThreeHeadedMonkey) joins
19:18:47jacksonchen666 (jacksonchen666) joins
19:18:48jacksonchen666 quits [Client Quit]
19:19:38ThreeHM quits [Killed (NickServ (GHOST command used by ThreeHM_))]
19:19:42ThreeHM_ is now known as ThreeHM
19:42:15JohnnyJ quits [Client Quit]
19:47:09Pedrosso joins
19:47:14ScenarioPlanet (ScenarioPlanet) joins
19:47:32TheTechRobo (TheTechRobo) joins
19:47:35ScenarioPlanet quits [Client Quit]
19:47:35Pedrosso quits [Client Quit]
19:47:35TheTechRobo quits [Client Quit]
19:48:00Pedrosso joins
19:48:13fangfufu quits [Quit: ZNC 1.8.2+deb3.1 - https://znc.in]
19:49:51TheTechRobo (TheTechRobo) joins
19:52:03jasons quits [Ping timeout: 272 seconds]
19:59:07fangfufu joins
20:06:44jacksonchen666 (jacksonchen666) joins
20:09:03jacksonchen666 quits [Remote host closed the connection]
20:09:31jacksonchen666 (jacksonchen666) joins
20:15:21lea quits [Quit: quit.]
20:17:30lea (lea_) joins
20:45:37SootBector quits [Remote host closed the connection]
20:45:38Larsenv quits [Client Quit]
20:45:49Larsenv (Larsenv) joins
20:45:59SootBector (SootBector) joins
20:49:28SootBector quits [Remote host closed the connection]
20:49:48SootBector (SootBector) joins
20:55:35jasons (jasons) joins
21:09:00BlueMaxima joins
21:17:29tech234a quits [Quit: Connection closed for inactivity]
21:17:44jacksonchen666 quits [Remote host closed the connection]
21:26:17BlueMaxima quits [Read error: Connection reset by peer]
21:26:31tzt quits [Remote host closed the connection]
21:26:53tzt (tzt) joins
21:45:51Darken (Darken) joins
21:55:20jasons quits [Ping timeout: 240 seconds]
21:59:17BlueMaxima joins
22:02:28Island joins
22:09:08JustMeCorne joins
22:09:21Darken quits [Read error: Connection reset by peer]
22:12:08<JustMeCorne>Good day all! Is this the correct IRC channel about YouTube archiving? I'm in search of some removed videos from a certain theme park and I found information about this channel. I thought about giving it a go.
22:16:40<nicolas17>the youtube channel is #down-the-tube, send your video link there and we can check if it's archived or not
22:19:26<JustMeCorne>appreciated! I'll try over there. Thank you very much! :-)
22:53:40<eightthree>are rooms not logged for legal and or privacy reasons (maybe the project has legal ambiguity or risk in some jurisdictions)
22:53:56lika joins
22:54:22<eightthree>it's annoying I can't see all prior room history in all these rooms and have to join them one by one and wait to get a useful amount of history to grep for answers...
22:54:27lika quits [Remote host closed the connection]
22:58:52jasons (jasons) joins
22:58:53<nicolas17>eightthree: some are https://hackint.logs.kiska.pw/archiveteam-bs
22:58:56<@JAA>eightthree: Project channels often have discussions that we'd rather not see accessible to anyone after the fact. That's why they're almost always not logged (publicly).
22:59:11<@JAA>s/anyone/everyone/ perhaps, you get the idea.
23:00:47<kiska>But also that is how irc works
23:01:12VerifiedJ (VerifiedJ) joins
23:09:38<eightthree>JAA: I guess it's not always obvious when a discussion will take a direction that will cause trouble later on, but otherwise people could easily just ask to chat in the non-logged room
23:09:47<eightthree>or in some private dm room
23:17:39<@JAA>eightthree: So we'd have two channels for each project? That isn't very practical.
23:20:24<eightthree>JAA: neither is being shown or having to answer the same questions repeatedly because people can't search the uncontroversial portions of room history (which they cannot see for having joined too late)?
23:21:21<@JAA>That isn't happening frequently enough to be a problem, really.
23:21:32<eightthree>fair enough
23:22:23<eightthree>have there been issues with things being public or made known more widely before? even to law enforcement?
23:22:29<@JAA>When something does get asked frequently, we just put it in the channel topic (which people proceed to not read then).
23:22:57<Vokun>Big thumbs up to that last line :(
23:23:04<eightthree>lol
23:23:32<nicolas17>eightthree: I think the issue is when we discuss how to bypass rate limits or other stuff that the site may be actively doing to block us
23:26:04<fireonlive>once upon a time, i worked somewhere where a question was frequently asked that day. we gradually put up literally 10 big signs at various points in full unobstructed view of the entering person but they'd still reach the humans and ask the question the signs answered
23:31:23<eightthree>I'm really hoping for privacy-friendly local-LLM bots to answer questions in chatrooms soonTM, almost as a captcha to rank how likely and in what order a dev/project-expert should read questions
23:31:43<eightthree>fully trainable and tweakable by the project-experts of course
23:32:35<@JAA>lol no thanks
23:33:00<@JAA>There's enough subtly wrong information about AT out there already. We don't need an LLM to generate more of it on demand.
23:37:09<eightthree>is there any way to measure the % speed of archiving vs the speed of new messages created? i.e. how close are we to back up all new content on i.e telegram?
23:37:44<nicolas17>we're not archiving all of Telegram
23:38:06<nicolas17>but we *are* adding more stuff to the queue
23:38:43<nicolas17>from links found on websites, or periodically checking for new posts in watched channels, I think messages having images enqueue the image as a separate item too
23:39:31<fireonlive>telegram has kinda two problems.. one is the message rate on huge channels versus how fast we can get those messages
23:39:31<nicolas17>taking *that* into account... at current speeds the queue will empty in 42 months, but speed is super variable
23:39:35<fireonlive>and also discoverability
23:40:01<fireonlive>there's no just /show_all_public_channels_and_groups endpoint
23:40:18<nicolas17>sometimes a project's queue starts growing faster than it shrinks and then the ETA is infinity
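The ETA arithmetic nicolas17 describes is simple: with a queue of `todo` items, a drain rate, and a growth rate, the queue empties only while drain exceeds growth; otherwise the ETA is infinite. A sketch (the numbers in the test are illustrative, not real tracker figures):

```python
def queue_eta_hours(todo: int, drain_rate: float, growth_rate: float) -> float:
    """Hours until the queue empties, or infinity if it grows net-positive.

    drain_rate and growth_rate are in items per hour; in practice both are
    super variable, so any single ETA is only a snapshot.
    """
    net = drain_rate - growth_rate
    return todo / net if net > 0 else float("inf")
```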
23:44:40<eightthree>nicolas17: hmm... I wonder, if this computable number was charted over time, how much it would be increasing... and how best to recruit more people, as this chart would certainly make people 1) feel needed 2) gamified to not just contribute but realize they need to recruit others
23:45:01<nicolas17>https://grafana3.kiska.pw/d/000000/archiveteam-tracker-stats?orgId=1&var-project=telegram&from=now-3h&to=now
23:47:59<nicolas17>telegram queue over the last 30 days https://transfer.archivete.am/inline/5vyqw/screenshot.png
23:48:02<eightthree>fireonlive: this is mostly real content, not just spam? who the hell follows a room with that fast a stream of posts?
23:49:42<eightthree>I've seen huge twitch or youtube streams where the comments are so numerous they are essentially unreadable; is there comparable speed on tg?
23:50:03<fireonlive>there's some... speedy rooms let's say
23:50:15<fireonlive>but yeah, not always very scholarly
23:53:01<eightthree>nicolas17: better than I thought. I guess this indeed allows gamification psychology to work better, as the queue grows in a way that doesn't discourage people by seeming insurmountable. Great thinking
23:53:38<fireonlive>https://tracker.archiveteam.org/telegram/ < it's why we have leaderboards I believe :3
23:54:20jasons quits [Ping timeout: 240 seconds]