00:00:46dm4v quits [Read error: Connection reset by peer]
00:01:14dm4v joins
00:01:16dm4v quits [Changing host]
00:01:16dm4v (dm4v) joins
00:05:26Sluggs quits [Client Quit]
01:02:46dm4v quits [Read error: Connection reset by peer]
01:04:57xarph joins
01:05:28<xarph>re: cubetutor, has anyone reached out to ben to see if he's willing to give up a dump or cleaner export method?
01:05:32<xarph>he seems like a good guy
01:06:37dm4v joins
01:06:39dm4v quits [Changing host]
01:06:39dm4v (dm4v) joins
01:07:15<@JAA>I don't think so. At least nobody here mentioned anything.
01:07:29qwertyasdfuiopghjkl quits [Client Quit]
01:09:43<xarph>I'll drop an email
01:16:17<@JAA>Sounds good.
01:16:32<@JAA>We'll still want to grab it over HTTP for the Wayback Machine, but a direct complete dump would be great to have.
01:19:36HP_Archivist (HP_Archivist) joins
01:22:44<@JAA>xarph: You mentioned /cubeblog/N and /viewcube/N as being the most important. What about /cubedeck/N and /draft/N? I don't know enough about MTG to understand what these things are.
01:29:21<@JAA>There are also forums at /forum/$FORUMID and threads at /forumthreads/$FORUMID/$THREADID. Lots of spam it seems. The thread URL still works if you have the wrong forum ID.
01:30:34<@JAA>Actually, it's not the forum ID. User ID? I don't know what it is exactly, but the same ID is passed to all of the above.
01:56:34lennier1 quits [Quit: Going offline, see ya! (www.adiirc.com)]
01:59:47lennier1 (lennier1) joins
02:16:42<xarph>everything seems to revolve around cube id, which is 1-<however many integers>
02:17:09<xarph>so cubedeck and draft are for visitors to mess around
02:17:31<xarph>draft is an ephemeral thing and can be ignored because it's a backend application that just spits out randomness
02:18:03<xarph>cubedeck means "from this set of cards in the cube, people can make decks" but it's not very common and I think it's locked behind user account anyway
02:18:12<xarph>(sorry for the hour long delay I was at the store)
02:21:01<@JAA>Doesn't seem to be locked, and the IDs go to over 1.3 million (and are separate from the cube IDs).
02:21:44<xarph>oh huh
02:22:49<@JAA>And no worries, chat here is async anyway. :-)
02:28:08<xarph>I'm skimming the archivebot log, I think if you had to cut two sections, the /card data is not irreplaceable since the text names of the cards appear in the /cubelist entries, which are the crown jewels.
02:28:39<xarph>basically the core thing from cubetutor we're afraid of losing is the lists of cards in each cube, because they're reference for like tournaments and stuff
02:29:13<xarph>Anyway email sent and pinged on twitter, we'll see if ben comes back
02:30:35<xarph>in his replies he's talking to scryfall and the other big mtg cube site (cube cobra) so possibly the data ends up there
02:30:58<xarph>if he cuts a deal with scryfall we can breathe a sigh of relief because scryfall is good peoples
02:31:35HP_Archivist quits [Ping timeout: 265 seconds]
02:32:01<xarph>until they get bought by wotc for having the best mtg database, but even then that means we just panic-download the daily bulk exports https://scryfall.com/docs/api/bulk-data
02:32:08<xarph>(told you scryfall were good people :)
02:32:46<@JAA>Yeah, I noticed /card as well just now. It looks like the pages are /card/$CUBEID/$CARDID, so it's grabbing every card's page as many times as it's included in cubes. Bit of a mess.
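A minimal sketch of how the duplicate /card/$CUBEID/$CARDID fetches described above could be collapsed, keeping one URL per card ID. It assumes the card page does not depend on the cube ID, which the site author confirms later in this log; the function name `dedupe_card_urls` and the regular expression are illustrative, not part of any existing tool.

```python
import re
from typing import Iterable, List
from urllib.parse import urlsplit

# Matches paths of the form /card/<cube id>/<card id> (assumed layout, per the chat).
CARD_PATH = re.compile(r"^/card/(\d+)/(\d+)/?$")


def dedupe_card_urls(urls: Iterable[str]) -> List[str]:
    """Keep the first URL seen for each card ID; pass other URLs through unchanged."""
    seen_card_ids = set()
    kept = []
    for url in urls:
        match = CARD_PATH.match(urlsplit(url).path)
        if match is None:
            # Not a card page, so it is not subject to this deduplication.
            kept.append(url)
            continue
        card_id = int(match.group(2))
        if card_id not in seen_card_ids:
            seen_card_ids.add(card_id)
            kept.append(url)
    return kept
```

Input order is preserved, so whichever cube ID shows up first for a given card is the one that gets kept.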
02:33:10<xarph>cubetutor really feels like it was the six year saga of ben self-teaching webdev
02:33:19<xarph>which is, like, most magic the gathering sites
02:42:39superkuh_ joins
02:43:41superkuh quits [Ping timeout: 258 seconds]
02:51:13ThreeHM quits [Ping timeout: 252 seconds]
02:53:14ThreeHM (ThreeHeadedMonkey) joins
03:00:35HackMii quits [Remote host closed the connection]
03:01:06HackMii (hacktheplanet) joins
03:06:57qw3rty__ joins
03:10:28qw3rty_ quits [Ping timeout: 252 seconds]
06:19:36mutantmonkey quits [Remote host closed the connection]
06:19:53mutantmonkey (mutantmonkey) joins
07:43:40IDK (IDK) joins
08:00:07britmob256364 quits [Quit: britmob256364]
08:10:19knecht420 quits [Client Quit]
08:10:52knecht420 (knecht420) joins
08:11:09knecht420 quits [Remote host closed the connection]
08:11:40knecht420 (knecht420) joins
08:11:40knecht420 quits [Client Quit]
08:44:46qwertyasdfuiopghjkl joins
08:50:47knecht420 (knecht420) joins
10:22:48IDK quits [Client Quit]
10:25:58Sluggs joins
10:47:22IDK (IDK) joins
11:19:16mutantmonkey quits [Ping timeout: 258 seconds]
11:23:46mutantmonkey (mutantmonkey) joins
11:45:24Justin[home] joins
11:45:30Gaelan_ (Gaelan) joins
11:45:38VerifiedJ3 (VerifiedJ) joins
11:45:41Ruthalas5 (Ruthalas) joins
11:45:44s-crypt2 (s-crypt) joins
11:45:50flashfire429 (flashfire42) joins
11:46:00Matthww quits [Read error: Connection reset by peer]
11:46:00VerifiedJ quits [Write error: Connection reset by peer]
11:46:00Ruthalas quits [Read error: Connection reset by peer]
11:46:00flashfire42 quits [Read error: Connection reset by peer]
11:46:00Ryz2 quits [Read error: Connection reset by peer]
11:46:00s-crypt quits [Read error: Connection reset by peer]
11:46:01Gaelan quits [Quit: ZNC 1.8.2 - https://znc.in]
11:46:01jonty quits [Read error: Connection reset by peer]
11:46:01@hook54321 quits [Read error: Connection reset by peer]
11:46:01VerifiedJ3 is now known as VerifiedJ
11:46:01voltagex quits [Quit: ZNC 1.7.2+deb3 - https://znc.in]
11:46:01Ruthalas5 is now known as Ruthalas
11:46:01jonty_ (jonty) joins
11:46:01s-crypt2 is now known as s-crypt
11:46:01flashfire429 is now known as flashfire42
11:46:03Matthww joins
11:46:04Ryz2 (Ryz) joins
11:46:19voltagex_ joins
11:47:04hook54321 (hook54321) joins
11:47:04@ChanServ sets mode: +o hook54321
11:47:25katocala quits [Ping timeout: 265 seconds]
11:47:25DopefishJustin quits [Ping timeout: 265 seconds]
11:47:49katocala joins
12:06:22britmob256364 joins
12:46:59Wingy quits [Remote host closed the connection]
12:47:55Wingy (Wingy) joins
12:52:11balrog quits [Ping timeout: 265 seconds]
12:52:29balrog (balrog) joins
13:37:08ThreeHM quits [Ping timeout: 265 seconds]
13:38:47ThreeHM (ThreeHeadedMonkey) joins
13:42:06HP_Archivist (HP_Archivist) joins
14:36:13Jonboy345 joins
14:42:48IDK quits [Client Quit]
15:00:35superkuh_ is now known as superkuh
15:02:22Arcorann_ quits [Ping timeout: 258 seconds]
15:08:49dm4v quits [Client Quit]
15:09:28dm4v joins
15:09:30dm4v quits [Changing host]
15:09:30dm4v (dm4v) joins
15:15:32dm4v quits [Client Quit]
15:16:51dm4v joins
15:16:53dm4v quits [Changing host]
15:16:53dm4v (dm4v) joins
15:32:40pabs quits [Read error: Connection reset by peer]
15:33:34pabs (pabs) joins
16:02:33Mateon1 quits [Ping timeout: 258 seconds]
16:03:24Mateon1 joins
16:04:56spirit joins
16:26:31AlsoHP_Archivist joins
16:30:10HP_Archivist quits [Ping timeout: 252 seconds]
16:53:25KiyoshIWJ quits [Remote host closed the connection]
16:55:05<xarph>@JAA I have an email from ben at cubetutor.
16:55:12<xarph>here are the nuts and bolts
16:55:19<xarph>There are currently 182050 cubes hosted on Cube Tutor. The id for each cube can be found in the url, for example the list for cube 181676 is:
16:55:19<xarph>https://www.cubetutor.com/viewcube/181676
16:55:19<xarph>So to start with you could just point a scraper at https://www.cubetutor.com/viewcube/1 and simply increase the number at the end until you hit 182050. I would expect around 30-50% of the cubes on Cube Tutor are empty. I had a problem with spam bots registering on Cube Tutor for a while.
16:56:47<xarph>From there you can just use the cube id to visit all of the public pages for that cube, e.g.
16:56:48<xarph>https://www.cubetutor.com/tokens/181676
16:56:48<xarph>https://www.cubetutor.com/decks/181676
16:56:48<xarph>https://www.cubetutor.com/forum/181676
16:57:04<xarph>Additionally we have a page for each card, for example:
16:57:04<xarph>https://www.cubetutor.com/card/1/3238
16:57:04<xarph>With the highest id being 45772. I think it would be worth archiving these pages as they contain the tags used in the drafting AI.
16:57:12<xarph>There is also a whole mini site under: https://www.cubetutor.com/championcentral/
16:57:18<xarph>--- END EXCERPTS ---
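The enumeration described in the excerpts can be sketched as a URL generator. This is only an illustration under the numbers quoted above (cube IDs 1 to 182050, and the four per-cube page types named in the email); `cube_urls` and the constants are hypothetical names, and other per-cube pages mentioned earlier in the channel would need to be added separately.

```python
from typing import Iterator

BASE = "https://www.cubetutor.com"
MAX_CUBE_ID = 182050  # cube count quoted in the email excerpt
# Per-cube public pages named in the excerpt; other page types are not covered here.
CUBE_PAGES = ("viewcube", "tokens", "decks", "forum")


def cube_urls(first: int = 1, last: int = MAX_CUBE_ID) -> Iterator[str]:
    """Yield every per-cube page URL for cube IDs first..last inclusive."""
    for cube_id in range(first, last + 1):
        for page in CUBE_PAGES:
            yield f"{BASE}/{page}/{cube_id}"
```

For example, `list(cube_urls(181676, 181676))` yields the four URLs for the cube used as an example in the email. Empty or spam cubes are not filtered out at this stage; that would have to happen after fetching.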
16:57:35<xarph>I'm asking if it's needed to recurse down the /card/ID/ pages
16:57:43<xarph>If you have other questions I can forward them
16:59:06IDK (IDK) joins
17:06:59ArchivalEfforts joins
17:08:24ArchivalEfforts quits [Client Quit]
17:08:27ArchivalEfforts joins
17:34:38<xarph>"The card pages are the same for all cube IDs so all you need to scrape is card/1/Y where 1 <= Y <= 45772."
17:35:01<xarph>"It was built this way to maintain some of the state in the menu. Honestly I got myself into a bit of a mess with a fairly hardcoded insistence on having the cube ID in the URL for most pages. I won't go into the details, but suffice to say you don't need to scrape this page for every other cube id."
17:35:11<xarph>@jaa that should save some archiving effort.
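With the cube ID fixed at 1, the card pages reduce to a single range. A short sketch, assuming the highest card ID of 45772 quoted above and that IDs start at 1; `card_urls` is an illustrative name, and gaps in the ID range are not handled.

```python
from typing import List

BASE = "https://www.cubetutor.com"
MAX_CARD_ID = 45772  # highest card ID quoted in the email


def card_urls(max_card_id: int = MAX_CARD_ID) -> List[str]:
    """Build one URL per card ID, always using cube ID 1 in the path."""
    return [f"{BASE}/card/1/{card_id}" for card_id in range(1, max_card_id + 1)]
```

This gives 45772 URLs in total, instead of one fetch per card per cube that includes it.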
17:35:25<xarph>now I have to go do boring things for money, will be back this evening
17:36:14superkuh_ joins
17:37:16superkuh quits [Ping timeout: 252 seconds]
17:47:30AlsoHP_Archivist quits [Ping timeout: 265 seconds]
17:51:15HP_Archivist (HP_Archivist) joins
17:52:37sec^nd quits [Remote host closed the connection]
17:53:03sec^nd (second) joins
17:56:58HP_Archivist quits [Client Quit]
18:34:04Jonboy3451 joins
18:38:11Jonboy345 quits [Ping timeout: 258 seconds]
18:59:14VerifiedJ quits [Client Quit]
19:02:43<@JAA>xarph: Thanks, that's good info. One thing I'm unsure about is whether there's pagination anywhere other than the cubeblog pages. Couldn't find any examples, but would be good to hear for sure whether it can happen or not.
19:03:22VerifiedJ (VerifiedJ) joins
19:16:55JTL quits [Ping timeout: 265 seconds]
19:38:44spirit quits [Client Quit]
19:39:02JTL (JTL) joins
20:32:02Stiletto joins
21:13:43mutantmonkey quits [Remote host closed the connection]
21:14:06mutantmonkey (mutantmonkey) joins
21:40:57tzt quits [Ping timeout: 265 seconds]
21:41:20tzt joins
21:47:05KiyoshIWJ joins
22:11:28KiyoshIWJ quits [Remote host closed the connection]
22:28:59Nay quits [Quit: //System Offline//]
22:29:39Nay (JeDa) joins
22:33:46driib quits [Quit: The Lounge - https://thelounge.chat]
22:34:18driib (driib) joins
22:34:25Arcorann_ joins
22:35:43<@JAA>Only ~637k threads on Ruqqus, easily enumerable.
22:36:17<@JAA>Or 'submissions', as they're called in the API at least.
22:38:24IDK quits [Client Quit]
22:40:08Nay quits [Client Quit]
22:40:47Nay (JeDa) joins
23:00:27<@JAA>9 days until v2 onion addresses will cease working with the new Tor release and likely disappear or become inaccessible soon after that as the network upgrades.
23:04:21<h2ibot>JustAnotherArchivist edited Deathwatch (+147, /* 2021 */ Add Ruqqus): https://wiki.archiveteam.org/?diff=47235&oldid=47135
23:07:21<h2ibot>Moom0o edited Imgur (+151, Add sitemap): https://wiki.archiveteam.org/?diff=47236&oldid=47212
23:07:22<h2ibot>Themadprogramer edited Discourse (+121, /* Active Discourses */): https://wiki.archiveteam.org/?diff=47237&oldid=47165
23:08:09KiyoshIWJ joins
23:32:45HP_Archivist (HP_Archivist) joins
23:39:48wyatt8750 joins
23:43:19wyatt8740 quits [Ping timeout: 258 seconds]