| 00:00:46 | | dm4v quits [Read error: Connection reset by peer] |
| 00:01:14 | | dm4v joins |
| 00:01:16 | | dm4v is now authenticated as dm4v |
| 00:01:16 | | dm4v quits [Changing host] |
| 00:01:16 | | dm4v (dm4v) joins |
| 00:05:26 | | Sluggs quits [Client Quit] |
| 01:02:46 | | dm4v quits [Read error: Connection reset by peer] |
| 01:04:57 | | xarph joins |
| 01:05:28 | <xarph> | re: cubetutor, has anyone reached out to ben to see if he's willing to give up a dump or cleaner export method? |
| 01:05:32 | <xarph> | he seems like a good guy |
| 01:06:37 | | dm4v joins |
| 01:06:39 | | dm4v is now authenticated as dm4v |
| 01:06:39 | | dm4v quits [Changing host] |
| 01:06:39 | | dm4v (dm4v) joins |
| 01:07:15 | <@JAA> | I don't think so. At least nobody here mentioned anything. |
| 01:07:29 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 01:09:43 | <xarph> | I'll drop an email |
| 01:16:17 | <@JAA> | Sounds good. |
| 01:16:32 | <@JAA> | We'll still want to grab it over HTTP for the Wayback Machine, but a direct complete dump would be great to have. |
| 01:19:36 | | HP_Archivist (HP_Archivist) joins |
| 01:22:44 | <@JAA> | xarph: You mentioned /cubeblog/N and /viewcube/N as being the most important. What about /cubedeck/N and /draft/N? I don't know enough about MTG to understand what these things are. |
| 01:29:21 | <@JAA> | There are also forums at /forum/$FORUMID and threads at /forumthreads/$FORUMID/$THREADID. Lots of spam it seems. The thread URL still works if you have the wrong forum ID. |
| 01:30:34 | <@JAA> | Actually, it's not the forum ID. User ID? I don't know what it is exactly, but the same ID is passed to all of the above. |
| 01:56:34 | | lennier1 quits [Quit: Going offline, see ya! (www.adiirc.com)] |
| 01:59:47 | | lennier1 (lennier1) joins |
| 02:16:42 | <xarph> | everything seems to revolve around cube id, which is 1-<however many integers> |
| 02:17:09 | <xarph> | so cubedeck and draft are for visitors to mess around |
| 02:17:31 | <xarph> | draft is an ephemeral thing and can be ignored because it's a backend application that just spits out randomness |
| 02:18:03 | <xarph> | cubedeck means "from this set of cards in the cube, people can make decks" but it's not very common and I think it's locked behind user account anyway |
| 02:18:12 | <xarph> | (sorry for the hour long delay I was at the store) |
| 02:21:01 | <@JAA> | Doesn't seem to be locked, and the IDs go to over 1.3 million (and are separate from the cube IDs). |
| 02:21:44 | <xarph> | oh huh |
| 02:22:49 | <@JAA> | And no worries, chat here is async anyway. :-) |
| 02:28:08 | <xarph> | I'm skimming the archivebot log, I think if you had to cut two sections, the /card data is not irreplaceable since the text names of the cards appear in the /cubelist entries, which are the crown jewels. |
| 02:28:39 | <xarph> | basically the core thing from cubetutor we're afraid of losing is the lists of cards in each cube, because they're reference for like tournaments and stuff |
| 02:29:13 | <xarph> | Anyway email sent and pinged on twitter, we'll see if ben comes back |
| 02:30:35 | <xarph> | in his replies he's talking to scryfall and the other big mtg cube site (cube cobra) so possibly the data ends up there |
| 02:30:58 | <xarph> | if he cuts a deal with scryfall we can breathe a sigh of relief because scryfall is good peoples |
| 02:31:35 | | HP_Archivist quits [Ping timeout: 265 seconds] |
| 02:32:01 | <xarph> | until they get bought by wotc for having the best mtg database, but even then that means we just panic-download the daily bulk exports https://scryfall.com/docs/api/bulk-data |
| 02:32:08 | <xarph> | (told you scryfall were good people :) |
| 02:32:46 | <@JAA> | Yeah, I noticed /card as well just now. It looks like the pages are /card/$CUBEID/$CARDID, so it's grabbing every card's page as many times as it's included in cubes. Bit of a mess. |
| 02:33:10 | <xarph> | cubetutor really feels like it was the six year saga of ben self-teaching webdev |
| 02:33:19 | <xarph> | which is, like, most magic the gathering sites |
| 02:42:39 | | superkuh_ joins |
| 02:43:41 | | superkuh quits [Ping timeout: 258 seconds] |
| 02:51:13 | | ThreeHM quits [Ping timeout: 252 seconds] |
| 02:53:14 | | ThreeHM (ThreeHeadedMonkey) joins |
| 03:00:35 | | HackMii quits [Remote host closed the connection] |
| 03:01:06 | | HackMii (hacktheplanet) joins |
| 03:06:57 | | qw3rty__ joins |
| 03:10:28 | | qw3rty_ quits [Ping timeout: 252 seconds] |
| 06:19:36 | | mutantmonkey quits [Remote host closed the connection] |
| 06:19:53 | | mutantmonkey (mutantmonkey) joins |
| 07:43:40 | | IDK (IDK) joins |
| 08:00:07 | | britmob256364 quits [Quit: britmob256364] |
| 08:10:19 | | knecht420 quits [Client Quit] |
| 08:10:52 | | knecht420 (knecht420) joins |
| 08:11:09 | | knecht420 quits [Remote host closed the connection] |
| 08:11:40 | | knecht420 (knecht420) joins |
| 08:11:40 | | knecht420 quits [Client Quit] |
| 08:44:46 | | qwertyasdfuiopghjkl joins |
| 08:50:47 | | knecht420 (knecht420) joins |
| 10:22:48 | | IDK quits [Client Quit] |
| 10:25:58 | | Sluggs joins |
| 10:47:22 | | IDK (IDK) joins |
| 11:19:16 | | mutantmonkey quits [Ping timeout: 258 seconds] |
| 11:23:46 | | mutantmonkey (mutantmonkey) joins |
| 11:45:24 | | Justin[home] joins |
| 11:45:24 | | Justin[home] is now authenticated as DopefishJustin |
| 11:45:30 | | Gaelan_ (Gaelan) joins |
| 11:45:38 | | VerifiedJ3 (VerifiedJ) joins |
| 11:45:41 | | Ruthalas5 (Ruthalas) joins |
| 11:45:44 | | s-crypt2 (s-crypt) joins |
| 11:45:50 | | flashfire429 (flashfire42) joins |
| 11:46:00 | | Matthww quits [Read error: Connection reset by peer] |
| 11:46:00 | | VerifiedJ quits [Write error: Connection reset by peer] |
| 11:46:00 | | Ruthalas quits [Read error: Connection reset by peer] |
| 11:46:00 | | flashfire42 quits [Read error: Connection reset by peer] |
| 11:46:00 | | Ryz2 quits [Read error: Connection reset by peer] |
| 11:46:00 | | s-crypt quits [Read error: Connection reset by peer] |
| 11:46:01 | | Gaelan quits [Quit: ZNC 1.8.2 - https://znc.in] |
| 11:46:01 | | jonty quits [Read error: Connection reset by peer] |
| 11:46:01 | | @hook54321 quits [Read error: Connection reset by peer] |
| 11:46:01 | | VerifiedJ3 is now known as VerifiedJ |
| 11:46:01 | | voltagex quits [Quit: ZNC 1.7.2+deb3 - https://znc.in] |
| 11:46:01 | | Ruthalas5 is now known as Ruthalas |
| 11:46:01 | | jonty_ (jonty) joins |
| 11:46:01 | | s-crypt2 is now known as s-crypt |
| 11:46:01 | | flashfire429 is now known as flashfire42 |
| 11:46:03 | | Matthww joins |
| 11:46:04 | | Ryz2 (Ryz) joins |
| 11:46:19 | | voltagex_ joins |
| 11:47:04 | | hook54321 (hook54321) joins |
| 11:47:04 | | @ChanServ sets mode: +o hook54321 |
| 11:47:25 | | katocala quits [Ping timeout: 265 seconds] |
| 11:47:25 | | DopefishJustin quits [Ping timeout: 265 seconds] |
| 11:47:49 | | katocala joins |
| 12:04:45 | | katocala is now authenticated as katocala |
| 12:06:22 | | britmob256364 joins |
| 12:46:59 | | Wingy quits [Remote host closed the connection] |
| 12:47:55 | | Wingy (Wingy) joins |
| 12:52:11 | | balrog quits [Ping timeout: 265 seconds] |
| 12:52:29 | | balrog (balrog) joins |
| 13:37:08 | | ThreeHM quits [Ping timeout: 265 seconds] |
| 13:38:47 | | ThreeHM (ThreeHeadedMonkey) joins |
| 13:42:06 | | HP_Archivist (HP_Archivist) joins |
| 14:36:13 | | Jonboy345 joins |
| 14:42:48 | | IDK quits [Client Quit] |
| 15:00:35 | | superkuh_ is now known as superkuh |
| 15:02:22 | | Arcorann_ quits [Ping timeout: 258 seconds] |
| 15:08:49 | | dm4v quits [Client Quit] |
| 15:09:28 | | dm4v joins |
| 15:09:30 | | dm4v is now authenticated as dm4v |
| 15:09:30 | | dm4v quits [Changing host] |
| 15:09:30 | | dm4v (dm4v) joins |
| 15:15:32 | | dm4v quits [Client Quit] |
| 15:16:51 | | dm4v joins |
| 15:16:53 | | dm4v is now authenticated as dm4v |
| 15:16:53 | | dm4v quits [Changing host] |
| 15:16:53 | | dm4v (dm4v) joins |
| 15:32:40 | | pabs quits [Read error: Connection reset by peer] |
| 15:33:34 | | pabs (pabs) joins |
| 16:02:33 | | Mateon1 quits [Ping timeout: 258 seconds] |
| 16:03:24 | | Mateon1 joins |
| 16:04:56 | | spirit joins |
| 16:26:31 | | AlsoHP_Archivist joins |
| 16:30:10 | | HP_Archivist quits [Ping timeout: 252 seconds] |
| 16:53:25 | | KiyoshIWJ quits [Remote host closed the connection] |
| 16:55:05 | <xarph> | @JAA I have an email from ben at cubetutor. |
| 16:55:12 | <xarph> | here are the nuts and bolts |
| 16:55:19 | <xarph> | There are currently 182050 cubes hosted on Cube Tutor. The id for each cube can be found in the url, for example the list for cube 181676 is: |
| 16:55:19 | <xarph> | https://www.cubetutor.com/viewcube/181676 |
| 16:55:19 | <xarph> | So to start with you could just point a scraper at https://www.cubetutor.com/viewcube/1 and simply increase the number at the end until you hit 182050. I would expect around 30-50% of the cubes on Cube Tutor are empty. I had a problem with spam bots registering on Cube Tutor for a while. |
| 16:56:47 | <xarph> | From there you can just use the cube id to visit all of the public pages for that cube, e.g. |
| 16:56:48 | <xarph> | https://www.cubetutor.com/tokens/181676 |
| 16:56:48 | <xarph> | https://www.cubetutor.com/decks/181676 |
| 16:56:48 | <xarph> | https://www.cubetutor.com/forum/181676 |
| 16:57:04 | <xarph> | Additionally we have a page for each card, for example: |
| 16:57:04 | <xarph> | https://www.cubetutor.com/card/1/3238 |
| 16:57:04 | <xarph> | With the highest id being 45772. I think it would be worth archiving these pages as they contain the tags used in the drafting AI. |
| 16:57:12 | <xarph> | There is also a whole mini site under: https://www.cubetutor.com/championcentral/ |
| 16:57:18 | <xarph> | --- END EXCERPTS --- |
| 16:57:35 | <xarph> | I'm asking if it's needed to recurse down the /card/ID/ pages |
| 16:57:43 | <xarph> | If you have other questions I can forward them |
| 16:59:06 | | IDK (IDK) joins |
| 17:06:59 | | ArchivalEfforts joins |
| 17:08:24 | | ArchivalEfforts quits [Client Quit] |
| 17:08:27 | | ArchivalEfforts joins |
| 17:34:38 | <xarph> | "The card pages are the same for all cube IDs so all you need to scrape is card/1/Y where 1 <= Y <= 45772." |
| 17:35:01 | <xarph> | "It was built this way to maintain some of the state in the menu. Honestly I got myself into a bit of a mess with a fairly hardcoded insistence on having the cube ID in the URL for most pages. I won't go into the details, but suffice to say you don't need to scrape this page for every other cube id." |
| 17:35:11 | <xarph> | @jaa that should save some archiving effort. |
| 17:35:25 | <xarph> | now I have to go do boring things for money, will be back this evening |
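
The enumeration described in the email excerpts above can be sketched as a URL list generator. This is a minimal sketch, not what any ArchiveTeam tooling actually runs: the function and constant names are hypothetical, the per-cube page list covers only the paths named in the email (viewcube, tokens, decks, forum), and the lowest card ID being 1 is an assumption read from "card/1/Y" with Y up to 45772.

```python
from typing import Iterator

BASE = "https://www.cubetutor.com"
MAX_CUBE_ID = 182050  # cube count quoted in the email
MAX_CARD_ID = 45772   # highest card ID quoted in the email
# Per-cube public pages named in the email; other paths are not covered here.
CUBE_PAGES = ("viewcube", "tokens", "decks", "forum")


def cube_urls(max_cube_id: int = MAX_CUBE_ID) -> Iterator[str]:
    """Yield every per-cube page URL for cube IDs 1..max_cube_id."""
    for cube_id in range(1, max_cube_id + 1):
        for page in CUBE_PAGES:
            yield f"{BASE}/{page}/{cube_id}"


def card_urls(max_card_id: int = MAX_CARD_ID) -> Iterator[str]:
    """Yield each card page once, pinned to cube ID 1 as the email suggests."""
    for card_id in range(1, max_card_id + 1):
        yield f"{BASE}/card/1/{card_id}"


if __name__ == "__main__":
    # Print one URL per line, suitable for feeding to a crawler as a seed list.
    for url in cube_urls():
        print(url)
    for url in card_urls():
        print(url)
```

Pinning the card pages to cube ID 1 avoids fetching /card/$CUBEID/$CARDID once per cube that includes the card, which is the duplication noted earlier in the log.
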
| 17:36:14 | | superkuh_ joins |
| 17:37:16 | | superkuh quits [Ping timeout: 252 seconds] |
| 17:47:30 | | AlsoHP_Archivist quits [Ping timeout: 265 seconds] |
| 17:51:15 | | HP_Archivist (HP_Archivist) joins |
| 17:52:37 | | sec^nd quits [Remote host closed the connection] |
| 17:53:03 | | sec^nd (second) joins |
| 17:56:58 | | HP_Archivist quits [Client Quit] |
| 18:34:04 | | Jonboy3451 joins |
| 18:38:11 | | Jonboy345 quits [Ping timeout: 258 seconds] |
| 18:59:14 | | VerifiedJ quits [Client Quit] |
| 19:02:43 | <@JAA> | xarph: Thanks, that's good info. One thing I'm unsure about is whether there's pagination anywhere other than the cubeblog pages. Couldn't find any examples, but would be good to hear for sure whether it can happen or not. |
| 19:03:22 | | VerifiedJ (VerifiedJ) joins |
| 19:16:55 | | JTL quits [Ping timeout: 265 seconds] |
| 19:38:44 | | spirit quits [Client Quit] |
| 19:39:02 | | JTL (JTL) joins |
| 20:32:02 | | Stiletto joins |
| 21:13:43 | | mutantmonkey quits [Remote host closed the connection] |
| 21:14:06 | | mutantmonkey (mutantmonkey) joins |
| 21:40:57 | | tzt quits [Ping timeout: 265 seconds] |
| 21:41:20 | | tzt joins |
| 21:47:05 | | KiyoshIWJ joins |
| 22:11:28 | | KiyoshIWJ quits [Remote host closed the connection] |
| 22:28:59 | | Nay quits [Quit: //System Offline//] |
| 22:29:39 | | Nay (JeDa) joins |
| 22:33:46 | | driib quits [Quit: The Lounge - https://thelounge.chat] |
| 22:34:18 | | driib (driib) joins |
| 22:34:25 | | Arcorann_ joins |
| 22:35:43 | <@JAA> | Only ~637k threads on Ruqqus, easily enumerable. |
| 22:36:17 | <@JAA> | Or 'submissions', as they're called in the API at least. |
| 22:38:24 | | IDK quits [Client Quit] |
| 22:40:08 | | Nay quits [Client Quit] |
| 22:40:47 | | Nay (JeDa) joins |
| 23:00:27 | <@JAA> | 9 days until v2 onion addresses will cease working with the new Tor release and likely disappear or become inaccessible soon after that as the network upgrades. |
| 23:04:21 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+147, /* 2021 */ Add Ruqqus): https://wiki.archiveteam.org/?diff=47235&oldid=47135 |
| 23:07:21 | <h2ibot> | Moom0o edited Imgur (+151, Add sitemap): https://wiki.archiveteam.org/?diff=47236&oldid=47212 |
| 23:07:22 | <h2ibot> | Themadprogramer edited Discourse (+121, /* Active Discourses */): https://wiki.archiveteam.org/?diff=47237&oldid=47165 |
| 23:08:09 | | KiyoshIWJ joins |
| 23:32:45 | | HP_Archivist (HP_Archivist) joins |
| 23:39:48 | | wyatt8750 joins |
| 23:43:19 | | wyatt8740 quits [Ping timeout: 258 seconds] |