| 00:18:28 | | lennier1 quits [Client Quit] |
| 00:19:06 | | lennier1 (lennier1) joins |
| 00:34:16 | | Megame quits [Client Quit] |
| 00:36:06 | | sec^nd quits [Remote host closed the connection] |
| 00:37:16 | | sec^nd (second) joins |
| 01:02:42 | | dm4v_ joins |
| 01:02:48 | | dm4v quits [Read error: Connection reset by peer] |
| 01:02:54 | | dm4v_ is now known as dm4v |
| 01:02:57 | | dm4v is now authenticated as dm4v |
| 01:02:57 | | dm4v quits [Changing host] |
| 01:02:57 | | dm4v (dm4v) joins |
| 01:22:23 | | datechnoman quits [Client Quit] |
| 01:23:12 | | datechnoman (datechnoman) joins |
| 01:51:47 | | wyatt8750 joins |
| 01:52:54 | | wyatt8740 quits [Ping timeout: 265 seconds] |
| 01:59:50 | | user__ quits [Read error: Connection reset by peer] |
| 02:00:17 | | user__ (gazorpazorp) joins |
| 02:06:49 | | @Fusl quits [Excess Flood] |
| 02:07:05 | | Fusl (Fusl) joins |
| 02:07:05 | | @ChanServ sets mode: +o Fusl |
| 02:41:07 | <h2ibot> | NightHnh099 edited Alive... OR ARE THEY (+238): https://wiki.archiveteam.org/?diff=48444&oldid=47018 |
| 02:56:54 | | lennier1 quits [Client Quit] |
| 02:57:16 | | lennier1 (lennier1) joins |
| 03:10:11 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+15, /* 2022 */ BPO's date shifted again): https://wiki.archiveteam.org/?diff=48445&oldid=48428 |
| 03:36:50 | | Mateon1 quits [Remote host closed the connection] |
| 03:37:03 | | Mateon1 joins |
| 04:13:25 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 04:16:23 | | qwertyasdfuiopghjkl joins |
| 04:28:32 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 04:35:04 | | drexler joins |
| 04:35:14 | <drexler> | > What's the best way to archive large files at their original URL to ensure the maximum chance of people finding it? |
| 04:35:25 | <@JAA> | Hello again. :-) What is this about? |
| 04:36:27 | <drexler> | Oh, I'm trying to archive publicly available AI models. |
| 04:36:55 | <drexler> | Because they're large files that tend to be considered 'incidental' and disappear once someone decides to stop hosting them, but if they're available you can go straight back to a particular 'era' of AI art. |
| 04:37:37 | <drexler> | That is, they're large files with short lived popularity but likely long tail value that is not being recognized right this moment but probably will be later. |
| 04:38:27 | <drexler> | And because of this their hosting tends to be odd? People will put them up on their personal server or a university lab workstation or something then one day that goes down and never comes back up. |
| 04:39:45 | <@JAA> | Right. What order of magnitude are we talking about here? As in, if this were done continuously, how much new data per time would you expect? |
| 04:40:59 | <drexler> | Oh you could probably fit every historically interesting AI model on a few terabytes maybe? That's being conservative/assuming it takes more memory than it probably does. |
| 04:41:18 | <drexler> | Uh, at least for the AI art ones |
| 04:41:22 | <drexler> | Which is all I'm interested in at the moment |
| 04:42:12 | <@JAA> | That doesn't sound too horrible. I was expecting more. |
| 04:46:09 | <drexler> | Oh, no. |
| 04:46:20 | <drexler> | StyleGAN models are probably something like 700mb each iirc |
| 04:46:23 | <@JAA> | These are normally distributed over HTTP(S), right? I remember mirroring some stuff from rsync servers before. |
| 04:46:29 | <drexler> | Yeah, they usually are |
| 04:46:59 | <drexler> | And then there's bigger models like the ones I'm training right now, which are 10gb each. But people don't train very many of those. |
| 04:47:05 | <drexler> | Yet. |
| 04:47:16 | <drexler> | This early period is probably going to be of the most historical interest, and it uses the least storage space. |
| 04:47:51 | <@JAA> | Ok, if you have a list, we can probably just run it through ArchiveBot. Shouldn't take more than a few weeks to maybe a couple months for that size, though it obviously depends on the servers as well. |
| 04:47:55 | <@JAA> | arkiver: ^ Thoughts? |
| 04:50:42 | <drexler> | Yeah, the alternative is I just download them all slowly and then reupload as an Internet Archive collection, but I think it'd be ideal if they were findable at their original URLs, dunno. I guess the flip side of it is that people might forget what their original URLs even are, given how weird and obscure the hosting often is in the first place. |
| 04:51:49 | <@JAA> | Yeah, documenting those URLs would definitely be a good idea. If it isn't an endless list, this could go onto our wiki. |
| 04:53:57 | <thuban> | if you do end up submitting a list to archivebot, it might be a good idea to include urls of 'about' pages and/or whatever you used to discover the model urls in the first place (since that discovery would then be replicable within the wbm) |
| 04:56:11 | | drexler thumbs up |
| 05:15:34 | <drexler> | Hm, now that you mention it. |
| 05:15:36 | <drexler> | For about/context |
| 05:15:44 | <drexler> | You ever archived Google CoLab before? |
| 05:17:21 | | wyatt8750 quits [Ping timeout: 265 seconds] |
| 05:18:31 | <h2ibot> | JustAnotherArchivist edited DPoS (-35, FOS hasn't been used in years; HTTPS for tracker): https://wiki.archiveteam.org/?diff=48446&oldid=48431 |
| 05:19:08 | | wyatt8740 joins |
| 06:05:39 | <Jake> | (colab if I remember correctly is _tons_ of JavaScript, so the actual page isn't easily archivable, from what I remember, but I think you can export pretty easily?) |
| 06:10:31 | <drexler> | Jake, Yeah |
| 06:10:42 | <drexler> | You don't need to archive the page |
| 06:10:48 | <drexler> | You just need to archive/export the notebook itself |
| 06:11:07 | <drexler> | Because that's how ML people tend to distribute the prototype/public use version of their programs |
| 06:11:39 | <drexler> | So if you're saving the models at an expensive storage cost, it would be foolish not to also save the notebooks, which are a fraction of a fraction of the size and are what allows you to actually use the models to do something. |
| 06:12:01 | | Hackerpcs quits [Client Quit] |
| 06:12:14 | <drexler> | They also contain the URLs the models are stored at, which is why I thought of them. |
| 06:15:50 | | Hackerpcs (Hackerpcs) joins |
| 06:37:06 | | lennier1 quits [Ping timeout: 265 seconds] |
| 06:42:02 | | lennier1 (lennier1) joins |
| 07:18:27 | | CatBatHat joins |
| 07:23:44 | | CatBatHat quits [Remote host closed the connection] |
| 07:27:14 | | march_happy quits [Ping timeout: 265 seconds] |
| 07:28:12 | | march_happy (march_happy) joins |
| 07:37:52 | | march_happy quits [Ping timeout: 265 seconds] |
| 07:38:09 | | march_happy (march_happy) joins |
| 07:40:37 | | @dxrt quits [Quit: ZNC - http://znc.sourceforge.net] |
| 07:41:19 | | dxrt joins |
| 07:41:22 | | dxrt is now authenticated as dxrt |
| 07:41:22 | | dxrt quits [Changing host] |
| 07:41:22 | | dxrt (dxrt) joins |
| 07:41:22 | | @ChanServ sets mode: +o dxrt |
| 08:09:25 | | march_happy quits [Ping timeout: 265 seconds] |
| 08:09:38 | | march_happy (march_happy) joins |
| 08:30:27 | | nepeat quits [Client Quit] |
| 08:39:52 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 08:48:45 | | nepeat (nepeat) joins |
| 08:55:24 | | nepeat quits [Client Quit] |
| 08:56:56 | | [42] quits [Max SendQ exceeded] |
| 08:57:47 | | [42] (N4Y) joins |
| 09:01:03 | | nepeat (nepeat) joins |
| 09:29:36 | | Megame (Megame) joins |
| 09:53:38 | | sonick quits [Client Quit] |
| 10:28:40 | | sonick (sonick) joins |
| 10:52:37 | | AK quits [Quit: Ping timeout (120 seconds)] |
| 10:57:58 | | JackThompson joins |
| 11:00:35 | | JackThompson quits [Client Quit] |
| 11:04:52 | | march_happy quits [Ping timeout: 265 seconds] |
| 11:05:07 | | march_happy (march_happy) joins |
| 11:24:41 | | march_happy quits [Ping timeout: 265 seconds] |
| 11:25:01 | | march_happy (march_happy) joins |
| 11:29:43 | | AK (AK) joins |
| 11:30:40 | <AK> | arkiver, I know we spoke at one point about doing nslookups of domains and not attempting to grab if the domain resolves to an internal address |
| 11:30:45 | <AK> | My ban reason by hetzner was "since the IPs listed in the log are not routed in the Internet at the moment they are not reachable and therefore this is seen as an abuse." |
| 11:31:19 | <AK> | So I'm either going to ask please can we add something like that on the grab side, or I'll need to look into how I can null route outbound requests to private addresses |
| 11:53:41 | | march_happy quits [Ping timeout: 265 seconds] |
| 11:54:03 | | march_happy (march_happy) joins |
| 11:57:48 | | pokes quits [Remote host closed the connection] |
| 12:10:36 | | march_happy quits [Ping timeout: 265 seconds] |
| 12:11:22 | | march_happy (march_happy) joins |
| 12:26:33 | | jacobk quits [Ping timeout: 265 seconds] |
| 12:44:20 | | CatBatHat joins |
| 12:55:10 | | jacobk joins |
| 13:29:23 | | CatBatHat quits [Ping timeout: 265 seconds] |
| 13:32:46 | | jacobk quits [Ping timeout: 265 seconds] |
| 13:41:26 | | LeGoupil joins |
| 13:46:01 | | binzyboi quits [Quit: Leaving] |
| 14:08:24 | | Arcorann quits [Ping timeout: 265 seconds] |
| 14:25:23 | | LeGoupil quits [Client Quit] |
| 14:41:17 | <@arkiver> | AK: let's take that to #// |
| 14:48:39 | | march_happy quits [Ping timeout: 265 seconds] |
| 14:49:25 | | march_happy (march_happy) joins |
| 14:59:31 | | wessel1512 is now authenticated as wessel1512 |
| 15:01:12 | | Megame quits [Client Quit] |
| 15:08:28 | | march_happy quits [Ping timeout: 265 seconds] |
| 15:08:42 | | march_happy (march_happy) joins |
| 15:17:01 | | jacobk joins |
| 15:30:21 | | JackThompson joins |
| 15:39:45 | | JackThompson quits [Client Quit] |
| 15:46:33 | | JackThompson joins |
| 15:54:23 | | rsn quits [Ping timeout: 265 seconds] |
| 16:05:10 | | qwertyasdfuiopghjkl joins |
| 17:33:05 | | swety-lis joins |
| 17:33:22 | | swety-lis quits [Remote host closed the connection] |
| 17:47:18 | | AnotherIki joins |
| 17:50:52 | | Iki1 quits [Ping timeout: 265 seconds] |
| 18:15:31 | | march_happy quits [Ping timeout: 265 seconds] |
| 18:26:59 | | Church quits [Ping timeout: 265 seconds] |
| 18:42:06 | | jacobk quits [Ping timeout: 265 seconds] |
| 18:44:55 | | march_happy (march_happy) joins |
| 18:46:20 | | Church (Church) joins |
| 18:47:47 | | Megame (Megame) joins |
| 19:07:35 | | march_happy quits [Ping timeout: 265 seconds] |
| 19:10:00 | | march_happy (march_happy) joins |
| 19:14:12 | | rsn joins |
| 19:23:10 | | LeGoupil joins |
| 19:46:15 | | thetechrobo_ joins |
| 19:49:46 | | TheTechRobo quits [Ping timeout: 265 seconds] |
| 20:26:19 | | thetechrobo_ is now known as TheTechRobo |
| 20:26:30 | | TheTechRobo is now authenticated as TheTechRobo |
| 20:50:33 | | jacobk joins |
| 21:06:00 | | yyyy joins |
| 21:06:20 | | yyyy quits [Remote host closed the connection] |
| 21:08:45 | | spirit quits [Client Quit] |
| 21:22:42 | | Megame quits [Client Quit] |
| 21:33:06 | | eroc1990 quits [Client Quit] |
| 21:39:20 | | eroc1990 (eroc1990) joins |
| 21:51:35 | | LeGoupil quits [Client Quit] |
| 21:58:31 | | BlueMaxima joins |
| 22:02:20 | | eroc1990 quits [Client Quit] |
| 22:23:27 | | eroc1990 (eroc1990) joins |
| 22:37:15 | | datechnoman quits [Client Quit] |
| 22:37:38 | | datechnoman (datechnoman) joins |
| 22:43:11 | | Minkafighter quits [Quit: The Lounge - https://thelounge.chat] |
| 22:43:31 | | Minkafighter joins |
| 22:51:33 | | Arcorann (Arcorann) joins |
| 23:19:03 | | geezabiscuit quits [Ping timeout: 265 seconds] |
| 23:19:12 | | drin joins |
| 23:19:46 | | drin is now known as geezabiscuit |
| 23:22:33 | | binzyboi joins |
| 23:29:41 | | march_happy quits [Ping timeout: 265 seconds] |
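
AK's request at 11:30 (resolve a domain first and skip the grab if it points at an internal address) could look roughly like the sketch below. This is not the project's actual grab code: the function names `is_publicly_routable`, `resolve_with_system_dns`, and `should_fetch` are hypothetical, and the policy of rejecting a host when any of its addresses is non-public is an assumption. It uses only Python's standard `socket` and `ipaddress` modules.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List


def is_publicly_routable(address: str) -> bool:
    """Return True only if the IP address is in globally routable space."""
    # Drop an IPv6 zone index such as "%eth0" before parsing.
    ip = ipaddress.ip_address(address.split("%", 1)[0])
    # is_global is False for private, loopback, link-local, and other
    # special-purpose ranges.
    return ip.is_global


def resolve_with_system_dns(hostname: str) -> List[str]:
    """Resolve a hostname to its unique addresses; empty list on failure."""
    try:
        results = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in results})


def should_fetch(
    hostname: str,
    resolve: Callable[[str], Iterable[str]] = resolve_with_system_dns,
) -> bool:
    """Hypothetical policy: fetch only if every resolved address is public."""
    addresses = list(resolve(hostname))
    return bool(addresses) and all(is_publicly_routable(a) for a in addresses)
```

Calling `should_fetch("example.org")` before queueing a request would skip hosts that fail to resolve or that resolve to non-public space. A check like this is only advisory, since the client may resolve the name again when it actually connects; null routing outbound traffic to private ranges at the firewall, the other option AK mentions, does not have that gap.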