| 00:00:45 | | dm4v quits [Read error: Connection reset by peer] |
| 00:01:11 | | dm4v joins |
| 00:01:13 | | dm4v is now authenticated as dm4v |
| 00:01:13 | | dm4v quits [Changing host] |
| 00:01:13 | | dm4v (dm4v) joins |
| 00:26:06 | | Core1427 (Cobalt17) joins |
| 00:28:13 | | Earendil quits [Ping timeout: 258 seconds] |
| 00:58:59 | | Stiletto joins |
| 01:00:37 | | Arcorann_ joins |
| 01:02:43 | | dm4v quits [Ping timeout: 258 seconds] |
| 01:03:06 | | dm4v joins |
| 01:03:09 | | dm4v is now authenticated as dm4v |
| 01:03:09 | | dm4v quits [Changing host] |
| 01:03:09 | | dm4v (dm4v) joins |
| 01:09:55 | | Earendil (Cobalt17) joins |
| 01:11:55 | | Core1427 quits [Ping timeout: 258 seconds] |
| 02:03:40 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 02:09:09 | <@JAA> | So the anniversary of the death of Comicogs & Co. is approaching, and they *still* haven't managed to upload the dumps... *facepalm* https://comics.discogs.com/ |
| 02:21:42 | <Ryz> | You know the vid dot me stuff, non-zero chance the companies that have embedded links to that website will just delete the articles instead |
| 02:22:09 | <@JAA> | Not much we can do about that though. |
| 02:26:13 | <pabs> | could use a search engine to find links to vid.me and archive the articles? |
| 02:28:48 | <@JAA> | They're embeds, not links. Is there any search engine that indexes those? |
| 02:30:57 | <pabs> | hmm |
| 02:31:30 | <@JAA> | Also, scraping search engines is hard to impossible. They don't like that at all. Only Bing kind of tolerates a slow speed. |
| 02:31:45 | <@JAA> | So... $£€$£€ |
| 02:55:45 | | Megame (Megame) joins |
| 03:05:45 | | HP_Archivist (HP_Archivist) joins |
| 03:15:46 | | qw3rty__ joins |
| 03:19:34 | | qw3rty_ quits [Ping timeout: 258 seconds] |
| 03:43:02 | | Iki joins |
| 04:16:35 | | Ruthalas quits [Read error: Connection reset by peer] |
| 04:16:51 | | Ruthalas (Ruthalas) joins |
| 04:18:12 | | AntiLiberal joins |
| 04:28:34 | | AntiLiberal quits [Ping timeout: 258 seconds] |
| 04:30:43 | | AntiLiberal joins |
| 04:34:33 | <h2ibot> | Bauerbach edited SCP Foundation (+100): https://wiki.archiveteam.org/?diff=47007&oldid=46701 |
| 04:40:26 | | DogsRNice quits [Read error: Connection reset by peer] |
| 04:46:40 | | BlueMaxima quits [Client Quit] |
| 04:47:21 | | AntiLiberal quits [Ping timeout: 258 seconds] |
| 05:17:15 | | Iki quits [Ping timeout: 258 seconds] |
| 05:21:00 | | Core8615 (Cobalt17) joins |
| 05:21:00 | | Earendil quits [Read error: Connection reset by peer] |
| 05:33:36 | | Stiletto quits [Ping timeout: 250 seconds] |
| 05:44:00 | | driib quits [Ping timeout: 250 seconds] |
| 05:54:00 | | driib (driib) joins |
| 06:01:23 | | Matthww80 joins |
| 06:02:12 | | Matthww8 quits [Ping timeout: 250 seconds] |
| 06:02:12 | | Matthww80 is now known as Matthww8 |
| 06:06:59 | | Earendil (Cobalt17) joins |
| 06:10:52 | | Core8615 quits [Ping timeout: 250 seconds] |
| 06:39:27 | | vela quits [Quit: vela] |
| 06:43:32 | | vela (vela) joins |
| 07:27:54 | | Core2100 (Cobalt17) joins |
| 07:29:18 | | Earendil quits [Ping timeout: 250 seconds] |
| 07:32:14 | | spirit joins |
| 07:33:43 | | tzt quits [Ping timeout: 258 seconds] |
| 07:57:06 | | swebb quits [Ping timeout: 258 seconds] |
| 07:59:09 | | swebb joins |
| 08:02:08 | <h2ibot> | Sanqui created Sweb.cz (+215, Created page with "Czech freehost provided by…): https://wiki.archiveteam.org/?title=Sweb.cz |
| 08:39:39 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 09:27:38 | | Matthww89 joins |
| 09:28:54 | | Matthww8 quits [Ping timeout: 250 seconds] |
| 09:28:54 | | Matthww89 is now known as Matthww8 |
| 09:35:47 | | Matthww83 joins |
| 09:36:42 | | Matthww8 quits [Ping timeout: 250 seconds] |
| 09:36:42 | | Matthww83 is now known as Matthww8 |
| 10:11:54 | | ragu__ joins |
| 10:13:34 | | ragu_ quits [Ping timeout: 258 seconds] |
| 10:14:01 | | Stiletto joins |
| 10:18:17 | | Stiletto quits [Read error: Connection reset by peer] |
| 10:18:34 | | Stiletto joins |
| 12:06:38 | | TheRealZago (TheRealZago) joins |
| 12:19:18 | | @OrIdow6 quits [Ping timeout: 258 seconds] |
| 12:28:21 | | TheRealZago quits [Read error: Connection reset by peer] |
| 12:37:50 | | fuzzy8021 quits [Ping timeout: 250 seconds] |
| 12:51:23 | | Core2100 quits [Remote host closed the connection] |
| 12:51:39 | | Earendil (Cobalt17) joins |
| 13:03:18 | | fuzzy8021 (fuzzy8021) joins |
| 13:50:09 | | AntiLiberal joins |
| 13:50:10 | | Matthww88 joins |
| 13:51:04 | | Matthww8 quits [Ping timeout: 250 seconds] |
| 13:51:04 | | Matthww88 is now known as Matthww8 |
| 13:53:29 | | AntiLiberal2 joins |
| 13:56:17 | | AntiLiberal quits [Ping timeout: 258 seconds] |
| 14:03:38 | | abcde quits [Ping timeout: 244 seconds] |
| 14:09:50 | | Matthww86 joins |
| 14:11:52 | | Matthww8 quits [Ping timeout: 250 seconds] |
| 14:11:52 | | Matthww86 is now known as Matthww8 |
| 14:46:43 | | marked10 joins |
| 14:48:16 | | marked1 quits [Ping timeout: 250 seconds] |
| 14:48:16 | | marked10 is now known as marked1 |
| 14:54:32 | | Gereon quits [Quit: The Lounge - https://thelounge.chat] |
| 14:59:09 | | Arcorann_ quits [Ping timeout: 258 seconds] |
| 15:03:52 | | Barto quits [Ping timeout: 250 seconds] |
| 15:04:08 | | Barto (Barto) joins |
| 15:07:26 | | Gereon (Gereon) joins |
| 15:40:41 | | Ryz quits [Remote host closed the connection] |
| 15:41:11 | | Ryz (Ryz) joins |
| 17:06:10 | | ragu_ joins |
| 17:06:17 | | tzt joins |
| 17:09:52 | | ragu__ quits [Ping timeout: 258 seconds] |
| 18:23:56 | | HP_Archivist (HP_Archivist) joins |
| 19:02:11 | | Earendil quits [Read error: Connection reset by peer] |
| 19:02:33 | | Earendil (Cobalt17) joins |
| 19:03:33 | | DogsRNice (Webuser299) joins |
| 19:05:22 | <@JAA> | FloydHub's a JS hellhole, so archiving it is tricky. The search accepts an empty query: https://www.floydhub.com/search/projects?page=0&query= |
| 19:05:23 | | Megame quits [Client Quit] |
| 19:07:10 | <Jake> | 11806 pages on the empty search for projects |
| 19:07:25 | <@JAA> | Fun |
| 19:08:07 | <Jake> | 5738 pages of datasets |
| 19:09:26 | <Jake> | 8644 pages of users |
| 19:09:31 | <@JAA> | lol |
| 19:09:35 | <@JAA> | It's all in their main JS file. |
| 19:09:42 | <@JAA> | Because SPA. |
| 19:10:10 | | Stiletto quits [Remote host closed the connection] |
| 19:10:11 | <@JAA> | Oh wait no, those are other hits. Odd |
| 19:10:38 | <Jake> | SPA :( |
| 19:11:11 | | GNU_world joins |
| 19:11:12 | | S0V3R3IGNTY joins |
| 19:11:19 | | jamesp joins |
| 19:12:10 | | graf__ joins |
| 19:12:13 | <jamesp> | wait, who is @JAA? |
| 19:12:20 | <@JAA> | Jake: How did you find those numbers so quickly? |
| 19:12:33 | <Jake> | just casually went through it, starting with bigger numbers |
| 19:12:56 | <rewby> | jamesp: He's just another archivist. |
| 19:13:04 | <@JAA> | :-) |
| 19:13:35 | <jamesp> | I'm just wondering about textfiles. Does he come on? |
| 19:13:53 | <rewby> | He's around. Usually. |
| 19:14:12 | <rewby> | Don't ping him though, that's like poking a sleeping bear |
| 19:14:31 | <jamesp> | he isn't on |
| 19:14:47 | <rewby> | His IRC username isn't textfiles |
| 19:15:28 | <jamesp> | then what is it? put a space before the last character, like james p |
| 19:16:57 | <@JAA> | Projects API endpoint is this: https://www.floydhub.com/api/v1/projects/search?query=&limit=15&offset=15 (for the second page). Limit maxes out at a bit over 1000. |
| 19:17:24 | <@JAA> | 1023 is the maximum allowed limit, to be precise. |
| 19:18:13 | <Jake> | some of floydhub seems partially broken already, https://www.floydhub.com/fastai/projects/lesson1_dogs_cats the project is 'empty', but has a few jobs, some of which have files |
| 19:18:29 | <@JAA> | Yeah, I haven't found any non-empty project yet, actually. |
| 19:19:02 | <Jake> | I think they are all displaying as empty for some odd reason, this job seems to show the code for the project. https://www.floydhub.com/fastai/projects/lesson1_dogs_cats/13/code |
| 19:21:20 | <Jake> | max offsets: projects: https://www.floydhub.com/api/v1/projects/search?query=&limit=15&offset=177090 ; datasets: https://www.floydhub.com/api/v1/datasets/search?query=&limit=15&offset=86070 ; users: https://www.floydhub.com/api/v1/profile/search?query=&limit=15&offset=129660 |
| 19:23:58 | <Jake> | I'll run through all the datasets and get a size estimate real quick. |
| 19:24:05 | | Core7846 (Cobalt17) joins |
| 19:25:38 | <@JAA> | Datasets are separate from job outputs, it seems? |
| 19:26:28 | | Earendil quits [Ping timeout: 250 seconds] |
| 19:26:28 | <Jake> | I believe so |
| 19:26:30 | <@JAA> | But I imagine the datasets will be much larger. |
| 19:32:12 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+141, /* 2021 */ Add FloydHub): https://wiki.archiveteam.org/?diff=47009&oldid=47002 |
| 19:33:22 | | Iki joins |
| 19:33:57 | | bsmith093 joins |
| 19:33:57 | | bsmith093 is now authenticated as bsmith093 |
| 19:37:02 | <Jake> | script started. might be a bit. |
| 19:41:37 | | m0nika quits [Remote host closed the connection] |
| 19:41:51 | | m0nika (m0nika) joins |
| 19:43:45 | | m0nika quits [Client Quit] |
| 19:43:59 | | Earendil (Cobalt17) joins |
| 19:44:01 | | m0nika (m0nika) joins |
| 19:45:58 | | Core7846 quits [Ping timeout: 250 seconds] |
| 19:46:03 | | m0nika quits [Client Quit] |
| 19:46:50 | | m0nika (m0nika) joins |
| 19:58:06 | | AntiLiberal2 quits [Ping timeout: 250 seconds] |
| 20:00:02 | <Ryz> | Tumblr to take a page from Patreon to have Tumblr account posts that can only be accessed through money: https://cdn.discordapp.com/attachments/455120412460974104/868582804609458197/E68AvqbVcAI2FRt.png - https://cdn.discordapp.com/attachments/455120412460974104/868582834409984010/E68AzrMVgAEOLlE.png |
| 20:02:38 | <Jake> | alright, FloydHub datasets, I got 35363216376832 bytes for total size, or around 35 terabytes. |
| 20:02:52 | | qw3rty__ quits [Ping timeout: 250 seconds] |
| 20:03:53 | <Jake> | Will extract a full list of URLs to use later in a minute. |
| 20:04:35 | | qw3rty joins |
| 20:04:36 | <Ryz> | https://techcrunch.com/2021/07/21/tumblr-debuts-post-a-subscription-service-for-gen-z-creators/ - https://techcrunch.com/2021/07/22/tumblr-community-lash-out-post-plus-subscription/ |
| 20:04:37 | | Myself (myself) joins |
| 20:13:05 | | rsn joins |
| 20:15:13 | <@JAA> | 35 TB doesn't sound too bad. I wonder how much duplication there is. |
| 20:15:13 | | Ajay_m quits [Read error: Connection reset by peer] |
| 20:15:23 | <jamesp> | Wow...the videos are still up! |
| 20:15:28 | <@JAA> | Dude... |
| 20:16:11 | | Ajay_m joins |
| 20:16:15 | <jamesp> | sorry wrong channel |
| 20:16:32 | | spirit quits [Client Quit] |
| 20:17:23 | <Jake> | I also used totalSizeBytes rather than latestSizeBytes so that might count however many versions exist, most seem to have one version, though. I'll do another run with latest as well. |
| 20:31:55 | | abcde joins |
| 20:52:35 | | Iki quits [Ping timeout: 258 seconds] |
| 20:57:02 | | Doranwen quits [Ping timeout: 250 seconds] |
| 20:57:34 | | Ajay_m quits [Ping timeout: 258 seconds] |
| 21:01:04 | | Ajay_m joins |
| 21:01:13 | <VerifiedJ> | Looks like there is a fair bit of duplication. There are ~1K forks(?) of the dog-breed-images dataset. A version of that dataset is 700MB. |
| 21:01:41 | <VerifiedJ> | lists of users, datasets and projects https://verifiedjoseph.com/archiveteam/website-discovery/floydhub.com/ |
| 21:07:26 | | Core7292 (Cobalt17) joins |
| 21:07:47 | <Jake> | beat me to it! :-) |
| 21:08:11 | <Jake> | total size with just the latest version is 29575463216128 bytes, or about 29.6 TB. |
| 21:10:04 | <Jake> | my version: https://transfer.archivete.am/c9djz/dataset_ids as well as all of the JSON from the datasets: https://transfer.archivete.am/TID3N/dataset_full_json |
| 21:11:22 | | Earendil quits [Ping timeout: 258 seconds] |
| 21:19:49 | | HP_Archivist quits [Read error: Connection reset by peer] |
| 21:20:08 | | HP_Archivist (HP_Archivist) joins |
| 21:24:28 | <@JAA> | There's obviously no good way to detect duplicates just from this, but summing up unique latestSizeBytes over 1 GiB gives 8.2 TiB. |
| 21:25:32 | | jacobk joins |
| 21:25:51 | <@JAA> | Well, all datasets over 1 GiB are 8.5 TiB though, so I guess there's not too much duplication, maybe. |
| 21:29:26 | <jamesp> | I checked the tracker and I don't see it moving. What's happening |
| 21:30:47 | <@JAA> | jamesp: Still the wrong channel. |
| 21:31:06 | <jamesp> | oops. I keep forgetting |
| 21:36:34 | | ArchivalEfforts joins |
| 21:42:24 | | Stiletto joins |
| 22:00:21 | | Matthww81 joins |
| 22:00:49 | | Core7292 quits [Read error: Connection reset by peer] |
| 22:01:03 | | Earendil (Cobalt17) joins |
| 22:01:12 | | Matthww8 quits [Ping timeout: 258 seconds] |
| 22:01:12 | | Matthww81 is now known as Matthww8 |
| 22:01:33 | | Core7787 (Cobalt17) joins |
| 22:05:25 | | Earendil quits [Ping timeout: 258 seconds] |
| 22:08:29 | | Core7787 quits [Ping timeout: 258 seconds] |
| 22:18:04 | | jacobk quits [Ping timeout: 250 seconds] |
| 22:31:43 | | driib8 (driib) joins |
| 22:35:24 | | driib quits [Ping timeout: 250 seconds] |
| 22:35:24 | | driib8 is now known as driib |
| 22:39:25 | | jamesp quits [Remote host closed the connection] |
| 22:47:53 | | jamesp joins |
| 22:53:50 | | Iki joins |
| 22:57:04 | | AntiLiberal joins |
| 23:05:36 | | driib quits [Ping timeout: 258 seconds] |
| 23:31:22 | | Earendil (Cobalt17) joins |
| 23:34:25 | | Core2270 (Cobalt17) joins |
| 23:35:53 | | Iki quits [Ping timeout: 258 seconds] |
| 23:37:48 | | Ajay_m quits [Ping timeout: 258 seconds] |
| 23:38:11 | | Earendil quits [Ping timeout: 258 seconds] |
| 23:45:39 | | WarHawk80 joins |
| 23:45:55 | <WarHawk80> | hello can't rsync my warrior anymore |
| 23:46:05 | <WarHawk80> | im getting this from my client |
| 23:46:24 | <@JAA> | It's being worked on. |
| 23:47:12 | <WarHawk80> | ah ok...cool...wasn't sure if I borked something up |
| 23:47:25 | <WarHawk80> | thanks...I'll leave it running then |
| 23:47:26 | <WarHawk80> | thanks |
| 23:49:05 | <WarHawk80> | cool looks like it's working now...port changed on the rsync...nice! Thanks a lot! |
| 23:52:21 | | WarHawk80 leaves |
| 23:59:28 | | HP_Archivist quits [Ping timeout: 250 seconds] |
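
The FloydHub discussion in this log (search endpoints, the 1023 limit, the maximum offsets, and the byte totals) can be sketched in Python as below. This is only an illustration, not the script anyone in the channel ran. The endpoint paths, the limit of 1023, and the field name `latestSizeBytes` come from the log. The function names, the flat dict shape of a dataset record, and the use of "highest offset + 15" as an upper bound on the item count are assumptions made for the example. The code only builds URLs and does arithmetic; it makes no network requests.

```python
import math
from urllib.parse import urlencode

API_BASE = "https://www.floydhub.com/api/v1"
MAX_LIMIT = 1023  # largest limit the API accepted, according to the log


def search_urls(kind, total_items, limit=MAX_LIMIT):
    """Build the search URLs needed to cover offsets 0..total_items-1.

    kind is the path segment seen in the log: "projects", "datasets"
    or "profile" (the users search).
    """
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    requests_needed = math.ceil(total_items / limit)
    return [
        f"{API_BASE}/{kind}/search?"
        + urlencode({"query": "", "limit": limit, "offset": i * limit})
        for i in range(requests_needed)
    ]


def to_tb(size_bytes):
    """Convert bytes to decimal terabytes (10**12 bytes)."""
    return size_bytes / 10**12


def to_tib(size_bytes):
    """Convert bytes to binary tebibytes (2**40 bytes)."""
    return size_bytes / 2**40


def unique_large_total(datasets, threshold=2**30):
    """Sum each distinct latestSizeBytes value above threshold once.

    Assumes each record is a dict with a top-level "latestSizeBytes" key
    (hypothetical shape). Equal sizes are treated as likely duplicates,
    which is a rough heuristic, not real deduplication.
    """
    sizes = {
        d["latestSizeBytes"]
        for d in datasets
        if d.get("latestSizeBytes", 0) > threshold
    }
    return sum(sizes)


if __name__ == "__main__":
    # Highest project offset in the log was 177090 at limit=15,
    # so there are at most 177090 + 15 projects (assumed upper bound).
    urls = search_urls("projects", 177090 + 15)
    print(len(urls), "requests; last:", urls[-1])
    print(f"{to_tb(35363216376832):.2f} TB = {to_tib(35363216376832):.2f} TiB")
```

Requesting at the maximum limit instead of the default 15 cuts the number of calls per listing by a factor of roughly 68, which matters when the site is being crawled politely. Reporting both TB and TiB avoids the ambiguity between the decimal and binary figures that appear in the conversation.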