| 00:36:56 | | Arcorann (Arcorann) joins |
| 00:37:36 | | wyatt8750 joins |
| 00:40:05 | | fuzzy8021 quits [Read error: Connection reset by peer] |
| 00:40:07 | | wyatt8740 quits [Ping timeout: 245 seconds] |
| 00:40:33 | | fuzzy8021 (fuzzy8021) joins |
| 00:40:58 | | eroc1990 quits [Client Quit] |
| 00:41:00 | | ave9 quits [Quit: Ping timeout (120 seconds)] |
| 00:41:06 | | lun4 quits [Client Quit] |
| 00:41:08 | | dxrt_ quits [Client Quit] |
| 00:41:14 | | seednode4943 quits [Client Quit] |
| 00:41:15 | | igloo22225 quits [Quit: Ping timeout (120 seconds)] |
| 00:41:21 | | eroc1990 (eroc1990) joins |
| 00:41:21 | | seednode4943 (seednode) joins |
| 00:41:24 | | jtagcat62 quits [Quit: Ping timeout (120 seconds)] |
| 00:41:33 | | dxrt joins |
| 00:41:35 | | dxrt is now authenticated as dxrt |
| 00:41:35 | | dxrt quits [Changing host] |
| 00:41:35 | | dxrt (dxrt) joins |
| 00:41:35 | | @ChanServ sets mode: +o dxrt |
| 00:41:54 | | nepeat_ quits [Client Quit] |
| 00:42:13 | | nepeat (nepeat) joins |
| 00:42:39 | | jtagcat62 (jtagcat) joins |
| 00:43:34 | | ave (ave) joins |
| 00:43:38 | | igloo22225 (igloo22225) joins |
| 00:43:39 | | lun4 (lun4) joins |
| 01:15:47 | | BlueMaxima quits [Remote host closed the connection] |
| 01:15:47 | | Arcorann quits [Remote host closed the connection] |
| 01:15:47 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 01:15:47 | | superkuh_ quits [Remote host closed the connection] |
| 01:15:52 | | BlueMaxima joins |
| 01:16:05 | | superkuh_ joins |
| 01:21:54 | | Arcorann (Arcorann) joins |
| 01:24:07 | | qwertyasdfuiopghjkl joins |
| 01:37:06 | | DopefishJustin quits [Remote host closed the connection] |
| 01:40:54 | | tzt_ is now known as tzt |
| 02:38:48 | | DopefishJustin joins |
| 02:38:48 | | DopefishJustin is now authenticated as DopefishJustin |
| 02:57:29 | | Atom joins |
| 03:31:53 | | ArchivalEfforts quits [Ping timeout: 265 seconds] |
| 03:32:31 | | ArchivalEfforts joins |
| 03:46:09 | | systwi quits [Ping timeout: 246 seconds] |
| 03:51:27 | | systwi (systwi) joins |
| 04:35:03 | | mutantmonkey quits [Remote host closed the connection] |
| 04:35:20 | | mutantmonkey (mutantmonkey) joins |
| 04:36:59 | <drexler> | JAA, Hm, my RaveArchive scrape stopped early. |
| 04:37:02 | <drexler> | 180gb |
| 04:37:12 | <drexler> | You're certain it's supposed to be 293gb or whatever? |
| 04:37:20 | <drexler> | Plus it says there are 1707 items in the results, but I didn't get that |
| 04:37:29 | <drexler> | I got 1055 |
| 04:37:34 | <@JAA> | drexler: Well, that's what adding the sizes for all files in all items in that collection gave me, yeah. |
| 04:37:40 | <drexler> | Yeah. |
| 04:37:52 | <drexler> | However I didn't get like, any errors or whatever. |
| 04:37:55 | <drexler> | The scrape just stopped. Weird. |
| 04:38:12 | <@JAA> | How are you downloading? |
| 04:38:19 | <drexler> | Uh, I have a script. |
| 04:38:24 | <drexler> | Here lemme gist it |
| 04:39:07 | <drexler> | https://gist.github.com/JD-P/6d591d13847909fe4e85effbf037c2e0 |
| 04:39:37 | <drexler> | End page was 17, so +2 is 19 |
| 04:39:40 | <drexler> | So it's not the indexing |
| 04:40:15 | <@JAA> | Hmm |
| 04:41:01 | <drexler> | I know right? How odd. |
| 04:42:10 | <@JAA> | Nothing surprises me anymore about IA. :-) |
| 04:42:20 | <drexler> | Yeah, maybe some items are duplicates? |
| 04:42:51 | <@JAA> | The item count should be accurate. But perhaps some aren't publicly accessible. |
| 04:43:31 | <drexler> | Oh, maybe. |
| 04:46:09 | <Jake> | Does it not factor that in? |
| 04:46:21 | <@JAA> | My method does not. |
| 04:46:26 | <@JAA> | But checking now. |
| 04:46:43 | <@JAA> | I really just added the sizes for all files that appear in the metadata. |
| 04:48:52 | <@JAA> | Ok, no access-restricted items in that collection. |
| 04:49:23 | <drexler> | Bizarre |
| 04:50:10 | <@JAA> | There are 25448 files in total in that collection. |
| 04:50:53 | <@JAA> | Or at least that's what `ia metadata` returns. I'm not sure if there might be on-the-fly file formats and similar weirdness. |
| 04:51:05 | <drexler> | find gives me 17793 |
| 04:51:22 | <@JAA> | Don't remember how `ia` handles those by default (and it depends on the version too, I think). |
| 04:51:55 | <drexler> | Maybe there's a bug in my scraping script? It's pretty short, I don't see where there would be. |
| 04:52:05 | <Jake> | Do you possibly have a list of identifiers you've downloaded? |
| 04:52:14 | <drexler> | Yeah I do. |
| 04:52:16 | <drexler> | One moment. |
| 04:52:29 | <drexler> | Um...getting it off this server could take a bit |
| 04:52:53 | <@JAA> | https://transfer.archivete.am/3dONk/collection:ravearchive.jsonl.zst |
| 04:53:01 | <@JAA> | That's the full metadata for the entire collection I get. |
| 04:54:07 | <drexler> | https://gist.github.com/JD-P/4d4dc71e3bf1db1e9323dd0bd788f883 |
| 04:54:22 | <@JAA> | I'm not familiar with advancedsearch.php. Why did you use that rather than the ia package's search? |
| 04:54:47 | <drexler> | Because I was piecing together my scraping script from StackOverflow |
| 04:54:56 | <drexler> | And didn't know the IA package had one |
| 04:55:00 | <@JAA> | Ah :-) |
| 04:55:12 | <@JAA> | Yeah, there is one, and that's what I used to grab the metadata, basically. |
| 04:55:26 | <@JAA> | ia search collection:ravearchive --itemlist | xargs -P 18 -n 100 bash -c 'fn=$(mktemp "ravearchive.XXXXXX"); ia metadata "$@" >"${fn}"' bash |
| 04:55:32 | | sec^nd quits [Ping timeout: 245 seconds] |
| 04:56:01 | <@JAA> | Don't know what it corresponds to in Python. I do most of these things with the CLI. |
| 04:57:41 | <Jake> | I see quite a few identifiers missing from yours that are accessible. One moment. |
| 04:58:45 | <@JAA> | All existing identifiers: https://transfer.archivete.am/12i23Y/collection:ravearchive-identifiers |
| 04:59:29 | <Jake> | https://jakel.rocks/up/8354d89677fd4c74/diffed-ravearchive (and a roughly diffed version...) |
| 05:00:21 | <Jake> | 656, which is 4 off the difference we got and a few artifacts from the original list. |
| 05:00:38 | <drexler> | Yeah. |
| 05:00:46 | <drexler> | Alright, worst case scenario I'll scrape with that, BUT. |
| 05:00:47 | | sec^nd (second) joins |
| 05:00:53 | <Jake> | Why it didn't work originally? No clue. :) |
| 05:00:56 | <drexler> | Let's see if using the IA package search fixes it |
| 05:01:11 | <Jake> | if I were to guess, something weird with advancedsearch.php or something. |
| 05:02:47 | <@JAA> | Yeah, the package search uses a different endpoint by default. It should still work with advancedsearch.php though. |
| 05:03:34 | | G4te_Keep3r3 quits [Client Quit] |
| 05:03:48 | | G4te_Keep3r3 joins |
| 05:04:15 | <@JAA> | Full file list as $identifier/$filename: https://transfer.archivete.am/yosX5/collection:ravearchive-files.zst |
| 05:04:36 | <drexler> | Yeah, the internetarchive library seems to do it correctly |
| 05:04:38 | <drexler> | Bizarre |
| 05:04:48 | <drexler> | Oh well, thanks for the help, I'll use that in the future. |
| 05:04:53 | | systwi__ (systwi) joins |
| 05:05:07 | | systwi quits [Ping timeout: 245 seconds] |
| 05:05:09 | <Jake> | best I can tell, advancedsearch appears to be returning the correct result as well 🤔 |
| 05:06:04 | <drexler> | Yeah, but I actually iterated over the internetarchive search to confirm there are 1707 entries represented :p |
| 05:07:13 | | systwi__ is now known as systwi |
| 05:10:44 | <@JAA> | There weren't any new uploads or similar that could've derailed the pagination either. |
| 05:12:10 | | fuzzy8021 quits [Killed (NickServ (GHOST command used by fuzzy802!~fuzzy8021@173-224-26-244.ptcnet.net))] |
| 05:12:16 | | fuzzy8021 (fuzzy8021) joins |
| 05:15:39 | | BlueMaxima quits [Client Quit] |
| 05:19:57 | <drexler> | It seems to be going now with the missing items, thanks for your help. |
| 05:26:11 | <Jake> | Great! 🥳 |
| 05:26:20 | <Jake> | Glad we could help. |
| 05:26:57 | <@JAA> | I recommend double-checking against the file list at the end to make sure. :-) |
| 05:33:19 | <systwi> | How well do Google Drive folders save in AB? |
| 05:33:26 | <@JAA> | Not |
| 05:33:45 | <systwi> | Is there a better way of saving the data from one? |
| 05:34:11 | <systwi> | (aside from downloading a copy of the data the normal way) |
| 05:34:27 | <@JAA> | #googlecrash in theory, but it's been dormant for a good while now. |
| 06:10:53 | | tech234a quits [Quit: Connection closed for inactivity] |
| 06:11:23 | | systwi quits [Ping timeout: 265 seconds] |
| 07:18:16 | | Mateon2 joins |
| 07:18:36 | | Gaelan_ quits [Client Quit] |
| 07:18:36 | | Mateon1 quits [Remote host closed the connection] |
| 07:18:36 | | Mateon2 is now known as Mateon1 |
| 07:18:52 | | Gaelan (Gaelan) joins |
| 07:46:08 | | dm4v joins |
| 07:46:11 | | dm4v is now authenticated as dm4v |
| 07:46:11 | | dm4v quits [Changing host] |
| 07:46:11 | | dm4v (dm4v) joins |
| 07:53:55 | | dm4v quits [Client Quit] |
| 07:55:25 | | dm4v joins |
| 08:02:24 | | DiscantX joins |
| 08:03:13 | | DiscantX quits [Remote host closed the connection] |
| 08:03:33 | | DiscantX joins |
| 08:22:35 | | Mayk78 joins |
| 08:22:39 | | dm4v quits [Client Quit] |
| 08:22:39 | | DiscantX quits [Remote host closed the connection] |
| 08:22:39 | | Mayk quits [Client Quit] |
| 08:22:43 | | dm4v joins |
| 08:22:48 | | DiscantX joins |
| 08:28:35 | | girst quits [Client Quit] |
| 08:28:53 | | girst (girst) joins |
| 08:58:05 | | adia quits [Client Quit] |
| 08:59:31 | | adia (adia) joins |
| 10:00:58 | | cpina quits [Ping timeout: 265 seconds] |
| 10:07:10 | | cpina joins |
| 10:42:03 | | DiscantX quits [Ping timeout: 265 seconds] |
| 10:46:01 | | mgrytbak quits [Remote host closed the connection] |
| 10:46:06 | | mgrytbak joins |
| 11:01:37 | | driib8 (driib) joins |
| 11:02:48 | | systwi (systwi) joins |
| 11:05:03 | | driib quits [Ping timeout: 246 seconds] |
| 11:05:03 | | driib8 is now known as driib |
| 11:14:13 | | driib quits [Client Quit] |
| 11:14:40 | | driib (driib) joins |
| 11:48:40 | | VerifiedJ quits [Client Quit] |
| 11:49:28 | | VerifiedJ (VerifiedJ) joins |
| 12:35:07 | | Stiletto quits [Ping timeout: 245 seconds] |
| 12:35:53 | | Stiletto joins |
| 12:40:54 | | LeGoupil joins |
| 12:46:03 | | Lord_Nightmare quits [Client Quit] |
| 12:51:44 | | Lord_Nightmare (Lord_Nightmare) joins |
| 13:18:11 | | march_happy (march_happy) joins |
| 13:19:15 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 13:53:08 | | duce1337 is now authenticated as duce1337 |
| 13:53:08 | | duce1337 quits [Changing host] |
| 13:53:08 | | duce1337 (duce1337) joins |
| 14:05:32 | | Arcorann quits [Ping timeout: 245 seconds] |
| 14:07:12 | | march_happy quits [Ping timeout: 245 seconds] |
| 14:07:30 | | march_happy (march_happy) joins |
| 14:12:12 | | jacobk quits [Ping timeout: 245 seconds] |
| 14:57:14 | | jacobk joins |
| 15:08:22 | | systwi quits [Ping timeout: 265 seconds] |
| 15:08:25 | | systwi (systwi) joins |
| 16:01:11 | | jacobk quits [Client Quit] |
| 16:37:59 | | jacobk joins |
| 16:50:07 | | march_happy quits [Ping timeout: 245 seconds] |
| 16:53:28 | | qwertyasdfuiopghjkl joins |
| 17:46:50 | | lennier1 quits [Client Quit] |
| 17:48:03 | | lennier1 (lennier1) joins |
| 18:03:20 | | jacobk quits [Ping timeout: 265 seconds] |
| 18:07:57 | | jacobk joins |
| 18:15:33 | | Minkafighter quits [Client Quit] |
| 18:16:13 | | Minkafighter3 joins |
| 18:20:07 | | Minkafighter3 quits [Client Quit] |
| 18:20:46 | | Minkafighter3 joins |
| 18:22:14 | | Minkafighter3 quits [Client Quit] |
| 18:22:36 | | Minkafighter3 joins |
| 20:02:39 | | jacobk quits [Ping timeout: 246 seconds] |
| 20:51:53 | | LeGoupil quits [Client Quit] |
| 21:09:19 | | jacobk joins |
| 21:14:23 | | DiscantX joins |
| 21:15:57 | | jacobk quits [Ping timeout: 245 seconds] |
| 21:17:44 | <Ryz> | I'm assuming there's still no service or project on archiving Mega.nz content; considering the trouble with https://www.youtube.com/c/SynaMax/videos via https://www.nintendolife.com/news/2022/06/youtuber-ends-metroid-prime-music-covers-after-nintendos-lawyers-call |
| 21:18:09 | <Ryz> | Seeing https://www.youtube.com/post/UgkxhGvZreruILaguvaarzuhX8ss0KOTk73N having this link: https://mega.nz/file/jzBXmDgD#fA1M68kLf_Bqmwl6m2FcW9BBH5dcJGvO1uQbB_UQH58 |
| 21:19:19 | <@arkiver> | how often do we get MEGA stuff that needs to be archived? |
| 21:26:11 | <@arkiver> | Ryz: ^ |
| 21:30:11 | <Ryz> | Trying to find any additional examples in my text files; can't fully represent the others but myself (although I did see some Mega.nz links mentioned in this channel via the logs); |
| 21:33:50 | <Ryz> | Google Drive seems to be used more than Mega.nz - as not only everyday folk use it but also companies and government (I really /wish/ I saved that link for reference) |
| 21:35:15 | | lennier1 quits [Client Quit] |
| 21:35:44 | | lennier1 (lennier1) joins |
| 21:36:03 | <Ryz> | One example is https://mega.nz/#F!8fZAgYQZ!JVt6ZeIdZlSeb3JZn-Pmkg (from https://azogstudio.blogspot.com/ - I haven't archived that yet because, to ensure full coverage, I want to archive that link and potentially other links) - admittedly it's more of a want than a need, but what I seem to be learning more and more over the years is that websites |
| 21:36:03 | <Ryz> | tend to suddenly poof in as little as 3 months |
| 21:36:35 | <Ryz> | Plus me being proactive and archiving websites that won't announce they'll be shutting down that just suddenly poof :/ |
| 21:45:33 | | igloo222250 joins |
| 21:45:37 | | driib5 (driib) joins |
| 21:45:44 | | marto_7 joins |
| 21:45:49 | | Craigle3 (Craigle) joins |
| 21:45:49 | | flashfire421 (flashfire42) joins |
| 21:46:20 | | Ryz3 (Ryz) joins |
| 21:46:38 | | drin joins |
| 21:46:45 | | chfoo (chfoo) joins |
| 21:46:45 | | @ChanServ sets mode: +o chfoo |
| 21:46:46 | | Frogging- joins |
| 21:46:48 | | geezabiscuit quits [Client Quit] |
| 21:46:48 | | driib quits [Client Quit] |
| 21:46:48 | | Craigle quits [Client Quit] |
| 21:46:48 | | billy549- quits [Ping timeout: 265 seconds] |
| 21:46:48 | | eroc1990 quits [Client Quit] |
| 21:46:48 | | igloo22225 quits [Client Quit] |
| 21:46:48 | | flashfire42 quits [Client Quit] |
| 21:46:48 | | @chfoo_ quits [Excess Flood] |
| 21:46:48 | | Ryz quits [Client Quit] |
| 21:46:48 | | marto_ quits [Client Quit] |
| 21:46:48 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 21:46:48 | | Frogging101 quits [Remote host closed the connection] |
| 21:46:48 | | Craigle3 is now known as Craigle |
| 21:46:48 | | driib5 is now known as driib |
| 21:46:48 | | Ryz3 is now known as Ryz |
| 21:46:48 | | marto_7 is now known as marto_ |
| 21:46:48 | | igloo222250 is now known as igloo22225 |
| 21:46:48 | | flashfire421 is now known as flashfire42 |
| 21:46:49 | | jspiros_ quits [Client Quit] |
| 21:46:49 | | katocala quits [Remote host closed the connection] |
| 21:46:50 | | jspiros (jspiros) joins |
| 21:46:58 | | Frogging- is now known as Frogging101 |
| 21:47:21 | | drin is now known as geezabiscuit |
| 21:47:27 | | katocala joins |
| 21:48:46 | | billy549 (Billy549) joins |
| 21:50:58 | <Ryz> | arkiver ^ |
| 21:55:15 | | katocala is now authenticated as katocala |
| 22:06:26 | | march_happy (march_happy) joins |
| 22:46:13 | | march_happy quits [Remote host closed the connection] |
| 22:49:00 | | march_happy (march_happy) joins |
| 23:07:58 | | Discant joins |
| 23:09:01 | | Discant quits [Remote host closed the connection] |
| 23:10:02 | | Discant joins |
| 23:11:22 | | DiscantX quits [Ping timeout: 245 seconds] |
| 23:14:17 | | Discant quits [Ping timeout: 245 seconds] |
| 23:26:49 | | jacobk joins |
| 23:42:19 | | tech234a (tech234a) joins |
| 23:54:52 | | Hackerpcs quits [Quit: Hackerpcs] |
| 23:57:06 | | Hackerpcs (Hackerpcs) joins |
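
The cross-check from the 04:57 to 05:00 exchange (comparing the collection's full identifier list against the identifiers that were actually downloaded) can be sketched roughly as below. This is only an illustration and not the method Jake or JAA used; the file names `expected-identifiers.txt` and `downloaded-identifiers.txt` and both helper functions are hypothetical, and the sketch assumes each file holds one identifier per line.

```python
from pathlib import Path


def read_identifiers(path):
    """Return the set of non-empty, whitespace-stripped lines in a text file."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def missing_identifiers(expected, downloaded):
    """Return the identifiers present in `expected` but absent from `downloaded`, sorted."""
    return sorted(set(expected) - set(downloaded))


if __name__ == "__main__":
    # Hypothetical file names: one identifier per line in each file.
    expected_path = Path("expected-identifiers.txt")
    downloaded_path = Path("downloaded-identifiers.txt")
    if expected_path.exists() and downloaded_path.exists():
        missing = missing_identifiers(
            read_identifiers(expected_path),
            read_identifiers(downloaded_path),
        )
        print(f"{len(missing)} identifiers still missing")
        for identifier in missing:
            print(identifier)
```

The same set difference would presumably also work for the final double-check suggested at 05:26:57, by feeding it `$identifier/$filename` lines from the file list and the matching relative paths of the local download instead of bare identifiers.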