00:36:56Arcorann (Arcorann) joins
00:37:36wyatt8750 joins
00:40:05fuzzy8021 quits [Read error: Connection reset by peer]
00:40:07wyatt8740 quits [Ping timeout: 245 seconds]
00:40:33fuzzy8021 (fuzzy8021) joins
00:40:58eroc1990 quits [Client Quit]
00:41:00ave9 quits [Quit: Ping timeout (120 seconds)]
00:41:06lun4 quits [Client Quit]
00:41:08dxrt_ quits [Client Quit]
00:41:14seednode4943 quits [Client Quit]
00:41:15igloo22225 quits [Quit: Ping timeout (120 seconds)]
00:41:21eroc1990 (eroc1990) joins
00:41:21seednode4943 (seednode) joins
00:41:24jtagcat62 quits [Quit: Ping timeout (120 seconds)]
00:41:33dxrt joins
00:41:35dxrt quits [Changing host]
00:41:35dxrt (dxrt) joins
00:41:35@ChanServ sets mode: +o dxrt
00:41:54nepeat_ quits [Client Quit]
00:42:13nepeat (nepeat) joins
00:42:39jtagcat62 (jtagcat) joins
00:43:34ave (ave) joins
00:43:38igloo22225 (igloo22225) joins
00:43:39lun4 (lun4) joins
01:15:47BlueMaxima quits [Remote host closed the connection]
01:15:47Arcorann quits [Remote host closed the connection]
01:15:47qwertyasdfuiopghjkl quits [Remote host closed the connection]
01:15:47superkuh_ quits [Remote host closed the connection]
01:15:52BlueMaxima joins
01:16:05superkuh_ joins
01:21:54Arcorann (Arcorann) joins
01:24:07qwertyasdfuiopghjkl joins
01:37:06DopefishJustin quits [Remote host closed the connection]
01:40:54tzt_ is now known as tzt
02:38:48DopefishJustin joins
02:57:29Atom joins
03:31:53ArchivalEfforts quits [Ping timeout: 265 seconds]
03:32:31ArchivalEfforts joins
03:46:09systwi quits [Ping timeout: 246 seconds]
03:51:27systwi (systwi) joins
04:35:03mutantmonkey quits [Remote host closed the connection]
04:35:20mutantmonkey (mutantmonkey) joins
04:36:59<drexler>JAA, Hm, my RaveArchive scrape stopped early.
04:37:02<drexler>180gb
04:37:12<drexler>You're certain it's supposed to be 293gb or whatever?
04:37:20<drexler>Plus it says there are 1707 items in the results, but I didn't get that
04:37:29<drexler>I got 1055
04:37:34<@JAA>drexler: Well, that's what adding the sizes for all files in all items in that collection gave me, yeah.
04:37:40<drexler>Yeah.
04:37:52<drexler>However I didn't get like, any errors or whatever.
04:37:55<drexler>The scrape just stopped. Weird.
04:38:12<@JAA>How are you downloading?
04:38:19<drexler>Uh, I have a script.
04:38:24<drexler>Here lemme gist it
04:39:07<drexler>https://gist.github.com/JD-P/6d591d13847909fe4e85effbf037c2e0
04:39:37<drexler>End page was 17, so +2 is 19
04:39:40<drexler>So it's not the indexing
04:40:15<@JAA>Hmm
04:41:01<drexler>I know right? How odd.
04:42:10<@JAA>Nothing surprises me anymore about IA. :-)
04:42:20<drexler>Yeah, maybe some items are duplicates?
04:42:51<@JAA>The item count should be accurate. But perhaps some aren't publicly accessible.
04:43:31<drexler>Oh, maybe.
04:46:09<Jake>Does it not factor that in?
04:46:21<@JAA>My method does not.
04:46:26<@JAA>But checking now.
04:46:43<@JAA>I really just added the sizes for all files that appear in the metadata.
04:48:52<@JAA>Ok, no access-restricted items in that collection.
04:49:23<drexler>Bizarre
04:50:10<@JAA>There are 25448 files in total in that collection.
04:50:53<@JAA>Or at least that's what `ia metadata` returns. I'm not sure if there might be on-the-fly file formats and similar weirdness.
04:51:05<drexler>find gives me 17793
04:51:22<@JAA>Don't remember how `ia` handles those by default (and it depends on the version too, I think).
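[Editor's note] The totals JAA describes (item count, file count, summed file sizes) can be recomputed from a metadata dump like the one linked later in the log. This is only a sketch, not JAA's actual method. It assumes the dump has already been decompressed to one JSON record per line, that each record has a "files" list, and that sizes arrive as decimal strings with some entries carrying no size at all. The function name and sample records are hypothetical.

```python
import json


def summarize_items(jsonl_lines):
    """Return (item_count, file_count, total_bytes) for item metadata records."""
    items = files = total = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the dump
        record = json.loads(line)
        items += 1
        for entry in record.get("files", []):
            files += 1
            # Assumption: "size" is a decimal string; entries without one count as 0 bytes.
            total += int(entry.get("size", 0))
    return items, files, total


# Made-up sample records, not real collection data.
sample = [
    '{"metadata": {"identifier": "a"}, "files": [{"name": "x.mp3", "size": "10"}, {"name": "a_files.xml"}]}',
    '{"metadata": {"identifier": "b"}, "files": [{"name": "y.flac", "size": "32"}]}',
]
print(summarize_items(sample))  # (2, 3, 42)
```

A file count obtained this way can exceed what `find` reports locally if the metadata lists files that a given download method skips, which is the possibility raised above.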
04:51:55<drexler>Maybe there's a bug in my scraping script? It's pretty short, I don't see where there would be.
04:52:05<Jake>Do you possibly have a list of identifiers you've downloaded?
04:52:14<drexler>Yeah I do.
04:52:16<drexler>One moment.
04:52:29<drexler>Um...getting it off this server could take a bit
04:52:53<@JAA>https://transfer.archivete.am/3dONk/collection:ravearchive.jsonl.zst
04:53:01<@JAA>That's the full metadata for the entire collection I get.
04:54:07<drexler>https://gist.github.com/JD-P/4d4dc71e3bf1db1e9323dd0bd788f883
04:54:22<@JAA>I'm not familiar with advancedsearch.php. Why did you use that rather than the ia package's search?
04:54:47<drexler>Because I was piecing together my scraping script from StackOverflow
04:54:56<drexler>And didn't know the IA package had one
04:55:00<@JAA>Ah :-)
04:55:12<@JAA>Yeah, there is one, and that's what I used to grab the metadata, basically.
04:55:26<@JAA>ia search collection:ravearchive --itemlist | xargs -P 18 -n 100 bash -c 'fn=$(mktemp "ravearchive.XXXXXX"); ia metadata "$@" >"${fn}"' bash
04:55:32sec^nd quits [Ping timeout: 245 seconds]
04:56:01<@JAA>Don't know what it corresponds to in Python. I do most of these things with the CLI.
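[Editor's note] A rough Python counterpart to the `ia search ... --itemlist` step, offered as a sketch rather than an exact translation of the command above. It assumes the third-party `internetarchive` package is installed and that iterating over `search_items(query)` yields dicts with an "identifier" key. The helper name and the injectable `search` parameter are hypothetical, added so the function can be exercised without network access.

```python
def collect_identifiers(query, search=None):
    """Return the sorted, de-duplicated identifiers matching a search query."""
    if search is None:
        # Imported lazily so the sketch can be tested without the package installed.
        from internetarchive import search_items
        search = search_items
    return sorted({result["identifier"] for result in search(query)})


# Offline demonstration with a stand-in search function and made-up identifiers.
def fake_search(query):
    return [{"identifier": "item-b"}, {"identifier": "item-a"}, {"identifier": "item-a"}]


print(collect_identifiers("collection:example", search=fake_search))
# ['item-a', 'item-b']
```

With the real package, `len(collect_identifiers("collection:ravearchive"))` is the number to compare against the item count shown in the search results.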
04:57:41<Jake>I see quite a few identifiers missing from yours that are accessible. One moment.
04:58:45<@JAA>All existing identifiers: https://transfer.archivete.am/12i23Y/collection:ravearchive-identifiers
04:59:29<Jake>https://jakel.rocks/up/8354d89677fd4c74/diffed-ravearchive (and a roughly diffed version...)
05:00:21<Jake>656 missing, which is 4 off the difference we expected (1707 - 1055 = 652), plus a few artifacts from the original list.
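[Editor's note] The rough diff Jake describes amounts to a set difference between the full identifier list and the list of identifiers already downloaded. This is a minimal sketch, not the tool Jake used. It assumes both lists hold one identifier per line; the function name and sample identifiers are hypothetical.

```python
def missing_identifiers(expected, downloaded):
    """Return identifiers present in `expected` but absent from `downloaded`.

    Order follows `expected`; blank lines and duplicates are ignored.
    """
    have = {line.strip() for line in downloaded if line.strip()}
    seen = set()
    missing = []
    for line in expected:
        ident = line.strip()
        if ident and ident not in have and ident not in seen:
            seen.add(ident)
            missing.append(ident)
    return missing


# Made-up identifiers; "stray-line" stands in for an artifact in the downloaded list.
expected = ["item-a", "item-b", "item-c", "item-d"]
downloaded = ["item-b\n", "item-d\n", "stray-line\n"]
print(missing_identifiers(expected, downloaded))  # ['item-a', 'item-c']
```

Stray lines in the downloaded list never show up as missing here. A plain textual diff can report them, which is one way a rough diff ends up a few entries off.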
05:00:38<drexler>Yeah.
05:00:46<drexler>Alright, worst case scenario I'll scrape with that, BUT.
05:00:47sec^nd (second) joins
05:00:53<Jake>Why it didn't work originally? No clue. :)
05:00:56<drexler>Let's see if using the IA package search fixes it
05:01:11<Jake>if I were to guess, something weird with advancedsearch.php or something.
05:02:47<@JAA>Yeah, the package search uses a different endpoint by default. It should still work with advancedsearch.php though.
05:03:34G4te_Keep3r3 quits [Client Quit]
05:03:48G4te_Keep3r3 joins
05:04:15<@JAA>Full file list as $identifier/$filename: https://transfer.archivete.am/yosX5/collection:ravearchive-files.zst
05:04:36<drexler>Yeah, the internetarchive library seems to do it correctly
05:04:38<drexler>Bizarre
05:04:48<drexler>Oh well, thanks for the help, I'll use that in the future.
05:04:53systwi__ (systwi) joins
05:05:07systwi quits [Ping timeout: 245 seconds]
05:05:09<Jake>best I can tell, advancedsearch appears to be returning the correct result as well 🤔
05:06:04<drexler>Yeah, but I actually iterated over the internetarchive search to confirm there are 1707 entries represented :p
05:07:13systwi__ is now known as systwi
05:10:44<@JAA>There weren't any new uploads or similar that could've derailed the pagination either.
05:12:10fuzzy8021 quits [Killed (NickServ (GHOST command used by fuzzy802!~fuzzy8021@173-224-26-244.ptcnet.net))]
05:12:16fuzzy8021 (fuzzy8021) joins
05:15:39BlueMaxima quits [Client Quit]
05:19:57<drexler>It seems to be going now with the missing items, thanks for your help.
05:26:11<Jake>Great! 🥳
05:26:20<Jake>Glad we could help.
05:26:57<@JAA>I recommend double-checking against the file list at the end to make sure. :-)
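[Editor's note] The final double-check JAA recommends can be sketched as a comparison between the file list (entries of the form `$identifier/$filename`) and what exists on disk. It assumes the download directory holds one subdirectory per identifier and that the file list has been decompressed to one path per line. The function name and sample paths are hypothetical.

```python
import os
import tempfile


def missing_files(root, expected_paths):
    """Return the expected relative paths that do not exist as files under `root`."""
    missing = []
    for rel in expected_paths:
        rel = rel.strip()
        # Entries use "/" separators; split so the check also works on other platforms.
        if rel and not os.path.isfile(os.path.join(root, *rel.split("/"))):
            missing.append(rel)
    return missing


# Demonstration against a throwaway directory with made-up names.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "item-a"))
    with open(os.path.join(root, "item-a", "set1.mp3"), "w") as fh:
        fh.write("placeholder")
    result = missing_files(root, ["item-a/set1.mp3", "item-a/set2.mp3", "item-b/mix.flac"])

print(result)  # ['item-a/set2.mp3', 'item-b/mix.flac']
```

This only checks that each file exists. Comparing sizes or checksums against the metadata would be a stricter check.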
05:33:19<systwi>How well do Google Drive folders save in AB?
05:33:26<@JAA>Not
05:33:45<systwi>Is there a better way of saving the data from one?
05:34:11<systwi>(aside from downloading a copy of the data the normal way)
05:34:27<@JAA>#googlecrash in theory, but it's been dormant for a good while now.
06:10:53tech234a quits [Quit: Connection closed for inactivity]
06:11:23systwi quits [Ping timeout: 265 seconds]
07:18:16Mateon2 joins
07:18:36Gaelan_ quits [Client Quit]
07:18:36Mateon1 quits [Remote host closed the connection]
07:18:36Mateon2 is now known as Mateon1
07:18:52Gaelan (Gaelan) joins
07:46:08dm4v joins
07:46:11dm4v quits [Changing host]
07:46:11dm4v (dm4v) joins
07:53:55dm4v quits [Client Quit]
07:55:25dm4v joins
08:02:24DiscantX joins
08:03:13DiscantX quits [Remote host closed the connection]
08:03:33DiscantX joins
08:22:35Mayk78 joins
08:22:39dm4v quits [Client Quit]
08:22:39DiscantX quits [Remote host closed the connection]
08:22:39Mayk quits [Client Quit]
08:22:43dm4v joins
08:22:48DiscantX joins
08:28:35girst quits [Client Quit]
08:28:53girst (girst) joins
08:58:05adia quits [Client Quit]
08:59:31adia (adia) joins
10:00:58cpina quits [Ping timeout: 265 seconds]
10:07:10cpina joins
10:42:03DiscantX quits [Ping timeout: 265 seconds]
10:46:01mgrytbak quits [Remote host closed the connection]
10:46:06mgrytbak joins
11:01:37driib8 (driib) joins
11:02:48systwi (systwi) joins
11:05:03driib quits [Ping timeout: 246 seconds]
11:05:03driib8 is now known as driib
11:14:13driib quits [Client Quit]
11:14:40driib (driib) joins
11:48:40VerifiedJ quits [Client Quit]
11:49:28VerifiedJ (VerifiedJ) joins
12:35:07Stiletto quits [Ping timeout: 245 seconds]
12:35:53Stiletto joins
12:40:54LeGoupil joins
12:46:03Lord_Nightmare quits [Client Quit]
12:51:44Lord_Nightmare (Lord_Nightmare) joins
13:18:11march_happy (march_happy) joins
13:19:15qwertyasdfuiopghjkl quits [Remote host closed the connection]
13:53:08duce1337 quits [Changing host]
13:53:08duce1337 (duce1337) joins
14:05:32Arcorann quits [Ping timeout: 245 seconds]
14:07:12march_happy quits [Ping timeout: 245 seconds]
14:07:30march_happy (march_happy) joins
14:12:12jacobk quits [Ping timeout: 245 seconds]
14:57:14jacobk joins
15:08:22systwi quits [Ping timeout: 265 seconds]
15:08:25systwi (systwi) joins
16:01:11jacobk quits [Client Quit]
16:37:59jacobk joins
16:50:07march_happy quits [Ping timeout: 245 seconds]
16:53:28qwertyasdfuiopghjkl joins
17:46:50lennier1 quits [Client Quit]
17:48:03lennier1 (lennier1) joins
18:03:20jacobk quits [Ping timeout: 265 seconds]
18:07:57jacobk joins
18:15:33Minkafighter quits [Client Quit]
18:16:13Minkafighter3 joins
18:20:07Minkafighter3 quits [Client Quit]
18:20:46Minkafighter3 joins
18:22:14Minkafighter3 quits [Client Quit]
18:22:36Minkafighter3 joins
20:02:39jacobk quits [Ping timeout: 246 seconds]
20:51:53LeGoupil quits [Client Quit]
21:09:19jacobk joins
21:14:23DiscantX joins
21:15:57jacobk quits [Ping timeout: 245 seconds]
21:17:44<Ryz>I'm assuming there's still no service or project on archiving Mega.nz content; considering the trouble with https://www.youtube.com/c/SynaMax/videos via https://www.nintendolife.com/news/2022/06/youtuber-ends-metroid-prime-music-covers-after-nintendos-lawyers-call
21:18:09<Ryz>Seeing https://www.youtube.com/post/UgkxhGvZreruILaguvaarzuhX8ss0KOTk73N having this link: https://mega.nz/file/jzBXmDgD#fA1M68kLf_Bqmwl6m2FcW9BBH5dcJGvO1uQbB_UQH58
21:19:19<@arkiver>how often do we get MEGA stuff that needs to be archived?
21:26:11<@arkiver>Ryz: ^
21:30:11<Ryz>Trying to find any additional examples in my text files; I can't speak for the others, only for myself (although I did see some Mega.nz links mentioned in this channel via the logs)
21:33:50<Ryz>Google Drive seems to be used more than Mega.nz - as not only everyday folk use it but also companies and government (I really /wish/ I saved that link for reference)
21:35:15lennier1 quits [Client Quit]
21:35:44lennier1 (lennier1) joins
21:36:03<Ryz>One example is https://mega.nz/#F!8fZAgYQZ!JVt6ZeIdZlSeb3JZn-Pmkg (from https://azogstudio.blogspot.com/ - I haven't archived that yet because, to ensure full coverage, I want to archive that link and potentially other links) - admittedly it's more of a want than a need, but what I seem to be learning more and more over the years is that websites
21:36:03<Ryz>tend to suddenly poof in as little as 3 months
21:36:35<Ryz>Plus me being proactive and archiving websites that won't announce they'll be shutting down that just suddenly poof :/
21:45:33igloo222250 joins
21:45:37driib5 (driib) joins
21:45:44marto_7 joins
21:45:49Craigle3 (Craigle) joins
21:45:49flashfire421 (flashfire42) joins
21:46:20Ryz3 (Ryz) joins
21:46:38drin joins
21:46:45chfoo (chfoo) joins
21:46:45@ChanServ sets mode: +o chfoo
21:46:46Frogging- joins
21:46:48geezabiscuit quits [Client Quit]
21:46:48driib quits [Client Quit]
21:46:48Craigle quits [Client Quit]
21:46:48billy549- quits [Ping timeout: 265 seconds]
21:46:48eroc1990 quits [Client Quit]
21:46:48igloo22225 quits [Client Quit]
21:46:48flashfire42 quits [Client Quit]
21:46:48@chfoo_ quits [Excess Flood]
21:46:48Ryz quits [Client Quit]
21:46:48marto_ quits [Client Quit]
21:46:48qwertyasdfuiopghjkl quits [Client Quit]
21:46:48Frogging101 quits [Remote host closed the connection]
21:46:48Craigle3 is now known as Craigle
21:46:48driib5 is now known as driib
21:46:48Ryz3 is now known as Ryz
21:46:48marto_7 is now known as marto_
21:46:48igloo222250 is now known as igloo22225
21:46:48flashfire421 is now known as flashfire42
21:46:49jspiros_ quits [Client Quit]
21:46:49katocala quits [Remote host closed the connection]
21:46:50jspiros (jspiros) joins
21:46:58Frogging- is now known as Frogging101
21:47:21drin is now known as geezabiscuit
21:47:27katocala joins
21:48:46billy549 (Billy549) joins
21:50:58<Ryz>arkiver ^
22:06:26march_happy (march_happy) joins
22:46:13march_happy quits [Remote host closed the connection]
22:49:00march_happy (march_happy) joins
23:07:58Discant joins
23:09:01Discant quits [Remote host closed the connection]
23:10:02Discant joins
23:11:22DiscantX quits [Ping timeout: 245 seconds]
23:14:17Discant quits [Ping timeout: 245 seconds]
23:26:49jacobk joins
23:42:19tech234a (tech234a) joins
23:54:52Hackerpcs quits [Quit: Hackerpcs]
23:57:06Hackerpcs (Hackerpcs) joins