00:06:33etnguyen03 quits [Ping timeout: 272 seconds]
00:10:38aninternettroll quits [Ping timeout: 265 seconds]
00:13:31Bleo18 quits [Client Quit]
00:13:48Bleo18 joins
00:16:11etnguyen03 (etnguyen03) joins
00:21:37<cdreimanu>i was browsing the comments of an archival related hackernews post recently, and apparently there’s this huge collection of bulgarian music (digitized LPs if I understood correctly) kept by some russian bloke. i have no idea how to go about this. https://gramofonche.chitanka.info/
00:22:04<cdreimanu>it’s the comment by svilen_dobrev here, for context: https://news.ycombinator.com/item?id=38288020
00:22:25<cdreimanu>what do y’all think?
00:57:51etnguyen03 quits [Ping timeout: 272 seconds]
01:47:36etnguyen03 (etnguyen03) joins
01:55:28cdreimanu quits [Remote host closed the connection]
01:57:48Arcorann quits [Remote host closed the connection]
01:57:48DogsRNice quits [Remote host closed the connection]
01:58:06DogsRNice joins
02:02:27etnguyen03 quits [Ping timeout: 272 seconds]
02:03:55<@JAA>44000 hours of security camera footage from the Capitol during the insurrection will get released over the next few months. That'll be interesting to archive.
02:03:58Arcorann (Arcorann) joins
02:04:22<@JAA>https://www.theguardian.com/us-news/2023/nov/17/mike-johnson-january-6-video-footage
02:26:49etnguyen03 (etnguyen03) joins
02:43:37Billy549 quits [Ping timeout: 272 seconds]
02:51:28qwertyasdfuiopghjkl quits [Client Quit]
02:53:50Naruyoko5 quits [Read error: Connection reset by peer]
03:00:27emily quits [Quit: ZNC 1.8.2 - https://znc.in]
03:01:08pseudorizer (pseudorizer) joins
03:03:53Barto quits [Ping timeout: 272 seconds]
03:30:29etnguyen03 quits [Ping timeout: 272 seconds]
03:37:28etnguyen03 (etnguyen03) joins
03:41:22Billy549 (Billy549) joins
04:12:20shinji257 leaves [https://quassel-irc.org - Chat comfortably. Anywhere.]
04:17:20Bleo182 joins
04:17:33Bleo18 quits [Client Quit]
04:17:34pseudorizer quits [Client Quit]
04:17:34Bleo182 is now known as Bleo18
04:17:54pseudorizer (pseudorizer) joins
04:20:45yasomi (yasomi) joins
04:22:29Wohlstand (Wohlstand) joins
04:28:08DogsRNice quits [Read error: Connection reset by peer]
04:38:55<@JAA>So for archive.mozilla.org, there is one directory I couldn't list due to timeouts (on my side, but the server's timeout is only marginally higher, and it still fails), namely /pub/firefox/tinderbox-builds/autoland-macosx64-debug/. Apart from that, there are four 404s (/pub/seamonkey/oldnightly/testing/testing/, /pub/seamonkey/nightly/nightly/, /pub/labs/devtools/master/master/, and
04:39:01<@JAA>/pub/seamonkey/oldnightly/2021-09-01-21-00-03-comm-253/2021-09-01-21-00-03-comm-253/). Everything else seems to have been retrieved properly.
04:56:45<pokechu22>JAA: IA gives https://archive.mozilla.org/pub/firefox/tinderbox-builds/autoland-macosx64-debug/1477331902/ (from https://web.archive.org/web/*/archive.mozilla.org/pub/firefox/tinderbox-builds/autoland-macosx64-debug*)
04:57:52<pokechu22>based on https://archive.mozilla.org/pub/firefox/tinderbox-builds/autoland-macosx64/1477331902/ existing too you probably can guess a list
05:00:42<@JAA>There are 96867265 files in the dirs I managed to list.
05:01:28<@JAA>2564939 of them exceed 100 MiB.
05:01:45<@JAA>29015882 are over 10 MiB.
05:02:02<@JAA>36821201 over 1 MiB
05:03:06<@JAA>The >= 100 MiB files are 438.52 TiB in total.
05:03:33<@JAA>My size summing tool is rather slow, so I can't get a total size right now. Need to find a way to make that faster first.
05:13:05dumbgoy quits [Ping timeout: 272 seconds]
05:13:43Billy549 quits [Ping timeout: 272 seconds]
05:14:04Billy549 (Billy549) joins
05:25:28dude joins
05:26:24dude quits [Remote host closed the connection]
05:27:29onetruth joins
05:40:34qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
05:51:05Wohlstand quits [Client Quit]
05:53:42Barto (Barto) joins
05:59:52etnguyen03 quits [Client Quit]
06:11:26Wohlstand (Wohlstand) joins
06:17:52BlueMaxima quits [Read error: Connection reset by peer]
06:35:14Island quits [Read error: Connection reset by peer]
06:50:35Naruyoko joins
07:03:24aninternettroll (aninternettroll) joins
07:06:47nicolas17 quits [Ping timeout: 265 seconds]
07:16:00Wohlstand quits [Client Quit]
07:49:01<@JAA>Ok, I have a total size: 1.66 PiB
07:49:05<@JAA>arkiver: ^
07:57:10c3manu (c3manu) joins
08:37:35onetruth quits [Remote host closed the connection]
08:37:35c3manu quits [Remote host closed the connection]
08:37:37onetruth joins
08:37:51c3manu (c3manu) joins
10:00:00Bleo18 quits [Client Quit]
10:01:22Bleo182 joins
10:53:07pie_[bnc] quits []
10:53:10pie_ joins
10:55:45onetruth quits [Remote host closed the connection]
10:55:45c3manu quits [Remote host closed the connection]
10:55:50c3manu (c3manu) joins
10:56:06onetruth joins
10:58:15pie_ quits [Ping timeout: 272 seconds]
10:58:57IDK (IDK) joins
11:05:06onetruth quits [Remote host closed the connection]
11:05:17onetruth joins
11:06:14pie_ joins
11:07:55IDK quits [Max SendQ exceeded]
11:08:42IDK (IDK) joins
11:10:01onetruth quits [Remote host closed the connection]
11:10:08onetruth joins
11:12:49Ruthalas59 quits [Ping timeout: 272 seconds]
11:17:48IDK quits [Max SendQ exceeded]
11:18:35IDK (IDK) joins
11:19:56Ruthalas59 (Ruthalas) joins
12:35:13<c3manu>you reckon I can just try running https://gramofonche.chitanka.info/ through the bot with one connection?
12:59:51Arcorann quits [Ping timeout: 272 seconds]
13:07:52decky joins
13:10:37decky_e quits [Ping timeout: 272 seconds]
13:23:19onetruth quits [Remote host closed the connection]
13:23:19pseudorizer quits [Client Quit]
13:23:30onetruth joins
13:23:38pseudorizer (pseudorizer) joins
13:32:11onetruth quits [Remote host closed the connection]
13:32:21onetruth joins
13:40:30katocala joins
13:45:51icedice (icedice) joins
13:46:56icedice quits [Client Quit]
13:55:09katocala quits [Remote host closed the connection]
13:58:20dumbgoy joins
14:03:26@OrIdow6^2 is now known as @OrIdow6
14:19:39project10 quits [Ping timeout: 272 seconds]
14:20:21project10 (project10) joins
14:56:10hartbart joins
14:57:47hartbart quits [Remote host closed the connection]
14:58:04riku joins
15:04:02dumbgoy_ joins
15:07:09Wohlstand (Wohlstand) joins
15:07:13dumbgoy quits [Ping timeout: 265 seconds]
15:16:20riku quits [Client Quit]
15:24:09katocala joins
15:26:19<@arkiver>JAA: ouch
15:28:26<@arkiver>perhaps we could archive part of it...
15:28:37<@arkiver>is most of this size in a certain part of the repository?
15:28:49<@arkiver>we could get a sample of that, while mirroring the stuff outside of that
15:33:50onetruth quits [Remote host closed the connection]
15:33:50Wohlstand quits [Remote host closed the connection]
15:33:50Wohlstand1 (Wohlstand) joins
15:34:11onetruth joins
15:35:53onetruth quits [Remote host closed the connection]
15:36:10Wohlstand1 is now known as Wohlstand
15:36:11onetruth joins
15:36:36etnguyen03 (etnguyen03) joins
15:44:24Wohlstand quits [Remote host closed the connection]
15:44:24onetruth quits [Remote host closed the connection]
15:44:37Wohlstand (Wohlstand) joins
15:44:39onetruth joins
15:49:32kiryu quits [Remote host closed the connection]
15:52:04project10 quits [Client Quit]
15:52:04Ruthalas59 quits [Client Quit]
15:52:04Wohlstand quits [Remote host closed the connection]
15:52:17us3rrr joins
15:52:17Ruthalas59 (Ruthalas) joins
15:52:19kiryu joins
15:52:19kiryu quits [Changing host]
15:52:19kiryu (kiryu) joins
15:52:23Wohlstand (Wohlstand) joins
15:52:35project10 (project10) joins
15:52:41<Pedrosso>I was using DiscordChatExporter to make a total archive, however, I ran out of disk space whilst it was already a long way through. If I restart it with the same output file, does anyone know if it will resume the save, or overwrite what it's already done? I'd really rather not manually write in the IDs of the many guilds it missed
15:53:18<Pedrosso>(assuming I have more disk space now, which I do)
15:56:31onetruth quits [Ping timeout: 265 seconds]
16:00:24<h2ibot>JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=51164&oldid=51159
16:00:38<c3manu>!ig 1ticp48603jvfw5izv5xfs8vc ^https?://wheelmap\.org/
16:01:43<thuban>c3manu: psst, wrong channel
16:01:54<c3manu>lol, thanks m)
16:35:15DogsRNice joins
16:49:45Megame (Megame) joins
16:52:02<Pedrosso>^ from that above, I think I'll just write a code to load in the rest of the IDs
16:52:25<Pedrosso>https://hackint.logs.kiska.pw/archiveteam-bs/20231112#:~:text=talking%20to%20ia%20about%20it https://hackint.logs.kiska.pw/archiveteam-bs/20231112#:~:text=Either%20directly%20or%20through%20arkiver
16:52:25<Pedrosso>JAA: I've been downloading and I do want to follow this advice, however, I'm not quite sure how to contact them about something such as this. You mentioned I could through ar-kiver?
17:05:56io joins
17:06:24io quits [Remote host closed the connection]
17:06:42iammyself joins
17:07:07iammyself leaves
17:17:03<@JAA>arkiver: I'm afraid I don't have stats on parts of the server. It's large enough to be a pain to work with in general.
17:18:08driib quits [Client Quit]
17:18:32<@JAA>Pedrosso: #discard for Discord stuff
17:18:39<Pedrosso>thx
17:19:16<@JAA>Pedrosso: Yes, arkiver is the person to speak to here about getting permission for uploading large amounts of data (and also getting a collection and so on).
17:21:26<Pedrosso>Ah. Would that be done here (-bs) or, to not clog this chat up, DM:s?
17:24:12riku joins
17:27:34riku quits [Client Quit]
17:27:42<@JAA>Either is fine. Just keep us informed if you do it through PM and the project ends up happening. :-)
17:27:50riku (riku) joins
17:35:04driib (driib) joins
17:38:56<Pedrosso>That I will do.
17:39:04<Pedrosso>(informing, that is)
17:53:39DogsRNice quits [Remote host closed the connection]
17:53:54DogsRNice joins
18:05:34Wohlstand quits [Ping timeout: 265 seconds]
18:11:00<@JAA>arkiver: Someone on SWH's IRC channel shared a breakdown by top dir: firefox 1053, thunderbird 156, devedition 117, seamonkey 26, mobile 17, xulrunner 3 are the top 6. (Numbers are 'size_tb', whichever unit that is exactly.)
18:11:57<@JAA>I'll try to extract a full index with path, size, and mtime.
18:12:29qwertyasdfuiopghjkl quits [Client Quit]
18:16:35Wohlstand (Wohlstand) joins
18:24:21Craigle quits [Quit: The Lounge - https://thelounge.chat]
18:25:02Craigle (Craigle) joins
18:31:03<nulldata>OnMSFT.com hasn't published any new articles since 10/31. Before that was multiple articles daily. No announcements or mentions of taking a break. Did some digging and found they were acquired on 10/10 by Reflector Media. Maybe something happened with the staff under the new ownership?
18:31:03<nulldata>https://www.einpresswire.com/article/660812143/windowsreport-expands-its-microsoft-coverage-with-strategic-onmsft-acquisition
18:33:05<nulldata>They also have a podcast that consistently published every Sunday. Last episode was on 10/29 https://soundcloud.com/onmsft
18:33:38<@arkiver>Pedrosso: hi, i do not exactly know what this is about - what is this about?
18:33:46<@arkiver>svtplay.se ?
18:36:41<Pedrosso>Here's the context https://hackint.logs.kiska.pw/archiveteam-bs/20231112#:~:text=know%20of%20a-,website,-svtplay.se%20(videos
18:36:41<Pedrosso>But to clarify in here, it's a very large official service for Swedish programs, and it uses a stream system. They tend to delete content en masse and I've yet to find any other copies on the internet of many of them
18:36:43qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
18:38:21<joepie91|m>fwiw, you could check newsgroups if you haven't yet, that's where a lot of the NPO stuff (Dutch equivalent) ends up that's not available elsewhere
18:39:24thuban quits [Ping timeout: 265 seconds]
18:39:59<Pedrosso>I posted about the service once I noticed a documentary, https://www.svtplay.se/din-hjarna had lost both its first seasons due to this deletion
18:40:58<nulldata>One of the higher volume writers for OnMSFT seems to imply it's about to be killed. https://twitter.com/Dav3Shanahan/status/1720441624029769796
18:40:58<eggdrop>nitter: https://nitter.net/Dav3Shanahan/status/1720441624029769796
18:41:58<Pedrosso>https://www.svtplay.se/sitemap-details-episodes.xml may not cover all of its content, as it doesn't just host episodes, but it may give a perspective on the sheer scale. The service is fairly region-locked to Sweden, so things outside don't have access to much
18:43:19<Pedrosso>I've found a -dl script for it and made my own code to automate it so I now have a way to be able to efficiently download them (finding items is no problem due to its great sitemap, but I've no clue how to verify it's extensive. Sure does look extensive though, covering items already deleted) I'm still working on ensuring that what's downloaded is
18:43:19<Pedrosso>of highest quality and such
18:43:40<Pedrosso>(other sitemaps are shown in the main sitemap.xml)
18:43:40<nulldata>Looks like at least 2 of the authors from OnMSFT have started their own site - https://msftunboxed.com/
18:45:32<Pedrosso>Possibly especially of interest to IA is the news which afaik is deleted a couple of months after being put online
18:46:43<Pedrosso>Is that enough context?
18:48:08<@JAA>nulldata: Thanks, I've launched an AB job for it.
18:48:25thuban joins
18:49:50<Pedrosso>(lmk if AT has any preferred image sharing tool) https://i.imgur.com/RuQRl77.png the yellow/orange bar here says "4 hours left"
18:51:38<Pedrosso>Though to be clear they usually delete/remove content due to copyright
18:51:39T31M quits [Quit: ZNC - https://znc.in]
18:52:03<@JAA>Images can be uploaded to https://transfer.archivete.am/ and if you then insert /inline/ after the domain, it gets displayed in a browser as well rather than forcing a download on the regular URL (though that'll be fixed soon™).
18:52:09T31M joins
18:52:33<@JAA>E.g. https://transfer.archivete.am/inline/bG4mu/aatt.png
18:53:38<Pedrosso>Awesome, (also, got a great png right there)
18:58:24<@arkiver>Pedrosso: that is indeed useful
19:00:48<Pedrosso>So, I was recommended to speak directly to IA &or you about uploading something so big & making a collection and such. I would like a reassurance that this is legally in the clear and everything
19:01:19<@arkiver>Pedrosso: so is this about you trying to mirror everything? or a part of it?
19:01:26<@arkiver>any idea what numbers we may be looking at here?
19:07:41<@arkiver>Pedrosso: let's continue over PM
19:07:49<@arkiver>or DM, however you want to call it
19:17:40Craigle quits [Client Quit]
19:18:30Craigle (Craigle) joins
19:57:13etnguyen03 quits [Ping timeout: 272 seconds]
20:02:28riku quits [Client Quit]
20:18:18Wohlstand quits [Remote host closed the connection]
20:18:21Wohlstand (Wohlstand) joins
20:23:37driib7 (driib) joins
20:23:56project10 quits [Client Quit]
20:23:56T31M quits [Client Quit]
20:23:56us3rrr quits [Remote host closed the connection]
20:23:56Wohlstand quits [Remote host closed the connection]
20:23:56driib quits [Client Quit]
20:23:56qwertyasdfuiopghjkl quits [Remote host closed the connection]
20:23:56driib7 is now known as driib
20:24:00T31M joins
20:24:02us3rrr joins
20:24:04nicolas17 joins
20:24:05Wohlstand (Wohlstand) joins
20:24:23project10 (project10) joins
20:25:16project10 quits [Max SendQ exceeded]
20:25:55project10 (project10) joins
20:31:08qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
20:41:28Wohlstand quits [Remote host closed the connection]
20:41:28us3rrr quits [Remote host closed the connection]
20:41:37us3rrr joins
20:41:41Wohlstand (Wohlstand) joins
20:52:31etnguyen03 (etnguyen03) joins
21:06:45<@JAA>Here's the archive.mozilla.org file listing in all its g(l)ory: https://transfer.archivete.am/a0mjU/archive.mozilla.org-files.jsonl.zst
21:07:40BlueMaxima joins
21:10:18<@JAA>15.3 GiB after decompression, so have fun.
21:11:00<nicolas17>the other day I tried downloading a few Windows binaries and doing deltas/deduplication
21:14:29<nicolas17>it helped but not as much as I'd have hoped
21:19:16<pokechu22>JAA: does that include e.g. https://archive.mozilla.org/pub/firefox/tinderbox-builds/autoland-macosx64-debug/1477331902/ which can be guessed from autoland-macosx64 despite autoland-macosx64-debug not working?
21:19:45<nicolas17>if only the GCS bucket was open...
21:20:40<@JAA>pokechu22: No, it does not.
21:20:52<@JAA>It's everything except /pub/firefox/tinderbox-builds/autoland-macosx64-debug/.
21:21:16<@JAA>It might be possible to guess everything* in that directory based on the other autoland-* dirs, but I haven't attempted that.
21:21:36<nicolas17>what zstd settings did you use for the listing?
21:24:09<@JAA>Just -10. I played with higher ones, but that would've taken hours.
21:25:03<nicolas17>I'm trying higher ones out of curiosity and I'm not seeing particularly useful savings
21:25:23<@JAA>Yeah, I've previously noticed that somewhere around -8 to -10 is where the big savings stop.
21:25:33<@JAA>For text-ish files, at least.
21:26:26<@JAA>There's often another significant drop with --ultra and -20 through -22, but those are so slow that they're rarely worth it for larger files.
21:26:58<@JAA>Also, the CPU on this server is a potato.
21:27:18igloo22225 (igloo22225) joins
21:27:26<nicolas17>-19 -T4 on my laptop is producing output at 8KiB/s (dunno how fast it's consuming input)
21:31:52<@JAA>Multi-threaded compression is also going to produce larger output than single-threaded.
21:32:54<nicolas17>hm you made this by parsing the html right? and file sizes are like "504M"?
21:34:05c3manu quits [Remote host closed the connection]
21:36:57<nicolas17>my highly stupid script to calculate total file size is gonna take 10 minutes, lol
21:41:16<@JAA>Correct
21:41:20<@JAA>I already have the total size.
21:41:23<@JAA>1.66 PiB
21:41:46<nicolas17>💀
21:47:12<Barto>even a your momma joke is not that big, damn
21:50:44hitgrr8_ quits [Quit: away]
22:05:51<nicolas17>JAA: 26.64 TiB for seamonkey?
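The per-directory totals being compared here can be approximated straight from the listing. The following is only a sketch, not the script either participant used: it assumes each line of the decompressed listing is a JSON object with "name" and "size" fields, that sizes are either plain byte counts or rounded values such as "504M", and that the suffixes are binary multiples. The helper names parse_size and totals_by_top_dir are made up for illustration, and the two sample lines are entries quoted elsewhere in this log.

```python
import json
from collections import defaultdict

# Assumption: size suffixes are binary multiples (K = 1024 bytes, and so on).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a listing size such as '504M' or '123' to an approximate byte count."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def totals_by_top_dir(lines):
    """Sum approximate sizes per directory directly under /pub/."""
    totals = defaultdict(int)
    for line in lines:  # works on any iterable of lines, so a file can be streamed
        entry = json.loads(line)
        parts = entry["name"].split("/")  # '/pub/android/...' -> ['', 'pub', 'android', ...]
        top = parts[2] if len(parts) > 2 else ""
        totals[top] += parse_size(entry["size"])
    return dict(totals)


# Two entries quoted in this log, used as sample input.
sample = [
    '{"name":"/pub/android/focus/8.0.8/Focus-arm.apk","size":"43M","mtime":"13-Feb-2023 04:22"}',
    '{"name":"/pub/android/focus/8.0.8/Focus-x86.apk","size":"51M","mtime":"13-Feb-2023 04:22"}',
]
totals = totals_by_top_dir(sample)
print(totals)
```

Because the sizes in the listing are rounded, any total computed this way is an estimate. For the full 15.3 GiB listing, passing an open file handle (or sys.stdin fed by `zstd -dc`) keeps memory use flat, since only the running totals are held.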
22:15:13<nicolas17>JAA: something seems wrong with your listing...
22:15:14<nicolas17>{"name":"/pub/android/focus/8.0.8","size":"43M","mtime":"13-Feb-2023 04:22"}
22:15:16<nicolas17>{"name":"/pub/android/focus/8.0.8/Focus-arm.apk","size":"43M","mtime":"13-Feb-2023 04:22"}
22:15:17<nicolas17>{"name":"/pub/android/focus/8.0.8/Focus-x86.apk","size":"51M","mtime":"13-Feb-2023 04:22"}
22:16:07<nicolas17>oh they actually exist like that 💀
22:16:29<nicolas17>right, cloud-y object storage doesn't care if a/b is a file and a/b/c is also a file
22:18:31<@JAA>Huh yeah, interesting.
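Paths that are simultaneously a file and a "directory" prefix, like the Focus entries above, can be found by intersecting the set of file names with the set of their ancestor paths. This is a rough sketch under the assumption that every listing line is a JSON object with a "name" field; file_dir_collisions is a hypothetical helper name. Holding both sets in memory is fine for a sample but would need a lot of RAM for the full listing.

```python
import json
import posixpath


def file_dir_collisions(lines):
    """Return names that appear both as a file and as a parent of another file."""
    files = set()
    dirs = set()
    for line in lines:
        name = json.loads(line)["name"]
        files.add(name)
        # Walk up the ancestors; stop early once one is already recorded,
        # because its own ancestors were recorded at the same time.
        parent = posixpath.dirname(name)
        while parent not in ("", "/") and parent not in dirs:
            dirs.add(parent)
            parent = posixpath.dirname(parent)
    return sorted(files & dirs)


# The three entries quoted above, used as sample input.
sample = [
    '{"name":"/pub/android/focus/8.0.8","size":"43M","mtime":"13-Feb-2023 04:22"}',
    '{"name":"/pub/android/focus/8.0.8/Focus-arm.apk","size":"43M","mtime":"13-Feb-2023 04:22"}',
    '{"name":"/pub/android/focus/8.0.8/Focus-x86.apk","size":"51M","mtime":"13-Feb-2023 04:22"}',
]
collisions = file_dir_collisions(sample)
print(collisions)
```

Any path reported here cannot be mirrored onto an ordinary filesystem under its original name, since a regular file and a directory cannot share one path.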
22:24:18<nicolas17>JAA: https://transfer.archivete.am/inline/wYgWD/Screenshot_20231118_192300.png I tried to do a thing before realizing I won't have enough RAM for the entire directory :D
22:29:50icedice (icedice) joins
22:39:15BlueMaxima quits [Remote host closed the connection]
22:39:16us3rrr quits [Remote host closed the connection]
22:39:16Wohlstand quits [Remote host closed the connection]
22:39:20BlueMaxima joins
22:39:26us3rrr joins
22:39:29Wohlstand (Wohlstand) joins
22:40:17Naruyoko quits [Client Quit]
22:41:41Naruyoko joins
22:44:38us3rrr quits [Remote host closed the connection]
22:44:38BlueMaxima quits [Remote host closed the connection]
22:44:38Wohlstand quits [Remote host closed the connection]
22:44:48us3rrr joins
22:44:50BlueMaxima joins
22:44:51Wohlstand (Wohlstand) joins
23:00:48<nicolas17>"[Errno 28] No space left on device" how is this possible, I was creating sparse files
23:00:53<nicolas17>turns out, I ran out of inodes, lol
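"No space left on device" with plenty of free bytes is the classic sign of inode exhaustion, and it can be checked before creating millions of files. A minimal sketch using os.statvfs (Unix only); inode_usage is a hypothetical helper name, and some filesystems report zero for the inode fields because they allocate inodes dynamically.

```python
import os


def inode_usage(path="."):
    """Report inode and byte headroom for the filesystem containing path."""
    st = os.statvfs(path)
    return {
        "total_inodes": st.f_files,               # inodes the filesystem has in total
        "free_inodes": st.f_ffree,                # inodes still unused
        "free_bytes": st.f_bavail * st.f_frsize,  # bytes available to unprivileged users
    }


usage = inode_usage(".")
print(usage)
```

Sparse files consume almost no data blocks, but each one still needs its own inode, so free_inodes is the number to compare against the planned file count.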
23:01:52BlueMaxima_ joins
23:03:05qwertyasdfuiopghjkl quits [Client Quit]
23:03:06BlueMaxima quits [Remote host closed the connection]
23:03:06us3rrr quits [Remote host closed the connection]
23:03:06Wohlstand quits [Remote host closed the connection]
23:03:06T31M quits [Client Quit]
23:03:14us3rrr joins
23:03:22Wohlstand (Wohlstand) joins
23:03:39T31M joins
23:05:19<@JAA>nicolas17: lol. Yeah, it is a chonker.
23:17:04riku (riku) joins
23:22:58<nicolas17>now it's gonna take me forever to delete these
23:26:38qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
23:56:26<@JAA>Life hack for next time: make a tmpfs (or even a loop-mounted ext4 or whatever), then simply nuke that when done. :-)
23:56:54<nicolas17>wouldn't that eat >15GB of RAM? :P
23:57:52<@JAA>With a tmpfs, yeah. But you can create an ext4 fs in a file on an existing disk partition, then loop-mount that. When you're done, you unmount and delete the single file.
23:58:30<@JAA>In hindsight, maybe I should've done that as well for my pad backup thing rather than creating 4.8 million symlinks. lol
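The loop-mount suggestion can be spelled out as a short command sequence. The sketch below only builds and prints the commands rather than running them, since mounting needs root. The image path, mountpoint, size, and inode count are all made-up placeholder values, scratch_fs_commands is a hypothetical helper name, and the image has to be large enough for mkfs.ext4 to accept the requested inode count.

```python
import shlex


def scratch_fs_commands(image, mountpoint, size="32G", inodes=6_000_000):
    """Return the commands (as argv lists) for a throwaway loop-mounted ext4."""
    return [
        ["truncate", "-s", size, image],                       # sparse backing file
        ["mkfs.ext4", "-q", "-F", "-N", str(inodes), image],   # -N requests an inode count
        ["mount", "-o", "loop", image, mountpoint],            # needs root
        # ... create the millions of small files under mountpoint here ...
        ["umount", mountpoint],
        ["rm", image],                                         # one unlink cleans up everything
    ]


# Hypothetical paths, for illustration only.
commands = scratch_fs_commands("/var/tmp/scratch.img", "/mnt/scratch")
for argv in commands:
    print(shlex.join(argv))
```

Cleanup then costs one umount and one rm instead of a per-file delete, which is the point of the trick.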