00:42:51<pabs>download speeds from IA seem quite slow around the world, are their connections overloaded or something?
00:53:48<flashfire42>Yeah
00:56:47<nicolas17>sorry my fault, still downloading the yahoo videos stuff and my 4 connections are clearly overwhelming poor IA (?)
01:04:57<pabs>:)
01:08:00<pabs>hmm, rtorrent finds 0 peers for IA torrents like https://archive.org/download/fossy2023_Breaking_the_Chains_of_Trustin/fossy2023_Breaking_the_Chains_of_Trustin_archive.torrent
01:08:49<pabs>transmission-cli too
01:09:09<pabs>rtorrent also says DHT search unsuccessful
01:10:43<nicolas17>pabs: they use webseeding, which rtorrent doesn't support
01:15:28<pabs>ah :(
01:16:02<pabs>transmission-cli does download from webseeds much faster than a regular web download, interesting
01:17:24<pabs>what are these .____padding_file directories in the torrent?
01:19:24Arcorann (Arcorann) joins
01:20:24<nicolas17>the torrent points at both IA servers having the file as webseeds, so you download from both simultaneously, which mitigates one being slow
01:22:11<nicolas17>multi-file torrents work as if they were a single blob with all files concatenated, so a chunk hash can span multiple files, which causes some issues; those zero-filled padding files align the beginning of each file with chunk hash boundaries
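The alignment nicolas17 describes can be sketched as below. This is a minimal illustration of the general padding scheme, not IA's actual torrent-building code; the function names, file sizes, and piece length are hypothetical, and it assumes a pad file is inserted after every file except the last.

```python
def padding_sizes(file_sizes, piece_length):
    """Return the number of zero bytes to append after each file so that
    the next file begins exactly on a piece (chunk hash) boundary."""
    pads = []
    for index, size in enumerate(file_sizes):
        is_last = index == len(file_sizes) - 1
        # Assumption: the last file needs no padding, nothing follows it.
        pads.append(0 if is_last else (-size) % piece_length)
    return pads


def file_offsets(file_sizes, pads):
    """Return each file's start offset inside the concatenated blob."""
    offsets = []
    position = 0
    for size, pad in zip(file_sizes, pads):
        offsets.append(position)
        position += size + pad
    return offsets


# Hypothetical example: three files and a 4-byte piece length.
sizes = [5, 8, 3]
pads = padding_sizes(sizes, 4)
offsets = file_offsets(sizes, pads)
```

With padding in place every offset is a multiple of the piece length, so no piece hash covers bytes from two different real files.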
01:25:47<pabs>so the IA download problem isn't the uplink, but individual servers?
01:38:32<pabs>hmm, browsers need better download mechanisms
02:33:12threedeeitguy39 (threedeeitguy) joins
02:33:25threedeeitguy3 quits [Ping timeout: 265 seconds]
02:33:25threedeeitguy39 is now known as threedeeitguy3
02:47:35threedeeitguy3 quits [Client Quit]
02:54:13threedeeitguy39 (threedeeitguy) joins
04:00:26BigBrain_ quits [Ping timeout: 245 seconds]
04:02:12<pabs>arkiver: in principle, do you think IA might be interested in sponsoring servers/storage/network/hosting/etc for live replicas of archive.softwareheritage.org (source code, all of GitHub/Debian/etc, will or does use Ceph) and/or snapshot.debian.org (~152TB all of Debian's software history, both source/binaries, bespoke content-addressed filesystem layout)?
04:02:26BigBrain_ (bigbrain) joins
06:31:08nicolas17 quits [Ping timeout: 252 seconds]
06:31:25<Exorcism>update: https://irc.digitaldragon.dev/uploads/6b75bb3f2cb4020c/image.png
06:32:03<Exorcism>and because of this it's impossible for me to publish on ia at the moment oof
07:26:18themadpro (themadpro) joins
08:07:23nulldata quits [Ping timeout: 252 seconds]
08:11:02nulldata (nulldata) joins
09:00:44nulldata quits [Ping timeout: 252 seconds]
09:04:23nulldata (nulldata) joins
09:17:49Exorcism is now known as Exorcism_
09:35:31themadpro quits [Client Quit]
09:46:48Exorcism (exorcism) joins
11:57:07Exorcism uploaded an image: (99KiB) < https://matrix.hackint.org/_matrix/media/v3/download/matrix.fedibird.com/UQEukxNbokaThmsFmxsEzEVw/1000017637.jpg >
11:57:14Exorcism uploaded an image: (225KiB) < https://matrix.hackint.org/_matrix/media/v3/download/matrix.fedibird.com/skOoTTrwCoeaXEFZWbsZvoKM/1000017636.jpg >
11:57:30<Exorcism>hum
11:57:33<Exorcism>wrong channel 💀
13:26:56Arcorann quits [Ping timeout: 252 seconds]
14:07:34driib quits [Quit: The Lounge - https://thelounge.chat]
14:12:28driib (driib) joins
15:43:05<@arkiver>pabs: interesting question, are you connected with software heritage?
15:43:10<@arkiver>i have no idea how big software heritage is
15:43:18<@arkiver>definitely not saying no to that
15:44:28<pabs>kind of. I know the Debian folks who started it, I've been submitting code there for about a year and am about to start a contract or two with them about expanding it
15:44:49<pabs>for Debian and snapshot.debian.org, I'm one of the Debian sysadmin team
15:45:54<pabs>I forget how big SWH is either, but the video/slides for this talk will contain that once published https://debconf23.debconf.org/talks/44-software-heritage-building-a-community-to-safeguard-the-software-commons/
15:46:53<pabs>SWH are also working on mirroring it, they have two orgs in Europe doing that now
15:47:31<@arkiver>very nice!
15:47:40<@arkiver>pabs: do you have an estimate on the size of software heritage?
15:47:51<@arkiver>~250 TB?
15:48:31<pabs>I think more, but I can't remember sorry. definitely in the slides, I can ask olasd to publish them
15:48:38<@arkiver>and what is the yearly growth of both debian and software heritage like?
15:49:08<pabs>snapshot.d.o at least 5TB, probably more these days
15:49:18<@arkiver>i can have the numbers a bit later - will likely not have an answer for you yet next week, and we have the current problems at IA (which are being fixed - various factors :/ )
15:49:35<pabs>it was 5TB/year in 2014-06-01
15:49:45<@arkiver>ah, so probably more like 20 TB/year nowadays
15:50:15<pabs>maybe, not sure. might be some data in our munin instance
15:50:31<@arkiver>would an assumption of yearly growth at 50 TB for both debian and software heritage be realistic?
15:50:40<@arkiver>(again, we can figure out these numbers in the coming weeks)
15:50:52@arkiver if afk for ~40 minutes
15:50:54<@arkiver>is*
15:51:10<pabs>anyways, this is just exploration, only briefly mentioned to SWH during an interview and to Debian sysadmins on IRC
16:04:04zhongfu quits [Ping timeout: 258 seconds]
16:05:49zhongfu (zhongfu) joins
16:26:14qw3rty quits [Ping timeout: 252 seconds]
16:30:00<pabs>arkiver: the slides https://annex.softwareheritage.org/public/talks/2023/2023-09-10-DebConf23.pdf
16:31:09<pabs>More than 1PB of source code files (replicated 3 times by Software Heritage)
16:31:09<pabs>More than 100 TB used for (resilient) storage of the graph
16:31:10<pabs>Infrastructure support for mirroring: 100 TB kafka deployment (~30TB of data used)
17:27:00<@arkiver>pabs: thank you. a PB is quite a bit of data
17:27:16<@arkiver>it's not an impossible amount, but might have to check in with some people
18:09:56AlsoHP_Archivist joins
18:10:26<HP_Archivist>JAA: You helped with this script for downloading items from the CLI
18:10:31<HP_Archivist>ia search --itemlist 'uploader:harrypotterarchival@gmail.com -collection:game_replays -collection:videogame_videos -collection:speed_runs' | xargs -P 8 -n 1 ia download
18:10:50<HP_Archivist>How would I edit this to *not* download IA derived files? and only the files I upload?
18:40:12AlsoHP_Archivist quits [Client Quit]
18:50:36<@JAA>HP_Archivist: Support for that has only been added a few months ago: https://github.com/jjjake/internetarchive/issues/365
18:51:14<@JAA>You'll need 3.4.0 or higher, then add --exclude-source=derivative at the end of the command, probably.
19:04:31driib quits [Client Quit]
19:16:34qwertyasdfuiopghjkl quits [Client Quit]
19:21:05driib (driib) joins
19:34:05qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
19:35:01BigBrain_ quits [Ping timeout: 245 seconds]
19:37:18BigBrain_ (bigbrain) joins
21:03:18<HP_Archivist>JAA: Thanks. ia search --itemlist 'uploader:harrypotterarchival@gmail.com -collection:game_replays -collection:videogame_videos -collection:speed_runs' | xargs -P 8 -n 1 ia download --exclude-source=derivative ?
21:03:29<HP_Archivist>That doesn't look right, though
21:12:54<imer>HP_Archivist: Before | xargs most likely
21:25:40<HP_Archivist>imer: Tried that, it's actually: ia search --itemlist 'uploader:harrypotterarchival@gmail.com -collection:game_replays -collection:videogame_videos -collection:speed_runs' | xargs -P 8 -n 1 ia download --exclude-source=derivative
21:25:49<HP_Archivist>That's what seemed to work just now
21:28:54<@JAA><futurama_squint.png>
21:29:01<@JAA>Those two commands look the same...?
21:29:50<HP_Archivist>JAA: They are. I didn't try it the first time (when I said it didn't look right). Tried it first with imer's suggestion, that didn't work. But my original assumption was correct
21:31:13<@JAA>Yeah, adding --exclude-source=derivative at the end is what I said. :-)
21:31:32<HP_Archivist>:)
21:31:46<HP_Archivist>I should know not to second guess you, heh
21:31:49<@JAA>Technically, `ia download -h` says that the options must come after the identifier, but that's actually not true.
21:32:03<@JAA>And it's annoying to do with xargs, so whatever. :-P
21:32:39<HP_Archivist>Hm. Is there any way to add to this command a way to speed up the downloads, a la aria2?
21:32:43<@JAA>And it's also contrary to the vast majority of CLIs out there. Optional --x/-x arguments normally always come before positional ones.
21:33:19<@JAA>I'm not aware of one. You can make `ia download` print file URLs instead somehow, I think.
21:33:28<@JAA>Otherwise, just download more items in parallel by tweaking -P.
21:33:40<@JAA>But also, IA is struggling, so if it isn't urgent, maybe don't hammer them too hard.
21:34:24<HP_Archivist>JAA: Yeah, I thought about that. I'll just deal with the speed I'm getting rn.
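The xargs pipeline discussed above could also be driven from a small script. This is a rough sketch only: it assumes the `ia` CLI (3.4.0 or higher, per JAA) is on the PATH, the helper names are hypothetical, and the identifiers would come from the same `ia search --itemlist` query. As in the command that worked, the option goes after the identifier.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_download_command(identifier):
    """Build the argument list for one item, skipping IA-derived files."""
    return ["ia", "download", identifier, "--exclude-source=derivative"]


def download_all(identifiers, workers=8):
    """Run up to `workers` downloads at once, similar to `xargs -P 8 -n 1`.

    Returns the exit code of each `ia download` process, in input order.
    """
    def run_one(identifier):
        return subprocess.run(build_download_command(identifier)).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, identifiers))
```

Calling `download_all(["some_item_identifier"], workers=2)` would start the downloads; keeping `workers` low is kinder to IA while it is struggling.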
21:37:25fireonlive quits [Excess Flood]
21:39:01fireonlive (fireonlive) joins
21:56:03nicolas17 joins
22:52:52BearFortress quits [Ping timeout: 265 seconds]
23:47:58systwi quits [Ping timeout: 265 seconds]
23:48:25systwi__ (systwi) joins