00:00:52nexussfan (nexussfan) joins
00:11:14mr_sarge quits [Read error: Connection reset by peer]
00:11:47StarletCharlotte joins
00:11:58<StarletCharlotte>What's the best way to upload large files to the Internet Archive?
00:12:22<StarletCharlotte>Because I'm trying to upload an archive of ftp://ftp.funcom.com and it's stuck at 4.9 GB. It's been several hours.
00:12:39<StarletCharlotte>My internet isn't the best but I don't think it's that bad.
00:14:01<Yakov>reading some of ABs source, I think it supports ftp..?
00:14:08<@imer>StarletCharlotte: there's some tips here (if you've not seen it yet) https://wiki.archiveteam.org/index.php/Internet_Archive#Upload_speed
00:17:20<pokechu22>ab doesn't interact nicely with ftp - there's some code for it but it crashes most of the time and as such is mostly disabled at this point
00:18:11<StarletCharlotte>imer I'll take a look
00:26:25<@OrIdow6>What's the current tracker architecture? I found old logs talking about it being an Nginx(Lua) proxy that talks to the original tracker, but doesn't directly talk to Redis - is that still the case?
00:28:36<nicolas17>StarletCharlotte: are you on Linux?
00:28:46<StarletCharlotte>yeah
00:29:47<nicolas17>in my experience "sudo sysctl net.ipv4.tcp_congestion_control=bbr" makes uploads to archive.org significantly faster
00:29:50<nicolas17>won't help with ongoing connections/uploads though, you'd have to start over
00:30:21<StarletCharlotte>Got it. Should I turn it off after though?
00:30:46<nicolas17>I didn't notice any negative effects on the rest of my internet use tbh
00:30:57<StarletCharlotte>got it
00:31:01<nicolas17>but you could run "sudo sysctl net.ipv4.tcp_congestion_control" to see what your current value is
00:31:05<nicolas17>and restore it afterwards
00:31:07<StarletCharlotte>whoops
00:31:12<StarletCharlotte>i uh... already did it
00:31:17<StarletCharlotte>oh well
00:32:59<BlankEclair>out of curiosity, does that only make IA uploads fast, or does it make all tcp connections go faster
00:34:26<nicolas17>there's *something* in archive.org's networking that doesn't interact well with the default congestion control algorithm, I don't understand the details
00:35:34<klea>https://www.kernel.org/doc/html/latest/networking/ip-sysctl.html#:~:text=tcp%5Fcongestion%5Fcontrol%20%2D%20STRING is not very clear what that does.
00:40:24<nicolas17>https://en.wikipedia.org/wiki/TCP_congestion_control
00:40:44<StarletCharlotte>> /usr/bin/python: Error while finding module specification for 'ia-upload-stream.py' (ModuleNotFoundError: __path__ attribute not found on 'ia-upload-stream' while trying to find 'ia-upload-stream.py'). Try using 'ia-upload-stream' instead of 'ia-upload-stream.py' as the module name.
00:40:49<StarletCharlotte>not sure what's going on here
00:41:20<StarletCharlotte>Removing .py also fails
00:41:42<StarletCharlotte>same with removing -m
00:47:47<klea>nicolas17: what version does setting the variable to bbr set it to, BBRv1, BBRv2 or BBRv3?
00:49:12<StarletCharlotte>yeah I can't figure out how to run this. The example on the wiki just doesn't work for some reason
00:50:00<@JAA>Why is that command trying to run it as a module? (I either never knew or forgot that my uploader is even listed there.)
00:51:16<StarletCharlotte>Good question, but not running it as a module also fails.
00:51:18<@JAA>And what's that bit about installing the ia package? ia-upload-stream only depends on requests.
00:51:53<StarletCharlotte>https://pastebin.com/s88c8eJr
00:53:03<@JAA>Hmm yeah, I suppose.
00:53:22<@JAA>That does run the script correctly though.
00:53:46<@JAA>You can specify the S3 credentials via IA_S3_ACCESS and IA_S3_SECRET environment variables as well.
00:54:14<@JAA>And ia-s3-auth can get you those values without `ia configure`.
00:55:52etnguyen03 quits [Client Quit]
00:56:42<StarletCharlotte>S3?
00:57:16<StarletCharlotte>Okay I guess
00:57:24<klea>They're available on the web at https://archive.org/account/s3.php too
00:57:32<klea>It's an S3-like API
00:57:34<StarletCharlotte>oh okay thanks
00:59:27<StarletCharlotte>Tried again, same error. It's asking about a config file or something?
00:59:43<@JAA>To explain that error referencing `ia configure`: `ia-upload-stream` reads ia's config file if it's available (and not overridden by the environment variable). There's no actual dependency on `ia`.
01:00:29<StarletCharlotte>I assume ia is from python-internetarchive?
01:00:55<StarletCharlotte>I set the environment variables for the S3 credentials so it's not that.
01:01:27<TheTechRobo>StarletCharlotte: the sysctl option should go back to what it was before after a reboot, FWIW, so don't worry about losing it
01:01:39<StarletCharlotte>got it
01:01:41<@JAA>Sounds like you didn't set them correctly then. It won't even reach that code when they're set.
01:02:03<TheTechRobo>(ia comes from https://pypi.org/project/internetarchive BTW)
01:02:40<StarletCharlotte>Huh, I guess set just sets the shell variables and not environment variables? I think?
01:02:48<@JAA>Yes
01:02:50<klea>try to export.
01:02:53<TheTechRobo>export IA_S3_ACCESS=...
01:03:07<@JAA>Either run it as `IA_S3_ACCESS=... IA_S3_SECRET=... ./ia-upload-stream ...` or `export` them.
01:03:49<StarletCharlotte>There it goes. thank you
01:03:50<@JAA>And `set` sets the arguments, not variables.
01:03:54<StarletCharlotte>that explains a lot
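The shell-variable versus environment-variable distinction discussed above can be sketched as follows. This is a minimal illustration, assuming a POSIX shell; the values `example-access` and `example-secret` are hypothetical placeholders, not real credentials, and `sh -c` stands in for any child process such as `./ia-upload-stream`.

```shell
# Start from a clean slate so earlier exports do not affect the demo.
unset IA_S3_ACCESS IA_S3_SECRET

# Plain assignment: creates a shell variable that child processes do not inherit.
IA_S3_ACCESS=example-access      # hypothetical placeholder value
child_sees_plain=$(sh -c 'echo "${IA_S3_ACCESS:-unset}"')

# export: marks the variable as part of the environment passed to children.
export IA_S3_ACCESS
child_sees_exported=$(sh -c 'echo "${IA_S3_ACCESS:-unset}"')

# Prefix form: sets the variable only in the environment of that one command.
child_sees_prefix=$(IA_S3_SECRET=example-secret sh -c 'echo "${IA_S3_SECRET:-unset}"')

echo "plain assignment: $child_sees_plain"
echo "after export:     $child_sees_exported"
echo "prefix form:      $child_sees_prefix"
```

The prefix form leaves the parent shell untouched, which may be preferable when the credentials should not linger in the session.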
01:04:34StarletCharlotte quits [Client Quit]
01:11:50pabs (pabs) joins
01:13:49LddPotato quits [Read error: Connection reset by peer]
01:14:30LddPotato (LddPotato) joins
01:15:12roverinexile joins
01:17:41rover quits [Ping timeout: 272 seconds]
01:18:31etnguyen03 (etnguyen03) joins
01:24:27LddPotato quits [Read error: Connection reset by peer]
01:25:09LddPotato (LddPotato) joins
01:34:57LddPotato quits [Read error: Connection reset by peer]
01:35:51LddPotato (LddPotato) joins
01:36:03petrichor quits [Ping timeout: 272 seconds]
01:44:13fangfufu quits [Client Quit]
01:45:53LddPotato quits [Read error: Connection reset by peer]
01:46:34LddPotato (LddPotato) joins
01:50:08fangfufu joins
01:50:28kansei- (kansei) joins
01:51:52kansei quits [Ping timeout: 256 seconds]
02:03:57LddPotato quits [Read error: Connection reset by peer]
02:05:31LddPotato (LddPotato) joins
02:29:50pokechu22 quits [Ping timeout: 256 seconds]
02:40:35pokechu22 (pokechu22) joins
02:52:14ducky_ (ducky) joins
02:53:04ducky quits [Ping timeout: 256 seconds]
02:53:04ducky_ is now known as ducky
02:53:29thalia quits [Quit: Connection closed for inactivity]
03:06:40ducky quits [Ping timeout: 256 seconds]
03:08:16ducky (ducky) joins
03:30:58nexussfan quits [Quit: Konversation terminated!]
03:36:42Godzfire quits [Quit: Ooops, wrong browser tab.]
03:47:30nexussfan (nexussfan) joins
04:08:05etnguyen03 quits [Remote host closed the connection]
04:08:17fireatseaparks quits [Quit: Textual IRC Client: www.textualapp.com]
04:16:13fireatseaparks (fireatseaparks) joins
04:39:57Island quits [Read error: Connection reset by peer]
04:46:18cyanbox joins
04:55:14DogsRNice quits [Read error: Connection reset by peer]
05:04:32n9nes quits [Ping timeout: 256 seconds]
05:05:03khaoohs quits [Ping timeout: 272 seconds]
05:06:01n9nes joins
05:06:36khaoohs joins
05:08:58nexussfan quits [Client Quit]
05:15:33steering wonders how thoroughly wikipedia links have been archived
05:23:34<steering>i know there's bots that try and point links to archives when they're dead but is there stuff going through and SPN'ing links for example
05:24:26<BlankEclair>wikipedia-eventstream or something
05:24:55<BlankEclair>https://archive.org/details/wikipedia-eventstream?tab=about
05:27:59<pokechu22>Yeah, my understanding is that there's a project that does that (that isn't by archiveteam). Looking at https://archive.org/details/wikipedia-eventstream?tab=collection&sort=-publicdate it seems like stuff is run weekly-ish?
05:35:01<steering>ah good :)
06:08:23Snivy quits [Ping timeout: 272 seconds]
06:15:57petrichor (petrichor) joins
06:25:00fionera quits [Ping timeout: 256 seconds]
06:29:23BennyOtt (BennyOtt) joins
06:40:59Wohlstand1 (Wohlstand) joins
06:43:24Wohlstand1 is now known as Wohlstand
06:51:24Wohlstand quits [Client Quit]
07:12:09Snivy (Snivy) joins
08:30:53rohvani quits [Ping timeout: 272 seconds]
08:55:44ducky quits [Ping timeout: 256 seconds]
08:57:25<ericgallager>https://en.wikipedia.org/wiki/User:GreenC_bot does archiving of Wikipedia links
08:57:40<ericgallager>https://en.wikipedia.org/wiki/User:GreenC/WaybackMedic
08:59:51<ericgallager>oh and this one too: https://en.wikipedia.org/wiki/User:InternetArchiveBot
09:14:57ducky (ducky) joins
09:32:19sec^nd quits [Ping timeout: 244 seconds]
09:34:36sec^nd (second) joins
09:58:55BornOn420 quits [Ping timeout: 272 seconds]
10:41:42TheEnbyperor quits [Ping timeout: 256 seconds]
10:41:59TheEnbyperor_ quits [Ping timeout: 272 seconds]
10:46:16TheEnbyperor (TheEnbyperor) joins
10:51:29TheEnbyperor quits [Ping timeout: 272 seconds]
10:57:47TheEnbyperor joins
10:59:34TheEnbyperor_ (TheEnbyperor) joins
11:02:13Dada joins
11:05:11Dada quits [Remote host closed the connection]
11:40:27APOLLO03a joins
11:42:54APOLLO03 quits [Ping timeout: 256 seconds]
11:59:46StarletCharlotte joins
12:00:03Bleo1826007227196234552220 quits [Quit: The Lounge - https://thelounge.chat]
12:02:12<StarletCharlotte>Good news: ia-upload-stream.py works! Bad news: I can't edit the metadata to say I finished uploading the actual file instead of the placeholder now because it turns out the Internet Archive REALLY doesn't like when an item identifier has dots in it. But it only tells you that breaks things AFTER you make that the name of your item, only when you
12:02:12<StarletCharlotte>try to edit the item. https://archive.org/details/ftp.funcom.com
12:02:15<StarletCharlotte>Not sure what to do.
12:02:48Bleo1826007227196234552220 joins
13:03:07StarletCharlotte quits [Client Quit]
13:19:32Webuser302981 joins
13:19:39<Webuser302981>What
13:20:06Webuser302981 quits [Client Quit]
13:20:22@imer nods
13:22:51Arcorann_ quits [Ping timeout: 272 seconds]
13:42:17ice quits [Quit: WeeChat 4.7.1]
13:42:29oxtyped quits [Ping timeout: 272 seconds]
13:54:00mgrytbak8 joins
13:54:50ice joins
13:55:09mgrytbak quits [Ping timeout: 272 seconds]
13:55:09mgrytbak8 is now known as mgrytbak
14:15:02oxtyped joins
14:34:13Webuser247771 joins
14:34:57Webuser247771 quits [Client Quit]
14:40:07oxtyped quits [Ping timeout: 272 seconds]
14:49:40oxtyped joins
14:51:57GodzFire joins
14:58:28<GodzFire>pokechu22 I was watching the crawler and noticed it was seemingly scraping some production websites so I checked the productionmusic.fandom.com_articles_and_outlinks.txt list. There's a crap ton that should be removed. I went through and took out 17000 links. Here's an updated txt that only has ProdMusic Wiki stuff, could you restart it with
14:58:28<GodzFire>this?: https://litter.catbox.moe/gke9wfo08aoe2dpx.txt
15:00:18<GodzFire>I was wondering why it pulled 111 GB when the site is only 12 GB total.
15:04:21FiTheArchiver joins
15:04:39FiTheArchiver quits [Remote host closed the connection]
15:14:18Dada joins
15:19:02Webuser963758 joins
15:19:30Webuser963758 quits [Client Quit]
15:20:51<aaq|m>That would compress down well at least
15:21:51<justauser>GodzFire: That's fine, our motto is "Archive All The Things".
15:22:29<justauser>IA is willing to store the junk.
15:24:28<justauser>However, it only pulled 7GB so far - where is your number from?
15:26:33<justauser>Oh, nevermind - it's my number that came from a frozen dashboard.