| 00:00:52 | | nexussfan (nexussfan) joins |
| 00:11:14 | | mr_sarge quits [Read error: Connection reset by peer] |
| 00:11:47 | | StarletCharlotte joins |
| 00:11:58 | <StarletCharlotte> | What's the best way to upload large files to the Internet Archive? |
| 00:12:22 | <StarletCharlotte> | Because I'm trying to upload an archive of ftp://ftp.funcom.com and it's stuck at 4.9 GB. It's been several hours. |
| 00:12:39 | <StarletCharlotte> | My internet isn't the best but I don't think it's that bad. |
| 00:14:01 | <Yakov> | reading some of AB's source, I think it supports ftp..? |
| 00:14:08 | <@imer> | StarletCharlotte: there's some tips here (if you've not seen it yet) https://wiki.archiveteam.org/index.php/Internet_Archive#Upload_speed |
| 00:17:20 | <pokechu22> | ab doesn't interact nicely with ftp - there's some code for it but it crashes most of the time and as such is mostly disabled at this point |
| 00:18:11 | <StarletCharlotte> | imer I'll take a look |
| 00:26:25 | <@OrIdow6> | What's the current tracker architecture? I found old logs talking about it being an Nginx(Lua) proxy that talks to the original tracker, but doesn't directly talk to Redis - is that still the case? |
| 00:28:36 | <nicolas17> | StarletCharlotte: are you on Linux? |
| 00:28:46 | <StarletCharlotte> | yeah |
| 00:29:47 | <nicolas17> | in my experience "sudo sysctl net.ipv4.tcp_congestion_control=bbr" makes uploads to archive.org significantly faster |
| 00:29:50 | <nicolas17> | won't help with ongoing connections/uploads though, you'd have to start over |
| 00:30:21 | <StarletCharlotte> | Got it. Should I turn it off after though? |
| 00:30:46 | <nicolas17> | I didn't notice any negative effects on the rest of my internet use tbh |
| 00:30:57 | <StarletCharlotte> | got it |
| 00:31:01 | <nicolas17> | but you could run "sudo sysctl net.ipv4.tcp_congestion_control" to see what your current value is |
| 00:31:05 | <nicolas17> | and restore it afterwards |
| 00:31:07 | <StarletCharlotte> | whoops |
| 00:31:12 | <StarletCharlotte> | i uh... already did it |
| 00:31:17 | <StarletCharlotte> | oh well |
| 00:32:59 | <BlankEclair> | out of curiosity, does that only make IA uploads fast, or does it make all tcp connections go faster |
| 00:34:26 | <nicolas17> | there's *something* in archive.org's networking that doesn't interact well with the default congestion control algorithm, I don't understand the details |
| 00:35:34 | <klea> | https://www.kernel.org/doc/html/latest/networking/ip-sysctl.html#:~:text=tcp%5Fcongestion%5Fcontrol%20%2D%20STRING is not very clear what that does. |
| 00:40:24 | <nicolas17> | https://en.wikipedia.org/wiki/TCP_congestion_control |
| 00:40:44 | <StarletCharlotte> | > /usr/bin/python: Error while finding module specification for 'ia-upload-stream.py' (ModuleNotFoundError: __path__ attribute not found on 'ia-upload-stream' while trying to find 'ia-upload-stream.py'). Try using 'ia-upload-stream' instead of 'ia-upload-stream.py' as the module name. |
| 00:40:49 | <StarletCharlotte> | not sure what's going on here |
| 00:41:20 | <StarletCharlotte> | Removing .py also fails |
| 00:41:42 | <StarletCharlotte> | same with removing -m |
| 00:47:47 | <klea> | nicolas17: what version does setting the variable to bbr set it to, BBRv1, BBRv2 or BBRv3? |
| 00:49:12 | <StarletCharlotte> | yeah I can't figure out how to run this. The example on the wiki just doesn't work for some reason |
| 00:50:00 | <@JAA> | Why is that command trying to run it as a module? (I either never knew or forgot that my uploader is even listed there.) |
| 00:51:16 | <StarletCharlotte> | Good question, but not running it as a module also fails. |
| 00:51:18 | <@JAA> | And what's that bit about installing the ia package? ia-upload-stream only depends on requests. |
| 00:51:53 | <StarletCharlotte> | https://pastebin.com/s88c8eJr |
| 00:53:03 | <@JAA> | Hmm yeah, I suppose. |
| 00:53:22 | <@JAA> | That does run the script correctly though. |
| 00:53:46 | <@JAA> | You can specify the S3 credentials via IA_S3_ACCESS and IA_S3_SECRET environment variables as well. |
| 00:54:14 | <@JAA> | And ia-s3-auth can get you those values without `ia configure`. |
| 00:55:52 | | etnguyen03 quits [Client Quit] |
| 00:56:42 | <StarletCharlotte> | S3? |
| 00:57:16 | <StarletCharlotte> | Okay I guess |
| 00:57:24 | <klea> | They're available on the web at https://archive.org/account/s3.php too |
| 00:57:32 | <klea> | It's an S3-like API |
| 00:57:34 | <StarletCharlotte> | oh okay thanks |
| 00:59:27 | <StarletCharlotte> | Tried again, same error. It's asking about a config file or something? |
| 00:59:43 | <@JAA> | To explain that error referencing `ia configure`: `ia-upload-stream` reads ia's config file if it's available (and not overridden by the environment variable). There's no actual dependency on `ia`. |
| 01:00:29 | <StarletCharlotte> | I assume ia is from python-internetarchive? |
| 01:00:55 | <StarletCharlotte> | I set the environment variables for the S3 credentials so it's not that. |
| 01:01:27 | <TheTechRobo> | StarletCharlotte: the sysctl option should go back to what it was before after a reboot, FWIW, so don't worry about losing it |
| 01:01:39 | <StarletCharlotte> | got it |
| 01:01:41 | <@JAA> | Sounds like you didn't set them correctly then. It won't even reach that code when they're set. |
| 01:02:03 | <TheTechRobo> | (ia comes from https://pypi.org/project/internetarchive BTW) |
| 01:02:40 | <StarletCharlotte> | Huh, I guess set just sets the shell variables and not environment variables? I think? |
| 01:02:48 | <@JAA> | Yes |
| 01:02:50 | <klea> | try to export. |
| 01:02:53 | <TheTechRobo> | export IA_S3_ACCESS=... |
| 01:03:07 | <@JAA> | Either run it as `IA_S3_ACCESS=... IA_S3_SECRET=... ./ia-upload-stream ...` or `export` them. |
| 01:03:49 | <StarletCharlotte> | There it goes. thank you |
| 01:03:50 | <@JAA> | And `set` sets the arguments, not variables. |
| 01:03:54 | <StarletCharlotte> | that explains a lot |
| 01:04:34 | | StarletCharlotte quits [Client Quit] |
| 01:11:50 | | pabs (pabs) joins |
| 01:13:49 | | LddPotato quits [Read error: Connection reset by peer] |
| 01:14:30 | | LddPotato (LddPotato) joins |
| 01:15:12 | | roverinexile joins |
| 01:17:41 | | rover quits [Ping timeout: 272 seconds] |
| 01:18:31 | | etnguyen03 (etnguyen03) joins |
| 01:24:27 | | LddPotato quits [Read error: Connection reset by peer] |
| 01:25:09 | | LddPotato (LddPotato) joins |
| 01:34:57 | | LddPotato quits [Read error: Connection reset by peer] |
| 01:35:51 | | LddPotato (LddPotato) joins |
| 01:36:03 | | petrichor quits [Ping timeout: 272 seconds] |
| 01:44:13 | | fangfufu quits [Client Quit] |
| 01:45:53 | | LddPotato quits [Read error: Connection reset by peer] |
| 01:46:34 | | LddPotato (LddPotato) joins |
| 01:50:08 | | fangfufu joins |
| 01:50:27 | | fangfufu is now authenticated as fangfufu |
| 01:50:28 | | kansei- (kansei) joins |
| 01:51:52 | | kansei quits [Ping timeout: 256 seconds] |
| 02:03:57 | | LddPotato quits [Read error: Connection reset by peer] |
| 02:05:31 | | LddPotato (LddPotato) joins |
| 02:29:50 | | pokechu22 quits [Ping timeout: 256 seconds] |
| 02:40:35 | | pokechu22 (pokechu22) joins |
| 02:52:14 | | ducky_ (ducky) joins |
| 02:53:04 | | ducky quits [Ping timeout: 256 seconds] |
| 02:53:04 | | ducky_ is now known as ducky |
| 02:53:29 | | thalia quits [Quit: Connection closed for inactivity] |
| 03:06:40 | | ducky quits [Ping timeout: 256 seconds] |
| 03:08:16 | | ducky (ducky) joins |
| 03:30:58 | | nexussfan quits [Quit: Konversation terminated!] |
| 03:36:42 | | Godzfire quits [Quit: Ooops, wrong browser tab.] |
| 03:47:30 | | nexussfan (nexussfan) joins |
| 04:08:05 | | etnguyen03 quits [Remote host closed the connection] |
| 04:08:17 | | fireatseaparks quits [Quit: Textual IRC Client: www.textualapp.com] |
| 04:16:13 | | fireatseaparks (fireatseaparks) joins |
| 04:39:57 | | Island quits [Read error: Connection reset by peer] |
| 04:46:18 | | cyanbox joins |
| 04:55:14 | | DogsRNice quits [Read error: Connection reset by peer] |
| 05:04:32 | | n9nes quits [Ping timeout: 256 seconds] |
| 05:05:03 | | khaoohs quits [Ping timeout: 272 seconds] |
| 05:06:01 | | n9nes joins |
| 05:06:36 | | khaoohs joins |
| 05:08:58 | | nexussfan quits [Client Quit] |
| 05:15:33 | | steering wonders how thoroughly wikipedia links have been archived |
| 05:23:34 | <steering> | i know there's bots that try and point links to archives when they're dead but is there stuff going through and SPN'ing links for example |
| 05:24:26 | <BlankEclair> | wikipedia-eventstream or something |
| 05:24:55 | <BlankEclair> | https://archive.org/details/wikipedia-eventstream?tab=about |
| 05:27:59 | <pokechu22> | Yeah, my understanding is that there's a project that does that (that isn't by archiveteam). Looking at https://archive.org/details/wikipedia-eventstream?tab=collection&sort=-publicdate it seems like stuff is run weekly-ish? |
| 05:35:01 | <steering> | ah good :) |
| 06:08:23 | | Snivy quits [Ping timeout: 272 seconds] |
| 06:15:57 | | petrichor (petrichor) joins |
| 06:25:00 | | fionera quits [Ping timeout: 256 seconds] |
| 06:29:23 | | BennyOtt (BennyOtt) joins |
| 06:40:59 | | Wohlstand1 (Wohlstand) joins |
| 06:43:24 | | Wohlstand1 is now known as Wohlstand |
| 06:51:24 | | Wohlstand quits [Client Quit] |
| 07:12:09 | | Snivy (Snivy) joins |
| 08:30:53 | | rohvani quits [Ping timeout: 272 seconds] |
| 08:55:44 | | ducky quits [Ping timeout: 256 seconds] |
| 08:57:25 | <ericgallager> | https://en.wikipedia.org/wiki/User:GreenC_bot does archiving of Wikipedia links |
| 08:57:40 | <ericgallager> | https://en.wikipedia.org/wiki/User:GreenC/WaybackMedic |
| 08:59:51 | <ericgallager> | oh and this one too: https://en.wikipedia.org/wiki/User:InternetArchiveBot |
| 09:14:57 | | ducky (ducky) joins |
| 09:32:19 | | sec^nd quits [Ping timeout: 244 seconds] |
| 09:34:36 | | sec^nd (second) joins |
| 09:58:55 | | BornOn420 quits [Ping timeout: 272 seconds] |
| 10:41:42 | | TheEnbyperor quits [Ping timeout: 256 seconds] |
| 10:41:59 | | TheEnbyperor_ quits [Ping timeout: 272 seconds] |
| 10:46:16 | | TheEnbyperor (TheEnbyperor) joins |
| 10:51:29 | | TheEnbyperor quits [Ping timeout: 272 seconds] |
| 10:57:47 | | TheEnbyperor joins |
| 10:59:34 | | TheEnbyperor_ (TheEnbyperor) joins |
| 11:02:13 | | Dada joins |
| 11:05:11 | | Dada quits [Remote host closed the connection] |
| 11:40:27 | | APOLLO03a joins |
| 11:42:54 | | APOLLO03 quits [Ping timeout: 256 seconds] |
| 11:59:46 | | StarletCharlotte joins |
| 12:00:03 | | Bleo1826007227196234552220 quits [Quit: The Lounge - https://thelounge.chat] |
| 12:02:12 | <StarletCharlotte> | Good news: ia-upload-stream.py works! Bad news: I can't edit the metadata to say I finished uploading the actual file instead of the placeholder now because it turns out the Internet Archive REALLY doesn't like when an item identifier has dots in it. But it only tells you that breaks things AFTER you make that the name of your item, only when you |
| 12:02:12 | <StarletCharlotte> | try to edit the item. https://archive.org/details/ftp.funcom.com |
| 12:02:15 | <StarletCharlotte> | Not sure what to do. |
| 12:02:48 | | Bleo1826007227196234552220 joins |
| 13:03:07 | | StarletCharlotte quits [Client Quit] |
| 13:19:32 | | Webuser302981 joins |
| 13:19:39 | <Webuser302981> | What |
| 13:20:06 | | Webuser302981 quits [Client Quit] |
| 13:20:22 | | @imer nods |
| 13:22:51 | | Arcorann_ quits [Ping timeout: 272 seconds] |
| 13:42:17 | | ice quits [Quit: WeeChat 4.7.1] |
| 13:42:29 | | oxtyped quits [Ping timeout: 272 seconds] |
| 13:54:00 | | mgrytbak8 joins |
| 13:54:50 | | ice joins |
| 13:55:09 | | mgrytbak quits [Ping timeout: 272 seconds] |
| 13:55:09 | | mgrytbak8 is now known as mgrytbak |
| 14:15:02 | | oxtyped joins |
| 14:34:13 | | Webuser247771 joins |
| 14:34:57 | | Webuser247771 quits [Client Quit] |
| 14:40:07 | | oxtyped quits [Ping timeout: 272 seconds] |
| 14:49:40 | | oxtyped joins |
| 14:51:57 | | GodzFire joins |
| 14:58:28 | <GodzFire> | pokechu22 I was watching the crawler and noticed it was seemingly scraping some production websites so I checked the productionmusic.fandom.com_articles_and_outlinks.txt list. There's a crap ton that should be removed. I went through and took out 17000 links. Here's an updated txt that only has ProdMusic Wiki stuff, could you restart it with |
| 14:58:28 | <GodzFire> | this?: https://litter.catbox.moe/gke9wfo08aoe2dpx.txt |
| 15:00:18 | <GodzFire> | I was wondering why it pulled 111 GB when the site is only 12 GB total. |
| 15:04:21 | | FiTheArchiver joins |
| 15:04:39 | | FiTheArchiver quits [Remote host closed the connection] |
| 15:14:18 | | Dada joins |
| 15:19:02 | | Webuser963758 joins |
| 15:19:30 | | Webuser963758 quits [Client Quit] |
| 15:20:51 | <aaq|m> | That would compress down well at least |
| 15:21:51 | <justauser> | GodzFire: That's fine, our motto is "Archive All The Things". |
| 15:22:29 | <justauser> | IA is willing to store the junk. |
| 15:24:28 | <justauser> | However, it only pulled 7GB so far - where is your number from? |
| 15:26:33 | <justauser> | Oh, never mind - it's my number that came from a frozen dashboard. |
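
The 01:02-01:03 exchange about `set`, `export`, and the `IA_S3_ACCESS=... IA_S3_SECRET=... ./ia-upload-stream ...` form can be illustrated with a minimal POSIX shell sketch. `DEMO_KEY` and the `child_view` helper are hypothetical stand-ins, not part of `ia-upload-stream`; the script only shows which forms make a value visible to a child process, which is how the uploader would need to receive its credentials.

```shell
#!/bin/sh
# DEMO_KEY is a hypothetical stand-in for IA_S3_ACCESS / IA_S3_SECRET.
unset DEMO_KEY

# Report what a child process sees for DEMO_KEY ("unset" if not inherited).
child_view() {
    sh -c 'printf "%s" "${DEMO_KEY:-unset}"'
}

# 1. `set NAME=value` only replaces the positional parameters ($1, $2, ...).
set DEMO_KEY=from_set
first_arg=$1
after_set=$(child_view)

# 2. A plain assignment creates a shell variable that children do not inherit.
DEMO_KEY=plain
after_plain=$(child_view)

# 3. A prefix assignment puts the variable in that one command's environment.
unset DEMO_KEY
prefixed=$(DEMO_KEY=prefix sh -c 'printf "%s" "${DEMO_KEY:-unset}"')

# 4. `export` marks the variable for every child started afterwards.
export DEMO_KEY=exported
after_export=$(child_view)

printf 'set:    $1=%s, child sees %s\n' "$first_arg" "$after_set"
printf 'plain:  child sees %s\n' "$after_plain"
printf 'prefix: child sees %s\n' "$prefixed"
printf 'export: child sees %s\n' "$after_export"
```

Only the prefix and `export` forms reach the child process, which matches the two options suggested at 01:03:07.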