| 00:00:52 | | nexussfan (nexussfan) joins |
| 00:11:14 | | mr_sarge quits [Read error: Connection reset by peer] |
| 00:11:47 | | StarletCharlotte joins |
| 00:11:58 | <StarletCharlotte> | What's the best way to upload large files to the Internet Archive? |
| 00:12:22 | <StarletCharlotte> | Because I'm trying to upload an archive of ftp://ftp.funcom.com and it's stuck at 4.9 GB. It's been several hours. |
| 00:12:39 | <StarletCharlotte> | My internet isn't the best but I don't think it's that bad. |
| 00:14:01 | <Yakov> | reading some of AB's source, I think it supports ftp..?
| 00:14:08 | <@imer> | StarletCharlotte: there's some tips here (if you've not seen it yet) https://wiki.archiveteam.org/index.php/Internet_Archive#Upload_speed |
| 00:17:20 | <pokechu22> | ab doesn't interact nicely with ftp - there's some code for it but it crashes most of the time and as such is mostly disabled at this point |
| 00:18:11 | <StarletCharlotte> | imer I'll take a look |
| 00:26:25 | <@OrIdow6> | What's the current tracker architecture? I found old logs talking about it being an Nginx(Lua) proxy that talks to the original tracker, but doesn't directly talk to Redis - is that still the case? |
| 00:28:36 | <nicolas17> | StarletCharlotte: are you on Linux? |
| 00:28:46 | <StarletCharlotte> | yeah |
| 00:29:47 | <nicolas17> | in my experience "sudo sysctl net.ipv4.tcp_congestion_control=bbr" makes uploads to archive.org significantly faster |
| 00:29:50 | <nicolas17> | won't help with ongoing connections/uploads though, you'd have to start over |
| 00:30:21 | <StarletCharlotte> | Got it. Should I turn it off after though? |
| 00:30:46 | <nicolas17> | I didn't notice any negative effects on the rest of my internet use tbh |
| 00:30:57 | <StarletCharlotte> | got it |
| 00:31:01 | <nicolas17> | but you could run "sudo sysctl net.ipv4.tcp_congestion_control" to see what your current value is |
| 00:31:05 | <nicolas17> | and restore it afterwards |
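The save/set/restore flow nicolas17 describes can be sketched as follows (Linux only, needs root; the default algorithm is typically `cubic`, and the change only affects connections opened after the switch):

```shell
# Save the current congestion control algorithm so it can be restored later.
prev=$(sysctl -n net.ipv4.tcp_congestion_control)    # usually "cubic"
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr   # new connections use BBR
# ... run the upload ...
sudo sysctl -w net.ipv4.tcp_congestion_control="$prev"
```

As noted later in the log, the setting is not persistent anyway: it reverts to the distribution default on reboot unless written to a sysctl config file.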
| 00:31:07 | <StarletCharlotte> | whoops |
| 00:31:12 | <StarletCharlotte> | i uh... already did it |
| 00:31:17 | <StarletCharlotte> | oh well |
| 00:32:59 | <BlankEclair> | out of curiosity, does that only make IA uploads fast, or does it make all tcp connections go faster |
| 00:34:26 | <nicolas17> | there's *something* in archive.org's networking that doesn't interact well with the default congestion control algorithm, I don't understand the details |
| 00:35:34 | <klea> | https://www.kernel.org/doc/html/latest/networking/ip-sysctl.html#:~:text=tcp%5Fcongestion%5Fcontrol%20%2D%20STRING is not very clear what that does. |
| 00:40:24 | <nicolas17> | https://en.wikipedia.org/wiki/TCP_congestion_control |
| 00:40:44 | <StarletCharlotte> | > /usr/bin/python: Error while finding module specification for 'ia-upload-stream.py' (ModuleNotFoundError: __path__ attribute not found on 'ia-upload-stream' while trying to find 'ia-upload-stream.py'). Try using 'ia-upload-stream' instead of 'ia-upload-stream.py' as the module name. |
| 00:40:49 | <StarletCharlotte> | not sure what's going on here |
| 00:41:20 | <StarletCharlotte> | Removing .py also fails |
| 00:41:42 | <StarletCharlotte> | same with removing -m |
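The error above comes from Python's module machinery: `-m` imports a module by name, so `ia-upload-stream.py` is parsed as submodule `py` of package `ia-upload-stream`, and a hyphenated name cannot be imported as a module at all. The script has to be run as a file instead (e.g. `python3 ./ia-upload-stream ...`). A demonstration of the same failure mode with a hypothetical module name:

```shell
# Python parses "no-such-module.py" as package "no-such-module", submodule
# "py"; the lookup fails before the script is ever considered as a file.
python3 -m no-such-module.py; echo "exit: $?"
```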
| 00:47:47 | <klea> | nicolas17: what version does setting the variable to bbr set it to, BBRv1, BBRv2 or BBRv3? |
| 00:49:12 | <StarletCharlotte> | yeah I can't figure out how to run this. The example on the wiki just doesn't work for some reason |
| 00:50:00 | <@JAA> | Why is that command trying to run it as a module? (I either never knew or forgot that my uploader is even listed there.) |
| 00:51:16 | <StarletCharlotte> | Good question, but not running it as a module also fails. |
| 00:51:18 | <@JAA> | And what's that bit about installing the ia package? ia-upload-stream only depends on requests. |
| 00:51:53 | <StarletCharlotte> | https://pastebin.com/s88c8eJr |
| 00:53:03 | <@JAA> | Hmm yeah, I suppose. |
| 00:53:22 | <@JAA> | That does run the script correctly though. |
| 00:53:46 | <@JAA> | You can specify the S3 credentials via IA_S3_ACCESS and IA_S3_SECRET environment variables as well. |
| 00:54:14 | <@JAA> | And ia-s3-auth can get you those values without `ia configure`. |
| 00:55:52 | | etnguyen03 quits [Client Quit] |
| 00:56:42 | <StarletCharlotte> | S3? |
| 00:57:16 | <StarletCharlotte> | Okay I guess |
| 00:57:24 | <klea> | They're available on the web at https://archive.org/account/s3.php too |
| 00:57:32 | <klea> | It's an S3-like API |
| 00:57:34 | <StarletCharlotte> | oh okay thanks |
| 00:59:27 | <StarletCharlotte> | Tried again, same error. It's asking about a config file or something? |
| 00:59:43 | <@JAA> | To explain that error referencing `ia configure`: `ia-upload-stream` reads ia's config file if it's available (and not overridden by the environment variable). There's no actual dependency on `ia`. |
| 01:00:29 | <StarletCharlotte> | I assume ia is from python-internetarchive? |
| 01:00:55 | <StarletCharlotte> | I set the environment variables for the S3 credentials so it's not that. |
| 01:01:27 | <TheTechRobo> | StarletCharlotte: the sysctl option should go back to what it was before after a reboot, FWIW, so don't worry about losing it |
| 01:01:39 | <StarletCharlotte> | got it |
| 01:01:41 | <@JAA> | Sounds like you didn't set them correctly then. It won't even reach that code when they're set. |
| 01:02:03 | <TheTechRobo> | (ia comes from https://pypi.org/project/internetarchive BTW) |
| 01:02:40 | <StarletCharlotte> | Huh, I guess set just sets the shell variables and not environment variables? I think? |
| 01:02:48 | <@JAA> | Yes |
| 01:02:50 | <klea> | try to export. |
| 01:02:53 | <TheTechRobo> | export IA_S3_ACCESS=... |
| 01:03:07 | <@JAA> | Either run it as `IA_S3_ACCESS=... IA_S3_SECRET=... ./ia-upload-stream ...` or `export` them. |
| 01:03:49 | <StarletCharlotte> | There it goes. thank you |
| 01:03:50 | <@JAA> | And `set` sets the arguments, not variables. |
| 01:03:54 | <StarletCharlotte> | that explains a lot |
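The `set` vs `export` distinction JAA and TheTechRobo describe can be sketched as follows (the variable names are the real ones `ia-upload-stream` reads; the values are placeholders):

```shell
# In POSIX shells, `set` manipulates shell options and positional parameters,
# not variables, and a bare assignment stays local to the current shell.
# Only exported variables reach child processes such as ia-upload-stream.
IA_S3_ACCESS=examplekey            # shell variable only: children won't see it
export IA_S3_SECRET=examplesecret  # environment variable: children will see it

# A child process sees only the exported one: prints "|examplesecret"
sh -c 'printf "%s|%s\n" "$IA_S3_ACCESS" "$IA_S3_SECRET"'
```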
| 01:04:34 | | StarletCharlotte quits [Client Quit] |
| 01:11:50 | | pabs (pabs) joins |
| 01:13:49 | | LddPotato quits [Read error: Connection reset by peer] |
| 01:14:30 | | LddPotato (LddPotato) joins |
| 01:15:12 | | roverinexile joins |
| 01:17:41 | | rover quits [Ping timeout: 272 seconds] |
| 01:18:31 | | etnguyen03 (etnguyen03) joins |
| 01:24:27 | | LddPotato quits [Read error: Connection reset by peer] |
| 01:25:09 | | LddPotato (LddPotato) joins |
| 01:34:57 | | LddPotato quits [Read error: Connection reset by peer] |
| 01:35:51 | | LddPotato (LddPotato) joins |
| 01:36:03 | | petrichor quits [Ping timeout: 272 seconds] |
| 01:44:13 | | fangfufu quits [Client Quit] |
| 01:45:53 | | LddPotato quits [Read error: Connection reset by peer] |
| 01:46:34 | | LddPotato (LddPotato) joins |
| 01:50:08 | | fangfufu joins |
| 01:50:27 | | fangfufu is now authenticated as fangfufu |
| 01:50:28 | | kansei- (kansei) joins |
| 01:51:52 | | kansei quits [Ping timeout: 256 seconds] |
| 02:03:57 | | LddPotato quits [Read error: Connection reset by peer] |
| 02:05:31 | | LddPotato (LddPotato) joins |
| 02:29:50 | | pokechu22 quits [Ping timeout: 256 seconds] |
| 02:40:35 | | pokechu22 (pokechu22) joins |
| 02:52:14 | | ducky_ (ducky) joins |
| 02:53:04 | | ducky quits [Ping timeout: 256 seconds] |
| 02:53:04 | | ducky_ is now known as ducky |
| 02:53:29 | | thalia quits [Quit: Connection closed for inactivity] |
| 03:06:40 | | ducky quits [Ping timeout: 256 seconds] |
| 03:08:16 | | ducky (ducky) joins |
| 03:30:58 | | nexussfan quits [Quit: Konversation terminated!] |
| 03:36:42 | | Godzfire quits [Quit: Ooops, wrong browser tab.] |
| 03:47:30 | | nexussfan (nexussfan) joins |
| 04:08:05 | | etnguyen03 quits [Remote host closed the connection] |
| 04:08:17 | | fireatseaparks quits [Quit: Textual IRC Client: www.textualapp.com] |
| 04:16:13 | | fireatseaparks (fireatseaparks) joins |
| 04:39:57 | | Island quits [Read error: Connection reset by peer] |
| 04:46:18 | | cyanbox joins |
| 04:55:14 | | DogsRNice quits [Read error: Connection reset by peer] |
| 05:04:32 | | n9nes quits [Ping timeout: 256 seconds] |
| 05:05:03 | | khaoohs quits [Ping timeout: 272 seconds] |
| 05:06:01 | | n9nes joins |
| 05:06:36 | | khaoohs joins |
| 05:08:58 | | nexussfan quits [Client Quit] |
| 05:15:33 | | steering wonders how thoroughly wikipedia links have been archived |
| 05:23:34 | <steering> | i know there's bots that try and point links to archives when they're dead but is there stuff going through and SPN'ing links for example |
| 05:24:26 | <BlankEclair> | wikipedia-eventstream or something |
| 05:24:55 | <BlankEclair> | https://archive.org/details/wikipedia-eventstream?tab=about |
| 05:27:59 | <pokechu22> | Yeah, my understanding is that there's a project that does that (that isn't by archiveteam). Looking at https://archive.org/details/wikipedia-eventstream?tab=collection&sort=-publicdate it seems like stuff is run weekly-ish?
| 05:35:01 | <steering> | ah good :) |
| 06:08:23 | | Snivy quits [Ping timeout: 272 seconds] |
| 06:15:57 | | petrichor (petrichor) joins |
| 06:25:00 | | fionera quits [Ping timeout: 256 seconds] |
| 06:29:23 | | BennyOtt (BennyOtt) joins |
| 06:40:59 | | Wohlstand1 (Wohlstand) joins |
| 06:43:24 | | Wohlstand1 is now known as Wohlstand |
| 06:51:24 | | Wohlstand quits [Client Quit] |
| 07:12:09 | | Snivy (Snivy) joins |
| 08:30:53 | | rohvani quits [Ping timeout: 272 seconds] |
| 08:55:44 | | ducky quits [Ping timeout: 256 seconds] |
| 08:57:25 | <ericgallager> | https://en.wikipedia.org/wiki/User:GreenC_bot does archiving of Wikipedia links |
| 08:57:40 | <ericgallager> | https://en.wikipedia.org/wiki/User:GreenC/WaybackMedic |
| 08:59:51 | <ericgallager> | oh and this one too: https://en.wikipedia.org/wiki/User:InternetArchiveBot |
| 09:14:57 | | ducky (ducky) joins |
| 09:32:19 | | sec^nd quits [Ping timeout: 244 seconds] |
| 09:34:36 | | sec^nd (second) joins |
| 09:58:55 | | BornOn420 quits [Ping timeout: 272 seconds] |
| 10:41:42 | | TheEnbyperor quits [Ping timeout: 256 seconds] |
| 10:41:59 | | TheEnbyperor_ quits [Ping timeout: 272 seconds] |
| 10:46:16 | | TheEnbyperor (TheEnbyperor) joins |
| 10:51:29 | | TheEnbyperor quits [Ping timeout: 272 seconds] |
| 10:57:47 | | TheEnbyperor joins |
| 10:59:34 | | TheEnbyperor_ (TheEnbyperor) joins |
| 11:02:13 | | Dada joins |
| 11:05:11 | | Dada quits [Remote host closed the connection] |
| 11:40:27 | | APOLLO03a joins |
| 11:42:54 | | APOLLO03 quits [Ping timeout: 256 seconds] |
| 11:59:46 | | StarletCharlotte joins |
| 12:00:03 | | Bleo1826007227196234552220 quits [Quit: The Lounge - https://thelounge.chat] |
| 12:02:12 | <StarletCharlotte> | Good news: ia-upload-stream.py works! Bad news: I can't edit the metadata to say I finished uploading the actual file instead of the placeholder, because it turns out the Internet Archive REALLY doesn't like when an item identifier has dots in it. But it only tells you that breaks things AFTER you make that the name of your item, when you try to edit the item. https://archive.org/details/ftp.funcom.com
| 12:02:15 | <StarletCharlotte> | Not sure what to do. |
| 12:02:48 | | Bleo1826007227196234552220 joins |
| 13:03:07 | | StarletCharlotte quits [Client Quit] |
| 13:19:32 | | Webuser302981 joins |
| 13:19:39 | <Webuser302981> | What |
| 13:20:06 | | Webuser302981 quits [Client Quit] |
| 13:20:22 | | @imer nods |
| 13:22:51 | | Arcorann_ quits [Ping timeout: 272 seconds] |
| 13:42:17 | | ice quits [Quit: WeeChat 4.7.1] |
| 13:42:29 | | oxtyped quits [Ping timeout: 272 seconds] |
| 13:54:00 | | mgrytbak8 joins |
| 13:54:50 | | ice joins |
| 13:55:09 | | mgrytbak quits [Ping timeout: 272 seconds] |
| 13:55:09 | | mgrytbak8 is now known as mgrytbak |
| 14:15:02 | | oxtyped joins |
| 14:34:13 | | Webuser247771 joins |
| 14:34:57 | | Webuser247771 quits [Client Quit] |
| 14:40:07 | | oxtyped quits [Ping timeout: 272 seconds] |
| 14:49:40 | | oxtyped joins |
| 14:51:57 | | GodzFire joins |
| 14:58:28 | <GodzFire> | pokechu22 I was watching the crawler and noticed it was seemingly scraping some production websites, so I checked the productionmusic.fandom.com_articles_and_outlinks.txt list. There's a crap ton that should be removed. I went through and took out 17000 links. Here's an updated txt that only has ProdMusic Wiki stuff, could you restart it with this?: https://litter.catbox.moe/gke9wfo08aoe2dpx.txt
| 15:00:18 | <GodzFire> | I was wondering why it pulled 111 GB when the site is only 12 GB total.
| 15:04:21 | | FiTheArchiver joins |
| 15:04:39 | | FiTheArchiver quits [Remote host closed the connection] |
| 15:14:18 | | Dada joins |
| 15:19:02 | | Webuser963758 joins |
| 15:19:30 | | Webuser963758 quits [Client Quit] |
| 15:20:51 | <aaq|m> | That would compress down well at least |
| 15:21:51 | <justauser> | GodzFire: That's fine, our motto is "Archive All The Things". |
| 15:22:29 | <justauser> | IA is willing to store the junk. |
| 15:24:28 | <justauser> | However, it only pulled 7GB so far - where is your number from? |
| 15:26:33 | <justauser> | Oh, nevermind - it's my number that came from a frozen dashboard. |
| 15:33:39 | <GodzFire> | justauser well, judging from what the other 17000 links are, it's all collections of actual music files on big Production Music websites, which is licensed and could get in trouble. I would really prefer if the job could please just get restarted with only the ProdMusic Wiki links.
| 15:34:53 | <GodzFire> | It doesn't feel right otherwise. |
| 15:37:21 | | BornOn420 (BornOn420) joins |
| 15:43:27 | <klea> | IA excludes stuff from WBM, so that's fine I believe? |
| 16:02:02 | | Island joins |
| 16:02:45 | | polduran joins |
| 16:08:38 | | Wohlstand (Wohlstand) joins |
| 16:10:14 | | Dada quits [Remote host closed the connection] |
| 16:15:45 | | AK quits [Quit: AK] |
| 16:30:07 | | Boppen_ quits [Read error: Connection reset by peer] |
| 16:34:18 | | Boppen (Boppen) joins |
| 16:52:00 | | Dada joins |
| 16:53:16 | | janos777 joins |
| 16:53:23 | | janos778 joins |
| 17:00:23 | <polduran> | hello there. normally, I only stumble here when I hear of the approaching end of a website so it gets queued for the archivebot. my understanding of your projects and how you work is therefore very limited. anyway, you may have already heard that archive.today is apparently using site visits for DDoS attacks, which makes its already endangered future even worse. in the english wikipedia and the german wikipedia (and probably several other languages as well) they started the discussion to ban the URL, as it is obviously bad to link to a malicious website. the problem is that there are almost 700'000 existing links to sites archived on archive.today and its mirrors, most often websites that have not been archived (properly) in the wayback machine. i was hoping some of you might be interested to join the discussion and might offer some ideas how to preserve the archived information somewhere else. here is the link to the english discussion: https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/Archive.is_RFC_5 by any chance, is anyone already working on or planning a project to somehow rescue data from there? after all, there are 700k existing links, but i cannot imagine how many dead links must be in the wikipedias which are only archived on archive.today.
| 17:27:04 | <@arkiver> | archive.is is definitely on my radar |
| 17:30:58 | | lennier2_ joins |
| 17:33:06 | | lennier2 quits [Ping timeout: 256 seconds] |
| 17:46:19 | | ThreeHM quits [Ping timeout: 272 seconds] |
| 17:47:59 | | ThreeHM (ThreeHeadedMonkey) joins |
| 17:48:10 | <pokechu22> | GodzFire: I chose to scrape the outlinks because that's what we would have done for a recursive job as well - I included them so that if you clicked on links on the site, those would also be saved... and given that most of them seem to have previously *not* been saved, it feels like it's useful to save them
| 17:49:41 | <pokechu22> | It should only be files that are public previews - if archivebot is somehow finding music that you're supposed to pay for, then they've done something really weird |
| 17:53:19 | <@JAA> | !tell StarletCharlotte Dots in IA item names are perfectly fine; I use them all the time. And there's a script in little-things for metadata as well. Feel free to ask in #internetarchive if you have more questions. |
| 17:53:19 | <eggdrop> | [tell] ok, I'll tell StarletCharlotte when they join next |
| 17:54:36 | <pokechu22> | (I was expecting it to be outlinks with text information about albums only, but if they have previews, might as well get them too) |
| 18:05:19 | | sg72 quits [Ping timeout: 272 seconds] |
| 18:08:37 | | sg72 joins |
| 18:26:41 | | Webuser043121 joins |
| 18:27:41 | | SootBector quits [Remote host closed the connection] |
| 18:28:49 | | SootBector (SootBector) joins |
| 18:35:04 | | Webuser043121 quits [Client Quit] |
| 18:36:18 | <GodzFire> | pokechu22 everything from ProdMusic Wiki seems to be erroring |
| 18:37:10 | <pokechu22> | Yeah, that's something fandom does - it has a lot of things that look like relative links to scripts to archivebot, but actually aren't |
| 18:37:51 | <pokechu22> | I'll add an ignore for some of them, but it's annoying to deal with (which is one reason why I did an !ao < list job like this instead of a fully recursive !a job) |
| 18:38:14 | <GodzFire> | Is there any way to see how far along it is and how many it's done/have left? |
| 18:41:34 | <pokechu22> | Yes, though the information is presented in a more obvious way on http://archivebot.com/3. It's processed 225k URLs and has another 275k URLs to go (though some are ignored or otherwise not relevant). That means it's saved the HTML for everything in my original list of 53k URLs, and is now saving images/scripts/media files embedded in those pages |
| 18:42:14 | <GodzFire> | I truly appreciate what you're doing and helping, but I really do have a worry about all the sound files. If you wanted to separate this into two jobs where one is all the ProdMusicWiki stuff and another is all the links to other sites, that would make me feel a lot better. One so the ProdMusicWiki can get done by itself, but the other is because the sheer amount of filesize and music those others link to is insanity.
| 18:43:05 | | Webuser567384 joins |
| 18:43:29 | <GodzFire> | For example one music library alone is easily 50 gigs of samples and thousands of pages, and currently it's trying to pull probably a hundred different music libraries |
| 18:43:37 | <pokechu22> | Yeah, I probably should have split it off initially and would have if I'd thought of the media files... but archivebot doesn't have a good way of doing that without aborting the job and starting from scratch (which would mean I'd duplicate the 131 GB of media already saved) |
| 18:47:10 | <pokechu22> | I could add ignores for the media files and then !ao < list them afterwards, but I don't really feel like that makes much of a difference (archivebot jobs get split into 5GB chunks that are uploaded as they finish, so just because it's downloaded over 100 GB doesn't mean there's a single 100 GB file sitting on the machine or any risk of running out of disk space) |
| 18:47:31 | <GodzFire> | Would it be possible to just do a separate additional job for just the ProdMusicWiki stuff since it's only 12 gigs? That way this other one can keep going. Then it will just see the ProdMusicWiki stuff is already uploaded to wayback and skip it. |
| 18:47:43 | <nicolas17> | it does not skip stuff that way |
| 18:48:32 | <nicolas17> | note that 85GB was *already* uploaded to archive.org |
| 19:11:17 | | ericgallager quits [Read error: Connection reset by peer] |
| 19:14:57 | | ericgallager joins |
| 19:19:25 | | mls quits [Ping timeout: 272 seconds] |
| 19:20:35 | | mls (mls) joins |
| 19:34:16 | | UwU quits [Remote host closed the connection] |
| 19:34:52 | | UwU joins |
| 19:46:27 | | UwU quits [Remote host closed the connection] |
| 19:47:46 | | UwU joins |
| 19:53:08 | | polduran quits [Quit: Ooops, wrong browser tab.] |
| 19:57:11 | | UwU quits [Remote host closed the connection] |
| 19:58:21 | | UwU joins |
| 20:09:47 | | UwU quits [Remote host closed the connection] |
| 20:10:22 | | UwU joins |
| 20:15:08 | <klea> | https://codeberg.org/lindenii/sethrawall - sethrawall is a small HTTP reverse proxy with SSH-based authentication. |
| 20:26:39 | | UwU quits [Remote host closed the connection] |
| 20:27:16 | | UwU joins |
| 20:44:01 | | UwU quits [Remote host closed the connection] |
| 20:44:36 | | UwU joins |
| 21:10:54 | | GodzFire quits [Quit: Ooops, wrong browser tab.] |
| 21:11:39 | | UwU quits [Remote host closed the connection] |
| 21:12:16 | | UwU joins |
| 21:20:05 | | Webuser567384 quits [Client Quit] |
| 21:30:32 | | UwU quits [Remote host closed the connection] |
| 21:31:13 | | UwU joins |
| 21:32:13 | | Dj-Wawa quits [] |
| 21:34:58 | | Dj-Wawa joins |
| 21:34:58 | | Dj-Wawa is now authenticated as Dj-Wawa |
| 21:35:04 | | Dada quits [Ping timeout: 256 seconds] |
| 21:43:18 | | fionera joins |
| 21:43:18 | | fionera is now authenticated as Fionera |
| 21:43:18 | | fionera quits [Changing host] |
| 21:43:18 | | fionera (Fionera) joins |
| 21:45:27 | | Hackerpcs quits [Quit: Hackerpcs] |
| 21:46:22 | | Hackerpcs (Hackerpcs) joins |
| 21:57:23 | | Dj-Wawa quits [Client Quit] |
| 21:57:30 | | Dj-Wawa joins |
| 21:57:30 | | Dj-Wawa is now authenticated as Dj-Wawa |
| 22:00:45 | | Dj-Wawa quits [Client Quit] |
| 22:00:53 | | Dj-Wawa joins |
| 22:00:54 | | Dj-Wawa is now authenticated as Dj-Wawa |
| 22:11:35 | | UwU quits [Remote host closed the connection] |
| 22:12:20 | | UwU joins |
| 22:31:39 | | Webuser851055 joins |
| 22:42:30 | | G4te_Keep3r34924156 quits [Ping timeout: 256 seconds] |
| 22:44:20 | | UwU quits [Client Quit] |
| 22:44:50 | | G4te_Keep3r34924156 joins |
| 22:44:59 | | UwU joins |
| 22:46:55 | | thedude joins |
| 22:47:48 | | etnguyen03 (etnguyen03) joins |
| 22:48:22 | <thedude> | I'm trying to recover a webpage from archive.today archives. Are there any tools out there that can do this? |
| 22:50:00 | <thedude> | I'd rather not try to hack something together in selenium myself |
| 22:50:36 | <klea> | What's the current approach around archiving Dropbox links? (i'm interested in archiving https://www.dropbox.com/s/l8yoah76t7nq04y/mueller-report.pdf from a url list I found somewhere on the web) |
| 22:52:44 | <pokechu22> | https://www.dropbox.com/s/l8yoah76t7nq04y/mueller-report.pdf?dl=1 and https://dl.dropboxusercontent.com/s/l8yoah76t7nq04y/mueller-report.pdf - note that www.dropbox.com is excluded from WBM |
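The URL rewrite pokechu22 applies can be sketched as follows (the host-swap pattern is an assumption generalized from the single example above; Dropbox may not honor it for every share-link shape):

```shell
# Turn a Dropbox share link into its two direct-download forms:
# append ?dl=1, or swap the host for dl.dropboxusercontent.com.
url="https://www.dropbox.com/s/l8yoah76t7nq04y/mueller-report.pdf"
echo "${url}?dl=1"
echo "$url" | sed 's|www\.dropbox\.com|dl.dropboxusercontent.com|'
```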
| 22:59:13 | <klea> | so shoving those two into AB? |
| 23:00:18 | <pokechu22> | Yeah, I'll do that |
| 23:01:58 | | Arcorann_ (Arcorann) joins |
| 23:02:07 | <klea> | Thanks |
| 23:02:56 | | thedude quits [Client Quit] |
| 23:16:45 | | atphoenix__ (atphoenix) joins |
| 23:19:20 | | atphoenix_ quits [Ping timeout: 256 seconds] |
| 23:23:32 | | Webuser851055 quits [Client Quit] |
| 23:46:41 | | ericgallager quits [Ping timeout: 272 seconds] |
| 23:47:57 | | nicolas17 quits [Ping timeout: 272 seconds] |
| 23:48:30 | | nicolas17 (nicolas17) joins |
| 23:57:45 | | rohvani joins |