00:27:56 | | SootBector quits [Remote host closed the connection] |
00:28:21 | | SootBector (SootBector) joins |
00:53:23 | | DogsRNice joins |
01:58:18 | | DogsRNice_ joins |
02:01:43 | | DogsRNice quits [Ping timeout: 255 seconds] |
02:14:09 | | BenFranske joins |
02:16:01 | <BenFranske> | Does anyone have any information on what "rights issue" the Internet Archive has in making old copies of the twit.tv podcasts available? All I can get from them is "we don't comment on rights issues" even though these are all released under a CC license... I was working on archiving everything TWiT has published and got to about 7500 episodes (of |
02:16:02 | <BenFranske> | 24,500) but my account got locked and most (but not yet all) of them have been pulled down. |
02:16:55 | <nicolas17> | you got content taken down that was actually under Creative Commons? :| |
02:17:37 | <BenFranske> | Yep, I was shocked |
02:17:52 | | tapos joins |
02:19:25 | <BenFranske> | I sent an email to info@archive.org asking to get some collections created and got back "Please do not upload these. We already have an archive of them and at this time they cannot be made public." I asked for clarification given the CC status and got back "we do not comment on rights issues." Then they locked my account and started pulling them |
02:20:39 | <BenFranske> | See my account which had 7500+ items earlier today https://archive.org/details/@benfranske and currently has 870 (though I expect that to drop, they seem to be pulling stuff in batches) |
02:23:24 | <BenFranske> | I'm especially annoyed because I was backing up a bunch of metadata (including but not limited to the human generated transcripts and show notes for Security Now which are also CC licensed) for each episode as well which I doubt is in whatever they "already have". |
02:44:09 | | DogsRNice_ quits [Read error: Connection reset by peer] |
03:36:48 | | BenFranske quits [Client Quit] |
05:07:09 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
05:12:18 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
05:34:13 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
05:41:50 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
06:02:44 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
06:05:31 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
07:14:20 | | Arcorann_ joins |
07:58:59 | | qwertyasdfuiopghjkl quits [Client Quit] |
08:08:17 | | SootBector quits [Ping timeout: 250 seconds] |
08:11:45 | | SootBector (SootBector) joins |
09:18:13 | | Doran quits [Ping timeout: 255 seconds] |
09:19:46 | | lea (lea_) joins |
09:20:29 | <lea> | I want to archive a site behind a login wall |
09:20:38 | <lea> | site in question: https://usdb.animux.de/ (hosts synced song texts for karaoke apps) |
09:20:51 | <lea> | is there documentation on the preferred format for uploads? |
09:21:03 | <lea> | since these are individual files, I guess I could just upload tens of thousands of individual files to the archive? |
09:22:45 | | Doran (Doranwen) joins |
09:26:45 | | Doran quits [Remote host closed the connection] |
09:44:06 | | pabs quits [Ping timeout: 265 seconds] |
09:47:03 | | pabs (pabs) joins |
09:57:57 | | Doran (Doranwen) joins |
10:00:05 | | f_ (funderscore) joins |
10:14:55 | | Doran quits [Ping timeout: 255 seconds] |
10:57:00 | | Doran (Doranwen) joins |
11:07:43 | | f_ quits [Client Quit] |
11:07:49 | | SootBector quits [Remote host closed the connection] |
11:08:18 | | SootBector (SootBector) joins |
11:44:44 | | thalia quits [Quit: Connection closed for inactivity] |
13:27:04 | | Arcorann_ quits [Ping timeout: 255 seconds] |
13:40:34 | | sonick quits [Client Quit] |
13:53:17 | | tapos quits [Client Quit] |
14:17:31 | | s-crypt quits [Quit: Ping timeout (120 seconds)] |
14:17:47 | | s-crypt (s-crypt) joins |
15:16:28 | | f_ (funderscore) joins |
15:26:30 | | RealPerson joins |
15:28:33 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
17:02:14 | | f_ quits [Remote host closed the connection] |
17:03:00 | | f_ (funderscore) joins |
17:09:01 | | f_ quits [Remote host closed the connection] |
17:10:13 | | RealPerson leaves |
17:10:29 | | RealPerson joins |
17:11:03 | | f_ (funderscore) joins |
17:19:55 | | f_ quits [Ping timeout: 250 seconds] |
17:21:02 | | f_ (funderscore) joins |
19:22:07 | | f_ quits [Ping timeout: 250 seconds] |
21:09:01 | | Webuser536 joins |
21:10:38 | <Webuser536> | Moving my question here: is there a way of using wget to download files from the Wayback Machine? My connection gets cut off after ~9000000 bytes. |
21:13:09 | <@JAA> | Hmm, do you have an example? |
21:17:09 | <Webuser536> | Say you go onto a website in the Wayback Machine, and you click on a direct download link on that page, on a normal browser on windows (Chrome, Firefox, etc) it downloads the file normally. |
21:17:09 | <Webuser536> | But if I use wget, the connection gets cut off and it's unable to fully download. |
21:17:27 | <Webuser536> | If you mean an example link I can go get one, if needed. |
21:17:39 | <@JAA> | Yeah, an example URL where you see that behaviour. |
21:20:31 | <@JAA> | I just tested with a random 166 MB file, and that worked just fine. |
21:22:06 | <Webuser536> | Ah okay, only just realized it's because I'm downloading a much bigger file, not too huge but I can't find it anywhere else on the internet. |
21:28:34 | <nicolas17> | Webuser536: are you sure that *same* file works on the browser and not on wget? |
21:28:53 | <nicolas17> | it's possible that the wayback machine doesn't have the full file archived |
21:33:22 | <@JAA> | Yeah, Save Page Now truncates large files, for example. |
21:33:32 | <@JAA> | Although that limit is much higher than 9 MB. |
21:34:37 | <Webuser536> | Just checked, it gets to 9.70k in wget and it fails at 9.20MB on browser. |
21:35:04 | <Webuser536> | Sad, assuming you are right about everything this probably means the file is lost. |
21:36:08 | <@JAA> | Maybe, maybe not, no way for us to know if you don't share the URL. :-) |
21:37:10 | <Webuser536> | Yeah, sorry about holding off with not sharing it. |
21:37:11 | <Webuser536> | https://web.archive.org/web/20220818205148/http://github-enterprise.s3.amazonaws.com/kvm/releases/github-enterprise-3.6.0.qcow2 |
21:37:11 | <Webuser536> | I wanted to see how the contents changed compared to the currently released build. |
21:39:02 | <katia> | 9720206 bytes via wget |
21:39:10 | <@JAA> | Yeah, same |
21:40:02 | <@JAA> | < warning: 299 wayback content truncated by "time" |
21:41:32 | <@JAA> | Definitely incomplete, even if there might be more than the 9.7 MB. |
21:41:52 | <Webuser536> | Unfortunate, but semi-expected. |
21:41:53 | <Webuser536> | Shame that those versions are probably fully lost. |
21:42:09 | <@JAA> | The WARC is only 955.4 MiB, so the full file couldn't possibly be in there anyway. |
21:44:11 | <Webuser536> | Yeah, probably should've expected this outcome. |
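
The `warning: 299 wayback content truncated by "time"` response header quoted above is what marks this capture as incomplete. A minimal sketch of checking for it, assuming truncated captures are flagged through a `Warning` header containing the word "truncated" (only the one example in this log is confirmed); `truncation_warnings` and `fetch_headers` are hypothetical helper names, not part of any Wayback Machine tooling.

```python
import urllib.request


def truncation_warnings(headers):
    """Return the values of Warning headers that mention truncation.

    `headers` is an iterable of (name, value) pairs. Matching on the
    substring "truncated" is an assumption based on the single header
    shown in the log.
    """
    return [
        value
        for name, value in headers
        if name.lower() == "warning" and "truncated" in value.lower()
    ]


def fetch_headers(url):
    """Fetch a URL and return its response headers as (name, value) pairs."""
    with urllib.request.urlopen(url) as response:
        return list(response.headers.items())


# Example using the header value seen in the log (no network access needed).
sample = [
    ("Content-Type", "application/octet-stream"),
    ("Warning", '299 wayback content truncated by "time"'),
]
print(truncation_warnings(sample))
```

An empty result does not prove a capture is complete; comparing the downloaded size against the size of the original file, where known, is still worthwhile.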
21:53:47 | <TheTechRobo> | lea: WARC is preferred as an archival format, but if you want to avoid that whole mess (I wouldn't blame you), probably just upload it in whatever format makes the most sense |
21:54:20 | <TheTechRobo> | Note that many small files tend to make archive.org's servers sad, so you might want to chunk them up in tarballs or zips |
21:54:50 | <TheTechRobo> | You can also contact them directly (info@archive.org) and discuss with them |
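
The suggestion above to chunk many small files into tarballs could be sketched as follows. This is an illustration only: the batch size of 1000 files, the `batch-NNNNN.tar.gz` naming, and the function names are assumptions, not an archive.org requirement.

```python
import tarfile
from pathlib import Path


def chunk(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_tarballs(src_dir, out_dir, files_per_tar=1000):
    """Pack the files under src_dir into numbered .tar.gz archives in out_dir.

    out_dir should live outside src_dir so the archives are not picked
    up as input. Returns the list of archive paths that were written.
    """
    src = Path(src_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Sort for a stable, reproducible assignment of files to batches.
    files = sorted(p for p in src.rglob("*") if p.is_file())

    written = []
    for index, batch in enumerate(chunk(files, files_per_tar)):
        tar_path = out / f"batch-{index:05d}.tar.gz"
        with tarfile.open(tar_path, "w:gz") as tar:
            for path in batch:
                # Store paths relative to src so the archive unpacks cleanly.
                tar.add(path, arcname=str(path.relative_to(src)))
        written.append(tar_path)
    return written
```

For example, `write_tarballs("songs", "upload", files_per_tar=5000)` would turn tens of thousands of small text files into a handful of archives that can be uploaded as one item.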
21:58:38 | <that_lurker> | Webuser536: You could also maybe ask GitHub's customer service if they could give you the file. (generally most likely no) |
22:09:00 | <Webuser536> | Yeah, I'm assuming seeing as I'm not McDonalds or something they probably won't give it to me. |
22:09:16 | <Webuser536> | I will try that though. |
22:12:08 | <Webuser536> | Nevermind, it seems like a hassle to get in contact with specifically who would tell me the info I need. |
22:18:21 | <Webuser536> | There is some good news though, I just found out the University Of Oklahoma archived some old builds in 2016, meaning I could still check what I need against those. |
22:22:54 | <Webuser536> | Maybe I should've looked more, it seems like a lot of GitHub Enterprise builds WERE actually saved with Save Page Now. |
22:23:10 | <Webuser536> | I'll attempt downloading one of those to completion. |
22:24:11 | <Webuser536> | *Might be saved, not too sure yet |
22:25:34 | <Webuser536> | The Oklahoma ones are absolutely saved though. |
22:25:35 | <Webuser536> | Sorry for the question that I should've looked into more, I should probably try and upload these to the Internet Archive at some point. |
22:26:29 | <@JAA> | SPN has long been truncating by size though. So those might be incomplete, too. |
22:27:12 | <nicolas17> | <Webuser536> Just checked, it gets to 9.70k in wget and it fails at 9.20MB on browser. |
22:27:16 | <nicolas17> | I think that's you misreading units |
22:27:38 | <nicolas17> | 9720206 bytes is 9.27MiB, not "9.70k" |
22:29:27 | <@JAA> | Where can that University Of Oklahoma data be found? |
22:30:21 | <Webuser536> | No? wget does seem to be reporting ~9k whenever I try that file. |
22:31:19 | <Webuser536> | JAA https://archive.org/details/ArchiveIt-Partner-164 |
22:31:56 | <@JAA> | Oh, I thought you meant outside of IA. |
22:32:07 | <Webuser536> | Oh, sorry for the miscommunication. |
22:32:24 | <Webuser536> | Nevermind, you were right. It is 9.27M. |
22:32:36 | <@JAA> | I'm not sure duplicating that data would be worthwhile. |
22:32:48 | <Webuser536> | Yeah, fair point. |
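
The unit mix-up discussed above (9720206 bytes reported variously as "9.70k", "9.20MB", and finally "9.27M") comes down to decimal versus binary prefixes. A small sketch of the arithmetic; `describe_size` is a hypothetical helper, and the "9.27M" that wget printed corresponds to the binary (MiB) figure.

```python
def describe_size(num_bytes):
    """Format a byte count in decimal megabytes and binary mebibytes."""
    megabytes = num_bytes / 1000**2   # MB: 1,000,000 bytes
    mebibytes = num_bytes / 1024**2   # MiB: 1,048,576 bytes
    return f"{num_bytes} bytes = {megabytes:.2f} MB = {mebibytes:.2f} MiB"


print(describe_size(9720206))
# prints: 9720206 bytes = 9.72 MB = 9.27 MiB
```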
22:39:37 | | geezabiscuit quits [Remote host closed the connection] |
22:43:56 | | geezabiscuit (geezabiscuit) joins |
23:57:06 | <lea> | TheTechRobo: how happy is info@archive.org to be contacted over a small unimportant page? |