| 00:22:02 | | Dango360 quits [Ping timeout: 252 seconds] |
| 00:24:22 | | BearFortress joins |
| 00:33:52 | | JohnnyJ quits [Client Quit] |
| 00:34:07 | | JohnnyJ joins |
| 00:35:25 | | nick joins |
| 00:44:23 | <pabs> | https://mastodon.social/@miraheze/110506683712194935 |
| 00:44:46 | <pabs> | (wiki hoster at risk, discussed on #wikiteam) |
| 00:50:25 | | imer quits [Client Quit] |
| 00:55:22 | | imer (imer) joins |
| 00:55:26 | | sec^nd quits [Ping timeout: 245 seconds] |
| 00:55:43 | | sec^nd (second) joins |
| 00:58:57 | <fireonlive|s> | does wikiteam <> archiveteam? |
| 00:59:06 | | killsushi quits [Ping timeout: 265 seconds] |
| 01:01:55 | <pokechu22> | wikiteam is an archiveteam project that happens to have team in the name for some reason... at least to my understanding |
| 01:01:55 | | fullpwnmedia quits [Read error: Connection reset by peer] |
| 01:02:36 | | fullpwnmedia joins |
| 01:09:03 | <fireonlive> | ah ok :) it was in a separate GitHub org so was a little confused |
| 01:09:32 | <fireonlive> | (https://github.com/WikiTeam vs https://github.com/ArchiveTeam) |
| 01:10:25 | <fireonlive> | neat pic tho: https://github.com/WikiTeam/wikiteam#this-use-of-github-is-not-an-endorsement |
| 01:15:15 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 01:15:36 | | AmAnd0A joins |
| 01:28:52 | | imer quits [Client Quit] |
| 01:36:18 | | dumbgoy_ quits [Ping timeout: 252 seconds] |
| 01:36:37 | | dumbgoy_ joins |
| 01:39:08 | | imer (imer) joins |
| 01:40:54 | | imer quits [Client Quit] |
| 02:21:58 | | imer (imer) joins |
| 02:23:45 | | imer quits [Client Quit] |
| 02:26:55 | | bufferunderflow quits [Remote host closed the connection] |
| 02:27:52 | | killsushi joins |
| 02:28:04 | | imer (imer) joins |
| 02:39:41 | | imer quits [Client Quit] |
| 02:41:40 | | imer (imer) joins |
| 02:51:25 | | Hajdar (Hajdar) joins |
| 02:52:54 | | JackThompson3 quits [Quit: The Lounge - https://thelounge.chat] |
| 02:55:24 | | imer quits [Client Quit] |
| 02:55:54 | | imer (imer) joins |
| 03:00:29 | | Hajdar quits [Client Quit] |
| 03:01:00 | | Hajdar (Hajdar) joins |
| 03:23:08 | | dumbgoy_ quits [Ping timeout: 265 seconds] |
| 03:36:32 | | JohnnyJ quits [Client Quit] |
| 03:37:25 | | JohnnyJ joins |
| 03:38:36 | | decky_e quits [Ping timeout: 265 seconds] |
| 03:39:08 | | decky_e (decky_e) joins |
| 03:44:24 | | JackThompson3 joins |
| 03:46:45 | | nick quits [Client Quit] |
| 03:47:27 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 03:50:12 | | decky_e quits [Ping timeout: 265 seconds] |
| 03:55:35 | | decky_e joins |
| 03:58:37 | | fireonlive quits [Excess Flood] |
| 03:59:34 | | fireonlive (fireonlive) joins |
| 04:37:11 | | Shjosan quits [Quit: Am sleepy (-, – )…zzzZZZ] |
| 04:37:48 | | Shjosan (Shjosan) joins |
| 04:54:57 | | Island quits [Read error: Connection reset by peer] |
| 04:58:17 | <pcr> | oops forgot that wbm exclusions were two wiki pages, when a mod gets to it can they reject that change? |
| 05:37:49 | <manu|m> | The invidious codebase is probably archived already, isn't it? Looks like the project might be somewhat endangered now: https://github.com/iv-org/invidious/issues/3872 |
| 05:41:11 | | hitgrr8 joins |
| 05:51:38 | | pabs asked SWH to save their github repos and gitea forge, asked #codearchiver and #gitgud to save repos/forge too |
| 06:30:52 | | fireonlive|s quits [Quit: quitters never quit] |
| 06:32:56 | | nicolas17 quits [Ping timeout: 252 seconds] |
| 06:59:11 | | bf_ joins |
| 07:36:37 | | fireonlive quits [Client Quit] |
| 07:37:04 | | fireonlive (fireonlive) joins |
| 07:48:05 | <manu|m> | perfect :) |
| 08:08:08 | | spirit quits [Client Quit] |
| 08:16:51 | | BigBrain quits [Remote host closed the connection] |
| 08:17:08 | | BigBrain (bigbrain) joins |
| 08:48:38 | | dave joins |
| 08:59:46 | | decky_e quits [Client Quit] |
| 09:07:00 | | Perk joins |
| 09:12:03 | | Aoede_ quits [Quit: ZNC - https://znc.in] |
| 09:12:21 | | Aoede (Aoede) joins |
| 09:16:32 | | MetaNova quits [Remote host closed the connection] |
| 09:18:12 | | MetaNova (MetaNova) joins |
| 09:21:00 | | decky_e (decky_e) joins |
| 09:27:36 | | Minkafighter quits [Quit: The Lounge - https://thelounge.chat] |
| 09:27:52 | | Minkafighter joins |
| 09:31:35 | | Elizabeth (Elizabeth) joins |
| 09:34:42 | | Josh joins |
| 09:34:44 | | Josh quits [Client Quit] |
| 09:35:32 | | Josh joins |
| 09:35:54 | | Ruthalas5 quits [Ping timeout: 252 seconds] |
| 09:53:59 | | driib quits [Quit: The Lounge - https://thelounge.chat] |
| 09:54:22 | | driib (driib) joins |
| 09:55:05 | | Ruthalas5 (Ruthalas) joins |
| 10:00:01 | | railen63 quits [Remote host closed the connection] |
| 10:00:16 | | railen63 joins |
| 10:08:42 | | railen63 quits [Remote host closed the connection] |
| 10:09:00 | | railen63 joins |
| 10:12:41 | | driib quits [Remote host closed the connection] |
| 10:13:00 | | driib (driib) joins |
| 10:20:30 | | threedeeitguy quits [Quit: The Lounge - https://thelounge.chat] |
| 10:29:42 | | threedeeitguy (threedeeitguy) joins |
| 10:32:35 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
| 10:38:45 | | Aoede quits [Client Quit] |
| 10:39:08 | | Aoede (Aoede) joins |
| 10:58:04 | | Dango360 (Dango360) joins |
| 11:01:41 | | JohnnyJ quits [Client Quit] |
| 11:02:09 | | JohnnyJ joins |
| 11:10:45 | <h2ibot> | Rewby edited Reddit (+542): https://wiki.archiveteam.org/?diff=49887&oldid=49881 |
| 11:12:24 | | fullpwnmedia quits [Read error: Connection reset by peer] |
| 11:21:28 | | hitgrr8 quits [Client Quit] |
| 11:33:36 | | takmaad joins |
| 11:34:07 | | lightquantum joins |
| 11:34:39 | | takmaad quits [Remote host closed the connection] |
| 11:37:39 | | fermuch joins |
| 11:37:45 | <fermuch> | hello! |
| 11:39:44 | <fermuch> | JAA on the #archiveteam you've pointed me to the warrior-dockerfile image. Is it this one? https://github.com/ArchiveTeam/warrior-dockerfile |
| 11:40:01 | <fermuch> | so I should use atdr.meo.ws/archiveteam/warrior-dockerfile right? |
| 11:42:22 | | lightquantum quits [Remote host closed the connection] |
| 11:43:02 | | Doranwen quits [Remote host closed the connection] |
| 11:43:03 | | decky_e quits [Remote host closed the connection] |
| 11:43:26 | <@JAA> | fermuch: That's the one if you want the 'set it up once and then forget it even exists' thing, yes. The project images let you instead run multiple projects at once, higher concurrencies (if the target site allows it), and less overhead, at the cost of manual management as new projects launch etc. |
| 11:43:39 | | Doranwen (Doranwen) joins |
| 11:45:51 | <fermuch> | Got it. Thanks. Until I understand more about the project I'll run it on auto mode. |
| 11:45:59 | <fermuch> | Thanks for the help |
| 12:03:00 | | railen63 quits [Remote host closed the connection] |
| 12:06:28 | | railen63 joins |
| 12:07:29 | | railen63 quits [Remote host closed the connection] |
| 12:07:43 | | railen63 joins |
| 12:13:00 | | fermuch quits [Remote host closed the connection] |
| 12:15:01 | | BigBrain quits [Ping timeout: 245 seconds] |
| 12:24:40 | | sonick (sonick) joins |
| 12:33:11 | | BigBrain (bigbrain) joins |
| 12:43:29 | | lunik173 joins |
| 13:16:05 | | nicolas17 joins |
| 13:35:42 | | railen63 quits [Ping timeout: 252 seconds] |
| 13:36:15 | | Doranwen quits [Remote host closed the connection] |
| 13:40:54 | | Doranwen (Doranwen) joins |
| 13:45:29 | | Doranwen quits [Remote host closed the connection] |
| 13:46:09 | | Doranwen (Doranwen) joins |
| 14:14:53 | | Megame (Megame) joins |
| 14:21:17 | | diggan joins |
| 14:32:42 | | icaotix|m joins |
| 14:36:44 | | killsushi quits [Client Quit] |
| 14:37:03 | | bf_ quits [Remote host closed the connection] |
| 14:45:27 | <@JAA> | Stack Exchange has officially discontinued its dumps onto IA: https://meta.stackexchange.com/questions/389922/june-2023-data-dump-is-missing |
| 14:45:54 | <@JAA> | (Or well, pretty close to official, anyway.) |
| 14:51:14 | <@JAA> | Also relevant: https://meta.stackoverflow.com/questions/424299/stack-overflow-is-no-longer-providing-creative-commons-data-dumps https://meta.stackexchange.com/questions/388551/is-se-going-to-be-selling-our-content-for-ai-model-training-and-what-exactly |
| 14:52:51 | <@arkiver> | sigh |
| 14:52:58 | <@arkiver> | well it's been a planned project |
| 14:52:58 | <masterx244|m> | seems like they want to get AT'd, too |
| 14:53:21 | <masterx244|m> | sucks that we got fire after fire popping up right now |
| 14:53:58 | <@arkiver> | VC money running out everywhere |
| 14:54:14 | <diggan> | think it's on purpose? They see that AT is busy with other projects, so they try to do it asap so there isn't enough resources |
| 14:55:27 | <diggan> | probably not though, they're just turning off the pipe in order to be able to monetize the data more |
| 15:12:46 | | beario quits [Remote host closed the connection] |
| 15:26:03 | | beario joins |
| 15:26:54 | | bear joins |
| 15:28:42 | | Island joins |
| 15:29:29 | | bear quits [Client Quit] |
| 15:29:54 | | lk_ joins |
| 15:30:28 | | beario_ joins |
| 15:37:40 | | rusty42 joins |
| 15:38:08 | <@JAA> | pcr: Done |
| 15:38:37 | <h2ibot> | Rexma edited List of websites excluded from the Wayback Machine (+25, added mewch.net, imageboard from circa 2017): https://wiki.archiveteam.org/?diff=49888&oldid=49852 |
| 15:40:37 | <h2ibot> | JustAnotherArchivist edited Stack Exchange (+431, Add discontinuation of dumps): https://wiki.archiveteam.org/?diff=49889&oldid=46235 |
| 15:41:15 | | RDPdotSH joins |
| 15:45:18 | | icedice quits [Client Quit] |
| 15:45:31 | <fireonlive> | srsly stackoverflow :| |
| 15:45:42 | <fireonlive> | thought you were one of the good people |
| 15:50:39 | <joepie91|m> | stackoverflow has been being shitty for quite a while |
| 15:50:59 | <fireonlive> | damn |
| 15:57:51 | | Josh quits [Remote host closed the connection] |
| 16:00:41 | <h2ibot> | JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=49890&oldid=49888 |
| 16:20:23 | | taggart joins |
| 16:22:32 | <taggart> | is there a reddit specific channel? |
| 16:23:09 | <diggan> | #shreddit |
| 16:23:55 | <taggart> | thanks |
| 16:33:15 | | icedice (icedice) joins |
| 16:43:06 | | that_lurker quits [Quit: Clowning around is not the same as fooling around...I am a clown, not a fool] |
| 16:43:15 | | vegbrasi_ joins |
| 16:43:15 | | that_lurker (that_lurker) joins |
| 16:54:10 | | vegbrasi_ quits [Client Quit] |
| 16:54:28 | | vegbrasil joins |
| 17:00:37 | | dumbgoy_ joins |
| 17:03:46 | | trekkie101 joins |
| 17:21:34 | | trekkie101 quits [Remote host closed the connection] |
| 17:22:28 | | za3k joins |
| 17:50:43 | | RDPdotSH quits [Ping timeout: 265 seconds] |
| 18:02:44 | | Megame quits [Client Quit] |
| 18:17:13 | | lk_ quits [Client Quit] |
| 18:17:52 | | lk joins |
| 18:35:41 | | JohnnyJ quits [Client Quit] |
| 18:43:39 | | that_lurker quits [Client Quit] |
| 18:43:49 | | that_lurker (that_lurker) joins |
| 18:51:18 | | decky_e (decky_e) joins |
| 18:59:29 | | StrangeFello joins |
| 19:01:50 | | AmAnd0A quits [Ping timeout: 252 seconds] |
| 19:02:12 | | AmAnd0A joins |
| 19:19:07 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 19:19:25 | | AmAnd0A joins |
| 19:19:35 | | sonick quits [Client Quit] |
| 19:20:46 | <fireonlive> | https://twitter.com/textfiles/status/1667162815637905411 |
| 19:25:14 | <BigBrain> | https://masm32.com/board/ and https://masmforum.com/board/ |
| 19:29:37 | <@JAA> | TIL https://ghostarchive.org/ |
| 19:29:57 | | decky_e quits [Read error: Connection reset by peer] |
| 19:30:37 | | decky_e (decky_e) joins |
| 19:32:03 | <that_lurker> | ghostarchive seems to only grab the page you specify not any link in it |
| 19:32:25 | <@JAA> | So just like SPN (until quite recently) and archive.is then. |
| 19:32:50 | <that_lurker> | "Archive system: Webrecorder" |
| 19:32:59 | <@JAA> | :-( |
| 19:34:22 | <fireonlive> | oh no :( |
| 19:34:55 | <fireonlive> | (whoever added 'save outlinks' to SPN, thank you!) |
| 19:42:20 | | railen63 joins |
| 19:43:15 | | railen63 quits [Remote host closed the connection] |
| 19:43:28 | | railen63 joins |
| 19:50:47 | | decky_e quits [Ping timeout: 252 seconds] |
| 19:51:26 | | decky_e (decky_e) joins |
| 19:54:24 | <h2ibot> | TheTechRobo edited Reddit (+57, Clarify): https://wiki.archiveteam.org/?diff=49891&oldid=49887 |
| 20:02:40 | | diggan quits [Ping timeout: 265 seconds] |
| 20:03:21 | <@JAA> | There is now real official confirmation that Stack Exchange is suspending its data dumps: https://meta.stackexchange.com/questions/389922/june-2023-data-dump-is-missing (answer by Jody Bailey, CTO) |
| 20:03:51 | <@JAA> | > Stack Overflow senior leadership is working on a strategy to protect Stack Overflow data from being misused by companies building LLMs. While working on this strategy, we decided to stop the dump until we could put guardrails in place. |
| 20:03:57 | <@JAA> | > We are looking for ways to gate access to the Dump, APIs, and SEDE, that will allow individuals access to the data while preventing misuse by organizations looking to profit from the work of our community. |
| 20:04:40 | <fireonlive> | direct link to answer: https://meta.stackexchange.com/a/390040 |
| 20:05:09 | <fireonlive> | that's just.... |
| 20:05:12 | <fireonlive> | .... |
| 20:05:17 | <fireonlive> | sigh |
| 20:05:23 | <@JAA> | It's exactly what I expected, but yeah. |
| 20:05:52 | <fireonlive> | > Just as context for casual readers since it may not be obvious, Jody is our CTO. (I am not commenting on the matter at hand, just providing this info.) – balpha Staff Mod |
| 20:06:00 | <fireonlive> | lolol |
| 20:07:00 | <fireonlive> | i would not want to be on that hot potato either |
| 20:10:58 | | Perk quits [Read error: Connection reset by peer] |
| 20:12:18 | <EvanBoehs|m> | How long does it take IA to ingest archivebot crawls? |
| 20:13:12 | <@JAA> | You can expect up to a couple days. |
| 20:13:20 | | BearFortress quits [Ping timeout: 252 seconds] |
| 20:13:21 | | Perk joins |
| 20:13:42 | <fireonlive> | i'm surprised torproject didn't self-host their own forums from the start |
| 20:14:03 | <@JAA> | Reposting from -ot: 19:57:15 < that_lurker> https://blog.torproject.org/tor-project-forum-migration/ Tor Project to self host their forums |
| 20:14:36 | <fireonlive> | seems like something they'd do rather than farming it out to discourse for their saas option |
| 20:14:39 | <fireonlive> | tks |
| 20:15:23 | <@JAA> | Yeah, they self-host almost everything, I believe. (As they should, and I'd like us to do the same.) |
| 20:17:26 | <that_lurker> | discourse hosting it for them for free was most likely the reason |
| 20:18:53 | <fireonlive> | selfhosting is awesome :) |
| 20:19:17 | <that_lurker> | that it is |
| 20:19:30 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+198, /* 2023 */ Add Tor forums): https://wiki.archiveteam.org/?diff=49892&oldid=49880 |
| 20:19:36 | <EvanBoehs|m> | What can be done about archivebot jobs that end prematurely? |
| 20:19:39 | <fireonlive> | there are some people that go a bit further and say if it's not physically in your home that's not true self hosting but... i don't quite subscribe to that yet |
| 20:19:52 | <fireonlive> | (or maybe ever) |
| 20:20:09 | <that_lurker> | JAA When will we get Archiveteam inhouse petabyte project :P |
| 20:21:29 | <@JAA> | that_lurker: We just need someone to be volunteered to cosy up to a billionaire. ;-) |
| 20:22:01 | <@JAA> | I'd love a completely separate entity that mirrors all of IA on another continent. |
| 20:22:43 | <that_lurker> | arctic IA vault |
| 20:23:04 | <BigBrain> | any active mirrors outside the us? |
| 20:23:20 | <@JAA> | Nothing significant |
| 20:23:31 | | upintheairsheep joins |
| 20:24:51 | <that_lurker> | Whats the current storage size of IA |
| 20:24:52 | <icedice> | Is there a project for archiving end-to-end encrypted messaging apps? |
| 20:25:02 | <icedice> | The EU are about to ban them |
| 20:25:09 | <that_lurker> | icedice: Telegrab |
| 20:25:14 | <that_lurker> | for telegram |
| 20:25:20 | <icedice> | Or rather require police backdoors |
| 20:25:31 | <icedice> | I meant the binaries |
| 20:25:36 | <that_lurker> | ahh |
| 20:26:09 | <icedice> | The UK, US, and Australia are also going after them as usual |
| 20:26:18 | <upintheairsheep> | https://twitter.com/search?q=%23BlameCanada&src=typeahead_click https://twitter.com/search?q=new%20york%20smoke&src=typed_query&f=top https://twitter.com/search?q=new%20york%20%20sky&src=typed_query&f=top https://twitter.com/search?q=canada%20wildfire%202023&src=typed_query&f=top |
| 20:26:41 | <icedice> | And even if all the servers go down, P2P messengers like Tox would still work |
| 20:26:58 | <icedice> | As well as those that are self-hostable |
| 20:27:20 | <upintheairsheep> | These Twitter links contain the culture and reaction of the recent and ongoing 2023 canada wildfire |
| 20:27:20 | <ehmry> | icedice: f-droid should be archiving builds. if it's not on f-droid it's probably not worth keeping |
| 20:28:00 | <icedice> | Signal isn't on F-Droid |
| 20:28:21 | <icedice> | And the EU are going after the Apple App Store and Google Play for those apps |
| 20:28:30 | | upintheairsheep quits [Remote host closed the connection] |
| 20:28:39 | <icedice> | Best case scenario, Google and Apple geo-block non-compliant apps in the EU |
| 20:29:01 | | somerando3 joins |
| 20:29:18 | | railen63 quits [Ping timeout: 252 seconds] |
| 20:29:26 | <that_lurker> | icedice: Is there a news article on the EU encryption? |
| 20:29:29 | | railen63 joins |
| 20:29:32 | | somerando3 quits [Remote host closed the connection] |
| 20:29:36 | <@JAA> | Signal is on F-Droid, though on a separate repo. |
| 20:29:36 | <ehmry> | signal is an american internet freedom project, it'll be taken care of |
| 20:31:41 | <icedice> | that_lurker: https://www.patrick-breyer.de/en/posts/chat-control/ |
| 20:32:34 | <icedice> | The definition on app stores was loose enough that it could include Linux packages |
| 20:32:46 | <icedice> | I wonder if GitHub could fall under that definition as well |
| 20:32:53 | <icedice> | <ehmry> signal is an american internet freedom project, it'll be taken care of |
| 20:33:31 | <icedice> | Hopefully, the EU tends to fine companies up to 6% of global turnover from the preceding financial year for violations |
| 20:33:34 | <that_lurker> | oh that. EU lawyers are already saying that is unlawful and that will get pushbacks from multiple directions. |
| 20:34:01 | <icedice> | Like that matters |
| 20:34:19 | <@JAA> | I've been wanting to launch a general software binaries archival project, but it requires software that does not exist yet. This would include Linux package repos and stuff like this (though maybe/probably not GitHub releases unless #gitgud's intended continuous archival never happens). |
| 20:34:55 | <@JAA> | It matters if the law goes to court and is struck down. |
| 20:34:59 | <icedice> | The EU Data Retention Directive was declared unconstitutional by the European Court of Justice in 2014 |
| 20:35:17 | <icedice> | A bunch of EU countries still require that ISPs keep logs |
| 20:35:42 | <icedice> | <JAA> It matters if the law goes to court and is struck down. |
| 20:35:45 | <icedice> | Then what? |
| 20:35:59 | <icedice> | Countries are already ignoring ECJ rulings |
| 20:36:11 | <ehmry> | icedice: EU countries can still be police states for matters of internal security, but they can't be mandated as such |
| 20:36:44 | <icedice> | If the countries implement Chat Control before ECJ outlaws it, it's here to stay |
| 20:37:30 | <ehmry> | right, in those countries |
| 20:38:09 | <icedice> | Sweden still has mandatory data retention for ISPs, for example |
| 20:39:30 | <ehmry> | Sweden is allowed to do that because the national security excuse, so the ECJ can't stop them |
| 20:39:32 | <that_lurker> | The whole EU directive was declared unconstitutional, but it didn't mean that countries would need to stop doing that |
| 20:39:54 | <fireonlive> | telegram is.... kinda E2E |
| 20:40:02 | <fireonlive> | if you turn it on per convo lol |
| 20:41:13 | | user343 joins |
| 20:41:53 | | user343 quits [Remote host closed the connection] |
| 20:42:07 | <ehmry> | if only some of the EU is going to ban encryption it's going to be impractical to enforce |
| 20:42:51 | <ehmry> | and Denmark can say they need encryption to protect themselves against the Swedes |
| 20:43:44 | <icedice> | Decentralized and peer-to-peer end-to-end encrypted messengers will make enforcement near impossible |
| 20:44:32 | <icedice> | <fireonlive> telegram is.... kinda E2E |
| 20:44:34 | <icedice> | Good meme |
| 20:44:35 | <icedice> | https://www.wired.com/story/the-kremlin-has-entered-the-chat/ |
| 20:44:39 | <ehmry> | so bans on "E2" platforms isn't such a bad prospect |
| 20:44:48 | <ehmry> | *E2E |
| 20:44:54 | <@JAA> | This has exited the realm of on-topic discussion. |
| 20:44:55 | <icedice> | It's bad |
| 20:45:05 | <@JAA> | Please take it to -ot. |
| 20:46:10 | | Iki1 quits [Ping timeout: 265 seconds] |
| 20:47:47 | | diggan joins |
| 21:00:53 | | vegbrasi_ joins |
| 21:01:09 | | Unholy2361 quits [Quit: The Lounge - https://thelounge.chat] |
| 21:01:30 | | Unholy2361 (Unholy2361) joins |
| 21:03:56 | | vegbrasil quits [Ping timeout: 252 seconds] |
| 21:05:35 | | vegbrasi_ quits [Ping timeout: 252 seconds] |
| 21:16:08 | | diggan quits [Ping timeout: 265 seconds] |
| 21:19:57 | | vegbrasil joins |
| 21:23:09 | | diggan joins |
| 21:29:04 | | vegbrasil quits [Ping timeout: 252 seconds] |
| 21:51:31 | | vegbrasil joins |
| 21:56:11 | | vegbrasil quits [Ping timeout: 252 seconds] |
| 22:00:06 | | Hajdar quits [Remote host closed the connection] |
| 22:00:23 | | Hajdar (Hajdar) joins |
| 22:05:37 | <@JAA> | School of Dragons https://www.schoolofdragons.com/ shutting down at the end of the month, looking into archiving their S3 bucket, see #// for previous discussion. |
| 22:06:01 | <@JAA> | Can someone take care of AB'ing the website and associated stuff? |
| 22:06:43 | <@JAA> | The bucket is https://s3.amazonaws.com/origin.ka.cdn/?prefix=DWADragonsUnity/ and there's a lot of dupes in it. Analysis ongoing, plan is to use either AB or qwarc. |
| 22:11:28 | <nicolas17> | I downloaded all of Android/1.11.0/, it's 3446 MiB, but jdupes says there's 1722 MiB of duplicates |
| 22:13:59 | <nicolas17> | ooh seems I can easily get md5 |
| 22:15:29 | | vegbrasil joins |
| 22:15:31 | | Abacus6427 joins |
| 22:17:34 | <nicolas17> | JAA: https://paste.debian.net/1282530/ |
| 22:17:41 | <@JAA> | Yes, the ETag is the object's MD5 hash. |
| 22:17:57 | <@JAA> | It's in the ListObjects response. |
| 22:18:10 | <nicolas17> | by "easily" I mean "rclone md5sum" and not have to mess with the API myself |
| 22:18:13 | <nicolas17> | :) |
| 22:18:33 | <@JAA> | Right, I have custom tooling to work with ListObjects directly and output JSONL. Then some jq magic and done. |
| 22:18:51 | <@JAA> | But first I need food. |
| 22:20:02 | | vegbrasil quits [Ping timeout: 252 seconds] |
| 22:21:41 | <that_lurker> | https://www.githubstatus.com/incidents/0csnqhwzxp1m |
| 22:22:00 | | vegbrasil joins |
| 22:22:13 | <that_lurker> | Github having issues again |
| 22:22:33 | <nicolas17> | copying from #// |
| 22:22:41 | <nicolas17> | the DWADragonsUnity directory has: "Total usage: 6.754T, Objects: 7757530" |
| 22:29:56 | | vegbrasil quits [Ping timeout: 252 seconds] |
| 22:37:28 | <tech234a> | YouTuber Dream has decided to un-face-reveal, and has begun slowly deleting photos/videos containing his face https://www.youtube.com/watch?v=uvebNBKOSTg |
| 22:38:13 | <nicolas17> | hope he heard of the streisand effect |
| 22:38:27 | | RDPdotSH joins |
| 22:43:37 | <BigBrain> | his face is a meme, his efforts are futile |
| 22:52:33 | <fireonlive> | hmmm. good luck with that |
| 22:53:34 | <fireonlive> | i totally understand the feeling though |
| 22:57:42 | <nicolas17> | after getting the md5 of 1.39M files, it stalled, guessing rate limiting |
| 22:58:12 | <nicolas17> | maybe rclone fell into a deep hole of exponential backoff even |
| 22:58:17 | <nicolas17> | but |
| 22:58:50 | <nicolas17> | JAA: Android directory, 89% checked (didn't finish yet), so far 1394900 files, 253652 unique files according to md5 |
| 23:00:32 | | RDPdotSH quits [Ping timeout: 265 seconds] |
| 23:19:09 | | BlueMaxima joins |
| 23:23:36 | <@JAA> | chrismeller: ArchiveBot might be an option. Once you have a list and any constraints on how it needs to be grabbed, we can discuss this much better. |
| 23:25:41 | <chrismeller> | Thanks, JAA. I should have the list of requests in the next day or so, then I'll start scraping the list of files as well. |
| 23:26:57 | <@JAA> | Sounds good. |
| 23:28:59 | <@JAA> | nicolas17: S3 seems to be a bit sad right now. I'm getting very slow responses, disconnects, etc. |
| 23:29:27 | <nicolas17> | yeah, I thought it was rate limiting |
| 23:29:32 | <nicolas17> | but there may be more to it |
| 23:30:06 | <@JAA> | Definitely not |
| 23:30:25 | <@JAA> | I'm seeing it from other IPs, too, which I haven't used for accessing S3 in any significant capacity recently. |
| 23:30:42 | <@JAA> | Also, AWS is more than happy to bill the bucket owner as much as they can. |
| 23:30:59 | <nicolas17> | I had it stall after 1.39M files were listed, on two different connections, that's why I thought maybe it was some per-IP limit I hit |
| 23:31:26 | <@JAA> | Yep, also at 1.39M here. |
| 23:31:37 | <nicolas17> | ... wow |
| 23:31:39 | <nicolas17> | on what prefix? |
| 23:31:49 | <@JAA> | No prefix |
| 23:31:55 | <@JAA> | I'm listing the whole bucket because why not. |
| 23:32:48 | <@JAA> | But there are only 16671 objects before DWADragonsUnity, so that doesn't really matter. |
| 23:38:11 | <nicolas17> | Total data: 6916.14 GiB in 7757546 files |
| 23:38:13 | <nicolas17> | Unique data: 3037.20 GiB in 1143186 files |
| 23:38:14 | | Abacus6427 quits [Ping timeout: 265 seconds] |
| 23:38:23 | <nicolas17> | that's in DWADragonsUnity |
| 23:44:49 | <@JAA> | Thanks, ok, qwarc it is. |
| 23:51:17 | | AmAnd0A quits [Ping timeout: 265 seconds] |
| 23:52:13 | | AmAnd0A joins |
| 23:52:39 | <nicolas17> | good god... I hope I'm messing up somewhere lol |
| 23:53:05 | <nicolas17> | looking only at files with .mp4 extension, there's 92 unique files and 48357 duplicates |
| 23:53:15 | | AmAnd0A quits [Read error: Connection reset by peer] |
| 23:53:56 | | AmAnd0A joins |
| 23:55:53 | <nicolas17> | 1247 MiB unique + 514697 MiB duplicate, only in mp4 |
| 23:56:31 | <@JAA> | Sounds very plausible. |
| 23:57:36 | <nicolas17> | .unity3d: "575925 MiB in 486754 files" unique + "309943 MiB in 1316232 files" duplicate |
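
The unique/duplicate totals quoted above come from grouping the bucket listing by MD5. A minimal sketch of that calculation follows, assuming the listing has already been exported as JSONL with one object per line and the fields `Key`, `Size`, and `ETag` (the names S3 uses in a ListObjects response). The JSONL layout and the function name `summarize_duplicates` are hypothetical; the tooling actually used in the channel is not shown in the log. Note that an S3 ETag is the object's MD5 only for objects that were not uploaded as multipart uploads, so this is an approximation wherever that does not hold.

```python
import json
from collections import defaultdict


def summarize_duplicates(jsonl_lines):
    """Group listed objects by ETag and total unique vs. duplicate data.

    Assumes (hypothetically) one JSON object per line with "Key", "Size",
    and "ETag" fields, and that equal ETags mean identical content.
    """
    size_by_etag = {}
    count_by_etag = defaultdict(int)

    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        # S3 returns the ETag wrapped in double quotes; strip them.
        etag = obj["ETag"].strip('"')
        size_by_etag[etag] = int(obj["Size"])
        count_by_etag[etag] += 1

    unique_files = len(count_by_etag)
    unique_bytes = sum(size_by_etag.values())
    total_files = sum(count_by_etag.values())
    total_bytes = sum(size_by_etag[e] * n for e, n in count_by_etag.items())

    return {
        "total_files": total_files,
        "total_bytes": total_bytes,
        "unique_files": unique_files,
        "unique_bytes": unique_bytes,
        # Every copy beyond the first of each ETag counts as a duplicate.
        "duplicate_files": total_files - unique_files,
        "duplicate_bytes": total_bytes - unique_bytes,
    }
```

To reproduce the per-extension figures (for example, only `.mp4`), filter the lines on the `Key` suffix before passing them in. Keeping one size per ETag is enough here because objects with the same MD5 are assumed to have the same length.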