00:30:14Arcorann (Arcorann) joins
01:00:01dm4v quits [Client Quit]
01:06:18dm4v joins
01:06:20dm4v quits [Changing host]
01:06:20dm4v (dm4v) joins
02:04:07dm4v quits [Ping timeout: 252 seconds]
02:04:16dm4v_ joins
02:04:42dm4v_ is now known as dm4v
02:04:42dm4v quits [Changing host]
02:04:42dm4v (dm4v) joins
02:17:31Atom-- joins
02:19:41Atom quits [Ping timeout: 265 seconds]
03:14:01fuzzy8021 quits [Read error: Connection reset by peer]
03:15:02fuzzy8021 (fuzzy8021) joins
03:15:37@dxrt quits [Ping timeout: 252 seconds]
03:15:55dxrt joins
03:15:57dxrt quits [Changing host]
03:15:57dxrt (dxrt) joins
03:15:57@ChanServ sets mode: +o dxrt
03:18:10ThreeHM quits [Ping timeout: 265 seconds]
03:19:58ThreeHM (ThreeHeadedMonkey) joins
03:24:30shoghicp quits [Read error: Connection reset by peer]
03:24:48shoghicp (shoghicp) joins
03:34:17mutantmonkey quits [Remote host closed the connection]
03:34:23mutantmnky (mutantmonkey) joins
03:44:48sonick quits [Client Quit]
03:48:17fuzzy8021 quits [Read error: Connection reset by peer]
03:49:24fuzzy8021 (fuzzy8021) joins
03:55:52katocala quits [Ping timeout: 265 seconds]
03:56:03katocala joins
04:00:43katocala quits [Ping timeout: 252 seconds]
04:01:00katocala joins
04:07:37fuzzy8021 quits [Read error: Connection reset by peer]
04:08:30fuzzy8021 (fuzzy8021) joins
04:10:04katocala quits [Ping timeout: 252 seconds]
04:10:14katocala joins
04:18:45qw3rty_ joins
04:22:27qw3rty__ quits [Ping timeout: 265 seconds]
04:25:43fuzzy8021 quits [Read error: Connection reset by peer]
04:27:40fuzzy8021 (fuzzy8021) joins
04:27:46HP_Archivist quits [Ping timeout: 265 seconds]
04:40:53fuzzy8021 quits [Read error: Connection reset by peer]
04:41:45fuzzy8021 (fuzzy8021) joins
04:53:23katocala quits [Ping timeout: 265 seconds]
04:53:45katocala joins
05:01:58fuzzy8021 quits [Read error: Connection reset by peer]
05:02:25fuzzy8021 (fuzzy8021) joins
05:04:44Jonboy345 joins
05:07:53Jonboy3451 quits [Ping timeout: 265 seconds]
05:12:21qwertyasdfuiopghjkl quits [Client Quit]
05:19:58russss quits [Ping timeout: 265 seconds]
05:20:56Ctrl-S quits [Ping timeout: 265 seconds]
05:20:56Dallas quits [Ping timeout: 265 seconds]
05:20:56mgrandi quits [Ping timeout: 265 seconds]
05:21:02Ctrl-S joins
05:21:26jrwr__ joins
05:22:24Dallas (Dallas) joins
05:22:25mgrandi (mgrandi) joins
05:22:41russss (russss) joins
05:23:51jrwr_ quits [Ping timeout: 622 seconds]
05:23:51jrwr__ is now known as jrwr_
06:19:59ranr joins
06:37:32ranr quits [Remote host closed the connection]
07:24:07BlueMaxima quits [Read error: Connection reset by peer]
07:32:50ghuntley joins
07:33:23<ghuntley>Hey folks, how can we kick off an official project to archive Microsoft Channel 9 before it goes offline?
07:33:37<ghuntley>FYI youtube-dl supports downloading the videos
07:33:45<ghuntley>How can I help?
07:36:11<ghuntley>https://www.zdnet.com/article/microsoft-is-folding-channel-9-into-its-learn-portal/
07:36:29<ghuntley>> Most of the video content published on or after November 1, 2017, will be automatically migrated
07:36:50<ghuntley>Basically, content before 2017 (all the valuable stuff with engineers from key compsci people) is at risk.
07:38:31<ghuntley>Content after 2017 is questionable quality when compared to the timeless stuff from 2009 (https://channel9.msdn.com/Shows/Going+Deep/Expert-to-Expert-Brian-Beckman-and-Erik-Meijer-Inside-the-NET-Reactive-Framework-Rx)
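For reference, a minimal youtube-dl invocation for one of the pages linked above would be a one-liner like this (a sketch; it assumes youtube-dl's Channel 9 extractor still matches these URLs):
    youtube-dl 'https://channel9.msdn.com/Shows/Going+Deep/Expert-to-Expert-Brian-Beckman-and-Erik-Meijer-Inside-the-NET-Reactive-Framework-Rx'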
07:43:38fuzzy8021 quits [Read error: Connection reset by peer]
07:46:01fuzzy8021 (fuzzy8021) joins
08:15:10<ghuntley>In theory, is the approach to fork https://github.com/ArchiveTeam/youtube-grab and specialise it?
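If that route is taken, the usual first step would presumably be cloning the repo and adapting its pipeline and Lua script to Channel 9's URL structure (the target directory name here is hypothetical):
    git clone https://github.com/ArchiveTeam/youtube-grab channel9-grab
    cd channel9-grab
    # edit pipeline.py and the Lua script to walk channel9.msdn.com items instead of YouTube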
08:16:21RJHacker96492 quits [Ping timeout: 258 seconds]
08:30:12nepeat joins
08:31:09nepeat is now known as RJHacker88623
08:57:17<AK>Hi ghuntley, just woken up and seen your tweets. You're correct in that the most likely method is going to be a customised version of youtube-grab
08:58:17<AK>Just taking a look through the logs and I think JAA did an AB job of https://channel9.msdn.com/ back in 2020
09:04:22<ghuntley>I’ve got 2 TB of space on my bare-metal machine and have started a grab-site job. Any idea how big that previous job was? I don’t think 2 TB will be enough due to all the video content.
09:06:08<AK>Hmm, looks like there were two jobs: 775489.25 MiB and 4453392.02 MiB
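(For scale, reading those as binary units: 775489.25 MiB ≈ 757 GiB ≈ 0.74 TiB, and 4453392.02 MiB ≈ 4349 GiB ≈ 4.25 TiB, so roughly 5 TiB combined, well beyond the 2 TB mentioned above.)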
09:06:14<AK>So pretty large haha
09:06:39<AK>I think this is going to be too big for an AB job again this time
09:06:55<ghuntley>AB?
09:08:32<AK>ArchiveBot, dedicated grab machines essentially: https://wiki.archiveteam.org/index.php/ArchiveBot
09:08:52<ghuntley>My Lua skills are non-existent (been a long time since my World of Warcraft days), but happy to help out where I can with promotion and whatever.
09:09:03<AK>Work great for smaller sites, or those without a deadline, but as it's a single machine (for each job) it'll probably take too long this time
09:09:38<AK>I'm too young to have ever learnt lua, I just try to help out with server capacity mainly
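For context, ArchiveBot jobs are started from its IRC control channel; from memory, the commands look roughly like this (treat the exact syntax as an assumption and check the wiki page linked above):
    !a https://channel9.msdn.com/    (recursive crawl of the whole site)
    !ao https://channel9.msdn.com/   (archive only the given page)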
09:10:11<ghuntley>Times like this I’m sad I quit the Microsoft MVP program; I would have been able to use $13,000 USD in credits to help here.
09:11:14<ghuntley>Okay, so if AB isn’t going to help, that means we either need a machine with lots of space + grab-site, or we do a warrior project?
09:12:41<AK>I think it's probably going to be a warrior project yeah
09:16:20<h2ibot>AK edited Deathwatch (+204, Add Channel9): https://wiki.archiveteam.org/?diff=47753&oldid=47686
09:52:37<ghuntley>So we’ve got circa 25 days to archive it all
09:53:22h3ndr1k quits [Quit: ]
10:01:36h3ndr1k (h3ndr1k) joins
10:03:27driib7 quits [Client Quit]
10:03:46driib7 (driib) joins
11:06:02mutantmnky quits [Remote host closed the connection]
11:06:48mutantmnky (mutantmonkey) joins
11:07:10sonick (sonick) joins
12:19:30Arcorann quits [Ping timeout: 265 seconds]
13:31:41<@arkiver>if those archivebot jobs were able to get the videos, it may be just enough to run it through archivebot again
13:34:57<monika>https://nopy.to/, which is a file host seemingly mostly used to host (pirated?) adult games, went down 2 days ago due to payment processor issues
13:35:05<monika>https://old.reddit.com/r/Piracy/comments/qmgjgl/nopyto_is_going_death/
13:35:11<AK>Alright, I'll start a run on ak-was-here and we can see how it goes
13:39:11<@arkiver>thanks AK
13:45:00Gereon62 quits [Read error: Connection reset by peer]
13:45:12Gereon62 (Gereon) joins
13:46:01HP_Archivist (HP_Archivist) joins
13:59:49wizards quits [Ping timeout: 258 seconds]
14:01:34wizards joins
14:11:52<@arkiver>the archivebot job seems to be getting mp4s
14:15:59wizards quits [Ping timeout: 265 seconds]
14:17:27sec^nd quits [Ping timeout: 258 seconds]
14:17:46wizards joins
14:20:27sec^nd (second) joins
15:21:06IDK quits [Quit: Connection closed for inactivity]
15:40:05tzt quits [Ping timeout: 265 seconds]
16:38:08TheTechRobo quits [Ping timeout: 258 seconds]
16:50:06TheTechRobo (TheTechRobo) joins
17:18:27lunik1 quits [Quit: :x]
17:36:32tzt (tzt) joins
17:54:59lunik1 joins
18:01:26qwertyasdfuiopghjkl joins
18:10:24sec^nd quits [Remote host closed the connection]
18:10:50sec^nd (second) joins
18:48:41qwertyasdfuiopghjkl93 joins
18:50:18qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds]
18:50:52qwertyasdfuiopghjkl93 is now known as qwertyasdfuiopghjkl
18:55:09sonick quits [Client Quit]
19:47:07wizards quits [Ping timeout: 258 seconds]
19:48:58wizards joins
19:56:19wizards quits [Ping timeout: 258 seconds]
19:58:10wizards joins
19:58:23<h2ibot>Hyperrobbe edited List of websites excluded from the Wayback Machine (+32, www.systutorials.com – a Linux command…): https://wiki.archiveteam.org/?diff=47754&oldid=47731
20:00:09TheTechRobo quits [Ping timeout: 258 seconds]
20:00:23<h2ibot>JAABot edited List of websites excluded from the Wayback Machine (+0): https://wiki.archiveteam.org/?diff=47755&oldid=47754
20:05:29TheTechRobo (TheTechRobo) joins
20:07:07<ghuntley>Morning folks. Okay, we need to kick off a job to archive a Bugzilla. Just got a tip-off that https://bugzilla.xamarin.com/ will be turned off in less than 30 hours. It contains the history of the development of Mono.
20:07:53<ghuntley>@arkiver: ^^
20:30:21Atom__ joins
20:33:53Atom-- quits [Ping timeout: 258 seconds]
20:50:04<ThreeHM>Interesting site; it went read-only in 2019 and has been converted into static HTML. Old links to the original Bugzilla pages are redirected through JS in their 404 page (https://bugzilla.xamarin.com/404.html). There is a copy on GitHub (https://github.com/xamarin/bugzilla-archives), but it seems to be missing attachments.
20:58:07<@OrIdow6>ThreeHM: I am fairly sure that the subdomain is just GH pages for that repo
20:58:58<ghuntley>“Just a heads up, https://github.com/xamarin/bugzilla-archives
20:58:58<ghuntley>The Xamarin Bugzilla archives (http://bugzilla.xamarin.com) are gonna be taken down soon, probably as early as Monday. Apparently, there are some tokens users accidentally posted within the archives from 2013, and security flagged the repo. Instead of clearing it, the Release Engineering team wants to delete it since it could have more tokens or other personal info.”
21:00:40<ThreeHM>OrIdow6: Yeah, seems to be. The DNS record points to xamarin.github.io.
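That is easy to confirm from a shell with standard tooling:
    dig +short bugzilla.xamarin.com CNAME
    (expected output, per the observation above: xamarin.github.io.)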
21:00:47<@OrIdow6>ghuntley: You said "less than 30 hours", are you just inferring that from "as early as Monday"? And is the source of this text public?
21:04:59<@OrIdow6>A bit on attachments here https://github.com/xamarin/bugzilla-archives/issues/11
21:10:17BlueMaxima joins
21:11:10<ghuntley>Source of the text is not public and I cannot reveal my sources apart from saying it’s from folks within Microsoft.
21:11:53<ghuntley>By “less than 30 hours”, I’m inferring from “as early as Monday”, so it could be sooner.
21:18:32qwertyasdfuiopghjkl95 joins
21:21:10qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds]
21:30:34<ghuntley>Can we kick off an AB job that does https://bugzilla.xamarin.com/ plus all links to https://xamarinbugzillaarchives.blob.core.windows.net
21:32:18sonick (sonick) joins
21:37:14<@OrIdow6>It sounds like there is not a way to avoid grabbing "tokens or other personal info"
21:37:20<@JAA>Running now
21:39:34<@OrIdow6>The attachment links seem to go through a JS redirect
21:40:01<@OrIdow6>But since this site is served from a public git repo, it should be easy to make a list
21:41:03<@JAA>Have an example with an attachment?
21:42:10<@JAA>Nevermind, found one: https://bugzilla.xamarin.com/55/55721/bug.html
21:50:11<ThreeHM>We might also want to grab the original bugzilla URLs (https://bugzilla.xamarin.com/show_bug.cgi?id=XYZ ; https://bugzilla.xamarin.com/show_activity.cgi?id=XYZ) for all bug IDs to keep external links intact. Those are also redirected through JS.
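Generating that URL list is straightforward; a sketch, where the upper bound of 60000 is an assumption that would need checking (bug 55721 exists per the link below, so IDs run at least that high):
    seq 1 60000 | sed 's,^,https://bugzilla.xamarin.com/show_bug.cgi?id=,' > show_bug.txt
    seq 1 60000 | sed 's,^,https://bugzilla.xamarin.com/show_activity.cgi?id=,' > show_activity.txt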
21:52:23<@JAA>Attachments are running now. For reference, this is how I generated the list: git grep -F '/attachment.cgi' | grep -Po 'href="\K[^"]+' | sed 's,&amp;,\&,g' | ~/little-things/uniqify | grep -Po '^https://bugzilla\.xamarin\.com/attachment\.cgi\?id=\K\d+&file=[^&]+$' | sed 's,&file=, ,' | awk '$1 < 10 { print "0/" $1 "/" $2; } $1 >= 10 { print substr($1, 1, 2) "/" $1 "/" $2; }' | sed 's,^,https://xamarinbugzillaarchives.blob.core.windows.net/attachments/,'
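In other words, the pipeline pulls every attachment.cgi href out of the repo, decodes &amp;, de-duplicates, and rewrites each id/file pair onto the blob store's sharded layout, where the directory is the first two digits of the ID (or 0 for single-digit IDs). An example of the mapping (the filename here is hypothetical):
    https://bugzilla.xamarin.com/attachment.cgi?id=55721&file=log.txt
    -> https://xamarinbugzillaarchives.blob.core.windows.net/attachments/55/55721/log.txt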
21:57:43<@JAA>ThreeHM: Yup, running as well now.
22:24:48qwertyasdfuiopghjkl joins
22:26:47qwertyasdfuiopghjkl95 quits [Ping timeout: 244 seconds]
22:32:25<@JAA>Also dumped all their GitHub repos as bundles because why not: https://archive.org/details/github.com_xamarin_bundles_20211106 (still uploading)
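Presumably something along these lines per repository (a sketch, not necessarily the exact commands used):
    git clone --mirror https://github.com/xamarin/bugzilla-archives.git
    cd bugzilla-archives.git
    git bundle create ../bugzilla-archives.bundle --all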
22:59:23<@JAA>Attachments should all be covered. Had to handle three separately because they fucked up the filename conversion (+ was converted to space), all others seem to have worked.
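That failure mode is expected: in form-encoded query strings, '+' is the encoding for a space, so any decoder that treats the file= value as form data turns a literal plus in a filename into a space. For example, file=foo+bar.png (a hypothetical name) decodes to 'foo bar.png' unless the plus is percent-encoded as %2B.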
23:02:43AlsoHP_Archivist joins
23:06:27HP_Archivist quits [Ping timeout: 258 seconds]
23:56:16<ghuntley>Thanks so much