00:24:30<h2ibot>JustAnotherArchivist edited Deathwatch (+222, /* 2025 */ Add Yahoo!パートナー (Yahoo! Partner)): https://wiki.archiveteam.org/?diff=53765&oldid=53763
00:33:09SootBector quits [Ping timeout: 240 seconds]
00:35:20SootBector (SootBector) joins
00:42:56etnguyen03 (etnguyen03) joins
00:53:01benjins quits [Read error: Connection reset by peer]
01:06:27etnguyen03 quits [Client Quit]
01:52:53etnguyen03 (etnguyen03) joins
02:05:17kiska quits [Quit: Ping timeout (120 seconds)]
02:06:21s-crypt quits [Quit: Ping timeout (120 seconds)]
02:07:15Flashfire42 quits [Quit: Ping timeout (120 seconds)]
02:14:27Ryz2 quits [Quit: Ping timeout (120 seconds)]
02:52:50Wohlstand quits [Ping timeout: 260 seconds]
02:53:28s-crypt (s-crypt) joins
02:56:10Flashfire42 joins
02:57:55kiska (kiska) joins
03:19:37seacow joins
03:28:45xDEADBEEF joins
03:30:45th3z0l4 quits [Ping timeout: 260 seconds]
03:32:47etnguyen03 quits [Client Quit]
03:38:10xDEADBEEF quits [Read error: Connection reset by peer]
03:38:20th3z0l4 joins
03:40:48etnguyen03 (etnguyen03) joins
03:46:35xDEADBEEF joins
03:47:37th3z0l4 quits [Read error: Connection reset by peer]
03:52:01seacow quits [Client Quit]
03:55:20etnguyen03 quits [Remote host closed the connection]
04:00:38animal_planet joins
04:19:01<animal_planet>Hi all, MangaZ, a popular manga site is going offline on November 26th. There were a couple of threads on Reddit about trying to archive it (URLs below). In one of those threads, someone recommended contacting you all.
04:19:02<animal_planet>I'm a newbie with limited resources and I don't know how much I can help. But if any of you have the bandwidth (mental or network), I think a lot of people would appreciate your help.
04:19:02<animal_planet>https://old.reddit.com/r/Archiveteam/comments/1gkk47s/manga_library_z_a_website_that_distributed_long/
04:19:03<animal_planet>https://old.reddit.com/r/DataHoarder/comments/1gms28u/update_on_mangaz_archiving_status/
04:20:07<@JAA>Hi, yeah, archival is ongoing, and #mangoes is the project channel.
04:21:06<animal_planet>Thank you! I'll check out #mangoes
04:27:25animal_planet quits [Client Quit]
04:28:12Commander001 quits [Ping timeout: 252 seconds]
04:28:46Commander001 joins
05:05:06Flashfire42 quits [Client Quit]
05:05:37s-crypt quits [Client Quit]
05:07:03kiska quits [Client Quit]
05:17:30Flashfire42 joins
05:21:48M60_ quits [Quit: Going offline, see ya! (www.adiirc.com)]
05:38:11Flashfire42 quits [Client Quit]
05:39:07Flashfire42 joins
05:52:03benjins2 quits [Read error: Connection reset by peer]
05:58:27<h2ibot>JustAnotherArchivist edited The WARC Ecosystem (+138, /* Tools */ warcio.js mangles data): https://wiki.archiveteam.org/?diff=53766&oldid=53471
05:59:42benjins3__ quits [Read error: Connection reset by peer]
06:08:15JayEmbee quits [Ping timeout: 260 seconds]
06:33:20katocala quits [Ping timeout: 260 seconds]
06:33:41JayEmbee (JayEmbee) joins
06:40:20Dango360_ quits [Ping timeout: 260 seconds]
06:40:57Dango360 (Dango360) joins
07:05:49Unholy2361924645377131 (Unholy2361) joins
07:08:20corentin quits [Ping timeout: 260 seconds]
07:11:05Unholy23619246453771315 (Unholy2361) joins
07:14:45Unholy2361924645377131 quits [Ping timeout: 260 seconds]
07:14:45Unholy23619246453771315 is now known as Unholy2361924645377131
07:26:30Guest54 quits [Quit: My MacBook has gone to sleep. ZZZzzz…]
07:30:14Dango360_ (Dango360) joins
07:33:33Dango360 quits [Ping timeout: 252 seconds]
07:33:55Commander001 quits [Read error: Connection reset by peer]
07:34:07Commander001 joins
08:01:22Wohlstand (Wohlstand) joins
08:19:30sarge quits [Ping timeout: 260 seconds]
08:20:06benjins3 joins
08:20:17mr_sarge (sarge) joins
08:25:55eth0ws quits [Ping timeout: 260 seconds]
08:26:05eth0ws joins
08:40:30eth0ws quits [Ping timeout: 260 seconds]
08:40:41eth0ws joins
08:45:19corentin joins
09:15:12szanni46 joins
09:15:17kiska (kiska) joins
09:20:10<szanni46>Hi folks. Got a seemingly urgent request. Not sure if somebody is on it already, but it seems like codeproject.com is shutting down. It's been offline the past couple of days already, but seemingly is back up today. While many articles are already to be found in archive.org, virtually none of the source code zip files are. They
09:20:11<szanni46>used to be behind a loginwall, but are seemingly scrapeable now. It seems like the archive.org scraper does not by default download the zip files. Could we start a job (if none already exists) to scrape the entirety of codeproject.com, in particular all zip files?
09:21:16<szanni46>No official announcement on the website, but there are some notices on reddit for example: https://www.reddit.com/r/cpp/comments/1g6y1l5/codeprojectcom_is_no_more/
09:25:42<szanni46>Does the archiveteam archiver default to downloading zips?
09:26:51<@OrIdow6>https://www.codeproject.com/info/Changes.aspx first appears in the WBM Nov 4
09:27:48<@OrIdow6>It's not otherwise dated but claims they'd implement a freeze "shortly"
09:27:55<@OrIdow6>Ah, the Redditors discuss it
09:28:11<szanni46>Good find. Is any archiving job already running?
09:29:03<szanni46>And would that scrape the zip files too? No idea how long the read only mode would last. I could not access the site for quite a bit there.
09:29:47szanni joins
09:30:32<c3manu>i think a recursive crawl would, yeah. seems to be just regular links
09:32:19<@OrIdow6>szanni46 / szanni: You seem to be familiar with the site, is it normal that nothing is listed under eg https://www.codeproject.com/script/Answers/List.aspx?tab=toprated&alltags=true&tags=916 ?
09:33:13<@OrIdow6>c3manu: Agree, though the Reddit post mentions "16 million user accounts", might get too big for AB
09:33:23<szanni>I've honestly never used the forums. I only read the published articles
09:34:02<c3manu>wait, is the forum on the same domain?
09:35:18<@JAA>I was about to ask, I just see a link to GitHub discussions.
09:35:39<@OrIdow6>Wikipedia links to http://www.codeproject.com/script/Forums/List.aspx
09:35:43<@OrIdow6>Which is empty
09:35:57<szanni>It says up top: 65,938 articles
09:36:10<@OrIdow6>Seems Forums, Answers, and Articles are separate sections, and only Articles are populated right now
09:36:14<szanni>Trying to find an article sitemap
09:37:15<@JAA>I see a stylesheet switcher at the bottom. That'd need care in AB.
09:37:33bilboed quits [Quit: The Lounge - https://thelounge.chat]
09:37:55bilboed joins
09:38:12<@JAA>robots.txt links to SiteMap.xml, but that's also empty.
09:38:34loug8318142 joins
09:38:50Wohlstand quits [Ping timeout: 260 seconds]
09:39:25<@JAA>There's no date on https://www.codeproject.com/info/Changes.aspx but there's this meta tag: <meta name="Description" content="For those who code; Updated: 15 Oct 2024">
09:41:00<@OrIdow6>Old and useless not-actually-sitemap at https://web.archive.org/web/20220429034114/https://www.codeproject.com/script/Content/SiteMap.aspx
09:41:12<@OrIdow6>JAA: Good find
09:42:43<@OrIdow6>Curious as to whether we'll see any #// captures of that when the WBM starts ingesting again
09:43:27<@JAA>(And more importantly, when that data is actually on its way to IA.)
09:43:52<@JAA>Article IDs are somewhat sequential but with large gaps.
09:44:43<@JAA>We can just throw it at AB and see what it gets.
09:44:50<@JAA>It might discover quite a lot via https://www.codeproject.com/script/Content/TagList.aspx
09:44:53<@arkiver>is AB enough for codeproject.com?
09:45:27<@arkiver>or do we need an emergency project?
09:45:47<@JAA>Ah, the links there go to the Answers section, not Articles.
09:46:05<@OrIdow6>arkiver: I think AB will do for now?
09:46:09<@arkiver>their sitemap is empty
09:46:17<@JAA>I like the &amp; in links.
09:46:35<@JAA>Starting an AB job in a second.
09:46:42<szanni46>Maybe the sitemap is blocked off? Visit-Time: 1200-1700 # Only visit between 5pm and 10pm US EST
09:46:43<szanni46>Sitemap: https://www.codeproject.com/SiteMap.xml
09:46:59<szanni46>At least that's what robots.txt has to say
09:47:03<@OrIdow6>arkiver: The site's owners seem to have tried to content-freeze it, but (as of the present) in the process have removed the majority of pages, so it's smaller than it was at its prime
09:47:21<@arkiver>pff :/
09:47:23<@arkiver>https://web.archive.org/web/20080115063536id_/http://www.codeproject.com/sitemap.xml
09:48:03<@JAA>AB is running now, we'll see what it manages.
09:48:15<@JAA>The articles could be enumerated.
09:48:45<@arkiver>JAA: i wonder if it could be done relatively fast with qwarc
09:51:58<@JAA>arkiver: Assuming their server holds up and there are no silly rate limits, yeah. Not right now though as I'm too tired.
09:52:16<@JAA>The AB job is going through tags and finding a lot already.
09:52:23<@arkiver>alright
09:52:30<@arkiver>it should cover the vast majority
09:52:51<szanni>Beautiful. I hope it works out.
09:53:00<@JAA>Specifically, tags on the homepage and then on articles with those tags etc.
09:53:05<@JAA>Since the list of tags is useless.
09:54:05<@arkiver>i'll send them a quick email unless OrIdow6 or JAA is already doing that?
09:54:27<@arkiver>found an email on https://www.codeproject.com/info/privacy.aspx
09:54:43<@arkiver>though it's "webmaster@codeproject.com@codeproject.com" :P
09:54:44<@JAA>Ah yes, webmaster@codeproject.com@codeproject.com
09:54:48<@JAA>lol
09:55:10@JAA isn't already doing that.
09:55:17<@OrIdow6>arkiver: Good idea, and nope, I haven't done so either
09:55:22<@OrIdow6>Hahah
09:55:23<@arkiver>alright!
09:57:29<@OrIdow6>Looking it up apparently a second @ can actually appear in an email, but per some RFC that in all likelihood nobody actually follows, that means the part before the domain needs to be in quotes
09:58:47<kpcyrd>"webmaster@codeproject.com"@codeproject.com is a valid address
10:04:53<@OrIdow6>szanni: Thanks for bringing this to our attention
10:08:56<szanni>Pleasure. Thank you for your immediate response!
10:10:37szanni46 quits [Client Quit]
10:17:13<h2ibot>OrIdow6 edited Deathwatch (+258, /* 2024 */ Codeproject temporarily went down…): https://wiki.archiveteam.org/?diff=53767&oldid=53765
10:29:11<@arkiver>OrIdow6: adding you as well in CC on the email
10:29:48<@arkiver>i'll wait 30 minutes for you to confirm that is okay
10:30:10<@arkiver>!remindme 1h codeproject.com email OrIdow6
10:30:11<eggdrop>[remind] ok, i'll remind you at 2024-11-13T11:30:10Z
10:41:01<@OrIdow6>arkiver: Alright, thx
10:43:12<@arkiver>JAA: OrIdow6: sent
10:43:59<@JAA>Thanks :-)
10:55:23sralracer joins
11:19:47benjins3 quits [Read error: Connection reset by peer]
11:20:25benjins3 joins
11:22:54wickedplayer494 quits [Ping timeout: 252 seconds]
11:23:32wickedplayer494 joins
11:30:10<eggdrop>[remind] arkiver: codeproject.com email OrIdow6
11:31:25<pabs>arkiver JAA - would be nice to get the code into Software Heritage if there is some way all the data could be sent to them
11:41:53<@arkiver>pabs: i will suggest that to them if they get back to the first email
11:42:11<@arkiver>my experience is that the more requests we put in our initial email, the less likely we receive a response
11:43:12<pabs>thanks
11:43:21<@arkiver>so my initial email is usually something along the lines of "[...] Would you like to work with us on this? [...]" (or similar), instead of "[...] So, we need x, y, and z, can you help us with that? [...]"
11:47:56<@OrIdow6>https://d2emerge.com/2024/11/12/d2-emerge-acquires-codeproject-expanding-reach-into-the-software-development-community-2/
11:49:52<@OrIdow6>:(
12:00:06Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat]
12:02:49Bleo182600722719623 joins
12:04:33ducky quits [Ping timeout: 260 seconds]
12:06:51ducky (ducky) joins
12:20:39wickedplayer494 quits [Ping timeout: 252 seconds]
12:21:43wickedplayer494 joins
12:28:00Commander001 quits [Ping timeout: 260 seconds]
12:28:30Commander001 joins
12:31:07decky_e_ quits [Read error: Connection reset by peer]
12:46:49fuzzy8021 quits [Read error: Connection reset by peer]
12:47:19fuzzy80211 (fuzzy80211) joins
12:54:00SkilledAlpaca41896 quits [Quit: SkilledAlpaca41896]
12:55:54SkilledAlpaca41896 joins
13:09:10kiska52 quits [Quit: Ping timeout (120 seconds)]
13:09:41Ryz quits [Quit: Ping timeout (120 seconds)]
13:10:16kiska52 joins
13:11:02Ryz (Ryz) joins
13:15:10<@arkiver>OrIdow6: there's money to be made i guess...
13:15:24<@arkiver>it was always a .com
13:19:23katocala joins
13:45:21benjins2 joins
14:03:11simon8162 (simon816) joins
14:03:31asie quits [Ping timeout: 255 seconds]
14:03:40simon816 quits [Ping timeout: 260 seconds]
14:03:58Dj-Wawa quits [Ping timeout: 255 seconds]
14:05:07asie joins
14:05:10Dj-Wawa joins
14:11:08Guest54 joins
14:45:18linuxgemini quits [Ping timeout: 252 seconds]
15:17:05M60_ joins
15:25:26MrMcNuggets (MrMcNuggets) joins
15:34:05katocala quits [Ping timeout: 260 seconds]
15:34:52katocala joins
15:48:00katocala quits [Ping timeout: 252 seconds]
15:48:12katocala joins
16:00:44szanni quits [Client Quit]
16:19:22AlsoHP_Archivist quits [Quit: Leaving]
17:19:11Commander001 quits [Read error: Connection reset by peer]
17:19:23Commander001 joins
17:21:24MrMcNuggets quits [Client Quit]
17:36:35lflare quits [Ping timeout: 260 seconds]
17:57:55<nicolas17>hot damn codeproject
18:03:18<steering>been a while since i heard of them
18:04:33lflare (lflare) joins
18:07:15fleppi joins
18:40:12xDEADBEEF is now known as th3z0l4
18:42:02<th3z0l4>hey, I've been trying for a while without success: I have an uncapped data connection, but I can only run a warrior container (docker) with 6 tasks. is there a way to run with more tasks?
18:48:18<@JAA>(Answered the #warrior crosspost.)
19:06:32ducky_ (ducky) joins
19:07:50ducky quits [Read error: Connection reset by peer]
19:07:50ducky_ is now known as ducky
19:08:34fleppi quits [Client Quit]
19:17:59<h2ibot>Posquito edited URLTeam (+276, add thd.co): https://wiki.archiveteam.org/?diff=53768&oldid=53617
19:31:40<qwertyasdfuiopghjkl>Amazon is shutting down Freevee "over the coming weeks", not sure whether it has/had a distinct website anywhere: https://www.theverge.com/2024/11/12/24295129/amazon-shutting-down-freevee-prime-video
19:43:00BornOn420_ quits [Remote host closed the connection]
19:43:33BornOn420 (BornOn420) joins
19:51:03katocala quits [Remote host closed the connection]
20:02:08<h2ibot>Cooljeanius edited Deathwatch (+9, /* Dead as a Doornail */ typo fixes; use URL…): https://wiki.archiveteam.org/?diff=53769&oldid=53767
20:13:28katocala joins
20:45:43BlueMaxima joins
20:51:08<pokechu22>https://about.grubhub.com/news/wonder-announces-acquisition-of-grubhub/
20:54:11<pokechu22>https://www.wonder.com/ is cloudflare hell unfortunately
21:08:05<@JAA>Looks like I'm getting through with grab-site.
21:08:25<katia>JAA you have to say 'im in'
21:09:06<@JAA>I'm in. Well, that's easy!
21:09:14<katia>sorry
21:32:03Dango360_ quits [Read error: Connection reset by peer]
21:40:26Dango360 (Dango360) joins
21:50:20tek_dmn quits [Ping timeout: 260 seconds]
21:50:35etnguyen03 (etnguyen03) joins
22:38:28tek_dmn (tek_dmn) joins
22:42:07sralracer quits [Client Quit]
23:36:33yasomi is now known as yasomimi
23:41:08yasomi (yasomi) joins