00:24:30 | <h2ibot> | JustAnotherArchivist edited Deathwatch (+222, /* 2025 */ Add Yahoo!パートナー (Yahoo! Partner)): https://wiki.archiveteam.org/?diff=53765&oldid=53763 |
00:33:09 | | SootBector quits [Ping timeout: 240 seconds] |
00:35:20 | | SootBector (SootBector) joins |
00:42:56 | | etnguyen03 (etnguyen03) joins |
00:53:01 | | benjins quits [Read error: Connection reset by peer] |
01:06:27 | | etnguyen03 quits [Client Quit] |
01:52:53 | | etnguyen03 (etnguyen03) joins |
02:05:17 | | kiska quits [Quit: Ping timeout (120 seconds)] |
02:06:21 | | s-crypt quits [Quit: Ping timeout (120 seconds)] |
02:07:15 | | Flashfire42 quits [Quit: Ping timeout (120 seconds)] |
02:14:27 | | Ryz2 quits [Quit: Ping timeout (120 seconds)] |
02:52:50 | | Wohlstand quits [Ping timeout: 260 seconds] |
02:53:28 | | s-crypt (s-crypt) joins |
02:56:10 | | Flashfire42 joins |
02:57:55 | | kiska (kiska) joins |
03:01:28 | | Flashfire42 is now authenticated as flashfire42 |
03:19:37 | | seacow joins |
03:28:45 | | xDEADBEEF joins |
03:30:45 | | th3z0l4 quits [Ping timeout: 260 seconds] |
03:32:47 | | etnguyen03 quits [Client Quit] |
03:38:10 | | xDEADBEEF quits [Read error: Connection reset by peer] |
03:38:20 | | th3z0l4 joins |
03:40:48 | | etnguyen03 (etnguyen03) joins |
03:46:35 | | xDEADBEEF joins |
03:47:37 | | th3z0l4 quits [Read error: Connection reset by peer] |
03:52:01 | | seacow quits [Client Quit] |
03:55:20 | | etnguyen03 quits [Remote host closed the connection] |
04:00:38 | | animal_planet joins |
04:19:01 | <animal_planet> | Hi all, MangaZ, a popular manga site is going offline on November 26th. There were a couple of threads on Reddit about trying to archive it (URLs below). In one of those threads, someone recommended contacting you all. |
04:19:02 | <animal_planet> | I'm a newbie with limited resources and I don't know how much I can help. But if any of you have the bandwidth (mental or network), I think a lot of people would appreciate your help. |
04:19:02 | <animal_planet> | https://old.reddit.com/r/Archiveteam/comments/1gkk47s/manga_library_z_a_website_that_distributed_long/ |
04:19:03 | <animal_planet> | https://old.reddit.com/r/DataHoarder/comments/1gms28u/update_on_mangaz_archiving_status/ |
04:20:07 | <@JAA> | Hi, yeah, archival is ongoing, and #mangoes is the project channel. |
04:21:06 | <animal_planet> | Thank you! I'll check out #mangoes |
04:27:25 | | animal_planet quits [Client Quit] |
04:28:12 | | Commander001 quits [Ping timeout: 252 seconds] |
04:28:46 | | Commander001 joins |
05:05:06 | | Flashfire42 quits [Client Quit] |
05:05:37 | | s-crypt quits [Client Quit] |
05:07:03 | | kiska quits [Client Quit] |
05:17:30 | | Flashfire42 joins |
05:21:48 | | M60_ quits [Quit: Going offline, see ya! (www.adiirc.com)] |
05:38:11 | | Flashfire42 quits [Client Quit] |
05:39:07 | | Flashfire42 joins |
05:52:03 | | benjins2 quits [Read error: Connection reset by peer] |
05:58:27 | <h2ibot> | JustAnotherArchivist edited The WARC Ecosystem (+138, /* Tools */ warcio.js mangles data): https://wiki.archiveteam.org/?diff=53766&oldid=53471 |
05:59:42 | | benjins3__ quits [Read error: Connection reset by peer] |
06:08:15 | | JayEmbee quits [Ping timeout: 260 seconds] |
06:28:47 | | Flashfire42 is now authenticated as flashfire42 |
06:33:20 | | katocala quits [Ping timeout: 260 seconds] |
06:33:41 | | JayEmbee (JayEmbee) joins |
06:40:20 | | Dango360_ quits [Ping timeout: 260 seconds] |
06:40:57 | | Dango360 (Dango360) joins |
07:05:49 | | Unholy2361924645377131 (Unholy2361) joins |
07:08:20 | | corentin quits [Ping timeout: 260 seconds] |
07:11:05 | | Unholy23619246453771315 (Unholy2361) joins |
07:14:45 | | Unholy2361924645377131 quits [Ping timeout: 260 seconds] |
07:14:45 | | Unholy23619246453771315 is now known as Unholy2361924645377131 |
07:26:30 | | Guest54 quits [Quit: My MacBook has gone to sleep. ZZZzzz…] |
07:30:14 | | Dango360_ (Dango360) joins |
07:33:33 | | Dango360 quits [Ping timeout: 252 seconds] |
07:33:55 | | Commander001 quits [Read error: Connection reset by peer] |
07:34:07 | | Commander001 joins |
08:01:22 | | Wohlstand (Wohlstand) joins |
08:19:30 | | sarge quits [Ping timeout: 260 seconds] |
08:20:06 | | benjins3 joins |
08:20:17 | | mr_sarge (sarge) joins |
08:25:55 | | eth0ws quits [Ping timeout: 260 seconds] |
08:26:05 | | eth0ws joins |
08:40:30 | | eth0ws quits [Ping timeout: 260 seconds] |
08:40:41 | | eth0ws joins |
08:45:19 | | corentin joins |
09:15:12 | | szanni46 joins |
09:15:17 | | kiska (kiska) joins |
09:20:10 | <szanni46> | Hi folks. Got a seemingly rather urgent request. Not sure if somebody is on it already, but it seems like codeproject.com is shutting down. It's been offline the past couple of days already, but seemingly is back up today. While many articles are already to be found in archive.org, virtually none of the source code zip files are. They |
09:20:11 | <szanni46> | used to be behind a loginwall, but are seemingly scrapeable now. It seems like the archive.org scraper does not by default download the zip files. Could we start a job (if none already exists) to scrape the entirety of codeproject.com, in particular all zip files? |
09:21:16 | <szanni46> | No official announcement on the website, but there are some notices on reddit for example: https://www.reddit.com/r/cpp/comments/1g6y1l5/codeprojectcom_is_no_more/ |
09:25:42 | <szanni46> | Does the archiveteam archiver default to downloading zips? |
09:26:51 | <@OrIdow6> | https://www.codeproject.com/info/Changes.aspx first appears in the WBM Nov 4 |
09:27:48 | <@OrIdow6> | It's not otherwise dated but claims they'd implement a freeze "shortly" |
09:27:55 | <@OrIdow6> | Ah, the Redditors discuss it |
09:28:11 | <szanni46> | Good find. Is any archiving job already running? |
09:29:03 | <szanni46> | And would that scrape the zip files too? No idea how long the read only mode would last. I could not access the site for quite a bit there. |
09:29:47 | | szanni joins |
09:30:32 | <c3manu> | i think a recursive crawl would, yeah. seems to be just regular links |
09:32:19 | <@OrIdow6> | szanni46 / szanni: You seem to be familiar with the site, is it normal that nothing is listed under eg https://www.codeproject.com/script/Answers/List.aspx?tab=toprated&alltags=true&tags=916 ? |
09:33:13 | <@OrIdow6> | c3manu: Agree, though the Reddit post mentions "16 million user accounts", might get too big for AB |
09:33:23 | <szanni> | I've honestly never used the forums. I only read the published articles |
09:34:02 | <c3manu> | wait, is the forum on the same domain? |
09:35:18 | <@JAA> | I was about to ask, I just see a link to GitHub discussions. |
09:35:39 | <@OrIdow6> | Wikipedia links to http://www.codeproject.com/script/Forums/List.aspx |
09:35:43 | <@OrIdow6> | Which is empty |
09:35:57 | <szanni> | It says up top: 65,938 articles |
09:36:10 | <@OrIdow6> | Seems Forums, Answers, and Articles are separate sections, and only Articles are populated right now |
09:36:14 | <szanni> | Trying to find an article sitemap |
09:37:15 | <@JAA> | I see a stylesheet switcher at the bottom. That'd need care in AB. |
09:37:33 | | bilboed quits [Quit: The Lounge - https://thelounge.chat] |
09:37:55 | | bilboed joins |
09:38:12 | <@JAA> | robots.txt links to SiteMap.xml, but that's also empty. |
09:38:34 | | loug8318142 joins |
09:38:50 | | Wohlstand quits [Ping timeout: 260 seconds] |
09:39:25 | <@JAA> | There's no date on https://www.codeproject.com/info/Changes.aspx but there's this meta tag: <meta name="Description" content="For those who code; Updated: 15 Oct 2024"> |
09:41:00 | <@OrIdow6> | Old and useless not-actually-sitemap at https://web.archive.org/web/20220429034114/https://www.codeproject.com/script/Content/SiteMap.aspx |
09:41:12 | <@OrIdow6> | JAA: Good find |
09:42:43 | <@OrIdow6> | Curious as to whether we'll see any #// captures of that when the WBM starts ingesting again |
09:43:27 | <@JAA> | (And more importantly, when that data is actually on its way to IA.) |
09:43:52 | <@JAA> | Article IDs are somewhat sequential but with large gaps. |
09:44:43 | <@JAA> | We can just throw it at AB and see what it gets. |
09:44:50 | <@JAA> | It might discover quite a lot via https://www.codeproject.com/script/Content/TagList.aspx |
09:44:53 | <@arkiver> | is AB enough for codeproject.com? |
09:45:27 | <@arkiver> | or do we need an emergency project? |
09:45:47 | <@JAA> | Ah, the links there go to the Answers section, not Articles. |
09:46:05 | <@OrIdow6> | arkiver: I think AB will do for now? |
09:46:09 | <@arkiver> | their sitemap is empty |
09:46:17 | <@JAA> | I like the & in links. |
09:46:35 | <@JAA> | Starting an AB job in a second. |
09:46:42 | <szanni46> | Maybe the sitemap is blocked off? Visit-Time: 1200-1700 # Only visit between 5pm and 10pm US EST |
09:46:43 | <szanni46> | Sitemap: https://www.codeproject.com/SiteMap.xml |
09:46:59 | <szanni46> | At least that's what robots.txt has to say |
09:47:03 | <@OrIdow6> | arkiver: The site's owners seem to have tried to content-freeze it, but (as of the present) in the process have removed the majority of pages, so it's smaller than it was at its prime |
09:47:21 | <@arkiver> | pff :/ |
09:47:23 | <@arkiver> | https://web.archive.org/web/20080115063536id_/http://www.codeproject.com/sitemap.xml |
09:48:03 | <@JAA> | AB is running now, we'll see what it manages. |
09:48:15 | <@JAA> | The articles could be enumerated. |
09:48:45 | <@arkiver> | JAA: i wonder if it could be relatively fast done with qwarc |
09:51:58 | <@JAA> | arkiver: Assuming their server holds up and there are no silly rate limits, yeah. Not right now though as I'm too tired. |
09:52:16 | <@JAA> | The AB job is going through tags and finding a lot already. |
09:52:23 | <@arkiver> | alright |
09:52:30 | <@arkiver> | it should cover the vast majority |
09:52:51 | <szanni> | Beautiful. I hope it works out. |
09:53:00 | <@JAA> | Specifically, tags on the homepage and then on articles with those tags etc. |
09:53:05 | <@JAA> | Since the list of tags is useless. |
09:54:05 | <@arkiver> | i'll send them a quick email unless OrIdow6 or JAA is already doing that? |
09:54:27 | <@arkiver> | found an email on https://www.codeproject.com/info/privacy.aspx |
09:54:43 | <@arkiver> | though it's "webmaster@codeproject.com@codeproject.com" :P |
09:54:44 | <@JAA> | Ah yes, webmaster@codeproject.com@codeproject.com |
09:54:48 | <@JAA> | lol |
09:55:10 | | @JAA isn't already doing that. |
09:55:17 | <@OrIdow6> | arkiver: Good idea, and nope, I haven't done so either |
09:55:22 | <@OrIdow6> | Hahah |
09:55:23 | <@arkiver> | alright! |
09:57:29 | <@OrIdow6> | Looking it up apparently a second @ can actually appear in an email, but per some RFC that in all likelihood nobody actually follows, that means the part before the domain needs to be in quotes |
09:58:47 | <kpcyrd> | "webmaster@codeproject.com"@codeproject.com is a valid address |
10:04:53 | <@OrIdow6> | szanni: Thanks for bringing this to our attention |
10:08:56 | <szanni> | Pleasure. Thank you for your immediate response! |
10:10:37 | | szanni46 quits [Client Quit] |
10:17:13 | <h2ibot> | OrIdow6 edited Deathwatch (+258, /* 2024 */ Codeproject temporarily went down…): https://wiki.archiveteam.org/?diff=53767&oldid=53765 |
10:29:11 | <@arkiver> | OrIdow6: adding you as well in CC on the email |
10:29:48 | <@arkiver> | i'll wait 30 minutes for you to confirm that is okay |
10:30:10 | <@arkiver> | !remindme 1h codeproject.com email OrIdow6 |
10:30:11 | <eggdrop> | [remind] ok, i'll remind you at 2024-11-13T11:30:10Z |
10:41:01 | <@OrIdow6> | arkiver: Alright, thx |
10:43:12 | <@arkiver> | JAA: OrIdow6: sent |
10:43:59 | <@JAA> | Thanks :-) |
10:55:23 | | sralracer joins |
10:56:58 | | sralracer is now authenticated as sralracer |
11:19:47 | | benjins3 quits [Read error: Connection reset by peer] |
11:20:25 | | benjins3 joins |
11:22:54 | | wickedplayer494 quits [Ping timeout: 252 seconds] |
11:23:32 | | wickedplayer494 joins |
11:30:10 | <eggdrop> | [remind] arkiver: codeproject.com email OrIdow6 |
11:31:25 | <pabs> | arkiver JAA - would be nice to get the code into Software Heritage if there is some way all the data could be sent to them |
11:41:53 | <@arkiver> | pabs: i will suggest that to them if they get back to the first email |
11:42:11 | <@arkiver> | my experience is that the more requests we put in our initial email, the less likely we receive a response |
11:43:12 | <pabs> | thanks |
11:43:21 | <@arkiver> | so my initial email is usually something along the lines of "[...] Would you like to work with us on this? [...]" (or similar), instead of "[...] So, we need x, y, and z, can you help us with that? [...]" |
11:47:56 | <@OrIdow6> | https://d2emerge.com/2024/11/12/d2-emerge-acquires-codeproject-expanding-reach-into-the-software-development-community-2/ |
11:49:52 | <@OrIdow6> | :( |
12:00:06 | | Bleo182600722719623 quits [Quit: The Lounge - https://thelounge.chat] |
12:02:49 | | Bleo182600722719623 joins |
12:04:33 | | ducky quits [Ping timeout: 260 seconds] |
12:06:51 | | ducky (ducky) joins |
12:20:39 | | wickedplayer494 quits [Ping timeout: 252 seconds] |
12:21:43 | | wickedplayer494 joins |
12:28:00 | | Commander001 quits [Ping timeout: 260 seconds] |
12:28:30 | | Commander001 joins |
12:31:07 | | decky_e_ quits [Read error: Connection reset by peer] |
12:46:49 | | fuzzy8021 quits [Read error: Connection reset by peer] |
12:47:19 | | fuzzy80211 (fuzzy80211) joins |
12:54:00 | | SkilledAlpaca41896 quits [Quit: SkilledAlpaca41896] |
12:55:54 | | SkilledAlpaca41896 joins |
13:09:10 | | kiska52 quits [Quit: Ping timeout (120 seconds)] |
13:09:41 | | Ryz quits [Quit: Ping timeout (120 seconds)] |
13:10:16 | | kiska52 joins |
13:11:02 | | Ryz (Ryz) joins |
13:15:10 | <@arkiver> | OrIdow6: there's money to be made i guess... |
13:15:24 | <@arkiver> | it was always a .com |
13:19:23 | | katocala joins |
13:19:37 | | katocala is now authenticated as katocala |
13:45:21 | | benjins2 joins |
14:03:11 | | simon8162 (simon816) joins |
14:03:31 | | asie quits [Ping timeout: 255 seconds] |
14:03:40 | | simon816 quits [Ping timeout: 260 seconds] |
14:03:58 | | Dj-Wawa quits [Ping timeout: 255 seconds] |
14:05:07 | | asie joins |
14:05:10 | | Dj-Wawa joins |
14:05:10 | | Dj-Wawa is now authenticated as Dj-Wawa |
14:11:08 | | Guest54 joins |
14:45:18 | | linuxgemini quits [Ping timeout: 252 seconds] |
15:17:05 | | M60_ joins |
15:25:26 | | MrMcNuggets (MrMcNuggets) joins |
15:34:05 | | katocala quits [Ping timeout: 260 seconds] |
15:34:52 | | katocala joins |
15:48:00 | | katocala quits [Ping timeout: 252 seconds] |
15:48:12 | | katocala joins |
16:00:44 | | szanni quits [Client Quit] |
16:19:22 | | AlsoHP_Archivist quits [Quit: Leaving] |
17:19:11 | | Commander001 quits [Read error: Connection reset by peer] |
17:19:23 | | Commander001 joins |
17:21:24 | | MrMcNuggets quits [Client Quit] |
17:36:35 | | lflare quits [Ping timeout: 260 seconds] |
17:57:55 | <nicolas17> | hot damn codeproject |
18:03:18 | <steering> | been a while since i heard of them |
18:04:33 | | lflare (lflare) joins |
18:07:15 | | fleppi joins |
18:40:12 | | xDEADBEEF is now known as th3z0l4 |
18:42:02 | <th3z0l4> | hey, i've been trying for a while without success, i have an uncapped data connection, but i can only run a warrior container (docker) with 6 tasks, is there a way to run with more tasks? |
18:48:18 | <@JAA> | (Answered the #warrior crosspost.) |
19:06:32 | | ducky_ (ducky) joins |
19:07:50 | | ducky quits [Read error: Connection reset by peer] |
19:07:50 | | ducky_ is now known as ducky |
19:08:34 | | fleppi quits [Client Quit] |
19:17:20 | | katocala is now authenticated as katocala |
19:17:59 | <h2ibot> | Posquito edited URLTeam (+276, add thd.co): https://wiki.archiveteam.org/?diff=53768&oldid=53617 |
19:31:40 | <qwertyasdfuiopghjkl> | Amazon is shutting down Freevee "over the coming weeks", not sure whether it has/had a distinct website anywhere: https://www.theverge.com/2024/11/12/24295129/amazon-shutting-down-freevee-prime-video |
19:43:00 | | BornOn420_ quits [Remote host closed the connection] |
19:43:33 | | BornOn420 (BornOn420) joins |
19:51:03 | | katocala quits [Remote host closed the connection] |
20:02:08 | <h2ibot> | Cooljeanius edited Deathwatch (+9, /* Dead as a Doornail */ typo fixes; use URL…): https://wiki.archiveteam.org/?diff=53769&oldid=53767 |
20:13:28 | | katocala joins |
20:13:41 | | katocala is now authenticated as katocala |
20:45:43 | | BlueMaxima joins |
20:51:08 | <pokechu22> | https://about.grubhub.com/news/wonder-announces-acquisition-of-grubhub/ |
20:54:11 | <pokechu22> | https://www.wonder.com/ is cloudflare hell unfortunately |
21:08:05 | <@JAA> | Looks like I'm getting through with grab-site. |
21:08:25 | <katia> | JAA you have to say 'im in' |
21:09:06 | <@JAA> | I'm in. That's so easy! |
21:09:14 | <katia> | sorry |
21:32:03 | | Dango360_ quits [Read error: Connection reset by peer] |
21:40:26 | | Dango360 (Dango360) joins |
21:50:20 | | tek_dmn quits [Ping timeout: 260 seconds] |
21:50:35 | | etnguyen03 (etnguyen03) joins |
22:38:28 | | tek_dmn (tek_dmn) joins |
22:42:07 | | sralracer quits [Client Quit] |
23:36:33 | | yasomi is now known as yasomimi |
23:41:08 | | yasomi (yasomi) joins |