00:17:03wessel1512 quits [Ping timeout: 246 seconds]
00:17:42wessel1512 joins
00:22:11Megame (Megame) joins
00:34:33Wingy1139793 quits [Remote host closed the connection]
00:35:35Wingy1139793 (Wingy) joins
00:54:41Megame quits [Client Quit]
01:33:24justlurking joins
01:50:27Nulo quits [Ping timeout: 246 seconds]
01:50:40Nulo joins
01:50:41justlurking quits [Client Quit]
02:04:13dm4v quits [Client Quit]
02:04:18dm4v joins
02:28:39froschgrosch quits [Ping timeout: 246 seconds]
02:30:39anthony joins
02:30:42froschgrosch joins
02:31:22<anthony>heya, anyone home?
02:32:42<anthony>I'm doing some research on an archived Yahoo newsgroup and could use some advice for how best to view the archive.
02:35:50<anthony>I've tried loading the .warc in ReplayWeb.page. But rather than rendering as a navigable list of threads, I get a list of json objects for each post. This makes viewing the archive pretty impractical
02:37:28<anthony>Any tips for rendering a newsgroup .warc so it looks similar to a native page? Do I need some other file for ReplayWeb.Page to be able to parse it?
02:46:41JackThompson quits [Ping timeout: 240 seconds]
02:52:16<systwi_>anthony: Try changing the list in ReplayWeb.Page to only show HTML, that might make it easier to find what you're looking for.
02:52:36<systwi_>Or, if you know a specific URL, you should be able to enter it in the address bar.
02:53:12<systwi_>Re: "only show HTML," there's a dropdown list in the top right you can change the filter.
02:53:25<systwi_>*where you can change the filter.
02:54:13JackThompson joins
03:03:39ThreeHM quits [Ping timeout: 246 seconds]
03:04:02ThreeHM (ThreeHeadedMonkey) joins
03:05:44<anthony>systwi_ Thanks! It looks like this archive only contains records with the MIME type application/json.
03:06:11JackThompson quits [Ping timeout: 240 seconds]
03:07:33<systwi_>You're welcome. Are you opening the *meta.warc and not the *000000.warc by any chance?
03:08:50<anthony>The file I'm loading into ReplayWeb.page is from here: https://archive.org/details/yahoo-groups-2017-05-23T02-01-34Z-c8358f
03:09:17<anthony>called videoblogging.GORmZlO.warc.gz
03:10:32<anthony>Description says it's an API grab, rather than an archive of the actual webpage. So I guess it makes sense that it only contains the JSON the API would return
03:10:36<systwi_>Ah, okay, I see. It looks like that's a WARC of the Yahoo! Groups API responses, which makes sense as to why they're all JSON files (that's a common occurrence).
03:10:46<systwi_>Yeah, yep.
03:10:50<anthony>Hhah
03:11:38<systwi_>You can extract the JSON files and feed them into `gron': https://github.com/TomNomNom/gron
03:11:49<systwi_>Then grep from there. :-)
03:12:22<systwi_>If you're on a *nix system you can probably just install `gron' from your package manager.
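For readers unfamiliar with `gron`: it flattens JSON into one assignment per line, so that grep can match on the full path to a value. The sketch below imitates that output style in Python. It is not gron's implementation; the name `flatten` is hypothetical, and details such as key ordering and quoting may differ from the real tool.

```python
import json
import re

# Keys that look like plain identifiers are written as .key, others as ["key"].
IDENT = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def flatten(value, path="json"):
    """Yield one 'path = value;' line per node of a parsed JSON document."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            step = f".{key}" if IDENT.match(key) else f"[{json.dumps(key)}]"
            yield from flatten(child, path + step)
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars (str, int, float, bool, None) are printed as JSON literals.
        yield f"{path} = {json.dumps(value)};"


if __name__ == "__main__":
    sample = json.loads('{"messages": [{"subject": "hello", "msgId": 1}]}')
    for line in flatten(sample):
        print(line)
```

Each printed line carries its whole path, which is what makes a plain `grep subject` useful on deeply nested API responses.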
03:13:35<anthony>Ooh nifty
03:15:34<anthony>Ideally I'd like to just feed the JSON into a script that renders the content dynamically on a page that resembles the original Yahoo Groups interface
03:17:39<anthony>I've fiddled a bit w/ javascript trying to build a parser. But I think the task is a bit beyond the scope of my talent, hehe
03:18:58<systwi_>I can't help much with further construction details, but I suppose it could be possible to have, in one pipeline, curl the WARC in question into the stdin of a decompressor, extracting only the JSON in question to stdout (I've only decompressed WARCs on macOS with The Unarchiver, nowhere else yet), then feed that into `gron' or some other tool and
03:18:58<systwi_>create an HTML page from that on the fly.
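The extraction step systwi_ describes can be sketched with the Python standard library alone. This is a minimal reader under stated assumptions, not a full WARC parser: it assumes each record's block length is given by `Content-Length`, that response bodies are neither chunked nor content-encoded, and that only `response` records are of interest. The function names are hypothetical. A dedicated library such as warcio handles the cases this sketch skips.

```python
import gzip
import io
import json


def iter_warc_records(stream: io.BufferedIOBase):
    """Yield (headers, block) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines separating records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {line[:40]!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the WARC header section
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        yield headers, stream.read(int(headers["content-length"]))


def iter_json_responses(stream: io.BufferedIOBase):
    """Yield (target_uri, parsed_json) for response records with a JSON body."""
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        # The block is an HTTP response: header lines, blank line, then body.
        _http_head, separator, body = block.partition(b"\r\n\r\n")
        if not separator:
            continue
        try:
            yield headers.get("warc-target-uri"), json.loads(body)
        except ValueError:
            continue  # body was not valid JSON (or not valid UTF-8)


def load_json_from_warc(path):
    """Read a .warc.gz file; gzip.open reads the concatenated gzip members."""
    with gzip.open(path, "rb") as stream:
        return list(iter_json_responses(stream))
```

With the file named in the conversation, this would be called as `load_json_from_warc("videoblogging.GORmZlO.warc.gz")`, and the resulting objects could then be written out for `gron` or passed to a page generator.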
03:20:14<systwi_>I _think_ PHP might be able to handle a task like this, but I haven't experimented with it enough to know.
03:21:45<systwi_>Ideally, omitting JS altogether would be the most authentic, lightweight and compatible option, judging from my horrible experience with it.
03:22:00<anthony>Haha
03:22:09systwi_ shudders
03:22:17<systwi_>:-)
03:22:45<anthony>My limited experience w/ javascript has given me a similar impression, hah
03:23:19<anthony>(did you know there are no negative indices in js???) makes me appreciate Python so much more
03:24:13<systwi_>Huh, didn't know that.
03:25:45<anthony>Love that ArchiveTeam has been able to back up so much Groups content. Bummer that it takes some legwork to view :(
03:25:47<@JAA>The neat thing about doing it with JS is that it could be a static file on IA. So it won't break in a few years when whoever builds that thing loses interest and shuts down the server.
03:26:10<@JAA>Just about everything else about JS sucks though.
03:26:52<@JAA>Example of such a thing: my Picosong download finder at https://web.archive.org/web/20211001003631id_/https://ia801403.us.archive.org/33/items/picosong.com_finder/index.html
03:30:01<pabs>could someone archive https://github.com/cutefishos? it seems the team and their website went MIA https://www.debugpoint.com/cutefish-os-development-halts/
03:30:18<pabs>JAA: IIRC you did the git stuff last time I asked?
03:30:20Arcorann quits [Read error: Connection reset by peer]
03:33:38<anthony>i ran into some issues parsing the json from the warc. One: the content of a message object (contained in the attr rawEmail) doesn't have a consistent delimiter. So you have to look for a couple different possible options for what precedes the content.
03:34:43<anthony>Also, pointy brackets are sometimes used in <html> tags, and other times used as >>>quotation within a thread
03:35:44<anthony>There might be easy fixes to these formatting issues but i am what the kids call a n00b
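One possible way around both problems anthony mentions, sketched in Python since he says he prefers it: instead of searching for a delimiter, hand the message to a MIME parser, which splits headers from body at the first blank line and walks multipart messages. The sketch assumes the `rawEmail` attribute holds a complete RFC 5322 message as a string; any unescaping the API's JSON might need first is left out. The function names and the `q<depth>` CSS classes are hypothetical.

```python
import html
import re
from email import message_from_string, policy

# One or more leading '>' markers, each optionally followed by whitespace.
QUOTE_RE = re.compile(r"^((?:>\s?)+)")


def extract_body(raw_email: str):
    """Return (subtype, text) for the best text part, preferring plain text."""
    msg = message_from_string(raw_email, policy=policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return "unknown", ""
    return part.get_content_subtype(), part.get_content()


def render_plain(text: str) -> str:
    """Turn plain text into HTML, treating leading '>' runs as quote depth."""
    rendered = []
    for line in text.splitlines():
        match = QUOTE_RE.match(line)
        depth = match.group(1).count(">") if match else 0
        content = line[match.end():] if match else line
        # Escaping keeps a literal "<html>" in a plain-text post from
        # being interpreted as markup by the browser.
        rendered.append(f'<div class="q{depth}">{html.escape(content)}</div>')
    return "\n".join(rendered)
```

The subtype tells the caller which case it has: `plain` bodies go through `render_plain`, where `>` can only mean quoting, while `html` bodies already contain real tags and would need sanitizing rather than escaping.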
03:36:34Arcorann (Arcorann) joins
03:41:12Wingy1139793 quits [Remote host closed the connection]
03:42:06Wingy1139793 (Wingy) joins
03:43:02<Doran>anthony: there'll be a lot more content coming in the future - and a lot easier to view - but the project to tag and sort all the manually saved content is still underway
03:43:07Doran is now known as Doranwen
03:43:17<Doranwen>I've been working on that part
03:43:39<Doranwen>about a million groups in all
03:43:54<anthony>holy moly
03:44:11<Doranwen>most of it saved via Yahoo's GetMyData method
03:45:14<Doranwen>this gdoc gives an idea of what we're aiming to do, if you're curious: https://docs.google.com/document/d/1AWFSmXLH-KsVU7N1EGkmbrLyv1N_fYoRlWLCEWLxtX4/edit?usp=sharing
03:46:54<Doranwen>ultimately there will be a database available for anyone to download and poke in as well
03:47:07Doranwen really wants stats
03:48:05<Doranwen>the variety of languages alone is fascinating
03:51:59<anthony>Thanks for the link Doranwen!
03:52:05<Doranwen>no problem!
03:52:17<Doranwen>I spent two years just working on metadata, Yahoo Groups has taken over my life, lol
03:52:25<Doranwen>but now that I've gotten this far… I have to finish it :)
03:52:48<anthony>wowza
04:06:28JackThompson joins
04:15:24<Jake>I still want to find the original tools we used to build those indexers. :(
04:19:25anthony quits [Remote host closed the connection]
04:23:00dm4v quits [Client Quit]
04:23:05dm4v joins
04:48:07jacobk joins
04:54:26G4te_Keep3r3 quits [Client Quit]
04:56:58G4te_Keep3r3 joins
05:25:11JackThompson quits [Ping timeout: 240 seconds]
05:30:11JackThompson joins
05:31:11qwertyasdfuiopghjkl joins
05:41:20BPCZ quits [Quit: eh???]
05:44:10BPCZ (BPCZ) joins
05:49:51HackMii_ quits [Remote host closed the connection]
05:50:02BPCZ quits [Client Quit]
05:50:27HackMii_ (hacktheplanet) joins
05:50:58march_happy quits [Ping timeout: 265 seconds]
05:51:26march_happy (march_happy) joins
05:55:39HackMii_ quits [Remote host closed the connection]
05:55:47mutantmonkey quits [Remote host closed the connection]
05:56:05mutantmonkey (mutantmonkey) joins
05:56:06HackMii_ (hacktheplanet) joins
05:56:29BPCZ (BPCZ) joins
06:12:07<audrooku|m>Did someone say yahoo groups
06:24:41pabs quits [Ping timeout: 240 seconds]
06:32:45pabs (pabs) joins
06:37:41atphoenix_ quits [Read error: Connection reset by peer]
06:38:19superkuh_ quits [Remote host closed the connection]
06:39:52<Doranwen>audrooku|m: lol yes indeed
06:40:11Doranwen is currently sorting more groups onto tabs, as she's been doing for weeks now
06:42:28atphoenix_ (atphoenix) joins
06:44:41dm4v quits [Client Quit]
06:44:41qwertyasdfuiopghjkl quits [Client Quit]
06:44:45dm4v joins
06:44:51superkuh joins
06:57:09BPCZ quits [Client Quit]
06:57:26BPCZ (BPCZ) joins
07:00:24CreaZyp154 joins
07:03:21<CreaZyp154>I was wondering; since we're archiving all of Reddit, why not archive all of Twitter as well ?
07:10:26qwertyasdfuiopghjkl joins
07:24:32nikow1 quits [Quit: WeeChat 3.0]
07:24:32nikow joins
07:48:25pabs quits [Ping timeout: 265 seconds]
07:55:41march_happy quits [Ping timeout: 240 seconds]
07:56:29march_happy (march_happy) joins
08:00:21pabs (pabs) joins
08:33:39Ruthalas4 (Ruthalas) joins
08:35:41Ruthalas quits [Ping timeout: 240 seconds]
08:35:41Ruthalas4 is now known as Ruthalas
08:45:47nikow1 joins
08:46:49nikow quits [Remote host closed the connection]
09:12:36<pabs>does ArchiveTeam archive potential DMCA targets? if so, https://github.com/chip-red-pill/MicrocodeDecryptor
09:12:52<pabs>(Intel microcode decryptor)
09:15:14<CreaZyp154>ig... anyways I saved the main page and the zip with save page now and have a copy of the zip as well
09:20:51<@OrIdow6>Wget is the devil
09:21:11<CreaZyp154>why ?
09:23:34Ruthalas1 (Ruthalas) joins
09:24:17Ruthalas quits [Client Quit]
09:24:17dm4v quits [Client Quit]
09:24:17qwertyasdfuiopghjkl quits [Client Quit]
09:24:18CreaZyp154 quits [Client Quit]
09:24:18Ruthalas1 is now known as Ruthalas
09:24:22dm4v joins
09:24:25CreaZyp154 joins
09:25:19<CreaZyp154>sorry, my client disconnected on its own. why do you say wget is the devil ? OrIdow6
09:26:14<@OrIdow6>Just a joke CreaZyp154, currently dealing with some quirks in its behavior
09:26:57<CreaZyp154>i guessed it was a joke, but what quirks are you dealing with ?
09:28:29<@OrIdow6>Removal of port numbers from the followed target of a Location: header when the number is the default for the scheme, under some circumstances
09:29:11<CreaZyp154>oh ok it was probably a DNS issue, it's annoying; every day the dns stops working for some reason and I have to restart my winblows
09:31:06CreaZyp154 quits [Remote host closed the connection]
09:31:19C joins
09:31:30C is now known as CreaZyp154
09:35:09<@OrIdow6>I don't think it's that, but thank you anyway
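To make the quirk OrIdow6 describes concrete: a redirect to `http://example.com:80/a` and one to `http://example.com/a` reach the same server, but they are different strings, which matters when an archive records the exact URL that was followed. The Python sketch below only illustrates what "removing the default port" does to a URL. It is not Wget's code, and the function name and port table are assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports per scheme (a small illustrative table, not exhaustive).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def strip_default_port(url: str) -> str:
    """Drop an explicit port from the URL if it is the scheme's default."""
    parts = urlsplit(url)
    if parts.port is None or parts.port != DEFAULT_PORTS.get(parts.scheme):
        return url  # no explicit port, or a non-default one: leave untouched
    # Cut the trailing ":port"; this keeps userinfo and IPv6 brackets intact.
    netloc = parts.netloc.rpartition(":")[0]
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    for url in ("http://example.com:80/a", "https://example.com:8443/"):
        print(url, "->", strip_default_port(url))
```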
09:35:13BlueMaxima quits [Read error: Connection reset by peer]
09:38:33qwertyasdfuiopghjkl joins
09:51:10tech_exorcist (tech_exorcist) joins
10:09:50Ruthalas quits [Client Quit]
10:12:54Ruthalas (Ruthalas) joins
10:23:03HackMii_ quits [Remote host closed the connection]
10:23:43HackMii_ (hacktheplanet) joins
10:27:27Ruthalas quits [Ping timeout: 246 seconds]
10:28:34Ruthalas (Ruthalas) joins
10:42:15LeGoupil joins
10:44:24LeGoupil quits [Remote host closed the connection]
10:44:27HackMii_ quits [Remote host closed the connection]
10:44:47HackMii_ (hacktheplanet) joins
10:45:50niku joins
10:50:14LeGoupil joins
10:59:42tech_exorcist quits [Remote host closed the connection]
10:59:45tech_exorcist_ (tech_exorcist) joins
11:41:18dm4v quits [Client Quit]
11:41:18CreaZyp154 quits [Client Quit]
11:41:18qwertyasdfuiopghjkl quits [Client Quit]
11:41:24dm4v joins
11:44:40C joins
11:50:44C quits [Client Quit]
11:50:45dm4v quits [Client Quit]
11:50:49dm4v joins
11:56:32mgrytbak6 joins
11:57:24mgrytbak quits [Ping timeout: 246 seconds]
11:57:24mgrytbak6 is now known as mgrytbak
12:02:35Wingy1139793 quits [Remote host closed the connection]
12:04:04Wingy1139793 (Wingy) joins
12:18:41march_happy quits [Ping timeout: 240 seconds]
12:19:03march_happy (march_happy) joins
12:25:45wickedplayer494 quits [Ping timeout: 246 seconds]
12:33:59LeGoupil quits [Client Quit]
12:34:06Wingy1139793 quits [Read error: Connection reset by peer]
12:35:00Wingy1139793 (Wingy) joins
12:40:41march_happy quits [Ping timeout: 240 seconds]
12:41:35march_happy (march_happy) joins
12:45:13eroc1990 quits [Quit: The Lounge - https://thelounge.chat]
12:45:49eroc1990 (eroc1990) joins
12:55:49march_happy quits [Ping timeout: 265 seconds]
12:56:07march_happy (march_happy) joins
13:00:39march_happy quits [Ping timeout: 265 seconds]
13:01:27march_happy (march_happy) joins
13:05:41march_happy quits [Ping timeout: 240 seconds]
13:05:46march_happy (march_happy) joins
13:13:19Iki1 joins
13:14:41Iki quits [Ping timeout: 240 seconds]
13:17:14Iki joins
13:20:28Iki1 quits [Ping timeout: 265 seconds]
13:50:40Jonimoose joins
13:52:51dm4v quits [Ping timeout: 265 seconds]
13:53:26dm4v joins
13:54:11pabs quits [Ping timeout: 240 seconds]
14:00:36Arcorann quits [Ping timeout: 246 seconds]
14:01:35eroc19908 (eroc1990) joins
14:01:40Wingy11397937 (Wingy) joins
14:02:22eroc1990 quits [Client Quit]
14:02:22Wingy1139793 quits [Client Quit]
14:02:22dm4v quits [Client Quit]
14:02:22Iki quits [Remote host closed the connection]
14:02:23Wingy11397937 is now known as Wingy1139793
14:02:27dm4v joins
14:06:09dm4v quits [Client Quit]
14:06:13dm4v joins
14:13:09march_happy quits [Ping timeout: 265 seconds]
14:18:11jacobk quits [Ping timeout: 240 seconds]
14:22:36march_happy (march_happy) joins
14:25:26Iki joins
14:25:44<TheTechRobo>systwi_: Re gron - that would have been so helpful for me to know when I was messing around with Discord's websockets
14:25:55<TheTechRobo>Thanks for the link!
14:29:48pabs (pabs) joins
14:55:59tech_exorcist_ quits [Client Quit]
15:02:11atphoenix_ quits [Ping timeout: 240 seconds]
15:06:35mutantmonkey quits [Remote host closed the connection]
15:07:08mutantmonkey (mutantmonkey) joins
15:10:15CraftByte quits [Ping timeout: 246 seconds]
15:12:28<@JAA>pabs: GitHub stuff in #gitgud in the future please. I'll take care of it later.
15:14:22Wingy1139793 quits [Remote host closed the connection]
15:15:13Wingy11397937 (Wingy) joins
15:18:40wickedplayer494 joins
15:32:06Minkafighter quits [Quit: The Lounge - https://thelounge.chat]
15:32:52Minkafighter joins
16:01:13tech_exorcist (tech_exorcist) joins
16:01:55dm4v quits [Client Quit]
16:01:59dm4v joins
16:07:38CraftByte (DragonSec|CraftByte) joins
16:08:01<h2ibot>DigitalDragon edited Alive... OR ARE THEY (+2, Wikia is now Fandom): https://wiki.archiveteam.org/?diff=48761&oldid=48580
16:08:02<h2ibot>Nemo bis edited Magazines and journals (+586, update): https://wiki.archiveteam.org/?diff=48762&oldid=31253
16:38:55AzureLakeZone joins
16:39:49AzureLakeZone leaves
16:41:14AzureLakeZone joins
16:47:21jacobk joins
16:52:52qwertyasdfuiopghjkl joins
17:01:50march_happy quits [Ping timeout: 265 seconds]
17:11:30AzureLakeZone quits [Ping timeout: 265 seconds]
17:13:15tech_exorcist quits [Remote host closed the connection]
17:15:11Wingy11397937 quits [Remote host closed the connection]
17:15:47AzureLakeZone joins
17:16:04Wingy11397937 (Wingy) joins
17:17:04<AzureLakeZone>Hi. You guys have saved the files from Toshiba's drivers site? That's great, it'll come in handy for when they take it all down.
17:17:43<AzureLakeZone>Are you familiar with HP's FTP, with the softpaq files? Apparently a partial backup of it was made in 2013. Do you know something about that?
17:19:11datechnoman quits [Ping timeout: 240 seconds]
17:22:54jacobk quits [Ping timeout: 246 seconds]
17:25:41eroc19908 quits [Ping timeout: 240 seconds]
17:26:23eroc1990 (eroc1990) joins
17:31:45tech_exorcist (tech_exorcist) joins
17:32:03datechnoman (datechnoman) joins
17:48:58jacobk joins
17:59:46spirit joins
18:15:47pabs quits [Ping timeout: 265 seconds]
18:18:29RandoGuy joins
18:22:00tech_exorcist quits [Read error: Connection reset by peer]
18:22:42tech_exorcist (tech_exorcist) joins
18:23:53Sluggs quits [Ping timeout: 265 seconds]
18:24:59Sluggs joins
18:28:16<Lord_Nightmare>the softpaq backup? I may have made that myself back then
18:28:28<Lord_Nightmare>I have a copy... somewhere.
18:33:26<Lord_Nightmare>AzureLakeZone: ^
18:37:45RandoGuy quits [Remote host closed the connection]
19:11:12<Ryz>Time to proactively archive Glassdoor? Anonymous people may have to remove their reviews because their identities may be de-anonymized: https://www.theguardian.com/world/2022/jul/19/glassdoor-ordered-to-reveal-identity-of-negative-reviewers-to-new-zealand-toymaker
19:48:09jacobk quits [Ping timeout: 246 seconds]
19:49:10jacobk joins
19:56:25wickedplayer494 quits [Remote host closed the connection]
19:56:25CraftByte quits [Client Quit]
19:56:25Wingy11397937 quits [Client Quit]
19:56:25dm4v quits [Client Quit]
19:56:30CraftByte (DragonSec|CraftByte) joins
19:56:30dm4v joins
19:56:40Wingy113979376 (Wingy) joins
19:57:15wickedplayer494 joins
20:02:17dm4v_ joins
20:02:18qwertyasdfuiopghjkl quits [Client Quit]
20:02:18CraftByte quits [Client Quit]
20:02:18dm4v quits [Client Quit]
20:02:18dm4v_ is now known as dm4v
20:05:21<AzureLakeZone>Lord_Nightmare I've heard that while the FTP is still online, they've taken out files from it over the years. Do you have the web pages from HP's site saved or just the FTP files? I wouldn't have enough space for all of the files, I'm just looking for stuff related to the HP compaq nx9010 right now.
20:08:11qwertyasdfuiopghjkl joins
20:38:39tech_exorcist quits [Client Quit]
20:44:39<systwi>Re: Softpaq, I may have a partial copy somewhere as well. I'm estimating it to be about 800GB and from 2019.
20:45:31<systwi>I think their whole FTP server is/was a few TB in size, which is still too heavy for a data hoarder with perpetually-low disk space.
21:13:39dm4v quits [Ping timeout: 265 seconds]
21:47:25mutantmonkey quits [Remote host closed the connection]
21:47:55mutantmonkey (mutantmonkey) joins
21:50:30march_happy (march_happy) joins
21:52:41HackMii_ quits [Ping timeout: 240 seconds]
21:53:33HackMii_ (hacktheplanet) joins
22:02:18dm4v joins
22:13:35jacobk quits [Ping timeout: 246 seconds]
22:38:59HP_Archivist (HP_Archivist) joins
22:40:29dm4v quits [Client Quit]
22:40:29AzureLakeZone quits [Client Quit]
22:40:33dm4v joins
23:05:18HackMii_ quits [Remote host closed the connection]
23:05:18mutantmonkey quits [Remote host closed the connection]
23:06:14mutantmonkey (mutantmonkey) joins
23:08:51HackMii_ (hacktheplanet) joins
23:26:09Wingy113979376 quits [Remote host closed the connection]
23:27:06Wingy113979376 (Wingy) joins
23:33:08BlueMaxima joins
23:33:21marto_ quits [Quit: zzzzz]
23:37:11marto_ (marto_) joins
23:52:55Arcorann (Arcorann) joins