01:01:59HP_Archivist (HP_Archivist) joins
02:17:39<anarcat>https://www.vice.com/en/article/xgwqgw/facebooks-powerful-large-language-model-leaks-online-4chan-llama
02:56:00qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds]
03:00:14shreyasminocha quits [Ping timeout: 245 seconds]
03:00:14thehedgeh0g quits [Ping timeout: 245 seconds]
03:00:14Kevin quits [Client Quit]
03:00:18shreyasminocha (shreyasminocha) joins
03:00:18thehedgeh0g (mrHedgehog0) joins
03:04:32njha1 quits [Ping timeout: 251 seconds]
03:04:32@AlsoJAA quits [Ping timeout: 251 seconds]
03:04:32Jonimus quits [Ping timeout: 251 seconds]
03:04:32FalconK quits [Ping timeout: 251 seconds]
03:04:32kpcyrd quits [Ping timeout: 251 seconds]
03:04:42AlsoJAA (JAA) joins
03:04:42@ChanServ sets mode: +o AlsoJAA
03:04:42FalconK (FalconK) joins
03:04:43kpcyrd (kpcyrd) joins
03:04:51Jonimus joins
03:06:01njha1 joins
03:09:50Arcorann (Arcorann) joins
03:48:57pabs quits [Quit: Don't rest until all the world is paved in moss and greenery.]
03:51:00pabs (pabs) joins
04:30:32hackbug quits [Remote host closed the connection]
04:32:50hackbug (hackbug) joins
05:05:02systwi_ (systwi) joins
05:05:37systwi quits [Ping timeout: 252 seconds]
05:06:10Terbium quits [Ping timeout: 252 seconds]
05:15:27Terbium joins
05:25:09<pabs>https://www.livescience.com/lost-georges-lemaitre-interview-recovered
05:27:03systwi_ is now known as systwi
05:30:01<pabs>is there a way to archive youtube?
05:36:54<pabs>"Tell HN: Freenom (the operator of .tk, .ml, .ga, .cf, .gq TLDs) is falling apart" https://news.ycombinator.com/item?id=34194555
05:37:29<pabs>time for a new archiving project?
05:38:11<pabs>also https://krebsonsecurity.com/2023/03/sued-by-meta-freenom-halts-domain-registrations/ https://news.ycombinator.com/item?id=35062806
05:52:22jamesatjaminit quits [Read error: Connection reset by peer]
05:52:22jamesatjaminit_ (jamesatjaminit) joins
05:54:48Pichu0202 quits [Remote host closed the connection]
05:58:15Pichu0102 joins
06:01:29hitgrr8 joins
06:23:11BlueMaxima quits [Read error: Connection reset by peer]
06:49:42Island quits [Read error: Connection reset by peer]
07:13:16Gereon62009 (Gereon) joins
07:15:58Gereon6200 quits [Ping timeout: 252 seconds]
07:15:58Gereon62009 is now known as Gereon6200
07:33:34<thuban>pabs: #youtubearchive
08:03:59raxxy-137409 quits [Quit: raxxy-137409]
08:06:04raxxy-137409 joins
08:08:22<pabs>thanks
08:51:07thuban quits [Ping timeout: 252 seconds]
08:58:42LeGoupil joins
09:04:40thuban joins
09:08:05Jake quits [Client Quit]
09:08:20Jake (Jake) joins
10:08:00<pabs>what's the difference between #youtubearchive and #down-the-tube ?
10:20:20qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
11:06:05<dieserniko>I think the first one was for archiving the videos themselves and the second one was archiving metadata in response to the removal of the dislike button
12:08:43Jake quits [Client Quit]
12:08:43LeGoupil quits [Remote host closed the connection]
12:08:46Jake (Jake) joins
12:08:49LeGoupil1 joins
12:11:12LeGoupil1 is now known as LeGoupil
12:52:34Arcorann quits [Ping timeout: 252 seconds]
13:51:55hitgrr8 quits [Client Quit]
14:06:07<pabs>ah
14:26:04lennier1 quits [Ping timeout: 252 seconds]
14:27:00lennier1 (lennier1) joins
14:47:05HackMii quits [Remote host closed the connection]
14:47:42HackMii (hacktheplanet) joins
14:48:37Gereon6200 quits [Ping timeout: 252 seconds]
15:02:56thuban quits [Ping timeout: 265 seconds]
15:15:32ehmry joins
15:16:27Gereon6200 (Gereon) joins
15:21:43thuban joins
15:29:08HackMii quits [Ping timeout: 276 seconds]
15:34:02HackMii (hacktheplanet) joins
15:37:19Island joins
16:10:39hitgrr8 joins
16:20:36qwertyasdfuiopghjkl quits [Remote host closed the connection]
16:52:19LeGoupil quits [Client Quit]
16:57:13umgr036 quits [Remote host closed the connection]
16:57:39second (second) joins
16:58:08sec^nd quits [Remote host closed the connection]
16:58:08second is now known as sec^nd
17:00:48umgr036 joins
17:03:12umgr036 quits [Remote host closed the connection]
17:05:38<@JAA>pabs: What dieserniko said is correct, but the practical difference is that #down-the-tube data goes into the Wayback Machine with working video playback but can only somewhat selectively archive videos (cf. wiki page for guidelines) while #youtubearchive goes into a storage that isn't publicly accessible and is a bit less strictly limited.
17:18:42qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
18:05:50Mateon1 quits [Remote host closed the connection]
18:07:17Mateon1 joins
18:54:02qwertyasdfuiopghjkl quits [Client Quit]
19:00:25qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
19:26:00ehmry quits [Ping timeout: 252 seconds]
19:39:50HP_Archivist quits [Read error: Connection reset by peer]
21:00:42miko joins
21:15:45miko leaves
21:16:01mikolaj joins
21:18:43<mikolaj>hello
21:19:26<mikolaj>I've been writing my own open source forum downloader tool. Far from done, but I have very basic Discourse, PhpBB, SMF, HyperKitty, Pipermail extractors, and will be now working on Hypermail, Proboards, vBulletin, IP.Board extractors
21:19:48<mikolaj>I'd like to ask here if there's any application for such a tool. So that I can be sure I'm not sinking my energy into oblivion
21:20:20<mikolaj>I've started writing it because I found wget and httrack limited in usability for downloading forums. But today I learned wpull exists, which might defeat the purpose of my project
21:23:43<mikolaj>oh -- the primary selling point is that it can dump posts to JSON, instead of only downloading entire pages
21:25:16<mikolaj>my WIP project is here if anyone cares: https://github.com/mikwielgus/forum-dl
21:27:39<@JAA>mikolaj: That's definitely useful and something I wanted to write for a while. Ideally, there'd be a way to still capture the raw network traffic and write WARCs as well, but that's easy to get wrong if you're not very familiar with the details, so that's not a recommendation to add it now.
21:28:29<TheTechRobo>JAA: Could warcprox effectively automate the WARC writing? Or no since that hasn't been audited yet?
21:28:53<@JAA>TheTechRobo: Yeah, audit needed, but potentially yes.
21:34:54<mikolaj>JAA: can you describe how it could be useful in your use cases? Feedback would be great, so that I can know what to focus on
21:44:57mikolaj|m joins
21:50:36<@JAA>mikolaj: I like stuff that can go into the Wayback Machine. :-)
21:51:11<@JAA>Also, preserving the raw original data would allow for reprocessing in the future if a bug is found in the extractor.
22:01:23<mikolaj>sorry, I didn't mean to ask why you need WARCs, but rather what particular use you'd have for my tool (a bundle of forum-software-specific downloaders)
22:12:17hitgrr8 quits [Client Quit]
22:14:14<@JAA>mikolaj: Ah, right, nothing specific, just extracting the post contents into a common machine-readable data format.
22:16:22<@JAA>Allows for indexing, replaying, etc. across forum softwares.
22:16:50<@JAA>Someone else here was working on an indexer of forum pages from WARCs a while ago. Don't think it went anywhere though.
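
The "common machine-readable data format" idea above could look roughly like the following. This is only a sketch: the `Post` schema, its field names, and the `to_json_lines` helper are hypothetical illustrations and are not taken from forum-dl or any other tool mentioned here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Post:
    # Hypothetical common schema; field names are illustrative only.
    forum_software: str
    thread_id: str
    post_id: str
    author: str
    content: str


def to_json_lines(posts):
    """Serialize posts as one JSON object per line."""
    return "\n".join(json.dumps(asdict(p), sort_keys=True) for p in posts)


# Posts extracted from different forum softwares share one shape,
# so a single indexer can consume all of them.
posts = [
    Post("phpbb", "t1", "p1", "alice", "First post"),
    Post("discourse", "42", "7", "bob", "A reply"),
]
dump = to_json_lines(posts)
print(dump)
```

Keeping one object per line means an indexer can stream the dump without loading a whole forum into memory.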
22:19:46<mikolaj>I just browsed your IRC logs and found a person named avoozl talking about developing such a project. Haven't found any repo though
22:20:00<mikolaj>avoozl: how is your project going? Do you have a public repository?
22:24:42<mikolaj|m>(I've permanently connected here via Matrix now so I'll disconnect via IRC now)
22:24:56mikolaj quits [Remote host closed the connection]
22:30:47<@JAA>avoozl: fg
22:31:12<@JAA>(Sorry, ignore)
22:33:06Larsenv quits [Quit: ZNC 1.8.2+deb2build5 - https://znc.in]
22:35:36Larsenv (Larsenv) joins
23:09:53BlueMaxima joins
23:16:29Mateon2 joins
23:16:32Mateon1 quits [Remote host closed the connection]
23:16:32Mateon2 is now known as Mateon1
23:25:33Mateon1 quits [Remote host closed the connection]
23:25:43Mateon1 joins
23:59:44<pabs>JAA: hmm, sounds like #down-the-tube is preferable
23:59:58pabs wonders if forum-dl can output Maildir :)