00:06:02<pabs>FTR, I threw vger.kernel.org into AB, it is small (no archives)
00:22:57kitonthe1et quits [Ping timeout: 272 seconds]
00:42:57Naruyoko joins
01:05:23mrcave joins
01:09:12<mrcave>Hi everyone, just wondering if anyone knows whether the uploads in the "telenor home" collection are all different versions of the same set of website crawls, or whether each large upload on archive.org contains different data? https://archive.org/details/archiveteam_telenor - https://wiki.archiveteam.org/index.php/Telenor
01:11:05inedia quits [Ping timeout: 272 seconds]
01:16:15kitonthenet joins
01:18:09<pokechu22>mrcave: I'm pretty sure that each item in that collection contains different data, split so that each item is ~20 GB
01:18:54<pokechu22>the home.online.no/~joeolavl/ and similar ones are a bit weird but it sounds like those were for individual users that weren't found by the main grab
01:19:09<pokechu22>So if you wanted to download all of the site, you'd need to download the WARCs from all of the items
01:21:13kitonthenet quits [Ping timeout: 272 seconds]
01:21:16<mrcave>hey, thanks for the info
01:26:03<mrcave>I'll check the individual home.online.no/~joeolavl uploads. I helped out on the home.no grab but haven't looked at the files since. Now I'm trying to look for band pages, but I'm finding more interest in writing a summary of home.no and its content. Compared to GeoCities, the users were the owners of the Internet subscription, so the pages are often made by adults and
01:26:04<mrcave>have a close family vibe: personal start pages for families and so on. I've only found 1 band page
01:26:19kitonthenet joins
01:30:50kitonthenet quits [Ping timeout: 240 seconds]
01:37:31kitonthenet joins
01:41:50kitonthenet quits [Ping timeout: 240 seconds]
01:44:39useretail quits [Ping timeout: 272 seconds]
01:45:22useretail joins
01:47:56kitonthe2et joins
01:56:03kitonthe2et quits [Ping timeout: 272 seconds]
01:56:47<@OrIdow6>Have there been any major changes in the last ~6 months? Anything I can help with in the end-of-year rush?
01:59:00<@arkiver>OrIdow6: hi :)
01:59:04<@arkiver>it's not very busy at the moment
01:59:16<@arkiver>mostly we're working on #frogger (Blogger) still, which is almost finished
02:00:08<@OrIdow6>That's good
02:00:20kitonthenet joins
02:00:21<@OrIdow6>And hi
02:03:16<fireonlive>hii
02:05:33kitonthenet quits [Ping timeout: 272 seconds]
02:30:40kitonthe2et joins
02:31:30mrcave quits [Remote host closed the connection]
02:32:31parfait_ quits [Quit: Leaving]
02:35:20kitonthe2et quits [Ping timeout: 240 seconds]
02:36:44kitonthe1et joins
03:04:00missaustraliana joins
03:08:51missaustraliana quits [Client Quit]
03:24:35missaustraliana joins
03:24:47skyrocket joins
03:33:20skyrocket quits [Ping timeout: 240 seconds]
03:44:57missaustraliana quits [Client Quit]
03:55:12AlsoHP_Archivist joins
03:58:17HP_Archivist quits [Ping timeout: 272 seconds]
04:04:06skyrocket joins
04:04:35skyrocket quits [Client Quit]
04:10:30missaustraliana joins
04:11:45skyrocket joins
04:12:41<fireonlive>https://deadline.com/2023/12/andre-braugher-dead-homicide-life-on-the-street-brooklyn-nine-nine-actor-1235665513/
04:12:44<fireonlive>"André Braugher Dies: Star Of ‘Homicide: Life On The Street’, ‘Brooklyn Nine-Nine’ & Other Series And Films Was 61"
04:14:43<nicolas17>...that URL sounds like he died by homicide
04:14:50missaustraliana quits [Ping timeout: 240 seconds]
04:15:47kiryu quits [Remote host closed the connection]
04:16:39kitonthe1et quits [Ping timeout: 272 seconds]
04:16:56kiryu joins
04:16:56kiryu quits [Changing host]
04:16:56kiryu (kiryu) joins
04:18:20skyrocket quits [Ping timeout: 240 seconds]
04:19:22skyrocket joins
04:22:47<fireonlive>oh it does
04:23:55<@JAA>I feel like they edited the headline after publication due to the same issue, but their system doesn't regenerate the slug in that case.
04:24:15skyrocket quits [Ping timeout: 272 seconds]
04:24:15<@JAA>Yup: https://web.archive.org/web/20231213013032/https://deadline.com/2023/12/andre-braugher-dead-homicide-life-on-the-street-brooklyn-nine-nine-actor-1235665513/
04:24:46<@JAA>Oh wait no, the <title> isn't the article headline...
04:24:52<@JAA>And that's where the slug is derived from.
04:25:08<fireonlive>ahh
04:25:43<fireonlive>at least they have a unique ID in the URL so they can redirect later
04:33:50skyrocket joins
04:45:34<fireonlive>-+rss- Professor in Jordan sues sleuth who exposed citation anomalies: https://retractionwatch.com/2023/11/29/professor-in-jordan-sues-sleuth-who-exposed-citation-anomalies/ https://news.ycombinator.com/item?id=38622057
04:49:50skyrocket quits [Ping timeout: 240 seconds]
04:52:39DogsRNice quits [Read error: Connection reset by peer]
04:53:11skyrocket joins
04:54:18missaustraliana joins
04:57:20skyrocket quits [Ping timeout: 240 seconds]
04:59:30missaustraliana quits [Client Quit]
05:00:19inedia (inedia) joins
05:01:35skyrocket joins
05:02:53nicolas17 quits [Ping timeout: 272 seconds]
05:05:07missaustraliana joins
05:06:59missaustraliana quits [Client Quit]
05:27:49DJ joins
05:28:02<DJ>Ello
05:29:54<DJ>I would like to help archive ponychan, is there anything I can help with or do I just have to download Warrior?
05:30:31<flashfire42>DJ God forbid I ask. Is something happening to ponychan? or would this be proactive archival?
05:30:52<DJ>It's apparently shutting down on Jan 7th
05:31:02<flashfire42>Do you have a source for that at all?
05:31:30<DJ>Yep, here you go https://www.ponychan.net/chat/res/112453.html, it's on Deathwatch as well.
05:31:41<DJ>Sorry just remove the comma
05:32:25<flashfire42>Oh, it is too. Ok, so depending on the rate limit it may just be an archivebot job. More warrior runners are always great, but I am not sure this would be a warrior project. cc arkiver maybe?
05:32:47mcint (mcint) joins
05:33:12<fireonlive>hmm depends how many posts it has i suppose
05:33:16<fireonlive>+ media per post
05:33:24<flashfire42>I dont know a lot about chans or MLP for that matter I tend to avoid both so
05:33:33<fireonlive>though we do have until the 7th
05:33:36<@JAA>How far back do the posts go anyway? Many image boards continuously purge old posts.
05:34:23<flashfire42>I was thinking that too JAA thats the way those boards often operate
05:34:27<fireonlive>ah yes
05:34:42<fireonlive>/pony/ shows 2023-07-24
05:35:05<@JAA>/oat/ goes back to 2021.
05:35:19<fireonlive>/chat/ has one from 2023-08-29
05:35:30<@JAA>/fan/ 2015...
05:35:35<@JAA>So I guess it's not very consistent. lol
05:35:53<fireonlive>hmm.. 11 pages on /oat/
05:36:03<fireonlive>wonder if it's not very active
05:36:47<fireonlive>i think imageboards are usually purged based on new threads instead of a timer
05:36:48<@JAA>Older posts still exist. Random example from /pony/: https://www.ponychan.net/pony/res/36833460.html
05:37:04<fireonlive>oh interesting
05:37:38<fireonlive>hm that one shows up in catalog still
05:37:43<fireonlive>https://www.ponychan.net/pony/catalog.html
05:37:48<fireonlive>so not necessarily pruned yet
05:43:19<DJ>flashfire42 Alright, do I ask something specific or just go for it?
05:43:53<flashfire42>If you look above they are discussing possible ways of doing it
05:44:30<flashfire42>I also just threw it into archivebot just to see
05:44:44<DJ>Ah okay then, thanks.
05:46:12skyrocket quits [Client Quit]
05:47:22BlueMaxima quits [Read error: Connection reset by peer]
05:48:13Island quits [Read error: Connection reset by peer]
05:50:30<@JAA>Yeah, checking some more, those are all 404s. I just happened to check one of the very few that's in the catalog.html. lol
05:50:52<fireonlive>ah good luck haha
05:52:37lexikiq joins
05:52:52<fireonlive>AB job seems to be going well so far
06:08:36missaustraliana joins
06:19:24skyrocket joins
06:24:32skyrocket quits [Client Quit]
06:30:19skyrocket joins
06:34:54missaustraliana quits [Client Quit]
06:37:15ymgve quits [Ping timeout: 272 seconds]
07:06:54Megame quits [Client Quit]
07:09:21missaustraliana joins
07:13:59missaustraliana quits [Ping timeout: 272 seconds]
07:35:53Arcorann (Arcorann) joins
07:38:26lexikiq quits [Client Quit]
08:27:24missaustraliana joins
08:34:18ymgve joins
08:56:20<angenieux>Hello
08:56:24<angenieux>Would it be a good idea to rearrange the order of the command line argument of wget-at so that "--lua-script foo.lua" so its easier to see what project a particular process of wget-at is running with htop?
08:58:24<angenieux>*so that --lua-script part is closer to the front
09:11:21missaustraliana quits [Client Quit]
09:49:47bf_ quits [Ping timeout: 272 seconds]
09:56:07bf_ joins
09:58:29DJ quits [Ping timeout: 265 seconds]
10:00:02Bleo1826 quits [Client Quit]
10:01:18Bleo1826 joins
10:11:51missaustraliana joins
10:29:49missaustraliana quits [Client Quit]
11:33:01kallsyms quits [Ping timeout: 272 seconds]
11:35:44kallsyms joins
11:39:34qwertyasdfuiopghjkl quits [Remote host closed the connection]
11:41:03qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
12:03:26gfhh joins
12:29:49sec^nd quits [Remote host closed the connection]
12:30:18sec^nd (second) joins
12:41:50Arcorann quits [Ping timeout: 240 seconds]
12:43:00kitonthenet joins
12:48:20kitonthenet quits [Ping timeout: 240 seconds]
13:30:02DannyDorito joins
13:30:16DannyDorito leaves
13:45:22klara joins
13:46:52foaf joins
13:47:04<foaf>hello guys
13:48:30<foaf>I'm trying to decompress a megawarc.warc.zst file, but it says that I need the dictionary. I tried the script on the warc page, but it gives me some errors: File "C:\jk.py", line 46, in <module>: d = get_dict(fp) / File "C:\jk.py", line 30, in get_dict: p = subprocess.Popen(['unzstd'], stdin = subprocess.PIPE, stdout =
13:48:30<foaf>subprocess.PIPE, stderr = subprocess.PIPE) / File "C:\Program Files\Python311\Lib\subprocess.py", line 1026, in __init__: self._execute_child(args, executable, preexec_fn, close_fds, / File "C:\Program Files\Python311\Lib\subprocess.py",
13:48:31<foaf>line 1538, in _execute_child: hp, ht, pid, tid = _winapi.CreateProcess(executable, args, / FileNotFoundError: [WinError 2]
13:48:50<foaf>I have tried changing unzstd to zstd, with no results.
13:49:08<foaf>thanks in advance
13:51:19klara quits [Remote host closed the connection]
14:04:42kitonthenet joins
14:13:53kitonthenet quits [Ping timeout: 272 seconds]
14:14:24<TheTechRobo>Is that script compatible with windows?
14:19:36<foaf>I don't know. What I changed is unzstd to zstd.
14:33:16<TheTechRobo>Make sure it's in the same folder as yhe script
14:33:18<TheTechRobo>*the
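For reference on the error above: a FileNotFoundError with [WinError 2] raised from subprocess.Popen generally means the program named in the argument list could not be located, which fits a system where no unzstd (or zstd) executable is reachable. A minimal sketch of a check that could run before the Popen call follows. The helper name find_zstd_command is hypothetical and not part of the script discussed here; it assumes that, where only zstd is installed, `zstd -d` can stand in for unzstd.

```python
import shutil


def find_zstd_command(which=shutil.which):
    """Return an argv list for a zstd decompressor, or None if none is found.

    `which` defaults to shutil.which and can be swapped out for testing.
    """
    # 'unzstd' is normally an alias for 'zstd -d'; some installs only ship 'zstd'.
    candidates = (("unzstd", []), ("zstd", ["-d"]))
    for name, extra_args in candidates:
        path = which(name)
        if path:
            return [path] + extra_args
    return None


if __name__ == "__main__":
    cmd = find_zstd_command()
    if cmd is None:
        print("No zstd executable found; install zstd or add its folder to PATH.")
    else:
        print("Decompressor command:", cmd)
```

If this prints that nothing was found, swapping the name inside the script would not help on its own, since neither executable can be located; the returned list is otherwise meant to be passed to subprocess.Popen in place of ['unzstd'].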
14:38:35mr_sarge (sarge) joins
14:41:02kitonthe1et joins
14:41:14qwertyasdfuiopghjkl quits [Remote host closed the connection]
14:41:16foaf quits [Remote host closed the connection]
14:41:46qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
15:22:40foaf joins
15:22:54foaf quits [Remote host closed the connection]
15:22:55kitonthe1et quits [Ping timeout: 272 seconds]
15:26:25kitonthe2et joins
15:31:47kitonthe2et quits [Ping timeout: 272 seconds]
15:36:35kitonthe2et joins
16:10:13Island joins
16:12:29adia quits [Quit: The Lounge - https://thelounge.chat]
16:14:54adia (adia) joins
16:19:18adia quits [Client Quit]
16:19:32adia (adia) joins
16:19:37adia quits [Client Quit]
16:19:49adia (adia) joins
16:46:05evanim joins
17:06:31c3manu (c3manu) joins
17:25:58BornOn420_ quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
17:35:41BornOn420 (BornOn420) joins
17:45:33toss (toss) joins
17:50:50DannyDorito joins
17:52:09DannyDorito quits [Client Quit]
17:52:55DopefishJustin quits [Remote host closed the connection]
17:53:11DopefishJustin joins
17:57:40magmaus3 quits [Client Quit]
18:06:26Ruthalas59 quits [Quit: Ping timeout (120 seconds)]
18:06:49Ruthalas59 (Ruthalas) joins
18:08:44Doranwen quits [Remote host closed the connection]
18:10:18Doranwen (Doranwen) joins
18:15:42nicolas17 joins
18:25:23imer quits [Killed (NickServ (GHOST command used by imer1))]
18:25:30imer (imer) joins
18:27:39DannyDorito joins
18:27:59DannyDorito leaves
18:30:11DJ joins
19:17:53toss quits [Ping timeout: 272 seconds]
19:18:27Megame (Megame) joins
19:41:57katocala quits [Remote host closed the connection]
19:51:59katocala joins
20:16:40DJ quits [Ping timeout: 265 seconds]
21:04:00BlueMaxima joins
21:08:15Megame quits [Client Quit]
21:17:48c3manu quits [Remote host closed the connection]
21:27:45magmaus3 (magmaus3) joins
22:21:49missaustraliana joins
22:25:47BearFortress__ quits [Client Quit]
22:33:25tbc1887 quits [Client Quit]
22:34:27tbc1887 (tbc1887) joins
22:40:36lennier2_ joins
22:43:43lennier2 quits [Ping timeout: 272 seconds]
23:05:05BearFortress joins
23:23:25Island_ joins
23:25:50Island quits [Ping timeout: 240 seconds]
23:45:22<flashfire42>https://apo.org.au/ were we able to do anything about this?
23:46:37<@JAA>Nope, all attempts got banned very quickly.