00:06:02 | <pabs> | FTR, I threw vger.kernel.org into AB, it is small (no archives) |
00:22:57 | | kitonthe1et quits [Ping timeout: 272 seconds] |
00:42:57 | | Naruyoko joins |
01:05:23 | | mrcave joins |
01:09:12 | <mrcave> | Hi everyone, just wondering if anyone knows whether the uploads in the "telenor home" collection are all different versions of the same set of website crawls, or whether every large upload in the collection on archive.org is different? https://archive.org/details/archiveteam_telenor - https://wiki.archiveteam.org/index.php/Telenor |
01:11:05 | | inedia quits [Ping timeout: 272 seconds] |
01:16:15 | | kitonthenet joins |
01:18:09 | <pokechu22> | mrcave: I'm pretty sure that each item in that collection contains different data, split so that each item is ~20 GB each |
01:18:54 | <pokechu22> | the home.online.no/~joeolavl/ and similar ones are a bit weird but it sounds like those were for individual users that weren't found by the main grab |
01:19:09 | <pokechu22> | So if you wanted to download all of the site, you'd need to download the WARCs from all of the items |
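To make pokechu22's point concrete: getting the whole grab means collecting the WARCs from every item in the collection. Below is a minimal Python sketch that turns the JSON returned by archive.org's public metadata endpoint (`https://archive.org/metadata/<identifier>`) into download URLs. It assumes the WARC files in these items end in `.warc.gz` or `.warc.zst`, which was not confirmed in the channel; `warc_urls` and `fetch_metadata` are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: WARC files in these items end in one of these suffixes.
WARC_SUFFIXES = (".warc.gz", ".warc.zst")


def warc_urls(identifier: str, metadata: dict) -> list:
    """Return download URLs for every WARC file listed in an item's metadata.

    `metadata` is the parsed JSON from https://archive.org/metadata/<identifier>;
    its "files" key is a list of objects that each carry a "name" field.
    """
    urls = []
    for f in metadata.get("files", []):
        name = f.get("name", "")
        if name.endswith(WARC_SUFFIXES):
            # quote() keeps "/" so files in subdirectories still resolve.
            urls.append(f"https://archive.org/download/{identifier}/{quote(name)}")
    return urls


def fetch_metadata(identifier: str) -> dict:
    """Fetch an item's metadata JSON (requires network access)."""
    with urlopen(f"https://archive.org/metadata/{quote(identifier)}") as resp:
        return json.load(resp)
```

The identifiers themselves can be listed with the `internetarchive` CLI, e.g. `ia search 'collection:archiveteam_telenor' --itemlist`, and each resulting URL fetched with any HTTP client.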
01:21:13 | | kitonthenet quits [Ping timeout: 272 seconds] |
01:21:16 | <mrcave> | hey, thanks for the info |
01:26:03 | <mrcave> | I checked the individual home.online.no/~joeolavl uploads. I helped out on the home.no grab but haven't looked at the files since. Now I'm looking for band pages, but I'm getting interested in writing a summary of home.no and its content. Compared to GeoCities, the users were the owners of the Internet subscription, so the pages were often made by adults and |
01:26:04 | <mrcave> | with a close family vibe: personal start pages for the families, etc. I've only found 1 band page |
01:26:19 | | kitonthenet joins |
01:30:50 | | kitonthenet quits [Ping timeout: 240 seconds] |
01:37:31 | | kitonthenet joins |
01:41:50 | | kitonthenet quits [Ping timeout: 240 seconds] |
01:44:39 | | useretail quits [Ping timeout: 272 seconds] |
01:45:22 | | useretail joins |
01:47:56 | | kitonthe2et joins |
01:56:03 | | kitonthe2et quits [Ping timeout: 272 seconds] |
01:56:47 | <@OrIdow6> | Have there been any major changes in the last ~6 months? Anything I can help with in the end-of-year rush? |
01:59:00 | <@arkiver> | OrIdow6: hi :) |
01:59:04 | <@arkiver> | it's not very busy at the moment |
01:59:16 | <@arkiver> | mostly we're working on #frogger (Blogger) still, which is almost finished |
02:00:08 | <@OrIdow6> | That's good |
02:00:20 | | kitonthenet joins |
02:00:21 | <@OrIdow6> | And hi |
02:03:16 | <fireonlive> | hii |
02:05:33 | | kitonthenet quits [Ping timeout: 272 seconds] |
02:30:40 | | kitonthe2et joins |
02:31:30 | | mrcave quits [Remote host closed the connection] |
02:32:31 | | parfait_ quits [Quit: Leaving] |
02:35:20 | | kitonthe2et quits [Ping timeout: 240 seconds] |
02:36:44 | | kitonthe1et joins |
03:04:00 | | missaustraliana joins |
03:08:51 | | missaustraliana quits [Client Quit] |
03:24:35 | | missaustraliana joins |
03:24:47 | | skyrocket joins |
03:33:20 | | skyrocket quits [Ping timeout: 240 seconds] |
03:44:57 | | missaustraliana quits [Client Quit] |
03:55:12 | | AlsoHP_Archivist joins |
03:58:17 | | HP_Archivist quits [Ping timeout: 272 seconds] |
04:04:06 | | skyrocket joins |
04:04:35 | | skyrocket quits [Client Quit] |
04:10:30 | | missaustraliana joins |
04:11:45 | | skyrocket joins |
04:12:41 | <fireonlive> | https://deadline.com/2023/12/andre-braugher-dead-homicide-life-on-the-street-brooklyn-nine-nine-actor-1235665513/ |
04:12:44 | <fireonlive> | "André Braugher Dies: Star Of ‘Homicide: Life On The Street’, ‘Brooklyn Nine-Nine’ & Other Series And Films Was 61" |
04:14:43 | <nicolas17> | ...that URL sounds like he died by homicide |
04:14:50 | | missaustraliana quits [Ping timeout: 240 seconds] |
04:15:47 | | kiryu quits [Remote host closed the connection] |
04:16:39 | | kitonthe1et quits [Ping timeout: 272 seconds] |
04:16:56 | | kiryu joins |
04:16:56 | | kiryu is now authenticated as kiryu |
04:16:56 | | kiryu quits [Changing host] |
04:16:56 | | kiryu (kiryu) joins |
04:18:20 | | skyrocket quits [Ping timeout: 240 seconds] |
04:19:22 | | skyrocket joins |
04:22:47 | <fireonlive> | oh it does |
04:23:55 | <@JAA> | I feel like they edited the headline after publication due to the same issue, but their system doesn't regenerate the slug in that case. |
04:24:15 | | skyrocket quits [Ping timeout: 272 seconds] |
04:24:15 | <@JAA> | Yup: https://web.archive.org/web/20231213013032/https://deadline.com/2023/12/andre-braugher-dead-homicide-life-on-the-street-brooklyn-nine-nine-actor-1235665513/ |
04:24:46 | <@JAA> | Oh wait no, the <title> isn't the article headline... |
04:24:52 | <@JAA> | And that's where the slug is derived from. |
04:25:08 | <fireonlive> | ahh |
04:25:43 | <fireonlive> | at least they have a unique ID in the URL so they can redirect later |
04:33:50 | | skyrocket joins |
04:45:34 | <fireonlive> | -+rss- Professor in Jordan sues sleuth who exposed citation anomalies: https://retractionwatch.com/2023/11/29/professor-in-jordan-sues-sleuth-who-exposed-citation-anomalies/ https://news.ycombinator.com/item?id=38622057 |
04:49:50 | | skyrocket quits [Ping timeout: 240 seconds] |
04:52:39 | | DogsRNice quits [Read error: Connection reset by peer] |
04:53:11 | | skyrocket joins |
04:54:18 | | missaustraliana joins |
04:57:20 | | skyrocket quits [Ping timeout: 240 seconds] |
04:59:30 | | missaustraliana quits [Client Quit] |
05:00:19 | | inedia (inedia) joins |
05:01:35 | | skyrocket joins |
05:02:53 | | nicolas17 quits [Ping timeout: 272 seconds] |
05:05:07 | | missaustraliana joins |
05:06:59 | | missaustraliana quits [Client Quit] |
05:27:49 | | DJ joins |
05:28:02 | <DJ> | Ello |
05:29:54 | <DJ> | I would like to help archive ponychan, is there anything I can help with or do I just have to download Warrior? |
05:30:31 | <flashfire42> | DJ God forbid I ask. Is something happening to ponychan? or would this be proactive archival? |
05:30:52 | <DJ> | It's apparently shutting down on Jan 7th |
05:31:02 | <flashfire42> | Do you have a source for that at all? |
05:31:30 | <DJ> | Yep, here you go https://www.ponychan.net/chat/res/112453.html, it's on Deathwatch as well. |
05:31:41 | <DJ> | Sorry just remove the comma |
05:32:25 | <flashfire42> | Oh it is too. Ok so I mean depending on the rate limit it may just be an archivebot job. More warrior runners is always great but I am not sure this would be a warrior project cc arkiver maybe? |
05:32:47 | | mcint (mcint) joins |
05:33:12 | <fireonlive> | hmm depends how many posts it has i suppose |
05:33:16 | <fireonlive> | + media per post |
05:33:24 | <flashfire42> | I don't know a lot about chans, or MLP for that matter; I tend to avoid both, so |
05:33:33 | <fireonlive> | though we do have until the 7th |
05:33:36 | <@JAA> | How far back do the posts go anyway? Many image boards continuously purge old posts. |
05:34:23 | <flashfire42> | I was thinking that too, JAA; that's the way those boards often operate |
05:34:27 | <fireonlive> | ah yes |
05:34:42 | <fireonlive> | /pony/ shows 2023-07-24 |
05:35:05 | <@JAA> | /oat/ goes back to 2021. |
05:35:19 | <fireonlive> | /chat/ has one from 2023-08-29 |
05:35:30 | <@JAA> | /fan/ 2015... |
05:35:35 | <@JAA> | So I guess it's not very consistent. lol |
05:35:53 | <fireonlive> | hmm.. 11 pages on /oat/ |
05:36:03 | <fireonlive> | wonder if it's not very active |
05:36:47 | <fireonlive> | i think imageboards are usually purged based on new threads instead of a timer |
05:36:48 | <@JAA> | Older posts still exist. Random example from /pony/: https://www.ponychan.net/pony/res/36833460.html |
05:37:04 | <fireonlive> | oh interesting |
05:37:38 | <fireonlive> | hm that one shows up in catalog still |
05:37:43 | <fireonlive> | https://www.ponychan.net/pony/catalog.html |
05:37:48 | <fireonlive> | so not necessarily pruned yet |
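fireonlive's guess, that pruning is driven by new threads rather than a timer, would explain why /fan/ still reaches back to 2015 while /pony/ only goes back to mid-2023: a slow board simply pushes fewer threads off the end. Below is a toy Python model of count-based pruning. It assumes a fixed thread cap and that every reply bumps its thread; the `Board` class is hypothetical, and real imageboard software differs in details such as bump limits.

```python
from collections import OrderedDict
from typing import List


class Board:
    """Toy imageboard that prunes by thread count, not by age."""

    def __init__(self, max_threads: int):
        self.max_threads = max_threads
        # Thread ID -> creation date, ordered least to most recently bumped.
        self.threads: "OrderedDict[int, str]" = OrderedDict()

    def new_thread(self, thread_id: int, created: str) -> List[int]:
        """Add a thread at the top and return the IDs pushed off the end."""
        self.threads[thread_id] = created
        self.threads.move_to_end(thread_id)
        pruned = []
        while len(self.threads) > self.max_threads:
            old_id, _ = self.threads.popitem(last=False)  # least recently bumped
            pruned.append(old_id)
        return pruned

    def bump(self, thread_id: int) -> None:
        """A reply moves the thread back to the top of the board."""
        self.threads.move_to_end(thread_id)

    def oldest_surviving(self) -> str:
        """Creation date (ISO string) of the oldest thread still on the board."""
        return min(self.threads.values())
```

In this model, a thread's age says nothing about whether it survives; only the number of newer or more recently bumped threads does.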
05:43:19 | <DJ> | flashfire42 Alright, do I ask something specific or just go for it? |
05:43:53 | <flashfire42> | If you look above they are discussing possible ways of doing it |
05:44:30 | <flashfire42> | I also just threw it into archivebot just to see |
05:44:44 | <DJ> | Ah okay then, thanks. |
05:46:12 | | skyrocket quits [Client Quit] |
05:47:22 | | BlueMaxima quits [Read error: Connection reset by peer] |
05:48:13 | | Island quits [Read error: Connection reset by peer] |
05:50:30 | <@JAA> | Yeah, checking some more, those are all 404s. I just happened to check one of the very few that's in the catalog.html. lol |
05:50:52 | <fireonlive> | ah good luck haha |
05:52:37 | | lexikiq joins |
05:52:52 | <fireonlive> | AB job seems to be going well so far |
06:08:36 | | missaustraliana joins |
06:19:24 | | skyrocket joins |
06:24:32 | | skyrocket quits [Client Quit] |
06:30:19 | | skyrocket joins |
06:34:54 | | missaustraliana quits [Client Quit] |
06:37:15 | | ymgve quits [Ping timeout: 272 seconds] |
07:06:54 | | Megame quits [Client Quit] |
07:09:21 | | missaustraliana joins |
07:13:59 | | missaustraliana quits [Ping timeout: 272 seconds] |
07:35:53 | | Arcorann (Arcorann) joins |
07:38:26 | | lexikiq quits [Client Quit] |
08:27:24 | | missaustraliana joins |
08:34:18 | | ymgve joins |
08:56:20 | <angenieux> | Hello |
08:56:24 | <angenieux> | Would it be a good idea to rearrange the order of the command-line arguments of wget-at so that "--lua-script foo.lua" so it's easier to see what project a particular process of wget-at is running with htop? |
08:58:24 | <angenieux> | *so that --lua-script part is closer to the front |
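As an alternative to reordering the arguments, the project can be read from each process's full argument list. Below is a minimal, Linux-only Python sketch. It assumes the binary's file name starts with `wget-at` (adjust for your setup) and that `/proc/<pid>/cmdline` is readable; `lua_script_from_argv` and `wget_at_projects` are hypothetical helper names.

```python
import os
from typing import Dict, List, Optional


def lua_script_from_argv(argv: List[str]) -> Optional[str]:
    """Return the value of --lua-script, handling both "--lua-script foo.lua"
    and "--lua-script=foo.lua"."""
    for i, arg in enumerate(argv):
        if arg == "--lua-script" and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith("--lua-script="):
            return arg.split("=", 1)[1]
    return None


def wget_at_projects(proc_root: str = "/proc") -> Dict[int, str]:
    """Map PID -> Lua script for running wget-at processes (Linux only)."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or is not readable
        # /proc/<pid>/cmdline separates arguments with NUL bytes.
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if argv and os.path.basename(argv[0]).startswith("wget-at"):
            script = lua_script_from_argv(argv)
            if script:
                result[int(entry)] = script
    return result
```

Printing `wget_at_projects()` on a warrior host would then list PID-to-script pairs without scrolling through htop.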
09:11:21 | | missaustraliana quits [Client Quit] |
09:49:47 | | bf_ quits [Ping timeout: 272 seconds] |
09:56:07 | | bf_ joins |
09:58:29 | | DJ quits [Ping timeout: 265 seconds] |
10:00:02 | | Bleo1826 quits [Client Quit] |
10:01:18 | | Bleo1826 joins |
10:11:51 | | missaustraliana joins |
10:29:49 | | missaustraliana quits [Client Quit] |
11:33:01 | | kallsyms quits [Ping timeout: 272 seconds] |
11:35:44 | | kallsyms joins |
11:39:34 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
11:41:03 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
12:03:26 | | gfhh joins |
12:29:49 | | sec^nd quits [Remote host closed the connection] |
12:30:18 | | sec^nd (second) joins |
12:41:50 | | Arcorann quits [Ping timeout: 240 seconds] |
12:43:00 | | kitonthenet joins |
12:48:20 | | kitonthenet quits [Ping timeout: 240 seconds] |
13:30:02 | | DannyDorito joins |
13:30:16 | | DannyDorito leaves |
13:45:22 | | klara joins |
13:46:52 | | foaf joins |
13:47:04 | <foaf> | hello guys |
13:48:30 | <foaf> | I'm trying to decompress a megawarc.warc.zst file, but it says that I need the dictionary. I tried the script on the WARC page, but it gives me some errors: File "C:\jk.py", line 46, in <module> d = get_dict(fp) File "C:\jk.py", line 30, in get_dict p = subprocess.Popen(['unzstd'], stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.PIPE) |
13:48:30 | <foaf> | File "C:\Program Files\Python311\Lib\subprocess.py", line 1026, in __init__ self._execute_child(args, executable, preexec_fn, close_fds, File "C:\Program Files\Python311\Lib\subprocess.py", line 1538, in _execute_child hp, ht, pid, tid = _winapi.CreateProcess(executable, args, |
13:48:31 | <foaf> | FileNotFoundError: [WinError 2] |
13:48:50 | <foaf> | I have tried changing unzstd to zstd, with no results |
13:49:08 | <foaf> | thanks in advance |
13:51:19 | | klara quits [Remote host closed the connection] |
14:04:42 | | kitonthenet joins |
14:13:53 | | kitonthenet quits [Ping timeout: 272 seconds] |
14:14:24 | <TheTechRobo> | Is that script compatible with windows? |
14:19:36 | <foaf> | I don't know; what I changed is unzstd to zstd |
14:33:16 | <TheTechRobo> | Make sure it's in the same folder as yhe script |
14:33:18 | <TheTechRobo> | *the |
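Two things seem to be going on in foaf's traceback. `FileNotFoundError: [WinError 2]` means Windows could not find an `unzstd` executable at all; as far as we know, the Windows release of zstd ships only `zstd.exe`, so that has to be on PATH (or in the directory the script is run from). And swapping `'unzstd'` for plain `'zstd'` would not help even once the executable is found, because `zstd` with no flags compresses; the decompressing equivalent is `zstd -d`. The Python sketch below reads the dictionary frame that, as we understand the ArchiveTeam `.warc.zst` description, sits at the start of the file (skippable-frame magic `0x184D2A5D`; treat this as an assumption to verify against the wiki) and builds the `zstd` command line. All function names are hypothetical.

```python
import shutil
import struct
import subprocess
from typing import List, Optional, Tuple

# Assumption (verify against the wiki): the file starts with a zstd skippable
# frame with this magic, then a 4-byte little-endian payload length, then the
# dictionary, which may itself be zstd-compressed.
DICT_FRAME_MAGIC = 0x184D2A5D
ZSTD_FRAME_MAGIC = b"\x28\xb5\x2f\xfd"  # regular zstd frame magic, as stored on disk


def read_dict_frame(header: bytes) -> Tuple[bytes, bool]:
    """Return (dictionary_bytes, is_compressed) from the start of a .warc.zst."""
    if len(header) < 8:
        raise ValueError("file too short")
    magic, length = struct.unpack("<II", header[:8])
    if magic != DICT_FRAME_MAGIC:
        raise ValueError("no dictionary frame found")
    payload = header[8:8 + length]
    if len(payload) != length:
        raise ValueError("truncated dictionary frame")
    return payload, payload.startswith(ZSTD_FRAME_MAGIC)


def find_zstd() -> Optional[str]:
    """Locate the zstd executable (shutil.which honours PATHEXT on Windows)."""
    return shutil.which("zstd")


def decompress_dict(zstd_exe: str, payload: bytes) -> bytes:
    """Decompress a compressed dictionary via `zstd -d -c` on stdin/stdout."""
    proc = subprocess.run([zstd_exe, "-d", "-c"], input=payload,
                          capture_output=True, check=True)
    return proc.stdout


def decompress_command(zstd_exe: str, dict_path: str, warc_zst: str,
                       out_path: str) -> List[str]:
    """Build a zstd CLI command that decompresses using an external dictionary."""
    return [zstd_exe, "-d", "-D", dict_path, warc_zst, "-o", out_path]
```

In use, one would read the first few megabytes of the file, call `read_dict_frame`, pass the payload through `decompress_dict` if it is compressed, write the result to a file, and then run the command from `decompress_command` with `subprocess.run(..., check=True)`.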
14:38:35 | | mr_sarge (sarge) joins |
14:41:02 | | kitonthe1et joins |
14:41:14 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
14:41:16 | | foaf quits [Remote host closed the connection] |
14:41:46 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
15:22:40 | | foaf joins |
15:22:54 | | foaf quits [Remote host closed the connection] |
15:22:55 | | kitonthe1et quits [Ping timeout: 272 seconds] |
15:26:25 | | kitonthe2et joins |
15:31:47 | | kitonthe2et quits [Ping timeout: 272 seconds] |
15:36:35 | | kitonthe2et joins |
16:10:13 | | Island joins |
16:12:29 | | adia quits [Quit: The Lounge - https://thelounge.chat] |
16:14:54 | | adia (adia) joins |
16:19:18 | | adia quits [Client Quit] |
16:19:32 | | adia (adia) joins |
16:19:37 | | adia quits [Client Quit] |
16:19:49 | | adia (adia) joins |
16:46:05 | | evanim joins |
17:06:31 | | c3manu (c3manu) joins |
17:25:58 | | BornOn420_ quits [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.] |
17:35:41 | | BornOn420 (BornOn420) joins |
17:37:25 | | evanim is now authenticated as evanim |
17:45:33 | | toss (toss) joins |
17:50:50 | | DannyDorito joins |
17:52:09 | | DannyDorito quits [Client Quit] |
17:52:55 | | DopefishJustin quits [Remote host closed the connection] |
17:53:11 | | DopefishJustin joins |
17:53:11 | | DopefishJustin is now authenticated as DopefishJustin |
17:57:40 | | magmaus3 quits [Client Quit] |
18:06:26 | | Ruthalas59 quits [Quit: Ping timeout (120 seconds)] |
18:06:49 | | Ruthalas59 (Ruthalas) joins |
18:08:44 | | Doranwen quits [Remote host closed the connection] |
18:10:18 | | Doranwen (Doranwen) joins |
18:15:42 | | nicolas17 joins |
18:25:23 | | imer quits [Killed (NickServ (GHOST command used by imer1))] |
18:25:30 | | imer (imer) joins |
18:27:39 | | DannyDorito joins |
18:27:59 | | DannyDorito leaves |
18:30:11 | | DJ joins |
19:17:53 | | toss quits [Ping timeout: 272 seconds] |
19:18:27 | | Megame (Megame) joins |
19:41:57 | | katocala quits [Remote host closed the connection] |
19:51:59 | | katocala joins |
19:51:59 | | katocala is now authenticated as katocala |
20:16:40 | | DJ quits [Ping timeout: 265 seconds] |
21:04:00 | | BlueMaxima joins |
21:08:15 | | Megame quits [Client Quit] |
21:17:48 | | c3manu quits [Remote host closed the connection] |
21:27:45 | | magmaus3 (magmaus3) joins |
22:21:49 | | missaustraliana joins |
22:25:47 | | BearFortress__ quits [Client Quit] |
22:33:25 | | tbc1887 quits [Client Quit] |
22:34:27 | | tbc1887 (tbc1887) joins |
22:40:36 | | lennier2_ joins |
22:43:43 | | lennier2 quits [Ping timeout: 272 seconds] |
23:05:05 | | BearFortress joins |
23:23:25 | | Island_ joins |
23:25:50 | | Island quits [Ping timeout: 240 seconds] |
23:45:22 | <flashfire42> | https://apo.org.au/ were we able to do anything about this? |
23:46:37 | <@JAA> | Nope, all attempts got banned very quickly. |