00:08:11<flashfire42>the fuck
00:16:43<Pedrosso>Made a table of Steam Workshops (the table starting out collapsed for obvious reasons)
00:20:15<fireonlive>nice
00:20:20<Pedrosso>ye
00:42:11jasons quits [Ping timeout: 272 seconds]
00:48:31Billy549_ quits [Ping timeout: 272 seconds]
00:49:00Billy549 (Billy549) joins
00:50:15<nicolas17>>283 kilobytes
01:22:03Mateon2 joins
01:23:59Mateon1 quits [Ping timeout: 272 seconds]
01:23:59Mateon2 is now known as Mateon1
01:30:34Mateon2 joins
01:32:51Mateon1 quits [Ping timeout: 272 seconds]
01:32:51Mateon2 is now known as Mateon1
01:45:11jasons (jasons) joins
02:00:36xarph joins
02:00:45DJ joins
02:02:58<DJ>https://anon.cafe/ is shutting down on March 15
02:03:19<DJ>https://anon.cafe/meta/res/16466.html announcement
02:07:32<nicolas17>what is it?
02:13:23Dominika quits [Ping timeout: 272 seconds]
02:15:14<DJ>It's an imageboard, part of a webring. Shutting down because of operating costs https://anon.cafe/meta/res/16467.html#16486
02:16:33<DJ>Oh sorry, that's not the board owner; it's just speculation, they don't know.
02:40:04DJ quits [Ping timeout: 265 seconds]
02:41:03<pabs>pokechu22: a jira https://jira.ecmwf.int
02:41:33<h2ibot>Pokechu22 edited Jira (+23, /* Not yet archived */ https://jira.ecmwf.int): https://wiki.archiveteam.org/?diff=51661&oldid=51655
02:41:33<pokechu22>thanks
02:41:40<pokechu22>I'm going to try to get something started on those soon
02:41:50jasons quits [Ping timeout: 240 seconds]
02:42:38<pokechu22>I'm pretty sure the database doesn't actually need to be saved to get attachments, as the same URL extraction issue that causes a bunch of junk relative URLs for attachments means that all attachments get logged... so that simplifies things a bit
02:52:00qwertyasdfuiopghjkl quits [Remote host closed the connection]
02:58:14qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
03:45:16jasons (jasons) joins
03:48:16thalia_ joins
03:52:54icedice quits [Client Quit]
03:55:40razul6 joins
03:57:50razul quits [Ping timeout: 240 seconds]
03:57:50razul6 is now known as razul
04:10:03wyatt8750 joins
04:10:33wyatt8740 quits [Ping timeout: 272 seconds]
04:11:27rohvani joins
04:23:52<h2ibot>JustAnotherArchivist edited Current Projects (+1, Fix date): https://wiki.archiveteam.org/?diff=51662&oldid=51658
04:30:54<h2ibot>FireonLive edited Current Projects (+16, move Blogger to long-term to reflect new…): https://wiki.archiveteam.org/?diff=51663&oldid=51662
04:35:37missaustraliana joins
04:36:32missaustraliana quits [Client Quit]
04:38:25wyatt8750 quits [Ping timeout: 272 seconds]
04:38:40wyatt8740 joins
04:41:50Craigle quits [Quit: The Lounge - https://thelounge.chat]
04:41:50jasons quits [Ping timeout: 240 seconds]
04:42:20Craigle (Craigle) joins
05:09:48<fireonlive>Weaveworks is shutting down - https://www.linkedin.com/posts/richardsonalexis_hi-everyone-i-am-very-sad-to-announce-activity-7160295096825860096-ZS67 https://news.ycombinator.com/item?id=39262650
05:09:55imer quits [Killed (NickServ (GHOST command used by imer6))]
05:10:02imer (imer) joins
05:35:04<h2ibot>Pokechu22 edited Jira (+50, /* Not yet archived */…): https://wiki.archiveteam.org/?diff=51664&oldid=51661
05:45:14jasons (jasons) joins
06:06:10<h2ibot>Pokechu22 edited Jira (+244, /* Strategy */ database isn't needed; link script): https://wiki.archiveteam.org/?diff=51665&oldid=51664
06:07:10<h2ibot>Pokechu22 edited Jira (+26, /* Not yet archived */ https://bugs.openjdk.org/): https://wiki.archiveteam.org/?diff=51666&oldid=51665
06:09:37igloo22225 quits [Ping timeout: 272 seconds]
06:15:20igloo22225 (igloo22225) joins
06:21:53Island quits [Read error: Connection reset by peer]
06:23:54BearFortress quits [Read error: Connection reset by peer]
06:25:09Arcorann (Arcorann) joins
06:26:34BearFortress joins
06:44:27jasons quits [Ping timeout: 272 seconds]
07:17:49Naruyoko quits [Read error: Connection reset by peer]
07:19:26Naruyoko joins
07:47:21jasons (jasons) joins
08:27:11Wohlstand (Wohlstand) joins
08:46:20jasons quits [Ping timeout: 240 seconds]
08:54:17magmaus3 quits [Ping timeout: 272 seconds]
09:17:36Chris5010 (Chris5010) joins
09:49:52jasons (jasons) joins
09:57:00<h2ibot>Exorcism edited Vbox7 (+64): https://wiki.archiveteam.org/?diff=51667&oldid=51648
10:16:28Mist8kenGAS (Mist8kenGAS) joins
10:37:50parfait quits [Client Quit]
10:44:35magmaus3 (magmaus3) joins
10:48:17jasons quits [Ping timeout: 272 seconds]
11:22:57Darken quits [Read error: Connection reset by peer]
11:27:42Darken (Darken) joins
11:43:27Darken quits [Read error: Connection reset by peer]
11:51:41jasons (jasons) joins
12:01:48kiryu__ quits [Remote host closed the connection]
12:03:01systwi quits [Ping timeout: 272 seconds]
12:16:12Darken (Darken) joins
12:16:28Darken quits [Remote host closed the connection]
12:16:43Darken (Darken) joins
12:17:18kiryu (kiryu) joins
12:17:34systwi (systwi) joins
12:22:53Darken2 (Darken) joins
12:27:05Darken quits [Ping timeout: 272 seconds]
12:31:08Darken2 quits [Read error: Connection reset by peer]
12:31:43Darken (Darken) joins
12:35:50kiryu quits [Ping timeout: 240 seconds]
12:44:11VerifiedJ quits [Quit: The Lounge - https://thelounge.chat]
12:44:49kiryu (kiryu) joins
12:49:20kiryu quits [Ping timeout: 240 seconds]
12:49:50jasons quits [Ping timeout: 240 seconds]
12:53:03Arcorann quits [Ping timeout: 272 seconds]
13:04:27kiryu joins
13:04:27kiryu quits [Changing host]
13:04:27kiryu (kiryu) joins
13:16:47Darken2 (Darken) joins
13:20:20Darken quits [Ping timeout: 240 seconds]
13:53:11jasons (jasons) joins
14:02:36h3ndr1k_ (h3ndr1k) joins
14:04:21h3ndr1k__ (h3ndr1k) joins
14:04:57h3ndr1k quits [Ping timeout: 265 seconds]
14:06:27h3ndr1k (h3ndr1k) joins
14:06:50h3ndr1k_ quits [Ping timeout: 240 seconds]
14:08:50h3ndr1k__ quits [Ping timeout: 240 seconds]
14:11:14h3ndr1k quits [Ping timeout: 265 seconds]
14:13:51h3ndr1k (h3ndr1k) joins
14:15:21eightthree quits [Remote host closed the connection]
14:15:22Darken2 quits [Read error: Connection reset by peer]
14:15:43Darken2 (Darken) joins
14:18:18eightthree joins
14:24:17h3ndr1k quits [Ping timeout: 265 seconds]
14:26:02eightthree quits [Remote host closed the connection]
14:27:16icedice (icedice) joins
14:28:18Darken2 quits [Client Quit]
14:28:35Darken (Darken) joins
14:31:23eightthree joins
14:35:42eightthree quits [Remote host closed the connection]
14:48:27eightthree joins
14:56:11h3ndr1k (h3ndr1k) joins
15:32:12<h2ibot>Switchnode edited Deathwatch (+390, /* 2024 */ add world of tanks forums): https://wiki.archiveteam.org/?diff=51668&oldid=51649
15:55:48Darken2 (Darken) joins
15:59:50Darken quits [Ping timeout: 240 seconds]
16:02:07Megame (Megame) joins
16:18:13fishingforsoup_ joins
16:22:03fishingforsoup quits [Ping timeout: 272 seconds]
16:49:50jasons quits [Ping timeout: 240 seconds]
17:16:33Hackerpcs quits [Client Quit]
17:18:04BPCZ quits [Remote host closed the connection]
17:18:38Hackerpcs (Hackerpcs) joins
17:20:23BPCZ (BPCZ) joins
17:53:47jasons (jasons) joins
18:00:35sec^nd quits [Remote host closed the connection]
18:00:54sec^nd (second) joins
18:06:36threedeeitguy39 quits [Quit: The Lounge - https://thelounge.chat]
18:15:48Darken2 quits [Read error: Connection reset by peer]
18:16:09Darken2 (Darken) joins
18:27:11threedeeitguy39 (threedeeitguy) joins
18:31:46<h2ibot>Entartet edited Deathwatch (+231, Added thebillionscompanion.net.): https://wiki.archiveteam.org/?diff=51669&oldid=51668
18:49:37jasons quits [Ping timeout: 272 seconds]
18:57:58jacksonchen666 (jacksonchen666) joins
19:23:31jacksonchen666 quits [Remote host closed the connection]
19:23:55jacksonchen666 (jacksonchen666) joins
19:24:31jacksonchen666 quits [Remote host closed the connection]
19:25:02jacksonchen666 (jacksonchen666) joins
19:30:57<h2ibot>Pokechu22 edited Games/Engines, Platforms and Hostings (+12, /* PC and Web */ [[Steam]]): https://wiki.archiveteam.org/?diff=51670&oldid=50184
19:36:30Wohlstand quits [Remote host closed the connection]
19:52:46jasons (jasons) joins
19:53:24bf_ joins
19:54:47bf_ quits [Remote host closed the connection]
19:55:01bf_ joins
19:58:26Megame quits [Client Quit]
20:00:58Darken2 quits [Client Quit]
20:01:14Darken (Darken) joins
20:01:59qwertyasdfuiopghjkl quits [Remote host closed the connection]
20:03:00bf_ quits [Remote host closed the connection]
20:03:29bf_ joins
20:05:14bf_ quits [Remote host closed the connection]
20:12:58Island joins
20:22:15bf_ joins
20:23:56<pokechu22>Hmm, `(echo a; echo b; echo c) | zstdgrep -e 'a' -e 'b'` gives no output for me but `zstdgrep -e 'a'` does as does `zgrep -e 'a' -e 'b'` or `grep -e 'a' -e 'b'` - this also happened when I used zstdgrep on a .gz file. Is this a bug or have I misunderstood something about zstdgrep?
20:25:00<@JAA>This is a bug.
20:25:19<@JAA>https://github.com/facebook/zstd/issues/2064
20:25:41<@JAA>zstdless has similar issues with option parsing: https://github.com/facebook/zstd/issues/2880
20:26:07<pokechu22>Oof
20:26:53<@JAA>Er, zstdless had*, although I haven't verified whether everything behaves correctly now.
20:27:01<pokechu22>I didn't even intend to type zstdgrep the first time, glad I noticed the missing output (I was verifying that extracting JIRA attachments from junk that gets logged in the meta-warc would work by comparing it with one where we extracted it from the DB)
20:27:31<@JAA>Yeah, zstdgrep is fine for very simple cases, but if in doubt, it's better to use `zstdcat | grep ...` instead.
20:27:51SootBector quits [Remote host closed the connection]
20:28:30SootBector (SootBector) joins
20:33:51bf_ quits [Remote host closed the connection]
20:35:19jacksonchen666 quits [Client Quit]
20:44:02<pokechu22>... ok, new problem, and this seems like it's not a grep one: from view-source:https://web.archive.org/web/20230929192111id_/https://bugs.mojang.com/browse/MC-180529 archivebot saw data-downloadurl="application/zip:Normal_Font_TT_v3.zip:https://bugs.mojang.com/secure/attachment/286387/Normal_Font_TT_v3.zip" and extracted
20:44:04<pokechu22>https://bugs.mojang.com/browse/application/zip:Normal_Font_TT_v3.zip:https:/bugs.mojang.com/secure/attachment/286387/Normal_Font_TT_v3.zip but it *didn't* do anything with data-downloadurl="text/plain:hs_err_pid9900.log:https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log"
20:44:32<pokechu22>both https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log and https://bugs.mojang.com/secure/attachment/286387/Normal_Font_TT_v3.zip ended up in the database though
20:45:49<pokechu22>It doesn't seem to have extracted anything along the lines of browse/text.*\.log:
20:46:08<pokechu22>but it did accept https://bugs.mojang.com/browse/text/plain:crash.log.txt:https:/bugs.mojang.com/secure/attachment/71965/crash.log.txt
20:47:03<pokechu22>hmm, it also didn't extract any .nbt or .dat files - does archivebot have a list of extensions it'll assume might be files when doing extraction from data attributes?
20:49:24<@JAA>This would be on wpull, not AB.
20:51:01<@JAA>https://github.com/ArchiveTeam/wpull/blob/cfa5bcc571e7ff2d5175d8299e90651955c72df5/wpull/scraper/html.py#L618-L621
20:51:51jasons quits [Ping timeout: 272 seconds]
20:51:57<@JAA>And https://github.com/ArchiveTeam/wpull/blob/cfa5bcc571e7ff2d5175d8299e90651955c72df5/wpull/scraper/util.py#L136-L217
20:52:49<@JAA>That should pass `is_likely_link`.
20:53:34<@JAA>Oh hmm, unless it's the `mimetype.guess` check.
20:53:43<@JAA>`mimetypes.guess_type` *
20:54:59<@JAA>Yeah, it fails the `is_likely_link` check.
20:56:09<pokechu22>alright, I guess we do need the database after all :|
20:56:38<@JAA>Yep, `mimetypes.guess_type` doesn't know about `.log`.
20:56:56<@JAA>It wouldn't be in the DB either.
20:57:23<@JAA>`mimetypes.guess_type('text/plain:hs_err_pid9900.log:https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log', strict=False)` → `(None, None)`
20:57:36<pokechu22>That wouldn't, but the correct URL (https://bugs.mojang.com/secure/attachment/286387/Normal_Font_TT_v3.zip or https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log) will be; they're just not saved due to the no-parent rule
20:57:46<@JAA>Ah
20:59:03<pokechu22>this also means I need to find the database for hub.spigotmc.org which we ran a while back and saved the DB for, but I don't think I ever extracted outlinks from
20:59:49<pokechu22>I'll start !a < list jobs for several of the JIRA instances since we are running low on time, and then ping you for the DBs to be saved
21:00:17<thuban>fwiw on 3.11 `mimetypes.guess_type('text/plain:hs_err_pid9900.log:https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log', strict=False)` → `('text/plain', None)`
21:01:19<pokechu22>It probably still won't like .dat or .nbt though
21:01:48<thuban>indeed not
21:02:05<@JAA>thuban: I'm still getting `(None, None)` on 3.11.
21:02:51<@JAA>I think the `mimetypes` module does some discovery stuff in /usr/share or something like that.
21:02:57<@JAA>So it can differ from system to system.
21:03:41<thuban>ah, so it does
21:04:18<@JAA>https://github.com/python/cpython/blob/831b95d9b970901a39c64b5f261f379a490c64fb/Lib/mimetypes.py#L48-L58
21:04:31<@JAA>Not /usr/share but same concept. :-)
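A minimal Python sketch of the system dependence discussed above, assuming a recent Python 3 in which a fresh `mimetypes.MimeTypes()` instance starts from the built-in tables only. `VALUE` is the attribute value quoted earlier in the log; `present` and `builtin_only` are illustrative names. The printed results vary by machine, so none are shown here.

```python
import mimetypes
import os

# The data-downloadurl value from the log above.
VALUE = (
    "text/plain:hs_err_pid9900.log:"
    "https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log"
)

# Candidate mime.types paths the module-level functions consult on first use;
# which of them actually exist differs from system to system.
present = [path for path in mimetypes.knownfiles if os.path.isfile(path)]
print("mime.types files found:", present)

# Module-level lookup: the answer depends on the files listed above.
print("module-level:", mimetypes.guess_type(VALUE, strict=False))

# A private instance that has read no extra files (assumption: recent Python 3),
# so it reflects only the tables shipped with the interpreter.
builtin_only = mimetypes.MimeTypes()
print("built-in only:", builtin_only.guess_type(VALUE, strict=False))
```

If the two printed results differ, the difference comes from one of the files in `present`.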
21:04:42<thuban>you beat me to it, new github is awful v_v
21:05:27<@JAA>It sure is, I do more and more stuff locally with a clone instead.
21:05:42<@JAA>Especially since code search is loginwalled anyway.
21:08:53<thuban>anyway, perhaps the ab pipelines should be fitted with local mimetype files?
21:10:59<@JAA>Perhaps wpull should ship its own list and init the `mimetypes` module with that.
21:12:23<thuban>ah! i didn't see that option. yes, that would simplify things
21:13:16<@JAA>Apache's list doesn't even have .gz and .zst...
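One way the bundled-list idea could look, sketched with a private `mimetypes.MimeTypes` instance rather than the module-level `init` that JAA mentions; this is not taken from wpull. `BUNDLED` and `load_bundled_types` are hypothetical names, and the type assignments for `.dat` and `.nbt` are assumptions chosen only so that the lookup succeeds.

```python
import io
import mimetypes

# Hypothetical bundled list in mime.types format: "type ext1 ext2 ...".
BUNDLED = """\
# extensions seen on JIRA attachments in the log above
text/plain log
application/octet-stream dat nbt
"""

VALUE = (
    "text/plain:hs_err_pid9900.log:"
    "https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log"
)


def load_bundled_types(text):
    """Return a MimeTypes instance extended with the entries in `text`."""
    db = mimetypes.MimeTypes()
    # readfp parses mime.types-style lines and registers each extension.
    db.readfp(io.StringIO(text))
    return db


db = load_bundled_types(BUNDLED)
for name in (VALUE, "level.dat", "chunk.nbt"):
    print(name.rsplit("/", 1)[-1], "->", db.guess_type(name, strict=False))
```

Because the instance is private, the result no longer depends on which `mime.types` files the host happens to have installed.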
21:26:33Darken2 (Darken) joins
21:26:50eightthree quits [Remote host closed the connection]
21:27:17DLoader_ (DLoader) joins
21:28:50Darken quits [Ping timeout: 240 seconds]
21:29:41<@JAA>Looks like they're open to changes: https://github.com/apache/httpd/pull/372
21:30:29DLoader quits [Ping timeout: 272 seconds]
21:30:37DLoader_ is now known as DLoader
21:32:56Wohlstand (Wohlstand) joins
21:45:48qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
21:54:48jasons (jasons) joins
22:01:23qwertyasdfuiopghjkl quits [Remote host closed the connection]
22:03:37qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins
22:12:32<h2ibot>Pokechu22 edited Jira (+163, the database is still needed): https://wiki.archiveteam.org/?diff=51671&oldid=51666
22:21:50lennier1 quits [Ping timeout: 240 seconds]
22:22:23lennier1 (lennier1) joins
22:53:27jasons quits [Ping timeout: 272 seconds]
22:55:57parfait (kdqep) joins
22:59:32Wohlstand quits [Client Quit]
23:01:25eightthree joins
23:04:08lunik173 quits [Quit: Ping timeout (120 seconds)]
23:04:22lunik173 joins
23:08:39Darken2 quits [Ping timeout: 272 seconds]
23:13:27Darken (Darken) joins
23:15:44Darken quits [Read error: Connection reset by peer]
23:22:09BlueMaxima joins
23:38:48qwertyasdfuiopghjkl quits [Remote host closed the connection]
23:55:50<h2ibot>Pokechu22 edited Jira (+0, update script): https://wiki.archiveteam.org/?diff=51672&oldid=51671
23:56:16jasons (jasons) joins