00:08:11 | <flashfire42> | the fuck |
00:16:43 | <Pedrosso> | Made a table of Steam Workshops (the table starting out collapsed for obvious reasons) |
00:20:15 | <fireonlive> | nice |
00:20:20 | <Pedrosso> | ye |
00:42:11 | | jasons quits [Ping timeout: 272 seconds] |
00:48:31 | | Billy549_ quits [Ping timeout: 272 seconds] |
00:49:00 | | Billy549 (Billy549) joins |
00:50:15 | <nicolas17> | >283 kilobytes |
01:22:03 | | Mateon2 joins |
01:23:59 | | Mateon1 quits [Ping timeout: 272 seconds] |
01:23:59 | | Mateon2 is now known as Mateon1 |
01:30:34 | | Mateon2 joins |
01:32:51 | | Mateon1 quits [Ping timeout: 272 seconds] |
01:32:51 | | Mateon2 is now known as Mateon1 |
01:45:11 | | jasons (jasons) joins |
02:00:36 | | xarph joins |
02:00:45 | | DJ joins |
02:02:58 | <DJ> | https://anon.cafe/ is shutting down on March 15 |
02:03:19 | <DJ> | https://anon.cafe/meta/res/16466.html announcement |
02:07:32 | <nicolas17> | what is it? |
02:13:23 | | Dominika quits [Ping timeout: 272 seconds] |
02:15:14 | <DJ> | It's an imageboard, part of a webring. Shutting down because of operating costs https://anon.cafe/meta/res/16467.html#16486 |
02:16:33 | <DJ> | Oh sorry, that's not the board owner; it's just speculation, they don't know. |
02:40:04 | | DJ quits [Ping timeout: 265 seconds] |
02:41:03 | <pabs> | pokechu22: a jira https://jira.ecmwf.int |
02:41:33 | <h2ibot> | Pokechu22 edited Jira (+23, /* Not yet archived */ https://jira.ecmwf.int): https://wiki.archiveteam.org/?diff=51661&oldid=51655 |
02:41:33 | <pokechu22> | thanks |
02:41:40 | <pokechu22> | I'm going to try to get something started on those soon |
02:41:50 | | jasons quits [Ping timeout: 240 seconds] |
02:42:38 | <pokechu22> | I'm pretty sure the database doesn't actually need to be saved to get attachments, as the same URL extraction issue that causes a bunch of junk relative URLs for attachments means that all attachments get logged... so that simplifies things a bit |
02:52:00 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
02:58:14 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
03:45:16 | | jasons (jasons) joins |
03:48:16 | | thalia_ joins |
03:52:54 | | icedice quits [Client Quit] |
03:55:40 | | razul6 joins |
03:57:50 | | razul quits [Ping timeout: 240 seconds] |
03:57:50 | | razul6 is now known as razul |
04:10:03 | | wyatt8750 joins |
04:10:33 | | wyatt8740 quits [Ping timeout: 272 seconds] |
04:11:27 | | rohvani joins |
04:23:52 | <h2ibot> | JustAnotherArchivist edited Current Projects (+1, Fix date): https://wiki.archiveteam.org/?diff=51662&oldid=51658 |
04:30:54 | <h2ibot> | FireonLive edited Current Projects (+16, move Blogger to long-term to reflect new…): https://wiki.archiveteam.org/?diff=51663&oldid=51662 |
04:35:37 | | missaustraliana joins |
04:36:32 | | missaustraliana quits [Client Quit] |
04:38:25 | | wyatt8750 quits [Ping timeout: 272 seconds] |
04:38:40 | | wyatt8740 joins |
04:41:50 | | Craigle quits [Quit: The Lounge - https://thelounge.chat] |
04:41:50 | | jasons quits [Ping timeout: 240 seconds] |
04:42:20 | | Craigle (Craigle) joins |
05:09:48 | <fireonlive> | Weaveworks is shutting down - https://www.linkedin.com/posts/richardsonalexis_hi-everyone-i-am-very-sad-to-announce-activity-7160295096825860096-ZS67 https://news.ycombinator.com/item?id=39262650 |
05:09:55 | | imer quits [Killed (NickServ (GHOST command used by imer6))] |
05:10:02 | | imer (imer) joins |
05:35:04 | <h2ibot> | Pokechu22 edited Jira (+50, /* Not yet archived */…): https://wiki.archiveteam.org/?diff=51664&oldid=51661 |
05:45:14 | | jasons (jasons) joins |
06:06:10 | <h2ibot> | Pokechu22 edited Jira (+244, /* Strategy */ database isn't needed; link script): https://wiki.archiveteam.org/?diff=51665&oldid=51664 |
06:07:10 | <h2ibot> | Pokechu22 edited Jira (+26, /* Not yet archived */ https://bugs.openjdk.org/): https://wiki.archiveteam.org/?diff=51666&oldid=51665 |
06:09:37 | | igloo22225 quits [Ping timeout: 272 seconds] |
06:15:20 | | igloo22225 (igloo22225) joins |
06:21:53 | | Island quits [Read error: Connection reset by peer] |
06:23:54 | | BearFortress quits [Read error: Connection reset by peer] |
06:25:09 | | Arcorann (Arcorann) joins |
06:26:34 | | BearFortress joins |
06:44:27 | | jasons quits [Ping timeout: 272 seconds] |
07:17:49 | | Naruyoko quits [Read error: Connection reset by peer] |
07:19:26 | | Naruyoko joins |
07:47:21 | | jasons (jasons) joins |
08:27:11 | | Wohlstand (Wohlstand) joins |
08:46:20 | | jasons quits [Ping timeout: 240 seconds] |
08:54:17 | | magmaus3 quits [Ping timeout: 272 seconds] |
09:17:36 | | Chris5010 (Chris5010) joins |
09:49:52 | | jasons (jasons) joins |
09:57:00 | <h2ibot> | Exorcism edited Vbox7 (+64): https://wiki.archiveteam.org/?diff=51667&oldid=51648 |
10:16:28 | | Mist8kenGAS (Mist8kenGAS) joins |
10:37:50 | | parfait quits [Client Quit] |
10:44:35 | | magmaus3 (magmaus3) joins |
10:48:17 | | jasons quits [Ping timeout: 272 seconds] |
11:22:57 | | Darken quits [Read error: Connection reset by peer] |
11:27:42 | | Darken (Darken) joins |
11:43:27 | | Darken quits [Read error: Connection reset by peer] |
11:51:41 | | jasons (jasons) joins |
12:01:48 | | kiryu__ quits [Remote host closed the connection] |
12:03:01 | | systwi quits [Ping timeout: 272 seconds] |
12:16:12 | | Darken (Darken) joins |
12:16:28 | | Darken quits [Remote host closed the connection] |
12:16:43 | | Darken (Darken) joins |
12:17:18 | | kiryu (kiryu) joins |
12:17:34 | | systwi (systwi) joins |
12:22:53 | | Darken2 (Darken) joins |
12:27:05 | | Darken quits [Ping timeout: 272 seconds] |
12:31:08 | | Darken2 quits [Read error: Connection reset by peer] |
12:31:43 | | Darken (Darken) joins |
12:35:50 | | kiryu quits [Ping timeout: 240 seconds] |
12:44:11 | | VerifiedJ quits [Quit: The Lounge - https://thelounge.chat] |
12:44:49 | | kiryu (kiryu) joins |
12:49:20 | | kiryu quits [Ping timeout: 240 seconds] |
12:49:50 | | jasons quits [Ping timeout: 240 seconds] |
12:53:03 | | Arcorann quits [Ping timeout: 272 seconds] |
13:04:27 | | kiryu joins |
13:04:27 | | kiryu is now authenticated as kiryu |
13:04:27 | | kiryu quits [Changing host] |
13:04:27 | | kiryu (kiryu) joins |
13:16:47 | | Darken2 (Darken) joins |
13:20:20 | | Darken quits [Ping timeout: 240 seconds] |
13:53:11 | | jasons (jasons) joins |
14:02:36 | | h3ndr1k_ (h3ndr1k) joins |
14:04:21 | | h3ndr1k__ (h3ndr1k) joins |
14:04:57 | | h3ndr1k quits [Ping timeout: 265 seconds] |
14:06:27 | | h3ndr1k (h3ndr1k) joins |
14:06:50 | | h3ndr1k_ quits [Ping timeout: 240 seconds] |
14:08:50 | | h3ndr1k__ quits [Ping timeout: 240 seconds] |
14:11:14 | | h3ndr1k quits [Ping timeout: 265 seconds] |
14:13:51 | | h3ndr1k (h3ndr1k) joins |
14:15:21 | | eightthree quits [Remote host closed the connection] |
14:15:22 | | Darken2 quits [Read error: Connection reset by peer] |
14:15:43 | | Darken2 (Darken) joins |
14:18:18 | | eightthree joins |
14:24:17 | | h3ndr1k quits [Ping timeout: 265 seconds] |
14:26:02 | | eightthree quits [Remote host closed the connection] |
14:27:16 | | icedice (icedice) joins |
14:28:18 | | Darken2 quits [Client Quit] |
14:28:35 | | Darken (Darken) joins |
14:31:23 | | eightthree joins |
14:35:42 | | eightthree quits [Remote host closed the connection] |
14:48:27 | | eightthree joins |
14:56:11 | | h3ndr1k (h3ndr1k) joins |
15:32:12 | <h2ibot> | Switchnode edited Deathwatch (+390, /* 2024 */ add world of tanks forums): https://wiki.archiveteam.org/?diff=51668&oldid=51649 |
15:55:48 | | Darken2 (Darken) joins |
15:59:50 | | Darken quits [Ping timeout: 240 seconds] |
16:02:07 | | Megame (Megame) joins |
16:18:13 | | fishingforsoup_ joins |
16:22:03 | | fishingforsoup quits [Ping timeout: 272 seconds] |
16:49:50 | | jasons quits [Ping timeout: 240 seconds] |
17:16:33 | | Hackerpcs quits [Client Quit] |
17:18:04 | | BPCZ quits [Remote host closed the connection] |
17:18:38 | | Hackerpcs (Hackerpcs) joins |
17:20:23 | | BPCZ (BPCZ) joins |
17:53:47 | | jasons (jasons) joins |
18:00:35 | | sec^nd quits [Remote host closed the connection] |
18:00:54 | | sec^nd (second) joins |
18:06:36 | | threedeeitguy39 quits [Quit: The Lounge - https://thelounge.chat] |
18:15:48 | | Darken2 quits [Read error: Connection reset by peer] |
18:16:09 | | Darken2 (Darken) joins |
18:27:11 | | threedeeitguy39 (threedeeitguy) joins |
18:31:46 | <h2ibot> | Entartet edited Deathwatch (+231, Added thebillionscompanion.net.): https://wiki.archiveteam.org/?diff=51669&oldid=51668 |
18:49:37 | | jasons quits [Ping timeout: 272 seconds] |
18:57:58 | | jacksonchen666 (jacksonchen666) joins |
19:23:31 | | jacksonchen666 quits [Remote host closed the connection] |
19:23:55 | | jacksonchen666 (jacksonchen666) joins |
19:24:31 | | jacksonchen666 quits [Remote host closed the connection] |
19:25:02 | | jacksonchen666 (jacksonchen666) joins |
19:30:57 | <h2ibot> | Pokechu22 edited Games/Engines, Platforms and Hostings (+12, /* PC and Web */ [[Steam]]): https://wiki.archiveteam.org/?diff=51670&oldid=50184 |
19:36:30 | | Wohlstand quits [Remote host closed the connection] |
19:52:46 | | jasons (jasons) joins |
19:53:24 | | bf_ joins |
19:54:47 | | bf_ quits [Remote host closed the connection] |
19:55:01 | | bf_ joins |
19:58:26 | | Megame quits [Client Quit] |
20:00:58 | | Darken2 quits [Client Quit] |
20:01:14 | | Darken (Darken) joins |
20:01:59 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
20:03:00 | | bf_ quits [Remote host closed the connection] |
20:03:29 | | bf_ joins |
20:05:14 | | bf_ quits [Remote host closed the connection] |
20:12:58 | | Island joins |
20:22:15 | | bf_ joins |
20:23:56 | <pokechu22> | Hmm, `(echo a; echo b; echo c) | zstdgrep -e 'a' -e 'b'` gives no output for me but `zstdgrep -e 'a'` does as does `zgrep -e 'a' -e 'b'` or `grep -e 'a' -e 'b'` - this also happened when I used zstdgrep on a .gz file. Is this a bug or have I misunderstood something about zstdgrep? |
20:25:00 | <@JAA> | This is a bug. |
20:25:19 | <@JAA> | https://github.com/facebook/zstd/issues/2064 |
20:25:41 | <@JAA> | zstdless has similar issues with option parsing: https://github.com/facebook/zstd/issues/2880 |
20:26:07 | <pokechu22> | Oof |
20:26:53 | <@JAA> | Er, zstdless had*, although I haven't verified whether everything behaves correctly now. |
20:27:01 | <pokechu22> | I didn't even intend to type zstdgrep the first time, glad I noticed the missing output (I was verifying that extracting JIRA attachments from junk that gets logged in the meta-warc would work by comparing it with one where we extracted it from the DB) |
20:27:31 | <@JAA> | Yeah, zstdgrep is fine for very simple cases, but if in doubt, it's better to use `zstdcat | grep ...` instead. |
20:27:51 | | SootBector quits [Remote host closed the connection] |
20:28:30 | | SootBector (SootBector) joins |
20:33:51 | | bf_ quits [Remote host closed the connection] |
20:35:19 | | jacksonchen666 quits [Client Quit] |
20:44:02 | <pokechu22> | ... ok, new problem, and this seems like it's not a grep one: from view-source:https://web.archive.org/web/20230929192111id_/https://bugs.mojang.com/browse/MC-180529 archivebot saw data-downloadurl="application/zip:Normal_Font_TT_v3.zip:https://bugs.mojang.com/secure/attachment/286387/Normal_Font_TT_v3.zip" and extracted |
20:44:04 | <pokechu22> | https://bugs.mojang.com/browse/application/zip:Normal_Font_TT_v3.zip:https:/bugs.mojang.com/secure/attachment/286387/Normal_Font_TT_v3.zip but it *didn't* do anything with data-downloadurl="text/plain:hs_err_pid9900.log:https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log" |
20:44:32 | <pokechu22> | both https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log and https://bugs.mojang.com/secure/attachment/286387/Normal_Font_TT_v3.zip ended up in the database though |
20:45:49 | <pokechu22> | It doesn't seem to have extracted anything along the lines of browse/text.*\.log: |
20:46:08 | <pokechu22> | but it did accept https://bugs.mojang.com/browse/text/plain:crash.log.txt:https:/bugs.mojang.com/secure/attachment/71965/crash.log.txt |
20:47:03 | <pokechu22> | hmm, it also didn't extract any .nbt or .dat files - does archivebot have a list of extensions it'll assume might be files when doing extraction from data attributes? |
20:49:24 | <@JAA> | This would be on wpull, not AB. |
20:51:01 | <@JAA> | https://github.com/ArchiveTeam/wpull/blob/cfa5bcc571e7ff2d5175d8299e90651955c72df5/wpull/scraper/html.py#L618-L621 |
20:51:51 | | jasons quits [Ping timeout: 272 seconds] |
20:51:57 | <@JAA> | And https://github.com/ArchiveTeam/wpull/blob/cfa5bcc571e7ff2d5175d8299e90651955c72df5/wpull/scraper/util.py#L136-L217 |
20:52:49 | <@JAA> | That should pass `is_likely_link`. |
20:53:34 | <@JAA> | Oh hmm, unless it's the `mimetype.guess` check. |
20:53:43 | <@JAA> | `mimetype.guess_type` * |
20:54:59 | <@JAA> | Yeah, it fails the `is_likely_link` check. |
20:56:09 | <pokechu22> | alright, I guess we do need the database after all :| |
20:56:38 | <@JAA> | Yep, `mimetypes.guess_type` doesn't know about `.log`. |
20:56:56 | <@JAA> | It wouldn't be in the DB either. |
20:57:23 | <@JAA> | `mimetypes.guess_type('text/plain:hs_err_pid9900.log:https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log', strict=False)` → `(None, None)` |
20:57:36 | <pokechu22> | That wouldn't, but the correct URL (https://bugs.mojang.com/secure/attachment/286387/Normal_Font_TT_v3.zip or https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log) will be; they're just not saved due to the no-parent rule |
20:57:46 | <@JAA> | Ah |
20:59:03 | <pokechu22> | this also means I need to find the database for hub.spigotmc.org which we ran a while back and saved the DB for, but I don't think I ever extracted outlinks from |
20:59:49 | <pokechu22> | I'll start !a < list jobs for several of the JIRA instances since we are running low on time, and then ping you for the DBs to be saved |
21:00:17 | <thuban> | fwiw on 3.11 `mimetypes.guess_type('text/plain:hs_err_pid9900.log:https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log', strict=False)` → `('text/plain', None)` |
21:01:19 | <pokechu22> | It probably still won't like .dat or .nbt though |
21:01:48 | <thuban> | indeed not |
21:02:05 | <@JAA> | thuban: I'm still getting `(None, None)` on 3.11. |
21:02:51 | <@JAA> | I think the `mimetypes` module does some discovery stuff in /usr/share or something like that. |
21:02:57 | <@JAA> | So it can differ from system to system. |
21:03:41 | <thuban> | ah, so it does |
21:04:18 | <@JAA> | https://github.com/python/cpython/blob/831b95d9b970901a39c64b5f261f379a490c64fb/Lib/mimetypes.py#L48-L58 |
21:04:31 | <@JAA> | Not /usr/share but same concept. :-) |
21:04:42 | <thuban> | you beat me to it, new github is awful v_v |
21:05:27 | <@JAA> | It sure is, I do more and more stuff locally with a clone instead. |
21:05:42 | <@JAA> | Especially since code search is loginwalled anyway. |
21:08:53 | <thuban> | anyway, perhaps the ab pipelines should be fitted with local mimetype files? |
21:10:59 | <@JAA> | Perhaps wpull should ship its own list and init the `mimetypes` module with that. |
21:12:23 | <thuban> | ah! i didn't see that option. yes, that would simplify things |
21:13:16 | <@JAA> | Apache's list doesn't even have .gz and .zst... |
21:26:33 | | Darken2 (Darken) joins |
21:26:50 | | eightthree quits [Remote host closed the connection] |
21:27:17 | | DLoader_ (DLoader) joins |
21:28:50 | | Darken quits [Ping timeout: 240 seconds] |
21:29:41 | <@JAA> | Looks like they're open to changes: https://github.com/apache/httpd/pull/372 |
21:30:29 | | DLoader quits [Ping timeout: 272 seconds] |
21:30:37 | | DLoader_ is now known as DLoader |
21:32:56 | | Wohlstand (Wohlstand) joins |
21:45:48 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
21:54:48 | | jasons (jasons) joins |
22:01:23 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
22:03:37 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
22:12:32 | <h2ibot> | Pokechu22 edited Jira (+163, the database is still needed): https://wiki.archiveteam.org/?diff=51671&oldid=51666 |
22:21:50 | | lennier1 quits [Ping timeout: 240 seconds] |
22:22:23 | | lennier1 (lennier1) joins |
22:53:27 | | jasons quits [Ping timeout: 272 seconds] |
22:55:57 | | parfait (kdqep) joins |
22:59:32 | | Wohlstand quits [Client Quit] |
23:01:25 | | eightthree joins |
23:04:08 | | lunik173 quits [Quit: Ping timeout (120 seconds)] |
23:04:22 | | lunik173 joins |
23:08:39 | | Darken2 quits [Ping timeout: 272 seconds] |
23:13:27 | | Darken (Darken) joins |
23:15:44 | | Darken quits [Read error: Connection reset by peer] |
23:22:09 | | BlueMaxima joins |
23:38:48 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
23:55:50 | <h2ibot> | Pokechu22 edited Jira (+0, update script): https://wiki.archiveteam.org/?diff=51672&oldid=51671 |
23:56:16 | | jasons (jasons) joins |
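
Note on the `mimetypes` discussion (20:56 to 21:13): the idea floated there was that wpull could ship its own extension list rather than depend on whatever mime.types files the host system has. The following is a minimal sketch of that idea, not wpull's actual code. It uses a private `mimetypes.MimeTypes` instance and `add_type` rather than initialising the module-level table as suggested in the log. The `EXTRA_TYPES` mapping and the `build_guesser` helper are hypothetical names invented for illustration, and the MIME types chosen for `.nbt` and `.dat` are assumptions.

```python
import mimetypes

# Hypothetical extra mappings for illustration only; not taken from wpull,
# Apache's list, or any other real mime.types file.
EXTRA_TYPES = {
    ".log": "text/plain",
    ".nbt": "application/octet-stream",  # assumed type
    ".dat": "application/octet-stream",  # assumed type
}


def build_guesser(extra=EXTRA_TYPES):
    """Return a private MimeTypes instance with the extra extensions registered."""
    # A private instance: types added here do not touch the module-level table
    # used by mimetypes.guess_type().
    db = mimetypes.MimeTypes()
    for ext, mime in extra.items():
        db.add_type(mime, ext, strict=True)
    return db


guesser = build_guesser()

# The junk relative URL from the log; only its trailing extension matters here.
junk = (
    "text/plain:hs_err_pid9900.log:"
    "https://bugs.mojang.com/secure/attachment/286386/hs_err_pid9900.log"
)
print(guesser.guess_type(junk, strict=False))
print(guesser.guess_type("level.dat", strict=False))
```

Because the extensions are registered explicitly, the result for these three suffixes no longer depends on which system files the host happens to have, which is the difference thuban and JAA ran into on 3.11.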