00:10:05 | | IDK quits [Client Quit] |
00:15:56 | | nulldata quits [Ping timeout: 255 seconds] |
00:17:29 | | nulldata (nulldata) joins |
00:34:08 | | magmaus3 quits [Client Quit] |
00:36:44 | | magmaus3 (magmaus3) joins |
01:01:29 | | beastbg8 quits [Read error: Connection reset by peer] |
01:24:41 | | BlueMaxima joins |
01:39:51 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
01:48:27 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
01:50:01 | | beastbg8 (beastbg8) joins |
01:56:31 | | etnguyen03 quits [Ping timeout: 272 seconds] |
01:58:18 | | beastbg8 quits [Read error: Connection reset by peer] |
02:01:35 | | qwertyasdfuiopghjkl quits [Ping timeout: 255 seconds] |
02:02:26 | | owen joins |
02:03:34 | | beastbg8 (beastbg8) joins |
02:05:51 | | sonick (sonick) joins |
02:08:11 | | etnguyen03 (etnguyen03) joins |
02:17:57 | <@OrIdow6> | thuban: On the Nitter idea, guess this would be a la our various transfers where it's technically public but we just hope that the amount of randos using it is low? |
02:18:23 | <@OrIdow6> | One alternative is just to make records of the API calls Nitter makes |
02:18:51 | <@OrIdow6> | But I guess assuming that it's optimal in that department, little disadvantage in rendering them as HTML |
02:20:31 | <thuban> | yeah (and/or rate-limiting harshly but whitelisting ab pipelines) |
02:21:04 | | AramZS joins |
02:22:20 | <AramZS> | Looks like WAMU may shut down soon - https://twitter.com/ElaheIzadi/status/1760814525714981340 |
02:22:20 | <eggdrop> | nitter: https://farside.link/nitter/ElaheIzadi/status/1760814525714981340 |
02:22:55 | <AramZS> | If they do, https://wamu.org/ and https://dcist.com/ are both in danger of outage |
02:27:05 | <@JAA> | Archiving VICE Video needs some work. Scripting, GraphQL, expiring URLs, etc. |
02:31:03 | <nulldata> | WAMU - doesn't sound like a shutdown, but more of a we're going down to a skeleton crew. Still though, probably should be thrown into AB - especially https://dcist.com since it's likely less priority for them to keep than the station itself. |
02:32:16 | <@JAA> | Agreed, I'll throw both in without offsite links (can be done with #// later). |
02:35:27 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
02:37:12 | | cascode joins |
02:52:20 | <@JAA> | Nice, WAMU has an open S3 bucket. |
02:52:58 | <nicolas17> | OwO |
02:53:00 | <fireonlive> | hot |
02:54:19 | <nicolas17> | wtf is this /wamu-home-timestamp that it fetches a dozen times per click |
02:59:09 | <AramZS> | looks like someone broke their Google Analytics setup |
03:00:25 | <nicolas17> | hm I can't see any S3? recorded audio comes from downloads.wamu.org which is a special "streaming server" |
03:02:17 | | AramZS quits [Remote host closed the connection] |
03:03:47 | <nicolas17> | static.wamu.org is S3/cloudfront but it's not open for listing |
03:05:07 | <@JAA> | The bucket I found doesn't have anything particularly interesting. |
03:05:18 | <@JAA> | Just images |
03:05:43 | | AramZS joins |
03:06:06 | <nicolas17> | http://s3.amazonaws.com/static.wamu.org/d/programs/wamu_program_guide.pdf okay I have confirmation that static.wamu.org is a bucket |
03:06:09 | <nicolas17> | but no listing |
03:08:46 | <fireonlive> | images can be titillating too~ |
03:09:21 | | etnguyen03 quits [Ping timeout: 272 seconds] |
03:28:59 | | threedeeitguy39 quits [Quit: Ping timeout (120 seconds)] |
03:29:42 | | threedeeitguy39 (threedeeitguy) joins |
03:39:07 | <@arkiver> | i read on vice something about getting without outlinks and getting only recent articles |
03:39:32 | <@arkiver> | if we have a job running for only recent articles, i think it would still be good to run jobs next to that for entire sites as well without outlinks |
03:39:55 | | Doranwen (Doranwen) joins |
03:40:06 | <@JAA> | arkiver: Both are running, yes. |
03:40:24 | <@arkiver> | perfect |
03:40:25 | <@JAA> | The !ao < job for just the new articles is almost done. |
03:40:29 | <@arkiver> | JAA: can i help in any way? |
03:40:41 | | lennier2_ quits [Read error: Connection reset by peer] |
03:40:55 | | lennier2_ joins |
03:41:02 | <@JAA> | Not sure what we want to do with https://video.vice.com/ which is a mess. |
03:41:21 | <@JAA> | The main site works reasonably well with AB I believe. |
03:42:49 | | jacksonchen666 (jacksonchen666) joins |
03:43:07 | | jacksonchen666 quits [Remote host closed the connection] |
03:43:19 | | kitonthenet joins |
03:43:45 | <@JAA> | I haven't checked what else there is, but I think someone else did, and a few other subdomains were run through AB earlier. |
03:47:23 | <@arkiver> | there's a bunch of "brands" listed at https://en.wikipedia.org/wiki/Vice_Media |
03:49:03 | | etnguyen03 (etnguyen03) joins |
04:16:02 | | owen quits [Client Quit] |
04:40:57 | | eyes quits [Client Quit] |
04:43:47 | | jacksonchen666 (jacksonchen666) joins |
04:44:32 | | atphoenix quits [Remote host closed the connection] |
04:45:16 | | atphoenix (atphoenix) joins |
04:46:15 | | kitonthenet quits [Ping timeout: 272 seconds] |
04:48:24 | | jacksonchen666 quits [Ping timeout: 255 seconds] |
04:58:07 | | kitonthenet joins |
04:59:47 | | tachymelia joins |
05:00:47 | <tachymelia> | thanks for all the help over the past week!!! I just uploaded my first warc archive to archive.org |
05:00:59 | <tachymelia> | faqs said to tag it archiveteam and mention it here so, https://archive.org/details/dtgforums.warc |
05:15:06 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
05:23:11 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
05:34:21 | <thuban> | i'm working on deathwatch. anyone know whether periscope videos shared to twitter are still available, or have an example thereof, or can look through #microscope logs? |
05:35:10 | <thuban> | ah nvm, think i found one finally |
05:43:08 | | Island quits [Read error: Connection reset by peer] |
05:44:41 | | jacksonchen666 (jacksonchen666) joins |
05:47:03 | | kitonthenet quits [Ping timeout: 272 seconds] |
05:49:20 | | jacksonchen666 quits [Remote host closed the connection] |
05:52:48 | | kitonthenet joins |
06:00:59 | | etnguyen03 quits [Ping timeout: 272 seconds] |
06:13:01 | | kitonthenet quits [Ping timeout: 272 seconds] |
06:22:02 | | etnguyen03 (etnguyen03) joins |
06:26:16 | | kitonthenet joins |
06:28:11 | | etnguyen03 quits [Remote host closed the connection] |
06:33:56 | | kitonthenet quits [Ping timeout: 255 seconds] |
06:36:53 | | kitonthenet joins |
06:40:47 | | Wohlstand quits [Client Quit] |
06:44:20 | <thuban> | JAA: was there ever an announcement for the delay of the travis-ci.org shutdown from may to june 2021, or did they just kinda dawdle pulling the plug? |
06:44:48 | | tachymelia quits [Client Quit] |
06:48:34 | | jacksonchen666 (jacksonchen666) joins |
06:49:07 | | jacksonchen666 quits [Read error: Connection reset by peer] |
07:06:06 | <thuban> | also, what's the house style for linking to archived pages? a bunch of these shutdown announcements are, themselves, gone. |
07:06:08 | <thuban> | bare-linking directly to a wbm snapshot seems unfortunate compared to using the url template, but the url template can neither indicate that the target is dead nor link to a specific snapshot date (and realistically only ia will have the page most of the time anyway)... |
07:07:02 | <fireonlive> | hmm maybe the url template should have an indicator for is dead |
07:07:12 | <fireonlive> | like a strike through or something |
07:07:36 | <fireonlive> | it does seem to be the way ™ though |
07:15:05 | | kitonthenet quits [Ping timeout: 272 seconds] |
07:15:55 | | Arcorann (Arcorann) joins |
07:18:01 | | archivefan joins |
07:19:08 | | archivefan quits [Remote host closed the connection] |
07:19:56 | | archivefan joins |
07:20:18 | | archivefan quits [Remote host closed the connection] |
07:20:24 | | kitonthenet joins |
07:25:14 | | kitonthenet quits [Ping timeout: 255 seconds] |
07:25:44 | | kitonthenet joins |
07:26:37 | | le0n quits [Quit: see you later, alligator] |
07:29:02 | | le0n (le0n) joins |
07:33:20 | | kitonthenet quits [Ping timeout: 255 seconds] |
07:37:55 | | kitonthenet joins |
07:42:17 | <thuban> | https://github.com/docker/roadmap/issues/152#issuecomment-1399177057 i also would like to know this! |
07:44:10 | <thuban> | the blog post delaying the deadline said they would "announce the timeline for new image retention policies early in 2021" (https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates/), but i can't find any such announcement, and there seems to be nothing relevant in their pricing page or faqs |
07:45:01 | <fireonlive> | hmm i don't see it in the current tos either |
07:45:06 | <fireonlive> | other than username stuff |
07:46:40 | | jacksonchen666 (jacksonchen666) joins |
07:49:59 | | jacksonchen666 quits [Remote host closed the connection] |
07:54:56 | | nulldata quits [Ping timeout: 255 seconds] |
07:56:14 | | nulldata (nulldata) joins |
07:59:25 | | kitonthenet quits [Ping timeout: 272 seconds] |
08:00:55 | | kitonthenet joins |
08:01:07 | | magmaus3 quits [Client Quit] |
08:01:24 | | magmaus3 (magmaus3) joins |
08:05:45 | | kitonthenet quits [Ping timeout: 272 seconds] |
08:16:58 | | kitonthenet joins |
08:23:44 | | kitonthenet quits [Ping timeout: 255 seconds] |
08:34:01 | | BlueMaxima quits [Read error: Connection reset by peer] |
08:35:54 | | shreyasminocha quits [Remote host closed the connection] |
08:35:54 | | thehedgeh0g quits [Remote host closed the connection] |
08:35:54 | | evan quits [Remote host closed the connection] |
08:35:54 | | c3manu quits [Remote host closed the connection] |
08:35:57 | | evan joins |
08:36:00 | | c3manu (c3manu) joins |
08:36:00 | | thehedgeh0g (mrHedgehog0) joins |
08:36:00 | | shreyasminocha (shreyasminocha) joins |
08:36:04 | | kitonthenet joins |
08:43:07 | | kitonthenet quits [Ping timeout: 272 seconds] |
08:44:25 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
08:47:39 | | jacksonchen666 (jacksonchen666) joins |
08:51:53 | | jacksonchen666 quits [Remote host closed the connection] |
08:54:55 | | kitonthenet joins |
09:02:53 | | kitonthenet quits [Ping timeout: 255 seconds] |
09:15:20 | | kitonthenet joins |
09:47:00 | | ell quits [Client Quit] |
09:47:22 | | ell (ell) joins |
09:48:38 | | jacksonchen666 (jacksonchen666) joins |
09:49:48 | | ell quits [Client Quit] |
09:50:23 | | ell (ell) joins |
09:51:14 | | jacksonchen666 quits [Remote host closed the connection] |
10:00:02 | | Bleo18260 quits [Client Quit] |
10:01:19 | | Bleo18260 joins |
10:02:17 | | kitonthenet quits [Ping timeout: 272 seconds] |
10:13:21 | | kitonthenet joins |
10:18:56 | | kitonthenet quits [Ping timeout: 255 seconds] |
10:29:45 | | kitonthenet joins |
10:30:55 | | nimaje quits [Read error: Connection reset by peer] |
10:32:03 | | nimaje joins |
10:37:07 | | kitonthenet quits [Ping timeout: 272 seconds] |
10:42:04 | | knecht4 quits [Quit: knecht420] |
10:42:08 | | kitonthenet joins |
10:43:05 | | knecht4 joins |
10:43:55 | | ell quits [Client Quit] |
10:44:25 | | ell (ell) joins |
10:46:52 | | ell quits [Client Quit] |
10:47:16 | | ell (ell) joins |
11:02:10 | | jacksonchen666 (jacksonchen666) joins |
11:02:42 | | bf_ joins |
11:04:50 | | kitonthenet quits [Ping timeout: 255 seconds] |
11:05:56 | | kitonthenet joins |
11:13:51 | | MetaNova quits [Ping timeout: 272 seconds] |
11:18:03 | | ell quits [Client Quit] |
11:19:32 | | MetaNova (MetaNova) joins |
11:21:23 | | ell (ell) joins |
11:32:13 | | kitonthenet quits [Ping timeout: 272 seconds] |
11:32:23 | | eyes joins |
11:37:17 | | bf_ quits [Ping timeout: 272 seconds] |
11:38:18 | | bf_ joins |
11:44:57 | | kitonthenet joins |
11:51:51 | | kitonthenet quits [Ping timeout: 272 seconds] |
11:58:21 | | kitonthenet joins |
11:59:01 | | Carnildo quits [Read error: Connection reset by peer] |
11:59:03 | | Carnildo joins |
12:09:04 | | ell quits [Client Quit] |
12:09:33 | | ell (ell) joins |
12:11:37 | | ell quits [Client Quit] |
12:11:55 | | ell (ell) joins |
12:18:00 | | ell quits [Client Quit] |
12:18:53 | | ell (ell) joins |
12:20:49 | | ell quits [Client Quit] |
12:21:38 | | ell (ell) joins |
12:23:39 | | ell quits [Client Quit] |
12:23:56 | | ell (ell) joins |
12:25:23 | | kitonthenet quits [Ping timeout: 255 seconds] |
12:26:08 | | ell quits [Client Quit] |
12:26:34 | | ell (ell) joins |
12:34:52 | | kitonthenet joins |
12:43:50 | | cascode quits [Ping timeout: 255 seconds] |
12:44:10 | | cascode joins |
12:46:19 | | kitonthenet quits [Ping timeout: 272 seconds] |
13:09:45 | | Arcorann quits [Ping timeout: 272 seconds] |
13:29:01 | | cascode quits [Read error: Connection reset by peer] |
13:29:34 | | cascode joins |
13:44:21 | | ell quits [Client Quit] |
13:55:23 | <pabs> | re help.osm.o on Deathwatch, I asked on #osm-dev (OFTC) about delaying the staticisation until we can save it, they seem agreeable to that |
14:14:14 | | etnguyen03 (etnguyen03) joins |
14:35:55 | | HP_Archivist quits [Client Quit] |
14:45:23 | | etnguyen03 quits [Ping timeout: 272 seconds] |
14:54:02 | | mgrytbak quits [Quit: Ping timeout (120 seconds)] |
14:54:12 | | mgrytbak joins |
14:57:08 | | etnguyen03 (etnguyen03) joins |
14:59:13 | | mgrytbak quits [Client Quit] |
14:59:23 | | mgrytbak joins |
15:01:20 | <nulldata> | Thankfully for now the redirect for DCist is just a JavaScript overlay and the full page is still there for AB to pick up |
15:01:51 | | mgrytbak quits [Client Quit] |
15:02:03 | | mgrytbak joins |
15:09:23 | <bf_> | hey, is there a database of all github sha1 hashes? I heard github exported all public repos in some arctic storage - maybe you guys know more? |
15:10:16 | <nicolas17> | I shudder to think how big that would be |
15:10:39 | <bf_> | I do IT forensics and it's handy to have a database with hashes of known-good files |
15:10:43 | <bf_> | or known-uninteresting |
15:10:58 | <bf_> | there is a big database from NIST called NIST RDS with a lot of hashes (~75M) of software distributions |
15:11:10 | <bf_> | but I fear these guys are just hashing physical CDs and stuff |
15:11:21 | <nicolas17> | I thought you meant commit hashes, if you want files that's even worse 💀 |
15:11:27 | <bf_> | blob hashes :) |
15:11:34 | <nicolas17> | especially since blob hashes are *not* file hashes |
15:11:46 | <bf_> | oh that is a good point, I assumed they would be the same |
15:11:57 | <bf_> | I checked github API and they don't expose that kind of stuff. also git protocol does not |
15:12:14 | <bf_> | I saw some teams are archiving github repos but that's quite a task.. |
15:12:41 | <bf_> | I was wondering if there is an official connection to the github folks and maybe this can be suggested somewhere |
15:12:56 | <bf_> | they must be deduplicating somewhere |
15:13:10 | | mgrytbak quits [Client Quit] |
15:13:18 | | mgrytbak joins |
15:14:47 | <nicolas17> | the sha1 of "hello" is aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d |
15:15:04 | <bf_> | yeah I know at some point such database does not make sense any more ;) |
15:15:17 | <bf_> | nist database contains sha1 hashes of files with size 1gb |
15:15:20 | <bf_> | *100gb |
15:15:40 | <nicolas17> | if you put that in a git file, the blob hash is b6fc4c620b67d95f953a5c1c1230aaab5db5a1b0 |
15:15:48 | <bf_> | and that's basically the industry standard.. mix it with malware hash lists and you're good to go |
15:15:49 | <nicolas17> | because it's stored as "blob 5\0hello" |
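The difference nicolas17 describes can be checked directly. A minimal sketch, assuming only Python's standard `hashlib`; the function names `file_sha1` and `git_blob_sha1` are made up for illustration, and it covers SHA-1 repositories only. The second function hashes the same bytes with the `blob <size>\0` header that git prepends to file contents.

```python
import hashlib


def file_sha1(data: bytes) -> str:
    """Plain SHA-1 of the file contents, as found in hash sets of files."""
    return hashlib.sha1(data).hexdigest()


def git_blob_sha1(data: bytes) -> str:
    """SHA-1 as git computes it for a blob: header "blob <size>\\0" + contents."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    content = b"hello"  # no trailing newline, matching the example in the log
    print("file sha1:", file_sha1(content))
    print("git blob: ", git_blob_sha1(content))
```

Since the two values differ for the same bytes, a list of git blob hashes could probably not be matched against plain file hashes without recomputing one form from the file contents.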
15:16:40 | <bf_> | you'd need a custom git client that fetches all files from the remote repo, hashes them and stores the hashes.. |
15:17:14 | <nicolas17> | not really a custom client, just "git clone" the repo and process it locally, but yeah it will be slow |
15:17:33 | <bf_> | git clone would write to disk, it'd be much faster to do it in memory |
15:17:50 | | mgrytbak quits [Client Quit] |
15:17:56 | | mgrytbak joins |
15:17:57 | <bf_> | is there an archive.org copy of npm or other software package libraries like pip? |
15:18:50 | <nicolas17> | well I don't think you can selectively fetch files from a git repo |
15:19:24 | <nicolas17> | it might be possible to extract files out of the packfile in a streaming fashion? |
15:19:36 | <bf_> | no unfortunately not. and github even kneecapped the official git protocol to disallow filtering for blobs only |
15:19:48 | <nicolas17> | but it seems easier to git clone into a ramdisk :P |
15:19:56 | <bf_> | yes git clone is much easier ;D |
15:20:06 | <bf_> | and then traverse through all file versions for each file |
15:20:17 | <bf_> | beautifu |
15:20:18 | <bf_> | *l |
15:27:32 | <nicolas17> | bf_: I'm currently archiving https://opensource.samsung.com/ |
15:28:28 | <bf_> | that's cool. is it also git repos? |
15:28:52 | <nicolas17> | no, it's zip files containing tar files containing dumps of the kernel used for a particular android version and device |
15:29:14 | <bf_> | %) |
15:29:24 | <nicolas17> | and they seem to have gotten angry with my automated scraping and added some extra checks to the site |
15:29:32 | <nicolas17> | culminating with adding captchas last week |
15:29:40 | <bf_> | I did a data dump of store.kde.org recently. there were also a lot of backdoored roms on there. android images with phishing toolkit installed |
15:29:53 | <bf_> | haha such clowns |
15:30:51 | <nicolas17> | captchas totally screw over my automation |
15:30:58 | <nicolas17> | they won the war |
15:31:10 | <nicolas17> | ...unless |
15:31:12 | <nicolas17> | https://transfer.archivete.am/inline/fve7A/screenshot.png |
15:31:36 | <nicolas17> | unless I get enough people to be bored enough to download the files manually |
15:34:09 | | etnguyen03 quits [Ping timeout: 272 seconds] |
15:34:23 | | eightthree quits [Remote host closed the connection] |
15:35:05 | | etnguyen03 (etnguyen03) joins |
15:41:07 | <bf_> | just use chatgpt api to solve captchas |
15:45:53 | <@JAA> | thuban: I don't recall re travis-ci.org. |
15:46:56 | | mgrytbak quits [Client Quit] |
15:47:09 | | mgrytbak joins |
15:47:34 | | Naruyoko5 joins |
15:49:59 | | Naruyoko quits [Ping timeout: 272 seconds] |
15:51:29 | | etnguyen03 quits [Ping timeout: 255 seconds] |
15:51:55 | | mgrytbak quits [Client Quit] |
15:52:07 | | mgrytbak joins |
15:53:16 | | sonick quits [Client Quit] |
15:53:52 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
15:59:12 | | icedice joins |
15:59:16 | | icedice quits [Client Quit] |
16:06:05 | <@JAA> | bf_: So when you `git clone` something, the server prepares a pack of the relevant objects and sends you that. In theory, you could process that pack in memory. There's compression and delta encoding involved, so it gets messy, but if you have enough memory, you don't *need* to write to a file system, I think. |
16:07:18 | <nicolas17> | how big is a linux kernel / webkit / chromium git clone? |
16:07:34 | <@JAA> | A couple GB usually |
16:08:09 | <nicolas17> | then I think you could get away with cloning to a ramdisk in most cases - as long as it's a bare clone, without checking out a working copy |
16:08:40 | <@JAA> | Links about packs: https://git-scm.com/book/en/v2/Git-Internals-Packfiles https://git.kernel.org/pub/scm/git/git.git/tree/Documentation/gitprotocol-pack.txt |
16:09:06 | <@JAA> | Yeah, ramdisk is going to be easier, but it does run steps you don't really need for this purpose, e.g. indexing. |
16:09:37 | <@JAA> | I.e. slower |
16:10:42 | <nicolas17> | if you just want to collect hashes of all blobs, sure; if you want stuff like their path, you would need to traverse commits and trees, and the idx may be necessary then |
16:12:32 | <@JAA> | Yeah |
16:17:05 | <nicolas17> | hmm I think you can't process the pack in a streaming fashion, you would need to *store* it all in memory, because you don't know in advance what objects you need to keep for future use as delta bases |
16:17:33 | | Dango360_ joins |
16:17:48 | <@JAA> | Correct |
16:18:16 | <@JAA> | You can process it front to back, I think, but you may need to seek back to reassemble blobs. |
16:20:23 | | Kitty quits [Ping timeout: 272 seconds] |
16:21:11 | | Dango360 quits [Ping timeout: 255 seconds] |
16:26:27 | | Dango360_ quits [Client Quit] |
16:30:27 | <bf_> | JAA: thx, the ramdisk idea sounds easier :) |
17:06:11 | | f_ quits [Ping timeout: 255 seconds] |
17:10:07 | | Island joins |
17:10:59 | | f_ (funderscore) joins |
17:36:45 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
17:39:26 | | HP_Archivist (HP_Archivist) joins |
17:49:14 | <AramZS> | Heads up that there is a way to bypass the DCist redirect - https://twitter.com/hannah_recht/status/1761072045280866499?s=20 - you just add `?redirect=no` to the end of the URL, at least for now. |
17:49:15 | <eggdrop> | nitter: https://farside.link/nitter/hannah_recht/status/1761072045280866499 |
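The `?redirect=no` trick could be applied to a list of URLs before queueing them. A minimal sketch using Python's standard `urllib.parse`; the helper name `with_redirect_bypass` and the example paths are hypothetical, and the parameter is only known to work "at least for now", as noted above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_redirect_bypass(url: str) -> str:
    """Return the URL with redirect=no set, keeping any other query parameters."""
    parts = urlsplit(url)
    # Drop any existing "redirect" parameter so it is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "redirect"
    ]
    query.append(("redirect", "no"))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical article paths, for illustration only.
    for url in ("https://dcist.com/story/example/", "https://dcist.com/?page=2"):
        print(with_redirect_bypass(url))
```

Going through `urlsplit` rather than appending a string avoids producing a second `?` when the URL already has a query.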
17:50:16 | | w4nt3d174 joins |
17:50:27 | | w4nt3d174 quits [Remote host closed the connection] |
17:58:22 | | Darken (Darken) joins |
17:59:48 | <anarcat> | vice is covered? |
18:05:50 | <anarcat> | Vice website is shutting down https://news.ycombinator.com/item?id=39476074 https://writing.exchange/@ernie/111977450241144169 |
18:07:41 | <nicolas17> | yes known |
18:07:47 | <nicolas17> | being worked on, not sure about progress |
18:07:50 | <anarcat> | ack |
18:14:01 | | etnguyen03 (etnguyen03) joins |
18:26:49 | | Maika (Maika) joins |
20:15:38 | | etnguyen03 quits [Ping timeout: 255 seconds] |
20:53:24 | | archivetipforyou joins |
20:54:21 | <archivetipforyou> | Hey. You all probably already know, but, the Vice website (including all of its news reporting) is going to be shut down. Maybe this is an emergency situation worthy of Archive Team's resources to ensure it all is available in the future? Many journalists that I follow are frantically trying to archive their reporting before it is deleted. |
20:55:00 | <@JAA> | Yes, we're already archiving it. |
20:55:17 | <archivetipforyou> | Awesome! |
20:55:21 | <archivetipforyou> | Heroes. |
20:55:27 | <archivetipforyou> | 🫡 |
20:55:48 | <archivetipforyou> | 🏆🎖️ |
20:56:32 | <@JAA> | We archived the main site in full last year. We're rearchiving it currently with ArchiveBot, and we've already archived the articles that have been posted since that run last year. |
20:56:43 | <archivetipforyou> | Wow so it is all safe. |
20:57:14 | <@JAA> | We still need to figure out what to do with the videos, but the main site and its articles should be reasonably safe, yes. All in the Wayback Machine if you'd like to check. |
21:04:04 | <fireonlive> | thanks for checking in archivetipforyou :) |
21:04:12 | <fireonlive> | better to hear more than once than not to |
21:04:17 | <archivetipforyou> | Yeah! |
21:14:20 | | archivetipforyou quits [Remote host closed the connection] |
21:48:18 | <h2ibot> | Petchea edited Tumblr (+184): https://wiki.archiveteam.org/?diff=51767&oldid=51764 |
21:52:06 | | BlueMaxima joins |
21:58:09 | <thuban> | JAA, any thoughts on the linking-to-archived-pages question before i submit this edit? i've left everything in the url template for now, but i'm not very happy with it |
22:04:09 | <@JAA> | thuban: Yeah, I feel like the URL template could use a 'dead' parameter or similar. |
22:04:48 | <@JAA> | And then that'd style the link differently, somehow. |
22:05:13 | <thuban> | would it be feasible to make it point to a specific snapshot date? certainly for ia, but i'm not sure about wcite/.today/memweb |
22:06:01 | <@JAA> | Not sufficiently familiar with those either. .today has a way to do it, I think. But the date would be different for each service, too... |
22:06:20 | | lflare quits [Ping timeout: 255 seconds] |
22:06:36 | <thuban> | yeah, idk if they have a 'nearest' api like wbm does |
22:06:53 | <@JAA> | And if they do, the question is whether we want it. |
22:07:00 | <thuban> | yeah |
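For the Wayback Machine side of this, a dated link can be built from the original URL and a timestamp. A minimal sketch, assuming the usual `https://web.archive.org/web/<timestamp>/<url>` form, where the timestamp is a prefix of `YYYYMMDDhhmmss` and the Wayback Machine redirects to the nearest capture; the helper name `wbm_snapshot_url` is hypothetical, and nothing here applies to the other archive services mentioned.

```python
def wbm_snapshot_url(url: str, timestamp: str) -> str:
    """Build a Wayback Machine link for `url` near `timestamp`.

    `timestamp` is a digit string of 4 to 14 characters (YYYY up to
    YYYYMMDDhhmmss); shorter forms are assumed to resolve to the
    nearest capture.
    """
    if not (timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 4-14 digits, e.g. 20210601")
    return f"https://web.archive.org/web/{timestamp}/{url}"


if __name__ == "__main__":
    print(wbm_snapshot_url("https://www.docker.com/blog/", "20210101"))
```

A link of this shape carries both the dead target and the snapshot date, which is the information a bare template link loses.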
22:09:58 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
22:10:20 | <thuban> | i think i'm gonna punt and bare-link these, because if we improve the url template it'll be easier to go back and urlify bare links with dates than to try and remember all the dates for url usages without dates. sound ok? |
22:11:11 | <thuban> | (i think the other benefit of the url template is that we auto-archive it, but of course that's no longer relevant here) |
22:12:20 | | lflare (lflare) joins |
22:15:53 | <thuban> | man, ia is not happy today |
22:16:57 | | Irenes quits [Ping timeout: 272 seconds] |
22:18:03 | <@JAA> | Let's do dead=yes? You can add that already even if the template doesn't support it yet. Just won't have any effect until that's implemented. |
22:19:07 | <thuban> | i'd really like to have a dated link available to the user; many of these references are useless without them |
22:22:38 | <@JAA> | Well, when we add support to the template, we can also add all dead=yes usages without dates to a special category (like the one for broken links) so they're easier to find. |
22:23:01 | <@JAA> | But if you have a suggestion for the date thing, that's also fine. |
22:23:16 | | systwi quits [Remote host closed the connection] |
22:23:24 | <@JAA> | Or maybe we should just have 'wbmurl' etc. as params. Not dates, full URLs. |
22:23:48 | <@JAA> | (It has also long annoyed me that the URL template says 'IA', not 'WBM'.) |
22:25:09 | <nicolas17> | thuban: btw I think you can use non-existent parameters in templates, in case you want to preserve information that the template doesn't *yet* support |
22:25:55 | <thuban> | nicolas17: right, but in that case it's invisible to the reader |
22:26:22 | <thuban> | which for some of these (temporary banners, etc) seems very rude |
22:26:28 | <nicolas17> | ah I see |
22:26:37 | <nicolas17> | got an example page? |
22:27:02 | <@JAA> | 90% of [[Deathwatch]] |
22:29:49 | <@JAA> | thuban: Maybe I can poke the template later tonight. If you have ideas about parameter naming and presentation, please do tell. |
22:29:51 | | Irenes (ireneista) joins |
22:30:04 | <@JAA> | We'd probably want to distinguish generic links from ones to specific snapshots. |
22:30:47 | <fireonlive> | 'Search WBM' vs 'WBM'? |
22:30:49 | <fireonlive> | hm |
22:31:15 | <@JAA> | Needs to be very compact. The URL template already occupies a lot of space. |
22:31:36 | <thuban> | JAA: sounds good! i think i like dead=yes, plus wbmurl=, etc (instead of specific dates), with a fallback to the current generic archive links if they aren't supplied. don't have opinions on styling atm though |
22:32:27 | <thuban> | i'm gonna submit this shortly so i don't lose it, but i pinky promise to fix it whenever url gets updated |
22:33:25 | | lflare quits [Ping timeout: 272 seconds] |
22:35:32 | | lflare (lflare) joins |
22:44:21 | <fireonlive> | hmm |
22:44:24 | <fireonlive> | WBM vs WBM? |
22:44:40 | <thuban> | ?_? |
22:45:05 | <fireonlive> | nah we use ? for explanatory stuff elsewhere... |
22:45:13 | <thuban> | oic |
22:45:15 | <fireonlive> | i think w Data |
22:45:24 | <thuban> | we used to but don't anymore |
22:45:24 | <@JAA> | Yeah, info link on the data row. |
22:45:30 | <fireonlive> | ah no that was changed to how to use |
22:45:39 | <@JAA> | Oh, true |
22:47:38 | <fireonlive> | colour it green! |
22:47:40 | <fireonlive> | lol |
22:48:20 | <fireonlive> | (solely relying on colour isn't recommended though) |
22:48:31 | <@JAA> | Yeah, that crossed my mind earlier. Both thoughts. |
22:50:15 | <nicolas17> | samsung update: I still haven't finished the proper task queue, but I'm manually archiving some items that have multiple files, as it's easier to handle them myself than to make the task-assignment thingy support it |
22:51:36 | <nicolas17> | seems there's 36 items remaining (= not in IA yet) that have multiple files and 900+ that have only one |
23:00:16 | | Arcorann (Arcorann) joins |
23:00:45 | <pabs> | bf_: Software Heritage imports all of GitHub, and probably has all of those hashes, check their API docs |
23:01:16 | <pabs> | bf_: I would not say GitHub only contains known-good files :) |
23:02:48 | <pabs> | (SWH's archive also contains all of GitLab, Debian, all public repos from a whole bunch of gitlab/gitea/cgit/etc instances) |
23:04:04 | <nicolas17> | pabs: did you get KDE's gitlab yet? |
23:04:43 | <pabs> | I think so, invent.kde.org is on https://archive.softwareheritage.org/coverage/ |
23:05:41 | | Maika quits [Read error: Connection reset by peer] |
23:05:59 | <pabs> | search the issues here to find the request and its results https://gitlab.softwareheritage.org/swh/infra/add-forge-now-requests/-/issues |
23:06:24 | <pabs> | or search the archive for KDE repos if you want to find ones that failed |
23:22:23 | | BlueMaxima quits [Read error: Connection reset by peer] |
23:25:35 | <h2ibot> | Switchnode edited Deathwatch (+5669, update 'dying' items of 2021): https://wiki.archiveteam.org/?diff=51768&oldid=51766 |
23:33:58 | | wickedplayer494 quits [Remote host closed the connection] |
23:34:03 | | Wohlstand (Wohlstand) joins |
23:34:48 | | Overlordz joins |
23:43:39 | | wickedplayer494 joins |
23:44:34 | <fireonlive> | thuban++ |
23:44:35 | <eggdrop> | [karma] 'thuban' now has 4 karma! |
23:44:39 | <h2ibot> | Switchnode edited Deathwatch (+206, fix syntax): https://wiki.archiveteam.org/?diff=51769&oldid=51768 |
23:50:46 | | Ruthalas59 quits [Read error: Connection reset by peer] |
23:51:03 | | Ruthalas59 (Ruthalas) joins |
23:51:10 | | Awkward1 joins |
23:53:56 | <Awkward1> | I'm new so forgive me if I break a rule or if I'm in the wrong place. I was sent here to ask about finding specific songs from The Artist Union archive. |