| 00:06:59 | <TheTechRobo> | Do we have a good system in place for archiving websocket stuff? |
| 00:07:05 | <TheTechRobo> | https://place.gd is the site I'm referring to |
| 00:10:45 | <TheTechRobo> | ...It's firebase. (Or at least it looks like it.) |
| 00:16:09 | <@JAA> | We don't, and WARC doesn't even support it at all. Best bet might be a tcpdump-ish thing with a browser (and dumping TLS keys). |
| 00:16:50 | <TheTechRobo> | (Also, has Heritrix 3 been audited?) |
| 00:17:56 | <@JAA> | mitmdump (from mitmproxy) might be another option, but I don't know if it actually dumps all the information required (cf. transfer encoding). |
| 00:18:24 | <ivan> | SingleFile the rendered page |
| 00:18:47 | <ivan> | or pull JavaScript representation out of memory and reconstruct it with custom software lol |
| 00:18:56 | <TheTechRobo> | ivan: But it's currently still live... |
| 00:19:09 | <ivan> | SingleFile it every hour |
| 00:20:16 | <ivan> | I guess you could just pipe the websocket messages to a file and deal with it later somehow |
| 00:40:13 | | jacobk joins |
| 01:02:48 | | BlueMaxima_ joins |
| 01:03:38 | | BlueMaxima quits [Remote host closed the connection] |
| 01:03:38 | | XanaAdmin quits [Client Quit] |
| 01:10:48 | | Arcorann (Arcorann) joins |
| 01:12:28 | | sonick quits [Client Quit] |
| 01:17:41 | | Celluloid joins |
| 01:18:35 | | Celluloid quits [Remote host closed the connection] |
| 01:49:21 | <TheTechRobo> | Is there a good WARC library for Java? |
| 01:49:32 | <TheTechRobo> | Does Heritrix export its WARC stuff? |
| 02:01:19 | | cascode joins |
| 02:02:29 | | tomorrowRemoval joins |
| 02:02:38 | <tomorrowRemoval> | god, the ship is sinking fast https://twitter.com/jasonbaumgartne/status/1593573576346517504 |
| 02:06:48 | <joepie91|m> | tomorrowRemoval: there is a mastodon instance exclusively for former Twitter employees, macaw.social, and it currently might actually have more Twitter employees than Twitter does |
| 02:06:55 | <joepie91|m> | for some further illustration on how things are going |
| 02:07:33 | <tomorrowRemoval> | are we... too late? |
| 02:07:35 | <tomorrowRemoval> | good god. |
| 02:07:52 | <joepie91|m> | well, your provisional deadline is Sunday |
| 02:07:55 | <joepie91|m> | that is when the world cup starts |
| 02:08:11 | <joepie91|m> | and therefore likely when the whole thing will blow up |
| 02:08:17 | <andrew> | someone please suggest a good Fediverse instance |
| 02:10:09 | <joepie91|m> | andrew: highly personal thing IMO, fedi instances tend to be strongly based around community. my recommendation would be to just pick one from https://joinmastodon.org/servers (ideally not a massive one), test the waters, and hop around until you find one that suits you |
| 02:10:26 | <joepie91|m> | account migration is pretty easy |
| 02:10:49 | <joepie91|m> | (more about this should probably go in -ot) |
| 02:11:12 | <andrew> | infosec.exchange looks fun :) |
| 02:11:43 | <joepie91|m> | seems to be a pretty okay instance from what I've seen |
| 02:12:17 | <joepie91|m> | do make sure to read the rules (https://infosec.exchange/about) because fedi is very different from twitter |
| 02:12:30 | <joepie91|m> | also https://gist.github.com/joepie91/f924e846c24ec7ed82d6d554a7e7c9a8 may be helpful |
| 02:30:57 | <madpro|m> | Never thought I'd see the day #archiveteam-bs would be talking about mastodon instances |
| 02:31:02 | <madpro|m> | Yet here we are |
| 02:31:42 | <madpro|m> | I'm conflicted as to whether or not I should rain on the parade by mentioning the "auto-delete" feature which can be instance-wide or turned on by individual users |
| 02:32:02 | <madpro|m> | https://scholar.social/@Em0nM4stodon@infosec.exchange/109367449990292160 |
| 02:32:29 | <joepie91|m> | there's good reason for that to exist, and folks here should be aware that scraping is generally not welcomed on fedi in most places, and that the privacy expectations/dynamics are very different from twitter |
| 02:32:49 | <madpro|m> | > scraping is not welcomed on fedi |
| 02:32:51 | <madpro|m> | That too |
| 02:33:36 | <joepie91|m> | that doesn't mean there's never a reason to archive anything of course (think the usual "politician says something" kind of rationale) but don't be the guy trying to archive "the fediverse" basically :) |
| 02:34:00 | <madpro|m> | I'm just saying, we have a forum-watch. And as ineffective as that is |
| 02:34:14 | <madpro|m> | Mastodon as it continues to grow will demand a far more nuanced approach |
| 02:34:32 | <madpro|m> | There are already people furious at AT for the pettiest of reasons |
| 02:34:49 | | S55 joins |
| 02:34:53 | <joepie91|m> | I'm mentioning this mainly because a lot of news outlets are erroneously reporting on Mastodon as a "Twitter alternative" and there have already been entirely too many people assuming that the social norms are the same |
| 02:35:14 | <madpro|m> | And when you have this moral burden of "Twitter taught us that some things are more accountable from others" |
| 02:35:16 | <joepie91|m> | or that everything on fedi not behind a login is meant to be publicized to the world, for example |
| 02:35:34 | <madpro|m> | * meant to be more accountable |
| 02:36:52 | <madpro|m> | Volunteer Archiving will be further marginalized. |
| 02:37:19 | <madpro|m> | already, I find people asking "If these are volunteers, who do they volunteer for?" |
| 02:39:13 | <madpro|m> | I think that's all I have to say, if I go on I will digress into personal grievances. |
| 02:44:52 | <h2ibot> | Themadprogramer edited Mastodon (+250, Added summary on automatic post-deletion features): https://wiki.archiveteam.org/?diff=49164&oldid=45749 |
| 02:45:52 | <h2ibot> | Themadprogramer edited Mastodon (-6, Added summary on automatic post-deletion features): https://wiki.archiveteam.org/?diff=49165&oldid=49164 |
| 02:45:59 | | S55 quits [Remote host closed the connection] |
| 03:16:45 | | pabs quits [Ping timeout: 276 seconds] |
| 03:17:25 | | cascode quits [Remote host closed the connection] |
| 03:17:25 | | tomorrowRemoval quits [Remote host closed the connection] |
| 03:26:47 | | pabs (pabs) joins |
| 03:27:20 | | Lord_Nightmare2 (Lord_Nightmare) joins |
| 03:27:42 | | jacobk quits [Client Quit] |
| 03:27:42 | | Lord_Nightmare quits [Remote host closed the connection] |
| 03:27:43 | | Lord_Nightmare2 is now known as Lord_Nightmare |
| 03:28:23 | | jacobk joins |
| 04:11:04 | | AnotherIki joins |
| 04:15:11 | | Iki1 quits [Ping timeout: 268 seconds] |
| 04:23:03 | | wyatt8750 joins |
| 04:24:21 | | wyatt8740 quits [Ping timeout: 276 seconds] |
| 04:33:37 | | BlueMaxima_ quits [Client Quit] |
| 04:51:38 | | cascode joins |
| 04:51:57 | | Pichu0102 is now authenticated as Pichu0102 |
| 04:52:10 | | Pichu0102 quits [Remote host closed the connection] |
| 04:52:28 | | Pichu0102 joins |
| 04:52:28 | | Pichu0102 is now authenticated as Pichu0102 |
| 04:53:08 | | Pichu0102 quits [Remote host closed the connection] |
| 04:53:40 | | Pichu0102 joins |
| 04:53:41 | | Pichu0102 is now authenticated as Pichu0102 |
| 04:53:47 | | atphoenix_ is now known as atphoenix |
| 05:02:05 | <atphoenix> | thought of the evening: seems to me that the Twitter shortener (t.co) could be at particular risk |
| 05:03:26 | <andrew> | t.co links are usually used from tweets though and the canonical URL is included in the API JSON |
| 05:03:32 | <andrew> | this could be an issue with archive sites though |
| 05:07:46 | | Iki1 joins |
| 05:11:48 | | AnotherIki quits [Ping timeout: 276 seconds] |
| 05:14:17 | <atphoenix> | the issue is with the unshortening. Yes #urlteam is a thing. But I doubt all the existing t.co URLs have been resolved into their real URLs. And the remaining ones depend on t.co remaining working. In short, the Terror of Tiny Town is real. |
| 05:18:18 | | pabs quits [Ping timeout: 276 seconds] |
| 05:20:31 | | pabs (pabs) joins |
| 05:26:32 | | AnotherIki joins |
| 05:26:41 | | sdffds joins |
| 05:27:01 | | sdffds quits [Remote host closed the connection] |
| 05:29:37 | | Iki1 quits [Ping timeout: 265 seconds] |
| 05:41:29 | | jacob joins |
| 05:42:11 | | jacob quits [Remote host closed the connection] |
| 05:46:38 | | AnotherIki quits [Remote host closed the connection] |
| 05:46:48 | | AnotherIki joins |
| 05:57:01 | | Iki1 joins |
| 05:59:11 | | AnotherIki quits [Remote host closed the connection] |
| 05:59:11 | | jacobk quits [Client Quit] |
| 05:59:11 | | cascode quits [Client Quit] |
| 06:00:05 | | jacobk joins |
| 06:32:45 | | qwertyasdfuiopghjkl joins |
| 06:51:48 | | Island quits [Read error: Connection reset by peer] |
| 06:52:54 | | wyatt8750 quits [Remote host closed the connection] |
| 07:11:35 | <tech234a> | Mastodon instance that is in the process of closing, not sure if it should be archived but I figure it's worth mentioning https://cybre.space/ |
| 07:15:41 | | wyatt8740 joins |
| 07:18:02 | | wyatt8740 quits [Read error: Connection reset by peer] |
| 07:19:38 | | wyatt8740 joins |
| 07:22:10 | <@JAA> | Since it'll keep coming up now that there's far more attention on Mastodon, I'll repeat it again: we don't archive instances without explicit permission by the instance owner(s). |
| 07:29:01 | <JTL> | Reasonable take, but if someone else (not here) decided to go rogue and start scraping shit en masse, color me not surprised. |
| 07:32:52 | | wyatt8740 quits [Ping timeout: 265 seconds] |
| 07:36:37 | | wyatt8740 joins |
| 07:41:38 | | wyatt8750 joins |
| 07:43:15 | | wyatt8740 quits [Ping timeout: 276 seconds] |
| 08:01:02 | <@OrIdow6^2> | sweb on track Sanqui? |
| 08:04:23 | <@OrIdow6^2> | TheTechRobo: Is anything happening to place.gd? |
| 08:27:17 | <@OrIdow6^2> | Looking into webry |
| 08:28:22 | | wyatt8740 joins |
| 08:29:56 | | wyatt8750 quits [Client Quit] |
| 08:29:56 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 08:38:30 | | tzt quits [Ping timeout: 268 seconds] |
| 08:45:06 | | qwertyasdfuiopghjkl joins |
| 08:57:18 | <@Sanqui> | OrIdow6^2: The ~120k domains I know about are scraped and I'm working on extracting some more. Contributions welcome |
| 08:57:27 | <@Sanqui> | by scraped I mean, done by ArchiveBot |
| 08:57:58 | <@Sanqui> | I want to process a few WARCs to extract links but I haven't had the chance yet |
| 10:40:49 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 10:52:34 | | tech_exorcist (tech_exorcist) joins |
| 10:56:21 | | tech_exorcist quits [Remote host closed the connection] |
| 10:57:12 | | tech_exorcist (tech_exorcist) joins |
| 11:01:38 | | tech_exorcist quits [Read error: Connection reset by peer] |
| 11:01:59 | | tech_exorcist (tech_exorcist) joins |
| 11:33:18 | | tech_exorcist quits [Ping timeout: 255 seconds] |
| 11:36:02 | | tech_exorcist (tech_exorcist) joins |
| 11:41:12 | | tech_exorcist quits [Remote host closed the connection] |
| 11:46:13 | | qwertyasdfuiopghjkl joins |
| 12:22:43 | | tech_exorcist (tech_exorcist) joins |
| 12:29:20 | | tech_exorcist quits [Remote host closed the connection] |
| 12:29:38 | | tech_exorcist (tech_exorcist) joins |
| 12:56:43 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 13:15:28 | | tech_exorcist quits [Remote host closed the connection] |
| 13:16:22 | | tech_exorcist (tech_exorcist) joins |
| 13:21:31 | | tech_exorcist quits [Remote host closed the connection] |
| 13:22:09 | | tech_exorcist (tech_exorcist) joins |
| 13:23:12 | | Arcorann quits [Ping timeout: 276 seconds] |
| 13:30:32 | | qwertyasdfuiopghjkl joins |
| 13:31:29 | | Larsenv quits [Quit: ZNC 1.8.2+deb2build5 - https://znc.in] |
| 13:41:24 | | pabs quits [Ping timeout: 276 seconds] |
| 13:43:13 | | mistersheeple joins |
| 13:44:08 | | pabs (pabs) joins |
| 13:46:10 | <mistersheeple> | is there anyone working on a solution for archiving twitter during these trying times? |
| 13:51:40 | | tech_exorcist quits [Remote host closed the connection] |
| 13:52:14 | | tech_exorcist (tech_exorcist) joins |
| 14:31:08 | <TheTechRobo> | OrIdow6^2: No but it's a new experiment like r/place |
| 14:31:17 | <TheTechRobo> | When it's done the finished level will be uploaded to the servers |
| 14:31:25 | <TheTechRobo> | Absolutely no idea if the entire history will be preserved |
| 14:31:57 | <TheTechRobo> | mistersheeple: You can request individual accounts or hashtags in #archivebot |
| 14:36:07 | | sonick (sonick) joins |
| 14:48:43 | | march_happy (march_happy) joins |
| 14:53:26 | | yawkat` quits [Ping timeout: 268 seconds] |
| 15:15:15 | <IDK> | TheTechRobo: tbh, that won't really be that scalable |
| 15:15:31 | <IDK> | for both current and future |
| 15:15:37 | <TheTechRobo> | No, but it's all we've currently got. |
| 15:18:01 | <IDK> | Hopefully there will be a twitter project in the near future, since this is the most requested project rn |
| 15:19:07 | | tech_exorcist quits [Client Quit] |
| 15:19:36 | <@arkiver> | tech234a: yeah there's another mastodon closing soon as well |
| 15:19:51 | <@arkiver> | mastodon.technology |
| 15:29:57 | | jacobk quits [Ping timeout: 276 seconds] |
| 15:29:59 | | yawkat (yawkat) joins |
| 15:39:58 | | tech_exorcist (tech_exorcist) joins |
| 15:54:06 | <IDK> | TheTechRobo: "Please try again in ~600 min. Crawling this host is paused because they notified us that they are overloaded right now.", my first time seeing this while doing twitter on SPN, is this a new limitation |
| 15:54:31 | <IDK> | if yes, will this affect socialbot? |
| 15:59:53 | | tech_exorcist quits [Client Quit] |
| 16:01:01 | <h2ibot> | JAABot edited CurrentWarriorProject (+4): https://wiki.archiveteam.org/?diff=49166&oldid=49163 |
| 16:09:10 | | tech_exorcist (tech_exorcist) joins |
| 16:12:16 | | tech_exorcist quits [Remote host closed the connection] |
| 16:13:14 | | tech_exorcist (tech_exorcist) joins |
| 16:25:33 | | tech_exorcist quits [Client Quit] |
| 16:26:21 | | tech_exorcist (tech_exorcist) joins |
| 16:45:07 | <@arkiver> | for very important stuff, use archivebot and don't rely on SPN |
| 16:48:10 | | HP_Archivist (HP_Archivist) joins |
| 17:00:18 | | march_happy quits [Ping timeout: 265 seconds] |
| 17:26:53 | | holbrooke quits [Ping timeout: 265 seconds] |
| 17:27:50 | | user23436436 joins |
| 17:28:19 | | user23436436 quits [Remote host closed the connection] |
| 17:38:08 | | tech_exorcist quits [Client Quit] |
| 17:38:15 | <TheTechRobo> | <IDK> TheTechRobo: "Please try again in ~600 min. Crawling this host is paused because they notified us that they are overloaded right now.", my first time seeing this while doing twitter on SPN, is this a new limitation |
| 17:38:15 | <TheTechRobo> | No |
| 17:38:21 | <TheTechRobo> | It's automatic based on 429s etc |
| 17:38:40 | <TheTechRobo> | And no, it should not affect socialbot unless socialbot uses SPN |
| 17:38:46 | <TheTechRobo> | This only affects SPN |
| 17:39:39 | <IDK> | I mean, I haven't seen 429s even with the 35 minutes queue |
| 17:40:04 | | tech_exorcist (tech_exorcist) joins |
| 17:40:40 | | HP_Archivist quits [Client Quit] |
| 17:44:29 | <@JAA> | Also, WBM/SPN stuff can go to #internetarchive. |
| 18:29:27 | | tech_exorcist quits [Client Quit] |
| 18:30:40 | | tzt (tzt) joins |
| 18:41:11 | | mistersheeple quits [Client Quit] |
| 18:46:57 | <betamax> | I've been asked by someone who has a list of tweets (some with media) and they want to have a local copy |
| 18:47:35 | <betamax> | I tried wget (with "--convert-links --page-requisites") but it failed to download anything but the HTML, and that didn't display correctly |
| 18:47:43 | <betamax> | Is there an easy way to do this? |
| 18:56:01 | | Larsenv_ (Larsenv) joins |
| 19:00:29 | <ivan> | betamax: I will PM you because of a thing that is not entirely public |
| 20:47:50 | <h2ibot> | Tech234a edited Mastodon (+202, /* Dead and dying instances */…): https://wiki.archiveteam.org/?diff=49167&oldid=49165 |
| 20:57:07 | | njha joins |
| 20:59:14 | <njha> | Hi! Berkeley is shutting down student use of *.berkeley.edu domains. Currently active groups will be moved to a different domain, but I think the plan is to just take down any inactive groups. There are definitely things of historical significance (see https://xcf.berkeley.edu/ for one example, although this page was built somewhat recently). |
| 21:00:11 | <njha> | I have the list of subdomains potentially being removed, if anyone is interested (1sec, let me upload this somewhere). |
| 21:00:28 | <@arkiver> | awesome thanks njha |
| 21:00:32 | <@arkiver> | we'll get them |
| 21:03:29 | <@JAA> | njha: You can upload that to https://transfer.archivete.am/ |
| 21:06:25 | <@arkiver> | oh yes |
| 21:07:21 | | BlueMaxima joins |
| 21:08:27 | <njha> | oh that's convenient |
| 21:08:35 | <njha> | i saw that too late so I ended up just putting it here: http://quarantine.ocf.berkeley.edu/ocfhosted_berkeley.edu |
| 21:09:09 | <@arkiver> | that may be small enough for ArchiveBot |
| 21:09:28 | <njha> | yeah it's not a ton, only 700 sites or so |
| 21:13:10 | <njha> | I operate the webserver vhosting all these sites so I also have a snapshot of the source code (for php/django sites) and databases but I can't make those public for obvious reasons. |
| 21:13:58 | <@arkiver> | yeah, we'll just get the public data |
| 21:40:43 | <@JAA> | Sounds good, will queue that to ArchiveBot later. |
| 21:43:08 | | wyatt8750 joins |
| 21:43:42 | | wyatt8740 quits [Ping timeout: 276 seconds] |
| 21:46:35 | | wyatt8750 quits [Client Quit] |
| 21:47:59 | | wyatt8740 joins |
| 21:50:11 | | march_happy (march_happy) joins |
| 22:24:28 | | Arcorann (Arcorann) joins |
| 22:26:42 | | wyatt8740 quits [Client Quit] |
| 22:26:57 | | wyatt8740 joins |
| 22:35:28 | <@arkiver> | JAA: according to https://wiki.archiveteam.org/index.php/Sweb.cz we archived the 'subdomainfinder' sweb.cz URLs. i got 13352 subdomains for sweb.cz here |
| 22:35:53 | <@arkiver> | many may be dead |
| 22:36:42 | <@arkiver> | transfer.archivete.am/GdQK5/sweb.cz.txt |
| 22:37:04 | <@arkiver> | that is significantly more |
| 22:37:10 | <@arkiver> | is that too much for ArchiveBot to handle? |
| 22:37:51 | | katocala quits [Remote host closed the connection] |
| 23:00:57 | | Orange635252 joins |
| 23:25:53 | | Island joins |