| 00:05:56 | <schwarzkatz> | any progress on uploadir yet? :D |
| 00:47:52 | <anarcat> | JAA: thanks |
| 01:01:35 | | katocala quits [Remote host closed the connection] |
| 01:02:57 | <fishingforsoup_> | Can all of this be archived? |
| 01:03:02 | <fishingforsoup_> | https://www.reddit.com/r/kpop/comments/zndxgy/traineeas_social_media_channels_including_youtube/ |
| 01:03:16 | <fishingforsoup_> | I never followed the group, but it might be hard to find years down the line. |
| 01:12:03 | | katocala joins |
| 01:12:29 | | katocala is now authenticated as katocala |
| 01:13:49 | | lennier1 quits [Client Quit] |
| 01:14:12 | | lennier1 (lennier1) joins |
| 02:56:59 | | sonick quits [Client Quit] |
| 03:35:10 | | Island quits [Read error: Connection reset by peer] |
| 03:54:24 | <Ryz> | So, out of pure curiosity (and boredom as I archive a bunch of Pastebin URLs via WBM/SPN), I checked out Fiverr and searched the word 'archive'; apparently there are people who offer services to restore websites from the Wayback Machine back into Wordpress Oo; |
| 03:55:21 | <Ryz> | Or just another website |
| 04:08:33 | | treora joins |
| 04:09:52 | | treora quits [Remote host closed the connection] |
| 04:09:54 | | treora joins |
| 04:32:51 | <@OrIdow6> | I did look into that a while ago and as far as I could tell that was used a lot (though not exclusively) by spammers who would buy expiring domains, restore the content from the WBM, throw some ads onto them, and put them online |
| 04:45:47 | | wyatt8750 quits [Ping timeout: 250 seconds] |
| 05:08:42 | | wyatt8740 joins |
| 05:12:04 | | wyatt8750 joins |
| 05:13:14 | | wyatt8740 quits [Ping timeout: 264 seconds] |
| 05:26:27 | | sonick (sonick) joins |
| 05:30:47 | | Hackerpcs (Hackerpcs) joins |
| 05:50:28 | | lukash799 joins |
| 07:18:56 | | programmerq quits [Remote host closed the connection] |
| 07:28:56 | | hitgrr8 joins |
| 07:55:45 | <monika> | do they even bother stripping out the timings+copyright info in the html? 😂 |
| 08:56:42 | | michaelblob_ quits [Read error: Connection reset by peer] |
| 09:19:32 | | michaelblob (michaelblob) joins |
| 10:38:12 | <mgrandi> | Furaffinity forums going read only, closing eventually, uses xenforo: https://forums.furaffinity.net/threads/forum-closure-fa-discord-coming-soon.1682702/ |
| 11:23:22 | | immibis (immibis) joins |
| 11:33:59 | | jacksonchen666 quits [Remote host closed the connection] |
| 11:43:52 | | jacksonchen666 (jacksonchen666) joins |
| 13:31:11 | | mut4ntm0nkey quits [Remote host closed the connection] |
| 13:31:29 | | mut4ntm0nkey (mutantmonkey) joins |
| 13:31:39 | | Arcorann_ quits [Ping timeout: 265 seconds] |
| 14:29:13 | | IDK quits [Client Quit] |
| 14:30:00 | | IDK (IDK) joins |
| 14:45:17 | | mut4ntm0nkey quits [Remote host closed the connection] |
| 14:46:17 | | mut4ntm0nkey (mutantmonkey) joins |
| 15:04:44 | | Wingy quits [Client Quit] |
| 15:05:22 | | Wingy (Wingy) joins |
| 15:15:31 | <qwertyasdfuiopghjkl> | Some relevant-seeming info in that thread: The forums will go read-only on 2023-01-01 and be fully deleted at some time "in the first quarter [of 2023]". ( https://forums.furaffinity.net/threads/forum-closure-fa-discord-coming-soon.1682702/page-5#post-7378226 , |
| 15:15:32 | <qwertyasdfuiopghjkl> | https://forums.furaffinity.net/threads/forum-closure-fa-discord-coming-soon.1682702/page-9#post-7378589 ) |
| 15:15:32 | <qwertyasdfuiopghjkl> | An administrator of the site is looking for a way to "crawl the entirety of the forum" to make a public archive ( https://forums.furaffinity.net/threads/forum-closure-fa-discord-coming-soon.1682702/page-10#post-7378609 , https://forums.furaffinity.net/threads/forum-closure-fa-discord-coming-soon.1682702/page-10#post-7378623 ), so contacting them |
| 15:15:33 | <qwertyasdfuiopghjkl> | might be an option. |
| 15:17:01 | | mut4ntm0nkey quits [Remote host closed the connection] |
| 15:17:07 | | mut4ntm0nkey (mutantmonkey) joins |
| 15:35:38 | | dhrrr joins |
| 15:44:12 | <dhrrr> | Are there any plans for Twitter? While an exhaustive archive would be infeasible, can we at least archive popular tweets (for example, those with at least a certain number of likes)? snscrape allows filtering by like count |
| 15:44:50 | <ivan> | Twitter users can be submitted in #archivebot |
| 15:45:30 | <ivan> | there's a lot of Twitter in WBM |
| 15:46:42 | <dhrrr> | Yes, but Twitter also has many "main characters of the day" and isolated popular tweets. People who are not notable and that won't be sent to #archivebot. |
| 15:48:29 | <ivan> | if you have methods to make a good list of users, there's a way to submit a lot of users |
| 15:50:36 | | HP_Archivist (HP_Archivist) joins |
| 15:50:37 | <ivan> | you can, for example, scrape follows and analyze graphs of follow relationships |
| 15:50:57 | | Jonimus quits [Ping timeout: 250 seconds] |
| 15:52:51 | <dhrrr> | I'm aware of that. I'm just surprised that there isn't an ArchiveTeam project for dealing with popular tweets regardless of who posted them |
| 16:15:42 | | HP_Archivist quits [Client Quit] |
| 16:20:44 | | dhrrr quits [Remote host closed the connection] |
| 16:38:12 | | atphoenix_ is now known as atphoenix |
| 16:51:59 | | JensRex quits [Client Quit] |
| 16:52:27 | | JensRex (JensRex) joins |
| 16:53:21 | | Jonimus joins |
| 17:19:42 | | Island joins |
| 17:29:44 | | Island_ joins |
| 17:31:55 | | Island quits [Ping timeout: 250 seconds] |
| 17:38:45 | | lukash799 quits [Client Quit] |
| 17:44:41 | | lukash799 joins |
| 18:06:46 | <@OrIdow6> | monika: I suspect but am not sure that this is partially responsible for the proliferation of "welcome to the US petabox" (Google it in quotes) pages that A oede noticed a while ago |
| 18:10:25 | <@OrIdow6> | For a more modern example, here's a site that seems to be doing this https://germanyweek.org/ - notice the links about online casinos at the bottom - and here https://germanyweek.org/program is a page where they've messed up on the crawling and copied a WBM error page |
| 18:10:31 | | jacksonchen666 quits [Ping timeout: 245 seconds] |
| 18:10:39 | | mut4ntm0nkey quits [Remote host closed the connection] |
| 18:11:41 | <@OrIdow6> | And I am being deliberate with putting this in bs instead of ot, I do think this is relevant to resource allocation on smaller sites and thought about doing some kind of writeup about it eventually |
| 18:12:18 | | Megame (Megame) joins |
| 18:13:02 | | mut4ntm0nkey (mutantmonkey) joins |
| 18:16:39 | <@OrIdow6> | Sounds like a good idea qwertyasdfuiopghjkl |
| 18:23:03 | <@OrIdow6> | Anyone want to do it? |
| 18:23:33 | <@OrIdow6> | This group seems to be full of furries so I don't think we should have difficulty establishing rapport at that end |
| 18:34:57 | | Island joins |
| 18:36:02 | | Island_ quits [Ping timeout: 264 seconds] |
| 18:47:07 | | mut4ntm0nkey quits [Remote host closed the connection] |
| 18:47:37 | | mut4ntm0nkey (mutantmonkey) joins |
| 18:49:51 | | spirit joins |
| 19:06:23 | | Island quits [Ping timeout: 250 seconds] |
| 19:13:37 | | Island joins |
| 19:17:43 | | Megame quits [Client Quit] |
| 19:20:26 | | Island quits [Ping timeout: 264 seconds] |
| 19:35:36 | <mgrandi> | I'll pm the mod, see if I can start the conversation |
| 19:36:37 | <@OrIdow6> | Ok |
| 19:39:56 | | Island joins |
| 19:41:12 | <mgrandi> | I have also experimented with archiving the main FA content; they don't have any rate limiting or anything that I can see, though it is behind Cloudflare |
| 19:41:59 | | Island_ joins |
| 19:44:47 | | Island quits [Ping timeout: 265 seconds] |
| 19:51:47 | <@OrIdow6> | We ran a project for that half a decade ago or so I believe |
| 19:52:55 | | Island joins |
| 19:53:55 | <@JAA> | Oh wow, that project used wpull, interesting. |
| 19:54:27 | | Island_ quits [Ping timeout: 265 seconds] |
| 20:01:35 | | Island_ joins |
| 20:02:30 | <@OrIdow6> | Sounds like Python version misery to me |
| 20:04:50 | | Island quits [Ping timeout: 264 seconds] |
| 20:07:02 | | hitgrr8 quits [Client Quit] |
| 20:07:24 | | lukash799 quits [Client Quit] |
| 20:13:25 | | lukash799 joins |
| 20:36:13 | <mgrandi> | Yeah, I used wpull for that as well, for my own infrastructure set up, it's pretty easy, once you have the cloudflare key, I need to set up some ignore rules and then rerun it to get the media |
| 20:50:04 | | godane (godane) joins |
| 21:09:12 | <@OrIdow6> | Depending on how the forum thing goes it may raise the possibility of doing that without risking antagonizing them |
| 21:39:45 | <schwarzkatz> | always cool to see site owners wanting to help archive their sites. I have a good feeling about this FA project. |
| 21:41:30 | | jacksonchen666 (jacksonchen666) joins |
| 22:50:31 | | sec^nd quits [Ping timeout: 245 seconds] |
| 22:55:10 | | sec^nd (second) joins |
| 23:09:29 | <@arkiver> | qwertyasdfuiopghjkl: please add it to deathwatch |
| 23:28:05 | | wyatt8750 quits [Ping timeout: 265 seconds] |
| 23:28:32 | | wyatt8740 joins |
| 23:30:26 | | jacksonchen666 quits [Remote host closed the connection] |
| 23:50:36 | | jacksonchen666 (jacksonchen666) joins |