| 00:03:42 | <ROpdebee> | I've uploaded two of the dumps: https://archive.org/details/kodi.tv-20210313 and https://archive.org/details/rescene.wikidot.com-20210315 let me know if there's anything missing. would be great if someone could move those into the appropriate collection for injection into the wayback machine. i've got some other dumps, but they're larger and will |
| 00:03:42 | <ROpdebee> | take a bit more time |
| 00:15:50 | | atphoenix quits [Remote host closed the connection] |
| 00:17:02 | | atphoenix (atphoenix) joins |
| 00:17:17 | | atphoenix quits [Remote host closed the connection] |
| 00:18:23 | | atphoenix (atphoenix) joins |
| 00:24:12 | | Stiletto quits [Ping timeout: 250 seconds] |
| 01:21:42 | | treora quits [Ping timeout: 264 seconds] |
| 01:22:09 | | treora joins |
| 01:35:08 | | Stiletto joins |
| 02:20:49 | | Atom__ joins |
| 02:24:06 | | Atom-- quits [Ping timeout: 264 seconds] |
| 02:39:42 | | Wayward quits [Ping timeout: 264 seconds] |
| 03:04:19 | | Atom-- joins |
| 03:07:54 | | Atom__ quits [Ping timeout: 264 seconds] |
| 03:27:47 | | Mineroboter joins |
| 03:29:30 | | Atom-- quits [Ping timeout: 264 seconds] |
| 03:29:40 | | Mineroboter_ quits [Ping timeout: 250 seconds] |
| 03:48:40 | | nerdguy1138 (nerdguy1138) joins |
| 04:11:48 | | DogsRNice quits [Read error: Connection reset by peer] |
| 04:12:04 | | lennier1 quits [Remote host closed the connection] |
| 04:21:30 | | Wayward (wayward) joins |
| 04:25:32 | | qw3rty__ joins |
| 04:29:31 | | qw3rty_ quits [Ping timeout: 264 seconds] |
| 04:32:02 | | programmerq quits [Remote host closed the connection] |
| 04:39:31 | <tech234a> | HTTPS certificate for www.archiveteam.org expired, now most project icons on the tracker don't load |
| 04:39:56 | <tech234a> | The certificate for archiveteam.org also expired |
| 04:40:40 | <tech234a> | Side note: the Pastebin tracker link uses HTTP while all of the other projects use HTTPS |
| 04:40:54 | <@JAA> | jrwr: ^ |
| 04:41:44 | <@JAA> | tech234a: Fixed the Pastebin link. |
| 04:43:31 | | Atom joins |
| 04:44:04 | <tech234a> | Thanks! |
| 04:48:08 | <@JAA> | jrwr: internetarchive.archiveteam.org as well |
| 04:55:09 | | lennier1 (lennier1) joins |
| 05:00:47 | | treora quits [Client Quit] |
| 05:02:15 | | treora joins |
| 05:16:04 | <Ryz> | Heya folks, I need help on finding the remaining usernames under http://www.afn.org/ - because according to the 'Tuesday, February 26, 2019' entry, they plan to purge inactive user accounts, and I'm not sure if it happened or not |
| 05:16:22 | <Ryz> | I only managed to find a few of them by checking https://old.reddit.com/domain/afn.org/ |
| 05:17:03 | <Ryz> | Some of the websites I archived on AB were around in-between late 1990s to early 2000s in terms of the time period |
| 05:19:49 | <tech234a> | https://www.google.com/search?q=site:http://www.afn.org/ has a fair number of results |
| 05:20:26 | <tech234a> | https://www.bing.com/search?q=site:http://www.afn.org/ also |
| 05:21:47 | <Ryz> | Ooo, nice; need more sources though; would ideally wanna go for an archiving run in one swoop and not worry about potential duplicates down the line when doing another one~ |
| 05:24:29 | <@JAA> | Lots of stuff in the WBM, so some CDX API queries may be useful. |
| 05:25:37 | <Ryz> | Welp, checking Google, some domains can only be found via searching 'Images' and not just 'Web' ><; |
| 05:37:48 | <tech234a> | Here's a list of all sites with a homepage, last updated May 14, 1999 http://web.archive.org/web/20020609152621/http://www.afn.org/users/ |
| 05:38:38 | <tech234a> | 2,251 users |
| 05:42:43 | <Jake> | good find! |
| 05:48:58 | <Ryz> | Hmm, can anyone do a scrape of all the links in http://www.afn.org/users/ in WBM before http://web.archive.org/web/20020609152621/http://www.afn.org/users/ ? Since it stopped updating in 1999 May before the list got nerfed heavily |
| 05:49:16 | <Ryz> | I grabbed everything from Google, Bing, and the WBM user list, |
| 05:50:39 | <Ryz> | But I ask for all the iterations before that specific time because it's possible some userpages only existed for a certain amount of time before disappearing, and were thus removed from the list |
| 05:55:03 | <Ryz> | For instance, in https://web.archive.org/web/19961219062107/http://www.afn.org/users/ - only something like https://web.archive.org/web/19961219065637/http://www.afn.org/~afn04887/ can be found |
| 05:55:16 | <Ryz> | Whereas https://web.archive.org/web/20020609152621/http://www.afn.org/users/ (the final version of the list before nerfing) - it doesn't have it anymore |
| 05:56:17 | <tech234a> | Some user creation logs (has some holes in the dates): http://web.archive.org/web/20000819093808/http://www.afn.org/accounts/created/ http://web.archive.org/web/20070609185203/http://www.afn.org/accounts/created/ |
| 05:59:23 | <tech234a> | Possible account management section(?): http://www.afn.org/~am/, used to have subdirectories admin and password, now all forbidden |
| 05:59:55 | <tech234a> | SSH server at ssh.afn.org |
| 06:12:32 | <Ryz> | Hmm, I guess I can iterate up to a certain number for those account creations~ |
| 06:22:08 | <Ryz> | !archive https://doommonsters.tumblr.com/ --explain "Inactive since 2018 December" --ignoreset singletumblr |
| 06:22:14 | <Ryz> | Oops |
| 06:34:08 | <tech234a> | Perhaps it might make sense to try to reach out to the admin for AFN? They might be willing to help provide a list of URLs. |
| 07:36:43 | | sec^nd quits [*.net *.split] |
| 07:36:43 | | hexa- quits [*.net *.split] |
| 07:36:44 | | @ChanServ quits [*.net *.split] |
| 07:36:44 | | Sylirana quits [*.net *.split] |
| 07:36:44 | | nico_32_ quits [*.net *.split] |
| 07:36:44 | | mgrandi quits [*.net *.split] |
| 07:36:44 | | EggplantN2 quits [*.net *.split] |
| 07:36:45 | | Kenshin quits [*.net *.split] |
| 07:36:45 | | FalconK quits [*.net *.split] |
| 07:36:45 | | atomicthumbs quits [*.net *.split] |
| 07:36:45 | | summerisle quits [*.net *.split] |
| 07:36:45 | | Jon quits [*.net *.split] |
| 07:36:45 | | MrKen quits [*.net *.split] |
| 07:36:46 | | phuzion quits [*.net *.split] |
| 07:36:46 | | Kaz__ quits [*.net *.split] |
| 07:36:46 | | @hook54321 quits [*.net *.split] |
| 07:36:46 | | simon816 quits [*.net *.split] |
| 07:36:46 | | lennier1 quits [*.net *.split] |
| 07:36:46 | | Mineroboter quits [*.net *.split] |
| 07:36:46 | | Mateon1 quits [*.net *.split] |
| 07:36:47 | | wessel1512 quits [*.net *.split] |
| 07:36:47 | | Arcorann quits [*.net *.split] |
| 07:36:47 | | ROpdebee quits [*.net *.split] |
| 07:36:47 | | no112 quits [*.net *.split] |
| 07:36:47 | | nertzy quits [*.net *.split] |
| 07:36:47 | | tzt quits [*.net *.split] |
| 07:36:47 | | Zerote__ quits [*.net *.split] |
| 07:36:47 | | @arkiver quits [*.net *.split] |
| 07:36:47 | | wickedplayer494 quits [*.net *.split] |
| 07:36:47 | | @JAA quits [*.net *.split] |
| 07:36:47 | | seatsea quits [*.net *.split] |
| 07:36:48 | | flashfire42 quits [*.net *.split] |
| 07:36:48 | | onetruth quits [*.net *.split] |
| 07:36:48 | | Hackerpcs quits [*.net *.split] |
| 07:36:48 | | Eighty quits [*.net *.split] |
| 07:36:48 | | fuzzy8021 quits [*.net *.split] |
| 07:36:48 | | Craigle quits [*.net *.split] |
| 07:36:48 | | @dxrt quits [*.net *.split] |
| 07:36:48 | | lunik1 quits [*.net *.split] |
| 07:36:48 | | marked195734 quits [*.net *.split] |
| 07:36:49 | | mjh quits [*.net *.split] |
| 07:36:49 | | tech234a quits [*.net *.split] |
| 07:36:49 | | Ajay1 quits [*.net *.split] |
| 07:36:49 | | @Fusl_ quits [*.net *.split] |
| 07:36:49 | | girst_ quits [*.net *.split] |
| 07:36:49 | | JSharp quits [*.net *.split] |
| 07:36:49 | | devsnek quits [*.net *.split] |
| 07:36:49 | | deni quits [*.net *.split] |
| 07:36:50 | | Lord_Nightmare quits [*.net *.split] |
| 07:36:50 | | maxfan8 quits [*.net *.split] |
| 07:36:50 | | Larsenv quits [*.net *.split] |
| 07:36:50 | | Trieste quits [*.net *.split] |
| 07:36:50 | | DopefishJustin quits [*.net *.split] |
| 07:36:50 | | aarchi quits [*.net *.split] |
| 07:36:50 | | SCSi quits [*.net *.split] |
| 07:36:51 | | linuxgemini quits [*.net *.split] |
| 07:36:51 | | Muad-Dib quits [*.net *.split] |
| 07:36:51 | | billy549 quits [*.net *.split] |
| 07:36:51 | | ThreeHeadedMonkey quits [*.net *.split] |
| 07:36:51 | | rewby quits [*.net *.split] |
| 07:36:51 | | AK quits [*.net *.split] |
| 07:36:51 | | mazet quits [*.net *.split] |
| 07:36:51 | | skrzyp quits [*.net *.split] |
| 07:36:52 | | BPCZ quits [*.net *.split] |
| 07:36:53 | | themadpro quits [*.net *.split] |
| 07:36:53 | | brad quits [*.net *.split] |
| 07:36:53 | | @Fusl quits [*.net *.split] |
| 07:36:53 | | @Kaz quits [*.net *.split] |
| 07:36:53 | | fionera quits [*.net *.split] |
| 07:36:53 | | janpaul123 quits [*.net *.split] |
| 07:36:53 | | Jonimus quits [*.net *.split] |
| 07:36:53 | | @jrwr quits [*.net *.split] |
| 07:36:54 | | sknebel quits [*.net *.split] |
| 07:36:54 | | cadence quits [*.net *.split] |
| 07:36:54 | | kallsyms quits [*.net *.split] |
| 07:36:54 | | colona quits [*.net *.split] |
| 07:36:54 | | HotSwap quits [*.net *.split] |
| 07:36:54 | | @AlsoJAA quits [*.net *.split] |
| 07:36:54 | | luckcolors quits [*.net *.split] |
| 07:37:07 | | sec^nd (second) joins |
| 07:37:07 | | hexa- (hexa-) joins |
| 07:37:07 | | ChanServ joins |
| 07:37:07 | | guybrush.hackint.org sets mode: +o ChanServ |
| 07:38:19 | | lennier1 (lennier1) joins |
| 07:38:19 | | Mineroboter joins |
| 07:38:19 | | Mateon1 joins |
| 07:38:19 | | wessel1512 joins |
| 07:38:19 | | Arcorann (Arcorann) joins |
| 07:38:19 | | ROpdebee (ROpdebee) joins |
| 07:38:19 | | no112 joins |
| 07:38:19 | | nertzy (nertzy) joins |
| 07:38:19 | | tzt joins |
| 07:38:19 | | Zerote__ joins |
| 07:38:19 | | arkiver (arkiver) joins |
| 07:38:19 | | wickedplayer494 (wickedplayer494) joins |
| 07:38:19 | | JAA (JAA) joins |
| 07:38:19 | | seatsea joins |
| 07:38:19 | | flashfire42 (flashfire42) joins |
| 07:38:19 | | onetruth joins |
| 07:38:19 | | Hackerpcs (Hackerpcs) joins |
| 07:38:19 | | Eighty (Eighty) joins |
| 07:38:19 | | fuzzy8021 (fuzzy8021) joins |
| 07:38:19 | | Craigle (Craigle) joins |
| 07:38:19 | | dxrt (dxrt) joins |
| 07:38:19 | | lunik1 joins |
| 07:38:19 | | marked195734 joins |
| 07:38:19 | | mjh (mjh) joins |
| 07:38:19 | | tech234a (tech234a) joins |
| 07:38:19 | | Ajay1 joins |
| 07:38:19 | | Fusl_ (Fusl) joins |
| 07:38:19 | | guybrush.hackint.org sets mode: +oooo arkiver JAA dxrt Fusl_ |
| 07:38:19 | | girst_ (girst) joins |
| 07:38:19 | | JSharp (JSharp) joins |
| 07:38:19 | | devsnek (devsnek) joins |
| 07:38:19 | | deni (deni) joins |
| 07:38:19 | | Lord_Nightmare (Lord_Nightmare) joins |
| 07:38:19 | | maxfan8 (maxfan8) joins |
| 07:38:19 | | Larsenv (Larsenv) joins |
| 07:38:19 | | Trieste joins |
| 07:38:19 | | DopefishJustin (DopefishJustin) joins |
| 07:38:19 | | aarchi (aarchi) joins |
| 07:38:19 | | SCSi (SCSi) joins |
| 07:38:19 | | linuxgemini (linuxgemini) joins |
| 07:38:19 | | Muad-Dib joins |
| 07:38:19 | | billy549 (Billy549) joins |
| 07:38:19 | | ThreeHeadedMonkey (ThreeHeadedMonkey) joins |
| 07:38:19 | | rewby (rewby) joins |
| 07:38:19 | | AK (AK) joins |
| 07:38:19 | | mazet joins |
| 07:38:19 | | skrzyp joins |
| 07:38:19 | | BPCZ (BPCZ) joins |
| 07:38:19 | | themadpro (themadpro) joins |
| 07:38:19 | | brad joins |
| 07:39:07 | | janpaul123 (janpaul123) joins |
| 07:39:07 | | Jonimus joins |
| 07:39:07 | | jrwr (jrwr) joins |
| 07:39:07 | | sknebel (sknebel) joins |
| 07:39:07 | | cadence (cadence) joins |
| 07:39:07 | | kallsyms joins |
| 07:39:07 | | colona joins |
| 07:39:07 | | HotSwap joins |
| 07:39:07 | | AlsoJAA (JAA) joins |
| 07:39:07 | | luckcolors (luckcolors) joins |
| 07:39:07 | | guybrush.hackint.org sets mode: +oo jrwr AlsoJAA |
| 07:40:33 | | Fusl (Fusl) joins |
| 07:40:33 | | Kaz (Kaz) joins |
| 07:40:33 | | fionera (Fionera) joins |
| 07:40:33 | | guybrush.hackint.org sets mode: +oo Fusl Kaz |
| 07:40:39 | | Sylirana (Sylirana) joins |
| 07:40:39 | | nico_32_ joins |
| 07:40:39 | | mgrandi (mgrandi) joins |
| 07:40:39 | | EggplantN2 joins |
| 07:40:39 | | Kenshin joins |
| 07:40:39 | | summerisle (summerisle) joins |
| 07:40:39 | | FalconK (FalconK) joins |
| 07:40:39 | | phuzion (phuzion) joins |
| 07:40:39 | | Jon joins |
| 07:40:39 | | atomicthumbs joins |
| 07:40:39 | | MrKen joins |
| 07:40:39 | | hook54321 (hook54321) joins |
| 07:40:39 | | Kaz__ joins |
| 07:40:39 | | simon816 (simon816) joins |
| 07:40:39 | | guybrush.hackint.org sets mode: +o hook54321 |
| 07:40:42 | | @hook54321 quits [Max SendQ exceeded] |
| 07:41:38 | | hook54321 (hook54321) joins |
| 07:41:38 | | @ChanServ sets mode: +o hook54321 |
| 07:50:52 | | Wayward- (wayward) joins |
| 07:51:06 | | Wayward quits [Ping timeout: 264 seconds] |
| 07:58:29 | | Zopolis4 (Zopolis4) joins |
| 08:13:41 | | DopefishJustin quits [Read error: Connection reset by peer] |
| 08:20:30 | | DopefishJustin joins |
| 08:20:30 | | DopefishJustin is now authenticated as DopefishJustin |
| 08:47:18 | | hooway joins |
| 09:12:14 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 09:15:08 | | Lord_Nightmare (Lord_Nightmare) joins |
| 09:44:23 | | BlueMaxima quits [Client Quit] |
| 10:27:36 | | Arcorann_ joins |
| 10:30:52 | | Arcorann quits [Ping timeout: 250 seconds] |
| 10:41:34 | <ROpdebee> | Are there any plans/interest to revive the MusicBrainz outlinks project? It's been over 5 years since it was run, so there's lots of new links |
| 10:42:00 | <ROpdebee> | I estimate roughly 15M unique outlinks, can quite easily be gathered from the database dumps |
| 10:50:28 | <@EggplantN> | ROpdebee have we previously archived it? |
| 10:50:39 | <@EggplantN> | If so can you link me the IA collection |
| 10:51:56 | <ROpdebee> | EggplantN: https://archive.org/details/archiveteam_musicbrainz |
| 10:52:19 | <@EggplantN> | Okay lemme call in rewby |
| 10:52:21 | <ROpdebee> | There's also a wiki page here https://wiki.archiveteam.org/index.php/MusicBrainz |
| 10:52:44 | <@EggplantN> | are you able to do that IA collection and get me the links specifically from that collection rewby ? |
| 10:53:05 | <rewby> | Uh. Lemme check a few things |
| 10:54:03 | <rewby> | I think I can do that |
| 10:54:20 | <@EggplantN> | Ah it’s only 1TB we will let you know ROpdebee. What we will likely do if we can is grab all the outlinks and queue them into #// project |
| 10:54:38 | <@EggplantN> | need any boxes rewby to run it or is it small enough to get done quickly lol |
| 10:55:04 | <ROpdebee> | lemme know, i can dump out all of the current links in a couple hours if necessary |
| 10:55:10 | <rewby> | EggplantN: I'll have to take a quick look later today. I'm in the middle of something. |
| 10:55:37 | <@EggplantN> | No problemo rewby do let me know |
| 10:56:00 | <ROpdebee> | my main concern is the large number of Spotify links though. very js-heavy page, just grabbing the page itself is useless |
| 10:56:08 | <@EggplantN> | we will run it through rewby’s system ROpdebee |
| 10:56:22 | <rewby> | I've got an industrial scale warcfile link extractor |
| 10:57:02 | <@EggplantN> | We don’t have any other option for now. We can run just the basic page through it’s better than nothing |
| 10:57:57 | <rewby> | Do you want to wait for me to implement that dedupe you asked for or are you good doing a manual filtering out of assets? Since we know the current software is prone to giving you media files. |
| 10:58:32 | <@EggplantN> | Go wild rewby. I’ll check once we have the links |
| 10:58:55 | <rewby> | Actually. Hm. I can probably just do a simple query afterwards and just grab this collection then grep -v musicbrainz |
| 10:59:04 | <rewby> | Sure thing. |
| 10:59:08 | <rewby> | I'll queue it later today |
| 10:59:14 | <rewby> | I'll let you know if I need more compute |
| 11:00:00 | <rewby> | Oh only 25 files? |
| 11:00:07 | <rewby> | That'll take like an hour at most |
| 11:02:42 | <rewby> | Oh wait, these are proper megawarcs. Eh maybe a few hours. |
| 11:02:47 | <ROpdebee> | i'm not sure i'm fully understanding, pretty new here, sorry |
| 11:03:03 | <ROpdebee> | are you guys talking about extracting the old outlinks from the warcs uploaded in 2016? |
| 11:03:09 | <rewby> | Yeah |
| 11:03:18 | <rewby> | That's my speciality |
| 11:03:28 | <ROpdebee> | i don't think that's necessary though |
| 11:03:38 | <rewby> | Why not? |
| 11:04:17 | <ROpdebee> | those old warcs don't contain musicbrainz pages AFAIK, they contain the outlinks themselves. those should be in https://archive.org/details/archiveteam_musicbrainz_items_2016010801 and the other two items at the bottom of the collection |
| 11:04:41 | <ROpdebee> | besides, there's 5 years worth of new outlinks too |
| 11:05:23 | <rewby> | The old warcs don't contain musicbrainz pages? Then what would they contain? |
| 11:05:33 | <rewby> | So you want to do a whole new scan of musicbrainz? |
| 11:05:48 | <ROpdebee> | https://musicbrainz.org/doc/MusicBrainz_Database/Download |
| 11:06:11 | <ROpdebee> | downloading that dump and extracting the links should be fine to find the outlinks |
| 11:07:34 | <ROpdebee> | archiving MB itself isn't that useful, since they publish their DB every 3 days and the server itself is open source, so if it goes down (and that won't be soon), it should be possible to restore |
| 11:10:38 | <rewby> | I see. |
| 11:10:56 | <rewby> | Yeah I'm not set up to do that type of extraction |
| 11:12:10 | <ROpdebee> | i'll see if i can do it, will report back soon |
| 11:12:19 | <rewby> | I'll have to kick this back at EggplantN. If you can dump the urls out as you said, he can queue them into #// easily |
| 11:13:06 | <rewby> | Apologies for wasting time. I was under the impression that you wanted outlinks from a set of previously archived web pages. |
| 11:21:25 | <@EggplantN> | Yeah I was as well right. So you want to archive the latest outlinks from MB ROpdebee |
| 11:21:45 | <@EggplantN> | You said they publish their DB every 3 days? Got a link to it? |
| 11:22:01 | <ROpdebee> | this is a mirror of the latest dump: https://mirrors.dotsrc.org/MusicBrainz/data/fullexport/20210313-001745/ |
| 11:22:26 | <ROpdebee> | i'm grabbing the relevant portions now, i've extracted these links before so i know where to find them :) |
| 11:23:08 | <ROpdebee> | they also have some sort of incremental DB update stream that receives updates every hour, so in an ideal world any link in those could be queued automatically |
| 11:23:20 | <@EggplantN> | Yeah if you can tell me where and what process you do, let’s automate this |
| 11:31:05 | <hooway> | are you guys planning on archiving something awful forums? |
| 11:31:18 | <OrIdow6> | EggplantN: https://musicbrainz.org/doc/Live_Data_Feed |
| 11:32:49 | <OrIdow6> | hooway: Is something happening to the forums? I know the site proper changed mid- last year |
| 11:32:55 | <OrIdow6> | *changed owner |
| 11:33:15 | <hooway> | i think not, but i don't see any dumps on SA forums |
| 11:33:32 | <hooway> | of* |
| 11:33:42 | <hooway> | only few on archive.org |
| 11:35:42 | <hooway> | you may need an account to view and save attachments and to view subforums limited to registered users only |
| 11:37:14 | <hooway> | SA is really old and really needs to be archived, it's been around since 1999 iirc |
| 11:37:24 | <hooway> | so there is a lot of history |
| 11:37:37 | <OrIdow6> | You need to pay to see old posts, though, don't you? |
| 11:37:59 | <hooway> | yes |
| 11:42:27 | <OrIdow6> | <Ryz> Hmm, can anyone do a scrape of all the links in http://www.afn.org/users/ in WBM before http://web.archive.org/web/20020609152621/http://www.afn.org/users/ ? Since it stopped updating in 1999 May before the list got nerfed heavily |
| 11:42:30 | <OrIdow6> | https://transfer.notkiska.pw/ieEpm/afn_users_wbm_scrape_before_20020609152621 |
| 11:43:12 | <OrIdow6> | hooway: What we can get now is of limited value, then |
| 11:44:12 | <OrIdow6> | I think |
| 11:44:55 | <rewby> | Unless someone with a premium account is willing to potentially burn it. |
| 11:45:17 | <rewby> | Since I'd think that the admins probably don't want a public archive if they charge for access to old data |
| 11:45:41 | <OrIdow6> | Yeah |
| 11:47:02 | <ROpdebee> | EggplantN: Here or in a more dedicated channel? |
| 11:48:15 | <@EggplantN> | Either DM me freely and I’ll take a look at what we can do |
| 11:49:48 | <hooway> | OrIdow6: get what can without registration |
| 11:50:06 | <hooway> | and also you need to buy "archive" pass to access the archives |
| 12:02:39 | | themadpro quits [Read error: Connection reset by peer] |
| 12:03:01 | | themadpro (themadpro) joins |
| 12:21:22 | | Mateon1 quits [Ping timeout: 250 seconds] |
| 12:22:02 | | Mateon1 joins |
| 12:40:27 | <themadpro> | Uh... why not start an #archivebot job? |
| 12:41:16 | <themadpro> | Oh, registration, gotcha |
| 13:58:52 | | Mateon1 quits [Ping timeout: 250 seconds] |
| 14:01:56 | | programmerq (programmerq) joins |
| 14:09:16 | | Arcorann_ quits [Ping timeout: 250 seconds] |
| 15:18:34 | | godane (godane) joins |
| 15:19:00 | | Mateon1 joins |
| 15:57:59 | | VerifiedJ quits [Quit: The Lounge - https://thelounge.chat] |
| 15:59:28 | | VerifiedJ (VerifiedJ) joins |
| 17:01:32 | | ROpdebee quits [Client Quit] |
| 17:01:53 | | ROpdebee joins |
| 17:28:29 | | jtagcat (jtagcat) joins |
| 17:34:02 | <Ryz> | Thanks for the file OrIdow6, moments earlier I just filtered out the duplicates of the users under http://www.afn.org/ - removing the duplicates and the archives I already did in AB, I got maybe around 4 thousand userpages I'll have to go through |
| 17:36:40 | <jtagcat> | Hello, I've been appointed by @Ryz to ask here. I have tons of unused hardware, in theory tons of time; in practice, bursts of overnighters with bash, py, (somewhat go, and whatever you throw at 'man/docs absorber') capabilities. What can I do for you. |
| 17:43:59 | <@EggplantN> | jtagcat firstly can you define "tons of unused hardware" how much are we talking, how much bandwidth and how many IPs |
| 17:50:27 | | wessel1512 is now authenticated as wessel1512 |
| 17:59:07 | <jtagcat> | EggplantN: Keeping in mind, that it's not all to me, the person. 10+10Gbps from 2 ISPs. Both have one IPv4 /24 (again, not all mine, cost divided by the number of IPs, one ends up around 2-3€/mo; it's probably the last blocks we'll ever get — not smart to get them all banned). As for hardware, piles of blades. The only concern being power usage |
| 17:59:07 | <jtagcat> | (current bill is 600€/mo, 0.132€/kWh). There are 6x 2.7kW, so the capacity is around 9kW, 4kW of which is currently not in use. |
| 18:00:08 | <jtagcat> | In some sense we're peering, also, abuse mails come straight to me. |
| 18:00:12 | <@EggplantN> | Ah fair enough we've had people with "tons of hardware" be a couple of Kimsufi's before lol |
| 18:00:59 | | jtagcat looking with a confused face, "here, let's mine crypto with Wii-s' |
| 18:01:21 | <@EggplantN> | Okay best main use would be for projects. You can run the ArchiveTeam Warrior we automatically change what project to the most important one :) |
| 18:02:34 | <jtagcat> | aight, but I'd think it'd be reasonable to run on docker, if you see #warrior, I asked about unattended configging and volume mounts. |
| 18:03:10 | <jtagcat> | as far of the fleshware, I don't have 'tons of' on that scale, but I do have some time |
| 18:03:41 | <jtagcat> | * tons of: meant to be time |
| 18:39:53 | | no112 quits [Read error: Connection reset by peer] |
| 18:40:02 | | no112 joins |
| 19:11:05 | <SCSi> | am i the only one who realized that archiveteam.org's cert is invalid on their website? |
| 19:11:11 | <SCSi> | like it expired |
| 19:13:36 | <@HCross> | we're aware |
| 19:13:54 | | @HCross sets the topic to: WE'RE AWARE OF THE SSL EXPIRY | Lengthy Archive Team related discussions here | General archiving & offtopic: #archiveteam-ot | https://twitter.com/textfiles/status/1069715869994020867 |
| 19:15:06 | | katocala quits [Ping timeout: 264 seconds] |
| 19:16:46 | | katocala joins |
| 19:30:53 | | Jonboy3451 joins |
| 19:30:53 | | Jonboy345 quits [Read error: Connection reset by peer] |
| 20:07:30 | | katocala is now authenticated as katocala |
| 20:57:57 | | ROpdebee is now authenticated as ROpdebee |
| 20:57:57 | | ROpdebee quits [Changing host] |
| 20:57:57 | | ROpdebee (ROpdebee) joins |
| 21:40:57 | | BlueMaxima joins |
| 22:06:16 | <mgrandi> | so, anyone here have the old user list from this project? https://github.com/ArchiveTeam/furaffinity-discovery |
| 22:06:57 | <mgrandi> | i'm running my own discovery project for users and i can merge that user list with my new one maybe |
| 22:07:47 | <OrIdow6> | mgrandi: https://github.com/ArchiveTeam/furaffinity-items |
| 22:08:29 | <mgrandi> | oh, well thats handy |
| 22:09:40 | <mgrandi> | i'm running my own scrape as i have no idea if that old code even works, but afterwards a warrior project could be made to reup the backup, as its been 6 years |
| 22:11:31 | <OrIdow6> | Is something happening to this site? |
| 22:13:23 | <mgrandi> | Apparently, 6 years ago, IMVU bought it, and then recently IMVU sold it back to the original owner so now it's a 1 man show again and people are nervous he might not have the money to keep it running |
| 22:14:34 | <mgrandi> | So I guess a preemptive setup in case it does need an evacuation |
| 22:15:04 | <OrIdow6> | Oh |
| 22:15:11 | <mgrandi> | Plus it gave me a chance to play with aiohttp and sqlalchemy's new asyncio apis :) |
| 22:16:10 | <mgrandi> | And to implement baby's first seesaw/warrior project implementation only using a database |
| 22:20:26 | <OrIdow6> | "only using a database"? |
| 22:20:42 | <mgrandi> | Aka locking / distribution of work |
| 22:22:04 | <@JAA> | aiohttp and database-based distribution sounds a lot like qwarc. |
| 22:22:07 | <mgrandi> | Not having like the warrior strategy of having redis / server process and then a separate client that consumes a job, finishes it and reports back |
| 22:22:50 | | tzt quits [Ping timeout: 250 seconds] |
| 22:23:13 | <mgrandi> | @JAA: yeah , I assume a more polished version has already been written and is in use, this is just a weekend project and a relatively easy task |
| 22:23:43 | <@JAA> | Yes, qwarc is a very polished turd. :-P |
| 22:23:59 | <mgrandi> | But if you are using sqlalchemy for anything, sqla 1.4 just came out yesterday which has an asyncio api now |
| 22:24:41 | <@JAA> | Good to know, but in my experience, SQLAlchemy is way too slow for anything high-performance. |
| 22:24:53 | | tzt joins |
| 22:25:05 | <mgrandi> | Well now you can get around the slowness with asyncio :) |
| 22:25:17 | <@JAA> | Well, yes, but actually no. It'll still eat CPU time. |
| 22:25:37 | <@JAA> | I'm talking about running stuff where I saturate all cores on a machine. |
| 22:25:58 | <@JAA> | It's the extra processing by SQLAlchemy that throws a wrench into things there. |
| 22:26:01 | <mgrandi> | Yeah, probably not the best use case there |
| 22:26:21 | <@JAA> | But for less intense things, it's perfectly fine, yeah. |
| 22:26:29 | <mgrandi> | I think even the core stuff works with the asyncio which is somewhat close to bare driver speeds |
| 22:26:37 | <@JAA> | wpull uses it (for now). |
| 22:27:03 | <@JAA> | Yeah, but at that point, you might as well use aiosqlite etc. |
| 22:27:46 | <@JAA> | The big advantage of SQLAlchemy is the abstraction of the specifics of a particular DB backend, so you can support anything with zero extra effort. |
| 22:27:46 | <mgrandi> | Even then, having the connection / engine handling code is useful, raw handling of that stuff is annoying |
| 22:28:20 | <mgrandi> | Well I guess python has dbapi2 , less of a thing |
| 22:28:30 | <mgrandi> | But in every other language it's obnoxious as hell :) |
| 22:29:20 | <@JAA> | Yeah, Python is nice in that regard. |
| 22:30:02 | <mgrandi> | Nothing like creating prepared statements in java and having them tied to the connection and having to recreate them If the connection times out and aaaaaaaaa |
| 22:40:38 | <Ryz> | Was searching for maybe more http://www.afn.org/ userpage websites, apparently http://www.afn.org/~afn06264/ wasn't found in any of the lists I got at all~ |
| 22:40:57 | <Ryz> | I still haven't really gotten the WBM CDX; wondering if that makes a substantial difference |
| 22:44:37 | | Arcorann_ joins |
| 22:57:38 | | onetruth quits [Read error: Connection reset by peer] |
| 23:00:23 | <OrIdow6> | Ryz: https://transfer.notkiska.pw/LVVWB/afn_users_cdx_notpaginated.txt |
| 23:15:17 | | lennier1 quits [Client Quit] |
| 23:16:52 | | lennier1 (lennier1) joins |
| 23:19:49 | <Ryz> | Thanks OrIdow6, there's some more userpage websites that the other lists didn't have~ o: |
| 23:22:12 | | Arcorann_ quits [Ping timeout: 249 seconds] |
| 23:24:13 | <OrIdow6> | Ryz: No problem |
| 23:24:22 | <OrIdow6> | I should write something to automate that process |
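
The CDX lookup discussed above (listing every afn.org userpage the Wayback Machine knows about) could be automated roughly as follows. This is only a sketch, not the script OrIdow6 used: the function names `build_cdx_url`, `extract_user_roots` and `fetch_lines` are made up for illustration, and it assumes a prefix query on `www.afn.org/~` matches the way the captures are indexed, which may need adjusting.

```python
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine CDX endpoint.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"

# Captures the "~username" path component directly after the host.
USER_RE = re.compile(
    r"^https?://(?:www\.)?afn\.org(?::\d+)?/(~[^/?#]+)", re.IGNORECASE
)


def build_cdx_url(prefix="www.afn.org/~"):
    """Build a CDX query returning one original URL per distinct urlkey."""
    params = {
        "url": prefix,
        "matchType": "prefix",   # everything under the prefix
        "fl": "original",        # only output the original URL column
        "collapse": "urlkey",    # drop repeated captures of the same URL
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def extract_user_roots(lines):
    """Reduce a list of captured URLs to sorted, unique userpage roots."""
    roots = set()
    for line in lines:
        match = USER_RE.match(line.strip())
        if match:
            roots.add("http://www.afn.org/" + match.group(1) + "/")
    return sorted(roots)


def fetch_lines(url):
    """Download a CDX response and split it into lines (network access)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace").splitlines()
```

Calling `extract_user_roots(fetch_lines(build_cdx_url()))` would then give a deduplicated list to merge with the Google, Bing and user-list results. Large result sets may need the CDX API's pagination, which this sketch does not handle.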