| 00:22:25 | | Mineroboter quits [Client Quit] |
| 00:34:24 | | IKI quits [Remote host closed the connection] |
| 01:03:31 | | dm4v quits [Ping timeout: 258 seconds] |
| 01:04:22 | | dm4v joins |
| 01:04:24 | | dm4v is now authenticated as dm4v |
| 01:04:24 | | dm4v quits [Changing host] |
| 01:04:24 | | dm4v (dm4v) joins |
| 01:07:47 | <@JAA> | betamax: So. Many. Broken. URLs. :-| |
| 01:29:44 | | IKI joins |
| 01:42:50 | | TheTechRobo joins |
| 01:43:14 | <TheTechRobo> | Hello |
| 01:43:18 | <TheTechRobo> | I think that this is the correct channel |
| 01:43:32 | <TheTechRobo> | Any chance that the https://wiki.archiveteam.org/index.php/Dev will be updated ? |
| 01:43:40 | <TheTechRobo> | Most if not all pages were last updated in 2015 |
| 01:43:44 | <TheTechRobo> | and they use python 2 |
| 01:44:21 | <TheTechRobo> | *correction: https://wiki.archiveteam.org/index.php/Dev/Seesaw uses python 2 |
| 01:44:33 | <TheTechRobo> | would it still work with python 3? |
| 02:06:23 | | Wayward quits [Remote host closed the connection] |
| 02:06:23 | | systwi quits [Read error: Connection reset by peer] |
| 02:06:23 | | @dxrt quits [Quit: ZNC - http://znc.sourceforge.net] |
| 02:06:23 | | Aoede quits [Client Quit] |
| 02:06:23 | | PlsNoJava quits [Read error: Connection reset by peer] |
| 02:06:23 | | girst quits [Ping timeout: 250 seconds] |
| 02:06:44 | | billy549 quits [Ping timeout: 250 seconds] |
| 02:07:10 | | rewby quits [Ping timeout: 250 seconds] |
| 02:07:21 | | Muad-Dib quits [Read error: Connection reset by peer] |
| 02:07:29 | | PlsNoJava4 (ROpdebee) joins |
| 02:07:29 | | systwi_ (systwi) joins |
| 02:07:29 | | Aoede_ (Aoede) joins |
| 02:07:29 | | Wayward- (wayward) joins |
| 02:07:29 | | nepeat_ joins |
| 02:07:29 | | PlsNoJava4 is now known as PlsNoJava |
| 02:07:29 | | girst (girst) joins |
| 02:07:30 | | rewby (rewby) joins |
| 02:07:31 | | Muad_Dib joins |
| 02:07:36 | | nepeat quits [Ping timeout: 250 seconds] |
| 02:07:36 | | voltagex quits [Ping timeout: 250 seconds] |
| 02:07:54 | | voltagex joins |
| 02:08:02 | | @Fusl quits [Ping timeout: 250 seconds] |
| 02:08:22 | | Fusl (Fusl) joins |
| 02:08:22 | | @ChanServ sets mode: +o Fusl |
| 02:08:23 | | dxrt joins |
| 02:08:26 | | dxrt is now authenticated as dxrt |
| 02:08:26 | | dxrt quits [Changing host] |
| 02:08:26 | | dxrt (dxrt) joins |
| 02:08:26 | | @ChanServ sets mode: +o dxrt |
| 02:08:33 | <@JAA> | Yep, that section is pretty outdated. |
| 02:08:39 | <@JAA> | Yes, seesaw works fine with Python 3. |
| 02:17:08 | | minus quits [Ping timeout: 250 seconds] |
| 02:17:08 | <TheTechRobo> | Sounds good, thanks |
| 02:17:34 | | t3chler quits [Ping timeout: 250 seconds] |
| 02:17:34 | | yawkat quits [Ping timeout: 250 seconds] |
| 02:17:34 | | masterX244 quits [Ping timeout: 250 seconds] |
| 02:17:50 | | yawkat (yawkat) joins |
| 02:18:00 | | rewby quits [Ping timeout: 250 seconds] |
| 02:18:00 | | SCSi quits [Ping timeout: 250 seconds] |
| 02:18:10 | | SCSi (SCSi) joins |
| 02:18:12 | | rewby (rewby) joins |
| 02:18:21 | | masterX244 (masterX244) joins |
| 02:18:37 | | t3chler joins |
| 02:19:02 | | TheTechRobo quits [Remote host closed the connection] |
| 02:20:15 | | minus joins |
| 02:34:28 | | billy549 (Billy549) joins |
| 03:03:21 | <@JAA> | betamax: Queueing has begun. 1437 sites after lots of cleanup and dedupe and whatnot. |
| 03:30:08 | <flashfire42> | Ok, start grabbing anything related to Israel and Gaza; they are using white phosphorus |
| 03:36:59 | | qw3rty_ joins |
| 03:40:41 | | qw3rty quits [Ping timeout: 258 seconds] |
| 03:49:26 | | Laszlo joins |
| 03:50:08 | | qwertyasdf joins |
| 03:53:46 | | shoghicp quits [Ping timeout: 250 seconds] |
| 03:56:17 | | Laszlo quits [Remote host closed the connection] |
| 03:57:53 | | qwertyasdf quits [Remote host closed the connection] |
| 03:59:17 | | qwertyasdf joins |
| 04:07:38 | | lukash7 quits [Ping timeout: 250 seconds] |
| 04:07:54 | | Jonboy345 quits [Ping timeout: 258 seconds] |
| 04:20:16 | | Jonboy345 joins |
| 04:24:56 | | mutantmnky quits [Remote host closed the connection] |
| 04:25:13 | | mutantmnky (mutantmonkey) joins |
| 04:34:07 | | lukash7 joins |
| 04:43:58 | | Jonboy3451 joins |
| 04:44:50 | | jonboy3452 joins |
| 04:47:04 | | Jonboy345 quits [Ping timeout: 250 seconds] |
| 04:48:32 | | Jonboy3451 quits [Ping timeout: 258 seconds] |
| 04:48:59 | | qwertyasdf is now known as qwertyasdfuiopghjkl |
| 04:51:02 | | lennier2 joins |
| 04:53:54 | | lennier1 quits [Ping timeout: 258 seconds] |
| 04:54:03 | | lennier2 is now known as lennier1 |
| 07:36:33 | | duce1337 (duce1337) joins |
| 07:47:01 | | qwertyasdf joins |
| 07:47:01 | | qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds] |
| 07:57:57 | | qwertyasdf quits [Remote host closed the connection] |
| 08:08:46 | | shoghicp (shoghicp) joins |
| 08:23:35 | | yawkat quits [Ping timeout: 258 seconds] |
| 08:24:22 | | lennier1 quits [Client Quit] |
| 08:33:21 | | yawkat (yawkat) joins |
| 08:34:43 | | lennier1 (lennier1) joins |
| 08:35:12 | <mgrandi> | Do we have a coordinated effort for getting Google Doc URLs? Apparently they are gonna start wiping those if they are inactive in 3 weeks |
| 08:36:52 | <@OrIdow6> | Link? |
| 08:37:13 | <@OrIdow6> | First I've heard of this |
| 08:39:42 | <@OrIdow6> | That I remember |
| 08:43:34 | <@OrIdow6> | Oh, apparently a link was posted (though I don't see any discussion) November 15 |
| 08:43:39 | <@OrIdow6> | https://blog.google/products/photos/storage-policy-update/ |
| 08:46:38 | <@OrIdow6> | "After June 1: |
| 08:46:39 | <@OrIdow6> | If you're inactive in one or more of these services for two years (24 months), Google may delete the content in the product(s) in which you're inactive. |
| 08:46:41 | <@OrIdow6> | Similarly, if you're over your storage limit for two years, Google may delete your content across Gmail, Drive and Photos." |
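The inactivity rule quoted above is per product: only the services you have not touched for 24 months are affected, not the whole account (the over-quota rule is the one that spans Gmail, Drive and Photos). A minimal sketch of that per-product check, assuming "two years" means the same calendar date two years later and ignoring the June 1 start date; the function names and sample dates are hypothetical, not anything Google publishes:

```python
from __future__ import annotations

from datetime import date

# Assumption: "two years (24 months)" is approximated as the same calendar
# date two years later; the quoted policy does not spell out the exact accounting.
INACTIVITY_YEARS = 2


def two_years_after(d: date) -> date:
    """Return the date two years after d; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + INACTIVITY_YEARS)
    except ValueError:
        return d.replace(year=d.year + INACTIVITY_YEARS, day=28)


def products_at_risk(last_active: dict[str, date], today: date) -> set[str]:
    """Products whose content may be deleted: only those inactive for 2+ years."""
    return {p for p, last in last_active.items() if today >= two_years_after(last)}


# Hypothetical last-activity dates for one account.
activity = {
    "Gmail": date(2021, 5, 1),
    "Drive": date(2019, 1, 15),
    "Photos": date(2020, 12, 1),
}
print(products_at_risk(activity, date(2021, 6, 1)))
```

Here only Drive crosses the threshold, so under this reading an active Gmail inbox would not protect a dormant Drive.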
| 08:49:26 | | BlueMaxima quits [Client Quit] |
| 08:55:01 | | colona quits [Ping timeout: 258 seconds] |
| 08:56:12 | | colona joins |
| 09:13:41 | <@HCross> | how would we do it? Export to PDF? |
| 09:44:44 | <Sanqui> | the HTML view is probably better? |
| 09:44:54 | <Sanqui> | maybe even possible with #//? |
| 10:01:11 | <@EggplantN> | #// gets single urls |
| 10:36:31 | | duce1337_ (duce1337) joins |
| 10:36:31 | | duce1337 quits [Read error: Connection reset by peer] |
| 11:14:07 | | Arcorann_ joins |
| 11:23:30 | <avoozl> | OrIdow6: inactive in "one or more".. that sounds like they could already wipe things if |
| 11:24:40 | | nuroten quits [Client Quit] |
| 11:27:02 | <avoozl> | .. I'm inactive on a single service even |
| 11:28:26 | | LeighR (LeighR) joins |
| 11:59:11 | <@OrIdow6> | One thing to worry about is ability to discover them |
| 12:00:10 | | Aoede_ is now known as Aoede |
| 12:00:15 | <@OrIdow6> | Even something like "append this to the path" will make it hard to play them back in practice |
| 12:00:57 | <@OrIdow6> | avoozl: it only "delete[s] the content in the product(s) in which you're inactive" |
| 12:02:25 | <avoozl> | Makes sense. I need to get into the habit of cycling through my google accounts every once in a while.. I just tended to create a new account for any android device I had |
| 12:05:19 | | pcr leaves |
| 12:05:22 | | pcr joins |
| 12:39:48 | | yanome quits [Quit: The Lounge - https://thelounge.chat] |
| 12:39:58 | | yanome (yano) joins |
| 13:14:13 | | howardad quits [Quit: WeeChat 2.8] |
| 13:27:30 | | spirit quits [Ping timeout: 250 seconds] |
| 14:32:29 | | spirit joins |
| 14:48:27 | | Arcorann_ quits [Ping timeout: 258 seconds] |
| 15:13:31 | | duce1337_ quits [Read error: Connection reset by peer] |
| 15:13:31 | | duce1337 (duce1337) joins |
| 15:41:03 | | Ryz quits [Remote host closed the connection] |
| 15:43:46 | | xit quits [Quit: Ping timeout (120 seconds)] |
| 15:44:02 | | xit joins |
| 15:57:47 | | Ryz (Ryz) joins |
| 16:25:23 | | treeplant quits [Remote host closed the connection] |
| 16:33:27 | | HP_Archivist (HP_Archivist) joins |
| 16:59:27 | | TempName joins |
| 17:00:45 | | TempName quits [Remote host closed the connection] |
| 17:01:34 | | TempName joins |
| 17:13:47 | | TempName quits [Remote host closed the connection] |
| 17:15:16 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 17:21:16 | | Lord_Nightmare (Lord_Nightmare) joins |
| 17:21:35 | | Eighty quits [Remote host closed the connection] |
| 17:23:25 | | @OrIdow6 quits [Quit: Quitting.] |
| 17:24:25 | | OrIdow6 (OrIdow6) joins |
| 17:24:25 | | @ChanServ sets mode: +o OrIdow6 |
| 17:44:16 | | Eighty (Eighty) joins |
| 17:44:50 | <mgrandi> | The AT wiki has some notes on the gdoc URL formats; probably best to convert to a variety of formats since they are so small :shrug: |
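Grabbing each doc in several formats mostly comes down to turning one document URL into several export URLs. A minimal sketch, assuming the commonly used `https://docs.google.com/document/d/<ID>/export?format=<fmt>` endpoint (not confirmed anywhere in this log; the wiki notes mgrandi mentions may list other formats), with a hypothetical document ID:

```python
import re
from typing import Dict, Optional

# Assumption: Google Docs documents can be exported via
# https://docs.google.com/document/d/<ID>/export?format=<fmt>
EXPORT_FORMATS = ("pdf", "html", "txt", "docx", "odt")
DOC_ID_RE = re.compile(r"docs\.google\.com/document/d/([A-Za-z0-9_-]+)")


def doc_id(url: str) -> Optional[str]:
    """Extract the document ID from a docs.google.com/document/d/<ID>/... URL."""
    m = DOC_ID_RE.search(url)
    return m.group(1) if m else None


def export_urls(url: str) -> Dict[str, str]:
    """Map each export format to its (assumed) export URL; empty if not a doc URL."""
    did = doc_id(url)
    if did is None:
        return {}
    return {
        fmt: f"https://docs.google.com/document/d/{did}/export?format={fmt}"
        for fmt in EXPORT_FORMATS
    }


# "EXAMPLE_ID-123" is a made-up ID for illustration.
for fmt, u in export_urls("https://docs.google.com/document/d/EXAMPLE_ID-123/edit").items():
    print(fmt, u)
```

Because each export is a separate plain URL, the resulting list could be fed to whatever grab tool is used, though playback would still depend on how those URLs are discovered.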
| 17:49:24 | | etnguyen03 (etnguyen03) joins |
| 17:50:28 | <etnguyen03> | Not sure if this has been discussed yet but https://www.reddit.com/r/DataHoarder/comments/nah769/fbi_looking_at_the_scihub_developer_dont_want_to/? |
| 17:52:13 | <mgrandi> | Already got their twitter |
| 17:53:15 | <@JAA> | Put it on the pile. ;-) |
| 18:03:06 | | ThreeHeadedMonkey quits [Ping timeout: 250 seconds] |
| 18:04:05 | | ThreeHeadedMonkey (ThreeHeadedMonkey) joins |
| 18:21:23 | <betamax> | JAA: many thanks for queuing the party / candidate sites in AB |
| 18:21:25 | <betamax> | I don't want to overload things - should I wait before loading some more of the twitter scrapes into AB? |
| 18:22:33 | <@JAA> | betamax: Fine to start two more I'd say. The ones that are primarily or entirely twitter.com URLs will run much faster than the ones full of external URLs. |
| 18:23:06 | | lennier1 quits [Client Quit] |
| 18:24:55 | | lennier1 (lennier1) joins |
| 18:25:12 | <betamax> | Yeah, I think the next two lists are entirely t.co shortlinks but after that it's mostly just tweets |
| 18:30:19 | <@JAA> | I'd guess the last one might be www.* stuff. |
| 18:32:08 | <betamax> | The last one goes from "http://t.co..." to "t.co", so I guess there isn't any www stuff (i.e. URLs without an "http" or "https" prefix) |
| 18:32:24 | <@JAA> | Uh |
| 18:32:50 | | Daloader joins |
| 18:32:53 | <@JAA> | That sounds highly unlikely. Unless you removed it or ignored it on sorting, I guess. |
| 18:33:56 | <betamax> | Just checked, and there aren't any starting with "www". One sec while I look at the original scrape output. |
| 18:35:24 | <@JAA> | I mean, random example from the job that just started: https://t.co/4P985nEYWA -> https://www.yorkshireparty.org.uk/ |
| 18:36:22 | <@JAA> | And that was the first one I tried. So yeah, there are definitely lots of www URLs in it. |
| 18:36:35 | <@JAA> | HTTP v HTTPS though |
| 18:39:25 | <betamax> | Ah, sorry. I thought you meant URLs starting with "www" (ie: no http or https) |
| 18:40:04 | <@JAA> | Oh, no, there shouldn't be any protocol-less URLs unless something went very wrong. |
| 18:40:04 | | duce1337 quits [Read error: Connection reset by peer] |
| 18:40:26 | | duce1337 (duce1337) joins |
| 18:40:52 | <betamax> | There are around 700,000 non-twitter URLs with "//www." in the scrape. |
| 18:41:05 | <@JAA> | Yeah, that makes more sense. :-) |
| 18:42:31 | <betamax> | There are around 16 protocol-less "t.co" links, but in the scheme of things that's nothing |
| 18:43:11 | | LeighR quits [Ping timeout: 244 seconds] |
| 18:49:21 | <@JAA> | Hmm, that's odd though. |
| 18:49:31 | <@JAA> | I'd love to know where those come from. Sounds like a Twitter or snscrape bug. |
| 18:50:30 | <Ryz> | !ig 8d5te0kc64qttw6s9vsg27hza ^https?://assets\.squarespace\.com/universal/scripts-compressed/ |
| 18:50:33 | <Ryz> | Oops |
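Ryz's line above is an ArchiveBot ignore command posted to the wrong channel; the pattern itself is an ordinary regex. A minimal sketch of how that pattern behaves against candidate URLs, assuming it is applied with Python's `re.search` (the example asset URLs are hypothetical):

```python
import re

# The ignore pattern from Ryz's message. The leading ^ anchors it, so only
# URLs that *start* with this prefix (http or https) are matched.
PATTERN = re.compile(r"^https?://assets\.squarespace\.com/universal/scripts-compressed/")


def is_ignored(url: str) -> bool:
    """True if the URL would be skipped under this ignore pattern."""
    return PATTERN.search(url) is not None


# Hypothetical URLs for illustration.
for u in (
    "https://assets.squarespace.com/universal/scripts-compressed/common-abc.js",
    "http://assets.squarespace.com/universal/scripts-compressed/x.js",
    "https://assets.squarespace.com/universal/styles-compressed/site.css",
    "https://example.com/?u=https://assets.squarespace.com/universal/scripts-compressed/",
):
    print(is_ignored(u), u)
```

The anchor matters: without `^`, the last URL (which only mentions the prefix in its query string) would be ignored too.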
| 19:12:48 | | nuroten joins |
| 19:16:23 | | rsn_ joins |
| 19:16:31 | | rsn_ quits [Remote host closed the connection] |
| 19:19:24 | | duce1337 quits [Read error: Connection reset by peer] |
| 19:19:29 | | duce1337_ (duce1337) joins |
| 19:26:02 | <betamax> | JAA: it looks to be a bug in twitter |
| 19:26:13 | <betamax> | here's a tweet that causes it: https://twitter.com/AngusRobertson/status/60228740503965696 |
| 19:26:28 | <betamax> | snscrape output: "https://twitter.com/AngusRobertson/status/60228740503965696 t.co/6kZ3hts http://yfrog.com/h2xnjcoj" |
| 19:26:43 | <betamax> | (when run with "snscrape --format {url} {tcooutlinksss} {outlinksss} twitter-user <username>") |
| 19:26:59 | <@JAA> | Thanks! |
| 19:27:14 | <@JAA> | Time to add another workaround for Twitter weirdness I guess. |
| 19:28:28 | <betamax> | Here's the full list (of 17 results) if you want more examples for testing: https://www.tardis.ed.ac.uk/~andrewferguson/uk_elections_2021_betamax/twitter_bug.txt |
| 19:29:02 | <@JAA> | Perfect! :-) |
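One possible shape for the workaround JAA mentions: post-process each snscrape output line and give bare `t.co/...` tokens a scheme. This is a hypothetical sketch, not snscrape's or JAA's actual fix; it assumes the line layout from betamax's `--format {url} {tcooutlinksss} {outlinksss}` command (space-separated tokens) and that `https://` is acceptable for t.co:

```python
from typing import List


def normalize_tco(link: str) -> str:
    """Prefix a scheme onto protocol-less t.co links such as 't.co/6kZ3hts'.

    Hypothetical workaround: assumes https is fine for t.co and leaves
    every other token untouched.
    """
    if link.startswith(("http://", "https://")):
        return link
    if link.startswith("t.co/"):
        return "https://" + link
    return link


def normalize_line(line: str) -> List[str]:
    """Split one '{url} {tcooutlinksss} {outlinksss}' line and normalize each token."""
    return [normalize_tco(tok) for tok in line.split()]


# The example output line betamax posted.
line = ("https://twitter.com/AngusRobertson/status/60228740503965696 "
        "t.co/6kZ3hts http://yfrog.com/h2xnjcoj")
print(normalize_line(line))
```

Splitting on whitespace also copes with tweets that have no outlinks, since empty fields simply produce no tokens.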
| 19:33:33 | | SketchTh1Cow joins |
| 19:35:34 | | SketchTheCow quits [Ping timeout: 258 seconds] |
| 19:48:31 | | Vukky (Vukky) joins |
| 19:59:35 | | SketchTheCow joins |
| 20:00:29 | | HP_Archivist quits [Ping timeout: 258 seconds] |
| 20:01:22 | | SketchTh1Cow quits [Read error: Connection reset by peer] |
| 20:04:45 | | HP_Archivist (HP_Archivist) joins |
| 20:18:31 | | Daloader quits [Client Quit] |
| 20:25:30 | | ragu joins |
| 20:29:38 | | IKI quits [Remote host closed the connection] |
| 21:30:22 | | teej (teej) joins |
| 21:55:17 | | HP_Archivist quits [Client Quit] |
| 22:00:05 | | webdownload joins |
| 22:17:21 | | duce1337_ quits [Client Quit] |
| 22:57:44 | | Arcorann_ joins |
| 23:04:00 | | BlueMaxima joins |
| 23:28:06 | | Arcorann_ quits [Ping timeout: 250 seconds] |
| 23:34:30 | | Iki joins |