00:22:25Mineroboter quits [Client Quit]
00:34:24IKI quits [Remote host closed the connection]
01:03:31dm4v quits [Ping timeout: 258 seconds]
01:04:22dm4v joins
01:04:24dm4v quits [Changing host]
01:04:24dm4v (dm4v) joins
01:07:47<@JAA>betamax: So. Many. Broken. URLs. :-|
01:29:44IKI joins
01:42:50TheTechRobo joins
01:43:14<TheTechRobo>Hello
01:43:18<TheTechRobo>I think that this is the correct channel
01:43:32<TheTechRobo>Any chance that the https://wiki.archiveteam.org/index.php/Dev will be updated ?
01:43:40<TheTechRobo>Most if not all pages are last updated 2015
01:43:44<TheTechRobo>and they use python 2
01:44:21<TheTechRobo>*correction: https://wiki.archiveteam.org/index.php/Dev/Seesaw uses python 2
01:44:33<TheTechRobo>would it still work with python 3?
02:06:23Wayward quits [Remote host closed the connection]
02:06:23systwi quits [Read error: Connection reset by peer]
02:06:23@dxrt quits [Quit: ZNC - http://znc.sourceforge.net]
02:06:23Aoede quits [Client Quit]
02:06:23PlsNoJava quits [Read error: Connection reset by peer]
02:06:23girst quits [Ping timeout: 250 seconds]
02:06:44billy549 quits [Ping timeout: 250 seconds]
02:07:10rewby quits [Ping timeout: 250 seconds]
02:07:21Muad-Dib quits [Read error: Connection reset by peer]
02:07:29PlsNoJava4 (ROpdebee) joins
02:07:29systwi_ (systwi) joins
02:07:29Aoede_ (Aoede) joins
02:07:29Wayward- (wayward) joins
02:07:29nepeat_ joins
02:07:29PlsNoJava4 is now known as PlsNoJava
02:07:29girst (girst) joins
02:07:30rewby (rewby) joins
02:07:31Muad_Dib joins
02:07:36nepeat quits [Ping timeout: 250 seconds]
02:07:36voltagex quits [Ping timeout: 250 seconds]
02:07:54voltagex joins
02:08:02@Fusl quits [Ping timeout: 250 seconds]
02:08:22Fusl (Fusl) joins
02:08:22@ChanServ sets mode: +o Fusl
02:08:23dxrt joins
02:08:26dxrt quits [Changing host]
02:08:26dxrt (dxrt) joins
02:08:26@ChanServ sets mode: +o dxrt
02:08:33<@JAA>Yep, that section is pretty outdated.
02:08:39<@JAA>Yes, seesaw works fine with Python 3.
02:17:08minus quits [Ping timeout: 250 seconds]
02:17:08<TheTechRobo>Sounds good, thanks
02:17:34t3chler quits [Ping timeout: 250 seconds]
02:17:34yawkat quits [Ping timeout: 250 seconds]
02:17:34masterX244 quits [Ping timeout: 250 seconds]
02:17:50yawkat (yawkat) joins
02:18:00rewby quits [Ping timeout: 250 seconds]
02:18:00SCSi quits [Ping timeout: 250 seconds]
02:18:10SCSi (SCSi) joins
02:18:12rewby (rewby) joins
02:18:21masterX244 (masterX244) joins
02:18:37t3chler joins
02:19:02TheTechRobo quits [Remote host closed the connection]
02:20:15minus joins
02:34:28billy549 (Billy549) joins
03:03:21<@JAA>betamax: Queueing has begun. 1437 sites after lots of cleanup and dedupe and whatnot.
03:30:08<flashfire42>Ok start grabbing anything related to israel and gaza they are using white phosphorus
03:36:59qw3rty_ joins
03:40:41qw3rty quits [Ping timeout: 258 seconds]
03:49:26Laszlo joins
03:50:08qwertyasdf joins
03:53:46shoghicp quits [Ping timeout: 250 seconds]
03:56:17Laszlo quits [Remote host closed the connection]
03:57:53qwertyasdf quits [Remote host closed the connection]
03:59:17qwertyasdf joins
04:07:38lukash7 quits [Ping timeout: 250 seconds]
04:07:54Jonboy345 quits [Ping timeout: 258 seconds]
04:20:16Jonboy345 joins
04:24:56mutantmnky quits [Remote host closed the connection]
04:25:13mutantmnky (mutantmonkey) joins
04:34:07lukash7 joins
04:43:58Jonboy3451 joins
04:44:50jonboy3452 joins
04:47:04Jonboy345 quits [Ping timeout: 250 seconds]
04:48:32Jonboy3451 quits [Ping timeout: 258 seconds]
04:48:59qwertyasdf is now known as qwertyasdfuiopghjkl
04:51:02lennier2 joins
04:53:54lennier1 quits [Ping timeout: 258 seconds]
04:54:03lennier2 is now known as lennier1
07:36:33duce1337 (duce1337) joins
07:47:01qwertyasdf joins
07:47:01qwertyasdfuiopghjkl quits [Ping timeout: 244 seconds]
07:57:57qwertyasdf quits [Remote host closed the connection]
08:08:46shoghicp (shoghicp) joins
08:23:35yawkat quits [Ping timeout: 258 seconds]
08:24:22lennier1 quits [Client Quit]
08:33:21yawkat (yawkat) joins
08:34:43lennier1 (lennier1) joins
08:35:12<mgrandi>Do we have a coordinated effort for getting google doc urls? Apparently they are gonna start wiping those if they are inactive in 3 weeks
08:36:52<@OrIdow6>Link?
08:37:13<@OrIdow6>First I've heard of this
08:39:42<@OrIdow6>That I remember
08:43:34<@OrIdow6>Oh, apparently a link was posted (though I don't see any discussion) November 15
08:43:39<@OrIdow6>https://blog.google/products/photos/storage-policy-update/
08:46:38<@OrIdow6>"After June 1:
08:46:39<@OrIdow6> If you're inactive in one or more of these services for two years (24 months), Google may delete the content in the product(s) in which you're inactive.
08:46:41<@OrIdow6> Similarly, if you're over your storage limit for two years, Google may delete your content across Gmail, Drive and Photos."
08:49:26BlueMaxima quits [Client Quit]
08:55:01colona quits [Ping timeout: 258 seconds]
08:56:12colona joins
09:13:41<@HCross>how would we do it? Export to PDF
09:44:44<Sanqui>the HTML view is probably better?
09:44:54<Sanqui>maybe even possible with #//?
10:01:11<@EggplantN>#// gets single urls
10:36:31duce1337_ (duce1337) joins
10:36:31duce1337 quits [Read error: Connection reset by peer]
11:14:07Arcorann_ joins
11:23:30<avoozl>OrIdow6: inactive in "one or more".. that sounds like they could already wipe things if
11:24:40nuroten quits [Client Quit]
11:27:02<avoozl>.. I'm inactive on a single service even
11:28:26LeighR (LeighR) joins
11:59:11<@OrIdow6>One thing to worry about is ability to discover them
12:00:10Aoede_ is now known as Aoede
12:00:15<@OrIdow6>Even something like "append this to the path" will make it hard to play them back in practice
12:00:57<@OrIdow6>avoozl: it only "delete[s] the content in the product(s) in which you're inactive"
12:02:25<avoozl>Makes sense. I need to get into the habit of cycling through my google accounts every once in a while.. I just tended to create a new account for any android device I had
12:05:19pcr leaves
12:05:22pcr joins
12:39:48yanome quits [Quit: The Lounge - https://thelounge.chat]
12:39:58yanome (yano) joins
13:14:13howardad quits [Quit: WeeChat 2.8]
13:27:30spirit quits [Ping timeout: 250 seconds]
14:32:29spirit joins
14:48:27Arcorann_ quits [Ping timeout: 258 seconds]
15:13:31duce1337_ quits [Read error: Connection reset by peer]
15:13:31duce1337 (duce1337) joins
15:41:03Ryz quits [Remote host closed the connection]
15:43:46xit quits [Quit: Ping timeout (120 seconds)]
15:44:02xit joins
15:57:47Ryz (Ryz) joins
16:25:23treeplant quits [Remote host closed the connection]
16:33:27HP_Archivist (HP_Archivist) joins
16:59:27TempName joins
17:00:45TempName quits [Remote host closed the connection]
17:01:34TempName joins
17:13:47TempName quits [Remote host closed the connection]
17:15:16Lord_Nightmare quits [Quit: ZNC - http://znc.in]
17:21:16Lord_Nightmare (Lord_Nightmare) joins
17:21:35Eighty quits [Remote host closed the connection]
17:23:25@OrIdow6 quits [Quit: Quitting.]
17:24:25OrIdow6 (OrIdow6) joins
17:24:25@ChanServ sets mode: +o OrIdow6
17:44:16Eighty (Eighty) joins
17:44:50<mgrandi>The AT wiki has some notes on the gdoc url formats, probably best to convert to a variety of formats since they are so small :shrug:
17:49:24etnguyen03 (etnguyen03) joins
17:50:28<etnguyen03>Not sure if this has been discussed yet but https://www.reddit.com/r/DataHoarder/comments/nah769/fbi_looking_at_the_scihub_developer_dont_want_to/?
17:52:13<mgrandi>Already got their twitter
17:53:15<@JAA>Put it on the pile. ;-)
18:03:06ThreeHeadedMonkey quits [Ping timeout: 250 seconds]
18:04:05ThreeHeadedMonkey (ThreeHeadedMonkey) joins
18:21:23<betamax>JAA: many thanks for queuing the party / candidate sites in AB
18:21:25<betamax>I don't want to overload things - should I wait before loading some more of the twitter scrapes into AB?
18:22:33<@JAA>betamax: Fine to start two more I'd say. The ones that are primarily or entirely twitter.com URLs will run much faster than the ones full of external URLs.
18:23:06lennier1 quits [Client Quit]
18:24:55lennier1 (lennier1) joins
18:25:12<betamax>Yeah, I think the next two lists are entirely t.co shortlinks but after that it's mostly just tweets
18:30:19<@JAA>I'd guess the last one might be www.* stuff.
18:32:08<betamax>The last one goes from "http://t.co..." to "t.co", so I guess there isn't any www stuff (that doesn't have an "http" or "https" prefix)
18:32:24<@JAA>Uh
18:32:50Daloader joins
18:32:53<@JAA>That sounds highly unlikely. Unless you removed it or ignored it on sorting, I guess.
18:33:56<betamax>Just checked, and there aren't any starting with "www". One sec while I look at the original scrape output.
18:35:24<@JAA>I mean, random example from the job that just started: https://t.co/4P985nEYWA -> https://www.yorkshireparty.org.uk/
18:36:22<@JAA>And that was the first one I tried. So yeah, there are definitely lots of www URLs in it.
18:36:35<@JAA>HTTP v HTTPS though
18:39:25<betamax>Ah, sorry. I thought you meant URLs starting with "www" (ie: no http or https)
18:40:04<@JAA>Oh, no, there shouldn't be any protocol-less URLs unless something went very wrong.
18:40:04duce1337 quits [Read error: Connection reset by peer]
18:40:26duce1337 (duce1337) joins
18:40:52<betamax>There's around 700,000 non-twitter URLs with "//www." in the scrape.
18:41:05<@JAA>Yeah, that makes more sense. :-)
18:42:31<betamax>There's around 16 protocol-less "t.co" links, but in the scheme of things that's nothing
18:43:11LeighR quits [Ping timeout: 244 seconds]
18:49:21<@JAA>Hmm, that's odd though.
18:49:31<@JAA>I'd love to know where those come from. Sounds like a Twitter or snscrape bug.
18:50:30<Ryz>!ig 8d5te0kc64qttw6s9vsg27hza ^https?://assets\.squarespace\.com/universal/scripts-compressed/
18:50:33<Ryz>Oops
19:12:48nuroten joins
19:16:23rsn_ joins
19:16:31rsn_ quits [Remote host closed the connection]
19:19:24duce1337 quits [Read error: Connection reset by peer]
19:19:29duce1337_ (duce1337) joins
19:26:02<betamax>JAA: it looks to be a bug in twitter
19:26:13<betamax>here's a tweet that causes it: https://twitter.com/AngusRobertson/status/60228740503965696
19:26:28<betamax>snscrape output: "https://twitter.com/AngusRobertson/status/60228740503965696 t.co/6kZ3hts http://yfrog.com/h2xnjcoj"
19:26:43<betamax>(when run with "snscrape --format {url} {tcooutlinksss} {outlinksss} twitter-user <username>")
19:26:59<@JAA>Thanks!
19:27:14<@JAA>Time to add another workaround for Twitter weirdness I guess.
19:28:28<betamax>Here's the full list (of 17 results) if you want more examples for testing: https://www.tardis.ed.ac.uk/~andrewferguson/uk_elections_2021_betamax/twitter_bug.txt
19:29:02<@JAA>Perfect! :-)
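Editorial note: the protocol-less "t.co" links discussed above can be spotted with a small filter over the scrape output. The following is a minimal sketch, not the workaround JAA refers to. It assumes each output line is a whitespace-separated list of URLs, as in the sample betamax quoted. The function names `split_protocol_less` and `add_scheme` are hypothetical, and defaulting to "https" when adding a scheme is an assumption.

```python
import re

# Matches tokens that already carry an http:// or https:// scheme.
SCHEME_RE = re.compile(r"^https?://", re.IGNORECASE)


def split_protocol_less(line):
    """Split a whitespace-separated line of URLs into
    (tokens with a scheme, tokens without a scheme)."""
    with_scheme, without_scheme = [], []
    for token in line.split():
        if SCHEME_RE.match(token):
            with_scheme.append(token)
        else:
            without_scheme.append(token)
    return with_scheme, without_scheme


def add_scheme(token, scheme="https"):
    """Prefix a scheme onto a token that lacks one.
    The default scheme is an assumption, not something the log specifies."""
    if SCHEME_RE.match(token):
        return token
    return f"{scheme}://{token}"


# Sample line taken from the snscrape output quoted in the log.
sample = (
    "https://twitter.com/AngusRobertson/status/60228740503965696 "
    "t.co/6kZ3hts http://yfrog.com/h2xnjcoj"
)
ok, bare = split_protocol_less(sample)
fixed = [add_scheme(token) for token in bare]
```

Running the filter over a whole scrape and counting the scheme-less tokens would be one way to confirm a figure like the "around 16" mentioned earlier before queueing the list.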
19:33:33SketchTh1Cow joins
19:35:34SketchTheCow quits [Ping timeout: 258 seconds]
19:48:31Vukky (Vukky) joins
19:59:35SketchTheCow joins
20:00:29HP_Archivist quits [Ping timeout: 258 seconds]
20:01:22SketchTh1Cow quits [Read error: Connection reset by peer]
20:04:45HP_Archivist (HP_Archivist) joins
20:18:31Daloader quits [Client Quit]
20:25:30ragu joins
20:29:38IKI quits [Remote host closed the connection]
21:30:22teej (teej) joins
21:55:17HP_Archivist quits [Client Quit]
22:00:05webdownload joins
22:17:21duce1337_ quits [Client Quit]
22:57:44Arcorann_ joins
23:04:00BlueMaxima joins
23:28:06Arcorann_ quits [Ping timeout: 250 seconds]
23:34:30Iki joins