| 00:17:39 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 00:18:28 | | jacobk quits [Ping timeout: 268 seconds] |
| 00:19:31 | | upintheairsheep joins |
| 00:24:09 | | upintheairsheep quits [Remote host closed the connection] |
| 00:24:32 | | tomorrowInstallment joins |
| 00:24:47 | | upintheairsheep joins |
| 00:25:47 | <upintheairsheep> | lennier1 Thank you, writing a scraper would be super easy thanks to your help. |
| 00:26:48 | <tomorrowInstallment> | hello hello... at the risk of being told off because I'm probably the tenth person to ask this today, but how are we looking in terms of saving twitter? |
| 00:28:27 | <TheTechRobo> | tomorrowInstallment: Would likely take too long to make archiving EVERYTHING feasible, although there might be news in that regard. |
| 00:28:38 | <andrew> | tomorrowInstallment: surprisingly, you are not the tenth person to ask this. as far as I know, AT has not started a proper Twitter archival project. I have personally been working on a Twitter scraper, and I think I may have a workable prototype |
| 00:29:01 | <andrew> | but I'm going to need a lot more resources than I have right now to actually make it happen |
| 00:29:04 | <TheTechRobo> | andrew: WARC? |
| 00:29:30 | <andrew> | I don't think WARC makes sense for this. we don't have enough space to store full web pages |
| 00:29:47 | <andrew> | and the webpages themselves are JavaScript rendered |
| 00:29:55 | <andrew> | so the least bad solution appears to be saving API JSON |
| 00:30:19 | | Atom__ joins |
| 00:31:10 | <tomorrowInstallment> | I'd love to hack together something that warriors can get to work on. |
| 00:31:10 | | upintheairsheep quits [Remote host closed the connection] |
| 00:31:10 | | Atom-- quits [Read error: Connection reset by peer] |
| 00:31:23 | | lennier2 joins |
| 00:31:25 | | upintheairsheep joins |
| 00:31:53 | <andrew> | based on my calculations and a Pushshift analysis of Snowflake IDs, if you manage to juggle enough guest tokens, it is definitely feasible to archive around 99% of all tweets made since 2017 within the next few months |
| 00:32:42 | | lennier2_ joins |
| 00:33:18 | | tomorrowRemoval joins |
| 00:33:49 | <tomorrowRemoval> | Hello, I was tomorrowInstallment (seems like my internet has decided to die) |
| 00:34:02 | <andrew> | to repeat, based on my calculations and a Pushshift analysis of Snowflake IDs, if you manage to juggle enough guest tokens, it is definitely feasible to archive around 99% of all tweets made since 2017 within the next few months |
| 00:34:29 | <andrew> | it's probably made easier by the fact that it's going to be difficult for Twitter to deploy new anti-abuse mechanisms due to Elon's management |
| 00:34:30 | <@arkiver> | tomorrowRemoval: have you come here to remove tomorrowInstallment from the chat :P |
| 00:34:53 | <TheTechRobo> | <andrew> so the least bad solution appears to be saving API JSON |
| 00:34:56 | <TheTechRobo> | You can do that with WARC :-) |
| 00:35:03 | <upintheairsheep> | lennier1 Hello, can you decrypt this samsung smart tv firmware, and upload the decrypted files to the internetarchive and post the link for me to try to reverse engineer the store itself? https://wiki.samygo.tv/index.php?title=Extracting_the_ES-series_firmware |
| 00:35:11 | <tomorrowRemoval> | I think twitter still has a few months of life left in it |
| 00:35:25 | <TheTechRobo> | WARC doesn't necessarily mean full webpages, it's just an archival format |
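To make the point above concrete, here is a minimal sketch of storing an API JSON payload as a WARC/1.0 `resource` record using only the Python standard library. It is a hand-rolled illustration, not how any ArchiveTeam project actually writes WARCs: a real pipeline would more likely use an existing WARC library and keep the full HTTP request and response. The function name and the example URL are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri: str, payload: bytes,
                               content_type: str = "application/json") -> bytes:
    """Serialize one WARC/1.0 'resource' record holding an arbitrary payload."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # A record is: version line and headers, a blank line, the block, then two CRLFs.
    return head.encode("utf-8") + payload + b"\r\n\r\n"


# Hypothetical usage: the URL below is a placeholder, not a real endpoint.
record = build_warc_resource_record(
    "https://api.example.invalid/lookup?id=1", b'{"id": 1}'
)
```

Records built this way can be concatenated into a single `.warc` file, so the JSON-only approach and the WARC format are not mutually exclusive.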
| 00:35:43 | <tomorrowRemoval> | But I've also heard that multiple critical infra teams at twitter have completely resigned |
| 00:35:49 | <tomorrowRemoval> | Oh, and the world cup is on next Monday, and it's going to be extreme traffic o'clock |
| 00:35:54 | | tomorrowInstallment quits [Ping timeout: 265 seconds] |
| 00:35:54 | | lennier1 quits [Ping timeout: 265 seconds] |
| 00:36:04 | | lennier2_ is now known as lennier1 |
| 00:36:06 | <tomorrowRemoval> | see, the removal worked! |
| 00:36:14 | <@arkiver> | tomorrowRemoval: congrats ;) |
| 00:36:23 | <upintheairsheep> | Firmware: https://www.samsung.com/uk/support/model/UE55ES8000UXXU/ Script to use: https://github.com/george-hopkins/samygo-patcher/blob/master/samygo_patcher.py |
| 00:36:46 | <upintheairsheep> | The github version is more updated, but still requires python 2.7 |
| 00:37:06 | <@arkiver> | python2.7 is definitely dead by now |
| 00:37:21 | | lennier2 quits [Ping timeout: 265 seconds] |
| 00:37:30 | <upintheairsheep> | I could do this myself but i can't use python 2.7 with cryptography frameworks due to software limitations on iOS. |
| 00:37:44 | <upintheairsheep> | It's our only option. |
| 00:37:55 | <andrew> | so, here's the idea, offered free of charge, and without warranty: you have warriors mint guest tokens from Twitter from the large pool of IPs available and submit them to a central server. each guest token can make 180 status lookup requests per 15 minutes, and each token lasts 10800 seconds or so (I have not verified this myself). the central server checks out guest tokens to warriors who then use them to scrape tweets until their rate limit is exhausted, then check them back in to the central server, which waits until the limit has reset |
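The check-out/check-in idea above can be sketched as a small in-memory pool. This is only an illustration under stated assumptions, not andrew's prototype: the class and method names are hypothetical, the 15-minute window and 10800-second lifetime are the (unverified) figures from the chat, and a checked-in token is conservatively treated as unusable for a full window.

```python
import heapq
from dataclasses import dataclass, field

WINDOW_SECONDS = 15 * 60   # rate-limit window quoted in the chat
TOKEN_LIFETIME = 10800     # seconds; unverified figure from the chat


@dataclass(order=True)
class PooledToken:
    available_at: float                      # only this field is used for ordering
    token: str = field(compare=False)
    minted_at: float = field(compare=False)


class GuestTokenPool:
    """Hypothetical central-server pool: hands out tokens, parks exhausted ones."""

    def __init__(self):
        self._heap = []

    def submit(self, token: str, now: float) -> None:
        """A warrior submits a freshly minted token; it is usable immediately."""
        heapq.heappush(self._heap, PooledToken(now, token, now))

    def check_out(self, now: float):
        """Return the next usable token, or None if every token is still resting."""
        while self._heap:
            head = self._heap[0]
            if now - head.minted_at >= TOKEN_LIFETIME:
                heapq.heappop(self._heap)    # expired: drop it and keep looking
                continue
            if head.available_at > now:
                return None                  # earliest token is not ready yet
            return heapq.heappop(self._heap)
        return None

    def check_in(self, entry: PooledToken, now: float) -> None:
        """A warrior returns an exhausted token; park it until the window resets."""
        entry.available_at = now + WINDOW_SECONDS
        heapq.heappush(self._heap, entry)
```

Passing `now` explicitly keeps the sketch deterministic; a real server would use a clock and persistent storage.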
| 00:37:56 | <tomorrowRemoval> | also I think twitter has quoted, somewhere back in 2015, that they receive about 6k tweets a second |
| 00:38:49 | <andrew> | see here for the Pushshift analysis of snowflakes: https://docs.google.com/document/d/1xVrPoNutyqTdQ04DXBEZW4ZW4A5RAQW2he7qIpTmG-M/edit |
| 00:38:59 | <tomorrowRemoval> | ooooh |
| 00:39:20 | <andrew> | I propose you hand out jobs for e.g. scraping all tweets with sequence ID 0 within, say, a minute, or hour |
| 00:39:37 | <andrew> | then later on, you move on to higher sequence IDs that yield progressively fewer tweets to complete the archive |
| 00:39:54 | <tomorrowRemoval> | right, low hanging fruit first, always |
| 00:40:17 | <andrew> | to determine the machine IDs that are generating tweets, you can use the search API or sample the time space randomly until you are reasonably confident of the result |
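For readers unfamiliar with the terms used above, a Snowflake tweet ID packs a millisecond timestamp, a 10-bit machine ID and a 12-bit sequence number into one integer. The sketch below decodes and rebuilds IDs and enumerates candidates for one sequence value; the helper names are hypothetical, and it only applies to IDs generated under the Snowflake scheme (tweets from before late 2010 use older sequential IDs).

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch, 2010-11-04 UTC


def decode_snowflake(tweet_id: int):
    """Split a Snowflake ID into (timestamp_ms, machine_id, sequence)."""
    timestamp_ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    machine_id = (tweet_id >> 12) & 0x3FF   # 10 bits
    sequence = tweet_id & 0xFFF             # 12 bits
    return timestamp_ms, machine_id, sequence


def encode_snowflake(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Inverse of decode_snowflake."""
    return ((timestamp_ms - TWITTER_EPOCH_MS) << 22) | (machine_id << 12) | sequence


def candidate_ids(start_ms: int, end_ms: int, machine_ids, sequence: int = 0):
    """Yield every possible ID in [start_ms, end_ms) for the given machines."""
    for ts in range(start_ms, end_ms):
        for machine in machine_ids:
            yield encode_snowflake(ts, machine, sequence)


# One of the tweet IDs linked later in this log decodes to a November 2022 timestamp.
ts_ms, machine, seq = decode_snowflake(1593399683086327808)
posted = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
```

A job such as "sequence ID 0 for one minute" then amounts to `candidate_ids(start, start + 60_000, known_machine_ids)`, batched into lookup requests; most candidates will not correspond to a real tweet.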
| 00:40:50 | <andrew> | oh, and I should probably mention that for every request, you can request up to 100 tweets. so that 180 requests per 15 minutes is more like up to 18,000 tweets per 15 minutes |
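Working through the rate-limit figures quoted in the chat: 180 requests of up to 100 tweets each is at most 18,000 tweets per token per 15-minute window, or 20 tweets per second. The sketch below turns that into a rough count of concurrently active tokens; it is an upper-bound estimate that assumes every looked-up ID is a real tweet, which will not hold when enumerating candidate IDs.

```python
import math

REQUESTS_PER_WINDOW = 180   # per guest token, per the chat
TWEETS_PER_REQUEST = 100    # maximum batch size, per the chat
WINDOW_SECONDS = 15 * 60


def tweets_per_window() -> int:
    """Best-case tweets one token can fetch in a single window."""
    return REQUESTS_PER_WINDOW * TWEETS_PER_REQUEST


def tokens_needed(target_tweets_per_second: float) -> int:
    """Tokens that must be active at once to sustain the target rate."""
    per_token_rate = tweets_per_window() / WINDOW_SECONDS  # tweets/s per token
    return math.ceil(target_tweets_per_second / per_token_rate)
```

For example, keeping pace with roughly 6,000 tweets per second (the 2015 figure mentioned in the chat) would take about 300 tokens in the best case.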
| 00:41:37 | | upintheairsheep leaves |
| 00:42:06 | <andrew> | the fun thing is that guest tokens need not be used on the same IP they were generated on |
| 00:42:07 | <tomorrowRemoval> | here's the scuttlebutt, btw: |
| 00:42:09 | <tomorrowRemoval> | > Twitter currently has no engineers remaining on the team(s) that maintained their monorepo, build system, caching team, search team, timeline team, DNS, DHCP, NTP and egress proxy teams |
| 00:42:31 | <@arkiver> | tomorrowRemoval: source of that? |
| 00:42:49 | <andrew> | you can literally mint guest tokens over their Tor Onion hidden service and use them on clearnet, and the onion service's rate limit for minting guest tokens appears to be much higher than for a normal IP address |
| 00:43:47 | <andrew> | alternatively, you can use services like Luminati or Stormproxies to mint guest tokens, which you then hand to warriors or something |
| 00:44:15 | <andrew> | it's unclear whether there is a per IP rate limit, the only rate limit I can see is per guest token |
| 00:45:09 | <andrew> | alright, that's essentially the results of my prototyping from the past few days, here's hoping someone at AT turns it into reality so I don't have to :) |
| 00:45:13 | <tomorrowRemoval> | All I can say is that it's from someone who I most certainly trust and knows their way around the tech circle :/ |
| 00:45:24 | <tomorrowRemoval> | Sorry, it's probably not the answer you're looking for! |
| 00:45:28 | <andrew> | "sources familiar with the matter" |
| 00:45:40 | <@arkiver> | tomorrowRemoval: i'll consider it useless information then |
| 00:45:55 | <tomorrowRemoval> | Take it with a horse-licking-block grain of salt, indeed |
| 00:46:18 | <tomorrowRemoval> | (i did say scuttlebutt!) |
| 00:46:22 | <andrew> | I don't doubt that Elon's mismanagement of Twitter makes it very much endangered |
| 00:47:02 | <andrew> | to shrink the scope of this project to something more manageable, I suggest starting at December 2017 - that's when the LoC stopped ingesting the Firehose |
| 00:47:42 | <andrew> | each tweet is around 600 bytes when compressed, so ~100 TiB per year of tweets you want to scrape |
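The ~100 TiB figure can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes the roughly 6,000 tweets per second mentioned earlier in the chat (a 2015 figure, so only a rough guide) and andrew's estimate of 600 compressed bytes per tweet; neither number is verified here.

```python
TWEETS_PER_SECOND = 6_000          # 2015 figure quoted in the chat (assumption)
BYTES_PER_TWEET = 600              # compressed size estimate from the chat
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def yearly_storage_tib(tweets_per_second: float = TWEETS_PER_SECOND,
                       bytes_per_tweet: float = BYTES_PER_TWEET) -> float:
    """Estimated storage for one year of tweets, in TiB (2**40 bytes)."""
    total_bytes = tweets_per_second * SECONDS_PER_YEAR * bytes_per_tweet
    return total_bytes / 2**40


estimate = yearly_storage_tib()  # a little over 100 TiB with the defaults
```

Media is excluded entirely, which is why the pictures question that follows changes the picture so much.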
| 00:48:13 | <andrew> | if you don't care about pictures, it's a very manageable amount of data |
| 00:48:48 | <tomorrowRemoval> | no way we can do pictures too |
| 00:48:50 | <tomorrowRemoval> | honestly |
| 00:49:06 | <andrew> | if you restrict pictures to only tweets with at least 100 likes or retweets, it's probably manageable |
| 00:51:57 | <andrew> | btw, if any AT folks want access to my prototype Rust code, I am happy to share |
| 00:52:10 | <tomorrowRemoval> | omg another rustacean! |
| 00:52:50 | <andrew> | 🦀🦀🦀 |
| 00:52:51 | | dasineura (dasineura) joins |
| 00:55:42 | | wywin joins |
| 00:58:22 | | wywin leaves |
| 01:02:40 | | qwertyasdfuiopghjkl joins |
| 01:02:43 | <tomorrowRemoval> | arkiver: would this be any good? https://twitter.com/alexeheath/status/1593399683086327808 - it's not as detailed as the scuttlebutt... |
| 01:04:23 | <tomorrowRemoval> | i've just realised how ironic it is the news is being shared through twitter |
| 01:05:07 | <joepie91|m> | well, it went out in style |
| 01:05:46 | <joepie91|m> | the news of its demise being shared on twitter right now and predicted by dril 5 years prior |
| 01:06:01 | <joepie91|m> | truly a twitter ending |
| 01:06:11 | <@arkiver> | 5 years prior a prediction of elon musk taking over? |
| 01:06:20 | <joepie91|m> | arkiver: not quite; https://twitter.com/dril/status/900592164589248513 |
| 01:06:45 | <joepie91|m> | (there's always a dril tweet) |
| 01:07:20 | <@arkiver> | he may just miss 2022 |
| 01:08:12 | <joepie91|m> | idk, Twitter is estimated to have lost 88% of its entire employee headcount by now, the offices are locked, the world cup is approaching |
| 01:08:44 | <joepie91|m> | seems quite possible to make 2022 still |
| 01:08:58 | <andrew> | insert this is fine gif here |
| 01:11:56 | <@JAA> | The website of the original 'this is fine' comic is down. This is fine. |
| 01:12:04 | <@arkiver> | ohno |
| 01:12:25 | <JTL> | fitting |
| 01:17:03 | <tomorrowRemoval> | you know how when you're watching a disaster compilation and you know something terrible is about to happen but you just can't look away |
| 01:18:24 | <tomorrowRemoval> | i'm gonna catch some Zs, but I'm 100% interested in working on something tomorrow |
| 01:20:32 | <schwarzkatz|m> | arkiver, have you seen what I wrote earlier about uploadir? |
| 01:20:51 | <@arkiver> | no |
| 01:20:55 | <@arkiver> | checking |
| 01:23:30 | <@arkiver> | schwarzkatz|m: no response yet? |
| 01:23:35 | <@arkiver> | feel free to PM me the information |
| 01:24:03 | <schwarzkatz|m> | Sadly no |
| 01:41:19 | | pk joins |
| 01:41:34 | | pk quits [Remote host closed the connection] |
| 01:55:44 | | Lord_Nightmare quits [Quit: ZNC - http://znc.in] |
| 02:01:34 | | Lord_Nightmare (Lord_Nightmare) joins |
| 02:07:26 | <TheTechRobo> | arkiver: Found a source: https://www.reddit.com/r/DataHoarder/comments/yy7tig/backup_twitter_now_multiple_critical_infra_teams/ |
| 02:15:28 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 02:18:04 | | blackle joins |
| 02:22:40 | | qwertyasdfuiopghjkl joins |
| 02:24:38 | | zaza joins |
| 02:24:54 | <zaza> | hi |
| 02:25:20 | <dasineura> | https://twitter.com/PopBase/status/1593427523206934529 |
| 02:25:24 | <dasineura> | never seen anything like this |
| 02:27:34 | | Anthony joins |
| 02:28:01 | | zaza quits [Remote host closed the connection] |
| 02:33:26 | | michaelblob (michaelblob) joins |
| 02:35:20 | | jman005 joins |
| 02:36:58 | | anononymous_penguin joins |
| 02:43:07 | | anelki (anelki) joins |
| 02:46:52 | | mut4ntm0nkey quits [Remote host closed the connection] |
| 02:48:29 | | mut4ntm0nkey (mutantmonkey) joins |
| 02:53:39 | | lennier1 quits [Client Quit] |
| 02:53:56 | | lennier1 (lennier1) joins |
| 02:56:26 | | cascode joins |
| 02:56:55 | <lennier1> | Can a site the size of Twitter run on autopilot? I guess we're about to find out. |
| 03:00:22 | | schr0z1ng3r joins |
| 03:00:41 | <tech234a> | I speculate that it will probably have some outages/reliability issues but I doubt it will disappear completely |
| 03:01:03 | <tech234a> | there are still some people working at the company for now |
| 03:02:01 | | Hackerpcs quits [Client Quit] |
| 03:03:53 | | Hackerpcs (Hackerpcs) joins |
| 03:04:59 | <@JAA> | In the same way that a driverless train will keep running, I guess. |
| 03:09:36 | | Entropy joins |
| 03:09:59 | | Entropy quits [Remote host closed the connection] |
| 03:18:50 | | Earl joins |
| 03:27:45 | <Earl> | what’s the status with twitter? |
| 03:30:32 | <Frogging101> | The market for internet-ruining is about to get shaken up. |
| 03:30:36 | <@JAA> | https://transfer.archivete.am/inline/S38yt/fire.gif |
| 03:31:23 | | misbeseem joins |
| 03:32:43 | <Arcorann> | What was the channel for Twitter archiving discussion again |
| 03:34:27 | <Earl> | Is there one? This page just led me here https://wiki.archiveteam.org/index.php/Twitter |
| 03:35:48 | <@JAA> | There isn't one. |
| 03:36:32 | | sonick quits [Client Quit] |
| 03:37:08 | <@JAA> | We had one back on EFnet for a while when they were contemplating nuking inactive accounts, but that never happened, so the channel wasn't recreated after we moved here. |
| 03:40:21 | | surebet joins |
| 03:42:38 | | Earl quits [Remote host closed the connection] |
| 03:43:26 | | surebet quits [Remote host closed the connection] |
| 03:47:04 | | misbeseem quits [Remote host closed the connection] |
| 04:17:20 | <mind_combatant> | what's the easiest way to queue up around 1500 twitter URLs to be archived and end up on the wayback machine? |
| 04:17:52 | <mind_combatant> | preferably skipping any that are already saved there |
| 04:21:08 | <@JAA> | Define 'twitter URLs'? Tweets, users, something else? |
| 04:22:55 | | Earl joins |
| 04:23:00 | | Earl quits [Remote host closed the connection] |
| 04:29:12 | <mind_combatant> | specifically tweets, all in the form of "https://twitter.com/i/web/status/<id>" |
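One way to handle the "skip any that are already saved" part is to ask the Wayback Machine availability API before queueing. The sketch below is an assumption-laden illustration, not the machinery JAA refers to: the helper names are hypothetical, and the API only reports that some snapshot exists, not that it is a usable capture of a JavaScript-rendered tweet page.

```python
import json
import urllib.parse
import urllib.request

AVAILABILITY_API = "https://archive.org/wayback/available"


def tweet_url(tweet_id) -> str:
    """Build the URL form used in the chat."""
    return f"https://twitter.com/i/web/status/{tweet_id}"


def _http_get(url: str) -> str:
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def is_archived(url: str, fetch=None) -> bool:
    """True if the availability API reports a closest snapshot for the URL."""
    query = urllib.parse.urlencode({"url": url})
    raw = (fetch or _http_get)(f"{AVAILABILITY_API}?{query}")
    snapshots = json.loads(raw).get("archived_snapshots", {})
    return bool(snapshots.get("closest", {}).get("available"))


def unarchived(tweet_ids, fetch=None):
    """Return the tweet URLs that have no snapshot yet."""
    return [u for u in map(tweet_url, tweet_ids) if not is_archived(u, fetch)]
```

The `fetch` parameter exists so the logic can be exercised without network access; for 1500 URLs a real run should also pause between requests.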
| 04:34:02 | | Iki1 joins |
| 04:38:15 | | Iki quits [Ping timeout: 276 seconds] |
| 04:38:43 | | nematode joins |
| 04:39:55 | | Lord_Nightmare quits [Client Quit] |
| 04:40:08 | | tomorrowRemoval quits [Client Quit] |
| 04:40:08 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 04:40:08 | | anononymous_penguin quits [Client Quit] |
| 04:40:08 | | blackle quits [Client Quit] |
| 04:40:08 | | Anthony quits [Client Quit] |
| 04:40:08 | | cascode quits [Client Quit] |
| 04:40:08 | | schr0z1ng3r quits [Client Quit] |
| 04:40:08 | | jman005 quits [Client Quit] |
| 04:40:17 | | qwertyasdfuiopghjkl joins |
| 04:40:19 | | Lord_Nightmare (Lord_Nightmare) joins |
| 04:43:45 | | nick joins |
| 04:44:04 | | nick quits [Remote host closed the connection] |
| 04:44:14 | | nick123456 joins |
| 04:46:35 | | Anthony joins |
| 04:46:36 | <nick123456> | just wondering if twitter will be a warrior project, seeing as it seems to be unstable and dying |
| 04:47:05 | | cascode joins |
| 04:47:37 | <@JAA> | mind_combatant: You can send me a list, and I'll run it through the machinery. Should eventually show up in the WBM. |
| 04:48:29 | <@JAA> | Or if those tweets are all from one account (or a small selection of them), we could just run that through socialbot. |
| 04:50:31 | <Anthony> | Google+ is now trending on Twitter. |
| 04:51:49 | | jacobk joins |
| 04:54:19 | <mind_combatant> | <JAA> "mind_combatant (Archie): You can..." <- i assume you mean in a PM, and as a .txt file? |
| 04:54:45 | <mind_combatant> | oh, whoops, forgot, i shouldn't do Matrix-style replies in here |
| 04:55:00 | <@JAA> | mind_combatant: Yeah (or here if you don't mind it being public). You can upload it to https://transfer.archivete.am/ |
| 05:01:44 | | maybe joins |
| 05:02:22 | | nick123456 quits [Remote host closed the connection] |
| 05:05:48 | | maybe quits [Remote host closed the connection] |
| 05:11:54 | | lun4 (lun4) joins |
| 05:12:34 | | ivan (ivan) joins |
| 05:31:07 | | Island joins |
| 05:55:11 | | tyoma joins |
| 05:55:13 | | megaminxwin joins |
| 05:58:03 | <megaminxwin> | question, because im not really sure about the best way to go about this: im wanting to archive all the tweets + images/videos of various people i follow, ive worked out how to get the json file of the users in question via snscrape + twarc, but im not sure how to get the files |
| 05:58:45 | <megaminxwin> | im assuming parsing the json file, but im not sure how well that would work or if theres a better method |
| 05:59:03 | <megaminxwin> | plus of course this doesnt work for private accounts i follow |
| 05:59:15 | <megaminxwin> | any suggestions? thanks |
| 06:01:20 | | jacobk quits [Ping timeout: 268 seconds] |
| 06:11:01 | | tyoma quits [Remote host closed the connection] |
| 06:11:18 | | BlueMaxima quits [Read error: Connection reset by peer] |
| 06:12:39 | | Dudebloke joins |
| 06:15:50 | | jacobk joins |
| 06:19:12 | | mut4ntm0nkey quits [Ping timeout: 255 seconds] |
| 06:25:03 | | Ketchup901 quits [Ping timeout: 255 seconds] |
| 06:25:40 | | Ketchup901 (Ketchup901) joins |
| 06:27:43 | | Dudebloke quits [Remote host closed the connection] |
| 06:32:33 | | mut4ntm0nkey (mutantmonkey) joins |
| 06:32:45 | <lennier1> | megaminxwin: You might check out the Twitter Media Downloader Chrome extension. Be sure to click "No Media" to also get text-only tweets. It does work with private accounts you follow to some extent (might not get really old tweets because the Twitter API doesn't return them so there's really no way to get them unless you already know the link). Would admittedly be annoying if you follow a lot of accounts. https://chrome.google.com/webstore/detail/twitter-media-downloader/cblpjenafgeohmnjknfhpdbdljfkndig |
| 06:38:30 | | jacobk_ joins |
| 06:38:51 | | jacobk quits [Client Quit] |
| 06:38:51 | | Lord_Nightmare quits [Client Quit] |
| 06:38:51 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 06:38:51 | | megaminxwin quits [Client Quit] |
| 06:38:51 | | cascode quits [Client Quit] |
| 06:38:51 | | Anthony quits [Client Quit] |
| 06:38:53 | | Lord_Nightmare2 (Lord_Nightmare) joins |
| 06:39:00 | | cascode joins |
| 06:39:24 | | Lord_Nightmare2 is now known as Lord_Nightmare |
| 06:50:54 | | Nick joins |
| 06:51:46 | | Nick is now known as NickNick |
| 06:53:19 | <NickNick> | So this week I've been working on exporting my own data from Twitter, and I just thought to come by here to see if anyone's attempting to take on that behemoth? |
| 07:02:49 | | NickNick quits [Remote host closed the connection] |
| 07:04:01 | | Nick joins |
| 07:04:43 | | Nick is now known as NickNick |
| 07:07:19 | | pabs quits [Ping timeout: 268 seconds] |
| 07:08:28 | | sec^nd quits [Remote host closed the connection] |
| 07:09:24 | | sec^nd (second) joins |
| 07:09:50 | | Island quits [Read error: Connection reset by peer] |
| 07:13:11 | | b joins |
| 07:13:34 | | pabs (pabs) joins |
| 07:16:11 | | sonick (sonick) joins |
| 07:20:38 | | b quits [Remote host closed the connection] |
| 07:20:38 | | NickNick quits [Remote host closed the connection] |
| 07:20:38 | | cascode quits [Remote host closed the connection] |
| 07:23:58 | | atphoenix quits [Ping timeout: 268 seconds] |
| 07:25:30 | | atphoenix (atphoenix) joins |
| 07:27:56 | | Anthony joins |
| 07:35:27 | | Anthony quits [Remote host closed the connection] |
| 07:38:19 | | Arachnophine3 (Arachnophine) joins |
| 07:40:03 | <sonick> | Does anyone know why the job about ArchiveBot's GeoLog project (https://geolog.mydns.jp/) has stopped? |
| 07:41:48 | | Arachnophine3 quits [Changing host] |
| 07:41:48 | | Arachnophine3 (Arachnophine) joins |
| 07:42:14 | <sonick> | Job id: 7dv1ztme3pksk96o7m168n1l3 |
| 07:44:08 | <ivan> | that seems like a question for #archivebot |
| 07:46:26 | <sonick> | ivan thanks! |
| 08:00:10 | <IDK> | JAA: https://www.businessinsider.com/twitter-offices-shutting-down-after-elon-musk-ended-remote-work-2022-11?r=US&IR=T |
| 08:00:41 | <IDK> | https://usercontent.irccloud-cdn.com/file/MlvUuEZa/image.png |
| 08:01:34 | <@JAA> | Why are you pinging me about this? |
| 08:01:49 | <IDK> | wrong channel |
| 08:03:55 | <IDK> | but yea, Users speculate site will shut down in the near future over mass employee exit |
| 08:12:54 | | icryclanteat joins |
| 08:14:24 | | sec^nd quits [Ping timeout: 255 seconds] |
| 08:20:21 | | sec^nd (second) joins |
| 08:22:43 | | megaminxwin joins |
| 08:24:34 | | sepro0 (sepro) joins |
| 08:25:01 | | sepro quits [Ping timeout: 268 seconds] |
| 08:25:01 | | sepro0 is now known as sepro |
| 08:27:36 | <megaminxwin> | lennier1: okay well i found the firefox version and am using that rn; so theres no real way to get tweets past the 3200 api limit? i thought the snscraper could go past that |
| 08:29:04 | <theblazehen|m> | megaminxwin: snscrape to get the actual tweets, then python script to iterate over that data grabbing the actual images? |
| 08:29:13 | <lennier1> | For private accounts, yes. It's not a problem with public accounts. |
| 08:37:10 | | qwertyasdfuiopghjkl joins |
| 08:38:19 | <megaminxwin> | ...im currently using that addon now, and its at over 4200 tweets so far |
| 08:38:43 | <megaminxwin> | theblazehen|m: thats what i was thinking, trouble is im really quite bad at python scripting |
| 08:39:14 | | sepro quits [Ping timeout: 265 seconds] |
| 08:39:26 | <megaminxwin> | i can convert the data to a json file with twarc and that does have links to the media, and then i imagine i can use some combination of jq and curl, but god knows how |
| 08:39:33 | <megaminxwin> | and yeah that doesnt work with private accounts |
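As an alternative to the jq-plus-curl idea, a short Python pass over the JSONL file can collect the media links. The sketch below deliberately avoids assuming any particular snscrape or twarc field names: it walks every JSON value and keeps strings that point at Twitter's media hosts. That also means it will pick up things like profile images, so the output may need filtering. The function names are hypothetical, and this is not the gist linked later in the log.

```python
import json

# Hosts Twitter serves images and videos from.
MEDIA_HOSTS = ("https://pbs.twimg.com/", "https://video.twimg.com/")


def iter_strings(value):
    """Yield every string found anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def media_urls(jsonl_lines):
    """Yield each distinct media URL found in an iterable of JSONL lines."""
    seen = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        for text in iter_strings(json.loads(line)):
            if text.startswith(MEDIA_HOSTS) and text not in seen:
                seen.add(text)
                yield text
```

Typical use would be `with open("tweets.jsonl") as f: urls = list(media_urls(f))`, followed by whatever downloader is convenient; the file name here is a placeholder.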
| 08:39:59 | | sepro (sepro) joins |
| 08:40:47 | <megaminxwin> | okay weve hit 4500, so either the 3200 tweet api limit is no more (considering everything else i wouldnt be surprised if that just fell) or this addon is doing something. strange |
| 08:44:00 | | namwen joins |
| 08:51:38 | <lennier1> | The addon definitely does search for public accounts, if that's what you're trying. |
| 08:52:12 | | Ketchup901 quits [Ping timeout: 255 seconds] |
| 08:55:00 | | Ketchup901 (Ketchup901) joins |
| 08:57:54 | <theblazehen|m> | https://gist.github.com/theblazehen/6077c25577bf3579c44b9eff26c4901a Not fully tested |
| 09:00:23 | <megaminxwin> | will try and report back, cheers |
| 09:01:07 | <theblazehen|m> | File created with snscrape --jsonl --progress twitter-user Foone > foone_tweets.jsonl |
| 09:02:39 | <megaminxwin> | thatll explain why that wasnt working on the file i had |
| 09:03:12 | <megaminxwin> | keeps the filename as foone_tweets.jsonl because i cant be bothered editing the python script |
| 09:03:20 | <megaminxwin> | im a *professional* lazy |
| 09:03:28 | <theblazehen|m> | Hah! Relatable |
| 09:06:02 | <IDK> | Sorry for being off topic, but which API can I use for searching older twitter posts, I can only call GET /2/tweets/search/recent |
| 09:06:34 | <IDK> | The GUI interface does not work as well |
| 09:08:05 | <megaminxwin> | alright well snscrape got 8430 tweets, not the full 18.1k apparently on the account, i assume theres a reason but lets just test this first |
| 09:08:43 | <IDK> | megaminxwin: SNscrape doesnt seem to get the retweets |
| 09:08:50 | <megaminxwin> | how rude of it |
| 09:09:08 | <IDK> | and the tweet count seems to include the retweets |
| 09:10:27 | <megaminxwin> | alright the script doesnt work, sometimes it says 'NoneType' object has no attribute 'groups', and other times it goes "no such file or directory" |
| 09:10:51 | <theblazehen|m> | Ah, you need to `mkdir media` |
| 09:10:57 | <megaminxwin> | ah, cheers |
| 09:11:08 | <theblazehen|m> | Have you got the latest revision? I fixed it shortly after my initial upload |
| 09:11:14 | <theblazehen|m> | That fixes most of the groups issue |
| 09:12:31 | <megaminxwin> | theeere we go |
| 09:17:34 | <megaminxwin> | im very intrigued in seeing what the addon is doing... hmm |
| 09:19:00 | <megaminxwin> | well, while this is happening |
| 09:19:05 | <megaminxwin> | gets out my ds |
| 09:19:13 | <megaminxwin> | see you in six months |
| 09:26:12 | <IDK> | Just curious, which addon are you guys using |
| 09:27:05 | | dasineura quits [Read error: Connection reset by peer] |
| 09:27:10 | <IDK> | nevermind |
| 09:29:52 | | Ketchup901 quits [Remote host closed the connection] |
| 09:30:05 | | Ketchup901 (Ketchup901) joins |
| 10:14:26 | <jacobk_> | above script modified by me for downloading videos also: https://bpa.st/V47A |
| 10:14:34 | | jacobk_ is now known as jacobk |
| 10:15:00 | <jacobk> | not sure how to cleanly get file extension though |
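One possible way to get the extension is to parse the URL rather than slice the string: take the suffix of the path if there is one, and otherwise fall back to a `format=` query parameter, which is how Twitter image URLs commonly carry the type. This is a sketch with a hypothetical helper name, not a change to the linked scripts; checking the response's `Content-Type` header would be more robust when neither hint is present.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit


def guess_extension(url: str, default: str = "") -> str:
    """Guess a file extension (with leading dot) from a media URL."""
    parts = urlsplit(url)
    # Case 1: the path itself ends in an extension, e.g. ".../video.mp4?tag=12".
    ext = posixpath.splitext(parts.path)[1]
    if ext:
        return ext.lower()
    # Case 2: the type is carried in the query string, e.g. "?format=jpg&name=orig".
    fmt = parse_qs(parts.query).get("format")
    if fmt:
        return "." + fmt[0].lower()
    return default
```

Because the query string is split off before looking at the path, parameters such as `?tag=12` never end up in the saved file name.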
| 10:21:59 | <jacobk> | It does seem like snscrape might be missing some things. @copilotcase scrapes 0 tweets even though they have 2. |
| 10:23:09 | | megaminxwin quits [Ping timeout: 265 seconds] |
| 10:23:59 | <jacobk> | get "gifs" also: https://bpa.st/GMBA |
| 10:24:27 | <jacobk> | Check for failed media fetches because there could be other types too. |
| 10:25:11 | <ivan> | jacobk: this is expected behavior because Twitter search is broken |
| 10:25:30 | <ivan> | https://twitter.com/search?q=from%3Acopilotcase&src=spelling_expansion_revert_click&f=live |
| 10:26:06 | <jacobk> | understandable |
| 10:26:58 | <ivan> | to clarify, it's broken for particular users in unpredictable ways depending on gaps in tweet history, whether they've ever privated, and other unknown factors |
| 10:28:08 | <jacobk> | Yeah, I figured it was something like that; just wanted to make sure it was known. |
| 10:40:29 | | ggggg joins |
| 10:49:24 | <jacobk> | Maybe this will be useful to somebody, if you happen to use Akregator to subscribe to Twitter users, so you can get a list of all users you are subscribed to and then (try to) download all of their tweets: https://bpa.st/7BZQ |
| 10:49:33 | <jacobk> | (I'm going to sleep now) |
| 10:49:53 | <jacobk> | (Hopefully my hard drive isn't completely full in the morning :P) |
| 10:50:50 | <schwarzkatz|m> | Good night o/ |
| 10:57:10 | | ggggg quits [Remote host closed the connection] |
| 10:59:22 | | atphoenix quits [Remote host closed the connection] |
| 10:59:22 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
| 10:59:22 | | icryclanteat quits [Remote host closed the connection] |
| 10:59:22 | | namwen quits [Remote host closed the connection] |
| 10:59:26 | | atphoenix_ (atphoenix) joins |
| 11:02:28 | | qwertyasdfuiopghjkl joins |
| 11:17:47 | | ggggg joins |
| 11:17:59 | | ggggg quits [Remote host closed the connection] |
| 11:20:50 | | inconsistentUsername joins |
| 11:29:23 | <betamax_> | Is there a recommended set of options to add to wget so that when given a URL to a tweet it grabs all the necessary pre-requisites? |
| 11:29:28 | | betamax_ is now known as betamax |
| 11:31:05 | | Pichu0102 joins |
| 11:31:58 | <inconsistentUsername> | Good morning, I was bumming around here yesterday trying to see how I can help with archiving twitter. |
| 11:32:09 | <inconsistentUsername> | Anything I missed while I was away? :) |
| 11:54:01 | | inconsistentUsername quits [Ping timeout: 265 seconds] |
| 12:02:13 | | mut4ntm0nkey quits [Remote host closed the connection] |
| 12:03:35 | | mut4ntm0nkey (mutantmonkey) joins |
| 12:06:22 | | inconsistentUsername joins |
| 12:59:16 | | inconsistentUsername quits [Ping timeout: 257 seconds] |
| 13:02:58 | | namwen joins |
| 13:08:56 | | inconsistentUsername joins |
| 13:10:31 | | eroc1990 quits [Client Quit] |
| 13:13:39 | <TheTechRobo> | IDK: For older tweets, use snscrape. |
| 13:13:46 | <TheTechRobo> | snscrape CAN include retweets. 1sec |
| 13:14:54 | <TheTechRobo> | If you're scraping a user and are fine with a 3200 tweet limit, use the `twitter-profile` scraper (rather than `twitter-user`) to include retweets. |
| 13:15:48 | <TheTechRobo> | If you're using search OR the 3200 tweet limit doesn't work for you, you can include retweets from the past 7 days (non-retweets will not be affected, though, unless Twitter's search does something weird) with `include:nativeretweets` as a search operator. |
| 13:23:20 | | inconsistentUsername quits [Remote host closed the connection] |
| 13:23:25 | | inconsistentUsername joins |
| 13:30:45 | | eroc1990 (eroc1990) joins |
| 13:32:47 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 13:34:28 | | pie_ quits [] |
| 13:34:39 | | pie_ joins |
| 13:41:01 | | qwertyasdfuiopghjkl joins |
| 13:44:15 | | Arcorann quits [Ping timeout: 276 seconds] |
| 13:46:06 | | qw3rty joins |
| 13:46:06 | | qw3rty__ quits [Read error: Connection reset by peer] |
| 13:46:49 | | qw3rty quits [Read error: Connection reset by peer] |
| 13:47:17 | | qw3rty joins |
| 13:48:09 | | qw3rty_ joins |
| 13:48:09 | | qw3rty quits [Read error: Connection reset by peer] |
| 13:48:33 | | qw3rty_ quits [Read error: Connection reset by peer] |
| 13:48:45 | | qw3rty_ joins |
| 13:49:09 | | qw3rty_ quits [Read error: Connection reset by peer] |
| 13:49:46 | | qw3rty joins |
| 13:50:18 | | qw3rty quits [Read error: Connection reset by peer] |
| 13:51:00 | | qw3rty joins |
| 13:51:48 | | qw3rty quits [Read error: Connection reset by peer] |
| 13:53:14 | | qw3rty joins |
| 13:54:11 | | qw3rty quits [Read error: Connection reset by peer] |
| 13:54:33 | | qw3rty joins |
| 13:56:15 | | qw3rty quits [Read error: Connection reset by peer] |
| 13:56:25 | | qw3rty_ joins |
| 13:57:08 | | qw3rty_ quits [Read error: Connection reset by peer] |
| 13:58:30 | | qw3rty joins |
| 13:59:05 | | qw3rty quits [Read error: Connection reset by peer] |
| 13:59:40 | | qw3rty joins |
| 14:00:26 | | qw3rty_ joins |
| 14:00:26 | | qw3rty quits [Read error: Connection reset by peer] |
| 14:00:50 | | qw3rty_ quits [Read error: Connection reset by peer] |
| 14:08:48 | | lunik17 joins |
| 14:10:39 | | inconsistentUsername quits [Remote host closed the connection] |
| 14:13:13 | | tech_exorcist (tech_exorcist) joins |
| 14:33:24 | | inconsistentUsername joins |
| 14:38:20 | | LeGoupil joins |
| 14:56:40 | | eroc1990 quits [Client Quit] |
| 14:56:40 | | LeGoupil quits [Remote host closed the connection] |
| 14:56:48 | | LeGoupil joins |
| 14:56:58 | | eroc1990 (eroc1990) joins |
| 15:01:20 | | LeGoupil quits [Remote host closed the connection] |
| 15:01:20 | | qwertyasdfuiopghjkl quits [Client Quit] |
| 15:01:20 | | namwen quits [Client Quit] |
| 15:01:20 | | inconsistentUsername quits [Client Quit] |
| 15:01:33 | | LeGoupil joins |
| 15:03:31 | | eroc1990 quits [Client Quit] |
| 15:11:30 | | qwertyasdfuiopghjkl joins |
| 15:24:01 | | tech_exorcist quits [Remote host closed the connection] |
| 15:25:03 | | tech_exorcist (tech_exorcist) joins |
| 15:26:28 | | spirit joins |
| 15:32:50 | | Island joins |
| 15:35:50 | | inconsistentUsername joins |
| 15:40:24 | | eroc1990 (eroc1990) joins |
| 15:52:53 | <fishingforsoup> | How familiar... |
| 15:52:55 | <fishingforsoup> | Could you upload Ain't My Fault's beta? I have it if you wish. Just send me an email! |
| 15:53:01 | <fishingforsoup> | Wrong paste. |
| 15:53:09 | <fishingforsoup> | https://spacehey.com/ |
| 15:58:15 | | holbrooke joins |
| 16:14:03 | | inconsistentUsername quits [Ping timeout: 265 seconds] |
| 16:18:20 | | inconsistentUsername joins |
| 16:26:28 | | Jason80 joins |
| 16:39:35 | | Jason80 quits [Remote host closed the connection] |
| 16:45:58 | | tech_exorcist quits [Remote host closed the connection] |
| 16:46:58 | | tech_exorcist (tech_exorcist) joins |
| 17:01:08 | | HP_Archivist (HP_Archivist) joins |
| 17:04:48 | | inconsistentUsername quits [Ping timeout: 265 seconds] |
| 17:09:23 | | HP_Archivist quits [Client Quit] |
| 17:11:19 | <h2ibot> | Jarshua edited Twitter (+249): https://wiki.archiveteam.org/?diff=49161&oldid=49157 |
| 17:11:20 | <h2ibot> | Squidboy edited ArchiveBot/Antarctica (+104, +Queen Maud Land): https://wiki.archiveteam.org/?diff=49162&oldid=40412 |
| 17:20:26 | | upintheairsheep joins |
| 17:21:55 | | cascode joins |
| 17:22:10 | <upintheairsheep> | lennier1 Hello, to focus on the samsung store project, please decrypt the firmware with this python 2.7 script, I haven't found a way to get the python 2.7 encryption modules working on my devices. https://hackint.logs.kiska.pw/archiveteam-bs/20221118 |
| 17:31:34 | <upintheairsheep> | And off-topic, there is a website called https://decrypt.day/ which is a mirror of the App Store on OneDrive, and is blocked by hCaptcha before downloading. |
| 17:36:14 | <upintheairsheep> | the browse page of the app store looks different, and I too am getting 404 errors. |
| 17:38:03 | <upintheairsheep> | https://archive.ph/Pptae |
| 17:38:21 | <upintheairsheep> | The links are still working to this day on archive.ph |
| 17:51:12 | | qwertyasdfuiopghjkl quits [Ping timeout: 265 seconds] |
| 17:55:19 | <upintheairsheep> | lennier1 I'm calling you for this project, before it gets forgotten |
| 18:02:42 | <lennier1> | I don't really have time at the moment. Right now, I'm literally at work. :) |
| 18:03:01 | <@arkiver> | upintheairsheep: don't spam people |
| 18:03:13 | <@arkiver> | you can leave a message, mention someone, but just wait for a reply |
| 18:03:27 | <@arkiver> | if there is no reply after a long time (say a day or two), feel free to ping again |
| 18:03:32 | <upintheairsheep> | ok |
| 18:03:44 | <upintheairsheep> | Me too. |
| 18:03:48 | | upintheairsheep leaves |
| 18:13:52 | | qw3rty joins |
| 18:37:49 | | tomorrowRemoval joins |
| 18:44:45 | | LeGoupil quits [Client Quit] |
| 18:49:38 | | wyatt8750 joins |
| 18:51:03 | | wyatt8740 quits [Ping timeout: 276 seconds] |
| 18:54:50 | | tech_exorcist quits [Remote host closed the connection] |
| 18:56:35 | | tech_exorcist (tech_exorcist) joins |
| 18:59:25 | | cascode quits [Remote host closed the connection] |
| 19:00:37 | <h2ibot> | JAABot edited CurrentWarriorProject (-4): https://wiki.archiveteam.org/?diff=49163&oldid=49134 |
| 19:00:39 | | tech_exorcist quits [Remote host closed the connection] |
| 19:01:08 | | tech_exorcist (tech_exorcist) joins |
| 19:03:18 | | HackMii quits [Ping timeout: 255 seconds] |
| 19:04:35 | | HackMii (hacktheplanet) joins |
| 19:09:42 | | tech_exorcist quits [Read error: Connection reset by peer] |
| 19:10:10 | | tech_exorcist (tech_exorcist) joins |
| 19:21:19 | | TheTechRobo quits [Client Quit] |
| 19:21:41 | | TheTechRobo (TheTechRobo) joins |
| 19:22:20 | | TheTechRobo quits [Client Quit] |
| 19:22:41 | | TheTechRobo (TheTechRobo) joins |
| 19:28:06 | | tech_exorcist quits [Read error: Connection reset by peer] |
| 19:28:41 | | TheTechRobo quits [Remote host closed the connection] |
| 19:29:02 | | TheTechRobo (TheTechRobo) joins |
| 19:29:06 | | tech_exorcist (tech_exorcist) joins |
| 19:29:07 | | TheTechRobo quits [Remote host closed the connection] |
| 19:29:31 | | TheTechRobo (TheTechRobo) joins |
| 19:30:25 | | TheTechRobo quits [Client Quit] |
| 19:30:48 | | TheTechRobo (TheTechRobo) joins |
| 19:55:30 | | HackMii quits [Ping timeout: 255 seconds] |
| 20:02:36 | | HackMii (hacktheplanet) joins |
| 20:19:21 | | HackMii quits [Ping timeout: 255 seconds] |
| 20:22:16 | | HackMii (hacktheplanet) joins |
| 20:23:35 | | cascode joins |
| 20:53:23 | | wyatt8740 joins |
| 20:53:36 | | eroc1990 quits [Client Quit] |
| 20:55:18 | <@JAA> | So my Twitter US election candidate rescrape found about 170k fewer tweets overall, but 221k tweets that weren't in the first scrape. So roughly 391k older tweets vanished for one reason or another, I guess. |
| 20:56:01 | | upintheairsheep joins |
| 20:56:19 | | wyatt8750 quits [Ping timeout: 265 seconds] |
| 20:56:21 | | eroc1990 (eroc1990) joins |
| 20:59:04 | <upintheairsheep> | Even though Roblox archival is not needed right now, a yt-dlp developer has made a pull request to support it, but it is currently a draft. |
| 20:59:05 | <upintheairsheep> | https://github.com/yt-dlp/yt-dlp/pull/5178 |
| 21:00:50 | <upintheairsheep> | I'm a Python newbie, but I tried to add support for comment extraction. https://github.com/upintheairsheep/ytdl-sheep/blob/main/yt_dlp/extractor/roblox.py |
| 21:02:09 | <upintheairsheep> | However, it is untested due to my software limitations and does not support looping after the first 10 |
| 21:09:10 | | upintheairsheep leaves |
| 21:10:19 | | wyatt8750 joins |
| 21:11:47 | | wyatt8740 quits [Ping timeout: 265 seconds] |
| 21:19:20 | | eroc1990 quits [Client Quit] |
| 21:20:50 | | eroc1990 (eroc1990) joins |
| 21:21:56 | | wyatt8750 quits [Ping timeout: 265 seconds] |
| 21:22:30 | | wyatt8740 joins |
| 21:37:14 | | lennier1 quits [Client Quit] |
| 21:38:49 | | lennier1 (lennier1) joins |
| 22:28:49 | <tomorrowRemoval> | Oh hey, the warrior auto-select has moved to reddit. |
| 22:28:53 | <tomorrowRemoval> | Are we done with telegram? |
| 22:58:12 | | tomorrowRemoval quits [Client Quit] |
| 22:58:12 | | cascode quits [Client Quit] |
| 23:02:06 | | BlueMaxima joins |
| 23:03:09 | | HackMii quits [Ping timeout: 255 seconds] |
| 23:03:59 | | tech_exorcist quits [Client Quit] |
| 23:06:55 | | HackMii (hacktheplanet) joins |
| 23:07:18 | | lennier1 quits [Client Quit] |
| 23:07:38 | | lennier1 (lennier1) joins |
| 23:21:39 | | jacobk quits [Ping timeout: 268 seconds] |
| 23:23:37 | | HP_Archivist (HP_Archivist) joins |
| 23:25:40 | | eroc1990 quits [Client Quit] |
| 23:31:06 | | spirit quits [Client Quit] |
| 23:32:18 | | eroc1990 (eroc1990) joins |
| 23:41:27 | | HP_Archivist quits [Client Quit] |
| 23:43:16 | | XanaAdmin joins |