00:33:42<nyany>wouldn't put it past x to do something dumb, JAA
01:29:00kiryu joins
01:29:00kiryu quits [Changing host]
01:29:00kiryu (kiryu) joins
02:10:50RealPerson leaves
02:30:13abirkill (abirkill) joins
02:38:42StarletCharlotte quits [Remote host closed the connection]
03:14:05beastbg8_ quits [Read error: Connection reset by peer]
03:38:01BPCZ quits [Ping timeout: 255 seconds]
03:53:17yarrow joins
03:56:28michaelblob quits [Read error: Connection reset by peer]
03:57:49michaelblob (michaelblob) joins
04:01:20DogsRNice joins
04:29:11beastbg8 (beastbg8) joins
04:54:50grid joins
04:56:02muklumsum_ joins
04:56:13muklumsum quits [Ping timeout: 272 seconds]
04:57:47Island quits [Read error: Connection reset by peer]
04:58:04benjins2_ quits [Read error: Connection reset by peer]
05:35:02BPCZ (BPCZ) joins
06:05:01that_lurker quits [Remote host closed the connection]
06:07:05benjins joins
06:08:46benjinsm quits [Ping timeout: 255 seconds]
06:13:17that_lurker joins
06:23:51DogsRNice quits [Read error: Connection reset by peer]
06:26:22parfait quits [Client Quit]
06:29:47BlueMaxima quits [Read error: Connection reset by peer]
07:01:16Arcorann (Arcorann) joins
07:04:43grid quits [Client Quit]
07:05:02Unholy23619246453771 quits [Remote host closed the connection]
07:06:11Unholy23619246453771 (Unholy2361) joins
07:25:18michaelblob quits [Read error: Connection reset by peer]
07:29:03michaelblob (michaelblob) joins
07:46:16tmob joins
07:58:33tmob quits [Client Quit]
08:59:46michaelblob quits [Read error: Connection reset by peer]
09:00:06Bleo1826007227196 quits [Client Quit]
09:01:26Bleo1826007227196 joins
09:35:21benjinsm joins
09:38:03benjins quits [Ping timeout: 272 seconds]
09:46:07muklumsum_ quits [Ping timeout: 255 seconds]
09:46:17muklumsum joins
09:46:52knecht4 quits [Quit: knecht420]
09:47:18knecht4 joins
09:50:06yarrow quits [Read error: Connection reset by peer]
09:52:26yarrow (yarrow) joins
10:14:28muklumsum quits [Ping timeout: 255 seconds]
10:21:36muklumsum joins
10:28:48mighty-bob joins
10:59:30BornOn420 (BornOn420) joins
11:44:06decky_e quits [Read error: Connection reset by peer]
11:56:10mighty-bob quits [Ping timeout: 255 seconds]
12:05:42nertzy joins
12:49:22benjins2 joins
13:02:39mighty-bob joins
13:14:01muklumsum quits [Ping timeout: 255 seconds]
13:16:18muklumsum joins
13:27:19Arcorann quits [Ping timeout: 272 seconds]
13:55:57<pabs>-rss/#hackernews- Noam Chomsky 'no longer able to talk' after 'medical event': https://www.independent.co.uk/arts-entertainment/books/news/noam-chomsky-health-update-tributes-b2559831.html https://news.ycombinator.com/item?id=40641361
14:09:24mighty-bob quits [Client Quit]
14:11:33ArchivalEfforts_ joins
14:14:53loug4 joins
14:20:04katocala quits [Remote host closed the connection]
14:28:03nulldata quits [Read error: Connection reset by peer]
14:28:08nulldata (nulldata) joins
14:31:39katocala joins
15:10:05ArchivalEfforts_ quits [Client Quit]
15:10:31ArchivalEfforts_ joins
15:21:46MrMcNuggets (MrMcNuggets) joins
15:27:10MrMcNuggets quits [Remote host closed the connection]
15:27:21MrMcNuggets (MrMcNuggets) joins
15:29:27sidpatchy joins
15:31:46sidpatchy quits [Remote host closed the connection]
15:31:56sidpatchy joins
15:36:03sidpatchy quits [Remote host closed the connection]
15:36:17sidpatchy joins
15:36:59sidpatchy quits [Remote host closed the connection]
15:37:12sidpatchy joins
15:37:16sidpatchy quits [Remote host closed the connection]
15:37:31sidpatchy joins
15:38:12sidpatchy quits [Remote host closed the connection]
15:38:27sidpatchy joins
15:39:15sidpatchy quits [Remote host closed the connection]
15:51:37Dango360 quits [Quit: Leaving]
15:55:49Dango360 (Dango360) joins
15:59:10icedice quits [Client Quit]
15:59:29MrMcNuggets quits [Client Quit]
16:11:13eightthree quits [Remote host closed the connection]
16:12:09eightthree joins
16:12:30eightthree quits [Remote host closed the connection]
16:13:44eightthree joins
16:19:07Sidpatchy joins
16:30:45eightthree quits [Remote host closed the connection]
16:32:16Notrealname1234 (Notrealname1234) joins
16:35:39eightthree joins
17:00:15Notrealname1234 quits [Client Quit]
17:12:18nicolas17 joins
17:12:46<nicolas17>damn, I forgot *how* busy things get in apple-hacking-land after WWDC
17:14:04<nicolas17>there's 5 files from yesterday on https://data.nicolas17.xyz/samsung-grab/
17:14:09<nicolas17>and I'll be adding 6 more after lunch
17:35:25fireonlive is now known as it
17:36:11nyany is now known as do
17:40:43do is now known as nyany
17:45:02it is now known as fireonlive
18:01:54Island joins
18:05:46ShadowJonathan quits [Read error: Connection reset by peer]
18:05:50ShadowJonathan (ShadowJonathan) joins
18:05:58VonGuard quits [Read error: Connection reset by peer]
18:06:00VonGuard joins
18:06:02cptcobalt quits [Read error: Connection reset by peer]
18:06:06cptcobalt joins
18:06:34loopy quits [Read error: Connection reset by peer]
18:06:35devsnek quits [Read error: Connection reset by peer]
18:06:36loopy joins
18:06:40devsnek (devsnek) joins
18:37:01muklumsum quits [Ping timeout: 272 seconds]
18:38:02muklumsum joins
19:12:26Matthww quits [Quit: The Lounge - https://thelounge.chat]
19:18:02Matthww joins
19:24:38michaelblob (michaelblob) joins
19:32:07Notrealname1234 (Notrealname1234) joins
19:49:09Notrealname1234 quits [Client Quit]
19:56:46muklumsum quits [Ping timeout: 255 seconds]
20:00:14muklumsum joins
20:00:39<that_lurker>nicolas17: https://img.kuhaon.fun/u/aIZKzJ.gif
20:13:50Notrealname1234 (Notrealname1234) joins
20:27:10wyatt8750 joins
20:28:29wyatt8740 quits [Ping timeout: 272 seconds]
20:48:10Notrealname1234 quits [Client Quit]
21:06:03DogsRNice joins
21:33:13Notrealname1234 (Notrealname1234) joins
21:33:54<eggdrop>[remind] JAA: Remove {[[PANDA]]} unless someone spoke up.
21:35:10<@JAA>Oh yeah, the random curly brackets.
21:36:04Notrealname1234 quits [Client Quit]
21:43:23<h2ibot>JustAnotherArchivist deleted PANDA (This does not deserve a wiki page here;…)
21:47:46PredatorIWD quits [Read error: Connection reset by peer]
21:49:26AK quits [Client Quit]
21:50:12AK (AK) joins
21:53:14PredatorIWD joins
22:05:49sebs joins
22:06:04<sebs>Elo! I have a personal data science project that might generate a unique set of URLs, and I may have the means to run the machines to collect all the content myself. Would this be the right chat to ask questions and maybe get some (emotional) support?
22:07:19<sebs>the data in question is content generated by politicians and political institutions. it comes from a personal open (?) data side project that got 'out of hand' ;)
22:07:45<imer>sebs: sounds like you're in the right place, yes
22:07:52<sebs><3
22:08:46<sebs>in a nutshell what I (partly intend to) do:
22:08:56<imer>sounds like we might even be interested in archiving that data if it's publicly available (but that's not my call to make, so take that with a bucket of salt)
22:09:11<sebs>this is why I am here
22:10:14<sebs>it's only politics (official politician pages, official bodies, etc). I am trying to map the network of (German and European) politics (who links to whom, etc), and as a side product I produce the big URL set
22:11:02<sebs>at some point I need the data of the pages themselves, but that prompts the question of content changes. sketching this on a napkin leads to 'the Archive does most of that already'
22:12:01<sebs>the plan is to collect about 200 million content URLs over about 5 years
22:12:58<sebs>question: would such a project fill gaps in the content of the Wayback Machine's data, or is it all there already anyway?
22:14:41<sebs>question 2: can I run my own 'warrior' to archive the stuff myself and help out the project that way? It strikes me that this is a much better way to 're-index' the data and look for changes than writing it myself (so many hard parts, so lazy sebs)
22:16:41<that_lurker>Do you have a sample of the URLs? Are they on a single website or spread across multiple? Most likely this could be done through ArchiveBot, but there is a possibility the URLs project could fit this.
22:16:52<imer>I don't think we have tooling to re-archive only the pages that have changed; that might have to happen externally
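Since re-archiving only changed pages would have to happen externally, a minimal sketch of what that could look like against the public Wayback Machine CDX API (the endpoint and field names are the real API; the helper function and the example URL are hypothetical):

```python
import requests

CDX = "https://web.archive.org/cdx/search/cdx"

def capture_history(url):
    """Return (timestamp, digest) pairs for a URL's Wayback captures.

    collapse=digest drops consecutive captures whose content hash is
    unchanged, so each remaining row is a distinct observed version.
    """
    params = {
        "url": url,
        "output": "json",
        "fl": "timestamp,digest",
        "filter": "statuscode:200",
        "collapse": "digest",
    }
    rows = requests.get(CDX, params=params, timeout=60).json()
    return rows[1:]  # first row of the JSON response is the header

# A page needs re-archiving if its newest Wayback digest differs from
# the digest recorded on the previous crawl (hypothetical URL):
history = capture_history("example.com/some-politician-page")
if history:
    latest_timestamp, latest_digest = history[-1]
```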
22:17:59sec^nd quits [Ping timeout: 250 seconds]
22:20:07<sebs>that_lurker: how big do you want the sample to be? they are many URLs spread across multiple websites.
22:20:47<sebs>imer: I did not know that. I thought that was kind of the purpose of this project. Never thought about how the timeline feature is built.
22:22:07<imer>it's just separate copies that happened to have been archived again (for various reasons); we probably don't want to re-archive all those millions of urls constantly
22:22:21<that_lurker>if they are on different websites, that could be something datechnoman might like to take a look at :-)
22:22:49<thuban>sebs: wrt your first question, it depends. government bodies tend to be well-covered; individual politicians, especially local politicians, less so (and we do a lot of that through #archivebot, so your suggestions would be welcome there)
22:23:06<thuban>wrt your second question, warriors only work on warrior projects, so you couldn't use a warrior to archive arbitrary websites; however, you could suggest arbitrary websites for archival in #archivebot (or urls in #//, with some caveats).
22:23:38<TheTechRobo>Maybe this could even be a #Y thing?
22:23:38<thuban>that said, if you're doing serious data processing, that might not actually have the advantages you'd think it does, because (a) neither archiveteam (as imer said) nor the internet archive does change detection, and (b) you would have to get the data back out of the internet archive, which is slow
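On point (b), getting data back out of the Wayback Machine looks roughly like this; the id_ timestamp modifier asks for the original bytes without the replay UI's link rewriting (a sketch with hypothetical values; bulk retrieval is indeed slow and rate-limited):

```python
import requests

def fetch_snapshot(timestamp, url):
    """Fetch the raw archived bytes of one Wayback Machine capture."""
    # The 'id_' suffix suppresses replay rewriting, returning the
    # content as originally archived.
    replay = f"https://web.archive.org/web/{timestamp}id_/{url}"
    resp = requests.get(replay, timeout=60)
    resp.raise_for_status()
    return resp.content

# e.g. with a timestamp taken from a CDX query (hypothetical values):
html = fetch_snapshot("20240610120000", "https://example.com/some-politician-page")
```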
22:23:39sec^nd (second) joins
22:24:01<thuban>yeah... if #Y ever happens ;)
22:27:56<sebs>thuban: for fast access I would download the data for myself, I'm assuming that. If I can fill some gaps at first, that would already be worth the effort for me.
22:28:15<sebs>thanks for the hint about the warrior projects
22:28:43<sebs>what is #Y?
22:30:08<thuban>sebs: i'm not sure what you mean exactly by "fill some gaps". are you suggesting that you would download websites and upload them to the internet archive yourself?
22:30:54<TheTechRobo>sebs: Basically a Warrior project where the intention is to make a distributed ArchiveBot with possibility for custom code without making an entirely new project. Unfortunately, it's currently dead.
22:31:01<thuban>#Y is https://wiki.archiveteam.org/index.php/Distributed_recursive_crawls, a hypothetical project that's been in planning stages for years
22:31:20<thuban>(ninja'd)
22:32:00<sebs>thuban: filling in some gaps means I think I will soon find some parts of the political space that are missing from the Archive, especially the mentioned local politicians. And yes, if I could do it, I would at least give it a try to operate the infra myself instead of stressing already-stressed resources.
22:33:35<sebs>TheTechRobo: my spidey senses tingled at 'custom code' ;) Having tried a bunch of things in that area: that is no easy feat, and I have an idea why something like this stalls.
22:34:13<thuban>sebs: while you're of course welcome to upload your own web archives to the internet archive, note that ia does not index third-party warcs into the wayback machine (since there's no way to guarantee their correctness).
22:34:52<thuban>warcs from archivebot (like other archiveteam projects) _do_ go into the wbm, so please don't hesitate to make suggestions there
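For completeness, self-made WARCs can still be uploaded to an Internet Archive item for preservation (just not indexed into the WBM, per the above) using the official internetarchive Python library; a sketch with a made-up identifier, file name, and metadata, assuming IA S3 keys are configured via `ia configure`:

```python
from internetarchive import upload  # pip install internetarchive

# Hypothetical item identifier, file, and metadata.
upload(
    "sebs-political-crawl-2024-06",
    files=["politicians-de-20240611.warc.gz"],
    metadata={
        "title": "German/EU politician website crawl, 2024-06-11",
        "mediatype": "web",
    },
)
```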
22:37:11<sebs>thuban: I did read about the WARC format. I have to do my homework there, as I only discovered the Archive Team wiki in the middle of the night. Thanks for the clarification regarding what ends up in the WBM or not; I was not aware
22:41:29benjins2 quits [Ping timeout: 272 seconds]
22:45:01benjins2 joins
22:55:30<sebs>Anyway: thanks for teaching me a lot and taking the time. Very much appreciated
23:08:54BlueMaxima joins
23:09:59<thuban>you're welcome!
23:11:53muklumsum quits [Ping timeout: 272 seconds]
23:12:55muklumsum joins
23:38:29muklumsum quits [Ping timeout: 272 seconds]
23:39:44muklumsum joins
23:41:19benjins2 quits [Ping timeout: 255 seconds]