00:33:42 | <nyany> | wouldn't put it past x to do something dumb, JAA |
01:29:00 | | kiryu joins |
01:29:00 | | kiryu is now authenticated as kiryu |
01:29:00 | | kiryu quits [Changing host] |
01:29:00 | | kiryu (kiryu) joins |
02:10:50 | | RealPerson leaves |
02:30:13 | | abirkill (abirkill) joins |
02:38:42 | | StarletCharlotte quits [Remote host closed the connection] |
03:14:05 | | beastbg8_ quits [Read error: Connection reset by peer] |
03:38:01 | | BPCZ quits [Ping timeout: 255 seconds] |
03:53:17 | | yarrow joins |
03:56:28 | | michaelblob quits [Read error: Connection reset by peer] |
03:57:49 | | michaelblob (michaelblob) joins |
04:01:20 | | DogsRNice joins |
04:29:11 | | beastbg8 (beastbg8) joins |
04:54:50 | | grid joins |
04:56:02 | | muklumsum_ joins |
04:56:13 | | muklumsum quits [Ping timeout: 272 seconds] |
04:57:47 | | Island quits [Read error: Connection reset by peer] |
04:58:04 | | benjins2_ quits [Read error: Connection reset by peer] |
05:35:02 | | BPCZ (BPCZ) joins |
06:05:01 | | that_lurker quits [Remote host closed the connection] |
06:07:05 | | benjins joins |
06:08:46 | | benjinsm quits [Ping timeout: 255 seconds] |
06:13:17 | | that_lurker joins |
06:13:17 | | that_lurker is now authenticated as that_lurker |
06:23:51 | | DogsRNice quits [Read error: Connection reset by peer] |
06:26:22 | | parfait quits [Client Quit] |
06:29:47 | | BlueMaxima quits [Read error: Connection reset by peer] |
07:01:16 | | Arcorann (Arcorann) joins |
07:04:43 | | grid quits [Client Quit] |
07:05:02 | | Unholy23619246453771 quits [Remote host closed the connection] |
07:06:11 | | Unholy23619246453771 (Unholy2361) joins |
07:25:18 | | michaelblob quits [Read error: Connection reset by peer] |
07:29:03 | | michaelblob (michaelblob) joins |
07:46:16 | | tmob joins |
07:58:33 | | tmob quits [Client Quit] |
07:59:43 | | yarrow is now authenticated as yarrow |
08:59:46 | | michaelblob quits [Read error: Connection reset by peer] |
09:00:06 | | Bleo1826007227196 quits [Client Quit] |
09:01:26 | | Bleo1826007227196 joins |
09:35:21 | | benjinsm joins |
09:38:03 | | benjins quits [Ping timeout: 272 seconds] |
09:46:07 | | muklumsum_ quits [Ping timeout: 255 seconds] |
09:46:17 | | muklumsum joins |
09:46:52 | | knecht4 quits [Quit: knecht420] |
09:47:18 | | knecht4 joins |
09:50:06 | | yarrow quits [Read error: Connection reset by peer] |
09:52:26 | | yarrow (yarrow) joins |
10:14:28 | | muklumsum quits [Ping timeout: 255 seconds] |
10:21:36 | | muklumsum joins |
10:28:48 | | mighty-bob joins |
10:59:30 | | BornOn420 (BornOn420) joins |
11:44:06 | | decky_e quits [Read error: Connection reset by peer] |
11:56:10 | | mighty-bob quits [Ping timeout: 255 seconds] |
12:05:42 | | nertzy joins |
12:49:22 | | benjins2 joins |
13:02:39 | | mighty-bob joins |
13:14:01 | | muklumsum quits [Ping timeout: 255 seconds] |
13:16:18 | | muklumsum joins |
13:27:19 | | Arcorann quits [Ping timeout: 272 seconds] |
13:55:57 | <pabs> | -rss/#hackernews- Noam Chomsky 'no longer able to talk' after 'medical event': https://www.independent.co.uk/arts-entertainment/books/news/noam-chomsky-health-update-tributes-b2559831.html https://news.ycombinator.com/item?id=40641361 |
14:09:24 | | mighty-bob quits [Client Quit] |
14:11:33 | | ArchivalEfforts_ joins |
14:14:53 | | loug4 joins |
14:20:04 | | katocala quits [Remote host closed the connection] |
14:28:03 | | nulldata quits [Read error: Connection reset by peer] |
14:28:08 | | nulldata (nulldata) joins |
14:31:39 | | katocala joins |
14:31:39 | | katocala is now authenticated as katocala |
15:10:05 | | ArchivalEfforts_ quits [Client Quit] |
15:10:31 | | ArchivalEfforts_ joins |
15:21:46 | | MrMcNuggets (MrMcNuggets) joins |
15:27:10 | | MrMcNuggets quits [Remote host closed the connection] |
15:27:21 | | MrMcNuggets (MrMcNuggets) joins |
15:29:27 | | sidpatchy joins |
15:31:46 | | sidpatchy quits [Remote host closed the connection] |
15:31:56 | | sidpatchy joins |
15:36:03 | | sidpatchy quits [Remote host closed the connection] |
15:36:17 | | sidpatchy joins |
15:36:59 | | sidpatchy quits [Remote host closed the connection] |
15:37:12 | | sidpatchy joins |
15:37:16 | | sidpatchy quits [Remote host closed the connection] |
15:37:31 | | sidpatchy joins |
15:38:12 | | sidpatchy quits [Remote host closed the connection] |
15:38:27 | | sidpatchy joins |
15:39:15 | | sidpatchy quits [Remote host closed the connection] |
15:51:37 | | Dango360 quits [Quit: Leaving] |
15:55:49 | | Dango360 (Dango360) joins |
15:59:10 | | icedice quits [Client Quit] |
15:59:29 | | MrMcNuggets quits [Client Quit] |
16:11:13 | | eightthree quits [Remote host closed the connection] |
16:12:09 | | eightthree joins |
16:12:30 | | eightthree quits [Remote host closed the connection] |
16:13:44 | | eightthree joins |
16:19:07 | | Sidpatchy joins |
16:30:45 | | eightthree quits [Remote host closed the connection] |
16:32:16 | | Notrealname1234 (Notrealname1234) joins |
16:35:39 | | eightthree joins |
17:00:15 | | Notrealname1234 quits [Client Quit] |
17:12:18 | | nicolas17 joins |
17:12:46 | <nicolas17> | damn, I forgot *how* busy things get in apple-hacking-land after WWDC |
17:14:04 | <nicolas17> | there's 5 files from yesterday on https://data.nicolas17.xyz/samsung-grab/ |
17:14:09 | <nicolas17> | and I'll be adding 6 more after lunch |
17:21:44 | | Sidpatchy is now authenticated as Sidpatchy |
17:35:25 | | fireonlive is now known as it |
17:36:11 | | nyany is now known as do |
17:40:43 | | do is now known as nyany |
17:45:02 | | it is now known as fireonlive |
18:01:54 | | Island joins |
18:05:46 | | ShadowJonathan quits [Read error: Connection reset by peer] |
18:05:50 | | ShadowJonathan (ShadowJonathan) joins |
18:05:58 | | VonGuard quits [Read error: Connection reset by peer] |
18:06:00 | | VonGuard joins |
18:06:02 | | cptcobalt quits [Read error: Connection reset by peer] |
18:06:06 | | cptcobalt joins |
18:06:34 | | loopy quits [Read error: Connection reset by peer] |
18:06:35 | | devsnek quits [Read error: Connection reset by peer] |
18:06:36 | | loopy joins |
18:06:40 | | devsnek (devsnek) joins |
18:37:01 | | muklumsum quits [Ping timeout: 272 seconds] |
18:38:02 | | muklumsum joins |
19:12:26 | | Matthww quits [Quit: The Lounge - https://thelounge.chat] |
19:18:02 | | Matthww joins |
19:24:38 | | michaelblob (michaelblob) joins |
19:32:07 | | Notrealname1234 (Notrealname1234) joins |
19:49:09 | | Notrealname1234 quits [Client Quit] |
19:56:46 | | muklumsum quits [Ping timeout: 255 seconds] |
20:00:14 | | muklumsum joins |
20:00:39 | <that_lurker> | nicolas17: https://img.kuhaon.fun/u/aIZKzJ.gif |
20:13:50 | | Notrealname1234 (Notrealname1234) joins |
20:27:10 | | wyatt8750 joins |
20:28:29 | | wyatt8740 quits [Ping timeout: 272 seconds] |
20:48:10 | | Notrealname1234 quits [Client Quit] |
21:06:03 | | DogsRNice joins |
21:33:13 | | Notrealname1234 (Notrealname1234) joins |
21:33:54 | <eggdrop> | [remind] JAA: Remove {[[PANDA]]} unless someone spoke up. |
21:35:10 | <@JAA> | Oh yeah, the random curly brackets. |
21:36:04 | | Notrealname1234 quits [Client Quit] |
21:43:23 | <h2ibot> | JustAnotherArchivist deleted PANDA (This does not deserve a wiki page here;…) |
21:47:46 | | PredatorIWD quits [Read error: Connection reset by peer] |
21:49:26 | | AK quits [Client Quit] |
21:50:12 | | AK (AK) joins |
21:53:14 | | PredatorIWD joins |
22:05:49 | | sebs joins |
22:06:04 | <sebs> | Elo! I do have a personal data science project, that might generate a unique set of urls and I may have the means to run the machines to collect all the content myself. Would this be the right chat to ask questions and maybe get some (emotional) support? |
22:07:19 | <sebs> | data in question is content generated by politicians and political institutions. data is generated by a personal open (?) data sideproject that got 'out of hand' ;) |
22:07:45 | <imer> | sebs: sounds like you're in the right place, yes |
22:07:52 | <sebs> | <3 |
22:08:46 | <sebs> | in a nutshell what I (partly intend to) do: |
22:08:56 | <imer> | sounds like we might even be interested in archiving that data if it's publicly available (but that's not my call to make, so take that with a bucket of salt) |
22:09:11 | <sebs> | this is why I am here |
22:10:14 | <sebs> | it's only politics (official politician pages, official bodies etc). I am trying to map the network of (german and euro) politics (who is linking who etc) and as a side product I produce the big url set |
22:11:02 | <sebs> | at some point i need the data of the pages, but that prompts the question about content changes. Writing this on a napkin leads to 'archive does most of that' |
22:12:01 | <sebs> | plan is to collect about 200 million content urls in about 5 years |
22:12:58 | <sebs> | question: would such a project fill gaps in the content of the wayback machine's data, or is it there already anyway? |
22:14:41 | <sebs> | question 2: can i run my own 'warrior' to archive the stuff myself and help out the project that way? It strikes me this is a way better method to 're-index' the data and look for changes than writing this myself (so many hard parts, so lazy sebs) |
22:16:41 | <that_lurker> | Do you have a sample of the urls? Are the urls on a single website or on multiple? Most likely this could be done through archivebot, but there is a possibility the url project could fit this. |
22:16:52 | <imer> | I don't think we have tooling to only rearchive pages that have changed, that might have to happen externally |
22:17:59 | | sec^nd quits [Ping timeout: 250 seconds] |
22:20:07 | <sebs> | that_lurker: how big do you want the sample to be? the urls are multiple urls on multiple websites. |
22:20:47 | <sebs> | imer: did not know that. Thought this was kind of the purpose of this project. Never thought about how the timeline feature is built. |
22:22:07 | <imer> | it's just separate copies that happened to have been archived again (for reasons), we probably don't want to re-archive all those millions of urls constantly |
22:22:21 | <that_lurker> | if they are on different websites that could be something datechnoman would maybe like to take a look at :-) |
22:22:49 | <thuban> | sebs: wrt your first question, it depends. government bodies tend to be well-covered; individual politicians, especially local politicians, less so (and we do a lot of that through #archivebot, so your suggestions would be welcome there) |
22:23:06 | <thuban> | wrt your second question, warriors only work on warrior projects, so you couldn't use a warrior to archive arbitrary websites; however, you could suggest arbitrary websites for archival in #archivebot (or urls in #//, with some caveats). |
22:23:38 | <TheTechRobo> | Maybe this could even be a #Y thing? |
22:23:38 | <thuban> | that said, if you're doing serious data processing, that might not actually have the advantages you'd think it does, because (a) neither archiveteam (as imer said) nor the internet archive do change detection, and (b) you would have to get the data back out of the internet archive, which is slow |
22:23:39 | | sec^nd (second) joins |
22:24:01 | <thuban> | yeah... if #Y ever happens ;) |
22:27:56 | <sebs> | thuban: for fast access i would dl the data for myself. I am assuming that. if I can fill some gaps at first, that would be worth the effort for me already. |
22:28:15 | <sebs> | thanks for the hint with the warrior projects |
22:28:43 | <sebs> | what is #Y |
22:30:08 | <thuban> | sebs: i'm not sure what you mean exactly by "fill some gaps". are you suggesting that you would download websites and upload them to the internet archive yourself? |
22:30:54 | <TheTechRobo> | sebs: Basically a Warrior project where the intention is to make a distributed ArchiveBot with possibility for custom code without making an entirely new project. Unfortunately, it's currently dead. |
22:31:01 | <thuban> | #Y is https://wiki.archiveteam.org/index.php/Distributed_recursive_crawls, a hypothetical project that's been in planning stages for years |
22:31:20 | <thuban> | (ninja'd) |
22:32:00 | <sebs> | thuban: filling in some gaps means I do think I will soon find some parts of the political space that are missing in archive. Especially the mentioned local politicians. And yes, if I could do it, i would at least give it a try to operate the infra myself instead of stressing already stressed resources. |
22:33:35 | <sebs> | TheTechRobo: My spidey senses tingled on custom code ;) Having tried a bunch of things in that area: that is no easy feat and I do have an idea why smth like this stalls. |
22:34:13 | <thuban> | sebs: while you're of course welcome to upload your own web archives to the internet archive, note that ia does not index third-party warcs into the wayback machine (since there's no way to guarantee their correctness). |
22:34:52 | <thuban> | warcs from archivebot (like other archiveteam projects) _do_ go into the wbm, so please don't hesitate to make suggestions there |
22:37:11 | <sebs> | thuban: I did read about the warc format. I do have to do my homework there as I only discovered the archive team wiki in the middle of the night. Thanks for the clarification regarding being in the wbm or not. was not aware |
22:41:29 | | benjins2 quits [Ping timeout: 272 seconds] |
22:45:01 | | benjins2 joins |
22:55:30 | <sebs> | Anyway: thanks for teaching me a lot and taking the time. very much appreciated |
23:08:54 | | BlueMaxima joins |
23:09:59 | <thuban> | you're welcome! |
23:11:53 | | muklumsum quits [Ping timeout: 272 seconds] |
23:12:55 | | muklumsum joins |
23:38:29 | | muklumsum quits [Ping timeout: 272 seconds] |
23:39:44 | | muklumsum joins |
23:41:19 | | benjins2 quits [Ping timeout: 255 seconds] |