00:33:10 | | sonick (sonick) joins |
00:37:38 | | qwertyasdfuiopghjkl quits [Remote host closed the connection] |
01:15:18 | | pabs quits [Client Quit] |
01:18:33 | | pabs (pabs) joins |
04:11:23 | | line quits [Remote host closed the connection] |
04:48:38 | | line joins |
06:28:42 | | Barto quits [Quit: WeeChat 4.2.1] |
06:30:52 | | Barto (Barto) joins |
07:27:48 | | Arcorann (Arcorann) joins |
08:07:37 | | qwertyasdfuiopghjkl (qwertyasdfuiopghjkl) joins |
08:53:01 | | clarkk joins |
09:22:15 | <clarkk> | Out of interest, how is it decided which pages to archive and when to take the snapshots? Is every snapshot explicitly initiated by a user manually calling the Save it Now endpoint, or is there some other process going on? |
09:26:44 | <pabs> | if you press the "about this snapshot" link at the top right it gives you info on how the snapshot was created |
09:27:24 | <pabs> | some are SPN (Save Page Now), some are ArchiveBot or other ArchiveTeam stuff, some are Alexa crawls, some are IA's own crawls, some are other orgs' crawls |
10:22:05 | <clarkk> | pabs, You mean, ArchiveBot spiders the whole internet, like Google does? |
10:42:45 | <pabs> | no, AB does individual websites: https://wiki.archiveteam.org/index.php/ArchiveBot |
10:43:02 | <pabs> | or lists of individual pages and the resources they use |
10:43:24 | <pabs> | Alexa and some of the other ones are more like Google I think |
10:44:10 | <pabs> | (and ArchiveBot is not an archive.org initiative, but an ArchiveTeam one, separate organisation) |
10:44:14 | <pabs> | clarkk: ^ |
10:46:36 | <clarkk> | pabs, when you say Alexa, I presume you don't mean the Amazon thing? |
10:47:15 | <clarkk> | Where can I read more about that, and also IA's own crawls? |
12:27:54 | | clarkk quits [Client Quit] |
12:35:39 | | Arcorann quits [Ping timeout: 272 seconds] |
13:28:03 | | PredatorIWD quits [Quit: Leaving] |
13:28:34 | | sknebel quits [Remote host closed the connection] |
13:29:29 | | @AlsoJAA quits [Ping timeout: 272 seconds] |
13:30:28 | | sknebel (sknebel) joins |
13:35:50 | | AlsoJAA (JAA) joins |
13:35:50 | | @ChanServ sets mode: +o AlsoJAA |
13:48:20 | | sknebel quits [Client Quit] |
13:49:20 | | @AlsoJAA quits [Ping timeout: 240 seconds] |
13:50:45 | | sknebel (sknebel) joins |
14:05:21 | | AlsoJAA (JAA) joins |
14:05:21 | | @ChanServ sets mode: +o AlsoJAA |
16:17:07 | <@JAA> | !tell clarkk https://en.wikipedia.org/wiki/Alexa_Internet?useskin=vector was founded by the same guy as IA and later acquired by Amazon, but there's no relation to the spyware devices you're probably thinking of. There's very little documentation about IA's crawling; your best bet is probably their blog. |
16:17:08 | <eggdrop> | [tell] ok, I'll tell clarkk when they join next |
19:22:27 | <TheTechRobo> | I guesà |
19:22:54 | <TheTechRobo> | (please disregard that message) |
20:38:07 | | SootBector quits [Remote host closed the connection] |
20:38:35 | | SootBector (SootBector) joins |
20:53:45 | | that_lurker quits [Quit: I am most likely running a system update] |
20:56:32 | | that_lurker (that_lurker) joins |