Goals and means

June 30, 2008

Followup to: Context-specific actions.

At each step, the alternative actions available can lead to different consequences. The choice of action (a state of mind) translates into a choice of properties of the future environment. If every choice is made according to the same preference over properties of the future environment, the cumulative pressure of all choices driving the environment in the same direction becomes very strong, even if each individual action has very little influence. The trick is to know the right timing and the right direction, to nudge the environment according to its own rhythm, and world-changing results can follow.

The direction in which the agent steers the environment is called the agent’s goal. The goal can be specified in different ways, notably in terms of a utility function that ranks each possible state of the environment with a numeric value. From this perspective, an intelligent agent is an optimization process that drives the environment towards states of higher utility, according to the specific utility function that governs the process. Note that the goal (a ranking of possible states of the environment) is in general arbitrary, and it is erroneous to apply characteristics of human goals to goals-in-general. Since different goals can favor completely different states of the environment, for every “obvious” preference there is, theoretically, a mind that prefers otherwise (and optimizes in that direction).

The agent’s goal is the sole focus of all of its functionality. The particular way in which the agent is implemented, the ways in which it seeks out information about the environment, or a particular ritual of rationality that it follows: all are instrumental and are there only to facilitate the optimization of the environment according to the goal. (This doesn’t mean that the goal is external to the implementation or that the implementation is inherently unimportant. The goal may include clauses about the environment containing an implementation with particular characteristics, and the goal is embodied by a particular implementation of the agent.)

Just as perception is only necessary to find the actions available in the current context, actions only need to be constructed according to the agent’s goal. If action 1 is expected to be ranked lower than action 2, there is no point in considering action 1. Thus, the goal specifies which future states of the environment need to be optimized for, actions are selected to lead to the target states of the environment, and perception allows the agent to find out which actions lead to which states of the environment.
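The division of labor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the action names, the outcome states, and both utility tables are hypothetical, and perception is reduced to a ready-made table of predicted outcomes.

```python
from typing import Callable, Dict

def choose_action(predicted_outcome: Dict[str, str],
                  utility: Callable[[str], float]) -> str:
    """Return the available action whose predicted outcome ranks highest."""
    return max(predicted_outcome, key=lambda action: utility(predicted_outcome[action]))

# Hypothetical result of perception: which action leads to which state.
predicted_outcome = {
    "wait": "cup on table",
    "reach": "cup in hand",
    "push": "cup on floor",
}

# Two arbitrary goals over the same states; neither one is privileged.
likes_tea = {"cup on table": 0.0, "cup in hand": 1.0, "cup on floor": -1.0}
likes_mess = {"cup on table": 0.0, "cup in hand": -1.0, "cup on floor": 1.0}

tea_choice = choose_action(predicted_outcome, likes_tea.__getitem__)
mess_choice = choose_action(predicted_outcome, likes_mess.__getitem__)
print(tea_choice, mess_choice)
```

The same perception and the same selection rule produce opposite behavior once the utility table is swapped, which is the sense in which the goal alone fixes the direction of optimization.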

Goal, perception and action are not necessarily explicit in the implementation. Basically, the goal plays the same role as perception, by limiting the available actions. Perception limits the actions according to the state of the environment, and the goal limits the actions according to the agent’s preferences. Since action itself plays a role similar to perception, the distinctions may disappear altogether. From this perspective, the agent operates only through high-level perception that is initiated by sensory input, biased by the agent’s goal and translated into low-level action output. This picture doesn’t help in terms of clear semantics of the agent’s operation, but it is useful for seeing how the fundamental building blocks of the agent may blur into each other in some implementations, and also for identifying these building blocks in an implementation that doesn’t explicitly contain them (such as the human brain).


Context-specific actions

June 27, 2008

Followup to: Action as determined perception.

Some decisions are reliably translated into actions on environment, independently of the context: if you decide to move a finger, and you have no relevant health problems, it will move as the result. The structure of the environment and its relation to the agent establish a relationship between a decision C and outcomes D, such that required outcome often comes as a result of the decision, with high probability P(D|C).

In other cases, notably when the action is more complex, reliability of action depends on the context in which the action is chosen, and in many contexts a given decision won’t reliably lead to the required outcome. If you are going to reach out and take a cup of tea from the table, the mental state of deciding to perform this action will be a good indicator of successfully taking the cup only if the cup was on the table in the first place. Information about the environment obtained through perception limits its possible states, and so the decision for action becomes reliable where it wasn’t a priori. Where the probability of outcome occurring by itself P(D) was very low, as was the success of unconditionally taken action P(D|C), the probability of the success of action taken in the right context B might be high again, P(D|C,B). The outcome D is achieved if decision C is drawn in the context B. Note that the states of the mind are again on the same side, as conditions for successful outcome in the environment.

The result of perception B is useful for determining the reliability of action on the environment, when the state of the environment that it identifies, together with the action performed by the agent, leads to the required outcome. Both state A that is detected by perception B, and decision C, determine the outcome D. If the probability of the outcome given the state of the environment and action, P(D|C,A), is high, the action is reliable. But the agent doesn’t have the direct access to state of the environment A, so it must construct a perception B that taps into the environment and informs the agent when the context is right. If B is a reliable indicator, P(A|B) is high, and so is P(D|C,B). It is not always so, as the outcome D might fail to result from C and A in exactly the circumstances when A is detectable by B, but more often it works, or the error can be fixed afterwards.
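A small numeric sketch of the cup example may make the relation between P(D|C), P(A|B) and P(D|C,B) concrete. All numbers are hypothetical, and the calculation assumes that, once the state A and the decision C are fixed, the outcome D does not further depend on B, and that the decision itself does not change how likely A is.

```python
def p_outcome(p_state: float, p_d_given_c_a: float, p_d_given_c_not_a: float) -> float:
    """Probability of outcome D after decision C, averaged over whether A holds.

    p_state is the probability that A holds: P(A) without perception,
    or P(A|B) once the indicator B has fired.
    """
    return p_d_given_c_a * p_state + p_d_given_c_not_a * (1.0 - p_state)

# Hypothetical numbers for the cup example (not taken from the text).
P_A = 0.05               # the cup happens to be on the table, a priori
P_A_GIVEN_B = 0.98       # the cup is on the table, given that you see it there
P_D_GIVEN_C_A = 0.95     # reaching succeeds when the cup is there
P_D_GIVEN_C_NOT_A = 0.0  # reaching cannot succeed when it is not

unconditional = p_outcome(P_A, P_D_GIVEN_C_A, P_D_GIVEN_C_NOT_A)       # P(D|C)
in_context = p_outcome(P_A_GIVEN_B, P_D_GIVEN_C_A, P_D_GIVEN_C_NOT_A)  # P(D|C,B)

print(f"P(D|C)   = {unconditional:.4f}")
print(f"P(D|C,B) = {in_context:.4f}")
```

With these assumed numbers the same decision goes from almost always failing to almost always succeeding, purely because perception has narrowed down the state of the environment.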

The power of perception is in finding which actions are reliable in the current context. Complex actions and creative plans don’t work without knowledge of the environment. Some properties of the environment are rather stable, and prior knowledge about them can be built into the chain of events that produces the outcome from the decision, thus making the unconditional action reliable. Other properties of the environment are not known in advance, and need to be taken into account on the fly.


Action as determined perception

June 26, 2008

Followup to: The semantics of beliefs.

Let’s now turn to the semantics of action. When considering action, it is tempting to see it as a kind of “anti-perception”: where perception is directed from environment towards the intelligent agent, action has an opposite direction. But, paradoxically, it turns out that the semantics of action looks just like that of perception, directed in the same way, from environment towards the agent.

The semantics of belief B in the state of the environment A comes from its ability to accurately identify situations where A is present: event A in the environment must have high probability, given that event B in the knowledge representation occurs, P(A|B). This requirement comes from the inability of the agent to directly perceive events in the world, so that the things it can work with, its mental states, have to be accurate about the things they represent.

The same situation holds for action. If the agent needs to achieve the state of the environment D, it can’t directly affect it: the chain of events that leads to event D needs to start inside its mind, from event C in the knowledge representation. Thus, the agent can’t directly choose D, it must choose C instead. The decision C is reliable if event D will very likely occur, given that C is chosen. Again, we need to ensure high probability of the event D in the environment, given the state of the mind C, P(D|C), even though the event D is determined by the event C of the decision.

Thus, both action and perception work through having high confidence P(E|R) in the state of the environment E (A and D), given the representation in the agent R (B and C). In the case of perception, state of the environment determines the mental state of the agent, whereas in the case of action, mental state of the agent determines the state of the environment.

What is the functional difference between action and perception, then? The choice of mental states is constrained differently by the direction of causality between events in the environment and events in the mind. Mental states are constrained by the required relation with the environment, high P(E|R). But in the case of perception, the environment can’t be affected by the choice of the current mental state, so only the B side can be varied, while in the case of action, both C and D can. This gives a wider range of allowed actions C than beliefs B. Still, the repertoire of available actions is constrained by the requirement of reliably leading to given consequences in the environment.

Thus, in the considered class of intelligent agents, the minimal requirements for the semantics of their knowledge representation lead to the following picture. Internal representation is constrained to accurately reflect some of the properties of the environment, even when the properties of the environment are determined by this representation. The choice of state of the representation may completely change the environment, but in the end, it must turn out to be an accurate depiction of the relevant properties of the outcome, both before and after the decision.


Perception and reactive beliefs

June 25, 2008

Followup to: Belief is a state of the brain, The semantics of beliefs.

When you perceive something, your brain transitions into a particular state that identifies your experience. When you see a cup on the table, a “cup on the table experience” event occurs. This event is an indicator of a different event, that of the cup really being on the table.

The brain can be regarded as a system that implements multiple feature detectors, each of which can be either active or inactive. (I don’t restrict feature detectors to any particular mechanism: feature detectors for such high-level concepts as “elegance” require quite a lot of machinery, and feature detectors for compound concepts such as “a cup of tea is on the kitchen table near the window” require many elements to represent. I also don’t consider all mechanisms of the brain to be feature detectors.) Activity of a feature detector produces an event of detecting the corresponding feature in the current context. Events of activation of some of the feature detectors are good indicators of events in the environment.

Feature detectors act as reactive templates for beliefs. Each feature detector is a potential belief, but it produces an actual belief only when it activates. The pool of available feature detectors defines which beliefs are possible, and correspondingly which events in the environment can be detected. When given feature detector B is inactive, probability of event A, for which B is a good indicator, can be estimated as P(A), whereas activity of B changes the estimate to P(A|B). Strong feature detectors provide overwhelming amount of evidence, so that rare or extremely unlikely event A (low P(A)) is turned into a near-certain event (very high P(A|B)). Note that P(A) and P(A|B) are external estimates of how well a feature detector B works to detect A. These values do not need to be present inside the machinery that implements the feature detector.
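The point that P(A) and P(A|B) are external estimates can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the 1% prior, the Gaussian sensor model, and the threshold detector. The detector itself contains no probabilities at all; an outside observer obtains them by counting.

```python
import random

def detector(signal: float) -> bool:
    """Hypothetical feature detector: fires when the signal crosses a threshold."""
    return signal > 4.0

def estimate(trials: int = 100_000, seed: int = 0):
    """Estimate P(A) and P(A|B) from outside, by counting over sampled situations."""
    rng = random.Random(seed)
    a_count = b_count = ab_count = 0
    for _ in range(trials):
        a = rng.random() < 0.01                     # event A is rare (assumed prior)
        signal = rng.gauss(8.0 if a else 0.0, 1.0)  # assumed sensor model
        b = detector(signal)
        a_count += a
        b_count += b
        ab_count += a and b
    return a_count / trials, ab_count / b_count

p_a, p_a_given_b = estimate()
print(f"P(A) is roughly {p_a:.3f}, P(A|B) is roughly {p_a_given_b:.3f}")
```

Under these assumptions the rare event A becomes close to certain whenever the detector is active, even though nothing inside `detector` knows either number.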

The operation of a mind can thus be regarded on two levels. On the first level, a pool of feature detectors is implemented, which structures the perception of the environment. On the second level, these feature detectors are used for perception, and their activity represents the current situation. The properties of the first level determine the performance of the second level. Learning consists in optimizing the feature detectors to improve the reactive performance of the second level. The semantics of representation comes from the pool of feature detectors, which bind the features to the events in the environment by being able to indicate them, and the current activity of feature detectors indicates and thus represents the properties of the current state of the environment.


Odds, evidence, and an intuitive form of Bayes’ theorem

June 23, 2008

Bayes’ theorem establishes how to update probabilities when given additional evidence. The standard form of Bayes’ theorem is as follows:

\displaystyle P(A|B)=\frac{P(A)P(B|A)}{P(B|A)P(A)+P(B|\neg A)P(\neg A)}

It shows how to update prior probability P(A) of event A, when given event B as evidence. Updated probability, P(A|B), reads “probability of A given B”. Conditional probability is defined as P(A|B)=\frac{P(A,B)}{P(B)} where P(A,B) is probability of both events A and B happening at the same time.

To get the updated probability P(A|B), we also need to know how strong B is as evidence for A. This is specified by conditional probabilities P(B|A) and P(B|\neg A), that is probability of obtaining evidence B when A is present (correct indication), and probability of receiving evidence B even though A is not present (wrong indication).

Bayes’ theorem follows straightforwardly from the definition of conditional probability, and has a sufficiently simple form, but obtaining an intuitive sense of it is rather tricky. The intuitive understanding is useful for discussions of less technical situations, which can still be informed by technical knowledge from probability theory. The formula has all these probabilities mixed up, so the structure of probability updating is not intuitively clear.

One thing that is not explicitly expressed is that the result doesn’t depend on individual values of P(B|A) and P(B|\neg A), but only on their ratio, \frac{P(B|A)}{P(B|\neg A)}. If we rewrite the formula to express this fact, and rearrange other terms a little, we can get the following form:

\displaystyle \frac{P(A|B)}{P(\neg A|B)}=\frac{P(A)}{P(\neg A)}\cdot\frac{P(B|A)}{P(B|\neg A)}
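The rearrangement takes two steps, assuming P(B), P(\neg A) and P(B|\neg A) are all nonzero. First, write the denominator of the standard form as P(B)=P(B|A)P(A)+P(B|\neg A)P(\neg A) and apply the theorem to both A and \neg A:

\displaystyle P(A|B)=\frac{P(A)P(B|A)}{P(B)},\qquad P(\neg A|B)=\frac{P(\neg A)P(B|\neg A)}{P(B)}

Dividing the first equation by the second cancels the common factor P(B):

\displaystyle \frac{P(A|B)}{P(\neg A|B)}=\frac{P(A)P(B|A)}{P(\neg A)P(B|\neg A)}=\frac{P(A)}{P(\neg A)}\cdot\frac{P(B|A)}{P(B|\neg A)}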

The value \mathrm{Odds}(X)=\frac{P(X)}{1-P(X)} is called odds of event X. Odds is the ratio of probability of event happening to probability of it not happening, P(X): P(\neg X). Also, the strength of B as evidence for A is called Bayes factor, K(B|A)=\frac{P(B|A)}{P(B|\neg A)}. In these terms, Bayes’ theorem can be rewritten in the following simple form:

\displaystyle \mathrm{Odds}(A|B)=\mathrm{Odds}(A)\cdot K(B|A)

Both odds and Bayes factor show the ratios between alternative outcomes: presence to absence of event for odds, and correct indication to wrong indication for Bayes factor. In this form of Bayes’ theorem, it is intuitively clear how the odds for the presence of event A given an event B are calculated: we start with previously known odds for event A, and multiply it by odds of B being a correct indication of A. Observing B only shifts the odds of our event A, depending on how good an indicator B is for A.

For example, if we have a test B for illness A, and probability of testing positive for people with the illness, P(B|A), is 90%, while the probability of testing positive for people without illness, P(B|\neg A), is 1%, and the portion of population that has the illness, P(A), is 0.1%, then what is the probability that a person that tested positive has the illness, P(A|B)? It’s nowhere near 90%: the prior probability of having illness, 0.1%, was shifted by the evidence, and even though evidence is rather strong, prior probability is still too low, so the posterior probability is only 8.3%.
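The 8.3% figure can be checked directly against the standard form of the theorem. The function and variable names below are only illustrative; the three input numbers are the ones from the example.

```python
def posterior(prior: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """P(A|B) from the standard form of Bayes' theorem."""
    numerator = prior * p_b_given_a
    evidence = numerator + (1.0 - prior) * p_b_given_not_a  # this is P(B)
    return numerator / evidence

p = posterior(prior=0.001, p_b_given_a=0.90, p_b_given_not_a=0.01)
print(f"P(A|B) = {p:.1%}")  # prints "P(A|B) = 8.3%"
```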

Let’s see how the same problem looks using the form of Bayes’ theorem based on odds. Prior odds of having the illness are \frac{0.1\%}{100\%-0.1\%}\approx 1:1000, Bayes factor is \frac{90\%}{1\%}=90:1, so the posterior odds are

\displaystyle \mathrm{Odds}(A|B)\approx(1:1000)\cdot(90:1)=9:100

So the odds are about 1:11, against the conclusion that the patient has the illness. Converting from odds back to probabilities, using the formula P(X)=\frac{\mathrm{Odds}(X)}{1+\mathrm{Odds}(X)}, we get the result that P(A|B) is about 8.3%. Note that we don’t always need to convert back to probabilities.
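The same calculation can be carried out with exact fractions, which shows how little the rounding of the prior odds to 1:1000 matters. This is a sketch using Python’s standard `fractions` module; the helper names are illustrative.

```python
from fractions import Fraction

def odds(p: Fraction) -> Fraction:
    """Odds(X) = P(X) / (1 - P(X))."""
    return p / (1 - p)

def probability(o: Fraction) -> Fraction:
    """P(X) = Odds(X) / (1 + Odds(X))."""
    return o / (1 + o)

prior_odds = odds(Fraction(1, 1000))                  # exactly 1:999
bayes_factor = Fraction(90, 100) / Fraction(1, 100)   # exactly 90:1
posterior_odds = prior_odds * bayes_factor            # 90:999, that is 10:111
p = probability(posterior_odds)                       # 10/121

print(posterior_odds, p, f"{float(p):.1%}")
```

The exact posterior odds of 10:111 are close to the rounded 9:100 used above, and both convert to about 8.3%.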

Odds range in (0,+\infty), with value of 1 showing even odds, so that “positive” odds range in (1,+\infty), while “negative” odds range in (0,1). A more symmetric range, and even simpler form of Bayes’ theorem, can be obtained by taking the logarithm of odds and Bayes factor. The logarithm of odds is called log odds or logit. The logarithm of Bayes factor is called weight of evidence. It is useful to take base 2 logarithms, so that the result can be measured in bits. These are not bits on a hard drive, but a mathematician’s bits, so the value in these bits can be fractional and even negative. Taking the logarithm of both sides of the odds-based Bayes’ theorem, we get this rule:

\displaystyle \log_2(\mathrm{Odds}(A|B))=\log_2(\mathrm{Odds}(A))+\log_2(K(B|A))

In this form, the probability of an event is represented by evidence about the event, measured in bits. Evidence ranges in (-\infty,+\infty), and probability of 50%, or odds of 1:1, corresponds to evidence of 0. Positive evidence corresponds to probability above 50% or odds above 1, and so on. Evidence provided by the indicator, \log_2(K(B|A)), is added to previously known evidence about the event. This also roughly shows how to chain Bayes’ rule: you just add up all the evidence, each piece of evidence weighing either positively or negatively, and the resulting sum of all evidence shows the conclusion (although one should be very careful not to add the same piece of evidence multiple times, explicitly or implicitly through dependent events).

In conclusion, let’s see how our example works with log odds. Prior evidence for A is \log_2(1:1000)\approx -10, the positive test gives additional evidence \log_2(90:1)\approx 6.5, so the sum of evidence is -10+6.5=-3.5. This calculation immediately shows that the outcome A is unlikely even given B. The odds are given by 1:2^{3.5}\approx 1:11.

Also note that the negative result of the test would have given a different amount of evidence, namely

\displaystyle \log_2(K(\neg B|A))=\log_2\left(\frac{P(\neg B|A)}{P(\neg B|\neg A)}\right)=\log_2\left(\frac{100\%-90\%}{100\%-1\%}\right)=\log_2(10:99)\approx -3.3

so that the total evidence for A after receiving the negative test would be -10-3.3=-13.3.
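Both log-odds calculations can be reproduced in a few lines. This is a sketch with illustrative helper names; it uses the exact prior odds 1:999 rather than the rounded 1:1000, so the sums differ slightly from the rounded figures in the text.

```python
import math

def bits(ratio: float) -> float:
    """Evidence in bits: base-2 logarithm of odds or of a Bayes factor."""
    return math.log2(ratio)

def probability_from_bits(evidence: float) -> float:
    """Convert log odds (in bits) back to a probability."""
    o = 2.0 ** evidence
    return o / (1.0 + o)

prior = bits(0.001 / 0.999)   # about -10 bits
positive = bits(0.90 / 0.01)  # about +6.5 bits for a positive test
negative = bits(0.10 / 0.99)  # about -3.3 bits for a negative test

after_positive = prior + positive
after_negative = prior + negative

print(f"after positive test: {after_positive:.2f} bits, "
      f"P = {probability_from_bits(after_positive):.3f}")
print(f"after negative test: {after_negative:.2f} bits, "
      f"P = {probability_from_bits(after_negative):.6f}")
```

Updating is a single addition in each case, and the conversion back to probability is needed only at the end, if at all.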


The semantics of beliefs

June 22, 2008

Followup to: Belief is a state of the brain.

What does it mean for a belief to be a belief about particular property of the world? When you believe that there is a cup on the table, what connects the belief in your brain to the cup on the table? When is the belief correct?

Let’s consider a room, in which there might be (or not) a cup on the table, and you, having (or not) a belief in the cup being on the table. Both conditions are the facts about the state of the room: in the first case, we consider a fact (event) of cup being located on the table, and in the second case, your brain being in a state where you have a belief in the cup being located on the table.

The event of you having a belief is an indicator of the event of the cup being on the table. The event of you having this belief is strong evidence about the event it refers to. Since you can’t directly reason using the cups of tea and other real-world objects inside your head, and only reason using the beliefs, the beliefs must be very good indicators of the state of the world. If you have a belief in a cup being on the table, you reason as if the cup is really on the table. But you do it only using the event of you having the belief, not directly using the event of the cup really being there.

What makes an event B a good indicator of event A? To be a good indicator, event B must be able to distinguish the cases where event A happened and the cases where it didn’t. Using the notation of probability theory, probability of event B happening, given that event A did happen, P(B|A), should be greater than probability of event B happening when A didn’t happen, P(B|\neg A). If this is the case, the observation of event B is evidence that increases the belief in event A (although not necessarily up to near-certainty, which depends on the relation between the prior belief in A and the evidence \frac{P(B|A)}{P(B|\neg A)}).

The correctness of a belief comes from the probability of the presence of the referred event, given that you have the belief, P(A|B). If this probability is high, you can safely use the belief inside your head in place of the real event out in the world. In the next post, I’ll show how to use weight of evidence with log odds in an alternative notation for Bayes’ theorem to provide a more intuitive understanding of how the evidence changes the levels of belief.


Belief is a state of the brain

June 21, 2008

Let’s consider a simple fact: your every belief is a state of your brain. The fact that you believe in something is a fact about your brain, your belief that you have a belief is also a fact about your brain, and so on.

The beliefs in the brain are continuously updated by the influence of the environment, through the senses, but also by the processes happening within the brain. The beliefs are there to cause other beliefs and the actions.

The brain is a complicated system, consisting of many interacting parts. Parts of the brain observe the activity in other parts, and perform actions on yet other parts. When “you” think about something, “you” are performing internal actions on your beliefs.

When you have any kind of experience, it is a fact about the state of your brain: you have a belief that you are having this experience. When you see an object, you can trace to a certain degree the chain of events caused by the presence of that object, outside and then inside your body. The experience of seeing an object tells you that ultimately the chain of events transitioned your brain into a state where it reports this experience to other parts of itself, so that overall behavior, internal (thoughts and imagery that you experience) and external, can manifest properties of this experience.

In other cases, you can’t quite trace the processes leading to the formation of beliefs. But still, when the brain claims to have other kinds of experience, it means that parts of the brain talk to each other in a certain way, claiming that experience is there, and that is all there is to it. Such kinds of experience as consciousness and qualia, pleasure and pain, glorified by many people in their subjective mystery, are examples of processes happening in the brain, creating beliefs about their properties and influencing your behavior.

To understand the experience, one needs to track down the mechanisms by which the beliefs that constitute the experience are formed. Establishing the basic fact that the experience is the output of the algorithms implemented in a brain is a first step on that path.


Introduction

June 21, 2008

Hello, my name is Vladimir Nesov, I’m a PhD student in Computer Science, and I’m interested in understanding the fundamental mechanisms of intelligence. In this blog, I’m going to write about rationality, futurism, and artificial intelligence.