Stability of event semantics

September 22, 2008

Followup to: Improving event detectors.

States of event detectors indicate events in the environment. When an event detector changes, so do the events it indicates. If a detector changes too much, its states can start indicating something different, or even something nonexistent, becoming misleading or useless.

Events in the mind rely on each other. The semantics of most events depends on the semantics of other events. Only a few events are states of input/output; the others are computed through many levels of intermediate events. These intermediate events both depend on the events that define them and provide the foundation for other events that are defined in terms of them.

When an event detector changes, the result is not just a change to that detector, but also changes to all the other detectors that depend on it. And since in the environment everything is connected to almost everything else (given sufficiently long inferential chains), a change in the semantics of one event results in an implicit change in most of the other events. These changes need to be contained, to preserve the correspondence between existing event detectors and the structure of the environment. The model of the environment can change incrementally, incorporating new facts and repairing errors, but it doesn’t change all at once and doesn’t break all the time.

Consider a binary detector that is defined in terms of a number of other detectors. It activates in a subset of the joint state space of these detectors, elements of which are expected to appear according to some probability distribution. If a few of the detectors change (modifying the conditions for assuming different states, not the states themselves), so does the probability distribution. Clusters in this distribution (assuming that they hold most of the probability mass) can move around as a result of the change, but not too far from their original locations, because the unchanged detectors keep them in place. Thus, if our detector keeps a sufficient margin around these clusters, the shifted clusters won’t cross its boundary and will activate or deactivate the detector the same way as they did before the change, more or less preserving its resulting behavior. After the change, the detector again needs to shift its boundary away from the clusters, to be prepared for subsequent changes. For example, training detectors to keep the maximum margin between the decision surface and the exemplars on both sides takes care of this issue (but not of others). This way, a change can be absorbed by the network; if the first layer of detectors can’t do that, subsequent layers will.
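The margin argument can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration: the detector is reduced to a threshold on a single number, the exemplar values are invented, and the boundary is simply placed midway between the closest opposing exemplars, which is the simplest instance of a maximum-margin rule rather than any particular learning algorithm.

```python
# Hypothetical 1-D illustration: a binary detector thresholds one combined
# input, and two clusters of inputs sit on either side of its boundary.

def detect(x, boundary):
    """Binary detector: active when the input lies above the boundary."""
    return x > boundary

def max_margin_boundary(inactive, active):
    """Place the boundary midway between the closest opposing exemplars."""
    return (max(inactive) + min(active)) / 2.0

# Exemplars observed before the change (assumed values).
inactive_cluster = [0.5, 1.0, 1.5]
active_cluster = [2.5, 3.0, 3.5]

boundary = max_margin_boundary(inactive_cluster, active_cluster)  # 2.0
margin = min(active_cluster) - boundary                           # 0.5

# An upstream detector changes, shifting both clusters by less than the margin.
shift = 0.25
shifted_inactive = [x + shift for x in inactive_cluster]
shifted_active = [x + shift for x in active_cluster]

# The old boundary still separates the shifted clusters, so behavior is kept.
same_behavior = (
    not any(detect(x, boundary) for x in shifted_inactive)
    and all(detect(x, boundary) for x in shifted_active)
)

# After the change, re-center the boundary to restore the margin.
new_boundary = max_margin_boundary(shifted_inactive, shifted_active)  # 2.25

print(same_behavior, boundary, new_boundary)
```

A shift larger than the margin would push some exemplars across the old boundary, which is the case the subsequent layers would have to absorb.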


Improving event detectors

September 21, 2008

Followup to: Levels of representation.

Consider a binary event detector that, say, detects the presence of tigers in a picture. This detector defines a certain set of pictures for which it activates. If this is a simple preliminary detector constructed from a general description of tigers, it can be significantly improved. But what constitutes an improvement? A change to the detector could make it a better tiger-detector, a worse tiger-detector, or even turn it into a cloud-detector.

The original detector is a vague question, and the improved detector is an answer. The initial outline makes it possible to identify the clusters of known events that fit the description, to change the detector so that it includes these clusters but not the stray peripheral events that slip through the boundary, and to plan for extrapolation of these clusters. An event detector is constructed to be applied in the future, so it needs to be able to recognize not only causal patterns that were never observed before, but also causal patterns that have never existed but will appear in the future. In a way, learning is a self-improvement action that an agent applies to its future self so that it processes certain events better.

Predicting subsequent events requires a probabilistic model of the environment, and relying on the few exemplars that a newly-fledged detector has managed to observe is not sufficient. If a new detector is expressed in terms of a few other existing detectors (right from the vague question stage), rather than as a classifier applied to the whole input domain, the problem becomes much simpler. Existing detectors already have a good idea of the probability distribution of their states, so the probability distribution for the new detector roughly derives from them, modulo dependency. The form of the new detector can then be adjusted, guided by this probability distribution and by facts about dependencies observed from the few exemplars.
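As a rough sketch of how a new detector's distribution can be derived from existing ones, the example below sums the probability of every joint input state in which the new detector is active. The detector names, the marginal probabilities, and the conjunction used as the "vague question" are all hypothetical. The inputs are treated as independent, which is exactly the simplification that "modulo dependency" warns about; observed exemplars would be needed to correct it.

```python
from itertools import product

# Assumed marginal activation probabilities of two existing detectors
# (hypothetical names and numbers).
p_active = {"striped": 0.25, "large_cat": 0.5}

def new_detector(striped, large_cat):
    """Hypothetical vague-question form: active when both inputs are active."""
    return striped and large_cat

def activation_probability(detector, marginals):
    """Sum the probability of every joint input state that activates the
    detector, treating the inputs as independent (an assumption)."""
    names = list(marginals)
    total = 0.0
    for states in product([False, True], repeat=len(names)):
        p = 1.0
        for name, state in zip(names, states):
            p *= marginals[name] if state else 1.0 - marginals[name]
        if detector(**dict(zip(names, states))):
            total += p
    return total

estimate = activation_probability(new_detector, p_active)
print(estimate)  # 0.125 under the independence assumption
```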

Intermediate events allow tractable inference; their purpose is to implement computational steps that follow the natural structure of the modeled environment. An event detector needs, at a minimum, to be non-redundant (not repeating another available detector) and to be probable enough to assume more than one state during some reasonable timeframe (if a detector isn’t ever expected to activate, there is no point in keeping it around). When a detector supports multiple states, many of these states need to hold sufficient probability mass. After the original form of a new event detector distributes the probability mass among its states, based on the knowledge about the detectors from which it’s constructed, one of the pressures for refining the new detector is to shift the boundaries of its states so as to even out (or, at least, keep within limits) the distribution of probability.
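One way to make the "even out the probability" pressure concrete is to score each candidate boundary by the entropy of the resulting state distribution. Using entropy as the measure of evenness is an assumption of this sketch, not something the text prescribes, and the input distribution below is invented.

```python
import math

def state_distribution(input_probs, boundary):
    """Probabilities of the [inactive, active] states of a binary detector
    that activates for input values >= boundary."""
    active = sum(p for value, p in input_probs.items() if value >= boundary)
    return [1.0 - active, active]

def entropy_bits(distribution):
    """Shannon entropy in bits; 0 for a detector stuck in one state."""
    return -sum(p * math.log2(p) for p in distribution if p > 0)

# Assumed distribution over an integer-valued input (hypothetical numbers).
input_probs = {0: 0.125, 1: 0.125, 2: 0.25, 3: 0.25, 4: 0.25}

# Try every boundary and keep the one that spreads the mass most evenly.
candidates = range(0, 6)
best_boundary = max(
    candidates,
    key=lambda b: entropy_bits(state_distribution(input_probs, b)),
)

print(best_boundary, state_distribution(input_probs, best_boundary))
```

A boundary of 0 or 5 gives a detector that never changes state, which scores zero and corresponds to the detector that is not worth keeping around.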

The form events tend to assume depends on many aspects of the cognitive algorithm, and only at a crude level do they correspond to natural events of the environment. Within the joint detectors of natural events, the factoring into individual detectors may look rather unnatural, like the set of floating-point numbers that have a “zero” at the twentieth bit in memory. The high-level dynamic of events repeats the structure of the environment, but the low-level dynamic of individual detectors may look different. The low-level dynamic needs to be specifically designed to implement the required high-level dynamic.


Levels of representation

September 20, 2008

Followup to: Perception and reactive beliefs, Vague questions and precise answers, Dynamics of representation.

In Perception and reactive beliefs I introduced the distinction between event detectors and events in the mind. Event detectors are aspects of representation that are relatively fixed, and events in the mind are the states of event detectors assumed during inference, which can change frequently, representing the currently modeled aspect of the environment. This distinction allows inference to be considered more or less independently from learning (changing event detectors), even though learning is a part of the inference dynamic.

Event detectors are not necessarily atomic elements of representation. For example, if 8-bit numbers are directly represented, a list of 8 bits is an event detector that can have one of 256 states, and when these 8 bits assume the state 01010011, it is an event of the value 83 being represented by this event detector. (Of course, it takes much more to go from 8 bits to the semantics of “numbers”; I could just as well say that when these bits assume the state 01010011, it is an event of “fish”.) Also, an event detector doesn’t need to be explicitly implemented: for example, those 8 bits make it possible to talk about a binary 83-detector, even though there is no bit that activates exactly for 83. Conversely, a single event may correspond to states of multiple event detectors, including the states of the detectors at different times. An active state of the 83-detector is an 83-event, which is a state of 8 bit-detectors.
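The 8-bit example can be written out directly. In this small sketch the function names are made up for illustration, and the bits are read most significant first, matching the way 01010011 is written above.

```python
def value_of(bits):
    """Read a list of 8 bit-detector states (most significant first)
    as the number they jointly represent."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value

def detector_83(bits):
    """Implicit binary detector: no single bit implements it, but it is
    well defined as a function of the joint state of all 8 bits."""
    return value_of(bits) == 83

bits = [0, 1, 0, 1, 0, 0, 1, 1]  # the state 01010011

print(value_of(bits))     # 83
print(detector_83(bits))  # True
```

The 83-detector exists only as a way of reading the 8 bit-detectors; nothing in the list of bits had to be added for it to become available.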

When a causal pattern is described on multiple levels at the same time, there are multiple event detectors describing different aspects of it, different surface properties, at different granularity. All these detectors work together, not just detecting the simple presence or absence of a certain event in the environment, but allowing exploration of possible variations of its properties.

Just as causal patterns in the environment can be described on multiple levels, so can the states of a collection of event detectors in the mind. A joint event detector that includes all of the event detectors involved in describing a causal pattern has a huge number of possible states. Just as there are very few possible causal patterns in the current environment, compared to the set of all physically possible states, there are only a few possible states of the joint detector, compared to the total number of its states. Causal patterns cluster in the environment, and possible states of joint detectors cluster in the mind. Natural events are wide areas containing these clusters, both in the mind and in the environment. The same happens on every other level of description, with combinations of event detectors defining the representational spaces in which narrow areas indicate possible events of the environment. For example, a joint bird detector has a subset of states corresponding to possible birds, but subsets of bird detectors can also have sets of states corresponding to different possible colors, sizes, sounds and beaks, down to a great number of properties, if an appropriate collection of individual event detectors is considered.


Dynamics of representation

September 4, 2008

Followup to: Where map meets the territory.

Natural events in the environment come in different shapes, and may be located far away from each other, be spatially or temporally distributed, or overlap. A single event may include many different configurations bundled together. The relations between events that allow an intelligent agent to build simplified models, inferring the presence of some events from knowledge of the presence of others, come from the way physical laws apply to the actual content of the world.

Events in the mind follow the same relations, but work by different rules. They are arranged differently and are much more localized in time and space, but they don’t need to be atomic. Just as events in the environment are only chosen to approximately describe its structure, events in the mind describe the dynamics of a particular kind of cognitive algorithm. This level of description is useful for designing the algorithm: it makes it possible to reduce a special case of the high-level phenomenon of intelligence down to the interaction of events, but it doesn’t cut to the bottom.

Attended events of the environment don’t need to be represented all at once, as some kind of declarative enumeration at the moment of implementing the decision that follows from the model. Events of the model happen at different times, just as the events in the environment that they represent do, so that they can function as elements of the inference process.

There are two main factors that influence the way representation events happen in the mind. First, events drive the inference process and support the current context. An inferred event appears after the events of the context that indicate it. Some events are intermediary and can be discarded after followup events are inferred; other events need to stay around to shape the context of further inference. Second, a fixed pattern in the mind indicates different things depending on when it appears.

The simplest example of how the event in the environment indicated by the same event in the mind changes over time is low-level input, where these events are identical. If a binary event detector in the mind for low-level input is active at time T, it represents the event of low-level input in the environment at time T. At time T+1, it represents a different event, the low-level input at time T+1, not the old event of input at T. On the other hand, at time T+7 a different event might form in the mind that represents the input at time T, even though the original event template now represents the input at T+7.
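A delay line is one simple way to picture this, offered here only as an illustrative sketch: the class name, the fixed length of 8, and the input sequence are all assumptions. Slot 0 plays the role of the low-level input detector, whose meaning moves forward with time, while slot 7 is a different place in the mind that, at T+7, holds what the input was at T.

```python
from collections import deque

class DelayLine:
    """Hypothetical sketch: slot 0 is the low-level input detector itself;
    slot k holds what that detector showed k steps ago."""

    def __init__(self, length):
        self.slots = deque([None] * length, maxlen=length)

    def step(self, current_input):
        # The newest input enters slot 0; older ones shift to later slots,
        # and the oldest falls off the end.
        self.slots.appendleft(current_input)

    def read(self, delay):
        return self.slots[delay]

# Assumed inputs at times T, T+1, ..., T+7.
inputs = [True, False, False, True, False, True, True, False]

line = DelayLine(length=8)
for value in inputs:
    line.step(value)

# Now, at T+7, the same template (slot 0) represents the input at T+7,
# while a different slot represents the old input at T.
print(line.read(0) == inputs[7], line.read(7) == inputs[0])
```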

The model of the environment needs not only to incorporate new facts and infer missing elements, but also to change the representation of some events as time goes on. The latter feature is not a drawback, as it makes it possible to keep track of time and of temporal relations between different events. For example, when a temporally shifted representation of one event meets a new version of another event, they appear simultaneously, and thus an inference rule between them can form; that could not happen if they never appeared at the same time.
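To illustrate how a shifted representation lets a rule form, the sketch below counts how often event A, delayed by a given number of steps, coincides with the current event B. The event sequences and the two-step lag are invented for the example, and counting co-occurrences stands in for whatever mechanism actually forms inference rules.

```python
def cooccurrence(a_events, b_events, delay):
    """Return (times delayed A met B, times delayed A appeared), where A is
    shifted forward by `delay` steps before being compared with B."""
    together = 0
    a_total = 0
    for t in range(delay, len(b_events)):
        if a_events[t - delay]:
            a_total += 1
            if b_events[t]:
                together += 1
    return together, a_total

# Hypothetical history in which B follows A exactly two steps later.
a = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
b = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]

print(cooccurrence(a, b, delay=0))  # A and B never appear at the same time
print(cooccurrence(a, b, delay=2))  # the shifted A always meets B
```

Without the shifted representation the two events never coincide, so there is nothing for a rule to be built from; with it, the regularity becomes a simple co-occurrence.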

Each event in the mind leaves behind a trail of representations: it starts in one form, and then changes into the next, then the one after that, and so on. Some events are more time-insensitive than others and may have most of their representation unchanged (the representation of a single event is not necessarily atomic, so some parts of it may change in time while other parts remain stable). As trails of multiple events interact in the mind, relations (rules of thumb) that are time-insensitive will be based on time-insensitive parts of the representation, and relations that are more time-sensitive will look at time-dependent parts as well. This makes it possible to learn and perform spatiotemporal inference.