Patterns of structure

October 26, 2008

Followup to: Waves of local structure, Representing multiple copies.

Let’s call a subgraph (or corresponding collection of maps) that repeats in a scene or between scenes, and also has some properties that will be described later, a pattern. Pattern can refer to a recurring object, a property, or a relation. Multiple instances of an object in a scene are represented by a single pattern (a collection of salient maps), and this pattern is embedded in several instantiation points.

Pattern is a building block larger than individual node or map, closer to natural description of environment, but selected so that it’s possible to discuss inference algorithm in terms of it. It allows to connect intuitions about things like recurring properties, relations, verbs, or classes and instances of objects, to dynamic of the algorithm. Patterns can consist of just a few maps, or can include many other patterns as parts of themselves. Pattern is not a functional element of cognitive algorithm, but a tool for describing its operation at intermediate level of detail.

A pattern, when if gets included in the scene, is retraced by structure waves that keep its maps salient. The interesting part is how the waves that cross the boundary between patterns behave. In effect, when structure wave enters a pattern for the first time, in launches a process of bringing that whole pattern into the scene by subsequent waves. Patterns that are already in the scene are kept together by waves that flow between them. The structure of the scene is represented in the way waves flow between its patterns. Set of salient maps is enumeration of events, while the way waves travel between patterns is the causal model that describes relations between the events, and that allows to infer additional events.

Let’s ignore non-salient maps for a moment, and assume for simplicity that maps either clearly match the context or don’t. In this case, the decision to include a matching map in a wave is dominated by presence of ambiguity: if there is only one matching map, it gets accepted, but if there are many, neither does. Translating to the language of patterns, if a pattern is instantiated at multiple points, waves don’t flow from it to instantiation points. On the other hand, if there is only one instantiation point, or if instantiation points are identical, waves can leave the pattern. Identical instantiation points belong to the same pattern and are represented by the same maps. Waves flow both ways between identical instantiation points and the pattern instantiated at them. But waves can’t reach the bulk of patterns that host these instantiation points.

Structure of the scene is in directions that waves flow between patterns, through their interfaces. Each interface is a part of a pattern small enough to fit in a structure wave, and interfaces of different patterns can match. When wave exits an interface, it continues to another pattern only if there is exactly one other interface among available in the scene that matches (under our simplifying assumptions). This makes some interfaces traversable in one direction, and some in both. As a result, the set of patterns that structure wave can reach depends on which pattern it started from.

When two patterns are connected by a two-way interface, or a pair of one-way interfaces going in each direction, it makes them equivalent as starting points for a wave. Patterns connected this way are combined with each other. If two patterns are not combined, but have a one-way interface leading from one pattern to the other, then first pattern instantiates the second at that interface (the relation of instantiation is extended by transitivity). Combined patterns instantiate each other. The closure of a pattern consists of all the patterns instantiated by it, and is an area reachable by a wave starting in that pattern.

Common properties that get repeated over and over throughout the scene, such as color, are patterns instantiated at many points. The color itself in this case is a pattern equal to its closure. Small properties like this can be considered properties of specific complex patterns, but at the same time they are shared with other patterns. Sharing a small detail doesn’t conflate the patterns, as the detail is separately instantiated in each case. On the other hand, the detail parameterizes all patterns that share it, so any change in the parameter gets reflected in all of them.

On a bigger scale, each pattern can be considered containing other patterns instantiated by it as its parts or properties. And conversely, each pattern is a parameter of all patterns that instantiate it. Pattern in itself represents a class of entity, while pattern as a part of other pattern represents an instance. Patterns in the scene are instantiated relative to each other, there is no global counter that says how many instances of given pattern scene contains. For copies of big objects, it might be easy to count, but for recurring small properties like color or texture, instances only get observed relatively. In this representation, all the properties of the scene are mixed together, linked at every similarity, and only get separated implicitly, through selective local retracing by structure waves.

Described concepts naturally extend to the scene restoration. When there are no patterns in the scene that match a given interface, this interface is a gaping hole in knowledge about the scene, ready to be filled by a specific property. If known scenes show that only one property matches the interface, it gets accepted (“all crows I saw were black, so I assume this one is also black”). If multiple different properties match the interface, neither gets accepted. When one new property is accepted, in combination with the context of the scene it creates a more specific interface that can enable more properties to be accepted. Finally, inference performed by structure waves based on gradual salience and matching can be reformulated as probabilistic inference.


Representing multiple copies

October 21, 2008

Followup to: Waves of local structure.

Consider a scene that contains multiple instances of the same object, for example a room with multiple identical chairs. The internal structure of the instances is the same, they all have the same components with the same properties and relations. Instances only differ in relations to scene as a whole, to elements of the scene outside themselves, including each other. I’ll call the areas of the scene as a whole in which instances are embedded, instantiation points.

If scene is represented as a set of maps, internal structure of identical objects is represented by the same maps, making no distinction between instances, and even between the case when the scene contains only a single instance and the case when it contains many.

When structure wave starts from inside one of the copies, it doesn’t know which copy it’s inside, and so it doesn’t know where it should exit the copy to the scene outside. It’s more accurate to say that it starts in all of the copies at the same time, or in an abstract class of object in question, that doesn’t correspond to a particular copy. If structure wave exits the copies and enters the scene in multiple places, reconstruction of the scene becomes scrambled, with different instantiation points in the scene tied together. If scene contains an apple hanging on a tree and another apple lying on a plate, structure wave that exits both to the tree and to the plate will mold the tree with the plate. So, structure wave must choose between instantiation points, and failing that, it shouldn’t exit the object at all.

Wave chooses its direction through application of maps. The situation when wave can’t decide between alternative instantiation points corresponds to it being unable to choose between maps describing them. This suggest another component of map application, besides salience and matching: map needs to be chosen more or less unambiguously. If there are multiple similar candidates, neither of then is to be applied.

Thus, waves retrace the copies of an object without exiting to the scene, without connecting it to the scene. But, on the other hand, waves that start in the scene around the instantiation points can unambiguously enter the corresponding instance. These waves can bring instance-specific modifications, making a particular instance less of a copy. If one instance is sufficiently modified, structure waves will also become able to exit it.


Waves of local structure

October 17, 2008

Followup to: Restoring the structure, Fragments of structure.

To perform a single structure restoration step, we only need a local structural context of the current scene graph. Thus, if we have the current scene represented in form of maps, together with collection of known scenes, we only need to reassemble local area around this context. The scene never needs to be assembled as a whole. After all applicable maps are applied to the structural context, it can be disassembled again (note that reflecting the changes requires reconstructing a subgraph slightly bigger than a single structural context, to cover it with multiple structural contexts, capturing the structure in maps from different centers). On completion, this operation makes a change to the set of salient maps corresponding to the structural context: some maps are replaced by other (maybe completely new) maps due to significant changes in the context, some non-salient maps become salient.

When scene grows through application of known maps, each new element creates potential for further growth. If scene is nearly complete, with almost nothing to add, a single structural context that accepts new maps can create a seed for an additional big object to be integrated in the scene, requiring many subsequent restoration steps, even inserting elements into existing structural contexts from outside, making them change where they didn’t before. Local changes follow from each other, unstable areas of scene graph generate further instability.

If subgraphs are only reconstructed to make changes, and changes generate further changes, this establishes a certain pattern for reconstruction of subgraphs. Once an area of the scene is reconstructed and changes are made, it creates a potential for further changes applied to the same area, or its immediate extension. Reconstructed area of the scene grows with the changes, in the direction of the changes, and gets disassembled where changes stop.

Overall, this process of restoring local graph structure, following the changes and disassembling the structure once it becomes stable, has a form of a wavefront. All the activity during structure restoration happens within these waves of local structure, plowing incomplete areas of the scene, leaving the healed structure behind. Structure wave is a subgraph formed from salient maps, that is directed by the rest of the maps. When wave starts to retrace only maps that are already salient, area of the scene is complete and the wave goes out.

Structure waves can also be seen as navigating the set of maps. Wave forms from few salient maps, and propagates to the other maps matching the context. At each step, wave chooses the most relevant maps to focus on, preferring salient maps and maps better matching the context. Salient maps have the highest priority, which allows them to define the initial form of the wave, with non-salient maps following the context they create. Still, salient maps can only be accepted if they sufficiently match the context. At the extremes of the threshold for acceptance of maps, structure waves can fail to even retrace the current scene, or they can jam together most of the known elements.

This perspective allows to further unify the process of wave propagation. Instead of drawing a clear distinction between salient and non-salient maps, some maps can be marked as more salient, with gradation of salience. As structure wave develops, it accepts new maps based on how well they match the context, but in a way heavily biased by salience of these maps. Structure waves spontaneously start growing from random salient maps, and maps accepted by structure waves get marked as more salient. Initiating waves this way also allows to keep them small, independently retracing the features of the current scene, weaving in the threads of previous knowledge.


Fragments of structure

October 16, 2008

Followup to: Restoring the structure.

Graph restoration procedure only uses structural contexts of known scene graphs. Thus, there is no need to preserve individual scenes. Instead, the set of all structural contexts of all known scenes should be summarized directly. Context-matching only looks at labels of nodes, so identities of nodes in known scenes are also irrelevant. For example, if a certain structural context is frequently present in known scenes, this fact can be represented without keeping separate copies and without keeping track of which scenes this context is present in.

If structural contexts are sufficiently small, and if elements of scene graph encode sufficient amount of information about its local structure using special labels and multilevel representation, structural contexts themselves can be further simplified. I’ll call these simplified structural contexts maps, to avoid confusion with more general notion of context. A map consists only of a central label and a set of weighted support labels, and it doesn’t encode a subgraph. Maps are created from structural contexts by measuring the distance from central node to other nodes, and discarding the remaining structure, keeping only information about labels. Similar maps are joined together, with weights of the summary reflecting the weights in the originals and frequencies of their elements.

Maps are matched against the structural context in current scene (a subgraph that is not simplified). When matched maps strongly indicate that a certain label should be added to the current structural context, either a new node with this label is created, or an edge connects central node of structural context with another node that already has that label. If edge needs to be created between nodes that are not central in the current structural context, it’ll have to wait until one of these nodes becomes a center of another structural context. It nodes with the same label are located close together, they need to be looked at from different maps to attract different structure. Step by step, this more limited algorithm can also restore structures from known scenes. The structure restoration from unstructured maps is supported by the structure of current scene. Structural contexts guide both the search among maps extracted from known scenes and application of their elements.

Taken together, maps can be considered local pattern-completion rules, that get applied to the current scene, with the rules having the best match (and better track record) acting first.

Another interesting consequence of representing known scenes in form of maps is that it becomes possible to represent the current scene the same way. If set of known scenes consists of only a single scene, the process of structure restoration will recreate that same scene, even if it starts from a single node. So, while scene restoration ostensibly requires supporting explicit graph, to act as a framework on which maps grow the structure, that graph can be shattered on small simplified fragments when it’s not explicitly required. Current scene can be represented by a set of its maps. A similar approach was originally used by connectionists to represent structured knowledge as an unordered set of atomic symbols, with symbols encoding local neighborhoods of structure. For example, a string of letters in natural language can be encoded using a set of n-grams.

Thus, both known scenes and the current scene can be represented by an overall set of maps, with maps constituting the current scene marked as salient maps. To launch the structure restoration process, first salient maps are used to restore the current scene, and then the rest of the maps are used to build on it.


Restoring the structure

October 15, 2008

Followup to: Representation of structure.

Let’s say, we have a set of known scenes of similar nature, given in graph representation. The task is to reconstruct the missing properties in a new scene, based on known scenes.

On a scene graph, let structural context of given node be a local subgraph centered on that node, that includes only nodes and links within a certain small distance from it. One simplifying approach to reconstruction of missing properties is adding nodes and edges locally and incrementally, looking only at one structural context at a time (it requires some assumptions about the scene graphs). If current scene is extended locally, we only need to look at the local structural contexts of known scenes. The algorithm, roughly, would select a node in the current scene, and compare its structural context to all structural contexts of all known scenes. If a certain subgraph is encountered in sufficient portion of sufficiently similar structural contexts, algorithm inserts that subgraph in the current structural context, connecting it to the rest of the structural context in the same way it was connected in known scenes.

This process is basically inference. Known scenes establish the rules of thumb, and new incomplete scene is an initial focus of attention, collection of facts to be built upon according to known rules of thumb. Structural contexts limit the complexity and sensitivity of each individual inferential step. At this point, nothing fades away or gets repaired, elements are only added.

Big complicated structures from the past scenes can be restored and adaptively connected to existing scene using multilevel representation. Adding high-level description of the structure sketches its new location, and then gradually details can be elaborated, all within structural contexts, with details at each level of description connecting to corresponding details in the current scene at the neighboring levels of description. Each new detail adds to the context for selection of the next detail. To avoid local inconsistencies, where top-down rebuilding of new structure starts going the wrong way and comes to a halt in inconsistent state, sufficient amount of high-level representation is required, binding details of the scene together, establishing robust structural framework. This is one of the assumptions for the format of scene graphs.

The process of structure restoration can also be seen as growth of structure at the boundaries of the specified scene, according to the “natural” way structure used to grow in the past scenes. Would the scene be a simple regular crystal, there would be a single rule to be applied repeatedly to each shallow layer of the structure. In a real-world scene graph, growth simultaneously happens of multiple levels, which guide each other according to many established rules. Growth happens locally, seeing limited amount of structure on each step, but acts on many levels, which have different notions of locality. High-level representation gives general guidance, while low-level representation captures the little details, allowing the new structure to naturally extend beyond the boundary, instead of being crudely fastened on top.


Representation of structure

October 13, 2008

So far, the arrangement of events and structure of environment as depicted by representation weren’t described in any detail, the discussion centered on relations between events in general. In the next several posts, I’ll explore a way of representing more concrete structural descriptions in form of events and rules of thumb. This will add another perspective on the process of inference, identifying correlates of representing multiple instances of object, recurring properties, process of creating new elements of representation in response to observing new variants of known objects, access to properties of objects in a scene, and so on. This perspective will also suggest more concrete form of representation, setting stage for a more close investigation of possible inference algorithms.

Frame representation is an expressive method for capturing the structure of a static scene, its components, relations between them, properties and generalizations. A scene in frame representation is described by frames, which contain a certain set of slots. Frames represent entities or classes of entities, and slots can be filled by one of multiple possible values, describing properties of these entities or relations between them.

Let the scene within a focus of attention be explicitly presented in form of an undirected graph, with labeled nodes and unlabeled edges. This form is sufficient to capture a frame representation of a scene, with reserved labeled nodes added to represent slots, frames, directed edges, and so on. I assume that the scene is presented in a fairly loose frame-like form, with restrictions for representation arising as necessary.

This graph is not the knowledge representation, but representation of a scene that knowledge representation needs to capture. This is a design requirement for the cognitive algorithm: it needs to be able to represent a scene with about this level of expressive power. Additionally, it needs to be able to perform various operations: inference, attention control, learning, reconstruction of the scene from low-level input, and so on.

Structural elements of a scene in this representation are described by subgraphs. Graph for the overall scene is composed of many such subgraphs linked together, specifying each other’s properties and relations. Setting one of the several alternative properties is reflected by attaching one of the several alternative subgraphs. A simple binary property can be represented by a single node, while a complex property that has internal structure or includes relations to other objects, is represented by a bigger subgraph. For example, form of an object can be described by a graph of its 3D surface mesh.

Scene graph represents a focus of attention, highlighting a particular view of environment, on particular levels of description. Different scene graphs can capture different parts of environment, including its state at different time. The same object can be represented at overlapping levels of granularity or considered from different perspectives. Relations between properties captured by different scenes can be made a central element of another scene. Thus, scene graph is an open-ended representation that can be extended indefinitely, by elaborating particular details, adding new entities, establishing relations with additional entities, and shifting to different levels of description. An object represented by a single node can be expanded to a subgraph having many properties, and conversely a subgraph can be collapsed to a one-node summary.


Abstract inference

October 12, 2008

Followup to: Stability of event semantics, Focus of attention, Learning rules of thumb.

Not all events in the mind have natural interpretation as indicators of events in environment, even in the context of specific focus of attention. Some of the detectors form abstract inferential circuits that allow to compute effects more complicated than immediate association between events in environment (supported by rules of thumb acting on them), such as algorithms implementing skills, translating high-level commands into low-level actions executing them in a context-sensitive way, when they can’t be traced through hierarchy of consequences in the environment. These circuits can be built using the same learning algorithms and supported using the same techniques for preserving stable inference. Event detectors (or groups of detectors) act as equivalents of logic gates in a circuit or states and transitions in a finite state machine. Computation proceeds as inference between focuses of attention, directed by context outside the circuit.

Just as with normal representation, abstract inference can be supported on multiple levels of description. Low-level circuit that computes individual actions from high-level command can be described by a simple rule that models successful performance of that action, leaving out the low-level details, which is used in the planning algorithm. Circuit is activated only when action actually needs to be performed. This is an example of reflection, when mind models an aspect of mind, and not of external environment.

There is a deep analogy between this kind of within-mind reflection and the process of building many layers of representation of the environment from low-level sensory activity. Abstract circuits themselves are learned by generalizing training sessions, so for cognitive algorithm they are no different from other fragments of low-level model of environment. Strictly speaking, structure of environment is also only a statistical regularity in the environment that has no concrete existence, so when it’s captured in representation in a mind, this representation is only a way to describe the regularity, which can as well apply to the steps of abstract inference.

Another way of seeing low-level action implemented by abstract inference is by shifting the boundary of mind. Abstract inference is regarded as part of the environment, so reflective model becomes a model of just another aspect of environment. This also allows to view learning as a kind of action applied to environment, where in this case the subject of action is a part of mind. Step by step, this approach allows to reimplement the whole intelligent agent, breaking out of constraints of initial cognitive algorithm, launching the cycle of strong self-improvement, as opposed to learning within initial algorithm.


Learning rules of thumb

October 11, 2008

Followup to: Rules of thumb, Focus of attention.

States of individual low-level event detectors don’t correspond to specific events in environment. Only when considered in a context of focus of attention, individual detectors indicate localized events. More generally, an event in environment can be represented by a collection of low-level event detectors in the mind. This happens because each specific event in environment rarely gets attention more than once, and when it does, the question of presence or absence of that event gets resolved once and for all, so that there is no need to keep a detector specialized on answering this question around.

Rules of thumb describe the events in environment, but act and get learned in the mind. They capture general relations that hold between specific events, so within the mind they establish relations between contexts, collections of events, parts of focus of attention. For each context representing a specific event, rule of thumb constructs another context, representing inferred specific event, and this inferred event is different for different original events. Rules of thumb generalize observed transitions between focuses of attention, so that when focus of attention represents a novel situation, they can be used to infer a next focus of attention. When transitions between focuses of attention are originally enforced by sensory input, or by few rules of thumb that look for specific cues, new rules of thumb make the transitions more robust, so that sensory input or initial cues are not required anymore to make the same inference.


Focus of attention

October 7, 2008

Followup to: Principles of holistic control, Event detector as experimental setup.

Event detectors can be regarded as experimental devices for discovering properties of past and future. State of detector is the result of the experiment, so if detector is designed to answer only one specific question, the answer should also stay the same, as constant state of the detector. But semantics of detectors changes over time, and the questions are more generally context-sensitive, so state doesn’t stay the same, and interpretation needs to keep up. When interpretation of detector changes, so do interpretations of dependent detectors, and detectors depending on them.

Locally and globally, each moment the mind changes which properties of environment it indicates, and correspondingly changes its state. Interpretation of possible states of the detectors (external description) and states themselves (actual implementation) are interrelated: current states form a context for reaching following states, and correspondingly interpretation of current states determines interpretation of possible following states.

Let’s call interpretation of (properties of environment indicated by) current state of mind, focus of attention. Events indicated by next state of mind are directly indicated by currently indicated events in environment (including current sensory input). Thus, focus of attention follows indicator chains in the environment. Since mutually indicating properties propagate together, inference running for a while without drastic disturbances will lead to a robust model of aspect of environment, with elements reinforcing and double-checking each other.

Focus of attention is driven by several related pressures. First, directions of inference from properties of environment in the focus of attention form the direction of change in overall attention, if aligned sufficiently together. Second, events in the focus of attention tend to form a coherent picture, with mutually reinforcing events staying longer together and separate events gradually dissipating. Third, low-level input and output have fixed semantics, thus binding focus of attention to the agent in several points. Even though the focus can stretch for light-years in high-level representation, it never completely leaves action and perception, which emanate waves of attention through the first pressure, by immediate inference, in all directions around the agent. This same pressure drags the attention forward in time, changing the representation to reflect current events in the environment. And fourth, focus of attention is driven to seek desirable properties in environment, forming a substrate for preparing and following goal-reaching plans.

Depending on context, individual detectors can have a great variety of interpretations. As a result, separately, detectors have a very distributed semantics, indicating a certain texture of configurations located anywhere and at any time. But together, they add up to an inferentially localized picture, in which each individual event makes much more clear-cut distinction, being focused by other events.


Event detector as experimental setup

October 4, 2008

Followup to: Where map meets the territory, Levels of representation, Improving event detectors.

Uncertainty about environment is uncertainty about specific events. Experiment is a procedure for which the outcome is initially unknown, but knowledge about actual outcome (after the experiment is performed) can be used to resolve the uncertainty about environment. Experimental setup is created in such a way that its future state, after the experiment is performed, indicates the property of environment in question. This constitutes a theoretical basis or interpretation for experiment. Interpretation provides both a functional cause of experiment (the reason it’s being performed, chain of events that leads to it being performed), and a way to use its results. Interpretation also provides a criterion of optimality for experimental setup, so that a particular experiment is a good solution with respect to this criterion, configuration optimized for the task.

Event detector can assume one of the multiple possible states, and serves as a tool for resolving uncertainty. The state it assumes indicates which of the alternatives holds in reality. From this perspective, event detector can be regarded as part of experimental setup, that each moment answers a question about environment.

Two complementary interpretations can be used for event detectors. First, whole intelligent agent can be regarded as experimental setup, with this particular event detector being a readout of the experiment (global interpretation). The event detector is considered independently of other elements of mind, and interpretation relates state of the detector directly to state of environment, shows what events in environment are indicated by the readout. Structure of mind is a way of obtaining measurement with required properties. The relation between detector and environment is primary, and algorithm of the mind is a way to implement that relation. Direction of improvement for the detector is defined by the external criterion, by structure of environment.

The second way is to consider only the detector itself as experimental setup (local interpretation). Detector can be configured to answer a question specific to needs of cognitive algorithm, and not so much for a feature of environment. Interpretation of its states can be derived from interpretations of elements of mind it interacts with, which makes derivation of interpretation more tractable than in global case. Starting from input/output, where detectors indicate their own state, interpretation reaches deeper in the environment as detectors get deeper in the mind. Each detector solves a local optimization problem, efficiently indicating the state of environment following from states indicated by related detectors.

Interpretation guides learning, sets the optimization target for representation. When learning is regarded as development of new experiments in anticipation of inference, interpretation can be said to be the process of learning, chain of events that leads to particular changes in representation. Global interpretation regards intelligent agent as a whole, directing representation to indicate the goal, and to figure out the ways of indicating the goal, developing the experiments that find a way to indicate the goal more efficiently. Local interpretation optimizes the representation with regard to current cognitive algorithm, where operations are instrumental, performing small subtasks far from the goal.