The project of Friendly AI

September 19, 2009

What is the problem and project of Friendly AI? This issue is rather confused, so I’ll outline the motivation and break the problem into its two main components.

Much of the power of technology manifests as predictable tools we create. Predictability comes in many forms: a bowl is expected not to leak liquids, an excavator is expected to be useful for digging holes, a written note is expected to bring back forgotten memories. Tools can be trusted to deliver their predictable effects, and so can be safely designed to wield great power. An A-bomb, based on its design, is trusted not to blow up spontaneously, and software in the banks is trusted to correctly keep track of everyone’s accounts.

Humans can inventively reason about a lot of things, but our ability to correctly anticipate the effects of detailed plans is pretty limited. When designing a bridge, it is not enough to pick its shape and materials and estimate intuitively whether it’ll stand a load: this mode of operation will yield an unpredictable result, one that can’t be trusted. To get better at designing predictable tools, we invent more tools targeted at helping in this task.

Computers can be used to implement huge calculations, if the problem statement can be entered explicitly. For example, you can program the material and mechanical laws in an engineering application, enter a building plan, and have the computer predict what’s going to happen to it, or what parameters should be used in the construction so that the outcome is as required. That’s power outside the human mind, directed by the correct laws and targeted at a formally specified problem.

The process of decision-making has two aspects: prediction (factual estimation) and valuation (moral estimation). To be selected, a plan has to be both feasible and lead to good consequences. It is possible to implement a nuclear winter, but people don’t want that to happen. So far, people have been fairly successful at designing powerful mental tools for prediction (think physics, not futurism), but outside narrow domains, the resulting plans always have to be morally evaluated “manually” by people before the decisions can proceed. We can design powerful tools to augment only half of the decision process; the other half remains hopelessly in the domain of the human brain.

Let’s say we built an AI, a tool capable of planning in any domain, that is also capable of estimating desirability of plans, and so can make decisions autonomously. If this AI is considered independently of its goals, it’s like an engineering application with a random building plan: it can powerfully produce a solution, but it’s not a solution to the problem anyone needs solving. If you can specify a problem, but don’t have the AI, nothing happens. If you have the AI but give it a random goal, it solves a random problem, with all its power of precision and autonomy. The AI algorithm is essential when you do have an ability to specify the problem, but it’s a separate issue from specifying the problem statement that comes from human nature.

I tentatively identify Friendly AI as an autonomous decision-making tool that is powerful at what it does and can be trusted not only with factual estimation, but also with moral estimation. You don’t have to manually check what it deems desirable, just as you don’t have to manually check how a calculator arrived at each specific result, to be confident that the result is correct.

What is the difficulty then? Why can’t we program human values into a computer, just like a building plan, to be computed in higher resolution? The answer is that we can’t explicitly see our values. We can use them, with varying levels of success, but we can’t write them down and cast the whole of human preference in explicit form. Any direct attempt to do so will end up as a crude caricature that breaks in situations not at all difficult to find. A moral machine would need to work with human values, but human programmers can’t enter them, and neither can they do in their heads what a machine would be able to do given a formal problem statement, because the problem statement is too big for humans to handle. It could exist in a computer explicitly, but it can’t be entered there by programmers.

So, here is the barrier: the problem statement (human values) resides in the structure of the human mind, but the strong power of inference doesn’t, while the strong power of inference (potentially) exists in computers outside human minds, where the problem statement can’t be manually transmitted. Creating Friendly AI requires these components to meet in the same system, but it can’t be done the way other kinds of programming are done.

On the surface, the problem of Friendly AI seems to be about engineering an algorithm capable of powerful planning that is guaranteed by design to follow a clearly defined goal system. But the deeper problem seems to be extracting that goal system from humanity, seeing values in the messy detail of a given physical system.

Technically understanding a more or less arbitrary physical artifact as an instance of a goal-directed algorithm is a much more general problem than constructing a specific algorithm. To see human values in detail, we need a basic paradigm of what values are, as a property of physical processes. Here we seem to be at a pre-Newtonian stage: there is no “mass” or “force” in the description of preference (but there is a lot of existing science to throw at this problem).

The project of understanding arbitrary physical systems as formal goal-directed agents is (1) more general than designing a specific goal-directed AI, so that the solution to the latter may not even meaningfully contribute; (2) a necessary component of any successful FAI design; (3) safer than designing an AI, which, given arbitrary goals, is a very dangerous thing to have around; and even (4) potentially able to answer some fundamental conceptual questions in AI design, making it possible to complete the project.


Leveling up

August 10, 2009

That scream of horror and embarrassment is the sound that rationalists make when they level up.

Eliezer Yudkowsky

I haven’t updated this blog since February, because I understood a couple of things that changed my perspective on how to do and communicate research, as well as on the direction of research.

What I’ve been doing before was top-down formalization of intuition, something akin to philosophy: start with a vague idea about a phenomenon, and then iteratively clarify it, step by step rendering parts of the idea more explicit, in turn using the clearer understanding to train intuition, and so on. Throughout the process, there are almost no stand-alone technically understood components; everything is only held together in the mind. The intermediate product of this process is a set of mental tools that allow one to better understand the phenomenon under study.

There are a number of related difficulties with this approach. As most of the concepts are fuzzy, there is a temptation to neglect epistemic hygiene. This shows in attempts to cover inferential distances with explanations that misuse technical terms, saying something only similar to the truth, but not really true, because it’s easier this way. This plagued the first sequence I ran on the blog, in June-July 2008, with the term “probability”. What it really takes to describe a complex idea that you don’t yet understand technically is probably a book-length description, which won’t be an easy read either (with much of philosophy being the primary example). More importantly, it’s easier to engage in sloppy thinking, creating the illusion of progress while going in circles, and to start chasing lost purposes, solving problems that don’t need to be solved.

While research is informed by both facts and tools from the literature, in the “fuzzy” mode there is really very little that generalizes to something helpful on a not-directly-related problem. The most helpful thing is the methodology, a set of tricks for managing concepts as they develop, separating meaningful ones from the trivial, grounding in the existing body of science, and so on.

What I discovered when I started to look into the mathematics on topics related to intelligence (machine learning, graphical models, decision theory, game theory, formal semantics, logic, model checking, etc.) is that the intuitions forming in the mind once you understand these topics are vastly superior to those I was able to gather before, both from reading “fuzzy-grade” research (descriptions of “ad-hoc” AI approaches, neural nets, cognitive science, neuroscience), and from developing my own structures. At that point, I had been down this “fuzzy” path for about a year and a half, starting from no knowledge in the related fields; the material I described on the blog is what I constructed in the first half-year, a year before writing it up, and it has since been reduced to a kind of recurrent neural network, with experimental implementations and so on. It took only a couple of months to comprehend the hands-down superiority of math, even for the ideas that aren’t reduced to math yet.

And then I saw that the problem I was solving doesn’t really develop in the direction of Friendly AI (FAI), and that all my previous activity was mostly a lost purpose, apart from its educational value. I was acting from a vague idea that understanding AGI is a step in the direction of understanding FAI, since FAI is a kind of AGI. This idea turned out to be misguided for a number of reasons that should become clear from the following posts.

I leave the existing posts be, despite not really approving of them, and will resume blogging here.


Learning factored representation

February 11, 2009

Followup to: Balancing context with conceptual slippages, Summarizing structure in new labels, Independence of patterns.

Repeated contexts and context transitions become compressed over use, losing variability in their compressed form. Any distinguishing characteristics of particular instances of such repeated contexts can be extracted as separate properties. Commonalities get compressed in a central pattern, and variations become properties of that central pattern. For example, typical objects, such as cups, have certain common characteristics, but properties of a particular cup can be expressed as additional patterns showing where it differs from typicality.
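As a rough illustration of the cup example, here is a minimal sketch that assumes an object is just a dictionary of feature values. The function name `factor` and the feature names are hypothetical, and taking the most common value of each feature is only one crude stand-in for compression into a central pattern.

```python
from collections import Counter

def factor(instances):
    """Split feature dicts into a central pattern and per-instance deviations.

    Assumption for illustration: the "central pattern" is simply the most
    common value of each feature across the instances.
    """
    keys = set().union(*instances)
    center = {}
    for key in keys:
        counts = Counter(inst.get(key) for inst in instances)
        center[key] = counts.most_common(1)[0][0]
    # A deviation records only the features where an instance differs
    # from the central pattern.
    deviations = [
        {k: v for k, v in inst.items() if center.get(k) != v}
        for inst in instances
    ]
    return center, deviations

cups = [
    {"handle": True, "material": "ceramic", "color": "white"},
    {"handle": True, "material": "ceramic", "color": "blue"},
    {"handle": True, "material": "glass", "color": "white"},
]
center, deviations = factor(cups)
print(center)
print(deviations)
```

The typical cup ends up in `center`, and each particular cup is described only by what sets it apart, which is empty for a fully typical one.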

Central pattern of an object extracts mutual information from features describing the object, and as a result remaining patterns of object properties become more independent from each other, given the object pattern. A change in one property of an object doesn’t usually call for changes in other properties, and if it does, the dependent properties should probably again be summarized by a new single property. Individual slippages of object properties don’t affect most of the scene.

Resulting representation shouldn’t be strictly hierarchical, as limiting the representation to a hierarchy significantly reduces its expressive power. Center of a natural category can consist of a collection of interfering patterns, encoding the object’s structure and instantiated depending on context, whereas more rare characteristics are much more independent, given any compatible state of the object’s center.

Learning factored representations of transformations of the scene may result in formation of procedural patterns, with the center of transformation becoming procedure itself, and peripheral variations in transformation’s properties becoming arguments of the procedure.


Independence of patterns

February 7, 2009

Followup to: Structural representation of uncertainty, Interference of patterns

In a given scene, two patterns are called independent if changes in one of them don’t lead to changes in the other, that is, if they don’t interfere directly or through a short enough sequence of changes in the scene. Independence is conditional on context, so two patterns can be independent in one scene but not in a different scene, and a change to a third pattern can make them interfere.
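One way to make this definition concrete is to treat the scene as a directed graph of influences and call two patterns independent when neither can reach the other within a bounded number of steps. This is only a sketch under that assumption; the graph encoding, the function name `independent` and the `max_steps` bound are all hypothetical.

```python
from collections import deque

def independent(influences, a, b, max_steps):
    """True if a change in `a` cannot reach `b`, nor `b` reach `a`,
    within `max_steps` hops of the influence graph."""
    def reaches(src, dst):
        seen = {src}
        frontier = deque([(src, 0)])
        while frontier:
            node, dist = frontier.popleft()
            if node == dst:
                return True
            if dist == max_steps:
                continue  # do not look past the allowed number of steps
            for nxt in influences.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, dist + 1))
        return False
    return not reaches(a, b) and not reaches(b, a)

# In this scene the cup and the lamp do not influence each other.
scene = {"cup": {"table"}, "lamp": {"shadow"}}
# A change to a third pattern (the table now affects the lamp) connects them.
changed = {"cup": {"table"}, "table": {"lamp"}, "lamp": {"shadow"}}

print(independent(scene, "cup", "lamp", 2))    # True
print(independent(changed, "cup", "lamp", 2))  # False
print(independent(changed, "cup", "lamp", 1))  # True
```

The last call shows why "short enough" matters: the same pair counts as independent or not depending on how long a chain of changes is taken into account.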

Interference makes a scene whole: it connects its parts and translates the presence of additional patterns into influence on the behavior of existing patterns. Independence allows modular composition of elements of the scene, “keeps everything from happening all at once”. It could also be fundamental to scalable implementation, since it makes interactions between patterns local at each given step.

Groups of patterns, where patterns in each group are mostly independent of patterns in other groups, can function in parallel, so that whole processes of inference within each group are independent of other processes. Such configurations could be used to compute answers to subproblems, to divide a bigger problem into a collection of smaller ones (using procedural patterns), or to model a bigger system by a collection of models of its parts.


Patterns as contextually invoked procedures

January 3, 2009

Followup to: Continuous balancing of changing structure.

Patterns present in the memory direct the change of current context. They are instantiated depending on state of the scene, and resulting state of the scene depends on their structure. Thus, apart from declarative interpretation, patterns can be considered as contextually invoked procedures.

In simpler cases, declarative patterns build new structures in the current scene, adding content, and displacing other content. The resulting structure can be mostly predetermined, perhaps as reconstruction of past episodes, verbatim or distilled into semantic memory.

In other cases, procedural patterns implement more elaborate procedures that recombine existing patterns in a scene, even if those existing patterns never occurred in the same combination before. The operation of procedural patterns can be thought of as based on controlled conceptual slippages.

The structure of the scene is supported on a network of contextual interfaces between patterns. When interfaces change, so does the structure. When previously unconnected patterns somehow acquire compatible interfaces, these patterns become connected, which in turn leads to interference between them, and a wave of integration of their structures. The scene gets rebalanced around a new connection.

To implement a nontrivial procedure, a pattern is invoked for a combination of cues on existing patterns. Its application attaches new cues to these patterns, which act as interfaces connecting them in a new way and starting a recombination process.

The operation of procedural patterns is slightly analogous to the way complex biochemical processes work in a cell, with molecules being produced step by step, new cues appearing at each step, allowing new reactions to proceed at various active sites, protein folding acting as structure-changing rebalancing, enzymes implementing global context, and structures like ribosomes reliably transforming elements of representation. The analogy is rather weak, but it shows some of the elements of the process. It applies more to declarative patterns than to procedural ones, and doesn’t include learning.

When patterns are interpreted as contextually executed procedures, the balancing process can be interpreted as a process of parallel procedure execution, where multiple procedures are running in their local contexts, interacting with each other through that context. Each procedure has activation conditions, and each procedure has its own structure that determines its effect when applied at the call site. External input introduces the change in the context, but gets processed the same way. A balanced scene that doesn’t change corresponds to a stable point, with procedures running in a loop. Declarative patterns are simple procedures, and procedural patterns are more general. Declarative patterns are “nouns”, and procedural patterns are “verbs”. Some patterns contain just a few steps, and some initiate complex processes, transforming the scene along one of the many possible paths, chosen based on context along the way, branching out into multiple parallel procedures.
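The procedural reading above resembles a small production system, and a toy version can be sketched as follows. Everything here is an assumption made for illustration: the scene is a set of string labels, each pattern is a (condition, effect) pair, and `balance` is a hypothetical name. All conditions are checked against the scene as it stood at the start of a round, which loosely imitates parallel execution.

```python
def balance(scene, procedures, max_rounds=100):
    """Apply every procedure whose condition holds, round after round,
    until the scene stops changing (a stable point)."""
    scene = frozenset(scene)
    for _ in range(max_rounds):
        updated = scene
        for condition, effect in procedures:
            # Conditions see the scene from the start of the round;
            # effects accumulate in `updated`.
            if condition(scene):
                updated = frozenset(effect(updated))
        if updated == scene:
            return scene  # procedures still fire, but nothing changes
        scene = updated
    raise RuntimeError("scene did not settle within max_rounds")

procedures = [
    # Declarative patterns ("nouns"): they only add content.
    (lambda s: "cloud" in s, lambda s: s | {"rain"}),
    (lambda s: "rain" in s, lambda s: s | {"wet_ground"}),
    # Procedural pattern ("verb"): it replaces one element with another.
    (lambda s: {"rain", "umbrella_closed"} <= s,
     lambda s: (s - {"umbrella_closed"}) | {"umbrella_open"}),
]

result = balance({"cloud", "umbrella_closed"}, procedures)
print(sorted(result))
```

External input would correspond to changing the starting set between calls; the sketch leaves out salience, learning and any real notion of local context.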


Episodic and semantic memory

December 21, 2008

Followup to: Fragments of structure, Summarizing structure in new labels.

In this model, structure restoration can be interpreted as remembering, and context balancing as focusing attention on contextually relevant facts and memories. Depending on the character of the restored structure and of the restoration process, some memories can be considered episodic or semantic. Episodic memory restores a significant part of a single past scene, which allows the restored structure to be situated relative to known locations and times. Semantic memory plays out a semantic rule of thumb, filling in a property that can be discerned from the context, and even though this property can have complex structure, it isn’t associated with a particular past scene.

Before an episode is first recalled, the pattern of that episode is unique in the memory. In theory this allows every single bit of the old episode to be recalled, since there is no ambiguity in the details, for example if a unique label belonging to that episode gets triggered by a cue. But once a part of an episode gets recalled, its content is associated with two episodes: the original one, and the episode of recall. The recalled part, or episodic memory, becomes a weaker cue for the parts of the episode that were not recalled the first time. After many recalls, the episodic memory that gets restored in the context of recall is formed more as a reconstruction of previous episodes of recall than of the original episode. The details that are not usually recalled get forgotten, and the details that for some reason get distorted during recalls stay distorted in subsequent recalls. These effects, following from simple considerations about associative memories, are known to occur with human memory, as retrieval-induced forgetting and memory distortion, and can be mimicked by very simple models.
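A toy model of this kind might look like the sketch below. It is not one of the models from the literature, just an assumption-laden illustration: each detail of an episode carries a strength, a recall rewrites the memory with the recalled version, and details left out of a recall decay until they drop below a threshold. The names `recall`, `decay` and `threshold` are hypothetical.

```python
def recall(memory, cues, distort=None, decay=0.5, threshold=0.2):
    """Return the memory as it is stored after one episode of recall.

    memory:  dict mapping detail -> (value, strength)
    cues:    details that are actually recalled this time
    distort: optional dict of detail -> wrong value produced during recall
    """
    distort = distort or {}
    updated = {}
    for detail, (value, strength) in memory.items():
        if detail in cues:
            # The recalled version, distorted or not, replaces the original.
            updated[detail] = (distort.get(detail, value), 1.0)
        else:
            # Details that were not recalled become harder to reach.
            weakened = strength * decay
            if weakened >= threshold:
                updated[detail] = (value, weakened)
    return updated

memory = {
    "place": ("cafe", 1.0),
    "friend": ("Ann", 1.0),
    "weather": ("rain", 1.0),
}
# The first recall distorts the place; later recalls just repeat it.
memory = recall(memory, {"place", "friend"}, distort={"place": "bar"})
memory = recall(memory, {"place", "friend"})
memory = recall(memory, {"place", "friend"})
print(memory)
```

After three recalls the weather is gone and the place is remembered as the bar: the memory has become a reconstruction of earlier recalls and no longer matches the original episode.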

The first episode in an agent’s experience to which a rule encoded in semantic memory applies plays the same role as the original episode of episodic memory. The only difference is in what kind of content gets the emphasis during the recall. Episodic memory retains the relations to many details, even as retrieval-induced forgetting tries to shut them out, with the rarity of recall events helping the matter, while semantic memory focuses on a few properties and gets applied over and over. This allows semantic memory to be viewed as a special case of episodic memory that, through many cases of recall, abstracted out all of the episode-specific details, leaving only what’s usually important in the contexts of recall.

When a fact is learned declaratively, there is an episode in which it’s first stated, but the details of that episode are irrelevant to the fact itself. When the fact is recalled in the future, the recalling process can stop on the fact, without going into the details of the episode in which the fact was learned, even though those details are available. The limited part of episodic memory becomes semantic memory. Alternatively, a fact can be abstracted out as a regularity present in many scenes, without ever being a part of episodic memory, as the only unambiguous inference that follows from cumulative memory of many past episodes.

Another interesting effect is that not every episode can form an episodic memory. If an episode is so ordinary that no cue can uniquely point to it, there is no way to recall it as an episode. When you commute to work a hundredth time, you don’t usually pay attention to details, each action results from a known rule, there is nothing to learn from the process. New memories capture reusable novelty, contexts that are expected to repeat sometime in the future, but did not appear in the past.


Summarizing structure in new labels

December 9, 2008

Followup to: Structural representation of uncertainty, Continuous balancing of changing structure.

Labels allow states of knowledge to be represented in the most compact form. The expressive power of structural contexts and whole scenes allows the construction of representations of elaborate combinations of previously encountered states of knowledge. These more complex representations are harder to manage than simple labels, and so when a certain structural pattern (or map) becomes common enough, it can be assigned a new unique label of its own.

New labels allow frequently encountered states of knowledge to be represented compactly, forming a language adapted to the environment. Multiple generations of labels can represent the most salient aspects of bigger and bigger structures in the scene. A smaller representation allows more robust processing. Structure restoration can function with fewer errors because the structures that need to be restored become smaller. Maps can capture more global concepts in the scene without needing to consider more labels at the same time, because common combinations of labels are summarized by new labels. New labels form a basis for new levels of representation for structures that are already represented in the scenes.

A label by itself is good for nothing: if it isn’t in any map, it’ll never get restored. If a new label appears once in an ordinary context, it’ll never be restored again, because this context is matched by old maps better than a new map that also includes a new label.

A new label can get remembered if it appears in a novel structural context consisting of old labels. New maps that capture this new context can be restored in the future by right combinations of the old labels, and as a result restore the new label. From now on, the new label appears in all scenes containing this novel structural context, at the same time new maps representing this context do. As a result, it becomes possible to represent this context and associated maps just by the new label, and this label gets learned by other maps to reflect the presence of regularity represented by it. It’s no longer owned by the context in which it was bootstrapped and with which it was originally associated, and in the future it can even get completely disassociated from it, gradually shifting its semantics elsewhere.

Thus, it’s unnecessary to create a separate algorithm to manage new labels and rigidly assign them to the maps they are supposed to represent: map learning takes care of it. It’s sufficient to create a new label for each new map, and if it turns out to be useful, it’ll get learned by other maps. Relatively useless labels get abstracted out of maps, the same way relatively useless maps get discarded or merged with other maps.


Continuous balancing of changing structure

November 26, 2008

Followup to: Restoring the structure, Balancing context with conceptual slippages.

Now that the operation of our algorithms is no longer monotonic, so that the scene is not just being extended, but balanced, with possible replacement and deactivation of patterns, it’s time to consider its continuous operation.

In continuous balancing, the scene is never reset, and the process of balancing never stops. Elements of the scene are updated through external activation and deactivation of certain maps (a change in their salience), concurrently with the activity of structure waves. This is an equivalent of sensory input. For now, let’s assume that this input describes the scene on a high level as well as on a low level, activating maps corresponding to arbitrarily abstract properties and relations. To simplify the dynamics, let the scene change slowly relative to the propagation of structure waves.

Known maps (long-term memory) are simply maps that were synthesized by structure waves at some point. When a certain pattern loses support from a sufficient number of other salient patterns, it gradually fades from the scene. The resulting maps with no salience can later be reactivated, if they fit a structure wave better than the alternatives. The strength of the parameters of a map depends on how it was constructed and changes with each reconstruction.

This setting threatens to leave too much debris, with elements of old scenes remaining active when the current scene is updated to something else entirely. However, each active pattern interferes with other patterns, influencing the global context. Forgotten elements of an old scene are influenced by the current scene, and vice versa. The old scene can’t change externally updated elements of the current scene, which to some extent gives direction to the dynamic. On the other hand, preserving the context left from old scenes is also a very important feature, making it possible to model the dynamics of the environment and perform deliberative inference.

The representation can now be considered a dynamic inductive-predictive model of the environment, responding to sensory input and improving itself by drawing inferences between its elements and learning new rules. This representation is a more technically elaborated substrate for the holistic control framework, though far from being specified in enough detail to be implemented.


Interference of patterns

November 25, 2008

Followup to: Balance of context.

What makes a scene whole, as opposed to a collection of unconnected parts? Parts (patterns) influence each other, so that some collections of patterns are balanced, and some are not. Balance lies in the stability of local reconstruction of structural contexts, and the process of balancing the scene in waves of local structure brings it together. There are many properties present in a balanced scene and not in an unconnected collection of parts, but functionally the distinction comes down to local balance.

A balanced scene is consistent: different paths reconstruct the same pattern in approximately the same way. This notion is tricky, since one could say that if different paths create different reconstructions, these are reconstructions of different patterns, not of the same pattern. What is the identity of a pattern? When should we say that two slightly different structural contexts are approximations of the same pattern, and when should they be considered separately?

The identity of a pattern is a functional property: it reflects the fact that all reconstructions of the same pattern are affected by changes in the pattern. Different reconstructions of the same pattern interfere with each other. In an unbalanced scene, reconstructions constantly change each other, and the identity of patterns isn’t localized: they are mixed together. Changes accumulate to affect the global context, interfering with all kinds of patterns throughout the scene.

Pattern interference establishes structure on the implicit pattern graph. Different structure waves propagating through it interact by interference of enumerated patterns, even if they don’t share exact structural contexts along the way. There is a huge number of patterns and paths in the pattern graph, and interference allows them to interact with each other when the match isn’t exact.

As the scene becomes more balanced, each reconstructed pattern interferes with fewer and fewer other patterns. Change in a slightly unbalanced scene takes the form of a conceptual slippage, a localized interference that directly influences only reconstructions of the same pattern, with minimal influence on the global context. However, interference doesn’t refer to all changes initiated by variation in a pattern, only to changes in the first affected step of reconstruction in other structure waves, that is, to the immediate effect of changes in map salience on pattern matching. A local conceptual slippage could initiate a wave of reconstruction that eventually rewrites the whole scene, through local interference of reconstructions of the slipped pattern and the global effects of structure waves that subsequently pass through it, changed by the interference.


Balancing context with conceptual slippages

November 20, 2008

Followup to: Balance of context.

In his book “Fluid Concepts and Creative Analogies”, Douglas Hofstadter, among other things, explored the notion of conceptual slippage as a mechanism of fluid cognition. Conceptual slippage is a “context-induced dislodging of one concept by a closely related one, inside a mental representation of some situation”. It can be caught in action in slips of the tongue, when related words or phrases competing for expression of the same context get morphed together, resulting in combinations such as “I’ll chake a look” (“I’ll check it out” vs. “I’ll take a look”) or “I’m going to the begin — uhh, the entrance of the store”. Conceptual slippages perform steps of local search, gradually driving the mental representation into a more balanced state.

When the global context shifts, it changes the dynamic of structure waves. Some structural contexts get reconstructed in a slightly different way, and since different structural contexts match different patterns, the effect gets magnified along the path of structure waves. As a result, some new patterns get reconstructed, and some old patterns get forgotten. New patterns come with associations of their own, initiating restoration of whole new structures. The moment when a noticeably different pattern gets introduced in the scene corresponds best to the concept of conceptual slippage. It’s a change in representation that lies at the boundary between tiny changes and a wave of integration of completely new structure.

Conceptual slippages can be used to describe the process of context balancing. Like patterns, they are a tool for describing the dynamic. They are above insignificant changes in structure waves, but still small enough to have structure of their own, to capture replacement of one pattern by another similar pattern. If global context is changed slowly enough, even significant reconfiguration of representation can be regarded as a sequence of conceptual slippages. As I’ll describe later, good representation learns to perform anticipated shifts of balance in as few conceptual slippages as possible, even if the change appears too big and distributed at first.