10. The libidinal economy of the computer
Ian WRIGHT, a member of Sloman's working group in Birmingham, has developed Sloman's theory further into the computational libidinal economy (WRIGHT, 1997).
Wright groups the theories of Simon, Sloman, Frijda, and Oatley and Johnson-Laird under the term "design-based interrupt theories" and formulates three points of criticism which apply to all of the approaches mentioned.
10.1. Criticism of interrupt theories of emotion
In his approach, Simon differentiates between emotions with an interrupt function, which possess a highly adaptive value, and emotions with a disruptive effect, which run counter to adaptive behaviour. According to Wright, the criticized theories have so far not solved the problem of why a disruptive, and thus adaptively not meaningful, emotion can take over control of an intelligent system and maintain it for an extended time. Obviously, the meta-management system is not in a position to terminate the disturbance quickly in such cases. In order to explain such phenomena, the theories would have to be extended by phylogenetic, ontogenetic, and social aspects.
Wright criticizes the existing theories because they do not suggest mechanisms which explain the connection between emotional states and learning processes. For him, emotional states possess not only a motivational component, but are also important impulses for learning processes. This is also pointed out expressly by Frijda (1986). The correlation between the intensity of an emotion and the learning process, which the interrupt theories do not explain, must be seen in this context.
According to Wright, the available theories do not explain on which mechanisms hedonic tone signals are based, why such signals are "simple", why they differ from semantic signals and why they are, in the cases of joy and pain, either positive or negative.
According to Wright, Simon simply sweeps feelings under the physiological carpet by postulating that all hedonistic states are consequences of the perception of physical conditions. His theory therefore cannot explain, for example, a condition like "mourning" and the associated psychological pain, which need not be connected with states of physical excitation.
For Frijda, Oatley & Johnson-Laird as well as Sloman hedonistic components are simple, phylogenetically older control signals. Thus they have at least a function on the level of information processing.
Frijda underlines the importance of the hedonistic colouring of emotional states. His theory postulates relevance signals for joy, pain, astonishment or desire, which arise when an event is compared with the satisfaction conditions of different concerns.
Oatley and Johnson-Laird explain the hedonistic components of fundamental emotional states with their concept of control signals. Their theory assumes, for example, that the hedonistic colouring of joy or sadness is caused by fundamental, not further reducible control signals. Because of their functional role, control signals have different hedonistic values. The control signal for sadness, for example, has the function to break off or change plans, while the function of happiness consists of maintaining or further pursuing plans.
In Sloman's theory, insistence is not connected with hedonistic components. Sloman does, however, recognize the importance of hedonistic components, which play a motivational role as negative or positive evaluations that break off or maintain actions. He grants that his model must be extended by a pleasure and pain mechanism.
Wright tries to find a solution for the latter problem by starting, first of all, with definitions. For him, hedonic tone is too general a term; he therefore uses the term valency. First, Wright differentiates between physiological and cognitive forms of joy and pain. He then states that hedonistic colouring is always connected with a quantitative dimension, intensity. He quotes Sonnemans & Frijda (1994), who differentiate between six aspects of emotional intensity: the duration of an emotion; perceived bodily changes and the strength of the felt passivity (loss of control of attention); memory and re-experience of the emotion; strength and drasticness of the action tendency as well as drasticness of the actual behaviour; changes of beliefs and their influence on long-term behaviour; and an overall felt intensity. Wright points out that none of these categories describes the intensity of the hedonistic colouring, but that the category of the "strength of the felt passivity" is connected with it, because both intense joy and intense pain can be voluntarily controlled only with difficulty.
Then Wright defines valency as follows:
"Valency is a form of cognitive pleasure or unpleasure not linked to information concerning bodily locations, and is a quantitatively varying, non-intentional component of occurrent convergent or divergent emotions. Valenced states are contingent on the success or failure of subjectively important goals."
Wright takes the system of Sloman as a basis and extends it with a reinforcement learning (RL) component. In order to be able to implement this mechanism, he first postulates: "A society of mind needs an economy of mind."
For Wright it is important that RL always contains a selection component: reinforced actions have a stronger tendency to be repeated than non-reinforced actions.
In order to employ RL on all levels of a multi-agent system, an appropriate reward mechanism is required. For this, Wright relies predominantly on four algorithms: Q-Learning, classifier systems, XCS and Dyna.
With Q-Learning (Watkins & Dayan, 1992), an agent tries to learn, for each possible situation-action combination, the value of carrying out that action in the given situation. At the beginning, the values for all possible situation-action combinations are set to a default value. The goal of the system is to update the values in such a way that they lead to the maximum cumulative discounted reward.
The maximum cumulative reward at a given time consists of the reward for the directly following action as well as of the rewards which can be expected for the actions following it. These rewards are discounted in such a way that rewards which can be expected immediately are valued more highly than rewards which can only be expected further in the future.
The reward forecasts P for each possible situation-action combination are stored in a two-dimensional matrix. The algorithm selects from this matrix the action which possesses the highest forecast value for the present situation. With the help of an update rule, the values are afterwards computed anew.
One of the greatest weaknesses of Q-Learning is that with large situation and action spaces the corresponding tables become excessively large and make an economical trial-and-error search impossible.
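The tabular procedure described above can be illustrated with a minimal sketch. It is not taken from Wright or from Watkins & Dayan; the toy corridor environment, the function names, the random tie-breaking, and all parameter values (learning rate `alpha`, discount `gamma`, exploration rate `epsilon`) are assumptions chosen only for illustration. The table `q` plays the role of the two-dimensional matrix of reward forecasts.

```python
import random


def q_learning(step, n_states, n_actions, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, default=0.0, start=0, max_steps=100, seed=0):
    """Tabular Q-learning; step(s, a) must return (next_state, reward, done)."""
    rng = random.Random(seed)
    # Two-dimensional table of reward forecasts, all set to a default value.
    q = [[default] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            # Usually take the action with the highest forecast (ties broken
            # at random); occasionally explore a random action instead.
            best = max(q[s])
            greedy = [a for a in range(n_actions) if q[s][a] == best]
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = rng.choice(greedy)
            s2, r, done = step(s, a)
            # Update rule: immediate reward plus discounted best forecast
            # for the situation that follows.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


def corridor_step(s, a):
    """Hypothetical toy world: states 0..4, action 0 = left, 1 = right."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    done = s2 == 4
    return s2, (1.0 if done else 0.0), done


q_table = q_learning(corridor_step, n_states=5, n_actions=2)
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
print(greedy_policy)
```

The table here has only 5 x 2 entries; the weakness noted above arises because the number of entries grows with the product of the number of situations and the number of actions.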
Holland (1995) has developed an algorithm called the classifier system. It is intended to guarantee that a learning success which is based on a sequence of actions by several modules leads to rewards for all modules involved.
In his system, there are numerous classifiers which are nothing other than IF-THEN rules (condition-action rules). Some of them observe the environment and, if their own rule is fulfilled, send appropriate messages to a kind of blackboard (message list). Other classifiers propose their specific actions on the basis of the information on the blackboard. The probability that the system accepts such a proposal is predominantly based on the strength of the classifier, which is derived from how successful its proposals were in the past.
If the accepted action suggestion of a classifier leads to success, then it receives a reward which lets its strength increase. If a failure follows after its suggestion, it receives a punishment with which its strength is decreased. It shares the reward or punishment with all other classifiers which had a part in leading to its suggestion.
This credit assignment is achieved by a bucket brigade algorithm. The algorithm is called bucket brigade because not only the last classifier in a series of classifiers is rewarded or punished, but the rewards or punishments are proportionally distributed among the classifiers that participated in the end result, just as firefighters in earlier times passed water buckets along a line. Thus a reward can be propagated backwards through the system and cause corresponding reinforcements in certain action chains.
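The backward propagation of reward can be sketched in a strongly simplified form. The following is not Holland's full algorithm: it assumes a fixed linear chain of classifiers, a constant bid ratio, and no competition between classifiers, and the classifier names are hypothetical. Each firing classifier pays a fraction of its strength to the classifier that enabled it, and only the last one is paid by the environment.

```python
def bucket_brigade(strengths, chain, reward, bid_ratio=0.1):
    """One pass of a simplified bucket brigade along a chain of classifiers.

    strengths: dict mapping classifier name -> strength (updated in place)
    chain:     classifier names in the order in which they fired
    """
    for i, name in enumerate(chain):
        bid = bid_ratio * strengths[name]
        strengths[name] -= bid               # the firing classifier pays its bid
        if i > 0:
            strengths[chain[i - 1]] += bid   # to the classifier that enabled it
    strengths[chain[-1]] += reward           # only the last one is paid directly
    return strengths


chain = ["detect", "approach", "grasp"]      # hypothetical action chain

once = bucket_brigade({name: 10.0 for name in chain}, chain, reward=5.0)

many = {name: 10.0 for name in chain}
for _ in range(200):
    bucket_brigade(many, chain, reward=5.0)
print(once, many)
```

After a single pass only the last classifier has gained strength; after many passes the earlier classifiers in the chain have been strengthened as well, because each receives the bids of its successor.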
Holland has coupled his model with a genetic algorithm. Successful classifiers are paired and can produce new classifiers which can then work again more effectively.
With XCS, Wilson (1995) presented an advancement of Holland's classifier system. XCS deals with one of the weaknesses of Holland's system, in which only the strongest are rewarded. What is decisive for the success of an XCS agent is not its absolute strength, but its ability to make correct forecasts about the probability of success of its actions. Thus, if a classifier in the XCS system correctly predicts that it will receive a low reward, this qualifies it for inclusion in the genetic algorithm.
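The idea that accuracy rather than strength determines fitness can be illustrated with a reduced sketch. It is not the complete XCS algorithm: action sets, relative accuracy and the genetic algorithm are omitted, and the accuracy function, the parameter names (`beta`, `eps0`, `alpha`, `nu`) and their values are assumptions made for this example.

```python
class Classifier:
    """Keeps a reward forecast and a running estimate of the forecast error."""

    def __init__(self):
        self.prediction = 0.0   # forecast of the reward
        self.error = 0.0        # running average of the absolute forecast error

    def update(self, reward, beta=0.2):
        # The error is measured against the forecast made before the reward.
        self.error += beta * (abs(reward - self.prediction) - self.error)
        self.prediction += beta * (reward - self.prediction)

    def accuracy(self, eps0=1.0, alpha=0.1, nu=5.0):
        # Fully accurate below the tolerance eps0, sharply penalized above it.
        if self.error < eps0:
            return 1.0
        return alpha * (self.error / eps0) ** -nu


modest = Classifier()    # always earns a low but steady reward
erratic = Classifier()   # earns more on average, but unpredictably
for t in range(200):
    modest.update(10.0)
    erratic.update(1000.0 if t % 2 == 0 else 0.0)

print(modest.prediction, modest.accuracy())
print(erratic.prediction, erratic.accuracy())
```

The classifier that reliably forecasts a low reward ends up with the higher accuracy, although the other one forecasts a much higher reward.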
Sutton's (1991) Dyna architecture goes a step further because it possesses the ability to plan. Before an action is initiated, Dyna can play through the consequences of possible actions "in its head", by trial and error within a world model, and thus develop an optimized action strategy.
Wright points out that RL algorithms are trial-and-error learners which need appropriately graded rewards in order to be adaptive. "Unfortunately, the form or forms of value in natural reinforcements learners are unknown." (Wright, 1997, p. 139)
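A minimal sketch in the spirit of Dyna may clarify the idea of planning within a world model. It combines ordinary Q-learning updates from real steps with additional updates replayed from a learned model; the toy corridor environment, the function names and all parameter values are assumptions for illustration and are not taken from Sutton or Wright.

```python
import random


def dyna_q(step, n_states, n_actions, episodes=30, planning_steps=20,
           alpha=0.1, gamma=0.9, epsilon=0.1, start=0, max_steps=100, seed=0):
    """Q-learning plus replay of remembered transitions from a learned model."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (state, action) -> (next_state, reward, done)

    def update(s, a, s2, r, done):
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            best = max(q[s])
            greedy = [a for a in range(n_actions) if q[s][a] == best]
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = rng.choice(greedy)
            s2, r, done = step(s, a)
            update(s, a, s2, r, done)        # learn from the real step
            model[(s, a)] = (s2, r, done)    # remember what the world did
            for _ in range(planning_steps):  # rehearse remembered steps
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
            if done:
                break
    return q


def corridor_step(s, a):
    """Hypothetical toy world: states 0..4, action 0 = left, 1 = right."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    done = s2 == 4
    return s2, (1.0 if done else 0.0), done


q_plan = dyna_q(corridor_step, n_states=5, n_actions=2)
plan_policy = [max(range(2), key=lambda a: q_plan[s][a]) for s in range(4)]
print(plan_policy)
```

Because every real step is followed by several simulated ones, the value estimates typically spread through the table with far fewer real interactions than in plain Q-learning.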
Wright further points out that value can have two different meanings: On the one hand it is used when an object is evaluated. Someone likes an object very much, it is dear to him. The other use is the assigning of value to an object regarding a certain goal: A power saw mostly possesses a higher value than an axe for a lumberjack.
Wright differentiates between the value an external object can have and the value which an internal state of a system can possess. Value for Wright is a relationship between a goal-oriented system and its own internal components. Value "refers...to the utility of internal substates" (Wright, 1997, p. 138).
Value is both a scalar quantity and a control signal. The form taken on by value in RL algorithms is that of a scalar quantity. Such a scalar quantity is, contrary to a vector, not divisible into components with different semantics. Values specify a better-than relation between substates and have no further meaning.
In an RL system, the values of the different substates change over time; value thereby controls which action alternative is executed. The value of a substate lies in its ability to buy processing power.
Wright points out the coordination problem in multi-agent systems (MAS) which had also been mentioned by Oatley (1992). This is especially true with adaptive multi-agent systems (AMAS). The solution, according to Wright, is an internal economy with a currency flow.
Wright compares an AMAS to an economically operating society:
Based on this, Wright develops his currency flow hypothesis (CFH):
10.5. The details of the CLE system
Wright's computational libidinal economy unites the model of an intelligent system sketched by Sloman with a learning mechanism and a motivational subsystem which maintains emotional relations with other agents. With this model, Wright hopes to solve a problem of Sloman's model which he calls the valenced perturbant states problem: Sloman's model cannot explain how perturbances with a valenced component are produced.
Wright begins the description of his model by specifying the CFH again for natural RL
The description of the CLE covers several aspects: A libidinal selective system, a scalar quantity of value, credit assignment as well as a value circulation theory of achievement pleasure and failure unpleasure.
Wright's libidinal selective system is a cognitive sub-system whose main task is the development of social relations. It contains the following components:
10.5.2. The conative universal equivalent (CUE)
In Wright's model, CUE represents the scalar quantity form of value. The term "conative" is used here by him in the sense of "motivational". CUE is the universal means of exchange between the substates of the libidinal system. The possession of CUE means the ability to buy processing power. This can take different forms:
Thus CUE stands in a causal relationship with the interruption abilities of motivators and their ability to demand attention resources.
The exchange of CUE reflects the flow of semantic products within the system: In order to get into the circulation, a substate must pay the substate which supplied the semantic product to which the first substate reacts. This distribution of CUE to preceding substates takes place according to Holland's bucket brigade algorithm.
10.5.4. Circulation of value
The CLE has two distinguishable internal states: intentional and non-intentional. The intentional component of the CLE is the set of the substate products, in particular the motivators produced by the libidinal generactivators. These have a representational content; they are about something. The non-intentional component of the CLE is the circulation of value. This circulation of value is a flow of control signals, not of semantic signals.
For this, the circulation of value needs a module of the overall system which observes and registers the internal flow of CUE: the meta-management layer mentioned by Sloman. This mechanism will detect a movement of CUE within the system at any time. For each substate the values change, according to whether it is rewarded (positive) or punished (negative).
Wright demonstrates with a thought experiment what this can lead to. A virtual frog (simfrog) learns to catch flies in a virtual environment. If the substates necessary for this are successful, the meta-management layer registers an increase of CUE compared with the time before. Now let us assume the observations of the meta-management layer were coupled with the skin colour of the frog: positive values lead to the skin turning yellow, negative values to it turning green, and no changes to no change of the skin. After a successful fly catch the frog would notice a change of its skin colour which it cannot explain to itself. At the same time, it has either positive or negative feelings of different intensity (depending upon the change of the CUE state). A non-intentional control state develops which was released by the circulation of value in a system with a meta-management layer.
Wright therefore adds a further element to his libidinal economy: Valency as the monitoring of a process of credit assignment. The registration of the circulation of value produces valenced states which represent a kind of cognitive achievement pleasure or failure unpleasure.
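The simfrog thought experiment can be rendered as a small sketch. It is only an illustration under stated assumptions: the monitor simply compares the summed CUE of all substates before and after an episode, and the substate names and CUE amounts are hypothetical. The point is that the monitor registers only the sign and size of the change, not what the substates were about.

```python
def valence(cue_before, cue_after):
    """Return (sign, intensity) of the registered change in total CUE."""
    delta = sum(cue_after.values()) - sum(cue_before.values())
    if delta > 0:
        sign = "positive"
    elif delta < 0:
        sign = "negative"
    else:
        sign = "neutral"
    return sign, abs(delta)


# Coupling of the monitor's reading to skin colour, as in the thought experiment.
SKIN = {"positive": "yellow", "negative": "green", "neutral": "unchanged"}

# Hypothetical CUE holdings of the fly-catching substates before and after
# a successful catch.
before = {"spot_fly": 4.0, "aim_tongue": 3.0, "strike": 5.0}
after = {"spot_fly": 4.5, "aim_tongue": 3.5, "strike": 7.0}

sign, intensity = valence(before, after)
skin = SKIN[sign]
print(sign, intensity, skin)
```

The resulting state has a sign and an intensity but no representational content, which corresponds to the non-intentional character of valency described above.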
10.6. A practical example of CLE
Wright demonstrates the working of his model with the example of mourning. Based on an analysis of comments from people who mourned, he picks out a number of phenomena and tries to explain the underlying processes with his theory.
1) The repeated and continuous interruption of attention by thoughts about and memories of the deceased.
If a bond structure exists with X, then motives and thoughts which refer to him will emerge and compete successfully for the processing resources of attention. These cyclic processes can include the wish that the deceased were still alive, or the wish that one could have done something to prevent his death. Owing to the news of the death of X, this and other substates will therefore very probably circulate through the system, in which they are deeply rooted because of the intense connection to the deceased. The thought processes of the agent are shaken by perturbances and will partly not be under his conscious control.
2) The difficulty of accepting the death of the deceased.
The updating of a large database and the propagation of the information through the system take some time. Also, the agent has affective reasons not to accept the information, because this would mean that a process of building up a relationship, sometimes lasting years, was for nothing. Finally, there is the agent's knowledge of a long and painful mourning process which he would like to postpone.
3) The disruptive effect on everyday functioning.
The daily goal processing is made more difficult by management overload, which is due to the disturbance of the motive management processes.
4) Periods of relative normality, in which the mourning is pushed into the background.
With important new tasks, the filter threshold is set so high that thoughts of the deceased cannot come through. After accomplishment of the task, the filter threshold sinks again, and the mourning returns.
5) Attempts to fight the mourning.
These reflect the activity of a meta-process which notices the disturbance and tries to fight it. This succeeds, however, only rarely; frequently the result is only that the motivators are pushed below the filter threshold, where they increase in urgency and wait for the lowering of the filter threshold. Then they come through in larger numbers and lead to a loss of the agent's control over the system. The perturbances will decrease only once the CLE has nearly finished the process of detachment.
6) Motivators of second order, i.e. evaluation of mourning.
Meta-management processes which are culturally influenced.
7) Subjectively experienced pain.
Loss of CUE leads to negative states which are experienced as pain. Negative valency is dominant because generactivators produce motivators which cannot be satisfied anymore. This leads to an overproduction which leads to a gradual deselection and possesses high negative valency.
If motives which disturb the normal management process penetrate through the filter again and again, and one does not succeed in pushing them away for an extended time, an agent often cannot think of another strategy to change this state. "Crying is the plan of last resort, and can be triggered by negatively valenced perturbant states." (Wright, 1997, p. 207)
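The filter dynamics invoked under points 4) and 5) can be sketched as follows. This is a toy illustration, not Wright's or Sloman's implementation: motivators are reduced to a single insistence number, the linear growth of suppressed motivators and all names and values are assumptions.

```python
def surfacing(motivators, threshold):
    """Names of the motivators insistent enough to pass the attention filter."""
    return [name for name, insistence in motivators.items()
            if insistence > threshold]


def suppress(motivators, threshold, growth=0.5):
    """Motivators held below the threshold gain insistence while they wait
    (linear growth is an assumption of this sketch)."""
    for name, insistence in motivators.items():
        if insistence <= threshold:
            motivators[name] = insistence + growth


# Hypothetical grief-related motivators and their insistence values.
motivators = {"wish_alive": 3.5, "could_have_helped": 2.5, "memory": 2.0}

baseline = surfacing(motivators, threshold=3.0)   # normal filter setting
busy = surfacing(motivators, threshold=5.0)       # important task: filter raised
for _ in range(4):
    suppress(motivators, threshold=5.0)           # motivators wait and grow
relaxed = surfacing(motivators, threshold=3.0)    # task done: filter lowered
print(baseline, busy, relaxed)
```

While the threshold is raised nothing comes through; once it is lowered again, more motivators surface than before the period of suppression.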
Wright postulates that his model contains the solutions for the problems of interrupt theories outlined by him. He explains this for the four addressed problem areas:
Oatley and Johnson-Laird postulate in their theory fundamental and not further reducible control signals for emotions like happiness and sadness. With CLE, one element is sufficient: The circulation of value consists of simple control signals which are observed and registered by another instance. Depending upon the result of this process the emotions develop, for which Oatley and Johnson-Laird assume two separate signals.
The circulation of value has the added advantage that it coordinates a variety of relatively autonomous substates. The task of the circulation of value consists in the long run only of attributing positive or negative assets. All other effects are of second order and result from the original, simple function.
The CLE theory also explains, according to Wright, why control signals differ from semantic signals. Value is nothing other than a means to establish better-than relations between substates and thereby contains no semantic content whatsoever of the kind found, for example, in beliefs or desires.
Through the introduction of a fictitious currency, CUE, and its circulation through the system, learning effects become possible. Reinforcement learning can thus change the abilities of generactivators to interrupt processing and to demand resources of the system for themselves.
Besides, emotions have a strong influence on learning processes. The more the results of behaviour are accompanied by positive or negative feelings, the better the corresponding behaviour is learned or avoided. Through an increase of CUE, substates gain more power in the system; the more CUE, the stronger the registered intensity of the valenced state.
Wright equates a concern in Frijda's sense with a libidinal generactivator whose strength defines itself by how much processing capacity it can buy. At the same time, it also determines its disposition to be able to affect behaviour. This strength is based on the CUE accumulated by it.
Generactivators of the libidinal system which have much CUE produce motives with a high interruption potential. A large gain or a large loss of CUE leads to a valenced state that can at the same time be accompanied by a loss of control (mourning, triumph).
10.7.4. CLE and the control precedence problem
Why can dysfunctional and non-adaptive emotions take over control and not be pushed back by the meta-management layer? Wright offers as an explanation that the process of the accumulation of CUE cannot be controlled by libidinal generactivators of this layer; they can only register the processes. Only the libidinal selective system itself can decrease the strength of a substate which has too much CUE and thereby affects the overall system disruptively. Only once this has taken place will the state of loss of control be lifted.
Wright's model tries to solve a number of problems which have been avoided so far by other computer models. Of special interest is his suggestion for the treatment of the hedonic tone problem. While other models define the hedonistic value of an event always directly, Wright tries to model this as a characteristic of a system.
Connecting the theoretical approach of Sloman with reinforcement learning and introducing an imaginary currency whose circulation through the system is responsible for emotional processes requires a model of high complexity. Within the context of the model, however, it offers a convincing explanation for the development of emotions as well as for disruptive emotional processes, and it does so not only on an abstract level, but very close to an operationalization.
On the other hand, one could raise, with Pfeifer, the argument of "overdesign" against Wright's model. The already very complex model of Sloman which was implemented in MINDER1 becomes more complex by several degrees through the additions of Wright, which means very high demands on the programming of the system and the underlying computing capacity.
Of all the models presented, Wright's is the only one that neither invokes the "emergence of emotions" as a reason for not integrating them into the model nor hard-wires them into the system from the start. It remains to be seen to what extent his attempt at a theoretical explanation of emotions in connection with a "partially emergent" design proves sound when implemented in an actual model.