The CoBra Project

Collective Brain

It has been a few weeks since I started thinking about building a website for fun. Basically, I was trying to come up with an idea that would be easy enough to implement fast and complex enough to be interesting for users. I was also thinking about something interactive -- something that would involve the users in an interesting and fun way. So, I came up with the following idea: "Every social media platform makes its users log in and post something individually. What if they had a platform where they post something collectively?" I don't want to talk much about this particular idea and its implementation in this post, so, long story short: you can visit the CoWis (Collective Wisdom) website and see it for yourselves. Then I kept thinking about something new. This idea of making people do something "meaningful" or "unmeaningful" together seemed very intriguing to me. A few days ago I finally refined some of my all-over-the-place thoughts, and now I think I have something interesting again. I will walk the readers through my thought process step by step.

Imagine that there was a mobile application that people could just launch and press a big red button saying "track me". After pressing the button, without worrying about it too much, you just get in your car and drive from one place to another. Now, you might wonder where I am going with this. Collecting data about carbon emissions? No. Collecting data about the roads? No. Solving autonomous driving? No. Making it easier for serial killers to find people? No. Then what is it for? Well, I will give away the spoiler of a movie that hasn't finished being filmed yet: it is to use live traffic to solve scientific problems. I call it a spoiler of an unfinished movie because I don't even know whether doing something like this is possible, and if it is possible, how one could actually implement it. But this was the idea regardless, and it was the initial spark that led to a chain of more generic versions later on.

As intriguing as this idea of solving scientific problems by using live traffic seems to be, I still don't know to what extent such a thing would be possible, both in theory and in practice. This leads to the following question: What is the computational power of traffic? In other words, how could one quantify the computational power of traffic? Is it computationally equivalent to combinational logic, finite state machines, pushdown automata, or even Turing machines? In fact, these questions could be asked about any type of process, such as earthquakes, storms, lightning, and so on. So far I haven't seen any work that tries to rigorously answer such "weird" questions. This is why I introduce new concepts that, I think, play an essential role in setting up the right framework for answering these questions in the first place, and I try to mathematically formalize the problem statement in the following subsection.

Mathematical formalization

Instead of writing down all the definitions from the start, I will first give more of my intuitions about the "CoBra" problem. So, imagine that there is a room into which you cannot peek to see what's going on inside. But you know that it is capable of solving 10 simple mathematical problems (e.g., addition of two numbers, linear equations, polynomials with up to 3 roots). You also know that clones of this room can be built at will. Now, the question is as follows: How could you build a custom (and potentially bigger) room that is able to solve a problem which is unsolvable by the only building block you have (i.e., the room capable of solving 10 easy problems)? Making clones of the same room is simple. What's hard is determining the connections between the copies so that the whole thing solves a previously unsolvable problem, if that is even possible at all. I will set the impossibility case aside for the sake of exploring the intuition further and focus on how to invent a systematic way of connecting rooms together as we scale up. Now, I proceed to the new definitions, which could be thought of as an addition to classic graph theory.

First, I would like to start with the different types of nodes and edges. There are 2 types of nodes and 2 types of edges. The node types are functional and input/output. A functional node represents a computational process, while an i/o node represents a directed communication between the processes. The edge types are defined with respect to the node they are connected to, and they are the following: input and output. An input edge (of a node) is an incoming directed arrow (to the node) and an output edge (of a node) is an outgoing directed arrow (from the node). Simple enough. With that said, let's move on to more definitions.

Definition. Graph input is the output of an input/output node and graph output is the input of an input/output node.

Figure 1. A simple graph with different nodes and edges.

Definition. An anchor is a subgraph that is isomorphic to its containing supergraph in terms of input/output connections: the subgraph's input/output connections match the graph's input/output connections respectively.

Definition. A column is a graph that has at least one anchor.

Figure 2. Left: a column with only one anchor (the graph has 0 inputs and 1 output, and only one node has 0 inputs and 1 output). Right: four identity columns (double cross lines indicate an arbitrary number of edges).

Remark. All empty graphs are identity columns.

Definition. Column extension is the process of extending a column by substituting one or more of its anchors with the column itself.

Figure 3. Left: column extension of figure 2:left. Right: column extension of figure 2:right.

Remark. An extended column or an extension is the result of column extension.

Definition. A prime column is one that is not an extension of any other column except itself.
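
To make these notions concrete, below is a minimal Python sketch of columns and column extension. It relies on simplifying assumptions of my own: functional nodes are plain integers, the i/o boundary is collapsed into "IN"/"OUT" sentinels on the edges, and anchors are detected as single nodes whose in/out arity matches the graph's input/output signature, rather than by full subgraph isomorphism.

```python
# A minimal sketch of columns and column extension, under simplifying
# assumptions: functional nodes are integers, i/o boundary edges use the
# sentinels "IN"/"OUT", and anchors are single nodes whose arity matches
# the graph's input/output signature (no full isomorphism check).

class Column:
    def __init__(self, nodes, edges):
        self.nodes = set(nodes)   # functional node ids (assumed non-empty)
        self.edges = set(edges)   # ("IN", v), (u, v), or (u, "OUT") pairs

    def signature(self):
        n_in = sum(1 for u, _ in self.edges if u == "IN")    # graph inputs
        n_out = sum(1 for _, v in self.edges if v == "OUT")  # graph outputs
        return n_in, n_out

    def anchors(self):
        sig = self.signature()
        for n in self.nodes:
            deg_in = sum(1 for _, v in self.edges if v == n)
            deg_out = sum(1 for u, _ in self.edges if u == n)
            if (deg_in, deg_out) == sig:
                yield n

    def extend(self, anchor):
        """Substitute `anchor` with a fresh copy of the whole column."""
        off = max(self.nodes) + 1
        relabel = lambda x: x if isinstance(x, str) else x + off
        inner = {(relabel(u), relabel(v)) for u, v in self.edges}

        # Split the copy's boundary edges from its internal wiring.
        copy_in = sorted((v for u, v in inner if u == "IN"), key=str)
        copy_out = sorted((u for u, v in inner if v == "OUT"), key=str)
        inner = {(u, v) for u, v in inner if u != "IN" and v != "OUT"}

        # Edges that touched the anchor in the host graph.
        feeds = sorted((u for u, v in self.edges if v == anchor), key=str)
        fed = sorted((v for u, v in self.edges if u == anchor), key=str)
        host = {(u, v) for u, v in self.edges if anchor not in (u, v)}

        # Rewire: whatever fed the anchor now feeds the copy's inputs, and
        # the copy's outputs feed whatever the anchor fed (the arities match
        # by the anchor property).
        stitched = set(zip(feeds, copy_in)) | set(zip(copy_out, fed))
        nodes = (self.nodes - {anchor}) | {n + off for n in self.nodes}
        return Column(nodes, host | inner | stitched)

# Figure 2 (left): a column with 0 inputs, 1 output, and one anchor node.
col = Column({0, 1}, {(0, 1), (1, "OUT")})
print(list(col.anchors()))   # [0]: only node 0 has 0 inputs and 1 output
print(col.extend(0).edges)   # the chain grows by one copy: 2 -> 3 -> 1 -> OUT
```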

All the definitions except the last one are there to conveniently set up the concept of column extension. Actually, this is it. We are done with the formalization (at least for now), and it is very short and simple. Before leaving this subject, one might rightfully ask the following question: why are column extensions so important? The answer is that they have the ability to overcome the potential limitations of any finite-time computation. In other words, as more computation might be required along the way to solve a particular problem, the column could be extended from its anchors indefinitely to create more space for the computation to continue. However, one open question remains about how each extended part consecutively uses the results of the previous sub-column, and whether problems can actually be solved this way. Now, we could ask ourselves the following questions:

  • Does there exist a column with "interesting" behaviors? If so, how can we find (or generate) it?
  • What kinds of computational processes could be considered as "useful and interesting" anchors?
  • How much computational power/capacity could one achieve by using the CoBra theory?

To answer the first question about finding/generating "interesting" columns, we could use evolutionary strategies. However, it would be a good idea to first zoom out a little bit and think about the general problem of AI before jumping into the implementation blindly. I will discuss my recent realizations and speculate on one possible approach to building truly intelligent systems in the next subsection.

Speculations

Solving AI is obviously not easy. But let's ask ourselves this question: How hard could it potentially be? Well, I don't know the answer to this question either, although I may have some guesses. These guesses don't tell you the degree of hardness in any quantitative way, but they may still help you grasp the reality of it. Without further ado, here is what I think is one potential approach to building truly general intelligent systems: we may need to essentially use already existing computational processes that are out there, independently of the particular math problems in our minds that we would like to solve. As discussed in the introduction of this post, we need to find a way to make traffic, thunderstorms, earthquakes, etc. solve problems for which these processes are probably not the "right tools". Instead of conversion of energy, this is about conversion of intelligence, or information processing, or computation. Do we know how to convert one form of intelligence to another? To be able to do such a conversion, we would probably need to build something that is agnostic of its own nature as well as of what exists outside it.

Traffic essentially solves only one problem: going from one place to another without violating the set of constraints that we call the law. Other than that, it doesn't care about our math problems, such as $1+1=?$ or the integral of a sine wave. In other words, traffic doesn't know what's outside it, and I don't think it knows its own nature either (at least, we cannot directly ask traffic any questions about its own nature, or about anything for that matter). It doesn't know anything. It just does things, and while it does them there is some computation going on that could potentially be harnessed and used for other purposes. This is still a big MAYBE, obviously. If doing such a thing were possible, then I would speculate that one of the ways to do it right might be as follows:

  1. TAP (Task Assignment Problem). First, you solve the task assignment problem, in which you have to understand what each element of the system should do. Going with the traffic example, this would mean understanding what each car represents and what each place represents, so that you could assign the tasks correctly. This is also a matter of translating a problem formulated in one axiomatic universe into a problem in another axiomatic universe.
  2. CAP (Credit Assignment Problem). Then, you solve the credit assignment problem, which more or less comes as a free lunch from the underlying computational process(es). Going with the traffic example again, credit assignment would mean solving the problem of driving cars from one place to another without violating the law. But the elements of the system (i.e., cars) are already capable of doing this. Thus, you get this one for free, and this is where the efficiency may actually be gained.
The TAP-CAP approach works by first solving the TAP part and then the CAP part. If we are able to make use of already existing systems that have achieved CAP-level intelligence, then the latter part comes for free. As we will see in the next subsection, one of the important questions is whether TAP and CAP are so closely related to each other that one cannot exist without the other. If that were the case, then we would not be able to just plug in CAP and go with it; instead, we would have to align TAP and CAP with each other. This property could be called the cohesion property of TAP-CAP systems. It would also add to the complexity of the intelligence phenomenon.

How did evolution do it?

It would also be very interesting to know how evolution invented/discovered the phenomenon of intelligence. There are two levels of depth I would like to dive into. The first is how intelligent organisms (such as humans) came to be, and the second is how artificial organisms (such as economies and governments) came to be. The latter could be easier to answer, since it has mostly been the conscious efforts of humans that played an important role in forming these organisms, as opposed to natural processes that are hidden and not (fully or even partially) observable to us. It should not be surprising that even the scientific branches trying to analyze and study these organisms or systems are different. While learning how living organisms came to be is the concern of evolutionary biology, learning how civilization and the first notions of a republic came about and were modified over time is the concern of history. For someone like me who doesn't enjoy reading history much, it was somewhat of a surprise that I could actually use history to learn something useful.

I would like to put down some questions that have come to me while writing this blog post. It is worth mentioning that these questions are interconnected, and therefore finding an answer to one of them may lead to answers to the others.

  • How did evolution find "the right column"?
    • Did it emerge because all of its parts (i.e., computational processes and their relationships) were somehow in a great harmony with each other simultaneously (i.e., intelligence is complex)? -or-
    • Did it emerge as a result of independent iterative refinements and improvements of both parts (i.e., computational processes and their relationships) separately (i.e., intelligence is complicated but not complex)?

From the historical point of view, precisely answering the following questions could be important:

  • Did humanity first find the necessary concepts/entities (i.e., computational processes) to create governments? -or-
  • Did it first find the structure/relationships (between already existing entities)? -or-
  • Did it find both the necessary concepts/entities and the structure/relationships simultaneously?

CoBra Search Algorithm (CoBraS)

Finding the right column seems to be a non-trivial task, since this is what true intelligence could potentially emerge from. Since looking for the right column is the same as finding or generating the right column, it all comes down to search algorithms after all. Of course, this is what one should have expected from the beginning. The point I am making here is instead the following: how could we invent a new search/optimization algorithm that would work best within the CoBra project? For this reason, I will lay out several factors that, I think, would play an important role in such a CoBraS algorithm:

  • Many feedback signals
  • Feedback signals for as many individual parts as possible
  • A strategy to update the system based on the feedback
  • An update strategy that also respects and preserves all the previous updates as much as possible

Let's look at the first bullet point. Why do we need many feedback signals? In fact, why do we need feedback at all? The answer is that without any feedback there is no action that can rationally be taken to improve on something. Any optimization algorithm needs some sort of feedback. The more feedback, the more room to improve and optimize; no feedback means no optimization. Without any feedback we wouldn't know whether something went wrong or not, which is the basis for improving and optimizing the decisions that led to a particular outcome.

So, we hopefully all agree on the importance of feedback, and especially of having lots of it. However, I think there is more to the story. If there were many feedback signals but they were intractably complex to understand and improve on, then they would be of no use to us. Imagine the following case: you make thousands of decisions in a game while being blindfolded, and you get a reward signal at the end. If you played 1000 games, you would obviously get 1000 reward signals, and this could count as a lot of feedback. However, it would still take far too long for you to understand what you did wrong or right with your decisions in order to play the game optimally next time. This is because the rewards were very sparse; in other words, you didn't have any feedback for individual decisions or even for smaller groups of decisions. So, we don't just want more feedback, we also want detailed feedback. Maybe this is too much to ask from mother nature, but we will see how this could be made possible in the following paragraph.

Lots of feedback without any update mechanism is also useless. That is why we need an update mechanism that modifies the system based on those received feedback signals. As mentioned in the previous paragraph, it might be too much to ask the mother environment to provide very detailed feedback, so we should have some mechanism that deals with this issue. The mechanism I have in mind is similar to the following. Suppose that an organization full of human workers gets a negative reward for doing "something" wrong. It will then start to analyze the process and try to find what went wrong. When the issue is found, if it lies purely within the company itself, then the workers will try to fix it by themselves; and if part of the issue is known to come from some other company (or companies), then this one will signal them with a negative reward for what they have produced. Each partially guilty company repeats this process indefinitely until the problem is solved. Note how each company, upon receiving a negative reward, first analyzed the process to get down to the root(s) of the issue. This is to say that each company must be able to grasp the whole problem almost completely. Thus, while each company only tackles some part of the whole problem, they do have some general idea of "the big picture." In this case, different columns play the roles of different companies, and different nodes within a column are the different workers within a company. Columns can use reward information to inform their neighbors accordingly and optimize their own connections with them. However, nodes (i.e., computational processes) and their relationships have to be updated in some other way, since none of them would be able to "fully" understand the reward signal (in which case backprop could be used blindly, as in ANN models). So, we'll have to make a column understand problems in a somewhat big-picture format.
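
To make the company analogy a bit more tangible, here is a hedged sketch of such a blame-propagation mechanism. The `diagnose` and `fix` methods are hypothetical placeholders of my own; nothing above prescribes how a column would actually analyze a reward.

```python
# A hedged sketch of the company-style credit propagation described above.
# `diagnose` and `fix` are hypothetical: diagnose splits a negative reward
# into an internal share and shares attributable to upstream columns.

def propagate_blame(column, reward, threshold=0.05):
    """Recursively push negative reward to the columns responsible for it."""
    if reward >= 0:
        return                                    # nothing went wrong here
    internal, external = column.diagnose(reward)  # hypothetical analysis step
    column.fix(internal)                          # adjust this column's own connections
    for neighbor, share in external.items():      # blame upstream producers
        if abs(share) > threshold:
            propagate_blame(neighbor, share)      # they repeat the same process
```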

We have been able to fuse the first three points into a single search algorithm. If it lacks the fourth one, then unfortunately the improvements on the candidate columns will not be real, lasting improvements. It means that even if the algorithm worked almost perfectly for a single reward signal, or even a bunch of them, it could go off the rails very quickly when new rewards arrive. The reason is that our optimization algorithm didn't even try to respect the previous feedback and the modifications made to the system based on it. That's to say, if we try to find the right column for the moment, then we will get it, but only for a moment. To prevent this from happening, the algorithm must try its best to preserve all the previous updates while adding a new one. It could possibly merge the current update with the previous ones in a somewhat lossless way.

All the points made above sound good, but it is harder to actually do this than to just say it. Perhaps it would be better to first try out different search algorithms that already exist and are widely used, such as genetic algorithms, differential evolution, (stochastic) gradient descent, and so on. If none of them worked to a satisfactory degree, then one would need to think more deeply about what has been written in this section of the blog post.

Implementation

Evolutionary approach

Genetic programming is obviously very convenient for our situation. Without wasting much time, let's say we already have an initial population of columns. There are several challenges in this approach:

  • How do we evaluate fitness of a column?
  • How do we mutate a column?
  • How do we crossover multiple columns? 
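
Whatever answers we choose, the surrounding loop is standard genetic programming. Here is a minimal sketch in Python, with the three challenges above left as stubs of my own:

```python
import random

def fitness(col):
    return 0.0   # challenge 1 stub: problem-solving + base-column simplicity

def mutate(col):
    return col   # challenge 2 stub: classic / extension / node mutation

def crossover(a, b):
    return a     # challenge 3 stub: classic / intra-column / node crossover

def evolve(population, generations=100, elite=2, p_mutate=0.3):
    """Plain generational GP over a population of columns (assumes >= 4 of them)."""
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        next_gen = scored[:elite]                                # keep the best as-is
        while len(next_gen) < len(population):
            a, b = random.sample(scored[: len(scored) // 2], 2)  # fitter half breeds
            child = crossover(a, b)
            if random.random() < p_mutate:
                child = mutate(child)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```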

Fitness evaluation. Evaluating a column in its simplest form could be done based on (1) its ability to solve random problems, and (2) its simplicity in terms of the size of its base column. We could combine (1) and (2) in some way to get something from both of them. It is worth mentioning that (2) is not about the size of a column that is an extension of another one; it is instead the size of the base column from which the evaluated column might have been extended multiple times. That is to say, we are allowed to scale a column up and down without decreasing its simplicity measure, and the only part of the evaluation function this may affect is the problem-solving performance.
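
As an illustration, one way to fill in the fitness stub above might be a weighted sum of the two criteria; `solves` and `base_size` are hypothetical helpers (nothing here specifies how a column executes a problem or how its base column is recovered):

```python
def fitness(column, problems, alpha=0.5):
    """Hypothetical combination of (1) problem solving and (2) base simplicity."""
    solved = sum(column.solves(p) for p in problems) / len(problems)  # criterion (1)
    simplicity = 1.0 / base_size(column)  # criterion (2): size of the base column,
                                          # so extensions are not penalized
    return alpha * solved + (1 - alpha) * simplicity
```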

Column mutation. Here are three different mutation strategies:

  • Use classic mutation operation on the column (i.e., random changes in the structure)
  • Randomly extend the column (i.e., random yet systematic changes in the structure, also known as extension mutation)
  • Mutate nodes by modifying their functionalities randomly (i.e., node mutation)
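
Building on the Column sketch from the formalization section, the three strategies could look roughly like this; the node-mutation branch is a no-op here only because that minimal sketch gives nodes no functionalities to modify:

```python
import random

def mutate(col):
    """Apply one of the three mutation strategies to a column (sketch)."""
    kind = random.choice(["classic", "extension", "node"])
    new = Column(set(col.nodes), set(col.edges))
    if kind == "classic":
        # classic mutation: rewire one randomly chosen edge
        old = random.choice(sorted(new.edges, key=str))
        new.edges.discard(old)
        new.edges.add((random.choice(sorted(new.nodes)), old[1]))
    elif kind == "extension":
        # extension mutation: randomly extend the column from one of its anchors
        anchors = list(new.anchors())
        if anchors:
            new = new.extend(random.choice(anchors))
    else:
        # node mutation would re-sample a node's functionality; the minimal
        # Column sketch carries none, so this branch is left empty
        pass
    return new
```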

Column crossover. Here are three different crossover strategies:

  • Use classic crossover operation on multiple columns (i.e., random exchange of sub-structures of multiple columns)
  • Crossover (nodes) intra-column (i.e., random exchange between nodes within a single column)
  • Crossover (functionalities) intra-node (i.e., random exchange between functionalities of the same node, also known as node crossover)
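
And a sketch of the intra-column variant, again using the minimal Column sketch and again a simplification of mine: two nodes within a single column exchange their wiring.

```python
import random

def intra_column_crossover(col):
    """Swap the wiring of two randomly chosen nodes within one column (sketch)."""
    n1, n2 = random.sample(sorted(col.nodes), 2)
    swap = {n1: n2, n2: n1}
    edges = {(swap.get(u, u), swap.get(v, v)) for u, v in col.edges}
    return Column(set(col.nodes), edges)
```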

Combination of different strategies. How did evolution do it? Did it start with a finite set of all possible computations and use them as the building blocks of all the columns in a population? Or did it start from scratch, where it had to "invent" or "discover" useful computations as building blocks for the columns along the way? In other words, did evolution use 1-way optimization, where the building blocks (basic computational processes) were known and the optimization was done at the level of columns, or did it use 2-way optimization, where it had to optimize both the building blocks and the columns simultaneously? Yet another interesting question is the following: Is it possible to optimize 1-way at 2 levels of complexity that probably have inter-relations with each other? If not, then we are essentially obliged to perform mutations and crossovers for both nodes and columns. This would also mean that one does not simply emerge without the other -- that is, intelligence is a complex phenomenon besides being a complicated one.

Table 1. Ways of combining evaluation, mutation, and crossover strategies. All five strategies use the same evaluation: problem-solving performance + column simplicity.

Strategy 1: Node mutation; node crossover.
Interpretation: finding computational processes for a given column (could act like step 1). In this view, intelligence is a complicated phenomenon.
Implicit assumptions: the given column structure is the right one; extending the found column will always result in better problem-solving performance; finding computational processes is now an independently solvable problem.

Strategy 2: Classic mutation; classic crossover.
Interpretation: finding base columns that are built upon known computational processes (could act like step 2). In this view, intelligence is a complicated phenomenon.
Implicit assumptions: the given computational processes are the right ones; extending the found column will always result in better problem-solving performance; finding the right column is now an independently solvable problem.

Strategy 3: Classic mutation + extension mutation; classic crossover + intra-column crossover.
Interpretation: finding columns with known computational processes that work when extended (could act like step 2). In this view, intelligence is a very complicated phenomenon.
Implicit assumptions: the given computational processes are the right ones; finding the right column is now an independently solvable problem.

Strategy 4: Classic mutation + node mutation; classic crossover + node crossover.
Interpretation: finding computational processes and base columns that can successfully utilize them. In this view, intelligence is a complex phenomenon.
Implicit assumptions: extending the found column will always result in better problem-solving performance.

Strategy 5: Classic mutation + extension mutation + node mutation; classic crossover + intra-column crossover + node crossover.
Interpretation: finding computational processes and columns that can successfully utilize them when extended. In this view, intelligence is both a complex and a complicated phenomenon.
Implicit assumptions: none.

Let's talk about the table shown above. It essentially shows different ways of combining the mutation, crossover, and evaluation strategies, and it briefly describes their interpretations. It's up to us to choose a strategy, and those interpretations could actually help us decide which one we would like to try first. Of course, they are my interpretations and therefore may not be accepted by everyone. One way to figure out whether they are reasonably correct or completely wrong would be to test each strategy one by one and then look at the results.

The first case would make intelligence appear as a complicated phenomenon, because we just look for the computational processes that fit well within the given column. The second case would also make it seem complicated, since finding a better column alone for the given computational processes is probably as hard as finding the computational processes alone for the given column (as we are trying to do in the first case). However, we would not need to apply these strategies simultaneously, which means that they belong to two separable and independent phases of the GP (i.e., first step 1 and then step 2). That's why the phenomenon of intelligence is viewed as complicated and not complex in these scenarios. The third case is about finding columns with known computational processes, as in the second case, but here we also take the column extensions into account. Basically, it means that we are not certain that what works with a base column will also work with its extensions, and therefore we suspect that extending columns could make the problem-solving performance drop. This suspicion makes us view intelligence as a very complicated phenomenon, because now there are infinitely many more column extensions to be considered. In the fourth case, intelligence is viewed as a complex phenomenon, because both step 1 and step 2 are combined and executed simultaneously. In the fifth case, we combine the third and the fourth strategies, which makes intelligence appear as both a complex and a complicated phenomenon.

Artificial Neural Network approach

Another way of implementing columns would be by using artificial neural networks (ANNs). They do have an architecture, which is usually represented as a graph, but you might say that it is fixed and cannot change once the network is trained. This is obviously true, but let's try something different with these fancy-looking fixed-architecture-for-life ANNs. Suppose that you have an ANN model with a certain architecture that is also a column by the definition. You could first extend the model architecture several times before training it with the final fixed architecture as usual. However, this final model would have the same set of weights in multiple places due to the extension process. In addition, some layers in the middle of the network, besides the final output layer, would try to predict the final output too. This is interesting because there are now two differences between such an ANN model and the commonly used ones:

  1. The same set of weights are used repeatedly in multiple places
  2. Some hidden layers in the extended network also try to reproduce the final desired output
We could use the second observation to run a separate backpropagation on different parts of the network, since we know that they literally predict the final output along with the final layer. Another interesting thing is that the final layer would get inputs from previous layers, some of which already have outputs of their own, however immature they might be. Maybe this process could also be interpreted as a refinement of the final predictions, but I am not sure whether such a model would produce any good results.
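
Here is a minimal PyTorch sketch of my reading of this idea: one block of weights applied repeatedly (the extension copies share weights), with a readout head attached after every application, so intermediate layers also predict the final output. The architecture and dimensions are illustrative assumptions of mine:

```python
import torch
import torch.nn as nn

class ExtendedColumnNet(nn.Module):
    """An 'extended column' as an ANN: one shared block applied n times,
    with a shared readout head predicting the final output at every step."""
    def __init__(self, dim, out_dim, n_extensions):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())  # shared weights
        self.head = nn.Linear(dim, out_dim)                         # shared readout
        self.n = n_extensions

    def forward(self, x):
        preds = []
        for _ in range(self.n):          # the same weights reused in every copy
            x = self.block(x)
            preds.append(self.head(x))   # every copy also predicts the output
        return preds                     # the last entry is the final prediction

# Training could sum a loss over all intermediate predictions, e.g.:
#   preds = model(inputs)
#   loss = sum(criterion(p, target) for p in preds)
```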

Conclusion

The CoBra project has been built upon an idea that I came up with while thinking about building a kind of platform or system that may solve different kinds of problems while relying on the same computational processes acting as building blocks. This would essentially require translating the given problem into the system's computational language and then assigning appropriate tasks to different parts of the system accordingly. Then the system would do what it does best, and by doing so it would actually, unintentionally, solve the problem that we care about. I call this approach the TAP-CAP strategy for achieving general intelligence. Such general intelligence is essentially adaptive and convertible from one form to another to solve different kinds of problems that it was initially agnostic about. Not only is such a system agnostic about what exists outside it, it may very well be agnostic about its own nature and computational capacity. Such an agnostic system would hopefully dodge the bullet of premature convergence on deciding that a given problem is unsolvable. Instead, a truly intelligent system would try its best to mimic the problem as much as possible so that it could get an idea of what is going on and finally manage to solve it.

The ideas mentioned in the paragraph above led to the invention of small and simple theoretical notions such as the anchor, the column, and column extension. These formal notions essentially help to automate the process of building a truly intelligent system by finding or generating special columns. For that, I described the genetic programming approach and discussed various operational strategies from the evolutionary point of view, along with my interpretations of them. Although I have not experimentally tested all the strategies yet, I thought it would be cool to write down the ideas first, and maybe I could share the experiments in the future as well. Ideas come before the execution.

Besides automation, I decided to make this process of seeking general intelligence public, for the fun part. So, I made a website where people create and iteratively modify their "creatures" and see how well they do at each step in terms of intelligence. To see more about the fun part, you could visit the CoBra (Collective Brain) website.
