We're building a new kind of AI that learns more like people.

In the demo below, our prototype uses trial & error to learn a robotics task from only 3 examples.

The prototype agent lives inside a simulation.

  1. The simulation works like virtual reality.

  2. The agent has some goals.

  3. The agent receives sensory data from the simulation.

    Sight, sound, etc.

  4. The agent chooses actions to learn about the environment and try to reach the goal.
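
One way to picture this loop in code is sketched below. All of the names here (Simulation, Agent, observe, step, choose_action) are illustrative assumptions, not our prototype's actual interfaces.

```python
# Illustrative sketch of the sensorimotor loop described above.
# Simulation, Agent, and their method names are assumptions made for
# this sketch, not the prototype's actual interfaces.

class Simulation:
    def observe(self):
        """Return the agent's current sensory data (sight, sound, etc.)."""
        raise NotImplementedError

    def step(self, action):
        """Apply the chosen action and advance the simulation one timestep."""
        raise NotImplementedError

class Agent:
    def choose_action(self, observation, goal):
        """Pick an action meant to learn about the environment
        and make progress toward the goal."""
        raise NotImplementedError

def run(sim, agent, goal, timesteps):
    for _ in range(timesteps):
        observation = sim.observe()               # sensory data in
        action = agent.choose_action(observation, goal)
        sim.step(action)                          # action out, world updates
```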

We simplified the simulation to focus on testing a few core ideas.

  1. The agent's goal is for the third pixel to be white.

    [Initial goal: pixel 1 = N/A, pixel 2 = N/A, pixel 3 = white]

  2. At each timestep, it sees three pixels and their colors...

    [t0, what it sees: pixel 1 red, pixel 2 black, pixel 3 grey]

  3. Then it chooses button A or button B.

    [t0, what it does: button A pressed, button B not pressed]

  4. Then, at the next timestep, the agent receives updated sensory input...

    [t1, what it sees: pixel 1 red, pixel 2 black, pixel 3 grey]

  5. And again it chooses button A or button B.

    [t1, what it does: button A not pressed, button B pressed]

  6. It continues to interact with the environment, trying to figure out how it works and how to reach its goal.

[Looping animation: at each timestep the agent sees the three pixel colors, presses button A or B, and keeps working toward its initial goal of making pixel 3 white.]
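
To make the setup concrete, here is a minimal environment sketch whose rules are consistent with the recordings in the demo below. The simulation's actual dynamics aren't spelled out in this post, so treat these rules, and the ThreePixelEnv name, as assumptions.

```python
# A minimal environment sketch consistent with the recordings shown in
# the demo below. The simulation's actual rules aren't spelled out here,
# so the dynamics below are assumptions, not the real thing.

class ThreePixelEnv:
    def __init__(self):
        self.pixels = ["red", "black", "grey"]   # pixels 1, 2, 3 at t0

    def observe(self):
        return list(self.pixels)

    def step(self, button):
        """Press button 'A' or 'B' and advance one timestep."""
        self.pixels[2] = "grey"                  # assumed: white lasts one timestep
        if button == "B":
            self.pixels[0] = "green"             # assumed: B primes pixel 1
        elif button == "A" and self.pixels[0] == "green":
            self.pixels[0] = "red"
            self.pixels[2] = "white"             # assumed: A fires once primed
        return self.observe()

    def goal_reached(self):
        return self.pixels[2] == "white"         # goal: pixel 3 is white

# Fastest possible run under these assumed rules: two timesteps.
env = ThreePixelEnv()
env.step("B")                # pixel 1 turns green
env.step("A")                # pixel 3 turns white
assert env.goal_reached()
```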

Demo of our prototype

  1. These are two recordings of our prototype agent interacting with the simulation.

    One recording is the "training" episode; the other is the "testing" episode.

  2. This is the agent "training" on the task: learning how to reach the goal for the first time.

  3. For this simple task, it completes its learning after 8 timesteps.

  4. This is the agent "testing" on the task: reaching the goal as quickly as possible after the training episode.

  5. It reaches the goal in 2 timesteps, the fastest possible for this task.

Training / EPISODE 001

  t    What it sees (pixels 1, 2, 3)    What it does
  t0   red, black, grey                 A pressed, B not pressed
  t1   red, black, grey                 A not pressed, B pressed
  t2   green, black, grey               A pressed, B not pressed
  t3   red, black, white                A pressed, B not pressed
  t4   red, black, grey                 A not pressed, B pressed
  t5   green, black, grey               A pressed, B not pressed
  t6   red, black, white                A not pressed, B pressed
  t7   green, black, grey               A pressed, B not pressed
  t8   red, black, white                (end of episode)

Testing / EPISODE 002

  t    What it sees (pixels 1, 2, 3)    What it does
  t0   red, black, grey                 A not pressed, B pressed
  t1   green, black, grey               A pressed, B not pressed
  t2   red, black, white                🎉 goal reached

Toy problem. 🧸
Principled solution. 📐

The agent's core principles were developed independently of the task, and they are the foundation for solving more complex tasks.

What makes this result interesting is the explanation for how it works...

Training / EPISODE 001

[Annotated replay of the training episode: the agent is initialized from scratch (🐣, zero past training or experience), and at each timestep from t0 to t8 it receives the sensory input shown in Episode 001 above, reconciles its theories, and chooses which button to press.]

How learning works

  1. The agent has zero knowledge or experience before training.

  2. Agent's first ever "data".

    The sensory input at t0 is the first "data" ever seen by the agent.

  3. Before making a decision about what to do, the agent exhibits some internal behavior.

  4. Two steps:
    Reconciliation & Planning

  5. Theory Reconciliation

    First, the agent looks for any new problems in its theories and attempts to reconcile them.

  6. No theories? No problems.

    In the first timestep, the agent has no theories yet, so there are no problems to reconcile.

  7. Next, it makes a plan to reach its goal.

    We choose the goal for now, and we can modify the goal at any time.

  8. Planning may require some conjecture.

    It chooses a strategy for reaching its goal. If it doesn't have any strategies, it needs to invent one.

  9. The agent's next action is determined by its selected plan.

  10. At the next timestep, the agent receives new sensory input.

  11. The agent reconciles its first problem.

    This time, the agent's sensory input reveals a problem with its theories. It reconciles the problem by rejecting one or more theories.

  12. Then it revisits its plan.

    This time, it can rule out more theories because it has more sensory experience.

  13. Skipping ahead: at t3, it reaches the goal for the first time.

    But it doesn't yet understand how it reached the goal...

  14. Reaching the goal surprises the agent, revealing a problem with its theories.

    It discards the problematic theories despite reaching the goal.

  15. It continues to iterate on its theories and plans at each timestep.

    Let's skip ahead to when it reaches the goal for the third time, at t8.

  16. At t8, the agent reaches the goal for the third time...

  17. But this time it isn't surprised by reaching the goal.

    Importantly, it didn't experience any problems on the way to the goal.

  18. After this point, the agent doesn't learn anything new.

    It knows how to reach its goal without encountering any problems.

Learning is fueled by problems.

A problem can come from an error in its theories or from not knowing how to reach its goal.

If the agent doesn't discover any new problems, then it has no "fuel" to continue learning.

So understanding is about having good-enough theories, not perfect ones.

When the agent's theories are good enough to reliably reach the goal without encountering problems, then learning is complete.
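
Read as code, the loop walked through above might look roughly like the sketch below. The Theory objects (with a contradicted_by test), conjecture_plan, and the plan's next_action are hypothetical stand-ins used only to make the structure concrete; they are not the prototype's internals.

```python
# An illustrative reading of the reconcile-and-plan loop described above.
# Theory, conjecture_plan, and the plan's next_action are hypothetical
# placeholders, not the prototype's actual internals.

def run_episode(env, theories, goal, max_timesteps):
    plan = None
    obs = env.observe()                        # the agent's first "data" at t0
    for _ in range(max_timesteps):
        # 1. Theory reconciliation: any theory contradicted by the new
        #    sensory input is a "problem"; reconcile by rejecting it.
        problems = [th for th in theories if th.contradicted_by(obs)]
        for th in problems:
            theories.remove(th)

        # 2. Planning: revisit the plan in light of the extra experience.
        #    If no viable strategy survives, conjecture a new one (which
        #    may mean inventing new theories about how the world works).
        plan = conjecture_plan(theories, obs, goal)

        # 3. Act according to the selected plan; the next timestep's
        #    sensory input may reveal new problems.
        obs = env.step(plan.next_action(obs))

        if goal(obs) and not problems:
            # The goal was reached with no surprises along the way:
            # no new problems means no more "fuel" for learning.
            break
    return theories, plan
```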

Testing / EPISODE 002

[Annotated replay of the testing episode: the agent keeps its existing knowledge from training (🐥), and at each timestep it reconciles its theories, plans, and presses a button, reaching the goal at t2. 🎉]

How testing works

  1. After training, we "test" the agent to see how quickly it can reach the goal.

    The agent retains all of its knowledge from training.

  2. The agent wakes up and "sees" these colors.

  3. It doesn't have any theories about its current situation yet...

  4. ...so there are no problems to reconcile.

  5. In order to make a plan, it needs to guess at the current situation.

  6. These conjectures are singular, not universal.

    It needs to conjecture theories about what's there rather than how it works.

  7. The agent's next action is determined by its selected plan.

  8. At the next timestep, the agent receives new sensory input.

  9. No problems so far.

    The agent got lucky and guessed its situation on the first try.

  10. The agent updates its plan based on its progress.

  11. It reaches the goal in 2 timesteps, the lowest possible score.

  12. And again, no problems were encountered.

    This indicates a threshold level of understanding of this situation, simulation, and task.
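
As a rough illustration of the score being measured here, the sketch below reuses the hypothetical ThreePixelEnv from earlier. The strategy is hard-coded purely for illustration; the real agent derives it from the theories it kept during training.

```python
# Sketch of the "testing" measurement: how many timesteps until the goal?
# Reuses the hypothetical ThreePixelEnv from the earlier sketch. The
# policy is hard-coded here for illustration only.

def learned_policy(observation):
    # Strategy consistent with the recordings: press B to turn pixel 1
    # green, then press A to make pixel 3 white.
    return "A" if observation[0] == "green" else "B"

def timesteps_to_goal(env, policy, limit=100):
    for t in range(1, limit + 1):
        observation = env.step(policy(env.observe()))
        if observation[2] == "white":          # goal: pixel 3 is white
            return t
    return None

print(timesteps_to_goal(ThreePixelEnv(), learned_policy))   # -> 2
```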

How it's useful

  1. We'll start by automating simple physical tasks.

    Tasks that require less abstraction and reasoning, e.g. assembly, welding, factory tasks.

  2. As we expand its capability, we'll automate complex physical or digital tasks.

    Tasks that require more abstraction and reasoning, e.g. language, math, decision-making, multi-step processes.

  3. Ultimately, we'll be able to automate engineering and science tasks.

    Tasks that require creating and testing new designs or new theories.

  4. In general, the agent will be able to learn anything where we can create a feedback loop for trial and error.

The work ahead

  1. Building AGI requires understanding how knowledge is created and improved, not just memorizing a lot of knowledge.

    We've been able to make meaningful progress in a relatively short time, and we want to reach that threshold of understanding as quickly as possible.

  2. This is only the beginning.

    There are many interesting problems ahead of us.

  3. Future work will build upon the principles behind our prototype.

    We'll systematically work to address increasingly complex tasks and environments.

  4. We're currently looking for investors who want to fund the next phase of R&D.

    Contact Collin Kindrom for more information, or with any other questions or ideas.
    collinkindrom@gmail.com