Brain doesn't improve performance after 60,000+ iterations


My brain in Bonsai doesn't improve performance after 60,000+ iterations of training.
For training I use an unmanaged local AnyLogic simulator.


Inkling:

inkling "2.0"
#using Goal

# Define a type that represents the per-iteration state
# returned by the simulator.
type SimState {
    #timeInModel: number,
    meanTime: number,
    #maxiTime: number,
    #pocet: number,          # pocet = "count" (Slovak)
    #priemernycas: number,   # priemernycas = "mean time" (Slovak)
}

# Define a type that represents the per-iteration action
# accepted by the simulator.
type SimAction {
    p1: number<5 .. 50 step 1>,
    p2: number<5 .. 50 step 1>,
    p3: number<5 .. 50 step 1>,
    p4: number<5 .. 50 step 1>,
}

#function Terminal(obs: SimState) {
#    if (obs.timeInModel >= 100000000) {
#        return true
#    }
#    return false
#}

function Reward(obs: SimState) {
    #return -obs.meanTime * 0.7 + (-obs.pocet * 0.3)
    return -obs.meanTime
}

simulator Simulator(action: SimAction): SimState {
    #package "test"
}

# Define a concept graph
graph (input: SimState): SimAction {
    concept Concept1(input): SimAction {
        curriculum {
            #goal (s: SimState) {
            #    minimize priemernycas: s.meanTime in Goal.RangeBelow(100)
            #}
            training {
                EpisodeIterationLimit: 1000,
                LessonAssessmentWindow: 10000,
            }
            # The source of training for this concept is a simulator
            # that takes an action as an input and outputs a state.
            source Simulator
            reward Reward
            #terminal Terminal
            # Add goals here describing what you want to teach the brain.
            # See the Inkling documentation for goals syntax.
        }
    }
}
3 Replies

Hi @MichalFEIT2020,


Are you saying that you are training a policy, and it does not appear to be learning after 60,000+ iterations? In other words, the average reward per episode is "flat" and doesn't appear to be improving?


There are a number of things that could explain this.


First, I'll point out that most problems require many more iterations than 60K (typically one or two orders of magnitude more) to fully train the policy. However, I'd normally expect to see some visible progress in the first 60K iterations. If you're not seeing that, it probably means there's a problem with your simulation model or the way you've specified the problem in Inkling.


Looking at your Inkling, it appears that you're currently using a reward function rather than a goal. You have also commented out your terminal function, which means that every episode runs for the full 1000 iterations (the value you've defined for `EpisodeIterationLimit`). At 60,000 total iterations, that means you've run only 60 episodes so far, which is a _very_ small number.
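For illustration, here is a minimal sketch of the goal-based alternative, assembled from the pieces already commented out in your own Inkling (the `RangeBelow(100)` target is your value; `using Goal` at the top of the file would need to be un-commented; treat this as a starting point, not a tested program). If you prefer to keep the reward approach instead, re-enabling your `Terminal` function would at least let episodes end early rather than always running the full 1000 iterations.

```
# Sketch only: re-enables the goal that is commented out in the
# original program. Requires `using Goal` at the top of the file.
graph (input: SimState): SimAction {
    concept Concept1(input): SimAction {
        curriculum {
            source Simulator
            # The goal replaces the hand-written Reward function and
            # gives the platform a success criterion it can assess:
            # drive the mean time below 100.
            goal (s: SimState) {
                minimize MeanTime: s.meanTime in Goal.RangeBelow(100)
            }
            training {
                EpisodeIterationLimit: 1000
            }
        }
    }
}
```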


Your reward function is based only on the `meanTime` state parameter. I don't know how this is calculated in your simulation, what it represents, or how its value depends on the actions `p1` through `p4`. If you could explain more, I might be able to offer more suggestions. Based on the symptoms you've described, I'm guessing that `meanTime` is not a function of `p1` through `p4` or is stochastic.
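In case it helps while you investigate: if you stay with a reward function, one common refinement is to reinstate your commented-out weighted reward and normalize each term so neither dominates. The bounds below (`MaxTime`, `MaxCount`) are placeholders I've invented for illustration; you'd substitute rough upper bounds you know from your simulation, and `pocet` would need to be un-commented in `SimState`.

```
# Hypothetical reward-shaping sketch, not a tested program.
# MaxTime and MaxCount are assumed upper bounds; tune them to your model.
const MaxTime = 1000    # assumed upper bound on meanTime
const MaxCount = 100    # assumed upper bound on pocet (queue count)

function Reward(obs: SimState) {
    # Weighted, normalized penalty: 70% mean waiting time, 30% queue size.
    return -0.7 * (obs.meanTime / MaxTime) - 0.3 * (obs.pocet / MaxCount)
}
```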


 -Eric

@erictr The simulation models a crossroad (road intersection). `meanTime` is the mean time cars spend at the intersection, and `p1` through `p4` are the green-light timing parameters for that intersection. The picture shows my learning progress, which is none so far.
When I try to use a simulator package, it fails with this error:
Unable to start test
Resource type 'Microsoft.ContainerInstance/containerGroups' container group quota 'StandardCores' exceeded in region 'westus2'. Limit: '10', Usage: '34' Requested: '1'.
(default_30 / v01 / Concept1 / Train / e8ccff1e-debb-4eab-bcd3-00921ce4e6e3)


I have already sent a request to raise the quotas.

@MichalFEIT2020 Thanks for the additional details. I think what's happening is that you haven't trained enough to get through even the first batch, so no training has yet occurred. The RL training algorithms in the Bonsai platform gather batches of state/action/reward information and use those batches when training. The 60 episodes (60K iterations) you've run so far are probably not enough to fill a single batch.


Training will go much faster if you're able to run multiple simulation instances in parallel. Once your quota limits have been increased, you should be able to make more rapid progress.
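For reference, running managed parallel instances only requires the `package` statement that is currently commented out in your simulator declaration (the package name "test" is taken from your own code, and the package must already be registered with your workspace):

```
simulator Simulator(action: SimAction): SimState {
    # With a registered simulator package, the platform can launch
    # multiple container instances in parallel, subject to the Azure
    # quota from the error message above.
    package "test"
}
```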


Let us know if you have any other questions or run into other problems. We're here to help.


 -Eric