Feb 06 2021 03:43 AM
The brain in Bonsai doesn't test performance after 60,000+ iterations.
For teaching, I use an unmanaged local AnyLogic simulator.
Feb 06 2021 07:43 AM
Hi @MichalFEIT2020 ,
Are you saying that you are training a policy, and it does not appear to be learning after 60,000+ iterations? In other words, the average reward per episode is "flat" and doesn't appear to be improving?
There are a number of things that could explain this.
First, I'll point out that most problems require many more iterations than 60K — typically one or two orders of magnitude more — to fully train the policy. However, I'd normally expect to see some visible progress in the first 60K iterations. If you're not seeing that, it probably means there's a problem with your simulation model or the way you've specified the problem in inkling.
Looking at your inkling below, it appears that you're currently using a reward function, as opposed to a goal. You have also commented out your terminal function, which means that every episode is going to run for 1000 iterations (the value you've defined for `EpisodeIterationLimit`). That means you've run only 60 episodes so far, which is a _very_ small number.
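As a quick check of that arithmetic, here is a minimal Python sketch. The function name `episodes_completed` is made up for illustration, and it assumes every episode runs all the way to the iteration limit because no terminal condition ends it early.

```python
def episodes_completed(total_iterations: int, episode_iteration_limit: int) -> int:
    """Full episodes finished when every episode runs to the iteration limit."""
    if episode_iteration_limit <= 0:
        raise ValueError("episode_iteration_limit must be positive")
    # Integer division: a partially finished episode does not count.
    return total_iterations // episode_iteration_limit


# 60,000 iterations with EpisodeIterationLimit = 1000
print(episodes_completed(60_000, 1_000))  # 60
```

With a terminal condition that ends poor episodes early, the same iteration budget would cover more episodes.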
Your reward function is based only on the `meanTime` state parameter. I don't know how this is calculated in your simulation, what it represents, or how its value depends on the actions `p1` through `p4`. If you could explain more, I might be able to offer more suggestions. Based on the symptoms you've described, I'm guessing that `meanTime` is not a function of `p1` through `p4` or is stochastic.
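One way to test that guess outside of training is to run the simulation with a few fixed actions and compare how much `meanTime` changes between actions with how much it varies between repeated runs of the same action. The sketch below is only an illustration: `toy_intersection` is an invented stand-in for the real simulator, the relationship inside it is made up, and `sensitivity_report` is a hypothetical helper, not part of Bonsai or AnyLogic.

```python
import random
import statistics
from typing import Callable, Dict, Sequence

Action = Sequence[float]


def toy_intersection(action: Action, seed: int) -> float:
    """Hypothetical stand-in for the real simulator; returns a meanTime value."""
    rng = random.Random(seed)
    p1, p2, p3, p4 = action
    # Invented relationship: unbalanced green times increase the waiting time.
    imbalance = abs((p1 + p3) - (p2 + p4))
    return 30.0 + 0.5 * imbalance + rng.gauss(0.0, 1.0)


def sensitivity_report(
    simulate: Callable[[Action, int], float],
    actions: Sequence[Action],
    seeds: Sequence[int],
) -> Dict[str, float]:
    """Compare the spread of meanTime across actions with run-to-run noise."""
    means = []
    noise = []
    for action in actions:
        samples = [simulate(action, seed) for seed in seeds]
        means.append(statistics.mean(samples))
        noise.append(statistics.stdev(samples))
    return {
        "signal": statistics.stdev(means),  # variation caused by the action
        "noise": statistics.mean(noise),    # variation within one action
    }


actions = [(10, 10, 10, 10), (40, 10, 40, 10), (10, 40, 10, 40), (60, 10, 60, 10)]
seeds = list(range(20))
report = sensitivity_report(toy_intersection, actions, seeds)
print(report)
```

If "signal" is close to zero, or much smaller than "noise", the reward carries little information about `p1` through `p4`, and the policy has nothing to learn from.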
Feb 06 2021 11:41 AM
@erictr The simulation models a road intersection. `meanTime` is the time cars spend at the intersection, and `p1` through `p4` are the parameters of the green lights at that intersection. The picture shows my learning progress, which is none...
When I try to use a simulator package, it fails with this error:
Unable to start test
Resource type 'Microsoft.ContainerInstance/containerGroups' container group quota 'StandardCores' exceeded in region 'westus2'. Limit: '10', Usage: '34' Requested: '1'.
(default_30 / v01 / Concept1 / Train / e8ccff1e-debb-4eab-bcd3-00921ce4e6e3)
I have already sent a request to raise the quotas.
Feb 06 2021 02:32 PM
@MichalFEIT2020 Thanks for the additional details. I think what's happening is that you haven't trained enough to get through even the first batch, so no training has yet occurred. The RL training algorithms in the Bonsai platform gather batches of state/action/reward information and use those batches when training. 1K episodes is probably not enough to make a single batch.
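To illustrate why the reward curve can stay flat until a batch fills, here is a toy Python sketch of batched learning. It is not Bonsai's implementation: the `BatchedLearner` class is hypothetical, and the batch size of 100,000 samples is an invented number chosen only to show the effect.

```python
from dataclasses import dataclass, field


@dataclass
class BatchedLearner:
    """Toy learner that only updates once a full batch has been gathered."""
    batch_size: int  # hypothetical value, not Bonsai's real setting
    buffer: list = field(default_factory=list)
    updates: int = 0

    def record(self, state, action, reward) -> None:
        self.buffer.append((state, action, reward))
        if len(self.buffer) >= self.batch_size:
            # A real learner would run a policy update on the batch here.
            self.updates += 1
            self.buffer.clear()


learner = BatchedLearner(batch_size=100_000)
for _ in range(60_000):
    learner.record(state=None, action=None, reward=0.0)

print(learner.updates)  # 0: the first batch never filled, so nothing was learned
```

Under this assumption, the policy after 60,000 iterations is still the initial one, which matches a completely flat learning curve.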
Training will go much faster if you're able to run multiple simulation instances in parallel. Once your quota limits have been increased, you should be able to make more rapid progress.
Let us know if you have any other questions or run into other problems. We're here to help.