Goal: avoid objective for the end of the simulation run

Hello,

While training a brain, I keep getting warnings about some states. The warnings suggest that, if they persist, I should consider avoiding the states that cause them. What these states have in common is that they occur at the end of my simulation run. Accordingly, I added an Avoid objective that should avoid the end of the simulation. After I restarted the training, the warnings disappeared, but the avoid objective's success rate is always zero. The goal satisfaction is around 80%.

Assuming one run of my simulation takes 720 minutes, the objective is:

avoid endSimulation: state.currentTime in Goal.RangeAbove(720)

where currentTime is the model time at the current time step.

What could be causing the avoid objective to never succeed?

Hello @Duoaa. An avoid objective teaches a brain to prevent something from happening. It tells the brain to avoid certain state conditions. If the conditions occur, the episode will end and be considered a failure. I don't think that's what you want in this case.
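To see why the success rate stays at zero, here is a minimal Python sketch that mimics the avoid semantics described above: an episode fails as soon as the avoided condition holds. It assumes (hypothetically) that Goal.RangeAbove(720) includes 720 itself and that every run advances currentTime to 720 before it ends; avoid_succeeded is an illustrative helper, not part of Bonsai.

```python
def avoid_succeeded(times, threshold=720.0):
    """Return True if no state in the episode falls in the avoided range.

    Assumes the avoided range is inclusive: currentTime >= threshold.
    """
    for t in times:
        if t >= threshold:
            return False  # avoided condition reached: episode ends as a failure
    return True


# Hypothetical episode: one decision every 5 model minutes until the clock hits 720.
episode_times = [float(t) for t in range(0, 725, 5)]

print(avoid_succeeded(episode_times))  # False
```

Because every run eventually reaches the end of the simulation, the avoided condition is met in every episode, so the objective can never succeed. This is likely also why the warnings disappeared: the objective ends the episode at that final state before the simulator halts.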

If you need to limit the length of an episode, then you can use EpisodeIterationLimit in the training parameters as described here: https://docs.microsoft.com/en-us/bonsai/inkling/keywords/curriculum#curriculum-training-parameters.

Alternatively, you may want to investigate the warnings about the states. One way this can occur is if the numeric values of the simulated states are outside the ranges declared in the Inkling file's state structure. For example, a variable might be declared as "position: number<-10 .. 10>" but be assigned the value 11. To solve this, you can expand the range or, for state values, declare it as "position: number" so that there is no limit on its values.
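A quick way to check for this is to compare each outgoing state against the declared ranges before the simulator sends it. The sketch below is hypothetical: DECLARED_RANGES is hand-copied from the Inkling state declaration (here the "position: number<-10 .. 10>" example), and out_of_range is an illustrative helper you would call from your own simulator code.

```python
# Hand-copied from the Inkling state declaration (hypothetical example).
DECLARED_RANGES = {
    "position": (-10.0, 10.0),
}


def out_of_range(state, ranges=DECLARED_RANGES):
    """Return the names of variables whose values fall outside their declared range."""
    bad = []
    for name, (low, high) in ranges.items():
        value = state.get(name)
        if value is not None and not (low <= value <= high):
            bad.append(name)
    return bad


print(out_of_range({"position": 11}))   # ['position']
print(out_of_range({"position": 3.5}))  # []
```

Logging any non-empty result during a few episodes shows which variable and value trip the warning.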

Hello @Forrest_Trepte, thank you for your response.

I have double-checked that the numeric values generated are within the specified ranges. Based on my simulation, I have also reduced the EpisodeIterationLimit to the maximum number of iterations possible per episode. For example, if my simulation runs for 720 minutes and I ask the brain for an action every 5 minutes whenever my agent is idle, the maximum number of iterations would be 720/5 = 144. My agent is not always idle every 5 minutes, so I cannot really anticipate the number of iterations for every simulation run, and thus the warning persists. Would it be safe to ignore the warnings?
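The upper bound above is one iteration per decision interval. A minimal sketch, assuming the brain is asked for an action at most once every 5 model minutes; max_iterations is a hypothetical helper, and ceiling division covers run lengths that are not an exact multiple of the interval.

```python
import math


def max_iterations(run_minutes, decision_interval_minutes):
    """Upper bound on brain iterations per episode if the agent asks for an
    action at most once per decision interval."""
    return math.ceil(run_minutes / decision_interval_minutes)


print(max_iterations(720, 5))  # 144
```

Because the agent is not idle at every interval, a real episode can use fewer than 144 iterations, so the simulation clock can reach 720 before an EpisodeIterationLimit of 144 is hit, which is consistent with the warning persisting.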

This is an example of the warning:

The simulator with id ##### halted execution before the episode was complete.
This indicates a bug in your simulator or Inkling code. It can result in poor training performance if it occurs frequently.
If the system should learn to avoid this state, add an avoid objective or terminal condition for this case.
Episode id: #####
State: { currentTime: 720.0, ... }
Action: { actionToTake: 6.0 }

The "simulator halted execution before the episode was complete" warning indicates that your simulator has set "halted=true" to indicate that it can no longer continue simulating. If you developed a custom simulator in Python, you set the halted parameter in the SimulatorState that you pass to client.session.advance. If you are using a simulator software package such as AnyLogic, Simulink, VP Link, etc., each has its own way of setting the halted state.

When halted is set, the current episode is discarded and the brain will not learn from that episode. If this halted state happens rarely, or only in situations that cannot really occur in real life, then you can ignore the warning. However, the brain not learning from these episodes could be a problem. If halting happens more frequently, your training could slow down because of the discarded episodes, or the remaining data the brain learns from could be biased. If halting arises in a certain kind of situation and the brain needs to learn how to handle that situation, then ignoring the warning will be a problem.

It sounds like part of the issue is that the ideal number of iterations in an episode varies. ("I can not really anticipate the number of iterations for every simulation run.")

Ideally, I think you'd end each of your episodes without halting. Since the length varies, you can't rely on EpisodeIterationLimit. The criteria for ending episodes via goals are listed at https://docs.microsoft.com/en-us/bonsai/inkling/keywords/goal?tabs=avoid#early-episode-termination. I think you'd need to define success/failure goals in a way that terminates the episode and also makes sense for what the brain is trying to learn. Alternatively, could you modify your simulation model so that it doesn't set "halted=true"? Could it quietly continue, perhaps not actually simulating but just returning the final state, until the episode terminates due to a maximum length?
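Here is a minimal Python sketch of that second suggestion for a custom Python simulator. EndClampedSim and its step/state/halted methods are hypothetical stand-ins for your own model. Once the model clock reaches the end of the run, step stops advancing and keeps returning the final state with halted left False, so the episode can end through EpisodeIterationLimit (or a goal) instead of a halt. In a real integration, these are the values you would put into the SimulatorState passed to client.session.advance.

```python
class EndClampedSim:
    """Toy simulator whose clock stops at the end of the run instead of halting."""

    def __init__(self, run_minutes=720.0, minutes_per_step=5.0):
        self.run_minutes = run_minutes
        self.minutes_per_step = minutes_per_step
        self.current_time = 0.0

    def step(self, action):
        # The toy model ignores the action. It only advances while time remains;
        # after the end of the run it does nothing and keeps the final state.
        if self.current_time < self.run_minutes:
            self.current_time = min(self.current_time + self.minutes_per_step,
                                    self.run_minutes)
        return self.state()

    def state(self):
        return {"currentTime": self.current_time}

    def halted(self):
        # Reaching the end of the run is not an error, so never halt for it.
        return False


sim = EndClampedSim()
for _ in range(200):  # more steps than the 144 the run needs
    s = sim.step(action=6)
print(s, sim.halted())  # {'currentTime': 720.0} False
```

One trade-off: the extra steps at the final state still count as iterations, so the objectives or reward should not be distorted by the brain "waiting" at the end.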

Do other folks on the forum have better ideas for how to handle variable-length episodes?

We're actively exploring some new functionality that will improve Bonsai's ability to handle variable-length episodes, but that feature won't be ready for some time. In the meantime, the approach @Forrest_Trepte described above is sound for most problems. If that doesn't work for your problem, you could write a reward function and a terminal function rather than using goals. The terminal function would return true when you want your episode to end, and the reward function would indicate whether the state is a desired terminal state (higher reward), an undesired terminal state (lower reward), or a neutral condition (middle reward).
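In Inkling these would be functions referenced from the curriculum; the Python sketch below only mirrors the shape of the logic described, under hypothetical assumptions. It ends the episode at the end of the run (currentTime >= 720) or on a hypothetical failure flag "failed", treats finishing the run without failing as the desired terminal state, and uses placeholder reward values (1.0 / -1.0 / 0.0) that you would tune for your problem. The names terminal and reward are illustrative.

```python
END_OF_RUN = 720.0  # model minutes in one simulation run


def terminal(state):
    """End the episode at the end of the run or on a (hypothetical) failure flag."""
    return state["currentTime"] >= END_OF_RUN or state.get("failed", False)


def reward(state):
    """High reward for a desired terminal state, low for an undesired one,
    and a middle value for ordinary (non-terminal) steps."""
    if state.get("failed", False):
        return -1.0  # undesired terminal state
    if state["currentTime"] >= END_OF_RUN:
        return 1.0  # desired terminal state: finished the run without failing
    return 0.0  # neutral condition


print(terminal({"currentTime": 720.0}), reward({"currentTime": 720.0}))  # True 1.0
```

Because the episode ends on the simulation clock rather than a fixed iteration count, this handles runs of varying length without the simulator needing to halt.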