SOLVED

How is the reward function calculated internally?

I am using an AnyLogic model... and I want to know how to define the reward function correctly. For this I need to understand the underlying technique, since there are two different approaches.
 
If I write this in Inkling:
reward = obs.something;
 
what does Bonsai do in the background? There are two options:

OPTION 1:
Bonsai takes the difference between the current reward and the previous one. That is, with reward = obs.something it actually computes the difference between the current obs.something and the previous one, and accumulates those differences internally.
 
OPTION 2:
Bonsai ignores any previous reward. With reward = obs.something it simply uses the current value of obs.something as the reward for that iteration and adds it to an internal running total.

OPTION 1 requires me to create accumulated rewards in AnyLogic; OPTION 2 requires me to create rewards that DO NOT accumulate in AnyLogic.

What is the right way to do it?
1 Reply
best response confirmed by felipeharo100 (Occasional Contributor)
Solution

The reward that you calculate for each iteration is assumed by the platform to be specific to that iteration, not a cumulative value. As the policy is trained, it will attempt to maximize the cumulative rewards for each episode. In other words, you should assume OPTION 2.
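
To make the difference concrete, here is a small Python sketch (not Bonsai or AnyLogic code) of what goes wrong if a running total is reported as the reward. The numbers and the helper names episode_return and to_step_rewards are hypothetical, and the sketch assumes a plain undiscounted sum over the episode; the platform's actual objective may differ in detail.

```python
from itertools import accumulate


def episode_return(step_rewards):
    """Sum the per-iteration rewards of one episode (undiscounted, by assumption)."""
    return sum(step_rewards)


def to_step_rewards(cumulative_values, initial=0.0):
    """Convert a running total into per-iteration increments."""
    rewards = []
    previous = initial
    for value in cumulative_values:
        rewards.append(value - previous)
        previous = value
    return rewards


# Hypothetical per-iteration values of obs.something.
per_step = [2.0, 3.0, 1.0]
# The same quantity if the simulation reported a running total instead.
running_total = list(accumulate(per_step))

print(episode_return(per_step))        # 6.0: each step counted once
print(episode_return(running_total))   # 13.0: earlier steps counted repeatedly
print(to_step_rewards(running_total))  # [2.0, 3.0, 1.0]
```

If the model only exposes an accumulated metric, differencing it on the simulation side (as to_step_rewards does here) gives the per-iteration value that OPTION 2 expects.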