I am using an AnyLogic model... and I want to know how to define the reward function correctly. To do that, I need to understand the underlying mechanism, since I can see two different ways it could work.
If I write this in Inkling:
reward = obs.something;
what does Bonsai do in the background? I see two options.
OPTION 1:
Bonsai takes the difference between the current reward and the previous one. With reward = obs.something, it would actually compute the difference between the current obs.something and the previous obs.something, and accumulate those differences internally.
OPTION 2:
Bonsai ignores any previous reward. With reward = obs.something, it simply uses the current value of obs.something as the reward for that step and adds it to an internal total.
Option 1 requires me to create accumulated rewards in AnyLogic; Option 2 requires me to create rewards that do NOT accumulate in AnyLogic.
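To make the two options concrete, here is a small Python sketch of what I mean. It is not Bonsai or Inkling code: the function names and the sample values are made up for illustration, and it says nothing about which behavior Bonsai actually implements.

```python
from typing import List


def total_if_differenced(signal: List[float]) -> float:
    """OPTION 1 (hypothetical): reward each step by the change in the signal."""
    total = 0.0
    previous = 0.0  # assumption: the signal starts from zero
    for current in signal:
        total += current - previous
        previous = current
    return total


def total_if_summed(signal: List[float]) -> float:
    """OPTION 2 (hypothetical): use each value as-is and add them up."""
    return sum(signal)


# Made-up episode: the simulation earns 1, 2, then 3 units in three steps.
per_step = [1.0, 2.0, 3.0]    # rewards that do NOT accumulate
cumulative = [1.0, 3.0, 6.0]  # the same rewards, accumulated in the sim

# Matching signal and option: both give the intended episode total of 6.
print(total_if_differenced(cumulative))  # 6.0
print(total_if_summed(per_step))         # 6.0

# Mismatched signal and option: the total is distorted.
print(total_if_summed(cumulative))       # 10.0
print(total_if_differenced(per_step))    # 3.0
```

So if I send the wrong kind of signal for the option Bonsai really uses, the episode total is either over-counted (10 instead of 6) or collapses to the last value (3 instead of 6), which is why I need to know which one it is.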
What is the right way to do it?