How is the reward function calculated internally?

I am using an AnyLogic model and I want to know how to define the reward function correctly. To do that, I need to understand the underlying technique, because there are two possible approaches.
If I write this in Inkling:
reward = obs.something;
what does Bonsai do in the background? I see two options:

Option 1: Bonsai takes the difference between the previous reward and the current one. In other words, with reward = obs.something it actually computes the difference between the current obs.something and the previous value, and accumulates those differences internally.
Option 2: Bonsai ignores any previous reward. With reward = obs.something it simply uses the current obs.something as the reward for that iteration and adds it to an internal running total.

Option 1 would require me to compute accumulated rewards in AnyLogic; option 2 would require me to compute rewards that do NOT accumulate in AnyLogic.
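To see why the choice matters, here is a small Python sketch (not Bonsai code) that applies each hypothesis to the same sequence of obs.something values from one episode. The function names and the numbers are hypothetical and only illustrate the arithmetic: under option 1 the summed differences telescope to "last value minus first value", while under option 2 every iteration's value counts in full.

```python
from typing import Sequence


def return_option_1(values: Sequence[float]) -> float:
    """Hypothesis 1: diff consecutive values, then sum the diffs."""
    total = 0.0
    for prev, curr in zip(values, values[1:]):
        total += curr - prev
    return total


def return_option_2(values: Sequence[float]) -> float:
    """Hypothesis 2: each value is that iteration's reward; sum them."""
    return float(sum(values))


# Hypothetical obs.something values reported at each iteration of one episode.
values = [1.0, 3.0, 6.0, 10.0]
print(return_option_1(values))  # 9.0  (telescopes to 10.0 - 1.0)
print(return_option_2(values))  # 20.0
```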

What is the right way to do it?

The platform assumes that the reward you calculate for each iteration is specific to that iteration, not a cumulative value. As the policy is trained, it attempts to maximize the cumulative reward over each episode. In other words, you should assume option 2: return the per-iteration value and let the platform do the accumulating.
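If the quantity you care about already accumulates in AnyLogic (a running total), the answer implies you should report only its increment at each iteration. The Python sketch below illustrates that conversion and the episode total built from it. It is an illustration, not Bonsai's implementation: `per_iteration_rewards`, `episode_return` and the counter values are hypothetical, and the plain undiscounted sum is an assumption made for simplicity.

```python
from typing import Iterable, List


def per_iteration_rewards(cumulative: Iterable[float]) -> List[float]:
    """Turn a running total (e.g. a hypothetical AnyLogic counter) into
    per-iteration rewards: each reward is the increment since the last step."""
    rewards: List[float] = []
    previous = 0.0  # assumes the counter is reset to 0 at the start of each episode
    for total in cumulative:
        rewards.append(total - previous)
        previous = total
    return rewards


def episode_return(rewards: Iterable[float]) -> float:
    """Plain (undiscounted) sum of per-iteration rewards for one episode."""
    return float(sum(rewards))


counter = [2.0, 5.0, 5.0, 9.0]  # hypothetical cumulative values, one per iteration
rewards = per_iteration_rewards(counter)
print(rewards)                  # [2.0, 3.0, 0.0, 4.0]
print(episode_return(rewards))  # 9.0
```

Returning the raw running total as the reward instead would score this episode 21.0, because progress made early in the episode is counted again at every later iteration.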