SOLVED

Verifying multi-lesson training

Brass Contributor

When my Inkling file includes multiple lessons, training completes with only the first lesson appearing to have been trained. I'm observing this in both the concept overview chart and the automatic assessments (the custom assessments also show only the first lesson).

 

Behind the scenes, are all lessons being "taught", or is my `NoProgressIterationLimit` not high enough to start the next lesson? (I'm unsure what it should be set to, if that's the case.)

 

Inkling sample

2022-03-28_12-27-06.jpg
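
Since the sample above is a screenshot, here's a minimal sketch of its shape (placeholder types, names, and values rather than my actual file):

```
inkling "2.0"

# Placeholder types standing in for the real simulator interface.
type SimState {
    value: number
}

type SimAction {
    command: number<0 .. 1>
}

type SimConfig {
    difficulty: number
}

# Simple placeholder reward: closer to zero is better.
function GetReward(s: SimState) {
    return -s.value
}

graph (input: SimState) {
    concept Concept1(input): SimAction {
        curriculum {
            source simulator (action: SimAction, config: SimConfig): SimState {
            }
            reward GetReward

            # First lesson: a fixed, easy configuration.
            lesson EasyLesson {
                scenario {
                    difficulty: 1
                }
            }

            # Second lesson: randomized, harder configurations.
            lesson HardLesson {
                scenario {
                    difficulty: number<1 .. 10>
                }
            }
        }
    }
    output Concept1
}
```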

 

Concept graph

(please disregard the unscaled rewards :upside_down_face:)

2022-03-28_12-32-59.jpg


6 Replies
The concept and assessment charts aren't as clear as they could be about lesson boundaries. We've been thinking about ways to improve that.

If you open the "Errors & Outputs" tab at the bottom of the Training panel, you should see additional information about training progress, including any lesson boundaries and whether the NoProgressIterationLimit was reached.

Does that provide the information you're looking for?

@erictr Thanks for the info, though it doesn't appear to have reported anything (see log below).

 

Return value will be cast to return type. Rounding may occur.
  Rounding may occur for struct field 'delay_Nf'
    Rounding may occur when converting a value of type 'number' to 'number<0 .. 1>'
  Rounding may occur for struct field 'delay_Ef'
    Rounding may occur when converting a value of type 'number' to 'number<0 .. 1>'
  Rounding may occur for struct field 'delay_Sf'
    Rounding may occur when converting a value of type 'number' to 'number<0 .. 1>'
  Rounding may occur for struct field 'delay_Wf'
    Rounding may occur when converting a value of type 'number' to 'number<0 .. 1>' (line 47, column 12)
3/26 12:55 PM: The training session is starting. Training engine version 3.5 is being used.
3/26 12:55 PM: Using algorithm APEX as specified in the inkling algorithm clause.
3/26 12:55 PM: Training will stop automatically after 50,000,000 iterations.
3/26 12:55 PM: Training will stop early if no learning progress is detected in over 500,000 iterations. To adjust, change the NoProgressIterationLimit training parameter.
3/26 12:55 PM: Training concept "Concept1" from scratch.
3/26 2:22 PM: Training of concept "Concept1" has stopped because there has been no learning progress detected in over NoProgressIterationLimit (500,000) iterations. To continue training, increase the NoProgressIterationLimit training parameter.
3/26 2:29 PM: The training session completed successfully.

 

As another data point, I also tried the Moab example with the multi-lesson setup shown in the help article. It did appear to switch lessons partway through (see below), but the logs likewise show nothing referencing the lessons.

2022-03-28_14-01-37.jpg

 

 

From this output, we can tell that training stopped during the first lesson because there was no forward progress. It didn't proceed to the second lesson because the first lesson's training didn't succeed. The closing message, "The training session completed successfully", is misleading here, and we should fix that.

So I think your next step is to look at your goal (or reward) definition and determine why the policy isn't succeeding during the first lesson. Other possible workarounds: 1) lower the success threshold, or 2) increase the no-progress iteration limit.
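
Both knobs live in Inkling training clauses. A minimal sketch (a fragment only, reusing the placeholder types and lesson names from your sketch above, with illustrative values):

```
concept Concept1(input): SimAction {
    curriculum {
        source simulator (action: SimAction, config: SimConfig): SimState {
        }
        reward GetReward

        training {
            # Workaround 2: allow more iterations without detected
            # progress before training stops early (your log shows 500,000).
            NoProgressIterationLimit: 2000000
        }

        lesson EasyLesson {
            scenario {
                difficulty: 1
            }
            training {
                # Workaround 1: lower the bar for advancing. With a
                # reward function, define per-episode success first.
                LessonRewardThreshold: -9400000,
                LessonSuccessThreshold: 0.6
            }
        }

        lesson HardLesson {
            scenario {
                difficulty: number<1 .. 10>
            }
        }
    }
}
```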

@erictr Interesting, so that makes sense based on the graphs, but two related follow-ups:

 

1. The first lesson's reward improved from roughly -3.3e9 to -9.4e6 (a ~99.7% reduction in magnitude) from start to "finish", so I'm not sure how much improvement is needed to count as "forward progress".

 

2. When I trained the Moab sample using multiple lessons, the logs likewise showed nothing about lesson transitions (see below).

 

3/26 3:15 PM: The training session is starting. Training engine version 3.5 is being used.
3/26 3:15 PM: Algorithm selection started. Gathering samples for algorithm selection.
3/26 3:15 PM: Algorithm selection complete. Using the algorithm PPO.
3/26 3:15 PM: Training will stop automatically after 50,000,000 iterations.
3/26 3:15 PM: Training will stop early if no learning progress is detected in over 250,000 iterations. To adjust, change the NoProgressIterationLimit training parameter.
3/26 3:15 PM: Training concept "MoveToCenter" from scratch.
3/26 3:57 PM: Training of concept "MoveToCenter" has stopped because there has been no learning progress detected in over NoProgressIterationLimit (250,000) iterations. To continue training, increase the NoProgressIterationLimit training parameter.
3/26 3:58 PM: The training session completed successfully.

 

best response confirmed by TWolfeAdam (Brass Contributor)
Solution

@TWolfeAdam 

 

There are no user-facing logs emitted when we transition between lessons. To help reduce confusion here in the future, I will add a log message when training terminates before the curriculum is complete. The log will point users toward the Lesson Threshold documentation as well as the docs on running custom assessments.

 

To get into the technical details of what happened here: when you're using rewards and haven't specified a LessonRewardThreshold, the system uses a very simple convergence test over a rolling window. The window is based on assessments; it looks at the last three assessments and measures the % deviation in their mean reward. Your policy had several assessments that failed to set a new high-water mark, and their mean rewards stayed close enough together that they failed the simple 'is it flat' progress test.
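
As a rough illustration of the idea (not necessarily the platform's exact formula), with the last three assessment mean rewards m1, m2, m3, a flatness check of the form

    (max(m1, m2, m3) - min(m1, m2, m3)) / |mean(m1, m2, m3)| < tolerance

would treat a small relative deviation as 'no progress' and stop training.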

 

If you know your reward threshold for progress, you can specify a LessonRewardThreshold for each lesson. A LessonRewardThreshold of -9.4e6 means an episode in that lesson counts as a 'success' if its reward exceeds -9.4e6. If you have a LessonRewardThreshold or use goals, you can also set a LessonSuccessThreshold. A LessonSuccessThreshold of 0.75 means 75% of all episodes in an assessment must be successful (either exceed the reward threshold or satisfy all goals) before training can proceed to the next lesson.
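
Continuing the placeholder sketch from earlier in the thread, a lesson with both thresholds set would look something like this (thresholds taken from your numbers, names illustrative):

```
lesson EasyLesson {
    scenario {
        difficulty: 1
    }
    training {
        # An episode in this lesson counts as a success if its
        # reward exceeds -9.4e6...
        LessonRewardThreshold: -9400000,
        # ...and 75% of the episodes in an assessment must succeed
        # before training advances to the next lesson.
        LessonSuccessThreshold: 0.75
    }
}
```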

 

A custom assessment using the same parameters as the lesson can help tell you what your policy is doing that prevents it from succeeding often enough to progress to the next lesson.

@Ross_Story Thank you for taking the time to explain that! The log message will definitely be useful feedback and a reminder, as I appear to have previously glossed over both reward threshold settings.