Import AML-Dataset


Hey guys, 

I want to add an AML dataset to Bonsai and train a brain with it, as described in the last sentences here: https://docs.microsoft.com/en-us/bonsai/guides/add-dataset.

 

But: I cannot find the "+ Add dataset" button next to the Datasets list in the Bonsai UI, so I have no idea how to add a dataset to Bonsai.

 

I assume I misunderstood something. Is there a button somewhere to add a dataset to Bonsai? Or how is this supposed to work?

 

Here is the description from the docs:

Step 3: Import the data to your Bonsai workspace

  1. Click + Add dataset next to the Datasets list in the Bonsai UI.
  2. Select "Azure Machine Learning" from the list of data source types.
  3. Provide a display name for your imported dataset.
  4. Select workspaceblobstore as the target datastore.
  5. Select your AML data container and version in the provided dropdowns.
  6. Click the Create dataset button to add your data to the Datasets list.
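
(For reference, a minimal sketch of the AML side that these steps assume: a tabular dataset already registered against workspaceblobstore. The config file, CSV path, and dataset name below are placeholders, not values from the docs.)

```python
# Minimal sketch (azureml-core): register the AML dataset that the Bonsai UI
# steps above then import. Paths and names are placeholders.
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()                         # reads a local config.json
blobstore = Datastore.get(ws, "workspaceblobstore")

# Build a tabular dataset from files already uploaded to the datastore,
# then register it so it shows up when adding a dataset in Bonsai.
ds = Dataset.Tabular.from_delimited_files(path=(blobstore, "bonsai-data/episodes.csv"))
ds.register(workspace=ws, name="my-bonsai-dataset", create_new_version=True)
```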

Thanks for enlightening me :)

Torsten

11 Replies

Finally, I got the first part working: after enabling the "Beta features", the Datasets menu is there, and it works more or less as described... Choosing the right AML datastore is important, though: if you pick the wrong one, you can still choose and find your dataset, but after clicking "Create", nothing relevant happens...
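
In case it helps others: one way to double-check which blob container each AML datastore actually points at is to list them with azureml-core. Just a rough sketch, assuming the default workspace config:

```python
# Rough sketch: print each datastore in the AML workspace and, for blob
# datastores, the storage account and container it points at, so the
# datastore picked in the Bonsai dialog matches the dataset's container.
from azureml.core import Workspace

ws = Workspace.from_config()
for name, store in ws.datastores.items():
    container = getattr(store, "container_name", None)   # blob datastores only
    account = getattr(store, "account_name", None)
    print(f"{name}: account={account}, container={container}")
```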

Hi @Torsten_Katthoefer, sorry we were not able to get back to you in time. But I'm glad to see you were able to figure this out yourself.

 

I created an internal bug for the case you are reporting: an incorrect combination of fields during dataset creation does not produce a proper error message.

 

About your initial issue: how did you figure out that the Beta features have to be enabled? (I want to see what we can improve in the documentation.)

 

Thanks for trying this new feature. And feel free to reach out to me directly for anything related to training directly with Datasets.

Hey @edilmo,

Related to your question about how I figured out that the Beta features need to be enabled: I just got lucky clicking around randomly...

 

Cheers, Torsten 

Thanks for your quick response, @Torsten_Katthoefer.
I will create a bug for that as well. We will fix the doc.
Really appreciated your feedback. And again, if you face any issues using this new training feature, feel free to tag me directly.


I want to take the opportunity to clarify a misleading comment in the documentation. As you may have noticed, the feature allows you to train brains (policies) directly from data, without simulators, or with both.

The feature is in beta, which is why you can run into issues like the ones you saw (in the system and in the doc). In the doc, where we talk about Hybrid mode, there is a comment that could be misinterpreted. Hybrid mode, which allows you to use both types of sources (data and simulators), actually uses the data as the primary source for training and the simulator as the source for assessment. In other words, if you are using both a simulator and data, any type of assessment, automatic or custom, will be performed using the simulator. If you are in pure offline training mode (no simulator source), both types of assessment are performed with an algorithm.

Training with datasets enables you to use expert data, which is an effective way to accelerate training, reduce the number of samples required, and, more importantly, achieve better results. One typical issue is knowing whether the task (the reward function or goal statement in Inkling) is actually aligned with the expert data in the dataset. To address that, you can use the "Pre-Training chart", which shows the performance over the dataset as a distribution of rewards or goal-satisfaction rates. If you are using expert data and the "Pre-Training chart" shows poor performance, then you have an issue either in the Inkling or in the dataset.
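
To make that alignment check concrete, here is a rough, generic illustration of the idea behind the Pre-Training chart (not the actual Bonsai implementation): score the logged dataset with the task's reward and look at the resulting distribution. The column names and the reward function are purely hypothetical:

```python
# Generic illustration (not Bonsai internals): score each row of an "expert"
# dataset with the task's reward function and inspect the distribution.
# If expert data scores poorly, either the reward/goal or the data is off.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({                      # stand-in for the logged dataset
    "distance_to_target": rng.uniform(0.0, 2.0, size=1000),
    "speed": rng.uniform(0.0, 1.0, size=1000),
})

def reward(row):
    # Hypothetical reward: closer to the target is better, small speed penalty.
    return -row["distance_to_target"] - 0.1 * row["speed"]

rewards = df.apply(reward, axis=1)
print(rewards.describe())                # rough stand-in for the chart's distribution
```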

We should be adding more documentation about the impact of the type and amount of data in the future, but in the meantime I hope these comments help you and others avoid typical problems.

@Torsten_Katthoefer 

 

Hey Torsten, I have a question. Does the Create button get disabled when you click it, and do you see a loading indicator in the button? The request to create a dataset is sent when you click Create, and it may take a few moments to complete.

 

To help with debugging, could you check the browser console and network tab to see if any errors are occurring?

Hey @Navvaran_Mann,
Yes, it gets disabled and the loading indicator appears, but only for a few seconds. I got the following error message in the console:

Instrumentation.ts:33 [Stores] 2022-01-05T18:01:39.642Z Error creating dataset: Error: , Exception: [Exception] Type: RequestFailedException Error: Failed to get schema for dataset testService request failed.
Status: 404 (The specified container does not exist.)
ErrorCode: ContainerNotFound

Headers:
Transfer-Encoding: chunked
Vary: Origin
Server: Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0
x-ms-request-id: 5dab711a-501e-002f-0c5e-02ee92000000
x-ms-client-request-id: 5d79a451-31aa-4b09-9f6b-39230afa2a2c
x-ms-version: 2020-08-04
x-ms-error-code: ContainerNotFound
Date: Wed, 05 Jan 2022 18:01:40 GMT
ExceptionMessage: Service request failed.
Status: 404 (The specified container does not exist.)
ErrorCode: ContainerNotFound

Headers:
Transfer-Encoding: chunked
Vary: Origin
Server: Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0
x-ms-request-id: 5dab711a-501e-002f-0c5e-02ee92000000
x-ms-client-request-id: 5d79a451-31aa-4b09-9f6b-39230afa2a2c
x-ms-version: 2020-08-04
x-ms-error-code: ContainerNotFound
Date: Wed, 05 Jan 2022 18:01:40 GMT
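
(A quick way to check the ContainerNotFound part outside the UI is to ask blob storage directly whether the container exists. Just a sketch; the account URL, credential, and container name are placeholders:)

```python
# Sketch: verify that the blob container behind the selected datastore exists.
# Account URL, credential, and container name are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)
container = service.get_container_client("<container-name>")
print("container exists:", container.exists())   # False -> the same 404 / ContainerNotFound
```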

@Torsten_Katthoefer I'll forward this error to our service team.

I have another question, as I may have misread the thread. Were you able to get it to work, and does this situation only occur when you select an incorrect combination of datastore/container/version?

You are right, it only happens when the wrong container is chosen. The dataset and the version are found even if the container is wrong, but Create leads to the error, which is only visible in the console.

Ah I see what's up. Yeah this is not a great UX. The issue is that the datastores and datasets are all fetched simultaneously when selecting the AML workspace. I haven't figured out a proper way to filter these using the Azure ML APIs. You do have to select the correct datastore & dataset combinations or it will fail.

I'll have to poke at it some more to see if I can get it to filter the datasets when selecting a datastore. From what I recall I couldn't seem to find a way to infer this from the data or fetch it from the API.

Is it really necessary to choose the container? If the AML dataset has already been fetched at that point, maybe you could skip the container selection?

At the moment it is necessary. Our service specifically requires all of these to get the data on the service side. It may be possible to infer some of these options on the client side so the user flow can be simplified; however, I haven't found a way to do it yet.

It's on my list of things to improve for this feature. Sorry for the inconvenience at the moment.