Typically, I test the sample Azure ML notebooks before I share them with our Azure ML customers. The sample notebook mslearn-aml-labs/05-Creating_a_Pipeline.ipynb at master · MicrosoftDocs/mslearn-aml-labs (github.com..., which walks the customer through training and registering a machine learning model via a pipeline, was no different.
I tested it and made some changes along the way, both to reflect my environment and to accommodate recent SDK changes (e.g., ScriptRunConfig instead of Estimator). I did all of my testing using Blob storage as the datastore for the sample data, and the pipeline ran as expected.
In our customer's scenario, however, the data is stored in ADLS Gen2. To reflect this, they changed the default datastore of the workspace to their ADLS Gen2 store (a Datastore connected to an ADLS Gen2 account). The customer then followed the example notebook as is, pointing the datastore used for the model folder to the ADLS Gen2 location.
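For reference, here is a minimal sketch of how such a datastore might have been registered and made the default (the datastore name, filesystem, account, and service principal values are placeholders, not the customer's actual configuration):
# Register an ADLS Gen2 account as a datastore (placeholder values);
# a service principal is the usual auth mechanism for ADLS Gen2
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()
adls_store = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name='adls_gen2_store',   # hypothetical name
    filesystem='my-filesystem',         # ADLS Gen2 filesystem (container)
    account_name='mystorageaccount',    # placeholder storage account
    tenant_id='<tenant-id>',
    client_id='<client-id>',
    client_secret='<client-secret>')

# Make it the workspace default -- the change that later surfaces
# the mounting error in the pipeline run
ws.set_default_datastore('adls_gen2_store')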
Everything worked as expected (they could retrieve the data for analytics and modeling) until they tried to submit the experiment with this code block:
# Create an experiment and run the pipeline
from azureml.core import Experiment

experiment = Experiment(workspace=ws, name='diabetes-training-pipeline')
pipeline_run = experiment.submit(pipeline, regenerate_outputs=True)
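As an aside (an assumption on my part, since the snippet above only submits the run): the ActivityFailedException below typically surfaces when the notebook blocks on the run, for example via wait_for_completion:
# Block until the pipeline run finishes, streaming step logs to the cell;
# if a step fails, this call raises ActivityFailedException
pipeline_run.wait_for_completion(show_output=True)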
Upon execution of the cell, they would get the following error:
ActivityFailedException: ActivityFailedException: Message: Activity Failed:
{
    "error": {
        "code": "UserError",
        "message": "Unable to mount data store <ADLS Gen2 Data Store Name> because data store type AzureDataLakeGen2 can not be mounted.",
        "messageFormat": "Unable to mount data store {Name} because data store type {DataStoreType} can not be mounted.",
        ...
    }
}
I went ahead and ran through my version, and found that the error actually comes from the line that creates the pipeline data, under the "Create and Run a Pipeline" section of the GitHub example:
# Create a PipelineData (temporary Data Reference) for the model folder
from azureml.pipeline.core import PipelineData

model_folder = PipelineData("model_folder", datastore=ws.get_default_datastore())
In the sample, we use the default datastore that is created as part of the Azure ML workspace. Now, if we first retrieve the ADLS Gen2 datastore instead:
# Retrieve an existing datastore in the workspace by name,
# where datastore_name is the name of the Datastore pointing to the ADLS Gen2 store
from azureml.core import Datastore

mydatastore = Datastore.get(ws, datastore_name)
and then change the model_folder assignment to:
# Create a PipelineData (temporary Data Reference) for the model folder
model_folder = PipelineData("model_folder", datastore=mydatastore)
we will encounter the error, as our customer did, when the experiment is run.
The reason is exactly what the error message suggests: the ADLS Gen2 datastore cannot be mounted onto the compute for the pipeline run. Note that this does not mean you cannot use the datastore to hold your training and testing data. The problem arises only when you refer to the ADLS Gen2 store as the temporary store for pipeline-related data: the store cannot be mounted, and the error message informs you of that.
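For example (a hedged sketch; the file path is hypothetical), you can still read training data from the ADLS Gen2 datastore through a Dataset, while letting the default blob store back the PipelineData:
# Reading training data from the ADLS Gen2 datastore still works;
# the path below is a placeholder for your own data location
from azureml.core import Dataset

diabetes_ds = Dataset.Tabular.from_delimited_files(
    path=(mydatastore, 'data/diabetes.csv'))
df = diabetes_ds.to_pandas_dataframe()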
To resolve the error, simply use the default datastore. In other words, leave the line below as it is in the sample (do not swap in the ADLS store object) and you will be fine.
# Create a PipelineData (temporary Data Reference) for the model folder
model_folder = PipelineData("model_folder", datastore=ws.get_default_datastore())
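To see where this object fits, here is a hedged sketch of how the PipelineData is typically wired into a pipeline step (the step name, script name, source directory, and compute target are placeholders, not necessarily what the lab notebook uses):
# The step writes the trained model into model_folder, which the
# pipeline materializes on the (mountable) default blob datastore
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

train_step = PythonScriptStep(
    name='Train Model',                    # placeholder step name
    source_directory='diabetes_pipeline',  # placeholder folder
    script_name='train_diabetes.py',       # placeholder script
    arguments=['--output_folder', model_folder],
    outputs=[model_folder],
    compute_target='aml-cluster',          # placeholder compute target
    allow_reuse=True)

pipeline = Pipeline(workspace=ws, steps=[train_step])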
Also, if you are using the ADLS Gen2 store (the Datastore you created in ML Studio connecting to your ADLS Gen2 account) as the default datastore, please change the default back to Azure Blob storage (workspaceblobstore, created as part of your workspace) so that the system only ever attempts to mount Azure Blob storage.
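Resetting the default takes one line; workspaceblobstore is the name of the built-in blob datastore:
# Point the workspace default back at the built-in blob datastore,
# which supports mounting for pipeline runs
ws.set_default_datastore('workspaceblobstore')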