Using the well-known artificial intelligence framework TensorFlow on Azure Web App can help you bring your ideas to life more quickly. This tutorial provides a step-by-step guide to help you deploy your TensorFlow project on an Azure Web App, covering everything from resource setup to troubleshooting common issues.
TOC

- Introduction to TensorFlow
- System Architecture
  - Architecture
  - Focus of This Tutorial
- Setup Azure Resources
  - File and Directory Structure
  - Bicep Template
- Running Locally
  - Training Models and Training Data
  - Predicting with the Model
- Publishing the Project to Azure
  - Code Commit to Azure DevOps
  - Publish to Azure Web App via Pipeline
- Running on Azure Web App
  - Training the Model
  - Using the Model for Prediction
- Troubleshooting
  - docker.log freeze after deployment
  - Others
- Conclusion
- References
1. Introduction to TensorFlow
TensorFlow is an open-source machine learning framework developed by Google. It provides tools for building and deploying machine learning models, with a focus on flexibility and scalability. TensorFlow supports deep learning, classical machine learning, and neural network models, enabling tasks like image recognition, natural language processing, and time series forecasting.
At its core, TensorFlow uses computational graphs to model mathematical operations, allowing efficient computation on CPUs, GPUs, and TPUs. It features a high-level API, Keras, for easy model building, as well as lower-level APIs for advanced customization. TensorFlow also supports distributed training for large-scale datasets and diverse deployment options, including cloud services, mobile devices, and edge computing.
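To give a sense of what the Keras high-level API looks like, here is a minimal, generic sketch (not part of this tutorial's project); the layer sizes and the random placeholder data are illustrative only:

```python
import numpy as np
import tensorflow as tf

# Build a tiny 3-class classifier with the Keras Sequential API
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random placeholder data, just to show the training call
x = np.random.rand(100, 20).astype("float32")
y = np.random.randint(0, 3, size=(100,))
model.fit(x, y, epochs=2, batch_size=16, verbose=0)
```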
TensorFlow’s ecosystem includes TensorFlow Hub (pre-trained models), TensorFlow Lite (for mobile and IoT), and TensorFlow.js (for JavaScript applications). Its integration with visualization tools like TensorBoard simplifies debugging and performance monitoring. TensorFlow excels in production environments, offering features like TensorFlow Extended (TFX) for end-to-end ML pipelines.
With its versatile capabilities and large community, TensorFlow is widely used in industries like healthcare, finance, and technology, making it one of the most powerful tools for modern machine learning development.
2. System Architecture
Architecture
Development Environment
| Item | Value |
| --- | --- |
| OS | macOS |
| Version | Sonoma 14.1.1 |
| Python Version | 3.9.20 |
Azure Resources
| Resource | Configuration |
| --- | --- |
| App Service Plan | SKU - Premium Plan 0 V3 |
| App Service | Platform - Linux (Python 3.9, Version 3.9.19) |
| Storage Account | SKU - General Purpose V2 |
| File Share | No backup plan |
Focus of This Tutorial
This tutorial walks you through the following stages:
- Setting up Azure resources
- Running the project locally
- Publishing the project to Azure
- Running the application on Azure
- Troubleshooting common issues
Each of these aspects can be handled with a variety of tools and approaches. The choices used in this tutorial are marked (V) in the tables below.
Local OS

| Windows | Linux | Mac |
| --- | --- | --- |
|  |  | V |

How to setup Azure resources

| Portal (i.e., REST API) | ARM | Bicep | Terraform |
| --- | --- | --- | --- |
|  |  | V |  |

How to deploy project to Azure

| VSCode | CLI | Azure DevOps | GitHub Action |
| --- | --- | --- | --- |
|  |  | V |  |
3. Setup Azure Resources
File and Directory Structure
Please open a terminal and enter the following commands:
git clone https://github.com/theringe/azure-appservice-ai.git
cd azure-appservice-ai
bash ./tensorflow/tools/add-venv.sh
If you are using a Windows platform, use the following alternative PowerShell commands instead:
git clone https://github.com/theringe/azure-appservice-ai.git
cd azure-appservice-ai
.\tensorflow\tools\add-venv.cmd
After completing the execution, you should see the following directory structure:
| File and Path | Purpose |
| --- | --- |
| tensorflow/tools/add-venv.* | The script executed in the previous step (cmd for Windows, sh for Linux/Mac) to create all Python virtual environments required for this tutorial. |
| .venv/tensorflow-webjob/ | A virtual environment specifically used for training models (i.e., tokenizing text and constructing a neural network for training). |
|  | The list of packages (with exact versions) required for the tensorflow-webjob virtual environment. |
| .venv/tensorflow/ | A virtual environment specifically used for the Flask application, enabling API endpoint access for querying predictions (i.e., MBTI). |
|  | The list of packages (with exact versions) required for the tensorflow virtual environment. |
| tensorflow/ | The main folder for this tutorial. |
| tensorflow/tools/bicep-template.bicep | The Bicep template to setup all the Azure resources related to this tutorial, including an App Service Plan, a Web App, and a Storage Account. |
| tensorflow/tools/create-folder.* | A script to create all directories required for this tutorial in the File Share, including train, model, and test. |
| tensorflow/tools/download-sample-training-set.* | A script to download a sample training set (from an MBTI text classifier project based on the Kaggle MBTI dataset, containing MBTI types and post data from social media platforms) into the train directory of the File Share. |
| tensorflow/webjob/train_mbti_model.py | A script that tokenizes the posts from each record, trains an LSTM-based model for MBTI classification, and saves the embedding vectors in the model directory of the File Share. |
| tensorflow/App_Data/jobs/triggered/train-mbti-model/train_mbti_model.sh | A shell script for Azure App Service WebJobs. It activates the tensorflow-webjob virtual environment and starts the train_mbti_model.py script. |
| tensorflow/api/app.py | Code for the Flask application, including routes, port configuration, input parsing, vector loading, predictions, and output generation. |
| tensorflow/start.sh | A script executed after deployment (as specified in the Startup Command of the Bicep template, introduced later). It sets up the virtual environment and starts the Flask application to handle web requests. |
| tensorflow/pipeline.yml | A process document for an Azure DevOps pipeline, detailing the steps to deploy code to an Azure Web App. |
Bicep Template
We need to create the following resources or services:

| Name | Manual Creation Required | Resource/Service |
| --- | --- | --- |
| App Service Plan | No | Resource (plan) |
| App Service | Yes | Resource (app) |
| Storage Account | Yes | Resource (storageAccount) |
| File Share | Yes | Service |
Let’s take a look at the tensorflow/tools/bicep-template.bicep file. Refer to the configuration section for all the resources.
Since most of the configuration values don’t require changes, I’ve placed them in the variables section of the Bicep template rather than the parameters section. This helps keep the configuration simpler. However, I’d still like to briefly explain some of the more critical settings.
As you can see, I’ve adopted a camelCase naming convention, which combines the [Resource Type] with [Setting Name and Hierarchy]. This makes it easier to understand where each setting will be used. The configurations in the diagram are sorted by resource name, but the following list is categorized by functionality for better clarity.
| Configuration Name | Value | Purpose |
| --- | --- | --- |
| storageAccountFileShareName | data-and-model | [Purpose 1: Link File Share to Web App] |
| storageAccountFileShareShareQuota | 5120 | [Purpose 1: Link File Share to Web App] |
| storageAccountFileShareEnabledProtocols | SMB | [Purpose 1: Link File Share to Web App] |
| appSiteConfigAzureStorageAccountsType | AzureFiles | [Purpose 1: Link File Share to Web App] |
| appSiteConfigAzureStorageAccountsProtocol | Smb | [Purpose 1: Link File Share to Web App] |
| planKind | linux | [Purpose 2: Specify platform and stack runtime] Select Linux (the default if the Python stack is chosen) |
| planSkuTier | Premium0V3 | [Purpose 2: Specify platform and stack runtime] |
| planSkuName | P0v3 | [Purpose 2: Specify platform and stack runtime] |
| appKind | app,linux | [Purpose 2: Specify platform and stack runtime] Same as above |
| appSiteConfigLinuxFxVersion | PYTHON\|3.9 | [Purpose 2: Specify platform and stack runtime] |
| appSiteConfigAppSettingsWEBSITES_CONTAINER_START_TIME_LIMIT | 1800 | [Purpose 3: Deploying] The value is in seconds, allowing the Startup Command to keep running beyond the default timeout of 230 seconds. This tutorial’s Startup Command typically takes around 1200 seconds, so setting it to 1800 seconds (the maximum value) provides a safety margin and accommodates future project expansion (e.g., adding more packages) |
| appSiteConfigAppCommandLine | [ -f /home/site/wwwroot/start.sh ] && bash /home/site/wwwroot/start.sh \|\| GUNICORN_CMD_ARGS=\"--timeout 600 --access-logfile '-' --error-logfile '-' -c /opt/startup/gunicorn.conf.py --chdir=/opt/defaultsite\" gunicorn application:app | [Purpose 3: Deploying] This is the Startup Command, which breaks down into three parts: check whether /home/site/wwwroot/start.sh exists; if it does, run it; otherwise fall back to the platform’s default gunicorn startup. Since the command is enclosed in double quotes within the Bicep template, replace \" with " during actual execution |
| appSiteConfigAppSettingsSCM_DO_BUILD_DURING_DEPLOYMENT | false | [Purpose 3: Deploying] Since the handling of the different virtual environments is already defined in start.sh, the Web App’s default build process does not need to run |
| appSiteConfigAppSettingsWEBSITES_ENABLE_APP_SERVICE_STORAGE | true | [Purpose 4: Webjobs] Required to enable the App Service storage feature, which is necessary for using WebJobs (e.g., for model training) |
| storageAccountPropertiesAllowSharedKeyAccess | true | [Purpose 5: Troubleshooting] |
Return to the terminal and execute the following commands (their purpose was described earlier).
# Please change <ResourceGroupName> to your preferred name, for example: azure-appservice-ai
# Please change <RegionName> to your preferred region, for example: eastus2
# Please change <ResourcesPrefixName> to your preferred naming pattern, for example: tensorflow-bicep (this will create tensorflow-bicep-asp as the App Service Plan, tensorflow-bicep-app as the Web App, and tensorflowbicepsa as the Storage Account)
az group create --name <ResourceGroupName> --location <RegionName>
az deployment group create --resource-group <ResourceGroupName> --template-file ./tensorflow/tools/bicep-template.bicep --parameters resourcePrefix=<ResourcesPrefixName>
If you are using a Windows platform, use the following alternative PowerShell commands instead:
# Please change <ResourceGroupName> to your preferred name, for example: azure-appservice-ai
# Please change <RegionName> to your preferred region, for example: eastus2
# Please change <ResourcesPrefixName> to your preferred naming pattern, for example: tensorflow-bicep (this will create tensorflow-bicep-asp as the App Service Plan, tensorflow-bicep-app as the Web App, and tensorflowbicepsa as the Storage Account)
az group create --name <ResourceGroupName> --location <RegionName>
az deployment group create --resource-group <ResourceGroupName> --template-file .\tensorflow\tools\bicep-template.bicep --parameters resourcePrefix=<ResourcesPrefixName>
After execution, please copy the three key-value pairs from the output section of the result, as shown below.
Return to the terminal and execute the following commands:
# Please setup 3 variables you've got from the previous step
OUTPUT_STORAGE_NAME="<outputStorageName>"
OUTPUT_STORAGE_KEY="<outputStorageKey>"
OUTPUT_SHARE_NAME="<outputShareName>"
# URL encode the storage key
ENCODED_OUTPUT_STORAGE_KEY=$(python3 -c "
import urllib.parse
key = '''$OUTPUT_STORAGE_KEY'''
encoded_key = urllib.parse.quote(key, safe='') # No safe characters, encode everything
print(encoded_key)
")
# Mount
open smb://$OUTPUT_STORAGE_NAME:$ENCODED_OUTPUT_STORAGE_KEY@$OUTPUT_STORAGE_NAME.file.core.windows.net/$OUTPUT_SHARE_NAME
Alternatively, you can go to the Azure Portal, navigate to the File Share you just created, and copy the required command as shown in the diagram below. Choose Linux or Windows according to the OS of your development environment.
After executing the command, the network drive will be successfully mounted.
4. Running Locally
Training Models and Training Data
Return to the terminal and execute the following commands (their purpose was described earlier).
source .venv/tensorflow-webjob/bin/activate
bash ./tensorflow/tools/create-folder.sh
bash ./tensorflow/tools/download-sample-training-set.sh
python ./tensorflow/webjob/train_mbti_model.py
If you are using a Windows platform, use the following alternative PowerShell commands instead:
.\.venv\tensorflow-webjob\Scripts\Activate.ps1
.\tensorflow\tools\create-folder.cmd
.\tensorflow\tools\download-sample-training-set.cmd
python .\tensorflow\webjob\train_mbti_model.py
After execution, the File Share will now include the following directories and files.
Let’s take a brief detour to examine the structure of the training data downloaded from GitHub.
The dataset used in this project focuses on MBTI (Myers-Briggs Type Indicator) personality types. Each record in the dataset contains a user’s MBTI type and a collection of their social media posts, separated by |||. This tutorial repurposes the dataset to classify personality types based on textual data.
This image represents the raw data, where each line includes an MBTI type and its associated text. For training, the posts are tokenized and transformed into numerical sequences using TensorFlow's preprocessing tools. This step involves converting each word into a corresponding token based on a fixed vocabulary size. These sequences are then padded to a uniform length, ensuring consistency in the input data.
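To make the tokenize-and-pad step concrete, here is a minimal sketch using TensorFlow’s preprocessing utilities; the sample record, vocabulary size, and sequence length are illustrative assumptions, not the repository’s actual values:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Each raw record is "TYPE,post1|||post2|||..."; split the posts apart
record = "INTJ,enjoyed reading about this|||what do you think about it"
mbti_type, raw_posts = record.split(",", 1)
posts = raw_posts.split("|||")

# Map words to integer tokens over a fixed vocabulary size
tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")
tokenizer.fit_on_texts(posts)
sequences = tokenizer.texts_to_sequences(posts)

# Pad every sequence to the same length so the model input is uniform
padded = pad_sequences(sequences, maxlen=100, padding="post", truncating="post")
print(mbti_type, padded.shape)  # e.g. INTJ (2, 100)
```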
During training, the performance is heavily influenced by factors like data balancing and hyperparameter tuning. The MBTI dataset is inherently imbalanced, with certain personality types appearing far more frequently than others. To address this, only 30 samples per type are used in training to ensure balance. However, this approach simplifies the task and may lead to suboptimal results.
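The balancing step might look like the following sketch, assuming pandas and a CSV with a type column; the file path and column name are hypothetical:

```python
import pandas as pd

df = pd.read_csv("train/mbti.csv")  # hypothetical path to the downloaded set

# Keep at most 30 samples per MBTI type so all 16 classes are balanced
balanced = df.groupby("type").head(30)
print(balanced["type"].value_counts())
```

Using .head(30) keeps the first 30 rows per type; .sample(n=30) would draw a random subset instead.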
The inference step involves tokenizing a new input post and passing it through the trained model to predict the MBTI type. It is important to note that with the current setup, the inference results may often return the same prediction. This is due to the limited dataset size, imbalanced data handling, and the need for further tuning of training parameters such as the number of epochs, batch size, and learning rate.
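A possible shape of the inference path is sketched below; the artifact file names under the model directory are hypothetical, and the label list assumes the standard 16 MBTI types:

```python
import pickle
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical artifact names; the real files live in the File Share's model directory
model = tf.keras.models.load_model("model/mbti_model.h5")
with open("model/tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)
labels = ["INTJ", "INTP", "ENTJ", "ENTP", "INFJ", "INFP", "ENFJ", "ENFP",
          "ISTJ", "ISFJ", "ESTJ", "ESFJ", "ISTP", "ISFP", "ESTP", "ESFP"]

# Tokenize the new post the same way the training data was tokenized
seq = tokenizer.texts_to_sequences(["I am happy"])
x = pad_sequences(seq, maxlen=100, padding="post")
probs = model.predict(x)            # shape (1, 16)
print(labels[int(np.argmax(probs))])
```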
This tutorial introduces an approach to training and inferring MBTI personality types using TensorFlow. While the process highlights key steps like data preprocessing, model training, and inference, it does not delve deeply into AI-specific topics like advanced model optimization or deployment. To achieve better results, the dataset could be expanded to include more samples per personality type, and hyperparameters such as the learning rate, number of epochs, and embedding dimensions could be fine-tuned.
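For reference before moving on, an LSTM-based classifier of the shape described above could be defined roughly like this; the vocabulary size, embedding dimension, and training settings are illustrative assumptions, not the repository’s actual code:

```python
import tensorflow as tf

VOCAB_SIZE, MAX_LEN, NUM_TYPES = 10000, 100, 16  # assumed hyperparameters

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),               # word -> dense vector
    tf.keras.layers.LSTM(64),                                # sequence encoder
    tf.keras.layers.Dense(NUM_TYPES, activation="softmax"),  # 16 MBTI types
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# padded: int array of shape (num_samples, MAX_LEN); y: class ids in 0..15
# model.fit(padded, y, epochs=10, batch_size=32)
```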
Predicting with the Model
Return to the terminal and execute the following commands: first deactivate the current virtual environment, then activate the virtual environment for the Flask application, and finally start the Flask app.
Commands for Linux or Mac:
deactivate
source .venv/tensorflow/bin/activate
python ./tensorflow/api/app.py
Commands for Windows:
deactivate
.\.venv\tensorflow\Scripts\Activate.ps1
python .\tensorflow\api\app.py
When you see a screen similar to the following, it means the server has started successfully. Press Ctrl+C to stop the server if needed.
Before conducting the actual test, let’s construct some sample query data:
I am happy
Next, open a terminal and use the following curl commands to send requests to the app:
curl -X GET "http://0.0.0.0:8000/api/detect" -H "Content-Type: application/json" -d '{"post": "I am happy"}'
You should see the prediction results.
PS: Your results may differ from mine due to variations in the sampled training data.
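For context, the /api/detect route in tensorflow/api/app.py presumably has a shape like the following sketch; this is an assumption based on the request format above, not the repository’s actual code:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/api/detect", methods=["GET"])
def detect():
    payload = request.get_json(force=True)    # {"post": "I am happy"}
    post = payload.get("post", "")
    # ... tokenize the post, run the model, map the argmax to an MBTI type ...
    prediction = "INFP"                       # placeholder result
    return jsonify({"post": post, "mbti": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```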
5. Publishing the Project to Azure
Code Commit to Azure DevOps
First, create a new and empty repository (referred to as repo) under your Azure DevOps project and get its URL.
Open a terminal in the cloned azure-appservice-ai project directory and run the following commands to add the new repo as a push/pull target. Then, verify the associated git repos for the directory.
git remote add azure https://<organization>@dev.azure.com/<organization>/<project>/_git/azure-appservice-ai
git remote -v
Next, run the following commands in the terminal to push the entire project to the Azure DevOps repo.
git push azure --all
The following steps need to be performed only once. These configurations ensure that the pipeline can automatically deploy the tensorflow portion of the azure-appservice-ai project to the newly created Azure Web App.
Setup Service Connection:
- Go to Project Settings > Service connections in Azure DevOps and create a new Azure Resource Manager service connection for your subscription.
- Specify the Service connection name as "azure-appservice-ai-tensorflow" (you can use any name for easy identification).
Create Pipeline YAML File:
- Navigate to the tensorflow subdirectory and create a new file named azure-pipeline.yml
- Copy the contents of another file named pipeline.yml (in the same directory) into azure-pipeline.yml
- Modify the variables section as indicated by the comments, then save and commit the changes.
Setup the Pipeline:
- Navigate to the Pipelines section and create a new pipeline.
- Follow the prompts to select the newly created azure-pipeline.yml as the pipeline script file.
- Save the configuration (do not run it yet).
The above setup steps only need to be done once. Next, you can deploy the project to the Azure Web App using the pipeline in different ways.
Publish to Azure Web App via Pipeline
Manual Trigger:
- Navigate to the newly created pipeline.
- Click Run Pipeline to start the deployment process.
- Click on the deployment to monitor its progress. Below is an example of a successful deployment screen.
Trigger on Push:
Alternatively, you can configure the pipeline to run automatically whenever new code is pushed to the Azure DevOps repo:
- Open a terminal and run the following commands (after code updates):
git push azure --all
- This will trigger a new pipeline deployment process.
6. Running on Azure Web App
Training the Model
Return to the terminal and execute the following commands to invoke the WebJob.
Commands for Linux or Mac:
# Please change <subscription_id> <resourcegroup_name> and <webapp_name> to your own
token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv) ; curl -X POST -H "Authorization: Bearer $token" -H "Content-Type: application/json" -d '{}' "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/triggeredwebjobs/train-mbti-model/run?api-version=2024-04-01"
Commands for Windows:
# Please change <subscription_id> <resourcegroup_name> and <webapp_name> to your own
$token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv) ; Invoke-RestMethod -Uri "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/triggeredwebjobs/train-mbti-model/run?api-version=2024-04-01" -Headers @{Authorization = "Bearer $token"; "Content-type" = "application/json"} -Method POST -Body '{}'
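If you prefer a single cross-platform script, here is a hedged Python sketch of the same REST call; it assumes the Azure CLI is installed and logged in, and that the requests package is available:

```python
import subprocess
import requests

# Replace the placeholders with your own values
SUBSCRIPTION_ID = "<subscription_id>"
RESOURCE_GROUP = "<resourcegroup_name>"
WEBAPP_NAME = "<webapp_name>"

# Reuse the Azure CLI to obtain a management-plane access token
token = subprocess.check_output(
    ["az", "account", "get-access-token",
     "--resource", "https://management.azure.com",
     "--query", "accessToken", "-o", "tsv"],
    text=True).strip()

url = (f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
       f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Web"
       f"/sites/{WEBAPP_NAME}/triggeredwebjobs/train-mbti-model/run"
       "?api-version=2024-04-01")

resp = requests.post(url, json={}, headers={"Authorization": f"Bearer {token}"})
print(resp.status_code)  # 200/202 means the WebJob run was accepted
```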
You can check the training status by executing the following commands.
Commands for Linux or Mac:
# Please change <subscription_id> <resourcegroup_name> and <webapp_name> to your own
token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv) ; response=$(curl -s -H "Authorization: Bearer $token" "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/webjobs?api-version=2024-04-01") ; echo "$response" | jq
Commands for Windows:
# Please change <subscription_id> <resourcegroup_name> and <webapp_name> to your own
$token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv); $response = Invoke-RestMethod -Uri "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/webjobs?api-version=2024-04-01" -Headers @{Authorization = "Bearer $token"} -Method GET ; $response | ConvertTo-Json -Depth 10
While the job is running, the WebJob status in the response appears as Processing; once training finishes, it changes to Complete.
You can retrieve the latest detailed log by executing the following commands.
Commands for Linux or Mac:
# Please change <subscription_id> <resourcegroup_name> and <webapp_name> to your own
token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv) ; history_id=$(az webapp webjob triggered log --resource-group <resourcegroup_name> --name <webapp_name> --webjob-name train-mbti-model --query "[0].id" -o tsv | sed 's|.*/history/||') ; response=$(curl -X GET -H "Authorization: Bearer $token" -H "Content-Type: application/json" "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/triggeredwebjobs/train-mbti-model/history/$history_id/?api-version=2024-04-01") ; log_url=$(echo "$response" | jq -r '.properties.output_url') ; curl -X GET -H "Authorization: Bearer $token" "$log_url"
Commands for Windows:
# Please change <subscription_id> <resourcegroup_name> and <webapp_name> to your own
$token = az account get-access-token --resource https://management.azure.com --query accessToken -o tsv ; $history_id = az webapp webjob triggered log --resource-group <resourcegroup_name> --name <webapp_name> --webjob-name train-mbti-model --query "[0].id" -o tsv | ForEach-Object { ($_ -split "/history/")[-1] } ; $response = Invoke-RestMethod -Uri "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/triggeredwebjobs/train-mbti-model/history/$history_id/?api-version=2024-04-01" -Headers @{ Authorization = "Bearer $token" } -Method GET ; $log_url = $response.properties.output_url ; Invoke-RestMethod -Uri $log_url -Headers @{ Authorization = "Bearer $token" } -Method GET
Once you see the report in the Logs, it indicates that the training is complete, and the Flask app is ready for predictions.
You can also find the newly trained models in the File Share mounted in your local environment.
Using the Model for Prediction
Just like in local testing, open a terminal and use the following curl commands to send requests to the app:
# Note: Replace the instance of tensorflow-bicep-app with the name of your web app.
curl -X GET "https://tensorflow-bicep-app.azurewebsites.net/api/detect" -H "Content-Type: application/json" -d '{"post": "I am happy"}'
As with the local environment, you should see the expected results.
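If you would rather call the deployed endpoint from Python instead of curl, a minimal sketch (again, replace tensorflow-bicep-app with your own web app name; assumes the requests package):

```python
import requests

resp = requests.get(
    "https://tensorflow-bicep-app.azurewebsites.net/api/detect",
    headers={"Content-Type": "application/json"},
    json={"post": "I am happy"},  # same request body as the curl example
)
print(resp.json())
```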
7. Troubleshooting
docker.log freeze after deployment
- Symptom: After Azure DevOps publishes the code to the Web App via the Kudu site, the latest deployment status cannot be retrieved, and the front page returns a 504 error.
- Cause: This project includes two virtual environments, each containing a TensorFlow package. During the start.sh process of creating these environments, each environment takes approximately 10 minutes to set up. As a result, the docker.log or Log Stream might temporarily stall for about 20 minutes at a certain stage.
- Resolution: After roughly 20 minutes, once all the packages are downloaded, the logs will resume recording.
Others
Using Scikit-learn on Azure Web App
8. Conclusion
TensorFlow, much like a Swiss Army knife, encompasses a wide range of training algorithms. While Azure Web App is not typically used for training models, it can still serve as a platform for inference. In the future, I plan to introduce how pre-trained models can be directly loaded using JavaScript. This approach allows the inference workload for non-sensitive models to be offloaded to the client side.