Using the well-known artificial intelligence framework TensorFlow on Azure Web App can help you bring your ideas to life more quickly. This tutorial provides a step-by-step guide to help you deploy your TensorFlow project on an Azure Web App, covering everything from resource setup to troubleshooting common issues.
TOC

- Introduction to TensorFlow
- System Architecture
  - Architecture
  - Focus of This Tutorial
- Setup Azure Resources
  - File and Directory Structure
  - Bicep Template
- Running Locally
  - Training Models and Training Data
  - Predicting with the Model
- Publishing the Project to Azure
  - Code Commit to Azure DevOps
  - Publish to Azure Web App via Pipeline
- Running on Azure Web App
  - Training the Model
  - Using the Model for Prediction
- Troubleshooting
  - docker.log freeze after deployment
  - Others
- Conclusion
- References
1. Introduction to TensorFlow
TensorFlow is an open-source machine learning framework developed by Google. It provides tools for building and deploying machine learning models, with a focus on flexibility and scalability. TensorFlow supports deep learning, classical machine learning, and neural network models, enabling tasks like image recognition, natural language processing, and time series forecasting.
At its core, TensorFlow uses computational graphs to model mathematical operations, allowing efficient computation on CPUs, GPUs, and TPUs. It features a high-level API, Keras, for easy model building, as well as lower-level APIs for advanced customization. TensorFlow also supports distributed training for large-scale datasets and diverse deployment options, including cloud services, mobile devices, and edge computing.
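To give a sense of what the Keras high-level API looks like, here is a minimal, generic sketch (not part of this tutorial's project); the layer sizes and the random placeholder data are illustrative only:

```python
import numpy as np
import tensorflow as tf

# Build a tiny 3-class classifier with the Keras Sequential API
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random placeholder data, just to show the training call
x = np.random.rand(100, 20).astype("float32")
y = np.random.randint(0, 3, size=(100,))
model.fit(x, y, epochs=2, batch_size=16, verbose=0)
```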
TensorFlow’s ecosystem includes TensorFlow Hub (pre-trained models), TensorFlow Lite (for mobile and IoT), and TensorFlow.js (for JavaScript applications). Its integration with visualization tools like TensorBoard simplifies debugging and performance monitoring. TensorFlow excels in production environments, offering features like TensorFlow Extended (TFX) for end-to-end ML pipelines.
With its versatile capabilities and large community, TensorFlow is widely used in industries like healthcare, finance, and technology, making it one of the most powerful tools for modern machine learning development.
2. System Architecture
Architecture
Development Environment
| Item | Value |
| --- | --- |
| OS | macOS |
| Version | Sonoma 14.1.1 |
| Python Version | 3.9.20 |
Azure Resources
| Resource | Configuration |
| --- | --- |
| App Service Plan | SKU - Premium Plan 0 V3 |
| App Service | Platform - Linux (Python 3.9, Version 3.9.19) |
| Storage Account | SKU - General Purpose V2 |
| File Share | No backup plan |
Focus of This Tutorial
This tutorial walks you through the following stages:
- Setting up Azure resources
- Running the project locally
- Publishing the project to Azure
- Running the application on Azure
- Troubleshooting common issues
Each of these aspects can be handled with a variety of tools and approaches. The choices used in this tutorial are marked (V) in the tables below.
Local OS

| Windows | Linux | Mac |
| --- | --- | --- |
|  |  | V |

How to setup Azure resources

| Portal (i.e., REST API) | ARM | Bicep | Terraform |
| --- | --- | --- | --- |
|  |  | V |  |

How to deploy project to Azure

| VSCode | CLI | Azure DevOps | GitHub Action |
| --- | --- | --- | --- |
|  |  | V |  |
3. Setup Azure Resources
File and Directory Structure
Please open a terminal and enter the following commands:
git clone https://github.com/theringe/azure-appservice-ai.git
cd azure-appservice-ai
bash ./tensorflow/tools/add-venv.sh
If you are using a Windows platform, use the following alternative PowerShell commands instead:
git clone https://github.com/theringe/azure-appservice-ai.git
cd azure-appservice-ai
.\tensorflow\tools\add-venv.cmd
After completing the execution, you should see the following directory structure:
| File and Path | Purpose |
| --- | --- |
| tensorflow/tools/add-venv.* | The script executed in the previous step (cmd for Windows, sh for Linux/Mac) to create all Python virtual environments required for this tutorial. |
| .venv/tensorflow-webjob/ | A virtual environment specifically used for training models (i.e., tokenizing text and constructing a neural network for training). |
|  | The list of packages (with exact versions) required for the tensorflow-webjob virtual environment. |
| .venv/tensorflow/ | A virtual environment specifically used for the Flask application, enabling API endpoint access for querying predictions (i.e., MBTI). |
|  | The list of packages (with exact versions) required for the tensorflow virtual environment. |
| tensorflow/ | The main folder for this tutorial. |
| tensorflow/tools/bicep-template.bicep | The Bicep template to setup all the Azure resources related to this tutorial, including an App Service Plan, a Web App, and a Storage Account. |
| tensorflow/tools/create-folder.* | A script to create all directories required for this tutorial in the File Share, including train, model, and test. |
| tensorflow/tools/download-sample-training-set.* | A script to download a sample training set (from an MBTI text classifier project based on the Kaggle MBTI dataset, containing MBTI types and post data from social media platforms) into the train directory of the File Share. |
| tensorflow/webjob/train_mbti_model.py | A script that tokenizes the posts from each record, trains an LSTM-based model for MBTI classification, and saves the embedding vectors in the model directory of the File Share. |
| tensorflow/App_Data/jobs/triggered/train-mbti-model/train_mbti_model.sh | A shell script for Azure App Service WebJobs. It activates the tensorflow-webjob virtual environment and starts the train_mbti_model.py script. |
| tensorflow/api/app.py | Code for the Flask application, including routes, port configuration, input parsing, vector loading, predictions, and output generation. |
| tensorflow/start.sh | A script executed after deployment (as specified in the Startup Command of the Bicep template, introduced later). It sets up the virtual environment and starts the Flask application to handle web requests. |
| tensorflow/pipeline.yml | A process document for an Azure DevOps pipeline, detailing the steps to deploy code to an Azure Web App. |
Bicep Template
We need to create the following resources or services:

| Name | Manual Creation Required | Resource/Service |
| --- | --- | --- |
| App Service Plan | No | Resource (plan) |
| App Service | Yes | Resource (app) |
| Storage Account | Yes | Resource (storageAccount) |
| File Share | Yes | Service |
Let’s take a look at the tensorflow/tools/bicep-template.bicep file. Refer to the configuration section for all the resources.
Since most of the configuration values don’t require changes, I’ve placed them in the variables section of the Bicep template rather than the parameters section. This helps keep the configuration simpler. However, I’d still like to briefly explain some of the more critical settings.
As you can see, I’ve adopted a camelCase naming convention, which combines the [Resource Type] with [Setting Name and Hierarchy]. This makes it easier to understand where each setting will be used. The configurations in the diagram are sorted by resource name, but the following list is categorized by functionality for better clarity.
| Configuration Name | Value | Purpose |
| --- | --- | --- |
| storageAccountFileShareName | data-and-model | [Purpose 1: Link File Share to Web App] |
| storageAccountFileShareShareQuota | 5120 | [Purpose 1: Link File Share to Web App] |
| storageAccountFileShareEnabledProtocols | SMB | [Purpose 1: Link File Share to Web App] |
| appSiteConfigAzureStorageAccountsType | AzureFiles | [Purpose 1: Link File Share to Web App] |
| appSiteConfigAzureStorageAccountsProtocol | Smb | [Purpose 1: Link File Share to Web App] |
| planKind | linux | [Purpose 2: Specify platform and stack runtime] Select Linux (the default if the Python stack is chosen) |
| planSkuTier | Premium0V3 | [Purpose 2: Specify platform and stack runtime] |
| planSkuName | P0v3 | [Purpose 2: Specify platform and stack runtime] |
| appKind | app,linux | [Purpose 2: Specify platform and stack runtime] Same as above |
| appSiteConfigLinuxFxVersion | PYTHON\|3.9 | [Purpose 2: Specify platform and stack runtime] |
| appSiteConfigAppSettingsWEBSITES_CONTAINER_START_TIME_LIMIT | 1800 | [Purpose 3: Deploying] The value is in seconds, allowing the Startup Command to keep running beyond the default timeout of 230 seconds. This tutorial’s Startup Command typically takes around 1200 seconds, so setting it to 1800 seconds (the maximum value) provides a safety margin and accommodates future project expansion (e.g., adding more packages) |
| appSiteConfigAppCommandLine | [ -f /home/site/wwwroot/start.sh ] && bash /home/site/wwwroot/start.sh \|\| GUNICORN_CMD_ARGS=\"--timeout 600 --access-logfile '-' --error-logfile '-' -c /opt/startup/gunicorn.conf.py --chdir=/opt/defaultsite\" gunicorn application:app | [Purpose 3: Deploying] This is the Startup Command, which breaks down into three parts: check whether /home/site/wwwroot/start.sh exists; if it does, run it; otherwise fall back to the platform’s default gunicorn startup. Since the command is enclosed in double quotes within the Bicep template, replace \" with " during actual execution |
| appSiteConfigAppSettingsSCM_DO_BUILD_DURING_DEPLOYMENT | false | [Purpose 3: Deploying] Since the handling of the different virtual environments is already defined in start.sh, the Web App’s default build process does not need to run |
| appSiteConfigAppSettingsWEBSITES_ENABLE_APP_SERVICE_STORAGE | true | [Purpose 4: Webjobs] Required to enable the App Service storage feature, which is necessary for using WebJobs (e.g., for model training) |
| storageAccountPropertiesAllowSharedKeyAccess | true | [Purpose 5: Troubleshooting] |
Return to the terminal and execute the following commands (their purpose was described earlier).
# Please change <ResourceGroupName> to your preferred name, for example: azure-appservice-ai
# Please change <RegionName> to your preferred region, for example: eastus2
# Please change <ResourcesPrefixName> to your preferred naming pattern, for example: tensorflow-bicep (this will create tensorflow-bicep-asp as the App Service Plan, tensorflow-bicep-app as the Web App, and tensorflowbicepsa as the Storage Account)
az group create --name <ResourceGroupName> --location <RegionName>
az deployment group create --resource-group <ResourceGroupName> --template-file ./tensorflow/tools/bicep-template.bicep --parameters resourcePrefix=<ResourcesPrefixName>
If you are using a Windows platform, use the following alternative PowerShell commands instead:
# Please change <ResourceGroupName> to your preferred name, for example: azure-appservice-ai
# Please change <RegionName> to your preferred region, for example: eastus2
# Please change <ResourcesPrefixName> to your preferred naming pattern, for example: tensorflow-bicep (this will create tensorflow-bicep-asp as the App Service Plan, tensorflow-bicep-app as the Web App, and tensorflowbicepsa as the Storage Account)
az group create --name <ResourceGroupName> --location <RegionName>
az deployment group create --resource-group <ResourceGroupName> --template-file .\tensorflow\tools\bicep-template.bicep --parameters resourcePrefix=<ResourcesPrefixName>
After execution, please copy the three key-value pairs from the output section of the result, as shown below.
Return to the terminal and execute the following commands:
# Please setup 3 variables you've got from the previous step
OUTPUT_STORAGE_NAME="<outputStorageName>"
OUTPUT_STORAGE_KEY="<outputStorageKey>"
OUTPUT_SHARE_NAME="<outputShareName>"
# URL encode the storage key
ENCODED_OUTPUT_STORAGE_KEY=$(python3 -c "
import urllib.parse
key = '''$OUTPUT_STORAGE_KEY'''
encoded_key = urllib.parse.quote(key, safe='') # No safe characters, encode everything
print(encoded_key)
")
# Mount
open smb://$OUTPUT_STORAGE_NAME:$ENCODED_OUTPUT_STORAGE_KEY@$OUTPUT_STORAGE_NAME.file.core.windows.net/$OUTPUT_SHARE_NAME
Alternatively, you can go to the Azure Portal, navigate to the File Share you just created, and copy the required command as shown in the diagram below. Choose Linux or Windows according to the OS of your development environment.
After executing the command, the network drive will be successfully mounted.
4. Running Locally
Training Models and Training Data
Return to the terminal and execute the following commands (their purpose was described earlier).
source .venv/tensorflow-webjob/bin/activate
bash ./tensorflow/tools/create-folder.sh
bash ./tensorflow/tools/download-sample-training-set.sh
python ./tensorflow/webjob/train_mbti_model.py
If you are using a Windows platform, use the following alternative PowerShell commands instead:
.\.venv\tensorflow-webjob\Scripts\Activate.ps1
.\tensorflow\tools\create-folder.cmd
.\tensorflow\tools\download-sample-training-set.cmd
python .\tensorflow\webjob\train_mbti_model.py
After execution, the File Share will now include the following directories and files.
Let’s take a brief detour to examine the structure of the training data downloaded from GitHub.
The dataset used in this project focuses on MBTI (Myers-Briggs Type Indicator) personality types. Each record in the dataset contains a user’s MBTI type and a collection of their social media posts, separated by |||. This tutorial repurposes the dataset to classify personality types based on textual data.
This image represents the raw data, where each line includes an MBTI type and its associated text. For training, the posts are tokenized and transformed into numerical sequences using TensorFlow's preprocessing tools. This step involves converting each word into a corresponding token based on a fixed vocabulary size. These sequences are then padded to a uniform length, ensuring consistency in the input data.
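To make the tokenize-and-pad step concrete, here is a minimal sketch using TensorFlow’s preprocessing utilities; the sample record, vocabulary size, and sequence length are illustrative assumptions, not the repository’s actual values:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Each raw record is "TYPE,post1|||post2|||..."; split the posts apart
record = "INTJ,enjoyed reading about this|||what do you think about it"
mbti_type, raw_posts = record.split(",", 1)
posts = raw_posts.split("|||")

# Map words to integer tokens over a fixed vocabulary size
tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")
tokenizer.fit_on_texts(posts)
sequences = tokenizer.texts_to_sequences(posts)

# Pad every sequence to the same length so the model input is uniform
padded = pad_sequences(sequences, maxlen=100, padding="post", truncating="post")
print(mbti_type, padded.shape)  # e.g. INTJ (2, 100)
```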
During training, the performance is heavily influenced by factors like data balancing and hyperparameter tuning. The MBTI dataset is inherently imbalanced, with certain personality types appearing far more frequently than others. To address this, only 30 samples per type are used in training to ensure balance. However, this approach simplifies the task and may lead to suboptimal results.
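The balancing step might look like the following sketch, assuming pandas and a CSV with a type column; the file path and column name are hypothetical:

```python
import pandas as pd

df = pd.read_csv("train/mbti.csv")  # hypothetical path to the downloaded set

# Keep at most 30 samples per MBTI type so all 16 classes are balanced
balanced = df.groupby("type").head(30)
print(balanced["type"].value_counts())
```

Using .head(30) keeps the first 30 rows per type; .sample(n=30) would draw a random subset instead.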
The inference step involves tokenizing a new input post and passing it through the trained model to predict the MBTI type. It is important to note that with the current setup, the inference results may often return the same prediction. This is due to the limited dataset size, imbalanced data handling, and the need for further tuning of training parameters such as the number of epochs, batch size, and learning rate.
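A possible shape of the inference path is sketched below; the artifact file names under the model directory are hypothetical, and the label list assumes the standard 16 MBTI types:

```python
import pickle
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical artifact names; the real files live in the File Share's model directory
model = tf.keras.models.load_model("model/mbti_model.h5")
with open("model/tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)
labels = ["INTJ", "INTP", "ENTJ", "ENTP", "INFJ", "INFP", "ENFJ", "ENFP",
          "ISTJ", "ISFJ", "ESTJ", "ESFJ", "ISTP", "ISFP", "ESTP", "ESFP"]

# Tokenize the new post the same way the training data was tokenized
seq = tokenizer.texts_to_sequences(["I am happy"])
x = pad_sequences(seq, maxlen=100, padding="post")
probs = model.predict(x)            # shape (1, 16)
print(labels[int(np.argmax(probs))])
```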
This tutorial introduces an approach to training and inferring MBTI personality types using TensorFlow. While the process highlights key steps like data preprocessing, model training, and inference, it does not delve deeply into AI-specific topics like advanced model optimization or deployment. To achieve better results, the dataset could be expanded to include more samples per personality type, and hyperparameters such as the learning rate, number of epochs, and embedding dimensions could be fine-tuned.
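For reference before moving on, an LSTM-based classifier of the shape described above could be defined roughly like this; the vocabulary size, embedding dimension, and training settings are illustrative assumptions, not the repository’s actual code:

```python
import tensorflow as tf

VOCAB_SIZE, MAX_LEN, NUM_TYPES = 10000, 100, 16  # assumed hyperparameters

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),               # word -> dense vector
    tf.keras.layers.LSTM(64),                                # sequence encoder
    tf.keras.layers.Dense(NUM_TYPES, activation="softmax"),  # 16 MBTI types
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# padded: int array of shape (num_samples, MAX_LEN); y: class ids in 0..15
# model.fit(padded, y, epochs=10, batch_size=32)
```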
Predicting with the Model
Return to the terminal and execute the following commands: first deactivate the current virtual environment, then activate the virtual environment for the Flask application, and finally start the Flask app.
Commands for Linux or Mac:
deactivate
source .venv/tensorflow/bin/activate
python ./tensorflow/api/app.py
Commands for Windows:
deactivate
.\.venv\tensorflow\Scripts\Activate.ps1
python .\tensorflow\api\app.py
When you see a screen similar to the following, it means the server has started successfully. Press Ctrl+C to stop the server if needed.
Before conducting the actual test, let’s construct some sample query data:
I am happy
Next, open a terminal and use the following curl commands to send requests to the app:
curl -X GET "http://0.0.0.0:8000/api/detect" -H "Content-Type: application/json" -d '{"post": "I am happy"}'
You should see the prediction results.
PS: Your results may differ from mine due to variations in the sampled training data.
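For context, the /api/detect route in tensorflow/api/app.py presumably has a shape like the following sketch; this is an assumption based on the request format above, not the repository’s actual code:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/api/detect", methods=["GET"])
def detect():
    payload = request.get_json(force=True)    # {"post": "I am happy"}
    post = payload.get("post", "")
    # ... tokenize the post, run the model, map the argmax to an MBTI type ...
    prediction = "INFP"                       # placeholder result
    return jsonify({"post": post, "mbti": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```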
5. Publishing the Project to Azure
Code Commit to Azure DevOps
First, create a new and empty repository (referred to as repo) under your Azure DevOps project and get its URL.
Open a terminal in the cloned azure-appservice-ai project directory and run the following commands to add the new repo as a push/pull target. Then, verify the associated git repos for the directory.
git remote add azure https://<organization>@dev.azure.com/<organization>/<project>/_git/azure-appservice-ai
git remote -v
Next, run the following commands in the terminal to push the entire project to the Azure DevOps repo.
git push azure --all
The following steps need to be performed only once. These configurations ensure that the pipeline can automatically deploy the tensorflow portion of the azure-appservice-ai project to the newly created Azure Web App.
Setup Service Connection:
- Go to Project Settings > Service connections in Azure DevOps and create a new Azure Resource Manager service connection for your subscription.
- Specify the Service connection name as "azure-appservice-ai-tensorflow" (you can use any name for easy identification).
Create Pipeline YAML File:
- Navigate to the tensorflow subdirectory and create a new file named azure-pipeline.yml
- Copy the contents of another file named pipeline.yml (in the same directory) into azure-pipeline.yml
- Modify the variables section as indicated by the comments, then save and commit the changes.
Setup the Pipeline:
- Navigate to the Pipelines section and create a new pipeline.
- Follow the prompts to select the newly created azure-pipeline.yml as the pipeline script file.
- Save the configuration (do not run it yet).
The above setup steps only need to be done once. Next, you can deploy the project to the Azure Web App using the pipeline in different ways.
Publish to Azure Web App via Pipeline
Manual Trigger:
- Navigate to the newly created pipeline.
- Click Run Pipeline to start the deployment process.
- Click on the deployment to monitor its progress. Below is an example of a successful deployment screen.
Trigger on Push:
Alternatively, you can configure the pipeline to run automatically whenever new code is pushed to the Azure DevOps repo:
- Open a terminal and run the following commands (after code updates):
git push azure --all
- This will trigger a new pipeline deployment process.
6. Running on Azure Web App
Training the Model
Return to the terminal and execute the following commands to invoke the WebJob.
Commands for Linux or Mac:
# Please change <subscription_id> <resourcegroup_name> and <webapp_name> to your own
token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv) ; curl -X POST -H "Authorization: Bearer $token" -H "Content-Type: application/json" -d '{}' "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/triggeredwebjobs/train-mbti-model/run?api-version=2024-04-01"
Commands for Windows:
# Please change <subscription_id> <resourcegroup_name> and <webapp_name> to your own
$token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv) ; Invoke-RestMethod -Uri "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/triggeredwebjobs/train-mbti-model/run?api-version=2024-04-01" -Headers @{Authorization = "Bearer $token"; "Content-type" = "application/json"} -Method POST -Body '{}'
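If you prefer a single cross-platform script, here is a hedged Python sketch of the same REST call; it assumes the Azure CLI is installed and logged in, and that the requests package is available:

```python
import subprocess
import requests

# Replace the placeholders with your own values
SUBSCRIPTION_ID = "<subscription_id>"
RESOURCE_GROUP = "<resourcegroup_name>"
WEBAPP_NAME = "<webapp_name>"

# Reuse the Azure CLI to obtain a management-plane access token
token = subprocess.check_output(
    ["az", "account", "get-access-token",
     "--resource", "https://management.azure.com",
     "--query", "accessToken", "-o", "tsv"],
    text=True).strip()

url = (f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
       f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Web"
       f"/sites/{WEBAPP_NAME}/triggeredwebjobs/train-mbti-model/run"
       "?api-version=2024-04-01")

resp = requests.post(url, json={}, headers={"Authorization": f"Bearer {token}"})
print(resp.status_code)  # 200/202 means the WebJob run was accepted
```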
You can check the training status by executing the following commands.
Commands for Linux or Mac:
# Please change <subscription_id> <resourcegroup_name> and <webapp_name> to your own
token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv) ; response=$(curl -s -H "Authorization: Bearer $token" "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/webjobs?api-version=2024-04-01") ; echo "$response" | jq
Commands for Windows:
# Please change <subscription_id> <resourcegroup_name> and <webapp_name> to your own
$token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv); $response = Invoke-RestMethod -Uri "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/webjobs?api-version=2024-04-01" -Headers @{Authorization = "Bearer $token"} -Method GET ; $response | ConvertTo-Json -Depth 10
While the job is running, the WebJob status in the response appears as Processing; once training finishes, it changes to Complete.
You can retrieve the latest detailed log by executing the following commands.
Commands for Linux or Mac:
# Please change <subscription_id> <resourcegroup_name> and <webapp_name> to your own
token=$(az account get-access-token --resource https://management.azure.com --query accessToken -o tsv) ; history_id=$(az webapp webjob triggered log --resource-group <resourcegroup_name> --name <webapp_name> --webjob-name train-mbti-model --query "[0].id" -o tsv | sed 's|.*/history/||') ; response=$(curl -X GET -H "Authorization: Bearer $token" -H "Content-Type: application/json" "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/triggeredwebjobs/train-mbti-model/history/$history_id/?api-version=2024-04-01") ; log_url=$(echo "$response" | jq -r '.properties.output_url') ; curl -X GET -H "Authorization: Bearer $token" "$log_url"
Commands for Windows:
# Please change <subscription_id> <resourcegroup_name> and <webapp_name> to your own
$token = az account get-access-token --resource https://management.azure.com --query accessToken -o tsv ; $history_id = az webapp webjob triggered log --resource-group <resourcegroup_name> --name <webapp_name> --webjob-name train-mbti-model --query "[0].id" -o tsv | ForEach-Object { ($_ -split "/history/")[-1] } ; $response = Invoke-RestMethod -Uri "https://management.azure.com/subscriptions/<subscription_id>/resourceGroups/<resourcegroup_name>/providers/Microsoft.Web/sites/<webapp_name>/triggeredwebjobs/train-mbti-model/history/$history_id/?api-version=2024-04-01" -Headers @{ Authorization = "Bearer $token" } -Method GET ; $log_url = $response.properties.output_url ; Invoke-RestMethod -Uri $log_url -Headers @{ Authorization = "Bearer $token" } -Method GET
Once you see the report in the Logs, it indicates that the training is complete, and the Flask app is ready for predictions.
You can also find the newly trained models in the File Share mounted in your local environment.
Using the Model for Prediction
Just like in local testing, open a terminal and use the following curl commands to send requests to the app:
# Note: Replace the instance of tensorflow-bicep-app with the name of your web app.
curl -X GET "https://tensorflow-bicep-app.azurewebsites.net/api/detect" -H "Content-Type: application/json" -d '{"post": "I am happy"}'
As with the local environment, you should see the expected results.
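If you would rather call the deployed endpoint from Python instead of curl, a minimal sketch (again, replace tensorflow-bicep-app with your own web app name; assumes the requests package):

```python
import requests

resp = requests.get(
    "https://tensorflow-bicep-app.azurewebsites.net/api/detect",
    headers={"Content-Type": "application/json"},
    json={"post": "I am happy"},  # same request body as the curl example
)
print(resp.json())
```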
7. Troubleshooting
docker.log freeze after deployment
- Symptom: After Azure DevOps publishes the code to the Web App via the Kudu site, the latest deployment status cannot be retrieved, and the front page returns a 504 error.
- Cause: This project includes two virtual environments, each containing a TensorFlow package. During the start.sh process of creating these environments, each environment takes approximately 10 minutes to set up. As a result, the docker.log or Log Stream might temporarily stall for about 20 minutes at a certain stage.
- Resolution: After roughly 20 minutes, once all the packages are downloaded, the logs will resume recording.
Others
Using Scikit-learn on Azure Web App
8. Conclusion
TensorFlow, much like a Swiss Army knife, encompasses a wide range of training algorithms. While Azure Web App is not typically used for training models, it can still serve as a platform for inference. In the future, I plan to introduce how pre-trained models can be directly loaded using JavaScript. This approach allows the inference workload for non-sensitive models to be offloaded to the client side.