AI allows you to deliver breakthrough experiences in your apps. With Azure Cognitive Services, you can easily customize and deploy the same AI models that power Microsoft’s products, such as Xbox and Bing, using the tools and languages of your choice.
In this blog post we will walk through an exercise that you can complete in under an hour, building a useful application while exploring a set of Azure services. If you have ever wanted to get your financial transactions in order, look no further. With this exercise, we’ll explore how to quickly snap a picture of a receipt with your phone and upload it to categorize expenses, create expense reports, and gain insights into your spending. Remember, even though we’ll walk you through each step, you can always explore the sample code and get creative with your own unique solution!
Features of the application:
- Snap a picture of your receipt and upload it using your smartphone
- Extract relevant data from the images: Who issued the receipt? What was the total amount? What was purchased? All of this information can be effortlessly stored for exploration
- Query the data: bring your receipts to life by extracting relevant and insightful information
Prerequisites
- If you don't have an Azure subscription, create a free account before you begin. If you have a subscription, log in to the Azure Portal.
- You will need to have Python installed locally to run some of the samples.
Key Azure technologies:
- Azure Form Recognizer scans image documents with optical character recognition and extracts text, key/value pairs, and tables from documents, receipts, and forms.
- Form Recognizer’s prebuilt receipt model specifically extracts receipt data
- Azure Blob Storage is used to store data
- Azure Cognitive Search enriches the data by making it easily identifiable
Solution Architecture
App Architecture Description:
- User uploads a receipt image from their mobile device
- The uploaded image is verified and then sent to the Azure Form Recognizer to extract information
- The image is analysed by the REST API within the Form Recognizer prebuilt receipt model
- A JSON response is returned containing both the text and the bounding box coordinates of the extracted receipt data
- The response is parsed into a simpler JSON document that keeps only the relevant information
- This receipt JSON is then stored in Azure Blob Storage
- Azure Cognitive Search points directly to Azure Blob Storage and is used to index the data
- The application queries this search index to extract relevant information from the receipts
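The "verified" step in the flow above is not spelled out in this post. As a minimal sketch of what it could look like, the snippet below checks the leading bytes of an upload and picks a matching Content-Type for the analyze call. The function names, the list of formats, and the size limit are illustrative assumptions, so check the Form Recognizer documentation for the formats and limits that apply to your API version.

```python
# Leading "magic" bytes for a few common upload formats.
# Assumption: this list is illustrative, not the service's official list.
MAGIC_NUMBERS = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF", "application/pdf"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
]

# Hypothetical app-side limit, not a documented service limit.
MAX_UPLOAD_BYTES = 4 * 1024 * 1024


def detect_content_type(data):
    """Return a MIME type based on the first bytes, or None if unknown."""
    for magic, mime in MAGIC_NUMBERS:
        if data.startswith(magic):
            return mime
    return None


def verify_upload(data, max_bytes=MAX_UPLOAD_BYTES):
    """Validate an upload and return the Content-Type to send with it."""
    if not data:
        raise ValueError("upload is empty")
    if len(data) > max_bytes:
        raise ValueError("upload is larger than the configured limit")
    content_type = detect_content_type(data)
    if content_type is None:
        raise ValueError("unrecognised file format")
    return content_type
```

The returned value could replace a hard-coded 'Content-Type' header, so that a PNG or PDF receipt is not sent as a JPEG.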
Another visual of the flow of data within the solution architecture is shown below.
Now that we’ve explored the technology and services we’ll be using, let’s dive into building our app!
Implementation
To get started, data must be extracted from the receipts: set up the Form Recognizer service in Azure and connect to it to use the receipt API. The service returns JSON containing the extracted information, which is stored in Azure Blob Storage for use by Azure Cognitive Search. Cognitive Search then indexes the receipt data and lets you search it for relevant information.
High-level overview of steps, along with sample code snippets for illustration:
- Go to the Azure portal and create a new Form Recognizer resource. In the Create pane, provide the following information:
Name | A descriptive name for your resource.
Subscription | Select the Azure subscription which has been granted access.
Location | The location of your cognitive service instance. Different locations may introduce latency, but have no impact on the runtime availability of your resource.
Pricing Tier | The cost of your resource depends on the pricing tier you choose and your usage. For more information, see the API pricing details.
Resource Group | The Azure resource group that will contain your resource. You can create a new group or add it to a pre-existing group.
- After Form Recognizer deploys, go to All Resources and locate the newly deployed resource. Save the key and endpoint from the resource’s key and endpoint page somewhere so you can access them later.
- You can use the following Analyze Receipt API call to start analyzing the receipt. Remember to replace <endpoint url> and <subscription key> with the values you saved earlier, and replace <path to your receipt> with the local path to your scanned receipt image.
    # Analyse script
    import json
    import time
    from requests import get, post

    # Endpoint URL
    endpoint = r"<endpoint url>"
    apim_key = "<subscription key>"
    post_url = endpoint + "/formrecognizer/v2.0/prebuilt/receipt/analyze"
    source = r"<path to your receipt>"

    headers = {
        # Request headers
        'Content-Type': 'image/jpeg',
        'Ocp-Apim-Subscription-Key': apim_key,
    }

    params = {
        "includeTextDetails": True
    }

    with open(source, "rb") as f:
        data_bytes = f.read()

    try:
        resp = post(url=post_url, data=data_bytes, headers=headers, params=params)
        if resp.status_code != 202:
            print("POST analyze failed:\n%s" % resp.text)
            quit()
        print("POST analyze succeeded:\n%s" % resp.headers)
        get_url = resp.headers["operation-location"]
    except Exception as e:
        print("POST analyze failed:\n%s" % str(e))
        quit()
- If you run this code and everything is as it should be, you'll receive a 202 (Accepted) response that includes an Operation-Location header, which the script will print to the console. This header contains a URL whose final path segment is the operation ID; you use it to query the status of the asynchronous operation and get the results.
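If you want to log or store the operation ID on its own, a small helper like the one below can pull it out of the header value. This is only a sketch: the function name is ours, and the URL in the usage comment is a made-up placeholder rather than a real service response.

```python
from urllib.parse import urlparse


def operation_id_from_location(operation_location):
    """Return the last path segment of an Operation-Location URL."""
    path = urlparse(operation_location).path
    return path.rstrip("/").rsplit("/", 1)[-1]


# Usage with a placeholder URL (hypothetical, for illustration only):
# operation_id_from_location("https://example.invalid/formrecognizer/operations/1234abcd")
```

The script above does not need this step, because it calls the full URL in the header directly.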
- Now you can call the Get Analyze Receipt Result API to get the Extracted Data.
    # Get results.
    n_tries = 10
    n_try = 0
    wait_sec = 6
    while n_try < n_tries:
        try:
            resp = get(url = get_url, headers = {"Ocp-Apim-Subscription-Key": apim_key})
            resp_json = json.loads(resp.text)
            if resp.status_code != 200:
                print("GET Receipt results failed:\n%s" % resp_json)
                quit()
            status = resp_json["status"]
            if status == "succeeded":
                print("Receipt Analysis succeeded:\n%s" % resp_json)
                quit()
            if status == "failed":
                print("Analysis failed:\n%s" % resp_json)
                quit()
            # Analysis still running. Wait and retry.
            time.sleep(wait_sec)
            n_try += 1
        except Exception as e:
            msg = "GET analyze results failed:\n%s" % str(e)
            print(msg)
            quit()
This code uses the operation id and makes another API call.
- The returned JSON can be examined to get the required information: the ‘readResults’ field contains all lines of text that were decipherable, and the ‘documentResults’ field contains key/value information for the most relevant parts of the receipt (e.g. the merchant, total, line items, etc.)
The receipt image below yields the following extracted values:
MerchantName: THE MAD HUNTER
TransactionDate: 2020-08-23
TransactionTime: 22:07:00
Total: £107.10
- We will now create a JSON from all the data extracted from the analysed receipt. The structure of the JSON is shown below:
    {
        "id": "INV001",
        "user": "Sujith Kumar",
        "createdDateTime": "2020-10-23T17:16:32Z",
        "MerchantName": "THE MAD HUNTER",
        "TransactionDate": "2020-10-23",
        "TransactionTime": "22:07:00",
        "currency": "GBP",
        "Category": "Entertainment",
        "Total": "107.10",
        "Items": [ ]
    }
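To show how the analysis result might be reduced to this simpler structure, here is a minimal sketch. It assumes the response nests the receipt fields under analyzeResult > documentResults > fields, with typed entries such as valueString or valueNumber next to a raw text entry. The helper names are ours, and the id, user, category, and currency are passed in by the app because they are not read from the receipt here.

```python
from datetime import datetime, timezone


def field_value(fields, name):
    """Return the typed value of a field if present, else its raw text."""
    field = fields.get(name)
    if not field:
        return None
    for key in ("valueString", "valueDate", "valueTime", "valueNumber"):
        if key in field:
            return field[key]
    return field.get("text")


def format_total(value):
    """Format numeric totals with two decimals; leave other values as-is."""
    if isinstance(value, (int, float)):
        return f"{value:.2f}"
    return value


def build_receipt_record(analyze_response, receipt_id, user, category,
                         currency, created=None):
    """Build the simplified receipt document from an analyze response."""
    # Assumed response layout; adjust if your API version differs.
    fields = analyze_response["analyzeResult"]["documentResults"][0]["fields"]
    if created is None:
        created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "id": receipt_id,
        "user": user,
        "createdDateTime": created,
        "MerchantName": field_value(fields, "MerchantName"),
        "TransactionDate": field_value(fields, "TransactionDate"),
        "TransactionTime": field_value(fields, "TransactionTime"),
        "currency": currency,
        "Category": category,
        "Total": format_total(field_value(fields, "Total")),
        "Items": [],  # line items are left out of this sketch
    }
```

In the polling script you would call build_receipt_record(resp_json, ...) once the status is "succeeded", before the script exits.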
We can now save this JSON and build a search service to extract the information we want from it.
Before continuing to the next step, you must have an Azure Storage Account with Blob storage.
- We will now save the JSON files in an Azure Blob Storage container and use it as a source for the Azure Cognitive Search Service Index that we will create.
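One way to save each receipt document is with the azure-storage-blob Python package. The sketch below takes a container client as a parameter so it can be tried with any object that offers an upload_blob method. The function name and the blob naming scheme (one file per receipt id) are our own assumptions.

```python
import json


def upload_receipt(container_client, record):
    """Upload one receipt document as '<id>.json' and return the blob name."""
    blob_name = f"{record['id']}.json"
    payload = json.dumps(record, ensure_ascii=False).encode("utf-8")
    # overwrite=True replaces an existing blob with the same name.
    container_client.upload_blob(name=blob_name, data=payload, overwrite=True)
    return blob_name


# Usage, assuming azure-storage-blob (v12) is installed and the
# container already exists:
#
# from azure.storage.blob import ContainerClient
# client = ContainerClient.from_connection_string(
#     "<connection string>", "<container name>")
# upload_receipt(client, record)
```

Keeping one JSON file per receipt means each blob can map to one search document when the container is indexed.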
- Sign in to the Azure Portal and search for "Azure Cognitive Search" or navigate to the resource through Web > Azure Cognitive Search. Follow the steps to:
- Choose a subscription
- Set a resource group
- Name the service appropriately
- Choose a location
- Choose a pricing tier for this service
- Create your service
- Get a key and URL endpoint
We will use the free tier of the service, which lets you create up to three indexes, three data sources, and three indexers. The dashboard will show you how many of each you have left. For this exercise you will create one of each.
- In the portal, find the search service you created above and click Import data on the command bar to start the wizard. In the wizard, click on Connect to your data and specify the name, type, and connection information. Skip the ‘Enrich Content’ page and go to Customize Target Index.
- For this exercise, we will use the wizard to generate a basic index for our receipt data. Minimally, an index requires a name and a fields collection; one of the fields should be marked as the document key to uniquely identify each document.
Fields have data types and attributes. The check boxes across the top are index attributes controlling how the field is used.
- Retrievable means that the field shows up in the search results list. You can mark individual fields as off limits for search results by clearing this checkbox.
- Key is the unique document identifier. It's always a string, and it is required.
- Filterable, Sortable, and Facetable determine whether fields are used in a filter, sort, or faceted navigation structure.
- Searchable means that a field is included in full text search. Only string fields are searchable.
Make sure you choose the following fields:
- id
- user
- createdDateTime
- MerchantName
- TransactionDate
- TransactionTime
- currency
- Category
- Total
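The wizard builds the index for you, but it can help to see the same choices written out. The sketch below expresses the fields as the kind of JSON definition the Cognitive Search REST API accepts when creating an index. The index name, the attribute choices, and the data types are our assumptions for this exercise (for instance, Total stays a string because the sample JSON stores it as "107.10"), and field names follow the sample JSON.

```python
import json


def field(name, edm_type="Edm.String", key=False, searchable=False,
          filterable=False, sortable=False, facetable=False,
          retrievable=True):
    """Build one field entry for an index definition."""
    return {
        "name": name,
        "type": edm_type,
        "key": key,
        "searchable": searchable,
        "filterable": filterable,
        "sortable": sortable,
        "facetable": facetable,
        "retrievable": retrievable,
    }


index_definition = {
    "name": "receipts-index",  # hypothetical index name
    "fields": [
        field("id", key=True),  # the document key must be a string
        field("user", filterable=True, facetable=True),
        field("createdDateTime", "Edm.DateTimeOffset",
              filterable=True, sortable=True),
        field("MerchantName", searchable=True, filterable=True),
        field("TransactionDate", filterable=True, sortable=True),
        field("TransactionTime"),
        field("currency", filterable=True, facetable=True),
        field("Category", searchable=True, filterable=True, facetable=True),
        field("Total"),
    ],
}

if __name__ == "__main__":
    print(json.dumps(index_definition, indent=2))
```

Marking Category as filterable and facetable is what later allows queries that narrow results to a single expense category.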
- Still in the Import data wizard, click Indexer > Name, and type a name for the indexer.
This object defines an executable process. For now, use the default option (Once) to run the indexer once, immediately.
- Click Submit to create and simultaneously run the indexer.
Soon you should see the newly created indexer in the list, with status indicating "in progress" or success, along with the number of documents indexed.
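If you would rather check progress from code than from the portal, the search service also exposes an indexer status endpoint. The sketch below builds that URL and condenses a status document into a few values. The helper names are ours, and the keys read from the response (status, lastResult, itemsProcessed, itemsFailed) reflect our understanding of the REST API, so treat them as assumptions to verify against the reference.

```python
API_VERSION = "2020-06-30"


def indexer_status_url(service_name, indexer_name, api_version=API_VERSION):
    """Build the URL for an indexer's status (send an 'api-key' header)."""
    return (f"https://{service_name}.search.windows.net/indexers/"
            f"{indexer_name}/status?api-version={api_version}")


def summarize_indexer_status(status_doc):
    """Condense a status document into the values shown in the portal list."""
    last = status_doc.get("lastResult") or {}
    return {
        "indexer_state": status_doc.get("status"),
        "last_run": last.get("status"),
        "processed": last.get("itemsProcessed", 0),
        "failed": last.get("itemsFailed", 0),
    }
```

A status document with no lastResult yet simply reports zero processed and zero failed documents.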
The main service page provides links to the resources created in your Azure Cognitive Search service. To view the index you just created, click Indexes from the list of links.
- Click on the index (azureblob-indexer in this case) from the list of links and view the index-schema.
Now you should have a search index that you can use to query the receipt data that’s been extracted from the uploaded receipts.
- Click the search explorer
- From the index drop down choose the relevant index. Choose the default API Version (2020-06-30) for this exercise.
- In the search bar, paste a query string (for example, category='Entertainment')
You will get results as verbose JSON documents as shown below:
Now that you have built an indexer and pointed it at your data, you can build queries programmatically and extract information to answer questions such as:
- How much did I spend last Thursday?
- How much have I spent on entertainment over the last quarter?
- Did I spend anything at ‘The Crown and Pepper’ last month?
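As a rough sketch of how the entertainment question could be answered in code, the snippet below builds a search request that filters on Category, then sums the totals for a date range on the client side. It assumes the Category field is filterable, that results come back under a "value" key, and that dates are ISO strings (YYYY-MM-DD) so plain string comparison orders them correctly. The service name, index name, and function names are placeholders of ours.

```python
import json
from decimal import Decimal
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

API_VERSION = "2020-06-30"


def category_filter(category):
    """Build an OData filter; single quotes are escaped by doubling."""
    return "Category eq '{}'".format(category.replace("'", "''"))


def build_query_url(service_name, index_name, filter_expr, search_text="*"):
    """Build the GET URL for a filtered query against the index."""
    params = {
        "api-version": API_VERSION,
        "search": search_text,
        "$filter": filter_expr,
    }
    query = urlencode(params, quote_via=quote)
    return (f"https://{service_name}.search.windows.net/indexes/"
            f"{index_name}/docs?{query}")


def fetch_documents(url, api_key):
    """Run the query and return the matching documents (needs network)."""
    request = Request(url, headers={"api-key": api_key})
    with urlopen(request) as response:
        return json.load(response)["value"]


def total_spent(documents, start_date, end_date):
    """Sum 'Total' for documents whose TransactionDate is in the range."""
    total = Decimal("0")
    for doc in documents:
        date = doc.get("TransactionDate")
        if date and start_date <= date <= end_date:
            total += Decimal(str(doc["Total"]))
    return total


# Usage with placeholder values:
# url = build_query_url("<service name>", "<index name>",
#                       category_filter("Entertainment"))
# docs = fetch_documents(url, "<query key>")
# print(total_spent(docs, "2020-07-01", "2020-09-30"))
```

Decimal is used instead of float so that currency amounts such as "107.10" add up without rounding surprises.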
Additional Ideas
In addition to the services and functionalities used throughout this exercise, there are numerous other ways you can use Azure AI to build in support for all kinds of receipts or invoices. For example, the logo extractor can be used to identify logos of popular restaurants or hotel chains, and the business card model can ingest business contact information just as easily as we saw with receipts.
We encourage you to explore some of the following ideas to enrich your application:
- Search invoices for specific line items
- Train the models to recognize different expense categories such as entertainment, supplies, etc.
- Add Language Understanding (LUIS) to ask your app questions in natural language and extract formatted reports
- Add Azure QnA Maker to your app and get insights such as how much you spent on entertainment last month, or other categories of insights you’d like to explore