Programming can be challenging and tedious, whether you are a beginner or an experienced developer. Much of the frustration comes from spending long stretches searching for code snippets to get past errors or to find the best syntax for a problem. Thanks to breakthroughs in OpenAI Codex, a descendant of GPT-3, developers can use GitHub Copilot, an AI pair programming solution that translates natural language to code in dozens of programming languages including Python, JavaScript, Go, Perl, PHP, Ruby, and more.
In this session, we will explore how to use GitHub Copilot to solve common business problems by simply describing what we want in short sentences, so that you can be more efficient and productive. We will use Python to see how GitHub Copilot can help developers integrate with the Azure Form Recognizer API to read the contents of a receipt.
Prerequisites:
- Log in or sign up for a free Azure account.
- Install Visual Studio Code.
- Install the GitHub Copilot extension by completing the steps in the “Installing the Visual Studio Code extension” section.
- Create an Azure Form Recognizer resource. Then get the endpoint URL and Key 1 to use later in this tutorial.
Using Keyboard Shortcuts
GitHub Copilot provides keyboard shortcuts to make the experience in Visual Studio Code user-friendly across platforms. We will be using Windows shortcuts in this tutorial. If you are on macOS or Linux, refer to the shortcuts for your environment.
Importing Libraries
- Create a new Python file in Visual Studio Code.
- Install the Python package for Azure Form Recognizer by running the following command in a terminal:
pip install azure-ai-formrecognizer==3.2.0
- The first thing we need to figure out is which Python libraries are needed to use the Form Recognizer API. To do this, we’ll type the following Python comment:
# import azure form recognizer libraries
- To see the options that Copilot suggests, we need to click anywhere on the sentence and use the “Ctrl + Enter” shortcut key. This opens a “GitHub Copilot” pane with a list of code suggestions.
As you can see in the image above, GitHub Copilot generated 10 code suggestions (NOTE: GitHub Copilot hides duplicate suggestions). The first suggestion imports two Form Recognizer libraries. It also anticipates that we need to connect to the API, so it includes the AzureKeyCredential class used for Azure authentication. Finally, it includes the ResourceNotFoundError class, anticipating that we'll need exception handling when accessing Azure resources.
The next suggestion goes a step further by also importing libraries for Azure Storage, for cases where the input files are stored in an Azure storage account. In addition, it includes Azure Key Vault for securely storing secrets such as the API key.
So, as you can see, Copilot suggests the bare minimum code a user needs as well as best-practice recommendations for things the user may not have considered.
- For this tutorial, we’ll go with the first suggestion and click “Accept Solution” above the code. This automatically inserts the code under the comment in our Python file.
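As a quick sanity check before running the accepted imports, the sketch below tests whether the modules that the first suggestion draws from can be found in the current environment. The module list is an assumption based on the class names described above, and `missing_modules` is a hypothetical helper written for this illustration, not part of any Azure library.

```python
import importlib.util

# Modules the first suggestion is described as importing from (assumed list).
REQUIRED_MODULES = [
    "azure.ai.formrecognizer",  # Form Recognizer client classes
    "azure.core.credentials",   # AzureKeyCredential
    "azure.core.exceptions",    # ResourceNotFoundError
]


def missing_modules(names):
    """Return the subset of module names that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised when a parent package (for example "azure") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip install azure-ai-formrecognizer==3.2.0")
    else:
        print("All required modules are available.")
```

If the check reports missing modules, rerunning the pip command from the earlier step in the same Python environment that Visual Studio Code uses is usually the first thing to try.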
Connecting to the Azure Form Recognizer API
- To connect to the Form Recognizer API, we need the endpoint URL and key from the resource that you created earlier.
- Next, we are going to ask Copilot to create variables for the endpoint and key by typing the following comment:
# create variables for the endpoint and key
- To see the options that Copilot generated, we'll click anywhere on the commented sentence line and press the “Ctrl + Enter” shortcut key. This will open a “GitHub Copilot” pane with a list of code suggestions.
- The suggestions that it generates are straightforward because we just need to create two variables.
- We'll click on the “Accept Solution” text above the first code suggestion. This will automatically populate the code under the commented sentence.
- For the code to connect to the Azure Form Recognizer instance that you created earlier, we'll need to replace the endpoint placeholder with your endpoint URL. Similarly, replace the key placeholder with your Key 1 value.
- Next, we'll need to create a client connection that uses the endpoint and key variables to authenticate with the Form Recognizer API. To do this, we'll type the following comment:
# create the client and authenticate with the endpoint and key
- This will display one generated code suggestion. The client object is constructed from the endpoint and key, and it uses them to authorize each request to the Azure Form Recognizer API. In our case, the suggestion creates a FormRecognizerClient variable to be used for prebuilt models and a FormTrainingClient variable to be used for training a custom model.
- We'll click on the “Accept Solution” text above the code suggestion to insert it in our code under the comment.
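The steps above can be sketched as follows. This is a minimal illustration rather than the exact code Copilot generated: instead of pasting the endpoint and key into the file, it reads them from environment variables, which keeps the key out of source control. The variable names `FORM_RECOGNIZER_ENDPOINT` and `FORM_RECOGNIZER_KEY` and the helpers `load_settings` and `create_clients` are hypothetical names chosen for this example.

```python
import os


def load_settings(env=os.environ):
    """Read the endpoint URL and key from environment variables.

    The variable names are assumptions made for this sketch.
    """
    endpoint = env.get("FORM_RECOGNIZER_ENDPOINT", "").strip()
    key = env.get("FORM_RECOGNIZER_KEY", "").strip()
    if not endpoint.startswith("https://"):
        raise ValueError("FORM_RECOGNIZER_ENDPOINT must be an https:// URL")
    if not key:
        raise ValueError("FORM_RECOGNIZER_KEY is not set")
    return endpoint, key


def create_clients(endpoint, key):
    """Create one client for prebuilt models and one for custom training."""
    # Imported here so load_settings() works even without the SDK installed.
    from azure.ai.formrecognizer import FormRecognizerClient, FormTrainingClient
    from azure.core.credentials import AzureKeyCredential

    credential = AzureKeyCredential(key)
    form_recognizer_client = FormRecognizerClient(endpoint, credential)
    form_training_client = FormTrainingClient(endpoint, credential)
    return form_recognizer_client, form_training_client
```

A typical call would be `client, training_client = create_clients(*load_settings())`. Creating the clients does not contact the service; an invalid key only surfaces on the first request.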
Recognizing a Receipt
The Azure Form Recognizer service has pretrained models that analyze and extract data fields and values from different types of forms and documents, for example tax forms, sales invoices, printed or handwritten receipts, and passports or ID cards.
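To make the idea of one pretrained model per document type concrete, here is a small sketch that maps a document type to the corresponding URL-based method on FormRecognizerClient, as I understand the 3.x releases of azure-ai-formrecognizer. The dictionary keys and the `begin_recognition` helper are hypothetical names for this illustration, and the list is not exhaustive.

```python
# Hypothetical labels mapped to FormRecognizerClient method names that
# accept a document URL (based on the 3.x SDK; not an exhaustive list).
PREBUILT_URL_METHODS = {
    "receipt": "begin_recognize_receipts_from_url",
    "invoice": "begin_recognize_invoices_from_url",
    "id_document": "begin_recognize_identity_documents_from_url",
    "business_card": "begin_recognize_business_cards_from_url",
}


def begin_recognition(client, document_type, url):
    """Start recognition with the prebuilt model for document_type."""
    try:
        method_name = PREBUILT_URL_METHODS[document_type]
    except KeyError:
        known = ", ".join(sorted(PREBUILT_URL_METHODS))
        raise ValueError(
            f"Unknown document type {document_type!r}; expected one of: {known}"
        ) from None
    # Look the method up on the client and call it with the document URL.
    return getattr(client, method_name)(url)
```

This tutorial only needs the receipt entry; the others are listed to show that switching document types mostly means switching the method that is called.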
- We'll be using a pretrained model with a printed receipt in this tutorial, so we'll need a variable for our receipt URL. For this, we will just manually create a variable and specify the URL location of the file:
myReceiptUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/receipt/contoso-receipt.png"
- Next, we need to use the Form Recognizer client to recognize the receipt at that URL. To do that, we'll type the following comment sentence:
# user form recognizer client to recognize image from myReceiptUrl
- Even though we misspelled "use" as "user", Copilot still understands the intent and generates two suggestions. In both cases, it is smart enough to figure out from the receipt URL that FormRecognizerClient is the correct client to use for the pretrained receipt model, even though we created both client types earlier. The first suggestion is exactly the code that the sentence asks for. However, Copilot understands that after we recognize the data from the receipt, we may need to view the data extracted from the image. So, the second suggestion adds code to get the output results from Form Recognizer's method for recognizing receipts from a URL.
Based on the two choices, we'll click on “Accept Solution” for the second suggestion.
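For readers following along without the screenshots, the accepted suggestion does roughly what the sketch below does: it starts the receipt recognition for a URL and waits for the result. This is an approximation, not Copilot's exact output; `recognize_receipt_from_url` is a hypothetical wrapper name, and `client` is assumed to be the FormRecognizerClient created earlier.

```python
def recognize_receipt_from_url(client, receipt_url):
    """Run the prebuilt receipt model on an image URL and return the results.

    client is assumed to be an azure.ai.formrecognizer.FormRecognizerClient.
    """
    # Recognition is a long-running operation, so the SDK returns a poller.
    poller = client.begin_recognize_receipts_from_url(receipt_url)
    # result() blocks until the service has finished and returns a list
    # with one recognized form per receipt found in the image.
    return poller.result()
```

With the variable defined above, the call would look like `receipts = recognize_receipt_from_url(form_recognizer_client, myReceiptUrl)`, where `form_recognizer_client` stands for whatever name the accepted client suggestion used.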
Printing Out Extracted Receipt Data
Now that Form Recognizer has recognized the receipt image, we need to print out the receipt fields and values. However, we are not sure what the data structure of the Form Recognizer output is. So, we are going to rely on Copilot to see if it can help us generate the code for printing out the results.
- The commented instruction sentence we’re going to use is:
# loop through results and extract data from receipt
- Copilot generates one suggestion. Since we are not sure what the output structure of Azure Form Recognizer is, we are going to click “Accept Solution”. One thing to point out is that our sentence asked to “extract data” from the result. However, the OpenAI natural language understanding capability that GitHub Copilot uses inferred that we probably meant to print out the data.
- To run the Python code, we are going to click on the "Run" icon in the upper right-hand corner of the VS Code editor. See the icon highlighted in red in the image below.
- From the output, we can see the field name and value for each line item of the receipt we provided. In addition, we can see the confidence score, which indicates how confident the Form Recognizer model is in its reading of each line on the receipt image.
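The result structure that the printing code walks through can be sketched as follows, assuming the 3.x SDK shape: each recognized form has a `fields` dictionary whose values carry a `value` and a `confidence`, and a field such as the receipt's line items holds a list of nested fields. The helper names `format_field` and `format_receipts` are hypothetical, and the exact output of Copilot's suggestion may differ.

```python
def format_field(name, field, indent=0):
    """Return printable lines for one field, descending into nested values."""
    pad = "  " * indent
    if field is None:
        return [f"{pad}{name}: (not found)"]
    value = field.value
    if isinstance(value, list):
        # For example the line items: a list of nested fields.
        lines = [f"{pad}{name}:"]
        for number, item in enumerate(value, start=1):
            lines.extend(format_field(f"{name} #{number}", item, indent + 1))
        return lines
    if isinstance(value, dict):
        # For example one line item: sub-field name -> nested field.
        lines = [f"{pad}{name}:"]
        for sub_name, sub_field in value.items():
            lines.extend(format_field(sub_name, sub_field, indent + 1))
        return lines
    return [f"{pad}{name}: {value} (confidence: {field.confidence})"]


def format_receipts(receipts):
    """Return printable lines for every recognized receipt."""
    lines = []
    for number, receipt in enumerate(receipts, start=1):
        lines.append(f"Receipt {number}")
        for name, field in receipt.fields.items():
            lines.extend(format_field(name, field, indent=1))
    return lines
```

Calling `print("\n".join(format_receipts(receipts)))` on the recognition result prints one line per field with its value and confidence score.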
Conclusion
As you can see, it took us only five short comment sentences to write an application that uses Azure Form Recognizer to read a receipt image and analyze and extract its contents. In addition, we observed GitHub Copilot's ability to understand a user's intent even when there are misspellings or the instruction sentence is not very detailed. These are OpenAI's natural language processing capabilities that are built into GitHub Copilot. It does not just read commented instructions and generate code; it also pays attention to the context of the user's code to autocomplete while the user is typing or to generate code based on what the user has written so far. This helps reduce the time it takes to write repetitive boilerplate code. This tutorial also illustrated a situation where programmers are faced with a new API. Figuring out an API's client connectivity or how to use its services is not always easy, so GitHub Copilot is a great pair programmer to help a developer with syntax. Overall, it is a useful tool for helping developers be more productive and implement applications faster.