I built a game of 1-on-1 Poker using Azure IoT and AI technologies. The resulting architecture ended up being remarkably similar to what could be used for real-world applications.
At the Microsoft Technology Centers, we have architectural discussions with a lot of customers who need to set up an IoT Architecture to bring data into Azure for dashboarding and analysis. Although the data needs are varied, the desired architectures usually have several commonalities:
- Continual data coming from sensors, including imagery devices
- Required data reduction and stream processing
- The ability to view the system’s current state in some comprehensible manner (“dashboarding”)
- The ability to perform custom actions when specific data conditions occur
Over the summer, I was trying to come up with a new demo for the center. I really wanted to show off Azure Digital Twins (Digital Twins – Modeling and Simulations | Microsoft Azure). I also had various pieces of equipment that had been “liberated” from the pandemically empty office, one of which was an Azure Percept DK (Azure Percept | Edge Computing Solution | Microsoft Azure) with its included vision module. Then it came to me. A card game, if implemented correctly, would be able to show all the necessary architectural pieces of a more advanced solution, which could then be easily discussed in terms of other, real-world data sets, such as smart buildings, electric vehicles, fighter aircraft, etc. As a bonus, I’d be able to play cards with my customers during some of my sessions.
Down this road, I go. This is an architecture for a game of one-on-one Texas Hold ’Em, along with a concrete implementation. I hope the principles detailed here will aid the development of more advanced IoT architectures using Microsoft Azure. The game itself is driven by a Bot and uses Azure Digital Twins to create a live, updating model of the game, which also serves as the single source of truth for the game data. The Azure Percept DK Vision Module is used to read the player cards using a custom model created with Azure Custom Vision and a couple of ready-made Azure Cognitive Services.
If you don’t know how to play Texas Hold ‘Em, just type it into your search engine of choice. There are thousands of articles on it.
The code for this solution is located at: microsoft/MTC_PokerBot (github.com). There are no warranties on this in any manner; feel free to use it as you like.
Top Level Architecture
I am going to go into detail on 3 different pieces of the architecture:
- The game itself: how the Bot interfaces with the game components to provide the necessary UI/UX.
- Modeling the game within the Azure Digital Twins service. This is where the current state of the game is stored, so at any point you can examine the game itself, the player and dealer hands, and even look through the cards in the deck. Current statistical values are updated as telemetry while the game progresses.
- Reading playing cards with the Azure Percept. There are two separate types of machine learning used to read the cards. Suite determination is performed through a custom model built with Azure Custom Vision and uploaded to the Percept. Value determination is performed by reading the actual numbers off the cards with a combination of two Azure Cognitive Services, Form Recognizer and Optical Character Recognition (OCR). Since the Percept is running locally, but the game is managed in the Cloud, the camera outputs interface directly with the Azure Digital Twins data store (through an Azure Function), not the Bot itself.
The full picture looks like this, where the parts outlined in grey are the only pieces not running within Azure. They are running on my local machine, and the Azure Percept is connected to my local WiFi network.
Texas Hold ‘Em, Bot Style
The game Bot provides the functionality of a 1-on-1 game of Texas Hold ‘Em (but without the gambling component). This allows the game model to be extremely simple:
The Bot can also handle items such as cancellation and accept multiple command synonyms. I am not going to discuss generic Bot design and implementation here, but a good place to start is: Conversational user experience in the Bot Framework SDK - Bot Service | Microsoft Docs
There are two Azure Function Apps used in the implementation. The first is all about running the game and performing all game logic. The second is related to the Azure Percept card reading workflow, and I will describe it later in this post. Note that the interface to the Azure Digital Twins is always through the Azure Function layer: the Bot does not know about the Azure Digital Twins and does not interface with it directly. The README in the code repository has a full description of each game logic function, how it should be called, and how it should be used.
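To make that layering concrete, here is a minimal sketch of what one of these HTTP-triggered game-logic functions could look like. The function name (“DealPlayerCard”) and its JSON contract are purely illustrative; the actual function names, routes, and payloads are documented in the repository README:
// Illustrative sketch of an HTTP-triggered game-logic function (in-process model).
// The name and payload are hypothetical; see the repo README for the real contracts.
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json.Linq;

public static class DealPlayerCard
{
    [FunctionName("DealPlayerCard")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
        ILogger log)
    {
        // The Bot posts simple JSON; this layer owns the game logic and is
        // the only layer that talks to Azure Digital Twins.
        string body = await new StreamReader(req.Body).ReadToEndAsync();
        string card = (string)JObject.Parse(body)["card"];   // e.g. "3H"

        if (string.IsNullOrEmpty(card))
            return new BadRequestObjectResult("A card value is required.");

        log.LogInformation($"Dealing {card} to the player.");
        // ...update the Azure Digital Twins game state here...
        return new OkObjectResult(new { dealt = card });
    }
}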
The game has two modes, non-camera and camera. In non-camera mode, the user just types in his card value with a simple syntax, like 3H, 9S, AC to denote value and suite. In camera mode, the user will place his card in front of the Percept Vision Module camera and it will read the value and suite from the card itself.
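As a side note, the shorthand is trivial to parse. The helper below is my own illustration (not the repo’s code); the suite labels follow the “Diamond” style used by the digital twin properties shown later:
// Hypothetical parser for the "3H" / "9S" / "AC" shorthand used in non-camera mode.
using System;

public static class CardShorthand
{
    public static (string Value, string Suite) Parse(string input)
    {
        input = input.Trim().ToUpperInvariant();              // "ac" -> "AC"
        string value = input.Substring(0, input.Length - 1);  // "A", "3", "10"
        char suiteCode = input[input.Length - 1];              // 'H', 'S', 'C', 'D'

        string suite = suiteCode switch
        {
            'H' => "Heart",
            'D' => "Diamond",
            'S' => "Spade",
            'C' => "Club",
            _ => throw new ArgumentException($"Unknown suite code '{suiteCode}'")
        };
        return (value, suite);
    }
}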
I wanted to show actual playing card images in the bot display and I did this by creating an Adaptive Card which is used for each hand (Player, Dealer, Community) and contains the card values as well as images of the cards. I am very thankful that the American Contract Bridge League provides Royalty-Free card images which you can use and edit. You can find these here: 52 Playing Cards - ACBL - Resource Center (mybigcommerce.com)
Here is what the output looks like in the Power Virtual Agent test harness:
To create the adaptive cards themselves, I used the Adaptive Card Designer provided by Microsoft at Designer | Adaptive Cards.
Actual Bot Creation
Two separate Bots were created as part of this effort. The first used the Microsoft Bot Framework only, fully implemented through the Bot Framework Composer. This is a free application made available by Microsoft and it will greatly increase your bot development speed vs. coding a Bot by hand. I highly recommend this tool. Documentation and links to download are available here: Introduction to Bot Framework Composer - Bot Composer | Microsoft Docs. This is the version of the bot which is included in the GitHub repository.
Once I had this working, it was desirable for us to also have a version that ran as a Microsoft Power Virtual Agent, merely for additional demos in the MTC. Power Virtual Agents are fully featured Bots, but some of the features require that you also use the Bot Framework Composer for development, as Power Virtual Agents (PVAs) do not natively surface all functionality. At the time of this writing, one of these pieces of functionality which I needed was Adaptive Cards. The README in the GitHub repository details the specifics of how I was able to do the PVA conversion.
It should be noted that either the Azure Bot or the Power Virtual Agent could also be integrated to run as a Bot in Microsoft Teams.
Modeling a Card Game with Azure Digital Twins
Having the ability to model the game state as the game was being played was part of the genesis for the whole demo. The digital twin model of the entire game is laid out with models and relationships between the models. Specifically, the models are:
- Card
- Game
- Deck
- Hand
The relationships between the models are:
- The Game has Hands and Decks (only one Deck is used)
- The Hands have Cards
- The Deck has Cards
Visually, the Azure Digital Twins Explorer will display a game in progress like so:
The Azure Digital Twins Explorer is showing all of the individual entities which are constructed within the model. The lines between the circles are the relationships between the entities.
Selecting an entity or a relationship in the model will bring up the right-side display which will detail all of the current telemetry values on the entity.
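To give a flavor of how this graph is built in code, here is a minimal sketch of creating a Card twin and attaching it to the Deck with the has_cards relationship, using the Azure.DigitalTwins.Core SDK. The instance URL, twin IDs, and relationship ID are illustrative; the property and model names match the query examples further below:
// Sketch: create a Card twin and relate it to the Deck (has_cards).
// Instance URL, twin IDs, and the relationship ID are illustrative.
using System;
using Azure.DigitalTwins.Core;
using Azure.Identity;

var client = new DigitalTwinsClient(
    new Uri("https://<your-adt-instance>.api.wus2.digitaltwins.azure.net"),
    new DefaultAzureCredential());

var card = new BasicDigitalTwin
{
    Id = "41",
    Metadata = { ModelId = "dtmi:games:Card;1" },
    Contents =
    {
        { "Value", "3" },
        { "NumericalValue", 3 },
        { "Suite", "Diamond" },
        { "Color", "Red" }
    }
};
await client.CreateOrReplaceDigitalTwinAsync(card.Id, card);

// The relationship carries the card's position in the deck (cardOrder).
var rel = new BasicRelationship
{
    TargetId = card.Id,
    Name = "has_cards",
    Properties = { { "cardOrder", 2 } }
};
await client.CreateOrReplaceRelationshipAsync("Deck", $"Deck-has_cards-{card.Id}", rel);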
Interfacing with Azure Digital Twins from the Azure Functions was very straightforward. I used the C# DigitalTwins SDK and created a library of 13 methods that comprised all the calls I needed for the game to proceed. The details of this library are in the README of the GitHub repository, but if you are implementing code against this SDK, I really think there are only a couple of irregularities that you need to note.
The first is that there are two ways to interface with Azure Digital Twins: the DigitalTwins SDK/REST API calls, and the query interface; both are within the Azure.DigitalTwins.Core library. The query interface takes a query in a near-identical manner to the Azure Digital Twins Explorer web interface and then returns the twins which are in the response. For example, valid queries are:
Select the full digital twin graph:
SELECT * FROM digitaltwins
Select all cards:
SELECT * FROM DIGITALTWINS WHERE IS_OF_MODEL('dtmi:games:Card;1')
Select the 3 of Diamonds (from all twins, but only a Card will match):
SELECT D FROM DIGITALTWINS D WHERE D.Value = '3' and D.Suite = 'Diamond'
Select all cards in the deck:
SELECT D, C FROM DIGITALTWINS D JOIN C RELATED D.has_cards WHERE D.$dtId = 'Deck'
Select just the 3rd card in the deck:
SELECT C FROM DIGITALTWINS D JOIN C RELATED D.has_cards R WHERE D.$dtId = 'Deck' and R.cardOrder = 2
You can make changes to these returned twins, which will then change the digital twin representation. Due to the power of the query syntax, my first attempt was to just use this interface. However, it is important to note that changes made from this interface can take up to 10 seconds to be represented in the model. For certain data sets and solutions, this is absolutely fine, but for a game where users are watching the results, the timespan was too long. The DigitalTwins SDK/API does not have this lag time, and so I used this API for all calls where a user would expect to see the game update in near real time.
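For reference, the low-latency path is just the standard SDK update call. Here is a minimal sketch, reusing the Card twin from the earlier example (the client construction is the same as shown above):
// Sketch: updating twin properties through the SDK/API path, which does not
// exhibit the query-side lag. Uses the JsonPatchDocument type from Azure.Core.
using System;
using Azure;
using Azure.DigitalTwins.Core;
using Azure.Identity;

var client = new DigitalTwinsClient(
    new Uri("https://<your-adt-instance>.api.wus2.digitaltwins.azure.net"),
    new DefaultAzureCredential());

var patch = new JsonPatchDocument();
patch.AppendReplace("/Value", "3");      // replace existing properties on the twin
patch.AppendReplace("/Suite", "Diamond");

await client.UpdateDigitalTwinAsync("41", patch);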
The second thing to pay attention to is that if you are using the query syntax and include query variables (required if you are going to have joins, as in the last two examples above), then the resulting JSON returned from the query is not identical to what is returned when there is no variable in the query. For example, the query:
SELECT * FROM DIGITALTWINS WHERE Value = '3' and Suite = 'Diamond'
Returns:
{
"result": [
{
"$dtId": "41",
"$etag": "W/\"04c385b4-446b-4120-a532-356751742310\"",
"$metadata": {
"$model": "dtmi:games:Card;1",
"Color": {
"lastUpdateTime": "2021-10-24T22:53:50.3156660Z"
},
"NumericalValue": {
"lastUpdateTime": "2021-10-24T22:53:50.3156660Z"
},
"Suite": {
"lastUpdateTime": "2021-10-24T22:53:50.3156660Z"
},
"Value": {
"lastUpdateTime": "2021-10-24T22:53:50.3156660Z"
}
},
"Color": "Red",
"NumericalValue": 3,
"Suite": "Diamond",
"Value": "3"
}
]
}
While the query
SELECT C FROM DIGITALTWINS C WHERE C.Value = '3' and C.Suite = 'Diamond'
Returns:
{
"result": [
{
"C": {
"$dtId": "41",
"$etag": "W/\"04c385b4-446b-4120-a532-356751742310\"",
"$metadata": {
"$model": "dtmi:games:Card;1",
"Color": {
"lastUpdateTime": "2021-10-24T22:53:50.3156660Z"
},
"NumericalValue": {
"lastUpdateTime": "2021-10-24T22:53:50.3156660Z"
},
"Suite": {
"lastUpdateTime": "2021-10-24T22:53:50.3156660Z"
},
"Value": {
"lastUpdateTime": "2021-10-24T22:53:50.3156660Z"
}
},
"Color": "Red",
"NumericalValue": 3,
"Suite": "Diamond",
"Value": "3"
}
}
]
}
Here the data is nested in a “C” stanza. This structural difference flows down to the SDK as well. These two functions are equivalent, but take the differing return structures into account:
public static async Task<BasicDigitalTwin> GetOneTwin(DigitalTwinsClient dtclient, string query)
{
AsyncPageable<BasicDigitalTwin> twinList = dtclient.QueryAsync<BasicDigitalTwin>(query);
BasicDigitalTwin twin = null;
await foreach (var d in twinList)
{
twin = d;
}
return twin;
}
public static async Task<BasicDigitalTwin> GetOneTwinAliasedC(DigitalTwinsClient dtclient, string query)
{
AsyncPageable<AliasedCBasicDigitalTwin> twinList = dtclient.QueryAsync<AliasedCBasicDigitalTwin>(query);
BasicDigitalTwin twin = null;
await foreach (var d in twinList)
{
twin = d.Twin;
break;
}
return twin;
}
public class AliasedCBasicDigitalTwin
{
[JsonPropertyName("C")]
public BasicDigitalTwin Twin { get; set; }
}
The bottom line is that you need to think your design through carefully when wrapping the query calls in a library method, since different queries will produce differently structured responses.
Reading the Cards with the Azure Percept DK Vision Module
Once the Bot was initially created, and the Azure Digital Twins service was running, I actually had a decent demonstrable game. However, it was missing a coolness factor, as the player merely typed in the card names. This was where the Azure Percept DK came into play – let’s have an IoT device just read in the cards.
I am not going to describe how to set up the Azure Percept for use with the Azure Percept Studio or how to use Azure Custom Vision to create a model to be used by the Azure Percept DK Vision module. If you need help getting started, please look at the following resources:
Set up the Azure Percept DK device | Microsoft Docs
Deploy a vision AI model to Azure Percept DK | Microsoft Docs
I do want to discuss the approach I took to develop the custom machine learning model, and then how I integrated the Azure Percept into the solution.
I started by seeing whether Azure Cognitive Services could recognize standard playing cards within the baseline Computer Vision model. It could not, so that meant I needed to create my own model. I am not a data scientist. I am an application developer. My goal for this section of the demo was to see how simply I could do this, but with adequate results.
My first thought was that training a model on a full deck of cards would just take me too long. I would need in excess of 1300 images to train a model to fully identify a card (e.g., this is a 2 of hearts). Then I realized that, really, there is no difference between a heart on a 2 and a heart on a 5. If I could get the model to recognize the symbols, then I should only need about 100 images.
The second piece of this was that I did not want to spend a lot of heavy brain power on this, so I was going to use the point and click Custom Vision web interface. Since I did not need a lot of images, and the standard instructions for the Percept DK already show how to integrate with the Custom Vision web interface, this seemed to be a safe choice. There are lots of ways to create a custom ML model within Azure and I’m sure that computer scientists using more advanced tools could create a more accurate model, but for a game demo, my belief was that I could achieve a “just fine” level of accuracy for my needs.
Finally, I concluded that I should not need to train the model to determine the value of the cards, since the value is written on the cards. Twice, actually. We have two cognitive services in Azure, Form Recognizer and Optical Character Recognition (OCR), which can read text in images. I should not need to train the model on items that I can just read with available pre-trained models.
This worked out very well. Let me go into details on suite recognition first.
I created a suite recognition model using 119 images. Roughly 2/3 of these images were pictures of me holding up a card in front of the Percept with the rest being random card images I pulled from the internet. Custom Vision reported my performance characteristics as the following:
Numerically speaking, these results are just okay. I think I will miss a bunch. Taking a look at the performance in real time, I see a lot of images like this:
This image is only notable in that two of the diamonds are unrecognized. That is actually okay, since there are no cards with mixed suites; as long as I get one good identification, I am pretty much good to go. Even better, however, is that the suite recognition results take the pathway Azure Percept -> Azure IoT Hub -> Azure Stream Analytics -> Azure Function. When Stream Analytics calls an Azure Function, it automatically batches up the results. You can specify maximum batch sizes or counts per call, and although I suppose you could set this to 1 to get a function call for each piece of data, I set it to 15. The Percept DK Vision Module runs at 30 frames per second, so by setting this to 15, I get roughly two function calls per second. Inside the function, I then sort the data by its confidence value and just take the highest one. This means that my processing is highly tolerant of missed detections from the model. The half-second batch time also means that it is highly unlikely that I will have data from two different cards in the same batch.
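Here is a minimal sketch of that “highest confidence wins” step. The class and method names are illustrative; the real function also goes on to read the card value, as described below:
// Sketch: pick the single best suite detection out of a Stream Analytics batch.
// Class and method names are illustrative.
using System.Collections.Generic;
using System.Linq;

public class SuiteDetection
{
    public string Label { get; set; }       // e.g. "Diamond"
    public double Confidence { get; set; }  // 0.0 - 1.0
    public string TimeStamp { get; set; }
}

public static class SuitePicker
{
    // Returns null when even the best detection is too weak to trust
    // (the 0.2 cutoff is discussed later in this post).
    public static SuiteDetection PickBest(IEnumerable<SuiteDetection> batch)
    {
        var best = batch.OrderByDescending(d => d.Confidence).FirstOrDefault();
        return (best != null && best.Confidence > 0.2) ? best : null;
    }
}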
Once I created the model, I needed to reference it from the IoT Hub module twin, so that the Azure Percept would be sure to pick it up automatically on start-up. This was very straightforward, as the output of the Custom Vision service is a .zip file, which is the exact format that the Azure Percept requires. All I needed to do was place the zip file in Azure Blob Storage, then use the URL along with a SAS key to update the module twin of the azureeyemodule to load my custom model.
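If you want to script that last step rather than edit the twin in the portal, something like the following works with the Microsoft.Azure.Devices service SDK. Treat the “ModelZipUrl” desired-property name and the IDs as assumptions to verify against the Percept documentation for your module version:
// Sketch: patch the azureeyemodule module twin so the Percept loads the
// custom model from blob storage. "ModelZipUrl", the device ID, and the
// connection string handling are assumptions for illustration.
using System.Threading.Tasks;
using Microsoft.Azure.Devices;

public static class ModelDeployer
{
    public static async Task SetModelUrlAsync(
        string iotHubConnectionString, string perceptDeviceId, string modelSasUrl)
    {
        var registry = RegistryManager.CreateFromConnectionString(iotHubConnectionString);

        // JSON patch applied to the module twin's desired properties.
        string patch =
            "{ \"properties\": { \"desired\": { \"ModelZipUrl\": \"" + modelSasUrl + "\" } } }";

        await registry.UpdateTwinAsync(perceptDeviceId, "azureeyemodule", patch, "*");
    }
}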
Once the camera is on, it forwards model outputs to Azure Stream Analytics (by way of the IoT Hub), as mentioned above. Stream Analytics runs the following query before calling the Azure Function:
SELECT
GetArrayElement(NEURAL_NETWORK,0).label as Label,
GetArrayElement(NEURAL_NETWORK,0).confidence as Confidence,
GetArrayElement(NEURAL_NETWORK,0).[timestamp] as [TimeStamp]
INTO
[HandlePlayerCard]
FROM
[PerceptIoTHub] TIMESTAMP BY EventEnqueuedUTCTime
WHERE GetArrayElement(NEURAL_NETWORK,0).label IS NOT NULL
This processing is all about data reduction. The GetArrayElement lines merely specify the only three pieces of the data which I need sent to the Azure Function. More important is the null check in the WHERE clause, which prevents the Azure Function from being invoked at all if the Percept is not looking at a card (i.e., there is no suite identification).
The Azure Function called as part of the camera processing stream is named HandlePlayerCard. It can run in a separate function app from the game functions. Its only purpose is to fully identify the card, and then make a call to the Azure Digital Twins service to submit it as one of the cards in the player’s hand. To get the card value, I needed to send the Percept imagery to Azure Cognitive Services. I thought about how I should match up the imagery with the model output and came to the conclusion that I did not need to. The model output was going to be processed once every half second, and as long as I had an image that was taken within half a second of that, I should not have to worry about matching the model data with the wrong card, since users’ hands just don’t move that fast.
I was able to get this working by using VLC, a downloadable media player (available at https://www.videolan.org/vlc). VLC has a “Scene Filter” which copies a portion of the video stream as still frames onto the local hard drive of the machine where it is running. This can be configured to overwrite each time, so that there is only a single, latest image. Since the Azure Percept is on my local WiFi network (and its stream can be read from rtsp://xxx.xxx.xxx.xxx:8554/raw), this seemed to be the simplest solution to the problem. Alternatively, you could push the entire stream to Azure and use the Azure Video Analyzer (Azure Video Analyzer -- Video analytics | Microsoft Azure) to handle the matchup and advanced analytics. In my implementation, the benefit of using VLC was that I also use it as part of the application display.
I then created a background executable to run on this same computer, which merely takes the latest image and copies it into Azure Blob Storage. This also overwrites every time, so there is only the latest image stored.
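Here is a minimal sketch of that background copier, assuming the Azure.Storage.Blobs SDK and an illustrative local path, container name, and cadence:
// Sketch: push the newest VLC scene-filter frame to blob storage on a loop,
// overwriting the previous upload. Paths, names, and timing are illustrative.
using System;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

public static class LatestFrameUploader
{
    public static async Task RunAsync(string storageConnectionString)
    {
        var container = new BlobContainerClient(storageConnectionString, "percept-frames");
        await container.CreateIfNotExistsAsync();
        var blob = container.GetBlobClient("latest.jpg");

        while (true)
        {
            try
            {
                // VLC's scene filter keeps overwriting this file with the newest frame.
                await blob.UploadAsync(@"C:\PerceptFrames\latest.jpg", overwrite: true);
            }
            catch (Exception)
            {
                // The file may be mid-write by VLC; just try again on the next pass.
            }
            await Task.Delay(500);
        }
    }
}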
As the HandlePlayerCard Azure Function is invoked, it uses a second file in the same blob storage container as a locking mechanism, so that multiple instances of the function will not attempt recognition at the same time. Once a card is recognized and sent to the game, any following instances will fail, as the game knows that it cannot take an identical card.
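The repository has the actual locking code; one simple way to implement this kind of lock is a lease on that second blob, sketched here with illustrative names:
// Sketch: lease a small "lock" blob so only one HandlePlayerCard invocation
// performs recognition at a time. One possible mechanism; names are illustrative.
using System;
using System.Threading.Tasks;
using Azure;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;

public static class RecognitionLock
{
    public static async Task<BlobLeaseClient> TryAcquireAsync(BlobContainerClient container)
    {
        var lockBlob = container.GetBlobClient("recognition.lock");
        if (!await lockBlob.ExistsAsync())
            await lockBlob.UploadAsync(new BinaryData("lock"));

        var lease = lockBlob.GetBlobLeaseClient();
        try
        {
            // 15-second lease; if another invocation already holds it, this throws.
            await lease.AcquireAsync(TimeSpan.FromSeconds(15));
            return lease;
        }
        catch (RequestFailedException)
        {
            return null;   // another instance is already recognizing this frame
        }
    }
}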
HandlePlayerCard will read the current image from blob storage and call Azure Cognitive Services to read the values off the card. One of the difficulties is 6s and 9s: on standard playing cards, a 6 is merely an upside-down 9 and vice versa. To get around this confusion, I use a combination of Form Recognizer and OCR to read the value.
Form Recognizer is called first. I do not specify a form template, but rather just take the first value returned, which should be the upper-left corner of the card. Form Recognizer does not seem to return any upside-down values, and I noticed that it usually does not return the second, upside-down value in the bottom-right corner. If no good value is returned, OCR is called, using an imaginary line 75% of the way down the image: if the value read is below this line and is a 6, we assume it is really a 9, and vice versa. (I also make the same fix for “10” and “01”, but never actually saw this error occur.)
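The geometric correction itself is tiny. A sketch, with the bounding-box inputs simplified for illustration:
// Sketch: flip values that OCR read upside down. If the text sits below an
// imaginary line 75% of the way down the image, a "6" is really a "9" (and
// vice versa); same idea for "10"/"01". Inputs are simplified for illustration.
public static class CardValueCorrection
{
    public static string CorrectForOrientation(string ocrValue, double textTopY, double imageHeight)
    {
        bool belowLine = textTopY > imageHeight * 0.75;
        if (!belowLine)
            return ocrValue;

        return ocrValue switch
        {
            "6"  => "9",
            "9"  => "6",
            "01" => "10",
            _    => ocrValue
        };
    }
}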
As I mentioned earlier, I sort all of the suite identifications by confidence value and take the highest one. However, if that highest confidence is 0.2 or less, then we assume that the results cannot be trusted and do not enter the card into the game. Similarly, if we could not read a value, we just end the function invocation without calling the game; one half second later there will be more data and a new image. In camera mode, the bot will query the Azure Digital Twins (by way of an Azure Function) to determine how many player cards it has, and then retrieve the values as necessary.
And in Closing
I hope that this architectural explanation of a card game is helpful in describing how Azure IoT Architectures can be put together into real-world IoT solutions. Although the pieces here are directed towards a card game, they are the same pieces which could be used for any IoT Architecture.
Azure Digital Twins provides an ideal modeling solution for visualization and analytics of IoT systems. Through its built-in mechanisms, you can examine IoT interactions, and it provides a consistent data plane for tools like dashboarding. The Azure Percept DK is a fully-fledged IoT device which, together with Azure Percept Studio, can be used to develop and run custom machine learning models as part of an IoT solution and easily bring that data into Azure for further analysis.
Here are some helpful starting point links:
Learn more about Azure Digital Twins: Digital Twins – Modeling and Simulations | Microsoft Azure
Learn more about Azure Percept: Azure Percept | Edge Computing Solution | Microsoft Azure
Learn more about Azure Functions: Azure Functions – Serverless Apps and Computing | Microsoft Azure
Learn more about the Azure Bot Service: Azure Bot Service – Conversational AI Application | Microsoft Azure