First published on MSDN on Jan 20, 2018
This post is based on a demo given by my colleague Anze Vodovnik at the Cambridge Hack:
www.vodovnik.com/2018/01/20/a-look-at-computer-vision/
The following is a short step-by-step tutorial on how to train a Custom Vision model, run it on an Apple iPhone X with no connectivity, and call it from a .NET Core application.
Getting Started
To get started, go to https://customvision.ai. You’ll be greeted by a page allowing you to create a new model (or a list of models if you have them already).
Create a New Custom Vision Model
See https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/09/07/using-microsoft-customvision-a... for a full step-by-step guide.
For this demo we’re going to build an app which identifies drinks.
Once the model is created, it’s time to start loading our images and tagging them. I’ve compiled (searched for and borrowed from the internet) photos of various different drink types, like wine, beer, shots and cocktails.
Once you have uploaded all the photos, you will have the option to tag them. So, you start with some photos of wine, and you can tag them with ‘wine’. You repeat the process for all the other classes and voila, your model is born.
Train the model
When we’ve uploaded the photos, we’re only half way through. Now we need to actually train the model to do something based on those images. There is a big green button at the top called Train.
The nice thing is that you also immediately see how good the training data seems to be. As you move forward with the model, you will likely end up with multiple iterations; the customvision.ai service allows a maximum of 20 iterations in its current preview mode.
You can go back to any previous iteration. Some of my earlier iterations had a much better precision and recall rate, so I’ve elected to keep those. You’ll notice there’s an Export button at the top of that page. And that is the next step…
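For readers new to the two figures shown for each iteration: precision is the share of predictions for a tag that were correct, and recall is the share of images that really carry the tag that the model found. Here is a minimal C# sketch with made-up counts. The numbers and the MetricsSketch name are hypothetical and are not taken from the drinks model.

```csharp
using System;

static class MetricsSketch
{
    // Precision: of everything predicted as the tag, how much was right.
    public static double Precision(int truePositives, int falsePositives) =>
        (double)truePositives / (truePositives + falsePositives);

    // Recall: of everything that really is the tag, how much was found.
    public static double Recall(int truePositives, int falseNegatives) =>
        (double)truePositives / (truePositives + falseNegatives);

    static void Main()
    {
        // Hypothetical counts for a 'wine' tag: 8 correct wine predictions,
        // 2 non-wine images predicted as wine, 4 wine images missed.
        Console.WriteLine($"Precision: {Precision(8, 2):F2}");
        Console.WriteLine($"Recall:    {Recall(8, 4):F2}");
    }
}
```

With these counts, precision is 8/10 and recall is 8/12, so an iteration can look strong on one figure and weaker on the other. It is worth comparing both before deciding which iteration to keep.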
Export the Model
When we click the Export button, we can choose either CoreML (iOS 11) or TensorFlow (Android). Because I’m writing an iOS app, the choice was obvious.
That downloads a file ending with .mlmodel. You need to drag and drop that model into Xcode and you’re good to go. But, more on that later…
Step 2: Build the iOS App
Next, I needed an iOS app. Because I’m not an iOS app developer, I’ve elected to stick to the sample that the product team built (and it does exactly what it says on the tin). It’s available over on GitHub, and you can get started by simply cloning that repository, modifying the bundle identifier and making sure you select the right team.
Note: you still need your Apple Developer Account.
When you clone that, it will come with a pre-built model for you, of fruit. But that’s boring…
To make things more fun, we will drag the .mlmodel file we downloaded earlier into the Xcode project. Xcode is pretty good at making sure all of the things are set correctly.
The important bit for us is that it automatically generates a class based on the name of the model; in my case, Drinks1. This is relevant for the next step.
Change the app to use the model
Now that the model is in our app, we need to tell the code to use it. To do that, we will be changing ViewController.
Specifically, there is a line of code that initialises the CoreML model and we need it to look like this:
let model = try VNCoreMLModel(for: Drinks1().model)
Obviously, the key thing for us is the Drinks1 name, representing the class generated from the model we’ve imported.
Step 3: Test the app
Once that’s changed, the app is good to go. I’ve run it on my iPhone X and pointed it towards an image of a wine glass and a shot. These are the results:
The important bit to grasp here is that this is fully offline, so it doesn’t need a connection. We’ve trained our own model using Microsoft’s pre-built and optimised networks, exported it to a CoreML model and used it straight from our Swift app.
Bonus: REST API from a .NET Core App on a Mac
The above example is cool, but it doesn’t cover everything, and your model may be evolving constantly. There is also a prediction API exposed by the service, meaning that for each model you build, you also get an API endpoint to which you can send either an image URL or the image itself, and get back a prediction.
Naturally, the only reasonable thing to do was to get down and dirty, and use this morning to quickly build an example app to showcase that as well.
Make sure your environment is set up by following the instructions here. Next, launch a terminal and create a new console app by running:
dotnet new console --name MyAwesomeName
Then, open Program.cs in your favourite editor and make it look something like this:
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

namespace Dev
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.Write("Enter image file path: ");
            string imageFilePath = Console.ReadLine();

            // Block until the request completes, so the response is printed
            // before the exit prompt and any exception surfaces here.
            MakePredictionRequest(imageFilePath).GetAwaiter().GetResult();

            Console.WriteLine("\n\n\nHit ENTER to exit...");
            Console.ReadLine();
        }

        // Reads the whole image file into memory.
        static byte[] GetImageAsByteArray(string imageFilePath)
        {
            return File.ReadAllBytes(imageFilePath);
        }

        // Posts the image bytes to the prediction endpoint and prints the raw JSON response.
        static async Task MakePredictionRequest(string imageFilePath)
        {
            using (var client = new HttpClient())
            {
                // Request headers - replace this example key with your valid subscription key.
                client.DefaultRequestHeaders.Add("Prediction-Key", "your prediction key here");

                // Prediction URL - replace this example URL with your valid prediction URL.
                string url = "your prediction URL here";

                // Request body. Try this sample with a locally stored image.
                byte[] byteData = GetImageAsByteArray(imageFilePath);

                using (var content = new ByteArrayContent(byteData))
                {
                    content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
                    HttpResponseMessage response = await client.PostAsync(url, content);
                    Console.WriteLine(await response.Content.ReadAsStringAsync());
                }
            }
        }
    }
}
There are two placeholders, one for your prediction URL and one for your prediction key. You get the former when you open the model in Custom Vision and click on the little World icon.
You then need to open the Settings tab in the upper right corner and copy the Subscription key, which goes into the Prediction-Key header. Once both are updated in the code, you can build and run it, either from Visual Studio Code or from the terminal.
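As mentioned earlier, the prediction API can also take an image URL instead of the image bytes. The following is a rough sketch of that variant, not the service's official sample. It assumes the URL-based endpoint expects a JSON body of the form {"Url": "..."} and the same Prediction-Key header, and that your project targets C# 7.1 or later (for async Main). The UrlPredictionSketch and BuildRequest names are made up for illustration.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

static class UrlPredictionSketch
{
    // Builds a POST request that asks the service to classify an image by URL.
    // Assumption: the URL-based endpoint expects a JSON body like {"Url": "..."}.
    public static HttpRequestMessage BuildRequest(string predictionUrl, string predictionKey, string imageUrl)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, predictionUrl);
        request.Headers.Add("Prediction-Key", predictionKey);

        // Escape backslashes and quotes so the URL is a valid JSON string value.
        string escaped = imageUrl.Replace("\\", "\\\\").Replace("\"", "\\\"");
        request.Content = new StringContent("{\"Url\": \"" + escaped + "\"}", Encoding.UTF8, "application/json");
        return request;
    }

    static async Task Main(string[] args)
    {
        if (args.Length < 3)
        {
            Console.WriteLine("Usage: <prediction URL> <prediction key> <image URL>");
            return;
        }

        using (var client = new HttpClient())
        using (var request = BuildRequest(args[0], args[1], args[2]))
        {
            HttpResponseMessage response = await client.SendAsync(request);
            Console.WriteLine(await response.Content.ReadAsStringAsync());
        }
    }
}
```

Check the prediction URL dialog in the portal for the exact endpoint to use with image URLs, since it may differ from the one used for file uploads.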
When I ran Program.cs, it returned the tag ‘Nike’, which is great, because those are exactly the trainers I was wearing at the event.
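Program.cs prints the raw JSON response rather than just the tag. The following is a minimal sketch of picking out the most likely tag. It assumes the response contains a predictions array whose entries have tagName and probability fields, and that System.Text.Json is available (.NET Core 3.0 or later). The field names differ between API versions, so check the JSON your endpoint actually returns. The class names and the sample JSON are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Assumed response shape, trimmed to the fields used here.
public class Prediction
{
    public string TagName { get; set; }
    public double Probability { get; set; }
}

public class PredictionResponse
{
    public List<Prediction> Predictions { get; set; }
}

public static class TopTagSketch
{
    // Returns the prediction with the highest probability, or null if there are none.
    public static Prediction TopPrediction(string json)
    {
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        var response = JsonSerializer.Deserialize<PredictionResponse>(json, options);
        return response?.Predictions?.OrderByDescending(p => p.Probability).FirstOrDefault();
    }

    static void Main()
    {
        // Hypothetical response body, not real output from the service.
        string json = "{\"predictions\":[" +
                      "{\"tagName\":\"Adidas\",\"probability\":0.03}," +
                      "{\"tagName\":\"Nike\",\"probability\":0.97}]}";

        Prediction top = TopPrediction(json);
        Console.WriteLine(top == null ? "No predictions" : $"{top.TagName} ({top.Probability})");
    }
}
```

In a real app you would pass the string returned by response.Content.ReadAsStringAsync() to TopPrediction instead of the hard-coded sample.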
The model used in this example is one that was pre-built, and it contains a lot of Adidas and Nike shoes under two tags: 27 images of Adidas and 30 images of Nike trainers. The aim, of course, is to be able to differentiate between them. The model looks like this:
With that, you should have a quick and dirty start into the world of Computer Vision.
Additional Resources
Building a model for Android Devices
https://blogs.msdn.microsoft.com/uk_faculty_connection/2018/01/20/using-customvision-ai-and-building...