Once the model is created, it’s time to start loading our images and tagging them. I’ve compiled (that is, searched for and borrowed from the internet) photos of different types of drink, like wine, beer, shots and cocktails.
Once you have uploaded all the photos, you will have the option to tag them. So, you start with some photos of wine, and you tag them with wine. You repeat the process for all the other classes and voilà, your model is born.
Train the model
When we’ve uploaded the photos, we’re only halfway through. Now we need to actually train the model to do something based on those images. There is a big green button at the top called Train.
The nice thing is that you also immediately see how good the training data seems to be. As you move forward with the model, you will likely end up with multiple iterations; the service allows a maximum of 20 iterations in its current preview mode.
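Custom Vision reports precision and recall for each iteration, and those are the figures worth comparing when deciding which iterations to keep. As a reminder of what they measure, here is a small Swift sketch that computes both from raw counts and picks the strongest iteration. IterationStats and bestIteration are hypothetical names, the counts are made up, and ranking by precision first (with recall as a tie-breaker) is just one reasonable choice.

```swift
/// Counts for one tag in one training iteration (hypothetical type).
struct IterationStats {
    let name: String
    let truePositives: Int   // predicted the tag, and the image has it
    let falsePositives: Int  // predicted the tag, but the image doesn't have it
    let falseNegatives: Int  // missed the tag on an image that has it

    /// Precision: of the images predicted as this tag, the share that were right.
    var precision: Double {
        let predicted = truePositives + falsePositives
        return predicted == 0 ? 0 : Double(truePositives) / Double(predicted)
    }

    /// Recall: of the images that really have this tag, the share that were found.
    var recall: Double {
        let actual = truePositives + falseNegatives
        return actual == 0 ? 0 : Double(truePositives) / Double(actual)
    }
}

/// Returns the iteration with the highest precision, using recall to break ties.
func bestIteration(_ iterations: [IterationStats]) -> IterationStats? {
    iterations.max { lhs, rhs in
        (lhs.precision, lhs.recall) < (rhs.precision, rhs.recall)
    }
}

// Made-up numbers for two iterations.
let iterations = [
    IterationStats(name: "Iteration 1", truePositives: 8, falsePositives: 2, falseNegatives: 4),
    IterationStats(name: "Iteration 2", truePositives: 9, falsePositives: 1, falseNegatives: 3),
]

if let best = bestIteration(iterations) {
    print("\(best.name): precision \(best.precision), recall \(best.recall)")
}
```

The portal does these calculations for you; the point is only that a "better" iteration is one that is right more often when it predicts a tag (precision) and misses fewer images that carry it (recall).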
You can go back to any of the previous iterations if they turn out to have a much better rate, so I’ve elected to keep those. You’ll notice there’s an Export button at the top of that page. And that is the next step…
Export the Model
When we click the Export button, we can choose either CoreML (iOS 11) or TensorFlow (Android). Because I’m writing an iOS app, the choice was obvious.
That downloads a file ending with .mlmodel. You need to drag and drop that model into Xcode and you’re good to go. But, more on that later…
Step 2: Build the iOS App
Next, I needed an iOS app. Because I’m not an iOS app developer, I’ve elected to stick to the sample that the product team built (and it does exactly what it says on the tin). It’s available over on GitHub, and you can get started by simply cloning that repository, modifying the bundle identifier and making sure you select the right team.
Note: you still need your Apple Developer Account.
When you clone it, it comes with a pre-built model for recognising fruit. But that’s boring…
To make things more fun, we will drag the .mlmodel file we downloaded earlier into the project. Xcode is pretty good at making sure everything is set up correctly.
The important bit for us is that it generates a class based on the name of the model – in my case, Drinks1. This is relevant for the next step.
Change the app to use the model
Now that the model is in our app, we need to tell the code to use it. To do that, we will be changing ViewController.
Specifically, there is a line of code that initialises the CoreML model, and we need it to look like this:
let model = try VNCoreMLModel(for: Drinks1().model)
Obviously, the key thing for us is the Drinks1 name, representing the class generated from the model we’ve imported.
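Once the model is wrapped in a VNCoreMLModel, Vision hands back classification observations, each with an identifier (the tag) and a confidence between 0 and 1, and an app typically shows the most likely ones. Here is a plain-Swift sketch of that ranking step, written so it runs without Vision or a device. Classification is a hypothetical stand-in for VNClassificationObservation, describeTop is a made-up helper rather than the sample’s code, and the tag names and confidences are invented.

```swift
/// Hypothetical stand-in for Vision's VNClassificationObservation, which
/// exposes an identifier (the tag) and a confidence between 0 and 1.
struct Classification {
    let identifier: String
    let confidence: Float
}

/// Formats the `count` most confident results as "tag (NN%)", best first.
func describeTop(_ results: [Classification], count: Int = 2) -> [String] {
    results
        .sorted { $0.confidence > $1.confidence }
        .prefix(count)
        .map { "\($0.identifier) (\(Int(($0.confidence * 100).rounded()))%)" }
}

// Made-up confidences for a photo of a wine glass; the tag names are assumed.
let results = [
    Classification(identifier: "beer", confidence: 0.05),
    Classification(identifier: "wine", confidence: 0.91),
    Classification(identifier: "shot", confidence: 0.04),
]
print(describeTop(results))
```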
Step 3: Test the app
Once that’s changed, the app is good to go. I’ve run it on my iPhone X and pointed it towards an image of a wine glass and a shot. These are the results:
The important bit to grasp here is that this runs offline, on the device, so it doesn’t need a connection to do this. So, we’ve trained our own model using Microsoft’s pre-built and optimised networks, exported it as a CoreML model and used it straight from our Swift app.
Bonus: REST API from a .NET Core App on a Mac
The above example is cool, but it doesn’t cover everything: your model may be evolving constantly, for instance, and a copy baked into the app won’t pick up those changes. The service also exposes a prediction API, meaning that for each model you build, you also get an API endpoint to which you can send either an image URL or the image itself, and get back a prediction.
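A prediction call is a POST with the key in a header and either a small JSON body naming the image URL or the raw image bytes. The app in this section is .NET Core, but to keep to one language, here is a Swift sketch that only builds the two requests. The Prediction-Key header, the {"Url": ...} body and the octet-stream content type reflect my understanding of the Custom Vision prediction API, and as I recall the portal shows a separate prediction URL for each case, so check both against what your project shows you. predictionRequest is a hypothetical helper.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

/// Builds a prediction request that points the service at an image URL.
/// Assumption: the key goes in a Prediction-Key header and the body is
/// JSON of the form {"Url": "..."}.
func predictionRequest(endpoint: URL, key: String, imageURL: String) throws -> URLRequest {
    var request = URLRequest(url: endpoint)
    request.httpMethod = "POST"
    request.setValue(key, forHTTPHeaderField: "Prediction-Key")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: ["Url": imageURL])
    return request
}

/// Builds a prediction request that uploads the image bytes directly.
/// Assumption: raw bytes are sent as application/octet-stream.
func predictionRequest(endpoint: URL, key: String, imageData: Data) -> URLRequest {
    var request = URLRequest(url: endpoint)
    request.httpMethod = "POST"
    request.setValue(key, forHTTPHeaderField: "Prediction-Key")
    request.setValue("application/octet-stream", forHTTPHeaderField: "Content-Type")
    request.httpBody = imageData
    return request
}
```

Sending either request with URLSession.shared.dataTask(with:completionHandler:) would return the prediction as JSON.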
Naturally, the only reasonable thing to do was to get down and dirty, and use this morning to quickly build an example app to showcase that as well.
Make sure your environment is set up by following the .NET Core installation instructions for macOS. Next, launch a terminal and create a new console app (dotnet new console).
There are two placeholders in the code, for your prediction URL and prediction key. You get these from the model in Custom Vision: click on the little Prediction URL button, then open the settings in the upper right corner to get the Subscription key. Once both are updated in the code, you can build and run it, either from Visual Studio Code or from the terminal.
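Whichever language the client is written in, the response is JSON that typically lists a probability for every tag, and the answer is simply the tag with the highest probability. A Swift sketch of pulling that out; the field names (predictions, tagName, probability) assume the shape of the newer v3.0 prediction API, older preview versions may name them differently, and the sample payload for the Adidas and Nike tags uses made-up numbers.

```swift
import Foundation

/// Assumed response shape (modelled on the v3.0 prediction API).
struct PredictionResponse: Decodable {
    struct Prediction: Decodable {
        let tagName: String
        let probability: Double
    }
    let predictions: [Prediction]
}

/// Decodes a prediction response and returns the most likely tag,
/// or nil if the service returned no predictions.
func topTag(from json: Data) throws -> (tag: String, probability: Double)? {
    let response = try JSONDecoder().decode(PredictionResponse.self, from: json)
    guard let best = response.predictions.max(by: { $0.probability < $1.probability }) else {
        return nil
    }
    return (best.tagName, best.probability)
}

// Made-up payload for a two-tag trainers model; extra fields such as
// ids and timestamps would simply be ignored by the decoder.
let sample = Data("""
{"predictions": [
  {"tagName": "Nike", "probability": 0.12},
  {"tagName": "Adidas", "probability": 0.88}
]}
""".utf8)

if let top = try topTag(from: sample) {
    print("\(top.tag): \(top.probability)")
}
```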
You’ll see this returns a tag, which is great, because it’s exactly the brand of trainers I was wearing at the event:
The model used in this example is one that was built beforehand, and contains photos of Adidas and Nike shoes under two tags: 27 images of Adidas trainers and 30 images of Nike trainers. The aim, of course, is to be able to differentiate between them. The model looks like this:
So, that should give you a quick and dirty start in the world of Computer Vision.