BearID Project Header
BearID Project is a non-profit organization staffed by conservation scientists and volunteer software engineers. Our aim is to progress the field of conservation technology by developing individual identification software applied to camera trap data for use in noninvasive wildlife monitoring. Our initial application focuses on brown bears and provides the foundation for the development of individual recognition for other wildlife, which could aid conservation efforts worldwide.
While the current BearID application significantly reduces the time to process camera data, it still takes weeks to months to determine which bears were seen. This is mainly due to the need to retrieve the memory cards from the field. Connected cameras could reduce the time lag; however, streaming video directly to the cloud requires considerable data bandwidth and device power, both of which are hard to come by in the remote locations where the cameras are deployed. The ability to detect and identify individuals directly on the camera, combined with connectivity, could enable remote device management and near real-time monitoring and notification. Intelligent sensing devices with renewable power and long-range connectivity would be a game-changer for conservation science.
As a Microsoft AI for Earth grantee with the BearID Project, I was introduced to Azure Percept. Azure Percept provides an easy way to set up and manage IoT devices and to experiment with detection and identification models on an edge-computing device, making it a good first step toward on-camera machine learning. This blog will walk through the steps we followed, from setting up Azure Percept to training and deploying a model and running inference remotely to detect and identify a bear.
Overview of the Use Case
Using Azure Percept and Azure Custom Vision, I developed a prototype camera trap with bear face detection and individual identification. This consisted of training a machine learning model with Azure Custom Vision, deploying it to an Azure Percept DK and using it to identify bears by pointing the Azure Percept Vision at various photos of bears.
Here is an architecture diagram of the solution:
Architecture Diagram
Setup
The Azure Percept DK comes with 3 main components:
- Azure Percept Developer Board - the main carrier board, with an NXP i.MX 8M application processor built around 4 Arm Cortex-A53 cores
- Azure Percept Vision - a vision system-on-module (SoM) with an Intel Movidius Myriad X (MA2085) vision processing unit (VPU) and an RGB camera sensor
- Azure Percept Audio - an audio system-on-module (SoM) with an XMOS XUF208 multicore microcontroller and 4 MEMS microphones
Azure Percept DK
Setting up the device is easy: just follow the quickstart guide. I did have an issue when creating the Azure IoT Hub. I tried to add it to an existing resource group but got an error, and the error message didn't provide much information. When I tried to create a new resource group, I got the same error; however, when I tried using the new resource group a second time, I was able to complete the process. If you do get errors, check the resource group activity log for more information (I discovered this after the fact!).
Once the Percept DK was set up, I was easily able to connect to the device using Azure Percept Studio. From the Percept Studio Overview, select Devices and select your device from the list. On the device page, switch to the Vision tab. Under Actions, click View stream. You should see the camera feed in a browser window. Most likely it is already running a default object detection model. My view looked like this:
Azure Percept Webstream
My helper, Kodi (aka teddy bear: 0.99), is looking forward to identifying some friends. Let's get started!
Training a model
As one of our early experiments with the AI for Earth grant, we trained a bear face detection model using Azure Custom Vision and the SDK for Python (it properly detects Kodi!). I had written a blog post, Object Detection with Azure Custom Vision, which describes a low-code approach to the problem. That post will serve as a starting point for this one. This time we want to not only find the bear faces but also label them with the individual bear's identification (a name or number). As with the bear face detector posted previously, we will follow the Quickstart: Create an object detection project with the Custom Vision client library guide and use the Python SDK to do the following:
- Create a new Custom Vision project
- Add tags to the project
- Upload and tag images
- Train the model
If you are just getting started with Custom Vision, follow the prerequisites and Setting up sections of the guide. The rest of this post assumes familiarity with utilizing Custom Vision for object detection.
BearID Camera Trap Results
Create a new Custom Vision project
I used the same code as with the face detector to set up the Azure Custom Vision object detection project and credentials. The only change was the project name in the Custom Vision API call trainer.create_project, which I changed from face-resize to face-id-resize:
project = trainer.create_project("face-id-resize", domain_id=obj_detection_domain.id)
Initially I used the General domain, which I could easily test on the server. For use with the Percept DK, I used the General (compact) domain. The compact domain is optimized for edge devices. For more information on domains, see Select a domain for a Custom Vision project.
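If you are setting this up from scratch rather than reusing the code from the earlier post, the client setup and domain selection look roughly like the sketch below. It is based on the Custom Vision quickstart; ENDPOINT and training_key stand in for the values from your own Custom Vision training resource.

from msrest.authentication import ApiKeyCredentials
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient

# ENDPOINT and training_key come from your Custom Vision training resource
credentials = ApiKeyCredentials(in_headers={"Training-key": training_key})
trainer = CustomVisionTrainingClient(ENDPOINT, credentials)

# pick the edge-optimized object detection domain for use on the Percept DK
obj_detection_domain = next(domain for domain in trainer.get_domains()
                            if domain.type == "ObjectDetection" and domain.name == "General (compact)")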
Add tags to the project
Custom Vision first needs a list of tags, or labels, for the objects we want to detect. In the bear face detector, we had a box drawn around each bear's face and all the boxes were labelled as bear. This time, the boxes remain the same, but the labels will correspond to the individual bear's identification. For object detection, the Custom Vision documentation recommends a minimum of 50 training instances of each object. When all the objects were labeled as bear, we had more than 3000 instances. For most of the bears in our dataset, we don't have that many images (and, sadly, no Kodi). We will need to pull out a subset of bears with 50 or more images.
Our dataset uses an XML file format defined by dlib’s imglab tool. We have written a parser in Python, xml_utils.py, which can be found in the tools directory in the bearid GitHub repository. The parser reads the metadata into a dictionary, keyed by the bear ID. I have bearid cloned at ~/dev/bearid. We can import xml_utils and a few other common libraries:
import os
import sys
# ~ is not expanded automatically, so expand it before adding to sys.path
sys.path.append(os.path.expanduser('~/dev/bearid/tools'))
import xml_utils as x
from collections import defaultdict
from PIL import Image
We can read in the XML file and load the objects from it using the load_objs_from_files function in xml_utils.
objs_d = defaultdict(list)
x.load_objs_from_files(['faceGold_train_resize.xml'], objs_d, 'faces')
Next we loop through the keys to find those with 50 or more images. We will use this set as our list of tags, which we can set using the Custom Vision API, trainer.create_tag. In this case we end up with a set of 21 individual bears. Here's the code:
MIN_TAGS = 50
# get all tags from XML and loop through them to create_tag in project
label_tag = defaultdict()
for key, objs in list(objs_d.items()):
    if (len(objs) < MIN_TAGS):
        continue
    label_tag[key] = trainer.create_tag(project.id, key)
Upload and tag images
Now we need to upload the dataset to Azure Custom Vision. We will only be uploading the images of the 21 bears for which we have at least 50 images. For each image, we care about the image file and box information. The Custom Vision API allows you to upload images in batches of 64. So let’s set up a constant for the batch size and keep track of the current image_list and image_count:
MAX_IMAGE_BATCH = 64
image_list = []
image_count = 0
The next block of code is nearly the same as for the bear face object detector. Again, the primary difference is that we use the bear ID as the key (label). Here's what it does:
- Loop through labels (key is the label)
- Loop through all the objs for one key (an obj in this case is an image)
- For each obj, get the image file with all the tags (labels) and all the regions (bounding boxes)
- Upload the batch of images for each label
- Break after uploading 64 images of a label (this will help normalize the distribution of images per bear to 50-64)
Here's the code:
# ImageFileCreateBatch, ImageFileCreateEntry and Region come from
# azure.cognitiveservices.vision.customvision.training.models (imported during project setup)
# loop through all the labels and get their corresponding objects
for key, objs in list(objs_d.items()):
    obj_count = 0
    obj_size = len(objs)
    print(key, obj_size)
    # if there are fewer than MIN_TAGS images, skip this bear
    if (obj_size < MIN_TAGS):
        continue
    # loop through objects (images) for each label
    for obj in objs:
        image_count += 1
        obj_count += 1
        file_name = obj.attrib.get('file')
        print("Image:", image_count, file_name)
        img = Image.open(file_name)
        width, height = img.size
        regions = []
        # find all the bounding boxes
        for box in obj.findall('box'):
            bleft = int(box.attrib.get('left'))
            btop = int(box.attrib.get('top'))
            bheight = int(box.attrib.get('height'))
            bwidth = int(box.attrib.get('width'))
            # add bounding box to regions, and translate coordinates
            # from absolute (pixel) to relative (percentage)
            regions.append(Region(tag_id=label_tag[key].id, left=bleft/width, top=btop/height,
                                  width=bwidth/width, height=bheight/height))
        # add object to the image list
        with open(file_name, "rb") as image_contents:
            image_list.append(ImageFileCreateEntry(name=file_name, contents=image_contents.read(), regions=regions))
        # if this is the last image or if we hit the batch size
        # then upload the images
        if ((obj_count == obj_size) or ((obj_count % MAX_IMAGE_BATCH) == 0)):
            print("Upload batch:", key, obj_count)
            upload_result = trainer.create_images_from_files(project.id, ImageFileCreateBatch(images=image_list))
            if not upload_result.is_batch_successful:
                print("Image batch upload failed.")
                for image in upload_result.images:
                    if ((image.status != "OKDuplicate") and (image.status != "OK")):
                        print("Image status: ", image.status)
                        exit(-1)
                print("Continue...")
            image_list.clear()
            obj_count = 0
            obj_size -= MAX_IMAGE_BATCH
            # break after the first batch to cap each bear at MAX_IMAGE_BATCH images;
            # remove this break to upload all images for a label
            break
You can view your labeled dataset in the web portal:
Custom Vision Dataset
You can also use the web portal to edit your labels as needed.
Train the project
Once your dataset is ready, it is time for training. In the bear face object detector post I described how to use the Python API to start a training iteration. I did this for the General domain. I did some experimentation with other domains using the web portal. For use with the Percept DK, I trained an iteration using the General (compact) domain by changing the domain in the project settings. In the training dialog box, I selected Advanced Training and set the budget for 1 hour.
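Although I ran this round through the web portal, the same advanced training run can be started from the Python SDK. Here is a minimal sketch, assuming the trainer client and project from earlier; training_type="Advanced" and reserved_budget_in_hours mirror the selections I made in the portal dialog.

import time

# start an advanced training iteration with a 1 hour budget
iteration = trainer.train_project(project.id,
                                  training_type="Advanced",
                                  reserved_budget_in_hours=1)

# poll until training completes
while iteration.status != "Completed":
    iteration = trainer.get_iteration(project.id, iteration.id)
    print("Training status: " + iteration.status)
    time.sleep(30)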
Once training is complete, you can see the cross-validation performance on the web portal:
Custom Vision Model Performance
In this case, for a probability threshold of 50% and an overlap threshold of 30%, we are getting 79.4% mean Average Precision with a Precision of 78.2% and a Recall of 67.4%. A longer training budget may result in better performance. Kodi wants to get to the good stuff, so this is good enough for now.
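These numbers can also be pulled programmatically rather than from the portal. The following is a sketch that assumes the iteration object from the training step above and the same 50% probability and 30% overlap thresholds; the exact shape of the returned performance object may vary with SDK version.

# fetch overall and per-bear metrics for the trained iteration
perf = trainer.get_iteration_performance(project.id, iteration.id,
                                         threshold=0.5, overlap_threshold=0.3)
print("mAP: {:.1%}  Precision: {:.1%}  Recall: {:.1%}".format(
    perf.average_precision, perf.precision, perf.recall))

# per-bear breakdown
for tag in perf.per_tag_performance:
    print(tag.name, "AP: {:.1%}".format(tag.average_precision))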
Deploying a model
Now we need to deploy our new face-id-resize model to the Azure Percept DK. Since we trained our model in Azure Custom Vision, deployment is a snap with Azure Percept Studio. In Percept Studio, go to the Vision tab on your device page. Click on Deploy project. In the popup card, you can select the Custom Vision model and iteration you want to deploy.
Check out the Deploy a vision AI model to Azure Percept DK guide for more details.
Identifying a bear
Azure Percept and Target Photo
After deployment completes, the stream view should show the camera feed with an overlay of our custom model detections. For the `face-id-resize` model, we need some bears. Since Kodi is not in our dataset, and none of the 21 bears in the dataset are in or around my home (they all reside in Alaska or British Columbia), we'll use some images of bears.
Rather than pointing the Azure Percept Vision at a computer screen to view images from our test set, I opted to use the Katmai Conservancy Fat Bear Calendar 2022. I selected the September bear, 128 Grazer (bf_128 in our dataset), who is one of the 21 bears in our data subset (Kodi says Grazer is a ferocious mother bear!). You can see the result from the stream view below, which shows a bounding box around Grazer's face, the predicted label (bf_128) and the confidence score (0.79).
The model found bear faces in the calendar images quite reliably. However, the identification was not always correct. Many of the bears in the calendar are not in the subset we used. Even for the bears in the subset, there were some misidentifications. Some errors could be due to changes in the bears' appearance, as our dataset mainly comes from photos taken between 2014 and 2017. Some errors and variation in the confidence score are likely due to the angles and glare involved with pointing the Azure Percept Vision at a glossy photo.
Azure Percept Webstream with Bear Identified
With Azure Custom Vision and the Azure Percept DK we were able to go from setup to identifying individual bears in a matter of hours.
Conclusion
Azure Percept DK is an easy-to-set-up IoT device capable of vision processing at the edge. Azure Custom Vision makes it simple to create no-code or low-code machine learning vision models, which can easily be loaded onto the Percept DK using Azure Percept Studio. With this array of tools, vision models can be trained in hours and deployed at the edge in minutes.
Azure Percept is a great start for conservation scientists wanting to experiment with noninvasive monitoring.