Educator Developer Blog

Microsoft Project15 & University of Oxford Capstone Project with Elephant Listening Project Team 4

Microsoft

Apr 08, 2021

Oxford's AI Group 4 Project 15 Writeup

Who are we?

The goal of the project was to count the number of elephants in a sound file.

To do so, we determined whether the rumbles in a recording belong to the same elephant or to different elephants.

Poole, Joyce H. (1999). Signals and assessment in African elephants: evidence from playback experiments. Animal Behaviour, 58(1), 185-193.
Jarne, Cecilia (2019). A method for estimation of fundamental frequency for tonal sounds inspired on bird song studies. MethodsX, 6, 124-131.
Stoeger, Angela S. et al. (2012). Visualizing Sound Emission of Elephant Vocalizations: Evidence for Two Rumble Production Types. PLoS ONE, 7(11), e48907.
O'Connell-Rodwell, C. E. et al. (2000). Seismic properties of Asian elephant (Elephas maximus) vocalizations and locomotion. Journal of the Acoustical Society of America, 108(6), 3066-3072.
Heffner, R. S., & Heffner, H. E. (1982). Hearing in the elephant (Elephas maximus): Absolute sensitivity, frequency discrimination, and sound localization. Journal of Comparative and Physiological Psychology, 96(6), 926-944.
Elephant Listening Project, Cornell University: https://elephantlisteningproject.org/
Project 15, Microsoft: https://microsoft.github.io/project15/

Sound files can be analysed by transforming them into a 2D image: a spectrogram of time (seconds) vs frequency (Hertz). The third dimension is sound intensity (decibel), which can be shown as a colour or grayscale.
Elephants communicate with rumbles, which typically have a frequency of 10-50 Hz and last 2-6 seconds.
A single elephant rumble has many harmonics: additional sound waves at integer multiples of the base frequency, which appear as stacked bands in the spectrogram.
An elephant can be identified by the base frequency of its rumble. If two slightly overlapping or separated rumbles have different base frequencies, they probably belong to separate animals.
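
As a minimal sketch of these ideas, the Python snippet below builds a synthetic rumble (a base tone plus weaker harmonics at integer multiples of it), computes a spectrogram with SciPy, and takes the strongest bin in the 10-50 Hz band as the base frequency. The sample rate, window length, and the function names synth_rumble and base_frequency are illustrative assumptions, not the project's code. Picking the strongest bin is also a simplification that would fail if a harmonic were louder than the base tone.

```python
import numpy as np
from scipy import signal

SAMPLE_RATE = 1000  # Hz; assumed value, high enough to resolve 10-50 Hz rumbles


def synth_rumble(base_hz, duration_s=4.0, n_harmonics=3, sample_rate=SAMPLE_RATE):
    """Synthetic stand-in for a rumble: a base tone plus weaker harmonics."""
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    wave = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        # The k-th harmonic sits at k times the base frequency.
        wave += np.sin(2 * np.pi * k * base_hz * t) / k
    return wave


def base_frequency(samples, sample_rate=SAMPLE_RATE, band=(10.0, 50.0)):
    """Estimate the base frequency as the strongest bin inside the rumble band."""
    # One-second windows give a frequency resolution of 1 Hz at this sample rate.
    freqs, _times, power = signal.spectrogram(
        samples, fs=sample_rate, nperseg=sample_rate, noverlap=sample_rate // 2
    )
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    mean_power = power[in_band].mean(axis=1)  # average over time frames
    return float(freqs[in_band][np.argmax(mean_power)])
```

Two rumbles whose estimates differ by more than the frequency resolution would, under the rule described above, be treated as coming from different animals.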

We received a set of sound files (.wav) and metadata that pointed us to the segments where elephants were likely to produce rumbles.
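
A hedged sketch of how such metadata could be used to cut a segment out of a longer recording is shown below. It assumes the metadata provides a start and end time in seconds for each likely rumble; the function name cut_segment and the padding_s parameter are hypothetical and are not taken from the project's code.

```python
import numpy as np


def cut_segment(samples, sample_rate, start_s, end_s, padding_s=1.0):
    """Return the samples between start_s and end_s, with padding on both sides.

    The slice is clamped to the recording so that segments near the start
    or end of the file do not run out of bounds.
    """
    first = max(0, int(round((start_s - padding_s) * sample_rate)))
    last = min(len(samples), int(round((end_s + padding_s) * sample_rate)))
    return samples[first:last]
```

For a recording loaded as a NumPy array, cut_segment(samples, sample_rate, 2.0, 5.0) would return the audio from second 1 to second 6.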

Challenges:

Segmenting data: based on the metadata files, we create segments of a few seconds that contain the interesting information
Spectrograms: each data segment is transformed into a 2D image of time vs frequency (10-50 Hz), using a fast Fourier transform (FFT), low-pass/high-pass filters, and frequency filters
Noise reduction: noise is removed from each spectrogram, which is then converted into a simple monochrome (black and white) image
Contour detection: each monochrome image is evaluated with a contour detection algorithm to distinguish the separate 'objects', which in our case are the elephant rumbles
Boxing: for each contour (potential elephant rumble), we calculate the size (height and width) by drawing a box around the contour
Counting: we compare the boxes that identify the rumbles to each other in each spectrogram. Based on a few business rules, we count the number of unique elephant rumbles in each image
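
The last four steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: it uses connected-component labelling from SciPy as a stand-in for OpenCV contour detection, the spectrogram is assumed to be a 2D array with frequency bins as rows (lowest frequency first) and time frames as columns, and the counting rule in count_elephants (boxes whose lowest frequency rows differ by at least min_freq_gap belong to different animals) is a hypothetical example of a business rule. It does not try to separate harmonics from base tones.

```python
import numpy as np
from scipy import ndimage


def to_monochrome(spectrogram_db, threshold_db):
    """Binarise a spectrogram: True where the intensity exceeds the threshold."""
    return spectrogram_db > threshold_db


def find_boxes(mask, min_area=4):
    """Return (row_start, row_stop, col_start, col_stop) for each connected blob."""
    labelled, _count = ndimage.label(mask)
    boxes = []
    for rows, cols in ndimage.find_objects(labelled):
        height = rows.stop - rows.start
        width = cols.stop - cols.start
        if height * width >= min_area:  # drop small speckles left by noise
            boxes.append((rows.start, rows.stop, cols.start, cols.stop))
    return boxes


def count_elephants(boxes, min_freq_gap=3):
    """Hypothetical rule: group boxes by the row of their lowest frequency."""
    base_rows = sorted(box[0] for box in boxes)
    count = 0
    previous = None
    for row in base_rows:
        # A new animal is counted when the base row jumps by min_freq_gap or more.
        if previous is None or row - previous >= min_freq_gap:
            count += 1
        previous = row
    return count
```

With this rule, two boxes that start at the same frequency row but occur at different times are attributed to the same animal, while boxes that start at clearly different rows are counted separately.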

The source code is made available at: https://github.com/AI-Cloud-and-Edge-Implementations/Project15-G4
All code is written in Python and runs on-premises or in the cloud (Azure)
We used the following frameworks to process and analyze the data:
- boto3 for connecting to Amazon Web Services (AWS)
- NumPy, pandas, SciPy and Matplotlib for statistical analysis and visualization
- Librosa for FFT
- noisereduce for noise reduction
- SoundFile
- OpenCV for contour detection

We analysed 3935 elephant sounds:
- 112 spectrograms were identified as containing 0 elephants
- 3277 spectrograms were identified as containing 1 elephant
- 505 spectrograms were identified as containing 2 elephants
- 40 spectrograms were identified as containing 3 elephants

The boxing algorithm was evaluated by Liz Rowland of Cornell University
The reported accuracy of the model is:
- 97.29 % for the Training dataset (3180 cases)
- 99.29 % for the Testing dataset (758 cases)
These figures suggest that the model is useful for counting elephants.
In combination with other models (elephant detection), many interesting use cases can be built with this model, for example visualizing elephant movements and detecting poaching.

Aim
Using the processed spectrogram data as an input to a CNN to automatically categorise how many elephants are present
Why are we doing this?
- To automate the workflow end to end
- To improve accuracy by reducing human error
- To save time, enabling researchers to focus their attention on complex problems
Our Approach
Transfer learning takes advantage of models that have been pre-trained on large datasets and then fine-tunes them for our specific problem. This approach is becoming very popular for several reasons (shorter training time, better performance, less data needed), and we found it to work well.

We implemented this using Keras with a TensorFlow backend.
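
A minimal sketch of what such a transfer learning setup could look like in Keras is given below. It is an assumption-laden illustration rather than the team's model: it picks ResNet50 as the pre-trained base, freezes it, and adds a small classification head with four output classes (0 to 3 elephants, matching the counts reported earlier). The function name build_classifier, the 224x224x3 input size, the pooling layer, and the optimizer are all hypothetical choices.

```python
import tensorflow as tf


def build_classifier(num_classes=4, weights="imagenet", input_shape=(224, 224, 3)):
    """Frozen pre-trained base plus a small trainable classification head."""
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # keep the pre-trained features fixed at first

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # run batch norm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer labels 0..num_classes-1
        metrics=["accuracy"],
    )
    return model
```

Calling build_classifier() downloads the ImageNet weights on first use. Greyscale spectrograms would need to be resized and repeated across three channels to match the assumed input shape, and the base could later be unfrozen for fine-tuning at a low learning rate.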
To evaluate the performance of our models we looked at the following measures of our two most promising architectures:

Sound files

Machine learning on spectrograms using labelled data
Automatic classification and better acoustic analysis (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0048907)
Further fine-tuning of the boxing algorithm might lead to even better results, e.g.
- Fixing the time axis in the spectrograms
- Increasing the frequency range
- Other (better) noise reduction techniques

Elephant counting based on base frequency analysis is possible
The team delivered a ready-to-use software library that counts elephants with high accuracy (97% on selected cases)
The software can be used in the IoT Hub (Project 15) or on-premises
The application can be integrated into other software
A machine learning model (VGG or ResNet50) could be used to count the elephants instead of the rule-based boxing algorithm
Further research is needed to improve the results and, for example, to extend the approach to other species