August 25, 2023

Flying High with AI: Counting Pelican Breeding Pairs in the Danube Delta

Learn how we built computer vision models to detect and classify pelicans in Romania and to automate the estimation of the breeding population from aerial photographs.

Imagine flying in a small airplane over the vast wetlands of the Danube Delta, on the shores of the Black Sea in Romania, looking for patches of small white dots: great white pelicans (Pelecanus onocrotalus). While flying over the colonies, researchers like Sebastian Bugariu from the Romanian Ornithological Society (SOR) take photos that are used to count the number of breeding birds back in the office. The number of breeding pairs has grown from ~5,000 pairs 15 years ago to ~18,000 pairs today. Keeping good records of the breeding numbers is important but not an easy task: back at the office, it can take weeks to go through the images and manually count the pelicans. Wouldn’t it be great if this process could be automated, freeing up time for other important conservation work? This is where the AI for Pelicans challenge started.

“1 image can capture more than 4,000 pelicans”

The challenge is how to count pelicans in an image. Colonies can be dense, with overlapping pelicans captured in different poses, from different angles, and under variable conditions. We divided the problem into two stages of increasing difficulty:

  1. count the number of pelicans in an image and
  2. differentiate between age classes (juvenile and adult) and, for adults, between breeding, non-breeding, and unknown.

This approach would provide the information needed to estimate the breeding population size and to gauge breeding success through the number of juveniles. Using AI instead of humans could increase the speed of the analysis, reduce biases when different people annotate the data, and standardize the process.

Fig 1. Aerial image of a pelican colony and examples of the four categories we aim to detect.

How we tackled this Challenge

To increase our chances of obtaining a high-performance model, we split into two sub-teams. Team 1 focused on selecting, training, and testing various object detection models, while Team 2 focused on training and deploying a YOLOv8 model and developing a workflow to aid SOR in pelican counting in the future. Before splitting up, we prioritized data annotation, which is crucial for project success. We received a wealth of aerial images from 2009 to 2018, annotated in Photoshop using point labels. Our first task was to convert these point labels into bounding boxes for use in computer vision models.

Data annotation

In the initial annotations from SOR, individual pelicans were manually located and assigned to one of four categories using the ‘Count tool’ in Photoshop. However, the proprietary .psd file format is not directly accepted by open-source frameworks like PyTorch or TensorFlow. In the first week of the challenge, we found two methods (including the Python psd-tools library) to convert the images to a standard file format and extract the annotations into a tabular format. We then tried to expand each point annotation into a bounding box, but this was not very accurate due to variations in the angle of the photo, the posture of the bird, and the surroundings. This problem, combined with a large imbalance in label frequency, led us to re-annotate a set of images.
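As a minimal sketch of that conversion, assuming the point labels have already been exported to a CSV (the file names, CSV columns, and the fixed 40-pixel box size are illustrative assumptions, not our exact pipeline):

```python
from psd_tools import PSDImage  # pip install psd-tools
import csv

# Flatten the proprietary .psd into a standard image format.
psd = PSDImage.open("colony.psd")
image = psd.composite()  # renders the layers into a PIL image
image.save("colony.png")

# Expand point labels into fixed-size YOLO boxes:
# "class x_center y_center width height", all normalized to [0, 1].
BOX = 40                 # assumed box edge in pixels
w, h = image.size

with open("colony_points.csv") as f, open("colony.txt", "w") as out:
    for row in csv.DictReader(f):  # expected columns: class_id, x, y
        x, y = float(row["x"]), float(row["y"])
        out.write(f"{row['class_id']} {x / w:.6f} {y / h:.6f} "
                  f"{BOX / w:.6f} {BOX / h:.6f}\n")
```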

The Semi-supervised approach

We manually annotated 12 images using Roboflow in two steps (V1 and V2). The annotated images were then used to train a YOLOv8 model to obtain a rough set of bounding boxes around pelicans. The model's performance was impressive, particularly in adult breeding colonies with well-spaced-out pelicans. However, we had to redo the annotation for dense clusters of pelicans and for juvenile pelicans. Eventually, we obtained 21 well-annotated images for the model's training, validation, and testing. Although 21 images is not many, they are very large (6000x4000 pixels), which makes them suitable for training machine-learning algorithms: with our YOLOv8 input size of 512x512 pixels, each image could be cut into roughly 90 tiles.
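A rough sketch of this semi-supervised pre-labeling loop, assuming the first batch of annotations is already packaged as an Ultralytics dataset config (all file names here are illustrative):

```python
from ultralytics import YOLO  # pip install ultralytics

# Train a first-pass detector on the small hand-annotated set (V1).
model = YOLO("yolov8s.pt")  # start from COCO-pretrained weights
model.train(data="pelicans_v1.yaml", epochs=100, imgsz=512)

# Pre-label the remaining tiles: the rough boxes only need correcting
# by hand (V2) instead of being drawn from scratch.
model.predict("unlabeled_tiles/", conf=0.25, save_txt=True)
```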

Fig 2. An overview of the image annotation process.
Fig 3. Example of a labeled image with bounding boxes.
Fig 4. Classes and counts of Figure 3.

Team 1 - Model selection

We divided the project into three stages. First, we created a shortlist of nine models. Then, we conducted in-depth studies of each model to identify the most useful ones for the task at hand. Our selection criteria included complexity, resource requirements, reported results, and the timeframe for completion. Table 1 provides a summary of all the models.

Table 1. An overview of model performance based on existing literature.
Several models performed well with moderate complexity, including the YOLO family, which we focused on for the next steps. We chose YOLOv8, the latest YOLO release at the time, and started training two architectures, small (s) and medium (m), which differ in their number of parameters and network depth. The models were initially trained for 300 epochs with early stopping after 50 stagnant epochs to minimize the risk of overfitting.
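In Ultralytics terms, that training setup looks roughly like this (the dataset config name is an assumption; see the data annotation section for how the tiles were produced):

```python
from ultralytics import YOLO  # pip install ultralytics

for size in ("s", "m"):                  # YOLOv8 small and medium
    model = YOLO(f"yolov8{size}.pt")     # COCO-pretrained weights
    model.train(
        data="pelicans.yaml",            # assumed dataset config
        epochs=300,                      # initial training budget
        patience=50,                     # stop early after 50 stagnant epochs
        imgsz=512,
    )
```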

The performance was validated using mAP50. The results in Table 2 show very similar performance, but given its slightly better accuracy we picked YOLOv8s to test on the test images.

Table 2. Validation mAP50 scores of the YOLOv8 small and medium models.
Theoretically, smaller models with fewer parameters are expected to have lower accuracy than medium, large, and extra-large models. However, we found that the medium, large, and extra-large models all performed worse than the small one. We speculate that the larger architectures tend to overfit our small dataset, making them poor at generalizing to new/test data.
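Reproducing the mAP50 validation takes only a few lines in the same API (the weights path varies per training run):

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # best checkpoint of a run
metrics = model.val(data="pelicans.yaml")
print(f"mAP50:    {metrics.box.map50:.3f}")  # the metric reported in Table 2
print(f"mAP50-95: {metrics.box.map:.3f}")
```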

Team 2 - Model implementation

The model implementation team focused on developing an efficient and user-friendly workflow around YOLOv8.
Before training could start, we first: 

  • Extracted the annotations from the Photoshop PSD files.
  • Tiled the images into squares of 1024 x 1024 pixels and saved the cropped images with their corresponding labels (see the sketch after this list).
  • Created a YAML file with the training file locations, the number of classes, and the class names.
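A minimal sketch of the tiling and config steps, assuming the labels have already been extracted (directory names and class IDs are illustrative; shifting labels into tile coordinates is omitted for brevity):

```python
from pathlib import Path
from PIL import Image

TILE = 1024  # tile edge in pixels

def tile_image(path: Path, out_dir: Path) -> None:
    """Cut a large aerial photo into TILE x TILE squares."""
    img = Image.open(path)
    w, h = img.size
    out_dir.mkdir(parents=True, exist_ok=True)
    for top in range(0, h, TILE):
        for left in range(0, w, TILE):
            tile = img.crop((left, top, left + TILE, top + TILE))
            tile.save(out_dir / f"{path.stem}_{top}_{left}.png")

# Minimal dataset config in the Ultralytics YAML format
# (class names follow the four categories described below).
Path("pelicans.yaml").write_text(
    "path: datasets/pelicans\n"
    "train: images/train\n"
    "val: images/val\n"
    "names:\n"
    "  0: nesting\n"
    "  1: not_nesting\n"
    "  2: juvenile\n"
    "  3: uncertain\n"
)
```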

Subsequently, the models were trained on 31 images (21 Photoshop-annotated and 10 Roboflow-annotated images).

We created two models: a ‘General Class’ model and a ‘Multi-Class’ model.

  • ‘General Class’ detects pelicans as a single class.
  • ‘Multi-Class’ detects and counts Nesting, Not Nesting, Juvenile, and Uncertain pelicans.

Fig 5. Confusion matrices for the General Class (left) and Multi-Class (right) models.

The confusion matrices in Fig 5 show the performance of the two models we trained. For general detection we reached an impressive accuracy of 91%, and with multi-class detection we reached over 90% for each class.
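For anyone reproducing this, the confusion matrix comes for free from a validation run: Ultralytics saves a confusion-matrix plot alongside the other metrics (the weights path below varies per run):

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # multi-class weights

# Validation writes a confusion-matrix plot (like Fig 5) and the other
# metrics into the run's output directory.
metrics = model.val(data="pelicans.yaml", plots=True)
print(f"mAP50: {metrics.box.map50:.3f}")
```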

The final task for Team 2 was to implement the model in a GUI, as shown in Figure 6. Key parameters that the user can set include the detection confidence threshold, the intersection-over-union (IoU) threshold, and the maximum number of detections per image. Additionally, the user can choose to output results compatible with either Photoshop or Roboflow, and can even select or add another YOLOv8 model for predictions.
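Under the hood, these three settings map directly onto YOLOv8’s prediction arguments; a sketch with illustrative values:

```python
from ultralytics import YOLO

model = YOLO("pelicans_multiclass.pt")  # user-selected model file
results = model.predict(
    "colony_tiles/",
    conf=0.40,     # detection confidence threshold
    iou=0.50,      # intersection-over-union threshold for NMS
    max_det=3000,  # max detections per image; colonies are dense
)
for r in results:
    print(r.path, len(r.boxes), "detections")
```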

The GUI application is written in Python using the ‘ttkbootstrap’ library and packaged into a standalone executable with ‘pyinstaller’.
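A stripped-down sketch of such a window (the widget layout is illustrative, not our actual interface):

```python
import ttkbootstrap as ttk  # pip install ttkbootstrap
from ttkbootstrap.constants import X, SUCCESS

app = ttk.Window(title="Pelican Counter", themename="flatly")

# Slider for the detection confidence threshold.
conf = ttk.Scale(app, from_=0.05, to=0.95, value=0.40)
conf.pack(fill=X, padx=10, pady=5)

ttk.Button(
    app, text="Run detection", bootstyle=SUCCESS,
    command=lambda: print(f"conf={conf.get():.2f}"),  # hook model.predict here
).pack(pady=10)

app.mainloop()
```

A standalone executable can then be built with a command like ‘pyinstaller --onefile app.py’.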

Fig 6. The user interface for the YOLOv8 model pre-trained on pelican data.

Achievements and Hurdles

After ten weeks of work, the team successfully created a user-friendly GUI for pelican detection using the YOLOv8 detector. However, the detector struggles with fine-grained classification, particularly with juveniles, despite its accuracy in detecting adult pelicans.

We faced challenges with class imbalance and with YOLO's limited ability to detect clustered individuals. There is certainly room for further improvement, especially by expanding the training dataset with more high-quality labeled images, retraining the model, and fine-tuning the hyperparameters.

Using this new counting tool, SOR researchers can greatly reduce the time spent manually counting pelicans. We hope our success with an off-the-shelf YOLO model encourages others to try it on their own datasets and be as pleasantly surprised as we were!

Personal takeaways

Davide

This was my first challenge with FruitPunch AI and I really enjoyed it! It was great to meet like-minded people from around the world and work together towards a common goal. Through this challenge, I definitely had a chance to improve my soft skills as well as my technical skills. I look forward to continuing to use AI for Good!

Adriaan

AI for Pelicans was my second challenge with FruitPunch. It was a great and fun project working with everyone in this group. We started by annotating images and ended with a good end result for making predictions. There were some challenges, but those only help to improve one's technical skills 🙂.

Closing...

A big thank you to everyone who participated in this Challenge and made these amazing results possible!

Sebastian Bugariu, Achuka Simon Allan, Ian Ocholla, Kabeer Nthakur, Yastika Joshi, Adriaan Kastelein, Davide Coppola, Olga Rudakova, Ștefan Istrate, Thor Veen, Jaka Cikač

AI for Wildlife
Computer vision
Object detection
Challenge results