AI for Pelicans: Counting Great White Pelicans with Computer Vision
The results of the AI for Pelicans Challenge, which aimed to improve the monitoring and counting of great white pelicans using advanced computer vision techniques.
Imagine flying in a small airplane over the vast wetlands of the Danube Delta, on the shores of the Black Sea in Romania, looking for patches of small white dots: great white pelicans (Pelecanus onocrotalus). While flying over the colonies, researchers like Sebastian Bugariu from the Romanian Ornithological Society (SOR) take photos which are later used to count the number of breeding birds back in the office. The number of breeding pairs has grown from roughly 5,000 pairs 15 years ago to around 18,000 pairs today. Keeping good records of the breeding numbers is important but not an easy task: back at the office, it can take weeks to go through the images and manually count the pelicans. Wouldn't it be great if this process could be automated, freeing up time for other important conservation work? This is where the AI for Pelicans challenge started.
The core challenge is counting pelicans in an image. Colonies can be dense, with overlapping pelicans captured in different poses, from different angles, and under variable conditions. We divided the problem into two stages of increasing difficulty: first, detecting and counting all pelicans in an image; second, classifying each detected pelican into categories such as adults and juveniles.
This approach would provide the information needed to estimate the breeding population size and, through the number of juveniles, the breeding success. Using AI instead of humans could speed up the analysis, reduce the bias introduced when different people annotate the data, and standardize the process.
To increase our chances of obtaining a high-performance model, we split into two sub-teams. Team 1 focused on selecting, training, and testing various object detection models, while Team 2 focused on training and deploying a YOLOv8 model and developing a workflow to aid SOR's pelican counting in the future. Before splitting up, we prioritized data annotation, which was crucial to the project's success. We received a wealth of aerial images from 2009 to 2018, annotated in Photoshop using point labels. Our first task was to convert these point labels into bounding boxes usable by computer vision models.
In the initial annotations from SOR, individual pelicans were manually located and assigned to one of four categories using the ‘Count tool’ in Photoshop. However, the proprietary .psd file format is not directly accepted by open-source frameworks like PyTorch or TensorFlow. In the first week of the challenge, we found two methods (in Python, including the psd-tools library) to convert the images to a standard file format and extract the annotations into a tabular format. We then tried to convert each point annotation into a bounding box, but this was not very accurate due to variations in the angle of the photo, the posture of the bird, and the surroundings. This problem, combined with a large imbalance in label frequency, led us to re-annotate a set of images.
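To make the conversion concrete, here is a minimal sketch of the two steps under some stated assumptions: the file names are hypothetical, and the Count tool points are assumed to have already been extracted into a CSV with x, y, and class columns (the parsing of the .psd image resources is omitted). It flattens the .psd to PNG with psd-tools and turns each point into a fixed-size YOLO-format box; that fixed box size is exactly the simplification that proved inaccurate in practice.

```python
from psd_tools import PSDImage
import csv

# Flatten the proprietary .psd into a standard format (PNG).
psd = PSDImage.open("colony.psd")   # hypothetical file name
image = psd.composite()             # merge all layers into a single PIL image
image.save("colony.png")

IMG_W, IMG_H = image.size
BOX = 40  # assumed pelican size in pixels; would need tuning per flight altitude

# Turn each point label into a fixed-size box in YOLO format
# (class x_center y_center width height, all normalized to [0, 1]).
with open("points.csv") as f_in, open("colony.txt", "w") as f_out:
    for row in csv.DictReader(f_in):
        x, y, cls = float(row["x"]), float(row["y"]), int(row["class"])
        f_out.write(
            f"{cls} {x / IMG_W:.6f} {y / IMG_H:.6f} "
            f"{BOX / IMG_W:.6f} {BOX / IMG_H:.6f}\n"
        )
```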
We manually annotated 12 images using Roboflow in two rounds (V1 and V2). The annotated images were then used to train a YOLOv8 model to obtain a rough set of bounding boxes around pelicans. The model's performance was impressive, particularly in adult breeding colonies with well-spaced-out pelicans, but we had to redo the annotation for dense clusters of pelicans and for juveniles. Eventually, we obtained 21 well-annotated images for training, validation, and testing. Although 21 images is a small dataset, each image is very large (6000x4000 pixels), which makes them suitable for training machine-learning algorithms: since we trained YOLOv8 with an input size of 512x512 pixels, each image could be cut into roughly 90 tiles.
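As an illustration of the tiling step, the sketch below (hypothetical file name; overlap between tiles, which helps avoid cutting birds in half at tile borders, is left out for brevity) cuts one aerial image into 512x512 tiles:

```python
from PIL import Image

TILE = 512
img = Image.open("colony.png")  # hypothetical file name
w, h = img.size                 # e.g. 6000 x 4000

tiles = []
for top in range(0, h, TILE):
    for left in range(0, w, TILE):
        # Edge tiles are simply smaller; padding is omitted for brevity.
        box = (left, top, min(left + TILE, w), min(top + TILE, h))
        tiles.append(img.crop(box))

print(f"{len(tiles)} tiles")  # a 6000x4000 image yields a 12x8 grid = 96 tiles
```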
We divided the model-selection work into three stages. First, we created a shortlist of nine models. Then we studied each model in depth to identify the most suitable ones for the task, using complexity, resource requirements, reported results, and timeframe for completion as criteria. Table 1 summarizes all the models. Several models performed well with moderate complexity, including the YOLO family, which we focused on for the next steps. We chose YOLOv8, the latest YOLO release at the time, and started training two architectures, small (s) and medium (m), which differ in their number of parameters and network depth. The models were trained for up to 300 epochs with early stopping (a patience of 50 epochs) to minimize the risk of overfitting.
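A minimal training sketch with the ultralytics package looks roughly as follows; the dataset YAML name is hypothetical, and patience=50 implements the early stopping described above:

```python
from ultralytics import YOLO

for variant in ("yolov8s.pt", "yolov8m.pt"):  # small and medium architectures
    model = YOLO(variant)                     # COCO-pretrained starting weights
    model.train(
        data="pelicans.yaml",  # hypothetical dataset config (image paths + class names)
        imgsz=512,             # matches the 512x512 tiles
        epochs=300,            # upper bound on training length
        patience=50,           # stop early if validation stops improving for 50 epochs
    )
```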
Performance was validated using mAP@50. The results in Table 2 show very similar performance, but given its slightly better accuracy we picked YOLOv8s to run on the test images. In theory, smaller models with fewer parameters are expected to have lower accuracy than medium, large, and extra-large models. However, we found that the medium, large, and extra-large models all scored lower. We speculate that the larger architectures overfit our small dataset, making them poor at generalizing to new/test data.
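For reference, validating a trained run and reading off mAP@50 with ultralytics looks like this (the weights path is the package's default output location; adjust it to your own run directory):

```python
from ultralytics import YOLO

# Load the best checkpoint saved during training.
model = YOLO("runs/detect/train/weights/best.pt")
metrics = model.val(data="pelicans.yaml", imgsz=512)  # hypothetical dataset config
print(f"mAP@50: {metrics.box.map50:.3f}")
```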
The model implementation team focused on developing an efficient and user-friendly workflow around YOLOv8.
Before training could start, we first prepared the annotated data and tiled the images as described above. Subsequently, the model was trained on 31 images (21 Photoshop- and 10 Roboflow-annotated images).
We created two models: a ‘General Class’ model that treats every pelican as a single class, and a ‘Multi-Class’ model that also distinguishes the annotation categories. The confusion matrix in Figure 5 shows the performance of the Multi-Class model: general detection reached an impressive accuracy of 91%, and the Multi-Class model reached over 90% for each class.
The final task for Team 2 was to implement the model in a GUI, as shown in Figure 6. Key parameters the user can set include the detection confidence threshold, the intersection-over-union (IoU) threshold, and the maximum number of detections per image. Additionally, the user can choose to output results compatible with either Photoshop or Roboflow, and can even select or add another YOLOv8 model for predictions.
The GUI application is written in Python using the ‘ttkbootstrap’ library and packaged into a standalone executable with ‘pyinstaller’.
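To give a flavour of how these pieces fit together, here is a heavily simplified sketch, not the team's actual application: a ttkbootstrap window with a confidence slider and a button that runs a YOLOv8 model through the ultralytics predict API, which exposes the conf, iou, and max_det parameters mentioned above. The weights file name is hypothetical.

```python
import ttkbootstrap as ttk
from tkinter import DoubleVar, filedialog
from ultralytics import YOLO

model = YOLO("pelicans_best.pt")  # hypothetical trained weights

def run_detection():
    path = filedialog.askopenfilename(title="Select an aerial image")
    if not path:
        return
    results = model.predict(
        path,
        conf=conf_var.get(),  # detection confidence threshold from the slider
        iou=0.5,              # intersection-over-union threshold for NMS
        max_det=3000,         # colonies can hold thousands of birds per image
    )
    count_label.config(text=f"Pelicans detected: {len(results[0].boxes)}")

app = ttk.Window(title="Pelican Counter", themename="flatly")
conf_var = DoubleVar(value=0.25)
ttk.Label(app, text="Confidence threshold").pack(padx=10, pady=(10, 0))
ttk.Scale(app, from_=0.05, to=0.95, variable=conf_var).pack(padx=10, pady=5, fill="x")
ttk.Button(app, text="Run detection", command=run_detection).pack(pady=5)
count_label = ttk.Label(app, text="Pelicans detected: -")
count_label.pack(pady=(0, 10))
app.mainloop()
```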
After ten weeks of work, the team successfully created a user-friendly GUI for pelican detection using the YOLOv8 detector. However, the detector struggles with fine-grained classification, particularly with juveniles, despite its accuracy in detecting adult pelicans.
We faced challenges with class imbalance and YOLO's limited capability to detect clustered individuals. There is certainly room for further improvement, especially by expanding the training dataset with more high-quality labelled images, retraining the model, and fine-tuning the hyperparameters.
Using this new counting tool, SOR researchers can reduce the time spent manually counting pelicans. We hope our success with an off-the-shelf YOLO model encourages others to try it on their own datasets and be as pleasantly surprised as we were!
This was my first challenge with FruitPunch AI and I really enjoyed it! It was great to meet like-minded people from around the world and work together towards a common goal. Through this challenge, I definitely had a chance to improve my soft skills as well as my technical skills. I look forward to continuing to use AI for Good!
AI for Pelicans was my second challenge with FruitPunch. It was a great and fun project working with everyone in this group. We started by annotating images and ended with a good end result for making predictions. There were some challenges, but that helps to improve the technical skills 🙂.
A big thank you to everyone who participated in this Challenge and made these amazing results possible!
Sebastian Bugariu, Achuka Simon Allan, Ian Ocholla, Kabeer Nthakur, Yastika Joshi, Adriaan Kastelein, Davide Coppola, Olga Rudakova, Ștefan Istrate, Thor Veen, Jaka Cikač