Building an End-to-End Bird Detection System: A Journey from Model Development to Deployment on Google Cloud with Flask, MongoDB, and API Testing using Postman

Anubhav Elhence
Oct 29, 2023

In this article, we’ll dive into the fascinating world of bird detection, walking you through each step of the process from problem definition to final deployment. We’ll explore technologies like Machine Learning, Google Cloud Platform, Flask, MongoDB, Google Maps API, Bootstrap, and even API testing using Postman. Whether you’re a bird enthusiast looking for a smarter way to identify birds or a developer interested in deploying machine learning models, this article has something for you.

Deployment Link: http://elhence.in/
Github Link: https://github.com/anubhavelhence/birddetectionpytorch/
Youtube Video: https://youtu.be/Fqdi1XQ0r14
Postman API public collection: https://www.postman.com/lunar-crater-246529/workspace/anubhav-elhence-public-workspace/collection/23319025-7679dc3a-2359-45b7-9c09-025d895eb5e9?action=share&creator=23319025
Postman Documentation: https://documenter.getpostman.com/view/23319025/2s9YRGy9am

Problem Statement and Motivation

When thinking about animal welfare, several guiding questions arise: How can we make people more aware of the animals that share the planet with us? How can we help pets waiting in shelters find a forever home? These questions might seem distant from birdwatching at first glance, but they are closely connected.

Birdwatching, often seen as a passive hobby, can become an active means of conservation. By knowing more about birds (their species, habitats, and migratory patterns), we can contribute data that supports scientific research and conservation efforts. For instance, knowing where an endangered species is regularly seen can help conservationists focus their efforts on those areas. Many bird species are on the verge of extinction and, like shelter animals, depend on human care to survive. Could we adapt our bird identification technology to locate and monitor endangered birds? Doing so could yield vital data for their conservation. The same technology could also identify birds that are commonly kept as pets, giving prospective bird owners the information they need to make a responsible choice.

Spread your wings for Bird Detection

Birds might not be the first animal that comes to mind when talking about animal welfare, but they are a vital part of our ecosystem that needs attention and care, just like any other. By leveraging technology, we can make strides in this direction, turning passive observation into active participation.

Setting up Google Cloud Compute Engine and Requesting GPU Access

In this section, we’ll set up a Google Cloud Compute Engine instance and request GPU access to get the extra computing power a GPU provides.

  • Step 1: Create a Google Cloud Account
  • Step 2: Requesting GPU Access
  • Step 3: Launching a GPU instance
  • Step 4: Opening ports to run Flask and access the server
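Step 4 usually amounts to a VPC firewall rule. As a minimal sketch, the helper below builds the JSON body that the Compute Engine `firewalls.insert` REST method accepts for a rule allowing inbound TCP on Flask's default port 5000 and on HTTP port 80. The rule name, network tag, and port list are assumptions for illustration, not the project's actual settings, and the function only builds the body; you would submit it through the console, `gcloud`, or a Google Cloud client library.

```python
import json
from typing import Any, Dict, Sequence


def build_firewall_rule(
    name: str = "allow-flask",                      # hypothetical rule name
    ports: Sequence[int] = (5000, 80),              # Flask dev port + HTTP (assumed)
    target_tag: str = "flask-server",               # hypothetical network tag on the VM
    source_ranges: Sequence[str] = ("0.0.0.0/0",),  # anyone on the internet
) -> Dict[str, Any]:
    """Build a request body for the Compute Engine firewalls.insert method.

    The rule admits inbound TCP traffic on `ports` to every VM in the
    default network that carries the network tag `target_tag`.
    """
    return {
        "name": name,
        "network": "global/networks/default",
        "direction": "INGRESS",
        "priority": 1000,  # the default priority for firewall rules
        "allowed": [{"IPProtocol": "tcp", "ports": [str(p) for p in ports]}],
        "sourceRanges": list(source_ranges),
        "targetTags": [target_tag],
    }


if __name__ == "__main__":
    print(json.dumps(build_firewall_rule(), indent=2))
```

The VM itself must carry the same network tag for the rule to apply. Because `0.0.0.0/0` exposes the port publicly, narrowing `source_ranges` is worth considering while the app is still in development.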

Developing the ML Model

The ML model tackles a specific kind of image classification known as “fine-grained image classification.” This is tricky because the differences between categories can be very subtle. Most techniques distinguish these subtle categories by looking at specific parts of an object, but few recent methods make effective use of natural language (such as textual descriptions) to help with the classification. The research this model is based on introduces an approach that uses images and their natural language descriptions together, through a specially designed network with two branches. The authors report that this method clearly outperforms other approaches on fine-grained classification and sets new state-of-the-art results on a well-known dataset of bird images.

For reference, this explanation is based on the paper titled “Are These Birds Similar: Learning Branched Networks for Fine-grained Representations” by S. Nawaz et al., presented at the 2019 International Conference on Image and Vision Computing New Zealand.

The architecture depicted in the above diagram represents a multimodal approach that combines both image and text data for processing.

1. Image Pathway (NTS-NET):

  • Input: An image of a bird.
  • Navigator Network: This guides the model toward the informative parts of the image, effectively “zooming in” on regions that are likely to help tell similar species apart.
  • Feature Extractor: Feature extractors compute features for each of the regions of interest identified by the Navigator Network.
  • Scrutinizer Network: This combines the features of the selected regions with those of the whole image and makes the final prediction (the ‘Predict’ label in the diagram), i.e. the classification.

2. Text Pathway (BERT):

  • Embedding: The words (W1 to W5) are converted into embeddings, i.e. vector representations. Note the “MASK” token, which BERT uses during pre-training to learn to predict missing words.
  • Transformer Encoder: This part of the BERT model processes the word embeddings. Through self-attention, it relates each word to every other word in the sequence, which is why transformers handle context well.

3. Fusion and Final Classification:

  • After both the image and text data are processed by their respective pathways, their representations are combined or “fused.”
  • This fused data is then passed through an MLP (Multi-Layer Perceptron), which is a type of neural network.
  • The output is a series of representations (the blocks in the final part of the diagram), which are then passed through a classification layer (with a fully-connected layer, GELU activation, and normalization).
  • The result is a series of word predictions (W1’ to W5’), likely classifying or describing attributes of the input image and text.

In summary, this architecture takes an image and a sequence of words as input, processes them using specialized networks (NTS-NET for images and BERT for text), and then combines their features to produce a classification or description of the input data.
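To make the fusion step concrete, here is a framework-free sketch of one simple option: concatenate the image and text feature vectors, pass the result through a fully-connected layer, apply GELU, and normalize, mirroring the layer types named above. Concatenation as the fusion operator and the tiny hand-written weights are assumptions for illustration; the paper's exact fusion and the repository's implementation may differ, and in PyTorch you would use `nn.Linear`, `nn.GELU`, and `nn.LayerNorm` instead of these hand-rolled helpers.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def concat_fusion(image_feat: Vector, text_feat: Vector) -> Vector:
    """Feature-level fusion by concatenation (one of several possible choices)."""
    return list(image_feat) + list(text_feat)


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully-connected layer: y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Rescale to zero mean and (almost) unit variance; no learned scale/shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def fusion_head(image_feat: Vector, text_feat: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fuse both modalities, then apply fully-connected -> GELU -> normalization."""
    fused = concat_fusion(image_feat, text_feat)
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    hidden = [gelu(v) for v in dense(fused, weights, bias)]
    return layer_norm(hidden)


if __name__ == "__main__":
    image_feat = [1.0, -1.0]  # stand-in for NTS-Net image features
    text_feat = [0.5]         # stand-in for BERT text features
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(fusion_head(image_feat, text_feat, identity, [0.0, 0.0, 0.0]))
```

Concatenation keeps both modalities intact and lets the following layer learn how to weigh them, which is why it is a common baseline for feature-level fusion.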

Deploying the ML Model with Flask

Deployment is often challenging, but Flask makes serving a model straightforward.

View the complete repository at https://github.com/anubhavelhence/birddetectionpytorch

  • Setting up a Flask App
  • Integrating the ML Model
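A minimal sketch of both steps, assuming Flask 2.2 or later: the model is loaded once at startup, and a `/predict` route accepts the image URL as described in the API list below. `load_model` and `classify` are hypothetical placeholders (this is not the repository's code); in the real app they would load the PyTorch model and run inference on the downloaded image. The response field names `class` and `confidence` are also assumptions.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def load_model():
    """Hypothetical placeholder: load the trained PyTorch model here."""
    return object()


def classify(model, image_url: str):
    """Hypothetical placeholder: fetch the image, preprocess it, run the model.

    Returns a (class_name, confidence) pair; fixed values keep the sketch runnable.
    """
    return "unknown", 0.0


MODEL = load_model()  # load once at startup, not on every request


@app.route("/")
def home():
    # The real app renders its HTML homepage here (e.g. with render_template).
    return "Bird Detection"


@app.route("/predict", methods=["POST"])
def predict():
    # Accept the Imgur URL either as JSON ({"url": ...}) or as a form field.
    payload = request.get_json(silent=True) or request.form
    image_url = payload.get("url")
    if not image_url:
        return jsonify(error="missing 'url'"), 400
    label, confidence = classify(MODEL, image_url)
    return jsonify({"class": label, "confidence": confidence})
```

Assuming the file is saved as `app.py`, `flask --app app run --host=0.0.0.0 --port=5000` serves it on the port opened in Step 4, and Postman can then POST `{"url": "..."}` to `/predict`. Flask's built-in server is meant for development; a production setup would typically run the app under a WSGI server such as Gunicorn.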

Creating Routes and API Testing with Postman

APIs are the backbone of any web service. We’ll explore how to set up routes in Flask and test them using Postman. We have defined the following 9 API requests, as shown in the image.

Documentation Link:

https://www.postman.com/lunar-crater-246529/workspace/anubhav-elhence-public-workspace/collection/23319025-7679dc3a-2359-45b7-9c09-025d895eb5e9?action=share&creator=23319025

In this project, we don’t just use our own machine learning API for bird identification; we also integrate several public APIs to round out the user experience. We use the eBird API to fetch up-to-date data about bird species, including their geographical distribution. For uploading and hosting images, we use Imgur’s API. Finally, we use the Google Maps API to draw heatmaps of the locations where a given species is commonly found. Together, our own API and these public ones give bird enthusiasts a single place to identify a bird and learn about it.
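As a sketch of how the two external services are called, the standard-library helpers below build (but do not send) an anonymous Imgur upload and an eBird "recent notable observations" request. The endpoint paths and the `Authorization: Client-ID …` and `X-eBirdApiToken` headers follow the public Imgur v3 and eBird 2.0 APIs; the function names are hypothetical, the client ID and token are placeholders you obtain by registering with each service, and the project itself may make these calls differently.

```python
import base64
import urllib.parse
import urllib.request

IMGUR_UPLOAD_URL = "https://api.imgur.com/3/image"
EBIRD_BASE_URL = "https://api.ebird.org/v2"


def imgur_upload_request(image_bytes: bytes, client_id: str) -> urllib.request.Request:
    """Build an anonymous Imgur upload: the image goes base64-encoded in a form body."""
    body = urllib.parse.urlencode({
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "type": "base64",
    }).encode("ascii")
    return urllib.request.Request(
        IMGUR_UPLOAD_URL,
        data=body,
        headers={"Authorization": f"Client-ID {client_id}"},
        method="POST",
    )


def ebird_notable_request(region_code: str, api_token: str) -> urllib.request.Request:
    """Build a GET for recent notable observations in a region (e.g. "US")."""
    url = f"{EBIRD_BASE_URL}/data/obs/{urllib.parse.quote(region_code)}/recent/notable"
    return urllib.request.Request(url, headers={"X-eBirdApiToken": api_token})
```

To actually send either request, pass it to `urllib.request.urlopen` and decode the body with `json.load`; for Imgur, the direct image link sits inside the returned JSON.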

List of all API requests which can be used for testing

1. GET HomePage

  • Purpose: Fetch the main landing page of the Bird Detection application.
  • Endpoint: /
  • Parameters: None.
  • Response: Returns the rendered HTML for the application’s homepage.

2. POST ImgurImageUpload

  • Purpose: Upload an image to Imgur for hosting and retrieval.
  • Endpoint: https://api.imgur.com/3/image
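  • Parameters:
  • image: The image file to be uploaded.
  • Headers: Authorization: Client-ID followed by your Imgur application’s client ID.
  • Response: Returns a JSON object whose data.link field holds the direct link to the uploaded image.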

3. POST PredictSpeciesofBird

  • Purpose: Classify the uploaded image to predict the bird species.
  • Endpoint: /predict
  • Parameters:
  • url: Direct link of the uploaded image on Imgur.
  • Response: Returns the predicted bird class along with the confidence level.

4. GET metadataofSpecies

  • Purpose: Fetch detailed metadata of a particular bird species.
  • Endpoint: /get_bird_metadata?species=Acadian_Flycatcher
  • Parameters:
  • species: The name or identifier of the bird species.
  • Response: Returns metadata such as the species’ characteristics, habitat, and diet.

5. GET externalInformationaboutSpecies

  • Purpose: Fetch external information about a bird species from eBird or other sources.
  • Endpoint: /capture?speciescode=acafly
  • Parameters:
  • speciescode: The unique code or identifier for the bird species from eBird.
  • Response: Returns information about the species from eBird, such as recent sightings.

6. GET AllSpeciesList

  • Purpose: Fetch a list of all bird species present in the database.
  • Endpoint: /bird_classes
  • Parameters: None.
  • Response: Returns a list of bird species names and identifiers.

7. POST AddNewSpeciestotheList

  • Purpose: Add a new bird species to the database.
  • Endpoint: /new_bird
  • Parameters:
  • species_data: Data related to the bird species like name, characteristics, images, etc.
  • Response: Confirmation of the addition of the new species to the database.

8. GET RecentNotableObservationsinspecificArea

  • Purpose: Get recent notable bird sightings in a specific area.
  • Endpoint: https://api.ebird.org/v2/data/obs/US/recent/notable
  • Parameters:
  • location: The region code, passed as part of the path (US in this example), for which you want recent notable observations.
  • Headers: X-eBirdApiToken: your eBird API key.
  • Response: Returns a list of recent notable bird sightings in the specified region.

9. GET NearestObservationsofaSpecies
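The eBird observation endpoints (such as request 8) return lists of sighting records that include `lat` and `lng` fields and, when the observer reported a count, a `howMany` field. A minimal sketch of turning such records into weighted points for the Google Maps heatmap might look like this; the function name, the output shape, and the default weight of 1 are assumptions, and the sample records are made up.

```python
import json
from typing import Any, Dict, Iterable, List


def to_heatmap_points(observations: Iterable[Dict[str, Any]]) -> List[Dict[str, float]]:
    """Convert eBird-style observation records into weighted heatmap points.

    Records without coordinates are skipped; a missing `howMany` count
    (e.g. presence-only reports) falls back to a weight of 1.
    """
    points = []
    for obs in observations:
        lat, lng = obs.get("lat"), obs.get("lng")
        if lat is None or lng is None:
            continue
        weight = float(obs.get("howMany") or 1)
        points.append({"lat": float(lat), "lng": float(lng), "weight": weight})
    return points


if __name__ == "__main__":
    # Illustrative records shaped like eBird observations (values made up).
    sample = [
        {"speciesCode": "acafly", "lat": 38.9, "lng": -77.0, "howMany": 2},
        {"speciesCode": "acafly", "lat": 39.1, "lng": -76.8},  # count not reported
    ]
    # A JSON string like this can be handed to the page that draws the heatmap.
    print(json.dumps(to_heatmap_points(sample)))
```

On the frontend, each point would become a weighted location for the map's heatmap layer, so areas with more (or larger) sightings appear hotter.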

Installing and Using MongoDB

Here’s how we are using MongoDB to store our data.

The MongoDB server runs on the same machine as our Flask app, so connecting Flask to MongoDB takes only the following lines of code.
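As a minimal sketch (not the project's original snippet), a connection with PyMongo, MongoDB's official Python driver, could look like this. The URI assumes MongoDB's default port 27017 on localhost. The database, collection, and field names (`bird_db`, `birds`, `species`) are hypothetical, chosen to back the metadata, species-list, and new-species requests listed earlier.

```python
from typing import Any, Dict, List, Optional

MONGO_URI = "mongodb://localhost:27017/"  # MongoDB on the same VM (default port assumed)


def get_collection(uri: str = MONGO_URI, db_name: str = "bird_db", name: str = "birds"):
    """Return a PyMongo collection; the database and collection names are hypothetical."""
    # Imported here so the helpers below also work with any collection-like object.
    from pymongo import MongoClient
    client = MongoClient(uri)  # connects lazily, on the first operation
    return client[db_name][name]


def get_bird_metadata(collection, species: str) -> Optional[Dict[str, Any]]:
    # Leave out Mongo's ObjectId so the document can go straight into jsonify().
    return collection.find_one({"species": species}, {"_id": 0})


def add_bird(collection, species_data: Dict[str, Any]) -> str:
    # insert_one() adds an "_id" key to the dict it receives, so insert a copy.
    result = collection.insert_one(dict(species_data))
    return str(result.inserted_id)


def list_species(collection) -> List[str]:
    return sorted(collection.distinct("species"))
```

In the Flask app, calling `get_collection()` once at startup and reusing the collection in every route handler is usually enough, since `MongoClient` maintains its own connection pool.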

Frontend and External APIs

Design is not just about looks; it’s also about how it works. We’ll wrap up by discussing the aesthetic choices we made for the frontend.

After an image is uploaded, the following data appears:

Thank you for making it to the end of this article. We’ve covered a wide array of technologies and platforms, all aimed at solving a simple yet exciting problem — bird identification. Happy birdwatching and coding!

