
TinyML-based Autonomous UAV Landing Algorithm

Date: Nov 15, 2021


This solution is a computer vision algorithm using tinyML that enables UAV autopilots to land on a target surface while avoiding obstacles.

For the past few years, I have been developing tools at StreamLogic designed to make it easier to build and deploy computer vision at the edge. 

Figure: computer vision at the edge

At the center of the platform are small, low-power and affordable camera modules from HiMax Imaging. I'm always looking for new applications to target with this platform.

Drones are a good use case for tinyML; they require a lot of sensor data and are sensitive to size, weight and power.

So I bought one: the QAV250 kit from Holybro.


The drone didn't come pre-assembled, and I was still a novice, so months passed before I had all the right parts and got it off the ground. But let's jump to the end of the story. Once I started actually flying the drone, I realized how inaccurate GPS positioning is. When using an autopilot for autonomous missions, you may find your drone drifting away from the planned flight path. This can be very dangerous in flight and when landing. That's when I came up with the idea for this project.

The goal was very simple: to land the drone safely using the autopilot. This has two parts: 1) landing on the intended surface (in my case the sidewalk) and 2) avoiding any unexpected obstacles on the surface. To achieve this, I came up with a computer vision algorithm to detect when and in which direction the drone should move when landing.

Algorithm design

My solution is very simple at a high level. A downward-facing camera is mounted on the bottom of the drone. As the drone slowly descends to land, it captures images for analysis. First, the square meter directly below the drone is identified as the landing site. Second, nearby sites in eight directions (north, northeast, east, southeast, south, southwest, west, and northwest) are also rated. Finally, if any of the nearby sites is rated significantly better than the current site, the drone is repositioned in that direction.
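The repositioning decision can be sketched in a few lines of Python. This is a minimal sketch of the idea described above; the function name and the `margin` threshold are my own illustration, not part of the project's code:

```python
DIRECTIONS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")

def choose_move(current_score, candidate_scores, margin=0.1):
    """Pick the direction whose candidate site scores significantly
    better than the current site, or None to keep descending in place.

    `candidate_scores` maps each of the eight compass directions to
    that site's score.  The (assumed) `margin` keeps small score
    differences from causing jitter during the descent.
    """
    best = max(candidate_scores, key=candidate_scores.get)
    if candidate_scores[best] > current_score + margin:
        return best
    return None
```

Requiring a candidate to beat the current site by a clear margin is one way to make "significantly better" concrete.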

This is an example image taken from my drone.

Figure: example image taken from my drone

The black square in the middle is the current landing site if we stay on course. The marked squares around it are the candidate sites used to evaluate a possible change of direction. Obviously, if we want to land on the sidewalk, we should move southeast in this case.

Well, that's the high-level view, but you may still be wondering 1) how do we know how large a meter appears in the image, and 2) how do we evaluate a site? The first is simple: the drone already has an altitude sensor, so we can estimate its height. Given the altitude, a few camera specifications and some algebra, computing the pixels per meter (PPM) of the current image is straightforward. In this case, the calculation is PPM = 168 / H.
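As a concrete sketch of that calculation, using the article's constant 168 (which folds the camera's focal length and image geometry into one number; the helper name is mine):

```python
def pixels_per_meter(altitude_m):
    """Pixels spanned by one meter of ground directly below the drone.

    The constant 168 comes from the article's camera geometry; one
    square meter below the drone is then a square of roughly
    pixels_per_meter(H) pixels on a side.
    """
    return 168.0 / altitude_m
```

So at 4 m altitude a meter spans 42 pixels, and the candidate squares shrink as the drone climbs.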

For the second problem, this is where computer vision comes in handy. We need to be able to crop the image arbitrarily and obtain a quantitative score for how likely the cropped region is to be unobstructed road. The actual value is not so important here, as long as we can compare two scores to determine which is better. One could handcraft an algorithm, but I chose to train a convolutional neural network (CNN) to generate the scores.

Let's look at the algorithm's output for the image above.


The graph shows the scores of the current landing site (labeled C) and of all 8 candidate sites, one in each direction. As expected, southeast is the clear winner.

Machine Learning

As mentioned above, I chose to use a CNN to generate quantitative scores for the candidate landing sites. Specifically, a fully convolutional network (FCN). They differ from other CNNs in that they do not end with a dense or global pooling layer. The activation map of the last convolutional layer is the network output. As you will see, this is perfect for this application.

There are different ways to train an FCN, but I chose to first build a classifier CNN and then convert it to an FCN. Let's first see how I built the classifier.

I wanted to create a classifier that labels small image patches as good or bad landing surface. But first, I needed data. Of course, I spent a lot of time searching the Internet for a suitable public dataset, but in the end I collected my own with an image recorder.

Figure: the image recorder

The image recorder captures images every second and writes them to an SD card. This allowed me to collect images from the drone during landing, as follows.

Figure: images collected from the drone during landing

With many images on hand, I created patch datasets in three different categories.

  • Ground

  • Road

  • Unknown

I initially collected about 2,000 patches. I then spent a lot of time trying to train a shallow network (2 convolutional layers), but without much success. I tried many data augmentations and network structures to no avail. The network would always produce random results or predict a constant class on the validation set (sigh).

There was nothing for it but to get more data. So, I collected about 1,800 more patches from my images. Bingo: with about 4,800 patches, training went well, and after some experimentation my network reached 88% accuracy. Here is the final network.

Figure: the final network

This model has only 1,033 parameters!

Fully Convolutional Network

With the ground patch classifier in place, it can be converted to an FCN that applies to the entire image. Looking at the network above, you can see that removing the GlobalAveragePooling2D layer leaves a network containing only convolutional layers. The output of the FCN is exactly the same as applying the patch classifier to every patch of the image in sliding-window fashion: the activation at each position in the output map is the classifier result for the patch at the corresponding position in the input image.
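This sliding-window equivalence is easy to verify with a toy example. The code below is plain Python with a single hand-picked filter standing in for the real network, purely to illustrate the identity:

```python
def conv2d_valid(img, k):
    """Naive 'valid' 2-D cross-correlation over nested lists."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def patch_score(patch, k):
    """The toy 'classifier': one dot product yielding one score."""
    return sum(p * w for row_p, row_w in zip(patch, k)
               for p, w in zip(row_p, row_w))

# Score every 3x3 patch in sliding-window fashion...
img = [[(i * 5 + j) % 7 for j in range(5)] for i in range(5)]
k = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
windowed = [[patch_score([row[j:j + 3] for row in img[i:i + 3]], k)
             for j in range(3)] for i in range(3)]

# ...and it matches a single convolution over the whole image.
assert conv2d_valid(img, k) == windowed
```

The FCN simply does this for every filter in every layer at once, which is far cheaper than cropping and classifying each patch separately.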

Does this really work? Let's take a look. Here is the rendering of the example image and the corresponding output from the FCN.

Figure: the example image and the corresponding FCN output

The image on the right shows the FCN output for the road class. The activations inside each candidate site's region are summed to get that site's score.
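The aggregation step can be sketched like this. This is a hypothetical helper (in the real pipeline the region position and size would come from the PPM calculation), shown only to make the summing concrete:

```python
def site_score(fcn_map, top, left, size):
    """Sum the road-class activations inside one candidate site's
    square region of the FCN output map.

    `fcn_map` is the 2-D road-class activation map (nested lists);
    `top`/`left`/`size` locate the site's region within it.
    """
    return sum(fcn_map[r][c]
               for r in range(top, top + size)
               for c in range(left, left + size))
```

Because only relative comparisons matter, a raw sum is enough; no normalization is needed before picking the best site.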

Here is another way to present the same data.

Figure: the same data presented as an overlay

In this image, I have overlaid red squares on the original image, with each square's transparency tied to the road-class output (i.e., the more likely a patch is road, the more transparent the square). Looking at the candidate landing sites: the redder the square, the lower the score.

All the details for creating the FCN and transferring the weights from the trained classifier to it are in the evaluation.ipynb Python notebook in the code repository.


I quantized the network to 8-bit weights and 8-bit activations for better runtime performance. You could follow the standard procedure described in the TensorFlow documentation, but I used my own method with a simpler quantization model. The procedure is detailed in the code repository's README.
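For illustration, a minimal symmetric 8-bit scheme looks like the following. This is my sketch of the general idea, not the repository's exact method:

```python
def quantize_int8(weights):
    """Symmetric per-tensor 8-bit quantization: map floats into
    [-127, 127] with a single scale factor (a simpler model than
    TensorFlow's full zero-point scheme)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]
```

The round trip loses a little precision per weight, which is why it is worth eyeballing the quantized network's output against the original, as below.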

The following is an example of the output of the quantized network.

Figure: output of the quantized network

This is almost indistinguishable from the output of the original network: 

Figure: output of the original network

The hardware build

I already had a hardware stack, a Vision FPGA SoM plus a microcontroller, from the battery-powered Image Logger project. Unfortunately, that microcontroller has too little RAM: adding up the input/output activation maps for each layer of the FCN, I needed 48KB of buffer space. Fortunately, there was another Feather board that fit the bill.

Not only is the Feather M4 Express faster, it has up to 192KB of RAM! I didn't need an SD card for this project, so the build is just the Vision FPGA SoM board stacked on top of the Feather M4 Express.

Figure: the Vision FPGA SoM stacked on the Feather M4 Express

Small, light, and battery powered!


In the block diagram you can see that the FPGA is used as a bridge between the camera module and the microcontroller.

Figure: block diagram

The FPGA passes the captured image to the microcontroller in the image sensor's raw format (a Bayer-filtered image). A pre-built FPGA programming file is provided in the code repository.

This leaves the microcontroller with the following tasks:

  • Converting the image from Bayer to RGB format

  • Downsampling the image to 80x80

  • Performing FCN inference

  • Aggregating target site regions

The source code for all these tasks is contained in the sketch folder of the code repository. I will make some comments on the code here.

The full raw image is over 100KB, and although this MCU has 192KB of RAM, I didn't want to waste space unnecessarily. The downsampled image needs only 20KB. You will see in the loop that reads the image from the FPGA that Bayer conversion and downsampling are performed on read, one block at a time, so only enough space for the downsampled image is ever needed.
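The on-the-fly conversion can be sketched in Python. This is a toy with a 2x downsampling factor and an assumed RGGB Bayer layout (the real sketch reads blocks from the FPGA and downsamples to 80x80, and the actual sensor pattern may differ):

```python
def debayer_downsample(raw):
    """Collapse each 2x2 Bayer cell (RGGB layout assumed) into one
    RGB pixel, halving each dimension.  Processing one cell at a
    time mirrors the sketch's read loop, which converts on the fly
    and never buffers the full raw frame."""
    h, w = len(raw), len(raw[0])
    out = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            r = raw[i][j]                              # R sample
            g = (raw[i][j + 1] + raw[i + 1][j]) / 2.0  # average the two G samples
            b = raw[i + 1][j + 1]                      # B sample
            row.append((r, g, b))
        out.append(row)
    return out
```

Since each 2x2 cell is consumed as soon as it is read, peak memory stays at the size of the downsampled output rather than the raw frame.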

This project does not use a CNN framework; the network is simple enough that I implemented my own convolution and activation functions. These are defined in the gndnet.cpp file. If you are interested in learning how convolution works, they are written in a very straightforward way. I wasn't too worried about performance, but they could be replaced with functions from the ARM neural network library for better speed.


The integration of the landing algorithm with the UAV's flight controller is beyond the scope of this proof of concept. The goal was to design and train the algorithm, prove its effectiveness and obtain some performance data.

Qualitatively, the algorithm looks good. It has been evaluated on test images both from sites used during training and from sites that were not, and the results are very encouraging. Of course, for production one would want to train on data from many more sites.

In terms of performance, the entire process from image capture to score calculation takes about 525 ms, i.e., slightly less than 2 FPS. That may be workable, since the drone descends at a fairly slow rate. It can also be improved easily: moving the Bayer conversion and downsampling into the FPGA, for example, would cut the time to a little over 200 ms, which is more than 4 FPS and definitely enough.
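A quick sanity check of those throughput numbers (simple arithmetic, shown only to make the latency-to-FPS conversion explicit):

```python
def fps(latency_ms):
    """End-to-end frames per second for a given per-frame latency."""
    return 1000.0 / latency_ms

# 525 ms per frame is just under 2 FPS; 200 ms is a clean 5 FPS.
```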

