AlexNet


AlexNet was designed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton (Krizhevsky's advisor), and it won the 2012 ImageNet competition. In the years that followed, more and deeper neural networks were proposed, such as VGG and GoogLeNet. AlexNet's accuracy was a substantial improvement over traditional machine learning classification algorithms.


Model introduction

AlexNet brought together several relatively new techniques and was the first to successfully apply tricks such as ReLU, Dropout, and LRN (local response normalization) in a CNN. AlexNet also used GPUs to accelerate computation.

AlexNet carried forward LeNet's ideas and applied the basic principles of CNNs to a much deeper and wider network. The main new techniques used by AlexNet are as follows:

(1) Successfully used ReLU as the activation function of a CNN, verified that it outperforms Sigmoid in deeper networks, and largely resolved the vanishing gradient problem that Sigmoid suffers from in deeper networks. Although the ReLU activation function had been proposed long before, it was not widely adopted until AlexNet.

(2) Dropout is used to randomly ignore a subset of neurons during training to avoid overfitting. Although Dropout is described in a separate paper, AlexNet put it into practical use and confirmed its effect in practice. In AlexNet, Dropout is applied in the first two of the three fully connected layers at the end of the network.

(3) Uses overlapping max pooling in the CNN. Previously, average pooling was common in CNNs; AlexNet uses max pooling throughout to avoid the blurring effect of average pooling. AlexNet also makes the stride smaller than the pooling kernel size, so that neighboring pooling windows overlap, which enriches the features.

(4) The LRN layer is proposed to create a competition mechanism among the activities of local neurons: relatively large responses become relatively larger, while neurons with smaller responses are suppressed, which improves the generalization ability of the model.

(5) Uses CUDA to accelerate the training of deep convolutional networks, exploiting the GPU's parallel computing power to handle the large number of matrix operations in neural network training. AlexNet was trained on two GTX 580 GPUs. A single GTX 580 has only 3GB of video memory, which limits the maximum size of the trainable network. The authors therefore split AlexNet across two GPUs, storing the parameters of half of the neurons in the memory of each GPU. Because the GPUs can read and write each other's video memory directly, without going through host memory, using multiple GPUs at the same time is efficient. In addition, AlexNet is designed so that the GPUs communicate only at certain layers of the network, which limits the performance cost of communication.

(6) Data augmentation: 224*224 regions (and their horizontally flipped mirrors) are randomly cropped from the original 256*256 image, which increases the amount of data by a factor of 2*(256-224)^2=2048. Without data augmentation, a CNN with this many parameters would overfit on the original data alone. With data augmentation, overfitting is greatly reduced and generalization improves. At prediction time, five crops are taken from the image (the four corners and the center) and each is also flipped horizontally, giving 10 images in total; the network predicts on all 10 and the results are averaged. The AlexNet paper also describes performing PCA on the RGB values of the images and perturbing the principal components with Gaussian noise of standard deviation 0.1; this trick reduces the error rate by about another 1%.
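
The crop arithmetic and the ten-view prediction procedure in (6) can be illustrated with a short sketch. It is only an illustration, not the original implementation: the helper names `count_training_views`, `ten_crop_boxes`, and `average_predictions` are hypothetical, and the count simply follows the formula given in the text.

```python
def count_training_views(image_size=256, crop_size=224):
    """Number of (crop offset, flip) combinations, per the formula in the text."""
    offsets = image_size - crop_size  # 32 offsets horizontally and vertically
    return 2 * offsets ** 2           # x2 for the horizontal flip


def ten_crop_boxes(image_size=256, crop_size=224):
    """Return (left, top, flipped) for the 10 prediction-time views:
    four corners plus the center, each with and without a horizontal flip."""
    far = image_size - crop_size  # offset of the right / bottom crops
    mid = far // 2                # offset of the center crop
    positions = [(0, 0), (far, 0), (0, far), (far, far), (mid, mid)]
    return [(left, top, flipped)
            for flipped in (False, True)
            for (left, top) in positions]


def average_predictions(predictions):
    """Average a list of per-view class-probability lists element-wise."""
    n = len(predictions)
    return [sum(column) / n for column in zip(*predictions)]


print(count_training_views())   # matches 2*(256-224)^2 from the text
print(len(ten_crop_boxes()))    # number of prediction-time views
```

In practice each box would be used to slice the image array before it is passed to the network; the averaging step is the same regardless of how the crops are produced.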

AlexNet features

Uses the ReLU activation function

ReLU function: f(x) = max(0, x)


Deep convolutional networks based on ReLU train several times faster than those based on tanh or sigmoid.
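
To make the comparison concrete, the sketch below defines ReLU, sigmoid, and their derivatives in plain Python (the function names are illustrative). The sigmoid derivative never exceeds 0.25 and shrinks toward zero for large inputs, while the ReLU derivative stays at 1 for any positive input, which is one commonly cited reason for the faster training.

```python
import math


def relu(x):
    """ReLU: pass positive inputs through, clamp the rest to zero."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU (taken as 0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of sigmoid: s * (1 - s), at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Compare the two derivatives at a few sample inputs.
for x in (-5.0, 0.5, 5.0):
    print(f"x={x:+.1f}  relu'={relu_grad(x):.1f}  sigmoid'={sigmoid_grad(x):.4f}")
```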


Unlike tanh and sigmoid, ReLU does not confine its output to a bounded range, so a normalization step is generally applied after ReLU. LRN is the method proposed for this purpose. It draws on a concept from neuroscience called "lateral inhibition", which refers to an active neuron suppressing its neighboring neurons.
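
A minimal sketch of this normalization across channels follows. The function name `local_response_norm` is hypothetical, and the defaults (n=5, k=2, alpha=1e-4, beta=0.75) are the hyperparameters reported in the AlexNet paper rather than values stated in this article. Each activation is divided by a term that grows with the squared activations of its neighboring channels at the same spatial position.

```python
def local_response_norm(activations, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize one spatial position across channels.

    activations: list of floats, one per channel, all at the same (x, y).
    Each value is divided by (k + alpha * sum of squares over up to n
    neighbouring channels) ** beta.
    """
    num_channels = len(activations)
    half = n // 2
    normalized = []
    for i, a in enumerate(activations):
        lo = max(0, i - half)                 # clamp the window at the edges
        hi = min(num_channels - 1, i + half)
        sum_sq = sum(activations[j] ** 2 for j in range(lo, hi + 1))
        normalized.append(a / (k + alpha * sum_sq) ** beta)
    return normalized


# A strong neighbour suppresses a weak activation more than a silent one does.
print(local_response_norm([1.0, 100.0])[0])
print(local_response_norm([1.0, 0.0])[0])
```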


Dropout is another frequently mentioned technique that effectively prevents neural networks from overfitting. Whereas general linear models rely on regularization to prevent overfitting, Dropout achieves it by modifying the structure of the neural network itself. For a given layer, some neurons are randomly removed with a defined probability while the numbers of input-layer and output-layer neurons stay unchanged, and the parameters are then updated according to the network's usual learning method. In the next iteration, a new random set of neurons is removed, and this repeats until the end of training.
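
The procedure can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the original implementation: the names `dropout_train` and `dropout_test` are hypothetical, the drop probability of 0.5 is the value used in the AlexNet paper rather than one given in this article, and the prediction-time scaling follows the original Dropout formulation.

```python
import random


def dropout_train(outputs, drop_prob=0.5, rng=random):
    """Training pass: zero each neuron's output independently with
    probability drop_prob. A fresh random mask is drawn on every call."""
    return [0.0 if rng.random() < drop_prob else value for value in outputs]


def dropout_test(outputs, drop_prob=0.5):
    """Prediction pass: keep every neuron, but scale outputs by the keep
    probability so their expected magnitude matches training."""
    keep_prob = 1.0 - drop_prob
    return [value * keep_prob for value in outputs]


layer_outputs = [0.2, 1.5, 0.7, 3.0]
print(dropout_train(layer_outputs))  # different neurons dropped on each call
print(dropout_test(layer_outputs))
```

Many current frameworks instead use "inverted" dropout, which scales the surviving outputs by 1 / (1 - drop_prob) during training and leaves them untouched at prediction time; the expected values are the same.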

