Abstract: Counting the people in a specific scene with cameras or other devices is of great significance in the field of intelligent security. Because of large scale variation, cluttered backgrounds, and severe occlusion, traditional methods cannot achieve high precision. This paper proposes a head detection method based on an improved Faster R-CNN to count people accurately. In this model, ResNet-101 serves as the feature extraction network, and a multi-scale feature fusion module fuses the extracted features and performs hierarchical detection so that people at different scales can be detected. In addition, designing the anchor sizes and replacing the RoI Pooling layer with RoI Align further improves detection. Experiments show that the method achieves better results on the Brainwash and HollywoodHeads datasets, with accuracy reaching 95.3% and 89.1%, respectively.
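To illustrate how head detections can be turned into a people count, the following is a minimal sketch and not the paper's implementation. It assumes the detector outputs axis-aligned boxes with confidence scores, and it applies a score threshold followed by greedy non-maximum suppression, as is common for Faster R-CNN style detectors. The function names `iou` and `count_heads` and both threshold values are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in pixels; this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_heads(
    boxes: List[Box],
    scores: List[float],
    score_thresh: float = 0.5,  # hypothetical value, not from the paper
    iou_thresh: float = 0.5,    # hypothetical value, not from the paper
) -> int:
    """Count heads: drop low-score boxes, then apply greedy NMS."""
    # Indices of confident detections, highest score first.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in kept):
            kept.append(i)
    return len(kept)


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    demo_scores = [0.9, 0.8, 0.95]
    print(count_heads(demo_boxes, demo_scores))
```

In this sketch the count is simply the number of boxes that survive suppression, so the two thresholds directly trade missed heads against duplicate counts in crowded or occluded scenes.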