Abstract
Optical neurophysiology experiments on freely swimming zebrafish require quantifying the zebrafish head in order to determine the exact lighting positions. To quantify the behavior of the zebrafish head efficiently with limited resources, we propose a real-time, multi-stage architecture based on convolutional neural networks for estimating the pose of the zebrafish head on CPUs. Each stage is implemented with a small neural network. In the first stage, a lightweight object detector named Micro-YOLO detects a coarse region containing the zebrafish head. In the second stage, a tiny bounding box refinement network, RegressNet, produces a high-quality bounding box around the head. In the final stage, a small pose estimation network named tiny-hourglass detects keypoints on the zebrafish head. The experimental results show that Micro-YOLO combined with RegressNet predicts the zebrafish head region both more accurately and much faster than Faster R-CNN, a representative two-stage detector. Compared with DeepLabCut, a state-of-the-art method for estimating the poses of user-defined body parts, our multi-stage architecture achieves higher accuracy and runs 19x faster on CPUs.
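The data flow of the three stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three stage functions are hypothetical stubs standing in for Micro-YOLO, the bounding box refinement network, and tiny-hourglass, and the box format `(x_min, y_min, x_max, y_max)` and the convention that keypoints are returned in crop coordinates are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]


@dataclass
class HeadPosePipeline:
    """Chains three small models: coarse detection, box refinement, keypoints."""
    detect: Callable[[Any], Box]                 # stage 1: coarse head region
    refine: Callable[[Any, Box], Box]            # stage 2: tighter bounding box
    keypoints: Callable[[Any, Box], List[Point]] # stage 3: keypoints in crop coordinates

    def __call__(self, frame: Any) -> Tuple[Box, List[Point]]:
        coarse = self.detect(frame)
        tight = self.refine(frame, coarse)
        local = self.keypoints(frame, tight)
        # Map keypoints from crop coordinates back to full-frame coordinates.
        x0, y0 = tight[0], tight[1]
        return tight, [(x + x0, y + y0) for x, y in local]


# Hypothetical stubs; real stages would run the trained networks.
def stub_detect(frame: Any) -> Box:
    return (10.0, 20.0, 110.0, 120.0)


def stub_refine(frame: Any, box: Box) -> Box:
    # Pretend the refinement shrinks the coarse box by 5 pixels on each side.
    return (box[0] + 5.0, box[1] + 5.0, box[2] - 5.0, box[3] - 5.0)


def stub_keypoints(frame: Any, box: Box) -> List[Point]:
    # Pretend the only keypoint sits at the center of the refined crop.
    return [((box[2] - box[0]) / 2.0, (box[3] - box[1]) / 2.0)]


if __name__ == "__main__":
    pipeline = HeadPosePipeline(stub_detect, stub_refine, stub_keypoints)
    box, points = pipeline(None)  # the stubs ignore the frame
    print(box, points)
```

Keeping each stage behind a plain callable means any one network can be swapped or timed independently of the others, which matches the abstract's description of one small network per stage.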