To address the limitation that deep learning feature extraction methods fail to comprehensively extract and effectively integrate emotional features from speech, this paper proposes a novel speech emotion recognition model that integrates a complementary feature learning framework with an attention feature fusion module. The complementary feature learning framework consists of two independent representation extraction branches and an interactive complementary representation extraction branch, covering both the independent and the complementary representations of emotional features. To further improve performance, the attention feature fusion module assigns weights to the different representations according to their contribution to emotion classification, allowing the model to focus on the features most beneficial for emotion recognition. Experiments on two public emotion databases, Emo-DB and IEMOCAP, validate the robustness and effectiveness of the proposed model.
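To make the described architecture concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the choice of a convolutional branch and a recurrent branch for the two independent extractors, cross-attention as the interactive complementary mechanism, a softmax-scored fusion, and all dimensions and class counts are illustrative assumptions.

```python
# A minimal sketch of the described model, under assumed branch types and
# dimensions; the paper's exact mechanisms may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ComplementaryFeatureModel(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=128, num_classes=4):
        super().__init__()
        # Independent branch A (assumed): convolutional encoder over frames.
        self.branch_a = nn.Sequential(
            nn.Conv1d(feat_dim, hidden_dim, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Independent branch B (assumed): bidirectional GRU for temporal dynamics.
        self.branch_b = nn.GRU(feat_dim, hidden_dim // 2, batch_first=True,
                               bidirectional=True)
        # Interactive complementary branch: cross-attention between the two
        # independent representations (one illustrative way to model interaction).
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads=4,
                                                batch_first=True)
        # Attention feature fusion: score each representation's contribution
        # to classification and combine with softmax-normalized weights.
        self.score = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                                  # x: (batch, time, feat_dim)
        a = self.branch_a(x.transpose(1, 2)).squeeze(-1)   # (batch, hidden)
        b_seq, _ = self.branch_b(x)                        # (batch, time, hidden)
        b = b_seq.mean(dim=1)                              # (batch, hidden)
        # Complementary representation: branch A attends to branch B's sequence.
        c, _ = self.cross_attn(a.unsqueeze(1), b_seq, b_seq)
        c = c.squeeze(1)                                   # (batch, hidden)
        # Stack the three representations and weight them by learned scores.
        reps = torch.stack([a, b, c], dim=1)               # (batch, 3, hidden)
        weights = F.softmax(self.score(reps), dim=1)       # (batch, 3, 1)
        fused = (weights * reps).sum(dim=1)                # (batch, hidden)
        return self.classifier(fused)


# Usage: a batch of 8 utterances, 200 frames of 40-dim features each.
model = ComplementaryFeatureModel()
logits = model(torch.randn(8, 200, 40))
print(logits.shape)  # torch.Size([8, 4])
```

The learned fusion weights let the classifier lean on whichever of the three representations is most discriminative for a given utterance, rather than concatenating them with fixed, equal influence.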