With the rapid development of Internet technology, multimedia data of different modalities have grown exponentially, and single-modal retrieval methods such as image retrieval can no longer satisfy users' needs. Cross-modal retrieval has therefore become increasingly important in the field of information retrieval. For this task, we propose a cross-modal retrieval method based on a double-branch network structure that incorporates an attention mechanism over sentence dependency phrases. We apply a CNN model to extract image features, obtain the dependency segments of the text through syntactic structure analysis, and design a double-branch network model that embeds an attention mechanism to learn a weight distribution over the dependency segments, so that the text representation focuses on the features of key sentence segments. Experimental results show that the proposed method achieves higher retrieval accuracy than competing methods, verifying the effectiveness of the algorithm.
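The core idea of weighting dependency segments can be illustrated with a minimal sketch. The following is not the paper's implementation: the segment embeddings, the query (context) vector, and the plain dot-product scoring are all illustrative assumptions; the paper learns these inside a double-branch network, whereas here they are fixed NumPy arrays used only to show how softmax attention redistributes weight toward key segments.

```python
import numpy as np

def attention_pool(segment_feats, query):
    """Softmax attention over dependency-segment embeddings.

    segment_feats: (num_segments, dim) array, one row per dependency segment.
    query: (dim,) context vector (learned in the real model; fixed here).
    Returns the attention weights and the weighted text representation.
    """
    scores = segment_feats @ query            # one relevance score per segment
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    pooled = weights @ segment_feats          # attention-weighted text feature
    return weights, pooled

# Hypothetical example: three 2-d segment embeddings and a query vector.
segs = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]])
q = np.array([2.0, 0.0])
w, pooled = attention_pool(segs, q)
```

Segments aligned with the query receive higher weights, so the pooled feature is dominated by the key segments rather than averaging all segments uniformly.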