Abstract: At present, mainstream multimodal rumor detection models focus mainly on how to extract and concatenate the features of each modality, while the local feature relationships within each modality and the information interaction within and between modalities are often ignored, which limits rumor detection performance to a certain extent. To address this issue, we propose a multimodal rumor detection method based on metric learning. Considering the influence of local feature relationships within each modality on its overall representation, we employ syntactic analysis and an attention mechanism to explore the local feature relationships of text and images, respectively. In addition, metric learning is applied to rumor detection, where triplet learning and contrastive learning are used to capture the associated information within and between modalities. Performance experiments on publicly available Twitter and Weibo datasets yield accuracy rates of 92.8% and 85.2%, respectively. These results indicate that incorporating the local feature relationships within each modality and the interaction between modalities into the rumor detection model can further improve the accuracy of rumor detection.
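For concreteness, the two metric-learning objectives named above are commonly instantiated as a margin-based triplet loss and an InfoNCE-style contrastive loss; the formulations below are a minimal sketch of these standard forms, not necessarily the exact objectives used in this paper (the embedding function $f$, distance $d$, margin $\alpha$, similarity $s$, and temperature $\tau$ are conventional notation introduced here for illustration).

\[
\mathcal{L}_{\mathrm{triplet}} = \max\bigl(0,\; d(f(a), f(p)) - d(f(a), f(n)) + \alpha\bigr)
\]

\[
\mathcal{L}_{\mathrm{contrast}} = -\log \frac{\exp\bigl(s(t_i, v_i)/\tau\bigr)}{\sum_{j=1}^{N} \exp\bigl(s(t_i, v_j)/\tau\bigr)}
\]

Here $a$, $p$, and $n$ denote an anchor sample, a positive sample of the same class, and a negative sample of a different class, while $(t_i, v_i)$ denotes a matched text-image pair contrasted against the $N$ pairs in a batch.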