An increasing number of people convey their opinions through multiple modalities. For the purpose of opinion mining, sentiment classification based on multimodal data has become a major focus. In this work, we propose a novel Multimodal Interactive and Fusion Graph Convolutional Network to handle both texts and images for document-level multimodal sentiment analysis (DLMSA).
The image caption is introduced as an auxiliary signal and aligned with the image to enhance semantic delivery. Then, a graph is constructed with the sentences and the generated captions as nodes. Through graph learning, long-distance dependencies can be captured while visual noise is filtered out.
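As a rough illustration only (the paper's actual construction may differ), the following minimal PyTorch sketch builds such a graph from precomputed sentence and caption embeddings; the `build_graph` function, the cosine-similarity edge rule, and the `threshold` parameter are all assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def build_graph(sentence_embs: torch.Tensor,
                caption_embs: torch.Tensor,
                threshold: float = 0.5):
    """Build a document graph whose nodes are sentence and caption embeddings.

    Edges connect node pairs whose cosine similarity exceeds `threshold`,
    so distant but related nodes are linked (long-distance dependencies)
    while weakly related visual nodes stay disconnected (noise filtering).
    """
    nodes = torch.cat([sentence_embs, caption_embs], dim=0)  # (N, d)
    normed = F.normalize(nodes, dim=-1)
    sim = normed @ normed.T                                  # cosine similarity
    adj = (sim > threshold).float()                          # threshold edges
    adj.fill_diagonal_(1.0)                                  # self-loops
    return nodes, adj
```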
Specifically, a cross-modal graph convolutional network is built for multimodal information fusion. Extensive experiments are conducted on a multimodal dataset from Yelp. Experimental results show that our model achieves satisfying performance on DLMSA tasks.
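For concreteness, here is a minimal sketch of a standard graph-convolution layer operating over the mixed text/visual node set; this is a generic stand-in rather than the authors' specific cross-modal fusion design, and the class name `CrossModalGCNLayer` is hypothetical.

```python
import torch
import torch.nn as nn

class CrossModalGCNLayer(nn.Module):
    """One graph-convolution step over the mixed text/visual nodes.

    Each node aggregates its neighbors' features through the normalized
    adjacency, so textual and visual information is fused along the edges.
    """
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Symmetric normalization: D^{-1/2} A D^{-1/2}
        deg = adj.sum(dim=-1)
        d_inv_sqrt = deg.clamp(min=1e-6).pow(-0.5)
        norm_adj = d_inv_sqrt.unsqueeze(-1) * adj * d_inv_sqrt.unsqueeze(0)
        return torch.relu(self.linear(norm_adj @ nodes))
```

Stacking such layers would let sentence nodes absorb information from caption nodes several hops away, which is how fusion across modalities propagates along the graph edges.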