Petrovica S, Anohina-Naumeca A, Ekenel HK. Emotion recognition in affective tutoring systems: collection of ground-truth data. Procedia Comput Sci. 2017;104:437–44.
Noda K, Arie H, Suga Y, Ogata T. Multimodal integration learning of robot behavior using deep neural networks. Robot Auton Syst. 2014;62(6):721–36.
Frantzidis CA, Bratsas C, Klados MA, Konstantinidis E, Lithari CD, Vivas AB, et al. On the classification of emotional biosignals evoked while viewing affective pictures: an integrated data-mining-based approach for healthcare applications. IEEE Trans Inf Technol Biomed. 2010;14(2):309–18.
Grifoni P. Multimodal human computer interaction and pervasive services. USA: IGI Global; 2009.
Baltrušaitis T, Ahuja C, Morency L. Multimodal machine learning: a survey and taxonomy. IEEE Trans Pattern Anal Mach Intell. 2019;41(2):423–43.
Rahate A, Walambe R, Ramanna S, Kotecha K. Multimodal co-learning: challenges, applications with datasets, recent advances and future directions. Inf Fusion. 2022;81:203–39.
Lian Z, Chen L, Sun L, Liu B, Tao J. GCNet: graph completion network for incomplete multimodal learning in conversation. IEEE Trans Pattern Anal Mach Intell. 2023;45(7):8419–32.
Yang X, Yumer E, Asente P, Kraley M, Kifer D, Giles C. Learning to extract semantic structure from documents using multimodal fully convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition. 2017. pp. 5315–5324.
Liang P, Liu Z, Tsai Y-HH, Zhao Q, Salakhutdinov R, Morency L-P. Learning representations from imperfect time series data via tensor rank regularization. In: 57th Annual Meeting of the Association for Computational Linguistics. 2019. pp. 1569–1576.
Wang Q, Zhan L, Thompson P, Zhou J. Multimodal learning with incomplete modalities by knowledge distillation. In: ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020. pp. 1828–1838.
Tran L, Liu X, Zhou J, Jin R. Missing modalities imputation via cascaded residual autoencoder. In: IEEE Conference on Computer Vision and Pattern Recognition. 2017. pp. 4971–4980.
Hao W, Zhang Z, Guan H. CMCGAN: a uniform framework for cross-modal visual-audio mutual generation. In: AAAI Conference on Artificial Intelligence. 2018. pp. 6886–6893.
Choi J-H, Lee J-S. EmbraceNet: a robust deep learning architecture for multimodal classification. Inf Fusion. 2019;51:259–70.
Gong Y, Lazebnik S, Gordo A, Perronnin F. Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans Pattern Anal Mach Intell. 2012;35(12):2916–29.
Jegou H, Douze M, Schmid C. Product quantization for nearest neighbor search. IEEE Trans Pattern Anal Mach Intell. 2010;33(1):117–28.
Zeng H, Zhang H, Zhu L. Label consistent locally linear embedding based cross-modal hashing. Inf Process Manag. 2020;57(6):102136.
Cui Z, Chang H, Shan S, Chen X. Generalized unsupervised manifold alignment. In: Advances in Neural Information Processing Systems. 2014. pp. 2429–2437.
Nguyen AT, Richards LE, Kebe GY, Raff E, Darvish K, Ferraro F, et al. Practical cross-modal manifold alignment for robotic grounded language learning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2021. pp. 1613–1622.
Li Y, Hu H, Wang D. Learning visually aligned semantic graph for cross-modal manifold matching. In: IEEE International Conference on Image Processing (ICIP). 2019. pp. 3412–3416.
Jiang Q, Chen C, Zhao H, Chen L, Ping Q, Tran SD, et al. Understanding and constructing latent modality structures in multi-modal representation learning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023. pp. 7661–7671.
Yang H-M, Zhang X-Y, Yin F, Liu C-L. Robust classification with convolutional prototype learning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018. pp. 3474–3482.
Bishop CM. Pattern recognition and machine learning. New York: Springer; 2006.
Du C, Du C, He H. Multimodal deep generative adversarial models for scalable doubly semi-supervised learning. Inf Fusion. 2021;68:118–30.
Wu M, Goodman N. Multimodal generative models for scalable weakly supervised learning. In: International Conference on Neural Information Processing Systems. 2018. pp. 5580–5590.
Yuan Z, Li W, Xu H, Yu W. Transformer-based feature reconstruction network for robust multimodal sentiment analysis. In: 29th ACM International Conference on Multimedia. 2021. pp. 4400–4407.
Hou J-C, Wang S-S, Lai Y-H, Tsao Y, Chang H-W, Wang H-M. Audio-visual speech enhancement using multimodal deep convolutional neural networks. IEEE Trans Emerg Top Comput Intell. 2018;2(2):117–28.
Chen J, Zhang A. HGMF: heterogeneous graph-based fusion for multimodal data with incompleteness. In: ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’20. 2020. pp. 1295–1305.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition. 2016. pp. 770–778.
Zhao J, Li R, Jin Q. Missing modality imagination network for emotion recognition with uncertain missing modalities. In: 59th Annual Meeting of the Association for Computational Linguistics (vol. 1). 2021. pp. 2608–2618.
Lin Y, Gou Y, Liu Z, Li B, Lv J, Peng X. COMPLETER: incomplete multi-view clustering via contrastive prediction. In: IEEE Conference on Computer Vision and Pattern Recognition. 2021.
Liang N, Yang Z, Li L, Li Z, Xie S. Incomplete multiview clustering with cross-view feature transformation. IEEE Trans Artif Intell. 2021;3(5):749–62.
Jing M, Li J, Zhu L, Lu K, Yang Y, Huang Z. Incomplete cross-modal retrieval with dual-aligned variational autoencoders. In: 28th ACM International Conference on Multimedia. 2020. pp. 3283–3291.
Ma M, Ren J, Zhao L, Tulyakov S, Wu C, Peng X. SMIL: multimodal learning with severely missing modality. In: AAAI Conference on Artificial Intelligence (vol. 35). 2021. pp. 2302–2310.
Finn C, Xu K, Levine S. Probabilistic model-agnostic meta-learning. In: Proceedings of International Conference on Neural Information Processing Systems (vol. 31). 2018. pp. 9537–9548.
Zhang C, Han Z, Cui Y, Fu H, Zhou JT, Hu Q. CPM-Nets: cross partial multi-view networks. In: Advances in Neural Information Processing Systems (vol. 32). 2019.
Lu J, Goswami V, Rohrbach M, Parikh D, Lee S. 12-in-1: multi-task vision and language representation learning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. pp. 10434–10443.
Shi X, Liu Q, Fan W, Yu PS, Zhu R. Transfer learning on heterogenous feature spaces via spectral transformation. In: IEEE International Conference on Data Mining. 2010. pp. 1049–1054.
Han Y, Wu F, Tao D, Shao J, Zhuang Y, Jiang J. Sparse unsupervised dimensionality reduction for multiple view data. IEEE Trans Circuits Syst Video Technol. 2012;22(10):1485.
Sugiyama M, Nakajima S, Kashima H, von Bünau P, Kawanabe M. Direct importance estimation with model selection and its application to covariate shift adaptation. In: Advances in Neural Information Processing Systems. 2008. pp. 1433–1440.
Long M, Wang J, Ding G, Sun J, Yu PS. Transfer joint matching for unsupervised domain adaptation. In: IEEE Conference on Computer Vision and Pattern Recognition. 2014. pp. 1410–1417.
Hotelling H. Relations between two sets of variates. New York: Springer; 1992. https://doi.org/10.1007/978-1-4612-4380-9_14.
Courty N, Flamary R, Tuia D, Rakotomamonjy A. Optimal transport for domain adaptation. IEEE Trans Pattern Anal Mach Intell. 2017;39(9):1853–65.
Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.
Geva S, Sitte J. Adaptive nearest neighbor pattern classification. IEEE Trans Neural Netw. 1991;2(2):318–22.
Liu C-L, Kim I-J, Kim J. High accuracy handwritten Chinese character recognition by improved feature matching method. In: Fourth International Conference on Document Analysis and Recognition (vol. 2). 1997. pp. 1033–1037.
Decaestecker C. Finding prototypes for nearest neighbour classification by means of gradient descent and deterministic annealing. Pattern Recogn. 1997;30(2):281–8.
Liu C-L, Nakagawa M. Evaluation of prototype learning algorithms for nearest-neighbor classifier in application to handwritten character recognition. Pattern Recogn. 2001;34(3):601–15.
Kuo W, Angelova A, Malik J, Lin T-Y. ShapeMask: learning to segment novel objects by refining shape priors. In: IEEE/CVF International Conference on Computer Vision (ICCV). 2019. pp. 9207–9216.
Vielzeuf V, Lechervy A, Pateux S, Jurie F. CentralNet: a multilayer approach for multimodal fusion. In: European Conference on Computer Vision. 2018.
Wang X, Kumar D, Thome N, Cord M, Precioso F. Recipe recognition with large multimodal food dataset. In: IEEE International Conference on Multimedia Expo Workshops. 2015. pp. 1–6.
LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.
Tzanetakis G, Cook P. Musical genre classification of audio signals. IEEE Trans Speech Audio Process. 2002;10(5):293–302.
Lee H-C, Lin C-Y, Hsu P-C, Hsu WH. Audio feature generation for missing modality problem in video action recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2019. pp. 3956–3960.
Poria S, Cambria E, Hazarika D, Majumder N, Zadeh A, Morency L-P. Context-dependent sentiment analysis in user-generated videos. In: Annual Meeting of the Association for Computational Linguistics (vol. 1). 2017. pp. 873–883.
Sun Z, Sarma P, Sethares W, Liang Y. Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis. In: AAAI Conference on Artificial Intelligence (vol. 34). 2020. pp. 8992–8999.
Kingma DP, Ba J. Adam: a method for stochastic optimization. In: International Conference on Learning Representations. 2015.
Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (vol. 1). 2019. pp. 4171–4186.
Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
van der Maaten L, Hinton G. Visualizing high-dimensional data using t-SNE. J Mach Learn Res. 2008;9:2579–605.