Topologically Consistent Prototype Network for Incomplete Multimodal Learning

Petrovica S, Anohina-Naumeca A, Ekenel HK. Emotion recognition in affective tutoring systems: collection of ground-truth data. Procedia Comput Sci. 2017;104:437–44.

Noda K, Arie H, Suga Y, Ogata T. Multimodal integration learning of robot behavior using deep neural networks. Robot Auton Syst. 2014;62(6):721–36.

Frantzidis CA, Bratsas C, Klados MA, Konstantinidis E, Lithari CD, Vivas AB, et al. On the classification of emotional biosignals evoked while viewing affective pictures: an integrated data-mining-based approach for healthcare applications. IEEE Trans Inf Technol Biomed. 2010;14(2):309–18.

Grifoni P. Multimodal human computer interaction and pervasive services. USA: IGI Global; 2009.

Baltrušaitis T, Ahuja C, Morency L-P. Multimodal machine learning: a survey and taxonomy. IEEE Trans Pattern Anal Mach Intell. 2019;41(2):423–43.

Rahate A, Walambe R, Ramanna S, Kotecha K. Multimodal co-learning: challenges, applications with datasets, recent advances and future directions. Inf Fusion. 2022;81:203–39.

Lian Z, Chen L, Sun L, Liu B, Tao J. GCNet: graph completion network for incomplete multimodal learning in conversation. IEEE Trans Pattern Anal Mach Intell. 2023;45(7):8419–32.

Yang X, Yumer E, Asente P, Kraley M, Kifer D, Giles C. Learning to extract semantic structure from documents using multimodal fully convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition. 2017. pp. 5315–5324.

Liang P, Liu Z, Tsai Y-H H, Zhao Q, Salakhutdinov R, Morency L-P. Learning representations from imperfect time series data via tensor rank regularization. In: 57th Annual Meeting of the Association for Computational Linguistics. 2019. pp. 1569–1576.

Wang Q, Zhan L, Thompson P, Zhou J. Multimodal learning with incomplete modalities by knowledge distillation. In: ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020. pp. 1828–1838.

Tran L, Liu X, Zhou J, Jin R. Missing modalities imputation via cascaded residual autoencoder. In: IEEE Conference on Computer Vision and Pattern Recognition. 2017. pp. 4971–4980.

Hao W, Zhang Z, Guan H. CMCGAN: a uniform framework for cross-modal visual-audio mutual generation. In: AAAI Conference on Artificial Intelligence. 2018. pp. 6886–6893.

Choi J-H, Lee J-S. EmbraceNet: a robust deep learning architecture for multimodal classification. Inf Fusion. 2019;51:259–70.

Gong Y, Lazebnik S, Gordo A, Perronnin F. Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans Pattern Anal Mach Intell. 2012;35(12):2916–29.

Jegou H, Douze M, Schmid C. Product quantization for nearest neighbor search. IEEE Trans Pattern Anal Mach Intell. 2010;33(1):117–28.

Zeng H, Zhang H, Zhu L. Label consistent locally linear embedding based cross-modal hashing. Inf Process Manag. 2020;57(6):102136.

Cui Z, Chang H, Shan S, Chen X. Generalized unsupervised manifold alignment. In: Advances in Neural Information Processing Systems. 2014. pp. 2429–2437.

Andre TN, Luke ER, Gaoussou Y, Edward R, Kasra D, Frank F. Practical cross-modal manifold alignment for robotic grounded language learning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2021. pp. 1613–1622.

Li Y, Hu H, Wang D. Learning visually aligned semantic graph for cross-modal manifold matching. In: IEEE International Conference on Image Processing (ICIP). 2019. pp. 3412–3416.

Jiang Q, Chen C, Zhao H, Chen L, Ping Q, Tran SD, et al. Understanding and constructing latent modality structures in multi-modal representation learning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023. pp. 7661–7671.

Yang H-M, Zhang X-Y, Yin F, Liu C-L. Robust classification with convolutional prototype learning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018. pp. 3474–3482.

Bishop CM. Pattern recognition and machine learning. New York: Springer; 2006.

Du C, Du C, He H. Multimodal deep generative adversarial models for scalable doubly semi-supervised learning. Inf Fusion. 2021;68:118–30.

Wu M, Goodman N. Multimodal generative models for scalable weakly supervised learning. In: International Conference on Neural Information Processing Systems. 2018. pp. 5580–5590.

Yuan Z, Li W, Xu H, Yu W. Transformer-based feature reconstruction network for robust multimodal sentiment analysis. In: 29th ACM International Conference on Multimedia. 2021. pp. 4400–4407.

Hou J-C, Wang S-S, Lai Y-H, Tsao Y, Chang H-W, Wang H-M. Audio-visual speech enhancement using multimodal deep convolutional neural networks. IEEE Trans Emerg Topics Computat Intell. 2018;2(2):117–28.

Chen J, Zhang A. HGMF: heterogeneous graph-based fusion for multimodal data with incompleteness. In: ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’20. 2020. pp. 1295–1305.

He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition. 2016. pp. 770–778.

Zhao J, Li R, Jin Q. Missing modality imagination network for emotion recognition with uncertain missing modalities. In: 59th Annual Meeting of the Association for Computational Linguistics (vol. 1). 2021. pp. 2608–2618.

Lin Y, Gou Y, Liu Z, Li B, Lv J, Peng X. COMPLETER: incomplete multi-view clustering via contrastive prediction. In: IEEE Conference on Computer Vision and Pattern Recognition. 2021.

Liang N, Yang Z, Li L, Li Z, Xie S. Incomplete multiview clustering with cross-view feature transformation. IEEE Trans Artif Intell. 2021;3(5):749–62.

Jing M, Li J, Zhu L, Lu K, Yang Y, Huang Z. Incomplete cross-modal retrieval with dual-aligned variational autoencoders. In: 28th ACM International Conference on Multimedia. 2020. pp. 3283–3291.

Ma M, Ren J, Zhao L, Tulyakov S, Wu C, Peng X. SMIL: multimodal learning with severely missing modality. In: AAAI Conference on Artificial Intelligence (vol. 35). 2021. pp. 2302–2310.

Finn C, Xu K, Levine S. Probabilistic model-agnostic meta-learning. In: Proceedings of International Conference on Neural Information Processing Systems (vol. 31). 2018. pp. 9537–9548.

Zhang C, Han Z, Cui Y, Fu H, Zhou JT, Hu Q. CPM-Nets: cross partial multi-view networks. In: Advances in Neural Information Processing Systems (vol. 32). 2019.

Lu J, Goswami V, Rohrbach M, Parikh D, Lee S. 12-in-1: multi-task vision and language representation learning. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. pp. 10434–10443.

Shi X, Liu Q, Fan W, Yu PS, Zhu R. Transfer learning on heterogenous feature spaces via spectral transformation. In: IEEE International Conference on Data Mining. 2010. pp. 1049–1054.

Han Y, Wu F, Tao D, Shao J, Zhuang Y, Jiang J. Sparse unsupervised dimensionality reduction for multiple view data. IEEE Trans Circuits Syst Video Technol. 2012;22(10):1485.

Sugiyama M, Nakajima S, Kashima H, Buenau P V, Kawanabe M. Direct importance estimation with model selection and its application to covariate shift adaptation. In: Advances in Neural Information Processing Systems. 2008. pp. 1433–1440.

Long M, Wang J, Ding G, Sun J, Yu PS. Transfer joint matching for unsupervised domain adaptation. In: IEEE Conference on Computer Vision and Pattern Recognition. 2014. pp. 1410–1417.

Hotelling H. Relations between two sets of variates. New York: Springer; 1992. https://doi.org/10.1007/978-1-4612-4380-9_14.

Courty N, Flamary R, Tuia D, Rakotomamonjy A. Optimal transport for domain adaptation. IEEE Trans Pattern Anal Mach Intell. 2017;39(9):1853–65.

Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.

Geva S, Sitte J. Adaptive nearest neighbor pattern classification. IEEE Trans Neural Netw. 1991;2(2):318–22.

Liu C-L, Kim I-J, Kim J. High accuracy handwritten Chinese character recognition by improved feature matching method. In: Fourth International Conference on Document Analysis and Recognition (vol. 2). 1997. pp. 1033–1037.

Decaestecker C. Finding prototypes for nearest neighbour classification by means of gradient descent and deterministic annealing. Pattern Recogn. 1997;30(2):281–8.

Liu C-L, Nakagawa M. Evaluation of prototype learning algorithms for nearest-neighbor classifier in application to handwritten character recognition. Pattern Recogn. 2001;34(3):601–15.

Kuo W, Angelova A, Malik J, Lin T-Y. ShapeMask: learning to segment novel objects by refining shape priors. In: IEEE/CVF International Conference on Computer Vision (ICCV). 2019. pp. 9207–9216.

Vielzeuf V, Lechervy A, Pateux S, Jurie F. CentralNet: a multilayer approach for multimodal fusion. In: European Conference on Computer Vision. 2018.

Wang X, Kumar D, Thome N, Cord M, Precioso F. Recipe recognition with large multimodal food dataset. In: IEEE International Conference on Multimedia Expo Workshops. 2015. pp. 1–6.

LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.

Tzanetakis G, Cook P. Musical genre classification of audio signals. IEEE Trans Speech Audio Process. 2002;10(5):293–302.

Lee H-C, Lin C-Y, Hsu P-C, Hsu WH. Audio feature generation for missing modality problem in video action recognition. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2019. pp. 3956–3960.

Poria S, Cambria E, Hazarika D, Majumder N, Zadeh A, Morency L-P. Context-dependent sentiment analysis in user-generated videos. In: Annual Meeting of the Association for Computational Linguistics (vol. 1). 2017. pp. 873–883.

Sun Z, Sarma P, Sethares W, Liang Y. Learning relationships between text, audio, and video via deep canonical correlation for multimodal language analysis. In: AAAI Conference on Artificial Intelligence (vol. 34). 2020. pp. 8992–8999.

Kingma DP, Ba J. Adam: a method for stochastic optimization. In: International Conference on Learning Representations. 2015.

Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (vol. 1). 2019. pp. 4171–4186.

Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.

Maaten LVD, Hinton G. Visualizing high-dimensional data using t-SNE. J Mach Learn Res. 2008;9:2579–605.

COGNITIVE COMPUTATION
