Description

Multimodal machine learning (MMML) is a vibrant multi-disciplinary research field that addresses some of the original goals of artificial intelligence by integrating and modeling multiple communicative modalities, including linguistic, acoustic, and visual messages. Building on early research in audio-visual speech recognition and, more recently, on language-and-vision projects such as image and video captioning, this field poses unique challenges for multimodal researchers given the heterogeneity of the data and the contingency often found between modalities.

This course will teach fundamental mathematical concepts related to MMML, including (1) multimodal representation learning, (2) translation & mapping, (3) modality alignment, (4) multimodal fusion, and (5) co-learning. We will also review recent papers describing state-of-the-art probabilistic models and computational algorithms for MMML and discuss current and upcoming challenges. Finally, the course will cover many recent applications of MMML, including multimodal affect recognition, image and video captioning, and cross-modal multimedia retrieval.
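To make the fusion topic above concrete, here is a minimal sketch of two common fusion strategies, early (feature-level) and late (decision-level) fusion. The feature vectors and classifier scores are hypothetical placeholders, not material from the course itself.

```python
import numpy as np

# Hypothetical per-modality feature vectors (e.g., acoustic and visual).
acoustic = np.array([0.2, 0.7, 0.1])
visual = np.array([0.9, 0.3, 0.5, 0.4])

# Early fusion: concatenate modality features into one joint representation
# before a single downstream model processes them.
early = np.concatenate([acoustic, visual])
assert early.shape == (7,)

# Late fusion: each modality produces its own class scores independently;
# the scores are then combined (here, by simple averaging).
scores_acoustic = np.array([0.6, 0.4])  # hypothetical classifier outputs
scores_visual = np.array([0.2, 0.8])
late = (scores_acoustic + scores_visual) / 2
assert np.allclose(late, [0.4, 0.6])
```

Early fusion lets a model exploit cross-modal correlations at the feature level, while late fusion is more robust when one modality is missing or noisy; many of the models covered in the course interpolate between these two extremes.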

General Information

Time
Tuesdays and Thursdays, 4:30pm-5:50pm
Lecture location
BH 136A

Staff Office Hours

Chaitanya Ahuja
Amir Zadeh
Louis-Philippe Morency
Tadas Baltrusaitis