Description
Multimodal machine learning (MMML) is a vibrant multi-disciplinary research field that addresses some of the original goals of artificial intelligence by integrating and modeling multiple communicative modalities, including linguistic, acoustic and visual messages. From the initial research on audio-visual speech recognition to more recent language & vision projects such as image and video captioning, this research field poses some unique challenges for multimodal researchers, given the heterogeneity of the data and the contingency often found between modalities. This course will teach fundamental mathematical concepts related to MMML, including multimodal alignment and fusion, heterogeneous representation learning and multi-stream temporal modeling. We will also review recent papers describing state-of-the-art probabilistic models and computational algorithms for MMML and discuss the current and upcoming challenges.
The course will present the fundamental concepts of machine learning and deep neural networks relevant to the five main challenges in multimodal machine learning: (1) multimodal representation learning, (2) translation & mapping, (3) modality alignment, (4) multimodal fusion and (5) co-learning. These include, but are not limited to, multimodal auto-encoders, deep canonical correlation analysis, multi-kernel learning, attention models and multimodal recurrent neural networks. The course will also discuss many of the recent applications of MMML, including multimodal affect recognition, image and video captioning and cross-modal multimedia retrieval.
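As a concrete illustration of one technique named above, the sketch below shows a minimal multimodal auto-encoder in PyTorch: two modality-specific encoders project visual and acoustic features into a shared latent space, and modality-specific decoders reconstruct each input from the fused code. This is not course code; all dimensions, layer sizes, fusion choice (simple addition) and variable names are illustrative assumptions.

```python
# Minimal sketch of a multimodal auto-encoder (illustrative, not course material).
import torch
import torch.nn as nn

class MultimodalAutoencoder(nn.Module):
    def __init__(self, visual_dim=2048, acoustic_dim=74, latent_dim=128):
        super().__init__()
        # Modality-specific encoders map each input into a shared latent space.
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, 512), nn.ReLU(),
                                        nn.Linear(512, latent_dim))
        self.acoustic_enc = nn.Sequential(nn.Linear(acoustic_dim, 128), nn.ReLU(),
                                          nn.Linear(128, latent_dim))
        # Modality-specific decoders reconstruct each input from the fused code.
        self.visual_dec = nn.Linear(latent_dim, visual_dim)
        self.acoustic_dec = nn.Linear(latent_dim, acoustic_dim)

    def forward(self, visual, acoustic):
        # Simple additive fusion of the two latent codes (one of many options).
        z = self.visual_enc(visual) + self.acoustic_enc(acoustic)
        return self.visual_dec(z), self.acoustic_dec(z)

# Example usage with random features standing in for real modality inputs.
model = MultimodalAutoencoder()
visual = torch.randn(8, 2048)    # e.g., image features from a CNN
acoustic = torch.randn(8, 74)    # e.g., frame-level acoustic descriptors
v_rec, a_rec = model(visual, acoustic)
loss = nn.functional.mse_loss(v_rec, visual) + nn.functional.mse_loss(a_rec, acoustic)
```

Training the model to minimize the joint reconstruction loss encourages the shared latent code to capture information from both modalities, which is the basic idea behind multimodal representation learning covered in the course.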
General Information
Time
Tuesdays and Thursdays, 4:30pm-5:50pm
Lecture location (Tuesdays and some Thursdays)
DH A302
Discussion location (Thursdays, with some exceptions)
WEH 5302, WEH 5304, WEH 5316 and WEH 5320
Name | Office Hours
---|---
Louis-Philippe Morency | When? Where?
Harsh Jhamtani | When? Where?
Zhun Liu | When? Where?
Ying Shen | When? Where?
Varun Bharadhwaj Lakshminarasimhan | When? Where?
Carla Luisa de Oliveira Viegas | When? Where?
Lecture Notes
Lecture dates:
- Nov 13, 2018
- Oct 30, 2018
- Oct 25, 2018
- Oct 16, 2018
- Oct 9, 2018
- Sep 25, 2018
- Sep 18, 2018
- Sep 11, 2018
- Sep 4, 2018
- Aug 30, 2018
- Aug 28, 2018
Reading Assignments
Discussion dates:
- Nov 15, 2018
- Oct 1, 2018
- Nov 1, 2018
- Nov 1, 2018
- Oct 23, 2018
- Oct 25, 2018
- Oct 23, 2018
- Oct 23, 2018
- Sep 27, 2018
- Sep 13, 2018