Description
Multimodal machine learning (MMML) is a vibrant multi-disciplinary research field that addresses some of the original goals of artificial intelligence by integrating and modeling multiple communicative modalities, including linguistic, acoustic and visual messages. From the initial research on audio-visual speech recognition to more recent language and vision projects such as image and video captioning, this research field brings unique challenges for multimodal researchers, given the heterogeneity of the data and the contingency often found between modalities. This course will teach fundamental mathematical concepts related to MMML, including multimodal alignment and fusion, heterogeneous representation learning and multi-stream temporal modeling. We will also review recent papers describing state-of-the-art probabilistic models and computational algorithms for MMML and discuss current and upcoming challenges.
Recommended preparation: This is a graduate course designed primarily for PhD and research master's students at LTI, MLD, CSD, HCII and RI; others, for example undergraduate CS students or students from professional master's programs, are advised to seek the instructor's permission beforehand. Students are required to have taken an introductory machine learning course such as 10-401, 10-601, 10-701, 11-663, 11-441, 11-641 or 11-741. Prior knowledge of deep learning is recommended. Students should have a solid academic background in probability, statistics and linear algebra. Programming knowledge in Python is also strongly recommended.
More details in the Syllabus document.
General Information
Time
Tuesdays and Thursdays, 3:20pm-4:40pm
Location
Remote teaching – Zoom (see links in CMU Canvas)
Name | Office Hours
---|---
Shikib Mehri | When? Where?
Paul Liang | When? Where?
Louis-Philippe Morency | When? Where?
Prakhar Gupta | When? Where?
Martin Ma | When? Where?
Torsten Woertwein | When? Where?
Yonatan Bisk | When? Where?
Reading Assignments
Start Date
Dec 1, 2020
Nov 16, 2020
Nov 2, 2020
Oct 26, 2020
Oct 12, 2020
Oct 12, 2020
Sep 28, 2020
Sep 28, 2020
Sep 21, 2020
Sep 21, 2020
Lecture Slides
Lecture Date
Dec 3, 2020
Dec 1, 2020
Nov 17, 2020
Nov 5, 2020
Nov 3, 2020
Oct 29, 2020
Oct 27, 2020
Oct 22, 2020
Oct 20, 2020
Oct 15, 2020
Oct 13, 2020
Oct 1, 2020
Sep 29, 2020
Sep 24, 2020
Sep 22, 2020
Sep 17, 2020
Sep 15, 2020
Sep 10, 2020
Sep 8, 2020
Sep 3, 2020
Sep 1, 2020