Description

Multimodal machine learning (MMML) is a vibrant multi-disciplinary research field that addresses some of the original goals of artificial intelligence by integrating and modeling multiple communicative modalities, including linguistic, acoustic, and visual messages. Building on early research in audio-visual speech recognition and, more recently, on language-and-vision projects such as image and video captioning, this field poses unique challenges for multimodal researchers given the heterogeneity of the data and the contingency often found between modalities.

This course will teach fundamental mathematical concepts related to MMML, including (1) multimodal representation learning, (2) translation & mapping, (3) modality alignment, (4) multimodal fusion, and (5) co-learning. We will also review recent papers describing state-of-the-art probabilistic models and computational algorithms for MMML and discuss current and upcoming challenges. Finally, the course will cover many recent applications of MMML, including multimodal affect recognition, image and video captioning, and cross-modal multimedia retrieval.
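To make the fusion topic above concrete, here is a minimal sketch of two common fusion strategies, early (feature-level) and late (decision-level) fusion. The feature vectors and classifier scores are hypothetical placeholders, not material from the course itself.

```python
import numpy as np

# Hypothetical per-modality feature vectors (e.g., acoustic and visual).
acoustic = np.array([0.2, 0.7, 0.1])
visual = np.array([0.9, 0.3, 0.5, 0.4])

# Early fusion: concatenate modality features into one joint representation
# before a single downstream model processes them.
early = np.concatenate([acoustic, visual])
assert early.shape == (7,)

# Late fusion: each modality produces its own class scores independently;
# the scores are then combined (here, by simple averaging).
scores_acoustic = np.array([0.6, 0.4])  # hypothetical classifier outputs
scores_visual = np.array([0.2, 0.8])
late = (scores_acoustic + scores_visual) / 2
assert np.allclose(late, [0.4, 0.6])
```

Early fusion lets a model exploit cross-modal correlations at the feature level, while late fusion is more robust when one modality is missing or noisy; many of the models covered in the course interpolate between these two extremes.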

General Information

Time
Tuesdays and Thursdays, 4:30pm-5:50pm
Lecture location
BH 136A

Staff Office Hours

Chaitanya Ahuja
Amir Zadeh
Louis-Philippe Morency
Tadas Baltrusaitis