11777-A | Class Profile | Piazza

Description

Multimodal machine learning (MMML) is a vibrant multi-disciplinary research field which addresses some of the original goals of artificial intelligence by integrating and modeling multiple communicative modalities, including linguistic, acoustic and visual messages. With the initial research on audio-visual speech recognition and more recently with language & vision projects such as image and video captioning, this research field brings some unique challenges for multimodal researchers given the heterogeneity of the data and the contingency often found between modalities. This course will teach fundamental mathematical concepts related to MMML including multimodal alignment and fusion, heterogeneous representation learning and multi-stream temporal modeling. We will also review recent papers describing state-of-the-art probabilistic models and computational algorithms for MMML and discuss the current and upcoming challenges.

Recommended preparation: This is a graduate course designed primarily for PhD and research master's students at LTI, MLD, CSD, HCII and RI; others, for example undergraduate CS students or students from professional master's programs, are advised to seek prior permission of the instructor. Students are required to have taken an introductory machine learning course such as 10-401, 10-601, 10-701, 11-663, 11-441, 11-641 or 11-741. Prior knowledge of deep learning is recommended. Students should have a proper academic background in probability, statistics and linear algebra. Programming knowledge in Python is also strongly recommended.

More details in the Syllabus document.

General Information

Time

Tuesdays and Thursdays, 3:20pm-4:40pm

Location

Remote teaching – Zoom (see links in CMU Canvas)

Course website

https://cmu-multicomp-lab.github.io/mmml-course/fall2020/

Announcements

Announcements are not public for this course.

Staff Office Hours
	Name
	Shikib Mehri
	Paul Liang
	Louis-Philippe Morency
	Prakhar Gupta
	Martin Ma
	Torsten Woertwein
	Yonatan Bisk

Reading Assignments

Reading Assignments	Start Date
W14a_Clue: Cross-modal Coherence Modeling for Caption Generation	Dec 1, 2020
W14b_History for Visual Dialog: Do we really need it?	Dec 1, 2020
W14c_Big Data's Disparate Impact	Dec 1, 2020
W14d_The Social Impact of Natural Language Processing	Dec 1, 2020
W12a Sim-to-Real Transfer for Vision-and-Language Navigation	Nov 16, 2020
W12b Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Predicti	Nov 16, 2020
W12c What is Learned in Visually Grounded Neural Syntax Acquisition	Nov 16, 2020
W12d The Return of Lexical Dependencies: Neural Lexicalized PCFGs	Nov 16, 2020
W10a_Guessing_State_Tracking_for_Visual_Dialogue.pdf	Nov 2, 2020
W10b_Iterative_Answer_Prediction_with_Pointer-Augmented_multiomodla_transformers_for_textvqa.pdf	Nov 2, 2020
W10c-learning-by-abstraction-the-neural-state-machine.pdf	Nov 2, 2020
W10d__Grounded_language_learning_fast_and_slow.pdf	Nov 2, 2020
W9a - Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks	Oct 26, 2020
W9b_A_Survey_of_Reinforcement_Learning_Informed_by_Natural_Language	Oct 26, 2020
W9c_Neural_Modular_Control_for_Embodied_Question_Answering	Oct 26, 2020
W9d_Towards_Diverse_and_Natural_Image_Descriptions_via_a_Conditional_GAN	Oct 26, 2020
W8a_Asynchronous_Temporal_Fields_for_Action_Recognition.pdf	Oct 19, 2020
W8b_Detecting_Visual_Relationships_with_Deep_Relational_Networks.pdf	Oct 19, 2020
W8c_Multimodal_Generative_Models_for_Scalable_Weakly_Supervised_Learning.pdf	Oct 19, 2020
W8d_Unpaired_Image_to_Image_Translation_using_Cycle_Consistent_Adversarial_Networks.pdf	Oct 19, 2020
W7d_Toward Multimodal Image-to-Image Translation	Oct 12, 2020
W7c_Soft-DTW: a Differentiable Loss Function for Time-Series. ICM	Oct 12, 2020
W7b_Visual Coreference Resolution in Visual Dialog using Neural Module Networks	Oct 12, 2020
W7a_The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Super	Oct 12, 2020
W5a_Bottom-Up_and_Top-Down_CVPR_2018_paper.pdf	Sep 28, 2020
W5b_Attention-_is_not_not_explanation.pdf	Sep 28, 2020
W5c_Multimodal_Transformer_Networks_for_End-to-End_Video-Grounded_Dialogue_SYstems.pdf	Sep 28, 2020
W5d_LXMERT_Learning_Cross-Modality_Encoder_Representations_from_Transformers.pdf	Sep 28, 2020
W4a-Audio-Visual_Scene_Analysis.pdf	Sep 21, 2020
W4b_Learning_Deep_Structure-Preserving_Image-Text_Embeddings.pdf	Sep 21, 2020
W4c-Linking_Image_and_Text_with_2-way_Nets.pdf	Sep 21, 2020
W4d-Autoencoder_in_Autoencoder_Networks_CVPR_2019_paper.pdf	Sep 21, 2020
W3d - Sharp_Nearby_Fuzzy_Far_Away_How_Neural_Language_Models_Use_Context	Sep 14, 2020
W3c - Visualizing_and_Understanding_Recurrent_Networks	Sep 14, 2020
W3b - Grad-CAM_Visual_Explanations_from_Deep_Networks_via_Gradient-based_Localization	Sep 14, 2020
W3a - Visualizing_and_Understanding_Convolutional_Networks	Sep 14, 2020
W2c - Representation_Learning_A_Review_and_New_Perspectives.pdf	Sep 7, 2020
W2ab - Multimodal_Machine_Learning_A_Survey_and_Taxonomy.pdf	Sep 7, 2020

Lecture Slides

Lecture Slides	Lecture Date
Lecture14.2-Ethics.pdf	Dec 3, 2020
Lecture14.1-CoherenceAndGrounding.pdf	Dec 1, 2020
lecture12.2-Multimodal Human-inspired Language Learning.pdf	Nov 19, 2020
lecture_12.1_Connecting_Language_to_Actions.pdf	Nov 17, 2020
lecture10.2-NewResearchTrends.pdf	Nov 5, 2020
lecture10.1-MultimodalFusion.pdf	Nov 3, 2020
lecture9.2-ReinforcementLearning2.pdf	Oct 29, 2020
lecture9.1-ReinforcementLearning1.pdf	Oct 27, 2020
lecture8.2-DeepGenerativeModels.pdf	Oct 22, 2020
lecture8.1-DiscriminativeGraphicalModels.pdf	Oct 20, 2020
lecture7.2-ProbabilisticGraphicalModels.pdf	Oct 15, 2020
lecture7.1-AlignmentAndTranslation.pdf	Oct 13, 2020
lecture5.2-AlignmentAndRepresentation.pdf	Oct 1, 2020
lecture5.1-MultimodalAlignment.pdf	Sep 29, 2020
lecture4.2-CoordinatedRepresentations.pdf	Sep 24, 2020
lecture4.1-MultimodalRepresentations.pdf	Sep 22, 2020
lecture3.2-LanguageRepresentationsAndRNNs.pdf	Sep 17, 2020
lecture3.1-VisualRepresentationsAndCNNs.pdf	Sep 15, 2020
lecture2.2-BasicConcepts-Optimization.pdf	Sep 10, 2020
lecture2.1-BasicConcepts.pdf	Sep 8, 2020
lecture1.2-Datasets.pdf	Sep 3, 2020
lecture1.1-Introduction.pdf	Sep 1, 2020