University of British Columbia - Other
Multimodal machine learning is a multi-disciplinary research field which addresses some of the core goals of artificial intelligence by integrating and modeling two or more data modalities (e.g., visual, linguistic, acoustic, etc.). This course will teach fundamental concepts related to multimodal machine learning, including (1) representation learning, (2) translation and mapping, and (3) modality alignment. While the fundamental techniques covered in this course are applicable broadly, the focus will be on studying them in the context of joint reasoning and understanding of images/videos and language (text).
In addition to fundamentals, we will study recent rich body of research at the intersection of vision and language, including problems of (i) generating image descriptions using natural language, (ii) visual question answering, (iii) retrieval of images based on textural queries (and vice versa), (iv) generating images/videos from textual descriptions, (v) language grounding and many other related topics. On a technical side, we will be studying neural network architectures of various forms, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), memory networks, attention models, neural language models, and structures prediction models.
Click the Edit button to add class information.
You have the option of deleting this announcement from just the course homepage or deleting this announcement from the course homepage and Q&A feed. What would you like to do?
You'll lose everything you typed, plus all the time it took to type it...