Carnegie Mellon University - Spring 2016
Can a robot watch "YouTube" to learn about the world? What makes us laugh? How do you bake a cake? Why is Kim Kardashian famous?
12-unit class covering fundamentals of computer vision, audio and speech processing, multimedia files and streaming, multi-modal signal processing, video retrieval, semantics, and text generation (possibly also speech and music generation).
Instructors will give an overview of relevant recent work and benchmarking efforts (TRECVID, MediaEval, etc.). Students will work on research projects to explore these ideas and learn to perform multi-modal retrieval, summarization, and inference on large amounts of "YouTube"-style data. The experimental environment for the practical part of the course will be provided to students in the form of virtual machines.
This is a graduate course primarily for students in LTI, HCII, CSD, Robotics, and ECE; others, for example undergraduate CS students or professional master's students, may enroll by prior permission of the instructor(s). Strong implementation skills, experience working with large data sets, and familiarity with some (not all) of the above fields (e.g., 11-611, 11-711, 11-751, 11-755, 11-792, 16-720, or equivalent) will be helpful.