Description

Piazza sign up: piazza.com/umd/fall2015/cmsc422

Machine learning is all about finding patterns in data. The whole idea is to replace the "human writing code" with a "human supplying data" and then let the system figure out what it is that the person wants to do by looking at the examples. The most central concept in machine learning is generalization: how to generalize beyond the examples that have been provided at "training time" to new examples that you see at "test time." A very large fraction of what we'll talk about has to do with figuring out what generalization means. We'll look at it from lots of different perspectives and hopefully gain some understanding of what's going on.

There are a few cool things about machine learning that I hope to get across in class. The first is that it's broadly applicable. These techniques have led to significant advances in many fields, including stock trading, robotics, machine translation, computer vision, medicine, etc. The second is that there is a very close connection between theory and practice. While this course is more on the "practical" side of things, almost everything we will talk about has a huge amount of accompanying theory. The third is that once you understand the basics of machine learning technology, it's a very open field and lots of progress can be made quickly, essentially by finding ways to formalize what we know about the world.

Prerequisites: There will be a lot of math in this class and if you do not come prepared, life will be rough. You must be able to take derivatives by hand (preferably of multivariate functions). You must know what the chain rule of probability is, and Bayes' rule. More background is not necessary but is helpful: for instance, dot products and their relationship to projections onto subspaces, and what a Gaussian is and why it's okay if its density is greater than one. I've provided some reading material to refresh these issues in your head, but if you haven't at least seen these things before, you should beef up your math background before class begins:

http://hal3.name/courses/2013S_ML/math4ml.pdf
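
As a quick self-check on the Gaussian point above, here is a minimal sketch (not course material) showing that a density can exceed one while still integrating to one. The value sigma = 0.1, the integration range, and the helper names gaussian_pdf and integrate are illustrative assumptions; only the Python standard library is used.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a univariate Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

sigma = 0.1  # assumed value: a narrow Gaussian, chosen so the peak is tall
peak = gaussian_pdf(0.0, sigma=sigma)

# Integrate over +/- 10 standard deviations, which captures essentially all the mass.
area = integrate(lambda x: gaussian_pdf(x, sigma=sigma), -1.0, 1.0)

print(f"density at the mean: {peak:.3f}")
print(f"area under the curve: {area:.6f}")
```

The density at the mean comes out near 3.99, well above one, while the area stays at one: a density is a height, and only areas under it are probabilities.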

On the programming side, projects will be in Python; you should understand basic computer science concepts (like recursion), basic data structures (trees, graphs), and basic algorithms (search, sorting, etc.). If you know Matlab, here's a nice cheat sheet for moving to NumPy:

http://mathesaurus.sourceforge.net/matlab-numpy.html

The course schedule is below. Readings are to be completed before class on the listed date. Homework and project deadlines are viewable on the handin server:

http://inclass.umiacs.umd.edu/perl/handin.pl?course=cmsc422

Grading:

Each homework is worth 0.5 points (total 6). Each lab is worth 2 points (total 20). Each project is worth 8 points (total 24). The midterm is 15 points and the final is 25 points (total 40). Participation is 10 points (in class or on piazza).
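
The point breakdown above can be sketched as a small calculation. This is only an illustration of the arithmetic: the category maxima come from the policy above, while the example student and the names CATEGORY_POINTS and course_total are hypothetical. Extra credit is not modeled.

```python
# Maximum points per category, as listed in the grading policy above.
CATEGORY_POINTS = {
    "homework": 6.0,
    "labs": 20.0,
    "projects": 24.0,
    "midterm": 15.0,
    "final": 25.0,
    "participation": 10.0,
}

def course_total(fraction_earned):
    """Combine per-category fractions (0.0 to 1.0) into a score out of 100."""
    return sum(points * fraction_earned[name]
               for name, points in CATEGORY_POINTS.items())

# Hypothetical student, for illustration only.
example = {
    "homework": 1.0,
    "labs": 0.9,
    "projects": 0.85,
    "midterm": 0.8,
    "final": 0.8,
    "participation": 1.0,
}

max_points = sum(CATEGORY_POINTS.values())
total = course_total(example)

print(f"maximum points: {max_points:.0f}")
print(f"example total:  {total:.1f}")
```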

There will be occasional opportunities for extra credit.

Contact Information:
Hal Daumé III (Professor)
me@hal3.name
Office Hours Monday 3-4pm AVW 3227

Ryan Dorson (TA)
ta.cmsc422.fall2015@gmail.com
Office Hours Wednesday 10-11:30am AVW 4103

General Information

Books
Primary: A Course in Machine Learning (http://ciml.info)

Alternatives:

Machine Learning: The Art and Science of Algorithms that Make Sense of Data by Peter Flach (ISBN 1107422221)

Pattern Recognition and Machine Learning by Chris Bishop (ISBN 0387310738)

Machine Learning by Tom Mitchell (ISBN 0070428077)

Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman (ISBN 0387952845)

Information Theory, Inference and Learning Algorithms by David MacKay (ISBN 0521642981)

An Introduction to Computational Learning Theory by Michael Kearns and Umesh Vazirani (ISBN 0262111934)
Cheating
Any assignment or exam that is handed in must be your own work. However, talking with one another to understand the material better is strongly encouraged. Recognizing the distinction between cheating and cooperation is very important. If you copy someone else's solution, you are cheating. If you let someone else copy your solution, you are cheating. If someone dictates a solution to you, you are cheating. Everything you hand in must be in your own words, and based on your own understanding of the solution. If someone helps you understand the problem during a high-level discussion, you are not cheating. We strongly encourage students to help one another understand the material presented in class, in the book, and general issues relevant to the assignments. When taking an exam, you must work independently. Any collaboration during an exam will be considered cheating. Any student who is caught cheating will be given an E in the course and referred to the University Student Behavior Committee. Please don't take that chance - if you're having trouble understanding the material, please let us know and we will be more than happy to help.
ADA/DSS
Any student eligible for and requesting reasonable academic accommodations due to a disability is requested to provide, to the instructor in office hours, a letter of accommodation from the Office of Disability Support Services (DSS) within the first two weeks of the semester. You may reach them at 301-314-7682 or by visiting Susquehanna Hall on the 4th Floor.
Add/Drop
The document concerning adding, dropping, etc. is available here:

http://registrar.umd.edu/current/registration/Schedule%20Adjustment.html#policy
Anti-Harassment
The open exchange of ideas, the freedom of thought and expression, and respectful scientific debate are central to the aims and goals of this course. These require a community and an environment that recognizes the inherent worth of every person and group, that fosters dignity, understanding, and mutual respect, and that embraces diversity. Harassment and hostile behavior are unwelcome in any part of this course. This includes: speech or behavior that intimidates, creates discomfort, or interferes with a person’s participation or opportunity for participation in the course. We aim for this course to be an environment where harassment in any form does not happen, including but not limited to: harassment based on race, gender, religion, age, color, national origin, ancestry, disability, sexual orientation, or gender identity. Harassment includes degrading verbal comments, deliberate intimidation, stalking, harassing photography or recording, inappropriate physical contact, and unwelcome sexual attention. Please contact an instructor or CS staff member if you have questions or if you feel you are the victim of harassment (or otherwise witness harassment of others).

Announcements

Course Schedule
08/31/15 - 10:29 AM

Day    Topic                  Read      Due
-----  ---------------------  --------  ----
T  1S  Course overview        none
R  3S  Decision trees         1-1.6     
T  8S  Practicalities & lab1  1.7-1.10  HW00
R 10S  Geometry & KNN         2-2.3     
T 15S  K-means & lab2         2.4-2.6   HW01
R 17S  Perceptron             3-3.5     
T 22S  Linear models          3.6-3.7   HW02
R 24S  Features & evaluation  4-4.4     
T 29S  F&E II & lab3          4.5-4.8   HW03
R  1O  Multiclass             5-5.3     
T  6O  Ranking & lab4         none      HW04
R  8O  Collective             5.4       
T 13O  Gradient descent       6-6.4     HW05
R 15O  Midterm review         none      P1
T 20O  Midterm                none      
R 22O  SGD, SVMs              6.5-6.7     
T 27O  Neural networks        8-8.3     HW07
R 29O  Deep nnets             8.4-8.6   
T  3N  Recursive neural nets  1,2 below HW08
R  5N  Kernels                9-9.3     
T 10N  SVMs II                9.4-9.6   HW09
R 12N  SVMs III                         P2
T 17N  Catch up
R 19N  K-means++ & lab6       13-13.2
T 24N  MDPs & basic RL                  
T  1D  Imitation learning     TBD       HW11
R  3D  IL II & lab7           TBD       
T  8D  Online learning        TBD       HW12
R 10D  Recommender systems    TBD       P3
T 15D  Final exam 8a :(       none

Extra readings:

[1] http://karpathy.github.io/2015/05/21/rnn-effectiveness/
[2] http://nbviewer.ipython.org/gist/yoavg/d76121dfde2618422139

Homeworks are due on the date listed, by 8:00am (so I have time to review them before class). They are mostly sanity checks to see what things are confusing from the readings so we can go over them in class. Late homeworks are not allowed, and the cutoff of 8am is strict and precise.

Projects are due on the date listed by 10:00pm. You may hand in projects up to 48 hours late. Once they are 1 second late, you will be docked 25% (absolute) of the grade.

Labs are to be done in teams, but write-ups handed in independently on paper the following class period (i.e., the writeup for lab1 is to be handed in at the beginning of class on September 10). Please remember to write your name and UID on the lab. If you don't have time to complete the lab in class, please finish at home (either with your group/partner or on your own). This will help you get ahead on the projects too.

Rosh-Hashanah is Sept 14-15, which is when HW01 is due; you may hand this in on Sept 16. Yom Kippur is Sept 23 and Eid al Adha is Sept 23-26; these do not interfere with deadlines. If there are other religious observances that will interfere with your successful completion of assignments, please let me know in the first two weeks of class.
