Description
The goal is for students to become familiar with the main underlying concepts and to be able to explore new directions, expanding their knowledge when faced with new problems or applications. For this reason, the course will also have a practical slant, whereby students are expected to apply the techniques taught in the course to scenarios of practical interest.
General Information
Thursdays: 4pm - 6pm, Room B2
Office hours (Prof. Becchetti): Wednesdays, 10am - 12pm; Thursdays, 2pm - 4pm
Fridays: 2pm - 4pm, Room A3
Leskovec, Jure, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive datasets (2nd edition). Cambridge University Press, 2014.
A PDF version of the book, together with slides from the course on massive datasets taught at Stanford University, was generously made available by the authors. It can be found here: http://www.mmds.org/
References for some of the topics also include scientific papers and online resources. Pointers will be given by the instructor when needed.
Each homework set will cover or extend topics presented in class. Discussion of solutions will take place a few days after the delivery deadline and will focus on the topics of the assignment and possibly related ones. In general, you should show a general knowledge of the main topics we touched on. Not remembering the technical details of a proof is fine; missing key ideas, notions and definitions is bad.
You are expected to support your answers to the homework with proofs. Much of the homework will consist of questions asking you to design algorithms for various problems. A complete answer consists of a clear description of an algorithm/approach (an English description is fine, pseudo-code is welcome), followed by convincing arguments that the proposed solution is efficient, correct and achieves the required performance guarantees. Here "convincing" means relying on (and therefore referencing) existing work in the literature, or giving a formal proof. You do not need to implement the algorithm, unless requested. You should try to make your algorithms as efficient as possible.
Note that a high-level idea does not suffice. You must be able to provide a detailed description of how you would implement your solution. Being able to present your ideas clearly, concisely and accurately is a plus. In general, unclear, confused or inaccurate descriptions/explanations (in the instructor's opinion) will cost you points, even if you are pretty sure you had the right idea in mind.
Handing in: You must hand in the homeworks by the due date and time, by an email to the instructor that contains as an attachment a PDF with all your answers. The solutions for the theoretical exercises must contain your answers either typed up or clearly handwritten and scanned.
The subject of the email should be:
[BD] [Last_name First_name] HW #,
where # is the homework number.
Late policy: Every homework must be returned by the due date. Homeworks that are late will lose 10% of the grade if they are up to 1 day (24h) late, 20% if they are 2 days late, 30% if they are 3 days late, and they will receive no credit if they are late by more than 3 days. However, you have a bonus of 3 late days, which you can distribute as you wish between the two homeworks. Homeworks will be discussed and graded at the end, during the final exam.
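As an illustration of the late policy above, here is a minimal Python sketch of the penalty computation. It rests on two assumptions that the policy does not state explicitly: any started 24-hour period counts as a full late day, and each bonus day you choose to spend cancels one late day. The function name and parameters are hypothetical; in case of doubt, ask the instructor.

```python
import math


def late_penalty_percent(hours_late: float, bonus_days_used: int = 0) -> int:
    """Percentage of the grade lost for one homework (0, 10, 20, 30 or 100).

    hours_late: hours elapsed after the deadline (<= 0 means on time).
    bonus_days_used: bonus late days spent on this homework (0 to 3 overall).
    """
    if not 0 <= bonus_days_used <= 3:
        raise ValueError("at most 3 bonus late days are available in total")
    if hours_late <= 0:
        return 0
    # Assumption: a started 24-hour period counts as a full late day.
    days_late = math.ceil(hours_late / 24)
    # Assumption: each bonus day spent cancels one late day.
    effective_days = max(0, days_late - bonus_days_used)
    if effective_days > 3:
        return 100  # more than 3 days late: no credit
    return 10 * effective_days  # 10% per late day, up to 30%


if __name__ == "__main__":
    for hours, bonus in [(0, 0), (20, 0), (30, 0), (72, 0), (80, 0), (80, 2)]:
        print(hours, bonus, late_penalty_percent(hours, bonus))
```

Under these assumptions, a homework handed in 30 hours late with no bonus days spent would lose 20% of its grade, while spending two bonus days on it would remove the penalty entirely.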
We expect students to do all homeworks. In this case, they will only have an oral exam (with a schedule decided in advance), in which they will discuss their proposed solutions and related issues with the instructor and might also be asked about other topics taught in the course, especially if related to some HW. In particular, the oral exam will include an examination of your homework solutions: you should be able to provide details and show that you understand and remember your solutions, and that you have understood your mistakes (if any).
Students who for whatever reason (work, medical reasons, grandmother who died, laziness, etc.) do not do all the homeworks will also have a written exam, in which they must demonstrate that they know all the class material and that they are able to correctly design and implement algorithms for the topics covered by the homeworks they did not deliver.
The final exam will include answering questions about the class material, similar to homework questions.
Quantitatively, the written exam will consist of 2 assignments on the topics of each missing homework. Each block of two assignments will be allocated 60 minutes.
Validity of Homeworks (read carefully): your homework(s) can be used only once. In more detail, once you have discussed the homework, you have to decide whether to accept the proposed grade or not. The moment you sit a written exam after discussing your homework(s), the grade resulting from the discussion is lost forever.
Collaboration policy (read carefully!): you may discuss the homework problems with other members of the class, but you must write up the assignment alone, in isolation. Also, you must understand your solutions well and be able to discuss your choices and their motivations in detail with the instructor. Finally, you should cite any sources you use in working on a homework problem.
Likewise, it is not acceptable to post your homework solutions in a place where other students can easily access them. This includes uploading your solutions to publicly viewable repositories, such as those on GitHub.
If we find out that you have violated this policy and have copied in any way, you will automatically fail. If you have any doubts about whether something is allowed or not, ask the instructor.
Announcements
Dear students,
I will not be in the office tomorrow. My next office hours will be next week, as usual.
Apologies for the inconvenience.
for the afternoon, instructors' rooms
Dear students,
we will discuss the homeworks on Friday, June 9th, starting at 9am. We will basically follow a FIFO policy, so you can organize the schedule among yourselves (this is probably the easiest thing to do). The morning round will end around 12.45. We will continue from 2pm, right after lunch. Please try not to come all at the same time. You should bring an ID with you, since we will proceed with the registration of the exam directly in the days following the discussion. By the way, please register for the next session if you are discussing on June 9th.
Dear students,
this is to inform you that the syllabus has been uploaded in its final version, and that the general information about homeworks and exams has been updated.
A discussion schedule will be proposed soon.
Dear students,
this is to inform you that yesterday was the last lecture for this year, since we covered the topics we intended to. Keep an eye on the website for information about HW discussion, exam dates, etc.
We are of course at your disposal for tutoring and support during office hours.
We hope you enjoyed the course.
The text of Assignment 5 was slightly changed (the last few lines on what to submit were missing). Please download the updated version.
The first homework is out. It can be found here
Due date: June 5th, 2017, 11.59PM
Please recall the rules regarding homeworks, which you can find on the course's homepage, under "General information".
Handing in your homework: You must hand in your homework by the due date and time, by an email to the instructor that contains as an attachment a zip file with all your answers and code. The solutions for the theoretical exercises must contain your answers either typed up (preferred) or clearly handwritten and scanned.
Please remember that the subject of the email should be: [BD] [Last_name First_name] HW 2
Dear students,
due to an unforeseen problem, I am forced to cancel tomorrow afternoon's lecture.
I apologize for that and for the late communication.
Best regards,
Luca Becchetti