Description
Effective Fall 2021, this course fulfills a single unit in each of the following BU Hub areas: Digital/Multimedia Expression, Quantitative Reasoning I, Critical Thinking.
Prerequisites: DS 110 and 120, or equivalents.
General Information
If you cannot attend a lecture in person, you can find the links to the Zoom livestream, recording, and lecture notes on the Course schedule page: https://piazza.com/class/l7p7fjsa9so6yg?cid=6 (just look to the right of this text).
1. You must document on your homework submission: (a) the names of any other students you worked with, (b) any websites you used besides the ones listed in this syllabus, and (c) any code you have used from other sources.
2. You may not directly copy solutions from anyone else, or give your solutions to someone else to copy.
Basically: sharing ideas with attribution is fine, but sharing answers is not.
The goal of tests is for you to show me what you have learned. As a result, any form of collaboration is strictly prohibited. Computers and notes are also forbidden during tests unless I explicitly state otherwise. (That said, I encourage you to collaborate with classmates when studying lecture materials and preparing for tests.)
Announcements
The goal of the final project is to give you the opportunity to further explore one of the topics covered in the course. Concretely, your objective is to analyze a dataset using any of the techniques we have covered this semester, such as clustering, regression, principal component analysis (PCA), or matrix factorization.
You should find a dataset from a research paper, a Python repository, a blog post, or any other source (see some suggestions below). Then you must either run a new analysis from scratch on this dataset, or reproduce the result of a prior work and add at least one extension. Note that it’s perfectly fine to use ideas from other papers or websites, but you must (a) cite any sources used and (b) describe concretely the parts of your project report that are similar to, or different from, the prior work.
Project Report Description: Your final report should be a Jupyter notebook containing the following sections:
- Introduction: Provide an easy-to-understand summary of what you’ve done and why it matters to you. This section should be understandable even by people who have not taken this course.
- Data: Describe the dataset you have studied. Explain what the objects and features are, show some samples, etc. If appropriate, describe where the dataset comes from and where it is applied in practice.
- Methodology: Here, you should state the analysis techniques that you used in the project. Make sure to explain what the algorithm you’re using does, and why you chose this particular strategy for analyzing the data. If appropriate, state a hypothesis that you plan to test.
- Analysis: Show the analysis itself, and include any charts/graphs/tables that help to visualize what you’ve done. This section should contain the code for implementing your methodology. For example, a project using matrix factorizations could include an implementation of one or more of the factorizations we discussed in class (or a related one).
- Results: Explain the takeaways from your analysis. For example, does your analysis support or refute the claim you were intending to study, and did your algorithm behave as expected? If your project is building upon someone else’s work, make sure to use this space to compare and contrast your findings with other works.
- Conclusion: Summarize the work you’ve done and the outcomes you’ve discovered.
- References: I’ll repeat, make sure to cite your work! You can use any textbook, website, paper, or other resource as long as you cite it. Using prior work without citing it is plagiarism and will be handled as stated in the course syllabus.
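As a rough illustration of how the Data, Analysis, and Results sections might fit together in a notebook, here is a minimal sketch for a matrix-factorization project. It assumes NumPy is available and uses a synthetic matrix as a hypothetical stand-in for a real dataset. The variable names and the helper `rank_k_approximation` are illustrative only, not a required structure.

```python
import numpy as np

# Hypothetical stand-in data: 100 objects (rows) with 8 features (columns),
# generated to have approximately rank-2 structure plus a little noise.
rng = np.random.default_rng(seed=0)
hidden = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 8))
X = hidden @ mixing + 0.01 * rng.normal(size=(100, 8))

# Data section: report the shape and show a few sample rows.
print("objects x features:", X.shape)
print(X[:3].round(2))

# Analysis section: thin singular value decomposition, X = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(X, full_matrices=False)


def rank_k_approximation(U, s, Vt, k):
    """Rebuild a rank-k matrix from the top k singular triplets."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


X2 = rank_k_approximation(U, s, Vt, k=2)

# Results section: how much of X does the rank-2 approximation miss?
rel_error = np.linalg.norm(X - X2) / np.linalg.norm(X)
print("singular values:", s.round(3))
print("relative error of rank-2 approximation:", rel_error)
```

In a real report, the synthetic `X` would be replaced by your own dataset, and the printed numbers would be accompanied by charts and a written interpretation.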
Here is the timeline for the project:
- Wednesday, November 30: By this day, please send me a private Piazza note describing the project that you plan to pursue. That is, describe the dataset you want to analyze and your initial ideas for the kinds of data analysis you think would be relevant to understanding this dataset.
- Monday, December 12: Submit your final project on Gradescope.
The proposal: This is a brief summary of what topic you would like to tackle for the project. Deciding and scoping what exactly you want to do is very much part of the project itself. I am happy to chat during office hours or by appointment about ideas you may have and provide relevant references. The proposal should provide an overview of the topic you are choosing.
Your project will be graded as follows:
- 5% for submitting the initial project plan by Nov 30. (You will get full credit here for submitting any plan by Nov 30, even if we have some discussion about it afterward.)
- 15% for each of the first six sections of the report (Introduction through Conclusion). We will grade each section based on how well your code uses the techniques learned in class, and how well your writing documents this work.
- 5% for including a references section. (See more details below.)
Remember that it is crucial to include a references section and to properly compare your project with any prior work. Your bibliography entries should include author(s), title, and a website link (if appropriate).
Copying of ideas from prior work without citation will be considered plagiarism, and it is grounds for receiving a grade of 0 on the project along with further disciplinary action as described in the syllabus. As a general rule: cite liberally whenever you quote or paraphrase ideas from another paper, in every single sentence/paragraph where you do so. If you have any questions about this policy, send a private Piazza note to ask us.
Resources:
Algorithms and Libraries: Useful libraries for data science in Python include:
[Machine Learning]
[Data Processing]
[Visualizations]
Examples of applications can be found on:
Datasets: You can use any open dataset. Some common sources include:
[Dataset aggregators for Machine Learning projects]
[Open Data]
This page contains the lesson plan for DS 121 lectures.
Week | Topic | Reading | Homework |
---|---|---|---|
1 | Vectors | Boyd-Vandenberghe Chapter 1 and 2.1; 3Blue1Brown video 1 and video 2 | |
2 | Matrices | Boyd-Vandenberghe Chapter 6; 3Blue1Brown video 3 and video 4 | HW1 due 9/16 |
3 | Geometry | Boyd-Vandenberghe Chapter 3; 3Blue1Brown video 5 and video 6 | HW2 due 9/23 |
4 | Clustering | Boyd-Vandenberghe Chapter 4 | HW3 due 9/30 |
5 | Clustering | (no new reading, just review for the test) | TEST on 10/6 |
6 | LU decomposition | Aggarwal Section 2.4-2.5; 3Blue1Brown video 7 and video 8 | HW4 due 10/14 |
7 | Subspaces & Orthogonality | Aggarwal Sections 2.6-2.7.2; Boyd-Vandenberghe Chapter 10; 3Blue1Brown video 9 | HW5 due 10/21 |
8 | Regressions | Boyd-Vandenberghe Chapter 12 | HW6 due 10/28 |
9 | Markov chains & Eigenvalues | Deisenroth-Faisal-Ong Section 4.2 | HW7 due 11/4 |
10 | PageRank | Aggarwal Sections 10.1-10.3 and 10.6 | TEST 2 on 11/8 |
11 | Diagonalization | Deisenroth-Faisal-Ong Section 4.4 | HW8 due 11/18 |
12 | SVD | Aggarwal Sections 7.1-7.4 | |
13 | PCA | Aggarwal Sections 8.1-8.2 | HW9 due 12/2 |
14 | Classifiers | | PROJECT due 12/12 |
 | | | EXAM on 12/19 |
Name | Office Hours | |
---|---|---|
Mayank Varia | When? Where? | |
Harshit Agrawal | When? Where? | |
Andy Yang | When? Where? | |
Daniel Cho | When? Where? | |
Lisa Wobbes | When? Where? | |