1. Introduction
	> Motivations and applications: what does Big mean?
	> A few examples
	> General tasks
	> Tools 

References
- [1], Chapter 1, in particular 1.3
- Very popular article: http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1
Suggested readings 
- [2]
- IEEE Spectrum's interview with Michael Jordan: http://spectrum.ieee.org/robotics/artificial-intelligence/machinelearning-maestro-michael-jordan-on-the-delusions-of-big-data-and-other-huge-engineering-efforts


2. Recap on MapReduce and introduction to Spark
	> Hadoop
	> Spark

References
- [1], Chapter 2
- [3]
- Spark documentation: http://spark.apache.org/documentation.html

3. Locality sensitive hashing
A case study: min-hashing and estimation of Jaccard similarity of shingled documents.
Min-wise permutations and Jaccard similarity. Locality sensitive hashing for Jaccard
similarity. Signatures and the banding technique. Detecting candidate pairs. 

References
- [1], Sections 3.1, 3.2, 3.3 and 3.4, 3.5 (no 3.5.5), 3.6, 3.7 until 3.7.2 (included)
- [4] until section 3 (included)
- Suggested reading: [1], section 3.8

4. Clustering algorithms for big data
Review of basic techniques and hierarchical clustering. Brief recap of K-means algorithm.
BFR algorithm. The CURE algorithm.

References
- [1], Sections 7.1 to 7.4

5. Streaming algorithms
The data stream model. Estimating the number of distinct items. Flajolet-Martin (like) algorithm.
Sampling from an (infinite) stream: Vitter's reservoir sampling algorithm. Counting ones (and estimating a sum of integers) in a window. The DGIM algorithm. Keeping track of the most frequent items in a stream: the heavy hitters problem. 

References
- [1], Sections 4.1, 4.2, 4.4, 4.5.5, 4.6
- [5], introduction and Section 2.3

6. Dimensionality reduction
Brief review of eigendecomposition and eigenvector/eigenvalue computation. The power method.
Using spectra for dimensionality reduction: Singular value decomposition. Interpretation of
the concept space: the user-item metaphor. Querying using latent concepts. Computing
SVD. Extending SVD to large data sets: CUR decomposition

References
- [1], Chapter 11

7. Mining large graphs
A motivating example: link prediction and social matching. Link prediction/recommendation as a collaborative filtering task. Measures of vertex similarity in a social network: neighbourhood based and random walk indices. 
More motivating examples: leveraging topological structure to compute machine learining features of members of a social network. The case of Web spamming.
Computational challenges: the semi-streaming model of computation. Applying streaming techniques: estimating number of supporters and local clustering coefficient using composable sketches. 
Estimating the total number of triangles from the largest eigenvalues of the adjacency matrix.

References
- [7]
- [1], Section 9.3, Section 10.1, Section 10.6, 10.7 (suggested reading), 10.8.1, 10.8.2, 10.8.7
- [8], [9] (you can skip 6.1) [10] (understand most important notions and ideas)


LIST OF REFERENCES
[1] Leskovec, Jure, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive datasets. Cambridge University Press, 2014.
[2] Boyd, Danah, and Kate Crawford. "Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon." Information, communication & society 15.5 (2012): 662-679.
URL: http://www.tandfonline.com/doi/abs/10.1080/1369118X.2012.678878
[3] Zaharia, Matei, et al. "Spark: Cluster Computing with Working Sets." HotCloud 10.10-10 (2010): 95.
URL: http://static.usenix.org/legacy/events/hotcloud10/tech/full_papers/Zaharia.pdf
[4] Andoni, Alexandr, and Piotr Indyk. "Near-optimal hashing algorithms for near neighbor problem in high dimension." 
Communications of the ACM 51.1 (2008). 
[5] Cormode, Graham, and Muthukrishnan, S. "An improved data stream summary: the count-min sketch and its applications." Journal of Algorithms 55.1 (2005): 58-75.
URL: http://www.cs.yale.edu/homes/aspnes/pinewiki/attachments/DataStreamComputation/count-min.pdf
[6] Alon, Noga, Yossi Matias, and Mario Szegedy. "The space complexity of approximating the frequency moments." Proceedings of the twenty-eighth annual ACM symposium on Theory of computing. ACM, 1996.
URL: http://www.cs.princeton.edu/courses/archive/spring04/cos598B/bib/alonMS.pdf
[7] Liben‐Nowell, David, and Jon Kleinberg. "The link‐prediction problem for social networks." journal of the Association for Information Science and Technology 58.7 (2007): 1019-1031.
[8] Palmer, Christopher R., Phillip B. Gibbons, and Christos Faloutsos. "ANF: A fast and scalable tool for data mining in massive graphs." Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2002.
[9] Becchetti, Luca, et al. "Efficient semi-streaming algorithms for local triangle counting in massive graphs." Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2008.
[10] Tsourakakis, C. E. Fast counting of triangles in large real networks without counting: Algorithms and laws. In Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on, pages 608–617. IEEE.