Description
1] Implementing ML algorithms over Hadoop. (Done)
2] Recommendation systems (Done)
3] Online Learning Algorithms & Bandit learning (Done)
4] Deep Learning (Done)
5] Topological Data Analysis (Done)
In addition, we will cover a few machine learning libraries such as Torch and Storm (list to be updated).
Recommendation System
Bandit Learning http://engineering.richrelevance.com/bandits-recommendation-systems
Deep Learning
Chris Manning's lecture: http://techtalks.tv/talks/deep-learning-for-nlp-without-magic-part-1/58414/
Book : http://research.microsoft.com/pubs/209355/NOW-Book-Revised-Feb2014-online.pdf
Using H2o: http://www.slideshare.net/0xdata/h2odeeplearningarnocandel032014
Business Ideas: http://npbay.es/deep-learning-business-models.html
Generic overview of Machine Learning
<http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms>
General Information
- Open Sourced Curriculum http://datasciencemasters.org/
- Quora Discussion http://hail-data.quora.com/How-to-acquire-the-Essential-Skill-Set-the-Self-Starter-way
If you want to install libraries etc. in Python- https://github.com/dzorlu/GADS/wiki/Guide-to-installing-machine-learning-libraries-in-python
Another good book- http://learnpythonthehardway.org/book/ (readable online as HTML)
I recommend this cheat sheet- <http://cran.r-project.org/doc/contrib/Hiebeler-matlabR.pdf>. Go through it once and you are set to program. Then you can use the shorter handy version <http://cran.r-project.org/doc/contrib/Short-refcard.pdf>.
This is a big book if you want to go through the content in detail. <http://web.udl.es/Biomath/Bioestadistica/R/Manuals/r_in_a_nutshell.pdf>
Subscribe to this blog- http://www.r-bloggers.com/. They send out some amazing stuff daily and will keep inspiring you to use R.
CMU- Alex Smola's lectures- http://alex.smola.org/teaching/cmu2013-10-701x/
Caltech- Yaser Abu-Mostafa- https://www.youtube.com/playlist?list=PLD63A284B7615313A
Announcements
Topics to be covered:
1] Techtalk by Chris Manning introducing the field <link>. Slides <link>
2] From Perceptrons to Deep Networks <link>
3] Deep Belief networks <link>.
4] Restricted Boltzmann Machines (RBMs): there is a great post by Edwin Chen <link>. Another simulation on digit recognition <link>.
5] Theano <link>
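Since one of the topics traces the path from perceptrons to deep networks, here is a minimal online perceptron in NumPy to have on hand — the same mistake-driven update rule we saw during online learning. The toy data and learning rate are my own for illustration, not from the linked post:

```python
import numpy as np

def perceptron_train(X, y, epochs=100, lr=1.0):
    """Classic online perceptron: sweep the data, update on every mistake."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):                  # labels yi are in {-1, +1}
            if yi * (np.dot(w, xi) + b) <= 0:     # misclassified (or on the boundary)
                w += lr * yi * xi                 # nudge the hyperplane toward xi
                b += lr * yi
    return w, b

# Toy linearly separable data: label = sign(x0 + x1 - 0.5)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.], [2., 2.], [-1., -1.]])
y = np.array([-1, 1, 1, 1, 1, -1])
w, b = perceptron_train(X, y)
```

For separable data the perceptron convergence theorem guarantees this stops making mistakes after a bounded number of updates; deep networks replace the single threshold unit with stacked layers of them.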
This week, we will focus a bit more on practical use cases and active areas of research.
Location: Ernst & Young, 5 Times Square, New York <map>
Time: 1:00 to 3:00 PM. (Please be on time)
PLEASE try to be in the front lobby between 12:30 and 12:50 pm.
You'll need a visitor's pass, made at the front desk. Thereafter, Sam will accompany you to the conference room.
My phone number is 607-216-7043.
Good Evening,
I hope you all are doing well.
Last Saturday we covered topological data analysis. It's a very interesting topic that deals mostly with unsupervised settings, when you want to learn more about the data. It captures relationships between variables, which is important in natural and human settings. You can request Ayasdi's academic license <http://www.ayasdi.com/inquiry/academic-trial.html>. It was fun playing with the mapper algorithm in Python. Thanks, Aziz!
This Saturday will be our last session (I go back to school next month). We will cover Deep Learning and discuss a few use cases. Although there are tons of materials out there, I've carefully picked a few, given that many of us don't have much experience with neural networks.
Tasks
1] Watch this techtalk by Chris Manning introducing the field <link>.
2] A great post by Toptal that walks through use cases from perceptrons to deep networks; we also covered perceptrons during online learning. <link>.
3] A Restricted Boltzmann Machine (RBM) use case in R <link>. For Python developers, there is a great post by Edwin Chen <link>. Another simulation on digit recognition <link>.
4] Deep Belief networks <link>.
5] Play around with Theano <link>, the Python library. I've worked on some text simulations with it.
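If you want something to tinker with before Saturday, here is a tiny binary RBM trained with one-step contrastive divergence (CD-1) in plain NumPy. This is a toy sketch of the technique, not the code from the linked posts; the architecture, data, and hyperparameters are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def sample_h(self, v):
        p = sigmoid(v @ self.W + self.b_h)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_v(self, h):
        p = sigmoid(h @ self.W.T + self.b_v)
        return p, (rng.random(p.shape) < p).astype(float)

    def cd1(self, v0):
        """One step of contrastive divergence: positive phase on the data,
        negative phase on a one-step Gibbs reconstruction."""
        ph0, h0 = self.sample_h(v0)
        pv1, v1 = self.sample_v(h0)
        ph1, _ = self.sample_h(v1)
        self.W += self.lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)
        return np.mean((v0 - pv1) ** 2)  # reconstruction error

# Toy data: two repeated binary patterns the hidden units should capture
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 20, dtype=float)
rbm = RBM(n_visible=6, n_hidden=2)
errs = [rbm.cd1(data) for _ in range(500)]
```

Watching the reconstruction error fall is a quick sanity check that the hidden units are learning the two patterns; DBNs stack RBMs trained this way layer by layer.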
Suggested
1] An awesome paper (only 4 pages) about using Neural Networks to reduce data size. A fairly practical problem <link>.
2] For a deeper understanding, I recommend this book <link>. It's been widely circulated within the community.
3] Deep learning in the context of text mining (sentiment analysis) <link>
4] Reading list from Dartmouth course on Deep learning <link>.
5] Great resource for deep learning <http://deeplearning.net/>.
6] Business case of using deep learning models <link>.
7] Using the H2O toolbox <link>
Awesome!
Cheers,
Tushar
Topics to be covered:
1] Prof. Carlson's short introductory video <here>
2] Reading of the week <here> (thanks, Phil, for suggesting this). And slides on concurrence topology <here> (do go through them).
3] For a hands-on exercise, I'll do a simulation from their code <here>. There is another R package, called Phom, that is used for this <here>.
4] For Python, there is a mapper algorithm implementation, which is the basis of Ayasdi's main product <here>.
This week, we will focus a bit more on practical use cases and active areas of research.
Location: Ernst & Young, 5 Times Square, New York <map>
Time: 1:00 to 3:00 PM. (Please be on time)
PLEASE try to be in the front lobby between 12:30 and 12:50 pm.
You'll need a visitor's pass, made at the front desk. Thereafter, Sam will accompany you to the conference room.
My phone number is 607-216-7043.
Hi,
I hope everyone is doing well. The KDD Cup is over, definitely leaving us with a lot of lessons, mostly about competition logistics. The results might not be directly fruitful, but, going by the winner's solution, we were at least headed in the right direction for feature extraction. Check the forum on Kaggle. Shubin did a great job. Bravo!
Last Saturday, we covered online learning systems and change point analysis. Adam gave us an awesome demonstration in PyMC. BTW, I found a simple implementation of MacKay's paper <here>. A great handy tool.
We are left with two more topics: topological data analysis and deep learning (Theano and H2O). For this weekend, I plan to take topological data analysis. This branch of mathematics has been gaining a lot of attention in data science, and I am very excited about it. It is used for extracting even very high-order relationships between features.
Tasks
1] Prof. Carlson's short introductory video <here>
2] Reading of the week <here> (thanks, Phil, for suggesting this). And slides on concurrence topology <here> (do go through them).
3] For a hands-on exercise, I'll do a simulation from their code <here>. There is another R package, called Phom, that is used for this <here>.
4] For Python, there is a mapper algorithm implementation, which is the basis of Ayasdi's main product <here>.
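To make the mapper idea concrete before Saturday, here is a toy sketch of its pipeline in plain NumPy: cover the range of a filter function with overlapping intervals, cluster the points in each interval, and connect clusters that share points. This is a drastic simplification of the real mapper implementation; the function names, thresholds, and data are all my own:

```python
import numpy as np
from itertools import combinations

def simple_mapper(X, filt, n_intervals=4, overlap=0.25, eps=0.3):
    """Toy mapper: overlapping cover of the filter range, single-linkage
    clustering inside each interval, edges between clusters sharing points."""
    lo, hi = filt.min(), filt.max()
    length = (hi - lo) / n_intervals
    nodes = []  # each node is a set of point indices
    for i in range(n_intervals):
        a = lo + i * length - overlap * length        # stretch interval on
        b = lo + (i + 1) * length + overlap * length  # both sides to overlap
        idx = np.where((filt >= a) & (filt <= b))[0]
        remaining = set(idx.tolist())
        while remaining:  # grow clusters via a distance threshold eps
            cluster = {remaining.pop()}
            grew = True
            while grew:
                grew = False
                for j in list(remaining):
                    if any(np.linalg.norm(X[j] - X[k]) < eps for k in cluster):
                        cluster.add(j); remaining.discard(j); grew = True
            nodes.append(cluster)
    edges = [(i, j) for i, j in combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]]
    return nodes, edges

# Points on a circle, filtered by their x-coordinate: the output graph
# should itself form a loop, recovering the shape of the data.
theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]
nodes, edges = simple_mapper(X, X[:, 0])
```

The point of the exercise: the nodes-and-edges summary is tiny compared to the raw point cloud, yet it preserves the loop structure — the kind of qualitative feature TDA is after.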
Suggested
1] You can go through some of the material on Ayasdi's resource page <here>.
2] If you are in academia, you can request Ayasdi's trial license <here>.
Awesome! Don't hesitate to recommend some readings on the topic.
Cheers,
Tushar
Topics to be covered: (1-3p)
1] Context-based bandits <slides>. If you missed the topics from last time or want a deeper understanding of bandit learning, please look at the material in the Suggested section. (Don't spend much time on the readings; I'll be summarizing the findings.)
2] Change Point Analysis. A simple example <link>. Reading of the week <link>.
3] Install PyMC <link> and do some hands-on analysis with it. Adam will give a small tutorial-cum-demonstration; please install it on your systems.
4] Our last chance with KDD. Bring your insights. This is the last week to put in the energy; we will be hacking for almost the full day, from 3 to 10 pm. (This part is optional: if you are not working on KDD, you are welcome to leave early.)
This week, we will focus a bit more on practical use cases of online learning and the research landscape.
Location: Ernst & Young, 5 Times Square, New York <map>
Time: 1:00 to 10:00 PM (1-3 pm: normal session; 3-10 pm: working on the KDD dataset).
PLEASE try to be in the front lobby between 12:30 and 12:50 pm (please be on time).
You'll need a visitor's pass, made at the front desk. Thereafter, Sam will accompany you to the conference room.
My phone number is 607-216-7043.
Hello Everyone,
I hope this email finds you in an awesome mood, a week after the long weekend. This weekend we will be covering the second week of online learning algorithms. Adam pointed out a great topic: change point analysis. There is no good implementation in Python, so we should aim to implement our own code; it will give us some hands-on experience with PyMC, a very cool library that I strongly urge you to look into.
For the next two weeks we will be covering topological data analysis and deep learning implementations. Suggestions invited.
Task
1] Context-based bandits <slides>. If you missed the topics from last time or want a deeper understanding of bandit learning, please look at the material in the Suggested section. (Don't spend much time on the readings; I'll be summarizing the findings.)
2] Change Point Analysis. A simple example <link>. There is also an R implementation of the same <link>. Reading of the week <link>.
3] Install PyMC <link> and do some hands-on analysis with it.
4] Our last chance with KDD. Bring your insights.
Suggested
1] An ICML tutorial on Bandit Learning <link>. And some related slides <link>
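For a concrete feel of bandit learning before the readings, here is an epsilon-greedy bandit in NumPy — the simplest baseline that contextual bandits build on. The arm means, epsilon, and horizon are made up for illustration:

```python
import numpy as np

def epsilon_greedy(true_means, n_rounds=5000, eps=0.1, seed=0):
    """Epsilon-greedy: explore a random arm with probability eps,
    otherwise pull the arm with the best estimated mean so far."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)       # pulls per arm
    estimates = np.zeros(k)    # running mean reward per arm
    for _ in range(n_rounds):
        if rng.random() < eps:
            arm = int(rng.integers(k))          # explore
        else:
            arm = int(np.argmax(estimates))     # exploit
        r = rng.normal(true_means[arm], 1.0)    # noisy reward
        counts[arm] += 1
        estimates[arm] += (r - estimates[arm]) / counts[arm]
    return estimates, counts

# Three "ad variants" in an A/B-testing flavour; arm 2 is truly best
est, counts = epsilon_greedy([0.1, 0.5, 0.9])
```

The exploit/explore trade-off here is exactly what makes bandits a better fit than fixed-split A/B tests: the bad arms stop getting traffic once their estimates settle.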
This week, we will cover the material first and then move on to discussing KDD. Last time we had to rush through the material, and many of you had to leave.
(There is a possibility we might have a mini hackathon session right after we finish the material: start around 2 or 3 pm and go until 10 pm. Reserve your Saturday. I'll get booze and snacks. Let me look into the logistics; I'll get back to you soon.)
Cheers,
Tushar
Hacknight!!
Location: Ernst & Young, 5 Times Square, New York <map>
Time: 7:00 PM onwards. (Join us when you can, after work.)
PLEASE try to be in the front lobby between 6:30 and 6:50 pm.
You'll need a visitor's pass, made at the front desk. Thereafter, Sam will accompany you to the conference room.
My phone number is 607-216-7043.
Hello,
It was a great meeting yesterday.
Thanks to Sam and Alexey for doing an awesome job on data exploration for the DonorsChoose data. Bravo!! The text team will catch up soon. :D :D
I apologize for the slightly delayed schedule yesterday; due to an emergency, Sam was not able to make it on time. We also had to rush through the introduction to online learning algorithms, as we spent more time discussing KDD. Do go through the lecture on expert learning and bandits (links in the previous post). It's a really cool topic of practical importance (a great example is A/B testing), and there isn't much material out there on it. We will continue with it at the next meeting.
As per the plan, and with the long weekend coming up, we will not have a meeting next Saturday. Instead, we will meet this coming Wednesday, July 2nd, for the KDD Hacknight, around 7 pm; join us when you can. The idea is to put rigorous engineering into the feature end and the modeling. It will also give us an idea of what to do over the next week. Only 16 days are left until the final submission.
I am sharing a folder named KDD along with the features extracted with TF-IDF. Start modeling and make submissions to get an idea of how your features are doing.
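In case anyone wants to see exactly what those TF-IDF features are, here is the computation from scratch on toy documents (the shared features were extracted separately; this just shows the idea — term frequency weighted by log inverse document frequency):

```python
import math
from collections import Counter

def tfidf(docs):
    """Plain TF-IDF: tf(term, doc) * log(N / df(term)), no smoothing."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (tf[t] / len(toks)) * idf[t] for t in tf})
    return vectors

# Toy corpus in the spirit of DonorsChoose essays
docs = ["students need books", "students need laptops", "teachers grade essays"]
vecs = tfidf(docs)
```

Note how a term appearing in every document gets idf = log(1) = 0 and drops out, while rare terms like "books" score higher than common ones like "students" — that is the discriminative signal the models feed on.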
Work for the coming week:
Tasks
1] Work on the KDD Cup 2014 dataset <link>. Feature engineering and modeling.
Cheers,
Tushar
Name | Office Hours
---|---
Shubin Li | When? Where?
Tushar | When? Where?
Sam Sach | When? Where?
Adam Kelleher | When? Where?