Data science notebooks and presentations
I’ve collected a few project samples and tutorials on this site.
How to create slideshows from IPython notebooks.
I took an updated version of the standard Diamonds dataset that comes with R and did an exploratory data analysis on it. The dataset has almost half a million diamonds, all with GIA certification. I found an interesting anomaly with carat size that might indicate fraud (it’s the first graph of the three polished graphs at the end).
Knitted HTML version of the project
Rmd file for the project
I used pandas and several Python graphing packages to explore relationships in the Lahman Baseball dataset. I uncovered an anomaly in the birthdays of baseball players, and only later found out that it was due to the Little League age cut-off of July 31.
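
A minimal sketch of how a birthday check like this might look in pandas. It is not the notebook's actual code. It assumes a player table with a `birthMonth` column (as I understand the Lahman people table to have). The helper name `birth_month_counts` and the sample rows are hypothetical, made up for illustration only.

```python
import pandas as pd


def birth_month_counts(people: pd.DataFrame) -> pd.Series:
    """Count players per birth month (1-12), with 0 for months that never occur."""
    # Assumption: the table stores the birth month in a column named "birthMonth".
    months = people["birthMonth"].dropna().astype(int)
    # value_counts() omits absent months, so reindex to cover the full year.
    return months.value_counts().reindex(range(1, 13), fill_value=0)


# Made-up sample rows, NOT real Lahman data.
sample = pd.DataFrame(
    {
        "playerID": ["a01", "b02", "c03", "d04", "e05", "f06"],
        "birthMonth": [8, 8, 7, None, 1, 8],
    }
)

counts = birth_month_counts(sample)
print(counts)
```

On the real table, a bar chart of these counts (for example `counts.plot(kind="bar")`, which needs matplotlib) is one way to check whether the months just after a July 31 cut-off stand out.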