Summit's Data Science Club Explores New Analysis Techniques

January 13, 2015 David Kretch

Many Summiteers are interested in the burgeoning field of ‘data science’: anything and everything useful for learning from data, drawing on databases, machine learning, data visualization, programming, and more. Two such enthusiasts, analyst Elizabeth Byerly and I, therefore decided that the time was ripe to form the Summit Data Science Club.

Data Science Club (DSC) is a forum to learn about, discuss, and apply the tools and techniques of data science. DSC is working with the Python and R programming languages, two of the most popular environments for data science work. DSC’s current focus is on ‘big data’ processing: how databases work, both relational and NoSQL; how to implement data analyses in the MapReduce programming model; how to use Hadoop, Pig, and other big data tools; and how to use these on cloud services like Amazon’s Elastic Compute Cloud.
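For readers new to the MapReduce programming model mentioned above, here is a minimal sketch in plain Python of its classic example, a word count. It runs on a single machine and only imitates the map, shuffle, and reduce steps that a framework like Hadoop would distribute across a cluster. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample sentences are made up for illustration and are not taken from any DSC material.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, chosen only to show the shape of the result.
counts = word_count(["big data tools", "big data club"])
print(counts)
```

The appeal of the model is that the map and reduce functions never share state, so in principle each one can run on a different machine without changing the logic.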

The club meets weekly to discuss a common topic, e.g. how the MapReduce programming model works and how to apply it. Time is also set aside for ‘hack sessions’, where club members work together to implement a problem-solving approach in Python, R, etc. DSC meetings are also an avenue for the output of the nascent Baking Club, an activity which is altogether not unlike programming.

In the near future, DSC plans to spend some time going over machine learning algorithms as well as data visualization principles and tools, especially interactive web-oriented tools like D3.js. DSC also plans to give club members the chance to work together on projects like Kaggle data mining competitions and interesting uses for publicly available data from sources like Twitter or Capital Bikeshare (a service with which Summit is already familiar).
