We believe data science is for everyone!
So we created free data science tutorials just for you!
Our teaching team consists of leading data scientists and practitioners who are also passionate about teaching. If you like these data science tutorials, why not come and meet them at our data science bootcamp? Be sure to check back regularly as we upload our newest content here.
Web Scraping using Python and Beautiful Soup
January 6, 2017
Web scraping is a very powerful tool to learn for any data professional. With web scraping, the entire internet becomes your database. In this Python tutorial, we introduce the fundamentals
Web Scraping in R Part 1 | Writing your Script in rvest
December 19, 2018
In part 1 of our introduction to web scraping in R, you will learn how to write standard web scraping commands in R, filter timely data based on time diffs, analyze
Web Scraping in R Part 2 | Scheduling your Script using taskscheduleR
February 25, 2019
In part two of our introduction to web scraping in R, we will use taskscheduleR to set up our automated web scraping script to run as a background task on
Time Series in Python Part 1: Read and Transform Your Data
April 22, 2019
In part 1 of this video series, learn how to read and index your data for time series using Python’s pandas package. We check if the data meets the requirements
ARIMA modeling and forecasting: Time Series in Python Part 2
April 29, 2019
In part 2 of this video series, learn how to build an ARIMA time series model using Python’s statsmodels package and predict or forecast N timestamps ahead into the future.
In part 3 of this video series, learn how to evaluate time series model predictions using mean absolute error and Python’s statistics and matplotlib packages. We look at plotting the
Overview | Introduction to Text Analytics with R Part 1
November 26, 2013
The overview of this video series provides an introduction to text analytics as a whole and what is to be expected throughout the instruction. It also includes specific coverage of:
Text analytics fundamentals covers: – The importance of splitting data into training and test datasets – Stratified sampling of imbalanced data using the caret package – Representing text data
Data Pipelines – Introduction to Text Analytics with R Part 3
November 26, 2013
In our next installment of introduction to text analytics, data pipelines, we cover: – Exploration of textual data for pre-processing “gotchas” – Using the quanteda package for text analytics
Model Building – Introduction to Text Analytics with R Part 4
November 26, 2013
We are now ready to build our first model in RStudio. For our model building, we will cover: – Correcting column names derived from tokenization to ensure smooth
TF-IDF – Introduction to Text Analytics with R Part 5
November 26, 2013
TF-IDF includes specific coverage of: • Discussion of how the document-term frequency matrix representation can be improved: – How to deal with documents of unequal lengths. – What to do
N-grams – Introduction to Text Analytics with R Part 6
November 26, 2013
N-grams includes specific coverage of: • Validate the effectiveness of TF-IDF in improving model accuracy. • Introduce the concept of N-grams as an extension to the bag-of-words model to allow
LSA, VSM, & SVD – Introduction to Text Analytics with R Part 7
November 26, 2013
Part 7 of this video series includes specific coverage of LSA, VSM, & SVD: – The trade-offs of expanding the text analytics feature space with n-grams. – How bag-of-words representations
SVD with R – Introduction to Text Analytics with R Part 8
November 26, 2013
SVD with R includes specific coverage of: – Use of the irlba package to perform truncated SVD. – How to project a TF-IDF document vector into the SVD semantic space
Model Metrics – Introduction to Text Analytics with R Part 9
November 26, 2013
Model Metrics includes specific coverage of: – The importance of metrics beyond accuracy for building effective models. – Coverage of sensitivity and specificity and their importance for building effective binary
Cosine Similarity – Text Analytics with R Part 10
November 26, 2013
Cosine Similarity includes specific coverage of: – How cosine similarity is used to measure similarity between documents in vector space. – The mathematics behind cosine similarity. – Using cosine similarity
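As a rough illustration of the measure this part describes, here is a minimal Python sketch that computes cosine similarity between two document vectors. The function name, the toy term-count vectors, and the choice to return 0.0 for an all-zero vector are assumptions made for this example; they are not taken from the video series, which works in R.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention assumed here for all-zero vectors
    return dot / (norm_a * norm_b)

# Hypothetical term-count vectors over a shared four-word vocabulary.
doc_a = [1, 2, 0, 1]
doc_b = [2, 4, 0, 2]  # same direction as doc_a, twice the length
doc_c = [0, 0, 3, 0]  # shares no terms with doc_a

sim_ab = cosine_similarity(doc_a, doc_b)  # very close to 1: same direction
sim_ac = cosine_similarity(doc_a, doc_c)  # 0: no overlapping terms
print(sim_ab, sim_ac)
```

Because the measure depends only on direction, a document and a longer document with the same word proportions score as nearly identical.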
Your First Test – Introduction to Text Analytics with R Part 11
November 26, 2013
Your First Test includes specific coverage of: – Pre-processing new, unseen textual data to allow for predictions from our trained model. – The importance of caching the IDF values calculated
Conclusion – Introduction to Text Analytics with R Part 12
November 26, 2013
In this conclusion to Text Analytics with R we cover topics such as: – Optimizing our model for the best generalization on new/unseen data. – Discussion of the sensitivity/specificity trade-off
Introduction to Recommender Systems
December 20, 2019
We introduce you to the big world of recommender systems. We cover what they are, why they are important, and how they work. We also go over how and why big
Time Series Forecasting in Minutes
May 13, 2019
In this Data Science in Minutes, we will describe what time series forecasting is, and provide several examples of when you can use time series for your data. Time Series
One Versus One vs. One Versus All in Classification Models
April 15, 2019
In this quick overview, we introduce you to the concepts of one-versus-one and one-versus-all in classification. In classification models, you will often want to predict one class from another. This
N-grams in Minutes
April 8, 2019
In this quick tutorial, we learn that machines can not only make sense of words but also make sense of words in their context. N-grams are one way to help
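To make the idea concrete, the following is a small Python sketch of extracting n-grams from a tokenized sentence. The helper name `ngrams`, the whitespace tokenization, and the sample sentence are assumptions for illustration only, not the tutorial's own code.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The range is empty when there are fewer than n tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Naive whitespace tokenization, assumed here for simplicity.
tokens = "machines can make sense of words".split()

unigrams = ngrams(tokens, 1)  # single words, i.e. the bag-of-words view
bigrams = ngrams(tokens, 2)   # adjacent word pairs keep some local context
print(bigrams)
```

Each bigram pairs a word with its neighbor, which is how n-grams retain some of the word order that single-word counts discard.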
Natural Language Processing
April 1, 2019
In this quick tutorial, we go over the basics of Natural Language Processing, what it is, and a few key applications of it. Machines can’t simply read and interpret language
Clustering Introduction
March 11, 2019
We will look at the fundamental concept of clustering, different types of clustering methods and their weaknesses. Clustering is an unsupervised learning technique that consists of grouping data points and
Introduction to Precision, Recall and F1 in Classification Models
February 4, 2019
You may have come across the terms “Precision, Recall and F1” when reading about Classification Models and machine learning. In this Data Science in Minutes tutorial, we will explain what
Introduction to the Confusion Matrix
January 28, 2019
A confusion matrix, also known as an error matrix, uses a special table to help visualize the performance of your classification model. That way, you can easily see how successful
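For readers who want to see the table's four cells computed, here is a minimal Python sketch for a binary classifier. The function name, the "spam"/"ham" labels, and the sample predictions are invented for this example and do not come from the tutorial.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN and TN for one class treated as positive."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Hypothetical true labels and model predictions.
actual    = ["spam", "spam", "ham", "ham",  "spam", "ham"]
predicted = ["spam", "ham",  "ham", "spam", "spam", "ham"]

cm = confusion_counts(actual, predicted, positive="spam")

# Lay the counts out as the usual 2x2 table (rows: actual, columns: predicted).
print("              pred spam  pred ham")
print(f"actual spam   {cm['TP']:9d}  {cm['FN']:8d}")
print(f"actual ham    {cm['FP']:9d}  {cm['TN']:8d}")
```

The diagonal cells (TP and TN) are the correct predictions; the off-diagonal cells show which kind of mistake the model makes.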
Introduction to Classification Models
January 21, 2019
Ever wonder what classification models do? In this quick introduction, we talk about what classification models are, as well as what they are used for in machine learning. In machine
Steps in Experimentation
January 14, 2019
In this tutorial, we go over the steps you need for online experimentation. When running a business, there are many different moving parts happening all at once. This is
A/A Testing in Minutes
January 9, 2019
So, we’ve just covered what an A/B test is and what a multivariate test is, but what is an A/A test? In this quick tutorial, we go over A/A testing,
Multivariate Testing in Minutes
January 8, 2019
Multivariate testing is a technique for testing a hypothesis in which multiple variables are modified. In this tutorial, we will explain how a multivariate test differs from an A/B Test, how to create and
A/B Testing in Minutes
December 28, 2018
What is A/B testing? In this quick tutorial, we go over the basics of A/B testing, as well as answer some in-depth questions such as: why should businesses conduct A/B
Data Storage Systems
December 17, 2019
Redshift, MySQL, PostgreSQL, Hadoop and a list of other data systems are utilized for various analytical and operational purposes in the modern business world. As each company focuses more and
Introduction to Natural Language Processing (NLP)
November 3, 2019
This talk is an introduction to Natural Language Processing and its parent areas of Artificial Intelligence and Linguistics. We will discuss real use-cases of NLP in the world today –
Experiment Management for Machine Learning
September 2, 2019
An average data scientist spends a significant amount of time designing and running machine learning experiments. This involves one or many of the following: – trying out various training algorithms
AMA About Data Science Job Interviews
July 29, 2019
Are you interviewing for a position in data science or considering switching careers? Join us for an AMA about job interviews in data science! Job interviews don’t always go the
What is a Data Engineer?
June 3, 2019
In this webinar, we will explore what a data engineer is. This includes discussing their goals, skills, and the tools that they use on a daily basis. We wanted
R Tutorial: Automated Web Scraping Using rvest
February 22, 2019
In this R tutorial, we show you how to web scrape automatically and periodically using rvest so you can analyze timely/frequently updated data. This talk was given by one of our
Building data science products? Think business first!
February 18, 2019
Modern machine learning libraries are both a blessing and a curse. Due to the ease with which the libraries can be used, most users focus too much on tools and
Artificial Intelligence For Social Good
February 4, 2019
It’s not hard to see machine learning and artificial intelligence in nearly every app we use – from any website we visit, to any mobile device we carry, to any
NLP 101 + Chatbots
November 12, 2018
In this meetup, Chris Shei talks about the basics of natural language processing: the components of NLP, enterprise applications of NLP, and finally build a simple Frequently Asked Questions
Data Visualization with ggplot2
June 20, 2018
The focus of the webinar will be using ggplot2 to analyze your data visually with a specific focus on discovering the underlying signals/patterns of your business. As an example, R’s
Data Manipulation with dplyr
March 15, 2018
dplyr is a great tool to perform data manipulation. It makes your data analysis process a lot more efficient. Even better, it’s fairly simple to learn and start applying
Machine Learning Models: Building a Business Case for your idea
January 11, 2018
This presentation will discuss building a business case for machine learning models. In this talk, our presenter, Neeti Gupta, will provide a 10-step checklist with examples for the audience to
Ethical Dimensions of Data Science
December 15, 2017
From distorting experiments with systemic bias to imposing human ethics on machine learning models, data scientists have far more to worry about than the raw numbers in their spreadsheet. Join
Feature Engineering for Bot Detection
October 25, 2017
According to some estimates, bots constitute close to 50% of the overall traffic. In this introductory talk to Feature Engineering for Bot Detection, we will cover various aspects of feature
Online Experimentation and A/B Testing
October 13, 2017
In this meetup, we provide a quick introduction to online experimentation and A/B testing. To keep the tutorial self-contained, we will first give an overview of stats fundamentals needed to
Building Robust Machine Learning Models
October 12, 2017
Modern machine learning libraries make model building look deceptively easy. An unnecessary emphasis on tools like R, Python, SparkML, and techniques like deep learning is prevalent. Relying on tools and
Data Visualization with R and ggplot2
August 17, 2017
The R programming language is experiencing rapid increases in popularity and wide adoption across industries. This popularity is due, in part, to R’s rich and powerful data visualization capabilities. While
Power BI Storytelling
July 5, 2017
In this session, we will learn about impactful storytelling with Power BI. Storytelling is a cornerstone of the human experience. Though many elements of stories have remained the same
Introduction to Online Experimentation and A/B Testing
June 11, 2017
In this full-length segment, we discuss A/B testing. Online experimentation is perhaps the most misused of data science techniques. We will walk through the best practices for designing
caret Package – Machine Learning with R
June 7, 2017
The R programming language is experiencing rapid increases in popularity and wide adoption across industries. This popularity is due, in part, to R’s huge collection of open source machine learning
R Programming for Excel Users
May 3, 2017
R programming is rapidly becoming a valuable skill for data professionals of all stripes and a must-have skill for aspiring data scientists. Adding R programming to your data analyst skillset
Power BI for R Visualizations
April 5, 2017
Microsoft’s Power BI is a powerful technology for quickly creating rich R visualizations. Power BI has many practical uses for the modern data professional including executive dashboards, operational dashboards, and
Data Analysis with Excel
March 8, 2017
Business data analysis presents a challenge for the data analyst. Business data is often aggregated, recorded over time, and tends to exhibit autocorrelation. Additionally, and most problematically, the amount of
Event Log Mining with R
January 9, 2017
Event logs are everywhere and represent a prime source of Big Data. Event log sources run the gamut from e-commerce web servers to devices participating in globally distributed Internet of
Unstructured Text With Python, MS Cognitive Services & PowerBI
October 21, 2016
At this meetup, presenter Craig Guarraci speaks about how to Make Sense of Unstructured Text With Python, MS Cognitive Services & PowerBI. In this presentation we’ll take a broad look
Big Data Scaling in R Using Hadoop and Spark
June 17, 2016
R is currently one of the most popular data science languages in the world. However, it’s always had constraints around scaling out to big data. What happens when you expand
LIGO – Listening to the Melody of the Universe
March 24, 2016
Nearly 100 years after Einstein predicted the existence of gravitational waves, the Laser Interferometer Gravitational-Wave Observatory (LIGO) astounded the world by successfully detecting these waves. Detection was made possible by
Sentiment Pipeline for Live Tweets
January 8, 2016
This will be an advanced talk on how to build a real-time predictive analytics pipeline. This will be a two-hour talk and demo that covers the following: • Building
Predictive Modeling with R and Azure ML
August 6, 2014
In this 90-minute video tutorial, we will cover an overview of solving a simple predictive analytics problem. We will use R for Feature Exploration, Visualization, and Predictive Modeling with R