All great learning opportunities are built on a solid foundation. This data mining fundamentals series is jam-packed with all the background information, technical terminology, and basic knowledge that you will need to hit the ground running. In part 1 of this data mining video series, we cover what data is and the basic vocabulary associated

In this video tutorial on Data Mining Fundamentals, we dive deeper into the vocabulary used in data mining, focusing on attributes. By the end of this tutorial, you will understand the different kinds of attribute classification, and when you should use each.

In this Data Mining Fundamentals video tutorial, we dive even deeper into attributes by identifying the subsets of attribute classification. These subsets include: categorical, nominal, ordinal, interval and ratio.

Continuing our series on Data Mining Fundamentals, we introduce you to the three data set types (record, ordered, and graph) and give you examples of when you would want to use each.

In this Data Mining Fundamentals video tutorial, we discuss another useful subcategory of record data, document data. We also discuss transaction data, which is record data where each record involves a set of items.

In this Data Mining Fundamentals tutorial, we introduce graph data and ordered data, and discuss the different types of ordered data such as spatial-temporal and genomic data.

In this Data Mining Fundamentals tutorial, we introduce the most overlooked step in data mining: data quality. Understanding your data quality problems is essential to creating robust models that will actually work in production.

In this Data Mining Fundamentals tutorial, we discuss data noise that can overlap valid data and outliers. Noise can appear because of human inconsistency and labeling. We will provide you with several examples of data noise, and how data noise can be measured and recorded.

In this Data Mining Fundamentals tutorial, we discuss missing values and duplicated data. Missing values can occur because information is not collected, or attributes are not applicable to all cases. We will tell you several ways to handle your missing values, as well as solutions for dealing with duplicate data, which can be a major

In this Data Mining Fundamentals tutorial, we introduce data preprocessing, also known as data cleaning, and the different strategies used to tackle it. There are many strategies for data preprocessing, and because data science is such a heterogeneous field, none of these strategies are strictly independent.

In this Data Mining Fundamentals tutorial, we discuss our first data cleaning strategy, data aggregation. Aggregation combines two or more attributes into a single attribute.

In this Data Mining Fundamentals tutorial, we discuss the data preprocessing technique of sampling for data selection. Sampling is the main technique employed for data selection, and is often used for both the preliminary investigation of data and the final data analysis.

In this Data Mining Fundamentals tutorial, we discuss the different types of sampling for data preprocessing, such as random sampling, stratified sampling, and sampling with and without replacement. We will also dive into the issues of sample size, and how it can affect your sampling.
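The sampling types named above can be sketched with Python's standard library. This is a minimal illustration, not the tutorial's own code: the function names, the `key` callback, and the `fraction` parameter are all assumptions made for this example.

```python
import random


def sample_without_replacement(data, k, seed=None):
    """Draw k distinct items; no item can be picked twice."""
    rng = random.Random(seed)
    return rng.sample(data, k)


def sample_with_replacement(data, k, seed=None):
    """Draw k items; the same item may be picked more than once."""
    rng = random.Random(seed)
    return rng.choices(data, k=k)


def stratified_sample(records, key, fraction, seed=None):
    """Sample the same fraction from each group (stratum) of records.

    `key` maps a record to its stratum label (hypothetical helper
    argument). At least one record is kept per stratum.
    """
    rng = random.Random(seed)
    groups = {}
    for record in records:
        groups.setdefault(key(record), []).append(record)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample
```

Sampling with replacement can return more items than the source contains, whereas sampling without replacement cannot. The stratified version keeps small groups represented that plain random sampling might miss.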

In this Data Mining Fundamentals tutorial, we discuss the curse of dimensionality and the purpose of dimensionality reduction for data preprocessing. When dimensionality increases, data becomes increasingly sparse in the space that it occupies. Dimensionality reduction will help you avoid this.

In this Data Mining Fundamentals tutorial, we discuss another way of dimensionality reduction, feature subset selection. We discuss the many techniques for feature subset selection, including the brute-force approach, embedded approach, and filter approach. Feature subset selection will reduce redundant and irrelevant features in your data.

In this Data Mining Fundamentals tutorial, we discuss the transformation of data in data preprocessing, such as attribute transformation. Attribute transformation is a function that maps the entire set of values of a given attribute to a new set of replacement values such that each old value can be identified with one of the new

In this Data Mining Fundamentals tutorial, we introduce you to similarity and dissimilarity. Similarity is a numerical measure of how alike two data objects are, and dissimilarity is a numerical measure of how different two data objects are. We also discuss similarity and dissimilarity for single attributes.

In this Data Mining Fundamentals tutorial, we continue our introduction to similarity and dissimilarity by discussing Euclidean distance and cosine similarity. We will show you how to calculate the Euclidean distance and construct a distance matrix.
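As a rough sketch of the two measures mentioned above, the following uses the standard definitions of Euclidean distance and cosine similarity. The function names are illustrative assumptions, and the video may present the calculation differently.

```python
import math


def euclidean_distance(p, q):
    """Square root of the summed squared differences per dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(points):
    """Pairwise Euclidean distances; entry [i][j] compares points i and j."""
    return [[euclidean_distance(p, q) for q in points] for p in points]


def cosine_similarity(p, q):
    """Dot product divided by the product of the vector lengths.

    Assumes neither vector is all zeros.
    """
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)
```

The distance matrix is symmetric with zeros on the diagonal, since every point is at distance zero from itself. Cosine similarity ignores vector length, so `(1, 2)` and `(2, 4)` count as maximally similar even though their Euclidean distance is nonzero.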

In this Data Mining Fundamentals tutorial, we continue our discussion on similarity and dissimilarity and discuss correlation and visually evaluating it. Correlation measures the linear relationship between objects, and to visually evaluate correlation, you will need to build a scatter plot.
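To put a number on the linear relationship described above, one common choice is the Pearson correlation coefficient. The excerpt does not say which coefficient the video uses, so treat this as an assumed, minimal sketch with an illustrative function name.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns a value in [-1, 1]. Assumes neither sequence is constant,
    otherwise the denominator would be zero.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)
```

A result near 1 or -1 corresponds to scatter plot points lying close to a rising or falling straight line. A result near 0 means there is no linear trend, although a nonlinear relationship may still exist.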

In this Data Mining Fundamentals tutorial, we introduce you to data exploration and visualization and the role they play in data mining. Data exploration uses visualization and calculation to better understand the characteristics of data. We will tell you the key motivations of data exploration as well as the techniques used in data exploration.

In this Data Mining Fundamentals tutorial, we continue our discussion on data exploration and visualization. We discuss summary statistics and the frequency and mode of an attribute. Summary statistics are numbers that summarize properties of data, and the frequency of an attribute value is a percentage measuring how often the value occurs in the data

In this Data Mining Fundamentals tutorial, we continue our discussion on data exploration and visualization. We discuss measures of center such as the median and mean, and look at measures of spread such as range and variance.
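The measures of center and spread listed above can be computed with Python's built-in `statistics` module. The helper name and the sample values below are made up for illustration. Note that `statistics.pvariance` treats the data as a whole population, while `statistics.variance` would give the sample variance instead.

```python
import statistics


def summarize(values):
    """Return basic measures of center and spread for numeric data."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        # Population variance: average squared deviation from the mean.
        "variance": statistics.pvariance(values),
    }


# Illustrative data only, not taken from the tutorial.
summary = summarize([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```

The median is less sensitive to extreme values than the mean, which is one reason to look at both when exploring a new attribute.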

In this Data Mining Fundamentals tutorial, we discuss different visualization techniques, starting with the most popular: histograms and box plots. We discuss the unique benefits of both, and provide examples of when you can use each for your data exploration and visualization.

In the final video in our Data Mining Fundamentals series, we conclude our discussion of different visualization techniques for data exploration with scatter plots and contour plots. We will define each plot, and share examples of when you can use each for your data mining.

### AI For Social Good

February 4, 2019

It’s not hard to see machine learning and artificial intelligence in nearly every app we use – from any website we visit, to any mobile device we carry, to any goods or services we use. Where there are commercial applications, data scientists are all over it. What we don’t typically see, however, is how AI

### NLP 101 + Chatbots

November 20, 2018

Learn the basics of natural language processing: the components of NLP, enterprise applications of NLP, and finally build a simple FAQ Chatbot. About the Speaker: Chris Shei is the technical evangelist for Jet.com where he explores trending tech and helps Jet’s engineering org build stronger relationships with the external tech […]

### Introduction to Data Visualization with ggplot2

June 22, 2018

The R programming language is experiencing rapid increases in popularity and wide adoption across industries. This popularity is due, in part, to R’s rich and powerful data visualization capabilities. While tools like Excel, Power BI, and Tableau are often the go-to solutions for data visualizations, none of these tools can compete with R in terms

### Data Manipulation with dplyr

March 19, 2018

dplyr is a great tool to perform data manipulation. It makes your data analysis process a lot more efficient. Even better, it’s fairly simple to learn and start applying immediately to your work! Oftentimes, with just a few elegant lines of code, your data becomes that much easier to dissect and analyze. For these

### Building a Business Case for your Machine Learning Idea

January 15, 2018

This presentation will discuss building a business case for your machine learning idea. In this talk, our presenter, Neeti Gupta, will provide a 10-step checklist with examples for the audience to build their own business model. This 10-step business checklist is a synthesis of the speaker’s real world experience evaluating companies that have built a

### Ethical Dimensions of Data Science

December 15, 2017

From distorting experiments with systemic bias to imposing human ethics on machine learning models, data scientists have far more to worry about than the raw numbers in their spreadsheet. Join Raja Iqbal on an exploration of data science’s past evils and how we can pave the way to a brighter future.

### Feature Engineering for Bot Detection

October 27, 2017

According to some estimates, bots constitute close to 50% of the overall traffic. In this introductory talk to Feature Engineering for Bot Detection, we will cover various aspects of feature engineering for bot detection of automated web traffic. We will start with understanding the impact of bots on an online business and various types of

### Online Experimentation and A/B Testing

October 16, 2017

In this meetup, we provide a quick introduction to online experimentation and A/B testing. To keep the tutorial self-contained, we will first give an overview of stats fundamentals needed to understand A/B testing. We then explain how A/B testing is done in an online business. We will conclude by mentioning some of the pitfalls that