All great learning opportunities are built on a solid foundation. This session is jam-packed with all the background information, technical terminology, and basic knowledge that you will need to hit the ground running on the first day of the bootcamp.

This video is part two of the introduction to data mining vocabulary, focusing on attributes.

This video is part three of the introduction to data mining vocabulary, explaining the important attribute classes.

Continuing with data fundamentals, we introduce you to the three data set types: record, ordered, and graph.

In part two of data types, we discuss document data and transaction data, and how they are used in data mining.

In part three of data types, we introduce graph data and ordered data, and discuss types of ordered data such as spatio-temporal and genomic data.

In this next topic, we introduce the most overlooked step in data mining: data quality.

In part two of the introduction to data quality, we discuss noise, which can overlap valid data, and outliers.

In part three of the introduction to data quality, we discuss missing values and duplicated data.

In this video we introduce data preprocessing, also known as data cleaning, and the different strategies used to tackle it.

In part two of data preprocessing, we discuss aggregation.

In part three of data preprocessing, we discuss the technique of sampling for data selection.

In part four of data preprocessing, we discuss the different types of sampling, such as random sampling, stratified sampling, and sampling with and without replacement, and go into the issues of sample size.
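
The sampling types named above can be illustrated with a minimal sketch using only Python's standard library. The function names, the `seed` parameter, and the rule of taking at least one item per stratum are assumptions made for this illustration; they are not taken from the video.

```python
import random
from collections import defaultdict


def sample_without_replacement(items, k, seed=0):
    """Simple random sample: each item can be picked at most once."""
    rng = random.Random(seed)
    return rng.sample(items, k)


def sample_with_replacement(items, k, seed=0):
    """Simple random sample: the same item may be picked several times."""
    rng = random.Random(seed)
    return rng.choices(items, k=k)


def stratified_sample(items, key, fraction, seed=0):
    """Sample the same fraction from every group (stratum) defined by key.

    Assumption for this sketch: every stratum contributes at least one item.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    picked = []
    for group in strata.values():
        n = max(1, round(len(group) * fraction))
        picked.extend(rng.sample(group, n))
    return picked
```

With a population of eight records labeled `"a"` and two labeled `"b"`, a stratified sample at a fraction of 0.5 keeps four `"a"` records and one `"b"` record, so the rare class is not lost, which a small simple random sample cannot guarantee.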

In part five of data preprocessing, we discuss the curse of dimensionality and the purpose of dimensionality reduction.

In part six of data preprocessing, we discuss another approach to dimensionality reduction: feature subset selection.

In part seven of data preprocessing, we discuss data transformation, such as attribute transformation.

In this next section we introduce you to similarity and dissimilarity.

In part two of our introduction to similarity and dissimilarity, we discuss Euclidean distance and cosine similarity.
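
As a quick illustration of the two measures, here is a minimal sketch in plain Python. The function names are chosen for this example, and raising an error for zero vectors is an assumption of the sketch, since cosine similarity is undefined when either vector has zero length.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors; ignores their magnitudes."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)
```

Euclidean distance grows with the size of the difference between two objects, while cosine similarity depends only on direction: `[1, 2]` and `[2, 4]` are a nonzero distance apart but have a cosine similarity of 1.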

In part three of our introduction to similarity and dissimilarity, we discuss correlation and how to evaluate it visually.
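
For reference, the following sketch computes the Pearson correlation coefficient, one common measure of linear correlation. The video may cover other variants, and the function name and error handling here are assumptions of this example.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little or no linear relationship.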

In our last section in data mining fundamentals, we introduce you to data exploration and visualization and the role they play in data mining.

In part two of data exploration and visualization, we discuss summary statistics and the frequency and mode of an attribute.

In part three of data exploration and visualization, we discuss measures of center, such as the median and mean, and look at measures of spread, such as range and variance.

In part four of data exploration and visualization, we discuss different visualization techniques, starting with the most popular: histograms and box plots.

In part five of data exploration and visualization, we continue our discussion of visualization techniques with scatter plots and contour plots. This concludes our Data Mining Fundamentals course.
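
The measures named above can be computed with Python's built-in `statistics` module. This is a minimal sketch: the `summarize` helper is a hypothetical name for this example, and it reports the sample variance (dividing by n - 1), which is an assumption, since the video may use the population form instead.

```python
import statistics


def summarize(values):
    """Return common measures of center and spread for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        # Sample variance (n - 1 denominator); statistics.pvariance
        # gives the population variance instead.
        "variance": statistics.variance(values),
    }


summary = summarize([2, 4, 4, 4, 5, 5, 7, 9])
```

For the values `[2, 4, 4, 4, 5, 5, 7, 9]`, the mean is 5, the median is 4.5, and the range is 7. The mean is pulled toward extreme values, while the median is not, which is why the two are usually reported together.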

### Introduction to Data Visualization with ggplot2

June 22, 2018

The R programming language is experiencing rapid increases in popularity and wide adoption across industries. This popularity is due, in part, to R’s rich and powerful data visualization capabilities. While tools like Excel, Power BI, and Tableau are often the go-to solutions for data visualizations, none of these tools can compete with R in terms

### Data Manipulation with dplyr

March 19, 2018

dplyr is a great tool for data manipulation. It makes your data analysis process a lot more efficient. Even better, it’s fairly simple to learn and start applying immediately to your work! Oftentimes, with just a few elegant lines of code, your data becomes that much easier to dissect and analyze. For these

### Building a Business Case for your Machine Learning Idea

January 15, 2018

This presentation will discuss building a business model for your machine learning idea. In this talk, our presenter, Neeti Gupta, will provide a 10-step checklist with examples for the audience to build their own business model. This 10-step business checklist is a synthesis of the speaker’s real world experience evaluating companies that have built a

### Ethical Dimensions of Data Science

December 15, 2017

From distorting experiments with systemic bias to imposing human ethics on machine learning models, data scientists have far more to worry about than the raw numbers in their spreadsheet. Join Raja Iqbal on an exploration of data science’s past evils and how we can pave the way to a brighter future.

### Feature Engineering for Bot Detection

October 27, 2017

According to some estimates, bots constitute close to 50% of the overall traffic. In this introductory talk, we will cover various aspects of feature engineering & detection of automated web traffic. We will start with understanding the impact of bots on an online business and various types of web bots. Finally, we will talk about

### Online Experimentation and A/B Testing

October 16, 2017

In this meetup, I will give a quick introduction to online experimentation and A/B testing. To keep the tutorial self-contained, I will first give an overview of stats fundamentals needed to understand A/B testing. I will explain how A/B testing is done in an online business. In the end, I will mention some of the

### Building Robust Machine Learning Models

October 13, 2017

Modern machine learning libraries make model building look deceptively easy. An unnecessary emphasis on tools like R, Python, SparkML, and techniques like deep learning is prevalent. Relying on tools and techniques while ignoring the fundamentals is the wrong approach to model building. Real-world machine learning requires hard work, discipline and […]

### Introduction to Data Visualization with R and ggplot2

August 18, 2017

The R programming language is experiencing rapid increases in popularity and wide adoption across industries. This popularity is due, in part, to R’s rich and powerful data visualization capabilities. While tools like Excel, Power BI, and Tableau are often the go-to solutions for data visualizations, none of these tools can compete with R in terms