All great learning opportunities are built on a solid foundation. This session is jam-packed with all the background information, technical terminology, and basic knowledge that you will need to hit the ground running on the first day of the bootcamp.
This video is part two of the introduction to the data mining vocabulary, focused on attributes.
This video is part three of the introduction to the data mining vocabulary, explaining important attribute classes.
Continuing with data fundamentals, we introduce you to the three data set types: record, ordered, and graph.
In part two of data types, we discuss document data and transaction data, and how they work in data mining.
In part three of data types, we introduce graph data and ordered data, and discuss types of ordered data such as spatial-temporal and genomic data.
In part three of the introduction to data quality, we discuss missing values and duplicated data.
In this video, we introduce data preprocessing, also known as data cleaning, and the different strategies used to tackle it.
In part two of data preprocessing, we discuss aggregation.
In part three of data preprocessing, we discuss the technique of sampling for data selection.
In part four of data preprocessing, we discuss the different types of sampling, such as random sampling, stratified sampling, and sampling with and without replacement, and go into the issues of sample size.
In part five of data preprocessing, we discuss the curse of dimensionality and the purpose of dimensionality reduction.
In part six of data preprocessing, we discuss another approach to dimensionality reduction: feature subset selection.
In part seven of data preprocessing, we discuss transformation of data such as attribute transformation.
In this next section we introduce you to similarity and dissimilarity.
In part two of our introduction to similarity and dissimilarity, we discuss Euclidean distance and cosine similarity.
In part three of our introduction to similarity and dissimilarity, we discuss correlation and how to evaluate it visually.
In the last section of data mining fundamentals, we introduce you to data exploration and visualization and the role they play in data mining.
In part two of data exploration and visualization, we discuss summary statistics and the frequency and mode of an attribute.
In part three of data exploration and visualization, we discuss measures of center, such as the median and mean, and look at measures of spread, such as range and variance.
In part four of data exploration and visualization, we discuss different visualization techniques, starting with the most popular: histograms and box plots.
In part five of data exploration and visualization, we continue our discussion of visualization techniques with scatter plots and contour plots. This concludes our Data Mining Fundamentals course.
March 19, 2018
dplyr is a great tool for data manipulation. It makes your data analysis process a lot more efficient. Even better, it’s fairly simple to learn and start applying immediately to your work! Oftentimes, with just a few elegant lines of code, your data becomes that much easier to dissect and analyze. For these
January 15, 2018
This presentation will discuss building a business model for your machine learning idea. In this talk, our presenter, Neeti Gupta, will provide a 10-step checklist with examples for the audience to build their own business model. This 10-step business checklist is a synthesis of the speaker’s real world experience evaluating companies that have built a
December 15, 2017
From distorting experiments with systemic bias to imposing human ethics on machine learning models, data scientists have far more to worry about than the raw numbers in their spreadsheet. Join Raja Iqbal on an exploration of data science’s past evils and how we can pave the way to a brighter future.
October 27, 2017
According to some estimates, bots constitute close to 50% of overall web traffic. In this introductory talk, we will cover various aspects of feature engineering and detection of automated web traffic. We will start with understanding the impact of bots on an online business and various types of web bots. Finally, we will talk about
October 16, 2017
In this meetup, I will give a quick introduction to online experimentation and A/B testing. To keep the tutorial self-contained, I will first give an overview of stats fundamentals needed to understand A/B testing. I will explain how A/B testing is done in an online business. In the end, I will mention some of the
October 13, 2017
Modern machine learning libraries make model building look deceptively easy. An unnecessary emphasis on tools like R, Python, SparkML, and techniques like deep learning is prevalent. Relying on tools and techniques while ignoring the fundamentals is the wrong approach to model building. Real-world machine learning requires hard work, discipline and […]
August 18, 2017
The R programming language is experiencing rapid increases in popularity and wide adoption across industries. This popularity is due, in part, to R’s rich and powerful data visualization capabilities. While tools like Excel, Power BI, and Tableau are often the go-to solutions for data visualizations, none of these tools can compete with R in terms
February 10, 2014
Storytelling is a cornerstone of the human experience. Though many elements of stories have remained the same throughout history, we have developed better tools and mediums for telling them, such as printed books, movies, and comics. This has changed storytelling styles—and perhaps most importantly, the impact of those stories. Today the best stories are often