R programming is rapidly becoming a valuable skill for data professionals of all stripes and a must-have skill for aspiring data scientists. Adding R programming to your data analyst skillset allows you to leverage powerful data visualizations, statistical analyses, and …

Event logs are everywhere and represent a prime source of Big Data. Event log sources run the gamut from e-commerce web servers to devices participating in globally distributed Internet of Things (IoT) architectures. Even Enterprise Resource Planning (ERP) systems produce …

Scatter plots and contour plots are the final topics in our discussion of visualization techniques for data exploration. We define each plot and share examples of when to use each in data mining.

Histograms and box plots are among the most popular visualization techniques. In this tutorial, we discuss the benefits of each and provide examples of when to use each for data exploration and visualization.

Measures of center and spread are the next topic in our discussion of data exploration and visualization. We discuss measures of center, such as the median and mean, and measures of spread, such as range and variance.
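As a quick illustration of these measures, here is a minimal sketch using Python's standard `statistics` module. The sample values are made up for the example, and the variance shown is the sample variance (dividing by n - 1), which is an assumption about which definition the tutorial uses.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
values = [2, 4, 4, 5, 7, 9, 11]

# Measures of center
mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle value of the sorted data

# Measures of spread
value_range = max(values) - min(values)  # largest value minus smallest
variance = statistics.variance(values)   # sample variance (divides by n - 1)

print("mean:", mean)
print("median:", median)
print("range:", value_range)
print("variance:", variance)
```

The mean is pulled toward extreme values while the median is not, which is one reason to look at both.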

This tutorial covers data visualization and exploration and their role in data mining. Data exploration uses visualization and calculation to better understand the characteristics of data. We will tell you the key motivations of data exploration as …

Summary statistics are the next step in our discussion of data exploration and visualization. We discuss summary statistics and the frequency and mode of an attribute. Summary statistics are numbers that summarize properties of data, and the frequency of an …

Correlation, and how to evaluate it visually, is the next step in our discussion of similarity and dissimilarity. Correlation measures the linear relationship between the attributes of two objects; to evaluate it visually, you build a scatter plot.
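To make the idea of a linear relationship concrete, the sketch below computes the Pearson correlation coefficient by hand. The function name `pearson_correlation` and the sample data are assumptions made for this example and do not come from the tutorial itself.

```python
import math

def pearson_correlation(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: one perfectly increasing and one perfectly decreasing pair.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

r_up = pearson_correlation(x, y_up)
r_down = pearson_correlation(x, y_down)
print(r_up, r_down)
```

A value near 1 or -1 corresponds to points that fall close to a straight line on a scatter plot, while a value near 0 indicates no linear relationship.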

Euclidean distance and cosine similarity are the next aspects of similarity and dissimilarity we will discuss. We will show you how to calculate the Euclidean distance and construct a distance matrix.
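One possible way to compute these quantities is sketched below in plain Python. The helper names (`euclidean_distance`, `distance_matrix`, `cosine_similarity`) and the three sample points are hypothetical and chosen only for illustration.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_matrix(points):
    """Square matrix whose entry [i][j] is the distance between points i and j."""
    return [[euclidean_distance(p, q) for q in points] for p in points]

def cosine_similarity(p, q):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

# Hypothetical two-dimensional points.
points = [(0, 0), (3, 4), (6, 8)]
matrix = distance_matrix(points)

for row in matrix:
    print(row)

# (3, 4) and (6, 8) point in the same direction, so their cosine similarity is 1
# even though their Euclidean distance is not 0.
print(cosine_similarity(points[1], points[2]))
```

The distance matrix is symmetric with zeros on the diagonal, since every point is at distance 0 from itself.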

Similarity and dissimilarity are the next data mining concepts we will discuss. Similarity is a numerical measure of how alike two data objects are, and dissimilarity is a numerical measure of how different two data objects are. We also discuss …