In this video series, we introduce the R programming language. We go through what it is, what it does, and how to use it. R is a free, open-source statistical programming platform. It is designed to make many of the most common data processing tasks as simple as possible. With this knowledge, you’ll be able

Here we introduce the benefits and disadvantages of using the R language. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…

Here we introduce the R interface, functions, and variables. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…

We go deeper into how R functions. Here we introduce data types and the 5 atomic classes. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…
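As a rough illustration of the atomic classes mentioned above, the sketch below builds one value of each class that introductions to R usually list (character, numeric, integer, complex, logical). The variable names are made up for this example, and the exact set covered in the video is an assumption.

```r
# One example value for each of the five atomic classes (assumed set).
values <- list(
  character = "a",
  numeric   = 3.14,
  integer   = 2L,      # the L suffix makes this an integer, not a double
  complex   = 1 + 2i,
  logical   = TRUE
)

# class() reports the class of each value.
classes <- vapply(values, class, character(1))
print(classes)

# Mixing classes in one vector coerces everything to the most general one.
mixed <- c(1, "a", TRUE)
print(class(mixed))
```

Because an atomic vector can hold only one class, `mixed` ends up as a character vector.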

Here we introduce the basics of vectors in the R programming language. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…

This is a continuation of the basics of vectors in the R programming language. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…

Here we introduce the basics of matrices in the R programming language. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…

Here we introduce the basics of matrices in the R programming language. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…

Here we introduce data frames in the R programming language. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…

Here we introduce what lists are in the R programming language. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…

We continue explaining the data functions and types you can use in R. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…

Here we explain the missing values of the R programming language: NA, NaN, and NULL. R is a free, open-source statistical programming platform. It is designed to make many of the most common data processing tasks as simple as possible. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…
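A minimal sketch of how R's missing-value markers behave, using only base R. It covers NA and NaN, plus NULL, which is assumed here to be the third item the description refers to. The variable names are hypothetical.

```r
# NA marks a missing observation inside a vector.
x <- c(1, NA, 3)
na_flags   <- is.na(x)                 # which elements are missing
mean_raw   <- mean(x)                  # NA propagates through arithmetic
mean_clean <- mean(x, na.rm = TRUE)    # drop NA before averaging

# NaN ("not a number") results from undefined arithmetic such as 0 / 0.
nan_val <- 0 / 0
nan_is_nan <- is.nan(nan_val)          # TRUE
nan_is_na  <- is.na(nan_val)           # TRUE: NaN also counts as missing
na_is_nan  <- is.nan(NA)               # FALSE: NA is not NaN

# NULL is the absence of an object; it has length zero.
empty <- NULL
null_check <- is.null(empty)
lengths_cmp <- c(null = length(empty), na = length(NA))
```

Note the asymmetry: `is.na()` is true for both NA and NaN, while `is.nan()` is true only for NaN.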

Here we show you how to use third-party packages for the R programming language. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…

Here we introduce the built-in interfaces such as reading and writing text data. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…
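For reading and writing text data, a small round trip with base R functions might look like the sketch below. The data frame and the temporary file are invented for the example; the video may use different files and functions.

```r
# Write a small data frame to a temporary CSV file, then read it back.
path <- tempfile(fileext = ".csv")
scores <- data.frame(name = c("ann", "bob"), score = c(90, 85))

write.csv(scores, path, row.names = FALSE)   # row.names = FALSE avoids an extra index column
loaded <- read.csv(path, stringsAsFactors = FALSE)

# readLines() returns the raw text, one element per line (header + 2 rows).
raw_lines <- readLines(path)

unlink(path)  # remove the temporary file
print(loaded)
```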

We dive deeper into the basics of R and introduce you to control structures such as “if statements”. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…
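A short sketch of control structures in base R, showing an if/else chain and a for loop. The function name `classify` is hypothetical and exists only for this example.

```r
# if / else if / else chooses one branch based on a condition.
classify <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

labels <- c(classify(5), classify(-2), classify(0))

# A for loop repeats a block once per element of a sequence.
total <- 0
for (i in 1:5) {
  total <- total + i
}
```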

We introduce the built-in functions for data exploration and alteration. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…

Continuing with our introduction to the basic features, we show off “apply functions” in the R programming language. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…
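The apply family can be sketched with a few base R calls. The matrix and list below are made up for illustration; which members of the family the video covers is an assumption.

```r
# matrix() fills column by column: columns are (1,2), (3,4), (5,6).
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)   # MARGIN = 1 applies the function to each row
col_sums <- apply(m, 2, sum)   # MARGIN = 2 applies the function to each column

# sapply() simplifies the result to a vector when it can.
squares <- sapply(1:3, function(i) i^2)

# lapply() always returns a list, one element per input element.
sizes <- lapply(list(a = 1:3, b = letters), length)
```

These calls replace explicit loops for element-wise or margin-wise operations, which tends to read more compactly in R.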

We introduce plotting packages in the R programming language. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…

This is the last video in our R series. We finish explaining the basics of data exploration and visualization in R. You should now understand the basics of R and be better prepared to utilize it in bootcamp. Download R: https://cran.r-project.org/ Download RStudio: https://www.rstudio.com/products/rstu…

dplyr is a great tool to perform data manipulation. It makes your data analysis process a lot more efficient. Even better, it’s fairly simple to learn and start applying immediately to your work! Oftentimes, with just a few elegant lines of code, your data becomes that much easier to dissect and analyze. For these

We cover some basic functions of dplyr including the mighty group_by and summarize combo that makes dividing up datasets a breeze, as well as arrange, select, and filter that help get the data in a cleaner and more organized format. Group-by aggregation is one of the most powerful, yet simple, tools you can use to
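The verbs named above can be chained as in the following sketch. It assumes the dplyr package is installed, and the `sales` data frame is invented for the example; it is not the dataset used in the video.

```r
library(dplyr)  # assumes dplyr has been installed, e.g. from CRAN

# Hypothetical example data.
sales <- data.frame(
  region  = c("east", "west", "east", "west", "east"),
  product = c("a", "a", "b", "b", "a"),
  amount  = c(10, 20, 30, 40, 50)
)

by_region <- sales %>%
  filter(amount > 10) %>%                 # keep rows meeting a condition
  select(region, amount) %>%              # keep only the columns needed
  group_by(region) %>%                    # split the rows into groups
  summarize(total = sum(amount),          # one summary row per group
            n = n()) %>%
  arrange(desc(total))                    # sort groups by total, largest first

print(by_region)
```

The `group_by()` plus `summarize()` pair is the aggregation step: every other verb here only shapes the rows and columns that feed into it.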

We introduce functions that make it easy to find overlapping and distinct values from two different data sources, intersect and setdiff. These two functions let you see the shared and unique elements from different vectors, making it easy to spot commonalities and differences. After watching this video, you’ll walk away feeling more empowered to tackle
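On plain vectors, the set operations described above can be tried with base R's `intersect()` and `setdiff()`, as in this sketch. The two name vectors are hypothetical.

```r
# Two hypothetical data sources with partly overlapping values.
source_a <- c("ann", "bob", "cara", "dev")
source_b <- c("cara", "dev", "eli")

shared <- intersect(source_a, source_b)  # values present in both
only_a <- setdiff(source_a, source_b)    # values in a that are not in b
only_b <- setdiff(source_b, source_a)    # values in b that are not in a
```

`setdiff()` is not symmetric, so the order of its arguments determines which side's unique values are returned.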

In this final tutorial of the dplyr series, we will cover ways to do feature engineering both with dplyr and base R. You’ll learn how to impute missing values as well as create new values based on existing columns. In addition, we’ll go over four different ways to combine datasets. If […]
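A minimal base R sketch of the two feature engineering ideas mentioned above: imputing missing values and deriving a new column from an existing one. Median imputation, the column names, and the age threshold are all assumptions chosen for illustration, not necessarily the approach taken in the tutorial.

```r
# Hypothetical passenger data with missing ages.
passengers <- data.frame(
  age  = c(22, NA, 35, NA, 41),
  fare = c(7.25, 71.28, 8.05, 53.10, 8.46)
)

# Record which ages were missing before they are overwritten.
passengers$age_missing <- is.na(passengers$age)

# Impute missing ages with the median of the observed ages (assumed strategy).
age_median <- median(passengers$age, na.rm = TRUE)
passengers$age[is.na(passengers$age)] <- age_median

# Create a new column from an existing one.
passengers$is_adult <- passengers$age >= 18

print(passengers)
```

Keeping the `age_missing` flag preserves the information that a value was imputed, which a model can sometimes use.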

This data science tutorial introduces the viewer to the exciting world of text analytics with R programming. As exemplified by the popularity of blogging and social media, textual data is far from dead – it is increasing exponentially! Not surprisingly, knowledge of text analytics is a critical skill for data scientists if this wealth of

This data science tutorial introduces the viewer to the exciting world of text analytics with R programming. As exemplified by the popularity of blogging and social media, textual data is far from dead – it is increasing exponentially! Not surprisingly, knowledge of text analytics is a critical skill for data scientists if this wealth of

This data science tutorial introduces the viewer to the exciting world of text analytics with R programming. As exemplified by the popularity of blogging and social media, textual data is far from dead – it is increasing exponentially! Not surprisingly, knowledge of text analytics is a critical skill for data scientists if this wealth of

To prepare you for Data Science Dojo’s day two homework we will explain what Kaggle is and show you how to create a Kaggle account and submit your model to the Kaggle competition. Titanic Data Set: https://www.kaggle.com/c/titanic

In this tutorial we will show you how to complete the Titanic Kaggle competition using Microsoft Azure Machine Learning Studio. This video assumes you have an Azure account and you understand how to use Azure. Kaggle Titanic Experiment: https://gallery.cortanaintelligence.com/Experiment/Titanic-Kaggle-Competition-1

As part of submitting to Data Science Dojo’s Kaggle competition, you need to create a model out of the Titanic data set. We will show you how to do this using RStudio. Titanic Data Set: https://www.kaggle.com/c/titanic Download RStudio: https://www.rstudio.com/products/rstudio/download/

In part two of using RStudio for Data Science Dojo’s Kaggle competition, we will show you more advanced cleaning functions for your model. This video assumes you have watched part one; if you have not, view it here: https://www.youtube.com/watch?v=Zx2TguRHrJE Titanic Data Set: https://www.kaggle.com/c/titanic Download RStudio: https://www.rstudio.com/products/rstudio/download/

### Data Manipulation with dplyr

March 19, 2018

dplyr is a great tool to perform data manipulation. It makes your data analysis process a lot more efficient. Even better, it’s fairly simple to learn and start applying immediately to your work! Oftentimes, with just a few elegant lines of code, your data becomes that much easier to dissect and analyze. For these

### Building a Business Case for your Machine Learning Idea

January 15, 2018

This presentation will discuss building a business model for your machine learning idea. In this talk, our presenter, Neeti Gupta, will provide a 10-step checklist with examples for the audience to build their own business model. This 10-step business checklist is a synthesis of the speaker’s real world experience evaluating companies that have built a

### Ethical Dimensions of Data Science

December 15, 2017

From distorting experiments with systemic bias to imposing human ethics on machine learning models, data scientists have far more to worry about than the raw numbers in their spreadsheet. Join Raja Iqbal on an exploration of data science’s past evils and how we can pave the way to a brighter future.

### Feature Engineering for Bot Detection

October 27, 2017

According to some estimates, bots constitute close to 50% of the overall traffic. In this introductory talk, we will cover various aspects of feature engineering & detection of automated web traffic. We will start with understanding the impact of bots on an online business and various types of web bots. Finally, we will talk about

### Online Experimentation and A/B Testing

October 16, 2017

In this meetup, I will give a quick introduction to online experimentation and A/B testing. To keep the tutorial self-contained, I will first give an overview of stats fundamentals needed to understand A/B testing. I will explain how A/B testing is done in an online business. In the end, I will mention some of the

### Building Robust Machine Learning Models

October 13, 2017

Modern machine learning libraries make model building look deceptively easy. An unnecessary emphasis on tools like R, Python, SparkML, and techniques like deep learning is prevalent. Relying on tools and techniques while ignoring the fundamentals is the wrong approach to model building. Real-world machine learning requires hard work, discipline and […]

### Introduction to Data Visualization with R and ggplot2

August 18, 2017

The R programming language is experiencing rapid increases in popularity and wide adoption across industries. This popularity is due, in part, to R’s rich and powerful data visualization capabilities. While tools like Excel, Power BI, and Tableau are often the go-to solutions for data visualizations, none of these tools can compete with R in terms

### Storytelling with PowerBI

February 10, 2014

Storytelling is a cornerstone of the human experience. Though many elements of stories have remained the same throughout history, we have developed better tools and mediums for telling them, such as printed books, movies, and comics. This has changed storytelling styles—and perhaps most importantly, the impact of those stories. Today the best stories are often

### Introduction to Machine Learning with R and caret

February 10, 2014

The R programming language is experiencing rapid increases in popularity and wide adoption across industries. This popularity is due, in part, to R’s huge collection of open source machine learning algorithms. If you are a data scientist working with R, the caret package is a must-have tool in your […]

### Business Data Analysis with Excel

January 20, 2014

Lecture starts at 8:25. Business data presents a challenge for the data analyst. Business data is often aggregated, recorded over time, and tends to exhibit autocorrelation. Additionally, and most problematically, the amount of business data is usually quite limited. These characteristics lead to a situation where many of the tools in the analyst’s tool belt

### Introduction to R Programming for Excel Users

January 8, 2014

R programming is rapidly becoming a valuable skill for data professionals of all stripes and a must-have skill for aspiring data scientists. Adding R programming to your data analyst skillset allows you to leverage powerful data visualizations, statistical analyses, and even machine learning in your daily work. In this presentation, Dave Langer illustrates how your

### Introduction to Event Log Mining with R

January 8, 2014

Event logs are everywhere and represent a prime source of Big Data. Event log sources run the gamut from e-commerce web servers to devices participating in globally distributed Internet of Things architectures. Even Enterprise Resource Planning systems produce event logs! Given the rich and varied data contained in event logs, mining these assets […]

### Intro to R Visualizations in Microsoft Power BI

January 8, 2014

Microsoft’s Power BI is a powerful technology for quickly creating rich visualizations. Power BI has many practical uses for the modern data professional including executive dashboards, operational dashboards, and visualizations for data exploration/analysis. Microsoft has also extended Power BI with support for incorporating R visualizations into Power BI projects, enabling a myriad of data visualization

### Scale R to Big Data Using Hadoop and Spark

January 8, 2014

Outline: · Set up a Spark cluster with R installed. · Wrangle data that is inside HDFS using R. · Build and deploy a machine learning model using R. R is currently one of the most popular data science languages in the world. However, it’s always had constraints around scaling out to big data. […]

Nearly 100 years after Einstein predicted the existence of gravitational waves, the Laser Interferometer Gravitational-Wave Observatory (LIGO) astounded the world by successfully detecting these waves. Detection was made possible by the advancement of laser technology and data processing techniques. Being able to distinguish the gravitational waves from the background noise was key to verifying […]

At this meetup, presenter Craig Guarraci speaks about how to make sense of unstructured text with Python, MS Cognitive Services & PowerBI. – In this presentation we’ll take a broad look at industry research to see how text analytics and sentiment analysis are used – We’ll look at difficulties associated with sentiment analysis – Review

### Building Real-Time Sentiment Pipeline for Live Tweets

January 8, 2014

At this Data Science Dojo meetup, Phuc Duong talks about building a real-time sentiment pipeline for live tweets using Python, R, & Azure. Supplementary material found here: https://github.com/gokul180288/meetup…

In this 90-minute talk, we will cover an overview of solving a simple predictive analytics problem. We will use R for feature exploration and visualization and build a predictive model using Azure ML. We will be using the Titanic data set for our exercise. You will see the end-to-end process of building a predictive model.