Data Science and R Programming



R is a programming language and free software environment for statistical computing and graphics. R’s open interfaces allow it to integrate with other applications and systems, and it is widely used for data analysis. Packages such as dplyr, tidyr, readr, data.table, SparkR, and ggplot2 make data manipulation, visualization, and computation faster and more convenient. To become a data scientist, you need to cover everything from the basics of statistics, data management, and analytics to advanced topics such as machine learning and big data.

Course Outline

Introduction to R Programming

  • Why R?
  • Approaches to Machine Learning
  • Data Cleaning
  • Data Integration
  • Installing R Programming Console Software
  • Understanding the Comprehensive R Archive Network
  • Installing RStudio: The IDE

Creating a Graph using RStudio

  • Histograms and Density Plots
  • Dot Plots
  • Bar Plots
  • Line Charts
  • Pie Charts
  • Box Plots
  • Scatter Plots

Advanced Graphs using RStudio

  • Graphical Parameters
  • Axes and Text
  • Combining Plots
  • Lattice Graphs
  • ggplot2 Graphs
  • Probability Plots
  • Mosaic Plots
  • Correlograms
  • Interactive Graphs

Loading Different Kinds of Data into R

  • Read TXT files with read.table()
  • Read CSV files into R
  • read.delim() for Delimited Files
  • XLConnect Package for Reading Excel Files
  • Read RDBMS data
  • Read JSON data
  • Read XML data
  • Read HTML data
  • Read SPSS data
  • Read Stata data
  • Read Systat data
  • Read SAS data
  • Read Minitab data
  • Read RDA or RData data

Data Types and Structures

  • Introduction to Data Types and Structures
  • Understanding Basic Data Types in R
  • Lists
  • Matrices

Function Components

  • Introduction to Function Components
  • Writing functions in R
  • Lexical Scoping
  • Function Arguments
  • Special Calls
  • Return Values
  • Object Attributes

Creating Data Frames

  • Testing and Coercion
  • Combining data frames
  • Special Columns
  • Manipulate and Analyze data

Fundamentals of Data Management

  • Creating the leadership data frame
  • Renaming variables
  • Missing values
  • Data values
  • Type conversions
  • Sorting data
  • Merging datasets
  • Subsetting operators
  • Using SQL statements to manipulate data frames

Introduction to Components Analysis

  • Components Analysis
  • Cluster Analysis
  • Discriminant Analysis
  • Statistical Tests
  • Missing Value Treatment

Introduction to Statistical Models

  • Defining Statistical Models
  • Linear Models
  • Generic Functions for Extracting Model Information
  • Analysis of Variance and Model Comparison
  • Generalized Linear Models
  • Some Non-Standard Models

Predictive Modeling using R Programming

  • Linear Regression
  • Non-Linear Regression (NLS)
  • Logistic Regression
  • Isotonic Regression
  • Decision Tree
  • Support Vector Machines (SVM)
  • Naive Bayes
  • K-Nearest Neighbours (kNN)
  • K-Means
  • Streaming K-Means
  • Gaussian Mixture
  • Random Forest
  • Dimensionality Reduction Algorithms

Time Series Analysis

  • Robust Regression
  • Fitting Analysis of Variance (ANOVA) Models
  • One-Way ANOVA
  • One-Way ANCOVA
  • Two-Way Factorial ANOVA
  • Repeated Measures ANOVA
  • Multivariate Analysis of Variance (MANOVA)

Advanced Statistical Analysis using R

  • Hypothesis Testing
  • Outlier Analysis
  • Feature Selection
  • Model Selection
  • Logistic Regression
  • Advanced Linear Regression
  • Text Analysis in R
  • Sentiment Analysis with Tidy Data

Graphical Procedures & Packages

  • High-Level Plotting Commands
  • Low-Level Plotting Commands
  • Interacting with Graphics
  • Using Graphics Parameters
  • Graphics Parameters List