Code for the assignment tasks

First Assignment/Practice Task using the University_data.csv data.

Dataset description/ Description of variables:

Major – program of freshman

Language – the language of the program

Type – type of program; A: scholarship, K: tuition fee

Enrolment_Points – points of freshman (maximum is 500)

High level mathematics – Mathematics A level or maturity test at a high level

Mathematics point – marks out of 100 (maximum is 100)

Residence: 1 means South; 2 means North; 3 means East; 4 means West

TASKS

  1. Assign the names to the levels of the categorical variable (Residence) where 1 means South; 2 means North; 3 means East; 4 means West.
  2. Create a new numerical variable called Total_points, obtained by adding the Enrolment_Points and Mathematics points for each student, and add this new variable (Total_points) to the original data.
  3. Convert the Mathematics points to a categorical variable called Mathematics_grade with categories: i) Low class if Mathematics point is less than 40% ii) Medium class if Mathematics point is from 40% to 70% and iii) High class if Mathematics point > 70%.
  4. Save the new data as a CSV file directly into your working directory/DataAnalysis_results_R folder.
  5. Create your own function to compute descriptive/summary statistics (just update what I created already in the second meeting dated October 24, 2020). The summary statistics function should: i) First determine the type of variable. ii) If it's numeric, find & return mean, median, mode, variance, standard deviation, maximum value, minimum value, standard error, skewness, kurtosis and 95% quantile rounded to 2 decimal places, as well as a histogram plot of the numeric variable coloured “green”. iii) Else if it is categorical, find & return percentages for all categories/levels (to 1 decimal place) and the names of the categories as a data frame, as well as plot a pie chart with percentages for each category of the variable in different colours.
  6. For each variable, find the summary statistics with the data created above.
  7. Depict at least 8 figures/plots (it can be more than 8) using only the ggplot2 package to describe the data the best you can as a Data Scientist (examples of the plots are line graphs, scatter plots, barplots, boxplots, histograms of numeric variables across different categories, barcharts of categorical variables distinguishing between different categories, etc.). Figures should be clear and informative, as expected of a professional Data Scientist.
  8. Produce 5 different plots to also summarize the data the best you can without the use of the ggplot2 package.
  9. Find the correlation matrix plots between all the numerical variables (Enrolment_points, Mathematics points and Total points) using the "PerformanceAnalytics" R package.

Setting working directory

In [1]:
#Setting working directory (desktop>Folder called DataAnalysis_results_R )
setwd("C:/Users/user/Desktop/DataAnalysis_results_R")
In [2]:
#Setting plot size for large view and specific resolution
options(repr.plot.width=8, repr.plot.height=8,repr.plot.res = 300)

Loading R packages

In [52]:
#Installing our first R package (NB: you need an internet connection only to install packages)
#install.packages("moments") #run this to install the package called "moments"

# Loading R packages
library("moments") #to calculate skewness and kurtosis
library("ggplot2") #to plot graphs using ggplot
library("RColorBrewer") #to select customized colours (sequential, etc.), although lots of colours are available without packages
library("PerformanceAnalytics") #for the correlation matrix plot
library("gridExtra") #to arrange several plots in one figure (grid.arrange)
library("tidyr") #for tidying data
library("dplyr") #for data manipulation (%>%, mutate)
library("forcats") #for working with factors (fct_reorder)

Importing the data into R

In [8]:
#Importing data from the working directory 
Data_uni<-read.csv(file="University_data.csv")
head(Data_uni,n=6) #view first 6 rows of the data with object name: "Data_uni"
#You can look at the last 10 rows using:
#tail(Data_uni,n=10)
A data.frame: 6 × 7
  Major.program           Language   Type   Enrolment_Points  High.Level.Mathematics  Mathematics.points  Residence
  <fct>                   <fct>      <fct>  <int>             <fct>                   <int>               <int>
1 Human resources         Hungarian  A      436               No                      0                   1
2 International Business  Hungarian  A      486               Yes                     77                  1
3 Finance and accounting  Hungarian  A      456               Yes                     86                  1
4 Finance and accounting  Hungarian  A      431               No                      0                   1
5 Political science       Hungarian  A      448               No                      0                   1
6 Sociology               Hungarian  A      476               Yes                     92                  1

To check whether there is any missing data

In [9]:
#To check whether there is any missing data
any(is.na(Data_uni)) #returns FALSE, which implies there is no missing data
FALSE
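
The check above only says whether at least one value is missing anywhere in the data. A minimal sketch of a per-column count follows; `toy_data` is a small made-up data frame used for illustration, not the university data.

```r
# Hypothetical toy data with one missing value in each column
toy_data <- data.frame(points = c(436, NA, 456),
                       language = c("Hungarian", "English", NA))

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy_data))
print(missing_per_column)

# Overall check, same idea as any(is.na(Data_uni))
any(is.na(toy_data))
```

If the overall check returns TRUE, the per-column counts show which variables need attention.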

To view the dimension of the data

In [10]:
#To view the dimensions of the data: 2331 rows and 7 columns
dim(Data_uni)
  1. 2331
  2. 7

Names of variables in the data

In [12]:
#Names of variables in the data
names(Data_uni)
  1. 'Major.program'
  2. 'Language'
  3. 'Type'
  4. 'Enrolment_Points'
  5. 'High.Level.Mathematics'
  6. 'Mathematics.points'
  7. 'Residence'

1. Assign the names to the levels of the categorical variables (Residence) where 1 means South; 2 means North; 3 means East; 4 means West.

In [11]:
#view the levels for the variable Residence
levels(as.factor(Data_uni$Residence))
  1. '1'
  2. '2'
  3. '3'
  4. '4'
In [26]:
#Assign the names to the levels of the categorical variables (Residence) 
#where 1 means South; 2 means North; 3 means East; 4 means West.
Data_uni$Residence<-factor(Data_uni$Residence,levels =c(1,2,3,4),
                               labels = c("South","North","East","West"))
In [27]:
#Viewing the updated data (first 8 rows)
Data_uni[1:8, ] #or head(Data_uni,n=8)
A data.frame: 8 × 7
  Major.program           Language   Type   Enrolment_Points  High.Level.Mathematics  Mathematics.points  Residence
  <fct>                   <fct>      <fct>  <int>             <fct>                   <int>               <fct>
1 Human resources         Hungarian  A      436               No                      0                   South
2 International Business  Hungarian  A      486               Yes                     77                  South
3 Finance and accounting  Hungarian  A      456               Yes                     86                  South
4 Finance and accounting  Hungarian  A      431               No                      0                   South
5 Political science       Hungarian  A      448               No                      0                   South
6 Sociology               Hungarian  A      476               Yes                     92                  South
7 Commerce and Marketing  Hungarian  A      462               No                      0                   South
8 Business Informatics    Hungarian  K      412               No                      0                   South
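
The relabelling step can be tried on a small vector first. This is a minimal sketch; `residence_codes` is a made-up vector, not taken from the data.

```r
# Hypothetical residence codes for five students
residence_codes <- c(1, 3, 2, 4, 1)

# levels = the codes stored in the data, labels = the names to show instead
residence <- factor(residence_codes,
                    levels = c(1, 2, 3, 4),
                    labels = c("South", "North", "East", "West"))

print(residence)
table(residence)        # counts per named level
as.integer(residence)   # level positions, which match the original codes here
```

The order of `labels` must follow the order of `levels`, otherwise the names are attached to the wrong codes.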

2. Create a new numerical variable called Total_points; which is obtained after adding both the Enrolment_Points and Mathematics points for each student & add this new variable (Total_points) to the original data.

In [28]:
# Create a new numerical variable called:
#Total_points= Enrolment_Points + Mathematics

Data_uni$Total_points<-Data_uni$Enrolment_Points+Data_uni$Mathematics.points

#Viewing the updated data (first 4 rows)
Data_uni[1:4, ]
A data.frame: 4 × 8
  Major.program           Language   Type   Enrolment_Points  High.Level.Mathematics  Mathematics.points  Residence  Total_points
  <fct>                   <fct>      <fct>  <int>             <fct>                   <int>               <fct>      <int>
1 Human resources         Hungarian  A      436               No                      0                   South      436
2 International Business  Hungarian  A      486               Yes                     77                  South      563
3 Finance and accounting  Hungarian  A      456               Yes                     86                  South      542
4 Finance and accounting  Hungarian  A      431               No                      0                   South      431

3. Convert the Mathematics points to a categorical variable called Mathematics_grade with categories: i) Low class if Mathematics point is less than 40% ii) Medium class if Mathematics point is from 40% to 70% and iii) High class if Mathematics point > 70%.

In [40]:
#Convert the Mathematics points to a categorical variable called Mathematics_grade with categories: 
#i) Low class if Mathematics point is less than 40% 
#ii) Medium class if Mathematics point is from 40% to 70% and 
#iii) High class if Mathematics point > 70%.

#With right=FALSE the intervals are [min,40), [40,71) and [71,max+1),
#so for whole-number points 40 to 70 is Medium class and 71 upwards is High class
threshold<- c( min(Data_uni$Mathematics.points), 40, 71, max(Data_uni$Mathematics.points)+1)

Data_uni$Mathematics_grade<- cut(Data_uni$Mathematics.points,breaks=threshold,right=FALSE, 
                                 labels=c("Low class","Medium class","High class"))

#Viewing the updated data (first 10 rows)
head(Data_uni,n=10)
A data.frame: 10 × 9
   Major.program           Language   Type   Enrolment_Points  High.Level.Mathematics  Mathematics.points  Residence  Total_points  Mathematics_grade
   <fct>                   <fct>      <fct>  <int>             <fct>                   <int>               <fct>      <int>         <fct>
1  Human resources         Hungarian  A      436               No                      0                   South      436           Low class
2  International Business  Hungarian  A      486               Yes                     77                  South      563           High class
3  Finance and accounting  Hungarian  A      456               Yes                     86                  South      542           High class
4  Finance and accounting  Hungarian  A      431               No                      0                   South      431           Low class
5  Political science       Hungarian  A      448               No                      0                   South      448           Low class
6  Sociology               Hungarian  A      476               Yes                     92                  South      568           High class
7  Commerce and Marketing  Hungarian  A      462               No                      0                   South      462           Low class
8  Business Informatics    Hungarian  K      412               No                      0                   South      412           Low class
9  Human resources         Hungarian  A      444               No                      0                   South      444           Low class
10 International Business  Hungarian  A      480               Yes                     55                  South      535           Medium class
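
To see how `cut()` treats the boundary values, here is a minimal sketch with made-up scores placed exactly at the class limits. The break points mirror the ones used above, assuming scores between 0 and 100.

```r
# Hypothetical scores sitting on the class boundaries
maths_points <- c(0, 39, 40, 70, 71, 100)

# right = FALSE makes each interval closed on the left: [0,40), [40,71), [71,101)
grade <- cut(maths_points,
             breaks = c(0, 40, 71, 101),
             right = FALSE,
             labels = c("Low class", "Medium class", "High class"))

data.frame(maths_points, grade)
```

With these breaks 40 and 70 fall into Medium class and 71 into High class. Note that a non-integer score such as 70.5 would still be placed in Medium class, so the break at 71 is only appropriate for whole-number points.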

4. Save the updated data as a CSV file called University_data_Updated directly into your working directory/DataAnalysis_results_R folder.

To confirm that your data has been saved, you can check your folder.

In [33]:
#Save the new data as a  CSV file directly into your 
#working directory/DataAnalysis_results_R folder.
write.csv(Data_uni,"University_data_Updated.csv")
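
By default `write.csv()` also writes the row names as an extra unnamed first column, which comes back as a column called `X` when the file is read again. A minimal sketch of saving without row names and reading the file back; `toy_data` and the temporary file name are made up for illustration.

```r
# Hypothetical toy data
toy_data <- data.frame(id = c(1L, 2L),
                       major = c("Finance", "Sociology"),
                       stringsAsFactors = FALSE)

# Write to a temporary folder so the working directory is not touched
out_file <- file.path(tempdir(), "toy_data_updated.csv")
write.csv(toy_data, out_file, row.names = FALSE)

# Confirm the file exists and read it back
file.exists(out_file)
reloaded <- read.csv(out_file, stringsAsFactors = FALSE)
names(reloaded)
```

Whether to keep the row names is a matter of preference; dropping them keeps the saved file to the same columns as the data frame.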

5. Create your own function to compute descriptive/summary statistics (just update what I created already in the second meeting dated October 24, 2020). The summary statistics function should: i) First determine the type of variable, ii). If it's numeric find & return mean, median, mode, variance, standard deviation, maximum value, minimum value, standard error, skewness, kurtosis and 95% quantile recorded to 2 decimal places as well as histogram plot of the numeric variable coloured by “green” colour. iii) Else if it is categorical, it should find & return percentages for all categories/levels (in 1 decimal place) and the name of the categories as a data frame as well as plot a pie chart with percentages for each category of the variable with different colours.

Creating a function to compute the mode

It is important to note that the mode may not be relevant for some numeric/quantitative variables, and it may not exist (or there may be more than one modal value). The mode of a categorical variable can sometimes be important.

In [44]:
# Creating the function to compute mode in R
getmode <- function(x) {
   uniq_x <- unique(x)
   return(uniq_x[which.max(tabulate(match(x, uniq_x)))])
}
In [47]:
getmode(Data_uni$Enrolment_Points)
458
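
When several values are tied for the highest frequency, a function based on `which.max()` reports only the first of them. The sketch below returns all tied values; `get_all_modes` and `scores` are hypothetical names introduced here for illustration.

```r
# Return every value that reaches the highest frequency
get_all_modes <- function(x) {
  uniq_x <- unique(x)
  counts <- tabulate(match(x, uniq_x))  # frequency of each unique value
  uniq_x[counts == max(counts)]
}

# Hypothetical scores in which 458 and 460 both occur twice
scores <- c(450, 458, 458, 460, 460)
get_all_modes(scores)
```

If every value occurs exactly once, this function returns all values, which is a sign that the mode is not informative for that variable.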
In [26]:
names(Data_uni)
  1. 'Major.program'
  2. 'Language'
  3. 'Type'
  4. 'Enrolment_Points'
  5. 'High.Level.Mathematics'
  6. 'Mathematics.points'
  7. 'Residence'
  8. 'Total_points'
  9. 'Mathematics_grade'
In [78]:
table(Data_uni$Mathematics_grade)
   Low class Medium class   High class 
        1751          149          431 
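
The counts above can be turned into percentages by hand, which is the calculation the summary function performs for categorical variables. A minimal sketch using the three counts printed above:

```r
# Counts taken from the table of Mathematics_grade
grade_counts <- c("Low class" = 1751, "Medium class" = 149, "High class" = 431)

# Percentage of each class, rounded to 1 decimal place
grade_percent <- round(100 * grade_counts / sum(grade_counts), 1)
print(grade_percent)
```

On a factor the same result can be obtained with `round(100 * prop.table(table(x)), 1)`.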

Creating own function to compute descriptive/summary statistics

The summary statistics function should:

  1. Determine the type of variable.
  2. If it's numeric, find & return mean, median, mode, standard deviation, standard error, skewness, kurtosis and 95% quantile rounded to 2 decimal places, as well as a histogram plot of the numeric variable coloured blue.
  3. Else if it's categorical, find & return percentages for all categories/levels (to 1 decimal place) and the names of the categories as a dataframe, as well as plot a pie chart with percentages for each category of the variable in different colours.
In [53]:
#Creating a function to estimate summary statistics of the data
#by determining the type of variable
#If it's numeric find & return mean, median, mode standard deviation, standard error, 
#skewness, kurtosis and 95% quantile recorded to 2 decimal places
#But if its categorical find & return percentages for all categories/levels 
#(in 1 decimal place) and the name of the categories as a dataframe

Summary_stats<-function(data, variable_index){
  Variable_name<-names(data)[variable_index]
  Variable<-data[,variable_index]

  if(is.numeric(Variable)){ #if variable is numeric/quantitative
    #compute mean, median, mode, standard deviation, standard error,
    #skewness, kurtosis and the 95% quantile range
    mean_value<-mean(Variable) #compute mean
    median_value<-median(Variable) #compute median
    modal_value<-getmode(Variable) #compute mode
    std<-sd(Variable) #compute standard deviation
    standard_error<-std/sqrt(length(Variable)) #compute standard error
    skewness<-skewness(Variable) #compute skewness
    kurtosis<-kurtosis(Variable) #compute kurtosis
    quantile_95percent<-quantile(Variable,c(0.025,0.975)) #compute 2.5% and 97.5% quantiles
    #plot the histogram; hist() also returns the histogram object
    graph<-hist(Variable,xlab=paste(Variable_name),col="blue", main="")
    #return the statistics rounded to 2 decimal places, plus the histogram object
    return(list(Variable_name=Variable_name,
                mean=round(mean_value,2),median=round(median_value,2),mode=modal_value,
                std=round(std,2),SE=round(standard_error,2),
                skewness=round(skewness,2),kurtosis=round(kurtosis,2),
                quantile_95percent=round(quantile_95percent,2),histogram=graph))

  } else if(is.factor(Variable)){ #else if categorical
    #compute the percentages rounded to 1 decimal place
    Percentage_values<-round((table(Variable)/dim(data)[1])*100,1)
    percentage<-paste(Percentage_values,"%")
    levels_variable<-levels(Variable)
    output<-data.frame(Categories=levels_variable,percentage=percentage) #storing output as dataframe

    #labels for the pie chart
    labels_variables<-paste(levels(Variable),":", Percentage_values) #add percentages to labels
    labels_variables<-paste(labels_variables,"%",sep="") #add % to labels

    #choose the colours depending on whether there are 2 or >=3 categories
    #NB: pie() draws the chart and returns NULL, so pie_chart is NULL in the output
    if(length(levels_variable)==2){
      colours_two_categories<-c("red","blue")
      pie_chart<-pie(x=Percentage_values, labels=labels_variables,radius=.7,cex=0.71,main="",
                     col=colours_two_categories,font=2,clockwise=TRUE,init.angle=90)
    } else if(length(levels_variable)>=3){
      colours_categories<-brewer.pal(n=length(Percentage_values), name="Paired")
      pie_chart<-pie(x=Percentage_values, labels=labels_variables,radius=.7,cex=0.71,main="",
                     col=colours_categories,font=2,clockwise=TRUE,init.angle=90)
    }

    #return variable name and a dataframe of percentages for each category
    return(list(Variable_name=Variable_name,output=output, pie_chart=pie_chart))
  }
}
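
The function above handles numeric and factor variables only. From R 4.0.0 onwards `read.csv()` no longer converts text columns to factors by default, so on a newer installation the text columns would be of class character and would match neither branch. A minimal sketch of converting such columns before calling the function; `toy_data` is a made-up data frame used for illustration.

```r
# Hypothetical toy data with one text column and one numeric column
toy_data <- data.frame(Language = c("Hungarian", "English", "Hungarian"),
                       Enrolment_Points = c(436, 486, 456),
                       stringsAsFactors = FALSE)

sapply(toy_data, class)   # Language is character at this point

# Convert every character column to a factor
is_text <- sapply(toy_data, is.character)
toy_data[is_text] <- lapply(toy_data[is_text], as.factor)

sapply(toy_data, class)   # Language is now a factor
```

An alternative, under the same assumption, is to pass `stringsAsFactors = TRUE` to `read.csv()` when importing the data.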
In [30]:
#Print index for each variable
for(i in 1:dim(Data_uni)[2]) print(paste(names(Data_uni)[i],"","Index=",i))
[1] "Major.program  Index= 1"
[1] "Language  Index= 2"
[1] "Type  Index= 3"
[1] "Enrolment_Points  Index= 4"
[1] "High.Level.Mathematics  Index= 5"
[1] "Mathematics.points  Index= 6"
[1] "Residence  Index= 7"
[1] "Total_points  Index= 8"
[1] "Mathematics_grade  Index= 9"
In [60]:
#Summary_stats(data=Data_uni, variable_index=4)

Finding the summary statistics for each variable

In [57]:
#1st variable
Summary_stats(data=Data_uni, variable_index=1)
$Variable_name
'Major.program'
$output
A data.frame: 11 × 2
Categories                       percentage
<fct>                            <fct>
Applied economics                6.4 %
Business and management          21.5 %
Business Informatics             12.1 %
Commerce and Marketing           7.7 %
Communication and Media Science  5.7 %
Finance and accounting           14.5 %
Human resources                  7.2 %
International Business           10.7 %
International relations          7.5 %
Political science                2.2 %
Sociology                        4.5 %
$pie_chart
NULL
In [58]:
#2nd variable
Summary_stats(data=Data_uni, variable_index=2)
$Variable_name
'Language'
$output
A data.frame: 2 × 2
Categories  percentage
<fct>       <fct>
English     10.6 %
Hungarian   89.4 %
$pie_chart
NULL
In [59]:
#3rd variable
Summary_stats(data=Data_uni, variable_index=3)
$Variable_name
'Type'
$output
A data.frame: 2 × 2
Categories  percentage
<fct>       <fct>
A           76.2 %
K           23.8 %
$pie_chart
NULL
In [60]:
#4th variable
Summary_stats(data=Data_uni, variable_index=4)
$Variable_name
[1] "Enrolment_Points"

$mean
[1] 437.38

$median
[1] 448

$mode
[1] 458

$std
[1] 39.19

$SE
[1] 0.81

$skewness
[1] -1.32

$kurtosis
[1] 1.55

$quantile_95percent
  2.5%  97.5% 
327.00 487.75 

$histogram
$breaks
 [1] 300 320 340 360 380 400 420 440 460 480 500

$counts
 [1]  42  51  59  71 135 214 398 649 561 151

$density
 [1] 0.0009009009 0.0010939511 0.0012655513 0.0015229515 0.0028957529
 [6] 0.0045903046 0.0085371085 0.0139210639 0.0120334620 0.0032389532

$mids
 [1] 310 330 350 370 390 410 430 450 470 490

$xname
[1] "Variable"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
In [61]:
#5th variable
Summary_stats(data=Data_uni, variable_index=5)
$Variable_name
'High.Level.Mathematics'
$output
A data.frame: 2 × 2
Categories  percentage
<fct>       <fct>
No          74.8 %
Yes         25.2 %
$pie_chart
NULL
In [62]:
#6th variable
Summary_stats(data=Data_uni, variable_index=6)
$Variable_name
[1] "Mathematics.points"

$mean
[1] 19.55

$median
[1] 0

$mode
[1] 0

$std
[1] 34.27

$SE
[1] 0.71

$skewness
[1] 1.24

$kurtosis
[1] -0.33

$quantile_95percent
 2.5% 97.5% 
    0    92 

$histogram
$breaks
 [1]   0  10  20  30  40  50  60  70  80  90 100

$counts
 [1] 1744    0    2    5   10   40   99  163  188   80

$density
 [1] 7.481767e-02 0.000000e+00 8.580009e-05 2.145002e-04 4.290004e-04
 [6] 1.716002e-03 4.247104e-03 6.992707e-03 8.065208e-03 3.432003e-03

$mids
 [1]  5 15 25 35 45 55 65 75 85 95

$xname
[1] "Variable"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
In [63]:
#7th variable
Summary_stats(data=Data_uni, variable_index=7)
$Variable_name
'Residence'
$output
A data.frame: 4 × 2
Categories  percentage
<fct>       <fct>
South       45.3 %
North       18.1 %
East        15.2 %
West        21.3 %
$pie_chart
NULL
In [64]:
#8th variable
Summary_stats(data=Data_uni, variable_index=8)
$Variable_name
[1] "Total_points"

$mean
[1] 456.93

$median
[1] 457

$mode
[1] 458

$std
[1] 56.75

$SE
[1] 1.18

$skewness
[1] -0.16

$kurtosis
[1] 0.22

$quantile_95percent
 2.5% 97.5% 
  327   568 

$histogram
$breaks
 [1] 300 320 340 360 380 400 420 440 460 480 500 520 540 560 580 600

$counts
 [1]  42  51  55  57 116 170 306 481 436 152 100 137 140  70  18

$density
 [1] 0.0009009009 0.0010939511 0.0011797512 0.0012226512 0.0024882025
 [6] 0.0036465036 0.0065637066 0.0103174603 0.0093522094 0.0032604033
[11] 0.0021450021 0.0029386529 0.0030030030 0.0015015015 0.0003861004

$mids
 [1] 310 330 350 370 390 410 430 450 470 490 510 530 550 570 590

$xname
[1] "Variable"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
In [79]:
#9th variable
Summary_stats(data=Data_uni, variable_index=9)
$Variable_name
'Mathematics_grade'
$output
A data.frame: 3 × 2
Categories    percentage
<fct>         <fct>
Low class     75.1 %
Medium class  6.4 %
High class    18.5 %
$pie_chart
NULL
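
Calling the function once per variable works, but the calls can also be generated in a loop. The sketch below shows the pattern with `lapply()`; `quick_summary` is a simplified, hypothetical stand-in for the full summary function and `toy_data` is made up.

```r
# Simplified stand-in: mean for numeric variables, percentages otherwise
quick_summary <- function(data, variable_index) {
  variable <- data[, variable_index]
  if (is.numeric(variable)) {
    round(mean(variable), 2)
  } else {
    round(100 * prop.table(table(variable)), 1)
  }
}

# Hypothetical toy data
toy_data <- data.frame(Type = factor(c("A", "A", "K", "A")),
                       Enrolment_Points = c(436, 486, 412, 456))

# One summary per column, stored in a list named after the variables
all_summaries <- lapply(seq_along(toy_data),
                        function(i) quick_summary(toy_data, i))
names(all_summaries) <- names(toy_data)
all_summaries
```

The same pattern should apply to the full function by replacing `quick_summary` with it and `toy_data` with the real data.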

6. Depict at least 8 figures/plots (it can be more than 8) using only ggplot2 package to describe the data the best you can as a Data scientist (example of the plots are line graphs, scatter plots, barplot, boxplot, histograms of numeric variables across different categories, barchart of categorical variables distinguishing between different categories variables, etc)

Plotting 8 plots using the ggplot2 package

http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html

NB: facet_wrap() and facet_grid() in the ggplot calls below split a plot across the categories of a variable, for example facet_grid(~Language) for the variable Language.
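
A minimal sketch of faceting on made-up data, showing the share of each program type within each language. `toy_data` is hypothetical. The sketch uses `after_stat(prop)`, which newer ggplot2 versions (3.4.0 onwards) recommend in place of the older `..prop..` notation.

```r
library(ggplot2)

# Hypothetical toy data: program type and language for six students
toy_data <- data.frame(
  Type = c("A", "A", "K", "A", "K", "K"),
  Language = c("Hungarian", "Hungarian", "Hungarian",
               "English", "English", "English")
)

# group = 1 makes the proportions add up to 100% within each panel
p <- ggplot(toy_data, aes(x = Type)) +
  geom_bar(aes(y = after_stat(prop), group = 1)) +
  facet_wrap(~Language) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Type of program", y = "Percentage")

print(p)
```

Each panel is drawn from the rows belonging to one category of the faceting variable, so the panels can be compared side by side.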

In [61]:
names(Data_uni)
  1. 'Major.program'
  2. 'Language'
  3. 'Type'
  4. 'Enrolment_Points'
  5. 'High.Level.Mathematics'
  6. 'Mathematics.points'
  7. 'Residence'
  8. 'Total_points'
  9. 'Mathematics_grade'
In [65]:
#NB: facet_wrap() and facet_grid() in the ggplot calls below split a plot
#across the categories of a variable,
#for example facet_grid(~Language) for the variable Language.

#First plot
ggplot(Data_uni, aes(x= Type,  group=Language)) + 
    geom_bar(aes(y = ..prop.., fill = factor(..x..)), stat="count") +
    geom_text(aes( label = scales::percent(..prop..),
                   y= ..prop.. ), stat= "count", vjust = -.5) +
    labs(y = "Percent", fill=" Type",caption="Clement's School on Advanced Data Analysis") +
    facet_grid(~Language) +
    scale_y_continuous(labels = scales::percent)+xlab(" Type of program")+ylab("Percentage")+ theme(legend.position = "none")
In [68]:
#2nd  ggplot
ggplot(Data_uni, aes(x=Residence,  group=Mathematics_grade)) + 
    geom_bar(aes(y = ..prop.., fill = factor(..x..))) +
    geom_text(aes( label = scales::percent(..prop..),
                   y= ..prop.. ), stat= "count", vjust = -.5) +
    labs(y = "Percent", fill="Residence",caption="Clement's School on Advanced Data Analysis") +
    facet_grid(~Mathematics_grade) +
    scale_y_continuous(labels = scales::percent)+xlab("Residence")+ylab("Percentage")+ theme(legend.position = "none")
In [71]:
#3rd & 4th ggplot

Type_levels<-c(levels(Data_uni$Type))
g1 <- ggplot(Data_uni, aes(x=Language, y=Mathematics.points,fill=Residence)) +
    geom_bar(aes(fill = factor(Type, levels=Type_levels)),position=position_dodge(), stat="identity")+
facet_wrap( ~ Mathematics_grade)+labs(fill="Type")+ylab("Mathematics points")



Type_levels<-c(levels(Data_uni$Type))
g2 <- ggplot(Data_uni, aes(x=Residence, y=Enrolment_Points,fill=Residence)) +
    geom_bar(aes(fill = factor(Type, levels=Type_levels)),position=position_dodge(), stat="identity")+
facet_wrap( ~ Mathematics_grade)+labs(fill="Type")+ylab("Enrolment points")+xlab("Residence")



grid.arrange(g1,g2,nrow=2,ncol=1)
In [75]:
max(Data_uni$Total_points)
590
In [78]:
#5th
ggplot(Data_uni, aes(x=Major.program, y=Total_points/1000)) +
    geom_bar(stat="identity", alpha=.6, width=.4,color="blue") +
    coord_flip() +xlab("Program Major") +ylab("Total points")+theme_bw()
In [82]:
#6th: boxplot of enrolment points by language, split by program type
ggplot(Data_uni,aes(x=Language, y=Enrolment_Points, fill=Type)) + 
    geom_boxplot() +
    xlab("Language of program") +ylab("Enrolment points")+
    labs(fill = "Program type",caption="Clement's School on Advanced Data Analysis")
In [151]:
#7th: boxplot of enrolment points by language, split by residence
ggplot(Data_uni,aes(x=Language, y=Enrolment_Points, fill=Residence)) + 
    geom_boxplot() +
    xlab("Language of program") +ylab("Enrolment points")+labs(fill = "Residence")

Smoothed line graph of Total points against Enrolment points without splitting by any additional variables

In [83]:
#Smoothed line graph of Total points against Enrolment points without splitting by any additional variables
Data_uni%>%
  mutate(type=fct_reorder(as.factor(Enrolment_Points),Total_points),
         prcode=fct_reorder(as.factor(Mathematics_grade),Total_points)) %>% 
  ggplot()+geom_smooth(aes(x=Enrolment_Points,y=Total_points),method = "auto") +
xlab("Enrolment points")+ylab("Total points")+ labs(colour = "Mathematics grade")
`geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

In [191]:
p1=Data_uni%>%
  mutate(type=fct_reorder(as.factor(Enrolment_Points),Total_points),
         prcode=fct_reorder(as.factor(Mathematics_grade),Total_points)) %>% 
  ggplot()+geom_smooth(aes(x=Enrolment_Points,y=Total_points,color=Mathematics_grade,group=Mathematics_grade),method = "auto") +
xlab("Enrolment points")+ylab("Total points")+ labs(colour = "Mathematics grade")



p2=ggplot(Data_uni, aes(x =Enrolment_Points,y =Total_points, color =Mathematics_grade)) + geom_point()+ 
labs(colour = "Mathematics grades")+xlab("Enrolment points")+ylab("Total points")


grid.arrange(p1,p2,nrow=2,ncol=1)
`geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

In [204]:
g1 <- ggplot(Data_uni, aes(x=Enrolment_Points))+ geom_density(aes(fill=factor(Residence)), alpha=0.8) + 
    labs(title="Density plot", 
         subtitle="Enrolment points grouped by Residence",
         caption="Clement's School on Advanced Data Analysis",
         x="Enrolment points",
         fill="Residence")


g2 <- ggplot(Data_uni, aes(x=Total_points))+ geom_density(aes(fill=factor(Mathematics_grade)), alpha=0.8) + 
    labs(title="Density plot", 
         subtitle="Total points grouped by Mathematics grade",
         caption="Clement's School on Advanced Data Analysis",
         x="Total points",
         fill="Mathematics grade")

grid.arrange(g1,g2,nrow=2,ncol=1)
In [148]:
names(Data_uni)
  1. 'Major.program'
  2. 'Language'
  3. 'Type'
  4. 'Enrolment_Points'
  5. 'High.Level.Mathematics'
  6. 'Mathematics.points'
  7. 'Residence'
  8. 'Total_points'
  9. 'Mathematics_grade'

Correlation matrix plot

In [210]:
Data=Data_uni[,c(4,6,8)] 
colnames(Data)=c("Enrolment points","Mathematics points","Total points")
chart.Correlation(Data, histogram=TRUE, pch=19)
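
The numbers behind a correlation matrix plot can also be obtained with the base function `cor()`. A minimal sketch, assuming Pearson correlation and using only the first six rows shown at the start of this notebook, so the values are for illustration and not the results for the full data:

```r
# Enrolment and Mathematics points of the first six students shown earlier
toy_scores <- data.frame(
  Enrolment_points = c(436, 486, 456, 431, 448, 476),
  Mathematics_points = c(0, 77, 86, 0, 0, 92)
)
toy_scores$Total_points <- toy_scores$Enrolment_points +
  toy_scores$Mathematics_points

# Pairwise Pearson correlations, rounded to 2 decimal places
cor_matrix <- round(cor(toy_scores, method = "pearson"), 2)
print(cor_matrix)
```

Because Total points is the sum of the other two variables, its correlation with each of them is expected to be high by construction.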