Advanced Data Analysis in R organised by NUGS-China

Session: One

Date: 17/03/2021

Instructor: Clement Twumasi

Website: https://twumasiclement.wixsite.com/website

YouTube channel: https://www.youtube.com/channel/UCxrpjLYi_Akmbc2QICDvKaQ

Objectives

  1. Brief introduction to the R environment and its IDEs (base R, RStudio and Jupyter Notebook).

  2. Objects, variable types, arrays and matrices & installing R packages.

  3. Vector and matrix arithmetic (using R as a calculator).

  4. Setting the working directory and importing different data formats (.CSV, .TXT, XLS/XLSX, SPSS, SAS, STATA, etc.)

  5. Attaching and detaching variables & their effects

  6. Exporting data or results from R as a CSV file into the desired working directory.

Two PDF guides are also linked below in case you want to learn on your own:

1. https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf

2. https://www.tutorialspoint.com/r/r_tutorial.pdf

For very good free books on R, follow the link below for a list of recommended titles 👇

https://r-dir.com/learn/e-books.html

Setting working directory

In [7]:
setwd("C:/Users/user/Desktop/DataAnalysis_results_R/NUGSChina_R_class")
In [8]:
#Setting plot size (in inches) and resolution in Jupyter notebook
options(repr.plot.width=8, repr.plot.height=8, repr.plot.res=300)
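Before calling setwd(), it can help to confirm that the target folder exists and to remember the previous directory so it can be restored. The following is a minimal sketch only; the folder under tempdir() is a hypothetical stand-in for your own path, not the instructor's directory.

```r
# Hypothetical folder for illustration; replace with your own path
target_dir <- file.path(tempdir(), "NUGSChina_R_class")

# Create the folder if it does not exist yet
if (!dir.exists(target_dir)) {
  dir.create(target_dir, recursive = TRUE)
}

old_dir <- setwd(target_dir)  # setwd() invisibly returns the previous directory
print(getwd())                # confirm the new working directory
setwd(old_dir)                # restore the previous working directory
```

Storing the value returned by setwd() makes it easy to switch back once the work in the new folder is done.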

Installing and loading R packages

NB: You install a package only once; afterwards, you only need to load it with library() in each new R session where you want to use it
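The "install once, load every time" rule can be sketched as a small check that installs a package only when it is missing. This is an illustrative sketch: the package name "stats" is an assumption chosen because it ships with R, so nothing is actually downloaded here.

```r
# "stats" ships with R and stands in for any package name
# (an assumption made purely for illustration)
pkg <- "stats"

# Install only when the package is not already available
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package; character.only = TRUE is needed because pkg is a string
library(pkg, character.only = TRUE)

# Check that the package is now on the search path
loaded <- paste0("package:", pkg) %in% search()
print(loaded)
```

Replacing "stats" with, for example, "readxl" would give the same pattern for a package from CRAN.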

In [53]:
# install the R package "foreign" (used to import other data formats such as SPSS, Stata, etc.)
#NB: As the warning message below shows, this package is not available for my version of R
install.packages("foreign")

#NB: The package "haven" can also be used to import other data formats such as SPSS, Stata, etc.
Installing package into 'C:/Users/user/Documents/R/win-library/3.6'
(as 'lib' is unspecified)

Warning message:
"package 'foreign' is not available (for R version 3.6.1)"
In [17]:
x= 1:1000
mean(x)
sd(x)
#plot(x)
500.5
288.819436095749
In [9]:
install.packages("https://cran.r-project.org/src/contrib/Archive/foreign/foreign_0.8-76.tar.gz")
Installing package into 'C:/Users/user/Documents/R/win-library/3.6'
(as 'lib' is unspecified)

inferring 'repos = NULL' from 'pkgs'

Warning message in install.packages("https://cran.r-project.org/src/contrib/Archive/foreign/foreign_0.8-76.tar.gz"):
"installation of package 'C:/Users/user/AppData/Local/Temp/RtmpykxPs8/downloaded_packages/foreign_0.8-76.tar.gz' had non-zero exit status"
In [5]:
#install either the "devtools" or the "remotes" package to install R packages from GitHub
#install.packages("devtools")   
#install.packages("remotes")
#library(devtools) #loading "devtools" package

Installing the package "haven" from GitHub (author: Hadley Wickham)

In [19]:
#You can use: 
#install.packages("haven")

#Alternatively
#remotes::install_github("hadley/haven")
In [20]:
#Loading packages

library("foreign")
library("readxl") #package for loading excel data
library(haven) #package for importing SPSS, Stata and SAS data
In [21]:
2+3
5

Using R as a calculator & other arithmetic

Creating vectors and matrices

Creating vectors

In [28]:
multiples_of_3<- seq(3, 100, by=3)
multiples_of_3
  1. 3
  2. 6
  3. 9
  4. 12
  5. 15
  6. 18
  7. 21
  8. 24
  9. 27
  10. 30
  11. 33
  12. 36
  13. 39
  14. 42
  15. 45
  16. 48
  17. 51
  18. 54
  19. 57
  20. 60
  21. 63
  22. 66
  23. 69
  24. 72
  25. 75
  26. 78
  27. 81
  28. 84
  29. 87
  30. 90
  31. 93
  32. 96
  33. 99
In [26]:
x<- seq(1,20,length.out=100)
#x
print(x)
  [1]  1.000000  1.191919  1.383838  1.575758  1.767677  1.959596  2.151515
  [8]  2.343434  2.535354  2.727273  2.919192  3.111111  3.303030  3.494949
 [15]  3.686869  3.878788  4.070707  4.262626  4.454545  4.646465  4.838384
 [22]  5.030303  5.222222  5.414141  5.606061  5.797980  5.989899  6.181818
 [29]  6.373737  6.565657  6.757576  6.949495  7.141414  7.333333  7.525253
 [36]  7.717172  7.909091  8.101010  8.292929  8.484848  8.676768  8.868687
 [43]  9.060606  9.252525  9.444444  9.636364  9.828283 10.020202 10.212121
 [50] 10.404040 10.595960 10.787879 10.979798 11.171717 11.363636 11.555556
 [57] 11.747475 11.939394 12.131313 12.323232 12.515152 12.707071 12.898990
 [64] 13.090909 13.282828 13.474747 13.666667 13.858586 14.050505 14.242424
 [71] 14.434343 14.626263 14.818182 15.010101 15.202020 15.393939 15.585859
 [78] 15.777778 15.969697 16.161616 16.353535 16.545455 16.737374 16.929293
 [85] 17.121212 17.313131 17.505051 17.696970 17.888889 18.080808 18.272727
 [92] 18.464646 18.656566 18.848485 19.040404 19.232323 19.424242 19.616162
 [99] 19.808081 20.000000
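The two seq() calls above differ in what is fixed: by fixes the step, while length.out fixes the number of points and lets R compute the step. A short sketch of that relationship (the object names by_threes, grid and step are chosen here for illustration):

```r
by_threes <- seq(3, 100, by = 3)      # fixed step; stops at the last value <= 100
grid <- seq(1, 20, length.out = 100)  # fixed number of points; step is computed

print(length(by_threes))  # number of multiples of 3 up to 100
print(max(by_threes))     # last value generated

# With length.out, the step is (to - from) / (length.out - 1)
step <- diff(grid)[1]
print(all.equal(step, (20 - 1) / (100 - 1)))
```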
In [36]:
is.vector(multiples_of_3) # asking whether it is a vector

multiples_of_3
length(multiples_of_3)
#multiples_of_3
TRUE
  1. 3
  2. 6
  3. 9
  4. 12
  5. 15
  6. 18
  7. 21
  8. 24
  9. 27
  10. 30
  11. 33
  12. 36
  13. 39
  14. 42
  15. 45
  16. 48
  17. 51
  18. 54
  19. 57
  20. 60
  21. 63
  22. 66
  23. 69
  24. 72
  25. 75
  26. 78
  27. 81
  28. 84
  29. 87
  30. 90
  31. 93
  32. 96
  33. 99
33
In [29]:
print(multiples_of_3)
 [1]  3  6  9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75
[26] 78 81 84 87 90 93 96 99
In [39]:
x<-c(4,80,1,4,6)
x
  1. 4
  2. 80
  3. 1
  4. 4
  5. 6
In [40]:
x<- c(2,4, 6, 8, 10)
y<-c(3,6,9,12,15)
In [41]:
x/(y^2)
  1. 0.222222222222222
  2. 0.111111111111111
  3. 0.0740740740740741
  4. 0.0555555555555556
  5. 0.0444444444444444
In [11]:
3*y ## multiplies each value of y by 3
  1. 9
  2. 18
  3. 27
  4. 36
  5. 45
In [12]:
x+y #it adds element-wise
  1. 5
  2. 10
  3. 15
  4. 20
  5. 25
In [13]:
x-y #it subtracts element-wise
  1. -1
  2. -2
  3. -3
  4. -4
  5. -5
In [14]:
y/x #it divides element-wise
  1. 1.5
  2. 1.5
  3. 1.5
  4. 1.5
  5. 1.5
In [15]:
x*y #it multiplies element-wise
  1. 6
  2. 24
  3. 54
  4. 96
  5. 150
In [28]:
?sort
In [42]:
z<- c(5,8,2,4,7,10)
z
  1. 5
  2. 8
  3. 2
  4. 4
  5. 7
  6. 10
In [30]:
sort(z) # sorts the vector in ascending order
order(z) # returns the indices that would put the vector in ascending order
sort(z,decreasing = TRUE) # sorts the vector in descending order
rev(z) # reverses the order of the elements of z
  1. 2
  2. 4
  3. 5
  4. 7
  5. 8
  6. 10
  1. 3
  2. 4
  3. 1
  4. 5
  5. 2
  6. 6
  1. 10
  2. 8
  3. 7
  4. 5
  5. 4
  6. 2
  1. 10
  2. 7
  3. 4
  4. 2
  5. 8
  6. 5
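The link between sort() and order() is that indexing a vector by its own order() gives the sorted vector. The sketch below shows this and uses the same indices to rearrange a second vector; the labels vector is hypothetical and only there for illustration.

```r
z <- c(5, 8, 2, 4, 7, 10)

idx <- order(z)  # positions of the smallest, second smallest, ... values
print(z[idx])    # same result as sort(z)
print(identical(z[idx], sort(z)))

# order() is useful for sorting one vector by the values of another
labels <- c("a", "b", "c", "d", "e", "f")  # hypothetical labels for each value of z
sorted_labels <- labels[idx]
print(sorted_labels)
```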
In [19]:
Countries<- c("Ghana","UK","France")
Countries
Countries[1]
Countries[1:2]
Countries[3]
Countries[c(3,2)]
  1. 'Ghana'
  2. 'UK'
  3. 'France'
'Ghana'
  1. 'Ghana'
  2. 'UK'
'France'
  1. 'France'
  2. 'UK'

Creating matrices

$$A=\begin{pmatrix} 3 & 2 & 3 \\ 8 & 5 & 6 \\ 7 & 2 & 9 \end{pmatrix}$$

Method 1

In [45]:
mean_value<- 3
mean_value
3
In [46]:
c1<-c(3,8,7)
c2<-c(2,5,2)
c3<- c(3,6,9)
A1<- cbind(c1,c2,c3)
A1


#OR


A2 <- matrix(c(3,8,7,2,5,2,3,6,9), nrow=3, ncol=3, byrow=FALSE) # fill column by column
A2
A matrix: 3 × 3 of type dbl
     c1 c2 c3
[1,]  3  2  3
[2,]  8  5  6
[3,]  7  2  9
A matrix: 3 × 3 of type dbl
     [,1] [,2] [,3]
[1,]    3    2    3
[2,]    8    5    6
[3,]    7    2    9

Method 2

In [47]:
r1<-c(3,2,3); r2<- c(8,5,6); r3<- c(7,2,9)
A3<- rbind(r1,r2,r3)
A3

#OR

A4= matrix(c(3,2,3,8,5,6,7,2,9), nrow=3, ncol=3,byrow=TRUE)
A4
A matrix: 3 × 3 of type dbl
   [,1] [,2] [,3]
r1    3    2    3
r2    8    5    6
r3    7    2    9
A matrix: 3 × 3 of type dbl
     [,1] [,2] [,3]
[1,]    3    2    3
[2,]    8    5    6
[3,]    7    2    9
In [41]:
# NB: A1 = A4 would assign, i.e. overwrite A1 with a copy of A4
A1==A4 # A1 == A4 instead checks element-wise whether the entries of A1 and A4 are equal
A matrix: 3 × 3 of type lgl
       c1   c2   c3
[1,] TRUE TRUE TRUE
[2,] TRUE TRUE TRUE
[3,] TRUE TRUE TRUE
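The == comparison returns one TRUE/FALSE per entry. To reduce it to a single answer, one option is all(); note that identical() is stricter because it also compares attributes such as column names. A small sketch, rebuilding the two matrices so that it runs on its own (the names same_entries, same_object and same_unnamed are illustrative):

```r
A1 <- cbind(c1 = c(3, 8, 7), c2 = c(2, 5, 2), c3 = c(3, 6, 9))
A4 <- matrix(c(3, 2, 3, 8, 5, 6, 7, 2, 9), nrow = 3, ncol = 3, byrow = TRUE)

same_entries <- all(A1 == A4)              # TRUE: every entry matches
same_object  <- identical(A1, A4)          # FALSE: A1 carries column names, A4 does not
same_unnamed <- identical(unname(A1), A4)  # TRUE once the names are dropped

print(c(same_entries, same_object, same_unnamed))
```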

Matrix arithmetic

In [48]:
det(A1) #determinant of matrix A
-18
In [49]:
2*3
6
In [44]:
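A1_square<- A1%*%A4 # matrix product; A4 has the same entries as A1, so this is A squared
A1_square
A matrix: 3 × 3 of type dbl
      c1 c2  c3
[1,]  46 22  48
[2,] 106 53 108
[3,] 100 42 114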
10042114
In [45]:
solve(A1)# inverse of matrix A1
A matrix: 3 × 3 of type dbl
        [,1]       [,2]        [,3]
c1 -1.833333  0.6666667  0.16666667
c2  1.666667 -0.3333333 -0.33333333
c3  1.055556 -0.4444444  0.05555556
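A quick way to check an inverse is to multiply it back and compare with the identity matrix, allowing for floating-point rounding. The sketch below does this and also shows solve() with a right-hand side; the vector b is a hypothetical example, not part of the session data.

```r
A <- matrix(c(3, 2, 3, 8, 5, 6, 7, 2, 9), nrow = 3, ncol = 3, byrow = TRUE)

A_inv <- solve(A)                   # inverse of A
check <- A %*% A_inv                # should be the identity up to rounding error
print(all.equal(check, diag(3)))    # all.equal() compares with a small tolerance

# solve(A, b) solves the linear system A x = b without forming the inverse
b <- c(1, 2, 3)                     # hypothetical right-hand side
x <- solve(A, b)
print(all.equal(as.vector(A %*% x), b))
```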
In [51]:
eigen(A1)$values
  1. 14.2433660672058
  2. 3.15694111947913
  3. -0.400307186684956
In [49]:
eigen(A1) # returns the eigenvalues and eigenvectors

eigen(A1)$values # extract only the eigenvalues as a vector

eigen(A1)$vectors # extract only the eigenvectors as a matrix
eigen() decomposition
$values
[1] 14.2433661  3.1569411 -0.4003072

$vectors
           [,1]       [,2]       [,3]
[1,] -0.2988448 -0.2122669 -0.6984402
[2,] -0.6879522 -0.8181781  0.5982253
[3,] -0.6613725  0.5343476  0.3928203
  1. 14.2433660672058
  2. 3.15694111947913
  3. -0.400307186684956
A matrix: 3 × 3 of type dbl
           [,1]       [,2]       [,3]
[1,] -0.2988448 -0.2122669 -0.6984402
[2,] -0.6879522 -0.8181781  0.5982253
[3,] -0.6613725  0.5343476  0.3928203
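The eigen() output can be sanity-checked against three standard identities: A v = λ v for each eigenpair, the sum of the eigenvalues equals the trace, and their product equals the determinant. A minimal sketch, assuming (as in the output above) that all eigenvalues of A are real:

```r
A <- matrix(c(3, 2, 3, 8, 5, 6, 7, 2, 9), nrow = 3, ncol = 3, byrow = TRUE)

e <- eigen(A)
lambda <- e$values   # eigenvalues
V <- e$vectors       # eigenvectors stored as columns

# A %*% V scales each eigenvector column by its eigenvalue
print(all.equal(A %*% V, V %*% diag(lambda)))

print(all.equal(sum(lambda), sum(diag(A))))  # sum of eigenvalues = trace (3 + 5 + 9)
print(all.equal(prod(lambda), det(A)))       # product of eigenvalues = determinant
```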

Importing different data from your working directory

Note that it is often more convenient to convert Excel files to CSV before importing them into R

Importing CSV data

In [54]:
#Importing CSV data
MurderRates<-read.csv("MurderRates_data.csv")
head(MurderRates,n=10) # view first 10 rows
tail(MurderRates,n=6) # view last 6 rows
A data.frame: 10 × 8
    rate convictions executions  time income   lfp noncauc southern
   <dbl>       <dbl>      <dbl> <int>  <dbl> <dbl>   <dbl>    <fct>
 1 19.25       0.204      0.035    47   1.10  51.2   0.321      yes
 2  7.53       0.327      0.081    58   0.92  48.5   0.224      yes
 3  5.66       0.401      0.012    82   1.72  50.8   0.127       no
 4  3.21       0.318      0.070   100   2.18  54.4   0.063       no
 5  2.80       0.350      0.062   222   1.75  52.4   0.021       no
 6  1.41       0.283      0.100   164   2.26  56.7   0.027       no
 7  6.18       0.204      0.050   161   2.07  54.6   0.139      yes
 8 12.15       0.232      0.054    70   1.43  52.7   0.218      yes
 9  1.34       0.199      0.086   219   1.92  52.3   0.008       no
10  3.71       0.138      0.000    81   1.82  53.0   0.012       no
A data.frame: 6 × 8
    rate convictions executions  time income   lfp noncauc southern
   <dbl>       <dbl>      <dbl> <int>  <dbl> <dbl>   <dbl>    <fct>
39  1.74       0.418      0.000   104   2.04  51.7   0.017       no
40 11.98       0.282      0.032    91   1.59  54.3   0.222      yes
41  3.04       0.194      0.086   199   2.07  53.7   0.026       no
42  0.85       0.378      0.000   101   2.00  54.7   0.012       no
43  2.83       0.757      0.033   109   1.84  47.0   0.057      yes
44  2.89       0.356      0.000   117   2.04  56.9   0.022       no
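In the output above the text column southern is read as a factor. That was the default in R 3.6, but since R 4.0.0 read.csv() keeps text columns as character unless stringsAsFactors = TRUE is requested. The sketch below illustrates this with a small temporary file; the file and the object name toy are hypothetical, and only three columns of the first three rows shown above are reused.

```r
# Write a tiny CSV to a temporary file (a hypothetical stand-in for MurderRates_data.csv)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("rate,time,southern",
             "19.25,47,yes",
             "7.53,58,yes",
             "5.66,82,no"), csv_path)

# Ask explicitly for text columns to become factors
toy <- read.csv(csv_path, stringsAsFactors = TRUE)

str(toy)                      # structure: column types and first values
print(dim(toy))               # number of rows and columns
print(levels(toy$southern))   # factor levels are sorted alphabetically
```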

Importing excel data directly without changing to CSV

In [78]:
#Importing excel data directly without changing to CSV
library("readxl")
Excel_data<- read_excel("Transformed_data.xlsx")
head(Excel_data)


Excel_data<- Excel_data[,-1] # drop the first, unnamed index column (...1)
head(Excel_data)
New names:
* `` -> ...1

A tibble: 6 × 4
 ...1 Zcores                  Elements Locations
<dbl> <chr>                   <chr>    <chr>
    1 2.2171772203965099      Carbon   A
    2 -8.0415754107645801E-2  Carbon   C
    3 -0.76969364645889404    Carbon   E
    4 0.14934354334276501     Carbon   A
    5 -0.31017505155806502    Carbon   C
    6 -0.31017505155806502    Carbon   E
A tibble: 6 × 3
Zcores                  Elements Locations
<chr>                   <chr>    <chr>
2.2171772203965099      Carbon   A
-8.0415754107645801E-2  Carbon   C
-0.76969364645889404    Carbon   E
0.14934354334276501     Carbon   A
-0.31017505155806502    Carbon   C
-0.31017505155806502    Carbon   E
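Note that the Zcores column was imported as text (<chr>), so it cannot be used in calculations until it is converted. A minimal sketch of the conversion with as.numeric(), using a hand-built data frame (toy, a hypothetical stand-in for Excel_data) with the first three rows shown above:

```r
toy <- data.frame(
  Zcores = c("2.2171772203965099", "-8.0415754107645801E-2", "-0.76969364645889404"),
  Elements = "Carbon",
  Locations = c("A", "C", "E"),
  stringsAsFactors = FALSE
)

print(is.character(toy$Zcores))        # TRUE: numbers stored as text

toy$Zcores <- as.numeric(toy$Zcores)   # also understands scientific notation (E-2)
str(toy)
print(mean(toy$Zcores))                # arithmetic now works
```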

Importing SPSS data into R

In [63]:
#Importing SPSS data into R
SPSS_data<- read.spss("Combined_data_SPSS.sav", use.value.labels=TRUE, to.data.frame=TRUE)
head(SPSS_data)
re-encoding from UTF-8

A data.frame: 6 × 19 (one variable per line; values listed for rows 1 to 6)
V1 <dbl>: 1, 2, 3, 4, 5, 6
Experience <fct>: 5-10 years, 1-5 years, 1-5 years, 10-15 years, 15-20 years, 15-20 years
X.Strabismus_surgery <fct>: Disagree, Agree, Disagree, Disagree, Agree, Agree
X.Oculoplastic_surgery <fct>: Disagree, Agree, Disagree, Disagree, Agree, Agree
X.Cataract_surgery <fct>: Disagree, Agree, Agree, Agree, Agree, Agree
VR_surgery <fct>: Disagree, Agree, Agree, Agree, Agree, Agree
Laser_surgery <fct>: Disagree, Agree, Agree, Agree, Agree, Agree
Extraocular_surgical._competence <fct>: No, Yes, No, No, Yes, Yes
X.stereoacuity_level_extraocular_surgery <fct>: No stereoacuity, 200 secs, No stereoacuity, No stereoacuity, 60 secs of arc or better, 60 secs of arc or better
Intraocular_surgical_competence <fct>: No, Yes, Yes, Yes, Yes, Yes
stereoacuity_level_intraocular_surgery <fct>: No stereoacuity, 80 secs, 200 secs, 400 secs, 60 secs of arc or better, 60 secs of arc or better
X.Extraocular_surgery_performed <fct>: No, No, No, No, No, No
Intraocular_surgery_performed <fct>: Yes, Yes, Yes, Yes, Yes, Yes
Stereoacuity_measured <fct>: No, Yes, Yes, No, Yes, Yes
Stereo <dbl>: 3, 1, 0, 3, 1, 2
Comp <dbl>: 0, 1, 1, 1, 1, 1
Location <fct>: Wales, Wales, Wales, Wales, Wales, Northern
Category <dbl>: 1, 1, 1, 1, 1, 1
Cataract <dbl>: 1, 1, 1, 1, 1, 1

Importing Stata data

In [64]:
#Importing Stata data into R using package "foreign"
Stata_data <- read.dta("imm23.dta")
head(Stata_data)
A data.frame: 6 × 18
  schid stuid   ses   meanses homework white parented public ratio percmin  math   sex  race sctype  cstr scsize urban region
  <dbl> <dbl> <dbl>     <dbl>    <dbl> <dbl>    <dbl>  <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl>
1  6053     1  0.85 0.6997727        1     1        4      0    18       3    50     2     4      4     3      3     1      2
2  6053     2  0.43 0.6997727        1     1        3      0    18       3    43     2     4      4     3      3     1      2
3  6053     4 -0.59 0.6997727        3     0        3      0    18       3    50     2     1      4     3      3     1      2
4  6053    11  1.02 0.6997727        1     1        5      0    18       3    49     2     4      4     3      3     1      2
5  6053    12  0.84 0.6997727        1     1        5      0    18       3    62     1     4      4     3      3     1      2
6  6053    13  1.32 0.6997727        1     1        6      0    18       3    43     2     4      4     3      3     1      2

Importing SAS data into R

Run the following in SAS to export the data as a CSV file before importing it into R (the long approach) :)

proc export data=dataset
    outfile="dataset.csv"
    dbms=csv;
run;

Then run this in R:

df <- read.csv("dataset.csv", header=TRUE, as.is=TRUE)

In [71]:
#Alternatively (simple approach) using package haven
#library(haven)
SAS_data<- read_sas("imm10.sas7bdat")
head(SAS_data)
A tibble: 6 × 19
schid stuid   ses    meanses homework white parented public ratio percmin  math   sex  race sctype  cstr scsize urban region schnum
<dbl> <dbl> <dbl>      <dbl>    <dbl> <dbl>    <dbl>  <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl>
 7472     3 -0.13 -0.4826087        1     1        2      1    19       0    48     2     4      1     2      3     2      2      1
 7472     8 -0.39 -0.4826087        0     1        2      1    19       0    48     1     4      1     2      3     2      2      1
 7472    13 -0.80 -0.4826087        0     1        2      1    19       0    53     1     4      1     2      3     2      2      1
 7472    17 -0.72 -0.4826087        1     1        2      1    19       0    42     1     4      1     2      3     2      2      1
 7472    27 -0.74 -0.4826087        2     1        2      1    19       0    43     2     4      1     2      3     2      2      1
 7472    28 -0.58 -0.4826087        1     1        2      1    19       0    57     2     4      1     2      3     2      2      1
In [129]:
summary(SAS_data) #not recommended at this stage: coded categorical variables (e.g. sex, race) are summarised as if they were numeric
     schid           stuid            ses              meanses        
 Min.   : 7472   Min.   : 0.00   Min.   :-2.41000   Min.   :-1.06850  
 1st Qu.: 7930   1st Qu.:24.75   1st Qu.:-0.77000   1st Qu.:-0.50450  
 Median :25642   Median :52.00   Median :-0.14500   Median :-0.19682  
 Mean   :41024   Mean   :49.87   Mean   :-0.07331   Mean   :-0.07331  
 3rd Qu.:62821   3rd Qu.:73.25   3rd Qu.: 0.81250   3rd Qu.: 1.04463  
 Max.   :72292   Max.   :99.00   Max.   : 1.85000   Max.   : 1.04463  
    homework         white           parented         public      
 Min.   :0.000   Min.   :0.0000   Min.   :1.000   Min.   :0.0000  
 1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:2.000   1st Qu.:0.0000  
 Median :1.000   Median :1.0000   Median :3.000   Median :1.0000  
 Mean   :2.023   Mean   :0.7269   Mean   :3.177   Mean   :0.7423  
 3rd Qu.:3.000   3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:1.0000  
 Max.   :7.000   Max.   :1.0000   Max.   :6.000   Max.   :1.0000  
     ratio          percmin           math           sex             race      
 Min.   :10.00   Min.   :0.000   Min.   :31.0   Min.   :1.000   Min.   :1.000  
 1st Qu.:10.00   1st Qu.:1.000   1st Qu.:42.0   1st Qu.:1.000   1st Qu.:3.000  
 Median :14.00   Median :3.000   Median :49.5   Median :1.000   Median :4.000  
 Mean   :14.53   Mean   :2.835   Mean   :51.3   Mean   :1.492   Mean   :3.577  
 3rd Qu.:18.00   3rd Qu.:5.000   3rd Qu.:62.0   3rd Qu.:2.000   3rd Qu.:4.000  
 Max.   :22.00   Max.   :7.000   Max.   :71.0   Max.   :2.000   Max.   :4.000  
     sctype           cstr           scsize          urban      
 Min.   :1.000   Min.   :2.000   Min.   :2.000   Min.   :1.000  
 1st Qu.:1.000   1st Qu.:3.000   1st Qu.:2.000   1st Qu.:1.000  
 Median :1.000   Median :4.000   Median :3.000   Median :1.000  
 Mean   :1.773   Mean   :3.562   Mean   :3.269   Mean   :1.773  
 3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:3.250   3rd Qu.:3.000  
 Max.   :4.000   Max.   :5.000   Max.   :6.000   Max.   :3.000  
     region          schnum      
 Min.   :1.000   Min.   : 1.000  
 1st Qu.:2.000   1st Qu.: 3.000  
 Median :2.000   Median : 6.000  
 Mean   :2.331   Mean   : 5.688  
 3rd Qu.:3.000   3rd Qu.: 7.000  
 Max.   :3.000   Max.   :10.000  
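One likely reason summary() is flagged as not recommended here is that categorical variables stored as numeric codes get a mean and quartiles, which have no useful interpretation (for example, Mean 1.492 for sex). The sketch below contrasts the two cases on a short vector; sex_code holds the first six sex codes printed later in this session, and the object names are illustrative.

```r
# First six sex codes from the data: 1 = male, 2 = female
sex_code <- c(2, 1, 1, 1, 2, 2)

print(summary(sex_code))  # min, quartiles and mean of the codes: not meaningful

# Converting the codes to a factor makes summary() report counts per category
sex_fct <- factor(sex_code, levels = c(1, 2), labels = c("male", "female"))
counts <- summary(sex_fct)
print(counts)
```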

Attaching and detaching variables/data using the SAS data

In [60]:
print(SAS_data$race) #print the variable race in the SAS data
  [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 2 3 3 3 3
 [38] 3 3 3 3 3 3 4 4 1 4 4 4 4 4 4 4 1 4 4 4 4 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [75] 4 4 4 4 4 4 4 4 4 4 4 4 4 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[112] 2 2 2 2 2 2 2 2 2 4 2 2 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 4 1 3 4 4
[149] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4 4 1 4 4 1 4
[186] 4 1 4 2 4 4 4 4 4 4 4 4 4 2 3 3 4 4 4 4 3 4 4 4 3 3 3 3 4 3 3 3 4 3 4 3 4
[223] 3 3 4 4 3 3 3 4 4 4 3 4 4 4 3 4 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[260] 4
attr(,"label")
[1] "race of student, 1=asian, 2=Hispanic, 3=Black, 4=White, 5=Native American"
In [83]:
print(SAS_data$sex)
  [1] 2 1 1 1 2 2 2 1 2 2 2 1 1 2 2 2 2 2 2 1 2 1 2 2 1 1 1 2 1 1 1 1 1 1 2 1 2
 [38] 1 1 2 1 2 2 2 1 2 1 2 2 1 1 2 1 2 1 1 2 2 2 1 1 1 2 2 2 1 2 1 1 2 1 1 1 2
 [75] 2 1 2 2 1 2 2 2 2 1 1 2 1 1 1 2 2 2 1 2 1 1 2 2 1 1 1 1 1 2 1 1 2 1 1 2 2
[112] 1 2 2 2 1 2 2 2 2 1 2 1 1 1 1 2 2 2 2 1 1 2 1 2 2 2 2 2 2 1 1 1 1 2 2 1 1
[149] 2 2 2 2 1 2 1 1 2 1 2 2 2 2 1 1 1 2 1 2 2 1 1 1 2 1 1 1 1 2 1 1 1 2 2 1 1
[186] 1 2 1 1 2 2 1 2 1 1 2 1 1 1 1 2 1 2 1 1 2 1 2 2 1 2 2 1 1 1 2 2 2 2 2 1 1
[223] 2 1 2 2 1 2 1 1 2 1 2 1 2 1 2 1 2 2 2 1 2 2 1 1 1 2 2 1 1 2 1 1 2 1 1 1 1
[260] 2
attr(,"label")
[1] "Sex: 1=male, 2=female"
In [63]:
levels(as.factor(SAS_data$sex))
is.factor(SAS_data$sex)
  1. '1'
  2. '2'
FALSE
In [80]:
levels(as.factor(SAS_data$sex))
  1. '1'
  2. '2'
In [72]:
SAS_data$sex<- factor(SAS_data$sex,levels=c(1,2),labels=c("male","female"))

head(SAS_data)
A tibble: 6 × 19
schid stuid   ses    meanses homework white parented public ratio percmin  math    sex  race sctype  cstr scsize urban region schnum
<dbl> <dbl> <dbl>      <dbl>    <dbl> <dbl>    <dbl>  <dbl> <dbl>   <dbl> <dbl>  <fct> <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl>
 7472     3 -0.13 -0.4826087        1     1        2      1    19       0    48 female     4      1     2      3     2      2      1
 7472     8 -0.39 -0.4826087        0     1        2      1    19       0    48   male     4      1     2      3     2      2      1
 7472    13 -0.80 -0.4826087        0     1        2      1    19       0    53   male     4      1     2      3     2      2      1
 7472    17 -0.72 -0.4826087        1     1        2      1    19       0    42   male     4      1     2      3     2      2      1
 7472    27 -0.74 -0.4826087        2     1        2      1    19       0    43 female     4      1     2      3     2      2      1
 7472    28 -0.58 -0.4826087        1     1        2      1    19       0    57 female     4      1     2      3     2      2      1
In [73]:
table(SAS_data$sex)
  male female 
   132    128 
In [132]:
print(SAS_data$race)
  [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 2 3 3 3 3
 [38] 3 3 3 3 3 3 4 4 1 4 4 4 4 4 4 4 1 4 4 4 4 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [75] 4 4 4 4 4 4 4 4 4 4 4 4 4 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[112] 2 2 2 2 2 2 2 2 2 4 2 2 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 4 1 3 4 4
[149] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4 4 1 4 4 1 4
[186] 4 1 4 2 4 4 4 4 4 4 4 4 4 2 3 3 4 4 4 4 3 4 4 4 3 3 3 3 4 3 3 3 4 3 4 3 4
[223] 3 3 4 4 3 3 3 4 4 4 3 4 4 4 3 4 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[260] 4
attr(,"label")
[1] "race of student, 1=asian, 2=Hispanic, 3=Black, 4=White, 5=Native American"
In [133]:
table(as.factor(SAS_data$race))
  1   2   3   4 
  8  23  40 189 
In [134]:
SAS_data$race<- factor(SAS_data$race,levels=c(1,2,3,4),labels=c("asian","Hispanic","Black","White"))
print(SAS_data$race)
  [1] White    White    White    White    White    White    White    White   
  [9] White    White    White    White    White    White    White    White   
 [17] White    White    White    White    White    White    White    Black   
 [25] Black    Black    Black    Black    Black    Black    Black    Black   
 [33] Hispanic Black    Black    Black    Black    Black    Black    Black   
 [41] Black    Black    Black    White    White    asian    White    White   
 [49] White    White    White    White    White    asian    White    White   
 [57] White    White    asian    White    White    White    White    White   
 [65] White    White    White    White    White    White    White    White   
 [73] White    White    White    White    White    White    White    White   
 [81] White    White    White    White    White    White    White    Hispanic
 [89] White    White    White    White    White    White    White    White   
 [97] White    White    White    White    White    White    White    White   
[105] White    White    White    White    White    White    White    Hispanic
[113] Hispanic Hispanic Hispanic Hispanic Hispanic Hispanic Hispanic Hispanic
[121] White    Hispanic Hispanic Hispanic Hispanic Hispanic Hispanic Hispanic
[129] Hispanic Hispanic Hispanic White    White    White    White    White   
[137] White    White    White    White    White    White    White    White   
[145] asian    Black    White    White    White    White    White    White   
[153] White    White    White    White    White    White    White    White   
[161] White    White    White    White    White    White    White    White   
[169] White    White    White    White    White    White    White    White   
[177] asian    White    White    White    asian    White    White    asian   
[185] White    White    asian    White    Hispanic White    White    White   
[193] White    White    White    White    White    White    Hispanic Black   
[201] Black    White    White    White    White    Black    White    White   
[209] White    Black    Black    Black    Black    White    Black    Black   
[217] Black    White    Black    White    Black    White    Black    Black   
[225] White    White    Black    Black    Black    White    White    White   
[233] Black    White    White    White    Black    White    Black    White   
[241] White    White    White    White    White    White    White    White   
[249] White    White    White    White    White    White    White    White   
[257] White    White    White    White   
Levels: asian Hispanic Black White
In [135]:
table(SAS_data$race,SAS_data$sex)
          
           male female
  asian       3      5
  Hispanic   11     12
  Black      20     20
  White      98     91
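Raw counts in a cross-tabulation are often easier to read with totals and proportions added. The sketch below rebuilds the table above by hand from the printed counts (so it does not need SAS_data) and applies addmargins() and prop.table(); the object names are illustrative.

```r
# Counts copied from the race-by-sex table above
tab <- as.table(matrix(c(3, 11, 20, 98,    # male column
                         5, 12, 20, 91),   # female column
                       nrow = 4,
                       dimnames = list(race = c("asian", "Hispanic", "Black", "White"),
                                       sex = c("male", "female"))))

with_totals <- addmargins(tab)         # adds row and column sums
print(with_totals)

row_props <- prop.table(tab, margin = 1)  # proportions within each race (rows sum to 1)
print(round(row_props, 2))
```

With the real data, the same two calls can be applied directly to table(SAS_data$race, SAS_data$sex).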
In [75]:
attach(SAS_data)

print(sex)
  [1] female male   male   male   female female female male   female female
 [11] female male   male   female female female female female female male  
 [21] female male   female female male   male   male   female male   male  
 [31] male   male   male   male   female male   female male   male   female
 [41] male   female female female male   female male   female female male  
 [51] male   female male   female male   male   female female female male  
 [61] male   male   female female female male   female male   male   female
 [71] male   male   male   female female male   female female male   female
 [81] female female female male   male   female male   male   male   female
 [91] female female male   female male   male   female female male   male  
[101] male   male   male   female male   male   female male   male   female
[111] female male   female female female male   female female female female
[121] male   female male   male   male   male   female female female female
[131] male   male   female male   female female female female female female
[141] male   male   male   male   female female male   male   female female
[151] female female male   female male   male   female male   female female
[161] female female male   male   male   female male   female female male  
[171] male   male   female male   male   male   male   female male   male  
[181] male   female female male   male   male   female male   male   female
[191] female male   female male   male   female male   male   male   male  
[201] female male   female male   male   female male   female female male  
[211] female female male   male   male   female female female female female
[221] male   male   female male   female female male   female male   male  
[231] female male   female male   female male   female male   female female
[241] female male   female female male   male   male   female female male  
[251] male   female male   male   female male   male   male   male   female
Levels: male female
In [76]:
detach(SAS_data) # detach() can also be used to unload an attached R package

print(sex)
Error in print(sex): object 'sex' not found
Traceback:

1. print(sex)
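As the error shows, variables are only reachable by their bare names while the data are attached. A common alternative that avoids attach() and detach() altogether is with(), which looks up the variable names inside the data frame for a single expression only. A minimal sketch on a hypothetical three-row data frame (toy), using the sex and math values of the first three rows shown earlier:

```r
toy <- data.frame(sex = c("female", "male", "male"),
                  math = c(48, 48, 53))

# Variables are visible only inside the with() call; nothing stays attached
avg_math <- with(toy, mean(math))
print(avg_math)

# The same idea for a grouped summary: mean math score by sex
by_sex <- with(toy, tapply(math, sex, mean))
print(by_sex)
```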
In [136]:
head(SAS_data)
A tibble: 6 × 19
schid stuid   ses    meanses homework white parented public ratio percmin  math    sex  race sctype  cstr scsize urban region schnum
<dbl> <dbl> <dbl>      <dbl>    <dbl> <dbl>    <dbl>  <dbl> <dbl>   <dbl> <dbl>  <fct> <fct>  <dbl> <dbl>  <dbl> <dbl>  <dbl>  <dbl>
 7472     3 -0.13 -0.4826087        1     1        2      1    19       0    48 female White      1     2      3     2      2      1
 7472     8 -0.39 -0.4826087        0     1        2      1    19       0    48   male White      1     2      3     2      2      1
 7472    13 -0.80 -0.4826087        0     1        2      1    19       0    53   male White      1     2      3     2      2      1
 7472    17 -0.72 -0.4826087        1     1        2      1    19       0    42   male White      1     2      3     2      2      1
 7472    27 -0.74 -0.4826087        2     1        2      1    19       0    43 female White      1     2      3     2      2      1
 7472    28 -0.58 -0.4826087        1     1        2      1    19       0    57 female White      1     2      3     2      2      1

Exporting the updated "SAS_data" into the working directory as "SAS_data_updated" (saved as a CSV file)

In [139]:
write.csv(SAS_data,"SAS_data_updated.csv")
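By default write.csv() also writes the row names as an extra first column, which shows up as an unnamed index column when the file is read back. Passing row.names = FALSE avoids that. A minimal sketch using a hypothetical two-row data frame (toy) and a temporary folder instead of the real working directory:

```r
toy <- data.frame(schid = c(7472, 7472),
                  sex = factor(c("female", "male")))

# Hypothetical output location for illustration
out_path <- file.path(tempdir(), "toy_updated.csv")

write.csv(toy, out_path, row.names = FALSE)  # do not write the row names column

back <- read.csv(out_path)  # read the file back to check what was written
print(names(back))
print(back)
```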