Advanced Data Analysis in R (organised by NUGS-China 2022)

Facilitator: Clement Twumasi (Postdoctoral Researcher, Oxford University Statistics Department, UK).

Date: July 1, 2022.

Personal Website: https://twumasiclement.wixsite.com/website

YouTube Channel on Advanced R programming videos: https://www.youtube.com/channel/UCxrpjLYi_Akmbc2QICDvKaQ/videos

YouTube_Screenshot.png

Objectives

  1. Brief introduction to mathematical/statistical programming and important info on data analyses (e.g., variables, data structures, inferential tests & model fitting).
  1. Description of R IDEs including installation & loading of packages (e.g., R studio, Jupyter Notebook).
  1. Setting working directory, importing and exporting data of different extensions (.CSV, .TXT, XLS/XLXS, SPSS, SAS, STATA, etc.).
  1. Sourcing of external Rscripts/Complex R Functions, descriptive summaries & data visualisations in R.
  1. Interesting Practice Task (Instructor will describe the problem): Using statistics to aid in forensic or crime detection investigation of mummy with the help of a GLM classification model.

1. Brief introduction to mathematical/statistical programming and important info on data analyses (e.g., variables, data structures, inferential tests & model fitting)

Inferential tests and model fitting in R

NB: The class/type of statistical tests and models to fit are predominantly dependent on the type/nature of your data.

  1. Statistical tests:
    • Mean tests (Parametric tests: Paired t-tests, independent t-test, ANOVA, Bonferroni pairwise comparison tests, Repeated Measures ANOVA, MANOVA,etc).
    • Median tests (Non-parametric tests: Wilcoxon sign test, Mann-Whitney, Kruskal-Wallis test, Bonferroni-Dunn's test, Friedman test, Multivariate Kruskal-Wallis test,etc).
    • Test of proportions for one or more groups, etc.
    • Correlation test/analysis (Using the informative correlation matrix plot from the PerformanceAnalytics package should be enough).

Link to some of the aforementioned univariate tests in R: https://rpubs.com/sujith/IS

Multivariate analysis in R: https://little-book-of-r-for-multivariate-analysis.readthedocs.io/en/latest/src/multivariateanalysis.html

http://www.sthda.com/english/wiki/manova-test-in-r-multivariate-analysis-of-variance

PCA, SVD, Correspondence Analysis and Multidimensional Scaling:

https://web.stanford.edu/class/bios221/labs/multivariate/lab_5_multivariate.html

  1. Mixed effect regression model (we shall fit a Generalized Linear Mixed Model based on an empirical).

Note some of the class of regression models available. Knowing which one is appropriate for any given data and research problem is very imperative.

  • GLM (OLS regression, binomial \& multinomial regression, poisson regression, quasi poisson regression, negative binomial regression, etc) [cross-sectional data]
  • Generalized Linear Mixed Models (for all types of dependent variables) [these class of models are for longitudinal data]
  • Generalized Additive Models (fixed and mixed-effect types; both cross-sectional \& longitudinal data) and Panel regression models [these class of models for longitudinal data]

Introduction to linear regression: https://www.machinelearningplus.com/machine-learning/complete-introduction-linear-regression-r/

OLS linear regression in R: http://r-statistics.co/Linear-Regression.html

Different class of linear models in R: https://www.statmethods.net/advstats/glm.html

  1. Time series analysis (we shall simulate a time series data, learn how to declare time series data and fit its model).

Time series analysis in R:

https://ourcodingclub.github.io/tutorials/time/

https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/

Other class of models: Machine learning algorithms

  • Machine learning algorithms (Classification tree, Random forest, Gradient Boosting Machine, etc.) [both cross-sectional \& longitudinal data]

Variable selection methods using penalized regression & other methods

  • Method of variable selection: Stepwise regression, Penalized regression (Ridge, LASSO and Elastic net), Recursive Feature Selection, etc.

Data dimension reduction methods

  • Principal Component Analysis--PCA, Factor Analysis--FA (which can either be Confirmatory or Exploratory FA), and Partial Least Square Regression, Multidimensional Scaling, Independent component analysis, t-distributed stochastic neighbor embedding, Uniform manifold approximation and projection (UMAP), etc.

https://rpubs.com/Saskia/520216

2. Description of R IDEs including installation & loading of packages (e.g., R studio, Jupyter Notebook).

In [9]:
###How to install R packages#####
#install.packages("parallel") #a package for parallel computing
#install.packages("glmnet") # a package for performing penalised or regularised regression models
#install.packages("ggplot2") #for other customised data visualisation
#install.packages("PerformanceAnalytics") #for customised correlation matrix plots

####Loading installed packages of interest#######
library("parallel")
library("glmnet")
library("ggplot2")
library("PerformanceAnalytics")
library("foreign")#for importing data such as SAS, Spss and Stata, etc.
library(haven) #for also importing data such as SAS, Spss and Stata, etc.
library("readxl") #package for loading excel data

3. Setting working directory, importing and exporting data of different extensions (.CSV, .TXT, XLS/XLXS, SPSS, SAS, STATA, etc.).

In [1]:
options(repr.plot.width=8, repr.plot.height=8,repr.plot.res = 300) #Setting plot size

#setting working directory to import and/or export data
setwd("C:/Users/user/Desktop/DataAnalysis_results_R/NUGSChina_R_class2022")

Importing CSV data

In [14]:
#Importing CSV data
MurderRates<-read.csv("MurderRates_data.csv")
head(MurderRates,n=10) # view first 10 rows
A data.frame: 10 × 8
rateconvictionsexecutionstimeincomelfpnoncaucsouthern
<dbl><dbl><dbl><int><dbl><dbl><dbl><fct>
119.250.2040.035 471.1051.20.321yes
2 7.530.3270.081 580.9248.50.224yes
3 5.660.4010.012 821.7250.80.127no
4 3.210.3180.0701002.1854.40.063no
5 2.800.3500.0622221.7552.40.021no
6 1.410.2830.1001642.2656.70.027no
7 6.180.2040.0501612.0754.60.139yes
812.150.2320.054 701.4352.70.218yes
9 1.340.1990.0862191.9252.30.008no
10 3.710.1380.000 811.8253.00.012no
In [15]:
tail(MurderRates,n=6) # view last 6 rows
A data.frame: 6 × 8
rateconvictionsexecutionstimeincomelfpnoncaucsouthern
<dbl><dbl><dbl><int><dbl><dbl><dbl><fct>
39 1.740.4180.0001042.0451.70.017no
4011.980.2820.032 911.5954.30.222yes
41 3.040.1940.0861992.0753.70.026no
42 0.850.3780.0001012.0054.70.012no
43 2.830.7570.0331091.8447.00.057yes
44 2.890.3560.0001172.0456.90.022no

Importing excel data directly without changing to CSV

In [12]:
#Importing excel data directly without changing to CSV
library("readxl")
Excel_data<- read_excel("Transformed_data.xlsx")
head(Excel_data)


Excel_data<- Excel_data[,-1]
head(Excel_data)
New names:
* `` -> ...1

A tibble: 6 × 4
...1ZcoresElementsLocations
<dbl><chr><chr><chr>
12.2171772203965099 CarbonA
2-8.0415754107645801E-2CarbonC
3-0.76969364645889404 CarbonE
40.14934354334276501 CarbonA
5-0.31017505155806502 CarbonC
6-0.31017505155806502 CarbonE
A tibble: 6 × 3
ZcoresElementsLocations
<chr><chr><chr>
2.2171772203965099 CarbonA
-8.0415754107645801E-2CarbonC
-0.76969364645889404 CarbonE
0.14934354334276501 CarbonA
-0.31017505155806502 CarbonC
-0.31017505155806502 CarbonE

Importing SPSS data into R

In [16]:
#Importing SPSS data into R
SPSS_data<- read.spss("Combined_data_SPSS.sav", use.value.label=TRUE, to.data.frame=TRUE)
head(SPSS_data)
re-encoding from UTF-8

A data.frame: 6 × 19
V1ExperienceX.Strabismus_surgeryX.Oculoplastic_surgeryX.Cataract_surgeryVR_surgeryLaser_surgeryExtraocular_surgical._competenceX.stereoacuity_level_extraocular_surgeryIntraocular_surgical_competencestereoacuity_level_intraocular_surgeryX.Extraocular_surgery_performedIntraocular_surgery_performedStereoacuity_measuredStereoCompLocationCategoryCataract
<dbl><fct><fct><fct><fct><fct><fct><fct><fct><fct><fct><fct><fct><fct><dbl><dbl><fct><dbl><dbl>
115-10 years Disagree Disagree Disagree Disagree Disagree No No stereoacuity No No stereoacuity NoYesNo 30Wales 11
221-5 years Agree Agree Agree Agree Agree Yes200 secs Yes80 secs NoYesYes11Wales 11
331-5 years Disagree Disagree Agree Agree Agree No No stereoacuity Yes200 secs NoYesYes01Wales 11
4410-15 years Disagree Disagree Agree Agree Agree No No stereoacuity Yes400 secs NoYesNo 31Wales 11
5515-20 years Agree Agree Agree Agree Agree Yes60 secs of arc or betterYes60 secs of arc or better NoYesYes11Wales 11
6615-20 years Agree Agree Agree Agree Agree Yes60 secs of arc or betterYes60 secs of arc or better NoYesYes21Northern 11

Importing Stata data with Haven package

In [18]:
#With Haven package 
#install.packaes("haven")
#library("haven")
ADP_data <- read_dta("ADP.dta")
head(ADP_data)
write.csv(ADP_data,"ADP_excel.csv")
A tibble: 6 × 632
versiondatestart_timehhidregion_idenumerator_idenumeratorconsentregiondistrict_id...fert_cost_soybean_3fertilizer_soybean_4fert_qty_soybean_4fert_up_soybean_4fert_cost_soybean_4durationhourstart_dateend_datesubmission_date
<chr><chr><chr><chr><dbl+lbl><dbl+lbl><chr><dbl+lbl><chr><dbl+lbl>...<dbl><chr><dbl><dbl><dbl><dbl><dbl><date><date><date>
21060121552-Jun-202111:40820959411Appiah Adjei Christina 1Upper East10...NANANANA303450.566672021-06-022021-06-042021-06-05
21060121552-Jun-202111:47646837412Mustapha Suraj Mohammed1Upper East10...NANANANA257642.933332021-06-022021-06-042021-06-04
21060121555-Jun-202112:26398199412Mustapha Suraj Mohammed1Upper East10...NANANANA408868.133332021-06-052021-06-082021-06-10
21060121553-Jun-202114:56674671412Mustapha Suraj Mohammed1Upper East10...NANANANA354859.133342021-06-032021-06-062021-06-06
21060121553-Jun-202111:24467917410John Azaaziba 1Upper East10...NANANANA497482.900002021-06-032021-06-062021-06-07
21060121554-Jun-202111:05127523412Mustapha Suraj Mohammed1Upper East10...NANANANA408468.066672021-06-042021-06-072021-06-08

Importing Stata data into R using package "foreign"

In [19]:
#Importing Stata data into R using package "foreign"
Stata_data <- read.dta("imm23.dta")
head(Stata_data )
A data.frame: 6 × 18
schidstuidsesmeanseshomeworkwhiteparentedpublicratiopercminmathsexracesctypecstrscsizeurbanregion
<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
16053 1 0.850.69977271140183502443312
26053 2 0.430.69977271130183432443312
36053 4-0.590.69977273030183502143312
4605311 1.020.69977271150183492443312
5605312 0.840.69977271150183621443312
6605313 1.320.69977271160183432443312

Importing SAS data into R

Run in SAS to convert data into CSV before importing into R (Long approach) :)

proc export data=dataset

outfile="datast.csv"

dbms=csv;

run;

And then, run this in R

df <- read.csv("dataset.csv",header=T,as.is=T)

Alternatively (simple approach) using package haven

In [44]:
#Alternatively (simple approach) using package haven
#library(haven)
SAS_data<- read_sas("imm10.sas7bdat")
head(SAS_data)
A tibble: 6 × 19
schidstuidsesmeanseshomeworkwhiteparentedpublicratiopercminmathsexracesctypecstrscsizeurbanregionschnum
<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
7472 3-0.13-0.482608711211904824123221
7472 8-0.39-0.482608701211904814123221
747213-0.80-0.482608701211905314123221
747217-0.72-0.482608711211904214123221
747227-0.74-0.482608721211904324123221
747228-0.58-0.482608711211905724123221

Declaring some variables in data called SAS_data as categorical with specific levels/categories

In [45]:
#Declaring some variables in data called SAS_data as categorical with specific levels/categories
SAS_data$sex<- factor(SAS_data$sex,levels=c(1,2),labels=c("male","female"))

head(SAS_data)
A tibble: 6 × 19
schidstuidsesmeanseshomeworkwhiteparentedpublicratiopercminmathsexracesctypecstrscsizeurbanregionschnum
<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><fct><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
7472 3-0.13-0.4826087112119048female4123221
7472 8-0.39-0.4826087012119048male 4123221
747213-0.80-0.4826087012119053male 4123221
747217-0.72-0.4826087112119042male 4123221
747227-0.74-0.4826087212119043female4123221
747228-0.58-0.4826087112119057female4123221
In [36]:
#Frequency across sex
cat("Frequencies across sex")
table(SAS_data$sex)

cat("Proprotions across sex")
table(SAS_data$sex)/sum(table(SAS_data$sex))
Frequencies across sex
  male female 
   132    128 
Proprotions across sex
     male    female 
0.5076923 0.4923077 
In [33]:
print(SAS_data$race)

paste("Frequencies across race")
table(as.factor(SAS_data$race))


paste("Percentages across race (%)")
(table(as.factor(SAS_data$race))/sum(table(as.factor(SAS_data$race))))*100
  [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 2 3 3 3 3
 [38] 3 3 3 3 3 3 4 4 1 4 4 4 4 4 4 4 1 4 4 4 4 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 [75] 4 4 4 4 4 4 4 4 4 4 4 4 4 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[112] 2 2 2 2 2 2 2 2 2 4 2 2 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 4 1 3 4 4
[149] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1 4 4 4 1 4 4 1 4
[186] 4 1 4 2 4 4 4 4 4 4 4 4 4 2 3 3 4 4 4 4 3 4 4 4 3 3 3 3 4 3 3 3 4 3 4 3 4
[223] 3 3 4 4 3 3 3 4 4 4 3 4 4 4 3 4 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[260] 4
attr(,"label")
[1] "race of student, 1=asian, 2=Hispanic, 3=Black, 4=White, 5=Native American"
'Frequencies across race'
  1   2   3   4 
  8  23  40 189 
'Percentages across race (%)'
        1         2         3         4 
 3.076923  8.846154 15.384615 72.692308 
In [46]:
SAS_data$race<- factor(SAS_data$race,levels=c(1,2,3,4),labels=c("asian","Hispanic","Black","White"))

print(SAS_data$race)
  [1] White    White    White    White    White    White    White    White   
  [9] White    White    White    White    White    White    White    White   
 [17] White    White    White    White    White    White    White    Black   
 [25] Black    Black    Black    Black    Black    Black    Black    Black   
 [33] Hispanic Black    Black    Black    Black    Black    Black    Black   
 [41] Black    Black    Black    White    White    asian    White    White   
 [49] White    White    White    White    White    asian    White    White   
 [57] White    White    asian    White    White    White    White    White   
 [65] White    White    White    White    White    White    White    White   
 [73] White    White    White    White    White    White    White    White   
 [81] White    White    White    White    White    White    White    Hispanic
 [89] White    White    White    White    White    White    White    White   
 [97] White    White    White    White    White    White    White    White   
[105] White    White    White    White    White    White    White    Hispanic
[113] Hispanic Hispanic Hispanic Hispanic Hispanic Hispanic Hispanic Hispanic
[121] White    Hispanic Hispanic Hispanic Hispanic Hispanic Hispanic Hispanic
[129] Hispanic Hispanic Hispanic White    White    White    White    White   
[137] White    White    White    White    White    White    White    White   
[145] asian    Black    White    White    White    White    White    White   
[153] White    White    White    White    White    White    White    White   
[161] White    White    White    White    White    White    White    White   
[169] White    White    White    White    White    White    White    White   
[177] asian    White    White    White    asian    White    White    asian   
[185] White    White    asian    White    Hispanic White    White    White   
[193] White    White    White    White    White    White    Hispanic Black   
[201] Black    White    White    White    White    Black    White    White   
[209] White    Black    Black    Black    Black    White    Black    Black   
[217] Black    White    Black    White    Black    White    Black    Black   
[225] White    White    Black    Black    Black    White    White    White   
[233] Black    White    White    White    Black    White    Black    White   
[241] White    White    White    White    White    White    White    White   
[249] White    White    White    White    White    White    White    White   
[257] White    White    White    White   
Levels: asian Hispanic Black White
In [38]:
#Crosstabulations
table(SAS_data$race,SAS_data$sex)
          
           male female
  asian       3      5
  Hispanic   11     12
  Black      20     20
  White      98     91
In [47]:
print(SAS_data$public)

SAS_data$public<- factor(SAS_data$public,levels=c(0,1),labels=c("public","non-public"))


head(SAS_data)
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[186] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[260] 1
attr(,"label")
[1] "Public school: 1=public, 0=non-public"
A tibble: 6 × 19
schidstuidsesmeanseshomeworkwhiteparentedpublicratiopercminmathsexracesctypecstrscsizeurbanregionschnum
<dbl><dbl><dbl><dbl><dbl><dbl><dbl><fct><dbl><dbl><dbl><fct><fct><dbl><dbl><dbl><dbl><dbl><dbl>
7472 3-0.13-0.4826087112non-public19048femaleWhite123221
7472 8-0.39-0.4826087012non-public19048male White123221
747213-0.80-0.4826087012non-public19053male White123221
747217-0.72-0.4826087112non-public19042male White123221
747227-0.74-0.4826087212non-public19043femaleWhite123221
747228-0.58-0.4826087112non-public19057femaleWhite123221
In [48]:
#Cross-tabulation
table(SAS_data$race,SAS_data$sex,SAS_data$public)
, ,  = public

          
           male female
  asian       3      2
  Hispanic    1      0
  Black       0      1
  White      32     28

, ,  = non-public

          
           male female
  asian       0      3
  Hispanic   10     12
  Black      20     19
  White      66     63

Exporting the updated "SAS_data" into working repository as "SAS_data_updatedCSV" (saved as CSV file)

NB: No package is required for that

In [52]:
#No package is required
#export SAS data as CSV 
write.csv(SAS_data,"SAS_data_updatedCSV.csv")

Exporting the updated "SAS_data" into working repository as "SAS_data_updatedExcel" (saved as Excel file)

NB: A package is required for that

In [50]:
#install.packages("writexl")
In [51]:
#export SAS data as excel
#Package "writexl" is required
writexl::write_xlsx(SAS_data,"SAS_data_updatedExcel.xlsx")

4. Sourcing of external Rscripts/Complex R Functions, descriptive summaries & data visualisations in R.

In [58]:
#Loading an external script called "Novel_summary_computation_script.R"
setwd("C:/Users/user/Desktop/DataAnalysis_results_R/NUGSChina_R_class2022")
source("Novel_summary_computation_script.R")

Instructor's (Clement) novel function created in R to compute descriptive/summary statistics (it will be developed into a package soon).

The summary statistics function should first:

  1. Determine the type of variable
  1. If it's numeric find & return mean, median, mode, standard deviation, standard error, skewness, kurtosis and 95% quantile recorded to 2 decimal places as well as histogram plot of the numeric variable unique colour respectively.
  1. Else if it's categorical, it should find & return percentages for all categories/levels (in 1 decimal place) and the name of the categories as a dataframe as well as plot a pie chart with percentages for each category of the variable with different colours

Summarizing a few of the variables using my novel function

NB: The data class must strictly be a dataframe.

In [71]:
head(SAS_data)

#Exclude data on ID
SAS_data<- SAS_data[, -1]

head(SAS_data)
A tibble: 6 × 19
schidstuidsesmeanseshomeworkwhiteparentedpublicratiopercminmathsexracesctypecstrscsizeurbanregionschnum
<dbl><dbl><dbl><dbl><dbl><dbl><dbl><fct><dbl><dbl><dbl><fct><fct><dbl><dbl><dbl><dbl><dbl><dbl>
7472 3-0.13-0.4826087112non-public19048femaleWhite123221
7472 8-0.39-0.4826087012non-public19048male White123221
747213-0.80-0.4826087012non-public19053male White123221
747217-0.72-0.4826087112non-public19042male White123221
747227-0.74-0.4826087212non-public19043femaleWhite123221
747228-0.58-0.4826087112non-public19057femaleWhite123221
A tibble: 6 × 18
stuidsesmeanseshomeworkwhiteparentedpublicratiopercminmathsexracesctypecstrscsizeurbanregionschnum
<dbl><dbl><dbl><dbl><dbl><dbl><fct><dbl><dbl><dbl><fct><fct><dbl><dbl><dbl><dbl><dbl><dbl>
3-0.13-0.4826087112non-public19048femaleWhite123221
8-0.39-0.4826087012non-public19048male White123221
13-0.80-0.4826087012non-public19053male White123221
17-0.72-0.4826087112non-public19042male White123221
27-0.74-0.4826087212non-public19043femaleWhite123221
28-0.58-0.4826087112non-public19057femaleWhite123221
In [62]:
class(SAS_data)
  1. 'tbl_df'
  2. 'tbl'
  3. 'data.frame'
In [64]:
#Converting SAS_data to data.frame
SAS_data_df<- as.data.frame(SAS_data)
In [65]:
#**Summarizing a few of the variables using my novel function**
Summary_stats(data=SAS_data_df, variable_index=2)
$Variable_name
'stuid'
$numerical_summaries
A data.frame: 1 × 9
meanmedianmodestdSEskewnesskurtosisquantile_95percent_lowerquantile_95percent_upper
<dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
49.87526828.941.79-0.05-1.21.4897
$histogram
NULL
In [66]:
Summary_stats(data=SAS_data_df, variable_index=8)
$Variable_name
'public'
$output
A data.frame: 2 × 2
Categoriespercentage
<fct><fct>
public 25.8 %
non-public74.2 %
$pie_chart
NULL
In [67]:
Summary_stats(data=SAS_data_df, variable_index=13)
$Variable_name
'race'
$output
A data.frame: 4 × 2
Categoriespercentage
<fct><fct>
asian 3.1 %
Hispanic8.8 %
Black 15.4 %
White 72.7 %
$pie_chart
NULL

A function created to extract all quantative variables of your data

In [68]:
#A function created to extract all quantative variables of your data
Quantitative_variabes<-function(data){
    
    Variable_name=list()
    for (variable_index in seq_along(names(data))){
     Variable<-(data)[,variable_index]   
  if(is.numeric(Variable)==TRUE){ #if variable is numeric/quantitative
   Variable_name[[variable_index]]<-names(data)[variable_index]
  }
        }
    return(unlist(Variable_name))
}
In [75]:
Quantitative_variabes(data=MurderRates)
  1. 'rate'
  2. 'convictions'
  3. 'executions'
  4. 'time'
  5. 'income'
  6. 'lfp'
  7. 'noncauc'

A function created to extract all categorical variables of your data

In [72]:
Categorical_variabes<-function(data){
    
    Variable_name=list()
    for (variable_index in seq_along(names(data))){
     Variable<-(data)[,variable_index]   
  if(is.numeric(Variable)==FALSE){ #if variable is numeric/quantitative
   Variable_name[[variable_index]]<-names(data)[variable_index]
  }
        }
    return(unlist(Variable_name))
}
In [76]:
Categorical_variabes(data=MurderRates)
'southern'

Correlation matrix plot based on MurderRates data

From PerformanceAnalytics R package

In [77]:
Data_numeric=MurderRates[ ,Quantitative_variabes(data=MurderRates)] 
#colnames(Data_numeric)=c("rate","convictions","executions","time","income","labour fp","Prop NC")
#method = c("pearson", "kendall", "spearman")
#options(warn=-1) #ignore warnings for instance with running non-parametric correlation due to ties
chart.Correlation(Data_numeric, histogram=TRUE, pch=19,method = c("pearson"))

5. Interesting Practice Task: Using statistics to aid in forensic or crime detection investigation of mummy with the help of a GLM classification model.

Description of the problem

NB: This is a published paper entitled: An Experimental Study of Lesions Observed in Bog Body Funerary Performances.

Authors: Tiffany Treadway and Clement Twumasi (2021).

URL Link: https://exarc.net/ark:/88735/10595

Fig%203.jpg

Importing the data which is saved as an SPSS file into R

In [78]:
#Importing SPSS data into R
Data<- read.spss("Data_task.sav", use.value.label=F, to.data.frame=TRUE)
head(Data)
re-encoding from UTF-8

A data.frame: 6 × 9
IDGroupsBMIArea_KnifeArea_SpearWidth_KnifeLength_KnifeWidth_SpearLength_Spear
<dbl><fct><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
11A18.013.378478.48960.5623.891.9240.88
21A18.029.988033.99961.2024.990.8739.08
31A18.040.133734.86361.7123.471.0234.18
41A18.027.891297.41161.2821.792.6237.18
52A19.156.124037.33001.8031.180.9437.33
62A19.151.714024.19591.8028.730.5941.01

Comparing the distribution of Knife and Spear areas across groups

Comparison between the differences in each group’s BMI and the produced incision area (height/width) were tested using a Multivariate analysis of variance (MANOVA). The BMI groups or fixed factors were divided into three ranges: 18-22.5 (Group A), 22.5-27 (Group B), and 27-31.5 (Group C).

In [82]:
#Install packages before loading/use
library("mvnormtest")
library("MVTests")
#For Non-parametric pairwise comparison
library("PMCMRplus")
library(ggpubr)
library(ggplot2)
library(gridExtra)

Multivariate normality test

In [84]:
# Multivariate normality test
#The data is multivariate normal
mshapiro.test(t(Data[, 4:5]))

shapiro.test(Data[, 4])
shapiro.test(Data[, 5])
	Shapiro-Wilk normality test

data:  Z
W = 0.95851, p-value = 0.03062
	Shapiro-Wilk normality test

data:  Data[, 4]
W = 0.9396, p-value = 0.003636
	Shapiro-Wilk normality test

data:  Data[, 5]
W = 0.87003, p-value = 7.058e-06
In [86]:
p1=ggqqplot(Data, "Area_Knife", facet.by = c("Groups")) +labs(x="",y="Sample quantile (Dagger)")

p2=ggqqplot(Data, "Area_Spear", facet.by = c("Groups")) +labs(x="Normal quantile",y="Sample quantile (Spear)")

grid.arrange(p1,p2,ncol=1)

Box's M test

Tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups

The hypotheses are defined as H0:The Covariance matrices are homogeneous and H1:The Covariance matrices are not homogeneous

In [89]:
results <- BoxM(data=Data[, 4:5],group=Data$Groups)
summary(results)
       Box's M Test 

Chi-Squared Value = 3.666222 , df = 6  and p-value: 0.722 

In [91]:
par(mfrow=c(2,1),mar=c(4,4,1,1))
boxplot(Data$Area_Knife~Data$Groups,
main = "",
ylim=c(0,150),
ylab=expression(paste("Incision area in mm"^"2"  )),        
xlab="",
names = c("Group A (BMI: 18-22.5)","Group B (BMI: 22.5-27)","Group C (BMI: 27-31.5)"),
las = 1,
col = c("red","blue","green"),
border = "black",
horizontal = F,
notch = F
)

mtext(text = "Dagger",
      side = 3, #side 2 = left
      line = 0,cex=1,font=2,adj =0.01)


boxplot(Data$Area_Spear~Data$Groups,
main = "",
ylim=c(0,200),
ylab=expression(paste("Incision area in mm"^"2    "  )),
xlab="",
names = c("Group A (BMI: 18-22.5)","Group B (BMI: 22.5-27)","Group C (BMI: 27-31.5)"),
las = 1,
col = c("red","blue","green"),
border = "black",
horizontal = F,
notch = F
)

mtext(text = "Spear",
      side = 3, #side 2 = left
      line = 0,cex=1,font=2,adj =0.01)

#18-22.5 (Group A), 22.5-27 (Group B), and 27-31.5 (Group C).

Implementing the Multivariate Kruskal Wallis (MKW) test

Creating a function for Multivariate Kruskal Wallis test

The function returns the test statistic and its p-value

In [92]:
multkw<- function(group,y,simplify=FALSE){
 ### sort data by group ###
    o<-order(group)
    group<-group[o]
    y<-as.matrix(y[o,])
    n<-length(group)
    k<-dim(y)[2]   #k=p
    
    if (dim(y)[1] != n)
    return("number of observations not equal to length of group")
    groupls<-unique(group)
    g<-length(groupls) #number of groups (Number of fish-parasite combination)#
    groupind<-sapply(groupls,"==",group) #group indicator#
    ni<-colSums(groupind) #num of subj of each group (Number of fish in each group)#
    r<-apply(y,2,rank) #corresponding rank variable (Parasite at each bodyparts)#
    
    ### calculation of statistic ###
    r.ik<-t(groupind)%*%r*(1/ni)  #gxp, mean rank of kth variate in ith group#
    m<- (n+1)/2 #expected value of rik#
    u.ik<-t(r.ik-m)
    U<-as.vector(u.ik)
    V<-1/(n-1)*t(r-m)%*%(r-m) #pooled within-group cov matrix
    Vstar<-Matrix::bdiag(lapply(1/ni,"*",V))
    W2<-as.numeric(t(U)%*%solve(Vstar)%*%U)
    
    ### return stat and p-value ###
   returnlist<-data.frame(statistic=W2,d.f.=k*(g-1),
   p.value=pchisq(W2,k*(g-1),lower.tail=F))
    
    if (simplify==TRUE) return (W2)
    else return (returnlist)
    }

Assignment 1:

Cut and paste the MKW test function in a seperate R script and call it in R using the source() function.

MKW test

In [94]:
MKW_result<- multkw(group=Data$Group,y=Data[,4:5],simplify=F)
MKW_result
A data.frame: 1 × 3
statisticd.f.p.value
<dbl><dbl><dbl>
2.63826540.6200599

Classification problem

Determing whether an injury was caused by a knife or a spear

In [137]:
#Importing the data
Weapon_classification<- read.csv(file="Training_Stabs.csv")
head(Weapon_classification,n=10)# view the first 10 columns
A data.frame: 10 × 4
IDBMIWeapon_TypeLengths
<int><dbl><fct><dbl>
1118.0knife23.89
2118.0knife24.99
3118.0knife23.47
4118.0knife21.79
5118.0spear40.88
6118.0spear39.08
7118.0spear34.18
8118.0spear37.18
9219.1knife31.18
10219.1knife28.73
In [96]:
min(Weapon_classification$BMI)
max(Weapon_classification$BMI)
18
31.5
In [97]:
names(Weapon_classification)#the variables of the data
  1. 'ID'
  2. 'BMI'
  3. 'Weapon_Type'
  4. 'Lengths'
In [98]:
table(Weapon_classification$Weapon_Type)
knife spear 
   64    64 

Re-categorizing BMI into 3 BMI statuses

In [138]:
#Re-categorizing BMI into 3 BMI statuses
Weapon_classification$BMI_status<- cut(Weapon_classification$BMI,
                                breaks=c(0,18,25.1,31.5),right=T,labels=c("A","B","C"))


#Create Weapon_type as 0 (Knife) and 1 (Spear)
re_categorize_func<- function(variable){
    var<- as.numeric(variable)
    for(i in seq_along(variable)){
       if(var[i]==1)   var[i]<- 0 #if a knife
       if(var[i]==2)  var[i]<- 1  # if a spear
        }
    return(var)
}

Weapon_classification$Weapon_type<- re_categorize_func(Weapon_classification$Weapon_Type)
Weapon_classification$Weapon_type<-factor(Weapon_classification$Weapon_type,
                                         levels=c(0,1),label=c("knife","spear"))

head(Weapon_classification,n=10)# view the first 10 columns
A data.frame: 10 × 6
IDBMIWeapon_TypeLengthsBMI_statusWeapon_type
<int><dbl><fct><dbl><fct><fct>
1118.0knife23.89Aknife
2118.0knife24.99Aknife
3118.0knife23.47Aknife
4118.0knife21.79Aknife
5118.0spear40.88Aspear
6118.0spear39.08Aspear
7118.0spear34.18Aspear
8118.0spear37.18Aspear
9219.1knife31.18Bknife
10219.1knife28.73Bknife
In [136]:
dim(Weapon_classification)#dimension of the data
  1. 128
  2. 7

Dividing the dataset into training and validation sets

In [159]:
#Establishing training and validation set
set.seed(1)# for reproducility
train_index <- caret::createDataPartition(Weapon_classification$Weapon_type, p=0.8, list=FALSE)
training_data<-Weapon_classification[train_index,] #training data 
Validation_data<-Weapon_classification[-train_index,]# Validation data

Fitting a binomial logistic regression based on the training data

Dependent variable: Weapon type

Independent variales: BMI and stab length

In [160]:
#Setting knife as the reference categories
training_data$Weapon_type_new<- relevel(training_data$Weapon_type,ref="knife")
training_data$BMI_status<- relevel(training_data$BMI_status,ref="A")
In [161]:
#Fitting logitic regression model
Logit_model<-glm(Weapon_type~BMI+Lengths,family="binomial",data=training_data)
summary(Logit_model)
Call:
glm(formula = Weapon_type ~ BMI + Lengths, family = "binomial", 
    data = training_data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.5689  -0.2412  -0.0040   0.2249   2.3400  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -16.43866    3.96530  -4.146 3.39e-05 ***
BMI          -0.12532    0.09535  -1.314    0.189    
Lengths       0.62810    0.12822   4.898 9.66e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 144.175  on 103  degrees of freedom
Residual deviance:  47.438  on 101  degrees of freedom
AIC: 53.438

Number of Fisher Scoring iterations: 7

Assessing the accuracy of the fitted model based on the validation data

Estimating the Gini coefficient (otained from the AUC of the ROC curve), the Kolmogorov Sminorv statistics and the percentage of correct classification (obtained from the confusion matrix) as accurcacy measures.

ROC means Receiver Operator Characteristic Curve

In [119]:
#loading R packages
library("ROCR")# loading the ROCR package to obtai the ROC curve
library(pROC)
In [162]:
pred_weapon<- predict(Logit_model, Validation_data,type="response")

## library(ROCR)
df=data.frame(predictions= pred_weapon,labels=Validation_data$Weapon_type)

#logistic regression
pred <- prediction(df$predictions, df$labels)
perf <- performance(pred,"tpr","fpr")
#plot(perf,col="green",main="ROC Curve using Validation Data",xlab="Specificity",ylab="Sensitivity")
#abline(0,1)#add a 45 degree line
In [163]:
pROC_obj <- roc(df$labels,df$predictions,
            smoothed = TRUE,
            # arguments for ci
            ci=TRUE, ci.alpha=0.9, stratified=FALSE,
            # arguments for plot
            plot=TRUE, auc.polygon=TRUE, max.auc.polygon=TRUE, grid=TRUE,
            print.auc=TRUE, show.thres=TRUE)


#sens.ci <- ci.se(pROC_obj)
#plot(sens.ci, type="shape", col="lightblue")
## Warning in plot.ci.se(sens.ci, type = "shape", col = "lightblue"): Low
## definition shape.
#plot(sens.ci, type="bars")
Setting levels: control = knife, case = spear

Setting direction: controls < cases

In [164]:
Specificity_sensitivity=data.frame(cbind(c(perf@y.values),c(perf@x.values)))
names(Specificity_sensitivity)=c("Sensitivity","Specificity")
head(Specificity_sensitivity)
A data.frame: 1 × 2
SensitivitySpecificity
<list><list>
10.00000000, 0.08333333, 0.16666667, 0.25000000, 0.33333333, 0.41666667, 0.41666667, 0.50000000, 0.58333333, 0.66666667, 0.75000000, 0.83333333, 0.83333333, 0.91666667, 1.00000000, 1.00000000, 1.00000000, 1.00000000, 1.00000000, 1.00000000, 1.00000000, 1.00000000, 1.00000000, 1.00000000, 1.000000000.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.08333333, 0.16666667, 0.16666667, 0.16666667, 0.25000000, 0.33333333, 0.41666667, 0.50000000, 0.58333333, 0.66666667, 0.75000000, 0.83333333, 0.91666667, 1.00000000
In [165]:
#Area under the curve
print(paste("Area under the  first ROC curve is:",performance(pred,"auc")@"y.values"))
[1] "Area under the  first ROC curve is: 0.9375"
In [166]:
Gini_Coefficient=function(AUROC){
    return((AUROC-0.5)/0.5)}
area_under_curve<- as.numeric(performance(pred,"auc")@"y.values")
print(paste("Gini Coefficient of ROC=",Gini_Coefficient(area_under_curve)))
[1] "Gini Coefficient of ROC= 0.875"
In [167]:
TPRfromROCR=unlist(perf@y.values)
FPRfromROCR=unlist(perf@x.values)
diff_TPRFPR=TPRfromROCR-FPRfromROCR
KS=max(diff_TPRFPR)

print(paste("Kolmogorov Sminorv Value for ROC curve is:", KS))
[1] "Kolmogorov Sminorv Value for ROC curve is: 0.833333333333333"
In [168]:
# use caret and compute a confusion matrix

#Spear=True since it was kept as reference
confusion_matrix<- table(Validation_data$Weapon_type, pred_weapon > 0.5)
confusion_matrix

accuracy= ((confusion_matrix[1]+confusion_matrix[4])/sum(confusion_matrix))*100

print(paste("The percentage of accurcate classification is","",accuracy,"%"))
       
        FALSE TRUE
  knife    10    2
  spear     2   10
[1] "The percentage of accurcate classification is  83.3333333333333 %"

Prediction from the fitted logistic model for an individual with unknown BMI.

$$P(spear)=\frac{e^{bo+\sum_{i=1}^{n} b_iX_i}}{1+e^{bo+\sum_{i=1}^{n} b_iX_i}}$$

Creating a function to estimate the probability of weapon being a spear from the binomial logitic model

In [169]:
coef(Logit_model)
(Intercept)
-16.4386629978852
BMI
-0.12532122351233
Lengths
0.628096922996127
In [170]:
Prob_spear=function(model,bmi, stab_length){
   b0=as.numeric(coef(model)[1])
   b1=as.numeric(coef(model)[2])
   b2=as.numeric(coef(model)[3])
  Numerator=exp(b0+(b1*bmi)+(b2*stab_length))
  return(Numerator/(1+Numerator))  #return probability in percent and in 1 decimal place
}
In [173]:
#Assuming values for the bmi are based on the empirical data
bmi_values<- seq(18,31.5,by=.5)
bmi_values

length_values_LM<- c(30,35,60)
length_values_HW<- c(7.5,15,20,21,30,31)
  1. 18
  2. 18.5
  3. 19
  4. 19.5
  5. 20
  6. 20.5
  7. 21
  8. 21.5
  9. 22
  10. 22.5
  11. 23
  12. 23.5
  13. 24
  14. 24.5
  15. 25
  16. 25.5
  17. 26
  18. 26.5
  19. 27
  20. 27.5
  21. 28
  22. 28.5
  23. 29
  24. 29.5
  25. 30
  26. 30.5
  27. 31
  28. 31.5
In [174]:
pred_spear_bmi_LM=pred_knife_bmi_LM=NULL
pred_spear_bmi_HW=pred_knife_bmi_HW=NULL
for(i in seq_along(length_values_LM)){
pred_spear_bmi_LM[[i]]<-Prob_spear(model=Logit_model,bmi=bmi_values, stab_length=length_values_LM[i])
pred_knife_bmi_LM[[i]]<- 1-pred_spear_bmi_LM[[i]]    
    }

for(i in seq_along(length_values_HW)){
pred_spear_bmi_HW[[i]]<-Prob_spear(model=Logit_model,bmi=bmi_values, stab_length=length_values_HW[i])
pred_knife_bmi_HW[[i]]<- 1-pred_spear_bmi_HW[[i]]    
    }


colour_LM<- c("blue","red","green")
#par(mfrow=c(2,1),mar=c(4,4,1,1))
plot(bmi_values,pred_spear_bmi_LM[[1]],type="l",lwd=3,
     xlab="Possible BMI values",ylab="Probability of a weapon being a spear relative to a dagger",main="Mummy Man",col=colour_LM[1]
    ,ylim=c(0,1),xlim=c(18,32))
abline(h=0.5,col="black",lwd=3)
text(28,0.5,"Probability=0.5",col="black",font=2,pos=3)

for(i in seq_along(length_values_LM)[-1]){
lines(bmi_values,pred_spear_bmi_LM[[i]],type="l",lwd=3,col=colour_LM[i])

}

text_LM<-c("wound length= 30mm","wound length= 35mm","wound length= 60mm")
legend(x=24,y=.8,text_LM,
       col=colour_LM,bty="n",cex=.9,box.lwd = 2,fill=colour_LM)

#plot(length_values,pred_spear_length,type="l",col="red",lwd=3,xlab="Different stab length values",ylab="Probability of a weapon being a spear")
#lines(bmi_values,pred_knife,type="l",col="red",lwd=3)

#legend("center",c("Spear","Knife"),col=c("blue","red"),bty="n",cex=.9,box.lwd = 2,fill=c("blue","red"))
In [175]:
Prob_LM<- as.data.frame(do.call("cbind",pred_spear_bmi_LM))
names(Prob_LM)<-text_LM
Prob_LM$bmi_values<- bmi_values
Prob_LM
write.csv(Prob_LM,"Predicted_probabilities_LM.csv")
A data.frame: 28 × 4
wound length= 30mmwound length= 35mmwound length= 60mmbmi_values
<dbl><dbl><dbl><dbl>
0.53704760.9640477118.0
0.52143740.9618116118.5
0.50578510.9594423119.0
0.49012150.9569326119.5
0.47447730.9542749120.0
0.45888300.9514616120.5
0.44336880.9484845121.0
0.42796430.9453354121.5
0.41269820.9420055122.0
0.39759820.9384859122.5
0.38269060.9347676123.0
0.36800060.9308410123.5
0.35355140.9266966124.0
0.33936500.9223247124.5
0.32546120.9177151125.0
0.31185820.9128577125.5
0.29857200.9077425126.0
0.28561690.9023592126.5
0.27300520.8966974127.0
0.26074710.8907471127.5
0.24885100.8844983128.0
0.23732340.8779410128.5
0.22616900.8710656129.0
0.21539090.8638631129.5
0.20499020.8563246130.0
0.19496710.8484418130.5
0.18531970.8402073131.0
0.17604540.8316142131.5
In [176]:
colour_HW<- c("green","yellow","orange","pink","blue","red")
#par(mfrow=c(2,1),mar=c(4,4,1,1))
plot(bmi_values,pred_spear_bmi_LM[[1]],type="l",lwd=3,
     xlab="Possible BMI values",ylab="Probability of a weapon being a spear relative to a dagger",main="Mummy Woman",col=colour_HW[1]
    ,ylim=c(0,.8),xlim=c(18,32))
abline(h=0.5,col="black",lwd=3)
text(28,0.5,"Probability=0.5",col="black",font=2,pos=3)

for(i in seq_along(length_values_HW)[-1]){
lines(bmi_values,pred_spear_bmi_HW[[i]],type="l",lwd=3,col=colour_HW[i])

}
text_HW<-c("wound length= 7.5mm","wound length= 15mm","wound length= 20mm",
                    "wound length= 21mm","wound length= 30mm","wound length= 31mm")
legend(x=24,y=.81,text_HW,
       col=colour_HW,bty="n",cex=.9,box.lwd = 2,fill=colour_HW)
In [178]:
Prob_HW<- as.data.frame(do.call("cbind",pred_spear_bmi_HW))
names(Prob_HW)<-text_HW
Prob_HW$bmi_values<- bmi_values
Prob_HW
write.csv(Prob_HW,"Predicted_probabilities_HW.csv")
A data.frame: 28 × 7
wound length= 7.5mmwound length= 15mmwound length= 20mmwound length= 21mmwound length= 30mmwound length= 31mmbmi_values
<dbl><dbl><dbl><dbl><dbl><dbl><dbl>
8.451777e-079.391832e-050.00216642850.00405230200.53704760.684938218.0
7.938435e-078.821443e-050.00203511210.00380711110.52143740.671262118.5
7.456272e-078.285692e-050.00191174020.00357670260.50578510.657289519.0
7.003395e-077.782477e-050.00179583380.00336019150.49012150.643038819.5
6.578024e-077.309821e-050.00168694280.00315674510.47447730.628530320.0
6.178490e-076.865869e-050.00158464400.00296557990.45888300.613786420.5
5.803222e-076.448878e-050.00148853940.00278595890.44336880.598830821.0
5.450747e-076.057211e-050.00139825520.00261718870.42796430.583688921.5
5.119681e-075.689330e-050.00131343980.00245861730.41269820.568387422.0
4.808723e-075.343791e-050.00123376280.00230963110.39759820.552954222.5
4.516652e-075.019237e-050.00115891360.00216965360.38269060.537418323.0
4.242320e-074.714394e-050.00108860040.00203814220.36800060.521809423.5
3.984651e-074.428065e-050.00102254880.00191458690.35355140.506157824.0
3.742632e-074.159125e-050.00096050110.00179850820.33936500.490494124.5
3.515313e-073.906519e-050.00090221500.00168945530.32546120.474849025.0
3.301801e-073.669254e-050.00084746280.00158700440.31185820.459253225.5
3.101257e-073.446399e-050.00079603080.00149075690.29857200.443736726.0
2.912893e-073.237079e-050.00074771780.00140033840.28561690.428329326.5
2.735971e-073.040472e-050.00070233490.00131539680.27300520.413059627.0
2.569794e-072.855806e-050.00065970480.00123560120.26074710.397955327.5
2.413710e-072.682355e-050.00061966060.00116064060.24885100.383042928.0
2.267107e-072.519439e-050.00058204570.00109022270.23732340.368347428.5
2.129408e-072.366417e-050.00054671280.00102407280.22616900.353892229.0
2.000072e-072.222690e-050.00051352370.00096193270.21539090.339699329.5
1.878592e-072.087691e-050.00048234850.00090355980.20499020.325788630.0
1.764491e-071.960892e-050.00045306500.00084872610.19496710.312178230.5
1.657319e-071.841794e-050.00042555850.00079721750.18531970.298884331.0
1.556657e-071.729929e-050.00039972130.00074883250.17604540.285921231.5
In [ ]: