R CODES FOR THE 4TH MEETING/CLASS

Below is the outline of tomorrow’s class

  1. A brief introduction to looping (for loops and while loops) and the use of apply functions in R (lapply, sapply, etc.). How to store results from loops as a vector, list or data frame for use later.

  2. Introduction to simulating probability distributions (binomial, Poisson, uniform, normal, lognormal and exponential probability distribution, amongst others).We shall introduce Monte Carlo Simulation for a throw or toss of a fair die.

Next meeting outline:

We shall prove the most famous theory in Statistics called the Central Limit Theory as well based on these simulations.

Introduction to two important methods of estimation: Bootstrapping and Jacknife estimation techniques.

NB: Assignment tasks based on sections 1 to 3 will be given at the end of the class. You may still need knowledge of our first 3 meetings (already hosted on YouTube) while doing the assigned tasks. The assignment will be put on Piazza as well including all R scripts that I will write for tomorrow’s class/meeting.

1a. Introduction to loops (for loop & while loop) in R

NB: Avoid loops if not needed. Only use them when necessary.

You don't need a loop to add, substract, multiple or divide entries of two or more vectors.

Addition without loops

In [262]:
#Addition without loops
x <- 1:10
y <- 21:30

x+y

#y-x

#y/x
  1. 22
  2. 24
  3. 26
  4. 28
  5. 30
  6. 32
  7. 34
  8. 36
  9. 38
  10. 40

For loops

The syntax for for loop is as follows:

for (val in sequence)

{

statement

}

Addition with for loops as saving results as a vector

In [263]:
results1<-numeric(length(x)) #or results1<-rep(NA,length(x)) or results1<-rep(FALSE,length(x))

for(i in seq_along(x)){
 results1[i]<- x[i]+y[i]
 #print(results[i])
}


results1
  1. 22
  2. 24
  3. 26
  4. 28
  5. 30
  6. 32
  7. 34
  8. 36
  9. 38
  10. 40
In [42]:
#seq_along(x) returns all the indecies of x
results2<-rep(FALSE,length(x)) #or results<-rep(NA,length(x)) or results<-rep(FALSE,length(x))

for(i in seq_along(x)){
 results2[i]<- x[i]+y[i]  
}

results2
  1. 22
  2. 24
  3. 26
  4. 28
  5. 30
  6. 32
  7. 34
  8. 36
  9. 38
  10. 40
In [43]:
all.equal(results1, results2)# to check whether entries of two or more vectors are the same
TRUE

Creating or extracting sequence of numbers (even numbers, odd numbers and prime numbers) using loops

In [60]:
1000/3

1000%/%3
333.333333333333
333
In [32]:
#try running the codes below to see what each does
20/3 #returns 20 divided by 3
20%%3 #returns the modulus or the remainder after dividing 20 by 3
20%/%3 #returns the whole numberafter dividing 20 by 3
6.66666666666667
2
6
In [68]:
seq(2,100,by=2)

88%%3
  1. 2
  2. 4
  3. 6
  4. 8
  5. 10
  6. 12
  7. 14
  8. 16
  9. 18
  10. 20
  11. 22
  12. 24
  13. 26
  14. 28
  15. 30
  16. 32
  17. 34
  18. 36
  19. 38
  20. 40
  21. 42
  22. 44
  23. 46
  24. 48
  25. 50
  26. 52
  27. 54
  28. 56
  29. 58
  30. 60
  31. 62
  32. 64
  33. 66
  34. 68
  35. 70
  36. 72
  37. 74
  38. 76
  39. 78
  40. 80
  41. 82
  42. 84
  43. 86
  44. 88
  45. 90
  46. 92
  47. 94
  48. 96
  49. 98
  50. 100
1

How many even numbers are there from 1 to 54859 using a for loop?

In [71]:
n=11:20
seq_along(n)
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
In [81]:
#first approach
n <- 1: 564000
count<- 0
for (k in seq_along(n)) {

    if(k%% 2 == 0) {
        #print(k)
        count = count+1
        }
}
print(count)
[1] 282000
In [260]:
1:10
for (k in 1:10) {
    print(k^3)
    }
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
[1] 1
[1] 8
[1] 27
[1] 64
[1] 125
[1] 216
[1] 343
[1] 512
[1] 729
[1] 1000
In [91]:
#second approach (long approach)
n <- 1: 54859 
multiple_2<-list()  #or multiple_2<-c() or multiple_2<- NULL 
for (val in n) {
  if(val%% 2 == 0)  {
    multiple_2[[val]]<- val
     }
}


multiples_of_2<-unlist(multiple_2) #stores all multiples of 2
#multiples_of_2
length(multiples_of_2) #total number of even numbers
27429

How many multiples of 8 are there from 1 to 1000 and what are they using for loops?

Try: extracting odds numbers, multiples of 5 and multiples of 12 from 10 to 10000 using for loops

In [107]:
# How many multiples of 8 are there from 1 to 1000 and what are they?


#First approach
n <- 1: 1000

multiple_8<-list()
#try multiple_8<-c() or multiple_8<-NULL #to see it does the job but with NA's if a number is not a multiple of 8 
#multiple_8<-list()  
for (k in seq_along(n)) {
  if(k%% 8 == 0)  {
    multiple_8[[k]]<- k
     }
}



multiples_of_8<-unlist(multiple_8) #stores all multiples of 8

print(paste("The total number of multiples of 8 is", "", length(multiples_of_8))) #total number of even numbers

#print(multiples_of_8)
[1] "The total number of multiples of 8 is  125"
In [110]:
#second approach
n <- 1: 1000
#try multiple_8<-c() or multiple_8<-NULL #to see it does the job but with NA's if a number is not a multiple of 8 
multiple_8<- NULL  
for (k in seq_along(n)) {
  if(k%% 8 == 0)  {
    multiple_8[[k]]<- k
     }
}

#na.omit(multiples_of_8)
#multiples_of_8<-unlist(multiple_8) #stores all multiples of 8


#print(multiples_of_8)
#print(paste("The total number of multiples of 8 is", "", length(multiples_of_8))) #total number of even numbers

multiples_of_8<- na.omit(multiples_of_8)
paste("The total number of multiples of 8 is", "", length(multiples_of_8)) #total number of even numbers
multiples_of_8
'The total number of multiples of 8 is 125'
  1. 8
  2. 16
  3. 24
  4. 32
  5. 40
  6. 48
  7. 56
  8. 64
  9. 72
  10. 80
  11. 88
  12. 96
  13. 104
  14. 112
  15. 120
  16. 128
  17. 136
  18. 144
  19. 152
  20. 160
  21. 168
  22. 176
  23. 184
  24. 192
  25. 200
  26. 208
  27. 216
  28. 224
  29. 232
  30. 240
  31. 248
  32. 256
  33. 264
  34. 272
  35. 280
  36. 288
  37. 296
  38. 304
  39. 312
  40. 320
  41. 328
  42. 336
  43. 344
  44. 352
  45. 360
  46. 368
  47. 376
  48. 384
  49. 392
  50. 400
  51. 408
  52. 416
  53. 424
  54. 432
  55. 440
  56. 448
  57. 456
  58. 464
  59. 472
  60. 480
  61. 488
  62. 496
  63. 504
  64. 512
  65. 520
  66. 528
  67. 536
  68. 544
  69. 552
  70. 560
  71. 568
  72. 576
  73. 584
  74. 592
  75. 600
  76. 608
  77. 616
  78. 624
  79. 632
  80. 640
  81. 648
  82. 656
  83. 664
  84. 672
  85. 680
  86. 688
  87. 696
  88. 704
  89. 712
  90. 720
  91. 728
  92. 736
  93. 744
  94. 752
  95. 760
  96. 768
  97. 776
  98. 784
  99. 792
  100. 800
  101. 808
  102. 816
  103. 824
  104. 832
  105. 840
  106. 848
  107. 856
  108. 864
  109. 872
  110. 880
  111. 888
  112. 896
  113. 904
  114. 912
  115. 920
  116. 928
  117. 936
  118. 944
  119. 952
  120. 960
  121. 968
  122. 976
  123. 984
  124. 992
  125. 1000

While loops

The syntax for while loop is as follows:

while (condition of interest)

{

statement

}

How many multiples of 8 are there from 1 to 1000 and what are they using a while loop?

Try: extracting odds numbers, multiples of 5 and multiples of 12 from 10 to 10000 using while loop

In [119]:
multiples_8<-NULL

n=1
while(n<=1000){
  
    if(n%%8==0)  multiples_8[[n]]<- n
   
 
    n=n+1 
}

na.omit(unlist(multiples_8))

#multiples_of_8<-unlist(multiples_8)

#multiples_of_8
  1. 8
  2. 16
  3. 24
  4. 32
  5. 40
  6. 48
  7. 56
  8. 64
  9. 72
  10. 80
  11. 88
  12. 96
  13. 104
  14. 112
  15. 120
  16. 128
  17. 136
  18. 144
  19. 152
  20. 160
  21. 168
  22. 176
  23. 184
  24. 192
  25. 200
  26. 208
  27. 216
  28. 224
  29. 232
  30. 240
  31. 248
  32. 256
  33. 264
  34. 272
  35. 280
  36. 288
  37. 296
  38. 304
  39. 312
  40. 320
  41. 328
  42. 336
  43. 344
  44. 352
  45. 360
  46. 368
  47. 376
  48. 384
  49. 392
  50. 400
  51. 408
  52. 416
  53. 424
  54. 432
  55. 440
  56. 448
  57. 456
  58. 464
  59. 472
  60. 480
  61. 488
  62. 496
  63. 504
  64. 512
  65. 520
  66. 528
  67. 536
  68. 544
  69. 552
  70. 560
  71. 568
  72. 576
  73. 584
  74. 592
  75. 600
  76. 608
  77. 616
  78. 624
  79. 632
  80. 640
  81. 648
  82. 656
  83. 664
  84. 672
  85. 680
  86. 688
  87. 696
  88. 704
  89. 712
  90. 720
  91. 728
  92. 736
  93. 744
  94. 752
  95. 760
  96. 768
  97. 776
  98. 784
  99. 792
  100. 800
  101. 808
  102. 816
  103. 824
  104. 832
  105. 840
  106. 848
  107. 856
  108. 864
  109. 872
  110. 880
  111. 888
  112. 896
  113. 904
  114. 912
  115. 920
  116. 928
  117. 936
  118. 944
  119. 952
  120. 960
  121. 968
  122. 976
  123. 984
  124. 992
  125. 1000

1b. The apply functions

We will also learn apply() sapply(), lapply() and tapply(). The apply collection can be viewed as a substitute to the loop.

Let's create a data frame and use the apply functions while finding summary measures

In [126]:
rep(c("Male","Female"),5)
  1. 'Male'
  2. 'Female'
  3. 'Male'
  4. 'Female'
  5. 'Male'
  6. 'Female'
  7. 'Male'
  8. 'Female'
  9. 'Male'
  10. 'Female'
In [121]:
#let's create a matrix and use the apply functions while finding summary measures
Age<- c(5,7,8,9,10,12,14,36,60,50)
Size<-seq(1,30,length.out=10)
Height<-rep(c(2,4,5,7,8), 2)
Gender<-rep(c("Male","Female"),5)

Data_artifical<- data.frame(Age,Size,Height,Gender) # Data_artifical<- cbind(Age,Size,Height,Gender)
Data_artifical

#to change variable names, you to say age, size, height & gender.
# names(Data_artifical)<- c("age", "size","height","gender")
A data.frame: 10 × 4
AgeSizeHeightGender
<dbl><dbl><dbl><fct>
5 1.0000002Male
7 4.2222224Female
8 7.4444445Male
910.6666677Female
1013.8888898Male
1217.1111112Female
1420.3333334Male
3623.5555565Female
6026.7777787Male
5030.0000008Female
In [145]:
#Data_artifical[1] #
#Data_artifical[, 1]

Data_artifical[[1]]
  1. 5
  2. 7
  3. 8
  4. 9
  5. 10
  6. 12
  7. 14
  8. 36
  9. 60
  10. 50

Find mean across columns or each variable

In [154]:
se<- function(x){
    error= sd(x)/length(x)
    return(error)
}

apply(Data_artifical[,1:2],2,sum)


#apply(Data_artifical[,1:3],1,mean)
Age
211
Size
155
In [69]:
#Find mean across columns or each variable: 2=column in apply() function
apply(Data_artifical[,1:3],2,mean)
Age
21.1
Size
15.5
Height
5.2

Find mean across rows

In [158]:
# find mean across rows: 1=row & 2=column in apply() function
#1=rows
apply(Data_artifical[,1:3],1,mean)
  1. 2.66666666666667
  2. 5.07407407407407
  3. 6.81481481481481
  4. 8.88888888888889
  5. 10.6296296296296
  6. 10.3703703703704
  7. 12.7777777777778
  8. 21.5185185185185
  9. 31.2592592592593
  10. 29.3333333333333
In [162]:
tapply(Data_artifical$Age,Data_artifical$Gender, mean)
Female
22.8
Male
19.4
In [159]:
mean(Data_artifical$Age)
21.1
In [72]:
# tapply(X, INDEX, FUN = NULL)
tapply(Data_artifical[[1]],Data_artifical[[4]], mean)
Female
22.8
Male
19.4
In [76]:
round(tapply(Data_artifical[,2],Data_artifical[[4]], mean),3) #round to 3 d.p
Female
17.111
Male
13.889
In [164]:
apply(Data_artifical[,1:3],2,sum)
Age
211
Size
155
Height
52
In [173]:
x=lapply(Data_artifical[,1:3], median)

x
$Age
11
$Size
15.5
$Height
5
In [167]:
sapply(Data_artifical[,1:3], median)
Age
11
Size
15.5
Height
5
In [82]:
results=lapply(Data_artifical[,1:3], sum)

results[[1]]
results[[2]]
results[[3]]
211
155
52
In [91]:
sapply(Data_artifical[,1:3], sum)
Age
211
Size
155
Height
52
In [89]:
#Creating function to find standard error
Std_err<-function(x){
   se<- sd(x)/length(x)
    return(se)
}

lapply(Data_artifical[,1:3],Std_err)
$Age
2.00080539339093
$Size
0.975576225209192
$Height
0.225092573548455
In [90]:
tapply(Data_artifical$Age,Data_artifical$Gender,Std_err)
Female
3.83510104169369
Male
4.58606585212206

2. Introduction to simulating probability distributions

Simulating 1000 samples from binomial, Poisson, uniform, normal, lognormal and exponential distributions.

We shall prove the most famous theory in Statistics called the Central Limit Theory as well based on these simulations.

In [189]:
options(repr.plot.width=8, repr.plot.height=8,repr.plot.res = 300) #Setting plot size
In [185]:
set.seed(125)
rbinom(n=10,size=10,prob=0.5)
  1. 6
  2. 3
  3. 4
  4. 4
  5. 8
  6. 8
  7. 5
  8. 4
  9. 6
  10. 5
In [187]:
set.seed(125)
rbinom(n=10,size=10,prob=0.5)
  1. 6
  2. 3
  3. 4
  4. 4
  5. 8
  6. 8
  7. 5
  8. 4
  9. 6
  10. 5
In [220]:
set.seed(123) #for reproducibility
names_prob_distn<-c("binomial","poisson","uniform","exponential","normal","lognormal")
prob_distn=NULL #list( )# c()

prob_distn[[1]]<-rbinom(n=1000,size=10,prob=0.5) #binomial sample
prob_distn[[2]]<-rpois(n=1000, lambda=1)#poisson_sample
prob_distn[[3]]<-runif(n=1000,min=0, max=1)#uniform sample
prob_distn[[4]]<-rexp(n=1000, rate=1)#exponential sample
prob_distn[[5]]<-rnorm(n=1000, mean = 0, sd = 1)#normal sample
prob_distn[[6]]<-rlnorm(n=1000, meanlog = 0, sdlog = 1)#lognormal sample

Plotting histograms of the distributions using for loops

In [115]:
options(repr.plot.width=8, repr.plot.height=8,repr.plot.res = 300) #Setting plot size & resolution
In [203]:
paste("position",seq_along(names_prob_distn))
  1. 'position 1'
  2. 'position 2'
  3. 'position 3'
  4. 'position 4'
  5. 'position 5'
  6. 'position 6'
In [221]:
par(mfrow=c(3,2),mar=c(4,4,1,0))

for(i in seq_along(names_prob_distn)){
    hist(prob_distn[[i]],col=i,main=paste(names_prob_distn[i]),xlab="Samples")
}
In [206]:
#trying n=1000 and increase n to 100000 (returns an elliptical shape for bivariate normality)
set.seed(1)
normal_sample1<- rnorm(n=100000, mean = 2, sd = 1) 
normal_sample2<- rnorm(n=100000, mean = 3, sd = 1)

plot(normal_sample1,normal_sample2,col="blue",type="p",xlab="Normal samples 1",ylab="Normal sample 2")

MONTE CARLO SIMULATION

SIMULATING A TOSS OF A DIE

Monte Carlo Simulations works perfect as the number of simulations increases

In [223]:
runif(n=1,0,1)
0.410570670850575
In [233]:
numeric(6)

rep(0,6)
  1. 0
  2. 0
  3. 0
  4. 0
  5. 0
  6. 0
  1. 0
  2. 0
  3. 0
  4. 0
  5. 0
  6. 0
In [258]:
count=rep(0,6) #numeric(6)

rand=runif(n=1,0,1) #a random variable to decide which event to occur (a single toss)
for (i in rand){
if (i<1/6){
  count[1]=count[1]+1
    print("face 1")
} else if (i<2/6){  # 1/6<i <2/6
   count[2]=count[2]+1
    print("face 2")
} else if (i<3/6){
   count[3]=count[3]+1
    print("face 3")
} else if (i<4/6){
     count[4]=count[4]+1
    print("face 4")
} else if (i<5/6){
     count[5]=count[5]+1
    print("face 5")
} else  {
     count[6]=count[6]+1
    print("face 6")
}
}
[1] "face 1"
In [264]:
count=rep(0,6)
tosses=1000  #increase the number of tosses (to 1000000) to show its a fair die
rand=runif(n=tosses,0,1) #a random variable to decide which event to occur (for n tosses)
for (i in rand){
if (i<1/6){
  count[1]=count[1]+1
    #print("face 1")
} else if (i<2/6){
   count[2]=count[2]+1
    #print("face 2")
} else if (i<3/6){
   count[3]=count[3]+1
    #print("face 3")
} else if (i<4/6){
     count[4]=count[4]+1
    #print("face 4")
} else if (i<5/6){
     count[5]=count[5]+1
    #print("face 5")
} else {
     count[6]=count[6]+1
    #print("face 6")
   }
    
}

count
  1. 159
  2. 172
  3. 165
  4. 153
  5. 157
  6. 194
In [265]:
for (i in seq_along(count)) {
    print(paste("Die number",i,"", "appears",count[i], "times"))
}
[1] "Die number 1  appears 159 times"
[1] "Die number 2  appears 172 times"
[1] "Die number 3  appears 165 times"
[1] "Die number 4  appears 153 times"
[1] "Die number 5  appears 157 times"
[1] "Die number 6  appears 194 times"
In [266]:
barplot(count/tosses,col="red",xlab="Die outcome",horiz = FALSE,
       names.arg =c("face 1","face 2","face 3","face 4","face 5","face 6"))