Mat 150 Introductory Statistics

How Do We Construct Frequency Distribution Tables Using R?

Part 2 (Relative Frequency Distribution Table, Cumulative Frequency Distribution Table, and Cumulative Relative Frequency Distribution Table)

We will use the States Visited (StatesV) variable included in the first-day class survey, Sur1.

Upload Sur1.csv data set from your desktop to R

Sur1 <-read.csv("~/Desktop/Sur1.csv")

NOTE: In case you did not save the Sur1.csv file to your desktop correctly, execute the following code instead and select the file in the dialog window
Sur1 <-read.csv(file.choose(),header = TRUE)

Examine the structure of the Sur1 data set

str(Sur1)

## 'data.frame':    20 obs. of  6 variables:
##  $ SEX      : Factor w/ 2 levels "Female","Male": 1 1 1 1 2 2 2 1 1 1 ...
##  $ POLITICS : Factor w/ 3 levels "Conservative",..: 3 3 3 3 3 3 2 2 1 3 ...
##  $ NSiblings: int  4 4 2 2 2 4 5 4 4 1 ...
##  $ StatesV  : int  5 11 6 3 7 3 4 4 6 3 ...
##  $ ShoeS    : num  6 9 8 7.5 13 10.5 9 8 6 8.5 ...
##  $ UsedExcel: int  2 2 2 1 1 3 3 2 2 1 ...

View the first six rows of the data set

head(Sur1, 6)

##      SEX POLITICS NSiblings StatesV ShoeS UsedExcel
## 1 Female Moderate         4       5   6.0         2
## 2 Female Moderate         4      11   9.0         2
## 3 Female Moderate         2       6   8.0         2
## 4 Female Moderate         2       3   7.5         1
## 5   Male Moderate         2       7  13.0         1
## 6   Male Moderate         4       3  10.5         3

Attach Sur1 so that its variables can be used without the $ symbol (remember to run detach(Sur1) when the analysis is complete)

attach(Sur1)

Summarize StatesV variable

summary(StatesV)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00    4.00    6.00    7.15    9.50   18.00
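As an alternative to attach(), a variable can be reached with the $ symbol or with with(), which avoids leaving the data set attached. The sketch below uses a small stand-in data frame built from the first ten StatesV values shown in the str() output above; the name Sur1.demo is made up for illustration.

```r
# Stand-in data frame: first ten StatesV values from the str() output above
Sur1.demo <- data.frame(StatesV = c(5, 11, 6, 3, 7, 3, 4, 4, 6, 3))

# Two ways to reach a variable without attach()
s1 <- summary(Sur1.demo$StatesV)
s2 <- with(Sur1.demo, summary(StatesV))
s1
```

Both calls return the same summary, and nothing needs to be detached afterwards.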

Construct a frequency distribution table of States Visited using breaks to list class limits

breaks <-seq(0,18, by=3)

Group StatesV variable data into bins

StatesV.cut <-cut(StatesV,breaks)

Construct frequency table for StatesV variable in horizontal display

table(StatesV.cut)

## StatesV.cut
##   (0,3]   (3,6]   (6,9]  (9,12] (12,15] (15,18] 
##       3       8       4       3       1       1
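The class labels use interval notation: (0,3] excludes 0 and includes 3, so a student who visited exactly 3 states is counted in the first class. A minimal sketch of this behavior, using made-up values that sit exactly on the class limits:

```r
# Made-up values lying exactly on the class limits
x <- c(3, 6, 9)
breaks <- seq(0, 9, by = 3)

# Default: classes are closed on the right, (a,b]
right.closed <- cut(x, breaks)

# right = FALSE: classes are closed on the left, [a,b)
left.closed <- cut(x, breaks, right = FALSE)

as.character(right.closed)
as.character(left.closed)
```

With right = FALSE the value 9 falls outside the last class [6,9) and becomes NA, so the choice of breaks should be checked against the minimum and maximum of the data.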

Transform this table to 2-column display

transform(table(StatesV.cut))

##   StatesV.cut Freq
## 1       (0,3]    3
## 2       (3,6]    8
## 3       (6,9]    4
## 4      (9,12]    3
## 5     (12,15]    1
## 6     (15,18]    1

Convert frequency table to relative frequency table by dividing frequencies by total number of StatesV data, in our case by 20

transform(table(StatesV.cut)/20)

##   StatesV.cut Freq
## 1       (0,3] 0.15
## 2       (3,6] 0.40
## 3       (6,9] 0.20
## 4      (9,12] 0.15
## 5     (12,15] 0.05
## 6     (15,18] 0.05
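The divisor 20 can also be computed rather than typed in. A minimal sketch, assuming the first ten StatesV values from the str() output stand in for the full variable (so the numbers differ from the table above); prop.table() divides each count by the total:

```r
# Stand-in data: first ten StatesV values from the str() output
states <- c(5, 11, 6, 3, 7, 3, 4, 4, 6, 3)
states.cut <- cut(states, seq(0, 12, by = 3))

# prop.table() turns counts into proportions of the total
rel.freq <- transform(prop.table(table(states.cut)))
rel.freq
```

Because the total is taken from the data, the same line works unchanged if the class size is not 20.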

Convert frequency table to cumulative frequency table by summing frequencies of StatesV data cumulatively

transform(table(StatesV.cut),Cum_Freq=cumsum(Freq))

##   StatesV.cut Freq Cum_Freq
## 1       (0,3]    3        3
## 2       (3,6]    8       11
## 3       (6,9]    4       15
## 4      (9,12]    3       18
## 5     (12,15]    1       19
## 6     (15,18]    1       20

Convert frequency table to cumulative relative frequency table by dividing cumulative frequencies by total number of StatesV data, in our case by 20

transform(table(StatesV.cut),RelCum_Freq=cumsum(Freq)/20)

##   StatesV.cut Freq RelCum_Freq
## 1       (0,3]    3        0.15
## 2       (3,6]    8        0.55
## 3       (6,9]    4        0.75
## 4      (9,12]    3        0.90
## 5     (12,15]    1        0.95
## 6     (15,18]    1        1.00
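All four columns can be collected in a single table. The sketch below is one possible way to do it, again assuming the first ten StatesV values from the str() output as stand-in data; the column name Rel_Freq and the object names are made up for illustration.

```r
# Stand-in data: first ten StatesV values from the str() output
states <- c(5, 11, 6, 3, 7, 3, 4, 4, 6, 3)
breaks <- seq(0, 12, by = 3)
states.cut <- cut(states, breaks)
n <- length(states)   # total number of observations

# Frequency, relative, cumulative, and cumulative relative frequency
full.table <- transform(table(states.cut),
                        Rel_Freq = Freq / n,
                        Cum_Freq = cumsum(Freq),
                        RelCum_Freq = cumsum(Freq) / n)
full.table
```

The last entry of Cum_Freq should equal the number of observations and the last entry of RelCum_Freq should equal 1, which is a quick check that no value fell outside the breaks.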

Frequency Tables Using R Part 1

February 22, 2020

How Do We Construct A Frequency Distribution Table Using R? Part 1

We will use the States Visited (StatesV) variable included in the first-day class survey, Sur1.

Upload Sur1.csv data set from your desktop to R

Sur1 <-read.csv("~/Desktop/Sur1.csv")

NOTE: In case you did not save the Sur1.csv file to your desktop correctly, execute the following code instead and select the file in the dialog window
Sur1 <-read.csv(file.choose(),header = TRUE)

Examine the structure of the Sur1 data set

str(Sur1)

## 'data.frame':    20 obs. of  6 variables:
##  $ SEX      : Factor w/ 2 levels "Female","Male": 1 1 1 1 2 2 2 1 1 1 ...
##  $ POLITICS : Factor w/ 3 levels "Conservative",..: 3 3 3 3 3 3 2 2 1 3 ...
##  $ NSiblings: int  4 4 2 2 2 4 5 4 4 1 ...
##  $ StatesV  : int  5 11 6 3 7 3 4 4 6 3 ...
##  $ ShoeS    : num  6 9 8 7.5 13 10.5 9 8 6 8.5 ...
##  $ UsedExcel: int  2 2 2 1 1 3 3 2 2 1 ...

View the first six rows of the data set

head(Sur1, 6)

##      SEX POLITICS NSiblings StatesV ShoeS UsedExcel
## 1 Female Moderate         4       5   6.0         2
## 2 Female Moderate         4      11   9.0         2
## 3 Female Moderate         2       6   8.0         2
## 4 Female Moderate         2       3   7.5         1
## 5   Male Moderate         2       7  13.0         1
## 6   Male Moderate         4       3  10.5         3

Attach Sur1 so that its variables can be used without the $ symbol (remember to run detach(Sur1) when the analysis is complete)

attach(Sur1)

Summarize StatesV variable

summary(StatesV)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00    4.00    6.00    7.15    9.50   18.00

Construct a frequency distribution table of States Visited using breaks to list class limits

breaks <-seq(0,18, by=3)

Group StatesV variable data into bins

StatesV.cut <-cut(StatesV,breaks)

Construct frequency table for StatesV variable in horizontal display

table(StatesV.cut)

## StatesV.cut
##   (0,3]   (3,6]   (6,9]  (9,12] (12,15] (15,18] 
##       3       8       4       3       1       1

Transform this table to 2-column display

transform(table(StatesV.cut))

##   StatesV.cut Freq
## 1       (0,3]    3
## 2       (3,6]    8
## 3       (6,9]    4
## 4      (9,12]    3
## 5     (12,15]    1
## 6     (15,18]    1

Construct a histogram of the frequency distribution of StatesV variable

hist(StatesV,breaks,col="red",border = "blue",xlab="Number of States Visited",xlim = c(0,21),ylab="Frequency",main="Distribution of States Visited by Mat150-1701 Students")
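hist() also returns the class counts it draws, which makes it possible to check a histogram against the frequency table. A minimal sketch, assuming the first ten StatesV values from the str() output as stand-in data:

```r
# Stand-in data: first ten StatesV values from the str() output
states <- c(5, 11, 6, 3, 7, 3, 4, 4, 6, 3)
breaks <- seq(0, 12, by = 3)

# plot = FALSE returns the histogram object without drawing it
h <- hist(states, breaks, plot = FALSE)
h$counts

# Counts from the frequency table, for comparison
table.counts <- as.vector(table(cut(states, breaks)))
table.counts
```

The two sets of counts agree here because hist() and cut() both use right-closed classes by default.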

Uploading a .csv file into R

February 20, 2020

Day 1: Using R as Calculator (We will not save files in R format for now!)

Objective: How to upload a .csv file into R?

Step 1 Prepare your workspace

Whenever you begin or complete your work with R, execute these two commands: rm(list=ls()) and ls().

Launch R and type into console panel:

rm(list=ls())

Press the ENTER key, then check that your workspace is cleared by using:

ls()

Step 2 Upload the survey data, Sur1.csv, by using the following syntax

Sur1 <-read.csv(file.choose(),header=TRUE)

Note: You must have your .csv file already downloaded to your computer, (e.g. to your desktop).

Step 3 Check the structure of data uploaded by using the following code

str(Sur1)

Step 4 Optional: Verify names of variables included in data by using the following syntax

names(Sur1)

Step 5 View the first five rows of Sur1 data using the following syntax:

head(Sur1, 5)

Step 6 Attach the Sur1 data in order to use the names of the respective variables without the $ sign

attach(Sur1)

Step 7 Get a quick look at the summary of variables of Sur1

summary(Sur1)
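The steps above can be tried without the survey file. The sketch below writes a small made-up data frame to a temporary .csv file and reads it back; the object demo and the temporary file are hypothetical stand-ins for Sur1 and Sur1.csv. Since R 4.0.0, read.csv() keeps text columns as character unless stringsAsFactors = TRUE is given, so str() may report chr where output from older versions shows Factor.

```r
# Made-up data standing in for the survey (not the real Sur1 values)
demo <- data.frame(SEX = c("Female", "Male", "Female"),
                   StatesV = c(5, 7, 3))

# Write it to a temporary .csv file, then read it back
path <- tempfile(fileext = ".csv")
write.csv(demo, path, row.names = FALSE)
demo.in <- read.csv(path, header = TRUE, stringsAsFactors = TRUE)

str(demo.in)       # structure of the uploaded data
names(demo.in)     # variable names
head(demo.in, 2)   # first two rows
```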

Sampling Distribution of Sample Mean

Example 1 (Data Set 21)

1. Upload Data Set 21 to R in previously saved .csv format.
2. Use Data Set 21 to construct a histogram of the depths (DEPTH) of 600 earthquakes.
3. Select 10,000 random samples of size 50 from the DEPTH variable, calculate the mean of each, and construct a histogram of the sampling distribution of the sample means.
SOLUTION

EarthQ <- read.csv("~/Desktop/csv/21 - Earthquakes.csv")

attach(EarthQ)

head(EarthQ)

##   MAGNITUDE DEPTH
## 1      2.45   0.7
## 2      3.62   6.0
## 3      3.06   7.0
## 4      3.30   5.4
## 5      1.09   0.5
## 6      3.10   0.0

2.

breaks <-seq(0,45,by=5)
hist(DEPTH,breaks, col="red",xlab = "Depth [km]",ylab = "Frequency",main = "Histogram of Earthquakes Depths")

mean(DEPTH)

## [1] 5.822

sd(DEPTH)

## [1] 4.927049

fifty.depths <- function() {
    depth.S <- sample(DEPTH,
    size = 50,replace = TRUE)
    return(mean(depth.S))
}

sim1 <-replicate(n=10000,expr=fifty.depths())

head(sim1)

## [1] 5.540 6.576 5.224 6.800 5.262 4.952

summary(sim1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.650   5.338   5.798   5.825   6.270   8.742

sd(sim1)

## [1] 0.695623

breaks <-seq(2.5,10.5,by=0.5)
hist(sim1,breaks,xlab ="Mean[km]",ylab="Frequency",col="red",border = "green",main="Sampling Distribution of the Sample Mean")

Things to ponder:

the shape of the DEPTH histogram
the values of mean(DEPTH) and sd(DEPTH)
the shape of the sampling distribution of the sample means of DEPTH
the values of the mean and the standard deviation of the sampling distribution
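For comparison with sd(sim1), theory gives the standard deviation of the sample mean as the population standard deviation divided by the square root of the sample size. A quick check using the values printed above (treating sd(DEPTH) as the population value is an assumption of this sketch):

```r
sd.depth <- 4.927049   # sd(DEPTH) from the output above
n <- 50                # size of each sample

# Theoretical standard deviation of the sample mean
se <- sd.depth / sqrt(n)
se
```

The result, about 0.697, is close to the simulated sd(sim1) of 0.695623.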

Tutorial 1

December 2, 2019

Five fair dice were rolled once and the proportion of ODD outcomes was observed.

Example 1

x <-sample(1:6,5,replace = TRUE); x

## [1] 5 2 2 5 4

What is the proportion of ODD outcomes?

y <- x %% 2;y

## [1] 1 0 0 1 0

prop <-sum(y)/length(x);prop

## [1] 0.4
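Because the mean of a logical vector is the proportion of TRUE values, the same proportion can be computed in one step. A small sketch using the five outcomes shown above:

```r
# The five outcomes from the roll above
x <- c(5, 2, 2, 5, 4)

# x %% 2 == 1 is TRUE for odd outcomes; mean() gives their proportion
prop.odd <- mean(x %% 2 == 1)
prop.odd
```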

Example 2 (Simulation)

Replicate 100,000 five-dice rolls and record the proportion of ODD numbers for each roll.

five.dice <- function() {
  dice <-sample(1:6,5, replace = TRUE)
  return(sum(dice %% 2)/length(dice))
}
sim1 <-replicate(100000, five.dice())

Here are the first six recorded proportions of ODD outcomes.

head(sim1)

## [1] 0.2 0.6 0.4 0.6 0.2 1.0

Table of Proportions of ODD outcomes from 100,000 simulated rolls

transform(table(sim1))

##   sim1  Freq
## 1    0  3065
## 2  0.2 15541
## 3  0.4 31303
## 4  0.6 31434
## 5  0.8 15626
## 6    1  3031

Histogram of Proportions of ODD Outcomes

breaks <-seq(-0.1,1.1,by=0.2)
hist(sim1,breaks, col="red", border = "green",xlab="Proportion of ODD Numbers",ylab="Frequency", main = "Sampling Distribution of Sample Proportion")

Finally, the mean of the simulated proportions is equal to:

mean(sim1)

## [1] 0.500216

Ponder on the following:
a) the shape of the sampling distribution of the sample proportion
b) the mean of the sample proportions
c) the proportion of ODD outcomes when rolling a fair die once
d) writing a simulation like the one above regarding the proportion of EVEN outcomes in 100,000 rolls of five fair dice.
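The simulated frequencies can be compared with the counts that theory predicts. Assuming each die shows an odd number with probability 0.5, independently of the others, the number of odd outcomes in five rolls is binomial, and dbinom() gives the exact probabilities:

```r
# Possible proportions of odd outcomes in five rolls
props <- (0:5) / 5

# Expected counts in 100,000 rolls under the binomial model
expected <- dbinom(0:5, size = 5, prob = 0.5) * 100000
data.frame(Proportion = props, Expected = expected)
```

The expected counts (3125, 15625, 31250, 31250, 15625, 3125) are close to the simulated frequencies in the table above.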

Assignment 3

November 26, 2019

Nonstandard Normal Distributions

Here are nonstandard density curves with a mean equal to 15 and various standard deviations.

Model Problem¹

The U.S. Air Force once used ACES-II ejection seats designed for men weighing between 140 lb and 211 lb. Given that women’s weights are normally distributed with a mean of 171.1 lb and a standard deviation of 46.1 lb (based on data from the National Health Survey), what percentage of women have weights that are within those limits? Were many women excluded by those past specifications?

Model Solution when Using z-tables

Given: Distribution Statement: weightW ~ N(mean = 171.1 lb, sd = 46.1 lb) and weightW boundaries: 140 lb and 211 lb

Objective(s): a. The proportion of women’s weights within the boundaries.
b. Were many women excluded by those past specifications?

Solution Plan

1. We will sketch and label the nonstandard distribution of women’s weights (weightW) and include the distribution statement as its title.
2. Next, we will calculate z-scores for the boundaries and determine the area under the density curve within these limits.

3. Finally, we will provide a written statement regarding obtained results.

1. Solution

Lower boundary: x=140
Upper boundary: x=211

2. Model solution when using z-tables

Now we will convert boundaries x=140 and x=211 to respective z-scores:

x = 140 corresponds to z = (140 - 171.1)/46.1 = -0.67
x = 211 corresponds to z = (211 - 171.1)/46.1 = 0.87

The area under the density curve within these limits, found from the z-table, is equal to 0.5564. The combined area under both tails is 1 - 0.5564 = 0.4436.
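The z-scores and the area can be checked in R. A minimal sketch using the standardization z = (x - mean)/sd, with rounding to two decimals to mimic a z-table lookup (the variable names are made up for illustration):

```r
mean.w <- 171.1   # mean of women's weights, lb
sd.w <- 46.1      # standard deviation, lb

# z-scores of the two boundaries, rounded as in a z-table
z.lower <- round((140 - mean.w) / sd.w, 2)
z.upper <- round((211 - mean.w) / sd.w, 2)
c(z.lower, z.upper)

# Area between the two z-scores under the standard normal curve
area <- pnorm(z.upper) - pnorm(z.lower)
round(area, 4)
```

The small difference between this table-style answer and the unrounded pnorm() result comes only from rounding the z-scores.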

3. Statements about obtained results

3a. Just about 55.64% of women met past weight requirements.

3b. About 44.36% of women were excluded by past weight specifications.

Model Solution when Using R for Calculations

For part a, we will use the pnorm(x, mean, sd) function in the following way:

pnorm(211, mean=171.1, sd=46.1)-pnorm(140,mean=171.1, sd=46.1)

## [1] 0.556662

For part b, we subtract the previous answer from 1:

1-0.556662

## [1] 0.443338

Alternatively, we may compute the same answer in a single expression (cut/paste) as follows:

1 -(pnorm(211, mean=171.1, sd=46.1)-pnorm(140,mean=171.1, sd=46.1))

## [1] 0.443338

Assignment 3

Extra Credit: #29 and #33 from pages 252-253. Note: Follow the model solution, parts (1), (2), and (3), in order to obtain full credit.

¹ Extracted from M. F. Triola, Essentials of Statistics, Sixth Edition, Pearson, page 252, #27.


Assignment 2

November 21, 2019

The Normal Distribution

Here is the Standard Normal Distribution Density Curve, also known as the Bell Curve.

How do we find the area under the Standard Normal Density Curve?

Example 1

To find the area to the left of z = 0, we will use the following syntax:

pnorm(0,mean=0,sd=1)

## [1] 0.5

Example 2

How do we find the area under the Standard Normal Density Curve to the left of z = 1.25?

pnorm(1.25, mean=0,sd=1)

## [1] 0.8943502

Example 3

How do we find the area under the Standard Normal Density Curve to the right of z = 1.5?

We can use the following syntax:

1 - pnorm(1.5, mean=0, sd=1)

## [1] 0.0668072

Alternatively, we can use the code below:

pnorm(1.5, mean=0, sd = 1, lower.tail = FALSE)

## [1] 0.0668072

Example 4

How do we determine the area under the Standard Normal Curve between two z-values (e.g., z = -1.25 and z = 1.25)?

We can calculate the difference of two areas as follows:

pnorm(1.25, mean = 0, sd = 1) - pnorm(-1.25, mean = 0, sd = 1)

## [1] 0.7887005

Summary:

The code pnorm(z, mean = 0, sd = 1) provides the area under the Standard Normal Density Curve to the left of the given z-value.
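Two properties of pnorm() are worth checking: mean = 0 and sd = 1 are its default values, so they may be omitted for the standard normal curve, and by symmetry the area to the left of -z equals the area to the right of z. A short sketch:

```r
# Defaults: pnorm(z) is the same as pnorm(z, mean = 0, sd = 1)
a1 <- pnorm(1.25)
a2 <- pnorm(1.25, mean = 0, sd = 1)

# Symmetry: area left of -1.5 equals area right of 1.5
left <- pnorm(-1.5)
right <- pnorm(1.5, lower.tail = FALSE)

c(a1, a2, left, right)
```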

Assignment 2

Bone density test scores follow a standard normal distribution for children, healthy premenopausal young women, and men under 50. Assume that a randomly selected person is subjected to a bone density test. Find the probability that this person has a score:

a. z < -2.5
b. -2.5 < z < -1
c. z > -1

Note: For each part (a, b, and c), include a distribution statement and a density curve with the shaded area. Also, answer each part with a full and complete sentence.

Assignment 1

November 8, 2019

Tutorial/Extra Credit Assignment

How to partition Births Data Set¹ by gender?

How to create a histogram of Birth Weight Data Set for each gender?

Upload data set by using the following syntax:

BirthsD <-read.csv(file.choose(),header =TRUE)

attach(BirthsD)
head(BirthsD,3)

##                         FACILITY         INSURANCE GENDER..1.M.
## 1 Albany Medical Center Hospital Insurance Company            0
## 2 Albany Medical Center Hospital        Blue Cross            1
## 3 Albany Medical Center Hospital        Blue Cross            0
##   LENGTH.OF.STAY ADMITTED DISCHARGED BIRTH.WEIGHT TOTAL.CHARGES
## 1              2      FRI        SUN         3500       13985.7
## 2              2      FRI        SUN         3900        3632.5
## 3             36      WED        THU          800      359091.0

Select all rows from the BirthsD data set that pertain to girls’ births and include all variables (columns).

Start by naming this subset girlD (girls’ data).

girlD <-BirthsD[GENDER..1.M. == "0",]
attach(girlD)

## The following objects are masked from BirthsD:
## 
##     ADMITTED, BIRTH.WEIGHT, DISCHARGED, FACILITY, GENDER..1.M.,
##     INSURANCE, LENGTH.OF.STAY, TOTAL.CHARGES

head(girlD)

##                          FACILITY         INSURANCE GENDER..1.M.
## 1  Albany Medical Center Hospital Insurance Company            0
## 3  Albany Medical Center Hospital        Blue Cross            0
## 6  Albany Medical Center Hospital        Blue Cross            0
## 7  Albany Medical Center Hospital          Medicaid            0
## 9  Albany Medical Center Hospital Insurance Company            0
## 13 Albany Medical Center Hospital Insurance Company            0
##    LENGTH.OF.STAY ADMITTED DISCHARGED BIRTH.WEIGHT TOTAL.CHARGES
## 1               2      FRI        SUN         3500       13985.7
## 3              36      WED        THU          800      359091.0
## 6               4      FRI        TUE         2400        6406.0
## 7               3      TUE        FRI         4200        4778.0
## 9               2      SAT        MON         3100        3860.0
## 13              4      SUN        THU         2000        6986.9
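The same subsetting can be done without attach(), which avoids the masking messages shown above. A minimal sketch on a toy data frame; its six rows reuse the gender codes and birth weights of rows 1 to 6 displayed above, and the names toy, girls, and boys are made up for illustration:

```r
# Toy data: gender codes and birth weights of rows 1 to 6 shown above
toy <- data.frame(GENDER..1.M. = c(0, 1, 0, 1, 1, 0),
                  BIRTH.WEIGHT = c(3500, 3900, 800, 2800, 3700, 2400))

# Row selection with $ (no attach needed)
girls <- toy[toy$GENDER..1.M. == 0, ]

# The same idea with subset()
boys <- subset(toy, GENDER..1.M. == 1)

summary(girls$BIRTH.WEIGHT)
summary(boys$BIRTH.WEIGHT)
```

Writing girls$BIRTH.WEIGHT makes it explicit which subset a summary refers to, which is easy to lose track of when several data sets are attached at once.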

summary(BIRTH.WEIGHT)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     300    2700    3100    3037    3500    4700

Create a frequency table for Girls’ birth weights.

breaks <-seq(0,5000,by=500)
BIRTH.WEIGHT.cut <-cut(BIRTH.WEIGHT,breaks)
BIRTH.WEIGHT.freq <-table(BIRTH.WEIGHT.cut)
frequency.table <-transform(BIRTH.WEIGHT.freq)
frequency.table

##    BIRTH.WEIGHT.cut Freq
## 1           (0,500]    1
## 2       (500,1e+03]    5
## 3   (1e+03,1.5e+03]    1
## 4   (1.5e+03,2e+03]   12
## 5   (2e+03,2.5e+03]   19
## 6   (2.5e+03,3e+03]   50
## 7   (3e+03,3.5e+03]   75
## 8   (3.5e+03,4e+03]   33
## 9   (4e+03,4.5e+03]    7
## 10  (4.5e+03,5e+03]    2
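Labels such as (500,1e+03] appear because cut() formats the break points with three significant digits by default. A minimal sketch, using three of the birth weights displayed above, of how the dig.lab argument produces plain labels:

```r
# Three of the birth weights shown above
weights <- c(800, 2400, 3500)
breaks <- seq(0, 5000, by = 500)

# dig.lab = 4 allows four digits in the labels, so 1000 is not printed as 1e+03
w.cut <- cut(weights, breaks, dig.lab = 4)
levels(w.cut)[1:3]
```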

Create a histogram of birth weights for the girls’ data subset using breaks and the hist command.

breaks<-seq(0,5000,by=500)
hist(BIRTH.WEIGHT, breaks, xlab = "Birth Weight in [grams]", ylab="Frequency",ylim=c(0,80),main="Distribution of Birth Weights for Girls", col="pink",border="blue")

detach(girlD)

Now select all rows from the BirthsD data set that pertain to boys’ births and include all variables (columns).

Start by naming this subset boyD (boys’ data).

boyD <-BirthsD[GENDER..1.M. == "1", ]
attach(boyD)

## The following objects are masked from BirthsD:
## 
##     ADMITTED, BIRTH.WEIGHT, DISCHARGED, FACILITY, GENDER..1.M.,
##     INSURANCE, LENGTH.OF.STAY, TOTAL.CHARGES

head(boyD,3)

##                         FACILITY         INSURANCE GENDER..1.M.
## 2 Albany Medical Center Hospital        Blue Cross            1
## 4 Albany Medical Center Hospital Insurance Company            1
## 5 Albany Medical Center Hospital Insurance Company            1
##   LENGTH.OF.STAY ADMITTED DISCHARGED BIRTH.WEIGHT TOTAL.CHARGES
## 2              2      FRI        SUN         3900        3632.5
## 4              5      MON        SAT         2800        8536.5
## 5              2      FRI        SUN         3700        3632.5

summary(BIRTH.WEIGHT)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     300    2900    3400    3273    3650    4900

Create a frequency table for Boys’ birth weights.

BIRTH.WEIGHT.cut <-cut(BIRTH.WEIGHT,breaks)
BIRTH.WEIGHT.freq <-table(BIRTH.WEIGHT.cut)
transform(BIRTH.WEIGHT.freq)

##    BIRTH.WEIGHT.cut Freq
## 1           (0,500]    1
## 2       (500,1e+03]    2
## 3   (1e+03,1.5e+03]    2
## 4   (1.5e+03,2e+03]    5
## 5   (2e+03,2.5e+03]    8
## 6   (2.5e+03,3e+03]   39
## 7   (3e+03,3.5e+03]   69
## 8   (3.5e+03,4e+03]   57
## 9   (4e+03,4.5e+03]   10
## 10  (4.5e+03,5e+03]    2

Create a histogram of birth weights for the boys’ data subset using breaks and the hist command.

breaks<-seq(0,5000,by=500)
hist(BIRTH.WEIGHT, breaks, xlab = "Birth Weight in [grams]", ylab="Frequency",ylim=c(0,80),main="Distribution of Birth Weights for Boys", col="blue",border="yellow")

detach(boyD)

Extra Credit Assignment

1. Use parts of the above syntax to create two box-and-whisker plots (one for each gender), describe the variability in each subset, and compare the variability between genders.

2. Write a report using any word-processing program. Use full and complete sentences; remember to include numerical and graphical summaries in your report. In addition, attach R printout with input/output.

¹ Data Set 4: extracted from M. F. Triola, Essentials of Statistics, Sixth Edition, Pearson.
