Researchers: Claire E Ferguson, Laura A Kaneta, Ailin Li, Andrew SID Lang, Philip P Nelson, Moses Satralkar
Introduction
…….
Data Collection and Curation
We used a standard MEQ (Morningness-Eveningness Questionnaire) instrument [1], programmed into a Google Form, to collect responses from several first-semester, freshman-level courses (what were they?) during the first few weeks of class. This resulted in a dataset of 650 responses.
We curated our data by removing all non-freshmen, all respondents not aged 17-19, and several duplicates (students who were in more than one of the courses that we surveyed).
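The curation steps above can be sketched in R as follows. This is only an illustration on made-up records: the column names (`student_id`, `class_standing`, `age`, `course`) are assumptions, not the names used in our response file.

```r
# Hypothetical raw responses; all column names and values are illustrative only.
raw <- data.frame(
  student_id     = c(1, 2, 2, 3, 4, 5),
  class_standing = c("Freshman", "Freshman", "Freshman",
                     "Sophomore", "Freshman", "Freshman"),
  age            = c(18, 19, 19, 18, 22, 17),
  course         = c("A", "A", "B", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Keep freshmen aged 17-19 only.
curated <- subset(raw, class_standing == "Freshman" & age >= 17 & age <= 19)

# Drop duplicate entries from students surveyed in more than one course,
# keeping the first response recorded for each student.
curated <- curated[!duplicated(curated$student_id), ]

print(curated)
```

Keeping the first response per student is a choice made for this sketch; any consistent rule would serve.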
When we wrote the questionnaire, we left the answers as free response. This resulted in some non-standard responses for questions 11, 12, and 19. We conjecture that a few students who felt torn between two adjoining categories entered a value between the two standard responses. We recoded several zeros as ones for question 19 but decided to leave the rest of the data as originally entered: question 11 had 5 ones, 13 threes, and 15 fives; question 12 had 24 ones and 13 fours; and question 19 had 47 threes and 9 fives.
Once the semester had ended, we collected the grades of these students and worked out their overall semester GPA and their GPAs for hourly bins corresponding to class start times: 7:00-7:59 AM, 8:00-8:59 AM, 9:00-9:59 AM, 10:00-10:59 AM, 11:00-11:59 AM, 12:00-12:59 PM, 1:00-1:59 PM, 2:00-2:59 PM, 3:00-3:59 PM, 4:00-4:59 PM, 5:00-5:59 PM, and 6:00-6:59 PM. We did not include grades from classes from which the student withdrew. For classes that had different start times on different days, we took the start time from the day with the longest class period.
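A minimal sketch of the per-bin GPA calculation is given below. It covers only the withdrawal exclusion and the hourly binning. The data layout (one row per student per class), the column names, and the use of credit weighting are all assumptions of this sketch rather than a record of how our file was built.

```r
# Hypothetical class records: one row per student per class (illustrative values).
classes <- data.frame(
  student_id   = c(1, 1, 1, 2, 2),
  start_hour   = c(8, 8, 14, 9, 14),         # class start hour, 24-hour clock
  grade_points = c(4.0, 3.0, 2.0, 3.0, NA),  # NA marks a withdrawal
  credits      = c(3, 1, 3, 3, 3)
)

# Exclude classes from which the student withdrew.
graded <- classes[!is.na(classes$grade_points), ]

# Credit-weighted GPA for each student within each hourly start-time bin
# (credit weighting is an assumption of this sketch).
graded$quality_points <- graded$grade_points * graded$credits
sums <- aggregate(cbind(quality_points, credits) ~ student_id + start_hour,
                  data = graded, FUN = sum)
sums$GPA <- sums$quality_points / sums$credits

print(sums[order(sums$student_id, sums$start_hour), ])
```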
This left a final data file, ready for modeling, with 402 unique records: 17-19 yrs Freshmen Only.
Data Analysis
Our dataset consists of the MEQ scores and first-semester GPAs, by class start time, of 402 first-time college freshmen aged 17-19. Scores can range from 16 to 86; however, our scores range from 17 to 68, with the following distribution between types [1]:
Type | Range | N | % | Female | Male |
definite evening | 16-30 | 12 | 3% | 7 | 5 |
moderate evening | 31-41 | 95 | 24% | 65 | 30 |
intermediate | 42-58 | 258 | 64% | 171 | 87 |
moderate morning | 59-69 | 37 | 9% | 22 | 15 |
definite morning | 70-86 | 0 | 0% | 0 | 0 |
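The score ranges in the table can be applied with `cut()`. The sketch below uses the boundaries from the table; the helper name `meq_type` and the example scores are ours for illustration.

```r
# MEQ type boundaries taken from the table above (scores range from 16 to 86).
meq_type <- function(score) {
  cut(score,
      breaks = c(15, 30, 41, 58, 69, 86),   # intervals are (15,30], (30,41], ...
      labels = c("definite evening", "moderate evening", "intermediate",
                 "moderate morning", "definite morning"))
}

# Illustrative scores, including the observed minimum (17) and maximum (68).
scores <- c(17, 30, 31, 41, 42, 58, 59, 68)
print(data.frame(score = scores, type = meq_type(scores)))
```

Applied to the `Total` column, `table(meq_type(mydata$Total))` would give the N column of the table.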
The trend line shows that evening types obtain lower grades than morning types.
GPA vs Chronotype by Gender
The trend lines show that the effect is more significant for males than females.
#R Code
library(ggplot2) #graphics library
setwd("C://...")
mydata = read.csv(file="20180327 17-19 yrs Ready for Analysis.csv",header=TRUE,row.names="id")
summary(mydata)
# each "+" must end a line; a line that starts with "+" is not joined to the plot
ggplot(mydata, aes(x=Total, y=GPA)) +
  geom_point(color='#2980B9', size = 4) +
  geom_smooth(method=lm, color='#2C3E50') #plotting the data
GPAlmAll <- lm(GPA ~ Total + Sex + US.Resident + College, data=mydata)
summary (GPAlmAll)
[output]
Call:
lm(formula = GPA ~ Total + Sex + US.Resident + College, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-2.8381 -0.3453 0.1963 0.5524 1.1150
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.784120 0.254341 10.946 < 2e-16 ***
Total 0.012216 0.004294 2.845 0.00468 **
Sex -0.129858 0.083112 -1.562 0.11899
US.Resident -0.105936 0.143670 -0.737 0.46134
CollegeBusiness -0.007007 0.124435 -0.056 0.95513
CollegeEducation 0.103880 0.153129 0.678 0.49793
CollegeNursing 0.110656 0.132337 0.836 0.40357
CollegeScience and Engineering -0.139757 0.103054 -1.356 0.17583
CollegeTheology and Ministry 0.123386 0.146178 0.844 0.39914
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7471 on 393 degrees of freedom
Multiple R-squared: 0.04753, Adjusted R-squared: 0.02814
F-statistic: 2.451 on 8 and 393 DF, p-value: 0.01342
[output]
GPAlmGender <- lm(GPA ~ Total + Sex, data=mydata)
summary(GPAlmGender)
[output]
Call:
lm(formula = GPA ~ Total + Sex, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-2.9894 -0.3603 0.2018 0.5513 1.0066
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.680268 0.205459 13.045 < 2e-16 ***
Total 0.012391 0.004282 2.894 0.00402 **
Sex -0.170118 0.078754 -2.160 0.03136 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.748 on 399 degrees of freedom
Multiple R-squared: 0.03072, Adjusted R-squared: 0.02586
F-statistic: 6.323 on 2 and 399 DF, p-value: 0.001979
[output]
confint(GPAlmGender, level=0.95) # CIs for model parameters
[output]
2.5 % 97.5 %
(Intercept) 2.276350120 3.08418664
Total 0.003972952 0.02080941
Sex -0.324942073 -0.01529337
[output]
GPAlm <- lm(GPA ~ Total, data=mydata)
summary(GPAlm)
[output]
Call:
lm(formula = GPA ~ Total, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-2.9293 -0.3643 0.1778 0.5857 1.0130
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.636441 0.205390 12.836 < 2e-16 ***
Total 0.012090 0.004299 2.812 0.00517 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7514 on 400 degrees of freedom
Multiple R-squared: 0.01939, Adjusted R-squared: 0.01693
F-statistic: 7.908 on 1 and 400 DF, p-value: 0.005165
[output]
#Now do GPA vs. Total Score (controlling for Sex) for each class start time, beginning with 7am.
lm7am <- lm(X7am ~ Total + Sex, data=mydata) # named lm7am so that lm() is not masked
summary(lm7am)
confint(lm7am, level=0.95) # CIs for model parameters
[output]
Call:
lm(formula = X7am ~ Total + Sex, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-3.1128 -0.3263 0.5305 0.7239 1.0268
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.54713 0.52732 4.830 4.36e-06 ***
Total 0.01551 0.01077 1.440 0.153
Sex -0.07029 0.19281 -0.365 0.716
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.005 on 112 degrees of freedom
(287 observations deleted due to missingness)
Multiple R-squared: 0.01849, Adjusted R-squared: 0.0009604
F-statistic: 1.055 on 2 and 112 DF, p-value: 0.3517
2.5 % 97.5 %
(Intercept) 1.502308773 3.59195336
Total -0.005830429 0.03685108
Sex -0.452315382 0.31174105
[output]
#...
Time | N | Slope | 95% CI | p-value |
7am | 113 | 0.01551 | -0.005830429 to 0.03685108 | 0.153 |
8am | 224 | 0.017807 | 0.001105433 to 0.03450847 | 0.0368 |
9am | 325 | 0.011562 | -0.001864789 to 0.02498809 | 0.0912 |
10am | 235 | 0.013259 | 0.0008523036 to 0.02566505 | 0.0363 |
11am | 18 | 0.01928 | -0.01463043 to 0.05318793 | 0.2468 |
12pm | 239 | 0.008412 | -0.005886458 to 0.02271084 | 0.248 |
1pm | 272 | -0.002481 | -0.01588811 to 0.0109260 | 0.716 |
2pm | 326 | 0.011813 | 0.001633317 to 0.02199244 | 0.0231 |
3pm | 238 | 0.017493 | 0.003886605 to 0.03110015 | 0.012 |
4pm | 46 | 0.006865 | -0.02266914 to 0.03639984 | 0.642 |
5pm | 31 | -0.003902 | -0.03647586 to 0.02867283 | 0.808 |
6pm | 39 | 0.004018 | -0.0275687 to 0.03560398 | 0.798187 |
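The rows of this table come from fitting the same model once per start time. One way to do that in a loop is sketched below on synthetic data. Only `X7am` is a column name confirmed by our code; `X8am` and `X9am` are assumed names, and the simulated values are not our survey results.

```r
set.seed(1)  # synthetic data only; these are not our survey results
n <- 200
demo <- data.frame(
  Total = sample(17:68, n, replace = TRUE),
  Sex   = sample(0:1, n, replace = TRUE)
)
# Hypothetical per-time GPA columns (X8am and X9am are assumed names).
time_cols <- c("X7am", "X8am", "X9am")
for (col in time_cols) {
  gpa <- pmin(4, pmax(0, 2.6 + 0.012 * demo$Total + rnorm(n, sd = 0.7)))
  gpa[sample(n, 60)] <- NA   # students with no class at that start time
  demo[[col]] <- gpa
}

# Fit GPA-at-one-start-time ~ Total + Sex and return one row of the table.
fit_one <- function(col, data) {
  fit <- lm(reformulate(c("Total", "Sex"), response = col), data = data)
  ci  <- confint(fit, "Total", level = 0.95)
  data.frame(Time    = col,
             N       = nobs(fit),   # rows actually used after dropping NAs
             Slope   = unname(coef(fit)["Total"]),
             CI_low  = ci[1, 1],
             CI_high = ci[1, 2],
             p_value = coef(summary(fit))["Total", "Pr(>|t|)"])
}

results <- do.call(rbind, lapply(time_cols, fit_one, data = demo))
print(results)
```

Taking N from `nobs()` ties the count to the rows each model actually used, which keeps the table consistent with the "observations deleted due to missingness" line in the model summary.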
We analysed the model slopes (the size of the effect of MEQ score on GPA, controlling for gender) by class start time. In the plot, the color shows the width of the confidence interval and each label gives the number of data points used to estimate the slope.
The results trend line:
Row | Column | Line p-value | DF | Term | Value | StdErr | t-value | Term p-value |
Slope | Time | 0.0379781 | 10 | Time | -0.0012825 | 0.0005367 | -2.3897 | 0.0379781 |
 | | | | intercept | 0.026001 | 0.0069596 | 3.73598 | 0.0038721 |
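This suggests that the effect of MEQ score on GPA is larger for early courses than for later ones: the fitted slope falls by about 0.0013 for each hour later that a class starts (p = 0.038).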
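The trend line can be checked directly from the tabulated slopes. The sketch below assumes the start times are coded as hours on a 24-hour clock (7 to 18) and uses an unweighted least-squares fit; under those assumptions it gives a slope of about -0.00128 and an intercept of about 0.026 on 10 degrees of freedom, in line with the values reported above.

```r
# Slopes of GPA on MEQ score (controlling for gender), copied from the table above.
slopes <- data.frame(
  Time  = 7:18,  # class start hour on a 24-hour clock (assumed coding)
  Slope = c(0.01551, 0.017807, 0.011562, 0.013259, 0.01928, 0.008412,
            -0.002481, 0.011813, 0.017493, 0.006865, -0.003902, 0.004018)
)

# Unweighted least-squares trend of slope against start hour.
trend <- lm(Slope ~ Time, data = slopes)
print(round(coef(summary(trend)), 7))
```

Note that this fit gives every start time equal weight, even though the 11am slope rests on only 18 data points while the 2pm slope rests on 326.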
More Analysis
The data were split by MEQ score into the top and bottom 20%, leaving 60% in the middle. We then analysed average GPA by class start time for each group.
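A split of this kind can be sketched with `quantile()` and `cut()`. The scores below are synthetic, and how ties at the cut points are assigned is an assumption of the sketch, so group sizes will only be approximately 20/60/20.

```r
set.seed(2)  # synthetic scores for illustration; not our survey data
demo <- data.frame(Total = sample(17:68, 402, replace = TRUE))

# 20th and 80th percentiles of the MEQ score.
cuts <- quantile(demo$Total, probs = c(0.2, 0.8))

# Scores at or below the 20th percentile form the bottom group; scores above
# the 80th percentile form the top group (tie handling is an assumption).
demo$Group <- cut(demo$Total,
                  breaks = c(-Inf, cuts, Inf),
                  labels = c("Bottom 20%", "Middle 60%", "Top 20%"))

print(table(demo$Group))
```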
The model results are as follows (the red color indicates fewer than 50 data values):
Individual trend lines:
Row Column | Line p-value | DF | Term | Value | StdErr | t-value | Term p-value |
GPA Bottom 20% | 0.0512913 | 9 | Time | 0.0486018 | 0.0216339 | 2.24655 | 0.0512913 |
 | | | intercept | 2.611 | 0.284101 | 9.19038 | < 0.0001 |
GPA Middle 60% | 0.001727 | 9 | Time | 0.0420615 | 0.0095652 | 4.39734 | 0.001727 |
 | | | intercept | 2.89689 | 0.125612 | 23.0621 | < 0.0001 |
GPA Top 20% | 0.476167 | 9 | Time | 0.0159039 | 0.021392 | 0.743452 | 0.476167 |
 | | | intercept | 3.16824 | 0.280923 | 11.2779 | < 0.0001 |
Chronotype and Time Period
The data were split into morning (7, 8, and 9), middle-of-the-day (11, 12, 13, 14, and 15), and afternoon (16, 17, and 18) classes, with start hours given on a 24-hour clock. Then we used R to find the relationship between GPA and Chronotype for each subset.
library(Publish)
setwd("...")
mydata = read.csv(file="20180405GPAByTimeOfDayWithChronotypeWithTimeType.csv",
header=TRUE,row.names="id")
A1 <- subset(mydata,TimePeriod=="Morning" & Chronotype=="definite evening")
A2 <- subset(mydata,TimePeriod=="Morning" & Chronotype=="intermediate")
A3 <- subset(mydata,TimePeriod=="Morning" & Chronotype=="moderate evening")
A4 <- subset(mydata,TimePeriod=="Morning" & Chronotype=="moderate morning")
B1 <- subset(mydata,TimePeriod=="Middle of the Day" & Chronotype=="definite evening")
B2 <- subset(mydata,TimePeriod=="Middle of the Day" & Chronotype=="intermediate")
B3 <- subset(mydata,TimePeriod=="Middle of the Day" & Chronotype=="moderate evening")
B4 <- subset(mydata,TimePeriod=="Middle of the Day" & Chronotype=="moderate morning")
C1 <- subset(mydata,TimePeriod=="Afternoon" & Chronotype=="definite evening")
C2 <- subset(mydata,TimePeriod=="Afternoon" & Chronotype=="intermediate")
C3 <- subset(mydata,TimePeriod=="Afternoon" & Chronotype=="moderate evening")
C4 <- subset(mydata,TimePeriod=="Afternoon" & Chronotype=="moderate morning")
ci.mean(A1$GPA)
ci.mean(A2$GPA)
ci.mean(A3$GPA)
ci.mean(A4$GPA)
ci.mean(B1$GPA)
ci.mean(B2$GPA)
ci.mean(B3$GPA)
ci.mean(B4$GPA)
ci.mean(C1$GPA)
ci.mean(C2$GPA)
ci.mean(C3$GPA)
ci.mean(C4$GPA)
[output]
> ci.mean(A1$GPA)
mean CI-95%
2.56 [1.74;3.39]
> ci.mean(A2$GPA)
mean CI-95%
3.19 [3.10;3.29]
> ci.mean(A3$GPA)
mean CI-95%
2.94 [2.76;3.12]
> ci.mean(A4$GPA)
mean CI-95%
3.43 [3.25;3.60]
> ci.mean(B1$GPA)
mean CI-95%
3.19 [2.86;3.51]
> ci.mean(B2$GPA)
mean CI-95%
3.38 [3.31;3.44]
> ci.mean(B3$GPA)
mean CI-95%
3.20 [3.07;3.32]
> ci.mean(B4$GPA)
mean CI-95%
3.49 [3.35;3.63]
> ci.mean(C1$GPA)
mean CI-95%
3.15 [2.44;3.86]
> ci.mean(C2$GPA)
mean CI-95%
3.56 [3.45;3.66]
> ci.mean(C3$GPA)
mean CI-95%
3.36 [3.12;3.60]
> ci.mean(C4$GPA)
mean CI-95%
3.63 [3.40;3.87]
[output]
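The twelve subsets above can also be produced in one pass. The sketch below is an alternative in base R, run on synthetic records: it maps start hours to the three time periods and computes a t-based 95% interval for each period and chronotype, which should be close to what `ci.mean` reports. The helper names `time_period` and `mean_ci` are ours, and the `Time` column is assumed to hold the start hour.

```r
# Map a class start hour (24-hour clock) to the time periods used above.
# Hour 10 is not listed in any period in the text, so it is returned as NA here.
time_period <- function(hour) {
  ifelse(hour %in% 7:9, "Morning",
  ifelse(hour %in% 11:15, "Middle of the Day",
  ifelse(hour %in% 16:18, "Afternoon", NA)))
}

# 95% confidence interval for a mean, based on the t distribution.
mean_ci <- function(x, level = 0.95) {
  n    <- length(x)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}

set.seed(3)  # synthetic records for illustration; not our survey data
demo <- data.frame(
  Time       = sample(c(7:9, 11:18), 300, replace = TRUE),
  Chronotype = sample(c("moderate evening", "intermediate", "moderate morning"),
                      300, replace = TRUE),
  GPA        = round(runif(300, 1.5, 4), 2)
)
demo$TimePeriod <- time_period(demo$Time)

# One row per TimePeriod x Chronotype combination.
groups <- split(demo$GPA, list(demo$TimePeriod, demo$Chronotype), drop = TRUE)
print(round(t(sapply(groups, mean_ci)), 2))
```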
The results show the typical increase in GPA for all chronotypes as the day goes on, but the rate of increase depends, as expected, on chronotype.
The Effect of Class Start Time on GPA by Chronotype
We used R to build a simple linear model and found that:
1. All chronotypes in this study (which included no definite morning chronotypes) perform significantly better in afternoon classes than in morning classes.
2. Later chronotypes perform worse at all times of the day, but the difference in grades is less pronounced in afternoon and early evening classes than it is in morning classes. (There were no true evening classes, with start times of 7 pm or later, and no definite morning chronotypes to compare against.)
We sorted the dataset by randomized ID (to randomize the order of records with equal scores) and then by MEQ score, from low to high.
The modal MEQ score is 48, which occupies rows 1097 to 1227 (1227 - 1097 + 1 = 131 rows).
Total rows: 2131
GPATimelm <- lm(GPA ~ Time + Total, data=mydata)
summary(GPATimelm)
[output]
Call:
lm(formula = GPA ~ Time + Total, data = mydata)
Residuals:
Min 1Q Median 3Q Max
-3.5241 -0.3727 0.5061 0.6944 1.0434
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.409876 0.144427 16.686 < 2e-16 ***
Time 0.033251 0.007345 4.527 6.31e-06 ***
Total 0.010798 0.002396 4.506 6.98e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9571 on 2127 degrees of freedom
Multiple R-squared: 0.0185, Adjusted R-squared: 0.01758
F-statistic: 20.05 on 2 and 2127 DF, p-value: 2.372e-09
[output]
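As a worked example of what these coefficients imply, the sketch below evaluates the fitted equation by hand. It assumes `Time` is the class start hour on a 24-hour clock, as in the time-period grouping; the helper name `predict_gpa` and the chosen MEQ scores are ours.

```r
# Coefficients copied from the summary(GPATimelm) output above.
b0     <- 2.409876   # intercept
bTime  <- 0.033251   # change in GPA per hour of class start time
bTotal <- 0.010798   # change in GPA per MEQ point

predict_gpa <- function(time, total) b0 + bTime * time + bTotal * total

# Illustrative comparison: MEQ scores of 30 and 60 at 8:00 and 16:00.
grid <- expand.grid(Time = c(8, 16), Total = c(30, 60))
grid$PredictedGPA <- round(predict_gpa(grid$Time, grid$Total), 2)
print(grid)
```

Under this model, starting a class 8 hours later corresponds to about 0.27 grade points (8 x 0.033251), and a 30-point higher MEQ score to about 0.32 grade points (30 x 0.010798). With an R-squared of 0.0185, these are average tendencies rather than useful predictions for an individual student.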
## sliding window of data by MEQ
# the file is assumed to be sorted by MEQ score (Total), low to high, as described above
mydata = read.csv(file="20180405GPAByTimeOfDayWithChronotypeWithTimeType.csv",header=TRUE,row.names="id")
numrows = 50 #100 works
variables = 4
iterations = nrow(mydata) - numrows
output <- matrix(ncol=variables, nrow=iterations)
for(i in 1:iterations){
  newdata <- mydata[i:(i+numrows),] # note: each window holds numrows+1 rows
  GPATimelm <- lm(GPA ~ Time, data=newdata)
  timeci <- confint(GPATimelm, "Time")
  output[i,1] <- mean(newdata$Total) # mean MEQ score in the window
  output[i,2] <- coef(summary(GPATimelm))["Time","Estimate"] # slope of GPA on Time
  output[i,3] <- coef(summary(GPATimelm))["Time","Pr(>|t|)"] # p-value of that slope
  output[i,4] <- timeci[2] - timeci[1] # width of the 95% CI for the slope
}
output <- data.frame(output) # columns are named X1 to X4
write.csv(output,file="results.csv")
# regress the window slopes (X2) on the window mean MEQ scores (X1)
meqlm <- lm(X2 ~ X1, data=output)
summary(meqlm)
[output]
Call:
lm(formula = X2 ~ X1, data = output)
Residuals:
Min 1Q Median 3Q Max
-0.121749 -0.024635 -0.002654 0.026611 0.132344
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0684233 0.0053010 12.908 < 2e-16 ***
X1 -0.0007325 0.0001109 -6.607 4.97e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.041 on 2078 degrees of freedom
Multiple R-squared: 0.02057, Adjusted R-squared: 0.0201
F-statistic: 43.65 on 1 and 2078 DF, p-value: 4.971e-11
[output]
Here are the results file and a visualization (made in Excel) of predicted GPA at 1:00 PM by MEQ score, extrapolated to other start times using the predicted slope for each MEQ score.
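The extrapolation can be sketched in R from the coefficients printed above. The slope function uses the `meqlm` coefficients directly. The 1:00 PM baseline is an assumption of this sketch: it is taken from the `GPA ~ Time + Total` model, and the results file may have used a different baseline. The helper names are ours.

```r
# Time slope as a function of MEQ score, from the summary(meqlm) output.
slope_for_meq <- function(meq) 0.0684233 - 0.0007325 * meq

# Baseline GPA at 13:00 by MEQ score. ASSUMPTION: taken here from the
# GPA ~ Time + Total model; the results file may use a different baseline.
gpa_at_13 <- function(meq) 2.409876 + 0.033251 * 13 + 0.010798 * meq

# Extrapolate from 13:00 to another start hour using the MEQ-specific slope.
extrapolate_gpa <- function(meq, hour) {
  gpa_at_13(meq) + slope_for_meq(meq) * (hour - 13)
}

meq  <- c(20, 40, 60)
pred <- data.frame(MEQ     = meq,
                   Slope   = slope_for_meq(meq),
                   GPA_8am = extrapolate_gpa(meq, 8),
                   GPA_1pm = extrapolate_gpa(meq, 13),
                   GPA_6pm = extrapolate_gpa(meq, 18))
print(round(pred, 3))
```

The predicted slope stays positive across the observed MEQ range (17 to 68), so every score gains from later start times, with lower scores gaining more.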
===References===
1. Terman M, Terman JS. Light therapy for seasonal and nonseasonal depression: efficacy, protocol, safety, and side effects. CNS Spectrums, 2005;10:647-663. (Downloadable at www.cet.org)