# Correlation and Mixed Effects

# Introduction

In this tutorial we want to evaluate how the parameters we measure with the MultispeQ relate to crop outputs. We will use the same experiment that we used in the previous tutorial: "ANOVA and multivariate analysis."

However, in this tutorial,the data was transformed in a csv file so we cannot generate the data-frame directly from https://photosynq.org (opens new window).

The first step will be to generate a correlation matrix of crop yield against the mean of each PhotosynQ parameter. Next, we will use a mixed effect model to account for factors that affect the PhotosynQ parameters: 1) light intensity, 2) time of measurement, 3) location within the field, etc.

# Import Libraries

library("broom")
library("dplyr")
library("data.table")
library("stringr")
library("Hmisc")
library("lme4")

# Simple Correlation

We will use the dataset 'sun2.' This dataset contains the mean values of each PhotosynQ parameter and the crop yield for each plot.

First, we need to open up our data file.

# Set the main folder location for your data
your_location = "C:/Users/Dan/Desktop/"
# The main data file, located in the main folder, is called... 
file_name = "sun2.csv"

# Let's pull that data file into R and call it data_raw
my_data = read.csv(str_c(your_location, file_name))
View(my_data)

Now, lets generate a correlation matrix.

my_data <- my_data[c(3,4,5,6,7)]
head(my_data, 6)
Yield Phi2 PhiNPQ PhiNO SPAD
1 1556 0.4017500 0.2746250 0.3237500 45.56125
2 2667 0.2710000 0.4264444 0.3025556 42.81889
3 1778 0.2536000 0.4234000 0.3232000 45.60200
4 2222 0.3087500 0.3706667 0.3136667 45.91750
5 2000 0.4288750 0.2431250 0.3280000 42.57625
6 1444 0.4229167 0.2138182 0.3446364 45.34083
res2 <- rcorr(as.matrix(my_data))
res2
        Yield  Phi2 PhiNPQ PhiNO  SPAD
  Yield   1.00 -0.28   0.48 -0.26  0.52
  Phi2   -0.28  1.00  -0.92 -0.65  0.01
  PhiNPQ  0.48 -0.92   1.00  0.31  0.23
  PhiNO  -0.26 -0.65   0.31  1.00 -0.49
  SPAD    0.52  0.01   0.23 -0.49  1.00

  n= 16 


  P
        Yield  Phi2   PhiNPQ PhiNO  SPAD  
  Yield         0.2888 0.0603 0.3317 0.0380
  Phi2   0.2888        0.0000 0.0067 0.9821
  PhiNPQ 0.0603 0.0000        0.2423 0.3816
  PhiNO  0.3317 0.0067 0.2423        0.0524
  SPAD   0.0380 0.9821 0.3816 0.0524   

The first correlation matrix presents the correlation coefficients and the second matrix contains the corresponding p values.

There was a significant correlation between crop yield and SPAD, but crop yield was not significantly correlated with any other PhotosynQ parameter.

# Transforming our data with a Mixed Effect Model

Now, we will use a mixed effect model to generate adjusted coefficients for each PhotosynQ parameter in each unique plot. The mixed effect model will take into account light intensity, the time of measurement, the location within the field (i.e. block or replicate), and each unique plot. Our unique plot, which we generated by using the 'concatenate' function in excel, combines the variety and block to give us the unique plot. For example, Variety A + Block 1 = plot A1.

We will use the dataset 'sun,' which is the same dataset used in the "ANOVA and multivariate analysis tutorial."

First, we need to read our data file

# Set the main folder location for your data
your_location = "C:/Users/Dan/Desktop/"
# The main data file, located in the main folder, is called... 
file_name = "sun.csv"

# Let's pull that data file into R and call it data_raw
data = read.csv(str_c(your_location, file_name))
View(data)

We need to set block to a categorical variable

data$Block <- factor(data$Block, ordered = TRUE)

Now we will run a mixed effect model for SPAD. We will set block as a random effect. Plot is the fixed effect whose coefficient we are interested in.

SPAD <- lmer(SPAD ~ Plot + (1|Block), data)
unable to evaluate scaled gradient Hessian is numerically singular: parameters are not uniquely determined
summary(SPAD)
  Linear mixed model fit by REML ['lmerMod']
  Formula: SPAD ~ Plot + (1 | Block)
  Data: data

  REML criterion at convergence: 526.7

  Scaled residuals: 
  Min      1Q  Median      3Q     Max 
  -4.4318 -0.4194  0.0907  0.5720  1.6051 

  Random effects:
  Groups   Name        Variance Std.Dev.
  Block    (Intercept)  1.373   1.172   
  Residual             30.137   5.490   
  Number of obs: 96, groups:  Block, 4

  Fixed effects:
              Estimate Std. Error t value
  (Intercept) 45.56125    2.26713  20.096
  PlotA2      -2.74236    3.14025  -0.873
  PlotA3       0.04075    3.54118   0.012
  PlotA4       0.35625    3.74791   0.095
  PlotB1      -2.98500    2.74484  -1.087
  PlotB2      -0.22042    3.00400  -0.073
  PlotB3       1.12589    3.28906   0.342
  PlotB4      -4.01458    4.06917  -0.987
  PlotC1      -6.17000    2.74484  -2.248
  PlotC2      -1.04292    3.39638  -0.307
  PlotC3      -6.63125    4.06917  -1.630
  PlotC4      -1.96375    3.74791  -0.524
  PlotD1      -4.97125    2.84118  -1.750
  PlotD2      -0.44625    3.39638  -0.131
  PlotD3      -1.73792    4.06917  -0.427
  PlotD4      -8.57792    4.06917  -2.108

  Correlation matrix not shown by default, as p = 16 > 12.
  Use print(x, correlation=TRUE)  or
  vcov(x)     if you need it

  convergence code: 0
  unable to evaluate scaled gradient
  Hessian is numerically singular: parameters are not uniquely determined
SPAD = tidy(SPAD)
setnames(SPAD, "estimate", "SPAD")
write.csv(SPAD, file = "C:/Users/Dan/Desktop/SPAD.csv")

For the photosynthesis parameters (Phi2, PhiNPQ, PhiNO, qL, NPQt, and LEF) we will include sqrtPAR and TimeofDay has fixed effects. Again, block is set as a random effect.

Phi2 <- lmer(Phi2 ~ sqrtPAR + TimeofDay + Plot + (1|Block), data)
unable to evaluate scaled gradient Hessian is numerically singular: parameters are not uniquely determined
summary(Phi2)
  Linear mixed model fit by REML ['lmerMod']
  Formula: Phi2 ~ sqrtPAR + TimeofDay + Plot + (1 | Block)
  Data: data

  REML criterion at convergence: -187.7

  Scaled residuals: 
  Min       1Q   Median       3Q      Max 
  -2.95051 -0.47831 -0.05387  0.53481  2.31369 

  Random effects:
  Groups   Name        Variance Std.Dev.
  Block    (Intercept) 0.001034 0.03216 
  Residual             0.003005 0.05482 
  Number of obs: 96, groups:  Block, 4

  Fixed effects:
              Estimate Std. Error t value
  (Intercept)  0.8346472  0.0443797  18.807
  sqrtPAR     -0.0130663  0.0007015 -18.627
  TimeofDay    0.0010404  0.0010073   1.033
  PlotA2      -0.0773681  0.0538900  -1.436
  PlotA3      -0.0287904  0.0555531  -0.518
  PlotA4       0.0099678  0.0568005   0.175
  PlotB1       0.0275708  0.0274103   1.006
  PlotB2       0.0573877  0.0524106   1.095
  PlotB3      -0.0308226  0.0554752  -0.556
  PlotB4       0.0454166  0.0590560   0.769
  PlotC1       0.0573289  0.0274110   2.091
  PlotC2      -0.0164455  0.0560657  -0.293
  PlotC3      -0.0378036  0.0594238  -0.636
  PlotC4       0.0444594  0.0566751   0.784
  PlotD1       0.0014681  0.0283776   0.052
  PlotD2       0.0276461  0.0543171   0.509
  PlotD3      -0.0304939  0.0588505  -0.518
  PlotD4       0.0014337  0.0592383   0.024

  Correlation matrix not shown by default, as p = 18 > 12.
  Use print(x, correlation=TRUE)  or
  vcov(x)     if you need it

  convergence code: 0
  unable to evaluate scaled gradient
  Hessian is numerically singular: parameters are not uniquely determined
Phi2 = tidy(Phi2)
setnames(Phi2, "estimate", "Phi2")
write.csv(Phi2, file = "C:/Users/Dan/Desktop/Phi2.csv")
PhiNPQ <- lmer(PhiNPQ ~ sqrtPAR + TimeofDay + Plot + (1|Block), data)
summary(PhiNPQ)
  Linear mixed model fit by REML ['lmerMod']
  Formula: PhiNPQ ~ sqrtPAR + TimeofDay + Plot + (1 | Block)
  Data: data

  REML criterion at convergence: -117.6

  Scaled residuals: 
  Min      1Q  Median      3Q     Max 
  -3.2005 -0.4525  0.0229  0.4807  2.7235 

  Random effects:
  Groups   Name        Variance  Std.Dev.
  Block    (Intercept) 0.0009506 0.03083 
  Residual             0.0061612 0.07849 
  Number of obs: 89, groups:  Block, 4

  Fixed effects:
              Estimate Std. Error t value
  (Intercept) -0.044948   0.053966  -0.833
  sqrtPAR      0.010065   0.001029   9.782
  TimeofDay   -0.004823   0.001555  -3.101
  PlotA2       0.153504   0.060392   2.542
  PlotA3       0.057124   0.063182   0.904
  PlotA4       0.019630   0.069208   0.284
  PlotB1      -0.030705   0.039249  -0.782
  PlotB2      -0.048570   0.058052  -0.837
  PlotB3       0.085652   0.065148   1.315
  PlotB4      -0.025410   0.069402  -0.366
  PlotC1      -0.074502   0.042393  -1.757
  PlotC2       0.061841   0.067318   0.919
  PlotC3      -0.036901   0.070065  -0.527
  PlotC4      -0.041492   0.065179  -0.637
  PlotD1      -0.001545   0.040634  -0.038
  PlotD2      -0.006358   0.060909  -0.104
  PlotD3       0.024222   0.076133   0.318
  PlotD4      -0.078141   0.069719  -1.121

  Correlation matrix not shown by default, as p = 18 > 12.
  Use print(x, correlation=TRUE)  or
  vcov(x)     if you need it
PhiNPQ = tidy(PhiNPQ)
setnames(PhiNPQ, "estimate", "PhiNPQ")
write.csv(PhiNPQ, file = "C:/Users/Dan/Desktop/PhiNPQ.csv")
PhiNO <- lmer(PhiNO ~ sqrtPAR + TimeofDay + Plot + (1|Block), data)
summary(PhiNO)
  Linear mixed model fit by REML ['lmerMod']
  Formula: PhiNO ~ sqrtPAR + TimeofDay + Plot + (1 | Block)
  Data: data

  REML criterion at convergence: -151.6

  Scaled residuals: 
  Min      1Q  Median      3Q     Max 
  -2.1356 -0.4583 -0.1241  0.4630  3.9859 

  Random effects:
  Groups   Name        Variance  Std.Dev.
  Block    (Intercept) 0.0000675 0.008216
  Residual             0.0038155 0.061770
  Number of obs: 89, groups:  Block, 4

  Fixed effects:
              Estimate Std. Error t value
  (Intercept)  0.2171820  0.0358098   6.065
  sqrtPAR      0.0028837  0.0008097   3.562
  TimeofDay    0.0029660  0.0012239   2.423
  PlotA2      -0.0669853  0.0348748  -1.921
  PlotA3      -0.0271227  0.0378120  -0.717
  PlotA4      -0.0340018  0.0438607  -0.775
  PlotB1       0.0032473  0.0308868   0.105
  PlotB2      -0.0081475  0.0323204  -0.252
  PlotB3      -0.0283234  0.0398242  -0.711
  PlotB4      -0.0204652  0.0440500  -0.465
  PlotC1       0.0210644  0.0333608   0.631
  PlotC2      -0.0123546  0.0419996  -0.294
  PlotC3       0.0760843  0.0446960   1.702
  PlotC4      -0.0033037  0.0398558  -0.083
  PlotD1       0.0001316  0.0319768   0.004
  PlotD2      -0.0205350  0.0354265  -0.580
  PlotD3       0.0177115  0.0504687   0.351
  PlotD4       0.0781634  0.0443591   1.762

  Correlation matrix not shown by default, as p = 18 > 12.
  Use print(x, correlation=TRUE)  or
  vcov(x)     if you need it
PhiNO = tidy(PhiNO)
setnames(PhiNO, "estimate", "PhiNO")
write.csv(PhiNO, file = "C:/Users/Dan/Desktop/PhiNO.csv")

We wrote all of our results into a csv and saved them to our computer. Using these results we will create a new csv file that includes the coefficients of all of our PhotosynQ parameters and crop yield data (you can set the coefficient for plot A1 to 0). We will use this file, taking into account light intensity, time of day, and block, to re-run the correlation matrix we did in the beginning of the tutorial. Hopefully, we will identify more significant relationships between PhotosynQ parameters and crop yield than we did using plot averages!

# Correlations using our new values

We need to read our new dataset, 'sun3.'

# Set the main folder location for your data
your_location = "C:/Users/Dan/Desktop/"
# The main data file, located in the main folder, is called... 
file_name = "sun3.csv"

# Let's pull that data file into R and call it data_raw
data = read.csv(str_c(your_location, file_name))
View(data)

Now, we will generate a correlation matrix using crop yield and the coefficients from the mixed effects model above.

data <- data[c(4,5,6,7,8)]
head(data, 6)
Yield Phi2 PhiNPQ PhiNO SPAD
1 1556 0.00000000 0.00000000 0.00000000 0.00000000
2 2667 -0.07736807 0.15350358 -0.066985307 -2.7423611
3 1778 -0.02879036 0.05712399 -0.027122671 0.0407500
4 2222 0.00996781 0.01963012 -0.034001825 0.3562500
5 2000 0.02757076 -0.03070520 0.003247293 -2.9850000
6 1444 0.05738775 -0.04856991 -0.008147453 -0.2204167
res2 <- rcorr(as.matrix(data))
res2
        Yield  Phi2 PhiNPQ PhiNO  SPAD
  Yield   1.00 -0.46   0.67 -0.57  0.52
  Phi2   -0.46  1.00  -0.77  0.12 -0.07
  PhiNPQ  0.67 -0.77   1.00 -0.71  0.53
  PhiNO  -0.57  0.12  -0.71  1.00 -0.72
  SPAD    0.52 -0.07   0.53 -0.72  1.00

  n= 16 


  P
        Yield  Phi2   PhiNPQ PhiNO  SPAD  
  Yield         0.0739 0.0043 0.0206 0.0380
  Phi2   0.0739        0.0005 0.6504 0.7864
  PhiNPQ 0.0043 0.0005        0.0021 0.0366
  PhiNO  0.0206 0.6504 0.0021        0.0015
  SPAD   0.0380 0.7864 0.0366 0.0015      

After transforming our data using a mixed effects model, the correlations between PhotosynQ parameters and crop yield are much stronger than when using averages. While the SPAD correlation is unchanged, we now see a significantly positive correlation between PhiNPQ and crop yield and a significantly negative correlation between PhiNO and crop yield.