Introduction

College student mental health has received growing attention in recent years. Although many solutions have been proposed, the problem remains hard to solve. In this project, we aim to build a prediction system that can tell when students are likely to become stressed, making it possible to deliver interventions that help college students maintain good mental health. We also hope this project can serve as a starting point for better understanding college student stress patterns.

Dataset

In this project we use the StudentLife dataset, collected by a research group at Dartmouth College. The dataset contains a wide variety of data, including sensor data, EMA (ecological momentary assessment) data, survey responses, and educational data. Among this large amount of data, we mainly focus on data concerning students' mental condition, such as Stress Level, Enthusiasm, and Calm, and on their daily behavioral data, including Sleeping Hours, Working Hours, Exercise, and Social. After diving into the data, we can summarize the data shape as follows.

Methodology

Since our dataset contains a limited number of records (only 60 students were involved in the study), it would be very hard to use these 60 students as a sample to predict other students' stress levels. Therefore, we decided to shift our focus from predicting the stress level of each individual student to predicting the overall trend of students' stress level as a whole.

First, we aggregate the stress values and average them over the number of participants. Next, we analyze how the average stress value changes over time. Then we use Granger causality analysis to test for causal relationships between stress and the other features, because we are interested in how students' overall stress pattern forms and where it leads. We hope to identify the features that most significantly affect stress level.
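The Granger causality step can be sketched as follows. This is a minimal pure-Python illustration, not the code used in the project: the two series are synthetic, the names `sleep`, `stress`, `ols_rss`, and `granger_f` are hypothetical, and only the F statistic is computed (turning it into a p-value would additionally need the F distribution).

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit, via the normal equations."""
    k = len(rows[0])
    # Augmented system [X'X | X'y]
    a = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
         + [sum(row[i] * y for row, y in zip(rows, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for other in range(k):
            if other != col:
                factor = a[other][col] / a[col][col]
                a[other] = [v - factor * w for v, w in zip(a[other], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((y - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, y in zip(rows, targets))

def granger_f(cause, effect, lags):
    """F statistic for the hypothesis that `cause` Granger-causes `effect`."""
    n = len(effect)
    targets = effect[lags:]
    # Restricted model: constant + past values of the effect only
    restricted = [[1.0] + [effect[t - i] for i in range(1, lags + 1)]
                  for t in range(lags, n)]
    # Unrestricted model: also includes past values of the candidate cause
    unrestricted = [row + [cause[t - i] for i in range(1, lags + 1)]
                    for row, t in zip(restricted, range(lags, n))]
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic data (an assumption for illustration): stress depends on
# the previous value of sleep, but not the other way round.
random.seed(0)
n = 200
sleep = [random.gauss(0, 1) for _ in range(n)]
stress = [0.0] + [-0.8 * sleep[t - 1] + random.gauss(0, 0.1) for t in range(1, n)]

f_sleep_to_stress = granger_f(sleep, stress, lags=1)
f_stress_to_sleep = granger_f(stress, sleep, lags=1)
print(f_sleep_to_stress, f_stress_to_sleep)
```

A large F statistic in one direction and a small one in the other suggests that past values of the first series help predict the second. This is predictive precedence rather than proof of causation.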

Meanwhile, we build an autoregressive (AR) model to predict student stress level from historical stress data. To make the prediction more accurate, we also use the autocorrelation function (ACF) to measure the correlation between student stress values at different lags of the time series.
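For reference, the two tools named above can be written in their standard textbook form. The symbols are assumptions introduced here, not notation from the report: $x_t$ is the average stress value at time $t$, $p$ is the number of lags, $c$ and $\varphi_i$ are the model coefficients, and $\varepsilon_t$ is the noise term.

```latex
% AR(p): the current value is a linear function of the previous p values
x_t = c + \sum_{i=1}^{p} \varphi_i \, x_{t-i} + \varepsilon_t

% Sample autocorrelation at lag k, for a series of length n with mean \bar{x}
\hat{\rho}(k) = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}
                     {\sum_{t=1}^{n} (x_t - \bar{x})^2}
```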

Granger Causality Analysis

Prediction on student stress

Stability & Auto-correlation testing

We first calculated the moving average of student stress level with a sliding window of size 3, which produces a smoothed version of the original data. Student stress appears to peak around the midpoint of the surveyed period and to reach another peak near the end of the survey period.
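The smoothing step can be sketched as a trailing moving average. This is a minimal illustration with made-up numbers; the function name `moving_average` and the sample values are hypothetical, and whether the original analysis used a trailing or a centered window is not stated.

```python
def moving_average(values, window=3):
    """Trailing moving average; positions without a full window are dropped."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

# Toy stress values, not taken from the StudentLife data
stress = [2.0, 3.0, 4.0, 3.0, 5.0]
smoothed = moving_average(stress, window=3)
print(smoothed)
```

With a window of 3, a series of length n yields n - 2 smoothed values, each the mean of the current value and the two before it.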

The rolling mean and rolling standard deviation of the time series look much smoother than the original stress time series (ts stress). However, it is hard to tell how stationary the series actually is from visual inspection alone. To test the stationarity of the stress data, we conducted a Dickey-Fuller test on its moving-average differences and obtained a small p-value of 0.024. Since this is below the usual 0.05 significance level, we reject the unit-root hypothesis and conclude that ts stress is mostly stationary.
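The idea behind the test can be sketched with the simplest (non-augmented) Dickey-Fuller regression, which regresses the first difference on the lagged level and examines the t statistic of the slope. This is an illustration on synthetic series, not the project's code: the function name `dickey_fuller_stat` and the data are hypothetical, and no p-value is computed. For a regression with a constant and a large sample, the 5% critical value is about -2.86, so a statistic below that rejects the unit root.

```python
import math
import random

def dickey_fuller_stat(values):
    """t statistic of gamma in  dy_t = alpha + gamma * y_{t-1} + e_t."""
    lagged = values[:-1]
    diffs = [b - a for a, b in zip(values[:-1], values[1:])]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, diffs))
    gamma = sxy / sxx
    alpha = mean_y - gamma * mean_x
    rss = sum((y - alpha - gamma * x) ** 2 for x, y in zip(lagged, diffs))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return gamma / std_err

# Synthetic examples: a stationary AR(1) series and a random walk
random.seed(1)
noise = [random.gauss(0, 1) for _ in range(300)]
stationary, walk = [0.0], [0.0]
for e in noise:
    stationary.append(0.5 * stationary[-1] + e)
    walk.append(walk[-1] + e)

stat_stationary = dickey_fuller_stat(stationary)
stat_walk = dickey_fuller_stat(walk)
print(stat_stationary, stat_walk)
```

A library routine such as `adfuller` in statsmodels reports a p-value directly. Whether the original analysis used it is not stated.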

We also used the autocorrelation function (ACF) to measure the correlation between ts stress and lagged copies of itself. For lags smaller than 7, the time series is positively correlated with itself.
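A sample ACF can be computed directly from its definition. The sketch below uses a short made-up periodic series; the function name `acf` and the numbers are hypothetical and are not taken from the stress data.

```python
def acf(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    return [sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

# Toy series with a period of 6 samples
series = [1, 2, 3, 4, 3, 2, 1, 2, 3, 4, 3, 2]
correlations = acf(series, max_lag=4)
for lag, value in enumerate(correlations):
    print(lag, round(value, 3))
```

The lag at which the ACF stops being positive is one common heuristic for choosing how many past values an autoregressive model should use.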

Fitted Models on student stress

We built an autoregressive (AR) model to fit the time series data, with the lag parameter set to 7. Below are the fitted values (red line) compared with the original stress data (blue line); the AR model yields a residual sum of squares (RSS) of 17.383. The AR model appears to capture the overall trend of students' stress level fairly well.
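Fitting an AR model with 7 lags and scoring it by RSS can be sketched as follows. The estimation method used in the project is not stated, so this sketch swaps in the Yule-Walker equations solved by the Levinson-Durbin recursion. The series is synthetic, and the names `fit_ar_yule_walker`, `ar_fitted`, and `rss` are hypothetical.

```python
import random

def fit_ar_yule_walker(values, order):
    """Estimate AR coefficients from sample autocovariances (Levinson-Durbin)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    r = [sum(centered[t] * centered[t + k] for t in range(n - k)) / n
         for k in range(order + 1)]
    phi, err = [], r[0]
    for k in range(1, order + 1):
        # Reflection coefficient for order k
        refl = (r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))) / err
        phi = [phi[j] - refl * phi[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1 - refl ** 2
    return mean, phi

def ar_fitted(values, mean, phi):
    """One-step-ahead in-sample predictions, starting at index len(phi)."""
    p = len(phi)
    return [mean + sum(phi[i] * (values[t - 1 - i] - mean) for i in range(p))
            for t in range(p, len(values))]

def rss(actual, fitted):
    """Residual sum of squares between observed and fitted values."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted))

# Synthetic AR(1) series standing in for the stress data
random.seed(2)
series = [0.0]
for _ in range(499):
    series.append(0.6 * series[-1] + random.gauss(0, 1))

mean, phi = fit_ar_yule_walker(series, order=7)
fitted = ar_fitted(series, mean, phi)
ar_rss = rss(series[7:], fitted)
baseline_rss = rss(series[7:], [mean] * len(fitted))
print(ar_rss, baseline_rss)
```

Comparing the model's RSS with that of a constant-mean baseline gives a quick sense of how much of the variation the lagged values explain.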

We also wanted to see whether other models would outperform the AR model, so we built a moving average (MA) model. The MA model's RSS of 22.210 is higher than the AR model's, and the plot also suggests that the MA model fails to capture some of the trends in ts stress.

Combining the moving average and autoregressive models into a single integrated model gives an even better result, with the RSS reduced to 14.129. The fitted values (red line) capture most of the trend features of the ts stress data.
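For reference, the model families compared in this section have the following standard forms. The symbols are assumptions introduced here: $x_t$ is the series being modeled, $\varepsilon_t$ is the noise term, $p$ and $q$ are the AR and MA orders, and $\hat{x}_t$ is the fitted value. Whether the combined model also differences the series, as an ARIMA model would, is not stated in the text.

```latex
% MA(q): the current value depends on the last q noise terms
x_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}

% ARMA(p, q): autoregressive and moving average terms combined
x_t = c + \sum_{i=1}^{p} \varphi_i \, x_{t-i} + \varepsilon_t
        + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}

% Residual sum of squares used to compare the fits
\mathrm{RSS} = \sum_{t} \left( x_t - \hat{x}_t \right)^2
```

Note that a model with more parameters can always match or lower the in-sample RSS, so a lower RSS alone does not guarantee better out-of-sample prediction.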

Model Prediction on student stress

Check the complete report here: Time series prediction on College student stress level