Data is powerful, especially in the realm of education. At times it is self-affirming; at other times it makes you question your current practices and policies, because the data surfaces ramifications that force us to dig deeper to determine what is going on and to devise solutions to the problems we face as educators. Most importantly, it tells a story about the students we serve, a story that we as teachers and administrators are called to act on. Over the last few days, I analyzed several data sets collected from the California Department of Education. These data sets cover the 2018 California K-12 school demographics, state testing scores, attendance rates, suspension rates, and funding mechanisms. My goal was to transform the data into several self-affirming stories of what the data is telling us, and to show how digging deeper into those stories can yield new insights into how to solve the problems we identify.
Example 1: English Language Learners and the 2018 CAASPP ELA State Test
After looking at several data sets, there are a number of self-affirming findings that we can all relate to. First, I analyzed the data sets containing the percentage of English Language Learners (ELLs) at each school and the overall 2018 CAASPP English Language Arts state testing standard met percentage (i.e., the passing percentage on the test). I started with the overall averages: the average percentage of ELLs in our California schools is 22%, while the overall CAASPP English Language Arts passing rate was 44% in 2018. I then went further and turned each data set into a variable (i.e., ELL percentage and 2018 CAASPP ELA passing percentage) and computed a correlation, a statistical calculation that measures the relationship between two variables. The result was a moderate negative correlation, r = -.43, p < 0.01. This means that as one variable increases, such as the percentage of ELLs at a school site, the other variable, the 2018 CAASPP English Language Arts passing percentage, tends to decrease. The calculation was computed from a sample of 9,376 schools, which is quite large. As a result, we can say with high confidence that this result is statistically significant and reflects a real pattern in schools across California.
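For readers who want to reproduce this kind of computation, here is a minimal sketch in Python using `scipy.stats.pearsonr`. The numbers below are illustrative placeholders, not the actual CDE figures, and the variable names are my own:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-school data: percentage of ELL students and the
# 2018 CAASPP ELA "standard met" percentage (values are illustrative).
ell_pct = np.array([5, 10, 15, 25, 35, 45, 55, 65])
ela_pass_pct = np.array([70, 62, 58, 50, 44, 38, 30, 26])

# Pearson's r measures the strength and direction of the linear
# relationship between the two variables; pearsonr also returns a p-value.
r, p_value = pearsonr(ell_pct, ela_pass_pct)
print(f"r = {r:.2f}, p = {p_value:.4f}")
```

With the real school-level data loaded as two columns, the same two-line call produces the r and p values reported above.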
From this first example, have we learned anything new? On both a macro and micro level, I believe it affirms that students who know English tend to score better on a test that assesses their ability to read, write, and listen in English. It also shows we have a long way to go in developing our ELL curriculum; based on this result, it is currently failing us. As I discussed earlier, this is an example of how data can affirm our inferences and assumptions. However, with this information, we can dig deeper. For example, future calculations could determine whether ELL passing percentages have increased over the last five years. In addition, we can target schools receiving specific grants or Title 1 funds to draw comparisons and identify trends.
Example 2: 2018 CAASPP English Language Arts Passing Percentage vs. 2018 CAASPP Math Passing Percentage
Another example of this self-affirmation came when I computed another correlation, this time between the 2018 CAASPP Math standard met percentage and the CAASPP English Language Arts standard met percentage. Across the board, students scored better overall on the English Language Arts CAASPP, meeting the standard at a rate 11 percentage points higher than on the Math segments of the test. But, interestingly, the correlation showed that when the overall passing percentage for English increases, the Math passing percentage increases as well. This relationship holds with a high level of certainty, exemplified by a strong correlation, r = .89, p < 0.01. I cannot say this relationship demonstrates causality, that rising English scores make Math scores rise; I can, however, say with a high level of confidence that the relationship exists, given the large sample of 9,367 schools used for this computation.
While I believe this example is self-affirming to an extent, we can derive a number of powerful conclusions that lead to further inquiry and analysis. First, schools that score well in English also score well in Math. Second, schools that are not scoring well in English are scoring even worse in Math. In other words, the data illustrates two contrasting stories: schools that are doing well in English are doing well across the board, while schools that are scoring poorly in English are likely scoring even worse in Math. Based on this information, let's ask some questions to target the data for further analysis. One major question is: what types of schools are scoring well in English? We would begin by filtering for schools scoring above the 44% average passing percentage for English. In this filtering process, we need to look at demographics, funding, attendance, discipline, and student enrollment. By digging deeper into these areas, we will uncover more of the details explaining why some schools have higher scores than others. Perhaps it relates to demographics, funding, and/or curriculum; those are all areas that certainly need further investigation.
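The filtering step described above can be sketched with pandas. The table below is a tiny hypothetical example; the column names and values are my own assumptions for illustration, not the CDE file's actual fields:

```python
import pandas as pd

# Hypothetical school-level table (column names and values are
# illustrative assumptions, not the CDE file's actual fields).
schools = pd.DataFrame({
    "school":        ["A", "B", "C", "D", "E"],
    "ela_pass_pct":  [62, 51, 44, 37, 30],
    "math_pass_pct": [55, 42, 33, 25, 18],
    "title1_funded": [False, False, True, True, True],
})

# Keep only schools scoring above the statewide ELA average of 44%,
# then inspect their other characteristics against the rest.
above_avg = schools[schools["ela_pass_pct"] > 44]
print(above_avg[["school", "math_pass_pct", "title1_funded"]])
```

From here, the same boolean-filter pattern extends to demographics, attendance, discipline, and enrollment columns once those data sets are merged in.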
Example 3: 2018 Student Absence and Suspension Percentages vs. 2018 CAASPP English Language Arts Passing Percentage
For this last example, I decided to run a more complex statistical computation called a multiple linear regression. I took the data sets of the 2018 student absence percentage and the student suspension percentage, as well as the CAASPP English Language Arts scores, and turned each into a corresponding variable. A multiple linear regression fits a linear model relating the independent variables to the dependent variable, and uses the independent variables to predict the outcome on the dependent variable. The adjusted r-squared for this computation was .322, meaning the two independent variables (i.e., student absence percentage and student suspension percentage) explain about 32% of the variance in the dependent variable (i.e., CAASPP ELA passing percentage). This is a moderate level of prediction, and it is statistically significant: the sample size was large at 9,365 schools and the p-value was less than .01 for each variable computed.
Furthermore, from this calculation, we learned that as the absence and suspension percentages increase, the overall passing scores on the CAASPP English Language Arts test fall. At the extreme, the model predicts that a school site's English scores will drop by 1.3 standard deviations from the average score, which works out to 26.6 percentage points below the school sites' overall average. Remember, as described above, this fall in scores is the most extreme case and has a very low probability of occurring. Generally speaking, scores will not fall this far, because the absence and suspension percentages would both have to be at their highest for such a drop to occur.
Overall, what can we take away from this final calculation? First and foremost, it affirmed the assumption that English CAASPP scores go down when absence and suspension percentages increase. However, we must remember that these two variables explain only about 32% of the variation in CAASPP English scores. While 32% may seem high, many other variables could enter this equation and paint a fuller picture of what is going on at a school beyond absences, suspensions, and test scores. Secondly, we found that students being at school is extremely important: if students are not at school, they are not learning. In addition, consider how absence and suspension percentages are interrelated; when students are suspended, they are not at school, which compounds the absence percentage. How can we go deeper here? One way is to look at school demographics. Which populations of students are missing school and being suspended? From there, we can see how those specific populations of students are faring on the English CAASPP.
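Breaking the measures down by student population, as suggested above, is a simple group-by operation in pandas. The groups and numbers below are purely illustrative:

```python
import pandas as pd

# Hypothetical rows: one per school, tagged with its largest student
# group (group labels and all figures are illustrative, not CDE data).
df = pd.DataFrame({
    "largest_group":  ["X", "X", "Y", "Y", "Y"],
    "absence_pct":    [6.0, 8.0, 12.0, 15.0, 14.0],
    "suspension_pct": [1.0, 2.0, 4.0, 5.0, 6.0],
    "ela_pass_pct":   [55, 50, 38, 33, 30],
})

# Average each measure within each group to see which populations miss
# more school and how that lines up with their ELA results.
summary = df.groupby("largest_group")[
    ["absence_pct", "suspension_pct", "ela_pass_pct"]
].mean()
print(summary)
```

The same pattern works on the real data once the demographic and testing files are joined on a common school identifier.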
I hope through this post you are able to see how powerful data can be for identifying problems, as well as for prompting further questions that deepen our understanding of the phenomena we experience within our classrooms, school sites, and districts. Take your data and use it to your advantage to illustrate the stories that are occurring within your educational setting. The data will show you how to go further by prompting you to ask questions, dig deeper into the story, and uncover the details that reveal its underlying themes. Ultimately, through this process, you will gain a greater understanding of the problem you identified from the data, as well as how to target the pieces of data that can point the way to a solution.