What’s Wrong with Educational Testing and How We Can Fix It

As quantitative psychologists who study education, we are often asked by teachers and parents, “What went wrong with all these tests, and how can we fix them?”

At best, educational assessments—from large-scale standardized exams administered over an entire state, to targeted cognitive diagnostic tests used by psychologists in schools—are considered a necessary evil by the teachers, parents, and students who are subjected to them.

At worst, educational testing can be a cruel gatekeeper, benefiting those students whose background allowed them to gain high levels of academically relevant knowledge early on in their development, while stifling the learning potential of those students who—for a whole variety of reasons including emotional trauma, cognitive differences, or poverty—have fallen behind.

But this doesn’t need to be the case. A new solution in quantitative psychology means that we can use existing educational data to make fairer, more valid, and more compassionate decisions about students than ever before.

The Problem with Current Testing Practice

Currently, nearly all educational and psychological tests are “static,” meaning they are designed to be administered only once to capture a snapshot of a student’s current knowledge and abilities. These static scores are often used to make high-stakes decisions about a student’s future: perhaps informing the identification of learning disabilities, decisions about placement into specific courses, or even admissions decisions by colleges or competitive high schools.

But, because scores from static tests only capture information from a single point in time, they provide zero insight into a student’s learning trajectory. If a student scores poorly on a static test, it is impossible to know whether they are likely to be behind their peers for years to come, or whether they are merely a late bloomer. Any educator will tell you there are many reasons why a student might score poorly on a static test when they actually have high learning potential. Maybe they come from a home where academic resources are scarce or academic skills are not as strongly encouraged. Maybe they experience test anxiety or have had an emotionally traumatic experience in the past. The list of possible causes of poor test scores could go on and on, but one thing is abundantly clear: static test scores simply cannot represent the learning potential of any student.

A Better Option: Dynamic Assessment

There is an alternative to static testing that provides much richer information about students. This procedure—called dynamic assessment—requires a test to be administered more than once, with instruction from an educator or clinician in between. A student takes a test, receives targeted instruction on the skills the test measures, then takes the test again, and this cycle repeats as the student’s score climbs. The dynamic assessment procedure is finished when the student’s score stabilizes and no longer improves with instruction. Plotted over time, a student’s scores trace a rising curve that gradually levels off.
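The test–instruct–retest loop with its stopping rule can be sketched in a few lines of Python. Everything here is an illustrative assumption: the function names, the stopping threshold, and the toy simulation (in which each round of instruction closes half the remaining gap to a hypothetical capacity) stand in for the real testing and teaching steps.

```python
def dynamic_assessment(administer_test, give_instruction, min_gain=1.0, max_rounds=10):
    """Repeat test -> instruction -> retest until the score stabilizes.

    `administer_test` and `give_instruction` are hypothetical callables
    standing in for the real testing and teaching steps.
    Returns the list of scores observed across rounds.
    """
    scores = [administer_test()]
    for _ in range(max_rounds):
        give_instruction()
        scores.append(administer_test())
        if scores[-1] - scores[-2] < min_gain:  # improvement has leveled off
            break
    return scores


# Toy simulation: each round of instruction closes half the remaining gap
# between the current score and an assumed learning capacity of 90.
def make_simulated_student(start=40.0, capacity=90.0):
    state = {"score": start}
    def administer_test():
        return state["score"]
    def give_instruction():
        state["score"] += 0.5 * (capacity - state["score"])
    return administer_test, give_instruction


test_fn, teach_fn = make_simulated_student()
trajectory = dynamic_assessment(test_fn, teach_fn)
print(trajectory)  # rising scores that level off near the capacity
```

The trajectory rises steeply at first and then flattens, which is exactly the pattern a dynamic assessment is designed to reveal: both the rate of climb and the level at which it flattens carry information that a single static score cannot.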

When the dynamic assessment is finished, the rate of a student’s improvement, or their final achieved score, can be used to make educational decisions about that student. Psychologists have shown that students who live in poverty, students in special education, and students who have not had adequate educational opportunity are much better served by dynamic assessment than by traditional static testing. Dynamic assessment defines learning ability by how readily new knowledge can be acquired, not by how much knowledge has already been acquired; it is therefore a fairer and more valid procedure because it focuses not on how much a student currently knows, but on how well that student can grow.

So Why Don’t We Use Dynamic Assessment in Schools?

Two reasons: one is financial, the other statistical.

First, dynamic assessment is typically much more expensive than static testing because it takes more time to administer. U.S. school systems often lack the money and resources to make dynamic assessment work right now (in other countries, such as the Netherlands and Israel, dynamic assessment is much more common).

Second, tests can only be used to make high-stakes decisions about students when the scores from those tests follow particular statistical patterns that quantitative psychologists recognize as reliable and valid. But, until now, no statistical models had been developed that adequately capture the dynamic assessment process. That meant that scores from dynamic assessments could never be studied scientifically, and they could never be approved for high-stakes use in schools.

A New Solution

Given this situation, we set out to develop a new statistical method, which we call dynamic measurement modeling, to describe the growth trajectories captured by dynamic assessments. We also wrote statistical software that fits dynamic measurement models and can automatically score dynamic assessments in a reliable and valid way.
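We cannot reproduce the full dynamic measurement model here, but its core idea—fitting an asymptotic growth curve to a student’s repeated scores so that the estimated ceiling, or “capacity,” becomes the score of interest—can be illustrated with a deliberately simplified sketch. The exponential growth form and the grid-search fitting below are illustrative assumptions, not the published model:

```python
import math

def fit_capacity(times, scores, rate_grid=None):
    """Fit scores ~ capacity * (1 - exp(-rate * t)) by least squares.

    For each candidate rate, the best-fitting capacity has a closed form
    (ordinary least squares through the origin); we grid-search the rate.
    Returns (capacity, rate) for the best fit. Simplified illustration only.
    """
    if rate_grid is None:
        rate_grid = [i / 100 for i in range(1, 301)]  # candidate rates 0.01..3.00
    best = None
    for r in rate_grid:
        f = [1 - math.exp(-r * t) for t in times]  # shape of the curve at rate r
        denom = sum(x * x for x in f)
        if denom == 0:
            continue
        c = sum(y * x for y, x in zip(scores, f)) / denom  # closed-form capacity
        sse = sum((y - c * x) ** 2 for y, x in zip(scores, f))
        if best is None or sse < best[0]:
            best = (sse, c, r)
    return best[1], best[2]


# Noise-free toy data generated from capacity=85, rate=0.5:
times = [1, 2, 3, 4, 5, 6]
scores = [85 * (1 - math.exp(-0.5 * t)) for t in times]
capacity, rate = fit_capacity(times, scores)
print(round(capacity, 1), round(rate, 2))  # recovers 85.0 and 0.5
```

The point of the sketch is that two students with identical early scores can have very different estimated capacities once the whole trajectory is modeled—which is precisely the information static testing throws away.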

We used dynamic measurement modeling to describe the learning trajectories of thousands of students around the country—using existing data collected by federally-funded standardized reading and math tests from kindergarten through eighth grade—and we showed that dynamic scores are far less affected by student characteristics such as gender, race/ethnicity, and poverty than are static scores. This analysis strongly suggests that dynamic assessment would be much better for U.S. students than static testing.

Where You Come In

The quantitative evidence is clear: dynamic assessment is a much fairer and more valid procedure than static testing. But, because of the financial cost associated with dynamic assessment, school systems will not modify their policies and practices unless teachers, parents, and students demand change.

If a student you know has been disadvantaged by static testing, please advocate for them by recognizing the flaws in that procedure. Demand a focus on growth and learning, not only on a single test score. If we all team up—parents, teachers, and psychologists—we can change educational testing practice and create a much more equitable future for our students.

About the Author

Denis Dumas is Assistant Professor of Research Methods and Statistics in the Morgridge College of Education at the University of Denver. Before coming to DU, Dr. Dumas received his PhD in Educational Psychology and MA in Measurement, Statistics, and Evaluation from the University of Maryland at College Park, and was Assistant Professor of Educational Psychology at Howard University. In general, his research focuses on understanding student learning, cognition, and creativity through the application of latent variable methods, especially multidimensional item-response theory and non-linear growth models. He believes deeply in the power of quantitative research such as this for improving the field’s current understanding of learning, and supporting the academic development of all students. This work has led him to co-develop (with Dr. Daniel McNeish) the Dynamic Measurement Modeling framework as a way to improve the validity of psycho-educational assessment.
Daniel McNeish is an Assistant Professor of Quantitative Psychology at Arizona State University. He received a PhD in Measurement & Statistics at the University of Maryland and previously held academic positions at Utrecht University (Department of Methodology & Statistics) and UNC-Chapel Hill (Center for Developmental Science). His research focuses on statistical problems in behavioral sciences, particularly those related to small sample sizes, best practice, and challenging data structures. Acknowledgement of his research contributions has come from a dissertation award from the American Psychological Association, designation as a Rising Star by the Association for Psychological Science, and elected membership in the exclusive Society for Multivariate Experimental Psychology.