Are Interim Assessments a Waste of Time?

There was a relatively recent Hechinger Report article by Jill Barshay, “PROOF POINTS: Researchers blast data analysis for teachers to help students,” that seemed to indict any and all assessment and data use in schools as a royal waste of time. It bothered me because the only source cited explicitly in the article was a 2020 opinion piece by a professor who discusses “interim assessment” in similarly vague terms and doesn’t provide explicit citations for her sources.

I tweeted out my annoyance to this effect.

To Ms. Barshay’s great credit, she responded with equanimity and generosity to my tweet with multiple citations.

Since she took that time for me, I wanted to reciprocate by taking the time to review her sources with an open mind, as well as reflect on where I might land after doing so.

BUT I’ve had this post sitting in my drafts for months now, and I’ve realized that, given limited time, I’ll never do quite the deep and full analysis I might prefer. Instead, I’m just going to list a short, relevant summary quote for each of the sources below:

  1. Cordray, D., Pion, G., Brandt, C., Molefe, A., & Toby, M. (2012). The Impact of the Measures of Academic Progress (MAP) Program on Student Reading Achievement (NCEE 2013–4000). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.

“Overall, the MAP program did not have a statistically significant impact on students’ reading achievement in either grade 4 or grade 5.”

  2. Faria, A.-M., Heppen, J., Li, Y., Stachel, S., Jones, W., Sawyer, K., Thomsen, K., Kutner, M., Miser, D., Lewis, S., Casserly, M., Simon, C., Uzzell, R., Corcoran, A., & Palacios, M. (2012). Charting Success: Data Use and Student Achievement in Urban Schools. Council of the Great City Schools. https://eric.ed.gov/?id=ED536748

“the more that teachers and principals reported reviewing and analyzing student data and using this information to make instructional decisions, the higher their students’ achievement, at least in some grades and subjects. Moreover for principals, the more they reported having support in the form of an appropriate data infrastructure, adequate time for review and discussion of data, professional development, and the appropriate human resources, the higher their students’ achievement.” “The results also appear to be in line with previous research that suggests that having interim assessments may be helpful but not sufficient to produce positive changes in student achievement.” “Although these findings do not identify the specific aspects of each dimension that are most important, it appears that data use by principals, particularly in elementary school, may be as important as teacher data use. This is in line with the findings from our site visits (as well as prevailing wisdom) that suggest that leadership and support from the administration are critical.”

  3. Henderson, S., Petrosino, A., Guckenburg, S., & Hamilton, S. (2007). Measuring how benchmark assessments affect student achievement (Issues & Answers Report, REL 2007–No. 039). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Northeast and Islands. Retrieved from http://ies.ed.gov/ncee/edlabs

“The study found no immediate statistically significant or substantively important difference between the program and comparison schools. That finding might, however, reflect limitations in the data rather than the ineffectiveness of benchmark assessments.” “Some nontrivial effects for subgroups might be masked by comparing school mean scores.”

  4. Henderson, S., Petrosino, A., Guckenburg, S., & Hamilton, S. (2008). REL Technical Brief—a second follow-up year for “Measuring how benchmark assessments affect student achievement” (REL Technical Brief, REL Northeast and Islands 2007–No. 002). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Northeast and Islands. Retrieved from http://ies.ed.gov/ncee/edlabs

“The follow-up study finds no significant differences [in grade 8 math] between schools using [benchmark assessments] and those not doing so after two years.”

  5. Konstantopoulos, S., Li, W., Miller, S. R., & van der Ploeg, A. (2016). Effects of Interim Assessments Across the Achievement Distribution. Educational and Psychological Measurement, 76(4), 587–608. https://doi.org/10.1177/0013164415606498

“The findings in Grades 3 to 8 overall suggest that Acuity had a positive and significant impact in various quantiles of the mathematics achievement distribution. The magnitude of the effects was consistently greater than one sixth of a SD. In contrast, in reading only the 10th quantile estimate was positive and significant. The findings in Grades K-2 overall suggest that the effect of mCLASS on mathematics or reading scores across the achievement distribution was small and not statistically different than zero.”

  6. Konstantopoulos, S., Miller, S., van der Ploeg, A., Li, C.-H., & Traynor, A. (2011). The Impact of Indiana’s System of Diagnostic Assessments on Mathematics Achievement. Society for Research on Educational Effectiveness. https://eric.ed.gov/?id=ED528756

“it is unclear that the intervention had any systematic effects on student achievement except for fifth grade mathematics.”

Some additional sources beyond Barshay’s to consider in relation to this topic:

  1. Tim Shanahan has a blog post, “Do Screening and Monitoring Tests Really Help?”, that does a nice job of summarizing a number of additional sources on screening and monitoring for literacy.

“the evidence supporting the use of such testing to improve reading achievement is neither strong nor straightforward. The pieces are there, but the connections are a bit shaky.” “My conclusions, from all this evidence, is that it is possible to make effective the kind of assessment that you are complaining about. However, it should also be evident that such efforts too often fail to deliver on those promises.” “In many schools/districts/states, we are overdoing it! The only reason to test someone is to find out something that you don’t know. If you know students are struggling with decoding, testing them to prove it doesn’t add much.” “The point of all this testing is to reshape your teaching to ensure that kids learn. Unfortunately, these heavy investments in assessment aren’t always (or even usually) accompanied by similar exertions in the differentiation arena.”

  2. Paly, B. J., Klingbeil, D. A., Clemens, N. H., & Osman, D. J. (2022). A cost-effectiveness analysis of four approaches to universal screening for reading risk in upper elementary and middle school. Journal of School Psychology, 92, 246–264. https://doi.org/10.1016/j.jsp.2022.03.009

“The results suggest that the use of prior-year statewide achievement test data alone in Grades 4–8 is an efficient approach to universal screening for reading risk that may allow schools to shift resources from screening to other educational priorities.”

  3. Heckman, J., & Zhou, J. (2022). Measuring Knowledge. IZA Institute of Labor Economics, IZA DP No. 15252. Retrieved from https://www.iza.org/publications/dp/15252/measuring-knowledge

“Value-added measures are widely used to measure the output of schools. Aggregate test scores are used to measure gaps in skills across demographic groups. This paper shows that this practice is unwise. The aggregate measures used to chart student gains, child development, and the contribution of teachers and caregivers to student development are not comparable over time and persons except, possibly, for narrowly defined measures of skill.”

OK, so this review was admittedly cursory. But even just pulling out the key findings shows that the issue of district- and school-wide assessment is not clear-cut in either direction. There are some positive results here and there, but overall the evidence is mixed.

Definitely some food for thought, wherever one might stand. So where am I?

I’m not a fan of over-testing, I don’t think many schools use data effectively (for a wide variety of reasons), and I’ve seen how assessment data can easily be misinterpreted in ways that reinforce deficit mindsets about Black students, ELLs, students with disabilities, and other typically marginalized students. And I believe the balance of attention should lie far more heavily on the formative, rather than the summative/evaluative, use of assessments.

That said, I also believe in the need for accountability and more objective measurements through the use of external, “standardized” (i.e. normed and validated) assessments, and I have seen that when triangulated with multiple sources, including qualitative sources, and discussed as a team using a structured protocol—and most importantly, discussed with students themselves—data can be a powerful tool for equity, empowerment, and responsive instructional supports. Furthermore, interim assessments, here meaning valid and reliable assessments used at strategic points throughout the school year, can be used to measure growth in a formative sense and to inform structural decisions at the school or grade/department-level that can provide more adaptive supports to many more students.

At the same time, I want to draw a clear line in the sand between “screening” and “benchmark/interim assessments.” Interim assessments can be used for screening purposes, and screening can be done throughout the school year and thus become synonymous with interim assessment, but not all interim assessments are ideal screeners. This may sound like splitting hairs, but I think it’s a critical distinction: screening is about efficient and proactive identification of need, which may lead to further data collection and analysis, while interim assessments can often be time-consuming without providing more granular information. What’s a good example of a screening tool that fits this function? The DIBELS/Acadience Oral Reading Fluency (ORF) measure: it’s quick, relatively easy to administer, normed, and has a solid research base behind its use for this purpose. Another is the CUBED preK-3 suite of assessments, such as the NLM Reading and Listening tools. Screening tools like these can efficiently flag students at possible risk of struggling to read and, used this way, can lead to preventative action and interventions that improve outcomes.

Any source of quantitative data is potentially questionable, so the more that’s available and can be contextualized alongside qualitative data to build a coherent story, the better, in my view.

But the key is that we never lose sight of the children in front of us and their ever-evolving, dynamic strengths and needs. Data must inform precise supports and be used to build a collaborative story, a story that empowers both students and the adults who work with them with the language of growth, potential, and clear instructional goals.

Anything else is a distraction.

#assessment #reading #screening #literacy #data #research

Discuss...