My Students are Deep, Complex, and Beautiful. Assessments Should Be, Too.

by Sarah Hampton

I have a love/hate relationship with standardized assessments. I understand they’re necessary so we can get a picture of what our students are taking away from their school experience. BUT, there are so many times when I find myself making decisions about my instruction based on how I think it will affect the summative assessment. Sometimes I am even choosing between holistic practices advocated by the standards themselves for learning and superficial practices I’ve seen work to improve test scores. Teachers are accountable to the test scores, so there is pressure to make decisions that give quick gains but may not result in long-term learning.

This conflict seems to be a common experience. When I talk to fellow educators about new pedagogies, tools, or curricula, one of the very first questions asked is always, “How will this affect my test scores?” Bottom line: high stakes assessments are still driving instructional decisions, for better or worse.

If the summative assessments were perfectly designed to really assess learning, then this would be ideal, right? If tests were truly an accurate reflection of everything we want our students to know for future learning, to be able to do, and help them be, then making instructional decisions that cause those scores to increase would mean our students were getting closer to those goals. However, I have a major bone to pick: Current standardized tests are an oversimplification of the depth, complexity, and beauty of my students.

I get that it’s important to prove that schools are doing what we say they’re doing. It’s important that we’re held accountable for all students learning in ways that benefit them. My problem? I don’t think current standardized tests are actually proving that students are learning. Most tests (even the adaptive ones I’ve seen) are still multiple choice, true/false, drop down, matching, or, occasionally, fill in the blank or short answer. The outcomes I want from my students simply cannot be properly evaluated that way! In addition, I shouldn’t feel pressured to sacrifice meaningful instructional activities that support my students’ complexity, depth, and beauty in order to make a shallow snapshot look better.

State and national assessments aren’t going away. I’m not even suggesting that they should. As a math and science teacher, I’m all about making informed decisions based on the right data. Unfortunately, as Pellegrino and Quellmalz note in Perspectives on the Integration of Technology and Assessment:

“in the realm of high-stakes assessment for No Child Left Behind (NCLB) accountability, a number of regulatory, economic, and logistical issues have constrained the breadth and depth of the content and performance standards assessed in annual on-demand tests. Standard, multiple-choice item formats continue to dominate large-scale, computer-based, high-stakes testing, resulting in an over reliance on simple, highly structured problems that tap fact retrieval and the use of algorithmic solution procedures.”

So, because simple, highly structured problems are the things most current tests assess, teachers are unintentionally incentivized to promote surface level thinking around oversimplified problems.

What if there was a way to design an assessment that wasn’t constrained that way? What if assessments were made of messy problems and could honor multiple student pathways to solutions? What if they could analyze and reward student thinking on a topic? Better standardized assessments could be one possible way to reconcile what and how I want my students to learn with how they can express what they know.

Here comes the good news: the learning sciences community is integrating technology in ways that are rapidly improving assessments so they can give us a more complex and accurate picture of what our students know and do! That means that investing in the complexity of our students would translate to better scores because the assessment could give us a more comprehensive image of their understanding. It’s like the assessment world is heading into 3-d printing instead of 2-d black and white line drawings! No more tug of war between deep, meaningful instruction and what I feel like I have to do for the test because, now, they can be the same thing. Yay!

Consider My Science Tutor (MyST), a type of NSF funded project called an intelligent virtual tutor. MyST uses artificial intelligence to interpret and assess how well students understand science concepts by analyzing their dialog. The principal investigators of MyST say that “its goal is to assess student understanding of concepts rather than facts, which is very important to prepare students and the future workforce in STEM.” Student understanding of concepts rather than facts? That’s what I want!

Before advancements like the ones that make MyST a reality, a student’s answer on an assessment had to exactly match the correct answer designated by the programmer. That’s why standardized tests relied so heavily on multiple choice, T/F, etc. Now, machine learning allows computers like MyST to ‘learn’ what understanding a topic sounds like by analyzing speech from multiple dialog sessions–over 13,000 sessions for MyST, in fact. Then, it can analyze how well a new student has learned the concept based on what MyST ‘knows’. This is the essence of artificial intelligence: a machine doing what was previously thought to be unique to humans. In Towards artificial intelligence-based assessment systems, Rose Luckin says, “AI is a powerful tool to open up the ‘black box of learning’, by providing a deep, fine-grained understanding of when and how learning actually happens.”
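For readers who like to see ideas in code, here is a toy sketch of the contrast described above. It is not how MyST works (MyST uses models trained on thousands of recorded dialog sessions). Everything in it is my own assumption for illustration: the function names, the made-up example answers, and the crude word-overlap measure standing in for real machine learning. The first function grades the old way, by exact match. The second compares a student's explanation against example explanations and returns a score for how close it is.

```python
import math
import re
from collections import Counter


def exact_match_score(answer: str, key: str) -> bool:
    """Old-style grading: the response must equal the designated answer."""
    return answer.strip().lower() == key.strip().lower()


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count how often each word appears."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Word-overlap similarity between two word counts (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_score(answer: str, examples: list[str]) -> float:
    """Score an answer by its closest match among the example explanations."""
    answer_words = bag_of_words(answer)
    return max(cosine_similarity(answer_words, bag_of_words(e)) for e in examples)


# Hypothetical example explanations, invented for this sketch.
EXAMPLES = [
    "the circuit has to be a closed loop so electricity can flow to the bulb",
    "electricity flows from the battery through the wire to the bulb and back",
]

if __name__ == "__main__":
    student = "the bulb lights because electricity can flow in a closed loop"
    off_topic = "my dog likes to run in the park"

    # The exact-match grader rejects a reasonable explanation outright.
    print("exact match:", exact_match_score(student, EXAMPLES[0]))
    # The similarity grader gives graded credit instead of all or nothing.
    print("student score:", round(similarity_score(student, EXAMPLES), 2))
    print("off-topic score:", round(similarity_score(off_topic, EXAMPLES), 2))
```

Counting shared words is far too shallow to judge real understanding (it cannot tell a correct explanation from a wrong one that uses the same vocabulary), which is exactly why systems trained on many real student dialogs matter. The sketch only shows the shift from "does it match the key?" to "how close is it to what understanding sounds like?"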

This level of comprehension analysis was formerly only possible through one-on-one evaluation by teachers or researchers–an unrealistic burden on human time. Now, as Pellegrino and Quellmalz say:

“In addition to assessment of student knowledge and skills in highly structured problems with one right answer, technology can also support the design of complex, interactive tasks that extend the range of knowledge, skills, and cognitive processes that can be assessed.”

The implications of this could transform the assessment landscape by allowing multiple computers to ‘listen’ to each student at once and check for genuine understanding. There’s something beautiful about that. Even more, when assessments can help teachers evaluate complex understanding using open-ended problems, teachers are incentivized to promote deep thinking about deep problems. My students are deep, complex, and beautiful. Assessments should be, too.

How about you? How do you resolve the ever present conflict between good teaching and standardized assessments? Do you think standardized assessments cover important issues? What do you think about creating new assessments using AI? Join the conversation by tweeting @CIRCLEducators.

Thank you to James Lester for reviewing this post. We appreciate your work in AI and your work to bring educators and researchers together on this topic.

How to cite this work

CIRCL Educator posts are licensed under a Creative Commons Attribution 4.0 International License. If you use content from this site, please cite the post and consider adding: "Used under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/)."

Suggested citation format: [Authors] ([Year]). [Title]. CIRCLEducators Blog. Retrieved from [URL]