Blog: The Educated Reporter

How Will U.S. Fare in Next Round of International Testing?

(Flickr/Rona Proudfoot)

At a time when the volume of student achievement data can seem overwhelming, brace yourself: A wave of international test results for dozens of countries, including the U.S., is coming soon.

Data from two prominent exams at the precollegiate level, best known by the acronyms PISA and TIMSS, will be issued just days apart in late 2016. Collectively, the international assessments cover math, science, and reading, plus (at least in this round) financial literacy and even “collaborative problem-solving.”

So, why does this matter? And what exactly can be learned?

“The global look is a good reality check” on academic achievement for the U.S. and other countries, said Ina V.S. Mullis, the co-executive director of the TIMSS & PIRLS International Study Center at Boston College.

One key question is whether the achievement gap between the U.S. and some high-scoring countries continues to grow, said Mullis during a panel at the Education Writers Association’s recent National Seminar in Boston. The results for TIMSS, the Trends in International Mathematics and Science Study, will be made public Nov. 29.

The U.S. has been “consistently but slowly” improving on TIMSS, she said. But Mullis notes, for example, that South Korea’s achievement level on the international exam is growing more rapidly. “We’re trying to chase a moving target.”

A similar narrative was repeated the last time results were issued three years ago on PISA, the Program for International Student Assessment. Former U.S. Secretary of Education Arne Duncan said at the time: “We’re running in place as other high-performing countries start to lap us.”

In all three subjects tested (reading, math, and science), more countries outperformed the U.S. on the 2012 PISA exam than in the previous round. In math, for example, 29 nations scored higher than the U.S. by a statistically significant margin on the 2012 exam, up from 23 countries in 2009. Those included Singapore, South Korea, Finland, Canada, and Germany. Another nation that’s starting to draw notice for its PISA performance is Estonia, as The Hechinger Report’s Sarah Butrymowicz explained recently on EWA Radio.

Shanghai drew considerable attention for its high scores on PISA in 2012. However, those results and what they really mean have been the subject of much debate. Once again this year, China will not report national data, but several more cities and provinces will produce results, including Beijing, Jiangsu, and Guangdong, as the OECD announced in 2014.

The new PISA report will be issued in early December.

Bathrooms & Student Achievement

The forthcoming international test data will surely spark plenty of discussion — and disagreement — in the U.S. about what to make of the findings. Indeed, some observers question the value of international comparisons altogether. (See, for example, this report from the Economic Policy Institute.) In any case, beyond the simple rankings of average achievement, there’s much more to mine in the international data, said Matthias von Davier, a senior research director at Educational Testing Service, during the EWA event. Among the possibilities are connecting the results to the number of computers, or even bathrooms, in the home, or looking at immigrant students.

PISA collects a wealth of student background data from questionnaires, including home resources and activities outside school. This information is useful to gain a deeper understanding beyond simply ranking countries based on their average scores.

“We can slice and dice the data by different background variables and we might gain new insights,” said von Davier of ETS, which is managing and developing the new PISA exam in 2018.

“Obviously, bathrooms don’t have a direct impact on how students perform,” he said, calling it an “indirect indicator” that signals resources in the home, in essence a proxy for socioeconomic status. But von Davier said to beware of going too far in interpreting such analyses.

“We cannot really make the determination that students are more proficient because of the computers,” he said of PISA. “However, we can say that student performance and computer usage are positively associated.”

Computer-Based Testing

The forthcoming PISA report will include 67 nations, plus comparable results for Massachusetts, North Carolina, and Puerto Rico. The focus in PISA, an exam for 15-year-olds, is applying learning in mathematics, science, and reading to real-world contexts. In 2015, PISA also included special tests for financial literacy and collaborative problem-solving.

The forthcoming TIMSS release will include results for 57 countries, plus some non-national systems. One U.S. state, Florida, will have separately reportable results. The main focus of TIMSS is math and science learning at the 4th and 8th grades. It seeks to align broadly with the math and science curricula in participating countries, measuring how well students have learned key concepts and skills taught in school.

But Mullis emphasized that TIMSS by no means simply measures attainment of factual knowledge. It features a strong emphasis on “application and reasoning, because that’s what countries are hoping that their students would do.”

Here’s a breakdown of key similarities and differences, as provided by the National Center for Education Statistics.

(Source: Institute of Education Sciences)

Last year, for the first time, students took the PISA exam on computers. TIMSS will be delivered on computer in 2018. Mullis of Boston College sees this as a promising evolution.

“You can be interactive,” she said. “So depending on what a student does, then you can ask something else. … Or through their taps and clicks, you can then study the path that they take with their problem-solving.”

The new TIMSS release also will feature results from the TIMSS Advanced assessment, which is focused on math and physics. It tests students who have taken advanced coursework in these subjects at the end of their secondary school experience to gauge how successful participating countries are at preparing the future generation of scientists and engineers.

“It’s the elite of students who have taken specialized STEM courses in their countries,” said Mullis.

Meanwhile, the 2015 PISA exam included tests on financial literacy and collaborative problem-solving. And in 2018, the exam will feature a test of “global competence.”

Cultural Differences

One question posed during the EWA panel was whether international tests sufficiently account for cultural differences that might skew students’ answers. Mullis said TIMSS officials “go to huge lengths” to protect against this.

“We meet with our countries at least three times a year and go over everything” for both TIMSS and PIRLS (the international reading assessment), she said.

“We probably review hundreds of [reading] passages to get like three new ones that everybody agrees with,” she said, noting that certain topics are avoided altogether because of concerns that a certain population might not experience snow, or have ants.

Another question raised was whether U.S. poverty skews results compared with other nations.

Mullis suggested such concerns are overblown.

“My favorite counter example is the Russian Federation, which is larger than we are, far more diverse than we are, and not nearly as well off socio-economically, but is above us in every chart that TIMSS has,” she said. “Everybody has their own issues and troubles, and we think that ours are the worst. But they’re not. They’re far from the worst.”

‘Be Very Skeptical’

During the EWA panel, moderator Liana Heitin, an Education Week reporter, offered some advice for when the global test scores are released.

“You have to be incredibly careful with statistical significance,” she said. “It looks like you can rank countries, but it’s not that easy. You can’t just say the U.S. is 30th or whatever, because a lot of countries are scoring in a similar band.”

In other words, it’s not always clear whether a difference in scores reflects different levels of achievement or mere chance. However, on PISA, for instance, the report includes color coding to indicate when the difference between one nation and another is statistically significant, Heitin notes. “So just make sure.”
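For readers who want to see the arithmetic behind that caution, here is a minimal sketch. It assumes each country’s average score is published with a standard error and that the two samples are independent. The function name and every number in it are hypothetical, invented for illustration; they are not actual PISA or TIMSS results, and the official reports rely on their own, more involved procedures.

```python
import math


def difference_is_significant(mean_a, se_a, mean_b, se_b, z_crit=1.96):
    """Rough check: do two independent averages differ at about the 5% level?"""
    # Standard error of the difference between two independent estimates.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    # How many standard errors apart the two averages are.
    z = (mean_a - mean_b) / se_diff
    return abs(z) > z_crit


# Made-up scores and standard errors, for illustration only.
print(difference_is_significant(500.0, 3.0, 495.0, 3.5))  # 5-point gap: False
print(difference_is_significant(500.0, 3.0, 480.0, 3.5))  # 20-point gap: True
```

In this toy case, a 5-point gap falls within the margin of error, so the two countries would be better described as scoring in the same band than as ranked one above the other.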

She also highlighted the point made by von Davier of ETS about not confusing correlation with causation.

“You’re going to get all these comments” attempting to explain what caused the results, whether teacher quality, school spending, or other factors. “Be very skeptical.”

Editor’s Note: Matthias von Davier of ETS was unable to attend the EWA event in person because of travel delays, but recorded remarks (at the airport!) to accompany his slides, which were presented at the seminar.