Are the students’ scores in PISA valid? A comparison with students’ grades and national test results
PISA has an extensive impact on policymakers, academics and the wider education community – but are the results valid? The overall aim is to investigate the validity of the test scores of PISA 2018 in Sweden. To reach that aim, we will investigate whether the PISA sample is biased and whether students' PISA scores translate into their national test results and grades. There is a gap in knowledge regarding the relation between students' performance in PISA and their performance on high-stakes tests. In this study, however, we can link PISA with register data from Statistics Sweden (SCB). First, we will determine whether and how the exclusion rate and the number of missing students affected Sweden's overall PISA score. Second, we will describe the relationship between PISA scores, grades and national test results for the students who took part in PISA 2018. Third, we will investigate to what extent students' test motivation in PISA explains discrepancies between PISA and the national assessments, and whether test motivation varies across student groups. In the short term, we provide new information on how students' results in PISA relate to their performance on high-stakes tests. If we find that the validity of the PISA results is questionable, we suggest either correcting PISA scores for test motivation or incentivizing students to take the test more seriously. In the long term, our results could motivate changes in the way PISA reports results to students, schools, and the public.
Final report
Project aim and development
The aim of the project “Are the students’ scores in PISA valid? A comparison with students’ grades and national test results” is to examine aspects of reliability and validity in PISA 2018, with particular attention to how exclusion, non-response, and various aspects of student motivation and response patterns may influence the results. Since the start of the project, no major changes have been made to the original aims. However, new research questions have emerged concerning the reliability of student survey responses. These questions fall within the scope of the project and have led to additional analyses comparing PISA students’ survey responses with register data.
Overview of the implementation
The project is divided into three sub-studies, which have largely followed the original plan. Each sub-study has analyzed data from PISA 2018, as well as register data covering students’ backgrounds, national test scores, and grades. In the early stages of the project, we focused on mapping how exclusion and non-response in PISA are either underestimated or exceed OECD guidelines, and what implications this has for countries’ average PISA scores. In parallel, we have explored questions related to student motivation in PISA and how students with varying levels of motivation perform on national tests. One hypothesis was that students are generally more motivated when taking a national test, and that the correlation between PISA and national test performance is stronger among students who are motivated to complete the PISA assessment. A later area of focus has been the consistency between student-reported survey data and register data, for example when students report their parents’ level of education. We examined how such discrepancies may impact analyses that rely on these variables.
Three key results and reflections on the project’s conclusions
The significance of exclusion and non-response for national results
Our analyses suggest that national PISA outcomes may be affected when exclusion and non-response levels are high. In the case of Sweden 2018, both exclusion and non-response were substantial, which threatens the representativeness of the data. According to our estimates, Sweden’s average result was significantly affected by the exclusion of low-performing students. At the same time, exclusion and non-response pose challenges for many countries. The variation across countries with respect to these factors is considerable, which may influence the comparability of countries in international rankings. Although OECD guidelines allow for up to 5% exclusion, countries are often included in reporting despite significantly higher exclusion rates. The findings thus highlight a methodological challenge in how OECD reports and compares national PISA outcomes.
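To illustrate why exclusion and non-response matter for a country mean, the following minimal sketch runs a simple sensitivity check: it asks how much an observed average would shift if the excluded or missing students had scored below the tested students. This is not the project's actual estimation method, and all numbers (observed mean, sample size, exclusion rate, assumed score gaps) are hypothetical.

```python
import numpy as np

# Illustrative sensitivity check: how much could a country mean shift if the
# excluded/non-responding students score below the observed average?
# All numbers below are hypothetical.
observed_mean = 506.0   # hypothetical observed country mean
n_observed = 5500       # hypothetical number of tested students
exclusion_rate = 0.11   # hypothetical combined share of excluded/missing students

# Number of students who would have been tested under full coverage.
n_missing = int(round(n_observed / (1 - exclusion_rate) - n_observed))

# Assume the missing students would, on average, score some amount lower.
for assumed_gap in (25, 50, 75):
    adjusted_mean = (observed_mean * n_observed
                     + (observed_mean - assumed_gap) * n_missing) / (n_observed + n_missing)
    print(f"Assumed gap {assumed_gap:>2} points -> adjusted mean {adjusted_mean:.1f}")
```

Even modest assumed gaps move the mean by several points, which is on the order of the score differences often highlighted in country rankings.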
Student motivation and underperformance
A recurring assumption is that some students underperform on PISA because the test has no direct consequences for them. Results from our studies show that students themselves report they would have exerted more effort on the PISA test if it had had an impact on their grades. The findings also indicate that more motivated students display a stronger correlation between PISA and the higher-stakes national tests. This suggests that students with lower motivation tend to show a greater discrepancy between their PISA and national test results. However, our results also indicate that it is likely only a relatively small group of students who would perform significantly better if PISA scores affected their grades. As such, low motivation alone cannot fully explain the large differences observed between countries—for example, between Europe and East Asia. Nonetheless, the issue of test motivation has attracted considerable interest and is central to discussions about how meaningful PISA results are for both students and policymakers.
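The subgroup comparison described above can be sketched as follows. The snippet splits students by a self-reported motivation measure and compares the PISA–national test correlation in each group; column names (pisa_score, national_test_score, effort) and the toy data are hypothetical placeholders for the linked PISA–register data.

```python
import pandas as pd

# Hypothetical example data: one row per student.
# 'effort' stands in for a self-reported test-motivation item (e.g., a 1-10 scale).
df = pd.DataFrame({
    "pisa_score":          [430, 455, 470, 505, 520, 540, 560, 585],
    "national_test_score": [12, 10, 14, 15, 16, 17, 19, 20],
    "effort":              [3, 2, 8, 7, 9, 8, 9, 10],
})

# Split students into more and less motivated groups at the median effort level.
df["motivated"] = df["effort"] >= df["effort"].median()

# Compare the PISA-national test correlation within each group.
for group, sub in df.groupby("motivated"):
    r = sub["pisa_score"].corr(sub["national_test_score"])
    print(f"motivated={group}: r(PISA, national test) = {r:.2f}")
```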
Subject specificity and validity in PISA results
Through structural equation modelling and comparisons between PISA results, national test scores, and school grades, we have identified a specific factor that appears to be uniquely linked to PISA. Correlations between the different PISA domains (e.g., reading, mathematics, science) are relatively strong, whereas the correlations between PISA results and corresponding national tests or grades in Swedish school subjects are weaker. This may indicate that PISA to a greater extent measures a general cognitive ability or a test format–related competence, rather than the subject-specific knowledge assessed by national tests and grades.
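A minimal sketch of the kind of factor model involved is given below, assuming the semopy package and hypothetical variable names; the project's actual model, which includes a PISA-specific factor, is more elaborate. The sketch specifies three correlated latent factors (PISA domains, national tests, grades), and the pattern of interest is that the latent correlations involving PISA are weaker than the within-PISA loadings would suggest.

```python
import pandas as pd
from semopy import Model  # assumes the semopy package is installed

# Hypothetical measurement model: three correlated factors.
# All observed variable names are placeholders for columns in the linked data.
model_desc = """
pisa     =~ pisa_read + pisa_math + pisa_sci
national =~ nat_swe + nat_math + nat_sci
grades   =~ grade_swe + grade_math + grade_sci
"""

def fit_cfa(df: pd.DataFrame):
    """Fit the factor model to student-level data (one row per student)."""
    model = Model(model_desc)
    model.fit(df)
    # Returns loadings and latent covariances; relatively weak correlations
    # between the PISA factor and the national test/grade factors would be
    # consistent with a PISA-specific component.
    return model.inspect()
```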
Conclusions
Our findings underscore the importance of considering methodological aspects when interpreting results from international large-scale assessments. Variations in exclusion and non-response can influence reported outcomes and thereby complicate cross-national comparisons, especially given the differences in data collection practices across countries. In parallel, our analyses suggest that PISA, in comparison with national assessments, tends to measure a relatively high degree of general cognitive ability. Additional influencing factors include the test format and non-cognitive aspects such as students’ motivation to engage with the assessment. Taken together, these findings suggest that differences in outcomes do not solely reflect educational quality or student performance but are also shaped by how those outcomes are measured and whether the assessments are actually representative of the full student population.
The project contributes to the discussion of the extent to which PISA results are comparable across countries and over time. The project calls for caution in the interpretation of PISA results and highlights the need for more nuanced analyses to enable accurate and fair comparisons across time and between countries.
Emerging research questions
The project has generated new questions concerning the reliability of student-reported survey responses and the implications this may have for subsequent analyses—for instance, when student-reported parental education is used in regression models to predict achievement. We compared students’ self-reported information with register data and identified notable discrepancies.
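A simple way to quantify such discrepancies is an agreement check between the two sources, sketched below. The column names and education categories are hypothetical placeholders for the linked PISA–register data.

```python
import pandas as pd

# Hypothetical example: student-reported vs. register-based parental education.
df = pd.DataFrame({
    "parent_edu_student":  ["upper_secondary", "tertiary", "tertiary", "compulsory"],
    "parent_edu_register": ["upper_secondary", "upper_secondary", "tertiary", "compulsory"],
})

# Cross-tabulate the two sources and compute the share of exact matches.
crosstab = pd.crosstab(df["parent_edu_student"], df["parent_edu_register"])
agreement = (df["parent_edu_student"] == df["parent_edu_register"]).mean()

print(crosstab)
print(f"Share of students whose report matches the register: {agreement:.0%}")
```

When agreement is low, regression estimates that use the student-reported variable as a predictor can be attenuated or biased, which is why the discrepancy itself becomes an object of study.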
Another research question concerns how negatively worded survey items might affect student responses and lead to systematic measurement errors—for example, in assessments of motivation. Additional questions of interest include how student motivation to complete PISA varies across countries and based on different sources of information. One such source is process data, such as how long students spend on different tasks, how much they write, and other behaviors during the test session. Another potential source is qualitative observation data, such as reports from observers describing how the test situation unfolded and whether students appeared motivated or completed the assessment in a meaningful way.
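One common way to use process data of the kind mentioned above is a response-time-based effort indicator, sketched below: responses faster than some threshold are flagged as rapid guessing, and each student's effort is summarized as the share of non-rapid responses. The log format and the threshold are hypothetical simplifications; operational PISA log files are considerably more complex.

```python
import pandas as pd

# Hypothetical item-level log data: one row per student-item pair.
log = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2, 2],
    "item_id":    ["R1", "R2", "R3", "R1", "R2", "R3"],
    "time_sec":   [48.0, 3.1, 62.5, 55.0, 41.2, 70.3],
})

RAPID_THRESHOLD = 5.0  # seconds; assumed cut-off for flagging rapid guessing

# Flag rapid responses and summarize effort as the share of non-rapid responses.
log["rapid"] = log["time_sec"] < RAPID_THRESHOLD
effort = 1 - log.groupby("student_id")["rapid"].mean()
print(effort.rename("response_time_effort"))
```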
Dissemination and collaboration
The project’s results have been disseminated through peer-reviewed publications in international academic journals, as well as through conference presentations. To date, the project has resulted in several publications, with additional manuscripts currently under review. All publications are made available through open access. The University of Gothenburg holds agreements with several publishers, and in some cases, article processing charges have been covered to ensure open access.
Findings have been presented at educational research conferences such as ECER (European Conference on Educational Research) and EARLI 2023. Interest in the use of international large-scale assessment data has grown, and project members are organizing a symposium at ECER 2025, involving researchers from around ten countries to discuss how register data can be used in educational research.
Project members have also developed a collaboration with researchers at Umeå University focusing on test motivation and have been invited to several academic seminars, including as discussants in doctoral seminars. The linking of PISA data with national register data has also drawn interest from doctoral students, several of whom are planning to use similar data in their dissertation projects. The project leader has also had the opportunity to visit Nanyang Technological University in Singapore, as well as the national organization responsible for PISA there, in order to study how test motivation is understood and managed in different test cultures.