Algorithmic fairness in student performance ML models

By Eve Bracken-Ingram

Updated Apr 13, 2026

At Student Voice, we want to help universities support students before problems escalate. Machine learning can help identify students at risk of underperformance or dropout early enough for useful intervention. That opportunity only matters if the model is fair, interpretable, and does not reproduce existing inequalities.

Karimi-Haghighi et al. [Source] developed a machine learning approach to predict the risk of university dropout and underperformance using information available at enrolment. The authors considered factors already linked to later academic outcomes, including student demographics, school type and location, and average admission grade. Even so, underperformance can arise for many personal or institutional reasons that are hard to capture in a model. That matters because some groups are already over-represented in dropout statistics, so any predictive system must be checked carefully for bias before it is used to guide support, especially where wider AI equity challenges in higher education are already visible.

Algorithmic fairness in machine learning is therefore central to this work. As explored in another Student Voice article [1], fairness has several, sometimes competing, definitions. In practical terms, it means reducing discriminatory bias so a model does not make systematically worse decisions for certain groups. Because machine learning systems learn from historical data, bias can be carried into the model unless it is measured and addressed. In Karimi-Haghighi et al.'s study, fairness was assessed using the error-rate metrics generalised false positive rate (GFPR) and generalised false negative rate (GFNR), alongside calibration. Calibration describes how well predicted probabilities match observed outcomes, which makes results easier to interpret across groups. The authors also tracked accuracy through Area Under the ROC Curve (AUC) and applied a bias-mitigation procedure designed to equalise error rates while maintaining calibration [2].
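To make these metrics concrete, here is a minimal sketch in Python of how GFPR, GFNR, calibration, and AUC could be computed for each demographic group. The function names and grouping logic are illustrative rather than taken from the paper; following Pleiss et al. [2], GFPR is taken as the mean predicted risk among students who did not drop out, and GFNR as the mean shortfall in predicted risk among students who did.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def generalised_error_rates(y_true, y_prob):
    """GFPR/GFNR in the sense of Pleiss et al. [2]:
    GFPR = mean predicted risk among true negatives,
    GFNR = mean (1 - predicted risk) among true positives."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    gfpr = y_prob[y_true == 0].mean()
    gfnr = (1.0 - y_prob[y_true == 1]).mean()
    return gfpr, gfnr

def per_group_report(y_true, y_prob, groups):
    """Compare error rates, calibration, and discrimination across groups."""
    y_true, y_prob, groups = map(np.asarray, (y_true, y_prob, groups))
    report = {}
    for g in np.unique(groups):
        m = groups == g
        gfpr, gfnr = generalised_error_rates(y_true[m], y_prob[m])
        report[g] = {
            "GFPR": gfpr,
            "GFNR": gfnr,
            # for a well-calibrated model, mean predicted risk tracks the observed rate
            "mean_predicted": y_prob[m].mean(),
            "observed_rate": y_true[m].mean(),
            "AUC": roc_auc_score(y_true[m], y_prob[m]),
        }
    return report
```

Comparing these numbers between, say, domestic and foreign students gives a quick view of whether a model's errors are concentrated in one group.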

The model was trained and tested on a dataset of 881 computer science students. The analysis compared outcomes across age, gender, nationality, academic performance, and school type. Within this dataset, foreign students were significantly more likely to underperform than domestic students. Students who failed a course or had to resit an exam in first year also showed a greater risk of dropout. Because the dataset was heavily imbalanced by gender, the authors used the SMOTE algorithm [3] to rebalance the distribution by interpolating minority cases.
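As an illustration of that rebalancing step, the snippet below applies imbalanced-learn's SMOTE implementation to a synthetic stand-in dataset; the feature matrix, labels, and imbalance ratio are invented for the example and do not come from the study.

```python
from collections import Counter

import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)

# Stand-in data roughly the size of the study's cohort (881 students)
X = rng.normal(size=(881, 6))        # enrolment-time features (illustrative)
y = np.array([0] * 800 + [1] * 81)   # heavily imbalanced attribute

print("before:", Counter(y))          # Counter({0: 800, 1: 81})
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after: ", Counter(y_res))      # minority class synthetically upsampled to 800
```

SMOTE creates new minority examples by interpolating between existing ones, so the rebalanced set contains synthetic rather than duplicated records; as with any oversampling, it should be applied only to the training data, never to the test set.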

A multi-layer perceptron with a hidden layer of 100 neurons produced the strongest results. Its AUC compared well with earlier studies, although accuracy was higher for male students and for students with lower admission grades than for their counterparts. The model showed good equity in GFNR, meaning false negative errors were relatively consistent across gender, age, nationality, school type, and academic performance. GFPR showed greater disparity, particularly for students with low admission grades, but fairness improved after bias mitigation. The same process also improved equity in GFNR and AUC across most groups.
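A comparable baseline is straightforward to set up with scikit-learn. The sketch below trains a multi-layer perceptron with a single 100-neuron hidden layer on synthetic data and reports overall AUC; everything here, from the generated data to every hyperparameter other than the hidden-layer size, is an assumption for illustration rather than the authors' exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for enrolment-time features and dropout labels
X, y = make_classification(n_samples=881, n_features=6, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Single hidden layer of 100 neurons, as in the strongest model reported above
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42),
)
clf.fit(X_train, y_train)

y_prob = clf.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, y_prob))
```

The mitigation step follows Pleiss et al. [2]. Very roughly, their post-processing withholds the model's prediction for a randomly chosen fraction of the better-served group and substitutes that group's base rate, which moves its generalised error rate toward the other group's while keeping predictions calibrated. The sketch below equalises GFNR between two groups in that spirit; it is a simplification of the published procedure, and the function name and single-metric focus are illustrative.

```python
import numpy as np

def equalise_gfnr(y_prob_a, y_true_a, y_prob_b, y_true_b, seed=0):
    """Rough sketch of calibration-preserving mitigation (after Pleiss et al. [2]):
    replace a random fraction of group A's predictions with A's base rate so that
    A's generalised false negative rate rises to match group B's.
    Assumes NumPy arrays and that group A currently has the lower GFNR."""
    rng = np.random.default_rng(seed)
    base_a = y_true_a.mean()                        # trivial, perfectly calibrated predictor
    gfnr_a = (1 - y_prob_a[y_true_a == 1]).mean()   # group A's current GFNR
    gfnr_b = (1 - y_prob_b[y_true_b == 1]).mean()   # target GFNR taken from group B
    trivial_gfnr_a = 1 - base_a                     # GFNR if A were always given its base rate
    # Mixing weight solving (1 - a) * gfnr_a + a * trivial_gfnr_a = gfnr_b, clipped to [0, 1]
    alpha = np.clip((gfnr_b - gfnr_a) / (trivial_gfnr_a - gfnr_a), 0.0, 1.0)
    withheld = rng.random(len(y_prob_a)) < alpha
    return np.where(withheld, base_a, y_prob_a)
```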

For universities, the value of this work lies not only in prediction, but in using prediction responsibly. A fairer model could help institutions target support earlier, allocate resources more deliberately, and reduce the risk that early-warning systems reinforce existing disadvantage. The wider lesson is equally important: accuracy alone is not enough. If institutions want to use machine learning in student support, they need models that are accurate, explainable, and actively monitored for bias.

FAQ

Q: How does Student Voice plan to implement the findings of the Karimi-Haghighi et al. study into real-world educational settings?

A: The most useful lesson is methodological. When institutions use predictive or text-analysis tools to support students, they need bias checks, documented metrics, and clear human oversight. Student Voice Analytics focuses on governed, reproducible analysis of student comments, helping teams spot risk signals and experience gaps without relying on opaque generic LLM workflows. Any operational use of predictive models should support timely intervention and safeguarding, not automate high-stakes decisions about individual students.

Q: What are the ethical considerations and potential privacy concerns associated with using machine learning to analyse students' data for predicting underperformance and dropout rates?

A: The ethical challenge is to use predictive insight to help students, not to stigmatise them. Institutions need to be clear about what data is collected, why it is being used, who can access it, and how long it is retained. They also need to test for bias so that certain groups are not unfairly flagged or overlooked. Transparent communication, strong access controls, and compliance with data protection law are essential if these systems are to earn trust.

Q: How does the inclusion of text analysis enhance the predictive capabilities of the machine learning models discussed by Karimi-Haghighi et al., if at all?

A: While the original study does not explicitly use text analysis, open-text data could add valuable context to structured risk indicators, especially when institutions use education text analysis tools that support repeatable analysis. Students' comments can reveal issues around belonging, workload, teaching quality, support, or wellbeing that grades and attendance alone may miss. Used carefully, text analysis helps institutions understand not just who may be at risk, but why, and related work on reducing bias in NLP systems shows why that design work matters. That leads to more targeted, humane interventions rather than one-size-fits-all responses.

References

[Source] Marzieh Karimi-Haghighi, Carlos Castillo, Davinia Hernández-Leo and Veronica Moreno Oliver (2021) Predicting Early Dropout: Calibration and Algorithmic Fairness Considerations. Companion Proceedings of the 11th International Conference on Learning Analytics & Knowledge (LAK21)
DOI: 10.48550/arXiv.2103.09068

[1] David Griffin, Definitions of Fairness in Machine Learning Explained Through Examples. Student Voice

[2] Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J. and Weinberger, K. Q. (2017). On fairness and calibration. Advances in Neural Information Processing Systems
DOI: 10.48550/arXiv.1709.02012

[3] Chawla, N. V., Bowyer, K. W., Hall, L. O. and Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357
DOI: 10.48550/arXiv.1106.1813

Request a walkthrough

Book a free Student Voice Analytics demo

See all-comment coverage, sector benchmarks, and reporting designed for OfS quality and NSS requirements.

  • All-comment coverage with HE-tuned taxonomy and sentiment.
  • Versioned outputs with TEF-ready reporting.
  • Benchmarks and BI-ready exports for boards and Senate.
Prefer email? info@studentvoice.ai

UK-hosted · No public LLM APIs · Same-day turnaround


© Student Voice Systems Limited, All rights reserved.