Using Ensemble Machine Learning and Feature Engineering to Increase the Accuracy of Predicting Learners' Performance in an Online Educational Environment

Document Type: Original Article

Authors

1 Department of Information Technology and Computer Engineering, Payame Noor University, Tehran, Iran

2 Department of Computer Sciences, Faculty of Mathematical Sciences, Shahrkord, Iran

Abstract

Background: Online training has gained popularity as an effective teaching method, which makes close monitoring of learner progress and engagement necessary. Predicting academic performance in online courses is crucial for supporting learners at risk of academic failure. This study aimed to develop a robust model for predicting learners' performance using ensemble machine learning and feature engineering techniques.
Methods: This research employed a classification approach based on the Digital Electronics Education and Design Suite (DEEDS) dataset, which records real-time interactions of learners within an online educational environment. The dataset included activity logs from 115 undergraduate computer engineering students who took a digital electronics course at the University of Genoa, Italy, between September and December 2015. Several machine learning algorithms were applied: Random Forest (RF), Adaptive Boosting (AdaBoost), Gradient Boosting (GB), Light Gradient-Boosting Machine (LightGBM), and eXtreme Gradient Boosting (XGBoost). The study also used the ensemble learning methods boosting and stacking to improve prediction accuracy. Feature engineering techniques were applied to extract and select relevant features from the dataset, and the selected features were used to build the predictive model.
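As a rough illustration of how feature selection and stacking can be combined, the following is a minimal sketch using scikit-learn. It is not the authors' pipeline: the data are synthetic stand-ins rather than DEEDS, the univariate `SelectKBest` filter, the value `k=8`, the logistic-regression meta-learner, and all hyperparameters are assumptions chosen for illustration, and LightGBM and XGBoost are left out because they require separate packages.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic binary-classification data (assumption: NOT the DEEDS dataset).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners drawn from the algorithm families named in the abstract.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# Feature selection runs inside the pipeline, so it is fitted on the
# training split only and cannot leak information from the test split.
model = Pipeline(
    [
        ("select", SelectKBest(score_func=f_classif, k=8)),  # k is an assumption
        (
            "stack",
            StackingClassifier(
                estimators=base_learners,
                # The meta-learner is trained on cross-validated
                # predictions of the base learners.
                final_estimator=LogisticRegression(max_iter=1000),
                cv=5,
            ),
        ),
    ]
)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)

scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(scores)
```

Replacing the `"select"` step with `"passthrough"` gives a baseline without feature selection, which is one way to reproduce the kind of with/without comparison the study reports.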
Results: The proposed model achieved an accuracy of 97.43%, a precision of 96.20%, and an F1-score of 98.06%, indicating acceptable predictive capability. Feature selection contributed substantially to this result: without it, accuracy dropped to 92.15%, a difference of 5.28 percentage points. Additionally, ensemble methods such as boosting and stacking improved prediction accuracy by 15% compared with traditional approaches. Overall, combining feature engineering with ensemble techniques improved the model's ability to predict learners' academic performance in online educational settings.
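For readers less familiar with these metrics, the sketch below shows how accuracy, precision, recall, and F1 are computed from confusion-matrix counts, and how recall can be recovered from a reported precision and F1. It assumes a binary task with a single positive class and that the reported precision and F1 come from the same evaluation; neither is stated in the abstract. The function names and the toy counts are hypothetical. Under those assumptions the reported figures imply a recall of roughly 99.99%, although this estimate is sensitive to rounding in the published values.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1) for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def recall_from_precision_f1(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    return f1 * precision / (2 * precision - f1)


# Toy counts (hypothetical, not taken from the study).
p, r, f = precision_recall_f1(tp=8, fp=2, fn=0)
print(f"toy precision={p:.4f} recall={r:.4f} f1={f:.4f}")
print(f"toy accuracy={accuracy(tp=8, tn=10, fp=2, fn=0):.4f}")

# Recall implied by the precision (96.20%) and F1 (98.06%) in the abstract.
implied_recall = recall_from_precision_f1(0.9620, 0.9806)
print(f"implied recall={implied_recall:.4f}")
```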
Conclusion: This research supports the effectiveness of ensemble machine learning techniques and feature engineering for predicting learners' academic performance in online education. Future studies should explore additional ensemble methods and incorporate more diverse feature types to further improve prediction accuracy.

Highlights

Seyede Fatemeh Noorani

Keywords

