Abstract

The study analyzes and assesses differential item functioning (DIF) across demographic groups, particularly gender and cultural groupings, with the aim of producing fair and appropriate test items. When selecting items, it is essential to examine the extent to which they function differently across subgroups of examinees. The paper draws on modern methods for removing construct-irrelevant parameters and other sources of bias so that a test can yield valid results. It is therefore recommended that test developers and policymakers exercise caution in fair testing practice by devoting greater effort to unbiased test development and decision-making. In educational testing, examination bodies should employ Item Response Theory, and test developers should watch for items that induce biased response patterns between male and female students or any other subgroup of interest.
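
As a concrete illustration of how such a screening can be run, the sketch below applies the widely used logistic-regression DIF procedure to simulated item responses in Python. Everything in it (the simulated data, the group labels, the 0.01 flagging threshold) is an illustrative assumption, not the analysis reported in this study. Each item's correctness is regressed on the rest score (the matching criterion), a group indicator, and their interaction: a significant group effect signals uniform DIF, while a significant interaction signals non-uniform DIF.

```python
"""Minimal sketch of logistic-regression DIF screening.

Illustrative only: the response matrix, group coding, and flagging
threshold are simulated assumptions, not this study's actual data.
"""
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# --- Simulated data --------------------------------------------------------
n_persons, n_items = 1000, 20
theta = rng.normal(size=n_persons)          # latent ability
group = rng.integers(0, 2, size=n_persons)  # e.g., 0 = male, 1 = female
b = rng.normal(size=n_items)                # item difficulties

# Item 0 is built to show uniform DIF: harder for group 1 at equal ability.
dif_shift = np.zeros(n_items)
dif_shift[0] = 0.8

logits = theta[:, None] - b[None, :] - dif_shift[None, :] * group[:, None]
responses = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

# --- Logistic-regression DIF test, one item at a time -----------------------
for item in range(n_items):
    y = responses[:, item]
    rest = responses.sum(axis=1) - y  # rest score: total minus studied item
    X = sm.add_constant(np.column_stack([rest, group, rest * group]))

    fit = sm.Logit(y, X).fit(disp=0)
    # Columns of X: [const, rest, group, rest*group].
    # Group main effect -> uniform DIF; interaction -> non-uniform DIF.
    p_uniform, p_nonuniform = fit.pvalues[2], fit.pvalues[3]
    if min(p_uniform, p_nonuniform) < 0.01:
        print(f"item {item:2d}: uniform p={p_uniform:.4f}, "
              f"non-uniform p={p_nonuniform:.4f}  -> flag for review")
```

In practice one would replace the simulated matrix with scored responses from the examination data and pair the statistical flags with effect-size measures before revising or deleting any item.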

Keywords

Differential Item Functioning, Test Fairness, Item Bias, Item Characteristic Curve, Item Response Theory.
