Senior Expertise and Peer Consensus: A Comparative Analysis of AI and Clinician Measurements in Multi-Curve Scoliosis Assessment
- 1 Division of Information Technology, College of Information and Communications Technology, West Visayas State University, Iloilo, Philippines
- 2 Department of Pediatrics, College of Medicine, West Visayas State University, Iloilo, Philippines
- 3 Department of Information Systems, College of Information and Communications Technology, West Visayas State University, Iloilo, Philippines
- 4 Department of Orthopedics, West Visayas State University Medical Center, Iloilo, Philippines
Abstract
Because multi-curve scoliosis assessment is sparsely covered in the literature, this study explored the use of AI for this task. Its performance was compared against a group of clinicians composed of one senior and five non-senior orthopedic surgeons. The analysis focused on Cobb angle measurement and identification of vertebral endplates across three curve regions: Main Thoracic (MT), Proximal Thoracic (PT), and Thoracolumbar/Lumbar (TL/L). The results show strong agreement in the MT region, with a low Mean Absolute Difference (MAD) of 2.21 and high intraclass correlation coefficients (ICC 0.94–0.98), suggesting clinically reliable AI performance in this area of the spine. Moderate agreement was observed in the TL/L region (ICC 0.74–0.89), whereas the PT region presented significant challenges, with high MAD values and ICC values near zero. This highlights variations in end vertebra selection due to anatomical and image quality limitations, which significantly affect the Cobb angle measurements of the human observers. Qualitative observations further revealed subjectivity in identifying vertebral landmarks, which is apparent in low-quality radiographs. Notably, most of the AI's measurements aligned more closely with the group consensus of non-senior clinicians than with the senior expert, possibly signifying its inclination towards combined human patterns rather than expert-level preference. Caudal endplate identification showed higher agreement across evaluators than cranial endplate identification, implying that certain anatomical landmarks are more consistently identifiable. These results indicate AI's potential for standardizing scoliosis evaluation, particularly in the MT and TL/L regions, despite its underperformance in the PT region.
The study therefore concludes that enhanced algorithm development, improved training datasets, and, above all, integrated expert oversight are needed. The alignment of AI with general clinician consensus underlines its potential as a reliable, standardizing tool in clinical practice, but expert input must remain a crucial part of the process.
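For readers unfamiliar with the two agreement metrics reported above, the following is a minimal sketch of how MAD and an ICC can be computed from paired Cobb angle measurements. It is not the authors' analysis code. The abstract does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is an assumption here, and the function names and angle values are hypothetical illustrations rather than study data.

```python
from statistics import mean


def mean_absolute_difference(a, b):
    """Mean absolute difference between two equal-length lists of angles."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(x - y) for x, y in zip(a, b))


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per radiograph and one column per
    rater. Choosing this ICC form is an assumption; the source does not
    specify which form it reports.
    """
    n = len(ratings)      # number of subjects (radiographs)
    k = len(ratings[0])   # number of raters
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


if __name__ == "__main__":
    # Hypothetical Cobb angles in degrees, NOT data from the study.
    ai_angles = [24.0, 31.5, 18.0, 42.0]
    clinician_angles = [26.0, 30.0, 21.0, 40.5]

    print("MAD:", mean_absolute_difference(ai_angles, clinician_angles))
    print("ICC(2,1):", icc_2_1([list(p) for p in zip(ai_angles, clinician_angles)]))
```

A low MAD together with an ICC close to 1 corresponds to the pattern the abstract reports for the MT region, while a high MAD with an ICC near zero corresponds to the pattern reported for the PT region.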
DOI: https://doi.org/10.3844/jcssp.2026.886.897
Copyright: © 2026 Frank Ibañez Elijorde, Joselito F. Villaruz, Ma Beth S. Concepcion and Mylo N. Soriaso. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Keywords
- Artificial Intelligence and Deep Learning
- Scoliosis Assessment
- Multi-Curve Scoliosis
- Vertebral Endplate Selection
- Cobb Angle Measurement