Clinical bone age assessment can benefit from reliable artificial intelligence automation

IB Lab CSO Matthew DiFranco presented results comparing IB Lab PANDA with expert readers in the AI in musculoskeletal imaging session at ECR 2021. The study, conducted on a cohort of over 250 German patients, showed that IB Lab PANDA is accurate to within six months in patients aged 2 to 17 years, an improvement of almost 2 months over the agreement among the readers themselves.

IB Lab PANDA is an AI-based software solution that automates bone age reading according to Greulich & Pyle (GP). Trained on over 12,000 hand radiographs, it provides a fully automated GP assessment along with findings on skeletal development status and achieved growth potential. It can be fully integrated into a hospital PACS and runs asynchronously to the radiologist's reading, so IB Lab PANDA results can appear alongside the original study without any waiting.

For this study, left-hand radiographs acquired at the University of Leipzig Medical Center between 2011 and 2020, each with a clinical GP bone age reading, were obtained. Males were aged 1 to 19 years and females 1 to 18 years. From this population, nine patients were selected per sex and age group, and no patient contributed more than one image to the dataset.
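The selection rule above can be sketched in Python. This is a minimal illustration, not the study's actual procedure: the record layout (patient ID, sex, age in whole years, image ID), the function name select_stratified, and random selection within each stratum are all assumptions, since the article does not say how the nine patients per group were chosen.

```python
import random
from collections import defaultdict

def select_stratified(records, per_stratum=9, seed=0):
    """Select up to `per_stratum` images per (sex, age-in-years) stratum,
    using at most one image per patient.

    `records` is a list of (patient_id, sex, age_years, image_id) tuples,
    a hypothetical layout standing in for the study's data.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)  # random choice within strata is an assumption

    used_patients = set()
    strata = defaultdict(list)
    for patient_id, sex, age_years, image_id in shuffled:
        key = (sex, age_years)
        # Skip patients already represented and strata that are already full.
        if patient_id in used_patients or len(strata[key]) >= per_stratum:
            continue
        strata[key].append(image_id)
        used_patients.add(patient_id)
    return dict(strata)
```

Tracking used patients globally, rather than per stratum, is what enforces the "one image per patient" constraint even when a patient has radiographs at several ages.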

Three readers independently read each image according to GP: two pediatric radiologists with five and four years of experience, and one pediatric endocrinologist with 20 years of experience. The images were then processed onsite with IB Lab PANDA version 1.09.9, and statistical analysis was performed to assess agreement and potential bias between the readers and IB Lab PANDA.

For the 259 images that fell within the intended use range of IB Lab PANDA, the mean difference between IB Lab PANDA and the mean of the readers was 0.78 months, identical to the mean difference among the readers.
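As a rough illustration, the comparison against the readers' mean can be written as a per-image consensus followed by a mean signed difference. This is a hedged sketch: the function names are hypothetical, values are assumed to be in months, and the article does not specify how the inter-reader mean difference was computed, so only the AI-versus-consensus figure is shown.

```python
from statistics import mean

def reader_consensus(reads):
    """Per-image mean bone age (months) across readers.

    `reads` holds one list per image, with one value per reader."""
    return [mean(per_image) for per_image in reads]

def mean_difference(predicted, reference):
    """Mean signed difference predicted - reference, in months.

    A positive value means `predicted` reads older on average."""
    if len(predicted) != len(reference):
        raise ValueError("series must be paired image by image")
    return mean(p - r for p, r in zip(predicted, reference))
```

For example, with consensus = reader_consensus(reads), the bias of the software would be mean_difference(ai_readings, consensus), where ai_readings is a hypothetical list of IB Lab PANDA outputs in the same image order.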

IB Lab PANDA was accurate to within six months by mean absolute deviation (MAD), whereas the readers' MAD was nearly 8 months. The root mean squared error (RMSE) of IB Lab PANDA was nearly 8 months, compared with 10 months for the readers. Based on the confidence intervals, the differences in both MAD and RMSE were statistically significant.
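The two error metrics can be sketched as below, with errors taken as reading minus reference in months. The article does not state how its confidence intervals were obtained; the paired percentile bootstrap here is an assumed, illustrative method, and all function names are hypothetical. If the resulting interval for the difference excludes zero, the difference would be considered significant at the chosen level.

```python
import math
import random
from statistics import mean

def mad(errors):
    """Mean absolute deviation of errors (reading minus reference), in months."""
    return mean(abs(e) for e in errors)

def rmse(errors):
    """Root mean squared error, in months."""
    return math.sqrt(mean(e * e for e in errors))

def paired_bootstrap_ci(err_a, err_b, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for metric(err_a) - metric(err_b).

    Images are resampled jointly, so each resample keeps the pairing
    between the two error series (an assumed method, for illustration)."""
    if len(err_a) != len(err_b):
        raise ValueError("error series must be paired")
    rng = random.Random(seed)
    n = len(err_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(metric([err_a[i] for i in idx]) - metric([err_b[i] for i in idx]))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

RMSE weights large errors more heavily than MAD, which is consistent with RMSE being the larger of the two figures for both IB Lab PANDA and the readers.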

In the Bland-Altman analysis of agreement, taking reader performance as the limits of clinical acceptance, the limits of agreement of IB Lab PANDA fall within the clinically acceptable range.
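The Bland-Altman limits of agreement are conventionally the mean difference plus or minus 1.96 standard deviations of the paired differences. The sketch below computes them and checks whether one set of limits lies inside another; using inter-reader limits as the acceptance interval mirrors the description above, but the exact construction used in the study is not given, and the function names are hypothetical.

```python
from statistics import mean, stdev

def limits_of_agreement(a, b, z=1.96):
    """Bland-Altman limits of agreement for paired readings a and b:
    mean difference (bias) +/- z * sample standard deviation of the differences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired readings")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias - z * sd, bias + z * sd

def within_acceptance(loa, acceptance):
    """True if the interval `loa` lies inside the `acceptance` interval,
    e.g. limits of agreement derived from comparisons between readers."""
    lo, hi = loa
    acc_lo, acc_hi = acceptance
    return acc_lo <= lo and hi <= acc_hi
```

In this framing, one would compute limits_of_agreement(ai_readings, consensus) for the software and an acceptance interval from reader-versus-reader comparisons (all hypothetical inputs), then test containment with within_acceptance.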

Orthogonal distance regression of IB Lab PANDA against the mean of the readers shows a mean difference of 0.78 months and a slope of 0.99, indicating no proportional bias of IB Lab PANDA across ages. The largest errors occur in the youngest and oldest patients, who fall outside the intended use range.
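For a straight line with equal error variance on both axes, orthogonal distance regression has a closed-form solution (Deming regression with an error-variance ratio of 1). The sketch below assumes that equal-variance case, which the article does not state, and uses a hypothetical function name. A slope close to 1, such as the reported 0.99, indicates that the difference between the two methods does not grow or shrink with age.

```python
import math
from statistics import mean

def orthogonal_regression(x, y):
    """Fit y = intercept + slope * x by minimizing perpendicular distances
    (Deming regression with equal error variances on both axes).

    Returns (slope, intercept)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when x and y are uncorrelated")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return slope, intercept
```

Unlike ordinary least squares, which treats the reader mean as error-free, this fit allows for measurement error in both the readers' consensus (x) and the software's readings (y).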

In conclusion, IB Lab PANDA was shown to agree with expert readers in pediatric bone age assessment on a German cohort and can provide reliable, fully automated AI bone age assessment that integrates into the reading workflow.