Please use this identifier to cite or link to this item: doi:10.22028/D291-47399
Title: Differential Diagnosis of Parotid Tumors on Ultrasound: Interobserver Variability and Examiner-Specific Decision Rules—A Machine Learning Approach
Author(s): Pillong, Lukas
Ohnesorg, Ida
Brust, Lukas Alexander
Palm, Jan
Schulze-Berge, Julia
Bozzato, Victoria
Voges, Manfred
Müller, Adrian
Garner, Malvina
Bozzato, Alessandro
Language: English
Journal: Diagnostics
Volume: 16
Issue: 6
Publisher/Platform: MDPI
Year of Publication: 2026
Free key words: ultrasound
parotid gland tumors
machine learning
decision trees
interobserver variability
DDC notations: 610 Medicine and health
Publication type: Journal Article
Abstract: Background/Objectives: Noninvasive differentiation of parotid gland tumors remains challenging despite ultrasound being the primary imaging modality for salivary gland lesions. Given its examiner dependence, improving diagnostic consistency and transparency is crucial. We quantified interobserver variability in parotid ultrasound, modeled examiner-specific decision patterns using machine learning surrogates, and tested whether surrogate complexity relates to examiner performance. Methods: In this retrospective, single-center study, six examiners independently rated ultrasound images of 149 parotid tumors using predefined descriptors. Performance was summarized using accuracy and the area under the receiver operating characteristic curve (AUC), with 95% confidence intervals (CIs). AUCs were compared using DeLong tests (Holm-adjusted). Interobserver agreement was assessed using pairwise Cohen’s and global Fleiss’ κ. For each examiner, a decision-tree surrogate was trained from structured descriptors and clinical metadata to reproduce examiner labels and visualize decision pathways; performance was estimated by 5-fold cross-validation. Results: Examiner accuracy ranged from 63.5% to 90.5% and AUC from 0.66 to 0.89 (best 0.89, 95% CI 0.83–0.95); the best performer exceeded the two lowest performers (p < 0.001). Agreement was higher for objective descriptors (size: κ = 0.57–0.97) than for subjective descriptors (echogenicity: κ = 0.11–0.79). Surrogate decision-tree accuracy versus histopathology ranged from 57.2% to 80.0% for unpruned and from 65.1% to 76.5% for pruned models, with high coverage (95.3–98.7%). Tree complexity showed no consistent association with examiner performance. Conclusions: Parotid ultrasound shows substantial interobserver variability. Interpretable surrogates can approximate individual labeling behavior from structured descriptors and clinical metadata, making examiner-dependent decision patterns explicit.
DOI of the first publication: 10.3390/diagnostics16060880
URL of the first publication: https://doi.org/10.3390/diagnostics16060880
Link to this record: urn:nbn:de:bsz:291--ds-473999
hdl:20.500.11880/41477
http://dx.doi.org/10.22028/D291-47399
ISSN: 2075-4418
Date of registration: 1-Apr-2026
Description of the related object: Supplementary Materials
Related object: https://www.mdpi.com/article/10.3390/diagnostics16060880/s1
Faculty: M - Medizinische Fakultät
Department: M - Anästhesiologie
M - Hals-Nasen-Ohrenheilkunde
M - Radiologie
Professorship: M - Prof. Dr. Markus Hecht
M - Prof. Dr. Bernhard Schick
M - Prof. Dr. Thomas Volk
Collections: SciDok - Der Wissenschaftsserver der Universität des Saarlandes

Files for this record:
File: diagnostics-16-00880.pdf
Size: 1.65 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.