Please use this identifier to cite or link to this item: doi:10.22028/D291-47399
| Title: | Differential Diagnosis of Parotid Tumors on Ultrasound: Interobserver Variability and Examiner-Specific Decision Rules—A Machine Learning Approach |
| Author(s): | Pillong, Lukas; Ohnesorg, Ida; Brust, Lukas Alexander; Palm, Jan; Schulze-Berge, Julia; Bozzato, Victoria; Voges, Manfred; Müller, Adrian; Garner, Malvina; Bozzato, Alessandro |
| Language: | English |
| Journal: | Diagnostics |
| Volume: | 16 |
| Issue: | 6 |
| Publisher/Platform: | MDPI |
| Year of Publication: | 2026 |
| Free key words: | ultrasound; parotid gland tumors; machine learning; decision trees; interobserver variability |
| DDC notations: | 610 Medicine and health |
| Publication type: | Journal Article |
| Abstract: | Background/Objectives: Noninvasive differentiation of parotid gland tumors remains challenging despite ultrasound being the primary imaging modality for salivary gland lesions. Given its examiner dependence, improving diagnostic consistency and transparency is crucial. We quantified interobserver variability in parotid ultrasound, modeled examiner-specific decision patterns using machine learning surrogates, and tested whether surrogate complexity relates to examiner performance. Methods: In this retrospective, single-center study, six examiners independently rated ultrasound images of 149 parotid tumors using predefined descriptors. Performance was summarized using accuracy and the area under the receiver operating characteristic curve (AUC), with 95% confidence intervals (CIs). AUCs were compared using DeLong tests (Holm-adjusted). Interobserver agreement was assessed using pairwise Cohen’s and global Fleiss’ κ. For each examiner, a decision-tree surrogate was trained from structured descriptors and clinical metadata to reproduce examiner labels and visualize decision pathways; performance was estimated by 5-fold cross-validation. Results: Examiner accuracy ranged from 63.5% to 90.5% and AUC from 0.66 to 0.89 (best 0.89, 95% CI 0.83–0.95); the best performer exceeded the two lowest performers (p < 0.001). Agreement was higher for objective descriptors (size: κ = 0.57–0.97) than for subjective descriptors (echogenicity: κ = 0.11–0.79). Surrogate decision-tree accuracy versus histopathology ranged from 57.2% to 80.0% for unpruned and from 65.1% to 76.5% for pruned models, with high coverage (95.3–98.7%). Tree complexity showed no consistent association with examiner performance. Conclusions: Parotid ultrasound shows substantial interobserver variability. Interpretable surrogates can approximate individual labeling behavior from structured descriptors and clinical metadata, making examiner-dependent decision patterns explicit. |
| DOI of the first publication: | 10.3390/diagnostics16060880 |
| URL of the first publication: | https://doi.org/10.3390/diagnostics16060880 |
| Link to this record: | urn:nbn:de:bsz:291--ds-473999; hdl:20.500.11880/41477; http://dx.doi.org/10.22028/D291-47399 |
| ISSN: | 2075-4418 |
| Date of registration: | 1-Apr-2026 |
| Description of the related object: | Supplementary Materials |
| Related object: | https://www.mdpi.com/article/10.3390/diagnostics16060880/s1 |
| Faculty: | M - Faculty of Medicine |
| Department: | M - Anesthesiology; M - Otorhinolaryngology; M - Radiology |
| Professorship: | M - Prof. Dr. Markus Hecht; M - Prof. Dr. Bernhard Schick; M - Prof. Dr. Thomas Volk |
| Collections: | SciDok - The Research Server of Saarland University |
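
The abstract reports pairwise Cohen's κ as one of its interobserver agreement measures. As a rough illustration of what that statistic computes, the following is a minimal Python sketch, not the authors' code. The function name `cohen_kappa` and the two rating lists are hypothetical and made up for this example; they are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same cases (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)

    # Observed agreement: fraction of cases where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented ratings for six lesions (assumption, for illustration only).
examiner_1 = ["benign", "benign", "malignant", "benign", "malignant", "benign"]
examiner_2 = ["benign", "malignant", "malignant", "benign", "benign", "benign"]

kappa = cohen_kappa(examiner_1, examiner_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this invented example the raters agree on 4 of 6 cases (observed agreement 0.67), while chance agreement from their marginals is 0.56, so κ comes out at 0.25. Raw percent agreement can therefore look considerably better than chance-corrected agreement.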
Files for this record:
| File | Description | Size | Format |
|---|---|---|---|
| diagnostics-16-00880.pdf | | 1.65 MB | Adobe PDF |
This item is licensed under a Creative Commons License

