Introduction: Endoscopic ultrasound (EUS) is a highly skilled procedure with a limited number of facilities available for training. A high number of procedures is necessary to achieve competence. However, agreement between observers varies widely. Artificial intelligence (AI)-aided recognition and characterization of anatomical structures may improve the training process while improving the agreement between observers. We aimed to develop an artificial intelligence model that recognizes anatomical structures in real time during EUS evaluations.
Methods: This was a single-center pilot study. We developed two convolutional neural networks from linear and radial endoscopic ultrasound videos from patients without pathologies. The AI models were developed using automated machine learning software (AI Works, MD Consulting group, Ecuador). Two expert endosonographers trained the two independent models. Performance metrics for recognizing anatomical structures during EUS evaluations were calculated for the linear and radial EUS algorithms.
Results: We included eight anatomical structures from twelve endoscopic ultrasound videos for the development of the EUS-AI algorithms. A total of 8113 samples were captured from the EUS videos (6354 for radial and 1759 for linear). The anatomical structures were recognized and labeled for the training of the AI models by two expert endosonographers (300 EUS/year). The proposed EUS Radial model reached a mean average precision (mAP) of 69.67%, an F1-score (harmonic mean of sensitivity and precision) of 92%, and an average intersection over union (IoU; overlap between model prediction and expert marking) of 79.08%, with a total loss of 0.13. The developed EUS Linear model reached a mAP of 83.43%, an F1-score of 89%, and an average IoU of 73.48%, with a total loss of 0.16.
Conclusion: The proposed AI models for linear and radial EUS recognize and identify the trained anatomical structures during real-time EUS evaluations. The proposed models could be implemented for training in EUS, probably reducing the time and number of cases required to achieve competency.
Figure 1. Real-time identification of anatomical structures during radial endoscopic ultrasound evaluation.
Figure 2. Total loss of the proposed artificial intelligence model for real-time radial endoscopic ultrasound.
(The other authors declare no conflicts of interest)
Is a key opinion leader and consultant for Pentax Medical, Steris, Micro-tech, G-Tech Medical Supply, CREO Medical, mdconsgroup, and board member and consultant for EndoSound.
Is a consultant for Boston Scientific, Interscope Med, and Abbvie; a grant recipient from Boston Scientific, Conmed, Gore, Pinnacle, Merit Medical, Olympus Medical, and Ninepoint Medical; and the chief executive officer and founder of Innovative Digestive Health Education & Research Inc.
Is a consultant for Ninepoint Medical, EndoGastric Solutions, and Obalon Therapeutics.
Is a speaker for Boston Scientific, ConMed, Medtronic, and GI Supplies; an advisory board member for Microtech; and a co-owner of EndoRx.
Is a consultant for Olympus, Boston Scientific, Omega Medical Imaging, M.I. Tech, Tigen Pharma, and Ambu.
No conflicts of interest
Bile duct strictures can be caused by neoplastic and nonneoplastic processes, which affects diagnostic procedures and treatment. The most commonly used procedure to evaluate these lesions is endoscopic retrograde cholangiopancreatography (ERCP) with brush cytology and biopsy sampling. However, several studies have indicated that this procedure has low sensitivity and diagnostic accuracy, as well as imaging limitations. These limitations could lead to diagnostic and sampling errors, requiring reinterventions and increasing costs for patients.
To overcome ERCP's limitations, direct visualization of the bile duct system with digital single operator cholangioscopy (DSOC), a minimally invasive diagnostic and therapeutic procedure, was established. However, this procedure still has limitations, such as interobserver variability in the detection of neoplastic lesions.
The introduction of artificial intelligence (AI) in the medical field has led to the development of innovative and more precise diagnostic tools. Complex deep learning algorithms such as convolutional neural networks (CNNs), a subtype of AI, allow software to identify features in images and videos, assisting in image interpretation; however, few studies on the diagnostic performance of DSOC CNN models have been performed. Considering the limitations of both ERCP and DSOC, mdconsgroup (Guayaquil, Ecuador) developed a CNN model that recognizes macroscopic morphological characteristics of neoplastic lesions during real-time DSOC as a red flag technique, and clinically validated the model through comparisons with DSOC assessments by expert and nonexpert endoscopists.
The study was designed as two-stage research to develop, train, and validate the performance of a CNN model (AIWorks-Cholangioscopy, mdconsgroup, Guayaquil, Ecuador) that identifies neoplastic lesions in indeterminate biliary strictures in previously recorded DSOC videos and during real-time (live) DSOC procedures. This study was performed in collaboration with the Instituto Ecuatoriano de Enfermedades Digestivas (IECED), and international centers from the United States of America and Europe including the Houston Methodist Hospital (Texas), Baylor Saint Luke’s Medical Center (Texas), the Robert Wood Johnson Medical School Rutgers University (New Jersey), and the Universitair Ziekenhuis Brussel (UZB)/Vrije Universiteit Brussel (VUB) (Belgium).
The study protocol was approved by IECED’s Institutional Review Board and was conducted in accordance with the Helsinki Declaration. All participants or their legal guardians provided written informed consent to transfer their DSOC videos to the AIWorks Cloud (mdconsgroup, Guayaquil, Ecuador) for analysis, interpretation, and publication.
Stage I consisted of an observational, analytic, prospective single-center diagnostic pilot study that aimed to develop, train, and internally validate the AIWorks-Cholangioscopy software. This stage was performed from January 2020 to October 2020. Based on histological findings and 12-month follow-up results, two groups were identified: a case group including patients with neoplastic lesions and a control group with patients with nonneoplastic lesions.
Development and model validation
A first version of the model (AIWorks-Cholangioscopy v1.0) was developed using the DSOC videos of twenty-three patients with definitive neoplastic lesion diagnoses based on histological findings and twelve-month follow-ups. First, based on the visual examinations performed by the experts, macroscopic neoplastic features were classified in accordance with the Carlos Robles-Medranda (CRM) and Mendoza classifications. Lesions were recorded and labeled within a bounding box using the AIWorks Cloud (Figure 1). After a definitive diagnosis was confirmed at the 12-month follow-up, the collected frames were fed to the AIWorks software.
Figure 1. DSOC image showing the AIWorks-Cholangioscopy v1.0 detection.
A second version of the model (AIWorks-Cholangioscopy v2.0) was developed for the second stage of the present study using 116 additional DSOC videos with identified neoplastic lesions, applying the same criteria as for AIWorks-Cholangioscopy v1.0 (CNN1) (Figure 2). This improved version of CNN1 was developed using YOLOv4 (Washington, USA) with a 90% training and 10% validation dataset distribution.
Figure 2. DSOC image showing the AIWorks-Cholangioscopy v2.0 detection.
To validate the models, four metrics were obtained from the model validation process of both CNN models: mean average precision (mAP), intersection over union (IoU), F1-score, and total loss.
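Two of the validation metrics reported for these models, the intersection over union (IoU) between a predicted and an expert-drawn bounding box and the F1-score, can be illustrated with a minimal sketch. The box format `(x_min, y_min, x_max, y_max)` and the function names below are assumptions made for illustration; they are not taken from the AIWorks software or from YOLOv4.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical expert box and model box, for illustration only.
expert_box = (0, 0, 2, 2)
model_box = (1, 1, 3, 3)
print(f"IoU: {iou(expert_box, model_box):.3f}")
print(f"F1:  {f1_score(0.5, 1.0):.3f}")
```

An IoU of 1 means the predicted box coincides with the expert marking, and 0 means no overlap; detection frameworks typically count a prediction as correct only when its IoU exceeds a chosen threshold.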
AIWorks-Cholangioscopy v1.0 development and model validation
A total of 81080 frames were retrieved from twenty-three patients and distributed into a training and a testing dataset, which were used to develop AIWorks-Cholangioscopy v1.0. This model achieved a mAP of 0.29, an IoU of 32.2, an F1-score of 0.280, and a total loss of 0.1034 (Table 2).
Table 2. Model validation metrics of both AIWorks-Cholangioscopy models.
AIWorks-Cholangioscopy v1.0 clinical validation
During the clinical validation, an additional 25 consecutive naïve patients, not previously included in development and training, were included. AIWorks-Cholangioscopy v1.0 accurately detected tumor vessels in 10/10 histology-confirmed cholangiocarcinoma cases and 5/5 real-time endoscopic procedures. Additionally, this model was tested with ten patient videos in which cholangioscopy was performed to confirm stone removal following laser lithotripsy; 2/10 videos were incorrectly classified as positive for neoplasia by AIWorks-Cholangioscopy v1.0, most likely due to the detection of a stone-related inflammatory pattern. In the per-patient analysis, AIWorks-Cholangioscopy v1.0 reached a 92.0% observed agreement, with a sensitivity, specificity, PPV, and NPV of 100.0%, 80.0%, 88.0%, and 100.0%, respectively. In the per-frame analysis, AIWorks-Cholangioscopy v1.0 reached a 97.0% observed agreement, with a sensitivity, specificity, PPV, and NPV of 98.0%, 95.0%, 98.0%, and 94.0%, respectively (Table 3).
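The per-patient figures follow from a 2×2 table. As a minimal sketch, assuming the counts implied by the text (15 of 15 neoplastic cases detected, and 2 of 10 stone-removal videos flagged as neoplastic), the measures can be recomputed as shown here. The function and variable names are illustrative only and are not part of the AIWorks software.

```python
def diagnostic_accuracy(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV, and observed agreement from a 2x2 table."""

    def ratio(num, den):
        # Guard against empty rows or columns of the 2x2 table.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "observed_agreement": ratio(tp + tn, tp + fp + tn + fn),
    }


# Counts inferred from the text (an assumption, not a table from the study):
# 10 + 5 neoplastic cases all detected; 2 of 10 stone cases flagged as neoplastic.
v1_per_patient = diagnostic_accuracy(tp=15, fp=2, tn=8, fn=0)
for name, value in v1_per_patient.items():
    print(f"{name}: {value:.1%}")
```

With these counts, sensitivity and NPV are 100%, specificity is 80%, observed agreement is 92%, and PPV is 15/17, or about 88%, in line with the reported values.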
Table 3. Clinical validation and diagnostic accuracy of both AIWorks-Cholangioscopy versions.
AIWorks-Cholangioscopy v2.0 development and validation
After AIWorks-Cholangioscopy v1.0 was clinically validated, the model was improved. An additional 116 patients with a definitive neoplastic diagnosis were enrolled, and their corresponding videos were uploaded to the AIWorks Cloud for training and model validation. A total of 198941 frames were retrieved from this new group of patients, who were not previously included in the development and validation of the first version of the model, and were used to develop AIWorks-Cholangioscopy v2.0 (159153 frames for training and 39788 frames for validation). AIWorks-Cholangioscopy v2.0 had a reading rate of 30 to 60 frames per second, with a 5-second validation data reading. The developed model achieved a mAP of 0.88, a total loss of 0.0975, an F1-score of 0.738, and an 83.24% IoU average. The model validation metrics and diagnostic accuracy are summarized in Table 2 and Table 3.
Stage II was an observational, analytic, nested case-control, multicenter diagnostic trial that aimed to clinically validate AIWorks-Cholangioscopy v2.0 on prerecorded DSOC videos of treatment-naïve patients from different centers between October 2020 and December 2021. International centers participated as part of the expert endoscopist panel.
Expert and nonexpert DSOC video assessment:
Four expert endoscopists (>150 DSOC procedures per year), one from each participating center, uploaded DSOC videos prospectively from their respective endoscopy units to the AIWorks Cloud. Within the cloud system, the same four experts and four additional nonexpert general practitioners, blinded to clinical records, classified the uploaded DSOC videos as neoplastic or nonneoplastic based on the criteria of the CRM and Mendoza classifications in a disaggregated manner (Table 1). To prevent bias, none of the experts assessed patient videos from their respective centers.
Table 1. Neoplastic lesion criteria based on the Carlos Robles-Medranda (CRM) and Mendoza classifications and disaggregated neoplastic criteria.
Additionally, AIWorks-Cholangioscopy v2.0 was applied to the uploaded DSOC videos and classified them as neoplastic or nonneoplastic based on the CRM and Mendoza neoplasia criteria. Both groups of physicians completed an online dataset and marked the presence or absence of macroscopic features, and the number of videos observed by the participants depended on the number of cases they provided to the study. The assessment was split into four sessions by dividing the video set into two equal sets.
The diagnostic accuracy of both CNN models, defined as the sensitivity, specificity, positive and negative predictive values (PPV and NPV), and observed agreement, was calculated on both a per-patient and a per-frame basis. The frame-based diagnostic accuracy considered frames of interest in the patients’ prerecorded videos. We calculated the diagnostic accuracy and the area under the receiver operating characteristic (ROC) curve of the experts, the nonexperts, and the patient-based AIWorks-Cholangioscopy v2.0 model. The diagnostic accuracy of the experts and nonexperts was calculated separately for the CRM and Mendoza classifications. AIWorks-Cholangioscopy v2.0 was compared with the diagnostic accuracies obtained through the CRM or Mendoza classifications using DeLong’s test for two ROC curves from dependent samples. Histological findings and twelve-month follow-up results were considered the gold standard.
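To make the ROC-based comparison concrete, the following minimal sketch computes the area under the ROC curve (AUC) as the probability that a neoplastic case receives a higher rating than a nonneoplastic one, with ties counted as one half. It is not an implementation of DeLong’s test, which additionally estimates the variance of the difference between two correlated AUCs. The labels and ratings are made-up illustrative values, and the function name is hypothetical.

```python
def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    labels: 1 for neoplastic, 0 for nonneoplastic (gold standard).
    scores: reader or model output; binary calls (0/1) are allowed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up example: 3 neoplastic and 4 nonneoplastic cases with binary calls.
gold = [1, 1, 1, 0, 0, 0, 0]
calls = [1, 1, 0, 1, 0, 0, 0]
print(f"AUC: {auc(gold, calls):.3f}")
```

For a binary neoplastic/nonneoplastic call, this AUC equals the mean of sensitivity and specificity, which is why a single reader's classification can be placed on the same scale as a model's output.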
For stage II, a total of 170 patients, not previously included in stage I, from the participating centers were included and equally distributed into neoplastic and nonneoplastic groups. The median patient age was 62.5 years (57.0–68.8), and 46.5% of the patients were female.
After the expert and nonexpert assessments, we evaluated the overall diagnostic accuracies with both DSOC classification systems and compared the results to those of AIWorks-Cholangioscopy v2.0 and follow-up. The CNN model obtained higher pooled diagnostic accuracy than both participant groups (Table 4). Additionally, the CNN model obtained higher diagnostic accuracy than 25% of the participant experts (Table 5) and 50% of the participant nonexperts (Table 6).
Table 4. Pooled analysis of the diagnostic accuracy of the AI model and of expert and nonexpert endoscopists.
Table 5. Comparison between the expert and the AIWorks-Cholangioscopy v2.0 model showing statistical significance.
Table 6. Comparison between the nonexperts and the AIWorks-Cholangioscopy v2.0 model showing statistical significance.
To date, despite the numerous advantages of DSOC, there is an ongoing discrepancy among operators’ visual impressions when using current classifications for indeterminate biliary lesions. To overcome this limitation, the application of new technologies to aid in image interpretation has been proposed.
In the present study, we developed a new DSOC-based CNN for recognizing neoplasia in indeterminate biliary lesions in previously recorded videos and real-time live DSOC procedures and compared the model with DSOC experts and nonexperts using the CRM and Mendoza classifications. We found that the model achieved high diagnostic accuracy for detecting neoplastic lesions when analyzing frames and real-time endoscopic procedures, with a significantly better performance than the nonexpert group. Additionally, this was the first international multicenter study to develop and validate a CNN model applicable to previously recorded videos and live procedures to detect neoplastic lesions in naïve patients.
Because AI assistance should be applied in real time (rather than after the procedures) to aid in diagnosis and image interpretation, we consider that the diagnostic accuracy and clinical validation of any CNN model should be based on cases rather than frames. Thus, we obtained the diagnostic accuracy of AIWorks-Cholangioscopy v2.0 (CNN2) in detecting neoplastic lesions and compared the results to the final diagnoses based on histological findings and twelve-month follow-up data. CNN2 achieved a 90.5% sensitivity, 68.2% specificity, 74.0% PPV, and 87.8% NPV, with an observed agreement of 80%. Hence, using frames for diagnostic accuracy cannot match the true diagnostic value of CNN models in clinical practice, and using cases instead of frames would provide a more accurate clinical validation.
Nonetheless, the key advantage of AIWorks-Cholangioscopy v2.0 over other DSOC-based CNN models is that it can be applied during real-time DSOC procedures, leading to more conclusive diagnostic results. This advantage could eliminate the need for repeated invasive procedures or possible delays in curative surgery. Furthermore, the accurate diagnosis of neoplastic lesions in patients with biliary disorders and prompt therapeutic responses may improve overall survival and/or decrease differences between visual examinations and histological results by improving targeted biopsy sampling. Additionally, this model may be able to shorten the DSOC learning curve, as it may help increase trainees’ confidence in their diagnostic visual impression, thus reducing the missed lesion rate.
The potential clinical benefits of using the AI system during DSOC include:
Moreover, the AIWorks-Cholangioscopy software may lead to increased DSOC availability, which has been proven to facilitate clinical care for such patients. In conclusion, the AIWorks-Cholangioscopy software can accurately recognize and classify biliary lesions as neoplastic in previously recorded videos and real-time DSOC procedures with naïve cases. Furthermore, our proposed CNN model effectively outperformed experts and nonexperts.