We developed a fully automatic AI model capable of providing binary iFR classification derived from CAG images alone [37]. Despite the modest accuracy, the high negative predictive value (NPV of 90% in the all-arteries analysis) is clinically meaningful, as it offers the potential to terminate a diagnostic procedure without engaging in actual physiology measurements. This was especially relevant for the Cx and the RCA, given their NPVs of 96% and 97%, respectively. In this paper, we aimed to determine whether our AI method was superior, inferior or comparable to human performance under similar conditions.
When comparing the AI with the interventional cardiologists' performance, there was considerable inter-operator and inter-artery heterogeneity, which limits clear-cut general conclusions. Nevertheless, the AI's performance was generally mildly superior to that of the operators, with mostly higher values of AUC and accuracy.
Interestingly, the AI model largely mimicked the operators' strengths and weaknesses, in the sense that both had modest accuracy and AUC, high NPV, low PPV, and low to modest sensitivity and specificity. This may partly result from a lesion dataset comprised mostly of lesions with an iFR > 0.89, as is typical of such populations [37], which significantly limits both human and AI capability to identify and correctly classify positive lesions. It may also reflect intrinsic limitations of classifying iFR from single CAG frames alone. Notwithstanding, it is worth noting that the AI had superior results for both PPV and sensitivity in almost all cases, suggesting superior performance when dealing with positive (i.e. iFR ≤ 0.89) lesions. This is notable considering the limited number of iFR ≤ 0.89 cases available for training, as well as the far greater exposure human operators have accumulated in clinical practice.
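The interplay of metrics described above (high NPV despite modest sensitivity in a negative-dominated dataset) follows directly from the standard 2x2 confusion-matrix definitions. The sketch below illustrates the arithmetic with hypothetical counts chosen only for illustration, where "positive" denotes iFR ≤ 0.89; none of the numbers correspond to our actual results.

```python
# Illustrative computation of the reported metrics from a 2x2 confusion
# matrix. "Positive" = iFR <= 0.89. The counts used below are hypothetical.
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity, PPV and NPV."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # fraction of positive lesions detected
    specificity = tn / (tn + fp)   # fraction of negative lesions detected
    ppv = tp / (tp + fp)           # reliability of a "positive" call
    npv = tn / (tn + fn)           # reliability of a "negative" call
    return accuracy, sensitivity, specificity, ppv, npv

# Example: with negatives dominating the dataset, NPV stays high
# (85/95 ~ 0.89) even though sensitivity is only 0.5.
acc, sens, spec, ppv, npv = binary_metrics(tp=10, fp=15, tn=85, fn=10)
```

This makes explicit why a dataset dominated by iFR > 0.89 lesions tends to produce high NPV together with low PPV for both humans and models.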
There were considerable differences when individual vessels were considered.
For the LAD, the general considerations above broadly apply, although attenuated: the NPV was not as high and accuracy and AUC were lower, but PPV and sensitivity were not as low. Overall, the AI slightly outperformed the operators.
For the RCA and Cx, both the AI and the operators handled iFR-negative (> 0.89) lesions well, classifying them correctly very frequently. The AI was more often correct when classifying a lesion as negative (i.e. higher NPV), while human operators were better at identifying negative lesions (i.e. higher specificity), especially in the circumflex artery. In both arteries, both the AI and the human operators frequently misclassified lesions as positive, but the AI was more sensitive to positive lesions than the humans, once more suggesting that it may have greater potential for dealing with such lesions, despite the limited number of cases available for training.
Segmentation had a mixed effect on the operators' classification of lesions. In all arteries, particularly the RCA and Cx, it led to a much higher rate of negative classifications and a lower tendency to classify lesions as positive. As a result, specificity increased at the expense of sensitivity. And while PPV slightly improved in the LAD (where the majority of positive cases were concentrated), the opposite occurred in the remaining arteries, where PPV was very low, sometimes reaching 0. Lastly, NPV was also slightly reduced by segmentation. The AUC plots clearly reflect this trade-off: the increased capacity for identifying negatives is substantially offset by the tendency to overclassify lesions as negative, resulting in lower AUCs and thus lower overall discriminatory ability. These findings are likely due to the effect segmentation has in reducing the operators' lesion severity estimates, as we have previously demonstrated [37]. While that may be useful in an all-comers population (such as the one we have previously published) [37], significantly reducing the tendency to classify non-severe lesions as severe, in a selected population of intermediate lesions segmentation seems to overcorrect this tendency and is largely not advantageous.
Multiple randomized trials have demonstrated the benefit of physiology-guided PCI, significantly reducing death, myocardial infarction and repeat revascularization [4, 5, 7]. More recently, a randomized trial comparing FFR-guided versus angiography-guided strategies in acute myocardial infarction with multivessel disease [38] showed that FFR-guided decision-making was superior to angiography-only-guided PCI for the treatment of non-infarct-related artery lesions, when considering a composite of time to death, MI or repeat revascularization.
Despite the established evidence, coronary physiology remains largely underused [17, 18]. Challenges associated with the technicalities of the procedure and the risk of complications from the use of guiding catheters and guide wires may explain why [17, 18]. Digitally derived indices from coronary angiography (either FFR or iFR) could bypass these limitations and have been explored in recent years, with commercial software recently made available. Most studies were based on FFR, using a ≤ 0.80 threshold. FAST-FFR [21] was a multicenter international study of 301 patients with predominantly LAD lesions (54.2%). When comparing measured FFR with estimated FFR (FFRangio), the accuracy was 92.2%, the NPV 94.8%, the PPV 89%, the sensitivity 93.5% and the specificity 91.2% [21]. The correlation between measured FFR and FFRangio was r = 0.80 (p < 0.001) [21]. Witberg et al. performed a pooled analysis of 5 cohort studies, with similar results [39]. Further studies have shown encouraging results for FFRangio [20, 40], and iFR derived from CAG has also been explored in the REVEAL iFR trial [41], with published results expected soon. These studies mostly used non-AI methods to derive physiology from CAG, combining three-dimensional image reconstruction with computational fluid dynamics. Although they demonstrated that physiology can successfully be derived from CAG images alone, they are not without caveats: significant manual input is required, more than one projection may be needed, and inter-operator heterogeneity has been reported, with performance inferior to that of a core lab [42].
Although AI has shown great potential in Cardiovascular Medicine, its use in coronary physiology is still under development. Three studies of FFR estimation from CAG using primarily AI methods have been published [33, 43, 44]. Roguin et al. conducted a pilot study using a novel automated artificial-intelligence, angiography-based FFR software capable of binary classification of FFR ≤ 0.80 [33]. In a single-center population of 31 patients with predominantly LAD lesions (80%), they reported an accuracy of 90%, an NPV of 87%, a PPV of 94%, a sensitivity of 88% and a specificity of 93% [33]. Their model is also able to derive an estimated FFR value, with an area under the curve of 0.91 and a correlation coefficient of r = 0.71 (p < 0.001) [33].
Cho et al. [43] used a very large sample of 1501 lesions from a single center (predominantly LAD, 67%) and combined the target vessel diameters with clinical characteristics (age, sex, body surface area, and target segment) to binarily classify FFR measurements at a ≤ 0.80 threshold. An overall accuracy of 82%, an NPV of 84%, a PPV of 81%, a sensitivity of 84% and a specificity of 80% were reported in the test set, with similar results in the external validation dataset of 79 patients [43]. However, their methodology is semi-automatic, since it requires significant manual annotation.
Arefinia et al. [44] designed a deep learning model for estimating the value of FFR from angiographic images in LAD stenoses of 50-70%. A total of 3625 images were extracted from 41 patients' angiographic films, and FFR was classified as negative (> 0.80) or positive (≤ 0.80). An AUC, accuracy, sensitivity, specificity and positive predictive value of 0.81, 0.81, 0.86, 0.75 and 0.82, respectively, were reported. Despite these impressive metrics, the very small sample size significantly limits the analysis, given that only a small pool of measurements was available for testing, and the potential influence of factors such as age and gender on FFR estimation was omitted. External validation is also needed.
Limitations

Given the heavy reliance on large volumes of data for AI training, the size of our dataset emerged as an important limitation. This was particularly significant for cases involving the Cx and RCA where iFR was ≤ 0.89. Nonetheless, our distribution of target vessels and positive/negative cases aligns with previously reported data [11, 15, 20, 21, 33, 40]. Consequently, a much larger dataset is required for effective training, one with a sufficient number of iFR-positive cases, particularly for the RCA and Cx.
Although 10-fold cross-validation is standard practice in machine learning, its use can be considered a limitation. An 80/20% train/test split would have led to a limited testing set, hindering our ability to accurately assess model performance, especially across different target vessels, given the dataset's inherent imbalance. To overcome this limitation, we used 10-fold cross-validation, as described.
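The stratification step underlying this choice can be sketched as follows: each fold is built so that it preserves the overall positive/negative ratio, and every lesion is used for testing exactly once. This is a minimal illustrative sketch in plain Python (the class sizes are hypothetical), not our actual training pipeline.

```python
import random

def stratified_kfold_indices(labels, k=10, seed=0):
    """Split sample indices into k folds while preserving the
    positive/negative ratio in each fold (sketch of stratified k-fold CV)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in set(labels):
        idx = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)   # deal each class out round-robin
    return folds

# Hypothetical imbalanced dataset: 30 iFR-positive vs 120 iFR-negative
# lesions. Each fold receives 3 positives and 12 negatives, and every
# lesion appears in exactly one test fold across the 10 iterations.
labels = [1] * 30 + [0] * 120
folds = stratified_kfold_indices(labels, k=10)
```

Because every sample serves as test data once, the whole dataset contributes to the performance estimate, which is the advantage over a single 80/20% split when positive cases are scarce.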
FFR has been directly compared to angiography in clinical outcome studies [4,5,6], while iFR has only been compared directly to FFR [11, 12, 15]. Extrapolating to iFR instead of FFR can therefore be pointed out as another potential limitation. However, iFR has consistently proven non-inferior to FFR and has become the preferred tool for assessing epicardial physiology in many laboratories due to its simplicity. As a result, the number of iFR measurements in our lab significantly surpasses that of FFR, offering a more extensive foundation for future training, refinement, and validation.
Another limitation lies in the model's current restriction to binary classification. During initial testing, it became evident that estimating the precise iFR value would demand a substantially larger dataset, which exceeded the scope of this initial study.
The use of a single frame for model training, rather than a 3D reconstruction based on multiple 2D projections, was also a limitation, considering the inherently 3D nature of coronary arteries and the superior performance of non-AI approaches using 3D methods. The interventional cardiologists were exposed to a single "raw" fluoroscopic frame, mimicking what the AI models actually "see". However, this approach is considerably different from daily clinical practice, which may have compromised their performance. Because operators normally have access to a full cine loop, human performance might have differed had we provided one. The comparison would then be unbalanced, however, as we would effectively be giving the humans more complete data than the AI models, which could significantly limit the interpretation of results.
Another limitation lies in the fact that this was a single-center retrospective dataset, and future external validation will be necessary. However, given that physiology results have been shown to be quite reproducible, the impact of this particular limitation is likely minimal.
The fact that unsuccessful segmentation was used as a selection criterion may limit generalizability due to selection bias. However, we have previously shown that our segmentation AI model is accurate and capable, and in this study failed segmentation resulted from either poor image quality or artifacts (such as pacemaker leads, surgical stitches, etc.). We therefore do not believe this will be a significant limitation going forward.
Therefore, this study is exploratory, serving as a proof of concept. We intend to significantly expand the training dataset through multi-institutional collaboration, which will be crucial for improving performance and enabling external validation. We also plan to enhance our models by incorporating multiple projections and 3D reconstruction, which could potentially boost performance. Ultimately, our goal is to deploy this for clinical use, as either new software or an enhancement of current non-AI methods.