Prediction of protein–protein interaction based on interaction-specific learning and hierarchical information

Overview of HI-PPI

Although GNNs and their variants have been extensively explored for PPI prediction, they often lack mechanisms to effectively model the natural hierarchical organization and interaction properties inherent to biological systems. To address these limitations, we propose HI-PPI, an interaction-specific and hierarchy-specific framework. HI-PPI is designed to integrate two critical aspects: (i) modeling the hierarchical relationships between proteins in hyperbolic space and (ii) capturing pairwise information for the PPIs to be predicted by incorporating interaction networks. This dual-specific framework enables HI-PPI to achieve efficient and accurate representations of PPIs.

Fig. 1

A schematic diagram of HI-PPI. a Preprocessing of protein data. A heterogeneous GNN encoder is adopted to extract latent representations from the contact map of the protein structure. b A PPI graph is constructed from the PPI network and the extracted features. The node representation of each protein is updated iteratively by hyperbolic graph convolutional layers. Subsequently, the importance of pairwise information is controlled by a gating mechanism in the interaction-specific network, and the resulting PPI embedding is passed to a classifier for prediction

In the feature extraction stage, the structure and sequence data are processed independently. For protein structure, a contact map is constructed based on the physical coordinates of the residues. Encoded structural features are derived using a pre-trained heterogeneous graph encoder and a masked codebook [18]. For protein sequence data, representations are obtained based on physicochemical properties. The feature vectors from protein structure and sequence are concatenated to form the initial representation of each protein. As shown in Fig. 1, a hyperbolic GCN layer is employed to iteratively update the embedding of each protein (node) by aggregating neighborhood information in the PPI network. To effectively capture hierarchical information, we apply the classical GCN layer within hyperbolic space, in which the level of hierarchy is represented by the distance from the origin. Furthermore, a task-specific block is employed for interaction prediction. The hyperbolic representations of proteins are propagated along pairwise interactions; the Hadamard product of protein embeddings is filtered through a gating mechanism, which dynamically controls the flow of cross-interaction information.
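The two core operations above can be illustrated in a few lines. The following NumPy sketch shows the exponential/logarithmic maps at the origin of a Poincaré ball (the standard way to move between Euclidean tangent vectors and hyperbolic points) and a gated Hadamard product; the curvature parameter `c`, the sigmoid gate, and all function names are illustrative choices of ours, not the paper's actual implementation:

```python
import numpy as np

def exp_map_zero(v, c=1.0):
    """Exponential map at the origin of a Poincare ball with curvature -c:
    lifts a Euclidean (tangent) vector v onto the ball (norm < 1/sqrt(c))."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

def log_map_zero(x, c=1.0):
    """Logarithmic map at the origin: projects a ball point x back to the
    tangent space, inverting exp_map_zero."""
    norm = np.linalg.norm(x)
    if norm == 0:
        return x
    return np.arctanh(np.sqrt(c) * norm) * x / (np.sqrt(c) * norm)

def gated_interaction(h_u, h_v):
    """Gated Hadamard product of two protein embeddings. The sigmoid gate
    (an illustrative choice) scales each coordinate of the element-wise
    product into [0, 1], controlling the flow of pairwise information."""
    gate = 1.0 / (1.0 + np.exp(-(h_u + h_v)))  # sigmoid gate in (0, 1)
    return gate * (h_u * h_v)                  # filtered element-wise product
```

A hyperbolic GCN layer typically applies `log_map_zero`, aggregates neighbors in the tangent space, then maps back with `exp_map_zero`, so that embeddings stay on the ball while reusing Euclidean message passing.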

Benchmark evaluation of HI-PPI

We train and evaluate HI-PPI on SHS27K and SHS148K [25, 26], two classical benchmark datasets derived from the STRING database [27]. Both datasets are Homo sapiens subsets of STRING: SHS27K contains 1690 proteins and 12,517 PPIs, and SHS148K contains 5189 proteins and 44,488 PPIs. The training and test sets are constructed using the Breadth-First Search (BFS) and Depth-First Search (DFS) strategies proposed by Lv et al. [14]. For each dataset, 20% of the PPIs are selected as the test set based on the above strategies, while the remaining PPIs are used as the training set.
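The BFS split can be sketched as follows: starting from a random protein, edges are collected in breadth-first order until roughly 20% of the PPIs are reserved for testing. This is a simplified illustration of the idea, not the exact procedure of Lv et al. [14]:

```python
import random
from collections import deque, defaultdict

def bfs_test_split(edges, test_ratio=0.2, seed=0):
    """Select ~test_ratio of PPI edges for the test set by breadth-first
    search from a random start protein; the rest form the training set.
    `edges` is a list of (protein_a, protein_b) pairs."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for i, (a, b) in enumerate(edges):
        adj[a].append((b, i))
        adj[b].append((a, i))
    target = int(len(edges) * test_ratio)
    start = rng.choice(list(adj))
    visited, test_idx = {start}, set()
    queue = deque([start])
    while queue and len(test_idx) < target:
        node = queue.popleft()
        for nbr, eid in adj[node]:
            if len(test_idx) >= target:
                break
            test_idx.add(eid)          # reserve this interaction for testing
            if nbr not in visited:
                visited.add(nbr)
                queue.append(nbr)
    train = [e for i, e in enumerate(edges) if i not in test_idx]
    test = [e for i, e in enumerate(edges) if i in test_idx]
    return train, test
```

A DFS split would follow the same pattern with a stack instead of a queue, producing sparser, chain-like test neighborhoods.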

To validate the performance of our method, we compare HI-PPI with six state-of-the-art PPI prediction methods from four perspectives, including (1) the overall performance with multiple evaluation metrics, (2) the generalization ability on different PPI types, (3) the robustness of HI-PPI against edge perturbation, and (4) ablation study with commonly used deep learning models.

The benchmark methods include PIPR [25], LDMGNN [28], AFTGAN [16], BaPPI [29], HIGH-PPI [17], and MAPE-PPI [18]. The performance results of all methods were obtained by running on the same dataset split. We conducted each experiment five times, and the final results are reported to two decimal places.

Table 1 Performance comparison on SHS27K and SHS148K with different partition schemes, data are presented as mean ± std

HI-PPI shows the best performance, generalization and robustness

As shown in Table 1, HI-PPI achieves superior performance across all evaluation metrics. Specifically, in terms of Micro-F1, HI-PPI outperforms BaPPI by an average of 2.10% on the SHS27K dataset and exceeds MAPE-PPI by an average of 3.06% on the SHS148K dataset. The improvements on SHS148K are larger than on SHS27K, which could result from the percentage of unseen proteins in the dataset. These results demonstrate the effectiveness of the hyperbolic operation and interaction-specific learning framework of HI-PPI. Overall, HI-PPI achieves the best performance in 15 out of the 16 evaluation schemes, highlighting its consistent superiority. More precisely, in the DFS scheme of SHS27K, our method achieves 0.7746 in Micro-F1, 0.8235 in AUPR, 0.8952 in AUC, and 0.8328 in accuracy. BaPPI ranks second on SHS27K, while MAPE-PPI achieves the second-best results on SHS148K. To evaluate the significance of the performance improvements achieved by HI-PPI, we conducted a two-sample t-test comparing HI-PPI to the second-best method, MAPE-PPI. The P values obtained for the SHS27K (BFS), SHS27K (DFS), SHS148K (BFS), and SHS148K (DFS) settings were 0.0023, 0.0001, 0.0003, and 0.0006, respectively. All P values fall below the 0.05 threshold, confirming that the improvements achieved by HI-PPI over MAPE-PPI are statistically significant.
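The significance test above can be reproduced in a few lines with SciPy; the scores below are hypothetical five-run Micro-F1 values for illustration, not the paper's actual measurements:

```python
from scipy import stats

def significance_vs_baseline(ours, baseline, alpha=0.05):
    """Two-sample t-test on repeated-run scores (e.g., five Micro-F1
    values per method); returns the p-value and whether the gap is
    significant at level alpha."""
    t_stat, p_value = stats.ttest_ind(ours, baseline)
    return p_value, p_value < alpha

# Hypothetical scores from five independent runs of each method:
p, sig = significance_vs_baseline(
    [0.774, 0.776, 0.772, 0.775, 0.773],   # "ours"
    [0.752, 0.749, 0.754, 0.751, 0.750],   # "baseline"
)
```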

We also observe that the structure-based methods (HI-PPI, MAPE-PPI, and HIGH-PPI) achieve better performance than the other methods that rely solely on sequence data. This can be attributed to the fact that a protein’s structure directly determines its function, and the spatial biological information provided by protein structures contributes to PPI prediction. PIPR shows relatively poor performance due to its inability to effectively model global information within the PPI network.

Fig. 2

Precision-recall curves of PPI prediction on SHS27K, showing the performance of HI-PPI compared to MAPE-PPI, HIGH-PPI, BaPPI, AFTGAN, LDMGNN, and PIPR across five independent replicates. a Under the BFS partitioning. b Under the DFS partitioning. The shaded regions indicate the range between the highest and lowest results

Fig. 3

Robustness evaluation of HI-PPI against random perturbations with different ratios, performed across 12 independent replicates

We conduct an in-depth analysis of the BFS and DFS schemes of SHS27K. Compared to SHS148K, SHS27K exhibits a lower percentage of known proteins (proteins present in both the training and test sets), establishing it as a practical benchmark dataset for evaluating the trade-off between precision and recall. The precision-recall (PR) curves for each method are presented in Fig. 2. HI-PPI achieves the highest area under the PR curve and demonstrates superior performance at the majority of thresholds. Additionally, as indicated by the shaded regions, HI-PPI exhibits significantly greater stability compared to other methods. Under the BFS scheme, HI-PPI achieves the best performance when precision ranges from 0.4 to 1, whereas under the DFS scheme, it achieves the best performance when precision ranges from 0.57 to 1. In terms of secondary performance, HIGH-PPI ranks second under the BFS scheme, while BaPPI achieves the second-best results under the DFS scheme. It is worth noting that although all methods perform better under the DFS scheme, the variance across thresholds and fluctuations between thresholds increase significantly. This can be attributed to the sparsely distributed proteins selected by the DFS strategy, which introduces bias into both the training and test datasets.
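The area under the PR curve can be computed from ranked predictions as average precision, i.e., precision averaged over the ranks at which true interactions are retrieved. A minimal NumPy sketch (an illustrative metric implementation, not the paper's evaluation code):

```python
import numpy as np

def aupr(y_true, y_score):
    """Area under the precision-recall curve, computed as average
    precision: precision@k averaged over every rank k at which a true
    interaction (label 1) is retrieved."""
    order = np.argsort(y_score)[::-1]          # rank pairs by score, descending
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)                          # true positives at each rank
    precision = tp / np.arange(1, len(y) + 1)  # precision@k for k = 1..n
    return float(np.sum(precision * y) / y.sum())
```

A perfect ranking (all positives scored above all negatives) yields an AUPR of 1.0; a random ranking tends toward the positive-class prevalence.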

Robustness is another important evaluation criterion for deep learning models. Here, we analyze the model tolerance against data perturbation. To simulate unknown or undiscovered interactions, we randomly remove existing interactions in SHS27K at different percentages and apply 5-fold cross-validation. The Micro-F1 at each ratio is displayed in the boxplot of Fig. 3. HI-PPI maintains an average Micro-F1 score of 0.82 at a 20% perturbation ratio. As the ratio increases from 0.2 to 0.8, HI-PPI maintains relatively stable performance. The Micro-F1 gap between the original dataset and the 0.8 perturbation ratio is around 0.25, indicating the strong robustness of HI-PPI. The variance of Micro-F1 also increases with the perturbation ratio.
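The perturbation itself is simple to state precisely: drop a fixed fraction of known edges uniformly at random. A minimal sketch (function name and seeding are our own choices):

```python
import random

def perturb_edges(edges, ratio, seed=0):
    """Randomly drop a fraction `ratio` of known PPI edges to simulate
    unknown or undiscovered interactions; returns the surviving edges."""
    rng = random.Random(seed)
    n_drop = int(len(edges) * ratio)
    dropped = set(rng.sample(range(len(edges)), n_drop))
    return [e for i, e in enumerate(edges) if i not in dropped]
```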

Fig. 4

The performance of four advanced PPI prediction methods on the PPI types of SHS27K

HI-PPI addresses the imbalanced distribution of PPI types

In the STRING database, interactions between proteins can involve one or more of the following types: reaction, binding, post-translational modification (ptmod), activation, inhibition, catalysis, and expression. The distribution of PPI types varies across the dataset. For instance, these interaction types account for 34.97%, 33.73%, 1.85%, 4.86%, 3.09%, 20.91%, and 0.60% of the total interactions in SHS27K, respectively. The bias introduced by the imbalance in PPI types affects the predictive performance of deep learning-based methods, making the accurate prediction of the minority types a critical metric for assessing generalization ability.
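Since one interaction may carry several labels, the percentages above are over the total number of type annotations rather than the number of protein pairs. A small sketch of this multi-label counting (illustrative, using the seven STRING types):

```python
from collections import Counter

TYPES = ["reaction", "binding", "ptmod", "activation",
         "inhibition", "catalysis", "expression"]

def type_distribution(interactions):
    """Percentage of each PPI type. `interactions` is a list of label
    lists, one per protein pair; one pair may carry several types, so
    percentages are over the total number of type annotations."""
    counts = Counter(t for labels in interactions for t in labels)
    total = sum(counts.values())
    return {t: 100.0 * counts[t] / total for t in TYPES}
```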

We evaluated the Micro-F1 for each interaction type in the SHS27K dataset, comparing the performance of HI-PPI with advanced methods from previous studies. As shown in Fig. 4, our proposed method demonstrates superior performance in predicting reaction, binding, activation, inhibition, and expression interaction types. Notably, the improvements achieved by HI-PPI are more significant for minority interaction types. The gated interaction-specific mechanism enables the identification of diverse interactions between different regions of the same protein pair, resulting in stable predictions for minority PPI types. For the rarest interaction type, HI-PPI outperformed competing methods with improvements ranging from 20% to 100%, highlighting its strong generalization capability in addressing the imbalance of interaction types in PPI datasets. Among the baseline methods, MAPE-PPI achieved the best overall performance and demonstrated stability across most interaction types. In contrast, HIGH-PPI excelled in predicting majority types but was less effective for minority types.
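Micro-F1 over a multi-label prediction matrix, and its per-type restriction, can be computed as follows; this is an illustrative metric sketch (column order assumed to match the seven STRING types), not the paper's evaluation code:

```python
import numpy as np

def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over a binary label matrix
    (rows = protein pairs, columns = PPI types)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 2 * tp / (2 * tp + fp + fn)

def per_type_f1(y_true, y_pred):
    """F1 computed separately for each PPI type (each column),
    exposing performance on minority types."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return [micro_f1(y_true[:, j], y_pred[:, j]) for j in range(y_true.shape[1])]
```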

We also conducted a case study on post-translational modification (PTM) in the test set of SHS27K. HI-PPI successfully predicted PTMs between 9 pairs of proteins that other methods failed to identify. In the prediction results, proteins ENSP00000215832 and ENSP00000250971 occur frequently, revealing their central role in the regulation of cellular processes. Furthermore, as PTM-mediated interactions are often dysregulated in diseases, the related proteins can contribute to the identification of novel drug targets or mechanisms of action. The details of the protein pairs are available in Table S1 in the supplementary file.

Table 2 Ablation study on components of HI-PPI

Interaction-specific learning and hyperbolic operation improve the performance

We investigate the effectiveness of interaction-specific learning and hyperbolic operations by conducting an ablation study. The omission of the hyperbolic operation, referred to as w/o hyperbolic, involves replacing the hyperbolic GCN with a standard Graph Isomorphism Network (GIN). Similarly, the exclusion of element-wise product operations and the self-attention mechanism is denoted as w/o inter A and w/o inter B, respectively.

The results of the ablation study are presented in Table 2, showing a substantial performance decline without the hyperbolic GCN and interaction-specific learning components. These results demonstrate the notable performance improvements achieved by our proposed method, validating its effectiveness in addressing the limitations of modeling the PPI network and pairwise interactions. The removal of the hyperbolic GCN leads to a more significant decline in performance than removing the interaction-specific operation. This result highlights the critical importance of hierarchical structure learning within the PPI graph for accurately predicting unknown protein–protein interactions. Among the four evaluation metrics, the decline is most pronounced in the area under the precision-recall curve (AUPR), demonstrating that HI-PPI consistently achieves improvements across varying thresholds. Additionally, the performance of HI-PPI is significantly more stable than the ablated variants, indicating that the integration of hyperbolic operations and interaction-specific learning contributes to both improved and more stable performance.

HI-PPI efficiently identifies the hierarchical level of proteins in PPI network

Hierarchical levels are crucial for understanding the functional organization of PPI networks. In hyperbolic space, the distance between a vector and the origin inherently represents its position in the hierarchy. Consequently, the node embeddings computed by HI-PPI naturally capture the hierarchical relationships present in the PPI network. Compared to commonly used identifiers for hub proteins [30], such as node degree, hyperbolic distance provides a direct representation of hierarchical levels. Thus, utilizing hyperbolic distance helps to distinguish hub proteins from “bridge” proteins—those that primarily function to connect different clusters within the PPI network without exerting control over multiple cellular processes.

Table 3 The node degree, hyperbolic distance, and closeness centrality of proteins with node degrees ranging from 30 to 40

Here, we select proteins with node degrees ranging from 30 to 40 in the SHS27K dataset. For each protein, we compute its hyperbolic distance, which is the distance between the origin and the corresponding node embedding computed by HI-PPI. The hierarchical level of each protein is assessed using closeness centrality, defined as the reciprocal of the sum of shortest-path distances from a node to all other nodes—a commonly used indicator of hierarchical level. Proteins with higher closeness centrality are typically more central within the hierarchical structure, whereas a shorter hyperbolic distance to the origin corresponds to a higher hierarchical level. As shown in Table 3, closeness centrality decreases with increasing hyperbolic distance. The correlation coefficient between hyperbolic distance and closeness centrality is −0.6504, which is substantially stronger than the correlation coefficient of 0.3005 between node degree and closeness centrality. These experimental results confirm a significant relationship between hierarchical level and hyperbolic distance to the origin, demonstrating the effectiveness of hyperbolic embedding in capturing hierarchical information within protein–protein interaction networks.
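Both quantities in this analysis are short computations. The sketch below shows the Poincaré-ball distance to the origin and a plain Pearson correlation; the curvature `c` and function names are illustrative assumptions:

```python
import numpy as np

def poincare_dist_to_origin(x, c=1.0):
    """Hyperbolic distance from a Poincare-ball embedding x to the origin;
    smaller values correspond to higher hierarchical levels."""
    norm = np.linalg.norm(x)
    return (2.0 / np.sqrt(c)) * np.arctanh(np.sqrt(c) * norm)

def pearson(a, b):
    """Pearson correlation, e.g. between the hyperbolic distances and the
    closeness-centrality values of the same set of proteins."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    a, b = a - a.mean(), b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))
```

Closeness centrality itself can be obtained from any graph library (e.g., `networkx.closeness_centrality`) on the PPI network.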

Fig. 5

The visualization of 195 PTM-related proteins in 2D hyperbolic space

Furthermore, to explore the hierarchical structure captured in hyperbolic space, we provide a visualization of a subgraph from the SHS27K dataset, comprising 195 proteins associated with PTM interactions. In hyperbolic space, the origin can serve as a root node, and the distance to the origin provides a natural way to encode hierarchical relationships without requiring additional constraints. As illustrated in Fig. 5, the 195 nodes are divided into 12 groups, each located at varying distances from the origin. Notably, the proteins in group 5 (ENSP00000005340, ENSP00000078429, ENSP00000228307, ENSP00000206249, ENSP00000261991, ENSP00000251630, ENSP00000259089, ENSP00000256078, ENSP00000220507, ENSP00000256953, ENSP00000244741, ENSP00000254654, ENSP00000179259) are positioned closer to the origin, indicating their higher-level status and suggesting their role as core proteins for PTM interactions. Detailed information about each group is provided in Additional file 1: Table 2.

HI-PPI maintains relatively stable performance on proteins with a large number of residues

The protein structure data used in our experiments are provided by AlphaFold2 [31], one of the most advanced deep learning models for protein structure prediction. While AlphaFold2 achieves remarkable accuracy, its performance is limited for long protein sequences, particularly those exceeding 1000 residues. As a result, the quality of the predicted structures may vary. To assess the impact of these variations on HI-PPI, we conducted a case study using the SHS27K test set. The test set comprises 1524 protein pairs, of which 221 involve at least one protein with more than 1000 residues (referred to as group A), while the remaining 1303 pairs involve only proteins with 1000 or fewer residues (group B). As shown in Table 4, the Micro-F1 and accuracy of HI-PPI on group A are 0.0213 and 0.0142 lower than those on group B, respectively. This marginal decline suggests that the presence of proteins with more than 1000 residues does not significantly impact HI-PPI’s performance. Therefore, HI-PPI maintains stable predictive performance despite variations in the quality of input structural data.
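The grouping used in this case study reduces to a single rule: a pair goes to group A if its longer protein exceeds the residue cutoff. A minimal sketch (function and variable names are illustrative):

```python
def split_by_length(pairs, lengths, cutoff=1000):
    """Partition protein pairs into group A (at least one protein longer
    than `cutoff` residues) and group B (both at or below the cutoff).
    `lengths` maps a protein ID to its residue count."""
    group_a, group_b = [], []
    for a, b in pairs:
        if max(lengths[a], lengths[b]) > cutoff:
            group_a.append((a, b))
        else:
            group_b.append((a, b))
    return group_a, group_b
```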

Table 4 Performance of HI-PPI on protein pairs involving proteins of different sizes
