Targeted anti-cancer immunotherapy can be directed against neoantigens, arising somatically in an individual, or tumor-associated antigens (TAAs) which are unmutated loci restrictively expressed in the tumor.1 In recent years, these approaches have found broad application in clinical trials and especially TAAs promise cohort-specific treatment options with different strategies.2 The approaches use different vector systems to ultimately stimulate or exploit an adaptive immune response against the targeted antigen by engaging peptide-loaded major histocompatibility complex class I (MHC-I) interactions with activated cytotoxic T lymphocytes (CTLs).3 However, one key challenge in making these therapies more efficacious lies in finding antigens that are broadly applicable and induce a durable anti-tumor response without serious off-tissue effects.4 Intricacies like immune-evasive antigen loss in the tumor and life-threatening adverse events are likely responsible for the scarcity of approved antigen-targeted immunotherapies.5 6 Hence, given the increasing interest in these therapies, there is a need for computational pipelines which include biomedical parameters such as the risk of side effects, tumor restriction and immunogenic potential into practical frameworks. We intended to address this need with our work, applying our methodology to find antigens against metastatic uveal melanoma (UM).
UM is the most frequent primary ocular malignancy in adults and a poorly treatable cancer.7 8 Approximately half of the patients develop metastases within 10 years, usually in the liver, with a median post-metastasis survival time of less than a year.9 Despite recent advances with tebentafusp (tradename Kimmtrak) in HLA-A*02:01-positive patients, treatment options for metastasized UM show limited efficacy.8
Our goal in this study was to develop a data-driven computational predictor for TAAs with a predicted low risk of tissue damage that addresses more of the above intricacies. In an ensemble model approach, we combined expression in tumor and tissue, tumor gene importance, and database knowledge to predict the treatment efficacy and tolerability of TAAs and their derived epitopes.
To support our in-silico model, we selected epitopes of predicted high or low efficacy for UM and experimentally compared their immunogenic potential against tumor and off-target cells with that of epitopes recommended by established computational tools. We also quantified the MHC-binding affinities of the selected epitopes and confirmed that our pipeline discriminates correctly between high-affinity and low-affinity peptides. Further, we show that pools of our high-efficacy peptides elicit an interferon gamma (IFN-γ) response in autologous CD8+ T cells in vitro. Additionally, T cells primed with these candidate peptides killed cells of the UM cell line 92.1 more efficiently than T cells primed with control peptides. We provide the annotated results as a database free of charge for non-commercial use at https://www.curatopes.com/uvealmelanoma.
Methods
Overview of the bioinformatics peptide selection workflow
The methodology developed to select TAAs with a predicted low risk of tissue damage for targeted immunotherapy integrates a data-driven computational workflow and an experimental validation procedure (figure 1A). The bioinformatics workflow integrates transcriptomics, network analysis and supervised machine learning (ML) for predicting peptide binding and immunogenicity as follows:
Transcriptomics-based prioritization of TAAs minimizing immune-related adverse events (irAE). We utilized transcriptomics and histological data to select highly tumor-restricted genes that show no residual expression in any healthy tissue for which quantitative data are available in standardized repositories. To this end, we obtained transcriptomics data from primary UM samples and healthy tissue and prioritized protein-coding cancer genes that (a) are sufficiently expressed in the majority of the inspected UM samples, (b) lack histological evidence of protein expression in normal tissues recorded in the Human Protein Atlas (HPA), and (c) display high-in-tumor, low-in-tissue expression in the transcriptomics data available in Genotype-Tissue Expression (GTEx) (RNA-seq expression in 90% of normal tissue samples lower than in 90% of UM samples).
Candidate peptide k-mer extraction and post-hoc screening. We utilized a FASTA file of the human proteome to retrieve all annotated protein sequences for the prioritized genes, and enumerated k-mer peptides of length 9–12 amino acids for each protein sequence. To further avoid cross-reactivity, we screened each candidate peptide against the complementary part of the human proteome (ie, the protein sequences of all non-prioritized genes) and excluded peptides with literal sequence matches.
Efficacy score (ES)-based candidate peptide ranking. To select the most promising candidate peptides for experimental validation, we developed a multivariate score function (named ES) that aggregates information on gene expression, cancer gene network connectivity, peptide binding affinity and immunogenicity to rank the peptides:
The ES function is a probability chain composed of five normalized subfunctions (a–e) modeling the (a) expression of the prioritized gene in the tumor utilizing constrained tumor median expression (consTME), (b) tumor indispensability of the prioritized gene based on the prominence of its position in a network of known cancer genes (Idspx) (figure 1B and online supplemental table S1), (c) HLA allele-specific affinity of the candidate peptide computed utilizing a constrained binding affinity based on NetMHCpan predictions (consIC50) (online supplemental table S2), (d) MHC binding probability as predicted by an ML model (gBP), and (e) candidate peptide immunogenicity inferred via an ML model (gAP). gBP and gAP are in-house random forest (RF) models that use the candidate peptide’s physicochemical properties as input (hydrophobicity, isoelectric point, molecular weight, stability index, polarity and sequence length). To generate the training set for both models, we selected reliable peptides from the MHCBN database V.4.0, and additional binder and non-binder peptides identified through crystallography experiments. To train the models, we performed 100 iterations of weighted subsampling from the training data and trained an RF model with 10 000 trees on each subsample. For both models, responses were discretized at the threshold of 0.5 and the respective averages of the 100 RF models’ discretized classifications were used as the probability output of gBP and gAP. Model performance for gBP and gAP was evaluated and compared with other models on independently curated validation sets from the IEDB10 (figure 1C, table 1 and online supplemental table S3). We feed the results for each candidate peptide into the function and rank the candidate peptides according to the obtained ES value.
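The vote-averaging step described for gBP and gAP can be illustrated with a minimal sketch. It assumes that each of the 100 RF models has already produced a score between 0 and 1 for a given peptide; the function name `ensemble_probability` and the choice of treating a score of exactly 0.5 as a positive vote are illustrative assumptions, and the RF training itself is not shown.

```python
def ensemble_probability(model_scores, threshold=0.5):
    """Average the binary votes of an ensemble of classifiers.

    model_scores: one score in [0, 1] per trained model (hypothetical input;
                  in the study these would come from the 100 RF models).
    threshold:    scores at or above this value count as a positive vote
                  (the handling of ties is an assumption).
    """
    if not model_scores:
        raise ValueError("at least one model score is required")
    # Discretize each model's response into a 0/1 vote.
    votes = [1 if score >= threshold else 0 for score in model_scores]
    # The fraction of positive votes serves as the ensemble probability.
    return sum(votes) / len(votes)


if __name__ == "__main__":
    # Toy example with four models instead of 100.
    print(ensemble_probability([0.9, 0.6, 0.4, 0.2]))
```

Averaging discretized votes rather than raw scores means the output reflects how many models agree on a positive call, not how strongly any single model scores the peptide.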
Figure 1. Overview of new elements in the study’s computational methodology. (A) A bioinformatics pipeline selects efficacious and tolerable tumor-associated antigen (TAA)-derived peptides. An efficacy score (purple) is derived from bioinformatics analyses of the gene (blue), peptide (green) and epitope (yellow) level. Together with additional filters, the efficacy score facilitates the selection of epitopes for experimental validation and their later inclusion in a database. GTEx, Genotype-Tissue Expression project. HPA, Human Protein Atlas. (B) The unfavorability of successful therapy evasion through antigen repression was estimated with a network-driven approach. Genes with a network neighborhood that confers a survival advantage to the tumor are less likely to be repressed and thus constitute superior antigen candidates. The identification and ranking of these genes involve a biochemical interaction network, database knowledge, and the derivation of gene-specific normalized values, the indispensability index. (C) HLA allele-agnostic predictors of peptide HLA binding and immunogenicity trained on physicochemical peptide properties show comparable performance to established tools. Left: the training and testing workflow of the machine learning predictors involved an ensemble model of random forests and the averaging of binary votes across predictions. Right: column plot of performance metrics for the two trained predictors compared with the established algorithms NetMHCpan (binding) and IEDB Tools (immunogenicity). All performances were evaluated on an independent validation set curated from IEDB.
Table 1. Validation benchmarks generated on an Immune Epitope Database (IEDB) epitope subset to compare the binding and activity predictors trained in this study with published alternatives (NetMHCpan and IEDB immunogenicity prediction, respectively). Bold values indicate the higher performance for the respective features.
A detailed description of each individual step of the bioinformatics workflow is given in online supplemental methods.
For validation, we selected the top 20 ES-ranked candidate peptides (high efficacy or HE) for HLA-A*02:01 together with 20 randomly selected candidate peptides scored ES=0, here considered negative controls (low efficacy or LE). Furthermore, to compare our results with a gold-standard approach, we selected 20 candidate peptides with an ES of zero but a standard (NetMHCpan)-predicted binding affinity that matches that of the HE peptides (alternative predictor or AP). These 60 peptides were synthesized at laboratory quality and 90% purity, and for a subset, their binding affinities to HLA-A*02:01 were estimated experimentally. Further, we split each peptide tier (HE, LE, and AP) into four pools of five peptides and performed in vitro experiments with HLA-A*02:01-positive healthy-donor peripheral blood mononuclear cells (PBMCs), in which we measured IFN-γ secretion by flow cytometry and ELISA. We also performed cytotoxicity assays by coculturing the HLA-A*02:01-positive UM cell line 92.1 with the peptide-stimulated PBMCs.
In-house UM sample preparation and RNA sequencing
In accordance with current regulatory and ethics standards within the context of clinical trial NCT01983748,11 patients with UM gave informed consent before tumors were surgically removed. Tumor samples were preserved in RNAlater (ThermoFisher Scientific, Waltham, Massachusetts, USA) and RNA was extracted with RNeasy Mini kits (Qiagen, Hilden, Germany) according to the manufacturer’s protocol. Transcriptome sequencing was performed by a commercial service provider (CeGat, Tübingen, Germany). For analysis, raw FASTQ files were quality controlled and aligned against the human reference genome version hg38 using STAR.12 Quantification was performed using StringTie13 against the Gencode comprehensive annotation version 28. For further details on the sample processing, see online supplemental figure S1.
In vitro testing of candidate peptides with healthy donor or patient with UM PBMCs
The selected peptides were synthesized at laboratory quality and 90% purity by GenScript (Leiden, The Netherlands). Leukapheresis products (four healthy donors) or fresh blood (two healthy donors, two patients with UM) were obtained based on their positive cytomegalovirus (CMV) and HLA-A*02:01 status while adhering to current regulatory and ethics standards including obtaining informed consent. PBMCs were purified by Ficoll gradient centrifugation (800 g, 20 min, 20°C, brake off). The four healthy-donor leukapheresis-derived PBMCs were subsequently cryopreserved in liquid nitrogen at a concentration of 100 million/mL in freezing medium containing 10% DMSO. After thawing, 1–2 million cells/mL were recovered for 18–24 hours in serum-free TexMACS GMP medium (Miltenyi Biotec, Bergisch-Gladbach, Germany) at 37°C. Before peptide stimulation, cells were harvested by centrifugation and counted. Batches of 20 million live PBMCs were stimulated per peptide pool or CMV positive control (human PepTivator CMV pp65, Miltenyi Biotec, Bergisch-Gladbach, Germany) at a total peptide concentration of 1 µg/mL, or left unstimulated. Peptide loading was performed in 20 mL of prewarmed serum-free medium for 2 hours at 37°C. Afterwards, cells were spun down and washed with medium to remove unbound peptides. Cells were then incubated at an initial concentration of 2 million/mL for 9 days at 37°C in Roswell Park Memorial Institute (RPMI) 1640 medium (Gibco by Life Technologies GmbH, Darmstadt, Germany) supplemented with 1% (v/v) GlutaMAX (Gibco by Life Technologies GmbH, Darmstadt, Germany), 50 IU/mL IL-2 (Aldesleukin, Novartis Pharma GmbH, Nürnberg, Germany) and 1% (v/v) human AB serum (Anprotec, Bruckberg, Germany). On day 5 of incubation, the culture volume was increased with fresh supplemented RPMI 1640 medium to a total of 2.5 times the volume on day 0.
On day 9 of stimulation, culture supernatant was probed for IFN-γ (ELISA MAX Deluxe Set, Biolegend, San Diego, USA) and stimulated PBMCs were investigated with an IFN-γ Secretion Assay (Miltenyi Biotec, Bergisch-Gladbach, Germany) and by Incucyte Live Cell Imaging (Sartorius, Göttingen, Germany) according to the manufacturers’ instructions.
The HLA-A*02:01-positive14 UM cell line 92.115—kindly provided by Klaus Griewank, University Hospital Essen—was selected as a cytotoxicity target and cultivated in UM medium containing RPMI 1640 (Gibco by Life Technologies GmbH, Darmstadt, Germany), 2 mM L-glutamine (Gibco by Life Technologies GmbH, Darmstadt, Germany), 10% fetal bovine serum (Merck, Darmstadt, Germany), and 1× Antibiotics-Antimycotics (Gibco by Life Technologies GmbH, Darmstadt, Germany) at 37°C with 5% CO2. The 92.1 cells were stained with 0.75 µM Cytolight Green (Sartorius, Göttingen, Germany) in PBS for 20 min at 37°C before the Cytotox Assay. After two washing cycles, stained 92.1 cells were seeded in a 96 well plate and incubated for 30 min at 37°C to allow for reattachment. Stimulated PBMCs were then added in an effector:target ratio of 4:1 (final volume 200 µL) and the culture medium supplemented with Annexin V Red Dye (Sartorius, Göttingen, Germany) to facilitate ongoing staining of apoptotic cells. Green and red fluorescence channels were recorded once every 60 min over a period of 45 hours. The aggregated area of red cells (µm2/image) as a measure of cell death was automatically quantified by Incucyte Base Software (Sartorius, Göttingen, Germany) and background-corrected against stained 92.1 cells cultured in the absence of PBMCs.
In vitro HLA binding validation of selected candidate peptides
Binding affinities of predicted epitopes were analyzed by a UV-mediated peptide exchange assay using in-house produced peptide*HLA-A*02:0116 (Sanquin, Amsterdam, The Netherlands), of which the heavy chain is biotinylated, as described previously.17 Briefly, peptide exchange was performed in duplicate by combining 0.53 µM conditional p*HLA complex in the presence or absence of 50 µM of the candidate peptide. The mixture was exposed for 30 min to 366 nm UV light and subsequently incubated for 30 min at 37°C. Peptide exchange efficiency was analyzed using a beta-2 microglobulin (β2M)-specific ELISA, which only detects peptide-stabilized HLA class I complexes, indicative of peptide binding as described previously.17 Briefly, exchange reactions were incubated for 1 hour at 37°C on 2 µg/mL streptavidin-coated Nunc MaxiSorp plates. Non-bound material was removed by washing. Plate binding was assessed with 0.3 µg/mL horseradish peroxidase-conjugated anti-human β2M antibody and detected with 2,2′-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid) diammonium salt substrate solution. The recorded absorbances were normalized to the absorbance of a known HLA-A*02:01 peptide ligand with a high affinity (NLVPMVATV, representing 100% relative affinity). Controls included non-exchanged conditional p*HLA complex, an HLA allele-specific non-binder (IVTDFSVIK) and UV irradiation of the conditional p*HLA complex in the absence of a rescue peptide.17
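The normalization to the reference ligand can be written out as a short calculation. The sketch below assumes that the duplicate absorbance readings are averaged before normalization and that no separate background subtraction is applied; both choices, the function name and the toy absorbance values are illustrative assumptions and are not taken from the study.

```python
from statistics import mean


def relative_affinity(sample_absorbances, reference_absorbances):
    """Relative affinity (%) of a peptide versus a high-affinity reference.

    sample_absorbances:    ELISA absorbance readings for the candidate peptide
                           (eg, duplicates).
    reference_absorbances: readings for the reference ligand, which defines
                           100% relative affinity.
    """
    reference = mean(reference_absorbances)
    if reference <= 0:
        raise ValueError("reference absorbance must be positive")
    return 100.0 * mean(sample_absorbances) / reference


if __name__ == "__main__":
    # Hypothetical duplicate readings for a candidate and the reference ligand.
    print(round(relative_affinity([0.79, 0.81], [1.58, 1.62]), 1))
```

Values above 100% are possible under this definition whenever a candidate peptide stabilizes more HLA complex than the reference ligand does.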
Statistics
Moderated pairwise t-tests of the readouts between experimental conditions were performed with limma (Ritchie et al) in R, applying multiple testing correction and using donor identity as a covariate. An alpha threshold of 0.05 was applied.
Results
In this work, we constructed and deployed a pipeline to rank the predicted therapeutic efficacy of potential antigenic epitopes for a specific tumor of interest (see figure 1A and the Methods section for a detailed description). In the following, we illustrate the use of the pipeline, using the selection of efficacious and tolerable peptides for UM therapy as a case study.
Antigen and peptide selection to minimize the risk of irAE
We retrieved from public repositories the empirical mRNA abundances of 80 primary UM biopsies18 and compared them with the GTEx data set of healthy tissues to select protein-coding genes with a high-in-tumor, low-in-tissue expression profile.19 This first step is intended to reduce the risk of severe irAEs in a later immunotherapy setting (figure 2A). We found 9556 protein-coding genes with sufficient baseline expression in UM, of which 1722 were favorable owing to a lack of histological evidence of protein expression in healthy tissue according to the HPA (figure 2B). After filtering against GTEx, we were left with 22 protein-coding genes. Next, we confirmed that these prioritized genes are stably expressed in an independent cohort of 14 primary UM samples obtained at the Department of Ophthalmology of the Uniklinikum Erlangen (figure 2C). Interestingly, melanocyte-derived antigens like MLANA, TYR and PMEL are stably expressed across our selected UM samples.20 Moreover, TMEM200C has been recently identified as a potential marker for progression in UM.21
Figure 2. Systematic expression-based prioritization of antigens addresses the risk of immune-related adverse events under therapy. (A) A multistep filtering process discards genes with unfavorable expression properties. In brief: first, genes not coding for protein or not sufficiently expressed at RNA level in 90% of tumor biopsies were removed. Next, genes with histochemical evidence of protein expression in any tissue were eliminated. Finally, we discarded genes whose RNA expression in tissue was higher than that in tumor, obtaining the prioritized genes. From those, we kept peptides not occurring in the proteome complementary to the prioritized genes. See Lischer et al19 for a detailed description of the workflow. (B) The procedure described in (A) prioritized 22 genes for uveal melanoma (UM). The selection funnel represents the filter cascade described in (A), with each slice listing a filter criterion and the number of remaining genes. Tumor expression statistics were calculated from a published set of 80 primary UM samples.18 Tumq10, Tisq90 – 10th/90th percentile of RNA expression in tumor/tissue. (C) The 22 prioritized genes are stably expressed in an independent cohort of 14 primary UM biopsies. A heat map of log2-transformed transcripts per million (TPM) estimates is shown. (D) Considerable differences among the fractions of suitable peptides arising from the 22 prioritized genes are explained by the corresponding proteins’ sequence homology environment. Peptides whose sequences appear in non-prioritized proteins were discarded (red) while those with non-conflicting sequences (blue) were retained.
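The final expression filter of the cascade can be sketched as a percentile comparison. The example below is a minimal illustration, assuming that a gene passes when the 10th percentile of its tumor expression (Tumq10) exceeds the 90th percentile of its expression in normal tissue (Tisq90). The function names, the nearest-rank percentile definition and the toy expression values are illustrative assumptions rather than the study's implementation.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Nearest-rank definition: smallest value with at least q% of data at or below it.
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def passes_expression_filter(tumor_expr, tissue_expr):
    """Return True if Tumq10 > Tisq90, ie, 90% of tumor samples express the
    gene more highly than 90% of normal tissue samples."""
    tum_q10 = percentile(tumor_expr, 10)
    tis_q90 = percentile(tissue_expr, 90)
    return tum_q10 > tis_q90


if __name__ == "__main__":
    # Hypothetical TPM values for one gene in 10 tumor and 10 tissue samples.
    tumor = [50, 60, 55, 70, 65, 80, 90, 75, 85, 95]
    tissue = [0, 1, 0, 2, 1, 0, 3, 1, 2, 40]
    print(passes_expression_filter(tumor, tissue))
```

Comparing percentiles rather than extremes tolerates a small fraction of outlier samples on either side, such as the single high tissue value in the toy data.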
To further avoid peptide cross-reactivity, we first extracted all overlapping k-mer peptides of length 9–12 for all proteins expressed from the 22 prioritized genes. Next, we screened these peptides against the complementary proteome, that is, all human protein sequences arising from non-prioritized genes, and discarded peptides with literal sequence matches. As seen in figure 2D, when applying this additional filter, we found that the number of selected tumor peptides depends on the extent of the overlap between each candidate tumor protein and the rest of the human proteome. For example, TRPM1 belongs to a family of highly conserved proteins,22 and therefore the vast majority of its peptides were discarded by this filter. Also, more than 90% of peptides from the OCA2 gene, which encodes a 12-transmembrane domain protein with homology to a superfamily of permeases,23 were filtered out. Overall, 9 of the 22 prioritized genes featured peptides in their expressed protein isoforms that were discarded (ABCB5, ELFN1, CABLES1, TMEM200C, SLC45A2, ALX1, RAB38, OCA2 and TRPM1). In total, we removed 11 343 of the original 51 374 unique peptides, leaving 40 031 for subsequent analysis.
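The k-mer extraction and the screen against the complementary proteome can be sketched as set operations. The example below is a minimal illustration under the assumption that a literal sequence match means an identical peptide of the same length occurring anywhere in a non-prioritized protein; the function names and the toy sequences are hypothetical, and reading sequences from a FASTA file is omitted.

```python
def enumerate_kmers(sequence, k_min=9, k_max=12):
    """Return the set of all overlapping k-mers of length k_min..k_max."""
    kmers = set()
    for k in range(k_min, k_max + 1):
        for start in range(len(sequence) - k + 1):
            kmers.add(sequence[start:start + k])
    return kmers


def filter_cross_reactive(prioritized_proteins, background_proteins,
                          k_min=9, k_max=12):
    """Keep candidate peptides that do not occur in any background protein.

    prioritized_proteins: sequences of proteins from the prioritized genes.
    background_proteins:  sequences of the complementary proteome.
    """
    candidates = set()
    for sequence in prioritized_proteins:
        candidates |= enumerate_kmers(sequence, k_min, k_max)
    background = set()
    for sequence in background_proteins:
        background |= enumerate_kmers(sequence, k_min, k_max)
    # Set difference removes every peptide with a literal match.
    return candidates - background


if __name__ == "__main__":
    # Toy sequences: the background shares a 10-residue stretch with the candidate.
    prioritized = ["MKTAYIAKQRQISFVK"]
    background = ["AYIAKQRQISGG"]
    retained = filter_cross_reactive(prioritized, background)
    print(len(retained), "peptides retained")
```

Building the background as a set of k-mers makes each lookup constant time, which matters when tens of thousands of candidates are screened against a whole proteome, at the cost of holding the background k-mers in memory.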
Ranking of antigens based on their importance in a cancer-targeted interaction network
We hypothesized that selecting peptides from TAAs with a role in key cancer processes may inhibit the tumor’s ability to evade therapy by selectively suppressing gene expression.5 To substantiate this hypothesis, we generated simulations from an agent-based computational model24 reflecting the interplay between melanoma and immune cells in the tumor microenvironment (figure 3A and online supplemental mat). The simulations indicate that antigens linked to central cancer cell functions (eg, cell cycle) are less prone to expression suppression, and targeting their derived epitopes leads to a more effective depletion of cancer cells. In contrast, targeting epitopes from bystander proteins not associated with cancer pathways led to inefficient tumor control and resistance to the therapeutic CTLs. This suggests that peptides arising from proteins linked to pathways and cellular functions central to cancer progression are favorable targets.
Figure 3. The prioritized genes’ degree of cancer association and their interaction neighborhood inform a model of indispensability for cancer cell survival. (A) Immune pressure on antigens that support cancer cell survival leads to improved tumor control in a computational model of micrometastasis dynamics. Top left: the sketch visualizes the model’s principle, where cancer cells present a predefined set of peptide epitopes in a tumor microenvironment surveilled by antigen-presenting cells (APCs) and cytotoxic T cells (CTLs). The predefined epitopes are either coupled to malignancy or not (malignancy or bystander epitopes). See Retzlaff et al24 for details. Bottom left: example of the simulated growth of the 3D tumor mass (blue) over time, with immune infiltration in orange. Right: tumor growth trajectories under immune pressure (starting on day 25) from CTLs targeting either a malignancy antigen (top) or a bystander antigen (bottom). Traces from five simulations with partially randomized initial conditions are overlaid. (B) UM-specific prioritized genes interact with cancer-associated genes like MYC and MITF. Shown is the prioritized genes’ (large nodes) immediate neighborhood excised from the much bigger customized cancer network. Node fill and border colors represent gene expression at the RNA (biopsies) and protein (92.1 cell line) levels, respectively. (C) The prioritized genes’ node degrees and importances for UM show no inflation compared with the network background. Waterfall plots of relevant node features are shown for those genes in the network with non-zero values, with prioritized genes (red) embedded in the lower half of the distribution. (D) The prioritized genes are distributed in an unbiased manner across the range of the indispensability index (Idspx), as observed from the waterfall plot. Fifteen genes are labeled in the plot, and genes with an Idspx of 0 are omitted.
To incorporate the above idea, we generated a score describing a gene’s cancer importance (GI) by counting the associations between the gene and cancer pathways taken from a manually curated list of cancer-relevant GO terms and other databases of cancer genes (figure 1B, see online supplemental mat). Some well-studied cancer genes score very high in GI, with TP53 as the top-scoring gene (GI=235) and TNF in the second rank (GI=122). Our candidate genes had a maximum GI value of 14 for TMEM200C, while C14orf169 and PNMA6A had a GI value of zero, indicating no established direct association with cancer pathways. The melanocyte antigen MLANA had a GI value of two, the same as the melanoma-associated gene TYRP1.
As proteins execute their functions while embedded in biochemical networks, we pooled data from publicly available databases to reconstruct an interaction network around our 22 selected UM genes, all genes annotated in DriverDBv325 and their direct interactors. Next, we derived a mathematical function to define the importance of a gene in the cancer network based on the gene’s GI and that of its direct interactors (gene indispensability index, Idspx, see online supplemental mat). The Idspx attempts to rank the candidate genes in terms of how hard it is for the tumor to evade immunotherapy against an antigen. We computed this metric for our 22 candidate genes and all the other genes belonging to the network (figure 3C). We found that well-known cancer genes were top-ranked genes according to this metric, including YBX1 (rank 1), FOXP3 (rank 2), MYC (rank 9), and TP53 (rank 13). Our prioritized genes were widely distributed across the interval of the Idspx values, with CABLES1 ranked highest (rank 679, Idspx=0.8837) and C11orf71 ranked lowest (rank 12 928, Idspx=0.375, figure 3D).
Peptide ranking according to a data-driven, network-driven and model-driven computational score
The aim of our analysis was to rank peptides based on the expression and network indispensability of the UM prioritized genes, as well as on the peptides’ MHC binding affinity and their capability to elicit an immune response. To this end, we derived a score function (ES) that was formalized in mathematical terms as a probability chain composed of five normalized subfunctions (a–e) accounting for the tumor expression of the prioritized gene (a) and its importance in a cancer network as quantified by Idspx (b), the MHC binding probability (c) and the immunogenicity of their derived peptides as predicted by ML models (d), and the allele-specific peptide binding affinity (e, see online supplemental mat).
For all the peptides selected based on the high-in-tumor, low-in-tissue expression of their associated tumor antigen and the minimization of the probability to elicit irAE, we retrieved the data necessary for calculating the five subfunctions and estimated their ES. We calculated the ES for all pairwise combinations of the 40 031 peptides identified above and 36 frequently studied HLA alleles (approx. 1.4 million epitopes). Formally, the ES ranges from 0 (the worst score) to 100 (the best score), but in our case study of UM only 47 408 HLA-epitope combinations yielded a non-zero score (3.3%). Two genes, FNDC10 and SMIM10L1, were assigned scores of 0 for all their peptides because the genes’ Idspx was 0. Only 1534 epitopes derived from 17 genes had an ES of at least 1 (figure 4A). In this subset, the median ES was 2.84 and the maximum 42.27, which corresponds to a peptide derived from the gene TSPAN10 (figure 4B). We further investigated the distribution of ES values higher than zero for these 17 candidate genes and found that most gene candidates offered a broad range of potential epitopes to select from. MLANA, a known immunogenic antigen in metastatic cutaneous melanoma (CM), had a relatively low score range (1–15.81) compared with other prominent CM antigens like PMEL (1–31.05) or TYR (1–36.55) (figure 4B). ALX1 had only one non-zero peptide (ES=3.52).
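Why a single zero-valued factor forces the ES to 0 can be illustrated with a simplified sketch. The exact functional form of the ES is given in the supplemental material and is not reproduced here. The code below assumes, purely for illustration, that the score is the plain product of the five normalized subfunction values scaled to the 0–100 range. The function name, the argument names and the example values are hypothetical.

```python
import math


def efficacy_score(cons_tme, idspx, cons_ic50, g_bp, g_ap):
    """Illustrative efficacy score: 100 times the product of five factors.

    All inputs are assumed to be normalized to the interval [0, 1].
    This multiplicative form is an assumption, not the published equation.
    """
    factors = (cons_tme, idspx, cons_ic50, g_bp, g_ap)
    for value in factors:
        if not 0.0 <= value <= 1.0:
            raise ValueError("all subfunction values must lie in [0, 1]")
    # A product is zero as soon as any single factor is zero.
    return 100.0 * math.prod(factors)


if __name__ == "__main__":
    # Hypothetical peptide with good binding but an antigen with Idspx = 0.
    print(efficacy_score(0.9, 0.0, 0.8, 1.0, 1.0))
    # Hypothetical peptide with moderate values for all five factors.
    print(round(efficacy_score(0.9, 0.8, 0.7, 0.9, 0.8), 2))
```

Under this multiplicative assumption, a peptide must perform reasonably in every subfunction to obtain a high score, which is consistent with the small fraction of non-zero scores reported above.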
Figure 4. A right-tailed empirical distribution of the efficacy score (ES) for peptides against uveal melanoma (UM) enables the straightforward selection of top candidates. (A) ES distribution is very similar across the 17 prioritized genes which produced peptides. The x and y axes have been fixed to the same intervals for global comparison. (B) The cross-gene aggregated distribution of the ES smooths out differences between genes. Shown are only epitopes with an ES of at least 1, with 99% scored lower than 28.84. (C) Overview of the top 16 epitopes (top 1%).
We further extracted the top 1% of epitopes for a closer investigation, yielding 16 epitopes generated from 6 genes with an ES higher than 28.81 (figure 4C). TSPAN10 generated five of these epitopes for three different alleles; seeing that one was specific for HLA-A*02:01, broad applicability may be feasible for this antigen. TYRP1 produced three high-ranking epitopes; this gene has been investigated in some clinical trials using monoclonal antibodies with limited success in relapsed CM patients.26 27 TYR, another TAA associated with CM, produced two highly ranked epitopes for two different alleles. CABLES1 presented with only one highly ranked candidate and is generally not associated with UM or CM apart from containing a potential driver mutation site.28 We obtained three top-ranked peptides from PMEL, another pigmentation gene and CM antigen, which is considered a highly ranked candidate biomarker in UM.29 Finally, the gene OCA2 produced two top-ranked peptides for the allele A*02:01. This gene is an attractive target for direct epitope intervention and other therapeutic options since it is a transmembrane protein playing a role in pigmentation and melanin synthesis, which is used as prognostic and predictive marker for CM and primary UM.
Selection of peptides for experimental validation and assessment of peptide-HLA binding in silico and in vitro
To validate our strategy, we selected the top 20 peptides as ranked by ES (HE tier) for the prevalent allele HLA-A*02:01 together with 20 randomly selected peptides with an ES of zero across all alleles as negative controls (LE tier). Furthermore, to compare our results with a gold-standard approach, we selected 20 candidate peptides with an ES of zero but a NetMHCpan-predicted binding affinity distribution that matched the HE tier’s (AP tier). We had the 60 selected peptides chemically synthesized by a commercial provider, with three ultimately failing synthesis (table 2).
Table 2. Peptide candidates selected for experimental validation
Figure 5 shows a heatmap summarizing the features of the peptides belonging to the HE, LE and AP tiers. Together with their amino acid sequence, we visualized the values for the physicochemical features utilized as input for the ML models, their assessment in the five subfunctions and their ES values. The HE peptides cluster together with higher average values for hydrophobicity and lower values for isoelectric point, molecular weight and polarity. The tumor median expression is similar in all peptides, independent of their tier, with the exception of four peptides derived from PMEL, whose expression is above average. The HE peptides perform in general better in the subfunction accounting for immunogenicity (gAP). On average, HE peptides also feature a higher Idspx. The value of this subfunction is more variable for the other peptide groups. Interestingly, three peptides from the LE and AP tiers cluster together with the HE peptides and display comparable values for all the metrics except the Idspx. This suggests that peptides that would be good candidates according to most of the subfunctions were discarded due to the poor cancer-network connectivity of their antigen.
Figure 5. The efficacy score (ES) supersedes simpler measures of suitability. Heat map visualizing patterns in the factors chosen to calculate the ES for the 60 peptide candidates selected for validation. Three different tiers of 20 peptides each were compared: high efficacy (top ES), low efficacy (negative control: ES of 0, randomly selected), and alternative predictor (gold standard control: ES of 0, but IC50-matched to high efficacy). The columns show the physicochemical peptide features used to train the binding and activity predictors (left column group) and the factors in the ES equation (right column group) after column-wise z-score transformation. The IC50 column holds the peptides’ NetMHCpan-predicted binding affinity to major histocompatibility complex (MHC) for the HLA-A*02:01 allele. Rows are labeled with the peptide’s amino acid sequence and annotated with the ES, the computationally calculated binding energy to MHC (A*02:01), and the allocated ES tier. The high-efficacy group peptides are characterized by high hydrophobicity, an observation that is in line with established knowledge.49 50
To compare the tier predictions with state-of-the-art molecular-level computational analysis, docking and molecular dynamics simulations were carried out in a blinded fashion, that is, without the operator knowing the tier assignment, for each of the 60 candidates by pairing HLA-A*02:01 with the respective peptide (see online supplemental mat). An analysis of variance of the extracted free energy values showed that they were significantly different between the tiers, with the HE tier characterized by stronger binding (online supplemental figure S3A), indicating that our method tends to select peptides that can form stable complexes with MHC.
To further compare the peptides’ suitability, we experimentally quantified their binding affinities utilizing a UV-mediated peptide exchange assay for a sample of 33 peptides from the three tiers (online supplemental figure S3 Right and online supplemental table S4). The tested peptides from the LE tier display a relative affinity (ie, compared with a high-affinity peptide) significantly lower than that of the HE and AP tiers. The average relative affinity of the AP tier is higher than that of the HE tier, but the distribution of relative affinities of HE peptides is more compact, with mean and median values almost identical and all relative affinities between 38% and 106%. In contrast, the AP group contains outliers with extremely high (155%) and extremely low (5%) relative affinity. The LE group contains two peptides with rather high relative affinities (76% and 56%). In our view, this reflects the features of the procedure we followed to train our ML model for binding probability, in which we favored a very low false positive rate to ensure that peptides of high rank were true binders.
Functional validation of peptides in in vitro coculture assays with primed T cells and UM cells
To test the translatability of our predictions, we used a GMP-compliant procedure to obtain antigen-specific T cells through peptide stimulation of leukapheresis products.30 To curtail experimental demand, we prepared peptide mixtures such that each tier of 20 peptides was split into four pools of up to five peptides, yielding a total of 12 peptide pools. In the HE tier, peptides were assigned to the pools in descending order of ES. To blind experimental procedures, pools were labeled randomly and provided to the experimental team.
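The pooling scheme for the HE tier can be sketched as a ranked chunking step. The example below is a minimal illustration that assumes peptides are sorted by ES and then cut into consecutive groups of five; the function name and the placeholder peptide identifiers are hypothetical, and the random relabeling used for blinding is not shown.

```python
def split_into_pools(peptides_with_es, pool_size=5):
    """Split peptides into consecutive pools in descending order of ES.

    peptides_with_es: list of (peptide_id, es) tuples.
    Returns a list of pools, each a list of peptide identifiers.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    # Rank peptides from highest to lowest efficacy score.
    ranked = sorted(peptides_with_es, key=lambda item: item[1], reverse=True)
    return [
        [peptide_id for peptide_id, _ in ranked[start:start + pool_size]]
        for start in range(0, len(ranked), pool_size)
    ]


if __name__ == "__main__":
    # Placeholder identifiers PEP01..PEP20 with ES values 20 down to 1.
    peptides = [(f"PEP{i:02d}", 21 - i) for i in range(1, 21)]
    for number, pool in enumerate(split_into_pools(peptides), start=1):
        print(number, pool)
```

If fewer than 20 peptides are available, for example after failed synthesis, the last pool simply contains fewer than five peptides, matching the wording "up to five peptides".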
PBMCs from four HLA-A*02:01-positive and CMV-seropositive healthy blood donors were stimulated with one of the pools, CMV-pp65 peptides as positive control, or a negative control without peptide as shown in figure 6A.