Genome-guided development of a bacterial two-strain system for low-temperature soil biocementation

Genome of Sporosarcina sp. ANT_H38 — general information

Sporosarcina sp. ANT_H38 was isolated from petroleum-contaminated soil at the Henryk Arctowski Polish Antarctic Station in 2012 (GPS coordinates: 62°09.601′ S, 58°28.464′ W). The strain grows over a wide temperature range (4–37 °C) and at pH 7–11 (Romaniuk et al. 2018).

The genomic DNA isolated from Sporosarcina sp. ANT_H38 was sequenced using both Oxford Nanopore and Illumina technologies and subsequently assembled into a single, circular chromosome and four plasmids (pA38H1-pA38H4). The genome has a total size of 4,685,507 bp with a GC content of 39.9%. Following the assembly, the genome was automatically annotated using Bakta version 1.6.1, and the general features of the ANT_H38 genome can be found in Supplementary File Table S1.

Genome of Sporosarcina sp. ANT_H38 — functional annotation

Functional annotation of the Sporosarcina sp. ANT_H38 genome was performed using eggNOG-mapper. Genes related to cellular metabolism were the most abundant (1552 genes), followed by poorly characterised genes (972), genes involved in cellular processes and signalling (789), and genes involved in information storage and processing (773). In terms of specific COG categories, amino acid transport and metabolism (E, 9.7%) and transcription (K, 8.3%) dominated, followed by transport and metabolism of carbohydrates (G, 6.6%) and inorganic ions (P, 6.5%) (Fig. 2).

Fig. 2

COG categories assigned to genes found within Sporosarcina sp. ANT_H38 and their respective counts. Annotations for COG categories are as follows: A: RNA processing and modification, B: Chromatin structure and dynamics, C: Energy production and conversion, D: Cell cycle control and mitosis, E: Amino acid metabolism and transport, F: Nucleotide metabolism and transport, G: Carbohydrate metabolism and transport, H: Coenzyme metabolism, I: Lipid metabolism, J: Translation, K: Transcription, L: Replication and repair, M: Cell wall/membrane/envelope biogenesis, N: Cell motility, O: Post-translational modification, protein turnover, chaperone functions, P: Inorganic ion transport and metabolism, Q: Secondary metabolites biosynthesis, transport and catabolism, T: Signal transduction, U: Intracellular trafficking and secretion, Y: Nuclear structure, Z: Cytoskeleton, R: General functional prediction only, S: Function unknown

Cold adaptation features identified in the ANT_H38 genome

The genome of Sporosarcina sp. ANT_H38 was screened for genomic features that may underlie its cold-adaptation capabilities. Protein sequence similarity searches were performed against three separate reference databases: the Cold Adaptation Database (CAD) of proteins responsible for cold adaptation, the CSPdb database of cold shock proteins, and the database of cold-adapted predicted proteins (CAPP) (Barria et al. 2013). In total, 45 cold-adaptive genes were identified in the ANT_H38 genome. Of these, eight proteins matched entries in CAD, including proteins involved in global transcription regulation (DnaA, NusA), translation initiation (InfABC), and the cold shock response (CspB, Pnp). A further 35 proteins matched entries in the CAPP database, including multiple 30S and 50S ribosomal proteins and proteins involved in carbohydrate metabolism (i.e. succinate-CoA ligase [ADP-forming] subunit alpha, SucD; 1,4-dihydroxy-2-naphthoyl-CoA synthase, MenB). Finally, a search against CSPdb revealed two additional copies of the cold shock protein CspA. All of these proteins play a role in the strain’s ability to survive at temperatures below freezing by regulating its metabolism (Ivancic et al. 2013). A full list of the identified proteins is presented in Supplementary File Table S2.

Mobilome of Sporosarcina sp. ANT_H38

Investigation of mobile genetic elements (MGEs) such as plasmids, transposable elements or (pro)phages provides insights into the mechanisms behind horizontal gene transfer, genomic evolution, and the acquisition of accessory genes (Tokuda and Shintani 2024). In this study, we delved into the Sporosarcina sp. ANT_H38 mobilome, expanding our knowledge of the strain’s genetic complexity.

In total, four plasmids were identified, with sizes of 6234 bp (pA38H1), 10,106 bp (pA38H2), 10,532 bp (pA38H3), and 27,152 bp (pA38H4). Their GC content ranges from 33.3% (pA38H1) to 36.7% (pA38H3), which is notably lower than the GC content of the chromosome (39.9%). Apart from genes responsible for plasmid functioning and maintenance, i.e. replication (REP) and mobilisation for conjugal transfer (MOB), most sequences could only be assigned a putative function. These genes are described in Supplementary File Table S3, and a visual representation of the plasmids, with key features marked, is shown in Fig. 3.
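GC content, as compared above between the plasmids and the chromosome, is simply the share of G and C bases in a sequence. The following is a minimal sketch of that calculation; the function name `gc_content` and the toy sequence are illustrative assumptions and are not taken from the study's pipeline.

```python
def gc_content(seq: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


# Hypothetical 12-bp sequence, used only to exercise the function.
toy = "ATGCATTTAACG"
print(f"{gc_content(toy):.1f}%")
```

Ambiguous bases such as N are ignored here, which is one possible convention; a real pipeline may count them in the denominator instead.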

Fig. 3

Linear representation of the ANT_H38 plasmids. Each line represents a plasmid and each arrow corresponds to a protein-coding sequence. Arrows linked by a grey-scale block indicate at least 30% protein sequence similarity (determined by the clinker tool version 0.0.27). Protein colours correspond to their functional modules, as shown in the legend

Comparison with plasmids deposited within the PLSDB database (version 2023_11_03_v2) revealed no significant similarity to any known plasmid sequence, pointing towards a distinct evolutionary origin.

Furthermore, a total of 15 transposase genes were identified within the Sporosarcina sp. ANT_H38 genome, including representatives of the following families: IS150 (2 elements), IS1182 (2), IS21 (1), IS30 (1), and ISL3 (2). A summary of the identified transposases and their genomic locations is provided in Supplementary File Table S4.

Finally, the analysis revealed four tailed prophages within the genome, namely phiA38H1 (coordinates: 2,402,250 to 2,430,344; 28 kb in total; 38.26% GC content), phiA38H2 (3,777,853 to 3,826,165; 48 kb; 41.61%), phiA38H3 (3,843,298 to 3,880,511; 37 kb; 40.12%) and phiA38H4 (4,345,518 to 4,387,388; 42 kb; 41.56%). We identified perfect direct terminal repeats bordering phiA38H3 (AATCGGCACGAAATCGGCACGAA, 23 bp) and phiA38H4 (ATTACATCATGCCGCCCAT, 19 bp). Both are located in intergenic regions and might act as attachment sites recognised by the phages during integration into the host genome. This observation indicates that both prophages might be complete and functional. A comparison of the four prophage genomes revealed limited protein sequence similarity between them (Fig. 4).
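The reported prophage sizes and repeat lengths can be re-derived from the coordinates and sequences given above. The sketch below does this under the assumption that the coordinates are 1-based and inclusive, which the text does not state; the helper name `span_length` is illustrative.

```python
# Coordinates and repeat sequences exactly as reported in the text.
prophages = {
    "phiA38H1": (2_402_250, 2_430_344),
    "phiA38H2": (3_777_853, 3_826_165),
    "phiA38H3": (3_843_298, 3_880_511),
    "phiA38H4": (4_345_518, 4_387_388),
}
direct_repeats = {
    "phiA38H3": "AATCGGCACGAAATCGGCACGAA",
    "phiA38H4": "ATTACATCATGCCGCCCAT",
}


def span_length(start: int, end: int) -> int:
    """Length of an interval, assuming 1-based inclusive coordinates."""
    return end - start + 1


lengths = {name: span_length(*coords) for name, coords in prophages.items()}

for name, bp in lengths.items():
    print(f"{name}: {bp} bp (~{round(bp / 1000)} kb)")
for name, seq in direct_repeats.items():
    print(f"{name} direct repeat: {len(seq)} bp")
```

Rounded to the nearest kilobase, the computed spans agree with the 28, 48, 37 and 42 kb quoted in the text, and the repeat lengths agree with the stated 23 bp and 19 bp.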

Fig. 4

Linear representation of the ANT_H38 prophages. Each line represents a prophage and each arrow corresponds to a protein-coding sequence. Arrows linked by a grey-scale block indicate at least 30% protein sequence similarity (determined by the clinker tool version 0.0.27). Protein colours correspond to their functional modules, as shown in the legend

Exploring the gene context of these prophages, we observed that phiA38H3 has integrated near or within the iron-sulphur gene cluster (SufBCD), which may affect host metabolism. Upstream of phiA38H4 lies a cluster of recombination-related proteins, including FtsK and a type I restriction-modification system. These may function in defending against foreign DNA and altering the host genome during prophage induction. Interestingly, this 29-kb region is flanked by a truncated attachment site (TTACATCATGCCGCCCA) that lacks one nucleotide at each end relative to the 19-bp repeat bordering phiA38H4. Another tyrosine integrase (GGGNBK_21310), with low sequence identity (34%) to the phiA38H4 integrase (GGGNBK_21670), is also present, suggesting that this region could be transduced independently or as part of the prophage genome.
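How the truncated attachment site relates to the full 19-bp repeat of phiA38H4 can be checked directly from the two sequences as printed in this section. The sketch below is a simple exact-substring comparison; the function name `missing_flanks` is an illustrative assumption, and no claim is made that the authors compared the sites this way.

```python
from typing import Tuple

full_site = "ATTACATCATGCCGCCCAT"  # 19-bp direct repeat of phiA38H4, as printed
truncated = "TTACATCATGCCGCCCA"    # truncated attachment site, as printed


def missing_flanks(full: str, part: str) -> Tuple[int, int]:
    """Return (bases missing at the 5' end, bases missing at the 3' end)
    when `part` occurs as an exact substring of `full`."""
    offset = full.find(part)
    if offset == -1:
        raise ValueError("truncated site is not an exact substring of the full site")
    return offset, len(full) - offset - len(part)


five_prime, three_prime = missing_flanks(full_site, truncated)
print(f"missing at 5' end: {five_prime}, missing at 3' end: {three_prime}")
```

An exact-substring test is enough here because the two printed sequences differ only at their ends; sites with internal mismatches would need a proper alignment instead.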

Comparative genomics of Sporosarcina spp.

To gain insight into the genomic diversity within the Sporosarcina genus, a comparative genomic analysis was performed. Based on genome completeness, we selected 13 Sporosarcina strains from the NCBI genome database: S. ureilytica (GCF_001753205.1); S. quadrami (GCF_014836615.1); S. thermotolerans (GCA_033253685.1); S. pasteurii NCTC4822 (GCF_900457495.1), BNCC337394 (GCF_004379295.1), and DSM 33 (GCF_031822395.1); S. psychrophila DSM 6497 (GCF_001590685.1); Sporosarcina sp. ANT_H38 (GCF_008369195.1); S. ureae P32a (GCF_002109325.1), P17a (GCF_002082015.1), P8 (GCF_002101375.1), and S204 (GCF_002081995.1); S. aquimarina SN-308-OC-B4 (GCF_019748715.1). The RIBAP pipeline was used to determine core and accessory gene sets. Following this, genes either unique to specific strains or shared amongst multiple strains (Fig. 5) were extracted and re-annotated using eggNOG-mapper v2.

Fig. 5

UpSet plot displaying the core genome and accessory genes for analysed Sporosarcina strains. Sets with less than 20 common genes were not displayed. The ANT_H38 strain was marked in blue, while Sporosarcina thermotolerans, as the strain with the most unique genes, was marked in red. The number of unique genes for each strain is shown on the bar plot on the right side of the figure, along with the total gene count in each genome (in parentheses)

The core genome was estimated at 1865 genes, comprising between 40.4% (S. thermotolerans) and 60.2% (S. pasteurii BNCC337394) of the gene content of individual genomes. The strains with the highest numbers of unique genes were those occupying more specialised environmental niches, specifically S. thermotolerans (1251), Sporosarcina sp. ANT_H38 (831), and S. psychrophila DSM 6497 (595). Furthermore, despite the low number of unique genes within each S. pasteurii strain (1, 1, and 9 for the BNCC337394, DSM 33, and NCTC4822 strains, respectively), all three share 151 genes not found in other genomes. In contrast, S. ureae strains exhibited greater intra-species diversity, with more unique genes per strain (117–167) and only 52 genes shared by all four strains and absent from the other species. The largest unique intersection (understood as the set of genes shared by at least two strains and found in no others) was observed for the two psychrotolerant strains, Sporosarcina sp. ANT_H38 and S. psychrophila. This set includes genes responsible for glycogen synthesis and breakdown (glgABDP); the foldase protein prsA1; multiple stress response proteins (csbD, yceC and two copies of yceD); carbohydrate transporters/permeases (araQ, gntP, ugpA); and the lichenan-specific phosphotransferase system (licABC). Overall, the functions of these genes are consistent with adaptation to cold environments. This result agrees with the UBCG-based phylogenetic analysis, which indicates that these two strains are similar to each other and distant from the other analysed strains.
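The terms used above (core genome, strain-unique genes, and unique intersection) correspond to simple set operations on a gene presence/absence table. The sketch below illustrates the definitions on a hypothetical three-strain example; the strain labels, gene identifiers and function names are all made up, and this is not the RIBAP output or its algorithm.

```python
from itertools import combinations

# Hypothetical presence data: strain -> set of gene-family IDs (all made up).
genomes = {
    "strainA": {"g1", "g2", "g3", "g4", "g7"},
    "strainB": {"g1", "g2", "g3", "g5", "g7"},
    "strainC": {"g1", "g2", "g6"},
}


def core_genome(genomes):
    """Gene families present in every genome."""
    return set.intersection(*genomes.values())


def exclusive_intersection(genomes, members):
    """Gene families present in all `members` and absent from every other genome."""
    shared = set.intersection(*(genomes[m] for m in members))
    others = set().union(*(genomes[s] for s in genomes if s not in members))
    return shared - others


core = core_genome(genomes)
unique = {s: exclusive_intersection(genomes, [s]) for s in genomes}
pairs = {p: exclusive_intersection(genomes, p) for p in combinations(genomes, 2)}
core_share = {s: 100.0 * len(core) / len(g) for s, g in genomes.items()}

print("core:", sorted(core))
for strain in genomes:
    print(f"{strain}: unique={sorted(unique[strain])}, core share={core_share[strain]:.1f}%")
for pair, genes in pairs.items():
    print(pair, sorted(genes))
```

An UpSet plot such as Fig. 5 displays the sizes of exactly these exclusive intersections, which is why a gene counted in a two-strain set is not also counted among either strain's unique genes.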

Furthermore, the set of genes unique to Sporosarcina sp. ANT_H38 was analysed. The most prominent feature of this set was the abundance of genes responsible for carbohydrate metabolism. These genes are summarised in Supplementary File Table S5. In addition, we identified six copies of the btuD gene and a single copy of btuF, responsible for vitamin B12 import and binding, as well as the uvrA gene, involved in the repair of UV-induced DNA damage.

Identification and phylogeny of the urease-encoding gene cluster

Urease production in bacteria is well described and relies on the ureABCDEFG gene cluster. The first three genes, ureA, ureB and ureC, encode the three subunits of the urease enzyme, while the remaining genes have accessory functions and are responsible for maturation and activation of the enzyme. To investigate the biotechnological potential of the ANT_H38 urease, we sought to understand its evolutionary relationship to ureases in other Sporosarcina strains. Here, we focus on the ureA gene, which encodes one of the structural subunits of the enzyme.

A phylogenetic analysis was conducted to trace the evolution of the ureA gene across 13 Sporosarcina strains (the same strains used in the comparative genomic analysis), with Planococcus sp. PAMC_21323 (GCF_000785555.1) as an outgroup. First, we constructed a reference tree based on a comprehensive set of core bacterial genes using the UBCG2 pipeline (Kim et al. 2021). This tree largely confirmed the expected evolutionary relationships, with strains of the same species clustering together. Notably, the ANT_H38 strain clustered closely with S. psychrophila, reflecting their possible common evolutionary origin.

Next, a similar analysis was performed using the ureA gene sequences. The ureA-based tree largely mirrored the UBCG tree, showing two major groups corresponding to the S. ureae/S. aquimarina and S. pasteurii/S. ureilytica clusters. Interestingly, the ANT_H38 strain showed a notable deviation. While the phylogenomic analysis shows a close relationship between ANT_H38 and S. psychrophila, the ureA phylogeny places ANT_H38 near the root of the tree (Fig. 6). This suggests that the urease from this strain may be evolutionarily distinct from ureases in other Sporosarcina strains, including its close relative S. psychrophila.

Fig. 6

Comparison of evolutionary relationships between Sporosarcina strains as determined by UBCG-based (left) and ureA-based (right) analyses

Finally, to gain further insight into the relationship between the ANT_H38 strain and its closest relative, S. psychrophila, we calculated the average nucleotide identity (ANI) between their genomes and performed digital DNA:DNA hybridisation (dDDH). The results confirmed the previous findings: the ANI value was 88.26% (a value typically observed for closely related organisms of different species) and the dDDH value was only 33.6%. Combined with the results of the comparative genomic analysis, this presents solid evidence that, despite sharing a relatively recent common ancestor, the two strains have undergone significant divergent evolution.
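The interpretation of these two values rests on species-delineation cut-offs. The text does not state which cut-offs were applied, so the sketch below assumes the commonly used conventions of about 95% ANI and 70% dDDH; both constants and the function name `same_species` are assumptions for illustration.

```python
ANI_SPECIES_CUTOFF = 95.0   # percent; assumed conventional cut-off, not from the source
DDDH_SPECIES_CUTOFF = 70.0  # percent; assumed conventional cut-off, not from the source


def same_species(ani: float, dddh: float) -> bool:
    """True only if both metrics meet the assumed species-level cut-offs."""
    return ani >= ANI_SPECIES_CUTOFF and dddh >= DDDH_SPECIES_CUTOFF


# Values reported for ANT_H38 versus S. psychrophila.
print(same_species(ani=88.26, dddh=33.6))
```

Both reported values fall well below the assumed cut-offs, so the conclusion would not change if slightly different thresholds (for example 96% ANI) were used.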

Growth kinetics, biofilm formation and urease activity

Both tested strains grew at 10 °C and 15 °C. At 10 °C, the two strains grew similarly until days 5 and 6, when Sporosarcina sp. ANT_H38 samples showed higher turbidity than S. pasteurii DSM 33 samples. On day 7, measurements for both strains showed similar values (Fig. 7A). At 15 °C, the growth of both strains was again similar in the initial phase (first 3 days); however, from day 4 onwards, the turbidity of DSM 33 samples was much higher than that of the ANT_H38 strain (Fig. 7B). The measurements indicate that both strains are able to multiply and persist in low-temperature environments, with Sporosarcina sp. ANT_H38 being slightly favoured at the lower temperature (10 °C) and S. pasteurii DSM 33 growing faster at the higher temperature (15 °C).

Fig. 7

Comparison of growth kinetics of S. pasteurii DSM 33 and Sporosarcina sp. ANT_H38 at A 10 °C and B 15 °C over 7 days. C Urea hydrolysis kinetics by bacterial cells in biofilm formed on sand grains, measured by phenol red indicator absorbance change at 10 °C over 24 h

Subsequently, biofilm formation and ureolytic activity were assessed via sand grain colonisation and subsequent urea hydrolysis, measured by pH changes via phenol red indicator. Both strains demonstrated biofilm formation capabilities and ureolytic activity. DSM 33 exhibited significantly higher ureolytic activity at 10 °C, confirming previous findings (Lapierre et al. 2020). ANT_H38 showed lower and delayed ureolytic activity compared to the type strain (Fig. 7C).

Utilisation of various carbon sources

In the next step, the ability of both strains to metabolise various carbon sources was tested. The analysis revealed that the type strain S. pasteurii DSM 33 was able to metabolise sodium acetate, arabinose, trehalose and starch as sole carbon sources. Sporosarcina sp. ANT_H38 demonstrated a broader carbon utilisation profile, metabolising glycerol, lactose and maltose in addition to arabinose, trehalose and starch. However, unlike DSM 33, the ANT_H38 strain was unable to metabolise sodium acetate. These results are in line with the bioinformatic analyses, since genes putatively enabling the utilisation of these carbon sources were identified within the ANT_H38 genome.

Biocementation analysis

The biocementation analysis demonstrated a significant increase in both the internal friction angle and the cohesion of sand samples treated with DSM 33 or with a 1:1 mixture of DSM 33 and ANT_H38, compared to untreated sand samples and those treated with urease-deficient E. coli DH5α (negative control). Treatment with the ANT_H38 strain alone also raised the internal friction angle substantially, although not as much as the other tested variants.

When assessing soil biocementation, both the internal friction angle (determining shear strength, a key characteristic of non-cohesive soils) and soil cohesion (reflecting intermolecular forces due to formed calcite bridges) are crucial parameters. It is important to note that the best combination of both of those factors was obtained when using a 1:1 mixture of tested strains. Notably, soil cohesion in this experimental setup was over three times higher compared to the reference strain alone, and almost five times higher than in the negative control (Table 1). Interestingly, when using the ANT_H38 strain alone, soil cohesion was almost as low as for bacteria-free sand beds. Overall, the results indicate a strong synergistic effect occurring between S. pasteurii DSM 33 and Sporosarcina sp. ANT_H38.
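Internal friction angle and cohesion are the two parameters of the Mohr-Coulomb failure criterion, tau = c + sigma * tan(phi), which is the standard way these values are combined into a shear strength. The text does not state this formula explicitly, so the sketch below is offered only as background; the function name and all numerical inputs are hypothetical placeholders and are not the values from Table 1.

```python
import math


def shear_strength(cohesion_kpa: float, friction_angle_deg: float,
                   normal_stress_kpa: float) -> float:
    """Mohr-Coulomb shear strength: tau = c + sigma * tan(phi), stresses in kPa."""
    phi = math.radians(friction_angle_deg)
    return cohesion_kpa + normal_stress_kpa * math.tan(phi)


# Hypothetical inputs for illustration only (not measured values).
untreated = shear_strength(cohesion_kpa=2.0, friction_angle_deg=30.0, normal_stress_kpa=100.0)
treated = shear_strength(cohesion_kpa=10.0, friction_angle_deg=35.0, normal_stress_kpa=100.0)
print(f"untreated: {untreated:.1f} kPa, treated: {treated:.1f} kPa")
```

The formula shows why both parameters matter: cohesion adds a stress-independent term, while the friction angle scales the contribution of the normal stress, so a treatment that improves only one of them gives a smaller overall gain.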

Table 1 Internal friction angle and soil cohesion of biocement resulting from calcite precipitation by S. pasteurii DSM 33, Sporosarcina sp. ANT_H38, and both strains used in co-culture, compared to urease-deficient E. coli DH5α

Biosafety analysis

To ensure that the strain is safe to use in potential biotechnological applications, a comprehensive analysis of the genome was performed. The analysis did not reveal the presence of any known antibiotic-resistance genes. While two copies of the arsC gene were identified, these are associated with toxic metal resistance and are not directly relevant to any biosafety concerns. Further investigation using the Virulence Factor Database (VFDB) yielded seven putative virulence-related proteins. Close inspection of these hits revealed that none of these proteins is considered a primary virulence factor, i.e. none is sufficient on its own to confer virulence.
