Comparative Genomic Analysis Reveals Motility Findings in Marine Flavobacteriaceae

Genome Properties and Phylogeny

Recent research has focused on isolating and characterizing Flavobacteriaceae bacteria from marine sponges, specifically Aplysina aerophoba and Dysidea avara. To understand their genomic features and phylogenetic relationships within the Flavobacteriaceae family, seven newly isolated strains underwent whole-genome sequencing using Illumina technology. The genome sizes of these isolates varied from 3.7 to 5.3 Mbp, with a consistent average GC content of 38% (Table 1). Genome assembly resulted in 13–83 contigs, with N50 values ranging from 0.1 to 1.3 Mbp. High completeness (>98%) and low redundancy (<2%) were observed across all genomes. Deep sequencing coverage, exceeding 200x for most genomes (DN50 at 77x), ensured robust data. Gene prediction identified between 3317 and 4738 genes per genome, with coding sequences accounting for over 98% of the assembled genomes. Functional annotation revealed that over 75% of genes could be assigned to known protein families (Pfams).

Table 1 Genome properties and quality metrics of the strains sequenced in this study

Full size table

Phylogenetic analysis, based on both 16S rRNA gene sequences and concatenated protein sequences, consistently placed all seven new isolates within the Marine clade of the Flavobacteriaceae family (Fig. 1). Notably, each isolate formed a distinct branch in phylogenetic trees, except for Aa_D4 and Aa_C5, which exhibited 99.6% 16S rRNA gene sequence identity and 99.8% Amino Acid Identity (AAI). Within the Marine clade, DN112 and DN105 formed a separate subclade, genetically distant from the other isolates. BLASTN searches against the NCBI nr/nt database suggested that Aa_C5 and Aa_D4 belong to the genus Eudoraea, Aa_F7 is closely related to Flagellimonas sp., and DN112 to Lacinutrix sp. DN50 showed highest similarity to an uncultured organism clone, while Da_B9 and DN105 were related to Flavobacteriaceae bacteria isolated from seamounts and marine sponges, respectively. Taxonomic classification using the Genome Taxonomy Database (GTDB) aligned with 16S rRNA gene-based taxonomy, indicating that none of the strains could be classified to the species level, with four assigned to genus and the remainder to family level.

Fig. 1

16S rRNA gene phylogenetic tree illustrating the relationships of sequenced microorganisms. Phylogeny was determined using maximum likelihood method with bootstrap values (>70%) indicated by black circles. Colors denote different Flavobacteriaceae clades. Cyanobacteria and Proteobacteria were used as outgroups. Bold names indicate sequences from this study. Scale bar represents 0.1 substitutions per site.

Full size image

Functional Profiling

To explore the functional diversity, Pfam profiles were generated for 66 genomes, representing Flavobacteriaceae, Cyanobacteria, Proteobacteria, four Flavobacteriaceae clades (Marine, Capnocytophaga, TenacibaculumPolaribacter, Flavobacterium), and host-associated versus non-host associated origins. Functional profiles, based on Pfam relative abundances, were significantly divergent across Flavobacteriaceae, Cyanobacteria, and Proteobacteria (PERMANOVA, p = 0.001) (Fig. 2a). Flavobacteriaceae and Cyanobacteria exhibited greater functional dissimilarity (57.4%) than Flavobacteriaceae and Proteobacteria (50%). Functional profiles also varied significantly among the four Flavobacteriaceae clades (PERMANOVA, p = 0.001) (Fig. 2b). However, functional differences between host-associated and non-host associated flavobacteria were not statistically significant (PERMANOVA, p > 0.05).

Fig. 2

NMDS clustering of genomes based on Bray-Curtis dissimilarity of Pfam abundance profiles. Clustering shown for major taxonomic groups (a) and Flavobacteriaceae clades (b).

Full size image

Out of 5173 Pfams detected, 1648 were shared among all three groups, and 1132 were uniquely found in Flavobacteriaceae (Fig. 3a). Within Flavobacteriaceae, 3843 Pfams were predicted, with 17% specific to the Marine clade. Overall, high functional conservation was observed within Flavobacteriaceae clades, with 48% of Pfams shared across all four clades (Fig. 3b). SIMPER analysis identified Pfams significantly contributing to dissimilarity between taxonomic groups (>0.2% contribution, p < 0.05), showing higher abundance in Flavobacteriaceae. These Pfams were predominantly associated with carbohydrate metabolism and transport proteins (Table 2), including TonB-dependent receptors (pfam13715, pfam07715, pfam00593) and two-component regulatory systems (pfam00072, pfam08281, pfam04542, pfam04397). TonB-dependent receptor genes were often located near SusD/RagB family protein genes (pfam07980), specific to Bacteroidetes. Another differentiating Pfam, CHU_C family (pfam13585), exclusively found in Flavobacteriaceae, is linked to cellulase localization and crystalline cellulose degradation in Cytophaga hutchinsonii. This family shows similarity to the gliding motility-associated C-terminal domain (CTD) (TIGR04131) prevalent in Bacteroidetes, recently shown to target proteins for Type IX Secretion System (T9SS). Within Flavobacteriaceae, differences between Marine, Capnocytophaga, and Flavobacterium clades were driven by Pfams related to polymeric compound attachment and degradation. The Marine and TenacibaculumPolaribacter clades showed fewer functional distinctions, except for the CHU protein family C-terminal domain (pfam13585), more abundant in the Marine clade.

Fig. 3

Venn diagrams illustrating shared and unique Pfams between Flavobacteriaceae, Cyanobacteria, and Proteobacteria (a) and within Flavobacteriaceae (b).

Full size image

Table 2 Pfams contributing most significantly (> 0.2%, p < 0.05) to differences across major taxonomic groups

Full size table

Carbohydrate Metabolism and Transport

Genomic analysis of carbohydrate-active enzymes (CAZymes) and Polysaccharide Utilization Loci (PULs) revealed the carbohydrate metabolism capabilities. Flavobacteriaceae genomes contained significantly more CAZymes per Mbp than Cyanobacteria and Proteobacteria (Kruskal-Wallis test, p < 0.001). Flavobacteriaceae showed higher frequencies of glycoside hydrolases (GHs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), and carbohydrate-binding modules (CBMs) (Kruskal-Wallis test, p < 0.001), while glycosyl transferases (GTs) were more frequent in Cyanobacteria (Kruskal-Wallis test, p < 0.05) (Fig. 4a). No significant differences in CAZyme class frequencies were found between Flavobacteriaceae clades (Fig. 4b). However, significant variation existed in degradative CAZymes in PULs per Mbp (Kruskal-Wallis test, p < 0.05) and putative PULs per Mbp (Kruskal-Wallis test, p < 0.01) among Flavobacteriaceae clades. Marine clade strains averaged 42.4 degradative CAZymes per Mbp, with 14.6% PUL-associated, and 3.1 PULs per Mbp, with 5.2% “complete” (Table 3). Capnocytophaga genomes had 2.5 times more PUL-associated CAZymes (Kruskal-Wallis test, p < 0.05) than Marine clade genomes and the highest PUL frequency (8.4 PULs per Mbp). Marine, Tenacibaculum-Polaribacter, and Flavobacterium genomes followed with 3.1, 2.9, and 1.8 PULs per Mbp, respectively (Table 3). GH families GH74 and GH109 were abundant in the Marine clade, associated with xyloglucan hydrolysis and α-N-acetylgalactosaminidase activity, respectively. CBMs in the Marine clade mainly belonged to chitin- or peptidoglycan-binding CBM50 and cellulose- and xyloglucan-binding CBM44 families.

Fig. 4

Mean number of CAZymes per Mb in genomes, normalized by total gene counts. Comparison between taxonomic groups (a) and Flavobacteriaceae clades (b). Error bars indicate mean ± standard error. GHs: glycoside hydrolases; PLs: polysaccharide lyases; CEs: carbohydrate esterases; GTs: glycosyl transferases; CBMs: carbohydrate-binding modules.

Full size image

Table 3 Comparison of frequencies of degradative CAZymes, PULs and CAZymes in PULs in Flavobacteriaceae genomes

Full size table

Findings from Comparative Studies Found Motility: Gliding Motility and Type 9 Secretion

Comparative genomics provided key findings regarding motility. Based on previous research, a set of 27 proteins related to gliding motility and T9SS was examined across the genomes. These proteins were categorized as essential, important, or non-essential for motility, based on studies in F. johnsoniae. Forty-nine publicly available genomes of gliding and non-gliding Flavobacteriaceae members from different clades (Marine, Capnocytophaga, Flavobacterium, TenacibaculumPolaribacter) and environments (host-associated, non-host associated) were screened for homologs of these 27 proteins. All 29 literature-reported ‘gliding’ flavobacteria had homologs for 20 motility and T9SS-related proteins (GldA, GldB, GldF, GldG, GldD, GldH, GldI, GldJ, GldK, GldL, GldM, GldN, SprA, SprE, SprF, SprT, PorV, SigP, PorX, PorY), except Cellulophaga tyrosinoxydans, lacking GldF and GldG homologs (Fig. 5). These 20 proteins include all essential or important Gld and Spr proteins for F. johnsoniae motility. RemA motility adhesin homologs were absent in five of 29 gliding Bacteroidetes, while SprB homologs were universally present in gliding bacteria. Two strains with unknown motility, Winogradskyella jejuensis CP32T and Winogradskyella sp. J14–2, were found to possess complete sets of T9SS-based gliding protein homologs. Non-gliding P. gingivalis genome encoded major T9SS components (GldK, GldL, GldM, GldN, SprA, SprE, SprT) but lacked GldF, GldG, GldD, GldI, RemA, SprC, and SprD. No correlation was found between the presence of gliding/T9SS proteins and host association or Flavobacteriaceae clade.

Fig. 5

Presence of homologs for gliding motility and type 9 secretion proteins in predicted proteomes. Light blue squares indicate presence, white squares absence. Bold strains are from this study. Species are annotated by taxonomy (Groups) and reported motility. Bold protein names are essential/important for F. johnsoniae gliding. “Other” indicates non-T9SS gliding mechanism; NA: no information.

Full size image

Beyond Bacteroidetes, Cyanobacteria and Proteobacteria genomes were screened for gliding/T9SS protein homologs. Homologs were scarce; only GldA homologs were universal, likely due to GldA’s nature as a common ABC transporter ATP-binding component. Gliding cyanobacteria, Anabaena sp. and Trichodesmium erythraeum, encoded homologs for five gliding proteins (GldA, GldF, GldG, GldJ, GldK) and three regulatory proteins (SigP, PorX, PorY). Anabaena sp. also had SprE, absent in T. erythraeum. Prochlorococcus marinus and “Candidatus Pelagibacter ubique” genomes had the fewest homologs (three).

Experimental motility assays on the seven new sponge isolates revealed that despite all encoding essential gliding motility and T9SS proteins, most (DN105, Da_B9, Aa_C5, Aa_D4, DN112) were non-spreading on agar. Only Aa_F7 and DN50 formed spreading colonies with slender peninsulas on 1% agar (Fig. 6a, b). DN50 on 1.8% agar showed a colony surface contour pattern indicative of gliding motility and secondary colony formation (Fig. 6c), not seen in other strains.

Fig. 6

Colony morphologies of studied flavobacterial strains. a, b: Spreading colonies of DN50 and Aa_F7 on 1% agar with slender peninsulas. c: Surface contour of DN50 colonies on 1.8% agar. d: Non-spreading colonies of DN105. Phase-contrast microscopy images.

Full size image

Spreading strains (Aa_F7, DN50) had homologs for all gliding/T9SS proteins except PorQ and PorU. Gene neighborhood analysis of gldJ in Aa_F7 and DN50, compared to F. johnsoniae, showed similar gene arrangement but lacked porU homologs adjacent to gldJ (Fig. 7). Hypothetical proteins were found in this location in Aa_F7 and DN50.

Fig. 7

Gene neighborhoods of gldJ in Aa_F7 and DN50 compared to F. johnsoniae UW101. Genes of same color are orthologous. Light yellow genes lack COG assignment. (Chromosome viewer-IMG/MER).

Full size image

Weak PorQ hits were found in Aa_F7 and DN50 via low stringency BLASTP. Intriguingly, non-spreading DN105, Bizionia echini, Lacinutrix venerupis, and Tamlana sedimentorum genomes contained complete sets of orthologous gliding genes, suggesting motility gene presence doesn’t always correlate with observed gliding.

Secondary Metabolome

The secondary metabolite biosynthetic potential was explored using antiSMASH. In Flavobacteriaceae, terpene synthases were the most common biosynthetic gene clusters (BGCs) (44%), followed by type III polyketide synthases (T3PKS) (9%), non-ribosomal peptide synthetases (NRPS) (7%), lanthipeptide (6%), and aryl polyene (6%) clusters. Secondary metabolite biosynthesis genes constituted 0.02–0.3% of total protein-coding genes per genome. Cyanobacteria had the highest average BGC count (n = 7), compared to Flavobacteriaceae and Proteobacteria (n = 4). NRPS and polyketide BGCs were present in all groups. Terpene BGCs were abundant in Flavobacteriaceae and Cyanobacteria. Proteobacteria was enriched in homoserine lactone (HSL) BGCs, quorum sensing molecules. Only Muriicola jejuensis DSM 21206 among flavobacteria had HSL BGCs.

Table 4 Distribution of BGCs (mean number of BGCs per strain) across major taxonomic groups and different Flavobacteriaceae clades

Full size table

Within Flavobacteriaceae, Marine and Flavobacterium clades had the most BGCs, while TenacibaculumPolaribacter genomes averaged only 2 BGCs. Similar BGC types were enriched across flavobacterial clades: terpene, lanthipeptide, NRPS, and aryl polyene BGCs.

Genome mining of the seven new isolates revealed an average of five BGCs per strain, across six classes. Terpene, lanthipeptide, and aryl polyene BGCs accounted for 63% of identified BGC-encoded proteins. Bacteriocin, T3PKS, and NRPS BGCs were less abundant but widely distributed. DN50 had the most diverse and abundant BGCs (Fig. 8). Carotenoid pigment synthesis genes were universal in the seven isolates. DN50 also harbored a flexirubin BGC. Despite predicted secondary metabolite potential, none of the isolates exhibited antimicrobial activity against tested indicator strains (Escherichia coli, Bacillus subtilis, Staphylococcus simulans, Aeromonas salmonicida, Candida oleophila, Saprolegnia parasitica).

Fig. 8

Composition of BGCs identified in newly sequenced genomes. Absolute number of BGCs per strain, assigned to each BGC class. Colors indicate different BGC classes.

Full size image

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply

Your email address will not be published. Required fields are marked *