mq_saccharopolyspora
Summary report for project mq_saccharopolyspora. Generated using BGCFlow v0.9.0
Project Description
- 26 Saccharopolyspora genomes of medium to high quality with fewer than 50 contigs. The new cutoff removes GCF_000710755.1, GCF_015209505.1, and GCF_014646075.1, and adds GCF_014697215.1.
- Sample size: 26
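The selection criterion described above (medium- to high-quality genomes with fewer than 50 contigs) can be illustrated with a small filter. This is only a sketch: the record fields (`accession`, `quality`, `contigs`), the quality labels, and the placeholder records are hypothetical and are not taken from the BGCFlow output or from the genomes in this project.

```python
from dataclasses import dataclass

# Cutoff stated in the project description: fewer than 50 contigs.
MAX_CONTIGS = 50
# Hypothetical quality labels; the report does not say how quality is encoded.
ACCEPTED_QUALITY = {"medium", "high"}


@dataclass
class Genome:
    accession: str  # placeholder identifier, not a real accession
    quality: str    # assumed label: "low", "medium", or "high"
    contigs: int    # number of contigs in the assembly


def select_genomes(genomes):
    """Keep genomes with accepted quality and fewer than MAX_CONTIGS contigs."""
    return [
        g for g in genomes
        if g.quality in ACCEPTED_QUALITY and g.contigs < MAX_CONTIGS
    ]


# Made-up records used only to exercise the filter.
example = [
    Genome("GENOME_A", "high", 1),
    Genome("GENOME_B", "medium", 49),
    Genome("GENOME_C", "medium", 50),  # fails the contig cutoff
    Genome("GENOME_D", "low", 3),      # fails the quality requirement
]

selected = select_genomes(example)
print([g.accession for g in selected])
```

The comparison is strict (`<`), matching the "fewer than 50 contigs" wording, so an assembly with exactly 50 contigs is excluded.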
Available reports
BGCFlow_rules | description |
---|---|
seqfu | Calculate sequence statistics using SeqFu. |
mash | Estimate pairwise distances between all samples using MinHash. |
fastani | Do pairwise Average Nucleotide Identity (ANI) calculation across all samples. |
checkm | Assess genome quality with CheckM. |
prokka-gbk | Copy annotated GenBank results. |
antismash | Summarize antiSMASH results. |
query-bigslice | Map BGCs to the BiG-FAM database (https://bigfam.bioinformatics.nl/) |
bigscape | Cluster BGCs using BiG-SCAPE |
bigslice | Cluster BGCs using BiG-SLiCE (https://github.com/medema-group/bigslice) |
automlst-wrapper | Simplified Tree building using autoMLST |
arts | Run Antibiotic Resistant Target Seeker (ARTS) on samples. |
roary | Build pangenome using Roary. |
eggnog-roary | Annotate Roary output using eggNOG mapper |
deeptfactor | Use deep learning to find Transcription Factors. |
cblaster-genome | Build diamond database of genomes for cblaster search. |
cblaster-bgc | Build diamond database of BGCs for cblaster search. |
gecco | GEne Cluster prediction with COnditional random fields. |
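For the `mash` rule listed in the table above, the idea behind MinHash distance estimation can be sketched in a few lines. This is a simplified illustration, not the Mash implementation and not code from BGCFlow: it hashes k-mers with SHA-1 from the standard library, ignores reverse complements, and uses hypothetical function names. The Jaccard estimate from bottom-s sketches and the distance formula D = -(1/k) ln(2j / (1 + j)) follow the Mash paper cited in the references.

```python
import hashlib
import math


def kmer_hashes(seq, k):
    """Hash every k-mer of seq to a 64-bit integer (forward strand only)."""
    return {
        int.from_bytes(hashlib.sha1(seq[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(seq) - k + 1)
    }


def sketch(seq, k, s):
    """Bottom-s sketch: the s smallest k-mer hashes of the sequence."""
    return sorted(kmer_hashes(seq, k))[:s]


def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate the Jaccard index from the s smallest hashes of the union."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:s]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k):
    """Convert a Jaccard estimate j into a Mash-style distance."""
    if j == 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


# Toy sequences, far shorter than real genomes; k and s are chosen to match.
a = sketch("ACGTACGTGGCCTTAAGGCATCGA", k=5, s=10)
b = sketch("ACGTACGTGGCCTTAAGGCATCGT", k=5, s=10)
j = jaccard_estimate(a, b, s=10)
print(j, mash_distance(j, k=5))
```

Real analyses use much larger values (Mash defaults to k = 21 and a sketch size of 1000), and the tool also handles canonical k-mers and p-values, which this sketch omits.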
References
If you find BGCFlow useful, please cite:
-
Nuhamunada, M., B.O. Palsson, O. S. Mohite, and T. Weber. 2022. BGCFlow [Computer software]. GITHUB: https://github.com/NBChub/bgcflow
-
Mölder, F., Jablonski, K.P., Letcher, B., Hall, M.B., Tomkins-Tinch, C.H., Sochat, V., Forster, J., Lee, S., Twardziok, S.O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., Nahnsen, S., Köster, J., 2021. Sustainable data analysis with Snakemake. F1000Res 10, 33.
-
Nathan C Sheffield, Michał Stolarczyk, Vincent P Reuter, André F Rendeiro, Linking big biomedical datasets to modular analysis with Portable Encapsulated Projects, GigaScience, Volume 10, Issue 12, December 2021, giab077
Please also cite each tool used in the analysis:
-
Gilchrist, C., Booth, T. J., van Wersch, B., van Grieken, L., Medema, M. H., & Chooi, Y. (2021). cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters (Version 1.3.9) [Computer software]. https://doi.org/10.1101/2020.11.08.370601
-
Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill, 'Roary: Rapid large-scale prokaryote pan genome analysis', Bioinformatics, 2015;31(22):3691-3693 doi:10.1093/bioinformatics/btv421
-
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2014. Assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25: 1043-1055.
-
Mungan,M.D., Alanjary,M., Blin,K., Weber,T., Medema,M.H. and Ziemert,N. (2020) ARTS 2.0: feature updates and expansion of the Antibiotic Resistant Target Seeker for comparative genome mining. Nucleic Acids Res.,10.1093/nar/gkaa374
-
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
-
Accurate de novo identification of biosynthetic gene clusters with GECCO. Laura M Carroll, Martin Larralde, Jonas Simon Fleck, Ruby Ponnudurai, Alessio Milanese, Elisa Cappio Barazzone, Georg Zeller. bioRxiv 2021.05.03.442509; doi:10.1101/2021.05.03.442509
-
Kim G.B., Gao Y., Palsson B.O., Lee S.Y. 2020. DeepTFactor: A deep learning-based tool for the prediction of transcription factors. PNAS. doi: 10.1073/pnas.2021171118
-
Telatin, A., Birolo, G., & Fariselli, P. SeqFu [Computer software]. GITHUB: https://github.com/telatin/seqfu2
-
Navarro-Muñoz, J.C., Selem-Mojica, N., Mullowney, M.W. et al. A computational framework to explore large-scale biosynthetic diversity. Nat Chem Biol 16, 60–68 (2020)
-
antiSMASH 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation. Kai Blin, Simon Shaw, Hannah E Augustijn, Zachary L Reitz, Friederike Biermann, Mohammad Alanjary, Artem Fetter, Barbara R Terlouw, William W Metcalf, Eric J N Helfrich, Gilles P van Wezel, Marnix H Medema, Tilmann Weber. Nucleic Acids Research (2023) doi: 10.1093/nar/gkad344
-
Mash: fast genome and metagenome distance estimation using MinHash. Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Genome Biol. 2016 Jun 20;17(1):132. doi: 10.1186/s13059-016-0997-x.
-
Satria A Kautsar, Justin J J van der Hooft, Dick de Ridder, Marnix H Medema, BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters, GigaScience, Volume 10, Issue 1, January 2021, giaa154
-
Mohammad Alanjary, Katharina Steinke, Nadine Ziemert, AutoMLST: an automated web server for generating multi-locus species trees highlighting natural product potential, Nucleic Acids Research, Volume 47, Issue W1, 02 July 2019, Pages W276–W282
-
Satria A Kautsar, Kai Blin, Simon Shaw, Tilmann Weber, Marnix H Medema, BiG-FAM: the biosynthetic gene cluster families database, Nucleic Acids Research, gkaa812, https://doi.org/10.1093/nar/gkaa812
-
Alanjary,M., Kronmiller,B., Adamek,M., Blin,K., Weber,T., Huson,D., Philmus,B. and Ziemert,N. (2017) The Antibiotic Resistant Target Seeker (ARTS), an exploration engine for antibiotic cluster prioritization and novel drug target discovery. Nucleic Acids Res.,10.1093/nar/gkx360
-
eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Carlos P. Cantalapiedra, Ana Hernandez-Plaza, Ivica Letunic, Peer Bork, Jaime Huerta-Cepas. 2021. Molecular Biology and Evolution, msab293
-
eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Jaime Huerta-Cepas, Damian Szklarczyk, Davide Heller, Ana Hernández-Plaza, Sofia K Forslund, Helen Cook, Daniel R Mende, Ivica Letunic, Thomas Rattei, Lars J Jensen, Christian von Mering, Peer Bork Nucleic Acids Res. 2019 Jan 8; 47(Database issue): D309–D314. doi: 10.1093/nar/gky1085
-
Mash Screen: high-throughput sequence containment estimation for genome discovery. Ondov BD, Starrett GJ, Sappington A, Kostic A, Koren S, Buck CB, Phillippy AM. Genome Biol. 2019 Nov 5;20(1):232. doi: 10.1186/s13059-019-1841-x.
-
Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014 Jul 15;30(14):2068-9. PMID:24642063
-
Jain, C., Rodriguez-R, L.M., Phillippy, A.M. et al. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun 9, 5114 (2018). https://doi.org/10.1038/s41467-018-07641-9
-
antiSMASH 6.0: improving cluster detection and comparison capabilities. Kai Blin, Simon Shaw, Alexander M Kloosterman, Zach Charlop-Powers, Gilles P van Wezel, Marnix H Medema, & Tilmann Weber. Nucleic Acids Research (2021) doi: 10.1093/nar/gkab335.