This lesson is in the early stages of development (Alpha version)

BGCflow Tutorial: Glossary

Key Points

Introduction
  • A Snakemake environment is required to run BGCflow

  • bgcflow_wrapper provides shortcuts to commonly used Snakemake commands and to other tools integrated in BGCFlow

BGCflow project structure
  • A project requires a PEP (Portable Encapsulated Projects) configuration and a .csv file listing the genomes to analyse

  • The project is then defined in the config.yaml file

  • An example of a project config can be found in config/examples
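The genome list is an ordinary CSV sample table. The sketch below reads such a table and rejects duplicate IDs. It assumes a `genome_id` column and uses made-up genome names and a hypothetical `source` column; the authoritative column set is defined by the BGCflow PEP schema and the files in config/examples.

```python
import csv
import io

# Hypothetical samples table. Only the "genome_id" column is read here;
# the "source" column and all values are illustrative placeholders.
SAMPLES_CSV = """genome_id,source
genome_A,custom
genome_B,custom
genome_C,custom
"""


def read_genome_ids(text: str) -> list[str]:
    """Return the genome IDs listed in a PEP-style samples table."""
    reader = csv.DictReader(io.StringIO(text))
    ids = [row["genome_id"].strip() for row in reader]
    # Duplicate IDs would make per-genome outputs ambiguous, so fail early.
    duplicates = {g for g in ids if ids.count(g) > 1}
    if duplicates:
        raise ValueError(f"duplicate genome_id values: {sorted(duplicates)}")
    return ids


if __name__ == "__main__":
    print(read_genome_ids(SAMPLES_CSV))
```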

Selecting rules for analysis
  • BGCflow rules are selected by setting their value to TRUE in the rules section of the config

  • Global rules are defined in config.yaml and apply to all projects

  • Project-specific rules can be given to each project and override the global rules

  • Project-specific rules are written in a separate .yaml file
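The override behaviour can be pictured as a dictionary merge in which project values win. This is a minimal sketch, not BGCflow's own code: it assumes the override works rule by rule, and the rule names and project names are illustrative only.

```python
# Illustrative rule and project names; check config/examples for real ones.
GLOBAL_RULES = {"antismash": True, "mash": True, "bigslice": False}

PROJECTS = [
    {"name": "project_a"},  # no project-specific rules: inherits global rules
    {"name": "project_b", "rules": {"bigslice": True, "mash": False}},
]


def effective_rules(global_rules: dict[str, bool], project: dict) -> dict[str, bool]:
    """Merge global rules with a project's own rules; project values win."""
    merged = dict(global_rules)  # copy so the global settings stay untouched
    merged.update(project.get("rules", {}))
    return merged


def enabled_rules(rules: dict[str, bool]) -> list[str]:
    """Return the names of rules switched on (set to True), sorted."""
    return sorted(name for name, on in rules.items() if on)


if __name__ == "__main__":
    for project in PROJECTS:
        print(project["name"], enabled_rules(effective_rules(GLOBAL_RULES, project)))
```

Here project_a runs the global selection unchanged, while project_b switches bigslice on and mash off for itself only.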

BGCflow data structure
  • BGCflow adopts the Cookiecutter Data Science directory structure

  • Output files can be found in the data directory and are split into three stages

  • The processed directory contains most of the outputs required for downstream analysis

Part I: Exploring BiG-SLICE query result
  • BGCflow returns an edge table linking each queried BGC to its top 10 hits among the GCF models in the BiG-FAM database
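An edge table of this kind can be re-ranked to keep fewer hits per BGC, for example only the closest GCF model. The sketch assumes each edge carries a distance-like score where smaller means closer; the column layout, IDs and values are hypothetical and do not reflect the actual BGCflow output schema.

```python
from collections import defaultdict

# Hypothetical edges: (query BGC, BiG-FAM GCF model, distance; lower = closer).
EDGES = [
    ("bgc_1", "gcf_101", 850.0),
    ("bgc_1", "gcf_102", 420.5),
    ("bgc_1", "gcf_103", 1200.0),
    ("bgc_2", "gcf_101", 300.0),
    ("bgc_2", "gcf_104", 310.0),
]


def top_hits(edges, k=10):
    """Keep the k closest GCF models (smallest distance) for each query BGC."""
    by_bgc = defaultdict(list)
    for bgc, gcf, dist in edges:
        by_bgc[bgc].append((dist, gcf))
    # Sorting (distance, gcf) tuples ranks by distance, then by GCF ID on ties.
    return {bgc: [gcf for _, gcf in sorted(hits)[:k]] for bgc, hits in by_bgc.items()}


if __name__ == "__main__":
    print(top_hits(EDGES, k=1))
```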

Exploring BiG-SLICE query result
  • Different BGCflow outputs can be combined to enrich the BiG-SLICE query network
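Enriching the network amounts to joining node attributes from other outputs onto the edges. This sketch uses in-memory lookup tables in place of real BGCflow files; every table, key, taxon and BGC class shown is a hypothetical placeholder, and missing annotations are marked "unknown" rather than dropped.

```python
# Hypothetical lookup tables standing in for different BGCflow outputs.
EDGES = [("bgc_1", "gcf_102", 420.5), ("bgc_2", "gcf_101", 300.0)]
BGC_TO_GENOME = {"bgc_1": "genome_A", "bgc_2": "genome_B"}
GENOME_TAXONOMY = {"genome_A": "Streptomyces", "genome_B": "Amycolatopsis"}
BGC_CLASS = {"bgc_1": "NRPS", "bgc_2": "T1PKS"}


def enrich_edges(edges, bgc_to_genome, taxonomy, bgc_class):
    """Attach genome, genus and BGC class to each (bgc, gcf, distance) edge."""
    rows = []
    for bgc, gcf, dist in edges:
        genome = bgc_to_genome.get(bgc)  # None if the BGC is not mapped
        rows.append({
            "bgc_id": bgc,
            "gcf_id": gcf,
            "distance": dist,
            "genome_id": genome,
            "genus": taxonomy.get(genome, "unknown"),
            "bgc_class": bgc_class.get(bgc, "unknown"),
        })
    return rows


if __name__ == "__main__":
    for row in enrich_edges(EDGES, BGC_TO_GENOME, GENOME_TAXONOMY, BGC_CLASS):
        print(row)
```

The enriched rows can then be exported as an edge list with attributes for a network viewer.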

Exploring BGCFlow results with MkDocs
  • For each BGCFlow project, the results are presented as interactive, editable Markdown documentation

  • Reports are generated from Jupyter notebooks running in Python or R environments

Exploring BGCFlow result database using Metabase
  • All tabular outputs from BGCFlow are loaded and transformed into an OLAP database (DuckDB)

  • Business intelligence tools such as Metabase can be used to explore the database interactively and build SQL queries
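The kind of SQL query one might build in Metabase can be tried on a toy table. DuckDB itself is not used here: Python's built-in sqlite3 module stands in so the sketch runs without extra packages, and the `bgc` table and its columns are hypothetical rather than the actual BGCFlow database schema. The query uses only standard SQL (GROUP BY, COUNT DISTINCT), so a similar statement should also work in DuckDB.

```python
import sqlite3

# Hypothetical rows: (bgc_id, genome_id, bgc_class).
ROWS = [
    ("bgc_1", "genome_A", "NRPS"),
    ("bgc_2", "genome_A", "T1PKS"),
    ("bgc_3", "genome_B", "NRPS"),
]


def bgc_counts_by_class(rows):
    """Count BGCs and distinct genomes per BGC class, most frequent first."""
    con = sqlite3.connect(":memory:")  # stand-in for the DuckDB database
    try:
        con.execute(
            "CREATE TABLE bgc (bgc_id TEXT PRIMARY KEY, genome_id TEXT, bgc_class TEXT)"
        )
        con.executemany("INSERT INTO bgc VALUES (?, ?, ?)", rows)
        query = """
            SELECT bgc_class,
                   COUNT(*) AS n_bgcs,
                   COUNT(DISTINCT genome_id) AS n_genomes
            FROM bgc
            GROUP BY bgc_class
            ORDER BY n_bgcs DESC, bgc_class
        """
        return con.execute(query).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(bgc_counts_by_class(ROWS))
```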

Customizing BGCFlow config for analyzing fungal genomes
  • BGCFlow can be customized for analyzing fungal genomes using specific configurations

  • Running BGCFlow with these configurations allows for targeted analysis of fungal genomes

Glossary

FIXME