This lesson is in the early stages of development (Alpha version)

BGCflow data structure

Overview

Teaching: 0 min
Exercises: 5 min
Questions
  • Where can I find the analysis results of my run?

Objectives
  • Find the relevant BGCflow analysis results

Exploring the output of BGCflow

In this workshop, we have finished running all analyses for our s_venezuelae project. You can find the results on the VM in /datadrive/bgcflow/data.

tree -L 2 /datadrive/bgcflow/data/

You can also generate a symlink to that directory so you can explore it using VS Code. To look one level deeper into the directory tree, run:

tree -L 3 /datadrive/bgcflow/data/
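The tree command only lists the directory; it does not create a link. A minimal sketch of making the symlink mentioned above is shown here. The data path comes from this lesson, while the link location (`$HOME/bgcflow_data`) and the variable names are assumptions chosen for illustration; adjust them to whichever folder you open in VS Code.

```shell
# Path to the BGCflow output directory on the workshop VM (from this lesson).
BGCFLOW_DATA="${BGCFLOW_DATA:-/datadrive/bgcflow/data}"
# Hypothetical link location; any folder you open in VS Code would work.
LINK_PATH="${LINK_PATH:-$HOME/bgcflow_data}"

# -s creates a symbolic link, -f replaces an existing link,
# -n treats an existing link to a directory as the link itself.
ln -sfn "$BGCFLOW_DATA" "$LINK_PATH"

# Show the link and the path it points to.
ls -ld "$LINK_PATH"
```

Opening the linked folder in VS Code then shows the same contents as the original data directory, without copying any files.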

BGCflow adopts the cookiecutter data science directory structure. Output files can be found in the data directory and are split into three stages, which in the cookiecutter convention are named raw, interim, and processed.

Give yourself time to look through the different output directories.
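To get a quick overview before browsing, it can help to count how many files each top-level directory holds. The sketch below assumes the data path used in this lesson; the function name `summarize_stages` is hypothetical and not part of BGCflow.

```shell
# Print each top-level directory under the data directory with its file count.
# The function name is made up for this example; the default path is the
# workshop location given in this lesson.
summarize_stages() {
    local data_dir="${1:-/datadrive/bgcflow/data}"
    local stage n_files
    for stage in "$data_dir"/*/; do
        # Skip the unexpanded glob when the directory is missing or empty.
        [ -d "$stage" ] || continue
        # Count regular files at any depth; strip padding that some wc versions add.
        n_files=$(find "$stage" -type f | wc -l | tr -d ' ')
        printf '%s\t%s files\n' "$(basename "$stage")" "$n_files"
    done
}

# Summarize the default workshop location (prints nothing if it does not exist).
summarize_stages
```

Passing a different path as the first argument, for example `summarize_stages ~/bgcflow_data`, summarizes that directory instead.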

Which files are important?

It depends. Different research questions require different analyses, and therefore different files are needed for downstream work. In the Natural Products Genome Mining group, we aim to help students and researchers by providing example Jupyter notebooks for processing each output type. This is a work in progress, and we welcome anyone who would like to contribute.

In the next session, I will show an example of exploratory data analysis on BiG-SLICE query results against the BiG-FAM database.

Key Points

  • BGCflow adopts the cookiecutter data science directory structure

  • Output files can be found in the data directory and are split into three different stages

  • The processed directory contains most of the output required for downstream analysis