Commit 0bf430a6 authored by Manavalan Gajapathy

updates doc

parent 274f749b
1 merge request: !1 QuaC - First major review
......@@ -2,23 +2,61 @@
🦆🦆 Don't duck that QC thingy 🦆🦆
## Who am I?
QuaC is a pipeline developed using snakemake, which runs a set of selected QC tools on NGS samples.
## What can I quac about?
* Somalier
* Relatedness
* Sex
* Ancestry
* indexcov
* (Estimated) coverage of samples
| Tool | Use |
| ----------------------------------------------------------------- | ---------------------------------------------------------- |
| [somalier](https://github.com/brentp/somalier) | Estimation of sex, ancestry and relatedness |
| [verifybamid](https://github.com/Griffan/VerifyBamID) | Estimates within-species (i.e. cross-sample) contamination |
| [mosdepth](https://github.com/brentp/mosdepth) | Fast BAM/CRAM depth calculation |
| [indexcov](https://github.com/brentp/goleft/tree/master/indexcov) | Estimate coverage from whole-genome bam or cram index |
| [covviz](https://github.com/brwnj/covviz) | Identifies large, coverage-based anomalies |
## Installation
Installation simply requires fetching the source code. The following are required:
- Git
- CGDS GitLab access
- [SSH Key for access](https://docs.uabgrid.uab.edu/wiki/Cheaha_GettingStarted#Logging_in_to_Cheaha) to Cheaha cluster
To fetch the source code, change into a directory of your choice and run:
```sh
git clone -b master \
--recurse-submodules \
git@gitlab.rc.uab.edu:center-for-computational-genomics-and-data-science/sciops/pipelines/quac.git
```
Note that this repository uses git submodules, which are pulled automatically when cloning with the above command. Simply
downloading this repository from GitLab, instead of cloning it, may not fetch the included submodules.
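If a clone was made without `--recurse-submodules`, the submodules can usually be fetched afterwards. The following is a minimal sketch, not part of QuaC itself: `init_submodules` is a hypothetical helper name, and it assumes `git` is installed and that you have access to the submodule remotes.

```sh
# Hypothetical helper (not part of QuaC): initialize submodules in an existing clone.
init_submodules() {
    repo_dir="${1:-.}"   # path to the clone; defaults to the current directory
    # Fetch and check out every submodule, including nested ones.
    git -C "$repo_dir" submodule update --init --recursive || return 1
    # Print one line per submodule; a leading "-" marks one that is still uninitialized.
    git -C "$repo_dir" submodule status --recursive
}
```

For example, `init_submodules quac` would act on a clone located in `./quac`.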
## Environment Setup
### Requirements
- Anaconda/miniconda
## How to quac?
In addition, the tools listed below, which are not available through conda, need to be installed. Static binaries are
available for both tools, so they are easy to install.
* Modify config file `configs/workflow.yaml` as needed. Note: `projects_path` and `project_name` may be the most
important ones you would care about.
* Pedigree file specific to the project is required. Should be stored as `data/raw/ped/<project_name>.ped`.
* See the header of `workflow/Snakefile` for usage instructions on how to run the workflow
- [somalier](https://github.com/brentp/somalier)
- [goleft](https://github.com/brentp/goleft)
*Note:* CGDS folks using QuaC on Cheaha may skip this step, as these tools are already installed and centrally available.
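One quick way to confirm that the tools are installed is to check whether they can be found on `PATH`. The sketch below is only an illustration: `check_tools` is a hypothetical helper, and it assumes the binaries are on `PATH`, which may not apply if your config file points to explicit installation paths instead.

```sh
# Hypothetical helper (not part of QuaC): report which tools are missing from PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # Exit status is 0 only if every tool was found.
    return "$missing"
}
```

Running `check_tools somalier goleft` returns a non-zero status if either binary cannot be found.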
### Setup config file
The workflow config file `configs/workflow.yaml` provides the paths to certain tool installations as well as to other files that
the tools require. Modify them as necessary. Refer to each QC tool's documentation for more information on the files that
it requires.
### Create conda environment
```sh
module reset
......@@ -26,12 +64,22 @@ module load Anaconda3/2020.02
# create conda environment. Needed only the first time.
conda env create --file configs/env/quac.yaml
# activate conda environment
conda activate quac
# if you need to update existing environment
# if you need to update the existing environment
conda env update --file configs/env/quac.yaml
# activate conda environment
conda activate quac
```
If the default path to `datasets_central` is going to be used (i.e. you'll be using the tool for testing and/or
development), then you'll also need to initialize the default `datasets_central` directory. This can be done by running
the following (must be done for each user):
```sh
mkdir -p $USER_SCRATCH/tmp/datasets_central_manager/datasets $USER_SCRATCH/tmp/datasets_central_manager/logs
```
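If `USER_SCRATCH` is unset or empty, the command above expands to paths under `/tmp` instead of your scratch space. A slightly more defensive variant is sketched here; `init_datasets_central` is a hypothetical helper name, not something QuaC provides, and it takes the scratch directory as an explicit argument.

```sh
# Hypothetical helper (not part of QuaC): create the default datasets_central layout.
init_datasets_central() {
    # Refuse to run without a base directory, so nothing is created under /tmp by accident.
    if [ -z "${1:-}" ]; then
        echo "error: scratch directory not given (is USER_SCRATCH set?)" >&2
        return 1
    fi
    mkdir -p "$1/tmp/datasets_central_manager/datasets" \
             "$1/tmp/datasets_central_manager/logs"
}
```

It would be called as `init_datasets_central "$USER_SCRATCH"`, once per user.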