Commit e948b86f authored by Curtis Hendrickson

Initial commit

# optional input files
sample_list_ext.tsv
# output files
processed_data/
out_table.txt
# snakemake tmp file
.snakemake
# editor files
*~
\#*
# Snakemake CONDITIONAL
This pipeline shows how to mix local raw samples with samples that were already pre-processed remotely.
The inputs are:
* [sample_list.tsv](sample_list.tsv)
* sample_list_ext.tsv (optional - example in [not_sample_list_ext.tsv](not_sample_list_ext.tsv))
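Python code at the top of the Snakefile loads the two sample lists and creates 3 lists:
* SAMPLES_RAW (local raw samples)
* SAMPLES_PROC (externally pre-processed samples; empty if sample_list_ext.tsv is absent)
* SAMPLES_ALL (both combined)

These lists control the rest of the flow.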
Step2 merges locally processed samples with remotely processed ones, if there are any remote ones.
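As a rough illustration of why the optional list needs no special-casing in the merge step, the sketch below mimics how the Step2 inputs are assembled. `expand_paths` and `step2_inputs` are hypothetical stand-ins (plain Python, not Snakemake's own `expand`), and the sample names are taken from the example sample lists. An empty `SAMPLES_PROC` simply contributes no input files.

```python
def expand_paths(pattern, samples):
    """Hypothetical stand-in for Snakemake's expand(): one path per sample."""
    return [pattern.format(sample=s) for s in samples]


def step2_inputs(samples_raw, samples_proc):
    """Collect the files the merge step would read (local first, then external)."""
    local = expand_paths("processed_data/{sample}.txt", samples_raw)
    extra = expand_paths("extra_processed_data/{sample}.txt", samples_proc)
    return local + extra


# Local-only run: no external sample list was found, so SAMPLES_PROC is empty.
local_only = step2_inputs(["sample1", "sample2"], [])

# Mixed run: external samples add their pre-processed files to the merge.
mixed = step2_inputs(["sample1", "sample2"], ["xSample1", "xSample2"])

print(local_only)
print(mixed)
```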
## To run local data only
```
module load snakemake
snakemake
```
outputs
```
Job counts:
count jobs
1 all
2 step1_wc_raw_data
1 step2_rule
4
Job 3: # ----- STEP1: process local raw data ---------
Job 2: # ----- STEP1: process local raw data ---------
Job 1: # ----- STEP2 (merge) ---------
6 local_data/sample1.txt
13 local_data/sample2.txt
```
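Each line of the merged table is the `wc -l` result for one raw file. A minimal Python sketch of the same count follows, assuming (as `wc -l` does) that a line is anything terminated by a newline character. `count_lines` is a hypothetical helper for illustration and is not part of the pipeline.

```python
import os
import tempfile


def count_lines(file_path):
    """Count newline characters, which is what `wc -l` reports."""
    total = 0
    with open(file_path, "rb") as handle:
        # Read in fixed-size chunks so large files are not loaded at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            total += chunk.count(b"\n")
    return total


# Demo on a throwaway file holding six newline-terminated lines.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")
    with open(demo_path, "w") as handle:
        handle.write("one\ntwo\nthree\nfour\nfive\nsix\n")
    demo_count = count_lines(demo_path)
    print(f"{demo_count} {demo_path}")
```

Note that a final line without a trailing newline is not counted under this definition.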
## To run local raw data with externally processed data merged:
```
module load snakemake
ln -s not_sample_list_ext.tsv sample_list_ext.tsv
snakemake
```
outputs
```
Job counts:
count jobs
1 all
1 step2_rule
2
Job 1: # ----- STEP2 (merge) ---------
6 local_data/sample1.txt
13 local_data/sample2.txt
18 ext_sample1_preprocessed
20 ext_sample2_preprocessed
```
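The example external list also carries a `proc_file` column, whereas the Snakefile in this commit builds the external paths from the sample names. One possible alternative, sketched here with the standard-library `csv` module rather than pandas, is to read the paths directly from that column. `load_proc_files` is a hypothetical helper, and the list is assumed to be tab-separated.

```python
import csv
import io


def load_proc_files(tsv_handle):
    """Map each sample name to the path given in its proc_file column."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return {row["sample"]: row["proc_file"] for row in reader}


# In-memory copy of the example external list (assumed tab-separated).
example_tsv = io.StringIO(
    "sample\tproc_file\n"
    "xSample1\textra_processed_data/ext_sample1.txt\n"
    "xSample2\textra_processed_data/ext_sample2.txt\n"
)

proc_files = load_proc_files(example_tsv)
print(proc_files["xSample1"])
```

With a mapping like this, the merge step could take `list(proc_files.values())` as its external inputs, so sample names and file names would not have to match.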
# ----------------------------------------------------------------------
# Test conditional rules
#
# Two level sample lists
#
# "local" samples are raw data
# "external" samples are pre-processed data (lines already counted)
#
# ----------------------------------------------------------------------
#
# raw data processing: text file -> wc -l
#
# ----------------------------------------------------------------------
# 2020.03.31 curtish@uab.edu
# ----------------------------------------------------------------------
import pandas as pd
#import os.path
from os import path
# required
SAMPLE_FILE_LOCAL="sample_list.tsv"
# optional
SAMPLE_FILE_EXT="sample_list_ext.tsv"
samples = pd.read_table(SAMPLE_FILE_LOCAL).set_index("sample", drop=False)
print("loaded "+SAMPLE_FILE_LOCAL)
if path.exists(SAMPLE_FILE_EXT):
    ext_samples = pd.read_table(SAMPLE_FILE_EXT).set_index("sample", drop=False)
    print("loaded "+SAMPLE_FILE_EXT)
    SAMPLES_RAW=samples["sample"].tolist()
    SAMPLES_PROC=ext_samples["sample"].tolist()
    SAMPLES_ALL=samples["sample"].tolist() + ext_samples["sample"].tolist()
else:
    SAMPLES_RAW=samples["sample"].tolist()
    SAMPLES_PROC=[]
    SAMPLES_ALL=samples["sample"].tolist()
print("SAMPLES_RAW")
print(SAMPLES_RAW)
print("SAMPLES_PROC")
print(SAMPLES_PROC)
print("SAMPLES_ALL")
print(SAMPLES_ALL)
#
# default target
#
rule all:
    input: "out_table.txt"
    shell: "cat {input}"
# ----------------------------------------------------------------------
# STEP1: process raw files with wc -l
# ----------------------------------------------------------------------
rule step1_wc_raw_data:
    input:
        raw="local_data/{sample}.txt",
        script="Snakefile"
    output: "processed_data/{sample}.txt"
    message: "# ----- STEP1: process local raw data ---------"
    shell: "wc -l {input.raw} > {output}"
# ----------------------------------------------------------------------
# STEP2: merge processed data
# ----------------------------------------------------------------------
rule step2_rule:
    input:
        local=expand("processed_data/{sample}.txt", sample=SAMPLES_RAW),
        extra=expand("extra_processed_data/{sample}.txt", sample=SAMPLES_PROC),
        script="Snakefile"
    output: "out_table.txt"
    message: "# ----- STEP2 (merge) ---------"
    shell: "cat {input.local} {input.extra} > {output}"
18 ext_sample1_preprocessed
20 ext_sample2_preprocessed
one
two
three
four
five
six
un
deux
trois
quatre
cinq
six
sept
huit
neuf
dix
onze
beaucoup
sample proc_file
xSample1 extra_processed_data/ext_sample1.txt
xSample2 extra_processed_data/ext_sample2.txt
sample raw_file
sample1 local_data/sample1.txt
sample2 local_data/sample2.txt