Firehose: legacy, hg19
- within sample:
- Counts per million (CPM) : seq-dep
- FPKM (SE: RPKM): 1. seq-dep 2. tx length
- TPM: 1. tx length 2. seq-dep
- between sample:
Firehose: legacy, hg19
Xena:
vignette: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
Interactive Venn diagram: http://www.interactivenn.net/#
Amazingly comprehensive cistrome (TF, histone, open chromatin) database and analysis tool from Shirley Liu's lab: http://cistrome.org/db/#/
GeneWeaver: https://www.geneweaver.org/
MSigDB: http://software.broadinstitute.org/gsea/msigdb/index.jsp
First things first. We noticed that Trimmomatic is not suitable for the new batch of ATAC-seq sequencing file. After trimming, short fragments(i.e. shorter than 142 bp) will be lost. It turns out that current mainstream trimming softwares are not compatible with Nova-seq and Next-seq. And we have run our sequencing on a Nova-seq. After arduous searching and debugging, we found that Fastp was designed for these two new Illumina sequencing platforms and can handle what we've encountered.
Common usage of fastp to process ATAC-seq fastq file:
fastp -i R1.fq.gz -I R2.fq.gz -o ./R1_clean.fq.gz -O ./R2_clean.fq.gz -L -g -l 15 -p -w 16
As for ATAC-seq quality control, I'm currently implementing ENCODE ATAC-seq standard. As it happens, threre is an R package encodeChIPqc for calculating all sorts of parameters mentioned in ENCODE for estimating seq quality. It should be noted that calculating PCR bottleneck coefficient should be done before removing duplicates which will remarkably increase the original PBC value.
Removing and calculating duplicates is done by picard:
picard MarkDuplicates I=in.bam O=out_dedup.bam M=out_dups.txt REMOVE_DUPLICATES=true
Then removing removing non-primary chromosome (GRCh38 only):
samtools idxstats in.dedup.bam | cut -f1 | grep -v KI | grep -v GL | grep -v MT | xargs samtools view --threads 10 -b sample.bam > out.clean.bam
If the aligner did not put chr suffix infront of chromosome number, it is also neccessary to add them manually:
samtools view -H test.bam | sed -e 's/SN:1/SN:chr1/' | sed -e 's/SN:2/SN:chr2/' | sed -e 's/SN:3/SN:chr3/' | sed -e 's/SN:4/SN:chr4/' | sed -e 's/SN:5/SN:chr5/' | sed -e 's/SN:6/SN:chr6/' | sed -e 's/SN:7/SN:chr7/' | sed -e 's/SN:8/SN:chr8/' | sed -e 's/SN:9/SN:chr9/' | sed -e 's/SN:10/SN:chr10/' | sed -e 's/SN:11/SN:chr11/' | sed -e 's/SN:12/SN:chr12/' | sed -e 's/SN:13/SN:chr13/' | sed -e 's/SN:14/SN:chr14/' | sed -e 's/SN:15/SN:chr15/' | sed -e 's/SN:16/SN:chr16/' | sed -e 's/SN:17/SN:chr17/' | sed -e 's/SN:18/SN:chr18/' | sed -e 's/SN:19/SN:chr19/' | sed -e 's/SN:20/SN:chr20/' | sed -e 's/SN:21/SN:chr21/' | sed -e 's/SN:22/SN:chr22/' | sed -e 's/SN:X/SN:chrX/' | sed -e 's/SN:Y/SN:chrY/' | sed -e 's/SN:MT/SN:chrM/' | samtools reheader - test.bam > test_chr.bam
shift coordinates, this is for ATAC-seq only:
alignmentSieve -b sample.bam -o sample.filtered.bam --minMappingQuality 10 --filterMetrics log.txt --ATACshift
I mainly refer to Harvard ATAC-seq Guidelines.
For generating signal track using Deeptools, there are a few things that should be taken into consideration:
MACS2 is an old package but it work just fine. The Harvard guideline recommends a newly developed package Genrich, but I couldn't make it call peak properly.
The MACS2 parameters are:
macs2 callpeak --treatment treat.dedup.bam --name treat -g hs --bdg -q 0.05 --keep-dup all -f BAMPE --nomodel --shift -100 --extsize 200