Create your config.yaml file for ProHap

General parameters



Select transcripts

Only available for Ensembl release 108 and above. For genes that have no MANE Select transcript in Ensembl, the "Ensembl Canonical" transcript is selected instead.
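The selection rule above (MANE Select first, Ensembl Canonical as the fallback) can be sketched as follows. This is a hypothetical illustration, not ProHap's own code. It assumes the transcripts have already been parsed into (gene_id, transcript_id, tags) tuples, and that the tag strings are "MANE_Select" and "Ensembl_canonical" as written in the `tag` attributes of Ensembl GTF files.

```python
from collections import defaultdict


def select_transcripts(transcripts):
    """Hypothetical helper: pick MANE Select transcripts per gene,
    falling back to Ensembl Canonical when a gene has none.

    transcripts: iterable of (gene_id, transcript_id, set_of_tags).
    Returns a dict gene_id -> list of selected transcript IDs.
    """
    by_gene = defaultdict(list)
    for gene_id, transcript_id, tags in transcripts:
        by_gene[gene_id].append((transcript_id, tags))

    selected = {}
    for gene_id, entries in by_gene.items():
        mane = [t for t, tags in entries if "MANE_Select" in tags]
        if mane:
            selected[gene_id] = mane
        else:
            # No MANE Select transcript for this gene: use Ensembl Canonical
            selected[gene_id] = [t for t, tags in entries if "Ensembl_canonical" in tags]
    return selected


example = [
    ("G1", "T1", {"MANE_Select", "Ensembl_canonical"}),
    ("G1", "T2", set()),
    ("G2", "T3", {"Ensembl_canonical"}),
    ("G2", "T4", {"basic"}),
]
print(select_transcripts(example))  # {'G1': ['T1'], 'G2': ['T3']}
```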

The default contaminant database is provided in the crap.fasta file in this repository.

ProHap



Data source:

Default: 1000 Genomes Project on GRCh38
VCF files are expected per chromosome: in the file path, replace the chromosome number with "{chr}". Files can be either GZIP-compressed or uncompressed.
See the wiki page for details.
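To make the "{chr}" placeholder concrete, here is a minimal sketch of how such a template expands into one path per chromosome, and how a reader could accept both GZIP-compressed and plain VCFs by checking the GZIP magic bytes. The function names and the example path are hypothetical; this is not ProHap's implementation.

```python
import gzip


def expand_vcf_paths(template, chromosomes):
    """Hypothetical helper: substitute each chromosome into the "{chr}" placeholder."""
    if "{chr}" not in template:
        raise ValueError('template must contain the "{chr}" placeholder')
    # str.replace avoids str.format tripping over other braces in the path
    return [template.replace("{chr}", str(c)) for c in chromosomes]


def is_gzipped(path):
    # GZIP files always start with the two magic bytes 0x1f 0x8b
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


def open_vcf(path):
    # Open compressed and uncompressed VCFs the same way, as text
    if is_gzipped(path):
        return gzip.open(path, "rt")
    return open(path, "rt")


paths = expand_vcf_paths("data/sample.chr{chr}.vcf.gz", [*range(1, 23), "X"])
print(paths[0], paths[-1])  # data/sample.chr1.vcf.gz data/sample.chrX.vcf.gz
```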
Variants with a frequency below this threshold will not be included in haplotypes.
Name of the allele frequency (AF) column in the VCF file ("AF" by default). Change it if you want to use the frequency in a specific 1000 Genomes population, or the corresponding column in your own file.
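A minimal sketch of frequency thresholding with a configurable AF field name follows. It assumes the frequency is stored as a key=value entry in the VCF INFO string, with one comma-separated value per ALT allele at multi-allelic sites, and it treats the threshold as inclusive; both the inclusivity and the helper names are assumptions, not ProHap's documented behavior. The population key "EUR_AF" is only an example.

```python
def info_value(info, key):
    """Return the raw value of `key` in a VCF INFO string, or None if absent."""
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:  # exact match, so "AF" does not match "EUR_AF"
            return value
    return None


def alleles_passing(info, threshold, af_key="AF"):
    """Hypothetical helper: one boolean per ALT allele, True if AF >= threshold.

    Missing values (".") never pass. The inclusive >= is an assumption.
    """
    raw = info_value(info, af_key)
    if raw is None:
        return []
    return [v != "." and float(v) >= threshold for v in raw.split(",")]


print(alleles_passing("AC=3;AF=0.02,0.0004;AN=5008", 0.01))  # [True, False]
print(alleles_passing("AF=0.02;EUR_AF=0.005", 0.01, af_key="EUR_AF"))  # [False]
```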
Threshold haplotypes by
Specify 0 to skip haplotype thresholding.
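The haplotype threshold can be pictured as a filter over observed haplotypes, where 0 disables filtering. In this hypothetical sketch, each haplotype comes with the number of chromosomes carrying it, and the two criteria ("frequency" and "count") are illustrative stand-ins for whatever options the "Threshold haplotypes by" field actually offers.

```python
def filter_haplotypes(counts, total, threshold, by="frequency"):
    """Hypothetical helper: keep haplotypes meeting the threshold.

    counts: dict haplotype_id -> number of chromosomes carrying it
    total: number of chromosomes examined (used for frequencies)
    threshold: 0 disables haplotype thresholding entirely
    by: "frequency" or "count" (illustrative criteria, an assumption)
    """
    if threshold == 0:
        return dict(counts)  # 0 means: keep every haplotype
    if by == "frequency":
        return {h: c for h, c in counts.items() if c / total >= threshold}
    if by == "count":
        return {h: c for h, c in counts.items() if c >= threshold}
    raise ValueError(f"unknown criterion: {by}")


observed = {"h1": 50, "h2": 3, "h3": 1}
print(filter_haplotypes(observed, 100, 0.02))  # {'h1': 50, 'h2': 3}
print(filter_haplotypes(observed, 100, 0))     # {'h1': 50, 'h2': 3, 'h3': 1}
```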
Pseudo-autosomal regions (PAR) on the X chromosome
The default values for the GRCh38 human genome are 2781479 (end of PAR1) and 155701383 (start of PAR2). For GRCh37, use 2699520 and 154931044, respectively.
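The two PAR values bound the regions at both ends of chromosome X: positions up to the end of PAR1 and from the start of PAR2 onward. These regions matter because male samples carry two copies of them (X and Y) but only one copy of the rest of chromosome X. A minimal sketch of the position check, using the default coordinates listed above and 1-based VCF positions. The function name is hypothetical, and the check ignores the few kilobases at the chromosome ends that lie outside the PARs.

```python
# (end of PAR1, start of PAR2) on chromosome X, as given in the form
PAR_BOUNDS = {
    "GRCh38": (2781479, 155701383),
    "GRCh37": (2699520, 154931044),
}


def in_x_par(pos, assembly="GRCh38"):
    """Hypothetical helper: True if a 1-based chrX position lies in PAR1 or PAR2."""
    par1_end, par2_start = PAR_BOUNDS[assembly]
    return pos <= par1_end or pos >= par2_start


print(in_x_par(2700000))            # True  (inside PAR1 on GRCh38)
print(in_x_par(2700000, "GRCh37"))  # False (past PAR1 end on GRCh37)
```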

Transcripts that do not have an annotated canonical start codon will not be used.
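One plausible way to apply the rule above is to keep only transcripts that have a `start_codon` feature in the GTF annotation. The sketch below assumes a standard GTF layout: nine tab-separated columns, the feature type in column 3, and a `transcript_id "..."` entry in the attribute column. It is a hypothetical illustration, not ProHap's filter, and it does not check which codon the feature covers.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcripts_with_start_codon(gtf_lines):
    """Hypothetical helper: IDs of transcripts with an annotated start_codon feature."""
    found = set()
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "start_codon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            found.add(match.group(1))
    return found


gtf = [
    "#!genome-build GRCh38",
    '1\tensembl\tstart_codon\t100\t102\t.\t+\t0\tgene_id "G1"; transcript_id "T1";',
    '1\tensembl\texon\t50\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
]
print(transcripts_with_start_codon(gtf))  # {'T1'}
```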

If disabled, UTR sequences are still removed in the final optimized database but retained in the haplotype FASTA file.

If disabled, the haplotype cDNA sequences, including UTRs, are translated in three reading frames.
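Three-frame translation means translating the cDNA starting at offsets 0, 1 and 2. Below is a minimal sketch using the standard genetic code, with "*" for stop codons and "X" for codons containing non-ACGT characters. It is an illustration of the technique, not ProHap's translation code, and it does not split the resulting sequences at stop codons.

```python
# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def three_frame_translation(cdna):
    """Translate a cDNA sequence starting at offsets 0, 1 and 2."""
    return [translate(cdna[frame:]) for frame in range(3)]


print(three_frame_translation("AATGGCCTAA"))  # ['NGL', 'MA*', 'WP']
```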

Create a separate file containing all haplotype cDNA sequences before translation. If UTR variation is skipped (see above), the cDNA haplotypes begin with the canonical start codon.
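Starting a cDNA haplotype at the start codon amounts to slicing off the 5' UTR, and optionally the 3' UTR, using the coding-sequence boundaries within the cDNA. In this hypothetical sketch, `cds_start` and `cds_end` are assumed to be 0-based offsets into the cDNA, with `cds_end` exclusive. The helper names are illustrative only.

```python
def coding_part(cdna, cds_start, cds_end=None):
    """Hypothetical helper: drop the 5' UTR (and the 3' UTR if cds_end is given).

    cds_start: 0-based offset of the first base of the start codon
    cds_end: exclusive 0-based end of the coding sequence, or None to keep the 3' UTR
    """
    end = len(cdna) if cds_end is None else cds_end
    if not 0 <= cds_start < end <= len(cdna):
        raise ValueError("CDS boundaries fall outside the cDNA")
    return cdna[cds_start:end]


def starts_with_canonical_start(seq):
    # The canonical start codon is ATG
    return seq[:3].upper() == "ATG"


cds = coding_part("GGCATGAAATAGCC", 3, 12)
print(cds, starts_with_canonical_start(cds))  # ATGAAATAG True
```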

ProVar



Add your VCF files:
Specify 0 to skip thresholding.


Transcripts that do not have an annotated canonical start codon will not be used.

Create a separate file containing all the variant cDNA sequences before translation.


(e.g., one of the F2 files in the Zenodo repository)
(e.g., one of the F3 files in the Zenodo repository)

or copy the content below to your config.yaml file: