Tools for processing fastq C-HiC files

Map and parse reads

hicup_tool

class CHiC.tool.hicup_tool.hicup(configuration=None)[source]

Tool to run hicup, from fastq to bam files

digest_genome(genome_name, re_enzyme, genome_loc, re_enzyme2)[source]

Digests a genome using the specified restriction enzyme.

Parameters:
  • genome_name (str) – name of the output genome
  • re_enzyme (str) – cut site and name of the enzyme used to cut the genome, in the format A^GATCT,BglII.
  • genome_loc (str) – location of the genome in FASTA format
  • re_enzyme2 (str) – second, optional restriction enzyme. It is used only when the Hi-C protocol performs a second enzymatic digestion instead of shearing the DNA by another technique such as sonication; typically the sonication protocol is followed. This restriction site does NOT form a Hi-C ligation junction.
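
The re_enzyme format above (cut site with a caret, a comma, then the enzyme name) can be illustrated with a minimal sketch. The helper names parse_re_enzyme and digest are hypothetical and are not part of CHiC or HiCUP; the digest is a simplified forward-strand search on a toy sequence, not the algorithm the HiCUP digester uses.

```python
def parse_re_enzyme(re_enzyme):
    """Split a string such as 'A^GATCT,BglII' into (site, cut_offset, name)."""
    pattern, name = re_enzyme.split(",")
    cut_offset = pattern.index("^")   # position of the cut inside the site
    site = pattern.replace("^", "")   # recognition sequence without the caret
    return site, cut_offset, name


def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site` and return the fragments."""
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments


sequence = "CCCAGATCTGGGAGATCTTT"  # toy sequence, illustration only
site, cut_offset, name = parse_re_enzyme("A^GATCT,BglII")
fragments = digest(sequence, site, cut_offset)
print(name, site, fragments)
```

Joining the fragments back together reproduces the input sequence, which is a quick sanity check for any digest of this kind.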
static get_hicup_params(params)[source]

Handles the extraction of command-line parameters and formats them for use with hicup.

Parameters:params (dict) –
--bowtie Specify the path to Bowtie
--bowtie2 Specify the path to Bowtie 2
--config Specify the configuration file
--digest Specify the digest file listing restriction fragment co-ordinates
--example Produce an example configuration file
--format Specify FASTQ format. Options: Sanger, Solexa_Illumina_1.0, Illumina_1.3, Illumina_1.5
--help Print help message and exit
--index Path to the relevant reference genome Bowtie/Bowtie2 indices
--keep Keep intermediate pipeline files
--longest Maximum allowable insert size (bps)
--nofill Hi-C protocol did NOT include a fill-in of sticky ends prior to the ligation step, and therefore FASTQ reads shall be truncated at the Hi-C restriction enzyme cut site sequence (if present)
--outdir Directory to write output files
--quiet Suppress progress reports (except warnings)
--shortest Minimum allowable insert size (bps)
--temp Write intermediate files (i.e. all except summary files and files generated by HiCUP Deduplicator) to a specified directory
--threads Specify the number of threads, allowing simultaneous processing of multiple files
--version Print the program version and exit
--zip Compress output
Returns:
Return type:list
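
As a rough illustration of turning a params dict into a list of command-line arguments, consider the sketch below. It assumes the dictionary keys are the option names without the leading dashes; that naming, the helper name build_hicup_args, and the choice of which options count as value-less switches are assumptions made for the example, and the real get_hicup_params may use different keys.

```python
def build_hicup_args(params, switches=("keep", "nofill", "quiet", "zip")):
    """Convert a dict of options into a flat list of command-line arguments.

    Keys listed in `switches` are emitted without a value when truthy;
    every other key is emitted as '--key value'.
    """
    args = []
    for key in sorted(params):          # sorted for a reproducible order
        value = params[key]
        if key in switches:
            if value:
                args.append("--" + key)
        elif value is not None:
            args.extend(["--" + key, str(value)])
    return args


example = build_hicup_args(
    {"threads": 4, "zip": True, "keep": False, "outdir": "out", "longest": 800}
)
print(example)
```

Returning a list rather than a single string keeps each argument separate, so the result can be handed to subprocess without shell quoting problems.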
hicup_alig_filt(**kwargs)[source]

Aligns the Hi-C reads to a reference genome and filters them.

Parameters:
  • bowtie2_loc
  • genome_index (str) – location of genome indexed with bowtie2
  • digest_genome (str) – location of genome digested
  • fastq1 (str) – location of fastq1 file
  • fastq2 (str) – location of fastq2
Returns:

Return type:

bool

run(input_files, input_metadata, output_files)[source]

Runs the tool and passes the parameters to all the functions.

Parameters:
  • input_files (dict) –
  • input_metadata (dict) –
  • output_files (dict) –
untar_index(**kwargs)[source]

Extracts the Bowtie2 index files from the genome index tar file.

Parameters:
  • genome_file_name (str) – Location string of the genome fasta file
  • genome_idx (str) – Location of the Bowtie2 index file
  • bt2_1_file (str) – Location of the <genome>.1.bt2 index file
  • bt2_2_file (str) – Location of the <genome>.2.bt2 index file
  • bt2_3_file (str) – Location of the <genome>.3.bt2 index file
  • bt2_4_file (str) – Location of the <genome>.4.bt2 index file
  • bt2_rev1_file (str) – Location of the <genome>.rev.1.bt2 index file
  • bt2_rev2_file (str) – Location of the <genome>.rev.2.bt2 index file

Returns:Boolean indicating if the task was successful
Return type:bool
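
The extraction step can be sketched with the standard tarfile module. This is an illustrative stand-in, not the implementation of untar_index: the helper name extract_bt2_index, the out_dir argument and the returned (found, missing) pair are assumptions made for the example.

```python
import os
import shutil
import tarfile

# The ".rev." suffixes come first because "x.rev.1.bt2" also ends with ".1.bt2".
BT2_SUFFIXES = (".rev.1.bt2", ".rev.2.bt2", ".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2")


def extract_bt2_index(genome_idx, out_dir):
    """Copy the Bowtie2 index files found in a tar archive into out_dir.

    Returns (found, missing): a dict mapping each suffix to the extracted
    path, and a list of expected suffixes that were not in the archive.
    """
    found = {}
    with tarfile.open(genome_idx) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            base = os.path.basename(member.name)
            suffix = next((s for s in BT2_SUFFIXES if base.endswith(s)), None)
            if suffix is None:
                continue
            # Write under out_dir using only the base name, so paths stored
            # in the archive cannot place files outside the target folder.
            target = os.path.join(out_dir, base)
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            found[suffix] = target
    missing = [s for s in BT2_SUFFIXES if s not in found]
    return found, missing
```

A caller could treat a non-empty missing list as a failed task, which mirrors the boolean success flag described above.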

Create CHiCAGO input files

makeRmap

makeBaitmap

makeDesignFiles

class CHiC.tool.makeDesignFiles.makeDesignFilesTool(configuration=None)[source]

Tool for making the design files that are part of the input for Chicago capture Hi-C

static get_design_params(params)[source]

Handles the Chicago parameters, selecting the ones provided and passing them to the command line.

makeDesignFiles(**kwargs)[source]

Makes the design files and stores them in the specified design folder. It is a wrapper of makeDesignFiles.py.

Parameters:
  • designDir (str) – Path to the folder with the output files (recommended: the same folder as the .rmap and .baitmap files).
  • parameters (dict) – parameters already selected by get_makeDesignFiles_params().
Returns:

  • bool
  • outFilePrefix (str) – writes the output files in the defined location

run(input_files, input_metadata, output_files)[source]

The main function to run makeDesignFiles.

Parameters:
  • input_files (dict) – designDir : path to the designDir containing .rmap and .baitmap files
  • input_metadata (dict) –
  • output_files (dict) –
    outFilePrefix : path to the output folder and prefix name of files
    example: “/folder1/folder2/prefixname”. Recommended to use the path to designDir and the same prefix as .rmap and .baitmap
Returns:

  • output_files (dict) – List of location for the output files.
  • output_metadata (dict) – List of matching metadata dict objects.
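
The recommendation to reuse the designDir path and the .rmap/.baitmap prefix for outFilePrefix can be sketched as follows. The helper design_prefix is hypothetical, and the rule that the folder holds exactly one file of each kind is an assumption made to keep the example small.

```python
import glob
import os


def design_prefix(design_dir):
    """Derive an outFilePrefix from the .rmap/.baitmap pair in design_dir."""
    rmaps = sorted(glob.glob(os.path.join(design_dir, "*.rmap")))
    baitmaps = sorted(glob.glob(os.path.join(design_dir, "*.baitmap")))
    if len(rmaps) != 1 or len(baitmaps) != 1:
        raise ValueError("expected exactly one .rmap and one .baitmap file")
    prefix = os.path.splitext(rmaps[0])[0]
    if os.path.splitext(baitmaps[0])[0] != prefix:
        raise ValueError(".rmap and .baitmap files do not share a prefix")
    return prefix
```

For a folder containing sample.rmap and sample.baitmap, the returned value is the folder path joined with "sample".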

Convert bam files into chicago input

bam2chicago

class CHiC.tool.bam2chicago_tool.bam2chicagoTool(configuration=None)[source]

Tool for preprocessing the input files

bam2chicago(**kwargs)[source]

Main function that preprocesses the BAM files into Chinput files, which are part of the input for CHiCAGO. It is a wrapper of bam2chicago.sh.

Parameters:
  • bamFile (str,) – path to paired-end file produced by a HiC aligner; Chicago has only been tested with data produced by HiCUP (http://www.bioinformatics.babraham.ac.uk/projects/hicup/). However, it should theoretically be possible to use other HiC aligners for this purpose.
  • rmapFile (str,) – A tab-separated file of the format <chr> <start> <end> <numeric ID>, describing the restriction digest (or “virtual digest” if pooled fragments are used). These numeric IDs are referred to as “otherEndID” in Chicago. All fragments mapping outside of the digest coordinates will be disregarded by both these scripts and Chicago.
  • baitMapFile (str,) – Tab-separated file of the format <chr> <start> <end> <numeric ID> <annotation>, listing the coordinates of the baited/captured restriction fragments (should be a subset of the fragments listed in rmapfile), their numeric IDs (should match those listed in rmapfile for the corresponding fragments) and their annotations (such as, for example, the names of baited promoters). The numeric IDs are referred to as “baitID” in Chicago.
  • chinput (str) – name of the output file. bam2chicago creates a folder with the name of this sample, and inside the folder there is a file named chinput.chinput, which is the final output.
Returns:

  • bool
  • chinput (str,) – name of the sample
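
The relationship between the two design files described above (every baited fragment must appear in the rmap with the same numeric ID) can be checked with a short sketch. The function names and the toy coordinates are invented for illustration; neither is part of CHiC or Chicago.

```python
def parse_rmap(lines):
    """Map (chr, start, end) to the numeric ID for rmap-style lines."""
    fragments = {}
    for line in lines:
        if not line.strip():
            continue
        chrom, start, end, frag_id = line.rstrip("\n").split("\t")[:4]
        fragments[(chrom, int(start), int(end))] = int(frag_id)
    return fragments


def check_baitmap(rmap_lines, baitmap_lines):
    """Return baitmap rows missing from the rmap or carrying a different ID."""
    fragments = parse_rmap(rmap_lines)
    problems = []
    for line in baitmap_lines:
        if not line.strip():
            continue
        chrom, start, end, bait_id, annotation = line.rstrip("\n").split("\t")[:5]
        key = (chrom, int(start), int(end))
        if fragments.get(key) != int(bait_id):
            problems.append((key, int(bait_id), annotation))
    return problems


# Toy data: three restriction fragments, then consistent and inconsistent baits.
rmap = ["chr1\t1\t1000\t1", "chr1\t1001\t2500\t2", "chr1\t2501\t4000\t3"]
baitmap_ok = ["chr1\t1001\t2500\t2\tGENE_A"]
baitmap_bad = ["chr1\t1001\t2500\t7\tGENE_A", "chr2\t1\t500\t9\tGENE_B"]

print(check_baitmap(rmap, baitmap_ok))
print(check_baitmap(rmap, baitmap_bad))
```

An empty result means the baitmap is a consistent subset of the rmap; any returned row points at a fragment that would not be usable as a bait.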

run(input_files, input_metadata, output_files)[source]

Runs bam2chicago and passes the parameters to it.

Parameters:
  • input_files (dict) –
  • hicup_outdir_tar (str) –
  • rmapFile (str) –
  • baitmapFile (str) –
  • input_metadata (dict) –
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (list) – List of matching metadata dict objects.

sort_chicago(**kwargs)[source]

Sorts the BAM file by read name, as bam2chicago requires.

Parameters:
  • bamfile (str) –
  • sorted_bam (str) –
Returns:

sorted_bam

Return type:

str
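
A name sort of this kind is commonly done with samtools sort -n. The sketch below assumes samtools is installed and on the PATH; whether sort_chicago itself calls samtools is not stated here, so treat this as one possible approach. The helper names are hypothetical.

```python
import shutil
import subprocess


def name_sort_command(bamfile, sorted_bam):
    """Build the samtools command that sorts a BAM file by read name."""
    # -n sorts by read name instead of coordinate; -o names the output file.
    return ["samtools", "sort", "-n", "-o", sorted_bam, bamfile]


def sort_by_read_name(bamfile, sorted_bam):
    """Run the name sort and return the path of the sorted BAM file."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on the PATH")
    subprocess.run(name_sort_command(bamfile, sorted_bam), check=True)
    return sorted_bam
```

Building the command separately from running it makes the call easy to log and to test without a BAM file at hand.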

static untar_hicup_out(hicup_outdir_tar, bam_name)[source]

Untars the hicup output folder

Parameters:
  • hicup_outdir_tar (str) – path to hicup output folder
  • path_bam (str) – path to bam file
Returns:

Return type:

bool

Normalize data and call C-HiC peaks

run_chicago

class CHiC.tool.run_chicago.ChicagoTool(configuration=None)[source]

Tool for running the CHiCAGO algorithm

chicago(**kwargs)[source]

Runs Chicago and annotates the Capture-HiC peaks. Chicago will create 4 folders under the output_prefix:

  • data –
    output_index.Rds : Chicago data saved in Rds format
    output_index_params.txt : parameters used to run Chicago
    output_index.export_format : Chicago output in the chosen format
  • diag_plots – 3 plots to assess the quality of the output (see the CHiCAGO Capture-HiC documentation for details)
  • enrichment_data – files for the feature enrichment output (if it is used)
  • examples – output_index_proxExamples.pdf : randomly chosen peaks showing interacting regions

See http://regulatorygenomicsgroup.org/chicago for more information.

Parameters:
  • input_files (str, or comma-separated list if there is more than one replicate) –
  • output_prefix (str) –
  • output_dir (str (whole path for the output)) –
  • params (dict) –
Returns:

writes the output files in the defined location

Return type:

bool

static get_chicago_params(params)[source]

Handles the extraction of command-line parameters and formats them for use with Chicago.

Parameters:params (dict) –
Returns:
Return type:list
run(input_files, input_metadata, output_files)[source]

The main function to run Chicago for peak calling. The input files are .chinput files, which are produced from BAM files using bam2chicago.sh. The input can be a single file or a comma-separated list of files from more than one biological replicate. Technical replicates should be pooled into one .chinput file.

Parameters:
  • input_files (dict) – list of .chinput files, or str with a single .chinput file
  • input_metadata (dict) –
  • output_files (dict with the output path) –
Returns:

  • output_files (Dict) – List of locations for the output files,
  • output_metadata (Dict) – List of matching metadata dict objects
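
Because the .chinput input may arrive as a single path, a comma-separated string or a list, a small normalising step helps before anything is passed on. The function below is a hypothetical helper written for illustration, not part of ChicagoTool.

```python
def normalise_chinput(input_files):
    """Return a list of .chinput paths from a str, a comma-separated str or a list."""
    if isinstance(input_files, str):
        paths = [p.strip() for p in input_files.split(",")]
    else:
        paths = [str(p).strip() for p in input_files]
    paths = [p for p in paths if p]  # drop empty entries, e.g. a trailing comma
    bad = [p for p in paths if not p.endswith(".chinput")]
    if bad:
        raise ValueError("not .chinput files: " + ", ".join(bad))
    return paths


print(normalise_chinput("rep1.chinput, rep2.chinput"))
```

Joining the returned list with commas gives back the comma-separated form described above for several biological replicates.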

static untar_chinput(chinput_tar)[source]

Takes the tarred chinput files as input and untars them.

Parameters:chinput_tar (str) – path to the tar file; the files inside it should have the same prefix name as the tar file
Returns:
Return type:list of untarred files