nvLighter is a GPU-accelerated re-engineering of Lighter, an error corrector with a very low memory footprint. It implements the algorithms described in:
Lighter: fast and memory-efficient sequencing error correction without counting
Li Song, Liliana Florea and Ben Langmead
Genome Biology, 2014, 15:509
- nvLighter can use multiple CPU threads and multiple GPUs concurrently to correct large sets of reads. In practice, the error correction algorithm is fast enough that I/O easily becomes the bottleneck; we therefore recommend running nvLighter on an SSD-equipped system to obtain the maximum speedup.
Usage
- At the moment, the command line options of nvLighter differ from those of Lighter, as nvLighter is designed to handle a single input file at a time. Moreover, nvLighter supports several file formats for both input and output, including plain .txt files (with one read per row), FASTA, and FASTQ. Additionally, since gzip compression is slow and can quickly become a bottleneck, nvLighter supports LZ4 compression for any of the above formats.
* ./nvLighter --help
*
* nvLighter - Copyright 2015, NVIDIA Corporation
* usage:
* nvLighter [options] input_file output_file
* options:
* -v int (0-6) [5] # verbosity level
* -zlib string [1R] # e.g. "1", ..., "9", "1R"
* -t int [auto] # number of CPU threads
* -d int [0] # add the specified GPU device
* -k k-mer genome-size alpha # error correction parameters
* -K k-mer genome-size # error correction parameters
* -maxcor # maximum correction factor
* -newQual # new quality score value
* -no-cpu # disable CPU usage
* -no-gpu # disable GPU usage
*
- For example:
* nvLighter -t 20 -d 0 -d 1 -k 31 3500000000 0.2 NA12878.fq.gz NA12878.corrected.fq.lz4
*
- will use 20 CPU threads and GPUs 0 and 1 to correct a 35x coverage human dataset and output the result to an LZ4-compressed FASTQ file; whereas the command:
* nvLighter -no-cpu -d 0 -k 31 3500000000 0.2 NA12878.fq.gz NA12878.corrected.txt.lz4
*
- will use only GPU 0 to correct the same 35x coverage human reads and output the result to an LZ4-compressed .txt file.
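- When nvLighter is driven from a script, the two invocations above can be assembled programmatically. The following is a minimal sketch, not part of nvLighter: the helper name build_nvlighter_command and its parameters are hypothetical, and it only emits flags listed in the help text above, in the same order as the examples. It assumes that -K is the variant to use when alpha is left out, as its signature in the help text suggests.

```python
def build_nvlighter_command(input_file, output_file, kmer, genome_size,
                            alpha=None, threads=None, gpus=(0,), use_cpu=True):
    """Return an argument list for nvLighter (hypothetical helper, not an nvLighter API)."""
    cmd = ["nvLighter"]

    # CPU side: either pick a thread count or disable the CPU entirely.
    if not use_cpu:
        cmd.append("-no-cpu")
    elif threads is not None:
        cmd += ["-t", str(threads)]

    # GPU side: one -d flag per device, or -no-gpu when no device is requested.
    if gpus:
        for device in gpus:
            cmd += ["-d", str(device)]
    else:
        cmd.append("-no-gpu")

    # Error correction parameters: -k takes alpha explicitly, while -K
    # (assumption based on the help text) omits it.
    if alpha is not None:
        cmd += ["-k", str(kmer), str(genome_size), str(alpha)]
    else:
        cmd += ["-K", str(kmer), str(genome_size)]

    cmd += [input_file, output_file]
    return cmd


# Reproduces the first example above as a single command line string.
print(" ".join(build_nvlighter_command(
    "NA12878.fq.gz", "NA12878.corrected.fq.lz4",
    kmer=31, genome_size=3500000000, alpha=0.2,
    threads=20, gpus=(0, 1))))
```

- The returned list can be passed directly to subprocess.run, which avoids any shell quoting issues with file names.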
- As described in the original paper, the alpha parameter should generally be set to 7.0/C, where C is the coverage of the genome; for the 35x dataset above, this gives 7.0/35 = 0.2. If alpha is not specified, it is computed with an additional streaming pass through the input reads.
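- The 7.0/C rule is easy to evaluate ahead of time, which avoids the extra streaming pass. The sketch below is illustrative only: the function names are hypothetical, and the coverage estimate assumes the usual definition of coverage as total sequenced bases divided by genome size. The read count and read length used in the example are made-up values chosen to yield 35x.

```python
def estimate_coverage(total_bases, genome_size):
    """Coverage C, assuming C = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def lighter_alpha(coverage):
    """Alpha as recommended in the text: 7.0 / C."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return 7.0 / coverage


# Hypothetical dataset: 1.225 billion reads of 100 bp over a 3.5 Gbp genome.
num_reads = 1_225_000_000
read_length = 100
genome_size = 3_500_000_000

coverage = estimate_coverage(num_reads * read_length, genome_size)
alpha = lighter_alpha(coverage)
print(coverage, alpha)  # 35.0 0.2
```

- The resulting value is what would be passed as the third argument of -k, as in the example commands above.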
Architecture
nvLighter's class hierarchy is documented in the nvLighter module.