: This module contains a series of functions to perform string alignment. Most of them can be called both from host (CPU) and device (GPU) code, although some target only the device (e.g. the warp-parallel ones), and others are host functions that execute various kernels on the device (e.g. the ones for Batch Alignment).
Generally speaking, all these functions are parametrized over the alignment algorithm, which is specified by an Aligner object. The aligner encodes the scheme used to score character matches, mismatches, insertions and deletions. Currently, there are three types of aligners, Edit Distance, Smith-Waterman and Gotoh.

: The main functionalities are scoring, traceback and batch alignment. The following is a brief overview; for a detailed list of all classes and functions see the Alignment Module documentation.

Scoring

: Scoring is the process of computing the dynamic programming matrix of the scores of all possible alignments between a pattern string P and a text string T (roughly speaking, the cell (i,j) of the DP matrix encodes the best possible score between the suffix T[0:i] and the suffix P[0,j]). The scoring functions do not compute the actual alignments, they just assign scores to all possible valid alignments and identify them by their last cell. These, are passed to an Alignment Sink, a user-defined delegate responsible for handling the results.

The module exposes several variants of the following functions:

: according to whether one wants to process a pattern with qualities or not, whether he's interested in a single best score or the best N, and so on. To see a concrete usage example, consider the following code snippet:

typedef vector_view<const char*> const_string;
const_string  text = make_string("AAAAGGGTGCTCAA");
const_string  pattern  = make_string("GGGTGCTCAA");
const int32 ed = aln::banded_alignment_score<5>(
       aln::make_edit_distance_aligner<aln::SEMI_GLOBAL>(), // build an edit-distance aligner
       pattern,                                             // pattern string
       text,                                                // text string
       -255 );                                              // minimum accepted score
const aln::SimpleGotohScheme scoring(                       // build a Gotoh scoring scheme
     2,                                                     // match bonus
    -1,                                                     // mismatch penalty
    -1,                                                     // gap open penalty
    -1 )                                                    // gap extension penalty
aln::Best2Sink<int32> best2;                                // keep the best 2 scores
aln::banded_alignment_score<5>(
       aln::make_gotoh_aligner<aln::LOCAL>( scoring ),      // build a local Gotoh aligner
       pattern,                                             // pattern string
       text,                                                // text string
       -255,                                                // minimum accepted score
       best2 );                                             // alignment sink

Aligners and Alignment Algorithms

: In the previous section we have seen that scoring functions take an aligner as an input. NVBIO exposes three such types, which specify the actual type of alignment scheme to use:

: These objects are parameterized by an AlignmentType, which can be any of GLOBAL, SEMI_GLOBAL or LOCAL, and an Algorithm Tag, which specifies the actual algorithm to employ. At the moment, there are three such algorithms:

PatternBlockingTag : a DP algorithm which blocks the matrix in stripes along the pattern
TextBlockingTag : a DP algorithm which blocks the matrix in stripes along the text
MyersTag : the Myers bit-vector algorithm, a very fast algorithm to perform edit distance computations

Traceback

: Traceback is the process of computing the actual alignment between two strings, that is to say the series of edits (character replacements, insertions and deletions) necessary to obtain a given alignment. Traceback functions can either report a single optimal alignment, or enumerate all possible alignments. Once again, the output alignments are built implicitly, calling a user-defined Backtracer delegate.

Again, the module exposes both banded and whole-matrix traceback:

Backtracer Model

: A backtracer is a user-defined delegate implementing the following interface:

struct Backtracer
{
    // soft-clip the next n characters
    void clip(const uint32 n);
    // push the given string edit
    void push(const DirectionVector op);  // op = {SUBSTITUTION|INSERTION|DELETION}
};

: Notice that traceback algorithms will push operations backwards, starting from the end of the alignment and proceeding towards the source. The very first and very last operations will be calls to clip(n), potentially with n = 0, signalling the length of the soft-clipping areas respectively at the end and beginning of the pattern.

Memory Management

Some of the functions in this module require a certain amount of temporary storage, for example to store a column of the DP matrix while scoring it, or a stripe of the flow matrix for traceback. On a CPU, running a modest amount of threads, one would typically either allocate this storage on the heap inside the function itself and free it before returning, or require the user to pass an allocator. On a GPU, where thousands of threads might call into the same function at once, both strategies would likely be very inefficient because of resource contention (an allocator would have to do some form of locking to be thread safe). Instead, nvbio delegates the responsibility to the caller, who needs to pass the necessary temporary storage (e.g. a matrix column) to the function. The temporary storage type is a template iterator, meaning that the user is able to control freely and exactly what kind of memory is passed and its addressing scheme (for example, one could pass a plain local memory array, but also a strided_iterator pointing to some global memory arena). Convenience functions are available to let nvbio do the necessary allocations at compile-time in local memory, but the user needs to be aware of the fact that:

1. the maximum problem size needs to be statically provided at compile-time
2. when calling into these functions from a large thread grid, the amount of allocated local memory might be significant. This is particularly problematic for traceback functions, which can easily need up to ~50KB per thread.

Batch Alignment

: Besides offering functions to compute an individual alignment between a single pattern and a single text (from within a thread or a warp), this module offers parallel execution contexts for performing larger batches of alignment jobs on the device with a single call. Under the hood, these contexts dispatch the parallel execution invoking one or multiple kernels, possibly splitting the computations of each alignment in many smaller chunks so as to load balance and re-gather execution coherence. The parallel algorithm employed is specified at compile-time by a Batch Scheduler.

: The following code snippet shows an example using the convenience function batch_alignment_score() :

const uint32 n_strings   = 10000;
const uint32 pattern_len = 100;
const uint32 text_len    = 200;
// build two concatenated string sets
thrust::device_vector<uint8>  pattern_storage( n_strings * pattern_len );
thrust::device_vector<uint32> pattern_offsets( n_strings+1 );
thrust::device_vector<uint8>  text_storage( n_strings * text_len );
thrust::device_vector<uint32> text_offsets( n_strings+1 );
// fill their content with random characters in [0,4)
const uint32 ALPHABET_SIZE = 4u;
fill_random( ALPHABET_SIZE, pattern_storage.begin(), pattern_storage.end() );
fill_random( ALPHABET_SIZE, text_storage.begin(), text_storage.end() );
// prepare their offset vectors
thrust::sequence( pattern_offsets.begin(), pattern_offsets.begin() + n_strings+1, 0u, pattern_len );
thrust::sequence( text_offsets.begin(),    text_offsets.begin()    + n_strings+1, 0u, text_len );
// prepare a vector of alignment sinks
thrust::device_vector< aln::BestSink<uint32> > sinks( n_strings );
// and execute the batch alignment
aln::batch_alignment_score(
    aln::make_edit_distance_aligner<aln::SEMI_GLOBAL, MyersTag<ALPHABET_SIZE> >(),
    make_concatenated_string_set( n_strings, pattern_storage.begin(), pattern_offsets.begin() ),
    make_concatenated_string_set( n_strings, text_storage.begin(),    text_offsets.begin() ),
    sinks.begin(),
    aln::DeviceThreadScheduler() );

Technical Overview

: A complete list of the classes and functions in this module is given in the Alignment Module documentation.