- This page will teach you to familiarize with NVBIO's notion of string sets and with some basic parallel constructs. Assume you have the string from the previous example, but now want to extract all of its 3-mers, for example to use them in a seed & extend aligner. The collection of all 3-mers can be thought of as a set of substrings, or infixes of your original string; NVBIO represents such an object as an InfixSet. Moreover, it provides a function to extract the coordinates of such seeds in parallel.
using namespace nvbio;
{
const char dna_string[] = "ACGTTGCA";
nvbio::assign( len, nvbio::from_string<DNA>( dna_string ), h_dna.begin() );
len,
d_seed_coords );
infix_set_type d_seeds(
n_seeds,
d_dna.begin(),
d_seed_coords.begin() );
...
- At this point it would be nice to do something with the string-set we just created. For the sake of the example, we will just print all its strings. Technically, we could just loop through the strings in the set, and print them on the host. However, suppose we want to do it in the native space of the set, i.e. on the device, and suppose we want to do it in parallel: how can we do this? Well, we will do it here using a parallel primitive, and particularly the for_each() construct:
...
nvbio::for_each<device_tag>(
n_seeds,
seeds.begin(),
print_strings<infix_set_type>() );
}
- for_each() takes a functor encoding the body of our loop, and applies it to all elements of a sequence. In this case, the sequence is the string-set, and the body of our loop will be the following:
template <typename string_set_type>
struct print_strings
{
typedef typename string_set_type::string_type string_type;
void operator() (const string_type string) const
{
char seed[4];
nvbio::to_string<DNA>(
string,
seed );
printf("%s\n", seed);
}
};
- Notice how we marked our print_strings methods with the NVBIO_HOST_DEVICE qualifier: this tells the CUDA compiler that these methods can be compiled both for the host and the device.
It is also worth mentioning that if we kept our vectors on the host (i.e. replacing the device_tag specifiers with host_tag), everything would have still run in parallel, except it would have run on the host.
Next: All-Mapping using an FM-index Top: NVBIO