NVBIO
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Groups Pages
Classes | Functions
Wavelet Trees

Detailed Description

A Wavelet Tree is a data structure that can be used to encode a string T of n symbols from an alphabet of s characters in space O(n log(s)), that allows both symbol access and ranking in O(log(s)) time, i.e:
  • each character T[i] can be recovered in O(log(s)) time
  • the number of occurrences of a character c in the substring T[0,i] can be counted in O(log(s)) time
In other words, a Wavelet Tree is both an alternative string representation (often more amenable to compression), and a storage-efficient rank dictionary. For the sake of comparison, notice that the rank_dictionary class, which is based on a standard sampled occurrence table built directly on top of the original string T, needs O(s) storage - exponentially more in the number of bits per symbol b = log(s).

Classes

struct  nvbio::WaveletTree< BitStreamIterator, IndexIterator, SymbolType >
 
struct  nvbio::WaveletTreeStorage< SystemTag, IndexType, SymbolType >
 

Functions

template<typename system_tag , typename string_iterator , typename index_type , typename symbol_type >
void setup (const index_type string_len, const string_iterator &string, WaveletTreeStorage< system_tag, index_type, symbol_type > &out_tree)
 
template<typename BitStreamIterator , typename IndexIterator , typename IndexType , typename SymbolType >
NVBIO_FORCEINLINE
NVBIO_HOST_DEVICE SymbolType 
text (const WaveletTree< BitStreamIterator, IndexIterator, SymbolType > &tree, const IndexType i)
 
template<typename BitStreamIterator , typename IndexIterator , typename SymbolType >
NVBIO_FORCEINLINE
NVBIO_HOST_DEVICE WaveletTree
< BitStreamIterator,
IndexIterator, SymbolType >
::index_type 
rank (const WaveletTree< BitStreamIterator, IndexIterator, SymbolType > &tree, const typename WaveletTree< BitStreamIterator, IndexIterator, SymbolType >::index_type i, const uint32 c)
 
template<typename BitStreamIterator , typename IndexIterator , typename SymbolType >
NVBIO_FORCEINLINE
NVBIO_HOST_DEVICE WaveletTree
< BitStreamIterator,
IndexIterator, SymbolType >
::range_type 
rank (const WaveletTree< BitStreamIterator, IndexIterator, SymbolType > &tree, const typename WaveletTree< BitStreamIterator, IndexIterator, SymbolType >::range_type range, const uint32 c)
 
template<typename SystemTag , typename IndexType , typename SymbolType >
WaveletTreeStorage< SystemTag,
IndexType, SymbolType >
::plain_view_type 
plain_view (WaveletTreeStorage< SystemTag, IndexType, SymbolType > &tree)
 
template<typename SystemTag , typename IndexType , typename SymbolType >
WaveletTreeStorage< SystemTag,
IndexType, SymbolType >
::const_plain_view_type 
plain_view (const WaveletTreeStorage< SystemTag, IndexType, SymbolType > &tree)
 

Function Documentation

template<typename SystemTag , typename IndexType , typename SymbolType >
WaveletTreeStorage< SystemTag, IndexType, SymbolType >::plain_view_type plain_view ( WaveletTreeStorage< SystemTag, IndexType, SymbolType > &  tree)
related

plain_view specialization

Definition at line 341 of file wavelet_tree.h.

template<typename SystemTag , typename IndexType , typename SymbolType >
WaveletTreeStorage< SystemTag, IndexType, SymbolType >::const_plain_view_type plain_view ( const WaveletTreeStorage< SystemTag, IndexType, SymbolType > &  tree)
related

plain_view specialization

Definition at line 350 of file wavelet_tree.h.

template<typename BitStreamIterator , typename IndexIterator , typename SymbolType >
NVBIO_FORCEINLINE NVBIO_HOST_DEVICE WaveletTree< BitStreamIterator, IndexIterator, SymbolType >::index_type rank ( const WaveletTree< BitStreamIterator, IndexIterator, SymbolType > &  tree,
const typename WaveletTree< BitStreamIterator, IndexIterator, SymbolType >::index_type  i,
const uint32  c 
)
related

fetch the number of occurrences of character c in the substring [0,i]

Parameters
treethe wavelet tree
ithe end of the query range [0,i]
cthe query character
template<typename BitStreamIterator , typename IndexIterator , typename SymbolType >
NVBIO_FORCEINLINE NVBIO_HOST_DEVICE WaveletTree< BitStreamIterator, IndexIterator, SymbolType >::range_type rank ( const WaveletTree< BitStreamIterator, IndexIterator, SymbolType > &  tree,
const typename WaveletTree< BitStreamIterator, IndexIterator, SymbolType >::range_type  range,
const uint32  c 
)
related

fetch the number of occurrences of character c in the substring [0,i]

Parameters
treethe wavelet tree
ithe end of the query range [0,i]
cthe query character
template<typename system_tag , typename string_iterator , typename index_type , typename symbol_type >
void setup ( const index_type  string_len,
const string_iterator string,
WaveletTreeStorage< system_tag, index_type, symbol_type > &  out_tree 
)
related

build a Wavelet Tree out of a string: the output consists of a bit-string representing the different bit-planes of the output Wavelet Tree, and the tree structure itself, a binary heap, recording the sequence split of each node.

Template Parameters
string_iteratorthe string type: must provide a random access iterator interface as well as define a proper stream_traits<StringType> expansion; particularly, stream_traits<StringType>::SYMBOL_SIZE is used to infer the number of bits needed to represent the symbols in the string's alphabet
template<typename BitStreamIterator , typename IndexIterator , typename IndexType , typename SymbolType >
NVBIO_FORCEINLINE NVBIO_HOST_DEVICE SymbolType text ( const WaveletTree< BitStreamIterator, IndexIterator, SymbolType > &  tree,
const IndexType  i 
)
related

fetch the text character at position i in the wavelet tree

Parameters
treethe wavelet tree
ithe index of the character to extract

Definition at line 457 of file wavelet_tree_inl.h.