1 | Multicriteria Learned Data Structures

Repetition- and linearity-aware rank/select dictionaries

We revisit the fundamental problem of compressing an integer dictionary that supports efficient rank and select operations by exploiting two kinds of regularities arising in real data: _repetitiveness_ and _approximate linearity_. Our first …

Compressing and Indexing Aligned Readsets

Compressed full-text indexes are one of the main success stories of bioinformatics data structures but even they struggle to handle some DNA readsets. This may seem surprising since, at least when dealing with short reads from the same individual, …

Efficiently merging $r$-indexes

Large sequencing projects, such as GenomeTrakr and MetaSub, are updated frequently (sometimes daily, in the case of GenomeTrakr) with new data. Therefore, it is imperative that any data structure indexing such data supports efficient updates. Toward …

PHONI: Streamed Matching Statistics with Multi-Genome References

Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which …

Compression Strategies and Space Conscious Representations for Deep Neural Networks

Recent advances in deep learning have made available large, powerful convolutional neural networks (CNN) with state-of-the-art performance in several real-world applications. Unfortunately, these large-sized models have millions of parameters, thus …

Reproducing the sparse Huffman Address Map compression for deep neural networks

Deploying large convolutional neural networks (CNNs) on limited-resource devices is still an open challenge in the big data era. To deal with this challenge, a synergistic composition of network compression algorithms and compact storage of the …

A “learned” approach to quicken and compress rank/select dictionaries

We address the well-known problem of designing, implementing and experimenting compressed data structures for supporting $\textsf{rank}$ and $\textsf{select}$ queries over a dictionary of integers. This problem has been studied far and wide since the …

PFP Compressed Suffix Trees

Prefix-free parsing (PFP) was introduced by Boucher et al. (2019) as a preprocessing step to ease the computation of Burrows-Wheeler Transforms (BWTs) of genomic databases. Given a string $S$, it produces a dictionary $D$ and a parse $P$ of …

Identifying the $k$ Best Targets for an Advertisement Campaign via Online Social Networks

We propose a novel approach for the recommendation of possible customers (users) to advertisers (e.g., brands) based on two main aspects: (i) the comparison between On-line Social Network profiles, and (ii) neighborhood analysis on the On-line Social …

Practical Random Access to SLP-Compressed Texts

Data compression is a powerful tool for managing massive but repetitive datasets, especially schemes such as grammar-based compression that support computation over the data without decompressing it. In the best case such a scheme takes a dataset so …