The ever growing need to efficiently store, retrieve and analyze massive datasets, originated by very different sources, is currently made more complex by the different requirements posed by users and applications. Such a new level of complexity cannot be handled properly by current data structures for big data problems.

To successfully meet these challenges, we launched a project, funded by the Italian Ministry of University and Research (PRIN no. 2017WR7SHH), that will lay down the theoretical and algorithmic-engineering foundations of a new generation of Multicriteria Data Structures and Algorithms. The multicriteria feature refers to the fact that we wish to seamlessly integrate, via a principled optimization approach, modern compressed data structures with new, revolutionary, data structures learned from the input data by using proper machine-learning tools. The goal of the optimization is to select, among a family of properly designed data structures, the one that “best fits” the multiple constraints imposed by its context of use, thus eventually dominating the multitude of trade-offs currently offered by known solutions, especially in the realm of Big Data applications.

What is a multicriteria data structure?

A multicriteria data structure, for a given problem $P$, is defined by a pair $\langle \mathcal F, \mathcal A \rangle_P$ where $\mathcal F$ is a family of data structures, each one solving $P$ with a proper trade-off in the use of some resources (e.g. time, space, energy), and $\mathcal A$ is an optimisation algorithm that selects in $\mathcal F$ the data structure that best fits an instance of $P$.

Family of data structures
Family of data structures
Computational resources
Computational resources
Optimisation algorithm
Optimisation algorithm

For more details on the project, have a look at its full description here.


Quickly discover relevant content by filtering publications.
Learned data structures. Oneto L., Navarin N., Sperduti A., Anguita D. (eds) Recent Trends in Learning From Data. Studies in Computational Intelligence, vol 896. Springer, 2020.




A grammar compressor for huge files with many repetitions.

Block-epsilon tree

A compressed rank/select dictionary exploiting approximate linearity and repetitiveness.


A Combination of Convolutional and Recurrent Deep Neural Networks for Nucleosome Positioning Identification.


An extensible framework to efficiently compute alignment-free functions on a set of large genomic sequences.


A general software framework for the efficient acquisition of FASTA/Q genomic files in a MapReduce environment.


A SPARK software system for the collection of $k$-mer statistics.


A compressed bitvector/container supporting efficient random access and rank queries.

Learned sorted table search

A software library to speed-up sorted table search procedures via learning from data.

PFP Data Structures

Data structures supporting Longest Common Extensions and Suffix Array queries, built on the prefix-free parsing of the text.


A data structure enabling fast searches in arrays of billions of items using orders of magnitude less space than traditional indexes.


Python library of sorted containers with state-of-the-art query performance and compressed memory usage.


Compression strategies and space-conscious representations for deep neural networks.

Invited talks

Learning-based approaches to compressed data structures design
Extraction of Functional Knowledge from Large Biological Graphs and Applications in Precision Medicine
FM-index - the story... with a moral for (young) researchers
Introduction to compressed data structures
Learned Indexes
A rigorous approach to design learned data structures
On the compressed indexing of dictionaries of keys for massive key-value stores
A tutorial on learning-based compressed data structures
Theory and practice of learning-based compressed data structures
Learned and Compressed Data Structures, challenges on Storage Systems
Learned and Compressed Data Structures
The future of data structures: data-aware and self-designing
Life sciences and algorithmic design: speed and accuracy in small space
Hybrid Data Structures and beyond
The evolution of searching data structures



The future of compressed data structures, 20 years after the FM-index

July 19, 2022 – July 20, 2022 Lipari, Italy
Organised by Paolo Ferragina and Giovanni Manzini. Info

PhD course in Data compression and compressed data structures

May 16, 2022 – May 31, 2022 Pisa, Italy
Organised by Paolo Ferragina and Giovanni Manzini. Info

2020 Student Challenge @ Algorithm Engineering UniPi course

November 4, 2020 – February 10, 2021 Pisa, Italy
Organised by Paolo Ferragina and Giorgio Vinciguerra. Info

Meetings & Reports


Project end

August 28, 2022 (postponed to February 28, 2023 because of COVID-19)

Report of the second year

September 1, 2021

Report of the first year

September 1, 2020

Kickoff meeting

October 14, 2019 Video conference

Minutes of the meeting


Project start

September 1, 2019


Principal Investigators


Marco Frasca

Assistant professor


Raffaele Giancarlo

Full professor

Università di Pisa


Davide Bacciu

Assistant professor


Andrea Guerra

PhD Student

Università degli Studi di Milano


Marco Frasca

Assistant professor


Dario Malchiodi

Associate professor


Marco Mesiti

Associate professor


Paolo Perlasca

Assistant professor

Università degli Studi di Palermo


Raffaele Giancarlo

Full professor


Giosuè Lo Bosco

Associate professor


Simona E. Rombo

Associate professor

Università degli Studi del Piemonte Orientale “Amedeo Avogadro”


Lavinia Egidi

Associate professor


Manuel Striani

Postdoc research fellow