The ever-growing need to efficiently store, retrieve and analyze massive datasets, originating from very different sources, is made more complex by the different requirements posed by users and applications. Current data structures for big data problems cannot properly handle this new level of complexity.

To meet these challenges, we launched a project, funded by the Italian Ministry of Education (PRIN no. 2017WR7SHH), that will lay the theoretical and algorithm-engineering foundations of a new generation of Multicriteria Data Structures and Algorithms. The term "multicriteria" refers to the seamless integration, via a principled optimization approach, of modern compressed data structures with new data structures learned from the input data using machine-learning tools. The goal of the optimization is to select, from a family of properly designed data structures, the one that “best fits” the multiple constraints imposed by its context of use, thereby dominating the multitude of trade-offs currently offered by known solutions, especially in Big Data applications.

What is a multicriteria data structure?

A multicriteria data structure, for a given problem $P$, is defined by a pair $\langle \mathcal F, \mathcal A \rangle_P$ where $\mathcal F$ is a family of data structures, each one solving $P$ with a proper trade-off in the use of some resources (e.g. time, space, energy), and $\mathcal A$ is an optimisation algorithm that selects in $\mathcal F$ the data structure that best fits an instance of $P$.
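As a rough illustration of the definition above, the following Python sketch models the family $\mathcal F$ as a list of candidates annotated with estimated resource usage, and the optimisation algorithm $\mathcal A$ as a selection of the fastest candidate that fits a space budget. The names `Candidate` and `select`, the choice of space and time as the only resources, and all the numbers are assumptions made for this example; they are not taken from the project.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One member of the family F, described by its estimated resource usage."""
    name: str
    space_bytes: int   # estimated space occupancy (hypothetical figure)
    query_ns: float    # estimated time per query (hypothetical figure)


def select(family: Sequence[Candidate], space_budget: int) -> Optional[Candidate]:
    """Toy optimisation algorithm A: the fastest candidate within the space budget.

    Returns None when no member of the family satisfies the constraint.
    """
    feasible = [c for c in family if c.space_bytes <= space_budget]
    return min(feasible, key=lambda c: c.query_ns, default=None)


# A made-up family of three data structures solving the same problem P.
family = [
    Candidate("compressed structure", space_bytes=2_000_000, query_ns=900.0),
    Candidate("learned structure", space_bytes=3_000_000, query_ns=300.0),
    Candidate("classic tree index", space_bytes=12_000_000, query_ns=500.0),
]

best = select(family, space_budget=4_000_000)
print(best.name if best else "no feasible data structure")
```

With a budget of 4,000,000 bytes, two candidates are feasible and the sketch picks the faster one; tightening the budget changes the choice, which is the kind of trade-off the optimisation step is meant to navigate.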

Family of data structures
Computational resources
Optimisation algorithm

For more details on the project, have a look at its full description here.

Recent Posts

Publications

Learned data structures. Oneto L., Navarin N., Sperduti A., Anguita D. (eds) Recent Trends in Learning From Data. Studies in Computational Intelligence, vol 896. Springer, 2020.

Projects

PGM-index

A data structure enabling fast searches in arrays of billions of items using orders of magnitude less space than traditional indexes.
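The general idea of searching a sorted array with a learned model can be sketched as follows. This is not the PGM-index implementation: it is a simplified illustration that assumes a single linear model over the whole array, whereas the PGM-index relies on a piecewise linear approximation with a bounded error. The names `LearnedIndex` and `fit_linear` are hypothetical.

```python
import bisect


def fit_linear(keys):
    """Fit position ~ slope * key + intercept through the first and last key."""
    n = len(keys)
    if n < 2 or keys[-1] == keys[0]:
        return 0.0, 0.0
    slope = (n - 1) / (keys[-1] - keys[0])
    return slope, -slope * keys[0]


class LearnedIndex:
    """Sorted array plus one linear model that predicts the position of a key."""

    def __init__(self, keys):
        self.keys = sorted(set(keys))
        self.slope, self.intercept = fit_linear(self.keys)
        # Maximum distance between the predicted and the true position of any key.
        self.eps = max(
            (abs(self._predict(k) - i) for i, k in enumerate(self.keys)),
            default=0,
        )

    def _predict(self, key):
        pos = int(self.slope * key + self.intercept)
        # Clamp the prediction to a valid position.
        return min(max(pos, 0), max(len(self.keys) - 1, 0))

    def lower_bound(self, key):
        """Index of the first stored key >= key (len(keys) if there is none)."""
        pos = self._predict(key)
        # The answer lies within eps of the prediction, so search only there.
        lo = max(pos - self.eps, 0)
        hi = min(pos + self.eps + 1, len(self.keys))
        return bisect.bisect_left(self.keys, key, lo, hi)


keys = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
index = LearnedIndex(keys)
print(index.lower_bound(11), index.eps)
```

The model occupies only two numbers plus the error bound, and the binary search is confined to a window of about 2 * eps positions around the prediction. The better the model fits the key distribution, the smaller the window.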

Talks

DNA combinatorial messages and Epigenomics: The case of chromatin organization and nucleosome occupancy in eukaryotic genomes

Epigenomics is the study of modifications to the genetic material of a cell that do not depend on changes in the DNA sequence; rather, they involve the specific proteins around which DNA wraps. As a result, epigenomic changes play a fundamental role in the proper functioning of every cell in eukaryotic organisms. A particularly important part of epigenomics concentrates on the study of chromatin, a fiber composed of a DNA-protein complex that is highly characteristic of eukaryotes.

Hybrid Data Structures and beyond

The ever-growing need to efficiently store, retrieve and analyze massive datasets, originating from very different sources, is made more complex by the different requirements posed by users, devices and applications. Current data structures for Big Data problems cannot properly handle this new level of complexity. To meet these challenges, surprising new results have recently appeared in the literature that integrate classic approaches (such as B-trees) with various kinds of learning models (such as neural networks); the resulting structures are called Hybrid Data Structures.

Events

Project end

Sep 2021 – Present

Second meeting

Feb 2020, Dept. of Computer Science, Via Largo Bruno Pontecorvo 3, Pisa

Kickoff meeting

Sep 2019 – Oct 2019, Video conference
Minutes of the meeting

Project start

Sep 2019 – Present

People

Principal Investigators

Marco Frasca

Assistant professor

Raffaele Giancarlo

Full professor

Università di Pisa

Davide Bacciu

Assistant professor

Luca Oneto

Associate professor

Università degli Studi di Milano

Marco Frasca

Assistant professor

Dario Malchiodi

Associate professor

Marco Mesiti

Associate professor

Paolo Perlasca

Assistant professor

Università degli Studi di Palermo

Raffaele Giancarlo

Full professor

Andrea De Salve

Junior researcher (RTD-A)

Giosuè Lo Bosco

Associate professor

Simona E. Rombo

Associate professor

Università degli Studi del Piemonte Orientale “Amedeo Avogadro”

Lavinia Egidi

Associate professor