Advances in data-aware compressed-indexing schemes for integer and string keys

Abstract

Compressed-indexing schemes for a collection of keys are the backbone of modern data systems, and their space and query time efficiency is crucial to the system’s performance, particularly under continuously growing volumes of data. We discuss some recent developments of these schemes, beginning with integer keys, and then delving into the more complex case of variable-length string keys. The underlying principle driving these advancements is to take advantage of the regularities and trends in the input data by either integrating learned models, which uncover some new regularities and thus enable more effective compression, or by employing data-aware optimization approaches that orchestrate known encoding schemes for synthesizing the best data structure design. We present experiments that demonstrate the robustness of these new approaches compared to the input-sensitive space-time efficiency of existing solutions. Finally, we conclude by discussing new research opportunities.

Date
March 7, 2023
Location
Dresden, Germany
Avatar
Giorgio Vinciguerra
Postdoc

Postdoc in Computer Science