Advances in data-aware compressed-indexing schemes for integer and string keys


Compressed-indexing schemes for a collection of keys are the backbone of modern data systems, and their space and query time efficiency is crucial to the system’s performance, particularly under continuously growing volumes of data. We discuss some recent developments of these schemes, beginning with integer keys, and then delving into the more complex case of variable-length string keys. The underlying principle driving these advancements is to take advantage of the regularities and trends in the input data by either integrating learned models, which uncover some new regularities and thus enable more effective compression, or by employing data-aware optimization approaches that orchestrate known encoding schemes for synthesizing the best data structure design. We present experiments that demonstrate the robustness of these new approaches compared to the input-sensitive space-time efficiency of existing solutions. Finally, we conclude by discussing new research opportunities.

March 7, 2023
Dresden, Germany
Giorgio Vinciguerra

Postdoc in Computer Science