Compressed string dictionaries via data-aware subtrie compaction

Abstract

String dictionaries are a core component of a plethora of applications, so it is not surprising that they have been widely and deeply investigated in the literature since the introduction of tries in the ’60s. We introduce a new approach to trie compression, called COmpressed COllapsed Trie (CoCo-trie), that hinges upon a data-aware optimisation scheme that selects the best subtries to collapse based on a pool of succinct encoding schemes in order to minimise the overall space occupancy. CoCo-trie supports not only the classic lookup query but also the more sophisticated rank operation, formulated over a sorted set of strings. We corroborate our theoretical achievements with a large set of experiments over datasets originating from a variety of sources, e.g., URLs, DNA sequences, databases. We show that our CoCo-trie provides improved space-time trade-offs on all those datasets when compared against well-established and highly-engineered trie-based string dictionaries.

Publication
Proceedings of the 29th International Symposium on String Processing and Information Retrieval (SPIRE)
Avatar
Antonio Boffa
PhD student

PhD student in Computer Science

Avatar
Paolo Ferragina
Full professor

Professor of Algorithms and PI of the project

Avatar
Francesco Tosoni
PhD student

PhD student in Computer Science

Avatar
Giorgio Vinciguerra
Postdoc

Postdoc in Computer Science