Finite State Transducers String-to-String Map in Rust

Several years ago working on a project I found a requirement for a really fast loading and compact string map that I could search over with regexes. Rust already has a great data structure for searching strings with regexes, called fst. However, fst only supports integers as map keyed values. To create a string-to-string map from fst I packed the strings into a pointer table along with the fst data itself. The fst data can be memmapped into use, so loading time is almost negligible.

I have decided to open-source that code in case that it would be useful for others. I will put all that code in the fst_stringstring crate and on Github.

Building a String Table

The biggest feature of this code is the memory mapped string table. Creating and using a string table looks like this:

use fst_stringstring::builder::StringBuilder;
use fst_stringstring::strings::StringMap;

#[test]
fn builder1() -> std::io::Result<()> {
   let mut builder = StringBuilder::new("testy.fmm")?;
   let i1 = builder.insert("abcd")?;
   let i2 = builder.insert("efgh")?;
   builder.finish();
   assert_eq!(i1, 0);
   assert_eq!(i2, 5);

   let map = StringMap::new("testy.fmm")?;
   assert_eq!( map.get(0), "abcd" );
   assert_eq!( map.get(1), "bcd" );
   assert_eq!( map.get(2), "cd" );
   assert_eq!( map.get(3), "d" );
   assert_eq!( map.get(4), "" );
   assert_eq!( map.get(5), "efgh" );
   assert_eq!( map.get(6), "fgh" );
   assert_eq!( map.get(7), "gh" );
   assert_eq!( map.get(8), "h" );
   assert_eq!( map.get(9), "" );
   assert_eq!( map.get(10), "" );

   std::fs::remove_file("testy.fmm")?;
   Ok(())
}

Integrating String Key-Values with FST

The next step for a string-string FST map is to memory map both the string table and the fst data. To build a FST file, the string inserts must happen in ascending order. Aside from that the mmap itself is deemed unsafe behavior, so an unsafe block is required.

use fst_stringstring::builder::StringBuilder;
use fst_stringstring::strings::StringMap;
use fst_stringstring::map::StringStringMap;
use std::fs::{File,remove_file};
use std::path::Path;
use fst::Map;
use memmap::Mmap;

#[test]
fn mbuilder1() -> std::io::Result<()> {
   if Path::new("testy.fst").exists() { remove_file("testy.fst")?; }
   if Path::new("testy.fmm").exists() { remove_file("testy.fmm")?; }

   let mut builder = StringBuilder::new("testy.fmm")?;
   let wtr = std::io::BufWriter::new(File::create("testy.fst")?);
   let mut fst_builder = fst::MapBuilder::new(wtr).expect("Unable to create fst::MapBuilder");

   let i1 = builder.insert("abcd")?;
   fst_builder.insert("a", i1 as u64).expect("fst insert");
   let i2 = builder.insert("efgh")?;
   fst_builder.insert("b", i2 as u64).expect("fst insert");
   builder.finish();
   fst_builder.finish().expect("Unable to finish build of fst::MapBuilder");

   let mmap = unsafe {
      Mmap::map(&File::open("testy.fst")?).expect("unable to mmap file")
   };
   let fmap = Map::new(mmap).expect("Could not create fst::Map from file");
   let smap = StringMap::new("testy.fmm")?;
   let map = StringStringMap::new(fmap, smap);

   assert_eq!( map.get("a"), Some("abcd".to_string()) );
   assert_eq!( map.get("b"), Some("efgh".to_string()) );
   assert_eq!( map.get("c"), None );

   if Path::new("testy.fst").exists() { remove_file("testy.fst")?; }
   if Path::new("testy.fmm").exists() { remove_file("testy.fmm")?; }
   Ok(())
}