Hari Kailad

Small Buffer Optimization


Small buffer optimization is often useful for improving the performance of arbitrary-sized integers. One case where I ran into this was while writing a disassembler.

So I decided to start working on a disassembler and IR lifter for fun, and to learn about how decompilation works. One important type is the Offset, which represents an offset from the start of the address space or the size of a relative jump. Initially, I had

pub struct Offset(i128);

However, one issue with this arises if, say, a reverse engineering challenge ships a virtual machine with a much larger address space (say $2^{256}$ addresses). As that is entirely possible, I decided to replace the i128 with a BigInt from the num-bigint crate. However, the default Rust arbitrary-precision integer crate is quite slow: since most offsets in practice are small, repeatedly allocating chunks of memory on the heap is expensive.

Small Buffer Optimization

To optimize for small values, we can instead store most integers on the stack. The idea is to reserve a tag bit: if it is clear, the integer lives inline on the stack; if it is set, the rest of the data holds a pointer to a heap allocation. For integers, it looks like this:

pub enum BigInt {
    Inline(i128),
    Heap((*mut u32, isize)),
}

If the value fits in an i128 (i.e., its magnitude is below $2^{127}$), we are guaranteed to store it inline. Otherwise, it is stored in base $2^{32}$, with each digit represented as a u32.
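To make the heap format concrete, here is a sketch of splitting a magnitude into base-$2^{32}$ digits (`to_limbs` is a hypothetical helper for illustration, not part of my implementation):

```rust
// Hypothetical helper: split a u128 magnitude into base-2^32 digits,
// least significant first. This only illustrates the Heap layout.
fn to_limbs(mut v: u128) -> Vec<u32> {
    let mut limbs = Vec::new();
    loop {
        limbs.push(v as u32); // keep the low 32 bits as one digit
        v >>= 32;
        if v == 0 {
            break;
        }
    }
    limbs
}

fn main() {
    // 2^100 needs four digits: the top digit holds 2^(100 - 96) = 16.
    assert_eq!(to_limbs(1u128 << 100), vec![0, 0, 0, 16]);
    // Small values fit in a single digit.
    assert_eq!(to_limbs(42), vec![42]);
}
```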

One issue with this is that since Rust enums are tagged (disjoint) unions, the discriminant takes up 8 bytes:

pub enum BigInt {
    Inline(i128),
    Heap((*mut u32, isize)),
}

fn main() {
    println!("{}", std::mem::size_of::<BigInt>()); // We get 24
}

The tag needs a full 8 bytes to preserve the alignment of the data that follows it. We cannot simply steal one of the payload's 128 bits instead: the heap pointer could then only point into a restricted region of memory, and inline values would be limited to 127-bit arithmetic.
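A common workaround on the pointer side is to steal a low bit rather than a high one: heap allocations are aligned, so the bottom bits of a valid pointer are always zero and one of them can carry the tag, at the cost of masking on every access. A standalone sketch of that trick (an assumption for illustration, not the layout my BigInt uses):

```rust
fn main() {
    // A u32 allocation is at least 4-byte aligned, so the low
    // two bits of its address are guaranteed to be zero.
    let boxed: Box<u32> = Box::new(7);
    let addr = Box::into_raw(boxed) as usize;
    assert_eq!(addr & 0b11, 0); // alignment leaves these bits free

    let tagged = addr | 1;      // stash an inline/heap flag in bit 0
    assert_eq!(tagged & 1, 1);  // the flag is readable
    let untagged = tagged & !1; // mask it off to recover the pointer

    // Rebuild the Box so the allocation is freed properly.
    let back = unsafe { Box::from_raw(untagged as *mut u32) };
    assert_eq!(*back, 7);
}
```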

The smallvec library implements its SmallVec in a similar way:

#[repr(C)]
pub union RawSmallVec<T, const N: usize> {
    inline: ManuallyDrop<MaybeUninit<[T; N]>>,
    heap: (*const T, usize),
}

#[repr(transparent)]
#[derive(Clone, Copy)]
struct TaggedLen(usize);

#[repr(C)]
pub struct SmallVec<T, const N: usize> {
    len: TaggedLen,
    raw: RawSmallVec<T, N>,
    _marker: PhantomData<T>,
}

Note that with an i128 payload, the size is still 24 bytes here. Though I could have used SmallVec, a custom BigInt made operations easier to implement: I could match on the enum to check whether the arguments to an operation were inline or stored on the heap.
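To see where the enum's extra bytes go, we can compare it against an untagged union of the same two payloads, following smallvec's approach of keeping the tag outside the data (`RawBigInt` is a hypothetical name for illustration, not a type from my implementation):

```rust
use std::mem::size_of;

// The tagged enum from above: payload plus a discriminant.
pub enum BigInt {
    Inline(i128),
    Heap((*mut u32, isize)),
}

// An untagged union of the same two payloads; the inline/heap flag
// would live in a separate field, as smallvec's TaggedLen does.
#[repr(C)]
pub union RawBigInt {
    inline: i128,
    heap: (*mut u32, isize),
}

fn main() {
    // The union is exactly the size of its largest member.
    assert_eq!(size_of::<RawBigInt>(), 16);
    // The enum additionally pays for the discriminant plus padding
    // (24 or 32 bytes, depending on i128's alignment on the target).
    assert!(size_of::<BigInt>() > size_of::<RawBigInt>());
}
```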

For example, when adding two numbers together:

    match (&a, &b) {
        (&BigInt::Inline(i), &BigInt::Inline(j)) => match i.overflowing_add(j) {
            // No overflow: the sum still fits inline.
            (t, false) => BigInt::Inline(t),
            // Overflow: spill to the heap as five base-2^32 digits,
            // with the final digit recording the carry.
            (t, true) => {
                let mut res = [0u32, 0, 0, 0, 1];

                let mut v = t;

                // Split the wrapped 128-bit value into four 32-bit
                // digits, least significant first.
                for r in 0..4 {
                    res[r] = v as u32;

                    v >>= 32;
                }

                // Box the digits; ManuallyDrop keeps the Box from
                // freeing the allocation when it goes out of scope.
                let mut slice = ManuallyDrop::new(<Box<[u32]>>::from(res));

                BigInt::Heap((slice.as_mut_ptr(), 5))
            }
        }
        // ...

Here, if there is no overflow, we can just sum the two using the CPU's fast native arithmetic. Otherwise, we record the overflow in the final digit and place the sum into a Box on the heap.
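This hinges on the contract of `overflowing_add`, which returns the wrapped result together with a flag indicating whether overflow occurred; a quick sanity check:

```rust
fn main() {
    // No overflow: the flag is false and the sum is exact.
    assert_eq!(3i128.overflowing_add(4), (7, false));
    // Overflow: the result wraps around and the flag is set.
    assert_eq!(i128::MAX.overflowing_add(1), (i128::MIN, true));
}
```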

Summary

In summary, small buffer optimizations are quite useful when dealing with arbitrarily sized objects (and num-bigint should really add them, because it's so slow!!).