One of my favorite things about Rust, and one of the first things that attracted me to it, is its iterators and the many combinator functions that are available.

Before getting into Rust, I was focused on TypeScript/JavaScript, and I preferred .filter(), .map() and .reduce() over manual loops. I already had a predisposition for function composition, method chaining and closures.

Anyone who has read the Rust Book might have seen the statement in the section on “Loops vs. Iterators”:

The point is this: iterators, although a high-level abstraction, get compiled down to roughly the same code as if you’d written the lower-level code yourself. Iterators are one of Rust’s zero-cost abstractions, by which we mean using the abstraction imposes no additional runtime overhead. 1

I always assumed this was true for the most part, but recently I had a chance to dig in deeper.

The scenario

One of my clients is a global retailer and I have been helping them get a core service that feeds their ecommerce website production-ready. One specific task involved extending service plan filtering so consumers see the correct product service plans for the given product price and date they are shopping.

My approach was based around the fairly idiomatic Option 2, but the existing code was not. It is common for those new to Rust to keep doing things that are not very “Rusty”. To be fair, a lot of those old habits still work in Rust (e.g. imperative array length checks). However, when we stick to the Rust idioms, all kinds of features open up and coding becomes a lot easier.

Something akin to this non-idiomatic code is shown below. It filters some incoming data and stages it to be transformed later into another format. Notably, it allocates a Vec<T>, conditionally pushes items onto it and unconditionally returns it. So any caller has to inspect the vec’s state to know what is going on.

fn filter_stuff(context: &SomeContext, things: &[CoolThings]) -> Vec<CoolThings> {
  let stuff = context.getStuff();
  let mut output_vec = vec![];

  let Some(first_dep) = stuff.first() else {
    return output_vec;
  };

  for whatever in things {
    if let Some(yea) = whatever.first_field {
      if let Some(the_right_stuff) = stuff {
        ...
          output_vec.push(the_right_stuff.clone());
      }
      ...
    }
  }
  output_vec
}

The calling code of filter_stuff was something like this:

fn caller(...) -> CallersReturn {
  let values = filter_stuff(context, things);

  if values.len() > 0 {
    // move values into caller's return
  }
}

caller checks the filter_stuff return value len() to determine if it should continue its own work. This is reasonable in almost any C-style language, but there is another way.

Adaptation

My code provided alternate filtering that would fall back to the existing filter_stuff if it did not produce any values. Here is a rough re-creation of my filter code:

fn andrews_filter(context: &SomeContext, things: &[CoolThings], ...)
-> Option<Vec<CoolThings>>
{
  let first_dep = context.first()?;
  let sec_dep = context.second()?;

  let values: Vec<CoolThings> = first_dep.iter().filter_map(|first| {
    ...
      Some(first.clone())
  }).collect();

  if values.len() > 0 {
    return Some(values);
  }
  None
}

You can see there is no rote vec allocation up front. Instead, I simply use an iterator chain to select and transform the values I need. At the end, I do the same len() test to turn my vec into an Option::Some(T) or an Option::None. Nothing surprising here.

A nice bonus of using Option<T> return types is the simple early returns you see above:

let first_dep = context.first()?;

By leveraging the question mark operator, you get concise option extraction and early return, unlike this:

let Some(first_dep) = context.first() else {
  return None;
};

Very nice.

The next step was joining these two filter functions in a sensible way. My first idea was this:

fn caller(...) -> CallersReturn {
  let values = andrews_filter(context, things, ...) // Option returns from our filters funcs let us 
    .or_else(|| filter_stuff(context, things));     // use FP-like combinators such as or_else

  if let Some(vals) = values {                      // A more idiomatic test of an Option vs. len()
    ...
  }
}

Again, using Option<T>-wrapped values gives you access to all of Option’s combinators (Options have nice combinators too), as you see here with or_else. It is also well behaved: the fallback closure is lazy, so it only runs if execution actually branches there.
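To make that laziness concrete, here is a minimal sketch separate from the service code (the expensive_fallback function and its logging are stand-ins I made up):

```rust
// A stand-in "expensive" fallback; it records that it actually ran.
fn expensive_fallback(log: &mut Vec<&'static str>) -> Option<i32> {
    log.push("fallback ran");
    Some(99)
}

fn main() {
    let mut log = Vec::new();

    // Some short-circuits: or_else never invokes the closure.
    let a = Some(1).or_else(|| expensive_fallback(&mut log));
    assert_eq!(a, Some(1));
    assert!(log.is_empty());

    // None triggers the fallback closure.
    let b: Option<i32> = None;
    let c = b.or_else(|| expensive_fallback(&mut log));
    assert_eq!(c, Some(99));
    assert_eq!(log, ["fallback ran"]);
}
```

This is the same reason or_else takes a closure rather than a plain value (that eager variant is Option::or).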

Now all I had to do was adapt filter_stuff to return an Option to make the or_else valid.

fn filter_stuff(context: &SomeContext, things: &[CoolThings])
  -> Option<Vec<CoolThings>>
{
  ...
  let Some(first_dep) = stuff.first() else {
    return None;             // return Option::None instead of empty Vec
  };
  ...
  if output_vec.len() > 0 {
    return Some(output_vec); // wrap a non-empty Vec in Option::Some<T>
  }
  None                       // otherwise return Option::None
}

We can always do more

At this point I had something that was working, but it was time to clean up the code and get ready for a PR. I had that nagging feeling that my code could be improved. The fact that I had imperative code after the iterator stood out to me. Why can’t I do it all with an iterator when they are so powerful and have hundreds of useful methods? For instance, the fact that you can collect an iterator of Result<T, E> into a Result<Vec<T>, E> still seems magical to me! 3
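For anyone who has not seen that trick, here is a small sketch of it (the parse_all helper name is mine, not from the service code):

```rust
// Collecting Iterator<Item = Result<T, E>> into Result<Vec<T>, E>:
// the first Err short-circuits and becomes the whole result.
fn parse_all(items: &[&str]) -> Result<Vec<i32>, String> {
    items
        .iter()
        .map(|s| s.parse::<i32>().map_err(|e| e.to_string()))
        .collect()
}

fn main() {
    // All items parse, so we get a Vec of all the values.
    assert_eq!(parse_all(&["1", "2", "3"]), Ok(vec![1, 2, 3]));

    // "oops" fails to parse, so the whole collect becomes an Err.
    assert!(parse_all(&["1", "oops", "3"]).is_err());
}
```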

My first thought went back to the basic tools of JavaScript: filter, map and reduce. I knew I needed a collection of things to be wrapped into a new singular type, so maybe I could simply use the Rust equivalents? They are a little different, though.

Rust’s reduce is definitely not like JavaScript’s. It is a more restricted version in the sense that you cannot project into a new type distinct from the Iterator::Item type. Because reduce can only return the item type you already have, it takes no initializer parameter like you would see in JavaScript:

[0, 1, 1, 2, 3]
  .reduce(
    (acc, n) => ({ final_number: acc.final_number + n }),
    { final_number: 42 } // initial value identifies our new type
  );

The actual analog for JavaScript reduce in Rust is fold. fold takes an initializer that can be any type. Unfortunately I could not figure out a way to both check that the vec has items (a Some variant) and then just return the vec wrapped in that variant, without allocating a new vec. Although I only need to see a single collection item to determine that the vec is a Some variant, I really need to see the entire collection to wrap it in a Some to return. Imperatively this is trivial.
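A quick sketch of that difference, using toy data rather than anything from the client code:

```rust
fn main() {
    let nums = [0usize, 1, 1, 2, 3];

    // reduce: the accumulator is stuck with the item type (usize),
    // and there is no initializer; an empty iterator yields None.
    let sum = nums.iter().copied().reduce(|acc, n| acc + n);
    assert_eq!(sum, Some(7));

    // fold: the initializer picks the accumulator type, like
    // JavaScript's reduce with an initial value. Here we project
    // usize items into a String.
    let joined = nums.iter().fold(String::new(), |mut acc, n| {
        acc.push_str(&n.to_string());
        acc
    });
    assert_eq!(joined, "01123");
}
```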

Functionally (read: non-imperatively), iterator closures are limited by Rust’s borrowing rules. Without getting lost in the Rust borrowing morass:

  1. How can we possibly return an owned vec, which would be captured and owned by the closure anonymous structure, if we want to immutably borrow the vec to iterate it in the first place?
    • We can’t.

Typescript does not have these restrictions:

  let arr = [0, 1, 2, 3];
  enum Option {
    None,
    Some
  }
  let out = arr.reduce((acc: any, memo: any) => {
    if (acc == Option.None) {
      return [Option.Some, arr]; // a tuple pointing at the closed-over array
    }
    return acc;
  }, Option.None);

TypeScript, being garbage collected, has no issue returning a pointer to the closed-over array in the outer scope. Unfortunately, we don’t get an early exit with reduce.

At this point, I decided to shift gears. Until now, I would have done web searches, but this year I have started to experiment with LLMs. So I typed into Sonnet 3.5:beta: “idiomatic way in Rust to project a non-empty vec into a Option::Some and an empty vec into an Option::None using only iterator combinators”. 4

The suggestion was this:

fn andrews_filter(context: &SomeContext, things: &[CoolThings], ...)
  -> Option<Vec<CoolThings>>
{
  ...
  let values: Option<Vec<CoolThings>> = first_dep.iter().filter_map(|first| { ... })
+   .fold(Option::<Vec<CoolThings>>::None, |acc: Option<Vec<CoolThings>>, item| {
+     match acc {
+         Some(mut vc) => {
+             vc.push(item.clone());          // second clone for each item!
+             Some(vc)
+         },
+         None => Some(vec![item.clone()])    // second vec allocation AND clone the item again!
+     }
  });
  values
}

So it is suggesting fold, which I didn’t think would work given my constraints. In pure “vibe coder” fashion, I accept the code change, watch the LSP check it and then run my unit tests. Everything seems ok, however…

At first blush, I noticed what looks like a vector allocation there, which is precisely what I don’t want. I want a nearly zero-cost way to project a non-empty vec into a Some(Vec<T>). And not only that: because the collection did not consist of a Copy type, was it cloning every item a second time as well? You cannot see it in the example code (elided above as …), but the items in the Vec<CoolThings> were already cloned from the owning &[CoolThings] slice.

To be certain whether any cool optimizations made this code less bad, I checked out the assembly. Below is a small representation of the source, with simpler vector types:

#[no_mangle]
pub fn folder() -> Option<Vec<usize>> {
    let v = vec![1,2,3];
    let f: Option<Vec<usize>> = v.iter()
        .map(|item| {
            item + 1
        })
        .fold(Option::<Vec<usize>>::None, |acc: Option<Vec<usize>>, item| {
            match acc {
                Some(mut vc) => {
                    vc.push(item);
                    Some(vc)
                },
                None => Some(vec![item])
            }
        });
    f
}

Compiler Explorer presents the following code (in release build):

mov     r15, qword ptr [rip + __rust_no_alloc_shim_is_unstable@GOTPCREL]
movzx   eax, byte ptr [r15]
mov     edi, 24
mov     esi, 8
call    qword ptr [rip + __rust_alloc@GOTPCREL]                     ; alloc source vec
test    rax, rax
je      .LBB2_15                                                    ; branch on alloc err
mov     rbx, rax                                                
mov     qword ptr [rax], 1                                          ; init 1,2,3
mov     qword ptr [rax + 8], 2
mov     qword ptr [rax + 16], 3

We see that we immediately allocate the source vec and insert 1, 2, and 3, as expected. Later down in the procedure we start getting into the map and fold related code. The first unrolled iteration is here:

mov     edi, 8                                                      ; alloc 8 bytes for our first vec element
mov     esi, 8                                                      ; 8 byte alignment
call    qword ptr [rip + __rust_alloc@GOTPCREL]                     ; invoke allocator
test    rax, rax
je      .LBB2_2                                                     ; branch on alloc err
mov     qword ptr [rax], 2                                          ; dest vec[0] = 1+1
mov     r12, qword ptr [rbx + 8]                                
mov     qword ptr [rsp], 1                                      
mov     qword ptr [rsp + 8], rax                                    ; new element ptr
mov     qword ptr [rsp + 16], 1                                 
mov     rdi, rsp                                                    
call    alloc::raw_vec::RawVec<T,A>::grow_one::h163fffecacfb1b78    ; extend single element to vec

We can see that we allocate a new naked vec element to be the first mapped element from the source vec. We then extend that vec element with grow_one which is amortized expansion as you might expect. That creates enough space to require no more allocations for the final two elements. Even though we don’t need to reallocate, the code does descend into the realloc library code and early exits for the remaining two items. 5

The two other unrolled iterations, not shown here, oddly do not get the compile-time constant folding of mov qword ptr [rax], 2 that the first iteration had for 1+1. They instead copy the source vec’s elements and apply the inc instruction to the same effect. Not a massive deal; just a quirk of this toolchain version or target, maybe?

So amazingly, while on the surface of the Rust source, it looks like we could be allocating a third vec, in fact we aren’t 😉. Regarding the issue of additional clones of the vec items, I’ll leave it as an exercise to the reader to modify the example to have non-copy vec items and determine if those occur. 6

Next up

Let’s look at Grok3 to see if we get anything better. I gave it essentially the same prompt and it came back with this:

#[no_mangle]
pub fn functional() -> Option<Vec<usize>> {
    let v: Vec<usize> = vec![1, 2, 3];
    Some(
      v.iter()
        .map(|item| item + 1)
        .collect::<Vec<usize>>()
    )
    .filter(|vec| !vec.is_empty())
}

As soon as I saw it, I groaned. Why didn’t I think of this one? This clearly won’t be allocating anything unnecessarily. Perhaps the only question is: is it too terse? Option::filter passes any Some variant’s value to the closure to determine whether it should be kept as-is or turned into a None. None variants simply remain None.
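A tiny sketch of that behavior, separate from the service code:

```rust
fn main() {
    // A Some whose value passes the predicate stays Some.
    let non_empty = Some(vec![1, 2]);
    assert_eq!(non_empty.filter(|v| !v.is_empty()), Some(vec![1, 2]));

    // A Some whose value fails the predicate becomes None.
    let empty: Option<Vec<i32>> = Some(vec![]);
    assert_eq!(empty.filter(|v| !v.is_empty()), None);

    // A None stays None; the predicate closure never even runs.
    let nothing: Option<Vec<i32>> = None;
    assert_eq!(nothing.filter(|_| unreachable!()), None);
}
```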

This produces something we’re more used to seeing in LLVM release builds: ultra-optimized code:

functional:
push    rbx
mov     rbx, rdi
mov     rax, qword ptr [rip + __rust_no_alloc_shim_is_unstable@GOTPCREL]
movzx   ecx, byte ptr [rax]
movzx   eax, byte ptr [rax]
mov     edi, 24                           ; allocate all three quad words up front
mov     esi, 8
call    qword ptr [rip + __rust_alloc@GOTPCREL]
test    rax, rax
je      .LBB0_2
mov     qword ptr [rax], 2                ; insert const final elements as-is, 1+1
mov     qword ptr [rax + 8], 3            ; 2+1
mov     qword ptr [rax + 16], 4           ; 3+1
mov     qword ptr [rbx], 3
mov     qword ptr [rbx + 8], rax
mov     qword ptr [rbx + 16], 3
mov     rax, rbx
pop     rbx
ret

All the final vec elements are inserted as precomputed immediate constants, since the compiler knows they never change and can calculate their values at compile time. It also allocates space for all three vec elements at once.

This suggests that writing simpler iterators gives the compiler something much easier to optimize. While the prior example early exited realloc calls, this version does not even attempt them and there are far fewer instructions overall.

I tried to over-complicate this example so that Compiler Explorer would not optimize everything away, but in the end I had to add so much extra boilerplate that it became an unfair comparison to the previous approach. Suffice it to say, this version is much more optimized in release mode.

Outro

This post is already long, but I wanted to circle back to show what the assembly of the original code with the imperative Option wrapping looks like. After all, that is what started this journey:

pub fn checker() -> Option<Vec<usize>> {
    ...
    let f: Vec<usize> = v.iter()...

    if f.is_empty() {
        return None;
    }
    Some(f)
}

The assembly turns out to be identical, instruction for instruction, to the slicker, more “functional” version above. Literally identical. You’ll notice that this example uses the (to me?) nicer is_empty() form. Rest assured, it generates the same code that a length check does.


  1. https://doc.rust-lang.org/book/ch13-04-performance.html ↩︎

  2. They’re called Sum types apparently. Fancy! ↩︎

  3. https://stackoverflow.com/questions/6379662/how-do-i-convert-a-vecresultt-e-to-resultvect-e ↩︎

  4. I’ve read you don’t need to be verbose, but I have been having better luck spelling things out. ↩︎

  5. Confirmed after running through GDB/GEF a few times. ↩︎

  6. SPOILER: This is sort of a tricky question based on the source provided. In the original LLM suggestion, it is cloning items in the fold step. If those are kept and the other clones occur in the map, then we definitely have 2 clones. But we only need 1 if we emit the owned type from the map. If we remove the clones from the fold and simply take ownership, then there is simply the one clone in the map. ↩︎