Faster JS Array Filtering

# 2023-02-02

# Making JavaScript array filtering 25 times faster.

Filtering values out of an array in JavaScript is a very common operation, but there are a surprisingly large number of methods to achieve this, and some are better than others if we prioritise performance.

Time is money, and this is especially relevant for JavaScript, since it controls nearly every single e-commerce website, subscription service and donation page. The BBC found that for every additional second in load time, they lose 10% of users from their site. The average attention span is nearly nonexistent, and our user is probably already back on TikTok.

The elephant in the room is that timing JavaScript execution speed can often give inconsistent or misleading results. Google's V8 JavaScript engine has gone through 15 heavy years of development. A good portion of this progress has been dedicated to performance and optimisations, particularly around the just-in-time (JIT) compilation. In search of performance JavaScript engines have gone to the extreme, often making strategic, aggressive guesses about what your code is about to do and compile some code for that scenario ahead of time. If it's correct, it pays off with a performance boost, if not, things can go much slower than expected. As a result, performance is heavily dependent on the code that has been previously run, or whether you're reliably doing the same operations over and over. You could even see this demonstrated when timing the same Node.js program over and over again, and often watch its elapsed time drop with each run. It gets easier and easier for V8 to predict which parts of your code are going to be running the most and therefore should be optimised. Performance optimisations are everywhere in modern programming languages, especially ones with a JIT compiler involved like JavaScript, Python (Pypy), Lua and .NET’s CLR. Even beyond this, performance obviously varies drastically between language versions and from machine to machine. This makes performance testing for reliable results hard to do and any results may not be relevant for long. In our tests here, performance recordings were run on the same Nanode Linode instance with minimal background processes using Node v18.14.0, and averaged across 100 runs to give more accurate time recordings. Finally, to try and offset some craziness from the JIT compiler, the order of the timed functions were randomised, although you could easily argue this leads to less realistic results.

Before we start testing, we're going to need some numbers. Since we can't specify a seed value for Math.random() we'll keep it easy and just use a linear range from 0 to 100,000. We'll build some solutions to filter out all even numbers from this list.

function createArray(): number[] {
  return [...Array(100_000).keys()]
}

Splice

We could start with an initial benchmark by trying the most obvious solution. If we iterate across the array, we can just splice out any values that meet the condition. Of course, we would have to iterate through the array backwards to avoid the splice from messing with the indices of the following values we have left to check. We don't have to create a new array or return a value.

function filterEven(arr: number[]) {
  for (let i = arr.length - 1; i >= 0; i--) {
    if (arr[i] % 2 === 0) {
      arr.splice(i, 1);
    }
  }
}
Elapsed: 6.7978ms

forEach

Instead of altering the existing array, maybe it would be quicker if we copied over valid elements into a new array? When we try this with a forEach, we find the performance has improved quite significantly.

function filterEven(arr: number[]): number[] {
    const filtered: number[] = [];
    arr.forEach(element => {
        if (element % 2 !== 0) {
            filtered.push(element);
        }
    });
    return filtered;
}
Elapsed: 2.9932ms

Filter

Then we remember, that JavaScript lists have a filter method that does exactly what we need. We can try this and find slightly better performance.

function filterEven(arr: number[]): number[] {
  return arr.filter(x => { return x % 2 === 0 })
}
Elapsed: 2.4269ms

New Array

But it turns out that forEach is slightly slower than a normal for loop. If we repeat the forEach method with a regular method, we see a big improvement.

function filterEven(arr: number[]): number[] {
  const filtered = [];
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] % 2 !== 0) {
      filtered.push(arr[i]);
    }
  }
  return filtered;
}
Elapsed: 1.2517ms

Pop

If we don't particularly care about the order of these elements, there is a faster way still. Splicing an list is quite an expensive operation as we have to shift all of the elements in the right half of the list over by 1 to fill the gap we have just created. This is why element removal from an array at a given index is O(N). Popping on the other hand removes only the last element in the list, and involves none of this shuffling. When we find an element to remove, we could quickly swap it with whatever the last element in the array, and then pop it from the array in constant time. Importantly, this is an element we have already checked, since we are looping backwards through the list.

function filterEven(arr: number[]) {
  for (let i = arr.length - 1; i >= 0; i--) {
    if (arr[i] % 2 !== 0) {
      [arr[arr.length-1], arr[i]] = [arr[i], arr[arr.length-1]];
      arr.pop()
    }
  }
}
Elapsed: 0.3490ms

Improved Swap

In the previous algorithm, we used the JavaScript bracket notation to easily swap two elements. This is all very readable and elegant, but not optimal for performance (at least with Node v18). We have the additional overhead of creating an array in memory. In terms of performance, nothing can beat the good ol' temp value.

function filterEven(arr: number[]) {
  for (let i = arr.length - 1; i >= 0; i--) {
    if (arr[i] % 2 !== 0) {
      const temp = arr[arr.length-1];
      arr[i] = temp;
      arr[arr.length-1] = arr[i];
      arr.pop()
    }
  }
}
Elapsed: 0.2758ms

We have achieved a very respectable 24.6X increase in performance compared to splicing out each element to discard. Our tests were run using Node.js v18, and these results are likely to change wildly between versions as JIT compilers and optimisation strategies are changed in Google's V8 engine, but the potential huge difference in performance is still unexpected. In summary, splicing is a pretty slow operation, and there are usually better alternatives.

0%
04:38:39