Maps in Other Languages

a gentle introduction to patterns

In the first post on the map pattern I showed how the pattern could be coded in C++ in a few different ways. Now it’s time to look at some other popular languages: Rust, Java and Python.

A caveat: I am not a day-to-day user of these languages, and they are partly a mystery to me in that I do not know exactly how they are translated to machine code in the end. Rust should be fairly transparent, but I need some more time to familiarise myself with it. How Java and Python code ends up as machine code is, however, rather opaque to me. I have tried my best, so please be kind in your comments and suggest improvements.

Rust

The map pattern is just as trivial to implement in Rust as it was with the Standard Template Library in C++ (see the previous blog post). Here is the code:

fn map_seq(arr: Vec<i64>, foo: fn(&i64) -> i64) -> Vec<i64> {
    arr.iter().map(foo).collect()
}

The map interface I use here takes a vector of numbers, arr, applies the map function foo to every element and returns the resulting vector. This becomes a one-liner in Rust since the language has built-in functionality for mapping a function over each element of a vector.
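To make the interface concrete, here is how it could be called (a minimal sketch of mine; square is just an example workload, not from the original benchmark):

fn square(x: &i64) -> i64 {
    x * x
}

fn main() {
    let arr: Vec<i64> = (0..10).collect();
    let result = map_seq(arr, square);
    println!("{:?}", result); // [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
}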

The Rayon crate makes it equally easy to make the above code parallel:

use rayon::prelude::*;

fn map_par(arr: Vec<i64>, foo: fn(&i64) -> i64) -> Vec<i64> {
    arr.par_iter().map(foo).collect()
}

Note the par_iter()! It creates an iterator whose work is distributed over the threads of a thread pool, which makes it very easy to operate on iterable objects in parallel. At the end, the results are collected back into a vector, which is returned to the caller.
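By default, Rayon sizes this pool to the number of logical cores. If you want to control the thread count yourself, the global pool can be configured before its first use; a small sketch of mine, not from the original post:

use rayon::prelude::*;

fn main() {
    // Configure Rayon's global thread pool; this must happen
    // before the pool is used for the first time.
    rayon::ThreadPoolBuilder::new()
        .num_threads(4)
        .build_global()
        .unwrap();

    let arr: Vec<i64> = (0..1_000_000).collect();
    let doubled: Vec<i64> = arr.par_iter().map(|x| x * 2).collect();
    assert_eq!(doubled[10], 20);
}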

Sequential map took: 0.69 s.

Parallel map took: 0.43 s, speedup: 1.61.

Java

Although I have written a fair amount of Java, I have not written much data-parallel Java code. My first attempt uses the Streams interface, which has a parallel counterpart; the code looks very much like the Rust code:

  public List<Integer> map_seq_stream(List<Integer> arr, UnaryOperator<Integer> foo) {
    return arr.stream().map(foo).collect(Collectors.toList());
  }

  public List<Integer> map_par_stream(List<Integer> arr, UnaryOperator<Integer> foo) throws Exception {
      return pool.submit(() -> arr.stream().parallel().map(foo).collect(Collectors.toList())).get();
  }

These are very similar to the Rust versions above. The pool reference is a field of the class in which map_par_stream is defined; it holds a Fork/Join thread pool that lets me run the parallel version with a controlled number of threads.
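For context, the enclosing class might look roughly like this (a sketch of mine; the class name and parallelism level are assumptions, only the pool field is implied by the code above):

import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class MapBenchmark {
  // A Fork/Join pool with an explicit parallelism level. Submitting the
  // parallel stream as a task to this pool makes it run on the pool's
  // workers rather than the common pool, bounding the number of threads.
  private final ForkJoinPool pool = new ForkJoinPool(8);

  public List<Integer> map_par_stream(List<Integer> arr, UnaryOperator<Integer> foo) throws Exception {
    return pool.submit(() -> arr.stream().parallel().map(foo).collect(Collectors.toList())).get();
  }
}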

Sequential stream map took: 5.1 s.

Parallel stream map took: 9.5 s.

As you can see, this implementation performs very poorly. It is extremely inefficient to create a stream of boxed Integer objects and to collect them back into a list: performance is abysmal and it uses a huge amount of memory. In this case I mapped over 50 M elements (instead of the 200 M I used in the C++ and Rust versions), because beyond that I exhausted my 16 GB of memory.

Let’s try using primitive int[] arrays instead of List<Integer>, and an IntUnaryOperator instead of UnaryOperator<Integer>, to see how much overhead the boxed Integer type adds.

  public int[] prim_map_seq_stream(int[] arr, IntUnaryOperator foo) {
    return IntStream.of(arr).map(foo).toArray();
  }

  public int[] prim_map_par_stream(int[] arr, IntUnaryOperator foo) throws Exception {
    return pool.submit(() -> IntStream.of(arr).parallel().map(foo).toArray()).get();
  }

Sequential primitive int stream map took: 0.12 s.

Parallel primitive int stream map took: 0.05 s.

Not only is this a factor of ~50 faster, but we also get a decent parallel speedup of about 2.4.
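As a rough illustration of how these variants might be driven and timed (my own sketch, not the original benchmark harness):

  public void benchPrim() {
    int[] arr = java.util.stream.IntStream.range(0, 50_000_000).toArray();
    java.util.function.IntUnaryOperator foo = x -> x + 1; // example workload
    long t0 = System.nanoTime();
    prim_map_seq_stream(arr, foo);
    System.out.printf("Sequential primitive stream map took: %.2f s%n",
        (System.nanoTime() - t0) / 1e9);
  }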

My second attempt uses a simple for loop and divides the work among threads in parallel, similar to the threads version in C++.

  public List<Integer> map_seq_for(List<Integer> arr, UnaryOperator<Integer> foo) {
    for (int i = 0; i < arr.size(); i++) {
      arr.set(i, foo.apply(arr.get(i)));
    }
    return arr;
  }

  public List<Integer> map_par_for(List<Integer> arr, UnaryOperator<Integer> foo) {
    int procs = pool.getParallelism();
    List<RecursiveAction> tasks = new ArrayList<>();
    for (int p = 0; p < procs; p++) {
      LocalMap localMap = new LocalMap(arr, procs, p, foo);
      tasks.add(localMap);
      pool.execute(localMap);
    }

    for (RecursiveAction t : tasks) {
      t.join();
    }

    return arr;
  }


  private static class LocalMap extends RecursiveAction {

    List<Integer> arr;
    int procs;
    int p;
    UnaryOperator<Integer> foo;

    public LocalMap(List<Integer> arr, int procs, int p, UnaryOperator<Integer> foo) {
      this.arr = arr;
      this.procs = procs;
      this.p = p;
      this.foo = foo;
    }

    @Override
    protected void compute() {
      // Each task works on its own contiguous chunk of the list;
      // the last task also takes care of any remainder.
      int from = p * arr.size() / procs;
      int to = p == procs - 1 ? arr.size() : (p + 1) * arr.size() / procs;
      for (int i = from; i < to; i++) {
        arr.set(i, foo.apply(arr.get(i)));
      }
    }
  }

My, my! Can you see why people think Java is verbose?

Here I had to create a special class, LocalMap, extending RecursiveAction so that it can be executed by the Fork/Join framework. This is the same mechanism that is used under the hood by parallel streams, so I expected the performance to be comparable.

This is better than the streams version, but I still barely get any speedup from the parallel code.

Sequential for-loop map took: 2.9 s.

Parallel for-loop map took: 2.3 s.

We now at least get a speedup with the parallel version, but only about a factor of 1.2.

Finally, I switched all List<Integer> to int[] in order to use the primitive type instead of the boxed Integer.
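The primitive for-loop variants are not shown in the post; they presumably look something like this (my reconstruction), with the parallel version dividing the array among tasks in the same way as LocalMap does above:

  public int[] prim_map_seq_for(int[] arr, IntUnaryOperator foo) {
    for (int i = 0; i < arr.length; i++) {
      arr[i] = foo.applyAsInt(arr[i]);
    }
    return arr;
  }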

Sequential primitive for-loop map took: 0.027 s.

Parallel primitive for-loop map took: 0.024 s, speedup: 1.10.

Sequential for-loop map took: 2.79 s.

Parallel for-loop map took: 2.79 s, speedup: 1.00.

Sequential primitive stream map took: 0.12 s.

Parallel primitive stream map took: 0.05 s, speedup: 2.22.

Sequential stream map took: 5.30 s.

Parallel stream map took: 8.64 s, speedup: 0.61.

Again, the primitive int[] versions are between roughly 50 and 100 times faster than their boxed counterparts, and the primitive for loop is now so fast that the parallel version gains essentially nothing.

Python

In Python I tried four variants: a plain for loop, the built-in map function, a multiprocessing pool for the parallel version and, finally, NumPy:

import numpy as np

def map_for(arr, foo):
    # Apply foo element by element, writing the results back in place.
    for i in range(len(arr)):
        arr[i] = foo(arr[i])
    return arr

def map_seq(arr, foo):
    # The built-in map is lazy; list() forces the evaluation.
    return list(map(foo, arr))

def map_par(arr, foo, pool):
    # pool is a process pool; 10000 is the chunk size per task.
    return pool.map(foo, arr, 10000)

# The NumPy version applies foo to the whole array at once.
np_arr = np.array(arr)
np_arr_map = foo(np_arr)
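The pool argument above is presumably a multiprocessing.Pool. A minimal driver might look like this (my sketch; foo must be defined at module top level so it can be pickled for the worker processes):

import multiprocessing as mp
import numpy as np

def foo(x):
    # Example workload; the original post does not show the actual function.
    return x * x

if __name__ == "__main__":
    arr = list(range(1_000_000))
    with mp.Pool(processes=4) as pool:
        par_result = map_par(arr, foo, pool)
    seq_result = map_seq(arr, foo)
    np_result = foo(np.array(arr))  # vectorised NumPy version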

For-loop map took: 10.8 s.

Sequential map took: 13.7 s.

Parallel map took: 23.0 s, speedup: 0.60.

NumPy map took: 0.71 s, speedup: 19.4.

A second run showed the same pattern:

For-loop map took: 15.8 s.

Sequential map took: 17.1 s.

Parallel map took: 20.6 s, speedup: 0.83.

NumPy map took: 0.31 s, speedup: 54.7.

The multiprocessing version is actually slower than the sequential one, since every element has to be pickled and shipped to a worker process, while NumPy, which runs the loop in compiled code, is one to two orders of magnitude faster.

Performance

To summarise:

Rust:

Sequential map took: 0.69 s.

Parallel map took: 0.43 s, speedup: 1.61.

Java:

Sequential for-loop map took: 20.2 s.

Parallel for-loop map took: 19.0 s, speedup: 1.06.

Sequential stream map took: 51.3 s.

Parallel stream map took: 91.5 s, speedup: 0.56.

