You Look Beautiful When You Break Java Code!

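This article is reposted from the WeChat official account 「小明菜市场」 and was written by 小明菜市场. To repost it, please contact the 小明菜市场 official account.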

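Preface

Before Java 8, processing a collection in parallel meant splitting it into parts by hand, creating a thread for each part, and merging the partial results at the right moment. Java 8 added a feature that switches a collection into parallel mode with a single call.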

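Parallel Streams

Getting to Know and Enabling Parallel Streams

What is a parallel stream? It is a stream whose content is split into multiple chunks of data, with a different thread processing each chunk. For example, suppose we have a List and need to run a calculation on each of its elements. The code is as follows:

List<Apple> appleList = new ArrayList<>(); // pretend the data was loaded from the database 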
 
for (Apple apple : appleList) { 
    apple.setPrice(5.0 * apple.getWeight() / 1000); 
}

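Here the running time is O(n) in the size of the list: the longer the list, the longer the loop takes. A parallel stream does not reduce the total amount of work, but it spreads that work over several threads and so can shorten the wall-clock time. The code is as follows: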

appleList.parallelStream().forEach(apple -> apple.setPrice(5.0 * apple.getWeight() / 1000));

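Calling parallelStream() marks the stream as parallel, so the pipeline is executed in parallel. Internally, parallel streams run on the default ForkJoinPool (the common pool). Its default parallelism is the number of available processors minus one, because the thread that invokes the terminal operation also takes part in the work.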
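To check these numbers on your own machine, the following minimal sketch prints the processor count reported by the JVM next to the parallelism of the common pool. The class name CommonPoolInfo and its helper methods are made up for illustration; only Runtime.availableProcessors() and ForkJoinPool.commonPool().getParallelism() are standard JDK calls.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolInfo {

    // Number of logical processors visible to the JVM.
    static int processors() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target parallelism of the common pool that parallel streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors    = " + processors());
        System.out.println("commonPool parallelism = " + commonPoolParallelism());
    }
}
```

The parallelism can also be overridden at JVM startup with the system property java.util.concurrent.ForkJoinPool.common.parallelism, so the two numbers will not always differ by exactly one.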

Testing Parallel Streams

The sequential version looks like this:

public static void main(String[] args) throws InterruptedException { 
    List<Apple> appleList = initAppleList(); 
 
    Date begin = new Date(); 
    for (Apple apple : appleList) { 
        apple.setPrice(5.0 * apple.getWeight() / 1000); 
        Thread.sleep(1000); 
    } 
    Date end = new Date(); 
    log.info("Apple count: {}, elapsed: {}s", appleList.size(), (end.getTime() - begin.getTime()) / 1000); 
}

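The output shows an elapsed time of 4s, which implies the list holds four apples at one second each.

The parallel version looks like this: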

List<Apple> appleList = initAppleList();

Date begin = new Date();
appleList.parallelStream().forEach(apple -> {
    apple.setPrice(5.0 * apple.getWeight() / 1000);
    try {
        Thread.sleep(1000); // simulate a slow per-item operation
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag
    }
});
Date end = new Date();
log.info("Apple count: {}, elapsed: {}s", appleList.size(), (end.getTime() - begin.getTime()) / 1000);

The output shows an elapsed time of 1s, so the parallel version cut the run time by 3s.
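The two snippets above depend on Apple, initAppleList() and a logger that the article does not show. The following self-contained sketch fills them in with hypothetical stand-ins (a four-apple list and weights chosen arbitrarily) so the comparison can be run as is. It uses System.nanoTime() rather than Date for timing.

```java
import java.util.ArrayList;
import java.util.List;

class ParallelPricingDemo {

    // Hypothetical stand-in for the article's Apple class.
    static class Apple {
        private final double weight; // grams (assumed unit)
        private double price;

        Apple(double weight) { this.weight = weight; }
        double getWeight() { return weight; }
        double getPrice() { return price; }
        void setPrice(double price) { this.price = price; }
    }

    // Hypothetical stand-in for initAppleList(): four apples of 100g to 400g.
    static List<Apple> initAppleList() {
        List<Apple> apples = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            apples.add(new Apple(100.0 * i));
        }
        return apples;
    }

    // Sets the price, then sleeps to simulate a slow per-item operation.
    static void slowPrice(Apple apple, long delayMillis) {
        apple.setPrice(5.0 * apple.getWeight() / 1000);
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs the task once and returns the elapsed time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<Apple> first = initAppleList();
        long sequential = timeMillis(() -> first.forEach(x -> slowPrice(x, 1000)));

        List<Apple> second = initAppleList();
        long parallel = timeMillis(() -> second.parallelStream().forEach(x -> slowPrice(x, 1000)));

        System.out.println("sequential: " + sequential + " ms");
        System.out.println("parallel:   " + parallel + " ms");
    }
}
```

How close the parallel run gets to one second depends on how many threads the common pool has available. On a machine with fewer than four usable cores the gap will be smaller.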

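How the Stream Is Split Affects Its Speed

Keep the following points in mind when using parallel streams:

Summing the first n numbers with iterate is slower than a plain loop whether or not the stream is parallel. iterate has two pain points: it produces boxed Long objects that must be unboxed before they can be added, and each element depends on the previous one, so the stream is hard to split into independent chunks.
LongStream.rangeClosed() has neither of these two pain points. It produces primitive values, so no boxing or unboxing is needed, and the range 1 to n can be split directly into four parts such as 1 to n/4, n/4 + 1 to 2n/4, ..., 3n/4 + 1 to n. For this reason a parallel rangeClosed() can run faster than the external iteration of a for loop.

The code is as follows: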

package lambdasinaction.chap7; 
 
import java.util.stream.*; 
 
public class ParallelStreams { 
 
    public static long iterativeSum(long n) { 
        long result = 0; 
        for (long i = 0; i <= n; i++) { 
            result += i; 
        } 
        return result; 
    } 
 
    public static long sequentialSum(long n) { 
        return Stream.iterate(1L, i -> i + 1).limit(n).reduce(Long::sum).get(); 
    } 
 
    public static long parallelSum(long n) { 
        return Stream.iterate(1L, i -> i + 1).limit(n).parallel().reduce(Long::sum).get(); 
    } 
 
    public static long rangedSum(long n) { 
        return LongStream.rangeClosed(1, n).reduce(Long::sum).getAsLong(); 
    } 
 
    public static long parallelRangedSum(long n) { 
        return LongStream.rangeClosed(1, n).parallel().reduce(Long::sum).getAsLong(); 
    } 
 
} 

package lambdasinaction.chap7; 
 
import java.util.concurrent.*; 
import java.util.function.*; 
 
public class ParallelStreamsHarness { 
 
    public static final ForkJoinPool FORK_JOIN_POOL = new ForkJoinPool(); 
 
    public static void main(String[] args) { 
        System.out.println("Iterative Sum done in: " + measurePerf(ParallelStreams::iterativeSum, 10_000_000L) + " msecs"); 
        System.out.println("Sequential Sum done in: " + measurePerf(ParallelStreams::sequentialSum, 10_000_000L) + " msecs"); 
        System.out.println("Parallel forkJoinSum done in: " + measurePerf(ParallelStreams::parallelSum, 10_000_000L) + " msecs" ); 
        System.out.println("Range forkJoinSum done in: " + measurePerf(ParallelStreams::rangedSum, 10_000_000L) + " msecs"); 
        System.out.println("Parallel range forkJoinSum done in: " + measurePerf(ParallelStreams::parallelRangedSum, 10_000_000L) + " msecs" ); 
    } 
 
    public static <T, R> long measurePerf(Function<T, R> f, T input) { 
        long fastest = Long.MAX_VALUE; 
        for (int i = 0; i < 10; i++) { 
            long start = System.nanoTime(); 
            R result = f.apply(input); 
            long duration = (System.nanoTime() - start) / 1_000_000; 
            System.out.println("Result: " + result); 
            if (duration < fastest) fastest = duration; 
        } 
        return fastest; 
    } 
}
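The claim that a numeric range splits easily can be observed directly through the stream's spliterator. The following is a small sketch, not part of the original benchmark; the class name RangeSplitSketch and the method splitOnce are invented for illustration. It splits the range 1 to n once and reports the size of each part.

```java
import java.util.Spliterator;
import java.util.stream.LongStream;

class RangeSplitSketch {

    // Splits the range 1..n once and returns {size of split-off prefix, size of remainder}.
    static long[] splitOnce(long n) {
        Spliterator.OfLong rest = LongStream.rangeClosed(1, n).spliterator();
        Spliterator.OfLong prefix = rest.trySplit(); // null if the source declines to split
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    public static void main(String[] args) {
        long[] sizes = splitOnce(1000);
        System.out.println("prefix size:    " + sizes[0]);
        System.out.println("remainder size: " + sizes[1]);
    }
}
```

Because the range knows its bounds up front, both parts report exact sizes, which is what lets the fork/join framework divide the work without first traversing the data.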

Shared Variables Lead to Wrong Results

public static long sideEffectSum(long n) { 
    Accumulator accumulator = new Accumulator(); 
    LongStream.rangeClosed(1, n).forEach(accumulator::add); 
    return accumulator.total; 
} 
 
public static long sideEffectParallelSum(long n) { 
    Accumulator accumulator = new Accumulator(); 
    LongStream.rangeClosed(1, n).parallel().forEach(accumulator::add); 
    return accumulator.total; 
} 
 
public static class Accumulator { 
    private long total = 0; 
 
    public void add(long value) { 
        total += value; 
    } 
}
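In the code above, sideEffectParallelSum lets several threads execute total += value on the same Accumulator. That statement is a non-atomic read-modify-write, so updates can be lost and the returned sum may be too small and vary from run to run. The following sketch shows two common ways around this, under the assumption that only the final sum is needed. The class name SafeParallelSum and its method names are illustrative.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.LongStream;

class SafeParallelSum {

    // Unsafe accumulator, same shape as in the article: shared mutable state.
    static class Accumulator {
        long total = 0;
        void add(long value) { total += value; }
    }

    // Racy version: may lose updates when run on more than one thread.
    static long racySum(long n) {
        Accumulator accumulator = new Accumulator();
        LongStream.rangeClosed(1, n).parallel().forEach(accumulator::add);
        return accumulator.total;
    }

    // Preferred: no shared state; the stream combines per-chunk partial sums itself.
    static long reductionSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // Alternative when a side effect is unavoidable: a thread-safe accumulator.
    static long adderSum(long n) {
        LongAdder adder = new LongAdder();
        LongStream.rangeClosed(1, n).parallel().forEach(adder::add);
        return adder.sum();
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        System.out.println("expected:  " + n * (n + 1) / 2);
        System.out.println("racy:      " + racySum(n));
        System.out.println("reduction: " + reductionSum(n));
        System.out.println("adder:     " + adderSum(n));
    }
}
```

The reduction form is usually the better choice because it has no contention at all. LongAdder fixes correctness but still funnels every element through a shared object.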

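Notes on Using Parallel Streams

Prefer the primitive streams LongStream / IntStream / DoubleStream over Stream when working with numbers, to avoid the extra cost of frequent boxing and unboxing.
Consider the total computational cost of the stream pipeline. Let N be the number of elements to process and Q the approximate cost of processing one element; N * Q then estimates the total cost. The larger Q is, the more likely a parallel stream is to pay off.
For small amounts of data, parallel streams are not recommended.
For stream sources that are easy to split into chunks, parallel streams are recommended.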
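The first note can be illustrated with a short sketch that sums the same values through a boxed Stream<Long> and through a LongStream. The class and method names are hypothetical, and the sketch shows only the API difference, not a benchmark.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.LongStream;

class PrimitiveStreamSketch {

    // Boxed: every element is a Long object, and reduce unboxes and re-boxes at each step.
    static long boxedSum(List<Long> values) {
        return values.stream().reduce(0L, Long::sum);
    }

    // Primitive: mapToLong unboxes each element once, then sum() adds raw long values.
    static long primitiveSum(List<Long> values) {
        return values.stream().mapToLong(Long::longValue).sum();
    }

    // Primitive from the start: the range never creates Long objects.
    static long rangeSum(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(1L, 2L, 3L, 4L);
        System.out.println("boxed:     " + boxedSum(values));
        System.out.println("primitive: " + primitiveSum(values));
        System.out.println("range:     " + rangeSum(4));
    }
}
```

All three methods return the same sum. The difference lies in how many temporary Long objects are created along the way, which matters more as N grows.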