Optimizing Streams Applications


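  1. Compile with -a (the sc compiler's --optimized-code-generation option), which generates optimized code with less runtime error checking.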
  2. Fuse operators into the same PE to reduce communication costs.
  3. Insert threaded ports into PEs to increase throughput through pipeline parallelism. Prefer threaded ports over additional PEs for pipeline parallelism, because threads inside one PE avoid the communication costs between PEs.
  4. Use multiple PEs in an application to take advantage of multiple hosts.
  5. Use one PE per host. If there are two PEs on the same host, they should probably be fused into one PE. Insert threaded ports to regain parallelism.
  6. Improve the performance of bottlenecks to improve the throughput of an application. Trying to improve the performance of an application without knowing which operator is the bottleneck is a waste of time. Once a parallel region is no longer the bottleneck, adding more parallelism to it will not help.
  7. Know your hardware. Distribute PEs to hosts so as to avoid over-subscribing any resource (cores, memory, disk, etc.) on that host.
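
The reasoning behind item 6 can be illustrated with a toy model. This is a sketch in plain Python, not Streams code: the stage names and rates are hypothetical, and it assumes that a parallel region scales linearly with its width, which real applications rarely achieve. It shows that a pipeline runs at the rate of its slowest stage, so widening one stage pays off only until a different stage becomes the limit.

```python
# Toy model of pipeline throughput (illustrative only, not Streams code).
# Each stage has a hypothetical maximum rate in tuples per second.

def pipeline_throughput(stage_rates):
    """A pipeline can run no faster than its slowest stage."""
    return min(stage_rates.values())

def bottleneck(stage_rates):
    """Return the name of the stage that limits throughput."""
    return min(stage_rates, key=stage_rates.get)

def with_parallel_width(stage_rates, stage, width):
    """Return new rates with `stage` replicated `width` times.

    Assumes ideal linear scaling, which is an optimistic simplification.
    """
    scaled = dict(stage_rates)
    scaled[stage] = stage_rates[stage] * width
    return scaled

# Hypothetical measurements for a three-stage pipeline.
rates = {"parse": 50_000, "enrich": 10_000, "write": 30_000}

if __name__ == "__main__":
    for width in (1, 2, 4, 8):
        scaled = with_parallel_width(rates, "enrich", width)
        print(width, bottleneck(scaled), pipeline_throughput(scaled))
```

In this model, widening "enrich" from 1 to 2 doubles throughput, but at width 4 the limit moves to "write", and going to width 8 changes nothing. At that point the next measurement should target the new bottleneck rather than the parallel region.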
