Optimizing Streams Applications
- Compile with the `-a` option of the `sc` compiler, which turns on optimized code generation.
- Fuse operators into the same PE to reduce communication costs.
- Insert threaded ports into PEs to increase throughput through pipeline parallelism. To obtain pipeline parallelism, prefer threaded ports over splitting operators into separate PEs.
- Use multiple PEs in an application to take advantage of multiple hosts.
- Use one PE per host. If there are two PEs on the same host, they should probably be fused into one PE. Insert threaded ports to regain parallelism.
- Improve the performance of bottlenecks to improve the throughput of an application. Trying to improve the performance of an application without knowing which operator is the bottleneck is a waste of time. Once a parallel region is no longer the bottleneck, adding more parallelism to it will not help.
- Know your hardware. Distribute PEs to hosts so as to avoid over-subscribing any resource (cores, memory, disk, etc.) on that host.
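
The bottleneck guideline above can be illustrated with a small back-of-the-envelope model. This is only a sketch, not part of Streams: it assumes a linear pipeline, ideal linear scaling when a stage is replicated, and made-up per-channel rates. The stage names, the numbers, and the helper functions are all hypothetical.

```python
# Toy model: each stage is (tuples/sec per channel, number of parallel channels).
# Assumption: capacity scales linearly with the number of channels.

def capacity(stage):
    """Capacity of one stage in tuples/sec."""
    rate_per_channel, channels = stage
    return rate_per_channel * channels

def pipeline_throughput(stages):
    """A linear pipeline can go no faster than its slowest stage."""
    return min(capacity(s) for s in stages.values())

def bottleneck(stages):
    """Name of the stage with the lowest capacity."""
    return min(stages, key=lambda name: capacity(stages[name]))

def widen(stages, name, channels):
    """Return a copy of the pipeline with one stage replicated `channels` times."""
    updated = dict(stages)
    updated[name] = (stages[name][0], channels)
    return updated

# Hypothetical measurements for a three-stage application.
base = {"parse": (50_000, 1), "enrich": (8_000, 1), "sink": (30_000, 1)}

for width in (1, 4, 8):
    candidate = widen(base, "enrich", width)
    print(width, bottleneck(candidate), pipeline_throughput(candidate))
```

In this model, widening `enrich` from 1 to 4 channels helps because `enrich` was the limiting stage. Going from 4 to 8 channels changes nothing, because the bottleneck has moved to `sink`. Real applications should be measured rather than modeled, but the same reasoning applies.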