FreePOOMA Optimization Guide
The following hints indicate which data-parallel feature of FreePOOMA
to use in which situation, and why.
- You can use expression templates with no performance impact as long
as you do not reference the same data object more than once in an
expression. For example, the compiler will not optimize
    A = B + pow(B, 2);
well, because at expression template expansion time it cannot see that
both occurrences of B refer to the same object. This results not only in
extra loads from memory but also in extra induction variables in the
loops, and therefore in more register pressure, which can hurt
especially on register-starved architectures.
- If you have an expression that needs one (and only one) data object
multiple times, possibly at shifted positions, use the Stencil or
FieldStencil facilities.
- In all other cases, resort to ScalarCode or PatchFunction.
PatchFunction is usually better optimized because you do not use Locs
for indexing there, but it is in a somewhat messy state. ScalarCode is
good for code that you want to write in a dimension-agnostic way.
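
The first two hints can be illustrated with a minimal sketch in plain
C++. It does not use FreePOOMA itself: Leaf, PowNode, AddNode,
evaluateInto, SquarePlusSelf and applyPointwise are hypothetical names
invented for this illustration. The point is only that the expression
type for B + pow(B, 2) stores two independent leaves, each with its own
pointer, while a stencil-style functor receives the single array once.
An optimizer may well merge the two leaves in a toy case like this one,
so treat it as a picture of the structure, not as a benchmark.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical leaf node: refers to array storage by pointer.
struct Leaf {
    const double* data;
    double operator[](std::size_t i) const { return data[i]; }
};

// Hypothetical node for pow(operand, exponent).
template <class E>
struct PowNode {
    E operand;
    double exponent;
    double operator[](std::size_t i) const {
        return std::pow(operand[i], exponent);
    }
};

// Hypothetical node for lhs + rhs.
template <class L, class R>
struct AddNode {
    L lhs;
    R rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Evaluate an expression tree element by element into out.
template <class Expr>
void evaluateInto(std::vector<double>& out, const Expr& e) {
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = e[i];
}

// A = B + pow(B, 2) as an expression tree. B appears as two separate
// leaves; the type itself does not record that they alias.
std::vector<double> viaExpressionTemplate(const std::vector<double>& b) {
    std::vector<double> a(b.size());
    AddNode<Leaf, PowNode<Leaf>> expr{
        Leaf{b.data()}, PowNode<Leaf>{Leaf{b.data()}, 2.0}};
    evaluateInto(a, expr);
    return a;
}

// Stencil-style functor: one input array, each element loaded once.
struct SquarePlusSelf {
    double operator()(const double* b, std::size_t i) const {
        const double v = b[i];  // single load, reused below
        return v + v * v;
    }
};

// Apply a functor to every element of a single input array.
template <class Functor>
void applyPointwise(std::vector<double>& out,
                    const std::vector<double>& in, Functor f) {
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = f(in.data(), i);
}

// The same computation written against one array only.
std::vector<double> viaFunctor(const std::vector<double>& b) {
    std::vector<double> a(b.size());
    applyPointwise(a, b, SquarePlusSelf{});
    return a;
}
```

Both viaExpressionTemplate and viaFunctor compute the same values. The
difference is that the functor form makes the single data object
explicit, which is the situation the Stencil and FieldStencil
facilities are recommended for above.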
Copyright (C) 2004 Richard Günther
Verbatim copying and distribution of this entire article is permitted in any medium, provided this notice is preserved.
Last updated $Date: 2004/12/22 12:12:35 $ by $Author: richi $.