perf_opt.h File Reference

Defines

#define USE_PLAIN_VEC_KERNELS
 Default settings for optimum performance on different architectures and compilers.
#define DEF_UNROLL_DEPTH   4
#define DEF_PREFETCH_AHEAD   4
#define DEF_CACHELINE_SZ   32
#define DEF_CACHE_LOC_READ   2
 Optimized for small objects; for large ones, read/write values of 0,1 or 0,0 may be best.
#define DEF_CACHE_LOC_WRITE   3
#define PREFETCH_AHEAD   DEF_PREFETCH_AHEAD
 Number of cache lines (not elements!) to prefetch ahead of use; depends on your memory latency.
#define UNROLL_DEPTH   DEF_UNROLL_DEPTH
 Number of iterations per loop pass (unrolling). Trades code bloat against speed.
#define CACHELINE_SZ   DEF_CACHELINE_SZ
 (L1) Cache line size in bytes.
#define CACHE_LOC_READ   DEF_CACHE_LOC_READ
 Cache locality for pointers that are read from and written to. 0: don't cache (streaming data, accessed only once).
#define CACHE_LOC_WRITE   DEF_CACHE_LOC_WRITE
#define EL_PER_CL(T)   (signed)((CACHELINE_SZ/sizeof( T ))?(CACHELINE_SZ/sizeof( T )):1)
#define PREF_OFFS(T)   (EL_PER_CL(T)*PREFETCH_AHEAD)


Define Documentation

#define CACHE_LOC_READ   DEF_CACHE_LOC_READ

Cache locality for pointers that are read from and written to.

0: don't cache (streaming data, accessed only once).
3: cache in all cache levels (likely to be reaccessed soon).
1, 2: intermediate values.

See the GCC documentation on __builtin_prefetch.

Advice: use lower values for reading than for writing, since you are more likely to need the result again than the arguments. For large objects (larger than the L2/L3 cache), CACHE_LOC_READ=0 is best; CACHE_LOC_WRITE matters less, but 0 also seems best.
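As a rough illustration of how these locality values reach the compiler, the sketch below passes them as the third argument of the GCC builtin __builtin_prefetch; the second argument selects read (0) or write (1). The function scaled_copy and the fixed prefetch distance are assumptions made for this example and are not taken from the library.

```cpp
#include <cstddef>

// Documented defaults; define the macros earlier to try other values.
#ifndef CACHE_LOC_READ
# define CACHE_LOC_READ 2
#endif
#ifndef CACHE_LOC_WRITE
# define CACHE_LOC_WRITE 3
#endif

// Hypothetical kernel (not part of the library): dst[i] = f * src[i].
// __builtin_prefetch is a GCC/Clang builtin; its 2nd and 3rd arguments
// must be compile-time constants, which the macros provide.
inline void scaled_copy(double* dst, const double* src, double f, std::size_t n)
{
    const std::size_t ahead = 16;  // illustrative prefetch distance in elements
    for (std::size_t i = 0; i < n; ++i) {
        if (i + ahead < n) {  // stay inside the arrays
            __builtin_prefetch(src + i + ahead, 0, CACHE_LOC_READ);   // rw = 0: read
            __builtin_prefetch(dst + i + ahead, 1, CACHE_LOC_WRITE);  // rw = 1: write
        }
        dst[i] = f * src[i];
    }
}
```

Prefetches are only hints, so the result is the same for any locality value; only the timing may differ.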

Definition at line 161 of file perf_opt.h.

#define CACHE_LOC_WRITE   DEF_CACHE_LOC_WRITE

Definition at line 164 of file perf_opt.h.

#define CACHELINE_SZ   DEF_CACHELINE_SZ

(L1) Cache line size in bytes.

32 or 64 bytes on many architectures. Used to issue only one prefetch per cache line and to scale the offset when prefetching ahead.
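The two uses named above can be sketched as follows: the loop issues one prefetch per cache line instead of one per element, and the prefetch distance is scaled by the number of elements that fit in a line. The function sum_prefetched is a hypothetical example, not one of the library kernels listed below, and it assumes a GCC-compatible compiler.

```cpp
#include <cstddef>

// Documented defaults; define the macros earlier to try other values.
#ifndef CACHELINE_SZ
# define CACHELINE_SZ 32
#endif
#ifndef PREFETCH_AHEAD
# define PREFETCH_AHEAD 4
#endif

// Hypothetical reduction (not part of the library) over an array of double.
inline double sum_prefetched(const double* v, std::size_t n)
{
    // Elements per cache line; 4 when double is 8 bytes and the line is 32 bytes.
    const std::size_t per_line = CACHELINE_SZ / sizeof(double);
    // Prefetch distance in elements: PREFETCH_AHEAD whole cache lines.
    const std::size_t offs = per_line * PREFETCH_AHEAD;
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Only the first element of each line triggers a prefetch.
        if (i % per_line == 0 && i + offs < n)
            __builtin_prefetch(v + i + offs, 0, 0);  // read, streaming
        s += v[i];
    }
    return s;
}
```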

Definition at line 148 of file perf_opt.h.

Referenced by do_mat_mat_mult(), do_mat_vec_mult(), do_mat_vec_transmult(), TBCI::dot(), and TBCI::Vector< T >::operator*().

#define DEF_CACHE_LOC_READ   2

Optimized for small objects; for large ones, read/write values of 0,1 or 0,0 may be best.

Definition at line 128 of file perf_opt.h.

#define DEF_CACHE_LOC_WRITE   3

Definition at line 129 of file perf_opt.h.

#define DEF_CACHELINE_SZ   32

Definition at line 123 of file perf_opt.h.

#define DEF_PREFETCH_AHEAD   4

Definition at line 120 of file perf_opt.h.

#define DEF_UNROLL_DEPTH   4

Definition at line 117 of file perf_opt.h.

#define EL_PER_CL( T )     (signed)((CACHELINE_SZ/sizeof( T ))?(CACHELINE_SZ/sizeof( T )):1)

Definition at line 168 of file perf_opt.h.

#define PREF_OFFS( T )     (EL_PER_CL(T)*PREFETCH_AHEAD)

Definition at line 169 of file perf_opt.h.
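To make the arithmetic of EL_PER_CL and PREF_OFFS concrete, the following sketch evaluates both macros at compile time with the default values (32-byte cache line, 4 lines ahead). The struct Big is a hypothetical type introduced only to show the fallback to 1 when an element is larger than a cache line.

```cpp
#include <cstdint>

// Documented defaults, repeated here so the example stands alone.
#define CACHELINE_SZ   32
#define PREFETCH_AHEAD 4
#define EL_PER_CL(T)   (signed)((CACHELINE_SZ/sizeof( T ))?(CACHELINE_SZ/sizeof( T )):1)
#define PREF_OFFS(T)   (EL_PER_CL(T)*PREFETCH_AHEAD)

// Hypothetical 48-byte element type, larger than one 32-byte cache line.
struct Big { char bytes[48]; };

// 4-byte elements: 32/4 = 8 per line, so 8*4 = 32 elements of prefetch distance.
static_assert(EL_PER_CL(std::int32_t) == 8,  "8 elements per 32-byte line");
static_assert(PREF_OFFS(std::int32_t) == 32, "4 lines ahead = 32 elements");

// Oversized elements: 32/48 truncates to 0, so the macro falls back to 1.
static_assert(EL_PER_CL(Big) == 1, "never fewer than one element per line");
static_assert(PREF_OFFS(Big) == 4, "4 elements ahead");
```

The fallback keeps PREF_OFFS non-zero, so a prefetch always targets an element ahead of the current one.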

#define PREFETCH_AHEAD   DEF_PREFETCH_AHEAD

Number of cache lines (not elements!) to prefetch ahead of use; depends on your memory latency.

4 or 8 seem to be good choices.

Definition at line 136 of file perf_opt.h.

#define UNROLL_DEPTH   DEF_UNROLL_DEPTH

Number of iterations per loop pass (unrolling). Trades code bloat against speed.

4 or 8 are good values. However, 1 might be best if your compiler is better at unrolling than we are.
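For readers unfamiliar with manual unrolling, here is a minimal sketch of a dot product unrolled four times, followed by a remainder loop. The function dot_unrolled is hypothetical and only illustrates the trade-off; it is not the library's TBCI::dot() implementation.

```cpp
#include <cstddef>

#ifndef UNROLL_DEPTH
# define UNROLL_DEPTH 4   // documented default
#endif

// Hypothetical dot product (not part of the library).
inline double dot_unrolled(const double* a, const double* b, std::size_t n)
{
    double s = 0.0;
    std::size_t i = 0;
#if UNROLL_DEPTH == 4
    // Four multiply-adds per pass: fewer loop-counter updates and branches,
    // at the cost of a larger loop body.
    for (; i + 4 <= n; i += 4) {
        s += a[i]     * b[i];
        s += a[i + 1] * b[i + 1];
        s += a[i + 2] * b[i + 2];
        s += a[i + 3] * b[i + 3];
    }
#endif
    // Remainder loop; handles every element when UNROLL_DEPTH is not 4.
    for (; i < n; ++i)
        s += a[i] * b[i];
    return s;
}
```

Setting UNROLL_DEPTH to 1 in this sketch leaves only the plain loop, which hands the unrolling decision to the compiler.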

Definition at line 142 of file perf_opt.h.

#define USE_PLAIN_VEC_KERNELS

Default settings for optimum performance on different architectures and compilers.

(c) Kurt Garloff <kurt@garloff.de>, 2002-07-30

Id: perf_opt.h,v 1.1.2.13 2008/10/23 19:57:51 garloff Exp

Definition at line 114 of file perf_opt.h.


Generated on Wed Nov 20 09:28:24 2013 for TBCI Numerical high perf. C++ Library by  doxygen 1.5.6