3
<title>SIMD alignment and fftw_malloc - FFTW 3.2.2</title>
3
<title>SIMD alignment and fftw_malloc - FFTW 3.3</title>
4
4
<meta http-equiv="Content-Type" content="text/html">
5
<meta name="description" content="FFTW 3.2.2">
5
<meta name="description" content="FFTW 3.3">
6
6
<meta name="generator" content="makeinfo 4.13">
7
7
<link title="Top" rel="start" href="index.html#Top">
8
<link rel="up" href="Data-Alignment.html#Data-Alignment" title="Data Alignment">
9
<link rel="prev" href="Data-Alignment.html#Data-Alignment" title="Data Alignment">
10
<link rel="next" href="Stack-alignment-on-x86.html#Stack-alignment-on-x86" title="Stack alignment on x86">
8
<link rel="up" href="Other-Important-Topics.html#Other-Important-Topics" title="Other Important Topics">
9
<link rel="prev" href="Other-Important-Topics.html#Other-Important-Topics" title="Other Important Topics">
10
<link rel="next" href="Multi_002ddimensional-Array-Format.html#Multi_002ddimensional-Array-Format" title="Multi-dimensional Array Format">
11
11
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
13
13
This manual is for FFTW
14
(version 3.2.2, 12 July 2009).
14
(version 3.3, 26 July 2011).
16
16
Copyright (C) 2003 Matteo Frigo.
49
49
<a name="SIMD-alignment-and-fftw_malloc"></a>
50
50
<a name="SIMD-alignment-and-fftw_005fmalloc"></a>
52
Next: <a rel="next" accesskey="n" href="Stack-alignment-on-x86.html#Stack-alignment-on-x86">Stack alignment on x86</a>,
53
Previous: <a rel="previous" accesskey="p" href="Data-Alignment.html#Data-Alignment">Data Alignment</a>,
54
Up: <a rel="up" accesskey="u" href="Data-Alignment.html#Data-Alignment">Data Alignment</a>
52
Next: <a rel="next" accesskey="n" href="Multi_002ddimensional-Array-Format.html#Multi_002ddimensional-Array-Format">Multi-dimensional Array Format</a>,
53
Previous: <a rel="previous" accesskey="p" href="Other-Important-Topics.html#Other-Important-Topics">Other Important Topics</a>,
54
Up: <a rel="up" accesskey="u" href="Other-Important-Topics.html#Other-Important-Topics">Other Important Topics</a>
58
<h4 class="subsection">3.1.1 SIMD alignment and fftw_malloc</h4>
58
<h3 class="section">3.1 SIMD alignment and fftw_malloc</h3>
60
60
<p>SIMD, which stands for “Single Instruction Multiple Data,” is a set of
61
61
special operations supported by some processors to perform a single
62
62
operation on several numbers (usually 2 or 4) simultaneously. SIMD
63
63
floating-point instructions are available on several popular CPUs:
64
SSE/SSE2 (single/double precision) on Pentium III and higher and on
65
AMD64, AltiVec (single precision) on some PowerPCs (Apple G4 and
66
higher), and MIPS Paired Single. FFTW can be compiled to support the
64
SSE/SSE2/AVX on recent x86/x86-64 processors, AltiVec (single precision) on some PowerPCs (Apple G4 and
65
higher), and MIPS Paired Single (currently only in FFTW 3.2.x). FFTW can be compiled to support the
67
66
SIMD instructions on any of these systems.
68
<a name="index-SIMD-102"></a><a name="index-SSE-103"></a><a name="index-SSE2-104"></a><a name="index-AltiVec-105"></a><a name="index-MIPS-PS-106"></a><a name="index-precision-107"></a>
69
A program linking to an FFTW library compiled with SIMD support can
67
<a name="index-SIMD-102"></a><a name="index-SSE-103"></a><a name="index-SSE2-104"></a><a name="index-AVX-105"></a><a name="index-AltiVec-106"></a><a name="index-MIPS-PS-107"></a><a name="index-precision-108"></a>
69
<p>A program linking to an FFTW library compiled with SIMD support can
70
70
obtain a nonnegligible speedup for most complex and r2c/c2r
71
71
transforms. In order to obtain this speedup, however, the arrays of
72
72
complex (or real) data passed to FFTW must be specially aligned in
74
74
stringent than that provided by the usual <code>malloc</code> (etc.)
75
75
allocation routines.
77
<p><a name="index-portability-108"></a>In order to guarantee proper alignment for SIMD, therefore, in case
77
<p><a name="index-portability-109"></a>In order to guarantee proper alignment for SIMD, therefore, in case
78
78
your program is ever linked against a SIMD-using FFTW, we recommend
79
79
allocating your transform data with <code>fftw_malloc</code> and
80
80
de-allocating it with <code>fftw_free</code>.
81
<a name="index-fftw_005fmalloc-109"></a><a name="index-fftw_005ffree-110"></a>These have exactly the same interface and behavior as
81
<a name="index-fftw_005fmalloc-110"></a><a name="index-fftw_005ffree-111"></a>These have exactly the same interface and behavior as
82
82
<code>malloc</code>/<code>free</code>, except that for a SIMD FFTW they ensure
83
83
that the returned pointer has the necessary alignment (by calling
84
84
<code>memalign</code> or its equivalent on your OS).
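<p>To make "necessary alignment" concrete, here is a minimal sketch of an alignment check and an aligned allocation. It is not FFTW's implementation: <code>is_aligned</code> and <code>alloc_aligned</code> are hypothetical helpers, the allocator is C11 <code>aligned_alloc</code> rather than <code>memalign</code>, and the 16-byte figure used in the usage note is an assumption based on what SSE aligned loads require.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of FFTW): nonzero if p lies on an
   `alignment`-byte boundary. `alignment` must be a power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical stand-in for an aligned allocator, built on C11
   aligned_alloc. The size is rounded up to a multiple of the
   alignment, as C11 requires. Release the result with free(). */
static void *alloc_aligned(size_t bytes, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    size_t rounded = (bytes + alignment - 1) / alignment * alignment;
    if (rounded == 0)
        rounded = alignment;   /* avoid a zero-size request */
    return aligned_alloc(alignment, rounded);
}
```

<p>For example, <code>double *in = alloc_aligned(n * sizeof(double), 16);</code> returns a pointer for which <code>is_aligned(in, 16)</code> holds, whereas a pointer offset by 8 bytes from it does not pass the check.</p>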
88
88
<code>new</code> (in C++) to a fixed-size array declaration. If the array
89
89
happens not to be properly aligned, FFTW will not use the SIMD
91
<a name="index-C_002b_002b-111"></a>
91
<a name="index-C_002b_002b-112"></a>
92
<a name="index-fftw_005falloc_005freal-113"></a><a name="index-fftw_005falloc_005fcomplex-114"></a>Since <code>fftw_malloc</code> only ever needs to be used for real and
93
complex arrays, we provide two convenient wrapper routines
94
<code>fftw_alloc_real(N)</code> and <code>fftw_alloc_complex(N)</code> that are
95
equivalent to <code>(double*)fftw_malloc(sizeof(double) * N)</code> and
96
<code>(fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N)</code>,
97
respectively (or their equivalents in other precisions).