1
@node Tutorial, Other Important Topics, Introduction, Top
4
* Complex One-Dimensional DFTs::
5
* Complex Multi-Dimensional DFTs::
6
* One-Dimensional DFTs of Real Data::
7
* Multi-Dimensional DFTs of Real Data::
8
* More DFTs of Real Data::
11
This chapter describes the basic usage of FFTW, i.e., how to compute
12
@cindex basic interface
13
the Fourier transform of a single array. This chapter tells the
14
truth, but not the @emph{whole} truth. Specifically, FFTW implements
15
additional routines and flags that are not documented here, although
16
in many cases we try to indicate where added capabilities exist. For
17
more complete information, see @ref{FFTW Reference}. (Note that you
18
need to compile and install FFTW before you can use it in a program.
19
For the details of the installation, see @ref{Installation and
22
We recommend that you read this tutorial in order.@footnote{You can
23
read the tutorial in bit-reversed order after computing your first
24
transform.} At the least, read the first section (@pxref{Complex
25
One-Dimensional DFTs}) before reading any of the others, even if your
26
main interest lies in one of the other transform types.
28
Users of FFTW version 2 and earlier may also want to read @ref{Upgrading
31
@c ------------------------------------------------------------
32
@node Complex One-Dimensional DFTs, Complex Multi-Dimensional DFTs, Tutorial, Tutorial
33
@section Complex One-Dimensional DFTs
36
Plan: To bother about the best method of accomplishing an accidental result.
37
[Ambrose Bierce, @cite{The Enlarged Devil's Dictionary}.]
45
The basic usage of FFTW to compute a one-dimensional DFT of size
46
@code{N} is simple, and it typically looks something like this code:
52
fftw_complex *in, *out;
55
in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
56
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
57
p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
59
fftw_execute(p); /* @r{repeat as needed} */
62
fftw_free(in); fftw_free(out);
66
You must link this code with the @code{fftw3} library. On Unix systems,
67
link with @code{-lfftw3 -lm}.
69
The example code first allocates the input and output arrays. You can
70
allocate them in any way that you like, but we recommend using
71
@code{fftw_malloc}, which behaves like
73
@code{malloc} except that it properly aligns the array when SIMD
74
instructions (such as SSE and Altivec) are available (@pxref{SIMD
75
alignment and fftw_malloc}). [Alternatively, we provide a convenient wrapper function @code{fftw_alloc_complex(N)} which has the same effect.]
76
@findex fftw_alloc_complex
80
The data is an array of type @code{fftw_complex}, which is by default a
81
@code{double[2]} composed of the real (@code{in[i][0]}) and imaginary
82
(@code{in[i][1]}) parts of a complex number.
85
The next step is to create a @dfn{plan}, which is an object
87
that contains all the data that FFTW needs to compute the FFT.
88
This function creates the plan:
91
fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in, fftw_complex *out,
92
int sign, unsigned flags);
94
@findex fftw_plan_dft_1d
97
The first argument, @code{n}, is the size of the transform you are
98
trying to compute. The size @code{n} can be any positive integer, but
99
sizes that are products of small factors are transformed most
100
efficiently (although prime sizes still use an @Onlogn{} algorithm).
102
The next two arguments are pointers to the input and output arrays of
103
the transform. These pointers can be equal, indicating an
104
@dfn{in-place} transform.
108
The fourth argument, @code{sign}, can be either @code{FFTW_FORWARD}
109
(@code{-1}) or @code{FFTW_BACKWARD} (@code{+1}),
110
@ctindex FFTW_FORWARD
111
@ctindex FFTW_BACKWARD
112
and indicates the direction of the transform you are interested in;
113
technically, it is the sign of the exponent in the transform.
115
The @code{flags} argument is usually either @code{FFTW_MEASURE} or
117
@code{FFTW_ESTIMATE}. @code{FFTW_MEASURE} instructs FFTW to run
118
@ctindex FFTW_MEASURE
119
and measure the execution time of several FFTs in order to find the
120
best way to compute the transform of size @code{n}. This process takes
121
some time (usually a few seconds), depending on your machine and on
122
the size of the transform. @code{FFTW_ESTIMATE}, on the contrary,
123
does not run any computation and just builds a
124
@ctindex FFTW_ESTIMATE
125
reasonable plan that is probably sub-optimal. In short, if your
126
program performs many transforms of the same size and initialization
127
time is not important, use @code{FFTW_MEASURE}; otherwise use the
130
@emph{You must create the plan before initializing the input}, because
131
@code{FFTW_MEASURE} overwrites the @code{in}/@code{out} arrays.
132
(Technically, @code{FFTW_ESTIMATE} does not touch your arrays, but you
133
should always create plans first just to be sure.)
135
Once the plan has been created, you can use it as many times as you
136
like for transforms on the specified @code{in}/@code{out} arrays,
137
computing the actual transforms via @code{fftw_execute(plan)}:
139
void fftw_execute(const fftw_plan plan);
143
The DFT results are stored in-order in the array @code{out}, with the
144
zero-frequency (DC) component in @code{out[0]}.
146
If @code{in != out}, the transform is @dfn{out-of-place} and the input
147
array @code{in} is not modified. Otherwise, the input array is
148
overwritten with the transform.
151
If you want to transform a @emph{different} array of the same size, you
152
can create a new plan with @code{fftw_plan_dft_1d} and FFTW
153
automatically reuses the information from the previous plan, if
154
possible. Alternatively, with the ``guru'' interface you can apply a
155
given plan to a different array, if you are careful.
156
@xref{FFTW Reference}.
158
When you are done with the plan, you deallocate it by calling
159
@code{fftw_destroy_plan(plan)}:
161
void fftw_destroy_plan(fftw_plan plan);
163
@findex fftw_destroy_plan
164
If you allocate an array with @code{fftw_malloc()} you must deallocate
165
it with @code{fftw_free()}. Do not use @code{free()} or, heaven
166
forbid, @code{delete}.
169
FFTW computes an @emph{unnormalized} DFT. Thus, computing a forward
170
followed by a backward transform (or vice versa) results in the original
171
array scaled by @code{n}. For the definition of the DFT, see @ref{What
172
FFTW Really Computes}.
174
@cindex normalization
177
If you have a C compiler, such as @code{gcc}, that supports the
178
C99 standard, and you @code{#include <complex.h>} @emph{before}
179
@code{<fftw3.h>}, then @code{fftw_complex} is the native
180
double-precision complex type and you can manipulate it with ordinary
181
arithmetic. Otherwise, FFTW defines its own complex type, which is
182
bit-compatible with the C99 complex type. @xref{Complex numbers}.
183
(The C++ @code{<complex>} template class may also be usable via a
187
To use single or long-double precision versions of FFTW, replace the
188
@code{fftw_} prefix by @code{fftwf_} or @code{fftwl_} and link with
189
@code{-lfftw3f} or @code{-lfftw3l}, but use the @emph{same}
190
@code{<fftw3.h>} header file.
194
Many more flags exist besides @code{FFTW_MEASURE} and
195
@code{FFTW_ESTIMATE}. For example, use @code{FFTW_PATIENT} if you're
196
willing to wait even longer for a possibly even faster plan (@pxref{FFTW
198
@ctindex FFTW_PATIENT
199
You can also save plans for future use, as described by @ref{Words of
200
Wisdom-Saving Plans}.
202
@c ------------------------------------------------------------
203
@node Complex Multi-Dimensional DFTs, One-Dimensional DFTs of Real Data, Complex One-Dimensional DFTs, Tutorial
204
@section Complex Multi-Dimensional DFTs
206
Multi-dimensional transforms work much the same way as one-dimensional
207
transforms: you allocate arrays of @code{fftw_complex} (preferably
208
using @code{fftw_malloc}), create an @code{fftw_plan}, execute it as
209
many times as you want with @code{fftw_execute(plan)}, and clean up
210
with @code{fftw_destroy_plan(plan)} (and @code{fftw_free}).
212
FFTW provides two routines for creating plans for 2d and 3d transforms,
213
and one routine for creating plans of arbitrary dimensionality.
214
The 2d and 3d routines have the following signature:
216
fftw_plan fftw_plan_dft_2d(int n0, int n1,
217
fftw_complex *in, fftw_complex *out,
218
int sign, unsigned flags);
219
fftw_plan fftw_plan_dft_3d(int n0, int n1, int n2,
220
fftw_complex *in, fftw_complex *out,
221
int sign, unsigned flags);
223
@findex fftw_plan_dft_2d
224
@findex fftw_plan_dft_3d
226
These routines create plans for @code{n0} by @code{n1} two-dimensional
227
(2d) transforms and @code{n0} by @code{n1} by @code{n2} 3d transforms,
228
respectively. All of these transforms operate on contiguous arrays in
229
the C-standard @dfn{row-major} order, so that the last dimension has the
230
fastest-varying index in the array. This layout is described further in
231
@ref{Multi-dimensional Array Format}.
233
FFTW can also compute transforms of higher dimensionality. In order to
234
avoid confusion between the various meanings of the the word
235
``dimension'', we use the term @emph{rank}
237
to denote the number of independent indices in an array.@footnote{The
238
term ``rank'' is commonly used in the APL, FORTRAN, and Common Lisp
239
traditions, although it is not so common in the C@tie{}world.} For
240
example, we say that a 2d transform has rank@tie{}2, a 3d transform has
241
rank@tie{}3, and so on. You can plan transforms of arbitrary rank by
242
means of the following function:
245
fftw_plan fftw_plan_dft(int rank, const int *n,
246
fftw_complex *in, fftw_complex *out,
247
int sign, unsigned flags);
249
@findex fftw_plan_dft
251
Here, @code{n} is a pointer to an array @code{n[rank]} denoting an
252
@code{n[0]} by @code{n[1]} by @dots{} by @code{n[rank-1]} transform.
253
Thus, for example, the call
255
fftw_plan_dft_2d(n0, n1, in, out, sign, flags);
257
is equivalent to the following code fragment:
262
fftw_plan_dft(2, n, in, out, sign, flags);
264
@code{fftw_plan_dft} is not restricted to 2d and 3d transforms,
265
however, but it can plan transforms of arbitrary rank.
267
You may have noticed that all the planner routines described so far
268
have overlapping functionality. For example, you can plan a 1d or 2d
269
transform by using @code{fftw_plan_dft} with a @code{rank} of @code{1}
270
or @code{2}, or even by calling @code{fftw_plan_dft_3d} with @code{n0}
271
and/or @code{n1} equal to @code{1} (with no loss in efficiency). This
272
pattern continues, and FFTW's planning routines in general form a
273
``partial order,'' sequences of
274
@cindex partial order
275
interfaces with strictly increasing generality but correspondingly
278
@code{fftw_plan_dft} is the most general complex-DFT routine that we
279
describe in this tutorial, but there are also the advanced and guru interfaces,
280
@cindex advanced interface
281
@cindex guru interface
282
which allow one to efficiently combine multiple/strided transforms
283
into a single FFTW plan, transform a subset of a larger
284
multi-dimensional array, and/or to handle more general complex-number
285
formats. For more information, see @ref{FFTW Reference}.
287
@c ------------------------------------------------------------
288
@node One-Dimensional DFTs of Real Data, Multi-Dimensional DFTs of Real Data, Complex Multi-Dimensional DFTs, Tutorial
289
@section One-Dimensional DFTs of Real Data
291
In many practical applications, the input data @code{in[i]} are purely
292
real numbers, in which case the DFT output satisfies the ``Hermitian''
294
redundancy: @code{out[i]} is the conjugate of @code{out[n-i]}. It is
295
possible to take advantage of these circumstances in order to achieve
296
roughly a factor of two improvement in both speed and memory usage.
298
In exchange for these speed and space advantages, the user sacrifices
299
some of the simplicity of FFTW's complex transforms. First of all, the
300
input and output arrays are of @emph{different sizes and types}: the
301
input is @code{n} real numbers, while the output is @code{n/2+1}
302
complex numbers (the non-redundant outputs); this also requires slight
303
``padding'' of the input array for
305
in-place transforms. Second, the inverse transform (complex to real)
306
has the side-effect of @emph{destroying its input array}, by default.
307
Neither of these inconveniences should pose a serious problem for
308
users, but it is important to be aware of them.
310
The routines to perform real-data transforms are almost the same as
311
those for complex transforms: you allocate arrays of @code{double}
312
and/or @code{fftw_complex} (preferably using @code{fftw_malloc} or
313
@code{fftw_alloc_complex}), create an @code{fftw_plan}, execute it as
314
many times as you want with @code{fftw_execute(plan)}, and clean up
315
with @code{fftw_destroy_plan(plan)} (and @code{fftw_free}). The only
316
differences are that the input (or output) is of type @code{double}
317
and there are new routines to create the plan. In one dimension:
320
fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out,
322
fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out,
325
@findex fftw_plan_dft_r2c_1d
326
@findex fftw_plan_dft_c2r_1d
328
for the real input to complex-Hermitian output (@dfn{r2c}) and
329
complex-Hermitian input to real output (@dfn{c2r}) transforms.
332
Unlike the complex DFT planner, there is no @code{sign} argument.
333
Instead, r2c DFTs are always @code{FFTW_FORWARD} and c2r DFTs are
334
always @code{FFTW_BACKWARD}.
335
@ctindex FFTW_FORWARD
336
@ctindex FFTW_BACKWARD
337
(For single/long-double precision
338
@code{fftwf} and @code{fftwl}, @code{double} should be replaced by
339
@code{float} and @code{long double}, respectively.)
343
Here, @code{n} is the ``logical'' size of the DFT, not necessarily the
344
physical size of the array. In particular, the real (@code{double})
345
array has @code{n} elements, while the complex (@code{fftw_complex})
346
array has @code{n/2+1} elements (where the division is rounded down).
347
For an in-place transform,
349
@code{in} and @code{out} are aliased to the same array, which must be
350
big enough to hold both; so, the real array would actually have
351
@code{2*(n/2+1)} elements, where the elements beyond the first
352
@code{n} are unused padding. (Note that this is very different from
353
the concept of ``zero-padding'' a transform to a larger length, which
354
changes the logical size of the DFT by actually adding new input
355
data.) The @math{k}th element of the complex array is exactly the
356
same as the @math{k}th element of the corresponding complex DFT. All
357
positive @code{n} are supported; products of small factors are most
358
efficient, but an @Onlogn algorithm is used even for prime sizes.
360
As noted above, the c2r transform destroys its input array even for
361
out-of-place transforms. This can be prevented, if necessary, by
362
including @code{FFTW_PRESERVE_INPUT} in the @code{flags}, with
363
unfortunately some sacrifice in performance.
365
@ctindex FFTW_PRESERVE_INPUT
366
This flag is also not currently supported for multi-dimensional real
369
Readers familiar with DFTs of real data will recall that the 0th (the
370
``DC'') and @code{n/2}-th (the ``Nyquist'' frequency, when @code{n} is
371
even) elements of the complex output are purely real. Some
372
implementations therefore store the Nyquist element where the DC
373
imaginary part would go, in order to make the input and output arrays
374
the same size. Such packing, however, does not generalize well to
375
multi-dimensional transforms, and the space savings are miniscule in
376
any case; FFTW does not support it.
378
An alternative interface for one-dimensional r2c and c2r DFTs can be
379
found in the @samp{r2r} interface (@pxref{The Halfcomplex-format
380
DFT}), with ``halfcomplex''-format output that @emph{is} the same size
381
(and type) as the input array.
382
@cindex halfcomplex format
383
That interface, although it is not very useful for multi-dimensional
384
transforms, may sometimes yield better performance.
386
@c ------------------------------------------------------------
387
@node Multi-Dimensional DFTs of Real Data, More DFTs of Real Data, One-Dimensional DFTs of Real Data, Tutorial
388
@section Multi-Dimensional DFTs of Real Data
390
Multi-dimensional DFTs of real data use the following planner routines:
393
fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
394
double *in, fftw_complex *out,
396
fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
397
double *in, fftw_complex *out,
399
fftw_plan fftw_plan_dft_r2c(int rank, const int *n,
400
double *in, fftw_complex *out,
403
@findex fftw_plan_dft_r2c_2d
404
@findex fftw_plan_dft_r2c_3d
405
@findex fftw_plan_dft_r2c
407
as well as the corresponding @code{c2r} routines with the input/output
408
types swapped. These routines work similarly to their complex
409
analogues, except for the fact that here the complex output array is cut
410
roughly in half and the real array requires padding for in-place
411
transforms (as in 1d, above).
413
As before, @code{n} is the logical size of the array, and the
414
consequences of this on the the format of the complex arrays deserve
416
@cindex r2c/c2r multi-dimensional array format
417
Suppose that the real data has dimensions @ndims (in row-major order).
418
Then, after an r2c transform, the output is an @ndimshalf array of
419
@code{fftw_complex} values in row-major order, corresponding to slightly
420
over half of the output of the corresponding complex DFT. (The division
421
is rounded down.) The ordering of the data is otherwise exactly the
422
same as in the complex-DFT case.
424
Since the complex data is slightly larger than the real data, some
425
complications arise for in-place transforms. In this case, the final
426
dimension of the real data must be padded with extra values to
427
accommodate the size of the complex data---two values if the last
428
dimension is even and one if it is odd.
430
That is, the last dimension of the real data must physically contain
438
2 * (n<sub>d-1</sub>/2+1)
440
@code{double} values (exactly enough to hold the complex data).
441
This physical array size does not, however, change the @emph{logical}
452
values are actually stored in the last dimension, and
462
is the last dimension passed to the plan-creation routine.
464
For example, consider the transform of a two-dimensional real array of
465
size @code{n0} by @code{n1}. The output of the r2c transform is a
466
two-dimensional complex array of size @code{n0} by @code{n1/2+1}, where
467
the @code{y} dimension has been cut nearly in half because of
468
redundancies in the output. Because @code{fftw_complex} is twice the
469
size of @code{double}, the output array is slightly bigger than the
470
input array. Thus, if we want to compute the transform in place, we
471
must @emph{pad} the input array so that it is of size @code{n0} by
472
@code{2*(n1/2+1)}. If @code{n1} is even, then there are two padding
473
elements at the end of each row (which need not be initialized, as they
474
are only used for output).
477
The following illustration depicts the input and output arrays just
478
described, for both the out-of-place and in-place transforms (with the
479
arrows indicating consecutive memory locations):
480
@image{rfftwnd-for-html}
484
@float Figure,fig:rfftwnd
485
@center @image{rfftwnd}
486
@caption{Illustration of the data layout for a 2d @code{nx} by @code{ny}
487
real-to-complex transform.}
489
@ref{fig:rfftwnd} depicts the input and output arrays just
490
described, for both the out-of-place and in-place transforms (with the
491
arrows indicating consecutive memory locations):
495
These transforms are unnormalized, so an r2c followed by a c2r
496
transform (or vice versa) will result in the original data scaled by
497
the number of real data elements---that is, the product of the
498
(logical) dimensions of the real data.
499
@cindex normalization
502
(Because the last dimension is treated specially, if it is equal to
503
@code{1} the transform is @emph{not} equivalent to a lower-dimensional
504
r2c/c2r transform. In that case, the last complex dimension also has
505
size @code{1} (@code{=1/2+1}), and no advantage is gained over the
508
@c ------------------------------------------------------------
509
@node More DFTs of Real Data, , Multi-Dimensional DFTs of Real Data, Tutorial
510
@section More DFTs of Real Data
512
* The Halfcomplex-format DFT::
513
* Real even/odd DFTs (cosine/sine transforms)::
514
* The Discrete Hartley Transform::
517
FFTW supports several other transform types via a unified @dfn{r2r}
518
(real-to-real) interface,
520
so called because it takes a real (@code{double}) array and outputs a
521
real array of the same size. These r2r transforms currently fall into
522
three categories: DFTs of real input and complex-Hermitian output in
523
halfcomplex format, DFTs of real input with even/odd symmetry
524
(a.k.a. discrete cosine/sine transforms, DCTs/DSTs), and discrete
525
Hartley transforms (DHTs), all described in more detail by the
528
The r2r transforms follow the by now familiar interface of creating an
529
@code{fftw_plan}, executing it with @code{fftw_execute(plan)}, and
530
destroying it with @code{fftw_destroy_plan(plan)}. Furthermore, all
531
r2r transforms share the same planner interface:
534
fftw_plan fftw_plan_r2r_1d(int n, double *in, double *out,
535
fftw_r2r_kind kind, unsigned flags);
536
fftw_plan fftw_plan_r2r_2d(int n0, int n1, double *in, double *out,
537
fftw_r2r_kind kind0, fftw_r2r_kind kind1,
539
fftw_plan fftw_plan_r2r_3d(int n0, int n1, int n2,
540
double *in, double *out,
545
fftw_plan fftw_plan_r2r(int rank, const int *n, double *in, double *out,
546
const fftw_r2r_kind *kind, unsigned flags);
548
@findex fftw_plan_r2r_1d
549
@findex fftw_plan_r2r_2d
550
@findex fftw_plan_r2r_3d
551
@findex fftw_plan_r2r
553
Just as for the complex DFT, these plan 1d/2d/3d/multi-dimensional
554
transforms for contiguous arrays in row-major order, transforming (real)
555
input to output of the same size, where @code{n} specifies the
556
@emph{physical} dimensions of the arrays. All positive @code{n} are
557
supported (with the exception of @code{n=1} for the @code{FFTW_REDFT00}
558
kind, noted in the real-even subsection below); products of small
559
factors are most efficient (factorizing @code{n-1} and @code{n+1} for
560
@code{FFTW_REDFT00} and @code{FFTW_RODFT00} kinds, described below), but
561
an @Onlogn algorithm is used even for prime sizes.
563
Each dimension has a @dfn{kind} parameter, of type
564
@code{fftw_r2r_kind}, specifying the kind of r2r transform to be used
567
@tindex fftw_r2r_kind
568
(In the case of @code{fftw_plan_r2r}, this is an array @code{kind[rank]}
569
where @code{kind[i]} is the transform kind for the dimension
570
@code{n[i]}.) The kind can be one of a set of predefined constants,
571
defined in the following subsections.
573
In other words, FFTW computes the separable product of the specified
574
r2r transforms over each dimension, which can be used e.g. for partial
575
differential equations with mixed boundary conditions. (For some r2r
576
kinds, notably the halfcomplex DFT and the DHT, such a separable
577
product is somewhat problematic in more than one dimension, however,
578
as is described below.)
580
In the current version of FFTW, all r2r transforms except for the
581
halfcomplex type are computed via pre- or post-processing of
582
halfcomplex transforms, and they are therefore not as fast as they
583
could be. Since most other general DCT/DST codes employ a similar
584
algorithm, however, FFTW's implementation should provide at least
585
competitive performance.
588
@node The Halfcomplex-format DFT, Real even/odd DFTs (cosine/sine transforms), More DFTs of Real Data, More DFTs of Real Data
589
@subsection The Halfcomplex-format DFT
591
An r2r kind of @code{FFTW_R2HC} (@dfn{r2hc}) corresponds to an r2c DFT
595
(@pxref{One-Dimensional DFTs of Real Data}) but with ``halfcomplex''
596
format output, and may sometimes be faster and/or more convenient than
598
@cindex halfcomplex format
599
The inverse @dfn{hc2r} transform is of kind @code{FFTW_HC2R}.
602
This consists of the non-redundant half of the complex output for a 1d
603
real-input DFT of size @code{n}, stored as a sequence of @code{n} real
604
numbers (@code{double}) in the format:
608
r_0, r_1, r_2, \ldots, r_{n/2}, i_{(n+1)/2-1}, \ldots, i_2, i_1
612
r0, r1, r2, r(n/2), i((n+1)/2-1), ..., i2, i1
616
r<sub>0</sub>, r<sub>1</sub>, r<sub>2</sub>, ..., r<sub>n/2</sub>, i<sub>(n+1)/2-1</sub>, ..., i<sub>2</sub>, i<sub>1</sub>
630
is the real part of the @math{k}th output, and
640
is the imaginary part. (Division by 2 is rounded down.) For a
641
halfcomplex array @code{hc[n]}, the @math{k}th component thus has its
642
real part in @code{hc[k]} and its imaginary part in @code{hc[n-k]}, with
643
the exception of @code{k} @code{==} @code{0} or @code{n/2} (the latter
644
only if @code{n} is even)---in these two cases, the imaginary part is
645
zero due to symmetries of the real-input DFT, and is not stored.
646
Thus, the r2hc transform of @code{n} real values is a halfcomplex array of
647
length @code{n}, and vice versa for hc2r.
648
@cindex normalization
651
Aside from the differing format, the output of
652
@code{FFTW_R2HC}/@code{FFTW_HC2R} is otherwise exactly the same as for
653
the corresponding 1d r2c/c2r transform
654
(i.e. @code{FFTW_FORWARD}/@code{FFTW_BACKWARD} transforms, respectively).
655
Recall that these transforms are unnormalized, so r2hc followed by hc2r
656
will result in the original data multiplied by @code{n}. Furthermore,
657
like the c2r transform, an out-of-place hc2r transform will
658
@emph{destroy its input} array.
660
Although these halfcomplex transforms can be used with the
661
multi-dimensional r2r interface, the interpretation of such a separable
662
product of transforms along each dimension is problematic. For example,
663
consider a two-dimensional @code{n0} by @code{n1}, r2hc by r2hc
664
transform planned by @code{fftw_plan_r2r_2d(n0, n1, in, out, FFTW_R2HC,
665
FFTW_R2HC, FFTW_MEASURE)}. Conceptually, FFTW first transforms the rows
666
(of size @code{n1}) to produce halfcomplex rows, and then transforms the
667
columns (of size @code{n0}). Half of these column transforms, however,
668
are of imaginary parts, and should therefore be multiplied by @math{i}
669
and combined with the r2hc transforms of the real columns to produce the
670
2d DFT amplitudes; FFTW's r2r transform does @emph{not} perform this
671
combination for you. Thus, if a multi-dimensional real-input/output DFT
672
is required, we recommend using the ordinary r2c/c2r
673
interface (@pxref{Multi-Dimensional DFTs of Real Data}).
676
@node Real even/odd DFTs (cosine/sine transforms), The Discrete Hartley Transform, The Halfcomplex-format DFT, More DFTs of Real Data
677
@subsection Real even/odd DFTs (cosine/sine transforms)
679
The Fourier transform of a real-even function @math{f(-x) = f(x)} is
680
real-even, and @math{i} times the Fourier transform of a real-odd
681
function @math{f(-x) = -f(x)} is real-odd. Similar results hold for a
682
discrete Fourier transform, and thus for these symmetries the need for
683
complex inputs/outputs is entirely eliminated. Moreover, one gains a
684
factor of two in speed/space from the fact that the data are real, and
685
an additional factor of two from the even/odd symmetry: only the
686
non-redundant (first) half of the array need be stored. The result is
687
the real-even DFT (@dfn{REDFT}) and the real-odd DFT (@dfn{RODFT}), also
688
known as the discrete cosine and sine transforms (@dfn{DCT} and
689
@dfn{DST}), respectively.
690
@cindex real-even DFT
694
@cindex discrete cosine transform
696
@cindex discrete sine transform
700
(In this section, we describe the 1d transforms; multi-dimensional
701
transforms are just a separable product of these transforms operating
702
along each dimension.)
704
Because of the discrete sampling, one has an additional choice: is the
705
data even/odd around a sampling point, or around the point halfway
706
between two samples? The latter corresponds to @emph{shifting} the
707
samples by @emph{half} an interval, and gives rise to several transform
708
variants denoted by REDFT@math{ab} and RODFT@math{ab}: @math{a} and
709
@math{b} are @math{0} or @math{1}, and indicate whether the input
710
(@math{a}) and/or output (@math{b}) are shifted by half a sample
711
(@math{1} means it is shifted). These are also known as types I-IV of
712
the DCT and DST, and all four types are supported by FFTW's r2r
713
interface.@footnote{There are also type V-VIII transforms, which
714
correspond to a logical DFT of @emph{odd} size @math{N}, independent of
715
whether the physical size @code{n} is odd, but we do not support these
718
The r2r kinds for the various REDFT and RODFT types supported by FFTW,
719
along with the boundary conditions at both ends of the @emph{input}
720
array (@code{n} real numbers @code{in[j=0..n-1]}), are:
725
@code{FFTW_REDFT00} (DCT-I): even around @math{j=0} and even around @math{j=n-1}.
726
@ctindex FFTW_REDFT00
729
@code{FFTW_REDFT10} (DCT-II, ``the'' DCT): even around @math{j=-0.5} and even around @math{j=n-0.5}.
730
@ctindex FFTW_REDFT10
733
@code{FFTW_REDFT01} (DCT-III, ``the'' IDCT): even around @math{j=0} and odd around @math{j=n}.
734
@ctindex FFTW_REDFT01
738
@code{FFTW_REDFT11} (DCT-IV): even around @math{j=-0.5} and odd around @math{j=n-0.5}.
739
@ctindex FFTW_REDFT11
742
@code{FFTW_RODFT00} (DST-I): odd around @math{j=-1} and odd around @math{j=n}.
743
@ctindex FFTW_RODFT00
746
@code{FFTW_RODFT10} (DST-II): odd around @math{j=-0.5} and odd around @math{j=n-0.5}.
747
@ctindex FFTW_RODFT10
750
@code{FFTW_RODFT01} (DST-III): odd around @math{j=-1} and even around @math{j=n-1}.
751
@ctindex FFTW_RODFT01
754
@code{FFTW_RODFT11} (DST-IV): odd around @math{j=-0.5} and even around @math{j=n-0.5}.
755
@ctindex FFTW_RODFT11
759
Note that these symmetries apply to the ``logical'' array being
760
transformed; @strong{there are no constraints on your physical input
761
data}. So, for example, if you specify a size-5 REDFT00 (DCT-I) of the
762
data @math{abcde}, it corresponds to the DFT of the logical even array
763
@math{abcdedcb} of size 8. A size-4 REDFT10 (DCT-II) of the data
764
@math{abcd} corresponds to the size-8 logical DFT of the even array
765
@math{abcddcba}, shifted by half a sample.
767
All of these transforms are invertible. The inverse of R*DFT00 is
768
R*DFT00; of R*DFT10 is R*DFT01 and vice versa (these are often called
769
simply ``the'' DCT and IDCT, respectively); and of R*DFT11 is R*DFT11.
770
However, the transforms computed by FFTW are unnormalized, exactly
771
like the corresponding real and complex DFTs, so computing a transform
772
followed by its inverse yields the original array scaled by @math{N},
773
where @math{N} is the @emph{logical} DFT size. For REDFT00,
774
@math{N=2(n-1)}; for RODFT00, @math{N=2(n+1)}; otherwise, @math{N=2n}.
775
@cindex normalization
779
Note that the boundary conditions of the transform output array are
780
given by the input boundary conditions of the inverse transform.
781
Thus, the above transforms are all inequivalent in terms of
782
input/output boundary conditions, even neglecting the 0.5 shift
785
FFTW is most efficient when @math{N} is a product of small factors; note
786
that this @emph{differs} from the factorization of the physical size
787
@code{n} for REDFT00 and RODFT00! There is another oddity: @code{n=1}
788
REDFT00 transforms correspond to @math{N=0}, and so are @emph{not
789
defined} (the planner will return @code{NULL}). Otherwise, any positive
790
@code{n} is supported.
792
For the precise mathematical definitions of these transforms as used by
793
FFTW, see @ref{What FFTW Really Computes}. (For people accustomed to
794
the DCT/DST, FFTW's definitions have a coefficient of @math{2} in front
795
of the cos/sin functions so that they correspond precisely to an
796
even/odd DFT of size @math{N}. Some authors also include additional
797
multiplicative factors of
807
for selected inputs and outputs; this makes
808
the transform orthogonal, but sacrifices the direct equivalence to a
811
@subsubheading Which type do you need?
813
Since the required flavor of even/odd DFT depends upon your problem,
814
you are the best judge of this choice, but we can make a few comments
815
on relative efficiency to help you in your selection. In particular,
816
R*DFT01 and R*DFT10 tend to be slightly faster than R*DFT11
817
(especially for odd sizes), while the R*DFT00 transforms are sometimes
818
significantly slower (especially for even sizes).@footnote{R*DFT00 is
819
sometimes slower in FFTW because we discovered that the standard
820
algorithm for computing this by a pre/post-processed real DFT---the
821
algorithm used in FFTPACK, Numerical Recipes, and other sources for
822
decades now---has serious numerical problems: it already loses several
823
decimal places of accuracy for 16k sizes. There seem to be only two
824
alternatives in the literature that do not suffer similarly: a
825
recursive decomposition into smaller DCTs, which would require a large
826
set of codelets for efficiency and generality, or sacrificing a factor of
833
in speed to use a real DFT of twice the size. We currently
834
employ the latter technique for general @math{n}, as well as a limited
835
form of the former method: a split-radix decomposition when @math{n}
836
is odd (@math{N} a multiple of 4). For @math{N} containing many
837
factors of 2, the split-radix method seems to recover most of the
838
speed of the standard algorithm without the accuracy tradeoff.}
840
Thus, if only the boundary conditions on the transform inputs are
841
specified, we generally recommend R*DFT10 over R*DFT00 and R*DFT01 over
842
R*DFT11 (unless the half-sample shift or the self-inverse property is
843
significant for your problem).
845
If performance is important to you and you are using only small sizes
846
(say @math{n<200}), e.g. for multi-dimensional transforms, then you
847
might consider generating hard-coded transforms of those sizes and types
848
that you are interested in (@pxref{Generating your own code}).
850
We are interested in hearing what types of symmetric transforms you find
854
@node The Discrete Hartley Transform, , Real even/odd DFTs (cosine/sine transforms), More DFTs of Real Data
855
@subsection The Discrete Hartley Transform
857
If you are planning to use the DHT because you've heard that it is
858
``faster'' than the DFT (FFT), @strong{stop here}. The DHT is not
859
faster than the DFT. That story is an old but enduring misconception
860
that was debunked in 1987.
862
The discrete Hartley transform (DHT) is an invertible linear transform
863
closely related to the DFT. In the DFT, one multiplies each input by
864
@math{cos - i * sin} (a complex exponential), whereas in the DHT each
865
input is multiplied by simply @math{cos + sin}. Thus, the DHT
866
transforms @code{n} real numbers to @code{n} real numbers, and has the
867
convenient property of being its own inverse. In FFTW, a DHT (of any
868
positive @code{n}) can be specified by an r2r kind of @code{FFTW_DHT}.
870
@cindex discrete Hartley transform
873
Like the DFT, in FFTW the DHT is unnormalized, so computing a DHT of
874
size @code{n} followed by another DHT of the same size will result in
875
the original array multiplied by @code{n}.
876
@cindex normalization
878
The DHT was originally proposed as a more efficient alternative to the
879
DFT for real data, but it was subsequently shown that a specialized DFT
880
(such as FFTW's r2hc or r2c transforms) could be just as fast. In FFTW,
881
the DHT is actually computed by post-processing an r2hc transform, so
882
there is ordinarily no reason to prefer it from a performance
883
perspective.@footnote{We provide the DHT mainly as a byproduct of some
884
internal algorithms. FFTW computes a real input/output DFT of
885
@emph{prime} size by re-expressing it as a DHT plus post/pre-processing
886
and then using Rader's prime-DFT algorithm adapted to the DHT.}
887
However, we have heard rumors that the DHT might be the most appropriate
888
transform in its own right for certain applications, and we would be
889
very interested to hear from anyone who finds it useful.
891
If @code{FFTW_DHT} is specified for multiple dimensions of a
892
multi-dimensional transform, FFTW computes the separable product of 1d
893
DHTs along each dimension. Unfortunately, this is not quite the same
894
thing as a true multi-dimensional DHT; you can compute the latter, if
895
necessary, with at most @code{rank-1} post-processing passes
896
[see e.g. H. Hao and R. N. Bracewell, @i{Proc. IEEE} @b{75}, 264--266 (1987)].
898
For the precise mathematical definition of the DHT as used by FFTW, see
899
@ref{What FFTW Really Computes}.