@node Installation and Customization, Acknowledgments, Upgrading from FFTW version 2, Top
@chapter Installation and Customization

This chapter describes the installation and customization of FFTW, the
latest version of which may be downloaded from
@uref{http://www.fftw.org, the FFTW home page}.

In principle, FFTW should work on any system with an ANSI C compiler
(@code{gcc} is fine). However, planner time is drastically reduced if
FFTW can exploit a hardware cycle counter; FFTW comes with cycle-counter
support for all modern general-purpose CPUs, but you may need to add a
couple of lines of code if your compiler is not yet supported
(@pxref{Cycle Counters}). (On Unix, there will be a warning at the end
of the @code{configure} output if no cycle counter is found.)

Installation of FFTW is simplest if you have a Unix or a GNU system,
such as GNU/Linux, and we describe this case in the first section below,
including the use of special configuration options to e.g. install
different precisions or exploit optimizations for particular
architectures (e.g. SIMD). Compilation on non-Unix systems is a more
manual process, but we outline the procedure in the second section. It
is also likely that pre-compiled binaries will be available for popular
systems.

Finally, we describe how you can customize FFTW for particular needs by
generating @emph{codelets} for fast transforms of sizes not supported
efficiently by the standard FFTW distribution.

@menu
* Installation on Unix::
* Installation on non-Unix systems::
* Cycle Counters::
* Generating your own code::
@end menu

@c ------------------------------------------------------------

@node Installation on Unix, Installation on non-Unix systems, Installation and Customization, Installation and Customization
@section Installation on Unix

FFTW comes with a @code{configure} program in the GNU style.
Installation can be as simple as:

@example
./configure
make
make install
@end example

This will build the uniprocessor FFTW library (complex and real
transforms) along with the test programs. (We recommend that you use GNU
@code{make} if it is available; on some systems it is called
@code{gmake}.) The ``@code{make install}'' command installs the FFTW
library and header files in standard places, and typically requires root
privileges (unless you specify a different install directory with the
@code{--prefix} flag to @code{configure}). You can also type
``@code{make check}'' to put the FFTW test programs through their paces.
If you have problems during configuration or compilation, you may want
to run ``@code{make distclean}'' before trying again; this ensures that
you don't have any stale files left over from previous compilation
attempts.
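
For example, to install into a private directory instead of the system
default (so that root privileges are not needed), one might run the
following from the top-level FFTW source directory; @file{$HOME/fftw}
is only an illustrative prefix, not a required location:

@example
./configure --prefix=$HOME/fftw
make
make check
make install
@end example

With the default GNU directory layout, the header then typically lands
in @file{$HOME/fftw/include} and the library in @file{$HOME/fftw/lib},
so programs using it need the corresponding @code{-I} and @code{-L}
compiler options.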

The @code{configure} script chooses the @code{gcc} compiler by default,
if it is available; you can select some other compiler with:

@example
./configure CC="@r{@i{<the name of your C compiler>}}"
@end example

The @code{configure} script knows good @code{CFLAGS} (C compiler flags)
@cindex compiler flags
for a few systems. If your system is not known, the @code{configure}
script will print out a warning. In this case, you should re-configure
FFTW with:

@example
./configure CFLAGS="@r{@i{<write your CFLAGS here>}}"
@end example

@noindent
and then compile as usual. If you do find an optimal set of
@code{CFLAGS} for your system, please let us know what they are (along
with the output of @code{config.guess}) so that we can include them in
future releases.

@code{configure} supports all the standard flags defined by the GNU
Coding Standards; see the @code{INSTALL} file in FFTW or
@uref{http://www.gnu.org/prep/standards/html_node/index.html, the GNU web page}.
Note especially @code{--help} to list all flags and
@code{--enable-shared} to create shared, rather than static, libraries.
@code{configure} also accepts a few FFTW-specific flags, particularly:

@code{--enable-float}: Produces a single-precision version of FFTW
(@code{float}) instead of the default double-precision (@code{double}).

@code{--enable-long-double}: Produces a long-double precision version of
FFTW (@code{long double}) instead of the default double-precision
(@code{double}). The @code{configure} script will halt with an error
message if @code{long double} is the same size as @code{double} on your
machine/compiler. @xref{Precision}.
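
Before passing @code{--enable-long-double}, you can get an informal
preview of whether the two types differ in size on your compiler;
@code{configure} still performs its own check. This sketch assumes a
GCC-compatible compiler (GCC and Clang both predefine
@code{__SIZEOF_DOUBLE__} and @code{__SIZEOF_LONG_DOUBLE__}), run as
@code{$CC} if that variable is set and as @code{cc} otherwise:

@example
# Use $CC if set, else cc (assumed to be GCC-compatible).
cc_cmd=$CC
[ -n "$cc_cmd" ] || cc_cmd=cc
# Dump the compiler's predefined macros for an empty C translation unit.
macros=$($cc_cmd -x c -dM -E - </dev/null)
# Pull out the byte sizes of double and long double.
dbl=$(printf '%s\n' "$macros" | sed -n 's/^#define __SIZEOF_DOUBLE__ //p')
ldbl=$(printf '%s\n' "$macros" | sed -n 's/^#define __SIZEOF_LONG_DOUBLE__ //p')
echo "double: $dbl bytes, long double: $ldbl bytes"
if [ "$dbl" = "$ldbl" ]; then
    echo "long double is the same size as double here;"
    echo "configure would reject --enable-long-double"
fi
@end example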

@code{--enable-quad-precision}: Produces a quadruple-precision version
of FFTW using the nonstandard @code{__float128} type provided by
@code{gcc} 4.6 or later on x86, x86-64, and Itanium architectures,
instead of the default double-precision (@code{double}). The
@code{configure} script will halt with an error message if the
compiler is not @code{gcc} version 4.6 or later or if @code{gcc}'s
@code{libquadmath} library is not installed. @xref{Precision}.
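
Since the precisions produce libraries with different names
(@code{libfftw3} for @code{double}, @code{libfftw3f} for @code{float},
@code{libfftw3l} for @code{long double}, and @code{libfftw3q} for
quadruple precision), several of them can be installed side by side. A
minimal sketch that builds three of them in turn, assuming a POSIX shell
in the top-level source directory and the default install prefix:

@example
# Build and install the double, single, and long-double libraries in turn.
for prec in "" --enable-float --enable-long-double; do
    # Clear the previous configuration; this fails harmlessly the first time.
    make distclean >/dev/null 2>&1
    ./configure $prec && make && make install || exit 1
done
@end example

The variable @code{$prec} is deliberately left unquoted, so that the
empty entry passes no option at all and yields the default
double-precision build.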

@code{--enable-threads}: Enables compilation and installation of the
FFTW threads library (@pxref{Multi-threaded FFTW}), which provides a
simple interface to parallel transforms for SMP systems. By default,
the threads routines are not compiled.

@code{--enable-openmp}: Like @code{--enable-threads}, but using OpenMP
compiler directives in order to induce parallelism rather than
spawning its own threads directly, and installing an @samp{fftw3_omp} library
rather than an @samp{fftw3_threads} library (@pxref{Multi-threaded
FFTW}). You can use both @code{--enable-openmp} and @code{--enable-threads}
since they compile/install libraries with different names. By default,
the OpenMP routines are not compiled.

@code{--with-combined-threads}: By default, if @code{--enable-threads}
is used, the threads support is compiled into a separate library that
must be linked in addition to the main FFTW library. This is so that
users of the serial library do not need to link the system threads
libraries. If @code{--with-combined-threads} is specified, however,
then no separate threads library is created, and threads are included
in the main FFTW library. This is mainly useful under Windows, where
no system threads library is required and inter-library dependencies
are problematic.
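
When the threads code lives in its own library, a program is linked
against that library before the main one. A sketch for double
precision, where @file{myprog.c} is a hypothetical source file and the
system threads option (@code{-lpthread} here) and OpenMP option
(@code{-fopenmp}, as spelled by @code{gcc}) may differ on your platform:

@example
# --enable-threads: threads wrapper library first, then the main library
cc myprog.c -o myprog -lfftw3_threads -lfftw3 -lm -lpthread
# --enable-openmp: OpenMP wrapper library, compiled with OpenMP enabled
cc -fopenmp myprog.c -o myprog -lfftw3_omp -lfftw3 -lm
@end example

With @code{--with-combined-threads}, the @code{-lfftw3_threads} argument
is dropped, because the threads routines are then part of the main
library itself.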

@code{--enable-mpi}: Enables compilation and installation of the FFTW
MPI library (@pxref{Distributed-memory FFTW with MPI}), which provides
parallel transforms for distributed-memory systems with MPI. (By
default, the MPI routines are not compiled.) @xref{FFTW MPI
Installation}.
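
A corresponding sketch for the MPI library, assuming your MPI
installation provides the common @code{mpicc} compiler wrapper and that
@file{myprog.c} is a hypothetical MPI program:

@example
./configure --enable-mpi
make
make install
# link the MPI wrapper library before the main serial library
mpicc myprog.c -o myprog -lfftw3_mpi -lfftw3 -lm
@end example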

@cindex Fortran-callable wrappers
@code{--disable-fortran}: Disables inclusion of legacy-Fortran
wrapper routines (@pxref{Calling FFTW from Legacy Fortran}) in the standard
FFTW libraries. These wrapper routines increase the library size by
only a negligible amount, so they are included by default as long as
the @code{configure} script finds a Fortran compiler on your system.
(To specify a particular Fortran compiler @i{foo}, pass
@code{F77=}@i{foo} to @code{configure}.)

@code{--with-g77-wrappers}: By default, when Fortran wrappers are
included, the wrappers employ the linking conventions of the Fortran
compiler detected by the @code{configure} script. If this compiler is
GNU @code{g77}, however, then @emph{two} versions of the wrappers are
included: one with @code{g77}'s idiosyncratic convention of appending
two underscores to identifiers, and one with the more common
convention of appending only a single underscore. This way, the same
FFTW library will work with both @code{g77} and other Fortran
compilers, such as GNU @code{gfortran}. However, the converse is not
true: if you configure with a different compiler, then the
@code{g77}-compatible wrappers are not included. By specifying
@code{--with-g77-wrappers}, the @code{g77}-compatible wrappers are
included in addition to wrappers for whatever Fortran compiler
@code{configure} finds.

@code{--with-slow-timer}: Disables the use of hardware cycle counters,
and falls back on @code{gettimeofday} or @code{clock}. This greatly
worsens performance, and should generally not be used (unless you don't
have a cycle counter but still really want an optimized plan regardless
of the time). @xref{Cycle Counters}.

@code{--enable-sse}, @code{--enable-sse2}, @code{--enable-avx},
@code{--enable-altivec}: Enable the compilation of SIMD code for SSE
(Pentium III+), SSE2 (Pentium IV+), AVX (Sandy Bridge, Interlagos), and
AltiVec (PowerPC G4+). SSE and AltiVec only work with
@code{--enable-float} (above). SSE2 works in both single and double
precision (and is simply SSE in single precision). The resulting code
will @emph{still work} on earlier CPUs lacking the SIMD extensions
(SIMD is automatically disabled, although the FFTW library is still
larger).
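
For instance, the SIMD options can be combined with the precision
options as in the following sketch; which flags actually make sense
depends on your CPU and compiler:

@example
# single precision: SSE2 (simply SSE in single precision) plus AVX
./configure --enable-float --enable-sse2 --enable-avx
# double precision: SSE2 plus AVX
./configure --enable-sse2 --enable-avx
@end example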

These options require a compiler supporting SIMD extensions, and
compiler support is always a bit flaky: see the FFTW FAQ for a list of
compiler versions that have problems compiling FFTW.

With AltiVec and @code{gcc}, you may have to use the
@code{-mabi=altivec} option when compiling any code that links to FFTW,
in order to properly align the stack; otherwise, FFTW could crash when
it tries to use an AltiVec feature. (This is not necessary on MacOS X.)

With SSE/SSE2 and @code{gcc}, you should use a version of gcc that
properly aligns the stack when compiling any code that links to FFTW.
By default, @code{gcc} 2.95 and later versions align the stack as
needed, but you should not compile FFTW with the @code{-Os} option or the
@code{-mpreferred-stack-boundary} option with an argument less than 4.

To force @code{configure} to use a particular C compiler @i{foo}
(instead of the default, usually @code{gcc}), pass @code{CC=}@i{foo} to the
@code{configure} script; you may also need to set the flags via the variable
@code{CFLAGS} as described above.
@cindex compiler flags

@c ------------------------------------------------------------
@node Installation on non-Unix systems, Cycle Counters, Installation on Unix, Installation and Customization
@section Installation on non-Unix systems

It should be relatively straightforward to compile FFTW even on non-Unix
systems lacking the niceties of a @code{configure} script. Basically,
you need to edit the @code{config.h} header (copy it from
@code{config.h.in}) to @code{#define} the various options and compiler
characteristics, and then compile all the @samp{.c} files in the
relevant directories.

The @code{config.h} header contains about 100 options to set, each one
initially an @code{#undef}, each documented with a comment, and most of
them fairly obvious. For most of the options, you should simply
@code{#define} them to @code{1} if they are applicable, although a few
options require a particular value (e.g. @code{SIZEOF_LONG_LONG} should
be defined to the size of the @code{long long} type, in bytes, or zero
if it is not supported). We will likely post some sample
@code{config.h} files for various operating systems and compilers for
you to use (at least as a starting point). Please let us know if you
have to hand-create a configuration file (and/or a pre-compiled binary)
that you want to share.

To create the FFTW library, you will then need to compile all of the
@samp{.c} files in the @code{kernel}, @code{dft}, @code{dft/scalar},
@code{dft/scalar/codelets}, @code{rdft}, @code{rdft/scalar},
@code{rdft/scalar/r2cf}, @code{rdft/scalar/r2cb},
@code{rdft/scalar/r2r}, @code{reodft}, and @code{api} directories.
If you are compiling with SIMD support (e.g. you defined
@code{HAVE_SSE2} in @code{config.h}), then you also need to compile
the @code{.c} files in the @code{simd-support},
@code{@{dft,rdft@}/simd}, and @code{@{dft,rdft@}/simd/*} directories.

Once these files are all compiled, link them into a library, or a shared
library, or directly into your program.

To compile the FFTW test program, additionally compile the code in the
@code{libbench2/} directory, and link it into a library. Then compile
the code in the @code{tests/} directory and link it to the
@code{libbench2} and FFTW libraries. To compile the @code{fftw-wisdom}
(command-line) tool (@pxref{Wisdom Utilities}), compile
@code{tools/fftw-wisdom.c} and link it to the @code{libbench2} and FFTW
libraries.

@c ------------------------------------------------------------
@node Cycle Counters, Generating your own code, Installation on non-Unix systems, Installation and Customization
@section Cycle Counters
@cindex cycle counter

FFTW's planner actually executes and times different possible FFT
algorithms in order to pick the fastest plan for a given @math{n}. In
order to do this in as short a time as possible, however, the timer must
have a very high resolution, and to accomplish this we employ the
hardware @dfn{cycle counters} that are available on most CPUs.
Currently, FFTW supports the cycle counters on x86, PowerPC/POWER, Alpha,
UltraSPARC (SPARC v9), IA64, PA-RISC, and MIPS processors.

Access to the cycle counters, unfortunately, is a compiler and/or
operating-system dependent task, often requiring inline assembly
language, and it may be that your compiler is not supported. If it is
@emph{not} supported, FFTW will by default fall back on its estimator
(effectively using @code{FFTW_ESTIMATE} for all plans).
@ctindex FFTW_ESTIMATE

You can add support by editing the file @code{kernel/cycle.h}; normally,
this will involve adapting one of the examples already present in order
to use the inline-assembler syntax for your C compiler, and will only
require a couple of lines of code. Anyone adding support for a new
system to @code{cycle.h} is encouraged to email us at @email{fftw@@fftw.org}.

If a cycle counter is not available on your system (e.g. some embedded
processor), and you don't want to use estimated plans, as a last resort
you can use the @code{--with-slow-timer} option to @code{configure} (on
Unix) or @code{#define WITH_SLOW_TIMER} in @code{config.h} (elsewhere).
This will use the much lower-resolution @code{gettimeofday} function, or even
@code{clock} if the former is unavailable, and planning will be
extremely slow.

@c ------------------------------------------------------------
@node Generating your own code, , Cycle Counters, Installation and Customization
@section Generating your own code
@cindex code generator

The directory @code{genfft} contains the programs that were used to
generate FFTW's ``codelets,'' which are hard-coded transforms of small
sizes.

We do not expect casual users to employ the generator, which is a rather
sophisticated program that generates directed acyclic graphs of FFT
algorithms and performs algebraic simplifications on them. It was
written in Objective Caml, a dialect of ML, which is available at
@uref{http://caml.inria.fr/ocaml/index.en.html}.

If you have Objective Caml installed (along with recent versions of
GNU @code{autoconf}, @code{automake}, and @code{libtool}), then you
can change the set of codelets that are generated or play with the
generation options. The set of generated codelets is specified by the
@code{@{dft,rdft@}/@{scalar,simd@}/*/Makefile.am} files. For example, you can add
efficient REDFT codelets of small sizes by modifying
@code{rdft/scalar/r2r/Makefile.am}.

After you modify any @code{Makefile.am} files, you can type @code{sh
bootstrap.sh} in the top-level directory followed by @code{make} to
re-generate the files.

We do not provide more details about the code-generation process, since
we do not expect that most users will need to generate their own code.
However, feel free to contact us at @email{fftw@@fftw.org} if
you are interested in the subject.

@cindex monadic programming
You might find it interesting to learn Caml and/or some modern
programming techniques that we used in the generator (including monadic
programming), especially if you heard the rumor that Java and
object-oriented programming are the latest advancement in the field.
The internal operation of the codelet generator is described in the
paper, ``A Fast Fourier Transform Compiler,'' by M. Frigo, which is
available from the @uref{http://www.fftw.org,FFTW home page} and also
appeared in the @cite{Proceedings of the 1999 ACM SIGPLAN Conference on
Programming Language Design and Implementation (PLDI)}.