* Correct bug in configure script: --enable-portable-binary option was ignored!
  Thanks to Andrew Salamon for the bug report.

* Threads compilation fix on AIX: prefer xlc_r to cc_r, and don't use
  either if we are using gcc. Thanks to Guy Moebs for the bug report.

* Updated FAQ to note that Apple gcc 4.0.1 on MacOS/Intel is broken,
  and suggest a workaround. configure script now detects Core/Duo arch.

* Use -maltivec when checking for altivec.h. Fixes Gentoo bug #129304,
  thanks to Markus Dittrich.
* Performance improvements for Intel EM64T.

* Performance improvements for large-size transforms with SIMD.

* Cycle counter support for Intel icc and Visual C++ on x86-64.

* In fftw-wisdom tool, replaced obsolete --impatient with --measure.

* Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.

* Windows DLL support for Fortran API (added missing __declspec(dllexport)).

* SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
  CPUs lacking a CPUID instruction; thanks to Eric Korpela.
* Faster FFTW_ESTIMATE planner.

* New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.

* "4-step" algorithm for faster FFTs of very large sizes (> 2^18).

* Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).

* Faster in-place non-square transpositions (FFTW uses these internally
  for in-place FFTs, and you can also perform them explicitly using

* Faster prime-size DFTs: implemented Bluestein's algorithm, as well
  as a zero-padded Rader variant to limit recursive use of Rader's algorithm.

* SIMD support for split complex arrays.

* Much faster Altivec/VMX performance.

* New fftw_set_timelimit function to specify a (rough) upper bound to the
  planning time (does not affect ESTIMATE mode).

* Removed --enable-3dnow support; use --enable-k7 instead.

* FMA (fused multiply-add) version is now included in "standard" FFTW,
  and is enabled with --enable-fma (the default on PowerPC and Itanium).

* Automatic detection of native architecture flag for gcc. New
  configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
  for people distributing compiled binaries of FFTW (see manual).

* Automatic detection of Altivec under Linux with gcc 3.4 (so that
  same binary should work on both Altivec and non-Altivec PowerPCs).
* Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,

* Various documentation clarifications.

* 64-bit clean. (Fixes a bug affecting the split guru planner on
  64-bit machines, reported by David Necas.)

* Fixed Debian bug #259612: inadvertent use of SSE instructions on
  non-SSE machines (causing a crash) for --enable-sse binaries.

* Fixed bug that caused HC2R transforms to destroy the input in
  certain cases, even if the user specified FFTW_PRESERVE_INPUT.

* Fixed bug where wisdom would be lost under rare circumstances,
  causing excessive planning time.

* FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.

* Fixed accidentally exported symbol that prohibited simultaneous
  linking to double/single multithreaded FFTW (thanks to Alessio Massaro).

* Support Win32 threads under MinGW (thanks to Alessio Massaro).

* Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.

* Fix build failure if no Fortran compiler is found (thanks to Charles
  Radley for the bug report).

* Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
  detection of icc architecture flag (e.g. -xW).

* Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).

* Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).

* Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
  but its malloc is 16-byte aligned).

* Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
  MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
  reports/fixes). Added x86-64 cycle counter for PGI compilers,
  courtesy Cristiano Calonaci.

* Fix compilation problem in test program due to C99 conflict.

* Portability fix for import_system_wisdom with djgpp (thanks to Juan

* Fixed compilation failure on MacOS 10.3 due to getopt conflict.

* Work around Visual C++ (version 6/7) bug in SSE compilation;
  thanks to Eddie Yee for his detailed report.
Changes from FFTW 3.1 beta 2:

* Several minor compilation fixes.

* Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
  fftw_set_timelimit function. Make wisdom work with time-limited plans.
Changes from FFTW 3.1 beta 1:

* Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.

* Fixed more 64-bit problems, thanks to John Pavel for the bug report.

* Further speed improvements for Altivec/VMX.

* Further speed improvements for non-square transpositions.
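For readers unfamiliar with in-place non-square transposition, here is a minimal sketch of the classic cycle-following approach. It is an illustration only and makes no claim about how FFTW does it; the function name transpose_inplace is hypothetical. For an r-by-c row-major matrix, the element at linear index i (other than the first and last) moves to index (i * r) mod (r*c - 1).

```c
#include <stddef.h>

/* Transpose an r-by-c row-major matrix in place; afterwards `a` holds the
   c-by-r transpose, also row-major.  Uses O(1) extra memory.
   Assumes r * c * r does not overflow size_t. */
void transpose_inplace(double *a, size_t r, size_t c)
{
    size_t n = r * c;
    if (n < 2)
        return;
    size_t m = n - 1;  /* indices 0 and n-1 never move */
    for (size_t start = 1; start < m; ++start) {
        /* Process each permutation cycle once, from its smallest index. */
        size_t i = start;
        do {
            i = (i * r) % m;
        } while (i > start);
        if (i < start)
            continue;  /* this cycle was already rotated */

        /* Rotate the cycle: the element at i moves to (i * r) % m. */
        double carry = a[start];
        i = start;
        do {
            size_t dest = (i * r) % m;
            double tmp = a[dest];
            a[dest] = carry;
            carry = tmp;
            i = dest;
        } while (i != start);
    }
}
```

The repeated cycle walks keep memory use constant at the cost of extra index arithmetic; faster schemes typically trade some scratch space for better locality.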
* Some speed improvements in SIMD code.

* --without-cycle-counter option is removed. If no cycle counter is found,
  then the estimator is always used. A --with-slow-timer option is provided
  to force the use of lower-resolution timers.

* Several fixes for compilation under Visual C++, with help from Stefane Ruel.

* Added x86 cycle counter for Visual C++, with help from Morten Nissov.

* Added S390 cycle counter, courtesy of James Treacy.

* Added missing static keyword that prevented simultaneous linkage
  of different-precision versions; thanks to Rasmus Larsen for the bug report.

* Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.

* Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.

* Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
  preprocessor limits; thanks to Peter Vouras for the bug report.

* Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
  thanks to Nicolas Decoster for the patch.

* Added 'make smallcheck' target in tests/ directory, at the request of
Major goals of this release:

* Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).

* Complete rewrite, to make it easier to add new algorithms and transforms.

* New API, to support more general semantics.

* SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
  (With special thanks to Franz Franchetti for many experimental prototypes
  and to Stefan Kral for the vectorizing generator from fftwgel.)

* True in-place 1d transforms of large sizes (as well as compressed
  twiddle tables for additional memory/cache savings).

* More arbitrary placement of real & imaginary data, e.g. including
  interleaved (as in FFTW 2.x) as well as separate real/imag arrays.

* Efficient prime-size transforms of real data.

* Multidimensional transforms can operate on a subset of a larger matrix,
  and/or transform selected dimensions of a multidimensional array.

* By popular demand, simultaneous linking to double precision (fftw),
  single precision (fftwf), and long-double precision (fftwl) versions
  of FFTW is now supported.

* Cycle counters (on all modern CPUs) are exploited to speed planning.

* Efficient transforms of real even/odd arrays, a.k.a. discrete
  cosine/sine transforms (types I-IV). (These currently work via pre/post
  processing of real transforms, à la FFTPACK, so are not optimal.)

* DHTs (Discrete Hartley Transforms), again via post-processing
  of real transforms (and thus suboptimal, for now).

* Support for linking to just those parts of FFTW that you need,
  greatly reducing the size of statically linked programs when
  only a limited set of transform sizes/types are required.

* Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
  with a command-line tool (fftw-wisdom) to generate/update it.

* Fortran API can be used with both g77 and non-g77 compilers

* Multi-threaded version has optional OpenMP support.

* Authors' good looks have greatly improved with age.
Changes from 3.0beta3:

* Separate FMA distribution to better exploit fused multiply-add instructions
  on PowerPC (and possibly other) architectures.

* Performance improvements via some inlining tweaks.

* fftw_flops now returns double arguments, not int, to avoid overflows

* Workarounds for automake bugs.
Changes from 3.0beta2:

* The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
  FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
  we replaced it with a slower routine that is more accurate.

* The guru planner and execute functions now have two variants, one that
  takes complex arguments and one that takes separate real/imag pointers.

* Execute and planner routines now automatically align the stack on x86,
  in case the calling program is misaligned.

* README file for test program.

* Fixed bugs in the combination of SIMD with multi-threaded transforms.

* Eliminated internal fftw_threads_init function, which some people were
  calling accidentally instead of the fftw_init_threads API function.

* Check for -openmp flag (Intel C compiler) when --enable-openmp is used.

* Support AMD x86-64 SIMD and cycle counter.

* Support SSE2 intrinsics in forthcoming gcc 3.3.
Changes from 3.0beta1:

* Faster in-place 1d transforms of non-power-of-two sizes.

* SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT

* Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
  default distribution only includes hard-coded size-8 DCT-II/III, however.

* Many minor improvements to the manual. Added section on using the
  codelet generator to customize and enhance FFTW.

* The default 'make check' should now only take a few minutes; for more
  strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.

* fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
  the latter uses stdout.

* Fixed ability to compile with a C++ compiler.

* Fixed support for C99 complex type under glibc.

* Fixed problems with alloca under MinGW, AIX.

* Workaround for gcc/SPARC bug.

* Fixed multi-threaded initialization failure on IRIX due to lack of
  user-accessible PTHREAD_SCOPE_SYSTEM there.