Sat Nov 24 22:37:54 EST 2012 stevenj@fftw.org
* fixed deadlock bug caused by bogosity flag getting out of synch between processes; thanks to Michael Pippig for the bug report

M ./kernel/planner.c -3 +6

Wed Nov 21 18:34:29 EST 2012 athena@fftw.org

Wed Nov 21 18:33:15 EST 2012 athena@fftw.org
* use 2x2 AVX transposition instead of individual stores.

This seems to improve single-precision AVX on Sandy Bridge machines.

M ./simd-support/simd-avx.h -2 +14

Tue Nov 20 12:18:00 EST 2012 stevenj@fftw.org
* revert part of Taylor patch to acx_mpi.m4: do not link -lmpi if mpicc works without libraries, as -lmpi may be some completely different MPI implementation

M ./m4/acx_mpi.m4 -3 +3

Tue Nov 20 11:44:57 EST 2012 stevenj@fftw.org
* fix deadlock bug (thanks to Michael Pippig for the bug report and patch, and to Graham Dennis for the bug report) in which some processes called MPI_Alltoall and some called MPI_Alltoallv

M ./mpi/transpose-alltoall.c -3 +2

Mon Oct 29 15:20:01 EDT 2012 athena@fftw.org

M ./doc/tutorial.texi -2 +2

Mon Oct 29 09:16:43 EDT 2012 athena@fftw.org
* clarify that padding only applies to in-place transforms

M ./doc/tutorial.texi -5 +10
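
As a hedged illustration of the padding rule mentioned in the entry above: for a real-to-complex transform whose last dimension holds n reals, the output has n/2+1 complex values, so an in-place transform needs 2*(n/2+1) real slots per row, while an out-of-place transform needs only n. The helper names below are hypothetical and are not part of the FFTW API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of complex outputs along the last
 * dimension of an r2c transform of n real inputs. */
static size_t r2c_complex_len(size_t n)
{
    assert(n > 0);
    return n / 2 + 1;
}

/* Hypothetical helper: number of real slots to allocate per row of the
 * last dimension. In-place transforms share one buffer between input
 * and output, so the row is padded to hold n/2+1 complex values.
 * Out-of-place transforms need no padding. */
static size_t r2c_real_row_len(size_t n, int in_place)
{
    return in_place ? 2 * r2c_complex_len(n) : n;
}
```

For example, with n = 8 an in-place row takes 10 reals (2 of them padding), whereas an out-of-place input row takes exactly 8.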

Sun Oct 28 18:42:48 EDT 2012 athena@fftw.org
* make the index-computation logic less paranoid

The problem is that for each K and for each expression of the form P[I
+ STRIDE * K] in a loop, most compilers will try to lift an induction
variable PK := &P[I + STRIDE * K]. In large codelets we have many
such values of K. For example, a codelet of size 32 with 4 input
pointers will generate O(128) induction variables, which will likely
overflow the register set, which is likely worse than doing the index
computation in the first place.

In the past we (wisely and correctly) assumed that compilers will do
the wrong thing, and consequently we disabled the induction-variable
"optimization" altogether by setting STRIDE ^= ZERO, where ZERO is a
value guaranteed to be 0. Since the compiler does not know that
ZERO=0, it cannot perform its "optimization" and it is forced to

With this patch, FFTW is a little bit less paranoid. FFTW now
disables the induction-variable "optimization" only when we estimate
that the codelet uses more than ESTIMATED_AVAILABLE_INDEX_REGISTERS

Currently we set ESTIMATED_AVAILABLE_INDEX_REGISTERS=16. 16 registers ought
to be enough for anybody (or so the amd64 and ARM ISAs seem to imply).

M ./genfft/gen_hc2c.ml -1 +1
M ./genfft/gen_hc2cdft.ml -1 +1
M ./genfft/gen_hc2cdft_c.ml -1 +1
M ./genfft/gen_hc2hc.ml -1 +1
M ./genfft/gen_notw.ml -2 +2
M ./genfft/gen_notw_c.ml -2 +2
M ./genfft/gen_r2cb.ml -3 +3
M ./genfft/gen_r2cf.ml -3 +3
M ./genfft/gen_r2r.ml -2 +2
M ./genfft/gen_twiddle.ml -1 +1
M ./genfft/gen_twiddle_c.ml -1 +1
M ./genfft/gen_twidsq.ml -2 +2
M ./genfft/gen_twidsq_c.ml -2 +2
M ./genfft/genutil.ml -1 +2
M ./kernel/ifftw.h -3 +20
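
The idea in the entry above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in kernel/ifftw.h: the macro MAYBE_HIDE_STRIDE, the variable opaque_zero, and the function strided_sum4 are hypothetical names, and the volatile variable is just one way to obtain a zero the compiler cannot see through. Only ESTIMATED_AVAILABLE_INDEX_REGISTERS and its value 16 come from the entry.

```c
#include <assert.h>
#include <stddef.h>

#define ESTIMATED_AVAILABLE_INDEX_REGISTERS 16

/* Hypothetical stand-in for ZERO: always 0 at run time, but volatile,
 * so the compiler cannot assume its value. */
static volatile ptrdiff_t opaque_zero = 0;

/* Hypothetical macro: XOR the stride with the opaque zero only when the
 * estimated number of induction variables exceeds the register budget.
 * The XOR leaves the value unchanged but makes the stride look
 * loop-variant, which discourages lifting &p[i + stride * k]. */
#define MAYBE_HIDE_STRIDE(est_index_vars, stride)                      \
    do {                                                               \
        if ((est_index_vars) > ESTIMATED_AVAILABLE_INDEX_REGISTERS)    \
            (stride) ^= opaque_zero;                                   \
    } while (0)

/* Toy "codelet": sums p[i + stride * k] for k = 0..3 and i = 0..n-1.
 * est_index_vars is the caller's estimate of how many induction
 * variables the loop would need (an assumption of this sketch). */
static double strided_sum4(const double *p, ptrdiff_t stride,
                           ptrdiff_t n, int est_index_vars)
{
    double acc = 0.0;
    assert(p != NULL);
    for (ptrdiff_t i = 0; i < n; ++i) {
        MAYBE_HIDE_STRIDE(est_index_vars, stride);
        acc += p[i + stride * 0] + p[i + stride * 1]
             + p[i + stride * 2] + p[i + stride * 3];
    }
    return acc;
}
```

Both settings compute the same result; the difference is only in how much freedom the compiler has to keep per-K pointers live in registers.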

Sun Oct 28 18:33:24 EDT 2012 athena@fftw.org

M ./kernel/buffered.c +1

Sat Oct 27 09:58:49 EDT 2012 athena@fftw.org
* bump version to 3.3.3

M ./configure.ac -1 +1

Sat Oct 27 09:55:15 EDT 2012 athena@fftw.org
* evaluate plans for >1ms when using gettimeofday()

The previous limit 10ms was too paranoid, and it made life difficult
on machines without an "official" cycle counter, such as ARM.

M ./kernel/timer.c -1 +1
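
A minimal sketch of what a minimum measurement time means with gettimeofday(), assuming a simple doubling strategy: the workload is repeated until the elapsed wall-clock time reaches the threshold, so that the coarse clock resolution does not dominate the result. This is not the code in kernel/timer.c; TIME_MIN_SECONDS, now_seconds, time_per_call, and busy_work are hypothetical names, and only the 1 ms figure comes from the entry above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical name; 1 ms is the new threshold named in the entry. */
#define TIME_MIN_SECONDS 1.0e-3

/* Wall-clock time in seconds from gettimeofday(). */
static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + 1.0e-6 * (double)tv.tv_usec;
}

/* Repeat fn(arg), doubling the repetition count until one batch takes
 * at least TIME_MIN_SECONDS, then return the average time per call. */
static double time_per_call(void (*fn)(void *), void *arg)
{
    assert(fn != NULL);
    for (unsigned long iters = 1;; iters *= 2) {
        double t0 = now_seconds();
        for (unsigned long i = 0; i < iters; ++i)
            fn(arg);
        double elapsed = now_seconds() - t0;
        /* Stop once the batch is long enough, or before iters could
         * overflow on an extremely fast workload. */
        if (elapsed >= TIME_MIN_SECONDS || iters >= (1UL << 30))
            return elapsed / (double)iters;
    }
}

/* Sample workload: bump a counter through a volatile pointer so the
 * call is not optimized away. */
static void busy_work(void *arg)
{
    volatile unsigned long *counter = (volatile unsigned long *)arg;
    *counter += 1;
}
```

A lower threshold shortens each measurement, which matters most on machines where gettimeofday() is the only available timer.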

Sat Oct 27 09:46:04 EDT 2012 athena@fftw.org
* use 4-way NEON SIMD instead of 2-way

Kai-Uwe Bloem tried to warn me a year ago that 128-bit NEON was better
than 64-bit NEON even on machines with a 64-bit pipe, but I foolishly
did not listen. Now that 128-bit NEON pipes are starting to appear on
the market it is definitely time to switch.

M ./simd-support/simd-neon.h -55 +100

Wed Sep 26 14:21:12 EDT 2012 athena@fftw.org
* Note that fftw-3.3 includes MPI support

M ./doc/intro.texi -5 +4

Wed Jul 18 11:25:40 EDT 2012 athena@fftw.org
* remove obsolete unused function

M ./dft/bluestein.c -14

Fri Jun 29 15:57:14 EDT 2012 stevenj@fftw.org
* whoops, call omp_get_max_threads; thanks to Hanno Rein for the bug report

M ./doc/threads.texi -1 +1

Sat Apr 28 10:55:09 EDT 2012 athena@fftw.org
* Fix libfftw3/libfftw3_threads chicken-egg problem