ATLAS 3.6.0, changes from 3.4.2:
ATLAS 3.8.3 released 02/18/09, changes from 3.8.2:
* New architectures, arch-specific kernels, and configure support:
- Changed Core2Duo arch to simply Core2
+ New kernels backported from 3.9.x for Core2 substantially increase performance
+ New arch defs for all Core2 archs
- Added recognition of Corei7 architecture
+ Added arch defs which presently use new Core2-tuned kernels
- Added new K10h kernel
- Updated 64-bit arch defs to use it
- Fixed archinfo_x86 to always use extended family type (new usage)
- Fixed mass typo for archinfo_freebsd
- Fixed flag quotes for Make.mvtune
- Fixed error causing GEMM to get wrong answer if K=3 on MIPS arch
- Added additional workarounds for gcc's solaris_x86 division bug
- Fixed bug in TRSM tuning
ATLAS 3.8.2 released 06/06/08, Changes from 3.8.1:
- Pervasive performance bug in GEMM, affecting all architectures
- Occasional access of C when BETA=0
* Configure improvements:
- Improved freebsd architecture probe
- Improved linux cpu throttling probe
- Added Itanium 2 detection as "McKinley" in archinfo_linux
* Added cleanup for 4x1 case for 64-bit sgemm (almost doubles LAPACK 'Upper'
Cholesky performance -- ATLAS Cholesky is fast without this fix)
* Added mu=4 SSE M cleanup for extra performance
ATLAS 3.8.1 released 02/22/08, Changes from 3.8.0:
* Fixed bug in slvtst that counted complex flops the same as real
* Fixed bug causing wrong answer for row-major gemm C=A*A' or A'A
* Fixed bug in configure causing Pentium-M to be IDed as CoreDuo
* Fixed bug in tfc.c causing memory overwrite when too many samples taken
* Improved L1 BLAS timers so they work like the rest of the package, and
thus don't die all the time on tolerance failures
* Improved ATLAS/tune/blas/gemm/mmsearch.c:
- For x86, tried more registers, since a smart compiler can reduce A & B
regs to 2 (and possibly even 1)
- Made it so search tries both load-C-at-top and load-C-at-bottom of
M loop. Bottom is superior for error, and ATLAS originally defaulted to top
* Added configure support for new K10h platform from AMD, as well as
basic architectural defaults (no new kernels, just good search)
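
The load-C-at-top versus load-C-at-bottom choice in the 3.8.1 mmsearch item (and the load-at-bottom arch defaults in 3.7.25) can be illustrated with a scalar sketch for a single element of C. The function names are hypothetical, and real ATLAS kernels are generated, register-blocked and unrolled, so this only shows the order of the additions. One plausible reason the bottom form has lower error is that the K products are summed in a fresh accumulator and C is added only once, so small products are not each rounded against a possibly much larger C.

```c
#include <stddef.h>

/* Hypothetical sketch, load-C-at-top: C is loaded first and every
 * product a[i]*b[i] is added directly into it. */
double dot_update_top(double c, const double *a, const double *b, size_t k)
{
    for (size_t i = 0; i < k; i++)
        c += a[i] * b[i];
    return c;
}

/* Hypothetical sketch, load-C-at-bottom: the products are accumulated
 * from zero, and C is loaded and added once, after the K loop. */
double dot_update_bottom(double c, const double *a, const double *b, size_t k)
{
    double acc = 0.0;
    for (size_t i = 0; i < k; i++)
        acc += a[i] * b[i];
    return c + acc;
}
```

For example, with c = 2^53 and four unit products, dot_update_bottom returns exactly 2^53 + 4, while a build that evaluates in plain double precision (e.g. SSE2) can return 2^53 from dot_update_top, because each +1 is rounded away.
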
ATLAS 3.8.0 released 10/10/07, changes from 3.6.0:
* Improved installation support: now works with 5-step standard install:
- configure, build, check, time, install
- Support for easily building 32 or 64 bit libraries
- Support for building dynamic (shared) libraries
- Can build in any directory
* Added detailed installation guide (ATLAS/doc/atlas_install.pdf),
indicating how to build ATLAS, as well as describing how you can
ensure that the produced libraries get adequate performance as well
as the correct answers.
* Improved GEMM performance on most platforms:
- HAMMER (Opteron/Athlon-64), P4, P4E, Core2Duo, CoreDuo, MIPS,
G5/PowerPC970, POWER4, POWER5, etc.
- Better handling of long-thin matrices (K >> M,N) and rank-K, K<=4 shapes
- Improved complex performance on some platforms
- Further reduced error on some platforms
+ ATLAS error bound always <= reference BLAS before reduction
- OSX/x86, Solaris/x86, Linux/MIPS, modern Windows,
* Many other changes; see the developer ChangeLog below for further details
ATLAS 3.8.0 released 10/10/07, changes from 3.7.40:
* Updated some documentation
ATLAS 3.7.40 released 10/10/07, changes from 3.7.39:
* Fixed configure, where lack of \n after GOODGCC caused errors on Itanium
* Increased MAXALLOC in tfc.c to allow larger malloc in CacheEdge detection
* Replaced nonportable == with -eq (int) or = (str) in test lines of
* Rewrote config's handling of 32/64 compiler flags to be more robust
to get around error found when trying to install 32bit SunOS libs
* Added USIII architectural defaults and config support
* Updated atlas_devel and atlas_contrib
ATLAS 3.7.39 released 10/07/07, changes from 3.7.38:
* Updated configure to handle AIX 64-bit flags automatically
* Expanded and corrected PowerPC ABI section in atlas_contrib
* Fixed PowerPC assembly kernels to work under AIX for 64 & 32 bit ABIs
ATLAS 3.7.38 released 10/05/07, changes from 3.7.37:
* Added new install guide, ATLAS/doc/atlas_install.pdf
* Added F77 testing wrappers for POSV and GESV, so slvtst can test F77 iface
* Expanded configure support for AIX, but build still dies
* Configure support and flags for G4
* Added arch defaults for:
- G4 using apple's hacked gcc 3.1
ATLAS 3.7.37 released 08/10/07, changes from 3.7.36:
* Fixed error in gemm, so we call SYRK for A*A^T only when beta=0
ATLAS 3.7.36 released 08/09/07, changes from 3.7.35:
* Some smoothing ops allowing easier use of windows compilers
* Fixed error in mmsearch causing PPC searches to die with latency problems
* Fixed error where wrong flags caused snrm2 to be incorrect on Core2Duo
* Changed GER to heavily favor applying alpha to X, in order to keep LAPACK
from barfing up a lung on those tiny matrix test cases
* Fixed error in complex syreflect causing wrong answers in [c,z]gemm when
gemm is used to do a syrk
ATLAS 3.7.35 released 07/26/07, changes from 3.7.34:
* Changed it so pthread calls assert zero return value (debugging aid)
* Improved threaded GEMM performance for cases where two dim < NB
* Increased default MaxMalloc to 64MB
* Improved Windows support:
- Added support for building Windows ATLAS with Intel's ifort
- Added support for building on Windows without the cygwin library
- Added ability to get cycle accurate timer when using Windows compiler
* Improved POWER4 & P4SSE2 arch defaults.
* Removed duplicate symbols in Make.mmsrc messing up shared library building
ATLAS 3.7.34 released 06/25/07, changes from 3.7.33:
* Fixed error causing read of C for beta=0 in ATL_mmJITcp
* Added 64 bit single precision Core2Duo kernel, added to arch defs
* Added gcc4.2/P432SSE2 arch defs
* Changed all Makefiles so ICC compiles only interface routines, and
[S,D]KC compiles the bulk of the non-kernel library
* Added support for POWER4/Linux, including 64 & 32 bit arch defs using gcc
- No xlc support or single precision assembly yet
* Install using gnu compilers now works under Windows
* Now works correctly for Linux/POWER5/gcc
ATLAS 3.7.33 released 05/01/07, changes from 3.7.32:
* Made it so ATLAS builds on Solaris x86:
- Had to remove all constant divides in integer expressions in assembly,
as Sun geniuses decided to change comment character to '/'
+ \/ is supposed to work, but doesn't
- Had to touch every x86 assembly file to change assembly comments to /**/
ATLAS 3.7.32 released 04/27/07, changes from 3.7.31:
* Adapted MIPS double prec kernel to single
* Added 32-bit support (n32) to MIPS (assembly & config)
* Ported UltraSPARC assembly kernels used by arch defs to v9 ABI
* Added arch defs to build 64 bit (v9) ABI for Solaris/UltraSPARC
* Documented these new interfaces in atlas_contrib.
ATLAS 3.7.31 released 04/17/07, changes from 3.7.30:
* Fixed bug in atlas_prefetch found by David Cournapeau.
* Added MIPSICE9 prefetch option, d/zgemm assembly kernels and arch defaults.
- These should work on most MIPS platforms
- Assembly kernels work under IRIX, but no way to get cc to do prefetch
+ could not make cc's pragma work with ATLAS's atlas_prefetch defs
* Added support for OSX/PowerPC970:
- Double precision assembly kernel getting 82.5% of peak (4*MHz)
- Single precision assembly kernel getting 79% of peak (8*MHz)
- Arch defaults for 64 & 32 bit installs
- Config support for random-ass apple flag extravaganza
ATLAS 3.7.30 released 03/25/07, changes from 3.7.29:
- Fixed error in building --nof77 dynamic libs
- Fixed dynamic lib link for f77 interface libs
- Updated L1 kernel testers in tune/ for function routs to call the test
func first (so correct answer not on stack), and to check for NaN
- Fixed it so error report genned again.
- Fixed error causing real JITcp to copy all the time, and then fixed
error in func ptr when this was selected.
* Wrote special Just In Time Copy (JITcp) gemm for complex that copies A&B
a block at a time, and calls the real kernel for complex matmul
- Speeds up large-case z/cgemm on some platforms (5-10%)
- Speeds up long-K case for some platforms (as much as doubling perf)
* Fixed miscalculation of CacheEdge, where I stopped using it for large K.
This fix reduces memory usage, and speeds up asymptotic case a bit.
ATLAS 3.7.29 released 02/28/07, changes from 3.7.28:
* Wrote special routines (mmBPP and mmMNK) for handling small M, N and
large K case. For M = N <= NB, this can double performance. Presently works
for real precisions (s,d) only.
* Translated x87 Athlon-64 kernel to 32-bit assembly.
* Put in special code to handle SYRK call to GEMM by calling SYRK and
reflecting the triangular matrix. Doubles speed, and avoids fp error
* Added arch defaults for Core2Duo32SSE3
* Fixed some problems with -b 32 in configure and building dynamic libs
* Fixed ATLAS/bin Makefile to correctly link x?l1blastst_dyn
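
The SYRK-then-reflect idea from the 3.7.29 entry can be sketched for the beta=0 case (compare the 3.7.37 fix, which restricts the SYRK path to beta=0). This is a hypothetical naive loop nest, not ATLAS's code: the name syrk_then_reflect is made up, both matrices are assumed column-major, and only the lower triangle is computed before being mirrored into the upper one. Computing each off-diagonal entry once and copying it roughly halves the flops, and as a side effect leaves C exactly symmetric.

```c
#include <stddef.h>

/* Hypothetical sketch: C = A * A^T (beta = 0) for column-major A
 * (n x k, leading dimension lda) and C (n x n, leading dimension ldc). */
void syrk_then_reflect(size_t n, size_t k, const double *A, size_t lda,
                       double *C, size_t ldc)
{
    /* SYRK step: lower triangle only, C(i,j) = sum_l A(i,l) * A(j,l), i >= j. */
    for (size_t j = 0; j < n; j++) {
        for (size_t i = j; i < n; i++) {
            double acc = 0.0;
            for (size_t l = 0; l < k; l++)
                acc += A[i + l * lda] * A[j + l * lda];
            C[i + j * ldc] = acc;
        }
    }
    /* Reflect step: copy C(i,j) into C(j,i) for every i > j. */
    for (size_t j = 0; j < n; j++)
        for (size_t i = j + 1; i < n; i++)
            C[j + i * ldc] = C[i + j * ldc];
}
```
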
ATLAS 3.7.28 released 02/11/07, changes from 3.7.27:
* Bugfix release of 3.7.27 addressing configure/compiler behavior:
- Fixed possible infinite loop in probing for f77libs
- Made gnu arch defaults work for gnu compilers regardless of compiler name
ATLAS 3.7.27 released 02/10/07, changes from 3.7.26:
* Support for building ATLAS to .so! See INSTALL.txt for details.
* Expanded support for appending compiler flags:
- Can specify flags to be appended to gcc in user-contributed index files
- Can append flags to only C compilers
- Can append flags to only C+usergcc, all+usergcc, etc.
* Configure now recognizes gnu compilers as gnu compiler regardless of
compiler name when looking for default flags for user-override compilers
ATLAS 3.7.26 released 01/30/07, changes from 3.7.25:
* Added line to all assembly files to declare them as not requiring an
executable stack for Linux (apparently, its absence broke SELinux).
* Numerous assembly fixes, particularly forced use of .text and asmdecor
in all x86 assembly files.
* Fixed dnrm2's to call sqrtl to avoid gcc round-down.
ATLAS 3.7.25 released 01/22/07, changes from 3.7.24:
* Added x87 nrm2 assembly kernels to avoid gcc probs; changed old
gcc-compiled nrm21 kernels to use double native precision for
accumulator (breaks dnrm2 due to gcc's spurious round-down).
* Changed Athlon64 and Core2Duo arch defaults to use load-at-bottom gemm
kernels, which should reduce GEMM error
* Changed configure to error out if run in the ATLAS source directory.
* Changed all ATLAS/doc postscript files to .pdf
ATLAS 3.7.24 released 12/18/06, changes from 3.7.23:
* Fixed alignment problem in x87 hammer kernel causing large performance
losses for AMD64 machines.
ATLAS 3.7.23 released 12/07/06, changes from 3.7.22:
* Fixed bug in Makefile causing repeated path
* Added basic config support for IRIX
* Added basic arch defaults for MIPS R1[2,4,6]K using MIPSpro cc
* Several small bug/compatibility fixes found by MIPS/cc install
* Modified handling of MAFLAGS to prevent compiler hang for gcc3/Itanium
ATLAS 3.7.22 released 11/26/06, changes from 3.7.21:
* Fixed bug in mmsearch's ProbeFPU that gave advantage to muladd=0, not =1.
* Added support for Itanium to config
- Added extra lines with gcc 4's best flags to ?cases.flg
- gcc 3 still produces best code by a slight margin
- Found arch defaults that do well for both gcc 3 & 4
* Fixed complex C = A A' bug:
https://sourceforge.net/tracker/index.php?func=detail&aid=1598272& \
group_id=23725&atid=379482
ATLAS 3.7.21 released 11/18/06, changes from 3.7.20:
* Made gemm call axpy-based GEMM when K < 4 && M >= 40 and
no-copy code would be used -- should help bottom of LU recursion perf
* Changed it so all F2C probes linked by Fortran do all I/O in Fortran,
instead of printing from C (some platforms seem to have problems
redirecting C I/O from a Fortran-linked program).
* Added config support for solaris install
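
A minimal sketch of an axpy-based GEMM like the one the 3.7.21 entry switches to for K < 4 && M >= 40: C += A*B is written as K AXPY updates per column of C, each adding B(l,j) times column l of A to column j of C. The names gemm_by_axpy and daxpy_sketch are hypothetical; the sketch assumes column-major storage, no transposes, and alpha = beta = 1.

```c
#include <stddef.h>

/* Hypothetical AXPY: y += alpha * x over n contiguous elements. */
static void daxpy_sketch(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Hypothetical sketch: C += A * B, A is M x K, B is K x N, C is M x N,
 * all column-major with leading dimensions lda, ldb, ldc. */
void gemm_by_axpy(size_t M, size_t N, size_t K,
                  const double *A, size_t lda,
                  const double *B, size_t ldb,
                  double *C, size_t ldc)
{
    for (size_t j = 0; j < N; j++)          /* each column of C */
        for (size_t l = 0; l < K; l++)      /* one AXPY per column of A */
            daxpy_sketch(M, B[l + j * ldb], A + l * lda, C + j * ldc);
}
```

Each column of C is swept only K times, and every AXPY runs over M elements, which is presumably why the switch also requires a reasonably large M.
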
ATLAS 3.7.20 released 11/11/06, changes from 3.7.19:
* Added ability to use Cij = instead of Cij += on first iteration of loop
- Max K unrolling where this is done is set by cpp macro MAX_CASG_KU
to avoid code bloat (always works for full unroll)
- For muladd=1, doesn't work if K is unknown at compile-time
- Speeds up load-at-bottom and beta=0 code
* Added ability to prefetch C when prefA selected and doing load-at-bottom
or beta=0. Gives nice speedup on HammerX2, need to test other machines
* Added -falign-loops=4 to x87-using flags
- big speedup on Hammer, need to test on Intel
* Several bug fixes to allow config/install to work on OSX/Core2Duo:
- Fixed userindex so that it substitutes $(GOODGCC) for gcc in .SSE & .3DN
files as well as in .flg
- Made user override of 64 bits switch the probed assembly if it was
- Fixed freebsd archinfo syntax error (typo in code that fixed overflow).
- Fixed typo in iamax_SSE.c
- Replaced binary constant with hex in Core2Duo gemm kernel
- For portability, rewrote saxpy_sse.c to avoid indirect jumps
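
The first item of 3.7.20 (Cij = on the first iteration instead of Cij +=) can be pictured with a scalar sketch of one element of C in the beta=0 case. The function name is hypothetical, and the real change lives in ATLAS's GEMM code generator, where it applies only up to the K unrolling set by MAX_CASG_KU; the point shown here is just that the first product initializes the accumulator, so no separate zeroing or load of C is needed.

```c
#include <stddef.h>

/* Hypothetical sketch: Cij = sum_k a[k] * b[k] for beta = 0, with the
 * first iteration written as an assignment. Requires K >= 1. */
double cij_assign_first(const double *a, const double *b, size_t K)
{
    double cij = a[0] * b[0];        /* first iteration: Cij =  */
    for (size_t k = 1; k < K; k++)
        cij += a[k] * b[k];          /* remaining iterations: Cij += */
    return cij;
}
```
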
ATLAS 3.7.19 released 10/14/06, changes from 3.7.18:
* Fixed config so it defines [S,D]MAFLAGS, and changed muladd probe
* Fixed a couple more assembly files to work with OS X
* User can now successfully override 32/64 bit choice on the configure
line using -b [32,64].
- Made config append -m32/-m64 to gnu compiler collection when ptrbits
is overridden by the user on the configure line
- Fixed error in userflag.c
- Fixed lack of ' ' around C compiler names in GEMM files
- After probes finished in config, made 32-bit override change detected
asmb to 32 if it was presently 64
ATLAS 3.7.18 released 10/12/06, changes from 3.7.17:
* Bugfix release only:
- Fixed configure so that multiple compiler flags can be passed to config.
- Adapted x86 assembly kernels in Level 1 & src directories so that they
will also run under OS X
- Added needed #define to ATLAS/src/invtst.c
- Added fix to disambiguate int & long in f77/C interface
ATLAS 3.7.17 released 09/09/06, changes from 3.7.16:
* Added ability to generate non-diagonally dominant positive definite
matrices to Cholesky-based testers if POSDEFGEN is defined
* Added new Core2Duo kernel (probably also good for P4E64).
* New Core2Duo arch defaults.
ATLAS 3.7.16 released 08/30/06, changes from 3.7.15:
* Added flag --with-netlib-lapack to configure
* Added src/testing f77 wrapper for QR
- Still must write LU wrapper and test LLt
* Added crude ability to call QR from slvtst
* Added config support for Core2Duo and Core2Solo
* Added architectural defaults for Core2Duo64SSE3
- Hand-tuned cases not yet optimized; presently using P4-tuned kernels
* Made "make install" allow copy of fortran interface to fail w/o dying
(for users w/o fortran compiler)
ATLAS 3.7.15 released 08/22/06, changes from 3.7.14:
* New x87 kernel that achieves over 90% of peak for double precision
Opteron/Athlon-64. Gemm runs at roughly the same speed as the old SSE kernel,
but LU and Cholesky actually get a speedup. The fp stack usage
of this kernel was suggested by the new gcc.
- New arch defaults for HAMMER64SSE[2,3]
* Modified ILEANV so small problems aren't told to use the full ATLAS NB.
* Fixed error in mmsearch.c that often caused complex performance to be
* Fixes/updates to ATLAS config system:
- Added support for DESTDIR system on install target as in gnu
- Made config kill any genned core and object files after run
- Made "make build" delete all config executables
- Added --nof77 to configure
- Added "make check" as sanity test instead of "make test"
+ If --nof77 has been thrown, "make check" only calls C interface testers
- Added probe for 3DNow, merged 3DNow 1 & 2.
ATLAS 3.7.14 released 08/17/06, changes from 3.7.13:
* Fixes/updates to ATLAS config system:
- Improved cpu throttling probe
- Added compiler test so only compilers that work are chosen from defaults
- Added simple C interoperation test
- Fixed frontend/backend tmpnam collision prob (config[1,0].tmp)
- Re-enabled parallel make support
- Fixed buildinfo support
- Added clock speed probe to config
- Enabled "make time" to produce performance summary!
- Added "make check" as alias to "make test" to make more like gnu
-- Alias not working, need to check!
- Fixed error in -Si nof77 1, which caused config to die w/o f77 compiler
* Added new arch defaults for P4E[32,64]SSE3 and HAMMER64SSE3, which get
better performance for gcc 4.2 (perf should still be OK for gcc 3).
ATLAS 3.7.13 released 07/26/06, changes from 3.7.12:
* Mainly, fixes/updates to ATLAS config:
- Added cpu throttling test to linux, and enabled it
- Added "make install" to copy libs and includes
- Fixed basic "make error_report"
- Added 32/64 bit distinguishing in x86 arch def
- Added "-Si nof77 1" to enable easier build with no f77 compiler
- Added "--help" handling to configure
- Added "-Si archdef 0" to suppress use of architectural defaults
- Added "-Si cputhrchk 0" to suppress CPU throttling error exit
ATLAS 3.7.12 released 07/19/06, changes from 3.7.11:
* Completely rewrote configure handling to make ATLAS act more like
- You now build ATLAS in an arbitrary build directory
+ /path/to/ATLAS/configure ; make build ; make test
- Read ATLAS/INSTALL.txt for directions, everything is changed!
- Presently, the only supported OSes are Linux and FreeBSD (OSX).
Will be adding more in subsequent developer releases.
* Added support for prefetch in generator, mmsearch.c, fc.c, etc.
* Improved broken GetUserNB in ummsearch.c, which prevented good user cases
from being found on many systems
* mmsearch.c improvements:
- Added prefetch searching
- Updated FindMUNU to suggest 1-D vals on x86 boxes (2-op assembler).
- Made sure GetNO1D always returns false for x86 boxes (2-op assembler)
- Added special case for large number of registers (e.g. Itanium) to
speed up munu search (searches near-square only)
+ Untested, and likely needs fixing
- Several small error-handling issues
* Improved masearch.c & L1CacheSize.c to make loop-removal by compiler
ATLAS 3.7.11 released 07/21/05, changes from 3.7.10:
* This is a bugfix release:
- Fixed doc path errors caught by Kate Minola
- Fixed f77getrf/getri FunkyInts declaration
- Fixed Level 1 ref stX/StX typo in ATL_[dz,sc]refnrm2 caught
- Fixed assembly typo in ATL_dmm6x1x72_sse2 caught by Simon Perreault
- Added Dean's x86 assembly probe as backup for uname x8664 probe,
as Kate Minola reports uname probe doesn't work under solaris/x8664
ATLAS 3.7.10 released 04/24/05, changes from 3.7.9:
* Updated config.c to use Dean Gaudet's contributed CPUID probe to get
relatively OS-independent x86 arch info.
* Fixed problem where AltiVec made config think it was not using arch def flags.
* Added support for EM64T:
- Updated config to search for x86_64 independent of hammer arch
- Updated P4E assembly kernels to run under x86_64
- Updated hammer kernels to not use 3DNow inst if compiled on Intel
+ cpp macro ATL_Has3DNow is now defined on sys possessing 3DNow!,
even if SSE is the selected SIMD paradigm
- Generated P4E64 arch defaults
* Added support for 64 bit ABI PowerPC Linux:
- Updated config to search for 64 bit PPC
- New macro ATL_USE64BITS set for all 64 bit ABI
- Updated G4 assembler kernel to handle 64 and 32 bit Linux ABIs
- Updated G5 assembler kernel to handle 64 and 32 bit Linux ABIs
ATLAS 3.7.9 released 04/22/05, changes from 3.7.8:
* In order to get icc to auto-vectorize, changed all ref L1 for loops:
for (i=0; i != N; i++) ---> for (i=0; i < N; i++)
also changed code generator (only if ATL_SSE1 defined):
for (k=N; k; k--) ---> for (k=0; k < N; k++)
* icc arch defaults for P4e (using autovectorization)
* Fixed errors in FA_malloc
* Changed mmsearch to use median of CPU times and min of WALL (no more tol)
* Updated config to recognize the G5 (PPC970FX) and handle apple gcc
* Updated AltiVec kernel to use line fetch for G5
* Added G5-specific DGEMM assembly kernel
* Arch defaults for G5
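
The timing rule from the 3.7.9 mmsearch item (median of CPU times, minimum of wall times, no tolerance) can be sketched as two small helpers. The function names are hypothetical, and averaging the two middle samples for an even count is an assumption, since the entry does not say how that case is handled.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles (ascending). */
static int cmp_double(const void *p, const void *q)
{
    double a = *(const double *)p, b = *(const double *)q;
    return (a > b) - (a < b);
}

/* Hypothetical: median of n > 0 CPU-time samples. Sorts a private copy,
 * so the caller's array is left untouched. Returns -1.0 on malloc failure. */
double median_time(const double *t, size_t n)
{
    double *s = malloc(n * sizeof *s);
    if (!s)
        return -1.0;
    memcpy(s, t, n * sizeof *s);
    qsort(s, n, sizeof *s, cmp_double);
    double m = (n % 2) ? s[n / 2] : 0.5 * (s[n / 2 - 1] + s[n / 2]);
    free(s);
    return m;
}

/* Hypothetical: minimum of n > 0 wall-time samples. */
double min_time(const double *t, size_t n)
{
    double m = t[0];
    for (size_t i = 1; i < n; i++)
        if (t[i] < m)
            m = t[i];
    return m;
}
```

Both statistics ignore a single slow outlier (for example one sample inflated by an interrupt), which a mean would not.
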
378
ATLAS 3.7.8 released 07/24/04, changes from 3.7.7:
379
* Better [d,z]GEMM kernel for Transmeta Efficeon
380
ATLAS 3.7.7 released 07/17/04, changes from 3.7.6:
381
* Better [d,z]GEMM kernel for Transmeta Efficeon
382
ATLAS 3.7.6 released 07/16/04, changes from 3.7.5:
383
* Arch defaults & config support for Transmeta Efficeon.
384
* New single prec SSE kernel, added to P4E arch defaults.
385
ATLAS 3.7.5 released 06/27/04, changes from 3.7.4:
386
* Added PA-RISC 2.0 config support, arch defaults, & assembly kernels
ATLAS 3.7.4 released 06/12/04, changes from 3.7.3:
* Modified L1 testers so they all take the same flags
* Modified L1 timers so they all take the same flags (not the same as testers)
* Modified L1 & L2 testers & timers so they all take force-alignment flags:
-Fa 16 -Fx -32 : force 16-byte align for A, misalign X to 32 bytes
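
The force-alignment flags in the 3.7.4 entry can be approximated by a helper that carves a suitably (mis)aligned pointer out of a larger malloc'd buffer. This sketch assumes that a positive value N means "N-byte aligned" and a negative value -N means "aligned for double but deliberately not N-byte aligned", which is how the -Fx -32 example reads; align_ptr is a hypothetical name, not an ATLAS routine.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch. align > 0: return an align-byte aligned pointer.
 * align < 0: return a pointer aligned for double but NOT |align|-byte aligned.
 * Assumes |align| is a power of two (and > sizeof(double) when negative),
 * and that raw has at least 2*|align| bytes of slack beyond what is used. */
double *align_ptr(void *raw, long align)
{
    uintptr_t a = (uintptr_t)(align < 0 ? -align : align);
    /* Round raw up to the next multiple of a. */
    uintptr_t p = ((uintptr_t)raw + a - 1) & ~(a - 1);
    if (align < 0)
        p += sizeof(double);  /* step off the a-byte boundary */
    return (double *)p;
}
```
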
ATLAS 3.7.3 released 03/20/04, changes from 3.7.2:
* Added P4E (prescott) support
* Changed config to distinguish between P4 implementations based on model
number; presently knows about P4 (models 0-2) and prescott (model 3)
* Added SSE3 to ISA probe
* Updated s/d P4 kernels (not cleanup yet) to work with SSE3, and smaller
block sizes that prescott likes
* Added architectural defaults for P4E (prescott)
ATLAS 3.7.2 released 02/29/04, changes from 3.7.1:
* Added empirical tuning of TRSM_NB parameter
ATLAS 3.7.1 released 02/21/04, changes from 3.7.0:
* Increased 32-bit hammer single precision gemm to 64 bit speed
ATLAS 3.7.0 released 02/14/04 (I love optimization), changes from 3.6.0:
* Increased 32-bit hammer double precision gemm to 64 bit speed
ATLAS 3.6.0 released 12/22/03, changes from 3.4.2:
* Gemm speedups for most architectures
- Hammer (Opteron/Athlon-64)