1
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
5
GMP Development Projects
12
GMP Development Projects
17
Copyright 2000, 2001 Free Software Foundation, Inc. <br><br>
18
This file is part of the GNU MP Library. <br><br>
19
The GNU MP Library is free software; you can redistribute it and/or modify
20
it under the terms of the GNU Lesser General Public License as published
21
by the Free Software Foundation; either version 2.1 of the License, or (at
22
your option) any later version. <br><br>
23
The GNU MP Library is distributed in the hope that it will be useful, but
24
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
25
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
26
License for more details. <br><br>
27
You should have received a copy of the GNU Lesser General Public License
28
along with the GNU MP Library; see the file COPYING.LIB. If not, write to
29
the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
34
<!-- NB. timestamp updated automatically by emacs -->
36
This file current as of 7 Dec 2001. An up-to-date version is available at
37
<a href="http://www.swox.com/gmp/projects.html">http://www.swox.com/gmp/projects.html</a>.
38
Please send comments about this page to
39
<a href="mailto:bug-gmp@gnu.org">bug-gmp@gnu.org</a>.
42
<p> This file lists projects suitable for volunteers. Please see the
43
<a href="tasks.html">tasks file</a> for smaller tasks.
45
<p> If you want to work on any of the projects below, please let tege@swox.com
46
know. If you want to help with a project that already somebody else is
47
working on, please talk to that person too, tege@swox.com can put you in
48
touch. (There are no email addresses of volunteers below, due to spamming
52
<li> <strong>Faster multiplication</strong>
54
<p> The current multiplication code uses Karatsuba, 3-way Toom-Cook,
55
or Fermat FFT. Several new developments are desirable:
58
<li> Handle multiplication of operands with different digit count
59
better than today. We now split the operands in a very
60
inefficient way, see mpn/generic/mul.c.
62
<li> Consider N-way Toom-Cook. See Knuth's Seminumerical
63
Algorithms for details on the method. Code implementing it
64
exists. This is asymptotically inferior to FFTs, but is finer
65
grained. A toom-4 might fit in between toom-3 and the FFTs
68
<li> It's possible CPU dependent effects like cache locality will
69
have a greater impact on speed than algorithmic improvements.
71
<li> Add support for partial products, either a given number of low limbs
72
or high limbs of the result. A high partial product can be used by
73
<code>mpf_mul</code> now, a low half partial product might be of use
74
in a future sub-quadratic REDC. On small sizes a partial product
75
will be faster simply through fewer cross-products, similar to the
76
way squaring is faster. But work by Thom Mulders shows that for
77
Karatsuba and higher order algorithms the advantage is progressively
78
lost, so for large sizes partial products turn out to be no faster.
82
<p> Another possibility would be an optimized cube. In the basecase that
83
should definitely be able to save cross-products in a similar fashion to
84
squaring, but some investigation might be needed for how best to adapt
85
the higher-order algorithms. Not sure whether cubing or further small
86
powers have any particularly important uses though.
88
<p> <li> <strong>Assembly routines</strong>
90
<p> Write new and improve existing assembly routines. The tests/devel
91
programs and the tune/speed.c and tune/many.pl programs are useful for
92
testing and timing the routines you write. See the README files in those
93
directories for more information.
95
<p> Please make sure your new routines are fast for these three situations:
97
<li> Operands that fit into the cache.
98
<li> Small operands of less than, say, 10 limbs.
99
<li> Huge operands that does not fit into the cache.
102
<p> The most important routines are mpn_addmul_1, mpn_mul_basecase and
103
mpn_sqr_basecase. The latter two don't exist for all machines, while
104
mpn_addmul_1 exists for almost all machines.
106
<p> Standard techniques for these routines are unrolling, software
107
pipelining, and specialization for common operand values. For machines with
108
poor integer multiplication, it is often possible to improve the performance
109
using floating-point operations, or SIMD operations such as MMX or Sun's VIS.
111
<p> Using floating-point operations is interesting but somewhat tricky.
112
Since IEEE double has 53 bit of mantissa, one has to split the operands in
113
small prices, so that no result is greater than 2^53. For 32-bit computers,
114
splitting one operand into 16-bit pieces works. For 64-bit machines, one
115
operand can be split into 21-bit pieces and the other into 32-bit pieces. (A
116
64-bit operand can be split into just three 21-bit pieces if one allows the
117
split operands to be negative!)
120
<p> <li> <strong>Faster extended GCD</strong>
122
<p> We currently have extended GCD based on Lehmer's method.
123
But the binary method can quite easily be improved for bignums
124
in a similar way Lehmer improved Euclid's algorithm. The result should
125
beat Lehmer's method by about a factor of 2. Furthermore, the multiprecision
126
step of Lehmer's method and the binary method will be identical, so the
127
current code can mostly be reused. It should be possible to share code
128
between GCD and GCDEXT, and probably with Jacobi symbols too.
130
<p> Paul Zimmerman has worked on sub-quadratic GCD and GCDEXT, but it seems
131
that the most likely algorithm doesn't kick in until about 3000 limbs.
133
<p> <li> <strong>Math functions for the mpf layer</strong>
135
<p> Implement the functions of math.h for the GMP mpf layer! Check the book
136
"Pi and the AGM" by Borwein and Borwein for ideas how to do this. These
137
functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2, cos,
138
cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
140
<p> These are implemented in mpfr, redoing them in mpf might not be worth
141
bothering with, if the long term plan is to bring mpfr in as the new mpf.
143
<p> <li> <strong>Faster sqrt</strong>
145
<p> The current code uses divisions, which are reasonably fast, but it'd be
146
possible to use only multiplications by computing 1/sqrt(A) using this
156
That final multiply might be the full size of the input (though it might
157
only need the high half of that), so there may or may not be any speedup
160
<p> <li> <strong>Nth root</strong>
162
<p> Implement, at the mpn level, a routine for computing the nth root of a
163
number. The routine should use Newton's method, preferably without using
166
<p> If the routine becomes fast enough, perhaps square roots could be computed
169
<p> <li> <strong>Quotient-Only Division</strong>
171
<p> Some work can be saved when only the quotient is required in a division,
172
basically the necessary correction -0, -1 or -2 to the estimated
173
quotient can almost always be determined from only a few limbs of
174
multiply and subtract, rather than forming a complete remainder. The
175
greatest savings are when the quotient is small compared to the dividend
178
<p> Some code along these lines can be found in the current
179
<code>mpn_tdiv_qr</code>, though perhaps calculating bigger chunks of
180
remainder than might be strictly necessary. That function in its
181
current form actually then always goes on to calculate a full remainder.
182
Burnikel and Zeigler describe a similar approach for the divide and
185
<p> <li> <strong>Sub-Quadratic REDC and Exact Division</strong>
187
<p> <code>mpn_bdivmod</code> and the <code>redc</code> in
188
<code>mpz_powm</code> should use some sort of divide and conquer
189
algorithm. This would benefit <code>mpz_divexact</code>, and
190
<code>mpn_gcd</code> on large unequal size operands. See "Exact
191
Division with Karatsuba Complexity" by Jebelean for a (brief)
194
<p> Failing that, some sort of <code>DIVEXACT_THRESHOLD</code> could be
195
added to control whether <code>mpz_divexact</code> uses
196
<code>mpn_bdivmod</code> or <code>mpn_tdiv_qr</code>, since the latter
197
is faster on large divisors.
199
<p> For the REDC, basically all that's needed is Montgomery's algorithm done
200
in multi-limb integers. R is multiple limbs, and the inverse and
201
operands are multi-precision.
203
<p> For exact division the time to calculate a multi-limb inverse is not
204
amortized across many modular operations, but instead will probably
205
create a threshold below which the current style
206
<code>mpn_bdivmod</code> is best. There's also Krandick and Jebelean,
207
"Bidirectional Exact Integer Division" to basically use a low to high
208
exact division for the low half quotient, and a quotient-only division
211
<p> It will be noted that low-half and high-half multiplies, and a low-half
212
square, can be used. These ought to each take as little as half the
213
time of a full multiply, or square, though work by Thom Mulders shows
214
the advantage is progressively lost as Karatsuba and higher algorithms
217
<p> <li> <strong>Test Suite</strong>
219
<p> Add a test suite for old bugs. These tests wouldn't loop or use
220
random numbers, but just test for specific bugs found in the
223
<p> More importantly, improve the random number controls for test
227
<li> Use the new random number framework.
228
<li> Have every test accept a seed parameter.
229
<li> Allow `make check' to set the seed parameter.
230
<li> Add more tests for important, now untested functions.
233
<p> With this in place, it should be possible to rid GMP of
234
practically all bugs by having some dedicated GMP test machines
235
running "make check" with ever increasing seeds (and
236
periodically updating to the latest GMP).
238
<p> If a few more ASSERTs were sprinkled through the code, running
239
some tests with --enable-assert might be useful, though not a
240
substitute for tests on the normal build.
242
<p> An important feature of any random tests will be to record the
243
seeds used, and perhaps to snapshot operands before performing
244
each test, so any problem exposed can be reproduced.
254
eval: (add-hook 'write-file-hooks 'time-stamp)
255
time-stamp-start: "This file current as of "
256
time-stamp-format: "%:d %3b %:y"
257
time-stamp-end: "\\."
258
time-stamp-line-limit: 50