1
Copyright 2002 Free Software Foundation, Inc.
1
Copyright 2002, 2005 Free Software Foundation, Inc.
3
3
This file is part of the GNU MP Library.
15
15
You should have received a copy of the GNU Lesser General Public License
16
16
along with the GNU MP Library; see the file COPYING.LIB. If not, write to
17
the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
17
the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
32
powerpc generic, 604, 604e
33
powerpc/750 740, 750, 7400
33
================================================
34
powerpc generic, 604, 604e, 744x, 745x
35
powerpc/750 740, 750, 7400, 7410
36
38
The top-level powerpc directory is currently mostly aimed at 604/604e but
43
45
The code is quite well optimized for the 604e; other chips have had less
46
Altivec SIMD available in 7400 might hold some promise, but unfortunately
48
Altivec SIMD available in 74xx might hold some promise, but unfortunately
47
49
GMP only guarantees 32-bit data alignment, so there's lots of fiddling
48
50
around with partial operations at the start and end of limb vectors. A
49
51
128-bit limb would be a novel idea, but is unlikely to be practical, since
50
52
it would have to work with ordinary +, -, * etc. in the C code.
54
Also, Altivec isn't very well suited for the GMP multiplication needs.
55
Using floating-point based multiplication has much better performance
56
potential for all current powerpcs, both the ones with slow integer multiply
57
units (603, 740, 750, 7400, 7410) and those with fast (604, 604e, 744x,
58
745x). This is because all powerpcs do some level of pipelining in the FPU:
60
603 and 750 can sustain one fmadd every 2nd cycle.
61
604 and 604e can sustain one fmadd per cycle.
62
7400 and 7410 can sustain 3 fmadd in 4 cycles.
63
744x and 745x can sustain 4 fmadd in 5 cycles.
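As a rough illustration of why the FPU can be used for limb products at all:
a double has a 53-bit significand, so the product of a 32-bit limb and a
16-bit piece (at most 48 bits) is exact. The C sketch below is not GMP code;
the name mul_32x32_fp and the 16-bit split are assumptions made for the
example, and a real loop would use fmadd and keep operands in FP registers.

```c
#include <stdint.h>

/* Hypothetical helper, not part of GMP: form the full 64-bit product of
   two 32-bit limbs using only double-precision multiplies. */
static uint64_t mul_32x32_fp(uint32_t u, uint32_t v)
{
    double ud   = (double) u;             /* exact, 32 bits fit in 53 */
    double v_lo = (double) (v & 0xFFFF);  /* low 16 bits of v */
    double v_hi = (double) (v >> 16);     /* high 16 bits of v */

    double p_lo = ud * v_lo;              /* below 2^48, so exact */
    double p_hi = ud * v_hi;              /* below 2^48, so exact */

    /* Recombine in integer arithmetic: u*v = p_lo + p_hi * 2^16. */
    return (uint64_t) p_lo + ((uint64_t) p_hi << 16);
}
```

Each limb product costs two FP multiplies here, so the sustained fmadd
rates listed above bound how fast such a loop could run on each chip.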
62
75
The GMP code uses the "r" forms, powerpc-defs.m4 transforms them to plain
63
76
numbers according to what GMP_ASM_POWERPC_R_REGISTERS finds is needed.
77
(Note that this style isn't fully general, as the identifier r4 and the
78
register r4 will not be distinguishable on some systems. However, this is
79
not a problem for the limited GMP assembly usage.)
86
lis 9, __gmp_modlimb_invert_table@ha
87
rlwinm 11, 5, 31, 25, 31
88
la 9, __gmp_modlimb_invert_table@l(9)
97
lwz 7, .LCL0-.LCF0(30)
99
lwz 11, .LC0-.LCTOC1(30)
100
rlwinm 3, 5, 31, 25, 31
105
.tc __gmp_modlimb_invert_table[TC],__gmp_modlimb_invert_table[RW]
107
rlwinm 0, 5, 31, 25, 31
111
lis r2, ha16(___gmp_modlimb_invert_table)
112
rlwinm r9, r5, 31, 25, 31
113
la r2, lo16(___gmp_modlimb_invert_table)(r2)
121
addis r2, r7, ha16(L___gmp_modlimb_invert_table$non_lazy_ptr-L0001$pb)
122
rlwinm r9, r5, 31, 25, 31
123
lwz r2, lo16(L___gmp_modlimb_invert_table$non_lazy_ptr-L0001$pb)(r2)
126
.non_lazy_symbol_pointer
127
L___gmp_modlimb_invert_table$non_lazy_ptr:
128
.indirect_symbol ___gmp_modlimb_invert_table
130
.subsections_via_symbols
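In each of the sequences above, the rlwinm with operands 31, 25, 31 rotates
r5 left by 31 bits (that is, right by 1) and keeps the low 7 bits, giving
(r5 >> 1) & 0x7F as the byte index into the table. A C sketch of the same
lookup follows, assuming r5 holds an odd limb. The table here is a
hypothetical local stand-in built at run time, not the real
__gmp_modlimb_invert_table, and the function names are made up for the
example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 128-entry table: entry i holds the
   inverse of the odd number 2*i+1 modulo 2^8. */
static unsigned char local_table[128];

static void init_local_table(void)
{
    for (unsigned i = 0; i < 128; i++) {
        unsigned n = 2 * i + 1;
        unsigned inv = 1;
        /* Every odd n has an odd inverse mod 256, so this terminates. */
        while (((n * inv) & 0xFF) != 1)
            inv += 2;
        local_table[i] = (unsigned char) inv;
    }
}

/* Inverse of an odd 32-bit limb d modulo 2^32. */
static uint32_t modlimb_invert_sketch(uint32_t d)
{
    assert(d & 1);                        /* only odd d is invertible */
    uint32_t idx = (d >> 1) & 0x7F;       /* what the rlwinm computes */
    uint32_t inv = local_table[idx];      /* correct to 8 bits */
    inv = 2 * inv - inv * inv * d;        /* Newton step: 16 bits */
    inv = 2 * inv - inv * inv * d;        /* Newton step: 32 bits */
    return inv;
}
```

Call init_local_table() once before the first lookup. The platform
differences in the assembly are only about how the table address is
formed; the index computation is the same in every variant.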
133
For GNU/Linux and Darwin, we might want to duplicate __gmp_modlimb_invert_table
134
into the text section in this file. We should thus be able to reach it like
139
rlwinm r9, r5, 31, 25, 31
140
addi r9, r9, lo16(local_modlimb_table-L0)