@deftypefun void mpn_tdiv_qr (mp_limb_t *@var{qp}, mp_limb_t *@var{rp}, mp_size_t @var{qxn}, const mp_limb_t *@var{np}, mp_size_t @var{nn}, const mp_limb_t *@var{dp}, mp_size_t @var{dn})
Divide @{@var{np}, @var{nn}@} by @{@var{dp}, @var{dn}@} and put the quotient
at @{@var{qp}, @var{nn}@minus{}@var{dn}+1@} and the remainder at @{@var{rp},
@var{dn}@}. The quotient is rounded towards 0.

No overlap is permitted between arguments. @var{nn} must be greater than or
equal to @var{dn}. The most significant limb of @var{dp} must be non-zero.
The @var{qxn} operand must be zero.
@comment FIXME: Relax overlap requirements!
@end deftypefun

@deftypefun mp_limb_t mpn_divrem (mp_limb_t *@var{r1p}, mp_size_t @var{qxn}, mp_limb_t *@var{rs2p}, mp_size_t @var{rs2n}, const mp_limb_t *@var{s3p}, mp_size_t @var{s3n})
[This function is obsolete. Please call @code{mpn_tdiv_qr} instead for best
performance.]

Divide @{@var{rs2p}, @var{rs2n}@} by @{@var{s3p}, @var{s3n}@}, and write the
quotient at @var{r1p}, with the exception of the most significant limb, which
is returned. The remainder replaces the dividend at @var{rs2p}; it will be
@var{s3n} limbs long (i.e., as many limbs as the divisor).

In addition to an integer quotient, @var{qxn} fraction limbs are developed, and
stored after the integral limbs. For most usages, @var{qxn} will be zero.

It is required that @var{rs2n} is greater than or equal to @var{s3n}. It is
required that the most significant bit of the divisor is set.

If the quotient is not needed, pass @var{rs2p} + @var{s3n} as @var{r1p}. Aside
from that special case, no overlap between arguments is permitted.

Return the most significant limb of the quotient, either 0 or 1.

The area at @var{r1p} needs to be @var{rs2n} @minus{} @var{s3n} + @var{qxn}
limbs large.
@end deftypefun

@deftypefn Function mp_limb_t mpn_divrem_1 (mp_limb_t *@var{r1p}, mp_size_t @var{qxn}, @w{mp_limb_t *@var{s2p}}, mp_size_t @var{s2n}, mp_limb_t @var{s3limb})
@deftypefnx Macro mp_limb_t mpn_divmod_1 (mp_limb_t *@var{r1p}, mp_limb_t *@var{s2p}, @w{mp_size_t @var{s2n}}, @w{mp_limb_t @var{s3limb}})
Divide @{@var{s2p}, @var{s2n}@} by @var{s3limb}, and write the quotient at
@var{r1p}. Return the remainder.

The integer quotient is written to @{@var{r1p}+@var{qxn}, @var{s2n}@} and in
addition @var{qxn} fraction limbs are developed and written to @{@var{r1p},
@var{qxn}@}. Either or both @var{s2n} and @var{qxn} can be zero. For most
usages, @var{qxn} will be zero.

@code{mpn_divmod_1} exists for upward source compatibility and is simply a
macro calling @code{mpn_divrem_1} with a @var{qxn} of 0.

The areas at @var{r1p} and @var{s2p} have to be identical or completely
separate, not partially overlapping.
@end deftypefn
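
The single-limb division described above can be pictured with a short plain C
sketch. It is only a model, assuming 32-bit limbs stored least significant
limb first and covering just the @var{qxn} = 0 case. The name
@code{model_divrem_1} is hypothetical and is not part of GMP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the qxn = 0 case: divide the n-limb number at s2p
   (least significant limb first) by s3limb, write the n-limb quotient to
   r1p and return the remainder.  r1p may be identical to s2p.  */
uint32_t
model_divrem_1 (uint32_t *r1p, const uint32_t *s2p, size_t n, uint32_t s3limb)
{
  uint64_t rem = 0;
  assert (s3limb != 0);
  /* Schoolbook division, most significant limb first.  */
  for (size_t i = n; i-- > 0;)
    {
      uint64_t cur = (rem << 32) | s2p[i];
      r1p[i] = (uint32_t) (cur / s3limb);   /* fits, since rem < s3limb */
      rem = cur % s3limb;
    }
  return (uint32_t) rem;
}
```

Each source limb is read before the corresponding quotient limb is written, so
the model tolerates @var{r1p} and @var{s2p} being identical, in line with the
overlap rule stated above.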

@deftypefun mp_limb_t mpn_divmod (mp_limb_t *@var{r1p}, mp_limb_t *@var{rs2p}, mp_size_t @var{rs2n}, const mp_limb_t *@var{s3p}, mp_size_t @var{s3n})
[This function is obsolete. Please call @code{mpn_tdiv_qr} instead for best
performance.]
@end deftypefun

@deftypefn Macro mp_limb_t mpn_divexact_by3 (mp_limb_t *@var{rp}, mp_limb_t *@var{sp}, @w{mp_size_t @var{n}})
@deftypefnx Function mp_limb_t mpn_divexact_by3c (mp_limb_t *@var{rp}, mp_limb_t *@var{sp}, @w{mp_size_t @var{n}}, mp_limb_t @var{carry})
Divide @{@var{sp}, @var{n}@} by 3, expecting it to divide exactly, and writing
the result to @{@var{rp}, @var{n}@}. If 3 divides exactly, the return value is
zero and the result is the quotient. If not, the return value is non-zero and
the result won't be anything useful.

@code{mpn_divexact_by3c} takes an initial carry parameter, which can be the
return value from a previous call, so a large calculation can be done piece by
piece from low to high. @code{mpn_divexact_by3} is simply a macro calling
@code{mpn_divexact_by3c} with a 0 carry parameter.

These routines use a multiply-by-inverse and will be faster than
@code{mpn_divrem_1} on CPUs with fast multiplication but slow division.

The source @ma{a}, result @ma{q}, size @ma{n}, initial carry @ma{i}, and
return value @ma{c} satisfy @m{cb^n+a-i=3q, c*b^n + a-i = 3*q}, where
@m{b=2\GMPraise{@code{mp\_bits\_per\_limb}}, b=2^mp_bits_per_limb}. The
return @ma{c} is always 0, 1 or 2, and the initial carry @ma{i} must also be
0, 1 or 2 (these are both borrows really). When @ma{c=0} clearly
@ma{q=(a-i)/3}. When @m{c \neq 0, c!=0}, the remainder @ma{(a-i) @bmod{} 3}
is given by @ma{3-c}, because @ma{b @equiv{} 1 @bmod{} 3} (when
@code{mp_bits_per_limb} is even, which is always so currently).
@end deftypefn
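
The carry relation above can be checked with a small model. The following is
a sketch in plain C, assuming 32-bit limbs stored least significant limb
first; it illustrates the multiply-by-inverse idea and is not claimed to be
GMP's own source. The name @code{model_divexact_by3c} and the
@code{INVERSE_3} constant are introduced here for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiplicative inverse of 3 modulo 2^32: 3 * 0xAAAAAAAB == 1 (mod 2^32).  */
#define INVERSE_3 UINT32_C(0xAAAAAAAB)

/* Hypothetical model with 32-bit limbs: divide {sp,n} by 3, low limb first,
   writing n limbs to rp.  Returns c such that c*b^n + a - carry == 3*q,
   where b = 2^32.  */
uint32_t
model_divexact_by3c (uint32_t *rp, const uint32_t *sp, size_t n,
                     uint32_t carry)
{
  uint32_t c = carry;
  assert (carry <= 2);
  for (size_t i = 0; i < n; i++)
    {
      uint32_t s = sp[i];
      uint32_t l = s - c;             /* subtract incoming borrow, mod b */
      c = (s < c);                    /* borrow out of that subtraction */
      l *= INVERSE_3;                 /* quotient limb, mod b */
      rp[i] = l;
      c += (l > UINT32_MAX / 3);      /* 3*l wrapped past b once */
      c += (l > UINT32_MAX / 3 * 2);  /* 3*l wrapped past b twice */
    }
  return c;
}
```

Dividing the single limb 10 with this model returns 2, so the remainder is
3 @minus{} 2 = 1, as the rule above predicts. Feeding the return value of a
call on the low limbs in as the carry for the high limbs gives the same result
as one call over the whole operand.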

@deftypefun mp_limb_t mpn_mod_1 (mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, mp_limb_t @var{s2limb})
Divide @{@var{s1p}, @var{s1n}@} by @var{s2limb}, and return the remainder.
@var{s1n} can be zero.
@end deftypefun

@deftypefun mp_limb_t mpn_bdivmod (mp_limb_t *@var{rp}, mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, const mp_limb_t *@var{s2p}, mp_size_t @var{s2n}, unsigned long int @var{d})
This function puts the low
@ma{@GMPfloor{@var{d}/@nicode{mp\_bits\_per\_limb}}} limbs of @var{q} =
@{@var{s1p}, @var{s1n}@}/@{@var{s2p}, @var{s2n}@} mod @m{2^d,2^@var{d}} at
@var{rp}, and returns the high @var{d} mod @code{mp_bits_per_limb} bits of
@var{q}.

@{@var{s1p}, @var{s1n}@} - @var{q} * @{@var{s2p}, @var{s2n}@} mod @m{2
\GMPraise{@var{s1n}*@code{mp\_bits\_per\_limb}},
2^(@var{s1n}*@nicode{mp\_bits\_per\_limb})} is placed at @var{s1p}. Since the
low @ma{@GMPfloor{@var{d}/@nicode{mp\_bits\_per\_limb}}} limbs of this
difference are zero, it is possible to overwrite the low limbs at @var{s1p}
with this difference, provided @ma{@var{rp} @le{} @var{s1p}}.

This function requires that @ma{@var{s1n} * @nicode{mp\_bits\_per\_limb}
@ge{} @var{d}}, and that @{@var{s2p}, @var{s2n}@} is odd.

@strong{This interface is preliminary. It might change incompatibly in future
revisions.}
@end deftypefun

@deftypefun mp_limb_t mpn_lshift (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n}, unsigned int @var{count})
Shift @{@var{sp}, @var{n}@} left by @var{count} bits, and write the result to
@{@var{rp}, @var{n}@}. The bits shifted out at the left are returned in the
least significant @var{count} bits of the return value (the rest of the return
value is zero).

@var{count} must be in the range 1 to @nicode{mp_bits_per_limb}@minus{}1. The
regions @{@var{sp}, @var{n}@} and @{@var{rp}, @var{n}@} may overlap, provided
@ma{@var{rp} @ge{} @var{sp}}.

This function is written in assembly for most CPUs.
@end deftypefun
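
The return value and the overlap rule can be illustrated with a plain C model.
This is a sketch only, assuming 32-bit limbs stored least significant limb
first and @var{n} of at least 1; @code{model_lshift} is a hypothetical name,
and the real routine is normally assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model with 32-bit limbs, least significant limb first.
   Shifts {sp,n} left by count bits into {rp,n} and returns the bits
   shifted out of the top limb, in the low count bits of the result.  */
uint32_t
model_lshift (uint32_t *rp, const uint32_t *sp, size_t n, unsigned count)
{
  assert (n >= 1);
  assert (count >= 1 && count <= 31);
  uint32_t out = sp[n - 1] >> (32 - count);
  /* Work from the high limb down, so that overlap with rp >= sp is safe.  */
  for (size_t i = n - 1; i > 0; i--)
    rp[i] = (sp[i] << count) | (sp[i - 1] >> (32 - count));
  rp[0] = sp[0] << count;
  return out;
}
```

Walking from the most significant limb downwards means every source limb is
read before the destination limb at the same or a higher address is written,
which is why the condition @var{rp} @ge{} @var{sp} is the safe direction for a
left shift.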

@deftypefun mp_limb_t mpn_rshift (mp_limb_t *@var{rp}, const mp_limb_t *@var{sp}, mp_size_t @var{n}, unsigned int @var{count})
Shift @{@var{sp}, @var{n}@} right by @var{count} bits, and write the result to
@{@var{rp}, @var{n}@}. The bits shifted out at the right are returned in the
most significant @var{count} bits of the return value (the rest of the return
value is zero).

@var{count} must be in the range 1 to @nicode{mp_bits_per_limb}@minus{}1. The
regions @{@var{sp}, @var{n}@} and @{@var{rp}, @var{n}@} may overlap, provided
@ma{@var{rp} @le{} @var{sp}}.

This function is written in assembly for most CPUs.
@end deftypefun

@deftypefun int mpn_cmp (const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
Compare @{@var{s1p}, @var{n}@} and @{@var{s2p}, @var{n}@} and return a
positive value if @ma{@var{s1} > @var{s2}}, 0 if they are equal, or a negative
value if @ma{@var{s1} < @var{s2}}.
@end deftypefun

@deftypefun mp_size_t mpn_gcd (mp_limb_t *@var{rp}, mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
Set @{@var{rp}, @var{retval}@} to the greatest common divisor of @{@var{s1p},
@var{s1n}@} and @{@var{s2p}, @var{s2n}@}. The result can be up to @var{s2n}
limbs; the return value is the actual number produced. Both source operands
are destroyed.

@{@var{s1p}, @var{s1n}@} must have at least as many bits as @{@var{s2p},
@var{s2n}@}. @{@var{s2p}, @var{s2n}@} must be odd. Both operands must have
non-zero most significant limbs.
@end deftypefun

@deftypefun mp_limb_t mpn_gcd_1 (const mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, mp_limb_t @var{s2limb})
Return the greatest common divisor of @{@var{s1p}, @var{s1n}@} and
@var{s2limb}. Both operands must be non-zero.
@end deftypefun

@deftypefun mp_size_t mpn_gcdext (mp_limb_t *@var{r1p}, mp_limb_t *@var{r2p}, mp_size_t *@var{r2n}, mp_limb_t *@var{s1p}, mp_size_t @var{s1n}, mp_limb_t *@var{s2p}, mp_size_t @var{s2n})
Calculate the greatest common divisor of @{@var{s1p}, @var{s1n}@} and
@{@var{s2p}, @var{s2n}@}. Store the gcd at @{@var{r1p}, @var{retval}@} and
the first cofactor at @{@var{r2p}, *@var{r2n}@}, with *@var{r2n} negative if
the cofactor is negative. @var{r1p} and @var{r2p} should each have room for
@ma{@var{s1n}+1} limbs, but the return value and value stored through
@var{r2n} indicate the actual number produced.

@ma{@{@var{s1p}, @var{s1n}@} @ge{} @{@var{s2p}, @var{s2n}@}} is required, and
both must be non-zero. The regions @{@var{s1p}, @ma{@var{s1n}+1}@} and
@{@var{s2p}, @ma{@var{s2n}+1}@} are destroyed (i.e. the operands plus an extra
limb past the end of each).

The cofactor @var{r2} will satisfy @m{r_2 s_1 + k s_2 = r_1, @var{r2}*@var{s1}
+ @var{k}*@var{s2} = @var{r1}}, where @var{r1} is the gcd. The second cofactor
@var{k} is not calculated but can easily be obtained from
@m{(r_1 - r_2 s_1) / s_2, (@var{r1} - @var{r2}*@var{s1}) / @var{s2}}.
@end deftypefun
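
The cofactor identity can be tried out on single machine words. The following
plain C sketch uses the textbook extended Euclidean algorithm, which is an
assumption made for illustration and not necessarily the method used inside
@code{mpn_gcdext}. The names @code{model_gcdext} and @code{second_cofactor}
are hypothetical.

```c
#include <assert.h>

/* Hypothetical single-word model: returns gcd(s1, s2) and stores in *r2 a
   cofactor such that (*r2)*s1 + k*s2 == gcd for some integer k.  */
long long
model_gcdext (long long *r2, long long s1, long long s2)
{
  assert (s1 >= s2 && s2 > 0);
  long long old_r = s1, r = s2;   /* successive remainders */
  long long old_x = 1, x = 0;     /* matching coefficients of s1 */
  while (r != 0)
    {
      long long q = old_r / r, t;
      t = old_r - q * r;  old_r = r;  r = t;
      t = old_x - q * x;  old_x = x;  x = t;
    }
  *r2 = old_x;
  return old_r;
}

/* Second cofactor, recovered as the text describes: (r1 - r2*s1) / s2.  */
long long
second_cofactor (long long r1, long long r2, long long s1, long long s2)
{
  return (r1 - r2 * s1) / s2;
}
```

For @ma{s1 = 240} and @ma{s2 = 46} the model gives a gcd of 2 with a negative
first cofactor, which is the situation @code{mpn_gcdext} signals by storing a
negative value through @var{r2n}.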

@deftypefun mp_size_t mpn_sqrtrem (mp_limb_t *@var{r1p}, mp_limb_t *@var{r2p}, const mp_limb_t *@var{sp}, mp_size_t @var{n})
Compute the square root of @{@var{sp}, @var{n}@} and put the result at
@{@var{r1p}, @ma{@GMPceil{@var{n}/2}}@} and the remainder at @{@var{r2p},
@var{retval}@}. @var{r2p} needs space for @var{n} limbs, but the return value
indicates how many are produced.

The most significant limb of @{@var{sp}, @var{n}@} must be non-zero. The
areas @{@var{r1p}, @ma{@GMPceil{@var{n}/2}}@} and @{@var{sp}, @var{n}@} must
be completely separate. The areas @{@var{r2p}, @var{n}@} and @{@var{sp},
@var{n}@} must be either identical or completely separate.

If the remainder is not wanted then @var{r2p} can be @code{NULL}, and in this
case the return value is zero or non-zero according to whether the remainder
would have been zero or non-zero.

A return value of zero indicates a perfect square. See also
@code{mpz_perfect_square_p}.
@end deftypefun
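
The relationship between root, remainder and the perfect square test can be
shown on a single 64-bit word. This is a plain C sketch using a simple
bit-by-bit search, chosen for clarity and not taken from GMP; the name
@code{model_sqrtrem} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-word model: stores floor(sqrt(s)) in *root and, if
   rem is not NULL, the remainder s - root*root in *rem.  Returns zero if
   the remainder is zero (s is a perfect square), non-zero otherwise.  */
int
model_sqrtrem (uint32_t *root, uint64_t *rem, uint64_t s)
{
  uint64_t r = 0;
  assert (root != NULL);
  /* Build the root one bit at a time, from the highest candidate bit.  */
  for (uint64_t bit = UINT64_C(1) << 31; bit != 0; bit >>= 1)
    {
      uint64_t trial = r | bit;
      if (trial * trial <= s)     /* trial < 2^32, so no overflow */
        r = trial;
    }
  *root = (uint32_t) r;
  if (rem != NULL)
    *rem = s - r * r;
  return s != r * r;
}
```

As with @code{mpn_sqrtrem}, the remainder pointer may be @code{NULL} when only
the zero or non-zero answer is wanted.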

@deftypefun mp_size_t mpn_get_str (unsigned char *@var{str}, int @var{base}, mp_limb_t *@var{s1p}, mp_size_t @var{s1n})
Convert @{@var{s1p}, @var{s1n}@} to a raw unsigned char array at @var{str} in
base @var{base}, and return the number of characters produced. There may be
leading zeros in the string. The string is not in ASCII; to convert it to
printable format, add the ASCII codes for @samp{0} or @samp{A}, depending on
the base and range.

The most significant limb of the input @{@var{s1p}, @var{s1n}@} must be
non-zero. The area @{@var{s1p}, @var{s1n}+1@} is clobbered.

The area at @var{str} has to have space for the largest possible number
represented by a @var{s1n} long limb array, plus one extra character.
@end deftypefun
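
The conversion from raw digit values to printable characters might look like
the following plain C sketch. It assumes that digit values of 10 and above are
meant to map to @samp{A} onwards, that the base is at most 36, and that the
character set is ASCII compatible. The helper name
@code{raw_digits_to_ascii} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: turn raw digit values 0..base-1, as produced by a
   routine like mpn_get_str, into printable characters in place.  Digits
   0-9 become '0'-'9' and digits 10 upwards become 'A' onwards.  No null
   terminator is added.  */
void
raw_digits_to_ascii (unsigned char *str, size_t len, int base)
{
  assert (base >= 2 && base <= 36);
  for (size_t i = 0; i < len; i++)
    {
      unsigned char d = str[i];
      assert (d < base);
      str[i] = (unsigned char) (d < 10 ? '0' + d : 'A' + (d - 10));
    }
}
```

The length passed in would be the count returned by the conversion routine. A
caller wanting a C string still has to append the terminating null itself.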

@deftypefun mp_size_t mpn_set_str (mp_limb_t *@var{r1p}, const char *@var{str}, size_t @var{strsize}, int @var{base})
Convert the raw unsigned char array at @var{str} of length @var{strsize} to a
limb array. The base of @var{str} is @var{base}. @var{strsize} must be at
least 1.

Return the number of limbs stored in @var{r1p}.
@end deftypefun

@deftypefun {unsigned long int} mpn_scan0 (const mp_limb_t *@var{s1p}, unsigned long int @var{bit})
Scan @var{s1p} from bit position @var{bit} for the next clear bit.

It is required that there be a clear bit within the area at @var{s1p} at or
beyond bit position @var{bit}, so that the function has something to return.
@end deftypefun

@deftypefun {unsigned long int} mpn_scan1 (const mp_limb_t *@var{s1p}, unsigned long int @var{bit})
Scan @var{s1p} from bit position @var{bit} for the next set bit.

It is required that there be a set bit within the area at @var{s1p} at or
beyond bit position @var{bit}, so that the function has something to return.
@end deftypefun

@deftypefun void mpn_random (mp_limb_t *@var{r1p}, mp_size_t @var{r1n})
@deftypefunx void mpn_random2 (mp_limb_t *@var{r1p}, mp_size_t @var{r1n})
Generate a random number of length @var{r1n} and store it at @var{r1p}. The
most significant limb is always non-zero. @code{mpn_random} generates
uniformly distributed limb data; @code{mpn_random2} generates long strings of
zeros and ones in the binary representation.

@code{mpn_random2} is intended for testing the correctness of the @code{mpn}
routines.
@end deftypefun

@deftypefun {unsigned long int} mpn_popcount (const mp_limb_t *@var{s1p}, mp_size_t @var{n})
Count the number of set bits in @{@var{s1p}, @var{n}@}.
@end deftypefun

@deftypefun {unsigned long int} mpn_hamdist (const mp_limb_t *@var{s1p}, const mp_limb_t *@var{s2p}, mp_size_t @var{n})
Compute the Hamming distance between @{@var{s1p}, @var{n}@} and @{@var{s2p},
@var{n}@}.
@end deftypefun

@deftypefun int mpn_perfect_square_p (const mp_limb_t *@var{s1p}, mp_size_t @var{n})
Return non-zero iff @{@var{s1p}, @var{n}@} is a perfect square.
@end deftypefun

@node Random Number Functions, Formatted Output, Low-level Functions, Top
@chapter Random Number Functions
@cindex Random number functions

Sequences of pseudo-random numbers in GMP are generated using a variable of
type @code{gmp_randstate_t}, which holds an algorithm selection and a current
state. Such a variable must be initialized by a call to one of the
@code{gmp_randinit} functions, and can be seeded with one of the
@code{gmp_randseed} functions.

The functions actually generating random numbers are described in @ref{Integer
Random Numbers}, and @ref{Miscellaneous Float Functions}.

The older style random number functions don't accept a @code{gmp_randstate_t}
parameter but instead share a global variable of that type. They use a
default algorithm and are currently not seeded (though perhaps that will
change in the future). The new functions accepting a @code{gmp_randstate_t}
are recommended for applications that care about randomness.

@menu
* Random State Initialization::
* Random State Seeding::
@end menu

@node Random State Initialization, Random State Seeding, Random Number Functions, Random Number Functions
@section Random State Initialization
@cindex Random number state

@deftypefun void gmp_randinit_default (gmp_randstate_t @var{state})
Initialize @var{state} with a default algorithm. This will be a compromise
between speed and randomness, and is recommended for applications with no
special requirements.
@end deftypefun

@deftypefun void gmp_randinit_lc_2exp (gmp_randstate_t @var{state}, mpz_t @var{a}, @w{unsigned long @var{c}}, @w{unsigned long @var{m2exp}})
Initialize @var{state} with a linear congruential algorithm @m{X = (@var{a}X +
@var{c}) @bmod 2^{m2exp}, X = (@var{a}*X + @var{c}) mod 2^@var{m2exp}}.

The low bits of @ma{X} in this algorithm are not very random. The least
significant bit will have a period no more than 2, and the second bit no more
than 4, etc. For this reason only the high half of each @ma{X} is actually
used.

When a random number of more than @ma{@var{m2exp}/2} bits is to be generated,
multiple iterations of the recurrence are used and the results concatenated.
@end deftypefun
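
The recurrence and the use of the high half can be sketched in plain C. This
is a model only, assuming @var{m2exp} fixed at 32 so that unsigned 32-bit
wrap-around performs the reduction. The names @code{model_lc},
@code{model_lc_next} and @code{model_lc_bits} are hypothetical, and the order
in which the pieces are concatenated is an arbitrary choice made here, not
something the text above specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model with m2exp fixed at 32, so "mod 2^m2exp" is simply
   the wrap-around of uint32_t arithmetic.  */
struct model_lc { uint32_t a, c, x; };

/* Advance X = (a*X + c) mod 2^32 and return the high half (16 bits).  */
uint32_t
model_lc_next (struct model_lc *s)
{
  s->x = s->a * s->x + s->c;
  return s->x >> 16;
}

/* Concatenate successive high halves until nbits bits are collected.  */
uint64_t
model_lc_bits (struct model_lc *s, unsigned nbits)
{
  uint64_t r = 0;
  assert (nbits >= 1 && nbits <= 64);
  for (unsigned got = 0; got < nbits; got += 16)
    r = (r << 16) | model_lc_next (s);
  return nbits == 64 ? r : r & ((UINT64_C(1) << nbits) - 1);
}
```

With odd @var{a} and odd @var{c} the lowest bit of @ma{X} simply alternates,
which is the period of 2 mentioned above and the reason the low half is
discarded.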

@deftypefun int gmp_randinit_lc_2exp_size (gmp_randstate_t @var{state}, unsigned long @var{size})
Initialize @var{state} for a linear congruential algorithm as per
@code{gmp_randinit_lc_2exp}. @var{a}, @var{c} and @var{m2exp} are selected
from a table, chosen so that @var{size} bits (or more) of each @ma{X} will be
used, i.e. @ma{@var{m2exp}/2 @ge{} @var{size}}.

If successful the return value is non-zero. If @var{size} is bigger than the
table data provides then the return value is zero. The maximum @var{size}
currently supported is 128.
@end deftypefun

@deftypefun void gmp_randinit (gmp_randstate_t @var{state}, @w{gmp_randalg_t @var{alg}}, ...)
@strong{This function is obsolete.}

Initialize @var{state} with an algorithm selected by @var{alg}. The only
choice is @code{GMP_RAND_ALG_LC}, which is @code{gmp_randinit_lc_2exp_size}.
A third parameter of type @code{unsigned long} is required; this is the
@var{size} for that function. @code{GMP_RAND_ALG_DEFAULT} or 0 are the same
as @code{GMP_RAND_ALG_LC}.

@code{gmp_randinit} sets bits in @code{gmp_errno} to indicate an error:
@code{GMP_ERROR_UNSUPPORTED_ARGUMENT} if @var{alg} is unsupported, or
@code{GMP_ERROR_INVALID_ARGUMENT} if the @var{size} parameter is too big.
@end deftypefun

@c Not yet in the library.
@ignore
@deftypefun void gmp_randinit_lc (gmp_randstate_t @var{state}, mpz_t @var{a}, unsigned long int @var{c}, mpz_t @var{m})
Initialize @var{state} for a linear congruential scheme @m{X = (@var{a}X +
@var{c}) @bmod @var{m}, X = (@var{a}*X + @var{c}) mod @var{m}}.
@end deftypefun
@end ignore

@deftypefun void gmp_randclear (gmp_randstate_t @var{state})
Free all memory occupied by @var{state}.
@end deftypefun

@node Random State Seeding, , Random State Initialization, Random Number Functions
@section Random State Seeding
@cindex Random number seeding

@deftypefun void gmp_randseed (gmp_randstate_t @var{state}, mpz_t @var{seed})
@deftypefunx void gmp_randseed_ui (gmp_randstate_t @var{state}, @w{unsigned long int @var{seed}})
Set an initial seed value into @var{state}.

The size of a seed determines how many different sequences of random numbers
it's possible to generate. The ``quality'' of the seed is the randomness
of a given seed compared to the previous seed used, and this affects the
randomness of separate number sequences. The method for choosing a seed is
critical if the generated numbers are to be used for important applications,
such as generating cryptographic keys.

Traditionally the system time has been used to seed, but care needs to be
taken with this. If an application seeds often and the resolution of the
system clock is low, then the same sequence of numbers might be repeated.
Also, the system time is quite easy to guess, so if unpredictability is
required then it should definitely not be the only source for the seed value.
On some systems there's a special device @file{/dev/random} which provides
random data better suited for use as a seed.
@end deftypefun
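
Reading a seed from such a device could be done along the following lines.
This is a plain C sketch using only standard I/O. The helper name
@code{seed_from_device} is hypothetical, and whether the device exists, and
whether reading it can block, depends on the system.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: fill *seed with bytes read from a random device
   such as /dev/random.  Returns 0 on success, or -1 if the device cannot
   be opened or read, in which case *seed is left untouched.  */
int
seed_from_device (const char *path, unsigned long *seed)
{
  unsigned long value;
  FILE *fp;
  assert (path != NULL && seed != NULL);
  fp = fopen (path, "rb");
  if (fp == NULL)
    return -1;
  /* Unbuffered, so no more bytes are drawn from the device than needed.  */
  setvbuf (fp, NULL, _IONBF, 0);
  if (fread (&value, sizeof value, 1, fp) != 1)
    {
      fclose (fp);
      return -1;
    }
  fclose (fp);
  *seed = value;
  return 0;
}
```

On success the value could be handed to @code{gmp_randseed_ui}. On failure an
application has to decide for itself whether a weaker source such as the
system time is acceptable for its purposes.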

@node Formatted Output, Formatted Input, Random Number Functions, Top
@chapter Formatted Output
@cindex Formatted output
@cindex @code{printf} formatted output

@menu
* Formatted Output Strings::
* Formatted Output Functions::
* C++ Formatted Output::
@end menu

@node Formatted Output Strings, Formatted Output Functions, Formatted Output, Formatted Output
@section Format Strings

@code{gmp_printf} and friends accept format strings similar to the standard C
@code{printf} (@pxref{Formatted Output,,,libc,The GNU C Library Reference
Manual}). A format specification is of the form

@example
% [flags] [width] [.[precision]] [type] conv
@end example

GMP adds types @samp{Z}, @samp{Q} and @samp{F} for @code{mpz_t}, @code{mpq_t}
and @code{mpf_t} respectively. @samp{Z} and @samp{Q} behave like integers.
@samp{Q} will print a @samp{/} and a denominator, if needed. @samp{F} behaves
like a float. For example,

@example
mpz_t z;
gmp_printf ("%s is an mpz %Zd\n", "here", z);

mpq_t q;
gmp_printf ("a hex rational: %#40Qx\n", q);

mpf_t f;
int n;
gmp_printf ("fixed point mpf %.*Ff with %d digits\n", n, f, n);
@end example

All the standard C @code{printf} types behave the same as the C library
@code{printf}, and can be freely intermixed with the GMP extensions. In the
current implementation the standard parts of the format string are simply
handed to @code{printf} and only the GMP extensions handled directly.

The flags accepted are as follows. GLIBC style @nisamp{'}
(@pxref{Locales,,Locales and Internationalization,libc,The GNU C Library
Reference Manual}) is only for the standard C types (not the GMP types), and
only if the C library supports it.

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{0} @tab pad with zeros (rather than spaces)
@item @nicode{#} @tab show the base with @samp{0x}, @samp{0X} or @samp{0}
@item @nicode{+} @tab always show a sign
@item (space) @tab show a space or a @samp{-} sign
@item @nicode{'} @tab group digits, GLIBC style (not GMP types)
@end multitable
@end quotation

The standard types accepted are as follows. @samp{h} and @samp{l} are
portable; the rest will depend on the compiler (or include files) for the type
and the C library for the output.

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{h} @tab @nicode{short}
@item @nicode{hh} @tab @nicode{char}
@item @nicode{j} @tab @nicode{intmax_t} or @nicode{uintmax_t}
@item @nicode{l} @tab @nicode{long} or @nicode{wchar_t}
@item @nicode{ll} @tab same as @nicode{L}
@item @nicode{L} @tab @nicode{long long} or @nicode{long double}
@item @nicode{q} @tab @nicode{quad_t} or @nicode{u_quad_t}
@item @nicode{t} @tab @nicode{ptrdiff_t}
@item @nicode{z} @tab @nicode{size_t}
@end multitable
@end quotation

@noindent
The GMP types are

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{F} @tab @nicode{mpf_t}, float conversions
@item @nicode{Q} @tab @nicode{mpq_t}, integer conversions
@item @nicode{Z} @tab @nicode{mpz_t}, integer conversions
@end multitable
@end quotation

The conversions accepted are as follows. @samp{a} and @samp{A} are always
supported for @code{mpf_t} but depend on the C library for standard C float
types. @samp{m} and @samp{p} depend on the C library.

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{a} @nicode{A} @tab hex floats, GLIBC style
@item @nicode{c} @tab character
@item @nicode{d} @tab decimal integer
@item @nicode{e} @nicode{E} @tab scientific format float
@item @nicode{f} @tab fixed point float
@item @nicode{i} @tab same as @nicode{d}
@item @nicode{g} @nicode{G} @tab fixed or scientific float
@item @nicode{m} @tab @code{strerror} string, GLIBC style
@item @nicode{n} @tab characters written so far
@item @nicode{o} @tab octal integer
@item @nicode{p} @tab pointer
@item @nicode{s} @tab string
@item @nicode{u} @tab unsigned integer
@item @nicode{x} @nicode{X} @tab hex integer
@end multitable
@end quotation

@samp{o}, @samp{x} and @samp{X} are unsigned for the standard C types, but for
@samp{Z} and @samp{Q} a sign is included. @samp{u} is not meaningful for
@samp{Z} and @samp{Q}.

@samp{n} can be used with any of the types, even the GMP types.

Other types or conversions that might be accepted by the C library
@code{printf} cannot be used through @code{gmp_printf}; this includes, for
instance, extensions registered with GLIBC @code{register_printf_function}.
Also, there's currently no support for POSIX @samp{$} style numbered arguments
(perhaps this will be added in the future).

The precision field has its usual meaning for integer @samp{Z} and float
@samp{F} types, but is currently undefined for @samp{Q} and should not be used
with that.

@code{mpf_t} conversions only ever generate as many digits as can be
accurately represented by the operand, the same as @code{mpf_get_str} does.
Zeros will be used if necessary to pad to the requested precision. This
happens even for an @samp{f} conversion of an @code{mpf_t} which is an
integer, for instance @ma{2^@W{1024}} in an @code{mpf_t} of 128 bits precision
will only produce about 40 digits, then pad with zeros to the decimal point.
An empty precision field like @samp{%.Fe} or @samp{%.Ff} can be used to
specifically request all significant digits.

The decimal point character (or string) is taken from the current locale
settings on systems which provide @code{localeconv} (@pxref{Locales,,Locales
and Internationalization,libc,The GNU C Library Reference Manual}). The C
library will normally do the same for standard float output.

@node Formatted Output Functions, C++ Formatted Output, Formatted Output Strings, Formatted Output
@section Functions

Each of the following functions is similar to the corresponding C library
function. The basic @code{printf} forms take a variable argument list. The
@code{vprintf} forms take an argument pointer, see @ref{Variadic
Functions,,,libc,The GNU C Library Reference Manual}, or @samp{man 3
va_start}.

It should be emphasised that if a format string is invalid, or the arguments
don't match what the format specifies, then the behaviour of any of these
functions will be unpredictable. GCC format string checking is not available,
since it doesn't recognise the GMP extensions.

The file based functions @code{gmp_printf} and @code{gmp_fprintf} will return
@ma{-1} to indicate a write error. All the functions can return @ma{-1} if
the C library @code{printf} variant in use returns @ma{-1}, but this shouldn't
normally occur.

@deftypefun int gmp_printf (const char *@var{fmt}, ...)
@deftypefunx int gmp_vprintf (const char *@var{fmt}, va_list @var{ap})
Print to the standard output @code{stdout}. Return the number of characters
written, or @ma{-1} if an error occurred.
@end deftypefun

@deftypefun int gmp_fprintf (FILE *@var{fp}, const char *@var{fmt}, ...)
@deftypefunx int gmp_vfprintf (FILE *@var{fp}, const char *@var{fmt}, va_list @var{ap})
Print to the stream @var{fp}. Return the number of characters written, or
@ma{-1} if an error occurred.
@end deftypefun

@deftypefun int gmp_sprintf (char *@var{buf}, const char *@var{fmt}, ...)
@deftypefunx int gmp_vsprintf (char *@var{buf}, const char *@var{fmt}, va_list @var{ap})
Form a null-terminated string in @var{buf}. Return the number of characters
written, excluding the terminating null.

No overlap is permitted between the space at @var{buf} and the string
@var{fmt}.

These functions are not recommended, since there's no protection against
exceeding the space available at @var{buf}.
@end deftypefun

@deftypefun int gmp_snprintf (char *@var{buf}, size_t @var{size}, const char *@var{fmt}, ...)
@deftypefunx int gmp_vsnprintf (char *@var{buf}, size_t @var{size}, const char *@var{fmt}, va_list @var{ap})
Form a null-terminated string in @var{buf}. No more than @var{size} bytes
will be written. To get the full output, @var{size} must be enough for the
string and null-terminator.

The return value is the total number of characters which ought to have been
produced, excluding the terminating null. If @ma{@var{retval} >= @var{size}}
then the actual output has been truncated to the first @ma{@var{size}-1}
characters, and a null appended.

No overlap is permitted between the region @{@var{buf},@var{size}@} and the
@var{fmt} string.

Notice the return value is in ISO C99 @code{snprintf} style. This is so even
if the C library @code{vsnprintf} is the older GLIBC 2.0.x style.
@end deftypefun
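
The truncation test on the return value can be sketched as follows. Standard
C99 @code{snprintf} is used here as a stand-in so that the sketch builds
without GMP, on the assumption that the return convention is the same one
described above. The helper name @code{format_fits} and the format string are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the text for value fitted in buf
   (size bytes, including the null), 0 if it was truncated, and -1 on an
   output error.  The retval >= size test is the one the text describes.  */
int
format_fits (char *buf, size_t size, long value)
{
  int ret;
  assert (buf != NULL && size > 0);
  ret = snprintf (buf, size, "value=%ld", value);
  if (ret < 0)
    return -1;
  return (size_t) ret < size;
}
```

A caller seeing a truncated result can allocate @ma{@var{retval}+1} bytes and
format again, since the return value reports the length the full output would
have had.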

@deftypefun int gmp_asprintf (char **@var{pp}, const char *@var{fmt}, ...)
@deftypefunx int gmp_vasprintf (char **@var{pp}, const char *@var{fmt}, va_list @var{ap})
Form a null-terminated string in a block of memory obtained from the current
memory allocation function (@pxref{Custom Allocation}). The block will be the
size of the string and null-terminator. Put the address of the block in
*@var{pp}. Return the number of characters produced, excluding the
null-terminator.

Unlike the C library @code{asprintf}, @code{gmp_asprintf} doesn't return
@ma{-1} if there's no more memory available; instead it lets the current
allocation function handle that.
@end deftypefun

@deftypefun int gmp_obstack_printf (struct obstack *@var{ob}, const char *@var{fmt}, ...)
@deftypefunx int gmp_obstack_vprintf (struct obstack *@var{ob}, const char *@var{fmt}, va_list @var{ap})
Append to the current obstack object, in the same style as
@code{obstack_printf}. Return the number of characters written. A
null-terminator is not written.

@var{fmt} cannot be within the current obstack object, since the object might
move as it grows.

These functions are available only when the C library provides the obstack
feature, which probably means only on GNU systems, see
@ref{Obstacks,,,libc,The GNU C Library Reference Manual}.
@end deftypefun

4788

4789

4790

@node C++ Formatted Output, , Formatted Output Functions, Formatted Output

4791

@section C++ Formatted Output

4792

@cindex C++ @code{ostream} output

4793

@cindex @code{ostream} output

4794

4795

The following functions are provided in @file{libgmpxx}, which is built if C++

4796

support is enabled (@pxref{Build Options}). Prototypes are available from

4797

@code{<gmp.h>}.

4798

4799

@deftypefun ostream& operator<< (ostream& @var{stream}, mpz_t @var{op})

4800

Print @var{op} to @var{stream}, using its @code{ios} formatting settings.

4801

@code{ios::width} is reset to 0 after output, the same as the standard

4802

@code{ostream operator<<} routines do.

4803

4804

In hex or octal, @var{op} is printed as a signed number, the same as for

4805

decimal. This is unlike the standard @code{operator<<} routines on @code{int}
etc, which instead give two's complement.

4807

@end deftypefun
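
The two standard-library behaviours referred to above can be checked directly.
This is a small sketch using only standard C++ streams, not GMP; the helper
names are hypothetical. It shows that a width applies to one output only, and
that the standard @code{operator<<} prints a negative @code{int} in hex as its
two's complement bit pattern rather than with a minus sign.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Print two ints after setting a width once. Only the first is padded,
// because operator<< resets ios::width to 0 after each output.
std::string width_applies_once(int a, int b) {
  std::ostringstream out;
  out << std::setw(5) << a << b;
  return out.str();
}

// Print an int in hex with the standard operator<<.
std::string standard_hex(int v) {
  std::ostringstream out;
  out << std::hex << v;
  return out.str();
}
```

For an @code{mpz_t} the text above says the hex output is signed instead, so
the same value would appear with a leading minus sign.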

4808

4809

@deftypefun ostream& operator<< (ostream& @var{stream}, mpq_t @var{op})

4810

Print @var{op} to @var{stream}, using its @code{ios} formatting settings.

4811

@code{ios::width} is reset to 0 after output, the same as the standard

4812

@code{ostream operator<<} routines do.

4813

4814

Output will be a fraction like @samp{5/9}, or if the denominator is 1 then

4815

just a plain integer like @samp{123}.

4816

4817

In hex or octal, @var{op} is printed as a signed value, the same as for

4818

decimal. If @code{ios::showbase} is set then a base indicator is shown on

4819

both the numerator and denominator (if the denominator is required).

4820

@end deftypefun

4821

4822

@deftypefun ostream& operator<< (ostream& @var{stream}, mpf_t @var{op})

4823

Print @var{op} to @var{stream}, using its @code{ios} formatting settings.

4824

@code{ios::width} is reset to 0 after output, the same as the standard

4825

@code{ostream operator<<} routines do. The decimal point follows the current

4826

locale, on systems providing @code{localeconv}.

4827

4828

Hex and octal are supported, unlike the standard @code{operator<<} routines on

4829

@code{double} etc. The mantissa will be in hex or octal, the exponent will be

4830

in decimal. For hex the exponent delimiter is an @samp{@@}. This is as per

4831

@code{mpf_out_str}. @code{ios::showbase} is supported, and will put a base on

4832

the mantissa.

4833

@end deftypefun

4834

4835

These operators mean that GMP types can be printed in the usual C++ way, for

4836

example,

4837

4838

@example
mpz_t z;
int n;
...
cout << "iteration " << n << " value " << z << "\n";
@end example

4844

4845

But note that @code{ostream} output (and @code{istream} input, @pxref{C++
Formatted Input}) is the only overloading available, and using for instance
@code{+} with an @code{mpz_t} will have unpredictable results.

4848

4849

4850

@node Formatted Input, C++ Class Interface, Formatted Output, Top

4851

@chapter Formatted Input

4852

@cindex Formatted input

4853

@cindex @code{scanf} formatted input

4854

4855

@menu
* Formatted Input Strings::
* Formatted Input Functions::
* C++ Formatted Input::
@end menu

4860

4861

4862

@node Formatted Input Strings, Formatted Input Functions, Formatted Input, Formatted Input

4863

@section Formatted Input Strings

4864

4865

@code{gmp_scanf} and friends accept format strings similar to the standard C

4866

@code{scanf} (@pxref{Formatted Input,,,libc,The GNU C Library Reference

4867

Manual}). A format specification is of the form

4868

4869

@example
% [flags] [width] [type] conv
@end example

4872

4873

GMP adds types @samp{Z}, @samp{Q} and @samp{F} for @code{mpz_t}, @code{mpq_t}

4874

and @code{mpf_t} respectively. @samp{Z} and @samp{Q} behave like integers.

4875

@samp{Q} will read a @samp{/} and a denominator, if present. @samp{F} behaves

4876

like a float.

4877

4878

GMP variables don't require an @code{&} when passed to @code{gmp_scanf}, since

4879

they're already ``call-by-reference''. For example,

4880

4881

@example
/* to read say "a(5) = 1234" */
int n;
mpz_t z;
gmp_scanf ("a(%d) = %Zd\n", &n, z);

mpq_t q1, q2;
gmp_sscanf ("0377 + 0x10/0x11", "%Qi + %Qi", q1, q2);

/* to read say "topleft (1.55,-2.66)" */
mpf_t x, y;
char buf[32];
gmp_scanf ("%31s (%Ff,%Ff)", buf, x, y);
@end example

4895

4896

All the standard C @code{scanf} types behave the same as in the C library

4897

@code{scanf}, and can be freely intermixed with the GMP extensions. In the

4898

current implementation the standard parts of the format string are simply

4899

handed to @code{scanf} and only the GMP extensions handled directly.

4900

4901

The flags accepted are as follows. @samp{a} and @samp{'} will depend on

4902

support from the C library, and @samp{'} cannot be used with GMP types.

4903

4904

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{*} @tab read but don't store
@item @nicode{a} @tab allocate a buffer (string conversions)
@item @nicode{'} @tab group digits, GLIBC style (not GMP types)
@end multitable
@end quotation

4911

4912

The standard types accepted are as follows. @samp{h} and @samp{l} are

4913

portable, the rest will depend on the compiler (or include files) for the type

4914

and the C library for the input.

4915

4916

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{h} @tab @nicode{short}
@item @nicode{hh} @tab @nicode{char}
@item @nicode{j} @tab @nicode{intmax_t} or @nicode{uintmax_t}
@item @nicode{l} @tab @nicode{long} or @nicode{wchar_t}
@item @nicode{ll} @tab same as @nicode{L}
@item @nicode{L} @tab @nicode{long long} or @nicode{long double}
@item @nicode{q} @tab @nicode{quad_t} or @nicode{u_quad_t}
@item @nicode{t} @tab @nicode{ptrdiff_t}
@item @nicode{z} @tab @nicode{size_t}
@end multitable
@end quotation

4929

4930

@noindent

4931

The GMP types are

4932

4933

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{F} @tab @nicode{mpf_t}, float conversions
@item @nicode{Q} @tab @nicode{mpq_t}, integer conversions
@item @nicode{Z} @tab @nicode{mpz_t}, integer conversions
@end multitable
@end quotation

4940

4941

The conversions accepted are as follows. @samp{p} and @samp{[} will depend on

4942

support from the C library, the rest are standard.

4943

4944

@quotation
@multitable {(space)} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}
@item @nicode{c} @tab character or characters
@item @nicode{d} @tab decimal integer
@item @nicode{e} @nicode{E} @nicode{f} @nicode{g} @nicode{G}
@tab float
@item @nicode{i} @tab integer with base indicator
@item @nicode{n} @tab characters read so far
@item @nicode{o} @tab octal integer
@item @nicode{p} @tab pointer
@item @nicode{s} @tab string of non-whitespace characters
@item @nicode{u} @tab decimal integer
@item @nicode{x} @nicode{X} @tab hex integer
@item @nicode{[} @tab string of characters in a set
@end multitable
@end quotation

4960

4961

@samp{e}, @samp{E}, @samp{f}, @samp{g} and @samp{G} are identical; they all
read either fixed point or scientific format, and accept either @samp{e} or
@samp{E} for the exponent in scientific format.

@samp{x} and @samp{X} are identical; both accept upper and lower case
hexadecimal.

4967

4968

@samp{o}, @samp{u}, @samp{x} and @samp{X} all read positive or negative

4969

values. For the standard C types these are described as ``unsigned''

4970

conversions, but that merely affects certain overflow handling, negatives are

4971

still allowed (see @code{strtoul}, @ref{Parsing of Integers,,,libc,The GNU C

4972

Library Reference Manual}). For GMP types there are no overflows, and

4973

@samp{d} and @samp{u} are identical.
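
The @code{strtoul} behaviour mentioned above for the standard C types can be
seen in a short sketch. It uses only the C library, not GMP, and the helper
name @code{parse_unsigned} is hypothetical. A leading minus sign is accepted
and the result is negated within the unsigned type, which is the ``overflow
handling'' that a GMP type never needs.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Parse text with strtoul, as the C library does for "unsigned" conversions.
// A leading '-' is allowed; the value is negated in unsigned long arithmetic.
unsigned long parse_unsigned(const char *text, int base) {
  return std::strtoul(text, nullptr, base);
}
```

Here @code{parse_unsigned ("-1", 10)} gives @code{ULONG_MAX}, whereas the text
above says a GMP type reading the same input simply holds a negative value.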

4974

4975

@samp{Q} type reads the numerator and (optional) denominator as given. If the

4976

value might not be in canonical form then @code{mpq_canonicalize} must be

4977

called before using it in any calculations (@pxref{Rational Number

4978

Functions}).

4979

4980

@samp{Qi} will read a base specification separately for the numerator and

4981

denominator. For example @samp{0x10/11} would be 16/11, whereas

4982

@samp{0x10/0x11} would be 16/17.

4983

4984

@samp{n} can be used with any of the types above, even the GMP types.

4985

@samp{*} to suppress assignment is allowed, though the field would then do

4986

nothing at all.

4987

4988

Other conversions or types that might be accepted by the C library

4989

@code{scanf} cannot be used through @code{gmp_scanf}.

4990

4991

Whitespace is read and discarded before a field, except for @samp{c} and

4992

@samp{[} conversions.

4993

4994

For float conversions, the decimal point character (or string) expected is

4995

taken from the current locale settings on systems which provide

4996

@code{localeconv} (@pxref{Locales,,Locales and Internationalization,libc,The

4997

GNU C Library Reference Manual}). The C library will normally do the same for

4998

standard float input.

4999

5000

5001

@node Formatted Input Functions, C++ Formatted Input, Formatted Input Strings, Formatted Input

5002

@section Formatted Input Functions

5003

5004

Each of the following functions is similar to the corresponding C library

5005

function. The plain @code{scanf} forms take a variable argument list. The

5006

@code{vscanf} forms take an argument pointer, see @ref{Variadic

5007

Functions,,,libc,The GNU C Library Reference Manual}, or @samp{man 3

5008

va_start}.

5009

5010

It should be emphasised that if a format string is invalid, or the arguments

5011

don't match what the format specifies, then the behaviour of any of these

5012

functions will be unpredictable. GCC format string checking is not available,

5013

since it doesn't recognise the GMP extensions.

5014

5015

No overlap is permitted between the @var{fmt} string and any of the results

5016

produced.

5017

5018

@deftypefun int gmp_scanf (const char *@var{fmt}, ...)

5019

@deftypefunx int gmp_vscanf (const char *@var{fmt}, va_list @var{ap})

5020

Read from the standard input @code{stdin}.

5021

@end deftypefun

5022

5023

@deftypefun int gmp_fscanf (FILE *@var{fp}, const char *@var{fmt}, ...)

5024

@deftypefunx int gmp_vfscanf (FILE *@var{fp}, const char *@var{fmt}, va_list @var{ap})

5025

Read from the stream @var{fp}.

5026

@end deftypefun

5027

5028

@deftypefun int gmp_sscanf (const char *@var{s}, const char *@var{fmt}, ...)

5029

@deftypefunx int gmp_vsscanf (const char *@var{s}, const char *@var{fmt}, va_list @var{ap})

5030

Read from a null-terminated string @var{s}.

5031

@end deftypefun

5032

5033

The return value from each of these functions is the same as the standard C99

5034

@code{scanf}, namely the number of fields successfully parsed and stored.

5035

@samp{%n} fields and fields read but suppressed by @samp{*} don't count

5036

towards the return value.

5037

5038

If end of file or file error, or end of string, is reached when a match is

5039

required, and when no previous non-suppressed fields have matched, then the

5040

return value is EOF instead of 0. A match is required for a literal character

5041

in the format string or a field other than @samp{%n}. Whitespace in the

5042

format string is only an optional match and won't induce an EOF in this

5043

fashion. Leading whitespace read and discarded for a field doesn't count as a

5044

match.
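
Since the return value follows the standard C99 @code{scanf}, the rules above
can be tried with the C library @code{sscanf}. The sketch below uses only the
standard library, not GMP, and its helper names are hypothetical. It
distinguishes a stored field, a matching failure (0), and end of input before
any field matched (@code{EOF}), and shows that suppressed fields and
@samp{%n} are not counted.

```cpp
#include <cassert>
#include <cstdio>

// Return the sscanf result for a single %d field.
int scan_one_int(const char *input) {
  int n = 0;
  return std::sscanf(input, "%d", &n);
}

// One stored field, one field suppressed with '*', and a %n.
// Only the stored field contributes to the return value.
int scan_with_suppressed(const char *input) {
  int first = 0;
  int consumed = 0;
  return std::sscanf(input, "%d %*d%n", &first, &consumed);
}
```

An input consisting only of whitespace gives @code{EOF} rather than 0, since
the leading whitespace discarded for the field doesn't count as a match.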

5045

5046

5047

@node C++ Formatted Input, , Formatted Input Functions, Formatted Input

5048

@section C++ Formatted Input

5049

@cindex C++ @code{istream} input

5050

@cindex @code{istream} input

5051

5052

The following functions are provided in @file{libgmpxx}, which is built only

5053

if C++ support is enabled (@pxref{Build Options}). Prototypes are available

5054

from @code{<gmp.h>}.

5055

5056

@deftypefun istream& operator>> (istream& @var{stream}, mpz_t @var{rop})

5057

Read @var{rop} from @var{stream}, using its @code{ios} formatting settings.

5058

@end deftypefun

5059

5060

@deftypefun istream& operator>> (istream& @var{stream}, mpq_t @var{rop})

5061

Read @var{rop} from @var{stream}, using its @code{ios} formatting settings.

5062

5063

An integer like @samp{123} will be read, or a fraction like @samp{5/9}. If

5064

the fraction is not in canonical form then @code{mpq_canonicalize} must be

5065

called (@pxref{Rational Number Functions}).

5066

@end deftypefun

5067

5068

@deftypefun istream& operator>> (istream& @var{stream}, mpf_t @var{rop})

5069

Read @var{rop} from @var{stream}, using its @code{ios} formatting settings.

5070

5071

Hex or octal floats are not supported, but might be in the future.

5072

@end deftypefun

5073

5074

These operators mean that GMP types can be read in the usual C++ way, for

5075

example,

5076

5077

@example
mpz_t z;
...
cin >> z;
@end example

5082

5083

But note that @code{istream} input (and @code{ostream} output, @pxref{C++
Formatted Output}) is the only overloading available, and using for instance
@code{+} with an @code{mpz_t} will have unpredictable results.

5086

5087

5088

@node C++ Class Interface, BSD Compatible Functions, Formatted Input, Top

5089

@chapter C++ Class Interface

5090

@cindex C++ Interface

5091

5092

This chapter describes the C++ class based interface to GMP.

5093

5094

All GMP C language types and functions can be used in C++ programs, since

5095

@file{gmp.h} has @code{extern "C"} qualifiers, but the class interface offers

5096

overloaded functions and operators which may be more convenient.

5097

5098

Due to the implementation of this interface, a reasonably recent C++ compiler

5099

is required, one supporting namespaces, partial specialization of templates

5100

and member templates. For GCC this means version 2.91 or later.

5101

5102

@strong{Everything described in this chapter is to be considered preliminary

5103

and might be subject to incompatible changes if some unforeseen difficulty

5104

reveals itself.}

5105

5106

@menu
* C++ Interface General::
* C++ Interface Integers::
* C++ Interface Rationals::
* C++ Interface Floats::
* C++ Interface MPFR::
* C++ Interface Random Numbers::
* C++ Interface Limitations::
@end menu

5115

5116

5117

@node C++ Interface General, C++ Interface Integers, C++ Class Interface, C++ Class Interface

5118

@section C++ Interface General

5119

5120

@noindent

5121

All the C++ classes and functions are available with

5122

5123

@example
#include <gmpxx.h>
@end example

5126

5127

@noindent

5128

The classes defined are

5129

5130

@deftp Class mpz_class

5131

@deftpx Class mpq_class

5132

@deftpx Class mpf_class

5133

@end deftp

5134

5135

The standard operators and various standard functions are overloaded to allow

5136

arithmetic with these classes. For example,

5137

5138

@example
int
main (void)
@{
  mpz_class a, b, c;

  a = 1234;
  b = "-5678";
  c = a+b;
  cout << "sum is " << c << "\n";
  cout << "absolute value is " << abs(c) << "\n";

  return 0;
@}
@end example

5153

5154

An important feature of the implementation is that an expression like

5155

@code{a=b+c} results in a single call to the corresponding @code{mpz_add},

5156

without using a temporary for the @code{b+c} part. Expressions which by their

5157

nature imply intermediate values, like @code{a=b*c+d*e}, still use temporaries

5158

though.
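
One general way to get this effect is the expression template technique. The
following is only a toy sketch of that technique under stated assumptions, not
the code in @file{gmpxx.h}: @code{Num} stands in for @code{mpz_class} and
holds a plain @code{long}, and @code{Sum} is a hypothetical proxy type. The
@code{+} operator builds no number at all; the assignment operator performs
the one addition directly into the destination.

```cpp
#include <cassert>

struct Sum;  // unevaluated "l + r", defined below

// Toy stand-in for an arbitrary-precision integer. It counts constructions
// so the absence of a temporary can be observed.
struct Num {
  long v;
  static int constructed;
  explicit Num(long x = 0) : v(x) { ++constructed; }
  Num(const Num &o) : v(o.v) { ++constructed; }
  Num &operator=(const Num &o) { v = o.v; return *this; }
  // Assignment from an unevaluated sum: one add, straight into *this.
  Num &operator=(const Sum &s);
};
int Num::constructed = 0;

// Holds references to the operands only; no Num is created.
struct Sum {
  const Num &l;
  const Num &r;
};

inline Sum operator+(const Num &l, const Num &r) { return Sum{l, r}; }

inline Num &Num::operator=(const Sum &s) {
  v = s.l.v + s.r.v;  // plays the role of a single mpz_add (a, b, c)
  return *this;
}
```

In this sketch an expression such as @code{a = b*c + d*e} would need further
proxy types, and the products would still have to be computed somewhere before
the sum, which matches the remark above about intermediate values.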

5159

5160

The classes can be freely intermixed in expressions, as can the classes and

5161

the standard C++ types.

5162

5163

Conversions back from the classes to standard C++ types aren't done

5164

automatically, instead member functions like @code{get_si} are provided (see

5165

the following sections for details).

5166

5167

Also there are no automatic conversions from the classes to the corresponding

5168

GMP C types, instead a reference to the underlying C object can be obtained

5169

with the following functions,

5170

5171

@deftypefun mpz_t mpz_class::get_mpz_t ()

5172

@deftypefunx mpq_t mpq_class::get_mpq_t ()

5173

@deftypefunx mpf_t mpf_class::get_mpf_t ()

5174

@end deftypefun

5175

5176

These can be used to call a C function which doesn't have a C++ class

5177

interface. For example to set @code{a} to the GCD of @code{b} and @code{c},

5178

5179

@example
mpz_class a, b, c;
...
mpz_gcd (a.get_mpz_t(), b.get_mpz_t(), c.get_mpz_t());
@end example

5184

5185

In the other direction, a class can be initialized from the corresponding GMP

5186

C type, or assigned to if an explicit constructor is used. In both cases this

5187

makes a copy of the value, it doesn't create any sort of association. For

5188

example,

5189

5190

@example
mpz_t z;
// ... init and calculate z ...
mpz_class x(z);
mpz_class y;
y = mpz_class (z);
@end example

5197

5198

There are no namespace setups in @file{gmpxx.h}, all types and functions are

5199

simply put into the global namespace. This is what @file{gmp.h} has done in

5200

the past, and continues to do for compatibility. The extras provided by

5201

@file{gmpxx.h} follow GMP naming conventions and are unlikely to clash with

5202

anything.

5203

5204

5205

@node C++ Interface Integers, C++ Interface Rationals, C++ Interface General, C++ Class Interface

5206

@section C++ Interface Integers

5207

5208

@deftypefun void mpz_class::mpz_class (type @var{n})

5209

Construct an @code{mpz_class}. All the standard C++ types may be used, except

5210

@code{long long} and @code{long double}, and all the GMP C++ classes can be

5211

used. Any necessary conversion follows the corresponding C function, for

5212

example @code{double} follows @code{mpz_set_d} (@pxref{Assigning Integers}).

5213

@end deftypefun

5214

5215

@deftypefun void mpz_class::mpz_class (mpz_t @var{z})

5216

Construct an @code{mpz_class} from an @code{mpz_t}. The value in @var{z} is

5217

copied into the new @code{mpz_class}, there won't be any permanent association

5218

between it and @var{z}.

5219

@end deftypefun

5220

5221

@deftypefun void mpz_class::mpz_class (const char *@var{s})

5222

@deftypefunx void mpz_class::mpz_class (const char *@var{s}, int @var{base})
@deftypefunx void mpz_class::mpz_class (const string& @var{s})
@deftypefunx void mpz_class::mpz_class (const string& @var{s}, int @var{base})

5225

Construct an @code{mpz_class} converted from a string using
@code{mpz_set_str} (@pxref{Assigning Integers}). If the @var{base} is not
given then 0 is used.

5228

@end deftypefun

5229

5230

@deftypefun mpz_class operator/ (mpz_class @var{a}, mpz_class @var{d})

5231

@deftypefunx mpz_class operator% (mpz_class @var{a}, mpz_class @var{d})

5232

Divisions involving @code{mpz_class} round towards zero, as per the

5233

@code{mpz_tdiv_q} and @code{mpz_tdiv_r} functions (@pxref{Integer Division}).

5234

This corresponds to the rounding used for plain @code{int} calculations on

5235

most machines.

5236

5237

The @code{mpz_fdiv...} or @code{mpz_cdiv...} functions can always be called

5238

directly if desired. For example,

5239

5240

@example
mpz_class q, a, d;
...
mpz_fdiv_q (q.get_mpz_t(), a.get_mpz_t(), d.get_mpz_t());
@end example

5245

@end deftypefun
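
The difference between the two rounding styles only shows when the operands
have opposite signs. The sketch below uses plain @code{int} rather than
@code{mpz_class}, relying on the rule (guaranteed since C++11) that built-in
@code{/} truncates towards zero; the helper names are hypothetical.

```cpp
#include <cassert>

// Quotient rounded towards zero: what built-in '/' does for int, and the
// rule the text gives for mpz_class operator/.
int trunc_div(int a, int d) { return a / d; }

// Remainder paired with trunc_div; it takes the sign of the dividend.
int trunc_rem(int a, int d) { return a % d; }

// Quotient rounded towards minus infinity, the mpz_fdiv_q style.
int floor_div(int a, int d) {
  int q = a / d;
  // Step down by one when the division was inexact and the signs differ.
  if ((a % d != 0) && ((a < 0) != (d < 0))) --q;
  return q;
}
```

For example @minus{}7 divided by 2 gives @minus{}3 remainder @minus{}1 when
truncating, but a quotient of @minus{}4 when rounding down.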

5246

5247

@deftypefun mpz_class abs (mpz_class @var{op1})
@deftypefunx int cmp (mpz_class @var{op1}, type @var{op2})
@deftypefunx int cmp (type @var{op1}, mpz_class @var{op2})
@deftypefunx double mpz_class::get_d (void)
@deftypefunx long mpz_class::get_si (void)
@deftypefunx {unsigned long} mpz_class::get_ui (void)
@maybepagebreak
@deftypefunx bool mpz_class::fits_sint_p (void)
@deftypefunx bool mpz_class::fits_slong_p (void)
@deftypefunx bool mpz_class::fits_sshort_p (void)
@maybepagebreak
@deftypefunx bool mpz_class::fits_uint_p (void)
@deftypefunx bool mpz_class::fits_ulong_p (void)
@deftypefunx bool mpz_class::fits_ushort_p (void)
@maybepagebreak
@deftypefunx int sgn (mpz_class @var{op})
@deftypefunx mpz_class sqrt (mpz_class @var{op})

5264

These functions provide a C++ class interface to the corresponding GMP C

5265

routines.

5266

5267

@code{cmp} can be used with any of the classes or the standard C++ types,

5268

except @code{long long} and @code{long double}.

5269

@end deftypefun

5270

5271

@sp 1

5272

Overloaded operators for combinations of @code{mpz_class} and @code{double}

5273

are provided for completeness, but it should be noted that if the given

5274

@code{double} is not an integer then the way any rounding is done is currently

5275

unspecified. The rounding might take place at the start, in the middle, or at

5276

the end of the operation, and it might change in the future.

5277

5278

Conversions between @code{mpz_class} and @code{double}, however, are defined

5279

to follow the corresponding C functions @code{mpz_get_d} and @code{mpz_set_d}.

5280

And comparisons are always made exactly, as per @code{mpz_cmp_d}.

5281

5282

5283

@node C++ Interface Rationals, C++ Interface Floats, C++ Interface Integers, C++ Class Interface

5284

@section C++ Interface Rationals

5285

5286

In all the following constructors, if a fraction is given then it should be in
canonical form; if it is not, @code{mpq_class::canonicalize} must be called.

5288

5289

@deftypefun void mpq_class::mpq_class (type @var{op})

5290

@deftypefunx void mpq_class::mpq_class (integer @var{num}, integer @var{den})

5291

Construct an @code{mpq_class}. The initial value can be a single value of any

5292

type, or a pair of integers (@code{mpz_class} or standard C++ integer types)

5293

representing a fraction, except that @code{long long} and @code{long double}

5294

are not supported. For example,

5295

5296

@example
mpq_class q (99);
mpq_class q (1.75);
mpq_class q (1, 3);
@end example

5301

@end deftypefun

5302

5303

@deftypefun void mpq_class::mpq_class (mpq_t @var{q})

5304

Construct an @code{mpq_class} from an @code{mpq_t}. The value in @var{q} is

5305

copied into the new @code{mpq_class}, there won't be any permanent association

5306

between it and @var{q}.

5307

@end deftypefun

5308

5309

@deftypefun void mpq_class::mpq_class (const char *@var{s})

5310

@deftypefunx void mpq_class::mpq_class (const char *@var{s}, int @var{base})
@deftypefunx void mpq_class::mpq_class (const string& @var{s})
@deftypefunx void mpq_class::mpq_class (const string& @var{s}, int @var{base})

5313

Construct an @code{mpq_class} converted from a string using
@code{mpq_set_str} (@pxref{Initializing Rationals}). If the @var{base} is
not given then 0 is used.

5316

@end deftypefun

5317

5318

@deftypefun void mpq_class::canonicalize ()

5319

Put an @code{mpq_class} into canonical form, as per @ref{Rational Number

5320

Functions}. All arithmetic operators require their operands in canonical

5321

form, and will return results in canonical form.

5322

@end deftypefun
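
As an illustration of what canonical form means (no common factors between
numerator and denominator, and a positive denominator), here is a toy sketch
on plain @code{long} values. It is not how GMP implements
@code{mpq_canonicalize}; @code{Frac}, @code{gcd_abs} and this
@code{canonicalize} are hypothetical names, and the denominator is assumed to
be non-zero.

```cpp
#include <cassert>

// Toy fraction on long, standing in for an mpq value.
struct Frac {
  long num;
  long den;  // assumed non-zero
};

// Greatest common divisor of the absolute values (Euclid's algorithm).
long gcd_abs(long a, long b) {
  if (a < 0) a = -a;
  if (b < 0) b = -b;
  while (b != 0) {
    long t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Remove common factors and make the denominator positive.
Frac canonicalize(Frac f) {
  long g = gcd_abs(f.num, f.den);
  if (f.den < 0) g = -g;  // dividing by a negative g flips both signs
  return Frac{f.num / g, f.den / g};
}
```

So 6/@minus{}4 becomes @minus{}3/2, and zero with any denominator becomes 0/1.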

5323

5324

@deftypefun mpq_class abs (mpq_class @var{op})

5325

@deftypefunx int cmp (mpq_class @var{op1}, type @var{op2})

5326

@deftypefunx int cmp (type @var{op1}, mpq_class @var{op2})

5327

@maybepagebreak

5328

@deftypefunx double mpq_class::get_d (void)

5329

@deftypefunx int sgn (mpq_class @var{op})

5330

These functions provide a C++ class interface to the corresponding GMP C

5331

routines.

5332

5333

@code{cmp} can be used with any of the classes or the standard C++ types,

5334

except @code{long long} and @code{long double}.

5335

@end deftypefun

5336

5337

@deftypefun {mpz_class&} mpq_class::get_num ()

5338

@deftypefunx {mpz_class&} mpq_class::get_den ()

5339

Get a reference to an @code{mpz_class} which is the numerator or denominator

5340

of an @code{mpq_class}. This can be used both for read and write access. If

5341

the object returned is modified, it modifies the original @code{mpq_class}.

5342

5343

If direct manipulation might produce a non-canonical value, then

5344

@code{mpq_class::canonicalize} must be called before further operations.

5345

@end deftypefun

5346

5347

@deftypefun mpz_t mpq_class::get_num_mpz_t ()

5348

@deftypefunx mpz_t mpq_class::get_den_mpz_t ()

5349

Get a reference to the underlying @code{mpz_t} numerator or denominator of an

5350

@code{mpq_class}. This can be passed to C functions expecting an

5351

@code{mpz_t}. Any modifications made to the @code{mpz_t} will modify the

5352

original @code{mpq_class}.

5353

5354

If direct manipulation might produce a non-canonical value, then

5355

@code{mpq_class::canonicalize} must be called before further operations.

5356

@end deftypefun

5357

5358

@deftypefun istream& operator>> (istream& @var{stream}, mpq_class& @var{rop})

5359

Read @var{rop} from @var{stream}, using its @code{ios} formatting settings,

5360

the same as @code{mpq_t operator>>} (@pxref{C++ Formatted Input}).

5361

5362

If the @var{rop} read might not be in canonical form then

5363

@code{mpq_class::canonicalize} must be called.

5364

@end deftypefun

5365

5366

5367

@node C++ Interface Floats, C++ Interface MPFR, C++ Interface Rationals, C++ Class Interface

5368

@section C++ Interface Floats

5369

5370

When an expression requires the use of temporary intermediate @code{mpf_class}

5371

values, like @code{f=g*h+x*y}, those temporaries will have the same precision

5372

as the destination @code{f}. Explicit constructors can be used if this

5373

doesn't suit.

5374

5375

@deftypefun {} mpf_class::mpf_class (type @var{op})

5376

@deftypefunx {} mpf_class::mpf_class (type @var{op}, unsigned long @var{prec})

5377

Construct an @code{mpf_class}. Any standard C++ type can be used, except

5378

@code{long long} and @code{long double}, and any of the GMP C++ classes can be

5379

used.

5380

5381

If @var{prec} is given, the initial precision is that value, in bits. If

5382

@var{prec} is not given, then the initial precision is determined by the type

5383

of @var{op} given. An @code{mpz_class}, @code{mpq_class}, string, or C++

5384

builtin type will give the default @code{mpf} precision (@pxref{Initializing

5385

Floats}). An @code{mpf_class} or expression will give the precision of that

5386

value. The precision of a binary expression is the higher of the two

5387

operands.

5388

5389

@example
mpf_class f(1.5);        // default precision
mpf_class f(1.5, 500);   // 500 bits (at least)
mpf_class f(x);          // precision of x
mpf_class f(abs(x));     // precision of x
mpf_class f(-g, 1000);   // 1000 bits (at least)
mpf_class f(x+y);        // greater of precisions of x and y
@end example

5397

@end deftypefun

5398

5399

@deftypefun mpf_class abs (mpf_class @var{op})
@deftypefunx mpf_class ceil (mpf_class @var{op})
@deftypefunx int cmp (mpf_class @var{op1}, type @var{op2})
@deftypefunx int cmp (type @var{op1}, mpf_class @var{op2})
@maybepagebreak
@deftypefunx mpf_class floor (mpf_class @var{op})
@deftypefunx mpf_class hypot (mpf_class @var{op1}, mpf_class @var{op2})
@deftypefunx double mpf_class::get_d (void)
@deftypefunx long mpf_class::get_si (void)
@deftypefunx {unsigned long} mpf_class::get_ui (void)
@maybepagebreak
@deftypefunx bool mpf_class::fits_sint_p (void)
@deftypefunx bool mpf_class::fits_slong_p (void)
@deftypefunx bool mpf_class::fits_sshort_p (void)
@maybepagebreak
@deftypefunx bool mpf_class::fits_uint_p (void)
@deftypefunx bool mpf_class::fits_ulong_p (void)
@deftypefunx bool mpf_class::fits_ushort_p (void)
@maybepagebreak
@deftypefunx int sgn (mpf_class @var{op})
@deftypefunx mpf_class sqrt (mpf_class @var{op})
@deftypefunx mpf_class trunc (mpf_class @var{op})

5421

These functions provide a C++ class interface to the corresponding GMP C

5422

routines.

5423

5424

@code{cmp} can be used with any of the classes or the standard C++ types,

5425

except @code{long long} and @code{long double}.

5426

5427

The accuracy provided by @code{hypot} is not currently guaranteed.

5428

@end deftypefun

5429

5430

@deftypefun {unsigned long int} mpf_class::get_prec ()

5431

@deftypefunx void mpf_class::set_prec (unsigned long @var{prec})

5432

@deftypefunx void mpf_class::set_prec_raw (unsigned long @var{prec})

5433

Get or set the current precision of an @code{mpf_class}.

5434

5435

The restrictions described for @code{mpf_set_prec_raw} (@pxref{Initializing
Floats}) apply to @code{mpf_class::set_prec_raw}. Note in particular that the
@code{mpf_class} must be restored to its allocated precision before being
destroyed. This must be done by application code; there's no automatic
mechanism for it.

5440

@end deftypefun

5441

5442

5443

@node C++ Interface MPFR, C++ Interface Random Numbers, C++ Interface Floats, C++ Class Interface

5444

@section C++ Interface MPFR

5445

5446

The C++ class interface to MPFR is provided if MPFR is enabled (@pxref{Build

5447

Options}). This interface must be regarded as preliminary and possibly

5448

subject to incompatible changes in the future, since MPFR itself is

5449

preliminary. All definitions can be obtained with

5450

5451

@example
#include <mpfrxx.h>
@end example

5454

5455

@noindent

5456

This defines

5457

5458

@deftp Class mpfr_class

5459

@end deftp

5460

5461

@noindent

5462

which behaves similarly to @code{mpf_class} (@pxref{C++ Interface Floats}).

5463

5464

5465

@node C++ Interface Random Numbers, C++ Interface Limitations, C++ Interface MPFR, C++ Class Interface

5466

@section C++ Interface Random Numbers

5467

5468

@deftp Class gmp_randclass

5469

The C++ class interface to the GMP random number functions uses

5470

@code{gmp_randclass} to hold an algorithm selection and current state, as per

5471

@code{gmp_randstate_t}.

5472

@end deftp

5473

5474

@deftypefun {} gmp_randclass::gmp_randclass (void (*@var{randinit}) (gmp_randstate_t, ...), ...)

5475

Construct a @code{gmp_randclass}, using a call to the given @var{randinit}

5476

function (@pxref{Random State Initialization}). The arguments expected are

5477

the same as @var{randinit}, but with @code{mpz_class} instead of @code{mpz_t}.

5478

For example,

5479

5480

@example
gmp_randclass r1 (gmp_randinit_default);
gmp_randclass r2 (gmp_randinit_lc_2exp_size, 32);
gmp_randclass r3 (gmp_randinit_lc_2exp, a, c, m2exp);
@end example

5485

5486

@code{gmp_randinit_lc_2exp_size} can fail if the size requested is too big;
the behaviour of @code{gmp_randclass::gmp_randclass} is undefined in this case
(perhaps this will change in the future).

5489

@end deftypefun

5490

5491

@deftypefun {} gmp_randclass::gmp_randclass (gmp_randalg_t @var{alg}, ...)

5492

Construct a @code{gmp_randclass} using the same parameters as

5493

@code{gmp_randinit} (@pxref{Random State Initialization}). This function is

5494

obsolete and the above @var{randinit} style should be preferred.

5495

@end deftypefun

5496

5497

@deftypefun void gmp_randclass::seed (unsigned long int @var{s})

5498

@deftypefunx void gmp_randclass::seed (mpz_class @var{s})

5499

Seed a random number generator. @xref{Random Number Functions}, for how
to choose a good seed.

5501

@end deftypefun

5502

5503

@deftypefun mpz_class gmp_randclass::get_z_bits (unsigned long @var{bits})

5504

@deftypefunx mpz_class gmp_randclass::get_z_bits (mpz_class @var{bits})

5505

Generate a random integer with a specified number of bits.

5506

@end deftypefun

5507

5508

@deftypefun mpz_class gmp_randclass::get_z_range (mpz_class @var{n})

5509

Generate a random integer in the range 0 to @ma{@var{n}-1} inclusive.

5510

@end deftypefun

5511

5512

@deftypefun mpf_class gmp_randclass::get_f ()

5513

@deftypefunx mpf_class gmp_randclass::get_f (unsigned long @var{prec})

5514

Generate a random float @var{f} in the range @ma{0 <= @var{f} < 1}. @var{f}

5515

will be to @var{prec} bits precision, or if @var{prec} is not given then to

5516

the precision of the destination. For example,

5517

5518

@example
gmp_randclass r;
...
mpf_class f (0, 512);   // 512 bits precision
f = r.get_f();          // random number, 512 bits
@end example

5524

@end deftypefun

5525

5526

5527

5528

@node C++ Interface Limitations, , C++ Interface Random Numbers, C++ Class Interface

5529

@section C++ Interface Limitations

5530

5531

@table @asis

5532

@item @code{mpq_class} and Templated Reading

5533

A generic piece of template code probably won't know that @code{mpq_class}

5534

requires a @code{canonicalize} call if inputs read with @code{operator>>}

5535

might be non-canonical. This can lead to incorrect results.

5536

5537

@code{operator>>} behaves as it does for reasons of efficiency. A

5538

canonicalize can be quite time consuming on large operands, and is best

5539

avoided if it's not necessary.

5540

5541

But this potential difficulty reduces the usefulness of @code{mpq_class}.

5542

Perhaps a mechanism to tell @code{operator>>} what to do will be adopted in

5543

the future, maybe a preprocessor define, a global flag, or an @code{ios} flag

5544

pressed into service. Or maybe, at the risk of inconsistency, the

5545

@code{mpq_class} @code{operator>>} could canonicalize and leave @code{mpq_t}

5546

@code{operator>>} not doing so, for use on those occasions when that's

5547

acceptable. Send feedback or alternate ideas to @email{bug-gmp@@gnu.org}.

5548

5549

@item Subclassing

5550

Subclassing the GMP C++ classes works, but is not currently recommended.

5551

5552

Expressions involving subclasses resolve correctly (or seem to), but in normal

5553

C++ fashion the subclass doesn't inherit constructors and assignments.

5554

There are many of those in the GMP classes, and a good way to reestablish them
in a subclass is not yet provided.

5556

5557

@item Templated Expressions

5558

5559

A subtle difficulty exists when using expressions together with

5560

application-defined template functions. Consider the following, with @code{T}

5561

intended to be some numeric type,

5562

5563

@example
template <class T>
T fun (const T &, const T &);
@end example

5567

5568

@noindent

5569

When used with, say, plain @code{mpz_class} variables, it works fine: @code{T}

5570

is resolved as @code{mpz_class}.

5571

5572

@example
mpz_class f(1), g(2);
fun (f, g);    // Good
@end example

5576

5577

@noindent

5578

But when one of the arguments is an expression, it doesn't work.

5579

5580

@example
mpz_class f(1), g(2), h(3);
fun (f, g+h);  // Bad
@end example

5584

5585

This is because @code{g+h} ends up being a certain expression template type

5586

internal to @code{gmpxx.h}, which the C++ template resolution rules are unable

5587

to automatically convert to @code{mpz_class}. The workaround is simply to add

5588

an explicit cast.

5589

5590

@example
mpz_class f(1), g(2), h(3);
fun (f, mpz_class(g+h));  // Good
@end example

5594

5595

Similarly, within @code{fun} it may be necessary to cast an expression to type

5596

@code{T} when calling a templated @code{fun2}.

5597

5598

@example
template <class T>
void fun (T f, T g)
@{
  fun2 (f, f+g);     // Bad
@}

template <class T>
void fun (T f, T g)
@{
  fun2 (f, T(f+g));  // Good
@}
@end example

5611

@end table

5612

5613

5614

@node BSD Compatible Functions, Custom Allocation, C++ Class Interface, Top

5615

@comment node-name, next, previous, up

5616

@chapter Berkeley MP Compatible Functions

5617

@cindex Berkeley MP compatible functions

5618

@cindex BSD MP compatible functions

5619

5620

These functions are intended to be fully compatible with the Berkeley MP

5621

library which is available on many BSD derived U*ix systems. The

5622

@samp{--enable-mpbsd} option must be used when building GNU MP to make these

5623

available (@pxref{Installing GMP}).

5624

5625

The original Berkeley MP library has a usage restriction: you cannot use the

5626

same variable as both source and destination in a single function call. The

5627

compatible functions in GNU MP do not share this restriction---inputs and

5628

outputs may overlap.

5629

5630

It is not recommended that new programs be written using these functions.

5631

Apart from the incomplete set of functions, the interface for initializing

5632

@code{MINT} objects is more error prone, and the @code{pow} function collides

5633

with @code{pow} in @file{libm.a}.

5634

5635

@cindex @file{mp.h}

5636

Include the header @file{mp.h} to get the definition of the necessary types and

5637

functions. If you are on a BSD derived system, make sure to include GNU

5638

@file{mp.h} if you are going to link the GNU @file{libmp.a} to your program.

5639

This means that you probably need to give the @samp{-I<dir>} option to the

5640

compiler, where @samp{<dir>} is the directory where you have GNU @file{mp.h}.

5641

5642

@deftypefun {MINT *} itom (signed short int @var{initial_value})

5643

Allocate an integer consisting of a @code{MINT} object and dynamic limb space.

5644

Initialize the integer to @var{initial_value}. Return a pointer to the

5645

@code{MINT} object.

5646

@end deftypefun

5647

5648

@deftypefun {MINT *} xtom (char *@var{initial_value})

5649

Allocate an integer consisting of a @code{MINT} object and dynamic limb space.

5650

Initialize the integer from @var{initial_value}, a hexadecimal,

5651

null-terminated C string. Return a pointer to the @code{MINT} object.

5652

@end deftypefun

5653

5654

@deftypefun void move (MINT *@var{src}, MINT *@var{dest})

5655

Set @var{dest} to @var{src} by copying. Both variables must be previously

5656

initialized.

5657

@end deftypefun

5658

5659

@deftypefun void madd (MINT *@var{src_1}, MINT *@var{src_2}, MINT *@var{destination})

5660

Add @var{src_1} and @var{src_2} and put the sum in @var{destination}.

5661

@end deftypefun

5662

5663

@deftypefun void msub (MINT *@var{src_1}, MINT *@var{src_2}, MINT *@var{destination})

5664

Subtract @var{src_2} from @var{src_1} and put the difference in

5665

@var{destination}.

5666

@end deftypefun

5667

5668

@deftypefun void mult (MINT *@var{src_1}, MINT *@var{src_2}, MINT *@var{destination})

5669

Multiply @var{src_1} and @var{src_2} and put the product in @var{destination}.

5670

@end deftypefun

5671

5672

@deftypefun void mdiv (MINT *@var{dividend}, MINT *@var{divisor}, MINT *@var{quotient}, MINT *@var{remainder})

5673

@deftypefunx void sdiv (MINT *@var{dividend}, signed short int @var{divisor}, MINT *@var{quotient}, signed short int *@var{remainder})

5674

Set @var{quotient} to @var{dividend}/@var{divisor}, and @var{remainder} to

5675

@var{dividend} mod @var{divisor}. The quotient is rounded towards zero; the

5676

remainder has the same sign as the dividend unless it is zero.

5677

5678

Some implementations of these functions work differently---or not at all---for

5679

negative arguments.

5680

@end deftypefun
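
The rounding rule can be illustrated with plain machine integers. The
following is a minimal sketch, not part of the MP interface:
@code{trunc_div} and its result type are hypothetical names, built on the
standard C @code{ldiv}, which follows the same convention (quotient truncated
towards zero, remainder with the sign of the dividend).

@verbatim
```c
#include <stdlib.h>

/* Hypothetical helper, not part of the MP interface: divide two machine
   integers using the rounding rule documented for mdiv and sdiv. */
typedef struct {
    long quotient;
    long remainder;
} trunc_div_result;

trunc_div_result trunc_div(long dividend, long divisor)
{
    /* ldiv truncates the quotient towards zero, so the remainder is
       either zero or has the same sign as the dividend. */
    ldiv_t r = ldiv(dividend, divisor);
    trunc_div_result out;

    out.quotient = r.quot;
    out.remainder = r.rem;
    return out;
}
```
@end verbatim

For instance @code{trunc_div (-7, 2)} gives quotient @minus{}3 and remainder
@minus{}1, while @code{trunc_div (7, -2)} gives quotient @minus{}3 and
remainder 1.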

5681

5682

@deftypefun void msqrt (MINT *@var{op}, MINT *@var{root}, MINT *@var{remainder})

5683

Set @var{root} to @m{\lfloor\sqrt{@var{op}}\rfloor, the truncated integer part

5684

of the square root of @var{op}}, like @code{mpz_sqrt}. Set @var{remainder} to

5685

@m{(@var{op} - @var{root}^2), @var{op}@minus{}@var{root}*@var{root}}, i.e.

5686

zero if @var{op} is a perfect square.

5687

5688

If @var{root} and @var{remainder} are the same variable, the results are

5689

undefined.

5690

@end deftypefun
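
The relation between @var{op}, @var{root} and @var{remainder} can be shown on
a 64-bit machine integer. This is only a sketch under that assumption:
@code{isqrt_rem} is a hypothetical helper using the classical bit-by-bit
method, and says nothing about how @code{msqrt} itself is implemented.

@verbatim
```c
#include <stdint.h>

/* Hypothetical helper: return floor(sqrt(op)) and store op - root*root
   in *remainder.  Classical binary digit-by-digit square root. */
uint64_t isqrt_rem(uint64_t op, uint64_t *remainder)
{
    uint64_t root = 0;
    uint64_t rest = op;
    uint64_t bit = (uint64_t) 1 << 62;  /* largest power of four in 64 bits */

    /* Start from the largest power of four not exceeding op. */
    while (bit > op)
        bit >>= 2;

    while (bit != 0) {
        if (rest >= root + bit) {
            rest -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }

    *remainder = rest;  /* zero exactly when op is a perfect square */
    return root;
}
```
@end verbatim

With @var{op} equal to 10 this gives a root of 3 and a remainder of 1; with
16 it gives 4 and 0.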

5691

5692

@deftypefun void pow (MINT *@var{base}, MINT *@var{exp}, MINT *@var{mod}, MINT *@var{dest})

5693

Set @var{dest} to (@var{base} raised to @var{exp}) modulo @var{mod}.

5694

@end deftypefun
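
As an illustration of the operation (not of the library's implementation),
the same result can be computed for small operands by binary
square-and-multiply. @code{powmod_u32} is a hypothetical name, and the sketch
assumes 32-bit operands so that every intermediate product fits in 64 bits.

@verbatim
```c
#include <stdint.h>

/* Hypothetical helper: (base raised to exp) modulo mod, for 32-bit
   operands.  mod must be non-zero. */
uint32_t powmod_u32(uint32_t base, uint32_t exp, uint32_t mod)
{
    uint64_t result = 1 % mod;   /* handles mod == 1 */
    uint64_t b = base % mod;

    while (exp != 0) {
        if (exp & 1)
            result = (result * b) % mod;  /* multiply step for a set bit */
        b = (b * b) % mod;                /* square for the next bit */
        exp >>= 1;
    }
    return (uint32_t) result;
}
```
@end verbatim

For example @code{powmod_u32 (4, 13, 497)} returns 445.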

5695

5696

@deftypefun void rpow (MINT *@var{base}, signed short int @var{exp}, MINT *@var{dest})

5697

Set @var{dest} to @var{base} raised to @var{exp}.

5698

@end deftypefun

5699

5700

@deftypefun void gcd (MINT *@var{op1}, MINT *@var{op2}, MINT *@var{res})

5701

Set @var{res} to the greatest common divisor of @var{op1} and @var{op2}.

5702

@end deftypefun
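
For reference, the quantity computed is the one given by Euclid's algorithm.
The sketch below works on 64-bit machine integers; @code{gcd_u64} is a
hypothetical name and the method is not claimed to be the one used by the
library.

@verbatim
```c
#include <stdint.h>

/* Hypothetical helper: greatest common divisor by Euclid's algorithm.
   gcd_u64(a, 0) is a, and gcd_u64(0, 0) is 0. */
uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}
```
@end verbatim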

5703

5704

@deftypefun int mcmp (MINT *@var{op1}, MINT *@var{op2})

5705

Compare @var{op1} and @var{op2}. Return a positive value if @var{op1} >

5706

@var{op2}, zero if @var{op1} = @var{op2}, and a negative value if @var{op1} <

5707

@var{op2}.

5708

@end deftypefun

5709

5710

@deftypefun void min (MINT *@var{dest})

5711

Input a decimal string from @code{stdin}, and put the read integer in

5712

@var{dest}. SPC and TAB are allowed in the number string, and are ignored.

5713

@end deftypefun

5714

5715

@deftypefun void mout (MINT *@var{src})

5716

Output @var{src} to @code{stdout}, as a decimal string. Also output a newline.

5717

@end deftypefun

5718

5719

@deftypefun {char *} mtox (MINT *@var{op})

5720

Convert @var{op} to a hexadecimal string, and return a pointer to the string.

5721

The returned string is allocated using the default memory allocation function,

5722

@code{malloc} by default.

5723

@end deftypefun

5724

5725

@deftypefun void mfree (MINT *@var{op})

5726

De-allocate the space used by @var{op}. @strong{This function should only be
passed a value returned by @code{itom} or @code{xtom}.}

5728

@end deftypefun

5729

5730

5731

@node Custom Allocation, Language Bindings, BSD Compatible Functions, Top

5732

@comment node-name, next, previous, up

5733

@chapter Custom Allocation

5734

@cindex Custom allocation

5735

@cindex Memory allocation

5736

@cindex Allocation of memory

5737

5738

By default GMP uses @code{malloc}, @code{realloc} and @code{free} for memory

5739

allocation, and if they fail GMP prints a message to the standard error output

5740

and terminates the program.

5741

5742

Alternate functions can be specified to allocate memory in a different way or

5743

to have a different error action on running out of memory.

5744

5745

This feature is available in the Berkeley compatibility library (@pxref{BSD

5746

Compatible Functions}) as well as the main GMP library.

5747

5748

@deftypefun void mp_set_memory_functions (@* void *(*@var{alloc_func_ptr}) (size_t), @* void *(*@var{realloc_func_ptr}) (void *, size_t, size_t), @* void (*@var{free_func_ptr}) (void *, size_t))

5749

Replace the current allocation functions from the arguments. If an argument

5750

is @code{NULL}, the corresponding default function is used.

5751

5752

These functions will be used for all memory allocation done by GMP, apart from

5753

temporary space from @code{alloca} if that function is available and GMP is

5754

configured to use it (@pxref{Build Options}).

5755

5756

@strong{Be sure to call @code{mp_set_memory_functions} only when there are no

5757

active GMP objects allocated using the previous memory functions! Usually

5758

that means calling it before any other GMP function.}

5759

@end deftypefun

5760

5761

The functions supplied should fit the following declarations:

5762

5763

@deftypefun {void *} allocate_function (size_t @var{alloc_size})

5764

Return a pointer to newly allocated space with at least @var{alloc_size}

5765

bytes.

5766

@end deftypefun

5767

5768

@deftypefun {void *} reallocate_function (void *@var{ptr}, size_t @var{old_size}, size_t @var{new_size})

5769

Resize a previously allocated block @var{ptr} of @var{old_size} bytes to be

5770

@var{new_size} bytes.

5771

5772

The block may be moved if necessary or if desired, and in that case the

5773

smaller of @var{old_size} and @var{new_size} bytes must be copied to the new

5774

location. The return value is a pointer to the resized block, that being the

5775

new location if moved or just @var{ptr} if not.

5776

5777

@var{ptr} is never @code{NULL}; it's always a previously allocated block.

5778

@var{new_size} may be bigger or smaller than @var{old_size}.

5779

@end deftypefun

5780

5781

@deftypefun void deallocate_function (void *@var{ptr}, size_t @var{size})

5782

De-allocate the space pointed to by @var{ptr}.

5783

5784

@var{ptr} is never @code{NULL}; it's always a previously allocated block of

5785

@var{size} bytes.

5786

@end deftypefun

5787

5788

A @dfn{byte} here means the unit used by the @code{sizeof} operator.

5789

5790

The @var{old_size} parameter to @var{reallocate_function} and the @var{size}
parameter to @var{deallocate_function} are passed for convenience, but of
course can be ignored if not needed. The default functions using
@code{malloc} and friends, for instance, don't use them.

5794

5795

No error return is allowed from any of these functions; if they return then
they must have performed the specified operation. In particular note that

5797

@var{allocate_function} or @var{reallocate_function} mustn't return

5798

@code{NULL}.
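
A set of functions fitting these declarations might look like the following.
This is a minimal sketch: the names @code{my_allocate}, @code{my_reallocate},
@code{my_deallocate} and @code{out_of_memory} are hypothetical, and the
failure action (a message and @code{abort}) is only one possible choice.

@verbatim
```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical failure action: report and terminate, never return. */
static void out_of_memory(size_t size)
{
    fprintf(stderr, "out of memory (wanted %lu bytes)\n",
            (unsigned long) size);
    abort();
}

void *my_allocate(size_t alloc_size)
{
    /* Ask for at least one byte so a NULL return always means failure. */
    void *p = malloc(alloc_size != 0 ? alloc_size : 1);

    if (p == NULL)
        out_of_memory(alloc_size);
    return p;
}

void *my_reallocate(void *ptr, size_t old_size, size_t new_size)
{
    void *p = realloc(ptr, new_size != 0 ? new_size : 1);

    (void) old_size;   /* realloc keeps track of the old size itself */
    if (p == NULL)
        out_of_memory(new_size);
    return p;
}

void my_deallocate(void *ptr, size_t size)
{
    (void) size;       /* free does not need the block size */
    free(ptr);
}
```
@end verbatim

Such functions would be installed with
@code{mp_set_memory_functions (my_allocate, my_reallocate, my_deallocate)},
normally before any other GMP call.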

5799

5800

Getting a different fatal error action is a good use for custom allocation

5801

functions, for example giving a graphical dialog rather than the default print

5802

to @code{stderr}. How much is possible when genuinely out of memory is

5803

another question though.

5804

5805

There's currently no defined way for the allocation functions to recover from
an error such as out of memory; they must terminate program execution. A

5807

@code{longjmp} or throwing a C++ exception will have undefined results. This

5808

may change in the future.

5809

5810

GMP may use allocated blocks to hold pointers to other allocated blocks. This

5811

will limit the assumptions a conservative garbage collection scheme can make.

5812

5813

Since the default GMP allocation uses @code{malloc} and friends, those

5814

functions will be linked in even if the first thing a program does is an

5815

@code{mp_set_memory_functions}. It's necessary to change the GMP sources if

5816

this is a problem.

5817

5818

5819

@node Language Bindings, Algorithms, Custom Allocation, Top

5820

@chapter Language Bindings

5821

5822

The following packages and projects offer access to GMP from languages other

5823

than C, though perhaps with varying levels of functionality and efficiency.

5824

5825

@c GNUstep Base Library @uref{http://www.gnustep.org} (version 0.9.1) is

5826

@c intending to use GMP for its NSDecimal class, which would be an Objective

5827

@c C binding for GMP. Has some configure stuff ready, but no code.

5828

5829

@c @spaceuref{U} is the same as @uref{U}, but with a couple of extra spaces

5830

@c in tex, just to separate the URL from the preceding text a bit.

5831

@iftex

5832

@macro spaceuref {U}

5833

@ @ @uref{\U\}

5834

@end macro

5835

@end iftex

5836

@ifnottex

5837

@macro spaceuref {U}

5838

@uref{\U\}

5839

@end macro

5840

@end ifnottex

5841

5842

@sp 1

5843

@table @asis

5844

@item C++

5845

@itemize @bullet

5846

@item

5847

GMP C++ class interface, @pxref{C++ Class Interface} @* Straightforward

5848

interface, expression templates to eliminate temporaries.

5849

@item

5850

ALP @spaceuref{http://www.inria.fr/saga/logiciels/ALP} @* Linear algebra and

5851

polynomials using templates.

5852

@item

5853

CLN @spaceuref{http://clisp.cons.org/~haible/packages-cln.html} @* High level

5854

classes for arithmetic.

5855

@item

5856

LiDIA @spaceuref{http://www.informatik.tu-darmstadt.de/TI/LiDIA} @* A C++

5857

library for computational number theory.

5858

@item

5859

NTL @spaceuref{http://www.shoup.net/ntl} @* A C++ number theory library.

5860

@end itemize

5861

5862

@item Fortran

5863

@itemize @bullet

5864

@item

5865

Omni F77 @spaceuref{http://pdplab.trc.rwcp.or.jp/pdperf/Omni/home.html} @*

5866

Arbitrary precision floats.

5867

@end itemize

5868

5869

@item Haskell

5870

@itemize @bullet

5871

@item

5872

Glasgow Haskell Compiler @spaceuref{http://www.haskell.org/ghc}

5873

@end itemize

5874

5875

@item Java

5876

@itemize @bullet

5877

@item

5878

Kaffe @spaceuref{http://www.kaffe.org}

5879

@item

5880

Kissme @spaceuref{http://kissme.sourceforge.net}

5881

@end itemize

5882

5883

@item Lisp

5884

@itemize @bullet

5885

@item

5886

GNU Common Lisp @spaceuref{http://www.gnu.org/software/gcl/gcl.html} @* In the

5887

process of switching to GMP for bignums.

5888

@item

5889

Librep @spaceuref{http://librep.sourceforge.net}

5890

@end itemize

5891

5892

@item M4

5893

@itemize @bullet

5894

@item

5895

GNU m4 betas @spaceuref{http://www.seindal.dk/rene/gnu} @* Optionally provides

5896

an arbitrary precision @code{mpeval}.

5897

@end itemize

5898

5899

@item ML

5900

@itemize @bullet

5901

@item

5902

MLton compiler @spaceuref{http://www.sourcelight.com/MLton}

5903

@end itemize

5904

5905

@item Oz

5906

@itemize @bullet

5907

@item

5908

Mozart @spaceuref{http://www.mozart-oz.org}

5909

@end itemize

5910

5911

@item Perl

5912

@itemize @bullet

5913

@item

5914

GMP module, see @file{demos/perl} in the GMP sources.

5915

@item

5916

Math::GMP @spaceuref{http://www.cpan.org} @* Compatible with Math::BigInt, but

5917

not as many functions as the GMP module above.

5918

@end itemize

5919

5920

@need 1000

5921

@item Pike

5922

@itemize @bullet

5923

@item

5924

mpz module in the standard distribution, @uref{http://pike.idonex.com}

5925

@end itemize

5926

5927

@need 500

5928

@item Prolog

5929

@itemize @bullet

5930

@item

5931

SWI Prolog @spaceuref{http://www.swi.psy.uva.nl/projects/SWI-Prolog} @*

5932

Arbitrary precision floats.

5933

@end itemize

5934

5935

@item Python

5936

@itemize @bullet

5937

@item

5938

mpz module in the standard distribution, @uref{http://www.python.org}

5939

@end itemize

5940

5941

@item Scheme

5942

@itemize @bullet

5943

@item

5944

RScheme @spaceuref{http://www.rscheme.org}

5945

@end itemize

5946

5947

@item Other

5948

@itemize @bullet

5949

@item

5950

DrGenius @spaceuref{http://drgenius.seul.org} @* Geometry system and

5951

mathematical programming language.

5952

@item

5953

GiNaC @spaceuref{http://www.ginac.de} @* C++ computer algebra using CLN.

5954

@item

5955

Maxima @uref{http://www.ma.utexas.edu/users/wfs/maxima.html} @* Macsyma

5956

computer algebra using GCL.

5957

@item

5958

Q @spaceuref{http://www.musikwissenschaft.uni-mainz.de/~ag/q} @* Equational

5959

programming system.

5960

@item

5961

Yacas @spaceuref{http://www.xs4all.nl/~apinkus/yacas.html} @* Computer algebra

5962

system.

5963

@end itemize

5964

5965

@end table

5966

5967

5968

@node Algorithms, Internals, Language Bindings, Top

5969

@chapter Algorithms

5970

@cindex Algorithms

5971

5972

This chapter is an introduction to some of the algorithms used for various GMP

5973

operations. The code is likely to be hard to understand without knowing

5974

something about the algorithms.

5975

5976

Some GMP internals are mentioned, but applications that expect to be

5977

compatible with future GMP releases should take care to use only the

5978

documented functions.

5979

5980

@menu

5981

* Multiplication Algorithms::

5982

* Division Algorithms::

5983

* Greatest Common Divisor Algorithms::

5984

* Powering Algorithms::

5985

* Root Extraction Algorithms::

5986

* Radix Conversion Algorithms::

5987

* Other Algorithms::

5988

* Assembler Coding::

5989

@end menu

5990

5991

5992

@node Multiplication Algorithms, Division Algorithms, Algorithms, Algorithms

5993

@section Multiplication

5994

@cindex Multiplication algorithms

5995

5996

N@cross{}N limb multiplications and squares are done using one of four

5997

algorithms, as the size N increases.

5998

5999

@quotation

6000

@multitable {KaratsubaMMM} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}

6001

@item Algorithm @tab Threshold

6002

@item Basecase @tab (none)

6003

@item Karatsuba @tab @code{KARATSUBA_MUL_THRESHOLD}

6004

@item Toom-3 @tab @code{TOOM3_MUL_THRESHOLD}

6005

@item FFT @tab @code{FFT_MUL_THRESHOLD}

6006

@end multitable

6007

@end quotation

6008

6009

Similarly for squaring, with the @code{SQR} thresholds. Note though that the

6010

FFT is only used if GMP is configured with @samp{--enable-fft}, @pxref{Build

6011

Options}.

6012

6013

N@cross{}M multiplications of operands with different sizes above

6014

@code{KARATSUBA_MUL_THRESHOLD} are currently done by splitting into M@cross{}M

6015

pieces. The Karatsuba and Toom-3 routines then operate only on equal size

6016

operands. This is not very efficient, and is slated for improvement in the

6017

future.

6018

6019

@menu

6020

* Basecase Multiplication::

6021

* Karatsuba Multiplication::

6022

* Toom-Cook 3-Way Multiplication::

6023

* FFT Multiplication::

6024

* Other Multiplication::

6025

@end menu

6026

6027

6028

@node Basecase Multiplication, Karatsuba Multiplication, Multiplication Algorithms, Multiplication Algorithms

6029

@subsection Basecase Multiplication

6030

6031

Basecase N@cross{}M multiplication is a straightforward rectangular set of

6032

cross-products, the same as long multiplication done by hand and for that

6033

reason sometimes known as the schoolbook or grammar school method. This is an

6034

@m{O(NM),O(N*M)} algorithm. See Knuth section 4.3.1 algorithm M

6035

(@pxref{References}), and the @file{mpn/generic/mul_basecase.c} code.
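
The rectangular set of cross-products can be sketched as follows. This is an
illustration only, not the code in @file{mpn/generic/mul_basecase.c}: the name
@code{mul_basecase_sketch} is hypothetical, and limbs are assumed to be 32
bits wide, stored least significant first, so that a cross-product fits in a
64-bit integer.

@verbatim
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of schoolbook multiplication.  prod must have room
   for un + vn limbs and must not overlap u or v. */
void mul_basecase_sketch(uint32_t *prod,
                         const uint32_t *u, size_t un,
                         const uint32_t *v, size_t vn)
{
    size_t i, j;

    for (i = 0; i < un + vn; i++)
        prod[i] = 0;

    /* One row of cross-products per limb of v, O(un*vn) in total. */
    for (j = 0; j < vn; j++) {
        uint64_t carry = 0;

        for (i = 0; i < un; i++) {
            /* product + accumulated limb + carry is at most 2^64 - 1 */
            uint64_t t = (uint64_t) u[i] * v[j] + prod[i + j] + carry;

            prod[i + j] = (uint32_t) t;
            carry = t >> 32;
        }
        prod[j + un] = (uint32_t) carry;
    }
}
```
@end verbatim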

6036

6037

Assembler implementations of @code{mpn_mul_basecase} are essentially the same

6038

as the generic C code, but have all the usual assembler tricks and

6039

obscurities introduced for speed.

6040

6041

A square can be done in roughly half the time of a multiply, by using the fact

6042

that the cross products above and below the diagonal are the same. A triangle

6043

of products below the diagonal is formed, doubled (left shift by one bit), and

6044

then the products on the diagonal added. This can be seen in

6045

@file{mpn/generic/sqr_basecase.c}. Again the assembler implementations take

6046

essentially the same approach.

6047

6048

@tex

6049

\def\GMPline#1#2#3#4#5#6{%

6050

\hbox {%

6051

\vrule height 2.5ex depth 1ex

6052

\hbox to 2em {\hfil{#2}\hfil}%

6053

\vrule \hbox to 2em {\hfil{#3}\hfil}%

6054

\vrule \hbox to 2em {\hfil{#4}\hfil}%

6055

\vrule \hbox to 2em {\hfil{#5}\hfil}%

6056

\vrule \hbox to 2em {\hfil{#6}\hfil}%

6057

\vrule}}

6058

\GMPdisplay{

6059

\hbox{%

6060

\vbox{%

6061

\hbox to 1.5em {\vrule height 2.5ex depth 1ex width 0pt}%

6062

\hbox {\vrule height 2.5ex depth 1ex width 0pt u0\hfil}%

6063

\hbox {\vrule height 2.5ex depth 1ex width 0pt u1\hfil}%

6064

\hbox {\vrule height 2.5ex depth 1ex width 0pt u2\hfil}%

6065

\hbox {\vrule height 2.5ex depth 1ex width 0pt u3\hfil}%

6066

\hbox {\vrule height 2.5ex depth 1ex width 0pt u4\hfil}%

6067

\vfill}%

6068

\vbox{%

6069

\hbox{%

6070

\hbox to 2em {\hfil u0\hfil}%

6071

\hbox to 2em {\hfil u1\hfil}%

6072

\hbox to 2em {\hfil u2\hfil}%

6073

\hbox to 2em {\hfil u3\hfil}%

6074

\hbox to 2em {\hfil u4\hfil}}%

6075

\vskip 0.7ex

6076

\hrule

6077

\GMPline{u0}{d}{}{}{}{}%

6078

\hrule

6079

\GMPline{u1}{}{d}{}{}{}%

6080

\hrule

6081

\GMPline{u2}{}{}{d}{}{}%

6082

\hrule

6083

\GMPline{u3}{}{}{}{d}{}%

6084

\hrule

6085

\GMPline{u4}{}{}{}{}{d}%

6086

\hrule}}}

6087

@end tex

6088

@ifnottex

6089

@example
@group
     u0  u1  u2  u3  u4
   +---+---+---+---+---+
u0 | d |   |   |   |   |
   +---+---+---+---+---+
u1 |   | d |   |   |   |
   +---+---+---+---+---+
u2 |   |   | d |   |   |
   +---+---+---+---+---+
u3 |   |   |   | d |   |
   +---+---+---+---+---+
u4 |   |   |   |   | d |
   +---+---+---+---+---+
@end group
@end example

6105

@end ifnottex
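
The three steps (triangle, doubling, diagonal) can be sketched as below. As
with any sketch here, this is not the code in
@file{mpn/generic/sqr_basecase.c}: @code{sqr_basecase_sketch} is a
hypothetical name and 32-bit limbs, least significant first, are assumed.

@verbatim
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of basecase squaring.  sq must have room for 2*n
   limbs and must not overlap u; n must be at least 1. */
void sqr_basecase_sketch(uint32_t *sq, const uint32_t *u, size_t n)
{
    size_t i, j;
    uint64_t carry;
    uint32_t top;

    for (i = 0; i < 2 * n; i++)
        sq[i] = 0;

    /* Step 1: triangle of off-diagonal products u[i]*u[j], i < j. */
    for (j = 1; j < n; j++) {
        carry = 0;
        for (i = 0; i < j; i++) {
            uint64_t t = (uint64_t) u[i] * u[j] + sq[i + j] + carry;

            sq[i + j] = (uint32_t) t;
            carry = t >> 32;
        }
        sq[2 * j] = (uint32_t) carry;
    }

    /* Step 2: double the triangle with a one-bit left shift. */
    top = 0;
    for (i = 0; i < 2 * n; i++) {
        uint32_t next = sq[i] >> 31;

        sq[i] = (sq[i] << 1) | top;
        top = next;
    }

    /* Step 3: add the diagonal products u[i]*u[i] at limb position 2*i. */
    carry = 0;
    for (i = 0; i < n; i++) {
        uint64_t d = (uint64_t) u[i] * u[i];
        uint64_t lo = (uint64_t) sq[2 * i] + (uint32_t) d + carry;
        uint64_t hi;

        sq[2 * i] = (uint32_t) lo;
        hi = (uint64_t) sq[2 * i + 1] + (d >> 32) + (lo >> 32);
        sq[2 * i + 1] = (uint32_t) hi;
        carry = hi >> 32;
    }
}
```
@end verbatim

Only the products with i < j are formed in step 1, which is roughly half of
the cross-products a general multiply of the same size would need.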

6106

6107

In practice squaring isn't a full 2@cross{} faster than multiplying; it's
usually around 1.5@cross{}. Less than 1.5@cross{} probably indicates
@code{mpn_sqr_basecase} wants improving on that CPU.

6110

6111

On some CPUs @code{mpn_mul_basecase} can be faster than the generic C
@code{mpn_sqr_basecase}. @code{BASECASE_SQR_THRESHOLD} is the size at which
to use @code{mpn_sqr_basecase}; this will be zero if that routine should
always be used.

6115

6116

6117

@node Karatsuba Multiplication, Toom-Cook 3-Way Multiplication, Basecase Multiplication, Multiplication Algorithms

6118

@subsection Karatsuba Multiplication

6119

6120

The Karatsuba multiplication algorithm is described in Knuth section 4.3.3

6121

part A, and various other textbooks. A brief description is given here.

6122

6123

The inputs @ma{x} and @ma{y} are treated as each split into two parts of equal

6124

length (or the most significant part one limb shorter if N is odd).

6125

6126

@tex

6127

\global\newdimen\GMPboxwidth \GMPboxwidth=5em

6128

\global\newdimen\GMPboxheight \GMPboxheight=3ex

6129

\def\GMPbox#1#2{%

6130

\vbox {%

6131

\hrule

6132

\hbox{%

6133

\vrule height 2ex depth 1ex

6134

\hbox to \GMPboxwidth {\hfil\hbox{$#1$}\hfil}%

6135

\vrule

6136

\hbox to \GMPboxwidth {\hfil\hbox{$#2$}\hfil}%

6137

\vrule}

6138

\hrule

6139

}}

6140

\GMPdisplay{%

6141

\vbox{%

6142

\hbox to 2\GMPboxwidth {high \hfil low}

6143

\vskip 0.7ex

6144

\GMPbox{x_1}{x_0}

6145

\vskip 0.5ex

6146

\GMPbox{y_1}{y_0}

6147

}}

6148

6149

%\moveright \lispnarrowing

6150

%\vskip 0.5 ex

6151

%\vskip 0.5 ex

6152

@end tex

6153

@ifnottex

6154

@example
@group
 high              low
+----------+----------+
|    x1    |    x0    |
+----------+----------+

+----------+----------+
|    y1    |    y0    |
+----------+----------+
@end group
@end example

6166

@end ifnottex

6167

6168

Let @ma{b} be the power of 2 where the split occurs, i.e.@: if @ms{x,0} is

6169

@ma{k} limbs (@ms{y,0} the same) then

6170

@m{b=2\GMPraise{$k*$@code{mp\_bits\_per\_limb}}, b=2^(k*mp_bits_per_limb)}.

6171

With that @m{x=x_1b+x_0,x=x1*b+x0} and @m{y=y_1b+y_0,y=y1*b+y0}, and the

6172

following holds,

6173

6174

@display

6175

@m{xy = (b^2+b)x_1y_1 - b(x_1-x_0)(y_1-y_0) + (b+1)x_0y_0,

6176

x*y = (b^2+b)*x1*y1 - b*(x1-x0)*(y1-y0) + (b+1)*x0*y0}

6177

@end display

6178

6179

This formula means doing only three multiplies of (N/2)@cross{}(N/2) limbs,

6180

whereas a basecase multiply of N@cross{}N limbs is equivalent to four

6181

multiplies of (N/2)@cross{}(N/2). The factors @ma{(b^2+b)} etc.@: represent the

6182

positions where the three products must be added.

6183

6184

@tex

6185

\global\newdimen\GMPboxwidth \GMPboxwidth=5em

6186

\global\newdimen\GMPboxheight \GMPboxheight=3ex

6187

\def\GMPboxA#1#2{%

6188

\vbox to \GMPboxheight{%

6189

\hrule \vfil

6190

\hbox{%

6191

\strut \vrule

6192

\hbox to 2\GMPboxwidth {\hfil\hbox{$#1$}\hfil}%

6193

\vrule

6194

\hbox to 2\GMPboxwidth {\hfil\hbox{$#2$}\hfil}%

6195

\vrule}

6196

\vfil \hrule}}

6197

\def\GMPboxB#1#2{%

6198

\hbox{%

6199

\vbox to \GMPboxheight{%

6200

\vfil \hbox to \GMPboxwidth {\hfil #1} \vfil }

6201

\vbox to \GMPboxheight{%

6202

\hrule \vfil

6203

\hbox{%

6204

\strut \vrule

6205

\hbox to 2\GMPboxwidth {\hfil\hbox{$#2$}\hfil}

6206

\vrule}

6207

\vfil \hrule}}}

6208

\GMPdisplay{%

6209

\vbox{%

6210

\hbox to 4\GMPboxwidth {high \hfil low}

6211

\vskip 0.7ex

6212

\GMPboxA{x_1y_1}{x_0y_0}

6213

\vskip 0.5ex

6214

\GMPboxB{$+$}{x_1y_1}

6215

\vskip 0.5ex

6216

\GMPboxB{$+$}{x_0y_0}

6217

\vskip 0.5ex

6218

\GMPboxB{$-$}{(x_1-x_0)(y_1-y_0)}

6219

}}

6220

@end tex

6221

@ifnottex

6222

@example
@group
 high                              low
+--------+--------+ +--------+--------+
|      x1*y1      | |      x0*y0      |
+--------+--------+ +--------+--------+
          +--------+--------+
      add |      x1*y1      |
          +--------+--------+
          +--------+--------+
      add |      x0*y0      |
          +--------+--------+
          +--------+--------+
      sub | (x1-x0)*(y1-y0) |
          +--------+--------+
@end group
@end example

6239

@end ifnottex

6240

6241

The term @m{(x_1-x_0)(y_1-y_0),(x1-x0)*(y1-y0)} is best calculated as an

6242

absolute value, and the sign used to choose to add or subtract. Notice the

6243

sum @m{\mathop{\rm high}(x_0y_0)+\mathop{\rm low}(x_1y_1),

6244

high(x0*y0)+low(x1*y1)} occurs twice, so it's possible to do @m{5k,5*k} limb

6245

additions, rather than @m{6k,6*k}, but in GMP extra function call overheads

6246

outweigh the saving.
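
The formula and the sign handling can be checked on a small scale. The sketch
below is an assumption-laden illustration rather than GMP code: it multiplies
two 32-bit values split into 16-bit halves, so b = 2^16 plays the role of the
limb boundary, and @code{karatsuba32_sketch} is a hypothetical name.

@verbatim
```c
#include <stdint.h>

/* Hypothetical sketch: x*y from three half-size products, using
   x*y = (b^2+b)*x1*y1 - b*(x1-x0)*(y1-y0) + (b+1)*x0*y0 with b = 2^16. */
uint64_t karatsuba32_sketch(uint32_t x, uint32_t y)
{
    uint32_t x1 = x >> 16, x0 = x & 0xFFFF;
    uint32_t y1 = y >> 16, y0 = y & 0xFFFF;

    uint64_t hi = (uint64_t) x1 * y1;   /* x1*y1 */
    uint64_t lo = (uint64_t) x0 * y0;   /* x0*y0 */

    /* Middle product on absolute values, sign tracked separately. */
    uint32_t dx = x1 >= x0 ? x1 - x0 : x0 - x1;
    uint32_t dy = y1 >= y0 ? y1 - y0 : y0 - y1;
    int negative = (x1 >= x0) != (y1 >= y0);  /* (x1-x0)*(y1-y0) <= 0 */
    uint64_t mid = (uint64_t) dx * dy;
    uint64_t result;

    /* Sums wrap modulo 2^64; the true product is below 2^64, so the
       final value is exact. */
    result = (hi << 32) + (hi << 16) + (lo << 16) + lo;
    if (negative)
        result += mid << 16;   /* subtracting a negative term */
    else
        result -= mid << 16;
    return result;
}
```
@end verbatim

The result agrees with a direct 64-bit multiplication of the same inputs, for
either sign of the middle term.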

6247

6248

Squaring is similar to multiplying, but with @ma{x=y} the formula reduces to

6249

an equivalent with three squares,

6250

6251

@display

6252

@m{x^2 = (b^2+b)x_1^2 - b(x_1-x_0)^2 + (b+1)x_0^2,

6253

x^2 = (b^2+b)*x1^2 - b*(x1-x0)^2 + (b+1)*x0^2}

6254

@end display

6255

6256

The final result is accumulated from those three squares the same way as for

6257

the three multiplies above. The middle term @m{(x_1-x_0)^2,(x1-x0)^2} is now

6258

always positive.

6259

6260

A similar formula for both multiplying and squaring can be constructed with a

6261

middle term @m{(x_1+x_0)(y_1+y_0),(x1+x0)*(y1+y0)}. But those sums can exceed

6262

@ma{k} limbs, leading to more carry handling and additions than the form

6263

above.

6264

6265

Karatsuba multiplication is asymptotically an @ma{O(N^@W{1.585})} algorithm,

6266

the exponent being @m{\log3/\log2,log(3)/log(2)}, representing 3 multiplies

6267

each 1/2 the size of the inputs. This is a big improvement over the basecase

6268

multiply at @ma{O(N^2)} and the advantage soon overcomes the extra additions

6269

Karatsuba performs.

6270

6271

@code{KARATSUBA_MUL_THRESHOLD} can be as little as 10 limbs. The @code{SQR}

6272

threshold is usually about twice the @code{MUL}. The basecase algorithm will

6273

take a time of the form @m{M(N) = aN^2 + bN + c, M(N) = a*N^2 + b*N + c} and

6274

the Karatsuba algorithm @m{K(N) = 3M(N/2) + dN + e, K(N) = 3*M(N/2) + d*N +

6275

e}. Clearly per-crossproduct speedups in the basecase code reduce @ma{a} and

6276

decrease the threshold, but linear style speedups reducing @ma{b} will

6277

actually increase the threshold. The latter can be seen for instance when

6278

adding an optimized @code{mpn_sqr_diagonal} to @code{mpn_sqr_basecase}. Of

6279

course all speedups reduce total time, and in that sense the algorithm

6280

thresholds are merely of academic interest.

6281

6282

6283

@node Toom-Cook 3-Way Multiplication, FFT Multiplication, Karatsuba Multiplication, Multiplication Algorithms

6284

@subsection Toom-Cook 3-Way Multiplication

6285

6286

The Karatsuba formula is the simplest case of a general approach to splitting

6287

inputs that leads to both Toom-Cook and FFT algorithms. A description of

6288

Toom-Cook can be found in Knuth section 4.3.3, with an example 3-way

6289

calculation after Theorem A. The 3-way form used in GMP is described here.

6290

6291

The operands are each considered split into 3 pieces of equal length (or the

6292

most significant part 1 or 2 limbs shorter than the others).

6293

6294

@iftex

6295

@global@newdimen@GMPboxwidth @GMPboxwidth=5em

6296

@global@newdimen@GMPboxheight @GMPboxheight=3ex

6297

@end iftex

6298

@tex

6299

\def\GMPbox#1#2#3{%

6300

\vbox to \GMPboxheight{%

6301

\hrule \vfil

6302

\hbox{%

6303

\strut \vrule

6304

\hbox to \GMPboxwidth {\hfil\hbox{$#1$}\hfil}%

6305

\vrule

6306

\hbox to \GMPboxwidth {\hfil\hbox{$#2$}\hfil}%

6307

\vrule

6308

\hbox to \GMPboxwidth {\hfil\hbox{$#3$}\hfil}%

6309

\vrule}

6310

\vfil \hrule

6311

}}

6312

\GMPdisplay{%

6313

\vbox{%

6314

\hbox to 3\GMPboxwidth {high \hfil low}

6315

\vskip 0.7ex

6316

\GMPbox{x_2}{x_1}{x_0}

6317

\vskip 0.5ex

6318

\GMPbox{y_2}{y_1}{y_0}

6319

\vskip 0.5ex

6320

}}

6321

@end tex

6322

@ifnottex
@example
@group
 high                         low
+----------+----------+----------+
|    x2    |    x1    |    x0    |
+----------+----------+----------+

+----------+----------+----------+
|    y2    |    y1    |    y0    |
+----------+----------+----------+
@end group
@end example
@end ifnottex

6336

6337

@noindent

6338

These parts are treated as the coefficients of two polynomials

6339

6340

@display

6341

@group

6342

@m{X(t) = x_2t^2 + x_1t + x_0,

6343

X(t) = x2*t^2 + x1*t + x0}

6344

@m{Y(t) = y_2t^2 + y_1t + y_0,

6345

Y(t) = y2*t^2 + y1*t + y0}

6346

@end group

6347

@end display

6348

6349

Again let @ma{b} equal the power of 2 which is the size of the @ms{x,0},
@ms{x,1}, @ms{y,0} and @ms{y,1} pieces, i.e.@: if they're @ma{k} limbs each
then @m{b=2\GMPraise{$k*$@code{mp\_bits\_per\_limb}},
b=2^(k*mp_bits_per_limb)}. With this @ma{x=X(b)} and @ma{y=Y(b)}.

6353

6354

Let a polynomial @m{W(t)=X(t)Y(t),W(t)=X(t)*Y(t)} and suppose its coefficients

6355

are

6356

6357

@display

6358

@m{W(t) = w_4t^4 + w_3t^3 + w_2t^2 + w_1t + w_0,

6359

W(t) = w4*t^4 + w3*t^3 + w2*t^2 + w1*t + w0}

6360

@end display

6361

6362

@noindent

6363

The @m{w_i,w[i]} are going to be determined, and when they are they'll give

6364

the final result using @ma{w=W(b)}, since @m{xy=X(b)Y(b),x*y=X(b)*Y(b)=W(b)}.

6365

The coefficients will be roughly @ma{b^2} each, and the final @ma{W(b)} will

6366

be an addition like,

6367

6368

@tex
\def\GMPbox#1#2{%
  \moveright #1\GMPboxwidth
  \vbox to \GMPboxheight{%
    \hrule \vfil
    \hbox{%
      \strut \vrule
      \hbox to 2\GMPboxwidth {\hfil\hbox{$#2$}\hfil}%
      \vrule}
    \vfil \hrule
}}
\GMPdisplay{%
  \vbox{%
    \hbox to 6\GMPboxwidth {high \hfil low}
    \vskip 0.7ex
    \GMPbox{0}{w_4}
    \vskip 0.5ex
    \GMPbox{1}{w_3}
    \vskip 0.5ex
    \GMPbox{2}{w_2}
    \vskip 0.5ex
    \GMPbox{3}{w_1}
    \vskip 0.5ex
    \GMPbox{4}{w_0}
}}
@end tex

6394

@ifnottex
@example
@group
 high                                        low
+-------+-------+
|       w4      |
+-------+-------+
       +--------+-------+
       |        w3      |
       +--------+-------+
               +--------+-------+
               |        w2      |
               +--------+-------+
                       +--------+-------+
                       |        w1      |
                       +--------+-------+
                                +-------+-------+
                                |       w0      |
                                +-------+-------+
@end group
@end example
@end ifnottex

6416

6417

The @m{w_i,w[i]} coefficients could be formed by a simple set of cross

6418

products, like @m{w_4=x_2y_2,w4=x2*y2}, @m{w_3=x_2y_1+x_1y_2,w3=x2*y1+x1*y2},

6419

@m{w_2=x_2y_0+x_1y_1+x_0y_2,w2=x2*y0+x1*y1+x0*y2} etc, but this would need all

6420

nine @m{x_iy_j,x[i]*y[j]} for @ma{i,j=0,1,2}, and would be equivalent merely

6421

to a basecase multiply. Instead the following approach is used.

6422

6423

@ma{X(t)} and @ma{Y(t)} are evaluated and multiplied at 5 points, giving

6424

values of @ma{W(t)} at those points. The points used can be chosen in

6425

various ways, but in GMP the following are used

6426

6427

@quotation

6428

@multitable {@m{t=\infty,t=inf}M} {MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM}

6429

@item Point @tab Value

6430

@item @ma{t=0} @tab @m{x_0y_0,x0*y0}, which gives @ms{w,0} immediately

6431

@item @ma{t=2} @tab @m{(4x_2+2x_1+x_0)(4y_2+2y_1+y_0),(4*x2+2*x1+x0)*(4*y2+2*y1+y0)}

6432

@item @ma{t=1} @tab @m{(x_2+x_1+x_0)(y_2+y_1+y_0),(x2+x1+x0)*(y2+y1+y0)}

6433

@item @m{t={1\over2},t=1/2} @tab @m{(x_2+2x_1+4x_0)(y_2+2y_1+4y_0),(x2+2*x1+4*x0)*(y2+2*y1+4*y0)}

6434

@item @m{t=\infty,t=inf} @tab @m{x_2y_2,x2*y2}, which gives @ms{w,4} immediately

6435

@end multitable

6436

@end quotation

6437

6438

At @m{t={1\over2},t=1/2} the value calculated is actually

6439

@m{16X({1\over2})Y({1\over2}), 16*X(1/2)*Y(1/2)}, giving a value for

6440

@m{16W({1\over2}),16*W(1/2)}, and this is always an integer. At

6441

@m{t=\infty,t=inf} the value is actually @m{\lim_{t\to\infty} {X(t)Y(t)\over

6442

t^4}, X(t)*Y(t)/t^4 in the limit as t approaches infinity}, but it's much

6443

easier to think of as simply @m{x_2y_2,x2*y2} giving @ms{w,4} immediately

6444

(much like @m{x_0y_0,x0*y0} at @ma{t=0} gives @ms{w,0} immediately).

6445

6446

Now each of the points substituted into

6447

@m{W(t)=w_4t^4+\cdots+w_0,W(t)=w4*t^4+@dots{}+w0} gives a linear combination

6448

of the @m{w_i,w[i]} coefficients, and the value of those combinations has just

6449

been calculated.

6450

6451

@tex

6452

\GMPdisplay{%

6453

$\matrix{%

6454

W(0) & = & & & & & & & & & w_0 \cr

6455

16W({1\over2}) & = & w_4 & + & 2w_3 & + & 4w_2 & + & 8w_1 & + & 16w_0 \cr

6456

W(1) & = & w_4 & + & w_3 & + & w_2 & + & w_1 & + & w_0 \cr

6457

W(2) & = & 16w_4 & + & 8w_3 & + & 4w_2 & + & 2w_1 & + & w_0 \cr

6458

W(\infty) & = & w_4 \cr

6459

}$}

6460

@end tex

6461

@ifnottex
@example
@group
     W(0) =                                 w0
16*W(1/2) =    w4 + 2*w3 + 4*w2 + 8*w1 + 16*w0
     W(1) =    w4 +   w3 +   w2 +   w1 +    w0
     W(2) = 16*w4 + 8*w3 + 4*w2 + 2*w1 +    w0
   W(inf) =    w4
@end group
@end example
@end ifnottex

6472

6473

This is a set of five equations in five unknowns, and some elementary linear

6474

algebra quickly isolates each @m{w_i,w[i]}, by subtracting multiples of one

6475

equation from another.

6476

6477

In the code the set of five values @ma{W(0)},@dots{},@m{W(\infty),W(inf)} will

6478

represent those certain linear combinations. By adding or subtracting one

6479

from another as necessary, values which are each @m{w_i,w[i]} alone are

6480

arrived at. This involves only a few subtractions of small multiples (some of

6481

which are powers of 2), and so is fast. A couple of divisions remain by

6482

powers of 2 and one division by 3 (or by 6 rather), and that last uses the

6483

special @code{mpn_divexact_by3} (@pxref{Exact Division}).
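
The evaluation and interpolation steps can be sketched in a few lines of
Python, with Python integers standing in for limb vectors. This is only an
illustration under stated assumptions, not the code in GMP: the name
@code{toom3_mul}, the bit-count parameter @code{k} and the particular order of
elimination are hypothetical, and the divisions by 2 and by 6 are plain
integer divisions rather than @code{mpn_divexact_by3}.

```python
def toom3_mul(x, y, k):
    """One level of Toom-3 on non-negative ints, split into pieces of k bits."""
    mask = (1 << k) - 1
    x0, x1, x2 = x & mask, (x >> k) & mask, x >> (2 * k)
    y0, y1, y2 = y & mask, (y >> k) & mask, y >> (2 * k)

    # Evaluate X and Y and multiply at the points t = 0, 1/2, 1, 2, inf.
    w_0 = x0 * y0
    w_half = (x2 + 2 * x1 + 4 * x0) * (y2 + 2 * y1 + 4 * y0)   # 16*W(1/2)
    w_1 = (x2 + x1 + x0) * (y2 + y1 + y0)
    w_2 = (4 * x2 + 2 * x1 + x0) * (4 * y2 + 2 * y1 + y0)
    w_inf = x2 * y2

    # Interpolate: w0 and w4 are immediate, the rest by elimination.
    w0, w4 = w_0, w_inf
    a = w_1 - w0 - w4            # w3 + w2 + w1
    b = w_2 - w0 - 16 * w4       # 8*w3 + 4*w2 + 2*w1
    c = w_half - 16 * w0 - w4    # 2*w3 + 4*w2 + 8*w1
    w2 = (10 * a - b - c) // 2   # exact division by 2
    s = a - w2                   # w3 + w1
    d = (b - c) // 6             # w3 - w1, exact division by 6
    w3 = (s + d) // 2
    w1 = (s - d) // 2

    # Final addition W(b) with b = 2^k.
    return (w4 << 4 * k) + (w3 << 3 * k) + (w2 << 2 * k) + (w1 << k) + w0
```

Five multiplications of roughly one-third size replace the nine cross
products, at the cost of the additions, shifts and three exact divisions seen
in the interpolation.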

6484

6485

In the code the values @ms{w,4}, @ms{w,2} and @ms{w,0} are formed in the

6486

destination with pointers @code{E}, @code{C} and @code{A}, and @ms{w,3} and

6487

@ms{w,1} in temporary space @code{D} and @code{B} are added to them. There

6488

are extra limbs @code{tD}, @code{tC} and @code{tB} at the high end of

6489

@ms{w,3}, @ms{w,2} and @ms{w,1} which are handled separately. The final

6490

addition then is as follows.

6491

6492

@tex

6493

\def\GMPboxT#1{%

6494

\vbox to \GMPboxheight{%

6495

\hrule

6496

\hbox {\strut \vrule{} #1 \vrule}%

6497

\hrule

6498

}}

6499

\GMPdisplay{%

6500

\advance\baselineskip by 1ex

6501

\vbox{%

6502

\hbox to 6\GMPboxwidth {high \hfil low}

6503

\vbox to \GMPboxheight{%

6504

\hrule \vfil

6505

\hbox{%

6506

\strut \vrule

6507

\hbox to 2\GMPboxwidth {\hfil@code{E}\hfil}

6508

\vrule

6509

\hbox to 2\GMPboxwidth {\hfil@code{C}\hfil}

6510

\vrule

6511

\hbox to 2\GMPboxwidth {\hfil@code{A}\hfil}

6512

\vrule}

6513

\vfil \hrule

6514

6515

\moveright \GMPboxwidth

6516

\vbox to \GMPboxheight{%

6517

\hrule \vfil

6518

\hbox{%

6519

\strut \vrule

6520

\hbox to 2\GMPboxwidth {\hfil@code{D}\hfil}

6521

\vrule

6522

\hbox to 2\GMPboxwidth {\hfil@code{B}\hfil}

6523

\vrule}

6524

\vfil \hrule

6525

6526

\hbox{%

6527

\hbox to \GMPboxwidth{\hfil \GMPboxT{\code{tD}}}%

6528

\hbox to \GMPboxwidth{\hfil \GMPboxT{\code{tC}}}%

6529

\hbox to \GMPboxwidth{\hfil \GMPboxT{\code{tB}}}}

6530

}}

6531

@end tex

6532

@ifnottex
@example
@group
 high                                        low
+-------+-------+-------+-------+-------+-------+
|       E       |       C       |       A       |
+-------+-------+-------+-------+-------+-------+
         +------+-------++------+-------+
         |      D       ||      B       |
         +------+-------++------+-------+
      --      --      --
     |tD|    |tC|    |tB|
      --      --      --
@end group
@end example
@end ifnottex

6548

6549

The conversion of @ma{W(t)} values to the coefficients is interpolation. A

6550

polynomial of degree 4 like @ma{W(t)} is uniquely determined by values known

6551

at 5 different points. The points can be chosen to make the linear equations

6552

come out with a convenient set of steps for isolating the @m{w_i,w[i]}.

6553

6554

In @file{mpn/generic/mul_n.c} the @code{interpolate3} routine performs the
interpolation. The open-coded one-pass version may be a bit hard to
understand; the steps performed can be better seen in the @code{USE_MORE_MPN}
version.

6558

6559

Squaring follows the same procedure as multiplication, but there's only one

6560

@ma{X(t)} and it's evaluated at 5 points, and those values squared to give

6561

values of @ma{W(t)}. The interpolation is then identical, and in fact the

6562

same @code{interpolate3} subroutine is used for both squaring and multiplying.

6563

6564

Toom-3 is asymptotically @ma{O(N^@W{1.465})}, the exponent being

6565

@m{\log5/\log3,log(5)/log(3)}, representing 5 recursive multiplies of 1/3 the

6566

original size. This is an improvement over Karatsuba at @ma{O(N^@W{1.585})},

6567

though Toom-Cook does more work in the evaluation and interpolation and so it

6568

only realizes its advantage above a certain size.

6569

6570

Near the crossover between Toom-3 and Karatsuba there's generally a range of

6571

sizes where the difference between the two is small.

6572

@code{TOOM3_MUL_THRESHOLD} is a somewhat arbitrary point in that range and

6573

successive runs of the tune program can give different values due to small

6574

variations in measuring. A graph of time versus size for the two shows the

6575

effect, see @file{tune/README}.

6576

6577

At the fairly small sizes where the Toom-3 thresholds occur it's worth

6578

remembering that the asymptotic behaviour for Karatsuba and Toom-3 can't be

6579

expected to make accurate predictions, due of course to the big influence of

6580

all sorts of overheads, and the fact that only a few recursions of each are

6581

being performed. Even at large sizes there's a good chance machine dependent

6582

effects like cache architecture will mean actual performance deviates from

6583

what might be predicted.

6584

6585

The formula given above for the Karatsuba algorithm has an equivalent for

6586

Toom-3 involving only five multiplies, but this would be complicated and

6587

unenlightening.

6588

6589

An alternate view of Toom-3 can be found in Zuras (@pxref{References}), using

6590

a vector to represent the @ma{x} and @ma{y} splits and a matrix multiplication

6591

for the evaluation and interpolation stages. The matrix inverses are not

6592

meant to be actually used, and they have elements with values much greater

6593

than in fact arise in the interpolation steps. The diagram shown for the

6594

3-way is attractive, but again doesn't have to be implemented that way and for

6595

example with a bit of rearrangement just one division by 6 can be done.

6596

6597

6598

@node FFT Multiplication, Other Multiplication, Toom-Cook 3-Way Multiplication, Multiplication Algorithms

6599

@subsection FFT Multiplication

6600

6601

At large to very large sizes a Fermat style FFT multiplication is used,

6602

following Sch@"onhage and Strassen (@pxref{References}). Descriptions of FFTs

6603

in various forms can be found in many textbooks, for instance Knuth section

6604

4.3.3 part C or Lipson chapter IX. A brief description of the form used in

6605

GMP is given here.

6606

6607

The multiplication done is @m{xy \bmod 2^N+1, x*y mod 2^N+1}, for a given

6608

@ma{N}. A full product @m{xy,x*y} is obtained by choosing @m{N \ge

6609

\mathop{\rm bits}(x)+\mathop{\rm bits}(y), N>=bits(x)+bits(y)} and padding

6610

@ma{x} and @ma{y} with high zero limbs. The modular product is the native

6611

form for the algorithm, so padding to get a full product is unavoidable.

6612

6613

The algorithm follows a split, evaluate, pointwise multiply, interpolate and

6614

combine similar to that described above for Karatsuba and Toom-3. A @ma{k}

6615

parameter controls the split, with an FFT-@ma{k} splitting into @ma{2^k}

6616

pieces of @ma{M=N/2^k} bits each. @ma{N} must be a multiple of

6617

@m{2^k\times@code{mp\_bits\_per\_limb}, (2^k)*@nicode{mp_bits_per_limb}} so

6618

the split falls on limb boundaries, avoiding bit shifts in the split and

6619

combine stages.

6620

6621

The evaluations, pointwise multiplications, and interpolation, are all done

6622

modulo @m{2^{N'}+1, 2^N'+1} where @ma{N'} is @ma{2M+k+3} rounded up to a

6623

multiple of @ma{2^k} and of @code{mp_bits_per_limb}. The results of

6624

interpolation will be the following negacyclic convolution of the input

6625

pieces, and the choice of @ma{N'} ensures these sums aren't truncated.

6626

@tex

6627

$$ w_n = \sum_{{i+j = b2^k+n}\atop{b=0,1}} (-1)^b x_i y_j $$

6628

@end tex

6629

@ifnottex

@example
           ---
           \         b
w[n] =     /     (-1) * x[i] * y[j]
           ---
       i+j==b*2^k+n
          b=0,1
@end example

@end ifnottex
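
The negacyclic convolution can be checked directly with a quadratic-time
sketch. The following Python is only an illustration, assuming inputs below
@ma{2^N} and using the hypothetical name @code{negacyclic_product}; it forms
the sums by brute force rather than by any transform, and confirms that
adding them at offsets of @ma{M} bits gives the product mod @ma{2^N+1}.

```python
def negacyclic_product(x, y, N, k):
    """x*y mod 2^N+1 via the negacyclic convolution of 2^k pieces of M bits."""
    pieces = 1 << k
    M = N // pieces
    mask = (1 << M) - 1
    xs = [(x >> (M * i)) & mask for i in range(pieces)]
    ys = [(y >> (M * j)) & mask for j in range(pieces)]

    w = [0] * pieces
    for i in range(pieces):
        for j in range(pieces):
            # i+j = b*2^k + n with b = 0 or 1; terms with b = 1 wrap negated.
            n = (i + j) % pieces
            sign = -1 if i + j >= pieces else 1
            w[n] += sign * xs[i] * ys[j]

    # Combine: add each w[n] at an offset of n*M bits, then reduce.
    total = sum(wn << (M * n) for n, wn in enumerate(w))
    return total % ((1 << N) + 1)
```

The wrapped terms are negated because @ma{2^N} is congruent to @ma{-1} mod
@ma{2^N+1}.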

6641

The points used for the evaluation are @ma{g^i} for @ma{i=0} to @ma{2^k-1}

6642

where @m{g=2^{2N'/2^k}, g=2^(2N'/2^k)}. @ma{g} is a @m{2^k,2^k'}th root of

6643

unity mod @m{2^{N'}+1,2^N'+1}, which produces necessary cancellations at the

6644

interpolation stage, and it's also a power of 2 so the fast fourier transforms

6645

used for the evaluation and interpolation do only shifts, adds and negations.
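
As a small numeric check of the root of unity property, the following Python
sketch computes @ma{N'} and @ma{g} for one size. The function names and the
@code{limb_bits} parameter are assumptions made for the illustration, not
GMP identifiers.

```python
from math import gcd

def fft_inner_size(N, k, limb_bits=32):
    """N' = 2M+k+3 rounded up to a multiple of 2^k and of limb_bits."""
    M = N >> k
    step = (1 << k) * limb_bits // gcd(1 << k, limb_bits)
    need = 2 * M + k + 3
    return -(-need // step) * step      # round up to a multiple of step

def evaluation_root(N_prime, k):
    """g = 2^(2N'/2^k), a power of 2."""
    return 1 << ((2 * N_prime) >> k)

N_prime = fft_inner_size(1 << 15, 8)
g = evaluation_root(N_prime, 8)
modulus = (1 << N_prime) + 1
```

With @ma{N=2^{15}} and @ma{k=8} this gives @ma{N'=512} and @ma{g=16}. Raising
@ma{g} to the power @ma{2^k} gives 1 mod @ma{2^{N'}+1}, and raising it to the
power @ma{2^{k-1}} gives @ma{-1}, so its order is exactly @ma{2^k}.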

6646

6647

The pointwise multiplications are done modulo @m{2^{N'}+1, 2^N'+1} and either

6648

recurse into a further FFT or use a plain multiplication (Toom-3, Karatsuba or

6649

basecase), whichever is optimal at the size @ma{N'}. The interpolation is an

6650

inverse fast fourier transform. The resulting set of sums of @m{x_iy_j,

6651

x[i]*y[j]} are added at appropriate offsets to give the final result.

6652

6653

Squaring is the same, but @ma{x} is the only input so it's one transform at

6654

the evaluate stage and the pointwise multiplies are squares. The

6655

interpolation is the same.

6656

6657

For a mod @ma{2^N+1} product, an FFT-@ma{k} is an @m{O(N^{k/(k-1)}),

6658

O(N^(k/(k-1)))} algorithm, the exponent representing @ma{2^k} recursed modular

6659

multiplies each @m{1/2^{k-1},1/2^(k-1)} the size of the original. Each

6660

successive @ma{k} is an asymptotic improvement, but overheads mean each is

6661

only faster at bigger and bigger sizes. In the code, @code{FFT_MUL_TABLE} and

6662

@code{FFT_SQR_TABLE} are the thresholds where each @ma{k} is used. Each new

6663

@ma{k} effectively swaps some multiplying for some shifts, adds and overheads.

6664

6665

A mod @ma{2^N+1} product can be formed with a normal

6666

@ma{N@cross{}N@rightarrow{}2N} bit multiply plus a subtraction, so an FFT and

6667

Toom-3 etc can be compared directly. A @ma{k=4} FFT at @ma{O(N^@W{1.333})}

6668

can be expected to be the first faster than Toom-3 at @ma{O(N^@W{1.465})}. In

6669

practice this is what's found, with @code{FFT_MODF_MUL_THRESHOLD} and

6670

@code{FFT_MODF_SQR_THRESHOLD} being between 300 and 1000 limbs, depending on

6671

the CPU. So far it's been found that only very large FFTs recurse into

6672

pointwise multiplies above these sizes.

6673

6674

When an FFT is to give a full product, the change of @ma{N} to @ma{2N} doesn't

6675

alter the theoretical complexity for a given @ma{k}, but for the purposes of

6676

considering where an FFT might be first used it can be assumed that the FFT is

6677

recursing into a normal multiply and that on that basis it's doing @ma{2^k}

6678

recursed multiplies each @m{1/2^{k-2},1/2^(k-2)} the size of the inputs,

6679

making it @m{O(N^{k/(k-2)}), O(N^(k/(k-2)))}. This would mean @ma{k=7} at

6680

@ma{O(N^@W{1.4})} would be the first FFT faster than Toom-3. In practice

6681

@code{FFT_MUL_THRESHOLD} and @code{FFT_SQR_THRESHOLD} have been found to be in

6682

the @ma{k=8} range, somewhere between 3000 and 10000 limbs.
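
The two theoretical crossover values of @ma{k} quoted above follow from
comparing exponents, which the following Python sketch does. The helper name
@code{first_k_below} is hypothetical, and the comparison is purely asymptotic,
ignoring all overheads.

```python
from math import log

TOOM3_EXPONENT = log(5) / log(3)        # about 1.465

def first_k_below(exponent_of_k, target, k_start):
    """Smallest k >= k_start whose exponent is below target."""
    k = k_start
    while exponent_of_k(k) >= target:
        k += 1
    return k

# Mod 2^N+1 product: exponent k/(k-1).
modular_k = first_k_below(lambda k: k / (k - 1), TOOM3_EXPONENT, 2)

# Full product: exponent k/(k-2).
full_k = first_k_below(lambda k: k / (k - 2), TOOM3_EXPONENT, 3)
```

This yields @ma{k=4} for the modular product and @ma{k=7} for the full
product, matching the figures in the text.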

6683

6684

The way @ma{N} is split into @ma{2^k} pieces and then @ma{2M+k+3} is rounded

6685

up to a multiple of @ma{2^k} and @code{mp_bits_per_limb} means that when

6686

@ma{2^k@ge{}@nicode{mp\_bits\_per\_limb}} the effective @ma{N} is a multiple

6687

of @m{2^{2k-1},2^(2k-1)} bits. The @ma{+k+3} means some values of @ma{N} just

6688

under such a multiple will be rounded to the next. The complexity

6689

calculations above assume that a favourable size is used, meaning one which

6690

isn't padded through rounding, and it's also assumed that the extra @ma{+k+3}

6691

bits are negligible at typical FFT sizes.

6692

6693

The practical effect of the @m{2^{2k-1},2^(2k-1)} constraint is to introduce a

6694

step-effect into measured speeds. For example @ma{k=8} will round @ma{N} up

6695

to a multiple of 32768 bits, so for a 32-bit limb there'll be 512 limb groups

6696

of sizes for which @code{mpn_mul_n} runs at the same speed. Or for @ma{k=9}

6697

groups of 2048 limbs, @ma{k=10} groups of 8192 limbs, etc. In practice it's

6698

been found each @ma{k} is used at quite small multiples of its size constraint

6699

and so the step effect is quite noticeable in a time versus size graph.
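
The group sizes quoted above can be reproduced with a short calculation. The
Python sketch below assumes, as an interpretation of the text, that @ma{N}
covers the bits of the full product, so that each @code{mpn_mul_n} operand is
half that size. The name @code{step_group_limbs} is hypothetical.

```python
def step_group_limbs(k, limb_bits=32):
    """Operand sizes, in limbs, sharing one FFT size for a given k."""
    bits = 1 << (2 * k - 1)             # N is a multiple of 2^(2k-1) bits
    product_limbs = bits // limb_bits
    return product_limbs // 2           # assumed: operands are half the product

groups = [step_group_limbs(k) for k in (8, 9, 10)]
```

For a 32-bit limb this gives groups of 512, 2048 and 8192 limbs for @ma{k=8},
9 and 10 respectively.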

6700

6701

The threshold determinations currently measure at the mid-points of size

6702

steps, but this is sub-optimal since at the start of a new step it can happen

6703

that it's better to go back to the previous @ma{k} for a while. Something

6704

more sophisticated for @code{FFT_MUL_TABLE} and @code{FFT_SQR_TABLE} will be

6705

needed.

6706

6707

6708

@node Other Multiplication, , FFT Multiplication, Multiplication Algorithms

6709

@subsection Other Multiplication

6710

6711

The 3-way Toom-Cook algorithm described above (@pxref{Toom-Cook 3-Way

6712

Multiplication}) generalizes to split into an arbitrary number of pieces, as

6713

per Knuth section 4.3.3 algorithm C. This is not currently used, though it's

6714

possible a Toom-4 might fit in between Toom-3 and the FFTs. The notes here

6715

are merely for interest.

6716

6717

In general a split into @ma{r+1} pieces is made, and evaluations and pointwise
multiplications done at @m{2r+1,2*r+1} points. A 4-way split does 7 pointwise
multiplies, 5-way does 9, etc. Asymptotically an @ma{(r+1)}-way algorithm is
@m{O(N^{\log(2r+1)/\log(r+1)}), O(N^(log(2*r+1)/log(r+1)))}. Only the pointwise
multiplications count towards big-@ma{O} complexity, but the time spent in the
evaluate and interpolate stages grows with @ma{r} and has a significant
practical impact, with the asymptotic advantage of each @ma{r} realized only
at bigger and bigger sizes. The overheads grow as @m{O(Nr),O(N*r)}, whereas
in an @ma{r=2^k} FFT they grow only as @m{O(N \log r), O(N*log(r))}.

6726

6727

Knuth algorithm C evaluates at points 0,1,2,@dots{},@m{2r,2*r}, but exercise 4

6728

uses @ma{-r},@dots{},0,@dots{},@ma{r} and the latter saves some small

6729

multiplies in the evaluate stage (or rather trades them for additions), and

6730

has a further saving of nearly half the interpolate steps. The idea is to

6731

separate odd and even final coefficients and then perform algorithm C steps C7

6732

and C8 on them separately. The divisors at step C7 become @ma{j^2} and the

6733

multipliers at C8 become @m{2tj-j^2,2*t*j-j^2}.

6734

6735

Splitting odd and even parts through positive and negative points can be

6736

thought of as using @ma{-1} as a square root of unity. If a 4th root of unity

6737

was available then a further split and speedup would be possible, but no such

6738

root exists for plain integers. Going to complex integers with

6739

@m{i=\sqrt{-1}, i=sqrt(-1)} doesn't help, essentially because in cartesian

6740

form it takes three real multiplies to do a complex multiply. The existence

6741

of @m{2^k,2^k'}th roots of unity in a suitable ring or field lets the fast

6742

fourier transform keep splitting and get to @m{O(N \log r), O(N*log(r))}.
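
The three-multiply complex product mentioned above can be written out as
follows. This is one well known arrangement, given as a Python sketch with
the hypothetical name @code{complex_mul3}; other arrangements with three
multiplies exist.

```python
def complex_mul3(a, b, c, d):
    """(a + b*i) * (c + d*i) using three real multiplies."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    real = k1 - k3          # a*c - b*d
    imag = k1 + k2          # a*d + b*c
    return real, imag
```

One multiply is saved over the obvious four, but three real multiplies per
complex multiply is still too many for the extra split to pay off.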

6743

6744

6745

@node Division Algorithms, Greatest Common Divisor Algorithms, Multiplication Algorithms, Algorithms

6746

@section Division Algorithms

6747

@cindex Division algorithms

6748

6749

@menu

6750

* Single Limb Division::

6751

* Basecase Division::

6752

* Divide and Conquer Division::

6753

* Exact Division::

6754

* Exact Remainder::

6755

* Small Quotient Division::

6756

@end menu

6757

6758

6759

@node Single Limb Division, Basecase Division, Division Algorithms, Division Algorithms

6760

@subsection Single Limb Division

6761

6762

N@cross{}1 division is implemented using repeated 2@cross{}1 divisions from

6763

high to low, either with a hardware divide instruction or a multiplication by

6764

inverse, whichever is best on a given CPU.
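
The high to low sequence of 2@cross{}1 divisions can be sketched in Python as
below. The name @code{divrem_1_sketch} and the list-of-limbs representation
(least significant limb first) are assumptions for the illustration, and
@code{divmod} stands in for whichever 2@cross{}1 division a CPU would use.

```python
def divrem_1_sketch(limbs, d, limb_bits=32):
    """Divide a limb list (least significant first) by a single limb d."""
    q = [0] * len(limbs)
    r = 0
    for i in reversed(range(len(limbs))):
        # 2x1 step: the running remainder r < d is the high limb,
        # the next dividend limb is the low limb.
        q[i], r = divmod((r << limb_bits) | limbs[i], d)
    return q, r
```

Since the remainder carried into each step is less than @ma{d}, every
quotient produced fits in a single limb.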

6765

6766

The multiply by inverse follows section 8 of ``Division by Invariant Integers
using Multiplication'' by Granlund and Montgomery (@pxref{References}) and is
implemented as @code{udiv_qrnnd_preinv} in @file{gmp-impl.h}. The idea is to
have a fixed-point approximation to @ma{1/d} (see @code{invert_limb}) and then
multiply by the high limb (plus one bit) of the dividend to get a quotient
@ma{q}. With @ma{d} normalized (high bit set), @ma{q} is no more than 1 too
small. Subtracting @m{qd,q*d} from the dividend gives a remainder, and
reveals whether @ma{q} or @ma{q+1} is correct.
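
A deliberately simplified Python sketch of a multiply-by-inverse 2@cross{}1
division follows. It is cruder than the Granlund and Montgomery method and is
not the code of @code{udiv_qrnnd_preinv}: it multiplies by the high limb
only, so the estimate can be a few units too small instead of at most one,
and the function names and the representation of the inverse are assumptions
made for the illustration.

```python
def invert_limb_sketch(d, limb_bits=32):
    """Fixed-point approximation to 1/d, scaled by B^2, for normalized d."""
    B = 1 << limb_bits
    assert B // 2 <= d < B              # high bit of d must be set
    return (B * B - 1) // d

def div_2by1_preinv(n1, n0, d, inv, limb_bits=32):
    """Divide the two-limb value n1*B + n0 by d, given n1 < d."""
    B = 1 << limb_bits
    assert n1 < d
    q = (n1 * inv) >> limb_bits         # estimate, never too large
    r = n1 * B + n0 - q * d
    while r >= d:                       # a few corrections at most
        q += 1
        r -= d
    return q, r
```

The inverse is computed once per divisor, so its cost is spread over all the
2@cross{}1 steps of an N@cross{}1 division.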

6774

6775

The result is a division done with two multiplications and four or five

6776

arithmetic operations. On CPUs with low latency multipliers this can be much

6777

faster than a hardware divide, though the cost of calculating the inverse at

6778

the start may mean it's only better on inputs bigger than say 4 or 5 limbs.

6779

6780

When a divisor must be normalized, either for the generic C

6781

@code{__udiv_qrnnd_c} or the multiply by inverse, the division performed is

6782

actually @m{a2^k,a*2^k} by @m{d2^k,d*2^k} where @ma{a} is the dividend and

6783

@ma{k} is the power necessary to have the high bit of @m{d2^k,d*2^k} set. The

6784

bit shifts for the dividend are usually accomplished ``on the fly'' meaning by

6785

extracting the appropriate bits at each step. Done this way the quotient

6786

limbs come out aligned ready to store. When only the remainder is wanted, an

6787

alternative is to take the dividend limbs unshifted and calculate @m{r = a

6788

\bmod d2^k, r = a mod d*2^k} followed by an extra final step @m{r2^k \bmod

6789

d2^k, r*2^k mod d*2^k}. This can help on CPUs with poor bit shifts or few

6790

registers.
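
Both ways of handling an unnormalized divisor can be checked with Python
integers. The sketch below uses hypothetical names and whole-number shifts
rather than the on-the-fly bit extraction described above. It assumes @ma{d}
is a positive single limb value.

```python
def div_normalized(a, d, limb_bits=32):
    """Divide a by d via the shifted operands a*2^k and d*2^k."""
    k = limb_bits - d.bit_length()      # shift that sets the high bit of d
    q, r_shifted = divmod(a << k, d << k)
    return q, r_shifted >> k            # the remainder comes out shifted

def mod_unshifted(a, d, limb_bits=32):
    """Remainder only: reduce the unshifted a, then one extra final step."""
    k = limb_bits - d.bit_length()
    dk = d << k
    r = a % dk                          # r = a mod d*2^k
    return ((r << k) % dk) >> k         # r*2^k mod d*2^k, shifted back down
```

The quotient is unchanged by the shift because numerator and denominator are
scaled by the same power of 2.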

6791

6792

The multiply by inverse can be done two limbs at a time. The calculation is

6793

basically the same, but the inverse is two limbs and the divisor treated as if

6794

padded with a low zero limb. This means more work, since the inverse will

6795

need a 2@cross{}2 multiply, but the four 1@cross{}1s to do that are

6796

independent and can therefore be done partly or wholly in parallel. Likewise

6797

for a 2@cross{}1 calculating @m{qd,q*d}. The net effect is to process two

6798

limbs with roughly the same two multiplies worth of latency that one limb at a

6799

time gives. This extends to 3 or 4 limbs at a time, though the extra work to

6800

apply the inverse will almost certainly soon reach the limits of multiplier

6801

throughput.

6802

6803

A similar approach in reverse can be taken to process just half a limb at a

6804

time if the divisor is only a half limb. In this case the 1@cross{}1 multiply

6805

for the inverse effectively becomes two @m{1\over2@cross{}1, (1/2)x1} for each

6806

limb, which can be a saving on CPUs with a fast half limb multiply, or in fact

6807

if the only multiply is a half limb, and especially if it's not pipelined.

6808

6809

6810

@node Basecase Division, Divide and Conquer Division, Single Limb Division, Division Algorithms

6811

@subsection Basecase Division

6812

6813

Basecase N@cross{}M division is like long division done by hand, but in base

6814

@m{2\GMPraise{@code{mp\_bits\_per\_limb}}, 2^mp_bits_per_limb}. See Knuth

6815

section 4.3.1 algorithm D, and @file{mpn/generic/sb_divrem_mn.c}.

6816

6817

Briefly stated, while the dividend remains larger than the divisor, a high
quotient limb is formed and the M@cross{}1 product @m{qd,q*d} subtracted at
the top end of the dividend. With a normalized divisor (most significant bit
set), each quotient limb can be formed with a 2@cross{}1 division and a
1@cross{}1 multiplication plus some subtractions. The 2@cross{}1 division is
by the high limb of the divisor and is done either with a hardware divide or a
multiply by inverse (the same as in @ref{Single Limb Division}) whichever is
faster. Such a quotient is sometimes one too big, requiring an addback of the
divisor, but that happens rarely.
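
The overall shape of the loop can be sketched in Python as follows. This is
an assumption-laden illustration with a hypothetical name, not the code in
@file{mpn/generic/sb_divrem_mn.c}: Python integers hold the operands, the
quotient limb is estimated from the top divisor limb alone, and so the
estimate can need more than one addback.

```python
def basecase_divrem(n, d, limb_bits=32):
    """Schoolbook division of n by d > 0 in base 2^limb_bits."""
    B = 1 << limb_bits
    # Normalize so the most significant bit of the top divisor limb is set.
    shift = -d.bit_length() % limb_bits
    n <<= shift
    d <<= shift
    m = d.bit_length() // limb_bits                     # divisor limbs
    d_high = d >> ((m - 1) * limb_bits)                 # top divisor limb
    n_limbs = max(-(-n.bit_length() // limb_bits), m)   # dividend limbs
    q = 0
    r = n
    for i in reversed(range(n_limbs - m + 1)):
        # 2x1 division of the top of the remainder by the top divisor limb.
        top = r >> ((i + m - 1) * limb_bits)
        qi = min(top // d_high, B - 1)
        # Subtract the product qi*d at the top end of the remainder.
        r -= (qi * d) << (i * limb_bits)
        while r < 0:                                    # too big: add back
            qi -= 1
            r += d << (i * limb_bits)
        q = (q << limb_bits) | qi
    return q, r >> shift
```

Each pass of the loop produces one quotient limb and does work proportional
to the divisor size.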

6826

6827

With Q=N@minus{}M being the number of quotient limbs, this is an

6828

@m{O(QM),O(Q*M)} algorithm and will run at a speed similar to a basecase

6829

Q@cross{}M multiplication, differing in fact only in the extra multiply and

6830

divide for each of the Q quotient limbs.

6831

6832

6833

@node Divide and Conquer Division, Exact Division, Basecase Division, Division Algorithms

6834

@subsection Divide and Conquer Division

6835

6836

For divisors larger than @code{DC_THRESHOLD}, division is done by dividing.

6837

Or to be precise by a recursive divide and conquer algorithm based on work by

6838

Moenck and Borodin, Jebelean, and Burnikel and Ziegler (@pxref{References}).

6839

6840

The algorithm consists essentially of recognising that a 2N@cross{}N division

6841

can be done with the basecase division algorithm (@pxref{Basecase Division}),

6842

but using N/2 limbs as a base, not just a single limb. This way the

6843

multiplications that arise are (N/2)@cross{}(N/2) and can take advantage of

6844

Karatsuba and higher multiplication algorithms (@pxref{Multiplication

6845

Algorithms}). The ``digits'' of the quotient are formed by recursive

6846

N@cross{}(N/2) divisions.

6847

6848

If the (N/2)@cross{}(N/2) multiplies are done with a basecase multiplication

6849

then the work is about the same as a basecase division, but with more function

6850

call overheads and with some subtractions separated from the multiplies.

6851

These overheads mean that it's only when N/2 is above

6852

@code{KARATSUBA_MUL_THRESHOLD} that divide and conquer is of use.

6853

6854

@code{DC_THRESHOLD} is based on the divisor size N, so it will be somewhere

6855

above twice @code{KARATSUBA_MUL_THRESHOLD}, but how much above depends on the

6856

CPU. An optimized @code{mpn_mul_basecase} can lower @code{DC_THRESHOLD} a

6857

little by offering a ready-made advantage over repeated @code{mpn_submul_1}

6858

calls.

6859

6860

Divide and conquer is asymptotically @m{O(M(N)\log N),O(M(N)*log(N))} where

6861

@ma{M(N)} is the time for an N@cross{}N multiplication done with FFTs. The

6862

actual time is a sum over multiplications of the recursed sizes, as can be

6863

seen near the end of section 2.2 of Burnikel and Ziegler. For example, within

6864

the Toom-3 range, divide and conquer is @m{2.63M(N), 2.63*M(N)}. With higher

6865

algorithms the @ma{M(N)} term improves and the multiplier tends to @m{\log N,

6866

log(N)}. In practice, at moderate to large sizes, a 2N@cross{}N division is

6867

about 2 to 4 times slower than an N@cross{}N multiplication.

6868

6869

Newton's method used for division is asymptotically @ma{O(M(N))} and should

6870

therefore be superior to divide and conquer, but it's believed this would only

6871

be for large to very large N.

6872

6873

6874

@node Exact Division, Exact Remainder, Divide and Conquer Division, Division Algorithms

6875

@subsection Exact Division

6876

6877

A so-called exact division is when the dividend is known to be an exact

6878

multiple of the divisor. Jebelean's exact division algorithm uses this

6879

knowledge to make some significant optimizations (@pxref{References}).

6880

6881

The idea can be illustrated in decimal for example with 368154 divided by

6882

543. Because the low digit of the dividend is 4, the low digit of the

6883

quotient must be 8. This is arrived at from @m{4 \mathord{\times} 7 \bmod 10,

6884

4*7 mod 10}, using the fact 7 is the modular inverse of 3 (the low digit of

6885

the divisor), since @m{3 \mathord{\times} 7 \mathop{\equiv} 1 \bmod 10, 3*7

6886

@equiv{} 1 mod 10}. So @m{8\mathord{\times}543 = 4344,8*543=4344} can be

6887

subtracted from the dividend leaving 363810. Notice the low digit has become

6888

zero.

6889

6890

The procedure is repeated at the second digit, with the next quotient digit 7
(@m{1 \mathord{\times} 7 \bmod 10, 1*7 mod 10}), subtracting
@m{7\mathord{\times}543 = 3801,7*543=3801}, leaving 325800. And finally at
the third digit with quotient digit 6 (@m{8 \mathord{\times} 7 \bmod 10, 8*7
mod 10}), subtracting @m{6\mathord{\times}543 = 3258,6*543=3258} leaving 0.
So the quotient is 678.
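
The decimal example can be replayed with the following Python sketch. The
name @code{exact_div_sketch} is hypothetical, the three-argument @code{pow}
with exponent @minus{}1 needs Python 3.8 or later, and for simplicity every
subtraction is done in full rather than being restricted to the low digits.

```python
def exact_div_sketch(a, d, base=10):
    """Quotient a/d, low digit first, for a known to be a multiple of d."""
    inv = pow(d % base, -1, base)       # modular inverse of the low digit of d
    q = 0
    place = 1
    while a != 0:
        if a < 0:
            raise ValueError("a is not a multiple of d")
        digit = (a % base) * inv % base # next quotient digit
        q += digit * place
        a = (a - digit * d) // base     # the low digit is now zero, drop it
        place *= base
    return q
```

Calling @code{exact_div_sketch(368154, 543)} produces the quotient digits 8,
7 and 6 in that order. With @code{base} set to a power of 2 the same steps
apply to limbs, provided the divisor is odd.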

6896

6897

Notice however that the multiplies and subtractions don't need to extend past

6898

the low three digits of the dividend, since that's enough to determine the

6899

three quotient digits. For the last quotient digit no subtraction is needed

6900

at all. On a 2N@cross{}N division like this one, only about half the work of

6901

a normal basecase division is necessary.

6902

6903

For an N@cross{}M exact division producing Q=N@minus{}M quotient limbs, the

6904

saving over a normal basecase division is in two parts. Firstly, each of the

6905

Q quotient limbs needs only one multiply, not a 2@cross{}1 divide and

6906

multiply. Secondly, the crossproducts are reduced when @ma{Q>M} to

6907

@m{QM-M(M+1)/2,Q*M-M*(M+1)/2}, or when @ma{Q@le{}M} to @m{Q(Q-1)/2,

6908

Q*(Q-1)/2}. Notice the savings are complementary. If Q is big then many

6909

divisions are saved, or if Q is small then the crossproducts reduce to a small

6910

number.

6911

6912

The modular inverse used is calculated efficiently by @code{modlimb_invert} in

6913

@file{gmp-impl.h}. This does four multiplies for a 32-bit limb, or six for a

6914

64-bit limb. @file{tune/modlinv.c} has some alternate implementations that

6915

might suit processors better at bit twiddling than multiplying.
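
The multiply counts quoted above are consistent with a Newton style iteration
that doubles the number of correct low bits at each step. The Python sketch
below is an illustration only: it assumes an 8-bit starting value, obtained
here with @code{pow} as a stand-in for a small table lookup, and the name
@code{modlimb_invert_sketch} is hypothetical.

```python
def modlimb_invert_sketch(d, limb_bits=32):
    """Inverse of odd d modulo 2^limb_bits, plus the multiply count."""
    assert d % 2 == 1
    mask = (1 << limb_bits) - 1
    inv = pow(d, -1, 256)               # assumed 8-bit starting inverse
    bits = 8
    multiplies = 0
    while bits < limb_bits:
        # If d*inv = 1 mod 2^bits then d*inv*(2 - d*inv) = 1 mod 2^(2*bits).
        inv = (inv * (2 - d * inv)) & mask
        bits *= 2
        multiplies += 2
    return inv, multiplies
```

Going from 8 to 16 to 32 bits takes two steps of two multiplies each, and one
more step reaches 64 bits.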

6916

6917

The sub-quadratic exact division described by Jebelean in ``Exact Division

6918

with Karatsuba Complexity'' is not currently implemented. It uses a

6919

rearrangement similar to the divide and conquer for normal division

6920

(@pxref{Divide and Conquer Division}), but operating from low to high. A

6921

further possibility not currently implemented is ``Bidirectional Exact Integer

6922

Division'' by Krandick and Jebelean which forms quotient limbs from both the

6923

high and low ends of the dividend, and can halve once more the number of

6924

crossproducts needed in a 2N@cross{}N division.

6925

6926

A special case exact division by 3 exists in @code{mpn_divexact_by3},

6927

supporting Toom-3 multiplication and @code{mpq} canonicalizations. It forms

6928

quotient digits with a multiply by the modular inverse of 3 (which is

6929

@code{0xAA..AAB}) and uses two comparisons to determine a borrow for the next

6930

limb. The multiplications don't need to be on the dependent chain, as long as

6931

the effect of the borrows is applied. Only a few optimized assembler

6932

implementations currently exist.
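
The multiply and two-comparison borrow scheme can be sketched in portable C as
follows. This is an illustration under assumptions, not the library's code:
the name @code{divexact_by3_sketch} is hypothetical, limbs are taken to be 64
bits, and they are stored least significant first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVERSE_3  UINT64_C(0xAAAAAAAAAAAAAAAB)  /* 3*INVERSE_3 == 1 mod 2^64 */
#define MAX_DIV_3  (UINT64_MAX / 3)

/* Hypothetical sketch: dst = src / 3 for an n-limb src.
   Returns the final borrow, which is 0 when src is a multiple of 3. */
uint64_t divexact_by3_sketch(uint64_t *dst, const uint64_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    uint64_t c = 0;                    /* borrow from the previous limb */
    for (size_t i = 0; i < n; i++) {
        uint64_t s = src[i];
        uint64_t l = s - c;            /* apply the incoming borrow */
        c = (l > s);                   /* set if that subtraction wrapped */
        uint64_t q = l * INVERSE_3;    /* quotient limb, q*3 == l mod 2^64 */
        dst[i] = q;
        c += (q > MAX_DIV_3);          /* q*3 is at least 2^64 */
        c += (q > 2 * MAX_DIV_3);      /* q*3 is at least 2*2^64 */
    }
    return c;
}
```

The multiply for limb @ma{i} depends on the previous limb only through the
small borrow @code{c}, which is what allows the multiplications to be kept off
the dependent chain in a tuned implementation.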

6933

6934

6935

@node Exact Remainder, Small Quotient Division, Exact Division, Division Algorithms

6936

@subsection Exact Remainder

6937

6938

If the exact division algorithm is done with a full subtraction at each stage
and the dividend isn't a multiple of the divisor, then low zero limbs are
produced but with a remainder in the high limbs. For dividend @ma{a}, divisor
@ma{d}, quotient @ma{q}, and @m{b = 2 \GMPraise{@code{mp\_bits\_per\_limb}}, b
= 2^mp_bits_per_limb}, this remainder @ma{r} is of the form

6943

@tex
$$ a = qd + r b^n $$
@end tex
@ifnottex

@example
a = q*d + r*b^n
@end example

@end ifnottex

6953

@ma{n} represents the number of zero limbs produced by the subtractions, that

6954

being the number of limbs produced for @ma{q}. @ma{r} will be in the range

6955

@ma{0@le{}r<d} and can be viewed as a remainder, but one shifted up by a

6956

factor of @ma{b^n}.

6957

6958

Carrying out full subtractions at each stage means the same number of cross

6959

products must be done as a normal division, but there's still some single limb

6960

divisions saved. When @ma{d} is a single limb some simplifications arise,

6961

providing good speedups on a number of processors.

6962

6963

@code{mpn_bdivmod}, @code{mpn_divexact_by3}, @code{mpn_modexact_1_odd} and the

6964

@code{redc} function in @code{mpz_powm} differ subtly in how they return

6965

@ma{r}, leading to some negations in the above formula, but all are

6966

essentially the same.

6967

6968

Clearly @ma{r} is zero when @ma{a} is a multiple of @ma{d}, and this leads to

6969

divisibility or congruence tests which are potentially more efficient than a

6970

normal division.
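
A divisibility test of this kind might be sketched as below for an odd single
limb divisor. Everything here is an assumption made for illustration: the
function names are hypothetical, limbs are 32 bits so that a two-limb product
fits a @code{uint64_t}, and the value returned is the negated form of @ma{r},
satisfying @ma{a = q*d - c*b^n}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: inverse of odd d modulo 2^32 by Newton iteration. */
static uint32_t invert32_sketch(uint32_t d)
{
    uint32_t x = d;                    /* correct to 3 bits */
    for (int i = 0; i < 4; i++)        /* 3 -> 6 -> 12 -> 24 -> 48 bits */
        x *= 2 - d * x;
    return x;
}

/* Hypothetical sketch of an exact remainder by an odd single limb d.
   Returns c with 0 <= c < d and a = q*d - c*b^n, where b = 2^32. */
uint32_t modexact_odd_sketch(const uint32_t *a, size_t n, uint32_t d)
{
    assert(d & 1);
    uint32_t inv = invert32_sketch(d);
    uint32_t c = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s = a[i];
        uint32_t l = s - c;            /* full subtraction of the carry */
        uint32_t borrow = (s < c);
        uint32_t q = l * inv;          /* quotient limb: q*d == l mod b */
        uint32_t h = (uint32_t)(((uint64_t)q * d) >> 32);  /* high limb of q*d */
        c = borrow + h;                /* carried into the next limb */
    }
    return c;
}

/* a is a multiple of d exactly when the shifted remainder is zero. */
int divisible_by_odd_sketch(const uint32_t *a, size_t n, uint32_t d)
{
    return modexact_odd_sketch(a, n, d) == 0;
}
```

Only one multiply for the quotient limb and one high-half multiply are needed
per limb, with no division instruction anywhere in the loop.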

6971

6972

The factor of @ma{b^n} on @ma{r} can be ignored in a GCD when @ma{d} is odd,

6973

hence the use of @code{mpn_bdivmod} in @code{mpn_gcd}, and the use of

6974

@code{mpn_modexact_1_odd} by @code{mpn_gcd_1} and @code{mpz_kronecker_ui} etc

6975

(@pxref{Greatest Common Divisor Algorithms}).

6976

6977

Montgomery's REDC method for modular multiplications uses operands of the form
@m{xb^{-n}, x*b^-n} and @m{yb^{-n}, y*b^-n}, and on calculating @m{(xb^{-n})
(yb^{-n}), (x*b^-n)*(y*b^-n)} uses the factor of @ma{b^n} in the exact
remainder to reach a product in the same form @m{(xy)b^{-n},
(x*y)*b^-n} (@pxref{Modular Powering Algorithm}).

6982

6983

Notice that @ma{r} generally gives no useful information about the ordinary

6984

remainder @ma{a @bmod d} since @ma{b^n @bmod d} could be anything. If however

6985

@ma{b^n @equiv{} 1 @bmod d}, then @ma{r} is the negative of the ordinary

6986

remainder. This occurs whenever @ma{d} is a factor of @ma{b^n-1}, as for

6987

example with 3 in @code{mpn_divexact_by3}. Other such factors include 5, 17

6988

and 257, but no particular use has been found for this.

6989

6990

6991

@node Small Quotient Division, , Exact Remainder, Division Algorithms

6992

@subsection Small Quotient Division

6993

6994

An N@cross{}M division where the number of quotient limbs Q=N@minus{}M is

6995

small can be optimized somewhat.

6996

6997

An ordinary basecase division normalizes the divisor by shifting it to make

6998

the high bit set, shifting the dividend accordingly, and shifting the

6999

remainder back down at the end of the calculation. This is wasteful if only a

7000

few quotient limbs are to be formed. Instead a division of just the top

7001

@m{\rm2Q,2*Q} limbs of the dividend by the top Q limbs of the divisor can be

7002

used to form a trial quotient. This requires only those limbs normalized, not

7003

the whole of the divisor and dividend.

7004

7005

A multiply and subtract then applies the trial quotient to the M@minus{}Q

7006

unused limbs of the divisor and N@minus{}Q dividend limbs (which includes Q

7007

limbs remaining from the trial quotient division). The starting trial

7008

quotient can be 1 or 2 too big, but all cases of 2 too big and most cases of 1

7009

too big are detected by first comparing the most significant limbs that will

7010

arise from the subtraction. An addback is done if the quotient still turns

7011

out to be 1 too big.

7012

7013

This whole procedure is essentially the same as one step of the basecase

7014

algorithm done in a Q limb base, though with the trial quotient test done only

7015

with the high limbs, not an entire Q limb ``digit'' product. The correctness

7016

of this weaker test can be established by following the argument of Knuth

7017

section 4.3.1 exercise 20 but with the @m{v_2 \GMPhat q > b \GMPhat r

7018

+ u_2, v2*q>b*r+u2} condition appropriately relaxed.

7019

7020

7021

@need 1000

7022

@node Greatest Common Divisor Algorithms, Powering Algorithms, Division Algorithms, Algorithms

7023

@section Greatest Common Divisor

7024

@cindex Greatest common divisor algorithms

7025

7026

@menu

7027

* Binary GCD::

7028

* Accelerated GCD::

7029

* Extended GCD::

7030

* Jacobi Symbol::

7031

@end menu

7032

7033

7034

@node Binary GCD, Accelerated GCD, Greatest Common Divisor Algorithms, Greatest Common Divisor Algorithms

7035

@subsection Binary GCD

7036

7037

At small sizes GMP uses an @ma{O(N^2)} binary style GCD. This is described in

7038

many textbooks, for example Knuth section 4.5.2 algorithm B. It simply

7039

consists of successively reducing operands @ma{a} and @ma{b} using

7040

@ma{@gcd{}(a,b) = @gcd{}(@min{}(a,b),@abs{}(a-b))}, and also that if @ma{a}

7041

and @ma{b} are first made odd then @ma{@abs{}(a-b)} is even and factors of two

7042

can be discarded.
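
For a single limb the reduction just described can be sketched as follows.
This is a minimal illustration with a hypothetical function name, not the
multi-limb code used by @code{mpn_gcd}.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-limb sketch of the binary GCD. */
uint64_t binary_gcd_sketch(uint64_t a, uint64_t b)
{
    if (a == 0) return b;
    if (b == 0) return a;

    int shift = 0;                       /* common factors of two */
    while (((a | b) & 1) == 0) {
        a >>= 1;
        b >>= 1;
        shift++;
    }
    while ((a & 1) == 0) a >>= 1;        /* make both operands odd */
    while ((b & 1) == 0) b >>= 1;

    while (a != b) {
        if (a > b) {
            a -= b;                      /* odd - odd is even and non-zero */
            do a >>= 1; while ((a & 1) == 0);
        } else {
            b -= a;
            do b >>= 1; while ((b & 1) == 0);
        }
    }
    assert(a & 1);                       /* the odd part of the GCD */
    return a << shift;
}
```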

7043

7044

Variants like letting @ma{a-b} become negative and doing a different next step

7045

are of interest only as far as they suit particular CPUs, since on small

7046

operands it's machine dependent factors that determine performance.

7047

7048

The Euclidean GCD algorithm, as per Knuth algorithms E and A, reduces using

7049

@ma{a @bmod b} but this has so far been found to be slower everywhere. One

7050

reason the binary method does well is that the implied quotient at each step

7051

is usually small, so often only one or two subtractions are needed to get the

7052

same effect as a division. Quotients 1, 2 and 3 for example occur 67.7% of
the time (see Knuth section 4.5.3 Theorem E).

7054

7055

When the implied quotient is large, meaning @ma{b} is much smaller than

7056

@ma{a}, then a division is worthwhile. This is the basis for the initial

7057

@ma{a @bmod b} reductions in @code{mpn_gcd} and @code{mpn_gcd_1} (the latter

7058

for both N@cross{}1 and 1@cross{}1 cases). But after that initial reduction,

7059

big quotients occur too rarely to make it worth checking for them.

7060

7061

7062

@node Accelerated GCD, Extended GCD, Binary GCD, Greatest Common Divisor Algorithms

7063

@subsection Accelerated GCD

7064

7065

For sizes above @code{GCD_ACCEL_THRESHOLD}, GMP uses the Accelerated GCD

7066

algorithm described independently by Weber and Jebelean (the latter as the

7067

``Generalized Binary'' algorithm), @pxref{References}. This algorithm is

7068

still @ma{O(N^2)}, but is much faster than the binary algorithm since it does

7069

fewer multi-precision operations. It consists of alternating the @ma{k}-ary

7070

reduction by Sorenson, and a ``dmod'' exact remainder reduction.

7071

7072

For operands @ma{u} and @ma{v} the @ma{k}-ary reduction replaces @ma{u} with

7073

@m{nv-du,n*v-d*u} where @ma{n} and @ma{d} are single limb values chosen to

7074

give two trailing zero limbs on that value, which can be stripped. @ma{n} and

7075

@ma{d} are calculated using an algorithm similar to half of a two limb GCD

7076

(see @code{find_a} in @file{mpn/generic/gcd.c}).

7077

7078

When @ma{u} and @ma{v} differ in size by more than a certain number of bits, a

7079

dmod is performed to zero out bits at the low end of the larger. It consists

7080

of an exact remainder style division applied to an appropriate number of bits

7081

(@pxref{Exact Division}, and @pxref{Exact Remainder}). This is faster than a

7082

@ma{k}-ary reduction but useful only when the operands differ in size.

7083

There's a dmod after each @ma{k}-ary reduction, and if the dmod leaves the

7084

operands still differing in size then it's repeated.

7085

7086

The @ma{k}-ary reduction step can introduce spurious factors into the GCD

7087

calculated, and these are eliminated at the end by taking GCDs with the

7088

original inputs @ma{@gcd{}(u,@gcd{}(v,g))} using the binary algorithm. Since

7089

@ma{g} is almost always small this takes very little time.

7090

7091

At small sizes the algorithm needs a good implementation of @code{find_a}. At

7092

larger sizes it's dominated by @code{mpn_addmul_1} applying @ma{n} and @ma{d}.

7093

7094

7095

@node Extended GCD, Jacobi Symbol, Accelerated GCD, Greatest Common Divisor Algorithms

7096

@subsection Extended GCD

7097

7098

The extended GCD calculates @ma{@gcd{}(a,b)} and also cofactors @ma{x} and

7099

@ma{y} satisfying @m{ax+by=\gcd(a@C{}b), a*x+b*y=gcd(a@C{}b)}. Lehmer's

7100

multi-step improvement of the extended Euclidean algorithm is used. See Knuth

7101

section 4.5.2 algorithm L, and @file{mpn/generic/gcdext.c}. This is an

7102

@ma{O(N^2)} algorithm.

7103

7104

The multipliers at each step are found using single limb calculations for

7105

sizes up to @code{GCDEXT_THRESHOLD}, or double limb calculations above that.

7106

The single limb code is faster but doesn't produce full-limb multipliers,

7107

hence not making full use of the @code{mpn_addmul_1} calls.

7108

7109

When a CPU has a data-dependent multiplier, meaning one which is faster on

7110

operands with fewer bits, the extra work in the double-limb calculation might

7111

only save some looping overheads, leading to a large @code{GCDEXT_THRESHOLD}.

7112

7113

Currently the single limb calculation doesn't optimize for the small quotients

7114

that often occur, and this can lead to unusually low values of

7115

@code{GCDEXT_THRESHOLD}, depending on the CPU.

7116

7117

An analysis of double-limb calculations can be found in ``A Double-Digit

7118

Lehmer-Euclid Algorithm'' by Jebelean (@pxref{References}). The code in GMP

7119

was developed independently.

7120

7121

It should be noted that when a double limb calculation is used, it's used for
the whole of that GCD; it doesn't fall back to single limb part way through.
This is because as the algorithm proceeds, the inputs @ma{a} and @ma{b} are
reduced, but the cofactors @ma{x} and @ma{y} grow, so the multipliers at each
step are applied to a roughly constant total number of limbs.

7126

7127

7128

@node Jacobi Symbol, , Extended GCD, Greatest Common Divisor Algorithms

7129

@subsection Jacobi Symbol

7130

7131

@code{mpz_jacobi} and @code{mpz_kronecker} are currently implemented with a

7132

simple binary algorithm similar to that described for the GCDs (@pxref{Binary

7133

GCD}). They're not very fast when both inputs are large. Lehmer's multi-step

7134

improvement or a binary based multi-step algorithm is likely to be better.

7135

7136

When one operand fits a single limb, and that includes @code{mpz_kronecker_ui}

7137

and friends, an initial reduction is done with either @code{mpn_mod_1} or

7138

@code{mpn_modexact_1_odd}, followed by the binary algorithm on a single limb.

7139

The binary algorithm is well suited to a single limb, and the whole

7140

calculation in this case is quite efficient.

7141

7142

In all the routines sign changes for the result are accumulated using some bit

7143

twiddling, avoiding table lookups or conditional jumps.
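
One possible form of such sign accumulation is sketched below for single limb
operands. The function name and the choice of bit 1 as the sign holder are
assumptions made for this illustration. The two bit tricks rely on
@ma{(2/n) = -1} exactly when @ma{n} is 3 or 5 mod 8, and on the reciprocity
sign flipping exactly when both operands are 3 mod 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: Jacobi symbol (a/n) for odd n, single limb.
   Bit 1 of sign is set when the result so far is negated. */
int jacobi_sketch(uint64_t a, uint64_t n)
{
    assert(n & 1);
    unsigned sign = 0;
    a %= n;                                    /* initial reduction */
    while (a != 0) {
        while ((a & 1) == 0) {
            a >>= 1;
            sign ^= (unsigned)(n ^ (n >> 1));  /* bit 1 set iff n = 3,5 mod 8 */
        }
        if (a < n) {
            uint64_t t = a; a = n; n = t;      /* reciprocity swap */
            sign ^= (unsigned)(a & n);         /* bit 1 set iff both 3 mod 4 */
        }
        a -= n;                                /* both odd, so a is now even */
    }
    return n == 1 ? 1 - (int)(sign & 2) : 0;   /* 0 when a common factor exists */
}
```

The swap still uses a comparison, so this sketch shows only the sign handling
without jumps, not a fully branch-free loop.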

7144

7145

7146

@need 1000

7147

@node Powering Algorithms, Root Extraction Algorithms, Greatest Common Divisor Algorithms, Algorithms

7148

@section Powering Algorithms

7149

@cindex Powering algorithms

7150

7151

@menu

7152

* Normal Powering Algorithm::

7153

* Modular Powering Algorithm::

7154

@end menu

7155

7156

7157

@node Normal Powering Algorithm, Modular Powering Algorithm, Powering Algorithms, Powering Algorithms

7158

@subsection Normal Powering

7159

7160

Normal @code{mpz} or @code{mpf} powering uses a simple binary algorithm,

7161

successively squaring and then multiplying by the base when a 1 bit is seen in

7162

the exponent, as per Knuth section 4.6.3. The ``left to right''

7163

variant described there is used rather than algorithm A, since it's just as

7164

easy and can be done with somewhat less temporary memory.
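
The left to right scan can be sketched on a single limb as follows, with the
result taken modulo @ma{2^64}. The function name is hypothetical and the code
is only an illustration of the bit scanning order.

```c
#include <stdint.h>

/* Hypothetical sketch: base^exp modulo 2^64, scanning the exponent
   bits from the most significant end. */
uint64_t pow_left_to_right_sketch(uint64_t base, unsigned exp)
{
    unsigned mask = 1;
    while (mask <= exp / 2)          /* find the highest set bit of exp */
        mask <<= 1;

    uint64_t r = 1;
    for (; mask != 0; mask >>= 1) {
        r *= r;                      /* square for every bit */
        if (exp & mask)
            r *= base;               /* multiply when the bit is 1 */
    }
    return r;
}
```

Only the running result @code{r} and the unchanged base are live at any time,
which is where the saving in temporary memory comes from.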

7165

7166

7167

@node Modular Powering Algorithm, , Normal Powering Algorithm, Powering Algorithms

7168

@subsection Modular Powering

7169

7170

Modular powering is implemented using a @ma{2^k}-ary sliding window algorithm,

7171

as per ``Handbook of Applied Cryptography'' algorithm 14.85

7172

(@pxref{References}). @ma{k} is chosen according to the size of the exponent.

7173

Larger exponents use larger values of @ma{k}, the choice being made to

7174

minimize the average number of multiplications that must supplement the

7175

squaring.

7176

7177

The modular multiplies and squares use either a simple division or the REDC

7178

method by Montgomery (@pxref{References}). REDC is a little faster,

7179

essentially saving N single limb divisions in a fashion similar to an exact

7180

remainder (@pxref{Exact Remainder}). The current REDC has some limitations.

7181

It's only @ma{O(N^2)} so above @code{POWM_THRESHOLD} division becomes faster

7182

and is used. It doesn't attempt to detect small bases, but rather always uses

7183

a REDC form, which is usually a full size operand. And lastly it's only

7184

applied to odd moduli.
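
A single limb REDC can be sketched as below. This is an illustration under
several assumptions, not the code in @code{mpz_powm}: all names are
hypothetical, limbs are 32 bits, the modulus is restricted to odd values below
@ma{2^31} so the intermediate sum fits 64 bits, and operands are kept in the
common textbook form multiplied by @ma{b = 2^32}.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: -1/n modulo 2^32 for odd n. */
static uint32_t neg_invert32_sketch(uint32_t n)
{
    uint32_t x = n;                    /* correct to 3 bits */
    for (int i = 0; i < 4; i++)
        x *= 2 - n * x;
    return 0 - x;
}

/* REDC: for t < n*2^32 return t / 2^32 mod n.  The low limb of
   t + m*n is zero by construction, so the shift is an exact division. */
static uint32_t redc_sketch(uint64_t t, uint32_t n, uint32_t nneg)
{
    uint32_t m = (uint32_t)t * nneg;
    uint64_t u = (t + (uint64_t)m * n) >> 32;
    return (uint32_t)(u >= n ? u - n : u);
}

/* x*y mod n through Montgomery form; needs odd n < 2^31 and x, y < n. */
uint32_t mulmod_redc_sketch(uint32_t x, uint32_t y, uint32_t n)
{
    assert((n & 1) && n < UINT32_C(0x80000000) && x < n && y < n);
    uint32_t nneg = neg_invert32_sketch(n);
    uint64_t xm = ((uint64_t)x << 32) % n;         /* x*b mod n */
    uint64_t ym = ((uint64_t)y << 32) % n;         /* y*b mod n */
    uint32_t pm = redc_sketch(xm * ym, n, nneg);   /* (x*y)*b mod n */
    return redc_sketch(pm, n, nneg);               /* leave Montgomery form */
}
```

In a powering loop the conversions in and out would be done once each, with
every multiply and square in between costing one @code{redc_sketch} and no
division.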

7185

7186

7187

@node Root Extraction Algorithms, Radix Conversion Algorithms, Powering Algorithms, Algorithms

7188

@section Root Extraction Algorithms

7189

@cindex Root extraction algorithms

7190

7191

@menu

7192

* Square Root Algorithm::

7193

* Nth Root Algorithm::

7194

* Perfect Square Algorithm::

7195

* Perfect Power Algorithm::

7196

@end menu

7197

7198

7199

@node Square Root Algorithm, Nth Root Algorithm, Root Extraction Algorithms, Root Extraction Algorithms

7200

@subsection Square Root

7201

7202

Square roots are taken using the ``Karatsuba Square Root'' algorithm by Paul

7203

Zimmermann (@pxref{References}). This is expressed in a divide and conquer

7204

form, but as noted in the paper it can also be viewed as a discrete variant of

7205

Newton's method.

7206

7207

In the Karatsuba multiplication range this is an @m{O({3\over2}

7208

M(N/2)),O(1.5*M(N/2))} algorithm, where @ma{M(n)} is the time to multiply two

7209

numbers of @ma{n} limbs. In the FFT multiplication range this grows to a

7210

bound of @m{O(6 M(N/2)),O(6*M(N/2))}. In practice a factor of about 1.5 to

7211

1.8 is found in the Karatsuba and Toom-3 ranges, growing to 2 or 3 in the FFT

7212

range.

7213

7214

The algorithm does all its calculations in integers and the resulting

7215

@code{mpn_sqrtrem} is used for both @code{mpz_sqrt} and @code{mpf_sqrt}.

7216

The extended precision given by @code{mpf_sqrt_ui} is obtained by

7217

padding with zero limbs.

7218

7219

7220

@node Nth Root Algorithm, Perfect Square Algorithm, Square Root Algorithm, Root Extraction Algorithms

7221

@subsection Nth Root

7222

7223

Integer Nth roots are taken using Newton's method with the following

7224

iteration, where @ma{A} is the input and @ma{n} is the root to be taken.

7225

@tex
$$a_{i+1} = {1\over n} \left({A \over a_i^{n-1}} + (n-1)a_i \right)$$
@end tex
@ifnottex

@example
         1         A
a[i+1] = - * ( --------- + (n-1)*a[i] )
         n     a[i]^(n-1)
@end example

@end ifnottex

7237

The initial approximation @m{a_1,a[1]} is generated bitwise by successively

7238

powering a trial root with or without new 1 bits, aiming to be just above the

7239

true root. The iteration converges quadratically when started from a good

7240

approximation. When @ma{n} is large more initial bits are needed to get good

7241

convergence. The current implementation is not particularly well optimized.
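
On a single limb the iteration can be sketched as follows. The names are
hypothetical, and the starting value is simply a power of two at or above the
true root, which is an assumption of this sketch rather than the bitwise
method described above. Starting from above, the integer iteration decreases
until it reaches the floor of the root.

```c
#include <assert.h>
#include <stdint.h>

/* floor(A / x^k) without forming x^k, using nested floor divisions. */
static uint64_t div_by_power_sketch(uint64_t A, uint64_t x, unsigned k)
{
    while (k-- > 0 && A != 0)
        A /= x;
    return A;
}

/* Hypothetical sketch: floor of the n-th root of A, for n >= 1. */
uint64_t nth_root_sketch(uint64_t A, unsigned n)
{
    assert(n >= 1);
    if (n == 1 || A < 2)
        return A;

    unsigned bits = 0;                       /* bit length of A */
    for (uint64_t t = A; t != 0; t >>= 1)
        bits++;

    /* 2^ceil(bits/n) is at or above the true root. */
    uint64_t a = (uint64_t)1 << (bits / n + (bits % n != 0));

    for (;;) {
        uint64_t next = ((uint64_t)(n - 1) * a
                         + div_by_power_sketch(A, a, n - 1)) / n;
        if (next >= a)
            return a;                        /* no longer decreasing: done */
        a = next;
    }
}
```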

7242

7243

7244

@node Perfect Square Algorithm, Perfect Power Algorithm, Nth Root Algorithm, Root Extraction Algorithms

7245

@subsection Perfect Square

7246

7247

@code{mpz_perfect_square_p} is able to quickly exclude most non-squares by

7248

checking whether the input is a quadratic residue modulo some small integers.

7249

7250

The first test is modulo 256 which means simply examining the least

7251

significant byte. Only 44 different values occur as the low byte of a square,

7252

so 82.8% of non-squares can be immediately excluded. Similar tests modulo

7253

primes from 3 to 29 exclude 99.5% of those remaining, or if a limb is 64 bits

7254

then primes up to 53 are used, excluding 99.99%. A single N@cross{}1

7255

remainder using @code{PP} from @file{gmp-impl.h} quickly gives all these

7256

remainders.
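
The low byte test can be illustrated with the following sketch, which also
confirms the count of 44 residues. The function names are hypothetical, and a
real implementation would presumably use a precomputed table rather than the
search loop shown here.

```c
#include <stdint.h>

/* 1 if byte is the low byte of some square, else 0. */
int is_square_low_byte_sketch(unsigned byte)
{
    for (unsigned i = 0; i < 256; i++)
        if (((i * i) & 0xFF) == (byte & 0xFF))
            return 1;
    return 0;
}

/* Number of distinct low bytes that squares can have. */
int count_square_low_bytes_sketch(void)
{
    int count = 0;
    for (unsigned b = 0; b < 256; b++)
        count += is_square_low_byte_sketch(b);
    return count;
}

/* Cheap filter: 0 means x is certainly not a square, 1 means "maybe". */
int maybe_square_sketch(uint64_t x)
{
    return is_square_low_byte_sketch((unsigned)(x & 0xFF));
}
```

With 44 of the 256 byte values possible, the remaining 212, or 82.8%, are
rejected at once.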

7257

7258

A square root must still be taken for any value that passes the residue tests,

7259

to verify it's really a square and not one of the 0.086% (or 0.000156% for 64

7260

bits) non-squares that get through. @xref{Square Root Algorithm}.

7261

7262

7263

@node Perfect Power Algorithm, , Perfect Square Algorithm, Root Extraction Algorithms

7264

@subsection Perfect Power

7265

7266

Detecting perfect powers is required by some factorization algorithms.

7267

Currently @code{mpz_perfect_power_p} is implemented using repeated Nth root

7268

extractions, though naturally only prime roots need to be considered.

7269

(@xref{Nth Root Algorithm}.)

7270

7271

If a prime divisor @ma{p} with multiplicity @ma{e} can be found, then only

7272

roots which are divisors of @ma{e} need to be considered, much reducing the

7273

work necessary. To this end divisibility by a set of small primes is checked.

7274

7275

7276

@node Radix Conversion Algorithms, Other Algorithms, Root Extraction Algorithms, Algorithms

7277

@section Radix Conversion

7278

@cindex Radix conversion algorithms

7279

7280

Radix conversions are less important than other algorithms. A program

7281

dominated by conversions should probably use a different data representation.

7282

7283

@menu

7284

* Binary to Radix::

7285

* Radix to Binary::

7286

@end menu

7287

7288

7289

@node Binary to Radix, Radix to Binary, Radix Conversion Algorithms, Radix Conversion Algorithms

7290

@subsection Binary to Radix

7291

7292

Conversions from binary to a power-of-2 radix use a simple and fast @ma{O(N)}

7293

bit extraction algorithm.

7294

7295

Conversions from binary to other radices use repeated divisions, first by the

7296

biggest power of the radix that fits in a single limb, then by the radix on

7297

the remainders. This is an @ma{O(N^2)} algorithm and can be quite

7298

time-consuming on large inputs.

7299

7300

7301

@node Radix to Binary, , Binary to Radix, Radix Conversion Algorithms

7302

@subsection Radix to Binary

7303

7304

Conversions from a power-of-2 radix into binary use a simple and fast

7305

@ma{O(N)} bitwise concatenation algorithm.

7306

7307

Conversions from other radices use repeated multiplications, first

7308

accumulating as many digits as fit in a limb, then doing an N@cross{}1

7309

multi-precision multiplication. This is @ma{O(N^2)} and is certainly

7310

sub-optimal on sizes above the Karatsuba multiply threshold.
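
For decimal input the scheme can be sketched as below. The function name, the
32-bit limb size and the fixed chunk of nine digits (the most that always fit
a 32-bit limb) are assumptions of this illustration, and the limbs are stored
least significant first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: convert a decimal string to 32-bit limbs.
   Returns the number of limbs used, at most cap. */
size_t decimal_to_limbs_sketch(const char *s, uint32_t *limbs, size_t cap)
{
    size_t n = 0;
    while (*s != '\0') {
        /* Accumulate up to 9 digits in a single limb. */
        uint32_t chunk = 0, scale = 1;
        for (int k = 0; k < 9 && *s != '\0'; k++, s++) {
            assert(*s >= '0' && *s <= '9');
            chunk = chunk * 10 + (uint32_t)(*s - '0');
            scale *= 10;
        }
        /* N x 1 multiply of the number so far, adding in the chunk. */
        uint64_t carry = chunk;
        for (size_t i = 0; i < n; i++) {
            uint64_t t = (uint64_t)limbs[i] * scale + carry;
            limbs[i] = (uint32_t)t;
            carry = t >> 32;
        }
        if (carry != 0) {
            assert(n < cap);
            limbs[n++] = (uint32_t)carry;
        }
    }
    return n;
}
```

Each chunk costs one pass over all the limbs accumulated so far, which is the
source of the @ma{O(N^2)} behaviour.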

7311

7312

7313

@need 1000

7314

@node Other Algorithms, Assembler Coding, Radix Conversion Algorithms, Algorithms

7315

@section Other Algorithms

7316

7317

@menu

7318

* Factorial Algorithm::

7319

* Binomial Coefficients Algorithm::

7320

* Fibonacci Numbers Algorithm::

7321

* Lucas Numbers Algorithm::

7322

@end menu

7323

7324

7325

@node Factorial Algorithm, Binomial Coefficients Algorithm, Other Algorithms, Other Algorithms

7326

@subsection Factorial

7327

7328

Factorials @ma{n!} are calculated by a simple product from @ma{1} to @ma{n},

7329

but arranged into certain sub-products.

7330

7331

First as many factors as fit in a limb are accumulated, then two of those

7332

multiplied to give a 2-limb product. When two 2-limb products are ready

7333

they're multiplied to a 4-limb product, and when two 4-limbs are ready they're

7334

multiplied to an 8-limb product, etc. A stack of outstanding products is

7335

built up, with two of the same size multiplied together when ready.
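
The stack discipline can be illustrated on a single limb as follows. This is
only a sketch with a hypothetical function name: a level counter stands in for
the operand size, each factor starts at level 0, and @ma{n} is limited to 20
so the result fits 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: n! built from a stack of balanced sub-products. */
uint64_t factorial_balanced_sketch(unsigned n)
{
    assert(n <= 20);                 /* 20! is the largest fitting 64 bits */
    uint64_t stack[8];
    unsigned level[8];
    int top = 0;

    for (unsigned i = 2; i <= n; i++) {
        stack[top] = i;              /* a new smallest-size product */
        level[top] = 0;
        top++;
        /* Merge while the two topmost products are the same size. */
        while (top >= 2 && level[top - 1] == level[top - 2]) {
            stack[top - 2] *= stack[top - 1];
            level[top - 2]++;
            top--;
        }
    }

    uint64_t r = 1;                  /* fold whatever is left over */
    while (top > 0)
        r *= stack[--top];
    return r;
}
```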

7336

7337

Arranging for multiplications to have operands the same (or nearly the same)

7338

size means the Karatsuba and higher multiplication algorithms can be used.

7339

And even on sizes below the Karatsuba threshold an N@cross{}N multiply will

7340

give a basecase multiply more to work on.

7341

7342

An obvious improvement not currently implemented would be to strip factors of

7343

2 from the products and apply them at the end with a bit shift. Another

7344

possibility would be to determine the prime factorization of the result (which

7345

can be done easily), and use a powering method, at each stage squaring then

7346

multiplying in those primes with a 1 in their exponent at that point. The

7347

advantage would be some multiplies turned into squares.

7348

7349

7350

@node Binomial Coefficients Algorithm, Fibonacci Numbers Algorithm, Factorial Algorithm, Other Algorithms

7351

@subsection Binomial Coefficients

7352

7353

Binomial coefficients @m{\left({n}\atop{k}\right), C(n@C{}k)} are calculated

7354

by first arranging @ma{k @le{} n/2} using @m{\left({n}\atop{k}\right) =

7355

\left({n}\atop{n-k}\right), C(n@C{}k) = C(n@C{}n-k)} if necessary, and then

7356

evaluating the following product simply from @ma{i=2} to @ma{i=k}.

7357

@tex
$$ \left({n}\atop{k}\right) = (n-k+1) \prod_{i=2}^{k} {{n-k+i} \over i} $$
@end tex
@ifnottex

@example
                      k  (n-k+i)
C(n,k) =  (n-k+1) * prod -------
                     i=2    i
@end example

@end ifnottex

7369

It's easy to show that each denominator @ma{i} will divide the product so far,

7370

so the exact division algorithm is used (@pxref{Exact Division}).
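
The product can be sketched on a single limb as below, with a plain division
standing in for the exact division and an assertion checking that each
division really is exact. The function name is hypothetical, and the caller is
assumed to choose @ma{n} and @ma{k} small enough that no intermediate product
overflows 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: C(n,k) by the running product above. */
uint64_t binomial_sketch(unsigned n, unsigned k)
{
    assert(k <= n);
    if (k > n - k)
        k = n - k;                       /* arrange k <= n/2 */
    if (k == 0)
        return 1;

    uint64_t r = (uint64_t)(n - k) + 1;
    for (unsigned i = 2; i <= k; i++) {
        r *= (uint64_t)(n - k) + i;      /* numerator n-k+i */
        assert(r % i == 0);              /* r/i is C(n-k+i, i), an integer */
        r /= i;
    }
    return r;
}
```

After step @ma{i} the running value is @ma{C(n-k+i,i)}, which is why each
denominator divides the product so far.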

7371

7372

The numerators @ma{n-k+i} and denominators @ma{i} are first accumulated into
single limbs, as many factors as fit, to save multi-precision operations. For
@code{mpz_bin_ui} this applies only to the divisors, since @ma{n} is an
@code{mpz_t} and @ma{n-k+i} in general won't fit in a limb at all.

7376

7377

An obvious improvement would be to strip factors of 2 from each multiplier and

7378

divisor and count them separately, to be applied with a bit shift at the end.

7379

Factors of 3 and perhaps 5 could even be handled similarly. Another

7380

possibility, if @ma{n} is not too big, would be to determine the prime

7381

factorization of the result based on the factorials involved, and power up

7382

those primes appropriately. This would help most when @ma{k} is near

7383

@ma{n/2}.

7384

7385

7386

@node Fibonacci Numbers Algorithm, Lucas Numbers Algorithm, Binomial Coefficients Algorithm, Other Algorithms

7387

@subsection Fibonacci Numbers

7388

7389

The Fibonacci functions @code{mpz_fib_ui} and @code{mpz_fib2_ui} are designed

7390

for calculating isolated @m{F_n,F[n]} or @m{F_n,F[n]},@m{F_{n-1},F[n-1]}

7391

values efficiently.

7392

7393

For small @ma{n}, a table of single limb values in @code{__gmp_fib_table} is

7394

used. On a 32-bit limb this goes up to @m{F_{47},F[47]}, or on a 64-bit limb

7395

up to @m{F_{93},F[93]}. For convenience the table starts at @m{F_{-1},F[-1]}.

7396

7397

Beyond the table, values are generated with a binary powering algorithm,

7398

calculating a pair @m{F_n,F[n]} and @m{F_{n-1},F[n-1]} working from high to

7399

low across the bits of @ma{n}. The formulas used are

7400

@tex

7401

$$\eqalign{

7402

F_{2k+1} &= 4F_k^2 - F_{k-1}^2 + 2(-1)^k \cr

7403

F_{2k-1} &= F_k^2 + F_{k-1}^2 \cr

7404

F_{2k} &= F_{2k+1} - F_{2k-1}

7405

}$$

7406

@end tex

7407

@ifnottex

7408

7409

@example

7410

F[2k+1] = 4*F[k]^2 - F[k-1]^2 + 2*(-1)^k

7411

F[2k-1] = F[k]^2 + F[k-1]^2

7412

7413

F[2k] = F[2k+1] - F[2k-1]

7414

@end example

7415

7416

@end ifnottex

7417

At each step, @ma{k} is the high @ma{b} bits of @ma{n}. If the next bit of

7418

@ma{n} is 0 then @m{F_{2k},F[2k]},@m{F_{2k-1},F[2k-1]} is used, or if it's a 1

7419

then @m{F_{2k+1},F[2k+1]},@m{F_{2k},F[2k]} is used, and the process repeated

7420

until all bits of @ma{n} are incorporated. Notice these formulas require just

7421

two squares per bit of @ma{n}.
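
The doubling step can be sketched on a single limb as follows. The function
name is hypothetical and @ma{n} is limited to 93 so that the result fits 64
bits. Intermediate values may wrap modulo @ma{2^64}, which is harmless since
every value kept is a true Fibonacci number below @ma{2^64}.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: F[n] for n <= 93 by the formulas above. */
uint64_t fib_sketch(unsigned n)
{
    assert(n <= 93);
    uint64_t f = 0, f1 = 1;              /* F[k] and F[k-1] for k = 0 */
    unsigned k = 0;

    for (unsigned mask = 0x40; mask != 0; mask >>= 1) {   /* 93 < 128 */
        uint64_t sq = f * f, sq1 = f1 * f1;               /* two squares */
        uint64_t f2k_plus1 = 4 * sq - sq1
                             + ((k & 1) ? (uint64_t)0 - 2 : 2);
        uint64_t f2k_minus1 = sq + sq1;
        uint64_t f2k = f2k_plus1 - f2k_minus1;

        if (n & mask) {                  /* next bit 1: F[2k+1], F[2k] */
            f = f2k_plus1;
            f1 = f2k;
            k = 2 * k + 1;
        } else {                         /* next bit 0: F[2k], F[2k-1] */
            f = f2k;
            f1 = f2k_minus1;
            k = 2 * k;
        }
    }
    return f;
}
```

Leading zero bits of @ma{n} leave the pair at @ma{F[0]}, @ma{F[-1]}, so the
loop can start from a fixed top bit without first locating the highest 1 bit.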

7422

7423

It'd be possible to handle the first few @ma{n} above the single limb table

7424

with simple additions, using the defining Fibonacci recurrence @m{F_{k+1} =

7425

F_k + F_{k-1}, F[k+1]=F[k]+F[k-1]}, but this is not done since it usually

7426

turns out to be faster for only about 10 or 20 values of @ma{n}, and including

7427

a block of code for just those doesn't seem worthwhile. If they really

7428

mattered it'd be better to extend the data table.

7429

7430

Using a table avoids lots of calculations on small numbers, and makes small
@ma{n} go fast. A bigger table would make more small @ma{n} go fast; it's
just a question of balancing size against desired speed. For GMP the code is
kept compact, with the emphasis primarily on a good powering algorithm.

7434

7435

@code{mpz_fib2_ui} returns both @m{F_n,F[n]} and @m{F_{n-1},F[n-1]}, but

7436

@code{mpz_fib_ui} is only interested in @m{F_n,F[n]}. In this case the last

7437

step of the algorithm can become one multiply instead of two squares. One of

7438

the following two formulas is used, according as @ma{n} is odd or even.

7439

@tex

7440

$$\eqalign{

7441

F_{2k} &= F_k (F_k + 2F_{k-1}) \cr

7442

F_{2k+1} &= (2F_k + F_{k-1}) (2F_k - F_{k-1}) + 2(-1)^k

7443

}$$

7444

@end tex

7445

@ifnottex

7446

7447

@example

7448

F[2k] = F[k]*(F[k]+2F[k-1])

7449

7450

F[2k+1] = (2F[k]+F[k-1])*(2F[k]-F[k-1]) + 2*(-1)^k

7451

@end example

7452

7453

@end ifnottex

7454

@m{F_{2k+1},F[2k+1]} here is the same as above, just rearranged to be a

7455

multiply. For interest, the @m{2(-1)^k, 2*(-1)^k} term both here and above

7456

can be applied just to the low limb of the calculation, without a carry or

7457

borrow into further limbs, which saves some code size. See comments with

7458

@code{mpz_fib_ui} and the internal @code{mpn_fib2_ui} for how this is done.

7459

7460

7461

@node Lucas Numbers Algorithm, , Fibonacci Numbers Algorithm, Other Algorithms

7462

@subsection Lucas Numbers

7463

7464

@code{mpz_lucnum2_ui} derives a pair of Lucas numbers from a pair of Fibonacci

7465

numbers with the following simple formulas.

7466

@tex

7467

$$\eqalign{

7468

L_k &= F_k + 2F_{k-1} \cr

7469

L_{k-1} &= 2F_k - F_{k-1}

7470

}$$

7471

@end tex

7472

@ifnottex

7473

7474

@example

7475

L[k] = F[k] + 2*F[k-1]

7476

L[k-1] = 2*F[k] - F[k-1]

7477

@end example

7478

7479

@end ifnottex

7480

@code{mpz_lucnum_ui} is only interested in @m{L_n,L[n]}, and some work can be

7481

saved. Trailing zero bits on @ma{n} can be handled with a single square each.

7482

@tex

7483

$$ L_{2k} = L_k^2 - 2(-1)^k $$

7484

@end tex

7485

@ifnottex

7486

7487

@example

7488

L[2k] = L[k]^2 - 2*(-1)^k

7489

@end example

7490

7491

@end ifnottex

7492

And the lowest 1 bit can be handled with one multiply of a pair of Fibonacci

7493

numbers, similar to what @code{mpz_fib_ui} does.

7494

@tex

7495

$$ L_{2k+1} = 5F_{k-1} (2F_k + F_{k-1}) - 4(-1)^k $$

7496

@end tex

7497

@ifnottex

7498

7499

@example

7500

L[2k+1] = 5*F[k-1]*(2*F[k]+F[k-1]) - 4*(-1)^k

7501

@end example

7502

7503

@end ifnottex
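
The three Lucas formulas above can be checked on small values with a sketch
like the following. All names are hypothetical, the Fibonacci pair comes from
the plain recurrence purely for simplicity, and @ma{k} is assumed to be at
least 1 and small enough that nothing overflows 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Plain recurrence, used only to supply F[k] and F[k-1] for k >= 1. */
static void fib_pair_sketch(unsigned k, int64_t *fk, int64_t *fk1)
{
    assert(k >= 1 && k <= 40);
    int64_t a = 1, b = 0;                /* F[1], F[0] */
    for (unsigned i = 1; i < k; i++) {
        int64_t t = a + b;
        b = a;
        a = t;
    }
    *fk = a;
    *fk1 = b;
}

/* L[k] = F[k] + 2*F[k-1] */
int64_t lucas_sketch(unsigned k)
{
    int64_t fk, fk1;
    fib_pair_sketch(k, &fk, &fk1);
    return fk + 2 * fk1;
}

/* L[2k] = L[k]^2 - 2*(-1)^k, one square */
int64_t lucas_double_sketch(unsigned k)
{
    int64_t l = lucas_sketch(k);
    return l * l - ((k & 1) ? -2 : 2);
}

/* L[2k+1] = 5*F[k-1]*(2*F[k]+F[k-1]) - 4*(-1)^k, one multiply */
int64_t lucas_double_plus1_sketch(unsigned k)
{
    int64_t fk, fk1;
    fib_pair_sketch(k, &fk, &fk1);
    return 5 * fk1 * (2 * fk + fk1) - ((k & 1) ? -4 : 4);
}
```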

7504

7505

7506

@node Assembler Coding, , Other Algorithms, Algorithms

7507

@section Assembler Coding

7508

7509

The assembler subroutines in GMP are the most significant source of speed at

7510

small to moderate sizes. At larger sizes algorithm selection becomes more

7511

important, but of course speedups in low level routines will still speed up

7512

everything proportionally.

7513

7514

Carry handling and widening multiplies that are important for GMP can't be

7515

easily expressed in C. GCC @code{asm} blocks help a lot and are provided in

7516

@file{longlong.h}, but hand coding low level routines invariably offers a

7517

speedup over generic C by a factor of anything from 2 to 10.

7518

7519

@menu

7520

* Assembler Code Organisation::

7521

* Assembler Basics::

7522

* Assembler Carry Propagation::

7523

* Assembler Cache Handling::

7524

* Assembler Floating Point::

7525

* Assembler SIMD Instructions::

7526

* Assembler Software Pipelining::

7527

* Assembler Loop Unrolling::

7528

@end menu

7529

7530

7531

@node Assembler Code Organisation, Assembler Basics, Assembler Coding, Assembler Coding

7532

@subsection Code Organisation

7533

7534

The various @file{mpn} subdirectories contain machine-dependent code, written

7535

in C or assembler. The @file{mpn/generic} subdirectory contains default code,

7536

used when there's no machine-specific version of a particular file.

7537

7538

Each @file{mpn} subdirectory is for an ISA family. Generally 32-bit and

7539

64-bit variants in a family cannot share code and will have separate

7540

directories. Within a family further subdirectories may exist for CPU

7541

variants.

7542

7543

7544

@node Assembler Basics, Assembler Carry Propagation, Assembler Code Organisation, Assembler Coding

7545

@subsection Assembler Basics

7546

7547

@code{mpn_addmul_1} and @code{mpn_submul_1} are the most important routines

7548

for overall GMP performance. All multiplications and divisions come down to

7549

repeated calls to these. @code{mpn_add_n}, @code{mpn_sub_n},

7550

@code{mpn_lshift} and @code{mpn_rshift} are next most important.

7551

7552

On some CPUs assembler versions of the internal functions

7553

@code{mpn_mul_basecase} and @code{mpn_sqr_basecase} give significant speedups,

7554

mainly through avoiding function call overheads. They can also potentially

7555

make better use of a wide superscalar processor.

7556

7557

The restrictions on overlaps between sources and destinations

7558

(@pxref{Low-level Functions}) are designed to facilitate a variety of

7559

implementations. For example, knowing @code{mpn_add_n} won't have partly

7560

overlapping sources and destination means reading can be done far ahead of

7561

writing on superscalar processors, and loops can be vectorized on a vector

7562

processor, depending on the carry handling.


@node Assembler Carry Propagation, Assembler Cache Handling, Assembler Basics, Assembler Coding
@subsection Carry Propagation

The problem that presents most challenges in GMP is propagating carries from
one limb to the next. In functions like @code{mpn_addmul_1} and
@code{mpn_add_n}, carries are the only dependencies between limb operations.

On processors with carry flags, a straightforward CISC style @code{adc} is
generally best. AMD K6 @code{mpn_addmul_1} however is an example of an
unusual set of circumstances where a branch works out better.

On RISC processors generally an add and compare for overflow is used. This
sort of thing can be seen in @file{mpn/generic/aors_n.c}. Some carry
propagation schemes require 4 instructions, meaning at least 4 cycles per
limb, but other schemes may use just 1 or 2. On wide superscalar processors
performance may be completely determined by the number of dependent
instructions between carry-in and carry-out for each limb.
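
As a rough illustration of the add and compare idiom, the following C sketch
adds two limb vectors without using a carry flag. It is not the code in
@file{mpn/generic/aors_n.c}; the names @code{limb_t} and @code{sketch_add_n}
are made up for this example and a 64-bit limb is assumed.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for mp_limb_t */

/* Add {xp,n} and {yp,n}, store n limbs at rp, return the carry-out (0 or 1).
   Each carry is recovered with an unsigned comparison instead of a flag.  */
limb_t
sketch_add_n (limb_t *rp, const limb_t *xp, const limb_t *yp, size_t n)
{
  limb_t cy = 0;
  for (size_t i = 0; i < n; i++)
    {
      limb_t s = xp[i] + yp[i];   /* may wrap around */
      limb_t c1 = s < xp[i];      /* 1 if the limb add wrapped */
      limb_t r = s + cy;          /* add the carry-in */
      limb_t c2 = r < s;          /* 1 if adding the carry-in wrapped */
      rp[i] = r;
      cy = c1 | c2;               /* at most one of the two can be set */
    }
  return cy;
}
```

In this particular scheme the chain from carry-in to carry-out is an add, a
compare and an OR; the first add and compare do not depend on the carry and
can be scheduled ahead.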

On vector processors good use can be made of the fact that a carry bit only
very rarely propagates more than one limb. When adding a single bit to a
limb, there's only a carry out if that limb was @code{0xFF...FF} which on
random data will be only 1 in @m{2\GMPraise{@code{mp\_bits\_per\_limb}},
2^mp_bits_per_limb}. @file{mpn/cray/add_n.c} is an example of this; it adds
all limbs in parallel, adds one set of carry bits in parallel and then only
rarely needs to fall through to a loop propagating further carries.
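
The three steps can be sketched in plain C as below. This is only a model of
the idea, not the Cray code; @code{limb_t}, @code{sketch_add_n_parallel} and
the caller-supplied scratch area @code{cp} are assumptions of this example.
The first two loops have no dependencies between limbs and so are the ones a
vector unit could execute in parallel.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for mp_limb_t */

/* Add {xp,n} and {yp,n} into {rp,n} and return the carry-out.
   cp is scratch space of n limbs.  In this sketch rp must not overlap
   xp, yp or cp.  */
limb_t
sketch_add_n_parallel (limb_t *rp, const limb_t *xp, const limb_t *yp,
                       limb_t *cp, size_t n)
{
  if (n == 0)
    return 0;

  /* Pass 1: every limb sum is independent of the others.  */
  for (size_t i = 0; i < n; i++)
    {
      rp[i] = xp[i] + yp[i];
      cp[i] = rp[i] < xp[i];
    }
  limb_t cy_out = cp[n - 1];

  /* Pass 2: add each pass-1 carry into the limb above, again with no
     dependency between limbs.  cp[i] is overwritten with the carry out
     of this step; working from the top down keeps cp[i-1] intact until
     it has been read, purely so that one scratch array suffices.  */
  limb_t any = 0;
  for (size_t i = n - 1; i >= 1; i--)
    {
      limb_t t = rp[i] + cp[i - 1];
      cp[i] = t < rp[i];
      rp[i] = t;
      any |= cp[i];
    }
  cp[0] = 0;

  /* Pass 3: a serial loop, needed only when a carry landed on an
     all-ones limb in pass 2.  */
  limb_t c = 0;
  if (any)
    for (size_t i = 0; i < n; i++)
      {
        limb_t t = rp[i] + c;
        limb_t wrapped = t < rp[i];
        rp[i] = t;
        c = wrapped | cp[i];
      }

  return cy_out | c;
}
```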

On the x86s, GCC (as of version 2.95.2) doesn't generate particularly good code
for the RISC style idioms that are necessary to handle carry bits in
C. Often conditional jumps are generated where @code{adc} or @code{sbb} forms
would be better. And so unfortunately almost any loop involving carry bits
needs to be coded in assembler for best results.


@node Assembler Cache Handling, Assembler Floating Point, Assembler Carry Propagation, Assembler Coding
@subsection Cache Handling

GMP aims to perform well both on operands that fit entirely in L1 cache and
those that don't. In the assembler subroutines this means prefetching, either
always or when large enough operands are presented.

Pre-fetching sources combines well with loop unrolling, since a prefetch can
be initiated once per unrolled loop (or more than once if the loop processes
more than one cache line).

Pre-fetching destinations won't be necessary if the CPU has a big enough store
queue. Older processors without a write-allocate L1 however will want
destination prefetching, to avoid repeated write-throughs, unless they can
keep up with the rate at which destination limbs are produced.

The distance ahead to prefetch will be determined by the rate data is
processed versus the time it takes to bring a line up to L1. Naturally the
net data rate from L2 or RAM will always limit the rate of data processing.
Prefetch distance may also be limited by the number of prefetches the
processor can have in progress at any one time.

If a special prefetch instruction doesn't exist then a plain load can be used,
so long as the CPU supports out-of-order loads. But this may mean having a
second copy of a loop so that the last few limbs can be processed without
prefetching, since reading past the end of an operand must be avoided.


@node Assembler Floating Point, Assembler SIMD Instructions, Assembler Cache Handling, Assembler Coding
@subsection Floating Point

Floating point arithmetic is used in GMP for multiplications on CPUs with poor
integer multipliers. Floating point generally doesn't suit other operations
like additions or shifts, due to difficulties implementing carry handling.

With IEEE 53-bit double precision floats, integer multiplications producing up
to 53 bits will give exact results. Breaking a multiplication into
16@cross{}@ma{32@rightarrow{}48} bit pieces is convenient. With some care
though three 21@cross{}@ma{32@rightarrow{}53} bit products can be used to do a
64@cross{}32 multiply, if one of those 21@cross{}32 parts uses the sign bit.
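
A minimal sketch of the 16@cross{}32 approach follows, assuming IEEE doubles.
It forms a 32@cross{}32 product from two floating point multiplies, each of
which stays below 48 bits and is therefore exact. The function name
@code{sketch_mul_32x32_fp} is made up for this example and is not a GMP
routine.

```c
#include <stdint.h>

/* Multiply two 32-bit integers using only floating point multiplies.
   a is split into 16-bit halves, so each product is below 2^48 and is
   represented exactly in a 53-bit mantissa.  */
uint64_t
sketch_mul_32x32_fp (uint32_t a, uint32_t b)
{
  double lo = (double) (a & 0xFFFF) * (double) b;   /* low half times b */
  double hi = (double) (a >> 16) * (double) b;      /* high half times b */

  /* Recombine in integer arithmetic; the sum equals a*b and fits in
     64 bits.  */
  return ((uint64_t) hi << 16) + (uint64_t) lo;
}
```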

Generally limbs want to be treated as unsigned, but on some CPUs floating
point conversions only treat integers as signed. Copying through a zero
extended memory region or testing and adjusting for a sign bit may be
necessary.

Currently floating point FFTs aren't used for large multiplications. On some
processors they probably have a good chance of being worthwhile, if great care
is taken with precision control.


@node Assembler SIMD Instructions, Assembler Software Pipelining, Assembler Floating Point, Assembler Coding
@subsection SIMD Instructions

The single-instruction multiple-data support in current microprocessors is
aimed at signal processing algorithms where each data point can be treated
more or less independently. There's generally not much support for
propagating the sort of carries that arise in GMP.

SIMD multiplications of say four 16@cross{}16 bit multiplies only do as much
work as one 32@cross{}32 from GMP's point of view, and need some shifts and
adds besides. But of course if say the SIMD form is fully pipelined and uses
less instruction decoding then it may still be worthwhile.

On the 80x86 chips, MMX has so far found a use in @code{mpn_rshift} and
@code{mpn_lshift} since it allows 64-bit operations, and is used in a special
case for 16-bit multipliers in the P55 @code{mpn_mul_1}. 3DNow and SSE
haven't found a use so far.


@node Assembler Software Pipelining, Assembler Loop Unrolling, Assembler SIMD Instructions, Assembler Coding
@subsection Software Pipelining

Software pipelining consists of scheduling instructions around the branch
point in a loop. For example a loop taking a checksum of an array of limbs
might have a load and an add, but the load wouldn't be for that add, rather
for the one next time around the loop. Each load then is effectively
scheduled back in the previous iteration, allowing latency to be hidden.
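
The checksum loop just described might look as follows when written out in C.
This is a sketch only: a compiler is free to reschedule it, and in practice
the technique is applied in assembler. The names @code{limb_t} and
@code{sketch_checksum} are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for mp_limb_t */

/* Sum the limbs of {p,n} modulo 2^64, with each load issued one
   iteration ahead of the add that consumes it.  */
limb_t
sketch_checksum (const limb_t *p, size_t n)
{
  if (n == 0)
    return 0;

  limb_t sum = 0;
  limb_t next = p[0];            /* setup: the first load */
  for (size_t i = 1; i < n; i++)
    {
      limb_t cur = next;
      next = p[i];               /* load for the add next time around */
      sum += cur;                /* add uses the value loaded last time */
    }
  sum += next;                   /* finishup: the last add has no load */
  return sum;
}
```

The separate setup and finishup also mean the loop never reads past the end
of the operand.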

Naturally this is wanted only when doing things like loads or multiplies that
take a few cycles to complete, and only where a CPU has multiple functional
units so that other work can be done while waiting.

A pipeline with several stages will have a data value in progress at each
stage and each loop iteration moves them along one stage. This is like
juggling.

Within the loop some moves between registers may be necessary to have the
right values in the right places for each iteration. Loop unrolling can help
this, with each unrolled block able to use different registers for different
values, even if some shuffling is still needed just before going back to the
top of the loop.


@node Assembler Loop Unrolling, , Assembler Software Pipelining, Assembler Coding
@subsection Loop Unrolling

Loop unrolling consists of replicating code so that several limbs are
processed in each loop. At a minimum this reduces loop overheads by a
corresponding factor, but it can also allow better register usage, for example
alternately using one register combination and then another. Judicious use of
@command{m4} macros can help avoid lots of duplication in the source code.

Unrolling is commonly done to a power of 2 multiple so the number of unrolled
loops and the number of remaining limbs can be calculated with a shift and
mask. But other multiples can be used too, just by subtracting each @var{n}
limbs processed from a counter and waiting for less than @var{n} remaining (or
offsetting the counter by @var{n} so it goes negative when there's less than
@var{n} remaining).

The limbs not a multiple of the unrolling can be handled in various ways, for
example

@itemize @bullet
@item
A simple loop at the end (or the start) to process the excess. Care will be
wanted that it isn't too much slower than the unrolled part.

@item
A set of binary tests, for example after an 8-limb unrolling, test for 4 more
limbs to process, then a further 2 more or not, and finally 1 more or not.
This will probably take more code space than a simple loop.

@item
A @code{switch} statement, providing separate code for each possible excess,
for example an 8-limb unrolling would have separate code for 0 remaining, 1
remaining, etc, up to 7 remaining. This might take a lot of code, but may be
the best way to optimize all cases in combination with a deep pipelined loop.

@item
A computed jump into the middle of the loop, thus making the first iteration
handle the excess. This should make times smoothly increase with size, which
is attractive, but setups for the jump and adjustments for pointers can be
tricky and could become quite difficult in combination with deep pipelining.
@end itemize
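
As a small illustration, here is a 4-way unrolled limb sum in C that computes
the loop count and excess with a shift and mask, and uses the first method
above (a simple loop at the end) for the excess. The names @code{limb_t} and
@code{sketch_sum_unrolled} are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for mp_limb_t */

/* Sum the limbs of {p,n} modulo 2^64 with a 4-limb unrolling.  */
limb_t
sketch_sum_unrolled (const limb_t *p, size_t n)
{
  size_t blocks = n >> 2;   /* number of passes through the unrolled loop */
  size_t excess = n & 3;    /* limbs left over, 0 to 3 */
  limb_t sum = 0;

  for (; blocks != 0; blocks--)
    {
      sum += p[0];
      sum += p[1];
      sum += p[2];
      sum += p[3];
      p += 4;
    }

  /* Simple loop for the limbs not a multiple of the unrolling.  */
  for (; excess != 0; excess--)
    sum += *p++;

  return sum;
}
```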

One way to write the setups and finishups for a pipelined unrolled loop is
simply to duplicate the loop at the start and the end, then delete
instructions at the start which have no valid antecedents, and delete
instructions at the end whose results are unwanted. Sizes not a multiple of
the unrolling can then be handled as desired.


@node Internals, Contributors, Algorithms, Top
@chapter Internals

@strong{This chapter is provided only for informational purposes and the
various internals described here may change in future GMP releases.
Applications expecting to be compatible with future releases should use only
the documented interfaces described in previous chapters.}

@menu
* Integer Internals::
* Rational Internals::
* Float Internals::
* Raw Output Internals::
* C++ Interface Internals::
@end menu

@node Integer Internals, Rational Internals, Internals, Internals
@section Integer Internals

@code{mpz_t} variables represent integers using sign and magnitude, in space
dynamically allocated and reallocated. The fields are as follows.

@table @asis
@item @code{_mp_size}
The number of limbs, or the negative of that when representing a negative
integer. Zero is represented by @code{_mp_size} set to zero, in which case
the @code{_mp_d} data is unused.

@item @code{_mp_d}
A pointer to an array of limbs which is the magnitude. These are stored
``little endian'' as per the @code{mpn} functions, so @code{_mp_d[0]} is the
least significant limb and @code{_mp_d[ABS(_mp_size)-1]} is the most
significant. Whenever @code{_mp_size} is non-zero, the most significant limb
is non-zero.

Currently there's always at least one limb allocated, so for instance
@code{mpz_set_ui} never needs to reallocate, and @code{mpz_get_ui} can fetch
@code{_mp_d[0]} unconditionally (though its value is then only wanted if
@code{_mp_size} is non-zero).

@item @code{_mp_alloc}
@code{_mp_alloc} is the number of limbs currently allocated at @code{_mp_d},
and naturally @code{_mp_alloc >= ABS(_mp_size)}. When an @code{mpz} routine
is about to (or might be about to) increase @code{_mp_size}, it checks
@code{_mp_alloc} to see whether there's enough space, and reallocates if not.
@code{MPZ_REALLOC} is generally used for this.
@end table
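
The way these fields are read can be modelled with a small stand-alone
structure. The sketch below is not the real @code{mpz_t} and does not use
@file{gmp.h}; @code{struct sketch_mpz}, @code{limb_t} and the three helper
functions are hypothetical names, and a 64-bit limb is assumed.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for mp_limb_t */

/* Model of the three fields described above; not the real mpz_t.  */
struct sketch_mpz
{
  int _mp_alloc;           /* limbs allocated at _mp_d */
  int _mp_size;            /* limbs in use, negated for a negative value */
  limb_t *_mp_d;           /* magnitude, least significant limb first */
};

/* Sign of the value, -1, 0 or 1, taken from _mp_size alone.  */
int
sketch_sgn (const struct sketch_mpz *z)
{
  return (z->_mp_size > 0) - (z->_mp_size < 0);
}

/* Number of limbs in the magnitude, ABS(_mp_size).  */
size_t
sketch_limbs (const struct sketch_mpz *z)
{
  return (size_t) (z->_mp_size < 0 ? -z->_mp_size : z->_mp_size);
}

/* Least significant limb of the magnitude, or 0 for a zero value.
   The load is unconditional, relying on at least one limb always
   being allocated; its value is used only when _mp_size is non-zero.  */
limb_t
sketch_low_limb (const struct sketch_mpz *z)
{
  limb_t low = z->_mp_d[0];
  return z->_mp_size != 0 ? low : 0;
}
```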

The various bitwise logical functions like @code{mpz_and} behave as if
negative values were twos complement. But sign and magnitude is always used
internally, and necessary adjustments are made during the calculations.
Sometimes this isn't pretty, but sign and magnitude are best for other
routines.

Some internal temporary variables are set up with @code{MPZ_TMP_INIT} and these
have @code{_mp_d} space obtained from @code{TMP_ALLOC} rather than the memory
allocation functions. Care is taken to ensure that these are big enough that
no reallocation is necessary (since it would have unpredictable consequences).


@node Rational Internals, Float Internals, Integer Internals, Internals
@section Rational Internals

@code{mpq_t} variables represent rationals using an @code{mpz_t} numerator and
denominator (@pxref{Integer Internals}).

The canonical form adopted is denominator positive (and non-zero), no common
factors between numerator and denominator, and zero uniquely represented as
0/1.

It's believed that casting out common factors at each stage of a calculation
is best in general. A GCD is an @ma{O(N^2)} operation so it's better to do a
few small ones immediately than to delay and have to do a big one later.
Knowing the numerator and denominator have no common factors can be used for
example in @code{mpq_mul} to make only two cross GCDs necessary, not four.

This general approach to common factors is badly sub-optimal in the presence
of simple factorizations or little prospect for cancellation, but GMP has no
way to know when this will occur. As per @ref{Efficiency}, that's left to
applications. The @code{mpq_t} framework might still suit, with
@code{mpq_numref} and @code{mpq_denref} for direct access to the numerator and
denominator, or of course @code{mpz_t} variables can be used directly.


@node Float Internals, Raw Output Internals, Rational Internals, Internals
@section Float Internals

Efficient calculation is the primary aim of GMP floats and the use of whole
limbs and simple rounding facilitates this.

@code{mpf_t} floats have a variable precision mantissa and a single machine
word signed exponent. The mantissa is represented using sign and magnitude.

@c FIXME: The arrow heads don't join to the lines exactly.
@tex
\global\newdimen\GMPboxwidth \GMPboxwidth=5em
\global\newdimen\GMPboxheight \GMPboxheight=3ex
\def\centreline{\hbox{\raise 0.8ex \vbox{\hrule \hbox{\hfil}}}}
\GMPdisplay{%
\vbox{%
\hbox to 5\GMPboxwidth {most significant limb \hfil least significant limb}
\vskip 0.7ex
\def\GMPcentreline#1{\hbox{\raise 0.5 ex \vbox{\hrule \hbox to #1 {}}}}
\hbox {
\hbox to 3\GMPboxwidth {%
\setbox 0 = \hbox{@code{\_mp\_exp}}%
\dimen0=3\GMPboxwidth
\advance\dimen0 by -\wd0
\divide\dimen0 by 2
\advance\dimen0 by -1em
\setbox1 = \hbox{$\rightarrow$}%
\dimen1=\dimen0
\advance\dimen1 by -\wd1
\GMPcentreline{\dimen0}%
\hfil
\box0%
\hfil
\GMPcentreline{\dimen1{}}%
\box1}
\hbox to 2\GMPboxwidth {\hfil @code{\_mp\_d}}}
\vskip 0.5ex
\vbox {%
\hrule
\hbox{%
\vrule height 2ex depth 1ex
\hbox to \GMPboxwidth {}%
\vrule
\hbox to \GMPboxwidth {}%
\vrule
\hbox to \GMPboxwidth {}%
\vrule
\hbox to \GMPboxwidth {}%
\vrule
\hbox to \GMPboxwidth {}%
\vrule}
\hrule
}
\hbox {%
\hbox to 0.8 pt {}
\hbox to 3\GMPboxwidth {%
\hfil $\cdot$} \hbox {$\leftarrow$ radix point\hfil}}
\hbox to 5\GMPboxwidth{%
\setbox 0 = \hbox{@code{\_mp\_size}}%
\dimen0 = 5\GMPboxwidth
\advance\dimen0 by -\wd0
\divide\dimen0 by 2
\advance\dimen0 by -1em
\dimen1 = \dimen0
\setbox1 = \hbox{$\leftarrow$}%
\setbox2 = \hbox{$\rightarrow$}%
\advance\dimen0 by -\wd1
\advance\dimen1 by -\wd2
\hbox to 0.3 em {}%
\box1
\GMPcentreline{\dimen0}%
\hfil
\box0
\hfil
\GMPcentreline{\dimen1}%
\box2}
}}
@end tex

@ifnottex
@example
   most                   least
significant            significant
   limb                   limb

                            _mp_d
 |---- _mp_exp --->           |
  _____ _____ _____ _____ _____
 |_____|_____|_____|_____|_____|
                   . <------------ radix point

  <-------- _mp_size --------->
@sp 1
@end example
@end ifnottex

@noindent
The fields are as follows.

@table @asis
@item @code{_mp_size}
The number of limbs currently in use, or the negative of that when
representing a negative value. Zero is represented by @code{_mp_size} and
@code{_mp_exp} both set to zero, and in that case the @code{_mp_d} data is
unused. (In the future @code{_mp_exp} might be undefined when representing
zero.)

@item @code{_mp_prec}
The precision of the mantissa, in limbs. In any calculation the aim is to
produce @code{_mp_prec} limbs of result (the most significant being non-zero).

@item @code{_mp_d}
A pointer to the array of limbs which is the absolute value of the mantissa.
These are stored ``little endian'' as per the @code{mpn} functions, so
@code{_mp_d[0]} is the least significant limb and
@code{_mp_d[ABS(_mp_size)-1]} the most significant.

The most significant limb is always non-zero, but there are no other
restrictions on its value, in particular the highest 1 bit can be anywhere
within the limb.

@code{_mp_prec+1} limbs are allocated to @code{_mp_d}, the extra limb being
for convenience (see below). There are no reallocations during a calculation,
only in a change of precision with @code{mpf_set_prec}.

@item @code{_mp_exp}
The exponent, in limbs, determining the location of the implied radix point.
Zero means the radix point is just above the most significant limb. Positive
values mean a radix point offset towards the lower limbs and hence a value
@ma{@ge{} 1}, as for example in the diagram above. Negative exponents mean a
radix point further above the highest limb.

Naturally the exponent can be any value; it doesn't have to fall within the
limbs as the diagram shows, and can be a long way above or a long way below.
Limbs other than those included in the @code{@{_mp_d,_mp_size@}} data
are treated as zero.
@end table

@sp 1
@noindent
The following various points should be noted.

@table @asis
@item Low Zeros
The least significant limbs @code{_mp_d[0]} etc can be zero, though such low
zeros can always be ignored. Routines likely to produce low zeros check and
avoid them to save time in subsequent calculations, but for most routines
they're quite unlikely and aren't checked.

@item Mantissa Size Range
The @code{_mp_size} count of limbs in use can be less than @code{_mp_prec} if
the value can be represented in fewer. This means low precision values or
small integers stored in a high precision @code{mpf_t} can still be operated
on efficiently.

@code{_mp_size} can also be greater than @code{_mp_prec}. Firstly a value is
allowed to use all of the @code{_mp_prec+1} limbs available at @code{_mp_d},
and secondly when @code{mpf_set_prec_raw} lowers @code{_mp_prec} it leaves
@code{_mp_size} unchanged and so the size can be arbitrarily bigger than
@code{_mp_prec}.

@item Rounding
All rounding is done on limb boundaries. Calculating @code{_mp_prec} limbs
with the high non-zero will ensure the application requested minimum precision
is obtained.

The use of simple ``trunc'' rounding towards zero is efficient, since there's
no need to examine extra limbs and increment or decrement.

@item Bit Shifts
Since the exponent is in limbs, there are no bit shifts in basic operations
like @code{mpf_add} and @code{mpf_mul}. When differing exponents are
encountered all that's needed is to adjust pointers to line up the relevant
limbs.

Of course @code{mpf_mul_2exp} and @code{mpf_div_2exp} will require bit shifts,
but the choice is between an exponent in limbs which requires shifts there, or
one in bits which requires them almost everywhere else.

@item Use of @code{_mp_prec+1} Limbs
The extra limb on @code{_mp_d} (@code{_mp_prec+1} rather than just
@code{_mp_prec}) helps when an @code{mpf} routine might get a carry from its
operation. @code{mpf_add} for instance will do an @code{mpn_add} of
@code{_mp_prec} limbs. If there's no carry then that's the result, but if
there is a carry then it's stored in the extra limb of space and
@code{_mp_size} becomes @code{_mp_prec+1}.

Whenever @code{_mp_prec+1} limbs are held in a variable, the low limb is not
needed for the intended precision, only the @code{_mp_prec} high limbs. But
zeroing it out or moving the rest down is unnecessary. Subsequent routines
reading the value will simply take the high limbs they need, and this will be
@code{_mp_prec} if their target has that same precision. This is no more than
a pointer adjustment, and must be checked anyway since the destination
precision can be different from the sources.

Copy functions like @code{mpf_set} will retain a full @code{_mp_prec+1} limbs
if available. This ensures that a variable which has @code{_mp_size} equal to
@code{_mp_prec+1} will get its full exact value copied. Strictly speaking
this is unnecessary since only @code{_mp_prec} limbs are needed for the
application's requested precision, but it's considered that an @code{mpf_set}
from one variable into another of the same precision ought to produce an exact
copy.

@item Application Precisions
@code{__GMPF_BITS_TO_PREC} converts an application requested precision to an
@code{_mp_prec}. The value in bits is rounded up to a whole limb then an
extra limb is added since the most significant limb of @code{_mp_d} is only
non-zero and therefore might contain only one bit.

@code{__GMPF_PREC_TO_BITS} does the reverse conversion, and removes the extra
limb from @code{_mp_prec} before converting to bits. The net effect of
reading back with @code{mpf_get_prec} is simply the precision rounded up to a
multiple of @code{mp_bits_per_limb}.

Note that the extra limb added here for the high only being non-zero is in
addition to the extra limb allocated to @code{_mp_d}. For example with a
32-bit limb, an application request for 250 bits will be rounded up to 8
limbs, then an extra added for the high being only non-zero, giving an
@code{_mp_prec} of 9. @code{_mp_d} then gets 10 limbs allocated. Reading
back with @code{mpf_get_prec} will take @code{_mp_prec}, subtract 1 limb and
multiply by 32, giving 256 bits.

Strictly speaking, the fact the high limb has at least one bit means that a
float with, say, 3 limbs of 32-bits each will be holding at least 65 bits, but
for the purposes of @code{mpf_t} it's considered simply to be 64 bits, a nice
multiple of the limb size.
@end table
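
The arithmetic of the 250-bit example can be checked with the following
sketch. The functions are illustrative stand-ins written from the description
above, not the actual @code{__GMPF_BITS_TO_PREC} and
@code{__GMPF_PREC_TO_BITS} macros, which may differ in detail (for instance
by enforcing a minimum precision). All names here are hypothetical.

```c
#include <stddef.h>

/* Limbs of precision for a request of `bits' bits: round up to whole
   limbs, then add one because the top limb might hold only one bit.  */
size_t
sketch_bits_to_prec (unsigned long bits, unsigned long limb_bits)
{
  return (size_t) ((bits + limb_bits - 1) / limb_bits + 1);
}

/* Reverse conversion: drop the extra limb, then convert to bits.  */
unsigned long
sketch_prec_to_bits (size_t prec, unsigned long limb_bits)
{
  return (unsigned long) (prec - 1) * limb_bits;
}

/* Limbs allocated at _mp_d for a given precision.  */
size_t
sketch_alloc_limbs (size_t prec)
{
  return prec + 1;
}
```

With a 32-bit limb, @code{sketch_bits_to_prec (250, 32)} gives 9,
@code{sketch_alloc_limbs (9)} gives 10 and @code{sketch_prec_to_bits (9, 32)}
gives 256, matching the figures in the text.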


@node Raw Output Internals, C++ Interface Internals, Float Internals, Internals
@section Raw Output Internals

@noindent
@code{mpz_out_raw} uses the following format.

@tex
\global\newdimen\GMPboxwidth \GMPboxwidth=5em
\global\newdimen\GMPboxheight \GMPboxheight=3ex
\def\centreline{\hbox{\raise 0.8ex \vbox{\hrule \hbox{\hfil}}}}
\GMPdisplay{%
\vbox{%
\def\GMPcentreline#1{\hbox{\raise 0.5 ex \vbox{\hrule \hbox to #1 {}}}}
\vbox {%
\hrule
\hbox{%
\vrule height 2.5ex depth 1.5ex
\hbox to \GMPboxwidth {\hfil size\hfil}%
\vrule
\hbox to 3\GMPboxwidth {\hfil data bytes\hfil}%
\vrule}
\hrule}
}}
@end tex

@ifnottex
@example
+------+------------------------+
| size |       data bytes       |
+------+------------------------+
@end example
@end ifnottex

The size is 4 bytes written most significant byte first, being the number of
subsequent data bytes, or the twos complement negative of that when a negative
integer is represented. The data bytes are the absolute value of the integer,
written most significant byte first.

The most significant data byte is always non-zero, so the output is the same
on all systems, irrespective of limb size.

In GMP 1, leading zero bytes were written to pad the data bytes to a multiple
of the limb size. @code{mpz_inp_raw} will still accept this, for
compatibility.

The use of ``big endian'' for both the size and data fields is deliberate: it
makes the data easy to read in a hex dump of a file. Unfortunately it also
means that the limb data must be reversed when reading or writing, so neither
a big endian nor little endian system can just read and write @code{_mp_d}.
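
The layout can be illustrated with a small encoder for values whose magnitude
fits in 64 bits. This is a sketch written from the description above, not the
@code{mpz_out_raw} source; @code{sketch_out_raw} is a hypothetical name, and
zero is taken to be written as a size of 0 with no data bytes, which is what
the description implies.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write the raw form of a value with the given magnitude and sign into
   buf, which must have room for 12 bytes.  Returns the number of bytes
   written: 4 size bytes plus the data bytes.  */
size_t
sketch_out_raw (unsigned char *buf, uint64_t magnitude, int negative)
{
  unsigned char data[8];
  size_t n = 0;

  /* Data bytes, most significant first, skipping leading zero bytes.  */
  for (int shift = 56; shift >= 0; shift -= 8)
    {
      unsigned char byte = (unsigned char) (magnitude >> shift);
      if (n != 0 || byte != 0)
        data[n++] = byte;
    }

  /* Size field: byte count, or its twos complement negative.  */
  uint32_t size = (uint32_t) n;
  if (negative && n != 0)
    size = (uint32_t) 0 - size;

  buf[0] = (unsigned char) (size >> 24);
  buf[1] = (unsigned char) (size >> 16);
  buf[2] = (unsigned char) (size >> 8);
  buf[3] = (unsigned char) size;
  memcpy (buf + 4, data, n);
  return 4 + n;
}
```

For example 258 (hex 0102) comes out as the six bytes
@code{00 00 00 02 01 02}, and @minus{}1 as the five bytes
@code{FF FF FF FF 01}.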


@node C++ Interface Internals, , Raw Output Internals, Internals
@section C++ Interface Internals

A system of expression templates is used to ensure something like @code{a=b+c}
turns into a simple call to @code{mpz_add} etc. For @code{mpf_class} and
@code{mpfr_class} the scheme also ensures the precision of the final
destination is used for any temporaries within a statement like
@code{f=w*x+y*z}. These are important features which a naive implementation
cannot provide.

A simplified description of the scheme follows. The true scheme is
complicated by the fact that expressions have different return types. For
detailed information, refer to the source code.

To perform an operation, say, addition, we first define a ``function object''
evaluating it,

@example
struct __gmp_binary_plus
@{
  static void eval(mpf_t f, mpf_t g, mpf_t h) @{ mpf_add(f, g, h); @}
@};
@end example

@noindent
And an ``additive expression'' object,

@example
__gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >
operator+(const mpf_class &f, const mpf_class &g)
@{
  return __gmp_expr
    <__gmp_binary_expr<mpf_class, mpf_class, __gmp_binary_plus> >(f, g);
@}
@end example

The seemingly redundant @code{__gmp_expr<__gmp_binary_expr<...>>} is used to
encapsulate any possible kind of expression into a single template type. In
fact even @code{mpf_class} etc are @code{typedef} specializations of
@code{__gmp_expr}.

Next we define assignment of @code{__gmp_expr} to @code{mpf_class}.

@example
template <class T>
mpf_class & mpf_class::operator=(const __gmp_expr<T> &expr)
@{
  expr.eval(this->get_mpf_t(), this->precision());
  return *this;
@}

template <class Op>
void __gmp_expr<__gmp_binary_expr<mpf_class, mpf_class, Op> >::eval
(mpf_t f, unsigned long int precision)
@{
  Op::eval(f, expr.val1.get_mpf_t(), expr.val2.get_mpf_t());
@}
@end example

where @code{expr.val1} and @code{expr.val2} are references to the expression's
operands (here @code{expr} is the @code{__gmp_binary_expr} stored within the
@code{__gmp_expr}).

This way, the expression is actually evaluated only at the time of assignment,
when the required precision (that of @code{f}) is known. Furthermore the
target @code{mpf_t} is now available, thus we can call @code{mpf_add} directly
with @code{f} as the output argument.

Compound expressions are handled by defining operators taking subexpressions
as their arguments, like this:

@example
template <class T, class U>
__gmp_expr
<__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
operator+(const __gmp_expr<T> &expr1, const __gmp_expr<U> &expr2)
@{
  return __gmp_expr
    <__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, __gmp_binary_plus> >
    (expr1, expr2);
@}
@end example

And the corresponding specializations of @code{__gmp_expr::eval}:

@example
template <class T, class U, class Op>
void __gmp_expr
<__gmp_binary_expr<__gmp_expr<T>, __gmp_expr<U>, Op> >::eval
(mpf_t f, unsigned long int precision)
@{
  // declare two temporaries
  mpf_class temp1(expr.val1, precision), temp2(expr.val2, precision);
  Op::eval(f, temp1.get_mpf_t(), temp2.get_mpf_t());
@}
@end example

The expression is thus recursively evaluated to any level of complexity and
all subexpressions are evaluated to the precision of @code{f}.


@node Contributors, References, Internals, Top
@comment node-name, next, previous, up
@appendix Contributors
@cindex Contributors

Torbjorn Granlund wrote the original GMP library and is still developing and
maintaining it. Several other individuals and organizations have contributed
to GMP in various ways. Here is a list in chronological order:

Gunnar Sjoedin and Hans Riesel helped with mathematical problems in early
versions of the library.

Richard Stallman contributed to the interface design and revised the first
version of this manual.

Brian Beuning and Doug Lea helped with testing of early versions of the
library and made creative suggestions.

John Amanatides of York University in Canada contributed the function
@code{mpz_probab_prime_p}.

Paul Zimmermann of Inria sparked the development of GMP 2, with his
comparisons between bignum packages.

Ken Weber (Kent State University, Universidade Federal do Rio Grande do Sul)
contributed @code{mpz_gcd}, @code{mpz_divexact}, @code{mpn_gcd}, and
@code{mpn_bdivmod}, partially supported by CNPq (Brazil) grant 301314194-2.

Per Bothner of Cygnus Support helped to set up GMP to use Cygnus' configure.
He has also made valuable suggestions and tested numerous intermediary
releases.

Joachim Hollman was involved in the design of the @code{mpf} interface, and in
the @code{mpz} design revisions for version 2.

Bennet Yee contributed the initial versions of @code{mpz_jacobi} and
@code{mpz_legendre}.

Andreas Schwab contributed the files @file{mpn/m68k/lshift.S} and
@file{mpn/m68k/rshift.S} (now in @file{.asm} form).

The development of the floating point functions of GNU MP 2 was supported in
part by the ESPRIT-BRA (Basic Research Activities) 6846 project POSSO
(POlynomial System SOlving).

GNU MP 2 was finished and released by SWOX AB, SWEDEN, in cooperation with the
IDA Center for Computing Sciences, USA.

8251

8252

Robert Harley of Inria, France and David Seal of ARM, England, suggested clever

8253

improvements for population count.

8254

8255

Robert Harley also wrote highly optimized Karatsuba and 3-way Toom

8256

multiplication functions for GMP 3. He also contributed the ARM assembly

8257

code.

8258

8259

Torsten Ekedahl of the Department of Mathematics at Stockholm University
provided significant inspiration during several phases of the GMP development.
His mathematical expertise helped improve several algorithms.

Paul Zimmermann wrote the Divide and Conquer division code, the REDC code, the
REDC-based @code{mpz_powm} code, the FFT multiply code, and the Karatsuba
square root. The ECMNET project Paul is organizing was a driving force behind
many of the optimizations in GMP 3.

Linus Nordberg wrote the new configure system based on autoconf and
implemented the new random functions.

Kent Boortz made the Macintosh port.

Kevin Ryde worked on a number of things: optimized x86 code, m4 asm macros,
parameter tuning, speed measuring, the configure system, function inlining,
divisibility tests, bit scanning, Jacobi symbols, Fibonacci and Lucas number
functions, printf and scanf functions, perl interface, demo expression parser,
the algorithms chapter in the manual, gmpasm-mode.el, and various
miscellaneous improvements elsewhere.

Steve Root helped write the optimized alpha 21264 assembly code.

Gerardo Ballabio wrote the @file{gmpxx.h} C++ class interface and the C++
istream input routines.

GNU MP 4.0.1 was finished and released by Torbjorn Granlund and Kevin Ryde.
Torbjorn's work was partially funded by the IDA Center for Computing Sciences,
USA.

(This list is chronological, not ordered by significance. If you have
contributed to GMP but are not listed above, please tell @email{tege@@swox.com}
about the omission!)

@node References, GNU Free Documentation License, Contributors, Top
@comment node-name, next, previous, up
@appendix References
@cindex References

@c FIXME: In tex, the @uref's are unhyphenated, which is good for clarity,
@c but being long words they upset paragraph formatting (the preceding line
@c can get badly stretched). Would like a conditional @* style line break
@c if the uref is too long to fit on the last line of the paragraph, but it's
@c not clear how to do that. For now explicit @texlinebreak{}s are used on
@c paragraphs that come out bad.

@section Books

@itemize @bullet
@item
Henri Cohen, ``A Course in Computational Algebraic Number Theory'', Graduate
Texts in Mathematics number 138, Springer-Verlag, 1993.
@texlinebreak{} @uref{http://www.math.u-bordeaux.fr/~cohen}

@item
Donald E. Knuth, ``The Art of Computer Programming'', volume 2,
``Seminumerical Algorithms'', 3rd edition, Addison-Wesley, 1998.
@texlinebreak{} @uref{http://www-cs-faculty.stanford.edu/~knuth/taocp.html}

@item
John D. Lipson, ``Elements of Algebra and Algebraic Computing'',
The Benjamin Cummings Publishing Company Inc, 1981.

@item
Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone, ``Handbook of
Applied Cryptography'', @uref{http://www.cacr.math.uwaterloo.ca/hac/}

@item
Richard M. Stallman, ``Using and Porting GCC'', Free Software Foundation, 1999,
available online @uref{http://www.gnu.org/software/gcc/onlinedocs/}, and in
the GCC package @uref{ftp://ftp.gnu.org/gnu/gcc/}
@end itemize

@section Papers

@itemize @bullet
@item
Christoph Burnikel and Joachim Ziegler, ``Fast Recursive Division'',
Max-Planck-Institut fuer Informatik Research Report MPI-I-98-1-022, @texlinebreak{}
@uref{http://www.mpi-sb.mpg.de/~ziegler/TechRep.ps.gz}

@item
Torbjorn Granlund and Peter L. Montgomery, ``Division by Invariant Integers
using Multiplication'', in Proceedings of the SIGPLAN PLDI'94 Conference, June
1994. Also available @uref{ftp://ftp.cwi.nl/pub/pmontgom/divcnst.psa4.gz}
(and .psl.gz).

@item
Peter L. Montgomery, ``Modular Multiplication Without Trial Division'', in
Mathematics of Computation, volume 44, number 170, April 1985.

@item
Tudor Jebelean,
``An algorithm for exact division'',
Journal of Symbolic Computation,
volume 15, 1993, pp. 169-180.
Research report version available @texlinebreak{}
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-35.ps.gz}

@item
Tudor Jebelean, ``Exact Division with Karatsuba Complexity - Extended
Abstract'', RISC-Linz technical report 96-31, @texlinebreak{}
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-31.ps.gz}

@item
Tudor Jebelean, ``Practical Integer Division with Karatsuba Complexity'',
ISSAC 97, pp. 339-341. Technical report available @texlinebreak{}
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1996/96-29.ps.gz}

@item
Tudor Jebelean, ``A Generalization of the Binary GCD Algorithm'', ISSAC 93,
pp. 111-116. Technical report version available @texlinebreak{}
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1993/93-01.ps.gz}

@item
Tudor Jebelean, ``A Double-Digit Lehmer-Euclid Algorithm for Finding the GCD
of Long Integers'', Journal of Symbolic Computation, volume 19, 1995,
pp. 145-157. Technical report version also available @texlinebreak{}
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1992/92-69.ps.gz}

@item
Werner Krandick and Tudor Jebelean, ``Bidirectional Exact Integer Division'',
Journal of Symbolic Computation, volume 21, 1996, pp. 441-455. Early
technical report version also available
@uref{ftp://ftp.risc.uni-linz.ac.at/pub/techreports/1994/94-50.ps.gz}

@item
R. Moenck and A. Borodin, ``Fast Modular Transforms via Division'',
Proceedings of the 13th Annual IEEE Symposium on Switching and Automata
Theory, October 1972, pp. 90-96. Reprinted as ``Fast Modular Transforms'',
Journal of Computer and System Sciences, volume 8, number 3, June 1974,
pp. 366-386.

@item
Arnold Sch@"onhage and Volker Strassen, ``Schnelle Multiplikation grosser
Zahlen'', Computing 7, 1971, pp. 281-292.

@item
Kenneth Weber, ``The accelerated integer GCD algorithm'',
ACM Transactions on Mathematical Software,
volume 21, number 1, March 1995, pp. 111-122.

@item
Paul Zimmermann, ``Karatsuba Square Root'', INRIA Research Report 3805,
November 1999, @uref{http://www.inria.fr/RRRT/RR-3805.html}

@item
Paul Zimmermann, ``A Proof of GMP Fast Division and Square Root
Implementations'', @texlinebreak{}
@uref{http://www.loria.fr/~zimmerma/papers/proof-div-sqrt.ps.gz}

@item
Dan Zuras, ``On Squaring and Multiplying Large Integers'', ARITH-11: IEEE
Symposium on Computer Arithmetic, 1993, pp. 260-271. Reprinted as ``More
on Multiplying and Squaring Large Integers'', IEEE Transactions on Computers,
volume 43, number 8, August 1994, pp. 899-908.
@end itemize

@node GNU Free Documentation License, Concept Index, References, Top
@appendix GNU Free Documentation License
@cindex GNU Free Documentation License
@include fdl.texi


@node Concept Index, Function Index, GNU Free Documentation License, Top
@comment node-name, next, previous, up
@unnumbered Concept Index
@printindex cp

@node Function Index, , Concept Index, Top
@comment node-name, next, previous, up
@unnumbered Function and Type Index
@printindex fn

@bye

@c Local variables:
@c fill-column: 78
@c compile-command: "make gmp.info"
@c End:
