1
c-----------------------------------------------------------------------
7
c Reverse communication interface for applying NP additional steps to
8
c a K step nonsymmetric Arnoldi factorization.
10
c Input: OP*V_{k} - V_{k}*H = r_{k}*e_{k}^T
12
c with (V_{k}^T)*B*V_{k} = I, (V_{k}^T)*B*r_{k} = 0.
14
c Output: OP*V_{k+p} - V_{k+p}*H = r_{k+p}*e_{k+p}^T
16
c with (V_{k+p}^T)*B*V_{k+p} = I, (V_{k+p}^T)*B*r_{k+p} = 0.
18
c where OP and B are as in snaupd. The B-norm of r_{k+p} is also
19
c computed and returned.
23
c ( IDO, BMAT, N, K, NP, NB, RESID, RNORM, V, LDV, H, LDH,
24
c IPNTR, WORKD, INFO )
27
c IDO Integer. (INPUT/OUTPUT)
28
c Reverse communication flag.
29
c -------------------------------------------------------------
30
c IDO = 0: first call to the reverse communication interface
31
c IDO = -1: compute Y = OP * X where
32
c IPNTR(1) is the pointer into WORK for X,
33
c IPNTR(2) is the pointer into WORK for Y.
34
c This is for the restart phase to force the new
35
c starting vector into the range of OP.
36
c IDO = 1: compute Y = OP * X where
37
c IPNTR(1) is the pointer into WORK for X,
38
c IPNTR(2) is the pointer into WORK for Y,
39
c IPNTR(3) is the pointer into WORK for B * X.
40
c IDO = 2: compute Y = B * X where
41
c IPNTR(1) is the pointer into WORK for X,
42
c IPNTR(2) is the pointer into WORK for Y.
44
c -------------------------------------------------------------
45
c When the routine is used in the "shift-and-invert" mode, the
46
c vector B * Q is already available and do not need to be
47
c recompute in forming OP * Q.
49
c BMAT Character*1. (INPUT)
50
c BMAT specifies the type of the matrix B that defines the
51
c semi-inner product for the operator OP. See snaupd.
52
c B = 'I' -> standard eigenvalue problem A*x = lambda*x
53
c B = 'G' -> generalized eigenvalue problem A*x = lambda*M**x
56
c Dimension of the eigenproblem.
59
c Current size of V and H.
62
c Number of additional Arnoldi steps to take.
65
c Blocksize to be used in the recurrence.
66
c Only work for NB = 1 right now. The goal is to have a
67
c program that implement both the block and non-block method.
69
c RESID Real array of length N. (INPUT/OUTPUT)
70
c On INPUT: RESID contains the residual vector r_{k}.
71
c On OUTPUT: RESID contains the residual vector r_{k+p}.
73
c RNORM Real scalar. (INPUT/OUTPUT)
74
c B-norm of the starting residual on input.
75
c B-norm of the updated residual r_{k+p} on output.
77
c V Real N by K+NP array. (INPUT/OUTPUT)
78
c On INPUT: V contains the Arnoldi vectors in the first K
80
c On OUTPUT: V contains the new NP Arnoldi vectors in the next
81
c NP columns. The first K columns are unchanged.
83
c LDV Integer. (INPUT)
84
c Leading dimension of V exactly as declared in the calling
87
c H Real (K+NP) by (K+NP) array. (INPUT/OUTPUT)
88
c H is used to store the generated upper Hessenberg matrix.
90
c LDH Integer. (INPUT)
91
c Leading dimension of H exactly as declared in the calling
94
c IPNTR Integer array of length 3. (OUTPUT)
95
c Pointer to mark the starting locations in the WORK for
96
c vectors used by the Arnoldi iteration.
97
c -------------------------------------------------------------
98
c IPNTR(1): pointer to the current operand vector X.
99
c IPNTR(2): pointer to the current result vector Y.
100
c IPNTR(3): pointer to the vector B * X when used in the
101
c shift-and-invert mode. X is the current operand.
102
c -------------------------------------------------------------
104
c WORKD Real work array of length 3*N. (REVERSE COMMUNICATION)
105
c Distributed array to be used in the basic Arnoldi iteration
106
c for reverse communication. The calling program should not
107
c use WORKD as temporary workspace during the iteration !!!!!!
108
c On input, WORKD(1:N) = B*RESID and is used to save some
109
c computation at the first step.
111
c INFO Integer. (OUTPUT)
113
c > 0: Size of the spanning invariant subspace of OP found.
117
c-----------------------------------------------------------------------
125
c 1. D.C. Sorensen, "Implicit Application of Polynomial Filters in
126
c a k-Step Arnoldi Method", SIAM J. Matr. Anal. Apps., 13 (1992),
128
c 2. R.B. Lehoucq, "Analysis and Implementation of an Implicitly
129
c Restarted Arnoldi Iteration", Rice University Technical Report
130
c TR95-13, Department of Computational and Applied Mathematics.
133
c sgetv0 ARPACK routine to generate the initial vector.
134
c ivout ARPACK utility routine that prints integers.
135
c second ARPACK utility routine for timing.
136
c smout ARPACK utility routine that prints matrices
137
c svout ARPACK utility routine that prints vectors.
138
c slabad LAPACK routine that computes machine constants.
139
c slamch LAPACK routine that determines machine constants.
140
c slascl LAPACK routine for careful scaling of a matrix.
141
c slanhs LAPACK routine that computes various norms of a matrix.
142
c sgemv Level 2 BLAS routine for matrix vector multiplication.
143
c saxpy Level 1 BLAS that computes a vector triad.
144
c sscal Level 1 BLAS that scales a vector.
145
c scopy Level 1 BLAS that copies one vector to another .
146
c sdot Level 1 BLAS that computes the scalar product of two vectors.
147
c snrm2 Level 1 BLAS that computes the norm of a vector.
150
c Danny Sorensen Phuong Vu
151
c Richard Lehoucq CRPC / Rice University
152
c Dept. of Computational & Houston, Texas
153
c Applied Mathematics
158
c xx/xx/92: Version ' 2.4'
160
c\SCCS Information: @(#)
161
c FILE: naitr.F SID: 2.4 DATE OF SID: 8/27/96 RELEASE: 2
164
c The algorithm implemented is:
167
c Given V_{k} = [v_{1}, ..., v_{k}], r_{k};
168
c r_{k} contains the initial residual vector even for k = 0;
169
c Also assume that rnorm = || B*r_{k} || and B*r_{k} are already
170
c computed by the calling program.
172
c betaj = rnorm ; p_{k+1} = B*r_{k} ;
173
c For j = k+1, ..., k+np Do
174
c 1) if ( betaj < tol ) stop or restart depending on j.
175
c ( At present tol is zero )
176
c if ( restart ) generate a new starting vector.
177
c 2) v_{j} = r(j-1)/betaj; V_{j} = [V_{j-1}, v_{j}];
178
c p_{j} = p_{j}/betaj
179
c 3) r_{j} = OP*v_{j} where OP is defined as in snaupd
180
c For shift-invert mode p_{j} = B*v_{j} is already available.
181
c wnorm = || OP*v_{j} ||
182
c 4) Compute the j-th step residual vector.
183
c w_{j} = V_{j}^T * B * OP * v_{j}
184
c r_{j} = OP*v_{j} - V_{j} * w_{j}
187
c rnorm = || r_(j) ||
188
c If (rnorm > 0.717*wnorm) accept step and go back to 1)
189
c 5) Re-orthogonalization step:
191
c r_{j} = r_{j} - V_{j}*s; rnorm1 = || r_{j} ||
192
c alphaj = alphaj + s_{j};
193
c 6) Iterative refinement step:
194
c If (rnorm1 > 0.717*rnorm) then
196
c accept step and go back to 1)
199
c If this is the first time in step 6), go to 5)
200
c Else r_{j} lies in the span of V_{j} numerically.
201
c Set r_{j} = 0 and rnorm = 0; go to 1)
207
c-----------------------------------------------------------------------
210
& (ido, bmat, n, k, np, nb, resid, rnorm, v, ldv, h, ldh,
211
& ipntr, workd, info)
213
c %----------------------------------------------------%
214
c | Include files for debugging and timing information |
215
c %----------------------------------------------------%
220
c %------------------%
221
c | Scalar Arguments |
222
c %------------------%
225
integer ido, info, k, ldh, ldv, n, nb, np
229
c %-----------------%
230
c | Array Arguments |
231
c %-----------------%
235
& h(ldh,k+np), resid(n), v(ldv,k+np), workd(3*n)
243
parameter (one = 1.0E+0, zero = 0.0E+0)
249
logical first, orth1, orth2, rstart, step3, step4
250
integer ierr, i, infol, ipj, irj, ivj, iter, itry, j, msglvl,
253
& betaj, ovfl, temp1, rnorm1, smlnum, tst1, ulp, unfl,
255
save first, orth1, orth2, rstart, step3, step4,
256
& ierr, ipj, irj, ivj, iter, itry, j, msglvl, ovfl,
257
& betaj, rnorm1, smlnum, ulp, unfl, wnorm
259
c %-----------------------%
260
c | Local Array Arguments |
261
c %-----------------------%
266
c %----------------------%
267
c | External Subroutines |
268
c %----------------------%
270
external saxpy, scopy, sscal, sgemv, sgetv0, slabad,
271
& svout, smout, ivout, second
273
c %--------------------%
274
c | External Functions |
275
c %--------------------%
278
& sdot, snrm2, slanhs, slamch
279
external sdot, snrm2, slanhs, slamch
281
c %---------------------%
282
c | Intrinsic Functions |
283
c %---------------------%
287
c %-----------------%
288
c | Data statements |
289
c %-----------------%
291
data first / .true. /
293
c %-----------------------%
294
c | Executable Statements |
295
c %-----------------------%
299
c %-----------------------------------------%
300
c | Set machine-dependent constants for the |
301
c | the splitting and deflation criterion. |
302
c | If norm(H) <= sqrt(OVFL), |
303
c | overflow should not occur. |
304
c | REFERENCE: LAPACK subroutine slahqr |
305
c %-----------------------------------------%
307
unfl = slamch( 'safe minimum' )
309
call slabad( unfl, ovfl )
310
ulp = slamch( 'precision' )
311
smlnum = unfl*( n / ulp )
317
c %-------------------------------%
318
c | Initialize timing statistics |
319
c | & message level for debugging |
320
c %-------------------------------%
325
c %------------------------------%
326
c | Initial call to this routine |
327
c %------------------------------%
341
c %-------------------------------------------------%
342
c | When in reverse communication mode one of: |
343
c | STEP3, STEP4, ORTH1, ORTH2, RSTART |
344
c | will be .true. when .... |
345
c | STEP3: return from computing OP*v_{j}. |
346
c | STEP4: return from computing B-norm of OP*v_{j} |
347
c | ORTH1: return from computing B-norm of r_{j+1} |
348
c | ORTH2: return from computing B-norm of |
349
c | correction to the residual vector. |
350
c | RSTART: return from OP computations needed by |
352
c %-------------------------------------------------%
360
c %-----------------------------%
361
c | Else this is the first step |
362
c %-----------------------------%
364
c %--------------------------------------------------------------%
366
c | A R N O L D I I T E R A T I O N L O O P |
368
c | Note: B*r_{j-1} is already in WORKD(1:N)=WORKD(IPJ:IPJ+N-1) |
369
c %--------------------------------------------------------------%
373
if (msglvl .gt. 1) then
374
call ivout (logfil, 1, j, ndigit,
375
& '_naitr: generating Arnoldi vector number')
376
call svout (logfil, 1, rnorm, ndigit,
377
& '_naitr: B-norm of the current residual is')
380
c %---------------------------------------------------%
381
c | STEP 1: Check if the B norm of j-th residual |
382
c | vector is zero. Equivalent to determing whether |
383
c | an exact j-step Arnoldi factorization is present. |
384
c %---------------------------------------------------%
387
if (rnorm .gt. zero) go to 40
389
c %---------------------------------------------------%
390
c | Invariant subspace found, generate a new starting |
391
c | vector which is orthogonal to the current Arnoldi |
392
c | basis and continue the iteration. |
393
c %---------------------------------------------------%
395
if (msglvl .gt. 0) then
396
call ivout (logfil, 1, j, ndigit,
397
& '_naitr: ****** RESTART AT STEP ******')
400
c %---------------------------------------------%
401
c | ITRY is the loop variable that controls the |
402
c | maximum amount of times that a restart is |
403
c | attempted. NRSTRT is used by stat.h |
404
c %---------------------------------------------%
414
c %--------------------------------------%
415
c | If in reverse communication mode and |
416
c | RSTART = .true. flow returns here. |
417
c %--------------------------------------%
419
call sgetv0 (ido, bmat, itry, .false., n, j, v, ldv,
420
& resid, rnorm, ipntr, workd, ierr)
421
if (ido .ne. 99) go to 9000
422
if (ierr .lt. 0) then
424
if (itry .le. 3) go to 20
426
c %------------------------------------------------%
427
c | Give up after several restart attempts. |
428
c | Set INFO to the size of the invariant subspace |
429
c | which spans OP and exit. |
430
c %------------------------------------------------%
434
tnaitr = tnaitr + (t1 - t0)
441
c %---------------------------------------------------------%
442
c | STEP 2: v_{j} = r_{j-1}/rnorm and p_{j} = p_{j}/rnorm |
443
c | Note that p_{j} = B*r_{j-1}. In order to avoid overflow |
444
c | when reciprocating a small RNORM, test against lower |
446
c %---------------------------------------------------------%
448
call scopy (n, resid, 1, v(1,j), 1)
449
if (rnorm .ge. unfl) then
451
call sscal (n, temp1, v(1,j), 1)
452
call sscal (n, temp1, workd(ipj), 1)
455
c %-----------------------------------------%
456
c | To scale both v_{j} and p_{j} carefully |
457
c | use LAPACK routine SLASCL |
458
c %-----------------------------------------%
460
call slascl ('General', i, i, rnorm, one, n, 1,
462
call slascl ('General', i, i, rnorm, one, n, 1,
463
& workd(ipj), n, infol)
466
c %------------------------------------------------------%
467
c | STEP 3: r_{j} = OP*v_{j}; Note that p_{j} = B*v_{j} |
468
c | Note that this is not quite yet r_{j}. See STEP 4 |
469
c %------------------------------------------------------%
474
call scopy (n, v(1,j), 1, workd(ivj), 1)
480
c %-----------------------------------%
481
c | Exit in order to compute OP*v_{j} |
482
c %-----------------------------------%
487
c %----------------------------------%
488
c | Back from reverse communication; |
489
c | WORKD(IRJ:IRJ+N-1) := OP*v_{j} |
490
c | if step3 = .true. |
491
c %----------------------------------%
494
tmvopx = tmvopx + (t3 - t2)
498
c %------------------------------------------%
499
c | Put another copy of OP*v_{j} into RESID. |
500
c %------------------------------------------%
502
call scopy (n, workd(irj), 1, resid, 1)
504
c %---------------------------------------%
505
c | STEP 4: Finish extending the Arnoldi |
506
c | factorization to length j. |
507
c %---------------------------------------%
510
if (bmat .eq. 'G') then
517
c %-------------------------------------%
518
c | Exit in order to compute B*OP*v_{j} |
519
c %-------------------------------------%
522
else if (bmat .eq. 'I') then
523
call scopy (n, resid, 1, workd(ipj), 1)
527
c %----------------------------------%
528
c | Back from reverse communication; |
529
c | WORKD(IPJ:IPJ+N-1) := B*OP*v_{j} |
530
c | if step4 = .true. |
531
c %----------------------------------%
533
if (bmat .eq. 'G') then
535
tmvbx = tmvbx + (t3 - t2)
540
c %-------------------------------------%
541
c | The following is needed for STEP 5. |
542
c | Compute the B-norm of OP*v_{j}. |
543
c %-------------------------------------%
545
if (bmat .eq. 'G') then
546
wnorm = sdot (n, resid, 1, workd(ipj), 1)
547
wnorm = sqrt(abs(wnorm))
548
else if (bmat .eq. 'I') then
549
wnorm = snrm2(n, resid, 1)
552
c %-----------------------------------------%
553
c | Compute the j-th residual corresponding |
554
c | to the j step factorization. |
555
c | Use Classical Gram Schmidt and compute: |
556
c | w_{j} <- V_{j}^T * B * OP * v_{j} |
557
c | r_{j} <- OP*v_{j} - V_{j} * w_{j} |
558
c %-----------------------------------------%
561
c %------------------------------------------%
562
c | Compute the j Fourier coefficients w_{j} |
563
c | WORKD(IPJ:IPJ+N-1) contains B*OP*v_{j}. |
564
c %------------------------------------------%
566
call sgemv ('T', n, j, one, v, ldv, workd(ipj), 1,
569
c %--------------------------------------%
570
c | Orthogonalize r_{j} against V_{j}. |
571
c | RESID contains OP*v_{j}. See STEP 3. |
572
c %--------------------------------------%
574
call sgemv ('N', n, j, -one, v, ldv, h(1,j), 1,
577
if (j .gt. 1) h(j,j-1) = betaj
584
if (bmat .eq. 'G') then
586
call scopy (n, resid, 1, workd(irj), 1)
591
c %----------------------------------%
592
c | Exit in order to compute B*r_{j} |
593
c %----------------------------------%
596
else if (bmat .eq. 'I') then
597
call scopy (n, resid, 1, workd(ipj), 1)
601
c %---------------------------------------------------%
602
c | Back from reverse communication if ORTH1 = .true. |
603
c | WORKD(IPJ:IPJ+N-1) := B*r_{j}. |
604
c %---------------------------------------------------%
606
if (bmat .eq. 'G') then
608
tmvbx = tmvbx + (t3 - t2)
613
c %------------------------------%
614
c | Compute the B-norm of r_{j}. |
615
c %------------------------------%
617
if (bmat .eq. 'G') then
618
rnorm = sdot (n, resid, 1, workd(ipj), 1)
619
rnorm = sqrt(abs(rnorm))
620
else if (bmat .eq. 'I') then
621
rnorm = snrm2(n, resid, 1)
624
c %-----------------------------------------------------------%
625
c | STEP 5: Re-orthogonalization / Iterative refinement phase |
626
c | Maximum NITER_ITREF tries. |
628
c | s = V_{j}^T * B * r_{j} |
629
c | r_{j} = r_{j} - V_{j}*s |
630
c | alphaj = alphaj + s_{j} |
632
c | The stopping criteria used for iterative refinement is |
633
c | discussed in Parlett's book SEP, page 107 and in Gragg & |
634
c | Reichel ACM TOMS paper; Algorithm 686, Dec. 1990. |
635
c | Determine if we need to correct the residual. The goal is |
636
c | to enforce ||v(:,1:j)^T * r_{j}|| .le. eps * || r_{j} || |
637
c | The following test determines whether the sine of the |
638
c | angle between OP*x and the computed residual is less |
639
c | than or equal to 0.717. |
640
c %-----------------------------------------------------------%
642
if (rnorm .gt. 0.717*wnorm) go to 100
646
c %---------------------------------------------------%
647
c | Enter the Iterative refinement phase. If further |
648
c | refinement is necessary, loop back here. The loop |
649
c | variable is ITER. Perform a step of Classical |
650
c | Gram-Schmidt using all the Arnoldi vectors V_{j} |
651
c %---------------------------------------------------%
655
if (msglvl .gt. 2) then
658
call svout (logfil, 2, xtemp, ndigit,
659
& '_naitr: re-orthonalization; wnorm and rnorm are')
660
call svout (logfil, j, h(1,j), ndigit,
661
& '_naitr: j-th column of H')
664
c %----------------------------------------------------%
665
c | Compute V_{j}^T * B * r_{j}. |
666
c | WORKD(IRJ:IRJ+J-1) = v(:,1:J)'*WORKD(IPJ:IPJ+N-1). |
667
c %----------------------------------------------------%
669
call sgemv ('T', n, j, one, v, ldv, workd(ipj), 1,
670
& zero, workd(irj), 1)
672
c %---------------------------------------------%
673
c | Compute the correction to the residual: |
674
c | r_{j} = r_{j} - V_{j} * WORKD(IRJ:IRJ+J-1). |
675
c | The correction to H is v(:,1:J)*H(1:J,1:J) |
676
c | + v(:,1:J)*WORKD(IRJ:IRJ+J-1)*e'_j. |
677
c %---------------------------------------------%
679
call sgemv ('N', n, j, -one, v, ldv, workd(irj), 1,
681
call saxpy (j, one, workd(irj), 1, h(1,j), 1)
685
if (bmat .eq. 'G') then
687
call scopy (n, resid, 1, workd(irj), 1)
692
c %-----------------------------------%
693
c | Exit in order to compute B*r_{j}. |
694
c | r_{j} is the corrected residual. |
695
c %-----------------------------------%
698
else if (bmat .eq. 'I') then
699
call scopy (n, resid, 1, workd(ipj), 1)
703
c %---------------------------------------------------%
704
c | Back from reverse communication if ORTH2 = .true. |
705
c %---------------------------------------------------%
707
if (bmat .eq. 'G') then
709
tmvbx = tmvbx + (t3 - t2)
712
c %-----------------------------------------------------%
713
c | Compute the B-norm of the corrected residual r_{j}. |
714
c %-----------------------------------------------------%
716
if (bmat .eq. 'G') then
717
rnorm1 = sdot (n, resid, 1, workd(ipj), 1)
718
rnorm1 = sqrt(abs(rnorm1))
719
else if (bmat .eq. 'I') then
720
rnorm1 = snrm2(n, resid, 1)
723
if (msglvl .gt. 0 .and. iter .gt. 0) then
724
call ivout (logfil, 1, j, ndigit,
725
& '_naitr: Iterative refinement for Arnoldi residual')
726
if (msglvl .gt. 2) then
729
call svout (logfil, 2, xtemp, ndigit,
730
& '_naitr: iterative refinement ; rnorm and rnorm1 are')
734
c %-----------------------------------------%
735
c | Determine if we need to perform another |
736
c | step of re-orthogonalization. |
737
c %-----------------------------------------%
739
if (rnorm1 .gt. 0.717*rnorm) then
741
c %---------------------------------------%
742
c | No need for further refinement. |
743
c | The cosine of the angle between the |
744
c | corrected residual vector and the old |
745
c | residual vector is greater than 0.717 |
746
c | In other words the corrected residual |
747
c | and the old residual vector share an |
748
c | angle of less than arcCOS(0.717) |
749
c %---------------------------------------%
755
c %-------------------------------------------%
756
c | Another step of iterative refinement step |
757
c | is required. NITREF is used by stat.h |
758
c %-------------------------------------------%
763
if (iter .le. 1) go to 80
765
c %-------------------------------------------------%
766
c | Otherwise RESID is numerically in the span of V |
767
c %-------------------------------------------------%
775
c %----------------------------------------------%
776
c | Branch here directly if iterative refinement |
777
c | wasn't necessary or after at most NITER_REF |
778
c | steps of iterative refinement. |
779
c %----------------------------------------------%
787
titref = titref + (t5 - t4)
789
c %------------------------------------%
790
c | STEP 6: Update j = j+1; Continue |
791
c %------------------------------------%
794
if (j .gt. k+np) then
796
tnaitr = tnaitr + (t1 - t0)
798
do 110 i = max(1,k), k+np-1
800
c %--------------------------------------------%
801
c | Check for splitting and deflation. |
802
c | Use a standard test as in the QR algorithm |
803
c | REFERENCE: LAPACK subroutine slahqr |
804
c %--------------------------------------------%
806
tst1 = abs( h( i, i ) ) + abs( h( i+1, i+1 ) )
808
& tst1 = slanhs( '1', k+np, h, ldh, workd(n+1) )
809
if( abs( h( i+1,i ) ).le.max( ulp*tst1, smlnum ) )
813
if (msglvl .gt. 2) then
814
call smout (logfil, k+np, k+np, h, ldh, ndigit,
815
& '_naitr: Final upper Hessenberg matrix H of order K+NP')
821
c %--------------------------------------------------------%
822
c | Loop back to extend the factorization by another step. |
823
c %--------------------------------------------------------%
827
c %---------------------------------------------------------------%
829
c | E N D O F M A I N I T E R A T I O N L O O P |
831
c %---------------------------------------------------------------%