<BODY>

VECFEM3 Reference Manual: LINSOL

Type: format

Introduction

The linear equation solver LINSOL is designed to solve a l -dimensional linear system MAT*x = b with a very large and sparse l x l real matrix MAT . In this manual page we explain the structure of the storage scheme wich is used by LINSOL. In the following we use the term A instead of MAT .

Parallel Computer

LINSOL is designed to run optimally on parallel distributed-memory computers. Before the computation can start, the GLOBAL matrix has to be distributed to the local memories of the processors. This is done in the following way:

Each process(or) p gets lmatbk(p) consecutive rows of the GLOBAL matrix A starting at row ptrmbk(p) +1. Thus we have a LOCAL lmatbk(p) x l matrix A(p) on every process(or) p . This LOCAL matrix is stored in the sparse matrix storage scheme described in the next chapter. Additionaly, LINSOL assumes a logical column block splitting of the matrix on each process(or). This is explained in the chapter Column Block Splitting .

The Storage Scheme

The real LOCAL matrix A(p) on processor p is of size lmatbk(p) x l . It is represented as sum of LOCAL 10 sub-matrices A(i,p) :

A(p) = A(1,p) + A(2,p) + ... + A(i,p) + ... + A(10,p)

Each submatrix A(i,p) contains one (and only one) special data structure. The vectors within the allowed data structures are named "vector terms". The number of vector terms for A(i,p) is stored in the (virtual) array nvt(i,p) . The different submatrices (and vector terms respectively) can be assembled without cutback to the matrix A(p) .

WARNING:

Two elements of two different vector terms can have the same entry in the matrix A(p) - this means that the entry is the sum of the two elements. The overlay of elements in the matrix A(p) does not influence the correct computation of the solution of the linear system; but then the normalization does not work in the specified way - this means that you don't get exactly the chosen normalization (except of the normalization by main diagonal).

In the following, the index p indicating the process(or) number will be dropped. The integer number typ indicates the type of the LOCAL submatrix.

NOTES:

The variables iac , iar and the array index are GLOBAL and relate to the GLOBAL matrix A . The variables l and lvt are local and relate to the LOCAL matrix A . The variables adda , indc and indr are pointers and point to the matrix array mat and to the array of indices index respectively. The following examples show the settings of the varables for the first block of the LOCAL matrix on the first process(or)- this means that for the other blocks and (or) on the other processes (processors) the values of the GLOBAL variables ( iac , iar , index ) change.

MAIN : main diagonal ( typ =10 or 1)

A(1) is the main diagonal of the matrix A . This is the diagonal starting with row- and column-index ptrmbk(p) +1. Gaps with no entries have to be filled by zeroes


                 1234567890123...
        1 *-------
       2 -*------
       3 --*-----
A(1) = 4 ---*----
       5 ----*---
       6 -----*--
       7 ------*-
       8 -------*

FULLDIAG : full diagonal ( typ =20 or 2)

Starting at the column iac+1 and the row iar+1 , the vector term FULLDIAG is a diagonal portion of length lvt . Gaps with no entries have to be filled by zeroes.

In the example iac =3, iar =1 and lvt =4 holds.
```
                1234567890123...
        1 --------
       2 ---*----
       3 ----*---
A(2) = 4 -----*--
       5 ------*-
       6 --------
       7 --------
       8 --------
```
PDIAG : packed diagonal ( typ =30 or 3)

Starting at the column iac+1 and the row iar+1 , the vector term PDIAG is a sparse diagonal portion with length lvt . A pointer indc points to the starting adress of the array index which is used to adress the nonzero elements in the diagonal.

In the example iac =3, iar =1, lvt =3 and index(indc+1..lvt) =(1,2,4) holds.
```
                 1234567890123...
       1 --------
       2 ---*----
       3 ----*---
A(3) = 4 --------
       5 ------*-
       6 --------
       7 --------
       8 --------
```
IXCOL : indexed column ( typ =40 or 4)

This vector term has entries in lvt contiguous rows starting at row iar+1 . A pointer indc points to the starting adress of the array index which is used to adress the column indices relating to the contiguous row indices.

In the example iar =1, lvt =4 and index(indc+1..lvt) =(4,2,1,7) holds.
```
                 1234567890123...
       1 --------
       2 ---*----
       3 -*------
A(4) = 4 *-------
       5 ------*-
       6 --------
       7 --------
       8 --------
```
IXROW : indexed row ( typ =50 or 5)

This vector term has entries in lvt contiguous columns starting at column iac+1 . A pointer indr points to the starting adress of the array index which is used to adress the row indices relating to the contiguous column indices.

In the example iac =0, lvt =4 and index(indr+1..lvt) =(4,3,6,1) holds.
```
                 1234567890123...
       1 ---*----
       2 --------
       3 -*------
A(5) = 4 *-------
       5 --------
       6 --*-----
       7 --------
       8 --------
```
SKY : starry sky ( typ =60 or 6)

lvt entries of nonzero elements are completely defined by two pointers indc,indr . The column-indices are listed in the integer vector index(indc+1..lvt) , the row-indices in the vector index(indr+1..lvt) .

In the example lvt =3, index(indc+1..lvt) =(1,2,6), and index(indr+1..lvt) =(3,1,7) holds.
```
                 1234567890123...
       1 -*------
       2 --------
       3 *-------
A(6) = 4 --------
       5 --------
       6 --------
       7 -----*--
       8 --------
```

FULLROW: full row ( typ =70 or 7)

FULLROW defines lvt contiguous elements in the row iar +1 starting at column iac +1. Gaps with no entries have to be filled by zeroes.

In the example iar =7, iac =2 and lvt =5 holds.


                 1234567890123...
       1 --------
       2 --------
       3 --------
A(7) = 4 --------
       5 --------
       6 --------
       7 --------
       8 --*****-

FULLCOL : full column ( typ =80 or 8)

FULLCOL defines lvt contiguous elements in the column iac +1 starting at row iar +1. Gaps with no entries have to be filled by zeroes.

In the example iac =7, iar =2 and lvt =3 holds.
```
                 1234567890123...
       1 --------
       2 --------
       3 -------*
A(8) = 4 -------*
       5 -------*
       6 --------
       7 --------
       8 --------
```

PROW: packed row ( typ =90 or 9) {not yet implemented!!!!}

PROW defines lvt elements adressed by vector index via indc in the row iar +1.

In the example iar =7, lvt =3 and index(indc+1..lvt) =(3,5,8) holds.


                 1234567890123...
       1 --------
       2 --------
       3 --------
A(9) = 4 --------
       5 --------
       6 --------
       7 --------
       8 --*-*--*

PCOL : packed column ( typ =100 or 11) {not yet implemented!!!!}

PCOL defines lvt elements adressed by vector index via indr in the column iac +1.

In the example iac =7, lvt =3 and index(indr+1..lvt) =(3,4,7) holds.
```
                1234567890123...
        1 --------
        2 --------
        3 -------*
A(11) = 4 -------*
        5 --------
        6 --------
        7 -------*
        8 --------
```
Note : The typ SKY="starry sky" is not allowed on a vector computer.

On each process, all index vectors are stored in the integer array index of length lindex . Indc +1 and indr +1 point to the first element of index vectors relating to vector terms within the array index . The entries of the LOCAL matrix A are stored in the real array mat of length lmat . Adda +1 points to the first element of a vector term within the array mat . The integer numbers typ, adda, lvt, iac, iar, indc and indr are handed over in the integer matrix info(ia1,ia2) ; ia1>=nvt and ia2 >=7 must hold. The values info(i,1) ,..., info(i,7) specify the information for the i-th vector term of typ=info(i,1) . The following table shows the meaning of info(i,k) for the different data structures:

shape	k=1	k=2	k=3	k=4	k=5	k=6	k=7
MAIN	1 or 10	adda	l	0	0	0	0
FULLDIAG	2 or 20	adda	lvt	iac	iar	0	0
PDIAG	3 or 30	adda	lvt	iac	iar	indc	0
IXCOL	4 or 40	adda	lvt	0	iar	indc	0
IXROW	5 or 50	adda	lvt	iac	0	0	indr
SKY	6 or 60	adda	lvt	0	0	indc	indr
FULLROW	7 or 70	adda	lvt	iac	iar	0	0
FULLCOL	8 or 80	adda	lvt	iac	iar	0	0
PROW	9 or 90	adda	lvt	0	iar	indc	0
PCOL	11 or 100	adda	lvt	iac	0	0 indr

Column Block Splitting

LINSOL assumes that the matrix is not only partitioned in blocks of rows among the processors (processes), but also that each matrix MAT(p) is partitioned logically in nproc (== number of processors (processes)) column blocks. It must hold: if a vector term of the matrix MAT(p) has its first element in column block i (i=1,..,nproc) - this means that it is between column ptrmbk(i) +1 and ptrmbk(i+1) - ,then ALL elements of the vector term have to be in the column block i.

The Symmetrical Case

A symmetrical matrix is stored in a special symmetrical storage scheme which uses the symmetry of the matrix. Matrix A can be represented as

A = AA + D + AA(T)

where D is the main diagonal of A and AA is a M X M -matrix (not necessarily equal to the upper or lower triangle of A ). AA(T) denotes the transposed matrix of AA . In this case only the matrix AA and the main diagonal (if nonzero) must be handed over.

How to Select the Sub-matrices

But there is the problem how to separate the matrix A into the local sub-matrices A(i) with i element of set {1,2,..,11} because of different partitioning possibilities. The optimal way depends strongly on the used computer. We recommend the following strategy to find the partitioning: first look to all 'long' full row or column portions in your matrix; second look to all 'long' packed row or column portions. Then look to all occupied diagonals of A (=diagonal with at least one nonzero element). The long diagonals with many nonzero elements are stored in the scheme FULLDIAG . If there are enough nonzero elements in the diagonal i.e. the packed diagonal is 'long', you should store the diagonal in scheme PDIAG . What 'long' means, depends on your computer. Normally 'long' is the vector length where a computer nearly reaches its peak performance for the special operation (e.g. general triad, linked triad, scalar product, ..). For the remaining elements of the matrix you can select the SKY scheme, whereas the length of one vector term may not exceed l.

Examples

Here follow two examples for the storage scheme in the nonsymmetrical and the symmetrical case.

Example (nonsymmetric problem on a monoprocessor)

First we look to a nonsymmetric problem. MAT is a 8 X 8 matrix (=>l=8):

       | 4.  0   2.  0   0   0   5.  0. |
       | 0   4. -1.  3.  0   0   0   0. |
       | 0   0   4. -1.  0  -2.  0   0. |
MAT =  | 3.  0   0   4. -1.  0   0   0. |
       | 0   0   0   0   4. -1.  3.  0. |
       | 0   0   0   0   0   4.  0   0. |
       | 0   0   0   0   0   0   4.  0. |
       | 0   0   1.  1.  1.  1.  1.  4. |

The matrix MAT is subdivided into five vector terms

MAT = MAIN + 1 x FULLDIAG + 1 x IXCOL + 1 x SKY + 1 x FULLROW
    = A(1) +  A(2)  +  A(4) +  A(6) +  A(7)              .

We set ia1=nvt=5 and ia2=7. We get

MAIN:

info(1,1)=typ=10
info(1,2)=adda=0
info(1,3)=l=8


         | 4.                            | 
         |     4.                        | 
         |         4.                    |  
A(1) =   |             4.                |   
         |                 4.            |
         |                     4.        |
         |                         4.    |
         |                             4.|

FULLDIAG :

info(2,1)=typ=20
info(2,2)=adda=8
info(2,3)=lvt=4
info(2,4)=iac=2
info(2,5)=iar=1


         |                               |   
         |        -1.                    |  
         |            -1.                |  
A(2) =   |                -1.            | 
         |                    -1.        |   
         |                               |   
         |                               |
         |                               |

IXCOL:

info(3,1)=typ=40
info(3,2)=adda=12
info(3,3)=lvt=4
info(3,5)=iar=0
info(3,6)=indc=0


         |                         5.    |   
         |             3.                |   
         |                    -2.        |   
A(4) =   | 3.                            |   
         |                               |   
         |                               |   
         |                               |
         |                               |

SKY :

info(4,1)=typ=60
info(4,2)=adda=16
info(4,3)=lvt=2
info(4,6)=indc=4
info(4,7)=indr=6


         |         2.                    |  
         |                               |   
         |                               |   
A(6) =   |                               |   
         |                         3.    |   
         |                               |   
         |                               |
         |                               |

FULLROW:

info(5,1)=typ=70
info(5,2)=adda=18
info(5,3)=lvt=5
info(5,4)=iac=2
info(5,5)=iar=7


         |                               |   
         |                               |   
         |                               |  
A(7) =   |                               |   
         |                               |   
         |                               |   
         |                               |
         |         1.  1.  1.  1.  1.    |     
                   .

The matrix elements are saved in MAT (lmat=23):

        [------A(1)-----------][----A(2)------][----A(4)---][-A(6)][-----A(7)-----]
MAT =  (4.,4.,4.,4.,4.,4.,4.,4.,-1.,-1.,-1.,-1.,5.,3.,-2.,3.,2.,3.,1.,1.,1.,1.,1.)
        ^                      ^                ^            ^     ^
adda(1)=0              adda(2)=8        adda(3)=12    adda(4)=16 adda(5)=18

The index vectors are stored in the array index (lindex=8):

         [--A(4)-][--A(6)--]
index =  (7,4,6,1,3,7, 1,5)
          ^       ^    ^  
info(3,6)=0 info(4,4)=4 info(4,6)=6

Example (symmetric problem on a monoprocessor)

Now we look to a symmetric problem. MAT is a 8 X 8 matrix (=>l=8):


         | 4.  0   2.  3.  0   0   5.  0. |
         | 0   4. -1.  3.  0   0   0   0. |
         | 2  -1.  4. -1.  0  -2.  0   1. |
MAT =    | 3.  3. -1.  4. -1.  0   0   1. |
         | 0   0   0  -1.  4. -1.  3.  1. | 
         | 0   0  -2.  0  -1.  4.  0   1. |
         | 5.  0   0   0   3.  0   4.  1. |
         | 0   0   1.  1.  1.  1.  1.  4. |

MAT = A + D + A(T), where D is the main diagonal and A(T) is the transposed matrix of A. To store this symmetrical matrix, we can use the storage scheme of example 4.1. All we have to do is to set lsym =.true.

References

R. Weiss, H. Haefner, W. Schoenauer: LINSOL (LINear SOLver) - Description and User's Guide for the Parallelized Version (DRAFT Version 0.96); University of Karlsruhe; Computing Center; Internal Report 61/95; 1995.

Authors

Program by L. Grosz, H. Haefner, C. Roll, P. Sternecker, R. Weiss 1989-96. Please contact L.Grosz.

By L. Grosz, Auckland, 4 June 2000.