Array representation in ASR

Currently an array is represented as a Variable symbol whose type has the dimension* dims argument set. Each dimension is defined as:

dimension = (expr? start, expr? end)

What this means is not well documented yet, but I think it is similar to the AST encoding:

-- Encoding of an array dimension declaration:
--           start      end     end_star
-- Declaration:
-- X(n)       1          n       Expr   # Note: X(n) is equivalent to X(1:n)
-- X(:)       ()         ()      Expr
-- X(a:)      a          ()      Expr
-- X(:b)      ()         b       Expr
-- X(a:b)     a          b       Expr
-- X(*)       ()         ()      Star
-- X(a:*)     a          ()      Star
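
As a rough illustration of the table above, the following Python sketch encodes a single dimension spec into the (start, end, end_star) triple. The names Dim and encode are hypothetical and are not part of LFortran or LPython; None stands in for the empty () entries, and expressions are kept as plain strings rather than real AST nodes.

```python
from typing import NamedTuple, Optional


class Dim(NamedTuple):
    start: Optional[str]  # None plays the role of "()" in the table
    end: Optional[str]
    end_star: str         # "Expr" or "Star"


def encode(spec: str) -> Dim:
    """Encode one dimension spec: "n", ":", "a:", ":b", "a:b", "*" or "a:*"."""
    spec = spec.strip()
    if spec == "*":
        # X(*): the table leaves both start and end empty
        return Dim(None, None, "Star")
    if ":" in spec:
        lo, hi = (s.strip() for s in spec.split(":", 1))
    else:
        # X(n) is equivalent to X(1:n)
        lo, hi = "1", spec
    if hi == "*":
        return Dim(lo or None, None, "Star")
    return Dim(lo or None, hi or None, "Expr")


if __name__ == "__main__":
    for s in ["n", ":", "a:", ":b", "a:b", "*", "a:*"]:
        print(f"X({s})", encode(s))
```

Each row of the table corresponds to one call, for example encode("a:") for X(a:).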

Taking https://github.com/lcompilers/lpython/wiki/Array-Types-Design into account, I think an array should always be represented by:

dimension = (expr start, expr end)

Where both the start and end expressions are always present. In the : case, each expression would refer to a local variable declared as integer, dim :: n.

I think it should actually be simplified further. The types would be changed from:

    = Integer(int kind, dimension* dims)

to

    = Integer(int kind, expr* shape, expr* lbound)

or

    = Integer(int kind, dimension* dims)

dimension = (expr extent, expr lbound)

Where the shape contains the size (extent) of each dimension, which currently has to be computed as shape = end - start + 1. It is actually the shape that is the invariant:

  • It does not change when an array such as x(3:5) is passed to a subroutine whose argument is declared as x(:), x(0:) or x(3:n), whereas start and end generally do change.
  • It is independent of the lower bound, whether zero-based (NumPy), the default one-based (Fortran), or arbitrary (Fortran).
  • In this sense it is the fundamental property of an array, common to all frontend languages.
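
The invariance described in the bullets above can be checked with a small Python sketch. StartEnd and rebind are hypothetical names used only for illustration; bounds are plain integers here, whereas in ASR they would be expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartEnd:
    """One dimension in the current (start, end) form."""
    start: int
    end: int

    def extent(self) -> int:
        # size of the dimension: end - start + 1
        return self.end - self.start + 1


def rebind(actual: StartEnd, lbound: int) -> StartEnd:
    """Bounds seen by a callee that declares a new lower bound.

    The extent is carried over unchanged; only start and end move.
    """
    n = actual.extent()
    return StartEnd(lbound, lbound + n - 1)


if __name__ == "__main__":
    x = StartEnd(3, 5)          # x(3:5) in the caller
    views = {
        "x(:)": rebind(x, 1),   # default lower bound 1
        "x(0:)": rebind(x, 0),  # zero-based view
        "x(3:n)": rebind(x, 3), # same lower bound as the caller
    }
    for decl, dim in views.items():
        print(decl, dim, "extent =", dim.extent())
```

In all three cases the extent stays 3, while start and end depend on the lower bound the callee chose.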

The lbound (equal to the current start) can change when an array is passed into a subroutine, but it is irrelevant to many operations, such as runtime bounds checking that mainly compares array sizes; the lbound is of course still needed to check individual indices inside the function. In Fortran the lbound always has to be specified manually in each subroutine if it differs from 1. The exception is apparently allocatable, intent(in) arrays, where the lbound is passed in implicitly.
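
To make the distinction concrete, here is a minimal Python sketch of the two kinds of checks, assuming a (shape, lbound) representation. All function names are hypothetical and the column-major layout is an assumption taken from Fortran, not from the ASR definition.

```python
from typing import Sequence


def shapes_conform(a: Sequence[int], b: Sequence[int]) -> bool:
    """Size check between two arrays: only the shapes matter, not the lbounds."""
    return tuple(a) == tuple(b)


def index_in_bounds(index: Sequence[int], shape: Sequence[int],
                    lbound: Sequence[int]) -> bool:
    """Element index check: this is where the lbound is needed."""
    return all(lb <= i < lb + n for i, n, lb in zip(index, shape, lbound))


def linear_offset(index: Sequence[int], shape: Sequence[int],
                  lbound: Sequence[int]) -> int:
    """Zero-based column-major offset of an element (assumed layout)."""
    if not index_in_bounds(index, shape, lbound):
        raise IndexError(f"index {tuple(index)} out of bounds")
    offset, stride = 0, 1
    for i, n, lb in zip(index, shape, lbound):
        offset += (i - lb) * stride
        stride *= n
    return offset


if __name__ == "__main__":
    shape = (3, 4)
    # The same element addressed through a one-based and a zero-based view
    print(linear_offset((2, 3), shape, (1, 1)))
    print(linear_offset((1, 2), shape, (0, 0)))
```

The shape comparison never looks at lbound, while the per-element check and the offset computation subtract it first.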

If we later decide that we do not want to represent lbound in ASR at all, that change would also be easier to make from this representation.

See also:

Edited by Ondřej Čertík