Data Dependence Analysis

Data dependence analysis is a memory disambiguation technique through which a compiler tries to establish dependence relations between scalar variables or between array accesses in a program. The existence of dependences or of independence, provides essential information for further analysis and legal transformations such as loop parallelization, loop tiling, loop interchange and so on.

Cetus implements an array data dependence testing framework that includes loop-based affine array-subscript testing. The focus of this implementation is towards automatic loop parallelization. This framework acts as a wrapper around the conventional data-dependence test; we currently use the Banerjee-Wolfe inequalities (Optimizing Supercompilers for Supercomputers, Michael Wolfe) as our default dependence test. The wrap-around framework provides Cetus with the flexibility of including other dependence tests, thereby improving the scope of our implementation; (we provide the Range test since v1.1). A whole program data-dependence graph provides the necessary output information and the interface, to enable subsequent analyses and transformations.

Initially, the algorithm traverses all loops in the program IR and each loop and its inner nest are tested for eligibility. The eligibility check allows us to broaden/narrow the scope of the implementation. Currently, the algorithm includes the following checks:

  • Canonical loops (Fortran DO loop format): This implements a canonical check for the loop initial assignment expression, the loop condition and the loop index increment statement
  • Function calls are supported with simple analysis. Known function calls without side-effects can be added to a list of parallelizable calls (e.g. system calls). We also support simple interprocedural side-effect analysis
  • Control flow modifiers are not allowed (break statements can be handled specially for parallelization)
  • The loop increment must be an integer constant (symbolic increments are handled using range analysis)

Loop information for all eligible loops is collected and stored internally for further dependence analysis.

SYMBOLIC ANALYSIS SUPPORT: The Data dependence framework has added substantial support for using range analysis in the case of symbolic values. Range information is used to determine loop bounds and loop increments during eligibility testing. In the case of symbolic bounds, range information is used conservatively to test the maximum possible range of values. Symbolic information is also used to simplify array subscripts before they are provided to the Banerjee-Wolfe inequalities for testing. This symbolic support thus adds great value to dependence analysis, especially to the Banerjee test.

The algorithm then tests all array accesses within an entire loop nest for independence. Arrays identified by AliasAnalysis as aliased are currently assumed to be dependent in all directions for the enclosing loops. Pairs of array accesses such that at least one of them is a write access are provided as input to the dependence test which returns a set of direction vectors corresponding to the common enclosing loop nest for the pair of accesses (if independence could not be proved). These direction vectors are then used to build a Data Dependence Graph (DDG) for the loop nest.

The data dependence graph for each loop nest becomes a part of a larger whole program based graph which is attached to the Program IR object upon completion of dependence testing. This implementation is new, and substantially different from release v1.0. For a user looking to use dependence information in an analysis or transformation pass, a reference to this DDG must be obtained from the Program object. A comprehensive API is then available via DDGraph (see Cetus javadoc) to extract dependence information related to loops. This implementation has the advantage that dependence testing is run fewer times and a common API is available to pass users. NOTE: However, it is the pass writer’s responsibility that after a transformation on the IR has been performed, dependence testing must be re-run in order to create an up-to-date version of the dependence graph which automatically replaces itself in the Program IR.