Interleaving and dataflow analysis.

Kurt Stirewalt (kurt@cc.gatech.edu)
1 Nov 1994 11:28:09 -0500

This is a response to Linda's list of different interleaving types.
Some of the cases seem to open up well to dataflow analysis.

>TEXTUAL
>
>- same var name used for 2 different purposes
> [should be able to handle it by dflow analysis, renaming]

Agreed.

>- no sharing, just interleaved in text
> -- scaling/unscaling wrapped around core computation (NPEDLN)
> More generally, delocalized relationships btwn
> computational fragments or pieces of data:
> - "Wrapped" computations are special cases of this.
> - data coupling is another form (e.g., ellipsoid is usually
> represented
> by 3 separate DP integers A,B,C, which specify the lengths of its
> semi-axes). With good coding style, these components are acted upon
> consistently, all at the same time, in localized computations. So
> in the NAIF library, for instance, they might not be involved in
> really interesting cases of interleaving.

One way to think of the wrapping/unwrapping is to think of the composition
of three functions:
f_wrap
f_compute
f_unwrap
such that
f_wrap o f_compute o f_unwrap
(where "o" is the functional composition operator)
represents functionally what is going on in the code. The trick to detecting
these then is to perform functional extraction on the code (that is, detect
(or maybe impose?) functional boundaries on the sub-computations that make up
the code). How to do this is an open question, but I had been thinking
seriously about doing exactly this sort of functional extraction based on
the languages denotational semantics (to support the multiple output
domain detection). I *think* that if we can phrase this problem semantically,
then there is probably an efficient dataflow solution to its detection.

> -- error handling involving inputs.

A quick and dirty way to detect such a situation would be:
1) Do control dependence on a subroutine.
2) Identify all return points (return statements), call this set R.
3) Foreach r in R, determine the control conditions which caused
it to execute, call this set C.
4) Foreach c in C, check to see if the input parameters are a part of
the conditional test. If so, then mark the r that corresponds
with c as an error checking return.

> -- multiple entry points to a routine (e.g., dafana.f)

Not sure what to do with this one exactly, but since we would be using
heuristics to decide on points to impose a functional boundary, perhaps
multiple entry points would allow us to prune the search space of possible
functional encapsulations?

Obviously, using Refine/Fortran, we can find all such multiple entry points
simply by searching for an ENTRY statement AST.

>IMPLEMENTATION LEVEL (algorithmic, data structure)

>- loop fusion

Dataflow analysis should be able to detect the "simple" cases of loop
fusion (those in which 2 loops with no sharing but similar access patterns
have been combined for efficiency reasons).

For the cases involving sharing, we need a way to think about the problem,
and I believe that a lot of the concepts that we used in selecting loops
for parallelization/vectorization may also be useful here. Maybe we could
talk about this during the meeting Wednesday>

>- error handling (NPEDLN: error checks on SCLA**2 an intermediate result)

It seems that in this case too we could use the control dependence info
to frame our question. Since SCLA is not an input parameter, we will need
some other criteria for judging that this return is an error return as
opposed to a legal one. In this case, we can note that the return is based on
the value test of a variable whose use is required in the computation of
an output parameter? Alternatively, if dist and/or pnear had been
identified as OUT parameters instead of IN-OUT, then we could note that
this return occured before an assignment to an OUT parameter which should
signify an error.

>- multiple outputs w/ different roles
> NPEDLN: 1) computation of nearest point on ellipsoid to a line and
> 2) shortest distance btwn the line and the ellipsoid.

Hopefully we can address this question using the same framework that
I'm planning to use to determine if multiple outputs are actually an
encoding of a single output. Don't know yet though.

>- multiple special cases, with sharing
> -- special idiosyncratic case that needs special handling (not covered by
> general case or would complicate general case if not handled separately):
> e.g., SURFPT: finds intersection of ellipsoid and line-of-sight vector.
> sets up problem as one of solving a quadratic equation and then does a
> case analysis on conditions involving the equation's coefficients.
> Each of these cases means something different in terms of the position
> of the observer relative to the ellipsoid and how to compute the
> intersection.
> e.g., DIAGS2: Diagonalize a symmetric 2x2 matrix (SYMMAT).
> special case: check for the case of a diagonal input matrix early on:
> C We check for the case of a diagonal input matrix, since
> C eigenvector determination is simplified by ruling out this
> C case.

Not sure about this one.

> -- multiple special cases result in single output that can be
> interpreted differently, depending on which case occurred
> (e.g., INELPL -- NXPTS XPT1 XPT2 -- these are interpreted
> differently, depending on flag NXPTS -- a data tag saying how to
> interpret the outputs; like a control flag, saying whether the
> other outputs are defined (similar to FOUND), but it's not just
> a control flag: if it is -1, it is saying the ellipse lies on
> the plane and the answer is infinity.)

Sounds like the problem of determining that multiple return values are
encoding a single value whose type is a partially ordered set. The only
difference in this and the case in which multiple outputs are encoding
a value or an undefined element is that the former is not a lattice because
many of the elements of the poset are incomparable. But we still logically
think of the return value as "the intersection" which is one type no matter
how many constituent points are involved.

>- control coupling
> INEDPL: 1) intersection of ellipse and plane
> 2) control flag FOUND saying whether intersection exists.
> (control coupling -- also outputs like FAILED, ERROR)

Is this the same as the multiple special cases result in single output...
case above?

>- data space sharing
> -- same output value used for two different purposes (clock and
> accumulator) (length and existence-flag)
> -- same dstruc slot used for 2 different purposes

So in this case, a value is computed once but interpreted differently
depending upon context? In the case of the same output value being used
for twi different purposes, it may be that one purpose is an abstraction
of another (sounds like the existence-flag is an abstraction of the length
under the interpretation that the thing does not exist when the length is
0 and does exist otherwise). Is that accurate? I'm not sure how we could
determine that one use was an abstraction of another, but it sounds worth
investigating.

>- multiple entry pts to routine
> -- these act as subroutines w/ access to global state that NAIF builders
> want treated as private. *Global interactions involving global state.*

>[Some interleavings at the implementation level are specializations at
>the design level. For example, special cases interleaved in case stmt may
>share control condition checking and some preliminary computations (as
>in INELPL). At implem. level this can be seen as interleaving several
>special computations for the various cases (ellipse lies on plane,
>does not intersect plane, intersects at 1 pt, intersects at 2), while
>at design level, might be seen as separate encapsulations of
>computations that are done, depending on which case the data falls into.]

-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Kurt Stirewalt (kurt@cc.gatech.edu | kswalt@mitre.org)                      %
% Georgia Tech Software Engineering Research Center. Atlanta, Georgia.        %
%                      "Not tonight honey, it's darts night"                  %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%