% !TEX encoding = UTF-8
% !TEX spellcheck = en_US
% !TEX root = ../paper.tex
\chapter{Main explanation?}

\section{First definition of the SDG}
\label{sec:first-def-sdg}

The system dependence graph (SDG) is a method for program slicing that was first
proposed by Horwitz, Reps and Binkley \cite{HorwitzRB88}. It builds upon the
existing control flow graph (CFG), defining dependencies between vertices of the
CFG, and building a program dependence graph (PDG), which represents them. The
SDG is then built by assembling the different
PDGs (each representing a method of the program), linking each method call to
its corresponding definition. Because each graph is built from the previous one,
new constructs can be added to the CFG without the need to alter the
algorithm that converts the CFG to a PDG and then to the SDG. The only modifications
that may be required are the redefinition of a dependency or the addition of new
kinds of dependence.

The language covered by the initial proposal was a simple one, featuring
procedures with modifiable parameters and basic instructions, including calls to
procedures, variable assignments, arithmetic and logic operators and conditional
instructions (branches and loops): the basic features of an imperative
programming language. The control flow graph was as simple as the programs
themselves, with each graph representing one procedure. The instructions of the
program are represented as vertices of the graph and are split into two
categories: statements, which have no effect on the control flow (assignments,
procedure calls) and predicates, whose execution may lead to one of multiple
---though traditionally two--- paths (conditional instructions). Statements are
connected sequentially to the next instruction. Predicates have two outgoing
edges, each connected to the first statement that should be executed, according
to the result of evaluating the conditional expression in the guard of the
predicate.

\begin{definition}[Control Flow Graph~\cite{???}]
  A \emph{control flow graph} $G$ of a program $P$ is a tuple $\langle N, E \rangle$, where $N$ is a set of nodes, composed of a method's statements and two special nodes, ``Start'' and ``End'', and $E$ is a set of edges, where each edge $e = \left(n_1, n_2\right)$ is a directed edge from $n_1$ to $n_2$.
\end{definition}

To build the PDG and then the SDG, some dependencies must be extracted from the CFG, which are defined as follows:

\begin{definition}[Postdominance]
  Vertex $b$ \textit{postdominates} vertex $a$ if and only if $a \neq b$ and $b$ is on every path from $a$ to the ``End'' vertex.
\end{definition}

\begin{definition}[Control dependency]
  \label{def:ctrl-dep}
  Vertex $b$ is \textit{control dependent} on vertex $a$ ($a \ctrldep b$) if and only if $b$ postdominates one but not all of $a$'s successors. It follows that a vertex with only one successor cannot be the source of control dependence.
\end{definition}

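The two definitions above can be turned into a short program. The following is only a sketch under stated assumptions, not the construction used in the original proposal: the class \texttt{ControlDependence}, its methods and the vertex labels are hypothetical, the CFG is the loop of procedure \texttt{f} in Figure~\ref{fig:sdg-loop} without parameter handling, and the postdominator sets are computed reflexively (each vertex belongs to its own set), so that the first statement of a branch counts as postdominating that branch.

\begin{lstlisting}
import java.util.*;

public class ControlDependence {
    // Hypothetical CFG of: while (x > y) { x = x - 1; } print(x);
    static Map<String, List<String>> exampleCfg() {
        Map<String, List<String>> cfg = new LinkedHashMap<>();
        cfg.put("Start", List.of("while"));
        cfg.put("while", List.of("x = x - 1", "print(x)"));
        cfg.put("x = x - 1", List.of("while"));
        cfg.put("print(x)", List.of("End"));
        cfg.put("End", List.of());
        return cfg;
    }

    // Iterative computation of the postdominator set of every vertex.
    // Assumption: the sets are reflexive (each vertex is in its own set).
    static Map<String, Set<String>> postdominators(
            Map<String, List<String>> cfg, String end) {
        Map<String, Set<String>> pdom = new HashMap<>();
        for (String v : cfg.keySet()) {
            pdom.put(v, new HashSet<>(cfg.keySet())); // start from "everything"
        }
        pdom.put(end, new HashSet<>(List.of(end)));
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String v : cfg.keySet()) {
                if (v.equals(end)) continue;
                // Intersect the sets of all successors, then add v itself.
                Set<String> meet = new HashSet<>(cfg.keySet());
                for (String s : cfg.get(v)) meet.retainAll(pdom.get(s));
                meet.add(v);
                if (!meet.equals(pdom.get(v))) {
                    pdom.put(v, meet);
                    changed = true;
                }
            }
        }
        return pdom;
    }

    // Vertices that postdominate at least one, but not all, successors of a.
    static Set<String> controlDependentOn(Map<String, List<String>> cfg,
            String a, Map<String, Set<String>> pdom) {
        Set<String> result = new TreeSet<>();
        List<String> succ = cfg.get(a);
        for (String b : cfg.keySet()) {
            long hits = succ.stream()
                            .filter(s -> pdom.get(s).contains(b)).count();
            if (hits > 0 && hits < succ.size()) result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cfg = exampleCfg();
        Map<String, Set<String>> pdom = postdominators(cfg, "End");
        System.out.println(controlDependentOn(cfg, "while", pdom));
        System.out.println(controlDependentOn(cfg, "print(x)", pdom));
    }
}
\end{lstlisting}

In this sketch the loop predicate is reported as the source of control dependence for the loop body and for itself, whereas \texttt{print(x)}, which has a single successor, is the source of none.
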
\begin{definition}[Data dependency]
  Vertex $b$ is \textit{data dependent} on vertex $a$ ($a \datadep b$) if and only if $a$ may define a variable $x$, $b$ may use $x$ and there is an $x$-definition-free path from $a$ to $b$.\footnote{The initial definition of data dependency was further split into in-loop data dependencies and the rest, but the difference is not relevant for computing the slices in the SDG.}
\end{definition}

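A possible reading of this definition as code is a forward traversal of the CFG that starts at the defining vertex and stops at every redefinition of the variable. The sketch below makes several assumptions: the class \texttt{DataDependence}, the \texttt{Vertex} type and the vertex labels are hypothetical, and the sets of defined and used variables are given by hand instead of being extracted from the program.

\begin{lstlisting}
import java.util.*;

public class DataDependence {
    // A CFG vertex annotated with the variables it defines and uses.
    static final class Vertex {
        final String label;
        final Set<String> defs, uses;
        final List<Vertex> succ = new ArrayList<>();
        Vertex(String label, Set<String> defs, Set<String> uses) {
            this.label = label; this.defs = defs; this.uses = uses;
        }
    }

    // Labels of the vertices that are data dependent on a through x:
    // those that use x and are reachable from a along a path on which
    // no intermediate vertex redefines x.
    static Set<String> dependents(Vertex a, String x) {
        Set<String> result = new TreeSet<>();
        if (!a.defs.contains(x)) return result;
        Set<Vertex> seen = new HashSet<>();
        Deque<Vertex> work = new ArrayDeque<>(a.succ);
        while (!work.isEmpty()) {
            Vertex b = work.pop();
            if (!seen.add(b)) continue;
            if (b.uses.contains(x)) result.add(b.label);
            // A redefinition of x ends every path that goes through b.
            if (!b.defs.contains(x)) work.addAll(b.succ);
        }
        return result;
    }

    // Hypothetical CFG of: x = x_in; while (x > y) { x = x - 1; } print(x);
    static Vertex[] example() {
        Vertex start = new Vertex("x = x_in", Set.of("x"), Set.of("x_in"));
        Vertex loop = new Vertex("while (x > y)", Set.of(), Set.of("x", "y"));
        Vertex dec = new Vertex("x = x - 1", Set.of("x"), Set.of("x"));
        Vertex print = new Vertex("print(x)", Set.of(), Set.of("x"));
        start.succ.add(loop);
        loop.succ.add(dec);
        loop.succ.add(print);
        dec.succ.add(loop);
        return new Vertex[] { start, loop, dec, print };
    }

    public static void main(String[] args) {
        Vertex[] v = example();
        System.out.println(dependents(v[0], "x")); // definition at the start
        System.out.println(dependents(v[2], "x")); // definition inside the loop
    }
}
\end{lstlisting}

Both definitions of \texttt{x} reach the loop condition, the decrement and the final \texttt{print(x)}; the decrement therefore also depends on itself, which is one of the self-edges that the figures may omit.
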
It should be noted that variable definitions and uses can be computed for each
statement independently, analyzing the procedures called by it if necessary. In
general, any instruction uses all variables that appear in it, save for the
left-hand side of assignments. Similarly, no instruction defines variables,
except those in the left-hand side of assignments. The variables used and
defined by a procedure call are those used and defined by its body.

With the data and control dependencies, the PDG may be built by replacing the
edges of the CFG with data and control dependence edges. The former tend to be
represented as thin solid lines, and the latter as thick solid lines. In the
examples, data dependencies will be thin solid red lines.

The organization of the vertices of the PDG tends to resemble a tree graph, with
the ``Start'' node in the position of the root (at the top), and the ``End''
node typically omitted. The control dependence edges structure the tree
vertically. In the case that a vertex is control dependent on multiple vertices,
it will be placed one level below the lowest source of control dependency. With
a programming language this simple, cyclical control dependencies do not appear,
but should they do so in further sections, the instructions are sorted top to
bottom in the order they appear in the program. Horizontally, the vertices are
sorted by their order in the program, left to right, in order to make the graph
more readable. Data dependency edges are placed without reordering the nodes of
the graph. In the examples given, edges like $a \datadep a$ or $b \ctrldep b$
may be omitted, as they are not relevant for later use of the graph. Note
that the location of the vertices is irrelevant for the slicing algorithm,
and the aforementioned sorting rules are just for consistency with previous
papers on the topic and to ease the visualization of programs.

Finally, the SDG is built from the combination of all the PDGs that compose the
program. Each call vertex is connected to the ``Start'' of the corresponding
procedure. All edges that connect PDGs are represented with dashed lines.

\begin{figure}
  \begin{minipage}{0.3\linewidth}
    \begin{lstlisting}
proc main() {
    a = 10;
    b = 20;
    f(a, b);
    print(a);
}

proc f(x, y) {
    while (x > y) {
        x = x - 1;
    }
    print(x);
}
    \end{lstlisting}
  \end{minipage}
  \begin{minipage}{0.6\linewidth}
    \includegraphics[width=0.3\linewidth]{img/cfgsimple}
    \includegraphics[width=0.65\linewidth]{img/cfgsimple2}
  \end{minipage}
  \includegraphics[width=0.5\linewidth]{img/pdgsimple}
  \includegraphics[width=0.49\linewidth]{img/pdgsimple2}
  \includegraphics[width=0.6\linewidth]{img/sdgsimple}
  \includegraphics[width=0.4\linewidth]{img/legendsimple}
  \caption{A simple program with its CFGs (top right), PDGs (center) and SDG (bottom).}
  \label{fig:sdg-loop}
\end{figure}

\subsubsection{Procedures and data dependencies}

The only thing left to explain before introducing more constructs into the
language is the passing of parameters. Most programming languages accept a
variable number of input parameters and one output parameter. In the case of
input parameters passed by reference, or constructs such as structs or classes,
modifying a field of a parameter may modify the original variable. In order to
deal with everything related to parameter passing, including global variables,
class fields, etc., a small extension must be made to the CFG and PDG.

In the CFG, the ``Start'' and ``End'' nodes contain a list of assignments,
inputting and outputting respectively the appropriate values, as can be seen in
the example. Consequently, every vertex that contains a procedure or function
call must pack and unpack the arguments. For every variable $x$ that is used in a
procedure, every call to it must be preceded by $x_{in} = x$, and the
procedure's ``Start'' vertex must contain $x = x_{in}$. The opposite happens
when a variable must be ``outputted''\carlos{replace}: before the ``End'' node,
the value must be packed ($x_{out} = x$), and after each call, the value must be
assigned to the corresponding variable ($x = x_{out}$). Parameters may be
assigned as $par^i_{in} = expr_i$ (where $i$ is the index of the parameter in
the procedure definition, $par^i$ is the name of the parameter and $expr_i$ is
the expression in the $i^{th}$ position in the procedure call) in the call
vertex, and parameters whose modifications inside the procedure are passed back
to the calling procedure must be extracted as $var = par^i_{out}$ (where $var$
is the name of the variable ---passed by reference--- in the calling
procedure).\carlos{What if object/struct passed by value?} In addition, in
the SDG, an extra kind of edge is added (the summary edge), which represents the
dependencies of the outputs of a call on its inputs. This allows the
algorithm to know those dependencies without traversing the corresponding
procedure.

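As an illustration of the packing and unpacking just described, the following sketch generates the assignments that surround the call \texttt{f(a, b)} of Figure~\ref{fig:sdg-loop}. The class \texttt{ParameterPassing} and its two methods are hypothetical, and the set of parameters modified by the callee is assumed to be known beforehand rather than computed from the body of the procedure.

\begin{lstlisting}
import java.util.*;

public class ParameterPassing {
    // Assignments placed before the call: par_in = expr, one per parameter.
    static List<String> packArguments(List<String> params, List<String> args) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < params.size(); i++) {
            result.add(params.get(i) + "_in = " + args.get(i));
        }
        return result;
    }

    // Assignments placed after the call: var = par_out, one per parameter
    // that the callee modifies. This sketch assumes that the argument in
    // that position is a plain variable.
    static List<String> unpackResults(List<String> params, List<String> args,
                                      Set<String> modified) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < params.size(); i++) {
            if (modified.contains(params.get(i))) {
                result.add(args.get(i) + " = " + params.get(i) + "_out");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> params = List.of("x", "y");   // proc f(x, y)
        List<String> actuals = List.of("a", "b");  // f(a, b)
        System.out.println(packArguments(params, actuals));
        System.out.println(unpackResults(params, actuals, Set.of("x")));
    }
}
\end{lstlisting}

For this call the sketch produces $x_{in} = a$ and $y_{in} = b$ before the call, and $a = x_{out}$ after it, since only $x$ is modified inside \texttt{f}.
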
All these additions are added as extra lines in the ``Start'', ``End'' and
calling vertices. When building the PDG, all additions (variable assignments)
are split into their own vertices, and are control dependent on the vertex they
were split from. Data
dependencies no longer flow through the call vertex, but through the appropriate
child, which minimizes the size of the slice produced. As an example,
Figure~\ref{fig:sdg-loop} shows the three stages of a program, from CFG to SDG.
The construction of the CFG is straightforward, save for the packing and
unpacking of variables in the start, end and call vertices. In the PDG, the
statements are split, and control and data dependencies replace the control flow
edges. Finally, both PDGs are linked via call and parameter (input and output)
edges, forming the SDG. Summary edges are placed according to the data and
control flow of the method call, and the graph is complete.

\section{Unconditional control flow}

Even though the initial definition of the SDG was useful to compute slices, the
language covered was not enough for the typical language of the 1980s, which
included (in one form or another) unconditional control flow. Therefore, one of
the first additions contributed to the algorithm to build system dependence
graphs was the inclusion of unconditional jumps, such as ``break'',
``continue'', ``goto'' and ``return'' statements (or any other equivalent). A
naive representation would be to treat them the same as any other statement, but
with the outgoing edge landing in the corresponding instruction (outside the
loop, at the loop condition, at the method's end, etc.). An alternative
approach is to represent the instruction as an edge, not a vertex, connecting
the previous statement with the next to be executed. Both of these approaches
fail to generate a control dependence from the unconditional jump, as the
definition of control dependence (see Definition~\ref{def:ctrl-dep}) requires a
vertex to have more than one successor for it to be a possible source of
control dependence. From here, two approaches stem: the first would be to
redefine control dependency, in order to reflect the real effect of these
instructions ---as some authors~\cite{DanBHHKL11} have tried to do--- and the
second would be to alter the creation of the SDG to ``create'' those
dependencies, which is the most widely used solution.

The most popular approach was proposed by Ball and Horwitz~\cite{BalH93}, and
represents unconditional jumps as a \textsl{pseudo--predicate}. The true edge
would lead to the next instruction to be executed, and the false edge would be a
non-executable or \textit{dummy} edge, connected to the instruction that would
be executed were the unconditional jump a \textit{nop}. The consequence of this
solution is that every instruction placed after the unconditional jump is
control dependent on the jump, as can be seen in Figure~\ref{fig:break-graphs}.
In the example, when slicing with respect to variable $a$ on line 5, every
statement would be included, save for ``print(a)''. Line 4 is not strictly
necessary in this example ---in the context of weak slicing---, but is included
nonetheless. In the original paper, the transformation is proved to be
complete, but not correct, as for some examples, the slice includes more
unconditional jumps than would be strictly necessary, even for weak slicing.
Ball and Horwitz theorize that a more correct approach would be possible, if it
weren't for the limitation of slices to be a subset of statements of the
program, in the same order as in the original.

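The slice mentioned above can be reproduced with a plain backward traversal of the dependence edges. The sketch below is not the interprocedural algorithm of the SDG: it handles a single PDG, the class \texttt{BackwardSlice} and its methods are hypothetical, the ``Start'' vertex is omitted, and the edges of the program in Figure~\ref{fig:break-graphs} have been derived by hand from the definitions, treating \texttt{break} as a pseudo--predicate and letting each vertex postdominate itself.

\begin{lstlisting}
import java.util.*;

public class BackwardSlice {
    // deps.get(b) lists every vertex a with an edge a -> b (control or data).
    static Set<String> slice(Map<String, List<String>> deps, String criterion) {
        Set<String> slice = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(criterion));
        while (!work.isEmpty()) {
            String v = work.pop();
            // Visit each vertex once and continue with its dependence sources.
            if (slice.add(v)) work.addAll(deps.getOrDefault(v, List.of()));
        }
        return slice;
    }

    // Dependence edges derived by hand for the listing (an assumption of
    // this sketch); control dependence sources are listed first.
    static Map<String, List<String>> breakExample() {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("while (a > 0)",
                 List.of("if (a > 10)", "break", "int a = 1", "a++"));
        deps.put("if (a > 10)", List.of("while (a > 0)", "int a = 1", "a++"));
        deps.put("break", List.of("if (a > 10)"));
        deps.put("a++", List.of("if (a > 10)", "break", "int a = 1", "a++"));
        deps.put("println(a)", List.of("int a = 1", "a++"));
        return deps;
    }

    public static void main(String[] args) {
        System.out.println(slice(breakExample(), "a++"));
    }
}
\end{lstlisting}

Starting from \texttt{a++}, the traversal reaches every statement except the final \texttt{println(a)}; the \texttt{break} is reached only through the control dependence created by its dummy edge.
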
\begin{figure}
  \centering
  \begin{minipage}{0.3\linewidth}
    \begin{lstlisting}
static void f() {
    int a = 1;
    while (a > 0) {
        if (a > 10) break;
        a++;
    }
    System.out.println(a);
}
    \end{lstlisting}
  \end{minipage}
  \begin{minipage}{0.6\linewidth}
    \includegraphics[width=0.4\linewidth]{img/breakcfg}
    \includegraphics[width=0.59\linewidth]{img/breakpdg}
  \end{minipage}
  \caption{A program with unconditional control flow, its CFG (center) and PDG (right).}
  \label{fig:break-graphs}
\end{figure}

\section{Exceptions}

As seen in Section~\ref{sec:intro-exception}, exception handling in Java adds
two constructs: the \texttt{throw} and the \texttt{try-catch} statements. The
first one resembles an unconditional control flow statement, with a destination
that is unknown at compile time. The exception will be caught by a \texttt{catch} of
the corresponding type or a supertype ---if it exists. Otherwise, it will crash
the corresponding thread (or in single-threaded programs, stop the Java Virtual
Machine). The second stops the exceptional control flow conditionally, based on
the dynamic type of the exception thrown. Both introduce challenges that must
be solved.

\subsection{\texttt{throw} statement}

The \texttt{throw} statement represents two elements at the same time: an
unconditional jump and an erroneous exit from its method. The first one has
been extensively covered and solved, but the second requires a small addition
to the CFG: instead of having a single ``End'' node, it will be split in two
---normal and error exit---, though the ``End'' cannot be removed, as a restriction
of most slicing algorithms is that the CFG has only one sink node. Therefore all
nodes that connected to the ``End'' will now lead to ``Normal exit'', all throw
statements' true outgoing edges will connect to the ``Error exit'', and both exit
nodes will converge on the ``End'' node.

\texttt{throw} statements in Java take a single value, a subtype of \texttt{Throwable},
and that value is later used to decide where the propagation of the exception stops;
it can therefore be handled as a returned value. This treatment of \texttt{throw}
statements only modifies the structure of the CFG, without altering any other algorithm,
nor the basic definitions for control and data dependencies, making it very easy to
incorporate into any existing slicing software solution that follows the general model
described.

\begin{example}[CFG of an uncaught \texttt{throw} statement] \ \\
  \begin{minipage}{0.69\linewidth}
    \begin{lstlisting}
double f(int x) {
    if (x < 0)
        throw new RuntimeException();
    return Math.sqrt(x);
}
    \end{lstlisting}
    By analyzing the CFG, we can see that both exits are control dependent on the \texttt{throw}
    statement; data dependencies present no special case in this example.
  \end{minipage}
  \begin{minipage}{0.3\linewidth}
    \includegraphics[width=\linewidth]{img/throw-example-cfg}
  \end{minipage}
\end{example}

\subsection{\texttt{try-catch} statement}

The \texttt{try-catch-finally} statement is the only way to stop an exception once it's thrown,
filtering by type, or otherwise letting it propagate further up the call stack. On top of that,
\texttt{finally} helps guarantee consistency, executing in any case (even when an exception is
left uncaught, the program returns or an exception occurs in a \texttt{catch} block). The main
problem with this construct is that \texttt{catch} blocks are not always necessary, but their
absence may make the compilation fail ---because a \texttt{try} block has no \texttt{catch} or
\texttt{finally} block---, or modify the execution in unexpected ways that are not always accounted
for in slicing software.

The \texttt{try} block is normally represented as a pseudo--predicate, connected to the
first statement inside it and to the first instruction after the whole \texttt{try-catch-finally}
construct. Inside the \texttt{try} there can be four distinct sources of exceptions:

\begin{description}
  \item[Method calls.] If an exception is thrown inside a method and it is not caught, it will
    surface inside the \texttt{try} block. As \textit{checked} exceptions must be declared
    explicitly, method declarations may be consulted to see if a method call may or may not
    throw any exceptions. On this front, polymorphism and inheritance present no problem, as
    overriding methods may not declare \textit{checked} exceptions beyond those declared by
    the method they override. If \textit{unchecked} exceptions are also considered, all method
    calls shall be included, as any can trigger at the very least a \texttt{StackOverflowError}.
  \item[\texttt{throw} statements.] The least common, but simplest, as it is treated as a
    \texttt{throw} inside a method.
  \item[Implicit unchecked exceptions.] If \textit{unchecked} exceptions are considered, many
    common expressions may throw an exception, with the most common ones being trying to call
    a method or accessing a field of a \texttt{null} object (\texttt{NullPointerException}),
    accessing an invalid index on an array (\texttt{ArrayIndexOutOfBoundsException}), dividing
    an integer by 0 (\texttt{ArithmeticException}), trying to cast to an incompatible type
    (\texttt{ClassCastException}) and many others. On top of that, the user may create new
    types that inherit from \texttt{RuntimeException}, but those may only be explicitly thrown.
    Their inclusion in program slicing and therefore in the method's CFG generates extra
    dependencies that make the slices produced bigger.
  \item[Errors.] May be generated at any point in the execution of the program, but they normally
    signal a situation from which it may be impossible to recover, such as an internal JVM error.
    In general, most programs do not consider these to be ``catch-able''.
\end{description}

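The implicit unchecked exceptions listed above are easy to reproduce. The following sketch triggers the four that are named; the class \texttt{ImplicitExceptions} and the helper \texttt{thrownBy} are hypothetical names introduced only for this illustration.

\begin{lstlisting}
public class ImplicitExceptions {
    // Runs the action and returns the simple name of the exception it raises.
    static String thrownBy(Runnable action) {
        try {
            action.run();
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String s = null;
        int[] array = new int[1];
        int zero = 0;
        Object o = "text";
        // Method call on a null reference.
        System.out.println(thrownBy(() -> s.length()));
        // Access to an invalid array index.
        System.out.println(thrownBy(() -> array[1] = 0));
        // Integer division by zero.
        System.out.println(thrownBy(() -> System.out.println(1 / zero)));
        // Cast to an incompatible type.
        System.out.println(thrownBy(() -> { Integer i = (Integer) o; }));
    }
}
\end{lstlisting}

None of the four statements contains a \texttt{throw}, which is why a slicer that considers \textit{unchecked} exceptions has to treat such ordinary expressions as possible sources of exceptional control flow.
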
All exception sources are treated in a similar fashion: the statement that may throw an exception
is treated as a predicate, with the true edge connected to the instruction that follows when the
statement executes without raising exceptions, and the false edge connected to the \texttt{catch} node.

\carlos{CATCH Representation doesn't matter, it is similar to a switch but checking against types.
The difference exists where there exists the possibility of not catching the exception;
which is semantically possible to define. When a \texttt{catch (Throwable e)} is declared,
it is impossible for the exception to exit the method; therefore the control dependency must
be redefined.}

The filter for exceptions in Java's \texttt{catch} blocks is a type (or multiple types since
Java 7), with a class that encompasses all possible exceptions (\texttt{Throwable}), which acts
as a catch--all.
In the literature there exist two alternatives to represent \texttt{catch}: one mimics a static
switch statement, placing all the \texttt{catch} block headers at the same height, all pending
from the exception-throwing statement, and the other mimics a dynamic switch or a chain of \texttt{if}
statements. The option chosen affects how control dependencies should be computed, as the different
structures generate different control dependencies by default.

\begin{description}
  \item[Switch representation.] There exists no relation between different \texttt{catch} blocks;
    each exception--throwing statement is connected through an edge labeled false to each
    of the \texttt{catch} blocks that could be entered. Each \texttt{catch} block is a
    pseudo--statement, with its true edge connected to the end of the \texttt{try} and the
    As an example, a \texttt{1 / 0} expression may be connected to \texttt{ArithmeticException},
    \texttt{RuntimeException}, \texttt{Exception} or \texttt{Throwable}.
    If any exception may not be caught, there exists a connection to the ``Error exit'' of the method.
  \item[If-else representation.] Each exception--throwing statement is connected to the first
    \texttt{catch} block. Each \texttt{catch} block is represented as a predicate, with the true
    edge connected to the first statement inside the \texttt{catch} block, and the false edge
    to the next \texttt{catch} block, until the last one. The last one will be a pseudo--predicate
    connected to the first statement after the \texttt{try} if it is a catch--all type or to the
    ``Error exit'' if it isn't.
\end{description}

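The if-else representation mirrors how Java itself selects a \texttt{catch} block: the headers are tested in order and the first matching type wins. The sketch below contrasts both forms; the class \texttt{CatchChain} and its methods are hypothetical, and \texttt{java.io.IOException} and \texttt{RuntimeException} merely stand in for a checked and an unchecked exception type.

\begin{lstlisting}
public class CatchChain {
    // try-catch form: the first catch block whose type matches is entered.
    static String viaCatch(Throwable t) {
        try {
            throw t;
        } catch (java.io.IOException e) {
            return "checked";
        } catch (RuntimeException e) {
            return "unchecked";
        } catch (Throwable e) {
            return "catch-all";
        }
    }

    // If-else form: each catch header becomes a type test, checked in order.
    static String viaIfElse(Throwable t) {
        if (t instanceof java.io.IOException) return "checked";
        else if (t instanceof RuntimeException) return "unchecked";
        else return "catch-all";
    }

    public static void main(String[] args) {
        Throwable[] samples = {
            new java.io.IOException(), new ArithmeticException(), new Error()
        };
        for (Throwable t : samples) {
            System.out.println(viaCatch(t) + " " + viaIfElse(t));
        }
    }
}
\end{lstlisting}

Both methods agree on every sample. Without the final \texttt{catch (Throwable e)}, the last branch of the chain would instead correspond to the edge towards the ``Error exit''.
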
\begin{example}[Catches.]\ \\
  \begin{minipage}{0.49\linewidth}
    \begin{lstlisting}
try {
    f();
} catch (CheckedException e) {
} catch (UncheckedException e) {
} catch (Throwable e) {
}
    \end{lstlisting}
  \end{minipage}
  \begin{minipage}{0.49\linewidth}
    \carlos{missing figures with 4 alternatives: if-else (with catch--all and without) and switch (same two)}
    % \includegraphics[0.5\linewidth]{img/catch1}
    % \includegraphics[0.5\linewidth]{img/catch2}
    % \includegraphics[0.5\linewidth]{img/catch3}
    % \includegraphics[0.5\linewidth]{img/catch4}
  \end{minipage}
\end{example}

Regardless of the approach, when there exists a catch--all block, there is no dependency generated
from the \texttt{catch}, as all of them will lead to the next instruction. However, this means that
if no data is output from the \texttt{try} or \texttt{catch} block, the catches will not be picked
up by the slicing algorithm, which may alter the results unexpectedly. If this problem arises, the
simple and obvious solution would be to add artificial edges to force the inclusion of all \texttt{catch}
blocks. This adds instructions to the slice ---lowering its score when evaluating against benchmarks---
but they are completely innocuous, as they just stop the exception, without running any extra instruction.

Another alternative exists, though it slows down the process of creating a slice from an SDG.
The \texttt{catch} block is only strictly needed if an exception that it catches may be thrown and
an instruction after the \texttt{try-catch} block should be executed; in any other case the \texttt{catch}
block is irrelevant and should not be included. However, this change requires analyzing the inclusion
of \texttt{catch} blocks after the two--pass algorithm has completed, slowing it down. In any case, each
approach trades time for accuracy and vice versa, but the trade--off is small enough to be negligible.

Regarding \textit{unchecked} exceptions, an extra layer of analysis should be performed to tag statements
with the possible exceptions they may throw. On top of that, methods must be analyzed and tagged
accordingly. The worst case is that of methods whose source code is unavailable: since they may throw
any \texttt{RuntimeException}, they must be marked as capable of throwing it. This results in
a graph where each instruction is dependent on the proper execution of the previous statement, save
for simple statements that may not generate exceptions. The trade--off here is between completeness and
correctness, with the inclusion of \textit{unchecked} exceptions increasing both the completeness and the
slice size, reducing correctness. A possible solution would be to only consider user--generated exceptions
or assume that library methods may never throw an unchecked exception. Another option would be a new
slicing variation that annotates methods or limits the unchecked exceptions to be considered.

Regarding the \texttt{finally} block, most approaches treat it properly, representing it twice: once
for the case where there is no active exception and once for the case where it executes with
an exception active. An exception could also be thrown here, but that would be represented normally.

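The two executions of a \texttt{finally} block that justify this duplication can be observed directly. In the following sketch the class \texttt{FinallyPaths} and the method \texttt{run} are hypothetical, and the outer \texttt{try-catch} only exists to observe that the exception keeps propagating once the \texttt{finally} block has finished.

\begin{lstlisting}
import java.util.*;

public class FinallyPaths {
    // Records which blocks execute; the flag selects the exceptional path.
    static List<String> run(boolean fail) {
        List<String> trace = new ArrayList<>();
        try {
            try {
                trace.add("try");
                if (fail) throw new IllegalStateException();
            } finally {
                trace.add("finally");    // reached on both paths
            }
            trace.add("after");          // only without an active exception
        } catch (IllegalStateException e) {
            trace.add("propagated");     // the exception continues after finally
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
\end{lstlisting}

The same \texttt{finally} block is followed by the next statement in the first run and by the propagation of the exception in the second, so a single vertex with one fixed successor cannot represent both.
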
% vim: set noexpandtab:ts=2:sw=2:wrap