% !TEX encoding = UTF-8
% !TEX spellcheck = en_GB
% !TEX root = ../paper.tex
\chapter{Main explanation?}
\label{cha:incremental}

\carlos{Review if we want to call nodes ``Enter'' and ``Exit'' or ``Start'' and ``End'' (I'd prefer the first one).}
\sergio{Enter or Entry?}

\section{First definition of the SDG}
\label{sec:first-def-sdg}
The system dependence graph (SDG) is a program representation used for program
slicing that was first proposed by Horwitz, Reps and Binkley \cite{HorwitzRB88}.
It builds upon the existing control flow graph (CFG), defining dependencies
between vertices of the CFG, and building a program dependence graph (PDG),
which represents them. The SDG is then assembled from the different PDGs (each
representing a method of the program), linking each method call to its
corresponding definition. Because each graph is built from the previous one,
new constructs can be added to the CFG without altering the algorithm that
converts the CFG into the PDG and then into the SDG. The only modifications that
may be required are the redefinition of a dependency or the addition of new
kinds of dependence.

The language covered by the initial proposal was a simple one, featuring
procedures with modifiable parameters and basic instructions, including calls to
procedures, variable assignments, arithmetic and logic operators and conditional
instructions (branches and loops): the basic features of an imperative
programming language. The control flow graph was as simple as the programs
themselves, with each graph representing one procedure. The instructions of the
program are represented as vertices of the graph and are split into two
categories: statements, which have no effect on the control flow (assignments,
procedure calls) and predicates, whose execution may lead to one of multiple
(though traditionally two) paths (conditional instructions). Statements are
connected sequentially to the next instruction. Predicates have two outgoing
edges, each connected to the first statement that should be executed, according
to the result of evaluating the conditional expression in the guard of the
predicate.
\begin{definition}[Control Flow Graph \carlos{add original citation}]
\label{def:cfg}
A \emph{control flow graph} $G$ of a program $P$ is a directed graph, represented as a tuple $\langle N, E \rangle$, where $N$ is a set of nodes, composed of a method's statements plus two special nodes, ``Start'' and ``End''; and $E$ is a set of edges of the form $e = \left(n_1, n_2\right)$ with $n_1, n_2 \in N$. Most algorithms to generate the SDG mandate the ``Start'' node to be the only source and ``End'' to be the only sink in the graph. \carlos{Is it necessary to define source and sink in the context of a graph?}

Edges are created according to the possible execution paths that exist; each statement is connected to any statement that may immediately follow it. Formally, an edge $e = (n_1, n_2)$ exists if and only if $n_2$ may be executed immediately after $n_1$, assuming that every conditional expression can evaluate to both true and false. In other words, expressions are not evaluated, so an \texttt{if} instruction has two outgoing edges even if the condition is always true or always false, e.g. \texttt{1 == 0}.
\end{definition}
To build the PDG and then the SDG, there are two dependencies based directly on the CFG's structure: data and control dependence.

\begin{definition}[Postdominance \carlos{add original citation?}]
\label{def:postdominance}
Vertex $b$ \textit{postdominates} vertex $a$ if and only if $b$ is on every path from $a$ to the ``End'' vertex.
\end{definition}
\begin{definition}[Control dependency \carlos{add original citation}]
\label{def:ctrl-dep}
Vertex $b$ is \textit{control dependent} on vertex $a$ ($a \ctrldep b$) if and only if $b$ postdominates one but not all of $a$'s successors. It follows that a vertex with only one successor cannot be the source of control dependence.
\end{definition}
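Definitions~\ref{def:postdominance} and~\ref{def:ctrl-dep} can be turned into a small program. The following Java listing is only a sketch under stated assumptions, not the algorithm of the original proposal: the class \texttt{ControlDependence}, its methods and the string labels are hypothetical, the CFG is given as a successor map that includes the ``End'' node, and every node is assumed to reach ``End''. Postdominators are computed with a plain fixed--point iteration, and the definition of control dependence is then applied literally.

\begin{lstlisting}
import java.util.*;

// Illustrative helper (hypothetical names): control dependence computed
// from a CFG given as a successor map that includes the "End" node.
final class ControlDependence {

  // pdom.get(n) = nodes that lie on every path from n to end (n included).
  static Map<String, Set<String>> postdominators(
      Map<String, List<String>> succ, String end) {
    Map<String, Set<String>> pdom = new HashMap<>();
    for (String n : succ.keySet()) {
      pdom.put(n, n.equals(end)
          ? new TreeSet<String>(Collections.singleton(end))
          : new TreeSet<String>(succ.keySet()));
    }
    boolean changed = true;
    while (changed) { // shrink the sets until nothing changes
      changed = false;
      for (String n : succ.keySet()) {
        if (n.equals(end)) continue;
        Set<String> next = new TreeSet<String>(succ.keySet());
        for (String s : succ.get(n)) next.retainAll(pdom.get(s));
        next.add(n);
        if (!next.equals(pdom.get(n))) {
          pdom.put(n, next);
          changed = true;
        }
      }
    }
    return pdom;
  }

  // deps.get(a) = nodes that postdominate one but not all successors of a.
  static Map<String, Set<String>> controlDependence(
      Map<String, List<String>> succ, String end) {
    Map<String, Set<String>> pdom = postdominators(succ, end);
    Map<String, Set<String>> deps = new TreeMap<>();
    for (String a : succ.keySet()) {
      for (String b : succ.keySet()) {
        int hits = 0;
        for (String s : succ.get(a)) {
          if (pdom.get(s).contains(b)) hits++;
        }
        if (hits > 0 && hits < succ.get(a).size()) {
          deps.computeIfAbsent(a, k -> new TreeSet<String>()).add(b);
        }
      }
    }
    return deps;
  }

  // Assumed sample CFG: a loop that decrements x, followed by a print.
  static Map<String, List<String>> exampleCfg() {
    Map<String, List<String>> succ = new HashMap<>();
    succ.put("Start", Arrays.asList("while (x > y)"));
    succ.put("while (x > y)", Arrays.asList("x = x - 1", "print(x)"));
    succ.put("x = x - 1", Arrays.asList("while (x > y)"));
    succ.put("print(x)", Arrays.asList("End"));
    succ.put("End", Collections.<String>emptyList());
    return succ;
  }
}
\end{lstlisting}

For the sample CFG, calling \texttt{controlDependence(exampleCfg(), "End")} reports the loop predicate as the only source of control dependence, with the loop body and the predicate itself depending on it; every other node has a single successor and therefore cannot be a source.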
\begin{definition}[Data dependency \carlos{add original citation}]
\label{def:data-dep}
Vertex $b$ is \textit{data dependent} on vertex $a$ ($a \datadep b$) if and only if $a$ may define a variable $x$, $b$ may use $x$ and there exists an $x$-definition free path from $a$ to $b$.

Data dependency was originally defined as flow dependency, and split into loop--carried and loop--independent dependencies, but that distinction is no longer useful to compute program slices.
It should be noted that variable definitions and uses can be computed for each statement independently, analyzing the procedures called by it if necessary. The variables used and defined by a procedure call are those used and defined by its body.
\end{definition}
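Definition~\ref{def:data-dep} can also be applied mechanically. The Java listing below is a minimal sketch, assuming that the CFG is a successor map and that the variables defined and used by each node are already known; the class \texttt{DataDependence}, its methods and the sample graph are hypothetical. For each definition of a variable $x$ it walks the CFG forwards, records every node that uses $x$ and stops at nodes that redefine it.

\begin{lstlisting}
import java.util.*;

// Illustrative helper (hypothetical names): data dependence over a CFG.
final class DataDependence {

  // Returns edges "a -> b" such that a defines some x, b uses x and there
  // is a path from a to b on which no intermediate node redefines x.
  static Set<String> dataDependence(Map<String, List<String>> succ,
      Map<String, Set<String>> defs, Map<String, Set<String>> uses) {
    Set<String> none = Collections.<String>emptySet();
    Set<String> edges = new TreeSet<>();
    for (String a : succ.keySet()) {
      for (String x : defs.getOrDefault(a, none)) {
        Set<String> visited = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(succ.get(a));
        while (!work.isEmpty()) {
          String n = work.pop();
          if (!visited.add(n)) continue;
          if (uses.getOrDefault(n, none).contains(x)) {
            edges.add(a + " -> " + n);
          }
          // A redefinition of x ends the definition-free path.
          if (!defs.getOrDefault(n, none).contains(x)) {
            work.addAll(succ.get(n));
          }
        }
      }
    }
    return edges;
  }

  // Assumed sample: parameters x and y are defined at "Start".
  static Set<String> example() {
    Map<String, List<String>> succ = new HashMap<>();
    succ.put("Start", Arrays.asList("while (x > y)"));
    succ.put("while (x > y)", Arrays.asList("x = x - 1", "print(x)"));
    succ.put("x = x - 1", Arrays.asList("while (x > y)"));
    succ.put("print(x)", Arrays.asList("End"));
    succ.put("End", Collections.<String>emptyList());
    Map<String, Set<String>> defs = new HashMap<>();
    defs.put("Start", new HashSet<>(Arrays.asList("x", "y")));
    defs.put("x = x - 1", Collections.singleton("x"));
    Map<String, Set<String>> uses = new HashMap<>();
    uses.put("while (x > y)", new HashSet<>(Arrays.asList("x", "y")));
    uses.put("x = x - 1", Collections.singleton("x"));
    uses.put("print(x)", Collections.singleton("x"));
    return dataDependence(succ, defs, uses);
  }
}
\end{lstlisting}

In the sample, the assignment inside the loop both uses and defines $x$, so it depends on itself through the back edge of the loop, and the definition made at ``Start'' still reaches the \texttt{print} through the path that skips the loop body.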
With the data and control dependencies, the PDG may be built by replacing the
edges of the CFG with data and control dependence edges. The former tend to be
represented as thin solid lines, and the latter as thick solid lines. In the
examples, data dependencies will be thin solid red lines.

\begin{definition}[Program dependence graph]
\label{def:pdg}
The \textsl{program dependence graph} (PDG) is a directed graph (and originally a tree) represented by three elements: a set of nodes $N$, a set of control edges $E_c$ and a set of data edges $E_d$.

The set of nodes corresponds to the set of nodes of the CFG, excluding the ``End'' node.

Both sets of edges are built as follows. There is a control edge between two nodes $n_1$ and $n_2$ if and only if $n_1 \ctrldep n_2$, and a data edge between $n_1$ and $n_2$ if and only if $n_1 \datadep n_2$. Additionally, if a node $n$ does not have any incoming control edges, it has a ``default'' control edge $e = (\textnormal{Start},n)$, so that ``Start'' is the only source node of the graph.

Note: the most common graphical representation is a tree--like structure based on the control edges, with nodes sorted left to right according to their position in the original program. Data edges do not affect the structure, so that the graph is easily readable.
\end{definition}
Finally, the SDG is built from the combination of all the PDGs that compose the
program.

\begin{definition}[System dependence graph]
\label{def:sdg}
The \textsl{system dependence graph} (SDG) is a directed graph that represents the control and data dependencies of a whole program. It has three kinds of edges: control, data and function call. The graph is built by combining multiple PDGs, with each ``Start'' node labeled after the function it begins. There exists one function call edge between each node containing one or more calls and the ``Start'' node of each method called. In a programming language where the target of a function call is ambiguous (e.g. with pointers or polymorphism), there exists one edge leading to every function that may be called.
\end{definition}
\begin{example}[Creation of an SDG from a simple program]
Given the program shown below (left), the control flow graphs for both methods are shown on the right: \\
\begin{minipage}{0.2\linewidth}
\begin{lstlisting}
proc main() {
  a = 10;
  b = 20;
  f(a, b);
}

proc f(x, y) {
  while (x > y) {
    x = x - 1;
  }
  print(x);
}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.79\linewidth}
\includegraphics[width=0.6\linewidth]{img/cfgsimple}
\end{minipage}

Then, control and data dependencies are computed, arranging the nodes in the PDG. Finally, the two graphs are connected with function call edges to create the SDG:

\begin{center}
\includegraphics[width=0.8\linewidth]{img/sdgsimple}
\end{center}
\end{example}
\subsubsection{Function calls and data dependencies}

\carlos{Vocabulary: when is appropriate the use of method, function and procedure????}

In the original definition of the SDG, there was no special handling of data dependencies when calling functions, as parameters were assumed to be passed by value and global variables were assumed not to exist. \carlos{Name and cite paper that introduced it} solves this issue by splitting function calls and function definitions into multiple nodes. This proposal covers everything related to parameter passing: by value, by reference, complex variables such as structs or objects, and return values.

To that end, the following modifications are made to the different graphs:

\begin{description}
\item[CFG.] In each CFG, the parameters and the global variables read or modified are added to the label of the ``Start'' node as assignments of the form $par = par_{in}$ for each parameter and $x = x_{in}$ for each global variable. Similarly, the global variables and parameters modified are added to the label of the ``End'' node as $x_{out} = x$. Parameters are only passed back if the value set by the called method can be read by the caller. Finally, in method calls the same values must be packed and unpacked: each statement containing a function call is relabeled to contain input (of the form $par_{in} = \textnormal{exp}$ for parameters or $x_{in} = x$ for global variables) and output (always of the form $x = x_{out}$).
\item[PDG.] Each node modified in the CFG is split into multiple nodes: the original label is the main node and each assignment is represented as a new node, which is control--dependent on the main one. Visually, input is placed on the left and output on the right, with parameters sorted accordingly.
\item[SDG.] Three kinds of edges are introduced: parameter input (param--in), parameter output (param--out) and summary edges. Parameter input edges are placed between each method call's input node and the corresponding method definition input node. Parameter output edges are placed between each method definition's output node and the corresponding method call output node. Summary edges are placed between the input and output nodes of a method call, according to the dependencies inside the method definition: if there is a path from an input node to an output node, that shows a dependence, and a summary edge is placed between the corresponding two nodes in every call to that method.

Note: parameter input and output edges are kept separate because the two--pass traversal algorithm skips one kind in each pass (the output edges are excluded in the first pass and the input edges in the second).
\end{description}
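The two passes mentioned in the note can be sketched as two backward reachability computations over the SDG. The Java listing below is an illustration under stated assumptions rather than a definitive implementation: the class \texttt{TwoPassSlicer}, its methods and the sample graph are hypothetical, and the second pass is assumed to skip function call edges as well as parameter input edges, following the usual formulation of the two--pass traversal. The sample graph models a \texttt{main} method that runs \texttt{a = 1; b = 2; inc(a); inc(b); print(b)}, where \texttt{inc(x)} increments a parameter passed by reference; the control edges from the ``Start'' node of \texttt{main} are omitted for brevity.

\begin{lstlisting}
import java.util.*;

// Illustrative sketch (hypothetical names) of a two-pass backward slice.
final class TwoPassSlicer {
  enum Kind { CONTROL, DATA, CALL, PARAM_IN, PARAM_OUT, SUMMARY }

  static final class Edge {
    final String from, to;
    final Kind kind;
    Edge(String from, String to, Kind kind) {
      this.from = from; this.to = to; this.kind = kind;
    }
  }

  // Nodes that reach a seed backwards without crossing a skipped edge kind.
  static Set<String> reach(List<Edge> edges, Set<String> seeds,
      Set<Kind> skipped) {
    Set<String> seen = new TreeSet<>(seeds);
    Deque<String> work = new ArrayDeque<>(seeds);
    while (!work.isEmpty()) {
      String n = work.pop();
      for (Edge e : edges) {
        if (e.to.equals(n) && !skipped.contains(e.kind)
            && seen.add(e.from)) {
          work.push(e.from);
        }
      }
    }
    return seen;
  }

  static Set<String> slice(List<Edge> edges, String criterion) {
    // Pass 1: stay out of called methods (param-out edges are skipped).
    Set<String> first = reach(edges, Collections.singleton(criterion),
        EnumSet.of(Kind.PARAM_OUT));
    // Pass 2: never return to callers (param-in and call edges skipped).
    return reach(edges, first, EnumSet.of(Kind.PARAM_IN, Kind.CALL));
  }

  // Assumed sample SDG; each row is {source, target, edge kind}.
  static List<Edge> exampleSdg() {
    String[][] raw = {
      {"a = 1", "x_in = a", "DATA"}, {"b = 2", "x_in = b", "DATA"},
      {"x_in = a", "x = x_in", "PARAM_IN"},
      {"x_in = b", "x = x_in", "PARAM_IN"},
      {"x = x_in", "x = x + 1", "DATA"},
      {"x = x + 1", "x_out = x", "DATA"},
      {"x_out = x", "a = x_out", "PARAM_OUT"},
      {"x_out = x", "b = x_out", "PARAM_OUT"},
      {"x_in = a", "a = x_out", "SUMMARY"},
      {"x_in = b", "b = x_out", "SUMMARY"},
      {"b = x_out", "print(b)", "DATA"},
      {"inc(a)", "Start inc", "CALL"}, {"inc(b)", "Start inc", "CALL"},
      {"inc(a)", "x_in = a", "CONTROL"}, {"inc(a)", "a = x_out", "CONTROL"},
      {"inc(b)", "x_in = b", "CONTROL"}, {"inc(b)", "b = x_out", "CONTROL"},
      {"Start inc", "x = x_in", "CONTROL"},
      {"Start inc", "x = x + 1", "CONTROL"},
      {"Start inc", "x_out = x", "CONTROL"}
    };
    List<Edge> g = new ArrayList<>();
    for (String[] r : raw) g.add(new Edge(r[0], r[1], Kind.valueOf(r[2])));
    return g;
  }
}
\end{lstlisting}

Slicing the sample on \texttt{print(b)} keeps the second call and the body of \texttt{inc}, but leaves out \texttt{a = 1} and the first call. A single backward traversal that follows every edge kind would enter \texttt{inc} through one call and leave it through the other, pulling \texttt{a = 1} into the slice; avoiding that is the reason for treating parameter input and output edges differently in each pass.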
\begin{example}[Variable packing and unpacking]
Let $f(x, y)$ be a function with two integer parameters, and $f(a + b, c)$ a call to it, with parameters passed by reference if possible. The label of the method call node in the CFG would be ``\texttt{x\_in = a + b, y\_in = c, f(a + b, c), c = y\_out}''; method $f$ would have \texttt{x = x\_in, y = y\_in} in the ``Start'' node and \texttt{y\_out = y} in the ``End'' node. The relevant section of the SDG would be:
\begin{center}
\includegraphics[width=0.5\linewidth]{img/parameter-passing}
\end{center}
\end{example}
\section{Unconditional control flow}

Even though the initial definition of the SDG was useful to compute slices, the
language covered was not enough for the typical language of the 1980s, which
included (in one form or another) unconditional control flow. Therefore, one of
the first additions to the algorithm that builds system dependence graphs was
the inclusion of unconditional jumps, such as ``break'', ``continue'', ``goto''
and ``return'' statements (or any other equivalent). A naive representation
would be to treat them the same as any other statement, but with the outgoing
edge landing in the corresponding instruction (outside the loop, at the loop
condition, at the method's end, etc.).
An alternative approach is to represent the instruction as an edge, not a vertex, connecting the previous statement with the next one to be executed. Both of these approaches fail to generate a control dependence from the unconditional jump, as the definition of control dependence (see definition~\ref{def:ctrl-dep}) requires a vertex to have more than one successor to be the source of control dependence.
From here, two approaches stem: the first would be to redefine control
dependency, in order to reflect the real effect of these instructions (as some
authors~\cite{DanBHHKL11} have tried to do), and the second would be to alter
the creation of the SDG to ``create'' those dependencies, which is the most
widely--used solution~\cite{BalH93}.

The most popular approach was proposed by Ball and Horwitz~\cite{BalH93}, classifying instructions into three separate categories:

\begin{description}
\item[Statement.] Any instruction that is not a conditional or unconditional jump. It has one outgoing edge in the CFG, to the next instruction that follows it in the program.
\item[Predicate.] Any conditional jump instruction, such as \texttt{while}, \texttt{until}, \texttt{do-while}, \texttt{if}, etc. It has two outgoing edges, labeled \textit{true} and \textit{false}, leading to the corresponding instructions.
\item[Pseudo--predicate.] Unconditional jumps (e.g. \texttt{break}, \texttt{goto}, \texttt{continue}, \texttt{return}) are like predicates, with the difference that the outgoing edge labeled \textit{false} is marked as non--executable, because there is no possible execution that would traverse it, according to the definition of the CFG (definition~\ref{def:cfg}). Originally the edges had a specific reasoning backing them up: the \textit{true} edge leads to the jump's destination and the \textit{false} one to the instruction that would be executed if the unconditional jump were removed or converted into a \texttt{no op} (a blank operation that performs no change to the program's state). This specific behavior still holds for unconditional jumps, but no longer applies to every pseudo--predicate, as more instructions have been placed in this category as a means of ``artificially'' \carlos{bad word choice} generating control dependencies.
\end{description}
As a consequence of this classification, every instruction after an unconditional jump $j$ is control--dependent (either directly or indirectly) on $j$ and the structure containing it (a conditional statement or a loop), as can be seen in the following example.

\begin{figure}
\centering
\begin{minipage}{0.3\linewidth}
\begin{lstlisting}
static void f() {
  int a = 1;
  while (a > 0) {
    if (a > 10) break;
    a++;
  }
  System.out.println(a);
}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.6\linewidth}
\includegraphics[width=0.4\linewidth]{img/breakcfg}
\includegraphics[width=0.59\linewidth]{img/breakpdg}
\end{minipage}
\caption{A program with unconditional control flow (left), its CFG (center) and PDG (right).}
\label{fig:break-graphs}
\end{figure}
\begin{example}[Control dependencies generated by unconditional instructions]
\label{exa:unconditional}
Figure~\ref{fig:break-graphs} showcases a small program with a \texttt{break} statement, its CFG and its PDG with a slice in gray. The slicing criterion (line 5, variable $a$) is control dependent on both the unconditional jump and its surrounding conditional instruction (both on line 4), even though it is not necessary to include it (in the context of weak slicing).

Note: the ``Start'' node $S$ is also categorized as a pseudo--predicate, with the \textit{false} edge connected to the ``End'' node, therefore generating a dependence from $S$ to all the nodes inside the method. This removes the need to handle $S$ as a special case when converting a CFG to a PDG, but weakens the interpretation of non--executable edges as leading to the ``instruction that would be executed if the node was absent or a no--op''.
\end{example}
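The dependence described in example~\ref{exa:unconditional} can be checked by hand against definition~\ref{def:ctrl-dep}. The derivation below is only a sketch under stated assumptions: $\mathit{succ}(n)$ and $\mathit{pdom}(n)$ (the successors of $n$ and the set of vertices that postdominate $n$) are names introduced here for illustration, vertices are named after their source text, and the non--executable \textit{false} edge of \texttt{break} is assumed to lead to \texttt{a++}, the instruction that would run if the jump were a no--op.
\[
\begin{array}{rcl}
\mathit{succ}(\texttt{break}) & = & \{\texttt{println},\ \texttt{a++}\} \\
\mathit{pdom}(\texttt{println}) & = & \{\texttt{println},\ \textnormal{End}\} \\
\mathit{pdom}(\texttt{a++}) & = & \{\texttt{a++},\ \texttt{while},\ \texttt{println},\ \textnormal{End}\}
\end{array}
\]
Since $\texttt{a++} \in \mathit{pdom}(\texttt{a++})$ but $\texttt{a++} \notin \mathit{pdom}(\texttt{println})$, \texttt{a++} postdominates one but not all of the successors of \texttt{break}, and therefore $\texttt{break} \ctrldep \texttt{a++}$. Without the non--executable edge, \texttt{break} would have a single successor and could not be the source of any control dependence.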
The original paper~\cite{BalH93} does prove its completeness, but disproves its correctness by providing a counter--example similar to example~\ref{exa:nested-unconditional}. This counter--example affects both weak and strong slicing, so improvements can be made on this proposal. The authors postulate that a more correct approach would be achievable if the restriction that a slice must be a subset of the program's instructions were lifted.

\begin{example}[Nested unconditional jumps]
\label{exa:nested-unconditional}
In the case of nested unconditional jumps where both jump to the same destination, only one of them (the outermost one) is needed. Figure~\ref{fig:nested-unconditional} showcases the problem, with the minimal slice \carlos{have not defined this yet} in gray, and the algorithmically computed slice in light blue. Specifically, lines 3 and 5 are included unnecessarily.

\begin{figure}
\begin{minipage}{0.15\linewidth}
\begin{lstlisting}
while (X) {
  if (Y) {
    if (Z) {
      A;
      break;
    }
    B;
    break;
  }
  C;
}
D;
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.84\linewidth}
\includegraphics[width=0.4\linewidth]{img/nested-unconditional-cfg}
\includegraphics[width=0.59\linewidth]{img/nested-unconditional-pdg}
\end{minipage}
\caption{A program with nested unconditional control flow (left), its CFG (center) and PDG (right).}
\label{fig:nested-unconditional}
\end{figure}
\end{example}
\carlos{Add proposals to fix both problems showcased.}

\section{Exceptions}

Exception handling was first tackled in the context of Java program slicing by Sinha et al.~\cite{SinH98}, with later contributions by Allen and Horwitz~\cite{AllH03}. There exist contributions for other programming languages, which will be explored later (chapter~\ref{cha:state-art}), as well as other smaller contributions. The following section explains the treatment of the different elements of exception handling in Java program slicing.

As seen in section~\ref{sec:intro-exception}, exception handling in Java adds
two constructs: \texttt{throw} and \texttt{try-catch}. Structurally, the
first one resembles an unconditional control flow statement carrying a value (like \texttt{return} statements), but its destination is not fixed, as it depends on the dynamic type of the value.
If there is a compatible \texttt{catch} block, execution will continue inside it; otherwise the method exits with the corresponding value as the error.
The same process is repeated in the method that called the current one, until either the call stack is emptied or the exception is successfully caught.
If the exception is not caught at all, the program exits with an error (except in multi--threaded programs, in which case the corresponding thread is terminated).
The \texttt{try-catch} statement can be compared to a \texttt{switch} which compares types (with \texttt{instanceof}) instead of constants (with \texttt{==} and \texttt{Object\#equals(Object)}). Both structures require special handling to place the proper dependencies, so that slices are complete and as correct as can be.
\subsection{\texttt{throw} statement}

The \texttt{throw} statement combines two elements in one instruction: an
unconditional jump with a value attached and a switch to an ``exception mode'', in which the normal execution order of statements is disregarded. The first one has been extensively covered and solved, as it is equivalent to the \texttt{return} instruction, but the second one requires a small addition to the CFG: there must be an alternative control flow, where the path of the exception is shown. For now, without including \texttt{try-catch} structures, any exception thrown will exit its method with an error, so a new ``Error exit'' node is needed. The preexisting ``End'' node is renamed ``Normal exit'', but now the CFG has two distinct sink nodes, which is forbidden in most slicing algorithms. To solve that problem, a general ``End'' node is created, with both the normal and the error exits connected to it, making it the only sink in the graph.

In order to properly accommodate a method's output variables (global variables or parameters passed by reference that have been modified), variable unpacking is

This treatment of \texttt{throw} statements only modifies the structure of the CFG, without altering the other graphs, the traversal algorithm, or the basic definitions for control and data dependencies. That fact makes it easy to incorporate into any existing program slicer that follows the general model described. Example~\ref{exa:throw} showcases the new exit nodes and the treatment of the \texttt{throw} as if it were an unconditional jump whose destination is the ``Error exit''.
\begin{example}[CFG of an uncaught \texttt{throw} statement]
Consider the simple Java method on the left of figure~\ref{fig:throw}, which computes a square root if the number is non--negative, and throws a \texttt{RuntimeException} otherwise. The CFG in the centre illustrates the treatment of \texttt{throw}, ``normal exit'' and ``error exit'' as pseudo--predicates, and the PDG on the right describes the
\label{exa:throw}
\begin{figure}[h]
\begin{minipage}{0.3\linewidth}
\begin{lstlisting}
double f(int x) {
  if (x < 0)
    throw new RuntimeException();
  return Math.sqrt(x);
}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.69\linewidth}
\includegraphics[width=\linewidth]{img/throw-example-cfg}
\end{minipage}
\caption{A simple program with a \texttt{throw} statement (left), its CFG (centre) and its PDG (right).}
\label{fig:throw}
\end{figure}
\end{example}
\subsection{\texttt{try-catch} statement}

The \texttt{try-catch-finally} statement is the only way to stop an exception once \added{it is}\deleted{it's} thrown,
filtering by type, or otherwise letting it propagate further up the call stack. On top of that,
\texttt{finally} helps guarantee consistency, executing in any case (even when an exception is
left uncaught, the method returns or an exception occurs in a \texttt{catch} block). The main
problem with this construct is that \texttt{catch} blocks are not always necessary in a slice, but their
absence may make the compilation fail (because a \texttt{try} block has no \texttt{catch} or
\texttt{finally} block), or modify the execution in unexpected ways that are not always accounted
for in slicing software.

The \texttt{try} block is normally represented as a pseudo--predicate, connected to the
first statement inside it and to the first instruction after the whole \texttt{try-catch-finally}
construct. Inside the \texttt{try} there can be four distinct sources of exceptions:

\begin{description}
\item[Method calls.] If an exception is thrown inside a method and it is not caught, it will
surface inside the \texttt{try} block. As \textit{checked} exceptions must be declared
explicitly, method declarations may be consulted to see if a method call may or may not
throw any exceptions. On this front, polymorphism and inheritance present no problem, as
overriding methods may not declare \textit{checked} exceptions beyond those (or subtypes of
those) declared by the method they override. If \textit{unchecked} exceptions are also
considered, all method calls shall be included, as any can trigger at the very least a
\texttt{StackOverflowError}.
\item[\texttt{throw} statements.] The least common, but simplest, as it is treated as a
\texttt{throw} inside a method.
\item[Implicit unchecked exceptions.] If \textit{unchecked} exceptions are considered, many
common expressions may throw an exception, with the most common ones being calling
a method or accessing a field of a \texttt{null} reference (\texttt{NullPointerException}),
accessing an invalid index of an array (\texttt{ArrayIndexOutOfBoundsException}), dividing
an integer by 0 (\texttt{ArithmeticException}), casting to an incompatible type
(\texttt{ClassCastException}) and many others. On top of that, the user may create new
types that inherit from \texttt{RuntimeException}, but those may only be explicitly thrown.
Their inclusion in program slicing and therefore in the method's CFG generates extra
dependencies that make the slices produced bigger.
\item[\added{Errors}\deleted{Erorrs}.] May be generated at any point in the execution of the program, but they normally
signal a situation from which it may be impossible to recover, such as an internal JVM error.
In general, most programs do not consider these to be ``catch-able''.
\end{description}
All exception sources are treated in a similar fashion: the statement that may throw an exception
is treated as a predicate, with the true edge connected to the instruction that would follow if the
statement executed without raising exceptions, and the false edge connected to the \texttt{catch} node.

\carlos{CATCH Representation doesn't matter, it is similar to a switch but checking against types.
The difference exists where there exists the possibility of not catching the exception;
which is semantically possible to define. When a \texttt{catch (Throwable e)} is declared,
it is impossible for the exception to exit the method; therefore the control dependency must
be redefined.}

The filter for exceptions in Java's \texttt{catch} blocks is a type (or multiple types since
Java 7), with a class that encompasses all possible exceptions (\texttt{Throwable}), which acts
as a catch--all.
In the literature there exist two alternatives to represent \texttt{catch}: one mimics a static
switch statement, placing all the \texttt{catch} block headers at the same height, all hanging
from the exception--throwing statement, and the other mimics a dynamic switch or a chain of \texttt{if}
statements. The option chosen affects how control dependencies should be computed, as the different
structures generate different control dependencies by default.
\begin{description}
\item[Switch representation.] There exists no relation between different \texttt{catch} blocks;
each exception--throwing statement is connected through an edge labeled false to each
of the \texttt{catch} blocks that could be entered. Each \texttt{catch} block is a
pseudo--predicate, with its true edge connected to the end of the \texttt{try} and the
As an example, a \texttt{1 / 0} expression may be connected to \texttt{ArithmeticException},
\texttt{RuntimeException}, \texttt{Exception} or \texttt{Throwable}.
If an exception may remain uncaught, there exists a connection to the ``Error exit'' of the method.
\item[If-else representation.] Each exception--throwing statement is connected to the first
\texttt{catch} block. Each \texttt{catch} block is represented as a predicate, with the true
edge connected to the first statement inside the \texttt{catch} block, and the false edge
to the next \texttt{catch} block, until the last one. The last one will be a pseudo--predicate
connected to the first statement after the \texttt{try} if it is a catch--all type or to the
``Error exit'' if it \added{is not}\deleted{isn't}.
\end{description}
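Both representations need to know which \texttt{catch} blocks a given exception could enter and whether it may still reach the ``Error exit''. The Java listing below sketches one possible way to decide that from the declared types alone, using only \texttt{Class.isAssignableFrom}. The class \texttt{CatchResolver} and its methods are hypothetical, and the listing makes an assumption that the text does not state: a \texttt{catch} type that is a subtype of the static type thrown is also counted, because the value thrown at run time may be an instance of that subtype.

\begin{lstlisting}
import java.util.*;

// Illustrative helper (hypothetical names), based on java.lang.Class only.
final class CatchResolver {

  // Catch types that could be entered by an exception whose static type is
  // `thrown`: supertypes always match, subtypes match for some values.
  static List<Class<?>> mayEnter(Class<?> thrown, Class<?>... catches) {
    List<Class<?>> result = new ArrayList<>();
    for (Class<?> c : catches) {
      if (c.isAssignableFrom(thrown) || thrown.isAssignableFrom(c)) {
        result.add(c);
      }
    }
    return result;
  }

  // The exception may reach the "Error exit" unless some catch type is the
  // static type thrown or one of its supertypes.
  static boolean mayEscape(Class<?> thrown, Class<?>... catches) {
    for (Class<?> c : catches) {
      if (c.isAssignableFrom(thrown)) return false;
    }
    return true;
  }
}
\end{lstlisting}

For the \texttt{1 / 0} case, \texttt{mayEnter(ArithmeticException.class, ArithmeticException.class, RuntimeException.class, Exception.class, Throwable.class)} keeps all four candidates, and \texttt{mayEscape} is false as soon as \texttt{Throwable} is among the \texttt{catch} types, which corresponds to the catch--all case in which no edge to the ``Error exit'' is needed.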
\begin{example}[Catches.]\ \\
\begin{minipage}{0.49\linewidth}
\begin{lstlisting}
try {
  f();
} catch (CheckedException e) {
} catch (UncheckedException e) {
} catch (Throwable e) {
}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.49\linewidth}
\carlos{missing figures with 4 alternatives: if-else (with catch--all and without) and switch (same two)}\josep{Definitely!!!}
% \includegraphics[0.5\linewidth]{img/catch1}
% \includegraphics[0.5\linewidth]{img/catch2}
% \includegraphics[0.5\linewidth]{img/catch3}
% \includegraphics[0.5\linewidth]{img/catch4}
\end{minipage}
\end{example}
Regardless of the approach, when there exists a catch--all block, there is no dependency generated
from the \texttt{catch}, as all of them will lead to the next instruction. However, this means that
if no data is output from the \texttt{try} or \texttt{catch} block, the catches will not be picked
up by the slicing algorithm, which may alter the results unexpectedly. If this problem arises, the
simple and obvious solution would be to add artificial edges to force the inclusion of all \texttt{catch}
blocks. This adds instructions to the slice (lowering its score when evaluating against benchmarks),
but they are completely innocuous, as they just stop the exception without running any extra instruction.

An alternative exists, but it slows down the process of creating a slice from an SDG.
The \texttt{catch} block is only strictly needed if an exception that it catches may be thrown and
an instruction after the \texttt{try-catch} block should be executed; in any other case the \texttt{catch}
block is irrelevant and should not be included. However, this change requires analyzing the inclusion
of \texttt{catch} blocks after the two--pass algorithm has completed, slowing it down. In any case, each
approach trades time for accuracy or vice versa, but the trade--off is small enough to be negligible.

Regarding \textit{unchecked} exceptions, an extra layer of analysis should be performed to tag statements
with the possible exceptions they may throw. On top of that, methods must be analyzed and tagged
accordingly. The worst case is that of inaccessible methods: as their source code is unavailable,
they must be marked as capable of throwing any \texttt{RuntimeException}. This results in
a graph where each instruction is dependent on the proper execution of the previous statement, save
for simple statements that may not generate exceptions. The trade--off here is between completeness and
correctness, with the inclusion of \textit{unchecked} exceptions increasing both the completeness and the
slice size, reducing correctness. A possible solution would be to only consider user--generated exceptions
or assume that library methods may never throw an unchecked exception. A new slicing variation could
annotate methods or limit the unchecked exceptions to be considered.

Regarding the \texttt{finally} block, most approaches treat it properly, representing it twice: once
for the case where there is no active exception and once for the case where it executes with
an exception active. An exception could also be thrown here, but that would be represented normally.
% vim: set noexpandtab:ts=2:sw=2:wrap