% !TEX encoding = UTF-8
% !TEX spellcheck = en_GB
% !TEX root = ../paper.tex
\chapter{Program slicing with exception handling}
\label{cha:incremental}
\section{First definition of the SDG}
\label{sec:first-def-sdg}
The SDG is the most common data structure for program representation in the field of program slicing.
It was first proposed by Horwitz et al. \cite{HorwitzRB88} and, since then, many approaches to program slicing have based their models on it.
It builds upon the existing CFG, which represents the control flow between the statements of a method. Then, it creates a PDG using the CFG's vertices and the dependencies computed from it.
The SDG is finally built by assembling the different methods' PDGs, linking each method call to its corresponding definition.
Because each graph is built from the previous one, new kinds of statements can be added to the CFG without the need to alter the algorithm that converts each CFG into a PDG and then into the final SDG.
The only modifications required are the redefinition of an already defined dependence or the addition of new kinds of dependence.
The seminal appearance of the SDG covers a simple imperative programming language, featuring procedures and basic statements like calls, variable assignments, arithmetic and logic operators and conditional statements (branches and loops).
\begin{definition}[Control Flow Graph (based on \cite{Allen70})]
\label{def:cfg}
Given a method $M$, which contains a list of statements $s = \{s_1, s_2, ...\}$, the \emph{control flow graph} of $M$ is a directed graph $G = \langle N, E \rangle$, where:
\begin{itemize}
\item $N = s \cup \{\textnormal{Enter}, \textnormal{Exit}\}$: a set of nodes such that for each statement $s_i$ in $s$ there is a node in $N$ labelled with $s_i$ and two special nodes ``Enter'' and ``Exit'', which represent the beginning and end of the method, respectively.
\item $E$ is a set of edges of the form $e = \left(n_1, n_2\right) | n_1, n_2 \in N$. There exist edges between normal statements, in the order they appear in the program: the ``Enter'' node is connected to the first statement, which in turn is connected to the second, etc. Additionally, conditional statements (i.e., \texttt{if}) have two outgoing edges: one towards the first statement executed if the condition evaluates to \textit{true} and another towards the first statement if the condition evaluates to \textit{false}.
\end{itemize}
\end{definition}
Most algorithms, in order to generate the SDG, mandate the ``Enter'' node to be the only source and the ``Exit'' node to be the only sink in the graph.
In general, expressions are not evaluated when generating the CFG; so an \texttt{if} statement will have two outgoing edges even when its condition is trivially always true or always false (e.g., \texttt{1 == 0}).
To build the PDG and then the SDG, there are two dependencies based directly on the CFG's structure: data and control dependence. First, though, we need to define the concept of postdominance in a graph, as it is necessary in the definition of control dependence:
\begin{definition}[Postdominance \cite{Tip95}]
\label{def:postdominance}
Let $C = (N, E)$ be a CFG. $b \in N$ \textit{postdominates} $a \in N$ if and only if $b$ is present on every possible sequence from $a$ to ``Exit''.
\end{definition}
Note that, given that the ``Exit'' node is the only sink in the CFG, every node has at least one path to ``Exit''; and it trivially follows from the previous definition that every node postdominates itself.
\begin{definition}[Control dependence \cite{HorwitzRB88}]
\label{def:ctrl-dep}
Let $C = (N, E)$ be a CFG. $b \in N$ is \textit{control dependent} on $a \in N$ ($a \ctrldep b$) if and only if $b$ postdominates one but not all of $\{n~|~(a, n) \in E, n \in N\}$ ($a$'s successors).
\end{definition}
It follows that a node with fewer than two outgoing edges cannot be the source of a control dependence.
\begin{definition}[Data dependence \cite{HorwitzRB88}]
\label{def:data-dep}
Let $C = (N,E)$ be a CFG.
$b \in N$ is \textit{data dependent} on $a \in N$ ($a \datadep b$) if and only if $a$ may define a variable $x$, $b$ may use $x$ and there exists in $C$ a sequence of edges from $a$ to $b$ where $x$ is not defined.
\end{definition}
Data dependence was originally defined as flow dependence, and subcategorized into loop-carried and loop-independent flow-dependencies, but that distinction is no longer used to compute program slices with the SDG. It should be noted that variable definitions and uses can be computed for each statement independently, analysing the procedures called by it if necessary. The variables used and defined by a procedure call are those used and defined by its body.
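As a concrete illustration of Definition~\ref{def:data-dep}, consider the following hypothetical Java snippet (not taken from any cited work): a use is data dependent only on the definitions that reach it, because an intermediate redefinition kills any previous one.
\begin{lstlisting}
// Illustration of data dependence: the use of x in (3) is data
// dependent on the definition in (2), not on the one in (1),
// because (2) redefines x on every path from (1) to (3).
class DataDepDemo {
	static int demo() {
		int x = 1;    // (1) defines x; killed before any use
		x = 2;        // (2) defines x; reaches (3)
		return x + 1; // (3) uses x -> data dependent on (2)
	}
	public static void main(String[] args) {
		System.out.println(demo()); // prints 3
	}
}
\end{lstlisting}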
With the data and control dependencies, the PDG may now be built by replacing the edges of the CFG with data and control dependence edges. The former tend to be represented as thin dashed lines or thin solid coloured lines; the latter as thin solid black lines.
In the examples of this thesis, data and control dependencies are represented by red and black solid lines, respectively.
\begin{definition}[Program dependence graph]
\label{def:pdg}
Given a method $M$, composed of statements $S = \{s_1, s_2, ... s_n\}$ and its associated CFG $C = (N, E)$, the \textit{program dependence graph} (PDG) of $M$ is a directed graph $G = \langle N', E_c, E_d \rangle$, where:
\begin{enumerate}
\item $N' = N~\backslash~\{\textnormal{Exit}\}$
\item $(a, b) \in E_c \iff a, b \in N' \wedge (a \ctrldep b \vee a = \textnormal{Enter}) ~ \wedge \not\exists c \in N' ~.~ a \ctrldep c \wedge c \ctrldep b$ (\textit{control edges})
\item $(a, b) \in E_d \iff a, b \in N' \wedge a \datadep b$ (\textit{data edges})
\end{enumerate}
\end{definition}
Regarding the graphical representation of the PDG, the most common one is a tree-like structure based on the control edges, and nodes sorted left to right according to their position on the original program. Data edges do not affect the structure, so that the graph is easily readable. An example of the creation of the PDGs of a program's methods can be seen in Example~\ref{exa:simple-pdg}.
\begin{example}[Creation of a PDG from a simple program]
\label{exa:simple-pdg}
Consider the program shown on the left side of Figure~\ref{fig:simple-sdg-code}, where two procedures in a simple imperative language are shown. The CFG that corresponds to each procedure is shown on the right side.
\begin{figure}[h]
\begin{minipage}{0.2\linewidth}
\begin{lstlisting}
proc main() {
a = 10;
b = 20;
f(a, b);
}
proc f(x, y) {
while (x > y) {
x = x - 1;
}
print(x);
}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.79\linewidth}
\centering
\includegraphics[width=0.6\linewidth]{img/cfgsimple}
\end{minipage}
\caption{A simple imperative program composed of two procedures (left) and their associated CFGs (right).}
\label{fig:simple-sdg-code}
\end{figure}
Then, the nodes of each CFG are rearranged, according to the control and data dependencies, to create the corresponding PDGs. Both are shown in Figure~\ref{fig:simple-sdg}, each bounded by a rectangle.
\begin{figure}[h]
\centering
\includegraphics[width=0.8\linewidth]{img/sdgsimple}
\caption{The PDG that corresponds to the program from Figure~\ref{fig:simple-sdg-code}.}
\label{fig:simple-sdg}
\end{figure}
\end{example}
Before creating the SDG by joining the different PDGs, we must consider the treatment of method calls and their data dependencies.
\subsubsection{Method calls and data dependencies}
Although it is not imperative, data input and output from method calls\footnotemark{} have been treated in special detail since the inception of the SDG. The same mechanism is used for a method's input (parameters) and output (return value) as for the global variables it can access (static variables and fields of a class, in Java).
Method calls can access and modify global variables; to represent this, fictitious nodes that model variable input and output from the methods must be added both at the method calls and at their declarations.
This proposal can also be extended to those programming languages that pass parameters by reference instead of the more common pass-by-value.
Java objects and arrays can also be analysed more deeply, as even though Java passes parameters by value, modifications to fields of an object or elements of an array affect the original object or array.
\footnotetext{Method calls in this thesis will refer to Java method calls, but most if not all the details provided apply to functions, procedures and other routines.}
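To illustrate why this deeper analysis matters, the following Java sketch (with hypothetical names, not taken from any cited work) shows that, although Java passes parameters by value, a callee can still mutate the caller's object through a reference, so such parameters must be modelled as outputs:
\begin{lstlisting}
// Java passes references by value: mutating the referenced object is
// visible to the caller, while reassigning the local parameter is not.
class PassByValueDemo {
	static class Box { int value; }

	static void increment(Box b, int n) {
		b.value++; // visible to the caller: b aliases the caller's object
		n++;       // invisible to the caller: n is a local copy
	}

	public static void main(String[] args) {
		Box box = new Box();
		box.value = 1;
		int n = 1;
		increment(box, n);
		System.out.println(box.value); // prints 2
		System.out.println(n);         // prints 1
	}
}
\end{lstlisting}
Consequently, a slicer must generate an output node for \texttt{b} (its fields may be redefined by the call) but not for \texttt{n}.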
In practice, the following modifications are made to the different graphs:
\begin{description}
\item[CFG.] The CFG's structure is not modified, as the control flow is not altered by the treatment of variables. Instead, some labels are extended with extra information, which is later used in the PDG's creation. Specifically, the ``Enter'' node, the ``Exit'' node and nodes that contain method calls are modified:
\begin{description}
\item[Enter.] Each global variable that is used or modified and every parameter are appended to the node's label in assignments of the form $par = par_{in}$ in the case of parameters and $x = x_{in}$ in the case of global variables. These lines are the input information, and will become the input nodes.
\item[Exit.] Each global variable that is modified and every parameter whose modification can be read by the caller are prepended to the node's label, as assignments of the form $x_{out} = x$. If the method returns a value, its output is also added, labelled \texttt{output}. These lines constitute the output information, and will be transformed into output nodes.
\item[Method call.] Each method call must be preceded by the input information and followed by the output information of the corresponding method. The input takes the form $par_{in} = \textnormal{exp}$ for each parameter and $x_{in} = x$ for each global variable $x$. The output is always of the form $x = x_{out}$, except for the output of the function, which is labelled \texttt{output}.
\end{description}
\item[PDG.] Each node augmented with input or output information in the CFG is now split into multiple nodes: the original label (``Enter'', ``Exit'' or function call) is the main node and each assignment contained in the input and output information is represented as a new node, which is control-dependent on the main one.
\end{description}
Now that method calls are properly handled, the SDG can be defined as the combination of PDGs, with the addition of four dependencies that connect the method calls and their definitions.
\begin{definition}[System dependence graph]
\label{def:sdg}
Given a program $P$, composed of a set of methods $M = \{m_0 ... m_n\}$ and their associated PDGs (each method $m_i$ has a $PDG^i = \langle N^i, E_c^i, E_d^i \rangle$), the \textit{system dependence graph} (SDG) of $P$ is a graph $G = \langle N, E_c, E_d, E_{call}, E_{in}, E_{out}, E_{sum} \rangle$ where:
\begin{enumerate}
\item $N = \bigcup_{i=0}^n N^i$
\item $E_c = \bigcup_{i=0}^n E_c^i$
\item $E_d = \bigcup_{i=0}^n E_d^i$
\item $(a, b) \in E_{call}$ if and only if $a$ is a statement that contains a call and $b$ is a method ``Enter'' node of the function or method called by $a$. $(a, b)$ is a \textit{call edge}.
\item $(a, b) \in E_{in}$ if and only if $a$ and $b$ are input nodes which refer to the same variable or parameter, $m_{call} \ctrldep a \wedge m_{enter} \ctrldep b \wedge (m_{call}, m_{enter}) \in E_{call}$ ($m_{call}$ is a method call, $m_{enter}$ is an ``Enter'' node). $(a, b)$ is a \textit{parameter-input} or \textit{param-in edge}.
\item $(a, b) \in E_{out}$ if and only if $a$ and $b$ are output nodes which refer to the same variable or to the output, $m_{enter} \ctrldep a \wedge m_{call} \ctrldep b \wedge (m_{call}, m_{enter}) \in E_{call}$ ($m_{call}$ is a method call, $m_{enter}$ is an ``Enter'' node). $(a, b)$ is a \textit{parameter-output} or \textit{param-out edge}.
\item $(a, b) \in E_{sum}$ if and only if $a$ is an input node and $b$ is an output node, $m_{call} \ctrldep a \wedge m_{call} \ctrldep b$, $m_{call}$ is a node that contains a method call and there is a path from $a$ to $b$. $(a, b)$ is a \textit{summary edge}.
\end{enumerate}
\end{definition}
Regarding call edges: in programming languages with ambiguous method calls (e.g., due to polymorphism or function pointers), there may exist multiple outgoing call edges from a statement with a single method call.
To avoid confusion, the ``Enter'' nodes of each method are relabelled with their method's name.
\begin{example}[The creation of a system dependence graph]
\label{exa:example-sdg}
For simplicity, we explore a single small method that is called by another.
Let $f(x, y)$ be a method with two integer parameters that modifies the argument passed in its second parameter. Its code is displayed in Figure~\ref{fig:example-sdg-code}. It also uses a global variable $z$. A valid call to $f$ could be $f(a + 1, b)$, with parameters passed by reference when possible.
\begin{figure}[h]
\begin{lstlisting}
void f(int x, int y) {
z += x;
y++;
}
\end{lstlisting}
\caption{A simple method that modifies a parameter and a global variable.}
\label{fig:example-sdg-code}
\end{figure}
The CFG is very simple, with the addition of the parameter information to the labels of the nodes. The aforementioned method call would be labelled as ``$z_{in} = z$, $x_{in} = a + 1$, $y_{in} = b$, $f(a + 1, b)$, $b = y_{out}$, $z = z_{out}$'', with the inputs, the actual call and the outputs.
The PDG may seem more complicated, but it can be built step by step. In Figure~\ref{fig:example-sdg-graph}, the PDG is the subgraph rooted at the node ``Enter f''. First, the input and output information is extracted into nodes and placed in order. The input nodes generate data dependencies (shown in red) to the statements inside the method, and those in turn to the output nodes. All statements are control-dependent on the ``Enter'' node, as there are no conditional expressions.
Finally, if we connect the PDG of the method that contains the call $f(a + 1, b)$ to the method's PDG, we obtain the SDG (here shown partially, as the method containing the call has not been detailed). Param-in and param-out dependencies (shown with dashes) connect each input node of the method call to its corresponding node in the method declaration (and vice versa for the outputs). The call edge connects the call itself to the declaration and, finally, the summary edges summarize the dependencies that exist between the input and output nodes inside the method.
\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{img/parameter-passing}
\caption{The CFG of $f$ from Figure~\ref{fig:example-sdg-code} (left) and its SDG (right).}
\label{fig:example-sdg-graph}
\end{figure}
\end{example}
\section{Creating slices with the SDG}
Once an SDG has been built, it can be traversed to create slices, without the need to rebuild it unless the underlying program changes. The traversal consists of two passes:
The node that corresponds to the statement in the slicing criterion is selected as the initial node. From there, all edges except \textit{param-out} edges are traversed backwards, so the traversal ascends into calling methods without descending into callees; every node encountered is added to a set (the slice). When all possible edges have been traversed, the second pass begins from the nodes collected so far, this time ignoring \textit{param-in} and call edges (thus descending into called methods), and adding the nodes found to the aforementioned set.
When the process ends, the set of nodes encountered during the two-pass traversal constitutes the slice.
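The two-pass traversal can be sketched as follows. This is a minimal illustration over a generic edge-labelled graph, with hypothetical names; it follows the standard two-pass backward algorithm, in which the first pass ignores \textit{param-out} edges (ascending into callers) and the second ignores \textit{param-in} and call edges (descending into callees).
\begin{lstlisting}
import java.util.*;

// Minimal sketch of the two-pass backward traversal over an SDG,
// stored as reversed adjacency lists labelled with the edge kind.
class TwoPassSlicer {
	enum EdgeKind { CONTROL, DATA, CALL, PARAM_IN, PARAM_OUT, SUMMARY }

	// incoming.get(n) lists (predecessor, kind) pairs, i.e. the edges
	// arriving at n, which is what a backward traversal follows.
	private final Map<String, List<Map.Entry<String, EdgeKind>>> incoming = new HashMap<>();

	void addEdge(String from, String to, EdgeKind kind) {
		incoming.computeIfAbsent(to, k -> new ArrayList<>()).add(Map.entry(from, kind));
	}

	// One backward pass that never crosses edges of the ignored kinds.
	private Set<String> pass(Set<String> start, Set<EdgeKind> ignored) {
		Set<String> visited = new HashSet<>(start);
		Deque<String> work = new ArrayDeque<>(start);
		while (!work.isEmpty()) {
			String node = work.pop();
			for (Map.Entry<String, EdgeKind> e : incoming.getOrDefault(node, List.of()))
				if (!ignored.contains(e.getValue()) && visited.add(e.getKey()))
					work.push(e.getKey());
		}
		return visited;
	}

	// Pass 1 ascends into callers (param-out edges are ignored);
	// pass 2 descends into callees (param-in and call edges are ignored).
	Set<String> slice(String criterion) {
		Set<String> firstPass = pass(Set.of(criterion), EnumSet.of(EdgeKind.PARAM_OUT));
		return pass(firstPass, EnumSet.of(EdgeKind.PARAM_IN, EdgeKind.CALL));
	}
}
\end{lstlisting}
A real slicer would operate on the SDG node structures described above rather than on strings, but the phase structure is the same.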
Throughout this thesis there are several examples in which an SDG has been sliced, with the nodes of the slice filled in grey and the slicing criterion marked in bold; among them, Example~\ref{exa:program-slicing2}, Example~\ref{exa:unconditional}, Example~\ref{exa:problem-break-sub} and Example~\ref{exa:incorrect-try-catch-graph}.
\section{Unconditional control flow}
Even though the initial definition of the SDG was adequate to compute slices, the
language covered lacked features that were already common in the languages of the
1980s, most notably unconditional control flow. Therefore, one of
the first additions contributed to the algorithm that builds SDGs was the inclusion of unconditional jumps, such as ``break'',
``continue'', ``goto'' and ``return'' statements (or their equivalents).
A naive representation would be to treat them the same as any other statement, but
with the outgoing edge landing in the corresponding statement (e.g., outside the
loop); or, alternatively, to represent the statement as an edge, not a vertex, connecting the previous statement with the next to be executed.
Both of these approaches fail to generate a control dependence from the unconditional jump, as the definition of control dependence (see Definition~\ref{def:ctrl-dep}) requires a vertex to have more than one successor for it to be possible to be a source of control dependence.
From here, there stem two approaches: the first would be to
redefine control dependence, in order to reflect the real effect of these
statements---as some authors have done~\cite{DanBHHKL11}---and the
second would be to alter some step of the SDG's construction to introduce those
dependencies.
The most popular approach follows the latter option (modifying the SDG's construction), and was proposed by Ball et al.~\cite{BalH93}. It classifies statements into three separate categories:
\begin{description}
\item[Statement.] Any statement that is not a conditional or unconditional jump. In the CFG, their nodes have one outgoing edge pointing to the next statement that follows them in the program.
\item[Predicate.] Any conditional jump statement, such as \texttt{while}, \texttt{until}, \texttt{do-while}, \texttt{if}, etc. In the CFG, nodes representing predicates have two outgoing edges, labelled \textit{true} and \textit{false}, leading to the statements that would be executed with each result of the condition evaluation. As mentioned before, in general no evaluation is performed on the conditions, so every conditional statement has two outgoing edges, even if the condition is trivially \textit{true} or \textit{false} (e.g., $1 = 1$ or \textit{false}).
\item[Pseudo-predicates.] Unconditional jumps (i.e., \texttt{break}, \texttt{goto}, \texttt{continue}, \texttt{return}) are treated like predicates, with the difference that the outgoing edge labelled \textit{false} is marked as non-executable---because there is no possible execution in which such an edge would be traversed, according to the definition of the CFG (see Definition~\ref{def:cfg}). For unconditional jumps, the \textit{true} edge leads to the statement that will be executed after the jump is performed, and the \textit{false} edge to the statement that \textit{would} be executed if the jump were skipped or turned into a no-operation.
In later sections, other statements will make use of the pseudo-predicate structure (two outgoing edges, one non-executable), but with a different criterion to place the non-executable edge. Therefore, the behaviour described here for unconditional jumps is not universal for all statements classified as pseudo-predicates.
\end{description}
As a consequence of this classification, every statement after an unconditional jump $j$ is control-dependent on it, as can be seen in the following example.
\begin{example}[Control dependencies generated by unconditional jumps]
\label{exa:unconditional}
Consider the program on the left side of Figure~\ref{fig:break-graphs}, which contains a loop and a \texttt{break} statement. The figure also includes the CFG and PDG for the method, showcasing the data and control dependencies of the statements. The slicing criterion $\langle 6, a\rangle$ is control dependent on both the unconditional jump and its surrounding conditional statement. Therefore, the slice (all nodes coloured in grey) includes both. They are necessary to terminate the loop, but they could be excluded in the context of weak slicing: the loop does not need to terminate, the slice can keep producing values.
\begin{figure}[h]
\centering
\begin{minipage}{0.3\linewidth}
\begin{lstlisting}
static void f() {
int a = 1;
while (a > 0) {
if (a > 10)
break;
a++;
}
System.out.println(a);
}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.6\linewidth}
\includegraphics[width=0.4\linewidth]{img/breakcfg}
\includegraphics[width=0.59\linewidth]{img/breakpdg}
\end{minipage}
\caption{A program with unconditional control flow (left), its CFG (centre) and its PDG (right).}
\label{fig:break-graphs}
\end{figure}
\end{example}
% The
% The original paper\josep{que original paper? parece que hablas de alguno que hayas hablado antes, pero el lector ya no se acuerda. Empieza de otra manera...}~\cite{BalH93} does prove its completeness, but disproves its correctness by providing a counter--example similar to Example~\ref{exa:nested-unconditional}. This proof affects both weak and strong slicing, so improvements can be made on this proposal. The authors postulate that a more correct approach would be achievable if the slice's restriction of being a subset of statements were lifted.
% \begin{example}[Nested unconditional jumps]
% \label{exa:nested-unconditional}
% \josep{Esta frase es dificil de leer. No se entiende hasta leerla dos o tres veces.}In the case of nested unconditional jumps where both jump to the same destination, only one of them (the out--most one) is needed \josep{El lector no tiene contexto para saber de que hablas. Mejor empieza al reves: Consider the program in Figure~\ref{fig:nested-unconditional} where we can observe two nested unconditional jumps in lines X and Y. If we slice this program using the dependencies computed according to \cite{} then we compute the slice in light blue. Nevertheless, the minimal slice is composed of the nodes in grey [NOTA: yo no veo los colores. Arreglar la frase si no coincide con los colores]. This means that the slice computed includes unnecessary code (lines 3 and 5 are included unnecessarily). This problem is explained in depth and a solution proposed in Section~\ref{}}. Figure~\ref{fig:nested-unconditional} showcases the problem, with the minimal slice \carlos{have not defined this yet} in grey, and the algorithmically computed slice in light blue. Specifically, lines 3 and 5 are included unnecessarily.
% \begin{figure}
% \begin{minipage}{0.15\linewidth}
% \begin{lstlisting}
% while (X) {
% if (Y) {
% if (Z) {
% A;
% break;
% }
% B;
% break;
% }
% C;
% }
% D;
% \end{lstlisting}
% \end{minipage}
% \begin{minipage}{0.84\linewidth}
% \includegraphics[width=0.4\linewidth]{img/nested-unconditional-cfg}
% \includegraphics[width=0.59\linewidth]{img/nested-unconditional-pdg}
% \end{minipage}
% \caption{A program with nested unconditional control flow (left), its CFG (center) and \josep{its} PDG (right).}
% \label{fig:nested-unconditional}
% \end{figure}
% \end{example}
% \carlos{Add proposals to fix both problems showcased.}
\section{Exceptions}
Exception handling was first tackled in the context of Java program slicing by Sinha et al. \cite{SinH98}, with later contributions by Allen and Horwitz~\cite{AllH03}. There exist contributions for other programming languages, which will be explored later in chapter~\ref{cha:state-art}. This section explains the treatment of the different elements of exception handling in Java program slicing.
As seen in section~\ref{sec:intro-exception}, exception handling in Java adds
two constructs: \texttt{throw} and \texttt{try-catch}. Structurally, the
first one resembles an unconditional control flow statement carrying a value---like \texttt{return} statements---but its destination is not fixed, as it depends on the dynamic typing of the value.
The \texttt{try-catch} statement can be likened to a \texttt{switch} which compares types (using the \texttt{instanceof} operator) instead of constants. Both structures require special handling to place the proper dependencies, so that slices are complete and as correct as possible.
\subsection{\texttt{throw} statement}
The \texttt{throw} statement compounds two elements in one statement: an
unconditional jump with a value attached and a switch to an ``exception mode'', in which the normal execution order is disregarded. The first element has been extensively covered and solved, as it is equivalent to the \texttt{return} statement; but the second requires a small addition to the CFG: there must be an alternative control flow path through which the error can flow until it is caught or the program terminates.
So far, without including \texttt{try-catch} structures, any exception thrown will activate the aforementioned ``exception mode'' and leave its method with an error state. Hence, in order to model this behaviour, a different exit point (represented with a node labelled ``Error exit'') needs to be defined.
Consequently, the pre-existing ``Exit'' node is renamed to ``Normal exit''. Now we face the problem that CFGs may have two distinct sink nodes, something which is forbidden in most slicing algorithms.
To solve that problem, a general ``Exit'' node is created, with both ``Normal exit'' and ``Error exit'' connected to it, which makes it the new sink of the CFG.
In order to properly accommodate a method's output variables (global variables or parameters passed by reference that have been modified), variable unpacking must be moved from ``Exit'' to both ``Normal exit'' and ``Error exit''. This duplicates some nodes, but also allows some of the duplicates to be removed later. This change therefore constitutes an increase in precision, as the output variables of each kind of exit are now differentiated. For example, a slice which only requires the ``Error exit'' may include fewer variable modifications than one which includes both exits.
This treatment of \texttt{throw} statements only modifies the structure of the CFG, without altering the other graphs, the traversal algorithm, or the basic definitions of control and data dependence. That fact makes it easy to incorporate into any existing program slicer that follows the general model described. Example~\ref{exa:throw} showcases the new exit nodes and the treatment of the \texttt{throw} statement as an unconditional jump whose destination is the ``Error exit''.
\begin{example}[CFG of an uncaught \texttt{throw} statement]
Consider the simple Java method on the left of Figure~\ref{fig:throw}, which computes the square root of a global variable $x$ if it is non-negative, and throws a \texttt{RuntimeException} otherwise. The CFG in the centre illustrates the treatment of \texttt{throw} as a pseudo-predicate and the new nodes ``Normal exit'' and ``Error exit''. The PDG on the right shows the control dependencies generated from the \texttt{throw} statement to the following statements and exit nodes.
\label{exa:throw}
\begin{figure}[h]
\begin{minipage}{0.3\linewidth}
\begin{lstlisting}
void f() {
if (x < 0)
throw new RuntimeException()
x = Math.sqrt(x)
}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.69\linewidth}
\includegraphics[width=\linewidth]{img/throw-example-cfg}
\end{minipage}
\caption{A simple program with a \texttt{throw} statement (left), its CFG (centre) and its PDG (right).}
\label{fig:throw}
\end{figure}
\end{example}
\subsection{\texttt{try-catch-finally} statement}
The \texttt{try-catch} statement is the only way to stop an exception once it is thrown.
It filters exceptions by their type; letting those which do not match any of the catch blocks propagate to an external \texttt{try-catch} statement or to the previous method in the call stack.
On top of that, the \texttt{finally} statement helps programmers guarantee code execution. It can be used as a replacement for or in conjunction with \texttt{catch} statements.
The code placed inside a \texttt{finally} statement is guaranteed to run if the \texttt{try} block has been entered.
This holds true whether the \texttt{try} block exits correctly, an exception is caught, an exception is left uncaught or an exception is caught and another one is thrown while handling it (within its \texttt{catch} block).
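The following Java sketch (hypothetical names, not taken from any cited work) checks these guarantees for two of the cases, a normal exit and a caught exception:
\begin{lstlisting}
// finally runs whether the try block exits normally or an
// exception is thrown and caught; the log records every path taken.
class FinallyDemo {
	static StringBuilder log = new StringBuilder();

	static void run(boolean fail) {
		try {
			log.append("try;");
			if (fail) throw new RuntimeException("boom");
		} catch (RuntimeException e) {
			log.append("catch;");
		} finally {
			log.append("finally;"); // executes on every path out of try
		}
	}

	public static void main(String[] args) {
		run(false);
		run(true);
		System.out.println(log); // prints try;finally;try;catch;finally;
	}
}
\end{lstlisting}
The remaining cases (an uncaught exception, or an exception thrown while handling another) behave analogously: the \texttt{finally} block still runs before the exception propagates.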
The main problem when including \texttt{try-catch} blocks in program slicing is that \texttt{catch} blocks are not always strictly necessary for the slice (less so for weak slices), but introduce control dependencies that must be properly mapped to the SDG. The absence of \texttt{catch} blocks may also be a problem for compilation, as Java requires at least one \texttt{catch} or \texttt{finally} block to accompany each \texttt{try} block; though that could be fixed after generating the slice, if it is required that the slice should be executable.
Allen and Horwitz's representation of the \texttt{try} block is as a pseudo-predicate, connected to the first statement inside it and to the statement that follows the \texttt{try} block.
This generates control dependencies from the \texttt{try} node to each of the statements it contains.
Inside the \texttt{try} there can be four distinct sources of exceptions:
\begin{description}
\item[\texttt{throw} statements.] The least common, but most simple to treat, because the exception is always thrown. The only problem may come from the ambiguity of the exception's type. For example, in the statement \texttt{throw ((Throwable) o)}, where \texttt{o} is a variable of type Object, the real type of the exception is unknown.
\item[Implicit unchecked exceptions.] If \textit{unchecked} exceptions are considered, many
common expressions may throw an exception, with the most common ones being trying to call
a method or accessing a field of a \texttt{null} object (\texttt{NullPointerException}),
accessing an invalid index on an array (\texttt{ArrayIndexOutOfBoundsException}), dividing
an integer by 0 (\texttt{ArithmeticException}), trying to cast to an incompatible type
(\texttt{ClassCastException}) and many others. On top of that, the user may create new
types that inherit from \texttt{RuntimeException}, but those may only be explicitly thrown.
Their inclusion in program slicing, and therefore in the method's CFG, generates extra
dependencies that make the slices produced bigger. For this reason, they are not considered in most previous works. This does not mean that they require special treatment in the graph; they just need to be identified in every instruction that may generate them.
\item[Method calls.] If an exception is thrown inside a method and it is not caught, it will
surface inside the \texttt{try} block.
As \textit{checked} exceptions must be declared explicitly, method declarations may be consulted to see if a method call may or may not throw any exceptions.
On this front, polymorphism and inheritance present no problem: overriding methods must conform to the signature of the parent method, and may only declare the same or narrower checked exceptions.
In case \textit{unchecked} exceptions are also considered, method calls could be analysed to know which exceptions may be thrown, or the documentation could be checked automatically for the comment annotation \texttt{@throws} to know which ones can be raised. This is the most common way an exception appears inside a \texttt{try-catch} statement.
\item[Errors.] May be generated at any point during the execution of the program, but they normally
signal a situation from which it may be impossible to recover, such as an internal JVM error.
In general, most programs do not attempt to catch them, so they can be excluded from the analysis; otherwise, any statement at any moment may throw an \texttt{Error}. Therefore, most slicing software ignores them. As with implicit unchecked exceptions, they do not need special treatment, but their identification is costly and can clutter the SDG to the point that every instruction depends on the correct execution of the previous one; which is true in a technical sense, but not useful in most practical applications of program slicing.
\end{description}
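As an illustration of the implicit unchecked exception sources listed above, the following Java sketch (hypothetical names) triggers and identifies three of them; a slicer that considers implicit exceptions must treat each of these expressions as a potential exception source in the CFG:
\begin{lstlisting}
// Common expressions that implicitly throw unchecked exceptions.
class UncheckedDemo {
	static String classify(int which) {
		try {
			switch (which) {
				case 0: { String s = null; return s.trim(); }      // NullPointerException
				case 1: { int[] a = new int[1]; return "" + a[2]; } // ArrayIndexOutOfBoundsException
				default: return "" + (1 / (which - which));         // ArithmeticException
			}
		} catch (RuntimeException e) {
			return e.getClass().getSimpleName();
		}
	}
	public static void main(String[] args) {
		System.out.println(classify(0)); // prints NullPointerException
		System.out.println(classify(1)); // prints ArrayIndexOutOfBoundsException
		System.out.println(classify(2)); // prints ArithmeticException
	}
}
\end{lstlisting}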
All exception sources (except \texttt{throw} statements) are treated very similarly: the statement that may throw an exception
has an outgoing edge to the next statement, and an additional outgoing edge to each \texttt{catch} statement whose type may be compatible with the exception raised.
The nodes that represent \texttt{try} and \texttt{catch} statements are both pseudo-predicates: the \textit{true} edge leads to the first statement inside them, and the \textit{false} edge leads to the first instruction after the \texttt{try-catch} statement.
Unfortunately, when the exception source is a method call, the behaviour is augmented and the representation is slightly different, since there may be variables to unpack both in the case of a normal and of an erroneous exit. To that end, nodes containing method calls may have any number of outgoing edges: one points to an auxiliary node labelled ``normal return'', in which the output variables produced by any normal exit of the method are placed; each \texttt{catch} is then labelled with the output variables produced by the erroneous exits of the method.
The ``normal return'' node is itself a pseudo-predicate. Its \textit{true} edge is connected to the following statement, and its \textit{false} edge to the first statement common to all paths of non-zero length that start from the method call. The most common destinations for the \textit{false} edge are (1) the first statement after the \texttt{try-catch} (if all exceptions that could be thrown are caught) and (2) the ``Error exit'' of the method (if some exception is not caught).
\begin{example}[Code that throws and catches exceptions.]
Consider the segment of Java code in Figure~\ref{fig:try-catch} (left), which includes some statements without any data dependence (X, Y and Z), and a method call to $f$ that uses $x$ and $y$, two global variables. $f$ may throw an exception, so it has been placed inside a \texttt{try-catch} structure, with a statement in the \texttt{catch} that logs a message when it occurs. Additionally, consider the case that when $f$ exits normally, only $x$ is modified; but when an error occurs, only $y$ is modified.
As can be seen in the CFG shown in Figure~\ref{fig:try-catch} (centre), the nodes ``Normal return'', ``catch'' and ``try'' are considered as pseudo-statements, and their \textit{true} and \textit{false} edges (solid and dashed respectively) are used to create control dependencies.
The statements contained after the function call, inside the \texttt{catch} statement and inside the \texttt{try} statement are respectively controlled by the aforementioned nodes.
Finally, consider the statement \texttt{Z}; which is not dependent on any part of the \texttt{try-catch} statement, as all exceptions that may be thrown are caught: it will execute regardless of the path taken inside the \texttt{try} block.% \carlos{Consider critiquing the result, saying that despite the last sentence, statements can be removed (the catch) so that the dependencies are no longer the same.}
\begin{figure}[h]
\begin{minipage}{0.35\linewidth}
\begin{lstlisting}
try {
X;
f();
Y;
} catch (Exception e) {
System.out.println("error");
}
Z;
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.64\linewidth}
\includegraphics[width=\linewidth]{img/try-catch-example}
\end{minipage}
\caption{A simple program with a method call that could throw an exception (left), its CFG (centre) and its PDG (right).}
\label{fig:try-catch}
\end{figure}
\end{example}
% vim: set noexpandtab:ts=2:sw=2:wrap