A \emph{control flow graph}$G$ of a program $P$ is a directed graph, represented as a tuple $\langle N, E \rangle$, where $N$ is a set of nodes, composed of a method's statements plus two special nodes, ``Start'' and ``End''; and $E$ is a set of edges of the form $e =\left(n_1, n_2\right) | n_1, n_2\in N$. Most algorithms to generate the SDG mandate the ``Start'' node to be the only source and ``End'' to be the only sink in the graph. \carlos{Is it necessary to define source and sink in the context of a graph?}.
Edges are created according to the possible execution paths that exist; each statement is connected to any statement that may immediately follow it. Formally, an edge $e =(n_1, n_2)$ exists if and only if there exists an execution of the program where $n_2$ is executed immediately after $n_1$. In general, expressions are not evaluated; so an \texttt{if} instruction has two outgoing edges even if the condition is always true or false, e.g. \texttt{1 == 0}.
Vertex $b$ is \textit{control dependent} on vertex $a$ ($a \ctrldep b$) if and only if $b$ postdominates one but not all of $a$'s successors. It follows that a vertex with only one successor cannot be the source of control dependence.
Vertex $b$ is \textit{data dependent} on vertex $a$ ($a \datadep b$) if and only if $a$ may define a variable $x$, $b$ may use $x$ and there exists a \carlos{could it be ``an''??}$x$-definition free path from $a$ to $b$.
Data dependency was originally defined as flow dependency, and split into loop and non--loop related dependencies, but that distinction is no longer useful to compute program slices.
It should be noted that variable definitions and uses can be computed for each statement independently, analyzing the procedures called by it if necessary. The variables used and defined by a procedure call are those used and defined by its body.
The \textsl{program dependence graph} (PDG) is a directed graph (and originally a tree) represented by three elements: a set of nodes $N$, a set of control edges $E_c$ and a set of data edges $E_d$.
Both sets of edges are built as follows. There is a control edge between two nodes $n_1$ and $n_2$ if and only if $n_1\ctrldep n_2$, and a data edge between $n_1$ and $n_2$ if and only if $n_1\datadep n_2$. Additionally, if a node $n$ does not have any incoming control edges, it has a ``default'' control edge $e =(\textnormal{Start},n)$; so that ``Start'' is the only source node of the graph.
Note: the most common graphical representation is a tree--like structure based on the control edges, and nodes sorted left to right according to their position on the original program. Data edges do not affect the structure, so that the graph is easily readable.
\end{definition}
Finally, the SDG is built from the combination of all the PDGs that compose the
The \textsl{system dependence graph} (SDG) is a directed graph that represents the control and data dependencies of a whole program. It has three kinds of edges: control, data and function call. The graph is built combining multiple PDGs, with the ``Start'' nodes labeled after the function they begin. There exists one function call edge between each node containing one or more calls and each of the ``Start'' node of the method called. In a programming language where the function call is ambiguous (e.g. with pointers or polymorphism), there exists one edge leading to every possible function called.
\end{definition}
\begin{example}[Creation of a SDG from a simple program]
Given the program shown below (left), the control flow graphs for both methods are shown on the right: \\
Then, control and data dependencies are computed, arranging the nodes in the PDG. Finally, the two graphs are connected with summary edges to create the SDG:
\subsubsection{Function calls and data dependencies}
\carlos{Vocabulary: when is appropriate the use of method, function and procedure????}
In the original definition of the SDG, there was special handling of data dependencies when calling functions, as it was considered that parameters were passed by value, and global variables did not exist. \carlos{Name and cite paper that introduced it} solves this issue by splitting function calls and function into multiple nodes. This proposal solved everything related to parameter passing: by value, by reference, complex variables such as structs or objects and return values.
To such end, the following modifications are made to the different graphs:
\begin{description}
\item[CFG.] In each CFG, global variables read or modified and parameters are added to the label of the ``Start'' node in assignments of the form $par = par_{in}$ for each parameter and $x = x_{in}$ for global variables. Similarly, global variables and parameters modified are added to the label of the ``End'' node as $x_{out}= x$. The parameters are only passed back if the value set by the called method can be read by the callee. Finally, in method calls the same values must be packed and unpacked: each statement containing a function called is relabeled to contain input (of the form $par_{in}=\textnormal{exp}$ for parameters or $x_{in}= x$ for global variables) and output (always of the form $x = x_{out}$).
\item[PDG.] Each node modified in the CFG is split into multiple nodes: the original label is the main node and each assignment is represented as a new node, which is control--dependent on the main one. Visually, input is placed on the left and output on the right; with parameters sorted accordingly.
\item[SDG.] Three kinds of edges are introduced: parameter input (param--in), parameter output (param--out) and summary edges. Parameter input edges are placed between each method call's input node and the corresponding method definition input node. Parameter output edges are placed between each method definition's output node and the corresponding method call output node. Summary edges are placed between the input and output nodes of a method call, according to the dependencies inside the method definition: if there is a path from an input node to an output node, that shows a dependence and a summary method is placed in all method calls between those two nodes.
Note: parameter input and output edges are separated because the traversal algorithm traverses them only sometimes (the output edges are excluded in the first pass and the input edges in the second).
\end{description}
\begin{example}[Variable packing and unpacking]
Let it be a function $f(x, y)$ with two integer parameters, and a call $f(a + b, c)$, with parameters passed by reference if possible. The label of the method call node in the CFG would be ``\texttt{x\_in = a + b, y\_in = c, f(a + b, c), c = y\_out}''; method $f$ would have \texttt{x = x\_in, y = y\_in} in the ``Start'' node and \texttt{y\_out = y} in the ``End'' node. The relevant section of the SDG would be:
loop, at the loop condition, at the method's end, etc.).
An alternative approach is to represent the instruction as an edge, not a vertex, connecting the previous statement with the next to be executed. Both of these approaches fail to generate a control dependence from the unconditional jump, as the definition of control dependence (see definition~\ref{def:ctrl-dep}) requires a vertex to have more than one successor for it to be possible to be a source of control dependence.
From here, there stem two approaches: the first would be to
dependencies, which is the most widely--used solution \cite{BalH93}.
The most popular approach was proposed by Ball and Horwitz~\cite{BalH93}, classifying instructions into three separate categories:
\begin{description}
\item[Statement.] Any instruction that is not a conditional or unconditional jump. It has one outgoing edge in the CFG, to the next instruction that follows it in the program.
\item[Predicate.] Any conditional jump instruction, such as \texttt{while}, \texttt{until}, \texttt{do-while}, \texttt{if}, etc. It has two outgoing edges, labeled \textit{true} and \textit{false}; leading to the corresponding instructions.
\item[Pseudo--predicates.] Unconditional jumps (e.g. \texttt{break}, \texttt{goto}, \texttt{continue}, \texttt{return}); are like predicates, with the difference that the outgoing edge labeled \textit{false} is marked as non--executable, and there is no possible execution where such edge would be possible, according to the definition of the CFG (as seen in definition~\ref{def:cfg}). Originally the edges had a specific reasoning backing them up: the \textit{true} edge leads to the jump's destination and the \textit{false} one, to the instruction that would be executed if the unconditional jump was removed, or converted into a \texttt{no op} (a blank operation that performs no change to the program's state). This specific behavior is used with unconditional jumps, but no longer applies to pseudo--predicates, as more instructions have used this category as means of ``artificially'' \carlos{bad word choice} generating control dependencies.
\end{description}
As a consequence of this classification, every instruction after an unconditional jump $j$ is control--dependent (either directly or indirectly) on $j$ and the structure containing it (a conditional statement or a loop), as can be seen in the following example.
\begin{example}[Control dependencies generated by unconditional instructions]
\label{exa:unconditional}
Figure~\ref{fig:break-graphs} showcases a small program with a \texttt{break} statement, its CFG and PDG with a slice in gray. The slicing criterion (line 5, variable $a$) is control dependent on both the unconditional jump and its surrounding conditional instruction (both on line 4); even though it is not necessary to include it (in the context of weak slicing).
Note: the ``Start'' node $S$ is also categorized as a pseudo--statement, with the \textit{false} edge connected to the ``End'' node, therefore generating a dependence from $S$ to all the nodes inside the method. This removes the need to handle $S$ with a special case when converting a CFG to a PDG, but lowers the explainability of non--executable edges as leading to the ``instruction that would be executed if the node was absent or a no--op''.
The original paper~\cite{BalH93} does prove its completeness, but disproves its correctness by providing a counter--example similar to example~\ref{exa:nested-unconditional}. This proof affects both weak and strong slicing, so improvements can be made on this proposal. The authors postulate that a more correct approach would be achievable if the slice's restriction of being a subset of instructions were lifted.
In the case of nested unconditional jumps where both jump to the same destination, only one of them (the out--most one) is needed. Figure~\ref{fig:nested-unconditional} showcases the problem, with the minimal slice \carlos{have not defined this yet} in gray, and the algorithmically computed slice in light blue. Specifically, lines 3 and 5 are included unnecessarily.
\caption{A program with nested unconditional control flow (left), its CFG (center) and PDG (right).}
\label{fig:nested-unconditional}
\end{figure}
\end{example}
\carlos{Add proposals to fix both problems showcased.}
\section{Exceptions}
Exception handling was first tackled in the context of Java program slicing by Sinha et al. \cite{SinH98}, with later contributions by Allen and Horwitz~\cite{AllH03}. There exist contributions for other programming languages, which will be explored later (chapter~\ref{cha:state-art}) and other small contributions. The following section will explain the treatment of the different elements of exception handling in Java program slicing.
As seen in section~\ref{sec:intro-exception}, exception handling in Java adds
two constructs: \texttt{throw} and \texttt{try-catch}. Structurally, the
first one resembles an unconditional control flow statement carrying a value ---like \texttt{return} statements--- but its destination is not fixed, as it depends on the dynamic typing of the value.
If there is a compatible \texttt{catch} block, execution will continue inside it, otherwise the method exits with the corresponding value as the error.
The same process is repeated in the method that called the current one, until either the call stack is emptied or the exception is successfully caught.
If the exception is not caught at all, the program exits with an error ---except in multi--threaded programs, in which case the corresponding thread is terminated.
The \texttt{try-catch} statement can be compared to a \texttt{switch} which compares types (with \texttt{instanceof}) instead of constants (with \texttt{==} and \texttt{Object\#equals(Object)}). Both structures require special handling to place the proper dependencies, so that slices are complete and as correct as can be.
\subsection{\texttt{throw} statement}
The \texttt{throw} statement compounds two elements in one instruction: an
unconditional jump with a value attached and a switch to an ``exception mode'', in which the statement's execution order is disregarded. The first one has been extensively covered and solved; as it is equivalent to the \texttt{return} instruction, but the second one requires a small addition to the CFG: there must be an alternative control flow, where the path of the exception is shown. For now, without including \texttt{try-catch} structures, any exception thrown will exit its method with an error; so a new ``Error end'' node is needed. The preexisting ``End'' node is renamed ``Normal end'', but now the CFG has two distinct sink nodes; which is forbidden in most slicing algorithms. To solve that problem, a general ``End'' node is created, with both normal and exit ends connected to it; making it the only sink in the graph.
In order to properly accomodate a method's output variables (global variables or parameters passed by reference that have been modified), variable unpacking is
This treatment of \texttt{throw} statements only modifies the structure of the CFG, without altering the other graphs, the traversal algorithm, or the basic definitions for control and data dependencies. That fact makes it easy to incorporate to any existing program slicer that follows the general model described. Example~\ref{exa:throw} showcases the new exit nodes and the treatment of the \texttt{throw} as if it were an unconditional jump whose destination is the ``Error exit''.
\begin{example}[CFG of an uncaught \texttt{throw} statement]
Consider the simple Java method on the right of figure~\ref{fig:throw}; which performs a square root if the number is positive, throwing otherwise a \texttt{RuntimeError}. The CFG in the centre illustrates the treatment of \texttt{throw}, ``normal exit'' and ``error exit'' as pseudo--statements, and the PDG on the right describes the