
% !TEX encoding = UTF-8
% !TEX spellcheck = en_GB
% !TEX root = paper.tex
\chapter{Main explanation?}

\section{First definition of the SDG}
\label{sec:first-def-sdg}

The system dependence graph (SDG) is a program representation used for program slicing, first proposed by Horwitz, Reps and Binkley~\cite{HorwitzRB88}. It builds upon the control flow graph (CFG): dependencies between the vertices of the CFG are computed and then represented in a program dependence graph (PDG). The SDG is then built by assembling the different PDGs (each representing one method of the program) and linking each method call to its corresponding definition. Because each graph is built from the previous one, new constructs can be added to the CFG without altering the algorithm that converts the CFG into the PDG and then into the SDG. The only modifications that may be required are the redefinition of a dependency or the addition of new kinds of dependence.

The language covered by the initial proposal was a simple one, featuring procedures with modifiable parameters and basic instructions, including calls to procedures, variable assignments, arithmetic and logic operators and conditional instructions (branches and loops): the basic features of an imperative programming language. The control flow graph was as simple as the programs themselves, with each graph representing one procedure. The instructions of the program are represented as vertices of the graph and are split into two categories: statements, which have no effect on the control flow (assignments, procedure calls) and predicates, whose execution may lead to one of multiple ---though traditionally two--- paths (conditional instructions). Statements are connected sequentially to the next instruction. Predicates have two outgoing edges, each connected to the first statement that should be executed, according to the result of evaluating the conditional expression in the guard of the predicate.

\begin{definition}[Control Flow Graph~\cite{???}]
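	A \emph{control flow graph} $G$ of a program $P$ is a tuple $\langle N, E \rangle$, where $N$ is a set of vertices, one for each instruction of $P$ plus the ``Start'' and ``End'' vertices, and $E \subseteq N \times N$ is a set of directed edges that represent the possible flow of control between them.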
\end{definition}
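
As an illustration of this definition, the following Python sketch encodes the CFG of procedure \texttt{f} from Figure~\ref{fig:sdg-loop} as a pair of sets. The class name \texttt{CFG}, its methods and the vertex labels are hypothetical, chosen only for this example, and the packing and unpacking of parameters is left out.

```python
from dataclasses import dataclass, field


@dataclass
class CFG:
    """A control flow graph <N, E> (hypothetical helper for this example)."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # (source, target) pairs

    def add_edge(self, source, target):
        self.nodes.update((source, target))
        self.edges.add((source, target))

    def successors(self, node):
        return sorted(t for s, t in self.edges if s == node)

    def is_predicate(self, node):
        # Predicates are the vertices with more than one outgoing edge.
        return len(self.successors(node)) > 1


# CFG of: proc f(x, y) { while (x > y) { x = x - 1; } print(x); }
cfg_f = CFG()
cfg_f.add_edge("Start", "while (x > y)")
cfg_f.add_edge("while (x > y)", "x = x - 1")   # guard is true
cfg_f.add_edge("while (x > y)", "print(x)")    # guard is false
cfg_f.add_edge("x = x - 1", "while (x > y)")   # back to the guard
cfg_f.add_edge("print(x)", "End")
```

Only the loop guard has two outgoing edges, so it is the only predicate of this graph; every other vertex is a statement.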

To build the PDG and then the SDG, some dependencies must be extracted from the CFG, which are defined as follows:

\begin{definition}[Postdominance]
	Vertex $b$ \textit{postdominates} vertex $a$ if and only if $a \neq b$ and $b$ is on every path from $a$ to the ``End'' vertex.
\end{definition}
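
Postdominators can be computed with a simple fixed-point iteration. The sketch below is one possible way of doing so, not necessarily the one used in the cited works. It assumes that the CFG is given as a dictionary from each vertex to the list of its successors and that every vertex can reach ``End''. The function name \texttt{postdominators} and the vertex labels are hypothetical.

```python
def postdominators(succ, end="End"):
    """Map each vertex a to the set of vertices b that postdominate a."""
    nodes = set(succ)
    # Start from the largest candidate sets and shrink them until stable.
    pdom = {n: set(nodes) for n in nodes}
    pdom[end] = {end}
    changed = True
    while changed:
        changed = False
        for n in nodes - {end}:
            # b is on every path from n to End if b is n itself or b is on
            # every path from each successor of n to End.
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    # The definition requires a != b, so each vertex is removed from its set.
    return {n: pdom[n] - {n} for n in nodes}


# CFG of procedure f: while (x > y) { x = x - 1; } print(x);
succ_f = {
    "Start": ["while (x > y)"],
    "while (x > y)": ["x = x - 1", "print(x)"],
    "x = x - 1": ["while (x > y)"],
    "print(x)": ["End"],
    "End": [],
}
pdom_f = postdominators(succ_f)
```

In this example the loop body does not postdominate the loop guard, because the path that leaves the loop reaches ``End'' without executing it.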

\begin{definition}[Control dependency]
	\label{def:ctrl-dep}
	Vertex $b$ is \textit{control dependent} on vertex $a$ ($a \ctrldep b$) if and only if $b$ postdominates one but not all of $a$'s successors. It follows that a vertex with only one successor cannot be the source of control dependence.
\end{definition}
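
The definition can be turned into code almost literally. The following sketch makes two assumptions that are not stated above: a vertex counts as postdominating itself when its successors are checked (otherwise the first instruction of a branch would never depend on its predicate), and the CFG contains an extra edge from ``Start'' to ``End'', so that the top-level instructions become control dependent on ``Start''. All function names and vertex labels are hypothetical.

```python
def postdominators(succ, end="End"):
    """Reflexive postdominator sets, computed by fixed-point iteration."""
    pdom = {n: set(succ) for n in succ}
    pdom[end] = {end}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == end:
                continue
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n], changed = new, True
    return pdom


def control_dependence(succ, end="End"):
    """Return the pairs (a, b) such that b is control dependent on a."""
    pdom = postdominators(succ, end)
    deps = set()
    for a, successors in succ.items():
        if len(successors) < 2:
            continue  # a single successor cannot create a dependence
        for b in succ:
            covered = [s for s in successors if b in pdom[s]]
            # b postdominates one but not all of the successors of a
            if 0 < len(covered) < len(successors):
                deps.add((a, b))
    return deps


succ_f = {
    "Start": ["while (x > y)", "End"],  # Start -> End is an assumption
    "while (x > y)": ["x = x - 1", "print(x)"],
    "x = x - 1": ["while (x > y)"],
    "print(x)": ["End"],
    "End": [],
}
ctrl_f = control_dependence(succ_f)
```

For procedure \texttt{f}, the loop guard and \texttt{print(x)} depend on ``Start'', while the loop body depends on the guard. The guard also depends on itself, which is one of the self-edges that may be omitted from the drawings.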

\begin{definition}[Data dependency]
	Vertex $b$ is \textit{data dependent} on vertex $a$ ($a \datadep b$) if and only if $a$ may define a variable $x$, $b$ may use $x$ and there is an $x$-definition-free path from $a$ to $b$.\footnote{The initial definition of data dependency was further split into in-loop data dependencies and the rest, but the difference is not relevant for computing the slices in the SDG.}
\end{definition}
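
A direct, if not the most efficient, way to apply this definition is to search the CFG forward from each definition and stop at the vertices that redefine the variable. The sketch below follows that idea. The function name \texttt{data\_dependence}, the dictionaries \texttt{defs} and \texttt{uses}, and the choice of treating ``Start'' as the definition of the parameters are assumptions made for the example.

```python
def data_dependence(succ, defs, uses):
    """Return triples (a, b, x): b is data dependent on a through x."""
    deps = set()
    for a in succ:
        for x in defs.get(a, ()):
            stack, seen = list(succ[a]), set()
            while stack:
                n = stack.pop()
                if n in seen:
                    continue
                seen.add(n)
                if x in uses.get(n, ()):
                    deps.add((a, n, x))
                # A vertex that redefines x ends every x-definition-free
                # path that goes through it.
                if x not in defs.get(n, ()):
                    stack.extend(succ[n])
    return deps


succ_f = {
    "Start": ["while (x > y)"],
    "while (x > y)": ["x = x - 1", "print(x)"],
    "x = x - 1": ["while (x > y)"],
    "print(x)": ["End"],
    "End": [],
}
defs_f = {"Start": {"x", "y"}, "x = x - 1": {"x"}}
uses_f = {
    "while (x > y)": {"x", "y"},
    "x = x - 1": {"x"},
    "print(x)": {"x"},
}
data_f = data_dependence(succ_f, defs_f, uses_f)
```

The result contains the self-dependence of \texttt{x = x - 1}, which uses the value it defined in the previous iteration of the loop.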

It should be noted that variable definitions and uses can be computed for each statement independently, analyzing the procedures called by it if necessary. In general, any instruction uses all variables that appear in it, save for the left-hand side of assignments. Similarly, no instruction defines variables, except those in the left-hand side of assignments. The variables used and defined by a procedure call are those used and defined by its body.
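
The rule for definitions and uses can be sketched with a small syntactic analysis. The example below borrows Python's own parser as a stand-in for the toy language, which is an assumption that only works because the instructions involved happen to be valid Python. The function name \texttt{defs\_and\_uses} is hypothetical, and procedure calls are not analysed: the names of called procedures are simply ignored.

```python
import ast


def defs_and_uses(instruction):
    """Return (defined, used) variable names of a single instruction."""
    tree = ast.parse(instruction)
    # Names in call position (e.g. print) are procedures, not variables.
    callees = {id(node.func) for node in ast.walk(tree)
               if isinstance(node, ast.Call)}
    defined, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and id(node) not in callees:
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)  # left-hand side of an assignment
            else:
                used.add(node.id)     # any other appearance
    return defined, used


assignment = defs_and_uses("x = x - 1")
guard = defs_and_uses("x > y")
output = defs_and_uses("print(x)")
```

A complete implementation would also have to add, for each call, the variables used and defined by the body of the called procedure.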

With the data and control dependencies, the PDG may be built by replacing the edges of the CFG with data and control dependence edges. The former tend to be represented as thin solid lines, and the latter as thick solid lines. In the examples, data dependencies will be thin solid red lines.
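
Once the PDG is available, a slice can be obtained by traversing its edges backwards from the slicing criterion. The sketch below illustrates this on a simplified version of \texttt{main} in which the call to \texttt{f} has been removed, so that a single PDG suffices. The edge list is written by hand, and the function name \texttt{backward\_slice} is hypothetical.

```python
def backward_slice(pdg_edges, criterion):
    """Collect every vertex from which the criterion can be reached."""
    result, worklist = {criterion}, [criterion]
    while worklist:
        node = worklist.pop()
        for source, target, _kind in pdg_edges:
            if target == node and source not in result:
                result.add(source)
                worklist.append(source)
    return result


# PDG of: a = 10; b = 20; print(a);
pdg_main = {
    ("Start", "a = 10", "control"),
    ("Start", "b = 20", "control"),
    ("Start", "print(a)", "control"),
    ("a = 10", "print(a)", "data"),
}
slice_a = backward_slice(pdg_main, "print(a)")
```

The assignment to \texttt{b} is not reachable backwards from \texttt{print(a)}, so it is left out of the slice.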

The organization of the vertices of the PDG tends to resemble a tree, with the ``Start'' node in the position of the root (at the top), and the ``End'' node typically omitted. The control dependence edges structure the tree vertically. If a vertex is control dependent on multiple vertices, it is placed one level below the lowest source of control dependency. With a programming language this simple, cyclical control dependencies do not appear; should they appear in later sections, the instructions are sorted top to bottom in the order in which they appear in the program. Horizontally, the vertices are sorted by their order in the program, left to right, in order to make the graph more readable. Data dependency edges are placed without reordering the nodes of the graph. In the examples given, edges like $a \datadep a$ or $b \ctrldep b$ may be omitted, as they are not relevant for later use of the graph. Note that the location of the vertices is irrelevant for the slicing algorithm; the aforementioned sorting rules are followed only for consistency with previous papers on the topic and to ease the visualization of programs.

Finally, the SDG is built from the combination of all the PDGs that compose the program. Each call vertex is connected to the ``Start'' of the corresponding procedure. All edges that connect PDGs are represented with dashed lines.

\begin{figure}
\begin{minipage}{0.3\linewidth}
	\begin{lstlisting}
	proc main() {
		a = 10;
		b = 20;
		f(a, b);
		print(a);
	}

	proc f(x, y) {
		while (x > y) {
			x = x - 1;
		}
		print(x);
	}
	\end{lstlisting}
\end{minipage}
\begin{minipage}{0.6\linewidth}
	\includegraphics[width=0.3\linewidth]{img/cfgsimple}
	\includegraphics[width=0.65\linewidth]{img/cfgsimple2}
\end{minipage}
\includegraphics[width=0.5\linewidth]{img/pdgsimple}
\includegraphics[width=0.49\linewidth]{img/pdgsimple2}
\includegraphics[width=0.6\linewidth]{img/sdgsimple}
\includegraphics[width=0.4\linewidth]{img/legendsimple}
\caption{A simple program with its CFGs (top right), PDGs (center) and SDG (bottom).}
\label{fig:sdg-loop}
\end{figure}

\subsection{Procedures and data dependencies}

The only thing left to explain before introducing more constructs into the language is the passing of parameters. Most programming languages accept a variable number of input parameters and one output parameter. In the case of input parameters passed by reference, or constructs such as structs or classes, modifying a field of a parameter may modify the original variable. In order to deal with everything related to parameter passing, including global variables, class fields, etc., a small extension must be made to the CFG and the PDG.

In the CFG, the ``Start'' and ``End'' nodes contain a list of assignments, inputting and outputting respectively the appropriate values, as can be seen in the example. Consequently, every vertex that contains a procedure or function call packs and unpacks the arguments. For every variable $x$ that is used in a procedure, every call to it must be preceded by $x_{in} = x$, and the procedure's ``Start'' vertex must contain $x = x_{in}$. The opposite happens when a variable must be ``outputted''\carlos{replace}: before the ``End'' node, the value must be packed ($x_{out} = x$), and after each call, the value must be assigned to the corresponding variable ($x = x_{out}$). Parameters may be assigned as $par^i_{in} = expr_i$ (where $i$ is the index of the parameter in the procedure definition, $par^i$ is the name of the parameter and $expr_i$ is the expression in the $i^{th}$ position in the procedure call) in the call vertex, and parameters whose modifications inside the procedure are passed back to the calling procedure must be extracted as $var = par^i_{out}$ (where $var$ is the name of the variable, passed by reference, in the calling procedure).\carlos{What if object/struct passed by value?} As an addition, in the SDG, an extra kind of edge is added (the \emph{summary edge}), which represents the dependencies that the outputs of a call have on its inputs. This allows the slicing algorithm to know these dependencies without traversing the called procedure.
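
The packing and unpacking scheme for variables can be sketched as follows. The two function names are hypothetical, and so is the example procedure, which is assumed to use the global variables \texttt{g} and \texttt{h} and to define \texttt{g}. Parameters would be handled analogously, with the argument expressions on the right-hand side of the input assignments.

```python
def call_site_assignments(used, defined):
    """Assignments placed before and after a call to the procedure."""
    before = [f"{x}_in = {x}" for x in sorted(used)]
    after = [f"{x} = {x}_out" for x in sorted(defined)]
    return before, after


def boundary_assignments(used, defined):
    """Assignments placed in the Start and End vertices of the procedure."""
    start = [f"{x} = {x}_in" for x in sorted(used)]
    end = [f"{x}_out = {x}" for x in sorted(defined)]
    return start, end


# Hypothetical procedure that uses globals g and h, and defines g.
before, after = call_site_assignments({"g", "h"}, {"g"})
start, end = boundary_assignments({"g", "h"}, {"g"})
```

Each input assignment at the call site has a matching assignment in ``Start'', and each output assignment in ``End'' has a matching assignment after the call; these pairs are the ones that are later connected in the SDG.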

All these assignments are added as extra lines in the ``Start'', ``End'' and call vertices.
When building the PDG, each of these assignments is split into its own vertex, which is control dependent on the vertex it was split from.
Data dependencies no longer flow through the call vertex, but through the appropriate child, which minimizes the size of the slice produced.
As an example, Figure~\ref{fig:sdg-loop} shows the three stages of a program, from CFG to SDG.
The construction of the CFG is straightforward, save for the packing and unpacking of variables in the start, end and call vertices.
In the PDG, the statements are split, and control and data dependencies replace the control flow edges.
Finally, both PDGs are linked via call and parameter (input and output) edges, forming the SDG.
Summary edges are placed according to the data and control flow of the method call, and the graph is complete.
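
A simplified way to place the summary edges of a call is to check, inside the PDG of the called procedure, which input vertices reach which output vertices. The sketch below does this for the call \texttt{f(a, b)} of Figure~\ref{fig:sdg-loop}. It assumes that the called procedure contains no further calls (nested or recursive calls need a more elaborate computation), and all function names, vertex labels and the hand-written edge list are hypothetical.

```python
def reachable_from(edges, source):
    """Vertices reachable from source by following dependence edges."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def summary_edges(callee_edges, formal_ins, formal_outs, actual_of):
    """Edges from the input vertices of a call to its output vertices."""
    result = set()
    for formal_in in formal_ins:
        reached = reachable_from(callee_edges, formal_in)
        for formal_out in formal_outs:
            if formal_out in reached:
                result.add((actual_of[formal_in], actual_of[formal_out]))
    return result


# Control and data dependence edges inside f (self-edges omitted).
pdg_f = {
    ("x = x_in", "while (x > y)"), ("y = y_in", "while (x > y)"),
    ("x = x_in", "x = x - 1"), ("while (x > y)", "x = x - 1"),
    ("x = x - 1", "while (x > y)"),
    ("x = x_in", "print(x)"), ("x = x - 1", "print(x)"),
    ("x = x_in", "x_out = x"), ("x = x - 1", "x_out = x"),
}
# Vertices of the call f(a, b) that correspond to each vertex of f.
actual_of = {
    "x = x_in": "x_in = a",
    "y = y_in": "y_in = b",
    "x_out = x": "a = x_out",
}
summary = summary_edges(pdg_f, ["x = x_in", "y = y_in"], ["x_out = x"],
                        actual_of)
```

The output of the call depends on both inputs: on \texttt{a} through the data flow of \texttt{x}, and on \texttt{b} because the loop guard controls how often \texttt{x} is decremented.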

\section{Unconditional control flow}

Even though the initial definition of the SDG was useful to compute slices, the language it covered lacked a feature found (in one form or another) in the typical languages of the 1980s: unconditional control flow.
Therefore, one of the first additions contributed to the algorithm to build system dependence graphs was the inclusion of unconditional jumps, such as ``break'', ``continue'', ``goto'' and ``return'' statements (or any other equivalent).
A naive representation would be to treat them the same as any other statement, but with the outgoing edge landing in the corresponding instruction (outside the loop, at the loop condition, at the method's end, etc.).
An alternative approach is to represent the instruction as an edge, not a vertex, connecting the previous statement with the next to be executed.
Both of these approaches fail to generate a control dependence from the unconditional jump, as the definition of control dependence (see Definition~\ref{def:ctrl-dep}) requires a vertex to have more than one successor for it to be possible to be a source of control dependence.
From here, two approaches stem: the first is to redefine control dependency in order to reflect the real effect of these instructions, as some authors~\cite{DanBHHKL11} have tried to do; the second is to alter the creation of the SDG to ``create'' those dependencies, which is the most widely used solution.

The most popular approach was proposed by Ball and Horwitz~\cite{BalH93}, and represents unconditional jumps as \textsl{pseudo-predicates}.
The true edge leads to the next instruction to be executed, and the false edge is a non-executable or \textit{dummy} edge, connected to the instruction that would be executed were the unconditional jump a \textit{nop}.
The consequence of this solution is that every instruction placed after the unconditional jump is control dependent on the jump, as can be seen in Figure~\ref{fig:break-graphs}.
In the example, when slicing with respect to variable $a$ on line 5, every statement would be included, save for ``print(a)''.
Line 4 is not strictly necessary in this example (in the context of weak slicing), but it is included nonetheless.
In the original paper, the transformation is proved to be complete, but not correct, as for some examples the slice includes more unconditional jumps than would be strictly necessary, even for weak slicing.
Ball and Horwitz theorize that a more correct approach would be possible, if it weren't for the limitation of slices to be a subset of statements of the program, in the same order as in the original.
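
The effect of the dummy edge can be checked on the CFG of Figure~\ref{fig:break-graphs}. The sketch below computes control dependence with and without the false edge of the \texttt{break}. It assumes that a vertex counts as postdominating itself when successors are checked and that the CFG has an extra edge from ``Start'' to ``End''; neither assumption comes from the cited paper. Function names and vertex labels are hypothetical.

```python
def postdominators(succ, end="End"):
    """Reflexive postdominator sets, computed by fixed-point iteration."""
    pdom = {n: set(succ) for n in succ}
    pdom[end] = {end}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == end:
                continue
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n], changed = new, True
    return pdom


def control_dependence(succ, end="End"):
    """Return the pairs (a, b) such that b is control dependent on a."""
    pdom = postdominators(succ, end)
    deps = set()
    for a, successors in succ.items():
        if len(successors) < 2:
            continue
        for b in succ:
            covered = [s for s in successors if b in pdom[s]]
            if 0 < len(covered) < len(successors):
                deps.add((a, b))
    return deps


def build_cfg(with_dummy_edge):
    succ = {
        "Start": ["a = 1", "End"],  # Start -> End is an assumption
        "a = 1": ["while (a > 0)"],
        "while (a > 0)": ["if (a > 10)", "println(a)"],
        "if (a > 10)": ["break", "a++"],
        "break": ["println(a)"],    # true edge: the real jump
        "a++": ["while (a > 0)"],
        "println(a)": ["End"],
        "End": [],
    }
    if with_dummy_edge:
        succ["break"].append("a++")  # false edge: as if break were a nop
    return succ


plain = control_dependence(build_cfg(False))
pseudo = control_dependence(build_cfg(True))
```

Without the dummy edge the \texttt{break} is the source of no control dependence. With it, \texttt{a++} and the loop guard become control dependent on the \texttt{break}, and the remaining dependences are unchanged.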

\begin{figure}
\centering
\begin{minipage}{0.3\linewidth}
	\begin{lstlisting}
static void f() {
	int a = 1;
	while (a > 0) {
		if (a > 10) break;
		a++;
	}
	System.out.println(a);
}
	\end{lstlisting}
\end{minipage}
\begin{minipage}{0.6\linewidth}
	\includegraphics[width=0.4\linewidth]{img/breakcfg}
	\includegraphics[width=0.59\linewidth]{img/breakpdg}
\end{minipage}
\caption{A program with unconditional control flow, its CFG (center) and PDG (right).}
\label{fig:break-graphs}
\end{figure}

\section{Exceptions}

As seen in Section~\ref{sec:intro-exception}, exception handling in Java adds two constructs: the \texttt{throw} and the \texttt{try-catch} statements. The first one resembles an unconditional control flow statement whose destination is unknown at compile time. The exception will be caught by a \texttt{catch} of the corresponding type or a supertype, if one exists. Otherwise, it will terminate the corresponding thread (or, in single-threaded programs, stop the Java Virtual Machine).

\subsection{\texttt{throw} statement}

The \texttt{throw} statement is represented as a ``return'', but instead of leaving the method through the ``End'' node, a new ``Error end'' or ``Error exit'' is created. This represents the

\subsection{\texttt{try-catch} statement}

% vim: set noexpandtab:ts=2:sw=2:wrap