% !TEX encoding = UTF-8
% !TEX spellcheck = en_US
% !TEX root = ../paper.tex

\chapter{Background}
\label{cha:background}

\section{Program slicing}
\textsl{Program slicing} \cite{Wei81,Sil12} is a debugging technique that
answers the question: ``which parts of a program \added{do} affect a given statement and
variable\added{s}?'' The statement and the variable\added{s} are the basic input to create a slice
and are called the \textsl{slicing criterion}. The criterion can be more
complex, as different slicing techniques may require additional pieces of input.
The \textsl{slice} of a program is the list of statements from the original
program (which itself constitutes a valid program) whose execution results in
the same values for the variable\added{s} (selected in the slicing criterion)\deleted{ being read
by a debugger in the selected statement}.
There are two fundamental dimensions along which the slicing problem can be
posed \added{\cite{vocabulary}}:
\begin{itemize}
	\item \textsl{Static} or \textsl{dynamic}: slicing can be performed
		statically or dynamically.
		\textsl{Static slicing} \cite{Wei81} \added{produces slices that}\deleted{is a slice which} consider\deleted{s} all
		possible executions of the program, only taking into account the
		semantics of the programming language.
		In contrast, \textsl{dynamic slicing} \cite{KorL88} \added{considers a single execution of the program, thus limiting} \deleted{limits} the slice to
		the statements present in an execution log. The slicing criterion is
		expanded to include a position in the log that corresponds to one
		instance of the selected statement, making it much more specific. It may
		help find a bug related to nondeterministic behavior (such as a random
		or pseudo-random number generator), but the slice must be recomputed for
		each execution being analyzed.
	\item \textsl{Backward} or \textsl{forward}: \textsl{backward slicing}
		\cite{Wei81} is the more commonly used, because it looks for the statements
		that affect the slicing criterion. In contrast, \textsl{forward slicing}
		\cite{BerC85} computes the statements that are affected by the slicing
		criterion. There also exists a mixed approach called \textsl{chopping}
		\cite{JacR94}, which is used to find all statements that affect \added{some variables in the slicing criterion and, at the same time, are affected by some other variables in} \deleted{or are
		affected by} the slicing criterion.
\end{itemize}

Since program slicing was first defined, the most widespread form of slicing has
been \textsl{static backward slicing}, which obtains the list of statements that
affect the value of a variable in a given statement, in all possible executions
of the program (i.e., for any input data).
\begin{definition}[Strong static backward slice \cite{Wei81,HorwitzRB88}]
	\label{def:strong-slice}
	\carlos{The exact correct citation still needs to be checked.}
	Given a program $P$ and a slicing criterion $C = \langle s,v \rangle$, where
	$s$ is a statement and $v$ is a set of variables in $P$ (the variables may
	or may not be used in $s$), $S$ is the \textsl{strong slice} of $P$ with
	respect to $C$ if $S$ has the following properties:
	\begin{enumerate}
		\item $S$ is an executable program.
		\item $S \subseteq P$, that is, $S$ is the result of removing code from $P$.
		\item For any input $I$, the values produced on each execution of $s$
			for each of the variables in $v$ are the same when executing $S$ as
			when executing $P$. \label{enum:exact-output}
	\end{enumerate}
\end{definition}

\begin{definition}[Weak static backward slice \cite{RepY89}]
	\label{def:weak-slice}
	\carlos{Check the citation and write this formally.}
	Given a program $P$ and a slicing criterion $C = \langle s,v \rangle$, where
	$s$ is a statement and $v$ is a set of variables in $P$ (the variables may
	or may not be used in $s$), $S$ is the \textsl{weak slice} of $P$ with
	respect to $C$ if $S$ has the following properties:
	\begin{enumerate}
		\item $S$ is an executable program.
		\item $S \subseteq P$, that is, $S$ is the result of removing code from $P$.
		\item For any input $I$, the sequence of values produced on each execution of $s$
			for each of the variables in $v$ when executing $P$ is a prefix of
			the sequence produced while executing $S$. That is, the slice
			may continue producing values, but the first values it produces always
			match \added{all those produced by} the original program.
	\end{enumerate}
\end{definition}

Both definitions (\ref{def:strong-slice} and~\ref{def:weak-slice}) are
used throughout the literature \added{(see, e.g., \cite{pending})}, with some works favoring the first and some the
second. Though the definitions come from the corresponding citations, the naming
was first used in a control dependency analysis by Danicic et al.~\cite{DanBHHKL11},
where slices \added{that}\deleted{which} produce the same output as the original are named
\textsl{strong}, and those where the output of the original is a prefix of that of the slice,
\textsl{weak}. \carlos{It could be argued that the weak slice is enough for
debugging, since if an error shows up in the original, it will also appear in the sliced program.} \josep{Indeed. Add a paragraph right after this explaining that fact, because that is how you justify the existence of both. A reader who does not know slicing is wondering right now what the weak one is for :-)}

\begin{example}[Strong, weak and incorrect slices]
	Table~\ref{tab:slice-weak} shows examples of the various
	definitions. Each row shows the values produced by the execution of a
	program or one of its slices. The first row is the original program, which computes
	$3!$. Slice A produces exactly the same values and is therefore a
	strong slice. Slice B \added{correctly produces the same values as the original program}\deleted{is correct} but \added{it} continues producing values after the
	original stops, so it is a weak slice: it fits the relaxed definition but not
	the strong one. Slice C is incorrect, as its values differ from the original:
	some data or control dependency has not been included in the slice, and the
	program behaves differently.
\end{example}

\begin{table}
	\centering
	\begin{tabular}{r | r | r | r | r | r }
		Iteration & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4} & \textbf{5} \\ \hline
		Original & 1 & 2 & 6 & - & - \\ \hline
		Slice A & 1 & 2 & 6 & - & - \\ \hline
		Slice B & 1 & 2 & 6 & 24 & 120 \\ \hline
		Slice C & 1 & 1 & 3 & 5 & 8 \\
	\end{tabular}
	\caption{Execution logs of different slices and their original program.}
	\label{tab:slice-weak}
\end{table}
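
The comparison behind table~\ref{tab:slice-weak} can be sketched in code. The
following Java fragment is only an illustration: the class \texttt{SliceCheck}
and its method \texttt{classify} are hypothetical names, not part of any slicing
tool. It compares the values logged at the slicing criterion during a single
execution of the original program and of a candidate slice. Since both
definitions quantify over every input, such a check can show that a slice is
incorrect, but a \texttt{STRONG} or \texttt{WEAK} answer only holds for the
execution that was observed.

\begin{lstlisting}[language=Java]
import java.util.List;

// Hypothetical helper: compares the values logged at the slicing criterion.
class SliceCheck {
    enum Kind { STRONG, WEAK, INCORRECT }

    static Kind classify(List<Integer> original, List<Integer> slice) {
        // The slice must reproduce every value of the original, in order.
        if (slice.size() < original.size()
                || !slice.subList(0, original.size()).equals(original)) {
            return Kind.INCORRECT;
        }
        // Same length: identical logs. Longer: the original is a strict prefix.
        return slice.size() == original.size() ? Kind.STRONG : Kind.WEAK;
    }

    public static void main(String[] args) {
        List<Integer> original = List.of(1, 2, 6);
        System.out.println(classify(original, List.of(1, 2, 6)));          // Slice A
        System.out.println(classify(original, List.of(1, 2, 6, 24, 120))); // Slice B
        System.out.println(classify(original, List.of(1, 1, 3, 5, 8)));    // Slice C
    }
}
\end{lstlisting}

For the three logs of the table, the sketch prints \texttt{STRONG},
\texttt{WEAK} and \texttt{INCORRECT}, in that order.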

Program slicing is a language-agnostic technique, but the original proposal by
Weiser~\cite{Wei81} \added{covered}\deleted{covers} a simple imperative programming language.
Since \added{then}, the literature has been expanded by dozens of authors, who have
described and implemented slicing for more complex structures, such as
unstructured control flow~\cite{HorwitzRB88}, global variables~\cite{???}, and
exception handling~\cite{AllH03}; and for other programming paradigms, such as
object-oriented languages~\cite{???} or functional languages~\cite{???}.
\carlos{More could be added; the corresponding citations are missing.}

\subsection{The System Dependence Graph (SDG)}

There exist multiple approaches to compute a slice from a given program and
\added{slicing} criterion, but the most efficient and broadly use\added{d} data structure is the System
Dependence Graph (SDG), first introduced by Horwitz, Reps and
Binkley~\cite{HorwitzRB88}. It is computed from the program's statements;
once it is built, a slicing criterion is chosen, the graph is traversed using a specific
algorithm, and the slice is obtained. Its efficiency resides in the fact that, for
multiple slices of the same program, the graph needs to be built only once.
Moreover, building the graph has a complexity of $\mathcal{O}(n^2)$ with
respect to the number of statements in a program, whereas the traversal is linear
with respect to the number of nodes in the graph (each corresponding to a
statement).

The SDG is a directed graph, and as such it has vertices or nodes, each
representing an instruction in the program ---barring some auxiliary nodes
introduced by some approaches--- and directed edges, which represent the
dependencies among nodes. Those edges represent various kinds of dependencies
---control, data, calls, parameter passing, summary--- which will be defined in
section~\ref{sec:first-def-sdg}.

To create the SDG, first a \textsl{control flow graph} \added{(CFG)} is built for each method
in the program, then its control and data dependencies are computed, resulting
in the \textsl{program dependence graph} \added{(PDG)}. Finally, all the graphs from every
method are joined into the SDG. This process will be explained in greater
detail in section~\ref{sec:first-def-sdg}.
%TODO: marked for removal --- this process is repeated later in ref{sec:first-deg-sdg}
%\begin{description}
	%\item[CFG] The control flow graph is the representation of the control
		%dependencies in a method of a program. Every statement has an edge from
		%itself to every statement that can immediately follow. This means that
		%most will only have one outgoing edge, and conditional jumps and loops
		%will have two. The graph starts in a ``Begin'' or ``Start'' node, and
		%ends in an ``End'' node, to which the last statement and all return
		%statements are connected. It is created directly from the source code,
		%without any need for data dependency analysis.
	%\item[PDG] The program dependence graph is the result of restructuring and
		%adding data dependencies to a CFG. All statements are placed below and
		%connected to a ``Begin'' node, except those which are inside a loop or
		%conditional block. Then data dependencies are added (red or dashed
		%edges), adding an edge between two nodes if there is a data dependency.
	%\item[SDG] Finally, the system dependence graph is the interconnection of
		%each method's PDG. When a call is made, the input arguments are passed
		%to subnodes of the call, and the result is obtained in another subnode.
		%There is an edge from the call to the beginning of the corresponding
		%method, and an extra type of edge exists: \textsl{summary edges}, which
		%summarize the data dependencies between input and output variables.
%\end{description}
An example is provided in figure~\ref{fig:basic-graphs}, where a simple
multiplication program is converted to CFG, then PDG and finally SDG. For
simplicity, only the CFG and PDG of \texttt{multiply} are shown \josep{actually, the SDG is there too}. Control
dependencies are black, data dependencies red\added{,} and summary edges blue.

\begin{figure}
	\centering
	\begin{minipage}{0.4\linewidth}
	\begin{lstlisting}
	int multiply(int x, int y) {
		int result = 0;
		while (x > 0) {
			result += y;
			x--;
		}
		System.out.println(result);
		return result;
	}
	\end{lstlisting}
	\end{minipage}
	\begin{minipage}{0.59\linewidth}
	\includegraphics[width=\linewidth]{img/multiplycfg}
	\end{minipage}
	\includegraphics[width=\linewidth]{img/multiplypdg}
	\includegraphics[width=\linewidth]{img/multiplysdg}
	\caption{A simple multiplication program, its CFG, PDG and SDG}
	\label{fig:basic-graphs}
\end{figure}
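
The traversal step can be illustrated on the \texttt{multiply} method of
figure~\ref{fig:basic-graphs}. The following Java sketch is a simplification
under several assumptions: the graph is written by hand instead of being
computed from a CFG, the parameters \texttt{x} and \texttt{y} are modelled as
being defined by the enter node, the numbering of the nodes and the class
\texttt{BackwardSlicer} are hypothetical, and only one method is considered, so
no call, parameter or summary edges are involved. Under those assumptions, a
backward slice is the set of nodes from which the criterion can be reached, and
it can be computed with a worklist that visits each node and each edge at most
once.

\begin{lstlisting}[language=Java]
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, intraprocedural sketch of a backward traversal.
class BackwardSlicer {
    // incoming.get(n) lists the nodes that n depends on (control or data).
    private final List<List<Integer>> incoming = new ArrayList<>();

    BackwardSlicer(int nodes) {
        for (int i = 0; i < nodes; i++) incoming.add(new ArrayList<>());
    }

    void addDependence(int from, int to) { incoming.get(to).add(from); }

    // Worklist traversal: every node enters the worklist at most once.
    Set<Integer> slice(int criterion) {
        Set<Integer> visited = new TreeSet<>();
        Deque<Integer> worklist = new ArrayDeque<>();
        visited.add(criterion);
        worklist.push(criterion);
        while (!worklist.isEmpty()) {
            for (int source : incoming.get(worklist.pop())) {
                if (visited.add(source)) worklist.push(source);
            }
        }
        return visited;
    }

    // Hand-written graph (an assumption of this sketch). Nodes:
    // 0 enter, 1 result = 0, 2 while (x > 0), 3 result += y,
    // 4 x--, 5 println(result), 6 return result
    static BackwardSlicer multiplyGraph() {
        BackwardSlicer g = new BackwardSlicer(7);
        int[][] control = {{0, 1}, {0, 2}, {0, 5}, {0, 6}, {2, 3}, {2, 4}};
        int[][] data = {{0, 2}, {0, 3}, {0, 4}, {1, 3}, {1, 5}, {1, 6},
                        {3, 3}, {3, 5}, {3, 6}, {4, 2}, {4, 4}};
        for (int[] e : control) g.addDependence(e[0], e[1]);
        for (int[] e : data) g.addDependence(e[0], e[1]);
        return g;
    }

    public static void main(String[] args) {
        BackwardSlicer g = multiplyGraph();
        System.out.println(g.slice(2)); // criterion: the loop condition
        System.out.println(g.slice(5)); // criterion: the println statement
    }
}
\end{lstlisting}

With the criterion on the loop condition (node 2) the sketch prints
\texttt{[0, 2, 4]}: the statements that only compute \texttt{result} are left
out. With the criterion on the \texttt{println} statement (node 5) it prints
\texttt{[0, 1, 2, 3, 4, 5]}, which leaves out only the \texttt{return}
statement.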

\subsection{Metrics}

There are four relevant metrics considered when evaluating a slicing algorithm:

\begin{description}
	\item[Completeness.] The solution includes all the statements that affect
		the \added{slicing criterion}\deleted{slice}. This is the most important feature, and almost all
		publications achieve at least completeness. Trivial completeness is
		easily achievable, as simple as including the whole program in the
		slice.
	\item[Correctness.] The solution excludes all statements that \added{do not}\deleted{don't} affect
		the \added{slicing criterion}\deleted{slice}. Most solutions are complete, but the degree of correctness is
		what sets them apart, as smaller slices will not execute unnecessary
		code to compute the values, decreasing the execution time.
	\item[Features covered.] Which features or language a slicing algorithm
		covers. Different approaches to slicing cover different programming
		languages and even paradigms. There are slicing techniques (published or
		commercially available) for most popular programming languages, from C++
		to Erlang. Some slicing techniques only cover a subset of the targeted
		language, and as such are less useful for commercial applications, but
		can be a stepping stone in the betterment of the field.
	\item[Speed.] Speed of graph generation and slice creation. As previously
		stated, slicing is a two-step process: build\added{ing} a graph and travers\deleted{e}\added{ing} it.
		The traversal is linear in most proposals, with small variations. Graph
		generation tends to take longer and to vary more, but it is not as
		relevant, because it is only done once (per program being analyzed). As
		such, this is the least important metric. Only proposals that deviate
		from the aforementioned schema show a wider variation in speed.
\end{description}

\subsection{Program slicing as a debugging technique}

Program slicing is first and foremost a debugging technique, with each
variation serving a different purpose:

\begin{description}
	\item[Backward static.] Used to obtain the lines that affect a statement,
		normally used on a line which outputs an incorrect value, to narrow down
		the source of the bug.
	\item[Forward\deleted{e} static.] Used to obtain the lines affected by a statement,
		for example to identify dead code or to check the effects a line has \added{on}\deleted{in} the rest
		of the program.
	\item[Chopping static.] Obtains both the statements affected by and the
		statements that affect the selected statement.
	\item[Dynamic.] Can be combined with any of the previous variations, and
		limits the slice to an execution log, only including statements that
		have run in a specific execution. The slice produced is much smaller and
		therefore more focused on the execution being debugged.
	\item[Quasi-static.] Some input values are given, and some are left
		unspecified: the result is a slice between the small dynamic slice and
		the general but bigger static slice. It can be especially useful when
		debugging a set of function calls which have a specific static input for
		some parameters, and variable input for others.
	\item[Simultaneous.] Similar to dynamic slicing, but considers multiple
		executions instead of only one. Similarly to quasi-static slicing, it
		can offer a slightly bigger slice while keeping the scope focused on the
		source of the bug.
	\carlos{to be completed}
\end{description}

\section{Exception handling in Java}
\label{sec:intro-exception}

Exception handling is common in most modern programming languages. In Java, it
consists of the following elements:
\begin{description}
	\item[Throwable.] A class that encompasses all the exceptions and errors
		that may be thrown. Its child classes are \texttt{Exception} for most errors
		and \texttt{Error} for internal errors in the Java Virtual Machine.
		Exceptions can be classified in\added{to} two categories: \textsl{unchecked}
		(those inheriting from \texttt{RuntimeException} or \texttt{Error}) and
		\textsl{checked} (the rest). The first may be thrown anywhere, whereas
		the second, if thrown, must be caught or declared in the method header
		(with a \texttt{throws} clause).
	\item[throw.] A statement that activates an exception, altering the normal
		control flow of the method. If the statement is inside a \textsl{try}
		block with a \textsl{catch} clause for its type or any supertype, the
		control flow will continue in the first statement of that clause.
		Otherwise, the method is exited and the check is performed again in the
		caller, until either the exception is caught or the last method in the stack
		(\textsl{main}) is popped, and the execution of the program ends
		abruptly.
	\item[try.] This statement is followed by a block of statements and by
		\textsl{catch} clauses, a \textsl{finally} block, or both. All exceptions
		thrown in the statements contained or in any methods called will be
		processed by the list of catches. The \textsl{finally} block, if present,
		appears after the \textsl{catch} clauses.
	\item[catch.] Contains two elements: a variable declaration (the type must
		be an exception) and a block of statements to be executed when an
		exception of the corresponding type (or a subtype) is thrown.
		\textsl{catch} clauses are processed sequentially, and if any matches
		the type of the thrown exception, its block is executed, and the rest
		are ignored.  Variable declarations may be of multiple types
		\texttt{(T1|T2 exc)}, when two unrelated types of exception must be
		caught and the same code executed for both. When there is an inheritance
		relationship, the parent suffices.\footnotemark
	\item[finally.] Contains a block of statements that will always be executed
		if the \textsl{try} is entered. It is used to tidy up, for example
		closing I/O streams. The \textsl{finally} can be reached in two ways:
		with an exception pending (thrown in \textsl{try} and not captured by
		any \textsl{catch} or thrown inside a \textsl{catch}) or without it
		(when the \textsl{try} or \textsl{catch} block ends successfully). After
		the last instruction of the block is executed, if there is an exception
		pending, control will be passed to the corresponding \textsl{catch} of an
		enclosing \textsl{try} or of a calling method, or
		the program will end. Otherwise, the execution continues in the next
		statement after the \textsl{try-catch-finally} block.
\end{description}

\footnotetext{Introduced in Java 7, see \url{https://docs.oracle.com/javase/7/docs/technotes/guides/language/catch-multiple.html} for more details.}
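
The interplay of these elements can be observed in a small program. The
following Java sketch is only illustrative: the class \texttt{ExceptionTrace}
and its method \texttt{run} are hypothetical names chosen for this example. It
records which blocks are entered when a \textsl{try} block finishes normally and
when it throws one of two unrelated unchecked exceptions, both of which are
handled by a single multi-type \textsl{catch} clause.

\begin{lstlisting}[language=Java]
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: logs the blocks entered by a try-catch-finally.
class ExceptionTrace {
    static List<String> run(int[] data, int index, int divisor) {
        List<String> trace = new ArrayList<>();
        try {
            trace.add("try");
            // May throw ArrayIndexOutOfBoundsException or ArithmeticException.
            int value = data[index] / divisor;
            trace.add("value=" + value);
        } catch (ArithmeticException | ArrayIndexOutOfBoundsException e) {
            // One clause handles two exception types with no subtype relation.
            trace.add("catch " + e.getClass().getSimpleName());
        } finally {
            // Runs whether or not an exception was thrown.
            trace.add("finally");
        }
        return trace;
    }

    public static void main(String[] args) {
        int[] data = {10};
        System.out.println(run(data, 0, 2)); // no exception
        System.out.println(run(data, 0, 0)); // division by zero
        System.out.println(run(data, 3, 1)); // index out of bounds
    }
}
\end{lstlisting}

The three calls print \texttt{[try, value=5, finally]},
\texttt{[try, catch ArithmeticException, finally]} and
\texttt{[try, catch ArrayIndexOutOfBoundsException, finally]}: the
\textsl{finally} block is reached in every case, and the rest of the
\textsl{try} block is skipped once an exception is thrown.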

\subsection{Exception handling in other programming languages}

In almost all programming languages, errors can appear (either through the
developer, the user or the system's fault), and must be dealt with. Most of the
popular object\added{-}oriented languages feature some kind of error system, normally
very similar to Java's exceptions. In this section, we will perform a small
survey of the error-handling techniques used in the most popular programming
languages. The language list has been extracted from a survey performed by the
programming Q\&A website Stack
Overflow\footnote{\url{https://stackoverflow.com}}. The survey contains a
question about the technologies used by professional developers in their work,
and from that list we have extracted those languages with more than $5\%$ usage
in the industry. Table~\ref{tab:popular-languages} shows the list and its
source. Except for Bash, Assembly, VBA, C and Go, the rest of the languages shown
feature an exception system similar to the one appearing in Java.

\begin{table}
	\begin{minipage}{0.6\linewidth}
	\centering
	\begin{tabular}{r | r }
		\textbf{Language}     & $\%$ usage \\ \hline
		JavaScript            & 69.7 \\ \hline
		HTML/CSS              & 63.1 \\ \hline
		SQL                   & 56.5 \\ \hline
		Python                & 39.4 \\ \hline
		Java                  & 39.2 \\ \hline
		Bash/Shell/PowerShell & 37.9 \\ \hline
		C\#                   & 31.9 \\ \hline
		PHP                   & 25.8 \\ \hline
		TypeScript            & 23.5 \\ \hline
		C++                   & 20.4 \\ \hline
	\end{tabular}
	\end{minipage}
	\begin{minipage}{0.39\linewidth}
	\begin{tabular}{r | r }
		\textbf{Language}     & $\%$ usage \\ \hline
		C                     & 17.3 \\ \hline
		Ruby                  &  8.9 \\ \hline
		Go                    &  8.8 \\ \hline
		Swift                 &  6.8 \\ \hline
		Kotlin                &  6.6 \\ \hline
		R                     &  5.6 \\ \hline
		VBA                   &  5.5 \\ \hline
		Objective-C           &  5.2 \\ \hline
		Assembly              &  5.0 \\ \hline
	\end{tabular}
	\end{minipage}
	% The caption has a weird structure due to the fact that there's a footnote
	% inside of it.
	\caption[Commonly used programming languages]{The most commonly used
	programming languages by professional developers\protect\footnotemark}
	\label{tab:popular-languages}
\end{table}

\footnotetext{Data from \url{https://insights.stackoverflow.com/survey/2019/\#technology-\_-programming-scripting-and-markup-languages}}

The exception systems that are similar to Java's are mostly alike,
featuring a \texttt{throw} statement (\texttt{raise} in Python) and a try-catch
structure, and most include a finally block that may be appended to try blocks.
The difference resides in the value carried by the exception: in languages
that feature inheritance, it is an instance of a class descending from a generic error or
exception class, and in languages without it, it is an arbitrary value (e.g.
JavaScript, TypeScript). In object-oriented programming, the filtering is
performed by checking whether the exception is a subtype of the exception being
caught (Java, C++, C\#, PowerShell\footnotemark, etc.); and in languages with
arbitrary exception values, a boolean condition is specified, and the first
catch block that fulfills its condition is activated, following a pattern
similar to that of \texttt{switch} statements (e.g. JavaScript). In both cases
there exists a way to indicate that all exceptions should be caught, regardless
of type and content.

\footnotetext{Only since version 2.0, released with Windows 7.}

On the other hand, in the other languages there exist a variety of systems that
emulate or replace exception handling:

\begin{description} % bash, vba, C and Go exceptions explained
	\item[Bash] The popular Bourne Again SHell features no exception system, apart
		from the user's ability to check the return code of the last statement
		executed. Traps can also be used to capture erroneous states and tidy up all
		files and environment variables before exiting the program. Traps allow the
		programmer to react to a user- or system-sent signal, or to an exit run from
		within the Bash environment. When a trap is activated, its code runs, and the
		signal \added{does not}\deleted{doesn't} go on to stop the program. This \added{does not}\deleted{doesn't} replace a fully
		featured exception system, but \texttt{bash} programs tend to be small in
		size, with programmers preferring the efficiency of C or the conveniences of
		other high-level languages when the task requires it.
	\item[VBA] Visual Basic for Applications is a scripting programming language
		based on Visual Basic that is integrated into Microsoft Office to automate
		small tasks, such as generating documents from templates, making advanced
		computations that are impossible or slower with spreadsheet functions, etc.
		The only error-handling mechanism it has is the \texttt{On Error}
		directive, which takes one of three forms: \texttt{On Error GoTo 0}
		lets the error crash the program, \texttt{On Error Resume Next} continues
		the execution as if nothing had happened, and \texttt{On Error GoTo}
		followed by a label makes the execution jump to that label in case of
		error. The directive can be set and reset multiple times, therefore creating
		artificial \texttt{try-catch} blocks, but there is no possibility of
		attaching a value to the error, lowering its usefulness.
	\item[C] In C, errors can also be control\added{led} via return values, but some of the
		functions in its standard library can be used to create a simple exception system.
		\texttt{setjmp} and \texttt{longjmp} set up and
		perform inter-function jumps. The first saves the calling environment
		in a buffer, and the second returns to the position where the buffer was
		saved, discarding the current state of the stack and restoring the
		saved environment. Then, the execution continues from the evaluation of
		\texttt{setjmp}, which returns the second argument passed to
		\texttt{longjmp}.
		\begin{example}[User-built exception system in C] \  \\
			\label{fig:exceptions-c}
			\begin{minipage}{0.5\linewidth}
			\begin{lstlisting}[language=C]
			int main() {
				if (!setjmp(ref)) {
					res = safe_sqrt(x, ref);
				} else {
					// Handle error
					printf /* ... */
				}
			}
			\end{lstlisting}
			\end{minipage}
			\begin{minipage}{0.49\linewidth}
			\begin{lstlisting}[language=C]
			double safe_sqrt(double x, jmp_buf ref) {
				if (x < 0)
					longjmp(ref, 1);
				return /* ... */;
			}
			\end{lstlisting}
			\end{minipage}
			In the \texttt{main} function, line 2 will be executed twice: first when
			it is normally reached (returning 0) and a second time when line 3 in
			\texttt{safe\_sqrt} is run, returning the second argument of \texttt{longjmp},
			and therefore entering the else block in the \texttt{main} function.
		\end{example}
	\item[Go] The programming language Go is the odd one out in this section, being a
		modern programming language without exceptions, though it is an intentional
		design decision made by its authors\footnotemark. The argument made was that
		exception handling systems introduce abnormal control flow and complicate
		code analysis and clean code generation, as the paths that
		the code may follow are not clear. Instead, Go allows functions to return multiple values,
		with the second value typically being of an error type. The error is
		checked before the value, and acted upon. Additionally, Go also features a
		simple panic system, with the functions \texttt{panic} (throws an
		exception with a value associated) and \texttt{recover} (stops the panic
		state and retrieves its value), and the \texttt{defer} statement (whose
		deferred call runs when the function returns or when a \texttt{panic} has
		been activated). The
		\texttt{defer} statement doubles as catch and finally, and multiple
		instances can be accumulated. When appropriate, they will run in LIFO order
		(last in, first out).
\end{description}

\footnotetext{\url{https://golang.org/doc/faq\#exceptions}}

% vim: set noexpandtab:tabstop=2:shiftwidth=2:softtabstop=2:wrap