
\chapter{Introduction}

\section{Program slicing}
\textsl{Program slicing} is a debugging technique which answers the question: ``which parts of a program affect a selected statement and variable?'' The statement and the variable are the basic input to create a slice and are called the \textsl{slicing criterion}. The criterion can be more complex, as different slicing techniques may require additional pieces of input.
There exist two dimensions along which the problem of slicing can be proposed:
\begin{itemize}
	\item \textsl{Static} or \textsl{dynamic}: slicing can be performed statically (which is the default) or dynamically, in which case the slicing criterion also includes an execution log. A statement in the log is marked, along with a variable. The dynamic slice will only include statements from the execution log, even if more statements are required in the general case. This makes the slice more useful for the specific case, and may help solve a bug related to non-deterministic behaviour (such as a random or pseudo-random number generator), but the slice must be recomputed for each case to be analyzed.
	\item \textsl{Backward} or \textsl{forward}: the default tends to be backward slicing, which looks at which statements affect the selected one. Forward slicing obtains the statements that are affected by the chosen one. There also exists a mixed approach, which is used to find all the statements that affect or are affected by a specific line.
\end{itemize}

The default choice tends to be a \textsl{static backward slice}, which obtains the list of statements that affect the value of a variable in a given statement in all possible executions of the program.
The \textsl{slice} of a program is a list of statements from the original program which constitutes a valid program, whose execution will result in the same values for the variable being read by a debugger in the selected statement\cite{weiser79}.
Some definitions of slicing\todo{Citation needed} allow for the slice to continue producing values after the program has stopped, making the slices simpler to produce and smaller in size at the cost of different endings\footnotemark. We will name the exact slice (one that produces exactly the same values) a \textit{strong} slice, and the permissive one a \textit{weak} slice. See Table~\ref{tab:slice-permissive} for an example, in which each row shows the values logged at the slicing criterion during the execution of four different programs. The first is the original, which computes $3!$. Slice A is a slice whose execution is identical to the original, and is therefore a strong slice. Slice B is correct but continues producing values after the original stops, which makes it a weak slice: it fits the relaxed definition but not the strict one. Slice C is incorrect, as the values differ from the original: some data or control dependency has not been included in the slice, so the program behaves differently.

\footnotetext{POSSIBLE ADDITION: It could be argued that permissive or weak slicing is enough for most uses of slicing, as if we suppose that the bug is present before the end of the program, then the bug must show up in the slice as well, regardless of whether the sliced program continues producing extra values or not.}

\begin{table}
	\centering
	\begin{tabular}{r | r | r | r | r | r }
		Iteration & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4} & \textbf{5} \\ \hline
		Original & 1 & 2 & 6 & & \\ \hline
		Slice A & 1 & 2 & 6 & & \\ \hline
		Slice B & 1 & 2 & 6 & 24 & 120 \\ \hline
		Slice C & 1 & 1 & 3 & 5 & 8 \\
	\end{tabular}
	\caption{Execution logs of different slices and their original program.}
	\label{tab:slice-permissive}
\end{table}
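
As a minimal sketch of a static backward slice, consider the following program. It is not taken from the cited works: the class \texttt{SliceExample} and its variables are hypothetical names chosen only for illustration. The slicing criterion is the last statement of \texttt{main} together with the variable \texttt{prod}, and the comments mark which statements would belong to the slice.

\begin{lstlisting}[language=Java]
// Hypothetical example; names are illustrative assumptions.
public class SliceExample {
    public static void main(String[] args) {
        int n = 3;                        // in slice
        int sum = 0;                      // not in slice
        int prod = 1;                     // in slice
        for (int i = 1; i <= n; i++) {    // in slice (controls prod *= i)
            sum += i;                     // not in slice
            prod *= i;                    // in slice
        }
        System.out.println(sum);          // not in slice
        System.out.println(prod);         // slicing criterion: (this line, prod)
    }
}
\end{lstlisting}

Removing the three statements marked as not in the slice still yields a valid program, and the value of \texttt{prod} observed at the criterion is unchanged, because none of the removed statements writes to a variable on which \texttt{prod} depends.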

The most efficient and broadly used tool for slicing is the system dependence graph (SDG), first introduced by Horwitz, Reps and Binkley\cite{horwitz90}. It represents the statements of a program as vertices, and their dependencies as directed edges. Method calls are connected to method definitions, and so are the corresponding input and output parameters. SDGs show two different kinds of dependencies: \textsl{data} and \textsl{control}. The former connects nodes that write to variables to the nodes that use (or \textsl{may} use) the value, and it is represented as a dashed\todo{check} line. The latter represents which nodes have control over the execution of others (conditional jumps and loops, mainly), and it is represented as a solid line. In order to obtain a slice of a program, its SDG must be built from the source code. Then a two-pass search ($\mathcal{O}(n)$ each) is performed to obtain the slice. The SDG can be reused to obtain a different slice of the same program (with a different criterion or kind\footnotemark of slice). The efficiency derives from the linear cost of the search on the SDG, so most proposed modifications\todo{citation needed} affect the complexity of the SDG's construction, but try to keep the slicing process linear.

\footnotetext{TODO: change this word to the proper one.}

The SDG is built in 3 stages, each resulting in a different graph:

\begin{description}
	\item[CFG] The control flow graph represents the flow of control between the statements of a method. Every statement has an edge from itself to every statement that can immediately follow it. This means that most statements will only have one outgoing edge, and conditional jumps and loops will have two. The graph starts in a ``Begin'' or ``Start'' node, and ends in an ``End'' node, to which the last statement and all return statements are connected. It is created directly from the source code, without any need for data dependency analysis.
	\item[PDG] The program dependence graph is the result of restructuring and adding data dependencies to a CFG. All statements are placed below and connected to a ``Begin'' node, except those which are inside a loop or conditional block. Then data dependencies are added (red or dashed edges), adding an edge between two nodes if there is a data dependency. \todo{add definitions?}
	\item[SDG] Finally, the system dependence graph is the interconnection of each method's PDG. When a call is made, the input arguments are passed to subnodes of the call, and the result is obtained in another subnode. There is an edge from the call to the beginning of the corresponding method, and an extra type of edge exists: \textsl{summary edges}, which summarize the data dependencies between input and output variables.
\end{description}
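
Once the graph has been built, slicing reduces to a reachability problem. The following sketch illustrates the idea on a single dependence graph, under simplifying assumptions: the class \texttt{BackwardSliceSketch}, the method \texttt{slice} and the map \texttt{dependsOn} are hypothetical names, nodes are identified by integers, and the two-pass traversal of a full SDG (which treats the edges between calls and method definitions differently in each pass) is not reproduced.

\begin{lstlisting}[language=Java]
import java.util.*;

// Simplified sketch; not the two-pass SDG algorithm.
public class BackwardSliceSketch {
    // Assumed representation: dependsOn.get(n) lists the nodes that
    // node n is data- or control-dependent on.
    static Set<Integer> slice(Map<Integer, List<Integer>> dependsOn,
                              int criterion) {
        Set<Integer> inSlice = new TreeSet<>();
        Deque<Integer> pending = new ArrayDeque<>();
        pending.push(criterion);
        while (!pending.isEmpty()) {
            int node = pending.pop();
            if (inSlice.add(node)) { // visit each node at most once
                for (int dep : dependsOn.getOrDefault(
                        node, Collections.emptyList())) {
                    pending.push(dep); // follow dependency edges backwards
                }
            }
        }
        return inSlice;
    }
}
\end{lstlisting}

Each node is expanded at most once and each edge is followed at most once, which is where the linear cost of the traversal comes from.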

An example is provided in Figure~\ref{fig:basic-graphs}, where a simple multiplication program is converted to a CFG, then a PDG and finally an SDG. For simplicity, only the CFG and PDG of \texttt{multiply} are shown. Control dependencies are black, data dependencies are red and summary edges are blue.

\begin{figure}
	\centering
%	\lstinputlisting[firstline=8, lastline=16]{./dot/simple.java}
	\includegraphics[width=0.5\linewidth]{img/multiplycfg}
	\includegraphics[width=\linewidth]{img/multiplypdg}
	\includegraphics[width=\linewidth]{img/multiplysdg}
	\caption{A simple multiplication program, its CFG, PDG and SDG}
	\label{fig:basic-graphs}
\end{figure}

The original proposal by Weiser\cite{weiser79} covers only the simplest features of an imperative programming language. The various iterations\todo{cite} until reaching the SDG\todo{cite} have added other elements, such as return statements\todo{cite}, global variables\todo{cite}, object-oriented features\todo{cite} and finally exception handling\cite{horwitz03}.

\subsection{Metrics}

There are four metrics considered when evaluating a slicing algorithm:

\begin{description}
	\item[Completeness] The solution includes all the statements that affect the slicing criterion. This is the most important feature, and almost all publications achieve at least completeness. Trivial completeness is easily achievable: it suffices to include the whole program in the slice.
	\item[Correctness] The solution excludes all statements that do not affect the slicing criterion. Most solutions are complete, but the degree of correctness is what sets them apart, as smaller slices will not execute unnecessary code to compute the values, decreasing the execution time.
	\item[Features covered] Which features or language a slicing algorithm covers. Different approaches to slicing cover different programming languages and even paradigms. There are slicing techniques (published or commercially available) for most popular programming languages, from C++ to Erlang. Some slicing techniques only cover a subset of the targeted language, and as such are less useful for commercial applications, but can be a stepping stone in the betterment of the field.
	\item[Speed] Speed of graph generation and slice creation. As mentioned previously, slicing is a two-step process: build a graph and traverse it. The traversal is linear in most proposals, with small variations. Graph generation tends to take longer and to vary more, but it is less relevant, because it is only done once per program being analyzed. As such, this is the least important metric. Only proposals that deviate from the aforementioned scheme show a wider variation in speed.
\end{description}

\subsection{Program slicing as a debugging technique}

Program slicing is first and foremost a debugging technique, with each variation serving a different purpose:

\begin{description}
	\item[Backward static]
\end{description}

\section{Exception handling in Java}
\label{sec:intro-exception}

Exception handling is common in most modern programming languages. In Java, it consists of the following elements:
\begin{description}
	\item[Throwable] The class that encompasses all the exceptions or errors that may be thrown. Its direct subclasses are \texttt{Exception} for most errors and \texttt{Error} for internal errors in the Java Virtual Machine. Exceptions can be classified in two categories: \textsl{unchecked} (those inheriting from \texttt{RuntimeException} or \texttt{Error}) and \textsl{checked} (the rest). The first may be thrown anywhere, whereas the second, if thrown, must be caught or declared in the method header with a \texttt{throws} clause.
	\item[throw] A statement that raises an exception, altering the normal control flow of the method. If the statement is inside a \textsl{try} block with a \textsl{catch} clause for its type or any supertype, the control flow will continue in the first statement of that clause. Otherwise, the method is exited and the check is performed again in the caller, until either the exception is caught or the last method in the stack (\textsl{main}) is popped, and the execution of the program ends abruptly.
	\item[try] This keyword is followed by a block of statements and then by zero or more \textsl{catch} clauses. All exceptions thrown in the statements contained or in any methods called will be processed by the list of catches. Optionally, after the \textsl{catch} clauses a \textsl{finally} block may appear. In a plain \textsl{try} statement, at least one \textsl{catch} clause or a \textsl{finally} block must be present.
	\item[catch] Contains two elements: a variable declaration (the type must be an exception) and a block of statements to be executed when an exception of the corresponding type (or a subtype) is thrown. \textsl{catch} clauses are processed sequentially, and if any matches the type of the thrown exception, its block is executed, and the rest are ignored. Variable declarations may be of multiple types \texttt{(T1|T2 exc)}, when two unrelated types of exception must be caught and the same code executed for both. When there is an inheritance relationship, the parent suffices.\footnotemark
	\item[finally] Contains a block of statements that will always be executed if the \textsl{try} is entered. It is used to tidy up, for example closing I/O streams. The \textsl{finally} can be reached in two ways: with an exception pending (thrown in \textsl{try} and not captured by any \textsl{catch}, or thrown inside a \textsl{catch}) or without it (when the \textsl{try} or \textsl{catch} block ends normally). After the last instruction of the block is executed, if there is an exception pending, it propagates to an enclosing \textsl{try} with a matching \textsl{catch}, in this method or in one of its callers, or the program will end. Otherwise, the execution continues in the next statement after the \textsl{try-catch-finally} block.
\end{description}

\footnotetext{Introduced in Java 7, see \url{https://docs.oracle.com/javase/7/docs/technotes/guides/language/catch-multiple.html} for more details.}
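
The following sketch puts the elements above together. The class \texttt{ExceptionExample} and the method \texttt{parse} are hypothetical names chosen for illustration; \texttt{Integer.parseInt} is the standard library method, which throws a \texttt{NumberFormatException} when the text is not a valid integer.

\begin{lstlisting}[language=Java]
// Hypothetical example; class and method names are illustrative.
public class ExceptionExample {
    static int parse(String text) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException | NullPointerException e) {
            // Multi-catch: the same handler for two unrelated types.
            return -1;
        } finally {
            // Runs whether or not an exception was thrown.
            System.out.println("finally runs for: " + text);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("42"));  // no exception is thrown
        System.out.println(parse("abc")); // handled by the catch clause
    }
}
\end{lstlisting}

In both calls the \textsl{finally} block is executed before \texttt{parse} returns, so its message is printed before the returned value. Both exception types used here are unchecked, so \texttt{parse} needs no \texttt{throws} clause.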

\section{Exception handling in other programming languages}

In almost all programming languages, errors exist, and must be dealt with. Java's exception system is a common one among object-oriented programming languages, but not the only one.
Most of the popular object-oriented programming languages feature some kind of error system, normally very similar to Java's exceptions. In this section, we will perform a small survey of the most popular programming languages. The ``most popular'' list has been obtained from the Stack Overflow 2019 Developer Survey\footnotemark ($>5\%$ usage in the industry). The languages and their usage in the industry are shown in Figure~\ref{fig:languages}.
Most of them feature an exception system similar to the one appearing in Java, while others (bash, assembly, VBA, C) have no built-in method, but allow . Some select the handler by the type of the exception (Java, C++, C\#, Python), whilst others catch every exception and inspect it inside the handler (JavaScript, TypeScript). All of them have a mechanism that catches all exceptions, either by catching the type from which all exceptions inherit or by providing no condition to check.

\footnotetext{\url{https://insights.stackoverflow.com/survey/2019/\#technology-\_-programming-scripting-and-markup-languages}}

Go doesn't have an exception system per se, but a simple one can be built by using the built-in functions ``panic'' (throw an exception with an associated value) and ``recover'' (stop the panic state and retrieve the value associated with the panic), together with the keyword ``defer'' (similar to finally: deferred code runs even when a panic is active). Deferred code is run when the function that contains the ``defer'' statement returns. Each deferred call is stored as a member of a stack, so the execution order is LIFO. If a panic instruction is run, such code will still run, therefore acting as a finally. The panic can only be stopped via the ``recover'' instruction, called from deferred code, which obtains the value associated with the panic. Then, the exception