

% !TeX encoding = UTF-8
% !TeX spellcheck = en_US
% !TeX root = paper.tex
\chapter{Introduction}
\section{Program slicing}
\textsl{Program slicing} \cite{Wei81,Sil12} is a debugging technique that
answers the question: ``which parts of a program affect a given statement and
variable?'' The statement and the variable are the basic input needed to create
a slice, and together they are called the \textsl{slicing criterion}. The
criterion can be more complex, as different slicing techniques may require
additional pieces of input.
The \textsl{slice} of a program is the subset of statements of the original
program that affect the slicing criterion. The slice is itself a valid program,
and when it is executed, a debugger stopped at the selected statement reads the
same values for the selected variable as it would in the original program.
There exist two fundamental dimensions along which the problem of slicing can be
posed:
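
As a minimal sketch of these notions, consider the following Java program. The
class and method names are hypothetical and chosen only for illustration. The
method \texttt{original} computes both a sum and a product. Taking as slicing
criterion the \texttt{return} statement and the variable \texttt{product}, the
method \texttt{slice} is one possible slice: the statements that only
contribute to \texttt{sum} have been removed, and the result is still a valid
program that yields the same value for \texttt{product}.

\begin{lstlisting}[language=Java]
public class SliceExample {
    // Original program: computes the sum and the product of 1..n.
    static int original(int n) {
        int sum = 0;
        int product = 1;
        for (int i = 1; i <= n; i++) {
            sum += i;
            product *= i;
        }
        System.out.println(sum);
        return product; // slicing criterion: <this statement, product>
    }

    // One possible slice: only the statements that affect product remain.
    static int slice(int n) {
        int product = 1;
        for (int i = 1; i <= n; i++) {
            product *= i;
        }
        return product;
    }

    public static void main(String[] args) {
        // Both versions agree on the value of product at the criterion.
        System.out.println(original(5) == slice(5));
    }
}
\end{lstlisting}

The loop header is kept because it controls how many times \texttt{product} is
updated, whereas the declaration, update and printing of \texttt{sum} have no
influence on the criterion.
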
\begin{itemize}
\item \textsl{Static} or \textsl{dynamic}: slicing can be performed
statically or dynamically.
\textsl{Static slicing} \cite{Wei81} produces a slice that considers all
possible executions of the program, taking into account only the
semantics of the programming language.
In contrast, \textsl{dynamic slicing} \cite{KorL88} limits the slice to
the statements present in an execution log. The slicing criterion is
expanded to include a position in the log that corresponds to one
instance of the selected statement, making it much more specific. It may
help to find a bug related to nondeterministic behavior (such as a random
or pseudo-random number generator), but the slice must be recomputed for
each case being analyzed.
\item \textsl{Backward} or \textsl{forward}: \textsl{backward slicing}
\cite{Wei81} is the more commonly used of the two, because it finds the
statements that affect the slicing criterion. In contrast,
\textsl{forward slicing} \cite{BerC85} computes the statements that are
affected by the slicing criterion. There also exists a mixed approach
called \textsl{chopping} \cite{JacR94}, which takes two criteria, a source
and a sink, and finds the statements that are affected by the source and
in turn affect the sink.
\end{itemize}
Since program slicing was first defined, its most widespread form has been
\textsl{static backward slicing}, which obtains the list of statements that
affect the value of a variable in a given statement, in all possible executions
of the program (i.e., for any input data).
\begin{definition}[Strong static backward slice \cite{Wei81,HorwitzRB88}]
\label{def:strong-slice}
\carlos{The exact citation still needs to be checked.}
Given a program $P$ and a slicing criterion $C = \langle s,v \rangle$, where
$s$ is a statement and $v$ is a set of variables in $P$ (the variables may
or may not be used in $s$), $S$ is the \textsl{strong slice} of $P$ with
respect to $C$ if $S$ has the following properties:
\begin{enumerate}
\item $S$ is an executable program.
\item $S \subseteq P$, that is, $S$ is the result of removing code from $P$.
\item For any input $I$, the values produced on each execution of $s$
for each of the variables in $v$ are the same when executing $S$ as
when executing $P$. \label{enum:exact-output}
\end{enumerate}
\end{definition}
\begin{definition}[Weak static backward slice \cite{RepY89}]
\label{def:weak-slice}
\carlos{Check the citation and write this formally.}
Same as definition~\ref{def:strong-slice}, but
property~\ref{enum:exact-output} is relaxed to: for any input $I$, the
values produced on each execution of $s$ for each of the variables in $v$
when running $P$ are a prefix of the values produced when running $S$.
\end{definition}
Both definitions (\ref{def:strong-slice} and~\ref{def:weak-slice}) are
used throughout the literature, with some works favoring the first and some the
second. Though the definitions come from the corresponding citations, the naming
was first used in a control dependency analysis by Danicic et
al.~\cite{DanBHHKL11}, where slices which produce the same output as the
original are named \textsl{strong}, and those where the output of the original
is a prefix of the output of the slice, \textsl{weak} \carlos{One could argue
that a weak slice is enough for debugging, since if an error shows up in the
original, it will also show up in the slice}.
Table~\ref{tab:slice-weak} shows an example, with each row listing the values
logged at the slicing criterion during the execution of four different programs.
The first is the original, which computes $3!$. Slice A produces exactly the
same values and is therefore a strong slice. Slice B is a weak slice: it
produces the same values as the original, but keeps producing values after the
original stops, so it fits the relaxed definition but not the strong one.
Slice C is incorrect, as its values differ from those of the original: some data
or control dependency has not been included in the slice, and the program
behaves differently.
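
Under the assumption that the values logged at the slicing criterion form a
sequence, both properties can be sketched formally. The notation is introduced
here only for illustration and is not taken from the cited works:
$\mathit{seq}(Q, I, C)$ denotes the sequence of values of the variables in $v$
observed at each execution of $s$ when running a program $Q$ on input $I$, and
$\sqsubseteq$ denotes the prefix relation between sequences.
\[
\forall I:\ \mathit{seq}(S, I, C) = \mathit{seq}(P, I, C)
\quad \mbox{(strong slice)}
\]
\[
\forall I:\ \mathit{seq}(P, I, C) \sqsubseteq \mathit{seq}(S, I, C)
\quad \mbox{(weak slice)}
\]
For instance, $\langle 1, 2, 6 \rangle \sqsubseteq \langle 1, 2, 6, 24, 120
\rangle$, so a slice that logs the second sequence satisfies the weak property
with respect to an original that logs the first, but not the strong one.
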
\begin{table}
\centering
\begin{tabular}{r | r | r | r | r | r }
Iteration & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4} & \textbf{5} \\ \hline
Original & 1 & 2 & 6 & - & - \\ \hline
Slice A & 1 & 2 & 6 & - & - \\ \hline
Slice B & 1 & 2 & 6 & 24 & 120 \\ \hline
Slice C & 1 & 1 & 3 & 5 & 8 \\
\end{tabular}
\caption{Execution logs of different slices and their original program.}
\label{tab:slice-weak}
\end{table}
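
The classification of the logs in table~\ref{tab:slice-weak} can be checked
mechanically. The following Java sketch assumes that each log is a finite list
of integers (which leaves out non-terminating programs), and its class and
method names are hypothetical: a slice is treated as strong when its log equals
that of the original, and as weak when the log of the original is a prefix of
the log of the slice.

\begin{lstlisting}[language=Java]
import java.util.Arrays;
import java.util.List;

public class SliceLogCheck {
    // Strong: the slice logs exactly the same values as the original.
    static boolean isStrong(List<Integer> original, List<Integer> slice) {
        return original.equals(slice);
    }

    // Weak: the log of the original is a prefix of the log of the slice.
    static boolean isWeak(List<Integer> original, List<Integer> slice) {
        return original.size() <= slice.size()
                && slice.subList(0, original.size()).equals(original);
    }

    public static void main(String[] args) {
        List<Integer> original = Arrays.asList(1, 2, 6);
        List<Integer> sliceA = Arrays.asList(1, 2, 6);
        List<Integer> sliceB = Arrays.asList(1, 2, 6, 24, 120);
        List<Integer> sliceC = Arrays.asList(1, 1, 3, 5, 8);

        for (List<Integer> s : Arrays.asList(sliceA, sliceB, sliceC)) {
            System.out.println(s + ": strong=" + isStrong(original, s)
                    + ", weak=" + isWeak(original, s));
        }
    }
}
\end{lstlisting}

With this check every strong slice is also weak, since a sequence is a prefix
of itself.
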
The most efficient and broadly used data structure for slicing is the system
dependence graph (SDG), first introduced by Horwitz, Reps and Binkley
\cite{HorwitzRB88}. It represents the statements of a program as vertices, and
their dependencies as directed edges. Method calls are connected to method
definitions, and so are the corresponding input and output parameters. SDGs show
two different kinds of dependencies: \textsl{data} and \textsl{control}. Data
dependencies connect nodes that write to variables (i.e., they \emph{define}
their value) to the nodes that use (or \textsl{may} use) the value, and they are
often represented as dashed\todo{check} lines.
Control dependencies represent which nodes have control over the
execution of others (conditional jumps and loops, mainly), and they are often
represented as solid lines. In order to obtain a slice of a program,
its SDG must first be built from the source code ($\mathcal{O}(n^2)$).
Then a two-pass search ($\mathcal{O}(n)$ each) is performed to obtain the slice.
The SDG can be reused to obtain a different slice of the same program (with a
different criterion or kind \carlos{change this word} of slice).
The efficiency derives from the linear cost of the search on the SDG, so most
extensions change the complexity of the SDG's construction, but try to keep
the slicing step linear.
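
The traversal step can be sketched as a backward reachability search over the
dependence edges. The following Java code is a simplified illustration under
several assumptions: it handles a single method, so one pass suffices and
neither the two-pass interprocedural search nor summary edges are modelled;
nodes are plain integers; the example graph contains only a subset of the
dependencies of a small loop; and all class and method names are hypothetical.

\begin{lstlisting}[language=Java]
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BackwardSlice {
    // dependsOn.get(n) lists the nodes that n is data or control dependent on.
    private final Map<Integer, List<Integer>> dependsOn = new HashMap<>();

    // Records that node "to" depends on node "from".
    void addDependence(int from, int to) {
        dependsOn.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    // Collects every node reachable backwards from the criterion.
    // Each node is expanded at most once and each edge followed at most once.
    Set<Integer> slice(int criterion) {
        Set<Integer> visited = new TreeSet<>();
        Deque<Integer> worklist = new ArrayDeque<>();
        worklist.push(criterion);
        while (!worklist.isEmpty()) {
            int node = worklist.pop();
            if (visited.add(node)) {
                worklist.addAll(
                    dependsOn.getOrDefault(node, Collections.emptyList()));
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Nodes: 1 sum = 0; 2 product = 1; 3 loop header; 4 sum += i;
        //        5 product *= i; 6 print(sum); 7 return product
        BackwardSlice g = new BackwardSlice();
        g.addDependence(1, 4); g.addDependence(3, 4); g.addDependence(4, 6);
        g.addDependence(2, 5); g.addDependence(3, 5); g.addDependence(5, 7);
        System.out.println(g.slice(7)); // prints [2, 3, 5, 7]
    }
}
\end{lstlisting}

The same graph object can be queried again with a different criterion, for
example \texttt{g.slice(6)}, without being rebuilt.
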
The SDG is built in three stages, each resulting in a different graph:
\begin{description}
\item[CFG] The control flow graph represents the flow of control within a
method of a program. Every statement has an edge from
itself to every statement that can immediately follow it. This means that
most will only have one outgoing edge, and conditional jumps and loops
will have two. The graph starts in a ``Begin'' or ``Start'' node, and
ends in an ``End'' node, to which the last statement and all return
statements are connected. It is created directly from the source code,
without any need for data dependency analysis.
\item[PDG] The program dependence graph is the result of restructuring a
CFG and adding data dependencies to it. All statements are placed below
and connected to a ``Begin'' node, except those which are inside a loop or
conditional block, which are instead connected to the loop or conditional
that controls them. Then data dependencies are added (red or dashed
edges), adding an edge between two nodes if there is a data dependency.
\todo{add definitions?}
\item[SDG] Finally, the system dependence graph is the interconnection of
each method's PDG. When a call is made, the input arguments are passed
to subnodes of the call, and the result is obtained in another subnode.
There is an edge from the call to the beginning of the corresponding
method, and an extra type of edge exists: \textsl{summary edges}, which
summarize the data dependencies between input and output variables.
\end{description}
An example is provided in figure~\ref{fig:basic-graphs}, where a simple
multiplication program is converted to CFG, then PDG and finally SDG. For
simplicity only the CFG and PDG of \texttt{multiply} are shown. Control
dependencies are black, data dependencies red and summary edges blue.
\begin{figure}
\centering
% \lstinputlisting[firstline=8, lastline=16]{./dot/simple.java}
\includegraphics[width=0.5\linewidth]{img/multiplycfg}
\includegraphics[width=\linewidth]{img/multiplypdg}
\includegraphics[width=\linewidth]{img/multiplysdg}
\caption{A simple multiplication program, its CFG, PDG and SDG}
\label{fig:basic-graphs}
\end{figure}
The original proposal by Weiser~\cite{Wei81} covers only the simplest features
of an imperative programming language. The successive iterations\todo{cite}
that led to the SDG\todo{cite} have added other elements, such as return
statements\todo{cite}, global variables\todo{cite}, object-oriented
features\todo{cite} and finally exception handling~\cite{AllH03}.
\subsection{Metrics}
There are four metrics considered when evaluating a slicing algorithm:
\begin{description}
\item[Completeness] The solution includes all the statements that affect the
slicing criterion. This is the most important feature, and almost all
publications achieve at least completeness. Trivial completeness is easily
achievable: it suffices to include the whole program in the slice.
\item[Correctness] The solution excludes all statements that do not affect
the slicing criterion. Most solutions are complete, but the degree of
correctness is what sets them apart, as smaller slices will not execute
unnecessary code to compute the values, decreasing the execution time.
\item[Features covered] The languages and language features that a slicing
algorithm covers. Different approaches to slicing cover different programming
languages and even paradigms. There are slicing techniques (published or
commercially available) for most popular programming languages, from C++
to Erlang. Some slicing techniques only cover a subset of the targeted
language, and as such are less useful for commercial applications, but
can be a stepping stone in the advancement of the field.
\item[Speed] Speed of graph generation and slice creation. As noted above,
slicing is a two-step process: build a graph and traverse it.
The traversal is linear in most proposals, with small variations. Graph
generation tends to take longer and to vary more, but it is not as
relevant, because it is only done once (per program being analyzed). As
such, this is the least important metric. Only proposals that deviate
from the aforementioned schema show a wider variation in speed.
\end{description}
\subsection{Program slicing as a debugging technique}
Program slicing is first and foremost a debugging technique, with each
variation serving a different purpose:
\begin{description}
\item[Backward static]
\end{description}
\section{Exception handling in Java}
\label{sec:intro-exception}
Exception handling is common in most modern programming languages. In Java, it
consists of the following elements:
\begin{description}
\item[Throwable] A class that encompasses all the exceptions and errors
that may be thrown. Its direct subclasses are \texttt{Exception} for most
errors and \texttt{Error} for internal errors in the Java Virtual Machine.
Exceptions can be classified in two categories: \textsl{unchecked}
(those inheriting from \texttt{RuntimeException} or \texttt{Error}) and
\textsl{checked} (the rest). The first may be thrown anywhere, whereas
the second, if thrown, must be caught or declared in the method header
(with a \texttt{throws} clause).
\item[throw] A statement that raises an exception, altering the normal
control flow of the method. If the statement is inside a \textsl{try}
block with a \textsl{catch} clause for its type or any supertype, the
control flow will continue in the first statement of that clause.
Otherwise, the method is exited and the check is performed again in the
caller, until either the exception is caught or the last method in the stack
(\textsl{main}) is popped, and the execution of the program ends
abruptly.
\item[try] This statement is followed by a block of statements and by one or
more \textsl{catch} clauses. All exceptions thrown in the statements
contained or any methods called will be processed by the list of
catches. Optionally, after the \textsl{catch} clauses a \textsl{finally}
block may appear.
\item[catch] Contains two elements: a variable declaration (the type must be
an exception) and a block of statements to be executed when an exception
of the corresponding type (or a subtype) is thrown. \textsl{catch}
clauses are processed sequentially, and if any matches the type of the
thrown exception, its block is executed, and the rest are ignored.
Variable declarations may be of multiple types \texttt{(T1|T2 exc)},
when two unrelated types of exception must be caught and the same code
executed for both. When there is an inheritance relationship, the parent
suffices.\footnotemark
\item[finally] Contains a block of statements that will always be executed
if the \textsl{try} is entered. It is used to tidy up, for example
by closing I/O streams. The \textsl{finally} can be reached in two ways:
with an exception pending (thrown in the \textsl{try} and not captured by
any \textsl{catch}, or thrown inside a \textsl{catch}) or without one
(when the \textsl{try} or \textsl{catch} block ends successfully). After
the last instruction of the block is executed, if there is an exception
pending, control will be passed to the corresponding \textsl{catch} of an
enclosing \textsl{try} or of a calling method, or the program will end.
Otherwise, the execution continues in the next
statement after the \textsl{try-catch-finally} block.
\end{description}
\footnotetext{Introduced in Java 7, see \url{https://docs.oracle.com/javase/7/docs/technotes/guides/language/catch-multiple.html} for more details.}
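
The interaction between these elements can be illustrated with a short Java
sketch. The class and method names are hypothetical, and the program only
records which blocks are executed. The method \texttt{run} shows a multi-type
\textsl{catch} and a \textsl{finally} reached without a pending exception,
while \texttt{propagate} shows a \textsl{finally} that runs with an exception
pending, which is then caught by the caller.

\begin{lstlisting}[language=Java]
import java.util.ArrayList;
import java.util.List;

public class ExceptionFlow {
    // Returns the names of the blocks executed for the given input.
    static List<String> run(String input) {
        List<String> trace = new ArrayList<>();
        try {
            trace.add("try");
            int value = Integer.parseInt(input); // NumberFormatException
            int[] data = new int[2];
            data[value] = 1;         // ArrayIndexOutOfBoundsException
            trace.add("try-end");
        } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
            trace.add("catch");      // same handler for two unrelated types
        } finally {
            trace.add("finally");    // runs whether or not something was thrown
        }
        trace.add("after");
        return trace;
    }

    // Throws an unchecked exception that no local catch clause matches.
    static void propagate(List<String> trace) {
        try {
            trace.add("try");
            throw new IllegalStateException("not handled here");
        } catch (NumberFormatException e) {
            trace.add("catch");      // skipped: the type does not match
        } finally {
            trace.add("finally");    // runs with the exception still pending
        }
    }

    public static void main(String[] args) {
        System.out.println(run("1")); // [try, try-end, finally, after]
        System.out.println(run("x")); // [try, catch, finally, after]
        List<String> trace = new ArrayList<>();
        try {
            propagate(trace);
        } catch (IllegalStateException e) {
            trace.add("caller-catch");
        }
        System.out.println(trace);    // [try, finally, caller-catch]
    }
}
\end{lstlisting}

Both exceptions used in \texttt{run} are unchecked, so neither method needs a
\texttt{throws} clause in its header.
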
\section{Exception handling in other programming languages}
In almost all programming languages, errors exist, and must be dealt with.
Java's exception system is a common one among object-oriented programming
languages, but not the only one.
Most popular object-oriented languages feature some kind of error system,
normally very similar to Java's exceptions. In this section, we will perform a
small survey of the most popular programming languages. The ``most popular''
list has been obtained from the Stack Overflow 2019 Developer
Survey\footnotemark{} ($>5\%$ usage in the industry). The languages and their
usage in the industry are shown in Figure~\ref{fig:languages}.
Most of them feature an exception system similar to the one appearing in Java,
while others (Bash, assembly, VBA, C) have no built-in method, but allow
\carlos{todo}. Some
check if the exception is of a given set of types for the catching mechanism
(Java, C++, C\#, Python), whilst others rely on a condition that includes the
exception (JavaScript, TypeScript). All of them have a mechanism that catches
all exceptions, either by catching the type from which all exceptions inherit or
by providing no condition to check.
\footnotetext{\url{https://insights.stackoverflow.com/survey/2019/\#technology-\_-programming-scripting-and-markup-languages}}
Go doesn't have an exception system per se, but a simple one can be built by
using the built-in functions ``panic'' (throws an exception with an associated
value) and ``recover'' (stops the panic state and retrieves the value
associated with the panic), together with the ``defer'' statement (similar to
\textsl{finally}: the deferred call runs even when a panic is active).
Deferred calls run when the function that deferred them returns. Each deferred
call is pushed onto a stack, so the execution order
is LIFO. If a panic is raised, the deferred calls will still run, therefore
acting as a \textsl{finally}. The panic can only be stopped via a call to
``recover'' made from a deferred function, which obtains the value associated
with the panic. Then, the
exception
% vim: set noexpandtab:ts=2:sw=2:wrap