% !TeX encoding = UTF-8
% !TeX spellcheck = en_US
% !TeX root = paper.tex

\chapter{Introduction}

\section{Program slicing}

\textsl{Program slicing} \cite{Wei81,Sil12} is a debugging technique that
answers the question: ``which parts of a program affect a given statement and
variable?'' The statement and the variable are the basic input to create a
slice, and together they are called the \textsl{slicing criterion}. The
criterion can be more complex, as some slicing techniques require additional
pieces of input. The \textsl{slice} of a program is the subset of statements
of the original program that constitutes a valid program and whose execution
produces the same values for the variable of the slicing criterion as the
original program, as observed by a debugger at the selected statement.
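
As an illustration, the following Java sketch (a hypothetical example, not
taken from the cited works) shows a small program and a hand-computed slice
with respect to the criterion $\langle (*), \texttt{sum} \rangle$. The list
\texttt{trace} is an assumption of the sketch: it stands in for the debugger
that observes \texttt{sum} at the statement marked $(*)$.

\begin{lstlisting}[language=Java]
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of a program and one of its slices.
// Criterion: <statement marked (*), sum>; the trace list plays the
// role of the debugger that observes sum at (*).
class SliceExample {
    // Original program: computes the sum and the product of 1..n.
    static void original(int n, List<Integer> trace) {
        int sum = 0;
        int product = 1;
        for (int i = 1; i <= n; i++) {
            sum += i;
            product *= i;
        }
        trace.add(sum);              // (*)
        System.out.println(product);
    }

    // Slice: the statements that only involve product are removed,
    // because they cannot change the value of sum at (*).
    static void slice(int n, List<Integer> trace) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        trace.add(sum);              // (*)
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 5; n++) {
            List<Integer> a = new ArrayList<>();
            List<Integer> b = new ArrayList<>();
            original(n, a);
            slice(n, b);
            System.out.println("n=" + n + ": " + a + " " + b);
        }
    }
}
\end{lstlisting}

Removing the statements on \texttt{product} leaves the trace unchanged for every
input, which is the property a slice must preserve; the slice does not need to
print \texttt{product}, because that value is not part of the criterion.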

The slicing problem can be classified along two fundamental dimensions:
\begin{itemize}
	\item \textsl{Static} or \textsl{dynamic}: slicing can be performed
		statically or dynamically.
		A \textsl{static slice} \cite{Wei81} considers all possible executions
		of the program, taking into account only the semantics of the
		programming language.
		In contrast, \textsl{dynamic slicing} \cite{KorL88} limits the slice to
		the statements present in an execution log. The slicing criterion is
		expanded to include a position in the log that corresponds to one
		instance of the selected statement, which makes it much more specific.
		It may help find bugs related to non-deterministic behavior (such as
		the use of a random or pseudo-random number generator), but it must be
		recomputed for each execution being analyzed.
	\item \textsl{Backward} or \textsl{forward}: \textsl{backward slicing}
		\cite{Wei81} is the most widely used, because it looks for the
		statements that affect the slicing criterion. In contrast,
		\textsl{forward slicing} \cite{BerC85} computes the statements that are
		affected by the slicing criterion. There also exists a mixed approach
		called \textsl{chopping} \cite{JacR94}, which, given a source and a
		target criterion, finds the statements that are affected by the source
		and also affect the target.
\end{itemize}
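
Viewing slicing as reachability over dependences (a common simplification, used
here only for illustration; the notation is ours), write $n \to m$ when
statement $m$ depends directly on statement $n$, and $\to^{*}$ for the
reflexive and transitive closure of $\to$. Backward and forward slices, and the
chop between a source criterion $c_s$ and a target criterion $c_t$ (the
statements affected by $c_s$ that also affect $c_t$), can then be written as:
\begin{align*}
	\mathit{BS}(c) &= \{\, n \mid n \to^{*} c \,\} && \text{backward slice}\\
	\mathit{FS}(c) &= \{\, n \mid c \to^{*} n \,\} && \text{forward slice}\\
	\mathit{chop}(c_s, c_t) &= \mathit{FS}(c_s) \cap \mathit{BS}(c_t) && \text{chop}
\end{align*}
For a dynamic slice, the same formulas would apply to the dependences observed
in a single execution instead of all possible ones.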

Since the definition of program slicing, the most widespread form of slicing
has been \textsl{static backward slicing}, which obtains the list of statements
that affect the value of a variable in a given statement, in all possible
executions of the program (i.e., for any input data).

\begin{definition}[Strong static backward slice \cite{Wei81,HorwitzRB88}]
	\label{def:strong-slice}
	\carlos{Still need to check exactly which citation is the correct one.}
	Given a program $P$ and a slicing criterion $C = \langle s,v \rangle$, where
	$s$ is a statement and $v$ is a set of variables in $P$ (the variables may
	or may not be used in $s$), $S$ is a \textsl{strong slice} of $P$ with
	respect to $C$ if $S$ has the following properties:
	\begin{enumerate}
		\item $S$ is an executable program.
		\item $S \subseteq P$, i.e., $S$ is the result of removing code from $P$.
		\item For any input $I$, the values produced on each execution of $s$
			for each of the variables in $v$ are the same when executing $S$ as
			when executing $P$. \label{enum:exact-output}
	\end{enumerate}
\end{definition}

\begin{definition}[Weak static backward slice \cite{RepY89}]
	\label{def:weak-slice}
	\carlos{Check the citation and write this formally.}
	Same as Definition~\ref{def:strong-slice}, but
	property~\ref{enum:exact-output} is altered to: for any input $I$, the
	values produced on each execution of $s$ for each of the variables in $v$
	when running $P$ are a prefix of the values produced when running $S$.
\end{definition}

Both definitions (\ref{def:strong-slice} and~\ref{def:weak-slice}) are used
throughout the literature, with some works favoring the first and others the
second. Though the definitions come from the corresponding citations, the
naming was first used in a control dependence analysis by Danicic et
al.~\cite{DanBHHKL11}, where slices that produce the same output as the
original are named \textsl{strong}, and those where the output of the original
is a prefix of the output of the slice, \textsl{weak}. \carlos{It could be
argued that the weak slice is enough for debugging, since if an error appears
in the original program, it will also appear in the sliced program.}
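
The difference between both notions can be stated compactly. Assume that $S$
satisfies the first two properties of Definition~\ref{def:strong-slice}, write
$\mathit{val}_C(Q, I)$ (notation introduced here for illustration) for the
sequence of values taken by the variables in $v$ at each execution of $s$ when
a program $Q$ runs on input $I$, and write $\sqsubseteq$ for the prefix relation
on sequences. With the naming above, where the output of the original is a
prefix of the output of a weak slice:
\begin{align*}
	S \text{ is a strong slice of } P &\iff
		\forall I.\; \mathit{val}_C(S, I) = \mathit{val}_C(P, I)\\
	S \text{ is a weak slice of } P &\iff
		\forall I.\; \mathit{val}_C(P, I) \sqsubseteq \mathit{val}_C(S, I)
\end{align*}
Every strong slice is therefore also a weak slice, since every sequence is a
prefix of itself.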

Table~\ref{tab:slice-weak} shows an example: each row contains the values
logged at the slicing criterion during the execution of four different
programs. The first is the original, which computes $3!$. Slice A produces
exactly the same values, and is therefore a strong slice. Slice B is correct,
but it keeps producing values after the original stops: it fits the relaxed
(weak) definition, but not the strong one. Slice C is incorrect, as its values
differ from those of the original: some data or control dependence has not
been included in the slice, and the sliced program behaves differently.

\begin{table}
	\centering
	\begin{tabular}{r | r | r | r | r | r }
		Iteration & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4} & \textbf{5} \\ \hline
		Original  & 1 & 2 & 6 & - & - \\ \hline
		Slice A   & 1 & 2 & 6 & - & - \\ \hline
		Slice B   & 1 & 2 & 6 & 24 & 120 \\ \hline
		Slice C   & 1 & 1 & 3 & 5 & 8 \\
	\end{tabular}
	\caption{Execution logs of different slices and their original program.}
	\label{tab:slice-weak}
\end{table}
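
The rows Original and Slice B of Table~\ref{tab:slice-weak} can be reproduced
with the following Java sketch. The scenario is hypothetical: the original
stops right after logging $3! = 6$ because an unrelated division by zero ends it
abruptly, and Slice B, computed with respect to \texttt{f} at the logging
statement, drops that division and keeps running. The helper \texttt{classify}
(an illustrative name) compares two logs following the strong and weak
definitions.

\begin{lstlisting}[language=Java]
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical scenario reproducing the rows "Original" and "Slice B".
class WeakSliceExample {
    // Original: logs 1!, 2!, 3!, ... but an unrelated division on d
    // fails in the third iteration, so the program ends abruptly
    // (ArithmeticException) right after logging 6.
    static void original(List<Integer> log) {
        int f = 1;
        int d = 0;
        for (int i = 1; i <= 5; i++) {
            f *= i;
            log.add(f);          // criterion: <this statement, f>
            d = 10 / (3 - i);    // does not affect f; fails when i == 3
        }
    }

    // Slice B w.r.t. the criterion: the statements on d are removed,
    // so nothing stops the loop and two more values are logged.
    static void sliceB(List<Integer> log) {
        int f = 1;
        for (int i = 1; i <= 5; i++) {
            f *= i;
            log.add(f);
        }
    }

    // Runs a program and returns its log, even if it ends abruptly.
    static List<Integer> run(Consumer<List<Integer>> program) {
        List<Integer> log = new ArrayList<>();
        try {
            program.accept(log);
        } catch (ArithmeticException e) {
            // Abrupt termination: keep the values logged so far.
        }
        return log;
    }

    // Compares the log of a slice with the log of the original.
    // A strong slice also satisfies the weak definition; the stronger
    // label is reported in that case.
    static String classify(List<Integer> original, List<Integer> slice) {
        if (slice.equals(original)) {
            return "strong";
        }
        if (slice.size() > original.size()
                && slice.subList(0, original.size()).equals(original)) {
            return "weak";
        }
        return "incorrect";
    }

    public static void main(String[] args) {
        List<Integer> orig = run(WeakSliceExample::original);
        List<Integer> b = run(WeakSliceExample::sliceB);
        System.out.println(orig + " " + b + " " + classify(orig, b));
    }
}
\end{lstlisting}

Running \texttt{main} prints \texttt{[1, 2, 6] [1, 2, 6, 24, 120] weak}. The
sketch only compares the logs of a single run, whereas the definitions
quantify over every input $I$.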

The most efficient and broadly used data structure for slicing is the system
dependence graph (SDG), first introduced by Horwitz, Reps and Binkley
\cite{HorwitzRB88}. It represents the statements of a program as vertices, and
their dependencies as directed edges. Method calls are connected to method
definitions, and so are the corresponding input and output parameters. SDGs
show two different kinds of dependencies: \textsl{data} and \textsl{control}.
The first connects nodes that write to a variable (i.e., they \emph{define} its
value) to the nodes that use (or \textsl{may} use) that value, and it is often
represented as a dashed\todo{check} line.
Control dependencies represent which nodes control the execution of others
(mainly conditional jumps and loops), and they are often represented as a solid
line. In order to obtain a slice of a program, its SDG must first be built from
the source code ($\mathcal{O}(n^2)$). Then a two-pass search ($\mathcal{O}(n)$
each) is performed to obtain the slice.
The SDG can be reused to obtain a different slice of the same program (with a
different criterion or kind \carlos{change this word} of slice).
The efficiency derives from the linear cost of the search on the SDG, so most
proposed extensions change the cost of building the SDG, but try to keep the
slicing traversal linear.
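
The reuse of the graph can be sketched in Java under a simplifying assumption:
a single method without calls, where one backward traversal suffices (the
interprocedural SDG needs the two passes mentioned above). Nodes are
hypothetical statement labels, an edge from $a$ to $b$ means that $b$ depends on
$a$, and \texttt{DepGraph} is an illustrative class, not an existing library.

\begin{lstlisting}[language=Java]
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Minimal intraprocedural dependence graph (hypothetical class).
class DepGraph {
    // For each node, the nodes it depends on (sources of incoming edges).
    private final Map<String, List<String>> dependsOn = new HashMap<>();

    // Records that 'to' depends on 'from' (data or control dependence).
    void addEdge(String from, String to) {
        dependsOn.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
    }

    // Backward slice: every node from which the criterion is reachable.
    // Each node enters the worklist at most once, so the traversal is
    // linear in the size of the graph.
    Set<String> backwardSlice(String criterion) {
        Set<String> visited = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        visited.add(criterion);
        worklist.push(criterion);
        while (!worklist.isEmpty()) {
            String node = worklist.pop();
            for (String pred : dependsOn.getOrDefault(node, Collections.emptyList())) {
                if (visited.add(pred)) {
                    worklist.push(pred);
                }
            }
        }
        return visited;
    }

    // Graph for the hypothetical method:
    //   s1: int a = in[0];      s2: int b = in[1];
    //   s3: if (a > 0)          s4:     b = b * 2;
    //   s5: int c = a + 1;      s6: print(b);      s7: print(c);
    // (edges from the method entry are omitted for brevity)
    static DepGraph example() {
        DepGraph g = new DepGraph();
        g.addEdge("s1", "s3"); // data: a
        g.addEdge("s3", "s4"); // control: s4 runs only if a > 0
        g.addEdge("s2", "s4"); // data: b
        g.addEdge("s2", "s6"); // data: b, when the branch is not taken
        g.addEdge("s4", "s6"); // data: b, when the branch is taken
        g.addEdge("s1", "s5"); // data: a
        g.addEdge("s5", "s7"); // data: c
        return g;
    }

    public static void main(String[] args) {
        DepGraph g = example(); // built once...
        System.out.println(new TreeSet<>(g.backwardSlice("s6"))); // ...queried
        System.out.println(new TreeSet<>(g.backwardSlice("s7"))); // twice
    }
}
\end{lstlisting}

The graph is built once and queried twice: the slice for \texttt{s6} is
\{\texttt{s1}, \texttt{s2}, \texttt{s3}, \texttt{s4}, \texttt{s6}\} and the slice
for \texttt{s7} is \{\texttt{s1}, \texttt{s5}, \texttt{s7}\}, each obtained with a
single linear traversal.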

The SDG is built in three stages, each resulting in a different graph:
\begin{description}
	\item[CFG] The control flow graph represents the control flow of a single
		method of a program. Every statement has an edge from itself to every
		statement that can immediately follow it. This means that most
		statements will only have one outgoing edge, while conditional jumps
		and loops will have two. The graph starts in a ``Begin'' or ``Start''
		node, and ends in an ``End'' node, to which the last statement and all
		return statements are connected. It is created directly from the source
		code, without any need for data dependency analysis.
	\item[PDG] The program dependence graph is the result of restructuring a
		CFG and adding data dependencies to it. All statements are placed below
		and connected to a ``Begin'' node, except those which are inside a loop
		or conditional block, which are placed below and connected to the loop
		or condition that controls them. Then data dependencies are added (red
		or dashed edges), with an edge between two nodes whenever there is a
		data dependency between them.
		\todo{add definitions?}
	\item[SDG] Finally, the system dependence graph is the interconnection of
		each method's PDG. When a call is made, the input arguments are passed
		to subnodes of the call, and the result is obtained in another subnode.
		There is an edge from the call to the beginning of the corresponding
		method, and an extra type of edge exists: \textsl{summary edges}, which
		summarize the data dependencies between the input and output variables
		of a call.
\end{description}
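
The following Java sketch illustrates how summary edges are used by the
two-pass traversal, following our reading of the two-phase algorithm of
Horwitz, Reps and Binkley~\cite{HorwitzRB88}; the edge kinds, the node names and
the class \texttt{Sdg} are hypothetical. In the first pass, the traversal
follows every edge backwards except parameter-output edges, so it reaches the
callers but does not descend into called methods; summary edges stand in for
the effect of each call. In the second pass, it starts from every node already
reached and follows every edge except call and parameter-input edges, so it
descends into the called methods without returning to their callers.

\begin{lstlisting}[language=Java]
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical SDG with typed edges and a two-phase backward slice.
class Sdg {
    enum Kind { CONTROL, DATA, SUMMARY, CALL, PARAM_IN, PARAM_OUT }

    // An incoming edge: its source node and its kind.
    static final class Edge {
        final String from;
        final Kind kind;

        Edge(String from, Kind kind) {
            this.from = from;
            this.kind = kind;
        }
    }

    private final Map<String, List<Edge>> incoming = new HashMap<>();

    void addEdge(String from, String to, Kind kind) {
        incoming.computeIfAbsent(to, k -> new ArrayList<>()).add(new Edge(from, kind));
    }

    // Backward reachability from 'start', never following 'ignored' kinds.
    private Set<String> traverse(Collection<String> start, Set<Kind> ignored) {
        Set<String> visited = new HashSet<>(start);
        Deque<String> worklist = new ArrayDeque<>(start);
        while (!worklist.isEmpty()) {
            String node = worklist.pop();
            for (Edge e : incoming.getOrDefault(node, Collections.emptyList())) {
                if (!ignored.contains(e.kind) && visited.add(e.from)) {
                    worklist.push(e.from);
                }
            }
        }
        return visited;
    }

    Set<String> slice(String criterion) {
        // Phase 1: ascend to callers, but do not descend into callees.
        Set<String> phase1 = traverse(Collections.singleton(criterion), EnumSet.of(Kind.PARAM_OUT));
        // Phase 2: descend into callees, but do not ascend to callers.
        return traverse(phase1, EnumSet.of(Kind.CALL, Kind.PARAM_IN));
    }

    // main: m1: x = 1; m2: y = 2; m3: z = inc(x) (a_in, a_out); m4: print(z); m5: print(y)
    // inc:  e: entry; f_in: formal-in p; i1: r = p + 1; f_out: formal-out r
    // (the entry node of main is omitted for brevity)
    static Sdg example() {
        Sdg g = new Sdg();
        g.addEdge("m1", "a_in", Kind.DATA);
        g.addEdge("m3", "a_in", Kind.CONTROL);
        g.addEdge("m3", "a_out", Kind.CONTROL);
        g.addEdge("a_in", "a_out", Kind.SUMMARY); // z depends on x through inc
        g.addEdge("a_out", "m4", Kind.DATA);
        g.addEdge("m2", "m5", Kind.DATA);
        g.addEdge("m3", "e", Kind.CALL);
        g.addEdge("a_in", "f_in", Kind.PARAM_IN);
        g.addEdge("f_out", "a_out", Kind.PARAM_OUT);
        g.addEdge("e", "f_in", Kind.CONTROL);
        g.addEdge("e", "i1", Kind.CONTROL);
        g.addEdge("e", "f_out", Kind.CONTROL);
        g.addEdge("f_in", "i1", Kind.DATA);
        g.addEdge("i1", "f_out", Kind.DATA);
        return g;
    }

    public static void main(String[] args) {
        System.out.println(new TreeSet<>(example().slice("m4")));
        System.out.println(new TreeSet<>(example().slice("i1")));
    }
}
\end{lstlisting}

For the criterion \texttt{m4}, the slice contains the body of \texttt{inc} but
neither \texttt{m2} nor \texttt{m5}. For a criterion inside \texttt{inc}
(\texttt{i1}), the first pass climbs to the call site in \texttt{main}, and the
result excludes \texttt{a\_out} and \texttt{f\_out}.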

An example is provided in Figure~\ref{fig:basic-graphs}, where a simple
multiplication program is converted to a CFG, then a PDG and finally an SDG.
For simplicity, only the CFG and PDG of \texttt{multiply} are shown. Control
dependencies are black, data dependencies red and summary edges blue.

\begin{figure}
	\centering
	% \lstinputlisting[firstline=8, lastline=16]{./dot/simple.java}
	\includegraphics[width=0.5\linewidth]{img/multiplycfg}
	\includegraphics[width=\linewidth]{img/multiplypdg}
	\includegraphics[width=\linewidth]{img/multiplysdg}
	\caption{A simple multiplication program, its CFG, PDG and SDG.}
	\label{fig:basic-graphs}
\end{figure}

The original proposal by Weiser~\cite{Wei81} covers the simplest subset of an
imperative programming language. The various iterations\todo{cite} until
reaching the SDG\todo{cite} have added other elements, such as return
statements\todo{cite}, global variables\todo{cite}, object-oriented
features\todo{cite} and finally exception handling~\cite{AllH03}.

\subsection{Metrics}

The following metrics are considered when evaluating a slicing algorithm:
\begin{description}
	\item[Completeness] The solution includes all the statements that affect the
		slicing criterion. This is the most important feature, and almost all
		publications achieve at least completeness. Trivial completeness is easy
		to achieve: it suffices to include the whole program in the slice.
	\item[Correctness] The solution excludes all statements that do not affect
		the slicing criterion. Most solutions are complete, but the degree of
		correctness is what sets them apart, as smaller slices will not execute
		unnecessary code to compute the values, decreasing the execution time.
	\item[Features covered] The language, and the features of that language,
		that a slicing algorithm covers. Different approaches to slicing cover
		different programming languages and even paradigms. There are slicing
		techniques (published or commercially available) for most popular
		programming languages, from C++ to Erlang. Some slicing techniques only
		cover a subset of the targeted language, and as such are less useful
		for commercial applications, but they can be a stepping stone in the
		advancement of the field.
	\item[Speed] Speed of graph generation and slice creation. As previously
		mentioned, slicing is a two-step process: build a graph and traverse it.
		The traversal is linear in most proposals, with small variations. Graph
		generation tends to be slower and to vary more, but it is not as
		relevant, because it is only done once per program being analyzed. As
		such, this is the least important metric. Only proposals that deviate
		from the aforementioned schema show a wider variation in speed.
\end{description}
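
The first two metrics can be restated as set inclusions. Let $R_C \subseteq P$
be the set of statements of $P$ that actually affect the slicing criterion $C$
(notation introduced here for illustration). Then:
\begin{align*}
	S \text{ is complete with respect to } C &\iff R_C \subseteq S\\
	S \text{ is correct with respect to } C &\iff S \subseteq R_C
\end{align*}
The trivial slice $S = P$ is always complete, since $R_C \subseteq P$, and the
only slice that is both complete and correct is $R_C$ itself. Because $R_C$
cannot be computed exactly in general (finding minimal slices is undecidable),
practical algorithms guarantee completeness and approximate correctness.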
\subsection{Program slicing as a debugging technique}

Program slicing is first and foremost a debugging technique, and each of its
variations serves a different purpose:
\begin{description}
	\item[Backward static]
\end{description}

\section{Exception handling in Java}
\label{sec:intro-exception}

Exception handling is common in most modern programming languages. In Java, it
consists of the following elements:
\begin{description}
	\item[Throwable] The class from which all exceptions and errors that may be
		thrown inherit. Its child classes are \texttt{Exception}, for most
		errors, and \texttt{Error}, for internal errors of the Java Virtual
		Machine. Exceptions can be classified in two categories:
		\textsl{unchecked} (those inheriting from \texttt{RuntimeException} or
		\texttt{Error}) and \textsl{checked} (the rest). The first may be thrown
		anywhere, whereas the second, if thrown, must be caught or declared in
		the method header.
	\item[throw] A statement that activates an exception, altering the normal
		control flow of the method. If the statement is inside a \textsl{try}
		block with a \textsl{catch} clause for its type or any supertype, the
		control flow will continue in the first statement of such clause.
		Otherwise, the method is exited and the check is performed again in the
		caller, until either the exception is caught or the last method in the
		stack (\textsl{main}) is popped, and the execution of the program ends
		abruptly.
	\item[try] This statement is followed by a block of statements and by zero
		or more \textsl{catch} clauses. All exceptions thrown in the contained
		statements, or in any method called from them, will be processed by the
		list of \textsl{catch} clauses. Optionally, after the \textsl{catch}
		clauses a \textsl{finally} block may appear; a \textsl{try} must have
		at least one \textsl{catch} clause or a \textsl{finally} block.
	\item[catch] Contains two elements: a variable declaration (whose type must
		be an exception type) and a block of statements to be executed when an
		exception of the corresponding type (or a subtype) is thrown.
		\textsl{catch} clauses are processed sequentially, and if one matches
		the type of the thrown exception, its block is executed, and the rest
		are ignored. Variable declarations may list multiple types
		\texttt{(T1|T2 exc)}, when two unrelated types of exception must be
		caught and the same code executed for both. When there is an
		inheritance relationship between them, the parent suffices (listing
		both is a compile-time error).\footnotemark
	\item[finally] Contains a block of statements that will always be executed
		if the \textsl{try} is entered. It is used to tidy up, for example
		closing I/O streams. The \textsl{finally} block can be reached in two
		ways: with an exception pending (thrown in the \textsl{try} block and
		not caught by any \textsl{catch}, or thrown inside a \textsl{catch}) or
		without it (when the \textsl{try} or \textsl{catch} block ends
		normally). After the last instruction of the block is executed, if an
		exception is pending, it continues propagating as described for
		\textsl{throw}: control is passed to the corresponding \textsl{catch} of
		an enclosing \textsl{try} or calling method, or the program ends.
		Otherwise, the execution continues in the next statement after the
		\textsl{try-catch-finally} block.
\end{description}

\footnotetext{Introduced in Java 7, see \url{https://docs.oracle.com/javase/7/docs/technotes/guides/language/catch-multiple.html} for more details.}
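
The following self-contained Java sketch exercises the elements described
above; the class and method names are illustrative. The list \texttt{trace}
records the order in which the blocks run in four hypothetical scenarios: no
exception (mode 0), an unchecked exception caught by a multi-catch clause
(mode 1), a checked exception declared with \texttt{throws} (mode 2), and an
exception that no clause catches, which is still pending when \textsl{finally}
finishes and propagates to the caller (mode 3).

\begin{lstlisting}[language=Java]
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ExceptionDemo {
    // Checked exception: it must be declared with 'throws' in the header.
    static void readConfig(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("config not found");
        }
    }

    // Records in 'trace' the order in which the blocks are executed.
    static void run(int mode, List<String> trace) {
        try {
            trace.add("try");
            if (mode == 1) {
                throw new IllegalStateException("unchecked"); // a RuntimeException
            }
            if (mode == 3) {
                throw new ArithmeticException("not caught here");
            }
            readConfig(mode == 2);
            trace.add("try-end");
        } catch (IllegalStateException | IOException e) {
            // Multi-catch: two unrelated types share the same handler.
            trace.add("catch " + e.getClass().getSimpleName());
        } finally {
            // Always runs once the try has been entered.
            trace.add("finally");
        }
        trace.add("after");
    }

    // The caller catches the exception left pending after 'finally'.
    static List<String> runAndCatch(int mode) {
        List<String> trace = new ArrayList<>();
        try {
            run(mode, trace);
        } catch (ArithmeticException e) {
            trace.add("caller caught " + e.getClass().getSimpleName());
        }
        return trace;
    }

    public static void main(String[] args) {
        for (int mode = 0; mode <= 3; mode++) {
            System.out.println(mode + ": " + runAndCatch(mode));
        }
    }
}
\end{lstlisting}

For mode 3, the trace is \texttt{[try, finally, caller caught
ArithmeticException]}: the \textsl{finally} block runs, but the statement after
the \textsl{try-catch-finally} block does not.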

\section{Exception handling in other programming languages}

Errors can occur in almost every programming language, and they must be dealt
with. Java's exception system is common among object-oriented programming
languages, but it is not the only one: most popular object-oriented languages
feature some kind of error system, normally very similar to Java's exceptions.
In this section, we perform a small survey of the most popular programming
languages. The ``most popular'' list has been obtained from the Stack Overflow
2019 Developer Survey\footnotemark{} ($>5\%$ usage in the industry). The
languages and their usage in the industry are shown in
Figure~\ref{fig:languages}.

Most of them feature an exception system similar to the one appearing in Java,
while others (bash, assembly, VBA, C) have no built-in method, but allow
\carlos{todo}. Some check whether the exception is of a given set of types for
the catching mechanism (Java, C++, C\#, Python), whilst others rely on a
condition that includes the exception (JavaScript, TypeScript). All of them
have a mechanism that catches all exceptions, either by catching the type from
which all exceptions inherit or by providing no condition to check.

\footnotetext{\url{https://insights.stackoverflow.com/survey/2019/\#technology-\_-programming-scripting-and-markup-languages}}

Go does not have an exception system per se, but a simple one can be built with
the built-in functions \texttt{panic} (throw an exception with an associated
value) and \texttt{recover} (stop the panic state and retrieve the value
associated with the panic), together with the \texttt{defer} statement (similar
to \textsl{finally}, as deferred code runs even when a panic is activated).
Deferred calls are run when the surrounding function returns, whether normally
or because of a panic. Each deferred call is pushed onto a stack, so the
execution order is LIFO. If a panic occurs, deferred code still runs, therefore
acting as a \textsl{finally}. The panic can only be stopped by calling
\texttt{recover} inside a deferred function, which obtains the value associated
with the panic. Then, the exception

% vim: set noexpandtab:ts=2:sw=2:wrap