% !TeX encoding = UTF-8
% !TeX spellcheck = en_US
% !TeX root = paper.tex

\chapter{Introduction}

\section{Program slicing}
\textsl{Program slicing} \cite{Wei81,Sil12} is a debugging technique that
answers the question: ``which parts of a program affect a given statement and
variable?'' The statement and the variable are the basic input to create a slice
and are called the \textsl{slicing criterion}. The criterion can be more
complex, as different slicing techniques may require additional pieces of input.
The \textsl{slice} of a program is a subset of the statements of the original
program that constitutes a valid program on its own and whose execution yields,
at the selected statement, the same values for the selected variable as the
original program (as a debugger would observe them).

There are two fundamental dimensions along which slicing techniques can be
classified:
\begin{itemize}
	\item \textsl{Static} or \textsl{dynamic}: slicing can be performed
	statically or dynamically.
	\textsl{Static slicing} \cite{Wei81} produces a slice that considers all
	possible executions of the program, taking into account only the
	semantics of the programming language.
	In contrast, \textsl{dynamic slicing} \cite{KorL88} limits the slice to
	the statements present in an execution log. The slicing criterion is
	expanded to include a position in the log that corresponds to one
	instance of the selected statement, making it much more specific. It may
	help find bugs related to nondeterministic behavior (such as those caused
	by a random or pseudo-random number generator), but it must be recomputed
	for each execution being analyzed.
	\item \textsl{Backward} or \textsl{forward}: \textsl{backward slicing}
	\cite{Wei81} is the more widely used of the two, because it looks at the
	statements that affect the slicing criterion. In contrast,
	\textsl{forward slicing} \cite{BerC85} computes the statements that are
	affected by the slicing criterion. There also exists a mixed approach
	called \textsl{chopping} \cite{JacR94}, which, given a source and a
	target criterion, finds the statements that are affected by the source
	and, in turn, affect the target.
\end{itemize}
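
The three directions can be illustrated as reachability queries over a graph of
dependencies. The following Java sketch assumes that the control and data
dependencies have already been computed and stored as adjacency sets; the class,
its methods and the hand-built graph for the \texttt{multiply} method of
Figure~\ref{fig:basic-graphs} are hypothetical simplifications (the parameters are
treated as defined by an \texttt{entry} node) and need not match the graphs drawn
in that figure. Plain reachability only corresponds to intraprocedural slicing;
the interprocedural traversal of an SDG requires a more elaborate algorithm.

\begin{lstlisting}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an edge from -> to means "to depends on from"
// (either a control or a data dependency).
class DependenceGraph {
    private final Map<String, Set<String>> dependents = new HashMap<>();
    private final Map<String, Set<String>> dependencies = new HashMap<>();

    void addDependence(String from, String to) {
        dependents.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        dependencies.computeIfAbsent(to, k -> new HashSet<>()).add(from);
    }

    // Worklist reachability: every node and edge is processed at most once.
    private static Set<String> reach(String start, Map<String, Set<String>> edges) {
        Set<String> visited = new HashSet<>(List.of(start));
        Deque<String> worklist = new ArrayDeque<>(visited);
        while (!worklist.isEmpty()) {
            for (String next : edges.getOrDefault(worklist.pop(), Set.of())) {
                if (visited.add(next)) {
                    worklist.push(next);
                }
            }
        }
        return visited;
    }

    // Backward slice: everything the criterion (transitively) depends on.
    Set<String> backward(String criterion) {
        return reach(criterion, dependencies);
    }

    // Forward slice: everything that (transitively) depends on the criterion.
    Set<String> forward(String criterion) {
        return reach(criterion, dependents);
    }

    // Chop: statements affected by the source that also affect the target.
    Set<String> chop(String source, String target) {
        Set<String> result = new HashSet<>(forward(source));
        result.retainAll(backward(target));
        return result;
    }

    // Hand-built dependencies of multiply(x, y); "entry" stands for the
    // method entry, which also defines the parameters x and y.
    static DependenceGraph multiplyExample() {
        DependenceGraph g = new DependenceGraph();
        // Control dependencies
        for (String s : List.of("result=0", "while", "print", "return")) {
            g.addDependence("entry", s);
        }
        g.addDependence("while", "result+=y");
        g.addDependence("while", "x--");
        // Data dependencies (definition -> use)
        String[][] data = {
            {"entry", "while"}, {"entry", "x--"}, {"entry", "result+=y"},
            {"x--", "while"}, {"x--", "x--"},
            {"result=0", "result+=y"}, {"result=0", "print"}, {"result=0", "return"},
            {"result+=y", "result+=y"}, {"result+=y", "print"}, {"result+=y", "return"}
        };
        for (String[] edge : data) {
            g.addDependence(edge[0], edge[1]);
        }
        return g;
    }
}
\end{lstlisting}

On this graph, the backward slice of \texttt{print} contains every node except
\texttt{return}, whereas the chop from \texttt{result=0} to \texttt{print} keeps
only \texttt{result=0}, \texttt{result+=y} and \texttt{print}.
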

Since program slicing was first defined, its most widespread form has been
\textsl{static backward slicing}, which obtains the statements that affect the
value of a variable at a given statement in all possible executions of the
program (i.e., for any input data).
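
As an illustration, consider the following hypothetical Java method, which
computes both the sum and the product of $1, \dots, n$, prints the product and
returns the sum. Taking the slicing criterion
$\langle \texttt{return sum}, \{\texttt{sum}\} \rangle$, the statements that only
involve \texttt{product} cannot affect the criterion, so a static backward slice
may drop them, while the loop header must stay because it controls
\texttt{sum += i} and defines \texttt{i}. The slice below was derived by hand for
this example only.

\begin{lstlisting}
// Hypothetical example of a static backward slice (derived by hand).
class StaticSliceExample {
    // Original program.
    static int original(int n) {
        int sum = 0;
        int product = 1;
        for (int i = 1; i <= n; i++) {
            sum += i;
            product *= i;
        }
        System.out.println(product);
        return sum;                    // slicing criterion: <this statement, {sum}>
    }

    // Slice: only the statements that can affect sum at the criterion remain.
    static int slice(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) { // kept: controls "sum += i" and defines i
            sum += i;
        }
        return sum;
    }
}
\end{lstlisting}

Both methods return the same value for every \texttt{n}; the printed product is
not preserved by the slice, since it is not part of the criterion.
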

\begin{definition}[Strong static backward slice \cite{Wei81,HorwitzRB88}]
	\label{def:strong-slice}
	\carlos{Check exactly which citation is the correct one.}
	Given a program $P$ and a slicing criterion $C = \langle s,v \rangle$, where
	$s$ is a statement and $v$ is a set of variables in $P$ (the variables may
	or may not be used in $s$), $S$ is a \textsl{strong slice} of $P$ with
	respect to $C$ if $S$ has the following properties:
	\begin{enumerate}
		\item $S$ is an executable program.
		\item $S \subseteq P$, i.e., $S$ is the result of removing code from $P$.
		\item For any input $I$, the values produced on each execution of $s$
		for each of the variables in $v$ are the same when executing $S$ as
		when executing $P$. \label{enum:exact-output}
	\end{enumerate}
\end{definition}

\begin{definition}[Weak static backward slice \cite{RepY89}]
	\label{def:weak-slice}
	\carlos{Check the citation and write this formally.}
	Same as Definition~\ref{def:strong-slice}, but
	property~\ref{enum:exact-output} is replaced by: for any input $I$, the
	sequence of values produced on each execution of $s$ for each of the
	variables in $v$ when running $P$ is a prefix of the sequence produced
	when running $S$.
\end{definition}

Both definitions (\ref{def:strong-slice} and~\ref{def:weak-slice}) are used
throughout the literature, some works favoring the first and others the second.
Although the definitions come from the corresponding citations, the naming was
first used in a control dependency analysis by Danicic
et al.~\cite{DanBHHKL11}, where slices that produce the same output as the
original are named \textsl{strong}, and those where the output of the original
is a prefix of the output of the slice are named \textsl{weak}.
\carlos{One could argue that a weak slice is sufficient for debugging, since if
an error appears in the original program, it will also appear in the sliced
program.}

Table~\ref{tab:slice-weak} shows an example: each row lists the values logged at
the slicing criterion during the execution of four different programs. The first
is the original program, which computes $3!$. Slice A produces an identical log
and is therefore a strong slice. Slice B is correct but keeps producing values
after the original stops: it satisfies the weak definition but not the strong
one. Slice C is incorrect, as its values differ from those of the original: some
data or control dependency has not been included in the slice, and the sliced
program behaves differently.

\begin{table}
	\centering
	\begin{tabular}{r | r | r | r | r | r }
		Iteration & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4} & \textbf{5} \\ \hline
		Original & 1 & 2 & 6 & - & - \\ \hline
		Slice A & 1 & 2 & 6 & - & - \\ \hline
		Slice B & 1 & 2 & 6 & 24 & 120 \\ \hline
		Slice C & 1 & 1 & 3 & 5 & 8 \\
	\end{tabular}
	\caption{Execution logs of different slices and their original program.}
	\label{tab:slice-weak}
\end{table}
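
The comparison behind Table~\ref{tab:slice-weak} can be expressed as a check on
two logs. The Java sketch below is hypothetical (class and method names are
illustrative): it compares the values observed at the criterion for a single
input, so it can show that a slice is incorrect, but establishing that a slice is
strong or weak requires the property to hold for every input.

\begin{lstlisting}
import java.util.List;

// Hypothetical sketch: compares the values logged at the slicing criterion
// by the original program and by a candidate slice, for one input.
class SliceLogCheck {
    enum Kind { STRONG, WEAK, INCORRECT }

    static Kind classify(List<Integer> original, List<Integer> slice) {
        if (slice.equals(original)) {
            return Kind.STRONG; // identical logs (these also satisfy the weak definition)
        }
        // Weak: the original log is a proper prefix of the slice's log.
        boolean originalIsPrefix = slice.size() > original.size()
                && slice.subList(0, original.size()).equals(original);
        return originalIsPrefix ? Kind.WEAK : Kind.INCORRECT;
    }
}
\end{lstlisting}

Applied to the rows of Table~\ref{tab:slice-weak}, it classifies Slice A as
strong, Slice B as weak and Slice C as incorrect.
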

Program slicing is a language-agnostic technique, but the original proposal by
Weiser~\cite{Wei81} covers a simple imperative programming language. Since then,
dozens of authors have expanded the literature, describing and implementing
slicing for more complex structures, such as unstructured control
flow~\cite{HorwitzRB88}, global variables~\cite{???} and exception
handling~\cite{AllH03}, and for other programming paradigms, such as
object-oriented languages~\cite{???} or functional languages~\cite{???}.
\carlos{More can be added; the corresponding citations are missing.}

\subsection{The System Dependence Graph (SDG)}

There are multiple approaches to computing a slice from a given program and
criterion, but the most efficient and most widely used data structure is the
System Dependence Graph (SDG), first introduced by Horwitz, Reps and
Binkley~\cite{HorwitzRB88}. The graph is built from the program's statements;
once it has been built, a slicing criterion is chosen, the graph is traversed
using a specific algorithm, and the slice is obtained. Its efficiency stems from
the fact that, when multiple slices are computed for the same program, the graph
only needs to be built once. Moreover, building the graph has a complexity of
$\mathcal{O}(n^2)$ with respect to the number of statements in the program,
whereas the traversal is linear with respect to the size of the graph (its
nodes, each corresponding to a statement, and its edges).

The SDG is a directed graph, and as such it has vertices or nodes, each
representing an instruction in the program (barring some auxiliary nodes
introduced by some approaches), and directed edges, which represent the
dependencies among nodes. These edges represent various kinds of dependencies
(control, data, calls, parameter passing and summary), which will be defined in
Section~\ref{sec:first-def-sdg}.

To create the SDG, first a \textsl{control flow graph} (CFG) is built for each
method in the program; then its control and data dependencies are computed,
resulting in the \textsl{program dependence graph} (PDG). Finally, the graphs of
all methods are joined into the SDG. This process will be explained at greater
length in Section~\ref{sec:first-def-sdg}.
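
One common way to obtain the data dependencies from a CFG is a
reaching-definitions analysis: a statement that uses a variable depends on every
definition of that variable that can reach it along some path without being
overwritten. The Java sketch below applies this iterative dataflow analysis to a
hand-encoded CFG of the \texttt{multiply} method of Figure~\ref{fig:basic-graphs};
the node numbering, the treatment of parameters as definitions at the entry node
and all class and method names are assumptions for illustration, not necessarily
the construction used later in this work.

\begin{lstlisting}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: data dependencies via reaching definitions.
class ReachingDefinitions {
    // A definition of `variable` at CFG node `node`.
    record Def(int node, String variable) {}

    // succ.get(i): CFG successors of node i; defs/uses: variables written/read by i.
    // Returns the data dependencies as "definingNode->usingNode" strings.
    static Set<String> dataDependences(List<List<Integer>> succ,
                                       List<Set<String>> defs,
                                       List<Set<String>> uses) {
        int n = succ.size();
        List<Set<Def>> in = new ArrayList<>();
        List<Set<Def>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            in.add(new HashSet<>());
            out.add(new HashSet<>());
        }
        boolean changed = true;
        while (changed) { // iterate until a fixpoint is reached
            changed = false;
            for (int i = 0; i < n; i++) {
                // OUT[i] = definitions made at i, plus those in IN[i] not killed by i
                Set<Def> newOut = new HashSet<>();
                for (Def d : in.get(i)) {
                    if (!defs.get(i).contains(d.variable())) {
                        newOut.add(d);
                    }
                }
                for (String v : defs.get(i)) {
                    newOut.add(new Def(i, v));
                }
                if (!newOut.equals(out.get(i))) {
                    out.set(i, newOut);
                    changed = true;
                }
                // IN[s] is the union of OUT over all predecessors of s
                for (int s : succ.get(i)) {
                    changed |= in.get(s).addAll(newOut);
                }
            }
        }
        Set<String> deps = new HashSet<>();
        for (int i = 0; i < n; i++) {
            for (Def d : in.get(i)) {
                if (uses.get(i).contains(d.variable())) {
                    deps.add(d.node() + "->" + i);
                }
            }
        }
        return deps;
    }

    // Nodes: 0 entry (defines x, y), 1 result = 0, 2 while (x > 0),
    // 3 result += y, 4 x--, 5 println(result), 6 return result, 7 exit.
    static Set<String> multiplyExample() {
        List<List<Integer>> succ = List.of(List.of(1), List.of(2), List.of(3, 5),
                List.of(4), List.of(2), List.of(6), List.of(7), List.of());
        List<Set<String>> defs = List.of(Set.of("x", "y"), Set.of("result"), Set.of(),
                Set.of("result"), Set.of("x"), Set.of(), Set.of(), Set.of());
        List<Set<String>> uses = List.of(Set.of(), Set.of(), Set.of("x"),
                Set.of("result", "y"), Set.of("x"), Set.of("result"),
                Set.of("result"), Set.of());
        return dataDependences(succ, defs, uses);
    }
}
\end{lstlisting}

For \texttt{multiply}, the analysis yields eleven data dependencies, including
the loop-carried ones from node 4 (the decrement of \texttt{x}) back to node 2
(the loop condition) and from node 3 (\texttt{result += y}) to itself.
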

%TODO: marked for removal, as this process is repeated later in ref{sec:first-def-sdg}
%\begin{description}
%\item[CFG] The control flow graph is the representation of the control
%dependencies in a method of a program. Every statement has an edge from
%itself to every statement that can immediately follow. This means that
%most will only have one outgoing edge, and conditional jumps and loops
%will have two. The graph starts in a ``Begin'' or ``Start'' node, and
%ends in an ``End'' node, to which the last statement and all return
%statements are connected. It is created directly from the source code,
%without any need for data dependency analysis.
%\item[PDG] The program dependence graph is the result of restructuring and
%adding data dependencies to a CFG. All statements are placed below and
%connected to a ``Begin'' node, except those which are inside a loop or
%conditional block. Then data dependencies are added (red or dashed
%edges), adding an edge between two nodes if there is a data dependency.
%\item[SDG] Finally, the system dependence graph is the interconnection of
%each method's PDG. When a call is made, the input arguments are passed
%to subnodes of the call, and the result is obtained in another subnode.
%There is an edge from the call to the beginning of the corresponding
%method, and an extra type of edge exists: \textsl{summary edges}, which
%summarize the data dependencies between input and output variables.
%\end{description}

An example is provided in Figure~\ref{fig:basic-graphs}, where a simple
multiplication program is converted into a CFG, then a PDG and finally an SDG.
For simplicity, only the CFG and PDG of \texttt{multiply} are shown. Control
dependencies are shown in black, data dependencies in red and summary edges in
blue.

\begin{figure}
	\centering
	\begin{minipage}{0.4\linewidth}
\begin{lstlisting}
int multiply(int x, int y) {
    int result = 0;
    while (x > 0) {
        result += y;
        x--;
    }
    System.out.println(result);
    return result;
}
\end{lstlisting}
	\end{minipage}
	\begin{minipage}{0.59\linewidth}
		\includegraphics[width=\linewidth]{img/multiplycfg}
	\end{minipage}
	\includegraphics[width=\linewidth]{img/multiplypdg}
	\includegraphics[width=\linewidth]{img/multiplysdg}
	\caption{A simple multiplication program, its CFG, PDG and SDG.}
	\label{fig:basic-graphs}
\end{figure}

\subsection{Metrics}

There are four relevant metrics to consider when evaluating a slicing algorithm:

\begin{description}
	\item[Completeness] The solution includes all the statements that affect the
	slicing criterion. This is the most important feature, and almost all
	publications achieve at least completeness. Trivial completeness is easy to
	achieve: it suffices to include the whole program in the slice.
	\item[Correctness] The solution excludes all statements that do not affect
	the slicing criterion. Most solutions are complete, but the degree of
	correctness is what sets them apart, as smaller slices do not execute
	unnecessary code to compute the values, which decreases the execution time.
	\item[Features covered] The language features, or the languages, that a
	slicing algorithm covers. Different approaches to slicing cover different
	programming languages and even paradigms. There are slicing techniques
	(published or commercially available) for most popular programming
	languages, from C++ to Erlang. Some slicing techniques only cover a subset
	of the targeted language, and as such are less useful for commercial
	applications, but they can be a stepping stone towards advancing the field.
	\item[Speed] Speed of graph generation and slice creation. As mentioned
	above, slicing is a two-step process: build a graph and traverse it. The
	traversal is linear in most proposals, with small variations. Graph
	generation tends to take longer and to vary more, but it is less relevant,
	because it is only done once per program being analyzed. As such, this is
	the least important metric. Only proposals that deviate from the
	aforementioned schema show a wider variation in speed.
\end{description}
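
Completeness and correctness can be phrased as set inclusions between the
computed slice and a reference slice containing exactly the statements that
affect the criterion. The Java sketch below assumes that such a reference is
available (for instance, built by hand for a small benchmark) and identifies
statements by hypothetical line numbers; the class and method names are
illustrative.

\begin{lstlisting}
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: completeness and correctness of a computed slice,
// measured against a reference slice assumed to be known in advance.
class SliceMetrics {
    // Complete: no statement of the reference slice is missing.
    static boolean isComplete(Set<Integer> computed, Set<Integer> reference) {
        return computed.containsAll(reference);
    }

    // Correct: no statement outside the reference slice has been included.
    static boolean isCorrect(Set<Integer> computed, Set<Integer> reference) {
        return reference.containsAll(computed);
    }

    // Statements included in the slice without affecting the criterion.
    static Set<Integer> superfluous(Set<Integer> computed, Set<Integer> reference) {
        Set<Integer> extra = new HashSet<>(computed);
        extra.removeAll(reference);
        return extra;
    }
}
\end{lstlisting}

For a program of seven statements with reference slice $\{1, \dots, 5\}$, the
trivial slice containing the whole program is complete but not correct, and
\texttt{superfluous} reports statements 6 and 7.
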

\section{Exception handling in Java}
\label{sec:intro-exception}

Exception handling is common in most modern programming languages. In Java, it
consists of the following elements:
\begin{description}
	\item[Throwable] The class from which all the exceptions and errors that
	may be thrown inherit. Its child classes are \texttt{Exception}, for most
	errors, and \texttt{Error}, for serious problems such as internal errors
	of the Java Virtual Machine. Exceptions can be classified in two
	categories: \textsl{unchecked} (those inheriting from
	\texttt{RuntimeException} or \texttt{Error}) and \textsl{checked} (the
	rest). The former may be thrown anywhere, whereas the latter, if thrown,
	must be caught or declared in the method header with a \texttt{throws}
	clause.
	\item[throw] A statement that throws an exception, altering the normal
	control flow of the method. If the statement is inside a \textsl{try}
	block with a \textsl{catch} clause for its type or any supertype, control
	flow continues at the first statement of the first such clause.
	Otherwise, the method is exited and the check is performed again in the
	caller, until either the exception is caught or the last method in the
	stack (\textsl{main}) is popped, and the execution of the program ends
	abruptly.
	\item[try] This statement is followed by a block of statements and by one
	or more \textsl{catch} clauses, a \textsl{finally} block, or both. All
	exceptions thrown by the statements in the block, or by any method they
	call, are processed by the list of \textsl{catch} clauses.
	\item[catch] Contains two elements: a variable declaration (whose type must
	be a subtype of \texttt{Throwable}) and a block of statements to be
	executed when an exception of the corresponding type (or a subtype) is
	thrown. \textsl{catch} clauses are processed sequentially; the block of
	the first one that matches the type of the thrown exception is executed,
	and the rest are ignored. A variable declaration may list multiple types
	(\texttt{(T1|T2 exc)}) when two unrelated types of exception must be
	caught and the same code executed for both. When the types are related by
	inheritance, the parent suffices (the compiler rejects alternatives that
	are subclasses of one another).\footnotemark
	\item[finally] Contains a block of statements that will always be executed
	if the \textsl{try} block is entered. It is used to tidy up, for example by
	closing I/O streams. The \textsl{finally} block can be reached in two ways:
	with an exception pending (thrown in the \textsl{try} block and not caught
	by any \textsl{catch} clause, or thrown inside a \textsl{catch} clause) or
	without one (when the \textsl{try} or \textsl{catch} block ends
	successfully). After the last instruction of the block is executed, if
	there is an exception pending, it is propagated as described for
	\textsl{throw}: control is passed to an enclosing \textsl{catch} clause
	that matches it, possibly in a calling method, or the program ends.
	Otherwise, execution continues at the next statement after the
	\textsl{try-catch-finally} statement.
\end{description}

\footnotetext{Introduced in Java 7; see \url{https://docs.oracle.com/javase/7/docs/technotes/guides/language/catch-multiple.html} for more details.}
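
The interplay between \textsl{throw}, \textsl{try}, \textsl{catch} and
\textsl{finally} described above can be observed by recording which blocks run.
The following Java sketch (class and method names are illustrative) covers a
checked exception, an unchecked exception handled by a multi-catch clause, and an
exception with no matching \textsl{catch} clause, which is propagated to the
caller after the \textsl{finally} block has run.

\begin{lstlisting}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: records the order in which the blocks are executed.
class ExceptionFlow {
    // IOException is checked, so it must be declared with "throws".
    static void readConfig(boolean fail, List<String> trace) throws IOException {
        if (fail) {
            throw new IOException("missing file");
        }
        trace.add("read ok");
    }

    static void run(int scenario, List<String> trace) {
        try {
            trace.add("try");
            if (scenario == 1) {
                readConfig(true, trace);                     // checked exception
            } else if (scenario == 2) {
                Integer.parseInt("not a number");            // unchecked NumberFormatException
            } else if (scenario == 3) {
                throw new IllegalStateException("no match"); // no catch clause below matches
            } else {
                readConfig(false, trace);                    // no exception
            }
            trace.add("end of try");
        } catch (IOException e) {
            trace.add("catch IOException");
        } catch (NumberFormatException | ArithmeticException e) { // multi-catch (Java 7+)
            trace.add("catch multi");
        } finally {
            trace.add("finally");                            // runs in every scenario
        }
        trace.add("after try");                              // skipped if an exception is pending
    }

    // The caller handles the exception that run() does not catch.
    static List<String> traceOf(int scenario) {
        List<String> trace = new ArrayList<>();
        try {
            run(scenario, trace);
        } catch (IllegalStateException e) {
            trace.add("caught by caller");
        }
        return trace;
    }
}
\end{lstlisting}

For instance, \texttt{traceOf(3)} records the \textsl{try} block and the
\textsl{finally} block, followed by the caller's handler; the statement that
follows the \textsl{try} statement in \texttt{run} is never reached.
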

\section{Exception handling in other programming languages}

In almost all programming languages errors can occur (whether through the fault
of the developer, the user or the system), and they must be dealt with. Most
popular object-oriented programming languages feature some kind of error
handling system, normally very similar to Java's exceptions. In this section, we
perform a small survey of the error-handling techniques used in the most popular
programming languages. The list of languages has been extracted from a survey
performed by the programming Q\&A website Stack
Overflow\footnote{\url{https://stackoverflow.com}}. The survey contains a
question about the technologies used by professional developers in their work,
and from that list we have extracted the languages with at least $5\%$ usage in
the industry. Table~\ref{tab:popular-languages} shows the list and its source.

\begin{table}
	\begin{minipage}{0.6\linewidth}
		\centering
		\begin{tabular}{r | r }
			\textbf{Language} & \textbf{Usage (\%)} \\ \hline
			JavaScript & 69.7 \\ \hline
			HTML/CSS & 63.1 \\ \hline
			SQL & 56.5 \\ \hline
			Python & 39.4 \\ \hline
			Java & 39.2 \\ \hline
			Bash/Shell/PowerShell & 37.9 \\ \hline
			C\# & 31.9 \\ \hline
			PHP & 25.8 \\ \hline
			TypeScript & 23.5 \\ \hline
			C++ & 20.4 \\ \hline
		\end{tabular}
	\end{minipage}
	\begin{minipage}{0.39\linewidth}
		\begin{tabular}{r | r }
			\textbf{Language} & \textbf{Usage (\%)} \\ \hline
			C & 17.3 \\ \hline
			Ruby & 8.9 \\ \hline
			Go & 8.8 \\ \hline
			Swift & 6.8 \\ \hline
			Kotlin & 6.6 \\ \hline
			R & 5.6 \\ \hline
			VBA & 5.5 \\ \hline
			Objective-C & 5.2 \\ \hline
			Assembly & 5.0 \\ \hline
		\end{tabular}
	\end{minipage}
	% The caption has a weird structure due to the fact that there's a footnote
	% inside of it.
	\caption[Commonly used programming languages]{The most commonly used
		programming languages among professional developers\protect\footnotemark}
	\label{tab:popular-languages}
\end{table}

\footnotetext{Data from \url{https://insights.stackoverflow.com/survey/2019/\#technology-\_-programming-scripting-and-markup-languages}}

Most of them feature an exception system similar to Java's, while others (Bash,
assembly, VBA, C) have no built-in exception mechanism, but allow
\carlos{todo}. Some select the handler by checking whether the exception belongs
to a given set of types (Java, C++, C\#, Python), whilst others rely on a
condition that inspects the exception (JavaScript, TypeScript). All of them have
a mechanism that catches all exceptions, either by catching the type from which
all exceptions inherit or by providing no condition to check.
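
The difference between the two families can be sketched in Java alone. Java's
own mechanism is type-based; the second method below only emulates the
condition-based style, in which a single handler receives the exception and
decides what to do with it, by catching a broad type and testing an arbitrary
condition. The class and method names, and the ``retry'' message convention, are
hypothetical.

\begin{lstlisting}
// Illustrative sketch contrasting two ways of selecting a handler.
class CatchStyles {
    // Type-based (Java's own mechanism): the first catch clause whose type
    // matches the thrown exception is selected.
    static String typeBased(Runnable action) {
        try {
            action.run();
            return "ok";
        } catch (IllegalArgumentException e) {
            return "bad argument";
        } catch (RuntimeException e) { // any other RuntimeException (not Error)
            return "other: " + e.getClass().getSimpleName();
        }
    }

    // Condition-based, emulated in Java: one handler receives the exception and
    // evaluates an arbitrary condition on it (here, on its message).
    static String conditionBased(Runnable action) {
        try {
            action.run();
            return "ok";
        } catch (RuntimeException e) {
            String message = e.getMessage();
            if (message != null && message.startsWith("retry")) {
                return "retryable";
            }
            throw e; // condition not met: propagate the exception unchanged
        }
    }
}
\end{lstlisting}

In \texttt{conditionBased}, the decision may depend on any property of the
exception, such as its message, and not only on its type; exceptions that do not
satisfy the condition are rethrown, much like an exception that no clause
matches in the type-based style.
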

Go does not have an exception system per se, but a simple one can be built by
using the built-in functions \texttt{panic} (which throws an exception with an
associated value) and \texttt{recover} (which stops the panic and retrieves its
associated value), together with the \texttt{defer} statement (which schedules a
function call to be run later). A deferred call is run when the surrounding
function returns, either normally or because of a panic. Deferred calls are
stored in a stack, so their execution order is LIFO. Since they still run when a
panic occurs, they act as a \textsl{finally} block. A panic can only be stopped
via \texttt{recover}, which returns the value associated with the panic; it only
has this effect when called directly by a deferred function. Then, the
exception

% vim: set noexpandtab:tabstop=2:sw=2:wrap