% !TEX encoding = UTF-8
% !TEX spellcheck = en_US
% !TEX root = paper.tex

\chapter{Introduction}

\section{Program slicing}
\textsl{Program slicing} \cite{Wei81,Sil12} is a debugging technique that
answers the question: ``which parts of a program affect a given statement and
variable?'' The statement and the variable are the basic input needed to create
a slice and are called the \textsl{slicing criterion}. The criterion can be more
complex, as some slicing techniques require additional pieces of input.
The \textsl{slice} of a program is the subset of statements of the original
program that forms a valid program and whose execution produces the same values
for the variable selected in the slicing criterion, as read by a debugger at
the selected statement.
Slicing problems can be classified along two fundamental dimensions:
\begin{itemize}
	\item \textsl{Static} or \textsl{dynamic}: slicing can be performed
		statically or dynamically.
		\textsl{Static slicing} \cite{Wei81} produces a slice that considers all
		possible executions of the program, only taking into account the
		semantics of the programming language.
		In contrast, \textsl{dynamic slicing} \cite{KorL88} limits the slice to
		the statements present in an execution log. The slicing criterion is
		expanded to include a position in the log that corresponds to one
		instance of the selected statement, making it much more specific. It may
		help to find a bug related to nondeterministic behavior (such as a random
		or pseudo-random number generator), but the slice must be recomputed for
		each case being analyzed.
	\item \textsl{Backward} or \textsl{forward}: \textsl{backward slicing}
		\cite{Wei81} is the more commonly used of the two, because it looks for
		the statements that affect the slicing criterion. In contrast,
		\textsl{forward slicing} \cite{BerC85} computes the statements that are
		affected by the slicing criterion. There also exists a mixed approach
		called \textsl{chopping} \cite{JacR94}, which is used to find all
		statements that affect or are affected by the slicing criterion.
\end{itemize}

Since program slicing was first defined, its most widespread form has been
\textsl{static backward slicing}, which obtains the list of statements that
affect the value of a variable in a given statement, in all possible executions
of the program (i.e., for any input data).

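As a minimal illustration of a static backward slice, consider a method that
computes a sum and a product in the same loop. The program below and the names
\texttt{SliceExample}, \texttt{original} and \texttt{slice} are assumptions made
for this sketch; they are not taken from the cited literature, and the slice has
been written by hand rather than computed by a slicer. Slicing with respect to
the final \texttt{return} statement and the variable \texttt{sum} removes every
statement that only contributes to \texttt{prod}, while the value of
\texttt{sum} observed at the criterion stays the same for any input.

\begin{lstlisting}[language=Java]
final class SliceExample {
    // Original program: computes the sum and the product of 1..n.
    static int original(int n) {
        int sum = 0;
        int prod = 1;
        for (int i = 1; i <= n; i++) {
            sum += i;
            prod *= i;
        }
        System.out.println(prod);
        return sum; // slicing criterion: this statement, variable sum
    }

    // Hand-written slice: statements that only affect prod are removed.
    static int slice(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 5; n++) {
            System.out.println(n + ": " + original(n) + " vs " + slice(n));
        }
    }
}
\end{lstlisting}
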
\begin{definition}[Strong static backward slice \cite{Wei81,HorwitzRB88}]
	\label{def:strong-slice}
	\carlos{The exact citation still needs to be verified.}
	Given a program $P$ and a slicing criterion $C = \langle s,v \rangle$, where
	$s$ is a statement and $v$ is a set of variables in $P$ (the variables may
	or may not be used in $s$), $S$ is a \textsl{strong slice} of $P$ with
	respect to $C$ if $S$ has the following properties:
	\begin{enumerate}
		\item $S$ is an executable program.
		\item $S \subseteq P$, that is, $S$ is the result of removing code from $P$.
		\item For any input $I$, the values produced on each execution of $s$
			for each of the variables in $v$ are the same when executing $S$ as
			when executing $P$. \label{enum:exact-output}
	\end{enumerate}
\end{definition}

\begin{definition}[Weak static backward slice \cite{RepY89}]
	\label{def:weak-slice}
	\carlos{Check the citation and write this formally.}
	Same as definition~\ref{def:strong-slice}, but
	property~\ref{enum:exact-output} is altered to: For any input $I$, the
	values produced on each execution of $s$ for each of the variables in $v$
	when running $P$ are a prefix of the values produced when running $S$.
\end{definition}

Both definitions (\ref{def:strong-slice} and~\ref{def:weak-slice}) are
used throughout the literature, with some works favoring the first and some the
second. Though the definitions come from the corresponding citations, the naming
was first used in a control dependency analysis by Danicic et
al.~\cite{DanBHHKL11}, where slices which produce the same output as the
original are named \textsl{strong}, and those where the output of the original
is a prefix of the output of the slice, \textsl{weak} \carlos{It could be argued
that a weak slice is sufficient for debugging, since if an error appears in the
original, it will also appear in the sliced program}.
See table~\ref{tab:slice-weak} for an example, in which each row shows the
values logged at the slicing criterion during the execution of one of four
programs. The first is the original, which computes $3!$. Slice A is a slice
whose execution is identical, and it is therefore a strong slice. Slice B is
correct but continues producing values after the original stops, which makes it
a weak slice: it fits the relaxed definition but not the strong one. Slice C is
incorrect, as its values differ from the original: some data or control
dependency has not been included in the slice, and the program behaves
differently.

\begin{table}
	\centering
	\begin{tabular}{r | r | r | r | r | r }
		Iteration & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4} & \textbf{5} \\ \hline
		Original  & 1 & 2 & 6 & -  & -   \\ \hline
		Slice A   & 1 & 2 & 6 & -  & -   \\ \hline
		Slice B   & 1 & 2 & 6 & 24 & 120 \\ \hline
		Slice C   & 1 & 1 & 3 & 5  & 8   \\
	\end{tabular}
	\caption{Execution logs of different slices and their original program.}
	\label{tab:slice-weak}
\end{table}

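The classification used in table~\ref{tab:slice-weak} can be checked
mechanically by comparing the sequences of values logged at the slicing
criterion. The following sketch is only an illustration: the class
\texttt{SliceKind}, its method \texttt{classify} and the enum constants are
hypothetical names, and the method compares a single pair of logs, whereas the
definitions quantify over every possible input. A slice is reported as strong
when both logs are identical, as weak when the log of the original is a proper
prefix of the log of the slice, and as incorrect otherwise.

\begin{lstlisting}[language=Java]
import java.util.List;

final class SliceKind {
    enum Kind { STRONG, WEAK, INCORRECT }

    // Compares the values logged at the slicing criterion by the original
    // program and by a candidate slice, for one execution of each.
    static Kind classify(List<Integer> original, List<Integer> slice) {
        if (slice.size() < original.size()) {
            return Kind.INCORRECT; // the slice stopped before the original
        }
        // The log of the original must be a prefix of the log of the slice.
        if (!slice.subList(0, original.size()).equals(original)) {
            return Kind.INCORRECT;
        }
        return slice.size() == original.size() ? Kind.STRONG : Kind.WEAK;
    }

    public static void main(String[] args) {
        List<Integer> original = List.of(1, 2, 6);
        System.out.println(classify(original, List.of(1, 2, 6)));          // Slice A
        System.out.println(classify(original, List.of(1, 2, 6, 24, 120))); // Slice B
        System.out.println(classify(original, List.of(1, 1, 3, 5, 8)));    // Slice C
    }
}
\end{lstlisting}
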
Program slicing is a language-agnostic tool, but the original proposal by
Weiser~\cite{Wei81} covers a simple imperative programming language.
Since then, the literature has been expanded by dozens of authors, who have
described and implemented slicing for more complex structures, such as
uncontrolled control flow~\cite{HorwitzRB88}, global variables~\cite{???} and
exception handling~\cite{AllH03}; and for other programming paradigms, such as
object-oriented languages~\cite{???} or functional languages~\cite{???}.
\carlos{More could be added; the corresponding citations are missing.}

\subsection{The System Dependence Graph (SDG)}

There exist multiple approaches to compute a slice from a given program and
criterion, but the most efficient and broadly used data structure is the System
Dependence Graph (SDG), first introduced by Horwitz, Reps and
Binkley~\cite{HorwitzRB88}. It is computed from the program's statements; once
it has been built, a slicing criterion is chosen, the graph is traversed using
a specific algorithm, and the slice is obtained. Its efficiency resides in the
fact that, for multiple slices of the same program, the graph only needs to be
built once. On top of that, building the graph has a complexity of
$\mathcal{O}(n^2)$ with respect to the number of statements in a program, but
the traversal is linear with respect to the number of nodes in the graph (each
corresponding to a statement).

The SDG is a directed graph, and as such it has vertices or nodes, each
representing an instruction in the program (barring some auxiliary nodes
introduced by some approaches), and directed edges, which represent the
dependencies among nodes. Those edges represent various kinds of dependencies
(control, data, call, parameter passing and summary), which will be defined in
section~\ref{sec:first-def-sdg}.

To create the SDG, first a \textsl{control flow graph} (CFG) is built for each
method in the program, then its control and data dependencies are computed,
resulting in the \textsl{program dependence graph} (PDG). Finally, the graphs of
all the methods are joined into the SDG. This process will be explained at
greater length in section~\ref{sec:first-def-sdg}.
%TODO: marked for removal: this process is repeated later in ref{sec:first-def-sdg}
%\begin{description}
%	\item[CFG] The control flow graph is the representation of the control
%		dependencies in a method of a program. Every statement has an edge from
%		itself to every statement that can immediately follow. This means that
%		most will only have one outgoing edge, and conditional jumps and loops
%		will have two. The graph starts in a ``Begin'' or ``Start'' node, and
%		ends in an ``End'' node, to which the last statement and all return
%		statements are connected. It is created directly from the source code,
%		without any need for data dependency analysis.
%	\item[PDG] The program dependence graph is the result of restructuring and
%		adding data dependencies to a CFG. All statements are placed below and
%		connected to a ``Begin'' node, except those which are inside a loop or
%		conditional block. Then data dependencies are added (red or dashed
%		edges), adding an edge between two nodes if there is a data dependency.
%	\item[SDG] Finally, the system dependence graph is the interconnection of
%		each method's PDG. When a call is made, the input arguments are passed
%		to subnodes of the call, and the result is obtained in another subnode.
%		There is an edge from the call to the beginning of the corresponding
%		method, and an extra type of edge exists: \textsl{summary edges}, which
%		summarize the data dependencies between input and output variables.
%\end{description}
An example is provided in figure~\ref{fig:basic-graphs}, where a simple
multiplication program is converted to a CFG, then a PDG and finally an SDG. For
simplicity, only the CFG and PDG of \texttt{multiply} are shown. Control
dependencies are black, data dependencies red and summary edges blue.

\begin{figure}
	\centering
	\begin{minipage}{0.4\linewidth}
\begin{lstlisting}
int multiply(int x, int y) {
    int result = 0;
    while (x > 0) {
        result += y;
        x--;
    }
    System.out.println(result);
    return result;
}
\end{lstlisting}
	\end{minipage}
	\begin{minipage}{0.59\linewidth}
		\includegraphics[width=\linewidth]{img/multiplycfg}
	\end{minipage}
	\includegraphics[width=\linewidth]{img/multiplypdg}
	\includegraphics[width=\linewidth]{img/multiplysdg}
	\caption{A simple multiplication program, its CFG, PDG and SDG}
	\label{fig:basic-graphs}
\end{figure}

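Once a dependence graph is available, computing a backward slice reduces to a
reachability problem. The sketch below applies that idea to a single method,
that is, to a PDG rather than a full SDG, whose interprocedural traversal
requires additional care. The node numbering and the dependence edges for
\texttt{multiply} are hand-derived assumptions made for this example (they are
not read from figure~\ref{fig:basic-graphs}), and \texttt{PdgSlicer} and
\texttt{backwardSlice} are hypothetical names. Each node is visited at most
once, which is what keeps the traversal cheap compared to building the graph.

\begin{lstlisting}[language=Java]
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PdgSlicer {
    // MULTIPLY.get(n) lists the nodes that n is control or data dependent on.
    // 0 = method entry (parameters x and y), 1 = "int result = 0",
    // 2 = "while (x > 0)", 3 = "result += y", 4 = "x--",
    // 5 = "System.out.println(result)", 6 = "return result".
    static final Map<Integer, List<Integer>> MULTIPLY = Map.of(
            0, List.<Integer>of(),
            1, List.of(0),
            2, List.of(0, 4),
            3, List.of(0, 1, 2, 3),
            4, List.of(0, 2, 4),
            5, List.of(0, 1, 3),
            6, List.of(0, 1, 3));

    // Collects every node reachable from the criterion by following
    // dependence edges backwards.
    static Set<Integer> backwardSlice(Map<Integer, List<Integer>> dependsOn,
                                      int criterion) {
        Set<Integer> slice = new TreeSet<>();
        Deque<Integer> pending = new ArrayDeque<>();
        pending.push(criterion);
        while (!pending.isEmpty()) {
            int node = pending.pop();
            if (slice.add(node)) { // true only the first time a node is seen
                for (int dep : dependsOn.getOrDefault(node, List.<Integer>of())) {
                    pending.push(dep);
                }
            }
        }
        return slice;
    }

    public static void main(String[] args) {
        System.out.println(backwardSlice(MULTIPLY, 4)); // criterion: x--
        System.out.println(backwardSlice(MULTIPLY, 5)); // criterion: println
    }
}
\end{lstlisting}

Under these assumed edges, slicing on \texttt{x--} keeps only the method entry,
the loop condition and the decrement itself, whereas slicing on the
\texttt{println} call keeps every statement except the final \texttt{return}.
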
\subsection{Metrics}

Four metrics are relevant when evaluating a slicing algorithm:

\begin{description}
	\item[Completeness] The solution includes all the statements that affect the
		slicing criterion. This is the most important feature, and almost all
		publications achieve at least completeness. Trivial completeness is
		easily achievable, simply by including the whole program in the slice.
	\item[Correctness] The solution excludes all statements that do not affect
		the slicing criterion. Most solutions are complete, but the degree of
		correctness is what sets them apart, as smaller slices will not execute
		unnecessary code to compute the values, decreasing the execution time.
	\item[Features covered] Which language features or languages a slicing
		algorithm covers. Different approaches to slicing cover different
		programming languages and even paradigms. There are slicing techniques
		(published or commercially available) for most popular programming
		languages, from C++ to Erlang. Some slicing techniques only cover a
		subset of the targeted language, and as such are less useful for
		commercial applications, but can be a stepping stone in the advancement
		of the field.
	\item[Speed] Speed of graph generation and slice creation. As noted above,
		slicing is a two-step process: build a graph and traverse it.
		The traversal is linear in most proposals, with small variations. Graph
		generation tends to take longer and to vary more, but it is less
		relevant, because it is only done once per program being analyzed. As
		such, this is the least important metric. Only proposals that deviate
		from the aforementioned scheme show a wider variation in speed.
\end{description}

\section{Exception handling in Java}
\label{sec:intro-exception}

Exception handling is common in most modern programming languages. In Java, it
consists of the following elements:
\begin{description}
	\item[Throwable] The class that encompasses all the exceptions or errors
		that may be thrown. Its direct subclasses are \texttt{Exception} for
		most errors and \texttt{Error} for internal errors in the Java Virtual
		Machine. Exceptions can be classified in two categories:
		\textsl{unchecked} (those inheriting from \texttt{RuntimeException} or
		\texttt{Error}) and \textsl{checked} (the rest). The first may be thrown
		anywhere, whereas the second, if thrown, must be caught or declared in
		the method header.
	\item[throw] A statement that activates an exception, altering the normal
		control flow of the method. If the statement is inside a \textsl{try}
		block with a \textsl{catch} clause for its type or any supertype, the
		control flow will continue in the first statement of that clause.
		Otherwise, the method is exited and the check is performed again, until
		either the exception is caught or the last method in the stack
		(\textsl{main}) is popped, and the execution of the program ends
		abruptly.
	\item[try] This statement is followed by a block of statements and by one or
		more \textsl{catch} clauses, a \textsl{finally} block, or both. All
		exceptions thrown in the statements contained or in any methods called
		will be processed by the list of catches. When both are present, the
		\textsl{finally} block appears after the \textsl{catch} clauses.
	\item[catch] Contains two elements: a variable declaration (the type must be
		an exception) and a block of statements to be executed when an exception
		of the corresponding type (or a subtype) is thrown. \textsl{catch}
		clauses are processed sequentially, and if any matches the type of the
		thrown exception, its block is executed, and the rest are ignored.
		Variable declarations may be of multiple types \texttt{(T1|T2 exc)},
		when two unrelated types of exception must be caught and the same code
		executed for both. When there is an inheritance relationship, the parent
		suffices.\footnotemark
	\item[finally] Contains a block of statements that will always be executed
		if the \textsl{try} is entered. It is used to tidy up, for example
		closing I/O streams. The \textsl{finally} can be reached in two ways:
		with an exception pending (thrown in \textsl{try} and not captured by
		any \textsl{catch}, or thrown inside a \textsl{catch}) or without it
		(when the \textsl{try} or \textsl{catch} block ends successfully). After
		the last instruction of the block is executed, if there is an exception
		pending, control will be passed to the \textsl{catch} clause that
		handles it, if any, or the program will end. Otherwise, the execution
		continues in the next statement after the \textsl{try-catch-finally}
		block.
\end{description}

\footnotetext{Introduced in Java 7, see \url{https://docs.oracle.com/javase/7/docs/technotes/guides/language/catch-multiple.html} for more details.}

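The interaction between these elements can be observed in a short program. The
sketch below is only illustrative: \texttt{ExceptionFlow}, \texttt{run} and the
trace strings are hypothetical names chosen for this example. It records the
order in which the blocks execute when a checked exception, an unchecked
exception or no exception at all is thrown inside the \textsl{try}, and it uses
a multi-catch clause for two unrelated exception types.

\begin{lstlisting}[language=Java]
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class ExceptionFlow {
    // Returns the order in which the blocks run for a given scenario.
    static List<String> run(String scenario) {
        List<String> trace = new ArrayList<>();
        try {
            trace.add("try");
            if (scenario.equals("checked")) {
                throw new IOException("checked exception");
            }
            if (scenario.equals("unchecked")) {
                throw new IllegalStateException("unchecked exception");
            }
            trace.add("try-end");
        } catch (IOException | IllegalArgumentException e) {
            // Multi-catch: two unrelated types share the same handler.
            trace.add("catch-multi");
        } catch (RuntimeException e) {
            // Reached by IllegalStateException, a subtype of RuntimeException.
            trace.add("catch-runtime");
        } finally {
            // Runs in every scenario once the try block has been entered.
            trace.add("finally");
        }
        trace.add("after");
        return trace;
    }

    public static void main(String[] args) {
        for (String scenario : List.of("none", "checked", "unchecked")) {
            System.out.println(scenario + " -> " + run(scenario));
        }
    }
}
\end{lstlisting}

In the three scenarios the \textsl{finally} block runs exactly once, and at most
one \textsl{catch} clause is executed: the first one whose declared type matches
the exception.
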
\subsection{Exception handling in other programming languages}

In almost all programming languages, errors can appear (through the fault of
the developer, the user or the system), and must be dealt with. Most of the
popular object-oriented languages feature some kind of error system, normally
very similar to Java's exceptions. In this section, we will perform a small
survey of the error-handling techniques used in the most popular programming
languages. The language list has been extracted from a survey performed by the
programming Q\&A website Stack
Overflow\footnote{\url{https://stackoverflow.com}}. The survey contains a
question about the technologies used by professional developers in their work,
and from that list we have extracted those languages with more than $5\%$ usage
in the industry. Table~\ref{tab:popular-languages} shows the list and its
source. Except for Bash, Assembly, VBA, C and Go, the languages shown
feature an exception system similar to the one found in Java.

\begin{table}
	\begin{minipage}{0.6\linewidth}
		\centering
		\begin{tabular}{r | r }
			\textbf{Language} & $\%$ usage \\ \hline
			JavaScript & 69.7 \\ \hline
			HTML/CSS & 63.1 \\ \hline
			SQL & 56.5 \\ \hline
			Python & 39.4 \\ \hline
			Java & 39.2 \\ \hline
			Bash/Shell/PowerShell & 37.9 \\ \hline
			C\# & 31.9 \\ \hline
			PHP & 25.8 \\ \hline
			TypeScript & 23.5 \\ \hline
			C++ & 20.4 \\ \hline
		\end{tabular}
	\end{minipage}
	\begin{minipage}{0.39\linewidth}
		\begin{tabular}{r | r }
			\textbf{Language} & $\%$ usage \\ \hline
			C & 17.3 \\ \hline
			Ruby & 8.9 \\ \hline
			Go & 8.8 \\ \hline
			Swift & 6.8 \\ \hline
			Kotlin & 6.6 \\ \hline
			R & 5.6 \\ \hline
			VBA & 5.5 \\ \hline
			Objective-C & 5.2 \\ \hline
			Assembly & 5.0 \\ \hline
		\end{tabular}
	\end{minipage}
	% The caption has a weird structure due to the fact that there's a footnote
	% inside of it.
	\caption[Commonly used programming languages]{The most commonly used
		programming languages by professional developers\protect\footnotemark}
	\label{tab:popular-languages}
\end{table}

\footnotetext{Data from \url{https://insights.stackoverflow.com/survey/2019/\#technology-\_-programming-scripting-and-markup-languages}}

The exception systems that are similar to Java's are nearly identical to one
another, featuring a \texttt{throw} statement (\texttt{raise} in Python), a
try-catch structure and, in most cases, a finally block that may be appended to
try blocks. The difference resides in the value carried by the exception: in
languages that feature inheritance, it is an instance of a class descending
from a generic error or exception, and in languages without it, it is an
arbitrary value (e.g. JavaScript, TypeScript). In object-oriented languages,
the filtering is performed by checking whether the exception is a subtype of
the exception being caught (Java, C++, C\#, PowerShell\footnotemark, etc.); in
languages with arbitrary exception values, a boolean condition is specified,
and the first catch block whose condition is fulfilled is activated, following
a pattern similar to that of \texttt{switch} statements (e.g. JavaScript). In
both cases there exists a way to indicate that all exceptions should be caught,
regardless of type and content.

\footnotetext{Only since version 2.0, released with Windows 7.}

On the other hand, the remaining languages offer a variety of mechanisms that
emulate or replace exception handling:

\begin{description} % bash, vba, C and Go exceptions explained
	\item[Bash] The popular Bourne Again SHell features no exception system, apart
		from the user's ability to inspect the return code of the last statement
		executed. Traps can also be used to capture erroneous states and tidy up all
		files and environment variables before exiting the program. Traps allow the
		programmer to react to a signal sent by the user or the system, or to an
		exit performed from within the Bash environment. When a trap is activated,
		its code runs, and the signal does not go on to stop the program. This does
		not replace a fully featured exception system, but \texttt{bash} programs
		tend to be small, with programmers preferring the efficiency of C or the
		conveniences of other high-level languages when the task requires it.
	\item[VBA] Visual Basic for Applications is a scripting programming language
		based on Visual Basic that is integrated into Microsoft Office to automate
		small tasks, such as generating documents from templates, making advanced
		computations that are impossible or slower with spreadsheet functions, etc.
		The only error-handling mechanism it has is the directive \texttt{On Error
		$x$}, where $x$ can be \texttt{GoTo 0} (lets the error crash the program),
		\texttt{Resume Next} (continues the execution as if nothing had happened) or
		\texttt{GoTo} followed by a label in the program (the execution jumps to
		the label in case of error). The directive can be set and reset multiple
		times, therefore creating artificial \texttt{try-catch} blocks, but there is
		no possibility of attaching a value to the error, lowering its usefulness.
	\item[C] In C, errors can also be controlled via return values, but some of the
		functions it provides can be used to create a simple exception system.
		\texttt{setjmp} and \texttt{longjmp} set up and perform inter-function
		jumps. The first saves a snapshot of the calling environment in a buffer,
		and the second returns to the position where the buffer was saved,
		discarding the current state of the stack and restoring the snapshot. Then,
		the execution continues from the evaluation of \texttt{setjmp}, which
		returns the second argument passed to \texttt{longjmp}. An example can be
		seen in figure~\ref{fig:exceptions-c}, where line 2 of the \texttt{main}
		function will be executed twice: the first time when it is run normally
		(returning 0), and the second when line 3 of \texttt{safe\_sqrt} is run,
		returning the second argument of the call in line 3 and therefore entering
		the else block of the \texttt{main} function.
	\item[Go] The programming language Go is the odd one out in this section, being a
		modern programming language without exceptions, though this is an
		intentional design decision made by its authors\footnotemark. The argument
		made was that exception handling systems introduce abnormal control flow and
		complicate code analysis and clean code generation, as the paths that the
		code may follow are not clear. Instead, Go allows functions to return
		multiple values, with the second value typically being of an error type. The
		error is checked before the value, and acted upon. Additionally, Go also
		features a simple panic system, with the built-in function \texttt{panic}
		(throws an exception with an associated value), the \texttt{defer}
		statement (schedules a call that runs once the function has ended or when a
		\texttt{panic} has been activated) and the built-in function
		\texttt{recover} (stops the panic state and retrieves its value). The
		\texttt{defer} statement doubles as catch and finally, and multiple
		instances can be accumulated. When appropriate, they will run in LIFO order
		(Last In, First Out).
		Then, the exception \carlos{complete}
\end{description}

\footnotetext{\url{https://golang.org/doc/faq\#exceptions}}

\begin{figure} % example of exception system in C
	\centering
	\begin{minipage}{0.5\linewidth}
\begin{lstlisting}[language=C]
int main() {
    if (!setjmp(ref)) {
        res = safe_sqrt(x, ref);
    } else {
        // Handle error
        printf /* ... */
    }
}
\end{lstlisting}
	\end{minipage}
	\begin{minipage}{0.49\linewidth}
\begin{lstlisting}[language=C]
double safe_sqrt(double x, jmp_buf ref) {
    if (x < 0)
        longjmp(ref, 1);
    return /* ... */;
}
\end{lstlisting}
	\end{minipage}
	\caption{A user-created exception system in C}
	\label{fig:exceptions-c}
\end{figure}

% vim: set noexpandtab:tabstop=2:shiftwidth=2:softtabstop=2:wrap