checkpoing

Carlos Galindo 2019-12-09 20:47:09 +00:00
parent 1eb9175286
commit 20bfa1a8c0
23 changed files with 519 additions and 443 deletions


@ -9,26 +9,23 @@ Before delving into the specific problems that exist in program slicing currentl
\section{Program slicing}
This section provides a series of definitions and background information so that future definitions can be grounded in a common foundation. \carlos{expand intro?}
\begin{definition}[Program slicing] \label{def:program-slicing}
\textit{Program slicing} is the process of extracting a slice $S$ given a program $P$ and a slicing criterion $SC$.
\end{definition}
This section provides a series of definitions and background information so that future definitions can be grounded in a common foundation.
\begin{definition}[Slicing criterion] \label{def:slicing-criterion}
Given a program $P$, composed of statements and containing variables $x_1, x_2, \dots, x_n \in \textnormal{vars}$, a \textit{slicing criterion} is a tuple $SC = \langle s, v \rangle$ where $s \in P$ is a single statement that belongs to the program, and $v$ is a set of variables from $P$. The variables in $v$ need not appear in $s$.
Given a program $P$, composed of statements and containing variables $x_1, x_2, \dots, x_n \in \textnormal{vars}$, a \textit{slicing criterion} is a tuple $\langle s, v \rangle$ where $s \in P$ is a single statement that belongs to the program, and $v$ is a set of variables from $P$.
\end{definition}
\begin{definition}[Slice] \label{def:slice}
Given a program $P$ and a slicing criterion $SC = \langle s, v \rangle$, a \textit{slice} is a subset of statements of $P$ ($S \subset P$), which behaves like the original program $P$, when considering the values of the variables in $v$ in statement $s$.
\end{definition}
The reader should note that the variables in $v$ need not appear in $s$.
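To make the notions of slicing criterion and slice more concrete, the following is a minimal sketch in Java. The class \texttt{SliceExample} and both of its methods are hypothetical and are not taken from the thesis. The slicing criterion is assumed to be $\langle \texttt{return sum}, \{\texttt{sum}\} \rangle$, and \texttt{slice} is \texttt{original} with the statements that do not affect \texttt{sum} removed.

```java
// Hypothetical illustration: a small program and one static backward slice of it.
final class SliceExample {
    // Original program P: computes the sum and the product of 1..n.
    static int original(int n) {
        int sum = 0;
        int prod = 1;                  // does not affect sum
        for (int i = 1; i <= n; i++) {
            sum += i;
            prod *= i;                 // does not affect sum
        }
        return sum;                    // slicing criterion: <this statement, {sum}>
    }

    // Slice S: the statements of P that may affect sum at the criterion.
    static int slice(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Both versions yield the same value of sum at the criterion.
        System.out.println(original(4) + " " + slice(4));
    }
}
```

The loop header is kept in the slice because it controls how many times \texttt{sum} is updated, whereas every statement that only touches \texttt{prod} can be removed without changing the value observed at the criterion.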
\begin{definition}[Execution history] \label{def:execution-history}
Given a program $P$, composed of a set of statements $S = \{s_1, s_2, s_3, \dots, s_n\}$, and a set of input values $I$, the \textit{execution history} of $P$ given $I$ is the list $H$ of statements executed, in the order in which they were executed.
\end{definition}
\textit{Program slicing} is the process of extracting a slice given a program and a slicing criterion. A \textit{slice} is a subset of statements of a program which behaves like the original program, at the slicing criterion.
Until now, the concept of slicing has been centred around finding the instructions that affect a variable.
That is the original definition, but as time has progressed, variations have been proposed; the one described in Definitions \ref{def:program-slicing}, \ref{def:slicing-criterion} and \ref{def:slice} is called \textit{static backward slicing}.
That is the original definition, but as time has progressed, variations have been proposed.
The variation described so far is called \textit{static backward slicing}.
It is also the one that will be used throughout this thesis, though the errors detected and solutions proposed can be easily generalized to others.
The different variations are described later in this chapter, but there exist two fundamental dimensions along which the slicing problem can be proposed \cite{Sil12}:
@ -37,53 +34,48 @@ The different variations are described later in this chapter, but there exist tw
\textit{Static slicing} \cite{Sil12} produces slices that consider all possible executions of the program: the slice will be correct regardless of the input supplied.
In contrast, \textit{dynamic slicing} \cite{KorL88,AgrH90b} considers a single execution of the program, thus, limiting the slice to the statements present in an execution log.
The slicing criterion is expanded to include a position in the execution history that corresponds to one instance of the selected statement, making it much more specific.
It may help find \carlos{idk if I need the ``to''} a bug related to non-deterministic behaviour ---such as a random or pseudo-random number generator--- but, despite selecting the same slicing criterion in the same program, the slice must be recomputed for each set of input values or execution considered. \carlos{Talk about quasi-static as a middle ground?}
It may help find bugs related to non-deterministic behaviour---such as a random or pseudo-random number generator---but, despite selecting the same slicing criterion in the same program, the slice must be recomputed for each set of input values or execution considered.
\item \textit{Backward} or \textit{forward}: \textit{backward slicing} \cite{Sil12} looks for the statements that affect the slicing criterion.
It is among the most commonly used slicing techniques.
In contrast, \textit{forward slicing} \cite{BerC85} computes the statements that are affected by the slicing criterion.
In contrast, \textit{forward slicing} \cite{BerC85,GalL91} computes the statements that are affected by the slicing criterion.
There also exists a middle-ground approach called \textit{chopping} \cite{JacR94}, which is used to find all the statements that affect some variables in the slicing criterion and, at the same time, are affected by some other variables in the slicing criterion.
\end{itemize}
Since the seminal definition of program slicing by Weiser \cite{Wei81}, the most studied variation of slicing has been \textit{static backward slicing}, which has been defined in previous sections of this thesis.
That definition can be split into two sub-types, \textit{strong} and \textit{weak} slices, with different levels of requirements and uses in different fields.
That definition can be split into two sub-types, \textit{strong} and \textit{weak} slices, with different levels of requirements and uses in different fields. First, though, we need to introduce an additional concept: the sequence of values.
\begin{definition}[Strong static backward slice \cite{Tip95}]
\begin{definition}[Sequence of values \cite{PerST19}]
\label{def:seq}
Let $P$ be a program and $\langle s, v\rangle$ be a slicing criterion of $P$. $\mathit{seq}(P, s, v)$ is the sequence of values that the variables in $v$ take at $s$ during the execution of $P$.
\end{definition}
\begin{definition}[Strong static backward slice \cite{Wei81,GalL91}]
\label{def:strong-slice}
Given a program $P$ and a slicing criterion $SC = \langle s,v \rangle$, $S$ is a \textit{strong static backward slice} of $P$ with
respect to $SC$ if $S$ fulfils the following properties:
\begin{enumerate}
\item $S$ is an executable program.
\item $S \subseteq P$, or $S$ is the result of removing 0 or more statements from $P$.
\item For any input $I$, the values produced on each execution of $s$
for each of the variables in $v$ is the same when executing $S$ as
when executing $P$. \label{enum:exact-output}
\item For any possible input, $\mathit{seq}(P, s, v) = \mathit{seq}(S, s, v)$.
\end{enumerate}
\end{definition}
\sergio{Didn't this definition also require ending with the same error in case the execution does not terminate? If so, consider adding something about it.}
\josep{We need to review the definitions by (1) Weiser, (2) Binkley and Gallagher and (3) Frank Tip. My opinion is NO: I think the error does not need to be repeated. What it says is that the value of the variables in the SC must be the same, but it says nothing about the error.}
% \sergio{Didn't this definition also require ending with the same error in case the execution does not terminate? If so, consider adding something about it.}
% \josep{We need to review the definitions by (1) Weiser, (2) Binkley and Gallagher and (3) Frank Tip. My opinion is NO: I think the error does not need to be repeated. What it says is that the value of the variables in the SC must be the same, but it says nothing about the error.}
\begin{definition}[Weak static backward slice \cite{RepY89}]
\begin{definition}[Weak static backward slice \cite{BinG96}]
\label{def:weak-slice}
\josep{If that is not the right citation, then you can use Binkley's: \cite{BinG96}}
Given a program $P$ and a slicing criterion $SC = \langle s,v \rangle$, $S$ is the \textit{weak static backward slice} of $P$ with respect to $SC$ if $S$ fulfils the following properties:
Given a program $P$ and a slicing criterion $\langle s,v \rangle$, $S$ is a \textit{weak static backward slice} of $P$ with respect to $\langle s,v \rangle$ if $S$ fulfils the following properties:
\begin{enumerate}
\item $S$ is an executable program.
\item $S \subseteq P$, or $S$ is the result of removing 0 or more statements from $P$.
\item For any input $I$, the values produced on each execution of $s$
for each of the variables in $v$ when executing $P$ is a prefix of
those produced while executing $S$ ---which means that the slice
may continue producing values, but the first values produced always
match up with all those produced by the original program.
\item For any possible input, $\mathit{seq}(P, s, v)$ is a prefix of $\mathit{seq}(S, s, v)$.
\end{enumerate}
\end{definition}
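The difference between the two definitions can be sketched as a check over observed sequences of values. The following Java snippet is only an illustration: the class \texttt{SliceClassifier} and its method \texttt{classify} are hypothetical names, and the check covers a single input with finite sequences, whereas the definitions quantify over every possible input. Every strong slice also satisfies the weak condition, so the snippet reports the most restrictive category that applies.

```java
import java.util.List;

// Hypothetical helper: compares seq(P, s, v) and seq(S, s, v) for ONE input.
final class SliceClassifier {
    enum Kind { STRONG, WEAK, INCORRECT }

    static <T> Kind classify(List<T> seqP, List<T> seqS) {
        // Strong condition: both sequences are identical.
        if (seqP.equals(seqS)) {
            return Kind.STRONG;
        }
        // Weak condition: seq(P, s, v) is a prefix of seq(S, s, v).
        boolean isPrefix = seqP.size() <= seqS.size()
                && seqS.subList(0, seqP.size()).equals(seqP);
        return isPrefix ? Kind.WEAK : Kind.INCORRECT;
    }

    public static void main(String[] args) {
        // Made-up sequences, used only to exercise the three outcomes.
        List<Integer> original = List.of(1, 2, 6);
        System.out.println(classify(original, List.of(1, 2, 6)));
        System.out.println(classify(original, List.of(1, 2, 6, 24)));
        System.out.println(classify(original, List.of(1, 1, 1)));
    }
}
```

A result of \texttt{STRONG} or \texttt{WEAK} for one input does not prove that the slice is strong or weak in general, but a result of \texttt{INCORRECT} for any input is enough to discard it.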
\sergio{$\forall~i~\in~I, v\in~V~\rightarrow~seq(i,v,P)~Pref~seq(i,v,S)$ where $seq(i,a,A)$ represents the sequence of values obtained for $a$ when running the input $i$ on the program $A$. $I$ is the set of all possible inputs for $P$. I think it would go along those lines.} \sergio{Existing formalization in the repo: Program Slicing $\rightarrow$ Trabajos $\rightarrow$ Erlang Benchmarks $\rightarrow$ Papers $\rightarrow$ ICSM 2018 $\rightarrow$ Submitted (Section III - A)}
\josep{If it is formalized using seq, then you can look at the definition in the POI testing paper (Sergio knows which one it is).}
Both definitions (\ref{def:strong-slice} and~\ref{def:weak-slice}) are
used throughout the literature (see, e.g., \cite{pending}\carlos{Which citation? Most papers on exception slicing do not indicate or hint whether they use strong or weak.}\sergio{Josep?}\josep{For strong, Weiser can be cited. For weak, Binkley can be cited \cite{BinG96}}).
Most do not differentiate them, or acknowledge the other variant, because most publications focus on one variant exclusively.
Therefore, although the definitions come from different authors, the \textit{weak} and \textit{strong} nomenclature employed here originates from a control dependency analysis by Danicic~\cite{DanBHHKL11}, where slices that produce the same output as the original are named \textit{strong}, and those where the original is a prefix of the slice, \textit{weak}.
Both Definition~\ref{def:strong-slice} and Definition~\ref{def:weak-slice} are
used throughout the literature.
Most publications do not differentiate them, as they work with one of them without acknowledging the other variant.
Therefore, although the definitions come from different authors, the \textit{weak} and \textit{strong} nomenclature employed throughout this thesis originates from a control dependence analysis by Danicic~\cite{DanBHHKL11}, where slices that produce the same output as the original are named \textit{strong}, and those where the original is a prefix of the slice, \textit{weak}.
Different applications of program slicing use the option that fits their needs, though \textit{weak} is used if possible, because the resulting slices are smaller statement-wise, and the algorithms used tend to be simpler.
Of course, if the application of program slices requires the slice to behave exactly like the original program, then \textit{strong} slices are the only option.
@ -93,15 +85,15 @@ In contrast, program specialization requires strong slicing, as it extracts feat
Throughout the thesis, we indicate which kind of slice is produced with each problem detected and technique proposed.
\begin{example}[Strong, weak and incorrect slices]
Consider Table~\ref{tab:slice-weak}, which displays the sequence of values or execution history obtained with respect to different slices of a program and the same slicing criterion.
Consider Table~\ref{tab:slice-weak}, which displays the sequence of values obtained with respect to different slices of a program and the same slicing criterion.
The first row stands for the original program, which computes $3!$.
The first row shows the sequence of values of the original program, which computes $3!$.
Slice A's execution history is identical to the original and therefore it is a strong slice.
Slice A's sequence of values is identical to the original and therefore it is a strong slice.
Slice B's execution history does not stop after producing the same first 3 values as the original: it is a weak slice. An instruction responsible for stopping the loop may have been excluded from the slice.
Slice B's sequence does not stop after producing the same first 3 values as the original: it is a weak slice. An instruction responsible for stopping the loop may have been excluded from the slice.
Slice C is incorrect, as the execution history differs from the original program in the second column. It seems that some dependency has not been accounted for and the value is not updating.
Slice C is incorrect, as the sequence differs from the original program in the second column. It seems that some dependence has not been accounted for and the value is not updating.
\begin{table}
\centering
@ -117,16 +109,6 @@ Along the thesis, we indicate which kind of slice is produced with each problem
\end{table}
\end{example}
\carlos{The following paragraph has already been repeated in previous sections, mainly the motivation. Consider its removal and the addition of citations to the previous mention.}
\josep{Even though the original proposal by Weiser~\cite{Wei81} focussed on an imperative language, program slicing is a language--agnostic technique.} Program slicing is a language--agnostic technique, but the original proposal by
Weiser~\cite{Wei81} covered a simple imperative programming language.
Since then, the literature has been expanded by dozens of authors, that have
described and implemented slicing for more complex structures, such as
uncontrolled control flow~\cite{HorwitzRB88}, global variables~\cite{???},
exception handling~\cite{AllH03}; and for other programming paradigms, such as
object--oriented languages~\cite{???} or functional languages~\cite{???}.
\carlos{More can be added; the corresponding citations are missing.}\sergio{Cool, we need to find them and add them. The bibliography looks short to me given all the papers that exist; I think that once everything is done there should be almost 30, if not more.} \josep{Yes. You can take many of those references from the latest slicing surveys.}
\subsection{Computing program slices with the system dependence graph}
@ -135,8 +117,8 @@ It is computed from the program's source code, and once built, a slicing criteri
Its efficiency relies on the fact that, for multiple slices performed on the same program, the graph generation process is only performed once.
Performance-wise, building the graph has quadratic complexity ($\mathcal{O}(n^2)$), and its traversal to compute the slice has linear complexity ($\mathcal{O}(n)$); both with respect to the number of statements in the program being sliced.
The SDG is a directed graph, and as such it has a set of nodes, each representing a statement in the program ---barring some auxiliary nodes introduced by some approaches--- and a set of directed edges, which represent the dependencies among nodes.
Those edges represent several kinds of dependencies ---control, data, calls, parameter passing, summary.
The SDG is a directed graph, and as such it has a set of nodes, each representing a statement in the program---barring some auxiliary nodes introduced by some approaches---and a set of directed edges, which represent the dependencies among nodes.
Those edges represent several kinds of dependencies: control, data, calls, parameter passing, summary.
To create the SDG, first a \textit{control flow graph} (CFG) is built for each method in the program, and then some dependencies are computed based on the CFG.
With that data, a new graph representation is created, called the \textit{program dependence graph} (PDG) \cite{OttO84}.
@ -148,7 +130,7 @@ The process is performed twice, the first time ignoring a specific kind of edge,
Once the second pass has finished, all the nodes visited form the slice.
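The two-pass traversal can be sketched as follows. This is a minimal sketch that assumes the usual scheme attributed to Horwitz et al.~\cite{HorwitzRB88}: the first pass ignores parameter-out edges and the second pass ignores call and parameter-in edges. The class \texttt{SdgSlicer}, its edge kinds and the integer node identifiers are hypothetical and only serve the illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified SDG: nodes are integers, edges are stored by target.
final class SdgSlicer {
    enum EdgeKind { CONTROL, DATA, CALL, PARAM_IN, PARAM_OUT, SUMMARY }

    private static final class Edge {
        final int source;
        final EdgeKind kind;
        Edge(int source, EdgeKind kind) { this.source = source; this.kind = kind; }
    }

    private final Map<Integer, List<Edge>> incoming = new HashMap<>();

    void addEdge(int source, int target, EdgeKind kind) {
        incoming.computeIfAbsent(target, k -> new ArrayList<>()).add(new Edge(source, kind));
    }

    // Backward reachability from the start nodes, skipping the ignored edge kinds.
    private Set<Integer> reachBackwards(Set<Integer> start, Set<EdgeKind> ignored) {
        Set<Integer> visited = new HashSet<>(start);
        Deque<Integer> work = new ArrayDeque<>(start);
        while (!work.isEmpty()) {
            int node = work.pop();
            for (Edge e : incoming.getOrDefault(node, List.of())) {
                if (!ignored.contains(e.kind) && visited.add(e.source)) {
                    work.push(e.source);
                }
            }
        }
        return visited;
    }

    // Assumed scheme: pass 1 skips parameter-out edges, pass 2 starts from
    // everything reached in pass 1 and skips call and parameter-in edges.
    Set<Integer> slice(int criterion) {
        Set<Integer> pass1 = reachBackwards(Set.of(criterion), EnumSet.of(EdgeKind.PARAM_OUT));
        return reachBackwards(pass1, EnumSet.of(EdgeKind.CALL, EdgeKind.PARAM_IN));
    }
}
```

Each pass visits every node and edge at most once, which is consistent with the linear cost of the traversal. Restricting the edge kinds in each pass prevents the traversal from entering a method through one call and leaving it through a different one.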
\begin{example}[The creation of a system dependence graph]
\label{exa:create-sdg} \sergio{This example gives too much detail about the graphs.}
\label{exa:create-sdg}
Consider the code provided in Figure~\ref{fig:create-sdg-code}, where a simple Java program containing two methods (\texttt{main} and \texttt{multiply}) is displayed.
\begin{figure}[h]
@ -171,7 +153,7 @@ int multiply(int x, int y) {
\label{fig:create-sdg-code}
\end{figure}
Now turn your attention to Figure~\ref{fig:create-sdg-cfg}\carlos{is this too personal? the second person is used in other places, but not as directly}: a CFG has been created for each method. The CFG has a unique source node (without incoming edges) and a unique sink node (without outgoing edges), named ``Entry'' and ``Exit''. In between, the statements are structured according to all possible executions that could happen.
Figure~\ref{fig:create-sdg-cfg} contains one CFG per method. Each CFG has a unique source node (without incoming edges) and a unique sink node (without outgoing edges), named ``Enter'' and ``Exit''. In between, the statements are structured according to all possible executions that could happen according to Java's semantics.
\begin{figure}[h]
\centering
@ -203,11 +185,11 @@ The following list details the most relevant metrics considered when evaluating
\item[Completeness.] The solution includes all the statements that affect the slicing criterion. This is the most important feature, and almost all techniques and implemented tools set out to achieve at least the generation of complete slices. There exists a trivial way of achieving completeness, by including the whole program in the slice.
\item[Correctness.] The solution excludes all statements that do not affect the slicing criterion. Most solutions are complete, but the degree of correctness is what sets them apart, as solutions that are more correct will produce smaller slices, which will execute fewer instructions to compute the same values, decreasing the execution time and complexity.
\item[Features covered.] Which features (polymorphism, global variables, arrays, etc.), programming languages or paradigms a slicing tool is able to cover. There are slicing tools (publicly or commercially available) for most popular programming languages, from C++ to Erlang. Some slicing techniques only cover a subset of the targeted language, and as such are less useful, but can be a stepping stone in the betterment of the field. There also exist tools that cover multiple languages or that are language-independent \cite{BinGHI14}. A small drawback of language-independent tools is that they are not as efficient with respect to the other metrics.
\item[Resource consumption.] Speed and memory consumption for the graph generation and slice creation. As previously stated, slicing is a two-step process: building a graph and traversing it, with the first process being quadratic and the second linear (in time). Proposals that build upon the SDG try to keep traversal linear, even if that means making the graph bigger or slowing down its building process.
\item[Performance.] Speed and memory consumption for the graph generation and slice creation. As previously stated, slicing is a two-step process: building a graph and traversing it, with the first process being quadratic and the second linear (in time). Proposals that build upon the SDG try to keep traversal linear, even if that means making the graph bigger or slowing down its building process.
Though this metric may not seem as important as others, program slicing is not a simple analysis. On top of that, some applications of software slicing like debugging constantly change the program and slicing criterion, which makes faster slicing software preferable for them.
Memory consumption is less relevant, mainly due to its availability, but could become a concern in big systems with millions of lines of code. \carlos{Check this.}
Regarding memory consumption, it is not currently a problem, given that the amount available in most workstations and servers is enough to run any slicing algorithm. It could become a concern in big programs with millions of lines of code, or in embedded systems, where memory is scarce.
\end{description}
\subsection{Variations and applications of program slicing}
@ -220,30 +202,33 @@ Each variation of program slicing answers a different question and serves a diff
\item[Backward static.] Used to obtain the lines that affect the slicing criterion,
normally used on a line which contains an incorrect value, to track down
the source of the bug.
\item[Forward static \cite{GalL91}.] Used to obtain the lines affected by the slicing criterion,
\item[Forward static.] Used to obtain the lines affected by the slicing criterion,
used to perform software maintenance: when changing a statement, slice the program w.r.t. that statement to discover the parts of the program that will be affected by the change.
\item[Chopping static.] Obtains both the statements affected by and the
statements that affect the selected statement. \carlos{Add application and verify question.}
\item[Chopping.] Given two slicing criteria, it obtains the intersection between the statements affected by the first criterion and the statements that affect the second criterion. It is mainly used for debugging applications.
\item[Dynamic.] Can be combined with any of the previous variations, and
limits the slice to an execution history, only including statements that
have run in a specific execution. The slice produced is much smaller and
more useful, but must be recomputed each time. It can be used for debugging when the input values that cause the error are known.
\item[Quasi--static.] In this slicing variant, some input values are given, and some are left
\item[Quasi-static.] In this slicing variant, some input values are given, and some are left
unspecified: the result is a slice sized between the small dynamic slice and
the general but bigger static slice. It can be especially useful when
debugging a set of function calls which have a specific static input for
some parameters, and variable input for others.
\item[Simultaneous.] Similar to dynamic slicing, but considers multiple
executions instead of only one. It is another middle ground between static and dynamic slicing, similarly to quasy-static slicing.
executions instead of only one. It is another middle ground between static and dynamic slicing, similarly to quasi-static slicing.
Likewise, it can offer a slightly bigger slice than pure dynamic slicing while keeping the scope focused on the slicing criterion and the set of executions.
\end{description}
There exist many more, which have been detailed in surveys of the field, such as \cite{Sil12}, which analyzes the different dimensions that can be used to classify slicing techniques.
There exist many more, which have been detailed in surveys of the field, such as \cite{Sil12}.
\section{Exception handling in Java}
\section{Exception handling}
\label{sec:intro-exception}
Exception handling is common in most modern programming languages. It generally consists of a few new instructions used to modify the normal execution flow and later return to it. Exceptions are used to react to an abnormal program behaviour (controlled or not), and either solve the error and continue the execution, or stop the program gracefully. In our work we focus on the Java programming language, so in the following, we describe the elements that Java uses to represent and handle exceptions:
Exception handling is common in most modern programming languages. It generally consists of a few new instructions used to modify the normal execution flow and later return to it. Exceptions are used to react to abnormal program behaviour (controlled or not), and either solve the error and continue the execution, or stop the program gracefully.
\subsection{Exception handling in Java}
In our work we focus on the Java programming language, so in the following, we describe the elements that Java uses to represent and handle exceptions:
\begin{description}
\item[Throwable.] A type that encompasses all the exceptions or errors
@ -254,25 +239,25 @@ Exception handling is common in most modern programming languages. It generally
checked exceptions, if thrown, must be either caught in the same method or declared in the method header.
\item[throw.] A statement that activates an exception, altering the normal
control-flow of the method. If the statement is inside a \texttt{try}
block with a \texttt{catch} clause for its type or any supertype, the
control flow will continue in the first statement of such clause.
block with a \texttt{catch} statement for its type or any supertype, the
control flow will continue in the first statement inside the \texttt{catch} statement.
Otherwise, the method is exited and the check performed again, until
either the exception is caught or the last method in the stack
(the \texttt{main} method) is popped, and the execution of the program ends
abruptly.
\item[try.] This statement contains a block of statements and one
or more \texttt{catch} clauses and/or a \texttt{finally} block.
or more \texttt{catch} statements and/or a \texttt{finally} statement.
All exceptions thrown in the statements contained or any methods called will be processed by the list of \texttt{catch} statements. If no \texttt{catch} matches the type of the exception, the exception propagates to the \texttt{try} block that contains the current one, or, in its absence, the method that called the current one.
\item[catch.] Contains two elements: a variable declaration, whose type must extend from \texttt{Throwable}, and a block of statements to be executed when an exception of a matching type is thrown.
The type of a thrown exception $T_1$ matches the type of a \texttt{catch} statement $T_2$ if one of the following is true: (1) $T_1 = T_2$, (2) $T_1~\textnormal{extends}~T_2$, (3) $T_1~\textnormal{extends}~T \wedge T~\textnormal{matches}~T_2$.
\textit{catch} clauses are processed sequentially, although their order does not matter, due to the restriction that each type must be placed after all of its subtypes.
When a matching clause is found, its block is executed and the rest are ignored.
\texttt{catch} statements are processed sequentially, although their order does not matter, due to the restriction that each type must be placed after all of its subtypes.
When a matching \texttt{catch} is found, its block is executed and the rest are ignored.
Variable declarations may be of multiple types \texttt{(T1|T2 e)}, when two unrelated types of exception must be caught and the same code executed for both.
If there is an inheritance relationship, the parent suffices.\footnotemark
\item[finally.] Contains a block of statements that will always be executed, no matter what, if the \texttt{try} is entered.
It is used to tidy up, for example closing I/O streams. The \texttt{finally} statement can be reached in two ways:
with an exception pending ---thrown in \texttt{try} and not captured by
any \texttt{catch}, or thrown inside a \texttt{catch}--- or without it
with an exception pending---thrown in \texttt{try} and not captured by
any \texttt{catch}, or thrown inside a \texttt{catch}---or without it
(when the \texttt{try} or \texttt{catch} end successfully). After
the last instruction of the block is executed, if there is an exception
pending, control will be passed to the corresponding \texttt{catch} or
@ -365,9 +350,9 @@ Regarding the languages that do not offer an exception handling mechanism simila
small tasks, such as generating documents from templates, making advanced
computations that are impossible or slower with spreadsheet functions, etc.
The only error--correcting system it has is the directive \texttt{On Error
$x$}, where $x$ can be 0 ---lets the error crash the program---,
\texttt{Next} ---continues the execution as if nothing had happened--- or a
label in the program ---the execution jumps to the label in case of
$x$}, where $x$ can be 0---lets the error crash the program---,
\texttt{Next}---continues the execution as if nothing had happened---or a
label in the program---the execution jumps to the label in case of
error. The directive can be set and reset multiple times, therefore creating
artificial \texttt{try-catch} blocks, but there is no possibility of
attaching a value to the error, lowering its usefulness.
@ -408,7 +393,7 @@ double safe_sqrt(double x, int ref) {
\label{fig:exceptions-c}
\end{figure}
Consider Figure~\ref{fig:exceptions-c}: in the \texttt{main} function, line 2 will be executed twice: first when
it is normally reached ---returning 0 and continuing in line 3--- and the second when line 3 in
it is normally reached---returning 0 and continuing in line 3---and the second when line 3 in
\texttt{safe\_sqrt} is run, returning the second argument of \texttt{longjmp},
and therefore entering the else block in the \texttt{main} function.
\end{example}
@ -420,10 +405,10 @@ double safe_sqrt(double x, int ref) {
the code may follow. Instead, Go allows functions to return multiple values,
with the second value typically associated to an error type. The error is
checked before the value, and acted upon. Additionally, Go also features a
simple panic system, with the functions \texttt{panic} ---throws an
exception with a value associated---, \texttt{defer} ---runs after the
function has ended or when a \texttt{panic} has been activated--- and
\texttt{recover} ---stops the panic state and retrieves its value. The
simple panic system, with the functions \texttt{panic}---throws an
exception with a value associated---, \texttt{defer}---runs after the
function has ended or when a \texttt{panic} has been activated---and
\texttt{recover}---stops the panic state and retrieves its value. The
\texttt{defer} statement doubles as catch and finally, and multiple
instances can be accumulated. When appropriate, they will run in LIFO
(Last In--First Out) order.


@ -2,7 +2,7 @@
% !TEX spellcheck = en_GB
% !TEX root = ../paper.tex
\chapter{Conclusion}
\chapter{Conclusions}
\label{cha:conclusion}
\carlos{todo}
\carlos{todo: future work (implementation, improving correctness for try-catch), contributions (completeness, correctness and generalization problems detected, proposed solution for the generalizations), value of the thesis}


@ -1,7 +1,7 @@
% !TEX encoding = UTF-8
% !TEX spellcheck = en_GB
% !TEX root = ../paper.tex
\chapter{Main explanation?}
\chapter{Program slicing with exception handling}
\label{cha:incremental}
\section{First definition of the SDG}
@ -9,48 +9,49 @@
The SDG is the most common data structure for program representation in the field of program slicing.
It was first proposed by Horwitz et al. \cite{HorwitzRB88} and, since then, many approaches to program slicing have based their models on it.
It builds upon the existing CFG, which represents the control flow between the instructions of a method. Then, it creates a PDG using the CFG's vertices and the dependencies computed from it.
It builds upon the existing CFG, which represents the control flow between the statements of a method. Then, it creates a PDG using the CFG's vertices and the dependencies computed from it.
The SDG is finally built from the assembly of the different methods' PDGs, linking each method call to its corresponding definition.
Because each graph is built from the previous one, new statements and instructions can be added to the CFG, without the need to alter the algorithm that converts each CFG to PDG and then to the final SDG.
The only modification possible is the redefinition of an already defined dependency or the addition of new kinds of dependence.
Because each graph is built from the previous one, new statements can be added to the CFG, without the need to alter the algorithm that converts each CFG to PDG and then to the final SDG.
The only modification possible is the redefinition of an already defined dependence or the addition of new kinds of dependence.
The seminal appearance of the SDG covers a simple imperative programming language, featuring procedures and basic instructions like calls, variable assignments, arithmetic and logic operators and conditional instructions (branches and loops).
The seminal appearance of the SDG covers a simple imperative programming language, featuring procedures and basic statements like calls, variable assignments, arithmetic and logic operators and conditional statements (branches and loops).
\begin{definition}[Control Flow Graph \carlos{add original citation}]
\begin{definition}[Control Flow Graph (based on \cite{Allen70})]
\label{def:cfg}
Given a method $M$, which contains a list of statements $s = \{s_1, s_2, ...\}$, the \emph{control flow graph} of $M$ is a directed graph $G = \langle N, E \rangle$, where:
\begin{itemize}
\item $N = s \cup \{`\textnormal{Enter}', `\textnormal{Exit}'\}$: a set of nodes such that for each statement $s_i$ in $s$ there is a node in $N$ labelled with $s_i$ and two special nodes ``Enter'' and ``Exit'', which represent the beginning and end of the method, respectively.
\item $E$ is a set of edges of the form $e = \left(n_1, n_2\right) | n_1, n_2 \in N$. $e \in E$ if and only if there is a possible execution of $M$ where $n_2$ is executed immediately after $n_1$.
\item $N = s \cup \{\textnormal{Enter}, \textnormal{Exit}\}$: a set of nodes such that for each statement $s_i$ in $s$ there is a node in $N$ labelled with $s_i$ and two special nodes ``Enter'' and ``Exit'', which represent the beginning and end of the method, respectively.
	\item $E$ is a set of edges of the form $e = \left(n_1, n_2\right) | n_1, n_2 \in N$. There exist edges between normal statements, in the order they appear in the program: the ``Enter'' node is connected to the first statement, which in turn is connected to the second, etc. Additionally, conditional statements (e.g., \texttt{if}) have two outgoing edges: one towards the first statement executed if the condition evaluates to \textit{true} and another towards the first statement executed if the condition evaluates to \textit{false}.
\end{itemize}
\end{definition}
Most algorithms, in order to generate the SDG, mandate the ``Enter'' node to be the only source and the ``Exit'' node to be the only sink in the graph.
In general, expressions are not evaluated when generating the CFG; so an \texttt{if} conditional instruction will have two outgoing edges even if its condition is always true or always false (e.g., \texttt{1 == 0}).
In general, expressions are not evaluated when generating the CFG; so an \texttt{if} conditional statement will have two outgoing edges even if its condition is always true or always false (e.g., \texttt{1 == 0}).
To build the PDG and then the SDG, there are two dependencies based directly on the CFG's structure: data and control dependence. First, though, we need to define the concept of postdominance in a graph, as it is necessary in the definition of control dependency:
To build the PDG and then the SDG, there are two dependencies based directly on the CFG's structure: data and control dependence. First, though, we need to define the concept of postdominance in a graph, as it is necessary in the definition of control dependence:
\begin{definition}[Postdominance \carlos{add original citation?}]
\begin{definition}[Postdominance \cite{Tip95}]
\label{def:postdominance}
Let $C = (N, E)$ be a CFG, and $n_e \in N$ the ``Exit'' node of $C$. $b \in N$ \textit{postdominates} $a \in N$ if and only if $b$ is present on every possible sequence from $a$ to $n_e$.
Let $C = (N, E)$ be a CFG. $b \in N$ \textit{postdominates} $a \in N$ if and only if $b$ is present on every possible sequence from $a$ to ``Exit''.
\end{definition}
From the previous definition, given that the ``Exit'' node is the only sink in the CFG, every node will have a path to it, so it follows that any node postdominates itself.
\begin{definition}[Control dependency \cite{HorwitzRB88}]
\begin{definition}[Control dependence \cite{HorwitzRB88}]
\label{def:ctrl-dep}
Let $C = (N, E)$ be a CFG. $b \in N$ is \textit{control dependent} on $a \in N$ ($a \ctrldep b$) if and only if $b$ postdominates one but not all of $\{(a, n) |~(a, n) \in E, n \in N\}$ ($a$'s successors).
Let $C = (N, E)$ be a CFG. $b \in N$ is \textit{control dependent} on $a \in N$ ($a \ctrldep b$) if and only if $b$ postdominates one but not all of $\{n~|~(a, n) \in E, n \in N\}$ ($a$'s successors).
\end{definition}
It follows that a node with fewer than two outgoing edges cannot be the source of control dependence.
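As a hedged illustration of Definitions~\ref{def:postdominance} and~\ref{def:ctrl-dep} (a minimal sketch on a hypothetical method; the node names $p$, $t$, $f$ and $j$ are assumptions and do not come from any figure in this chapter), consider the CFG of a single \texttt{if} statement $p$ whose branches $t$ and $f$ join again at $j$:
\[
E = \{(\textnormal{Enter}, p),\ (p, t),\ (p, f),\ (t, j),\ (f, j),\ (j, \textnormal{Exit})\}
\]
\begin{itemize}
	\item The successors of $p$ are $t$ and $f$.
	\item $t$ postdominates $t$ but not $f$, so $p \ctrldep t$.
	\item $f$ postdominates $f$ but not $t$, so $p \ctrldep f$.
	\item $j$ postdominates both $t$ and $f$, so $j$ is not control dependent on $p$.
\end{itemize}
Every other node in this graph has a single successor, so $p$ is its only source of control dependence.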
\begin{definition}[Data dependency \cite{HorwitzRB88}]
\begin{definition}[Data dependence \cite{HorwitzRB88}]
\label{def:data-dep}
Let $C = (N,E)$ be a CFG.
$b \in N$ is \textit{data dependent} on $a \in N$ ($a \datadep b$) if and only if $a$ may define a variable $x$, $b$ may use $x$ and there exists in $C$ a sequence of edges from $a$ to $b$ where $x$ is not defined.
\end{definition}
Data dependency was originally defined as flow dependency, and subcategorized into loop-carried and loop-independent flow-dependencies, but that distinction is no longer used to compute program slices with the SDG. It should be noted that variable definitions and uses can be computed for each statement independently, analysing the procedures called by it if necessary. The variables used and defined by a procedure call are those used and defined by its body.
Data dependence was originally defined as flow dependence, and subcategorized into loop-carried and loop-independent flow-dependencies, but that distinction is no longer used to compute program slices with the SDG. It should be noted that variable definitions and uses can be computed for each statement independently, analysing the procedures called by it if necessary. The variables used and defined by a procedure call are those used and defined by its body.
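As a small worked illustration of Definition~\ref{def:data-dep} (a sketch on a hypothetical straight-line fragment; the labels $s_1$, $s_2$ and $s_3$ are assumptions), consider three consecutive statements:
\begin{center}
$s_1$: \texttt{x = 1;} \qquad $s_2$: \texttt{x = 2;} \qquad $s_3$: \texttt{y = x;}
\end{center}
Here $s_2 \datadep s_3$, because $s_2$ defines $x$, $s_3$ uses $x$, and the edge $(s_2, s_3)$ does not redefine it. In contrast, $s_3$ is not data dependent on $s_1$: the only sequence of edges from $s_1$ to $s_3$ passes through $s_2$, where $x$ is defined again.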
With the data and control dependencies, the PDG may now be built by replacing the
edges from the CFG by data and control dependence edges. The first tends to be
@ -68,31 +69,10 @@ In the examples, data and control dependencies are represented by red and black
\end{enumerate}
\end{definition}
Regarding the graphical representation of the PDG, the most common one is a tree-like structure based on the control edges, and nodes sorted left to right according to their position on the original program. Data edges do not affect the structure, so that the graph is easily readable.
Regarding the graphical representation of the PDG, the most common one is a tree-like structure based on the control edges, and nodes sorted left to right according to their position on the original program. Data edges do not affect the structure, so that the graph is easily readable. An example of the creation of the PDGs of a program's methods can be seen in Example~\ref{exa:simple-pdg}.
Finally, the SDG is built from the combination of all the PDGs for every method that compose the program:
\begin{definition}[System dependence graph \cite{HorwitzRB88}]
\label{def:sdg}
Given a program $P$, composed of a set of methods $M = \{m_0 ... m_n\}$ and their associated PDGs ---each method $m_i$ has a $PDG^i = \langle N^i, E_c^i, E_d^i \rangle$.
The \textit{system dependence graph} (SDG) of $P$ is a graph $G = \langle N, E_c, E_d, E_{call} \rangle$ where:
\begin{enumerate}
\item $N = \bigcup_{i=0}^n N^i$
\item $E_c = \bigcup_{i=0}^n E_c^i$
\item $E_d = \bigcup_{i=0}^n E_d^i$
\item $(a, b) \in E_{call}$ if and only if $a$ is a statement that contains a call and $b$ is a method ``Enter'' node of the function or method called by $a$. $(a, b)$ is a \textit{call edge}.
% These will be defined later when adding function calls.
% \item $E_{in}$ (\textit{parameter-input} or \textit{param-in edges})
% \item $E_{out}$ (\textit{parameter-output} or \textit{param-out edges})
% \item $E_{sum}$ (\textit{summary edges})
\end{enumerate}
\end{definition}
Regarding call edges, in programming languages with ambiguous method calls (those that have polymorphism or pointers), there may exist multiple outgoing call edges from a statement with a single method call.
To avoid confusion, the ``Enter'' nodes of each method are relabelled with their method's name.
\begin{example}[Creation of a SDG from a simple program]
\begin{example}[Creation of a PDG from a simple program]
\label{exa:simple-pdg}
Consider the program shown on the left side of Figure~\ref{fig:simple-sdg-code}, where two procedures in a simple imperative language are shown. The CFG that corresponds to each procedure is shown on the right side.
\begin{figure}[h]
\begin{minipage}{0.2\linewidth}
@ -120,176 +100,233 @@ proc f(x, y) {
\end{figure}
Then, the nodes of each CFG are rearranged, according to the control and data dependencies, to create the corresponding PDGs. Both are shown in Figure~\ref{fig:simple-sdg}, each bounded by a rectangle.
Finally, the two graphs are connected with a single call edge to form the SDG.
\begin{figure}[h]
\centering
\includegraphics[width=0.8\linewidth]{img/sdgsimple}
\caption{The SDG that corresponds to the program from Figure~\ref{fig:simple-sdg-code}.}
\caption{The PDG that corresponds to the program from Figure~\ref{fig:simple-sdg-code}.}
\label{fig:simple-sdg}
\end{figure}
\end{example}
Before creating the SDG by joining the different PDGs, we must consider the treatment of method calls and their data dependencies.
\subsubsection{Method calls and data dependencies}
\carlos{Vocabulary: when is it appropriate to use method, function and procedure?}\sergio{Good question; I think it is hierarchical: method includes function and procedure, and the latter two are disjoint, right?} \josep{No. Method implies object orientation. If you are talking about a particular language (e.g., Java), then you must use that language's vocabulary (e.g., method). If you are speaking in general and want a word that subsumes all of them, I have seen two ways of doing it: (1) use routine (although you could use another word, for example method) the first time and add a footnote saying that in the rest of the article we use routine to refer to method/function/procedure/predicate. (2) Use method/function/procedure/predicate like this, separated by slashes. In this thesis it seems more appropriate to talk about methods, and the first time add a footnote saying that we will talk about methods, but all the developments are equally applicable to functions and procedures.}
Although it is not imperative, since the inception of the SDG, data input and output from method calls\footnotemark have been treated in special detail. A method's input (parameters) and output (return value) are handled with the same mechanism as the global variables it can access (static variables and fields from a class in Java).
Method calls can access global variables and modify them, and to that end we must add fictitious nodes that represent variable input and output from the methods in both the method calls and their declarations.
This proposal can also be extended to those programming languages that pass parameters by reference instead of the more common pass-by-value.
Java objects and arrays can also be analysed more deeply, as even though Java passes parameters by value, modifications to fields of an object or elements of an array affect the original object or array.
In the original definition of the SDG, there was special handling of data dependencies when calling functions, as it was considered that parameters were passed by value, and global variables did not exist. \carlos{Name and cite paper that introduced it} solves this issue by splitting function calls and function \added{definitions} into multiple nodes. This proposal solved \josep{the problem}everything\sergio{does it solve everything?} related to parameter passing: by value, by reference, complex variables such as structs or objects and return values.
\footnotetext{Method calls in this thesis will refer to Java method calls, but most if not all the details provided apply to functions, procedures and other routines.}
To such end, the following modifications are made to the different graphs:
In practice, the following modifications are made to the different graphs:
\begin{description}
	\item[CFG.] In each CFG, global variables read or modified and parameters are added to the label of the ``Enter'' node in assignments of the form $par = par_{in}$ for each parameter and $x = x_{in}$ for global variables. Similarly, global variables and parameters modified are added to the label of the ``Exit'' node as \added{assignments of the form} $x_{out} = x$. \added{From now on, we will refer to the described assignments as input and output information respectively.} \sergio{\{}The parameters are only passed back if the value set by the called method can be read by the caller\sergio{\} I do not understand what this sentence refers to}. Finally, in method calls the same values must be packed and unpacked: each statement containing a function call is relabeled to contain \added{its related} input (of the form $par_{in} = \textnormal{exp}$ for parameters or $x_{in} = x$ for global variables) and output (always of the form $x = x_{out}$) \added{information}. \sergio{is there no parameter\_out? I assume then that there is no pass by value?}
\item[PDG.] Each node \added{augmented with input or output information}\deleted{modified} in the CFG is \added{now} split into multiple nodes: the original \deleted{label}\added{node} \added{(Enter, Exit or function call)} is the main node and each assignment \added{contained in the input and output information} is represented as a new node, which is control--dependent on the main one. Visually, \added{new nodes coming from the input information}\deleted{input is} \added{are} placed on the left and \added{the ones coming from the output information}\deleted{output} on the right; with parameters sorted accordingly.
	\item[SDG.] Three kinds of edges are introduced: parameter input (param--in), parameter output (param--out) and summary edges. Parameter input edges are placed between each method call's input node and the corresponding method definition input node. Parameter output edges are placed between each method definition's output node and the corresponding method call output node. Summary edges are placed between the input and output nodes of a method call, according to the dependencies inside the method definition: if there is a path from an input node to an output node, that shows a dependence and a summary edge is placed in all method calls between those two nodes.\sergio{I have the feeling that the explanation of what a summary edge is comes somewhat late and should perhaps be in some previous definition. Let Josep say what he thinks}\josep{Indeed. It comes late. These dependencies cannot be defined after defining the SDG, because then what you have defined in the formal definition is not an SDG (only a part of it) and whenever you talk about the SDG from now on everything will be incomplete. Definitions are sacred, so there are two solutions: (1) explain these three edges before the definition of the SDG so that they can be formally defined in the SDG definition, or (2) delay the formal definition of the SDG until here (so that they can be included). Or anything else that makes the SDG well defined}
	Note: \deleted{parameter input and output}\added{param-in and param-out} edges are separated because the traversal algorithm traverses them only sometimes (the output edges are excluded in the first pass and the input edges in the second).\sergio{It is delicate to mention the passes without having said anything about the slicing algorithm first; readers who do not know slicing will be left baffled here. Consider removing this note.}\josep{Move this note down to where you discuss the slicing algorithm. There you can say that param-in and param-out are distinguished precisely so that there can be two passes. There it will make sense and be clarifying. Here it is confusing. ;-)}
\item[CFG.] The CFG's structure is not modified, as the control flow is not altered by the treatment of variables. Instead, some labels are extended with extra information, which is later used in the PDG's creation. Specifically, the ``Enter'' node, the ``Exit'' node and nodes that contain method calls are modified:
\begin{description}
\item[Enter.] Each global variable that is used or modified and every parameter are appended to the node's label in assignments of the form $par = par_{in}$ in the case of parameters and $x = x_{in}$ in the case of global variables. These lines are the input information, and will become the input nodes.
		\item[Exit.] Each global variable that is modified and every parameter whose modification can be read by the caller are prepended to the node's label. The assignments take the form $x_{out} = x$ for both. The method's output is also added, if the method will return a value, as \texttt{output}. These lines constitute the output information, and will be transformed into output nodes.
\item[Method call.] Each method call must be preceded by the input information and followed by the output information of the corresponding method. The input takes the form $par_{in} = \textnormal{exp}$ for each parameter and $x_{in} = x$ for each global variable $x$. The output is always of the form $x = x_{out}$, except for the output of the function, which is labelled \texttt{output}.
\end{description}
\item[PDG.] Each node augmented with input or output information in the CFG is now split into multiple nodes: the original label (``Enter'', ``Exit'' or function call) is the main node and each assignment contained in the input and output information is represented as a new node, which is control-dependent on the main one.
\end{description}
\begin{example}[Variable packing and unpacking]
	Let it be \josep{Excellent Beatles song. Really great. But it is better to start like this: Let $f(x, y)$ be a function with... ;-)} a function $f(x, y)$ with two integer parameters \added{which\josep{that} modifies the argument passed in its second parameter}, and a call $f(a + b, c)$, with parameters passed by reference if possible. The label of the method call node in the CFG would be ``\texttt{x\_in = a + b, y\_in = c, f(a + b, c)\josep{???}, c = y\_out}''; method $f$ would have \texttt{x = x\_in, y = y\_in} in the ``Enter'' node and \texttt{y\_out = y} in the ``Exit'' node. The relevant section of the SDG would be: \josep{This whole paragraph and the figure that follows cannot be understood. It must be rewritten and explained in more detail, step by step. This is supposed to be the example of the section, the one that clears up doubts about what $x_{in}$ is, etc., and about how the SDG works. However, rather than clarifying, it confuses (it clarifies nothing for someone who does not know slicing). In fact, to make it well understood, once you have built the graph, it would be good to continue the example a little, explaining how the dependencies make what is inside the called method depend (following the edges) on what is in the calling method (or at least on the parameters of the call). This requires some explanatory text.}
\begin{center}
\includegraphics[width=0.5\linewidth]{img/parameter-passing}
\end{center}
Now that method calls are properly handled, the SDG can be defined as the combination of PDGs, with the addition of four dependencies that connect the method calls and their definitions.
\begin{definition}[System dependence graph]
\label{def:sdg}
Given a program $P$, composed of a set of methods $M = \{m_0 ... m_n\}$ and their associated PDGs---each method $m_i$ has a $PDG^i = \langle N^i, E_c^i, E_d^i \rangle$.
The \textit{system dependence graph} (SDG) of $P$ is a graph $G = \langle N, E_c, E_d, E_{call}, E_{in}, E_{out}, E_{sum} \rangle$ where:
\begin{enumerate}
\item $N = \bigcup_{i=0}^n N^i$
\item $E_c = \bigcup_{i=0}^n E_c^i$
\item $E_d = \bigcup_{i=0}^n E_d^i$
\item $(a, b) \in E_{call}$ if and only if $a$ is a statement that contains a call and $b$ is a method ``Enter'' node of the function or method called by $a$. $(a, b)$ is a \textit{call edge}.
\item $(a, b) \in E_{in}$ if and only if $a$ and $b$ are input nodes which refer to the same variable or parameter, $m_{call} \ctrldep a \wedge m_{enter} \ctrldep b \wedge (m_{call}, m_{enter}) \in E_{call}$ ($m_{call}$ is a method call, $m_{enter}$ is an ``Enter'' node). $(a, b)$ is a \textit{parameter-input} or \textit{param-in edge}.
\item $(a, b) \in E_{out}$ if and only if $a$ and $b$ are output nodes which refer to the same variable or to the output, $m_{enter} \ctrldep a \wedge m_{call} \ctrldep b \wedge (m_{call}, m_{enter}) \in E_{call}$ ($m_{call}$ is a method call, $m_{enter}$ is an ``Enter'' node). $(a, b)$ is a \textit{parameter-output} or \textit{param-out edge}.
\item $(a, b) \in E_{sum}$ if and only if $a$ is an input node and $b$ is an output node, $m_{call} \ctrldep a \wedge m_{call} \ctrldep b$, $m_{call}$ is a node that contains a method call and there is a path from $a$ to $b$. $(a, b)$ is a \textit{summary edge}.
\end{enumerate}
\end{definition}
Regarding call edges, in programming languages with ambiguous method calls (those that have polymorphism or pointers), there may exist multiple outgoing call edges from a statement with a single method call.
To avoid confusion, the ``Enter'' nodes of each method are relabelled with their method's name.
\begin{example}[The creation of a system dependence graph]
\label{exa:example-sdg}
For simplicity, we explore a single small method that is called by another.
Let $f(x, y)$ be a method with two integer parameters that modifies the argument passed in its second parameter. Its code is displayed in Figure~\ref{fig:example-sdg-code}. It also uses a global variable $z$. A valid call to $f$ could be $f(a + 1, b)$, with parameters passed by reference when possible.
\begin{figure}[h]
\begin{lstlisting}
void f(int x, int y) {
z += x;
y++;
}
\end{lstlisting}
\caption{A simple method that modifies a parameter and a global variable.}
\label{fig:example-sdg-code}
\end{figure}
The CFG is very simple, with the addition of the parameter information to the labels of the nodes. The aforementioned method call would be labelled as ``$z_{in} = z$, $x_{in} = a + 1$, $y_{in} = b$, $f(a + 1, b)$, $b = y_{out}$, $z = z_{out}$'', with the inputs, the actual call and the outputs.
	The PDG seems more complicated, but can be assembled step by step. In Figure~\ref{fig:example-sdg-graph}, the PDG is the graph below and including the node ``Enter f''. First, the input and output information is extracted into nodes, and placed in order. The input nodes will generate data dependencies (shown in red) to the statements inside the method, and those in turn to the output nodes. All statements are control-dependent on the ``Enter'' node, as there are no conditional expressions.
	Finally, if we connect the PDG of the method that contains the method call $f(a + 1, b)$ to the method's PDG we obtain the SDG (shown only partially, as the method containing the method call has not been detailed). There are param-in and param-out dependencies (shown with dashes), which connect each input node from the method call to its corresponding node from the method declaration (and vice versa for the outputs). There is also the call edge, which connects the actual call to the declaration, and finally there are the summary edges, which summarize the dependencies that exist between the input and output nodes inside the method.
\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{img/parameter-passing}
\caption{The CFG of $f$ from Figure~\ref{fig:example-sdg-code} (left) and its SDG (right).}
\label{fig:example-sdg-graph}
\end{figure}
\end{example}
\sergio{This figure would be nicer as a progressive sequence if time allows; that way it would be almost self-explanatory: CFG $\rightarrow$ PDG $\rightarrow$ SDG. The current one would be the SDG; the others would have little more than a node and a label.}
\section{Creating slices with the SDG}
Once an SDG has been built, it can be traversed to create slices, without the need to rebuild it unless the underlying program changes. The traversal process consists of two passes:
The node that corresponds to the statement in the slicing criterion is selected as the initial node. From there, all edges except for \textit{param-out} are traversed backwards, and all nodes encountered are added to a set (the slice). When all possible edges have been traversed, the second pass begins from the nodes already in the set, this time ignoring \textit{param-in} and call edges, and adding the nodes found to the aforementioned set.
When the process has ended, the set of nodes encountered during the two-pass traversal constitutes the slice.
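The two-pass traversal can be sketched in Java as follows. This is a minimal illustration, not the implementation used in this thesis: the class \texttt{TwoPassSlicer}, its edge kinds and the node labels are hypothetical names, and the choice of which edges each pass skips follows the usual formulation by Horwitz et al.~\cite{HorwitzRB88} (the first pass skips param-out edges; the second skips param-in and call edges). The example graph encodes the call $f(a + 1, b)$ and the method $f$ from Example~\ref{exa:example-sdg}.

```java
import java.util.*;

/** Sketch of the two-pass backward traversal of an SDG (all names are hypothetical). */
final class TwoPassSlicer {
    enum Kind { CONTROL, DATA, CALL, PARAM_IN, PARAM_OUT, SUMMARY }

    static final class Edge {
        final String from, to;
        final Kind kind;
        Edge(String from, String to, Kind kind) { this.from = from; this.to = to; this.kind = kind; }
    }

    final List<Edge> edges = new ArrayList<>();

    /** Adds one edge of the given kind from 'from' to each target. */
    void add(Kind kind, String from, String... targets) {
        for (String to : targets) edges.add(new Edge(from, to, kind));
    }

    /** Walks edges backwards from the start nodes, skipping the ignored kinds. */
    Set<String> backward(Set<String> start, Set<Kind> ignored) {
        Set<String> seen = new TreeSet<>(start);
        Deque<String> work = new ArrayDeque<>(start);
        while (!work.isEmpty()) {
            String node = work.pop();
            for (Edge e : edges) {
                if (e.to.equals(node) && !ignored.contains(e.kind) && seen.add(e.from)) {
                    work.push(e.from);
                }
            }
        }
        return seen;
    }

    /** Pass 1 skips param-out edges; pass 2 restarts from its result and skips param-in and call edges. */
    Set<String> slice(String criterion) {
        Set<String> pass1 = backward(Set.of(criterion), EnumSet.of(Kind.PARAM_OUT));
        return backward(pass1, EnumSet.of(Kind.PARAM_IN, Kind.CALL));
    }

    /** Partial SDG for the call f(a + 1, b) and the body of f (z += x; y++;). */
    static TwoPassSlicer exampleSdg() {
        TwoPassSlicer g = new TwoPassSlicer();
        g.add(Kind.CONTROL, "call f", "x_in = a + 1", "y_in = b", "z_in = z", "b = y_out", "z = z_out");
        g.add(Kind.CONTROL, "Enter f", "x = x_in", "y = y_in", "z = z_in", "z += x", "y++", "y_out = y", "z_out = z");
        g.add(Kind.DATA, "x = x_in", "z += x");
        g.add(Kind.DATA, "z = z_in", "z += x");
        g.add(Kind.DATA, "z += x", "z_out = z");
        g.add(Kind.DATA, "y = y_in", "y++");
        g.add(Kind.DATA, "y++", "y_out = y");
        g.add(Kind.CALL, "call f", "Enter f");
        g.add(Kind.PARAM_IN, "x_in = a + 1", "x = x_in");
        g.add(Kind.PARAM_IN, "y_in = b", "y = y_in");
        g.add(Kind.PARAM_IN, "z_in = z", "z = z_in");
        g.add(Kind.PARAM_OUT, "y_out = y", "b = y_out");
        g.add(Kind.PARAM_OUT, "z_out = z", "z = z_out");
        g.add(Kind.SUMMARY, "y_in = b", "b = y_out");
        g.add(Kind.SUMMARY, "x_in = a + 1", "z = z_out");
        g.add(Kind.SUMMARY, "z_in = z", "z = z_out");
        return g;
    }

    public static void main(String[] args) {
        System.out.println(exampleSdg().slice("b = y_out"));
    }
}
```

Slicing this graph with respect to the output node \texttt{b = y\_out} keeps the call, the input node \texttt{y\_in = b} and, inside $f$, only the nodes that handle $y$; the nodes that handle $x$ and $z$ are left out. The summary edge lets the first pass reach \texttt{y\_in = b} without entering $f$, and the second pass then descends into $f$ through the param-out edge.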
Throughout this thesis there are some examples where the SDG has been sliced, filling the nodes in grey and marking the slicing criterion in bold. Some are Example~\ref{exa:program-slicing2}, Example~\ref{exa:unconditional}, Example~\ref{exa:problem-break-sub} and Example~\ref{exa:incorrect-try-catch-graph}.
\section{Unconditional control flow}
Even though the initial definition of the SDG was \deleted{useful}\added{adequate} to compute slices, the
Even though the initial definition of the SDG was adequate to compute slices, the
language covered was not enough for the typical language of the 1980s, which
included (in one form or another) unconditional control flow. Therefore, one of
the first \added{proposed upgrades}\deleted{additions contributed} to the algorithm to build \deleted{system dependence
graphs}\added{SDGs} was the inclusion of unconditional jumps, such as ``break'',
``continue'', ``goto'' and ``return'' statements (or any other equivalent). A
naive representation would be to treat them the same as any other statement, but
with the outgoing edge landing in the corresponding instruction (outside the
loop, at the loop condition, at the method's end, etc.).
An alternative approach is to represent the instruction as an edge, not a vertex, connecting the previous statement with the next to be executed. \sergio{I would merge the two previous proposals (naive and alternative) into one sentence rather than separating them, because after reading the first one I was already annoyed that we said neither who proposed it nor why it was not useful.}
Both of these approaches fail to generate a control dependence from the unconditional jump, as the definition of control dependence (see definition~\ref{def:ctrl-dep}) requires a vertex to have more than one successor for it to be possible to be a source of control dependence.
From here, there stem two approaches: the first would be to
redefine control dependency, in order to reflect the real effect of these
instructions ---as some authors~\cite{DanBHHKL11} have tried to do--- and the
second would be to alter the creation of the SDG to ``create'' those
dependencies, which is the most widely--used solution \cite{BalH93}.
the first additions contributed to the algorithm to build SDGs was the inclusion of unconditional jumps, such as ``break'',
``continue'', ``goto'' and ``return'' statements (or any other equivalent).
The most popular approach was proposed by Ball and Horwitz~\cite{BalH93}, classifying instructions into three separate categories:
A naive representation would be to treat them the same as any other statement, but
with the outgoing edge landing in the corresponding statement (e.g., outside the
loop); or, alternatively, to represent the statement as an edge, not a vertex, connecting the previous statement with the next to be executed.
Both of these approaches fail to generate a control dependence from the unconditional jump, as the definition of control dependence (see Definition~\ref{def:ctrl-dep}) requires a vertex to have more than one successor in order to be a source of control dependence.
From here, there stem two approaches: the first would be to
redefine control dependence, in order to reflect the real effect of these
statements---as some authors have done~\cite{DanBHHKL11}---and the
second would be to alter some step of the SDG's construction to introduce those
dependencies.
The most popular approach follows the latter option (modifying the SDG's construction), and was proposed by Ball and Horwitz~\cite{BalH93}. It classifies statements into three separate categories:
\begin{description}
\item[Statement.] Any instruction that is not a conditional or unconditional jump. \josep{\deleted{It has one outgoing edge in the CFG, to the next instruction that follows it in the program.}\added{Those nodes that represent an statement in the CFG have one outgoing edge pointing to the next instruction that follows it in the program.}}
\item[Predicate.] Any conditional jump instruction, such as \texttt{while}, \texttt{until}, \texttt{do-while}, \texttt{if}, etc. \josep{\deleted{It has two outgoing edges, labeled \textit{true} and \textit{false}; leading to the corresponding instructions.}\added{In the CFG, those nodes representing predicates have two outgoing edges, labeled \textit{true} and \textit{false}, leading to the corresponding instructions.}}
	\item[Pseudo--predicates.] Unconditional jumps (e.g. \texttt{break}, \texttt{goto}, \texttt{continue}, \texttt{return}); are like predicates, with the difference that the outgoing edge labeled \textit{false} is marked as non--executable\josep{---because there is no possible execution where such edge would be possible,\deleted{, and there is no possible execution where such edge would be possible,} according to the definition of the CFG (see Definition~\ref{def:cfg})---}. Originally the edges had a specific reasoning backing them up: the \textit{true} edge leads to the jump's destination and the \textit{false} one, to the instruction that would be executed if the unconditional jump was removed, or converted into a \texttt{no op}\sergio{no op or no-op?} (a blank operation that performs no change to the program's state). \sergio{\{}This specific behavior is used with unconditional jumps, but no longer applies to pseudo--predicates, as more instructions have used this category as means of ``artificially'' \carlos{bad word choice} generating control dependencies.\sergio{\}Do not go down this path; when this was defined, the creation of artificial nodes was not contemplated. Remove the ``originally''; at this point it still is the original.}
\item[Statement.] Any statement that is not a conditional or unconditional jump. In the CFG, their nodes have one outgoing edge pointing to the next statement that follows them in the program.
\item[Predicate.] Any conditional jump statement, such as \texttt{while}, \texttt{until}, \texttt{do-while}, \texttt{if}, etc. In the CFG, nodes representing predicates have two outgoing edges, labelled \textit{true} and \textit{false}, leading to the statements that would be executed with each result of the condition evaluation. As mentioned before, in general no evaluation is performed on the conditions, so every conditional statement has two outgoing edges, even if the condition is trivially \textit{true} or \textit{false} (e.g., $1 = 1$ or \textit{false}).
	\item[Pseudo-predicates.] Unconditional jumps (e.g., \texttt{break}, \texttt{goto}, \texttt{continue}, \texttt{return}) are treated like predicates, with the difference that the outgoing edge labelled \textit{false} is marked as non-executable---because there is no possible execution where such an edge would be taken, according to the definition of the CFG (see Definition~\ref{def:cfg}). For unconditional jumps, the \textit{true} edge leads to the statement that will be executed after the jump is performed, and the \textit{false} edge to the statement that \textit{would} be executed if the jump were skipped or turned into a no-operation.
	In future sections, other statements will make use of the pseudo-predicate structure (two outgoing edges, one non-executable), but using a different definition to place the non-executable edge. Therefore, the behaviour described for unconditional jumps is not universal for all statements classified as pseudo-predicates.
\end{description}
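The classification above can be sketched in code. The following is only a minimal illustration, not the data structure of any existing slicer: the names \texttt{CfgSketch}, \texttt{Kind}, \texttt{Node} and \texttt{Edge} are hypothetical, and the example wires up only the nodes of a loop that contains a \texttt{break}.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical CFG model; every name here is illustrative. */
final class CfgSketch {
    enum Kind { STATEMENT, PREDICATE, PSEUDO_PREDICATE }

    static final class Edge {
        final Node target;
        final String label;       // "true", "false", or "" for plain statements
        final boolean executable; // false only on the false edge of a pseudo-predicate

        Edge(Node target, String label, boolean executable) {
            this.target = target;
            this.label = label;
            this.executable = executable;
        }
    }

    static final class Node {
        final String text;
        final Kind kind;
        final List<Edge> out = new ArrayList<>();

        Node(String text, Kind kind) {
            this.text = text;
            this.kind = kind;
        }

        /** Statements have a single, unlabelled outgoing edge. */
        void next(Node target) {
            if (kind != Kind.STATEMENT) throw new IllegalStateException("use branch()");
            out.add(new Edge(target, "", true));
        }

        /** Predicates and pseudo-predicates have a true and a false edge. */
        void branch(Node onTrue, Node onFalse) {
            if (kind == Kind.STATEMENT) throw new IllegalStateException("use next()");
            out.add(new Edge(onTrue, "true", true));
            // The false edge is executable only for real predicates.
            out.add(new Edge(onFalse, "false", kind == Kind.PREDICATE));
        }
    }

    /** Nodes for: while (a > 0) { if (a > 10) break; a++; } println(a); */
    static Node buildBreakExample() {
        Node loop = new Node("while (a > 0)", Kind.PREDICATE);
        Node cond = new Node("if (a > 10)", Kind.PREDICATE);
        Node brk = new Node("break", Kind.PSEUDO_PREDICATE);
        Node inc = new Node("a++", Kind.STATEMENT);
        Node print = new Node("System.out.println(a)", Kind.STATEMENT);
        loop.branch(cond, print);
        cond.branch(brk, inc);
        brk.branch(print, inc); // true: jump target; false: what would run without the jump
        inc.next(loop);
        return brk;
    }
}
```

In this sketch the only difference between a predicate and a pseudo-predicate is the \texttt{executable} flag of the \textit{false} edge, which mirrors the description given in the list.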
\carlos{Pseudo--statements now have been introduced and are used to generate all control edges (for now just the Enter method to the Exit).}\josep{I do not understand this CCC}
As a consequence of this classification, every statement after an unconditional jump $j$ is control-dependent on it, as can be seen in the following example.
As a consequence of this classification, every instruction after an unconditional jump $j$ is control--dependent (either directly or indirectly) on $j$ and the structure containing it (\josep{a predicate such as }a conditional statement or a loop), as can be seen in the following example.
\begin{example}[Control dependencies generated by unconditional jumps]
\label{exa:unconditional}
Consider the program on the left side of Figure~\ref{fig:break-graphs}, which contains a loop and a \texttt{break} statement. The figure also includes the CFG and PDG for the method, showcasing the data and control dependencies of the statements. The statement of the slicing criterion $\langle 6, a\rangle$ is control-dependent on both the unconditional jump and its surrounding conditional statement. Therefore, the slice (all nodes coloured in grey) includes both. They are necessary to terminate the loop, but they could be excluded in the context of weak slicing, where the loop does not need to terminate and the slice can keep producing values.
\begin{figure}
\centering
\begin{minipage}{0.3\linewidth}
\begin{lstlisting}
\begin{figure}[h]
\centering
\begin{minipage}{0.3\linewidth}
\begin{lstlisting}
static void f() {
int a = 1;
while (a > 0) {
if (a > 10) break;
if (a > 10)
break;
a++;
}
System.out.println(a);
}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.6\linewidth}
\includegraphics[width=0.4\linewidth]{img/breakcfg}
\includegraphics[width=0.59\linewidth]{img/breakpdg}
\end{minipage}
\caption{A program with unconditional control flow (left), its CFG (centre) and its PDG (right).}
\label{fig:break-graphs}
\end{figure}
\begin{example}[Control dependencies generated by unconditional instructions]
\label{exa:unconditional}
Figure~\ref{fig:break-graphs} showcases a small program with a \texttt{break} statement, its CFG and PDG with a slice in grey\josep{No hables aún del slice. Primero presenta el programa, luego los grafos, luego el CS y finalmente el slice}. The slicing criterion (line 5, variable $a$) is control dependent on both the unconditional jump and its surrounding conditional instruction (both on line 4\josep{ponlos en lineas diferentes})\josep{. Therefore, the slice (all nodes in grey) includes the conditional jump and also the conditional exception. Note however that...}; even though it is not necessary to include it\sergio{a quien se refiere este it?} (in the context of weak slicing).
Note: the ``Enter'' node $S$ is also categorized as a pseudo--statement, with the \textit{false} edge connected to the ``Exit'' node, therefore generating a dependence from $S$ to all the nodes inside the method. This removes the need to handle $S$ with a special case when converting a CFG to a PDG, but lowers the explainability of non--executable edges as leading to the ``instruction that would be executed if the node was absent or a no--op''.
\end{example}
The original paper\josep{que original paper? parece que hablas de alguno que hayas hablado antes, pero el lector ya no se acuerda. Empieza de otra manera...}~\cite{BalH93} does prove its completeness, but disproves its correctness by providing a counter--example similar to example~\ref{exa:nested-unconditional}. This proof affects both weak and strong slicing, so improvements can be made on this proposal. The authors postulate that a more correct approach would be achievable if the slice's restriction of being a subset of instructions were lifted.
\begin{example}[Nested unconditional jumps]
\label{exa:nested-unconditional}
\josep{Esta frase es dificil de leer. No se entiende hasta leerla dos o tres veces.}In the case of nested unconditional jumps where both jump to the same destination, only one of them (the out--most one) is needed \josep{El lector no tiene contexto para saber de que hablas. Mejor empieza al reves: Consider the program in Figure~\ref{fig:nested-unconditional} where we can observe two nested unconditional jumps in lines X and Y. If we slice this program using the dependencies computed according to \cite{} then we compute the slice in light blue. Nevertheless, the minimal slice is composed of the nodes in grey [NOTA: yo no veo los colores. Arreglar la frase si no coincide con los colores]. This means that the slice computed includes unnecessary code (lines 3 and 5 are included unnecessarily). This problem is explained in depth and a solution proposed in Section~\ref{}}. Figure~\ref{fig:nested-unconditional} showcases the problem, with the minimal slice \carlos{have not defined this yet} in grey, and the algorithmically computed slice in light blue. Specifically, lines 3 and 5 are included unnecessarily.
\begin{figure}
\begin{minipage}{0.15\linewidth}
\begin{lstlisting}
while (X) {
if (Y) {
if (Z) {
A;
break;
}
B;
break;
}
C;
}
D;
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.84\linewidth}
\includegraphics[width=0.4\linewidth]{img/nested-unconditional-cfg}
\includegraphics[width=0.59\linewidth]{img/nested-unconditional-pdg}
\end{minipage}
\caption{A program with nested unconditional control flow (left), its CFG (center) and \josep{its} PDG (right).}
\label{fig:nested-unconditional}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.6\linewidth}
\includegraphics[width=0.4\linewidth]{img/breakcfg}
\includegraphics[width=0.59\linewidth]{img/breakpdg}
\end{minipage}
\caption{A program with unconditional control flow (left), its CFG (centre) and its PDG (right).}
\label{fig:break-graphs}
\end{figure}
\end{example}
\carlos{Add proposals to fix both problems showcased.}
% The
% The original paper\josep{que original paper? parece que hablas de alguno que hayas hablado antes, pero el lector ya no se acuerda. Empieza de otra manera...}~\cite{BalH93} does prove its completeness, but disproves its correctness by providing a counter--example similar to Example~\ref{exa:nested-unconditional}. This proof affects both weak and strong slicing, so improvements can be made on this proposal. The authors postulate that a more correct approach would be achievable if the slice's restriction of being a subset of statements were lifted.
% \begin{example}[Nested unconditional jumps]
% \label{exa:nested-unconditional}
% \josep{Esta frase es dificil de leer. No se entiende hasta leerla dos o tres veces.}In the case of nested unconditional jumps where both jump to the same destination, only one of them (the out--most one) is needed \josep{El lector no tiene contexto para saber de que hablas. Mejor empieza al reves: Consider the program in Figure~\ref{fig:nested-unconditional} where we can observe two nested unconditional jumps in lines X and Y. If we slice this program using the dependencies computed according to \cite{} then we compute the slice in light blue. Nevertheless, the minimal slice is composed of the nodes in grey [NOTA: yo no veo los colores. Arreglar la frase si no coincide con los colores]. This means that the slice computed includes unnecessary code (lines 3 and 5 are included unnecessarily). This problem is explained in depth and a solution proposed in Section~\ref{}}. Figure~\ref{fig:nested-unconditional} showcases the problem, with the minimal slice \carlos{have not defined this yet} in grey, and the algorithmically computed slice in light blue. Specifically, lines 3 and 5 are included unnecessarily.
% \begin{figure}
% \begin{minipage}{0.15\linewidth}
% \begin{lstlisting}
% while (X) {
% if (Y) {
% if (Z) {
% A;
% break;
% }
% B;
% break;
% }
% C;
% }
% D;
% \end{lstlisting}
% \end{minipage}
% \begin{minipage}{0.84\linewidth}
% \includegraphics[width=0.4\linewidth]{img/nested-unconditional-cfg}
% \includegraphics[width=0.59\linewidth]{img/nested-unconditional-pdg}
% \end{minipage}
% \caption{A program with nested unconditional control flow (left), its CFG (center) and \josep{its} PDG (right).}
% \label{fig:nested-unconditional}
% \end{figure}
% \end{example}
% \carlos{Add proposals to fix both problems showcased.}
\section{Exceptions}
\sergio{I think we have not yet said that our target language is Java; now would be a good moment to do so.}
Exception handling was first tackled in the context of Java program slicing by Sinha et al. \cite{SinH98}, with later contributions by Allen and Horwitz~\cite{AllH03}. There exist contributions for other programming languages, which will be explored later (chapter~\ref{cha:state-art}) \deleted{and other small contributions}. \sergio{Tal vez cambiaria el orden de estas frases para ir de lo general a lo concreto, diria primero que hay muchas contribuciones que veremos en el chapter~\ref{cha:state-art} y luego que nos vamos a centrar en los planteamientos que abordan el problema para Java, donde las propuestas con mas peso son: tal y tal.} The following section will explain the treatment of the different elements of exception handling in Java program slicing.
Exception handling was first tackled in the context of Java program slicing by Sinha et al.~\cite{SinH98}, with later contributions by Allen and Horwitz~\cite{AllH03}. There exist contributions for other programming languages, which will be explored later in Chapter~\ref{cha:state-art}. This section explains the treatment of the different elements of exception handling in Java program slicing.
As seen in Section~\ref{sec:intro-exception}, exception handling in Java adds
two constructs: \texttt{throw} and \texttt{try-catch}. Structurally, the
first one resembles an unconditional control flow statement carrying a value ---like \texttt{return} statements--- but its destination is not fixed, as it depends on the dynamic typing of the value.
If there is a compatible \texttt{catch} block, execution will continue inside it, otherwise the method exits with the \deleted{corresponding value as the }error \added{as returned value}.
The same process is repeated in the method that called the current one, until either the call stack is emptied or the exception is successfully caught.
\deleted{If}\added{Eventually, in case} the exception is not caught \deleted{at all}\added{by any stacked method}, the program exits with an error ---except in multi--threaded programs, in which case the corresponding thread is terminated.
The \texttt{try-catch} statement can be compared to a \texttt{switch} which compares types (with \texttt{instanceof}) instead of constants (with \texttt{==} and \texttt{Object\#equals(Object)} \sergio{esta notacion es obligatoria o podemos decir ``... and the \texttt{equals} operands"?}). Both structures require special handling to place the proper dependencies, so that slices are complete and as correct as \deleted{can be}\added{possible}.
first one resembles an unconditional control flow statement carrying a value (like a \texttt{return} statement), but its destination is not fixed, as it depends on the dynamic type of the value.
The \texttt{try-catch} statement can be likened to a \texttt{switch} that compares types (using the \texttt{instanceof} operator) instead of constants. Both structures require special handling to place the proper dependencies, so that slices are complete and as correct as possible.
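The comparison between \texttt{try-catch} and a type-based \texttt{switch} can be illustrated with a small sketch. The class and method names below are hypothetical; the two methods are intended to classify an exception in the same way, one by letting the \texttt{catch} clauses select a handler and the other by an explicit chain of \texttt{instanceof} tests.

```java
/** Hypothetical illustration: catch clauses behave like a switch over types. */
final class CatchAsSwitch {
    /** The catch clauses are tried in order, against the dynamic type. */
    static String viaCatch(RuntimeException e) {
        try {
            throw e;
        } catch (ArithmeticException a) {
            return "arithmetic";
        } catch (IllegalArgumentException i) {
            return "illegal argument";
        } catch (RuntimeException r) {
            return "other";
        }
    }

    /** The same selection written as explicit instanceof tests. */
    static String viaInstanceof(RuntimeException e) {
        if (e instanceof ArithmeticException) {
            return "arithmetic";
        } else if (e instanceof IllegalArgumentException) {
            return "illegal argument";
        } else {
            return "other";
        }
    }
}
```

Note that a subtype such as \texttt{NumberFormatException} is selected by the \texttt{IllegalArgumentException} branch in both versions, which is why the destination of a \texttt{throw} cannot be determined from the static type alone.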
\subsection{\texttt{throw} statement}
The \texttt{throw} statement compounds two elements in one instruction: an
unconditional jump with a value attached and a switch to an ``exception mode'', in which the statement's execution order is disregarded. The first one has been extensively covered and solved; as it is equivalent to the \texttt{return} instruction, but the second one requires a small addition to the CFG: there must be an alternative control flow, where the path of the exception is shown. For now\sergio{esto suena muy espanyol no? So far?}, without including \texttt{try-catch} structures, any exception thrown will exit its method with an error; so a new ``Error end'' node is needed.\sergio{No me convence esta frase, a ver como os suena esto (aunque no estoy muy convencido de ello) $\rightarrow$ So far, without including \texttt{try-catch} structures, any exception thrown would activate the mentioned ``exception mode" and leave its method with an error state. Hence, in order to represent this behaviour, a different exit point (represented with a node called ``Error end") need to be defined.} \deleted{T}\added{Consecuently, t}he pre-existing ``Exit'' node is renamed \added{as} ``Normal end'', \deleted{but now the}\added{leaving the} CFG \deleted{has}\added{with} two distinct sink nodes; which is forbidden in most slicing algorithms. To solve that problem, a general ``Exit'' node is created, with both normal and \deleted{exit}\added{error} ends connected to it; making it the only sink in the graph.
The \texttt{throw} statement combines two elements in one statement: an
unconditional jump with a value attached and a switch to an ``exception mode'', in which the normal execution order of statements is disregarded. The first one has been extensively covered and solved, as it is equivalent to the \texttt{return} statement, but the second one requires a small addition to the CFG: there must be an alternative control flow for the error to flow through until it is caught or the program terminates.
In order to properly accommodate a method's output variables (global variables or parameters passed by reference that have been modified), variable unpacking is added to the ``Error exit'' node; same as the ``Exit''\sergio{Exit?Vaya cacao llevamos con esto xD} node in previous examples. This change constitutes an increase in precision, as now the outputted variables are differentiated\deleted{; f}\added{. F}or example\added{,} a slice which only requires the error exit may include less variable modifications than one which includes both.
So far, without including \texttt{try-catch} structures, any exception thrown will activate the aforementioned ``exception mode'' and leave its method with an error state. Hence, in order to model this behaviour, a different exit point (represented with a node labelled ``Error exit'') needs to be defined.
Consequently, the pre-existing ``Exit'' node is renamed to ``Normal exit''. Now we face the problem that CFGs may have two distinct sink nodes, something which is forbidden in most slicing algorithms.
To solve that problem, a general ``Exit'' node is created, with both ``Normal exit'' and ``Error exit'' connected to it, which makes it the new sink of the CFG.
This treatment of \texttt{throw} statements only modifies the structure of the CFG, without altering the other graphs, the traversal algorithm, or the basic definitions for control and data dependencies. That fact makes it easy to incorporate to any existing program slicer that follows the general model described. Example~\ref{exa:throw} showcases the new exit nodes and the treatment of the \texttt{throw}\sergio{ statement?} as if it were an unconditional jump whose destination is the ``Error exit''.
In order to properly accommodate a method's output variables (global variables or parameters passed by reference that have been modified), variable unpacking must be moved from ``Exit'' to both ``Normal exit'' and ``Error exit''. This duplicates some nodes, but allows some of the duplicates to be left out of a slice. Therefore, this change constitutes an increase in precision, as the output variables are now differentiated. For example, a slice which only requires the ``Error exit'' may include fewer variable modifications than one which includes both.
This treatment of \texttt{throw} statements only modifies the structure of the CFG, without altering the other graphs, the traversal algorithm, or the basic definitions for control and data dependencies. That fact makes it easy to incorporate into any existing program slicer that follows the general model described. Example~\ref{exa:throw} showcases the new exit nodes and the treatment of the \texttt{throw} statement as if it were an unconditional jump whose destination is the ``Error exit''.
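The exit structure described above can be checked mechanically. The sketch below is an assumption-laden illustration: it encodes a CFG as a plain adjacency map with node labels as strings (a hypothetical encoding, chosen only for brevity), for a method that throws when $x < 0$ and otherwise computes a square root, and then confirms that the general ``Exit'' node is the only sink.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical adjacency-map encoding of a CFG with the three exit nodes. */
final class ExitNodesSketch {
    static Map<String, List<String>> buildCfg() {
        Map<String, List<String>> cfg = new LinkedHashMap<>();
        cfg.put("Enter", List.of("if (x < 0)"));
        cfg.put("if (x < 0)", List.of("throw", "x = Math.sqrt(x)"));
        // throw is a pseudo-predicate: its true edge reaches "Error exit";
        // its second (non-executable) edge reaches the statement that follows it.
        cfg.put("throw", List.of("Error exit", "x = Math.sqrt(x)"));
        cfg.put("x = Math.sqrt(x)", List.of("Normal exit"));
        // Both specialised exits are connected to the general "Exit" node.
        cfg.put("Normal exit", List.of("Exit"));
        cfg.put("Error exit", List.of("Exit"));
        cfg.put("Exit", List.of());
        return cfg;
    }

    /** Returns every node without outgoing edges. */
    static List<String> sinks(Map<String, List<String>> cfg) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : cfg.entrySet()) {
            if (entry.getValue().isEmpty()) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Removing the two edges into ``Exit'' from this map would leave ``Normal exit'' and ``Error exit'' as two separate sinks, which is the situation the general ``Exit'' node is introduced to avoid.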
\begin{example}[CFG of an uncaught \texttt{throw} statement]
Consider the simple Java method on the \deleted{right}\added{left} of figure~\ref{fig:throw}; which performs a square root if the number is positive, throwing otherwise a \texttt{RuntimeError}. The CFG in the centre illustrates the treatment of \texttt{throw}, ``normal exit'' and ``error exit'' as pseudo--statements, and the PDG on the right describes the control dependencies generated from the \texttt{throw} statement to the following instructions and exit nodes.
Consider the simple Java method on the left of Figure~\ref{fig:throw}, which computes the square root of a global variable $x$ if it is non-negative, and throws a \texttt{RuntimeException} otherwise. The CFG in the centre illustrates the treatment of \texttt{throw} as a pseudo-predicate and the new nodes ``Normal exit'' and ``Error exit''. The PDG on the right describes the control dependencies generated from the \texttt{throw} statement to the following statements and exit nodes.
\label{exa:throw}
\begin{figure}[h]
\begin{minipage}{0.3\linewidth}
\begin{lstlisting}
double f(int x) {
void f() {
if (x < 0)
throw new RuntimeException();
return Math.sqrt(x);
x = Math.sqrt(x);
}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.69\linewidth}
\includegraphics[width=\linewidth]{img/throw-example-cfg}
\end{minipage}
\caption{A simple program with a \texttt{throw} statement \added{(left)}, its CFG (centre) and its PDG (\deleted{left}\added{right}).}
\caption{A simple program with a \texttt{throw} statement (left), its CFG (centre) and its PDG (right).}
\label{fig:throw}
\end{figure}
\end{example}
@ -297,90 +334,53 @@ double f(int x) {
\subsection{\texttt{try-catch-finally} statement}
The \texttt{try-catch} statement is the only way to stop an exception once it is thrown.
It filters \added{each} exception by its type; letting those which do not match any of the catch blocks propagate to \deleted{another}\added{an external} \texttt{try-catch}\deleted{surrounding it}\added{block} or \deleted{outside the method,} to the previous \deleted{one}\added{method} in the call stack.
On top of that, the \texttt{finally} block helps programmers guarantee code execution. It can be used replacing or in conjunction with \texttt{catch} blocks.
The code placed inside a \texttt{finally} block is guaranteed to run if the \texttt{try} block has been entered.
It filters exceptions by their type, letting those which do not match any of the \texttt{catch} blocks propagate to an external \texttt{try-catch} statement or to the previous method in the call stack.
On top of that, the \texttt{finally} statement helps programmers guarantee code execution. It can be used as a replacement for or in conjunction with \texttt{catch} statements.
The code placed inside a \texttt{finally} statement is guaranteed to run if the \texttt{try} block has been entered.
This holds true whether the \texttt{try} block exits correctly, an exception is caught, an exception is left uncaught or an exception is caught and another one is thrown while handling it (within its \texttt{catch} block).
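The four situations listed above can be reproduced with a short program. This is a sketch with hypothetical names (\texttt{FinallyDemo}, \texttt{body}, \texttt{run}); the \texttt{mode} parameter selects which of the four situations is exercised, and the returned trace records which blocks were entered.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo: the finally block runs in all four situations. */
final class FinallyDemo {
    // mode 0: normal exit; 1: exception caught; 2: exception left uncaught;
    // mode 3: exception caught and another one thrown while handling it.
    static void body(int mode, List<String> trace) {
        try {
            trace.add("try");
            if (mode == 1 || mode == 3) throw new IllegalStateException("caught");
            if (mode == 2) throw new UnsupportedOperationException("uncaught");
        } catch (IllegalStateException e) {
            trace.add("catch");
            if (mode == 3) throw new IllegalArgumentException("thrown while handling");
        } finally {
            trace.add("finally");
        }
        trace.add("after");
    }

    /** Runs body and records any exception that escapes it. */
    static List<String> run(int mode) {
        List<String> trace = new ArrayList<>();
        try {
            body(mode, trace);
        } catch (RuntimeException e) {
            trace.add("escaped:" + e.getClass().getSimpleName());
        }
        return trace;
    }
}
```

In every mode the trace contains \texttt{finally}, whereas \texttt{after} only appears when no exception escapes the \texttt{try-catch-finally} statement (modes 0 and 1).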
\carlos{This would be useful to explain that the new dependencies introduced by the non-executable edges are not ``normal'' control dependencies, but ``presence'' dependencies. Opposite to traditional control dependence, where $a \ctrldep b$ if and only if the number of times $b$ is executed is dependent on the \textit{execution} of $a$ (e.g. conditional blocks and loops); this new control dependencies exist if and only if the number of times $b$ is executed is dependent on the \textit{presence} or \textit{absence} of $a$; which introduces a meta-problem. In the case of exceptions, it is easy to grasp that the absence of a catch block alters the results of an execution. Same with unconditional jumps, the absence of breaks modifies the flow of the program, but its execution does not control anything. A differentiation seems appropriate, even if only as subcategories of control dependence: execution control dependence and presence control dependence.}
The main problem when including \texttt{try-catch} blocks in program slicing is that \texttt{catch} blocks are not always strictly necessary for the slice (less so for weak slices), but introduce control dependencies that must be properly mapped to the SDG. The absence of \texttt{catch} blocks may also be a problem for compilation, as Java requires at least one \texttt{catch} or \texttt{finally} block to accompany each \texttt{try} block; though that could be fixed after generating the slice, if it is required that the slice should be executable.
The main problem when including \texttt{try-catch} blocks in program slicing is that \texttt{catch} blocks are not always strictly necessary for the slice (less so for weak slices), but introduce new styles of control dependence \sergio{De esto se habla luego? de estos ``new styles"? si es asi acuerdate de referenciarlo forward diciendo donde. Me imagino que es lo que pone en tu comentario de la presence control dependence.}; which must be properly mapped to the SDG. The absence of \texttt{catch} blocks may also be a problem for compilation, as Java requires at least one \texttt{catch} or \texttt{finally} block to accompany each \texttt{try} block; though that could be fixed after generating the slice, if it is required that the slice be \sergio{be or to be?} executable.
A typical\sergio{La tipica o la de la propuesta de Horwitz? Si es la de Horwitz di que ellos lo hacen asi, que ya hemos dicho que es lo mas importante hasta la fecha en Java.} representation of the \texttt{try} block is as a pseudo-predicate, connected to the first statement inside it and to the instruction that follows the \texttt{try} block.
This generates control dependencies from the \texttt{try} node to each of the instructions it contains.
\carlos{This is not really a ``control'' dependency, could be replaced by the definition of structural dependence.}\sergio{Absolutely, but in order to say this, structural dependence must be defined first, which I imagine will be in Section 4.}
Allen and Horwitz~\cite{AllH03} represent the \texttt{try} block as a pseudo-predicate, connected to the first statement inside it and to the statement that follows the \texttt{try} block.
This generates control dependencies from the \texttt{try} node to each of the statements it contains.
Inside the \texttt{try} there can be four distinct sources of exceptions:
\begin{description}
\item[\texttt{throw} statements.] The least common, but simplest to treat, because the exception is always thrown. The only problem may come from the ambiguity of the exception's type. For example, in the statement \texttt{throw ((Throwable) o)}, where \texttt{o} is a variable of type \texttt{Object}, the real type of the exception is unknown.
\item[Implicit unchecked exceptions.] If \textit{unchecked} exceptions are considered, many
common expressions may throw an exception, with the most common ones being trying to call
a method or accessing a field of a \texttt{null} object (\texttt{NullPointerException}),
accessing an invalid index on an array (\texttt{ArrayIndexOutOfBoundsException}), dividing
an integer by 0 (\texttt{ArithmeticException}), trying to cast to an incompatible type
(\texttt{ClassCastException}) and many others. On top of that, the user may create new
types that inherit from \texttt{RuntimeException}, but those may only be explicitly thrown.
Their inclusion in program slicing and therefore in the method's CFG generates extra
dependencies that make the slices produced bigger. For this reason, they are not considered in most of the previous works. This does not mean that they require special treatment in the graph; they just need to be identified in all the statements that may generate them.
\item[Method calls.] If an exception is thrown inside a method and it is not caught, it will
surface inside the \texttt{try} block.
As \textit{checked} exceptions must be declared explicitly, method declarations may be consulted to see if a method call may or may not throw any exceptions.
On this front, polymorphism and inheritance present no problem, as inherited methods must match the signature of the parent method ---including exceptions that may be thrown.
\deleted{If}\added{In case} \textit{unchecked} exceptions are also considered, method calls could be analysed to know which exceptions may be thrown, or the documentation \added{could} be checked automatically for the comment annotation \texttt{@throws} to know which ones \deleted{are thrown}\added{can be raised}.
\item[\texttt{throw} statements.] The least common, but most simple, as it is \deleted{treated as}\added{equivalent to}\sergio{no las tratamos, solo decimos cuales son} a
\texttt{throw} inside a method \sergio{Hemos explicado como se trata un ``throw inside un method"? O nos estamos refiriendo a una checked exception en una method call?}. The type of the exception may be obvious, as most \carlos{this is a weird claim to make without backup} exceptions are built and thrown in the same instruction; but it also may be hidden: e.g., \texttt{throw \added{(}(Exception) o\added{)}} where\sergio{por claridad, sino parece que la o forma parte de la frase} \texttt{o} is a variable of type Object.
\sergio{Este es el caso mas directo de excepcion, un throw a fuego en un try-catch. Yo tal vez lo pondria antes que las method calls.}
\item[Implicit unchecked exceptions.] If \textit{unchecked} exceptions are considered, many
common expressions may throw an exception, with the most common ones being trying to call
a method or accessing a field of a \texttt{null} object (\texttt{NullPointerException}),
accessing an invalid index on an array (\texttt{ArrayIndexOutOfBoundsException}), dividing
an integer by 0 (\texttt{ArithmeticException}), trying to cast to an incompatible type
(\texttt{ClassCastException}) and many others. On top of that, the user may create new
types that inherit from \texttt{RuntimeException}, but those may only be explicitly thrown.
Their inclusion in program slicing and therefore in the method's CFG generates extra
dependencies that make the slices produced bigger\added{. For this reason, they are not considered in most of the previous works}.
On this front, polymorphism and inheritance present no problem, as inherited methods must match the signature of the parent method---including exceptions that may be thrown.
In case \textit{unchecked} exceptions are also considered, method calls could be analysed to know which exceptions may be thrown, or the documentation could be checked automatically for the comment annotation \texttt{@throws} to know which ones can be raised. This is the most common way an exception appears inside a \texttt{try-catch} statement.
\item[Errors.] May be generated at any point in the execution of the program, but they normally
signal a situation from which it may be impossible to recover, such as an internal JVM error.
In general, most programs will not attempt to catch them, and can be excluded in order to simplify implicit unchecked exceptions (any instruction at any moment may throw an Error).
\sergio{After reading all four, I propose what seems to me the ideal order of explanation: (1) throw (2) implicit unchecked (3) method calls (so you can take advantage of having just discussed unchecked exceptions, which the reader has now recalled) (4) errors}
In general, most programs will not attempt to catch them, so they can be excluded in order to simplify the treatment of implicit unchecked exceptions (otherwise, any statement at any moment may throw an \texttt{Error}). Therefore, most slicing software ignores them. Similarly to implicit unchecked exceptions, they do not need special treatment, but their identification is costly and can complicate the SDG to the point where every statement is dependent on the correct execution of the previous one, which is true in a technical sense but irrelevant in most practical applications of program slicing.
\end{description}
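The first three sources of exceptions can be observed in ordinary Java code. The sketch below uses hypothetical names (\texttt{ExceptionSources} and its methods) and reports the simple name of the exception class raised in each case; errors are omitted because they cannot be triggered reliably in a small example.

```java
import java.io.IOException;

/** Hypothetical demo of exception sources inside a try block. */
final class ExceptionSources {
    /** Explicit throw: the static type is Throwable, the dynamic type is unknown. */
    static String explicitThrow(Object o) {
        try {
            throw (Throwable) o;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    /** Implicit unchecked exceptions raised by common expressions. */
    static String implicit(int which) {
        int[] data = new int[1];
        Object text = "text";
        String missing = null;
        try {
            switch (which) {
                case 0: return String.valueOf(missing.length());      // null dereference
                case 1: return String.valueOf(data[1]);               // invalid index
                case 2: return String.valueOf(data[0] / data[0]);     // integer division by 0
                default: return String.valueOf((Integer) text);       // incompatible cast
            }
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    /** A checked exception must be declared in the method's signature. */
    static void mayFail(boolean fail) throws IOException {
        if (fail) throw new IOException("declared in the signature");
    }

    /** Method call: the uncaught exception surfaces inside the caller's try. */
    static String call(boolean fail) {
        try {
            mayFail(fail);
            return "normal";
        } catch (IOException e) {
            return "IOException";
        }
    }
}
```

Only \texttt{mayFail} announces its exception in its signature; the four expressions in \texttt{implicit} give no syntactic hint, which is what makes implicit unchecked exceptions expensive to account for.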
All exception sources are treated very similarly: the statement that may throw an exception
is treated as a predicate, with the true edge connected to the next instruction \deleted{were the statement
to execute without raising exceptions}\added{of the normal execution}; and the false edge connected to all the possible \texttt{catch} nodes which may be compatible with the exception thrown.
All exception sources (except \texttt{throw} statements) are treated very similarly: the statement that may throw an exception
has an outgoing edge to the next statement. Then, there is an outgoing edge to each \texttt{catch} statement whose type may be compatible with the exception raised.
The nodes that represent \texttt{try} and \texttt{catch} statements are both pseudo-predicates: the \textit{true} edge leads to the first statement inside them, and the \textit{false} edge leads to the first instruction after the \texttt{try-catch} statement.
\deleted{The case of method calls that may throw exceptions is slightly different, as}\added{Unfortunately, when the exception source is a method call, there is an augmented behavour that make the representation slightly different, since} there may be variables to unpack, both in the case of a normal or erroneous exit. To that end, nodes containing method calls have an unlimited number of outgoing edges: one \deleted{to leads}\added{that points} to a node labelled ``normal return'', after which the variables produced by any normal exit of the method are unpacked; and all the others \added{point} to any possible catch that may catch the exception thrown. Each catch must then unpack the variables produced by the erroneous exits of the method.
Unfortunately, when the exception source is a method call, there is additional behaviour that makes the representation slightly different, since there may be variables to unpack, both in the case of a normal and of an erroneous exit. To that end, nodes containing method calls have an unbounded number of outgoing edges: one points to an auxiliary node labelled ``normal return'', in which the output variables produced by any normal exit of the method are placed, and the others point to each \texttt{catch} statement that may catch the exception thrown. Each \texttt{catch} must then be labelled with the output variables produced by the erroneous exits of the method.
The ``normal return'' node is itself a pseudo-statement; with the \textit{true} edge leading to the following instruction and \sergio{\{}the \textit{false} one to the first common instruction between all the paths of length $\ge 1$ that start from the method call ---which translates to the instruction that follows the \texttt{try} block if all possible exceptions thrown by the method are caught or the ``Exit'' node if there are some left uncaught.\sergio{\}esta frase es larguisima, con aclaraciones en medio y no se entiende.}
The ``normal return'' node is itself a pseudo-predicate. The \textit{true} edge is connected to the following statement, and the \textit{false} one to the first statement common to all the paths of non-zero length that start from the method call. The most common destinations for the \textit{false} edge are (1) the first statement after the \texttt{try-catch} (if all exceptions that could be thrown are caught) and (2) the ``Error exit'' of the method (if some exception is not caught).
\deleted{Carlos: CATCH Representation doesn't matter, it is similar to a switch but checking against types.
The difference exists where there exists the chance of not catching the exception;
which is semantically possible to define. When a \texttt{catch (Throwable e)} is declared,
it is impossible for the exception to exit the method; therefore the control dependency must
be redefined.}
\begin{example}[Code that throws and catches exceptions.]
Consider the segment of Java code in Figure~\ref{fig:try-catch} (left), which includes some statements without any data dependence (X, Y and Z), and a call to a method $f$ that uses two global variables, $x$ and $y$. $f$ may throw an exception, so it has been placed inside a \texttt{try-catch} structure, with a statement in the \texttt{catch} that logs a message when an exception occurs. Additionally, assume that when $f$ exits normally, only $x$ is modified, but when an error occurs, only $y$ is modified.
\deleted{The filter for exceptions in Java's \texttt{catch} blocks is a type (or multiple types since
Java 8), with a class that encompasses all possible exceptions (\texttt{Throwable}), which acts
as a catch-all.
In the literature there exist two alternatives to represent \texttt{catch}: one mimics a static
switch statement, placing all the \texttt{catch} block headers at the same height, all pending
from the exception-throwing exception and the other mimics a dynamic switch or a chain of \texttt{if}
statements. The option chosen affects how control dependencies should be computed, as the different
structures generate different control dependencies by default.}
As can be seen in the CFG shown in Figure~\ref{fig:try-catch} (centre), the nodes ``Normal return'', ``catch'' and ``try'' are considered as pseudo-statements, and their \textit{true} and \textit{false} edges (solid and dashed respectively) are used to create control dependencies.
The statements contained after the function call, inside the \texttt{catch} statement and inside the \texttt{try} statement are respectively controlled by the aforementioned nodes.
\deleted{\begin{description}
\item[Switch representation.] There exists no relation between different \texttt{catch} blocks,
each exception-throwing statement is connected through an edge labelled false to each
of the \texttt{catch} blocks that could be entered. Each \texttt{catch} block is a
pseudo-statement, with its true edge connected to the end of the \texttt{try} and the
As an example, a \texttt{1 / 0} expression may be connected to \texttt{ArithmeticException},
\texttt{RuntimeException}, \texttt{Exception} or \texttt{Throwable}.
If any exception may not be caught, there exists a connection to the ``Error exit'' of the method.
\item[If-else representation.] Each exception-throwing statement is connected to the first
\texttt{catch} block. Each \texttt{catch} block is represented as a predicate, with the true
edge connected to the first statement inside the \texttt{catch} block, and the false edge
to the next \texttt{catch} block, until the last one. The last one will be a pseudo-predicate
connected to the first statement after the \texttt{try} if it is a catch-all type or to the
``Error exit'' if it \added{is not}\deleted{isn't}.
\end{description}}
\begin{example}[Catches.]
Consider the \deleted{following }segment of Java code in figure~\ref{fig:try-catch}\added{ (left)}, which includes some statements \deleted{that do not use data}\added{without any data dependence} (X, Y and Z), and \added{a} method call to \texttt{f} that uses \texttt{x} and \texttt{y}, two global variables. \texttt{f} may throw an exception, so it has been placed inside a \texttt{try-catch} structure, with a statement in the \texttt{catch} that logs the \added{\texttt{error}} \added{token} when it occurs. Additionally, \added{consider the case that} when \texttt{f} exits \deleted{without an error}\added{normally}, only \texttt{x} is modified; but when an error occurs, only \texttt{y} is modified.
\deleted{Note how the pseudo-statements act to create control dependencies between the \textit{true} and \textit{false} edges, such as the ``normal return'', ``catch'', ``try''.}\added{As can be seen in the CFG shown in figure~\ref{fig:try-catch} (centre), the nodes ``normal return'', ``catch'' and ``try'' are considered as pseudo-statements, and their \textit{true} and \textit{false} edges (solid and dashed arrows respectively) are used to create control dependencies.} The statements contained after the function call, inside the \texttt{catch} \added{block,} and \added{inside} the \texttt{try} block\deleted{s} are respectively control dependent on the aforementioned nodes.
Finally, consider the statement \texttt{Z}; which is not dependent on any part of the \texttt{try-catch} block, as all exceptions that may be thrown are caught: it will execute regardless of the path taken inside the \texttt{try} block. \carlos{Consider critiquing the result, saying that despite the last sentence, statements can be removed (the catch) so that the dependencies are no longer the same.}
Finally, consider the statement \texttt{Z}; which is not dependent on any part of the \texttt{try-catch} statement, as all exceptions that may be thrown are caught: it will execute regardless of the path taken inside the \texttt{try} block. \carlos{Consider critiquing the result, saying that despite the last sentence, statements can be removed (the catch) so that the dependencies are no longer the same.}
\begin{figure}[h]
\begin{minipage}{0.35\linewidth}
\begin{lstlisting}
@ -397,44 +397,10 @@ Z;
\begin{minipage}{0.64\linewidth}
\includegraphics[width=\linewidth]{img/try-catch-example}
\end{minipage}
\caption{A simple example of the representation of \texttt{try-catch} structures and method calls that may throw exceptions. \josep{State which one is the CFG and which one is the PDG. By the way, the edge from the catch to Z (the false branch of the catch) is not like the ones that had been discussed. That is, it does not go where the execution would go if the catch were not there.}}
\caption{A simple program with a method call that could throw an exception (left), its CFG (centre) and its PDG (right).}
\label{fig:try-catch}
\end{figure}
\end{example}
\carlos{From here to the end of the chapter, delete / move to solution chapter}
Regardless of the approach, when there exists a catch--all block, there is no dependency generated
from the \texttt{catch}, as all of them will lead to the next instruction. However, this means that
if no data is outputted from the \texttt{try} or \texttt{catch} block, the catches will not be picked
up by the slicing algorithm, which may alter the results unexpectedly. If this problem arises, the
simple and obvious solution would be to add artificial edges to force the inclusion of all \texttt{catch}
blocks, which adds instructions to the slice ---lowering its score when evaluating against benchmarks---
but are completely innocuous as they just stop the exception, without running any extra instruction.
Another alternative exists, although it slows down the process of creating a slice from an SDG.
The \texttt{catch} block is only strictly needed if an exception that it catches may be thrown and
an instruction after the \texttt{try-catch} block should be executed; in any other case the \texttt{catch}
block is irrelevant and should not be included. However, this change requires analysing the inclusion
of \texttt{catch} blocks after the two--pass algorithm has completed, slowing it down. In any case, each
approach trades time for accuracy and vice\deleted{--}\added{ }versa, but the trade--off is small enough to be negligible.
Regarding \textit{unchecked} exceptions, an extra layer of analysis should be performed to tag statements
with the possible exceptions they may throw. On top of that, methods must be analysed and tagged
accordingly. The worst case is that of inaccessible methods, which may throw any \texttt{RuntimeException},
but with the source code unavailable, they must be marked as capable of throwing it. This results in
a graph where each instruction is dependent on the proper execution of the previous statement; save
for simple statements that may not generate exceptions. The trade--off here is between completeness and
correctness, with the inclusion of \textit{unchecked} exceptions increasing both the completeness and the
slice size, reducing correctness. A possible solution would be to only consider user--generated exceptions
or assume that library methods may never throw an unchecked exception. A new slicing variation that
annotates methods or limits the unchecked exceptions \added{may also}\deleted{to} be considered.
Regarding the \texttt{finally} block, most approaches treat it properly; representing it twice: once
for the case where there is no active exception and another one for the case where it executes with
an exception active. An exception could also be thrown here, but that would be represented normally.
\sergio{My contribution here is that we probably need to restrict the approach of Chapter 4 by saying that we will only deal with checked exceptions, and mention at the end that unchecked ones would be handled in the same way, but adding more analysis and more code to the slice. Otherwise, every time we explain what we do we will have to keep saying ``and for unchecked exceptions such-and-such...''. When you present the solution, narrow down the problem and say that we are going to propose a solution for checked exceptions that considers the case in which what is thrown is not caught by the try-catch (which can happen in Java). That is already better than the current solution.}
% vim: set noexpandtab:ts=2:sw=2:wrap

View File

@ -11,7 +11,7 @@
\textit{Program slicing} is a technique for program analysis and transformation whose main objective is to extract from a program the set of statements that affect a specific statement and set of variables, called a \textit{slicing criterion} \cite{Wei81,Tip95}. It answers the question ``Which parts of a program affect a set of variables in a specific statement?'' The program obtained by program slicing is called a \textit{slice}, and it has many uses, such as debugging \cite{DeMPS96}, program specialization \cite{OchSV05}, software maintenance \cite{HajF12}, code obfuscation \cite{MajDT07}, etc. This technique was originally defined \cite{Wei81} for a simple imperative programming language, but now can be used with practically all programming languages and paradigms.
\begin{example}[Program slicing applied to a simple Java method]
Consider the code shown on the left side of figure~\ref{fig:program-slicing-code}, which is a simple method written in Java. If that method is sliced with respect to the slicing criterion (line 5, variable \texttt{x}), the slice would be the program on the right. The \texttt{if} and print statements would be excluded from the slice, as they do not affect the value of \texttt{x}. As a test, the execution of line 5 on both programs would yield the same result ---assuming both the original program and the slice are executed with the same input value.
Consider the code shown on the left side of Figure~\ref{fig:program-slicing-code}, which is a simple method written in Java. If that method is sliced with respect to the slicing criterion $\langle 5, x \rangle$ (which represents variable $x$ in line 5), the slice would be the program on the right. The \texttt{if} and print statements would be excluded from the slice, as they do not affect the value of \texttt{x}. As a test, the execution of line 5 on both programs would yield the same result---assuming both the original program and the slice are executed with the same input value.
\label{exa:program-slicing}
\begin{figure}[h]
@ -35,21 +35,24 @@ void f(int x) {
}
\end{lstlisting}
\end{minipage}
\caption{A simple Java method (left) and its slice w. r. t. slicing criterion (line 5, \texttt{x}).}
\caption{A simple Java method (left) and its slice w.r.t. slicing criterion $\langle 5, x \rangle$.}
\label{fig:program-slicing-code}
\end{figure}
\end{example}
As depicted in example~\ref{exa:program-slicing}, slices are subsets of the original program. In the most general form, the execution of slices produces the same values in the slicing criterion as the original program would. In other words, the slice criterion behaves identically in the slice as in the original. Some uses of program slicing, such as program specialization, require the slices to be executable, which is useful to extract an independent process from a bigger program or software library. Other uses do not, as the slices are used to find the complete set of dependencies of a slicing criterion.
As depicted in Example~\ref{exa:program-slicing}, slices are subsets of the original program. In the most general form, the execution of slices produces the same values in the slicing criterion as the original program would. In other words, the slice criterion behaves identically in the slice as in the original. Some uses of program slicing, such as program specialization, require the slices to be executable, which is useful to extract an independent process from a bigger program or software library. Other uses do not, as the slices are used to find the complete set of dependencies of a slicing criterion.
Though it may seem a really powerful technique, many programming languages lack a mature program slicer which covers the whole language. Even widespread languages like Java do not have a complete program slicer that is publicly available or documented in the literature, which makes it difficult to use program slicing where it may be needed. Nevertheless, there exist commercial program slicers that cover Java, such as CodeSurfer\footnote{Created by GrammaTech. For more information, consult their website at \url{https://www.gramatech.com/}}.
Though it may seem a really powerful technique, many programming languages lack a mature program slicer which covers the whole language. Even widespread languages like Java do not have a complete program slicer that is publicly available or documented in the literature, which makes it difficult to use program slicing where it may be needed. Nevertheless, there exist commercial program slicers that cover Java, such as CodeSonar\footnote{Created by GrammaTech. For more information, consult their website at \url{https://www.gramatech.com/}}.
Building a program slicer is not a simple task, requiring a considerable amount of analysis to obtain a valid slice. Smaller slices are preferable, but even more difficult to create. In Java specifically many situations lead to several scenarios, such as arrays, polymorphism and inheritance, and exception handling that are quite difficult to analyze. This is the reason there does not exist a universal solution for all the existent problems in the field of program slicing. Conversely, many approaches are usually proposed to solve the same slicing problem. Program slicing is used in so many applications ---debugging, program comprehension, parallelization, dead code removal--- that any improvement to the state of the art improves those processes.
Building a program slicer is not a simple task, requiring a considerable amount of analysis to obtain a valid slice. Smaller slices are preferable, but even more difficult to create. In Java specifically there are several scenarios, such as arrays, polymorphism and inheritance, and exception handling that are quite difficult to analyse. This is the reason why a universal solution does not exist for all the problems in the field of program slicing. Conversely, there are many approaches to solve the same slicing problem. Program slicing is used in so many applications---debugging, program comprehension, parallelization, dead code removal---that any improvement to the state of the art improves those processes.
Even though the original proposal by Weiser~\cite{Wei81} focused on an imperative language, program slicing is a language-agnostic technique.
Since then, the literature has been expanded by dozens of authors who have described and implemented program slicing for more complex structures, such as uncontrolled control flow~\cite{HorwitzRB88} and exception handling~\cite{AllH03}; and for other programming paradigms, such as object--oriented languages~\cite{LarH96}.
Among others, there is an area that has been investigated, but does not have a definitive solution yet: exception handling. Example~\ref{exa:program-slicing2} shows how, even using the latest developments to handle exceptions in program slicing~\cite{AllH03,JiaZSJ06}, the slice produced is not valid.
\begin{example}[Program slicing with exceptions]
Consider figure~\ref{fig:program-slicing2-code}: the Java program on the left has been sliced (on the right) using Allen et al.'s proposal~\cite{AllH03}; with respect to the slicing criterion (line 17, variable \texttt{a}).
Consider Figure~\ref{fig:program-slicing2-code}: the Java program on the left has been sliced (on the right) using Allen et al.'s proposal~\cite{AllH03}; with respect to the slicing criterion $\langle 17, a \rangle$.
\label{exa:program-slicing2}
\begin{figure}[h]
\begin{minipage}{0.49\linewidth}
@ -96,7 +99,7 @@ void g(int a) throws Exception {
}
\end{lstlisting}
\end{minipage}
\caption{A simple Java program with exception (left) and its slice w. r. t. line 17, variable \texttt{a} (right).}
\caption{A simple Java program with exception (left) and its slice w.r.t. $\langle 17, a \rangle$ (right).}
\label{fig:program-slicing2-code}
\end{figure}
@ -108,27 +111,26 @@ void g(int a) throws Exception {
Method \texttt{g} throws an exception, which is not caught, and the program ends with an error, stopping abruptly before reaching the slicing criterion.
The problem in this example is that the \texttt{catch} block in line 4 is not included.
This is because ---according to the system dependence graph \cite{HorwitzRB88} computed using Allen et al.'s algorithm \cite{AllH03} and shown in Figure~\ref{fig:program-slicing2-graph} below--- it does not influence the execution of line 17.
This is because---according to the system dependence graph \cite{HorwitzRB88} computed using Allen et al.'s algorithm \cite{AllH03} and shown in Figure~\ref{fig:program-slicing2-graph} below---it does not influence the execution of line 17.
The graph displays the statements of the methods as nodes, and the dependencies between statements as edges. Some nodes have a dashed outline, as they do not correspond to a statement, but are needed by the algorithm.
The node associated with the slicing criterion is marked in bold and the nodes that represent the slice are filled in grey. Note that there are some edges between both methods that are not shown. The only relevant ones (the ones traversed to create the slice) are shown, and the rest are hidden for clarity.
The graph traversal will be explained later, but the basic rule is that edges are traversed backwards starting from the slicing criterion. Any node that is reached is part of the slice, the rest can be disregarded.
\begin{figure}[h]
\centering
\includegraphics[width=\linewidth]{img/motivation-example-pdg}
\caption{The system dependence graph for the method shown in Figure \ref{fig:program-slicing2-code}.}
\label{fig:program-slicing2-graph}
\end{figure}
\end{example}
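The traversal rule stated in Example~\ref{exa:program-slicing2} (follow edges backwards from the slicing criterion and keep every node reached) can be sketched as a plain worklist algorithm. The sketch below is only an illustration under simplifying assumptions: the class \texttt{BackwardSlicer}, its method \texttt{slice} and the use of strings as node identifiers are hypothetical names introduced here, and every edge is treated alike, so the distinction between edge kinds that an interprocedural traversal of the SDG requires is deliberately ignored.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; it is not the slicer described in this thesis.
final class BackwardSlicer {

    /**
     * Computes the set of nodes reachable backwards from the criterion.
     *
     * @param dependsOn maps each node to the nodes it depends on
     *                  (i.e. the sources of its incoming dependence edges)
     * @param criterion the node associated with the slicing criterion
     * @return every node reached, including the criterion itself
     */
    static Set<String> slice(Map<String, List<String>> dependsOn, String criterion) {
        Set<String> visited = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.push(criterion);
        while (!worklist.isEmpty()) {
            String node = worklist.pop();
            if (!visited.add(node)) {
                continue; // already part of the slice
            }
            // Traverse the incoming edges backwards; nodes without
            // dependencies simply contribute nothing to the worklist.
            for (String dependency : dependsOn.getOrDefault(node, List.of())) {
                worklist.push(dependency);
            }
        }
        return visited;
    }
}
```

Any node that never enters \texttt{visited} is disregarded, which is exactly how a \texttt{catch} node without incoming paths from the criterion ends up outside the slice.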
\carlos{move the graph and the explanation to after the background; the reason and the solution are presented in section X???}
Example~\ref{exa:program-slicing2} is a contribution of this thesis, because it showcases an important error in the current state of the art.
This example is later generalized (see Chapter~\ref{cha:solution}), as under some conditions all \texttt{catch} statements are ignored, regardless of whether they are needed or not.
The only way a \texttt{catch} block can be included in the slice is if a statement inside it is needed for another reason.
However, Allen et al. \cite{AllH03} did not tackle this problem, as for some examples the \texttt{catch} statement is included or unnecessary.
A real-life, commonly used instance of example~\ref{exa:program-slicing2} is the writing of any information to a file or a database; or any other instruction that has no data output (excluding side effects) and may throw an exception.
A real-life, commonly used instance of Example~\ref{exa:program-slicing2} is the writing of any information to a file or a database; or any other instruction that has no data output (excluding side effects) and may throw an exception.
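A minimal sketch of such an instance is given below. It is a hand-written illustration, not the output of any slicer: the class \texttt{CatchPresenceDemo}, the method \texttt{write} (a stand-in for a write to a file or database that produces no data and may throw) and the two variants are hypothetical names introduced here. The second variant assumes that the call is kept in the slice while the \texttt{catch} block is dropped, which is the situation described in Example~\ref{exa:program-slicing2}.

```java
// Hypothetical illustration; names and behaviour are assumptions, not taken from the thesis.
final class CatchPresenceDemo {

    // Stand-in for a write to a file or database: no data output, may throw.
    static void write(String message) throws Exception {
        if (message.isEmpty()) {
            throw new Exception("nothing to write");
        }
    }

    // Original program: the exception is caught, so the last assignment always runs.
    static int original(String message) {
        int a = 1;
        try {
            write(message);
        } catch (Exception e) {
            // The handler produces no data that is used later.
        }
        a = a + 1; // slicing criterion: the value of a at this statement
        return a;
    }

    // Assumed slice: the call is kept, but the catch block is not,
    // so the exception escapes before the criterion is reached.
    static int sliceWithoutCatch(String message) throws Exception {
        int a = 1;
        write(message);
        a = a + 1; // never reached when write throws
        return a;
    }
}
```

With an input for which \texttt{write} succeeds both variants compute the same value of \texttt{a}, but with an input for which it throws only the original reaches the criterion. The \texttt{catch} block carries no data dependence, yet its presence decides whether the criterion is executed.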
\section{Contributions}

View File

@ -1,10 +1,72 @@
% !TEX encoding = UTF-8
% !TEX spellcheck = en_GB
% !TEX root = ../paper.tex
\chapter{Proposed solution}
\chapter{Improving the SDG for exception handling}
\label{cha:solution}
\josep{First of all, congratulations Carlos. This section shows a significant improvement, especially in how the problems, the examples, etc. are introduced. Keep it up!}
\carlos{This would be useful to explain that the new dependencies introduced by the non-executable edges are not ``normal'' control dependencies, but ``presence'' dependencies. As opposed to traditional control dependence, where $a \ctrldep b$ if and only if the number of times $b$ is executed is dependent on the \textit{execution} of $a$ (e.g. conditional blocks and loops), these new control dependencies exist if and only if the number of times $b$ is executed is dependent on the \textit{presence} or \textit{absence} of $a$, which introduces a meta-problem. In the case of exceptions, it is easy to grasp that the absence of a catch block alters the results of an execution. The same happens with unconditional jumps: the absence of breaks modifies the flow of the program, but their execution does not control anything. A differentiation seems appropriate, even if only as subcategories of control dependence: execution control dependence and presence control dependence.}
\carlos{This is not really a ``control'' dependence, could be replaced by the definition of structural dependence.}\sergio{Totally, but in order to say this the structural dependence has to be defined, which I imagine will be in section 4.}
\deleted{The filter for exceptions in Java's \texttt{catch} blocks is a type (or multiple types since
Java 8), with a class that encompasses all possible exceptions (\texttt{Throwable}), which acts
as a catch-all.
In the literature there exist two alternatives to represent \texttt{catch}: one mimics a static
switch statement, placing all the \texttt{catch} block headers at the same height, all pending
from the exception-throwing exception and the other mimics a dynamic switch or a chain of \texttt{if}
statements. The option chosen affects how control dependencies should be computed, as the different
structures generate different control dependencies by default.}
\carlos{From here to the end of the chapter, delete / move to solution chapter}
Regardless of the approach, when there exists a catch--all block, there is no dependence generated
from the \texttt{catch}, as all of them will lead to the next statement. However, this means that
if no data is outputted from the \texttt{try} or \texttt{catch} block, the catches will not be picked
up by the slicing algorithm, which may alter the results unexpectedly. If this problem arises, the
simple and obvious solution would be to add artificial edges to force the inclusion of all \texttt{catch}
blocks, which adds statements to the slice---lowering its score when evaluating against benchmarks---but are completely innocuous as they just stop the exception, without running any extra statement.
Another alternative exists, although it slows down the process of creating a slice from an SDG.
The \texttt{catch} block is only strictly needed if an exception that it catches may be thrown and
a statement after the \texttt{try-catch} block should be executed; in any other case the \texttt{catch}
block is irrelevant and should not be included. However, this change requires analysing the inclusion
of \texttt{catch} blocks after the two--pass algorithm has completed, slowing it down. In any case, each
approach trades time for accuracy and vice\deleted{--}\added{ }versa, but the trade--off is small enough to be negligible.
Regarding \textit{unchecked} exceptions, an extra layer of analysis should be performed to tag statements
with the possible exceptions they may throw. On top of that, methods must be analysed and tagged
accordingly. The worst case is that of inaccessible methods, which may throw any \texttt{RuntimeException},
but with the source code unavailable, they must be marked as capable of throwing it. This results in
a graph where each statement is dependent on the proper execution of the previous statement; save
for simple statements that may not generate exceptions. The trade--off here is between completeness and
correctness, with the inclusion of \textit{unchecked} exceptions increasing both the completeness and the
slice size, reducing correctness. A possible solution would be to only consider user--generated exceptions
or assume that library methods may never throw an unchecked exception. A new slicing variation that
annotates methods or limits the unchecked exceptions \added{may also}\deleted{to} be considered.
Regarding the \texttt{finally} block, most approaches treat it properly; representing it twice: once
for the case where there is no active exception and another one for the case where it executes with
an exception active. An exception could also be thrown here, but that would be represented normally.
\sergio{My contribution here is that we probably need to restrict the approach of Chapter 4 by saying that we will only deal with checked exceptions, and mention at the end that unchecked ones would be handled in the same way, but adding more analysis and more code to the slice. Otherwise, every time we explain what we do we will have to keep saying ``and for unchecked exceptions such-and-such...''. When you present the solution, narrow down the problem and say that we are going to propose a solution for checked exceptions that considers the case in which what is thrown is not caught by the try-catch (which can happen in Java). That is already better than the current solution.}
\deleted{\begin{description}
\item[Switch representation.] There exists no relation between different \texttt{catch} blocks,
each exception-throwing statement is connected through an edge labelled false to each
of the \texttt{catch} blocks that could be entered. Each \texttt{catch} block is a
pseudo-statement, with its true edge connected to the end of the \texttt{try} and the
As an example, a \texttt{1 / 0} expression may be connected to \texttt{ArithmeticException},
\texttt{RuntimeException}, \texttt{Exception} or \texttt{Throwable}.
If any exception may not be caught, there exists a connection to the ``Error exit'' of the method.
\item[If-else representation.] Each exception-throwing statement is connected to the first
\texttt{catch} block. Each \texttt{catch} block is represented as a predicate, with the true
edge connected to the first statement inside the \texttt{catch} block, and the false edge
to the next \texttt{catch} block, until the last one. The last one will be a pseudo-predicate
connected to the first statement after the \texttt{try} if it is a catch-all type or to the
``Error exit'' if it \added{is not}\deleted{isn't}.
\end{description}}
\hrulefill
\josep{This chapter features different problems and weaknesses of the current treatment that program slicing techniques use in presence of exceptions. Each problem is described with a counterexample that illustrates the loss of completeness or precision. Finally, for each problem a solution is proposed.}
@ -22,11 +84,11 @@ The standard treatment of unconditional jumps as pseudo-statements introduces tw
\subsection{\josep{Problem 1: }Subsumption correctness error}
This problem has been known since\sergio{Did the authors themselves mention it? If so, never mind xD} the seminal publication on slicing unconditional jumps~\cite{BalH93}: chapter 4 details an example where the slice is bigger than it needs to be, and leaves the solution of that problem as an open question to be solved in future publications. A similar \sergio{similar to what? Is it similar, or the same one with breaks? I would perhaps say analogous.}example ---with \texttt{break} statements instead of \texttt{goto}--- is shown in example~\ref{exa:problem-break-sub}.
This problem has been known since\sergio{Did the authors themselves mention it? If so, never mind xD} the seminal publication on slicing unconditional jumps~\cite{BalH93}: chapter 4 details an example where the slice is bigger than it needs to be, and leaves the solution of that problem as an open question to be solved in future publications. A similar \sergio{similar to what? Is it similar, or the same one with breaks? I would perhaps say analogous.}example---with \texttt{break} statements instead of \texttt{goto}---is shown in Example~\ref{exa:problem-break-sub}.
\begin{example}[Example of unconditional jump subsumption~\cite{BalH93}]
\begin{example}[An unconditional jump subsumption~\cite{BalH93}]
\label{exa:problem-break-sub}
Consider the code shown on the left side of figure~\ref{fig:problem-break-sub}. It is a simple Java method containing a \texttt{while} statement, from which the execution may exit naturally or through any of the \texttt{break} statements (lines 6 and 9). For the rest of statements and expressions\sergio{it is jarring that we now say statements or expressions when we have been saying instructions all along. I would almost say this is the first time we refer to expressions. I would keep statements or instructions}, uppercase letters are used; and no data dependencies are considered, as they are not relevant to the problem at hand.
Consider the code shown on the left side of Figure~\ref{fig:problem-break-sub}. It is a simple Java method containing a \texttt{while} statement, from which the execution may exit naturally or through any of the \texttt{break} statements (lines 6 and 9). For the rest of statements and expressions\sergio{it is jarring that we now say statements or expressions when we have been saying instructions all along. I would almost say this is the first time we refer to expressions. I would keep statements or instructions}, uppercase letters are used; and no data dependencies are considered, as they are not relevant to the problem at hand.
\begin{figure}[h]
\begin{minipage}{0.33\linewidth}
@ -95,12 +157,14 @@ public void f() {
\begin{figure}[h]
\centering
\includegraphics[width=0.5\linewidth]{img/problem-break-sub-graph}
\caption{The system dependence graph for the program of figure \ref{fig:problem-break-sub}, with the slice marked in grey, and the slicing criterion in bold.\josep{In the conditions you write O,P,Q instead of X,Y,Z}\sergio{that is because you messed it up in the brainstorming with O P Q, and broke everything, Josep's fault!! xD}}
\caption{The system dependence graph for the program of Figure \ref{fig:problem-break-sub}, with the slice marked in grey, and the slicing criterion in bold.\josep{In the conditions you write O,P,Q instead of X,Y,Z}\sergio{that is because you messed it up in the brainstorming with O P Q, and broke everything, Josep's fault!! xD}}
\label{fig:problem-break-sub-sdg}
\end{figure}
\end{example}
The problem showcased in example~\ref{exa:problem-break-sub} can be generalized for any pair of unconditional jump statements that are nested and whose destination is the same. Formally, \josep{what follows is rather confusing. I would create a ``problem'' environment (like the definition or example ones) and place the formally described problem in it. Afterwards, I would clarify it with a brief explanation similar to the one that is currently mixed in with the formal definition}if a program $P$ contains a pair of unconditional jumps without any data \added{information} (e.g. \texttt{goto label}, \texttt{continue [label]}, \texttt{break [label]}, \texttt{return})\sergio{I would put 1, not all 4, otherwise it is no longer e.g. xD}\josep{If those four are exhaustive keep all four, but change e.g. to i.e.} $j_A$ and $j_B$ whose destinations (the instruction that will be executed after them) are $A$ and $B$, then $j_B$ is superfluous in the slice if and only if $A = B$ and $j_B$ is inside a conditional instruction $C$, and $j_A$ follows $C$ (not necessarily immediately). \carlos{Find a better description for the ``nested'' structure.} \carlos{Maybe use control dependencies between them.} Once $j_B$ is included, $C$ will also be included, and so will all of its data dependencies. \sergio{This definition has several gaps; I was trying to propose something, but several sets need to be defined and it is a conditional definition of the SC... I propose that the 3 of us try to write a better definition on Monday}
\carlos{Superfluous: edge that connects two equivalent jumps w/o data. Solution: remove it after generating the control dependencies.}
The problem showcased in Example~\ref{exa:problem-break-sub} can be generalized for any pair of unconditional jump statements that are nested and whose destination is the same. Formally, \josep{what follows is rather confusing. I would create a ``problem'' environment (like the definition or example ones) and place the formally described problem in it. Afterwards, I would clarify it with a brief explanation similar to the one that is currently mixed in with the formal definition}if a program $P$ contains a pair of unconditional jumps without any data \added{information} (e.g. \texttt{goto label}, \texttt{continue [label]}, \texttt{break [label]}, \texttt{return})\sergio{I would put 1, not all 4, otherwise it is no longer e.g. xD}\josep{If those four are exhaustive keep all four, but change e.g. to i.e.} $j_A$ and $j_B$ whose destinations (the instruction that will be executed after them) are $A$ and $B$, then $j_B$ is superfluous in the slice if and only if $A = B$ and $j_B$ is inside a conditional instruction $C$, and $j_A$ follows $C$ (not necessarily immediately). \carlos{Find a better description for the ``nested'' structure.} \carlos{Maybe use control dependencies between them.} Once $j_B$ is included, $C$ will also be included, and so will all of its data dependencies. \sergio{This definition has several gaps; I was trying to propose something, but several sets need to be defined and it is a conditional definition of the SC... I propose that the 3 of us try to write a better definition on Monday}
\sergio{I am leaving this half-done to see whether we can get something out of it:}
@ -111,7 +175,7 @@ The problem showcased in example~\ref{exa:problem-break-sub} can be generalized
\subsubsection*{\josep{\deleted{Proposal}A solution for the subsumption correctness error}}
As only the minimum number of control edges is inserted into the PDG (according to definition~\ref{def:pdg}), the only edge that can be traversed to include the inner jump ($j_B$) is the edge $j_B \ctrldep j_A$. An exception can be included when generating the PDG, such that control edges between two unconditional jumps $j_X$ and $j_Y$ whose destinations are $X$ and $Y$ will not be included if $X = Y$.
As only the minimum number of control edges is inserted into the PDG (according to Definition~\ref{def:pdg}), the only edge that can be traversed to include the inner jump ($j_B$) is the edge $j_B \ctrldep j_A$. An exception can be included when generating the PDG, such that control edges between two unconditional jumps $j_X$ and $j_Y$ whose destinations are $X$ and $Y$ will not be included if $X = Y$.
If the edge is not present, all inner unconditional jumps and their containing structures will be excluded from the slice, unless they are included for another reason.
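The exception described above can be sketched as a filter applied to the control edges while the PDG is generated. This is only a minimal sketch under stated assumptions: the classes \texttt{Node} and \texttt{ControlEdge}, and the representation of a jump destination as a label, are hypothetical and are not taken from any existing slicer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified PDG model used only for illustration. */
final class JumpEdgeFilter {
    /** A PDG node; destination is the label of the jump target, or null for non-jump nodes. */
    static final class Node {
        final String label;
        final String destination;

        Node(String label, String destination) {
            this.label = label;
            this.destination = destination;
        }

        /** Assumption: only unconditional jumps without data carry a destination. */
        boolean isUnconditionalJump() {
            return destination != null;
        }
    }

    /** A control dependence edge between two PDG nodes. */
    static final class ControlEdge {
        final Node from;
        final Node to;

        ControlEdge(Node from, Node to) {
            this.from = from;
            this.to = to;
        }
    }

    /** True if the edge links two unconditional jumps whose destinations coincide (X = Y). */
    static boolean isSuperfluous(ControlEdge e) {
        return e.from.isUnconditionalJump()
                && e.to.isUnconditionalJump()
                && e.from.destination.equals(e.to.destination);
    }

    /** Returns the control edges that would be kept in the PDG. */
    static List<ControlEdge> withoutSuperfluousEdges(List<ControlEdge> edges) {
        List<ControlEdge> kept = new ArrayList<>();
        for (ControlEdge e : edges) {
            if (!isSuperfluous(e)) {
                kept.add(e);
            }
        }
        return kept;
    }
}
```

Because the check is symmetric, the sketch does not depend on the direction in which the edge between $j_A$ and $j_B$ is stored. Edges from the conditional instruction $C$ to $j_B$ are kept, so $j_B$ can still be reached for other reasons.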
@ -123,9 +187,9 @@ If the edge is not present, all inner unconditional jumps and their containing s
\begin{example}[Unnecessary unconditional jumps]
\label{exa:problem-break-weak}
Consider the code for method \texttt{g} in figure~\ref{fig:problem-break-weak-code}, which features a simple loop with a \texttt{break} statement within. The slice in the middle has been created with respect to the \added{slicing} criterion (line 6, variable \texttt{x}), and includes everything except the print statement. This seems correct, as the presence of lines 4 and 5 determines the number of times line 6 is executed.
Consider the code for method \texttt{g} in Figure~\ref{fig:problem-break-weak-code}, which features a simple loop with a \texttt{break} statement within. The slice in the middle has been created with respect to the \added{slicing} criterion $\langle 6, x \rangle$, and includes everything except the print statement. This seems correct, as the presence of lines 4 and 5 determines the number of times line 6 is executed.
However, if \josep{one considers\deleted{you consider}} weak slicing, instead of strong slicing; the loop's termination stops mattering, lines 4 and 5 are no longer relevant. Without them, the slices produce\josep{\deleted{s}} an infinite list \josep{of} natural numbers (0, 1, 2, 3, 4, 5...)\sergio{\{}, but as that is a prefix \josep{suena raro que una lista infinita sea un prefijo de 0-9, mas bien es al reves}of the original program ---which outputs the numbers 0 to 9--- the program is still a valid slice (pictured on figure~\ref{fig:problem-break-weak-code}'s right side).\sergio{\}. Fortunately, this represents no inconvenience in the context of weak slicing, since the values given to the slicing criterion for the original program ---which is a list with the numbers 0 to 9--- is a prefix of the values generated by the slice, fulfilling the requirements of definition~\ref{XX}Creo que esto estaba en una definicion.}
However, if \josep{one considers\deleted{you consider}} weak slicing, instead of strong slicing; the loop's termination stops mattering, lines 4 and 5 are no longer relevant. Without them, the slices produce\josep{\deleted{s}} an infinite list \josep{of} natural numbers (0, 1, 2, 3, 4, 5...)\sergio{\{}, but as that is a prefix \josep{suena raro que una lista infinita sea un prefijo de 0-9, mas bien es al reves}of the original program---which outputs the numbers 0 to 9---the program is still a valid slice (pictured on Figure~\ref{fig:problem-break-weak-code}'s right side).\sergio{\}. Fortunately, this represents no inconvenience in the context of weak slicing, since the values given to the slicing criterion for the original program---which is a list with the numbers 0 to 9---is a prefix of the values generated by the slice, fulfilling the requirements of Definition~\ref{XX}Creo que esto estaba en una definicion.}
Note that the removal of lines 4 and 5 is only possible if there are no statements in the slice after the \texttt{while} statement. If the slicing criterion \deleted{is}\added{was} line 8, variable \texttt{x}, lines 4 and 5 \deleted{are}\added{would be} required to print the value, as without them, the program would loop indefinitely and never execute line 8.
@ -170,7 +234,7 @@ void g() {
}
\end{lstlisting}
\end{minipage}
\caption{A simple loop with a break statement (left), its computed slice (middle) with respect to line 5, variable \texttt{x}, and the smallest weak slice (right) for the same slicing criterion.}
\caption{A simple loop with a break statement (left), its computed slice (middle) with respect to $\langle 6, x\rangle$, and the smallest weak slice (right) for the same slicing criterion.}
\label{fig:problem-break-weak-code}
\end{figure}
\end{example}
@ -183,9 +247,11 @@ As with the previous error, the problem is not the inclusion of the jump and its
\subsubsection*{\josep{\deleted{Proposal}A solution for the unnecessary instructions in weak slicing}}
\carlos{Once the slice is finished, the unconditional jumps with no statement in the slice after their destination are removed. Then the slice is computed again.}
This problem cannot be easily solved, as it is a ``dynamic'' one, requiring information about the completed slice before allowing the removal of unconditional jumps and their dependencies. This means that the cost of this proposal \josep{cannot\deleted{can not}} be offloaded to the creation of the SDG as with the previous one.
\josep{incorrect sentence}Our proposal \deleted{revolves around temporarily remove}\added{is related to the temporary removal of} edges from the SDG: given an SDG of the form \josep{Did this sextuple appear in the definition of the SDG?}$G = \langle N, E_c, E_d, E_{in}, E_{out}, E_{fc} \rangle$, \added{we} remove from $E_c$ any edge of the form $x \ctrldep y~|~x, y \in N$, where $x$ is an unconditional forward jump; \added{then,} \deleted{perform} the slice \added{is performed} normally; and \deleted{then}\added{finally} ---if there is any statement \added{located} after the destination of $x$ in the slice--- \added{we} restore the edges removed in the first step and recompute the slice.\sergio{was there no better solution than this? It sounds like an unconvincing patch} The slicing algorithm would remain linear, because each node is visited at most once per traversal; but it may require two traversals, and the removal and restoration of the control edges has a cost, albeit a small one.
\josep{incorrect sentence}Our proposal \deleted{revolves around temporarily remove}\added{is related to the temporary removal of} edges from the SDG: given an SDG of the form \josep{Did this sextuple appear in the definition of the SDG?}$G = \langle N, E_c, E_d, E_{in}, E_{out}, E_{call} \rangle$, \added{we} remove from $E_c$ any edge of the form $x \ctrldep y~|~x, y \in N$, where $x$ is an unconditional forward jump; \added{then,} \deleted{perform} the slice \added{is performed} normally; and \deleted{then}\added{finally}---if there is any statement \added{located} after the destination of $x$ in the slice---\added{we} restore the edges removed in the first step and recompute the slice.\sergio{was there no better solution than this? It sounds like an unconvincing patch} The slicing algorithm would remain linear, because each node is visited at most once per traversal; but it may require two traversals, and the removal and restoration of the control edges has a cost, albeit a small one.
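The two-pass procedure can be sketched as follows. This is only an illustration under several assumptions that are not part of the thesis: nodes are identified by their line number, an edge \texttt{from -> to} means that \texttt{to} depends on \texttt{from}, a jump is ``forward'' when its destination line is greater than its own line, and a statement is ``after the destination'' when its line is greater than or equal to the destination line. The names \texttt{WeakSliceSketch}, \texttt{Edge} and \texttt{jumpDestinations} are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch of the two-pass weak slicing proposal; not an actual slicer. */
final class WeakSliceSketch {
    /** Dependence edge: node 'to' depends on node 'from'; 'control' marks edges in E_c. */
    static final class Edge {
        final int from;
        final int to;
        final boolean control;

        Edge(int from, int to, boolean control) {
            this.from = from;
            this.to = to;
            this.control = control;
        }
    }

    /** Plain backward reachability from the slicing criterion. */
    static Set<Integer> backwardSlice(List<Edge> edges, int criterion) {
        Set<Integer> slice = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(criterion);
        while (!work.isEmpty()) {
            int n = work.pop();
            if (!slice.add(n)) continue; // already visited
            for (Edge e : edges) {
                if (e.to == n) work.push(e.from);
            }
        }
        return slice;
    }

    /** jumpDestinations maps the line of each unconditional jump to its destination line. */
    static Set<Integer> weakSlice(List<Edge> edges, Map<Integer, Integer> jumpDestinations,
                                  int criterion) {
        // Step 1: drop control edges that leave an unconditional forward jump.
        List<Edge> reduced = new ArrayList<>();
        for (Edge e : edges) {
            Integer dest = jumpDestinations.get(e.from);
            boolean fromForwardJump = e.control && dest != null && dest > e.from;
            if (!fromForwardJump) reduced.add(e);
        }
        // Step 2: slice normally on the reduced graph.
        Set<Integer> first = backwardSlice(reduced, criterion);
        // Step 3: find forward jumps whose destination is followed by a sliced statement.
        Set<Integer> needed = new HashSet<>();
        for (Map.Entry<Integer, Integer> j : jumpDestinations.entrySet()) {
            if (j.getValue() <= j.getKey()) continue; // not a forward jump
            for (int n : first) {
                if (n >= j.getValue()) {
                    needed.add(j.getKey());
                    break;
                }
            }
        }
        if (needed.isEmpty()) return first;
        // Step 4: restore the edges of those jumps and recompute the slice.
        List<Edge> restored = new ArrayList<>(reduced);
        for (Edge e : edges) {
            if (e.control && needed.contains(e.from)) restored.add(e);
        }
        return backwardSlice(restored, criterion);
    }
}
```

With a graph shaped like the loop of Figure~\ref{fig:problem-break-weak-code} (a \texttt{break} on line 5 whose destination is line 8), slicing on line 6 would leave the conditional and the \texttt{break} out, whereas slicing on line 8 would trigger the second pass and include them. The sketch restores only the edges of the jumps that need them, which is one possible reading of the restoration step.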
\josep{add an example next that solves the problem}
@ -197,21 +263,21 @@ In this section we present an example where the current \deleted{approximation f
\carlos{this subsection snippet could go in another place}
Even though it continues to be used for control dependence, definition~\ref{def:ctrl-dep} does not have the same meaning when applied to conditional instructions\josep{en todo el documento se debería hablar de statements en lugar de instructions, porque instructions tiene connotación del paradigma imperartivo, mientras que statements engloba el imperartivo y el declarativo} and loops as it has when applied to unconditional jumps and other complex structures, such as the \texttt{switch} and \texttt{try-catch} statements.
Even though it continues to be used for control dependence, Definition~\ref{def:ctrl-dep} does not have the same meaning when applied to conditional instructions\josep{en todo el documento se debería hablar de statements en lugar de instructions, porque instructions tiene connotación del paradigma imperartivo, mientras que statements engloba el imperartivo y el declarativo} and loops as it has when applied to unconditional jumps and other complex structures, such as the \texttt{switch} and \texttt{try-catch} statements.
Originally, the definition of control dependence signified that the execution of a statement affected whether or not another one executed (or kept executing)\sergio{$\leftarrow$ this sentence is unclear. I think the ``whether'' is superfluous, and I do not understand the parenthesis}. In contrast, the execution of unconditional jumps and \texttt{try-catch} statements does not affect the following instructions; their presence or absence is what generates the control dependency. For those instructions, control dependencies are still generated with the same edges, but require the addition of extra edges to the CFG \cite{BalH93,AllH03}\sergio{are these the pseudo-predicate edges? If they are the false ones, we can reference them within the thesis without resorting to the articles; otherwise they look like new edges that did not exist until now}.
Originally, the definition of control dependence signified that the execution of a statement affected whether or not another one executed (or kept executing)\sergio{$\leftarrow$ this sentence is unclear. I think the ``whether'' is superfluous, and I do not understand the parenthesis}. In contrast, the execution of unconditional jumps and \texttt{try-catch} statements does not affect the following instructions; their presence or absence is what generates the control dependence. For those instructions, control dependencies are still generated with the same edges, but require the addition of extra edges to the CFG \cite{BalH93,AllH03}\sergio{are these the pseudo-predicate edges? If they are the false ones, we can reference them within the thesis without resorting to the articles; otherwise they look like new edges that did not exist until now}.
\subsection{The control dependencies of a \texttt{catch} block}
\subsection{Problem 3: The lack of control dependencies of \texttt{catch} statements}
In the current approximation\sergio{approach?} for exception handling \cite{AllH03}, \texttt{catch} blocks do not have any outgoing dependence leading anywhere except the instructions it contains. This means that, as showcased in chapter~\ref{cha:introduction}, the only way a \texttt{catch} statement may appear in a slice is if \added{the slicing criterion is inside the catch block, or if the value of a variable defined inside the catch block is needed (reaching it by data dependency).}\deleted{there is a data dependency or one of the statements inside it is needed.}
In the current approximation\sergio{approach?} for exception handling \cite{AllH03}, \texttt{catch} blocks do not have any outgoing dependence leading anywhere except the instructions it contains. This means that, as showcased in chapter~\ref{cha:introduction}, the only way a \texttt{catch} statement may appear in a slice is if \added{the slicing criterion is inside the catch block, or if the value of a variable defined inside the catch block is needed (reaching it by data dependence).}\deleted{there is a data dependence or one of the statements inside it is needed.}
The only occasion in which \texttt{catch} blocks generate any kind of control dependency is when there is an exception thrown that is not covered by any of the \texttt{catch} blocks, and the function may exit with an exception. In that case, the instructions after the \texttt{try-catch} block are dependent on an uncaught exception not being thrown.\sergio{aqui hay arcos a todos los catch? Si es asi acabar la frase diciendo que se considera esa instruccion dependiente de todos los catch o algo asi para que quede mas claro.}
The only occasion in which \texttt{catch} blocks generate any kind of control dependence is when there is an exception thrown that is not covered by any of the \texttt{catch} blocks, and the function may exit with an exception. In that case, the instructions after the \texttt{try-catch} block are dependent on an uncaught exception not being thrown.\sergio{aqui hay arcos a todos los catch? Si es asi acabar la frase diciendo que se considera esa instruccion dependiente de todos los catch o algo asi para que quede mas claro.}
But, compared to the treatment of unconditional \added{jumps?,} exceptions does\josep{do} not match\sergio{what does not match?} the treatment of \josep{a} \texttt{try-catch} statement: unconditional jumps have a non-executable edge to the instruction that would be executed in their absence; \texttt{catch} statements do not. \josep{These three paragraphs were written in a great hurry and are hard to follow}
\begin{example}[\texttt{catch} statements' outgoing dependencies]
\label{exa:catch-no-dep}
Consider the code shown in figure~\ref{fig:catch-no-dep-code}, which depicts a \texttt{try-catch} where method \texttt{f}, which may throw an exception, is called. The function may throw either a \texttt{ExceptionA}, \texttt{ExceptionB} or \texttt{Exception}-typed exception; and the \texttt{try-catch} considers all three cases, logging the type of exception caught. Additionally, \texttt{f} accesses and modifies a global variable \texttt{x}. \josep{en el codigo no aparece la x. Seria más claro si apareciera}
Consider the code shown in Figure~\ref{fig:catch-no-dep-code}, which depicts a \texttt{try-catch} where method \texttt{f}, which may throw an exception, is called. The function may throw either a \texttt{ExceptionA}, \texttt{ExceptionB} or \texttt{Exception}-typed exception; and the \texttt{try-catch} considers all three cases, logging the type of exception caught. Additionally, \texttt{f} accesses and modifies a global variable \texttt{x}. \josep{en el codigo no aparece la x. Seria más claro si apareciera}
\begin{figure}[h]
\begin{lstlisting}
@ -230,17 +296,17 @@ next;
\label{fig:catch-no-dep-code}
\end{figure}
The CFG and PDG associated with \deleted{that}\added{the} code \added{of Figure~\ref{fig:catch-no-dep-code}} are depicted in figure~\ref{fig:catch-no-dep-graphs}\added{\footnote{For the sake of clarity, in the PDG of figure~\ref{fig:catch-no-dep-graphs}, \texttt{log} function calls have been represented as a single node instead of their full node structures.}}. As can be seen, the only two elements that are dependent on any \texttt{catch} are the log statement and the unpacking of \texttt{x}. If the following statement used \texttt{x} in any way, all \texttt{catch} statements would be selected; otherwise, they are ignored and not deemed necessary. It is true that they are normally not necessary; e.g., if the slicing criterion were placed on \texttt{next} (line 10), the whole \texttt{try-catch} would be rightly ignored; but there exist cases where \texttt{f()} (line 2) would be part of the slice, and the absence of \texttt{catch} statements would result in an incomplete slice.
The CFG and PDG associated with \deleted{that}\added{the} code \added{of Figure~\ref{fig:catch-no-dep-code}} are depicted in Figure~\ref{fig:catch-no-dep-graphs}\added{\footnote{For the sake of clarity, in the PDG of Figure~\ref{fig:catch-no-dep-graphs}, \texttt{log} function calls have been represented as a single node instead of their full node structures.}}. As can be seen, the only two elements that are dependent on any \texttt{catch} are the log statement and the unpacking of \texttt{x}. If the following statement used \texttt{x} in any way, all \texttt{catch} statements would be selected; otherwise, they are ignored and not deemed necessary. It is true that they are normally not necessary; e.g., if the slicing criterion were placed on \texttt{next} (line 10), the whole \texttt{try-catch} would be rightly ignored; but there exist cases where \texttt{f()} (line 2) would be part of the slice, and the absence of \texttt{catch} statements would result in an incomplete slice.
\begin{figure}[h]
\includegraphics[width=\linewidth]{img/catch-no-dep}
\caption{CFG (left) and PDG (right) of the code shown in figure~\ref{fig:catch-no-dep-code}.}
\caption{CFG (left) and PDG (right) of the code shown in Figure~\ref{fig:catch-no-dep-code}.}
\label{fig:catch-no-dep-graphs}
\end{figure}
\end{example}
\begin{example}[Incorrectly ignored \texttt{catch} statements]
Consider the code in figure~\ref{fig:incorrect-try-catch-code}, in which \deleted{a method}\added{the method \texttt{f}} is called twice: once inside a \texttt{try-catch} statement, and a second time, outside. \added{As it happened in example~\ref{exa:catch-no-dep}}, \texttt{f} also accesses and modifies variable \texttt{x}, which is redefined before the second call to \texttt{f}. Exploring this example, we demonstrate how line 3 will be necessary but not included in the slice.
Consider the code in Figure~\ref{fig:incorrect-try-catch-code}, in which \deleted{a method}\added{the method \texttt{f}} is called twice: once inside a \texttt{try-catch} statement, and a second time, outside. \added{As it happened in Example~\ref{exa:catch-no-dep}}, \texttt{f} also accesses and modifies variable \texttt{x}, which is redefined before the second call to \texttt{f}. Exploring this example, we demonstrate how line 3 will be necessary but not included in the slice.
\begin{figure}[h]
\begin{minipage}{0.5\linewidth}
@ -267,15 +333,15 @@ void f() throws Exception {
\label{fig:incorrect-try-catch-code}
\end{figure}
Figure~\ref{fig:incorrect-try-catch-graph} displays the program dependence graph for the snippet of code on the left side of figure~\ref{fig:incorrect-try-catch-code}. \josep{por que no se incluye el PDG de la parte derecha?}Data dependencies are shown in red, and summary edges\sergio{esto ya tiene una definicion clara en la seccion del SDG??} in blue. The set of nodes filled in grey represent the slice with respect to \deleted{a}\added{the} slicing criterion \added{$\langle 4, x \rangle$} \josep{no usar la misma numeracion en los dos fragmentos de codigo} in method \texttt{f} \deleted{(line 4, \texttt{x})}\sergio{plantearse si poner los SC asi o dejarlo como (line X, variable Y)}. In the slice, both calls to \texttt{f} and its input (\texttt{x\_in = x}) are included, but the \texttt{catch} block is not present. The execution of the slice may not be the same: if no exception is thrown, there is no change; but if \texttt{x} was odd before entering the snippet, an exception \deleted{will}\added{would} be thrown and not caught, exiting the program prematurely.
Figure~\ref{fig:incorrect-try-catch-graph} displays the program dependence graph for the snippet of code on the left side of Figure~\ref{fig:incorrect-try-catch-code}. \josep{por que no se incluye el PDG de la parte derecha?}Data dependencies are shown in red, and summary edges\sergio{esto ya tiene una definicion clara en la seccion del SDG??} in blue. The set of nodes filled in grey represent the slice with respect to \deleted{a}\added{the} slicing criterion \added{$\langle 4, x \rangle$} \josep{no usar la misma numeracion en los dos fragmentos de codigo} in method \texttt{f} $\langle 4, x \rangle$\deleted{(line 4, \texttt{x})}\sergio{plantearse si poner los SC asi o dejarlo como (line X, variable Y)}. In the slice, both calls to \texttt{f} and its input (\texttt{x\_in = x}) are included, but the \texttt{catch} block is not present. The execution of the slice may not be the same: if no exception is thrown, there is no change; but if \texttt{x} was odd before entering the snippet, an exception \deleted{will}\added{would} be thrown and not caught, exiting the program prematurely.
\begin{figure}[h]
\centering
\includegraphics[width=0.9\linewidth]{img/incorrect-try-catch}
\caption{The \deleted{system dependence graph}\added{SDG} of the left snippet of figure~\ref{fig:incorrect-try-catch-code}. \texttt{f} and the \deleted{edges that connect to it}\added{associated inter-procedural edges} are not shown for simplicity.}
\caption{The \deleted{system dependence graph}\added{SDG} of the left snippet of Figure~\ref{fig:incorrect-try-catch-code}. \texttt{f} and the \deleted{edges that connect to it}\added{associated inter-procedural edges} are not shown for simplicity.}
\label{fig:incorrect-try-catch-graph}
\end{figure}
\label{exa:incorrect-try-catch-graph}
\end{example}


@ -17,55 +17,53 @@ Others \cite{PraMB11} have worked specifically on the C++ exception framework. \
Finally, Hao \cite{JieS11} introduced an Object-Oriented System Dependence Graph with exception handling (EOSDG), which represents a generic object-oriented language with exception handling capabilities. Its broadness allows the EOSDG to fit both Java and C++. It uses concepts from Jiang \cite{JiaZSJ06}, such as cascading \textit{catch} statements, while adding explicit support for virtual calls, polymorphism and inheritance.\sergio{Is it complete? Does it handle the cases that we solve? Does it focus on modelling without being useful for slicing? Extend this a little more}
% TODO UNCOMPLETE
% \hrulefill
% \marginnote{Alternative explanation of \cite{AllH03}, with counter example. Maybe should move the counter example backwards.}
\hrulefill
\marginnote{Alternative explanation of \cite{AllH03}, with counter example. Maybe should move the counter example backwards.}
% In her\josep{their?} paper \added{\cite{pending}}, Horwitz \josep{et al.?} suggests treating exceptions in the following way:
% \begin{itemize}
% \item Statements are divided into statements, predicates (loops and conditional blocks) and pseudo-predicates (return and throw statements). Statements only have one successor in the CFG, predicates have two (one when the condition is true and another when false), pseudo-predicates have two, but the one labeled ``false'' is non-executable. The non-executable edge connects to the statement that would be executed if the unconditional jump was replaced by a ``nop''.
% \item \textit{try-catch-finally} blocks are treated differently, but it has fewer dependencies than needed. Each catch block is control-dependent on any statement that may throw the corresponding exception. The \josep{???}
% \end{itemize}
In her\josep{their?} paper \added{\cite{pending}}, Horwitz \josep{et al.?} suggests treating exceptions in the following way:
\begin{itemize}
\item Statements are divided into statements, predicates (loops and conditional blocks) and pseudo-predicates (return and throw statements). Statements only have one successor in the CFG, predicates have two (one when the condition is true and another when false), pseudo-predicates have two, but the one labeled ``false'' is non-executable. The non-executable edge connects to the statement that would be executed if the unconditional jump was replaced by a ``nop''.
\item \textit{try-catch-finally} blocks are treated differently, but it has fewer dependencies than needed. Each catch block is control-dependent on any statement that may throw the corresponding exception. The \josep{???}
\end{itemize}
% \josep{Create an example environment}
% \begin{lstlisting}[title=Example]
% void main() {
% int x = 0;
% while (true) {
% try {
% f(x);
% } catch (ExceptionA e) {
% x--;
% } catch (ExceptionB e) {
% System.err.println(x);
% } catch (ExceptionC e) {
% System.out.println(x);
% }
% System.out.println(x);
% }
% }
\josep{Create an example environment}
\begin{lstlisting}[title=Example]
void main() {
int x = 0;
while (true) {
try {
f(x);
} catch (ExceptionA e) {
x--;
} catch (ExceptionB e) {
System.err.println(x);
} catch (ExceptionC e) {
System.out.println(x);
}
System.out.println(x);
}
}
% void f(x) {
% x--;
% if (x > 10)
% throw new ExceptionA();
% else if (x == 0)
% throw new ExceptionB();
% else if (x > 0)
% throw new ExceptionC();
% x++;
% System.out.println(x);
% }
void f(int x) throws ExceptionB, ExceptionC {
x--;
if (x > 10)
throw new ExceptionA();
else if (x == 0)
throw new ExceptionB();
else if (x > 0)
throw new ExceptionC();
x++;
System.out.println(x);
}
% static class ExceptionA extends ExceptionC {}
% static class ExceptionB extends Exception {}
% static class ExceptionC extends Exception {}
% \end{lstlisting}
static class ExceptionA extends ExceptionC {}
static class ExceptionB extends Exception {}
static class ExceptionC extends Exception {}
\end{lstlisting}
% In this example we can explore all the errors found with the current state of the art. \josep{Seria mucho más claro si tenemos un grafo con la soluciones propuesta para cada problema.}
In this example we can explore all the errors found with the current state of the art. \josep{Seria mucho más claro si tenemos un grafo con la soluciones propuesta para cada problema.}
The first problem found is the lack of \texttt{catch} statements in the slice, as no edge is drawn from the catch. Some of the catch blocks will be included via data dependencies, but some may not be reached, though they are still necessary if the slice includes anything after a caught exception.
Therefore, an extra control dependency must be introduced, in order to always include a ``catch'' statement in the slice if the ``throw'' statement is in the slice. In the example, only the catch statement from line 20 will be included \josep{con que criterio? no has definido el ejemplo. El lector no sabe como interpretar esta figura}, and if ExceptionC or ExceptionB were thrown, they would not be caught. That would not be a problem if the function $f$ was not executed again, but it is, making the slice incorrect.
% The first problem found is the lack of \texttt{catch} statements in the slice, as no edge is drawn from the catch. Some of the catch blocks will be included via data dependencies, but some may not be reached, though they are still necessary if the slice includes anything after a caught exception.
% Therefore, an extra control dependence must be introduced, in order to always include a ``catch'' statement in the slice if the ``throw'' statement is in the slice. In the example, only the catch statement from line 20 will be included \josep{con que criterio? no has definido el ejemplo. El lector no sabe como interpretar esta figura}, and if ExceptionC or ExceptionB were thrown, they would not be caught. That would not be a problem if the function $f$ was not executed again, but it is, making the slice incorrect.
% vim: set noexpandtab:ts=2:sw=2:wrap


@ -1,7 +1,6 @@
digraph g {
Start [shape=box];
End [shape=box];
Start -> End [style=dashed];
Start -> "int a = 1" -> "while (a > 0)" -> "if (a > 10)" -> "break" -> "print(a)";
"break" -> "a++" [style=dashed];
"if (a > 10)" -> "a++" -> "while (a > 0)" -> "print(a)" -> End;

Binary file not shown.


@ -1,4 +1,5 @@
digraph g {
graph [splines = ortho];
// nodes g()
subgraph cluster_g {
enter_g [label=<entry<br/>g()>,shape=rect,style=filled];

Binary file not shown.


@ -1,13 +1,13 @@
digraph g {
subgraph a {
E [label = "Entry", shape = box];
E [label = "Enter", shape = box];
e [label = "Exit", shape = box];
c [label = <x_in = 2<br/>y_in = 3<br/>multiply(2, 3)>];
E -> c -> e;
}
subgraph b {
Entry [shape=box, label = <Entry<br/>x = x_in<br/>y = y_in>];
Entry [shape=box, label = <Enter<br/>x = x_in<br/>y = y_in>];
Exit [shape=box];
Entry -> "int result = 0" -> "while (x > 0)" -> "result += y" -> "x--" -> "while (x > 0)" -> "System.out.println(result)" -> "return result" -> Exit;
{ rank = same; "while (x > 0)"; "System.out.println(result)"}

Binary file not shown.


@ -1,6 +1,6 @@
digraph cfg {
start -> while [label = T];
while -> "if (Y)" [label = "T"];
while -> "if (Y)";
while -> D [label = "F"];
D -> end;
"if (Y)" -> "if (Z)" [label=T];
@ -18,5 +18,4 @@ digraph cfg {
break1 -> B [style=dashed, label = F];
break2 -> C [style=dashed, label = F];
C -> while;
start -> end [label = F, style = dashed];
}

Binary file not shown.


@ -1,26 +1,63 @@
digraph G {
a1 [label = <Enter<br/>z = z_in<br/>x = x_in<br/>y = y_in>, shape = rect];
a2 [label = "z += x"];
a3 [label = "y++"];
a4 [label = <z_out = z<br/>y_out = y<br/>Exit>, shape = rect];
a1 -> a2 -> a3 -> a4
// p [label=<x_in = a + b<br/>y_in = c<br/>f()<br/>c = y_out>,shape=rect];
f_call [label="f()"]
x_in [label="x_in = a + b"]
y_in [label="y_in = c"]
y_out [label="c = y_out"]
f_call -> {x_in y_in y_out};
f_start [label="enter f"];
x_in [label="x_in = a + 1"]
y_in [label="y_in = b"]
z_in [label="z_in = z"]
y_out [label="b = y_out"]
z_out [label="z = z_out"]
f_call -> {z_in x_in y_in y_out z_out};
f_start [label="enter f", shape = rect];
fx_in [label="x = x_in"];
fy_in [label="y = y_in"];
fz_in [label="z = z_in"];
fy_out [label="y_out = y"];
f_start -> {fx_in fy_in fy_out};
fz_out [label="z_out = z"];
f_start -> {fz_in fx_in fy_in fy_out fz_out};
f_call -> f_start [style=bold];
y_in -> f_start [style=invis];
x_in -> fx_in [style=dashed];
y_in -> fy_in [style=dashed];
fy_out -> y_out [constraint=false,style=dashed];
{
edge [style = dashed];
z_in -> fz_in
x_in -> fx_in
y_in -> fy_in
fy_out -> y_out
fz_out -> z_out
}
invis [height=0.001,width=0.001,style=invis];
invis2 [height=0.001,width=0.001,style=invis];
{rank=same; x_in y_in y_out invis};
{rank=same; fx_in fy_in invis2 fy_out};
{rank=same; x_in y_in y_out z_in z_out invis};
{rank=same; fx_in fy_in invis2 fy_out fz_in fz_out};
{edge [style=invis];
x_in -> y_in -> invis -> y_out;
fx_in -> fy_in -> invis2 -> fy_out;
z_in -> x_in -> y_in -> invis -> y_out -> z_out;
fz_in -> fx_in -> fy_in -> invis2 -> fy_out -> fz_out;
}
{rank = max;
zplus [label = "z += x"];
yplus [label = "y++"];
}
f_start -> {zplus yplus};
{
edge [color = red];
{fz_in fx_in} -> zplus;
fy_in -> yplus;
edge [constraint = false];
zplus -> fz_out;
yplus -> fy_out;
}
{
edge [color = blue, constraint = false, style = bold];
{z_in x_in} -> z_out;
y_in -> y_out;
}
}

Binary file not shown.


@ -36,6 +36,4 @@ digraph g {
l9 -> {l8 l9 l11}
}
}
l4 -> StartF [style=bold,constraint=false];
}

Binary file not shown.


@ -1,21 +1,24 @@
digraph g {
Start [shape=box,label=<Start<br/>x = x_in>];
End [shape=box];
Start -> End [style=dashed];
End [shape=box,label=<End>];
NE [label = <x_out = x<br/>Normal exit>];
Start -> "if (x < 0)" -> "throw" -> "Error exit" -> End;
"throw" -> "return Math.sqrt(x)" [style=dashed];
"if (x < 0)" -> "return Math.sqrt(x)" -> "Normal exit" -> End;
"throw" -> "x = Math.sqrt(x)" [style=dashed];
"if (x < 0)" -> "x = Math.sqrt(x)" -> NE -> End;
// pdg
f [label="f()",shape=rect];
x_in [label = "x = x_in", style = dashed];
x_out [label = "x = x_out", style = dashed];
if [label = "if (x < 0)"];
t [label = "throw"];
ret [label = "return Math.sqrt(x)"];
ret [label = "x = Math.sqrt(x)"];
ee [label = "error exit", style = dashed];
ne [label = "normal exit", style = dashed];
f -> x_in;
f -> if -> t -> {ret ee ne};
ne -> x_out;
{ edge [color = red, constraint = false];
x_in -> {if ret};
ret -> x_out;
}
}

Binary file not shown.


@ -5,14 +5,15 @@ digraph G {
enter -> try;
try -> X -> f;
f [label = <x_in = x<br/>y_in = y<br/>f()>];
f -> { "normal return"; catch; };
"normal return" -> "x = x_out" -> Y -> Z;
catch -> "y = y_out" -> "print(error)" -> Z -> "normal exit" -> exit;
f -> { NR; catch; };
NR [label = <normal return<br/>x = x_out>];
catch [label = <catch<br/>y = y_out>];
NR -> Y -> Z;
catch -> "print(error)" -> Z -> "normal exit" -> exit;
{ edge [style = dashed, constraint = false];
enter -> exit;
try -> Z;
catch -> Z;
"normal return" -> Z;
NR -> Z;
}
}
@ -20,8 +21,8 @@ digraph G {
method [label="Start", shape=rect];
t [label = "try"];
x [label = "X"];
x_in [label = "x_in = x"];
y_in [label = "y_in = y"];
x_in [label = "x_in = x", style = dashed];
y_in [label = "y_in = y", style = dashed];
call [label = "f()"];
nr [style=dashed, label = "normal return"];
data [style=dashed, label = "x = x_out"];

Binary file not shown.

@ -3,7 +3,7 @@
% !TEX root = paper.tex
\documentclass[a4paper,twoside]{report}
\usepackage[spanish,english]{babel}
\usepackage[spanish,catalan,english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{listings}
\usepackage{algorithm}
@ -73,13 +73,26 @@
\include{listings-config}
\maketitle
\cleardoublepage
\begin{abstract}
\carlos{to be completed}
Program slicing is an analysis technique that can be applied to practically all programming languages. However, in the presence of exception handling, current program slicing software has a precision problem. In this thesis we tackle the problem of program slicing with exception handling, analysing the problem from a general perspective (for any kind of exception system), but focusing our efforts on the object-oriented paradigm, specifically the Java language. The solution is still general enough to be applicable to other paradigms and programming languages.
In this thesis, we study the currently available solutions to the problem, and we propose a generalization that includes at least the \texttt{try-catch} and \texttt{throw} statements. The solution we propose produces slices that guarantee completeness and are as correct as possible. The proposed technique will be implemented in Java.
\end{abstract}
\selectlanguage{spanish}
\begin{abstract}
\carlos{to be completed}
La fragmentación de programas es una técnica de análisis que se aplica en prácticamente todos los lenguajes de programación. Sin embargo, en presencia de excepciones los fragmentadores actuales tienen problemas de precisión. En este proyecto se aborda el problema de la fragmentación de programas con excepciones, analizando el problema desde una perspectiva general (para cualquier tipo de excepción), pero focalizando la implementación en el paradigma orientado a objetos, concretamente en el lenguaje Java.
Se estudiarán las soluciones actuales al problema y se propondrá una generalización que al menos incluya las construcciones try-catch y Throws. La solución propuesta deberá producir fragmentos que garanticen la completitud y que sean lo más precisos posible. La implementación de la técnica propuesta se hará en Java.
\end{abstract}
\selectlanguage{catalan}
\begin{abstract}
La fragmentació de programes és una tècnica d'anàlisi que s'aplica en pràcticament tots els llenguatges de programació.
\end{abstract}
\selectlanguage{english}
@ -93,11 +106,19 @@
\include{Secciones/state_of_the_art}
\include{Secciones/conclusion}
\chapter{TODO}
% \chapter{TODO}
\begin{itemize}
\item \carlos{} Decide whether to use dependency or dependence (I suspect the plural form would also change: dependencies vs. dependences).
\end{itemize}
% \begin{itemize}
% \item \carlos{} Decide whether to use dependency or \textbf{dependence} (I suspect the plural form would also change: dependencies vs. dependences). DONE
% \item \carlos{} Check whether there are spaces before/after the em-dashes. Josep has checked and there are none. DONE
% \item \carlos{} Change the slicing criterion from (line n, variable x) to $\langle n, x \rangle$. DONE
% \item \carlos{} Figure, Example, Definition always uppercase. DONE
% \item \carlos{} w.r.t. always without spaces. DONE
% \item \carlos{} Do not use the same number for different methods.
% \item \carlos{} Use et al. for multiple authors.
% \item \carlos{} All instructions are statements, even \texttt{catch}, \texttt{try} and \texttt{finally}.
% \item \carlos{} i.e. (that is), e.g. (for example)
% \end{itemize}
\bibliographystyle{plain}
\bibliography{../../../../../../Biblio/biblio.bib}