Chapter 2 half-reviewed

This commit is contained in:
Sergio Pérez 2019-12-03 18:57:40 +00:00
parent 2547aa7220
commit fbd0f679b5


@ -6,16 +6,19 @@
\label{cha:background}
\section{Program slicing}
\textsl{Program slicing} \cite{Wei81,Sil12} is a debugging technique that
\textsl{Program slicing} \cite{Wei81,Sil12}\sergio{is there any reason why \cite{Sil12} is not cited in the intro? The only citation there is \cite{Wei81}. I propose removing \cite{Sil12} for consistency} is a debugging technique that
answers the question: ``which parts of a program affect a given statement and
set of variables?'' The statement and the variables are the basic input to create a slice
and are called the \textsl{slicing criterion}. The criterion can be more
complex, as different slicing techniques may require additional pieces of input.
The \textsl{slice} of a program is the list of statements from the original
program ---which constitutes a valid program--- whose execution will result in
the same values for the variables (selected in the slicing criterion).
the same values for the variables (selected in the slicing criterion).
There exist two fundamental dimensions along which the problem of slicing can be
proposed \cite{Sil12}:
\sergio{My proposal is to move the naive concept from here to the intro so that readers can follow the example, and here either refer back to that earlier definition or introduce the slicing dimensions directly with a short preamble. A strong reason for defining it there is that we use the word slice all the time and then, after using it for a while, we define it.}
\begin{itemize}
\item \textsl{Static} or \textsl{dynamic}: slicing can be performed
statically or dynamically.
@ -26,17 +29,17 @@ proposed \cite{Sil12}:
expanded to include a position in the log that corresponds to one
instance of the selected statement, making it much more specific. It may
help find a bug related to nondeterministic behavior (such as a random
or pseudo-random number generator), but must be recomputed for each case
or pseudo-random number generator), but \sergio{, despite selecting the same slicing criterion, the slice }must be recomputed for each case\sergio{different input value/execution considered?}
being analyzed.
\item \textsl{Backward} or \textsl{forward}: \textsl{backward slicing}
\cite{Wei81} is generally more used, because it looks at the statements
\cite{Wei81} is generally more used \sergio{we should say what it is before saying that it is used more, shouldn't we? Change the order and rewrite this sentence. We say what it is and then that it is the one generally studied, or something like that}, because it looks at the statements
that affect the slicing criterion. In contrast, \textsl{forward slicing}
\cite{BerC85} computes the statements that are affected by the slicing
criterion. There also exists a mixed approach called \textsl{chopping}
\cite{JacR94}, which is used to find all statements that affect some variables in the slicing criterion and, at the same time, are affected by some other variables in the slicing criterion.
\end{itemize}
Since the definition of program slicing, the most extended form of slicing has
Since the definition of program slicing\sergio{Since Weiser defined program slicing in 1981}, the most \deleted{extended form}\added{studied configuration?} of slicing has
been \textsl{static backward slicing}, which obtains the list of statements that
affect the value of a variable in a given statement, in all possible executions
of the program (i.e., for any input data).
@ -44,17 +47,18 @@ of the program (i.e., for any input data).
\label{def:strong-slice}
\carlos{One of the citations is the correct one.}
Given a program $P$ and a slicing criterion $C = \langle s,v \rangle$, where
$s$ is a statement and $v$ is a set of variables in $P$ (the variables may
$s$ is a statement and $v$ is a set\sergio{aren't sets denoted with capital letters?} of variables in $P$ (the variables may
or may not be used in $s$), $S$ is the \textsl{strong slice} of $P$ with
respect to $C$ if $S$ has the following properties:
respect to $C$ if $S$ has\sergio{fulfils?} the following properties:
\begin{enumerate}
\item $S$ is an executable program.
\item $S \subseteq P$, or $S$ is the result of removing code from $P$.
\item $S \subseteq P$, or $S$ is the result of removing code\sergio{code, or 0 or more statements?} from $P$.
\item For any input $I$, the values produced on each execution of $s$
for each of the variables in $v$ are the same when executing $S$ as
when executing $P$. \label{enum:exact-output}
\end{enumerate}
\end{definition}
\sergio{Didn't this definition also require ending with the same error in case the execution does not terminate? If so, consider adding something about it.}
\begin{definition}[Weak static backward slice \cite{RepY89}]
\label{def:weak-slice}
@ -62,10 +66,10 @@ of the program (i.e., for any input data).
Given a program $P$ and a slicing criterion $C = \langle s,v \rangle$, where
$s$ is a statement and $v$ is a set of variables in $P$ (the variables may
or may not be used in $s$), $S$ is the \textsl{weak slice} of $P$ with
respect to $C$ if $S$ has the following properties:
respect to $C$ if $S$ has\sergio{fulfils?} the following properties:
\begin{enumerate}
\item $S$ is an executable program.
\item $S \subseteq P$, or $S$ is the result of removing code from $P$.
\item $S \subseteq P$, or $S$ is the result of removing code from $P$. \sergio{idem}
\item For any input $I$, the values produced on each execution of $s$
for each of the variables in $v$ when executing $P$ form a prefix of
those produced while executing $S$ ---which means that the slice
@ -74,73 +78,76 @@ of the program (i.e., for any input data).
\end{enumerate}
\end{definition}
\sergio{$\forall~i~\in~I, v\in~V~\rightarrow~seq(i,v,P)~Pref~seq(i,v,S)$ where $seq(i,a,A)$ represents the sequence of values obtained for $a$ when executing input $i$ in program $A$. $I$ is the set of all possible inputs for $P$. I think it would go along those lines.}
Both definitions (\ref{def:strong-slice} and~\ref{def:weak-slice}) are
used throughout the literature (see, e.g., \cite{pending}\carlos{Which citation? Most papers on exception slicing do not indicate or hint whether they use strong or weak.}), with some cases favoring the first and some the
used throughout the literature (see, e.g., \cite{pending}\carlos{Which citation? Most papers on exception slicing do not indicate or hint whether they use strong or weak.}\sergio{Josep?}), with some cases \deleted{favoring}\added{favouring} the first and some the
second. Though the definitions come from the corresponding citations, the naming
was first used in a control dependency analysis by Danicic~\cite{DanBHHKL11},
where slices that produce the same output as the original are named
\textsl{strong}, and those where the original is a prefix of the slice,
\textsl{weak}. Weak slicing tends to be preferred ---especially for debugging--- for two reasons: the algorithm can be simpler and avoid dealing with termination, and the slices can be smaller, narrowing the focus of the debugger. For some applications, strong slices are preferred, such as extracting a feature from a program, where there is a requirement that the resulting slice behave exactly like the original. In this paper we will indicate which kind of slice is produced with each new technique proposed.
\textsl{weak}. Weak slicing tends to be preferred ---especially for debugging--- for two reasons: the algorithm can be simpler and avoid dealing with termination, and the slices can be smaller, narrowing the focus of the debugger. For some applications, \deleted{strong slices are preferred,} such as extracting a feature from a program, where there is a requirement that the resulting slice behave exactly like the original\added{, strong slices are preferred}. In this paper\sergio{??} we will indicate which kind of slice is produced with each new technique proposed. \sergio{Do we ever generate strong ones? Damn, we are such aces xD}
\begin{example}[Strong, weak and incorrect slices]
\carlos{The table is labeled execution logs of... but the execution log is a different thing.}
In table~\ref{tab:slice-weak} we can observe examples for the various
definitions. Each row shows the values produced by the execution of a
definitions. Each row shows the values \sergio{for a specific variable $v$ in the slicing criterion,} produced by \deleted{the}\added{a particular} execution of \deleted{a}\sergio{the original}
program or one of its slices.
The first is the original, which computes $3!$.
Slice A's execution log is identical to the original and therefore it is a strong slice.
Slice B is a weak slice: its execution correctly produces the same values as the original program, but it continues producing values after the original stops.
Slice C is incorrect, as the values differ from the original.
Some data or control dependency has not been included in the slice and the program produces different results; in this case the slice computes Fibonacci numbers instead of factorials.
The first \added{row stands for}\deleted{is} the original \added{program}, which computes $3!$.
Slice A's \deleted{execution log}\added{generated sequence of values} is identical to the original and therefore it is a strong slice.
Slice B is a weak slice: its execution correctly produces the same \added{sequence of }values as the original program, but it continues producing values after the original stops.
Slice C is incorrect, as the \added{generated sequence of} values differ\added{s} from the \added{sequence generated by the }original \added{program}.
\sergio{Taking a closer look, one could think that }Some data or control dependency has not been included in the slice and the program produces different results; in this case the slice computes Fibonacci numbers instead of factorials.\sergio{This does not seem very relevant; consider removing it so as not to confuse things with Fibonacci.}
\end{example}
\begin{table}
\centering
\label{tab:slice-weak}
\begin{tabular}{r | r | r | r | r | r }
Iteration & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4} & \textbf{5} \\ \hline
\deleted{Iteration}\added{Evaluation Number} & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4} & \textbf{5} \\ \hline
Original & 1 & 2 & 6 & - & - \\ \hline
Slice A & 1 & 2 & 6 & - & - \\ \hline
Slice B & 1 & 2 & 6 & 24 & 120 \\ \hline
Slice C & 1 & 1 & 3 & 5 & 8 \\
\end{tabular}
\caption{Execution logs of different slices and their original program.}
\caption{\deleted{Execution logs of different slices and their original program.}\added{Sequence of values obtained for a certain variable of the original program and three different slices A, B and C for a particular input.}}
\end{table}
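As a worked check of table~\ref{tab:slice-weak} against definitions~\ref{def:strong-slice} and~\ref{def:weak-slice}, the comparison can be written out explicitly. The notation is an assumption made here for illustration only: $seq(X)$ denotes the sequence of values that program $X$ produces in the table, $P$ is the original program, and $\sqsubseteq$ reads ``is a prefix of''.
\[
\begin{array}{lll}
seq(P) = \langle 1, 2, 6 \rangle & & \\
seq(A) = \langle 1, 2, 6 \rangle & seq(P) = seq(A) & \mbox{strong slice} \\
seq(B) = \langle 1, 2, 6, 24, 120 \rangle & seq(P) \sqsubseteq seq(B),\ seq(P) \neq seq(B) & \mbox{weak slice} \\
seq(C) = \langle 1, 1, 3, 5, 8 \rangle & seq(P) \not\sqsubseteq seq(C) & \mbox{incorrect}
\end{array}
\]
Slice C fails at the second position, where the original produces $2$ and the slice produces $1$. Since every sequence is a prefix of itself, slice A also satisfies the weak condition. The table covers a single input, so this only checks the conditions for that input; the definitions require them to hold for every input $I$.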
Program slicing is a language--agnostic tool, but the original proposal by
Program slicing is a language--agnostic tool\sergio{is program slicing a tool or a technique?}, but the original proposal by
Weiser~\cite{Wei81} covered a simple imperative programming language.
Since then, the literature has been expanded by dozens of authors, who have
described and implemented slicing for more complex structures, such as
uncontrolled control flow~\cite{HorwitzRB88}, global variables~\cite{???},
exception handling~\cite{AllH03}; and for other programming paradigms, such as
object--oriented languages~\cite{???} or functional languages~\cite{???}.
\carlos{More can be added; the corresponding citations are missing.}
\carlos{More can be added; the corresponding citations are missing.}\sergio{Cool, we need to find them and add them. The bibliography looks short to me given all the papers out there; I think that once everything is in, there should be around 30, if not more.}
\subsection{The System Dependence Graph (SDG)}
There exist multiple approaches to compute a slice from a given program and
There exist multiple approaches to compute a slice\sergio{this sounds odd to me; I would say program representations or data structures that allow the use of program slicing techniques, or something like that. To be discussed} from a given program and
slicing criterion, but the most efficient and broadly used data structure is the System
Dependence Graph (SDG), first introduced by Horwitz, Reps and
Binkley~\cite{HorwitzRB88}. It is computed from the program's statements, and
once built, a slicing criterion is chosen, the graph traversed using a specific
algorithm, and the slice obtained. Its efficiency resides in the fact that for
multiple slices that share the same program, the graph must only be built once.
On top of that, building the graph has a complexity of $\mathcal{O}(n^2)$ \carlos{should I use $\mathcal{O}$ or $O$?} with
respect to the number of statements in a program, but the traversal is linear
Binkley \sergio{in 1988}\sergio{All the authors, or do we cite them with et al.? I ask so that we follow the same rule throughout the document}~\cite{HorwitzRB88}. It is computed from the program's statements\sergio{source code}, and
once built, a slicing criterion is chosen, the graph \added{is} traversed using a specific
algorithm, and the slice \added{is} obtained. Its efficiency resides in the fact that\added{,} for
multiple slices \deleted{that share}\added{calculated for} the same program, the graph \deleted{must only be built}\added{generation process is only performed} once.
On top of that, building the graph has a complexity of $\mathcal{O}(n^2)$ \carlos{should I use $\mathcal{O}$ or $O$?}\sergio{Josep?} with
respect to the number of statements in \deleted{a}\added{the} program, but the traversal is linear
with respect to the number of nodes in the graph (each corresponding to a
statement).
statement) \sergio{footnote?}.
The SDG is a directed graph, and as such it has vertices or nodes, each
representing an instruction in the program ---barring some auxiliary nodes
representing an \deleted{instruction}\added{statement} in the program ---barring some auxiliary nodes
introduced by some approaches--- and directed edges, which represent the
dependencies among nodes. Those edges represent various kinds of dependencies
---control, data, calls, parameter passing, summary--- which will be defined in
dependencies among nodes. Those edges represent various\sergio{several} kinds of dependencies
---control, data, calls, parameter passing, summary--- which will be defined\sergio{further explained?} in
section~\ref{sec:first-def-sdg}.
To create the SDG, first a \textsl{control flow graph} (CFG) is built for each method
in the program, then its control and data dependencies are computed, resulting
in the \textsl{program dependence graph} (PDG). Finally, all the graphs from every
method are joined into the SDG. This process will be explained at greater
To create the SDG, first \deleted{a}\added{the corresponding} \textsl{control flow graph} (CFG) is built for each method
in the program, then\added{,} its \added{associated }control and data dependencies are computed, resulting
in \added{a new graph representation known as }the \textsl{program dependence graph} (PDG)\sergio{cita??}. Finally, all the graphs from every
method are joined \added{by the appearance of a new kind of inter-procedural arcs, the argument-in argument-out arcs that link function definitions with function calls, obtaining}\deleted{into} the \added{final} SDG. This process will be explained at greater
length in section~\ref{sec:first-def-sdg}.
%TODO: marked for removal --- this process is repeated later in ref{sec:first-deg-sdg}
%\begin{description}
@ -164,10 +171,10 @@ lengths in section~\ref{sec:first-def-sdg}.
%method, and an extra type of edge exists: \textsl{summary edges}, which
%summarize the data dependencies between input and output variables.
%\end{description}
An example is provided in figure~\ref{fig:basic-graphs}, where a simple
multiplication program is converted to CFG, then PDG and finally SDG. For
simplicity, only the CFG and PDG of \texttt{main} are omitted. Control
dependencies are black, data dependencies red, and summary edges blue.
An example \added{of how an initial CFG is augmented and enhanced with all mentioned dependencies obtaining the corresponding PDG and the final SDG} is provided in figure~\ref{fig:basic-graphs}, where \deleted{a }\added{the process is illustrated for a} simple
multiplication program\deleted{ is converted to CFG, then PDG and finally SDG}. For
simplicity, only the CFG and PDG of \texttt{main} are omitted\sergio{I don't understand this bit about main. Where is main?}. Control
dependencies are \added{represented with }black \added{arcs}, data dependencies \added{with} red \added{arcs}, and summary edges \added{are depicted with }blue \added{arcs}.
\begin{figure}
\centering