tfm-report/Secciones/problem_solution.tex

299 lines
22 KiB
TeX
Raw Normal View History

2019-12-04 16:53:32 +01:00
% !TEX encoding = UTF-8
% !TEX spellcheck = en_GB
% !TEX root = ../paper.tex
\chapter{Proposed solution}
\label{cha:solution}
2019-12-07 23:47:45 +01:00
\josep{Antes de nada, felicidades Carlos. En esta sección se ha notado una mejora importante. Sobretodo al introudcir los problemas, los ejemplos, etc. Sigue así!}
2019-12-07 23:47:45 +01:00
\josep{This chapter features different problems and weaknesses of the current treatment that program slicing techniques use in presence of exceptions. Each problem is described with a counterexample that illustrates the loss of completeness or precision. Finally, for each problem a solution is proposed.}
2019-12-07 23:47:45 +01:00
\josep{With regards to the problems, }Even though the current state of the art considers exception handling, their treatment is not perfect. The mistakes made by program slicers can be classified in two: \josep{(1)} those that lower the completeness and \josep{(2)} those that lower the correctness. \josep{Remarco el 1 y el 2 porque los referencias mas adelante, lejos, y así se sabe que las referencias vienen aqui.}
2019-12-07 23:47:45 +01:00
The first kind is the most important one, as the resulting slices may be incorrect \josep{(i.e., the behaviour of the slice is different from the behaviour of the original program)\deleted{---as in produce different values than the original program---}} making them invalid for some uses of program slicing. \josep{A good example of the effects that these wrong slices may produce happens when they are used for program debugging, but the the error that we want to debug does not appear anymore, or even the slicing criterion cannot be reached due to an uncaught exception. \deleted{As an example, imagine a slice used for program debugging which does not reach the slicing criterion due to an uncaught exception.}}
The second kind is less \josep{critic\deleted{important}}, but still \josep{important because a wrong treatment of exceptions can cause the inclusion of wrong dependencies in the slice, thus producing unnecessary long slices that may turn to be useless for some applications\deleted{useful to address, as the smaller a slice is, the easier it is to use it}}.
\deleted{The rest of this chapter features different errors found in the state of the art, each with a detailed description, example, and proposals that solve them.}
\section{Unconditional jump handling}
2019-12-07 23:47:45 +01:00
The standard treatment of unconditional jumps as pseudo-statements introduces two separate correctness errors: \josep{\emph{the subsumption correctness error}, which is relevant in the context of both strong and weak slicing, and the \emph{structure-exiting jump}}, \josep{which\deleted{that}} is only relevant in the context of weak slicing.
2019-12-07 23:47:45 +01:00
\subsection{\josep{Problem 1: }Subsumption correctness error}
This problem has been known since the seminal publication on slicing unconditional jumps~\cite{BalH93}: chapter 4 details an example where the slice is bigger than it needs to be, and leave the solution of that problem as an open question to be solved in future publications. A similar example ---with \texttt{break} statements instead of \texttt{goto}--- is shown in example~\ref{exa:problem-break-sub}.
\begin{example}[Example of unconditional jump subsumption~\cite{BalH93}]
\label{exa:problem-break-sub}
Consider the code shown in the left side of figure~\ref{fig:problem-break-sub}. It is a simple Java method containing a \texttt{while} statement, from which the execution may exit naturally or through any of the \texttt{break} statements (lines 6 and 9). For the rest of statements and expressions, uppercase letters are used; and no data dependencies are considered, as they are not relevant to the problem at hand.
\begin{figure}[h]
\begin{minipage}{0.33\linewidth}
\begin{lstlisting}
public void f() {
while (X) {
if (Y) {
if (Z) {
A;
break;
}
B;
break;
}
C;
}
D;
}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.33\linewidth}
\begin{lstlisting}
public void f() {
while (X) {
if (Y) {
if (Z) {
break;
}
break;
}
C;
}
}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.33\linewidth}
\begin{lstlisting}
public void f() {
while (X) {
if (Y) {
break;
}
C;
}
}
\end{lstlisting}
\end{minipage}
\caption{A program (left), its computed slice (centre) and the smallest complete slice (right).}
\label{fig:problem-break-sub}
\end{figure}
2019-12-07 23:47:45 +01:00
Now consider statement \texttt{C} (line 11) as the slicing criterion. Figure~\ref{fig:problem-break-sub-sdg} displays the SDG produced for the program, and the nodes selected by the slice. Figure~\ref{fig:problem-break-sub} displays the computed slice on the centre, and the \josep{minimal slice\deleted{smallest slice possible}} on the left\josep{en realidad hay otro minimal slice si dejamos el otro break y quitamos el que hemos dejado}. The inner \texttt{break} on line \josep{6\deleted{9}} and the \texttt{if} surrounding it (line \josep{4\deleted{7}}) have been unnecessarily included. Their inclusion would not be specially problematic, if it were not for the condition of the \texttt{if} statement \josep{(line 4)}, which may include extra data dependencies \josep{that are unnecessary in the slice and that may led to include other unnecessary statements, making the slice even more imprecise\deleted{, whose only task is to control line 3}}.
2019-12-07 23:47:45 +01:00
Line 6 is not useful, because whether or not it executes, the execution will continue on line 13 (after the \texttt{while}), as guaranteed by line 9, which is not guarded by any condition. Note that \texttt{B} is still control-dependent on line \josep{6\deleted{5}}, as it has a direct effect on it, \josep{no termino de entender esta frase}but the dependence from line 5 to line 9 introduces useless statements into the slice.
\begin{figure}[h]
\centering
\includegraphics[width=0.5\linewidth]{img/problem-break-sub-graph}
2019-12-07 23:47:45 +01:00
\caption{The system dependence graph for the program of figure \ref{fig:problem-break-sub}, with the slice marked in grey, and the slicing criterion in bold.\josep{En las condiciones pones O,P,Q en lugar de X,Y,Z}}
\label{fig:problem-break-sub-sdg}
\end{figure}
\end{example}
2019-12-07 23:47:45 +01:00
The problem showcased in example~\ref{exa:problem-break-sub} can be generalized for any pair of unconditional jump statements that are nested and whose destination is the same. Formally, \josep{lo que sigue es bastante lioso. Yo crearia un entorno "problem" (como el de definition o example) y pondria el problema descrito formalmente en ese entorno. Despues, lo aclararia con una breve explicacion similar a la que hay entremezclada con la definicion formal}if a program $P$ contains a pair of unconditional jumps without any data (e.g. \texttt{goto label}, \texttt{continue [label]}, \texttt{break [label]}, \texttt{return}) $j_A$ and $j_B$ whose destinations (the instruction that will be executed after them) are $A$ and $B$, then $j_B$ is superfluous in the slice if and only if $A = B$ and $j_B$ is inside a conditional instruction $C$, and $j_A$ follows $C$ (not necessarily immediately). \carlos{Buscar mejor descripcion para la estructura ``nested''.} \carlos{Maybe use control dependencies between them.} Once $j_B$ is included, $C$ will also be included, and so will all of its data dependencies.
2019-12-07 23:47:45 +01:00
\subsubsection*{\josep{\deleted{Proposal}A solution for the subsumption correctness error}}
As only the minimum amount of control edges are inserted into the PDG (according to definition~\ref{def:pdg}), the only edge that can be traverse to include the inner jump ($j_B$) is an edge $j_B \ctrldep j_A$. An exception can be included when generating the PDG, such that control edges between two unconditional jumps $j_X$ and $j_Y$ whose destinations are $X$ and $Y$ will not be included if $X = Y$.
If the edge is not present, all inner unconditional jumps and their containing structures will be excluded from the slice, unless they are included for another reason.
2019-12-07 23:47:45 +01:00
\josep{pon a continuacion un ejemplo solucionando el problema (al menos di como quedaría el SDG)}
\subsection{\josep{Problem 2: }Unnecessary instructions in weak slicing}
2019-12-08 03:18:07 +01:00
\josep{Esta frase esta mal construida}In the context of weak slicing, as it is not necessary to behave exactly like the original program. This means that some statements may be removed, even if it means that a loop will become infinite, or an exception will not be caught. The following example describes a specific example which is generalized later in this section.
\begin{example}[Unnecessary unconditional jumps]
\label{exa:problem-break-weak}
Consider the code for method \texttt{g} on figure~\ref{fig:problem-break-weak-code}, which features a simple loop with a \texttt{break} statement within. The slice in the middle has been created with respect to the criterion (line 6, variable \texttt{x}), and includes everything except the print statement. This seems correct, as the presence of lines 4 and 5 determine the number of times line 6 is executed.
2019-12-08 03:18:07 +01:00
However, if \josep{one considers\deleted{you consider}} weak slicing, instead of strong slicing; the loop's termination stops mattering, lines 4 and 5 are no longer relevant. Without them, the slices produce\josep{\deleted{s}} an infinite list \josep{of} natural numbers (0, 1, 2, 3, 4, 5...), but as that is a prefix \josep{suena raro que una lista infinita sea un prefijo de 0-9, mas bien es al reves}of the original program ---which outputs the numbers 0 to 9--- the program is still a valid slice (pictured on figure~\ref{fig:problem-break-weak-code}'s right side).
Note that the removal of lines 4 and 5 is only possible if there are no statements in the slice after the \texttt{while} statement. If the slicing criterion is line 8, variable \texttt{x}, lines 4 and 5 are required to print the value, as without them, the program would loop indefinitely and never execute line 8.
\begin{figure}[h]
\begin{minipage}{0.33\linewidth}
\begin{lstlisting}
void g() {
int x = 0;
while (x > 0) {
if (x > 10)
break;
x++;
}
System.out.println(x);
}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.33\linewidth}
\begin{lstlisting}
void g() {
int x = 0;
while (x > 0) {
if (x > 10)
break;
x++;
}
}
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.33\linewidth}
\begin{lstlisting}
void g() {
int x = 0;
while (x > 0) {
x++;
}
}
\end{lstlisting}
\end{minipage}
\caption{A simple loop with a break statement (left), its computed slice (middle) with respect to line 5, variable \texttt{x}, and the smallest weak slice (right) for the same slicing criterion.}
\label{fig:problem-break-weak-code}
\end{figure}
\end{example}
If we try to generalize this problem, it becomes apparent that instructions that jump backwards (e.g., \texttt{continue}) present a problem, as they may add executions in the middle, not at the end (where they can be disregarded in weak slicing). Therefore, not only has the jump to go forwards, but no instruction can be performed after the jump.
Therefore, a forward jump $j$ (e.g., \texttt{break}, \texttt{return [value]}, \texttt{throw [value]}) whose destination is $X$ is not necessary in a slice $S$ if and only there is no statement $s \in S$ which is after $X$, meaning that there is a path from $X$ to $s$ in the CFG.
As with the previous error, the problem is not the inclusion of the jump and its controlling conditional instruction, but the inclusion of the data dependencies of the condition guarding the execution of the jump.
2019-12-08 03:18:07 +01:00
\subsubsection*{\josep{\deleted{Proposal}A solution for the unnecessary instructions in weak slicing}}
2019-12-08 03:18:07 +01:00
This problem cannot be easily solved, as it is a ``dynamic'' one, requiring information about the completed slice before allowing the removal of unconditional jumps and their dependencies. This means that the cost of this proposal \josep{cannot\deleted{can not}} be offloaded to the creation of the SDG as with the previous one.
2019-12-08 03:18:07 +01:00
\josep{frase incorrecta}Our proposal revolves around temporarily remove edges from the SDG: given an SDG of the form \josep{En la definicion de SDG salia esta sextupla?}$G = \langle N, E_c, E_d, E_{in}, E_{out}, E_{fc} \rangle$, remove from $E_c$ any edge of the form $x \ctrldep y | x, y \in N$, where $x$ is an unconditional forward jump; perform the slice normally; and then ---if there is any statement after the destination of $x$ in the slice--- restore the edges removed in the first step and recompute the slice. The slice would still be linear, because each node would be visited at most once; but the algorithm has a higher complexity, and the removal and restoration of the control edges has a cost; albeit small.
\josep{pon a continuacion un ejemplo solucionando el problema}
\section{The \texttt{try-catch} statement}
\subsection{The problem with the control dependency's definition}
\section{Previous text}
\carlos{Here begins the old text.}
\hrulefill
2019-12-04 16:53:32 +01:00
This solution is an extension of Allen's \cite{AllH03}, with some modifications to solve the problem found \josep{el problem found no ha quedado claro. Se ha diluido entre la maraña abrumadora de casos. debes formular y dejar nitido cristalino cual es el problema y por qué no lo solucinan las dsemás aproximaciones, y poner un ejempllo concreto.}. Before starting, we need to split all instructions in three categories:
\begin{description}
\item[statement] non-branching instruction, e.g. an assignment or method call.
\item[predicate] conditional branch, e.g. if statements and loops.
\item[pseudo-predicate] unconditional jump, e.g. break, continue, return, goto and throw instructions.
\end{description}
Pseudo-predicates have been previously use to model unconditional jumps with a counter-intuitive reasoning: the next statement that would be executed were the pseudo-predicate not there would be executed, therefore it is control dependent on it. Going back to the definition of control dependency, one could argue that the real control dependency is on the conditional branch that lead to the \josep{???}
\begin{figure}
\centering
\begin{lstlisting}
if (a) {
return a;
}
print(a);
\end{lstlisting}
\begin{lstlisting}
if (a) {
}
print(a);
\end{lstlisting}
\caption{Example of pseudo-predicates control dependencies \josep{no se referencia a esta figura desde ningún sitio}}
\end{figure}
This is the process used to build the Program Dependence Graph. \josep{Todo lo que sigue es demasiado verbose. No hay definiciones concretas. Es todo muy informal, y no hay un ongoing example que permita ver como las fases van evolucionando paso a paso. Nos reunimos para hablar de esta sección antes de reescribirla...}
\begin{description}
\item[Step 1 (static analysis):] Identify for each instruction the variables read and defined. Each method is annotated with the global variables that they access or modify.
\item[Step 2 (build CFGs):] Build a CFG for each method of the program. The start of all methods is a vertex labeled \textsl{enter}, which also contains the assignments for parameters and global variables used (\texttt{var = var\_in}). The \textsl{enter} node is connected to the first instruction of the method. In a similar fashion, all methods end in an \textsl{exit} vertex with the corresponding output variables. There exists one \textsl{normal exit} to which the last instruction and all return instructions are connected. If the method can throw any exceptions, there exists one \textsl{error exit} for each type of exception that may be thrown. The normal and erroneous exits are connected to the \textsl{exit} node.
Every normal statement is connected to the subsequent one by an unlabeled edge. Predicates have two outgoing edges, labeled \textsl{true} and \textsl{false}. Pseudo-predicates also have two outgoing edges. The \textsl{true} edge is connected to the destination of the jump (\textsl{normal exit} in the case of return, the begin or end of the loop in the case of continue and break, etc.). The \textsl{false} edge is a non-executable edge, marked with a dashed line, and it is connected to the next instruction that would be executed if the pseudo-predicate was a \textsl{nop}.
Nodes that represent a call to a method $M$ include the transfer of parameters and variables that may be read or written to, then execute the call, and finally the extraction of modified variables. Call nodes are an exception to the previous paragraph, as they can have an unlimited amount of outgoing edges. Each outgoing edge lands on a pseudo-predicate which indicates if the execution was correct or an exception was raised. The executable edge of each pseudo-predicate will lead to the next instruction to be executed, whereas the non-executable one will lead to the end of the try-catch block. All call nodes can lead to a \textsl{normal return} node, which is linked to the next instruction, and one error node for each type of exception that may be thrown. The erroneous returns are labeled \textsl{catch ExType}, and lead to the first instruction in the corresponding catch block\footnotemark. Any exception that may not be caught will lead to the erroneous exit node of the method it's in. See the example for more details.
\footnotetext{A problem presents itself here, as some exceptions may be able to trigger different catch blocks, due to the secuential nature of catches and polymorphism in Java. A way to fix this is to make catch blocks behave as a switch.}. %TODO
\item[Step 3 (compute dependences):] For each node in the CFG, compute the control and data dependencies. Non-executable edges are only included when computing control dependencies.\\
\carlos{put inside definition}
A node $a$ is \textsl{control dependent} on node $b$ iff $a$ post-dominates one but not all of $b$'s successors.\\
A node $a$ is \textsl{data dependent} on node $b$ iff $b$ defines or may define a variable $x$, $a$ uses or may use $x$, and there is an $x$-definition-free path in the CFG from $b$ to $a$.\\
\item[Step 4 (convert each CFG into a PDG):] each node of the CFG is one node of the PDG, with two exceptions. The first are the \textsl{enter}, \textsl{exit} and method call nodes, where the variable input and output assignments are split and placed as control-dependent on their original node. The second is the \textsl{exit} node, which is to be removed (the control-dependencies from \textsl{exit} to the variable outputs is transferred to the \textsl{enter} node). Then all the dependencies computed in the previous step are drawn.
\item[Step 5 (connect PDGs to form a SDG):] each method call to $M$ must be connected to the \textsl{enter} node in $M$'s PDG, as a control dependence. Each variable input from the method call is connected to a variable input of the method definition via a data dependence. Each variable output from the method definition is connected to the variable output of the method call via a data dependence. Each method exit is connected \carlos{complete}.
\end{description}
\begin{itemize}
\item An extra type of control dependency represented by an ``exception edge''. It will represent the need to include a \textsl{catch} clause when an exception can be thrown. It is represented with a dotted line (dashed line is for data dependency). These edges have a special characteristic: when one is traversed, only ``exception edges'' may be traversed from the new nodes included in the slice. If the same node is reached by another kind of edge, the restriction is lifted. The behavior is documented in algorithm \ref{alg:2pass}, with changes from the original algorithm are \underline{underlined}.
\item Add an extra ``exception edge'' from each ``exit with exception of type T'' node, where the type of the exception is \texttt{t} to all the corresponding ``\texttt{throw e}'', such that \texttt{e} is or inherits from \texttt{T}.
\item Add an extra ``exception edge'' from each catch statement to every statement that can throw that error.
\item The exception edges will only be placed when the method or the try-catch statement are loop-carrier\footnote{Loop-carrier, when referring to a statement, is the property that in a CFG for the complete program, the node representing the statement is part of a loop, meaning that it could be executed again once it is executed.}.
\end{itemize}
\begin{algorithm} % generate slice
\caption{Two-pass algorithm to obtain a backward static slice with exceptions}
\label{alg:2pass}
\begin{algorithmic}[1]
\REQUIRE SDG $\mathcal{G}$ representing program P. $\mathcal{G} = \{\mathcal{S}, \mathcal{E}\}$, where $\mathcal{S}$ is a set of states (some are statements) connected by a set of edges $\mathcal{E}$. Each edge, is a triplet composed of the type of edge (control, data or \underline{exception} dependency, summary, param-in, param-out), the source and destination of the edge.
\REQUIRE A slicing criterion, composed of a statement $s \in \mathcal{S}$ and a variable $v$.
\ENSURE $\mathcal{S}' \subseteq \mathcal{S}$, representing the slice of P according to the criterion provided.
\medskip
\COMMENT{First pass (do not traverse output parameter edges).}
\STATE{$\mathcal{S}' \Leftarrow \emptyset$ (slice), $\mathcal{Q}\Leftarrow\{s\}$ (queue), $\mathcal{S}\Leftarrow \mathcal{S} - \{s\}$ (not visited), $\mathcal{R}\Leftarrow \emptyset$ (only visited via exception edge)}
\WHILE{$\mathcal{Q} \neq \emptyset$}
\STATE{$a \in \mathcal{Q}$} \COMMENT{Select an element from $\mathcal{Q}$}
\STATE{$\mathcal{Q} \Leftarrow \mathcal{Q} - \{a\}$}
\STATE{$\mathcal{S}' \Leftarrow \mathcal{S}' + \{a\}$}
\FORALL{$\mathcal{A}$ in $\{\{type, origin, a\} \in \mathcal{E}\}$}
\IF{$type \neq$ param-out \AND ($origin \notin \mathcal{S}'$ \OR ($origin \in \mathcal{R}$ \AND $a \notin \mathcal{R}$))} \label{line:param-out}
\IF{\underline{$a \in \mathcal{R}$}}
\IF{\underline{$type =$ exception}}
\STATE{\underline{$\mathcal{Q} \Leftarrow \mathcal{Q} + \{origin\}$}}
\STATE{\underline{$\mathcal{R} \Leftarrow \mathcal{R} + \{origin\}$}}
\ENDIF
\ELSE
\STATE{$\mathcal{Q} \Leftarrow \mathcal{Q} + \{origin\}$}
\ENDIF
\ENDIF
\ENDFOR
\ENDWHILE
\\
\medskip
\COMMENT{Second pass (very similar, do not traverse input parameter edges).}
\STATE $\mathcal{Q} \Leftarrow \mathcal{S}'$
\WHILE{$\mathcal{Q} \neq \emptyset$}
\STATE{$a \in \mathcal{Q}$} \COMMENT{Select an element from $\mathcal{Q}$}
\STATE{$\mathcal{Q} \Leftarrow \mathcal{Q} - \{a\}$}
\STATE{$\mathcal{S}' \Leftarrow \mathcal{S}' + \{a\}$}
\FORALL{$\mathcal{A}$ in $\{\{type, origin, a\} \in \mathcal{E}\}$}
\IF{$type \neq$ param-in \AND ($origin \notin \mathcal{S}'$ \OR ($origin \in \mathcal{R}$ \AND $a \notin \mathcal{R}$))}
\IF{\underline{$a \in \mathcal{R}$}}
\IF{\underline{$type =$ exception}}
\STATE{\underline{$\mathcal{Q} \Leftarrow \mathcal{Q} + \{origin\}$}}
\STATE{\underline{$\mathcal{R} \Leftarrow \mathcal{R} + \{origin\}$}}
\ENDIF
\ELSE
\STATE{$\mathcal{Q} \Leftarrow \mathcal{Q} + \{origin\}$}
\ENDIF
\ENDIF
\ENDFOR
\ENDWHILE
\end{algorithmic}
\end{algorithm}
% vim: set noexpandtab:ts=2:sw=2:wrap