added solution proposals for unconditional jumps

2019-12-07 18:56:01 +00:00 · 2019-12-07 18:56:01 +00:00 · 8ebf72444e
commit 8ebf72444e
parent 6d747e8c29
3 changed files with 199 additions and 0 deletions
--- a/Secciones/problem_solution.tex
+++ b/Secciones/problem_solution.tex
@ -4,6 +4,183 @@
 \chapter{Proposed solution}
 \label{cha:solution}

+Even though the current state of the art considers exception handling, its treatment is not perfect. The mistakes made by program slicers can be classified into two kinds: those that lower completeness and those that lower correctness.
+
+The first kind is the more important one, as the resulting slices may behave incorrectly (that is, produce different values than the original program), making them invalid for some uses of program slicing. As an example, imagine a slice used for program debugging which does not reach the slicing criterion due to an uncaught exception.
+
+The second kind is less important, but still useful to address, as the smaller a slice is, the easier it is to use it.
+
+The rest of this chapter features different errors found in the state of the art, each with a detailed description, example, and proposals that solve them.
+
+\section{Unconditional jump handling}
+
+The standard treatment of unconditional jumps as pseudo-statements introduces two separate correctness errors: the subsumption correctness error and the structure-exiting jump, which is only relevant in the context of weak slicing.
+
+\subsection{Subsumption correctness error}
+
+This problem has been known since the seminal publication on slicing unconditional jumps~\cite{BalH93}: chapter 4 details an example where the slice is bigger than it needs to be, and leaves the solution of that problem as an open question for future publications. A similar example, with \texttt{break} statements instead of \texttt{goto}, is shown in example~\ref{exa:problem-break-sub}.
+
+\begin{example}[Example of unconditional jump subsumption~\cite{BalH93}]
+	\label{exa:problem-break-sub}
+	Consider the code shown on the left side of figure~\ref{fig:problem-break-sub}. It is a simple Java method containing a \texttt{while} statement, from which the execution may exit naturally or through either of the \texttt{break} statements (lines 6 and 9). The remaining statements and expressions are represented by uppercase letters, and no data dependencies are considered, as they are not relevant to the problem at hand.
+
+	\begin{figure}[h]
+		\begin{minipage}{0.33\linewidth}
+			\begin{lstlisting}
+public void f() {
+	while (X) {
+		if (Y) {
+			if (Z) {
+				A;
+				break;
+			}
+			B;
+			break;
+		}
+		C;
+	}
+	D;
+}
+
+			\end{lstlisting}
+		\end{minipage}
+		\begin{minipage}{0.33\linewidth}
+			\begin{lstlisting}
+public void f() {
+	while (X) {
+		if (Y) {
+			if (Z) {
+
+				break;
+			}
+
+			break;
+		}
+		C;
+	}
+
+}
+			\end{lstlisting}
+		\end{minipage}
+		\begin{minipage}{0.33\linewidth}
+			\begin{lstlisting}
+public void f() {
+	while (X) {
+		if (Y) {
+
+
+
+
+
+			break;
+		}
+		C;
+	}
+
+}
+			\end{lstlisting}
+		\end{minipage}
+		\caption{A program (left), its computed slice (centre) and the smallest complete slice (right).}
+		\label{fig:problem-break-sub}
+	\end{figure}
+
+	Now consider statement \texttt{C} (line 11) as the slicing criterion. Figure~\ref{fig:problem-break-sub-sdg} displays the SDG produced for the program, and the nodes selected by the slice. Figure~\ref{fig:problem-break-sub} displays the computed slice in the centre, and the smallest slice possible on the right. The inner \texttt{break} on line 6 and the \texttt{if} surrounding it (line 4) have been unnecessarily included. Their inclusion would not be especially problematic, if it were not for the condition of the \texttt{if} statement, which may bring in extra data dependencies whose only task is to control line 6.
+
+	Line 6 is not useful, because whether or not it executes, the execution will continue on line 13 (after the \texttt{while}), as guaranteed by line 9, which is not guarded by any condition inside the \texttt{if} on line 3. Note that \texttt{B} is still control-dependent on line 6, as the jump has a direct effect on it, but the dependence from line 6 to line 9 introduces useless statements into the slice.
+
+	\begin{figure}[h]
+		\centering
+		\includegraphics[width=0.5\linewidth]{img/problem-break-sub-graph}
+		\caption{The system dependence graph for the program of figure~\ref{fig:problem-break-sub}, with the slice marked in grey, and the slicing criterion in bold.}
+		\label{fig:problem-break-sub-sdg}
+	\end{figure}
+\end{example}
+
+The problem showcased in example~\ref{exa:problem-break-sub} can be generalized to any pair of unconditional jump statements that are nested and whose destination is the same. Formally, if a program $P$ contains a pair of unconditional jumps without any data (e.g. \texttt{goto label}, \texttt{continue [label]}, \texttt{break [label]}, \texttt{return}) $j_A$ and $j_B$ whose destinations (the instruction that will be executed after them) are $A$ and $B$, then $j_B$ is superfluous in the slice if and only if $A = B$, $j_B$ is inside a conditional instruction $C$, and $j_A$ follows $C$ (not necessarily immediately). \carlos{Find a better description for the ``nested'' structure.} \carlos{Maybe use control dependencies between them.} Once $j_B$ is included, $C$ will also be included, and so will all of its data dependencies.
+
+\subsubsection*{Proposal}
+
+As only the minimum number of control edges is inserted into the PDG (according to definition~\ref{def:pdg}), the only edge that can be traversed to include the inner jump ($j_B$) is an edge $j_B \ctrldep j_A$. An exception can be added to the generation of the PDG, such that a control edge between two unconditional jumps $j_X$ and $j_Y$, whose destinations are $X$ and $Y$, is not included if $X = Y$.
+
+If the edge is not present, all inner unconditional jumps and their containing structures will be excluded from the slice, unless they are included for another reason.
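+
+As a sketch, this exception can be expressed as a filter over the control edges. The notation is assumed here and is not part of definition~\ref{def:pdg}: $J$ is the set of nodes that represent unconditional jumps, $\mathit{dest}(j)$ is the destination of a jump $j \in J$, and $E_c$ is the set of control edges before applying the exception.
+\[
+	E_c' = E_c \setminus \{ j_X \ctrldep j_Y \mid j_X, j_Y \in J \wedge \mathit{dest}(j_X) = \mathit{dest}(j_Y) \}
+\]
+In example~\ref{exa:problem-break-sub}, both \texttt{break} statements (lines 6 and 9) have line 13 as their destination, so the control edge between them does not belong to $E_c'$.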
+
+\subsection{Unnecessary instructions in weak slicing}
+
+In the context of weak slicing, the slice is not required to behave exactly like the original program. This means that some statements may be removed, even if it means that a loop will become infinite, or an exception will not be caught. The following example presents a specific case, which is generalized later in this section.
+
+\begin{example}[Unnecessary unconditional jumps]
+	\label{exa:problem-break-weak}
+	Consider the code for method \texttt{g} in figure~\ref{fig:problem-break-weak-code}, which features a simple loop with a \texttt{break} statement within. The slice in the middle has been created with respect to the criterion (line 6, variable \texttt{x}), and includes everything except the print statement. This seems correct, as the presence of lines 4 and 5 determines the number of times line 6 is executed.
+
+	However, if weak slicing is considered instead of strong slicing, the loop's termination stops mattering and lines 4 and 5 are no longer relevant. Without them, the slice produces an infinite list of natural numbers (0, 1, 2, 3, 4, 5...), but as the sequence produced by the original program (the numbers 0 to 10) is a prefix of that list, the program is still a valid weak slice (pictured on the right side of figure~\ref{fig:problem-break-weak-code}).
+
+	Note that the removal of lines 4 and 5 is only possible if there are no statements in the slice after the \texttt{while} statement. If the slicing criterion is line 8, variable \texttt{x}, lines 4 and 5 are required to print the value, as without them, the program would loop indefinitely and never execute line 8.
+
+	\begin{figure}[h]
+		\begin{minipage}{0.33\linewidth}
+		\begin{lstlisting}
+void g() {
+	int x = 0;
+	while (x >= 0) {
+		if (x > 10)
+			break;
+		x++;
+	}
+	System.out.println(x);
+}
+		\end{lstlisting}
+		\end{minipage}
+		\begin{minipage}{0.33\linewidth}
+		\begin{lstlisting}
+void g() {
+	int x = 0;
+	while (x >= 0) {
+		if (x > 10)
+			break;
+		x++;
+	}
+
+}
+		\end{lstlisting}
+		\end{minipage}
+		\begin{minipage}{0.33\linewidth}
+		\begin{lstlisting}
+void g() {
+	int x = 0;
+	while (x >= 0) {
+
+
+		x++;
+	}
+
+}
+		\end{lstlisting}
+		\end{minipage}
+		\caption{A simple loop with a \texttt{break} statement (left), its computed slice (middle) with respect to line 6, variable \texttt{x}, and the smallest weak slice (right) for the same slicing criterion.}
+		\label{fig:problem-break-weak-code}
+	\end{figure}
+\end{example}
+
+If we try to generalize this problem, it becomes apparent that instructions that jump backwards (e.g., \texttt{continue}) present a problem, as they may add executions in the middle, not at the end (where they can be disregarded in weak slicing). Therefore, not only must the jump go forwards, but no instruction in the slice may be executed after it.
+
+Consequently, a forward jump $j$ (e.g., \texttt{break}, \texttt{return [value]}, \texttt{throw [value]}) whose destination is $X$ is not necessary in a slice $S$ if and only if there is no statement $s \in S$ that comes after $X$, where ``after'' means that there is a path from $X$ to $s$ in the CFG.
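+
+The condition can be written compactly. As a sketch, let $a \rightarrow^{*} b$ denote that the CFG contains a path from $a$ to $b$ (notation assumed here; a path of length zero is allowed, so that $X$ itself counts as being after $X$):
+\[
+	j \mbox{ is not necessary in } S \iff \neg \exists\, s \in S : X \rightarrow^{*} s
+\]
+In example~\ref{exa:problem-break-weak}, $j$ is the \texttt{break} on line 5 and $X$ is line 8. For the criterion on line 6, the slice contains no statement reachable from line 8, so the jump can be dropped; for the criterion on line 8, the slice contains $X$ itself, so it cannot.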
+
+As with the previous error, the problem is not the inclusion of the jump and its controlling conditional instruction, but the inclusion of the data dependencies of the condition guarding the execution of the jump.
+
+\subsubsection*{Proposal}
+
+This problem cannot be easily solved, as it is a ``dynamic'' one, requiring information about the completed slice before allowing the removal of unconditional jumps and their dependencies. This means that the cost of this proposal cannot be offloaded to the creation of the SDG, as with the previous one.
+
+Our proposal revolves around temporarily removing edges from the SDG: given an SDG of the form $G = \langle N, E_c, E_d, E_{in}, E_{out}, E_{fc} \rangle$, remove from $E_c$ any edge of the form $x \ctrldep y$ with $x, y \in N$, where $x$ is an unconditional forward jump; perform the slice normally; and then, if there is any statement after the destination of $x$ in the slice, restore the edges removed in the first step and recompute the slice. The traversal would still be linear, because each node would be visited at most once per pass; but the algorithm is more complex, and the removal and restoration of the control edges has a cost, albeit a small one.
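+
+The two passes can be written down as follows. This is only a sketch: $J_f \subseteq N$ (the unconditional forward jumps), $\mathit{dest}(x)$ (the destination of jump $x$), $\mathit{slice}(G, c)$ (the usual backward traversal of $G$ from the slicing criterion $c$) and $a \rightarrow^{*} b$ (a path, possibly of length zero, from $a$ to $b$ in the CFG) are notation assumed here, not defined elsewhere in this chapter.
+\[
+	\begin{array}{rcl}
+		E_c^{-} & = & E_c \setminus \{ x \ctrldep y \mid x \in J_f,\ y \in N \} \\
+		G^{-} & = & \langle N, E_c^{-}, E_d, E_{in}, E_{out}, E_{fc} \rangle \\
+		S^{-} & = & \mathit{slice}(G^{-}, c) \\
+		S & = & \left\{ \begin{array}{ll}
+			S^{-} & \mbox{if } \neg \exists\, x \in J_f,\ s \in S^{-} : \mathit{dest}(x) \rightarrow^{*} s \\
+			\mathit{slice}(G, c) & \mbox{otherwise}
+		\end{array} \right.
+	\end{array}
+\]
+This is one possible reading of the proposal, in which a single jump that fails the check restores every removed edge; restoring only the edges of the offending jumps would be a finer variant.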
+
+\section{The \texttt{try-catch} statement}
+
+\subsection{The problem with the control dependency's definition}
+
+\section{Previous text}
+\carlos{Here begins the old text.}
+\hrulefill
+
 This solution is an extension of Allen's \cite{AllH03}, with some modifications to solve the problem found \josep{the problem found has not been made clear. It has been diluted among the overwhelming tangle of cases. You must formulate the problem and make it crystal clear what it is and why the other approaches do not solve it, and give a concrete example.}. Before starting, we need to split all instructions in three categories:

 \begin{description}
--- a/img/problem-break-sub-graph.dot
+++ b/img/problem-break-sub-graph.dot
@ -0,0 +1,22 @@
+digraph pdf {
+    entry [label="enter f()",style=filled];
+	entry -> {while; D};
+	while [label="while (X)",style=filled];
+    C [style="bold,filled"]
+    "if (Y)" [style=filled]
+    "if (Z)" [style=filled]
+	while -> {"if (Y)" C};
+    break2 [style=filled]
+    break1 [style=filled]
+	"if (Y)" -> {"if (Z)"};
+	"if (Z)" -> {A break1};
+	break1 -> B;
+	break1 -> break2;
+	break2 -> {C while};
+	{rank=same; A break1 B break2}
+	{rank=same; "if (Z)" C}
+	{rank=same; while D}			
+	{edge [style=invis];
+		A -> break1 -> B -> break2;
+	}
+}
--- a/img/problem-break-sub-graph.pdf
+++ b/img/problem-break-sub-graph.pdf