% !TEX encoding = UTF-8
% !TEX spellcheck = en_GB
% !TEX root = paper.tex
\chapter{Proposed solution}
This solution is an extension of Allen's~\cite{AllH03}, with some modifications to solve the problem found. Before starting, we need to split all instructions into three categories:
\begin{description}
\item[statement] non-branching instruction, e.g. an assignment or method call.
\item[predicate] conditional branch, e.g. if statements and loops.
\item[pseudo-predicate] unconditional jump, e.g. break, continue, return, goto and throw instructions.
\end{description}
Pseudo-predicates have previously been used to model unconditional jumps with a counter-intuitive reasoning: if the pseudo-predicate were not there, the statement that follows it would be executed; therefore that statement is control dependent on the pseudo-predicate. Going back to the definition of control dependency, one could argue that the real control dependency is on the conditional branch that led to the
\begin{figure}
\centering
\begin{lstlisting}
if (a) {
  return a;
}
print(a);
\end{lstlisting}
\begin{lstlisting}
if (a) {
}
print(a);
\end{lstlisting}
\caption{Example of pseudo-predicates control dependencies}
\end{figure}
This is the process used to build the Program Dependence Graph.
\begin{description}
\item[Step 1 (static analysis):] Identify for each instruction the variables read and defined. Each method is annotated with the global variables that it accesses or modifies.
\item[Step 2 (build CFGs):] Build a CFG for each method of the program. The start of all methods is a vertex labeled \textsl{enter}, which also contains the assignments for parameters and global variables used (\texttt{var = var\_in}). The \textsl{enter} node is connected to the first instruction of the method. In a similar fashion, all methods end in an \textsl{exit} vertex with the corresponding output variables. There exists one \textsl{normal exit} to which the last instruction and all return instructions are connected. If the method can throw any exceptions, there exists one \textsl{error exit} for each type of exception that may be thrown. The normal and erroneous exits are connected to the \textsl{exit} node.
Every normal statement is connected to the subsequent one by an unlabeled edge. Predicates have two outgoing edges, labeled \textsl{true} and \textsl{false}. Pseudo-predicates also have two outgoing edges. The \textsl{true} edge is connected to the destination of the jump (\textsl{normal exit} in the case of return, the beginning or end of the loop in the case of continue and break, etc.). The \textsl{false} edge is a non-executable edge, marked with a dashed line, and it is connected to the next instruction that would be executed if the pseudo-predicate were a \textsl{nop}.
Nodes that represent a call to a method $M$ first transfer the parameters and variables that may be read or written to, then execute the call, and finally extract the modified variables. Call nodes are an exception to the previous paragraph, as they can have an unlimited number of outgoing edges. Each outgoing edge lands on a pseudo-predicate which indicates if the execution was correct or an exception was raised. The executable edge of each pseudo-predicate will lead to the next instruction to be executed, whereas the non-executable one will lead to the end of the try-catch block. All call nodes can lead to a \textsl{normal return} node, which is linked to the next instruction, and one error node for each type of exception that may be thrown. The erroneous returns are labeled \textsl{catch ExType}, and lead to the first instruction in the corresponding catch block\footnotemark. Any exception that may not be caught will lead to the erroneous exit node of the method it is in. See the example for more details.
\footnotetext{A problem presents itself here, as some exceptions may be able to trigger different catch blocks, due to the sequential nature of catches and polymorphism in Java. A way to fix this is to make catch blocks behave as a switch.} %TODO
\item[Step 3 (compute dependences):] For each node in the CFG, compute the control and data dependencies. Non-executable edges are only included when computing control dependencies.\\
\carlos{put inside definition}
A node $a$ is \textsl{control dependent} on node $b$ iff $a$ post-dominates one but not all of $b$'s successors.\\
A node $a$ is \textsl{data dependent} on node $b$ iff $b$ defines or may define a variable $x$, $a$ uses or may use $x$, and there is an $x$-definition-free path in the CFG from $b$ to $a$.\\
\item[Step 4 (convert each CFG into a PDG):] each node of the CFG is one node of the PDG, with two exceptions. The first are the \textsl{enter}, \textsl{exit} and method call nodes, where the variable input and output assignments are split and placed as control-dependent on their original node. The second is the \textsl{exit} node, which is to be removed (the control dependencies from \textsl{exit} to the variable outputs are transferred to the \textsl{enter} node). Then all the dependencies computed in the previous step are drawn.
\item[Step 5 (connect PDGs to form an SDG):] each method call to $M$ must be connected to the \textsl{enter} node in $M$'s PDG, as a control dependence. Each variable input from the method call is connected to a variable input of the method definition via a data dependence. Each variable output from the method definition is connected to the variable output of the method call via a data dependence. Each method exit is connected \carlos{complete}.
\end{description}
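As an illustration of Step~3, the following Python sketch computes control dependences from post-dominator sets for the CFG of the pseudo-predicate example in the figure (\texttt{if (a) \{ return a; \} print(a);}). It is a minimal sketch under stated assumptions, not the implementation used in this work: the helper names \texttt{post\_dominators}, \texttt{control\_dependences} and \texttt{example\_cfg} are hypothetical, the \textsl{normal exit} and \textsl{exit} nodes are merged into a single node, and every node other than the exit is assumed to have at least one successor.
\begin{lstlisting}[language=Python]
def post_dominators(succ, exit_node):
    """Iteratively compute the set of post-dominators of every node.

    Assumption: every node except exit_node has at least one successor.
    """
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependences(succ, exit_node):
    """Return pairs (a, b): a post-dominates one but not all successors of b."""
    pdom = post_dominators(succ, exit_node)
    deps = set()
    for b, successors in succ.items():
        for a in succ:
            hits = sum(1 for s in successors if a in pdom[s])
            if 0 < hits < len(successors):
                deps.add((a, b))
    return deps


def example_cfg(include_non_executable):
    """CFG of the example; node labels are illustrative only."""
    succ = {
        "enter": ["if (a)"],
        "if (a)": ["return a", "print(a)"],  # true and false edges
        "return a": ["exit"],                # executable edge of the jump
        "print(a)": ["exit"],
        "exit": [],
    }
    if include_non_executable:
        # Non-executable edge: the instruction that would run
        # if the return were a nop.
        succ["return a"].append("print(a)")
    return succ


with_edge = control_dependences(example_cfg(True), "exit")
without_edge = control_dependences(example_cfg(False), "exit")
\end{lstlisting}
In this sketch, including the non-executable edge makes \texttt{print(a)} control dependent on both the predicate and the \texttt{return}; without it, \texttt{print(a)} is control dependent on the predicate only.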
\begin{itemize}
\item An extra type of control dependency represented by an ``exception edge''. It will represent the need to include a \textsl{catch} clause when an exception can be thrown. It is represented with a dotted line (dashed line is for data dependency). These edges have a special characteristic: when one is traversed, only ``exception edges'' may be traversed from the new nodes included in the slice. If the same node is reached by another kind of edge, the restriction is lifted. The behavior is documented in algorithm \ref{alg:2pass}, with changes from the original algorithm are \underline{underlined}.
\item Add an extra ``exception edge'' from each ``exit with exception of type T'' node, where the type of the exception is \texttt{t} to all the corresponding ``\texttt{throw e}'', such that \texttt{e} is or inherits from \texttt{T}.
\item Add an extra ``exception edge'' from each catch statement to every statement that can throw that error.
\item The exception edges will only be placed when the method or the try-catch statement are loop-carrier\footnote{Loop-carrier, when referring to a statement, is the property that in a CFG for the complete program, the node representing the statement is part of a loop, meaning that it could be executed again once it is executed.}.
\end{itemize}
\begin{algorithm} % generate slice
\caption{Two-pass algorithm to obtain a backward static slice with exceptions}
\label{alg:2pass}
\begin{algorithmic}[1]
\REQUIRE SDG $\mathcal{G}$ representing program P. $\mathcal{G} = \{\mathcal{S}, \mathcal{E}\}$, where $\mathcal{S}$ is a set of states (some are statements) connected by a set of edges $\mathcal{E}$. Each edge, is a triplet composed of the type of edge (control, data or \underline{exception} dependency, summary, param-in, param-out), the source and destination of the edge.
\REQUIRE A slicing criterion, composed of a statement $s \in \mathcal{S}$ and a variable $v$.
\ENSURE $\mathcal{S}' \subseteq \mathcal{S}$, representing the slice of P according to the criterion provided.
\medskip
\COMMENT{First pass (do not traverse output parameter edges).}
\STATE{$\mathcal{S}' \Leftarrow \emptyset$ (slice), $\mathcal{Q}\Leftarrow\{s\}$ (queue), $\mathcal{S}\Leftarrow \mathcal{S} - \{s\}$ (not visited), $\mathcal{R}\Leftarrow \emptyset$ (only visited via exception edge)}
\WHILE{$\mathcal{Q} \neq \emptyset$}
\STATE{$a \in \mathcal{Q}$} \COMMENT{Select an element from $\mathcal{Q}$}
\STATE{$\mathcal{Q} \Leftarrow \mathcal{Q} - \{a\}$}
\STATE{$\mathcal{S}' \Leftarrow \mathcal{S}' + \{a\}$}
\FORALL{$\mathcal{A}$ in $\{\{type, origin, a\} \in \mathcal{E}\}$}
\IF{$type \neq$ param-out \AND ($origin \notin \mathcal{S}'$ \OR ($origin \in \mathcal{R}$ \AND $a \notin \mathcal{R}$))} \label{line:param-out}
\IF{\underline{$a \in \mathcal{R}$}}
\IF{\underline{$type =$ exception}}
\STATE{\underline{$\mathcal{Q} \Leftarrow \mathcal{Q} + \{origin\}$}}
\STATE{\underline{$\mathcal{R} \Leftarrow \mathcal{R} + \{origin\}$}}
\ENDIF
\ELSE
\STATE{$\mathcal{Q} \Leftarrow \mathcal{Q} + \{origin\}$}
\ENDIF
\ENDIF
\ENDFOR
\ENDWHILE
\\
\medskip
\COMMENT{Second pass (very similar, do not traverse input parameter edges).}
\STATE $\mathcal{Q} \Leftarrow \mathcal{S}'$
\WHILE{$\mathcal{Q} \neq \emptyset$}
\STATE{$a \in \mathcal{Q}$} \COMMENT{Select an element from $\mathcal{Q}$}
\STATE{$\mathcal{Q} \Leftarrow \mathcal{Q} - \{a\}$}
\STATE{$\mathcal{S}' \Leftarrow \mathcal{S}' + \{a\}$}
\FORALL{$\mathcal{A}$ in $\{\{type, origin, a\} \in \mathcal{E}\}$}
\IF{$type \neq$ param-in \AND ($origin \notin \mathcal{S}'$ \OR ($origin \in \mathcal{R}$ \AND $a \notin \mathcal{R}$))}
\IF{\underline{$a \in \mathcal{R}$}}
\IF{\underline{$type =$ exception}}
\STATE{\underline{$\mathcal{Q} \Leftarrow \mathcal{Q} + \{origin\}$}}
\STATE{\underline{$\mathcal{R} \Leftarrow \mathcal{R} + \{origin\}$}}
\ENDIF
\ELSE
\STATE{$\mathcal{Q} \Leftarrow \mathcal{Q} + \{origin\}$}
\ENDIF
\ENDIF
\ENDFOR
\ENDWHILE
\end{algorithmic}
\end{algorithm}
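The traversal restriction attached to exception edges can also be sketched in Python. The code below is one possible reading of the behavior described in the text (a node reached only through an exception edge may only follow exception edges, and the restriction is lifted when it is reached through any other edge); it is not a line-by-line transcription of Algorithm~\ref{alg:2pass}. The names \texttt{one\_pass}, \texttt{slice\_sdg} and \texttt{EDGES}, as well as the small graph, are hypothetical.
\begin{lstlisting}[language=Python]
def one_pass(edges, start, skipped_type, restricted):
    """Backward traversal from start, never following skipped_type edges.

    edges holds (type, origin, destination) triples. restricted is the
    set of nodes reached only through exception edges; updated in place.
    """
    visited = set(start)
    queue = list(start)
    while queue:
        a = queue.pop()
        for edge_type, origin, destination in edges:
            if destination != a or edge_type == skipped_type:
                continue
            is_exception = edge_type == "exception"
            if a in restricted and not is_exception:
                continue  # restricted nodes only follow exception edges
            if origin not in visited:
                visited.add(origin)
                if is_exception:
                    restricted.add(origin)
                queue.append(origin)
            elif origin in restricted and not is_exception:
                restricted.discard(origin)  # reached by another kind of edge
                queue.append(origin)        # revisit without the restriction
    return visited


def slice_sdg(edges, criterion):
    """Two passes: skip param-out edges first, then param-in edges."""
    restricted = set()
    first = one_pass(edges, {criterion}, "param-out", restricted)
    return one_pass(edges, first, "param-in", restricted)


# Hypothetical single-method graph: a try block whose catch uses `log`.
EDGES = [
    ("control", "enter", "try"),
    ("control", "try", "x = f()"),
    ("control", "try", "print(x)"),
    ("data", "x = f()", "print(x)"),
    ("exception", "catch (E e)", "x = f()"),
    ("data", "log = open()", "catch (E e)"),
    ("control", "catch (E e)", "report(log)"),
    ("control", "enter", "log = open()"),
]
\end{lstlisting}
Slicing this graph on \texttt{print(x)} includes the catch statement, because \texttt{x = f()} may throw, but not \texttt{log = open()}, since the catch was reached only through an exception edge. Slicing on \texttt{report(log)} reaches the catch through a control edge, so its data dependence on \texttt{log = open()} is followed.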
% vim: set noexpandtab:ts=2:sw=2:wrap