reviewed part 1 of chapter 3

Carlos Galindo 2019-12-09 11:48:24 +00:00
parent 7ae9bc49f2
commit 7067525f98
6 changed files with 89 additions and 91 deletions


@ -4,112 +4,97 @@
\chapter{Main explanation?}
\label{cha:incremental}
\carlos{Review if we want to call nodes ``Enter'' and ``Exit'' or ``Start'' and ``End'' (I'd prefer the first one).}
\sergio{Enter or Entry?}
\josep{This is not our decision; use the same word as Horwitz in the SDG paper.}
\section{First definition of the SDG}
\label{sec:first-def-sdg}
The system dependence graph (SDG) is \deleted{a method}\added{the main data structure for program representation used in the}\deleted{for} program slicing\added{ area. It}\deleted{that} was first
proposed by Horwitz, Reps and Binkley \cite{HorwitzRB88}\added{ and, since then, many approaches have based their models on it}. It builds upon the
existing control flow graph (CFG), defining dependencies between vertices of the
CFG, and building a program dependence graph (PDG), which represents them.\sergio{Do we spell out the acronyms and their meaning again (CFG, PDG)? They have already been introduced.} The
\deleted{system dependence graph (}SDG\deleted{)} is then built from the assembly of the different
PDGs (each representing a method of the program), linking each method call to
its corresponding definition. Because each graph is built from the previous one,
new constructs can be added to the CFG, without the need to alter the
algorithm that converts \added{each} CFG to PDG and then to \added{the final} SDG. The only modification
possible is the redefinition of a\added{n already defined} dependency or the addition of new kinds of
dependence.
The SDG is the most common data structure for program representation in the field of program slicing.
It was first proposed by Horwitz et al. \cite{HorwitzRB88} and, since then, many approaches to program slicing have based their models on it.
It builds upon the existing CFG, which represents the control flow between the instructions of a method. Then, it creates a PDG using the CFG's vertices and the dependencies computed from it.
The SDG is finally built from the assembly of the different methods' PDGs, linking each method call to its corresponding definition.
Because each graph is built from the previous one, new statements and instructions can be added to the CFG, without the need to alter the algorithm that converts each CFG to PDG and then to the final SDG.
The only modification possible is the redefinition of an already defined dependency or the addition of new kinds of dependence.
The language covered by the initial proposal \deleted{was}\added{is}\sergio{all in present tense or all in past tense} a simple one, featuring
procedures with modifiable parameters and basic instructions, including calls to
procedures, variable assignments, arithmetic and logic operators and conditional
instructions (branches and loops)\deleted{:}\added{, i.e.,}\sergio{I am not sure whether i.e. reads well here :/} the basic features of an imperative
programming language. The \deleted{control flow graph was}\added{CFGs are} as simple as the programs
themselves, with each graph representing one procedure. The instructions of the
program are represented as vertices of the graph and are split into two
categories: statements, which have no effect on the control flow (\added{e.g., }assignments,
procedure calls) and predicates, whose execution may lead to one of multiple
---though traditionally two--- \added{different paths} (\added{e.g., }conditional instructions). \deleted{S}\added{While s}tatements are
connected sequentially to the next instruction\deleted{. P}\added{, on the contrary, p}redicates have two outgoing
edges, each\added{ of them} connected to the first statement that should be executed\deleted{,} according
to the result of evaluating the conditional expression in the guard of the
predicate.
The seminal appearance of the SDG covers a simple imperative programming language, featuring procedures and basic instructions like calls, variable assignments, arithmetic and logic operators and conditional instructions (branches and loops).
\begin{definition}[Control Flow Graph \carlos{add original citation}]
A \emph{control flow graph} $G$ of a program\sergio{program or method?} $P$ is a directed graph, represented as a tuple $\langle N, E \rangle$, where $N$ is a set of nodes \josep{such that for each statement $s$ in $P$ there is a node in $N$ labeled with $S$ and there are two special nodes...}, composed of a method's\sergio{method or program?} statements plus two special nodes, ``Start'' and ``End''; and $E$ is a set of edges of the form $e = \left(n_1, n_2\right) | n_1, n_2 \in N$.
\josep{This is a definition. It cannot contain opinions or vague content. Either you define Start and End as nodes or you do not. But do not describe what others have done inside a definition. I would remove what follows.}
Most algorithms\added{, in order} to generate the SDG\added{,} mandate the ``Start'' node to be the only source and \added{the} ``End'' \added{node} to be the only sink in the graph. \carlos{Is it necessary to define source and sink in the context of a graph?}\josep{remove it}.
\josep{from here}Edges are created according to the possible execution paths that exist; each statement is connected to any statement that may immediately follow it. Formally, \josep{up to here, move it out of the definition, to explain it. It makes no sense to say something informal in a definition and then say ``formally'' inside that same definition. A definition must be formal throughout.}an edge $e = (n_1, n_2)$ exists if and only if there exists an execution of the program where $n_2$ is executed immediately after $n_1$. \josep{again, you cannot say ``in general''. Either you define that they are evaluated or that they are not, but do not describe what is usually done. Here you are defining.}In general, expressions are not evaluated\added{ when generating the CFG}; so a\deleted{n \texttt{if}}\added{ conditional} instruction \added{will have}\deleted{has} two outgoing edges \added{regardless of the condition value being} \deleted{even if the condition is} always true or false, e.g.\added{,} \texttt{1 == 0}.
Given a method $M$, which contains a list of statements $s = \{s_1, s_2, ...\}$, the \emph{control flow graph} of $M$ is a directed graph $G = \langle N, E \rangle$, where:
\begin{itemize}
\item $N = s \cup \{`\textnormal{Enter}', `\textnormal{Exit}'\}$: a set of nodes such that for each statement $s_i$ in $s$ there is a node in $N$ labelled with $s_i$ and two special nodes ``Enter'' and ``Exit'', which represent the beginning and end of the method, respectively.
\item $E$ is a set of edges of the form $e = \left(n_1, n_2\right) | n_1, n_2 \in N$. $e \in E$ if and only if there is a possible execution of $M$ where $n_2$ is executed immediately after $n_1$.
\end{itemize}
\end{definition}
To build the PDG and then the SDG, there are two dependencies based directly on the CFG's structure: data and control dependence. \sergio{But first, we need to define the concept of postdominance in a graph necessary in the definition of control dependency:}\sergio{I am not fully convinced, but consider whether to add something here or leave it as it is.}
Most algorithms, in order to generate the SDG, mandate the ``Enter'' node to be the only source and the ``Exit'' node to be the only sink in the graph.
In general, expressions are not evaluated when generating the CFG, so an \texttt{if} conditional instruction will have two outgoing edges regardless of whether its condition is always true or always false (e.g., \texttt{1 == 0}).
To build the PDG and then the SDG, there are two dependencies based directly on the CFG's structure: data and control dependence. First, though, we need to define the concept of postdominance in a graph, as it is necessary in the definition of control dependency:
\begin{definition}[Postdominance \carlos{add original citation?}]
\label{def:postdominance}
\josep{Let $C = (N,E)$ be a CFG.}
Vertex $b\josep{\in N}$ \textit{postdominates} vertex $a\josep{\in N}$ if and only if $b$ is on every path from $a$ to the ``End'' vertex.
Let $C = (N, E)$ be a CFG, and $n_e \in N$ the ``Exit'' node of $C$. $b \in N$ \textit{postdominates} $a \in N$ if and only if $b$ is present on every possible sequence from $a$ to $n_e$.
\end{definition}
\begin{definition}[Control dependency\sergio{dependency or dependence?} \carlos{add original citation}]
From the previous definition, given that the ``Exit'' node is the only sink in the CFG, every node will have a path to it, so it follows that any node postdominates itself.
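As an illustration of Definition~\ref{def:postdominance}, the following Python sketch computes the set of postdominators of every node with a simple fixed-point iteration. It is not taken from the original proposal: the function name \texttt{postdominators}, the edge-list encoding of the CFG and the sample graph (a method with a single conditional) are assumptions made for this example.

```python
def postdominators(nodes, edges, exit_node):
    """Map each CFG node to the set of nodes that postdominate it.

    Assumes exit_node is the only sink and is reachable from every node.
    """
    succ = {n: set() for n in nodes}
    for a, b in edges:
        succ[a].add(b)

    # Start from the largest candidate sets and shrink them to a fixed point.
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == exit_node:
                continue
            # b postdominates n if b is n itself or b postdominates
            # every successor of n.
            succ_sets = [pdom[s] for s in succ[n]]
            common = set.intersection(*succ_sets) if succ_sets else set()
            new = {n} | common
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


# Hypothetical CFG of the method body: if (p) { s1 }  s2
nodes = ["Enter", "p", "s1", "s2", "Exit"]
edges = [("Enter", "p"), ("p", "s1"), ("p", "s2"), ("s1", "s2"), ("s2", "Exit")]
pdom = postdominators(nodes, edges, "Exit")
```

In this sample graph, \texttt{s2} postdominates \texttt{p} because both branches reach it, whereas \texttt{s1} does not, since the \textit{false} branch skips it.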
\begin{definition}[Control dependency \cite{HorwitzRB88}]
\label{def:ctrl-dep}
\josep{Let $C = (N,E)$ be a CFG.}
Vertex $b\josep{\in N}$ is \textit{control dependent} on vertex $a\josep{\in N}$ ($a \ctrldep b$) if and only if $b$ postdominates one but not all of $a$'s successors. \josep{What follows is actually a lemma. It need not be stated as a lemma, but it should be moved after the definition.}It follows that a vertex with only one successor cannot be the source of control dependence.
Let $C = (N, E)$ be a CFG. $b \in N$ is \textit{control dependent} on $a \in N$ ($a \ctrldep b$) if and only if $b$ postdominates one but not all of $\{(a, n) |~(a, n) \in E, n \in N\}$ ($a$'s successors).
\end{definition}
\begin{definition}[Data dependency\sergio{dependency or dependence?} \carlos{add original citation}]
It follows that a node with fewer than two outgoing edges cannot be the source of control dependence.
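Definition~\ref{def:ctrl-dep} can be checked mechanically once the postdominator sets are known. The sketch below is only an illustration under stated assumptions: the function name \texttt{control\_dependences} is hypothetical, ``one but not all'' is read as ``at least one but not all'', and the postdominator sets of the sample CFG (the body \texttt{if (p) \{ s1 \} s2}) are written by hand rather than computed.

```python
def control_dependences(edges, pdom):
    """Return the pairs (a, b) such that b is control dependent on a.

    edges: CFG edges as (source, target) pairs.
    pdom:  node -> set of nodes that postdominate it (including itself).
    """
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)

    deps = set()
    for a, successors in succ.items():
        # Only nodes that postdominate some successor can qualify.
        candidates = set().union(*(pdom[s] for s in successors))
        for b in candidates:
            hits = sum(1 for s in successors if b in pdom[s])
            if 0 < hits < len(successors):
                deps.add((a, b))
    return deps


# Hypothetical CFG of: if (p) { s1 }  s2, with hand-written postdominator sets.
edges = [("Enter", "p"), ("p", "s1"), ("p", "s2"), ("s1", "s2"), ("s2", "Exit")]
pdom = {
    "Enter": {"Enter", "p", "s2", "Exit"},
    "p": {"p", "s2", "Exit"},
    "s1": {"s1", "s2", "Exit"},
    "s2": {"s2", "Exit"},
    "Exit": {"Exit"},
}
deps = control_dependences(edges, pdom)
```

Only \texttt{p} has two successors, so it is the only possible source of control dependence here; \texttt{s1} postdominates one of them (itself) but not the other, while \texttt{s2} postdominates both and is therefore not control dependent on \texttt{p}.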
\begin{definition}[Data dependency \cite{HorwitzRB88}]
\label{def:data-dep}
\josep{Let $C = (N,E)$ be a CFG.}
Vertex $b\josep{\in N}$ is \textit{data dependent} on vertex $a\josep{\in N}$ ($a \datadep b$) if and only if $a$ may define a variable $x$, $b$ may use $x$ and there exists a \carlos{could it be ``an''??} $x$-definition free path from $a$ to $b$.
Data dependency was originally defined as flow dependency, and split into loop and non--loop related dependencies\josep{I think the term is loop-carried. I believe it is in Frank Tip's paper}, but that distinction is no longer useful to compute program slices \sergio{Who said it is no longer useful? Is it worth citing?}. \josep{It is useful in program slicing, but not in debugging.}
It should be noted that variable definitions and uses can be computed for each statement independently, analysing the procedures called by it if necessary. The variables used and defined by a procedure call are those used and defined by its body.
Let $C = (N,E)$ be a CFG.
$b \in N$ is \textit{data dependent} on $a \in N$ ($a \datadep b$) if and only if $a$ may define a variable $x$, $b$ may use $x$ and there exists in $C$ a sequence of edges from $a$ to $b$ where $x$ is not defined.
\end{definition}
With the data and control dependencies, the PDG may be built by replacing the
Data dependency was originally defined as flow dependency, and subcategorized into loop-carried and loop-independent flow-dependencies, but that distinction is no longer used to compute program slices with the SDG. It should be noted that variable definitions and uses can be computed for each statement independently, analysing the procedures called by it if necessary. The variables used and defined by a procedure call are those used and defined by its body.
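To make Definition~\ref{def:data-dep} concrete, the following Python sketch searches, for every definition of a variable $x$, the nodes reachable through paths on which $x$ is not redefined. The function name \texttt{data\_dependences}, the \texttt{defs}/\texttt{uses} dictionaries and the sample program are assumptions made for this example; it also assumes that definitions and uses have already been computed per statement.

```python
def data_dependences(edges, defs, uses):
    """Return triples (a, b, x): b is data dependent on a through variable x.

    edges: CFG edges as (source, target) pairs.
    defs:  node -> set of variables it may define.
    uses:  node -> set of variables it may use.
    """
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)

    deps = set()
    for a, defined in defs.items():
        for x in defined:
            # Depth-first search from a's successors along x-definition-free paths.
            stack = list(succ.get(a, []))
            seen = set()
            while stack:
                n = stack.pop()
                if n in seen:
                    continue
                seen.add(n)
                if x in uses.get(n, ()):
                    deps.add((a, n, x))
                # A node that redefines x ends every path going through it.
                if x not in defs.get(n, ()):
                    stack.extend(succ.get(n, []))
    return deps


# Hypothetical method body:  d1: x = 1;  c: if (p)  d2: x = 2;  u: y = x
edges = [("Enter", "d1"), ("d1", "c"), ("c", "d2"), ("c", "u"),
         ("d2", "u"), ("u", "Exit")]
defs = {"d1": {"x"}, "d2": {"x"}, "u": {"y"}}
uses = {"c": {"p"}, "u": {"x"}}
deps = data_dependences(edges, defs, uses)
```

Both definitions of \texttt{x} reach the use in \texttt{u}: the one in \texttt{d1} through the \textit{false} branch of the conditional and the one in \texttt{d2} directly.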
With the data and control dependencies, the PDG may now be built by replacing the
edges from the CFG by data and control dependence edges. The first tends to be
represented as a thin solid line, and the latter as a thick solid line. In the
examples, \added{data and control dependencies are represented by thin solid red and black lines respectively}\deleted{data dependencies will be thin solid red lines}.
represented as a thin dashed line or a thin solid coloured line; and the latter as a thin solid black line.
In the examples, data and control dependencies are represented by red and black solid lines, respectively.
\begin{definition}[Program dependence graph]
\label{def:pdg}
\josep{Given a program $P$,} The \textit{program dependence graph} (PDG) \josep{associated with $P$} is a directed graph (and originally a tree\sergio{???}\josep{historical remarks are out of place in a definition}) represented by \josep{a triple $\langle N, E_c, E_d \rangle$ where $N$ is...} three elements: a set of nodes $N$, a set of control edges $E_c$ and a set of data edges $E_d$. \sergio{$PDG = \langle N, E_c, E_d \rangle$}
Method $M$, CFG $C = \langle N, E \rangle$, the PDG is $P = \langle N', E_c, E_d \rangle$, where
% $$E_c = \{ (a, b) | a, b \in N' \wedge a \ctrldep b\}$$
Given a method $M$, composed of statements $S = \{s_1, s_2, ... s_n\}$ and its associated CFG $C = (N, E)$, the \textit{program dependence graph} (PDG) of $M$ is a directed graph $G = \langle N', E_c, E_d \rangle$, where:
\begin{enumerate}
\vspace{-1em}
\item $N' = N \backslash \{End\}$ \vspace{-1em}
\item $(a, b) \in E_c \iff a, b \in N' \wedge a \ctrldep b ~ \wedge \not\exists c \in N' ~.~ a \ctrldep c \wedge c \ctrldep b$ \vspace{-1em}
\item $(a, b) \in E_d \iff a, b \in N' \wedge a \datadep b$
\item $N' = N~\backslash~\{\textnormal{Exit}\}$
\item $(a, b) \in E_c \iff a, b \in N' \wedge (a \ctrldep b \vee a = \textnormal{Enter}) ~ \wedge \not\exists c \in N' ~.~ a \ctrldep c \wedge c \ctrldep b$ (\textit{control edges})
\item $(a, b) \in E_d \iff a, b \in N' \wedge a \datadep b$ (\textit{data edges})
\end{enumerate}
The set of nodes corresponds to the set of nodes of the CFG\josep{which CFG? A definition cannot take for granted that a CFG exists}, excluding the ``End'' node.
Both sets of edges are built as follows\josep{:}. There is a control edge between two nodes $n_1$ and $n_2$ if and only if $n_1 \ctrldep n_2$\sergio{remember to avoid generating edges in order to prevent transitivity. Decide whether we define control arc as a separate definition.}, and a data edge between $n_1$ and $n_2$ if and only if $n_1 \datadep n_2$. Additionally, if a node $n$ does not have any incoming control edges, it has a ``default'' control edge $e = (\textnormal{Start},n)$; so that ``Start'' is the only source node of the graph.
Note: \josep{a definition cannot contain notes. This goes outside}the most common graphical representation is a tree--like structure based on the control edges, and nodes sorted left to right according to their position on the original program. Data edges do not affect the structure, so that the graph is easily readable.
\end{definition}
\sergio{I think the definitions of CFG and PDG should make it clearer that there are several per program (one per function), so that this last sentence makes more sense.}
Regarding the graphical representation of the PDG, the most common one is a tree-like structure based on the control edges, and nodes sorted left to right according to their position on the original program. Data edges do not affect the structure, so that the graph is easily readable.
Finally, the SDG is built from the combination of all the PDGs that compose the
program.
Finally, the SDG is built by combining the PDGs of all the methods that compose the program:
\begin{definition}[System dependence graph]
\begin{definition}[System dependence graph \cite{HorwitzRB88}]
\label{def:sdg}
Given a program $P$ composed of a set of $n$ methods $M = \{m_0 ... m_n\}$ and their associated PDGs (each method $m_i$ has a PDG $G_{PDG}^i = \langle N^i, E_c^i, E_d^i \rangle$), the \textit{system dependence graph} (SDG) of $P$ is a graph $G = \langle N', E'_c, E'_d, E_{fc}, E_s \rangle$ where $N = \bigcup_{i=0}^n N^i$, $ $, $ $, $ $, and $ $.
\josep{Fix this definition like the PDG one. Right now it is completely informal. It should be defined on top of the PDG. That is, an SDG is the appropriate connection of several PDGs, one per method. And only define what is new: call arcs, parameter-in arcs, parameter-out arcs and summary arcs.}
The \textit{system dependence graph} (SDG) is a directed graph that represents the control and data dependencies of a whole program. It has three kinds of edges: control, data and function call. The graph is built combining multiple PDGs, with the ``Start'' nodes labeled after the function they begin. There exists one function call edge between each node containing one or more calls and each of the ``Start'' node\josep{s} of the method called. In a programming language where the function call is ambiguous (e.g. with pointers or polymorphism), there exists one edge leading to every possible function called.\sergio{This definition has turned out very informal, hasn't it? What happened to $E_c,~E_d,~E_{fc},$ the nodes of the PDG...?}
Given a program $P$, composed of a set of methods $M = \{m_0 ... m_n\}$ and their associated PDGs (each method $m_i$ has a $PDG^i = \langle N^i, E_c^i, E_d^i \rangle$),
the \textit{system dependence graph} (SDG) of $P$ is a graph $G = \langle N, E_c, E_d, E_{call} \rangle$ where:
\begin{enumerate}
\item $N = \bigcup_{i=0}^n N^i$
\item $E_c = \bigcup_{i=0}^n E_c^i$
\item $E_d = \bigcup_{i=0}^n E_d^i$
	\item $(a, b) \in E_{call}$ if and only if $a$ is a statement that contains a call and $b$ is the ``Enter'' node of a method called by $a$. $(a, b)$ is a \textit{call edge}.
% These will be defined later when adding function calls.
% \item $E_{in}$ (\textit{parameter-input} or \textit{param-in edges})
% \item $E_{out}$ (\textit{parameter-output} or \textit{param-out edges})
% \item $E_{sum}$ (\textit{summary edges})
\end{enumerate}
\end{definition}
Regarding call edges, in programming languages with ambiguous method calls (those that have polymorphism or pointers), there may exist multiple outgoing call edges from a statement with a single method call.
To avoid confusion, the ``Enter'' nodes of each method are relabelled with their method's name.
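The assembly described in Definition~\ref{def:sdg} can be sketched in a few lines of Python. This is only an illustration: the function name \texttt{build\_sdg}, the dictionary-based encoding, the ``Enter <method>'' labelling scheme and the two-method sample program are assumptions, and node labels are assumed to be unique across methods.

```python
def build_sdg(pdgs, calls):
    """Assemble an SDG from per-method PDGs.

    pdgs:  method name -> (nodes, control_edges, data_edges)
    calls: call-site node -> iterable with the names of the methods it may call
    """
    nodes, e_c, e_d = set(), set(), set()
    for n_i, ec_i, ed_i in pdgs.values():
        nodes |= set(n_i)
        e_c |= set(ec_i)
        e_d |= set(ed_i)
    # One call edge from each call site to the Enter node of each possible
    # callee; Enter nodes are assumed to be labelled "Enter <method>".
    e_call = {(site, f"Enter {m}")
              for site, callees in calls.items()
              for m in callees}
    return nodes, e_c, e_d, e_call


# Hypothetical program: main() { a = 1; f(a) }  and  f(x) { print(x) }
pdgs = {
    "main": (
        {"Enter main", "a = 1", "f(a)"},
        {("Enter main", "a = 1"), ("Enter main", "f(a)")},
        {("a = 1", "f(a)")},
    ),
    "f": (
        {"Enter f", "print(x)"},
        {("Enter f", "print(x)")},
        set(),
    ),
}
calls = {"f(a)": ["f"]}
nodes, e_c, e_d, e_call = build_sdg(pdgs, calls)
```

A call site that may reach several methods, for example through polymorphism, would simply list several names in \texttt{calls} and obtain one call edge per candidate.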
\begin{example}[Creation of an SDG from a simple program]
Given the program shown below (left), the control flow graphs for both methods are shown on the right: \\
Consider the program shown on the left side of Figure~\ref{fig:simple-sdg-code}, where two procedures in a simple imperative language are shown. The CFG that corresponds to each procedure is shown on the right side.
\begin{figure}[h]
\begin{minipage}{0.2\linewidth}
\begin{lstlisting}
proc main() {
@ -127,18 +112,25 @@ proc f(x, y) {
\end{lstlisting}
\end{minipage}
\begin{minipage}{0.79\linewidth}
\centering
\includegraphics[width=0.6\linewidth]{img/cfgsimple}
\end{minipage}
	\sergio{Center the figure; there is too much empty space on the right}
\caption{A simple imperative program composed of two procedures (left) and their associated CFGs (right).}
\label{fig:simple-sdg-code}
\end{figure}
Then, control and data dependencies are computed, arranging the nodes in the \josep{corresponding} PDG\josep{s (see the two PDGs inside the two squares below)}\sergio{FigureRef missing}. Finally, the two graphs are connected with summary edges\sergio{with what? The reader does not yet know what this is or what it is for. If anything, function call edges; and if that is the black edge going from f(a,b) to Start f(), it should be a different colour to tell it apart} to create the SDG:
Then, the nodes of each CFG are rearranged, according to the control and data dependencies, to create the corresponding PDGs. Both are shown in Figure~\ref{fig:simple-sdg}, each bounded by a rectangle.
Finally, the two graphs are connected with a single call edge to form the SDG.
\begin{center}
\includegraphics[width=0.8\linewidth]{img/sdgsimple}
\end{center}
\begin{figure}[h]
\centering
\includegraphics[width=0.8\linewidth]{img/sdgsimple}
\caption{The SDG that corresponds to the program from Figure~\ref{fig:simple-sdg-code}.}
\label{fig:simple-sdg}
\end{figure}
\end{example}
\subsubsection{Function calls and data dependencies}
\subsubsection{Method calls and data dependencies}
\carlos{Vocabulary: when is it appropriate to use method, function and procedure?}\sergio{good question; I think it is hierarchical: method includes function and procedure, and the latter two are disjoint, right?} \josep{No. Method implies object orientation. If you are talking about a particular language (e.g., Java), then you must use that language's vocabulary (e.g., method). If you speak in general and want a word that subsumes them all, I have seen two ways of doing it: (1) use routine (although you could use another word, for example method) the first time and add a footnote saying that in the rest of the article we use routine to refer to method/function/procedure/predicate. (2) Use method/function/procedure/predicate like that, separated by slashes. In this thesis it seems more appropriate to talk about methods, and the first time add a footnote saying that we will talk about methods, but that all the developments are equally applicable to functions and procedures.}
@ -147,15 +139,15 @@ In the original definition of the SDG, there was special handling of data depend
To such end, the following modifications are made to the different graphs:
\begin{description}
\item[CFG.] In each CFG, global variables read or modified and parameters are added to the label of the ``Start'' node in assignments of the form $par = par_{in}$ for each parameter and $x = x_{in}$ for global variables. Similarly, global variables and parameters modified are added to the label of the ``End'' node as \added{assignments of the form} $x_{out} = x$. \added{From now on, we will refer to the described assignments as input and output information respectively.} \sergio{\{}The parameters are only passed back if the value set by the called method can be read by the caller\sergio{\} I do not understand what this sentence refers to}. Finally, in method calls the same values must be packed and unpacked: each statement containing a function call is relabeled to contain \added{its related} input (of the form $par_{in} = \textnormal{exp}$ for parameters or $x_{in} = x$ for global variables) and output (always of the form $x = x_{out}$) \added{information}. \sergio{is there no parameter\_out? I assume, then, that there is no pass by value?}
\item[PDG.] Each node \added{augmented with input or output information}\deleted{modified} in the CFG is \added{now} split into multiple nodes: the original \deleted{label}\added{node} \added{(Start, End or function call)} is the main node and each assignment \added{contained in the input and output information} is represented as a new node, which is control--dependent on the main one. Visually, \added{new nodes coming from the input information}\deleted{input is} \added{are} placed on the left and \added{the ones coming from the output information}\deleted{output} on the right; with parameters sorted accordingly.
\item[CFG.] In each CFG, global variables read or modified and parameters are added to the label of the ``Enter'' node in assignments of the form $par = par_{in}$ for each parameter and $x = x_{in}$ for global variables. Similarly, global variables and parameters modified are added to the label of the ``Exit'' node as \added{assignments of the form} $x_{out} = x$. \added{From now on, we will refer to the described assignments as input and output information respectively.} \sergio{\{}The parameters are only passed back if the value set by the called method can be read by the caller\sergio{\} I do not understand what this sentence refers to}. Finally, in method calls the same values must be packed and unpacked: each statement containing a function call is relabeled to contain \added{its related} input (of the form $par_{in} = \textnormal{exp}$ for parameters or $x_{in} = x$ for global variables) and output (always of the form $x = x_{out}$) \added{information}. \sergio{is there no parameter\_out? I assume, then, that there is no pass by value?}
\item[PDG.] Each node \added{augmented with input or output information}\deleted{modified} in the CFG is \added{now} split into multiple nodes: the original \deleted{label}\added{node} \added{(Enter, Exit or function call)} is the main node and each assignment \added{contained in the input and output information} is represented as a new node, which is control--dependent on the main one. Visually, \added{new nodes coming from the input information}\deleted{input is} \added{are} placed on the left and \added{the ones coming from the output information}\deleted{output} on the right; with parameters sorted accordingly.
\item[SDG.] Three kinds of edges are introduced: parameter input (param--in), parameter output (param--out) and summary edges. Parameter input edges are placed between each method call's input node and the corresponding method definition input node. Parameter output edges are placed between each method definition's output node and the corresponding method call output node. Summary edges are placed between the input and output nodes of a method call, according to the dependencies inside the method definition: if there is a path from an input node to an output node, that shows a dependence, and a summary edge is placed between the corresponding two nodes in every call to that method.\sergio{I have the feeling that the explanation of what a summary edge is comes somewhat late and perhaps should be in a previous definition. Let Josep say what he thinks.}\josep{Indeed. It comes late. These dependencies cannot be defined after defining the SDG, because then what you defined in the formal definition is not an SDG (only part of one), and whenever you mention the SDG from now on everything will be incomplete. Definitions are sacred, so there are two solutions: (1) explain these three kinds of arc before the SDG definition so that they can be defined formally in it, or (2) delay the formal definition of the SDG until this point (so that they can be included). Or anything else that makes the SDG well defined.}
Note: \deleted{parameter input and output}\added{param-in and param-out} edges are separated because the traversal algorithm traverses them only sometimes (the output edges are excluded in the first pass and the input edges in the second).\sergio{it is delicate to mention the passes without having said anything about the slicing algorithm; readers unfamiliar with slicing will be lost here. Consider removing this note.}\josep{Postpone this note until you discuss the slicing algorithm. At that point you can say that parameter-in and parameter-out are distinguished precisely so that there can be two passes. There it will make sense and clarify things. Here it is confusing. ;-)}
\end{description}
\begin{example}[Variable packing and unpacking]
Let it be \josep{Excellent Beatles song. Really great. But better start like this: Let $f(x, y)$ be a function with... ;-)} a function $f(x, y)$ with two integer parameters \added{which\josep{that} modifies the argument passed in its second parameter}, and a call $f(a + b, c)$, with parameters passed by reference if possible. The label of the method call node in the CFG would be ``\texttt{x\_in = a + b, y\_in = c, f(a + b, c)\josep{???}, c = y\_out}''; method $f$ would have \texttt{x = x\_in, y = y\_in} in the ``Start'' node and \texttt{y\_out = y} in the ``End'' node. The relevant section of the SDG would be: \josep{This whole paragraph and the figure that follows cannot be understood. It must be rewritten and explained in more detail, step by step. This is supposed to be the example of the section, the one that clears up doubts about what $x_{in}$ is, etc., and about how the SDG works. However, rather than clarifying, it confuses (it clarifies nothing for someone who does not know slicing). In fact, to make it understandable, once you have built the graph, it would be good to continue the example a little, explaining how the dependencies make what is inside the called method depend (following the arcs) on what is in the calling method (or at least on the parameters of the call). This requires some explanatory text.}
Let it be \josep{Excellent Beatles song. Really great. But better start like this: Let $f(x, y)$ be a function with... ;-)} a function $f(x, y)$ with two integer parameters \added{which\josep{that} modifies the argument passed in its second parameter}, and a call $f(a + b, c)$, with parameters passed by reference if possible. The label of the method call node in the CFG would be ``\texttt{x\_in = a + b, y\_in = c, f(a + b, c)\josep{???}, c = y\_out}''; method $f$ would have \texttt{x = x\_in, y = y\_in} in the ``Enter'' node and \texttt{y\_out = y} in the ``Exit'' node. The relevant section of the SDG would be: \josep{This whole paragraph and the figure that follows cannot be understood. It must be rewritten and explained in more detail, step by step. This is supposed to be the example of the section, the one that clears up doubts about what $x_{in}$ is, etc., and about how the SDG works. However, rather than clarifying, it confuses (it clarifies nothing for someone who does not know slicing). In fact, to make it understandable, once you have built the graph, it would be good to continue the example a little, explaining how the dependencies make what is inside the called method depend (following the arcs) on what is in the calling method (or at least on the parameters of the call). This requires some explanatory text.}
\begin{center}
\includegraphics[width=0.5\linewidth]{img/parameter-passing}
\end{center}
@ -191,7 +183,7 @@ The most popular approach was proposed by Ball and Horwitz~\cite{BalH93}, classi
\item[Pseudo--predicates.] Unconditional jumps (e.g. \texttt{break}, \texttt{goto}, \texttt{continue}, \texttt{return}) are like predicates, with the difference that the outgoing edge labeled \textit{false} is marked as non--executable\josep{---because there is no possible execution where such edge would be possible,\deleted{, and there is no possible execution where such edge would be possible,} according to the definition of the CFG (see Definition~\ref{def:cfg})---}. Originally the edges had a specific reasoning backing them up: the \textit{true} edge leads to the jump's destination and the \textit{false} one, to the instruction that would be executed if the unconditional jump was removed, or converted into a \texttt{no op}\sergio{no op or no-op?} (a blank operation that performs no change to the program's state). \sergio{\{}This specific behavior is used with unconditional jumps, but no longer applies to pseudo--predicates, as more instructions have used this category as a means of ``artificially'' \carlos{bad word choice} generating control dependencies.\sergio{\}Let us not get into this; when this was defined, the creation of artificial nodes was not considered. Remove the ``originally''; now it is ``originally''.}
\end{description}
\carlos{Pseudo--statements now have been introduced and are used to generate all control edges (for now just the Start method to the End).}\josep{I do not understand this CCC}
\carlos{Pseudo--statements now have been introduced and are used to generate all control edges (for now just the Enter method to the Exit).}\josep{I do not understand this CCC}
As a consequence of this classification, every instruction after an unconditional jump $j$ is control--dependent (either directly or indirectly) on $j$ and the structure containing it (\josep{a predicate such as }a conditional statement or a loop), as can be seen in the following example.
@ -221,7 +213,7 @@ static void f() {
\label{exa:unconditional}
	Figure~\ref{fig:break-graphs} showcases a small program with a \texttt{break} statement, its CFG and PDG with a slice in grey\josep{Do not mention the slice yet. First present the program, then the graphs, then the slicing criterion and finally the slice}. The slicing criterion (line 5, variable $a$) is control dependent on both the unconditional jump and its surrounding conditional instruction (both on line 4\josep{put them on different lines})\josep{. Therefore, the slice (all nodes in grey) includes the conditional jump and also the conditional exception. Note however that...}; even though it is not necessary to include it\sergio{what does this ``it'' refer to?} (in the context of weak slicing).
Note: the ``Start'' node $S$ is also categorized as a pseudo--statement, with the \textit{false} edge connected to the ``End'' node, therefore generating a dependence from $S$ to all the nodes inside the method. This removes the need to handle $S$ with a special case when converting a CFG to a PDG, but lowers the explainability of non--executable edges as leading to the ``instruction that would be executed if the node was absent or a no--op''.
Note: the ``Enter'' node $S$ is also categorized as a pseudo--statement, with the \textit{false} edge connected to the ``Exit'' node, therefore generating a dependence from $S$ to all the nodes inside the method. This removes the need to handle $S$ with a special case when converting a CFG to a PDG, but lowers the explainability of non--executable edges as leading to the ``instruction that would be executed if the node was absent or a no--op''.
\end{example}
The original paper\josep{which original paper? It sounds as if you were referring to one you have mentioned before, but the reader no longer remembers it. Start the sentence in a different way...}~\cite{BalH93} does prove its completeness, but disproves its correctness by providing a counter--example similar to Example~\ref{exa:nested-unconditional}. This proof affects both weak and strong slicing, so improvements can be made on this proposal. The authors postulate that a more correct approach would be achievable if the slice's restriction of being a subset of instructions were lifted.
@ -275,9 +267,9 @@ The \texttt{try-catch} statement can be compared to a \texttt{switch} which comp
\subsection{\texttt{throw} statement}
The \texttt{throw} statement compounds two elements in one instruction: an
unconditional jump with a value attached and a switch to an ``exception mode'', in which the statements' execution order is disregarded. The first one has been extensively covered and solved, as it is equivalent to the \texttt{return} instruction; the second one, however, requires a small addition to the CFG: there must be an alternative control flow, where the path of the exception is shown. For now\sergio{this sounds very Spanish, doesn't it? So far?}, without including \texttt{try-catch} structures, any exception thrown will exit its method with an error, so a new ``Error end'' node is needed.\sergio{I am not convinced by this sentence; see how this sounds to you (although I am not fully convinced by it either) $\rightarrow$ So far, without including \texttt{try-catch} structures, any exception thrown would activate the mentioned ``exception mode'' and leave its method with an error state. Hence, in order to represent this behaviour, a different exit point (represented with a node called ``Error end'') needs to be defined.} \deleted{T}\added{Consequently, t}he pre-existing ``End'' node is renamed \added{as} ``Normal end'', \deleted{but now the}\added{leaving the} CFG \deleted{has}\added{with} two distinct sink nodes, which is forbidden in most slicing algorithms. To solve that problem, a general ``End'' node is created, with both normal and \deleted{exit}\added{error} ends connected to it, making it the only sink in the graph.
unconditional jump with a value attached and a switch to an ``exception mode'', in which the statements' execution order is disregarded. The first one has been extensively covered and solved, as it is equivalent to the \texttt{return} instruction; the second one, however, requires a small addition to the CFG: there must be an alternative control flow, where the path of the exception is shown. For now\sergio{this sounds very Spanish, doesn't it? So far?}, without including \texttt{try-catch} structures, any exception thrown will exit its method with an error, so a new ``Error end'' node is needed.\sergio{I am not convinced by this sentence; see how this sounds to you (although I am not fully convinced by it either) $\rightarrow$ So far, without including \texttt{try-catch} structures, any exception thrown would activate the mentioned ``exception mode'' and leave its method with an error state. Hence, in order to represent this behaviour, a different exit point (represented with a node called ``Error end'') needs to be defined.} \deleted{T}\added{Consequently, t}he pre-existing ``Exit'' node is renamed \added{as} ``Normal end'', \deleted{but now the}\added{leaving the} CFG \deleted{has}\added{with} two distinct sink nodes, which is forbidden in most slicing algorithms. To solve that problem, a general ``Exit'' node is created, with both normal and \deleted{exit}\added{error} ends connected to it, making it the only sink in the graph.
In order to properly accommodate a method's output variables (global variables or parameters passed by reference that have been modified), variable unpacking is added to the ``Error exit'' node, the same as the ``Exit''\sergio{Exit? End? What a mess we have with this xD} node in previous examples. This change constitutes an increase in precision, as now the output variables are differentiated\deleted{; f}\added{. F}or example\added{,} a slice which only requires the error exit may include fewer variable modifications than one which includes both.
In order to properly accommodate a method's output variables (global variables or parameters passed by reference that have been modified), variable unpacking is added to the ``Error exit'' node, the same as the ``Exit''\sergio{Exit? What a mess we have with this xD} node in previous examples. This change constitutes an increase in precision, as now the output variables are differentiated\deleted{; f}\added{. F}or example\added{,} a slice which only requires the error exit may include fewer variable modifications than one which includes both.
This treatment of \texttt{throw} statements only modifies the structure of the CFG, without altering the other graphs, the traversal algorithm, or the basic definitions for control and data dependencies. That fact makes it easy to incorporate into any existing program slicer that follows the general model described. Example~\ref{exa:throw} showcases the new exit nodes and the treatment of the \texttt{throw}\sergio{ statement?} as if it were an unconditional jump whose destination is the ``Error exit''.
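
A minimal formal sketch of this transformation follows. It assumes that a method's CFG is written as a pair $(N, E)$ of nodes and edges; that notation, and the names $n_x$, $n_{err}$, $n_{exit}$ and $T$, are illustrative assumptions and are not taken from Definition~\ref{def:cfg}. Let $n_x \in N$ be the pre-existing exit node (now labeled ``Normal end''), and let $T \subseteq N$ be the set of \texttt{throw} statements. The extended CFG $(N', E')$ could then be written as:
\[
\begin{array}{rcl}
N' & = & N \cup \{n_{err},\ n_{exit}\} \\
E' & = & E \cup \{(t, n_{err}) \mid t \in T\} \cup \{(n_x, n_{exit}),\ (n_{err}, n_{exit})\}
\end{array}
\]
Under these assumptions, and provided that $n_x$ was the only sink of $(N, E)$, $n_{exit}$ is the only sink of $(N', E')$, since both $n_x$ and $n_{err}$ gain an outgoing edge to it. The non--executable edges that each \texttt{throw} carries as a pseudo--predicate are left out of this sketch.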


@ -1,13 +1,13 @@
digraph g {
subgraph a {
Start [shape=box];
End [shape=box];
Start [shape=box,label="Enter"];
End [shape=box,label="Exit"];
f [label=<f (a, b)>]
Start -> "a = 10" -> "b = 20" -> f -> End;
}
subgraph b {
s [shape=box,label=<Start>];
End2 [shape=box,label=<End>];
s [shape=box,label=<Enter>];
End2 [shape=box,label=<Exit>];
s -> "while (x > y)" -> "x = x - 1" -> "while (x > y)" -> "print(x)" -> End2;
}
}

Binary file not shown.


@ -1,6 +1,6 @@
digraph g {
subgraph cluster_a {
Start [shape=box,label="Start main()"];
Start [shape=box,label="Enter main()"];
l2 [label="a = 10"];
l3 [label="b = 20"];
l4 [label="f(a, b)"];
@ -22,7 +22,7 @@ digraph g {
}
subgraph cluster_b {
StartF [shape=box,label="Start f()"];
StartF [shape=box,label="Enter f()"];
l8 [label="while (x > y)"];
l9 [label="x = x + 1"];
l11 [label="print(x)"];

Binary file not shown.


@ -93,6 +93,12 @@
\include{Secciones/state_of_the_art}
\include{Secciones/conclusion}
\chapter{TODO}
\begin{itemize}
\item \carlos{} Decide whether to use dependency or dependence (I suspect the plural form would also change: dependencies vs. dependences).
\end{itemize}
\bibliographystyle{plain}
\bibliography{../../../../../../Biblio/biblio.bib}