Pointer arithmetic is widely used in low-level programs, e.g. memory allocators.

\mihaela{Version changed using CADE-28 reviews}

%

%The verification of such programs usually requires simultaneous reasoning about pointer arithmetic and unbounded heap regions.

The verification of such programs usually requires using pointer arithmetic inside inductive definitions to define the common data structures, e.g. heap lists, in memory allocators.

Pointer arithmetic is widely used in low-level programs, e.g. memory allocators.

%

%The verification of such programs usually requires simultaneous reasoning about pointer arithmetic and unbounded heap regions.

The verification of such programs usually requires using pointer arithmetic inside inductive definitions to define the common data structures, e.g. heap lists, in memory allocators.

%

In this work, we investigate a separation logic SLAH that allows pointer arithmetic inside inductive definitions, thus capable of specifying properties of programs manipulating heap lists.

%a common data structure used in memory allocators.

%

Pointer arithmetic inside inductive definitions is challenging for automated reasoning.

%

We propose some new ideas to tackle this challenge and achieve decision procedures for both satisfiability and entailment of SLAH formulas.

%

The crux of our decision procedure for satisfiability is to compute summaries of inductive definitions. We show that although the summary is naturally expressed as an existentially quantified non-linear arithmetic formula, it can actually be transformed into an equivalent linear arithmetic formula.

%

The decision procedure for entailment, on the other hand, has to match and split the spatial atoms according to the arithmetic relation between address variables.

%

We report on the implementation of these decision procedures and their

good performance in solving problems issued from the verification of

building block programs used in memory allocators.

%

\hide{

We study the decidability of the verification problem for

an extension of the array separation logic (ASL)

allowing the specification of data structures like

pointers, memory blocks or heap-lists,

which are lists built inside memory blocks.

The logic, called \slah, adds to ASL %an

inductively defined predicates specifying singly linked heap-lists.

%

We show that the logic is suitable for the compositional verification

of programs manipulating heap-lists, e.g., memory allocators.

%

We propose decision procedures for satisfiability and entailment of \slah,

which are substantial extensions of those proposed for ASL

because they have to deal with pointer arithmetic inside inductive definitions.

%

Our decision procedures propose some new ideas to tackle this challenge.

The crux of our decision procedure for satisfiability is

to compute a summary of the inductive definition of the heap-list predicate.

We show that although the summary is expressed as

an existentially quantified non-linear arithmetic constraint,

it can actually be transformed into an equivalent linear arithmetic formula.

%

The decision procedure for entailment

uses the decision procedure for satisfiability as an oracle

and deals with the pointer arithmetic in the inductive definitions.

%

We implemented the decision procedures and

did experiments to evaluate their performance.

The experimental results demonstrate the effectiveness of the solver

for deciding problems issued from the verification

@@ -19,6 +19,7 @@ for any stack $s$, $s\models \chi$

We suppose that both formula $\varphi$ and $\psi$ have been transformed in normal form as follows:

\begin{itemize}

\item atoms $\hls{}(x,y;v')$ with $\abs(\varphi)\models(v'<2\lor x=y)$ have been removed and replaced by $x=y$\mihaela{use $\xi$ ?}

\item atoms $\hls{}(x,y;v')$ with $\abs(\varphi)\models y-x=2\land v' \ge2$ have been replaced by $\exists z.~ z \le v' \land y-x=2 : ~x\pto z \star\blk(x+1,y)$

\item ...\mihaela{TODO}

\end{itemize}

The normalization can be done in a linear number of calls to the \EPbA\ procedure.

\section{\EPbA\ abstraction for \slah\ formulas}\label{app:sat-hls}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\hide{

\subsection{{\EPbA} summary of $\hls{}$ atoms}

\label{ssec:sat-hls-abs}

\noindent{\bf Lemma~\ref{lem-}}\begin{lemma}[Summary of $\hls{}$ atoms]

Let $\hls{}(x, y; z)$ be an atom in \slah\

representing a non-empty heap, where $x, y, z$ are three distinct variables in $\cV$.

Then there is an {\EPbA} formula, denoted by $\abs^+(\hls{}(x,y; z))$, which summarizes $\hls{}(x, y; z)$, namely for every stack $s$$s \models\abs^+(\hls{}(x,y; z))$ iff there exists a heap $h$ such that $s, h \models\hls{}(x, y, z)$.

%Similarly for $ \hls{}(x, y; \infty)$ and $ \hls{}(x, y; d)$ with $d \in \NN$.

\end{lemma}

\begin{proof}

The constraint that the atom represents a non-empty heap means that

the inductive rule defining $\hls{}$ in Equation~(\ref{eq:hlsv-rec})

should be applied at least once. Notice that the semantics of this rule

defines, at each inductive step,

a memory block starting at $x$ and ending before $x'$ of size $x'-x$.

By induction on $k \ge1$, we obtain that $\hls{k}(x, y; z)$ defines

a memory block of length $y-x$ such that

$2 k \le y-x \le k z$. Then $\hls{}(x, y; z)$ is summarized by the formula $\exists k.\ k \ge1\wedge2k \le y -x \le kz$, which is a non-linear arithmetic formula.

%Let $t^\infty \equiv d_0 + \sum_{i=1}^n d_i v_i$

%with $d_i$ ($i\in\{0..n\}$) constants in $\NN$

% such that at least one $d_i > 0$ for $i\in\{1..n\}$,

%and $v_i$ ($i\in\{1..n\}$) variables in $\cV$.

The formula $\exists k.\ k \ge1\wedge2k \le y -x \le kz$ is actually equivalent to the disjunction of the two formulas corresponding to the following two cases:

\begin{itemize}

\item If $2= z$, then $\abs^+(\hls{}(x, y; z))$ has the formula $\exists k.\ k \ge1\land y -x =2k$ as a disjunct.

%

\item If $2 < z$, then we consider the following sub-cases:

\begin{itemize}

\item if $k =1$

then $\abs^+(\hls{}(x, y; z))$ contains the formula

$2\leq y-x \le z$ as a disjunct;

\item if $k \ge2$, then we observe that the intervals

$[2k, k z]$ and $[2(k+1), (k+1) z]$

overlap.

Indeed,

\begin{eqnarray*}

k z - 2(k+1) & = &

k (z -2) - 2

\ge k - 2 \ge 0.

\end{eqnarray*}

Therefore, $\bigcup\limits_{k \ge2}[2k, kz]=[4, \infty)$. It follows that the formula

$\exists k.\ k \ge2\land2k \le y-x \le k z$

is equivalent to $4\le y-x$. Therefore, $\abs^+(\hls{}(x, y; z))$ contains $4\le y-x$ as a disjunct.

%\subsection{\EPbA\ abstraction for \slah\ formulas}

%\label{ssec:sat-abs}

We utilize $\abs^+(\hls{}(x, y; z))$ defined in the above section

to obtain in polynomial time an equi-satisfiable \EPbA\ abstraction for a symbolic heap $\varphi$, denoted by $\abs(\varphi)$.

%This shows that the satisfiability problem of \slah\ is in NP. Moreover, it enables using the off-the-shelf SMT solvers e.g. Z3 to solve the satisfiability problem of \slah\ formulas.

%This abstraction extends the ones proposed in~\cite{BrotherstonGK17,DBLP:journals/corr/abs-1802-05935} for ASL to deal with the inductive predicate $\hls{}$.

We introduce some notations first.

%

Given a formula $\varphi\equiv\Pi : \Sigma$,

$\atoms(\varphi)$ denotes the set of spatial atoms in $\Sigma$, and $\patoms(\varphi)$ denotes the set of predicate atoms in $\Sigma$.

The abstraction of $\varphi\equiv\Pi : \Sigma$,

denoted by $\abs(\varphi)$ is

the formula $\Pi\wedge\phi_{\Sigma}\wedge\phi_*$

where:

\begin{itemize}

\item$\phi_{\Sigma}\triangleq\bigwedge\limits_{a \in\atoms(\varphi)}\abs(a)$ such that

\emph{A \slah\ formula $\varphi$ is satisfiable iff $\abs(\varphi)$ is satisfiable.}

\smallskip

\begin{proof}

\emph{``Only if'' direction}:

Suppose that $\varphi$ is satisfiable.

Then there exists a stack $s$ and a heap $h$

such that $s,h \models\varphi$.

We show $s \models\abs(\varphi)$.

%We build an interpretation $I$ of free variables in

%$\abs(\varphi)$ by using $s$ for location variables.

The semantics of \slah\ implies that at least one disjunct in $\abs(a)$

is true for every predicate atom $a$. Therefore, $s \models\phi_\Sigma$. Moreover, $s \models\phi_\sepc$, as a result of the semantics of separating conjunction. Thus $s \models\abs(\varphi)$ and $\abs(\varphi)$ is satisfiable.

\smallskip

\noindent\emph{``If'' direction}: Suppose that $\abs(\varphi)$ is satisfiable.

Then there is an interpretation $s$ such that $s \models\abs(\varphi)$.

We build a heap $h$ from $s$ and $\varphi$ as follows:

\begin{itemize}

\item For each points-to atom $t_1\pto t_2$ in $\varphi$, $h(s(t_1))= s(t_2)$.

%

\item For each block atom $\blk(t_1, t_2)$ in $\varphi$, $h(n)=1$ for each $n \in[s(t_1), s(t_2)-1]$.

%

\item For every predicate atom $\hls{}(t_1,t_2; t_3)$ in $\varphi$, we have $s \models\abs(\hls{}(t_1,t_2; t_3))$. Then either $s(t_1)= s(t_2)$ or $s(t_1) < s(t_2)$ and $s \models\abs^+(\hls{}(x, y; z))[t_1/x, t_2/y, t_3/z]$.

\begin{itemize}

\item For the formal case, let $h(s(t_1))$ undefined.

\item For the latter case, $s \models(2= t_3\wedge t_1 < t_2\wedge t_2- t_1\equiv0\bmod2)\vee(2 < t_3\wedge2\le t_2- t_1)$.

\begin{itemize}

\item If $s(t_3)=2$, then let $h(s(t_1)+2(i-1))=2$ and $h(s(t_1)+2i-1)=1$ for every $i: 1\le i \le\frac{s(t_2)- s(t_1)}{2}$.

\item If $s(t_3) > 2$, then from $s(t_2)- s(t_1)\ge2$, we know that there is a sequence of numbers $n_1, \cdots, n_\ell$ such that $2\le n_i \le s(t_3)$ for every $i \in[\ell]$ and $s(t_2)= s(t_1)+\sum\limits_{i \in[\ell]} n_i$. We then let $h(s(t_1)+\sum\limits_{j \in[i-1]} n_j)= n_i$ for every $i \in[\ell]$, and let $h(n')=1$ for all the other addresses $n' \in[s(t_1), s(t_2)-1]$.

\end{itemize}

From the construction above, we know that the subheap of the domain $[s(t_1), s(t_2)-1]$ satisfies $\hls{}(t_1, t_2; t_3)$.

\end{itemize}

\end{itemize}

From the definition of $h$, we know that $s, h \models\varphi$. Therefore, $\varphi$ is satisfiable.

%Let $k$ be the number of these unfoldings and $k$ addresses in $[I(t_1),I(t_2)-1]$

%at distance between $[c,I(t^\infty)]$ denoting the start of chunks in this atom.

%The others values in the heap are chosen arbitrary.

%Therefore, we obtain a model for $\varphi$ which satisfies all the constraints in its semantics.

\qed

\end{proof}

% ESOP'21, Reviewer 3: At the end of page 10, when discussing the abstraction idea, I wondered to what extent this is related to the decision procedures for ASL itself; is the overall approach new, or is it a similar approach with suitable extensions?

%Indeed, let $n$ be the initial number of variables and $\sigma$ the number of spatial atoms.

%The \PbA\ abstraction has $n+\sigma$ variables because of added boolean variables.

%The $\hls{}$ atoms produces a linear increase of the formula,

%and may introduce an existentially quantified variable.

%The biggest impact in the size of the abstraction is the translation of

%the separating conjunction, $\varphi_\sepc$, which size is in $O(\sigma)$.

\begin{remark}

%The Boolean variables $\isemp_a$ in $\abs(\varphi)$ are introduced for improving the readability. They are redundant in the sense that they can be equivalently replaced by the formula $\atomhead(a) = \atomtail(a)$.

From the construction of $\abs(\varphi)$, we know that $\abs(\varphi)$ is in {\EPbA} and the size of $\abs(\varphi)$ is polynomial in that of $\varphi$. From the fact that the satisfiability of \EPbA\ is in NP, we conclude that the satisfiability of \slah\ is in NP.

%

The formulas $\exists k.\ k \ge1\wedge2k = y - x $ in $\abs^+(\hls{}(x, y; z))$ can be equivalently replaced by $y -x > 0\wedge y -x \equiv0\bmod2$. Therefore, $\abs(\varphi)$ is essentially a quantifier-free \PbA\ formula containing modulo constraints. The satisfiability of such formulas is still NP-complete.

%From now on, we shall assume that {\bf $\abs(\varphi)$ is a quantifier-free \PbA\ formula containing modulo constraints}.

\end{remark}

\section{Proof of Lemma~\ref{lem-eub}}

\smallskip

\noindent{\bf Lemma~\ref{lem-eub}}.

\emph{For an {\slah} formula $\varphi\equiv\Pi: \hls{}(t'_1, t'_2; t'_3)$, an {\EPbA} formula $\xi_{eub,\varphi}(z)$ can be constructed in linear time such that for every store $s$ satisfying $s \models\abs(\varphi)$, we have $s[z \rightarrow\eub_\varphi(s)]\models\xi_{eub,\varphi}(z)$.

}

\smallskip

\begin{proof}

From the construction of $\abs^+(\hls{}(t'_1, t'_2; t'_3))$ in Section~\ref{sec:sat-hls}, we know that for every store $s$, there is a heap $h$ such that $s, h \models\Pi: \hls{}(t'_1, t'_2; t'_3)$ iff $s \models\Pi\wedge\big((t'_3=2\wedge\exists k.\ 2k = t'_2- t'_1)\vee(2 < t'_3\wedge2\le t'_2- t'_1)\big)$. Then the fact that $z$ appears in some unfolding scheme of $\varphi$ w.r.t. some store $s$

can be specified by the formula

\[

\Pi\wedge

\left(

\begin{array}{l}

(t'_3=2\wedge\exists k.\ 2k = t'_2- t'_1\wedge z =2)\ \vee\\

(2 < t'_3\wedge2\le z \le t'_3\ \wedge(2\le t'_2- t'_1- z \vee t'_2- t'_1= z))

\end{array}

\right),

\]

where $t'_2- t'_1- z$ denotes the remaining size of the heap after removing a memory chunk of size $z$. Therefore, we define $\xi_{eub,\varphi}(z)$ as

\[

\begin{array}{l}

\Pi\wedge

\left(

\begin{array}{l}

(t'_3=2\wedge\exists k.\ 2k = t'_2- t'_1\wedge z =2)\ \vee\\

\left(

\begin{array}{l}

2 < t'_3\wedge2\le z \le t'_3\wedge(2\le t'_2- t'_1- z \vee t'_2- t'_1= z)\ \wedge\\

where $\forall z'.\ z < z' \le t'_3\rightarrow\neg(2\le t'_2- t'_1- z' \vee t'_2- t'_1= z')$ asserts that $z$ is the maximum chunk size upper bound among unfolding schemes of $\varphi$. The formula $\forall z'.\ z < z' \le t'_3\rightarrow\neg(2\le t'_2- t'_1- z' \vee t'_2- t'_1= z')$ is equivalent to $\forall z'.\ z < z' \le t'_3\rightarrow(t'_2- t'_1- z' \le1\wedge t'_2- t'_1\neq z')$, and can be simplified into

Let $\varphi\equiv\Pi: \hls{}(t'_1, t'_2; t'_3)$, $\psi\equiv\hls{}(t_1, t_2; t_3)$, and $\preceq$ be a total preorder over $\addr(\varphi)\cup\addr(\psi)$ such that $C_\preceq\models t'_1 < t'_2\wedge t'_1= t_1\wedge t'_2= t_2$.

Then $\varphi\models_\preceq\psi$ iff $C_\preceq\wedge\abs(\varphi)\models\exists z.\ \xi_{eub, \varphi}(z)\wedge z \le t_3$.

%\item For every interpretation $I$ such that $I \models \abs(\varphi) \wedge t'_1 = t'_2$,

%there are a model $s, h$ of $\varphi$ such that $s$ and $I$ agree on $\fv(\varphi)$ (where $h$ is empty). From $\varphi \models \hls{}(t_1, t_2; t_3)$, we know that $s, h \models \hls{}(t_1, t_2; t_3)$. Thus $s(t_1) = s(t_2)$ since $h$ is empty.

%

%\item

At first, we observe that $C_\preceq\wedge\abs(\varphi)$ and $\varphi$ contain the same number of variables.

Let $s$ be an interpretation such that $s \models C_\preceq\wedge\abs(\varphi)$. From $C_\preceq\models t'_1 < t'_2\wedge t'_1= t_1\wedge t'_2= t_2$, we have $s(t'_1) < s(t'_2)$, $s(t'_1)= s(t_1)$, and $s(t'_2)= s(t_2)$.

At first, from $s \models\abs(\varphi)$ and Lemma~\ref{lem-eub}, we deduce that $s[z \rightarrow\eub_\varphi(s)]\models\xi_{eub,\varphi}(z)$.

%

Moreover, from the definition of $\eub_\varphi(s)$, we know that there is an unfolding scheme, say $sz_1, \cdots, sz_\ell$, such that $\eub_\varphi(s)=\max(sz_1, \cdots, sz_\ell)$. From the definition of unfolding schemes, we have $2\le sz_i \le s(t'_3)$ for every $i \in[\ell]$ and $s(t'_2)= s(t'_1)+\sum\limits_{i \in[\ell]} sz_i$. Therefore, there is a heap $h$ such that $s, h \models\varphi$ and for every $i \in[\ell]$, $h(s(t'_1)+\sum\limits_{j \in[i-1]} sz_j)= sz_i$. From the assumption $\varphi\models_\preceq\psi$, we deduce that $s, h \models\psi\equiv\hls{}(t_1, t_2; t_3)$. Then each chunk of $h$ has to be matched exactly to one unfolding of $\hls{}(t_1, t_2; t_3)$. Thus, $2\le sz_i \le s(t_3)$ for every $i \in[\ell]$. Therefore, $\eub_\varphi(s)=\max(sz_1, \cdots, sz_\ell)\le s(t_3)$. We deduce that $s[z \rightarrow\eub_\varphi(s)]\models\xi_{eub,\varphi}(z)\wedge z \le t_3$. Consequently, $s \models\exists z.\ \xi_{eub,\varphi}(z)\wedge z \le t_3$.

\medskip

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\noindent\emph{``If'' direction}: Suppose $C_\preceq\wedge\abs(\varphi)\models\exists z.\ \xi_{eub, \varphi}(z)\wedge z \le t_3$ and $s, h \models C_\preceq\wedge\Pi: \hls{}(t'_1, t'_2; t'_3)$. We show $s, h \models\psi$.

From $s, h \models C_\preceq\wedge\Pi: \hls{}(t'_1, t'_2; t'_3)$, we know that there are $sz_1, \cdots, sz_\ell$ such that they are the sequence of chunk sizes in $h$. Therefore, $(sz_1, \cdots, sz_\ell)$ is an unfolding scheme of $\varphi$ w.r.t. $s$. From the definition of $\eub_\varphi(s)$, we have $\eub_\varphi(s)$ is the maximum chunk size upper bound of the unfolding schemes of $\varphi$ w.r.t. $s$. Therefore, $\max(sz_1, \cdots, sz_\ell)\le\eub_\varphi(s)$.

From $s, h \models C_\preceq\wedge\Pi: \hls{}(t'_1, t'_2; t'_3)$, we deduce that $s \models C_\preceq\wedge\abs(\varphi)$. Because $C_\preceq\wedge\abs(\varphi)\models\exists z.\ \xi_{eub, \varphi}(z)\wedge z \le t_3$, we have $s \models\exists z.\ \xi_{eub, \varphi}(z)\wedge z \le t_3$. Then $s[z \rightarrow n]\models\xi_{eub, \varphi}(z)\wedge z \le t_3$ for some $n \in\NN$.

Thus $s[z \rightarrow n]\models\xi_{eub, \varphi}(z)$. Moreover, from Lemma~\ref{lem-eub}, we know that $s[z \rightarrow\eub_\varphi(s)]\models\xi_{eub, \varphi}(z)$. According to the definition of $\eub_\varphi(s)$, we deduce that $n =\eub_\varphi(s)$. Therefore, $\max(sz_1, \cdots, sz_\ell)\le\eub_\varphi(s)\le s(t_3)$. It follows that for every $i \in[\ell]$, $2\le sz_i \le s(t_3)$. We conclude that $s, h \models\hls{}(t_1, t_2; t_3)\equiv\psi$.

In this work, we investigated \slah, a separation logic fragment that allows pointer arithmetic inside inductive definitions so that the commonly used data structures e.g. heap lists can be defined.

We show that the satisfiability problem of {\slah} is NP-complete and

the entailment problem can be solved in doubly exponential time.

%We propose decision procedure for both problems calling satisfiability

% of Presburger arithmetic formulas.

%We show that the satisfiability problem is NP-complete and

% we give a decision procedure for it based on the translation to

% the existential fragment of Presburger arithmetic.

%We also demonstrate that the entailment problem for the quantifier free

% formulas is decidable with an upper bound complexity of double exponential time.

% We propose a decision procedure for the entailment also based on

% the translation to Presburger arithmetic.

%%The same classes of complexity are observed for these problems in ASL,

%% but for the quantified formulas in the entailment.

We implemented the decision procedures in a solver, \cspenp, and use it

to efficiently solve more than hundred problems issued from program verification

or randomly generated. There are several directions for the future work.

At first, it is interesting to see whether the decision procedures for heap lists in this work can be extended to tackle free lists, another common data structure in memory allocators \cite{conf/lopstr/FangS16}. Moreover, since bit operations are also widely used in memory allocators, it would also be interesting to automate the reasoning about bit operations inside inductive definitions.

%%MS: moved into introduction

%\smallskip

%This study is original to our knowledge, although the separation logic