and the satisfiability or entailment queries in the SL-COMP format~\cite{DBLP:conf/tacas/SighireanuPRGIR19}.
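For concreteness, SL-COMP queries are SMT-LIB scripts extended with separation-logic primitives. The following sketch of a satisfiability query shows the general shape only: the logic name, the sort declarations, and the way $\hls{}$ and block atoms are encoded in our benchmarks are assumptions here, not the exact declarations.

```smt2
;; Schematic SL-COMP satisfiability query (shape only; the logic
;; name and the encoding of hls/blk atoms are illustrative).
(set-logic QF_SHLS)
(declare-sort Loc 0)
(declare-heap (Loc Loc))
(declare-const x Loc)
(declare-const y Loc)
;; heap consisting of the single cell x |-> y
(assert (sep (pto x y) (_ emp Loc Loc)))
(check-sat)
```

An entailment $A \models B$ is encoded in the same format by asserting $A$ together with the negation of $B$ and checking for unsatisfiability.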

%

\begin{compactitem}

\item For satisfiability queries,

...

...

\medskip

\mypar{Benchmarks.}

We generated 190 benchmarks,

available at~\cite{CompSpenSite} (directory \texttt{samples}),

classified into four suites, whose sizes are given in Table~\ref{tab-exp}, as follows:

...

...

or search for an address inside a heap-list.

%

\item RANDOM-SAT and RANDOM-ENT are satisfiability and entailment problems, respectively, which are randomly or manually generated.

Starting from the path and verification conditions for programs manipulating heap lists,

we replace some $\hls{}$ atoms with their unfoldings in order to generate

formulas with more spatial atoms.

RANDOM-SAT includes also formulas where the atoms and their start

and end address terms are generated randomly. In addition, some benchmarks are crafted manually.

These suites are motivated by testing the scalability of \cspenp.

\end{itemize}
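To illustrate the unfolding used when generating these suites: assuming the heap-list predicate has an inductive definition of the usual shape (the body below is schematic; the precise block atom and the arithmetic side conditions are those defined earlier in the paper), one unfolding step rewrites an atom as

```latex
% Schematic unfolding of a heap-list atom; the exact body of hls
% (notably the arithmetic constraints on z) follows its definition
% in the paper.
\[
  \hls{}(x, y)
  \;\rightsquigarrow\;
  \exists z.\; x \mapsto z \ast \mathit{blk}(x{+}1, z) \ast \hls{}(z, y)
\]
```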

...

...

\medskip

\mypar{Experiments.}

We ran \cspenp\ over the four benchmark suites on an Ubuntu 16.04 64-bit laptop with an Intel Core i5-8250U CPU and 2GB RAM.

The experimental results are summarized in Table~\ref{tab-exp}. We set the timeout to 30 seconds; the average and maximum times reported exclude the timeout instances.
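As a minimal sketch of how the statistics in Table~\ref{tab-exp} are aggregated (the function and names below are ours, not part of \cspenp): an instance whose running time reaches the timeout is counted but excluded from the average, minimum, and maximum.

```python
# Sketch: aggregation of per-suite statistics; a running time of
# >= TIMEOUT marks a timeout instance, which is counted but excluded
# from avg/min/max (names here are illustrative, not part of the tool).
TIMEOUT = 30.0  # seconds

def suite_stats(times):
    """times: per-instance running times (in seconds) for one suite."""
    solved = [t for t in times if t < TIMEOUT]
    timeouts = len(times) - len(solved)
    avg = round(sum(solved) / len(solved), 2)
    return {"instances": len(times), "timeouts": timeouts,
            "avg": avg, "min": min(solved), "max": max(solved)}
```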

To the best of our knowledge, no existing solver is capable of handling \slah\ formulas that combine points-to, block, and $\hls{}$ atoms.

The solver SLar~\cite{KimuraT17} was designed to solve entailment problems involving points-to, block, and $\ls{}$ atoms. However, we were unable to obtain SLar, and thus could not compare with it on ASL formulas.

Moreover, the examples used by SLar, which are available online, are in a format that seems nontrivial to translate into the SL-COMP format.

...

...

%\small

\setlength{\tabcolsep}{1ex}

\renewcommand{\arraystretch}{1.2}

\begin{tabular}{| c | c | c | c | c | c |}

\hline

Benchmark suite & $\#$instances & $\#$timeouts & Avg. time (s) & Min. time (s) & Max. time (s) \\

\hline

\hline

MEM-SAT & 38 &0 &0.05 & 0.03&0.12 \\

\hline

RANDOM-SAT &50 & 0& 0.09& 0.02 & 0.53\\

\hline

TOTAL &88 & 0& 0.07& 0.02 & 0.53\\

\hline

\hline

MEM-ENT &43 & 0 & 3.05& 0.34& 9.98\\

\hline

RANDOM-ENT &59 &7 & 6.66& 0.04 &29.57\\

\hline

TOTAL &102 & 7 & 5.02& 0.04 &29.57\\

\hline

\end{tabular}

\end{center}

...

...

than solving satisfiability instances.

We recall from Section~\ref{sec:ent} that the procedure for entailment queries involves checking the satisfiability of several formulas.

\cspenp\ efficiently solves the benchmark instances originated from programs,

namely MEM-SAT and MEM-ENT, with average times of 0.05 and 3.05 seconds respectively.

Table~\ref{tab-exp} shows that some entailment instances

are challenging for \cspenp.

For instance, the maximum time in the MEM-ENT suite is 9.98 seconds, and there are 7 timeout instances in the RANDOM-ENT suite.

By inspecting these challenging instances, we found that