1 Introduction
Data exchange is the problem of transferring data from a source schema to a target schema, where the transfer process is usually described via so-called schema mappings: a set of logical assertions specifying how the data should be moved and restructured. Furthermore, the target schema may have its own constraints to be satisfied. Schema mappings and target constraints are usually encoded via standard database dependencies: tuple-generating dependencies (TGDs) and equality-generating dependencies (EGDs). Thus, given an instance I over the source schema $\mathsf{S}$, the goal is to materialize an instance J over the target schema $\mathsf{T}$, called a solution, in such a way that I and J together satisfy the dependencies.
Since multiple solutions might exist, a precise semantics for answering queries is needed. By now, the certain answers semantics is the most widely accepted one. The certain answers to a query are the tuples that are answers to the query in every solution of the data exchange setting (Fagin et al. 2005). Although it has been formally shown that for positive queries (e.g., conjunctive queries) the notion of solution of (Fagin et al. 2005) is the right one to use, for more general queries such solutions become inappropriate, as they easily lead to counterintuitive results.
Example 1 Consider a data exchange setting denoted by $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$, where $\mathsf{S}$ is the source schema, storing product orders in a binary relation $\mathsf{Ord}$, with the first argument being the id of an order, and the second argument specifying whether the order has been paid. Moreover, $\mathsf{T}$ is the target schema having unary relations $\mathsf{AllOrd}$ and $\mathsf{Paid}$, storing all orders and the paid orders, respectively. The schema mapping is described by the following source-to-target TGDs $\Sigma_{st}$:
In this example, we assume that the set of target dependencies $\Sigma_t$ is empty. The above schema mapping states that all orders in the source schema must be copied to the $\mathsf{AllOrd}$ relation, and all the paid orders must be copied to the $\mathsf{Paid}$ relation. Assume the source instance is as follows:
and assume we want to pose the query Q over the target schema asking for all the unpaid orders. This can be written as the following first-order (FO) query:
One would expect the answer to be $\{2\}$, since the schema mapping above is simply copying I to the target schema, and hence $J = \{\mathsf{AllOrd}(1),\mathsf{AllOrd}(2),\mathsf{Paid}(1)\}$ should be the only candidate solution. However, under the classical notion of solution of (Fagin et al. 2005), the instance $J' = \{\mathsf{AllOrd}(1),\mathsf{AllOrd}(2),\mathsf{Paid}(1),\mathsf{Paid}(2)\}$ is also a solution (since $I \cup J'$ satisfies the TGDs), and every order in $J'$ is paid. Hence, the certain answers to Q, which are computed as the intersection of the answers over all solutions, are empty.
The issue above arises because the classical notion of solution is too permissive, in that it allows the existence of facts in a solution that have no support from the source (e.g., $\mathsf{Paid}(2)$ in the solution $J'$ of Example 1 above).
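The effect described in Example 1 can be replayed on the two solutions that the example lists explicitly. The following is a minimal sketch, not the paper's formalization: facts are encoded as hypothetical `(relation, value)` pairs, and the query Q is assumed to mean "values in $\mathsf{AllOrd}$ that do not occur in $\mathsf{Paid}$", which is how the text describes it.

```python
# Solutions J and J' exactly as listed in Example 1; a fact is a (relation, value) pair.
J = {("AllOrd", 1), ("AllOrd", 2), ("Paid", 1)}
J_PRIME = J | {("Paid", 2)}


def unpaid_orders(instance):
    """Assumed reading of Q: values in AllOrd that do not occur in Paid."""
    all_orders = {value for rel, value in instance if rel == "AllOrd"}
    paid = {value for rel, value in instance if rel == "Paid"}
    return all_orders - paid


def certain_over(solutions):
    """Intersect the query answers over the given (finite) list of solutions."""
    return set.intersection(*[unpaid_orders(s) for s in solutions])


# J and J' are only two of the classical solutions, so this intersection is an
# upper bound on the classical certain answers; it is already empty.
print(unpaid_orders(J), unpaid_orders(J_PRIME), certain_over([J, J_PRIME]))
```

Because the intersection over just these two solutions is already empty, adding further classical solutions cannot make the certain answers non-empty.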
Some efforts exist in the literature that provide alternative notions of solutions for which certain answers to general queries become more meaningful. Prime examples are the works of (Hernich et al. 2011) and (Hernich 2011). In both approaches, the certain answers in the example above are $\{2\}$. However, the works above have their own drawbacks too. In (Hernich et al. 2011), so-called CWA-solutions are introduced, which are a subset of the classical solutions with some restrictions. However, these restrictions are so severe that certain answers over such solutions fail to capture certain answers over classical solutions, when focusing on positive queries. Moreover, even when focusing on more general queries, answers can still be counterintuitive, as shown in the following example.
Example 2 Consider the data exchange setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ , where $\mathsf{S}$ stores employees of a company in the unary relation $\mathsf{Emp}$ . For some employees, the city they live in is known, and it is stored in the binary relation $\mathsf{KnownC}$ . The target schema $\mathsf{T}$ contains the binary relation $\mathsf{EmpC}$ , storing employees and the cities they live in, and the binary relation $\mathsf{SameC}$ , storing pairs of employees living in the same city. The sets $\Sigma_{st} = \{\rho_1,\rho_2\}$ and $\Sigma_t = \{\rho_3,\eta\}$ are as follows (for simplicity, we omit the universal quantifiers):
The above setting copies employees from the source to the target. The TGD $\rho_1$ states that every copied employee x must have some city z associated, whereas $\rho_2$ states that when the city y of an employee x is known, this should be copied as well. Moreover, the target schema requires that employees living in the same city should be stored in relation $\mathsf{SameC}$ ( $\rho_3$ ), and each employee must live in only one city ( $\eta$ ). Assume the source instance is
and consider the query Q that asks for all pairs of employees living in different cities. This can be written as:
One would expect that the set of certain answers to Q is empty, since it is not certain that $\mathsf{john}$ and $\mathsf{mary}$ live in different cities. However, no CWA-solution allows $\mathsf{mary}$ and $\mathsf{john}$ to live in the same city, and thus $(\mathsf{john},\mathsf{mary})$ is a certain answer under the CWA-solution-based semantics.
The approach of (Hernich 2011), where the notion of GCWA$^*$-solution is presented, seems to be the most promising one. For positive queries, certain answers w.r.t. GCWA$^*$-solutions coincide with certain answers w.r.t. classical solutions. Moreover, GCWA$^*$-solutions solve some other limitations of CWA-solutions, like the one discussed in Example 2. However, the practical applicability of this semantics is somewhat limited, since the (rather involved) construction of GCWA$^*$-solutions easily makes certain query answering undecidable, even for very simple settings with only two source-to-target TGDs, and no target dependencies.
Other semantics have been proposed in (Libkin and Sirangelo 2011), but they are only defined for data exchange settings without target dependencies. Hence, one needs to assume that the target schema has no dependencies at all.
As a final remark, in a data exchange setting, it might be the case that the source is not always available, and thus the materialization of a single solution, over which certain answers can be computed, is a desirable requirement. This is especially true when using weakly-acyclic dependencies, which form the standard language for data exchange (Fagin et al. 2005). However, none of the semantics above allow for the materialization of such a special solution, for weakly-acyclic settings.
In this paper, we propose a new notion of data exchange solution, dubbed supported solution, which allows us to deal with general queries, but at the same time is suitable for practical applications. That is, we show that certain answers under supported solutions naturally generalize certain answers under classical solutions, when focusing on positive queries. Moreover, such solutions do not make any assumption on how values associated to existential variables compare to other values, hence solving issues like the ones of Example 2.
As expected, there is a price to pay to get meaningful answers over general queries: we show that certain query answering is undecidable for general settings, but it becomes coNP-complete when we focus on weakly-acyclic dependencies.
Moreover, we show that exact answers under supported solutions for general queries in weakly-acyclic settings can be computed via an encoding into logic programming with the well-known choice construct, allowing one to use efficient off-the-shelf reasoning systems.
Finally, we also show that if one is not willing to incur the high complexity of exact certain answers for weakly-acyclic settings, then it is possible to construct a target instance in polynomial time, which is similar in spirit to a universal solution of (Fagin et al. 2005), that can be exploited for computing exact answers, for positive queries, and approximate answers, for general FO queries, in polynomial time. The latter is achieved by adapting existing approximation algorithms originally defined for querying incomplete databases.
2 Preliminaries
Basics. We consider pairwise disjoint countably infinite sets $\mathsf{Const}$ , $\mathsf{Var}$ , and $\mathsf{Null}$ of constants, variables, and labeled nulls, respectively. Nulls are denoted by the symbol ${\perp}$ , possibly subscripted. A term is a constant, a variable, or a null. We additionally assume the existence of a countably infinite set $\mathsf{Rel}$ of relations, disjoint from the previous ones. A relation R has an arity, denoted ar(R), which is a nonnegative integer. We also use $R/n$ to say that R is a relation of arity n. A schema is a set of relations. A position is an expression of the form R[i], where R is a relation and $i \in \{1,\ldots,ar(R)\}$ .
An atom $\alpha$ (over a schema $\mathsf{S}$) is of the form $R(\mathbf{t})$, where R is an $n$-ary relation (of $\mathsf{S}$) and $\mathbf{t}$ is a tuple of terms of length n. We use $\mathbf{t}[i]$ to denote the $i$-th term in $\mathbf{t}$, for $i \in \{1,\ldots,n\}$. An atom without variables is a fact. An instance I (over a schema $\mathsf{S}$) is a finite set of facts (over $\mathsf{S}$). A database D is an instance without nulls. For a set of atoms A, $\mathsf{dom}(A)$ is the set of all terms in A, whereas $\mathsf{var}(A)$ is the set $\mathsf{dom}(A) \cap \mathsf{Var}$. A homomorphism from a set of atoms A to a set of atoms B is a function $h : \mathsf{dom}(A) \rightarrow \mathsf{dom}(B)$ that is the identity on $\mathsf{Const}$, and such that for each atom $R(\mathbf{t}) = R(t_1,\ldots,t_n) \in A$, $R(h(\mathbf{t}))=R(h(t_1),\ldots,h(t_n)) \in B$.
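To make the definition of homomorphism concrete, the sketch below enumerates all homomorphisms between two finite sets of atoms by backtracking. The encoding is an assumption made only for illustration: an atom is a tuple `(relation, term, ...)`, terms are strings, variables start with `?`, nulls start with `_`, and every other string is a constant. Constants are mapped to themselves implicitly, so the returned dictionaries only list variables and nulls.

```python
def is_constant(term):
    """Hypothetical convention: '?x' is a variable, '_n' is a null, the rest are constants."""
    return not term.startswith(("?", "_"))


def homomorphisms(A, B):
    """Yield every homomorphism from atom set A to atom set B as a dict."""
    atoms = sorted(A)

    def extend(i, h):
        if i == len(atoms):
            yield dict(h)
            return
        rel, *args = atoms[i]
        for target in B:
            if target[0] != rel or len(target) != len(atoms[i]):
                continue
            new_h = dict(h)
            for s, t in zip(args, target[1:]):
                if is_constant(s):
                    if s != t:          # h must be the identity on constants
                        break
                elif new_h.setdefault(s, t) != t:
                    break               # s is already mapped to a different term
            else:
                yield from extend(i + 1, new_h)

    yield from extend(0, {})


B = {("EmpC", "john", "miami"), ("EmpC", "mary", "chicago")}
for h in homomorphisms({("EmpC", "?x", "?z")}, B):
    print(h)
```

The search is exponential in the size of A in the worst case, which is acceptable here because A is typically the body or head of a single dependency.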
Dependencies. A TGD $\rho$ (over a schema $\mathsf{S}$) is a FO formula of the form $\forall {\textbf {x}},{\textbf {y}}\, \varphi({\textbf {x}},{\textbf {y}}) \rightarrow \exists {\textbf {z}}\, \psi({\textbf {y}},{\textbf {z}})$, where ${\textbf {x}},{\textbf {y}},{\textbf {z}}$ are disjoint tuples of variables, and $\varphi$ and $\psi$ are conjunctions of atoms (over $\mathsf{S}$) without nulls, and over the variables in ${\textbf {x}},{\textbf {y}}$ and ${\textbf {y}},{\textbf {z}}$, respectively. The body of $\rho$, denoted $\mathsf{body}(\rho)$, is $\varphi({\textbf {x}},{\textbf {y}})$, whereas the head of $\rho$, denoted $\mathsf{head}(\rho)$, is $\psi({\textbf {y}},{\textbf {z}})$. We use $\mathsf{exvar}(\rho)$ to denote the tuple ${\textbf {z}}$ and $\mathsf{fr}(\rho)$ to denote the tuple ${\textbf {y}}$, also called the frontier of $\rho$. An EGD $\eta$ (over a schema $\mathsf{S}$) is a FO formula of the form $\forall {\textbf {x}}\, \varphi({\textbf {x}}) \rightarrow x=y$, where ${\textbf {x}}$ is a tuple of variables, $\varphi$ a conjunction of atoms (over $\mathsf{S}$) without nulls, and over ${\textbf {x}}$, and $x,y \in {\textbf {x}}$. The body of $\eta$, denoted $\mathsf{body}(\eta)$, is $\varphi({\textbf {x}})$, and the head of $\eta$, denoted $\mathsf{head}(\eta)$, is the equality $x=y$. For clarity, we will omit the universal quantifiers in front of dependencies and replace the conjunction symbol $\wedge$ with a comma. Moreover, with a slight abuse of notation, we sometimes treat a conjunction of atoms as the set of its atoms. Consider an instance I. We say that I satisfies a TGD $\rho$ if for every homomorphism h from $\mathsf{body}(\rho)$ to I, there is an extension $h'$ of h such that $h'$ is a homomorphism from $\mathsf{head}(\rho)$ to I. We say that I satisfies an EGD $\eta = \varphi({\textbf {x}}) \rightarrow x=y$, if for every homomorphism h from $\mathsf{body}(\eta)$ to I, $h(x)=h(y)$.
I satisfies a set of TGDs and EGDs $\Sigma$ if I satisfies every TGD and EGD in $\Sigma$ .
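The two satisfaction conditions translate directly into a check over body matches. The sketch below is illustrative only: atoms are tuples `(relation, term, ...)` with variables written as strings starting with `?` (a hypothetical convention), and the two sample dependencies are assumed forms, written from the prose of Example 2, of a TGD requiring every employee to have some city and an EGD allowing only one city per employee.

```python
def matches(atoms, instance, h):
    """Yield all extensions of h that map every atom in `atoms` into `instance`."""
    if not atoms:
        yield h
        return
    (rel, *args), rest = atoms[0], atoms[1:]
    for fact in instance:
        if fact[0] != rel or len(fact) != len(args) + 1:
            continue
        g = dict(h)
        if all((g.setdefault(a, v) == v) if a.startswith("?") else a == v
               for a, v in zip(args, fact[1:])):
            yield from matches(rest, instance, g)


def satisfies_tgd(instance, body, head):
    """Every homomorphism from the body must extend to one from the head."""
    return all(any(True for _ in matches(head, instance, h))
               for h in matches(body, instance, {}))


def satisfies_egd(instance, body, x, y):
    """Every homomorphism from the body must map x and y to the same term."""
    return all(h[x] == h[y] for h in matches(body, instance, {}))


# Assumed dependencies: Emp(x) -> exists z EmpC(x, z)  and  EmpC(x, y), EmpC(x, z) -> y = z.
RHO1 = ([("Emp", "?x")], [("EmpC", "?x", "?z")])
ETA_BODY = [("EmpC", "?x", "?y"), ("EmpC", "?x", "?z")]

good = {("Emp", "john"), ("EmpC", "john", "miami")}
bad = good | {("EmpC", "john", "chicago")}
print(satisfies_tgd(good, *RHO1), satisfies_egd(good, ETA_BODY, "?y", "?z"))
print(satisfies_egd(bad, ETA_BODY, "?y", "?z"))
```

Existential variables of the head are simply left unbound by the body match, so the head search is free to pick any witness in the instance.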
Queries. A query $Q(\mathbf{x})$, with free variables $\mathbf{x}$, is a FO formula $\varphi({\textbf {x}})$ with free variables ${\textbf {x}}$. The arity of $Q(\mathbf{x})$, denoted ar(Q), is the number $|\mathbf{x}|$ of its free variables. The output of $Q(\mathbf{x})$ over an instance I, denoted Q(I), is the set $\{\mathbf{t} \in \mathsf{dom}(I)^{\mathbf{x}} \mid I \models \varphi(\mathbf{t})\}$, where $\models$ is FO entailment. A query is Boolean if it has arity 0, in which case its output over an instance is either the empty set or the set containing only the empty tuple $\langle \rangle$. A conjunctive query (CQ) is a query of the form $Q(\mathbf{x}) = \exists \mathbf{y}\, \varphi(\mathbf{x},\mathbf{y})$, where $\varphi(\mathbf{x},\mathbf{y})$ is a conjunction of atoms over $\mathbf{x}$ and $\mathbf{y}$. A union of conjunctive queries (UCQ) is a query of the form $Q(\mathbf{x}) = \bigvee^n_{i=1} Q_i(\mathbf{x})$, where each $Q_i({\textbf {x}})$ is a CQ. We refer to UCQs also as positive queries.
Data Exchange Settings. A data exchange setting (or simply setting) is a tuple of the form $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$, where $\mathsf{S},\mathsf{T}$ are disjoint schemas, called source and target schema, respectively; $\Sigma_{st}$ is a finite set of TGDs, called the source-to-target TGDs of $\mathcal{S}$, such that for each TGD $\rho \in \Sigma_{st}$, $\mathsf{body}(\rho)$ is over $\mathsf{S}$ and $\mathsf{head}(\rho)$ is over $\mathsf{T}$; $\Sigma_t$ is a finite set of TGDs and EGDs over $\mathsf{T}$, called the target dependencies of $\mathcal{S}$. We say $\mathcal{S}$ is TGD-only if $\Sigma_t$ contains only TGDs.
A source (resp., target) instance of $\mathcal{S}$ is an instance I over $\mathsf{S}$ (resp., $\mathsf{T}$). We assume that source instances are databases, that is, they do not contain nulls. Given a source instance I of $\mathcal{S}$, a solution of I w.r.t. $\mathcal{S}$ is a target instance J of $\mathcal{S}$ such that $I \cup J$ satisfies $\Sigma_{st}$ and J satisfies $\Sigma_t$ (Fagin et al. 2005). We use $\mathsf{sol}(I,\mathcal{S})$ to denote the set of all solutions of I w.r.t. $\mathcal{S}$.
Given a data exchange setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ , a source instance I of $\mathcal{S}$ and a query Q over $\mathsf{T}$ , the certain answers to Q over I w.r.t. $\mathcal{S}$ is the set $\mathsf{cert}_{\mathcal{S}}(I,Q) = \bigcap_{J \in \mathsf{sol}(I,\mathcal{S})} Q(J)$ .
To distinguish between the notion of solution (resp., certain answers) above and the one defined in Section 3, we will refer to the former as classical.
A universal solution of I w.r.t. $\mathcal{S}$ is a solution $J \in \mathsf{sol}(I,\mathcal{S})$ such that, for every $J' \in \mathsf{sol}(I,\mathcal{S})$, there is a homomorphism from J to $J'$ (Fagin et al. 2005). Letting $Q(J)_{\downarrow} = Q(J) \cap \mathsf{Const}^{{\textbf {x}}}$, for any instance J and query $Q({\textbf {x}})$, the following result from (Fagin et al. 2005) is well-known:
Theorem 1 Consider a data exchange setting $\mathcal{S}$, a source instance I of $\mathcal{S}$, and a positive query Q. If J is a universal solution of I w.r.t. $\mathcal{S}$, then $\mathsf{cert}_{\mathcal{S}}(I,Q) = Q(J)_{\downarrow}$.
3 Semantics for general queries
The goal of this section is to introduce a new notion of solution for data exchange that we call supported. As already discussed, the main issue we want to solve w.r.t. classical solutions is that such solutions are too permissive, that is, they allow for the presence of facts that are not a certain consequence of the source instance and the dependencies. Consider again Example 1. The (classical) solution $J'$ in Example 1 is not supported, since from the source instance I and the dependencies, we cannot conclude that the fact $\mathsf{Paid}(2)$ should occur in the target. On the other hand, the solution $J = \{\mathsf{AllOrd}(1),\mathsf{AllOrd}(2),\mathsf{Paid}(1)\}$ is supported: it contains precisely the facts supported by I and the dependencies, and no more than that. Similarly, considering Example 2, the instance $J = \{\mathsf{EmpC}(\mathsf{john},\mathsf{miami}), \mathsf{EmpC}(\mathsf{mary},\mathsf{chicago}), \mathsf{SameC}(\mathsf{john},\mathsf{mary})\}$ is a solution, but it is not supported, since from the source and the dependencies we cannot certainly conclude that $\mathsf{john}$ and $\mathsf{mary}$ live in the same city. We now formalize the above intuitions.
Consider a TGD $\rho$ and a mapping h from the variables of $\rho$ to $\mathsf{Const}$ . We say that a TGD $\rho'$ is a ground version of $\rho$ (via h) if $\rho' = h(\mathsf{body}(\rho)) \rightarrow h(\mathsf{head}(\rho))$ .
Definition 1 (Ex-choice)
An ex-choice is a function $\gamma$ that, given as input a TGD $\rho = \varphi({\textbf {x}},{\textbf {y}}) \rightarrow \exists {\textbf {z}}\, \psi({\textbf {y}},{\textbf {z}})$ and a tuple $\mathbf{t} \in \mathsf{Const}^{{\textbf {y}}}$, returns a set $\gamma(\rho,\mathbf{t})$ of pairs of the form $(z,c)$, one for each existential variable $z \in \mathsf{exvar}(\rho)$, where c is a constant of $\mathsf{Const}$. Note that if $\rho$ does not contain existential variables, $\gamma(\rho,\mathbf{t})$ is the empty set.
Intuitively, given a TGD, an ex-choice specifies a valuation for the existential variables of the TGD which depends on a given valuation of its frontier variables.
We now define when a ground version of a TGD indeed assigns existential variables according to an ex-choice.
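As a small illustration, the function of Definition 1 can be sketched as a lookup table with a fallback. The names `EXVARS`, `CHOICES`, and the default constant `"c0"` are hypothetical; the two table entries mirror the choices used later in Example 3, and the existential variables per TGD follow the description of $\rho_1,\rho_2,\rho_3$ in Example 2 (only $\rho_1$ has one).

```python
# Existential variables of each TGD, as described in Example 2.
EXVARS = {"rho1": ("z",), "rho2": (), "rho3": ()}

# Explicit choices: (TGD name, frontier tuple) -> value for each existential variable.
CHOICES = {
    ("rho1", ("john",)): {"z": "miami"},
    ("rho1", ("mary",)): {"z": "chicago"},
}


def gamma(tgd, frontier_tuple, default="c0"):
    """Return the set of (existential variable, constant) pairs for this input.

    The function must be total, so inputs without an explicit entry fall back
    to an arbitrary (hypothetical) default constant.
    """
    chosen = CHOICES.get((tgd, frontier_tuple), {})
    return {(z, chosen.get(z, default)) for z in EXVARS[tgd]}


print(gamma("rho1", ("john",)))
print(gamma("rho2", ("john", "miami")))
```

The key point is that the value picked for an existential variable depends only on the TGD and on the frontier tuple, never on the remaining body variables.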
Definition 2 (Coherence)
Consider a TGD $\rho = \varphi({\textbf {x}},{\textbf {y}}) \rightarrow \exists {\textbf {z}}\, \psi({\textbf {y}},{\textbf {z}})$, an ex-choice $\gamma$ and a ground version $\rho'$ of $\rho$ via some mapping h. We say that $\rho'$ is coherent with $\gamma$ if for each existential variable $z \in \mathsf{exvar}(\rho)$, $(z,h(z)) \in \gamma(\rho,h({\textbf {y}}))$. For a set $\Sigma$ of TGDs and EGDs, and an ex-choice $\gamma$, $\Sigma^\gamma$ denotes the set of dependencies obtained from $\Sigma$ by replacing each TGD $\rho$ in $\Sigma$ with all ground versions of $\rho$ that are coherent with $\gamma$. Note that the set $\Sigma^\gamma$ can be infinite. We are now ready to present our notion of solution.
Definition 3 (Supported Solution)
Consider a setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ and a source instance I of $\mathcal{S}$. A target instance J of $\mathcal{S}$ is a supported solution of I w.r.t. $\mathcal{S}$ if there exists an ex-choice $\gamma$ such that $I \cup J$ satisfies $\Sigma_{st}^\gamma$ and J satisfies $\Sigma_t^\gamma$, and there is no other target instance $J' \subsetneq J$ of $\mathcal{S}$ such that $I \cup J'$ satisfies $\Sigma_{st}^\gamma$ and $J'$ satisfies $\Sigma_t^\gamma$.
Note that a supported solution contains no nulls. We use $\mathsf{ssol}(I,\mathcal{S})$ to denote the set of all supported solutions of I w.r.t. $\mathcal{S}$ .
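The grounding $\Sigma^\gamma$ used in Definitions 2 and 3 can be sketched for a single TGD as follows. This is only an illustration under stated assumptions: since $\Sigma^\gamma$ may be infinite, the sketch grounds over a finite pool of constants passed in by the caller; atoms are tuples with `?`-prefixed variables; the frontier tuple is ordered by variable name (a hypothetical convention); and the sample TGD is an assumed form of $\rho_1$, written from the prose of Example 2.

```python
from itertools import product


def variables(atoms):
    return sorted({t for _, *args in atoms for t in args if t.startswith("?")})


def substitute(atoms, h):
    return tuple((rel, *(h.get(t, t) for t in args)) for rel, *args in atoms)


def ground_versions(name, body, head, constants, gamma):
    """Yield the ground versions (body, head) of a TGD that are coherent with gamma."""
    body_vars = variables(body)
    frontier = [v for v in variables(head) if v in body_vars]
    exvars = [v for v in variables(head) if v not in body_vars]
    for values in product(sorted(constants), repeat=len(body_vars)):
        h = dict(zip(body_vars, values))
        # Coherence: existential variables take the values chosen by gamma
        # for this TGD and this valuation of the frontier.
        choice = gamma(name, tuple(h[v] for v in frontier))
        h.update({z: choice[z] for z in exvars})
        yield substitute(body, h), substitute(head, h)


def gamma(name, frontier_values):
    """Hypothetical ex-choice, defined only on the inputs needed below."""
    table = {("rho1", ("john",)): {"?z": "miami"},
             ("rho1", ("mary",)): {"?z": "chicago"}}
    return table[(name, frontier_values)]


RHO1_BODY = [("Emp", "?x")]           # assumed: Emp(x) -> exists z EmpC(x, z)
RHO1_HEAD = [("EmpC", "?x", "?z")]
grounded = list(ground_versions("rho1", RHO1_BODY, RHO1_HEAD, {"john", "mary"}, gamma))
for g_body, g_head in grounded:
    print(g_body, "->", g_head)
```

Each ground version is a plain implication between sets of facts, so checking it against an instance reduces to two subset tests.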
Example 3 Consider the data exchange setting $\mathcal{S}$ and the source instance I of Example 2. The target instance $J = \{\mathsf{EmpC}(\mathsf{john},\mathsf{miami}),\mathsf{EmpC}(\mathsf{mary},\mathsf{chicago})\}$ is a supported solution of I w.r.t. $\mathcal{S}$. Indeed, consider the ex-choice $\gamma$ such that $\gamma(\rho_1,\mathsf{john}) = \{ (z,\mathsf{miami})\}$, and $\gamma(\rho_1,\mathsf{mary}) = \{(z,\mathsf{chicago})\}$. Then, $\Sigma_{st}^\gamma$ is
whereas $\Sigma_t^\gamma$ is the set containing the EGD $\eta$ of Example 2, and the set of TGDs
Clearly, $I \cup J$ satisfies $\Sigma_{st}^\gamma$, J satisfies $\Sigma_t^\gamma$, and every strict subset $J'$ of J is such that $I \cup J'$ does not satisfy $\Sigma_{st}^\gamma$. Another supported solution is $\{\mathsf{EmpC}(\mathsf{john},\mathsf{miami}), \mathsf{EmpC}(\mathsf{mary},\mathsf{miami}), \mathsf{SameC}(\mathsf{john},\mathsf{mary})\}$.
With the notion of supported solution in place, it is now straightforward to define the supported certain answers.
Definition 4 (Supported Certain Answers)
Consider a data exchange setting $\mathcal{S}$, a source instance I of $\mathcal{S}$, and a query Q over $\mathsf{T}$. The supported certain answers to Q over I w.r.t. $\mathcal{S}$ is the set of tuples $\mathsf{scert}_{\mathcal{S}}(I,Q) = \bigcap_{J \in \mathsf{ssol}(I,\mathcal{S})} Q(J)$.
Example 4 Consider the data exchange setting $\mathcal{S}$ , the source instance I, and the query Q of Example 1. It is not difficult to see that the only supported solution of I w.r.t. $\mathcal{S}$ is the instance
Thus, the supported certain answers to Q over I w.r.t. $\mathcal{S}$ are $ \mathsf{scert}_{\mathcal{S}}(I,Q) = Q(J) = \{2\}$ . Consider now the data exchange setting $\mathcal{S}$ , the source instance I, and the query Q of Example 2. Then, one can verify that $\mathsf{scert}_{\mathcal{S}}(I,Q) = \emptyset$ .
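For the setting of Example 1, the claim can be checked by brute force. The sketch below rests on several assumptions that are not taken verbatim from the text: the source instance marks order 1 as paid and order 2 as unpaid using hypothetical flags `"yes"` and `"no"`, the two source-to-target TGDs are read as "copy every order to $\mathsf{AllOrd}$" and "copy every paid order to $\mathsf{Paid}$", and the query returns the orders in $\mathsf{AllOrd}$ that are not in $\mathsf{Paid}$. Since these TGDs have no existential variables, every ex-choice returns the empty set, and a supported solution is simply a subset-minimal solution.

```python
from itertools import combinations

# Assumed source instance; "yes"/"no" are hypothetical payment flags.
I = {("Ord", 1, "yes"), ("Ord", 2, "no")}


def is_solution(J):
    """Assumed TGDs: Ord(x, y) -> AllOrd(x) and Ord(x, "yes") -> Paid(x)."""
    for _, order, flag in I:
        if ("AllOrd", order) not in J:
            return False
        if flag == "yes" and ("Paid", order) not in J:
            return False
    return True


def is_supported(J):
    """Without existential variables, supported means subset-minimal solution."""
    facts = sorted(J)
    proper_subsets = (set(c) for k in range(len(facts)) for c in combinations(facts, k))
    return is_solution(J) and not any(is_solution(s) for s in proper_subsets)


def unpaid(J):
    return {f[1] for f in J if f[0] == "AllOrd"} - {f[1] for f in J if f[0] == "Paid"}


J = {("AllOrd", 1), ("AllOrd", 2), ("Paid", 1)}
J_PRIME = J | {("Paid", 2)}

# Enumerate candidate instances over the facts of J'; any fact outside this
# universe could be dropped from a solution, so it never occurs in a minimal one.
universe = sorted(J_PRIME)
supported = [set(c) for k in range(len(universe) + 1)
             for c in combinations(universe, k) if is_supported(set(c))]
scert = set.intersection(*[unpaid(s) for s in supported])
print(supported, scert)
```

Under these assumptions, $J'$ is rejected because its strict subset J already satisfies the dependencies, which is exactly the minimality condition of Definition 3.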
We now start establishing some important results regarding supported solutions and supported certain answers. The following theorem states that supported solutions are a refined subset of the classical ones, but whether a supported solution exists is still tightly related to the existence of a classical one.
Theorem 2 Consider a data exchange setting $\mathcal{S}$ . For every source instance I of $\mathcal{S}$ , it holds that:

1. $\mathsf{ssol}(I,\mathcal{S}) \subseteq \mathsf{sol}(I,\mathcal{S})$ , and

2. $\mathsf{ssol}(I,\mathcal{S}) = \emptyset$ iff $\mathsf{sol}(I,\mathcal{S}) = \emptyset$ .
Proof.
Item 1 follows by definition. For proving Item 2, it suffices to show that $\mathsf{sol}(I,\mathcal{S}) \neq \emptyset$ implies $\mathsf{ssol}(I,\mathcal{S}) \neq \emptyset$, since the other direction follows from Item 1. Let $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ and consider a solution $J \in \mathsf{sol}(I,\mathcal{S})$. We construct from J a supported solution $\hat{J}$ in $\mathsf{ssol}(I,\mathcal{S})$. Let $J'$ be one of the minimal subsets of J such that $J'$ is still a solution in $\mathsf{sol}(I,\mathcal{S})$. Moreover, let $\hat{J}$ be the instance obtained from $J'$, where each null $\perp$ in $J'$ is replaced with a new constant $c_\perp$ not occurring in $\Sigma_{st} \cup \Sigma_t$ and $J'$. Since $\hat{J}$ and $J'$ are the same instance, up to null renaming, we conclude that $\hat{J}$ is also a solution in $\mathsf{sol}(I,\mathcal{S})$. To see that $\hat{J}$ is a supported solution, consider the following ex-choice $\gamma$. For every TGD $\rho \in \Sigma_{st} \cup \Sigma_t$, and every tuple $\mathbf{t}$ of constants such that there exists a homomorphism h from $\mathsf{body}(\rho)$ to $\hat{J}$, and $\mathbf{t} = h(\mathsf{fr}(\rho))$, let $\gamma(\rho,\mathbf{t}) = \{(z,h(z)) \mid z \in \mathsf{exvar}(\rho)\}$. By construction of $\gamma$, $I \cup \hat{J}$ satisfies $\Sigma_{st}^\gamma$, and $\hat{J}$ satisfies $\Sigma_t^{\gamma}$. Since $\hat{J}$ is minimal, that is, for every $J'' \subsetneq \hat{J}$, $J'' \not \in \mathsf{sol}(I,\mathcal{S})$, from Item 1 of this claim, every $J'' \subsetneq \hat{J}$ is such that $J'' \not \in \mathsf{ssol}(I,\mathcal{S})$, that is, either $I \cup J''$ does not satisfy $\Sigma_{st}^\gamma$ or $J''$ does not satisfy $\Sigma_t^\gamma$. Thus, $\hat{J}$ is a supported solution in $\mathsf{ssol}(I,\mathcal{S})$, and the claim follows.
Regarding certain answers, we show that supported solutions indeed enjoy an important property: supported certain answers and classical certain answers coincide, when focusing on positive queries. Note that this does not necessarily follow from Theorem 2.
Theorem 3 Consider a setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ and a positive query Q over $\mathsf{T}$ . For every source instance I of $\mathcal{S}$ , $\mathsf{scert}_{\mathcal{S}}(I,Q) = \mathsf{cert}_{\mathcal{S}}(I,Q)$ .
Proof. The fact that $\mathsf{cert}_{\mathcal{S}}(I,Q) \subseteq \mathsf{scert}_{\mathcal{S}}(I,Q)$ follows from Item 1 of Theorem 2. To prove that $\mathsf{scert}_{\mathcal{S}}(I,Q) \subseteq \mathsf{cert}_{\mathcal{S}}(I,Q)$, assume $\mathbf{t} \not \in \mathsf{cert}_{\mathcal{S}}(I,Q)$, which means that there exists a solution J of I w.r.t. $\mathcal{S}$ such that $\mathbf{t} \not \in Q(J)$. Since Q is positive, and hence monotone, $\mathbf{t} \not \in Q(J)$ implies $\mathbf{t} \not \in Q(J')$, where $J'$ is one of the minimal subsets of J such that $J'$ is still a solution of I w.r.t. $\mathcal{S}$. Let $\hat{J}$ be the instance obtained from $J'$, where each null $\perp$ in $J'$ is replaced with a new constant $c_\perp$ not occurring in $\mathbf{t}$, Q, $\Sigma_{st} \cup \Sigma_t$, and $J'$. With a similar discussion to the one given in the proof of Theorem 2, we conclude that $\hat{J}$ is a supported solution of I w.r.t. $\mathcal{S}$. Since Q is positive, and since $\mathbf{t}$ and Q do not contain any of the constants introduced in $\hat{J}$, we conclude that $\mathbf{t} \not \in Q(\hat{J})$, which implies that $\mathbf{t} \not \in \mathsf{scert}_{\mathcal{S}}(I,Q)$, and the claim follows.
From the above, we conclude that for positive queries, certain query answering can be performed as done in the classical setting, and thus all important results from that setting, like query answering via universal solutions, carry over.
Corollary 1 Consider a setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$, a source instance I of $\mathcal{S}$, and a positive query Q over $\mathsf{T}$. If J is a (classical) universal solution of I w.r.t. $\mathcal{S}$, then $\mathsf{scert}_{\mathcal{S}}(I,Q) = Q(J)_{\downarrow}$.
Proof. It follows from Theorems 1 and 3.
We now move to the complexity analysis of the two most important data exchange tasks: deciding whether a supported solution exists, and computing the supported certain answers to a query.
4 Complexity
In data exchange, it is usually assumed that a setting $\mathcal{S}$ does not change over time, and a given query Q is much smaller than a given source instance. Thus, for understanding the complexity of a data exchange problem, it is customary to assume that $\mathcal{S}$ and Q are fixed, and only I is considered in the complexity analysis, that is, we consider the data complexity of the problem. Hence, the problems we are going to discuss will always be parametrized via a setting $\mathcal{S}$, and a query Q (for query answering tasks). The first problem we consider is deciding whether a supported solution exists; $\mathcal{S}$ is a fixed data exchange setting.
PROBLEM: $\mathsf{EXISTS\text{-}SSOL}(\mathcal{S})$
INPUT: A source instance I of $\mathcal{S}$.
QUESTION: Is $\mathsf{ssol}(I,\mathcal{S}) \neq \emptyset$?
The above problem is very important in data exchange, as one of the main goals is to actually construct a target instance that can be exploited for query answering purposes. Hence, knowing in advance whether at least a supported solution exists is of paramount importance.
Thanks to Item 2 of Theorem 2, all the complexity results for checking the existence of a classical solution can be directly transferred to our problem.
Theorem 4 There exists a data exchange setting $\mathcal{S}$ such that $\mathsf{EXISTS\text{-}SSOL}(\mathcal{S})$ is undecidable.
Proof. It follows from Theorem 2 and from the fact that there exists a data exchange setting $\mathcal{S}$ such that checking whether a classical solution exists is undecidable (Kolaitis et al. 2006).
Despite the negative result above, we also inherit positive results from the literature, when focusing on some of the most important data exchange scenarios, known as weakly-acyclic. Such settings only allow target TGDs to belong to the language of weakly-acyclic TGDs, which was first introduced in the seminal paper (Fagin et al. 2005), and is now well-established as the main language for data exchange purposes.
We start by introducing the notion of weak-acyclicity. We recall that for a schema $\mathsf{S}$, $\mathsf{pos}(\mathsf{S})$ denotes the set of all positions R[i], where $R/n \in \mathsf{S}$ and $i \in \{1,\ldots,n\}$, and for a TGD $\rho = \varphi({\textbf {x}},{\textbf {y}}) \rightarrow \exists {\textbf {z}}\, \psi({\textbf {y}},{\textbf {z}})$, $\mathsf{fr}(\rho)$ denotes the tuple ${\textbf {y}}$.
Definition 5 (Dependency Graph (Fagin et al. 2005))
Consider a set $\Sigma$ of TGDs over a schema $\mathsf{S}$ . The dependency graph of $\Sigma$ is a directed graph $\mathsf{dg}(\Sigma)=(N,E)$ , where $N = \mathsf{pos}(\mathsf{S})$ and E contains only the following edges. For each $\rho \in \Sigma$ , for each $x \in \mathsf{fr}(\rho)$ , and for each position $\pi$ in $\mathsf{body}(\rho)$ where x occurs:

there is a normal edge $(\pi,\pi') \in E$ , for each position $\pi'$ in $\mathsf{head}(\rho)$ where x occurs, and

there is a special edge $(\pi,\pi') \in E$ , for each position $\pi'$ in $\mathsf{head}(\rho)$ where an existentially quantified variable $z \in \mathsf{exvar}(\rho)$ occurs.
Definition 6 A set of TGDs $\Sigma$ is weakly-acyclic if no cycle in $\mathsf{dg}(\Sigma)$ contains a special edge. A data exchange setting $\langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ is weakly-acyclic if the set of TGDs in $\Sigma_t$ is weakly-acyclic.
Example 5 The settings of Examples 1 and 2 are weakly-acyclic, whereas the data exchange setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$, where $\mathsf{S} = \{S/2\}$, $\mathsf{T} = \{T/2\}$, $\Sigma_{st} = \{S(x,y) \rightarrow T(x,y)\}$, and $\Sigma_t = \{T(x,y) \rightarrow \exists z\, T(y,z)\}$ is not, since $(T[2],T[2])$ is a special edge in $\mathsf{dg}(\Sigma_t)$, and this self-loop is a cycle.
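Definitions 5 and 6 can be turned into a short test. The sketch below is a minimal illustration under a simplifying assumption: every term in a TGD is a variable (no constants), a TGD is a hypothetical pair `(body, head)` of atom lists, and a position $R[i]$ is encoded as the pair `(R, i)`. A special edge lies on a cycle exactly when its target can reach its source, which is what the reachability check tests.

```python
def dependency_edges(tgds):
    """Return the (normal, special) edge sets of the dependency graph."""
    normal, special = set(), set()
    for body, head in tgds:
        body_vars = {v for _, *args in body for v in args}
        head_vars = {v for _, *args in head for v in args}
        frontier = body_vars & head_vars
        # Head positions holding an existentially quantified variable.
        ex_positions = {(r, j) for r, *args in head
                        for j, v in enumerate(args, 1) if v not in body_vars}
        for r, *args in body:
            for i, x in enumerate(args, 1):
                if x not in frontier:
                    continue
                for r2, *args2 in head:
                    for j, v in enumerate(args2, 1):
                        if v == x:
                            normal.add(((r, i), (r2, j)))
                special |= {((r, i), p) for p in ex_positions}
    return normal, special


def is_weakly_acyclic(tgds):
    normal, special = dependency_edges(tgds)
    succ = {}
    for u, v in normal | special:
        succ.setdefault(u, set()).add(v)

    def reaches(src, dst):
        stack, seen = [src], set()
        while stack:
            n = stack.pop()
            if n == dst:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(succ.get(n, ()))
        return False

    # A special edge (u, v) is on a cycle iff v reaches u.
    return not any(reaches(v, u) for u, v in special)


# Target TGD of Example 5: T(x, y) -> exists z T(y, z).
SIGMA_T = [([("T", "x", "y")], [("T", "y", "z")])]
print(dependency_edges(SIGMA_T), is_weakly_acyclic(SIGMA_T))
```

On the TGD of Example 5 the sketch finds the normal edge from $T[2]$ to $T[1]$ and the special self-loop on $T[2]$, so the set is reported as not weakly-acyclic.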
The following result follows.
Theorem 5 For every weakly-acyclic data exchange setting $\mathcal{S}$, $\mathsf{EXISTS\text{-}SSOL}(\mathcal{S})$ is in PTIME.
Proof.
It follows from Theorem 2 and (Fagin et al. 2005, Corollary 3.10).
We now move to the second crucial task: computing supported certain answers. Since this problem outputs a set, it is standard to focus on its decision version. For a fixed data exchange setting $\mathcal{S}$ and a fixed query Q, we consider the following decision problem:
PROBLEM: $\mathsf{SCERT}(\mathcal{S},Q)$
INPUT: A source instance I of $\mathcal{S}$ and a tuple $\mathbf{t} \in \mathsf{Const}^{\mathbf{x}}$.
QUESTION: Is $\mathbf{t} \in \mathsf{scert}_{\mathcal{S}}(I,Q)$?
One can easily show that the above problem is logspace equivalent to the one of computing the supported certain answers.
We start by studying the problem in its full generality, and show that there is a price to pay for query answering with general queries.
Theorem 6 There exists a data exchange setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ , with $\Sigma_t$ having only TGDs, and a query Q over $\mathsf{T}$ , such that $\mathsf{SCERT}(\mathcal{S},Q)$ is undecidable.
Proof.
We provide a polynomial-time reduction from the Embedding Problem for Finite Semigroups $\mathsf{EMB}$ (Kolaitis et al. 2006). The reduction is an adaptation of the one used for proving Proposition 6.1 in (Hernich et al. 2011). Inputs of $\mathsf{EMB}$ are pairs of the form $(A,f)$, where A is a finite set, and f is a partial function of the form $f: A \times A \rightarrow A$. The question is whether there exists a finite set $B \supseteq A$, and a total function $g : B \times B \rightarrow B$, such that g is associative, that is, $g(g(a,b),c) = g(a,g(b,c))$ for all $a,b,c \in B$, and g extends f, that is, whenever f(a,b) is defined, $g(a,b) = f(a,b)$.
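The conditions that a candidate pair $(B,g)$ must meet can be checked mechanically; the hard part of $\mathsf{EMB}$ is that no bound on the size of B is known in advance. The sketch below only verifies a given candidate. The encoding is an assumption for illustration: f and g are Python dictionaries from pairs to elements, and the name `solves_emb` is hypothetical.

```python
from itertools import product


def solves_emb(A, f, B, g):
    """Check that (B, g) witnesses a positive answer for the EMB input (A, f)."""
    A, B = set(A), set(B)
    if not A <= B:
        return False
    # g must be a total function from B x B to B.
    if not all((a, b) in g and g[(a, b)] in B for a, b in product(B, repeat=2)):
        return False
    # g must agree with f wherever f is defined.
    if not all(g[pair] == value for pair, value in f.items()):
        return False
    # g must be associative.
    return all(g[(g[(a, b)], c)] == g[(a, g[(b, c)])]
               for a, b, c in product(B, repeat=3))


A = {0, 1}
f = {(1, 1): 0}                                   # partial: only f(1, 1) is defined
xor = {(a, b): (a + b) % 2 for a, b in product(A, repeat=2)}
const_one = {(a, b): 1 for a, b in product(A, repeat=2)}
print(solves_emb(A, f, A, xor), solves_emb(A, f, A, const_one))
```

Here addition modulo 2 extends f and is associative, while the constant function is associative but disagrees with f on the pair (1, 1).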
Let us first introduce some notation. Consider a finite set A and a partial function $f : A \times A \rightarrow A$ . We define the instance:
Consider now the data exchange setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ , where $\mathsf{S} = \{\mathsf{F}/3\}$ and $\mathsf{T} = \{\mathsf{G}/3\}$ . Intuitively, the relation $\mathsf{F}$ collects all the triples a,b,c such that $f(a,b) = c$ , whereas the relation $\mathsf{G}$ collects all the triples of the extended associative function g. The sets $\Sigma_{st}$ and $\Sigma_t$ are defined as $\Sigma_{st} = \{\mathsf{F}(x,y,z) \rightarrow \mathsf{G}(x,y,z)\}$ and $\Sigma_t = \{\mathsf{G}(x,y,z) \rightarrow \exists x',y',z'\ \mathsf{G}(x',y',z') \wedge \mathsf{Aux}(x,y,z)\}$ . Roughly $\Sigma_{st}$ is in charge of forcing the function stored in $\mathsf{G}$ to be an extension of the function stored in $\mathsf{F}$ , whereas $\Sigma_t$ is in charge of adding additional entries to $\mathsf{G}$ .
The difference with the construction of (Hernich et al. Reference Hernich, Libkin and Schweikardt2011) is in the set $\Sigma_t$ . Here, the head of the only TGD in $\Sigma_t$ has an additional auxiliary atom $\mathsf{Aux}(x,y,z)$ . Intuitively, since the set $\Sigma_t$ is in charge of extending the function defined by the relation $\mathsf{F}$ by introducing additional terms, in order for these terms to be actually introduced in a supported solution, we require that every body variable is also a frontier variable. Regarding our query Q, it is the same as the one in (Hernich et al. Reference Hernich, Libkin and Schweikardt2011). Hence, instead of giving the precise expression of Q, we only describe its properties. The query Q over $\mathsf{T} = \{\mathsf{G}/3\}$ is a Boolean query which is true (i.e., the empty tuple is its only output) if either $\mathsf{G}$ does not encode a function, that is, it maps the same pair (a,b) to different terms, or $\mathsf{G}$ does not encode an associative function, or $\mathsf{G}$ does not encode a total function. In other words, Q checks whether $\mathsf{G}$ does not encode a solution for $\mathsf{EMB}$ .
We are now ready to present the reduction. Let A be a finite set and $f : A \times A \rightarrow A$ be a partial function. The reduction constructs the source instance $I_{A,f}$ and the empty tuple $\mathbf{t} = ()$ . Clearly, $I_{A,f}$ can be constructed in polynomial time w.r.t. $I$ . It remains to show that A,f is a “yes”-instance of $\mathsf{EMB}$ iff $\mathbf{t} \not \in \mathsf{scert}_{\mathcal{S}}(I_{A,f},Q)$ .
(Only if direction) Assume $\mathbf{t} \not \in \mathsf{scert}_{\mathcal{S}}(I_{A,f},Q)$ . Then, there exists a supported solution $J \in \mathsf{ssol}(I_{A,f},\mathcal{S})$ of $I_{A,f}$ w.r.t. $\mathcal{S}$ such that $\mathbf{t} \not \in Q(J)$ . By definition of supported solution, J is finite and it only contains atoms with relation $\mathsf{G}$ . Thus, by definition of $\mathcal{S}$ , $\mathbf{t} \not \in Q(J)$ implies that J necessarily encodes an extension of f, which is also total and associative.
(If direction) Assume A,f is a “yes”-instance of $\mathsf{EMB}$ , and let $B \supseteq A$ be a finite set, and $g : B \times B \rightarrow B$ be the total associative function that extends f. Then, consider the instance J over $\mathsf{T}$ defined as $J = \{\mathsf{G}(a,b,c) \mid a,b,c \in B \text{ and } g(a,b) = c \}$ . It is not difficult to verify that J is a supported solution of $I_{A,f}$ w.r.t. $\mathcal{S}$ . Finally, by construction of J, $\mathbf{t} \not \in Q(J)$ as needed.
Although the undecidability result above tells us that computing supported certain answers is infeasible in general, we can show that for weakly-acyclic settings, the complexity is more manageable. In particular, we prove that in this case, the problem is in $\text{co}\text{NP}$ and that this complexity bound is tight (i.e., there exist weakly-acyclic settings and queries for which the problem is $\text{co}\text{NP}\text{-hard}$ ). We first focus on the upper bound.
Theorem 7 For every weakly-acyclic setting $\mathcal{S}$ and every query Q, $\mathsf{SCERT}(\mathcal{S},Q)$ is in $\text{co}\text{NP}$ .
Proof. We provide a nondeterministic polynomial-time procedure for solving the complement of the problem $\mathsf{SCERT}(\mathcal{S},Q)$ , when $\mathcal{S}$ is a weakly-acyclic data exchange setting. That is, given a source instance I of $\mathcal{S}$ and a tuple $\mathbf{t} \in \mathsf{Const}^{ar(Q)}$ , the procedure nondeterministically constructs a supported solution $J^*$ of I w.r.t. $\mathcal{S}$ (if one exists), and checks whether $\mathbf{t} \not \in Q(J^*)$ . Let $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ , and consider a source instance I of $\mathcal{S}$ , a query Q over $\mathsf{T}$ , and a tuple $\mathbf{t} \in \mathsf{Const}^{ar(Q)}$ .
The procedure is defined in two parts. The first part is in charge of nondeterministically constructing a supported solution $J^*$ . If the procedure was not able to construct a supported solution (i.e., no such solution exists, or it followed a wrong computation path), the procedure sets $J^* =\; \perp$ . The second part rejects if $J^*=\; \perp$ ; otherwise, it accepts if $\mathbf{t} \not \in Q(J^*)$ and rejects if $\mathbf{t} \in Q(J^*)$ . The second part can be easily implemented by a deterministic polynomial-time procedure; we now show the first part, which constructs $J^*$ .
This procedure implements a variation of the so-called semi-oblivious chase algorithm; we refer the reader to (Marnette Reference Marnette2009) for more details. In the following, for each TGD $\rho \in \Sigma_{st} \cup \Sigma_t$ , let $\mathsf{Chosen}_\rho$ be a fresh relation, not occurring in $\mathsf{S} \cup \mathsf{T}$ , of arity $|\mathsf{fr}(\rho)|$ .

1. Let $J_0 = I$ , and let the current step be $i = 0$ .

2. If $J_i$ does not satisfy the EGDs in $\Sigma_t$ , then let $J^* =\; \perp$ and halt;

3. If $J_i$ satisfies the EGDs in $\Sigma_t$ , and no TGD $\rho \in \Sigma_{st} \cup \Sigma_t$ and homomorphism h from $\mathsf{body}(\rho)$ to $J_i$ exist such that $\mathsf{Chosen}_\rho(h(\mathsf{fr}(\rho))) \not \in J_i$ , then let $J^*$ be $J_i$ after removing all atoms over $\mathsf{S}$ and the atoms using the $\mathsf{Chosen}$ predicates, and halt.

4. Otherwise, guess a TGD $\rho_i \in \Sigma_{st} \cup \Sigma_t$ and a homomorphism $h_i$ from $\mathsf{body}(\rho_i)$ to $J_i$ such that $\mathsf{Chosen}_{\rho_i}(h_i(\mathsf{fr}(\rho_i))) \not \in J_i$ , and guess an extension $h_i'$ of $h_i$ such that, for each $z \in \mathsf{exvar}(\rho_i)$ , $h_i'(z)=c^i_z$ , where $c^i_z$ is either a constant occurring in one of $\mathcal{S}$ , I, Q, or a fresh new constant. Finally, let $J_{i+1} = J_i \cup h_i'(\mathsf{head}(\rho_i)) \cup \{\mathsf{Chosen}_{\rho_i}(h_i(\mathsf{fr}(\rho_i))) \}$ . Let $i := i + 1$ and go to step 2.
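Steps 1 to 4 can be sketched as a single run in Python. This is a minimal sketch under several assumptions that are not part of the procedure itself: the nondeterministic guess of the constants $c^i_z$ is delegated to a caller-supplied callback `guess` (a hypothetical name), the trigger to fire is picked deterministically rather than guessed, and the $\mathsf{Chosen}_\rho$ facts are kept in a separate set instead of inside $J_i$. The relation names `N` and `HasC` in the usage lines are illustrative only. For settings that are not weakly-acyclic, the loop may not terminate.

```python
def homomorphisms(body, inst, h=None):
    """Yield every variable assignment that maps all body atoms into inst."""
    h = h or {}
    if not body:
        yield dict(h)
        return
    (rel, args), rest = body[0], body[1:]
    for frel, fargs in inst:
        if frel != rel or len(fargs) != len(args):
            continue
        g = dict(h)
        if all(g.setdefault(v, c) == c for v, c in zip(args, fargs)):
            yield from homomorphisms(rest, inst, g)


def chase_run(source, tgds, egds, source_rels, guess):
    """One run of steps 1-4; returns J* as a set of atoms, or None for 'bottom'.

    tgds: list of (body, head, exvars); egds: list of (body, x, y).
    guess(key, z) returns the constant chosen for existential variable z,
    where key = (TGD index, image of the frontier variables).
    """
    J = set(source)                                      # step 1
    chosen = set()                                       # stands for the Chosen facts
    while True:
        for body, x, y in egds:                          # step 2
            if any(h[x] != h[y] for h in homomorphisms(body, J)):
                return None
        trigger = None
        for idx, (body, head, exvars) in enumerate(tgds):
            head_vars = {v for _, args in head for v in args}
            body_vars = {v for _, args in body for v in args}
            frontier = sorted(body_vars & head_vars)
            for h in homomorphisms(body, J):
                key = (idx, tuple(h[v] for v in frontier))
                if key not in chosen:
                    trigger = (key, head, exvars, h)
                    break
            if trigger is not None:
                break
        if trigger is None:                              # step 3
            return {atom for atom in J if atom[0] not in source_rels}
        key, head, exvars, h = trigger                   # step 4
        ext = dict(h)
        for z in exvars:
            ext[z] = guess(key, z)
        J |= {(rel, tuple(ext[v] for v in args)) for rel, args in head}
        chosen.add(key)


# Illustrative setting: N(x) -> exists z HasC(x, z), always guessing "r".
source = {("N", ("a",)), ("N", ("b",))}
tgds = [([("N", ("x",))], [("HasC", ("x", "z"))], ["z"])]
j_star = chase_run(source, tgds, [], {"N"}, lambda key, z: "r")
```

Different `guess` callbacks correspond to different computation paths of the nondeterministic procedure; a path whose guesses violate an EGD ends with `None`.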
To show that the procedure above terminates after a polynomial number of steps, we can use a similar argument to the one given for proving Theorem 3.9 in (Fagin et al. Reference Fagin, Kolaitis, Miller and Popa2005). We now show that, for every target instance J of $\mathcal{S}$ , a run of the above procedure halting with $J^* = J$ exists iff J is a supported solution of I w.r.t. $\mathcal{S}$ , and the claim will follow. We focus on one of the two directions, as the other direction can be proved in a similar way.
Assume there is a run of n steps of the procedure above with $J^* = J$ , for some target instance J of $\mathcal{S}$ , and let $\rho_i$ , $h_i,$ and $c^i_z$ , for $z \in \mathsf{exvar}(\rho_i)$ be the TGD, homomorphism, and constants guessed at step i in the run. Let $\gamma$ be the ex-choice such that, for each $i \in \{1,\ldots,n\}$ , $\gamma(\rho_i,h_i(\mathsf{fr}(\rho_i))) = \{(z,c^i_z) \mid z \in \mathsf{exvar}(\rho_i)\}$ . The fact that $\gamma$ is indeed an ex-choice follows from the fact that at each step $i \in \{1,\ldots,n\}$ , a constant $c^i_z$ is introduced only if $\mathsf{Chosen}_{\rho_i}(h_i(\mathsf{fr}(\rho_i))) \not \in J_i$ , which in turn implies that no constant has been chosen at some step $j<i$ , where $h_j(\mathsf{fr}(\rho_j)) = h_i(\mathsf{fr}(\rho_i))$ . By definition of the procedure, J is the instance obtained from $J_n$ where all the atoms with relations in $\mathsf{S}$ or of the form $\mathsf{Chosen}_{\rho}$ are removed. Hence, by construction, $I \cup J$ satisfies $\Sigma_{st}^\gamma$ and J satisfies all the TGDs in $\Sigma_t^\gamma$ . Since $J \neq\; \perp$ , J also satisfies the EGDs in $\Sigma_t^\gamma$ . Moreover, no $J' \subsetneq J$ is such that $I \cup J'$ satisfies $\Sigma_{st}^\gamma$ and J’ satisfies $\Sigma_t^\gamma$ . Indeed, if such a J’ existed, let $\alpha \in J \setminus J'$ , and let $i \in \{1,\ldots,n\}$ be the step in the above run where $\alpha$ is added to $J_{i+1}$ . Then, the TGD $\rho' = h_i(\rho_i) \rightarrow h_i'(\mathsf{head}(\rho_i))$ is in $\Sigma_{st}^\gamma \cup \Sigma_t^\gamma$ , by construction of $\gamma$ . However, J’ does not satisfy $\rho'$ , a contradiction. The latter, together with the previous discussion, implies that J is a supported solution of I w.r.t. $\mathcal{S}$ .
We point out that the above result is in contrast with all the data exchange semantics discussed in the introduction, for which computing certain answers is undecidable, even for weakly-acyclic settings (Hernich et al. Reference Hernich, Libkin and Schweikardt2011; Hernich Reference Hernich2011).
We now move to the lower bound and show that the $\text{co}\text{NP}$ upper bound is tight.
Theorem 8 There exists a weakly-acyclic setting $\mathcal{S}$ that is TGD-only and a query Q such that $\mathsf{SCERT}(\mathcal{S},Q)$ is $\text{co}\text{NP}\text{-hard}$ .
Proof.
The $\text{co}\text{NP}$ -hardness is proved via a reduction from 3-colorability to the complement of our problem. We encode the input graph $G = (V,E)$ as the instance
Colorings are constructed in the setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ , via the source-to-target TGDs ( $\Sigma_t$ is empty):
where $\mathsf{Col_t}$ collects all colors, $\mathsf{E_t}$ contains the edges of the graph in the target schema, and $\mathsf{HasC}$ assigns a term to each node of the graph.
The Boolean query $Q = Q_1 \vee Q_2$ is true over an instance of the target schema iff the instance does not encode a valid 3-coloring. In particular, $Q_1$ checks whether the “color” used for some node differs from $\mathsf{r},\mathsf{g}, \mathsf{b}$ :
whereas $Q_2$ checks whether the nodes of an edge have the same color:
We prove that G admits a 3-coloring iff $\mathbf{t} = () \not \in \mathsf{scert}_{\mathcal{S}}(I_G,Q)$ .
(Only if direction) Assume G admits a 3-coloring $\mu$ and consider the instance
It is not difficult to see that J is a supported solution of $I_G$ w.r.t. $\mathcal{S}$ . Clearly, $\mathbf{t} \not \in Q(J)$ and the claim follows.
(If direction) Assume that G does not admit a 3-coloring, and consider an arbitrary supported solution J of $I_G$ w.r.t. $\mathcal{S}$ . Note that for every edge $(u,v) \in E$ , we have that $\mathsf{E_t}(u,v) \in J$ and $\mathsf{HasC}(u,c_1),\mathsf{HasC}(v,c_2) \in J$ , for some constants $c_1,c_2$ . We distinguish two cases. Assume that there is an edge $(u,v) \in E$ such that $c_1 \not \in \{\mathsf{r},\mathsf{g},\mathsf{b}\}$ or $c_2 \not \in \{\mathsf{r},\mathsf{g},\mathsf{b}\}$ . Thus, $\mathbf{t} \in Q_1(J)$ which implies $\mathbf{t} \in Q(J)$ . Assume now that for every edge $(u,v) \in E$ , $c_1,c_2 \in \{\mathsf{r},\mathsf{g},\mathsf{b}\}$ . Thus, since G does not admit a 3-coloring, for at least one edge $(u,v) \in E$ , $c_1 =c_2$ . Hence, $\mathbf{t} \in Q_2(J)$ , which implies that $\mathbf{t} \in Q(J)$ and the claim follows.
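To make the role of Q in the reduction concrete, the following Python sketch evaluates the two disjuncts over a target instance, following their informal description in the proof. It is an illustration only: the exact first-order expressions of $Q_1$ and $Q_2$ are not reproduced here, the color set is hard-coded as an assumption, and the relation names `HasC` and `E_t` mirror $\mathsf{HasC}$ and $\mathsf{E_t}$.

```python
COLORS = {"r", "g", "b"}  # assumption: the three admissible color constants


def q1(J):
    """True iff some node is assigned a term outside {r, g, b}."""
    return any(rel == "HasC" and args[1] not in COLORS for rel, args in J)


def q2(J):
    """True iff the two endpoints of some edge are assigned the same term."""
    has_c = {args for rel, args in J if rel == "HasC"}
    edges = {args for rel, args in J if rel == "E_t"}
    return any((u, c) in has_c and (v, c) in has_c
               for (u, v) in edges for (_, c) in has_c)


def q(J):
    """Q = Q1 or Q2: true iff J does not encode a valid 3-coloring."""
    return q1(J) or q2(J)


def encode(edges, coloring):
    """Build a target instance from an edge list and a node-to-term map."""
    return ({("E_t", e) for e in edges}
            | {("HasC", (u, c)) for u, c in coloring.items()})


triangle = [("a", "b"), ("b", "c"), ("a", "c")]
proper = encode(triangle, {"a": "r", "b": "g", "c": "b"})
clash = encode(triangle, {"a": "r", "b": "g", "c": "r"})
```

Here `q(proper)` is false, matching the “only if” direction, whereas `q(clash)` is true because the edge between `a` and `c` is monochromatic.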
We point out that the query employed in the proof of the above theorem is a simple Boolean combination of CQs. FO queries of this kind have been studied in the context of incomplete databases; for example, see (Gheerbrant and Libkin Reference Gheerbrant and Libkin2015). However, differently from the incomplete databases setting, where such queries guarantee query answering in polynomial time, the complexity in our setting is higher, due to the presence of TGDs; the latter is true even for weakly-acyclic TGDs, as shown by Theorem 8 above. Similarly, arbitrary FO queries (e.g., involving also universal quantification) behave very differently depending on the given setting. For example, according to Theorem 7, for any FO query, supported certain answers remain in $\text{co}\text{NP}$ , under weakly-acyclic settings, while for arbitrary settings, the use of universal quantification makes supported certain answering undecidable; the latter is a consequence of the proof of Theorem 6. Hence, one cannot directly conclude much on the complexity of supported certain answers by considering the query alone, as done for querying incomplete databases.
We conclude this section by recalling that for positive queries, supported certain answers coincide with the classical ones (Theorem 3), and computing (classical) certain answers for weakly-acyclic settings, under positive queries, is tractable (Fagin et al. Reference Fagin, Kolaitis, Miller and Popa2005). Hence, the result below follows.
Corollary 2 For every weakly-acyclic setting $\mathcal{S}$ and every positive query Q, $\mathsf{SCERT}(\mathcal{S},Q)$ is in PTIME.
5 Exact query answering via logic programming
In this section, we show how to compute supported certain answers exactly by means of a translation into logic programming under the stable model semantics, that is, answer set programming (ASP). First, we need to recall the syntax and semantics of logic programs. In particular, we focus on a fragment of logic programs that is enough for our purposes, namely Datalog with (possibly non-stratified) negation; that is, we do not allow function symbols or disjunctive rules.
Syntax. A literal L is an expression of the form $\alpha$ or $\neg \alpha$ , where $\alpha$ is either an atom without nulls, or the expression $t_1 = t_2$ , where $t_1,t_2$ are variables or constants; we write $t_1 \neq t_2$ for $\neg t_1 = t_2$ . We say that L is positive (resp., negative) if $L = \alpha$ (resp., $L = \neg \alpha$ ). If a literal contains no variables, it is said to be ground.
A rule r is an expression of the form $H \text{ :- } A_1,\ldots,A_n, \neg B_1,\ldots,\neg B_m$ ,
with $n \ge 0$ , $m \ge 0$ , and where H is either a positive literal or the symbol $\perp$ , $A_1,\ldots,A_n$ are positive literals, and $\neg B_1,\ldots,\neg B_m$ are negative literals. We denote $\mathsf{head}(r) = \{H\}$ as the head of r, while $\mathsf{body}(r) = \{A_1,\ldots,A_n, \neg B_1,\ldots,\neg B_m\}$ is the body of r; we use $\mathsf{body}^+(r)$ to denote $\{A_1,\ldots,A_n\}$ , and $\mathsf{body}^-(r)$ to denote $\{B_1,\ldots,B_m\}$ . If $H =\; \perp$ , we say that r is a constraint. If $m=0$ , we say the rule is positive; if r contains no variables, it is said to be ground. We say the rule r is safe if every variable in the rule occurs in some literal of $\mathsf{body}^+(r)$ . We will require every rule to be safe (besides being a common requirement, safe rules suffice for our purposes).
As customary, we will consider two kinds of sets of rules:

1. finite sets of rules of the form $H \text{ :- }$ , with $H \neq \bot$ (notice that such rules must be ground because of safety), which are commonly used to represent databases; a set of this kind will be called an extensional database;

2. finite sets of rules of any other form – a set of this kind will be called a program.
Semantics. Let $\mathcal{P}$ be a program and $\mathit{ED}$ an extensional database. We will often use $\mathcal{P}_{\mathit{ED}}$ to denote the set $\mathcal{P} \cup \mathit{ED}$ . The Herbrand universe of $\mathcal{P}_{\mathit{ED}}$ , denoted $\mathsf{\mathsf{U}}(\mathcal{P}_{\mathit{ED}})$ , is the set of all constants occurring in $\mathcal{P}_{\mathit{ED}}$ . The Herbrand base of $\mathcal{P}_{\mathit{ED}}$ , denoted $\mathsf{base}(\mathcal{P}_{\mathit{ED}})$ , is the set of all atoms that can be built using relations and constants occurring in $\mathcal{P}_{\mathit{ED}}$ . A ground version of a rule $r \in \mathcal{P}_{\mathit{ED}}$ is a ground rule r’ that can be obtained from r by replacing all occurrences of each variable x of r with some constant from $\mathsf{\mathsf{U}}(\mathcal{P}_{\mathit{ED}})$ .
The grounding of $\mathcal{P}_{\mathit{ED}}$ , denoted $\mathsf{ground}(\mathcal{P}_{\mathit{ED}})$ , is the set of rules obtained from $\mathcal{P}_{\mathit{ED}}$ by replacing each rule $r \in \mathcal{P}_{\mathit{ED}}$ with all its ground versions.
We say that an instance I satisfies a ground positive literal L if either L is of the form $\alpha = \beta$ and $\alpha$ and $\beta$ are the same constant, or L is an atom occurring in I. Furthermore, we say that I satisfies a ground negative literal $\neg L$ , if I does not satisfy L. Finally, I satisfies a set of ground literals if I satisfies each literal in it.
Consider a rule $r \in \mathsf{ground}(\mathcal{P}_{\mathit{ED}})$ and an instance I. We say that I satisfies r if, either r is a constraint and I does not satisfy $\mathsf{body}(r)$ , or I satisfies $\mathsf{body}(r)$ implies that I satisfies $\mathsf{head}(r)$ (notice that an empty body is always satisfied).
A model of $\mathcal{P}_{\mathit{ED}}$ is an instance M such that $M \subseteq \mathsf{base}(\mathcal{P}_{\mathit{ED}})$ and such that M satisfies each rule of $\mathsf{ground}(\mathcal{P}_{\mathit{ED}})$ . We say that M is minimal if there is no other model M’ of $\mathcal{P}_{\mathit{ED}}$ such that $M' \subsetneq M$ . We use $\mathsf{MM}(\mathcal{P}_{\mathit{ED}})$ to denote the set of all minimal models of $\mathcal{P}_{\mathit{ED}}$ .
The reduct of $\mathcal{P}_{\mathit{ED}}$ w.r.t. some instance I is the set of ground rules obtained from $\mathsf{ground}(\mathcal{P}_{\mathit{ED}})$ by removing each rule r for which I does not satisfy $\mathsf{body}^-(r)$ , and by removing all negative literals from the body of each rule r for which I satisfies $\mathsf{body}^-(r)$ .
An instance M is a stable model of $\mathcal{P}_{\mathit{ED}}$ if $M \in \mathsf{MM}(\mathcal{P}_{\mathit{ED}}')$ , where $\mathcal{P}_{\mathit{ED}}'$ is the reduct of $\mathcal{P}_{\mathit{ED}}$ w.r.t. M. We use $\mathsf{SM}(\mathcal{P}_{\mathit{ED}})$ to denote the set of all stable models of $\mathcal{P}_{\mathit{ED}}$ .
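The definitions of reduct and stable model can be illustrated with a brute-force Python sketch over ground programs. The encoding is an assumption made for illustration: a ground rule is a triple `(head, pos, neg)`, with `head = None` standing for a constraint, and equality literals are omitted. The sketch uses the fact that a positive ground program with constraints has at most one minimal model, namely the least model of its non-constraint rules, provided that model violates no constraint. The enumeration is exponential in the number of atoms and is meant only to mirror the definitions.

```python
from itertools import combinations


def reduct(rules, M):
    """Drop rules whose negative body intersects M; strip negation elsewhere."""
    return [(h, pos) for h, pos, neg in rules if not (set(neg) & M)]


def least_model(pos_rules):
    """Least model of the positive rules, or None if a constraint is violated."""
    model, changed = set(), True
    while changed:
        changed = False
        for h, pos in pos_rules:
            if h is not None and set(pos) <= model and h not in model:
                model.add(h)
                changed = True
    if any(h is None and set(pos) <= model for h, pos in pos_rules):
        return None
    return model


def stable_models(rules):
    """Enumerate all stable models of a ground program by brute force."""
    atoms = sorted({a for h, pos, neg in rules
                    for a in (h, *pos, *neg) if a is not None})
    result = []
    for k in range(len(atoms) + 1):
        for candidate in combinations(atoms, k):
            M = set(candidate)
            if least_model(reduct(rules, M)) == M:
                result.append(M)
    return result


# p :- not q.   q :- not p.
program = [("p", [], ["q"]), ("q", [], ["p"])]
models = stable_models(program)
```

For this two-rule program, `models` contains exactly the two instances `{"p"}` and `{"q"}`; adding the constraint `(None, ["p"], [])` leaves only `{"q"}`.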
Cautious Reasoning. Consider an extensional database $\mathit{ED}$ , a program $\mathcal{P}$ , and a query Q. The cautious answers to Q over $\mathit{ED}$ and $\mathcal{P}$ is the set:
The key task we are interested in, regarding logic programs, is computing cautious answers. In particular, we are interested in its data complexity, that is, when the program and the query are fixed; as usual, we focus on the decision version of the problem. In the following, $\mathcal{P}$ and Q denote some program and some query, respectively:
It is well known that for every program $\mathcal{P}$ and every query Q, $\mathsf{CANS}(\mathcal{P},Q)$ is in $\text{co}\text{NP}$ – e.g., see (Greco et al. Reference Greco, Saccà and Zaniolo1995).
The choice construct. We now extend logic programs with an additional construct, called choice. We point out that extending logic programs with the choice is purely for syntactic convenience, as this construct can be implemented by means of standard rules with negation.
The choice construct has been introduced in Datalog in (Saccà and Zaniolo Reference Saccà and Zaniolo1990), studied in (Giannotti et al. Reference Giannotti, Pedreschi, Saccà and Zaniolo1991; Greco et al. Reference Greco, Saccà and Zaniolo1995; Greco and Zaniolo Reference Greco and Zaniolo1998; Greco et al. Reference Greco, Zaniolo and Ganguly1992), and implemented in the Datalog systems LDL++ (Arni et al. Reference Arni, Ong, Tsur, Wang and Zaniolo2003) and, in some form, in recent ASP systems (e.g., Potassco (Gebser et al. Reference Gebser, Kaufmann, Kaminski, Ostrowski, Schaub and Schneider2011) and DLV (Alviano et al. Reference Alviano, Faber, Leone, Perri, Pfeifer and Terracina2010)). It is used to enforce functional dependency (FD) constraints on rules of a logic program.
A choice rule r is an expression of the form $H \text{ :- } A_1,\ldots,A_n, \neg B_1,\ldots,\neg B_m, \textit{choice}((X),(Y))$ ,
where n, m, H, $A_1,\ldots,A_n$ , and $B_1,\ldots,B_m$ are all defined as for standard rules, while X and Y denote disjoint sets of variables occurring in $\mathsf{body}(r)$ .^{ Footnote 3 } The original definition of choice rule allows for multiple choice constructs in the rule body; here we focus on choice rules with only one choice construct in the body as this is enough for our purposes.
Intuitively, the construct $\textit{choice}((X),(Y))$ prescribes that the set of all consequences derived from r must respect the FD $X \rightarrow Y$ .
The formal semantics of choice rules is given in terms of a translation to standard rules using negation. In particular, the choice rule r defined above is a shorthand for writing the following set of rules; in what follows, $\mathbf{x}$ and $\mathbf{y}$ are the tuples of all variables in X and Y, respectively, in some arbitrary order.
In the above rules, $\mathsf{Range}_r$ , $\mathsf{Chosen}_r$ , and $\mathsf{DiffChoice}_r$ are fresh relations not occurring in $\mathcal{P}$ , which are used only to rewrite the rule r.
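The intuition behind $\textit{choice}((X),(Y))$ can be illustrated directly, without going through the rewriting into rules with negation. The Python sketch below assumes the usual reading of the construct: among the tuples that satisfy the rest of the rule body, each admissible outcome keeps exactly one Y-value for every X-value, so that the FD $X \rightarrow Y$ holds. The function name and the encoding of tuples as `(x, y)` pairs are illustrative assumptions.

```python
from itertools import product


def choice_outcomes(pairs):
    """Yield every subset of pairs that keeps exactly one y for each x.

    pairs: iterable of (x, y) values for which the rest of the body holds.
    Each yielded set satisfies the functional dependency X -> Y.
    """
    by_x = {}
    for x, y in pairs:
        by_x.setdefault(x, set()).add(y)
    keys = sorted(by_x)
    for ys in product(*(sorted(by_x[k]) for k in keys)):
        yield set(zip(keys, ys))


outcomes = list(choice_outcomes([("a", 1), ("a", 2), ("b", 1)]))
```

With two candidate values for `a` and one for `b`, there are two outcomes, one per stable model of the rewritten rules under this reading.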
5.1 Implementing supported certain answers via logic programming with choice
The goal of this section is to prove the following key result.
Theorem 9 For every weakly-acyclic data exchange setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ , and every query Q over $\mathsf{T}$ , there exists a program $\mathcal{P}$ such that $\mathsf{SCERT}(\mathcal{S},Q)$ reduces to $\mathsf{CANS}(\mathcal{P},Q)$ in polynomial time.
The rest of this section is devoted to proving the above claim. In particular, we show how to convert a weakly-acyclic data exchange setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ , together with a source instance I of $\mathcal{S}$ and a query Q over $\mathsf{T}$ , into an extensional database $\mathit{ED}$ and a program $\mathcal{P}$ using choice rules, in such a way that $\mathcal{P}$ depends only on $\mathcal{S}$ and such that $\mathsf{scert}_{\mathcal{S}}(I,Q) = \mathsf{cans}_{\mathcal{P}}(\mathit{ED},Q)$ .
The main idea of the translation is to derive a program together with an extensional database such that the stable models correspond to a subset of the supported solutions that is enough for computing supported certain answers. For this, we rely on the following useful result that one can extract from the proof of Theorem 4. For a set S of terms and a set of instances $\mathcal{I}$ , we use $\mathcal{I}_{\downarrow S}$ to denote the set of instances $\{I \in \mathcal{I} \mid \mathsf{dom}(I) \subseteq S\}$ .
Lemma 10 Consider a weakly-acyclic data exchange setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ . There exists a polynomial $\mathsf{pol}$ such that, for every source instance I of $\mathcal{S}$ , and every query Q over $\mathsf{T}$ , the following holds:
where S is the set of all constants occurring in $\mathcal{S}$ , I and Q, plus some fixed, arbitrarily chosen constants $c_1,\ldots,c_{\mathsf{pol}(|I|)}$ not occurring anywhere in $\mathcal{S}$ , I, or Q.
Proof. The claim easily follows by construction of the nondeterministic procedure building the instance $J^*$ in the proof of Theorem 7, from the fact that it terminates after a polynomial number of steps w.r.t. I, and the fact that it halts with $J^* \neq\; \perp$ iff $J^*$ is a supported solution in $\mathsf{ssol}(I,\mathcal{S})$ .
The result above tells us that considering supported solutions of a certain polynomial size suffices for computing supported certain answers. The stable models of the program together with the extensional database we are going to define will correspond to such supported solutions.
Definition 7 (Translation)
Consider a data exchange setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ , a source instance I of $\mathcal{S}$ , a query Q over $\mathsf{T}$ , and the set of constants S as defined in Lemma 10 w.r.t. $\mathcal{S}$ , I and Q.
We use $\mathsf{LP}(\mathcal{S})$ to denote the set consisting of the following rules.

1. For each TGD $\rho$ of the form $\alpha_1 \wedge \cdots \wedge \alpha_n \rightarrow \exists \mathbf{z}\, \beta_1 \wedge \cdots \wedge \beta_m$ in $\Sigma_{st} \cup \Sigma_t$ , with $\mathbf{y} = \mathsf{fr}(\rho)$ , if $k = |\mathbf{z}| = 0$ , the following rules are introduced:
(1) \begin{equation}\beta_i \text{ :- } \alpha_1,\ldots,\alpha_n,\ \ \ \ i \in \{1,\ldots,m\},\end{equation}
otherwise, the following rules are introduced:
(2) \begin{equation}\mathsf{ExChoice}_\rho(\mathbf{y},\mathbf{z}) \text{ :- } \alpha_1,\ldots,\alpha_n, \mathsf{Dom}(\mathbf{z}[1]),\ldots,\mathsf{Dom}(\mathbf{z}[k]),\textit{choice}((Y),(Z)),\end{equation}
(3) \begin{equation}\beta_i \text{ :- } \mathsf{ExChoice}_\rho(\mathbf{y},\mathbf{z}), \ \ \ \ i \in \{1,\ldots,m\},\end{equation}
where Y and Z are the sets of all variables in $\mathbf{y}$ and $\mathbf{z}$ , respectively, and $\mathsf{Dom}$ is a fresh predicate.
2. For each EGD $\alpha_1 \wedge \cdots \wedge \alpha_n \rightarrow x = y$ in $\Sigma_t$ , the following constraint is introduced:
(4) \begin{equation}\perp \text{ :- } \alpha_1,\ldots,\alpha_n, x \neq y\end{equation}
We use $\mathsf{ED}(\mathcal{S},I,Q)$ to denote the extensional database consisting of the following rules.

1. For each constant $c \in S$ , the following rule is introduced:
(5) \begin{equation} \mathsf{Dom}(c) \text{ :- } . \end{equation}
2. For each fact $\alpha \in I$ , the following rule is introduced:
(6) \begin{equation} \alpha \text{ :- } . \end{equation}
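The rule schemas (1) to (4) of Definition 7 can be generated mechanically. The Python sketch below prints them as strings in the notation of this section; it does not target the concrete input syntax of any ASP system. The encoding of dependencies, the function name `lp_rules`, the use of the TGD index in place of $\rho$ in $\mathsf{ExChoice}_\rho$, and the example TGD and EGD over `Emp` and `EmpC` are all assumptions made for illustration.

```python
def atom(rel, args):
    """Render an atom such as EmpC(x,z)."""
    return f"{rel}({','.join(args)})"


def lp_rules(tgds, egds):
    """Return the rules of LP(S) as strings, following schemas (1)-(4).

    tgds: list of (body, head, exvars); egds: list of (body, x, y).
    """
    rules = []
    for idx, (body, head, exvars) in enumerate(tgds, start=1):
        body_s = ", ".join(atom(*a) for a in body)
        if not exvars:                                   # schema (1)
            rules += [f"{atom(*b)} :- {body_s}." for b in head]
            continue
        head_vars = {v for _, args in head for v in args}
        body_vars = dict.fromkeys(v for _, args in body for v in args)
        frontier = [v for v in body_vars if v in head_vars]
        ex = atom(f"ExChoice_{idx}", frontier + list(exvars))
        dom = ", ".join(f"Dom({z})" for z in exvars)
        rules.append(                                    # schema (2)
            f"{ex} :- {body_s}, {dom}, "
            f"choice(({','.join(frontier)}),({','.join(exvars)})).")
        rules += [f"{atom(*b)} :- {ex}." for b in head]  # schema (3)
    for body, x, y in egds:                              # schema (4)
        body_s = ", ".join(atom(*a) for a in body)
        # "bot" stands for the falsity symbol in the head of a constraint
        rules.append(f"bot :- {body_s}, {x} != {y}.")
    return rules


# Hypothetical dependencies: Emp(x) -> exists z EmpC(x,z), and an EGD
# stating that EmpC is functional in its first argument.
example_tgds = [([("Emp", ("x",))], [("EmpC", ("x", "z"))], ["z"])]
example_egds = [([("EmpC", ("x", "y")), ("EmpC", ("x", "w"))], "y", "w")]
program = lp_rules(example_tgds, example_egds)
```

The extensional database of schemas (5) and (6) would then add one fact per constant in S and one per source fact.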
Example 6 Considering the data exchange setting $\mathcal{S}$ and the source instance I of $\mathcal{S}$ from Example 2, we have that $\mathsf{LP}(\mathcal{S})$ is the following logic program:
Intuitively, the choice rule associated to the TGD $\rho_1$ is in charge of nondeterministically assigning a certain value to the existential variables of $\rho_1$ , for each value its frontier variables can take, that is, the choice rule essentially builds an ex-choice for $\rho_1$ . Once the ex-choice is constructed, the rule $\mathsf{EmpC}(x,z) \text{ :- } \mathsf{ExChoice}_{\rho_1}(x,z)$ simply propagates these choices to the head of $\rho_1$ , as needed. All other TGDs have no existential quantification, and so use no choice construct. Finally, the only EGD $\eta$ is converted to a constraint, so that the stable models of the logic program satisfy $\eta$ .
We are now ready to prove Theorem 9.
Proof of Theorem 9. Given an instance I over a schema $\mathsf{S}$ , and a schema $\mathsf{S}' \subseteq \mathsf{S}$ , we use $I[\mathsf{S}']$ to denote the restriction of I to only its facts referring to relations in $\mathsf{S}'$ . Notice that for every query Q over $\mathsf{S}'$ , the following holds: $Q(I)=Q(I[\mathsf{S}'])$ .
Consider a data exchange setting $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ , a source instance I of $\mathcal{S}$ , a query Q over $\mathsf{T}$ , and the set of constants S as defined in Lemma 10 w.r.t. $\mathcal{S}$ , I, and Q.
Let $\mathcal{P}=\mathsf{LP}(\mathcal{S})$ and $\mathit{ED}=\mathsf{ED}(\mathcal{S},I,Q)$ .
We want to show $ \mathsf{cans}_{\mathcal{P}}(\mathit{ED},Q) = \mathsf{scert}_{\mathcal{S}}(I,Q).$ Leveraging Lemma 10, we show that $\{M[\mathsf{T}] \mid M \in \mathsf{SM}(\mathcal{P}_{\mathit{ED}})\} = \mathsf{ssol}(I,\mathcal{S})_{\downarrow S}.$
(1) In the following, we show $\{M[\mathsf{T}] \mid M \in \mathsf{SM}(\mathcal{P}_{\mathit{ED}})\} \subseteq \mathsf{ssol}(I,\mathcal{S})_{\downarrow S}.$ Let $X \in \{M[\mathsf{T}] \mid M \in \mathsf{SM}(\mathcal{P}_{\mathit{ED}})\}$ and M be a stable model in $\mathsf{SM}(\mathcal{P}_{\mathit{ED}})$ such that $X=M[\mathsf{T}]$ .
Let $\gamma$ be an ex-choice defined as follows: given a TGD $\rho = \varphi({\textbf {x}},{\textbf {y}}) \rightarrow \exists {\textbf {z}}\, \psi({\textbf {y}},{\textbf {z}})$ in $\Sigma_{st}\cup \Sigma_t$ and a tuple $\mathbf{t} \in \mathsf{Const}^{|{\textbf {y}}|}$ , $\gamma$ returns a set $\gamma(\rho,\mathbf{t})$ of pairs of the form $(z_i,c)$ , one for each existential variable $z_i \in {\textbf {z}}$ , where c is defined as follows: if $\mathsf{ExChoice}_\rho(\mathbf{t},c_1,\dots,c_k)\in M$ , then $c=c_i$ , otherwise c is an arbitrary constant of S.
It is easy to see that $\mathsf{\mathsf{U}}(\mathcal{P}_{\mathit{ED}})=S$ , and thus X contains only constants in S. Moreover, $I \cup X$ satisfies $\Sigma_{st}^\gamma$ , because otherwise M would not satisfy some ground version of the rules derived from the TGDs in $\Sigma_{st}^\gamma$ . Also, X satisfies $\Sigma_t^\gamma$ , because otherwise M would not satisfy some ground version of the rules derived from the TGDs/EGDs in $\Sigma_t^\gamma$ .
Since every stable model is also a minimal model, the minimality of M ensures that there is no $J' \subsetneq X$ such that $I \cup J'$ satisfies $\Sigma_{st}^\gamma$ and J’ satisfies $\Sigma_t^\gamma$ . Thus, X is a supported solution of I w.r.t. $\mathcal{S}$ containing only constants in S.
(2) We now show $\{M[\mathsf{T}] \mid M \in \mathsf{SM}(\mathcal{P}_{\mathit{ED}})\} \supseteq \mathsf{ssol}(I,\mathcal{S})_{\downarrow S}.$ Let $J \in \mathsf{ssol}(I,\mathcal{S})_{\downarrow S}$ and $\gamma$ be the ex-choice for which $I \cup J$ satisfies $\Sigma_{st}^\gamma$ and J satisfies $\Sigma_t^\gamma$ . Let $X = I \cup J \cup \{\mathsf{Dom}(c) \mid c \in S\}$ . We show that $X \in \mathsf{SM}(\mathcal{P}_{\mathit{ED}})$ .
First, X satisfies each ground rule in $\mathsf{ground}(\mathcal{P})$ of the form $\beta_i \text{ :- } \alpha_1,\ldots,\alpha_n$ (cf. (1) in Definition 7), because otherwise the TGD of the form $\alpha_1 \wedge \cdots \wedge \alpha_n \rightarrow \beta_1 \wedge \cdots \wedge \beta_m$ in $\Sigma_{st}^\gamma$ or $\Sigma_t^\gamma$ would not be satisfied by $I \cup J$ or J, respectively.
Also, X satisfies the ground rules in $\mathsf{ground}(\mathcal{P})$ of the form (2)–(3) in Definition 7, derived from a TGD having existential variables, because otherwise such a TGD would not be satisfied by either $I \cup J$ or J, or J would not be minimal.
Further, X satisfies each ground constraint of the form $\perp \text{ :- } \alpha_1,\ldots,\alpha_n, x \neq y$ (cf. (4) in Definition 7) in $\mathsf{ground}(\mathcal{P})$ as otherwise J would not satisfy the EGD $\alpha_1 \wedge \cdots \wedge \alpha_n \rightarrow x = y$ in $\Sigma_t^\gamma$ .
Then, X satisfies each rule in $\mathit{ED}$ of the form (5) of Definition 7 because $\{\mathsf{Dom}(c) \mid c \in S\} \subseteq X$ .
Finally, X satisfies each rule in $\mathit{ED}$ of the form (6) of Definition 7 because X contains I.
By the minimality of J we obtain the minimality of X, and thus, X is a stable model of $\mathcal{P}_{\mathit{ED}}$ . Noting that $X[\mathsf{T}] =J$ , we conclude that $J\in \{M[\mathsf{T}] \mid M \in \mathsf{SM}(\mathcal{P}_{\mathit{ED}})\}$ . $\Box$
6 Approximate query answering via materialization
As already discussed in the introduction, there might exist scenarios where it is desirable to materialize a target instance starting from the source instance and the schema mapping, in such a way that supported certain query answers can be computed by considering the target instance alone. The goal of this section is thus to study the problem of materializing such an instance, when focusing on our notion of supported solutions.
It would be very useful if such a special target instance could be computed in polynomial time, already for weakly-acyclic settings. However, due to Theorem 8, this would imply $\text{PTIME}=\text{co}\text{NP}$ . Hence, we need something different.
We introduce a special instance that enjoys the following properties: the answers over this instance are an approximation (i.e., a subset) of the supported certain answers for general queries, but they coincide with supported certain answers for positive queries. We also show that we can compute such an instance in polynomial time for weakly-acyclic settings.
Our approach relies on conditional instances (Imielinski and Lipski 1984), which we introduce in the following.
Conditional instances. A valuation $\nu$ is a mapping from $\mathsf{Const} \cup \mathsf{Null}$ to $\mathsf{Const}$ that is the identity on $\mathsf{Const}$ . A condition $\phi$ is an expression that can be built using the standard logical connectives $\wedge$ , $\vee$ , $\neg$ , $\Rightarrow$ , and expressions of the form $t = u$ , where $t,u \in \mathsf{Const} \cup \mathsf{Null}$ . We will also use $t \neq u$ as a shorthand for $\neg (t = u)$ . We write $\nu \models \phi$ to state that $\nu$ satisfies $\phi$ , and $\phi \models \psi$ if all valuations satisfying $\phi$ satisfy the condition $\psi$ . A conditional fact is a pair $\langle \alpha,\phi \rangle$ , where $\alpha$ is a fact and $\phi$ is a condition. A conditional instance $\mathcal{I}$ is a finite set of conditional facts. We also denote $\mathcal{I}[1] = \{\alpha \mid \langle \alpha,\phi \rangle \in \mathcal{I}\}$ . A possible world of a conditional instance $\mathcal{I}$ is an instance I such that there exists a valuation $\nu$ with $I = \{ \nu(\alpha) \mid \langle \alpha, \phi\rangle \in \mathcal{I} \text{ and } \nu \models \phi\}$ . We use $\mathsf{pw}(\mathcal{I})$ to denote the set of all possible worlds of $\mathcal{I}$ .
Definition 8 Consider a conditional instance $\mathcal{I}$ and a query Q. The conditional certain answers of Q over $\mathcal{I}$ is the set $\mathsf{con\text{-}cert}(\mathcal{I},Q) = \bigcap_{J \in \mathsf{pw}(\mathcal{I})} Q(J)$ .
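The definitions of possible worlds and conditional certain answers can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on assumptions that are not in the text: facts are tuples `(relation, term, ...)`, nulls are strings starting with `_`, conditions are Python predicates over a valuation, and valuations range only over a small finite pool of constants (whereas $\mathsf{Const}$ is not restricted in this way). The names `eq`, `possible_worlds`, and `con_cert` are hypothetical.

```python
from itertools import product

def eq(t, u):
    """Condition t = u, evaluated under a valuation nu (identity on constants)."""
    return lambda nu: nu.get(t, t) == nu.get(u, u)

def true(nu):
    """The trivially satisfied condition."""
    return True

def apply_valuation(nu, fact):
    """Replace every null in the fact by its image under nu."""
    rel, *args = fact
    return (rel, *[nu.get(t, t) for t in args])

def possible_worlds(cond_instance, nulls, constants):
    """Possible worlds induced by valuations of `nulls` into `constants`.

    Assumption: the finite pool `constants` stands in for Const.
    """
    worlds = set()
    for values in product(constants, repeat=len(nulls)):
        nu = dict(zip(nulls, values))
        world = frozenset(apply_valuation(nu, fact)
                          for fact, cond in cond_instance if cond(nu))
        worlds.add(world)
    return worlds

def con_cert(cond_instance, nulls, constants, query):
    """Intersect the answers of `query` over all enumerated possible worlds."""
    answers = None
    for world in possible_worlds(cond_instance, nulls, constants):
        result = set(query(world))
        answers = result if answers is None else answers & result
    return answers or set()

# Hypothetical conditional instance: AllOrd(o1) holds unconditionally,
# Paid(_n) holds only in worlds where the null _n equals o1.
cond_instance = [
    (("AllOrd", "o1"), true),
    (("Paid", "_n"), eq("_n", "o1")),
]

def all_orders(world):
    return {(f[1],) for f in world if f[0] == "AllOrd"}

def paid_orders(world):
    return {(f[1],) for f in world if f[0] == "Paid"}

certain_all = con_cert(cond_instance, ["_n"], ["o1", "o2"], all_orders)
certain_paid = con_cert(cond_instance, ["_n"], ["o1", "o2"], paid_orders)
```

In this toy instance the valuation sending `_n` to `o2` yields a world without any `Paid` fact, so no paid order is certain, while `o1` remains a certain answer to the query over `AllOrd`.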
We are now ready to introduce our main tool.
Definition 9 (Approximate Conditional Solution)
Consider a data exchange setting $\mathcal{S}$ and a source instance I of $\mathcal{S}$ . A conditional instance $\mathcal{J}$ is an approximate conditional solution of I w.r.t. $\mathcal{S}$ , if for every query Q:

1. $\mathsf{ssol}(I,\mathcal{S}) \subseteq \mathsf{pw}(\mathcal{J})$ , and thus $\mathsf{con\text{-}cert}(\mathcal{J},Q) \subseteq \mathsf{scert}_{\mathcal{S}}(I,Q)$ , and

2. if Q is positive, $\mathsf{con\text{-}cert}(\mathcal{J},Q) = \mathsf{scert}_{\mathcal{S}}(I,Q)$ .
That is, an approximate conditional solution is a conditional instance that allows one to compute approximate answers for general queries, and exact answers for positive queries.
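The inclusion stated in item 1 can be spelled out as follows, assuming (as the notation suggests) that the supported certain answers are defined as $\mathsf{scert}_{\mathcal{S}}(I,Q) = \bigcap_{J \in \mathsf{ssol}(I,\mathcal{S})} Q(J)$ . Since every supported solution is a possible world of $\mathcal{J}$ , the intersection defining the conditional certain answers ranges over a superset of the supported solutions, and intersecting over more instances can only shrink the result:

$$\mathsf{con\text{-}cert}(\mathcal{J},Q) \;=\; \bigcap_{J \in \mathsf{pw}(\mathcal{J})} Q(J) \;\subseteq\; \bigcap_{J \in \mathsf{ssol}(I,\mathcal{S})} Q(J) \;=\; \mathsf{scert}_{\mathcal{S}}(I,Q).$$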
It is easy to observe that there are settings $\mathcal{S} = \langle \mathsf{S}, \mathsf{T}, \Sigma_{st}, \Sigma_t \rangle$ and source instances I for which an approximate conditional solution might not exist, even if $\mathcal{S}$ is weakly-acyclic. This is due to the presence of EGDs in $\Sigma_t$ .
However, for weakly-acyclic settings without EGDs, an approximate conditional solution always exists, and we present a polynomial-time algorithm that is able to construct one. We show how to deal with general weakly-acyclic settings with EGDs in Section 7.
The algorithm is a variation of the well-known chase algorithm, which starts from a source instance and iteratively introduces new facts whenever a TGD is not satisfied, that is, whenever the TGD is triggered. This variation also allows for a conditional triggering of TGDs, where new atoms are introduced under the condition that some terms in the body coincide.
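For orientation, the following Python sketch shows the standard (unconditional) chase for TGDs that the variation builds on; it does not implement the conditional triggering introduced in this section. It assumes a hypothetical encoding: facts and atoms are tuples `(relation, term, ...)`, variables are strings starting with `?`, fresh nulls are named `_n1`, `_n2`, ..., and a TGD is a triple `(body, head, existential_variables)`. The bound `max_steps` is an added safeguard, not part of the chase itself.

```python
from itertools import count

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def matches(atoms, instance, h=None):
    """Yield every homomorphism extending h that maps all atoms into instance."""
    h = h or {}
    if not atoms:
        yield dict(h)
        return
    (rel, *terms), rest = atoms[0], atoms[1:]
    for fact in instance:
        if fact[0] != rel or len(fact) != len(terms) + 1:
            continue
        g = dict(h)
        for t, c in zip(terms, fact[1:]):
            if is_var(t):
                if g.setdefault(t, c) != c:   # variable already bound elsewhere
                    break
            elif t != c:                      # constant mismatch
                break
        else:
            yield from matches(rest, instance, g)

def chase(instance, tgds, max_steps=1000):
    """Apply triggered TGDs until none is violated (or max_steps is hit)."""
    instance = set(instance)
    nulls = (f"_n{i}" for i in count(1))
    for _ in range(max_steps):
        trigger = None
        for body, head, exist_vars in tgds:
            for h in matches(body, instance):
                # The TGD is triggered if h cannot be extended to the head.
                if next(matches(head, instance, h), None) is None:
                    trigger = (head, exist_vars, h)
                    break
            if trigger:
                break
        if trigger is None:
            return instance
        head, exist_vars, h = trigger
        # Existential variables are instantiated with fresh nulls.
        g = {**h, **{z: next(nulls) for z in exist_vars}}
        instance |= {(rel, *[g.get(t, t) for t in terms]) for rel, *terms in head}
    raise RuntimeError("chase did not terminate within max_steps")

# Hypothetical TGD: Emp(x) -> exists z. Mgr(x, z)
tgds = [([("Emp", "?x")], [("Mgr", "?x", "?z")], ["?z"])]
result = chase({("Emp", "ann")}, tgds)
```

Here the single TGD is triggered once for `ann`, a fresh null is introduced for the existential variable, and the chase then stops because the TGD is satisfied.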
Normal TGDs. To simplify the discussion, we consider an extension of TGDs that allows for equality predicates in the body. We will use these TGDs to rewrite standard TGDs in the following normal form. A normal form TGD $\rho$ is an expression of the form $\varphi({\textbf {x}},{\textbf {y}}) \wedge \eta({\textbf {x}},{\textbf {y}}) \rightarrow \exists {\textbf {z}}\, \psi({\textbf {y}},{\textbf {z}})$ , where $\varphi$ and $\psi$ are conjunctions of atoms, $\varphi$ uses only variables, and each variable in $\varphi$ occurs exactly once in $\varphi$ . The formula $\eta$ is a conjunction of equalities of the form $x=t$ , where x is a variable in ${\textbf {x}}$ or ${\textbf {y}}$ , and t is either a variable in ${\textbf {x}}$ or ${\textbf {y}}$ , or a constant. The above equalities denote which variables should be considered to be the same and which positions should contain a constant. A (set of) standard TGDs $\Sigma$ can be converted into normal form in the obvious way. We use $\mathsf{norm}(\Sigma)$ to denote the (set of) TGDs in normal form obtained from $\Sigma$ .
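One plausible reading of the conversion "in the obvious way" is sketched below in Python; the text does not spell out the procedure, so this is an assumption. Atoms are encoded as tuples `(relation, term, ...)`, variables are strings starting with `?`, and every other term is a constant. The first occurrence of each variable is kept, so the head can stay unchanged; every repeated variable and every constant is replaced by a fresh variable together with an equality. The function name `normalize_body` and the fresh names `?v1`, `?v2`, ... are hypothetical, and the input is assumed not to use names of that shape already.

```python
from itertools import count

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def normalize_body(body):
    """Return (phi, eta): phi has only variables, each occurring once;
    eta lists the equalities (fresh_variable, original_term)."""
    fresh = (f"?v{i}" for i in count(1))
    seen = set()
    phi, eta = [], []
    for rel, *terms in body:
        new_terms = []
        for t in terms:
            if is_var(t) and t not in seen:
                # First occurrence of a variable: keep it as is.
                seen.add(t)
                new_terms.append(t)
            else:
                # Repeated variable or constant: introduce a fresh variable
                # and record the equality that ties it to the original term.
                v = next(fresh)
                eta.append((v, t))
                new_terms.append(v)
        phi.append((rel, *new_terms))
    return phi, eta

# Hypothetical body R(x, x), S(x, a) with constant a.
phi, eta = normalize_body([("R", "?x", "?x"), ("S", "?x", "a")])
```

For this body the sketch produces $R(x, v_1) \wedge S(v_2, v_3)$ with the equalities $v_1 = x$, $v_2 = x$, and $v_3 = a$, which matches the shape $\varphi \wedge \eta$ required above.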
In the following, fix a conditional instance $\mathcal{I}$ , a TGD $\rho$ with $\mathsf{norm}(\rho) = \varphi({\textbf {x}},{\textbf {y}}) \wedge \eta({\textbf {x}},{\textbf {y}}) \rightarrow \exists {\textbf {z}}\, \psi({\textbf {y}},{\textbf {z}})$ , and a homomorphism h from $\varphi({\textbf {x}},{\textbf {y}})$ to $\mathcal{I}[1]$ . We use $h(\eta({\textbf {x}},{\textbf {y}}))$ to denote the condition obtained from $\eta({\textbf {x}},{\textbf {y}})$ by replacing each variable x therein with h(x). Letting $h(\varphi({\textbf {x}},{\textbf {y}})) = \{\alpha_1,\ldots,\alpha_n\}$ , we use $\Phi^{\mathcal{I}}_{\rho,h}$ to denote the set of all conditions of the form