What do we want to do with a description logic?

Concept Subsumption

Is CD definitely true, definitely false, or unknown?

Satisfiability

Is there a model of our KB such that a given concept is non-empty? Equivalently, is C ⊑ ⊥ true, false, or unknown?

CD iff CDDC

Disjointness

C and D are disjoint if CD ⊑ ⊥; equivalently if their intersection is empty in every model.

Classification

Organise all the mentioned concepts into an explicit hierarchy. Depends on subsumption.

If we can classify concepts, we can classify individuals.

Consistency checking

Can we conclude xCxC for some individual and concept?

Can we conclude (x,y) ∈ r ∧ (x,y) ∉ r for some individuals and rôle?

Could there be a non-empty ABox for a given TBox?

Classify individuals

Is xC definitely true, definitely false, or unknown?

Take all the assertions xAi and test whether A1 ⊓ ⋯ ⊓ AnC.

Find individuals

Find all individuals x such that xC is known to be true.

Description (realisation)

Given an individual and a set of concepts, find the most specific concept that the individual belongs to.

AL

Concepts can be

• A — atomic concepts (names)
• ¬ A — negated atomic concept
• CD — intersection
• r.⊤ (limited existential)
• r.C (value restriction)

There is no union, and negation cannot be applied to anything but an atomic concept. None of the operations of the algebra of binary relations are available for rôles.

There is a family of languages obtained by extending AL. They are named AL[U][E][N][C] where the name includes

• U if union CD is allowed
• E if full existential quantification ∃ r.C is allowed
• N if number restrictions ≥nR and ≤nR are allowed.
• C if general complements ¬C are allowed.

C implies U (why? de Morgan) and E (why?), while U and E imply C (by expanding out to negation normal form). So ALUEC is abbreviated ALC and ALEUNC is abbreviated ALCN.

In AL we can write

Concepts:
Person, Pet, Dog, Bird, Male, Female
Roles:
tends

Axioms:
Person ⊑ ∀ tends.Pet
∃ tends.⊤ ⊑ Person
Person ⊓ Pet ⊑ ⊥
Dog ⊑ Pet
Bird ⊑ Pet
Dog ⊓ Bird ⊑ ⊥
Male ⊓ Female ⊑ ⊥

ABox:
richard ∈ Person ⊓ Male
lily ∈ Dog ⊓ Female
jazzie ∈ Bird ⊓ Female
chassie ∈ Bird ⊓ Male
(richard,lily) ∈ tends
(richard,jazzie) ∈ tends
(richard,chassie) ∈ tends

Note that (richard ∈ Person ⊓ Male) could be simplified to (richard ∈ Male) because (richard,lily) ∈ tends → richard ∈ ∃ tends.⊤ → richard ∈ Person.

In 1991 it was shown that there is a decision algorithm for ALC satisfiability (hence also ALC subsumption). Since then sound and complete algorithms for several extensions of ALC have been found: number restrictions, transitive rôles and transitive closure of rôles.

However...

Decidable is not the same as tractable. For example, the satisfiability problem for propositional calculus is NP-complete. That is, if you N(ondeterministically guess) a model for a formula, you can verify that it satisfies the formula in P(olynomial time), but nobody how to manage without guessing in less than exponential time.

If you have a problem that is NP-complete or worse, you can

• Throw in any practical improvements you can think of and hope that no hard problems will turn up.
• Use some kind of approximation.
• Use a randomised algorithm with a specified probability of failing.
• Look for a restriction that makes the problem easier.

There is a description logic in which subsumption is not just decidable, it's polynomial time. This is EL.

Concepts can be

• A — atomic concepts
• CD
• r.C (existential restriction)

Terminology axioms are

• CD
• CD

This looks rather weak. On the other hand, I've seen something very like this before. There is a graphical notation for first order logic and a set of operations on such graphs called Conceptual Graphs, described in a book by John Sowa. The formulas that can be modelled simply in this way are the existential-conjunctive ones. There is a research group at Montpellier University who have been studying the existential-conjunctive fragments of first-order logic. If you say that a formula is one of

• A literal whose arguments are just variables
• α ∧ β where α and β are formulas
• x α where α is a formula

then “does α entail β” is decidable (unlike the case in general logic), but very very hard. Technically, it is in Π2P.

But if we don't just limit negation to logical atoms, and forbid it completely, the entailment problem is “only” NP complete.

This should remind you of the relationship between ALC and EL. (The entailment problem and the subsumption problem are basically the same.)

Dr Labuschagne has talked to you about belief revision. If you have a knowledge base in existential-conjunctive form, then no matter how many positive facts you add, you will never get a contradiction. Belief revision is simple growth.

This immediately tells us that EL cannot accept axioms like

Person ⊓ Pet ⊑ ⊥

Well, it's obvious: ⊥ is not allowed by EL syntax. But more importantly, there's no other way to express the same intention.

Unification

There is another thing we might want to do with formulas in a description logic. Suppose we have a concept C expressed using concept variables V... and a second concept D

• Asking whether there is a substitution of concept formulas for variables making CD, when D does not contain concept variables, is the matching problem.
• Asking whether there is a substitution of concept formulas for variables making CD, when D may also contain concept variables, is the unification problem.

Franz Baader and Barbara Morawska have shown that both unification and matching are NP-complete for EL. If you allow ∀ r.C instead of ∃ r.C, unification is still decidable, but it is EXPTIME-complete. We don't know if P=NP and we don't know if NP=EXPTIME, but we do know that P≠EXPTIME, and it's widely suspected that NP≠EXPTIME.

EL is tractable, but it's not very expressive. But it's nearly expressive enough.

EL++

EL++ is the logic underlying the SNOMED Clinical Terms ontology and the Gene Ontology. SNOMED contains several hundred thousand concepts and well over a million relationships between them. At this sort of scale, “tractable” matters. (SNOMED is not the largest medical ontology around, but we have a copy of it.)

Here are the forms of EL++, using a, b, ... to stand for individuals, r (with or without subscripts) to stand for rôles, f to stand for features (functions mapping individuals to concrete domains such as numbers or strings, p to stand for predicates over concrete domains (such as ≤), and C, D, ... to stand for concepts.

top
bottom
nominal
{a}
conjunction
CD
existential restriction
r.C
concrete domain
p(f1,...fn)

We haven't seen “concrete domain” before. The interpretation of such a form is {x ∈ Δ | p(f1(x),...,fn(x))}. There's a link here with Entity-Relationship modelling in relational data bases. Consider a tiny fragment of a University data base:

Entities:
Student, Paper
Relationships:
Takes(Student, Paper)
Attributes:
Name : Student → string
Date_Of_Birth : Student → date
Debt : Student → money
Title : Paper → string
Level : Paper → number
Points : Paper → number

Some things are modelled as “entities”, rather like objects in an OO domain model. As in this case, sometimes entities refer to concrete physical things like students and sometimes to social constructs like papers. Other things are modelled as mathematical values like strings and numbers. In an E-R model, an entity type like Student or Paper corresponds to a table, while a value type like string or number doesn't. An association between entities is called a relationship and also corresponds to a table (which has foreign keys pointing to the entities). An association between an entity and a value is represented by a column in the entity's table.

In the same way, EL++ and RDF let you model some things as “individuals” (the RDF name is “resource”, but it's the same idea) and other things as values (numbers or strings). And that's what features are for.

Here are the axioms you can put in the TBox:

concept inclusion
CD
rôle inclusion
r1 ∘ ⋯ ∘ rnr
domain restriction
dom(r) ⊑ C
range restriction
ran(r) ⊑ D

In rôle inclusions, the case n=0 is allowed; the identity for the composition of binary relations is the identity function, here written ε.

Here are the assertions you can put in the ABox:

concept assertion
aC
rôle assertion
(a,b) ∈ r

Rôle inclusions let us express

• rôle hierarchies
• rôle equivalences
• transitive rôles: r ∘ rr
• reflexive rôles: ε ⊑ r

Here are some examples.

father ⊑ parent
mother ⊑ parent
ε ⊑ ancestor
parent ⊑ ancestor
ancestor ∘ ancestor ⊑ ancestor
parent ⊑ looks_after
tends ⊑ looks_after

If y is x's father, y is one of x's parents. Being the parent of a child and the owner of a dog are similar (you have to provide your children and your animals with suitable food, drink, and shelter) but different (you can sell your dog but not your daughter). We may need to model the similarities so that we can draw correct inferences (how many creatures is Richard responsible for feeding?) and we may also need to model the differences so that we don't draw incorrect inferences.

The empty concept ⊥ together with concept inclusions gives us disjointness of concepts, as before.

Something we couldn't do before: {a} ⊑ {b} expresses “a and b are identical” and {a} ⊓ {b} ⊑ ⊥ expresses “a and b are different”. This means that we can allow a=b and a≠b in the language without extending its power.

We can now express “Richard has these three pets”:

Concepts:
Person, Pet
Roles:
tends
Axioms:
dom(tends) ⊑ Person
ran(tends) ⊑ Pet
Person ⊓ Pet ⊑ ⊥
lily ≠ jazzie, lily ≠ chassie, jazzie ≠ chassie.
ABox:
(richard,lily) ∈ tends
(richard,jazzie) ∈ tends
(richard,chassie) ∈ tends

and now you can safely conclude that I have at least three pets, without knowing anything about their sex or genus.

There is an additional limitation on EL++, and it's rather technical. It's a beautiful example (in its own horrible way) of just how delicate the differences between complexity classes can be.

Define T ⊨ rs iff there are rôle inclusions r1r2, r2r3, ... rn-1rn in T with r=r1 and rn=s.

Define T ⊨ ran(r) ⊑ C iff there is a rôle name s such that T ⊨ rs and the range restriction ran(s) ⊑ C is in T.

We need to impose an extra restriction on the whole TBox.

If r1 ∘ ⋯ ∘ rns is in T, with n≥1
and T ⊨ ran(s) ⊑ C
then T ⊨ ran(rn) ⊑ C.

Without this restriction, subsumption in EL++ is undecidable. With it, subsumption can be done in polynomial time.

Sadly, we can't add rôle inverse or symmetric rôles without losing tractability.

Credit

Much of this is cribbed from papers by Franz Baader and his colleagues.