XsRQL: an XQuery-style Query Language for RDF

A submission to the RDF Data Access Working Group (DAWG)

Howard Katz <howardk@fatdog.com>
June 27, 2004

Introduction
Language objectives
This document
Feature summary
XsRQL path language
A brief tutorial
Examples
Example 1: Marriage partners
Example 2: Plotting people on a map
Example 3: Finding hypernyms with WordNet
Example 4: Libby Miller's papers
Example 5: Libby's mailbox
Example 6: Libby's optional mailbox
XsRQL grammar

Introduction

XsRQL is a query language for RDF that derives much of its syntax and style from XQuery, hence its name: an XQuery-style RDF Query Language. The idea is to reuse many of the useful and innovative features and metaphors the XML Query Working Group spent so many thousands of hours developing, while omitting the more complex parts of the XQuery specification that are specific to XML and not required in an RDF environment.

The "style" qualifier in the name is important: XsRQL sits on top of an RDF data model and knows nothing about XML or the complexities of the XQuery data model; on the other hand, it borrows happily and unashamedly from the XQuery surface syntax, its concept of an underlying, formal data model, its functional programming metaphor, and a number of the other innovations pioneered by the XML Query working group that are described below.

The basic idea is to reuse some of the fruits of the tens of thousands of long hours and hard work the working group put into XQuery, arguably the W3C's most complex specification. In the end, RDF is far simpler than XML, and XsRQL is correspondingly far simpler than XQuery. It shamelessly steals, er, borrows, much of what's best about XQuery and ignores the rest.

I've tried to keep the amount of blue-skying in the following to a minimum (though I might not have always succeeded; some of the sample code below has yet to see silicon, and it's hard not to occasionally wax rhapsodic, particularly with a cider or two in hand.) I've implemented much of the path language in prototype form and hope to be able to demo some live code at the face-to-face in San Diego.

Language objectives

The main objectives of XsRQL are:

to keep the language as simple and as elegant as possible (something the author is not necessarily in the best position to judge),

to provide the end user the opportunity of choosing a query style that sits, at his or her own discretion, somewhere on a continuum between concise-but-readable and a more verbose style that's as self-documenting as desired, and

to allow the user a similar choice on the emit side of the equation between result-sequence concision and a full-blown, ad hoc report-generating capability.

This document

This document looks briefly at the XsRQL feature set, most of which is drawn directly or indirectly from existing mechanisms in XQuery, provides numerous code snippets illustrating the path language that's central to XsRQL, works its way through a somewhat herky-jerky tutorial overview of the language in general, and finishes by providing several illustrative examples of XsRQL queries compared and contrasted with similar queries in RDQL and other existing RDF query languages.

As well, a working prototype of an early first cut at a JavaCC grammar is attached (if only to prove that the author isn't living entirely in cloud cuckoo land.)

For those who are impatient to get up to speed and don't derive as much pleasure listening to the author speak as he does himself, I'd suggest jumping right into working code: work your way through the numerous code snippets in the sections on the XsRQL path language and the Examples.

Lastly, thanks to Andy Seaborne of HP Labs whose Jena tutorial was helpful in bringing me up to speed on RDQL, and whose look and feel so impressed me with its straightforward simplicity (the document that is, not Andy) that I have, with his concurrence, adopted its style as my own.

Feature summary

The main features of XsRQL that are drawn directly from or are inspired by XQuery are:

it's a functional language

it's based on a formal data model which understands RDFItems and XML Schema atomic types

result lists are free-form heterogeneous sequences of data model items

it has a procedural feel to it that will be readily familiar to many programmers

it adapts an XPath-style navigational metaphor to the needs of RDF graphs

it borrows XQuery's style of built-in functions

the language is typed, albeit lightly

for and let statements allow access to sequences en masse or via individual components

it allows full and partial wildcards on QNames

it has user-defined functions for extensibility

it has triple constructors, as well as a triples-selector function

reports can be built and formatted directly in the query

it provides the ability to set and interrogate query- and query-processor environment settings within a query prolog

it adds an optional result-sequence preamble, which can be used to deliver query- and query-processor-related metadata, as well as to reduce bandwidth for large uri sets.

Functional language

XsRQL is a functional language in that the output resulting from evaluating each expression in the query tree becomes input to the expression above it. The sequence that ultimately emerges from the top of the query tree is the result of the query.

This makes it possible to cascade XsRQL functions together and is part of what makes the language composable. For example:

    count( sorted( distinct( @* ) ) ) + 1

The output of distinct() (a graph-oriented, unique list of every predicate in the datastore) feeds into sorted() (which sorts it into alphabetic order if the implementation hasn't already done so in distinct()), which in turn feeds into count().
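The cascade can be mimicked outside XsRQL to see how each stage feeds the next. The following Python sketch is only an illustration: the sample predicate list and the Python stand-ins for distinct(), sorted(), and count() are assumptions, not part of any XsRQL implementation.

```python
# Hypothetical stand-in for the predicate instances that @* might return.
predicates = ["dc:title", "foaf:name", "dc:title", "foaf:mbox", "foaf:name"]

def distinct(items):
    """Drop duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

def sorted_items(items):
    """Sort by lexical value (the role played by sorted() in the query)."""
    return sorted(items)

def count(items):
    """Number of items in the sequence."""
    return len(items)

# Each call's output is the next call's input, innermost first.
result = count(sorted_items(distinct(predicates))) + 1
print(result)  # 4
```

Because every stage takes a sequence and returns a sequence (or a single value that is itself usable in an expression), the stages can be nested in any order that makes sense.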

An underlying, RDF-oriented data model

The interaction between query evaluation and the data model in XsRQL is similar to the way it works in XQuery: as expressions are evaluated, they "inject" instances of particular items into the initially unpopulated data model. As the query processor evaluates successively higher expressions on the query tree, new items come into existence or existing ones disappear. The sequence of whatever remains at the top of the tree is the result of the query.

The major difference from XQuery is that the XsRQL data model understands entities that are germane to RDF and not XML. At present, XsRQL knows about:

subjects

predicates

objects

triples

quads

resources,

literals, both constructed in the query and as object nodes, and

so-called "atomic" items (terminology from XQuery) corresponding to the 19 simple types in XML Schema

Heterogeneous result sequences

Let's walk through an almost trivial example of how queries are evaluated and result sequences created in XsRQL. Here's a very simple query:

"I like this language.", " So do I!", "+", 1

The parse tree for this query looks something like the following:

                        commaOp
                       /       \
    "I like this language."   commaOp
                            /       \
                    " So do I!"   commaOp
                                 /       \
                               "+"        1

The comma operator concatenates its lefthand operand with its righthand operand. As each item is encountered as the query processor walks the query tree, the comma operator at each stage first evaluates its lefthand side. The result of evaluating a string expression (they're all strings in this example, except for the single integer "1" at the end of the sequence) is to create a singleton sequence containing the item. The operator then combines that singleton sequence with the result of evaluating its righthand side, which causes it to recursively call the next comma operator down the line.

At the end of this sequence of recursive evaluations, the following four-item heterogeneous sequence emerges from the top of the query:

    "I like this language."-[str]  " So do I!"-[str]  "+"-[str]  1-[int]

Once query evaluation is complete, the results are serialized: String items are printed to the result stream as they're encountered in the result sequence, and the lexical string representation of the single integer value is likewise printed. The final result of this is:

I like this language. So do I! +1

It looks like a single string, but from the query processor's perspective, it's a sequence of four consecutive items in a heterogeneous result sequence. Result sequence is a more accurate term than result set, since order is often important and duplication is allowed.

Procedural feel

I'm sure some members of the XML Query working group will disagree, but both XQuery and XsRQL share a procedural "feel", at least from my perspective. This is admittedly a highly personal, "Motherhood and apple pie" kind of thing. The following snippet of code (from Example 5) should be readily understandable by anybody with a background in C, Java, or some other procedural language. It does share some declarative characteristics with other languages such as SQL (which has had a huge impact on the design of the language); my own perspective is that XQuery, and hence XsRQL, is largely procedural, with a bit of the declarative mixed in:

let $libbysMailboxes := @foaf:mbox[ "mailto:libby.miller@bristol.ac.uk" ]/*
return
    if ( count( $libbysMailboxes ) = 0 )
    then "Libby doesn't have a mailbox"
    else 
        if ( count( $libbysMailboxes ) > 1 ) or ( count( *[ $libbysMailboxes ] ) > 1 )
        then "Libby's mailbox isn't inverse functional!"
        else "Libby has a single @foaf:mbox as expected: ", $libbysMailboxes

XPath-like navigation

XsRQL adopts a navigational style of maneuvering through an RDF graph that is very similar to the way XPath navigates through XML, with a few interesting differences. The main difference is that RDF is not XML, and the entities you specify in an XsRQL path are RDF entities, not XML ones. XsrPath (so-called) knows about such RDF concepts as subjects, predicates, and objects, as well as various node types: uri-addressable resources, bnodes, and literals, as well as triples and quads. It allows both an instances-, triples-based view of the datastore, as well as a graph-based view, depending on the user's needs and preferences.

Paths can be of any length, from a single node or predicate on up. A "striped" style of alternating nodes and "@"-prefixed predicates makes it easy to orient yourself visually as you move down the path.

What might most surprise those familiar with XPath is that the "attributes" in XsrPath are not terminal leaves. In XQuery/XPath, attributes are leaves; they terminate a path. In XsRQL, they simply mark property arcs that are way stations on the way to somewhere else.

Non-XPath 2.0 users are sometimes surprised to see such strange things in the path as function calls and constructed elements, such as in:

    doc( "bib.xml" )/bib/books-with-editors( book )/editor

This XPath says "Call the user-defined function, books-with-editors(), passing in all <book> children of the <bib> root in the document "bib.xml", and return the <editor> children of those books that have <editor> children. Once that function returns its <book>-sequence result, dereference that and return the <editor>s themselves."

(Wonderful example tho this might be, it's a no-op: the same path without the inserted function would work just as well. This is called pedagogy.)

XsrPaths can likewise contain embedded functions and triple constructors.

Easily expandable list of built-in functions

XQuery supports roughly 150 built-in functions. XsRQL could easily cherry-pick a dozen or two of the most useful of these, adapting them to an RDF context where necessary. Interesting to note: by rough count more than half of these are in place to support operations on XML Schema datatypes.

My current, very immature prototype of an XsRQL processor implements the following built-ins at this point:

chr( xsd:integer ) -> xsd:string
count( item* ) -> xsd:integer
distinct( item* ) -> item*
empty( item* ) -> xsd:boolean
exists( item* ) -> xsd:boolean
string-length( xsd:string ) -> xsd:integer
quads( RDFItem* ) -> RDFItem*
sorted( item* ) -> item*
triples( RDFItem* ) -> RDFItem*

The argument types above are part of a type-expression-language subset of the grammar that's yet to be worked out in its entirety (though I don't have any concerns this will cause any great difficulty).

Lightly typed

It's my personal belief that some degree of typing and type-checking is a good thing. At the very least, we can use it to document what sort of operands need to be delivered to built-in and user-defined functions, as shown above, and enforce that in code if desired. I'm of the opinion that XsRQL should be "lightly typed," if for no other reason than to inform the user when he or she is doing something that's patently foolish or doesn't make sense.

If that perspective is adopted, it will be interesting for the working group to work out what to do when operators have type-mismatched operand types, something that occurs primarily in comparisons and arithmetic expressions. The possible choices seem to be:

return an empty sequence -- ie, fail silently
automatically cast or promote one operand to the type of the other -- ie, succeed silently, or
throw an irrevocable hard exception

The fact that one or both operands can also be either singleton or multi-valued sequences adds a further wrinkle. Because of the complexity caused by adding XML nodetypes to the mix, this caused the XQuery group no end of time and effort in determining how to handle all eventualities. I don't think however that it should be all that difficult to do with the much simpler RDF data model. I definitely think it's worth doing.
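To make the three choices concrete, here is a small Python sketch of a less-than comparison under each policy. Everything in it is an assumption made for illustration: the MismatchPolicy names, the less_than() helper, and the decision to promote both operands to a float are mine, not anything the language has settled on.

```python
from enum import Enum

class MismatchPolicy(Enum):
    EMPTY = "empty"      # fail silently: return an empty sequence
    PROMOTE = "promote"  # succeed silently: cast operands to a common type
    ERROR = "error"      # throw a hard exception

def less_than(left, right, policy):
    """Compare two atomic values; the result is a sequence (a list) of booleans."""
    if type(left) is type(right):
        return [left < right]
    if policy is MismatchPolicy.EMPTY:
        return []
    if policy is MismatchPolicy.PROMOTE:
        # Assumption: promote both operands to float for the comparison.
        return [float(left) < float(right)]
    raise TypeError(
        f"cannot compare {type(left).__name__} with {type(right).__name__}"
    )

print(less_than(3, "12", MismatchPolicy.EMPTY))    # []
print(less_than(3, "12", MismatchPolicy.PROMOTE))  # [True]
```

The sketch only handles singleton operands; extending it to multi-valued sequences is exactly the further wrinkle mentioned above.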

Full and partial wildcards on QNames

This query uses a full wildcard in the subject position, saying, "Get me all subjects of a dc:title predicate":

declare prefix dc: = <http://purl.org/dc/elements/1.1/>; 
*[ @dc:title ]

This query uses a partially wildcarded predicate to say, "Get me all the Dublin Core predicates, and only the Dublin Core predicates, in the datastore".

declare prefix dc: = <http://purl.org/dc/elements/1.1/>; 
@dc:*
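One plausible way for a processor to test a full or partial wildcard is to expand the prefix and compare against the namespace uri. The Python sketch below assumes exactly that; the matches() helper and the sample predicate list are hypothetical and only illustrate the idea.

```python
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Prefix-to-uri mappings, as a query prolog would declare them.
prefixes = {"dc": DC, "foaf": FOAF}

# Hypothetical predicate uris found in a datastore.
predicates = [DC + "title", FOAF + "name", DC + "creator", FOAF + "mbox"]

def matches(pattern, uri, prefixes):
    """pattern is '*', 'prefix:*', or 'prefix:local'."""
    if pattern == "*":
        return True                       # full wildcard
    prefix, local = pattern.split(":", 1)
    namespace = prefixes[prefix]
    if local == "*":
        return uri.startswith(namespace)  # partial wildcard
    return uri == namespace + local       # fully specified QName

dublin_core = [p for p in predicates if matches("dc:*", p, prefixes)]
print(dublin_core)
```

With the sample data, only the two Dublin Core predicates survive the partial wildcard, while "*" would let all four through.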

User-defined functions for extensibility

Coming soon ...

Triple constructors and functions

XsRQL has two mechanisms for inserting triples into the result sequence. The first, a triples constructor syntax, provides a fairly free-form method of generating new triples, either built completely from scratch or partially or fully seeded by existing values. The second mechanism, a built-in triples() function, returns triples that already exist in the datastore, seeding the function with a single argument.

The main distinction between the two is that triple constructors let you specify all three subject, predicate, and object positions using either constants or XsRQL path-language expressions. This triples-generating capability can be used both to return existing triples and to create new ones, and makes it possible to do XSLT-like transformations on existing graphs. The triples() function, by contrast, is a triples-finder: it allows only a single path-language expression as its solitary argument and returns only triples that already exist.

As an example, the triple constructor in the following code snippet, adapted from Example 6, is being used to transform a triple from an existing vocabulary into a new one:

declare prefix foaf: = <http://xmlns.com/foaf/0.1/>; 
declare prefix newFoaf: = <http://some/foafish/vocab/>;


let $libby := *[ @foaf:mbox = "mailto:libby.miller@bristol.ac.uk" ] 
return
      { $libby, @newFoaf:Name, $libby/@foaf:name/* }

Example 1 provides another example of triple constructor usage.

The triples() function in the following snippet, on the other hand:

declare prefix foaf: = <http://xmlns.com/foaf/0.1/>; 
triples( @foaf:mbox )

returns a sequence of all existing triples in the datastore that contain a foaf:mbox predicate. Any XsRQL path-language expression is allowed as the argument, so you can select triples based on whatever path-language constraints you wish. See the XsRQL path language for a fairly good sample of what those are.

A quads() function similarly generates quads where provenance is required.
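The difference between finding and constructing triples can be sketched in Python over a list of tuples. The datastore contents, the subject identifiers, and both helper functions below are assumptions made for this sketch; they stand in for the triples() function and the triple constructor shown above.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
NEW_FOAF = "http://some/foafish/vocab/"
LIBBY_MBOX = "mailto:libby.miller@bristol.ac.uk"

# Hypothetical datastore: (subject, predicate, object) tuples.
store = [
    ("ex:libby", FOAF + "mbox", LIBBY_MBOX),
    ("ex:libby", FOAF + "name", "Libby Miller"),
    ("ex:other", FOAF + "name", "Someone Else"),
]

def triples(store, predicate):
    """Finder: return only triples that already exist with this predicate."""
    return [t for t in store if t[1] == predicate]

def rename_names(store, mbox):
    """Constructor: build new triples seeded from existing values."""
    subjects = [s for (s, p, o) in store if p == FOAF + "mbox" and o == mbox]
    return [
        (s, NEW_FOAF + "Name", o)
        for s in subjects
        for (s2, p, o) in store
        if s2 == s and p == FOAF + "name"
    ]

found = triples(store, FOAF + "mbox")
constructed = rename_names(store, LIBBY_MBOX)
print(found)
print(constructed)
```

The finder can only hand back what is in the store; the constructor produces a triple in the new vocabulary that the store never contained.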

XsRQL path language

In XsRQL, the path language is everything. Here's a quick introductory walk-through. Some of the guiding principles are:

steal idioms and metaphors where possible from XPath, including steps, QNames and wildcarded QNames, and filters
discard the initial root symbol ('/') (since RDF graphs don't have roots)
discard for now (but possibly reconsider later) the use of a descendant ("//") operator as computationally too expensive (although I've reused it as a datasource separator; see the last few path-language samples below)
reuse the familiar attribute ("@") notation to mark predicates/properties -- a good visual cue that provides context while descending long paths
keep paths as terse as possible but no terser, using predicates where necessary to disambiguate between subjects and objects

Here's a number of short snippets demonstrating the above principles:

Return all nodes (subjects and objects) in the datastore
(This is an "instances-" or triples-oriented view of the datastore. Nodes can be duplicated.)

*

An XPath-style "kindtest" variation of the same query

resource()

How many nodes are there?

count( * )

Return all instances of predicates in the datastore (their names, not their values)
(Again, this is a triples-oriented view of the datastore)

*/@*

A shorter way of saying the same thing

@*

Return all subjects (ie, resources that have predicates)

*[ @* ]

If you prefer a kindtest equivalent to the above

subject()

Return all objects (ie, any node that's on the downstream side of a predicate)

@*/*

A kindtest equivalent

object()

Return all instances of all literals

@*/literal()

A shorter version of the same thing (literals only occur in the object position; the predicate isn't required for disambiguation)

literal()

Generate a sorted list, by lexical value, of all triples containing literals.

triples( sorted( literal() ))

One of a number of datastore integrity checks you can do: the count of all object nodes should equal the sum of all literals + object resources. Returns a boolean false if that's not the case

count( object() ) = count( literal() ) + count( @*/resource() )

Return a unique list of all predicates (their names, not their values)
(This is a graph-oriented view of the datastore; no duplicates allowed.)

distinct( @* )

How many unique predicates are there?

count( distinct( @* ) )

Return a sorted, unique list of their values
(This is a graph-oriented view of the datastore; no duplicates allowed.)

sorted( distinct( @*/* ))
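The difference between the triples-oriented and graph-oriented views in the snippets above can be seen in a few lines of Python. The four sample triples are made up for this sketch, and the list comprehensions are only rough stand-ins for the corresponding XsRQL paths.

```python
# Hypothetical datastore: (subject, predicate, object) tuples.
store = [
    ("ex:af", "ex:Name", "Afghanistan"),
    ("ex:af", "ex:Capital", "Kabul"),
    ("ex:al", "ex:Name", "Albania"),
    ("ex:al", "ex:Capital", "Tirana"),
]

# Triples-oriented view, like @* : one entry per instance, duplicates kept.
predicate_instances = [p for (_, p, _) in store]

# Graph-oriented view, like distinct( @* ) : duplicates removed.
unique_predicates = list(dict.fromkeys(predicate_instances))

# Like sorted( distinct( @*/* ) ) : unique object values in lexical order.
sorted_values = sorted(set(o for (_, _, o) in store))

print(len(predicate_instances))  # 4
print(unique_predicates)         # ['ex:Name', 'ex:Capital']
print(sorted_values)             # ['Afghanistan', 'Albania', 'Kabul', 'Tirana']
```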

How many distinct terms are there in the CIA Factbook vocabulary? (A good way to get a quick feel for a new vocabulary. The trailing semicolon terminates a prolog declaration.)

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
count( distinct( @ciafb:* ) )

Return the subject named by the uri

<http://www.odci.gov/cia/publications/factbook/af.html>

Return the subject named by the QName

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
ciafb:af.html

Return all triples containing the subject named by the QName

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
triples( ciafb:af.html )

Return all predicates belonging to the named subject (their names, not their values)

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
ciafb:af.html/@*

Return the @GDP_per_capita predicate of the named subject (its name, not its value)

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
ciafb:af.html/@ciafb:GDP_per_capita

Return its value

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
ciafb:af.html/@ciafb:GDP_per_capita/*

Another way of returning its value, if you're sure it's a literal

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
ciafb:af.html/@ciafb:GDP_per_capita/literal()

Return all object instances, if you're sure they're *not* literals

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
ciafb:af.html/@ciafb:GDP_per_capita/resource()

Return all named subject instances and let the query engine decide if they don't contain literal values

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
ciafb:af.html[ @ciafb:GDP_per_capita/resource() ]

Return all ciafb: subjects that have the named predicate

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
ciafb:*[ @ciafb:Airports_with_paved_runways ]

Return instances of *all* subjects that have the named predicate

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
*[ @ciafb:Airports_with_paved_runways ]

Generate all triples containing the named predicate

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
{ *, @ciafb:Airports_with_paved_runways, * }

Return instances of all subjects that have either named predicate

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
*[ @ciafb:Airports_with_paved_runways | @ciafb:Airports_with_unpaved_runways ]

Return the names of all instances of all subjects that have either named predicate

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
*[ @ciafb:Airports_with_paved_runways | @ciafb:Airports_with_unpaved_runways ]/@ciafb:Name/*

Return a list of the unique names of all predicates of subjects that have either named predicate

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
distinct( *[ @ciafb:Airports_with_paved_runways | @ciafb:Airports_with_unpaved_runways ]/@* )

Generate all triples containing either named predicate

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
{ *, @ciafb:Airports_with_paved_runways, * } | { *, @ciafb:Airports_with_unpaved_runways, * }

Return all the values belonging to all predicates of the named subject

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
ciafb:af.html/@*/*

Only return object values that are bnodes (another kindtest)

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
ciafb:af.html/@*/bnode()

A partial wildcard. Return a list of all subjects in the CIA Factbook (ie, their instances -- a triples perspective).

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
ciafb:*

Return a unique list of all subjects in the CIA Factbook (a graph-based perspective).

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
distinct( ciafb:* )

Show all the triples for the subject Afghanistan

declare prefix ciafb: = <http://www.odci.gov/cia/publications/factbook/>;
let $afghanistan := *[ @ciafb:Name = "Afghanistan" ]
return
    { $afghanistan, $afghanistan/@*, $afghanistan/@*/* }

Steven Pemberton delivered which working draft(s) at the W3C? The rightmost, nonfiltered wildcard is the title object we're looking for. Note the datasource declaration

declare prefix          = <http://www.w3.org/2001/02pd/rec54#>;
declare prefix dc:      = <http://purl.org/dc/elements/1.1/>;
declare prefix contact: = <http://www.w3.org/2000/10/swap/pim/contact#>;
declare datasource w3c  = <http://www.w3.org/2002/01/tr-automation/tr.rdf>;

<w3c>//*/dc:title/*[ @editor/*/@contact:fullName = "Steven Pemberton" ]

Any other editors named Pemberton doing good work? (You can filter on something and still return it)

declare prefix          = <http://www.w3.org/2001/02pd/rec54#>;
declare prefix dc:      = <http://purl.org/dc/elements/1.1/>;
declare prefix contact: = <http://www.w3.org/2000/10/swap/pim/contact#>;
declare datasource w3c  = <http://www.w3.org/2002/01/tr-automation/tr.rdf>;

<w3c>//@editor/*/[ @contact:fullName/endsWith( "Pemberton" ) ]/@contact:fullName

What's been published since June 1, 1999? (w/a different syntactic variation on datasource, which is now embedded directly in the path)

declare prefix dc: = <http://purl.org/dc/elements/1.1/>;

datasource( "http://www.w3.org/2002/01/tr-automation/tr.rdf" )//*[ dc:date >= "1999-06-01" ]

A brief tutorial

Let's walk through a few simple XsRQL query examples to get a better idea of how things work.

Here's an example from the Jena Tutorial that says, "Give me every vcard: Full Name (Formal Name?) in the repository":

select ?x, ?fname where 
    (?x <http://www.w3.org/2001/vcard-rdf/3.0#FN> ?fname)

x                                | fname 
================================================
<http://somewhere/JohnSmith/>    | "John Smith" 
<http://somewhere/RebeccaSmith/> | "Becky Smith" 
<http://somewhere/SarahJones/>   | "Sarah Jones" 
<http://somewhere/MattJones/>    | "Matt Jones"

If you were satisfied in seeing just the names, your query could be as simple as this in XsRQL:

@<http://www.w3.org/2001/vcard-rdf/3.0#FN>/*

The rightmost wildcard, which is what we're returning, represents any object that is downstream of a vcard:FN predicate. "@..." means predicate.

(The following output assumes that the query engine automatically emits a linefeed after each item, but that's at the discretion of the implementation):

John Smith
Becky Smith
Sarah Jones
Matt Jones

If your query engine doesn't have an auto-linefeed feature or it's not enabled (how you set environment options is shown below), you'd have to use a for statement to isolate each individual person node in turn, and use the chr() function to insert your own linefeeds as follows:

for $person in @<http://www.w3.org/2001/vcard-rdf/3.0#FN>/*
return
    $person, chr(10)

chr() is a simple built-in function that takes an integer argument and returns the Unicode equivalent. In this case, it's a linefeed which is injected right into the result stream with expected, ahem, results. The comma (",") in the clause is an expression-concatenating operator that takes two arguments, in this case a $person node on its left and a single-character string on its right, and concatenates the two together. The effect at emit time of serializing the two items in sequence is to produce the name of the person, followed by a linefeed as expected.
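The effect of the for statement and the comma operator on the result sequence can be mimicked in Python, where chr(10) happens to mean the same thing. The for_return() helper and the hard-coded name list are assumptions of this sketch, standing in for the query and the datastore.

```python
# Hypothetical stand-in for the literals selected by the FN path.
names = ["John Smith", "Becky Smith", "Sarah Jones", "Matt Jones"]

def for_return(items):
    """Mimic: for $person in ... return $person, chr(10)"""
    result = []
    for person in items:
        # The comma operator concatenates the two items into the sequence.
        result.extend([person, chr(10)])
    return result

sequence = for_return(names)

# Emit time: serialize each item in order onto the result stream.
output = "".join(sequence)
print(len(sequence))  # 8
print(output, end="")
```

The result sequence holds eight items, four names interleaved with four linefeeds; it only looks like four lines of text once it has been serialized.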

Note that we've left off the heading to the report, which RDQL (or at least Jena?) produces automatically. We can make the same thing happen in XsRQL:

"x                                         | fname\n",
"-------------------------------------------------\n",
for $person in @<http://www.w3.org/2001/vcard-rdf/3.0#FN>/*
return
    $person, chr(10)

Again, we're evaluating expressions and injecting them into the data model instance as they're encountered and evaluated in the query tree. Here we evaluate and embed two string items before encountering the for statement and evaluating that. We can embed an escaped "\n" linefeed character directly at the end of both strings without having to evaluate a chr() function.

Finally, the same query can be shortened by rewriting it using QName notation as in the following. The results would be identical. Using QNames doesn't save you much in this particular example; they're more useful when your queries become significantly longer than this one.

declare prefix foaf: = <http://www.w3.org/2001/vcard-rdf/3.0#>;

@foaf:FN/*, chr(10)

This query also shows our first use of a query prolog to set up the namespace prefix. Our parser recognizes this statement as part of a prolog because:

it occurs first, and

it ends in a semi-colon

Any number of prolog declarations can be strung together and used to:

declare prefix-to-uri mappings, as above,

specify emit-time settings (see below),

set environmental options for both the query- and the query-processor environment,

declare the signatures of external functions (not covered yet), and

be where user-defined functions are, well, defined (also not covered yet)

QNames are also useful when you want both to make the result sequence more readable and to reduce bandwidth. The following query uses the XsRQL:emitQNames declaration and reports all distinct predicates in the datastore of interest; it assumes a long list of results coming back. XsRQL:emitQNames reduces bandwidth by shipping a short result-sequence preamble that provides the QName-to-uri mapping the client needs, followed by the result sequence proper. This query:

declare prefix foaf: = <http://www.w3.org/2001/vcard-rdf/3.0#>;
XsRQL:emitQNames;

distinct( @* )

might produce something like the following:

XsRQL:resultPreamble
{
    declare prefix foaf: = <http://www.w3.org/2001/vcard-rdf/3.0#>;
}
@foaf:accountName
@foaf:accountServiceHomePage
@foaf:aimChatID
@foaf:based_near
...

The client can easily parse the incoming result sequence to strip off the preamble, as well as noting the QName definition(s) needed to reconstitute the full uris.
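As a rough idea of what that client-side parsing might involve, here is a Python sketch that strips the preamble and rebuilds the full uris. It assumes the preamble has exactly the shape shown above; the parse_result() helper, the regular expression, and the shortened sample text are all hypothetical.

```python
import re

# Matches a prolog-style declaration such as: declare prefix foaf: = <uri>;
PREFIX_DECL = re.compile(r"declare prefix (\w+): = <([^>]+)>;")

def parse_result(text):
    """Split a result stream into its prefix map and expanded predicate uris."""
    lines = [line.strip() for line in text.strip().splitlines()]
    prefixes = {}
    body_start = 0
    if lines and lines[0] == "XsRQL:resultPreamble":
        end = lines.index("}")            # closing brace of the preamble
        for line in lines[1:end]:
            match = PREFIX_DECL.fullmatch(line)
            if match:
                prefixes[match.group(1)] = match.group(2)
        body_start = end + 1
    expanded = []
    for item in lines[body_start:]:
        prefix, local = item.lstrip("@").split(":", 1)
        expanded.append(prefixes[prefix] + local)
    return prefixes, expanded

sample = """
XsRQL:resultPreamble
{
    declare prefix foaf: = <http://www.w3.org/2001/vcard-rdf/3.0#>;
}
@foaf:accountName
@foaf:aimChatID
"""

prefixes, uris = parse_result(sample)
print(prefixes)
print(uris)
```

Because the mapping arrives before any results, the client can expand each QName as soon as it is read.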

Note: There's been some discussion in the DAWG about latency issues involved in shipping QNames; I've been able to implement a version that appears to have very little latency (yet to be tested). The key to making this work is that the prefix mappings must first be explicitly set in the query prolog by the user, as above.

XsRQL:emitQNames;
declare prefix foaf: = <http://www.w3.org/2001/vcard-rdf/3.0#>;

@foaf:FN/*, chr(10)

I mentioned earlier that some implementations might provide a user option to do auto-linefeeds. You would declare that option as follows:

XsRQL:autoLineFeed;
declare prefix foaf: = <http://www.w3.org/2001/vcard-rdf/3.0#>;

@foaf:FN/*

If you wanted something a bit closer to the RDQL result format, you'd need to be able to access each individual person using a for statement, saying something like:

for $person in *[ @<http://www.w3.org/2001/vcard-rdf/3.0#FN> ]
return 
    $person, "  ",
    "\"", $person/@<http://www.w3.org/2001/vcard-rdf/3.0#FN>/*, "\"", chr(10)

with the following results:

<http://somewhere/JohnSmith/> "John Smith"
<http://somewhere/RebeccaSmith/> "Becky Smith"
<http://somewhere/SarahJones/> "Sarah Jones"
<http://somewhere/MattJones/> "Matt Jones"

Note the leading wildcard preceding the filter on the FN predicate. This says we're grabbing subjects and not predicates, so $person is a sequence of subjects. The return clause emits each subject and then dereferences from it through the predicate to its downstream literal, wrapping the literal in escaped quotes.

If you were using this particular reporting style a lot, you might consider writing it up as a user-defined function and possibly making it external (neither capability is discussed in this version of the language spec).

Examples

Example 1: Marriage partners

This example is inspired by an example in Steve Harris' TestSketchCase document. I've reworked Steve's abstract syntax into what the corresponding RDQL might look like. (Any errors in the transliteration are mine.) This particular query looks through a sequence of married partners and returns triples where the age of the first partner is less than the age of the second partner.

In RDQL:

SELECT ?x, ?y WHERE
    (?x :marriedTo ?y)
    (?x :age ?xAge)
    (?y :age ?yAge)
    and ?xAge < ?yAge

In XsRQL:

for $x in *[ @<marriedTo> ]
for $y in $x/@<marriedTo>/*
where $x/@<age>/* < $y/@<age>/*
return
    { $x, @<marriedTo>, $y }

If we wanted to walk our way through this code at runtime, we would say:

The first for statement evaluates the expression on the right to produce a result sequence, and then assigns each of its constituent items in turn to the variable $x on the left.

For each $x, the second for statement follows the predicate arc labelled @marriedTo to produce one (and hopefully not more!) marriage partner at the object end of the arc.

Now that we have two partners $x and $y, examine the respective age of both partners. If the @age value attached to the first partner is less than that attached to the second partner, generate the triple shown and add it to a growing list of such triples; otherwise throw it away.

Once both iterations are complete, return the final sequence "out the top" of the superordinate for expression. Since that's also the root of our query, we're done.

Note the use of positional context to distinguish the subject partner in the first for statement from the object partner in the second for statement. The wildcarded resource being assigned to the $x variable in the first statement is a subject because (1) it immediately precedes a predicate, and (2) it's the rightmost, non-filtered item in the path. (It's also the only non-filtered item in the path.) The wildcarded resource being assigned to the $y variable in the second for statement is an object because it immediately follows a predicate. (The predicate could be embedded in a filter or directly inline as it is here; it wouldn't make any difference.)
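For readers who find a general-purpose language easier to trace, the walkthrough above can be sketched in Python over a toy list of triples. This is only an illustration, not how an XsRQL engine would work: the data, the helper names subjects_with and objects_of, and the XQuery-style "any pair matches" reading of the where comparison are all assumptions.

```python
# Toy triple store: (subject, predicate, object). All data is made up.
TRIPLES = [
    ("alice", "marriedTo", "bob"),
    ("bob", "marriedTo", "alice"),
    ("alice", "age", 34),
    ("bob", "age", 37),
    ("carol", "marriedTo", "dave"),
    ("dave", "marriedTo", "carol"),
    ("carol", "age", 52),
    ("dave", "age", 49),
]


def subjects_with(triples, predicate):
    """Rough analogue of *[ @<predicate> ]: subjects that carry the predicate."""
    seen = []
    for s, p, _ in triples:
        if p == predicate and s not in seen:
            seen.append(s)
    return seen


def objects_of(triples, subject, predicate):
    """Rough analogue of $subject/@<predicate>/*: objects at the end of the arc."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def younger_first_marriages(triples):
    result = []
    for x in subjects_with(triples, "marriedTo"):        # first for statement
        for y in objects_of(triples, x, "marriedTo"):    # second for statement
            x_ages = objects_of(triples, x, "age")
            y_ages = objects_of(triples, y, "age")
            # where clause, assumed to succeed if any pair of ages satisfies it
            if any(a < b for a in x_ages for b in y_ages):
                result.append((x, "marriedTo", y))       # triple constructor
    return result


print(younger_first_marriages(TRIPLES))
```

With the toy data this prints [('alice', 'marriedTo', 'bob'), ('dave', 'marriedTo', 'carol')]: each marriage shows up once, with the younger partner in the subject position.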

Example 2: Plotting people on a map

This is one of the examples from the "Query and Rule languages Use Cases and Examples" document at http://rdfstore.sourceforge.net/2002/06/24/rdf-query/query-use-cases.html.

The query ties together triples from the FOAF vocabulary and the RDF Interest Group's GEO vocabulary, which uses WGS84 (World Geodetic System 1984) longitudes and latitudes.

In RDQL:

select
        ?uri, ?name, ?lat, ?lon
from
        <http://foaf.asemantics.com/dirkx>
where
        (?person, <rdf:type>, <foaf:Person>),
        (?person, <foaf:name>, ?name),
        (?person, <foaf:based_near>, ?bn),
        (?person, <foaf:mbox>, ?uri),
        (?bn, <pos:lat>, ?lat),
        (?bn, <pos:long>, ?lon)
using
        rdfs FOR <http://www.w3.org/2000/01/rdf-schema#>,
        foaf FOR <http://xmlns.com/foaf/0.1/>,
        pos FOR <http://www.w3.org/2003/01/geo/wgs84_pos#>

In XsRQL:

declare prefix pos:               = <http://www.w3.org/2003/01/geo/wgs84_pos#>;
declare prefix foaf:              = <http://xmlns.com/foaf/0.1/>;
declare datasource dirksFoafFile  = <http://foaf.asemantics.com/dirkx>;

for $person in dirksFoafFile//*[ @rdf:type/foaf:Person ]
return 
    $person/@foaf:mbox/*, ", ", 
    $person/@foaf:name/*, ", ",
    $person/@foaf:based_near/*/@pos:lat/*, ", ", 
    $person/@foaf:based_near/*/@pos:long/*, chr(10)

I've omitted the rdf: prefix declaration, since this is a well-known namespace prefix to XsRQL.

If you wanted to simplify the query slightly and improve performance a bit (since you wouldn't have to dereference down the path from $person quite as far), you could add a temporary $location variable and say:

declare prefix pos:       = <http://www.w3.org/2003/01/geo/wgs84_pos#>;
declare prefix foaf:      = <http://xmlns.com/foaf/0.1/>;

let $dirksFile := datasource( <http://foaf.asemantics.com/dirkx> )
for $person in $dirksFile//*[ @rdf:type/foaf:Person ]
let $location := $person/@foaf:based_near/*
return 
    $person/@foaf:mbox/*, ", ", 
    $person/@foaf:name/*, ", ",
    $location/@pos:lat/*, ", ", 
    $location/@pos:long/*, chr(10)
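The effect of the temporary $location variable can be mimicked in Python: follow the foaf:based_near arc once per person, then reuse the result for both coordinates. The sketch below is an illustration under assumptions; the triples, the step helper and the report function are hypothetical and stand in for whatever an implementation would really do.

```python
# Toy triples using prefixed names as plain strings. All data is made up.
TRIPLES = [
    ("_:p1", "rdf:type", "foaf:Person"),
    ("_:p1", "foaf:name", "Example Person"),
    ("_:p1", "foaf:mbox", "mailto:person@example.org"),
    ("_:p1", "foaf:based_near", "_:loc1"),
    ("_:loc1", "pos:lat", "52.0"),
    ("_:loc1", "pos:long", "4.3"),
    ("_:doc", "rdf:type", "foaf:Document"),
]


def step(triples, nodes, predicate):
    """Rough analogue of nodes/@predicate/*: follow the arc from every node."""
    return [o for n in nodes for s, p, o in triples if s == n and p == predicate]


def report(triples):
    rows = []
    # for $person in ...//*[ @rdf:type/foaf:Person ]
    persons = [s for s, p, o in triples if p == "rdf:type" and o == "foaf:Person"]
    for person in persons:
        # let $location := $person/@foaf:based_near/*  (evaluated once, used twice)
        location = step(triples, [person], "foaf:based_near")
        fields = (
            step(triples, [person], "foaf:mbox")
            + step(triples, [person], "foaf:name")
            + step(triples, location, "pos:lat")
            + step(triples, location, "pos:long")
        )
        rows.append(", ".join(fields))
    return "\n".join(rows)


print(report(TRIPLES))
```

For the single made-up person this prints one comma-separated line: mailbox, name, latitude, longitude.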

Example 3: Finding hypernyms with WordNet

This example comes from an IBM developerWorks article by Philip McCarthy titled Introduction to Jena, which looks at using Jena with a WordNet ontology. The following RDQL query is used to find the WordNet "hypernyms" that the words "panther" and "tiger" have in common:

SELECT
      ?wordform, ?definition

WHERE
      (?firstconcept, <wn:wordForm>, "panther"),
      (?secondconcept, <wn:wordForm>, "tiger"),

      (?firstconcept, <wn:hyponymOf>, ?hypernym),
      (?secondconcept, <wn:hyponymOf>, ?hypernym),

      (?hypernym, <wn:wordForm>, ?wordform),
      (?hypernym, <wn:glossaryEntry>, ?definition)

USING
      wn FOR <http://www.cogsci.princeton.edu/~wn/schema/>

The RDQL resultset is:

wordform  | definition
=====================================================================================
"big cat" | "any of several large cats typically able to roar and living in the wild"
"cat"     | "any of several large cats typically able to roar and living in the wild"

The equivalent query in XsRQL is:

declare prefix wn: = <http://www.cogsci.princeton.edu/~wn/schema/>;

"wordform   |     definition\n",
"=======================================================================\n", 
for $hypernym in *[ @wn:wordForm = "panther" ]/@wn:hyponymOf/*
where $hypernym = *[ @wn:wordForm = "tiger" ]/@wn:hyponymOf/*
return
    $hypernym/@wn:wordForm/*, " | ", $hypernym/@wn:glossaryEntry/*, "\n"
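The join at the heart of the RDQL query is the shared ?hypernym variable: a concept qualifies only if both "panther" and "tiger" are hyponyms of it. In set terms that is an intersection, which the following Python sketch spells out. The triples, glossary strings and helper names are made up for illustration and are not WordNet data.

```python
# Toy triples; concept identifiers and glossary text are invented placeholders.
TRIPLES = [
    ("c:panther", "wn:wordForm", "panther"),
    ("c:tiger", "wn:wordForm", "tiger"),
    ("c:housecat", "wn:wordForm", "house cat"),
    ("c:panther", "wn:hyponymOf", "c:bigcat"),
    ("c:tiger", "wn:hyponymOf", "c:bigcat"),
    ("c:tiger", "wn:hyponymOf", "c:striped"),
    ("c:housecat", "wn:hyponymOf", "c:feline"),
    ("c:bigcat", "wn:wordForm", "big cat"),
    ("c:bigcat", "wn:glossaryEntry", "placeholder gloss for big cat"),
    ("c:striped", "wn:wordForm", "striped animal"),
    ("c:striped", "wn:glossaryEntry", "placeholder gloss for striped animal"),
]


def subjects(triples, predicate, obj):
    """Subjects that have the given predicate pointing at the given object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects(triples, subjs, predicate):
    """Objects reached from any of the subjects via the given predicate."""
    return {o for s, p, o in triples if s in subjs and p == predicate}


def common_hypernyms(triples, word_a, word_b):
    hyper_a = objects(triples, subjects(triples, "wn:wordForm", word_a), "wn:hyponymOf")
    hyper_b = objects(triples, subjects(triples, "wn:wordForm", word_b), "wn:hyponymOf")
    rows = []
    for hypernym in sorted(hyper_a & hyper_b):  # the shared ?hypernym binding
        for form in sorted(objects(triples, {hypernym}, "wn:wordForm")):
            for gloss in sorted(objects(triples, {hypernym}, "wn:glossaryEntry")):
                rows.append((form, gloss))
    return rows


for form, gloss in common_hypernyms(TRIPLES, "panther", "tiger"):
    print(form, "|", gloss)
```

Only "big cat" survives the intersection here; "striped animal" is a hypernym of "tiger" alone and is dropped.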

Example 4: Libby Miller's papers

This example is taken from Libby Miller's online paper, RDF Query by Example. The query is written in squishQL and says, "Find me the name of the person whose email address is libby.miller@bristol.ac.uk, and also find me the title and identifier of anything that she has created".

select ?name, ?title, ?identifier 
where 
    (dc::title ?paper ?title)
    (dc::creator ?paper ?creator)
    (dc::identifier ?paper ?identifier)
    (foaf::name ?creator ?name) 
    (foaf::mbox ?creator mailto:libby.miller@bristol.ac.uk) 
using dc for http://purl.org/dc/elements/1.1/
foaf for http://xmlns.com/foaf/0.1/

The main thing to note, in attempting to move from the above triples formulation to a path-based one, is that the "?creator" person who's the subject owner of the foaf:mbox in statement #5 above is also the object "?creator" person who's created the paper in statement #2.

Using an ad hoc amalgam of XsrPath with an RDQL-style variable-binding notation, we can concatenate the two relationships into a single path describing who knows what about what and who does what to whom:

?paper/@dc:creator/?libbyPerson/@foaf:mbox/"mailto:libby.miller@bristol.ac.uk"

What we want to do is to isolate the "libbyPerson" in the middle of the path as follows:

declare prefix dc:   = <http://purl.org/dc/elements/1.1/>;
declare prefix foaf: = <http://xmlns.com/foaf/0.1/>;

let $tab         := chr(9),
    $lf          := chr(10),
    $libbyPerson := *[ @foaf:mbox = "mailto:libby.miller@bristol.ac.uk" ],
    $libbyPapers := *[ @dc:creator/$libbyPerson ]
return
(
    $libbyPerson/@foaf:name/*, " has written ", count( $libbyPapers ), " papers:", $lf,
    for $paper in $libbyPapers
    return
    (
        $tab, $paper/@dc:identifier/*, ": ", $paper/@dc:title/*, $lf
    )
)

The key to understanding the two variable assignments in the middle of the query:

    $libbyPerson := *[ @foaf:mbox = "mailto:libby.miller@bristol.ac.uk" ],
    $libbyPapers := *[ @dc:creator/$libbyPerson ]

is to note that any self-respecting implementation should first be able to readily find all foaf:mbox's with a value of "mailto:libby.miller@bristol.ac.uk" and from there be able to find the owner(s) of such a mailbox. Once that node has been found and assigned to $libbyPerson, the implementation should equally easily be able to examine all the dc:creator predicates in the datastore to determine which ones point to Libby, whether this is done by brute force, by doing joins on an SQL backend, or by following internal data pointers from predicate to object.

Note that we've added a few variable definitions to better document our use of tabs and linefeeds, as well as a shortcut for cascading let clauses that lets us use comma separators between clauses, instead of forcing us to repeat the word "let" over and over again.

Without knowing anything about the specifics of Libby's particular publishing history, the results might look something like the following:

Libby Miller has written 406 papers:
    1987-03-02-1: By Gun and Camera Through the Alimentary Canal
    1987-03-02-2: RDF: A History of Renal Dental Failure among the Flemish
    1987-04-10-1: My Fabulous Childhood. Life amongst the Gypsies in Paris, Rome, and Bratislawa
    1988-11-10-1: The Seduction of Technology
    ...
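The two-step lookup in this example (first find the owner of the mailbox, then find everything whose dc:creator arc points at that owner) is the brute-force strategy mentioned above, and it's easy to sketch in Python. The paper identifiers, titles and helper names below are placeholders invented for illustration; only the mailbox address and the name come from the query itself.

```python
MBOX = "mailto:libby.miller@bristol.ac.uk"

# Toy triples; node ids, identifiers and titles are invented placeholders.
TRIPLES = [
    ("_:libby", "foaf:mbox", MBOX),
    ("_:libby", "foaf:name", "Libby Miller"),
    ("_:other", "foaf:name", "Someone Else"),
    ("paper:1", "dc:creator", "_:libby"),
    ("paper:1", "dc:identifier", "id-1"),
    ("paper:1", "dc:title", "Placeholder Title One"),
    ("paper:2", "dc:creator", "_:other"),
    ("paper:2", "dc:identifier", "id-2"),
    ("paper:2", "dc:title", "Placeholder Title Two"),
    ("paper:3", "dc:creator", "_:libby"),
    ("paper:3", "dc:identifier", "id-3"),
    ("paper:3", "dc:title", "Placeholder Title Three"),
]


def subjects_where(triples, predicate, wanted_objects):
    """Rough analogue of *[ @predicate = ... ] and *[ @predicate/$var ]."""
    found = []
    for s, p, o in triples:
        if p == predicate and o in wanted_objects and s not in found:
            found.append(s)
    return found


def objects_of(triples, subject, predicate):
    """Rough analogue of $subject/@predicate/*."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def report(triples):
    libby = subjects_where(triples, "foaf:mbox", [MBOX])      # $libbyPerson
    papers = subjects_where(triples, "dc:creator", libby)     # $libbyPapers
    names = [n for person in libby for n in objects_of(triples, person, "foaf:name")]
    lines = ["{} has written {} papers:".format(" ".join(names), len(papers))]
    for paper in papers:
        ids = " ".join(objects_of(triples, paper, "dc:identifier"))
        titles = " ".join(objects_of(triples, paper, "dc:title"))
        lines.append("\t{}: {}".format(ids, titles))
    return "\n".join(lines)


print(report(TRIPLES))
```

A real implementation would presumably use indexes or joins rather than two linear scans, but the shape of the computation is the same.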

Example 5: Libby's mailbox

This example looks at the usage of an if statement in XsRQL. We can use an if to check the validity of the datastore vis-à-vis the inverse-functional status of Libby's mailbox:

declare prefix foaf: = <http://xmlns.com/foaf/0.1/>; 

let $libbysMailboxes := @foaf:mbox[ "mailto:libby.miller@bristol.ac.uk" ]/*
return
    if ( count( $libbysMailboxes ) = 0 )
    then "Libby doesn't have a mailbox"
    else
        if ( count( $libbysMailboxes ) > 1 or count( *[ $libbysMailboxes ] ) > 1 )
        then "Libby's mailbox isn't inverse functional!"
        else ( "Libby has a single @foaf:mbox as expected: ", $libbysMailboxes )
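An inverse-functional property is one whose value identifies at most one subject, so the check boils down to counting the distinct owners of a given mailbox value. The Python sketch below shows one reading of that intent rather than a line-by-line transliteration of the query; the node identifiers, the messages and the check_mailbox name are assumptions.

```python
MBOX = "mailto:libby.miller@bristol.ac.uk"


def check_mailbox(triples, mbox):
    """Report whether exactly one subject owns the given foaf:mbox value."""
    owners = sorted({s for s, p, o in triples if p == "foaf:mbox" and o == mbox})
    if len(owners) == 0:
        return "nobody has this mailbox"
    if len(owners) > 1:
        return "mailbox isn't inverse functional: " + ", ".join(owners)
    return "single owner as expected: " + owners[0]


# Three made-up datastores exercising each branch of the nested if.
EMPTY = [("_:a", "foaf:name", "No Mailbox Here")]
GOOD = [("_:libby", "foaf:mbox", MBOX)]
BAD = [("_:libby", "foaf:mbox", MBOX), ("_:imposter", "foaf:mbox", MBOX)]

for store in (EMPTY, GOOD, BAD):
    print(check_mailbox(store, MBOX))
```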

Example 6: Libby's optional mailbox

To close with one final, short example, here's the use of an if expression, combined with the built-in function exists(), to return an optional result. The following query returns a constructed triple containing Libby's name, followed by a triple containing her mailbox if she has one:

declare prefix foaf: = <http://xmlns.com/foaf/0.1/>; 

let $libby := *[ @foaf:mbox = "mailto:libby.miller@bristol.ac.uk" ]
return
    { $libby, @foaf:name, $libby/@foaf:name/* },
    if ( exists( $libby/@foaf:mbox ))
    then { $libby, @foaf:mbox, $libby/@foaf:mbox/* }
    else ()
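The optional-result pattern (always emit one part, emit the other only if the data is there, otherwise contribute the empty sequence) looks like this in Python. Note two assumptions in the sketch: the person is looked up by a hypothetical node identifier rather than by mailbox, so that the optional branch can actually come up empty, and the triple constructor is taken to yield one triple per value.

```python
# Toy triples; node ids and the second person are invented for illustration.
TRIPLES = [
    ("_:libby", "foaf:name", "Libby Miller"),
    ("_:libby", "foaf:mbox", "mailto:libby.miller@bristol.ac.uk"),
    ("_:nobox", "foaf:name", "Person Without Mailbox"),
]


def objects_of(triples, subject, predicate):
    """Rough analogue of $subject/@predicate/*."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe(triples, person):
    names = objects_of(triples, person, "foaf:name")
    mboxes = objects_of(triples, person, "foaf:mbox")
    result = [(person, "foaf:name", name) for name in names]
    if mboxes:  # exists( $person/@foaf:mbox )
        result += [(person, "foaf:mbox", mbox) for mbox in mboxes]
    # the else () branch adds nothing to the result
    return result


print(describe(TRIPLES, "_:libby"))
print(describe(TRIPLES, "_:nobox"))
```

The first call returns two triples, the second only the name triple.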

The XsRQL grammar

The grammar below is a first cut and is still incomplete. Along with most of the XPath-style kindtests shown in the XsrPath snippets above, the grammar is most noticeably still missing productions to handle:

user-defined functions

the associated type declaration sub-language for function signatures

FLWOR's, and

the mechanism for specifying xsd: literal types in XsrPaths

The grammar is short and sweet, certainly when compared to the XQuery BNF, its progenitor. The XQuery grammar by comparison is several hundred productions long and uses some twenty-five (25) lexical states to enable proper lexing. Debugging it was a huge amount of fun. Not. This one's a piece of cake by comparison.

getQueryAST	::=	mainModule
mainModule	::=	prolog ( queryBody )
prolog	::=	( ( prefixDecl | dawgDecl ) <SemiColon> )*
prefixDecl	::=	<DeclareNamespace> <NCPrefixName> <AssignEquals> <Uriref>
dawgDecl	::=	<QName>
queryBody	::=	exprSequence
exprSequence	::=	expr ( <Comma> exprSequence )?
expr	::=	ifExpr
	|	orExpr
ifExpr	::=	( <IfLpar> exprSequence <Rpar> <Then> expr <Else> expr )
orExpr	::=	andExpr ( <Or> andExpr )?
andExpr	::=	generalComparison ( <And> andExpr )?
generalComparison	::=	additiveExpr ( ( <Equals> | <NotEquals> | <Lt> | <LtEquals> | <Gt> | <GtEquals> ) additiveExpr )?
additiveExpr	::=	multiplicativeExpr ( ( <Plus> | <Minus> ) additiveExpr )?
multiplicativeExpr	::=	unaryExpr ( <Multiply> unaryExpr )?
unaryExpr	::=	( ( <UnaryMinus> ) | ( <UnaryPlus> ) )* unionExpr
unionExpr	::=	dawgPath ( "|" unionExpr )?
dawgPath	::=	sPath
	|	pPath
	|	oPath
sPath	::=	subjectStep ( filteredSubject )? ( <Slash> pPath )?
subjectStep	::=	primaryExpr
	|	qName
	|	wildcard
	|	uriRef
	|	anyLiteralTest
filteredSubject	::=	<Lbrack> pPath <Rbrack> ( filteredSubject )?
pPath	::=	predicateStep ( filteredPredicate )? ( <Slash> oPath )?
predicateStep	::=	( <At> ( qName | wildcard | uriRef ) )
filteredPredicate	::=	<Lbrack> oPath <Rbrack> ( filteredPredicate )?
oPath	::=	sPath
	|	literal
wildcard	::=	<Star>
	|	<NCNameColonStar>
anyLiteralTest	::=	<AnyLiteralLpar> <RparForAnyLiteralTest>
primaryExpr	::=	literal
	|	functionCall
	|	variable
	|	parensExpr
	|	tripleCtor
variable	::=	<VariableIndicator> <VarName>
literal	::=	integerLiteral
	|	stringLiteral
parensExpr	::=	<Lpar> ( exprSequence )? <Rpar>
tripleCtor	::=	<Lbrace> sPath <Comma> pPath <Comma> oPath <Rbrace>
qName	::=	<QName>
integerLiteral	::=	<IntegerLiteral>
stringLiteral	::=	<StringLiteral>
functionCall	::=	<QName> <Lpar> ( exprSequence )? <Rpar>
uriRef	::=	<Uriref>
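
To give a feel for how small the path portion of the grammar is, here's a rough recursive-descent recognizer in Python for a subset of the sPath, pPath and oPath productions. It's a sketch and not the prototype mentioned in the introduction: the token regular expressions, the tuple-shaped parse tree and the names tokenize, Parser and parse_path are all assumptions made for illustration. Function calls, parenthesized expressions, triple constructors, the any-literal test, prefix wildcards and integer literals are left out.

```python
import re

# Assumed token shapes; the real lexer is not specified in this document.
TOKEN = re.compile(r"""
    \s*(?:
      (?P<string>"[^"]*")
    | (?P<uriref><[^<>\s]+>)
    | (?P<var>\$[A-Za-z_][\w-]*)
    | (?P<qname>[A-Za-z_][\w-]*:[A-Za-z_][\w-]*|[A-Za-z_][\w-]*)
    | (?P<punct>[*@/\[\]])
    )""", re.VERBOSE)


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError("unexpected character at offset %d" % pos)
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def accept(self, kind, value=None):
        k, v = self.peek()
        if k == kind and (value is None or v == value):
            self.i += 1
            return v
        return None

    def expect(self, kind, value):
        if self.accept(kind, value) is None:
            raise SyntaxError("expected %r" % value)

    def s_path(self):  # sPath ::= subjectStep filteredSubject? ( "/" pPath )?
        step = (self.accept("punct", "*") or self.accept("qname")
                or self.accept("var") or self.accept("uriref"))
        if step is None:
            raise SyntaxError("expected a subject step")
        filters = []
        while self.accept("punct", "["):  # filteredSubject is right-recursive
            filters.append(self.p_path())
            self.expect("punct", "]")
        tail = self.p_path() if self.accept("punct", "/") else None
        return ("s", step, filters, tail)

    def p_path(self):  # pPath ::= "@" step filteredPredicate? ( "/" oPath )?
        self.expect("punct", "@")
        step = (self.accept("punct", "*") or self.accept("qname")
                or self.accept("uriref"))
        if step is None:
            raise SyntaxError("expected a predicate step")
        filters = []
        while self.accept("punct", "["):
            filters.append(self.o_path())
            self.expect("punct", "]")
        tail = self.o_path() if self.accept("punct", "/") else None
        return ("p", step, filters, tail)

    def o_path(self):  # oPath ::= sPath | literal (string literals only here)
        literal = self.accept("string")
        return ("lit", literal) if literal is not None else self.s_path()


def parse_path(text):
    parser = Parser(tokenize(text))
    tree = parser.p_path() if parser.peek() == ("punct", "@") else parser.s_path()
    if parser.peek() != (None, None):
        raise SyntaxError("trailing input")
    return tree


print(parse_path("$x/@<marriedTo>/*"))
```

The alternation of subject, predicate and object steps falls straight out of the mutual recursion between s_path, p_path and o_path, which is why a path such as */* (two subject steps in a row) is rejected.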

version 0.86 27june04 howard katz
