09 April 2007

SPARQL S-expressions

I'm interested in exposing the SPARQL algebra support in ARQ for others (and me!) to experiment with SPARQL and SPARQL extensions.

Having a syntax for writing algebra expressions is useful, and it makes writing test cases for the algebra easier. ARQ already uses S-expressions to detail the syntax tree, so it was natural to use S-expressions for algebra expressions as well.

I split the lowest levels of syntax out, to avoid having to write many parsers. The result is SSE (SPARQL S-Expressions), a vaguely Lisp-ish syntax. It consists of lists, RDF terms (IRIs, blank nodes, prefixed names and literals) in SPARQL syntax, and words, which are plain symbols without a colon.

Given this universal syntax, what remains is writing library code that builds the Java data structures from SSE. That work is mundane, but it is easier when it doesn't involve rebuilding a parser each time.

Example query:

 PREFIX : <http://example/> 

 SELECT ?x ?v
 { ?x :p ?v 
   OPTIONAL { ?v :q ?w }
 }

which is the algebra expression:

 (project (?x ?v)
   (leftjoin
     (bgp [triple ?x <http://example/p> ?v])
     (bgp [triple ?v <http://example/q> ?w])))

The use of either () or [] for lists, where beginning and end must match, aids readability but has no other significance.
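To make the bracket rule concrete, here is a minimal sketch of a reader for this kind of syntax. It is written in Python for illustration and is not ARQ's parser: the names `TOKEN`, `CLOSER`, `tokenize` and `parse` are made up for this example, tokens are kept as plain strings, and only IRIs, plain quoted strings, variables and bare words are recognised (no language tags or datatypes).

```python
import re

# Hypothetical tokenizer: a bracket, an <IRI>, a "quoted string",
# or any other run of non-space, non-bracket characters.
TOKEN = re.compile(r'[()\[\]]|<[^<>\s]*>|"(?:[^"\\]|\\.)*"|[^\s()\[\]]+')

# Each opening bracket and the closing bracket it must be paired with.
CLOSER = {"(": ")", "[": "]"}


def tokenize(text):
    """Split SSE text into a flat list of token strings."""
    return [m.group(0) for m in TOKEN.finditer(text)]


def parse(text):
    """Read one expression into nested Python lists of strings."""
    stack = [[]]     # lists under construction, innermost last
    expected = []    # closing brackets still owed, innermost last
    for tok in tokenize(text):
        if tok in CLOSER:
            stack.append([])
            expected.append(CLOSER[tok])
        elif tok in (")", "]"):
            # The closer must match the most recent opener.
            if not expected or expected.pop() != tok:
                raise SyntaxError(f"mismatched {tok!r}")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    if expected:
        raise SyntaxError("unclosed list")
    if len(stack[0]) != 1:
        raise SyntaxError("expected exactly one expression")
    return stack[0][0]


tree = parse("""
(project (?x ?v)
  (leftjoin
    (bgp [triple ?x <http://example/p> ?v])
    (bgp [triple ?v <http://example/q> ?w])))
""")
```

Both bracket styles produce the same kind of list, so the choice between them leaves no trace in the parsed tree. Only a mismatched pair such as `(a b]` is rejected.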

Another example: 'prefix' defines namespaces for the enclosed body:

 (prefix ((: <http://example/>))
   (project (?c)
     (filter (= ?c "world")
       (bgp [triple ?s :p ?c]) )))
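One way to picture what 'prefix' does is as a rewrite over the parsed expression. The sketch below is in Python and assumes the SSE has already been read into nested lists of strings. The `expand` function is hypothetical and only illustrates scoping; it is not how ARQ resolves prefixed names.

```python
def expand(tree, prefixes=None):
    """Replace prefixed names with full IRIs inside (prefix ...) forms."""
    prefixes = prefixes or {}
    if isinstance(tree, str):
        # IRIs, string literals and variables are never prefixed names.
        if tree[:1] in ("<", '"', "?") or ":" not in tree:
            return tree
        label, _, local = tree.partition(":")
        if label in prefixes:
            return "<" + prefixes[label] + local + ">"
        return tree  # undeclared prefix (e.g. a blank node label): leave as-is
    if tree and tree[0] == "prefix":
        declarations, body = tree[1], tree[2]
        scope = dict(prefixes)  # copy, so declarations only apply to the body
        for name, iri in declarations:
            scope[name.rstrip(":")] = iri.strip("<>")
        return expand(body, scope)
    return [expand(item, prefixes) for item in tree]


# The example above, written out as nested lists.
expr = ["prefix", [[":", "<http://example/>"]],
        ["project", ["?c"],
         ["filter", ["=", "?c", '"world"'],
          ["bgp", ["triple", "?s", ":p", "?c"]]]]]

expanded = expand(expr)
```

In this sketch the `prefix` wrapper disappears and `:p` becomes `<http://example/p>`. Names outside the enclosed body are untouched because the declarations are copied into a new scope.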

It captures more than strict SPARQL. Tables as constants mean an SSE file can contain data as well:

 (prefix ((x: <http://example/>))
   (join
     (table
       (row [?x 1] [?y x:g])
       (row [?x 2]))
     (table
       (row [?y x:g])
       (row [?x 2]))))

evaluating to:

 --------------------------
 | y                  | x |
 ==========================
 | <http://example/g> | 1 |
 | <http://example/g> | 2 |
 |                    | 2 |
 --------------------------
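The three rows follow from the usual SPARQL join rule: two rows combine when they agree on every variable they share. Here is a minimal Python sketch of that rule, with each row modelled as a dict. The names `compatible` and `join` are illustrative, not ARQ's evaluator.

```python
def compatible(a, b):
    """Rows are compatible if every variable bound in both has the same value."""
    return all(b[var] == value for var, value in a.items() if var in b)


def join(left, right):
    """Merge every compatible pair of rows from the two tables."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


G = "<http://example/g>"

# The two tables from the example, one dict per row.
left = [{"x": 1, "y": G}, {"x": 2}]
right = [{"y": G}, {"x": 2}]

result = join(left, right)
```

The pair `{x: 1, y: g}` and `{x: 2}` disagrees on `?x` and is dropped. The other three pairs survive, which gives the rows in the table above, including the one where `?y` is unbound.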

It's still "work in progress" and a bit rough. The layout can be inconsistent and the coding style differs from place to place, mainly because it was written in quick bits of hacking between doing other things. But it has already proved to be an efficient way to write SPARQL algebra expressions and evaluate them for testing.

And doing an Emacs mode for SSE is trivial.

As an aside: I did a little web-trawling for Lisp information and gathered my links together.
