2008-07-06

SPARQL Literal Matching

I use the Jena implementation of SPARQL for a personal photo album, and I have run into a use case where SPARQL just can't help me: it is impossible to split a literal into parts. Take this RDF example:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:a foaf:name "Johnny Lee Outlaw" .
If I want to use SPARQL to convert this information into a different schema, where first, middle and last name are separate properties, it can't be done. The result I am after looks like this:
@prefix ns2: <http://some.other/namespace/> .
_:a ns2:firstname "Johnny" .
_:a ns2:middlename "Lee" .
_:a ns2:lastname "Outlaw" .
This could be solved by allowing variables to bind to regexp groups. Something like this, where match is a function I made up and the third argument names the variable each group binds to:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ns2: <http://some.other/namespace/>
CONSTRUCT {
?s ns2:firstname ?first .
?s ns2:middlename ?middle .
?s ns2:lastname ?last .
}
WHERE {
?s foaf:name ?fullname .
FILTER match(?fullname, "([^ ]*) ([^ ]*) ([^ ]*)", "?first ?middle ?last") .
}
I could of course have missed something that allows me to do what I want. You are welcome to correct me in the comments.

In this case, not being Turing-complete is a major drawback for SPARQL. If it were Turing-complete, the problem would be solvable in one way or another. The Principle of Least Power is very useful for data definition languages, but I doubt a SPARQL query is of much use to any interpreter other than a query engine.

The easiest way to create a Turing-complete language is to embed it in a Turing-complete host language. Then it is always possible to go beyond the embedded language and use features from the host language when necessary.
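
As a rough illustration of falling back on the host language, here is a minimal Python sketch of the split done outside the query. It assumes the (?s, ?fullname) bindings have already been fetched with an ordinary SELECT query; the names split_name, NAME_PATTERN and rows are made up for this example and are not part of Jena or any SPARQL library. The pattern uses + instead of the * from the query above so that empty name parts do not match.

```python
import re

# Namespace of the target schema, taken from the example above.
NS2 = "http://some.other/namespace/"

# Three space-separated parts, mirroring the pattern in the hypothetical match() call.
NAME_PATTERN = re.compile(r"([^ ]+) ([^ ]+) ([^ ]+)")


def split_name(subject, fullname):
    """Return (subject, predicate, object) triples for first, middle and last name.

    Returns an empty list when the literal is not exactly three parts,
    which is roughly how a failed FILTER would drop the solution.
    """
    m = NAME_PATTERN.fullmatch(fullname)
    if m is None:
        return []
    predicates = ("firstname", "middlename", "lastname")
    return [(subject, NS2 + p, part) for p, part in zip(predicates, m.groups())]


# Stand-in for the bindings a SELECT ?s ?fullname query would return.
rows = [("_:a", "Johnny Lee Outlaw")]

triples = [t for s, name in rows for t in split_name(s, name)]
for triple in triples:
    print(triple)
```

The resulting triples would still have to be written back into the model by the host program, which is exactly the step a CONSTRUCT query would otherwise do for free.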
