Posts Tagged ‘RDF’

OpenMath CDs as Linked Data

Wednesday, June 30th, 2010

I am currently pursuing the integration of OpenMath Content Dictionaries (CDs) into the Web of Data. (Here is the agenda, which I will present and discuss at the upcoming OpenMath workshop.) The motivation is that mathematical knowledge is currently underrepresented on the Web of Data, but that it is needed for certain use cases, such as dealing in a reasonable way with all those numbers in statistical datasets published by governments.

Only now have I discovered several blog posts, almost a year old, on the question of whether something called “Linked Data” must use RDF. In the proposed OpenMath setup, we will primarily publish the OpenMath CDs themselves according to the Linked Data principles. That works because the CDs and the symbols defined in them have URIs. The XML language in which the CDs are written is well known in the OpenMath community. It consists of a thin XML wrapper around the actual objects of interest, the so-called OpenMath objects, i.e. mathematical formulæ in a functional tree structure. Suppose a web service wants to know how to compute, e.g., the Human Development Index of a country, assuming that the auxiliary data points LE, ALI, GEI and GDP are already known. It looks up the definition of the HDI symbol by its URI, e.g. http://example.org/statistics#hdi: it requests the CD as application/openmath+xml, locates the desired symbol, finds out that its definition is 1/3 (LE + 2/3 ALI + 1/3 GEI + GDP) (encoded as an OpenMath object), substitutes the values it knows for the parameters, and lets a computer algebra system do the computation.
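The substitute-and-compute step at the end of that scenario can be sketched in Python. This is only an illustration under assumptions: the nested-tuple encoding of the formula, the `evaluate` function and the sample index values are made up here. A real service would fetch the CD from the symbol's URI, parse the OpenMath object, and hand it to a computer algebra system.

```python
from fractions import Fraction

# Hypothetical stand-in for the parsed OpenMath object defining the HDI symbol:
# 1/3 (LE + 2/3 ALI + 1/3 GEI + GDP), as a tree of (operator, argument, ...).
HDI_DEFINITION = (
    "times", Fraction(1, 3),
    ("plus",
     "LE",
     ("times", Fraction(2, 3), "ALI"),
     ("times", Fraction(1, 3), "GEI"),
     "GDP"),
)


def evaluate(expr, values):
    """Evaluate an expression tree, substituting known values for parameters."""
    if isinstance(expr, str):      # a parameter such as "LE"
        return values[expr]
    if isinstance(expr, tuple):    # an application: (operator, arg1, arg2, ...)
        op, *args = expr
        results = [evaluate(arg, values) for arg in args]
        if op == "plus":
            return sum(results)
        if op == "times":
            product = 1
            for result in results:
                product *= result
            return product
        raise ValueError(f"unsupported operator: {op}")
    return expr                    # a numeric constant


# Made-up index values, for illustration only.
known = {"LE": 0.9, "ALI": 0.99, "GEI": 0.9, "GDP": 0.95}
hdi = evaluate(HDI_DEFINITION, known)
print(round(hdi, 4))
```

Only `plus` and `times` are handled because the HDI definition needs nothing else; the tree shape mirrors the functional structure of an OpenMath application.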

Thus, my answers to these previous blog posts are:

  • to Paul Miller’s “Does Linked Data need RDF?”: No, it does not. OpenMath CDs also work. Well, in principle, at least for entirely OpenMath-based application scenarios, as sketched above. For making a real contribution, the data should additionally be made available as RDF (which is no problem for us, we have the software for translation), so that RDF-based Linked Data applications don’t get stuck on a link saying, e.g., the function used to compute this entry of our dataset is http://www.openmath.org/cd/arith1#sum.
  • to Toby Inkster’s comment on that blog post: Yes, in principle we could convert a whole OpenMath CD to RDF. At the moment, I’m not doing this. I provide the complete structural outline of the CD (i.e. what symbols it contains, what metadata have been given for the CD and its symbols), but so far I have not implemented a translation of OpenMath objects to RDF. Why?
    1. There is no suitable RDF representation of the ordered tree nature of mathematical expressions. Several people have tried it (e.g. [1], [2]), but none of these representations have been adopted by the community, if they have been implemented at all.
    2. RDF-based reasoning engines don’t understand mathematical expressions. They don’t know, e.g., what a bound variable is, so even if we expressed a formula in RDF, it would be useless.
    3. Software that does understand mathematical expressions (e.g. a computer algebra system) can usually either process OpenMath, or a language for which translations from/to OpenMath have been implemented.

    Note that I have been thinking about what information from the OpenMath objects might reasonably be represented in RDF. In my own applications, I do make use of the information about what symbols occur in a formula (regardless of the depth at which they occur and the order), so I represent that information in the RDF I extract from OpenMath CDs. I have seen other applications that care about the symbol at the root of an expression, such as the “plus” in a+2b², so that could be represented in RDF as well. One could also think of applications that use the OpenMath objects in a CD obtaining them from the RDF representation of that CD, as XMLLiterals. (That could entirely replace the XML-based CD format without losing expressivity, but I’m sure the OpenMath community wouldn’t like that.)

  • to Ian Davis’ blog post: I do not agree with the idea that the term Linked Data may only be used together with RDF. I will continue to call what I’m doing with OpenMath “Linked Data”. However, being aware of the ubiquity of RDF and the software supporting it, I will also make RDF data available for the OpenMath CDs, so the difference is a philosophical one.
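The two pieces of information mentioned in the reply to Toby Inkster (which symbols occur in a formula, and which symbol sits at its root) can be pulled out of an OpenMath object with a few lines of Python. This is a minimal sketch, not the extraction I actually run: the `http://example.org/vocab#` properties and the subject URI are hypothetical, and the CD base is assumed to be http://www.openmath.org/cd.

```python
import xml.etree.ElementTree as ET

OM_NS = "http://www.openmath.org/OpenMath"
CD_BASE = "http://www.openmath.org/cd"   # assumed CD base
VOCAB = "http://example.org/vocab#"      # hypothetical RDF vocabulary

# The expression a + 2b² in the OpenMath XML encoding.
EXPR = """
<OMOBJ xmlns="http://www.openmath.org/OpenMath">
  <OMA>
    <OMS cd="arith1" name="plus"/>
    <OMV name="a"/>
    <OMA>
      <OMS cd="arith1" name="times"/>
      <OMI>2</OMI>
      <OMA>
        <OMS cd="arith1" name="power"/>
        <OMV name="b"/>
        <OMI>2</OMI>
      </OMA>
    </OMA>
  </OMA>
</OMOBJ>
"""


def symbol_uri(oms):
    """Build the URI of a symbol from the cd and name attributes of an OMS."""
    return f"{CD_BASE}/{oms.get('cd')}#{oms.get('name')}"


def occurring_symbols(omobj):
    """All symbols in the object, regardless of depth and order."""
    return {symbol_uri(oms) for oms in omobj.iter(f"{{{OM_NS}}}OMS")}


def root_symbol(omobj):
    """The symbol applied at the root, or None if the root is not such an application."""
    head = omobj.find(f"{{{OM_NS}}}OMA/{{{OM_NS}}}OMS")
    return symbol_uri(head) if head is not None else None


def as_ntriples(subject_uri, omobj):
    """Render the extracted information as N-Triples lines."""
    lines = [f"<{subject_uri}> <{VOCAB}usesSymbol> <{uri}> ."
             for uri in sorted(occurring_symbols(omobj))]
    root = root_symbol(omobj)
    if root is not None:
        lines.append(f"<{subject_uri}> <{VOCAB}rootSymbol> <{root}> .")
    return lines


omobj = ET.fromstring(EXPR.strip())
for line in as_ntriples("http://example.org/statistics#some-formula", omobj):
    print(line)
```

The tree structure itself is deliberately not translated; only the flat facts that RDF handles well end up in the output.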

And a general remark for the RDF community: Most OpenMath users don’t care. The OpenMath community is conservative, and it has tools that work with the OpenMath knowledge model and its concrete XML representation. In fact, both communities are quite similar. Both have their own standard, with useful applications, and they say: “Why should we need any other knowledge model or format? OpenMath/RDF is fine for us. We won’t use RDF/OpenMath. But of course we’d appreciate it if you could come up with another real-world use case that uses OpenMath/RDF and shows its superiority.” (BTW, I would be interested in feedback from other communities whose original data you have published as RDF Linked Data. What attitudes do they have?)

Upcoming SPARQL improvements

Tuesday, February 2nd, 2010

W3C’s new SPARQL working drafts bring a lot of nice features that I hope will soon be widely supported, because our applications would benefit greatly from them.

Property paths

Property paths will make queries both more powerful and easier to write. Some cases resemble XPath/XQuery:

Find the names of people 2 “foaf:knows” links away.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
 ?x foaf:mbox <mailto:alice@example> .
 ?x foaf:knows/foaf:knows/foaf:name ?name .
}

… whereas others generalize the idea of transitive closures, which is also relevant in our applications that work on RDF extracted from OMDoc or OpenMath (e.g. finding imported theories, computing dependencies, and checking MMT well-formedness):

Find the names of all the people that can be reached from Alice by foaf:knows:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
 ?x foaf:mbox <mailto:alice@example> .
 ?x foaf:knows+/foaf:name ?name .
}
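To make clear what the `+` operator in the second query computes, here is a rough Python sketch of the same lookup over a tiny in-memory set of triples. It illustrates the semantics only and is not how a SPARQL engine is implemented; the graph data and the helper names `objects` and `one_or_more` are made up for this example.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# A tiny made-up graph as (subject, predicate, object) triples.
TRIPLES = {
    ("alice", FOAF + "mbox", "mailto:alice@example"),
    ("alice", FOAF + "name", "Alice"),
    ("alice", FOAF + "knows", "bob"),
    ("bob", FOAF + "name", "Bob"),
    ("bob", FOAF + "knows", "carol"),
    ("carol", FOAF + "name", "Carol"),
    ("carol", FOAF + "knows", "alice"),   # a cycle back to Alice
    ("dave", FOAF + "name", "Dave"),      # not reachable from Alice
}


def objects(triples, subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def one_or_more(triples, start, predicate):
    """Nodes reachable from start via one or more predicate links (the + path)."""
    reached = set()
    frontier = objects(triples, start, predicate)
    while frontier:
        node = frontier.pop()
        if node not in reached:
            reached.add(node)
            frontier |= objects(triples, node, predicate)
    return reached


# ?x foaf:mbox <mailto:alice@example> . ?x foaf:knows+/foaf:name ?name .
starts = {s for s, p, o in TRIPLES
          if p == FOAF + "mbox" and o == "mailto:alice@example"}
names = {name
         for x in starts
         for y in one_or_more(TRIPLES, x, FOAF + "knows")
         for name in objects(TRIPLES, y, FOAF + "name")}
print(sorted(names))
```

Because the sample graph contains a cycle, Alice herself is reachable by one or more foaf:knows links; the `reached` set is what keeps the traversal from looping forever. The same closure is what we need for imported theories and dependencies.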

Update language

Another feature to come is an update language, probably inspired by XQuery Update. Assuming a triple store that supports it, that would, e.g., make it easier to integrate Krextor into applications.

Entailment regimes

Besides enhancements to simple queries, the behavior of SPARQL under different entailment regimes (e.g. RDFS or OWL – in practical terms: what happens when you attach a reasoner to your triple store) will be clarified.

Miscellaneous

In the core of the language, certain other goodies will be specified, such as an easier syntax for negation-as-failure and subqueries (nested queries).

Reinventing the XML→RDF wheel?

Sunday, January 11th, 2009

While researching related work for Krextor, I discovered this paper about XSDL (XML Semantics Definition Language). (Note that by XSDL the authors do not mean the new name of W3C XML Schema, as the latter has only been renamed recently.) XSDL is a language for solving problems very similar to those Krextor addresses: extracting RDF in terms of some ontology from XML documents. I had always been looking for a nice declarative way of doing so, and there it is.


XML Pattern Matching and Functional Programming

Tuesday, December 2nd, 2008

I’m currently reconsidering whether it was a good idea to implement my XML→RDF extraction library Krextor in XSLT. Writing down my actual requirements, I realized that I need a language that supports

  • pattern matching on XML elements and attributes, using a syntax that is close to literal XML or to XPath (for easily writing extraction rules, which should also be done by other developers in future)
  • functional programming (in some way), as the whole idea of mapping XML to RDF (and thus XML nodes to URIs) can be modeled most elegantly using a functional approach. (This is rather a requirement for me implementing the generic core of Krextor, but also for extraction module developers once the XML input language is a bit more complex.)

Having looked a bit into XQuery, Scala, and JavaScript (and a little bit into Haskell), I decided to stick to XSLT for now. Functional programming is awkward but possible, and XML pattern matching is awkward or non-intuitive in most other languages.

Documenting XSLT

Wednesday, November 26th, 2008

A considerable part of the implementation of my research prototype(s) is done in XSLT. Now that the extraction of RDF from semantic markup is more and more turning into a project of its own, more software engineering was needed, including proper documentation.

It turned out that XSLTdoc is a really nice solution for that: Just put a few additional XML elements in front of every template or function and run a special XSLT to generate the documentation. Works like javadoc and looks nice.