XML Pattern Matching and Functional Programming

I’m currently reconsidering whether it was a good idea to implement my XML→RDF extraction library Krextor in XSLT. Writing down my actual requirements, I realized that I need a language that supports

  • pattern matching on XML elements and attributes, using a syntax close to literal XML or to XPath (so that extraction rules are easy to write, which other developers should also be able to do in the future)
  • functional programming (in some form), as the whole idea of mapping XML to RDF (and thus XML nodes to URIs) can be modeled most elegantly with a functional approach. (This is mainly a requirement for me, as the implementer of Krextor's generic core, but it also matters to developers of extraction modules once the XML input language becomes more complex.)

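A minimal sketch of what such a functional mapping might look like, written in Python rather than XSLT and not taken from Krextor: a recursive function mints a URI for each element and returns the triples for its subtree, passing the parent URI down as an argument instead of keeping mutable global state. The base URI, the vocabulary namespace, the "contains" predicate, and the rule of minting URIs from an id attribute are all hypothetical assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical URIs, chosen only for this illustration.
BASE = "http://example.org/doc"
VOCAB = "http://example.org/vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def node_uri(elem, parent_uri):
    """Pure function: mint a URI from the (assumed) id attribute,
    otherwise derive one from the parent's URI and the element name."""
    ident = elem.get("id")
    if ident is not None:
        return BASE + "#" + ident
    return parent_uri + "/" + elem.tag


def extract(elem, parent_uri):
    """Map an element and its descendants to (uri, triples).

    No global state is touched: the parent URI is passed down as an
    argument, and each call returns its own list of triples."""
    uri = node_uri(elem, parent_uri)
    triples = [(uri, RDF_TYPE, VOCAB + elem.tag)]
    for child in elem:  # iterates over child elements only
        child_uri, child_triples = extract(child, uri)
        triples.append((uri, VOCAB + "contains", child_uri))
        triples.extend(child_triples)
    return uri, triples


# A small made-up input document.
doc = ET.fromstring('<omdoc><theory id="group"><symbol id="op"/></theory></omdoc>')
uri, triples = extract(doc, BASE)
for triple in triples:
    print(triple)
```

Because each call depends only on its arguments, the same element in the same context always receives the same URI, which is what makes the functional formulation convenient for this kind of mapping.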
Having looked a bit into XQuery, Scala, and JavaScript (and a little bit into Haskell), I decided to stick to XSLT for now. Functional programming is awkward but possible, and XML pattern matching is awkward or non-intuitive in most other languages.


3 Responses to “XML Pattern Matching and Functional Programming”

  1. I suppose you meant “Functional programming is awkward but possible in *XSLT*”. It is not awkward in (at least two) other functional programming languages.

    I can recommend the Scheme XML tools (http://modis.ispras.ru/Lizorkin/sxml-tutorial.html) or the Haskell XML Toolbox (HXT) (http://www.fh-wedel.de/~si/HXmlToolbox/).
    You might also consider “Xcerpt”, a deductive, rule-based query language for graph-structured data, which is built on top of HXT.

  2. Christoph says:

    Yes, I just left out “in XSLT” in that sentence and didn’t notice the ambiguity. Of course I know that actual functional languages are not awkward; they may just be awkward for pattern matching. Just compare: to match an OMDoc theory in Scala, I have to write “<theory>{ _* }</theory>”, whereas in XPath I can simply write “theory” ;-)

    Thanks for the Scheme link! It doesn’t really look like an active project, but I’ve always liked the Lisp way of thinking.

    OK, so you would recommend HXT; I’ll have a look. Oh, I see: I had looked into it before but hadn’t read carefully enough. It does support an XPath-like shorthand notation for addressing nodes.

    I think that Xcerpt is not suitable, because I want to do something different: not inference but translation, and translation in a somewhat more complex way than Xcerpt would offer me for free.

    Thanks!

  3. It’s true that the Scheme XML tools do not have a very active community.

    Nonetheless, these tools work well. I once used them for a plugin that imports OpenMath into TeXmacs. You can also use XPath inside the SXML tools; see, for instance, Example 10 from the SXML tutorial:

    (sxpath "bib/book[@year > 1993][position() <= 2]")
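For comparison, here is a rough equivalent of that query using Python’s standard xml.etree.ElementTree module. Its XPath subset supports attribute predicates such as [@year] or [@year='1994'] but no numeric comparisons, so this sketch applies the year filter in Python and then takes the first two matches, mirroring how the second XPath predicate counts positions within the already filtered sequence. The sample bibliography is made up, and the sketch assumes every year attribute holds an integer.

```python
import xml.etree.ElementTree as ET

# Made-up sample data, only for illustrating the query.
BIB = """<bib>
  <book year="1992"><title>A</title></book>
  <book year="1994"><title>B</title></book>
  <book year="2000"><title>C</title></book>
  <book year="2003"><title>D</title></book>
</bib>"""


def recent_books(root, after=1993, limit=2):
    """Rough equivalent of bib/book[@year > 1993][position() <= 2].

    The numeric comparison is done in Python; books without a year
    attribute are skipped, as the XPath comparison would be false."""
    books = [b for b in root.findall("book")
             if b.get("year") is not None and int(b.get("year")) > after]
    return books[:limit]  # positions counted after filtering


root = ET.fromstring(BIB)  # root is the <bib> element
print([b.findtext("title") for b in recent_books(root)])  # ['B', 'C']
```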