Today I discovered a nice tool for improving your scientific writing: Neil Spring’s style-check.rb. In a primitive but effective way (and aware of LaTeX markup!), it checks your document for typical style or syntax errors, which are implemented as a configurable list of regular expressions. style-check.rb reports any errors it encounters like a compiler, with line and column numbers, in keeping with the developer’s philosophy that writing is just a special case of coding. When the script is executed manually, its output is somewhat hard for a human to “process”, so I thought: why not integrate it into Emacs? Emacs is prepared for dealing with compiler output, so the solution was straightforward.
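To illustrate the general idea (a list of regular expressions whose matches are reported compiler-style), here is a minimal Python sketch. It is not how style-check.rb itself is implemented: the three rules, the function name `check` and the file name `paper.tex` are made up for this example, and LaTeX awareness is left out entirely.

```python
import re

# Illustrative rules only; these are NOT the rules shipped with style-check.rb.
RULES = [
    (re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE), "repeated word"),
    (re.compile(r"\bin order to\b", re.IGNORECASE), "wordy phrase; consider 'to'"),
    (re.compile(r"\s+,"), "space before comma"),
]


def check(text, filename="paper.tex"):
    """Return one 'file:line:column: message' string per rule match."""
    messages = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, description in RULES:
            for match in pattern.finditer(line):
                column = match.start() + 1  # 1-based, as compilers usually report
                messages.append(
                    f"{filename}:{lineno}:{column}: {description}: {match.group(0)!r}"
                )
    return messages


if __name__ == "__main__":
    sample = "We use the the tool.\nIn order to test , run it."
    for message in check(sample):
        print(message)
```

Output in this `file:line:column: message` shape is what tools built for compiler output expect, which is why such a checker fits well into an editor's compile-and-jump-to-error workflow.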
Scientific writing with Emacs: style-check and writegood-mode
February 25th, 2012
KISSWIN congress: support for early career researchers
January 20th, 2012
I attended the KISSWIN congress, an information event about support for early career researchers:
- career paths
- funding possibilities
- networking possibilities
Their homepage contains a lot of helpful information, which I can highly recommend, at least for getting started. The information applies to anyone interested in a scientific career in Germany. Not all of it is as detailed as I would like, but they also offer career seminars.
Unfortunately, like any other project, KISSWIN is limited in time. If I understood correctly, the project proper will run until the end of 2012, whereas they intend to offer some services (such as the homepage) beyond that. So take the opportunities!
An old Subversion annoyance finally explained
July 7th, 2011
I finally found the explanation for a Subversion misbehavior that has been annoying me for a long time. Many repositories of KWARC projects have a world-readable root, whereas access to certain subdirectories is restricted. When checking out such repositories, I never got those subdirectories, so I always ended up doing a separate checkout for them. That meant that inside the working copy of the overall repository I had a directory “sub”, which appeared as an unversioned item from above but was an independent working copy in itself. This is rather inconvenient, as it makes it impossible to commit changes across the whole repository at once.
The Subversion book explains why that is the case (quoting from the section on “path-based authorization”):
Partial Readability and Checkouts
If you’re using Apache as your Subversion server and have made certain subdirectories of your repository unreadable to certain users, you need to be aware of a possible nonoptimal behavior with svn checkout.
When the client requests a checkout or update over HTTP, it makes a single server request and receives a single (often large) server response. When the server receives the request, that is the only opportunity Apache has to demand user authentication. This has some odd side effects. For example, if a certain subdirectory of the repository is readable only by user Sally, and user Harry checks out a parent directory, his client will respond to the initial authentication challenge as Harry. As the server generates the large response, there’s no way it can resend an authentication challenge when it reaches the special subdirectory; thus the subdirectory is skipped altogether, rather than asking the user to reauthenticate as Sally at the right moment. In a similar way, if the root of the repository is anonymously world-readable, the entire checkout will be done without authentication—again, skipping the unreadable directory, rather than asking for authentication partway through.
As a workaround, you have to temporarily restrict access to the root directory while checking out, so that the server demands authentication right at the start of the request.
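The following Python toy model mimics the behavior the Subversion book describes; it is only a sketch of the logic, not of Subversion or Apache code. The paths, the user name, the ACL layout and both function names are invented for illustration, and ACL inheritance is not modeled.

```python
ANONYMOUS = None  # stands for "no authentication was demanded"


def identity_for_request(acl, root, credentials):
    """The server can challenge only once, at the start of the request.

    Assumption of this toy model: a challenge is issued only if the
    requested root is not readable by everyone ("*").
    """
    if "*" in acl[root]:
        return ANONYMOUS
    return credentials


def checkout(paths, acl, root, credentials):
    """Return the paths delivered in the single server response."""
    user = identity_for_request(acl, root, credentials)
    delivered = []
    for path in paths:
        allowed = acl[path]
        if "*" in allowed or user in allowed:
            delivered.append(path)
        # Unreadable paths are skipped silently; no second challenge.
    return delivered


if __name__ == "__main__":
    paths = ["/", "/public", "/sub"]

    # World-readable root: "/sub" is skipped although sally may read it.
    open_root = {"/": {"*"}, "/public": {"*"}, "/sub": {"sally"}}
    print(checkout(paths, open_root, "/", "sally"))

    # Workaround: restrict the root temporarily, forcing authentication.
    closed_root = {"/": {"sally"}, "/public": {"*"}, "/sub": {"sally"}}
    print(checkout(paths, closed_root, "/", "sally"))
```

In this model, the first call delivers only `/` and `/public`, while the second one also delivers `/sub`, because the identity is fixed before the response is generated.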
SePublica@ESWC Workshop on Semantic Publication (May 30, Crete), LNCS Post-proceedings, Best Paper Award by Elsevier
January 16th, 2011
I am a chair of the following workshop (and Michael is on the PC), which is closely related to KWARC’s research interests (specifically KWARC-relevant topics highlighted below):
1st International Workshop on Semantic Publication (SePublica 2011)
at the 8th Extended Semantic Web Conference (ESWC 2011)
May 30th, Hersonissos, Crete, Greece
Keynote by Steve Pettifer, Manchester University, UK: “Utopia Documents and The Semantic Biochemical Journal experiment”
SUBMISSION DEADLINE (extended) March 4
Highlights:
- Best Paper Award sponsored by Elsevier: US$ 750+250 for the most innovative and feasible proposal concerning semantic publishing
- Springer LNCS post-proceedings: A selection of revised versions of the best submissions will be published in the “ESWC 2011 Workshop Highlights” LNCS volume.
The MISSION of the SePublica workshop is to bring together researchers and practitioners dealing with different aspects of Semantic Technologies in the Publishing Industry. How is the Semantic Web impacting the publishing industry? How is our experience of publications changing because of Semantic Web technologies being applied to the publishing industry?
TEI Guidelines mention MathML, OpenMath, and OMDoc
July 31st, 2010
Someone in the humanities must be interested in OMDoc. I was really surprised to find a reference to OMDoc in the section “Formulæ and Mathematical Expressions” of the guidelines (a.k.a. specification) for TEI. TEI (Text Encoding Initiative) is the standard semantic markup language for the humanities, social sciences and linguistics, much like DocBook is for technical manuals. All that TEI itself has is an element <formula notation="…"/>, where notation refers to the language in which the formula is represented. But the guidelines refer to some mathematical markup languages, from which the document author is asked to “make an informed choice”:
- TeX – the obvious candidate, also used in some examples
- MathML – the obvious candidate when XML is desired. They give one Presentation MathML example but also mention Content MathML.
- OpenMath – much less expected. Nice to see that here. On the other hand, the links to the OpenMath standard are outdated. I should probably report that.
- OMDoc – I didn’t expect that at all.
OpenMath CDs as Linked Data
June 30th, 2010
I am currently pursuing the integration of OpenMath Content Dictionaries (CDs) into the Web of Data. (Here is the agenda, which I will present and discuss at the upcoming OpenMath workshop.) The motivation is that mathematical knowledge is currently underrepresented on the Web of Data, but that it is needed for certain use cases, such as dealing in a reasonable way with all those numbers in statistical datasets published by governments.
Only now did I discover several blog posts, almost a year old, on the question of whether something that is called “Linked Data” must use RDF. In the proposed OpenMath setup, we will primarily publish the OpenMath CDs themselves according to the Linked Data principles. That works because the CDs and the symbols defined in them have URIs. The XML language in which the CDs are written is well known in the OpenMath community. It consists of a thin XML wrapper around the actual objects of interest, the so-called OpenMath objects, i.e. mathematical formulæ in a functional tree structure.
Suppose a web service wants to know how to compute, e.g., the Human Development Index of a country, assuming that the auxiliary data points LE, ALI, GEI and GDP are already known. It looks up the definition of the HDI symbol by its URI, e.g. http://example.org/statistics#hdi. It requests the CD as application/openmath+xml, locates the desired symbol, finds out that its definition is 1/3 (LE + 2/3 ALI + 1/3 GEI + GDP) (encoded as an OpenMath object), substitutes the values it knows for the parameters, and lets a computer algebra system do the computation.
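The last two steps of that scenario (substituting values and computing) can be sketched in a few lines of Python. Several things here are assumptions made for illustration: the OpenMath object below is hand-written rather than fetched from the (fictitious) example.org CD, it uses only the `plus`, `times` and `divide` symbols of the `arith1` CD, the input values are invented, and a tiny evaluator stands in for a real computer algebra system.

```python
import operator
import xml.etree.ElementTree as ET
from fractions import Fraction
from functools import reduce

OM = "{http://www.openmath.org/OpenMath}"

# Hand-written encoding of 1/3 (LE + 2/3 ALI + 1/3 GEI + GDP).
HDI_DEFINITION = """
<OMOBJ xmlns="http://www.openmath.org/OpenMath">
  <OMA>
    <OMS cd="arith1" name="times"/>
    <OMA><OMS cd="arith1" name="divide"/><OMI>1</OMI><OMI>3</OMI></OMA>
    <OMA>
      <OMS cd="arith1" name="plus"/>
      <OMV name="LE"/>
      <OMA>
        <OMS cd="arith1" name="times"/>
        <OMA><OMS cd="arith1" name="divide"/><OMI>2</OMI><OMI>3</OMI></OMA>
        <OMV name="ALI"/>
      </OMA>
      <OMA>
        <OMS cd="arith1" name="times"/>
        <OMA><OMS cd="arith1" name="divide"/><OMI>1</OMI><OMI>3</OMI></OMA>
        <OMV name="GEI"/>
      </OMA>
      <OMV name="GDP"/>
    </OMA>
  </OMA>
</OMOBJ>
"""

# The only symbols this toy evaluator understands.
OPERATIONS = {
    ("arith1", "plus"): lambda args: sum(args, Fraction(0)),
    ("arith1", "times"): lambda args: reduce(operator.mul, args, Fraction(1)),
    ("arith1", "divide"): lambda args: args[0] / args[1],
}


def evaluate(node, values):
    """Evaluate an OpenMath object, substituting `values` for variables."""
    if node.tag == OM + "OMOBJ":
        return evaluate(node[0], values)
    if node.tag == OM + "OMI":
        return Fraction(int(node.text))
    if node.tag == OM + "OMV":
        return Fraction(values[node.get("name")])
    if node.tag == OM + "OMA":
        head, *arguments = list(node)
        operation = OPERATIONS[(head.get("cd"), head.get("name"))]
        return operation([evaluate(argument, values) for argument in arguments])
    raise ValueError(f"unsupported OpenMath element: {node.tag}")


if __name__ == "__main__":
    # Invented input values, given as exact fractions.
    values = {
        "LE": Fraction(9, 10),
        "ALI": Fraction(9, 10),
        "GEI": Fraction(3, 5),
        "GDP": Fraction(3, 5),
    }
    hdi = evaluate(ET.fromstring(HDI_DEFINITION), values)
    print(hdi, float(hdi))
```

The point of the sketch is that the service never needs RDF for this task: the symbol URI leads to a CD, and the CD contains a formula that software which understands OpenMath can evaluate directly.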
Thus, my answers to these previous blog posts are:
- to Paul Miller’s “Does Linked Data need RDF?”: No, it does not. OpenMath CDs also work. Well, in principle, at least for entirely OpenMath-based application scenarios, as sketched above. For making a real contribution, the data should additionally be made available as RDF (which is no problem for us, we have the software for translation), so that RDF-based Linked Data applications don’t get stuck on a link saying, e.g., the function used to compute this entry of our dataset is http://www.openmath.org/cd/arith1#sum.
- to Toby Inkster’s comment on that blog post: Yes, in principle we could convert a whole OpenMath CD to RDF. At the moment, I’m not doing this. I provide the complete structural outline of the CD (i.e. what symbols it contains, what metadata have been given for the CD and its symbols), but so far I have not implemented a translation of OpenMath objects to RDF. Why?
- There is no suitable RDF representation of the ordered tree nature of mathematical expressions. Several people have tried it (e.g. [1], [2]), but none of these representations has been adopted by the community, if they have been implemented at all.
- RDF-based reasoning engines don’t understand mathematical expressions. They don’t know, e.g., what a bound variable is, so even if we expressed a formula in RDF, it would be useless.
- Software that does understand mathematical expressions (e.g. a computer algebra system) can usually either process OpenMath, or a language for which translations from/to OpenMath have been implemented.
Note that I have been thinking about what information from the OpenMath objects might reasonably be represented in RDF. In my own applications, I do make use of the information about which symbols occur in a formula (regardless of the depth at which they occur and their order), so I represent that information in the RDF I extract from OpenMath CDs. I have seen other applications that care about the symbol at the root of an expression, such as the “plus” in a+2b², so that could be represented in RDF as well. One could also think of applications that use the OpenMath objects in CDs obtaining them from the RDF representation of a CD, as XMLLiterals. (That could entirely replace the XML-based CD format without losing expressivity, but I’m sure the OpenMath community wouldn’t like that.)
- to Ian Davis’ blog post: I do not agree with the idea that the term Linked Data may only be used together with RDF. I will continue to call what I’m doing with OpenMath “Linked Data”. However, being aware of the ubiquity of RDF and the software supporting it, I will also make RDF data available for the OpenMath CDs, so the difference is a philosophical one.
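The two kinds of information mentioned in the note above (which symbols occur in a formula, and which symbol sits at its root) are easy to extract from the XML encoding. The Python sketch below does that for a+2b²; it is not my actual extraction software. The function names are invented, the symbol URIs follow the http://www.openmath.org/cd/arith1#sum pattern used earlier, and turning the result into RDF triples is left out because the choice of predicates would be a further assumption.

```python
import xml.etree.ElementTree as ET

OM = "{http://www.openmath.org/OpenMath}"
CD_BASE = "http://www.openmath.org/cd/"

# Hand-written encoding of a + 2 b^2 using symbols from the arith1 CD.
EXPRESSION = """
<OMOBJ xmlns="http://www.openmath.org/OpenMath">
  <OMA>
    <OMS cd="arith1" name="plus"/>
    <OMV name="a"/>
    <OMA>
      <OMS cd="arith1" name="times"/>
      <OMI>2</OMI>
      <OMA>
        <OMS cd="arith1" name="power"/>
        <OMV name="b"/>
        <OMI>2</OMI>
      </OMA>
    </OMA>
  </OMA>
</OMOBJ>
"""


def symbol_uri(oms):
    """Build a URI for an OMS element from its cd and name attributes."""
    return f"{CD_BASE}{oms.get('cd')}#{oms.get('name')}"


def occurring_symbols(omobj):
    """All symbols in the object, regardless of depth and order."""
    return {symbol_uri(oms) for oms in omobj.iter(OM + "OMS")}


def root_symbol(omobj):
    """The symbol applied at the top of the expression, if there is one."""
    top = omobj[0]
    if top.tag == OM + "OMS":
        return symbol_uri(top)
    if top.tag == OM + "OMA" and len(top) > 0 and top[0].tag == OM + "OMS":
        return symbol_uri(top[0])
    return None


if __name__ == "__main__":
    omobj = ET.fromstring(EXPRESSION)
    print(sorted(occurring_symbols(omobj)))
    print(root_symbol(omobj))
```

Both results are flat sets or single values, which is why they map to RDF without the ordered-tree problem described above.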
And a general remark for the RDF community: Most OpenMath users don’t care. The OpenMath community is conservative, and it has tools that work with the OpenMath knowledge model and its concrete XML representation. In fact, both communities are quite similar. Both have their own standard, with useful applications, and they say: “Why should we need any other knowledge model or format? OpenMath/RDF is fine for us. We won’t use RDF/OpenMath. But of course we’d appreciate it if you could come up with another real-world use case that uses OpenMath/RDF and shows its superiority.” (BTW, I would be interested in feedback from other communities whose original data you have published as RDF Linked Data. What attitudes do they have?)