This project aims to develop a formal description of the structure of documents, independent of their syntax. Such document ontologies can be used to classify types of documents and their subparts. We are particularly interested in document ontologies for semantic markup languages.
For generic document formats like CNXML, the document ontology would contain concepts like paragraphs or sections and interrelations like cross-references or containment. For highly formal domain-specific document formats like OMDoc, the document ontology would additionally contain concepts like mathematical theories and their interrelations.
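As a rough illustration of the distinction above, the following Python sketch models a few concepts and relations with their domains and ranges. It is not taken from the OWL or UML models of this project; all names (`DocumentUnit`, `contains`, `refersTo`, `imports`, the `allowed` helper) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A class of document parts, optionally specializing a parent concept."""
    name: str
    parent: Optional["Concept"] = None

    def is_a(self, other: "Concept") -> bool:
        # Walk up the subclass chain until we find `other` or run out of parents.
        node: Optional[Concept] = self
        while node is not None:
            if node == other:
                return True
            node = node.parent
        return False


@dataclass(frozen=True)
class Relation:
    """A relation between document parts, restricted by domain and range."""
    name: str
    domain: Concept
    range: Concept


# Generic concepts, as a format like CNXML might need them (hypothetical names).
DOCUMENT_UNIT = Concept("DocumentUnit")
SECTION = Concept("Section", DOCUMENT_UNIT)
PARAGRAPH = Concept("Paragraph", DOCUMENT_UNIT)

# A domain-specific concept, as a format like OMDoc might add (hypothetical name).
THEORY = Concept("Theory", DOCUMENT_UNIT)

# Generic relations: containment and cross-reference.
CONTAINS = Relation("contains", DOCUMENT_UNIT, DOCUMENT_UNIT)
REFERS_TO = Relation("refersTo", DOCUMENT_UNIT, DOCUMENT_UNIT)

# A domain-specific relation between theories (assumed for this example).
IMPORTS = Relation("imports", THEORY, THEORY)


def allowed(relation: Relation, subject: Concept, obj: Concept) -> bool:
    """Check whether `subject relation obj` respects the domain and range."""
    return subject.is_a(relation.domain) and obj.is_a(relation.range)
```

In this sketch the generic relations apply to any document unit, while the domain-specific relation is restricted to theories, so `allowed(CONTAINS, SECTION, PARAGRAPH)` holds but `allowed(IMPORTS, SECTION, THEORY)` does not.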
So far, we have modeled an OWL ontology that covers most of the OMDoc specification. Additionally, we have sketched a document ontology for CNXML 0.5 in UML.