Project Description: 

The Cornell e-print arXiv contains one of the largest corpora of scientific literature in the world. Unfortunately, its contents are locked up in the TeX/LaTeX format, which makes it nearly useless for knowledge management techniques. We translate it to XML to have a basis for uncovering it's structural semantics.

Project Start Date: 
June, 2006