I think it helps to get concrete, so I'll try to enumerate the various types of id that arXiv uses. Below, id stands for one of our local ids:

  • id and arXiv:id - our local internal ids. These are used internally and also quite widely in published literature. They are well recognized by the community. They are not resolvable and not guaranteed to be globally unique. For several years we have been encouraging people to add "arXiv:" as a prefix when the id appears in print. This increases the accuracy of pattern matching and is especially important now that we have dropped the subject area prefix (i.e. we use arXiv:0808.0001 and not the old arXiv:cs/0610031). There is some discussion, including of the versioning scheme, at http://arxiv.org/help/arxiv_identifier.
  • http://arxiv.org/abs/_id_ - this is the canonical link to a splash page. Widely used in web pages, PDF files etc. URLs with mirror names or old machine names often appear, though we discourage them (e.g. the historical http://xxx.lanl.gov/abs/cs/0610031, which works because it ends up at our mirror http://lanl.arxiv.org/). Because canonical mirror names are in the arxiv.org domain we can keep URLs from defunct mirrors working, e.g. http://za.arxiv.org/abs/cs/0610031.
  • http://arxiv.org/(pdf|ps|dvi|e-print|src)/id - canonical links to PDF, PostScript, DVI etc. for id. Same issues as with /abs/ links regarding mirrors.
  • oai:arXiv.org:id - identifiers used within OAI. These would be very good candidates to change to something more sensible and resolvable.
  • semweb item/concept identifiers - a current topic of debate is what we should do for semantic web type apps. We want global, resolvable, persistent ids for articles, versions, ORE aggregations, subjects and collections. This seems like the perfect place to start with any new PIDs.
  • semweb author ids - we have internal account numbers and nicknames for the submitters of all papers. Others may also be registered as owners and/or authors of papers, though we do not have ids for all authors. Up until now (Sep 2009) we have not exposed author numbers or nicknames. We plan to expose services based on user ids but no decision has yet been made about what form to expose. We will likely want to use/expose equivalences with author ids in other systems (owl:sameAs), but sharing a scheme is likely to be impractical.
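
To make the patterns above concrete, here is a minimal Python sketch of how the local ids, the canonical links and the OAI identifiers relate to each other. It is an illustration rather than arXiv code: the function names, the two regular expressions (inferred from the examples 0808.0001 and cs/0610031), the choice to drop the version from the OAI identifier, and the host test used to recognize mirror URLs are all assumptions. The help page linked above remains the authority on the id scheme.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed patterns, inferred from the examples in the text:
#   new style: 0808.0001      (optionally followed by vN)
#   old style: cs/0610031     (optionally followed by vN)
_NEW_STYLE = re.compile(r"(\d{4}\.\d{4,5})(?:v(\d+))?")
_OLD_STYLE = re.compile(r"([a-z-]+(?:\.[A-Z]{2})?/\d{7})(?:v(\d+))?")

# Path components listed in the text for the canonical links.
LINK_KINDS = ("abs", "pdf", "ps", "dvi", "e-print", "src")


def parse_arxiv_id(text):
    """Return (base_id, version); version is an int, or None if absent."""
    candidate = text.strip()
    # The "arXiv:" prefix is optional on input.
    if candidate.lower().startswith("arxiv:"):
        candidate = candidate[len("arxiv:"):]
    for pattern in (_NEW_STYLE, _OLD_STYLE):
        match = pattern.fullmatch(candidate)
        if match:
            version = int(match.group(2)) if match.group(2) else None
            return match.group(1), version
    raise ValueError(f"not a recognised arXiv id: {text!r}")


def link(text, kind="abs"):
    """Build the canonical arxiv.org link of the given kind for an id."""
    if kind not in LINK_KINDS:
        raise ValueError(f"unknown link kind: {kind!r}")
    base, version = parse_arxiv_id(text)
    suffix = f"v{version}" if version is not None else ""
    return f"http://arxiv.org/{kind}/{base}{suffix}"


def oai_identifier(text):
    """Build oai:arXiv.org:id (assumption: the version is not included)."""
    base, _version = parse_arxiv_id(text)
    return f"oai:arXiv.org:{base}"


def canonical_url(url):
    """Rewrite a mirror or old-machine URL to the arxiv.org host.

    Assumption: mirrors are xxx.lanl.gov or any host under arxiv.org.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == "xxx.lanl.gov" or host.endswith(".arxiv.org"):
        parts = parts._replace(netloc="arxiv.org")
    return urlunsplit(parts)
```

For example, link("arXiv:cs/0610031", "pdf") gives http://arxiv.org/pdf/cs/0610031, and canonical_url("http://xxx.lanl.gov/abs/cs/0610031") gives http://arxiv.org/abs/cs/0610031. Note that nothing here makes the bare id resolvable or globally unique; it only shows that every form in the list is derived from the same local id.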