Versions Compared

...

Items are listed in approximate priority order.

Add automatic classification checks to submission system - We will use classifier software developed by Paul Ginsparg to classify incoming submissions according to our category scheme. Where these automatic classifications differ significantly from the user-selected classifications we will add a warning to the moderator alerts.

Improve tools and interfaces to support moderators - We will work with the SAB and moderators to define and develop better tools that allow moderators to interact more directly and efficiently with the arXiv system and administrators. The overarching aim is to make best use of available moderator effort by making the work of moderators as quick and convenient as possible, consistent with achieving the quality goals and following policies set out by the SAB.

Implement new category aliases for cs/math/stat and add a new category for q-fin - In the past the creation of category aliases (e.g. math.IT/cs.IT) and associated recategorization of articles has required a mix of manual DB edits and one-off scripting. We will develop tools to safely do bulk edits of this sort. Testing is also required to ensure that having articles where the primary classification does not match the old-style id as a result of these new aliases is handled correctly everywhere.

...

Improve dataset support - Review use of and experience with the Data Conservancy pilot and then either discontinue or improve interaction. Decide on a medium-term strategy for data and consider assigning DataCite DOIs for ancillary files. Status as of 8/13: The Data Conservancy pilot was discontinued (http://arxiv.org/help/data_conservancy) and the experience reviewed. We will continue to support modest-sized datasets and other materials using the existing ancillary files mechanism (http://arxiv.org/help/ancillary_files), and will ingest data from the pilot as ancillary files to support it long-term. We have subscribed to the EZID service and will assign DataCite DOIs to ancillary files.

Security and login, email privacy - There are significant deficiencies which should be addressed. Issues include: all password entry and authenticated interactions should occur via https; domain-based cookies should not be sent to mirror sites; and user email addresses should be more carefully protected.


Alerting system - The email alerting system remains popular but the mechanisms for subscription management are outdated. Users should be able to see and control their subscriptions from their user account page. New software will be developed to replace the extremely old and hard to maintain software implementing the current email subscription system. Status as of 8/13: Not done

Email privacy - User email addresses should be more carefully protected, which requires some infrastructural changes.

Implement secure communication (SSL/https) for all login and account interactions on arXiv - All interactions with the submission, moderation and admin system should be over SSL. The login in particular exposes arXiv to sniffing attacks for user, moderator or admin credentials, and is especially an issue when users log in on unsecured wifi.

Add automatic classification checks to submission system and integrate with moderator alerting - Need to get Paul's classifier software and model data. Set this up on a production server and understand how we will deploy software and model updates from Paul. Will need to work out how to call this from the submission system for new articles, including pipelining the text extraction needed for the classification to work. Based on the response from the classifier, add information to the moderator email and also make classification information available on /mod/ pages. Consider thresholds for alerts suggesting reclassification.
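
One possible shape for the alert threshold is sketched below. It assumes the classifier returns a score between 0 and 1 for each category; the function name, the score format and both threshold values are illustrative assumptions, not properties of Paul's software.

```python
def classification_warning(scores, user_category, min_score=0.5, margin=0.3):
    """Return a warning string for moderators, or None if no alert is warranted.

    scores: hypothetical classifier output, a dict of category -> score in [0, 1].
    min_score and margin are placeholder thresholds that would need tuning.
    """
    if not scores:
        return None
    # Category the classifier considers most likely.
    best = max(scores, key=scores.get)
    best_score = scores[best]
    # A category missing from the classifier output is treated as score 0.
    user_score = scores.get(user_category, 0.0)
    differs = best != user_category
    confident = best_score >= min_score
    clear_gap = best_score - user_score >= margin
    if differs and confident and clear_gap:
        return (f"classifier suggests {best} ({best_score:.2f}) "
                f"over {user_category} ({user_score:.2f})")
    return None
```

Requiring both a minimum score and a margin over the user-selected category is one way to avoid alerting moderators when the classifier is itself undecided between two neighbouring categories.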

Provide summary documentation of the arXiv code to SAB - Simeon to provide documentation of the arXiv codebase arrangement, technologies and areas of personnel expertise. Do this as an exercise of updating, fleshing-out, and organizing information on the arXiv wiki. Provide dumped snapshot to SAB.

Do category aliasing for cs/math/stat and new category for q-fin - There are three category merges (aliases) requested and one new category requested. Some of these are problematic because there are pre-0704 (old id) submissions that exist with a primary category that is becoming an alias and so will "break" the primary<->id correspondence. In the past aliases have been made on an ad-hoc basis by Simeon. We should instead work out and document procedures for such changes. We will need to create some tools for the bulk recategorization of submissions affected by merges.

Transfer arxiv.org domain registration from Paul Ginsparg to library - Was in 2006 MOU. 2011-04-06: Paul agreed to transfer domain name. Need to work through Paul's Network Solutions account to effect transfer.

Add automatic overlap detection comparing new submissions with existing corpus and generate notifications for administrators and moderators - Work with Paul Ginsparg to obtain bulk overlap check software and install on production servers. Develop pipeline for checking of new submissions against existing corpus and staged submissions. Develop warnings for administrators and moderators based on overlap check results. Make these warnings available in moderator emails and on /mod/ and /admin/ pages.
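
To illustrate what an overlap check result might look like, the sketch below measures how much of a new submission's text also appears in an existing article. It uses word n-gram "shingles", a generic technique chosen here for illustration; it is not a description of Paul's bulk overlap software, and the function names and default n are assumptions.

```python
def shingles(text, n=5):
    """Return the set of word n-grams (as tuples) found in text."""
    words = text.lower().split()
    # Texts shorter than n words yield an empty set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_fraction(new_text, existing_text, n=5):
    """Fraction of the new text's n-grams that also occur in the existing text."""
    new = shingles(new_text, n)
    if not new:
        return 0.0
    return len(new & shingles(existing_text, n)) / len(new)
```

A warning could then be raised when the fraction exceeds some threshold. Comparing every new submission against the whole corpus this way would be too slow, so a production pipeline would need an index over the corpus rather than pairwise comparison.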

Investigate when we'll need to expand the id range to yymm.nnnnn and work out how to do it - We have already had two months with > 8000 submissions (1210 => 8452, 1304 => 8135). We do not know what will happen if we get > 9999 submissions in a month, so we would like to plan a migration to yymm.nnnnn ids to cater for this.
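
As a rough sketch of what code handling both id widths might need to do, the example below parses yymm.nnnn and yymm.nnnnn ids and refuses to format a sequence number that does not fit the chosen width. The pattern, function names and the `width` parameter are assumptions for illustration; old-style (pre-0704) ids are deliberately not handled.

```python
import re

# Assumed pattern: yymm, a dot, then a 4- or 5-digit sequence number.
NEW_ID = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})$")


def parse_id(arxiv_id):
    """Split a yymm.nnnn or yymm.nnnnn id into (yy, mm, sequence number)."""
    m = NEW_ID.match(arxiv_id)
    if m is None:
        raise ValueError(f"not a yymm.nnnn(n) id: {arxiv_id!r}")
    yy, mm, seq = (int(g) for g in m.groups())
    if not 1 <= mm <= 12:
        raise ValueError(f"invalid month in id: {arxiv_id!r}")
    return yy, mm, seq


def format_id(yy, mm, seq, width=4):
    """Build an id, failing loudly if seq does not fit in `width` digits."""
    if not 1 <= seq < 10 ** width:
        raise ValueError(f"sequence number {seq} does not fit in {width} digits")
    return f"{yy:02d}{mm:02d}.{seq:0{width}d}"
```

With `width=4`, a 10000th submission in a month raises an error instead of silently producing a malformed id, which is the failure the migration is meant to avoid. Note that ids of mixed width no longer sort correctly as strings, so sorting should use the parsed tuple.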

Add support for ORCID and other author identifiers associated with authors - We would like to support ORCID identifiers for better interoperability with other repositories implementing authority control and also as a route toward providing institutional statistics for member organizations (because ORCID is implementing storage of affiliation in the profile data).

Ingest data from discontinued Data Conservancy pilot and assign DOIs to data as ancillary files - We have discontinued the Data Conservancy pilot but not yet pulled the data into arXiv. We continue to accept ancillary files but offer relatively little support. Plan to pull the DC data in and to assign DataCite DOIs from EZID to ancillary files, thus making them citable.

Submitter email addresses should not be harvestable - Currently submitter email addresses are stored in the metadata (.abs) files and in the listings files. While the web user interface does hide this information somewhat, the email addresses are too easily harvested and potentially misused. We have had complaints from users where collaborating services have made the emails available. We should instead keep the submitter email addresses only in the database, and not in the metadata or listings files.

Replace email alert system for better maintainability, to allow easy subscribe/unsubscribe, and for flexible options - Replace email alert system: allow changes via web interface tied to user accounts, make maintainable code, ensure scalability and customization

User Support and Moderation

...