Note: The content on this page has moved

The content on this wiki page has been moved to https://confluence.cornell.edu/x/WKZRF.

This page is no longer kept up to date.


 

 

 

 

 

2013 arXiv Roadmap   2015 arXiv Roadmap

Technical

Items are listed in approximate priority order.

Implement secure communication (SSL/https) for all login and account interactions on arXiv - All interactions with the submission, moderation and admin system should be over SSL. The login in particular exposes arXiv to sniffing attacks for user, moderator or admin credentials, and is especially an issue when users log in on unsecured wifi. Completed.


Complete development issues from 2012-09-21 Scientific Advisory Board meeting - The SAB agreed to a number of changes in the way arXiv should send emails to moderators and control user selection of categories. Changes also add the ability for moderators to comment on and put submissions on "hold" without the need for arXiv admin interaction, a first step toward empowering moderators through more direct facilities. Work was started in 2012 and will be completed in early 2013. Status as of 8/13: Done

Complete move of servers to virtual machine (VM) infrastructure - In 2012 we migrated two out of three of arXiv's server machines to new VMs. We will complete the move by migrating our main web server to a pair of load-balanced VMs managed by IT@Cornell. This arrangement will support scaling to additional web front-ends as necessary. As part of this transition we will formalize the update and maintenance process for these machines, and remove access by non-CUL or IT@Cornell staff. Status as of 8/13: Done

Change email handling to support arXiv admin and moderation ticketing system - The "Request Tracker" software has been selected and we will rework email filtering software to use this instead of the current email-based arXiv admin workflows. Status as of 8/13: Partially complete. Have moved help mailbox, not yet moderation mailbox. After discussion with SAB and moderators it is clear that discussion with moderators should not move to Request Tracker at this stage.

Add automatic classification checks to submission system - We will use classifier software developed by Paul Ginsparg to classify incoming submissions according to our category scheme. Where these automatic classifications differ significantly from the user-selected classifications we will add a warning to the moderator alerts. Status as of 8/13: In discussion with Paul Ginsparg to get software

Improve tools and interfaces to support moderators - Work with the SAB and moderators to define and develop better tools that allow moderators to interact more directly and efficiently with the arXiv system and administrators. The overarching aim is to make best use of available moderator effort by making the work of moderators as quick and convenient as possible, consistent with achieving the quality goals and following policies set out by the SAB. Status as of 8/13: Good discussion with SAB and moderators, currently refining list of tasks generated. Completed several improvements, further work in progress.

Add automatic classification checks to submission system and integrate with moderator alerting - Need to set Paul Ginsparg's classifier software and model data up on a production server and understand how we will deploy software and model updates from him. Will need to work out how to call this from the submission system for new articles, including pipelining the text extraction needed for the classification to work. Based on response from classifier add information to moderator email and also make classification information available on /mod/ pages. Consider thresholds for alert for reclassification. Completed, further work being done to refine notifications.
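The threshold idea mentioned above can be sketched as follows. This is only an illustration: the classifier's real interface is not described on this page, so the score dictionary, the function name and both threshold values are hypothetical assumptions.

```python
def classification_warning(user_primary, scores, min_score=0.5, margin=0.3):
    """Return a warning string for the moderator alert, or None.

    `scores` is an assumed mapping of category -> classifier score in [0, 1];
    the real classifier output format is not documented here.
    `min_score` and `margin` are hypothetical thresholds.
    """
    if not scores:
        return None
    # Category the classifier considers most likely.
    best_cat, best_score = max(scores.items(), key=lambda item: item[1])
    # Score given to the category the submitter chose (0.0 if absent).
    user_score = scores.get(user_primary, 0.0)
    differs = best_cat != user_primary
    confident = best_score >= min_score
    far_apart = (best_score - user_score) >= margin
    if differs and confident and far_apart:
        return (
            f"Classifier suggests {best_cat} ({best_score:.2f}) "
            f"over user-selected {user_primary} ({user_score:.2f})"
        )
    return None
```

With these assumed thresholds, a submission whose user-selected category scores close to the classifier's top choice produces no warning, so moderators would only be alerted to significant disagreements.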

Provide summary documentation of the arXiv code to SAB - Provide documentation of the arXiv codebase arrangement, technologies and areas of personnel expertise. Do this as an exercise of updating, fleshing-out, and organizing information on the arXiv wiki. Provide dumped snapshot to SAB. Completed.

Implement new category aliases for cs/math/stat and add a new category for q-fin - There are three category merges (aliases) requested and one new category requested. Some of these are problematic because there are pre-0704 (old id) submissions that exist with a primary category that is becoming an alias and so will "break" the primary<->id correspondence. In the past the creation of category aliases (e.g. math.IT/cs.IT) and associated recategorization of articles has required a mix of manual DB edits and one-off scripting. We will develop tools to safely do bulk edits of this sort. Testing is also required to ensure that having articles where the primary classification does not match the old-style id as a result of these new aliases is handled correctly everywhere. Status as of 8/13: Not done. In the past aliases have been made on an ad-hoc basis by Simeon Warner. We should instead work out and document procedures for such changes. We will need to create some tools for the bulk re-categorization of submissions affected by merges. New q-fin category created, aliases in cs/math/stat not started.

Add automatic overlap detection comparing new submissions with existing corpus and generate notifications for administrators and moderators - Work with Paul Ginsparg to obtain bulk overlap check software and install on production servers. Develop pipeline for checking of new submissions against existing corpus and staged submissions. Develop warnings for administrators and moderators based on overlap check results. Make these warnings available in moderator emails and on /mod/ and /admin/ pages. Completed duplicate submission checks based on title similarity; implementation of full overlap detection not started.
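A title-similarity duplicate check of the kind mentioned above might look roughly like the sketch below. The method actually used on arXiv is not described on this page; the use of Python's difflib, the normalization step, the function names and the 0.9 threshold are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase, drop punctuation and collapse whitespace (assumed rules)."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def title_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def find_possible_duplicates(new_title, existing, threshold=0.9):
    """Return (identifier, score) pairs at or above a hypothetical threshold.

    `existing` is an assumed mapping of submission identifier -> title.
    Results are sorted with the most similar title first.
    """
    hits = []
    for identifier, title in existing.items():
        score = title_similarity(new_title, title)
        if score >= threshold:
            hits.append((identifier, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Comparing every new title against the whole corpus this way is quadratic in the worst case, so a production pipeline would likely need an index or pre-filter; this sketch only shows the comparison step.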

Investigate and expand the id range to yymm.nnnnn - We have already had months with > 8000 submissions (1210 => 8452, 1304 => 8135), and must soon expect to have > 9999 submissions in a month. This requires migration to yymm.nnnnn ids instead of the current yymm.nnnn. From January 2015 submissions will get yymm.nnnnn identifiers; existing submissions will retain yymm.nnnn identifiers.
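The identifier change can be illustrated with a small sketch. It assumes new-style identifiers only (0704 onward), four-digit sequence numbers through 1412 and five-digit sequence numbers from 1501, and it compares two-digit years directly. The function names are hypothetical and not taken from the arXiv codebase.

```python
import re

# Assumed shape of new-style identifiers: yymm.nnnn or yymm.nnnnn
_NEW_STYLE = re.compile(r"(\d{2})(\d{2})\.(\d{4,5})")


def sequence_width(yy, mm):
    """Digits in the sequence number: 5 from January 2015 (1501), else 4."""
    return 5 if (yy, mm) >= (15, 1) else 4


def _check_month(yy, mm):
    if not 1 <= mm <= 12:
        raise ValueError(f"invalid month: {mm}")
    if (yy, mm) < (7, 4):
        # Pre-0704 submissions use old-style ids, not handled in this sketch.
        raise ValueError("pre-0704 submissions use old-style identifiers")


def format_id(yy, mm, number):
    """Build an identifier string, refusing numbers that do not fit."""
    _check_month(yy, mm)
    width = sequence_width(yy, mm)
    if not 1 <= number < 10 ** width:
        raise ValueError(f"sequence number {number} does not fit in {width} digits")
    return f"{yy:02d}{mm:02d}.{number:0{width}d}"


def parse_id(arxiv_id):
    """Return (yy, mm, number) or raise ValueError for a malformed id."""
    match = _NEW_STYLE.fullmatch(arxiv_id)
    if match is None:
        raise ValueError(f"not a new-style identifier: {arxiv_id!r}")
    yy, mm, digits = int(match.group(1)), int(match.group(2)), match.group(3)
    _check_month(yy, mm)
    if len(digits) != sequence_width(yy, mm):
        raise ValueError(f"wrong sequence width for {arxiv_id!r}")
    return yy, mm, int(digits)
```

Under these assumptions, format_id(12, 10, 8452) gives "1210.8452" and format_id(15, 1, 1) gives "1501.00001", while a five-digit sequence number in a 2014 month is rejected.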

Improve author identifier support and data export - Add basic support for ORCID and other author identifiers associated with arXiv accounts. Add periodic data dumps for all public authorship data. Status as of 8/13: Not done

Improve dataset support - Review use of and experience with the Data Conservancy pilot and then either discontinue or improve interaction. Decide on a medium-term strategy for data and consider assigning DataCite DOIs for ancillary files. Status as of 8/13: The Data Conservancy pilot was discontinued (http://arxiv.org/help/data_conservancy) and the experience reviewed. We will continue to support modest sized datasets and other materials using the existing ancillary files mechanism (http://arxiv.org/help/ancillary_files), and will ingest data from the pilot as ancillary files to support it long-term. We have subscribed to the EZID service and will assign DataCite DOIs to ancillary files.

Security and login, email privacy - There are significant deficiencies which should be addressed. Issues include: all password entry and authenticated interactions should occur via https; domain based cookies should not be sent to mirror sites; and user email addresses should be more carefully protected. Status as of 8/13: Not done

Alerting system - The email alerting system remains popular but the mechanisms for subscription management are outdated. Users should be able to see and control their subscriptions from their user account page. New software will be developed to replace the extremely old and hard to maintain software implementing the current email subscription system. Status as of 8/13: Not done

User Support and Moderation

Establish a new Physics Subject Advisory Committee - Physics moderation has become overly dependent on a few active physics moderators, especially Paul Ginsparg. The current Subject Advisory Committee in physics is dormant. The arXiv Scientific Advisory Board is engaged in an effort to identify additional moderators and seek new leadership within the arXiv physics community. Status as of 8/13: Completed. See Physics Advisory Committee page.

Explore the advantages and practicability of having an arXiv Scientific Director - Work with arXiv Scientific Advisory Board to define the role of a Scientific Director, and explore with stakeholders the desirability and feasibility of such a position. Status as of 8/13: In progress. A proposed position description has been drafted, for review at SAB Sept. meeting.

Define new tools and interfaces for moderators - Work with existing moderators to gather and refine requirements for new tools and interfaces to support their work. Existing processes are in need of enhancement or replacement as we upgrade moderation processes and work to improve moderator efficiency and convenience. Status as of 8/13: In progress. New email list created for moderator input on moderation tools. A prioritized list of actions to be considered at SAB Sept. meeting.

Move arXiv communications administration to an issue tracking system - We have begun the process of shifting all communications with users and moderators to a web-based issue tracking system (Request Tracker). Service desk functions managed by arXiv administrators include answering user questions, troubleshooting technical problems, soliciting and following up on moderator input, responding to and resolving moderation appeals, etc. These functions are currently carried out using multiple arXiv.org email mailboxes. A modern issue tracking system will provide us not only with better tracking, but built in reporting and issue classification tools for better management of this labor intensive aspect of arXiv administration. Status as of 8/13: Partially complete. See item above, under "Technical".

Governance

Form the first MAB - Make MAB appointments in January'13 and start working with the group. Status as of 8/13: Completed

authors - We would like to support ORCID identifiers for better interoperability with other repositories implementing authority control and also as a route toward providing institutional statistics for member organizations (because ORCID is implementing storage of affiliation in the profile data). Work in progress.

Ingest data from discontinued Data Conservancy pilot and assign DOIs to data as ancillary files - We have discontinued the Data Conservancy pilot but not yet pulled the data into arXiv. We continue to accept ancillary files but offer relatively little support. Plan to pull the DC data in and to assign DataCite DOIs from EZID to ancillary files, thus making them citeable. Work in progress, data has been copied from Data Conservancy.

Submitter email addresses should not be harvestable - Currently submitter email addresses are stored in the metadata (.abs) files and in the listings files. While the web user interface does hide this information somewhat, the email addresses are too easily harvested and potentially misused. We have had complaints from users where collaborating services have made the emails available. We should instead keep the submitter email addresses only in the database, and not in the metadata or listings files. Implemented protections in view-email pages, not yet removed from metadata files.

Replace email alert system for better maintainability, to allow easy subscribe/unsubscribe, and for flexible options - Replace email alert system: allow changes via web interface tied to user accounts, make maintainable code, ensure scalability and customization. Work not started.

User Support and Moderation

Define and implement new tools and interfaces for moderators - Work with existing moderators and IT to define and implement new tools and interfaces to support the work of moderators. See "Improve tools and interfaces to support moderators" in Technical section above. Completed several improvements, further work in progress.

arXiv moderator coverage and evaluation - 1) Work with subject committees to attain a full complement of moderators. Over 12 new moderators added in physics during 2014; 2 empty physics mod positions as of Dec 2014. 2) Develop and implement metrics for evaluating moderator performance, to share with subject committee chairs.  Work not completed: defining and developing countable metrics for moderator performance.

General Physics moderation - Work with Physics Advisory Committee to implement an effective moderation process for the General Physics category. Gen-ph moderator in place as of November 2014.

Scientific Director - Work with various stakeholders to define the role of the Scientific Director with respect to daily moderation processes and workflows. Work in progress.

Evaluate administrative processes - Work with Scientific Director to evaluate arXiv administration processes and staffing, especially in light of evolving moderation tools. Work in progress.

arXiv category definitions - Work with Physics Advisory Committee to develop public subject category descriptions for existing physics categories. Only a small number of physics categories currently have descriptions. Defining the scope and boundaries of the categories will help users, moderators, and administrators. Unedited descriptions have been gathered from moderators for most categories; yet to do: work with the Scientific Director and Physics Advisory Committee for editing and approval.

Business Model & Governance

Finalize SAB bylaws - The SAB is reviewing a revised draft of its bylaws. A final version of these will be adopted in early 2013, along with the appointment of a Board Chair. Also under consideration is the question of appointing a scientific director, a part-time position to provide intellectual leadership from the perspective of the scientific community. Status as of 8/13: Almost done

Recruit an Interim arXiv Scientific Director - We created a new position to provide intellectual leadership for arXiv's operation. First we would like to test this position and refine the job description. In support of this goal, we'll recruit a temporary Scientific Director at 0.50 FTE capacity for an 18-month term to explore how such a position can contribute to arXiv in formulating overall scientific direction of the service and its policies. Chris Myers recruited as SD.

Test and refine the operation of the new governance model - The arXiv principles aim to clarify the authority, responsibilities, and constraints of CUL, MAB, and SAB. Ironing out problems and developing a working system will require some time to test and observe the inner operation of the governance model. For instance, there may be overlaps and tension, due to role ambiguity or personality conflicts. Currently there is not a clear mechanism to settle potential conflicts among the three chambers - CUL, MAB, SAB. CUL will engage SAB and MAB in proactively identifying and addressing problems. The first strategy to reduce/avoid tension is having in place a roadmap for arXiv so that everyone knows the CUL team's priorities and goals. Both boards need to understand the vision, priorities, and challenges to be able to contribute to arXiv's governance. Status as of 8/13: In progress. We will continue our engagements with the advisory boards and experiment with different communication strategies to share our vision, priorities, and challenges and to seek their input. We continue to evaluate our governance model and the team structure. We formed several MAB and SAB subgroups this year in order to focus on specific issues such as the IT prioritization, recruitment of a scientific director, evaluation of membership & revenue model, etc. Another important milestone was appointing two new SAB members based on a nomination and election process specified by the SAB bylaws.

Create a working relationship between SAB and MAB - The advisory groups need to regularly exchange information in order to contribute to each other's agendas in a meaningful and useful manner. For instance, CUL will provide joint briefings (reports) to SAB and MAB to highlight common interest areas and complementary perspectives. Having ex officio member representation in each group will also facilitate information sharing and developing a common understanding of the respective goals. Status as of 8/13: In progress as we've appointed representatives for each group

Further refine the benefits of being a member for participating institutions - During the governance planning meetings, several ideas emerged as potential free services for members; however, implementing these features may require significant staff time. Therefore these services need to be considered in the context of arXiv's current maintenance and development priorities. After the MAB is formed, in order to better understand demand for services, CUL will invite ideas/proposals and review them with MAB and SAB. Some of the ideas to be considered include: submission-based data when arXiv's metadata structure is ready; institutional repository bulk download. Member institutions are also very keen on informing their scientists and researchers about arXiv's business model (e.g., arXiv LibGuide to share with their faculty and students in related disciplines). Often, we are advised to make it easier to find business info on the arXiv webpage (one idea proposed is to put links from abstracts or submission forms to 'sustainability' page). Status as of 8/13: To be discussed during the September SAB meeting. We will engage the MAB and SAB in discussing possible ideas and assess their potential value and requirements from the arXiv team. Work in progress - discussion item for SAB/MAB meetings.

Continue the membership drive - We are very encouraged with the five-year pledges received so far. We want to increase the number of arXiv member institutions to create a large and international network of supporters. Another goal is to be able to reduce the institutional membership fees in the future (current annual institutional fees are in the $1,500-$3,000 range). Status as of 8/13: In progress

Develop reserve fund policies - The purpose of the arXiv reserve fund is to support unexpected expenses to ensure a sound business model. Currently, arXiv has a reserve fund of approximately $100,000 that accumulated during 2010-2011 due to unexpected staff vacancies and other savings (2012 expenses have not been factored in yet). The 2013-2017 budget projections assume that we will be able to add $50,000-$100,000 per year to the reserve funds. We need to develop policies about how contingency funds will be used and how the account will be structured (e.g., funds needed for closing the business versus development funds; also, a part of the reserve funds could create an endowment whose interest can be used for reducing membership fees). Also, we need to determine at what point of reserve fund accumulation (total reserve balance) we will consider lowering the annual membership fees. Status as of 8/13: To be discussed during the September SAB meeting

New Partnerships & Communication

Define an R&D agenda and seek external funds to advance arXiv - So far, CUL's sustainability planning efforts focused on arXiv's operational budget to support the core services and arXiv's strengths in order to stay mission-centric. One of the key goals ahead of us is to define a research agenda for arXiv. We'll seek input from the advisory boards, users/scientists at large, and information scientists. We will develop a methodology for making decisions about arXiv's R&D projects and partners. One of the challenges will be maintaining/developing arXiv as a distinct system vs. envisioning it as a part of a joint scholarly communication infrastructure (interoperability, research data, tracking funding sources, etc.). This is likely to be a potential tension area between the library community and scientists. Status as of 8/13: Not enough attention has been given to this goal as there were more pressing issues

Continue the dialogue with publishers/societies - In celebration of the arXiv's 20th anniversary, on September 23, 2011 Cornell University Library (CUL) hosted a meeting at Cornell with the representatives from several publishers and societies that are interested in Cornell's sustainability planning efforts. The meeting report provides a synopsis of the discussion and recommends next steps for continuing this dialogue. We will resume this investigation and discuss the feasibility and desirability of establishing a research and innovation collaboration in support of arXiv.  This effort is envisioned to entail a separate funding stream (created by participating publishers and societies) from the operational budget, which includes resources for the routine and core services currently provided by the arXiv team (including essential updates). Status as of 8/13: Held a conference call in September, outcomes & next steps to be discussed during the MAB meeting (seek feedback from SAB)

 We recently formed two SAB subcommittees. The first one aims to assess and revise our tier model to consider other classes of member categories such as scientific societies (to be implemented during the next 5-year plan). The second group will focus on fund raising policies & strategies such as the potential addition of a 'Give' button to arXiv homepage to encourage donations. This is an ongoing activity and will continue. With advice from MAB, a decision was made not to change the tier model and membership class categories for the current 5-year business period. The idea of adding a Give button was provisionally approved by SAB & MAB, contingent on a pilot proposal that will lay out the details.  

New Partnerships & Communication

Continue the dialogue with publishers/societies - Continue information exchange with publishers/societies represented in arXiv, especially in exploring the evolving public policies and open access mandates. One of the ideas we are considering is implementing a pilot with the IOP for facilitating deposit of author version of articles to arXiv after they are published. Work in progress (IOP pilot delayed due to the publisher's current focus on CHORUS).  The current OA landscape is changing rapidly due to emerging OA mandates. We will continue to follow the changes and evaluate arXiv's role within the new scholarly communication models.

Define and communicate measures of success - To address this dynamic process, one goal over the next several months is to create an assessment model to help CUL continue to fine-tune the sustainability model. An assessment plan will also help identify unforeseen developments and make course adjustments to the service and collaboration model. Working with SAB and MAB, we will develop an assessment framework with three key components: desired outcomes and measures of progress (for instance, dynamics of the governance model, level of financial support, enhancements to arXiv, improvements to moderation system, etc.); an assessment model to gauge and report success based on the identified outcome measures; and a plan for a five-year review process to enable CUL to conduct a comprehensive self-evaluation with input from the Simons Foundation, MAB, SAB, and other key partners. Status as of 8/13: Not done yet

Develop guidelines for engaging in new partnerships - arXiv is approached on a regular basis by outside groups asking for advice or special services. Sometimes the assistance requested is minor, but often it would require devoting some amount of staff effort. We need guidelines for determining under what circumstances we will allocate resources in order to collaborate with an outside organization. Working with the arXiv Boards, develop policies that guide consideration of potential working collaborations with outside groups. Status as of 8/13: Not done yet but have an internal CUL partnership assessment tool for consideration

Enhance communication with users (scientists) - Share opinions received from scientists and users with SAB and MAB. How about seeking input from other scientists, arXiv users? What is working well for them that needs to be maintained? What are their unmet needs? What kinds of changes do they want to see implemented? CUL needs to have a systematic way of gathering user feedback and this info ought to be shared with MAB and SAB (both groups are also important channels of input as they represent other groups). Will it be useful to encourage users to send feedback directly to SAB and MAB and make them aware of these advisory groups (through a mailing list)? Also scientists know very little about how arXiv is run and how much it costs. Improve communication: it is important for scientists to know that open access is not free and even open access systems need to be carefully managed. Some scientists may think that they don't need libraries any longer. Be vocal about libraries' new roles. Status as of 8/13: To be discussed during the September SAB meeting. Work in progress. One of the new strategies we tried this year was engaging the MAB members in an informal survey of arXiv user experiences from their home institutions.