Dec 14 2011

Driving institutional research data policy

Steve Hitchcock

Institutional data policy is necessary as one of the drivers of changing practices towards research data across the institution. The role of data policy generally, according to Neylon, is to drive data availability, data management, and data archiving while stressing the importance of data as a core output of public research.

In our first post we identified DataPool’s three-pronged approach – system, policy, training – that we hope will enable us to develop and support a rich collection of research data emerging from the University of Southampton. Here we report on how the proposed research data policy is shaping up at Southampton, and on progress in piloting it through the research and senior management channels towards adoption.

The policy at Southampton was recently given a final-stage presentation to the university’s Research and Enterprise Advisory Group (REAG), which directed the policy to the University Executive Group (UEG). Ultimately, UEG can forward it to the university’s highest policy-making body, the Senate, perhaps by March 2012 if all goes well.

This rate of progress is due in part to the work of our predecessor Institutional Data Management Blueprint project, but credit is also due to the sterling work of Wendy White and Mark Brown at the head of the DataPool Project in piloting the draft policy through the advisory group and policy-making networks. Wendy and Mark are veterans of the university’s Open Access Policy, so they know their way around the networks of influence concerning the development of institutional repositories.

The policy comprises the policy document itself, supported by a series of user guides to smooth implementation. It would be premature to describe the specifics of the policy here, although broadly it covers a researcher’s responsibilities, IPR, storage, retention, disposal and access, as well as setting out contextual issues such as purpose, objectives, and definitions. My approach in reading the draft policy is to anticipate how a researcher might respond to it in terms of clarity of actions, options and consequences. In this respect it is noticeable how much the policy has improved through review and iteration. Admittedly it may not attract the same level of excited publicity as, say, an open data policy, but the scope is wider and the purpose more pragmatic.

For an initiative of this scale we clearly do not expect implementation of the policy to be without issues, but the policy will give the DataPool Project the basis to investigate and resolve them, in terms of actions and answers. On the current schedule, there should be a year for the project to work with this.

There is little prior art on institutional data policy, and one of the reasons JISC has funded DataPool is not just to help produce a data policy, but to inform other institutions on implementation. The DCC page of UK institutional data policies currently logs just four examples, one of which is a ‘commitment’ rather than a policy, while others are in the early stages of implementation. Policy implementation, monitoring and the ability to adapt are the real testing ground for this latest phase of research data management projects.

More, and somewhat better established, data policies can be found among the UK’s research funders, again as logged by DCC. These policies can be seen as context rather than competition for institutional data policies. One of the reasons managers of institutions might commit to research data policy is the set of requirements on their researchers embedded in the funder policies. Institutions need to support their researchers in complying with these policies, for no doubt there will in future be implications for research assessment processes. There is also the incentive of competition between institutions, and the scent of a leading edge in exploiting innovation driven by the profound changes in digital research data management. As Neylon says: “In the longer term, those who adopt more effective and efficient approaches will simply out compete those who do not or can not.” We will look in more detail at the funder policies and their implications for institutions in a later post.

One of the points of contention in emerging data policy is how to define the term ‘research data’. How can policy on this be effectively implemented unless everyone has the same understanding? This may be a semantic argument, but it must also be rooted in current practice by researchers, and in how that practice is already being shaped by current policy, notably from the research funders. My simplistic take here is that researchers are finding their own preferred approaches to storing and managing early-stage research data, that is, data some way from publication. We might call this the Dropbox approach. Funder policy, on the other hand, tends to apply more to data that underpins publication; that is, it is concerned with the quality and reproducibility of results, the bedrock of scientific testability. Simple and unrepresentative as it may be, this view of the different motivations and practices for capturing early and late-stage research data nevertheless seems to mirror the framework of our companion JISC DataFlow Project at the University of Oxford, as represented in its DataStage (a secure personalized ‘local’ file management environment for use at the research group level) and DataBank (an institutional-level research data repository allowing researchers to store, reference, manage and discover datasets) processes, respectively.

Seeing the Southampton policy develop through engagement with research, policy and legal experts on advisory groups, it is easy to anticipate it becoming a worthy policy exemplar for research data. It won’t be the last institutional research data management policy:

> @simonhodson99 By March 2013 all these #jiscmrd projects will develop research data management policies for their institutions #idcc11, 6 December 2011

Timing is key, and our aim is to bring forward policy ratification early in 2012 rather than by March 2013, the project end. It’s important to allow enough time to test the policy in practice. Given the scope of its intended coverage and the range of open questions posed by research data, it is possible the policy might contain unexpected holes or omissions that could limit uptake by both willing and unwilling researchers. Even when adopted – perhaps even more so when adopted – we have to be proactive and vigilant in monitoring how researchers respond to the institutional research data policy.

Dec 8 2011

Data system, policy, training: putting people first

Steve Hitchcock

To support research data management across a large multi-disciplinary institution such as the University of Southampton you need a collection, storage and archiving system, right? Yes, but you need more than that. The proposal for the DataPool Project reveals that it will tackle three distinct developments:

  • Research Data Management System Implementation
  • Research Data Management Policy Ratification and Implementation
  • Integrated Training, Guidance and Support for Researchers

So in addition to a system you need an institutional policy setting out requirements for participation by members of the institution, and training to help them do what the policy specifies using the system provided. We will return to these three developments often in this blog as the project builds, and the next posts will set out the details of where we start for each of these developments.

But there is another crucial element, and the clues are becoming clearer – that is, people. As one of the joint project managers for the DataPool Project, with my colleague Dorothy Byatt, I can see that the brief in the project proposal contains no magic bullet that will solve the challenge of rolling out digital data management to all members and disciplines across an institution, but it begins to set out a network of colleagues to help us achieve the goal.

Since we began the project, Wendy White, co-investigator from the university library, has been extending this network beyond the co-investigators named in the proposal to encompass data contacts for all eight major faculties across the university, and leaders for a series of disciplinary case studies involving different data types produced by postgraduate and undergraduate students as well as researchers. We look forward to introducing data contacts and case study leaders as we report their work here.

We will, however, introduce our team of co-investigators now because they have shaped both the proposal and the prior project, the Institutional Data Management Blueprint (IDMB), that led to where we begin with DataPool. They are:

Mark Brown (Principal Investigator), Les Carr (computer science), Simon Cox (engineering sciences), Graeme Earl (archaeology), Jeremy Frey (chemistry) and Peter Hancock (iSolutions).

We hope they will have the opportunity during the project to introduce themselves as co-contributors to this blog.