Mar 29 2012

DataPool: presented, tweeted, blogged

Steve Hitchcock

Computer Applications and Quantitative Methods in Archaeology (CAA) 2012 conference, hosted by the Archaeological Computing Research Group in the Faculty of Humanities at the University of Southampton on 26-30 March 2012

How do you give a conference presentation when the laptop holding it dies an hour before you are due to speak? You tweet it.

Graeme Earl is co-investigator with the DataPool Project. He is also a senior lecturer in archaeology at the University of Southampton and organiser of the Computer Applications and Quantitative Methods in Archaeology 2012 (CAA2012) conference being held in Southampton this week (26-30 March). So this is an especially busy time for Graeme, yet he still wanted to give a presentation on DataPool to his own research community.

Why choose the novel means of presenting via Twitter? Graeme explains: “I decided at lunchtime that I would give the paper via twitter, and upload slides as an accompaniment. An hour before the paper my laptop died catastrophically and, irony of ironies, my presentation materials were on my laptop rather than on a network location. So I assembled the presentation as links in 30 minutes and then delivered it.”

In case Graeme hasn’t time to blog his presentation as well, we’ll do it for him. Twitter is intended to be an immediate service, so retrieval can get harder over time. You may be able to find the original tweets by searching for Graeme’s username or for the hashtags he used. To avoid repetition these have been removed from the tweets and are copied immediately below. Brief annotations describing the linked pages appear between the tweets to assist readers. Otherwise, the tweets are as Graeme wrote them. For reference, the presentation was given around 5 pm on Wednesday 28 March.

@GraemeEarl #caasoton #datapool #jisc

> Starting my tweeted paper on #datapool now #caasoton

> Managing Research Data

Report on Developing Institutional Research Data Management Policies, a JISC Managing Research Data (MRD) Programme meeting held in Leeds on 12-13 March.


This blog.

> Research data management infrastructure

Research data management infrastructure projects (RDMI), Web page on the first phase of the JISC MRD programme.


JISC project page for IDMB: Institutional data management blueprint, predecessor project to DataPool.

> Creating a system – sharepoint, repository, metadata

DataPool poster paper, on Graeme’s Slideshare account.

> Rolling out a policy – ratified, embedded, implemented

> Producing examples – discipline, re-use case studies, domains e.g. imaging

> Developing skills – training staff and students; ‘help desk’

> Sharepoint infrastructure provides data access and collaboration

> University deep storage repository + connection to others e.g. via SWORD2

> ADS SWORD ARM project

JISC project page for SWORD-ARM: SWORD & Archaeological Research data Management.

> ADS page for SWORD ARM facilitating deposit from outside to ADS repository


> Middle layer of metadata management – initially project/sub-project/item hierarchy

> Publication – push to and pull from external repositories e.g. ADS; policy implications for this?

> Provide external access to cache and deep storage versions

> Demonstration repository; trialling with

Portus Project, Digital Humanities, University of Southampton.

> Presented at Soton Research and Enterprise Advisory Group (REAG)

A project for the research life cycle? DataPool blog post, 8 March 2012.

> Ratification by Soton senate; included user guides also clarify uncertainties

> Defining core focus areas e.g. USRG Imaging

Computationally Intensive Imaging, University Strategic Research Groups (USRGs), University of Southampton.

> Building network of experts and interested people

Data system, policy, training: putting people first, DataPool blog post, 8 December 2011.

> Defining internal dissemination mechanisms e.g. USRG DE

Digital Economy USRG, University of Southampton.

> data management plans presented to other JISC projects

Data management plans (DMPs): the day has arrived, DataPool blog post, 22 March 2012.

> Details of meeting disciplinary challenges in research data management planning workshop

Agenda for JISC workshop on Meeting (Disciplinary) Challenges in Research Data Management Planning held in London on 23 March.

> Finished. Taking questions.

> @PatHadley thankfully I had a helper to advance them for me!

That’s it: presented, tweeted, now blogged.

Mar 22 2012

Data management plans (DMPs): the day has arrived

Steve Hitchcock

Changed Days at Paddington Basin, Colin Smith, Geograph Project

Updated 28 March 2012

The day has arrived for data management plans (DMPs). It’s tomorrow (Friday 23rd March 2012) when Research Data Management Planning Projects from the JISCMRD (2011-13) programme convene a workshop in Paddington, London, to present their findings and results. But has the day for DMPs arrived in a bigger sense? Are DMPs pivotal to research data management? I suspect so, and at the meeting I will be looking for evidence to support the assertion, or not.

DMPs are the link between the conception and proposal of research projects, and the later production of data from those projects. These plans can be extensive and demanding to produce, but as a result the information they contain should be invaluable to data repositories. This is not the type of information the researcher is likely to provide again at the point of depositing data in an institutional data repository.

DMPs represent carefully planned information on the project and predict the existence of data, in some cases precisely. This creates a link with emerging research data policy, which requires an open record of data produced in the course of funded research and the effective management and storage of that data. DMPs have a role to play in monitoring and ensuring the completeness of the records.

This approach raises a series of questions about DMPs. What is the scope of a DMP, and who defines this? This is most likely to be the research funder, but might be institutions in other, non-funded cases. In what form will the DMP be completed? Presumably online. Where will online DMPs be hosted? The Digital Curation Centre, not a research funder, hosts the DMP Online tool. Should institutions create and/or host DMP tools? To what extent will it be possible to (pre-) populate data repository records from DMPs? How comfortable are funders, institutions and researchers about sharing and publishing information from DMPs? The answers will involve specifying where DMPs fit the researcher’s workflow, ensuring there is no duplication of effort, and allowing DMPs to be driven by the needs of research and researchers, not by systems requirements or other special pleading.
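As a rough illustration of the pre-population question, the following Python sketch shows how a repository might draft placeholder records from the datasets a DMP predicts. Everything here is an assumption made for illustration: the `DataManagementPlan` fields, the `draft_repository_records` function and the record keys are hypothetical and do not reflect DMP Online or any particular repository schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataManagementPlan:
    """Hypothetical DMP fields; not the DMP Online schema."""
    project_title: str
    principal_investigator: str
    funder: str
    expected_datasets: list[str] = field(default_factory=list)


def draft_repository_records(plan: DataManagementPlan) -> list[dict]:
    """Create one draft deposit record per dataset the plan predicts."""
    return [
        {
            "title": dataset,
            "project": plan.project_title,
            "creator": plan.principal_investigator,
            "funder": plan.funder,
            "status": "expected",  # placeholder: no data deposited yet
        }
        for dataset in plan.expected_datasets
    ]


# Illustrative values only.
plan = DataManagementPlan(
    project_title="Example imaging project",
    principal_investigator="A. Researcher",
    funder="Example funder",
    expected_datasets=["Raw image captures", "Processed surface models"],
)
records = draft_repository_records(plan)
```

A repository could later compare placeholders like these against actual deposits to check the completeness of its records, though whether funders, institutions and researchers would be comfortable with that remains one of the open questions.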

These are some of the issues that will be in my mind when listening to the DMP project presentations tomorrow, and which I shall report on afterwards.

Update. Following the JISC DMP meeting in Paddington (the Meeting (Disciplinary) Challenges in Research Data Management Planning workshop) and further feedback from key presenters, some of the questions I posed can begin to be answered. For me the key presentations in this context were by Kerry Miller and Adrian Richardson from DCC, who updated the meeting on future plans for the DMP Online tool, and by David Shotton, a zoology researcher at Oxford University who has been compiling a more researcher-friendly set of 20 DMP questions.

First, we were shown how DMP Online v3.0 now includes selectable templates, e.g. for different funding council requirements, so we can see how customisation of DMP input forms is beginning to take shape. We can also see from the plans for DMP Online that one feature being considered for the tool is the “Ability to host locally within institutions”. This was my choice in the selection exercise, but it appears others did not rank this feature so highly. My recollection is that this group exercise was somewhat curtailed, and the tied rankings for many features suggest the returns were not high, so I hope the development team will not feel bound by the ranking of these results.

Now to David Shotton’s analysis of a short, customised set of DMP questions. If we are to host locally and customise DMP tools, we need to be careful not to stray from the core requirements of the forms, which are not simply to suit individuals or institutions. They still have to be grounded in formal funder and research requirements. So having framed his 20 DMP questions, David compared and aligned them with known sets of DMP questions, including those from DMP Online, the US equivalent DMP Tool, and others. To make this alignment David created a downloadable spreadsheet containing the aligned DMP questions, which can be found via a link towards the end of the blog post. An analysis of this comparison is provided in a follow-up post, also linked.

That’s enough for this update. It’s time to look at David Shotton’s analysis and at the plans for DMP Online in more detail. I expect to return to this. I can’t yet answer my later questions on whether researchers and funders will be happy with the approaches being considered here, nor how soon this work might come to fruition by integrating DMP tools into data repository-based research workflows. Even to suggest this phrase leaves me open to the accusation of over-egging this particular pudding. What I can say is that answers to my first questions, on customisation and hosting, have begun to be revealed and they are highly encouraging.

Mar 8 2012

A project for the research life cycle?

Dorothy Byatt

How do we view a project like DataPool?  What are we hoping to achieve?  These are important questions that need to be kept in mind throughout the life of the project.  It can be easy to become focused on specific tasks.  Projects can be seen as simply “a project” with a full stop and an end, but DataPool is more than “just” a project testing ideas and systems.  It will do that, but we hope that it will do much more.  DataPool is about beginning the process of embedding the management of research data into the infrastructure and culture of our institution.  DataPool is here to make a difference and to make it throughout the research life cycle, from proposal to storing and sharing.


Pozo de las animas by Alejandro Colombo, CC BY-NC-SA 2.0

The bedrock underpinning the project will be a Research Data Management Policy for the University.  This will be key to all the other work and will inform the related guidance and training requirements.  Its development is being seen as an iterative process, with views of the academic community initially being gathered through designated “data” contacts within the Faculties.  The policy will inform data management processes in the University, influence plans where funders require them, and be a significant benefit arising from the DataPool project.  We would hope by the end of the project to see an increased number of references to the policy within research proposals, resulting over time in an increased number of datasets held securely and in a location that makes them available for re-use.


The increased focus on research data, its management, storage and sharing requires that the systems offered within the University of Southampton are adapted and developed so that they can meet this need.  DataPool will be of benefit to this process.  DataPool will work to inform the decisions concerning the technical infrastructure of the institution to provide a simple deposit system that will also facilitate sharing at the appropriate time and under approved conditions.  This will be geared towards the individual researcher, influenced by case studies and discipline exemplars, with the aim of seeing how best it can support the research data workflow and capture metadata from existing University systems.  By the end of the project we would expect to have enhanced the storage and deposit options available, and seen an improved uptake of them.


The start of the data life cycle is long before the creation of any data and really begins with the research proposal.  We plan to draw together a network of services that will support the researcher from proposal to deposit.  This will draw on existing services and expertise, both internal, such as our Research and Innovation Service, Doctoral Training Centres and Library, and external, such as the Digital Curation Centre.  We aim to create guidance sheets and training materials, and to offer workshops and a web site.  These will enhance the support that academic, professional and support staff can provide, whether in writing plans or in advising on anything from versioning to different levels and types of metadata.  We would see the establishment of a central web site as an important step in this area.  The creation of this support will be a direct benefit arising from the DataPool project.