Dec 20 2012

Connecting research data roadmaps and business cases: the IDMB example for the University of Southampton

Steve Hitchcock

Call it the sausage in the roll or the wafer-thin ham in the sandwich: as promised in the last post, this is the alternative to the ubiquitous benefits-evidence slides presented by each project represented at the JISC MRD workshop in Bristol. This presentation connects the development of roadmaps with the business case and policy for making progress with research data management (RDM) at an institutional level.

This was presented by Steve Hitchcock, but draws heavily on a report from the Institutional Data Management Blueprint (IDMB) Project, which began the work on research data management (RDM) at the University of Southampton now being taken on by DataPool. Mark Brown, Oz Parchment and Wendy White, co-authors of that report, are therefore the true authors of this presentation. Comment and interpretation are mine.

This version provides the notes for each slide used to inform the commentary for the presentation. It might be worth opening the Slideshare site (adverts notwithstanding) to switch between the slide notes below and the graphic slides – clicking on View on Slideshare in the embedded view will open these in a separate browser window.

Slide 2 Taking the IDMB example with others, connecting roadmaps with the business case and policy seems like a logical sequence, but in practice this is not always the case. At Southampton we have a roadmap and an official institutional research data policy, but the business case is still to be approved. Other institutions appear to have begun with a policy. Here we will focus on the roadmap and business case rather than policy.

Slide 3 If the IDMB project elaborated the roadmap, DataPool represents progress along the first part (18 months) of the first phase (3 years) of the plan, and is beginning to fill in components of the map, as can be seen by the links in this slide.

Slide 4 For reference, this is a recent poster designed to show graphically the full scope of the DataPool Project. It shows the characteristic tripartite approach of this and comparable JISC institutional RDM projects: policy, training, and technical infrastructure (data repository and storage services).

Slide 5 This middle phase of the Southampton RDM roadmap looks like it may have been the trickiest part of the map to elaborate. It’s not imminent and depends on outcomes from the first stage; on the other hand, it’s not so far away that we don’t need to be aware of it and making plans for it. As seen in this extract, it is essentially describing refinements of many of the expected developments from stage 1.

Slide 6 If looking ahead is trickier than framing immediate work, this final phase looking up to 10 years ahead might have been hardest to describe. It is, however, more aspirational in tone and less inclined to deal with specifics, an approach that seems appropriate for a phase so far ahead.

Slide 7 A recent and interesting comparison with the Southampton RDM roadmap is that from Edinburgh University. Edinburgh has a target completion date of early 2014, a startlingly short roadmap compared with a 10-year example. The two are not directly comparable, of course. The Edinburgh case looks to be a well specified, well structured and comprehensive first phase and can be commended for that. Whether it is achievable within the time and resources specified we cannot judge yet. The illustration reproduced here is a helpful representation of the plan – at least, it is once you’ve read the plan.

Slide 8 This extract connects the first progress report of the DataPool Project, by then-PI Mark Brown, with the roadmap and policy. It makes the clear point that research funder requirements (EPSRC, RCUK) had an important influence on adoption of the policy at an executive level, even if some discussion at this JISC MRD Benefits Meeting was around whether supporting compliance with such requirements can usefully be presented to researchers as a ‘benefit’.

Slide 9 Other JISC MRD projects that have roadmaps have similarly emphasised the importance of EPSRC requirements on the production of the roadmap.

Slide 10 Now we move on to the second part of the talk, the business case. The data.bris project from Bristol University was presenting in the same session at this event, so we will spare the detail here, but this extract from a recent blog post by the project illustrates some of the imponderables, Donald Rumsfeld-style, of forming a business case for RDM.

Slide 11 We are heading towards the critical part of this presentation, the financial numbers. First some context. This case covers just the technical infrastructure – IT services – not the wider factors outlined by data.bris. This business model has been updated and presented at the University of Southampton and, as we have already indicated, is currently undergoing further revision with a view to official acceptance. The assumption stated here is not based on the university’s current research data policy, which requires a record of all data produced in the course of research at the institution rather than full data deposit. It therefore cannot be said that the university has, so far, stopped short of accepting the business case for supporting the costs of the policy. The data on usage of storage services and projected usage are the basis for the financials that follow.

Slide 12 In the style of the financial services industry, given there are a number of uncertain factors to accommodate in projections of the growth of storage requirements, this chart attempts to draw upper and lower bounded curves to underpin the calculations.
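
The bounded-curve idea can be sketched in a few lines of Python. This is an illustration only: the starting volume and the lower and upper annual growth rates are hypothetical placeholders, not figures from the IDMB report, and simple compound growth is an assumed model rather than the one used for the chart.

```python
def bounded_projection(start_tb, low_rate, high_rate, years):
    """Return (year, lower_tb, upper_tb) tuples under compound growth.

    All inputs are illustrative assumptions, not IDMB figures.
    """
    if low_rate > high_rate:
        raise ValueError("low_rate must not exceed high_rate")
    rows = []
    for year in range(years + 1):
        lower = start_tb * (1 + low_rate) ** year
        upper = start_tb * (1 + high_rate) ** year
        rows.append((year, lower, upper))
    return rows


if __name__ == "__main__":
    # Hypothetical inputs: 100 TB today, growing 20% to 50% a year for 10 years.
    for year, lower, upper in bounded_projection(100.0, 0.20, 0.50, 10):
        print(f"Y{year}: {lower:8.0f} TB to {upper:8.0f} TB")
```

Any cost calculation built on such a projection inherits the spread between the two curves, which is why the chart shows a range rather than a single line.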

Slide 13 This illustration also comes directly from the IDMB report. Allowing that the metadata should ideally attach to both active and archive layers, the cost factors introduced here are access bandwidth and latency, and storage technology. The basic choices considered are between more expensive and faster access disk storage, and slower tape stores.

Slide 14 Now we get to the actual financial numbers resulting from this analysis. The number that stands out is Y3 in the disk-based scenario, which not only rises above £1M for the first time but gets close to £2.4M. Subsequent annual costs shown here remain above £1M for this scenario. Costs for the slower tape-based scenario are always lower.

Slide 15 Having identified the numbers, the critical decision is how to meet these costs. This was an important issue for the second DataPool Steering Group meeting recently. A full free-at-point-of-use service may be the simplest if most expensive option for the institution, but it has been strongly argued that RDM must be viewed as a direct cost of research, and funded accordingly. The dilemma for institutions is how much to invest in infrastructure directly, compared with leaving projects to raise additional costs for data management and risking research bids becoming less competitive than those from institutions with more generous direct support.

Slide 16 In summary, roadmaps are useful for focussing discussion on research data management at an institutional level, and for engaging other stakeholders across all disciplines. Given that a roadmap should be based on prior consultations with those stakeholders, it follows that subsequent interaction with the roadmap should lead to further consultation. The roadmap must therefore be used as a living document. Southampton has not yet finalised its business case for supporting RDM, but it has established a process through engaging with the roadmap in the first instance.

Dec 20 2012

DataPool benefits-evidence table

Steve Hitchcock

JISC, funder of DataPool, of other projects in research data management, and many more projects on widening use of digital technology in education, tends to focus on areas close to practical exploitation. On the R&D spectrum, it is typically towards the development end. For project managers, therefore, there is an emphasis on procedures and tools to increase the impact of practical outcomes – evaluation, sustainability, exit strategies, technology transfer, etc.

Another planning tool being adopted in the Managing Research Data Programme (MRD) 2011-13, of which DataPool is a part, is benefits-evidence analysis. As this description suggests, the idea is to elaborate prospective benefits of a project, and then identify the evidence that will demonstrate whether or not the benefit has been realised. It is as much about informing the process of getting to the results, and identifying which results are important and achievable, as the results themselves.

Hence, JISC MRD projects were invited to Bristol for a 2-day programme workshop at the end of November to present their benefits-evidence slides. If this sounds a little repetitive, it is, but not uninteresting, especially as in preparing for the workshop all projects had essentially to engage in the same analysis, and were therefore armed not just with their own slide but ready to comment on others.

For project managers used to working towards outputs (products or services arising from the project) and outcomes (effects of the outputs on users in the target community), benefits are another factor. Hence, the JISC MRD programme has recruited a team of evidence gatherers, to work with and assist projects to hone and refine the benefits they are working towards and the consequent evidence measures. “Those are more outputs than benefits” I was advised, fairly, during open discussion on some ‘benefits’ in my slide. But then I had seeded the slide with points to discuss rather than a definitive list, and unwittingly extended the project’s previously discussed benefits.

So after the workshop I was grateful for the advice of Laura Molloy, evidence gatherer for DataPool, on aligning our pre- and post-workshop benefits lists.

After all that effort it would be remiss not to reveal our benefits-evidence table that emerged from the process. For the record, here are the benefits DataPool will seek to demonstrate in its final months into early 2013.

DataPool: Benefits-Evidence

Benefit 1: Improved RDM skills across the target community, including researchers and professional support staff
Evidence:
  Qual reporting on effectiveness of training events.
  Feedback from training courses and deskside consultations, DMP and email help services.
  More staff running RDM support services, increased service offer.

Benefit 2: Greater visibility and use of institution’s research data / research outputs through sharing, collaboration, reuse
Evidence:
  Qual case study describing improved dataset exposure.
  Qual evidence of DMP engagement, including early indications of access routes.
  * Quant indication of increase in dataset downloads.
  No. of datasets stored in data repository.
  Accesses of open datasets vs closed datasets vs shared datasets.

Benefit 3: Sustained institutional support for RDM / sustainability for RDM infrastructure at institution
Evidence:
  No. of training opportunities introduced.
  Scope of: deskside consultations, DMP support service.
  Results from case studies – engagement with existing data facilities.
  Assessment of added value for institution of using institutional storage over other options – report.

Benefit 4: Improved use/uptake of RDM infrastructure
Evidence:
  Quant account of ‘bid preparation consultations’, inc. qual narrative of referrals to data policy and DMP help.
  Case study on working with data policy – feedback on uptake of policy.
  Quant tracking of higher attendance at training.
  Accesses to RDM guidance documents.
  No. of deskside consultations.
  * Quant indication of improved uptake of institutional storage and deposit options.
  No. of large data projects switching to institutional data service.

Benefit 5: Time / costs saved by improved RDM infrastructure
Evidence:
  Identifying early cost-benefits – combined case studies report, inc large data projects, open data, imaging, disciplinary efficiencies.
  Assessment of added value for institution of using institutional storage over other options – report (see 3).

* This evidence is not expected to be available during the DataPool Project, because the RDM repository service will launch only by project end, but it will be collected in ongoing work at Southampton University on institutional RDM.

Table by Steve Hitchcock for DataPool, in collaboration with Wendy White, Dorothy Byatt. We gratefully acknowledge the feedback and suggestions from Laura Molloy, JISC evidence gatherer.

The University of Southampton has a 10 year roadmap for research data, of which DataPool represents the first stretch of road, so there is a commitment to go further, but the clearer the steer from DataPool the faster the progress afterwards.

As a little light relief from projects’ benefits-evidence slides, a presentation on the Southampton roadmap and business plan was given at the Bristol workshop. That will be covered in a separate post.

How will you know which benefits have been achieved as the project moves forward? This post is tagged with the label ‘benefits’. All updates reporting evidence from the table above will use this tag. Tags can be found in the column immediately to the right of this one, and up, from this point in the post.

This is how other JISC MRD projects are tackling these challenges and what benefits-evidence is being targeted:

Dec 18 2012

Trialling DataCite for chemistry lab notebooks and repository data services

Steve Hitchcock

To use research data we need to be able to locate and cite it. DataCite is a service for identifying and citing data. The British Library’s DataCite service is being trialled through DataPool at the University of Southampton with a view to making an institutional agreement for the service. First to try the service here are Philip Adler and Simon Coles, who report on how well metadata describing entries in chemistry lab notebooks and repositories maps to the DataCite schema.

Recently, we have undertaken trials of the DataCite service operated in the UK by the British Library for minting DOIs (digital object identifiers). These were based on use cases in chemistry concerning the use of an Electronic Lab Notebook (ELN), LabTrove, portions of which can and should be referenced using a DOI when being referred to, particularly from journal articles. Additionally we checked the suitability of minting DataCite DOIs for records in eCrystals – an (institutional) data repository based on the EPrints system.

Four use cases have been identified and tested, referencing:

  1. an entire Lab Notebook
  2. a subset of the entries in a Lab Notebook
  3. a single entry in a Lab Notebook
  4. an entry in the eCrystals system

The key part of the work is identifying whether or not suitable metadata can be located, so that it can be placed in an XML framework conformant with the DataCite XML schema. Currently the only mechanism for generating the XML is, somewhat laboriously, by hand but, given the successful outcome of our trials, we will automate this process within each system at a later time.

Use Case 1: Referencing an entire Lab Notebook

The key metadata accompanies each post, so this is for the most part a mapping exercise between the two kinds of metadata. However, the trickiest of these is the publication date. In traditional publication circles, this date is definite – parts of a single journal issue, for instance, would not all be published at different dates. However, in the case of LabTrove the individual entries that make up a complete record (or indeed, a category, as in case 2) can have a range of dates. In its guidelines, DataCite asks for the most appropriate date based on a citation perspective. This does not necessarily clarify things in this case. For the purposes of this experiment, however, I have used the date of the most recent entry in the record being referenced. There is precedent for this in the RSS protocol used as a publishing XML schema elsewhere.
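
The date rule adopted for the experiment can be expressed as a short Python sketch. The entry structure (a list of dictionaries with a 'posted' date) is a hypothetical stand-in for whatever LabTrove actually exposes, and the sample entries are invented; only the rule itself, taking the date of the most recent entry, comes from the description above.

```python
from datetime import date


def publication_date(entries):
    """Pick the date of the most recent entry as the citation date."""
    if not entries:
        raise ValueError("a notebook or collection needs at least one entry")
    return max(entry["posted"] for entry in entries)


# Hypothetical entries; titles and dates are invented for illustration.
sample_entries = [
    {"title": "Entry A", "posted": date(2012, 3, 14)},
    {"title": "Entry B", "posted": date(2012, 11, 2)},
    {"title": "Entry C", "posted": date(2012, 7, 30)},
]
print(publication_date(sample_entries).isoformat())
```

A consequence of this rule is that the citation date of a notebook moves forward each time a new entry is posted.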

A good feature of the schema is that the <dates> optional field allows dates to be entered each time an entry collection is ‘updated’, i.e. each time a new entry is posted. Another neat aspect is the <relatedIdentifiers> option, which allows each of the records that make up the collection to be linked to the collection itself. The relationship between different resources can be described semantically using the attribute relationType.
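
As a rough idea of what automating the hand-built XML might involve, the Python sketch below assembles a record using the standard library. The element and attribute names follow our reading of the DataCite schema, but namespace declarations are omitted, and the function name, DOI, creator, publisher and URLs are placeholders; anything produced this way would need validating against the schema version in use.

```python
import xml.etree.ElementTree as ET


def build_record(doi, creator, title, publisher, entries):
    """Build a DataCite-style XML record for a notebook or collection.

    `entries` is a list of (iso_date, url) pairs, one per notebook entry.
    This is an illustrative sketch, not a schema-validated implementation.
    """
    if not entries:
        raise ValueError("at least one entry is required")
    resource = ET.Element("resource")
    ET.SubElement(resource, "identifier", identifierType="DOI").text = doi
    creators = ET.SubElement(resource, "creators")
    ET.SubElement(ET.SubElement(creators, "creator"), "creatorName").text = creator
    titles = ET.SubElement(resource, "titles")
    ET.SubElement(titles, "title").text = title
    ET.SubElement(resource, "publisher").text = publisher
    # Publication year is taken from the most recent entry.
    latest = max(iso_date for iso_date, _ in entries)
    ET.SubElement(resource, "publicationYear").text = latest[:4]
    # One 'Updated' date for each posted entry.
    dates = ET.SubElement(resource, "dates")
    for iso_date, _ in sorted(entries):
        ET.SubElement(dates, "date", dateType="Updated").text = iso_date
    # Link the URL of each entry to the collection as a whole.
    related = ET.SubElement(resource, "relatedIdentifiers")
    for _, url in sorted(entries):
        ET.SubElement(related, "relatedIdentifier",
                      relatedIdentifierType="URL",
                      relationType="HasPart").text = url
    return resource


# Placeholder values throughout; none refer to a real notebook or DOI.
record = build_record(
    "10.5072/example-notebook", "Doe, Jane", "Example lab notebook",
    "Example University",
    [("2012-03-14", "http://example.org/notebook/1"),
     ("2012-11-02", "http://example.org/notebook/2")],
)
print(ET.tostring(record, encoding="unicode"))
```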

Use Case 2: Referencing an arbitrary collection of records

The only adjustment required for this use case is that the items being referenced have a means of being grouped. Happily, LabTrove comes with the ability to tag things within categories, within date, etc. Other than this, there is no difference in procedure between this and the method for use case 1. Additionally, there is a semantic facility in the XML schema which allows ‘related identifiers’, permitting the inclusion of the URL of each record in the XML metadata.
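
Grouping for this use case might look like the following Python sketch. The entry fields ('url', 'tags') and the sample values are hypothetical; LabTrove's own data model and API are not shown in this post.

```python
def collect_by_tag(entries, tag):
    """Return the URLs of all entries carrying `tag`, in original order."""
    return [entry["url"] for entry in entries if tag in entry["tags"]]


# Invented entries standing in for notebook posts.
notebook = [
    {"url": "http://example.org/notebook/1", "tags": {"synthesis"}},
    {"url": "http://example.org/notebook/2", "tags": {"analysis"}},
    {"url": "http://example.org/notebook/3", "tags": {"synthesis", "nmr"}},
]
print(collect_by_tag(notebook, "synthesis"))
```

The resulting URLs are the ones that would be listed as related identifiers for the collection.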

Use Case 3: Referencing a specific record in a Lab Notebook

LabTrove-based blog example: Pictet-Spengler route to Praziquantel Synthesis of intermediates and derivatives of PZQ

Once again, this is a simple mapping exercise, made simpler than the previous two examples by the fact there is no ambiguity about the date information associated with the record.

Use Case 4: Referencing an eCrystals record

Once again, this is a simple mapping exercise, since the data are all presented in the eCrystals record, and there is no ambiguity about any of the data. Some of the information in the XML schema is open to field-dependent interpretation, however (in particular, the ‘roles’ section in the schema), and this could use some clarification within the accompanying documentation.

Dec 17 2012

To architect or engineer research data repositories

Steve Hitchcock

There cannot be many mature products where development meetings have not been interrupted with a rueful declaration that to make further progress “you wouldn’t start from here”. This encapsulates one key difference between the architect and engineer, the latter prepared to work with the set of tools provided, the former preferring to start with a blank sheet of paper or an open space.

In building research data repositories using two different software platforms, Microsoft Sharepoint and EPrints, the DataPool Project is working somewhere between these extremes. Which approach will prove to be the more resilient for research data management (RDM)? In this invited talk for RDMF 9, the ninth in the DCC series of Research Data Management Forums, held in Cambridge on 14-15 November 2012, we will look at the relevant factors. As a project we are agnostic to repository platforms, and as an institutional-scale project we have to work with whoever will support the chosen platform.

The original Powerpoint slides are available from the RDMF9 site. This version additionally reproduces the notes for each slide used to inform the commentary from the presentation. It might be worth opening the Slideshare site (adverts notwithstanding) to switch between the slide notes below and the graphic slides – clicking on View on Slideshare in the embedded view will open these in a separate browser window.

I thank Graham Pryor of DCC, organiser of RDMF9, for inviting this talk, and for suggesting this topic based, presumably, on the project blog post shown in slide 2. This post sets out some of the higher-level issues while avoiding the trap of setting up a straw man pitting Sharepoint versus EPrints.

Before we get into the detailed notes, here is the live Twitter stream for the DataPool presentation (retrieved from #rdmf9 hashtag on 15 Nov.).

@jiscdatapool Preparing to talk at #rdmf9. Have the 9 am slot
@MeikPoschen #rdmf9 2nd day: To architect or engineer? Lessons from DataPool on building RDM repositories, first talk by Steve Hitchcock #jiscmrd
@MeikPoschen JISC DataPool Project at Southampton, see #jiscmrd #rdmf9
@simonhodson99 Down to work at #rdmf9 at Madingley Hall – outside it’s misty, autumnal – inside it’s Steve Hitchcock, DataPool: to architect or engineer?
@simonhodson99 Steve Hitchcock argues that the DataFlow solution is one of the most innovative things to come through #jiscmrd #rdmf9
@simonhodson99 ePrints data apps available from ePrints Bazaar: #jiscmrd #rdmf9
@jtedds Hitchcock (Southampton) describes institutional drive to implement SharePoint type solution but can it compete with DropBox? #jiscmrd #rdmf9
@jtedds Trial integrations with DataFlow MT @simonhodson99 ePrints data apps available from ePrints Bazaar #jiscmrd #rdmf9
@John_Milner Hitchcock highlights the challenge of getting quality RDM while keeping deposit simple for researchers, not easy #RDMF9
@simonhodson99 Perennial question of the level of detail required in metadata: with minimal metadata will the data be discoverable or reusable? #rdmf9
@simonhodson99 Is SharePoint a sufficient and appropriate platform for active data management? Sustainable? One size fits all? #rdmf9

Are the Twitter contributions a fair summary? We return to the slide commentary to find out.

Slide 3 The blog post highlighted in slide 2 included this architectural diagram, produced by Peter Hancock, director of the iSolutions IT services provider at the University of Southampton. Although it leans heavily towards referencing Sharepoint, it can be viewed as a high-level reference model, analogous to the OAIS in digital preservation, and therefore as a model that can embrace other repository types.

Slide 4 Before we get into the detail of the presentation, here is a poster-based summary of the DataPool Project. It has a tripartite approach characteristic of similar institutional projects in the JISC MRD programme, covering data policy, training and, the area of interest here, building a data repository. It is worth noting as well, in this context, that the development partners shown in the row beneath the tripartite elements effectively represent ways of getting data in and out of the RDM service adopted, and are relevant factors in the repository design.

Slide 5 Here is how the different repository platforms might line up on a broad spectrum of Architected vs Engineered. This is a rough-and-ready approach to illustrate the basic point. Also included is DataFlow, from the University of Oxford, perhaps the most innovative repository platform to have emerged for RDM. Given its originality, it appears towards the architected end of the spectrum. We could not claim that Sharepoint is a new software platform in the same way as DataFlow, but from an RDM perspective you don’t get anything out of the box – you have to start from scratch and ‘architect’ an RDM solution. What developers can do is try and ‘engineer’ the designed RDM element with the IT services already provided in Sharepoint. EPrints first appeared in 2001 to manage research publications. It has offered a ‘dataset’ deposit type since 2007, so provides a ready-made solution for an RDM repository, and can be ‘engineered’ to enhance that solution. As the slide notes, other RDM repository platforms are available. In the following slides we will explore the features of our three highlighted RDM platforms, starting with DataFlow.

Slide 6 DataFlow is a two-stage architecture for data management: an open (Dropbox-like) space for data producers (DataStage), and a managed and curated repository (DataBank), connected by a standard content transfer protocol, SWORD. While DataBank provides a bespoke data management service for Oxford, we have recently noted experiments to connect an open source version of DataStage with EPrints- and DSpace-based curated repositories, thus providing the yearned-for Dropbox functionality apparently so in demand with research data producers.
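
As a rough illustration of the hand-off between the two stages, the Python sketch below prepares the HTTP headers for a SWORD-style deposit of a zipped dataset. The header names follow our understanding of the SWORD v2 profile, which is an assumption since these notes do not say which version DataFlow uses; the function and file names are placeholders, nothing is sent over the network, and this is not DataStage or DataBank client code.

```python
import hashlib


def sword_deposit_headers(filename, payload, in_progress=False):
    """Prepare headers for POSTing a zip package to a SWORD collection URI.

    Header names are assumed from the SWORD v2 profile; check them against
    the profile and the target repository before relying on them.
    """
    return {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        # Checksum lets the server detect a corrupted transfer.
        "Content-MD5": hashlib.md5(payload).hexdigest(),
        "Packaging": "http://purl.org/net/sword/package/SimpleZip",
        # 'true' would tell the server that more content is still to come.
        "In-Progress": "true" if in_progress else "false",
    }


headers = sword_deposit_headers("dataset.zip", b"example bytes")
for name, value in headers.items():
    print(f"{name}: {value}")
```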

Slide 7 This is an example screenshot from the DataStage-EPrints experimental arrangement used by the JISC Kaptur project. It shows the Choose File-Upload button combination familiar to, e.g., WordPress blog users, for uploading data. Uploaded data is then shown in a conventional file manager list.

Slide 8 Moving data from DataStage to the curated repository, again shown in the experimental Kaptur implementation, uses this surprisingly simple SWORD client interface. If this seems insufficient description for a curated item, presumably a more detailed SWORD client could be substituted.

Slide 9 One basis for building a more comprehensive description, or metadata, for research data is this 3-layer model produced by the Institutional Data Management Blueprint (IDMB) Project, the project that preceded DataPool at the University of Southampton. This is quite a general-purpose and flexible model, perhaps with more flexibility than meaning. Structurally, nevertheless, we will see that this has some relevance to repository deposit workflow design.

Slide 10 The 3-layer metadata model can be seen quite clearly in the emerging user interface for data deposit built on Sharepoint. Here we see the interface for collecting project descriptions, used once per project and then linked to each data record produced by the project.
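
The 'describe the project once, link it to every data record' arrangement can be pictured with a small Python sketch. The class and field names are hypothetical and are not taken from the Sharepoint implementation or the IDMB model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Project:
    """Project description, entered once per project."""
    title: str
    description: str = ""


@dataclass
class DataRecord:
    """Description of one dataset, linked to its parent project."""
    title: str
    project: Project
    description: str = ""


@dataclass
class Catalogue:
    """Holds data records and looks them up by project."""
    records: List[DataRecord] = field(default_factory=list)

    def add(self, record):
        self.records.append(record)

    def for_project(self, project):
        return [r for r in self.records if r.project is project]


# Invented example values.
survey = Project("Example survey project")
catalogue = Catalogue()
catalogue.add(DataRecord("Raw responses", survey))
catalogue.add(DataRecord("Cleaned responses", survey))
print([r.title for r in catalogue.for_project(survey)])
```

Because each data record holds a reference to the project rather than a copy, a correction to the project description is seen by all of its records.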

Slide 11 In the same style, here is the Sharepoint user interface for collecting data descriptions. One of the most noticeable features within both the Project and Data forms is the small number of mandatory fields (indicated with a red asterisk), just one on each form. Mandatory fields have to be filled in for the form to submit successfully. Most people will have experienced these fields; invariably when completing a Web shopping form these will be returned with a red text warning. In this case you could feasibly submit a project or data description containing only a title. Aspects such as this are shortly to be subjected to user testing and review of this implementation.
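
The effect of a single mandatory field can be shown with a minimal Python check. The field names and the function are hypothetical; this is not how the Sharepoint forms are implemented, only an illustration of the rule that a form submits when every mandatory field is filled in.

```python
def missing_mandatory(form, mandatory=("title",)):
    """Return the mandatory fields that are absent or blank in `form`."""
    missing = []
    for name in mandatory:
        value = form.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# A description containing only a title passes when title is the sole mandatory field.
print(missing_mandatory({"title": "Sample dataset"}))
# A stricter, hypothetical policy would send the same form back.
print(missing_mandatory({"title": "Sample dataset"}, ("title", "creator")))
```

The trade-off discussed in the talk sits in the `mandatory` tuple: a shorter list makes deposit easier, a longer one makes the resulting record more useful.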

Slide 12 Sharepoint has its detractors as an IT service platform, principally bemoaning its complexity-to-functionality ratio. Prof Simon Cox from Southampton University takes the opposite view passionately. This is an extract from his intervention at a DataPool Steering Group meeting (May 2012) putting the case for Sharepoint. It is a good way of understanding the wider strengths of Sharepoint, which may not be immediately apparent to users of particular Sharepoint services. Building the range of services suggested is a difficult and long-term project.

Slide 13 EPrints supports the deposit of many item types, including datasets since 2007. When you open a new deposit process in EPrints you will first be shown this screen, where you can select an item type such as ‘dataset’.

Slide 14 Selecting ‘dataset’ will take you to this next screen, which might look something like this from ePrints Soton, the Southampton Institutional Repository. This is not quite a default screen for standard EPrints installs; the workflow and fields have been customised in some areas by a repository developer.

Slide 15 EPrints users need not be restricted to standard interfaces or interfaces customised to a repository requirement. Interfaces in EPrints can be added to or amended by simply installing an app from the app store, or EPrints Bazaar. Unlike the Apple app store, with which it might optimistically be compared, EPrints apps are not selected to be installed by users but installation is authorised by repository managers. There are already two apps for those managers to choose to suit particular RDM workflow requirements: DataShare and Data Core. More data apps are expected to follow. EPrints is thus being engineered for flexibility in RDM deposit. In the following slides we will explore these first two data apps.

Slide 16 DataShare makes some minor modifications to the default EPrints workflow for deposit of datasets, highlighted with red circles here.

Slide 17 Data Core aims to implement a minimal ‘core’ metadata for datasets. Implementing this app will overwrite the default EPrints workflow, replacing it with the minimal set, approximately half of which is shown here (the remainder in the next slide). In addition, we have a short description of the design aims for Data Core; no equivalent description is available for Sharepoint data deposit or the DataShare app.

Slide 18 Taking both slides showing the Data Core deposit workflow, this is comparable, in extent, with the Sharepoint ‘data’ interface shown earlier, although it has a few more mandatory fields.

Slide 19 Another example of an EPrints data deposit interface has been developed by Research Data @Essex at the University of Essex. Like Data Core, the Essex approach has explicit design objectives, based on aligning with other metadata initiatives to support multi-disciplinary data. In other words, this does not simply expand or reduce the default EPrints workflow for data deposit, but starts with a new perspective. We have been liaising with its development team to investigate the possibility of building this approach into an Essex EPrints app for other repositories to share.

Slide 20 Here is a section of the Essex workflow, highlighting one area of major difference with the default workflow. It shows fields for time- and geographic-based information.

Slide 21 We’ve looked at getting data into the repository, but not yet how it is displayed as an output, or a data record from the repository. This is one example. It is not the most revealing record, but could be expanded.

Slide 22 Essex has cited specific design criteria for its research data repository. Additionally we have observed some characteristic features, indicated here. In particular, it is a data-only repository, without provision for other data-types offered by EPrints (shown in slide 13). The indication of mandatory fields adds a further layer of insight into the implementation of the design criteria.

Slide 23 So far in this presentation we have seen different implementations of data repository deposit interfaces, including DataFlow, Sharepoint, and multiple interfaces for EPrints. Where is this heading, and what are the common themes? Since we are exploring the difference between architecting and engineering these repositories, I was interested to see this national newspaper article about a major redevelopment of an area close to central London, Nine Elms, an area that interests me as I pass through it on a regular basis. Phrases that stand out refer to the relationship between the planned new high-rise buildings. What does this have to do with data repositories?

Slide 24 Interoperability is the relationship between repositories and how they interact with services, such as search, through shared metadata. If repositories have “nothing in particular to do with anything around them” or “show little interest in anything around” them, then they will not be interoperable. If repositories stand alone rather than interoperate then they become less effective at making their contents visible. Open access repositories have long recognised the importance of interoperability, being founded on the Open Archives Initiative (OAI) over a decade ago, and efforts to improve interoperability continue with current developments. Shown here are some current interoperability initiatives from one morning’s mailbox. Data repositories will be connected to this debate, but so far it has not been a priority in the examples we have considered here.
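
One concrete form of this interoperability is metadata harvesting using the OAI Protocol for Metadata Harvesting (OAI-PMH), in which a service requests records from a repository with a simple HTTP query. The Python sketch below only constructs such a request URL; the base URL is a placeholder, and whether a given data repository exposes an OAI-PMH endpoint is something to check rather than assume.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (nothing is fetched)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set defined by the repository.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, not a real repository.
print(list_records_url("http://repository.example.org/oai"))
```

A repository that answers requests of this kind can be indexed by any harvester that speaks the protocol, which is the sense in which it takes an interest in what is around it.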

Slide 25 One of the organisations listed on the previous slide, COAR, produced a report that outlines more comprehensively the scope of current interoperability initiatives for open access. While some solutions to the capture of research data seen here have reasonably been ‘architected’, that is, starting with a blank sheet to focus on the specific design needs of data deposit, these will need to catch up quickly with interoperability requirements, including most of those listed here. Data repositories ‘engineered’ on a platform such as EPrints, originally designed for other data types, do not obviously lack the flexibility to accommodate research data, and by virtue of having contributed to repository interoperability since the original OAI, already support most of the requirements shown here.

Slide 26 As for the DataPool Project, it will continue its dual approach of developing and testing both SharePoint and EPrints apps. As a project it does not get to choose what is ultimately adopted to run the emerging research data repository at the University of Southampton. There are repository-specific factors that will determine that; but there are other organisational factors to take into account as well. Institutions seeking to build research data repositories that are clearly focussed on this range of factors are likely to have most success in implementing a repository to attract data deposit and usage.

This post has covered just one presentation, from DataPool, at RDMF9. The following two blog reports give a wider flavour of the event, the first exploring the architectural issues raised.

Julie Allinson, Some initial thoughts about RDM infrastructure @ York: “I’ll certainly carry on working up my architecture diagram, and will be drawing on the data coming out of our RDM interviews and survey to help flesh out the scenarios we need to support. But what I feel encouraged and even a little bit excited by is the comment by Kevin Ashley at the end of the RDMF9 event: that two years ago everyone was talking about the problem, and now people are coming up with solutions.”

Carlos Silva, RDMF9: Shaping the infrastructure, 14-15 November 2012: “Overall it was a good workshop which provided different points of view but at the same time made me realise that all the institutions are facing similar issues. IT departments will need to work more closely with other departments, and in particular the Library and Research Office in order to secure funding and make sustainable decisions about software.”

Dec 7 2012

DataPool Steering Group, second meeting

Steve Hitchcock

Monday 12 November marked the start of a busy week for DataPool, being the date of the project’s second Steering Group meeting and leading towards a presentation at the 9th meeting of the DCC Research Data Management Forum. In other words, the project was to address two of its key audiences, and had to prepare appropriate documentation for the purpose. We are pleased to share the documentation, starting here with that presented to the Steering Group ahead of its meeting, complementing the record of the first Steering Group meeting.

Collected documents for 2nd Steering Group meeting

Agenda, Steering Group meeting, 12 November 2012
Minutes of previous Steering Group meeting, 31 May 2012
Progress Report by Wendy White, DataPool PI (corrected 20 November 2012)

Introduction to the Progress Report. At the last Steering Group there was a clear emphasis on the importance of supporting cultural change and identifying institutional benefits to improving research data management practice. Recent policy developments from funders have aligned parameters for the accessibility of research data to strengthening requirements for research publications. There is a focus on benefits-led activity, working with funders and other external bodies on developing an integrated approach to improving research data management practice. The mid-phase of the project has been informed by this context as we have made progress on the key strands of the project:

  • Developing and rolling out service and training models to work with researchers
  • Planning an evidence-based programme of support for professional services staff providing these services
  • Multidisciplinary engagements
  • Investigating requirements for data storage and archiving
  • Testing the SharePoint and EPrints data catalogue components

PGR Thesis Model: mapping support from start to award, a work-in-progress, particularly with regard to the role of data in the examiners’ process

Note, two documents provided to the Steering Group were from ongoing work and were for current information rather than this record. These were a draft training needs questionnaire aimed at research support staff, and an update report on a 3D data survey at the University of Southampton.

Among the many issues discussed at the meeting, one noteworthy topic was funding models to support a storage strategy, i.e. once the costs have been mapped, does the funding come from grant funding bids or from institutional support infrastructure funds? We are particularly grateful to our external (i.e. outside Southampton) steering group members for the additional perspectives they bring, in this case for the valuable insights on the storage funding issue from research councils and data archives.

Members of the steering group present at the meeting (University of Southampton unless otherwise indicated): Wendy White (Chair, DataPool PI and Head of Scholarly Communication), Philip Nelson (Pro-VC Research), Mark Brown (University Librarian), Helen Snaith (National Oceanography Centre Southampton), Mylene Ployart (Associate Director, Research and Innovation Services), Louise Corti (Associate Director, UK Data Archive), Oz Parchment (iSolutions), Les Carr (Electronics and Computer Science), Simon Cox (Engineering Sciences), Graeme Earl (Humanities), Jeremy Frey (Chemistry), Dorothy Byatt, Steve Hitchcock (DataPool Project Managers). Apologies from: Adam Wheeler (Provost and DVC), Graham Pryor (Associate Director, Digital Curation Centre), Sally Rumsey (Digital Collections Development Manager at The Bodleian Libraries, University of Oxford).