Mar 25 2013

Institutional alignments for progressing research data management

Steve Hitchcock

Can visualisation of alignments – of people and ideas across an institution – reveal and predict progress towards research data management (RDM)?

DataPool has been seeking to institute formal RDM practices at the University of Southampton on three fronts – policy, technical infrastructure, and training – as we have noted before. In addition, the university has a longer-term roadmap looking years beyond the point reached in DataPool.

One aspect of this work we haven’t addressed is the alignments that have been instrumental in making progress on these three fronts. If we can visualise these alignments, then not only can we chart progress, but we may also reveal new alignments that need to be forged looking forward; and where there are gaps in existing alignments, there could be lessons for future progress. Since the University of Southampton may be distinctive, but not unique, in terms of these alignments, this analysis might extend to other institutional RDM projects. That is the idea, at least, behind the latest DataPool poster presentation, shown below, prepared for the final JISC MRD Programme Workshop (25-26 March 2013, Aston Business School, Birmingham).

Within DataPool we have established formal and informal networks of people that connect with and cross existing institutional forums. For example, the project has close and regular contact with an advisory group of disciplinary experts, has established a network of faculty contacts, and has been working with the multidisciplinary strands of the University Strategic Research Groups (USRGs), and with senior managers and teams in IT support (iSolutions) and Research and Innovation Services (RIS). At the apex, we have a high-level steering group that spans all of these areas and additionally includes senior institutional managers (Provost, Pro-VC) as well as leaders from external data management organisations. A series of case studies provides insights into the current data practices and needs of those researchers who are data creators and users.

Returning to the three fronts of our investigations, we have reached either natural and expected conclusions ready to be taken forward beyond DataPool, or in some cases incomplete and possibly unexpected conclusions. Below we reveal and assess the alignments that have driven progress on these three fronts:

Policy. Approved by Senate, the University’s ‘primary academic authority’, following recommendations from the Research and Enterprise Advisory Group (REAG), and officially published within the University Calendar. This alignment did not happen by chance: it began to be formed by the library team through the IDMB project and was taken forward within DataPool. Supporting documentation and guidance for the policy are provided on the University Library web site. The policy is effective from publication, but with a ‘low-profile’ launch and follow-up it has, by design, not yet had widespread impact on researchers.

Data infrastructure. Research data apps for EPrints repositories, with selected apps installed on ePrints Soton, the institutional repository, which is now better structured for data deposit. Progress has been made with initial interfaces in Sharepoint, the university’s multi-service IT support platform, to describe data projects and facilitate data deposit; these have had some user testing but remain incomplete. On storage infrastructure, it has not been possible to cost extensions to the existing institutional storage provision, which limits the extension of data services to large and regular data producers, who by definition are the most active data researchers. One late development has been to support minting DataCite DOIs, and embedding them for data citation, in data repositories at Southampton.

Training and support. Principally extended towards PhD and early career researchers, and in-service support teams in the library. Plans to embed RDM training within the university’s extended support operations across all training areas, Gradbook and Staffbook. One highlight in this area is the uptake of support for data management planning (DMP), particularly at the stage of submitting research project proposals for funding.

In these examples we can see alignments spanning governance-IT-services-users.

From the brief descriptions of these fronts it can be seen that the existing alignments have brought us forward, but to go further we have to return to those alignments and reinforce the actions taken so far: to widen awareness, impact and uptake of policy; to provide adequate and usable RDM infrastructure for data producers; to develop and integrate training support within the primary delivery channels.

Almost all of these outcomes and the need for more follow-through can be traced to the alignments. However, the elusive element common across these alignments is the researcher and data producer, despite being a perennial target. Data initiatives, whether from institutions or wider bodies such as research funders, start out with the researcher in mind, but can lose momentum if the researcher appears not to engage. That may be because the benefits identified do not align with the interests of the researcher, or it may be because at a practical level the support and resources provided are insufficient. Thus the extended alignments required for full RDM do not materialise. Worse, the existing alignments can become prematurely discouraged and lack the incentives and confidence to promote the real innovation they have delivered, in turn affecting investment decisions and service development.

Where the researcher is engaged the results can be quite different, as seen in the DataCite example, motivated and developed by researchers, and in DMP uptake, where researchers clearly begin to recognise both the emergence of good practice in digital data research and the need for compliance with emerging policy.

These alignments are a crucial but largely unnoticed aspect of DataPool, and no doubt of other similar #jiscmrd projects at other institutions as well. If this analysis is correct then for institutional-scale projects alignments can both reveal and predict progress.

Jan 29 2013

DataPool expands on student RDM training approaches at IDCC13, Amsterdam

Steve Hitchcock

Two presentations from the University of Southampton at the 8th International Digital Curation Conference (#IDCC13) set out its approach to providing training in research data management for postgraduates. Taking a broad approach, DataPool gave a poster on working with PhD and Early Career Researchers, described as featuring “examples of essential building blocks coming out of researcher-focussed work”. In the main conference a team from the Faculty of Engineering and the Environment presented a paper jointly with DataPool on a booklet they have produced, “Introducing Research Data”, and included some startling findings from initial training sessions with students. Below, Mark Scott from that team introduces the booklet, and we reproduce the live Twitter record of the presentation, which highlights the main points of the talk identified by the Twitter reporters, particularly the findings on student responses.

Introducing Research Data booklet – cover sheet

We recently presented some of our postgraduate training material at the IDCC conference in Amsterdam. With so much data out there, and much of today’s research relying on large-scale data sets, it is important to educate researchers about their data – and its value – early.

Our approach was two-fold: a lecture to introduce research data management to first year postgraduates, and a booklet introducing the area. The talk concentrated mainly on the booklet we produced.

The booklet had three sections: an introduction to types of research data, some case studies showing real-world examples of the types of data in use, and some best practices. For the case studies, we looked at five researchers’ work from medicine, materials engineering, aerodynamics, chemistry, and archaeology, and tried to show the similarities and differences between the data types they produce using the categories from the first section.

The concepts in the booklet have been presented twice as a training lecture in the Faculty of Engineering and the Environment, and the material has also been used in the WebScience Doctoral Training Centre. Feedback from students suggests that being made to think about these issues is necessary and useful, and that engaging them at this stage helps cultivate good practices.

Mark Scott

Mark Scott et al, #idcc13 slide 13, Feedback From Lectures


Below is the Twitter record of Mark’s talk on 16 January 2013, from #idcc13, in the chronological sequence of posting.

Meik Poschen @MeikPoschen Next up is Mark Scott, University of Southampton on ‘Research Data Management Education for Future Curators’ #idcc13

Marieke Guy @mariekeguy #idcc13 Mark Scott from Uni of Southampton – post graduate training, created a magazine style booklet for all first year students

Archive Training @archivetraining Southampton University give magazine style RDM booklet to 1st year PG students. #IDCC13

@MeikPoschen Booklet to introduce RD to first year students with 3 sections: 1) five ways to think about RD, 2) case studies, 3) DM best practice #idcc13

Jez Cope @jezcope Good, thorough description of the Southampton approach to RDM education from Mark Scott. #idcc13

@mariekeguy #idcc13 Uni of Southampton – 5 ways to think about data: creation, forms of research, electronic rep, size/structure, data lifcycle

Gail Steinhart @gailst Southampton’s RDM guide for first year post-graduate students: (PDF) #idcc13

Full link added:

@jezcope I want to see Southampton’s RDM booklet, which includes ways to think about data, case studies and best practices. #idcc13

SMacdee @SMacdee #idcc13 – mark scott (U of Soton) RDM education booklet for Postgrads: 5 ways to think about research data; case studies; best practices

@MeikPoschen Booklet: 2) case studies giving an overview on various disciplinary examples, covering Genetics, Materials Engineering, Archaeology #idcc13

Mariette van Selm @mvanselm +1 RT @jezcope: I want to see Southampton’s RDM booklet, which includes ways to think about data, case studies and best practices. #idcc13

Odile Hologne @Holo_08 Five Ways to Think About Research Data Mark Scott course for students #idcc13

Corrected link:

@mvanselm Mark Scott (Uni of Southampton) on RDM education: “When you start scaring students, they start paying attention” #idcc13 🙂

@mariekeguy #idcc13 Southampton noted that students happier with RDM lecture when delivered later in year – had real data experience at that stage

@jezcope Feedback for Soton’s RDM lecture was better when it was delivered later in the year. PGRs need to have some data experience first? #idcc13

@archivetraining Southampton’s feedback: RDM lecture was more positive when given in month 7 – when data collection was underway. The power of fear! #IDCC13

@MeikPoschen Booklet now part of University of Southampton’s wider (10 year) training etc. scheme #idcc13

@archivetraining Lots of RDM guides for lots of different audiences being shown-off at #IDCC13. Here’s ours [in German – English coming]

For a view of wider coverage and activities with a research data training theme at #IDCC13, we return to our colleagues at @archivetraining:

“The impression I took from this bundle of presentations (mostly funded by the excellent JISC Managing Research Data programme) was that projects doing data management training or support have to effectively design a campaign strategy, as one would for an election. Digital curation is akin to a valence issue – we all like sharable, long-term secure data – but how we get there needs to be thought about.”

Or for more views on #IDCC13 as a whole, see this collection of post-conference blog posts.

Jan 28 2013

Positive Poster’ing for IDCC 2013

Dorothy Byatt

Creating the poster for the International Digital Curation Conference (#IDCC13) was different from the ones we have done thus far. Although very much linked to the DataPool project, the choice of content was restricted only by the need to be of interest to the theme of the conference – “Infrastructure, Intelligence, Innovation: driving the Data Science agenda”. Our choice was to focus on our collaborative work with PhD and Early Career Researchers, that is, helping to embed and enable good research data management practices in the institution.

Gareth Beale and Hembo Pagi have been investigating the 3D and 2D raster imaging being used in the University. We look forward to their report. A group of researchers came to a working lunch, led by iSolutions and the DataPool team, to look at progress on a SharePoint data deposit option, and provided valuable feedback. Another development that will be of great assistance to those looking to capture a snapshot of life and society is a Twitter archiver using EPrints, currently in beta development. One snapshot will be of #IDCC13 tweets. Yet another collaboration was with Mark Scott on his ‘Introducing research data’ guide and on a data sharing system for the Heterogeneous Data Centre (HDC). More details of his work and the paper he presented will follow in our linked second IDCC blog post. So there was our content: examples of essential building blocks coming out of researcher-focussed work.

And that just left the design …!

Dec 20 2012

Connecting research data roadmaps and business cases: the IDMB example for the University of Southampton

Steve Hitchcock

The sausage in the roll, or the wafer-thin ham in the sandwich: as promised in the last post, this is the alternative to the ubiquitous benefits-evidence slides presented by each project represented at the JISC MRD workshop in Bristol. This presentation connects the development of roadmaps with the business case and policy for making progress with research data management (RDM) at an institutional level.

This was presented by Steve Hitchcock, but draws heavily on a report from the Institutional Data Management Blueprint (IDMB) Project, which began the work on RDM at the University of Southampton now being taken forward by DataPool. Mark Brown, Oz Parchment and Wendy White, co-authors of that report, are therefore the true authors of this presentation. Comment and interpretation are mine.

This version provides the notes for each slide used to inform the commentary for the presentation. It might be worth opening the Slideshare site (adverts notwithstanding) to switch between the slide notes below and the graphic slides – clicking on View on Slideshare in the embedded view will open these in a separate browser window.

Slide 2 Taking the IDMB example with others, connecting roadmaps with the business case and policy seems like a logical sequence, but in practice this is not always the case. At Southampton we have a roadmap and an official institutional research data policy, but the business case is still to be approved. Other institutions appear to have begun with a policy. Here we will focus on the roadmap and business case rather than policy.

Slide 3 If the IDMB project elaborated the roadmap, DataPool represents progress along the first part (18 months) of the first phase (3 years) of the plan, and is beginning to fill in components of the map, as can be seen by the links in this slide.

Slide 4 For reference, this is a recent poster designed to show graphically the full scope of the DataPool Project. It shows the characteristic tripartite approach of this and comparable JISC institutional RDM projects: policy, training, and technical infrastructure (data repository and storage services).

Slide 5 This middle phase of the Southampton RDM roadmap looks like it may have been the trickiest part of the map to elaborate. It is not imminent and depends on outcomes from the first stage; on the other hand, it is not so far away that we can avoid being aware of it and making plans for it. As seen in this extract, it essentially describes refinements of many of the developments expected from stage 1.

Slide 6 If looking ahead is trickier than framing immediate work, this final phase, looking up to 10 years ahead, might have been the hardest to describe. It is, however, more aspirational in tone and less concerned with specifics, an approach that seems appropriate for such a long horizon.

Slide 7 A recent and interesting comparison with the Southampton RDM roadmap is that from Edinburgh University. Edinburgh has a target completion date of early 2014, a startlingly short roadmap compared with a 10-year example. The two are not directly comparable, of course. The Edinburgh case looks to be a well specified, well structured and comprehensive first phase and can be commended for that. Whether it is achievable within the time and resources specified we cannot judge yet. The illustration reproduced here is a helpful representation of the plan – at least, it is once you’ve read the plan.

Slide 8 This extract connects the first progress report of the DataPool Project, by then-PI Mark Brown, with the roadmap and policy. It makes the clear point that research funder requirements (EPSRC, RCUK) had an important influence on adoption of the policy at an executive level, even if some discussion at this JISC MRD Benefits Meeting was around whether supporting compliance with such requirements can usefully be presented to researchers as a ‘benefit’.

Slide 9 Other JISC MRD projects that have roadmaps have similarly emphasised the importance of EPSRC requirements on the production of the roadmap.

Slide 10 Now we move on to the second part of the talk, the business case. The data.bris project from Bristol University was presenting in the same session at this event, so we will spare the detail here, but this extract from a recent blog post by the project illustrates some of the imponderables, Donald Rumsfeld-style, of forming a business case for RDM.

Slide 11 We are heading towards the critical part of this presentation, the financial numbers. First, some context. This case covers just the technical infrastructure – IT services – not the wider factors outlined by data.bris. This business model has been updated and presented at the University of Southampton and, as we have already indicated, is currently undergoing further revision with a view to official acceptance. The assumption stated here is not based on the university’s current research data policy, which requires a record of all data produced in the course of research at the institution rather than full data deposit. The university can’t be said, therefore, to have stopped short, so far, of accepting the business case for supporting the costs of the policy. The data on current and projected usage of storage services are the basis for the financials that follow.

Slide 12 In the style of the financial services industry, given there are a number of uncertain factors to accommodate in projections of the growth of storage requirements, this chart attempts to draw upper and lower bounded curves to underpin the calculations.
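The idea of bracketing a projection between two curves can be sketched in a few lines. This is a minimal illustration, assuming simple compound annual growth; the starting volume and both growth rates are hypothetical placeholders, not the figures behind the IDMB chart.

```python
# Hypothetical sketch: project storage demand between a lower and an upper
# growth bound. Inputs are invented placeholders, NOT the IDMB figures.


def project_storage(start_tb: float, annual_growth: float, years: int) -> list:
    """Return projected storage (TB) for years 1..years, compounding annually."""
    return [start_tb * (1 + annual_growth) ** year for year in range(1, years + 1)]


def bounded_projection(start_tb: float, low_growth: float, high_growth: float, years: int) -> list:
    """Pair each year's lower-bound and upper-bound projections as (lower, upper)."""
    lower = project_storage(start_tb, low_growth, years)
    upper = project_storage(start_tb, high_growth, years)
    return list(zip(lower, upper))


if __name__ == "__main__":
    # Hypothetical inputs: 100 TB today, growing between 20% and 60% a year.
    for year, (low, high) in enumerate(bounded_projection(100.0, 0.2, 0.6, 5), start=1):
        print(f"Y{year}: {low:8.1f} TB to {high:8.1f} TB")
```

The gap between the two curves widens each year, which is why uncertainty in the growth rate matters most for the later years of a costing.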

Slide 13 This illustration also comes directly from the IDMB report. Allowing that the metadata should ideally attach to both active and archive layers, the cost factors introduced here are access bandwidth, latency and storage technology. The basic choice considered is between faster-access but more expensive disk storage, and slower tape stores.

Slide 14 Now we get to the actual financial numbers resulting from this analysis. The number that stands out is Y3 in the disk-based scenario, which not only rises above £1M for the first time but comes close to £2.4M. Subsequent annual costs shown here remain above £1M for this scenario. Costs for the slower tape-based scenario are lower in every year.
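To make the arithmetic behind such a comparison concrete, here is a minimal sketch, assuming annual cost is simply the stored volume multiplied by a per-terabyte unit cost for each technology. The unit costs and volumes are hypothetical, chosen only to show how a lower tape rate keeps every year's tape cost below the disk cost, and how to find the first year a scenario crosses £1M.

```python
# Hypothetical annual storage cost per terabyte, in pounds; placeholders only,
# not figures from the IDMB report.
UNIT_COST_GBP_PER_TB = {"disk": 1000.0, "tape": 250.0}


def annual_costs(volumes_tb, technology):
    """Annual cost for each year's stored volume under one storage technology."""
    rate = UNIT_COST_GBP_PER_TB[technology]
    return [volume * rate for volume in volumes_tb]


def first_year_above(costs, threshold):
    """Return the 1-based year in which cost first exceeds threshold, or None."""
    for year, cost in enumerate(costs, start=1):
        if cost > threshold:
            return year
    return None


if __name__ == "__main__":
    volumes = [300, 700, 1500, 1800]  # hypothetical TB held in years 1-4
    for tech in ("disk", "tape"):
        costs = annual_costs(volumes, tech)
        print(tech, costs, "first year over £1M:", first_year_above(costs, 1_000_000))
```

A real model would add further terms (bandwidth, latency requirements, replacement cycles), but the linear form is enough to show why the choice of technology dominates the totals.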

Slide 15 Having identified the numbers, the critical decision is how to pay for it. This was an important issue for the second DataPool Steering Group meeting recently. A full free-at-point-of-use service may be the simplest, if most expensive, option for the institution, but it has been strongly argued that RDM must be viewed as a direct cost of research, and funded accordingly. The dilemma for institutions is how much to invest in infrastructure directly, compared with leaving projects to cover the additional costs of data management and risking research bids becoming less competitive than those from institutions with more generous direct support.

Slide 16 In summary, roadmaps are useful for focussing discussion on research data management at an institutional level, and for engaging other stakeholders across all disciplines. Given that a roadmap should be based on prior consultations with those stakeholders, it follows that subsequent interaction with the roadmap should lead to further consultation. The roadmap must therefore be used as a living document. Southampton has not yet finalised its business case for supporting RDM, but it has established a process through engaging with the roadmap in the first instance.

Dec 17 2012

To architect or engineer research data repositories

Steve Hitchcock

There cannot be many mature products where development meetings have not been interrupted with a rueful declaration that to make further progress “you wouldn’t start from here”. This encapsulates one key difference between the architect and the engineer: the engineer is prepared to work with the set of tools provided, while the architect prefers to start with a blank sheet of paper or an open space.

In building research data repositories using two different software platforms, Microsoft Sharepoint and EPrints, the DataPool Project is working somewhere between these extremes. Which approach will prove to be the more resilient for research data management (RDM)? In this invited talk for RDMF 9, the ninth in the DCC series of Research Data Management Forums, held in Cambridge on 14-15 November 2012, we will look at the relevant factors. As a project we are agnostic to repository platforms, and as an institutional-scale project we have to work with whoever will support the chosen platform.

The original Powerpoint slides are available from the RDMF9 site. This version additionally reproduces the notes for each slide used to inform the commentary from the presentation. It might be worth opening the Slideshare site (adverts notwithstanding) to switch between the slide notes below and the graphic slides – clicking on View on Slideshare in the embedded view will open these in a separate browser window.

I thank Graham Pryor of DCC, organiser of RDMF9, for inviting this talk, and for suggesting this topic based, presumably, on the project blog post shown in slide 2. This post sets out some of the higher-level issues while avoiding the trap of setting up a straw man pitting Sharepoint versus EPrints.

Before we get into the detailed notes, here is the live Twitter stream for the DataPool presentation (retrieved from #rdmf9 hashtag on 15 Nov.).

@jiscdatapool Preparing to talk at #rdmf9. Have the 9 am slot
@MeikPoschen #rdmf9 2nd day: To architect or engineer? Lessons from DataPool on building RDM repositories, first talk by Steve Hitchcock #jiscmrd
@MeikPoschen JISC DataPool Project at Southampton, see #jiscmrd #rdmf9
@simonhodson99 Down to work at #rdmf9 at Madingley Hall – outside it’s misty, autumnal – inside it’s Steve Hitchcock, DataPool: to architect or engineer?
@simonhodson99 Steve Hitchcock argues that the DataFlow solution is one of the most innovative things to come through #jiscmrd #rdmf9
@simonhodson99 ePrints data apps available from ePrints Bazaar: #jiscmrd #rdmf9
@jtedds Hitchcock (Southampton) describes institutional drive to implement SharePoint type solution but can it compete with DropBox? #jiscmrd #rdmf9
@jtedds Trial integrations with DataFlow MT @simonhodson99 ePrints data apps available from ePrints Bazaar #jiscmrd #rdmf9
@John_Milner Hitchcock highlights the challenge of getting quality RDM while keeping deposit simple for researchers, not easy #RDMF9
@simonhodson99 Perennial question of the level of detail required in metadata: with minimal metadata will the data be discoverable or reusable? #rdmf9
@simonhodson99 Is SharePoint a sufficient and appropriate platform for active data management? Sustainable? One size fits all? #rdmf9

Are the Twitter contributions a fair summary? We return to the slide commentary to find out.

Slide 3 The blog post highlighted in slide 2 included this architectural diagram, produced by Peter Hancock, director of the iSolutions IT services provider at the University of Southampton. Although it leans heavily towards referencing Sharepoint, it can be viewed as a high-level reference model, analogous to the OAIS in digital preservation, and therefore as a model that can embrace other repository types.

Slide 4 Before we get into the detail of the presentation, here is a poster-based summary of the DataPool Project. It has a tripartite approach characteristic of similar institutional projects in the JISC MRD programme, covering data policy, training and, the area of interest here, building a data repository. It is worth noting as well, in this context, that the development partners shown in the row beneath the tripartite elements effectively represent ways of getting data in and out of the RDM service adopted, and are relevant factors in the repository design.

Slide 5 Here is how the different repository platforms might line up on a broad spectrum of Architected vs Engineered. This is a rough-and-ready approach to illustrate the basic point. Also included is DataFlow, from the University of Oxford, perhaps the most innovative repository platform to have emerged for RDM. Given its originality, it appears towards the architected end of the spectrum. We could not claim that Sharepoint is a new software platform in the same way as DataFlow, but from an RDM perspective you don’t get anything out of the box – you have to start from scratch and ‘architect’ an RDM solution. What developers can do is try and ‘engineer’ the designed RDM element with the IT services already provided in Sharepoint. EPrints first appeared in 2001 to manage research publications. It has offered a ‘dataset’ deposit type since 2007, so provides a ready-made solution for an RDM repository, and can be ‘engineered’ to enhance that solution. As the slide notes, other RDM repository platforms are available. In the following slides we will explore the features of our three highlighted RDM platforms, starting with DataFlow.

Slide 6 DataFlow is a two-stage architecture for data management: an open (Dropbox-like) space for data producers (DataStage), and a managed and curated repository (DataBank), connected by a standard content transfer protocol, SWORD. While DataBank provides a bespoke data management service for Oxford, we have recently noted experiments to connect an open source version of DataStage with EPrints- and DSpace-based curated repositories, thus providing the yearned-for Dropbox functionality apparently so in demand with research data producers.
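For readers unfamiliar with SWORD, the hand-off from a DataStage-like space to a curated repository amounts to an HTTP POST to a repository collection. The sketch below builds, but does not send, a SWORD v2 binary deposit using only Python's standard library. The collection URI and filename are hypothetical, authentication is omitted, and this is a generic illustration of the protocol, not DataFlow's own code.

```python
import urllib.request

# SWORD v2 packaging identifier for a deposit sent as a single opaque file.
BINARY_PACKAGING = "http://purl.org/net/sword/package/Binary"


def build_sword_deposit(collection_uri, filename, payload, in_progress=True):
    """Build, but do not send, a SWORD v2 POST depositing one file into a collection."""
    headers = {
        "Content-Type": "application/octet-stream",
        "Content-Disposition": f"attachment; filename={filename}",
        "Packaging": BINARY_PACKAGING,
        # "true" tells the server that more content or metadata may follow
        # before the deposit is complete.
        "In-Progress": "true" if in_progress else "false",
    }
    return urllib.request.Request(collection_uri, data=payload, headers=headers, method="POST")


if __name__ == "__main__":
    # Hypothetical collection URI; actually sending the request would also
    # need credentials and a call such as urllib.request.urlopen(request).
    request = build_sword_deposit("https://repository.example.org/sword/collection",
                                  "readings.csv", b"time,value\n0,1.5\n")
    print(request.get_method(), request.full_url, request.header_items())
```

Because the transfer is a plain, standardised HTTP exchange, the same client can in principle target an EPrints, DSpace or DataBank collection, which is what makes the DataStage experiments described above possible.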

Slide 7 This is an example screenshot from the DataStage-EPrints experimental arrangement used by the JISC Kaptur project. It shows the Choose File/Upload button combination for uploading data, familiar to, for example, WordPress blog users. Uploaded data is then shown in a conventional file manager list.

Slide 8 Moving data from DataStage to the curated repository, again shown in the experimental Kaptur implementation, uses this surprisingly simple SWORD client interface. If this seems insufficient description for a curated item, presumably a more detailed SWORD client could be substituted.

Slide 9 One basis for building a more comprehensive description, or metadata, for research data is this 3-layer model produced by the Institutional Data Management Blueprint (IDMB) Project, the project that preceded DataPool at the University of Southampton. This is quite a general-purpose and flexible model, perhaps with more flexibility than meaning. Structurally, nevertheless, we will see that this has some relevance to repository deposit workflow design.

Slide 10 The 3-layer metadata model can be seen quite clearly in the emerging user interface for data deposit built on Sharepoint. Here we see the interface for collecting project descriptions, used once per project and then linked to each data record produced by the project.

Slide 11 In the same style, here is the Sharepoint user interface for collecting data descriptions. One of the most noticeable features within both the Project and Data forms is the small number of mandatory fields (indicated with a red asterisk): just one on each form. Mandatory fields have to be filled in for the form to submit successfully. Most people will have experienced these fields: when a Web shopping form is submitted with one left empty, it is invariably returned with a red warning. In this case you could feasibly submit a project or data description containing only a title. Aspects such as this are shortly to be subjected to user testing and review of this implementation.
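The pattern described for slides 10 and 11, a project description entered once and linked to each data record, with a single mandatory field per form, can be sketched as follows. The field names are hypothetical and are not taken from the Sharepoint implementation; the point is how little a record needs in order to pass validation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical field names; only "title" is mandatory, mirroring the single
# red-asterisk field on each form described above.
MANDATORY_FIELDS = ("title",)


@dataclass
class ProjectDescription:
    title: str
    funder: str = ""
    principal_investigator: str = ""


@dataclass
class DataDescription:
    title: str
    # Link to a project description entered once and reused by every data record.
    project: Optional[ProjectDescription] = None
    description: str = ""
    file_format: str = ""


def missing_mandatory(record) -> list:
    """Return names of mandatory fields left blank; an empty list means the form submits."""
    return [name for name in MANDATORY_FIELDS if not getattr(record, name).strip()]


if __name__ == "__main__":
    project = ProjectDescription(title="Hypothetical coastal survey")
    scans = DataDescription(title="Site A scans", project=project)
    print(missing_mandatory(project), missing_mandatory(scans))
```

Raising the bar for discoverability would mean adding names to `MANDATORY_FIELDS`, which is exactly the trade-off between simple deposit and useful metadata raised in the tweets above.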

Slide 12 Sharepoint has its detractors as an IT service platform, principally bemoaning its complexity-to-functionality ratio. Prof Simon Cox from Southampton University takes the opposite view passionately. This is an extract from his intervention at a DataPool Steering Group meeting (May 2012) putting the case for Sharepoint. It is a good way of understanding the wider strengths of Sharepoint, which may not be immediately apparent to users of particular Sharepoint services. Building the range of services suggested is a difficult and long-term project.

Slide 13 EPrints supports the deposit of many item types, including datasets since 2007. When you open a new deposit process in EPrints you will first be shown this screen, where you can select an item type such as ‘dataset’.

Slide 14 Selecting ‘dataset’ will take you to this next screen, which might look something like this from ePrints Soton, the Southampton Institutional Repository. This is not quite a default screen for standard EPrints installs; the workflow and fields have been customised in some areas by a repository developer.

Slide 15 EPrints users need not be restricted to standard interfaces or interfaces customised to a repository requirement. Interfaces in EPrints can be added to or amended by simply installing an app from the app store, or EPrints Bazaar. Unlike the Apple app store, with which it might optimistically be compared, EPrints apps are not selected to be installed by users but installation is authorised by repository managers. There are already two apps for those managers to choose to suit particular RDM workflow requirements: DataShare and Data Core. More data apps are expected to follow. EPrints is thus being engineered for flexibility in RDM deposit. In the following slides we will explore these first two data apps.

Slide 16 DataShare makes some minor modifications to the default EPrints workflow for deposit of datasets, highlighted with red circles here.

Slide 17 Data Core aims to implement a minimal ‘core’ metadata set for datasets. Implementing this app will overwrite the default EPrints workflow, replacing it with the minimal set, approximately half of which is shown here (the remainder in the next slide). In addition, we have a short description of the design aims for Data Core; no equivalent statement is available for Sharepoint data deposit or for the DataShare app.

Slide 18 Taking both slides showing the Data Core deposit workflow, this is comparable in extent with the Sharepoint ‘data’ interface shown earlier, although it has a few more mandatory fields.

Slide 19 Another example of an EPrints data deposit interface has been developed by Research Data @Essex at the University of Essex. Like Data Core, the Essex approach has explicit design objectives, based on aligning with other metadata initiatives to support multi-disciplinary data. In other words, this does not simply expand or reduce the default EPrints workflow for data deposit, but starts with a new perspective. We have been liaising with its development team to investigate the possibility of building this approach into an Essex EPrints app for other repositories to share.

Slide 20 Here is a section of the Essex workflow, highlighting one area of major difference with the default workflow. It shows fields for time- and geographic-based information.

Slide 21 We’ve looked at getting data into the repository, but not yet at how it is displayed as output, that is, as a data record from the repository. This is one example. It is not the most revealing record, but could be expanded.

Slide 22 Essex has cited specific design criteria for its research data repository. Additionally we have observed some characteristic features, indicated here. In particular, it is a data-only repository, without provision for the other item types offered by EPrints (shown in slide 13). The indication of mandatory fields adds a further layer of insight into the implementation of the design criteria.

Slide 23 So far in this presentation we have seen different implementations of data repository deposit interfaces, including DataFlow, Sharepoint, and multiple interfaces for EPrints. Where is this heading, and what are the common themes? Since we are exploring the difference between architecting and engineering these repositories, I was interested to see this national newspaper article about a major redevelopment of an area close to central London, Nine Elms, an area that interests me as I pass through it on a regular basis. Phrases that stand out refer to the relationship between the planned new high-rise buildings. What does this have to do with data repositories?

Slide 24 Interoperability is the relationship between repositories and how they interact with services, such as search, through shared metadata. If repositories have “nothing in particular to do with anything around them” or “show little interest in anything around” them, then they will not be interoperable. If repositories stand alone rather than interoperate then they become less effective at making their contents visible. Open access repositories have long recognised the importance of interoperability, being founded on the Open Archives Initiative (OAI) over a decade ago, and efforts to improve interoperability continue with current developments. Shown here are some current interoperability initiatives from one morning’s mailbox. Data repositories will be connected to this debate, but so far it has not been a priority in the examples we have considered here.
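As an illustration of interoperability ‘through shared metadata’, here is a minimal sketch of the harvesting side of the OAI Protocol for Metadata Harvesting (OAI-PMH): a service builds a `ListRecords` request using the shared Dublin Core format (`oai_dc`) and reads titles and types from the response. The base URL and the sample record are hypothetical, and a real harvester would also need to follow resumption tokens and handle errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and the unqualified Dublin Core (oai_dc) format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", oai_set=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if oai_set is not None:
        params["set"] = oai_set
    return base_url + "?" + urlencode(params)


def extract_records(xml_text):
    """Return (identifier, title, type) tuples from a ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        dc_type = record.findtext(".//dc:type", default="", namespaces=NS)
        results.append((identifier, title, dc_type))
    return results


# Hypothetical response from a data repository, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example survey dataset</dc:title>
          <dc:type>Dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

Here `list_records_url("https://repository.example.org/oai")` produces `https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc`, and `extract_records(SAMPLE)` returns `[("oai:repo.example.org:1", "Example survey dataset", "Dataset")]`. Because any compliant repository answers the same request in the same format, one harvester can aggregate many repositories; a data repository that does not expose such metadata stays invisible to services of this kind.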

Slide 25 One of the organisations listed on the previous slide, COAR, produced a report that outlines more comprehensively the scope of current interoperability initiatives for open access. While some solutions to the capture of research data seen here have reasonably been ‘architected’, that is, starting with a blank sheet to focus on the specific design needs of data deposit, these will need to catch up quickly with interoperability requirements, including most of those listed here. Data repositories ‘engineered’ on a platform such as EPrints, originally designed for other data types, do not obviously lack the flexibility to accommodate research data, and by virtue of having contributed to repository interoperability since the original OAI, already support most of the requirements shown here.

Slide 26 As for the DataPool Project, it will continue its dual approach of developing and testing both Sharepoint and EPrints apps. As a project it does not get to choose what is ultimately adopted to run the emerging research data repository at the University of Southampton. There are repository-specific factors that will determine that; but there are other organisational factors to take into account as well. Institutions seeking to build research data repositories that are clearly focussed on this range of factors are likely to have most success in implementing a repository to attract data deposit and usage.

This post has covered just one presentation, from DataPool, at RDMF9. The following two blog reports give a wider flavour of the event, the first exploring the architectural issues raised.

Julie Allinson, Some initial thoughts about RDM infrastructure @ York: “I’ll certainly carry on working up my architecture diagram, and will be drawing on the data coming out of our RDM interviews and survey to help flesh out the scenarios we need to support. But what I feel encouraged and even a little bit excited by is the comment by Kevin Ashley at the end of the RDMF9 event: that two years ago everyone was talking about the problem, and now people are coming up with solutions.”

Carlos Silva, RDMF9: Shaping the infrastructure, 14-15 November 2012: “Overall it was a good workshop which provided different points of view but at the same time made me realise that all the institutions are facing similar issues. IT departments will need to work more closely with other departments, and in particular the Library and Research Office in order to secure funding and make sustainable decisions about software.”

Nov 2 2012

Oh no, not another presentation!

Steve Hitchcock

Continuing, and concluding, our brief ‘oh no’ series of presentations by DataPool at the recent JISC MRD (#jiscmrd) programme update workshop held in Nottingham on 24-25 October.

Projects were invited to volunteer short 10-minute talks at the meeting to fit specified session themes. Given the tripartite approach of DataPool, shown in our ‘oh no’ poster, Wendy White chose to present Policy and Guidance on this occasion (noting that we will be covering the Data Repository aspects – the third tripartite element of the project’s work – at the forthcoming RDMF9 meeting).

Earlier in 2012 the University of Southampton approved a Research Data Management (RDM) policy (slide 2). Clearly it is not enough simply to announce a policy with far-reaching and long-term implications such as this. There has to be support for its implementation, and particularly for those it is aimed at, in this case the university’s researchers and producers of research data. The first step towards this is the RDM Web site (slide 3), with a collection of guidance and briefing notes on how to manage research data effectively, covering issues such as planning, description, sharing, access, storage, and more.

The presentation goes on to outline the principles that shape this guidance and its continuing development, and the contexts in which it is presented across the university.

In the #jiscmrd meeting as a whole there were so many presentations like this that it wasn’t possible for one person to attend them all. What you got, therefore, was a selective quickfire update on companion projects in the programme. Even if you couldn’t catch everything, you were certain to learn something.

Oh no, not another presentation! Why would we have thought that?

Oct 24 2012

Oh no, not another poster!

Steve Hitchcock

Back in the early days of the Web there were fears that content would lose value through ease of sharing, copying, and piracy. It was then suggested by John Perry Barlow, co-founder of the Electronic Frontier Foundation, a digital rights organisation, that value would instead accrue to services and performance. Since then we have seen, for example, the transformation of the economy of the music industry from recording to performance, and growth in performance art. The same idea underlies academic poster papers (minus the art in our case).

Posters are performance. Above is the DataPool poster for a meeting of the JISC Managing Research Data (MRD) programme. We can post it here without fear of diminishing its value (!) because the Web reader

  1. can’t appreciate the scale (although if you ‘View on Slideshare’ you can see a slightly larger full-screen version)
  2. doesn’t get the performance or the interaction

As you can see from the poster, even the version here, we’ve thrown everything at it from the DataPool project. While I tend to be fairly comfortable with narrative storytelling, I am less confident with visual storytelling, as you may, just, be able to tell. Among all the posters at the meeting, I wonder which aspect will win out and attract most viewers. That’s probably obvious – with posters the visual wins every time, but the key is turning that attention into dialogue and shared understanding.

The meeting at which the poster will be displayed, a mid-term progress workshop, is for JISC projects in the MRD programme and selected invitees. If you will be at the workshop on 24-25 October in Nottingham, we will see you by the DataPool poster where we will be on hand to explain the project’s progress, and our curious, although probably not unique, scatter art style.

Oh no, not another poster! Why would we have thought that?

Oct 23 2012

Datapool presents at SxSC Creative Digifest

Gareth Beale

I recently presented the Datapool project’s plans for 3D and imaging data management research at #SxSC2 Creative Digifest. The event (organised by the University of Southampton Digital Economy USRG) was held with the aim of better understanding the impact digital technologies have upon our lives. Participants from several institutions came together to talk about their work, but also to talk more generally about the impact of digital technology on communities and individuals. It was the perfect place to present, but also to reflect upon, our work with the Datapool project.

The 3D and imaging strands of the DataPool project, led by Steve Hitchcock and administered by Gareth Beale and Hembo Pagi respectively, aim to develop a better understanding of how 3D and imaging data are currently handled at the University of Southampton: how they are created, how they are shared, how they are archived, and what this means for research and research culture.

A diverse range of technical and theoretical work was presented at #SXSC 2. The presentations served to highlight the highly innovative nature of contemporary research on digital themes, but they also placed repeated emphasis upon the need to understand how the growth of digital technology is affecting the way we live, think and work.

This need to understand the implications of digital technologies and to work in ways which are not only creative but also sustainable represents one impetus behind the Datapool project. It was fantastic to see so many people talking about how we manage our digital lives and to consider how different strategies might lead us in very different directions. It was important for the Datapool project to be at the centre of this discussion. We are left considering how some of the themes raised at the conference may relate to our digital working practice throughout the University.

Two of the talks I found particularly interesting were given by Les Carr and Ramine Tinati on the Web Observatory. The idea that the web is sufficiently complex and poorly understood that it requires observation, as we might observe a complex natural phenomenon, is highly significant in thinking about relatively small scale data management on an institutional level. While we do not face many of the challenges faced by those seeking to understand the dynamics of an inherently social and dynamic global network, we must be aware that we are not simply looking at how people structure their files. As research culture becomes increasingly digital and connected, our data becomes socially significant. It will be very interesting to see, as we conduct our research, what the social landscape of Southampton’s 3D and imaging data looks like and whether as participants and observers we can develop a better understanding of the changes which are taking place.


Jul 27 2012

Demystifying Research Data: don’t be scared be prepared

Dorothy Byatt

I thoroughly enjoyed this JIBS/RLUK joint event focussing on how research data may impact on the role and remit of librarians in the not too distant future, if it hasn’t already. It was good to be able to hear how others are approaching issues associated with research data, as well as to get an opportunity to tell them about the work we have done so far as part of the DataPool project.

17 July 2012

@jiscdatapool DataPool presenting today at JIBS/RLUK #JIBSUK Demystifying Research Data Tweet record of talk follows from those there

@nataliafay Now up Dorothy Byatt – Uni Soton on JISC DataPool – Building capacity, developing skills & supporting researchers #JIBSUK

@bindonlane Dorothy Byatt (Univ Southampton) now introducing JISC DataPool project; Website at:  #JIBSUK #jiscmrd

@nataliafay DB highlights how DataPool project will work to embed management of research data into research life cycle <– KEY #JIBSUK

@bindonlane Byatt: Soton IDMB project provided a business model for data storage; assumptions have been revisited in DataPool #JIBSUK

@samanthahalf Eprints data share app just launched to help with research data metadata and dissemination  #JIBSUK

@samanthahalf DataPool service model diagram: … #JIBSUK

@samanthahalf Introducing research data article from Southampton #JIBSUK

@nataliafay Soton’s excellent web pages articulate research support services incl. research data mgmt … #JIBSUK

@samanthahalf This is a great event, everyone go to the next one! #JIBSUK

19 July 2012

@jiscdatapool Report on #JIBSUK Demystifying Research Data  NB DataPool 18mth not 3y project, unless inc preceding Soton IDMB project

@jiscdatapool More on #JIBSUK Demystifying Research Data – “sandwiches were lovely and the carrot cake was particularly good”

Mar 29 2012

DataPool: presented, tweeted, blogged

Steve Hitchcock

Computer Applications and Quantitative Methods in Archaeology (CAA) 2012 conference, hosted by the Archaeological Computing Research Group in the Faculty of Humanities at the University of Southampton on 26-30 March 2012

How do you give a conference presentation when your laptop with the presentation on it dies 1 hour before the presentation? You tweet it.

Graeme Earl is co-investigator with the DataPool Project. He is also a senior lecturer in archaeology at the University of Southampton and organiser of the Computer Applications and Quantitative Methods in Archaeology 2012 (CAA2012) conference being held in Southampton this week (26-30 March). So this is an especially busy time for Graeme, yet he still wanted to give a presentation on DataPool to his own research community.

Why choose the novel means of presenting via Twitter? Graeme explains: “I decided at lunchtime that I would give the paper via twitter, and upload slides as an accompaniment. An hour before the paper my laptop died catastrophically and, irony of ironies, my presentation materials were on my laptop rather than on a network location. So I assembled the presentation as links in 30 minutes and then delivered it.”

In case Graeme hasn’t time to blog his presentation as well, we’ll do it for him. Twitter is intended to be an immediate service so retrieval can get harder over time. You may be able to find the original tweets by searching for Graeme’s username or for the hashtags he used. To avoid repetition these have been removed from the tweets and are copied immediately below. There is also some brief annotation of links between tweets to assist readers. Otherwise, tweets are as Graeme’s originals. For reference, the presentation was given around 5 pm on Wednesday 28th March.

@GraemeEarl #caasoton #datapool #jisc

> Starting my tweeted paper on #datapool now #caasoton

> Managing Research Data

Report on Developing Institutional Research Data Management Policies, a JISC Managing Research Data (MRD) Programme meeting held in Leeds on 12-13 March.


This blog.

> Research data management infrastructure

Research data management infrastructure projects (RDMI), Web page on the first phase of the JISC MRD programme.


JISC project page for IDMB: Institutional data management blueprint, predecessor project to DataPool.

> Creating a system – sharepoint, repository, metadata

DataPool poster paper, on Graeme’s Slideshare account.

> Rolling out a policy – ratified, embedded, implemented

> Producing examples – discipline, re-use case studies, domains e.g. imaging

> Developing skills – training staff and students; ‘help desk’

> Sharepoint infrastructure provides data access and collaboration

> University deep storage repository + connection to others e.g. via SWORD2

> ADS SWORD ARM project

JISC project page for SWORD-ARM: SWORD & Archaeological Research data Management.

> ADS page for SWORD ARM facilitating deposit from outside to ADS repository


> Middle layer of metadata management – initially project/sub-project/item hierarchy

> Publication – push to and pull from external repositories e.g. ADS; policy implications for this?

> Provide external access to cache and deep storage versions

> Demonstration repository; trialling with

Portus Project, Digital Humanities, University of Southampton.

> Presented at Soton Research and Enterprise Advisory Group (REAG)

A project for the research life cycle? DataPool blog post, 8 March 2012.

> Ratification by Soton senate; included user guides also clarify uncertainties

> Defining core focus areas e.g. USRG Imaging

Computationally Intensive Imaging, University Strategic Research Groups (USRGs), University of Southampton.

> Building network of experts and interested people

Data system, policy, training: putting people first, DataPool blog post, December 8th, 2011.

> Defining internal dissemination mechanisms e.g. USRG DE

Digital Economy USRG, University of Southampton.

> data management plans presented to other JISC projects

Data management plans (DMPs): the day has arrived, DataPool blog post, 22 March 2012.

> Details of meeting disciplinary challenges in research data management planning workshop

Agenda for JISC workshop on Meeting (Disciplinary) Challenges in Research Data Management Planning held in London on 23 March.

> Finished. Taking questions.

> @PatHadley thankfully I had a helper to advance them for me!

That’s it: presented, tweeted, now blogged.