Mapping RDM requirements for the next stage of data repositories
DataPool and the University of Southampton have been investigating the use of EPrints and SharePoint to extend the capabilities of repositories for research data management (RDM). Others, notably the Universities of Lincoln and Bristol, have been looking at CKAN, a data portal platform from the Open Knowledge Foundation, and together with JISC organised a ‘sold out’ meeting on CKAN for Research Data Management in an Academic Setting (18 February 2013).
The principal output of the meeting is a set of CKAN RDM requirements (a Google Doc spreadsheet), produced by workgroups organised around different stakeholder positions and involving all participants at the meeting. Delete the term ‘CKAN’ from the title of this spreadsheet and you have a series of RDM requirements that define the space in which all repository platforms seeking to support RDM will be challenged to engage. In other words, while adapting deposit workflow is a start, it is not sufficient. Dropbox, the elephant in the room that went unmentioned for at least the first hour of the meeting, stands as the model that illustrates some of these challenges, but this workshop has now set out many more requirements.
At Lincoln, Joss Winn explained, they have an EPrints publications repository and are developing a CKAN data approach to “create a record of CKAN data in EPrints, thereby joining research outputs with research data” through a SWORD2 implementation.
Is this a path to get rid of EPrints at Lincoln, to accommodate CKAN? No, Joss said, quite definitely, but then effectively questioned his own answer: if starting now, would we start from here, i.e. a combination of two software platforms? The implication is that over time, possibly years, the definite answer could change. The challenge is on.
Addendum: For more detail on proceedings at the workshop see Patrick McCann’s report for DCC, and a view from a presenter, Simon Price of data.bris.
Thanks for the write-up, Steve. From our point of view, there is no conflict between CKAN and EPrints. True, they are both content management systems and could be forced to do the work of the other, but in our workflow it makes sense to use both systems for what they are good at, and this is made possible by the use of rich APIs and standards-based metadata exchange protocols. If there is a challenge in the future, I don’t think it is CKAN vs EPrints, but most likely the continuing challenge of meeting diverse disciplinary RDM requirements with very few existing tools. We are at the dawn of an emerging area of information management and much (very much) remains to be done by the RDM and open data communities to keep up with and meet that challenge. I do enjoy it though 😉