DANS has launched its own data journal, which consists of two categories of publications: data papers and data guides. The data papers document individual data sets, the data guides introduce data collections and discuss their characteristics. The data journal is an enhanced publication in more than one respect: the journal texts are enhanced with direct links to data sets stored in the repository of DANS. In addition to this text-data coupling the journal is enriched with various features that contribute to greater usability of the content in terms of overview and navigation; they provide contextualization by adding background information and support various forms of visualization. Where possible, data can be previewed and explored online rather than through downloading and offline applications. In this paper the major reasons for data publication are summed up and the relationship of the journal’s design with enhanced publications in general is discussed. Because enhancing a publication is a cost factor, particular attention is paid to its added-value and legitimate reasons.
Data Archiving and Networked Services (DANS) is a research data archive in the Netherlands, which mainly serves the heterogeneous field of social sciences and humanities (including archaeology), as well as health sciences. It provides access to thousands of scientific data sets, e-publications and other research information. Improving data depositing, discovery and description requires constant attention. For that purpose DANS decided to start its own data journal to enhance data visibility, to provide more extensive documentation and to stimulate peer review of deposited data. A major objective is to stimulate researchers to make their data available by providing an attractive platform for their output and by giving credit for data publication. The journal is a spin-off of CLIO-DAP, a project aimed at establishing a data availability policy in economic and social history and funded by CLARIAH, a large national research infrastructure program for the arts and the humanities.
To make data sets widely reusable, researchers need to know how and in what context (with what research aims and questions in mind) data were produced and what quality checks were performed. Thoroughly documenting research data is hard work that is often poorly rewarded. As a result, valuable data sets may not be documented in sufficient detail to permit adequate reuse or even to reveal their relevance to researchers who are less familiar with the subject matter [Wallis 2013]. Data journals address this need by publishing data papers, which describe data sets and the conditions of their collection against the background of the research projects in which they were created and used. Some form of peer reviewing is usually part of the publication process. An essential property is that such a data publication can be formally cited [Katz & Strasser 2014].
The Research Data Journal is an enhanced publication, a concept that usually refers to a research article linked to the underlying data, optionally enhanced with interactive facilities and multimedia which makes reading of the text and validation of the conclusions more convenient. Well-known examples are Elsevier’s ‘Article of the Future’ and its various derived formats [Aalbersberg 2012; Article of the Future 2013, 2014]. The implementation in data papers takes this concept a step further, to the level of data documentation. The enhanced data paper model that forms the core of the Research Data Journal is data-centered with novel elements, largely taken from the “usability repertoire” (in a broad sense) of regular enhanced publications. They are aimed at enhancing the reading experience and at easier data assessment, thus offering a better service to the heterogeneous user groups mentioned, which differ considerably in scientific background, breadth and depth of interests. In this paper we shall discuss how the presentation principles of enhanced publication are applied to data papers and to the data journal as a container, and what issues and challenges we have encountered.
2. Features of Enhanced Publications
Models of enhanced publications differ particularly in their data structure, i.e. the way in which different parts of the content are linked or embedded. On the one hand, there is the category of enhanced publications as packages of rather loosely connected content parts linked through metadata and/or hyperlinks. On the other end of the spectrum one finds compound documents, which integrate the different elements as much as possible into a single user interface [Breure 2011]. Data papers themselves can be regarded as enhanced publications of the first category because of their structure consisting of a narrative part with references to a data set [Bardi & Manghi 2014]. Content management systems form the technical infrastructure of the journal that contains the enhanced publications and add additional features such as (automatically generated) links to related articles, viewer plug-ins for multimedia content and 3D-models, highlighting keywords by topic, et cetera.
Improving readability and understanding is an important goal, and it is the first point discussed by Aalbersberg et al. with regard to the ‘Article of the Future’: Elsevier introduced a left pane for navigation and quick browsing through the images and tables, applied best practices from good page design (like optimal left and right margin sizes and a maximum number of words per line) and added a separate right pane for contextual information to keep the core article clear and clean [Aalbersberg 2012]. In our survey of enhanced publications, which dates from about two years ago, we found a variety of features which can be grouped into four major categories:
- Overview and navigation: features like navigation components and schematic summaries that help the reader to determine the importance of an article by glancing at key passages and the most relevant images.
- Contextualization, i.e. adding background information to the publication, which may relate to the author, the project, or the concepts and terminology used. Other forms of contextualization are adding references to similar and related articles and user-driven tagging and annotation, which position the publication in the research community.
- Visualization is an important feature, which comes in a variety of forms. A distinction can be made between data visualization and visual material about the research objects. It comprises image galleries, interactive images, infographics, rotatable 3D-models and video clips of experimental procedures.
- Exploration of data, which partly relies on visual components like interactive statistical charts and interactive maps, but is also realized by online spreadsheets and database interfaces or executable documents, like the IPython notebook and the executable paper (e.g. the Collage system, the winner of Elsevier’s Executable Paper Grand Challenge 2011 [Nowakowski 2011]).
3. The Added-Value of Enhancing Publications
Thinking critically one may take the position that these enhancements are a certain form of luxury; what really counts are quality of research, solid arguments and reliability of the data and the availability to others. Indeed, when only experts communicate with each other and exchange their results, they may not need them. Their cognitive frames overlap, they share the same background knowledge and can easily separate primary and secondary issues in their common research field (this explains perhaps, why some data journals consist of very brief papers). They will probably appreciate time-saving facilities of quick overview and handy navigation components like those in the ‘Article of the Future’, while scholarly multimedia are easily reduced to the status of “unacademic”, as Burgess and Hamming have observed in the humanities. “Content is the essence of analysis, while form is merely the ‘matter’ out of which it is made” [Burgess & Hamming 2011]. Scholarly publishing, as a broad generalization, “has taken only baby steps” in the direction of engaging an audience in a multimodal way. It has moved from printed publications to online publications, from physical delivery to online delivery, but without benefitting yet from the many opportunities available through new media [Anderson-Wilk & Hino 2011].
Looking through literature and websites, one must admit the ‘what’ of enhanced publications is better explained than the ‘why’. In so far as enhancement by linking-to-data is concerned, the benefits are self-evident, but the aspect of form needs some accounting, also because the packaging of content is a cost factor. The common denominator of the meager explicit motivations for enhancing publications boils down to arguments referring to the rapidly growing information society, the necessity of linking information and advancements in communication. Visual culture has already developed close links with scholarly research. Going more to the periphery of scholarly communication, to the blogs, social media and popular presentations of research, where rules are few and credits come through visibility, we see an obvious connection with the multimedia trend in society [Breure 2013]. However, many multimedia products end up as videos on YouTube or in a separate web section of the journal’s website, creating a headache for publishers with regard to management of material and level of reviewing [Carpenter 2010].
What is often missing is integration: interactive images and maps, video, and interactive models substantially connected with or embedded in the text, as for example realized with the 3D models in Utopia documents (http://utopiadocs.com/) and the interactive multimedia in Apple’s proprietary EPUB3 implementation iBooks [Breure 2014]. Multimedia components are then treated as “first-class citizens”, on an equal footing with the verbal discourse, while the distinctive capability and functionality of each medium are fully exploited. But to be productive in such multimodal communication requires a certain visual literacy, which is – generally speaking – still a missing element in academic training. Unfortunately, most of us are still held hostage by the technology of the past, which enforced descriptions while we can now also show what we tell [Breure 2011]. Although a multimedia specialist will be of great help, an author must learn to think not only in terms of text but also in terms of additional material such as images underpinning or illustrating what he wants to put across. He has to extend his rhetorical repertoire of textual constructions with visual representations in order to master the full range of new expressive possibilities [Barish & Daley 2009].
Regardless of these objections and hurdles, there are a few sound reasons for enhanced (data) publications, which vary depending on the nature of the research work and envisaged audience:
1. Handling data complexity
There is a growing pressure from the research field itself to use visualizations for data and to include them in publications. Visualization is essential to understand big data sets. It is helpful if the visual data components allow exploration, filtering, querying and ‘what if’-scenarios. Astrophysicists would like to be able to link research results to object and image databases, and earth scientists feel that publishing large datasets, which underwrite many scientific papers in their field, would increase transparency and move the field forward [Physical sciences 2011]. However, with the data deluge, a bottleneck is emerging in the research life cycle, which is the result of the fact that the costs of visualization technologies are not decreasing as quickly as the costs of generating data. The extra effort of making data understandable is consuming considerable resources that could be used for many other purposes [Fox & Hendler 2011, Neugebauer 2012]. Especially in social media research the need for interactive exploration in connection with publications is increasingly strongly felt. The outcomes of ‘big data’ social media research often necessarily include complex data visualizations on multiple dimensions [Bruns 2013].
2. Visualizing material aspects
Disciplines that focus on the structure of research objects, spatial situations or special procedures want to publish visual models and authoritative examples. The Journal of Visualized Experiments (JoVE) is the world's first peer-reviewed scientific video journal, devoted to helping researchers overcome two of the biggest challenges facing the scientific research community today: poor reproducibility, and the time- and labor-intensive nature of learning new experimental techniques.
In medical science progress has been strongly linked up with visualization techniques, for example X-ray photography, MRI scans and computed tomography [Smelik 2010]. Medical researchers feel handicapped by being restricted to two dimensions and by severe limitations in the distribution of moving images and sound through medical publications. They want to avoid the unattractive separation of the actual publication from supporting multimedia data, which may contain crucial information and want to integrate multimedia and text files into a single article [Ziegler 2011].
The same applies mutatis mutandis to other fields – the examples are serendipitous. In periodontology (the specialty of dentistry that studies supporting structures of teeth, as well as diseases and conditions that affect them) a new online-only journal was created to demonstrate how cases are managed clinically. Step-by-step photographs or short videos are included to demonstrate findings and/or management of the case or clinical situation [Kornman & Reddy 2011].
The Journal of Physical Chemistry Letters (JPCL) uses so-called ‘perspective videos’ in which authors present their work to readers. In addition, the journal has recently introduced a web-based platform that allows authors to design a scientific presentation based on the results published in JPCL, in which audio narration is integrated into PowerPoint slides [PChem 2014].
Another example from chemistry is Jmol, an open-source Java viewer for chemical structures in 3D with features for chemicals, crystals, materials and biomolecules. It is capable of animation, the display of vibrations, surfaces, orbitals, measurements of distance and angle, and more. Jmol allows the reader to view a protein or crystal structure from different perspectives and to save the structure data to their own computer or forward it to colleagues [Neugebauer 2012].
Archaeology in particular is a discipline where a need for enhanced publications may be expected because of its focus on object and space. A few years ago a survey on user needs was conducted, which has been used as a basis for implementations of dynamic content in the Journal of Archaeology in the Low Countries (JALC), 2009 [Adema 2010]. To date there is a range of online archaeology journals that come with more or less technically sophisticated image galleries (for example the enhanced articles in Oxford Journal of Archaeology). Archaeology journals published by Elsevier allow embedded video and attachment of 3D-models; however, most imagery is in the form of a large number of 2D-pictures.
3. Usability
Last but not least there are usability arguments in line with the already mentioned motivation of the ‘Article of the Future’. In 2010 the International Council for Scientific and Technical Information (ICSTI) held a workshop on interactive innovations in scientific research publications. In one of the sessions the results of an experiment on an interactive journal article against a conventional counterpart were presented [MacMahon 2010a]. Although of limited generality, this example makes the issue more concrete: “An article from the journal Urology was selected, and an enhanced version prepared that contained both user-invoked features and presentational improvements. Members of a group of 51 medical students were randomly assigned the conventional (‘control’) or the enhanced (‘experimental’) article, and their knowledge gain on reading the article tested by pre and post-experiment questionnaires. The students were also asked to rate their acceptance of the article format using Likert-scale psychometric questions. The results of the experiment revealed several unexpected findings. The first finding was that the dependent measure of knowledge acquisition showed no difference overall between the control and experimental groups. There was, however, significant gain on the content accessible directly through user-invoked interactive features. Statistically significant variations were correlated with student year and gender (second-year students performed best with both types of article; female students benefited more from the enhanced article), and the acceptance was greater for the experimental article” [MacMahon 2010b].
One tentative conclusion from these different arguments is that enhancements make the content of scientific publications more operable for users by easing the transition from reading to actual understanding. This works on different levels: attracting attention, a better and more concrete communication of the research object, active involvement of the reader, controlling and reducing complexity in understanding the outcome of research, positive affective appraisal of the subject and accommodation of individual learning styles, which can even make reading an engaging experience. The latter may sound like a luxury, but may not be, given the still rapidly growing stream of scholarly information. More and more articles are read per year, but less time is spent per article. Within the last 30 years the number of articles read nearly doubled, whereas the time spent per article nearly halved [Kunert 2012], which justifies some extra service in knowledge presentation.
4. An Enhanced Data Journal
DANS concentrates on the humanities and the social sciences; the data journal is directly connected with the data archive and covers fields such as history, archaeology, language and literature in particular. It is divided into sections containing data papers, each describing individual data sets in the context of the research project in which they were created and used. Data sets are formally reviewed on deposit; data publication is also intended to stimulate more thorough peer reviews, which will also be included in the journal.
The structure of data papers is rather conventional:
- an introduction with background and context;
- discussion of the research problem;
- methods of data gathering and analysis;
- a description of the data set with persistent identifiers linked to the deposited data in the archive (where more extensive technical documentation is stored);
- concluding remarks and
- references to literature.
This division may slightly vary according to the type of data or project. Substantive conclusions as in a research paper are not expected. To make navigation straightforward, a data paper is implemented as an introduction page with tabs below for the other sections (see figure 1). This makes it easy for users to go immediately to the tab page that raises their interest.
Figure 1 - Structure of a data paper.
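The section layout described above can be sketched as a simple record type. This is only an illustration under stated assumptions, not the journal's actual implementation: the class name `DataPaper`, its field names, the tab labels and the idea of resolving identifiers through `https://doi.org/` are all hypothetical, and the identifier in the usage lines is an invented placeholder.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataPaper:
    """Hypothetical model of a data paper: one field per section listed above."""
    title: str
    introduction: str              # background and context (the landing page)
    research_problem: str
    methods: str                   # data gathering and analysis
    dataset_description: str
    dataset_pids: List[str]        # persistent identifiers of the deposited data
    concluding_remarks: str
    references: List[str] = field(default_factory=list)

    def tabs(self) -> List[str]:
        """Labels of the tab pages shown below the introduction page."""
        return [
            "Research problem",
            "Methods",
            "Data set",
            "Concluding remarks",
            "References",
        ]

    def dataset_links(self) -> List[str]:
        """Turn each identifier into a resolvable link (assumes DOI-style identifiers)."""
        return [f"https://doi.org/{pid}" for pid in self.dataset_pids]


# Usage with invented placeholder values.
paper = DataPaper(
    title="Example data paper",
    introduction="Background and context.",
    research_problem="What the project set out to answer.",
    methods="How the data were gathered and analysed.",
    dataset_description="What the data set contains.",
    dataset_pids=["10.1234/example-dataset"],
    concluding_remarks="Remarks on reuse.",
)
print(paper.tabs())
print(paper.dataset_links())
```

Keeping the identifiers as a separate list rather than embedding links in the description text reflects the text-data coupling: the narrative can change while the link to the archive stays stable.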
In addition, there is a separate section for data guides, which document specific data collections and the way they were digitized (e.g. census data). The structure of such a data guide depends on the nature of the data collection. Currently, a data guide for Dutch census data 1795-2001 is implemented, which contains introductions on the characteristics of the historical sources and on the coding of key variables such as occupational titles and municipalities, taking into account the changing geographical coverage and evolution of place names in the course of time. This general part is followed by commentaries on individual censuses, which are linked to the scans of the historical sources and the processed data, which are stored as spreadsheets in the archive.
An important aim in all journal sections is showcasing the data, i.e. drawing attention to the importance of the research behind the data set and making it easy for users to evaluate its potential relevance for their own work. This serves multiple interests: it makes a wide audience aware of the wealth of archived data and at the same time it provides visibility and credits (and thus rewards) to the depositors. DANS has committed itself to a data availability policy, i.e. the requirement to deposit the underlying data before publication of an article. Although the idea itself is widely welcomed, many journals, particularly in the humanities, are reluctant to actually support it. A data journal published by an archive does not take away all reservations in this respect, but it can operate from a different perspective than established scientific journals.
All four categories of features of enhanced publications have been used in the data journal:
1. Overview and navigation
The data journal is implemented in Joomla!; like all major content management systems this CMS comes with built-in facilities for automatic creation of a table of contents, indexing and searching on word level, categorization and tagging of articles, navigation aids and various plugins for multimedia, information sharing through social media and other content-related tasks such as adding comments to an article, embedding Google maps and content rating.
In addition, Joomla! has a number of technical advantages. It has the second largest share in the CMS market and stands midway between its competitors WordPress and Drupal when it comes to flexibility. Moreover, from version 3.x onwards it incorporates the HTML5 framework Twitter Bootstrap, which provides a high quality layout and makes its templates fully responsive (i.e. easy reading and navigation with a minimum of resizing, panning, and scrolling across a wide range of devices, from mobile phones to wide desktop monitors). An interesting side effect is that the implementation can benefit from extra features of Twitter Bootstrap itself, such as text columns inside a Joomla! article and expandable text. The latter mechanism has been used, for example, to create layers in the data guide: a core text contains the essentials, while detailed information that may be expected to be relevant at a later stage is hidden until the user clicks on a link.
Figure 2 - Data guide example of a census in Joomla!. 'Lees verder...' indicates expandable text.
The entries of the census’ table of contents are linked to data sets in the data archive (spreadsheets, scans, pdfs).
The sidebar provides visual contextualization through contemporary illustrations of aspects covered by the census.
2. Contextualization
Contextualization is primarily a matter of authoring guidelines: texts are expected to be written with a wide audience in mind. Specific concepts, terms, names of persons and places are linked to reference works to allow easy look-up. Users can find similar and related articles through the built-in tagging and search mechanisms. Where appropriate, illustrations are used in a functional way: to connect the reader with the physical world in which the research object was situated, to give an impression of the atmosphere of a historical period or to make abstract concepts more concrete.
3. Visualization
Because of the heterogeneity of the field that the data archive has to serve, both types of visualization discussed earlier (that of data and research objects) are highly relevant to the data journal. Humanities and social sciences rely on data visualization through charts, maps and graphs, while archaeology data sets contain a large number of object representations (drawings, photographs of artefacts and soil profiles, etc.). We encourage authors to make a selection of the most significant visuals. Most of them will not have the skills and/or time for multimedia enhancements; at least they will need some assistance with adding interactivity, creating image galleries, additional maps, word clouds or infographics. Therefore, DANS will offer a helpdesk function and technical editorial assistance (outsourced to a small company that specializes in this field).
Figure 3 - Preview of a video interview with victims of World War II. Transcription fragments are combined with stills and presented as an image gallery.
4. Exploration of data
The archive’s default data usage procedure requires logging in, downloading the data set (tabular data often as spreadsheets or in CSV format), and uploading it into a suitable application. This can be time-consuming. It is acceptable if the user is sure that he is actually going to use the data, but quite cumbersome in an orientation stage if the researcher is reading a data paper; then browsing and quick exploration are wanted. For that reason advanced preview facilities embedded in or closely linked to the data paper are more useful. In case of tabular data this can be implemented by means of linked online spreadsheets, which allow manipulation of the data without saving any changes. We have used documents in Google Drive for demo purposes, but in the long term an in-house solution is desired.
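The kind of quick, read-only preview meant here can be illustrated with a few lines of Python. This is a minimal sketch under our own assumptions, not the facility used in the journal: the function name `preview_table` and the sample values are invented, and it uses only the standard `csv` module on text that has already been fetched.

```python
import csv
import io
from typing import Dict, List, Tuple


def preview_table(csv_text: str, max_rows: int = 5) -> Tuple[List[str], List[Dict[str, str]], int]:
    """Return the column names, the first `max_rows` records and the total row count.

    The input text is only read, never modified, so the preview cannot
    alter the deposited data set.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    header = list(reader.fieldnames or [])
    return header, rows[:max_rows], len(rows)


# Invented sample data standing in for a deposited CSV file.
sample = (
    "municipality,year,population\n"
    "A,1795,100\n"
    "B,1795,200\n"
    "C,1795,300\n"
)

columns, first_rows, total = preview_table(sample, max_rows=2)
print(columns)
for row in first_rows:
    print(row)
print(f"{total} rows in total")
```

Showing only the header and a handful of records is usually enough for the orientation stage; the full download remains available for users who decide to work with the data.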
DANS also holds quite a lot of Microsoft Access databases, which can only be fully used with Microsoft Office applications. Because they concern either relatively small data collections or samples of larger data sets, it is feasible to publish the content as database reports in PDF format, which makes them easy to explore. As mentioned already, visual material can be previewed by means of interactive image galleries. A quick scan of multimedia sources is a bit more complicated. Audiotapes and videos can be shortened to clips and combined with fragments of their transcriptions, if available (figure 3). However, the downside of all this is the overhead cost of technical editing.
The Research Data Journal is intended to showcase data sets for a heterogeneous community of data users with diverse expertise and different levels of interest. In the data archive the description of data sets is short and mostly limited to a listing of metadata. The journal offers a more attractive platform for providing access to data sets, which rewards depositors through credits for their data papers and may win over more scholars to deposit the yield of their work. It is an enhanced publication in different respects. First of all, by linking data and paper both of them get enriched. In addition, the journal itself is based on an enhanced design, which accommodates a diversity of user requirements. All three major reasons for enhancement mentioned (the handling of data complexity, visualization of material aspects, and usability) apply to the presentation of the data archive’s content.
The actual value of the new format should be investigated by evaluating its success after a start-up period of several years. The acceptance by the readership and the supply of new data papers will be decisive factors. DANS already uses a system of customer reviews of downloaded data sets, which can be extended to include the journal as well. Additional information will be obtained from usage statistics by logging the journal’s website and from expert surveys [Vaulo 2013]. It is evident that an enhanced data journal comes with some extra costs, which have to be balanced against the benefits of a modern, more intuitive and straightforward access to the intellectual capital stored in the data archive.
[Aalbersberg 2012]: Aalbersberg, IJ.J., Heeman, F., Koers, H. & Zudilova-Seinstra, E., Elsevier’s Article of the Future: enhancing the user experience and integrating data through applications, Insights: the UKSG journal 25:1 (2012) 33-43; doi: 10.1629/2048-7754.25.1.33.
[Adema 2010]: Adema, J., JALC User Needs: External Evaluation Report. Work Package 8, SURFshare project 2009 – Enriched publications in Dutch Archaeology, Leiden 2010.
[Anderson-Wilk & Hino 2011]: Anderson-Wilk, M. & Hino, J., Achieving rigor and relevance in online multimedia publishing, First Monday 16:12 (2011).
[Article of the Future 2013]: The Article of the Future is now live! Have you experienced it? (2013).
[Article of the Future 2014]: Article of the future.
[Bardi & Manghi 2014]: Bardi, A. & Manghi, P., Enhanced Publications: Data Models and Information Systems, Liber Quarterly 23:4 (2014).
[Barish & Daley 2009]: Barish, S. & Daley, E. Multimedia Scholarship for the Twenty-First Century.
[Breure 2011]: Breure, L., Voorbij, H. & Hoogerwerf, M., Rich Internet Publications: "Show What You Tell", Journal of Digital Information, North America, 12 (2011).
[Breure 2013]: Breure, L., Visual Culture and Scholarly Communication.
[Breure 2014]: Breure, L., Hoogerwerf, M. & Horik, R. van, Xpos’re: A Tool for Rich Internet Publications, Digital Humanities Quarterly 8:2 (2014).
[Bruns 2013]: Bruns, A., Faster than the speed of print: Reconciling ‘big data’ social media analysis and academic scholarship, First Monday 18:10 (2013).
[Burgess & Hamming 2011]: Burgess, H.J. & Hamming, J., New Media in the Academy: Labor and the Production of Knowledge in Scholarly Multimedia, Digital Humanities Quarterly 5:3 (2011).
[Carpenter 2010]: Carpenter, T., Standards Column - Journal Article Supplementary Materials: A Pandora’s Box of Issues Needing Best Practices, Against the Grain 21:6 (2009/2010) 84-85.
[Fox & Hendler 2011]: Fox, P. & Hendler, J., Changing the Equation on Scientific Data Visualization, Science 331 (2011) 705-708.
[Katz & Strasser 2014]: Kratz, J. & Strasser, C., Data publication consensus and controversies, F1000research (2014).
[Kornman & Reddy 2011]: Kornman, K.S. & Reddy, M.S., The Future of Scholarly Journals, Journal of Periodontology 82: 5 (2011) 657-658.
[Kunert 2012]: Kunert, R. How Long Should A Scientific Publication Be? (blog).
[MacMahon 2010a]: McMahon, B., Interactive Publications and the Record of Science (report), International Union of Crystallography, data-related meetings 2010.
[MacMahon 2010b]: McMahon, B., Interactive Publications and the Record of Science (paper): Information Services & Use 30 (2010) 1-16.
[Neugebauer 2012]: Neugebauer, T., A Report from the 2011 ICSTI Workshop on Multimedia and Visualization Innovations for Science, D-Lib Magazine 18:1/2 (2012).
[Nowakowski 2011]: Nowakowski, P., Ciepiela, E., Harężlak, D., Kocot, J., Kasztelnik, M., Bartyński, T., Meizner, J., Dyk, G. & Malawski, M., The Collage Authoring Environment, Procedia Computer Science 4 (2011) 608-617, doi: 10.1016/j.procs.2011.04.064.
[PChem 2014]: The Increasing Impact of Multimedia and Social Media in Scientific Publications (editorial), The Journal of Physical Chemistry Letters 5 (2014) 233-234.
[Physical sciences 2011]: Collaborative yet independent: Information practices in the physical sciences, 2011.
[Smelik 2010]: Smelik, A. (ed.) The Scientific Imaginary in Visual Culture [V&R unipress GmbH], Göttingen 2010.
[Vaulo 2013]: Vaulo, A., Evaluating journals, in: ToR - Toolbox of Research.
[Wallis 2013]: Wallis, J.C., Rolando, E. & Borgman, C.L., If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology, PLOS One 8:7 (2013): doi: 10.1371/journal.pone.0067332.
[Ziegler 2011]: Ziegler, A., Mietchen, D., Faber, C., Hausen, W. von, Schöbel, C., Sellerer, M. & Ziegler, A., Effectively incorporating selected multimedia content into medical publications, BMC Medicine 9:17 (2011), doi:10.1186/1741-7015-9-17.