The Availability of Source Data and Statistics: An Improvement in Good Publication Practice

The purpose of this paper is to highlight the aspects of good publication practices, with particular reference to data analysis, and to propose an innovative initiative for improving the quality of scientific information in this field.

Several committees within the scientific community provide information and publish guidelines in order to support scientists in the application of good publication practices and to improve quality in medical research. Those guidelines suggest that the possibility of verifying the source data warrants the reliability of the published results by reducing the occurrence of misconduct related to data analysis.

The initiative proposed in this article is aimed at making the source data and the statistical reports available to the scientific community together with the actual paper. Such a practice is undoubtedly an improvement in the quality of publication permitting verification of the results as well as allowing for further elaboration of the same data.



Nowadays, scientific information plays a fundamental role, since the knowledge and application of scientific results have very important consequences for our society. It is superfluous to give examples, since the effects of advances in many diverse scientific fields (from biotechnology to communication, from technology to computer science, etc.) are extremely evident. This is especially true of clinical studies, which have an immediate impact on everyday clinical practice; one example is the case of simvastatin in the prevention of acute myocardial infarction [1, 2].

The quality of experimental studies, both in how they are carried out and in how their results are published, is crucial to correct scientific information, and its importance is constantly increasing within the scientific community, particularly within the biomedical community. Failing to apply good publication practice criteria, or applying them incorrectly, leads to misconduct whose primary effect can be summarized as "causing others to regard as true that which is not true". The impact of this in the scientific field is devastating.

The following considerations are aimed at highlighting the aspects of good publication practice, with particular reference to data analysis, and at proposing an innovative initiative in order to improve the quality of scientific information in this field.

Good Publication Practice

In the last few years, several committees have been founded within the scientific community to deal with the problem of quality in scientific communication. Among these are COPE: Committee on Publication Ethics; CSE: Council of Science Editors, formerly the CBE: Council of Biology Editors; EASE: European Association of Science Editors; SSP: Society for Scholarly Publishing; and WAME: World Association of Medical Editors. At the same time, other committees were founded with the specific objective of dealing with quality in medical research. Among them are ASSERT: A Standard for the Scientific and Ethical Review of Trials; CHA: Center for Health Affairs of the Project Health Opportunities for People Everywhere (HOPE); CIOMS: Council for International Organizations of Medical Sciences; CONSORT: Consolidated Standards of Reporting Trials; ICH: International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use; and ORI: US Office of Research Integrity.

One of the tasks of these committees is to provide information and publish guidelines to help scientists apply good publication practices. Many aspects must be taken into account in relation to the quality of scientific information. They involve many different disciplines, as well as people with different roles. Among these people, authors and publishers are certainly the most important, while others may have an interest in the scientific publication process even if they are not directly involved in it. Some aspects, such as peer review, concern both editors and authors, since reviewing is done by people who themselves produce scientific information and are therefore authors in their turn. Other aspects concern authors more specifically and relate to problems of various kinds: methodological problems (e.g. correctness of the experimental design, data analysis), ethical problems, etc. There are also aspects that may concern everyone involved in the publication process, e.g. conflicts of interest, plagiarism and redundant publication. Finally, some aspects concern people not directly involved in the publication process, such as journalists (i.e. media relations). All these aspects have been extensively analyzed and discussed for many years, and several reports and guidelines can be found on the websites of the committees listed above. COPE guidelines [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20] and summaries of the WAME report [21, 22, 23, 24, 25, 26] are also available in the biomedical literature.

As far as the preparation of manuscripts submitted to biomedical journals is concerned, the problem was first addressed by a small group of editors of general medical journals who met informally in Vancouver, British Columbia, in 1978 and established guidelines for the format of manuscripts submitted to their journals. The group became known as the Vancouver Group, and its requirements for manuscripts were first published in 1979. The Vancouver Group expanded and evolved into the International Committee of Medical Journal Editors (ICMJE), which meets annually; it has gradually broadened its concerns and has produced the Uniform Requirements for Manuscripts Submitted to Biomedical Journals. Further details on the Uniform Requirements are reported in the Appendix.

The Uniform Requirements also report recommendations about the statistical aspects of manuscripts. The most significant recommendation is included in the initial sentence of the "Statistics" section, where the following is stated: "Describe statistical methods with enough detail to enable a knowledgeable reader with access to the source data to verify the reported results". This statement summarizes an essential concept: the possibility of verifying the source data should warrant the reliability of both the data and the analysis performed; therefore, it should warrant the reliability of the results that were obtained.

Misconduct Related to Data Analysis

Fabrication (invention of data or cases), falsification (willful distortion of data), failure to admit that some data are missing, and ignoring outliers without declaring them constitute the main forms of research misconduct related to statistics and data analysis [27, 28]. This misconduct is so serious that Buyse et al. [29] used the term "fraud" specifically to refer to data fabrication and falsification, and its consequences for the quality of scientific information are evident. Biostatistics plays an important role here: biostatisticians should be involved in preventing fraud (as well as unintentional errors), in detecting it, and in quantifying its impact on the outcome of the research, particularly in clinical trials. In particular, the guidelines for clinical trials [30, 31, 32] indicate that a biostatistician should be involved in the protocol at all stages, from design and analysis to review; moreover, to avoid misconduct, it is advisable to include an independent biostatistician in the Data Monitoring Committee.

The examples mentioned so far should be considered very serious misconduct, but there are many other forms that can result from the incorrect choice and/or application of data analysis methods. Some of the most frequent forms of misconduct related to the analysis and presentation of data are the following:

  • Treating missing data as zeros;

  • Inconsistently applying different types of statistical methods (i.e. both parametric and non-parametric methods) to the same set of data;

  • Applying parametric methods to data with evidently non-normal distributions;

  • Failing to apply specific methods in the presence of multiple comparisons;

  • Misinterpreting the results of statistical analysis, particularly when they were obtained using sophisticated and uncommon methods;

  • Omitting exact P values, i.e. reporting only whether a significance level was reached (e.g. P<0.05).
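To make the first item in the list concrete, the following Python sketch contrasts a mean computed with missing values silently replaced by zeros against a complete-case mean that also reports how many values were missing. The measurements are hypothetical and not taken from any cited study. Complete-case analysis is itself valid only under assumptions about why the data are missing; it is used here simply as the transparent alternative, not as a recommended missing-data method.

```python
import statistics

# Hypothetical measurements for one group; None marks a value that was never recorded.
measurements = [4.2, 3.9, None, 4.5, None, 4.1, 4.4]


def mean_missing_as_zero(values):
    """The flawed practice: each missing value is silently counted as 0."""
    return statistics.mean(0.0 if v is None else v for v in values)


def mean_complete_cases(values):
    """Mean of the observed values, returned with the number of missing values."""
    observed = [v for v in values if v is not None]
    return statistics.mean(observed), len(values) - len(observed)


if __name__ == "__main__":
    biased = mean_missing_as_zero(measurements)
    honest, n_missing = mean_complete_cases(measurements)
    print(f"missing treated as zeros: {biased:.2f}")
    print(f"complete cases: {honest:.2f} ({n_missing} values missing, declared)")
```

With these values the first mean is about 3.01 and the second 4.22: the two invented zeros pull the estimate down by more than a full unit, and nothing in the first figure tells the reader that data were missing.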

The most important negative effect of this misconduct is the overestimation (as generally happens) or the underestimation of the statistical significance of the study's findings, compared with what the correct techniques would actually have shown. Another important negative effect is reduced comparability of results across studies: some studies might report results that differ from those of other studies only because the data were analyzed using incorrect methods.
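For the multiple-comparisons case, the overestimation described above is easy to quantify. If k independent tests are each run at significance level α and every null hypothesis is true, the probability of at least one spurious "significant" result is 1 - (1 - α)^k, about 0.40 for ten tests at α = 0.05. The Python sketch below computes this and applies the Holm step-down adjustment, which is one standard correction among several (Bonferroni and Hochberg are others). The P values in the example are hypothetical.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one false positive among k independent tests when all nulls are true."""
    return 1.0 - (1.0 - alpha) ** k


def holm_adjust(p_values):
    """Holm step-down adjusted P values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the rank-th smallest P by (m - rank), cap at 1, and keep the
        # sequence non-decreasing so the adjusted values respect the original ordering.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    print(f"FWER for 10 tests at 0.05: {familywise_error_rate(0.05, 10):.3f}")
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical unadjusted P values
    for p, p_adj in zip(raw, holm_adjust(raw)):
        print(f"P = {p:.3f}  Holm-adjusted P = {p_adj:.3f}")
```

All four raw P values are below 0.05, but after adjustment (0.03, 0.06, 0.06, 0.02) only the first and last remain below that level. Reporting the exact adjusted values, rather than just "P<0.05", lets readers see how close each result is to the threshold.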

In some cases, this misconduct may arise in good faith, from the scientist's lack of knowledge and/or of adequate statistical tools; in other cases, it may be the consequence of a deliberate choice by a researcher who prefers to report the results of whichever analysis yields the most significant findings. In the latter case, misconduct is the direct consequence of choosing a statistical method on the basis of its results instead of choosing it at the outset (i.e. during protocol preparation, as good statistical practice requires). Finally, the last form of misconduct in the list above might also be interpreted as an attempt to mask the fact that the statistical analysis was not actually carried out.
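The effect of choosing a method on the basis of its results can be illustrated with a small simulation. In the Python sketch below, every simulated sample comes from a population in which the null hypothesis is true. A hypothetical "analyst" runs two defensible tests (a one-sample z test, assuming a known standard deviation of 1, and an exact sign test) and keeps whichever P value is smaller. The choice of tests, sample size and number of replicates are illustrative assumptions. Because the selected P value can never exceed the prespecified one, the selective strategy can never produce fewer false "significant" findings, and in practice it produces more.

```python
import math
import random


def z_test_p(sample):
    """Two-sided P value of a one-sample z test of mean 0, assuming a known SD of 1."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))


def sign_test_p(sample):
    """Exact two-sided sign test of median 0 (ties have probability 0 for continuous data)."""
    n = len(sample)
    k = sum(1 for x in sample if x > 0)
    tail = min(k, n - k)
    p_one_sided = sum(math.comb(n, j) for j in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_one_sided)


def false_positive_rates(n=20, reps=20000, alpha=0.05, seed=1):
    """Rejection rates under a true null: prespecified z test vs. 'best of two tests'."""
    rng = random.Random(seed)
    z_hits = chosen_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # the null hypothesis is true
        p_z = z_test_p(sample)
        p_best = min(p_z, sign_test_p(sample))  # keep whichever analysis "worked"
        z_hits += int(p_z < alpha)
        chosen_hits += int(p_best < alpha)
    return z_hits / reps, chosen_hits / reps


if __name__ == "__main__":
    z_rate, chosen_rate = false_positive_rates()
    print(f"prespecified z test: {z_rate:.3f}, best of two tests: {chosen_rate:.3f}")
```

The prespecified test rejects in roughly 5% of samples, as intended, while the "best of two" strategy rejects more often even though no real effect exists. Fixing the method in the protocol removes this source of inflation.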

To avoid this misconduct, the initial statement of the statistical recommendations in the Uniform Requirements (i.e. that the reported results should be verifiable) becomes of fundamental importance. Unquestionably, there is a problem with the practical applicability of this recommendation: within a scientific paper, the space available for describing statistical methods is generally not sufficient to include all the details needed to make the analysis completely reproducible. Moreover, direct verification of the results reported in a paper would require that a reader, or an external observer, have access to the source data; today this possibility is, for the most part, only theoretical and, especially for publications in print format, not easily realized.

A Contribution of JOP to the Quality of Scientific Information

Innovative journals, such as the one in which this article is published, can contribute greatly to the quality of scientific information, especially in relation to the problems described here. The most recent innovations in the creation and dissemination of scientific information, together with the new possibilities offered by electronic publishing technology, encourage initiatives to improve the quality of information as far as statistical aspects are concerned.

Innovative copyright policies, under which the intellectual property of scientific contributions remains with the author, and the electronic format of publication are fundamental to the open exchange of information in scientific communication. The first example of this is the initiative undertaken by JOP whereby, starting with the current issue, authors can publish their scientific papers and, at the same time, make their source data, statistical analysis reports and any other materials they judge important available to the scientific community, so that their results can be better understood.

Availability of Source Data and Statistics

The first article meeting the above criteria is the paper by Pezzilli et al. published in the current issue of JOP [33]; the original database and the results of the statistical analysis are freely available by means of a link in the body of the hypertext.

When a ‘traditional’ copyright policy is applied, such an initiative might raise new questions about the ownership of the data. That is not the case for journals such as JOP where, as already mentioned, the intellectual property remains with the author, who retains the rights to conventional publication and to all additional materials about the study that he/she wishes to make available to the scientific community.

The implications of this initiative, in terms of improvement in many aspects of the quality of information and its dissemination, are very important and manifold. In relation to misconduct, the problem of the verification of the analysis performed is completely resolved, since the data and the results of the analysis are made fully available; anyone equipped with the same statistical package used by the original author can reproduce exactly the same results. In this way, much of the misconduct described in this paper can be avoided or, at least, verified and discussed by the scientific community as appropriate.
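The kind of check that shared data make possible can be sketched in a few lines. The Python example below assumes, hypothetically, that the shared database is a CSV file with a group column and a measurement column, and that the paper reports the sample size and mean of each group rounded to one decimal place. The file contents and the "reported" figures are invented for illustration. The code recomputes the summary statistics from the data and lists any reported value that cannot be reproduced within rounding.

```python
import csv
import io
import statistics

# Hypothetical contents of a shared source-data file (in practice, read from the linked file).
SHARED_CSV = """patient_id,group,value
1,A,12.1
2,A,12.5
3,A,11.9
4,A,12.7
5,B,14.0
6,B,13.6
7,B,14.4
8,B,14.0
"""

# Hypothetical summary statistics as printed in a paper (means rounded to 1 decimal).
REPORTED = {"A": {"n": 4, "mean": 12.3}, "B": {"n": 4, "mean": 14.1}}


def recompute(csv_text):
    """Group the measurements and recompute n and mean for each group."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row["group"], []).append(float(row["value"]))
    return {g: {"n": len(v), "mean": statistics.mean(v)} for g, v in groups.items()}


def discrepancies(reported, recomputed, decimals=1):
    """List reported values that the shared data do not reproduce within rounding."""
    tolerance = 0.5 * 10 ** -decimals + 1e-9  # half a unit in the last reported digit
    problems = []
    for group, stats in reported.items():
        if group not in recomputed:
            problems.append(f"{group}: group absent from shared data")
            continue
        for key, value in stats.items():
            got = recomputed[group][key]
            if abs(got - value) > tolerance:
                problems.append(f"{group} {key}: reported {value}, recomputed {got:.3f}")
    return problems


if __name__ == "__main__":
    for line in discrepancies(REPORTED, recompute(SHARED_CSV)) or ["all reported values reproduced"]:
        print(line)
```

Here group B's reported mean of 14.1 is flagged because the data give 14.0, while group A is reproduced. Simple statistics like these should agree across software. For more complex models, results can depend on the package and its default settings, so documenting the software used remains important.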

A second aspect related to the quality of the information, certainly even more important than the previous one, is directly linked to scientific knowledge: as a matter of fact, other authors have the possibility of performing further analysis on the same data. One example might be to carry out the same evaluations published in the original paper, yet applying different methodologies, or published data could be used to test new hypotheses, beyond those taken into account by the original authors. In this way, much more information can be obtained from the same data.

Evaluation of one scientist’s proprietary data by other authors raises new questions. Among them, at least two concerning ethics in scientific communication are fundamental:

  • Must another author obtain the permission of the original author, who owns the intellectual property of the data, in order to analyze them?

  • How can we safeguard the intellectual property of the original author?

In my opinion, the answer to the first question can only be no. Indeed, scientific information is a heritage belonging to the community and, as such, it must be freely accessible.

Since data are a fundamental component of a scientific paper, they must be freely accessible and, given their intrinsic function, saying that data must be freely accessible is the same as saying that they must be open to free evaluation.

Within the limits of the dissemination of scientific results, the answer to the second question is quite evident: an author who publishes results obtained from a new analysis of data owned by another author must clearly identify the origin and ownership of the data and cite the original article. The case in which commercial benefits might be obtained from the analysis of data owned by someone else requires a discussion that is beyond the scope of this article, since it specifically concerns the field of ethics; a ‘shared’ ownership (the original author owns the data and the scientist who performed the new analysis owns the ‘idea’) nonetheless seems easy to apply.


In conclusion, I believe that making the source data available, together with all other materials relevant to verifying the published results, can contribute greatly to improving the quality of scientific information. Such an initiative is possible thanks to the opportunities offered by the new era of electronic publishing, the same opportunities that have made many other important initiatives possible in the last few years, particularly since an organized discussion of the problems of scientific information began to develop.

Finally, I hope that this initiative will be well received by JOP authors, as well as by the authors and editors of other scientific journals, so that its application brings the greatest possible benefit to the quality of scientific information.