On problematic data in Barbalho's experiments
Peer review is typically regarded as the critical appraisal and quality control that happens before publication. However, peer review is an ongoing process, where the research community levels its critical analysis at all points, including after publications make it into the ivory tower. Letters to the editor that re-examine published work (and, in some cases, call for retraction) make up the post-publication side of peer review.

In the linked material below, Andrew Vigotsky et al. detail their points of concern and reasons for skepticism about several studies from Barbalho et al.'s lab. This 'white paper' just hit the stands; it's a pre-release en route to formal publication. Here's the executive summary:

1) The studies by Barbalho et al. have extremely homogeneous baseline strength levels compared to the rest of the literature. In particular, we observed homogeneity up to ∼7.5 z-score units below what would be expected given the mean value. This homogeneity was not just extreme across one study or variable; rather, homogeneity was present across many studies, and many variables within each study. Simultaneous homogeneity across many variables is improbable. Finally, homogeneity was also present for variables that could not have been measured at baseline (muscle thickness and change scores). Therefore, biased sampling alone cannot explain this degree of homogeneity.

2) The effect sizes observed are both large and homogeneous. From a magnitude perspective, effect sizes for strength increases in the studies by Barbalho et al. were up to 13.5 z-score units greater than those in the rest of the resistance training literature. From a signal-to-noise perspective, multiple signal-to-noise effect sizes were undefined since the responses were perfectly homogeneous (i.e., standard deviation of change scores equal to zero). Excluding the perfectly homogeneous effects, the signal-to-noise effect sizes for strength increases reported by Barbalho et al. were up to 34 z-score units greater than those in the rest of the resistance training literature. While standardized effect sizes tend to scale with percent increases in strength in the literature, they do not in the studies by Barbalho et al.

3) The men’s and women’s volume studies are remarkably similar in terms of their observed effects and correlation structures. This is despite both studies being independent, and each study being randomized. These across-study consistencies yield P < 1 × 10⁻⁶ when we would in fact expect the null hypothesis to be true due to randomization. In addition, there is structure in raw data that is inconsistent with randomization (again, P < 1 × 10⁻⁶). Other patterns in the raw data, such as twice the number of even as odd numbers, were also noted; this holds even after removing the strength data.

4) In the single- vs. multi-joint vs. single+multi-joint studies, the effects observed in the multi-joint group nearly perfectly match those in the single+multi-joint group. This holds across studies.

5) Several patterns exist in the raw data, including “runs” of numbers and strength values for one exercise being exactly 8 kg more than those for another exercise (for the entire sample).

6) Squat strength increases in the recent squat versus hip thrust and single versus multi-joint papers are far beyond what would be expected for trained women of similar strength to those in the study. Even women who did not squat increased their squat strength at a rate of more than 2 z-score units above powerlifters who specifically train the movement. In those who did squat, z-scores of over 5 were observed.

7) In the elderly study, 98% of the sample lost weight from a resistance training intervention alone; no dietary intervention was implemented. This is in contrast to what is known about the role of exercise in weight loss and in contrast to other studies. This study also contained methodological inconsistencies, such as large imbalances in group size despite using block randomization.

8) We provide a statistical rationale for why the observed baseline homogeneities are not likely to stem from biased sampling; namely, because one would need to screen too many people.
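To make the even-versus-odd observation in item 3 more concrete, here is a minimal sketch of how such an imbalance could be checked with an exact two-sided binomial test. It assumes that even and odd values would be equally likely in honestly recorded data, which is a simplification and not necessarily the test the white paper's authors used. The function names and the counts in the usage note are hypothetical and are not taken from the studies.

```python
from math import comb


def even_odd_counts(values):
    """Return (number of even values, number of odd values) for integers."""
    evens = sum(1 for v in values if v % 2 == 0)
    return evens, len(values) - evens


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k 'successes' in n trials with p = 0.5.

    ASSUMPTION: even and odd values are equally likely under the null.
    Because the null distribution is symmetric around n / 2, the p-value
    is the probability of a count at least as far from n / 2 as k is.
    """
    distance = abs(k - n / 2)
    extreme = sum(
        comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= distance
    )
    return extreme / 2 ** n


if __name__ == "__main__":
    # Hypothetical counts chosen only to mimic "twice as many even as odd".
    evens, odds = 200, 100
    p_value = binomial_two_sided_p(evens, evens + odds)
    print(f"even={evens}, odd={odds}, two-sided p={p_value:.3g}")
```

With real data, `even_odd_counts` would be applied to the reported integer values first, and the resulting even count passed to `binomial_two_sided_p`. A very small p-value only says that the fair-split assumption fits poorly; it does not by itself explain why.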