No Victoria, the study didn’t “prove” anything

Don’t get me wrong, I love the “cult of science” that has become so popular. There are tons of people who are gaining a new appreciation of science as a field. The popularity of the revamped Cosmos and the gigantic layperson backlash against anti-vaxxers are great things.

What I am leery of is the claim, and the layperson perception, that various things have been deemed “proven” by scientists (note: I’m not sure this perception is anything new).

One recent example is an article about gluten I saw linked on Buzzfeed called “Science Proves Gluten Sensitivity Isn’t Real, People Are Just Whiners”. Now, I’m not expecting Buzzfeed to be the pinnacle of scientific accuracy, but let’s face it, that article probably got 1,000 times the readers the actual journal article got (found here, if you’re interested), and that might be a conservative estimate. The reality of the matter is that gluten sensitivity has not been “proven” to be only in a person’s head. This is a single article that used frequentist statistics to evaluate non-celiac gluten sensitivity.

One of the limitations of frequentist statistics is that our conclusions are framed in terms of probabilities rather than certainties. In the case of the gluten article, the authors were using null hypothesis testing to evaluate the effects of a particular treatment. What they found is that they did not have statistically significant p-values in their repeated-measures ANOVAs and paired t-tests. Because they did not find a statistically significant p-value, they “failed to reject the null hypothesis”, which basically means that they did not have sufficient statistical evidence to deem that the null hypothesis (that there is no effect) was unlikely to be correct.

This boils down to “we didn’t have enough evidence to suggest there is an effect”. Keep in mind that this does not mean that “there was no effect”.

You might think of this as the authors did not show support for the treatment having an effect but also did not show that there was no effect. Pretty inconclusive, right?
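To make this concrete, here is a minimal sketch in Python of a paired t-test like the ones the authors ran, and of how a non-significant p-value gets read as “failing to reject the null” rather than as evidence of no effect. The symptom scores are made up for illustration; this is not the study’s data or analysis.

```python
# Minimal sketch (hypothetical data, not the study's): a paired t-test on
# made-up symptom scores for the same subjects under two conditions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_subjects = 37                                            # same sample size as the gluten study
placebo = rng.normal(5.0, 2.0, n_subjects)                 # symptom score on placebo (invented)
gluten = placebo + rng.normal(0.0, 2.0, n_subjects)        # score on gluten, no true effect built in

t_stat, p_value = stats.ttest_rel(gluten, placebo)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# If p >= 0.05 we "fail to reject" the null hypothesis of no difference.
# That is NOT the same as showing the difference is zero -- the test simply
# could not rule out "no effect" given this sample.
if p_value >= 0.05:
    print("Failed to reject the null: insufficient evidence of an effect.")
else:
    print("Rejected the null: evidence of an effect at the 0.05 level.")
```

Run it a few times with different seeds and you will see the p-value bounce around, which is part of why a single non-significant result is so inconclusive.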

In any particular study, we do our best as researchers to obtain a sample of subjects (the group tested in the study) that does a good job of representing the population of interest (basically, everybody who falls within a specific group of characteristics). Despite our best efforts, we are limited by who we can actually get, who wants to participate, our geographic location, etc. Because of these limitations, we don’t necessarily get a sample that truly represents the population we would like to make inferences about.

One of the benefits of having other research groups repeat a study is that they can access slightly different people. Hopefully, once many different groups have tried to answer a particular question, we have a pretty good idea of how the cards fall for a greater number of subjects. Ideally, this improves our overall ability to make a correct inference about the entire population of interest.
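As a toy illustration (the numbers below are invented, not from any real study), the snippet draws several study-sized samples from the same simulated population. The sample means bounce around the true mean, which is exactly why a single small sample may not represent the population you care about.

```python
# Toy illustration of sampling variability with hypothetical numbers:
# small samples from the same population give noticeably different estimates.
import numpy as np

rng = np.random.default_rng(1)

population = rng.normal(100.0, 15.0, 100_000)        # the "true" population of interest
print(f"population mean: {population.mean():.1f}")

for i in range(5):
    sample = rng.choice(population, size=37, replace=False)   # one study-sized sample
    print(f"sample {i + 1} mean: {sample.mean():.1f}")
```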

This is why meta-analyses, where a researcher combines findings from lots of different studies, are so important. One study isn’t enough to be sure you can correctly infer about the whole population. The gluten study looked at 37 subjects. Are we absolutely certain that those people represent the entire “gluten sensitive” population? I would argue no, not even close, especially with how trendy gluten is.
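For a sense of what a meta-analysis is doing under the hood, here is a hedged sketch of fixed-effect, inverse-variance pooling across several hypothetical studies: each study’s effect estimate is weighted by how precise it is, and the pooled estimate is steadier than any single study’s. The effect sizes and standard errors are invented for illustration.

```python
# Sketch of a fixed-effect meta-analysis: pool per-study effect estimates,
# weighting each by the inverse of its variance. All numbers are hypothetical.
import numpy as np

effects = np.array([0.30, -0.05, 0.12, 0.20])       # per-study effect sizes (invented)
std_errors = np.array([0.20, 0.15, 0.10, 0.25])     # per-study standard errors (invented)

weights = 1.0 / std_errors**2                        # inverse-variance weights
pooled_effect = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

print(f"pooled effect: {pooled_effect:.2f} +/- {1.96 * pooled_se:.2f} (95% CI half-width)")
```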

Anyway, what you are probably thinking now is that this scenario is seriously muddy due to issues with statistics and subject sampling (keep in mind this is the tip of the iceberg for lack of clarity). That’s because the situation really is muddy! With this in mind, the statement that this study “proved” anything about gluten sensitivity feels a whole lot less correct.

Slogging through research findings.

Now, admittedly gluten sensitivity is not something I can claim much knowledge about, but the example is illustrative of issues we certainly run into within strength and conditioning. It is really easy to read a single study and conclude that it “proves” or “disproves” a certain method or point, especially when it is one of our pet theories. We have to keep in mind that no one study can “prove” anything. It is not until there are multiple studies on a given topic that we can be really sure about it.

Even when there are large bodies of research on a topic, there is often still controversy (try these on for size: peripheral vs central fatigue, or daily undulating vs block periodization).

What we need to keep in mind is that we should wait for consensus before adopting something as canon. Waiting for a string of studies with similar findings makes it a whole lot easier to be confident that what you find represents reality.

A major issue, however, is that a real consensus is often a long wait. S&C coaches should try to be as up to date with the research as humanly possible so they can understand where a consensus really exists. When a consensus has not been reached, they can experiment a bit on themselves or their athletes to find what works best. This lets them take the next logical steps beyond what the research states. Just don’t tell anyone that science, or your experiment, “proved” your program is the best 🙂 .