Hypothesis Formulation

Science proceeds by observation, hypothesis formulation, hypothesis testing and, on the basis of the test, accepting or rejecting your hypothesis. If your hypothesis is accepted then you (or others) will attempt to replicate your results; if you reject your hypothesis then you will have to refine or modify it in some way.

It is therefore useful to always have a clear idea of your hypothesis, which is essentially the question you are asking, as it will greatly ease your interpretation of the results.

The Alternative Hypothesis

The alternative hypothesis is simply the question you are asking. If you are carrying out a genetic population-based case-control study then your hypothesis is, in a broad sense, that carrying a particular genotype (or allele) increases your susceptibility to developing a particular disease. The formulation of this hypothesis has nothing to do with the statistics which are to be employed in testing it, and should be derived from your prior biological/clinical knowledge about the disease. For example, you may know that a particular gene is involved in a metabolic pathway which is disrupted in individuals who have a disease, which gives you an a priori reason for testing the genes involved in that pathway for association with the disease. Alternatively, you may be trying to replicate an association which has already been demonstrated.

A few examples from genetics

If you are carrying out a parametric linkage analysis then you must specify a disease model (in terms of allele frequencies, number of liability classes and the genotypic penetrances, i.e. the mode of inheritance) and a recombination fraction between the marker and the hypothesised disease locus. The results of the tests are summarised by LOD scores, which compare the likelihood of your data under the specified disease model with the likelihood under the null hypothesis of no linkage to the locus.
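To make the LOD score a little less mysterious, here is a minimal sketch in Python of a two-point LOD score. It rests on a strong simplifying assumption that is not part of the text above: every meiosis is phase-known and fully informative, so you can simply count recombinants. A real parametric analysis also uses the penetrances and allele frequencies of your disease model, which this sketch ignores. The function name and the counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score at recombination fraction theta.

    Simplifying assumption: every meiosis is phase-known and fully
    informative, so the likelihood is binomial in the recombinant count.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")

    non_recombinants = meioses - recombinants

    # log10 likelihood of the data if marker and disease locus are linked
    # at recombination fraction theta.
    if theta == 0.0:
        if recombinants > 0:
            # Any recombinant is impossible under complete linkage.
            return float("-inf")
        log10_linked = 0.0
    else:
        log10_linked = (recombinants * math.log10(theta)
                        + non_recombinants * math.log10(1.0 - theta))

    # log10 likelihood under the null hypothesis of no linkage (theta = 0.5).
    log10_unlinked = meioses * math.log10(0.5)

    return log10_linked - log10_unlinked


# Hypothetical data: 1 recombinant observed in 10 informative meioses.
example_lod = lod_score(recombinants=1, meioses=10, theta=0.1)
print(f"LOD at theta = 0.1: {example_lod:.2f}")
```

With these made-up counts the LOD score is about 1.6, meaning the data are roughly 40 times more likely under linkage at a recombination fraction of 0.1 than under no linkage. At a recombination fraction of 0.5 the two likelihoods are identical and the LOD score is zero.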

On the other hand, non-parametric linkage analysis does not require the specification of a disease model (hence the name non-parametric); it assesses evidence for linkage by means of deviations from the expected allele sharing frequencies for a pair of related individuals. The results are often summarised by means of an MLS (Maximum LOD Score) for simplicity (as human geneticists have this strange affinity for statistics that no one else can understand; really it's just to confuse everybody else), but this does not tell you anything about the mode of inheritance in this instance. Rather, it tells you whether there is significant deviation from the expected allele sharing. Nor do non-parametric linkage tests tell you anything about which allele at a given locus is associated with disease (you need to perform tests of association in order to investigate this).
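The idea of "deviation from expected allele sharing" can be illustrated with affected sib pairs. Under no linkage, a pair of siblings is expected to share 0, 1 or 2 alleles identical by descent (IBD) with probabilities 1/4, 1/2 and 1/4. The Python sketch below is a plain chi-square goodness-of-fit test against those proportions. It is not the MLS statistic itself, it assumes the IBD status of every pair is known without ambiguity, and the function name and counts are hypothetical.

```python
import math

# Expected proportions of sib pairs sharing 0, 1 or 2 alleles IBD
# under the null hypothesis of no linkage.
EXPECTED_IBD_PROPORTIONS = (0.25, 0.50, 0.25)


def ibd_sharing_test(observed_counts):
    """Chi-square goodness-of-fit test of IBD sharing counts (2 df).

    observed_counts: numbers of affected sib pairs sharing 0, 1 and 2
    alleles IBD, in that order.
    Returns (chi_square, p_value).
    """
    if len(observed_counts) != 3:
        raise ValueError("expected three counts: IBD = 0, 1, 2")
    total = sum(observed_counts)
    if total <= 0:
        raise ValueError("need at least one sib pair")

    chi_square = 0.0
    for observed, proportion in zip(observed_counts, EXPECTED_IBD_PROPORTIONS):
        expected = total * proportion
        chi_square += (observed - expected) ** 2 / expected

    # With 2 degrees of freedom the chi-square tail probability
    # has the closed form exp(-x / 2).
    p_value = math.exp(-chi_square / 2.0)
    return chi_square, p_value


# Hypothetical data: 100 affected sib pairs sharing 0, 1, 2 alleles IBD.
sharing_chi2, sharing_p = ibd_sharing_test((15, 50, 35))
print(f"chi-square = {sharing_chi2:.2f}, p = {sharing_p:.4f}")
```

Note that a small p-value here only says the sharing pattern differs from what you would expect by chance. It says nothing about the mode of inheritance or about which allele is involved.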

The Null Hypothesis

The null hypothesis is the exact opposite of the hypothesis you have formulated for testing. This sounds like a very simple explanation, and that's because it is. You should have a clear idea of the question you are asking before you start analysing your data. In the examples discussed so far, the null hypothesis in a population-based association study is that there is no difference between the frequency of genotypes in the cases and controls; if you fail to detect a significant association with a given genotype then you have not been able to reject it. In parametric linkage, if the LOD score is not significant then the likelihood of your disease model is no better than that of the null hypothesis.
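As a concrete illustration of the case-control null hypothesis, the following Python sketch compares genotype counts in cases and controls with a Pearson chi-square test on a 2 x 3 table. This is only one of several tests you could use, and the function name, the genotype labels and the counts are all hypothetical.

```python
import math


def genotype_chi_square(cases, controls):
    """Pearson chi-square test on a 2 x 3 table of genotype counts.

    cases, controls: counts of the three genotypes (e.g. AA, Aa, aa),
    given in the same order for both groups.
    Returns (chi_square, p_value) with 2 degrees of freedom.
    """
    if len(cases) != 3 or len(controls) != 3:
        raise ValueError("expected three genotype counts per group")

    n_cases = sum(cases)
    n_controls = sum(controls)
    if n_cases == 0 or n_controls == 0:
        raise ValueError("each group needs at least one individual")
    n_total = n_cases + n_controls

    chi_square = 0.0
    for case_count, control_count in zip(cases, controls):
        genotype_total = case_count + control_count
        if genotype_total == 0:
            raise ValueError("a genotype with no observations cannot be tested")
        for observed, group_total in ((case_count, n_cases),
                                      (control_count, n_controls)):
            # Expected count if genotype frequencies are the same
            # in cases and controls (the null hypothesis).
            expected = group_total * genotype_total / n_total
            chi_square += (observed - expected) ** 2 / expected

    # (2 - 1) * (3 - 1) = 2 degrees of freedom, for which the
    # chi-square tail probability is exactly exp(-x / 2).
    p_value = math.exp(-chi_square / 2.0)
    return chi_square, p_value


# Hypothetical genotype counts (AA, Aa, aa) in 100 cases and 100 controls.
assoc_chi2, assoc_p = genotype_chi_square((30, 50, 20), (50, 40, 10))
print(f"chi-square = {assoc_chi2:.2f}, p = {assoc_p:.4f}")
```

If the p-value falls below the level you chose before looking at the data, you reject the null hypothesis of equal genotype frequencies; otherwise you have failed to reject it.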

Accepting or Rejecting your hypothesis

When carrying out a statistical test the results will be summarised by some sort of statistic: p-values, LOD scores, NPL statistics, etc. You should have decided in advance of performing the test at what level you are going to accept or reject your hypothesis, so the interpretation of the results should be straightforward. If you do not understand the statistic that is being reported then you should ask for help in understanding it, but the interpretation of that statistic is down to the person who formulated the hypothesis.


Last modified: Tue Jan 27 11:17:13 GMT 2004