Statistical Advice
NHSGGC staff can contact Dr David Young, Consultant Statistician for the NHS. Alternatively, contact the Robertson Centre for Biostatistics at the University of Glasgow, which offers an initial half-hour consultation free of charge.
Statistical Packages
R is a free statistical package; contact the IT helpdesk to have it installed on your device. Microstrategy is an alternative; contact Jonathan Todd, Health of Information Management, eHealth to discuss. NHSGGC does not have an institutional subscription to SPSS; departments should contact Procurement if they wish to purchase a subscription.
Numbers needed to treat / Numbers needed to harm
Statistical vs Clinical Significance
Links: http://www.bandolier.org.uk/booth/glossary/ARR.html
Incidence of new edema after randomization was 31.40 % in test group and 46.51 % in control group [p = 0.03; absolute risk reduction (ARR) = 15.1 %; Number Needed to Treat (NNT) = 7, ITT analysis].
Absolute risk measures the size of a risk in a person or group of people. This could be the risk of developing a disease or symptom over a certain period, or it could be a measure of the effect of a treatment. It is calculated by dividing the number of events in the group by the number of people in that group. Absolute risk reduction is calculated as the absolute risk figure in the control group minus the absolute risk figure in the treatment group.
Relative risk is also known as risk ratio and compares a risk in two different groups of people (e.g. group given a treatment versus group not given it). Relative risk is calculated from the absolute risk figure in the treatment group divided by that in the control group. A relative risk reduction (RRR) figure is found by subtracting the relative risk figure from 1. The problem with using RRR is that we cannot assess the actual effect size if the event rate in the control group is not known. It can give an artificially high idea of the effectiveness of a treatment or the risk of harm.
For this study, therefore, the absolute risk reduction is 46.51% (absolute risk in control group) minus 31.40% (absolute risk in treatment group), i.e. 15.1%. The relative risk is 67.5% (found by 31.40%/46.51%), so the relative risk reduction for the study is about 32.5%. The absolute risk reduction for the common side effect of new leg oedema among hypertensive patients treated with the alternative drug is therefore 15.1%.
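The arithmetic above can be checked with a short sketch. The function name `risk_measures` is illustrative only and does not come from the study; the inputs are the two incidence figures quoted in the example, expressed as proportions.

```python
def risk_measures(control_risk, treatment_risk):
    """Return (ARR, RR, RRR) from two absolute risks given as proportions."""
    arr = control_risk - treatment_risk  # absolute risk reduction
    rr = treatment_risk / control_risk   # relative risk (risk ratio)
    rrr = 1 - rr                         # relative risk reduction
    return arr, rr, rrr


# Incidence of new oedema quoted above: 46.51% (control), 31.40% (treatment)
arr, rr, rrr = risk_measures(0.4651, 0.3140)
print(f"ARR = {arr:.1%}, RR = {rr:.1%}, RRR = {rrr:.1%}")
# prints: ARR = 15.1%, RR = 67.5%, RRR = 32.5%
```

Note that the same RRR could arise from very different control event rates, which is why the ARR should be reported alongside it.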
Confidence intervals (CI) are common to most research articles.
Confidence intervals account for uncertainty in measurements, giving us a range within which the ‘true value’ is likely to lie. The confidence interval is commonly given as 95%, e.g. the mean value is 7 with a 95% CI of 5 to 10.
Smaller ranges show more precise results; sample size is related to confidence intervals so larger studies tend to have a narrower CI.
When you are assessing the clinical significance of any result always look for the ‘best’ and ‘worst’ case scenarios from either end of the confidence interval.
Typically, if the CI for a difference includes zero, the result is not statistically significant. If the CI is given for a ratio (e.g. odds ratio, relative risk), the result is not statistically significant when the interval contains one.
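This rule of thumb can be expressed as a small check. It is a minimal sketch: the function name `ci_excludes_null` and the difference-in-means interval are hypothetical, while the odds ratio interval is the one quoted in the odds ratio example elsewhere in this guide.

```python
def ci_excludes_null(lower, upper, is_ratio=False):
    """Return True if a confidence interval excludes the 'no effect' value.

    The 'no effect' value is 1 for ratios (odds ratio, relative risk)
    and 0 for differences (e.g. difference in means or in risks).
    """
    null_value = 1.0 if is_ratio else 0.0
    return not (lower <= null_value <= upper)


# Hypothetical difference in means with CI -0.5 to 4.5: includes zero
print(ci_excludes_null(-0.5, 4.5))                 # False

# Odds ratio 2.02 with CI 1.78 to 2.29: does not contain one
print(ci_excludes_null(1.78, 2.29, is_ratio=True))  # True
```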
Confidence intervals are a key part of forest plots, which are generally found in studies that have carried out a meta-analysis. See Forest Plots for more information.
A confounder is a factor which has a hidden effect on the outcome of research and can invalidate results. A positive confounder produces an apparent association between an exposure and an outcome which are not in fact associated. It is also possible to have a negative confounder, which obscures an association that does exist. Researchers should take confounding factors into account when designing their study and should use appropriate statistical methods to uncover the impact of confounders on study results.
What are True/False Positives/Negatives?
Papers about diagnostic tests will talk about true (or false) positives (or negatives). This refers to the accuracy of a binary test result.
A diagnostic test result has four possibilities. It will either show the presence of the condition being tested for (a positive result), or the absence of the condition being tested for (a negative result). The result given will either be true or false:
 A true positive occurs when a person who tested positive does have the condition.
 A false positive occurs when a person who tested positive does not have the condition.
 A true negative occurs when the person who tested negative does not have the condition.
 A false negative occurs when the person who tested negative does have the condition.
A good diagnostic test will minimise false positives and negatives where possible; however, it is not possible to eliminate them entirely.
In diagnostic test reporting the true/false positive/negative rate is often displayed using a 2x2 table. This allows other calculations to be done.
The example 2x2 table below shows the results of a study where 1,000,000 people have been screened for condition X using a new test. 30,190 people tested positive. After further investigation, 196 of those who tested positive were found to actually have condition X; 200 people in total had the condition.

| | Disease present | Disease not present | Total |
|---|---|---|---|
| Positive test | 196 | 29994 | 30190 |
| Negative test | 4 | 969806 | 969810 |
| Total | 200 | 999800 | 1000000 |
What is sensitivity and specificity?
In the context of a diagnostic test, the sensitivity of the test is the proportion of people with the condition whom it correctly identifies (the true positive rate). This means the proportion of the time that a person with the condition being tested for receives a positive test result.
The specificity of the test is the proportion of people without the condition whom it correctly identifies (the true negative rate). This means how often someone who is free of the condition being tested for receives a negative test result.
 The more sensitive a test is, the more likely a negative test will mean that the person does not have the disease (a way to remember this is the acronym SnNOUT: in a highly Sensitive test, a Negative result rules it OUT).
 The more specific a test is, the more likely a positive test will mean that the person does have the disease (a way to remember this is the acronym SpPIN: in a highly Specific test, a Positive result rules it IN).
Sensitivity and specificity are interrelated: for a given test, changing the threshold for a positive result to increase sensitivity generally decreases specificity, and vice versa.
Calculating sensitivity and specificity
Reporting on diagnostic test performance often uses a 2x2 table to calculate sensitivity and specificity.
The table shows how many positive and negative results were reported by the test, and how many of these were true positives and negatives (via confirmation of the results by a “gold standard” test).
In the example below 1,000,000 people have been screened for condition X using a new test. 30,190 people tested positive. After further investigation, 196 of those who tested positive were found to actually have condition X; 200 people in total had the condition.

| | Disease present | Disease not present | Total |
|---|---|---|---|
| Positive test | 196 | 29994 | 30190 |
| Negative test | 4 | 969806 | 969810 |
| Total | 200 | 999800 | 1000000 |
True positives identified (sensitivity): 196/200 = 0.98
True negatives identified (specificity): 969806/999800 = 0.97
The test has a sensitivity of 0.98 and a specificity of 0.97 (i.e. 2% of people with the condition will receive a false negative result, and 3% of people without the condition will receive a false positive result).
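The same calculation can be written as a short function. This is a sketch only; the function and argument names are illustrative, and the counts are the four cells of the example 2x2 table.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # positives among those WITH the condition
    specificity = tn / (tn + fp)  # negatives among those WITHOUT the condition
    return sensitivity, specificity


# Cells from the example table: 196 true positives, 29994 false positives,
# 4 false negatives, 969806 true negatives
sens, spec = sensitivity_specificity(tp=196, fp=29994, fn=4, tn=969806)
print(f"Sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# prints: Sensitivity = 0.98, specificity = 0.97
```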
What are Positive & Negative Predictive Values (PPV/NPV)?
Predictive values tell us the probability that a given test result is correct.
Positive predictive value (PPV) = proportion of positive test results that are correct (the person does have the condition)
Negative predictive value (NPV) = proportion of negative test results that are correct (the person does not have the condition)
Calculating the PPV/NPV of a test
A 2x2 table can be used to calculate PPV & NPV.
| | Disease present | Disease not present | Total |
|---|---|---|---|
| Positive test | 196 | 29994 | 30190 |
| Negative test | 4 | 969806 | 969810 |
| Total | 200 | 999800 | 1000000 |
PPV = proportion of correct positive diagnoses
i.e. 196/30,190 = 0.006 (or 0.6%)
NPV = proportion of correct negative diagnoses
i.e. 969,806/969,810 = more than 0.999 (over 99.9%)
Note on prevalence
Prevalence is the proportion of the population being studied that has the disease/condition being tested for.
PPV and NPV are affected by the prevalence in the sample being tested.
For example, screening for a rare disease in a large sample (as in the previous example) will almost certainly result in many false positives and a low PPV.
PPV and NPV therefore change with prevalence, and predictive values observed in one study will not apply universally.
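The effect of prevalence can be illustrated with a short sketch that holds sensitivity and specificity fixed at the example values (0.98 and 0.97) and varies prevalence. The function name is illustrative, and the 1% and 10% prevalence figures are hypothetical; only 0.02% (200 in 1,000,000) comes from the example table.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# 0.0002 is the prevalence in the example table; 0.01 and 0.10 are hypothetical
for prevalence in (0.0002, 0.01, 0.10):
    ppv, npv = predictive_values(0.98, 0.97, prevalence)
    print(f"Prevalence {prevalence:.2%}: PPV = {ppv:.3f}, NPV = {npv:.5f}")
```

With the same test, PPV rises sharply as the condition becomes more common, while NPV falls slightly.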
References
Zakowski L, Seibert C & VanEyck S. 2004, Evidence-based medicine: answering questions of diagnosis. Clin Med Res. 2(1):63–69.
Effect size is also known as the ‘Standardised Mean Difference (SMD)’ or ‘Cohen’s d’.
The effect size is used to compare the results of studies which used different outcome measures. In general, the larger the value of the effect size, the greater the impact of the intervention; however, it is important to bear in mind the outcome measure being used in the study. If a higher score on the outcome measure is considered an improvement, then an SMD greater than zero shows that the intervention has a greater impact than the comparator. If a lower score on the outcome measure is associated with an improvement, then an SMD lower than zero shows that the intervention is more effective than the comparator.
As a general guide:
 Small effect size, SMD = 0.2
 Medium effect size, SMD = 0.5
 Large effect size, SMD = 0.8
A SMD of zero means that the intervention and the comparator have the same effect.
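As a rough illustration, an SMD can be computed as the difference in group means divided by a pooled standard deviation. This is a minimal sketch of one common form of Cohen's d; the function name and the outcome scores are hypothetical, and published studies may use variants with small-sample corrections.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(intervention, comparator):
    """Standardised mean difference using the pooled sample standard deviation."""
    n1, n2 = len(intervention), len(comparator)
    s1, s2 = stdev(intervention), stdev(comparator)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(intervention) - mean(comparator)) / pooled_sd


# Hypothetical outcome scores where a higher score is an improvement
intervention_scores = [12, 14, 15, 17, 18]
comparator_scores = [10, 12, 13, 15, 16]
d = cohens_d(intervention_scores, comparator_scores)
print(f"SMD = {d:.2f}")
```

Here the SMD is positive, so the intervention group scored higher on average; by the guide above this would count as a large effect, although five values per group is far too few for a real study.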
A forest plot (also known as a blobbogram) is a visual representation of the results of a number of scientific studies addressing the same question, along with the overall results. They are often used in meta-analyses.
Each study included in the meta-analysis will be listed in the forest plot with references (author and date), and other information such as the number of people in each study (N), the number of people with the specified outcome (n), and the confidence interval. This allows you to compare the results of individual studies; to see the weight of each study (i.e. how much influence each study has on the overall results of the meta-analysis); and to see an overall result.
Further information here:
Students 4 Best Evidence: How to read a forest plot? www.students4bestevidence.net/forestplot/
Students 4 Best Evidence: Tutorial: How to read a forest plot. www.students4bestevidence.net/tutorialreadforestplot/
How to interpret a forest plot [video]. www.youtube.com/watch?v=pyL8DvJmDc
In a randomised controlled trial subjects are randomly allocated to two treatment groups (an intervention and control group). This is done to achieve two groups that are broadly similar in characteristics, so that the only difference between the two groups is the intervention being tested. It is important to analyse subjects in the groups they were originally allocated to even if they don't comply with the intervention being tested or even complete the trial. This maintains the balance of characteristics across the intervention and control groups achieved by the original randomisation of subjects. This reduces the risk of bias, the chance that factors other than the intervention being analysed have influenced the outcome of the study. The analysis of subjects according to the groups they were originally allocated is called intention to treat analysis.
Number needed to treat (NNT)/Number needed to harm (NNH)
Links: https://www.students4bestevidence.net/numberneededtreat/
http://www.bandolier.org.uk/booth/glossary/NNT.html
https://www.gpnotebook.co.uk/simplepage.cfm?ID=x20050322121805411760
Incidence of new edema after randomization was 31.40 % in test group and 46.51 % in control group [p = 0.03; absolute risk reduction (ARR) = 15.1 %; Number Needed to Treat (NNT) = 7, ITT analysis].
NNT (number needed to treat) is the number of patients requiring the intervention for one to avoid a specified bad outcome (in this case leg oedema). The ‘ideal’ NNT is 1, where everyone has improved with treatment. The higher the NNT, the less effective the treatment, but how clinically important this is depends on factors including the cost of the intervention versus the severity of the condition (e.g. an NNT of 40 is reasonable for low-cost interventions such as aspirin to prevent heart attacks). Assessing the significance of an NNT figure also requires knowledge of what the treatment is being compared with and what the outcome is. Other aspects may also be relevant, including how long the study continued. An NNT figure will not give a good idea of the degree to which a patient will benefit, and cannot be used in meta-analyses as baseline risk often varies heavily between studies.
For this study NNT can be calculated as NNT = 1/ARR = 1/0.151 ≈ 6.6, which rounds up to 7 (NNT figures are always rounded up to the nearest whole number).
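The NNT calculation can be sketched as follows. The function name is illustrative, and the guard against a zero or negative ARR is an added assumption (in that situation the intervention shows no benefit and an NNT is not meaningful).

```python
import math


def number_needed_to_treat(control_risk, treatment_risk):
    """Return NNT = 1/ARR, rounded up to the nearest whole number."""
    arr = control_risk - treatment_risk  # absolute risk reduction
    if arr <= 0:
        raise ValueError("No risk reduction, so NNT is not defined")
    return math.ceil(1 / arr)


# Incidence of new oedema: 46.51% (control) versus 31.40% (treatment)
print(number_needed_to_treat(0.4651, 0.3140))  # prints: 7
```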
NNH (number needed to harm) is found in some studies and indicates the number of patients who need to receive an intervention for one of them to experience a specific adverse outcome. In this case a lower number indicates a higher risk of harm.
Links: http://www.bandolier.org.uk/booth/glossary/Odds.html
Among the pregnant women aged 40 years or older, 10.82% experienced one or more of the selected adverse pregnancy outcomes compared with 5.46% of pregnant women aged 20-34 years (odds ratio [OR] 2.02, 99.8% CI 1.78-2.29).
From: Frederiksen LE; Ernst A; Brix N; Braskhoj Lauridsen LL; Roos L; Ramlau-Hansen CH; Ekelund CK. (2018) Risk of Adverse Pregnancy Outcomes at Advanced Maternal Age. Obstetrics & Gynecology. 131(3):457-463.
In case-control studies, where the total number of exposed people is not available, relative risk cannot be calculated and the odds ratio is used instead as a measure of the strength of association between exposure and outcome. Where the number at risk (number exposed) is available, either relative risk or odds ratio can be calculated.
An odds ratio (OR) is the ratio of the odds of a specific outcome (e.g. death, or recovery) among those treated/receiving the intervention versus the odds of that specific outcome among the control/placebo group. If the outcome is the same in both groups the ratio will be 1, which implies there is no difference between the two parts of the study. An odds ratio over 1 indicates that the intervention is more effective than the control in bringing about an outcome and a ratio of under 1 indicates that the intervention is less effective in bringing about the outcome. Where the outcome (as in this example article) is undesirable an odds ratio of over 1 indicates that the ‘intervention’ (higher maternal age) is a risk factor for the adverse outcome. High odds ratio figures should not though be taken in isolation as indicating very high risks.
The easiest way to calculate an odds ratio (or to check that presented in an article) is via a 2x2 table in this format:
| Exposure status | Outcome status: Positive | Outcome status: Negative |
|---|---|---|
| Positive | A | B |
| Negative | C | D |
Odds ratio = (A/C)/(B/D)
For the above article, looking specifically at the odds ratio of experiencing adverse pregnancy outcomes between pregnant women aged over 40 and pregnant women aged 20-34, this is expressed as (taking data from Table 3):
Women aged over 40 with adverse pregnancy outcomes: 1054 (intervention group experiencing outcome)
Women aged over 40 without adverse pregnancy outcomes: 8689 (intervention group not experiencing outcome)
Women aged 20-34 with adverse pregnancy outcomes: 16429 (control group experiencing outcome)
Women aged 20-34 without adverse pregnancy outcomes: 284434 (control group not experiencing outcome)

| | Adverse outcomes | No adverse outcomes |
|---|---|---|
| Aged over 40 | 1054 | 8689 |
| Aged 20-34 | 16429 | 284434 |
The odds ratio here is therefore (1054/16429)/(8689/284434) ≈ 0.0642/0.0305 ≈ 2.1
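The 2x2 calculation can be checked with a short sketch. The function name is illustrative; the cell labels follow the A/B/C/D layout described above and the counts are those quoted from the article. This is a crude odds ratio computed directly from the counts, so it need not match a published estimate exactly.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a: exposed, outcome present    b: exposed, outcome absent
    c: unexposed, outcome present  d: unexposed, outcome absent
    """
    return (a / c) / (b / d)  # algebraically the same as (a * d) / (b * c)


# Aged over 40: 1054 with and 8689 without adverse outcomes
# Aged 20-34: 16429 with and 284434 without adverse outcomes
crude_or = odds_ratio(a=1054, b=8689, c=16429, d=284434)
print(f"Odds ratio = {crude_or:.2f}")  # prints: Odds ratio = 2.10
```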
If the prevalence of a disease is low then the odds ratio in a case-control study is a good approximation to the relative risk.
https://www.nhs.uk/news/Pages/Newsglossary.aspx good for simple lay explanations of terms
https://www.students4bestevidence.net/relativemeasureseffectscanmisleading/?utm_content=buffer6f02a&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer good on how to interpret claims of ‘risk’
P values are given in the majority of research articles.
The P value gives the probability of seeing a difference at least as large as the one observed if chance alone were at work, i.e. if there were no true difference.
By convention a P value of 0.05 or less shows statistical significance, i.e. the probability of a difference this large arising by chance is 1 in 20 or less.
‘The closer the P value is to zero the more likely it is that the effect reported is not due to chance.’ P values greater than 0.05 are statistically non-significant.
It is important to note that statistical significance as shown by P values is not the same as clinical significance. Statistical significance judges if the effect of a treatment can be explained as a chance finding. Clinical significance is whether any treatment effects would be worthwhile for patients in a real life setting. A ‘statistically significant’ improvement may not translate into a meaningful clinical improvement.
The power of a study is the ability of the test to detect an effect if it actually exists. Put another way, it is an assessment of the likelihood that the study will be able to correctly identify the true effect of an intervention. An adequately powered study is unlikely to miss a real effect of the size it was designed to detect. The power of a study depends on:
 the effect size required to be clinically significant. This relates to the context of the study and what difference would be considered clinically important.
 the significance threshold (P value cut-off) used to decide if the results are statistically significant, i.e. unlikely to be down to random chance (usually P<0.05).
 the number of participants (or sample size)
Clinical studies are expected to have a power of at least 80% (preferably 90%). A power calculation is usually turned around to calculate the sample size required, presuming a P value of <0.05, a power of at least 80% and an expected effect size.
NB: If the number of participants falls below the required sample size, it may be difficult for the study to produce meaningful results. An underpowered study is often reflected in wide confidence intervals and high P values (i.e. not statistically significant).
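To show how the three factors combine, the sketch below estimates the sample size per group for comparing two means, using a standard normal approximation with a two-sided test and equal group sizes. This is an assumption made for illustration, not a method prescribed by this guide; the function name is hypothetical, and real studies should take statistical advice, since exact methods give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group to compare two means.

    effect_size is the standardised mean difference (SMD) to be detected.
    Uses n = 2 * (z_alpha + z_power)^2 / effect_size^2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size**2)


# Medium effect size (SMD = 0.5), P < 0.05, at 80% and 90% power
print(sample_size_per_group(0.5, power=0.80))  # prints: 63
print(sample_size_per_group(0.5, power=0.90))  # prints: 85
```

Smaller effect sizes or higher power both increase the number of participants required.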
Statistical vs Clinical Significance
Statistical significance, often represented by P<0.05, indicates whether an observed outcome is likely to be due to a particular exposure or intervention rather than random chance. If a result is statistically significant, it is unlikely to have occurred by random chance alone.
Clinical significance however is looking at the clinical effectiveness of a specific exposure or intervention and the degree of beneficial effect under real world clinical settings. The practical importance of a treatment effect should be considered and whether it has a genuine and noticeable effect on daily life.
Gosall, N.K. and Gosall, G.S. (2015) The Doctor’s Guide to Critical Appraisal. Fourth Edition.
Harris, M and Taylor, G. (2014) Medical Statistics Made Easy. Third Edition.
Bootland, D et al. (2016) Critical Appraisal from Papers to Patient: a practical guide.