Required for next week

  1. Choice(s) of essay topics due next week.
  2. No new reading for next week. Please have another look at the second-hand smoke papers and the cheating papers.

Some notes on the smoking video

(Program Tape 11 from Against All Odds)

Prospective and retrospective studies

A distinction is made in the video between prospective and retrospective studies. A retrospective study identifies a group of people with a disease (or other outcome), and tries to ascertain potential risk factors in retrospect. This is usually done by identifying another group of people who do not have the disease in question, and comparing the two groups on their exposure to a possible risk factor. In medical applications, this type of study is often called a case-control study: the cases are the people with the disease, and the controls are the disease-free comparison group.

For example, one study of the effect of second-hand smoke on spouses [JAMA] identified 653 non-smoking women with lung cancer, and collected data on how many of these were married to smokers. The researchers identified a further group of 1,253 non-smoking women without lung cancer, and also collected data on how many of these were married to smokers.

A prospective study identifies a group of healthy people and follows them over time, to ascertain who gets the disease and who doesn't, and then compares these two groups on their potential risk factors. In medical applications a prospective study is often called a cohort study. The smoking video referred to early evidence from retrospective studies being confirmed in a large prospective study.

(An opinion poll or other type of survey is more like a prospective study, although the data from particular individuals is usually only collected at one time point.)

A major disadvantage of a case-control study is that the controls are usually chosen by convenience. Very often the controls are patients in the same hospital, who have been hospitalized for a condition unrelated to the disease under study. Thus there is no guarantee that the controls are really comparable to the cases. In a prospective study, effort is usually made to measure many variables potentially related to the disease, so that the comparison can be fine-tuned at the end of the study. Another disadvantage of a case-control study is that people are asked about their exposure to a risk factor in retrospect, and this can provide quite unreliable data. Both these criticisms were made of the initial studies suggesting a link between smoking and lung cancer, and both these criticisms are made by Gross in his article about second-hand smoke (last week's reading).

A major disadvantage of prospective studies is that it is necessary to follow a very large group of people over a long period of time, especially when investigating rare diseases or diseases that take a long time to appear. This is expensive, and it is also very difficult to get complete data; biases can result if loss to follow-up is related to the outcome of interest.

Both prospective and retrospective studies are observational studies, as opposed to experiments. Experiments require active intervention on the part of the researchers, something often not possible in medical research or other research involving human subjects.

Confounding variables

All observational studies can potentially be skewed by confounding variables. These are variables that are not of primary interest, and may not even be measured, but are correlated with both the response of interest and the risk factor being studied. For example, in the smoking video, the possibility was mentioned that alcohol consumption and lung cancer incidence were associated. Since we might also expect that smoking and alcohol consumption are related, the observed increase in lung cancer among smokers might only reflect the association of lung cancer and alcohol consumption.

In the polling for the Quebec referendum, potential confounding variables that were mentioned (and adjusted for when allocating the undecideds) included gender, age, and first language.

In observational studies effort is usually made to identify all possible confounding variables, and adjust for them in the analysis of the data. But it is impossible to adjust for confounding variables that no one has thought of yet.

Criteria for causality

It can be argued that observational studies can never establish a causal relationship between two variables, such as smoking and lung cancer: that causality can only be established by experimentation involving direct intervention. However, the science of epidemiology, which studies various aspects of public health, deals almost exclusively with observational studies, and as a result epidemiologists have identified a series of criteria for deciding that a causal relationship is very likely. In fact, these criteria were originally developed by Sir Richard Doll in connection with studies on lung cancer and smoking in Britain in the fifties. The five criteria mentioned on the video are:

Simpson's paradox

A particularly intriguing type of confounding occurs when a confounding variable, once exposed, completely turns around the observed association. Here is an example, from [Rad], on the relationship between race and the imposition of the death penalty for convicted first-degree murderers.

race of      death penalty   death penalty   percentage
defendant    imposed         not imposed     imposed
----------------------------------------------------------
white        19              141             11.88%
black        17              149             10.24%

The death penalty is imposed at about the same rate for white and black defendants; in fact, the rate is slightly higher for white defendants. However, when the race of the victim is taken into account, a quite different picture emerges:

                     white victim

race of      death penalty   death penalty   percentage
defendant    imposed         not imposed     imposed
----------------------------------------------------------
white        19              132             12.58%
black        11              52              17.46%

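                     black victim

race of      death penalty   death penalty   percentage
defendant    imposed         not imposed     imposed
----------------------------------------------------------
white        0               9               0%
black        6               97              5.83%

Within each victim group, the death penalty was imposed at a higher rate on black defendants than on white defendants, the reverse of the combined table. The reversal arises because the death penalty was imposed far more often when the victim was white (30 of 214 cases) than when the victim was black (6 of 112 cases), and most white defendants (151 of 160) had white victims, while most black defendants (103 of 166) had black victims.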
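
The reversal can be checked directly from the counts in the three tables. The following Python sketch recomputes the percentages; the counts are taken from the tables, while the names counts, rate and pooled are chosen here for illustration only.

```python
# Counts from the tables: (death penalty imposed, death penalty not imposed),
# indexed by (race of defendant, race of victim).
counts = {
    ("white", "white victim"): (19, 132),
    ("black", "white victim"): (11, 52),
    ("white", "black victim"): (0, 9),
    ("black", "black victim"): (6, 97),
}


def rate(imposed, not_imposed):
    """Percentage of defendants on whom the death penalty was imposed."""
    return 100.0 * imposed / (imposed + not_imposed)


def pooled(defendant):
    """Add up the counts for one defendant race over both victim races."""
    imposed = sum(v[0] for k, v in counts.items() if k[0] == defendant)
    not_imposed = sum(v[1] for k, v in counts.items() if k[0] == defendant)
    return imposed, not_imposed


for defendant in ("white", "black"):
    # Combined table: victim race ignored.
    print(f"{defendant} defendants, all victims: {rate(*pooled(defendant)):.2f}%")
    # Separate tables: one line per victim race.
    for victim in ("white victim", "black victim"):
        print(f"  {victim}: {rate(*counts[(defendant, victim)]):.2f}%")
```

Adding the two victim-specific tables reproduces the combined table exactly, so the change in direction comes only from how the cases are grouped, not from any change in the data.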

References

JAMA
Reported in Chance News 3.08, J. American Medical Association, June 8, 1994.
Rad
Radelet, M. (1981) Racial characteristics and imposition of the death penalty. American Sociological Review 46, 918-927. There is a related article in Chance News 4.04 from the New York Times, February 24, 1995.
Moore
The book Statistics: concepts and controversies by Moore has a nice section on causality, smoking and cancer: Chapter 6, Section 3.

In the Globe and Mail this week


Technical note: relative risk

The relative risk for an event or outcome, such as lung cancer, due to a risk factor, such as smoking, is the ratio of the probability of the event given exposure to the risk factor, to the probability of the event given non-exposure to the risk factor. If you like formulas:

                      Prob(event | exposure)
relative risk  =  ------------------------------
                    Prob(event | non-exposure)

If the relative risk is equal to 1, then there is no (apparent) increase in the probability of disease due to the risk factor. If the relative risk is greater than 1, then exposure to the risk factor does increase the probability of disease, and if the relative risk is less than 1, exposure decreases the probability of disease. Of course in studies, the relative risk can only be estimated, and there is always an associated margin of error.
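
As a small illustration of the definition, here is a Python sketch that estimates the relative risk from counts in a prospective study. The counts are invented for the example and do not come from any of the studies discussed; the function name is also just a label chosen here.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Estimate Prob(event | exposure) / Prob(event | non-exposure) from counts."""
    p_exposed = exposed_cases / exposed_total        # estimated Prob(event | exposure)
    p_unexposed = unexposed_cases / unexposed_total  # estimated Prob(event | non-exposure)
    return p_exposed / p_unexposed


# Hypothetical cohort: 30 of 1000 exposed people and 10 of 1000 unexposed
# people develop the disease during follow-up.
rr = relative_risk(30, 1000, 10, 1000)
print(f"estimated relative risk: {rr:.2f}")
```

With these made-up counts the estimated probability of disease is three times as large in the exposed group as in the unexposed group.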

In the smoking and lung cancer studies, the relative risk for lung cancer identified with smoking was 9 for one-pack-a-day smokers, and 30 for two-pack-a-day smokers. In the ETS (environmental tobacco smoke) studies, the relative risk for lung cancer identified with exposure to second-hand smoke was between 1 and 2 in most studies, and overall was about 1.13. This means an increase of 13% in the probability of lung cancer due to exposure to ETS. Since the lung cancer incidence is already very small, about 1/10,000 (?), this doesn't translate into very many additional expected cases of lung cancer. (The EPA study identified a relative risk of 2, which translated into an estimated 3000 additional cases of lung cancer in the U.S.; a guesstimate for Canada would be 300, since the population is about 1/10 the size.)
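
The arithmetic behind "not very many additional cases" can be sketched as follows. The baseline incidence of 1/10,000 is the rough (and questioned) figure quoted above, and the group of 100,000 exposed people is an arbitrary size chosen only to make the numbers readable.

```python
baseline_risk = 1 / 10_000   # rough baseline incidence quoted in the notes (uncertain)
relative_risk = 1.13         # overall relative risk quoted for the ETS studies

# Probability of lung cancer for an exposed person under these assumptions.
exposed_risk = baseline_risk * relative_risk

# Expected additional cases in a hypothetical group of 100,000 exposed people.
extra_per_100k = (exposed_risk - baseline_risk) * 100_000
print(f"extra expected cases per 100,000 exposed: {extra_per_100k:.1f}")
```

Under these assumptions a 13% increase in a risk of 1 in 10,000 adds only about one case for every 100,000 people exposed.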

Even more technical

In case-control studies, which all the ETS studies are, the relative risk must be estimated indirectly, by the so-called odds ratio, which is the ratio of the odds of the event, given exposure, to the odds of the event given non-exposure:

                      odds of event | exposure
odds ratio  =  ------------------------------------
                    odds of event | non-exposure

                 Prob(event | exposure) / Prob(non-event | exposure)
            =  ---------------------------------------------------------------
                 Prob(event | non-exposure) / Prob(non-event | non-exposure)
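
The following Python sketch computes the odds ratio from the formula above and compares it with the relative risk. All probabilities and counts are invented for illustration. The second function uses the standard cross-product form for a 2x2 table, which is how the odds ratio is usually computed from case-control counts; it is offered as a sketch of that calculation, not as the method of any particular ETS study.

```python
def odds(p):
    """Convert a probability to odds: Prob(event) / Prob(non-event)."""
    return p / (1 - p)


def odds_ratio(p_event_exposed, p_event_unexposed):
    """Odds of the event given exposure, divided by odds given non-exposure."""
    return odds(p_event_exposed) / odds(p_event_unexposed)


def odds_ratio_from_counts(exposed_cases, unexposed_cases,
                           exposed_controls, unexposed_controls):
    """Cross-product odds ratio from the four cells of a case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Rare event (hypothetical probabilities): odds ratio is close to relative risk.
rare_rr = 0.000113 / 0.0001
rare_or = odds_ratio(0.000113, 0.0001)
print(f"rare event:   relative risk {rare_rr:.4f}, odds ratio {rare_or:.4f}")

# Common event (hypothetical probabilities): the two measures differ a lot.
common_rr = 0.6 / 0.3
common_or = odds_ratio(0.6, 0.3)
print(f"common event: relative risk {common_rr:.4f}, odds ratio {common_or:.4f}")

# Hypothetical case-control counts: 20 of 30 cases exposed, 80 of 170 controls exposed.
print(f"from counts:  odds ratio {odds_ratio_from_counts(20, 10, 80, 90):.2f}")
```

The comparison shows why the odds ratio is a reasonable stand-in for the relative risk when the disease is rare, as lung cancer is among non-smokers, and a poor one when the event is common.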

One further technical point is that the margin of error for the estimate of relative risk is computed as a 'plus-or-minus' on the log scale, which is why Gross exhibits his summary of the studies this way.
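
To show what a 'plus-or-minus' on the log scale looks like, here is a Python sketch using one common textbook approximation for the standard error of the log odds ratio (the square root of the sum of the reciprocals of the four cell counts). This is an assumption made for illustration and is not necessarily the calculation used by Gross or by the individual studies; the counts are invented.

```python
import math


def odds_ratio_interval(exposed_cases, unexposed_cases,
                        exposed_controls, unexposed_controls, z=1.96):
    """Approximate 95% interval for the odds ratio, computed on the log scale."""
    estimate = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Approximate standard error of log(odds ratio).
    se_log = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                       + 1 / exposed_controls + 1 / unexposed_controls)
    # Plus-or-minus on the log scale, then transform back.
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical case-control counts.
estimate, lower, upper = odds_ratio_interval(20, 10, 80, 90)
print(f"odds ratio {estimate:.2f}, interval ({lower:.2f}, {upper:.2f})")
```

Because the margin of error is added and subtracted on the log scale, the resulting interval is not symmetric around the estimate on the original scale: the upper limit is farther from the estimate than the lower limit.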