A. Data Sets
Two data sources, the British Household Panel Survey (BHPS) and the British Election Study (BES)
Please have a look at their questionnaires to identify response variables that you would like
to model and to start searching for and reading substantive literature on them.
In the second technical report, fit a linear model, containing either a transformation or an
interaction term, to a (quasi-)continuous response that interests you. Please feel free to
search the UK Data Archive for data sets that fit your research interest. I recommend two
possible research designs.
Find a linear model in the literature. Fit the model to a data set. Then introduce a
transformation of the response or of one of the predictors. Argue theoretically and
empirically why this transformation makes sense.
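A minimal sketch of this first design in R might look as follows. It assumes simulated data and hypothetical variable names (`income`, `age`, `educ` are illustrative and not taken from the BHPS or BES), and it stands in for whatever published model you replicate.

```r
# Simulated data for illustration only; variable names are hypothetical
# and do not come from the BHPS or BES.
set.seed(1)
n    <- 500
age  <- runif(n, 18, 80)
educ <- sample(0:6, n, replace = TRUE)
# Right-skewed response, generated on the log scale by construction
income <- exp(9 + 0.01 * age + 0.15 * educ + rnorm(n, sd = 0.5))
dat <- data.frame(income, age, educ)

# Baseline linear model (stand-in for a model found in the literature)
fit_raw <- lm(income ~ age + educ, data = dat)

# Same predictors, log-transformed response
fit_log <- lm(log(income) ~ age + educ, data = dat)
summary(fit_log)

# Empirical check: residuals against fitted values for both models
par(mfrow = c(1, 2))
plot(fitted(fit_raw), resid(fit_raw), main = "Raw response")
plot(fitted(fit_log), resid(fit_log), main = "Log response")
```

Because the two models have responses on different scales, their R-squared and AIC values are not directly comparable; residual plots are usually a more informative basis for the empirical argument.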
Find a linear model without an interaction effect in the literature. Fit the model to a data set.
Then add the product of two of the predictors to the model and fit it to the same data set.
Argue theoretically and empirically why the interaction effect makes sense to include.
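The second design can be sketched in R along the following lines. The data are simulated and the variable names (`satisfaction`, `age`, `female`) are hypothetical placeholders, so treat this as an illustration of the mechanics rather than a template for your substantive model.

```r
# Simulated data for illustration only; variable names are hypothetical.
set.seed(2)
n      <- 500
age    <- runif(n, 18, 80)
female <- rbinom(n, 1, 0.5)
# The slope of age differs between groups, so an interaction exists by construction
satisfaction <- 5 + 0.02 * age - 1 * female + 0.03 * age * female + rnorm(n)
dat <- data.frame(satisfaction, age, female)

# Model without the interaction (stand-in for a model from the literature)
fit_main <- lm(satisfaction ~ age + female, data = dat)

# Add the product of the two predictors;
# age * female expands to age + female + age:female
fit_int <- lm(satisfaction ~ age * female, data = dat)

# F-test comparing the two nested models fitted to the same data
anova(fit_main, fit_int)

# Implied slope of age in each group
b <- coef(fit_int)
slope_0 <- b["age"]                     # slope when female == 0
slope_1 <- b["age"] + b["age:female"]   # slope when female == 1
```

Reporting the implied slopes per group, as in the last lines, often makes the interaction easier to interpret than the raw product coefficient.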
For background reading on interactions, Gelman and Hill (3.3 and 4.2) as well as Fox (7.3)
are good places to start; for background reading on transformations, Gelman and Hill (3.6
and 4) as well as Fox (4 and 12) are good starting points.
● Introduction: What did others do? What did you do? Why did you do it? Why is it
● Methods/Data: How did you do it?
● Results: What did you find?
● Discussion: What does it mean? What are the implications?
● Bibliography: Please list your references in alphabetical order here.
● Appendix: Please copy and paste your R code here.
D. Data analysis
● An initial data analysis that explores the numerical and graphical characteristics
of the data.
● Variable selection to choose the best model.
● An exploration of transformations to improve the fit of the model.
● Diagnostics to check the assumptions of your model.
● Some predictions of future observations for interesting values of the predictors.
● An interpretation of the meaning of the model with respect to the particular area
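Several of the steps listed above can be sketched in base R as follows. The data are simulated, the names `x1`, `x2`, `x3`, and `y` are hypothetical, and AIC is used here only as one possible selection criterion; the exploration of transformations is omitted for brevity.

```r
# Simulated data for illustration only; variable names are hypothetical.
set.seed(3)
n   <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 1 * dat$x2 + rnorm(n)  # x3 is irrelevant by construction

# 1. Initial data analysis: numerical and graphical summaries
summary(dat)
pairs(dat)

# 2. Variable selection: compare candidate models, here by AIC
fit_full  <- lm(y ~ x1 + x2 + x3, data = dat)
fit_small <- lm(y ~ x1 + x2, data = dat)
AIC(fit_full, fit_small)

# 3. Diagnostics: residuals vs fitted, normal Q-Q, scale-location, leverage
par(mfrow = c(2, 2))
plot(fit_small)

# 4. Predictions for chosen predictor values, with 95% prediction intervals
new_obs <- data.frame(x1 = c(-1, 0, 1), x2 = 0)
pred <- predict(fit_small, newdata = new_obs, interval = "prediction")
pred
```

The order of the steps is a choice, not a rule: diagnostics frequently send you back to the variable selection or transformation stage.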
There is always some freedom in deciding which methods to use, in what order to apply
them, and how to display and interpret the results. So there may not be one clear right
answer, and good analysts may come up with different models. Please have a look at
Gelman and Hill (2007: Chapter 4.6) for some general principles for building regression models.
E. Things to consider
● to identify, read, and discuss substantive work on the topic of analysis in the
● to recode missing values in the data while keeping an eye on the number of
● to keep the number of observations constant across models if you compare them,
● to interpret results of probability models in the right way: a change from, say, 40%
to 44% is a difference of 4 percentage points (44% – 40% = 4 percentage points),
but an increase of 10 percent ([44 / 40] – 1 = 0.1 = 10%), and
● to interpret results of regression models in a descriptive or predictive way unless
you have given strong reasons for a causal interpretation.
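Three of the points above (recoding missing values, keeping the number of observations constant, and percentage points versus percent) can be illustrated with a short R sketch. The data frame is a made-up toy example, and the codes -9 and -1 are assumed stand-ins for missing-value codes; check the codebook of your actual data set for the codes it uses.

```r
# Hypothetical survey extract; -9 and -1 are assumed missing-value codes.
dat <- data.frame(
  y  = c(3, 5, -9, 4, 6, 2, 7, 5),
  x1 = c(1, 2, 3, -1, 5, 6, 7, 8),
  x2 = c(0, 1, 0, 1, -9, 1, 0, 1)
)

# Recode missing-value codes to NA and count how many values are affected
dat[dat == -9 | dat == -1] <- NA
colSums(is.na(dat))

# Keep only cases that are complete on every variable used in any model,
# so that all models are fitted to the same observations
dat_cc <- na.omit(dat[, c("y", "x1", "x2")])
fit1 <- lm(y ~ x1, data = dat_cc)
fit2 <- lm(y ~ x1 + x2, data = dat_cc)
c(nobs(fit1), nobs(fit2))

# Percentage points versus percent, using the 40% and 44% from the text
p_old <- 0.40
p_new <- 0.44
diff_pp  <- (p_new - p_old) * 100      # difference in percentage points
diff_pct <- (p_new / p_old - 1) * 100  # relative increase in percent
c(diff_pp, diff_pct)
```

Fitting each model directly to `dat` instead of `dat_cc` would let `lm()` drop a different set of incomplete rows per model, which is the situation the list above warns against.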