Experimental Design

Experimental design is the subfield of statistics that focuses on how to collect data effectively. It describes how to conduct experiments so that the data collected are meaningful.

To design an experiment, we work through the stages described below.

(1) Problem Definition

We first want to pin down the problem or research question that the experiment is meant to address.

This question should be specific, since it guides the rest of the planning.

(2) Variable Identification

After defining our problem, we identify the variable(s) of interest for the study. A variable is any characteristic recorded for the subjects in a study. There are two main types of variables we want to identify:

  • Independent Variable: Any variable that may influence the outcome (the dependent variable).
  • Dependent Variable: Any variable describing the outcome we’re interested in.

In an experiment, we manipulate the independent variable and measure the resulting changes in the dependent variable.

We also want to define the population, which is the entire group we want to draw conclusions about.

Due to practical limitations, we often can’t study the entire population. Instead, we collect data from samples, which are smaller groups that are (ideally) representative of the population.
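
As a rough illustration of drawing a sample from a population, here is a minimal Python sketch of simple random sampling, where every member of the population is equally likely to be chosen. The population of heights, its size, and the helper name simple_random_sample are all assumptions made up for this example.

```python
import random
import statistics

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members of the population, each equally likely to be picked."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: heights (cm) of 10,000 people, generated for illustration only.
rng = random.Random(0)
population = [rng.gauss(170, 10) for _ in range(10_000)]

sample = simple_random_sample(population, n=200, seed=1)

# A representative sample should have a mean close to the population mean.
print(f"population mean: {statistics.mean(population):.1f}")
print(f"sample mean:     {statistics.mean(sample):.1f}")
```

Other sampling schemes exist; simple random sampling is used here only because it is the easiest to sketch.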

(3) Hypothesis

After identifying our variables, we define a hypothesis, which is a testable statement that addresses our problem / question. It represents an educated guess about the relationship between the variables, and we will collect data to test whether or not the evidence supports it.

Confounding

While planning our hypothesis, we also want to consider the potential effects of confounders, which are factors other than the independent variable that may influence the outcome and distort the apparent relationship between the independent and dependent variables. Confounders can introduce bias into the results.

To deal with confounding, we can use the following 3 techniques:

  • Control: By using an (unaffected) control group, we can measure the population’s baseline without any influence of the independent variable. Comparing against this baseline helps us attribute any effects to the independent variable.
  • Randomization: Randomly assigning participants to groups reduces the risk of bias by spreading potential confounding variables roughly evenly across the groups.
  • Replication: Repeating the experiment multiple times helps assess the consistency and reliability of results.
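
The randomization technique above can be sketched in a few lines of Python. This is only one possible approach: it shuffles the participants and splits them in half. The participant IDs and the function name randomize are hypothetical.

```python
import random

def randomize(participants, seed=None):
    """Shuffle participants and split them into a treatment and a control group."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs, for illustration only.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomize(participants, seed=42)

print("treatment:", groups["treatment"])
print("control:  ", groups["control"])
```

Because the assignment depends only on the shuffle, no characteristic of a participant (age, health, motivation) can systematically steer them into one group.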

(4) Data Collection

Finally, we can choose our method of data collection and begin collecting data! There are many different ways data can be collected, and some of the most common methods are given below.

Observational Studies

In an observational study, we record data (variables) without intervening in the scenario (without manipulating any variables). There are 3 types of observational studies, described below.

  • Cross Sectional Studies: Data is collected from many different individuals at one specific point in time.
  • Retrospective (Case Control) Studies: Looking backwards at a series of events in the past to examine relationships between events and outcomes.
  • Prospective (Longitudinal / Cohort) Studies: Following a group of people (called a cohort) closely, over a period of time.

Surveys

In a survey, information is collected through structured interviews or questionnaires.

Surveys are often used when the subjects are people, and the questions must be worded carefully so as to avoid bias.

Experiments

In an experiment, different levels of the independent variable (called the treatment) are assigned to groups and the effects are observed. In experiments, we actively change the variables and measure the resulting changes.

It’s important to account for the placebo effect in such experiments, where a person’s belief that they are receiving a treatment produces effects even when no active treatment is administered. This is one reason why experiments often need control groups.

A common way to minimize bias in experiments is blinding, where people involved in the experiment don’t know who is receiving the treatment.

  • Single-Blinding: Either the participants or the researchers (but not both) don’t know who’s receiving the treatment.
  • Double-Blinding: Neither the participants nor the researchers know who’s receiving the treatment.
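
One way blinding is sometimes put into practice is to label each participant with an opaque code and keep the code-to-group key separate until the data are analyzed. The Python sketch below illustrates that idea under stated assumptions: the function name blind_allocation, the code format, and the "treatment" / "placebo" group labels are all hypothetical.

```python
import random

def blind_allocation(participants, seed=None):
    """Return (blinded_view, key).

    blinded_view maps each participant to an opaque code (safe to share).
    key maps each code to its actual group (kept hidden until analysis).
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Shuffle the codes separately so a code's number reveals nothing about its group.
    codes = [f"K{i:03d}" for i in range(len(shuffled))]
    rng.shuffle(codes)
    half = len(shuffled) // 2
    blinded_view, key = {}, {}
    for i, (person, code) in enumerate(zip(shuffled, codes)):
        blinded_view[person] = code
        key[code] = "treatment" if i < half else "placebo"
    return blinded_view, key

participants = [f"P{i:02d}" for i in range(1, 11)]  # hypothetical IDs
blinded_view, key = blind_allocation(participants, seed=7)

print(blinded_view)  # what participants and researchers would see: codes only
```

In this sketch, only someone holding key can tell which codes received the treatment.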

Simulations

In a simulation, artificial scenarios are created to replicate some process or situation in the real world. This is often done when the actual situation (e.g. a natural disaster) is too expensive or dangerous to replicate in real life.
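
As a toy example of a simulation, the Python sketch below estimates how likely a levee is to be overtopped at least once over a number of years, by generating many artificial "histories" of yearly peak water levels. Every number here (the normal distribution of peaks, the levee height, the time span) is an assumption chosen for illustration and does not describe any real river.

```python
import random

def simulate_overtopping(levee_height_m, years, trials, seed=None):
    """Estimate the chance that the levee is overtopped at least once in `years` years."""
    rng = random.Random(seed)
    overtopped = 0
    for _ in range(trials):
        # Assumed model: each year's peak level (m) is normal with mean 4.0 and sd 1.0.
        if any(rng.gauss(4.0, 1.0) > levee_height_m for _ in range(years)):
            overtopped += 1
    return overtopped / trials

estimate = simulate_overtopping(levee_height_m=6.5, years=30, trials=10_000, seed=0)
print(f"estimated probability of overtopping: {estimate:.3f}")
```

Running more trials makes the estimate more stable, but it is only ever as trustworthy as the assumed model behind it.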