How to Calculate a Paired Sample T-Test: Student Guide
The paired sample t-test shows up in psychology, nursing, education, and social science courses constantly. The concept isn’t complicated. But students get tripped up on the formula, the assumptions, the degrees of freedom, and especially how to write the results up. This guide walks through all of it — what the test is, when to use it, how to calculate it by hand, how to run it in SPSS and Excel, and how to report it correctly.
What a Paired Sample T-Test Is: Before You Touch the Formula
A paired sample t-test (also called a dependent samples t-test or repeated measures t-test) tests whether the mean difference between two related measurements is statistically significantly different from zero. The key word is related. The two sets of scores come from the same participants — measured twice — or from matched pairs. You’re not comparing two separate groups. You’re comparing one group to itself across two conditions or time points.
Here’s the intuitive version. You measure students’ anxiety before an exam and after. Same students, two scores each. The question the paired t-test answers: is the average before-minus-after difference large enough that it’s unlikely to have happened by chance? That’s it. The test works on the difference scores rather than on the raw scores themselves — which is what makes it more powerful than an independent samples t-test when the design is appropriate.
Compare it to an independent samples t-test, where you’d have two completely different groups. With paired data, each participant acts as their own control. That removes a lot of noise — individual differences in baseline levels cancel out when you subtract. The result is a more sensitive test, which is why researchers prefer a paired design when the study allows it.
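To make the sensitivity gain concrete, here is a minimal sketch that computes both statistics on the same ten before/after stress scores used in the worked example later in this guide. The independent-samples version assumes the pooled-variance form with equal group sizes; it is shown only for contrast, since these data are paired.

```python
import math
import statistics

# Stress scores for ten participants (same values as the worked example in this guide).
before = [72, 85, 68, 90, 76, 81, 70, 88, 74, 79]
after = [65, 78, 60, 82, 71, 70, 65, 80, 68, 74]
n = len(before)

# Paired statistic: built from the within-person differences.
diffs = [b - a for b, a in zip(before, after)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Independent-samples statistic (pooled variance, equal group sizes).
# It ignores the pairing, so between-person spread stays in the denominator.
pooled_var = (statistics.variance(before) + statistics.variance(after)) / 2
se_independent = math.sqrt(pooled_var * (2 / n))
t_independent = (statistics.mean(before) - statistics.mean(after)) / se_independent

print(f"paired t = {t_paired:.2f}, independent t = {t_independent:.2f}")
```

On these data the paired statistic comes out near 11.7 and the independent one near 2.1. The mean difference is the same in both; only the noise term changes.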
When to Use It — and When Another Test Fits Better
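Getting the test selection right matters. Using the wrong test is a methodological error your instructor will catch. The paired t-test fits two kinds of design, repeated measures and matched pairs, and in both cases two further conditions must hold: a continuous dependent variable and a single mean difference to test.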
Repeated Measures
The same participants are measured twice — before and after an intervention, or under two different conditions. Pre-test/post-test designs. Within-subjects experiments.
Matched Pairs
Two different participants are deliberately matched on relevant variables (age, IQ, baseline score) and then assigned to different conditions. One score per pair per condition.
Continuous DV
The outcome variable needs to be measured on an interval or ratio scale — something like test scores, reaction times, blood pressure, weight, or rating scales treated as continuous.
One Mean Difference
You’re testing one before-after comparison. If you have three or more time points, you need repeated measures ANOVA instead — running multiple paired t-tests inflates your Type I error rate.
| Your Design | Correct Test |
|---|---|
| Same participants measured before and after | Paired sample t-test ✓ |
| Two independent groups (different people) | Independent samples t-test |
| One group compared to a known population value | One-sample t-test |
| Same participants measured at three or more time points | Repeated measures ANOVA |
| Paired data but the normality assumption is violated and n is small | Wilcoxon signed-rank test (non-parametric alternative) |
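The table above can be read as a small decision rule. The sketch below is illustrative only: `choose_test` and its parameter names are hypothetical, and it covers just the five rows of the table, not every design you might meet.

```python
def choose_test(design: str, time_points: int = 2,
                differences_normal: bool = True,
                small_sample: bool = False) -> str:
    """Map a study design to the test named in the selection table (illustrative only)."""
    if design == "one_sample":
        return "One-sample t-test"
    if design == "independent":
        return "Independent samples t-test"
    if design != "paired":
        raise ValueError(f"unknown design: {design!r}")
    # From here on the same (or matched) participants supply every score.
    if time_points >= 3:
        return "Repeated measures ANOVA"
    if not differences_normal and small_sample:
        return "Wilcoxon signed-rank test"
    return "Paired sample t-test"


print(choose_test("paired"))
print(choose_test("paired", time_points=3))
print(choose_test("paired", differences_normal=False, small_sample=True))
```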
Assumptions You Must Check — Not Optional
Every inferential test rests on assumptions. The paired t-test has three. Failing to check them — or acknowledge them — is a common reason statistics assignments lose marks. Your write-up needs to show you tested them.
Normality of Difference Scores
The distribution that needs to be normal is not the raw scores — it’s the differences
The paired t-test assumes that the difference scores (d = X₁ − X₂ for each participant) are approximately normally distributed in the population. This is a point many students get wrong — they test normality on the raw scores rather than on the computed differences. How to check it: run a Shapiro-Wilk test on the difference scores (preferred when n < 50), or use a Q-Q plot visually. With larger samples (roughly n > 30), the Central Limit Theorem means the sampling distribution of the mean difference will be approximately normal regardless — so the assumption is most critical with small samples. In SPSS, the normality test on difference scores is not automatic; you need to compute the difference variable first and then test it.
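As a rough illustration of the Q-Q check on difference scores, the sketch below builds the plot coordinates with Python's standard library. The plotting positions (i − 0.5)/n are one common convention and an assumption here, and the correlation of the points is only an informal summary of how straight the line is, not a substitute for Shapiro-Wilk. The data are the stress scores from the worked example later in this guide.

```python
import math
from statistics import NormalDist, mean


def qq_points(differences):
    """Return (theoretical normal quantile, observed difference) pairs for a Q-Q plot."""
    ordered = sorted(differences)
    n = len(ordered)
    # Plotting positions (i - 0.5) / n: one common convention, others exist.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_correlation(points):
    """Pearson correlation of the Q-Q points; values near 1 suggest a roughly straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


before = [72, 85, 68, 90, 76, 81, 70, 88, 74, 79]
after = [65, 78, 60, 82, 71, 70, 65, 80, 68, 74]
diffs = [b - a for b, a in zip(before, after)]  # test the differences, not the raw scores

points = qq_points(diffs)
print(f"Q-Q correlation: {qq_correlation(points):.2f}")
```

Plot the pairs as a scatter plot; points that hug a straight line are consistent with approximately normal differences.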
Dependent (Paired) Observations
The data structure must actually be paired — this is the design requirement, not just a statistical check
Each pair of scores must come from the same participant or matched unit. The scores within each pair are related; scores across pairs are independent. This is a design assumption — it can’t be fixed after data collection. If your two sets of scores come from different, unrelated participants, you don’t have paired data and need an independent samples t-test instead. A common error is treating paired data as independent samples, which produces incorrect results and loses the efficiency gain of the paired design.
Continuous Measurement Scale
The dependent variable must be interval or ratio level
The outcome variable needs to be measured at the interval or ratio level — test scores, time, blood pressure, weight, physiological measures. Ordinal data (like Likert scale items) is technically not appropriate for a paired t-test, though in practice many researchers use it with Likert scales treated as pseudo-interval. If your instructor is strict about this, use a Wilcoxon signed-rank test for ordinal outcomes. If your dependent variable is categorical, you need a completely different test (McNemar’s test for paired binary data).
What to Write When Assumptions Are Violated
If your Shapiro-Wilk comes back significant (p < .05), the normality assumption is violated. Your options: report the violation and note that the t-test is robust to mild violations of normality, especially with n > 30; or switch to the Wilcoxon signed-rank test. Either decision is defensible — just document it. Saying nothing about assumptions in your write-up is what costs marks.
The Formula — What Every Part Actually Means
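There’s one formula. It looks like most t-test formulas because it shares the same logic: a signal divided by noise. The signal is the mean difference. The noise is how much those differences vary, adjusted for sample size.
t = d̄ ÷ (s_d ÷ √n)
Here d̄ is the mean of the difference scores, s_d is the standard deviation of the difference scores, and n is the number of pairs.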
The denominator — s_d ÷ √n — is called the standard error of the mean difference. It tells you how much the sample mean difference would vary across repeated samples of the same size. A large t-statistic means the mean difference is large relative to that variability. That’s evidence against the null hypothesis that the true mean difference is zero.
Core principle of paired t-test logic: the paired t-test is essentially a one-sample t-test run on the difference scores. Once you compute d for each participant, you’re asking whether the mean of those d values is significantly different from zero.
Step-by-Step Calculation: A Full Worked Example
Here’s a clean example. A researcher measures participants’ stress scores before (X₁) and after (X₂) a mindfulness intervention. Lower scores mean lower stress. Ten participants, two measurements each.
| Participant | Before (X₁) | After (X₂) | d = X₁ − X₂ | d² |
|---|---|---|---|---|
| 1 | 72 | 65 | 7 | 49 |
| 2 | 85 | 78 | 7 | 49 |
| 3 | 68 | 60 | 8 | 64 |
| 4 | 90 | 82 | 8 | 64 |
| 5 | 76 | 71 | 5 | 25 |
| 6 | 81 | 70 | 11 | 121 |
| 7 | 70 | 65 | 5 | 25 |
| 8 | 88 | 80 | 8 | 64 |
| 9 | 74 | 68 | 6 | 36 |
| 10 | 79 | 74 | 5 | 25 |
| Totals | — | — | Σd = 70 | Σd² = 522 |
Calculate the difference score (d) for each pair
Subtract X₂ from X₁ for every participant. Be consistent — always subtract in the same direction. The sign matters. A positive d means the before score was higher; negative means it went up after the intervention. In this example all differences are positive, meaning stress dropped for every participant.
From the table: d values are 7, 7, 8, 8, 5, 11, 5, 8, 6, 5
Calculate the mean difference (d̄)
Sum all the d values and divide by n. This is the average before-minus-after difference across all participants.
d̄ = 70 ÷ 10 = 7.0
Calculate the standard deviation of the differences (s_d)
This requires the sum of squared differences (Σd²) and the square of the sum of differences, (Σd)². The computational formula works directly from the column totals, so for hand calculation it is faster than the definitional formula and avoids rounding each individual deviation.
s_d = √[ (Σd² − (Σd)²/n) ÷ (n − 1) ]
= √[ (522 − (70)²/10) ÷ (10 − 1) ]
= √[ (522 − 4900/10) ÷ 9 ]
= √[ (522 − 490) ÷ 9 ]
= √[ 32 ÷ 9 ]
= √3.556
= 1.886
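As a quick check on the arithmetic, this short sketch computes s_d both ways, from the column totals (computational formula) and from squared deviations around the mean (definitional formula). The variable names are illustrative.

```python
import math

diffs = [7, 7, 8, 8, 5, 11, 5, 8, 6, 5]  # d column from the worked example
n = len(diffs)

# Computational formula: uses the column totals, sum of d and sum of d squared.
sum_d = sum(diffs)
sum_d_sq = sum(d * d for d in diffs)
s_d_computational = math.sqrt((sum_d_sq - sum_d ** 2 / n) / (n - 1))

# Definitional formula: squared deviations from the mean difference.
mean_d = sum_d / n
s_d_definitional = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))

print(round(s_d_computational, 3), round(s_d_definitional, 3))
```

Both routes give 1.886 here, which is a useful sanity check when you work the table by hand.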
Calculate the standard error of the mean difference
Divide the standard deviation of differences by the square root of n. This is the denominator of the t formula.
SE = 1.886 ÷ √10
SE = 1.886 ÷ 3.162
SE = 0.596
Calculate the t statistic
Divide the mean difference by the standard error. The result is your t value.
t = 7.0 ÷ 0.596
t = 11.74
Determine degrees of freedom
For a paired t-test, df = n − 1. With 10 pairs, df = 9. You use df to look up the critical value in a t-distribution table or to interpret the p-value from software.
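The six steps above can be sketched as a single function. This is a minimal version using only Python's standard library; `paired_t` and `PairedTResult` are illustrative names, not part of any statistics package.

```python
import math
import statistics
from typing import NamedTuple, Sequence


class PairedTResult(NamedTuple):
    mean_diff: float
    sd_diff: float
    se: float
    t: float
    df: int


def paired_t(x1: Sequence[float], x2: Sequence[float]) -> PairedTResult:
    """Paired t statistic for x1 minus x2, following the hand-calculation steps."""
    if len(x1) != len(x2):
        raise ValueError("paired data need the same number of scores in each condition")
    if len(x1) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [a - b for a, b in zip(x1, x2)]   # step 1: difference scores
    mean_diff = statistics.mean(diffs)         # step 2: mean difference
    sd_diff = statistics.stdev(diffs)          # step 3: SD of differences (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical, so t is undefined")
    se = sd_diff / math.sqrt(len(diffs))       # step 4: standard error
    t = mean_diff / se                         # step 5: t statistic
    return PairedTResult(mean_diff, sd_diff, se, t, len(diffs) - 1)  # step 6: df


before = [72, 85, 68, 90, 76, 81, 70, 88, 74, 79]
after = [65, 78, 60, 82, 71, 70, 65, 80, 68, 74]
result = paired_t(before, after)
print(f"t({result.df}) = {result.t:.2f}, SE = {result.se:.3f}")
```

Run on the table data, this reproduces the hand calculation: t(9) = 11.74 with a standard error of 0.596.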
Critical Value and Statistical Decision
Once you have t and df, you need to decide whether to reject the null hypothesis. The null here is H₀: μ_d = 0 — the true mean difference in the population is zero (the intervention had no effect). The alternative is typically H₁: μ_d ≠ 0 for a two-tailed test.
With df = 9 and α = .05, the critical value for a two-tailed test from a t-distribution table is ±2.262. Your calculated t = 11.74. Since 11.74 > 2.262, you reject the null hypothesis. The mean stress score was significantly lower after the mindfulness intervention than before.
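If you want an approximate p-value rather than a table lookup, one option is to integrate the t density numerically. The sketch below uses Simpson's rule with the standard library only; the function names are illustrative, and statistical software will normally give you this number directly. `math.gamma` overflows for very large df, so treat this as a sketch for classroom-sized samples.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 20000) -> float:
    """Approximate two-tailed p-value using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 2 * (0.5 - central_area))


p_example = two_tailed_p(11.74, 9)
print("p < .001" if p_example < 0.001 else f"p = {p_example:.3f}")
```

For the worked example the result is far below .001, which agrees with the critical-value decision above.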
One-Tailed vs Two-Tailed — Know Which Your Instructor Expects
A two-tailed test asks whether the mean difference is different from zero in either direction. A one-tailed test asks about one direction only, chosen before you see the data (for example, that scores will be lower after the intervention). Most assignments default to two-tailed unless the research hypothesis is directional. When in doubt, use two-tailed: it’s the more conservative choice and is almost always acceptable.
| df | α = .05 (two-tailed) | α = .01 (two-tailed) | α = .05 (one-tailed) |
|---|---|---|---|
| 5 | ±2.571 | ±4.032 | 2.015 |
| 9 | ±2.262 | ±3.250 | 1.833 |
| 14 | ±2.145 | ±2.977 | 1.761 |
| 19 | ±2.093 | ±2.861 | 1.729 |
| 29 | ±2.045 | ±2.756 | 1.699 |
| ∞ | ±1.960 | ±2.576 | 1.645 |
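The decision rule can be sketched as a table lookup. The dictionary below copies only the two-tailed α = .05 column from the table above; `reject_null` is an illustrative name, and a df that is not listed raises an error rather than guessing a value.

```python
# Two-tailed critical values at alpha = .05, copied from the table above.
CRITICAL_T_05_TWO_TAILED = {5: 2.571, 9: 2.262, 14: 2.145, 19: 2.093, 29: 2.045}


def reject_null(t: float, df: int) -> bool:
    """Two-tailed decision at alpha = .05 for the df values listed in the table."""
    if df not in CRITICAL_T_05_TWO_TAILED:
        raise KeyError(f"df = {df} is not in this abbreviated table")
    # Compare the absolute value so negative t statistics are handled too.
    return abs(t) > CRITICAL_T_05_TWO_TAILED[df]


print(reject_null(11.74, 9))   # worked example: True, reject the null
print(reject_null(-1.50, 14))  # hypothetical result: False, fail to reject
```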
Effect Size — Cohen’s d for Paired Designs
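Statistical significance tells you whether a difference this large would be unlikely if the true mean difference were zero. Effect size tells you how big the difference is. A study with 500 participants can find a statistically significant difference that is practically trivial. Cohen’s d gives you the standardized magnitude of the difference. For a paired design, divide the mean difference by the standard deviation of the differences: d = d̄ ÷ s_d. In the worked example, d = 7.0 ÷ 1.886 = 3.71.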
| Cohen’s d Value | Conventional Interpretation | What It Means Practically |
|---|---|---|
| ~0.2 | Small effect | The difference exists but would be hard to notice in real-world observation |
| ~0.5 | Medium effect | Noticeable and practically meaningful — visible to a careful observer |
| ~0.8 | Large effect | A substantial difference — obvious and practically important |
| >1.0 | Very large effect | Dominant — the intervention or difference is highly impactful |
The small, medium, and large benchmarks come from Cohen (1988) and are widely used in psychology, education, and social science; the “very large” row is a common extension rather than one of Cohen’s original labels. Some fields have their own conventions: nursing and medicine often use different thresholds depending on clinical significance. Always include effect size in your write-up; many instructors and journals now require it alongside the p-value.
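A minimal sketch of the effect size calculation, assuming the version of Cohen's d that divides the mean difference by the SD of the differences (sometimes written d_z). The cut-offs in `describe_effect` are an assumption: they turn the approximate benchmarks in the table into hard boundaries purely for illustration.

```python
import statistics


def cohens_d_paired(x1, x2) -> float:
    """Cohen's d for paired data: mean difference divided by SD of the differences."""
    diffs = [a - b for a, b in zip(x1, x2)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def describe_effect(d: float) -> str:
    """Label an effect using the benchmark table (boundaries are illustrative)."""
    size = abs(d)
    if size > 1.0:
        return "very large"
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below the small benchmark"


before = [72, 85, 68, 90, 76, 81, 70, 88, 74, 79]
after = [65, 78, 60, 82, 71, 70, 65, 80, 68, 74]
d = cohens_d_paired(before, after)
print(f"d = {d:.2f} ({describe_effect(d)})")
```

For the worked example this gives d = 3.71, which falls in the “very large” row.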
Running the Paired T-Test in SPSS
Most university statistics courses run analyses in SPSS. The paired t-test is straightforward to run — the steps below get you to the output you need.
Enter your data correctly
In SPSS Data View, your data goes in two columns — one for the before scores, one for the after scores. Each row is one participant. Do not stack the data in one column (that’s for independent samples). Label the variables clearly in Variable View (e.g., “Stress_Before” and “Stress_After”).
Navigate to the test
Go to Analyze → Compare Means → Paired-Samples T Test. Move both variables into the “Paired Variables” box as a pair — Stress_Before in Variable 1 and Stress_After in Variable 2. Click OK.
Read the output — three tables
SPSS gives you three tables. Paired Samples Statistics: means, SDs, and SEs for each variable separately. Paired Samples Correlations: the correlation between the two measurements (informational, not the main result). Paired Samples Test: this is the one you need — it shows the mean difference, SD of differences, SE, t statistic, df, and two-tailed significance (p-value). SPSS reports a 95% confidence interval for the mean difference here too, which is useful for your write-up.
Compute Cohen’s d manually
Older versions of SPSS do not report Cohen’s d for paired t-tests. Take the mean difference and the SD of differences from the Paired Samples Test table and divide: d = mean difference ÷ SD of differences. SPSS version 27 and later can add an effect size table to the output; look for the effect size setting in the Paired-Samples T Test dialog box.
Check normality on the difference scores
To test the normality assumption properly, compute the difference variable first: Transform → Compute Variable, then define a new variable as Stress_Before minus Stress_After. Run Explore on that new variable (Analyze → Descriptive Statistics → Explore), request normality plots with tests under Plots, and check the Shapiro-Wilk result under “Tests of Normality.” A non-significant result (p > .05) means there is no evidence against the normality assumption.
What to Screenshot for Your Assignment
Most instructors want the Paired Samples Test table in your output. Include it in an appendix or paste it into your results section. Clean up the SPSS output formatting before submitting — default SPSS tables are not APA formatted. You’ll need to rebuild the table in Word if APA format is required.
Running the Paired T-Test in Excel
Excel can run a paired t-test quickly. Two ways to do it — a built-in function or the Data Analysis ToolPak.
Method 1: T.TEST Function
One formula, returns the p-value directly
In an empty cell, type: =T.TEST(array1, array2, tails, type)
- array1: the range of your before scores (e.g., A2:A11)
- array2: the range of your after scores (e.g., B2:B11)
- tails: 2 for a two-tailed test, 1 for one-tailed
- type: 1 for paired (dependent)
So for this example: =T.TEST(A2:A11, B2:B11, 2, 1). This returns the p-value directly. It does not give you the t statistic itself. To get t, you’ll need to compute d̄ and s_d separately and apply the formula manually, or use the ToolPak.
Method 2: Data Analysis ToolPak
Returns a full output table including t, df, p-value, and critical value
Enable the ToolPak first if you haven’t: File → Options → Add-ins → Analysis ToolPak → Go → check the box. Then go to Data → Data Analysis → t-Test: Paired Two Sample for Means. Enter Variable 1 Range (before scores), Variable 2 Range (after scores), set Hypothesized Mean Difference to 0, choose Alpha (0.05 typically), and select an output range. The output gives you: means, variances, observations, Pearson correlation, df, t Stat, P(T<=t) one-tail, t Critical one-tail, P(T<=t) two-tail, t Critical two-tail. Everything you need for your write-up is in that table.
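If you want to cross-check the descriptive rows of the ToolPak table, the sketch below recomputes the means, variances, and Pearson correlation for the worked-example data. It assumes the variances are sample variances (n − 1 denominator); the dictionary keys are illustrative labels, not Excel's exact row names.

```python
import math
import statistics

before = [72, 85, 68, 90, 76, 81, 70, 88, 74, 79]
after = [65, 78, 60, 82, 71, 70, 65, 80, 68, 74]


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


summary = {
    "mean_before": statistics.mean(before),
    "mean_after": statistics.mean(after),
    "variance_before": statistics.variance(before),  # sample variance (n - 1)
    "variance_after": statistics.variance(after),
    "observations": len(before),
    "pearson_r": pearson_r(before, after),
}
for label, value in summary.items():
    print(f"{label}: {value:.3f}")
```

The high correlation between the two columns is what makes the paired design so much more sensitive than treating the scores as independent groups.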
APA Write-Up — How to Report Your Results
The APA format for reporting a paired t-test is standardized. Using the numbers from the worked example above:
APA Results Paragraph — Worked Example
APA 7th Edition
A paired samples t-test was conducted to examine whether stress scores differed significantly between pre-intervention and post-intervention time points. The assumption of normality of difference scores was assessed using the Shapiro-Wilk test and was not violated (p = .23). Results indicated that stress scores were significantly lower after the mindfulness intervention (M = 71.30, SD = 7.17) compared to before (M = 78.30, SD = 7.62), t(9) = 11.74, p < .001, d = 3.71, 95% CI [5.65, 8.35]. The effect size was very large.
A paired samples t-test was conducted to [state the research question]. Results indicated that [condition 1] (M = ___, SD = ___) was significantly [higher/lower/different from] [condition 2] (M = ___, SD = ___), t(df) = ___, p = ___, d = ___, 95% CI [lower, upper].
A few specifics to get right:
- Report exact p-values rather than just “p < .05” — use “p = .023” not “p is significant.” Exception: when p < .001, report as “p < .001”
- The degrees of freedom go inside the parentheses after t — written as t(df), not t with a subscript
- Include means and SDs for both conditions in the text or in a table
- Report effect size (Cohen’s d) alongside the p-value — APA 7 strongly recommends this
- Include the 95% confidence interval for the mean difference
- State whether assumptions were checked and whether they were met
- Use past tense throughout the results section
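The formatting rules above can be sketched as a small helper that assembles the statistics portion of the sentence. The function names are illustrative, the inputs are the hand-calculated values from the worked example, and the p-value passed in is a placeholder standing for any value below .001.

```python
import math


def format_p(p: float) -> str:
    """APA style: exact p to three decimals with no leading zero, or 'p < .001'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_paired_t(t, df, p, d, ci_low, ci_high) -> str:
    """Assemble the statistics string that follows the means and SDs."""
    return (f"t({df}) = {t:.2f}, {format_p(p)}, d = {d:.2f}, "
            f"95% CI [{ci_low:.2f}, {ci_high:.2f}]")


# Values from the worked example.
mean_diff, sd_diff, n = 7.0, 1.886, 10
se = sd_diff / math.sqrt(n)
t_crit = 2.262  # two-tailed critical value for df = 9, alpha = .05
ci_low, ci_high = mean_diff - t_crit * se, mean_diff + t_crit * se

# 1e-6 is a placeholder for "some p below .001", not a computed value.
line = apa_paired_t(mean_diff / se, n - 1, 1e-6, mean_diff / sd_diff, ci_low, ci_high)
print(line)
```

The confidence interval here is the mean difference plus or minus the critical t value times the standard error, which matches the interval in the example paragraph.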
Pulling It Together
The paired t-test has one job: test whether the mean difference between two related measurements is statistically significantly different from zero. The formula is straightforward once you work through it on a real dataset. Compute the differences. Find the mean and SD of those differences. Plug into t = d̄ ÷ (s_d ÷ √n). Compare to a critical value or interpret the p-value. Report effect size alongside significance.
The parts that actually trip students up are usually the assumptions (especially checking normality on the differences, not the raw scores), knowing when the paired test is appropriate versus the independent samples version, and writing the APA results paragraph in the right format. Those are learnable. Work through one example by hand and the logic clicks into place.