
Reading the Evidence

A crash course in the numbers behind the insights. You don't need a statistics degree — you need to know what to look for and what it means.

The one number that matters most: effect size

Most people have heard that a study found something "statistically significant." That tells you the result probably isn't random noise. But it doesn't tell you how much it matters.

Effect size does. It's the answer to: "OK, it works — but how much?"

Throughout these insights, we use Cohen's d as our common unit. Think of it as a universal ruler for comparing results across completely different studies — sleep research, trauma therapy, diet interventions, meditation. Cohen's d translates all of them into the same scale.

How to read Cohen's d
d = 0.2 (small effect): Real but hard to notice without measuring. Like the difference between 5'9" and 5'10".
d = 0.5 (medium effect): Noticeable. Like the height difference between 14- and 18-year-old girls.
d = 0.8 (large effect): Obvious. You'd see it without a ruler.
d = 1.0+ (very large): Dramatic. The kind of difference that reorganizes how you think about something.

A negative d (like d = −0.41) means the effect runs in the opposite direction — a treatment reduces something (depression, inflammation, risk). The magnitude reads on the same scale: −0.41 is a moderate reduction.

When we say dietary pattern has d = 1.85 or ACE burden has d = 1.07, those are very large effects. When exercise shows d = 0.42 for general cognition, that's a real but more modest effect. The numbers let you compare directly: the dietary pattern finding is roughly 4× stronger than the exercise-cognition finding.
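
The arithmetic behind d is worth seeing once. Here's a minimal Python sketch (the function is ours, and the example numbers are invented): it divides the difference between two group means by their pooled standard deviation.

```python
import math

def cohens_d(mean_treat, mean_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Standardized mean difference: group gap divided by pooled SD."""
    pooled_sd = math.sqrt(
        ((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2)
        / (n_treat + n_ctrl - 2)
    )
    return (mean_treat - mean_ctrl) / pooled_sd

# Hypothetical: treatment group scores 105 vs. control at 100, both SD = 10.
print(round(cohens_d(105, 100, 10, 10, 50, 50), 2))  # 0.5, a medium effect
```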

Where these numbers come from: study types

Not all evidence is created equal. A single study with 20 people tells you less than a synthesis of 50 studies with 50,000 people. Here's the hierarchy we use, from strongest to weakest:

Meta-analysis: A study of studies. Combines results from dozens or hundreds of individual experiments into a single pooled estimate. Why it matters: it's the closest thing science has to a final answer. When we cite "meta-analysis, 47 RCTs, n = 258,279," that's 47 separate experiments with a quarter-million participants, mathematically combined.

RCT (Randomized Controlled Trial): Participants are randomly assigned to a treatment or a control group. Why it matters: it's the gold standard for testing whether something causes an effect. Random assignment rules out confounders; if the treatment group improves and the control group doesn't, the treatment likely caused it.

Longitudinal study: Follows the same people over years or decades, measuring how things change. Why it matters: it can reveal long-term patterns that short studies miss. The Harvard Study of Adult Development (80+ years) is the most famous example.

Cross-sectional study: Measures a group of people at one point in time. Why it matters: it can show associations but not causation. "People who exercise more have lower depression" doesn't prove exercise reduces depression — maybe less-depressed people exercise more.

When we quote an effect size, we tell you where it came from. A d = 0.5 from a meta-analysis of 30 RCTs is far more trustworthy than a d = 0.5 from a single cross-sectional study.
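
The "mathematically combined" step is, in its simplest fixed-effect form, an inverse-variance weighted average: precise studies count more. Here's a sketch with invented numbers (real syntheses often use more elaborate random-effects models):

```python
def pooled_effect(effects, variances):
    """Fixed-effect meta-analysis: weight each study's d by 1/variance."""
    weights = [1 / v for v in variances]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)

# Three hypothetical studies; the large, precise one (variance 0.01) dominates.
print(round(pooled_effect([0.8, 0.3, 0.45], [0.20, 0.05, 0.01]), 2))  # 0.44
```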

Other numbers you'll see

Odds Ratio (OR) and Hazard Ratio (HR)

These measure risk. An OR of 1.50 for social connection and survival means the odds of being alive at follow-up are 50% higher for well-connected people than for isolated people. An HR of 0.76 for purpose and inactivity means people with strong purpose have a 24% lower rate of becoming inactive over the follow-up period.

We convert these to Cohen's d for comparison. The conversion isn't perfect, but it lets us put risk-based findings on the same scale as effect-size findings.
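
The most common conversion is the logit method, which scales the log of the odds ratio by √3/π. Treat this sketch as illustrative; the right conversion depends on the study's design:

```python
import math

def odds_ratio_to_d(odds_ratio):
    """Logit-method conversion from an odds ratio to Cohen's d."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

print(round(odds_ratio_to_d(1.50), 2))  # ~0.22, a small-to-medium effect
print(round(odds_ratio_to_d(0.76), 2))  # ~-0.15, a small protective effect
```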

Correlation (r)

Measures how strongly two things move together. Ranges from −1 (perfect inverse relationship) to +1 (perfect direct relationship). Zero means no relationship.

When the contemplative practice research reports r = 0.56 between experiential avoidance and depression (Akbari 2022, 441 studies), that's a strong relationship — one of the strongest in psychology.
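
Correlations can be put on the d scale too, under an equal-groups assumption: d = 2r / √(1 − r²). A short sketch:

```python
import math

def r_to_d(r):
    """Convert a correlation to Cohen's d (assumes two equal-sized groups)."""
    return 2 * r / math.sqrt(1 - r**2)

print(round(r_to_d(0.56), 2))  # ~1.35: a strong r is a very large d
```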

Sample size (N or n)

How many people were in the study. Bigger samples generally yield more trustworthy estimates. When we write "n = 136,000," it means the finding is based on 136,000 participants — that's not a pilot study, that's a population-scale result.
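
The reason bigger is better fits in one formula: the uncertainty in an estimated average shrinks with the square root of the sample size. A quick illustration (the SD of 10 is arbitrary):

```python
import math

def standard_error(sd, n):
    """Uncertainty of an estimated mean: shrinks as n grows."""
    return sd / math.sqrt(n)

print(round(standard_error(10, 20), 2))       # 2.24  (a pilot study)
print(round(standard_error(10, 136_000), 3))  # 0.027 (population-scale)
```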

Key concepts in the insights

Threshold

A point below which you're measurably impaired, and above which more doesn't help much. Sleep has one around 7 hours — below it, cognitive function degrades sharply. Above it, an 8th or 9th hour adds little. Think of it like a cliff edge: you're either on solid ground or you're falling.
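
If code reads more clearly than cliff metaphors, here's a toy threshold curve. The slope and cutoff are invented for illustration; this is not our actual scoring model:

```python
def sleep_benefit(hours, threshold=7.0):
    """Toy model: steep penalty below the threshold, flat plateau above."""
    if hours >= threshold:
        return 1.0  # on solid ground; extra hours add nothing here
    return max(0.0, 1.0 - 0.4 * (threshold - hours))  # sliding off the edge

print(round(sleep_benefit(8.0), 2))  # 1.0: the 8th hour doesn't help
print(round(sleep_benefit(5.0), 2))  # 0.2: two hours short is a steep cost
```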

Diminishing returns

Each additional unit of investment produces less benefit than the last. The first hour of weekly exercise produces more health benefit than the tenth hour. The curve flattens. You never reach zero return, but you get less and less for each additional unit.
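
A logarithmic curve is the classic toy model of diminishing returns. This sketch is illustrative, not a fitted dose-response curve:

```python
import math

def benefit(units):
    """Toy diminishing-returns curve: each unit adds less than the last."""
    return math.log1p(units)

print(round(benefit(1) - benefit(0), 2))   # 0.69: the first hour's gain
print(round(benefit(10) - benefit(9), 2))  # 0.1: the tenth hour's gain
```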

Gating

A foundational need that, when unmet, suppresses the benefit of everything above it. If you're severely sleep-deprived, a meditation retreat can't deliver its full benefit — not because meditation doesn't work, but because the biological infrastructure it depends on is compromised. The gate is partially closed.

In our model, this isn't binary. It's a gradient — a partially met foundation partially opens the gate. But the implication is clear: fixing what's below threshold has outsized impact because it unlocks everything above it.
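
One simple way to write a gradient gate is as a multiplier between 0 and 1. The multiplier form here illustrates the idea; it is not the model's actual functional form:

```python
def gated_benefit(foundation_score, upstream_benefit):
    """Toy gradient gate: a foundation met at 30% passes 30% through."""
    gate = max(0.0, min(1.0, foundation_score))  # clamp to [0, 1]
    return gate * upstream_benefit

print(gated_benefit(0.3, 1.0))  # 0.3: severe sleep debt mutes the retreat
print(gated_benefit(0.9, 1.0))  # 0.9: mostly-met foundation, gate nearly open
```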

Synergy

When two things together produce more benefit than the sum of each alone. Sleep and circadian alignment are synergistic: 8 hours of sleep at the right time of day is worth more than you'd predict by adding up the benefit of 8 hours at the wrong time and the benefit of perfect circadian timing on only 5 hours of sleep. The combination exceeds the sum.
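
In model terms, synergy is an interaction term: a bonus proportional to the product of the two inputs, so it only pays off when both are present. A toy sketch with invented weights:

```python
def combined_benefit(sleep, circadian, synergy_weight=0.5):
    """Toy interaction model: sum of parts plus a product-based bonus."""
    return sleep + circadian + synergy_weight * (sleep * circadian)

print(combined_benefit(1.0, 1.0))  # 2.5: more than the plain sum of 2.0
print(combined_benefit(1.0, 0.0))  # 1.0: no bonus without the partner
```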

Hierarchy levels

We organize dimensions of human flourishing into three levels:

How to read our provenance labels

Throughout our computed analysis, every number carries a label showing where it came from:

[DATA]: Directly reported in a cited study. The number came straight from published research.
[DERIVED]: Calculated from reported data using standard formulas (for example, converting an odds ratio to Cohen's d). The source data is cited; the conversion is deterministic math.
[ESTIMATED]: Inferred from related evidence but not directly measured. Treat with appropriate caution.
[ASSUMED]: No extractable effect size in the source text, so we used a default value (d = 0.3). This is a placeholder, not evidence. 75% of our dimensions currently carry this label.
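
In code, a provenance label is just a field that travels with every number, so it can't get separated from the value it qualifies. A minimal sketch (the class and field names are illustrative, not our actual schema):

```python
from dataclasses import dataclass

DEFAULT_ASSUMED_D = 0.3  # placeholder when no effect size is extractable

@dataclass
class EffectSize:
    d: float
    provenance: str  # "DATA" | "DERIVED" | "ESTIMATED" | "ASSUMED"
    source: str

dietary = EffectSize(d=1.85, provenance="DATA", source="cited meta-analysis")
fallback = EffectSize(d=DEFAULT_ASSUMED_D, provenance="ASSUMED",
                      source="no extractable effect size in source text")
```
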
Why this matters

Most research summaries don't tell you which claims are grounded in data and which are inferred. We label everything because knowing what you don't know is as important as knowing what you do. When 75% of dimensions default to assumed values, that's not a failure — it's an honest map of where the evidence is strong and where it's still thin.

The bottom line

You don't need to memorize any of this. When you're reading the insights, just remember:

The commitment of this project is to let the evidence speak — and to be transparent about what it says, what it doesn't, and where we got it wrong the first time.