Amsterdam UMC Systems Science SIG · 21 May 2026 · 09:30 CET
Causal mapping is not system dynamics
Coding what people say, then deciding what it means
Steve Powell · Causal Map Ltd · hello@causalmap.app
app.causalmap.app · free, unlimited public projects · garden.causalmap.app
Causal mapping is a 50-year tradition used in many disciplines including cognitive psychology, ecology, decision science, management and evaluation.
Causal Map Ltd provides the Causal Map app for evaluation and qualitative research.
Its niche: coding causal claims in text, closer to CAQDAS/NVivo than to a diagramming tool.
The app: app.causalmap.app. Free for unlimited public projects!
Boxes and arrows are a shared visual language.
Feedback, mediation and indirect effects matter in many traditions.
The common move: messy reality into useful structure.
The interesting question is what the arrows mean.
Reading 1: a fact about the world
If X goes up, Y goes up.
A claim about the system itself. Backed by measurement, modelling, simulation.
→ System dynamics, CLDs, FCMs
Reading 2: a claim with a source
“This person said X influenced Y, in this context, in this quote.”
A claim about somebody’s account of the world. Backed by quoting them.
→ Causal Map
Many mapping traditions slide between the two. This deck keeps them apart, then bridges them.
In this text-coding tradition, a causal map first records what people say causes what.
A CLD is usually a hypothesis about the system. This map shows the distribution of claims across people.
P and Q’s separate beliefs, then combined. Each link still carries source metadata.
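As a minimal sketch of what "each link still carries source metadata" can look like in data terms: the Link class, its field names and the example quotes below are illustrative assumptions, not the Causal Map app's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    """One coded causal claim. Field names are hypothetical."""
    cause: str
    effect: str
    source_id: str
    quote: str

# Hypothetical claims from two sources, P and Q.
p_links = [
    Link("Training", "Better yields", "P", "After the training my yields improved."),
]
q_links = [
    Link("Training", "Better yields", "Q", "The training helped me grow more."),
    Link("Better yields", "More income", "Q", "With more to sell I earned more."),
]

# Combining maps is concatenation: no claim is merged away or averaged.
combined = p_links + q_links

# The combined map can still report who said what.
sources_per_link = {}
for link in combined:
    sources_per_link.setdefault((link.cause, link.effect), set()).add(link.source_id)
```

Here sources_per_link shows that both P and Q made the Training -> Better yields claim, while only Q made the second one.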
A factor is a short proposition, close to what someone said.
Examples
“Not enough money in the household”
“Did not take a holiday this year”
“Civil society coalitions gained influence in decision-making bodies”
Not “money” as a variable from low to high.
What this minimalist style does not code on the link itself:
People rarely state these explicitly. Coders rarely agree on them from text. Minimalism is honesty about what the data can support.
Bare links plus good labels let analysts:
It works fully by hand. AI makes it faster at scale.
Hierarchical labels let a dense map zoom out without deleting detail.
Convention
General; specific
New intervention; midwife training
Healthy behaviour; hand washing
Zoom to level 1:
New intervention -> Healthy behaviour
Use a hierarchy only when the parent is a valid causal factor. A; B means B can be reported as evidence for A.
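The zoom step can be sketched as simple label truncation, assuming labels follow the "General; specific" convention with a semicolon separator. The function name zoom is hypothetical.

```python
def zoom(label: str, level: int = 1, sep: str = ";") -> str:
    """Keep only the first `level` parts of a hierarchical label."""
    parts = [part.strip() for part in label.split(sep)]
    return "; ".join(parts[:level])

links = [
    ("New intervention; midwife training", "Healthy behaviour; hand washing"),
]

# Rewrite both ends of every link to its level-1 parent.
zoomed = [(zoom(cause), zoom(effect)) for cause, effect in links]
```

The detailed labels stay in the underlying data; zooming only changes the view.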
Opposites coding handles pairs like:
Employed and ~Employed
Good health and ~Good health
Eating vegetables and ~Eating vegetables
The ~ marks the opposite pole. Combining opposites rewrites both poles to one canonical label, while keeping polarity metadata.
Why not just plus/minus links?
A link can be flipped at the cause end, the effect end, or both. Treating this as one average positive or negative strength loses information.
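A rough sketch of combining opposites, assuming the ~ prefix convention. The function and field names are hypothetical, and the sketch rewrites every ~ label without checking that both poles actually occur in the data.

```python
def canonical(label: str) -> tuple[str, bool]:
    """Return the canonical label and whether the original was the opposite pole."""
    label = label.strip()
    if label.startswith("~"):
        return label[1:].strip(), True
    return label, False

def combine_opposites(cause: str, effect: str) -> dict:
    """Rewrite both ends to canonical labels, keeping a flip flag for each end."""
    cause_label, cause_flipped = canonical(cause)
    effect_label, effect_flipped = canonical(effect)
    return {
        "cause": cause_label,
        "effect": effect_label,
        "cause_flipped": cause_flipped,
        "effect_flipped": effect_flipped,
    }

link = combine_opposites("~Employed", "~Good health")
```

Keeping two separate flags is the point: a single plus/minus sign on the link could not distinguish "flipped at the cause end" from "flipped at the effect end".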
Sentiment is different
Sentiment is link metadata, usually -1, 0, 1. It says whether the claim is positive, neutral or negative in context. It is not the same as an opposite-coded factor.
Opposites preserve meaning across labels. Sentiment marks the tone or valence of a particular claim.
Stakeholder cognitions
What people think causes what. The map as a record of distributed belief.
Facts about the world
What really influences what. The map as the start of an argument.
Causal mapping has often muddled these two functions. The important move is to keep the leap visible and separate.
Moving from claims to conclusions is a sequence of checks. Only the last is required; most projects use several.
Coding rarely sits inside a fixed codebook from the start.
Revisit the codebook as coding proceeds. Merge near-duplicates, split overloaded factors, lift to a useful level of abstraction.
Not a one-off setup step. Codebook decisions shape every later moment.
Quality questions: are factors at a consistent grain? Do labels mean the same thing across sources? Have AI-coded factors drifted from the intended vocabulary?
Code the smallest defensible claim: this source says X influenced Y.
First quality move: keep the unit small, traceable and inspectable.
This is why minimalist coding matters. Richer causal language belongs later, not in the raw link.
Manual or AI, the check is the same: is there a causal claim here, and are the endpoints right?
Check each raw claim before it feeds the map.
Mark what matters: tags, conviction, source reliability, custom columns.
First rule: don’t hide doubt. Mark it.
These link-level checks flow into filters and bundle summaries downstream.
A bundle is all claims X→Y across all sources. Either collapse it into one assessed link, or skip it if evidence is too thin.
The workflow:
The app requires a written rubric before assessed links. Deliberately.
Typical scale: 1,000 raw claims → 30 bundles → 25 assessed links
Unassessed view: many raw claims per bundle
Assessed view: one vouched link per bundle
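The bundling step that precedes assessment could be sketched as below. The dictionary keys and the min_sources threshold are assumptions for illustration; deciding whether a bundle becomes an assessed link remains a human judgement against the written rubric.

```python
from collections import defaultdict

def bundle(claims: list[dict]) -> dict:
    """Group raw claims by their (cause, effect) pair."""
    bundles = defaultdict(list)
    for claim in claims:
        bundles[(claim["cause"], claim["effect"])].append(claim)
    return dict(bundles)

def summarise(bundles: dict, min_sources: int = 2) -> list[dict]:
    """One summary row per bundle; `thin` flags bundles with too few distinct sources."""
    rows = []
    for (cause, effect), claims in bundles.items():
        sources = {claim["source"] for claim in claims}
        rows.append({
            "cause": cause,
            "effect": effect,
            "claims": len(claims),
            "sources": len(sources),
            "thin": len(sources) < min_sources,
        })
    return rows

# Hypothetical raw claims.
claims = [
    {"cause": "X", "effect": "Y", "source": "s1"},
    {"cause": "X", "effect": "Y", "source": "s2"},
    {"cause": "X", "effect": "Y", "source": "s2"},
    {"cause": "Y", "effect": "Z", "source": "s1"},
]
rows = summarise(bundle(claims))
```

The summary only prepares the evidence for review: a bundle flagged as thin is a candidate for skipping, not an automatic rejection.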
A→B plus B→C does not automatically mean A→C. Contexts may not line up.
Path tracing: keeps links lying on some path from A to C.
Source tracing: keeps only sources whose own account runs all the way from A to C.
The transitivity trap: fragments from different sources can create a pathway nobody actually told.
Source tracing asks: did anyone tell the whole story from intervention to outcome?
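The difference between the two tracing modes can be sketched as follows. This is an illustrative implementation under simple assumptions (links as dictionaries with cause, effect and source keys), not the app's own algorithm.

```python
from collections import defaultdict

def reachable(start, edges):
    """All nodes reachable from `start` along directed edges, including `start`."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def path_trace(links, start, end):
    """Keep links lying on some path from start to end, ignoring who said them."""
    edges = [(l["cause"], l["effect"]) for l in links]
    forward = reachable(start, edges)
    backward = reachable(end, [(v, u) for u, v in edges])
    return [l for l in links if l["cause"] in forward and l["effect"] in backward]

def source_trace(links, start, end):
    """Keep only links from sources whose own account runs from start to end."""
    by_source = defaultdict(list)
    for l in links:
        by_source[l["source"]].append(l)
    kept = []
    for source_links in by_source.values():
        kept.extend(path_trace(source_links, start, end))
    return kept

# S1 and S2 each tell half the story; only S3 tells all of it.
links = [
    {"cause": "A", "effect": "B", "source": "S1"},
    {"cause": "B", "effect": "C", "source": "S2"},
    {"cause": "A", "effect": "B", "source": "S3"},
    {"cause": "B", "effect": "C", "source": "S3"},
]
pooled = path_trace(links, "A", "C")
told_in_full = source_trace(links, "A", "C")
```

Path tracing keeps all four links, including the S1 and S2 fragments that only form a pathway when pooled. Source tracing keeps only the two links from S3.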
Once a pathway is credible, ask:
This moves from “is there evidence?” to “what should be concluded?”
Counts and maps help. The analyst weighs significance, context and rivals.
The final judgement is about the whole account.
Having looked across claims, bundles, pathways, gaps and rivals: is this worth standing behind?
A vignette can draft this whole-map judgement. You check it.
Example vignette output
Based on the data, the quotes provided by individual sources largely represent coherent causal stories explaining the journey from increased knowledge to food consumption quantity.
Knowledge leading to production and consumption
One source links lessons from the organisation to increased output, then links that output to better household food consumption.
“I now produce a lot and I have a good income from my products. I now produce and sell and grow well.”
- MNX-6
13 countries · 692 outcomes · 5,430 claims
INTRAC was MEL partner for a 13-country civil-society programme. Outcome Harvesting gave rich data, but not the big picture.
“Couldn’t understand the big picture, for the whole programme but also for specific countries too.”
- Alastair Spray, INTRAC
Zero-shot AI coding: no pre-set codebook. Outputs: one programme map, 13 country maps, filtered views by objective and time period.
Burkina Faso, country view
Same database, filtered to Burkina Faso.
Advocacy, training and public mobilisation feed action on rights violations.
Factor and link sizes = citation counts. Arrowhead colour = sentiment (blue positive, red negative).
Every link traces to a quote. Nothing in the visual is inferred.
Coding is half the job. The database under the map is for questions.
Stack filters and you can answer most evaluation questions worth asking.
The map you see is one view. Behind it sits a queryable database of links, sources, quotes, tags and metadata.
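Stacking filters over the links table might look like this in outline. The column names, the helper functions where and apply_filters, and the example rows are all hypothetical.

```python
from typing import Callable

Filter = Callable[[dict], bool]

def where(field: str, value) -> Filter:
    """Build a filter that keeps links whose `field` equals `value`."""
    return lambda link: link.get(field) == value

def apply_filters(links: list[dict], *filters: Filter) -> list[dict]:
    """Keep links that pass every filter in the stack."""
    return [link for link in links if all(f(link) for f in filters)]

# Hypothetical rows: one dictionary per coded link.
links = [
    {"cause": "Advocacy", "effect": "Action on rights violations",
     "country": "Burkina Faso", "sentiment": 1},
    {"cause": "Training", "effect": "Action on rights violations",
     "country": "Burkina Faso", "sentiment": -1},
    {"cause": "Advocacy", "effect": "Action on rights violations",
     "country": "Other country", "sentiment": 1},
]

view = apply_filters(links, where("country", "Burkina Faso"), where("sentiment", 1))
```

Each filter narrows the same underlying table, so a country view and a positive-sentiment view are two queries over one database rather than two separate maps.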
About factors
About links
About sources
Raw cognition maps
For policy design, behaviour change and stakeholder analysis: what do people think drives what?
The map is the answer.
Assessed maps
For evaluation and theory-of-change checks: which claims are vouched for, and with what behind them?
The map is the start of an argument.
INTRAC: pathways from interventions to outcomes in a 13-country lobby and advocacy programme.
System dynamics, CLDs, FCMs
Text-coded causal mapping
Also distinct from
CAQDAS: text tags, no map structure
System Dynamics Bot: variable-based, single model
FCM: often predictive in practice
Comparison table adapted from Powell, Copestake and Remnant 2024
The method works fully without AI. Minimalist coding is a small, clear task, which suits LLMs.
Validation study (Powell and Cabral, 2025), QuIP corpus, GPT-4 at temperature 0:
84% · Open coding · 180 links
87% · Codebook-assisted · 172 links
Composite score: correct endpoints, real causal claim, not hypothetical, correct direction.
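One possible reading of a composite score, sketched under the assumption that a coded link passes only if all four checks hold. The study's actual scoring procedure may differ; the check names and reviewed rows below are invented for illustration.

```python
# Hypothetical names for the four checks listed above.
CHECKS = ("correct_endpoints", "real_causal_claim", "not_hypothetical", "correct_direction")

def passes(link: dict) -> bool:
    """A link passes only when every check is true (assumed rule)."""
    return all(link[check] for check in CHECKS)

def composite_score(links: list[dict]) -> float:
    """Share of reviewed links that pass all four checks."""
    if not links:
        raise ValueError("no links to score")
    return sum(passes(link) for link in links) / len(links)

# Four invented review rows: three pass everything, one is a hypothetical claim.
all_good = dict.fromkeys(CHECKS, True)
reviewed = [all_good, all_good, all_good, {**all_good, "not_hypothetical": False}]
score = composite_score(reviewed)
```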
Top-level overview maps from AI and human coding come out broadly the same.
It works because the prompt is narrow: “where is a causal claim, and what influences what?”
INTRAC: 5,430 claims from 692 harvested outcomes and ToC review documents. Zero-shot (Gemini 2.5 Pro). Coded in days, not months.
1. The arrow question matters
The choice between ‘arrows are facts’ and ‘arrows are claims’ is not cosmetic. This deck starts with claims.
2. Minimalism enables scale
Minimalist coding lets analysts aggregate, compare and query thousands of claims without semantic disputes.
3. The bridge needs a workflow
Going from claims to facts needs explicit, checkable steps. Not an algorithm.
App (free, unlimited public projects)
app.causalmap.app
Knowledge garden
garden.causalmap.app
Methods comparison table
in Powell, Copestake and Remnant 2024
Powell and Cabral (2025) AI-assisted causal mapping: a validation study. IJSRM.
Powell, Cabral and Mishan (2025) A workflow for collecting and understanding stories at scale. Evaluation.
Powell, Copestake and Remnant (2024) Causal mapping for evaluators. Evaluation.
Thank you. Questions, please.
hello@causalmap.app