A nine-step workflow for AI-assisted causal coding

From documents to a conclusion you can defend

You have a stack of documents

Interviews. Reports. Open-ended survey answers. Hundreds of pages of people telling you what changed and why.

Somewhere in there is the answer to your evaluation or research questions.

How do you get those answers rigorously?

Two tempting shortcuts, both bad

Hand it to the black box

“ChatGPT, what does this say?”

Fast, fluent, and you have no idea what it leaned on. You cannot show your working, so you cannot defend the answer.

Read it all yourself

Thorough, but it does not scale. Three hundred transcripts? How do you make sure your synthesis really reflects what they all say, without jumping to early conclusions? And how do you synthesise in a way that answers the causal questions that research and evaluation ask?

There is a third way.

The promise

One workflow from raw text to a defensible conclusion, using AI as a clerk and making the judgements yourself.

Nine steps. Built for AI coding at scale, and it works just as well by hand.

Human first, AI next.

First, the big idea

Causal mapping is an evidence broker

It does not compete with the evaluation methods you already know.

It has a 50-year tradition.

It is the step that finds the causal claims, organises them, and hands them to contribution analysis, process tracing, Outcome Harvesting, QuIP, or your own judgement.

Stories in, organised evidence out. The evaluative call stays with you.

What a coded claim is

You read the text and mark each causal claim as a link from one factor to another.

“The training gave me confidence, and that is why I started the business.”

becomes training → confidence → started a business, with the verbatim quote and the source kept on every link.
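A minimal sketch of what one coded link might look like as data. The field names (cause, effect, quote, source_id), the helper code_chain and the source label "interview_01" are assumptions for illustration, not the format any particular app uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One causal claim: a source says `cause` influenced `effect`."""
    cause: str
    effect: str
    quote: str      # verbatim text the claim was coded from
    source_id: str  # who said it (hypothetical identifier)


def code_chain(factors, quote, source_id):
    """Turn a chain of factors into one link per adjacent pair.

    Every link keeps the same quote and source, so each can be traced back.
    """
    return [Link(c, e, quote, source_id) for c, e in zip(factors, factors[1:])]


quote = "The training gave me confidence, and that is why I started the business."
links = code_chain(
    ["training", "confidence", "started a business"], quote, "interview_01"
)

for link in links:
    print(f"{link.cause} -> {link.effect}  [{link.source_id}]")
```

A three-factor chain becomes two links. Nothing about strength or polarity is stored, because the speaker did not state it.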

We code claims, not facts

A coded link means: there is evidence that this source claims X influenced Y.

Not that X really did influence Y. Twenty people saying so is not proof. It is evidence you can now weigh. Crossing from claims to conclusions is your job, and the back half of this workflow is about doing it well.

Keep the coding minimalist

We deliberately do not code strength, polarity, or hidden meaning.

People say “X made Y happen”. They rarely say how strongly, or whether it was linear, or what the counterfactual was. So we do not invent it.

Minimalist coding is fast, automatable, and stays close to what people actually said. That is exactly why AI can do the heavy lifting.

AI as a clerk, not an oracle

The clerk’s job

Find every causal claim. Attach a quote. Tireless, exhaustive, cheap.

Your job

Decide the question, check the work, judge what it all means.

The minimalist task is simple enough to hand over. The judgement is not, so we keep it.

How the workflow is built

Three acts

Plan

Steps 1 to 2. Decide what you want to answer and gather data that can answer it.

Code

Steps 3 to 5. Turn the text into a checked table of claims, each with a quote.

Query

Steps 6 to 9. Weigh the evidence and answer the question.

Capture cheaply, then judge

A few wide, cheap passes to capture the evidence. Then steadily narrower judgement.

1000 claims → 30 bundles → 25 assessed links → 1 conclusion

Coding is broad and cheap on purpose. The value is added later, spending that volume down into something you can vouch for.

The nine steps

Plan

  1. Overall planning
  2. Data gathering

Code

  3. Manage the codebook
  4. Code the claims
  5. Check and enrich the links

Query

  6. From claims to bundles
  7. From bundles to pathways
  8. Judge value and contribution
  9. Holistic judgement

Not a strict sequence. You will revisit the early steps. Only the last step is strictly required.

Plan

Step 1: Start from the question

Before you touch a codebook, write down what you want to be able to say at the end, and to whom.

Every later choice (the data, the labels, the columns, the queries) follows from that one sentence.

Tip: sketch the map or table that would answer your question. That sketch is your target.

Be realistic about what it can answer

Good at

  • Which factors matter most
  • What drives or follows from a factor
  • How groups differ
  • Whether evidence fits a theory of change
  • The overall structure of the system

Not for

  • Effect sizes
  • Proving X causes Y on its own

Pick questions the method can serve, and only as many as the evaluation needs.

Step 2: Gather data that can answer it

The question decides the data.

Narrative material works best. Ask people what changed and why, and you get causal claims to code.

Want to compare women and men, or staff and clients, or early and late? Those groups must be in the data and recorded in the source metadata. You cannot compare what you did not capture.

Code

Step 3: Manage the codebook

How much freedom does the coder get to invent labels?

  • Forced: only your labels
  • Mostly fixed: your labels, flag new ones for review
  • Hierarchical: fix the top level, free the detail
  • Free: invent everything

Loose finds more but leaves more to tidy. Tight is cleaner but misses links. Most projects start loose and tighten later by recoding.
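The four settings can be read as one rule applied to each label the coder proposes. A rough sketch under assumptions: the function name check_label, the mode strings, and the use of ";" to separate a top-level label from its detail are all illustrative choices, not a documented convention.

```python
MODES = {"forced", "mostly_fixed", "hierarchical", "free"}


def check_label(label, codebook, mode):
    """Return (accepted_label_or_None, needs_review) for a proposed label.

    codebook is a set of approved labels (top-level labels in hierarchical mode).
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "free" or label in codebook:
        return label, False
    if mode == "mostly_fixed":
        # Keep the new label, but flag it for a human to review.
        return label, True
    if mode == "hierarchical":
        # Assumed convention: "Top level; free detail".
        top = label.split(";", 1)[0].strip()
        return (label, False) if top in codebook else (None, False)
    # forced: anything outside the codebook is rejected.
    return None, False


codebook = {"Training", "Income"}
for mode in ("forced", "mostly_fixed", "hierarchical", "free"):
    print(mode, check_label("Training; business skills", codebook, mode))
```

The same proposed label is rejected, flagged, or accepted depending only on the mode, which is the freedom-versus-control trade-off in miniature.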

Four tensions behind every coding choice

The settings look like separate knobs. They all pull on the same four tensions.

  • Precision and recall: are the links right, and did you catch them all?
  • Freedom and control: how much rope you give the model
  • Capture now, judge later: grab it cheaply, commit late
  • Cost and time: how much you spend tuning the rest

Turn one knob and the others move.

Step 4: Code the claims

You write an instruction for the AI, like a chatbot prompt, and paste it in.

In a hurry, or one short text? Press one-click and accept the defaults. Often that is enough.

The golden rule: test on a small, varied sample, work out exactly why the output is wrong or thin, fix the instruction, run again. Then scale up.

Holistic or claim by claim?

Holistic

One connected diagram per chunk. A more joined-up story. Best for a single short text. The model has more freedom.

Claim by claim

Every link separately. Fuller coverage. Best for many texts. Links join up less, so you recode later.

The one rule you never break

Every link needs a quote.

Without a verbatim quote behind each link, you cannot show your working, and you cannot defend the conclusion. Ask for it explicitly, every time.
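Asking for a quote is one half; checking that it really is verbatim is the other. A minimal sketch of such a check, assuming each link is a dict with a "quote" key and that you have the source text as a string. The normalisation (whitespace and curly quotes only) is an assumption; stricter or looser matching may suit your data.

```python
import re


def normalise(text):
    """Collapse whitespace and unify curly quotes so trivial differences do not matter."""
    for curly, plain in (("\u201c", '"'), ("\u201d", '"'), ("\u2018", "'"), ("\u2019", "'")):
        text = text.replace(curly, plain)
    return re.sub(r"\s+", " ", text).strip()


def quote_is_verbatim(quote, source_text):
    """True if the quote appears word for word in the source text."""
    q = normalise(quote)
    return bool(q) and q in normalise(source_text)


source_text = (
    "I was nervous at first. The training gave me confidence,\n"
    "and that is why I started the business."
)
coded = [
    {"cause": "training", "effect": "confidence",
     "quote": "The training gave me confidence, and that is why I started the business."},
    {"cause": "training", "effect": "confidence",
     "quote": "Training made her much more confident."},  # a paraphrase, not a quote
]

flagged = [link for link in coded if not quote_is_verbatim(link["quote"], source_text)]
print(f"{len(flagged)} link(s) need checking")
```

Links that fail the check are not necessarily wrong, but they are the ones to read by hand first.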

Query

A filter is a question, and filters chain

You query the graph with filters. Each filter is a question. Stack them and you answer a bigger one.

Links → women only → trace training to income → zoom out → bundle → map

The same data gives very different maps with no contradiction. Each map is just the result of a different chain of filters. Order usually matters.
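One way to picture a chain of filters: each filter is a function from a list of links to a shorter list of links, applied in order. This is a sketch only. The names only_group, downstream_of and apply_filters, the dict keys and the sample links are all made up for illustration.

```python
from functools import reduce


def only_group(key, value):
    """Filter: keep links whose source metadata matches, e.g. gender == 'woman'."""
    return lambda links: [l for l in links if l[key] == value]


def downstream_of(factor):
    """Filter: keep links reachable by following arrows forward from `factor`."""
    def _filter(links):
        reached, kept, changed = {factor}, set(), True
        while changed:
            changed = False
            for i, l in enumerate(links):
                if i not in kept and l["cause"] in reached:
                    kept.add(i)
                    reached.add(l["effect"])
                    changed = True
        return [l for i, l in enumerate(links) if i in kept]
    return _filter


def apply_filters(links, filters):
    """Run the filters left to right; each one sees the previous one's output."""
    return reduce(lambda acc, f: f(acc), filters, links)


links = [
    {"cause": "training", "effect": "confidence", "source_id": "s1", "gender": "woman"},
    {"cause": "confidence", "effect": "income", "source_id": "s1", "gender": "woman"},
    {"cause": "training", "effect": "income", "source_id": "s2", "gender": "man"},
    {"cause": "drought", "effect": "income", "source_id": "s3", "gender": "woman"},
]

women_training = apply_filters(
    links, [only_group("gender", "woman"), downstream_of("training")]
)
for l in women_training:
    print(l["cause"], "->", l["effect"])
```

Change the list of filters and the same links give a different map. Nothing in the data has changed, only the question.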

Two everyday examples

“What did the cash transfer lead to?”

Filter to that factor, look downstream. Out comes the map of every reported consequence, with counts.

“Do women and men tell different stories?”

Split by group on the same map. The links each group stresses light up differently.

These quick wins also sharpen your question before the harder work. The next four steps are for the questions that need a defensible answer.

Step 6: From claims to bundles

A bundle is all the claims that say the same X influences Y, from different sources or different parts of one source.

Whatever else you do, weigh each bundle as a whole. How many sources? How convincing? Do they agree or pull apart?
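Bundling is a group-by on the (cause, effect) pair. A minimal sketch, assuming links are dicts with cause, effect, source_id and quote; the function name bundle and the sample quotes are invented for illustration. Counts of claims and of distinct sources are kept separate, because one talkative source is not the same as many sources agreeing.

```python
from collections import defaultdict


def bundle(links):
    """Group claims by (cause, effect) and summarise each bundle."""
    groups = defaultdict(list)
    for l in links:
        groups[(l["cause"], l["effect"])].append(l)
    return {
        pair: {
            "claims": len(ls),
            "sources": len({l["source_id"] for l in ls}),
            "quotes": [l["quote"] for l in ls],  # kept so you can read and weigh them
        }
        for pair, ls in groups.items()
    }


links = [
    {"cause": "training", "effect": "confidence", "source_id": "s1",
     "quote": "The training gave me confidence."},
    {"cause": "training", "effect": "confidence", "source_id": "s1",
     "quote": "After the course I felt surer of myself."},
    {"cause": "training", "effect": "confidence", "source_id": "s2",
     "quote": "I am more confident since the training."},
    {"cause": "confidence", "effect": "started a business", "source_id": "s1",
     "quote": "That is why I started the business."},
]

bundles = bundle(links)
for (cause, effect), info in sorted(bundles.items(), key=lambda kv: -kv[1]["sources"]):
    print(f"{cause} -> {effect}: {info['claims']} claims from {info['sources']} sources")
```

The counts say how much evidence there is. Whether it is convincing still means reading the quotes.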

Step 7: From bundles to pathways

Now the indirect questions. How does an intervention reach an outcome, two or three steps down the chain?

This is where causal mapping earns its name, and where the biggest trap lives.

The transitivity trap

A pig farmer says:

the cash grant gave me more cash

A wheat farmer says:

more cash let me buy more seed

So cash grants lead to more seed?

No. The first step is true only for pig farmers, the second only for wheat farmers. Two people, two stories, stitched into one that nobody told.

Source tracing keeps you safe

Path tracing shows every link on a route between two factors, across all sources. Easy to misread.

Source tracing keeps only the sources whose own account runs all the way through. Every link is then part of one complete story told by one person. That is the safe move.
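The difference between the two can be sketched in a few lines, using the pig farmer and wheat farmer above. The function names has_path and sources_with_full_path and the (cause, effect, source) tuple layout are assumptions for illustration. The third source, a maize farmer who tells the whole story, is hypothetical and added only to show a case that passes.

```python
from collections import defaultdict


def has_path(links, start, end):
    """Path tracing: is there any route from start to end, ignoring who said what?"""
    graph = defaultdict(set)
    for cause, effect, _source in links:
        graph[cause].add(effect)
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == end:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def sources_with_full_path(links, start, end):
    """Source tracing: which sources' own links run all the way from start to end?"""
    by_source = defaultdict(list)
    for link in links:
        by_source[link[2]].append(link)
    return sorted(s for s, own in by_source.items() if has_path(own, start, end))


links = [
    ("cash grant", "more cash", "pig farmer"),
    ("more cash", "more seed", "wheat farmer"),
]
print(has_path(links, "cash grant", "more seed"))                # route exists on paper
print(sources_with_full_path(links, "cash grant", "more seed"))  # but nobody told it

# Hypothetical third source who reports both steps.
links += [
    ("cash grant", "more cash", "maize farmer"),
    ("more cash", "more seed", "maize farmer"),
]
print(sources_with_full_path(links, "cash grant", "more seed"))
```

Path tracing finds a route in the first dataset; source tracing finds no one who actually told that story. Only once one person reports both steps does the pathway have a source behind it.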

Step 8: Judge value and contribution

How much did it matter, and compared to what?

Compare the influence you care about against the rival explanations, on the same map, not in isolation. Count the sources whose narratives actually run from your driver to your outcome.

Step 9: Holistic judgement

Finally, step back and draw the conclusion.

Behind a single tidy map there may still be hundreds of quotes. Does the claim hold up? Do all the links really belong to the same context?

The AI can draft a vignette: a source-by-source commentary on the pathways, judging how coherent each account is. It does only what a patient reader could. Treat it as a first draft and edit it.

Close the loop

Does the evidence answer your Step 1 question?

If yes, you have a conclusion with every step on show, from quote to claim to bundle to pathway to judgement.

So what

The whole thing in one line

Code “X influenced Y, with a quote”. Capture cheaply. Then judge, in the open, until you have something you can defend.

None of this is statistical causal inference. It is a disciplined way to assemble evidence, weigh it transparently, and reach conclusions you can stand behind.

Where it fits

Causal mapping is the evidence broker. It feeds the methods you already trust.

  • Contribution analysis
  • Process tracing
  • Outcome Harvesting
  • QuIP
  • Realist evaluation
  • Most Significant Change

Most real evaluations combine several. Pick the methods to fit the question.

We use this every day

This is how we work at Causal Map Ltd, and it keeps evolving.

If you want to go from a stack of documents to a conclusion you can defend, come and try it with us.

Companion working paper: “A workflow for AI-assisted causal coding”. App: app.causalmap.app
