The Unified Experiment Protocol: 7 Steps for Any Hypothesis Test
A step-by-step protocol for designing and running any product experiment—from formulating a falsifiable hypothesis to pre-committing to thresholds and decisions.
In product development, experiments are not a luxury but a necessity. They allow us to get honest signals from reality and avoid the costly development of unnecessary features. But for an experiment to be truly useful, it must be strictly structured.
The PTOS methodology offers a unified experiment protocol that works for any tool—whether it's a fake-door, a prototype, a concierge-MVP, or an A/B-test. This protocol disciplines the team, protects against self-deception, and ensures that every test leads to a meaningful decision.
The Fundamental Principle
An experiment is a user's stake (time/money/action), measured in data, with a predefined threshold and decision.
7 Steps of the Unified Experiment Protocol
Step 1. Write the Hypothesis So It Can Be Killed
A hypothesis must be falsifiable—that is, it must be possible to disprove it. The more clearly it is formulated, the easier it is to test.
- Format: 'If we do A, segment S will do B, and metric M will change by Δ within window T.'
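- Example: 'If we shorten the onboarding to one step (A), new users (S) will complete activation more often (B), and the percentage of activated users (M) will increase by 5% (Δ) within 7 days (T).' When you write Δ, state whether it is a relative change or a change in percentage points, so the threshold cannot be reinterpreted later.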
- Why: This is a defense against 'meaning dilution' and the basis for decision-making.
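One way to enforce the format is to store the hypothesis as a record that cannot be created with a blank slot. The sketch below is illustrative only: the `Hypothesis` class, its field names, and the validation rules are assumptions made for this example, not part of PTOS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One hypothesis in the A / S / B / M / delta / T format (hypothetical helper)."""

    change: str       # A: what we do
    segment: str      # S: who is affected
    behavior: str     # B: what the segment will do
    metric: str       # M: the metric that should move
    delta: float      # expected change in M, in the unit given by delta_unit
    delta_unit: str   # for example "%" or " pp"
    window_days: int  # T: observation window in days

    def __post_init__(self) -> None:
        # A hypothesis with a blank slot or no expected change cannot be disproved.
        for name in ("change", "segment", "behavior", "metric", "delta_unit"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        if self.delta == 0:
            raise ValueError("delta must be non-zero")
        if self.window_days <= 0:
            raise ValueError("window_days must be positive")

    def statement(self) -> str:
        """Render the hypothesis as a single sentence in the protocol's format."""
        return (
            f"If we {self.change}, {self.segment} will {self.behavior}, "
            f"and {self.metric} will change by {self.delta:+g}{self.delta_unit} "
            f"within {self.window_days} days."
        )


onboarding = Hypothesis(
    change="shorten the onboarding to one step",
    segment="new users",
    behavior="complete activation more often",
    metric="the percentage of activated users",
    delta=5,
    delta_unit="%",
    window_days=7,
)
print(onboarding.statement())
```

Leaving any slot empty raises a `ValueError`, which is the point: a hypothesis that cannot be written down completely is not yet ready to be tested.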
Step 2. Deconstruct the Risk (What Exactly Don't We Know?)
Any experiment aims to reduce uncertainty. Identify the key risks you want to test.
- Five typical uncertainties:
- Not needed: Users have no real pain, only idle curiosity.
- Can't do it: The UX is complex; users don't understand how to use the solution.
- Won't buy: The pain exists, but users are not willing to pay (with resources/money) for the solution.
- Won't adopt: The solution won't fit into the user's daily process.
- Which is better: We need to establish causality or compare alternative solutions.
- Why: Each uncertainty has its own 'cheap' test, so naming the risk tells you which test to run.
Step 3. Choose the Cheapest Test That Can Kill the Hypothesis
You don't need an A/B test to check if a feature is needed at all. Choose the tool that will give the most honest signal with minimal cost.
- Cheat sheet:
- 'Needed?' → Fake-door / message test / micro-survey.
- 'Can they?' → Prototype test.
- 'Will they buy?' → Price page / deposit / LOI.
- 'Will it stick?' → Concierge-MVP / pilot.
- 'Which is better?' → A/B-test / small safe release.
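The cheat sheet above is effectively a lookup table from the risk being tested to the cheapest tests that can address it. A minimal sketch, assuming a hypothetical `Risk` enum and `cheapest_tests` helper (both names are invented for illustration):

```python
from enum import Enum


class Risk(Enum):
    """The five typical uncertainties, keyed by the question each one asks."""

    NOT_NEEDED = "Needed?"
    CANT_DO_IT = "Can they?"
    WONT_BUY = "Will they buy?"
    WONT_ADOPT = "Will it stick?"
    WHICH_IS_BETTER = "Which is better?"


# Cheapest tests per risk, copied from the cheat sheet.
CHEAPEST_TESTS = {
    Risk.NOT_NEEDED: ("fake-door", "message test", "micro-survey"),
    Risk.CANT_DO_IT: ("prototype test",),
    Risk.WONT_BUY: ("price page", "deposit", "LOI"),
    Risk.WONT_ADOPT: ("concierge-MVP", "pilot"),
    Risk.WHICH_IS_BETTER: ("A/B-test", "small safe release"),
}


def cheapest_tests(risk: Risk) -> tuple:
    """Return the candidate tests for the given risk."""
    return CHEAPEST_TESTS[risk]


print(cheapest_tests(Risk.NOT_NEEDED))
```

Looking up the test by risk, rather than starting from a favourite tool, keeps an A/B-test from being used to answer a 'Needed?' question.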
Step 4. Pre-Commit to Thresholds and Decisions Before Starting
This is the key safeguard against self-deception. Before the experiment begins, you must clearly define:
- Success threshold: What will be considered an unambiguous success.
- Failure threshold: What will be an unambiguous failure.
- Decision based on the result: What you will do in each case (scale / improve / roll back / delete).
- Why: When results disappoint, the team gets emotional. Criteria written in advance turn panic into a managed decision.
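Pre-commitment can be made concrete by writing the thresholds and decisions down as data before launch and letting a function, not the meeting, pick the outcome. The sketch below is one possible shape: the `DecisionRule` class, the example threshold values, and the treatment of results that fall between the two thresholds are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRule:
    """Thresholds and decisions fixed before the experiment starts (hypothetical helper)."""

    success_threshold: float  # observed change at or above this is a success
    failure_threshold: float  # observed change at or below this is a failure
    on_success: str           # for example "scale"
    on_failure: str           # for example "roll back" or "delete"
    on_inconclusive: str      # assumption: results in between map to "improve"

    def __post_init__(self) -> None:
        if self.failure_threshold > self.success_threshold:
            raise ValueError("failure_threshold must not exceed success_threshold")

    def decide(self, observed_change: float) -> str:
        """Map the observed change in the metric to the pre-committed decision."""
        if observed_change >= self.success_threshold:
            return self.on_success
        if observed_change <= self.failure_threshold:
            return self.on_failure
        return self.on_inconclusive


# Illustrative numbers only; real thresholds come from the hypothesis.
rule = DecisionRule(
    success_threshold=5.0,
    failure_threshold=1.0,
    on_success="scale",
    on_failure="roll back",
    on_inconclusive="improve",
)
print(rule.decide(6.2))
print(rule.decide(3.0))
print(rule.decide(0.3))
```

Because the rule is frozen and created before any data arrives, changing a threshold after the fact requires an explicit, visible edit.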
Step 5. Add a Guardrail (Anti-Goodhart)
One metric = the path to Goodhart's Law. Two metrics = a chance at the truth. A guardrail metric protects the product from negative side effects.
- North metric: What we are improving.
- Guardrail: What should not get worse (e.g., number of errors, support tickets, retention in other scenarios).
- Question: 'If the metric increased, how could it be "gamed" without improving the product?' The answer to this question will help strengthen the guardrails.
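A guardrail check can be as simple as comparing each side-effect metric with its baseline plus an agreed tolerance. The sketch below assumes every guardrail is a "higher is worse" count such as errors or support tickets. The `Guardrail` class, the tolerance values, and the sample numbers are hypothetical, and a metric like retention would need the comparison reversed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    """A metric that must not get worse; here, higher values are worse."""

    name: str
    baseline: float
    max_relative_increase: float  # 0.10 means at most 10% above baseline

    def breached(self, observed: float) -> bool:
        """True if the observed value exceeds the tolerated ceiling."""
        return observed > self.baseline * (1 + self.max_relative_increase)


def breached_guardrails(guardrails, observed):
    """Return the names of guardrails whose observed value exceeds the limit."""
    return [g.name for g in guardrails if g.breached(observed[g.name])]


# Illustrative baselines and tolerances only.
guardrails = [
    Guardrail("errors", baseline=100, max_relative_increase=0.10),
    Guardrail("support_tickets", baseline=40, max_relative_increase=0.25),
]
observed = {"errors": 120, "support_tickets": 42}
print(breached_guardrails(guardrails, observed))
```

A win on the north metric only counts when this list comes back empty.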
Step 6. Tracking Plan: An 'End-to-End Trail' from Exposure to Repetition
Make sure you can track the entire user journey.
- Minimum events:
- Exposure: Saw it.
- Intent: Tried it (click/start).
- Value: Got the result (value-event).
- Repeat: Repeated it within window T.
- Guardrails: Side effects (errors/support/churn).
- Why: Without this, you won't be able to understand the mechanism of behavior change.
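With these events in place, the trail can be read as a funnel: how many users reached each step after passing all the previous ones. The sketch below is a simplified illustration. The event names, the `(user_id, event_name)` tuple shape, and the `funnel_counts` helper are assumptions, and it ignores timestamps, so it does not enforce event order or window T.

```python
from collections import defaultdict

# Funnel steps in order, using the minimum events from the tracking plan.
FUNNEL = ("exposure", "intent", "value", "repeat")


def funnel_counts(events):
    """Count unique users at each step who also reached every earlier step.

    `events` is an iterable of (user_id, event_name) pairs.
    """
    users_by_event = defaultdict(set)
    for user_id, event_name in events:
        users_by_event[event_name].add(user_id)

    counts = {}
    reached = None
    for step in FUNNEL:
        users = users_by_event.get(step, set())
        # Keep only users who were present at all previous steps.
        reached = users if reached is None else reached & users
        counts[step] = len(reached)
    return counts


events = [
    ("u1", "exposure"), ("u1", "intent"), ("u1", "value"), ("u1", "repeat"),
    ("u2", "exposure"), ("u2", "intent"),
    ("u3", "exposure"),
    ("u4", "value"),  # no exposure recorded, so not counted in the funnel
]
print(funnel_counts(events))
```

A value event without a matching exposure, like `u4` here, is a sign of a hole in the tracking plan rather than a real user path.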
Step 7. Readout and Decision (Without 'Well, in general...')
The results of the experiment are not a presentation, but a gateway to making a decision.
- Readout questions:
- What did the signal show?
- Are there alternative explanations? (Segment shift, novelty effect, seasonality).
- What is the decision according to the rules: Go / No-go / Reframe?
- Why: This disciplines the team and prevents 'watering down' the conclusions.
By following this protocol, you will turn experiments from an exercise into a 'scientific method' of building products, where every change leads to real learning and progress.