Every test returns a decision, not a dashboard. Here is a fully transparent writeup of six hypotheses we ran for a Series B fintech over six weeks. Five of them were “no.” One moved the number by thirty percent. The math below is how I decide what to run next.
One of the things a good performance program owes its client is honesty about the base rate. Most tests fail. Not because the testers are bad — because the test exists to tell you whether a hypothesis is true, and most hypotheses a thoughtful team comes up with are only directionally correct. The discipline is not to run tests that always win. The discipline is to run tests that resolve a decision either way, so the program stops arguing and starts compounding.
The engagement below is a composite of real work — the shape is faithful to what we actually ran, with the client details blurred for their comfort. If you are running paid media without this level of structure, you are not actually running experiments. You are running aesthetics.
Creative format change on the hero unit.
The bet: Their hero ad was a polished 30s lifestyle spot that looked like every other fintech ad on the platform. The first hypothesis was the cheapest one — swap to a longer-form unscripted founder-voice spot, keeping targeting and offer identical, and see whether a different creative register opens a new segment.
The result: Modest lift in top-of-funnel CTR (+9%), neutral on CAC. The founder-voice spot attracted attention but did not change who clicked. Decision: kill.
Offer restructure — lifetime value messaging.
The bet: The offer on the ads focused on the immediate action (“open a free account in 90 seconds”). We hypothesized the adjacent audience was less impulse-driven and would respond to an LTV-anchored framing (“build a banking relationship that compounds”). Same product, same landing page, new ad copy.
The result: CTR dropped 14%. CAC got worse. The LTV framing felt right in the boardroom and felt wrong in the ad feed. Decision: kill. Revert in under 72 hours.
What sounds right in a strategy doc and what works in a feed are different problems. Test both. Keep what the feed keeps.
New audience vertical — small-business owners.
The bet: Their consumer product had latent utility for sole proprietors who were using it as a de facto business account. We ran a standalone audience and messaging test: target SMB owners directly, message the “hybrid personal/business” benefit explicitly.
The result: CAC came in 22% above the blended baseline. The SMB audience existed but had much higher conversion friction — they wanted banking features this product did not ship yet. Not a marketing problem; a product gap. Decision: kill. Send the signal to product as a prioritization input.
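For readers who want the arithmetic behind a "percent above baseline" call, here is a minimal sketch. The spend and customer counts are hypothetical, chosen only so the gap works out to 22%; they are not the client's figures, and the function names are mine.

```python
def cac(spend: float, new_customers: int) -> float:
    """Customer acquisition cost: spend divided by customers acquired."""
    if new_customers <= 0:
        raise ValueError("need at least one acquired customer to compute CAC")
    return spend / new_customers


def pct_vs_baseline(test_cac: float, baseline_cac: float) -> float:
    """Percent difference of a test cell's CAC against the blended baseline."""
    return (test_cac - baseline_cac) * 100 / baseline_cac


# Hypothetical numbers for illustration only.
baseline_cac = cac(spend=50_000, new_customers=500)  # blended program
smb_cac = cac(spend=12_200, new_customers=100)       # SMB test cell
gap = pct_vs_baseline(smb_cac, baseline_cac)

print(f"baseline CAC: {baseline_cac:.2f}, SMB CAC: {smb_cac:.2f}, gap: {gap:+.0f}%")
```

A positive gap means the test cell is more expensive than the blended program, which is the kill condition for an audience test like this one.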
Landing page redesign — simplified onboarding flow.
The bet: The landing page had a multi-step onboarding that we suspected was leaking qualified traffic. Sheepdog built a single-page variant, same offer, same ads, new destination. We split incoming paid traffic 50/50 between the two pages for two weeks.
The result: Conversion lifted 6% on the single-page variant — real but small. Not large enough to justify the maintenance cost of two parallel pages. Decision: kill the variant. But take the specific onboarding-step win and port it into the main page.
Small wins that are not big enough to ship alone can still inform the design of the next thing. We logged the learning as kaizen — not a rollout, an improvement input.
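A 50/50 split like the landing-page test is usually read with a relative lift plus some check on whether the gap could be noise. The sketch below uses a pooled two-proportion z-test, which is one common choice and not necessarily the method used on this engagement. The visitor and conversion counts are hypothetical, picked so the relative lift is 6%.

```python
from math import erf, sqrt


def relative_lift(control_rate: float, variant_rate: float) -> float:
    """Relative change of the variant's conversion rate over the control's."""
    return (variant_rate - control_rate) / control_rate


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF written with erf, so no third-party dependency.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical counts for illustration only.
control_conv, control_n = 500, 10_000   # multi-step page
variant_conv, variant_n = 530, 10_000   # single-page variant

lift = relative_lift(control_conv / control_n, variant_conv / variant_n)
z, p_value = two_proportion_z(control_conv, control_n, variant_conv, variant_n)

print(f"relative lift: {lift:.1%}, z = {z:.2f}, p = {p_value:.2f}")
```

With these made-up volumes, a 6% relative lift is well inside the noise band. That is a useful reminder that "real but small" depends on traffic volume as much as on the size of the lift.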
Channel expansion — streaming audio.
The bet: Their audience over-indexed on streaming audio according to research. We hypothesized a modest audio test would either reveal a new acquisition channel or confirm the audience/channel mismatch. Budget: $40k over three weeks, three creative variants, two platforms.
The result: Zero measurable lift on blended CAC. Audio did drive some brand lift in the post-campaign recall survey, but not enough to pay for itself on a direct-response basis. Decision: kill the direct-response framing of audio; keep the question of brand-only audio for a separate conversation when brand spend is on the table.
Creative framework reset — outcome-first ad architecture.
The bet: After five kills in four weeks, we stepped back. The thread across all five failures was that each test moved one variable — creative, audience, channel, offer, landing — without questioning the underlying framework. The sixth bet was structural: rebuild the ad architecture around outcomes rather than features. Every ad had to open with the user's state-change, not the product's capability. Nine new units, one consistent structural rule.
The result: Blended CAC dropped 30% over three weeks and held for the next six. The winning units were not particularly clever in isolation; they were consistent in their structural posture. The previous tests had been right about individual moves and wrong about the operating model that produced them.
Most performance programs fail not on individual tests, but on the framework beneath the tests. When five reasonable hypotheses fail in a row, stop testing. Test the framework.
What the program actually taught us.
Five “no”s and one “yes.” The blended win was a 30% CAC reduction sustained across the quarter. The board-facing story is the 30%. The operating story is the five kills, because each kill was a decision that otherwise would have been a slow drift.
- 01 The creative-format swap told us we had a volume knob, not an audience knob.
- 02 The LTV offer test told us the feed rewards immediate clarity.
- 03 The SMB audience test told product it had an under-served segment with a feature gap.
- 04 The landing-page redesign contributed a kaizen input even though it did not ship.
- 05 The audio channel test scoped the next conversation about brand-only spend.
- 06 The framework reset moved the number because it addressed the system, not the parts.
The rule I keep coming back to.
Every test I run has to answer one question and produce one decision. If I cannot write the decision in a single sentence before the test starts, the test is not ready. The six above pass that bar. Every week, a few hypotheses fail that bar and never make it to the queue. That discipline is the reason the program compounds instead of churning.
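The readiness rule can be written down as a gate in front of the test queue. This is a minimal sketch: the `ExperimentPlan` fields and the one-sentence heuristic are my own illustrative assumptions, not a tool used on the engagement, and the example plan is a hypothetical restatement of the LTV offer test.

```python
from dataclasses import dataclass


@dataclass
class ExperimentPlan:
    """Hypothetical pre-registration record for a single test."""
    question: str          # the one question the test answers
    decision_if_win: str   # what we do if the hypothesis holds
    decision_if_loss: str  # what we do if it does not


def is_single_sentence(text: str) -> bool:
    """Crude heuristic: non-empty, with no sentence terminator before the end.

    Abbreviations and decimals (e.g. "1.5x") would trip this check,
    so treat it as a prompt for discipline, not a parser.
    """
    stripped = text.strip()
    if not stripped:
        return False
    body = stripped.rstrip(".!?")
    return not any(ch in body for ch in ".!?")


def is_ready(plan: ExperimentPlan) -> bool:
    """A test is ready only if every field is a single sentence."""
    fields = (plan.question, plan.decision_if_win, plan.decision_if_loss)
    return all(is_single_sentence(field) for field in fields)


ltv_plan = ExperimentPlan(
    question="Does an LTV-anchored framing lower CAC for the adjacent audience?",
    decision_if_win="Roll the LTV framing into the main ad set.",
    decision_if_loss="Kill the LTV framing and revert to the original copy.",
)
vague_plan = ExperimentPlan(
    question="Is audio good? Does it help brand? Should we scale it?",
    decision_if_win="",
    decision_if_loss="We will see.",
)

print(is_ready(ltv_plan), is_ready(vague_plan))
```

The check is deliberately blunt. A plan that needs three questions or has no stated action for a loss is the kind that never makes it to the queue.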