Here’s a true story about reverse engineering a competitor’s revenue from accidentally leaked information. I’m not going to say what year or what company this was.

One of our competitors was running their own webinar and sharing their screen. They accidentally shared their Segment.com tab. The part of the software on screen was meant to be “safe,” but a banner at the top said “You’re over your limit” and showed exactly how many monthly tracked users they had. I immediately took a screenshot.

Then I wondered: are they using Segment just for their product, or does that number include website visitors? I went to their marketing site and saw that it was sending Segment events too. So the number included their top-of-funnel visitors.

Here’s the detective work I did with publicly available information:

First, from the company’s LinkedIn posts, I found out exactly how long they’d been in business (“We’re celebrating our anniversary”), which boils down to how many months they’d been operating.

Second, they celebrated by sharing a vanity metric, signups, which told me how many signups they had on a particular date and how long it took them to get there.

Third, I grabbed their pricing page, which had a few tiers (there was probably an enterprise tier too; I’m simplifying here).

I wanted to know whether I could estimate that competitor’s monthly recurring revenue from this information plus a few educated assumptions. I gave the story and the data to GPT-4, along with a screenshot of the pricing page.

When I tried this with different models without giving specific instructions, they did a very simplified analysis: “If you have this many people, this percent is free, this percent is paid.” That might be correct to an order of magnitude, but it’s not the most rigorous approach. From my experience on growth teams, one of the best ways to do this is to run a cohort analysis, breaking the problem into atomic assumptions.
If the competitor has been in business for exactly 12 months, you calculate what the cohort that started 12 months ago contributes to today’s revenue, what the cohort from 11 months ago contributes, and so on, accounting for drop-off along the way. This is actually a really complicated analysis, because for each month of every cohort you want to calculate a funnel: website visitors, signups, cumulative signups, and active users. From there you can make assumptions about how many of those users are paying customers.

I eventually iterated to a prompt where I was very clear that I wanted a cohort analysis. The model then runs that cohort analysis so that it matches the two known figures: the 50,000 cumulative signups the company posted on LinkedIn (their vanity metric) and the monthly tracked users at the end of the period. It should try to fit the model and iterate until it reproduces those numbers.

The more you can tell it about the industry, share your own benchmarks, or do independent research on benchmarks, the better. It will call out the assumptions it makes, and when I’ve seen variance between models on this analysis, it basically comes down to those assumptions.

➡️ Turn accidental information leaks into rigorous revenue estimates by combining public data with cohort analysis. The key is being explicit about wanting sophisticated modeling, not simple percentage calculations.
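To make the cohort idea concrete, here is a minimal Python sketch of the kind of model described above. It is not the prompt or the output from the story. The 50,000 cumulative signups figure comes from the LinkedIn post; the 300,000 monthly tracked users figure is a placeholder, since the real number is not disclosed here. Every rate in `Assumptions` (conversion, activation, retention, paid share, price) is an illustrative guess, and the sketch assumes that tracked users are roughly the latest month's site visitors plus active product users.

```python
from dataclasses import dataclass


@dataclass
class Assumptions:
    """Every value here is an illustrative guess, not data from the story."""
    months: int = 12                 # months in business
    visitor_to_signup: float = 0.03  # share of monthly site visitors who sign up
    activation: float = 0.40         # share of signups active in their first month
    retention: float = 0.85          # share of active users kept month over month
    paid_share: float = 0.05         # share of active users on a paid plan
    arpu: float = 49.0               # blended monthly price per paying user


def monthly_signups(total, growth, months):
    """Split cumulative signups across cohorts growing at a constant rate."""
    weights = [(1 + growth) ** m for m in range(months)]
    scale = total / sum(weights)
    return [w * scale for w in weights]


def active_by_cohort(signups, a):
    """Users from each cohort who are still active in the latest month."""
    last = len(signups) - 1
    return [s * a.activation * a.retention ** (last - m)
            for m, s in enumerate(signups)]


def model(total, growth, a):
    """Run the funnel for one growth rate and return the latest-month figures."""
    signups = monthly_signups(total, growth, a.months)
    active = sum(active_by_cohort(signups, a))
    visitors = signups[-1] / a.visitor_to_signup  # latest month's site traffic
    return {
        "growth": growth,
        "tracked_users": visitors + active,
        "active_users": active,
        "paying_users": active * a.paid_share,
        "mrr": active * a.paid_share * a.arpu,
    }


def fit(total, tracked_target, a, lo=0.0, hi=1.0, steps=60):
    """Bisect on growth until modeled tracked users match the leaked figure."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if model(total, mid, a)["tracked_users"] < tracked_target:
            lo = mid
        else:
            hi = mid
    return model(total, (lo + hi) / 2, a)


if __name__ == "__main__":
    result = fit(total=50_000, tracked_target=300_000, a=Assumptions())
    for key, value in result.items():
        print(f"{key}: {value:,.2f}")
```

The cumulative signups constraint is met exactly by scaling the cohorts, and the only free parameter being fitted is the monthly signup growth rate. If the tracked users target cannot be reached with growth between 0% and 100% per month, the bisection simply ends at one of those bounds, which is itself a hint that one of the other assumptions is off. Changing any single assumption and rerunning shows how sensitive the revenue estimate is to it.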