Talk notes
Most alternative-data products treat methodology as the trade secret. We argue the opposite. Closed methodology compounds error because no one can challenge a hidden mistake. Open methodology compounds trust because every challenger sharpens the methodology. The trade-off is real — we give up some marginal pricing power because anyone can in principle reproduce our work. But we gain something larger: the buyer who can reproduce our regression is the buyer who trusts us most, and that buyer is also the buyer who churns the least and refers the most.
We chose CC BY 4.0 deliberately. Apache and MIT are software licenses: they're designed for code, not for datasets and methodology. CC BY 4.0 is built for documents, datasets, and reproducible research. It requires attribution (and a note of any changes made) but otherwise permits any use, including commercial use. That's the right shape for a measurement product where the goal is to maximize citation, replication, and stress-testing.
Reproducibility filters our buyer base toward the right buyer. A fund that wants secrecy will choose Harmonic, Affinity, or Tracxn, all of which charge €1,000+/month and offer no methodology disclosure. A fund that wants reproducibility will choose us. That self-selection is exactly the customer-acquisition pattern we want. We're not trying to win every fund. We're trying to win the methodology-first fund. Open methodology is how we do that.
We encourage reproducibility audits. A handful of subscribers have published audits where they rebuilt our regression in their own notebook and compared their numbers to ours. Most of those audits reproduced our numbers to within 2 percent, and the remaining gap was usually attributable to dataset-cutoff differences. Where an audit surfaced a larger discrepancy, we folded the finding back into the methodology. That is a citizen-science quality-assurance loop that closed-source data products cannot match.
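For anyone who wants to run this kind of audit, here is a minimal sketch of the comparison step in Python. It assumes the check is a per-figure relative error against the published value, which is one reasonable reading of a "2 percent band" and not necessarily how any given subscriber did it. The function name `audit`, the `AuditRow` record, and the coefficient names and values are hypothetical and for illustration only.

```python
from dataclasses import dataclass

# Assumption: the 2 percent band is a relative error against the published figure.
TOLERANCE = 0.02


@dataclass
class AuditRow:
    name: str
    published: float
    rebuilt: float
    rel_error: float
    within_band: bool


def audit(published: dict[str, float],
          rebuilt: dict[str, float],
          tolerance: float = TOLERANCE) -> list[AuditRow]:
    """Compare each published figure with the auditor's rebuilt figure."""
    rows = []
    for name, pub in published.items():
        if name not in rebuilt:
            raise KeyError(f"rebuilt results are missing '{name}'")
        reb = rebuilt[name]
        if pub != 0:
            rel = abs(reb - pub) / abs(pub)
        else:
            # Relative error is undefined at zero; treat any nonzero rebuild as a miss.
            rel = 0.0 if reb == 0 else float("inf")
        rows.append(AuditRow(name, pub, reb, rel, rel <= tolerance))
    return rows


if __name__ == "__main__":
    # Hypothetical numbers, not taken from any real audit.
    published = {"intercept": 1.50, "slope_headcount": 0.40}
    rebuilt = {"intercept": 1.52, "slope_headcount": 0.45}
    for row in audit(published, rebuilt):
        status = "ok" if row.within_band else "investigate"
        print(f"{row.name}: {row.rel_error:.1%} {status}")
```

Anything flagged "investigate" is the interesting case: either a dataset-cutoff difference to document, or a discrepancy worth reporting back.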