Abstract

A philosophical talk on why we published every signal definition, every regression coefficient, and every dataset under CC BY 4.0. The case is empirical, not ideological — open methodology compounds trust faster than closed methodology compounds margin.

Takeaways

✓Closed methodology compounds error; open methodology compounds trust

✓Why CC BY 4.0 is the right license, not Apache or MIT

✓How reproducibility filters our buyer base toward the right buyer

✓What a 'reproducibility audit' looks like and why we encourage it

Talk notes

Most alternative-data products treat methodology as the trade secret. We argue the opposite. Closed methodology compounds error because no one can challenge a hidden mistake. Open methodology compounds trust because every challenger sharpens the methodology. The trade-off is real — we give up some marginal pricing power because anyone can in principle reproduce our work. But we gain something larger: the buyer who can reproduce our regression is the buyer who trusts us most, and that buyer is also the buyer who churns the least and refers the most.

We chose CC BY 4.0 deliberately. Apache and MIT are software licenses — they're designed for code, not for datasets and methodology. CC BY 4.0 is built for documents, datasets, and reproducible research. It requires attribution but otherwise permits any use. That's the right shape for a measurement product where the goal is to maximize citation, replication, and stress-testing.

Reproducibility filters our buyer base toward the right buyer. A fund that wants secrecy will choose Harmonic, Affinity, or Tracxn — all of which charge €1,000+/month and offer no methodology disclosure. A fund that wants reproducibility will choose us. That self-selection is exactly the customer-acquisition pattern we want. We're not trying to win every fund. We're trying to win the methodology-first fund. Open-source is how we do that.

We encourage reproducibility audits. A handful of subscribers have published audits where they rebuilt our regression in their own notebook and compared their numbers to ours. Most of those audits found our numbers reproduce within a 2 percent error band — usually attributable to dataset-cutoff differences. The audits that surfaced larger discrepancies got incorporated into the methodology — a citizen-science quality-assurance loop that closed-source data products cannot match.

Abstract

Takeaways

✓Closed methodology compounds error; open methodology compounds trust

✓Why CC BY 4.0 is the right license, not Apache or MIT

✓How reproducibility filters our buyer base toward the right buyer

✓What a 'reproducibility audit' looks like and why we encourage it

Talk notes

Reproducibility as a Moat: Why We Open-Sourced Everything

Abstract

Takeaways

Talk notes

Cited in this talk

Reproducibility as a Moat: Why We Open-Sourced Everything

Abstract

Takeaways

Talk notes

Cited in this talk