Predicting Founder Success Without an LLM: An Interpretable Tree-Based Approach to VCBench

23 May 2026, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Predicting the success of startup founders from their professional profiles is a high-stakes task that has recently become tractable thanks to the public release of VCBench [Chen et al., 2025], a benchmark of 9,000 anonymised founder profiles. The current public leaderboard is dominated by large language models, which deliver state-of-the-art F0.5 scores at a substantial cost in money, compute, and interpretability. We ask whether a fully interpretable, freely reproducible tabular approach can close most of this gap. Starting from the structured JSON fields of the public VCBench split (4,500 founders, 9% positive rate), we engineer 42 features grouped into four tiers (prior exits, education, career, and industry) and benchmark four classical models (Logistic Regression, Random Forest, XGBoost, and LightGBM) under the same 6-fold cross-validation protocol as the original paper, with out-of-fold threshold tuning to optimise F0.5. Our best model, a Random Forest, reaches F0.5 = 0.246 (precision 25.1%, recall 23.5%) on the public split, on par with the structured-ML baselines on the leaderboard and with a precision roughly 13× that of the average venture-capital fund in the real world. SHAP analysis reveals that the predictions are driven by interpretable founder characteristics: alma mater prestige (QS world ranking), prior exits, exposure to large organisations, and industry alignment. Our approach incurs no API fees, runs end-to-end on a laptop in under two minutes, and is fully auditable. Current LLM-based competitors lack these three properties, which matter for regulated decision-making in venture capital.
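
The out-of-fold threshold tuning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`f_beta`, `precision_recall`, `tune_threshold`) and the toy labels and scores are hypothetical, and the sketch assumes the out-of-fold scores have already been produced by some cross-validated classifier. It uses the standard definition F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which with beta = 0.5 weights precision more heavily than recall.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score from precision and recall; beta < 1 favours precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def precision_recall(y_true, y_score, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def tune_threshold(y_true, oof_scores, beta=0.5):
    """Scan every distinct out-of-fold score as a candidate threshold and
    return the (threshold, F-beta) pair with the highest F-beta."""
    best_t, best_f = None, -1.0
    for t in sorted(set(oof_scores)):
        p, r = precision_recall(y_true, oof_scores, t)
        f = f_beta(p, r, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Toy data (hypothetical): 1 = successful founder, 0 = not.
y_true = [0, 0, 1, 0, 1, 1]
oof_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
best_t, best_f = tune_threshold(y_true, oof_scores)
```

Because the scores are out-of-fold, each founder's score comes from a model that never saw that founder during training, so the selected threshold is less prone to overfitting than one tuned on in-sample predictions.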
