Abstract
Predicting the success of startup founders from their professional profiles is a
high-stakes task that has recently become tractable thanks to the public release of
VCBench [Chen et al., 2025], a benchmark of 9,000 anonymised founder profiles.
The current public leaderboard is dominated by large language models, which
deliver state-of-the-art F0.5 scores at a substantial monetary, computational, and
interpretability cost. We ask whether a fully interpretable, freely reproducible
tabular approach can close most of this gap. Starting from the structured JSON
fields of the public VCBench split (4,500 founders, 9% positive rate), we engineer
42 features grouped into four tiers (prior exits, education, career, and industry) and
benchmark four classical models—Logistic Regression, Random Forest, XGBoost,
and LightGBM—under the same 6-fold cross-validation protocol as the original
paper, with out-of-fold threshold tuning to optimise F0.5. Our best model, a
Random Forest, reaches F0.5 = 0.246 (precision 25.1%, recall 23.5%) on the public
split, on par with the structured-ML baselines on the leaderboard and with roughly
13× the real-world precision of the average venture-capital fund. SHAP analysis
reveals that the predictions are driven by interpretable founder characteristics:
alma-mater prestige (QS world ranking), prior exits, exposure to large organisations,
and industry alignment. Our approach incurs no API fees, runs end-to-end
on a laptop in under two minutes, and is fully auditable—three properties that
current LLM-based competitors lack and that matter for regulated decision-making
in venture capital.
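
The F0.5 metric and the out-of-fold threshold tuning mentioned above can be illustrated with a minimal sketch. It assumes the standard F-beta definition with beta = 0.5 (precision weighted more heavily than recall) and a plain search over the observed scores. The function names and the toy data are hypothetical and are not taken from the paper's code or from VCBench.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom


def precision_recall(y_true, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def tune_threshold(y_true, oof_scores, beta=0.5):
    """Return (threshold, score) maximising F-beta over the observed scores."""
    best_t, best_f = None, -1.0
    for t in sorted(set(oof_scores)):
        p, r = precision_recall(y_true, oof_scores, t)
        f = f_beta(p, r, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Toy out-of-fold scores, made up for illustration (not VCBench data).
y = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
s = [0.05, 0.10, 0.80, 0.30, 0.35, 0.20, 0.60, 0.90, 0.15, 0.40]
threshold, score = tune_threshold(y, s)
print(f"best threshold = {threshold:.2f}, F0.5 = {score:.3f}")
```

In a cross-validated setting, `oof_scores` would hold, for each founder, the score produced by the model trained on the folds that exclude that founder, so the threshold is never chosen on predictions the model has already seen during fitting.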


