Howard Zeng

Hi, I'm Howard Zeng

Quantitative Researcher & ML Engineer

I build models that explain themselves, pipelines that don't break, and results you can defend.

$400M+ resi credit modeled
100M+ loan-level records
End-to-End Prepay · Default · Loss

About

I build and productionize loan-level prepayment / delinquency models and data pipelines across the full RMBS stack — CRT (STACR/CAS), MIR, Non-QM, Jumbo, and HELOC.

I own modules end-to-end — from factor design and diagnostics to daily risk refresh, tracking, and explainable outputs used in investment decisions.

I care about turning messy, high-dimensional data into models that actually ship — and about writing code the next person can read.

R (mgcv/bam GAM) · Python (Polars, XGBoost/LightGBM) · SQL (Redshift/SQL Server) · Jenkins/PowerShell · Ray

Previously at

Structured credit · Lending analytics · Asset pricing · Derivatives research

LibreMax Capital · Navy Federal Credit Union · Gravity Investments · Huatai Securities · China Securities

Skills

Structured Credit / Modeling

RMBS · CRT (STACR/CAS) · Non-QM · Jumbo · HELOC · Prepayment · Default/Loss · Scenario/Stress Testing

ML / Statistics

GAM (mgcv/bam) · XGBoost · LightGBM · Logistic Regression · Time Series · Imbalanced Classification · Walk-forward Validation

Data / Engineering

Python (Polars/Pandas) · C++ · R (Tidyverse) · SQL (Redshift/SQL Server) · Parquet · Jenkins · PowerShell · Automation/QC

Systems / Tooling

Git · Linux · Ray (distributed compute) · Azure · AWS · Teradata · Reproducible Configs · Model Versioning

Experience

2025
Feb 2025 - Present
LibreMax Capital

Quantitative Researcher

New York, NY

  • Production prepay/delinquency/loss models across the full resi stack — CRT (STACR/CAS), MIR, Non-QM, Jumbo, HELOC; factors stable under stress.
  • End-to-end pipeline: SQL extraction → feature engineering → GAM/XGBoost fitting → automated QC (Jenkins/Python) on loan-level data.
  • Own full model lifecycle — factor design through daily risk refresh — and deliver tracking outputs and risk explanations to PMs.
RMBS CRT Non-QM Jumbo Python SQL
2024
May 2024 - Aug 2024
Navy Federal Credit Union

Data Scientist Intern | Lending Analytics, Pricing Team

Vienna, VA

  • Built executive dashboard for a $6B equity-loan portfolio; surfaced pricing anomalies that informed portfolio strategy.
  • SQL/Python pipelines on Azure/Teradata processing 100M+ records with automated anomaly detection.
  • End-to-end delivery: KPI design → Plotly visualizations → presentation to senior leadership.
Python SQL Azure Pricing Dashboards
Jan 2024 - May 2024
Gravity Investments

Quantitative Research Engineer & Team Leader | Asset Pricing

Ithaca, NY

  • Led a 6-person team; built predictive models that outperformed the S&P 500 benchmark in backtests.
  • Migrated Temporal Fusion Transformers to AWS SageMaker, optimizing 20M+ record workloads.
  • Drove full pipeline architecture — feature engineering through deployment — as team lead.
Python PyTorch AWS SageMaker Time Series
2023
Jan 2023 - Aug 2023
WeiCepts LLC

Financial Data & Model Analyst Intern | Mortgage Modeling

Conroe, TX

  • Reduced dataset size by 91% while preserving predictive signal across ~9M mortgage records; strong holdout AUC.
  • SMOTE resampling + Optuna-tuned XGBoost/RF/Logistic classifiers with cross-validation.
  • End-to-end mortgage default modeling — data prep through model selection and evaluation.
Mortgage Python PostgreSQL XGBoost
2022
Nov 2022 - Mar 2023
UW Foster School of Business

Research Assistant

Seattle, WA

  • Supported deep demand estimation research with reliable data pipelines and model experiments.
  • Large-scale econometric dataset preparation and preliminary model evaluation.
  • Independently managed preprocessing and evaluation workflows.
Python Demand Estimation Econometrics
Sep 2022 - Dec 2022
Huatai Securities

Equity Research Intern | Research Institute (Hardware & Software)

Shanghai, China

  • Published equity research on database technology and virtual power plant sectors.
  • Analyzed 19,424 funds using R and Wind terminal data for sector analysis.
  • Owned full analysis from data collection through published research notes.
R Wind Equity Research
Feb 2022 - Apr 2022
China Securities

Quantitative Research Intern | Derivatives Trading

Beijing, China

  • Improved derivatives strategy Sharpe ratio by 17% through ARIMA/GBDT model refinement.
  • 11-year (2010–2021) Chinese futures dataset across multiple asset classes.
  • Designed rolling-window and walk-forward validation for cross-regime robustness.
Python Derivatives Time Series

Earlier Experience

UW Human Centered Design & Engineering

Research Assistant

Sep 2020 - Feb 2021

iRent

Mobile Full Stack Developer | Founder & Team Leader

May 2019 - Jan 2021

UW Information School

Research Assistant

Sep 2019 - Jan 2020

Education

Cornell University

2023 - 2024

MS Applied Statistics — Data Science

GPA: 4.08 / 4.3

Relevant Coursework

Large-Scale Machine Learning · Deep Learning · Natural Language Processing · Reinforcement Learning · Stochastic Processes · Statistical Modeling · Sequence Models · Convolutional Neural Networks · Machine Learning

University of Washington

2019 - 2022

BS Economics — Econometrics

Minor: Applied Math & Data Science

GPA: 3.6 / 4.0

Relevant Coursework

Econometric Theory & Applications · Causal Inference · Data Science for Pricing · Financial Economics · Database Systems (SQL) · Linear Algebra & Numerical Analysis · Differential Equations · Scientific Computing · Data Science in Python

Featured Case Studies

AlphaCycle — Stock Prediction Framework

Problem
How to build a reproducible, config-driven research stack for systematic equity signals?
Approach
Built a 3-layer framework (data, model, and report layers) covering data ingestion → feature store → model training → reporting, with walk-forward validation and regime labels.
Result
Clean separation of data/model/report layers; config-driven pipelines; evaluation with robust metrics across multiple horizons.
Python XGBoost Pipeline Walk-forward
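
For readers unfamiliar with walk-forward validation, here is a minimal sketch of how such splits can be generated. It is illustrative only and is not the AlphaCycle code: the function name `walk_forward_splits` and its parameters (`train_size`, `test_size`, `expanding`) are assumptions made for this example.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int,
    train_size: int,
    test_size: int,
    expanding: bool = True,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs in time order.

    Each test window starts right after its training window, so no
    future observation ever leaks into training.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    start = train_size
    while start + test_size <= n_samples:
        # Expanding: train on everything so far. Rolling: fixed-length window.
        train_start = 0 if expanding else start - train_size
        train_idx = list(range(train_start, start))
        test_idx = list(range(start, start + test_size))
        yield train_idx, test_idx
        start += test_size


# Hypothetical sizes: 10 time-ordered observations, 4 to train, 2 to test.
splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train_idx, test_idx in splits:
    print(f"train={train_idx[0]}..{train_idx[-1]}  test={test_idx[0]}..{test_idx[-1]}")
```

Setting `expanding=False` switches to a rolling window of fixed length, which can be a better fit when older regimes are no longer representative.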

Retail Sentiment Trading Signals

Problem
Can retail investor sentiment on social media predict short-term equity price movements?
Approach
Built an NLP pipeline with VADER and FinBERT to score Reddit/Twitter posts, then trained classifiers on sentiment-price lag features.
Result
Achieved statistically significant predictive signal for 1–3 day returns; backtested strategy returned ~12% annualized alpha on selected tickers.
Python NLP FinBERT Time Series
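
The sentiment-price lag features mentioned in the approach can be sketched roughly as follows. This is a simplified illustration, not the project code: it assumes sentiment has already been scored and aggregated to one value per day, and the helper `build_lag_features`, the column names, and the sample numbers are all made up for the example.

```python
from typing import Dict, List, Sequence


def build_lag_features(
    sentiment: Sequence[float],
    returns: Sequence[float],
    lags: Sequence[int] = (1, 2, 3),
) -> List[Dict[str, float]]:
    """Pair each day's return direction with sentiment from earlier days only."""
    if len(sentiment) != len(returns):
        raise ValueError("sentiment and returns must be the same length")
    max_lag = max(lags)
    rows: List[Dict[str, float]] = []
    # Start at max_lag so every lagged value exists.
    for t in range(max_lag, len(returns)):
        row = {f"sent_lag_{k}": sentiment[t - k] for k in lags}
        # Binary target: did the price go up on day t?
        row["target_up"] = 1.0 if returns[t] > 0 else 0.0
        rows.append(row)
    return rows


# Made-up daily values, for illustration only.
daily_sentiment = [0.10, -0.30, 0.50, 0.20, -0.10]
daily_returns = [0.004, -0.012, 0.007, 0.009, -0.003]

rows = build_lag_features(daily_sentiment, daily_returns)
for row in rows:
    print(row)
```

Using only sentiment from days before `t` keeps the features free of look-ahead, which matters when the target is a 1–3 day return.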

Handling Class Imbalance in Random Forest

Problem
Standard Random Forest classifiers degrade on imbalanced datasets common in fraud detection and credit risk.
Approach
Systematically compared SMOTE, ADASYN, Tomek links, cost-sensitive learning, and ensemble balancing across multiple imbalance ratios.
Result
Cost-sensitive RF with SMOTE achieved 15–20% F1 improvement over baseline on highly skewed datasets.
Python Scikit-learn SMOTE Classification
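
As a small illustration of the cost-sensitive side of this comparison, the sketch below computes inverse-frequency class weights (the same formula scikit-learn uses for its "balanced" heuristic) and an F1 score by hand. The function names and the 95/5 label split are assumptions for the example; the study's actual datasets and code are not reproduced here.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical skewed labels: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)

# A model that always predicts the majority class scores zero F1 on the minority.
always_negative = [0] * len(labels)
print(f1_score(labels, always_negative))
```

The second print shows why accuracy alone misleads on skewed data: the always-negative predictor is 95% accurate here yet has an F1 of zero on the minority class.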

Get In Touch

Open to Quant Research and ML Engineer roles.

haozhe76@outlook.com