Hi, I'm Howard Zeng
Quantitative Researcher & ML Engineer
I build models that explain themselves, pipelines that don't break, and results you can defend.
About
I build and productionize loan-level prepayment / delinquency models and data pipelines across the full RMBS stack — CRT (STACR/CAS), MIR, Non-QM, Jumbo, and HELOC.
I own modules end-to-end — from factor design and diagnostics to daily risk refresh, tracking, and explainable outputs used in investment decisions.
I care about turning messy, high-dimensional data into models that actually ship — and about writing code the next person can read.
Previously at
Structured credit · Lending analytics · Asset pricing · Derivatives research
Skills
Structured Credit / Modeling
ML / Statistics
Data / Engineering
Systems / Tooling
Experience
- Production prepay/delinquency/loss models across the full resi stack — CRT (STACR/CAS), MIR, Non-QM, Jumbo, HELOC; factors stable under stress.
- End-to-end pipeline: SQL extraction → feature engineering → GAM/XGBoost fitting → automated QC (Jenkins/Python) on loan-level data.
- Own full model lifecycle — factor design through daily risk refresh — and deliver tracking outputs and risk explanations to PMs.
- Built executive dashboard for a $6B equity-loan portfolio; surfaced pricing anomalies that informed portfolio strategy.
- Built SQL/Python pipelines on Azure/Teradata processing 100M+ records with automated anomaly detection.
- End-to-end delivery: KPI design → Plotly visualizations → presentation to senior leadership.
- Led a 6-person team; built predictive models that outperformed the S&P 500 benchmark in backtests.
- Migrated Temporal Fusion Transformers to AWS SageMaker, optimizing 20M+ record workloads.
- Drove full pipeline architecture — feature engineering through deployment — as team lead.
- Reduced dataset size by 91% while preserving predictive signal across ~9M mortgage records; strong holdout AUC.
- Applied SMOTE resampling and Optuna-tuned XGBoost/RF/logistic classifiers with cross-validation.
- End-to-end mortgage default modeling — data prep through model selection and evaluation.
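The model-selection step above can be sketched with stand-ins: the project used SMOTE resampling and Optuna-tuned XGBoost, but to keep this illustration dependency-free it compares scikit-learn's gradient boosting, random forest, and logistic regression by cross-validated AUC on a synthetic imbalanced dataset. All data, class ratios, and parameters here are toy assumptions, not the mortgage records.

```python
# Hedged sketch of cross-validated model comparison on imbalanced data.
# Synthetic data stands in for the ~9M loan-level records; sklearn models
# stand in for the Optuna-tuned XGBoost/RF/logistic classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# ~5% positive class, mimicking a rare-default setting (assumed ratio)
X, y = make_classification(n_samples=3000, n_features=12,
                           weights=[0.95, 0.05], random_state=0)

models = {
    "gbdt": GradientBoostingClassifier(random_state=0),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
    "logit": LogisticRegression(max_iter=1000),
}

# AUC is threshold-free, so it is a reasonable headline metric here
auc = {name: cross_val_score(m, X, y, cv=3, scoring="roc_auc").mean()
       for name, m in models.items()}
for name, score in auc.items():
    print(f"{name}: mean CV AUC = {score:.3f}")
```

In the real project, hyperparameter search (Optuna) and resampling (SMOTE) would wrap each candidate inside the cross-validation loop so the resampler never sees validation folds.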
- Supported deep demand estimation research with reliable data pipelines and model experiments.
- Large-scale econometric dataset preparation and preliminary model evaluation.
- Independently managed preprocessing and evaluation workflows.
- Published equity research on database technology and virtual power plant sectors.
- Analyzed 19,424 funds using R and Wind terminal data for sector analysis.
- Owned full analysis from data collection through published research notes.
- Improved derivatives strategy Sharpe ratio by 17% through ARIMA/GBDT model refinement.
- Assembled an 11-year (2010–2021) Chinese futures dataset spanning multiple asset classes.
- Designed rolling-window and walk-forward validation for cross-regime robustness.
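The rolling-window validation in the last bullet can be sketched as follows; the data, window lengths, and ridge model are illustrative stand-ins, not the futures dataset or the ARIMA/GBDT models actually used.

```python
# Illustrative rolling-window walk-forward validation: refit on a fixed-length
# training window, score on the next out-of-sample block, then roll forward.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
y = X @ np.array([0.5, -0.2, 0.1, 0.0]) + rng.normal(scale=0.5, size=n)

train_win, test_win = 200, 50   # assumed window lengths
scores = []
for start in range(0, n - train_win - test_win + 1, test_win):
    tr = slice(start, start + train_win)                     # training window
    te = slice(start + train_win, start + train_win + test_win)  # OOS block
    model = Ridge().fit(X[tr], y[tr])
    scores.append(mean_squared_error(y[te], model.predict(X[te])))

print(f"{len(scores)} walk-forward folds, mean OOS MSE = {np.mean(scores):.3f}")
```

Because each fold trains only on data that precedes its test block, performance is measured across regimes rather than on a single random split.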
Earlier Experience
iRent
Mobile Full Stack Developer | Founder & Team Leader
May 2019 - Jan 2021
Education
Cornell University
2023 - 2024
MS Applied Statistics — Data Science
GPA: 4.08 / 4.3
Relevant Coursework
University of Washington
2019 - 2022
BS Economics — Econometrics
Minor: Applied Math & Data Science
GPA: 3.6 / 4.0
Relevant Coursework
Featured Case Studies
AlphaCycle — Stock Prediction Framework
- Problem
- How to build a reproducible, config-driven research stack for systematic equity signals?
- Approach
- Built a three-layer framework (data / model / reporting) whose pipeline runs data ingestion → feature store → model training → reporting, with walk-forward validation and regime labels.
- Result
- Clean separation of data/model/report layers; config-driven pipelines; evaluation with robust metrics across multiple horizons.
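One way to sketch the "config-driven" idea: a single typed config object drives every layer, so a run is fully reproducible from one file. The names below (ResearchConfig, run_pipeline, the feature and horizon values) are hypothetical illustrations, not AlphaCycle's actual API.

```python
# Minimal sketch of a config-driven research pipeline: each layer reads only
# from the config, so two runs with the same config are identical.
from dataclasses import dataclass

@dataclass(frozen=True)
class ResearchConfig:
    universe: str = "sp500"                    # assumed universe name
    features: tuple = ("mom_12m", "vol_20d")   # illustrative feature names
    horizons: tuple = (1, 5, 21)               # evaluation horizons in days
    train_window: int = 252                    # walk-forward training window
    test_window: int = 21

def run_pipeline(cfg: ResearchConfig) -> dict:
    # Each layer's behavior is determined entirely by cfg (no hidden state).
    return {
        "data": f"ingest {cfg.universe}",
        "model": f"fit on {cfg.train_window}d window, features={cfg.features}",
        "report": f"evaluate horizons {cfg.horizons}",
    }

report = run_pipeline(ResearchConfig())
print(report["report"])
```

Freezing the dataclass keeps the config immutable during a run, which is one simple way to enforce the data/model/report separation described above.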
Retail Sentiment Trading Signals
- Problem
- Can retail investor sentiment on social media predict short-term equity price movements?
- Approach
- Built an NLP pipeline with VADER and FinBERT to score Reddit/Twitter posts, then trained classifiers on sentiment-price lag features.
- Result
- Achieved statistically significant predictive signal for 1–3 day returns; backtested strategy returned ~12% annualized alpha on selected tickers.
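The sentiment-price lag features mentioned above can be sketched with pandas: lagged sentiment scores become predictors and forward 1-3 day returns become targets, with shifts arranged so no look-ahead leaks in. The sentiment values here are random toy numbers, not VADER/FinBERT output, and no predictive claim is made.

```python
# Hedged sketch of lag-feature construction for a sentiment signal.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
idx = pd.bdate_range("2024-01-01", periods=60)
df = pd.DataFrame({
    "sentiment": rng.uniform(-1, 1, len(idx)),  # stand-in for post scores
    "close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(idx)))),
}, index=idx)

# Forward returns over 1-3 day horizons are the prediction targets;
# shift(-h) aligns each day's row with the return realized h days later.
for h in (1, 2, 3):
    df[f"fwd_ret_{h}d"] = df["close"].pct_change(h).shift(-h)

# Lagged sentiment is the feature set: only information available at
# decision time (yesterday and earlier) predicts tomorrow's return.
for lag in (1, 2):
    df[f"sent_lag{lag}"] = df["sentiment"].shift(lag)

df = df.dropna()
print(df.columns.tolist())
```

From here, any classifier can be trained on the `sent_lag*` columns against the sign of `fwd_ret_*`, and backtested fold-by-fold in time order.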
Handling Class Imbalance in Random Forest
- Problem
- Standard Random Forest classifiers degrade on imbalanced datasets common in fraud detection and credit risk.
- Approach
- Systematically compared SMOTE, ADASYN, Tomek links, cost-sensitive learning, and ensemble balancing across multiple imbalance ratios.
- Result
- Cost-sensitive RF with SMOTE achieved 15–20% F1 improvement over baseline on highly skewed datasets.
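The cost-sensitive variant can be sketched with scikit-learn alone: `class_weight="balanced"` reweights errors on the minority class, compared against a default forest by F1 on a skewed synthetic dataset. The study also used SMOTE/ADASYN/Tomek links (via imbalanced-learn), omitted here to keep the sketch dependency-free; the data and any score gap below are illustrative, not the reported 15-20% result.

```python
# Sketch: cost-sensitive random forest vs. default forest on imbalanced data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# ~3% positives, an assumed ratio mimicking fraud/credit-risk settings
X, y = make_classification(n_samples=4000, n_features=10,
                           weights=[0.97, 0.03], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

baseline = RandomForestClassifier(n_estimators=200,
                                  random_state=0).fit(Xtr, ytr)
# class_weight="balanced" scales sample weights inversely to class frequency,
# so minority-class mistakes cost more during tree construction
weighted = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                  random_state=0).fit(Xtr, ytr)

f1_base = f1_score(yte, baseline.predict(Xte))
f1_weighted = f1_score(yte, weighted.predict(Xte))
print(f"baseline F1={f1_base:.3f}, cost-sensitive F1={f1_weighted:.3f}")
```

In the full comparison, each rebalancing technique would be evaluated the same way across several imbalance ratios, with resampling applied inside the training folds only.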