tailored_resume_v2 / art_Wk_LQdqx3Dg

role

togetherai / Forward Deployed Engineer (Inference & Post-Training)

model

anthropic/claude-sonnet-4.6

created

2026-06-08T19:11


What changed for togetherai

change	why it matters
Summary rewritten to lead with 'Forward Deployed Engineer' identity and RL Workbench as primary proof point	Mirrors the JD's role title, and post-training pipeline expertise (GRPO/DPO/RLHF) is a core hard requirement; leading with both directly matches the ideal candidate profile
Projects section moved before Experience	The RL Workbench and aeval projects are the strongest direct proof of inference/post-training expertise — more relevant than any single work role for this JD
RL Workbench reordered to lead the projects section	It is the single most relevant credential: 12 RL algorithms, multi-framework benchmarking (TRL/VeRL/OpenRLHF/NeMo RL), GPU Docker passthrough, Apple Silicon MPS + CUDA — maps to nearly every hard requirement
Intuit lead bullet reframed around 50K TPS / sub-25ms TP99 / 675M engagements as 'production inference infrastructure'	JD requires hitting throughput and latency targets in production; this is the strongest enterprise-scale proof point on the resume
Splunk lead bullet reframed around 10x performance improvement and 'winning critical POCs and benchmarks'	JD explicitly calls out winning critical POCs and benchmarks as a core responsibility
Fintellect lead bullet reframed to emphasize multi-provider LLM orchestration and model landscape awareness	JD prefers broad knowledge of open-source models and judgment on model selection; multi-provider orchestration with fallback routing demonstrates both
BRAIN project bullets consolidated to include NeurIPS publication inline	Space optimization; NeurIPS credential is important for Together AI's research-driven culture but doesn't need a standalone entry
Kaiser Permanente condensed to 1 bullet	Low relevance to inference/post-training role; retained for career continuity and Redis/scale signal
Bank of America retained as 1 bullet	Completeness and Monte Carlo / quantitative analysis signal; minimal space cost
Streamio OpenClaw and MCP SDK bullets moved to lead the Streamio role	LLM orchestration and production AI deployment are the most relevant signals for Together AI's customer-facing AI platform context

JD analysis (20 key phrases)

Key phrases: inference engine optimization; post-training pipelines; KV cache tuning; speculative decoding; tensor parallelism; quantization strategy; LoRA, SFT, DPO, RLHF, GRPO; throughput and latency targets; forward deployed engineer; production AI teams; open-source LLM deployment; fine-tuning pipelines; strategic customer alignment; time-to-value; product feedback loop; benchmarking; GPU passthrough; Apple Silicon; framework benchmarking; hands-on RL training runs

Hard requirements:

Inference engine expertise (vLLM, TensorRT-LLM, SGLang)
KV cache tuning, speculative decoding, tensor parallelism, quantization
Post-training pipelines: LoRA, SFT, DPO, RLHF, GRPO
Strong Python skills in production environments
Open-source LLM deployment experience
Customer-facing technical partnership / strategic account management

Preferred qualifications:

Model landscape awareness (open-source SOTA)
Hardware-aware inference optimization (GPU, Apple Silicon, CUDA)
Product feedback loop / roadmap influence
Benchmarking and POC execution
Fine-tuning pipeline design from experimentation to production

Per-role mapping (10 roles scored)

role	score	reframe angle	JD phrases that map
RL Workbench — Post-Training RL Platform	5/5	Lead project — direct proof of GRPO/DPO/RLHF pipeline expertise and multi-framework inference benchmarking	GRPO, DPO, RLHF, post-training pipelines, benchmarking, throughput and latency targets, GPU passthrough, Apple Silicon, hands-on RL training runs, framework benchmarking
aeval — AI Model Evaluation Platform	4/5	Model evaluation infrastructure — maps to model landscape awareness and production quality gates	open-source LLM deployment, production environments, benchmarking, throughput and latency targets
Intuit — Staff PM Developer Frameworks & Platform Infrastructure	4/5	Platform infrastructure at scale — throughput/latency optimization, developer tooling, and strategic onboarding	throughput and latency targets, time-to-value, production environments, strategic customer alignment, product feedback loop, opinionated onboarding
Streamio AI — Founder & CEO	3/5	Production AI deployment and multi-agent orchestration — demonstrates hands-on LLM integration in production	production AI teams, open-source LLM deployment, hands-on
Fintellect AI — Founder & CEO	3/5	Multi-provider LLM orchestration and production AI deployment	open-source LLM deployment, production AI teams, model landscape awareness
BRAIN — Protein Structure Prediction ML Platform	4/5	Deep ML research credentials — NeurIPS publication, transformer architectures, production ML serving	post-training pipelines, open-source LLM deployment, production environments, hands-on
Splunk — Senior PM Search Orchestration	2/5	Performance optimization and distributed systems — condense to 2 bullets	throughput and latency targets, production environments
Kaiser Permanente — SOA Technical PM	1/5	Enterprise infrastructure scale — condense to 1 bullet	—
IBM — Software Engineer	1/5	Keep 1 bullet for completeness	—
AutoEval — Automated Visual Evaluation for Robot Model Training	3/5	Automated model evaluation — maps to model quality and production deployment validation	open-source LLM deployment, production AI teams, benchmarking

Tailored summary

Forward Deployed Engineer and Technical Product Leader with 12+ years in production AI systems, from hand-coding BPTT in C++ (2004) to building a full RLHF/DPO/GRPO post-training workbench today that benchmarks TRL, VeRL, OpenRLHF, and NeMo RL across Apple Silicon (MPS) and CUDA. Hands-on expertise in post-training pipelines (PPO, GRPO, DPO, RLHF, SFT), inference optimization, and open-source LLM deployment at scale. Scaled production inference infrastructure at Intuit to 675M+ engagements and 50K TPS with sub-25ms TP99; NeurIPS-published researcher in neural architectures.