togetherai / Forward Deployed Engineer (Inference & Post-Training)
tailored_resume_v2 / art_Wk_LQdqx3Dg
model: anthropic/claude-sonnet-4.6
created: 2026-06-08T19:11
What changed for togetherai
| change | why it matters |
|---|---|
| Summary rewritten to lead with 'Forward Deployed Engineer' identity and RL Workbench as primary proof point | The JD's role title is Forward Deployed Engineer and post-training pipeline expertise (GRPO/DPO/RLHF) is a hard requirement; leading with both directly mirrors the ideal candidate profile |
| Projects section moved before Experience | The RL Workbench and aeval projects are the strongest direct proof of inference/post-training expertise — more relevant than any single work role for this JD |
| RL Workbench reordered to lead the projects section | It is the single most relevant credential: 12 RL algorithms, multi-framework benchmarking (TRL/VeRL/OpenRLHF/NeMo RL), GPU Docker passthrough, Apple Silicon MPS + CUDA — maps to nearly every hard requirement |
| Intuit lead bullet reframed around 50K TPS / sub-25ms TP99 / 675M engagements as 'production inference infrastructure' | JD requires hitting throughput and latency targets in production; this is the strongest enterprise-scale proof point on the resume |
| Splunk lead bullet reframed around 10x performance improvement and 'winning critical POCs and benchmarks' | JD explicitly calls out winning critical POCs and benchmarks as a core responsibility |
| Fintellect lead bullet reframed to emphasize multi-provider LLM orchestration and model landscape awareness | JD requires broad knowledge of open-source models and judgment on model selection; multi-provider orchestration with fallback routing demonstrates this |
| BRAIN project bullets consolidated to include NeurIPS publication inline | Space optimization; NeurIPS credential is important for Together AI's research-driven culture but doesn't need a standalone entry |
| Kaiser Permanente condensed to 1 bullet | Low relevance to inference/post-training role; retained for career continuity and Redis/scale signal |
| Bank of America retained as 1 bullet | Completeness and Monte Carlo / quantitative analysis signal; minimal space cost |
| Streamio OpenClaw and MCP SDK bullets moved to lead the Streamio role | LLM orchestration and production AI deployment are most relevant to Together AI's customer-facing AI platform context |
JD analysis (20 key phrases)
Key phrases: inference engine optimization; post-training pipelines; KV cache tuning; speculative decoding; tensor parallelism; quantization strategy; LoRA, SFT, DPO, RLHF, GRPO; throughput and latency targets; forward deployed engineer; production AI teams; open-source LLM deployment; fine-tuning pipelines; strategic customer alignment; time-to-value; product feedback loop; benchmarking; GPU passthrough; Apple Silicon; framework benchmarking; hands-on RL training runs
Hard requirements:
- Inference engine expertise (vLLM, TensorRT-LLM, SGLang)
- KV cache tuning, speculative decoding, tensor parallelism, quantization
- Post-training pipelines: LoRA, SFT, DPO, RLHF, GRPO
- Strong Python skills in production environments
- Open-source LLM deployment experience
- Customer-facing technical partnership / strategic account management
Preferred qualifications:
- Model landscape awareness (open-source SOTA)
- Hardware-aware inference optimization (GPU, Apple Silicon, CUDA)
- Product feedback loop / roadmap influence
- Benchmarking and POC execution
- Fine-tuning pipeline design from experimentation to production
Per-role mapping (10 roles scored)
| role | score | reframe angle | JD phrases that map |
|---|---|---|---|
| RL Workbench — Post-Training RL Platform | 5/5 | Lead project — direct proof of GRPO/DPO/RLHF pipeline expertise and multi-framework inference benchmarking | GRPO, DPO, RLHF, post-training pipelines, benchmarking, throughput and latency targets, GPU passthrough, Apple Silicon, hands-on RL training runs, framework benchmarking |
| aeval — AI Model Evaluation Platform | 4/5 | Model evaluation infrastructure — maps to model landscape awareness and production quality gates | open-source LLM deployment, production environments, benchmarking, throughput and latency targets |
| Intuit — Staff PM Developer Frameworks & Platform Infrastructure | 4/5 | Platform infrastructure at scale — throughput/latency optimization, developer tooling, and strategic onboarding | throughput and latency targets, time-to-value, production environments, strategic customer alignment, product feedback loop, opinionated onboarding |
| Streamio AI — Founder & CEO | 3/5 | Production AI deployment and multi-agent orchestration — demonstrates hands-on LLM integration in production | production AI teams, open-source LLM deployment, hands-on |
| Fintellect AI — Founder & CEO | 3/5 | Multi-provider LLM orchestration and production AI deployment | open-source LLM deployment, production AI teams, model landscape awareness |
| BRAIN — Protein Structure Prediction ML Platform | 4/5 | Deep ML research credentials — NeurIPS publication, transformer architectures, production ML serving | post-training pipelines, open-source LLM deployment, production environments, hands-on |
| Splunk — Senior PM Search Orchestration | 2/5 | Performance optimization and distributed systems — condense to 2 bullets | throughput and latency targets, production environments |
| Kaiser Permanente — SOA Technical PM | 1/5 | Enterprise infrastructure scale — condense to 1 bullet | — |
| IBM — Software Engineer | 1/5 | Keep 1 bullet for completeness | — |
| AutoEval — Automated Visual Evaluation for Robot Model Training | 3/5 | Automated model evaluation — maps to model quality and production deployment validation | open-source LLM deployment, production AI teams, benchmarking |
Tailored summary
Forward Deployed Engineer and Technical Product Leader with 12+ years in production AI systems, from hand-coding BPTT in C++ (2004) to building a full RLHF/DPO/GRPO post-training workbench that today benchmarks TRL, VeRL, OpenRLHF, and NeMo RL across Apple Silicon (MPS) and CUDA. Hands-on expertise in post-training pipelines (PPO, GRPO, DPO, RLHF, SFT), inference optimization, and open-source LLM deployment at scale. Scaled production inference infrastructure to 675M+ engagements and 50K TPS with sub-25ms TP99 at Intuit; NeurIPS-published researcher in neural architectures.