RLHF and preference alignment. Includes TRL (SFT, DPO, PPO, GRPO), GRPO (Group Relative Policy Optimization), OpenRLHF (Ray+vLLM acceleration), and SimPO (reference-free alignment). Use when aligning models with human preferences or training reward models.
/plugin marketplace add zechenzhangAGI/AI-research-SKILLs
/plugin install post-training@ai-research-skills
Expert guidance for Next.js Cache Components and Partial Prerendering (PPR). Proactively activates in projects with cacheComponents enabled.
Adds educational insights about implementation choices and codebase patterns (mimics the deprecated Explanatory output style)
Easily create hooks to prevent unwanted behaviors by analyzing conversation patterns
Frontend design skill for UI/UX implementation