# slime

## Docs

- [Fault Tolerance](https://mintlify.wiki/THUDM/slime/advanced/fault-tolerance.md): Ensure long-term stable RL training with automatic failure detection and recovery
- [Low Precision Training](https://mintlify.wiki/THUDM/slime/advanced/low-precision.md): Train models efficiently with FP8 and INT4 quantization for improved throughput and reduced memory usage
- [On-Policy Distillation](https://mintlify.wiki/THUDM/slime/advanced/on-policy-distillation.md): Enable student models to learn from larger teacher models during RL training with on-policy distillation
- [Reproducibility](https://mintlify.wiki/THUDM/slime/advanced/reproducibility.md): Achieve bitwise deterministic experiment reproduction by combining SGLang deterministic inference with Megatron-LM deterministic mode
- [Slime Router](https://mintlify.wiki/THUDM/slime/advanced/slime-router.md): Lightweight HTTP router with training-oriented capabilities including radix-tree caching and rollout routing replay for MoE models
- [Speculative Decoding](https://mintlify.wiki/THUDM/slime/advanced/speculative-decoding.md): Accelerate rollout inference using speculative decoding with draft models and online training for RL
- [Arguments API](https://mintlify.wiki/THUDM/slime/api/arguments.md): Complete argument reference for slime configuration
- [Backends API](https://mintlify.wiki/THUDM/slime/api/backends.md): Training backend implementations - Megatron, FSDP, and SGLang integration
- [Data Structures API](https://mintlify.wiki/THUDM/slime/api/data-structures.md): Core data types for samples, batches, and rollout outputs
- [Logging API](https://mintlify.wiki/THUDM/slime/api/logging.md): Logging configuration, tracking initialization, and metrics reporting
- [Rollout API](https://mintlify.wiki/THUDM/slime/api/rollout.md): Rollout generation, data sampling, and inference management APIs
- [Router API](https://mintlify.wiki/THUDM/slime/api/router.md): SlimeRouter for text-based inference routing and middleware
- [Training API](https://mintlify.wiki/THUDM/slime/api/training.md): Core training functions and model initialization for slime RL framework
- [RL Algorithms](https://mintlify.wiki/THUDM/slime/concepts/algorithms.md): Understanding GRPO, PPO, GSPO, and Reinforce++ in slime
- [Architecture Overview](https://mintlify.wiki/THUDM/slime/concepts/architecture.md): Understanding slime's three-module design for RL scaling
- [Rollout & Reward](https://mintlify.wiki/THUDM/slime/concepts/rollout-and-reward.md): Understanding rollout generation and reward model evaluation in slime
- [Training Loop](https://mintlify.wiki/THUDM/slime/concepts/training-loop.md): Understanding the Data Sampling → Weight Update cycle in slime
- [DeepSeek-R1 with 128×H100](https://mintlify.wiki/THUDM/slime/examples/deepseek-r1.md): Train DeepSeek-R1 (671B MoE) using 128 H100 GPUs with FP8 inference, dynamic sampling, and deepep
- [GLM-4.5 (355B MoE) with 64×H100](https://mintlify.wiki/THUDM/slime/examples/glm4-5-355b-a32b.md): Train GLM-4.5-355B-A32B using 64 H100 GPUs with advanced parallelism and optional FP8 rollout
- [GLM-4.7-Flash (30B MoE) with 8×H100](https://mintlify.wiki/THUDM/slime/examples/glm4-7-30b-a3b.md): Train GLM-4.7-Flash Mixture-of-Experts model with CPU Adam optimization and MTP speculative decoding
- [GLM4-9B with 8×H100](https://mintlify.wiki/THUDM/slime/examples/glm4-9b.md): Train GLM-Z1-9B with GRPO reinforcement learning using 8 H100 GPUs
- [Projects Built with slime](https://mintlify.wiki/THUDM/slime/examples/projects-built-with-slime.md): Explore research projects and production systems powered by slime
- [Qwen3-30B-A3B with 8×H100](https://mintlify.wiki/THUDM/slime/examples/qwen3-30b-a3b.md): Train Qwen3-30B-A3B Mixture-of-Experts model with CPU Adam and optional FP8 inference
- [Qwen3-4B with 8×H100](https://mintlify.wiki/THUDM/slime/examples/qwen3-4b.md): Train Qwen3-4B with GRPO reinforcement learning using 8 H100 GPUs in co-located mode
- [Configuration Guide](https://mintlify.wiki/THUDM/slime/guides/configuration.md): Complete reference for slime training configuration parameters
- [Customization Guide](https://mintlify.wiki/THUDM/slime/guides/customization.md): Extend slime with custom generation functions, reward models, and filters
- [Distributed Training](https://mintlify.wiki/THUDM/slime/guides/distributed-training.md): Set up Ray clusters and scale slime training across multiple nodes
- [Multi-Turn Agent Training](https://mintlify.wiki/THUDM/slime/guides/multi-turn-agents.md): Build and train agents with tool calling and multi-turn interactions
- [Installation](https://mintlify.wiki/THUDM/slime/installation.md): Set up slime using Docker or conda for single-node or multi-node training
- [Introduction to slime](https://mintlify.wiki/THUDM/slime/introduction.md): An LLM post-training framework for RL scaling with high-performance training and flexible data generation
- [AMD GPU Support](https://mintlify.wiki/THUDM/slime/platform/amd-tutorial.md): Setup and configuration guide for running slime on AMD Instinct GPUs (MI300/MI325)
- [Beyond Megatron-LM](https://mintlify.wiki/THUDM/slime/platform/beyond-megatron.md): Support for alternative training backends including FSDP for flexible model architectures
- [Quick Start](https://mintlify.wiki/THUDM/slime/quickstart.md): Get started with slime in under an hour - from environment setup to running your first training job
- [Changelog](https://mintlify.wiki/THUDM/slime/resources/changelog.md): Version history and release notes for slime
- [Contributing](https://mintlify.wiki/THUDM/slime/resources/contributing.md): Guidelines for contributing to slime
- [FAQ](https://mintlify.wiki/THUDM/slime/resources/faq.md): Frequently asked questions about slime
- [Usage Guide](https://mintlify.wiki/THUDM/slime/usage.md): Complete reference for slime command-line parameters, training configurations, and advanced features

## OpenAPI Specs

- [openapi](https://mintlify.wiki/THUDM/slime/api-reference/openapi.json)