# Available Models FreeInference provides access to multiple state-of-the-art LLM models for coding agents and IDEs. ## Model Overview | Model ID | Name | Context Length | Max Output | Features | |----------|------|----------------|------------|----------| | `llama-3.3-70b-instruct` | Llama 3.3 70B Instruct | 131K tokens | 8K tokens | Function calling, Structured output | | `llama-4-scout` | Llama 4 Scout | 128K tokens | 16K tokens | Function calling, Structured output | | `llama-4-maverick` | Llama 4 Maverick | 128K tokens | 16K tokens | Function calling, Structured output, Multimodal (text+image) | | `glm-4.5` | GLM-4.5 | 128K tokens | 96K tokens | Function calling, Structured output, Bilingual (Chinese/English) | | `glm-4.5-air` | GLM-4.5-Air | 128K tokens | 96K tokens | Function calling, Structured output, Bilingual (Chinese/English) | | `glm-4.6` | GLM-4.6 | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode | | `deepseek-r1` | DeepSeek R1 | 64K tokens | 8K tokens | Function calling, Structured output | | `qwen3-coder-30b` | Qwen3 Coder 30B | 32K tokens | 8K tokens | Function calling, Structured output | | `minimax-m2` | MiniMax M2 | 196K tokens | 8K tokens | Function calling, Structured output | --- ## Model Details ### Llama 3.3 70B Instruct **Model ID:** `llama-3.3-70b-instruct` - Context length: 131,072 tokens - Max output: 8,192 tokens - Quantization: bf16 - Input modalities: text - Output modalities: text - Function calling: Yes - Structured output: Yes --- ### Llama 4 Scout **Model ID:** `llama-4-scout` - Context length: 128,000 tokens - Max output: 16,384 tokens - Quantization: fp8 - Input modalities: text - Output modalities: text - Function calling: Yes - Structured output: Yes --- ### Llama 4 Maverick **Model ID:** `llama-4-maverick` - Context length: 128,000 tokens - Max output: 16,384 tokens - Quantization: fp8 - Input modalities: text, image - Output modalities: text - Function calling: Yes - Structured output: Yes --- ### GLM-4.5 **Model ID:** `glm-4.5` - Context length: 128,000 tokens - Max output: 96,000 tokens - Quantization: fp8 - Input modalities: text - Output modalities: text - Language support: Chinese, English - Function calling: Yes - Structured output: Yes --- ### GLM-4.5-Air **Model ID:** `glm-4.5-air` - Context length: 128,000 tokens - Max output: 96,000 tokens - Quantization: fp8 - Input modalities: text - Output modalities: text - Language support: Chinese, English - Function calling: Yes - Structured output: Yes --- ### GLM-4.6 **Model ID:** `glm-4.6` - Context length: 200,000 tokens - Max output: 128,000 tokens - Quantization: fp8 - Input modalities: text - Output modalities: text - Language support: Chinese, English - Function calling: Yes - Structured output: Yes - Thinking mode: Yes - Tool streaming: Yes --- ### DeepSeek R1 **Model ID:** `deepseek-r1` - Context length: 64,000 tokens - Max output: 8,000 tokens - Quantization: bf16 - Input modalities: text - Output modalities: text - Function calling: Yes - Structured output: Yes --- ### Qwen3 Coder 30B **Model ID:** `qwen3-coder-30b` - Context length: 32,768 tokens - Max output: 8,192 tokens - Quantization: bf16 - Input modalities: text - Output modalities: text - Function calling: Yes - Structured output: Yes --- ### MiniMax M2 **Model ID:** `minimax-m2` - Context length: 196,608 tokens - Max output: 8,192 tokens - Quantization: bf16 - Input modalities: text - Output modalities: text - Function calling: Yes - Structured output: Yes --- ## Switching Models To use different models, change the model name in your IDE configuration: **Cursor:** Select from the dropdown in settings **Codex:** Edit `~/.codex/config.toml`: ```toml model = "glm-4.6" # Change to any model ID ``` **Roo Code / Kilo Code:** Select from the dropdown in extension settings