# Available Models

FreeInference provides access to multiple state-of-the-art large language models for coding agents and IDEs.
## Model Overview

| Model ID | Name | Context Length | Max Output | Features |
|---|---|---|---|---|
| `llama-3.3-70b-instruct` | Llama 3.3 70B Instruct | 131K tokens | 8K tokens | Function calling, Structured output |
| `llama-4-scout` | Llama 4 Scout | 128K tokens | 16K tokens | Function calling, Structured output |
| `llama-4-maverick` | Llama 4 Maverick | 128K tokens | 16K tokens | Function calling, Structured output, Multimodal (text+image) |
| `glm-4.5` | GLM-4.5 | 128K tokens | 96K tokens | Function calling, Structured output, Bilingual (Chinese/English) |
| `glm-4.5-air` | GLM-4.5-Air | 128K tokens | 96K tokens | Function calling, Structured output, Bilingual (Chinese/English) |
| `glm-4.6` | GLM-4.6 | 200K tokens | 128K tokens | Function calling, Structured output, Bilingual (Chinese/English), Thinking mode |
| `deepseek-r1` | DeepSeek R1 | 64K tokens | 8K tokens | Function calling, Structured output |
| `qwen3-coder-30b` | Qwen3 Coder 30B | 32K tokens | 8K tokens | Function calling, Structured output |
| `minimax-m2` | MiniMax M2 | 196K tokens | 8K tokens | Function calling, Structured output |
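Because the models' limits overlap, it can help to encode the table and filter it programmatically, for example when an agent needs a minimum context window. The sketch below is illustrative only — the `MODELS` dict and `pick_model` helper are not part of any FreeInference SDK; the numbers mirror the overview table above.

```python
# Published limits from the model overview table (tokens).
MODELS = {
    "llama-3.3-70b-instruct": {"context": 131_072, "max_output": 8_192},
    "llama-4-scout":          {"context": 128_000, "max_output": 16_384},
    "llama-4-maverick":       {"context": 128_000, "max_output": 16_384},
    "glm-4.5":                {"context": 128_000, "max_output": 96_000},
    "glm-4.5-air":            {"context": 128_000, "max_output": 96_000},
    "glm-4.6":                {"context": 200_000, "max_output": 128_000},
    "deepseek-r1":            {"context": 64_000,  "max_output": 8_000},
    "qwen3-coder-30b":        {"context": 32_768,  "max_output": 8_192},
    "minimax-m2":             {"context": 196_608, "max_output": 8_192},
}

def pick_model(min_context: int, min_output: int) -> list[str]:
    """Return the model IDs whose limits meet both minimums."""
    return [
        model_id
        for model_id, limits in MODELS.items()
        if limits["context"] >= min_context
        and limits["max_output"] >= min_output
    ]

# Only GLM-4.6 offers both 150K+ context and 16K+ output.
print(pick_model(min_context=150_000, min_output=16_000))  # → ['glm-4.6']
```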
## Model Details

### Llama 3.3 70B Instruct

- **Model ID**: `llama-3.3-70b-instruct`
- **Context length**: 131,072 tokens
- **Max output**: 8,192 tokens
- **Quantization**: bf16
- **Input modalities**: text
- **Output modalities**: text
- **Function calling**: Yes
- **Structured output**: Yes

### Llama 4 Scout

- **Model ID**: `llama-4-scout`
- **Context length**: 128,000 tokens
- **Max output**: 16,384 tokens
- **Quantization**: fp8
- **Input modalities**: text
- **Output modalities**: text
- **Function calling**: Yes
- **Structured output**: Yes

### Llama 4 Maverick

- **Model ID**: `llama-4-maverick`
- **Context length**: 128,000 tokens
- **Max output**: 16,384 tokens
- **Quantization**: fp8
- **Input modalities**: text, image
- **Output modalities**: text
- **Function calling**: Yes
- **Structured output**: Yes

### GLM-4.5

- **Model ID**: `glm-4.5`
- **Context length**: 128,000 tokens
- **Max output**: 96,000 tokens
- **Quantization**: fp8
- **Input modalities**: text
- **Output modalities**: text
- **Language support**: Chinese, English
- **Function calling**: Yes
- **Structured output**: Yes

### GLM-4.5-Air

- **Model ID**: `glm-4.5-air`
- **Context length**: 128,000 tokens
- **Max output**: 96,000 tokens
- **Quantization**: fp8
- **Input modalities**: text
- **Output modalities**: text
- **Language support**: Chinese, English
- **Function calling**: Yes
- **Structured output**: Yes

### GLM-4.6

- **Model ID**: `glm-4.6`
- **Context length**: 200,000 tokens
- **Max output**: 128,000 tokens
- **Quantization**: fp8
- **Input modalities**: text
- **Output modalities**: text
- **Language support**: Chinese, English
- **Function calling**: Yes
- **Structured output**: Yes
- **Thinking mode**: Yes
- **Tool streaming**: Yes

### DeepSeek R1

- **Model ID**: `deepseek-r1`
- **Context length**: 64,000 tokens
- **Max output**: 8,000 tokens
- **Quantization**: bf16
- **Input modalities**: text
- **Output modalities**: text
- **Function calling**: Yes
- **Structured output**: Yes

### Qwen3 Coder 30B

- **Model ID**: `qwen3-coder-30b`
- **Context length**: 32,768 tokens
- **Max output**: 8,192 tokens
- **Quantization**: bf16
- **Input modalities**: text
- **Output modalities**: text
- **Function calling**: Yes
- **Structured output**: Yes

### MiniMax M2

- **Model ID**: `minimax-m2`
- **Context length**: 196,608 tokens
- **Max output**: 8,192 tokens
- **Quantization**: bf16
- **Input modalities**: text
- **Output modalities**: text
- **Function calling**: Yes
- **Structured output**: Yes
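Every model above supports function calling. Assuming FreeInference accepts the OpenAI-compatible Chat Completions request shape that most coding-agent integrations expect (an assumption — confirm against the provider's API reference), a tool is declared in the request payload like this. The `run_tests` function and its parameters are made-up examples:

```python
import json

# Hypothetical tool declaration in OpenAI-compatible schema
# (assumption: FreeInference accepts this request shape).
tools = [
    {
        "type": "function",
        "function": {
            "name": "run_tests",  # illustrative, not a real API
            "description": "Run the project's test suite and return the result.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Test file or directory to run.",
                    },
                },
                "required": ["path"],
            },
        },
    }
]

payload = {
    "model": "qwen3-coder-30b",  # any model ID from the table above
    "messages": [
        {"role": "user", "content": "Run the tests in tests/unit."}
    ],
    "tools": tools,
}

# The payload must be JSON-serializable before it is sent.
print(json.dumps(payload, indent=2))
```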
## Switching Models

To use a different model, change the model name in your IDE configuration:

- **Cursor**: select the model from the dropdown in settings.
- **Codex**: edit `~/.codex/config.toml`:

  ```toml
  model = "glm-4.6"  # change to any model ID
  ```

- **Roo Code / Kilo Code**: select the model from the dropdown in the extension settings.
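For Codex specifically, a fuller `config.toml` sketch might also register the provider explicitly. The `model_providers` table below follows the Codex CLI config format, but the provider ID, base URL, and environment-variable name are placeholders, not values documented on this page — substitute the ones from your FreeInference account:

```toml
model = "glm-4.6"                 # any model ID from the table above
model_provider = "freeinference"  # placeholder provider ID

[model_providers.freeinference]
name = "FreeInference"
base_url = "https://example.invalid/v1"   # placeholder, not the real endpoint
env_key = "FREEINFERENCE_API_KEY"         # assumed env var holding your key
```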