Page Pulse
Qwen models in brief: the family spans text, code, vision, and audio branches across multiple generations (Qwen through Qwen2.5 and beyond); each generation ships a wide parameter sweep from ~0.5B to 100B+; base and instruction-tuned variants ship at each size; official weights live on the Qwen Hugging Face organisation; version naming follows a consistent pattern once you know the conventions.
How the Qwen model family is organised
The Qwen family tree has four main branches — text, code, vision, and audio — each with multiple generations and a wide parameter sweep at each release.
The Qwen model family has grown substantially since the first public release. Understanding the full set requires a mental model of four orthogonal axes: the generation (the original Qwen through Qwen2.5 and later), the branch (text, code, vision, audio), the parameter size (roughly 0.5B through 100B+), and the training stage (base versus instruction-tuned). Any given model release sits at a specific combination of those four axes.
The generation axis is the most visible in the naming. Qwen2.5 is a more capable and more refined release than Qwen2, which improved on Qwen1.5, which improved on the original Qwen release. Within a generation, the team ships updates in parallel across branches — so Qwen2.5, Qwen2.5-Coder, and Qwen2.5-VL are separate model lines sharing a generation number but fine-tuned for different task types.
The parameter sweep within a generation is unusually wide. While many open-weight families focus their release on two or three sizes, Qwen typically ships something like 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B within the same generation, plus occasionally a larger flagship. That breadth means there is a Qwen variant for almost any hardware constraint, from a mobile device to a server cluster.
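As a rough illustration of what that sweep means in hardware terms, the sketch below estimates weight memory from parameter count alone. This is a rule of thumb only (it ignores activations and KV cache), and the size list simply mirrors the representative sweep named above:

```python
# Rough memory-footprint estimate for each Qwen size class.
# Rule of thumb: weights ~= parameter count x bytes per parameter.
# Ignores activations and KV cache, so treat the numbers as lower bounds.

SIZES_B = [0.5, 1.5, 3, 7, 14, 32, 72]  # parameter counts in billions

def weight_gib(params_b, bytes_per_param):
    """Approximate weight memory in GiB for a given precision."""
    return params_b * 1e9 * bytes_per_param / 2**30

for size in SIZES_B:
    fp16 = weight_gib(size, 2.0)   # bf16/fp16: 2 bytes per parameter
    int4 = weight_gib(size, 0.5)   # 4-bit quantised: ~0.5 bytes per parameter
    print(f"{size:>5}B  fp16 ~ {fp16:6.1f} GiB   int4 ~ {int4:5.1f} GiB")
```

By this estimate a 7B model needs roughly 13 GiB of weight memory at fp16 but only ~3 GiB at 4-bit, which is why the small end of the sweep fits on laptops and phones while 72B remains server territory.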
How to read Qwen version naming
The naming convention is systematic once decoded: family name + generation + optional branch + parameter count + base or instruct stage.
A full Qwen model name like "Qwen2.5-Coder-14B-Instruct" decomposes as follows: Qwen is the family name; 2.5 is the generation; Coder indicates the code-specialised branch; 14B is the parameter count; Instruct means the instruction-tuned variant. Official base models carry no stage suffix at all: a name like "Qwen2.5-72B" denotes the Qwen 2.5 generation general text model at 72B parameters in its base (pre-trained, not instruction-tuned) form. Note also that the earliest generations used a "-Chat" suffix where later generations use "-Instruct".
Some releases add further suffixes. "-GPTQ-Int4" indicates a GPTQ 4-bit quantised build, "-AWQ" an AWQ quantised build, and "-GGUF" a GGUF build for llama.cpp. The official Qwen organisation publishes some of these quantised variants itself; many others are community builds hosted under separate Hugging Face accounts. The unquantised official releases use the base naming without suffixes.
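The convention is regular enough to parse mechanically. The sketch below is a hypothetical parser (the regular expression and the `parse_qwen_name` helper are illustrative, not an official tool) covering the generation, branch, size, stage, and quantisation components described above:

```python
import re

# Hypothetical parser for the naming convention described above:
# family + generation + optional branch + size + stage + optional quant suffix.
NAME_RE = re.compile(
    r"^Qwen(?P<gen>[\d.]+)?"                       # generation, e.g. 2.5 (absent on gen 1)
    r"(?:-(?P<branch>Coder|VL|Audio|Math))?"       # optional specialised branch
    r"-(?P<size>\d+(?:\.\d+)?B)"                   # parameter count, e.g. 14B or 0.5B
    r"(?:-(?P<stage>Instruct|Chat))?"              # tuned stage; absent means base
    r"(?:-(?P<quant>GPTQ-Int[48]|AWQ|GGUF))?$"     # optional quantisation suffix
)

def parse_qwen_name(name):
    """Split a Qwen model name into its components; raise on unknown shapes."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised Qwen model name: {name}")
    parts = m.groupdict()
    parts["stage"] = parts["stage"] or "Base"  # no suffix means the base model
    return parts

print(parse_qwen_name("Qwen2.5-Coder-14B-Instruct"))
print(parse_qwen_name("Qwen2.5-72B"))
```

Names like "Qwen-7B-Chat" (first generation, no generation number) and quantised names like "Qwen2.5-7B-Instruct-GPTQ-Int4" also fit the same pattern.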
Generations and branches at a glance
A quick map of the Qwen branches (the four main branches plus the Math specialisation) and how they have evolved across generations, with notes on where each branch's releases currently live.
The text branch is the core of the Qwen model family. It covers the general-purpose base models and instruction-tuned chat models. Each generation has produced base and instruct variants at the full parameter sweep. The instruction-tuned models in this branch are the right starting point for general conversation, summarisation, translation, and reasoning tasks.
The code branch (Qwen-Coder) ships alongside the text branch with each generation. It continues pre-training on code-heavy data and adds fill-in-the-middle training. Recent Qwen-Coder releases have scored competitively with the best open-weight code models at equivalent parameter counts. The coding plan page on this site covers the Qwen-Coder branch in more depth.
The vision-language branch (Qwen-VL) adds image understanding to the text capability. Qwen-VL models accept image plus text inputs and return text. They are suited for chart analysis, document understanding, visual question answering, and image description. The VL branch has improved multimodal integration with each generation and is the primary Qwen entry point for multimodal applications.
The audio branch (Qwen-Audio) adds speech and audio understanding. It is the smallest and most recently added branch, with fewer parameter size options than the text and code branches. For teams building voice or audio-processing applications, the Qwen-Audio releases are the relevant starting point. For teams formally evaluating Qwen models against alternatives, NIST's AI Risk Management Framework offers useful evaluation-methodology context, and research on multimodal model assessment from UC Berkeley's BAIR is another relevant reference.
| Family branch | Release names (representative) | Parameter classes shipped |
|---|---|---|
| Text (base + instruct) | Qwen, Qwen1.5, Qwen2, Qwen2.5 — Base and Instruct variants | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B, 110B (flagship varies by gen) |
| Code (Qwen-Coder) | CodeQwen1.5, Qwen2.5-Coder — Base and Instruct variants | 0.5B, 1.5B, 3B, 7B, 14B, 32B (sweep varies by generation) |
| Vision-language (Qwen-VL) | Qwen-VL, Qwen2-VL, Qwen2.5-VL — Instruct primary | 3B, 7B, 72B in recent generations; earlier gens had single-size releases |
| Audio (Qwen-Audio) | Qwen-Audio, Qwen2-Audio — Instruct primary | 7B class primary; fewer size options than text or code branch |
| Math (Qwen-Math) | Qwen2.5-Math — Base and Instruct variants | 1.5B, 7B, 72B; specialised for mathematical reasoning tasks |
Where Qwen model weights live
The primary distribution channel for Qwen models is the official Qwen organisation on Hugging Face. Each model has its own repository containing the weights in safetensors format, the tokeniser files, the model card, and the LICENSE file. Weights are free to download subject to the applicable license: Apache 2.0 on most recent text and code releases, with a custom Qwen license on certain sizes (for example, the 3B and 72B models in the Qwen2.5 generation) and on some older or specialised variants.
Community quantised builds — GGUF, AWQ, and GPTQ variants — are often found under separate community accounts on Hugging Face in addition to those the official Qwen organisation publishes. The Ollama model library includes a curated set of Qwen builds that can be pulled with a single command. For teams running Qwen via vLLM or llama-server, the recommended route is the standard Hugging Face download, then loading the model with the respective inference engine.
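A minimal download sketch, assuming the `huggingface_hub` package and the "Qwen/&lt;model name&gt;" repo-id layout used by the official organisation; the `qwen_repo_id` helper is hypothetical, introduced here only to show how repo ids follow from the naming convention:

```python
# Hypothetical helper: build the official Hugging Face repo id for a Qwen
# release ("Qwen/<name>") from its naming components, then download it.

def qwen_repo_id(gen, size, branch=None, instruct=True):
    """Assemble a repo id like 'Qwen/Qwen2.5-Coder-7B-Instruct'."""
    parts = [f"Qwen{gen}"]
    if branch:
        parts.append(branch)       # e.g. "Coder", "VL", "Math"
    parts.append(size)             # e.g. "7B"
    if instruct:
        parts.append("Instruct")   # base models carry no stage suffix
    return "Qwen/" + "-".join(parts)

repo = qwen_repo_id("2.5", "7B", branch="Coder")
print(repo)  # Qwen/Qwen2.5-Coder-7B-Instruct

# To actually fetch the weights, tokeniser, and model card
# (requires the huggingface_hub package and network access):
# from huggingface_hub import snapshot_download
# local_dir = snapshot_download(repo)
```

The downloaded directory can then be passed directly to vLLM or converted to GGUF for llama.cpp-based servers.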