At a Glance
The Qwen code GitHub organisation separates concerns across repos: inference, fine-tuning, evaluation, and multimodal demos each have their own home. Releases follow semantic versioning aligned to model generations. For bug reports, include model name, stack version, and a minimal reproducible example. Pull requests without a linked issue are typically deferred.
What lives in the public Qwen GitHub organisation
A map of the major repositories and how responsibility is split across them.
The Qwen code GitHub organisation splits its public code into several purpose-specific repositories rather than a single monorepo. That split reflects the different audiences and update cadences involved: inference code changes when a new model architecture ships, fine-tuning scripts change when new training techniques become practical, and evaluation harnesses change when new benchmarks are added or existing benchmarks are revised. Keeping each concern in its own repository means a developer who only cares about inference does not have to track changes in the fine-tuning codebase, and vice versa.
The primary inference repository contains the generation code, serving utilities, and integration examples for running Qwen models with the transformers library and with vLLM. It is the repo most developers visit first, because it includes the quick-start examples and the documentation on which model IDs to use for which inference backend. The README in this repo is kept synchronised with each new model generation, making it the reference for which Qwen model IDs are currently active and which have been superseded.
A separate repository hosts the fine-tuning recipes — the scripts and configuration files for supervised fine-tuning, instruction tuning, and parameter-efficient fine-tuning with LoRA and QLoRA adapters. Fine-tuning a Qwen model correctly requires matching the tokenizer's chat template, the training data format, and the learning rate schedule to the specific model variant being fine-tuned. The fine-tuning repo documents those requirements per variant rather than expecting users to infer them from the model card.
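The chat-template requirement above can be made concrete. Qwen chat models use a ChatML-style template, sketched by hand below for illustration; in practice the authoritative template ships with the tokenizer and should be applied via `tokenizer.apply_chat_template` rather than hand-rolled code like this.

```python
# Illustrative sketch of the ChatML-style template used by Qwen chat models.
# This hand-rolled version only shows why training data must match the
# variant's template; the real template comes from the tokenizer.

def format_chatml(messages: list[dict]) -> str:
    """Render a list of {role, content} messages into a ChatML-style prompt."""
    parts = [
        f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>"
        for msg in messages
    ]
    # Leave the assistant turn open so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is LoRA?"},
])
print(prompt)
```

If fine-tuning data is formatted with a different template than the one the base variant was trained on, the model sees malformed turn boundaries, which is exactly the failure mode the fine-tuning repo's per-variant documentation exists to prevent.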
The evaluation repository contains the harnesses used by the Qwen team to measure model performance on public benchmarks — MMLU, HumanEval, GSM8K, MATH, and the multilingual evaluations that Qwen scores particularly well on. The harnesses are published so that external researchers can reproduce the team's benchmark numbers and run comparable evaluations on their own fine-tuned variants or on competing models. Reproducible evaluation methodology is a meaningful contribution to the field, and the Qwen team's practice of publishing the harnesses alongside the results is worth noting.
How releases are tagged in the Qwen GitHub repositories
The semantic versioning convention used across Qwen repositories and what each tag component signals.
Qwen code GitHub repositories use semantic versioning tags in the form vMAJOR.MINOR.PATCH. The leading version components align with the Qwen model generation: v2.0.x tags correspond to the Qwen2 generation, v2.5.x tags to Qwen2.5, and so on. This alignment between the codebase version and the model generation makes it straightforward to check out the correct code version for a given model without digging through commit history.
Minor version increments within a generation indicate new features or significant compatibility changes — for example, adding support for a new inference backend, extending the fine-tuning scripts to handle a new model variant, or updating the evaluation harness to include a newly relevant benchmark. Patch version increments fix bugs or address compatibility breakage with a dependency update without changing public interfaces.
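Assuming the tagging convention holds exactly as described, classifying the gap between two tags tells a consumer how carefully to read the release notes before upgrading. A small sketch:

```python
# Parse vMAJOR.MINOR.PATCH tags and classify the bump between two of them:
# "patch" (bug fixes only), "minor" (new features, check release notes),
# or "major" (expect breaking changes / a new model generation).
import re

def parse_tag(tag: str) -> tuple[int, int, int]:
    m = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", tag)
    if m is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {tag!r}")
    major, minor, patch = (int(g) for g in m.groups())
    return (major, minor, patch)

def bump_kind(old: str, new: str) -> str:
    o, n = parse_tag(old), parse_tag(new)
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"

print(bump_kind("v2.5.0", "v2.5.3"))  # patch
print(bump_kind("v2.5.3", "v2.6.0"))  # minor
```

The tag strings here are illustrative, not real releases; the point is that the convention is regular enough to automate against, for example in a dependency-update check.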
The GitHub releases page for each repository includes release notes that document the changes in each tag. For minor and major version increments, the release notes typically include a migration guide that flags any breaking changes in configuration file formats, script arguments, or API interfaces. Checking the release notes before upgrading the inference or fine-tuning code is worth the few minutes it takes, particularly for production integrations where an unexpected breaking change in a dependency can halt a workflow.
The main branch in each Qwen code GitHub repository typically tracks the latest release, with feature branches used for in-progress work that has not yet been tagged. Pinning a production integration to a specific release tag rather than to main is strongly advisable — main can receive breaking changes as part of active development work between releases. Tags are immutable references; main is not.
| Repository | Purpose | Update cadence |
|---|---|---|
| Qwen (main inference repo) | Generation code, vLLM integration, quick-start examples, model ID reference | Updated at each model generation launch; patches for compatibility fixes |
| Qwen-VL | Vision-language model inference code, image preprocessing, multimodal examples | Updated with each VL model release; separate cadence from text-only repo |
| Qwen-Audio | Audio-language model code, audio tokenisation, speech input handling | Updated with audio model releases; lower frequency than text repos |
| Qwen fine-tuning recipes | SFT, LoRA, QLoRA scripts, training configs, data format documentation | Updated when new model variants or training techniques require new scripts |
| Qwen evaluation harnesses | Benchmark runners for MMLU, HumanEval, GSM8K, multilingual benchmarks | Updated when new benchmarks are added or methodology is revised |
How to file an issue in the Qwen code GitHub repositories
What the maintainers need from a bug report to be able to act on it efficiently.
Most Qwen code GitHub repositories provide an issue template that lists the information the maintainers need for a useful report. Following the template closely is the single most effective way to get a prompt response. Reports that omit key information — such as the model name, the inference stack version, or a reproducible example — are typically labelled "needs more info" and left pending until the reporter adds the missing detail.
The minimum useful bug report includes: the Qwen model name and parameter size, the inference stack and version (transformers, vLLM, llama.cpp, and the relevant version numbers), the operating system and Python version, a minimal script that reproduces the problem, and the full error traceback. Minimal means the smallest amount of code that still triggers the issue — not the entire application. The process of creating a minimal reproducer often surfaces the cause before the issue is even filed, which is a side benefit worth capturing.
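The environment portion of that checklist can be generated mechanically. The sketch below is stdlib-only; the package names (transformers, vllm, torch) are the common Qwen inference stacks, and any that are absent simply report as not installed.

```python
# Collect the environment details a Qwen bug report should include:
# OS, Python version, and the versions of the relevant inference packages.
import platform
import sys
from importlib import metadata

def environment_report(packages=("transformers", "vllm", "torch")) -> str:
    lines = [
        f"OS: {platform.platform()}",
        f"Python: {sys.version.split()[0]}",
    ]
    for pkg in packages:
        try:
            lines.append(f"{pkg}: {metadata.version(pkg)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{pkg}: not installed")
    return "\n".join(lines)

print(environment_report())
```

Pasting this output at the top of the minimal reproducer covers the version questions the maintainers would otherwise have to ask in a follow-up comment.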
Feature requests are welcomed in most Qwen code GitHub repositories and are typically filed as issues with a "feature request" label. A good feature request explains the use case that motivates it rather than only describing the feature. "Add streaming responses to the batch script so long generations can be monitored before they complete" is easier to evaluate than "add streaming support", because the use-case framing lets the maintainers judge whether a different existing mechanism already covers the need.
The issue tracker is not the right venue for general usage questions — those belong in the Discussions tab, where they remain searchable and visible to other users with the same question. An issue filed as a question takes up triage capacity that the maintainers could spend on actionable bug reports and feature requests. If there is uncertainty about whether something is a bug or a usage question, starting in Discussions and converting to an issue if the discussion confirms a real defect is the recommended path.
The contribution flow for external pull requests
How external developers can contribute code to the Qwen repositories and what to do before opening a pull request.
The Qwen code GitHub repositories accept external contributions under the standard fork-and-pull-request model that is the norm across open-source projects on GitHub. A contributor forks the relevant repository, makes changes on a feature branch, and opens a pull request against the upstream main branch. The maintainers review the pull request, request changes if needed, and merge it when it meets the project's standards.
Before investing time in a non-trivial pull request — new features, architectural changes, or additions to the evaluation harness — the right first step is to open an issue or a GitHub discussion to describe the intended contribution and get a signal from the maintainers on whether it aligns with the project's direction. Pull requests that arrive without a linked issue and without prior discussion are often left in review limbo for extended periods, because the maintainers lack the context to evaluate whether the change is wanted before reading through the diff in detail.
For smaller contributions — fixing a typo in documentation, correcting an incorrect example in a README, adding a missing dependency to a requirements file — a pull request without a prior issue is generally acceptable. The bar for prior discussion scales with the size and scope of the change. A one-line fix can usually speak for itself; a 500-line addition to the fine-tuning recipes benefits from advance alignment.
The CONTRIBUTING.md file in each repository specifies the code style requirements, testing expectations, and sign-off requirements for pull requests. The Qwen repositories typically require that a DCO (Developer Certificate of Origin) sign-off be included in each commit message, which is a lightweight attestation that the contributor has the right to submit the code under the project's license. The MIT Open Source Program Office publishes useful background on DCO and CLA practices for developers new to open-source contribution workflows. For teams evaluating the governance model of the Qwen repositories before contributing, the NIST open-source guidance is a relevant reference on open-source compliance expectations in research and enterprise contexts.
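In practice, `git commit -s` appends the required trailer of the form `Signed-off-by: Name <email>` to the commit message. The sketch below checks for that trailer the way a DCO bot would; the regex is illustrative, not the exact check any particular CI bot runs.

```python
# Check a commit message for the "Signed-off-by:" trailer that
# `git commit -s` appends, as required by DCO-enforcing repositories.
import re

SIGNOFF = re.compile(r"^Signed-off-by: .+ <.+@.+>$", re.MULTILINE)

def has_dco_signoff(commit_message: str) -> bool:
    return bool(SIGNOFF.search(commit_message))

msg = "Fix tokenizer padding bug\n\nSigned-off-by: Jane Doe <jane@example.com>"
print(has_dco_signoff(msg))  # True
print(has_dco_signoff("Fix tokenizer padding bug"))  # False
```

Running a check like this in a pre-push hook catches a missing sign-off locally, before the upstream CI rejects the pull request for it.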