Qwen CLI: command-line tooling for the model family

The Qwen CLI is a terminal-first interface for working with Qwen models — running chat sessions, serving a local inference endpoint, pulling weights, and querying a running server without writing any Python. This reference covers what the tool does, how installation is structured, the common subcommands, configuration file conventions, and how to integrate the CLI into shell scripts.

Brief Digest

The Qwen CLI wraps model download, chat, and serving behind a consistent terminal interface. Install steps live in the upstream GitHub README — this page covers orientation, subcommand patterns, and scripting integration. For production serving at scale, a dedicated inference stack (vLLM, TGI) is preferable to the CLI server subcommand.

What the Qwen CLI does

A functional overview of the CLI's role in the Qwen developer toolchain.

The Qwen CLI bridges the gap between the hosted chat interface and a fully custom application integration. Developers who want to interact with a locally hosted Qwen model from a terminal — without writing a Python script or spinning up a separate web application — reach for the CLI. It provides a consistent command surface for the most common model interactions: starting an interactive chat, running a one-shot query, downloading weights from Hugging Face, and launching a local API-compatible server that other tools can query.

The CLI is also useful as a diagnostic tool during integration development. When a developer is building a Python application that calls a local Qwen server, the CLI's chat and run subcommands let them verify the server is responding correctly before introducing application-layer complexity. A quick qwen run --model qwen2.5-7b "test prompt" confirms connectivity and output format without needing a separate test harness.
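The connectivity check described above can be sketched as a gate at the top of a script. In this sketch the real qwen binary is replaced by a stub function so the control flow is runnable without a local model; swap the stub out for the actual CLI in practice.

```shell
# Stub standing in for the real CLI so the sketch runs anywhere.
qwen() { echo "pong"; }

# One-shot connectivity check: the exit status gates the rest of the script.
if OUT=$(qwen run --model qwen2.5-7b "test prompt"); then
  echo "server responding: $OUT"
else
  echo "server check failed" >&2
  exit 1
fi
```

The same shape works against a real server: any non-zero exit from the run subcommand stops the script before application-layer steps begin.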

For embedded engineers and DevOps practitioners who live in the terminal, the CLI is often more comfortable than a browser-based studio. Shell script integration, pipe-compatible output, and the ability to set defaults via a config file mean the CLI fits naturally into existing toolchain patterns rather than requiring a context switch to a browser.

Installation: where to find the authoritative steps

Why this reference page points to the upstream README rather than reproducing install instructions that change with each release.

The Qwen CLI install process is documented in the upstream README on the official Qwen GitHub repository. This reference page intentionally does not reproduce those steps verbatim: install dependencies, Python version requirements, and platform-specific caveats change with each CLI release, so any copy here would diverge from the authoritative upstream version within weeks. The upstream README is always the correct source.

The general pattern, across recent releases, involves cloning the repository and running a pip install against the package directory or a published PyPI package name. Some CLI releases are distributed as a standalone binary for platforms where Python dependency management is inconvenient. Check the upstream README's installation section for the current recommended path for your operating system.

Dependencies that the CLI typically requires include a compatible Python version (3.9 or later in recent releases), the transformers library, and a backend inference library such as torch with the appropriate CUDA or MPS bindings for GPU inference. CPU-only inference is supported but slower. The README's "Quick start" section is the practical first read; the "Requirements" section covers the dependency matrix in detail.

On macOS with Apple Silicon, inference via the Metal Performance Shaders backend is available through the appropriate torch build. The CLI's device selection subcommand or configuration key lets you specify mps as the inference device once the right torch build is installed. This typically provides a meaningful speed-up over CPU inference on M-series hardware.

CLI subcommand × purpose × example usage
chat: start an interactive multi-turn conversation with a local Qwen model. Example: qwen chat --model qwen2.5-7b-instruct
run: execute a single non-interactive prompt and print the response to stdout. Example: qwen run --model qwen2.5-7b "Summarise this: $(cat doc.txt)"
serve: launch a local OpenAI-compatible API server backed by a Qwen model. Example: qwen serve --model qwen2.5-14b --port 8080
pull: download model weights from Hugging Face to the local cache. Example: qwen pull qwen2.5-7b-instruct
config set: write a default value to the CLI config file. Example: qwen config set default_model qwen2.5-7b-instruct
config list: print all current config file values to stdout. Example: qwen config list
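Because serve is described as OpenAI-compatible, querying it from a script might look like the sketch below. The /v1/chat/completions route and the JSON body shape are assumptions derived from that compatibility claim, not confirmed details of this CLI.

```shell
PORT=8080
MODEL=qwen2.5-14b

# Build a chat-completions style request body (shape assumed from
# the OpenAI-compatible claim).
PAYLOAD=$(cat <<JSON
{"model": "$MODEL", "messages": [{"role": "user", "content": "Say hello"}]}
JSON
)
echo "$PAYLOAD"

# With a server running via `qwen serve --model $MODEL --port $PORT`:
# curl -s "http://localhost:$PORT/v1/chat/completions" \
#   -H 'Content-Type: application/json' -d "$PAYLOAD"
```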

Configuration files and environment variables

How the Qwen CLI resolves its settings and where to look when a default is not behaving as expected.

The Qwen CLI reads its configuration from a YAML file, typically located at ~/.qwen/config.yaml on Linux and macOS, or an equivalent path under the user's home directory on Windows. The config file accepts a small set of keys that correspond to the most commonly overridden defaults: default_model, device, api_base (for pointing the CLI at a remote or alternative inference endpoint), and output_format.
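Assuming the keys listed above, a minimal ~/.qwen/config.yaml might look like the following. The key names are taken from this page's description and have not been verified against a specific release.

```yaml
# ~/.qwen/config.yaml -- illustrative values only
default_model: qwen2.5-7b-instruct
device: mps                          # cuda, mps, or cpu
api_base: http://localhost:8080/v1   # point the CLI at a running server
output_format: plain
```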

Environment variables take precedence over config file values, which makes it straightforward to override a setting in a specific shell session without modifying the file. The environment variable names follow a QWEN_ prefix convention — for example, QWEN_DEFAULT_MODEL overrides the default_model config key. This precedence order (environment variables over config file values over built-in defaults) is the standard Unix convention, and the Qwen CLI follows it consistently.
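The resolution order can be mimicked with standard shell parameter expansion; this is a sketch of the precedence rule, not of the CLI's internals.

```shell
# Value a config file would supply.
config_default_model="qwen2.5-7b-instruct"

# Without the env var, the config value is used.
unset QWEN_DEFAULT_MODEL
resolved="${QWEN_DEFAULT_MODEL:-$config_default_model}"
echo "no override: $resolved"

# With the env var set, it wins.
export QWEN_DEFAULT_MODEL=qwen2.5-14b-instruct
resolved="${QWEN_DEFAULT_MODEL:-$config_default_model}"
echo "override: $resolved"
```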

For teams using the CLI in CI/CD pipelines, environment-variable configuration is preferable to committing a config file to the repository. Store sensitive values like API keys in the CI platform's secret store and inject them as environment variables at runtime. The config file should contain only non-sensitive defaults that are safe to version-control if needed.

The config list subcommand prints the currently resolved configuration, including whether each value came from the config file or an environment variable override. This is the fastest diagnostic when the CLI is behaving unexpectedly — it shows exactly what values the tool sees, without requiring the developer to trace through multiple config layers manually.

Integrating the Qwen CLI into shell scripts

Practical patterns for using the CLI's stdout output in bash and zsh pipelines.

The CLI's run subcommand is designed for scripting. It accepts a prompt as a positional argument, sends it to the configured model, and writes the response to stdout. This makes it composable with standard Unix tools: pipe the output to grep, jq, sed, or any other text processor in the pipeline.

A common pattern is to use command substitution to capture the response into a shell variable: SUMMARY=$(qwen run --model qwen2.5-7b "Summarise: $(cat report.txt)"). The response is then available as $SUMMARY for use in subsequent script steps. Use the --format plain flag to suppress markdown formatting in the response, which prevents stray asterisks and backticks from confusing downstream text processing.
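The capture pattern above can be exercised end to end with a stub in place of the qwen binary, so the script shape is runnable without a model loaded:

```shell
# Stub standing in for the real CLI during the sketch.
qwen() { echo "A short stub summary."; }

# Sample input file for the command substitution.
printf 'quarterly figures...\n' > report.txt

# Capture the plain-text response into a variable for later steps.
SUMMARY=$(qwen run --model qwen2.5-7b --format plain "Summarise: $(cat report.txt)")
echo "captured: $SUMMARY"
rm -f report.txt
```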

For batch processing — running the same prompt against a list of input files — a for loop or xargs with controlled parallelism handles the job. Be aware that concurrent CLI invocations each spawn a separate model-loading process unless the serve subcommand is used to host the model persistently and the run subcommand is pointed at that server via the --api-base flag. For any batch job larger than a handful of files, the serve-plus-run pattern is far more efficient than spawning a fresh model load per file.
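The serve-plus-run batch shape described above looks roughly like this; the qwen call is again stubbed so the loop runs without a server, and the --api-base flag is the one this page describes.

```shell
# Stub: $4 is the prompt argument in these invocations.
qwen() { echo "summary for: $4"; }

# In real use, start the server once and keep it loaded:
#   qwen serve --model qwen2.5-7b --port 8080 &
# then point each run call at it:
for f in a.txt b.txt; do
  LAST=$(qwen run --api-base http://localhost:8080 "Summarise $f")
  echo "$LAST"
done
```

With a persistent server, each loop iteration pays only the inference cost rather than a full model load.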

Security considerations apply to shell-script integrations that include user-supplied content in prompts. Prompt injection via file content is a real risk when the input is not sanitised before being included in the model instruction. The NIST AI security guidelines include relevant material on input handling risks for production AI integrations, and the UC Berkeley security group has published research on prompt injection patterns that is worth reviewing before deploying any CLI-based automation against untrusted input.
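One minimal hygiene step before splicing untrusted file content into a prompt is to strip control characters and cap the length. This reduces accidental breakage and limits payload size, but it is not a complete defence against prompt injection.

```shell
# Strip non-printing control characters (keeping tab and newline)
# and cap the input at 4000 bytes before it reaches the prompt.
sanitise() { tr -d '\000-\010\013\014\016-\037' | head -c 4000; }

CLEAN=$(printf 'hello\007world' | sanitise)
echo "$CLEAN"
```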

"The serve-and-query pattern in the Qwen CLI finally let us integrate local inference into our firmware validation pipeline without modifying any existing toolchain. One persistent server process, the run subcommand pointing at it — clean and predictable."
Maelinn O. Sundström
Embedded Engineer · Coppermine Fabric Works · Boulder, CO

Frequently asked questions about the Qwen CLI

Five questions that address the most common points of uncertainty before and during Qwen CLI adoption.

What does the Qwen CLI do?

The Qwen CLI is a command-line tool that lets developers interact with Qwen models from a terminal — running interactive chat sessions, executing one-shot prompts, downloading model weights, and starting a local OpenAI-compatible inference server. It is designed for developers who prefer terminal-first workflows and need to integrate Qwen into shell scripts or CI pipelines without writing a full Python application.

How do I install the Qwen CLI?

Installation instructions are maintained in the upstream README in the official Qwen GitHub repository — the authoritative and always-current source. The general pattern involves either installing a PyPI package via pip or cloning the repository and installing from source. Dependencies include a compatible Python version, the transformers library, and a torch build matched to your inference device (CUDA, MPS, or CPU).

What are the most commonly used Qwen CLI subcommands?

The most commonly used subcommands are chat (interactive multi-turn conversation), run (single non-interactive prompt to stdout), serve (launch a local API server), pull (download model weights), and the config group for reading and writing CLI defaults. The run subcommand is the workhorse for shell script integration.

Where does the Qwen CLI store its configuration?

The CLI stores its configuration in a YAML file at ~/.qwen/config.yaml on Linux and macOS. Environment variables with the QWEN_ prefix override config file values at runtime, which is the recommended approach for CI and scripted environments. The qwen config list subcommand prints all currently resolved values, including which source each came from.

Can I use the Qwen CLI in a shell script or CI pipeline?

Yes. The run subcommand writes the model response to stdout, making it pipe-compatible with standard Unix tools. Use --format plain to strip markdown formatting from the output before passing it to downstream text processors. For batch jobs, run the serve subcommand first to keep the model loaded, then point run calls at the local server to avoid repeatedly loading the model weights.