Brief Digest
The Qwen CLI wraps model download, chat, and serving behind a consistent terminal interface. Install steps live in the upstream GitHub README — this page covers orientation, subcommand patterns, and scripting integration. For production serving at scale, a dedicated inference stack (vLLM, TGI) is preferable to the CLI server subcommand.
What the Qwen CLI does
A functional overview of the CLI's role in the Qwen developer toolchain.
The Qwen CLI bridges the gap between the hosted chat interface and a fully custom application integration. Developers who want to interact with a locally hosted Qwen model from a terminal — without writing a Python script or spinning up a separate web application — reach for the CLI. It provides a consistent command surface for the most common model interactions: starting an interactive chat, running a one-shot query, downloading weights from Hugging Face, and launching a local API-compatible server that other tools can query.
The CLI is also useful as a diagnostic tool during integration development. When a developer is building a Python application that calls a local Qwen server, the CLI's chat and run subcommands let them verify the server is responding correctly before introducing application-layer complexity. A quick qwen run --model qwen2.5-7b "test prompt" confirms connectivity and output format without needing a separate test harness.
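A minimal sketch of that smoke-test pattern, assuming the run subcommand behaves as described above and the model weights are already pulled:

```bash
#!/usr/bin/env bash
# Smoke test: confirm the local model responds before debugging application code.
set -euo pipefail

if qwen run --model qwen2.5-7b "Reply with the single word OK." | grep -qi "ok"; then
    echo "Model responding; safe to move on to the application layer."
else
    echo "No usable response from the model; check the model setup first." >&2
    exit 1
fi
```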
For embedded engineers and DevOps practitioners who live in the terminal, the CLI is often more comfortable than a browser-based studio. Shell script integration, pipe-compatible output, and the ability to set defaults via a config file mean the CLI fits naturally into existing toolchain patterns rather than requiring a context switch to a browser.
Installation: where to find the authoritative steps
Why this reference page points to the upstream README rather than reproducing install instructions that change with each release.
The Qwen CLI install process is documented in the upstream README on the official Qwen GitHub repository. This reference page intentionally does not reproduce those steps verbatim, because install dependencies, Python version requirements, and platform-specific caveats change with each CLI release. A copy reproduced here would diverge from the authoritative upstream version within weeks of any release; the upstream README is always the correct source.
The general pattern across recent releases is to clone the repository and run pip install against the package directory, or to install a published package directly from PyPI. Some CLI releases are distributed as a standalone binary for platforms where Python dependency management is inconvenient. Check the upstream README's installation section for the current recommended path for your operating system.
The CLI's typical dependencies include a compatible Python version (3.9 or later in recent releases), the transformers library, and a backend inference library such as torch with the appropriate CUDA or MPS bindings for GPU inference. CPU-only inference is supported but slower. The README's "Quick start" section is the practical first read; the "Requirements" section covers the dependency matrix in detail.
On macOS with Apple Silicon, inference via the Metal Performance Shaders backend is available through the appropriate torch build. The CLI's device selection subcommand or configuration key lets you specify mps as the inference device once the right torch build is installed. This typically provides a meaningful speed-up over CPU inference on M-series hardware.
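As a sketch, assuming the configuration key is named device (as described in the configuration section below; verify the exact key name against your CLI release), switching an M-series Mac to the Metal backend looks like:

```bash
# Persist mps as the default inference device; assumes the "device"
# config key covered in the configuration section of this page.
qwen config set device mps

# Verify the resolved value before running inference.
qwen config list
```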
| Subcommand | Purpose | Example usage |
|---|---|---|
| chat | Start an interactive multi-turn conversation with a local Qwen model | qwen chat --model qwen2.5-7b-instruct |
| run | Execute a single non-interactive prompt and print the response to stdout | qwen run --model qwen2.5-7b "Summarise this: $(cat doc.txt)" |
| serve | Launch a local OpenAI-compatible API server backed by a Qwen model | qwen serve --model qwen2.5-14b --port 8080 |
| pull | Download model weights from Hugging Face to the local cache | qwen pull qwen2.5-7b-instruct |
| config set | Write a default value to the CLI config file | qwen config set default_model qwen2.5-7b-instruct |
| config list | Print all current config file values to stdout | qwen config list |
Configuration files and environment variables
How the Qwen CLI resolves its settings and where to look when a default is not behaving as expected.
The Qwen CLI reads its configuration from a YAML file, typically located at ~/.qwen/config.yaml on Linux and macOS, or an equivalent path under the user's home directory on Windows. The config file accepts a small set of keys that correspond to the most commonly overridden defaults: default_model, device, api_base (for pointing the CLI at a remote or alternative inference endpoint), and output_format.
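A sketch of what that file might contain, using the four keys named above; exact key spellings and accepted values should be checked against your CLI release:

```yaml
# ~/.qwen/config.yaml -- illustrative values only
default_model: qwen2.5-7b-instruct
device: cuda                         # or cpu, or mps on Apple Silicon
api_base: http://localhost:8080/v1   # assumed URL shape for an OpenAI-compatible endpoint
output_format: plain                 # matches the --format plain flag discussed below
```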
Environment variables take precedence over config file values, which makes it straightforward to override a setting in a specific shell session without modifying the file. The environment variable names follow a QWEN_ prefix convention: for example, QWEN_DEFAULT_MODEL overrides the default_model config key. This precedence order (environment variables beat the config file, which beats built-in defaults) is the standard Unix convention, and the Qwen CLI follows it consistently.
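For example, using the QWEN_DEFAULT_MODEL variable described above:

```bash
# Session-scoped override: wins over the config file for this shell only.
export QWEN_DEFAULT_MODEL=qwen2.5-14b

# Scoped to a single invocation: the variable never leaks into the session.
QWEN_DEFAULT_MODEL=qwen2.5-7b-instruct qwen run "Quick connectivity check."
```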
For teams using the CLI in CI/CD pipelines, environment-variable configuration is preferable to committing a config file to the repository. Store sensitive values like API keys in the CI platform's secret store and inject them as environment variables at runtime. The config file should contain only non-sensitive defaults that are safe to version-control if needed.
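A hedged sketch of that pattern as a GitHub Actions-style step; the QWEN_API_BASE variable name is an assumption that follows the QWEN_ prefix convention described above, not a documented name:

```yaml
# Illustrative CI step: the secret stays in the platform's store and
# reaches the CLI only as an environment variable at runtime.
- name: Summarise release notes
  env:
    QWEN_API_BASE: ${{ secrets.QWEN_API_BASE }}  # assumed variable name
  run: qwen run --model qwen2.5-7b "Summarise: $(cat CHANGELOG.md)" > summary.txt
```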
The config list subcommand prints the currently resolved configuration, including whether each value came from the config file or an environment variable override. This is the fastest diagnostic when the CLI is behaving unexpectedly — it shows exactly what values the tool sees, without requiring the developer to trace through multiple config layers manually.
Integrating the Qwen CLI into shell scripts
Practical patterns for using the CLI's stdout output in bash and zsh pipelines.
The CLI's run subcommand is designed for scripting. It accepts a prompt as a positional argument, sends it to the configured model, and writes the response to stdout. This makes it composable with standard Unix tools: pipe the output to grep, jq, sed, or any other text processor in the pipeline.
A common pattern is to use command substitution to capture the response into a shell variable: SUMMARY=$(qwen run --model qwen2.5-7b "Summarise: $(cat report.txt)"). The response is then available as $SUMMARY for use in subsequent script steps. Use the --format plain flag to suppress markdown formatting in the response, which prevents stray asterisks and backticks from confusing downstream text processing.
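Putting those pieces together, a sketch under the assumptions above (run prints to stdout, --format plain suppresses markdown):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Capture the one-shot response into a variable for later script steps.
SUMMARY=$(qwen run --model qwen2.5-7b --format plain "Summarise: $(cat report.txt)")

# The plain-text response composes with ordinary Unix tooling.
echo "$SUMMARY" | grep -i "deadline" || echo "No deadline mentioned."
```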
For batch processing — running the same prompt against a list of input files — a for loop or xargs with controlled parallelism handles the job. Be aware that concurrent CLI invocations each spawn a separate model-loading process unless the serve subcommand is used to host the model persistently and the run subcommand is pointed at that server via the --api-base flag. For any batch job larger than a handful of files, the serve-plus-run pattern is far more efficient than spawning a fresh model load per file.
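A sketch of the serve-plus-run pattern; the port, readiness wait, and paths are illustrative and should be adjusted for your setup:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Load the model once in a persistent server, then fan out cheap run calls.
qwen serve --model qwen2.5-7b --port 8080 &
SERVER_PID=$!
trap 'kill "$SERVER_PID"' EXIT

sleep 15   # crude readiness wait; poll the endpoint in production scripts

mkdir -p summaries
for f in inputs/*.txt; do
    qwen run --api-base http://localhost:8080 --model qwen2.5-7b \
        "Summarise: $(cat "$f")" > "summaries/$(basename "$f")"
done
```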
Security considerations apply to shell-script integrations that include user-supplied content in prompts. Prompt injection via file content is a real risk when the input is not sanitised before being included in the model instruction. The NIST AI security guidelines include relevant material on input handling risks for production AI integrations, and the UC Berkeley security group has published research on prompt injection patterns that is worth reviewing before deploying any CLI-based automation against untrusted input.
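As a minimal illustration (not a complete defence against deliberate prompt injection), basic input hygiene might strip control characters and bound the input size before the content reaches the prompt; the sanitise helper below is hypothetical:

```bash
# Hypothetical helper: strips control characters (keeping tab, LF, CR)
# and caps input size. This limits accidental breakage only; deliberate
# prompt injection requires application-level defences beyond sanitisation.
sanitise() {
    tr -d '\000-\010\013\014\016-\037' < "$1" | head -c 8000
}

qwen run --model qwen2.5-7b --format plain \
    "Summarise the following document: $(sanitise untrusted.txt)"
```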
"The serve-and-query pattern in the Qwen CLI finally let us integrate local inference into our firmware validation pipeline without modifying any existing toolchain. One persistent server process, the run subcommand pointing at it — clean and predictable."
Embedded Engineer · Coppermine Fabric Works · Boulder, CO