Problem

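QwQ-32B is a reasoning model whose performance is on par with DeepSeek-R1, but many people have found that it can fall into endless generation, repeating its output without stopping.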

After waiting two days, I found that a fix was already available: Tutorial: How to Run QwQ-32B effectively | Unsloth Documentation

This post is a record of my own experience with it.

Recommended parameter settings

  • Temperature = 0.6
  • TopP = 0.95
  • TopK = 20 ~ 40
  • Min_P = 0.02
  • Repetition Penalty = 1.0
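If you call Ollama over HTTP instead of relying on the packaged settings, the values above can be passed explicitly. The following is a minimal sketch, assuming a local Ollama server on its default port 11434 and the `/api/generate` endpoint; the option keys (`temperature`, `top_p`, `top_k`, `min_p`, `repeat_penalty`) are the Ollama names I am assuming correspond to the list above, and `build_request` and `generate` are helper names made up for this example.

```python
import json
import urllib.request

# Sampling settings from the list above, written with Ollama's option names (assumed mapping).
QWQ_OPTIONS = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 40,            # anything in the 20-40 range fits the recommendation
    "min_p": 0.02,
    "repeat_penalty": 1.0,  # 1.0 means no repetition penalty is applied
}


def build_request(prompt, model="hf.co/unsloth/QwQ-32B-GGUF:Q4_K_M"):
    """Build the JSON body for a non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        "options": dict(QWQ_OPTIONS),  # copy so callers cannot mutate the defaults
        "stream": False,
    }


def generate(prompt, url="http://localhost:11434/api/generate"):
    """Send the request to a local Ollama server (assumed to be running) and return the text."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Create a Flappy Bird game in Python.")` only works while the Ollama server is running and the model has been pulled; `build_request` can be inspected on its own without a server.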

Use the chat template

<|im_start|>user\nCreate a Flappy Bird game in Python.<|im_end|>\n<|im_start|>assistant\n<think>\n
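To make the template concrete, here is a small sketch that assembles the same string for any single user message. The `\n` sequences in the template line stand for real newlines; `build_prompt` is a hypothetical helper name, and multi-turn conversations or system prompts are not handled.

```python
def build_prompt(user_message: str) -> str:
    """Wrap one user turn in the template above and open the <think> block."""
    return (
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>\n"  # the model continues from here with its reasoning
    )


if __name__ == "__main__":
    # repr() shows the newlines as \n, matching the template line above
    print(repr(build_prompt("Create a Flappy Bird game in Python.")))
```

Ending the prompt with `<think>` followed by a newline is what makes the model start with its reasoning instead of jumping straight to an answer.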

Ollama

I can only get this model running on my MacBook Pro, so I only cover Ollama here. See also: llama.cpp usage.

Ollama can bundle a model's settings into a params file, so MacBook users can simply run the following command:

ollama run hf.co/unsloth/QwQ-32B-GGUF:Q4_K_M

Or use another quantization from unsloth/QwQ-32B-GGUF at main:

ollama run hf.co/unsloth/QwQ-32B-GGUF:Q5_K_M

Results and summary

I tried Unsloth's Flappy Bird game example:

Create a Flappy Bird game in Python. You must include these things:

- You must use pygame.
- The background color should be randomly chosen and is a light shade. Start with a light blue color.
- Pressing SPACE multiple times will accelerate the bird.
- The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.
- Place on the bottom some land colored as dark brown or yellow chosen randomly.
- Make a score shown on the top right side. Increment if you pass pipes and don't hit them.
- Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.
- When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.
- The final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.

I used Chatbox as the chat client. The output is too long to include here; see Unsloth's write-up for it.

As a reasoning model, it has the following problem on my MacBook Pro M3 Max 64G:

  • Thinking takes a long time. That makes it a poor fit for code completion, because long waits tend to break your train of thought.
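Since most of the wait is spent inside the thinking block, it can help to separate the reasoning from the final answer when post-processing output. This is a rough sketch, assuming the model closes its reasoning with a `</think>` tag; `split_reasoning` is a name invented for this example, and the opening `<think>` tag may be missing from the output when the prompt already ends with it.

```python
def split_reasoning(output: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer) at the first </think> tag."""
    marker = "</think>"
    if marker not in output:
        # No closing tag found: treat everything as the answer.
        return "", output.strip()
    reasoning, answer = output.split(marker, 1)
    # Drop the opening tag if the output happens to include it.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


if __name__ == "__main__":
    reasoning, answer = split_reasoning("<think>\nplan the game loop\n</think>\n\nHere is the code.")
    print(len(reasoning), "characters of reasoning")
    print(answer)
```

Comparing the length of the two parts gives a rough sense of how much of a response is reasoning rather than answer.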

Other open-source reasoning models