In a significant leap for open-source AI, Alibaba’s Qwen team has announced the release of Qwen2, the successor to its Qwen1.5 series.
Qwen2 introduces five new models: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B, each optimized for state-of-the-art performance across a variety of benchmarks.
These models offer substantial improvements, including training on data in 27 languages beyond English and Chinese, such as Hindi, Bengali, and Urdu.
This multilingual training enhances Qwen2’s capabilities in diverse linguistic contexts, addressing common issues like code-switching with greater proficiency.
Qwen2 also delivers significantly improved performance in coding and mathematics.
A standout feature of Qwen2 is its extended context length support, with the Qwen2-7B-Instruct and Qwen2-72B-Instruct models capable of handling up to 128K tokens. This makes them particularly adept at processing and understanding long text sequences.
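For readers who want to try the instruction-tuned checkpoints, below is a minimal sketch using the Hugging Face transformers library. The model id matches the naming on the Hugging Face Hub, but the generation settings are illustrative; consult the official model card for details, since unlocking the full 128K context window may require additional configuration.

```python
# Minimal sketch: chatting with Qwen2-7B-Instruct via Hugging Face
# transformers. Settings here are illustrative, not official guidance.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the following report: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```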
Qwen2’s release also includes technical enhancements such as Grouped-Query Attention (GQA) for faster inference and reduced memory usage, and tied embeddings for the smaller models.
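As a rough illustration of why GQA saves memory, the toy PyTorch sketch below shares each key/value head across a group of query heads, which shrinks the key/value cache that dominates memory at long context lengths. The head counts are made up for the example and are not Qwen2’s actual configuration.

```python
# Toy sketch of Grouped-Query Attention (GQA): several query heads share
# one key/value head. Head counts are illustrative only.
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 2, 16, 64
n_q_heads, n_kv_heads = 8, 2               # 4 query heads per KV head
group_size = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)  # smaller KV cache
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand each KV head so it serves its whole group of query heads.
k = k.repeat_interleave(group_size, dim=1)
v = v.repeat_interleave(group_size, dim=1)

out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # (batch, n_q_heads, seq_len, head_dim)
```

Only the compact key/value tensors need to be cached during generation, so memory scales with the number of KV heads rather than the number of query heads.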
Performance evaluations show that Qwen2-72B, the largest model in the series, outperforms leading competitors like Llama-3-70B in natural language understanding, coding proficiency, mathematical skills, and multilingual abilities.
Despite having fewer parameters, Qwen2-72B surpasses its predecessor, Qwen1.5-110B, demonstrating the effectiveness of the new training methodologies.
Safety and responsibility remain a priority: across various categories of harmful queries, Qwen2-72B-Instruct performs comparably to GPT-4 and produces a significantly lower proportion of harmful responses than other large models.
The Qwen2 models, released under the Apache 2.0 and Qianwen licenses depending on the version, are set to accelerate the application and commercial use of AI technologies worldwide.
Future plans include training larger models and extending Qwen2 to multimodal capabilities, integrating vision and audio understanding.