🔥 Open Source AI on Fire Last Week!

The AI community saw some massive open-source releases and breakthroughs last week. Here’s a roundup of the latest developments:

🗣️ KyutAI Open Sources Moshi & Mimi

  • Moshi: A ~7.6B-parameter, on-device speech-to-speech foundation model.
  • Mimi: A state-of-the-art streaming speech codec.
  • Blazing-fast inference codebases in Candle, PyTorch, and MLX.

🧠 Alibaba Qwen 2.5 Release

  • Qwen 2.5: The flagship 72B model offers a 128K context window. Per Alibaba's reported benchmarks, it rivals Llama 3.1 (405B) and outperforms Mistral Large 2 (123B).
  • Qwen 2.5 Coder: Code-specialized models available in 1.5B and 7B variants.

🌍 Mistral AI Releases Mistral Small Instruct 22B

  • New multilingual 22B model with a 128K context window.
  • Supports tool use and function calling, making it well suited to on-device applications.

🟢 NVIDIA Releases Nemotron Mini 4B

  • Nemotron Mini 4B: A distilled version of Nemotron 15B.
  • Optimized for roleplay, retrieval-augmented generation (RAG), and function calling.

Open Source LLM Highlights

  • Alibaba Qwen 2.5: Includes Qwen 2.5 Math and Qwen 2.5 Coder (X, HF, Blog, Try It).
  • Qwen 2.5 Coder 1.5B: Can run on a 4-year-old phone (Nisten).
  • KyutAI Moshi & Mimi: End-to-end voice chat models (X, HF, Paper).
  • Microsoft GRIN-MoE: A tiny MoE with 6.6B active parameters that scores 79.4 on MMLU (X, HF, GitHub).
  • Nvidia NVLM 1.0: A frontier-class multimodal LLM (weights not released yet, X).

Big CO LLMs + APIs

  • OpenAI o1: Results on LMSYS Chatbot Arena are stellar, crowning it the new king LLM in town (Thread).
  • NousResearch Forge: Announced as an MCTS-enabled inference product, currently available by waitlist only (X).
  • Weights & Biases: All the buzz this week is about their new RAG Course on advanced retrieval-augmented generation, built with Cohere and Weaviate (sign up for free).

Vision & Video Breakthroughs

  • YouTube DreamScreen: Announced generative AI image and video creation for YouTube Shorts (Blog).
  • CogVideoX-5B-I2V: The leading open-source img2video model (X, HF).
  • Runway, DreamMachine, Kling: All announced text-to-video APIs (Runway, DreamMachine).
  • Runway: Unveiled a video-to-video model (X).

Tools & Gadgets

  • Snap: Announced their XR glasses, featuring hand tracking and AI capabilities (X).

Stay tuned for more updates, and don’t miss upcoming events like the Judgement Day Hackathon, just 2 days away! Plus, our brand-new RAG Course is now live: dive into advanced RAG techniques with top-notch insights from WandB, Cohere, and Weaviate!