

Latest News

  1. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
    Phi-3 comprises a range of models from 3.8 billion to 14 billion parameters, all posting strong results on contemporary benchmarks. The 3.8B mini model in particular claims performance rivaling the original ChatGPT, and its weights have been released publicly. A variant with an extended 128k context length is also available; a minimal local-inference sketch follows. [Source]
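    Since the mini model's weights are public, it can be tried locally. Below is a minimal sketch using Hugging Face transformers; the checkpoint name matches the published 128k-context release, while the dtype and generation settings are illustrative assumptions rather than an official recipe.

    ```python
    # Minimal sketch: running Phi-3-mini locally with Hugging Face transformers.
    # Dtype and generation settings are illustrative assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/Phi-3-mini-128k-instruct"  # 128k-context variant
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # small enough for consumer hardware
        device_map="auto",
        trust_remote_code=True,      # the release ships custom modeling code
    )

    messages = [{"role": "user", "content": "Explain attention in one sentence."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
    ```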

  2. Meta releases Llama 3: 8B and 70B now, 400B later
    The model has been trained on a massive 15 trillion tokens. Two models have been released, one with 70 billion parameters and another with 8 billion, each with an instruction-tuned variant. It operates with an 8K context length and is not multimodal. The 70B model achieves impressive results, scoring 82% on MMLU and 81.7% on HumanEval. It employs a tokenizer with a vocabulary of 128,000 tokens (a quick way to inspect it is sketched below) and uses a dense architecture rather than a Mixture of Experts (MoE). [Announcement], [Models], [Try it], [Run Locally]
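    The enlarged vocabulary can be checked by loading the tokenizer with transformers; this assumes access to the gated meta-llama repository has been granted.

    ```python
    # Minimal sketch: inspect the Llama 3 tokenizer's 128K-entry vocabulary.
    # Assumes access to the gated meta-llama repo on Hugging Face.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
    print(len(tok))  # ~128k entries, roughly 4x the 32k vocabulary of Llama 2
    # A bigger vocabulary packs the same text into fewer tokens:
    print(tok.encode("Tokenizer efficiency improves with a larger vocabulary."))
    ```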

  3. Mixtral 8x22B Instruct v0.1
    The Instruct model, released under the Apache 2.0 license, stands as the premier open choice, supported by a comparison chart that sparked community interest. Mistral AI’s Mixtral 8x22B is a sparse Mixture-of-Experts model: it activates only 39 billion of its 141 billion parameters per token, keeping inference cost low (see the routing sketch below). Fluent in five languages and equipped with strong math and coding capabilities, it offers a 64K-token context window that excels at recalling information from large documents. It outperforms open competitors on reasoning, knowledge, and language benchmarks, particularly across its four non-English languages. Mistral has also released a new tokenizer with dedicated tokens for tool use. [Blog], [Try it]
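    The "39 billion active out of 141 billion" figure follows from top-2 expert routing: each token is processed by only two of the eight expert feed-forward networks. The toy sketch below illustrates the mechanism with hypothetical shapes; it is not Mistral's implementation.

    ```python
    # Toy top-2 mixture-of-experts routing (hypothetical sizes, PyTorch).
    import torch
    import torch.nn.functional as F

    n_experts, d_model, d_ff = 8, 16, 64
    experts = [torch.nn.Sequential(
        torch.nn.Linear(d_model, d_ff), torch.nn.GELU(), torch.nn.Linear(d_ff, d_model),
    ) for _ in range(n_experts)]
    router = torch.nn.Linear(d_model, n_experts)

    x = torch.randn(4, d_model)                            # 4 tokens
    weights, idx = torch.topk(F.softmax(router(x), dim=-1), k=2)
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize the pair

    out = torch.zeros_like(x)
    for t in range(x.shape[0]):
        for s in range(2):  # only 2 of the 8 experts run per token,
            out[t] += weights[t, s] * experts[idx[t, s]](x[t])  # the rest stay idle
    ```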

  4. Meta’s battle with ChatGPT begins now
    Meta’s AI assistant is being rolled out across Instagram, WhatsApp, and Facebook. Meanwhile, the company’s next major AI model, Llama 3, has arrived. [News]

  5. Mistral seeking funding at $5B valuation
    The open-source pioneer Mistral is reportedly seeking several hundred million dollars in funding, at a valuation of roughly $5 billion, to train more models. [Source]

  6. Google's New Technique Gives LLMs Infinite Context
    Google researchers have introduced Infini-attention, a technique that enables LLMs to work with text of unbounded length while keeping memory and compute requirements constant. It does this by adding a fixed-size compressive memory to standard attention (a toy sketch of the update follows). [Source]
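    The heavily simplified sketch below shows the flavor of the compressive-memory update described in the paper: a fixed-size matrix absorbs each segment via a linear-attention-style write, so memory never grows with sequence length. Shapes and names are illustrative, not Google's code.

    ```python
    # Toy sketch of Infini-attention's compressive memory (illustrative only).
    import torch

    def sigma(x):              # ELU + 1 keeps activations positive
        return torch.nn.functional.elu(x) + 1.0

    d_k, d_v = 8, 8
    M = torch.zeros(d_k, d_v)  # memory matrix: fixed size forever
    z = torch.zeros(d_k)       # running normalization term

    for segment in torch.randn(5, 32, d_k):  # stream of 5 segments, 32 tokens each
        Q, K, V = segment, segment, segment  # stand-ins for projected q/k/v
        # Retrieve what previous segments wrote into memory
        A_mem = (sigma(Q) @ M) / (sigma(Q) @ z + 1e-6).unsqueeze(-1)
        # Fold the current segment in; cost stays O(d_k * d_v) per segment
        M = M + sigma(K).T @ V
        z = z + sigma(K).sum(dim=0)
        # In the full method, A_mem is gated together with local attention
    ```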


Articles

  1. Llama 3 is not very censored

  2. Stanford HAI Releases 2024 AI Index Report

  3. OpenAI and Meta Reportedly Preparing New AI Models Capable of Reasoning

  4. A Handy Compendium of Common Terms Used In The Context Of LLMs

  5. From 7B to 8B Parameters: Understanding Weight Matrix Changes in Llama Transformer Models

  6. Unlocking the Power of Transformers: A Journey through the Evolution of Artificial Intelligence

  7. Groq API: Unleashing the Power of Ultra-Low Latency AI Inference


Papers and Repositories

  1. Optimizing In-Context Learning in LLMs
    This paper introduces a new approach to enhancing in-context learning (ICL) in large language models such as Llama-2 and GPT-J. Its authors present an optimization method that refines what they call ‘state vectors’: compressed representations of the knowledge the model derives from its in-context demonstrations (a purely illustrative sketch follows). [Source]
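    The paper's exact procedure is not reproduced here, but the underlying idea of a compressed task representation can be illustrated: read the demonstrations once, capture a hidden state as a crude "state vector", and inject it while answering a bare query. The layer choice and hook below are hypothetical.

    ```python
    # Purely illustrative "state vector" sketch with GPT-2; NOT the paper's method.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)

    demos = "hot -> cold\nbig -> small\nfast -> slow\n"
    with torch.no_grad():
        hs = model(**tok(demos, return_tensors="pt")).hidden_states
    state_vector = hs[6][0, -1]  # hidden state after reading the demos (layer 6)

    def inject(module, inputs, output):  # add the vector at the same layer
        output[0][:, -1, :] += state_vector
        return output

    hook = model.transformer.h[6].register_forward_hook(inject)
    query = tok("tall ->", return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**query, max_new_tokens=3)
    hook.remove()
    print(tok.decode(out[0]))
    ```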

  2. AI Gateway
    AI Gateway is an interface between apps and hosted large language models. It streamlines API requests to LLM providers through a single unified API. AI Gateway is fast, with a tiny footprint, and it can load-balance across multiple models, providers, and keys. It has fallbacks to ensure app resiliency and supports plug-in middleware as needed (a hypothetical request sketch follows). [Source]
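    Because the gateway speaks one unified, OpenAI-style API, switching providers is a matter of routing rather than rewriting client code. The sketch below is hypothetical: the port and the provider header are assumptions to illustrate the idea, not taken from the project's docs.

    ```python
    # Hypothetical sketch of calling a locally running AI Gateway.
    # URL, port, and header name are illustrative assumptions.
    import requests

    resp = requests.post(
        "http://localhost:8787/v1/chat/completions",  # assumed local gateway endpoint
        headers={"x-portkey-provider": "openai"},     # assumed per-request provider routing
        json={
            "model": "gpt-4",
            "messages": [{"role": "user", "content": "Hello through the gateway"}],
        },
    )
    print(resp.json())
    ```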


Thank you for reading!

If you have any suggestions or feedback, please leave a comment. You can find me on [LinkedIn].

Find the Medium post here: LLM News and Articles Weekly Digest — April 24, 2024 @ Medium

Do subscribe for future newsletters.