LLM News and Articles Weekly Digest - April 15, 2024
Latest News
- x.AI Unveils Its First Multimodal Model, Grok-1.5 Vision
  x.AI has announced that its latest flagship model has vision capabilities on par with (and in some cases exceeding) state-of-the-art models. [Source]
- OpenAI Fires Researchers for Leaking Information
  OpenAI has reportedly fired two researchers allegedly linked to the leaking of company secrets, following months of leaks and internal efforts to crack down on such incidents. [Source]
- Cohere Launches New Rerank 3 Model
  The model works with a wide range of databases and search indexes and plugs into existing applications that already have native search. With a single line of code, Rerank 3 can boost search quality or cut the cost of running RAG applications while keeping latency low (a sketch of that call follows this list). [Source]
- Google's Gemini 1.5 Pro Enters Public Preview
  Google has made its most advanced generative AI model, Gemini 1.5 Pro, available in public preview on its Vertex AI platform. It has a context window of 1 million tokens, can process audio, offers a JSON mode for developers (also sketched after this list), and can act on user instructions. [Source]
- Mistral Releases Mixtral 8x22B, an Apache 2.0-Licensed MoE Model
  A new 8x22B model, released as always via a magnet link. Initial community benchmarks indicate that the first version performs impressively as a base model, scoring 77 on MMLU (a benchmark typically associated with reasoning tasks). [Source]
- Meta Confirms That Llama 3 Is Coming Next Month — GPT-4 Competitor?
  Meta has confirmed plans to release Llama 3, the next generation of its large language model for generative AI assistants, within the next month. [Source]
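
For the Rerank 3 item above, here is a minimal sketch of the advertised one-line integration using Cohere's Python SDK. The model identifier, placeholder API key, and example documents are assumptions rather than details from the announcement, so check Cohere's docs for the exact names.

```python
# Hedged sketch: reranking search candidates with Cohere's Rerank 3.
# The model name "rerank-english-v3.0" is an assumed identifier.
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key

docs = [
    "Rerank models score candidate documents against a query.",
    "Carson City is the capital city of Nevada.",
    "Reranking can sit on top of any existing search index.",
]

# The single call that plugs into an existing search pipeline:
response = co.rerank(
    model="rerank-english-v3.0",  # assumed Rerank 3 model id
    query="How do rerank models improve search?",
    documents=docs,
    top_n=2,
)

for result in response.results:
    print(result.index, round(result.relevance_score, 3))
```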
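
Similarly, a short sketch of Gemini 1.5 Pro's JSON mode on Vertex AI. The preview model name, project placeholder, and the `response_mime_type` setting follow the public-preview docs as best recalled, so treat them as assumptions.

```python
# Hedged sketch: asking Gemini 1.5 Pro for JSON-only output on Vertex AI.
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

vertexai.init(project="your-gcp-project", location="us-central1")  # placeholder project

model = GenerativeModel("gemini-1.5-pro-preview-0409")  # assumed preview model id

# response_mime_type asks the model to emit valid JSON only.
response = model.generate_content(
    "List three uses of a 1M-token context window as a JSON array of strings.",
    generation_config=GenerationConfig(response_mime_type="application/json"),
)
print(response.text)
```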
Articles
Papers and Repositories
- Evaluating Large Language Models on Long Texts
  Ada-LEval is a new benchmark for assessing long-context capabilities with adaptable-length questions. It includes two challenging tasks: TSort, which arranges shuffled text segments in order, and BestAnswer, which selects the best answer from multiple candidates. [Source]
- karpathy/llm.c: LLM training in simple, raw C/CUDA
  Karpathy's project develops a minimalist GPT-2 training framework in C/CUDA, aiming to replicate the PyTorch reference implementation in around 1,000 lines of code while improving performance through direct CUDA integration and tailored CPU optimizations. [Source]
- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
  Apple researchers have created Ferret-UI, a multimodal large language model (MLLM) tailored for enhanced interpretation of and interaction with mobile user interface (UI) screens. [Source]
- Rho-1: Not All Tokens Are What You Need
  The authors analyze token importance in language model training, revealing varied loss patterns across tokens. This leads to Rho-1, a model trained with Selective Language Modeling (SLM), which focuses training on the most beneficial tokens rather than treating all tokens equally (a simplified sketch follows this list). [Source]
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
  The work introduces Infini-attention, an attention mechanism within a Transformer block that lets LLMs handle infinitely long inputs while keeping memory and compute bounded (also sketched after this list). [Source]
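
The selective objective behind Rho-1 can be paraphrased in a few lines: score each token by how much the current model's loss exceeds a reference model's loss, and backpropagate only through the top fraction. The sketch below is a simplified reading of the paper's SLM idea, not its exact recipe; the keep ratio, flattened shapes, and top-k selection are illustrative choices.

```python
# Simplified sketch of Selective Language Modeling (SLM) from Rho-1:
# train only on tokens where the student's loss most exceeds a
# reference model's loss. Hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def slm_loss(student_logits, ref_logits, targets, keep_ratio=0.6):
    # Per-token cross-entropy; logits are (batch, seq, vocab).
    student_ce = F.cross_entropy(
        student_logits.flatten(0, 1), targets.flatten(), reduction="none"
    )
    with torch.no_grad():
        ref_ce = F.cross_entropy(
            ref_logits.flatten(0, 1), targets.flatten(), reduction="none"
        )
        # Excess loss: tokens the student handles worse than the reference.
        excess = student_ce - ref_ce
        k = max(1, int(keep_ratio * excess.numel()))
        # Binary mask selecting the k highest-excess tokens.
        mask = torch.zeros_like(excess)
        mask[excess.topk(k).indices] = 1.0
    # Average the student's loss over the selected tokens only.
    return (student_ce * mask).sum() / mask.sum()
```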
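
For Infini-attention, the core recurrence is also easy to sketch: a compressive memory accumulates key-value products segment by segment, and queries read from it with a linear-attention lookup that is blended with ordinary local attention. The code below is a heavily simplified, single-head paraphrase of the paper's update rules; the gating scalar, shapes, and the omission of the delta-rule variant are all simplifications, not the authors' implementation.

```python
# Simplified single-head sketch of Infini-attention's compressive memory:
# M accumulates sigma(K)^T V across segments; queries retrieve sigma(Q) M,
# normalized by a running sum z, gated with standard local attention.
import torch

def elu1(x):  # sigma(x) = ELU(x) + 1, the paper's nonlinearity
    return torch.nn.functional.elu(x) + 1.0

def infini_attention_segment(q, k, v, M, z, gate):
    # q, k, v: (seg_len, d); M: (d, d); z: (d,); gate: scalar in (0, 1)
    # 1) Retrieve from compressive memory (linear-attention read).
    sq = elu1(q)
    mem_out = (sq @ M) / (sq @ z).clamp(min=1e-6).unsqueeze(-1)
    # 2) Ordinary causal dot-product attention within the segment.
    scores = (q @ k.T) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(
        torch.triu(torch.ones_like(scores), diagonal=1).bool(), float("-inf")
    )
    local_out = scores.softmax(dim=-1) @ v
    # 3) Blend memory and local context, then update the memory state
    #    for the next segment: M += sigma(K)^T V, z += sum sigma(K).
    out = gate * mem_out + (1.0 - gate) * local_out
    sk = elu1(k)
    return out, M + sk.T @ v, z + sk.sum(dim=0)
```

Running a long input then amounts to looping over fixed-size segments while carrying (M, z) forward, which is what keeps memory and compute bounded regardless of input length.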
Thank you for reading!
If you have any suggestions or feedback, please leave a comment. You can find me on [LinkedIn].
Find the Medium post here: https://shresthakamal.medium.com/llm-news-and-articles-weekly-digest-april-8-2024-466fe73f6233.
Do subscribe for future newsletters.