Shared on 21 Oct 24
The Latest on Nvidia's NVLM 1.0

Nvidia has recently unveiled NVLM 1.0, a family of open-source multimodal large language models (MLLMs) designed to compete with leading proprietary systems like OpenAI's GPT-4 and open-source models such as Meta's Llama 3.1. The flagship model, NVLM-D-72B, has 72 billion parameters and performs on par with both proprietary and open-source competitors on vision-language as well as text-only tasks.

For those of you who are a bit more technologically inclined, there are a few key features that make NVLM quite interesting: the family comes in three architectural variants - decoder-only, cross-attention, and hybrid (NVLM-D, NVLM-X, and NVLM-H) - so it can be optimized for different tasks, balancing training efficiency against multimodal reasoning. A rough sketch of the difference between the first two fusion styles follows below.
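
To make the architectural distinction a bit more concrete, here is a minimal, illustrative sketch (not Nvidia's actual code) of the two main fusion styles: a decoder-only model splices projected image tokens into the text sequence, while a cross-attention model keeps the text sequence intact and lets it attend to image features through dedicated attention layers. All class names, dimensions, and the residual wiring here are assumptions chosen for illustration only.

```python
import torch
import torch.nn as nn


class DecoderOnlyFusion(nn.Module):
    """Decoder-only style (illustrative): vision features are projected into the
    LLM's embedding space and concatenated with the text tokens, so the ordinary
    self-attention decoder processes both modalities as one sequence."""

    def __init__(self, d_model: int = 512, image_dim: int = 1024):
        super().__init__()
        self.project = nn.Linear(image_dim, d_model)  # map vision features to LLM width

    def forward(self, text_emb: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        image_tokens = self.project(image_feats)            # (B, num_patches, d_model)
        return torch.cat([image_tokens, text_emb], dim=1)   # one longer sequence for the decoder


class CrossAttentionFusion(nn.Module):
    """Cross-attention style (illustrative): text tokens remain the decoder's
    sequence, and a cross-attention layer lets them attend to image features,
    keeping the text sequence short."""

    def __init__(self, d_model: int = 512, image_dim: int = 1024, n_heads: int = 8):
        super().__init__()
        self.project = nn.Linear(image_dim, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, text_emb: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        kv = self.project(image_feats)                   # (B, num_patches, d_model)
        attended, _ = self.cross_attn(text_emb, kv, kv)  # text queries, image keys/values
        return text_emb + attended                       # residual connection


if __name__ == "__main__":
    text = torch.randn(1, 16, 512)    # 16 text-token embeddings
    image = torch.randn(1, 64, 1024)  # 64 vision-encoder patch features
    print(DecoderOnlyFusion()(text, image).shape)     # torch.Size([1, 80, 512])
    print(CrossAttentionFusion()(text, image).shape)  # torch.Size([1, 16, 512])
```

The practical trade-off the sketch hints at: concatenating image tokens (decoder-only) gives the model full joint reasoning over a longer sequence, whereas cross-attention keeps the sequence shorter and cheaper to train; the hybrid variant mixes the two ideas.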