Today at Meta Connect, Meta unveiled Llama 3.2, its first models that process both text and images. Llama 3.2 includes small and medium-sized vision models (11B and 90B parameters) alongside lightweight, text-only models (1B and 3B parameters) designed to run on select mobile and edge devices.
Meta CEO Mark Zuckerberg:
“This is our first open-source multimodal model. It’s going to enable a lot of applications that will require visual understanding.”
Like Meta’s previous release, Llama 3.2 supports a 128,000-token context length, letting users feed it large amounts of input text and tackle complex tasks.
You can download Llama 3.2 from llama.com or Meta’s partner platforms.
Meta has also shared its official Llama Stack distributions so that developers can work with the model across environments, including on-prem, on-device, cloud, and single-node setups.
Zuckerberg:
“Open source is going to be — already is — the most cost-effective, customizable, trustworthy, and performant option out there. We’ve reached an inflection point in the industry. It’s starting to become an industry standard, call it the Linux of AI.”
Meta is moving quickly: the company says Llama 3.1, released just two months ago, has already achieved 10x growth.
“Llama continues to improve quickly,” said Zuckerberg. “It’s enabling more and more capabilities.”
The two largest Llama 3.2 models (11B and 90B) now support image-based tasks, such as interpreting charts, captioning images, and identifying objects from text prompts. For example, users can ask about their company’s best sales month, and the model will analyze graphs to provide an answer. It also generates captions by extracting image details.
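To make the chart example concrete, here is a minimal sketch of asking the 11B vision model about a sales chart through the Hugging Face transformers integration. The checkpoint name, the chart file, and the prompt are illustrative assumptions, not details Meta specified in the announcement.

```python
# Minimal sketch: querying Llama 3.2 Vision about a sales chart.
# Assumes transformers >= 4.45 and access to the gated checkpoint on Hugging Face.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# "sales_chart.png" is a hypothetical image of monthly sales figures.
image = Image.open("sales_chart.png")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Which month had the highest sales, and by roughly how much?"},
    ]}
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```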
Meta’s lightweight 1B and 3B models empower developers to create agentic apps that run in private, on-device environments, such as summarizing recent messages or scheduling follow-up meetings with calendar invites. This allows for seamless automation of everyday tasks.
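As a rough illustration of that on-device use case, the sketch below summarizes a short message thread with the 3B instruct model via the Hugging Face transformers pipeline. The model id, sample messages, and system prompt are assumptions for illustration rather than part of Meta’s announcement.

```python
# Minimal sketch: summarizing recent messages with the lightweight 3B instruct model.
# Assumes transformers >= 4.45; the message thread below is made up.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

recent_messages = [
    "Ana: The vendor pushed the demo to Thursday at 3pm.",
    "Ben: Can someone send the updated pricing sheet before then?",
    "Ana: Also, we still owe legal the revised contract draft.",
]
chat = [
    {"role": "system", "content": "You summarize message threads and list action items."},
    {"role": "user", "content": "Summarize this thread:\n" + "\n".join(recent_messages)},
]

# The pipeline accepts chat-format input and appends the assistant's reply.
outputs = generator(chat, max_new_tokens=150)
print(outputs[0]["generated_text"][-1]["content"])
```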
With improvements in summarization, tool use, and prompt rewriting, the lightweight models outperform Gemma and Phi 3.5-mini, giving Meta a competitive edge against Anthropic and OpenAI.
Looking Ahead
Today, Meta is enhancing its business AI, allowing enterprises to use click-to-message ads on WhatsApp and Messenger. Businesses can also create agents to answer common questions, discuss products, and assist with purchases.
Over 1 million advertisers have utilized Meta’s generative AI tools, resulting in 15 million ads created last month. Campaigns using this technology achieved 11% higher click-through rates and a 7.6% increase in conversion rates compared to those that didn’t.
Additionally, Meta AI now has “a voice” for consumers: powered by Llama 3.2, it can respond in celebrity voices, including Dame Judi Dench, John Cena, Keegan-Michael Key, Kristen Bell, and Awkwafina.
“I think that voice is going to be a way more natural way of interacting with AI than text,” Zuckerberg said during his keynote. “It is just a lot better.”
The celebrity voice feature is available on WhatsApp, Messenger, Facebook, and Instagram. Meta AI can also understand images and edit them, for example by adding or removing backgrounds, and Meta is exploring new tools for translation, video dubbing, and lip-syncing.
Zuckerberg enthusiastically proclaimed that Meta AI is on its way to becoming the world’s leading assistant, stating, “It’s probably already there!” His confidence underscores Meta’s ambitious vision and the growing impact of Meta AI in the market.