MBZUAI launches five new “first-of-its-kind” LLMs

Mohamed bin Zayed University of Artificial Intelligence’s (MBZUAI) Institute of Foundation Models (IFM) is pioneering generative artificial intelligence (AI) tools by deploying “first-of-its-kind” specialized language and multimodal models.

The IFM was launched last year to bring together top AI scientists, engineers, and practitioners to develop large-scale, broad-utility, and efficient AI models that can be adapted to a wide range of downstream, sustainable applications. MBZUAI is world-renowned for its expertise in this area following the successful launch of Jais, the world’s most advanced Arabic large language model (LLM), built with Core42 and Cerebras Systems, and Vicuna, a sustainable model developed in partnership with other universities.

The launch of the five new models – BiMediX, PALO, GLaMM, GeoChat, and MobiLLaMA – is an important milestone in the institute’s journey and for the research community. They include both small and large language models and large multimodal models (LMMs), which process and analyze data from multiple modalities beyond text, including audio and images, with a special emphasis on Arabic language capabilities.

The five models are designed to make a real-world impact in healthcare, detailed visual reasoning, multilingual multimodal capabilities, multimodal reasoning for the geospatial domain, and efficient LLMs for mobile devices, respectively, and each was developed through extensive research by the university’s faculty, researchers, and students.

“These models showcase the ability of the Institute of Foundation Models to transform cutting-edge research into applications that address novel use cases across society,” MBZUAI’s Acting Provost and Professor of Natural Language Processing, Professor Timothy Baldwin, said. “By going beyond the limitations of single-modality models and offering numerous applications across industries, multimodal models can be tailored to meet specific needs. This approach ensures that the university is fulfilling its vision to drive excellence in knowledge creation and transfer and AI deployment to foster economic growth, while positioning Abu Dhabi as a hub for the international AI community.”

With demand for AI across healthcare soaring, BiMediX is the world’s first bilingual medical mixture-of-experts LLM; it outperforms many existing LLMs on medical benchmarks in both English and Arabic, including medical board exams. Transformative potential use cases include virtual healthcare assistants, telemedicine, medical report summarization, clinical symptom diagnosis, medical research, counseling and mental health support, and diet plans and lifestyle enhancement.
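
To make the “mixture of experts” idea concrete, the toy sketch below routes each token to a small number of expert feed-forward networks, which is the general pattern such architectures follow. It is purely illustrative: the layer sizes, number of experts, and routing rule are assumptions for this example and do not reflect BiMediX’s actual implementation.

```python
# Illustrative sketch only: a toy mixture-of-experts (MoE) layer.
# NOT BiMediX's code; sizes, expert count, and routing are assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Each token is processed only by its top-k experts, chosen by a learned router."""

    def __init__(self, d_model=64, d_hidden=128, n_experts=4, top_k=2):
        super().__init__()
        # Each "expert" is an ordinary feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                                # x: (n_tokens, d_model)
        scores = self.router(x)                          # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)            # 10 token embeddings of width 64
print(ToyMoELayer()(tokens).shape)      # torch.Size([10, 64])
```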

GLaMM is a first-of-its-kind LMM capable of generating natural language responses related to objects in an image at the pixel level. It offers an enhanced and richer version of automated image captioning, reasoning, and the ability to switch objects in images. Real-world applications span sectors such as e-commerce, fashion, safe and smart cities, and home retail. GLaMM has been accepted for publication at CVPR 2024.

PALO is the world’s first multilingual LMM to offer visual reasoning capabilities in 10 major languages: English, Chinese, Hindi, Spanish, French, Arabic, Bengali, Russian, Urdu, and Japanese. Together these languages cover roughly two-thirds of the world’s population, and the model maintains high linguistic fidelity even for lower-resource languages such as Urdu and Bengali, helping to bring the benefits of AI to more people. The model has broad potential applications, from monitoring crops to recording wildlife and aiding search and rescue missions.

GeoChat is the world’s first grounded LMM specifically tailored to remote sensing (RS) scenarios. Unlike general-domain models, GeoChat excels at handling high-resolution RS imagery, employing region-level reasoning for comprehensive scene interpretation. Leveraging a newly created RS multimodal dataset, GeoChat exhibits robust zero-shot performance across various RS tasks, including image and region captioning, visual question answering, scene classification, visually grounded conversations, and referring object detection. The accompanying paper has been accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). The model has broad potential applications in intelligent Earth observation, climate-related monitoring, and urban and sustainable planning.

MobiLLaMA is a fully transparent, open-source, lightweight, and efficient small language model (SLM) for resource-constrained devices such as mobile phones, and can be readily deployed and used on a smartphone or tablet. It uses a novel parameter-sharing scheme to reduce pre-training compute, deployment cost, and memory footprint, and also offers multimodal capabilities. In keeping with the LLM360 initiative, the complete training data, intermediate checkpoints, training and evaluation code, and a mobile deployment environment are all openly available.
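
For developers who want to experiment, the sketch below shows one way to load and prompt a small open model of this kind with the Hugging Face transformers library. The checkpoint ID MBZUAI/MobiLlama-05B is an assumption used for illustration and is not confirmed by this article; consult the official MobiLLaMA release for the exact name and loading instructions.

```python
# Minimal sketch: loading a small language model such as MobiLLaMA with Hugging Face
# transformers. The model ID below is an assumption and may differ from the official
# release; check the project's documentation for the exact checkpoint and options.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MBZUAI/MobiLlama-05B"  # assumed Hub ID, not confirmed by this article

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Small language models are useful on mobile devices because"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a short continuation; a sub-billion-parameter model keeps this cheap
# enough for resource-constrained hardware.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```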
