
An edge-computing AI model that accepts text, audio, and image input, ideal for resource-constrained environments.

Cutting-edge vision-language model excelling at retrieving text and metadata from images.

Cutting-edge open multimodal model excelling at high-quality reasoning from image and audio inputs.

Robust Speech Recognition via Large-Scale Weak Supervision.

Multilingual model supporting speech-to-text recognition and translation.