Reasoning vision language model (VLM) for physical AI and robotics.
Powerful OCR model for fast, accurate real-world image text extraction, layout, and structure analysis.
Generates physics-aware video world states for physical AI development using text prompts and multiple spatial control inputs derived from real-world data or simulation.
Powerful, multimodal language model designed for enterprise applications, including software development, data analysis, and reasoning.
Develop an AI-powered weather analysis and forecasting application that visualizes multi-layered geospatial data.
Investigate, understand, and interpret single-cell data in minutes, not days, by leveraging RAPIDS-singlecell, powered by NVIDIA RAPIDS.
Easily run essential genomics workflows and save time by leveraging Parabricks.
Generalist model that generates future world states as videos from text and image prompts, creating synthetic training data for robots and autonomous vehicles.
Generates future frames of a physics-aware world state from just an image or short video prompt for physical AI development.
Multimodal vision-language model that understands text, images, and video and generates informative responses.
Ingest massive volumes of live or archived videos and extract insights for summarization and interactive Q&A.
Rapidly identify and mitigate container security vulnerabilities with generative AI.
Estimate the gaze angles of a person in a video and redirect them to appear frontal.
Visual ChangeNet detects pixel-level changes between two images and outputs a semantic change segmentation mask.
EfficientDet-based object detection network to detect 100 specific retail objects from an input video.