The 3D Conditioning for Precise Visual Generative AI NVIDIA Omniverse Blueprint, powered by NVIDIA NIM™, NVIDIA Omniverse™, OpenUSD, image2image models such as SDXL or tuned models like RealVisXL 4.0, and Shutterstock Generative 3D, offers a streamlined solution for creating precise, on-brand images. This experience allows users to choose the color of a hero asset, select the desired camera angle in the 3D scene, and then use generative AI to customize scene components such as backgrounds and props. Using this experience as inspiration, developers can download and customize the blueprint to unlock use cases ranging from scalable concepting and ideation to the creation of marketing assets for their brands or customers.
The user is presented with a live 3D viewport showcasing the final product—the "hero asset"—created by a creative team. In this instance, the hero asset is an espresso machine with a coffee mug. This asset, representing the final design, includes all the final materials and product options. It is placed within a rudimentary scene that appears unfinished. Additional props, such as the cutting board, were generated using Shutterstock 3D Generator to populate the counter with objects unavailable to the creative team when the initial scene was created.
The user can orbit the hero asset using the left mouse button and zoom in or out with the mouse wheel. Navigation within the scene is designed to keep the hero asset in frame at all times. Once the user has identified a suitable camera angle, they can adjust the espresso machine's configuration. The machine has two control surface options, various color options, and a choice of coffee mug style. An additional control allows the user to select a pre-generated HDRi image (created with Shutterstock 360 HDRi Generator) to quickly modify the scene's background.
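Internally, product options like these map naturally onto OpenUSD variant sets. The snippet below is a minimal sketch of that pattern using the pxr Python API; the stage path, prim path, and variant names are hypothetical and do not reflect the blueprint's actual scene structure.

```python
from pxr import Usd

# Hypothetical stage and prim paths; the blueprint's real scene layout may differ.
stage = Usd.Stage.Open("espresso_machine_scene.usd")
hero = stage.GetPrimAtPath("/World/EspressoMachine")

# Assumed variant sets modeling the product options described above.
vsets = hero.GetVariantSets()
vsets.GetVariantSet("color").SetVariantSelection("matte_black")
vsets.GetVariantSet("control_surface").SetVariantSelection("touchscreen")

stage.GetRootLayer().Save()
```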
Next, the user inputs individual prompts to generate the background. These prompts are linked to specific objects, ensuring each prompt modifies a designated area within the scene. After the user enters the prompts, the system processes them along with the scene's layout using generative AI to create the final image. During this process, the system generates masks for the targeted prompts, which the creative team can use for further image processing.
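The resulting request might look something like the following. This is a hypothetical sketch of a per-object prompt payload; the exact JSON schema is defined by the blueprint's web front end and primary container, so the field names and prim paths below are placeholders.

```python
import json

# Assumed payload shape for illustration only.
request = {
    "camera": "current_viewport",
    "prompts": {
        "/World/Background/Wall": "sunlit Scandinavian kitchen, pale oak cabinets",
        "/World/Props/CuttingBoard": "rustic walnut cutting board with fresh bread",
        "/World/Props/Counter": "white marble countertop with subtle veining",
    },
    "options": {"machine_color": "matte_black", "mug_style": "stoneware"},
}

payload = json.dumps(request).encode("utf-8")  # sent as bytes to the primary container
```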
The following NIM microservices are used by this blueprint (a sketch of a typical request to one of them follows the list):
USD Search
USD Code
Shutterstock 3D Generator (Playground Sample on NIM)
Shutterstock 360 HDRi Generator (Playground Sample on NIM)
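As a rough sketch of how one of these microservices, for example USD Search, might be queried over HTTP: the endpoint URL, request fields, and response handling below are placeholders, so consult the corresponding NIM documentation for the actual route and schema.

```python
import os
import requests

# Placeholder endpoint; replace with the route documented for the USD Search NIM.
NIM_URL = "https://<your-usd-search-nim-host>/v1/search"
headers = {"Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}"}

# Hypothetical request body: search the asset library for a prop to populate the scene.
resp = requests.post(
    NIM_URL,
    headers=headers,
    json={"query": "wooden cutting board", "limit": 5},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```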
NVIDIA Blueprints are customizable AI workflow examples that equip enterprise developers with NIM microservices, reference code, documentation, and a Helm chart for deployment.
This blueprint provides a reference workflow that shows how diffusion models, ControlNets, and their auxiliary tools can be integrated into Omniverse and streamed remotely. The primary container, which runs Omniverse, handles viewport streaming and message passing between the web front end and the second container. In the second container, users can run ComfyUI with an image2image model such as SDXL or a tuned model like RealVisXL 4.0 using our default template, or bring their own custom diffusion pipeline to handle requests coming from the Omniverse container. The two containers are then packaged and deployed together with the provided Helm chart.
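As a simplified sketch of that message flow, the Omniverse container can submit a generation job to the ComfyUI container through ComfyUI's HTTP API. The host name, port, and the trimmed workflow graph below are illustrative placeholders; the blueprint ships its own full ComfyUI template.

```python
import requests

# Assumed service name and default ComfyUI port for the second container.
COMFYUI_URL = "http://diffusion-container:8188"

# Trimmed placeholder graph; the blueprint's default template is much larger and
# wires in ControlNet nodes for depth/normal conditioning and masking.
workflow = {
    "1": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"},
    },
    # ... prompt, ControlNet, sampler, and decode nodes omitted ...
}

resp = requests.post(f"{COMFYUI_URL}/prompt", json={"prompt": workflow}, timeout=30)
resp.raise_for_status()
print("queued job:", resp.json().get("prompt_id"))
```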
Hardware Requirements
GPU: 2x NVIDIA L40 for deployment (one for rendering the scene, one for diffusion-model inference), or 1x NVIDIA RTX™ 6000 Ada Generation for local use
CPU: x86_64 architecture, 8 Cores (Intel Core i7 (7th Generation) or AMD Ryzen 5)
System Memory: 64GB
Software Requirements
OS: Ubuntu 22.04
Primary Container with Omniverse
Input
Input Type(s): JSON with payloads of text prompts and dropdown options in text
Input Format: bytes
Output
Output Type(s): Viewport, Image
Output Format: stream
Second Container with Diffusion Model
Input
Input Type(s): JSON graph structure with embedded parameters (text, number, and image in base64) (10MB custom limit, which can be changed from the primary container)
Input Format: bytes
Output
Output Type(s): Image
Output Format: bytes
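Because images are embedded in the request as base64, callers need to keep the serialized graph under the (configurable) 10MB limit. The sketch below illustrates that check; the node name, parameter names, and file path are hypothetical.

```python
import base64
import json

MAX_REQUEST_BYTES = 10 * 1024 * 1024  # default limit; configurable from the primary container

# Hypothetical capture written out by the primary container.
with open("viewport_depth.png", "rb") as f:
    depth_b64 = base64.b64encode(f.read()).decode("ascii")

# Hypothetical node and parameter names embedded into the workflow graph.
graph = {"42": {"class_type": "LoadImageBase64", "inputs": {"image": depth_b64}}}

payload = json.dumps({"prompt": graph}).encode("utf-8")
if len(payload) > MAX_REQUEST_BYTES:
    raise ValueError(f"request is {len(payload)} bytes, over the {MAX_REQUEST_BYTES} byte limit")
```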
The framework is designed to let software developers rapidly prototype and productize custom workflows that capture buffers from the viewport of a USD scene, take conditions in text form, and then generate an image that satisfies both constraints by running inference with a diffusion model. We have included a Docker image that automatically installs and deploys ComfyUI with an image2image model such as SDXL or a tuned model like RealVisXL 4.0 for the included workflow, which relies heavily on ControlNet to handle multiple conditions.
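For orientation, capturing the active viewport from inside an Omniverse Kit extension or the Script Editor can look roughly like the snippet below. This is a minimal color-buffer capture only; the blueprint's own capture path also extracts the depth/normal buffers and masks mentioned above.

```python
# Runs inside Omniverse Kit (extension or Script Editor), not as a standalone script.
import omni.kit.viewport.utility as vp_util

viewport = vp_util.get_active_viewport()

# Write the current color buffer to disk; depth/normal buffers and per-object masks
# are gathered separately by the blueprint for ControlNet conditioning.
vp_util.capture_viewport_to_file(viewport, "/tmp/viewport_capture.png")
```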
The software can condition on a USD scene by masking a target asset so that its 3D rendering is preserved while the rest of the image is inferred by a diffusion model, using the normal map or depth map as additional constraints alongside the text prompts. This gives users more artistic control over 3D assets, as well as the option to keep the 3D rendering instead of the image generated by the diffusion model. To run the experience locally, you can use the downloadable package and choose to run the web front end or the Omniverse application. Please consult our documentation to learn more about how we integrate 3D rendering and the diffusion model to achieve the final result.
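To make the conditioning step concrete, the sketch below shows an equivalent depth-conditioned SDXL generation using the Hugging Face diffusers library rather than the blueprint's ComfyUI pipeline; the model IDs, file paths, and prompt are illustrative only. The masked 3D rendering of the hero asset is then composited over the generated background so the product itself is never altered.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# Illustrative models; the blueprint uses its own ComfyUI template instead of diffusers.
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Depth map captured from the viewport constrains the layout of the generated image.
depth = load_image("/tmp/viewport_depth.png")
image = pipe(
    prompt="sunlit kitchen counter, marble surface, soft morning light",
    image=depth,
    controlnet_conditioning_scale=0.7,
).images[0]
image.save("generated_background.png")
```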
NVIDIA believes that Trustworthy AI is a shared responsibility and we have established policies and practices to enable the development of a wide array of AI applications. When downloaded or used under our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse.
For more detailed information on ethical considerations for the models, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.
NVIDIA Edify is a multimodal architecture for developing visual generative AI models for image, 3D, 360 HDRi, PBR materials, and video. Using NVIDIA AI Foundry, service providers can train and customize Edify models to build commercially viable visual services on top of NVIDIA NIM.
GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. ADDITIONAL INFORMATION: RealVisXL license: LICENSE.md in stabilityai/stable-diffusion-xl-base-1.0 (main).
Enhance and modify high-quality compositions using real-time rendering and generative AI output without affecting a hero product asset.