![](https://assets.ngc.nvidia.com/products/api-catalog/images/edify-3d.jpg)
Shutterstock/edify-3d
Shutterstock Generative 3D service for 3D asset generation, trained on NVIDIA Edify using Shutterstock's licensed creative libraries.
Model Overview
Description:
Shutterstock 3D Generation, powered by NVIDIA Edify, generates 3D meshes and associated 2D PBR textures from a text prompt and an optional reference image. This model is for commercial use.
References:
This model is based on large-scale diffusion models.
[1] Balaji, Y., Nah, S., Huang, X., Vahdat, A., Song, J., Kreis, K., Aittala, M., Aila, T., Laine, S., Catanzaro, B. and Karras, T., 2022. eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers. arXiv preprint arXiv:2211.01324.
[2] Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E.L., Ghasemipour, K., Gontijo Lopes, R., Karagol Ayan, B., Salimans, T. and Ho, J., 2022. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35, pp. 36479-36494.
[3] Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. and Chen, M., 2022. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125.
[4] Lin, C.-H., Gao, J., Tang, L., Takikawa, T., Zeng, X., Huang, X., Kreis, K., Fidler, S., Liu, M.-Y. and Lin, T.-Y., 2023. Magic3D: High-resolution text-to-3D content creation. CVPR 2023.
Model Architecture:
Architecture Type: Convolutional Neural Network (CNN) and Transformer
Network Architecture: U-Net-based CNN and Transformer
This model is based on a diffusion architecture combined with a Transformer architecture.
Input:
Input Type(s): Text (Prompt), Image (Optional)
Input Format(s): Text: Raw and Image: Red, Green, Blue (RGB)
Input Parameters: Text: One-Dimensional (1D) and Image: Two-Dimensional (2D, optional)
Other Properties Related to Input: Maximum of 500 text tokens. No minimum or maximum input image resolution; input images are segmented and resized to 224 x 224.
Output:
Output Type(s): Mesh
Output Format: Three-Dimensional (3D) with Texture Map (2D)
Other Properties Related to Output: Output Target Faces (configurable): 500 to 200,000; Texture Resolution: 1k, 2k, or 4k
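To illustrate how the input and output parameters above could be combined in a single request, the following is a minimal Python sketch. The invocation URL, field names (`prompt`, `reference_image`, `target_face_count`, `texture_resolution`), and response layout are assumptions for illustration only; refer to the API catalog documentation for the actual schema.

```python
import base64
import requests

# Hypothetical endpoint and payload schema -- for illustration only;
# the real invocation URL and field names come from the API catalog docs.
INVOKE_URL = "https://ai.api.nvidia.com/v1/genai/shutterstock/edify-3d"  # assumed
API_KEY = "nvapi-..."  # your API catalog key

# Optional reference image (RGB); it is segmented and resized to 224 x 224 server-side.
with open("reference.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "prompt": "a rustic wooden chair",   # up to 500 text tokens
    "reference_image": image_b64,        # optional
    "target_face_count": 20000,          # configurable in [500, 200000]
    "texture_resolution": "2k",          # one of "1k", "2k", "4k"
}

headers = {"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"}

response = requests.post(INVOKE_URL, headers=headers, json=payload, timeout=300)
response.raise_for_status()

# Assumed response: a 3D mesh plus 2D PBR texture maps (e.g., an encoded asset archive).
result = response.json()
```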
Software Integration:
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere
[Preferred/Supported] Operating System(s):
- Linux
Model Version(s):
Edify 3D v1.0
Training & Evaluation:
Training Dataset:
Link: Shutterstock Images, TurboSquid 3D Models
** Data Collection Method by dataset
- Customer data
** Labeling Method by dataset
- Automated
Properties (Quantity, Dataset Descriptions, Sensor(s)): 600 million image-text pairs of licensed high-quality photography, illustrations, and 3D renderings; 270 thousand 3D meshes from TurboSquid; 228 thousand PixelSquid 3D models rendered as multi-view images.
Evaluation Dataset:
** Data Collection Method by dataset
- Customer data
Properties (Quantity, Dataset Descriptions, Sensor(s)): 600 million image-text pairs of licensed high-quality photography, illustrations, and 3D renderings; 270 thousand 3D meshes from TurboSquid; 228 thousand PixelSquid 3D models rendered as multi-view images.
Inference:
Engine: TensorRT, Triton
Test Hardware:
- NVIDIA H100
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.