Copyright © 2025 NVIDIA Corporation

NeMo Data Designer

A compound AI system for synthetic data generation

Getting Started

All you need to get started with this free trial of NeMo Data Designer is an API key. Once you have your key, follow our tutorials on GitHub or run through the steps below to create a product review dataset.

Step 1
Install the NeMo Data Designer Python SDK

Install the NeMo Microservices SDK in your Python virtual environment. If you don’t have one, we recommend creating one using uv.

uv pip install "nemo-microservices[data-designer]"
# If not using uv
# pip install "nemo-microservices[data-designer]"

Step 2
Initialize NeMo Data Designer

Generate an API key to get started.

from nemo_microservices.data_designer.essentials import (
    CategorySamplerParams,
    DataDesignerConfigBuilder,
    LLMTextColumnConfig,
    NeMoDataDesignerClient,
    PersonSamplerParams,
    SamplerColumnConfig,
    SamplerType,
    SubcategorySamplerParams,
    UniformSamplerParams,
)

data_designer_client = NeMoDataDesignerClient(
    base_url="https://ai.api.nvidia.com/v1/nemo/dd",
    default_headers={"Authorization": "Bearer $API_KEY_REQUIRED_IF_EXECUTING_OUTSIDE_NGC"} # auto-generated API KEY
)

# The following model aliases are available by default in this hosted Data Designer:
# nemotron-nano-v2, nemotron-super, mistral-small, gpt-oss-20b, gpt-oss-120b, llama-4-scout-17b
model_alias = "nemotron-nano-v2"

config_builder = DataDesignerConfigBuilder()
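
To keep the key out of source code, the Authorization header can be assembled at runtime. This is a minimal sketch, assuming the key has been exported as `NVIDIA_API_KEY`. Both that variable name and the `build_auth_headers` helper are hypothetical and are not part of the SDK.

```python
import os


def build_auth_headers(env_var: str = "NVIDIA_API_KEY") -> dict[str, str]:
    """Build the Authorization header from an environment variable.

    The variable name is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your API key before creating the client.")
    return {"Authorization": f"Bearer {api_key}"}


# Usage, with the same client call as above:
# data_designer_client = NeMoDataDesignerClient(
#     base_url="https://ai.api.nvidia.com/v1/nemo/dd",
#     default_headers=build_auth_headers(),
# )
```

Failing early on a missing key gives a clearer error than an authentication failure returned by the service.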

Step 3
Define the columns in your dataset

We've configured default model aliases for you to choose from.

###
# This free trial includes:
# - nemotron-nano-v2     → nvidia/nvidia-nemotron-nano-9b-v2
# - nemotron-super       → nvidia/llama-3.3-nemotron-super-49b-v1.5
# - mistral-small        → mistralai/mistral-small-24b-instruct
# - gpt-oss-20b          → openai/gpt-oss-20b
# - gpt-oss-120b         → openai/gpt-oss-120b
# - llama-4-scout-17b    → meta/llama-4-scout-17b-16e-instruct
###
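
If you switch between models often, a local lookup can catch a mistyped alias before a job is submitted. The sketch below copies the alias list above into a dictionary. The `resolve_model` helper is hypothetical and purely local; the hosted service resolves aliases on its own.

```python
# Alias -> model ID pairs copied from the list above.
MODEL_ALIASES = {
    "nemotron-nano-v2": "nvidia/nvidia-nemotron-nano-9b-v2",
    "nemotron-super": "nvidia/llama-3.3-nemotron-super-49b-v1.5",
    "mistral-small": "mistralai/mistral-small-24b-instruct",
    "gpt-oss-20b": "openai/gpt-oss-20b",
    "gpt-oss-120b": "openai/gpt-oss-120b",
    "llama-4-scout-17b": "meta/llama-4-scout-17b-16e-instruct",
}


def resolve_model(alias: str) -> str:
    """Return the model ID behind an alias, or raise listing the valid aliases."""
    try:
        return MODEL_ALIASES[alias]
    except KeyError:
        raise ValueError(
            f"Unknown model alias {alias!r}; choose one of {sorted(MODEL_ALIASES)}"
        ) from None


model_alias = "nemotron-nano-v2"
print(resolve_model(model_alias))
```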

config_builder.add_column(
    SamplerColumnConfig(
        name="product_category",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=[
                "Electronics",
                "Clothing",
                "Home & Kitchen",
                "Books",
                "Home Office",
            ],
        ),
    )
)

config_builder.add_column(
    SamplerColumnConfig(
        name="product_subcategory",
        sampler_type=SamplerType.SUBCATEGORY,
        params=SubcategorySamplerParams(
            category="product_category",
            values={
                "Electronics": [
                    "Smartphones",
                    "Laptops",
                    "Headphones",
                    "Cameras",
                    "Accessories",
                ],
                "Clothing": [
                    "Men's Clothing",
                    "Women's Clothing",
                    "Winter Coats",
                    "Activewear",
                    "Accessories",
                ],
                "Home & Kitchen": [
                    "Appliances",
                    "Cookware",
                    "Furniture",
                    "Decor",
                    "Organization",
                ],
                "Books": [
                    "Fiction",
                    "Non-Fiction",
                    "Self-Help",
                    "Textbooks",
                    "Classics",
                ],
                "Home Office": [
                    "Desks",
                    "Chairs",
                    "Storage",
                    "Office Supplies",
                    "Lighting",
                ],
            },
        ),
    )
)
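
The subcategory column depends on the category column: each key in its `values` mapping has to match a category value. The plain-Python sketch below illustrates that relationship and checks a mapping for gaps. It is not the SDK's sampler, and `check_mapping` and `sample_pair` are hypothetical helper names.

```python
import random

# Two of the five categories from the config above, to keep the sketch short.
categories = ["Electronics", "Books"]
subcategories = {
    "Electronics": ["Smartphones", "Laptops", "Headphones", "Cameras", "Accessories"],
    "Books": ["Fiction", "Non-Fiction", "Self-Help", "Textbooks", "Classics"],
}


def check_mapping(categories, subcategories):
    """Return (categories with no subcategories, mapping keys that are not categories)."""
    missing = [c for c in categories if not subcategories.get(c)]
    unknown = [k for k in subcategories if k not in categories]
    return missing, unknown


def sample_pair(categories, subcategories, rng):
    """Draw a category first, then a subcategory from that category's list."""
    category = rng.choice(categories)
    return category, rng.choice(subcategories[category])


print(check_mapping(categories, subcategories))  # two empty lists when the mapping is complete
print(sample_pair(categories, subcategories, random.Random(0)))
```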

config_builder.add_column(
    SamplerColumnConfig(
        name="target_age_range",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["18-25", "25-35", "35-50", "50-65", "65+"]
        ),
    )
)

config_builder.add_column(
    SamplerColumnConfig(
        name="customer",
        sampler_type=SamplerType.PERSON,
        params=PersonSamplerParams(age_range=[18, 70]),
    )
)

config_builder.add_column(
    SamplerColumnConfig(
        name="number_of_stars",
        sampler_type=SamplerType.UNIFORM,
        params=UniformSamplerParams(low=1, high=5),
        convert_to="int",
    )
)
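
This page does not say how `convert_to="int"` turns a uniform draw into a whole number. The sketch below is not the SDK's code. It only shows, in plain Python, that truncating and rounding a uniform draw over [1, 5] produce different star distributions, so it is worth checking the ratings in your preview output.

```python
import random
from collections import Counter


def star_counts(to_int, n=100_000, low=1.0, high=5.0, seed=0):
    """Count the integer ratings produced by applying `to_int` to uniform draws."""
    rng = random.Random(seed)
    return Counter(to_int(rng.uniform(low, high)) for _ in range(n))


truncated = star_counts(int)    # int() drops the fractional part
rounded = star_counts(round)    # round() goes to the nearest integer

# Truncation almost never yields 5; rounding makes 1 and 5 about half as
# likely as 2, 3, and 4.
for stars in range(1, 6):
    print(stars, truncated[stars], rounded[stars])
```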

config_builder.add_column(
    SamplerColumnConfig(
        name="review_style",
        sampler_type=SamplerType.CATEGORY,
        params=CategorySamplerParams(
            values=["rambling", "brief", "detailed", "structured with bullet points"],
            weights=[1, 2, 2, 1],
        ),
    )
)
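
The `weights` list lines up with `values` position by position. Assuming the weights are relative and are normalized to probabilities (this page does not spell that out), `[1, 2, 2, 1]` makes "brief" and "detailed" twice as likely as the other two styles. The sketch uses the standard library rather than the SDK.

```python
import random
from collections import Counter
from fractions import Fraction

values = ["rambling", "brief", "detailed", "structured with bullet points"]
weights = [1, 2, 2, 1]

# Normalize relative weights to probabilities: 1/6, 1/3, 1/3, 1/6.
total = sum(weights)
probabilities = {value: Fraction(weight, total) for value, weight in zip(values, weights)}
for value, probability in probabilities.items():
    print(f"{value}: {probability}")

# Draw a weighted sample to see the proportions empirically.
rng = random.Random(0)
counts = Counter(rng.choices(values, weights=weights, k=6000))
print(counts)
```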

config_builder.add_column(
    LLMTextColumnConfig(
        name="product_name",
        prompt=(
            "Come up with a creative product name for a product in the '{{ product_category }}' category, focusing "
            "on products related to '{{ product_subcategory }}'. The target age range of the ideal customer is "
            "{{ target_age_range }} years old. Respond with only the product name, no other text."
        ),
        # This is optional, but it can be useful for controlling the behavior of the LLM. Do not include instructions
        # related to output formatting in the system prompt, as Data Designer handles this based on the column type.
        system_prompt=(
            "You are a helpful assistant that generates product names. You respond with only the product name, "
            "no other text. You do NOT add quotes around the product name."
        ),
        model_alias=model_alias,
    )
)

config_builder.add_column(
    LLMTextColumnConfig(
        name="customer_review",
        prompt=(
            "You are a customer named {{ customer.first_name }} from {{ customer.city }}, {{ customer.state }}. "
            "You are {{ customer.age }} years old and recently purchased a product called {{ product_name }}. "
            "Write a review of this product, which you gave a rating of {{ number_of_stars }} stars. "
            "The style of the review should be '{{ review_style }}'."
        ),
        model_alias=model_alias,
    )
)
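
The prompts refer to other columns through `{{ column }}` placeholders, including dotted access such as `{{ customer.first_name }}`. The sketch below shows how such a prompt is filled from one record and which columns it depends on. It uses a small regular-expression substitute, not the SDK's template engine, and the sample record is invented for illustration.

```python
import re

# Matches {{ name }} and {{ name.attribute }} placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def referenced_columns(template: str) -> set[str]:
    """Return the top-level column names a prompt refers to."""
    return {match.group(1).split(".")[0] for match in PLACEHOLDER.finditer(template)}


def render(template: str, record: dict) -> str:
    """Fill each placeholder from the record, following dotted paths into nested dicts."""
    def lookup(match):
        value = record
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


prompt = (
    "You are a customer named {{ customer.first_name }} from {{ customer.city }}. "
    "You gave {{ product_name }} a rating of {{ number_of_stars }} stars."
)

# Invented sample values, standing in for one generated row.
record = {
    "customer": {"first_name": "Ana", "city": "Austin"},
    "product_name": "LumaDesk",
    "number_of_stars": 4,
}

print(sorted(referenced_columns(prompt)))
print(render(prompt, record))
```

Listing the referenced columns makes the dependency explicit: `customer_review` needs `product_name`, which is itself generated by an LLM column.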

Step 4
Start generating data

By default, each job generates 10 records. You can generate up to 100 records per job.

preview = data_designer_client.preview(config_builder, num_records=10)
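
If you need more than 100 records, one option is to split the target into several jobs. The helper below only computes the per-job sizes. `plan_batches` is a hypothetical name, and whether results from separate jobs should simply be concatenated is an assumption this page does not address.

```python
def plan_batches(total_records: int, max_per_job: int = 100) -> list[int]:
    """Split a target record count into job sizes no larger than max_per_job."""
    if total_records < 0 or max_per_job <= 0:
        raise ValueError("total_records must be >= 0 and max_per_job must be > 0")
    full_jobs, remainder = divmod(total_records, max_per_job)
    return [max_per_job] * full_jobs + ([remainder] if remainder else [])


print(plan_batches(250))  # [100, 100, 50]

# Each size would then be passed as num_records, for example:
# previews = [data_designer_client.preview(config_builder, num_records=n)
#             for n in plan_batches(250)]
```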

Iterate through the generated data:

preview.display_sample_record()

Get your results as a pandas DataFrame:

preview.dataset
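
Before using the generated rows, a quick sanity check can flag records that do not match the configuration. The sketch works on a list of dictionaries, which pandas produces with `preview.dataset.to_dict(orient="records")`. The `find_problem_rows` helper is hypothetical and the two sample rows are invented for illustration.

```python
import numbers


def find_problem_rows(records):
    """Return (row index, message) pairs for rows that fail basic checks."""
    problems = []
    for index, row in enumerate(records):
        stars = row.get("number_of_stars")
        review = row.get("customer_review")
        if not isinstance(stars, numbers.Integral) or not 1 <= stars <= 5:
            problems.append((index, "number_of_stars is not an integer from 1 to 5"))
        if not isinstance(review, str) or not review.strip():
            problems.append((index, "customer_review is empty"))
    return problems


# Invented sample rows; in practice use preview.dataset.to_dict(orient="records").
records = [
    {"number_of_stars": 4, "customer_review": "Sturdy and easy to assemble."},
    {"number_of_stars": 7, "customer_review": "   "},
]

for index, message in find_problem_rows(records):
    print(f"row {index}: {message}")
```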

GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of the Nemotron models nvidia/nvidia-nemotron-nano-9b-v2 and nvidia/llama-3.3-nemotron-super-49b-v1.5 is governed by the NVIDIA Open Model License Agreement. Use of the mistralai/mistral-small-24b-instruct, openai/gpt-oss-20b, openai/gpt-oss-120b and meta/llama-4-scout-17b-16e-instruct models is governed by the NVIDIA Community Model License. Additional Information: for nvidia/llama-3.3-nemotron-super-49b-v1.5, the Llama 3.3 Community License Agreement; for mistralai/mistral-small-24b-instruct, openai/gpt-oss-20b and openai/gpt-oss-120b models, the Apache 2.0 license; and for meta/llama-4-scout-17b-16e-instruct, Llama 4 Community Model License. Built with Llama.

Resources

  • Documentation
  • Tutorials
  • Terms of Service