Converts streamed audio to facial blendshapes for real-time lip-syncing and facial performances.
Enhance speech by correcting common audio degradations to create studio-quality speech output.
Leaderboard-topping reward model that supports RLHF for better alignment with human preferences.
Estimate the gaze angles of a person in a video and redirect the gaze to make it frontal.
Create facial animations using a portrait photo and synchronize mouth movement with audio.
VISTA-3D is a specialized interactive foundation model for segmenting and annotating human anatomies.