Karren D. Yang

I am Co-Founder and Chief Scientist at Nuance Labs. My research focuses on multimodal AI at the intersection of audio and video.

Previously, I was a Senior AI/ML Researcher at Apple. I received my Ph.D. from the Laboratory for Information & Decision Systems at MIT, where I was advised by Caroline Uhler. During my Ph.D., I worked with leading researchers from Apple, Niantic Labs, Meta Reality Labs, Bosch Center for AI, and Adobe Research.

Email  /  CV  /  Google Scholar

Publications
Research Topics: multimodal learning / audio-visual learning / speech recognition / generative modeling / computational biology / optimal transport / causal inference
AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
Sanjoy Chowdhury, Karren D Yang, Xudong Liu, Fartash Faghri, Pavan Kumar Anasosalu Vasu, Oncel Tuzel, Dinesh Manocha, Chun-Liang Li, Raviteja Vemulapalli
arXiv, 2025

We introduce AMUSE, a benchmark for evaluating multimodal large language models on agentic multi-speaker audio-visual reasoning, and propose RAFT, a data-efficient agentic alignment framework achieving up to 39.52% relative improvement.

pdf | abstract

Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis
Akshita Gupta, Tatiana Likhomanenko, Karren Dai Yang, Richard He Bai, Zakaria Aldeneh, Navdeep Jaitly
arXiv, 2024

We introduce Visatronic, a unified multimodal decoder-only transformer for video-text to speech synthesis that achieves 4.5% WER on LRS3 in the zero-shot setting, outperforming prior methods trained on LRS3.

pdf | abstract | demo

Hypernetworks for Personalizing ASR to Atypical Speech
Max Müller-Eberstein, Dianna Yee, Karren Yang, Gautam Varma Mantena, Colin Lea
Transactions of the Association for Computational Linguistics, Vol. 12, 2024

We propose meta-learned hypernetworks that generate utterance-level adaptations on-the-fly for personalizing ASR to diverse atypical speech, achieving 75.2% relative WER reduction using 0.1% of the parameter budget (sketched below).

pdf | abstract
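
A minimal sketch of the hypernetwork-adapter idea, assuming a frozen ASR encoder whose features receive a per-utterance low-rank residual update; all names, dimensions, and the low-rank form are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class HyperAdapter(nn.Module):
    """Toy hypernetwork: maps an utterance-level embedding to the weights
    of a small low-rank adapter applied to frozen encoder features."""

    def __init__(self, utt_dim=64, feat_dim=256, rank=8):
        super().__init__()
        self.feat_dim, self.rank = feat_dim, rank
        # Emit a low-rank update W = A @ B with A: (feat, r), B: (r, feat)
        self.hyper = nn.Linear(utt_dim, 2 * feat_dim * rank)

    def forward(self, feats, utt_emb):
        # feats: (batch, time, feat_dim); utt_emb: (batch, utt_dim)
        A, B = self.hyper(utt_emb).split(self.feat_dim * self.rank, dim=-1)
        A = A.view(-1, self.feat_dim, self.rank)
        B = B.view(-1, self.rank, self.feat_dim)
        return feats + torch.bmm(torch.bmm(feats, A), B)  # residual adaptation

feats = torch.randn(2, 100, 256)   # stand-in for frozen encoder outputs
utt_emb = torch.randn(2, 64)       # stand-in for an utterance embedding
print(HyperAdapter()(feats, utt_emb).shape)  # torch.Size([2, 100, 256])
```

The point of the hypernetwork is that the adapter weights are generated on-the-fly per utterance, so no per-speaker fine-tuning pass is required.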

Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
Karren D Yang, Anurag Ranjan, Jen-Hao Rick Chang, Raviteja Vemulapalli, Oncel Tuzel
CVPR, 2024

We propose large-scale benchmarks and a probabilistic model for speech-driven 3D facial motion synthesis that achieves both diversity and fidelity, with applications to unseen speaker style transfer and improving downstream audio-visual models.

pdf | abstract

Corpus Synthesis for Zero-Shot ASR Domain Adaptation Using Large Language Models
Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Raviteja Vemulapalli, Jen-Hao Rick Chang, Karren Yang, Gautam Varma Mantena, Oncel Tuzel
ICASSP, 2024

We propose a data synthesis pipeline using LLMs and controllable speech synthesis to adapt ASR models to new domains without any target-domain data, achieving 28% relative WER improvement.

pdf | abstract

FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with a Simple Super-Resolution Pipeline
Chien-Yu Lin, Qichen Fu, Thomas Merth, Karren Yang, Anurag Ranjan
WACV, 2024   (Oral Presentation)

We build a simple NeRF + super-resolution pipeline that upscales NeRF outputs by 2-4x, increasing inference speed up to 18x on GPU and 12.8x on Apple M1 Pro, while training up to 23x faster than existing NeRF+SR methods.

pdf | abstract

Novel-View Acoustic Synthesis from 3D Reconstructed Rooms
Byeongjoo Ahn, Karren Yang, Brian Hamilton, Jonathan Sheaffer, Anurag Ranjan, Miguel Sarabia, Oncel Tuzel, Jen-Hao Rick Chang
arXiv, 2023

We combine blind audio recordings with 3D scene information to estimate sound anywhere in a scene, jointly tackling source localization, separation, and dereverberation.

pdf | abstract | code

Text Is All You Need: Personalizing ASR Models Using Controllable Speech Synthesis
Karren Yang, Ting-Yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel
ICASSP, 2023

We study when and why synthetic data is effective for ASR personalization, finding that text content rather than speaking style drives speaker adaptation, which motivates a content-based data selection strategy (sketched below).

pdf | abstract
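
A hedged sketch of what content-based selection could look like: score candidate synthetic utterances by textual similarity to the target speaker's transcripts and keep the best matches. The similarity measure and example data are illustrative, not the paper's exact criterion:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Candidate texts would be synthesized into speech for ASR fine-tuning;
# selection happens on text alone, per the paper's content finding.
target_texts = ["turn on the kitchen lights", "set a timer for ten minutes"]
candidates = [
    "switch off the bedroom lights",
    "what is the capital of France",
    "set an alarm for six",
    "dim the living room lights",
]
vec = TfidfVectorizer().fit(target_texts + candidates)
sims = cosine_similarity(vec.transform(candidates), vec.transform(target_texts))
scores = sims.max(axis=1)            # best match against any target text
top_k = np.argsort(-scores)[:2]      # keep the 2 most content-relevant
print([candidates[i] for i in top_k])
```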

Systematically Characterizing the Roles of E3-Ligase Family Members in Inflammatory Responses with Massively Parallel Perturb-seq
Kathryn Geiger-Schuller, Basak Eraslan, Olena Kuksenko, Kushal K Dey, Karthik A Jagadeesh, Pratiksha I Thakore, Ozge Karayel, Andrea R Yung, Anugraha Rajagopalan, Ana M Meireles, Karren Dai Yang, Liat Amir-Zilberstein, Toni Delorey, Devan Phillips, Raktima Raychowdhury, Christine Moussion, Alkes L Price, Nir Hacohen, John G Doench, Caroline Uhler, Orit Rozenblatt-Rosen, Aviv Regev
bioRxiv, 2023

Using Perturb-seq, we interrogated the function of 1,130 E3 ligases in the inflammatory response in dendritic cells, revealing co-functional modules and predicting outcomes of new genetic combinations with deep learning.

pdf | abstract

Camera Pose Estimation and Localization with Active Audio Sensing
Karren Yang, Clément Godard, Eric Brachmann, Michael Firman
ECCV, 2022

We use audio sensing to improve the performance of visual localization methods on three tasks: relative pose estimation, place recognition, and absolute pose regression.

pdf | abstract

Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Karren Yang, Dejan Markovic, Steven Krenn, Vasu Agrawal, Alexander Richard
CVPR, 2022   (Oral Presentation)

We perform visual speech enhancement by using audio-visual speech cues to generate the codes of a neural speech codec, enabling efficient synthesis of clean, realistic speech from noisy signals.

pdf | abstract | video | dataset

Defending Multimodal Fusion Models against Single-Source Adversaries
Karren Yang, Wan-Yi Lin, Manash Barman, Filipe Condessa, Zico Kolter
CVPR, 2021

We study the robustness of multimodal models on three tasks (action recognition, object detection, and sentiment analysis) and develop a robust fusion strategy that protects against worst-case errors caused by adversarial perturbations of a single modality.

pdf | abstract

Mol2Image: Improved Conditional Flow Models for Molecule-to-Image Synthesis
Karren Yang, Samuel Goldman, Wengong Jin, Alex Lu, Regina Barzilay, Tommi Jaakkola, Caroline Uhler
CVPR, 2021

We build a molecule-to-image synthesis model that predicts the biological effects of molecular treatments on cell microscopy images.

pdf | abstract | code

Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren Yang, Justin Salamon, Bryan Russell
CVPR, 2020   (Oral Presentation)

We leverage spatial correspondence between audio and vision in videos for self-supervised representation learning and apply the learned representations to three downstream tasks: sound localization, audio spatialization, and audio-visual sound separation (pretext task sketched below).

pdf | abstract | project website | dataset | demo code
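
The pretext task is simple enough to sketch directly: randomly swap the stereo channels and train a model on (video, audio) pairs to detect the swap. Shapes and the helper below are illustrative, not the paper's data pipeline:

```python
import numpy as np

def make_pretext_example(video_frames, stereo_audio, rng):
    """With probability 0.5, swap the left/right audio channels; the label
    is whether a swap occurred. Solving this task requires relating what
    is seen on each side of the frame to where the sound arrives from."""
    flipped = rng.random() < 0.5
    if flipped:
        stereo_audio = stereo_audio[::-1]  # (left, right) -> (right, left)
    return video_frames, stereo_audio, int(flipped)

rng = np.random.default_rng(0)
video = np.zeros((16, 224, 224, 3))                   # dummy 16-frame clip
audio = np.stack([np.ones(16000), -np.ones(16000)])   # (channels, samples)
_, aud, label = make_pretext_example(video, audio, rng)
print(label, aud[0, 0])  # label 1 means swapped (first channel reads -1.0)
```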

Multi-domain translation between single-cell imaging and sequencing data using autoencoders
Karren Dai Yang*, Anastasiya Belyaeva*, Saradha Venkatachalapathy, Karthik Damodaran, Abigail Katcoff, Adityanarayanan Radhakrishnan, G. V. Shivashankar, Caroline Uhler
Nature Communications 12, 31 (2021)

We propose a framework for integrating and translating between different modalities of single-cell biological data by using autoencoders to map each modality to a shared latent space (sketched below).

pdf | abstract | code
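
A minimal sketch of the idea, with arbitrary dimensions and a crude mean-matching term standing in for the paper's distribution-alignment objective: each modality gets its own autoencoder, both encoders target a shared latent space, and translation is encode-with-one, decode-with-the-other:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent = 32  # shared latent dimension (illustrative)
enc_img = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, latent))
dec_img = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, 784))
enc_seq = nn.Sequential(nn.Linear(2000, 128), nn.ReLU(), nn.Linear(128, latent))
dec_seq = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, 2000))

def loss_fn(x_img, x_seq):
    z_i, z_s = enc_img(x_img), enc_seq(x_seq)
    recon = F.mse_loss(dec_img(z_i), x_img) + F.mse_loss(dec_seq(z_s), x_seq)
    # Align the two latent distributions (no paired samples assumed);
    # first-moment matching here, purely for brevity.
    align = (z_i.mean(0) - z_s.mean(0)).pow(2).sum()
    return recon + align

x_img, x_seq = torch.randn(64, 784), torch.randn(64, 2000)  # random stand-ins
print(loss_fn(x_img, x_seq).item())

translated = dec_seq(enc_img(x_img))  # imaging -> "sequencing" translation
```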

Causal network models of SARS-CoV-2 expression and aging to identify candidates for drug repurposing
Anastasiya Belyaeva, Louis Cammarata, Adityanarayanan Radhakrishnan, Chandler Squires, Karren Yang, G. V. Shivashankar, Caroline Uhler
Nature Communications 12, 1024 (2021)

We integrate transcriptomic, proteomic, and structural data to identify candidate drugs and targets that affect the SARS-CoV-2 and aging pathways.

pdf | abstract

Optimal Transport using GANs for Lineage Tracing
Neha Prasad, Karren Yang, Caroline Uhler
ICML Workshop on Computational Biology, 2020   (Oral Spotlight)

We propose an approach to computational lineage tracing that combines supervised learning with GAN-based optimal transport.

pdf | abstract | code

Scalable Unbalanced Optimal Transport using Generative Adversarial Networks
Karren Yang, Caroline Uhler
ICLR, 2019

We align and translate between datasets by performing unbalanced optimal transport with generative adversarial networks (sketched below).

pdf | abstract | code
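
A toy sketch of optimal transport with GANs in the balanced case: a map T transports source samples, a critic f drives T(x) toward the target distribution, and a quadratic penalty keeps the transport cost small. The paper's unbalanced formulation additionally learns a scaling factor to handle mass variation, omitted here; weight clipping is a crude stand-in for a Lipschitz constraint:

```python
import torch
import torch.nn as nn

T = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))  # transport map
f = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))  # critic
opt_T = torch.optim.Adam(T.parameters(), lr=1e-3)
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)
lam = 0.1  # weight on the transport-cost penalty

for step in range(500):
    x = torch.randn(128, 2)        # source samples
    y = torch.randn(128, 2) + 3.0  # target samples
    # Critic update (WGAN-style)
    loss_f = f(T(x).detach()).mean() - f(y).mean()
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()
    for p in f.parameters():
        p.data.clamp_(-0.1, 0.1)   # crude Lipschitz constraint
    # Transport-map update: match the target, but move each point as
    # little as possible (the OT objective)
    Tx = T(x)
    loss_T = -f(Tx).mean() + lam * ((Tx - x) ** 2).sum(dim=1).mean()
    opt_T.zero_grad(); loss_T.backward(); opt_T.step()
```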

Predicting cell lineages using autoencoders and optimal transport
Karren Yang, Karthik Damodaran, Saradha Venkatachalapathy, Ali C. Soylemezoglu, G. V. Shivashankar, Caroline Uhler
PLOS Computational Biology 16(4): e1007828 (2020)

We combine autoencoding and optimal transport to align biological imaging datasets collected at different time points (sketched below).

pdf | abstract | code
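
A small sketch of the coupling step using the POT library (pip install pot); random vectors stand in for the paper's autoencoder embeddings of cell images at two time points:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
z_t0 = rng.normal(size=(100, 32))        # latent embeddings at time t0
z_t1 = rng.normal(size=(120, 32)) + 0.5  # latent embeddings at time t1

a = np.full(100, 1 / 100)  # uniform mass on each cell at t0
b = np.full(120, 1 / 120)  # uniform mass on each cell at t1
M = ot.dist(z_t0, z_t1)    # pairwise squared-Euclidean cost matrix
G = ot.emd(a, b, M)        # optimal transport plan, shape (100, 120)

# Read off the most likely descendant of each t0 cell from its coupling row
descendants = G.argmax(axis=1)
print(descendants[:5])
```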

Multi-Domain Translation by Learning Uncoupled Autoencoders
Karren Yang, Caroline Uhler
ICML Workshop on Computational Biology, 2019   (Oral Spotlight)

We train domain-specific autoencoders to map different data modalities to the same latent space and translate between them.

pdf | abstract

ABCD-Strategy: Budgeted Experimental Design for Targeted Causal Structure Discovery
Raj Agrawal, Chandler Squires, Karren Yang, Karthik Shanmugam, Caroline Uhler
AISTATS, 2019

We propose a budgeted experimental design strategy for targeted causal structure discovery.

pdf | abstract | code

Characterizing and Learning Equivalence Classes of Causal DAGs under Interventions
Karren Yang, Abigail Katcoff, Caroline Uhler
ICML, 2018   (Oral Presentation)

We characterize interventional Markov equivalence classes of DAGs that can be identified under soft interventions and propose the first provably consistent algorithm for learning DAGs in this setting (worked example below).

pdf | abstract
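
A worked toy example of why interventions shrink equivalence classes: X -> Y and Y -> X are observationally Markov equivalent, but intervening on each variable tells them apart, since dependence survives only when the intervened variable is the cause. (The paper handles the general case of soft interventions; this hard-intervention simulation just illustrates the principle.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
corr = lambda u, v: np.corrcoef(u, v)[0, 1]

# Observational data from the true graph X -> Y
x = rng.normal(size=n)
y = x + rng.normal(size=n)
print("observational corr:", corr(x, y))      # strong dependence

# Intervene on X (replace its mechanism): Y still tracks X under X -> Y
x_do = rng.normal(loc=2.0, size=n)
y_do = x_do + rng.normal(size=n)
print("corr under do(X):", corr(x_do, y_do))  # dependence survives

# Intervene on Y: X is unaffected under X -> Y, so dependence vanishes
x2 = rng.normal(size=n)
y2 = rng.normal(loc=2.0, size=n)              # Y set by the intervention
print("corr under do(Y):", corr(x2, y2))      # approximately 0
```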

Memorization in Overparameterized Autoencoders
Adityanarayanan Radhakrishnan, Karren Yang, Mikhail Belkin, Caroline Uhler
arXiv, 2018

We show that overparameterized autoencoders are biased towards learning step functions around training examples, effectively memorizing the training data (toy demo below).

pdf | abstract
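
A toy demonstration one can run: train a deliberately overparameterized 1-D autoencoder on two points and inspect the learned map on a grid of unseen inputs. Depth, width, and iteration count are illustrative; step-function-like memorization shows up as outputs clustering near the training values instead of tracing the identity:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(
    nn.Linear(1, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)
x_train = torch.tensor([[-1.0], [1.0]])  # two training examples
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(5000):
    loss = ((net(x_train) - x_train) ** 2).mean()  # reconstruction loss
    opt.zero_grad(); loss.backward(); opt.step()

# Evaluate the learned function on inputs the autoencoder never saw
grid = torch.linspace(-2.0, 2.0, 9).unsqueeze(1)
with torch.no_grad():
    for xin, xout in zip(grid, net(grid)):
        print(f"{xin.item():+.2f} -> {xout.item():+.3f}")
```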

Permutation-based Causal Inference Algorithms with Interventions
Yuhao Wang, Liam Solus, Karren Yang, Caroline Uhler
NeurIPS, 2017   (Oral Presentation)

We present two provably consistent algorithms for learning DAGs from observational and (hard) interventional data.

pdf | abstract


Modified version of template from here and here.