About

Hello! My name is Zachary Novack, and I am currently a Computer Science PhD Candidate at UC San Diego, where I am advised by Prof. Julian McAuley and Prof. Taylor Berg-Kirkpatrick. Previously, I studied statistics and machine learning at Carnegie Mellon University, and was primarily advised by Prof. Zachary Lipton and Prof. Simon DeDeo.

I am actively on the job market and looking for industry, academic, and postdoc positions starting fall 2026!

As someone passionate about building Interactive Generative Audio/Music systems, my research focuses on two pillars of bringing the state of the art in audio generation to practical usability: Controllability and Efficiency. In particular, I have worked on extending generative audio systems with training-free (ICML 2024, Oral) and multimodal (ISMIR 2025) controls, accelerating both music (ICLR 2025, Spotlight) and general audio (WASPAA 2025) generation models, and making such interactive controls faster (ISMIR 2024).

Additionally, I’ve collaborated extensively on multi-modal text-audio reasoning, both on the modeling side, improving long-form audio retrieval (ICASSP 2025, Oral), and on designing better evaluation benchmarks that accurately measure audio perception (ISMIR 2025, Best Paper Nominee).

In the past, I’ve worked on general multi-modal reasoning tasks (ICML 2023) and empirical deep optimization theory (ICLR 2023).

In my free time, I enjoy cooking, playing beach volleyball (a must in San Diego!), as well as teaching the front ensemble at 11-time world championship finalist POW Percussion Ensemble!


Updates

January 2025: Our work on accelerating text-to-music diffusion models is accepted at ICLR 2025 in Singapore as a Spotlight!
December 2024: Three works from the UCSD MUSAIC Group accepted at ICASSP 2025!
October 2024: Our work on accelerating text-to-music diffusion models is out on arXiv!
October 2024: Our work on long-form text-audio contrastive learning is out on arXiv!
September 2024: Our work on the largest dataset of public-domain symbolic music scores is out on arXiv!
June 2024: Our work on accelerated training-free editing and control for text-to-music diffusion models is accepted at ISMIR 2024 in San Francisco!
May 2024: Our work on training-free editing and control for text-to-music diffusion models is accepted at ICML 2024 in Vienna as an ORAL, and our work on unsupervised lead sheet generation is accepted at the AES Symposium for AI and the Musician in Boston!
January 2024: Our work on training-free editing and control for text-to-music diffusion models is out on arXiv!
October 2023: Our work on unsupervised lead sheet generation is out on arXiv!
June 2023: Started Research Scientist internship with Nicholas Bryan at the Adobe Research Audio Group!
April 2023: Our work on augmenting CLIP zero-shot inference with hierarchical label sets was accepted to ICML 2023 in Honolulu, Hawaii!
March 2023: Our work on augmenting CLIP zero-shot inference with hierarchical label sets was accepted to the ICLR 2023 1st Workshop on Multimodal Representation Learning!
January 2023: Our work on understanding implicit regularization mechanisms in SGD was accepted to ICLR 2023 in Kigali, Rwanda!
December 2022: Our work on understanding implicit regularization mechanisms in SGD was accepted to the NeurIPS 2022 Workshop on the Benefits of Higher Order Optimization in Machine Learning (HOO-ML) as a Spotlight and won Best Poster!
September 2022: Began CS PhD at UCSD!
May 2022: Submitted senior thesis on modeling social media addiction on Twitter to CMU Kilthub.
May 2022: Graduated from CMU with B.S. in Statistics & Machine Learning, and a minor in Sonic Arts!