Presto! Distilling Steps and Layers for Accelerating Music Generation
Zachary Novack, Ge Zhu, Jonah Casebeer, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan
Paper / Website / Summary / BibTex
This work presents Presto!, a new method for accelerating audio-domain text-to-music (TTM) models that distills away both diffusion steps and interior layers of the model itself, generating 32s of 44.1kHz stereo audio in under 0.5s.
@article{Novack2025Presto,
title={Presto! Distilling Steps and Layers for Accelerating Music Generation},
author={Zachary Novack and Ge Zhu and Jonah Casebeer and
Julian McAuley and Taylor Berg-Kirkpatrick and Nicholas J. Bryan},
year={2024},
eprint={TBD},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
Conference Publications
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan
International Society of Music Information Retrieval (ISMIR), 2024
Paper / Website / Summary / BibTex
This work presents DITTO-2, a novel method for accelerating inference-time control in diffusion-based music models, leveraging diffusion distillation to speed up the sampling process.
@inproceedings{Novack2024Ditto2,
title={{DITTO-2}: Distilled Diffusion Inference-Time T-Optimization for Music Generation},
author={Novack, Zachary and McAuley, Julian and Berg-Kirkpatrick, Taylor and Bryan, Nicholas J.},
year={2024},
booktitle={International Society of Music Information Retrieval (ISMIR)}}
DITTO: Diffusion Inference-Time T-Optimization for Music Generation
Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan
International Conference on Machine Learning (ICML), 2024 Oral (top 1.5%)
Paper / Website / Summary / BibTex
This work presents DITTO, a novel method for controlling pretrained text-to-music diffusion models without any model finetuning, enabling a wide range of artistic editing and control tasks.
@inproceedings{Novack2024Ditto,
title={{DITTO}: Diffusion Inference-Time T-Optimization for Music Generation},
author={Novack, Zachary and McAuley, Julian and Berg-Kirkpatrick, Taylor and Bryan, Nicholas J.},
year={2024},
booktitle={International Conference on Machine Learning (ICML)}}
CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
Zachary Novack, Julian McAuley, Zachary Lipton, Saurabh Garg
International Conference on Machine Learning (ICML), 2023
1st Workshop on Multimodal Representation Learning, ICLR, 2023
Paper / Code / Summary / BibTex
This work introduces CHiLS, a zero-shot classification strategy for improving CLIP-like models that refines class names and exploits implicit semantic hierarchies to boost accuracy without any additional training.
@inproceedings{novack2023chils,
title={CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets},
author={Novack, Zachary and McAuley, Julian and Lipton, Zachary and Garg, Saurabh},
year={2023},
booktitle={International Conference on Machine Learning (ICML)}}
Disentangling the Mechanisms Behind Implicit Regularization in SGD
Zachary Novack, Simran Kaur, Tanya Marwah, Saurabh Garg, Zachary Lipton
International Conference on Learning Representations (ICLR), 2023
NeurIPS Workshop on The Benefits of Higher-Order Optimization in Machine Learning, 2022 Spotlight & Best Poster
Paper / Code / Summary / BibTex
This work presents a comparative empirical study of different gradient norm-based regularizers for improving Large-Batch Stochastic Gradient Descent, and ties this behavior to the trajectory of the micro-batch gradient norm during training.
@inproceedings{novack2023disentangling,
title={Disentangling the Mechanisms Behind Implicit Regularization in SGD},
author={Novack, Zachary and Kaur, Simran and Marwah, Tanya and Garg, Saurabh and Lipton, Zachary},
booktitle={International Conference on Learning Representations (ICLR)},
year={2023} }
Workshop Publications
CoLLAP: Contrastive Long-form Language-Audio Pretraining with Musical Temporal Structure Augmentation
Junda Wu, Warren Li, Zachary Novack, Amit Namburi, Carol Chen, Julian McAuley
SoCal NLP Symposium, 2024
Paper / Summary / BibTex
This work presents CoLLAP, a long-form version of CLAP that can perform long-form reasoning and retrieval over minutes of musical audio.
@inproceedings{wu2024collap,
title={CoLLAP: Contrastive Long-form Language-Audio Pretraining with Musical Temporal Structure Augmentation},
author={Junda Wu and Warren Li and Zachary Novack and Amit Namburi and Carol Chen and Julian McAuley},
year={2024},
booktitle={SoCal NLP Symposium}
}
PDMX: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing
Phillip Long, Zachary Novack, Taylor Berg-Kirkpatrick, Julian McAuley
NeurIPS Workshop on Creativity & Generative AI, 2024
Paper / Code / Website / Summary / BibTex
This work presents PDMX, the largest open-source dataset of public-domain symbolic music scores.
@inproceedings{long2024pdmx,
title={{PDMX}: A Large-Scale Public Domain MusicXML Dataset for Symbolic Music Processing},
author={Long, Phillip and Novack, Zachary and Berg-Kirkpatrick, Taylor and McAuley, Julian},
booktitle={NeurIPS Workshop on Creativity \& Generative AI},
year={2024},
}
FUTGA: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation
Junda Wu, Zachary Novack, Amit Namburi, Jiaheng Dai, Hao-Wen Dong, Zhouhang Xie, Carol Chen, Julian McAuley
3rd Workshop on NLP for Music and Audio, 2024
Paper / Website / Summary / BibTex
This work presents FUTGA, an open-source AudioLLM capable of fine-grained reasoning and captioning for audio-domain music.
@inproceedings{wu2024futga,
title={{FUTGA}: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation},
author={Junda Wu and Zachary Novack and Amit Namburi and Jiaheng Dai and Hao-Wen Dong and Zhouhang Xie and Carol Chen and Julian McAuley},
year={2024},
booktitle={3rd Workshop on NLP for Music and Audio},
}
Unsupervised Lead Sheet Generation via Semantic Compression
Zachary Novack, Nikita Srivatsan, Taylor Berg-Kirkpatrick, Julian McAuley
AES International Symposium on AI and the Musician, 2024
Paper / Code / Summary / BibTex
This work presents Lead-AE, a novel method for conditional lead sheet generation that leverages a discrete auto-encoder setup to allow fully unsupervised training.
@inproceedings{novack2023unsupervised,
title={Unsupervised Lead Sheet Generation via Semantic Compression},
author={Novack, Zachary and Srivatsan, Nikita and Berg-Kirkpatrick, Taylor and McAuley, Julian},
year={2024},
booktitle={AES International Symposium on AI and the Musician}}
Non-Refereed Publications
Personalized Sequential Recommendation for Adaptive Itemization in MOBA Games
Zachary Novack
Paper / Summary / BibTex
This paper adapts the state-of-the-art work HT4-Rec for personalized sequential item recommendation in the popular video game League of Legends, incorporating user skill information to directly capture personal player ability.
@misc{novack2022personalized,
title={Personalized Sequential Recommendation for Adaptive Itemization in MOBA Games},
author={Novack, Zachary},
year={2022}}
Towards Generalizable Deep Speech Anonymization
Aaron Broukhim, Zachary Novack
Paper / Summary / BibTex
This work presents a method for learning language-agnostic speech anonymization systems, utilizing a GAN backbone and modified loss to actively learn an anonymized speaker distribution.
@misc{broukhim2022towards,
title={Towards Generalizable Deep Speech Anonymization},
author={Broukhim, Aaron and Novack, Zachary},
year={2022}}
Down the Rabbit Hole: Modeling Twitter Dynamics through Bayesian Inference
Zachary Novack
Paper / Summary / BibTex
This work presents an analytical study into addiction effects on Twitter, and uncovers the presence of different posting types that obey fundamentally chaotic behavior when modeled using an autoregressive Bayesian framework.
@article{novack2022down,
title={Down the Rabbit Hole: Modeling Twitter Dynamics through Bayesian Inference},
author={Novack, Zachary},
doi={10.1184/R1/20638989.v1},
year={2022}}
Approximating Optimal Transport via GANs for Recourse Disparity Analysis
Zachary Novack, Qi Xuan Teo, Ryan Steed
Paper / Summary / BibTex
This work presents a method for analyzing the recourse disparity across protected groups in algorithmic decision making systems, using GANs to approximate the optimal transport mapping between recourse-aligned subgroups.
@misc{novack2022approximating,
title={Approximating Optimal Transport via GANs for Recourse Disparity Analysis},
author={Novack, Zachary and Teo, Qi Xuan and Steed, Ryan},
year={2022}}
Lunch at the EigenSalad Bar: Linear Approaches to Dimensionality Reduction for Image Processing
Zachary Novack
Paper / Summary / BibTex
This pseudo-satirical work investigates linear dimensionality reduction techniques for processing images of food, attempting to find explainable bases for discriminative food categories.
@misc{novack2021salad,
title={Lunch at the EigenSalad Bar: Linear Approaches to Dimensionality Reduction for Image Processing},
author={Novack, Zachary},
year={2021}}