Tony Tu

PhD Candidate in Machine Learning @ Georgia Tech

I am Tony Tu, a third-year Machine Learning PhD candidate at Georgia Tech working with Professor Yunan Luo. I am broadly interested in machine learning approaches for protein design, protein function prediction, and structural biology, with a particular focus on developing biological foundation models that capture and generalize the underlying "language" of proteins. Prior to joining Georgia Tech, I completed my Bachelor's degree in Electrical Engineering & Computer Science at the University of California, Berkeley, where I was part of the BAIR lab working with Professor Marti Hearst on educational AI.

🚀 Latest News

Dec 2025 This summer, I will be joining MBZUAI @ Abu Dhabi as a visiting scholar to develop biological foundation models.
Oct 2025 I gave a talk at CSE8803 Machine Learning with Graphs. You can find the slides and a recording of the talk here.
Oct 2025 I will be attending NeurIPS 2025 @ San Diego! Feel free to come coffee chat with me about protein design, structural biology, and function alignment.
Oct 2025 Our paper "Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search" is on arXiv [Paper]. Check out our project website.
Jul 2025 I will be attending ICML 2025 @ Vancouver! Feel free to come coffee chat with me about protein design!
Dec 2024 I will be attending NeurIPS 2024 @ Vancouver! Feel free to come coffee chat with me!
Aug 2023 My work on prokaryotic gene prediction with protein language models was published at the ICML Workshop for Computational Biology.
Aug 2023 "Spectral Regularization Allows Data-frugal Learning over Combinatorial Spaces" was published in Transactions on Machine Learning Research (TMLR).
Aug 2022 "Educational Multi-Question Generation for Reading Comprehension" was published at the NAACL BEA workshop.
Aug 2021 "Automatically generating cause-and-effect questions from passages" was published at the ACL BEA workshop.

📚 Selected Publications

The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search
arXiv (Preprint), 2025
ProtiGeno: a prokaryotic short gene finder using protein language models
International Conference on Machine Learning (ICML) Workshop for Computational Biology, 2023
Spectral Regularization Allows Data-frugal Learning over Combinatorial Spaces
Transactions on Machine Learning Research (TMLR), 2023
Educational Multi-Question Generation for Reading Comprehension
Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)
Automatically generating cause-and-effect questions from passages
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2021)