
I am a Researcher at Microsoft with the AI Red Team, where I focus on probing and improving the safety and security of frontier AI systems. I received my PhD in Computer Science from New York University, advised by Dr. Christina Pöpper, with research centered on responsible AI and privacy technologies.

Previously, I served as Research Manager at MATS, where I mentored 11 researchers on projects in AI alignment, control, evaluations, and governance. Prior to that, I held research roles at the Center for Cyber Security and Spotify Tech Research, where I contributed to cross-disciplinary efforts at the intersection of technical innovation, security, and ethical oversight.

Recent Highlights


Apr ‘25: Our research on reverse-engineering and jailbreaking safety filters in DALL·E models will be presented at USENIX Security 2025 in Seattle.

Dec ‘24: Elevated to IEEE Senior Member in recognition of over 10 years of contributions to the profession.

Aug ‘24: Defended my PhD dissertation, “Towards Responsible AI: Safeguarding Privacy, Integrity, and Fairness.”

Jul ‘24: Named runner-up for the Andreas Pfitzmann Best Paper Award at PETS 2024.

Mar ‘24: Appointed Publication Chair for the ACNS 2024 conference.

Dec ‘23: Received the Best Paper Award at the Machine Learning for Health (ML4H) conference, co-located with NeurIPS 2023.

Jun ‘23: Engaged with the UN Information Integrity team to address risks of online hate speech.

May ‘23: Served on the Program Committee for SecWeb 2023.

Research


Exposing the Guardrails: Reverse-Engineering and Jailbreaking Safety Filters in DALL·E Text-to-Image Pipelines
USENIX Security Symposium, Seattle, US, 2025.

Understanding Inequality of LLM Fact-Checking over Geographic Regions with Agent and Retrieval models
In Submission, 2025

CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot
USENIX Security Symposium, Anaheim, US, 2023.

How Fair are Medical Imaging Foundation Models?
Machine Learning for Health (ML4H), New Orleans, US, 2023.

Tactics, Threats & Targets: Modeling Disinformation and its Mitigation
Network and Distributed System Security (NDSS) Symposium, San Diego, US, 2023.

A detailed list of publications can be found on Google Scholar.