Skander Moalla
Final-year PhD Candidate @ EPFL (CLAIRE)
I am considering full-time positions in Zurich, Paris, or London for Fall 2026. I bring a strong background in both research and engineering.
Hi! I’m a final-year PhD candidate in reinforcement learning (RL) and LLM post-training at EPFL, advised by Prof. Caglar Gulcehre (CLAIRE Lab).
I work on scaling reinforcement learning for LLMs by developing algorithms that leverage off-policy data to improve training efficiency and diversity, while preventing saturation of plasticity and exploration.
My recent work includes co-leading the post-training of a fully open-source 70B LLM (Apertus 70B, Project Apertus); leveraging diversity to improve test-time scaling and sampling in reinforcement learning (as a student researcher with the Gemma post-training team at Google DeepMind); deriving a calibrated offline and off-policy RL fine-tuning algorithm for LLMs (Quantile Reward Policy Optimization, NeurIPS 2025); and exposing the connection between plasticity, trust regions, and off-policy collapse (No Representation, No Trust, NeurIPS 2024).
I have developed several codebases from scratch and made significant open-source and infrastructure contributions, such as a scalable code-execution sandbox for QRPO and a scalable offline post-training pipeline for Apertus 70B. I am also proud of building infrastructure for reproducible ML (Python Machine Learning Research Template) and conducting award-winning reproducibility studies (ReScience 2023).
Interests
Reinforcement Learning: LLM post-training, Off-policy RL, Reasoning, Science
Deep Learning: Plasticity, Deep RL, Transformers
Education
PhD in CS: EPFL | Prof. Caglar Gulcehre
MSc in CS: Oxford | Prof. Shimon Whiteson
BSc in Maths & CS: Ecole Polytechnique
Exchange Programs: U of Toronto | Software Eng
Exchange Programs: Stanford | AI & Entrepreneurship
Experience
PhD Student Researcher: Google DeepMind
RL Applied Scientist: Quincus
SDE/SWE & ML: Amazon
Research Assistant: Prof. Hess & Blue Brain Project
Background
I hold an MSc in Advanced Computer Science from the University of Oxford (Reuben College inaugural cohort 2021-2022). My MSc thesis on Multi-Agent RL and policy gradient methods was supervised by Mingfei Sun and Prof. Shimon Whiteson at the Whiteson Research Lab (WhiRL).
I earned a BSc in Mathematics and Computer Science from Ecole Polytechnique (inaugural cohort 2017-2020) and was a visiting student at EPFL, the University of Toronto, and Stanford University, where I focused on software engineering and entrepreneurship.
I was a PhD student researcher at Google DeepMind (Paris) with the Gemma post-training team hosted by Alexandre Ramé, working on leveraging diversity to improve test-time scaling and sampling in reinforcement learning. I also interned as an applied scientist at Quincus to optimise middle-mile logistics using RL, and as a software engineer at Amazon building AI tools for Amazon Transportation Services.
At EPFL, I was the president of CUBAliente (2024), the Latin social dance association. At Oxford, I led the Careers team of the Oxford Artificial Intelligence Society (OxAI) (2022) and played as a libero for the men’s Blues volleyball team (2021-2022). At Ecole Polytechnique, I was elected to the first board of L’ORE (2018), the Bachelor’s program student society.
Selected Publications
- Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions. In Advances in Neural Information Processing Systems (NeurIPS), 2025.
- Apertus: Democratizing Open and Compliant LLMs for Global Language Environments. arXiv preprint arXiv:2509.14233, 2025.
- No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
- Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
- SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
- ReScience