Skander Moalla

Final-year PhD Candidate @ EPFL (CLAIRE)


I am considering full-time positions in Zurich, Paris, or London for Fall 2026. I bring a strong background in both research and engineering.

Hi! I’m a final-year PhD candidate in reinforcement learning (RL) and LLM post-training at EPFL, advised by Prof. Caglar Gulcehre (CLAIRE Lab).

I work on scaling reinforcement learning for LLMs by developing algorithms that leverage off-policy data to improve training efficiency and diversity, while preventing saturation in terms of plasticity and exploration.

My recent work includes co-leading the post-training of a fully open source 70B LLM (Apertus 70B, Project Apertus), leveraging diversity to improve test-time scaling and sampling in reinforcement learning (SR with the Gemma post-training team at Google DeepMind), deriving a calibrated offline & off-policy RL fine-tuning algorithm for LLMs (Quantile Reward Policy Optimization, NeurIPS 2025), and exposing the connection between plasticity, trust-region, and off-policy collapse (No Representation, No Trust, NeurIPS 2024).

I have developed several codebases from scratch and made significant open-source and infrastructure contributions, such as a scalable code execution sandbox for QRPO and a scalable offline post-training pipeline for Apertus 70B. I am also proud of building infrastructure for reproducible ML (Python Machine Learning Research Template) and conducting award-winning reproducibility studies (ReScience 2023).

Interests

Reinforcement Learning: LLM post-training, Off-policy RL, Reasoning, Science

Deep Learning: Plasticity, Deep RL, Transformers

Education

PhD in CS EPFL | Prof. Caglar Gulcehre

MSc in CS Oxford | Prof. Shimon Whiteson

BSc in Maths & CS Ecole Polytechnique

Exchange Programs U of Toronto | Software Eng
Stanford | AI & Entrepreneurship

Experience

PhD Student Researcher Google DeepMind

RL Applied Scientist Quincus

SDE/SWE & ML Amazon

Research Assistant Prof. Hess & Blue Brain Project

Background

I hold an MSc in Advanced Computer Science from the University of Oxford (Reuben College inaugural cohort 2021-2022). My MSc thesis on Multi-Agent RL and policy gradient methods was supervised by Mingfei Sun and Prof. Shimon Whiteson at the Whiteson Research Lab (WhiRL).

I earned a BSc in Mathematics and Computer Science from Ecole Polytechnique (inaugural cohort 2017-2020) and was a visiting student at EPFL, the University of Toronto, and Stanford University, where I focused on software engineering and entrepreneurship.

I was a PhD student researcher at Google DeepMind (Paris) with the Gemma post-training team hosted by Alexandre Ramé, working on leveraging diversity to improve test-time scaling and sampling in reinforcement learning. I also interned as an applied scientist at Quincus to optimise middle-mile logistics using RL, and as a software engineer at Amazon building AI tools for Amazon Transportation Services.

At EPFL, I was the president of CUBAliente (2024), the Latin social dancing association. At Oxford, I led the Careers team of the Oxford Artificial Intelligence Society (OxAI) (2022) and played as a libero for the men’s Blues volleyball team (2021-2022). At Ecole Polytechnique, I was elected to the first board of L’ORE (2018), the Bachelor’s program student society.

Selected Publications

  1. NeurIPS 2025
    Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions
    Simon Matrenok*, Skander Moalla*, and Caglar Gulcehre
    In Advances in Neural Information Processing Systems (NeurIPS) 2025
  2. arXiv
    Apertus: Democratizing Open and Compliant LLMs for Global Language Environments
    Project Apertus, [...], Skander Moalla*, [...], Antoine Bosselut, Martin Jaggi, and Imanol Schlag
    arXiv preprint arXiv:2509.14233, 2025
  3. NeurIPS 2024
    No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
    Skander Moalla*, Andrea Miele, Razvan Pascanu, and Caglar Gulcehre
    In Advances in Neural Information Processing Systems (NeurIPS) 2024
  4. NeurIPS 2024
    Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers
    Xiuying Wei, Skander Moalla, Razvan Pascanu, and Caglar Gulcehre
    In Advances in Neural Information Processing Systems (NeurIPS) 2024
  5. NeurIPS 2023
    SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning
    Benjamin Ellis, Jonathan Cook, Skander Moalla, Mikayel Samvelyan, Mingfei Sun, Anuj Mahajan, Jakob Foerster, and Shimon Whiteson
    In Advances in Neural Information Processing Systems (NeurIPS) 2023
  6. ReScience
    [Re] Reproducibility Study of Behavior Transformers
    Skander Moalla*, Manuel Madeira*, Lorenzo Riccio*, and Joonhyung Lee*
    ReScience C, 2023