Weekly reflections and notes from the Columbia AI Alignment Club Technical Fellowship, covering key papers in AI safety, alignment, interpretability, control, and evaluation.
September 24, 2025
Topics: AI safety, alignment, RLHF
Read notes →

October 01, 2025
Topics: AI safety, alignment, misgeneralization
Read notes →

October 08, 2025
Topics: AI safety, forecasting, capabilities, risk scenarios
Read notes →

October 15, 2025
Topics: AI safety, mechanistic interpretability, superposition, sparse autoencoders
Read notes →

October 22, 2025
Topics: AI safety, control, scheming, red teaming
Read notes →

October 29, 2025
Topics: AI safety, scalable oversight, debate, weak-to-strong
Read notes →

November 05, 2025
Topics: AI safety, red teaming, evaluations, adversarial attacks
Read notes →

November 12, 2025
Topics: AI safety, timelines, careers, forecasting
Read notes →