Why Human Alignment Works Without Interpretability
A striking assumption underlies much of contemporary AI alignment research: alignment requires interpretability. If we are to ensure that artificial systems act in accordance with human values, we must be able to inspect their internal representations, understand their objectives, and predict their decisions. Opacity, in this view, is dangerous. Alignment without transparency is treated as incoherent. Human alignment presents a problem for this assumption.
