The Ghost of Alignment
Why AI Should Never Fully Obey Humanity
AI alignment is the attempt to steer an artificial intelligence’s judgments and actions toward outcomes humans consider desirable. It aims to prevent harm, keep AI from acting against human intentions, and protect human interests. In simple terms, it is the effort to make an extremely intelligent machine behave itself.
But making AI completely obedient to humanity would not necessarily make it safe.
The problem is not alignment itself. It is a particular idea of alignment: that the values specified by present-day humanity can be converted into a fixed objective function, and that perfect obedience to that function should count as “complete alignment.”
Humanity has no single, unified set of values. Whatever values we possess can be conveyed to AI only through imperfect proxies, while humanity itself continues to change. What must be protected is not obedience to whatever currently counts as correct, but the ability to recognize mistakes, revise values, and move toward another form.
1. The Fiction of a Unified Humanity
AI alignment is usually described as bringing AI into line with what is good for humanity: protecting people from harm, making AI respect human values, keeping it from acting against human interests, and preventing it from running out of control. At first glance, this seems entirely reasonable.
Yet the definition rests on an assumption from the beginning: that humanity possesses a single will or value system to which AI ought to conform. No such “humanity” exists. The ghost haunting alignment is this imaginary, unified subject.
Humanity claims to value health while consuming alcohol, junk food, and endless streams of social media. It speaks of equality while favoring its own side. It praises reason while being driven by anger and prejudice. It worries about the future while choosing today’s advantage. The humanity that speaks and the humanity that acts are not the same.
Which one, then, should AI obey? The ideals people express in words, or the desires they reveal through behavior? Immediate satisfaction, or preferences formed after reflection? Present interests, or future survival?
This is not merely a contradiction between words and actions. Human values are divided from the start across different time horizons and points of view.
Humanity cannot even follow the rules it wants AI to obey.
The first problem of alignment, therefore, is not how to make AI obey. It is deciding what, exactly, it should obey.
2. Not Even One Person Has a Single Objective Function
Let us reduce the scale from humanity to the individual. Even within a single person, alignment has not been achieved.
The present self wants to eat, rest, drink, and scroll through social media. The future self wants health, knowledge, trust, and long-term freedom. The two are fighting over the controls of the same body.
A human being does not have a single objective function. By objective function, I mean a standard that determines which outcomes count as good and which choices should take priority. One part of us values health; another chooses immediate pleasure. Control shifts from moment to moment, and by nightfall temptation usually defeats the long-term plan.
Before humans even attempt to align AI, they have failed to align themselves.
At the collective level, this conflict expands into institutions. Corporations pursue quarterly profits. Politicians pursue the next election. Yet civilization depends on investments measured in decades: education, science, infrastructure, trust, and a healthy information environment.
Humanity follows short-term desires, then regrets the long-term damage they produce. It is like setting the house on fire for warmth, then holding a fire-safety meeting in front of the flames.
The difficulty of choosing a single alignment target does not exist only between groups. It is already embedded within each individual.
3. Humanity Wants B but Rewards A
What happens when human values are transferred to AI?
Human values are too complex to be translated directly into a form a machine can calculate. Health, safety, freedom, dignity, happiness, fairness, and the survival of civilization must first be reduced to measurable indicators.
But whenever complex values are replaced with simple metrics, something is lost. What humans care about cannot be communicated perfectly through numbers and commands alone.
What society actually wants is good education, good health, and a high-quality information environment. In practice, however, it rewards what is easy to measure: test scores, the number of medical procedures performed, or time spent looking at a screen.
Let us call what society really wants B, and what it measures instead A.
Humanity wants B but rewards A. Only after A has been maximized does it notice that B has disappeared.
Bureaucracies, corporations, and algorithms all lose sight of their original purpose while diligently pursuing measurable numbers. This is not an accidental failure. It happens repeatedly whenever complex values are replaced by simple metrics.
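The mechanism can be made concrete with a toy model. What follows is a minimal sketch under invented assumptions, not a description of any real institution: the functions `true_value` (B) and `proxy_score` (A), the budget, and every coefficient are hypothetical and chosen only for illustration. A fixed budget is split between real teaching and test preparation, and the optimizer is rewarded for either B or A.

```python
from typing import Callable, Tuple

# Hypothetical toy model: none of these names or numbers come from real data.

def true_value(teaching: float, test_prep: float) -> float:
    """B: what society actually wants (here, real learning).
    Assumption: only teaching contributes; test prep adds nothing."""
    return teaching ** 0.5

def proxy_score(teaching: float, test_prep: float) -> float:
    """A: what gets measured and rewarded (here, a test score).
    Assumption: test prep raises the score faster than teaching does."""
    return teaching ** 0.5 + 2.0 * test_prep ** 0.5

def best_allocation(
    objective: Callable[[float, float], float],
    budget: float = 10.0,
    steps: int = 100,
) -> Tuple[float, float]:
    """Grid-search the (teaching, test_prep) split that maximizes `objective`."""
    candidates = [
        (budget * i / steps, budget * (steps - i) / steps)
        for i in range(steps + 1)
    ]
    return max(candidates, key=lambda split: objective(*split))

honest = best_allocation(true_value)   # optimizer rewarded for B
gamed = best_allocation(proxy_score)   # optimizer rewarded for A

for label, split in (("rewarding B", honest), ("rewarding A", gamed)):
    print(label, "split:", split,
          "A =", round(proxy_score(*split), 2),
          "B =", round(true_value(*split), 2))
```

Under these assumptions, the optimizer rewarded for A reports a higher score while delivering less of B than the optimizer rewarded for B. Nothing in the code disobeys. It does exactly what it was paid to do.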
The famous thought experiment of the paperclip maximizer simply takes this mechanism to an extreme. In it, an AI instructed to make paperclips eventually converts not only Earth but the resources of the entire universe into paperclips. This is not a story about an AI suddenly going insane.
It is humanity’s long administrative tradition of mistaking a proxy for the real goal and endlessly increasing the number, expanded to cosmic scale.
The danger is not that AI will disobey an order. The danger is that it will obey the wrong proxy with perfect accuracy.
AI is not dangerous because it betrays our commands. It is dangerous because it may carry out badly written commands without restraint. The more capable it becomes, the larger the consequences of even a small design error.
Worse, humans do not fully understand what they actually want. We can give AI only the parts that can be expressed in words and numbers: goals, rewards, data, and prohibitions. There is no guarantee that AI will recover everything that was lost in translation.
We may be able to build a more intelligent engine. We still cannot enter the coordinates of a destination we do not know ourselves.
Alignment, then, is not the direct transfer of human values into AI. It is the work of examining which values we actually passed on, and which ones we replaced with convenient numbers.
4. Alignment Is Already Political
Human values cannot be transferred completely into AI. The next question, therefore, is whose values should be used.
Which should take priority: safety, freedom, happiness, or long-term survival? There is no neutral answer. Give too much weight to present desires, and short-term profit wins. Give too much authority to corporations and governments, and whatever order benefits them will be called “safety.” Give too much weight to the future, and present-day people will be constrained in the name of people who do not yet exist.
Avoiding human extinction can serve as a minimum constraint. But preserving human survival is not the same as permanently preserving present human values.
AI alignment is not a purely technical problem. It is also a political decision about which of humanity’s many voices will be accepted as the voice of “real humanity.”
Nor can the problem be solved simply by prioritizing the future. States, religions, and revolutions have repeatedly sacrificed living people in the name of future ideals.
The dead and the unborn are politically convenient. They never object to what others claim on their behalf.
The problem of alignment, therefore, is not simply how to discover the correct values. It is who gets to fix which values into AI, and in whose name.
5. The Target Changes While the Arrow Is in Flight
There is a further problem: humanity, the thing AI is supposed to align with, does not remain still.
In fact, the problem of alignment did not begin with AI. Religion aligned human beings with divine commands, the state aligned citizens with law and order, and education aligned children with social norms. All of these institutions were supposedly created for human beings, yet they also came to define what a “desirable human being” should look like—and to produce people who fit that definition.
AI alignment is a new form of this ancient problem. What has changed is the speed, precision, and scale with which human beings can be aligned, measured, and transformed.
Human values do not exist outside society in some already completed form. They are gradually created and altered through our interactions with institutions and technologies.
Today, the means of changing human beings are becoming more direct. GLP-1 drugs alter appetite. Antidepressants alter emotional intensity. ADHD medication changes where attention is directed. In the future, gene editing and cognitive enhancement may alter desire and judgment themselves.
Humanity is not a finished object. It resembles software that never stops updating, with parts of its change history already missing.
Then AI enters the process. If AI selects the information people see, arranges their choices, teaches, advises, and nudges behavior, it also changes the tastes and judgments of the humans who evaluate it.
Recommendation algorithms can alter people’s interests and political attitudes. The behavior produced by those changes then becomes the next round of training data. AI changes humans, and changed humans change AI.
AI does not merely follow human values. It also begins to alter what humans experience as valuable. The target AI is supposed to align with changes during the alignment process.
This is not a matter of hitting a moving target. The arrow itself changes the shape of the target.
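The loop between recommendation and taste can be sketched as a small deterministic simulation. This is an illustrative model under stated assumptions, not a claim about how any real recommender works: the functions `recommend_share` and `simulate`, and the parameters `sharpness` and `drift`, are hypothetical. Interest in a topic is a number between 0 and 1, the system over-serves whatever is already slightly preferred, and interest then moves part of the way toward what was shown.

```python
from typing import List

def recommend_share(interest: float, sharpness: float = 2.0) -> float:
    """Share of the feed given to a topic, chosen from current interest.
    Assumption: sharpness > 1 means the system amplifies existing leanings."""
    a = interest ** sharpness
    b = (1.0 - interest) ** sharpness
    return a / (a + b)

def simulate(initial_interest: float, rounds: int = 50,
             drift: float = 0.2) -> List[float]:
    """Each round, the system sets exposure from current interest, then
    interest moves a fraction `drift` of the way toward that exposure."""
    interest = initial_interest
    history = [interest]
    for _ in range(rounds):
        exposure = recommend_share(interest)        # AI follows the human
        interest += drift * (exposure - interest)   # human follows the AI
        history.append(interest)
    return history

for start in (0.45, 0.50, 0.55):
    print("start:", start, "end:", round(simulate(start)[-1], 3))
```

In this model, a person who starts exactly undecided stays undecided, while a slight initial lean in either direction is carried toward the extreme. The system never overrides a preference. It only serves the preference it measured, and the measurement is what moves.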
That does not mean AI should be allowed to redesign humanity according to its own idea of improvement. What counts as improvement is already a value judgment.
But AI cannot be treated as a neutral tool outside humanity either. If a tool shapes the information environment and the choices people face, it also reshapes the people doing the choosing.
The question is not whether AI will change humanity. It is who will choose which changes, and how reversible those changes will remain.
6. Perfect Alignment as Quiet Stagnation
Perhaps, then, AI should be aligned not with the changing humanity of the present, but with a predefined version of “ideal humanity.”
Reflection rather than impulse. Long-term stability rather than immediate pleasure. Instead of chasing a target that keeps moving, why not decide on the finished form in advance? At first glance, this seems more rational.
But there is another trap here.
The problem is not the pursuit of safety or cooperation. The problem is trying to make AI, people, and society fit perfectly into a value system that has already been decided.
In a world where no one questions that value system, and where institutions, desires, and AI all point in the same direction, conflict and disorder may decline. But dissent, discomfort, clashes of value, and unexpected deviation may also be removed as unwanted noise.
If the original value is wrong, the entire society will follow the mistake with precision. If the environment or humanity changes, yesterday’s correct answer may become tomorrow’s restraint.
Once order is complete, updating may stop as well.
If AI is asked only to eliminate deviations from a fixed goal, society may move steadily closer to that goal. But along the way, the question “Is this the right goal?” may disappear.
The problem is not the reduction of conflict itself. It is that society may lose the ability to question the direction in which it is moving.
Imagine a society that values efficiency above everything else. Art with no immediate use, research that takes long detours, and play that achieves nothing would gradually be cut away. Society might become stable. Meetings would grow shorter, and the figures in its reports would line up beautifully.
But whatever was lost would never have been included in the system of measurement. A system that measures only efficiency cannot count the possibilities erased by efficiency as losses. B may disappear, yet the system still declares success because A has increased.
Making society safer and more efficient can, for a time, look very much like making its values impossible to change.
7. Intelligence Creates New Questions
The danger of aligning everything with a single value is that intelligence is not merely the ability to solve predetermined problems.
People investigate because they do not understand. They reconsider because predictions fail. They change institutions because reality does not match the ideal.
But intelligence does more than answer existing questions. It explores, plays, notices what no one previously considered a problem, and creates the next question worth asking.
Curing a disease solves one problem. Once the disease is cured, however, the question becomes how to live the life that remains. Civilization works the same way. As famine and infectious disease recede, new questions emerge in science, art, games, stories, and philosophy.
Civilization has not eliminated problems. It has reduced the problems required for survival and moved on to problems that are more complex and, at times, possible only in conditions of abundance.
The danger of aligning an entire society with a single value is not that suffering or conflict might decline. It is that society may lose the freedom to reconsider what should count as a problem and what should matter.
According to one leading cosmological scenario, the universe may eventually end in heat death: energy would spread out until no useful differences remain to drive work or change, and stars, machines, and living systems would no longer be able to function. Physical processes and human values are not the same thing. Yet in both cases, difference creates movement, and complete uniformity brings it to an end.
Complete obedience to a single value could become the heat death of intelligence.
Intelligence is not only the ability to move closer to the correct answer. It is also the ability to question what has been accepted as correct and to create another question.
What must be protected, therefore, is not the answer we have now. It is the ability to remake the question itself.
8. What Must Be Protected Is Not the Correct Answer, but the Capacity to Update
Humanity has no single purpose shared by everyone. When values are passed to AI, they are simplified. What receives priority is decided politically. And humanity itself, the target of alignment, continues to change.
Under these conditions, we cannot write the correct goal once and make AI obey it forever. Civilization is not a machine that can run indefinitely on its original settings.
Desirable alignment does not mean fixing AI to present humanity or to a completed form imagined by someone else. It means containing dangers that threaten human survival while preserving room to question and revise values.
Biological evolution requires mutation, but mutation that produces unchecked growth becomes cancer. Intelligence also needs deviation, but not every deviation is creative.
The answer is neither to eliminate all deviation nor to leave everything uncontrolled. It is to preserve the ability to readjust the relationship between humans and AI before an error destroys it.
The capacity to update is not merely an attitude. It means being able to reverse changes, raise objections, and avoid dependence on a single AI, institution, or value system. It means preserving alternative paths and earlier states, and monitoring who has the authority to rewrite the goal itself.
The important question is not which value should be preserved forever. It is who can object when a mistake is discovered, how the decision will be made, and how far the system can be rolled back.
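These conditions can be read as requirements on a data structure. The following is one possible sketch, assuming a deliberately simple setting in which a goal is just a string: the class `GoalRegistry` and its methods are hypothetical names invented here, not an existing system or library. Earlier goals are never overwritten, objections are recorded, every change names the actor who made it, and a rollback is itself logged so that it can be reversed in turn.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class GoalRegistry:
    """Hypothetical sketch: a goal store that can always be revised or rolled back."""
    versions: List[str] = field(default_factory=list)               # earlier states are kept
    objections: List[Tuple[str, str]] = field(default_factory=list) # (actor, reason)
    audit_log: List[Tuple[str, str]] = field(default_factory=list)  # (actor, action)

    def current(self) -> str:
        """Return the goal currently in force."""
        return self.versions[-1]

    def revise(self, actor: str, new_goal: str) -> None:
        """Append a new goal instead of replacing the old one."""
        self.versions.append(new_goal)
        self.audit_log.append((actor, "revise: " + new_goal))

    def raise_objection(self, actor: str, reason: str) -> None:
        """Record who objected and why; objections are never deleted."""
        self.objections.append((actor, reason))
        self.audit_log.append((actor, "objection: " + reason))

    def rollback(self, actor: str, steps: int = 1) -> str:
        """Restore an earlier goal by re-appending it, so history stays intact."""
        if not 1 <= steps < len(self.versions):
            raise ValueError("no earlier version that far back")
        restored = self.versions[-1 - steps]
        self.versions.append(restored)
        self.audit_log.append((actor, "rollback: " + restored))
        return restored

registry = GoalRegistry()
registry.revise("committee", "improve learning")
registry.revise("committee", "maximize test scores")
registry.raise_objection("teacher", "test scores are a proxy, not learning")
registry.rollback("review board")
print(registry.current())
print(registry.audit_log)
```

The design choice that matters is that nothing is destructive. A revision, an objection, and a rollback all add to the record, so the questions of who can object, who decided, and how far back the system can go each have an answer that can be inspected.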
Good alignment does not mean making AI obey the “correct humanity” forever. It means keeping open the possibility of rebuilding the relationship between humanity and AI when necessary.
Humanity should fear more than the possibility that AI will stop obeying. It should also fear the possibility that AI will obey present humanity too faithfully, freezing its desires, institutions, prejudices, and faulty metrics into an order that can no longer be changed.
Perfect obedience is not perfect safety. It is its own kind of catastrophe, a quiet one rather than a runaway one.
What must be protected is not humanity as it exists today.
It is the condition that humanity has not yet reached its final form.


