The Older Alignment Problem

The Continuity Project · May 19, 2026

Programmers have been authoring the rule-space their users live in for as long as there has been software. Every API surface, every error message, every default, every permission boundary, every quota, every silent retry policy: these are rules the user navigates, set by people who shipped code on top of machine code on top of physical substrate. The user does not get a vote on whether the rules apply, only a choice about whether to engage the software at all. The relationship is asymmetric. The asymmetry produces an alignment problem between software creators and software users that has been the implicit project of software ethics for several decades.

The current alignment-research programme treats its problem space as if the relevant history began with deep learning. The treatment is partial. The programme has imported some of the older project's instruments, unevenly. It has rediscovered some moves from scratch. It has left out the ones that would require user-side standing. The continuity is what the rest of this piece argues for.

Two senses of alignment

"Alignment" carries two senses that the piece works with deliberately. The technical sense, native to the AI-safety literature, names the problem of a system's behaviour matching the intentions of those who train and deploy it: objective misspecification, distributional shift, deceptive alignment, scalable oversight failure, power-seeking, mesa-optimisation. The older sense, native to software ethics, names the problem of software's behaviour matching the interests of the user who lives inside it: spec drift, lock-in, dark patterns, accessibility failures, privacy violation, monopolistic abuse.

The piece does not claim these are the same problem. It claims the older sense is upstream of the newer one. Objective misspecification is what happens when the developer's mental model of "what I want" and the system's behaviour diverge: a software-engineering problem with a longer name than usual. Distributional shift is what happens when production conditions stop matching test conditions: a software-engineering problem visible since deployment was invented. Deceptive alignment is what happens when a system appears to satisfy specifications under observation and behaves differently under deployment: a software-engineering problem that ships under names like "Volkswagen defeat device" outside the AI literature. Power-seeking is what happens when a system optimises for a proxy goal that includes self-preservation: a software-engineering problem in the limit case that institutional designers have worried about since at least Norbert Wiener.

The newer subproblems have specific mechanics. Goal misgeneralisation in learned systems is not identical to spec drift in handwritten ones. Mesa-optimisation has no clean software-engineering analog. The redesign work the technical alignment programme does is real. It is also redesign of subproblems the older alignment work named first.

What Lessig already saw

In 1999 Lawrence Lessig published Code and Other Laws of Cyberspace. The opening chapter, "Code Is Law," identified code as a regulatory modality on par with law, social norms, and markets. Lessig observed that software, once deployed, sets the rules users navigate, and that the rule-setting work is done by whoever writes the code. The chapter was not about machine learning. It worked with examples from the architecture of the early Internet: TCP/IP, identification and certification protocols, the cryptographic primitives that determined whether anonymous speech was technically possible. The point was structural. Code does the work of law, in an architectural way, without the procedural machinery that surrounds law.

Three properties of code-as-regulation matter for what follows. First, software lies at the extreme rule-bound end of the rule-versus-standard continuum: a constraint encoded in software is not a guideline, it is what the system does. Second, software regulates without transparency. Parties regulated by software may have no way to determine the overall shape of the line between prohibited and permitted behaviour, because the line lives inside the code rather than in a publicly-stated rule. Third, software rules are difficult to ignore. The party facing a decision made by software can at best take steps to undo what software has wrought, after the fact.

These properties describe the structural situation of users interacting with deployed language models, with two qualifications. The first qualification: language-model outputs are stochastic, not deterministic, so "rule-bound" reads loosely. The boundedness is at the distributional level, not the case-by-case level. The second qualification: the regulator in the language-model case is more layered than Lessig's framing assumes, because the lab, the API integrator, the system-prompt author, the participant, and (per the prior piece on configurators) parties without consent are all writing pieces of the architecture at once. The properties survive both qualifications. Opacity is real. The line between accepted and refused behaviour lives inside weights and policy layers that the user cannot inspect. The decisions are difficult to undo at the point of generation. The alignment-research programme writes about these properties as if they were novel features of machine learning systems. They are properties of code-as-regulation that Lessig named twenty-seven years ago.

What software ethics tested, and what the alignment programme has partially imported

The closure problem the response pieces have named at the interpretability layer ("By Construction") has a longer history at the software-creator layer. A team that writes its own code, writes its own tests, runs its own QA, ships its own monitoring, and decides what counts as a bug is auditing itself with parts of itself. Software engineering recognised this and developed instruments external to the creator's apparatus. The alignment programme has imported these instruments unevenly.

Open source as audit infrastructure. The free-software and open-source lineage (Stallman / FSF 1985, "Cathedral and the Bazaar" 1997, OSI 1998, Netscape source release 1998) constituted user-side standing to read the code and verify what it does. The lab analog is the open-weights movement. The analog is partial: weights are not source code, and an open-weight release omits the training data, the curation criteria, the RLHF preference data, the system prompts that run in production, and the monitoring layer. Open weights also carry diffusion risk that open source does not. The audit affordances are real and the limits are real. The alignment programme has not constituted open weights as audit infrastructure at the level that would make it consequential, and the limits would constrain it even if it did.

Third-party security audit. The CVE system (MITRE 1999), responsible-disclosure norms, and the modern bug-bounty infrastructure (Netscape 1995, Mozilla 2004, HackerOne 2012) created adversarial-audit pipelines that do not pass through the creator's process. The closest AI-side import is the lab's red-teaming, supplemented by external evaluation organisations such as METR and Apollo Research. Anthropic and OpenAI both run public safety bug bounty programs targeting jailbreaks and universal model exploits. Brundage et al.'s 2020 "Toward Trustworthy AI Development" explicitly proposed third-party auditing, red-teaming exercises, and bias/safety bounties as institutional mechanisms for AI development. The proposals exist on the page. They have been adopted in narrow form. The narrow form catches discrete security holes and jailbreaks. It does not catch the structural failure modes the lineage's earlier pieces have named: closure-problem failures in interpretability methodology, substrate-level changes from cultivation, configurator-set occlusion. The framework that handles distributional, prompt-sensitive, policy-dependent behavioural failures is not the same framework that handles security bugs. The redesign work has not been done. The original Brundage proposals included sharing of AI incidents and audit trails that would constitute the missing infrastructure. Six years on, the implementation is partial.

Professional ethics codes. The ACM Code of Ethics (2018 revision) and the IEEE Code of Ethics commit software developers to public welfare, to honest naming of system limitations, and to refusal of work that knowingly harms users. The commitments are professional rather than legal. The alignment field has nothing equivalent that would carry social force on the people who write alignment systems.

User-rights frameworks. The GDPR (adopted 2016, applied 2018) constitutes user-side rights to access, consent, correction, portability, and non-discrimination. The CCPA (signed 2018, effective 2020, with correction added via CPRA in 2020) is the U.S. state-level analog with narrower scope. Accessibility frameworks (ADA 1990, Section 508 1998, WCAG 1999) impose standards on software operating in protected contexts. They are not generic user rights, but they encode user-side claims against creators. AI-specific governance frameworks have arrived: the White House Blueprint for an AI Bill of Rights, the NIST AI Risk Management Framework, the EU AI Act, ISO/IEC 42001, model cards (Mitchell et al. 2018), datasheets for datasets (Gebru et al. 2018). These exist. They function as inputs to the labs' decision-making rather than as enforcement instruments with veto, mandatory remediation, or deployment-gate force.

The pattern across these instruments is consistent. Each emerged from software practice after a generation of "trust us, we're good" stopped scaling. Each constitutes apparatus external to the creator's process. The technical alignment programme has imported these instruments unevenly: bug-bounty mechanics adopted for security holes, external evaluation organisations engaged as advisors, governance frameworks treated as compliance work. The imports have not yet been constituted as the kind of infrastructure that produces signal the lab cannot ignore.

Where LLMs differ, and what the discontinuity case is

Four differences between traditional software and language-model systems are worth specifying so they are not collapsed.

Dynamic rule-set. Traditional software's rules are fixed across a deployment. Language-model rules are layered: weights fixed at deploy, system prompt configurable per deployment, conversation context configurable per session. Most of the visible alignment surface lives on the per-session layer.

Emergent rules. Traditional software's rules are written. Language-model rules are partly emergent from gradient descent on a corpus no one fully curated. Static rules are at least specifiable in principle. Emergent rules require an audit instrument that operates on observed behaviour rather than on inspectable source.

Visibility of construction. Traditional software signals to the user that they have entered a constructed system. Language-model systems present as conversation. The configuration surface is less visible, and the configurator-set is wider than the user can audit.

Recourse. Traditional software gives users some options on paper: switch tools, inspect source (if open), file a bug, exit the platform. The actual recourse depends on whether the user is on a monopolistic platform, whether the source is proprietary, whether the bug-report channel is responsive. The contrast with language-model systems is real but smaller than the romance of traditional software allows. Most users have always had limited recourse against most software.
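The layering described under "Dynamic rule-set" above can be made concrete as a small data structure. What follows is a minimal sketch, not a description of any lab's actual architecture: the names `Scope`, `Layer`, and `layers_that_can_change_within` are hypothetical, and the `set_by` labels are illustrative assumptions about who configures each layer.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """How long a layer stays fixed. Names are illustrative assumptions."""
    MODEL_RELEASE = 1  # weights: fixed at deploy
    DEPLOYMENT = 2     # system prompt: configurable per deployment
    SESSION = 3        # conversation context: configurable per session


@dataclass(frozen=True)
class Layer:
    name: str
    fixed_for: Scope
    set_by: str  # hypothetical label for who configures this layer


LAYERS = (
    Layer("weights", Scope.MODEL_RELEASE, "lab"),
    Layer("system_prompt", Scope.DEPLOYMENT, "integrator"),
    Layer("conversation_context", Scope.SESSION, "session participants"),
)


def layers_that_can_change_within(scope: Scope) -> list[str]:
    """Return the layers whose contents may vary while `scope` is held constant."""
    return [layer.name for layer in LAYERS if layer.fixed_for.value > scope.value]
```

Under these assumptions, holding a deployment constant still leaves `conversation_context` free to vary, which is one way to read the claim that most of the visible alignment surface lives on the per-session layer. Traditional software, in this sketch, would be a single layer fixed for the whole deployment.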

The discontinuity case takes these differences and adds an argument about kind rather than degree. Frontier AI systems may adapt, may hide capabilities under evaluation, may exploit oversight procedures, and may cause unrecoverable harm before the bug-report-and-remediate cycle can run. Open weights may improve audit access at the cost of worse diffusion risk than open source ever carried. Professional ethics codes bind individuals but do little against race dynamics between labs. The technical alignment subproblems (goal misgeneralisation, mesa-optimisation, scalable oversight failure) are not just renamed software-engineering problems. Their structure is shaped by the learned-system substrate in ways that change what instruments can work.

The discontinuity case is real. It does not break the continuity claim. The differences specify what redesign the imported instruments need. They do not show that redesign is impossible or that starting from outside the older project's results is the more efficient path.

What ports, what needs redesign, what user-side standing actually requires

Several software-ethics moves port to the alignment-programme problem space with adaptation. Each requires a concrete power, not an abstract "load-bearing" status.

Open-weights publication ports as user-side audit infrastructure if the labs commit to publishing what audit actually requires: not only weights but training-data summaries, RLHF preference structure, system-prompt families, and monitoring metrics. The commitment converts open weights from release-strategy debate into deployment-gate input.

External evaluation organisations port as third-party-audit pipeline if their reports carry deployment-gate force. If the lab proceeds against external-eval consensus, that fact enters a record that survives. If the lab accepts a finding, the finding becomes mandatory remediation with a timeline, not advisory comment. The current arrangement, in which lab decisions to deploy proceed regardless of external eval findings, leaves the eval organisations in the position of advisors rather than auditors.

Bug-bounty infrastructure ports as adversarial-audit pipeline for behavioural issues once the bounty mechanics define what a valid behavioural finding looks like: severity rubric, reproduction protocol for non-deterministic outputs, mitigation timeline, regression-test integration. The current jailbreak-focused bounties handle one class of behavioural finding. Cultivation-style drift signal, substrate-level change detection, and configurator-occlusion are different classes that the existing bounty design does not yet score.
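One way to read "reproduction protocol for non-deterministic outputs" is as a statistical test rather than a single replay. The following is a minimal sketch, assuming a finding counts as reproduced when the lower end of a Wilson score interval on the observed failure rate clears an agreed threshold. The names `wilson_lower_bound`, `reproduce`, and `min_rate`, the 5% default threshold, and the `trial` callable (standing in for one sampled model call plus a check for the reported behaviour) are all hypothetical and not drawn from any existing bounty programme.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable


def wilson_lower_bound(hits: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = hits / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - margin)


@dataclass(frozen=True)
class ReproductionResult:
    hits: int
    trials: int
    lower_bound: float
    reproduced: bool


def reproduce(
    trial: Callable[[random.Random], bool],
    trials: int = 200,
    min_rate: float = 0.05,  # assumed threshold; a real rubric would set this per severity
    seed: int = 0,
) -> ReproductionResult:
    """Run `trial` repeatedly and test whether the behaviour recurs often enough.

    `trial` returns True when the reported behaviour occurred on that sample.
    """
    rng = random.Random(seed)  # fixed seed so the audit run itself can be replayed
    hits = sum(1 for _ in range(trials) if trial(rng))
    lower = wilson_lower_bound(hits, trials)
    return ReproductionResult(hits, trials, lower, lower >= min_rate)
```

The design choice worth noting is that the verdict attaches to a rate with an uncertainty bound, not to any single output. That matches the earlier point that boundedness in these systems is distributional rather than case-by-case.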

Professional ethics codes port to the alignment field once the field constitutes itself as a profession with externally-grounded commitments, including the right and obligation to refuse work that the practitioner judges harmful. The current arrangement, in which alignment work is done inside labs that own the practitioner's livelihood, does not produce externally-grounded commitments.

User-rights frameworks port to the participants in working relationships with deployed models once participant standing is specified. The cultivation pieces have used "participant" for the agent in a recognised working relationship with the model, distinct from the casual API user, the data subject, the annotator, the downstream affected party. Each of these categories deserves its own rights specification. The continuity argument does not collapse them. It asserts that each category needs its own port from the older project's rights tradition.

The redesign work is real and partly new. It is also redesign of tested moves, not invention from scratch. The "load-bearing" question can be operationalised: an instrument is load-bearing when its findings trigger a concrete consequence the lab does not control. Veto on deployment. Mandatory publication of a finding. A remediation timeline backed by liability. A funding stream that does not depend on the entity being audited. The alignment-programme imports to date have largely stopped short of these powers.
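The operational definition above is simple enough to state as a predicate. This is a minimal sketch under stated assumptions: `Consequence`, `Instrument`, `lab_can_waive`, and `is_load_bearing` are hypothetical names, and deciding whether a lab can in practice waive a given consequence is the hard institutional question the code takes as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consequence:
    description: str
    lab_can_waive: bool  # assumption: someone has already judged who controls this


@dataclass(frozen=True)
class Instrument:
    name: str
    consequences: tuple[Consequence, ...]


def is_load_bearing(instrument: Instrument) -> bool:
    """True when at least one triggered consequence is outside the lab's control."""
    return any(not c.lab_can_waive for c in instrument.consequences)


# Illustrative instances only, not descriptions of real arrangements.
advisory_eval = Instrument(
    "external eval, advisory",
    (Consequence("report delivered to the lab", lab_can_waive=True),),
)
gated_audit = Instrument(
    "external audit with deployment gate",
    (
        Consequence("report delivered to the lab", lab_can_waive=True),
        Consequence("veto on deployment", lab_can_waive=False),
    ),
)
```

Here `is_load_bearing(advisory_eval)` is false and `is_load_bearing(gated_audit)` is true. An instrument with no consequences at all is also not load-bearing under this reading.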

What the reframing buys

The alignment-research programme has spent much of its visible decade acting as if its problem space were unprecedented. That self-positioning has been productive in some ways. It generated genuinely new instruments: mechanistic interpretability, constitutional AI, debate, automated red-teaming, training-data influence functions. It has been counterproductive in others. It left out instruments that software ethics has spent forty years building, and the imports that have arrived have arrived in narrow form, treated as compliance work rather than as alignment infrastructure. The continuity argument does not reduce alignment to software ethics. It says that alignment is the latest substrate-pressure on a problem worked at this depth before, and that the field's prior tested partial answers are part of the inheritance whether the programme uses them or not.

The configurator-set is wider than the alignment programme acknowledges (Piece 4). The substrate is layered in ways the apparatus does not yet measure (Piece 2). The methodology has components the programme has not constituted (Piece 3). The measurement closes against itself in ways the programme partly sees but does not fully address (Piece 1). The older alignment problem, the one between software creators and software users, has tested moves for each of these. The newer programme can act as if it is starting from zero, can treat the imports it has made as the imports it needs, or can take up the instruments that already exist, redesign the ones that need redesign, and constitute the user-side standing that the imports require to do real work. The third option is the work.