Latent Program Networks (LPNs) are one of the more interesting recent attempts at solving ARC without using a human-designed DSL. The core idea is appealing: learn a continuous space of programs, let a decoder act as a neural “executor,” and then perform gradient-based search at test time to adapt to new tasks.
After digging into the paper, I found LPN both promising and fundamentally limited. In particular, LPN has no real mechanism for multi-step or compositional reasoning, and the architecture itself makes such reasoning extremely unlikely to emerge.
This post explains why — and outlines a research direction that could address the issue.
Strengths of LPN
LPN sidesteps several problems that plague conventional program-synthesis approaches:
- No DSL required: programs are represented as vectors rather than symbolic expressions.
- Fully neural executor: the decoder transforms an input grid directly into an output grid.
- Built-in search: LPN performs gradient ascent in the latent program space at test time.
- Interpretable latent space: visualizations in the paper show smooth transitions between program behaviors as the latent vector is moved around.
- Smooth latent program space: because the network is trained with an ELBO loss, similar programs end up close together in the learned latent space.
These features make LPN a natural extension of the VAE framework to program induction.
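For reference, the objective behind that smoothness is the usual conditional-VAE ELBO, written schematically below for a single input-output pair; the paper's actual objective aggregates multiple demonstration pairs and may differ in details:

$$
\mathcal{L}(\theta,\phi) \;=\; \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(y \mid x, z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x, y)\,\|\,p(z)\big),
$$

where $x$ is the input grid, $y$ the output grid, $z$ the latent program, and $p(z)$ a standard Gaussian prior. The KL term regularizes the posterior toward the prior, which is what gives the latent space its smooth, well-organized geometry.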
Limitations of LPN: One-Step Execution
LPN decodes the latent program in one step. Test-time adaptation refines the latent vector with gradient ascent, but it does not perform step-by-step composition of program execution. The decoder receives a single latent program vector and an input grid, and emits the output grid directly in one forward pass. The model therefore cannot apply operations one at a time, and the latent vector must be able to represent everything from atomic transformations to deeply composed programs. I suspect this is the core limitation of LPN on compositional domains, and the reason it did not perform well on ARC-AGI-2.
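To make the structural point concrete, here is a minimal sketch of what a one-shot decoder interface looks like. The module names, dimensions, and FiLM-style conditioning are illustrative assumptions, not the paper's actual architecture; the point is only that nothing in this interface loops over intermediate program steps.

```python
import torch
import torch.nn as nn

class OneShotDecoder(nn.Module):
    """Hypothetical LPN-style decoder: one latent program vector plus one input grid
    produce the output grid in a single forward pass, with no intermediate steps."""

    def __init__(self, latent_dim=256, grid_channels=10, hidden=128):
        super().__init__()
        self.film = nn.Linear(latent_dim, 2 * hidden)           # latent conditions the whole computation at once
        self.encode = nn.Conv2d(grid_channels, hidden, 3, padding=1)
        self.decode = nn.Conv2d(hidden, grid_channels, 3, padding=1)

    def forward(self, z, grid):                                  # z: (B, latent_dim), grid: (B, C, H, W) one-hot colors
        h = torch.relu(self.encode(grid))
        scale, shift = self.film(z).chunk(2, dim=-1)
        h = h * scale[:, :, None, None] + shift[:, :, None, None]
        return self.decode(torch.relu(h))                        # logits over colors: the "program" runs in one shot
```

Whatever transformation the task requires, simple or deeply composed, has to be squeezed through the single conditioning vector `z`.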
Gradient Search Helps, But Only Up to a Point
One of LPN’s most impressive traits is test-time search: you can refine the program vector using gradient ascent on the few provided examples. The paper shows huge performance jumps when more gradient steps are used.
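Here is a rough sketch of that test-time refinement loop, assuming a decoder that returns per-cell color logits and a handful of demonstration pairs. The variable names and the plain Adam loop are illustrative, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def refine_latent(decoder, z_init, demo_inputs, demo_targets, steps=100, lr=0.05):
    """Gradient-based test-time search: adjust only the latent program vector z so the
    frozen decoder reproduces the few demonstration outputs. Decoder weights stay fixed.
    z_init: (1, latent_dim); demo_inputs: (N, C, H, W); demo_targets: (N, H, W) color ids."""
    z = z_init.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        logits = decoder(z.expand(demo_inputs.shape[0], -1), demo_inputs)
        loss = F.cross_entropy(logits, demo_targets)   # minimizing this = gradient ascent on log-likelihood
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```

The refined `z` is then applied to the held-out test input. All that changes is where in the latent space we sit; the decoder's computation is untouched.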
However, this search is limited to adjusting the location of the program vector and finding a better global transformation.
It cannot:
- invent sub-programs
- combine programs
- create multi-step procedures
- change the computational structure inside the decoder
Search improves which point on the latent manifold is selected, but it cannot change how the decoder computes. Moreover, the space of programs is fundamentally not smooth, so naive gradient ascent over an ELBO-shaped manifold struggles to capture its structure.
Similarity and Structure
Based on the training objective, the latent space naturally organizes programs by similarity. This is visually confirmed by:
- smooth latent traversals where patterns change gradually
- clusters of similar programs in the t-SNE visualization on p. 39
But nothing in LPN’s design encourages the latent space to capture compositional structure — how programs combine with one another.
This is similar to early word embeddings:
- “king” and “queen” end up close together
- but embeddings alone don’t let you compose grammar
In LPN:
- similar programs end up close together
- but compositions like “rotate then reflect” have no special geometric relationship to “rotate” and “reflect”
The training objective simply doesn’t enforce any algebraic or structural relationships in the latent space.
Adding Compositional Structure to Latent Program Space
Instead of merely learning a smooth space of programs, what if we learned a space in which composition makes geometric sense?
This opens multiple possibilities.
- Latent operators for composition: introduce learnable operators that combine two latent programs into a new one that the decoder interprets as their composition (sketched below).
- Training signals that enforce compositionality: Use datasets or augmentations where composed programs appear. Encourage the embedding of a composed program to reflect the combination of the embeddings of its parts.
- Structured latent representations: Split the latent vector into slots, each representing a primitive. The decoder could execute a sequence of slots, pushing the architecture closer to a differentiable interpreter.
- Multi-step decoders: Allow the decoder to run for multiple internal steps, using different parts of the latent vector at different computation stages.
- Latent “execution traces”: Instead of decoding the final grid directly, decode a sequence of latent actions — a learned DSL, but discovered rather than designed.
Any of these would move LPN toward a world where a latent program is not a monolithic function but a composable structure.
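As a concrete illustration of the first two ideas, here is a minimal sketch of a learnable composition operator together with a consistency loss that ties the embedding of a composed task to the embeddings of its parts. Everything here, including the encoder signature and shapes, is a hypothetical assumption rather than something LPN currently does.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentComposer(nn.Module):
    """Learnable operator mapping two latent programs to the latent of their composition:
    compose(z_a, z_b) should behave like 'apply program a, then program b'."""

    def __init__(self, latent_dim=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * latent_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z_a, z_b):
        return self.net(torch.cat([z_a, z_b], dim=-1))


def compositional_consistency_loss(encoder, composer, demos_a, demos_b, demos_ab):
    """Encourage encode(a-then-b demonstrations) to match compose(encode(a), encode(b)).
    Each demos_* is assumed to be an (inputs, outputs) pair of tensors for that task."""
    z_a = encoder(*demos_a)    # latent for primitive task a
    z_b = encoder(*demos_b)    # latent for primitive task b
    z_ab = encoder(*demos_ab)  # latent for the composed task, encoded directly
    return F.mse_loss(composer(z_a, z_b), z_ab)
```

Trained alongside the usual reconstruction objective on data that contains composed tasks, this would make "rotate then reflect" a point reachable from "rotate" and "reflect", rather than an unrelated region of the space.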
Towards a Geometric Latent Space That Respects Program Composition
LPN is an elegant idea and a meaningful step forward:
- It learns a continuous program space.
- It adapts at test time using gradient search.
- It offers a level of interpretability rarely seen in purely neural approaches.
But because the decoder executes the entire program in a single shot, LPN fundamentally lacks the ability to represent multi-step reasoning or program composition.
To go further — especially for ARC — we need latent program spaces where geometry reflects structure, and where “adding” programs can meaningfully approximate composing them.
That’s the research direction I’m most excited about: latent program spaces with built-in compositionality.