Generalizing
What if they were just numbers?
Utilizing our knowledge of a consistent number system, for the analogy $a : b = c : d$ we know the fact that

$$\frac{a}{b} = \frac{c}{d},$$

and to solve for $d$, we would compute $d = \frac{bc}{a}$.
In a more familiar expression, we know that

$$\frac{a}{b} = \frac{c}{d} \iff ad = bc,$$

and this fact is not trivial. Focusing on $ad = bc$:
this summarizes the implied ‘equality of relevance’ into a single equality, which does not directly compare each object but their cross products.
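As a quick sanity check, here is the numeric version of the cross-product trick (the concrete values 2, 6, 5 are just an illustration):

```python
# Analogy a : b = c : d with d unknown, e.g. 2 : 6 = 5 : d.
a, b, c = 2, 6, 5

# The cross products must be equal: a*d == b*c,
# so the multiplicative inverse of a gives d directly.
d = b * c / a

assert a * d == b * c   # 2 * 15 == 6 * 5
print(d)                # 15.0
```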

Can we generalize this to ARC tasks?

We have neither a ‘number system of ARC grids’ nor an operation defined on such a number system. Let’s generalize such an operator.

We can train an operator $\otimes$ such that for all pairs of input-output pairs $(x_i, y_i)$ and $(x_j, y_j)$,

$$x_i \otimes y_j = y_i \otimes x_j.$$

Such a trained operator will form a small consistent system of ARC grid pairs where this ‘analogical invariance’ holds.
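A hypothetical sketch of what training such an operator could look like, with heavy simplifying assumptions: grids are already encoded as fixed-size vectors, and the operator is a tiny MLP over their concatenation. The architecture and sizes here are illustrative guesses, not the source's design; the essential part is the invariance loss over all pairs of examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: the operator u ⊗ v is a small MLP
# applied to the concatenation of the two encodings.
W1 = 0.1 * rng.normal(size=(16, 8))
W2 = 0.1 * rng.normal(size=(4, 16))

def op(u, v):
    return W2 @ np.tanh(W1 @ np.concatenate([u, v]))

def invariance_loss(pairs):
    """Sum of ||x_i ⊗ y_j - y_i ⊗ x_j||^2 over all example pairs (i, j)."""
    loss = 0.0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            (xi, yi), (xj, yj) = pairs[i], pairs[j]
            diff = op(xi, yj) - op(yi, xj)
            loss += float(diff @ diff)
    return loss

# Toy "task": three input-output pairs of 4-dim encodings.
pairs = [(rng.normal(size=4), rng.normal(size=4)) for _ in range(3)]
print(invariance_loss(pairs))  # the quantity a trainer would minimize
```

Minimizing this loss over the operator's parameters is what would make the analogical invariance hold on the training pairs.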
Then, how do we use it to infer the output?
In number systems, the reason we were able to solve for $d$ was that we had the multiplicative inverse: $d = bc \cdot a^{-1}$.
But with our learned operator, we only have the ‘multiplication’ defined, and we cannot directly find the multiplicative inverse.
If we did not know the concept of division, we would have solved for $d$ by trying out values of $d$ which satisfy the equation $ad = bc$. That is basically searching the space of $d$ to find a solution that satisfies the expression. We can do something similar.
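Continuing the numeric analogy: even without division, the unknown can be recovered by search, for instance gradient descent on the squared residual of the cross-product equation (the values are again just an illustration):

```python
# Solve a*d = b*c for d without ever dividing:
# descend the squared residual (a*d - b*c)^2.
a, b, c = 2.0, 6.0, 5.0
d = 0.0          # initial guess
lr = 0.01
for _ in range(2000):
    residual = a * d - b * c
    d -= lr * (2 * a * residual)   # gradient of the squared residual

print(d)  # converges to 15, i.e. b*c/a, found by search alone
```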

To solve for the unknown test output $y$, we can first calculate $y_i \otimes x$ with the learned operator, where $x$ is the test input and $(x_i, y_i)$ is a training input-output pair.

We can search the space of $y$ to find one such that the analogical invariance $x_i \otimes y = y_i \otimes x$, whose right-hand side we have already calculated from the training pair, is satisfied.

The search algorithm here is gradient descent, and this process is the inference at test time.
To make this search via gradient descent feasible at test time, $y$ will be held close to the standard normal distribution by regularizing its divergence from that distribution, as in a VAE.
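A minimal sketch of this test-time search, under several simplifying assumptions: the ‘operator’ is a toy elementwise product on 4-dim encodings, the VAE-style divergence penalty is reduced to a squared-norm pull toward the standard normal prior, and all names and values are illustrative rather than the actual method.

```python
# Known training pair (x_i, y_i) and test input x, as toy 4-dim encodings.
x_i    = [1.0, -2.0,  0.5,  3.0]
y_i    = [2.0,  1.0, -1.0,  0.5]
x_test = [0.5,  1.5,  2.0, -1.0]

def op(u, v):
    # Toy stand-in for the learned operator: elementwise product.
    return [a * b for a, b in zip(u, v)]

target = op(y_i, x_test)      # the fully known side, y_i ⊗ x
y = [0.0, 0.0, 0.0, 0.0]      # candidate output, initialized at the prior mean
beta, lr = 1e-3, 0.05

for _ in range(5000):
    residual = [a - b for a, b in zip(op(x_i, y), target)]
    # Gradient of ||x_i ⊗ y - target||^2 + beta * ||y||^2 w.r.t. y.
    grad = [2 * xi * r + 2 * beta * yk
            for xi, r, yk in zip(x_i, residual, y)]
    y = [yk - lr * g for yk, g in zip(y, grad)]

# y now approximately satisfies the invariance x_i ⊗ y = y_i ⊗ x.
```

The `beta` term plays the role of the prior regularizer: it keeps the searched $y$ from drifting arbitrarily far while the residual term enforces the invariance.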
From $n$ training example pairs, we can train the neural operator on $n(n-1)/2$ pairs of examples. Furthermore, the learned neural operator only has to capture the analogical invariance, instead of learning to represent the input, the output, and the implied program all entangled together.
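For concreteness: $n$ training examples yield $n(n-1)/2$ unordered pairs, each supplying one invariance constraint, so the training signal grows quadratically in the number of examples. A small illustration:

```python
from itertools import combinations

n = 4                                    # number of training examples
examples = list(range(n))
constraint_pairs = list(combinations(examples, 2))

# Each unordered pair (i, j) yields one invariance constraint
# x_i ⊗ y_j = y_i ⊗ x_j, so n examples give n*(n-1)/2 constraints.
assert len(constraint_pairs) == n * (n - 1) // 2   # 6 for n = 4
```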