The Intelligence Yardstick — Lab

Insightful AI World · Lab #3

The Intelligence Yardstick

Pick a definition of intelligence. Watch eight different agents — human, language model, calculator, octopus, chess engine, infant, ant colony, AlphaFold — re-rank against each other. The same agents look brilliant or trivial depending on which yardstick you apply. That is the entire point.

Before you start

Read this short panel first. It tells you what the lab is, what it is trying to make you see, and how you will know if you got there.

🎯 Purpose

This lab is an interactive comparison of five real, serious definitions of intelligence against eight different agents (human, frontier LLM, AlphaFold, Stockfish, octopus, calculator, one-year-old infant, ant colony). Each definition produces its own ranking. The lab is the ranking.

💡 What it is trying to make you see

That "intelligence" is not a single thing. It depends on the yardstick you measure it with. Most arguments about whether AI is "really" intelligent are, underneath, disagreements about which yardstick the speakers are silently using. The companion article explains this in words; this lab lets you feel it by switching the yardstick yourself and watching the ranking flip.

✅ What you should understand after playing

After a minute of clicking, you should be able to:

  • Name at least three different definitions of intelligence in current use, and say in one sentence what each one measures.
  • Predict, roughly, how a given agent (e.g. a calculator, an infant) will rank under each definition — and explain why.
  • Catch yourself wanting to ask "smarter by which yardstick?" before answering any "is AI smarter than X yet?" question.

If those three are true for you when you leave, the lab did its job. If they are not, re-read the worked example below and try one more comparison.

How to use it — 30 seconds

  1. Pick a yardstick. Click any of the five definition buttons further down. The buttons are:
    • Psychometric — "intelligent = scores well on cognitive tests."
    • Biological — "intelligent = senses, processes, and acts toward goals in a real environment."
    • Generalisation — "intelligent = learns new tasks from very little data (Chollet)."
    • Turing — "intelligent = can converse indistinguishably from a human."
    • Multiple — "intelligent = has several distinct cognitive abilities (Gardner)."
  2. Look at the eight agents. The same agents appear every time, sorted by score under the current yardstick.
  3. Switch yardsticks. Click a different button. Watch the ranking re-order.
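
The three steps above boil down to sorting one fixed set of agents by a different score column each time. Here is a minimal sketch of that idea in Python, not the lab's actual implementation. The names `SCORES` and `rank` are hypothetical, only four of the eight agents and two of the five yardsticks are shown, and every number is a made-up placeholder except the three values quoted on this page (infant 6 on Psychometric, infant 92 on Generalisation, calculator 0 on Generalisation).

```python
# Hypothetical placeholder scores, NOT the lab's real table.
# Only infant/Psychometric = 6, infant/Generalisation = 92 and
# calculator/Generalisation = 0 are quoted on the page; the rest are invented
# purely so the example can run.
SCORES = {
    "Psychometric": {"human": 80, "frontier LLM": 85, "infant": 6, "calculator": 10},
    "Generalisation": {"human": 90, "frontier LLM": 30, "infant": 92, "calculator": 0},
}


def rank(yardstick: str) -> list[str]:
    """Return agent names sorted from highest to lowest score under one yardstick."""
    scores = SCORES[yardstick]
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # "Switching yardsticks" is just calling rank() with a different key.
    for yardstick in SCORES:
        print(yardstick, "->", rank(yardstick))
```

The agents never change between calls; only the key used for sorting does, which is all that a click on a different button would need to do under these assumptions.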

A worked example — try these two yardsticks

Click Psychometric. The one-year-old infant lands near the bottom (score 6) — they cannot take an IQ test.

Now click Generalisation. The same infant jumps near the top (score 92) — they learn new things from very few examples better than almost anything else on the list.

Same infant. It did not get smarter or dumber. The ruler changed, and the ranking flipped.

A second comparison to try: the frontier LLM is near the top on Psychometric, but drops sharply on Generalisation. The calculator scores 0 on Generalisation but is not 0 on Biological. Each of these reorderings is real, and each is defensible under its own definition.
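
The comparisons above can be expressed as a rank shift: an agent's position under one yardstick minus its position under another. The sketch below is one way to compute that, assuming a hypothetical four-agent subset. The helper names `positions` and `rank_shift` are illustrative, and all scores are invented placeholders apart from the infant's 6 and 92 and the calculator's 0 on Generalisation, which come from the text.

```python
# Placeholder scores for illustration only. The infant's 6 and 92 and the
# calculator's 0 on Generalisation are quoted in the text; all other values
# are assumptions chosen to match the described direction of movement.
psychometric = {"human": 80, "frontier LLM": 85, "infant": 6, "calculator": 10}
generalisation = {"human": 90, "frontier LLM": 30, "infant": 92, "calculator": 0}


def positions(scores: dict[str, int]) -> dict[str, int]:
    """Map each agent to its 1-based position (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {agent: index + 1 for index, agent in enumerate(ordered)}


def rank_shift(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Positive value = the agent moved up that many places; negative = moved down."""
    pos_before = positions(before)
    pos_after = positions(after)
    return {agent: pos_before[agent] - pos_after[agent] for agent in before}


if __name__ == "__main__":
    for agent, shift in rank_shift(psychometric, generalisation).items():
        print(f"{agent}: {shift:+d}")
```

With these placeholder numbers the infant climbs from last to first while the frontier LLM falls, which mirrors the direction of the flips described above without claiming to reproduce the lab's real scores.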

Pick a definition

Each one is in active use somewhere in the literature. None is a consensus. Try them in any order.

Ranking

Scores are illustrative — calibrated to capture the directional consensus in the literature for each definition, not precise numbers. The point is the re-ordering across definitions, not the absolute values.
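
One way to see why the absolute values matter less than the order: within a single definition, any rescaling that preserves order leaves the ranking untouched. The sketch below checks this on a hypothetical set of Generalisation scores. Only the infant's 92 and the calculator's 0 come from this page; the other values and the helper names `ranking` and `rescale` are assumptions for illustration.

```python
import math

# Placeholder Generalisation scores. Only infant = 92 and calculator = 0 are
# quoted on the page; the human and LLM values are invented for this sketch.
generalisation = {"human": 90, "frontier LLM": 30, "infant": 92, "calculator": 0}


def ranking(scores: dict[str, float]) -> list[str]:
    """Agents ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def rescale(scores: dict[str, float], fn) -> dict[str, float]:
    """Apply fn to every score, keeping the agent names."""
    return {agent: fn(value) for agent, value in scores.items()}


original = ranking(generalisation)
# Two strictly increasing transforms: halving, and a square root that
# compresses the top of the scale.
halved = ranking(rescale(generalisation, lambda s: s / 2))
squashed = ranking(rescale(generalisation, math.sqrt))

if __name__ == "__main__":
    print(original == halved == squashed)
```

The individual numbers change under each transform, but the order of the agents does not, so a reader can distrust the exact scores and still take the re-ordering across definitions seriously.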

What you're actually looking at

Five definitions, eight agents, forty scores. Each definition was designed to capture something real about intelligence — and each, applied honestly, ranks the agents differently.

Watch the infant closely. Under Chollet's generalisation-efficiency yardstick, almost nothing in the world is more intelligent than a one-year-old human: they acquire skills from a handful of examples, with a sample efficiency no AI system can match. Under the psychometric yardstick they barely register, because they cannot complete a vocabulary subtest. Both rankings are correct under their own definition. They are answering different questions.

Watch the calculator too. Under the biological frame (process information, integrate, act) a calculator is the simplest possible case of intelligence — and not zero. Under the generalisation frame it is identically zero. The disagreement is not factual; it is definitional.

This is why arguments about whether AI is "really" intelligent so rarely converge. The participants are using yardsticks they have not named, getting different rankings, and assuming the other person is wrong about facts when the actual disagreement is about the ruler.

Lab #3. Companion to What is intelligence? The definition AI inherits. Feedback welcome.