
AI News
01 Oct 2025
17 min read
ChatGPT mathematical reasoning study reveals how AI learns
The ChatGPT mathematical reasoning study shows teachers how well-crafted prompts can guide AI through geometry problems.
The ancient puzzle in simple words
The classic challenge goes like this. You have a square with side length s. Its area is s × s, or s^2. To double the area, you want 2 × s^2. The square with that area has a side of s × √2. The diagonal of the original square is also s × √2. So if you use the old diagonal as the side of a new square, you get exactly twice the area. It is a neat trick. It shows how geometry can turn a hard idea into a simple move. Socrates used this to show how a learner might find truth step by step. The student starts wrong, but with guidance, they reach the answer. That slow climb is part of learning. Today, it also helps us see how AI behaves.
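The arithmetic is easy to check in a few lines. The snippet below is a minimal sketch of our own (not code from the study) that plugs in a sample side length:

```python
import math

s = 3.0                      # side of the original square (arbitrary example)
area = s * s                 # original area: s^2
diagonal = s * math.sqrt(2)  # diagonal of a square with side s

new_area = diagonal ** 2     # area of the square built on the diagonal
print(math.isclose(new_area, 2 * area))  # True: the new square has double the area
```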
What the ChatGPT mathematical reasoning study tested
The researchers did not stop at Plato’s square. They wanted to test generative behavior. They looked for signs that ChatGPT could build on a prior idea, try something new, and even make a wrong turn as it explored.
Why pick Plato’s square?
The diagonal trick is not obvious in text. It is visual. Also, many people remember a related story from classes or books. But the exact reasoning steps may not be common in training data. If the model still reached the idea, it might be combining patterns in a fresh way.
Why a rectangle next?
After the square, the team asked a follow-up: can you double the area of a rectangle using similar reasoning? Here, the main idea is simple. To double the area, you can scale both sides by √2. If a rectangle has sides a and b, you can make a new rectangle with sides a × √2 and b × √2. The area doubles, and the shape stays the same. Classical geometry can construct √2 with a straightedge and compass, so a geometric path exists. The team posed this as a second test: could the model extend the first idea and find a valid way to double a rectangle?
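A quick numeric sketch (again our own illustration, with arbitrary sides) confirms both halves of that claim: the area doubles and the shape is unchanged:

```python
import math

a, b = 3.0, 5.0   # sides of the original rectangle (example values)
k = math.sqrt(2)  # the scale factor, constructible with straightedge and compass

a2, b2 = a * k, b * k                    # new sides: a*sqrt(2) and b*sqrt(2)
print(math.isclose(a2 * b2, 2 * a * b))  # True: the area doubles
print(math.isclose(a2 / b2, a / b))      # True: the aspect ratio is preserved
```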
What ChatGPT did — and why it surprised scientists
The model did not just recite facts. It tried to reason in context. It pulled ideas from the square problem and used them in the rectangle case. But it also made a striking mistake.
The square: finding the diagonal trick
In the square case, the path is well known. Many math students learn that the new side must equal the old diagonal. The model could describe or hint at this move because it often appears in math texts. That part was not shocking.
The rectangle: a wrong claim
The surprise came with the rectangle. The chatbot said there was no geometric solution based on the diagonal. That is false. A geometric construction that doubles a rectangle exists. You do not need the rectangle’s diagonal to be the new side; you need to scale both sides by √2, which you can construct. The researchers called the claim “vanishingly unlikely” to exist in the training set. In other words, the model seemed to make up a response by blending the prior square discussion with the new rectangle prompt.
To the team, this looked “learner-like.” The bot drew on a past idea, tested it, and reached a wrong but reasoned answer. That is what many students do when they face a new problem: they try a nearby method first, then they adjust. This behavior is what made the result interesting.
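A few lines of arithmetic show why the naive transfer fails while the √2 scaling works. The square built on a rectangle’s diagonal has area a² + b², and that equals the doubled area 2ab only when a = b, because a² + b² − 2ab = (a − b)². In this sketch (our illustration, with example sides):

```python
import math

a, b = 3.0, 5.0              # a non-square rectangle
diagonal = math.hypot(a, b)  # length sqrt(a^2 + b^2)

# Naive transfer of the square's trick: build a square on the diagonal.
print(diagonal ** 2, 2 * a * b)  # ~34 vs 30: not double unless a == b

# The valid construction: scale both sides by sqrt(2).
print(math.isclose((a * math.sqrt(2)) * (b * math.sqrt(2)), 2 * a * b))  # True
```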
Does this mean AI can “think”?
The study does not say ChatGPT thinks like a person. It does not claim the model understands geometry. It shows something different but useful: the model can imitate a learning process when prompts nudge it in that direction.
Zone of proximal development, simplified
Educators use the phrase “zone of proximal development” to describe the sweet spot for learning: the gap between what a student can do alone and what they can do with a hint or a little help. The team suggests the model may act as if it is in that zone. With the right prompt, it can move from known patterns to a new but related task. It can try a hypothesis, make an error, and then refine its approach when pushed.
For teachers, this is promising. It means a clear prompt can guide the model to show steps, not just final answers. It also means you can design prompts that make thinking visible, both for the AI and for the student who reads it.
The black box risk
AI still has a black box problem. We do not see the inner steps that lead to a claim; we see only the words that come out. That means we must check proofs and methods. A confident tone does not equal a correct result. The rectangle mistake is a warning: plausible language can mask a wrong move.
The authors point out a key skill for modern math class: students should learn to read, test, and fix AI-generated proofs, not accept them as final. That habit matches good math practice anyway, and it builds digital literacy.
How teachers and students can use these lessons now
Good prompts and good checks help you get the best from AI. They also help you avoid slick but wrong answers. Here are practical steps you can use today.
Prompt for process, not just results
If you ask for an answer, the model may jump to the end. If you ask for steps, it may reveal its path. When you want the model to explore, be explicit; a short API sketch follows the list below.
- Say: “Walk through your reasoning. Show each step and why you chose it.”
If you ask for an answer, the model may jump to the end. If you ask for steps, it may reveal its path. When you want the model to explore, be explicit.- Say: “Walk through your reasoning. Show each step and why you chose it.”
- Say: “List two possible approaches. Compare pros and cons before doing one.”
- Say: “If you make an assumption, name it and explain how you will test it.”
- Avoid: “Just give me the final formula.”
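If you work through the API instead of the chat window, the same advice can go straight into the prompt. The sketch below is hypothetical: it assumes the openai Python SDK and a model name such as gpt-4o, neither of which comes from the study.

```python
from openai import OpenAI  # assumes the openai Python SDK (>= 1.0) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Process-first prompting: ask for steps, alternatives, and named assumptions
# before any final answer. The model name is an assumption; use what you have.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": ("Walk through your reasoning and show each step. "
                     "List two possible approaches and compare them before "
                     "choosing one. Name every assumption and explain how "
                     "you would test it.")},
        {"role": "user",
         "content": "Double the area of a rectangle with sides a and b "
                    "using straightedge-and-compass ideas."},
    ],
)
print(response.choices[0].message.content)
```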
Build a habit of verification
Treat AI work like a draft from a bright peer, not gospel truth. Check it; a small spot-check script follows the list.
- Plug in numbers to test any formula the model gives.
- Draw a quick sketch to see if a geometric claim makes sense.
- Ask for a second method and compare results.
- Search a trusted textbook or source to confirm a theorem.
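One way to make “plug in numbers” routine is a tiny spot-check script. This is a minimal sketch of our own; claimed_new_sides is a hypothetical stand-in for whatever formula the model proposed:

```python
import math
import random

def claimed_new_sides(a: float, b: float) -> tuple[float, float]:
    """Hypothetical stand-in for an AI-proposed formula: scale both sides by sqrt(2)."""
    return a * math.sqrt(2), b * math.sqrt(2)

# Spot-check the claim "the new rectangle has exactly double the area"
# at random sizes; a single counterexample stops the check immediately.
for _ in range(5):
    a, b = random.uniform(1, 10), random.uniform(1, 10)
    a2, b2 = claimed_new_sides(a, b)
    assert math.isclose(a2 * b2, 2 * a * b), f"claim fails at a={a}, b={b}"
print("claim held at all sampled values")
```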
Use AI to lift, not replace, student thinking
You can use ChatGPT to spark ideas, create practice tasks, or outline a proof. But keep the student in charge of sense-making.
- Have students critique an AI proof and mark its weak steps.
- Ask students to rewrite a proof in their own words, with a new example.
- Let students create counterexamples when the model overgeneralizes.
Leverage visual tools when needed
The study hints at a path forward: connect language models with geometry tools and theorem provers. In class, you can do a simple version of this; a symbolic check appears after the list.
- Pair ChatGPT with dynamic geometry software to test constructions.
- Use a CAS or a graphing tool to verify algebraic claims.
- Cross-check a step with a formal proof assistant if available.
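For the CAS bullet, a symbolic check goes one step beyond sampled numbers: it verifies a claim for all values at once. A minimal sketch, assuming SymPy as the CAS:

```python
import sympy as sp  # assumes SymPy is installed

a, b = sp.symbols("a b", positive=True)

# Symbolic check: scaling both sides by sqrt(2) doubles a rectangle's area.
new_area = (a * sp.sqrt(2)) * (b * sp.sqrt(2))
print(sp.simplify(new_area - 2 * a * b))  # prints 0: holds for all a, b > 0
```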
Limits, open questions, and next steps
This was one study. It looked at a classic problem and a single extension. We should not overread the results. Still, they point to valuable directions.
Test newer models and more problems
Language models change fast. New versions often get better at step-by-step work. The team suggests testing newer models across a wider set of geometry and algebra tasks. That includes problems where visual reasoning is key and tasks where proof structure matters most.
Combine tools to broaden “reasoning”
A strong next step is to link an LLM with:
- Dynamic geometry systems to try constructions on the fly,
- Theorem provers to check logic, and
- Symbolic math engines to simplify expressions.
Develop better measures of reasoning quality
We need better ways to score reasoning, not just final answers. Useful measures might include the following; a toy rubric sketch follows the list.
- Clarity of steps and justification,
- Ability to fix errors after feedback,
- Use of definitions and theorems at the right time, and
- Consistency across multiple approaches.
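To make these measures concrete, here is a hypothetical rubric sketch, entirely our illustration; the study does not define a scoring scheme. Each dimension gets a 0-3 score so that the reasoning, not just the final answer, is graded:

```python
from dataclasses import dataclass

@dataclass
class ReasoningScore:
    """Hypothetical rubric: each dimension scored 0-3 by a human reviewer."""
    step_clarity: int    # are steps stated and justified?
    error_recovery: int  # are errors fixed after feedback?
    theorem_use: int     # are definitions and theorems used at the right time?
    consistency: int     # do multiple approaches agree?

    def total(self) -> int:
        return (self.step_clarity + self.error_recovery
                + self.theorem_use + self.consistency)

print(ReasoningScore(3, 2, 3, 1).total())  # 9 of a possible 12
```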
Teach the skill of reading AI proofs
The authors stress this point. Students should learn how to test AI outputs. That means scaffolds in the curriculum:
- Short “trust but verify” checklists,
- Peer-review cycles that include an AI draft, and
- Reflections on how a proof was improved after checks.
What this means for AI and learning
The Cambridge-Hebrew University team saw a pattern many teachers know well. A learner meets a new task. They transfer a trick from a familiar case. It partly fits, partly fails. With a nudge, they adjust. The study suggests a language model can mimic that arc when the prompt invites it to do so.
This does not prove that the model “understands” geometry. It does show how a well-aimed prompt can make its inner search visible. It also shows why we should slow down and inspect steps. A slick paragraph can hide a false claim, like the “no geometric solution” statement about doubling a rectangle.
The right response to that risk is not fear. It is method: prompt for process, verify with tools, and teach students to judge arguments. This approach respects both math and AI. It uses the model’s fluency to generate plans, examples, and alternative paths. It uses human sense-making and external tools to secure truth. In that partnership, classrooms can move from answer-hunting to reasoning-building.
The story that began with Socrates and a square now helps us shape how we learn with machines. The diagonal trick still matters. So do clear prompts, honest error-checking, and steady practice. Put those together, and you turn a clever chatbot into a better study partner for real thinking.
In short, the ChatGPT mathematical reasoning study is not proof of machine understanding. It is a guide for better human-AI learning: ask for steps, test the claims, and use the right tools to make reasoning robust.