Shoothill AI Signal

Live · 13 models tracked

The early-warning system for the AI tools your team uses.

The AI tools your team uses change without warning. The one you trusted last month might be making things up or ignoring instructions today, and you'll only find out when a customer or your accounts team does. Shoothill AI Signal watches for you and flags it the moment something slips. Free, forever.

Talk to our team
Regular updates · Sample methodology published
Claude Opus 4.7 · 97.0 · 2.1
GPT-5.4 · 94.0 · 1.4
Gemini 2.5 Pro · 93.3 · 0.8
GPT-5.4 mini · 90.5 · 0.4
Gemini 2.5 Flash · 87.8 · 0.3
Claude Haiku 4.5 · 87.5 · 1.2
Claude Sonnet 4.5 · 81.5 · 3.8
https://signal.shoothill.ai
Shoothill AI Signal
84.5 / 100 · Strong
Up 1.3 points in the last 30 days.
7 days ago: 82.1 · Today: 84.5
Watched models · 54 - 96
  • Claude Opus 4.7: 92.9
  • Gemini 2.5 Flash: 88.5
  • GPT-5.4 mini: 74.6
  • Gemini 2.5 Pro: 57.6
Live leaderboard · last run
Claude Opus 4.7 · 97.0 · ▲ 2.1
GPT-5.4 · 94.0 · ▲ 1.4
Gemini 2.5 Pro · 93.3 · ▲ 0.8
GPT-5.4 mini · 90.5 · ▼ 0.4
Gemini 2.5 Flash · 87.8 · ▲ 0.3
Example data. Shoothill AI Signal runs continuously throughout the day; see the live site for today's benchmarks.
01 · The problem

The AI you bought isn't the AI you're using today.

OpenAI, Microsoft, Google and Anthropic push updates without telling you. A model that worked fine last week can start hallucinating, ignoring instructions, or quietly getting worse. Most businesses only find out when something goes wrong in a customer email, a quote, or a report. Shoothill AI Signal is the early-warning system.

01
Truthfulness

Spot answers it made up

We measure how often each model invents a fact, especially in medical, legal, and finance questions. So you know the actual rate, not just the vibe.

02
Reasoning

Watch it on hard problems

Multi-step maths, logic, and planning: the kind of thinking your team actually relies on it for. We update the test set as the bar moves.

03
Discipline

See when it stops following instructions

Catches the silent slips: ignoring formatting rules, breaking persona, drifting off-brief. The kind of slip that quietly breaks the AI tools your team relies on every day.

04
Stability

Catch a model getting worse

Compares each new score to the model's recent history. Email lands the moment something shifts past a threshold you set.
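
As a rough illustration of the idea, here is a minimal Python sketch of a stability check. It assumes the "recent history" is a simple average of the last few runs and that the threshold is a number of points; the function name `stability_check`, the 7-run window, and the 3-point default are all hypothetical, not Shoothill's published method.

```python
from statistics import mean


def stability_check(history, new_score, threshold=3.0, window=7):
    """Compare new_score with the average of the last `window` scores.

    Assumption: the baseline is a plain mean and the threshold is in points.
    Returns the baseline, the change against it, and whether to alert.
    """
    recent = history[-window:]
    if not recent:
        # No history yet, so there is nothing to compare against.
        return {"baseline": None, "delta": 0.0, "alert": False}
    baseline = mean(recent)
    delta = new_score - baseline
    # A shift in either direction past the threshold counts as a change.
    return {"baseline": baseline, "delta": delta, "alert": abs(delta) > threshold}


if __name__ == "__main__":
    # Illustrative numbers only.
    print(stability_check([90.0, 91.0, 89.0, 90.0], 85.0))
```

A rolling average keeps one noisy run from setting the baseline; a median or a longer window would be equally reasonable choices.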

05
Readiness

Test it on real work

Real jobs your team actually does: drafting emails, pulling data out of invoices, sorting documents, summarising meetings. Demos always look easy. We test the messy stuff that breaks in the real world.

06
Governance

Show your working

Every score is timestamped and exportable. Pass risk reviews and audits with a paper trail, not just an opinion.

02 · How it works

How we do it.

No black-box scoring. Sample tests and the full grading methodology are published; the rest of the test set is kept private so model providers can't train against the exact prompts. Same questions every run, so scores stay comparable as the world moves on.

01 · TEST

We test the models

On a regular schedule, we put each tracked model through the same fixed library of test cases: bespoke business scenarios, with the same questions every run. The library stays private so providers can't train against the exact prompts.

02 · GRADE

We grade the answers

Each answer is checked against the right answer, by rules that don't change between runs. So today's scores and last week's are directly comparable.
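
To make "rules that don't change between runs" concrete, here is a minimal Python sketch of a fixed grading rule. It assumes the simplest possible rule, an exact match after tidying whitespace and case; the real grading methodology may well use different checks, and the names `normalise`, `grade_exact`, and `grade_run` are hypothetical.

```python
import re


def normalise(text):
    """Lower-case, trim, and collapse whitespace so trivial formatting
    differences do not change the grade."""
    return re.sub(r"\s+", " ", text.strip().lower())


def grade_exact(answer, expected):
    """Assumed rule: an answer passes only if it matches the reference exactly
    after normalisation."""
    return normalise(answer) == normalise(expected)


def grade_run(cases, answers):
    """Grade one run.

    cases: list of dicts with 'id' and 'expected' (the fixed test library).
    answers: dict mapping case id to the model's answer for this run.
    Returns the percentage of cases passed, to one decimal place.
    """
    if not cases:
        raise ValueError("the test library is empty")
    passed = sum(grade_exact(answers.get(c["id"], ""), c["expected"]) for c in cases)
    return round(100.0 * passed / len(cases), 1)


if __name__ == "__main__":
    # Illustrative cases only, not real test prompts.
    cases = [{"id": "q1", "expected": "42"}, {"id": "q2", "expected": "Paris"}]
    print(grade_run(cases, {"q1": " 42 ", "q2": "London"}))
```

Because the rule is a pure function of the answer and the reference, the same answer always earns the same grade, which is what makes two runs comparable.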

03 · COMBINE

We roll it up

Five categories combine into one Signal Score per model: truthfulness, reasoning, instruction adherence, stability, and business readiness.
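
One simple way to picture the roll-up is a weighted average of the five category scores. The Python sketch below assumes equal weights and a 0 to 100 scale per category; the actual weighting is not stated here, so treat `signal_score` and its defaults as hypothetical.

```python
CATEGORIES = (
    "truthfulness",
    "reasoning",
    "instruction_adherence",
    "stability",
    "business_readiness",
)


def signal_score(category_scores, weights=None):
    """Combine five category scores (each 0-100) into one score.

    Assumption: a weighted average, with equal weights unless others are given.
    """
    if weights is None:
        weights = {c: 1.0 for c in CATEGORIES}
    missing = [c for c in CATEGORIES if c not in category_scores]
    if missing:
        raise ValueError(f"missing category scores: {missing}")
    total_weight = sum(weights[c] for c in CATEGORIES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(category_scores[c] * weights[c] for c in CATEGORIES)
    return round(weighted / total_weight, 1)


if __name__ == "__main__":
    # Illustrative numbers only.
    example = {
        "truthfulness": 90.0,
        "reasoning": 80.0,
        "instruction_adherence": 70.0,
        "stability": 60.0,
        "business_readiness": 100.0,
    }
    print(signal_score(example))
```

Passing a `weights` dictionary lets one category count for more, for example doubling truthfulness for a regulated business.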

04 · ALERT

You get the news

Set the limits you care about. We email you the moment a model you watch crosses one.
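
A watchlist alert of this kind can be sketched as a check for the run in which a score crosses a limit, so that you hear about it once rather than on every run spent below the line. The Python below is a minimal sketch under that assumption; `Limit`, `crossings`, and the message wording are hypothetical, and sending the email itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    model: str    # name of a watched model
    floor: float  # alert when the score falls below this value


def crossings(limits, previous, current):
    """Return one message per model whose score was at or above its floor
    in the previous run and is below it in the current run."""
    alerts = []
    for limit in limits:
        before = previous.get(limit.model)
        after = current.get(limit.model)
        if before is None or after is None:
            # Without both runs there is no crossing to detect.
            continue
        if before >= limit.floor > after:
            alerts.append(
                f"{limit.model} fell below {limit.floor}: {before} -> {after}"
            )
    return alerts


if __name__ == "__main__":
    # Illustrative watchlist and scores only.
    limits = [Limit("model-a", 85.0), Limit("model-b", 80.0)]
    previous = {"model-a": 88.0, "model-b": 79.0}
    current = {"model-a": 84.0, "model-b": 78.0}
    for message in crossings(limits, previous, current):
        print(message)
```

In this example only model-a triggers a message: model-b was already below its floor in the previous run, so it would have been flagged when it first crossed.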

03 · Who this is for

Built for the people who answer the question "is the AI working?"

Shoothill AI Signal isn't built for AI researchers. It's for the people who actually have to answer for it: the IT manager, the finance director, the business owner. The ones who'd rather spot a problem before a customer, an auditor, or HMRC does.

01
Compliance / risk

Prove you checked.

You picked a model for client-facing work. Six months later, your auditor asks how you know it still meets policy. AI Signal gives you a dated, exportable record of every score since the day you started watching.

02
IT / engineering

Catch slips before customers do.

Your team has GPT-5.5 in a live feature. The provider quietly updates the model and it starts ignoring your formatting rules. You see it on your dashboard the next morning, not in a customer support ticket.

03
Finance / FD

Know what your AI spend is actually buying.

AI Signal shows you whether the tools you're paying for, like Copilot, ChatGPT Enterprise and Gemini for Business, are getting better, worse, or standing still.

04
Operations / CX

Spot when the bot starts making things up.

A drafted reply that's 95% right and 5% invented is the worst kind of mistake. AI Signal tracks hallucination rate per model so you know when to retrain or switch.

05
Senior leadership

Walk into the board meeting with answers.

The C-suite asks if the firm's AI is working. AI Signal lets you answer with months of independent, dated evidence instead of a vendor's marketing slide.

06
Owner / MD

Bring AI into the business with your eyes open.

Walk in with an independent, no-vendor view of which models have actually performed week after week, not whichever one your supplier wants to sell you.

04 · About Shoothill

A full-service digital technology provider.

Shoothill has helped UK businesses get more out of their technology since 2006. Over 400 projects across bespoke software, IT support, cybersecurity, and creative. We built Shoothill AI Signal because our clients kept asking the same question: "is this AI thing actually working?" Now anyone can check. Free.

Consult

Plan the tech that fits.

Copilot, modern workplace, digital transformation. Invest in the right places first.

Create

Websites, design, marketing.

Sharp creative, smart SEO, print and digital campaigns that actually move the needle.

Develop

Bespoke software, built right.

Custom web apps, mobile apps, and AI tailored to your team's real problems.

Support

Keep it running.

Managed IT, cybersecurity, connectivity. The hard part of keeping things live, handled.

Thinking about AI in your business? Let's talk →

Stop guessing which AI is good this week.

Free, forever. Pick the AI tools your team uses, tell us what you want flagged, and get on with your day. We'll email you the moment something changes.

Talk to our team
Fresh Signal Scores
New scores on a regular schedule. Always free.
Watchlists & alerts
Pick the models you use. We email when one slips.
Independent benchmarks
Same questions every run. No vendor spin.
Audit trail built-in
Every score timestamped, exportable. Bring receipts.
Contact

Thinking about AI but not sure where to start?

Shoothill helps businesses pick, build, and operate AI that's safe, useful, and commercially viable. Fill this in and we'll get back to you within one working day.