When AI Learns to Test Its Own Skills — So Do I

Anthropic published an article a couple of days ago, titled “Improving Skill Creator: Test, Measure, and Refine Agent Skills.”

I laughed when I finished reading it.

Not because it was funny — but because I’d been doing the exact same thing over the past few months.

They’re doing it at the platform level. I’m doing it at the personal level — evolving my own “lazy factory.”

First, What Did Anthropic Actually Say?

The core message is simple:

AI Skills that “seem to work” aren’t enough. You need to test them, measure them, and keep improving them.

They break Skills into two categories:

| Type | Description | Examples |
|---|---|---|
| Capability-enhancing | Makes AI do things it couldn’t do well before | PDF form filling, document generation |
| Preference-encoding | “Writes” your workflow into the AI | NDA review, weekly report summarization |

Then they introduced an eval system — basically a way to write “exam questions” for your AI skills.

Define your input, describe the expected output, and see if the AI passes.
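
To make the “exam question” idea concrete, here is a minimal Python sketch of what one eval case could look like. It is an illustration built on assumptions, not Anthropic’s eval format: each case pairs a prompt with plain substring checks, and `run_skill` / `fake_model` are hypothetical stand-ins for however you actually call the model with the Skill loaded.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EvalCase:
    """One "exam question": an input prompt plus simple checks on the output."""
    name: str
    prompt: str
    must_include: list[str] = field(default_factory=list)
    must_not_include: list[str] = field(default_factory=list)


def grade(case: EvalCase, output: str) -> list[str]:
    """Return the reasons a case failed; an empty list means it passed."""
    failures = [f"missing: {p!r}" for p in case.must_include if p not in output]
    failures += [f"forbidden: {p!r}" for p in case.must_not_include if p in output]
    return failures


def run_evals(cases: list[EvalCase], run_skill: Callable[[str], str]) -> dict[str, list[str]]:
    """Send each prompt through run_skill (a stand-in for the real model call) and grade it."""
    return {case.name: grade(case, run_skill(case.prompt)) for case in cases}


if __name__ == "__main__":
    # Hypothetical stand-in: a real version would call the model with the Skill loaded.
    def fake_model(prompt: str) -> str:
        return "Budgeting is boring, so I automated it."

    cases = [
        EvalCase(
            name="sounds-like-me",
            prompt="Write a one-line intro about budgeting.",
            must_include=["automated"],
            must_not_include=["In conclusion"],  # example of a banned phrase
        )
    ]
    for name, failures in run_evals(cases, fake_model).items():
        print(name, "PASS" if not failures else f"FAIL {failures}")
```

Substring checks are the crudest possible grader; they are enough to catch a banned phrase creeping back in, but judging tone usually needs a human or a model-based grader.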

Sounds like unit testing in software engineering, right?

Exactly — and they say so themselves.

Bringing the rigor of software development (tests, benchmarks, iterative improvement) into skill writing — without needing to write code.

Funny, I’ve Been Doing the Same Thing

During the Lunar New Year, I did a big overhaul of my workflow and rebuilt my entire AI collaboration system from scratch.

Right now, inside my Claude Code environment, I have 19 custom-built Skills, 12 specialized Agents, and a full cross-tool knowledge sync architecture.

Sounds intense?

It’s really just what happens when a lazy person refuses to do the same thing twice.

My Skills Fall Into Two Categories Too

Looking back at Anthropic’s framework, my Skills map perfectly onto both types:

Capability-enhancing:

  • book-cover-automation: Auto-download and remove backgrounds from book covers
  • translate-blog: Auto-translate Chinese posts to English
  • seo-analysis: SEO data analysis and strategy generation

Preference-encoding (writing my workflow into AI):

  • hugo-content-guide: My writing style and formatting rules
  • commit: Auto-generate Git commit messages
  • daily-review: Daily review → auto-write to Anytype
  • session-end: Auto status check and knowledge extraction when a session ends

The second category is where I’ve invested the most effort.

Because these aren’t about “making AI smarter” — they’re about “making AI become me.”

Testing Isn’t Optional

Anthropic’s article calls out a very real problem:

Most skill authors are domain experts, not engineers. They know their workflows, but lack the tools to verify whether a skill still works correctly.

That hit home.

I’m a CEO with an insurance background, not an engineer. But I’m now managing 19 AI Skills, and every single one affects my content production pipeline.

If translate-blog breaks, my English posts will be wrong.

If the tone rules in hugo-content-guide drift, the AI’s writing won’t sound like me.

So I started doing something similar to evals, just in a more lo-fi way:

  1. check-skills: A dedicated Skill for checking the health of all other Skills
  2. sync-skills: Keeps knowledge in sync across Claude Code, Copilot, and Codex
  3. promote-lessons: Reviews knowledge suggestions to prevent config files from bloating indefinitely

Not elegant, but it works.
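
For readers who want something scriptable, here is a rough Python approximation of the kind of health check check-skills does. check-skills itself is a Skill, so this is only a sketch under assumptions: each Skill sits in its own folder under a skills directory (`.claude/skills` is assumed as the default), its SKILL.md starts with YAML frontmatter declaring `name` and `description`, and the 500-line budget is an arbitrary guard against the kind of bloat promote-lessons is meant to prevent. The frontmatter reader only handles simple `key: value` lines, not full YAML.

```python
from pathlib import Path

MAX_LINES = 500  # arbitrary budget (an assumption) to flag bloated Skills


def parse_frontmatter(text: str) -> dict[str, str]:
    """Read simple 'key: value' lines between the first two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing '---': treat the frontmatter as missing


def check_skill_text(text: str, max_lines: int = MAX_LINES) -> list[str]:
    """Return a list of problems found in one SKILL.md; empty means healthy."""
    problems = []
    meta = parse_frontmatter(text)
    for key in ("name", "description"):
        if not meta.get(key):
            problems.append(f"missing frontmatter field: {key}")
    n = len(text.splitlines())
    if n > max_lines:
        problems.append(f"too long: {n} lines (budget {max_lines})")
    return problems


def check_skills_dir(root: Path) -> dict[str, list[str]]:
    """Check every <root>/<skill>/SKILL.md and report folders that lack one."""
    report = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        skill_md = folder / "SKILL.md"
        if not skill_md.exists():
            report[folder.name] = ["no SKILL.md"]
        else:
            report[folder.name] = check_skill_text(skill_md.read_text(encoding="utf-8"))
    return report


if __name__ == "__main__":
    import sys

    # Default location is an assumption; pass your own skills folder as an argument.
    root = Path(sys.argv[1] if len(sys.argv) > 1 else ".claude/skills")
    if not root.is_dir():
        print(f"no skills folder at {root}")
    else:
        for skill, problems in check_skills_dir(root).items():
            print(f"{skill}: {'OK' if not problems else '; '.join(problems)}")
```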

When a Model Improves, Should Your Skill Retire?

One part of the article really resonated with me:

If the base model starts passing your eval without the Skill, that means the technique has been absorbed by the model. The Skill isn’t broken — it’s just no longer needed.

This matches my experience exactly.

I’ve already retired 3 Skills:

  • canva-cover-update
  • code-simplifier
  • content-writing

Not because they were poorly written — but because the model itself improved and no longer needed the extra prompting.

And honestly? That’s a good thing.

It means your automation system is alive — it self-simplifies as AI evolves.
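
The retirement rule from the quote can be written as a small decision function. This Python sketch assumes you have run the same eval cases twice, once with the Skill loaded and once with the bare model, and recorded pass/fail for each; the 0.9 threshold and the `skill_verdict` name are illustrative, not values from Anthropic’s article.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of eval cases that passed; an empty run counts as 0.0."""
    return sum(results) / len(results) if results else 0.0


def skill_verdict(with_skill: list[bool], without_skill: list[bool],
                  threshold: float = 0.9) -> str:
    """Compare one eval set run with and without the Skill.

    threshold is an assumed bar for "good enough", not an official value.
    """
    on, off = pass_rate(with_skill), pass_rate(without_skill)
    if off >= threshold and off >= on:
        # The bare model does at least as well: the technique has been absorbed.
        return "retire: the base model passes without the Skill"
    if on >= threshold:
        return "keep: the Skill still adds value"
    # Neither run is good enough, so the Skill itself needs work.
    return "fix: the Skill itself is failing its evals"
```

The key design choice is comparing against the bare model on identical cases; a failing Skill and an unnecessary Skill look the same if you only ever test with the Skill loaded.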

The Full Picture of the Lazy Factory

Since we’re here, let me sketch out what the full system looks like right now:

Content Pipeline:
  Notion post → Hugo blog → English translation → SEO optimization
  → Auto-generate social posts → Auto-schedule to FB / IG

Knowledge Pipeline:
  Reading notes → Zettelkasten cards → Anytype
  Daily review → Conversational diary → Anytype

Operations Pipeline:
  GA4 data → Growth strategy → CTR optimization
  Newsletter → Auto-send via ConvertKit
  Podcast → Auto-integration and promotion
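
Each pipeline above is a chain where one step’s output feeds the next. A minimal Python sketch of that wiring, assuming every step is a plain function that takes and returns a dict; the step functions are hypothetical placeholders, not the actual scripts behind this system, and the real Notion, Hugo, translation, and social-media calls are left out.

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_pipeline(item: dict, steps: list[Step]) -> dict:
    """Apply each step in order and record the names of the steps that ran."""
    log = []
    for step in steps:
        item = step(item)
        log.append(step.__name__)
    return {**item, "log": log}


# Hypothetical placeholder steps; real ones would call Notion, Hugo, a translator, etc.
def export_from_notion(item: dict) -> dict:
    return {**item, "markdown": f"# {item['title']}"}


def translate_to_english(item: dict) -> dict:
    return {**item, "english": item["markdown"] + " (EN)"}


def draft_social_post(item: dict) -> dict:
    return {**item, "social": f"New post: {item['title']}"}


if __name__ == "__main__":
    result = run_pipeline(
        {"title": "Lazy Factory"},
        [export_from_notion, translate_to_english, draft_social_post],
    )
    print(result["log"])
```

Keeping each step a small function makes it easy to swap one out, or drop it entirely once the model can handle that step on its own.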

All of this is wired together by AI Skills + Python scripts, with three AI tools (Claude Code, GitHub Copilot, Codex) sharing the same knowledge base.
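
One simple way to give three tools a shared knowledge base is to keep a single source file and copy it into each tool’s instruction file. The Python sketch below assumes the commonly used file names (CLAUDE.md for Claude Code, .github/copilot-instructions.md for GitHub Copilot, AGENTS.md for Codex) and a hypothetical source file `knowledge/shared.md`; the post doesn’t describe how sync-skills actually works, so treat this as an illustration only.

```python
from pathlib import Path

# Assumed instruction-file locations for each tool, relative to the repo root.
TARGETS = {
    "Claude Code": Path("CLAUDE.md"),
    "GitHub Copilot": Path(".github/copilot-instructions.md"),
    "Codex": Path("AGENTS.md"),
}
HEADER = "<!-- generated from {source}; edit the source, not this file -->\n"


def render(shared: str, source: str) -> str:
    """Prefix the shared knowledge with a do-not-edit header."""
    return HEADER.format(source=source) + shared


def sync(repo: Path, source: Path = Path("knowledge/shared.md")) -> list[str]:
    """Write the shared knowledge into every tool's file; return the tools whose file changed."""
    content = render((repo / source).read_text(encoding="utf-8"), source.as_posix())
    changed = []
    for tool, rel in TARGETS.items():
        target = repo / rel
        if target.exists() and target.read_text(encoding="utf-8") == content:
            continue  # already in sync: leave the file untouched
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        changed.append(tool)
    return changed


if __name__ == "__main__":
    repo = Path(".")
    if (repo / "knowledge/shared.md").exists():
        print("updated:", sync(repo) or "nothing")
```

Overwriting each file wholesale keeps the sketch short; if a tool’s file also carries tool-specific rules, you would replace only a marked section instead. Skipping files that are already identical keeps the Git diff quiet when nothing changed.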

Yes — I’m the kind of person who spends an entire Lunar New Year building systems just to avoid doing things manually.

Very lazy. But my laziness is highly systematic.

The Future: The Line Between Skill and Spec Will Blur

The article closes with a fascinating observation:

As models improve, the line between “Skill” and “Specification” may blur. Today’s SKILL.md is an implementation plan — telling AI exactly how to do something. In the future, a natural language description of what to do might be enough.

I think they’re right.

Right now, each of my SKILL.md files runs hundreds of lines — full of formatting rules, banned phrases, sentence patterns, and example code.

But maybe someday, all I’ll need to write is:

“Write a lifestyle post in Lazy Da’s voice.”

And the AI will just know what to do.

When that day comes, my Skills won’t disappear — they’ll have become the AI’s memory.



Lazy Conclusion

What Anthropic is doing and what I’m doing are fundamentally the same thing:

Turning AI skills from “seems like it works” to “confirmed to work.”

The only difference is scale — they’re building platform-level tools, I’m building a personal lazy factory.

But the core logic is identical:

  1. Define your workflow (write it as a Skill)
  2. Test whether it works as expected (via eval or lo-fi methods)
  3. Trim as the model improves (retire what’s no longer needed)

AI won’t replace you — but people who use AI will move faster.

And people who test their own AI skills will move a little more steadily, too.
