My news feed surfaced an article that seems to give more perspective on the practical effectiveness of contemporary AI facilities than anything else I've seen recently.
In retrospect, what struck me most is how thoroughly I had lacked such a handle without recognizing it until I saw this article. Here's what I make of that.
Amid the huge changes that accompany the emergence of AI facilities that are practically effective to any significant degree, it's hard to get a handle on how effective they actually are and aren't. That may be especially true for those of us not fully occupied with tracking the developments. But given the circulating hype and the drastic pace of change in the industry, I suspect that even those deeply involved in the actual efforts have a difficult time gaining a sense of proportion and absolute status, for some of the same reasons as the rest of us, and for some different ones.
Some questions I take with me:
- Might tools mentioned in the article, like TheAgentCompany and CRMArena-Pro, continue to advance as the agents they're tracking advance, and so provide ongoing ways to track progress in the usefulness of developing AI facilities?
- Are there capabilities that those measures might miss, or fail to address in any fundamental way? We're talking about measuring the capabilities of our own intelligence – comprehension of comprehension. "If the brain were so simple we could understand it, we would be so simple we couldn't." – Lyall Watson
- How would human agencies (companies) dedicated to such tasks score? Is the presumption that they would score 100%?
- Do the deficiencies in AI effectiveness that those tools currently demonstrate mark exactly what the promise of Artificial General Intelligence offers beyond contemporary, seemingly-comprehensive-but-not-quite AI facilities (LLMs, RAG LLMs, etc.)?