
Probably Secures $9M to Build Dependable AI

Originally reported by TechCrunch

Despite the increasing power of Large Language Models (LLMs), the challenge of mitigating "hallucinations" – instances of factual inaccuracies – remains persistently difficult. Errors can surface even in the most advanced models, and while methods for detection exist, the industry is still in the process of identifying the most effective strategies for prevention and correction.

To address this, Probably, a company that recently raised $9 million in seed funding from Andreessen Horowitz, is developing a more rigorous approach to identifying and correcting these errors.

As founder Peter Elias explains, the company's ambition is to prevent hallucinations and basic factual errors from ever reaching the user, aiming for the 99.99% accuracy standard commonly found in deterministic systems but significantly harder to achieve with artificial intelligence. Attaining such a high level of precision with LLMs, it turns out, necessitates a rethinking of many foundational principles in AI engineering.

Probably's initial product is a data science tool designed to generate fast answers from complex datasets. Each result comes with a citation and a clear audit trail showing how it was derived, a practice that is becoming increasingly common among sophisticated AI tools.

However, preventing errors from permeating these summaries required an elaborate "harness system," which Elias describes as a "data science mech suit." This system cross-references the LLM's initial responses against a deterministic validator, which rejects any results that do not align with the dataset. Crucially, the LLM has been trained in conjunction with this validator, ensuring the entire system is optimized for both speed and accuracy, according to the company.
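The validate-and-reject loop described above can be illustrated with a minimal sketch. This is not Probably's implementation, which the article does not detail; the claim format, the `validate` and `answer_with_harness` functions, the toy `DATASET`, and the `stub_model` standing in for an LLM are all hypothetical names chosen for illustration.

```python
import math
from statistics import mean
from typing import Callable, Optional

# Hypothetical toy dataset; the article does not describe the real data format.
DATASET = {"revenue": [100.0, 120.0, 95.0]}

# Deterministic aggregations the validator knows how to recompute.
AGGREGATIONS = {"mean": mean, "sum": sum, "max": max, "min": min}


def validate(claim: dict, dataset: dict) -> bool:
    """Recompute the claimed statistic from the data and compare values."""
    column = dataset.get(claim.get("column"))
    func = AGGREGATIONS.get(claim.get("metric"))
    if column is None or func is None:
        return False  # the claim refers to something we cannot check
    value = claim.get("value")
    if not isinstance(value, (int, float)):
        return False
    return math.isclose(func(column), value, rel_tol=1e-9)


def answer_with_harness(
    propose: Callable[[int], dict], dataset: dict, max_attempts: int = 3
) -> Optional[dict]:
    """Ask the model for a claim; return it only if the validator accepts it."""
    for attempt in range(max_attempts):
        claim = propose(attempt)
        if validate(claim, dataset):
            return claim
    return None  # nothing unverified ever reaches the user


def stub_model(attempt: int) -> dict:
    """Stand-in for an LLM: wrong on the first try, right on the second."""
    guesses = [
        {"metric": "mean", "column": "revenue", "value": 110.0},
        {"metric": "mean", "column": "revenue", "value": 105.0},
    ]
    return guesses[min(attempt, len(guesses) - 1)]


result = answer_with_harness(stub_model, DATASET)
```

Here the first guess is rejected because the recomputed mean does not match, and only the second, verified claim is returned. If no attempt passes, the harness returns `None` rather than passing along an unchecked answer.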

"What we learned building this was that the better your harness engineering is, the weaker the model can be," Elias states. "If you can refine the context enough, the model does not have to work very hard to do the right thing. Basically, it’s an exercise in reducing ambiguity."

This approach lets Probably's data science tool run on significantly smaller AI models. Elias says the current version runs on a model "four classes weaker than the frontier models," so it can be deployed on local hardware, such as a desktop computer, rather than in a data center. That sharply reduces the token costs typically associated with AI usage.

The timing is favorable: token costs are rising, and many customers are re-evaluating their AI budgets. Elias's vision also extends beyond data science. He says the same engine can be adapted to "any precision-sensitive use case," such as accounting or medical services.

Elias is critical of the major labs: "I think it’s really interesting that the big AI labs have not even attempted to do this. They’re incentivized not to, because they make money the more times you have to correct the model."

Editorial Staff, Editor

The Editorial Staff at AIChief is a team of professional content writers with extensive experience in AI and marketing. Founded in 2025, AIChief has quickly grown into the largest free AI resource hub in the industry.
