One of the biggest hurdles in working with deep learning models is understanding why they do what they do. The symptoms are familiar: xAI's ongoing efforts to rein in Grok's unusual political leanings, ChatGPT's tendency toward excessive flattery, and the everyday problem of AI hallucinations. Tracing any of this behavior through a neural network comprising billions of parameters is a formidable challenge.
Guide Labs, a San Francisco-based startup co-founded by CEO Julius Adebayo and Chief Science Officer Aya Abdelsalam Ismail, is addressing this fundamental issue. The company recently open-sourced Steerling-8B, an 8-billion-parameter large language model (LLM), which features a novel architecture specifically engineered for straightforward interpretability. This design ensures that every token generated by the model can be directly traced back to its source within the LLM's training dataset.
This level of transparency allows for insights ranging from straightforward identification of reference materials for facts presented by the model, to more intricate analyses such as comprehending the model’s nuanced understanding of concepts like humor or gender.
Adebayo elaborated on this challenge to TechCrunch, stating, “If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things that I’ve encoded, and then you have to be able to reliably turn that on, turn them off.” He noted that while such control is theoretically possible with existing models, it remains "very fragile," describing it as "one of the holy grail questions" in the field.
Adebayo's foundational work in this area commenced during his PhD studies at MIT, where he co-authored a highly influential 2020 paper demonstrating the unreliability of then-current methods for understanding deep learning models. This research directly informed the development of Guide Labs' innovative approach to LLM construction: developers integrate a "concept layer" that organizes data into distinct, traceable categories. While this methodology necessitates more extensive upfront data annotation, the team leverages other AI models to assist, enabling them to train Steerling-8B as their most significant proof of concept to date.
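The article does not detail how the concept layer is implemented, but the general idea of an interpretable concept bottleneck can be sketched as follows. Everything here is illustrative: the concept names, dimensions, and random projections are hypothetical, not Guide Labs' actual design.

```python
import random

random.seed(0)  # deterministic toy example

# Human-labeled concept categories annotated upfront in the training data
# (hypothetical labels for illustration).
CONCEPTS = ["finance", "humor", "gender", "quantum computing"]
DIM = 16

def rand_vec(n):
    """A stand-in for a learned vector (random for this sketch)."""
    return [random.gauss(0, 1) for _ in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

hidden = rand_vec(DIM)  # a hidden state from some intermediate layer
concept_vectors = {c: rand_vec(DIM) for c in CONCEPTS}  # learned concept directions

# Project the hidden state onto each labeled concept direction. In a
# concept-bottleneck design, downstream layers read only these named
# scores, so each output token can be attributed to explicit concepts.
scores = {c: dot(v, hidden) for c, v in concept_vectors.items()}
top = max(scores, key=scores.get)
print(f"dominant concept for this token: {top}")
```

Because every score is tied to a named category, attribution falls out of the architecture itself rather than requiring post-hoc probing.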
“The kind of interpretability people do is…neuroscience on a model, and we flip that,” Adebayo explained, adding, “What we do is actually engineer the model from the ground up so that you don’t need to do neuroscience.”
A potential concern surrounding this approach is the risk of suppressing emergent behaviors—the captivating ability of LLMs to generalize and develop novel insights on subjects not explicitly included in their training. However, Adebayo assures that this phenomenon persists within Guide Labs’ model. His team actively monitors what they term "discovered concepts," which the model identifies autonomously, citing quantum computing as an example.
Adebayo asserts that this interpretable architecture will become indispensable across various sectors. For consumer-facing LLMs, these techniques could empower model developers to prevent the use of copyrighted content or to more effectively manage outputs concerning sensitive topics such as violence or drug abuse. Regulated industries, like finance, will similarly demand more controllable LLMs; for instance, a model assessing loan applications must factor in financial records without considering race. Furthermore, interpretability is crucial in scientific research, an area where Guide Labs has also innovated. While deep learning models have achieved significant breakthroughs in protein folding, scientists require deeper insights into the underlying reasons for successful structural predictions.
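The loan example suggests why a concept bottleneck makes steering tractable: if a decision can only read from named concept scores, suppressing a concept reduces to zeroing its score. The sketch below is a hypothetical toy, not a real lending model; the concept names and weights are invented for illustration.

```python
# Illustrative concept scores and decision weights (hypothetical values).
concept_scores = {"income": 0.8, "credit_history": 0.6, "race": 0.3}
weights = {"income": 0.5, "credit_history": 0.5, "race": 0.2}

def decision(scores, suppressed=()):
    """Weighted decision over concept scores, with optional suppression."""
    # Zero out suppressed concepts so they cannot influence the output.
    usable = {c: (0.0 if c in suppressed else s) for c, s in scores.items()}
    return sum(weights[c] * usable[c] for c in usable)

print(round(decision(concept_scores), 2))                      # race contributes
print(round(decision(concept_scores, suppressed={"race"}), 2)) # race excluded
```

Contrast this with a standard LLM, where a concept like race may be encoded redundantly across many directions ("a trillion ways to encode gender," in Adebayo's phrasing), making reliable suppression fragile.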
“What this model demonstrates is that training interpretable models is no longer a sort of science; it’s now an engineering problem,” Adebayo stated. He confidently added, “We figured out the science and we can scale them, and there is no reason why this kind of model wouldn’t match the performance of the frontier-level models,” even those possessing significantly more parameters.
Guide Labs reports that Steerling-8B achieves approximately 90% of the capabilities of current state-of-the-art models, yet remarkably, it requires less training data due to its innovative architecture. The company, which successfully completed Y Combinator and secured a $9 million seed round from Initialized Capital in November 2024, plans next to develop a larger model and introduce API and agentic access for users.
“The way we’re currently training models is super primitive, and so democratizing inherent interpretability is actually going to be a long-term good thing for the human race,” Adebayo shared with TechCrunch. He emphasized the importance of this development, stating, “As we’re going after these models that are going to be super intelligent, you don’t want something to be making decisions on your behalf that’s sort of mysterious to you.”
The Editorial Staff at AIChief is a team of professional content writers with extensive experience in AI and marketing. Founded in 2025, AIChief has quickly grown into the largest free AI resource hub in the industry.