Businesses today generate an unprecedented volume of video content. From sprawling broadcast archives to thousands of in-store surveillance cameras and countless hours of production footage, most of this material sits dormant on servers, unexamined and unanalyzed. This phenomenon, known as "dark data," represents a vast, untapped resource that companies collect automatically but rarely leverage in a meaningful way.
To address this challenge, Aza Kai (CEO) and Hiraku Yanagita (COO), two former Google Japan executives who worked together for nearly a decade, set out to build their own solution. The pair co-founded InfiniMind, a Tokyo-based startup developing infrastructure that transforms petabytes of unviewed video and audio into structured, actionable business data.
“My co-founder, who spent a decade leading brand and data solutions at Google Japan, and I saw this inflection point coming while we were still at Google,” Kai revealed. By 2024, he explained, the technology had matured and market demand had become unmistakable enough that the co-founders decided to build the company themselves.
Kai, whose background at Google Japan spanned cloud computing, machine learning, ad systems, and video recommendation models before he led data science teams, described the limitations of existing solutions. Earlier approaches could identify objects in individual frames but could not follow narratives, infer causality, or answer complex questions about video content. For clients managing decades of broadcast archives and petabytes of footage, even basic questions about their content frequently went unanswered.
The pivotal shift, Kai told TechCrunch, came with advances in vision-language models between 2021 and 2023, which allowed video AI to move beyond simple object tagging. Falling GPU costs and steady annual performance gains of roughly 15–20% over the past decade helped, but the decisive factor was capability: until recently, the models simply couldn't do what the task required.
InfiniMind recently closed a $5.8 million seed round led by UTEC, with participation from CX2, Headline Asia, Chiba Dojo, and an AI researcher from a16z Scout. The company is relocating its headquarters to the U.S. while maintaining a significant operational presence in Japan. Japan proved an ideal testing ground, with strong hardware, highly skilled engineers, and a supportive startup ecosystem that let the team refine its technology with demanding customers before expanding globally.
Its first product, TV Pulse, launched in Japan in April 2025. The AI-powered platform analyzes television content in real time, assisting media and retail companies in "tracking product exposure, brand presence, customer sentiment, and PR impact," according to the startup. After successful pilot programs with major broadcasters and agencies, the platform already has paying customers, including wholesalers and media organizations.
Now InfiniMind is setting its sights on the international market. Its flagship offering, DeepFrame, is a long-form video intelligence platform that can process up to 200 hours of footage to pinpoint specific scenes, speakers, or events. Kai said DeepFrame is slated for a beta release in March, with a full commercial launch planned for April 2026.
The video analysis market is fragmented. Kai noted that while companies like TwelveLabs offer general-purpose video understanding APIs aimed at consumers, prosumers, and enterprises alike, InfiniMind focuses on specialized enterprise use cases: monitoring, safety, security, and extracting deeper insights from video content.
“Our solution requires no code; clients bring their data, and our system processes it, providing actionable insights,” Kai stated. He further emphasized, “We also integrate audio, sound, and speech understanding, not just visuals. Our system can handle unlimited video length, and cost efficiency is a major differentiator. Most existing solutions prioritize accuracy or specific use cases but don’t solve cost challenges.”
The seed funding will go toward developing the DeepFrame model, scaling engineering infrastructure, hiring additional engineers, and reaching a broader customer base in both Japan and the U.S.
“This is an exciting space, one of the paths toward AGI,” Kai concluded. “Understanding general video intelligence is about understanding reality. While industrial applications are important, our ultimate goal is to push the boundaries of technology to better understand reality and help humans make better decisions.”