Open datasets, NLP/ASR/TTS models and language tooling for the 500+ Bantu languages spoken across Central, East and Southern Africa — starting with Lingala.
Hundreds of millions of speakers. Almost no representation in major AI systems. Both numbers are real.
The largest language models in the world are trained on data that is overwhelmingly English, Mandarin, and a handful of European languages. When the 350 million people who speak a Bantu language try to use a voice assistant, a chatbot, or a search engine, the systems often don't even pretend to listen.
It's not a quirk of the technology. It's a question of which voices were judged worth recording. We exist to change that answer — not by hand-waving about “AI for Africa”, but by shipping the dull, exact, foundational work: clean datasets, reproducible pipelines, public benchmarks.
If you can't train on it, evaluate on it, and improve it in the open — it doesn't exist for AI. We're changing that.
Texts, audio, transcriptions. Sourced from communities, archives, broadcasters and academic partners.
Cleaning, normalization, annotation, alignment. Reproducible pipelines, public scripts, version-controlled.
Open release on Hugging Face, with model cards, license, data sheets, and contribution guidelines.
Reference NLP, ASR, and TTS models. APIs, SDKs, and partnerships with builders across the continent.
The infrastructure is foundational. The applications it enables touch every part of public and economic life.
Voice-enabled KYC, mobile money interfaces and chatbots in the languages clients actually speak.
Vocal medical assistants, diagnostic translation, accessible patient interfaces in remote clinics.
Mother-tongue learning tools, literacy apps, augmented teacher resources for primary schools.
Public services accessible in local languages: official translation, citizen-facing AI agents.
Drop-in APIs and SDKs for African builders shipping localized apps to local markets.
Subtitling, archival, language preservation. Tools for journalists, broadcasters and storytellers.
Lingala is the lingua franca of Kinshasa — a city of fifteen million and one of the fastest-growing in the world — and of much of the Congo River basin. It is sung across Africa, broadcast on dozens of radio stations, and used daily in markets, offices, schools and ministries.
It is also functionally invisible to the AI systems that increasingly mediate banking, education and healthcare. Starting here is a choice with weight: a major African language, a rich oral and written tradition, and a measurable gap we can close.
“Mbote na yo, ndenge nini ozali?”
Hello, how are you? · [mbote na jo, ndenge nini oˈzali]
First open Lingala text corpus enters preview. Cleaning and tokenization scripts published alongside.
Reproducible processing pipelines and CI in the public GitHub organization. Contributor guide drafted.
BantuLanguages Initiative formally established. Founding hubs in Brazzaville and Kinshasa.
Public foundation report mapping the gap between current LLM coverage and the Bantu language family.
Researchers, developers, linguists, foundations, ministries — we're building a community as multilingual as the languages it serves. Get the next dataset releases, calls for contribution, and research notes in your inbox.