While you’d be hard pressed to find any startup not brimming with confidence over the disruptive idea they’re chasing, it’s not often you come across a young company as calmly convinced it’s engineering the future as Dasha AI.
The team is building a platform for designing human-like voice interactions to automate business processes. Put simply, it’s using AI to make machine voices a whole lot less robotic.
“What we definitely know is this will definitely happen,” says CEO and co-founder Vladislav Chernyshov. “Sooner or later the conversational AI/voice AI will replace people everywhere where the technology will allow. And it’s better for us to be the first mover than the last in this field.”
“In 2018 in the US alone there were 30 million people doing some kind of repetitive tasks over the phone. We can automate these jobs now or we are going to be able to automate it in two years,” he goes on. “If you multiple it with Europe and the massive call centers in India, Pakistan and the Philippines you will probably have something like close to 120M people worldwide… and they are all subject for disruption, potentially.”
The New York based startup has been operating in relative stealth up to now. But it’s breaking cover to talk to TechCrunch — announcing a $2M seed round, led by RTP Ventures and RTP Global: An early stage investor that’s backed the likes of Datadog and RingCentral. RTP’s venture arm, also based in NY, writes on its website that it prefers engineer-founded companies — that “solve big problems with technology”. “We like technology, not gimmicks,” the fund warns with added emphasis.
Dasha’s core tech right now includes what Chernyshov describes as “a human-level, voice-first conversation modelling engine”; a hybrid text-to-speech engine which he says enables it to model speech disfluencies (aka, the ums and ahs, pitch changes etc that characterize human chatter); plus “a fast and accurate” real-time voice activity detection algorithm which detects speech in under 100 milliseconds, meaning the AI can turn-take and handle interruptions in the conversation flow. The platform can also detect a caller’s gender — a feature that can be useful for healthcare use-cases, for example.
Another component Chernyshov flags is “an end-to-end pipeline for semi-supervised learning” — so it can retrain the models in real time “and fix mistakes as they go” — until Dasha hits the claimed “human-level” conversational capability for each business process niche. (To be clear, the AI cannot adapt its speech to an interlocutor in real-time — as human speakers naturally shift their accents closer to bridge any dialect gap — but Chernyshov suggests it’s on the roadmap.)
“For instance, we can start with 70% correct conversations and then gradually improve the model up to say 95% of correct conversations,” he says of the learning element, though he admits there are a lot of variables that can impact error rates — not least the call environment itself. Even cutting edge AI is going to struggle with a bad line.
The platform also has an open API so customers can plug the conversation AI into their existing systems — be it telephony, Salesforce software or a developer environment, such as Microsoft Visual Studio.
Currently they’re focused on English, though Chernyshov says the architecture is “basically language agnostic” — but does requires “a big amount of data”.
The next step will be to open up the dev platform to enterprise customers, beyond the initial 20 beta testers, which include companies in the banking, healthcare and insurance sectors — with a release slated for later this year or Q1 2020.
Test use-cases so far include banks using the conversation engine for brand loyalty management to run customer satisfaction surveys that can turnaround negative feedback by fast-tracking a response to a bad rating — by providing (human) customer support agents with an automated categorization of the complaint so they can follow up more quickly. “This usually leads to a wow effect,” says Chernyshov.
Ultimately, he believes there will be two or three major AI platforms globally providing businesses with an automated, customizable conversational layer — sweeping away the patchwork of chatbots currently filling in the gap. And of course Dasha intends their ‘Digital Assistant Super Human Alike’ to be one of those few.
“There is clearly no platform [yet],” he says. “Five years from now this will sound very weird that all companies now are trying to build something. Because in five years it will be obvious — why do you need all this stuff? Just take Dasha and build what you want.”
“This reminds me of the situation in the 1980s when it was obvious that the personal computers are here to stay because they give you an unfair competitive advantage,” he continues. “All large enterprise customers all over the world… were building their own operating systems, they were writing software from scratch, constantly reinventing the wheel just in order to be able to create this spreadsheet for their accountants.
“And then Microsoft with MS-DOS came in… and everything else is history.”
That’s not all they’re building, either. Dasha’s seed financing will be put towards launching a consumer-facing product atop its b2b platform to automate the screening of recorded message robocalls. So, basically, they’re building a robot assistant that can talk to — and put off — other machines on humans’ behalf.
Which does kind of suggest the AI-fuelled future will entail an awful lot of robots talking to each other… ???
Chernyshov says this b2c call screening app will most likely be free. But then if your core tech looks set to massively accelerate a non-human caller phenomenon that many consumers already see as a terrible plague on their time and mind then providing free relief — in the form of a counter AI — seems the very least you should do.
Not that Dasha can be accused of causing the robocaller plague, of course. Recorded messages hooked up to call systems have been spamming people with unsolicited calls for far longer than the startup has existed.
Dasha’s PR notes Americans were hit with 26.3BN robocalls in 2018 alone — up “a whopping” 46% on 2017.
Its conversation engine, meanwhile, has only made some 3M calls to date, clocking its first call with a human in January 2017. But the goal from here on in is to scale fast. “We plan to aggressively grow the company and the technology so we can continue to provide the best voice conversational AI to a market which we estimate to exceed $30BN worldwide,” runs a line from its PR.
After the developer platform launch, Chernyshov says the next step will be to open up access to business process owners by letting them automate existing call workflows without needing to be able to code (they’ll just need an analytic grasp of the process, he says).
Later — pegged for 2022 on the current roadmap — will be the launch of “the platform with zero learning curve”, as he puts it. “You will teach Dasha new models just like typing in a natural language and teaching it like you can teach any new team member on your team,” he explains. “Adding a new case will actually look like a word editor — when you’re just describing how you want this AI to work.”
His prediction is that a majority — circa 60% — of all major cases that business face — “like dispatching, like probably upsales, cross sales, some kind of support etc, all those cases” — will be able to be automated “just like typing in a natural language”.
So if Dasha’s AI-fuelled vision of voice-based business process automation come to fruition then humans getting orders of magnitude more calls from machines looks inevitable — as machine learning supercharges artificial speech by making it sound slicker, act smarter and seem, well, almost human.
But perhaps a savvier generation of voice AIs will also help manage the ‘robocaller’ plague by offering advanced call screening? And as non-human voice tech marches on from dumb recorded messages to chatbot-style AIs running on scripted rails to — as Dasha pitches it — fully responsive, emoting, even emotion-sensitive conversation engines that can slip right under the human radar maybe the robocaller problem will eat itself? I mean, if you didn’t even realize you were talking to a robot how are you going to get annoyed about it?
Dasha claims 96.3% of the people who talk to its AI “think it’s human”, though it’s not clear what sample size the claim is based on. (To my ear there are definite ‘tells’ in the current demos on its website. But in a cold-call scenario it’s not hard to imagine the AI passing, if someone’s not paying much attention.)
The alternative scenario, in a future infested with unsolicited machine calls, is that all smartphone OSes add kill switches, such as the one in iOS 13 — which lets people silence calls from unknown numbers.
And/or more humans simply never pick up phone calls unless they know who’s on the end of the line.
So it’s really doubly savvy of Dasha to create an AI capable of managing robot calls — meaning it’s building its own fallback — a piece of software willing to chat to its AI in future, even if actual humans refuse.
Dasha’s robocall screener app, which is slated for release in early 2020, will also be spammer-agnostic — in that it’ll be able to handle and divert human salespeople too, as well as robots. After all, a spammer is a spammer.
“Probably it is the time for somebody to step in and ‘don’t be evil’,” says Chernyshov, echoing Google’s old motto, albeit perhaps not entirely reassuringly given the phrase’s lapsed history — as we talk about the team’s approach to ecosystem development and how machine-to-machine chat might overtake human voice calls.
“At some point in the future we will be talking to various robots much more than we probably talk to each other — because you will have some kind of human-like robots at your house,” he predicts. “Your doctor, gardener, warehouse worker, they all will be robots at some point.”
The logic at work here is that if resistance to an AI-powered Cambrian Explosion of machine speech is futile, it’s better to be at the cutting edge, building the most human-like robots — and making the robots at least sound like they care.
Dasha’s conversational quirks certainly can’t be called a gimmick. Even if the team’s close attention to mimicking the vocal flourishes of human speech — the disfluencies, the ums and ahs, the pitch and tonal changes for emphasis and emotion — might seem so at first airing.
In one of the demos on its website you can hear a clip of a very chipper-sounding male voice, who identifies himself as “John from Acme Dental”, taking an appointment call from a female (human), and smoothly dealing with multiple interruptions and time/date changes as she changes her mind. Before, finally, dealing with a flat cancelation.
A human receptionist might well have got mad that the caller essentially just wasted their time. Not John, though. Oh no. He ends the call as cheerily as he began, signing off with an emphatic: “Thank you! And have a really nice day. Bye!”
If the ultimate goal is Turing Test levels of realism in artificial speech — i.e. a conversation engine so human-like it can pass as human to a human ear — you do have to be able to reproduce, with precision timing, the verbal baggage that’s wrapped around everything humans say to each other.
This tonal layer does essential emotional labor in the business of communication, shading and highlighting words in a way that can adapt or even entirely transform their meaning. It’s an integral part of how we communicate. And thus a common stumbling block for robots.
So if the mission is to power a revolution in artificial speech that humans won’t hate and reject then engineering full spectrum nuance is just as important a piece of work as having an amazing speech recognition engine. A chatbot that can’t do all that is really the gimmick.
Chernyshov claims Dasha’s conversation engine is “at least several times better and more complex than [Google] Dialogflow, [Amazon] Lex, [Microsoft] Luis or [IBM] Watson”, dropping a laundry list of rival speech engines into the conversation.
He argues none are on a par with what Dasha is being designed to do.
The difference is the “voice-first modelling engine”. “All those [rival engines] were built from scratch with a focus on chatbots — on text,” he says, couching modelling voice conversation “on a human level” as much more complex than the more limited chatbot-approach — and hence what makes Dasha special and superior.
“Imagination is the limit. What we are trying to build is an ultimate voice conversation AI platform so you can model any kind of voice interaction between two or more human beings.”
Google did demo its own stuttering voice AI — Duplex — last year, when it also took flak for a public demo in which it appeared not to have told restaurant staff up front they were going to be talking to a robot.
Chernyshov isn’t worried about Duplex, though, saying it’s a product, not a platform.
“Google recently tried to headhunt one of our developers,” he adds, pausing for effect. “But they failed.”
He says Dasha’s engineering staff make up more than half (28) its total headcount (48), and include two doctorates of science; three PhDs; five PhD students; and ten masters of science in computer science.
It has an R&D office in Russian which Chernyshov says helps makes the funding go further.
“More than 16 people, including myself, are ACM ICPC finalists or semi finalists,” he adds — likening the competition to “an Olympic game but for programmers”. A recent hire — chief research scientist, Dr Alexander Dyakonov — is both a doctor of science professor and former Kaggle No.1 GrandMaster in machine learning. So with in-house AI talent like that you can see why Google, uh, came calling…
But why not have Dasha ID itself as a robot by default? On that Chernyshov says the platform is flexible — which means disclosure can be added. But in markets where it isn’t a legal requirement the door is being left open for ‘John’ to slip cheerily by. Bladerunner here we come.
The team’s driving conviction is that emphasis on modelling human-like speech will, down the line, allow their AI to deliver universally fluid and natural machine-human speech interactions which in turn open up all sorts of expansive and powerful possibilities for embeddable next-gen voice interfaces. Ones that are much more interesting than the current crop of gadget talkies.
This is where you could raid sci-fi/pop culture for inspiration. Such as Kitt, the dryly witty talking car from the 1980s TV series Knight Rider. Or, to throw in a British TV reference, Holly the self-depreciating yet sardonic human-faced computer in Red Dwarf. (Or indeed Kryten the guilt-ridden android butler.) Chernyshov’s suggestion is to imagine Dasha embedded in a Boston Dynamics robot. But surely no one wants to hear those crawling nightmares scream…
Dasha’s five-year+ roadmap includes the eyebrow-raising ambition to evolve the technology to achieve “a general conversational AI”. “This is a science fiction at this point. It’s a general conversational AI, and only at this point you will be able to pass the whole Turing Test,” he says of that aim.
“Because we have a human level speech recognition, we have human level speech synthesis, we have generative non-rule based behavior, and this is all the parts of this general conversational AI. And I think that we can we can — and scientific society — we can achieve this together in like 2024 or something like that.
“Then the next step, in 2025, this is like autonomous AI — embeddable in any device or a robot. And hopefully by 2025 these devices will be available on the market.”
Of course the team is still dreaming distance away from that AI wonderland/dystopia (depending on your perspective) — even if it’s date-stamped on the roadmap.
But if a conversational engine ends up in command of the full range of human speech — quirks, quibbles and all — then designing a voice AI may come to be thought of as akin to designing a TV character or cartoon personality. So very far from what we currently associate with the word ‘robotic’. (And wouldn’t it be funny if the term ‘robotic’ came to mean ‘hyper entertaining’ or even ‘especially empathetic’ thanks to advances in AI.)
Let’s not get carried away though.
In the meanwhile, there are ‘uncanny valley’ pitfalls of speech disconnect to navigate if the tone being (artificially) struck hits a false note. (And, on that front, if you didn’t know ‘John from Acme Dental’ was a robot you’d be forgiven for misreading his chipper sign off to a total time waster as pure sarcasm. But an AI can’t appreciate irony. Not yet anyway.)
Nor can robots appreciate the difference between ethical and unethical verbal communication they’re being instructed to carry out. Sales calls can easily cross the line into spam. And what about even more dystopic uses for a conversation engine that’s so slick it can convince the vast majority of people it’s human — like fraud, identity theft, even election interference… the potential misuses could be terrible and scale endlessly.
Although if you straight out ask Dasha whether it’s a robot Chernyshov says it has been programmed to confess to being artificial. So it won’t tell you a barefaced lie.
How will the team prevent problematic uses of such a powerful technology?
“We have an ethics framework and when we will be releasing the platform we will implement a real-time monitoring system that will monitor potential abuse or scams, and also it will ensure people are not being called too often,” he says. “This is very important. That we understand that this kind of technology can be potentially probably dangerous.”
“At the first stage we are not going to release it to all the public. We are going to release it in a closed alpha or beta. And we will be curating the companies that are going in to explore all the possible problems and prevent them from being massive problems,” he adds. “Our machine learning team are developing those algorithms for detecting abuse, spam and other use cases that we would like to prevent.”
There’s also the issue of verbal ‘deepfakes’ to consider. Especially as Chernyshov suggests the platform will, in time, support cloning a voiceprint for use in the conversation — opening the door to making fake calls in someone else’s voice. Which sounds like a dream come true for scammers of all stripes. Or a way to really supercharge your top performing salesperson.
Safe to say, the counter technologies — and thoughtful regulation — are going to be very important.
There’s little doubt that AI will be regulated. In Europe policymakers have tasked themselves with coming up with a framework for ethical AI. And in the coming years policymakers in many countries will be trying to figure out how to put guardrails on a technology class that, in the consumer sphere, has already demonstrated its wrecking-ball potential — with the automated acceleration of spam, misinformation and political disinformation on social media platforms.
“We have to understand that at some point this kind of technologies will be definitely regulated by the state all over the world. And we as a platform we must comply with all of these requirements,” agrees Chernyshov, suggesting machine learning will also be able to identify whether a speaker is human or not — and that an official caller status could be baked into a telephony protocol so people aren’t left in the dark on the ‘bot or not’ question.
“It should be human-friendly. Don’t be evil, right?”
Asked whether he considers what will happen to the people working in call centers whose jobs will be disrupted by AI, Chernyshov is quick with the stock answer — that new technologies create jobs too, saying that’s been true right throughout human history. Though he concedes there may be a lag — while the old world catches up to the new.
Time and tide wait for no human, even when the change sounds increasingly like we do.