[DAO:0da3e50] Development of Voice AI Gateway for Google, Microsoft, and Amazon Engines

by 0xb350fb0ee5da6485a07b58f1f204d248863e8e04 (Sergei#8e04)

Should the following Tier 4: up to $60,000 USD, 6 months vesting (1 month cliff) grant in the Platform Contributor category be approved?

Abstract

We would like to introduce Conversational AI to Decentraland and allow everyone to create their own voice bots in the metaverse.

Conversational AI has developed in leaps and bounds in recent years, allowing voice-based conversations between humans and machines that feel natural and human.

This tech has been widely accepted by consumers in the years following the launch of Siri in 2011, and people now interact with voice assistants in their homes on a daily basis.

Voice has a great chance of becoming the predominant interaction modality in Decentraland. This will require a voice artificial intelligence system that understands natural language and replies to users through speech.

Grant size

60,000 USD

Beneficiary address

0xb350fb0ee5da6485a07b58f1f204d248863e8e04

Email address

ilyaspirin94@gmail.com

Description

Example Use Cases

Conversational AI can be used to power voice assistants or chatbots in Decentraland in such cases as:

  • Automate FAQs and users’ onboarding process with a voice-powered assistant (to help visitors at events, games, shops, etc.)

  • Create AI-driven in-game characters

  • Automate sales interactions (take payments, check refund statuses, balances and more)

  • Power smart real estate agents (deliver information on parcel location, property details, etc.)

  • Create a Siri-like voice assistant that helps users navigate around Decentraland

  • Many other things, just like in the real world

Working prototype

Try it here (takes 3-5 minutes to load).

Instructions to use the prototype:

  1. Allow your browser to use your microphone when requested
  2. Approach the NPC (the dog). If you are close enough, a pop-up with “Hello” will appear; this launches the voice bot
  3. Wait for the voice greeting from the NPC
  4. You may ask the NPC the following questions:
    • Who are you?
    • What is your name?
    • How are you?
    • How many players are online?
    • Which districts are there in Decentraland?
    • Where can I buy some art or NFTs?
    • Any great events (happening)?
    • Where can I go? What is interesting right now?
  5. When finished you can say “goodbye” to the NPC

Description

We are going to develop a package that will:

  • Capture the user's voice in the DCL scene
  • Pass the audio to the Middleware
  • Connect the Middleware to a number of existing platforms that provide speech-to-text and natural language understanding engines; these process the voice and return a result and commands to Decentraland
  • Based on the command Decentraland receives from the platform, the voice bot continues the conversation with the user and carries out any necessary transactions
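
The steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's implementation: `BotReply`, `Platform`, `EchoPlatform`, and `handle_utterance` are hypothetical names, and the platform here is a stand-in that treats the incoming bytes as already-transcribed text rather than calling a real speech engine.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class BotReply:
    """What the middleware hands back to the scene (hypothetical shape)."""
    text: str                                           # reply to be spoken to the user
    commands: List[str] = field(default_factory=list)   # scene actions, e.g. "wave"


class Platform(Protocol):
    """Any external engine offering speech-to-text plus language understanding."""

    def process(self, audio: bytes) -> BotReply: ...


class EchoPlatform:
    """Stand-in engine: assumes the audio bytes are UTF-8 text (not real audio)."""

    def process(self, audio: bytes) -> BotReply:
        heard = audio.decode("utf-8").strip().lower()
        if heard == "goodbye":
            # A command tells the scene what to do besides speaking.
            return BotReply("Goodbye!", ["end_conversation"])
        return BotReply(f"You said: {heard}")


def handle_utterance(audio: bytes, platform: Platform) -> BotReply:
    """Middleware step: forward scene audio to a platform and return its reply."""
    return platform.process(audio)
```

In this sketch the scene would speak `reply.text` and act on each entry in `reply.commands`; swapping `EchoPlatform` for a real connector would not change the middleware function.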

Specification

We will create a new Voice Bot Module for the Decentraland Kernel and provide access to it from the ECS, either directly or via the Content Server. A third-party server could be configured, for example, in the definition of an NPC or any other object in the scene. We are planning to extend the Entity class to do that.

We are also going to create an RTC Server for handling user requests, with integration modules for external services: Google Dialogflow, Microsoft Azure Bot, Amazon Lex, and Voctiv. The modules for Dialogflow, Azure Bot, and Amazon Lex will be published on GitHub, so users can build a voice bot with these services themselves.

RTC Server and connector modules will be implemented in Python. The modified Decentraland diagram will look like the following (blocks in green to be delivered by the team):

Scope

  • Deliver audio to the user and send the user's audio back to the external server
  • Provide interactive actions for Entities on a scene such as
    • Wave their hand
    • Create server-side events
    • Implement event processing on a scene
  • Provide service for server-side bot implementation
  • Create connectors for 3rd party services: Dialogflow, Azure Bot, Amazon Lex
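
To make the "server-side events" item more concrete, here is a minimal sketch of how the server might tell a scene entity to perform an action. The message shape is an assumption for illustration: `SceneEvent`, its fields, and the JSON encoding are hypothetical and not part of any existing Decentraland API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SceneEvent:
    """A server-side event addressed to one entity (hypothetical format)."""
    entity_id: str   # which entity in the scene should react
    action: str      # what it should do, e.g. "wave"


def encode_event(event: SceneEvent) -> str:
    """Serialize an event to JSON for sending to the scene."""
    return json.dumps(asdict(event), sort_keys=True)


def decode_event(raw: str) -> SceneEvent:
    """Parse a JSON message back into an event on the receiving side."""
    data = json.loads(raw)
    return SceneEvent(entity_id=data["entity_id"], action=data["action"])
```

The scene-side event processing would then map `action` values to animations or other behavior of the target entity.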

Deliverables

  • Pull request with Kernel and ECS modifications
  • RTC Server on GitHub
  • Connector modules to 3rd party services on GitHub
  • Documentation

Personnel

The project team consists of three experienced developers: Ilya Ryzhov (Frontend), Ilya Spirin (Backend), and Andrey Sidorov (Frontend).

The team members have worked together for 3 years and have implemented enterprise-scale Conversational AI projects for big telecom customers, call center automation for banks, and customer support automation for major enterprises and medium companies.

Technical stack for the project:
Python, aiortc, aiohttp, FastAPI, FFmpeg, gRPC, TypeScript, WebRTC, WebSocket, Docker.

Besides that, our team is experienced in the following: TypeScript, JS, Angular, React, Capacitor, Ionic, Cordova, NestJS, Postgres, MongoDB, RabbitMQ, Docker, Web3, WebRTC, WebGL, Webpack, Docker Compose, Nx monorepo, Python, C/C++, SQL/NoSQL databases, VoIP technologies, CI/CD.

Team profiles:

  • Ilya Ryzhov, Senior Frontend developer with experience in WebRTC applications
  • Andrey Sidorov, Frontend developer
  • Ilya Spirin, Backend developer with experience in VoIP technologies, Solution Architect at Voctiv

Roadmap and milestones

Milestone One

  • Build a test scene with objects the user can talk with
    • Build scene
    • Create an RTC Server prototype to make calls with
    • Create Voctiv connector

Milestone Two

  • Collect feedback from the development community
  • Provide interactive actions for NPC
    • Frontend processing
    • Delivering to frontend; server-side decision making

Milestone Three

  • Collect feedback from community members
  • Provide a service for server-side bot implementation
    • RTC Server GitHub repo
  • Provide documentation

Milestone Four

  • Provide new 3rd party integration modules
    • Google Dialogflow
    • Microsoft Azure Bot
    • Amazon Lex
  • Testing
  • Provide documentation

Timeline

Full project duration is 28 weeks with some blocks being developed in parallel:

  • Research of architecture / modules to use / update (4 weeks)
  • RTC Server implementation (6 weeks)
  • Frontend RTC integration (9 weeks)
  • Google Dialogflow, Azure Bot and Amazon Lex connectors implementation (9 weeks)
  • Testing (3 weeks)
  • Documentation (3 weeks)
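
As a quick check on the figures above: the listed blocks add up to more than the stated 28 weeks, which is consistent with some of them running in parallel. The short calculation below uses only the numbers from the list; the variable names are illustrative.

```python
# Durations in weeks, copied from the timeline list above.
blocks_weeks = {
    "research": 4,
    "rtc_server": 6,
    "frontend_rtc_integration": 9,
    "connectors": 9,
    "testing": 3,
    "documentation": 3,
}

stated_duration = 28                              # full project duration from the proposal
total_sequential = sum(blocks_weeks.values())     # if every block ran back to back
overlap = total_sequential - stated_duration      # weeks that must run in parallel

print(total_sequential, overlap)
```

Run back to back, the blocks would take 34 weeks, so at least 6 weeks of work have to overlap to fit the 28-week plan.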

Vote on this proposal on the Decentraland DAO

View this proposal on Snapshot

Do we really want to introduce proprietary and centralized technologies in the kernel and ECS?

That’s a “disrespectful” question, and a leading one. If you are genuinely concerned, maybe you could ask “nicely” and in an unbiased way. Don’t worry, I am told that all the time too.

Hi, member of the developers’ team here.

Let me add a bit to our documentation.

Only open-source code will be added to the kernel. It allows a Decentraland scene to establish a connection to the WebRTC server via an open protocol.

The WebRTC server is also open source, so anyone can build anything on top of that code.

Connectors to Google, Amazon, and all other platforms are optional, but we want to empower content creators and give them easy access to integration with the most popular speech engines.

Let me know if there are any other questions.

I don’t think there are many other games implementing voice commands. I think it’s a cool and innovative idea to start exploring.

But will the Foundation accept a pull request? AFAIK the Foundation is still the owner of the code.
Any alternative? With a browser extension, maybe?

Hey everyone,

As a member of the team that made this project, I’m biased, but still, we have a lot of ideas for implementing Voice AI for Decentraland. For example, GPT-3 can also be integrated using our connector, so there are a lot of cool things that can be done here.

Please play with our working prototype (links and instructions are in the grant description) to have an idea about the project.

If you voted “No”, please share your thoughts on that: what is wrong with our project? What can we change or add to bring value to the community?

Thanks!

I voted no because I personally do not think this is something that DCL needs at the moment. I think there are accessibility needs that should be addressed before voice commands. I also have a huge concern with centralized software mixing with voice recognition/recordings. There are already a lot of ethical issues going on with anything related to voice comms on the internet, and inviting those into web3 would, I feel, be taking steps back. If this was done without the need for Google, Amazon, etc., then I think it would be cool, but again I don’t think the timing is super warranted. This seems like it would be a feature included with any VR integrations/development.

I like the idea of voice features but I don’t think we are there yet. I agree with Nikki.

Development of Voice AI Gateway for Google, Microsoft, and Amazon Engines

This proposal is now in status: REJECTED.

Voting Results:

  • Yes 1% 71 VP (65 votes)
  • No 99% 6,704,899 VP (157 votes)

There are a lot of open-source ways to do it (speech-to-text and vice versa, and chatbots).
I’m sure this software will improve in the coming months (especially open-source chatbots).
The option to capture the player’s voice from Decentraland is interesting; the other things I made for the Lucid Dreams game jam in a week of work.
