Speech recognition in the browser can be implemented in about 10 lines of code with the Web Speech API.
Docs: https://developers.google.com/web/updates/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Speech-API
Simple example: https://codepen.io/renanpupin/pen/yVpBRq/
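A minimal TypeScript sketch of the browser side, assuming a browser that exposes `SpeechRecognition` or Chrome's prefixed `webkitSpeechRecognition`. The small interfaces and the `listen` helper are hypothetical names for this example; they describe only the few members used here, and the optional `scope` parameter exists only so the function can be tried with a fake recognizer outside a browser.

```typescript
// Only the members this sketch uses; declared locally because the
// Web Speech API types may not be present in TypeScript's DOM lib.
interface RecognitionEventLike {
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}

interface RecognizerLike {
  lang: string;
  interimResults: boolean;
  onresult: ((event: RecognitionEventLike) => void) | null;
  start(): void;
}

type RecognizerCtor = new () => RecognizerLike;

// Hypothetical helper: starts recognition and passes the final transcript
// to onText. Returns false if no recognizer constructor is available.
function listen(
  onText: (text: string) => void,
  lang = "en-US",
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): boolean {
  // Chrome exposes the constructor under a "webkit" prefix.
  const ctor = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  if (typeof ctor !== "function") return false;

  const recognition = new (ctor as RecognizerCtor)();
  recognition.lang = lang;
  recognition.interimResults = false; // only deliver final results
  recognition.onresult = (event) => {
    // Take the most recent result and its top-ranked alternative.
    const latest = event.results[event.results.length - 1];
    const best = latest?.[0];
    if (best) onText(best.transcript);
  };
  recognition.start();
  return true;
}
```

In a page you would typically call something like `listen((text) => console.log(text))` from a button click handler; the browser asks for microphone permission before it starts listening.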
Then you can send that data (the recognized text) to the server, which will process it and send the result back.
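Sending the recognized text to the server could be sketched as a JSON POST. The `/api/voice` endpoint, the `{ text }` body shape and the `sendTranscript` name are assumptions for illustration; `FetchLike` is a narrowed fetch signature declared here so the function can be exercised with a stub, and the browser's global `fetch` satisfies it.

```typescript
// Minimal fetch-compatible signature; the browser's global fetch fits it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: POSTs the recognized text as JSON and returns the
// server's parsed JSON reply. The default endpoint path is an assumption.
async function sendTranscript(
  text: string,
  fetchFn: FetchLike,
  endpoint = "/api/voice",
): Promise<unknown> {
  const response = await fetchFn(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!response.ok) {
    // Surface HTTP errors instead of trying to parse an error page as JSON.
    throw new Error(`Server responded with status ${response.status}`);
  }
  return response.json();
}
```

In the browser this might be used as `sendTranscript("find example text", fetch).then((result) => console.log(result));`, with the server deciding what the result contains.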
You can do AI (NLP) work with this API: https://dialogflow.com/. It tokenizes the input string for you and returns JSON information, such as the action expressed in that statement. But remember that you didn't build anything yourself; the API took care of all of it. You can also build your own NLP with TensorFlow; here is an example video.
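The idea of turning a sentence into tokens and then into a JSON "action" can be illustrated with a deliberately tiny, hand-written rule-based parser. This is not how Dialogflow or a TensorFlow model works internally; the `find_files` action name, the trigger words and the `{ action, parameters }` output shape are all hypothetical.

```typescript
// Hypothetical output shape: an action name plus string parameters.
interface ParsedCommand {
  action: string;
  parameters: Record<string, string>;
}

// Split text into lowercase word tokens (letters and digits only).
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

const FIND_VERBS = new Set(["find", "search"]);
const QUERY_MARKERS = new Set(["with", "containing"]);

// Toy rule-based parser: a "find"/"search" verb selects the find_files action,
// and the words after "with"/"containing" (minus a trailing "in it") are the query.
function parseCommand(text: string): ParsedCommand {
  const tokens = tokenize(text);
  if (!tokens.some((t) => FIND_VERBS.has(t))) {
    return { action: "unknown", parameters: {} };
  }
  const start = tokens.findIndex((t) => QUERY_MARKERS.has(t));
  let query: string[] = start >= 0 ? tokens.slice(start + 1) : [];
  if (query.slice(-2).join(" ") === "in it") query = query.slice(0, -2);
  return { action: "find_files", parameters: { query: query.join(" ") } };
}
```

For example, `JSON.stringify(parseCommand("Find all files with 'example text' in it"))` gives `{"action":"find_files","parameters":{"query":"example text"}}`. A fixed rule table like this breaks as soon as users phrase things differently, which is the gap a hosted NLP service or a trained model is meant to fill.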
You can also build your own neural network for NLP, but that is a project in its own right.
After that, do whatever you want with the result in your server code; for example, you could find all files that contain 'example text'.
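On a Node.js server, the "find all files containing that text" step could look like the recursive search below, using Node's built-in `fs` and `path` modules. `findFilesContaining` is a hypothetical name, and the sketch assumes the searched files are small UTF-8 text files: it simply reads each one and checks for a substring.

```typescript
import { readdirSync, readFileSync } from "fs";
import { join } from "path";

// Recursively collect paths of regular files under `dir` whose contents
// include `needle`. Symlinks are skipped because they are neither
// directories nor regular files in the Dirent sense.
function findFilesContaining(dir: string, needle: string): string[] {
  const matches: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      matches.push(...findFilesContaining(fullPath, needle));
    } else if (entry.isFile() && readFileSync(fullPath, "utf8").includes(needle)) {
      matches.push(fullPath);
    }
  }
  return matches;
}
```

Because the search text comes from a voice command, it is worth restricting `dir` to a fixed root on the server rather than letting the client choose which paths to scan.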
This overview is based on information given by navin.