Researchers in the US have presented a paper describing a real-time activity recognition system that can interpret the sounds it picks up and could be used by home smart speakers.
Identify Other Sounds and Issue Responses
Researchers at Carnegie Mellon University in the US claim to have found a way to exploit the ubiquity of microphones in modern computing devices. Their approach uses software listening through a device's always-on, built-in microphone to identify the sounds in a room, so that smart devices can respond to what is happening around them. For example, if a smart speaker such as an Amazon Echo were equipped with the technology and could identify the sound of a tap running in the background in a home, it could issue a reminder to turn the tap off.
The research project, dubbed ‘Ubicoustics’, showed how an AI/machine-learning sound-labelling model, built using sound effects libraries, could be linked to the microphone (the listening element) of a smart device such as a smart-watch, computer, mobile device or smart speaker.
As Good As A Human
The sound-identifying machine-learning model used in the research system achieved human-level performance in both recognition accuracy and false-positive rejection. Its reported accuracy of 80.4%, which means it misclassifies roughly one sound in five, is comparable to that of a person trying to identify the same sounds.
As well as being comparable to other high-performance sound recognition systems, the Ubicoustics system has the added benefit of being able to recognise a much wider range of activities without site-specific training.
The researchers noted several possible applications of the system in conjunction with smart devices, e.g. sending a notification when a laundry load has finished, promoting public health by detecting frequent coughs or sneezes, and enabling smart-watches to prompt healthy behaviours after tracking the onset of symptoms.
The obvious worry with a system of this kind is that it could invade privacy and take eavesdropping to a new level, meaning that we could all be living in what is essentially a bugged house.
The researchers suggest two possible privacy protections: converting all live audio data into low-resolution Mel spectrograms (64 bins), which makes recovering speech sufficiently difficult, or simply running the acoustic model locally on the device so that no audio data is transmitted.
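To illustrate what "low-resolution Mel spectrograms (64 bins)" means in practice, the sketch below reduces one short frame of audio to 64 log-energy values, one per band on the Mel scale (a perceptual frequency scale that is finer at low frequencies and coarser at high ones). It is not the Ubicoustics researchers' code. It assumes a 16 kHz sample rate, a 512-sample frame, the common HTK-style Mel formula and triangular filters, and it uses a deliberately naive DFT so that it needs only the Python standard library. All function names are hypothetical.

```python
import math


def hz_to_mel(f):
    # HTK-style Mel formula (an assumption; other variants exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Return n_mels triangular filters, each a list of n_fft // 2 + 1 weights."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # n_mels + 2 points evenly spaced on the Mel scale give every triangle
    # a left edge, a peak and a right edge
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    hz_points = [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    filters = []
    for i in range(1, n_mels + 1):
        left, centre, right = hz_points[i - 1], hz_points[i], hz_points[i + 1]
        weights = []
        for f in bin_freqs:
            if left <= f <= centre and centre > left:
                weights.append((f - left) / (centre - left))  # rising edge
            elif centre < f <= right:
                weights.append((right - f) / (right - centre))  # falling edge
            else:
                weights.append(0.0)
        filters.append(weights)
    return filters


def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum of one real-valued frame (illustration only)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(re * re + im * im)
    return spectrum


def log_mel_frame(frame, sample_rate, n_mels=64):
    """Collapse one frame into n_mels log band energies."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sample_rate)
    # small constant avoids log(0) for silent bands
    return [math.log(sum(w * p for w, p in zip(filt, spec)) + 1e-10) for filt in bank]


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(512)]
    features = log_mel_frame(tone, sr)
    peak = max(range(len(features)), key=lambda i: features[i])
    print(len(features), "bands; loudest band index:", peak)
```

In this sketch, 257 fine-grained frequency values per frame are squeezed into 64 coarse band energies and the phase information is discarded. This loss of detail is the intuition behind the privacy argument: the features still describe what kind of sound is present, but reconstructing intelligible speech from them is much harder. A real system would use a fast FFT library and process a continuous stream of overlapping frames.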
What Does This Mean For Your Business?
A smart device that can recognise the sounds in a room as well as a person can, and deliver relevant responses, could be valued if it is used in a responsible, helpful and unobtrusive way. That does not change the fact that having a device with these capabilities in the home or office could represent a privacy and security risk, and has more than a whiff of ‘big brother’ about it. Indeed, the researchers recognised that people may not want sensitive, fine-grained data going to third parties, and that a device running this system without transmitting the data could gain a competitive edge in the marketplace.
Nevertheless, it could also represent new opportunities for customer service, diagnostics for home and business products and services, crime detection and prevention, targeted promotions, and a whole range of other possibilities.