Apple looks to make Group FaceTime audio calls more realistic

Posted in General Discussion, edited December 2018
Audio-only group FaceTime calls with multiple people could be easier to deal with in the future, as Apple has suggested a way to process audio to implement the "cocktail party" effect, to make it easier to determine who is speaking.

Beats Studio 3 Skyline Wireless headphones


Conference calls have made it easier to communicate ideas with many other people at the same time, and they have become an essential tool for business. While handy, such calls can present problems of their own, caused simply by having too many people taking part.

With multiple people in a call, it can be difficult to determine who is speaking at any one time. While video calls offer visual cues to identify the current speaker, audio-only calls do not have that luxury, leaving it up to the user to recognize the voice.

A patent application from Apple published by the U.S. Patent and Trademark Office on Thursday for "Intelligent augmented audio conference calling using headphones" suggests how stereo headphones and software could be used to separate out the participants.

To solve the problem, Apple proposes splitting up each separate channel from individual callers in a multi-party call. These channels are then fed into a system to simulate a virtual audio environment or "room" with appropriate acoustic properties for the listener's local environment, to make the call seem more like it is taking place in the actual room.

Apple patent application illustrating placement of multiple callers in a "virtual room" relative to the user


The remote callers are spaced apart in the virtual room, with the audio feeds for each participant adjusted so as to give the effect of the remote user's voice coming from different areas of the real room, such as around a conference table. Using a head-tracking system with the stereo headphones, the relative position of each remote user can be maintained while the user moves their head, with audio properties changing to match the user's orientation.
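The filing does not spell out an implementation, but the general technique it describes can be sketched simply. In this illustrative Python example (the function names, angles, and equal-power pan law are my own assumptions, not Apple's method), each caller is assigned a fixed azimuth in the virtual room, and the tracked head yaw is subtracted from that azimuth before panning, so each voice stays anchored in place as the listener turns their head:

```python
import math

def stereo_gains(source_az_deg, head_yaw_deg):
    """Equal-power pan gains for a source at source_az_deg (0 = straight
    ahead, positive = to the listener's right), compensated for head yaw
    so the voice stays fixed in the virtual room as the head turns."""
    relative = source_az_deg - head_yaw_deg          # head-relative angle
    # Clamp to the frontal +/-90 degree arc and map to a pan position
    pan = max(-90.0, min(90.0, relative)) / 90.0     # -1 (left) .. +1 (right)
    theta = (pan + 1.0) * math.pi / 4.0              # 0 .. pi/2
    return math.cos(theta), math.sin(theta)          # (left, right) gains

# Three hypothetical callers spaced around a virtual conference table
callers = {"Alice": -60.0, "Bob": 0.0, "Carol": 60.0}

def render(samples_by_caller, head_yaw_deg):
    """Mix mono per-caller samples into one stereo frame."""
    left = right = 0.0
    for name, sample in samples_by_caller.items():
        gl, gr = stereo_gains(callers[name], head_yaw_deg)
        left += gl * sample
        right += gr * sample
    return left, right
```

A real system would use full binaural rendering (head-related transfer functions and room reverberation) rather than simple panning, but the head-yaw compensation step works the same way.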

By using a spatial audio rendering system, Apple suggests the call takes advantage of the "cocktail party mechanism" the listener's brain uses to segregate audio sources. In short, by placing voices in particular positions, the system makes the conference call easier for the listener to follow.

It is suggested the system could also use metadata to intelligently cluster participants together, such as by company, frequency of speech, and geographic location, among other details. The virtual direction and distance of each caller could be further controlled by the user depending on their preferences, and there is even the possibility of "moving" participants around during the call, as in the case of one presenter swapping positions with another to lead the call.
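As a rough sketch of what metadata-driven clustering might look like (the grouping key, names, and spacing here are purely hypothetical illustrations, not anything from the filing), callers sharing a company could be kept adjacent while the whole group is spread evenly across a frontal arc:

```python
from collections import defaultdict

def assign_azimuths(participants, arc_deg=150.0):
    """Group callers by a metadata key (here, company) so colleagues sit
    next to each other, then spread everyone evenly across a frontal arc.
    participants: list of (name, company). Returns {name: azimuth_degrees}."""
    clusters = defaultdict(list)
    for name, company in participants:
        clusters[company].append(name)

    # Flatten cluster by cluster so same-company callers stay adjacent
    names = [n for company in sorted(clusters) for n in clusters[company]]
    n = len(names)
    placements = {}
    for i, name in enumerate(names):
        # Even spacing from -arc/2 to +arc/2; a lone caller sits dead ahead
        az = -arc_deg / 2 + arc_deg * i / (n - 1) if n > 1 else 0.0
        placements[name] = az
    return placements
```

The same placement table could then be handed to whatever spatial renderer produces the per-ear audio, and re-computed if the user "moves" a participant.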

A block diagram showing how the audio feeds would be processed


Apple files a number of patent applications every week, and while the publication by the USPTO may indicate areas of interest to the company, it isn't a guarantee the described concepts will make their way into future consumer products.

In this case, there is a fairly good chance some form of the idea could work. The virtual "room" and audio alterations can certainly be performed on mobile devices and fed through stereo headphones, and while head-tracking headphones are not currently available, the system could feasibly work using other technologies, such as an iPhone's FaceTime camera detecting changes in the direction the user is facing.

This is far from the only patent or application Apple has come up with relating to audio and headphones. In October and November, Apple suggested ways to detect how headphones are worn using capacitive proximity sensors and a microphone array.

There are also filings relating to "spatial headphone transparency" to make an audio feed seem like it is coming from a user's surroundings instead of headphones, a dual-mode headphone that could double as a speaker, and headwear that could be used for health monitoring.

Comments

  • Reply 1 of 8
genovelle Posts: 1,480 member
    This technology could also be extended to a HomePod to give the same effect without the need for headphones. 
  • Reply 2 of 8
That's funny, Dolby Voice (with BlueJeans now) has been doing this for years.
    edited December 2018
  • Reply 3 of 8
That's cool. Who knew our brains had a "cocktail party mechanism."
  • Reply 4 of 8
    apple_evo said:
    That's funny, Dolby Voice (with blue jeans now) has been doing this for years.
The patent title* specifically refers to headphones, so perhaps that's the difference. Or Apple is presenting a different method for providing a similar effect to the Dolby version?

    *"INTELLIGENT AUGMENTED AUDIO CONFERENCE CALLING USING HEADPHONES"
  • Reply 5 of 8
dewme Posts: 5,361 member
    This sounds like interesting technology that could be applied to problems beyond group audio conferences. The “who’s talking” problem is already solved in applications that provide a visual list of participants. It seems like an audio-only system that includes a passive intelligent assistant could be designed to whisper hints and clues into the headset you are wearing to tell you who is speaking, e.g., “Bob is now speaking” to do much the same thing. I’ve yet to see any mention of directing audio feedback and information from an intelligent agent directly into a smart audio type of device. By “passive” I mean an intelligent agent that is always listening and helping the wearer in a variety of ways, not a query based agent like Siri or Alexa. 
  • Reply 6 of 8
chasm Posts: 3,291 member
    apple_evo said:
    That's funny, Dolby Voice (with blue jeans now) has been doing this for years.
Yes, because as we all know, once a thing has been invented, nobody else ever tries to a) accomplish the same thing differently or more efficiently, b) invent a slight variation on the original idea that presumably offers advantages for either the creator or the user, or c) implement their own substantially different standard for achieving the same result in hopes of replacing the original.

    Never happens.
    edited December 2018
  • Reply 7 of 8
This is likely coming to AirPods / Beats, or it was cut because it is not currently economically feasible and is going to simmer for a bit.

Much like the rumored smart connector on the iPhone Pro mockups, which showed up two years later on iPad Pros with Face ID.
  • Reply 8 of 8
The US patent office filings are a wonderful source of great US ideas for foreign countries to copy. My guess is that this will be implemented in a Chinese or Korean Android phone within the year. For whatever reason, Apple is slow to move on these types of concepts and will again be labeled "no longer an innovator."