Apple filing takes Podcasts to the next level
A recently published filing discovered by AppleInsider reveals work by Apple's chief software architect to advance the Podcast beyond its static form and into a live interactive presentation medium suitable for use by educational institutions and businesses for their daily presentations.
"Podcasts of classroom lectures and other presentations typically require manual editing to switch the focus between the video feed of [an] instructor and the slides (or other contents) being presented," Bertrand Serlet, Senior Vice President of Software Engineering at Apple, wrote in the 15-page filing. "In a school or enterprise where many presentations take place daily, editing podcasts require a dedicated person, which can be prohibitive."
To solve this problem, Serlet has proposed an automated content capture and processing system in which a live camera feed of a presenter is automatically merged with a Keynote or PowerPoint presentation to form an entertaining and dynamic podcast that lets the viewer watch the presenter's slides as well as the presenter.
In one example outlined in the filing, the content capture system provides a video stream (Stream A) and a Keynote presentation stream (Stream B) to a recording agent such as a Mac running specialized Podcast creation software. The recording agent then blends the two feeds together based on certain cues and sends the combined feed to a syndication server that would then distribute the video wirelessly as a Podcast to any number of authorized Macs, iPods or iPhones.
Serlet also explained that the syndication server could include an automated content creation application that applies one or more operations on Streams A and/or B to create new content, such as transitions, effects, titles, graphics, audio, narration, avatars, animations, and so forth.
"For example, a content stream (e.g., Stream B) output by the application can be shown as background (e.g., full screen mode) with a small picture in picture (PIP) window overlying the background for showing the video camera output (e.g., Stream A)," he wrote. "If a slide in Stream B does not change (e.g., the "trigger event") for a predetermined interval of time (e.g., 15 seconds), then Stream A can be operated on (e.g., scaled to full screen on the display). A virtual zoom (e.g., Ken Burns effect) or other effect can be applied to Stream A for a close-up of the instructor or other object (e.g., an audience member) in the environment (e.g., a classroom, lecture hall, studio)."
The Apple executive also explained that trigger events can be captured from the actual presentation environment using, for example, the capture system, including patterns of activity of the instructor giving a presentation and/or of the reaction of an audience watching the presentation.
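The idle-slide trigger quoted above (Stream B unchanged for about 15 seconds, so Stream A is scaled to full screen) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not code from the filing: the function name `layout_at`, the layout labels, and the convention that the presentation starts at time 0 are all hypothetical.

```python
# Hypothetical layout labels; the filing does not name them.
PIP = "slides_fullscreen_with_camera_pip"   # Stream B background, Stream A in a PIP window
CAMERA_FULL = "camera_fullscreen"           # Stream A scaled to full screen


def layout_at(t, slide_change_times, idle_timeout=15.0):
    """Return the layout to show at time t (in seconds).

    slide_change_times lists the moments at which the slide in Stream B
    changed. If no change has happened yet, the presentation is assumed
    to have started at time 0.
    """
    past_changes = [c for c in slide_change_times if c <= t]
    last_change = max(past_changes) if past_changes else 0.0
    # Trigger event: the slide has not changed for idle_timeout seconds.
    if t - last_change >= idle_timeout:
        return CAMERA_FULL
    return PIP
```

With slide changes at 0 and 30 seconds, the sketch shows the slides with a camera inset at 10 seconds, cuts to the full-screen camera at 20 seconds, and returns to the slides at 31 seconds when the new slide appears.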
"The instructor could make certain gestures, or movements (e.g., captured by the video camera), speak certain words, commands or phrases (e.g., captured by a microphone as an audio snippet) or take long pauses before speaking, all of which can generate events in Stream A that can be used to trigger operations," he wrote.
"In one exemplary scenario, the video of the instructor could be shown in full screen as a default. But if the capture system detects that the instructor has turned his back to the audience to read a slide of the presentation, such action can be detected in the video stream and used to apply one or more operations on Stream A or Stream B, including zooming Stream B so that the slide being read by the instructor is presented to the viewer in full screen."
Throughout the filing, Serlet outlined examples of several other potential trigger events, such as the movement of a presentation pointer (e.g., a laser pointer), which could then be captured and detected as an event by an "event detector." For instance, the direction of the laser pointer to a slide can indicate that the instructor is talking about a particular area of the slide. Therefore, in one implementation, an operation can be to show the slide to the viewer.
"The movement of a laser pointer can be detected in the video stream using AVSR software or other known pattern matching algorithms that can isolate the laser's red dot on a pixel device and track its motion (e.g., centroiding)," he added. "If a red dot is detected, then slides can be switched or other operations performed on the video or application streams. Alternatively, a laser pointer can emit a signal (e.g., radio frequency, infrared) when activated that can be received by a suitable receiver (e.g., a wireless transceiver) in the capture system and used to initiate one or more operations."
In some other implementations, a detection of a change of state in a stream is used to determine what is captured from the stream and presented in the final media file or podcast. For instance, the instructor's transition to a new slide can cause a switch back from a camera feed of the instructor to a slide. When a new slide is presented by the instructor, the application stream containing the slide would be shown first as a default configuration, and then switched to the video stream showing the instructor after a first predetermined period of time has expired. In other implementations, after a second predetermined interval of time has expired, the streams can be switched back to the default configuration.
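The timed switching described here can be pictured as a small function of the time elapsed since the last slide change. The Python sketch below is an assumption-laden illustration rather than anything taken from the filing: the name `active_stream` is hypothetical, and it assumes the second interval is counted from the moment the feed switches to the camera, which the description leaves open.

```python
def active_stream(elapsed, first_interval, second_interval=None):
    """Pick the stream to show, given seconds elapsed since the last slide change.

    - Before first_interval expires: the slide (default configuration).
    - After first_interval: the camera feed of the instructor.
    - If second_interval is given, switch back to the slide once it has
      also expired (assumed here to be measured from the camera switch).
    """
    if elapsed < first_interval:
        return "slides"
    if second_interval is not None and elapsed >= first_interval + second_interval:
        return "slides"
    return "camera"
```

For example, with a first interval of 5 seconds and a second of 10, a viewer would see the new slide for 5 seconds, the instructor for the next 10, and then the slide again.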
Taking his next-generation podcast concept a step further, Serlet went on to say that the capture system could conceivably include a video camera that can follow the instructor as he moves about the environment. The camera could be moved by a human operator or automatically using known location detection technology. The camera location information could then be used to trigger an operation on a stream and/or determine what is captured and presented in the final media file or podcast.
It should be noted that Serlet's concept is one of at least three Podcast enhancements proposed by Apple employees in recent patent filings, none of which have come to fruition as of yet. Others include personalized on-demand podcasts and Podmaps.
Comments
Could make slide presentations more interesting by allowing more on-the-fly interactivity. Sounds like a patent that Steve suggested...
It's very similar to Steve's setups for Keynote presentations at Macworld, WWDC, etc.
It's interesting.
K
The software that the patent describes seems like it should be copyrightable material. I could certainly understand that. It would be an original work (or a derivative of, say, FCP, Podcast Producer and Keynote).
I understand that Apple has to patent this, or else someone else will. Years will go by, and a faceless "intellectual property rights" firm will file a suit in the Eastern District of TX claiming that Apple is "willfully and purposefully" infringing on their patents and causing them irreparable damage. You know, we've seen the script before.
Ah, just some ramblings against the current state of patent law.
Color me crazy, but this looks a lot like the patent suits that Apple has been hit with lately. Mainly, that the concept is not really patent-worthy. They are taking existing technologies (broadcasting, video streaming and podcasting) and repackaging them in a shiny new product. There is just nothing that is intrinsically new.
The software that the patent describes seems like it should be copyrightable material. I could certainly understand that. It would be an original work (or a derivative of, say, FCP, Podcast Producer and Keynote).
I understand that Apple has to patent this, or else someone else will. Years will go by, and a faceless "intellectual property rights" firm will file a suit in the Eastern District of TX claiming that Apple is "willfully and purposefully" infringing on their patents and causing them irreparable damage. You know, we've seen the script before.
Ah, just some ramblings against the current state of patent law.
I would echo the comments here. You mean to tell me that a service like Pandora, which incorporates things like ratings, bookmarking, buying songs, etc. into the player, isn't prior art? So much of this stuff is derivative. I personally posted similar thoughts and ideas regarding next generation media player plays many months ago, thoughts which of course are influenced by what lots of different vendors are doing.
The whole patent thing has gotten out of hand, IMHO, and specific to Apple it raises some questions as to whether these types of filings are offensive moves or defensive ones, something I blogged about in:
Upward Mobility, Land Grabs and the iPhone Universe
http://thenetworkgarden.com/weblog/2...-mobility.html
Check out the post if interested.
Mark
Doesn't the ScreenFlow product already do most of this? It's easy to capture video of you and what is going on on your screen, switch back and forth, show both at once, zoom in, call out areas, etc.
ScreenFlow is an amazing app! The fact that it uses CoreAnimation so well means that I can have multiple QT videos playing, record myself, use Exposé and not have one hiccup on my MacBook. The editing it allows is simple yet robust.
I've recently made some simple tutorials for a switcher I know. Mostly for fun, but the 1 minute videos only take about 5 minutes to edit, and that includes using the fancy angled screen display like in Apple keynotes, reflections, expanded view around the mouse, etc.
PS: You can check out a video of it in action in the link above.
Wow, look at the analog volume slider on that iPod!
Not to mention the mainframe needed to run the Syndication Server!
But there's a far more serious problem with podcasting as it is now, particularly in an education context. Podcasting is like newscasting. It assumes an audience that's waiting for each episode as it comes out. That is often not true.
Currently, if someone wants to study a topic for which there's an extensive series of podcasts, say Japanese101, they have to manage everything by hand, starting months back, carefully downloading a few episodes, listening to them, and then going back for a few more, going to a great deal of effort not to lose track of where they are.
That is poor. Podcasting needs an 'education mode' that would let users select when they start and how many episodes a week to download. Interested in learning Japanese but already knowing a little, I could choose to start Japanese101 with Episode 20 and have two episodes a week downloaded through iTunes to my iPod without my having to klutz with anything.
Do that, and podcasting becomes a real educational medium.
Wow, look at the analog volume slider on that iPod!
Wow! - I wonder if the Macintosh LC 550 and the PowerBook 540 are included?
And for my next patent, the coffee maker will make coffee when "certain gestures, or movements" are "used to trigger [the] operation".
http://doi.ieeecomputersociety.org/1...R.2004.1334682
http://doi.ieeecomputersociety.org/1...R.2002.1004189
http://www.research.ibm.com/journal/sj/384/abowd.html
and that's with just a basic search.
Color me crazy, but this looks a lot like the patent suits that Apple has been hit with lately. Mainly, that the concept is not really patent-worthy. They are taking existing technologies (broadcasting, video streaming and podcasting) and repackaging them in a shiny new product. There is just nothing that is intrinsically new.
The software that the patent describes seems like it should be copyrightable material. I could certainly understand that. It would be an original work (or a derivative of, say, FCP, Podcast Producer and Keynote).
I understand that Apple has to patent this, or else someone else will. Years will go by, and a faceless "intellectual property rights" firm will file a suit in the Eastern District of TX claiming that Apple is "willfully and purposefully" infringing on their patents and causing them irreparable damage. You know, we've seen the script before.
Ah, just some ramblings against the current state of patent law.
If it's patentable, they should patent it. Better than facing lawsuits that could force them into unfavorable licensing deals or get stuck with judgments that could cost them millions. They are duty-bound to protect and increase AAPL stock for the shareholders.
Color me crazy, but this looks a lot like the patent suits that Apple has been hit with lately. Mainly, that the concept is not really patent-worthy. They are taking existing technologies (broadcasting, video streaming and podcasting) and repackaging them in a shiny new product. There is just nothing that is intrinsically new.
The software that the patent describes seems like it should be copyrightable material. I could certainly understand that. It would be an original work (or a derivative of, say, FCP, Podcast Producer and Keynote).
I understand that Apple has to patent this, or else someone else will. Years will go by, and a faceless "intellectual property rights" firm will file a suit in the Eastern District of TX claiming that Apple is "willfully and purposefully" infringing on their patents and causing them irreparable damage. You know, we've seen the script before.
Ah, just some ramblings against the current state of patent law.
Knowledge of patent law is helpful here.
It is perfectly all right to combine the patents of others in a way that none of them singly would have been used, if that results in a product, or service, that is unique, when compared to those other patents, as this appears to be.
This may, or may not, incorporate other patented work. Are you saying that you have knowledge that it does?
Or are you just guessing?
I would echo the comments here. You mean to tell me that a service like Pandora, which incorporates things like ratings, bookmarking, buying songs, etc. into the player, isn't prior art? So much of this stuff is derivative. I personally posted similar thoughts and ideas regarding next generation media player plays many months ago, thoughts which of course are influenced by what lots of different vendors are doing.
The whole patent thing has gotten out of hand, IMHO, and specific to Apple it raises some questions as to whether these types of filings are offensive moves or defensive ones, something I blogged about in:
Upward Mobility, Land Grabs and the iPhone Universe
http://thenetworkgarden.com/weblog/2...-mobility.html
Check out the post if interested.
Mark
Can you illustrate what you are saying?
Simply because a cursory look at something SEEMS, to the unsophisticated in those areas, to be similar, doesn't mean that they are.
There is often more than one way to do something. It's rarely the end result that is patentable, but the way of getting that result.
I don't know enough about the way Pandora does what it does, or for that matter, exactly what it does.
You are saying that what Apple is proposing here is what Pandora does, in the way it does it?
It's a good idea--get around the problem of needing someone to do camera cuts by providing both feeds, so the viewers can choose for themselves. Whether it qualifies for patent protection is another story. DVD players already permit viewers to choose a camera angle, which would seem to be prior art.
There is no similarity. Camera angles on DVD are recorded, and edited in advance. You are merely choosing between one recording, as in a chapter, and another. This is a live interaction, which is entirely different.
But there's a far more serious problem with podcasting as it is now, particularly in an education context. Podcasting is like newscasting. It assumes an audience that's waiting for each episode as it comes out. That is often not true.
Currently, if someone wants to study a topic for which there's an extensive series of podcasts, say Japanese101, they have to manage everything by hand, starting months back, carefully downloading a few episodes, listening to them, and then going back for a few more, going to a great deal of effort not to lose track of where they are.
That is poor. Podcasting needs an 'education mode' that would let users select when they start and how many episodes a week to download. Interested in learning Japanese but already knowing a little, I could choose to start Japanese101 with Episode 20 and have two episodes a week downloaded through iTunes to my iPod without my having to klutz with anything.
Do that, and podcasting becomes a real educational medium.
Again, this is for a live presentation, not a recorded, static lecture.
Another one of these junk patents where Apple appears to do a lot of hand-waving (an especially appropriate description given some of their description, e.g. "the instructor could make certain gestures, or movements [...] that can be used to trigger operations") about something they apparently have no technology to back up.
And for my next patent, the coffee maker will make coffee when "certain gestures, or movements" are "used to trigger [the] operation".
Can you show why it's a "junk" patent application? Other than your overall disdain for Apple's patents, that is.