Top Secret Features (Leopard)


Comments

  • Reply 81 of 87
    fisha Posts: 126 member
    Quote:

    On that video I could only identify a few gestures derived from that capability:

    - point (basic mouse gesture)
    - zoom (in or out)
    - rotate (in screen or out of screen (tilt))
    - anchor and move (keep one area or object stationary while moving another)





    Pretty much exactly my thoughts. What more do you need to handle the simple things? There are already common, universal methods that devices use to send commands to the OS (such as a mouse wheel for scrolling), so why not simply map a gesture to the same command?



    I don't think it would be hard to do. In fact, I reckon you could even implement simple gestures with the iSight. For example, with iChat's moving backgrounds, don't you just let it take a snapshot as a reference, and then it figures out the moving sections and paints the background image onto the non-moving sections based on that snapshot?
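
    (A rough sketch of that snapshot-and-compare idea, for the curious: per-pixel differencing against a stored reference frame. OpenCV/NumPy and every threshold here are my own assumptions for illustration, not anything iChat is known to actually do.)

    import cv2
    import numpy as np

    # Take one reference snapshot (ideally with nobody in frame), then mark any
    # pixel that differs enough from it as "moving" and keep the live image
    # there, painting a backdrop over everything that matches the snapshot.
    cap = cv2.VideoCapture(0)                    # default camera, e.g. an iSight
    ok, reference = cap.read()
    assert ok, "camera not available"
    reference = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, reference)              # where the scene changed
        _, moving = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
        moving = cv2.medianBlur(moving, 5)               # suppress sensor noise

        backdrop = np.zeros_like(frame)                  # stand-in for a fancy backdrop
        composited = np.where(moving.astype(bool)[..., None], frame, backdrop)

        cv2.imshow("composite", composited)
        if cv2.waitKey(1) == 27:                         # Esc to quit
            break

    cap.release()
    cv2.destroyAllWindows()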



    It wouldn't be too hard for it to take a few reference samples of a hand gesture or movement and recognise them. Imagine scrolling... hand up towards the top corner (not right on the screen, though), then move smoothly down to the bottom corner. Motion tracking could follow that easily. It only has to be a gesture, not an exact path, for the command to be recognised.





    ... a slightly further idea ... hand up towards the corner with a finger pointed outwards, and a little wiggle of the finger first, which identifies a gesture command (the OS could even overlay a red glowing gesture icon/pointer circle on screen to show it has picked up the gesture). Then the computer can track the original source point of the *wiggle* and see what the gesture is.
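
    (Sketch of how the wiggle trigger itself could be spotted: accumulate frame-to-frame change over a short history and look for the small patch that keeps changing while the rest of the scene stays still. Grid size, history length and threshold are invented numbers, and OpenCV/NumPy are assumed.)

    import cv2
    import numpy as np
    from collections import deque

    HISTORY = 5       # how many recent frames of change to accumulate
    BLOCK = 32        # coarse grid cell size, in camera pixels
    TRIGGER = 12.0    # mean per-frame change a cell needs to count as a wiggle

    cap = cv2.VideoCapture(0)
    recent = deque(maxlen=HISTORY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)

        if len(recent) == HISTORY:
            # total absolute change against each of the last few frames
            motion = sum(cv2.absdiff(gray, old) for old in recent)
            h, w = motion.shape
            best_score, best_cell = 0.0, None
            for y in range(0, h - BLOCK, BLOCK):
                for x in range(0, w - BLOCK, BLOCK):
                    score = motion[y:y + BLOCK, x:x + BLOCK].mean() / HISTORY
                    if score > best_score:
                        best_score, best_cell = score, (x + BLOCK // 2, y + BLOCK // 2)
            if best_cell and best_score > TRIGGER:
                # this is where the glowing red indicator would be overlaid
                print("wiggle detected around", best_cell)

        recent.append(gray)
        cv2.imshow("camera", frame)
        if cv2.waitKey(1) == 27:
            break

    cap.release()
    cv2.destroyAllWindows()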



    Then you could have two-hand gestures for zooming and rotating... 2 fingers, 2 wiggles, 2 red circles, 2 movements... 1 implemented action.
  • Reply 82 of 87
    meelash Posts: 1,045 member
    Quote:
    Originally Posted by fisha


    Pretty much exactly my thoughts. What more do you need to handle the simple things? There are already common, universal methods that devices use to send commands to the OS (such as a mouse wheel for scrolling), so why not simply map a gesture to the same command?



    I don't think it would be hard to do. In fact, I reckon you could even implement simple gestures with the iSight. For example, with iChat's moving backgrounds, don't you just let it take a snapshot as a reference, and then it figures out the moving sections and paints the background image onto the non-moving sections based on that snapshot?



    It wouldn't be too hard for it to take a few reference samples of a hand gesture or movement and recognise them. Imagine scrolling... hand up towards the top corner (not right on the screen, though), then move smoothly down to the bottom corner. Motion tracking could follow that easily. It only has to be a gesture, not an exact path, for the command to be recognised.





    ... a slightly further idea ... hand up towards the corner with a finger pointed outwards, and a little wiggle of the finger first, which identifies a gesture command (the OS could even overlay a red glowing gesture icon/pointer circle on screen to show it has picked up the gesture). Then the computer can track the original source point of the *wiggle* and see what the gesture is.



    Then you could have two-hand gestures for zooming and rotating... 2 fingers, 2 wiggles, 2 red circles, 2 movements... 1 implemented action.



    It's not really that easy to implement this using a camera. You are thinking it's simple because it comes so naturally to a human or animal, but we actually have very complicated "computers" and two eyes that enable us to perform gesture recognition and generally pick out and isolate solid objects in three dimensions. If Apple were able to easily implement this kind of recognition using only a single camera, it would be a huge deal in the robotics field...
  • Reply 83 of 87
    fisha Posts: 126 member
    Quote:
    Originally Posted by meelash


    It's not really that easy to implement this using a camera. You are thinking it's simple because it comes so naturally to a human or animal, but we actually have very complicated "computers" and two eyes that enable us to perform gesture recognition and generally pick out and isolate solid objects in three dimensions. If Apple were able to easily implement this kind of recognition using only a single camera, it would be a huge deal in the robotics field...





    It's been perfectly possible with the EyeToy device for the PS2. A very simple EyeToy example was one where all you had to do was stand in front of the camera on top of the TV. The screen showed a very faint overlay of what the camera saw (i.e. you), with loads of bubbles floating around on top. All you had to do was wiggle your finger beside a bubble and it'd pop and make a little sound. This worked in full-screen mode.
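
    (For what it's worth, that bubble demo boils down to checking how much the image changes inside each bubble's circle between frames. A guess at the mechanism, not the actual EyeToy code; the bubble placement and pop threshold are made up, and OpenCV/NumPy are assumed.)

    import cv2
    import numpy as np

    POP_THRESHOLD = 20.0      # mean per-pixel change inside a bubble that pops it
    RADIUS = 40

    cap = cv2.VideoCapture(0)
    ok, prev = cap.read()
    assert ok, "camera not available"
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    # place a few bubbles at fixed fractions of the frame
    h, w = prev.shape
    bubbles = [(int(w * fx), int(h * fy)) for fx, fy in ((0.2, 0.3), (0.5, 0.5), (0.8, 0.4))]

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev)          # frame-to-frame change
        prev = gray

        for cx, cy in bubbles:
            mask = np.zeros_like(diff)
            cv2.circle(mask, (cx, cy), RADIUS, 255, -1)            # filled circle mask
            if diff[mask == 255].mean() > POP_THRESHOLD:
                print("pop! bubble at", (cx, cy))                  # play sound, respawn, etc.
            cv2.circle(frame, (cx, cy), RADIUS, (255, 200, 0), 2)  # the faint overlay look

        cv2.imshow("bubbles", frame)
        if cv2.waitKey(1) == 27:
            break

    cap.release()
    cv2.destroyAllWindows()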



    Your example of having to co-ordinate onto a 3D point is too extreme and too accurate. All the camera has to do is identify a movement area which is changing rapidly in relation to its surroundings, the wiggle, which is fairly simple to implement in terms of image comparison. (iChat already does something similar with its changeable backgrounds: it's comparing moving sections against relatively stationary sections.)



    Then once it's identified a movement area, it just has to do a very basic track of the direction of movement. The area being tracked (the hand) won't visually change too much from the perspective of the camera, so with simple template matching and motion tracking, along with a small set of known up, down, left and right motion tracks to choose from, it should be easy to guess the gesture.
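
    (Classifying the track really is the easy bit. Given a list of centre points from a tracker like the sketches above, the overall displacement already tells you which of the four gestures it was; the threshold below is arbitrary.)

    def classify_gesture(track, min_travel=80):
        """Return 'left', 'right', 'up', 'down' or None for a list of (x, y) points."""
        if len(track) < 2:
            return None
        dx = track[-1][0] - track[0][0]
        dy = track[-1][1] - track[0][1]
        if max(abs(dx), abs(dy)) < min_travel:
            return None                        # too small to count as a gesture
        if abs(dx) > abs(dy):                  # mostly horizontal movement
            return "right" if dx > 0 else "left"
        return "down" if dy > 0 else "up"      # image y increases downwards

    # e.g. a hand swept from near the top corner down towards the bottom:
    print(classify_gesture([(420, 60), (400, 150), (395, 260), (390, 380)]))   # -> down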



    Accurate tracking for super-precise scrolling may be much more difficult, but for simple gestures like flicking between images in a slideshow or going back and forward between pages in a web browser, it should not be that hard.
  • Reply 84 of 87
    kickaha Posts: 8,760 member
    Quote:
    Originally Posted by meelash


    It's not really that easy to implement this using a camera.



    Actually, it took me 45 minutes to write a prototype using QuickTime back in 2002...



    http://www.cs.unc.edu/~smithja/facetop



    Gestures hooked into Cocoa, 'mouse' events, the whole bit. Surpasses the EyeToy by quite a bit, and predated it.



    fisha is right, you don't have to do any 3D mapping - that's the beauty of this system... it completely eliminates the need for expensive 3D spatial recognition or registration hardware. Instead, it uses what's between our ears to do all the heavy lifting.
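
    (Facetop did this through Cocoa/QuickTime; purely as a guess at how you might wire a tracker into 'mouse' events on a current system, the Quartz event APIs via pyobjc would look something like the sketch below. This is not the actual Facetop code.)

    from Quartz import (
        CGEventCreateMouseEvent, CGEventPost,
        kCGEventMouseMoved, kCGEventLeftMouseDown, kCGEventLeftMouseUp,
        kCGHIDEventTap, kCGMouseButtonLeft,
    )

    def move_pointer(x, y):
        """Move the pointer to screen coordinates (x, y)."""
        event = CGEventCreateMouseEvent(None, kCGEventMouseMoved, (x, y), kCGMouseButtonLeft)
        CGEventPost(kCGHIDEventTap, event)

    def click(x, y):
        """Synthesize a left click at (x, y), e.g. when a click gesture fires."""
        for kind in (kCGEventLeftMouseDown, kCGEventLeftMouseUp):
            CGEventPost(kCGHIDEventTap,
                        CGEventCreateMouseEvent(None, kind, (x, y), kCGMouseButtonLeft))

    # a tracker would call move_pointer(...) each frame with the mapped fingertip
    # position, and click(...) whenever the click gesture is recognised
    move_pointer(400, 300)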
  • Reply 85 of 87
    fisha Posts: 126 member
    Kickaha,



    That's really cool... very similar to what I was thinking, including hooking into the mouse or keyboard events.



    How accurate could you make the system? Down to what size... your fingertip?







    Oops, I should have read all that in more detail... I see the green box now around your fingertip.



    How come you didn't take it any further? Or have you?
  • Reply 86 of 87
    kickaha Posts: 8,760 member
    Actually, I haven't worked on it appreciably since 2004, but a couple of other students ran with it. It was a fun side-project, but I needed to get back to my dissertation.



    We had it accurate to about a 4x4 pixel area on the camera, which translated to about a 10x10 spot on the screen. Then we used some simple heuristics to keep it from jumping around in that area... it was really quite smooth. Another student put many of the gestures in, and he was able to select blocks of text to character accuracy, and click, double-click, you name it. He was *good* with it.
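
    (The camera-to-screen scaling plus the 'keep it from jumping around' heuristic is essentially a coordinate map with a dead zone and some smoothing. A rough guess at that kind of heuristic below; the resolutions and constants are illustrative, and the real Facetop heuristics may have been quite different.)

    CAM_W, CAM_H = 640, 480          # camera resolution (assumed)
    SCREEN_W, SCREEN_H = 1440, 900   # screen resolution (assumed)

    DEAD_ZONE = 10    # ignore screen-space jitter smaller than ~the 10x10 spot
    ALPHA = 0.4       # smoothing factor: higher = snappier, lower = smoother

    class PointerSmoother:
        """Map a fingertip found on the camera image to a steadied screen position."""
        def __init__(self):
            self.x = self.y = None

        def update(self, cam_x, cam_y):
            sx = cam_x * SCREEN_W / CAM_W        # scale camera coords to screen coords
            sy = cam_y * SCREEN_H / CAM_H
            if self.x is None:
                self.x, self.y = sx, sy
            elif abs(sx - self.x) > DEAD_ZONE or abs(sy - self.y) > DEAD_ZONE:
                self.x += ALPHA * (sx - self.x)  # ease toward the new position
                self.y += ALPHA * (sy - self.y)
            # inside the dead zone the pointer simply stays put
            return int(self.x), int(self.y)

    smoother = PointerSmoother()
    print(smoother.update(320, 240))   # first sample locks on directly
    print(smoother.update(322, 241))   # tiny jitter: pointer does not move
    print(smoother.update(360, 240))   # real movement: pointer eases toward it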



    The green block was just a visual feedback indicator during development so we could see the local search region.
  • Reply 87 of 87
    meelash Posts: 1,045 member
    Quote:
    Originally Posted by Kickaha


    Actually, it took me 45 minutes to write a prototype using QuickTime back in 2002...



    http://www.cs.unc.edu/~smithja/facetop



    Gestures hooked into Cocoa, 'mouse' events, the whole bit. Surpasses the EyeToy by quite a bit, and predated it.



    fisha is right, you don't have to do any 3D mapping - that's the beauty of this system... it completely eliminates the need for expensive 3D spatial recognition or registration hardware. Instead, it uses what's between our ears to do all the heavy lifting.



    You're right. I believe we've even discussed this on this forum before and I'd forgotten about it. Cool stuff.