Apple's first AI research paper wins prestigious machine learning award

Posted in General Discussion · edited August 2017
Apple's first publicly issued academic paper, computer vision research published in December, recently won a Best Paper Award at the 2017 Conference on Computer Vision and Pattern Recognition (CVPR).

Considered one of the most influential conferences in the field according to the h-index, a metric of scholarly impact, CVPR in July selected Apple's paper for one of its two Best Paper Awards.

According to AppleInsider reader Tom, who holds a PhD in machine learning and computer vision, the CVPR award is one of the most sought-after prizes in the field.

This year, the conference received a record 2,680 valid submissions, of which 2,620 were reviewed. Reviewers whittled that number down to 783 accepted papers, granting long oral presentations to 71 entrants. Apple's submission ultimately rose to the top of the pile, an impressive feat considering it was the company's inaugural showing.

CVPR's second Best Paper Award went to Gao Huang, Zhuang Liu, Laurens van der Maaten and Kilian Q. Weinberger for their research on "Densely Connected Convolutional Networks." Research for the paper was conducted at Cornell University in collaboration with Tsinghua University and Facebook AI Research.

Titled "Learning from Simulated and Unsupervised Images through Adversarial Training," Apple's paper was penned by computer vision expert Ashish Shrivastava and a team of engineers including Tomas Pfister, Oncel Tuzel, Wenda Wang, Russ Webb and Apple Director of Artificial Intelligence Research Josh Susskind. Shrivastava presented the research to CVPR attendees on July 23.

As detailed when it was published in December, Apple's paper describes techniques for training computer vision algorithms to recognize objects using synthetic images.

According to Apple, training models solely on real-world images is often less efficient than leveraging synthetic data, because computer-generated images usually come pre-labeled. For example, a synthetic image of an eye or hand is annotated as such, while real-world images depicting similar objects are unknown to the algorithm and must be labeled by a human operator.

As Apple notes, however, relying completely on simulated images can yield unsatisfactory results, as computer-generated content is sometimes not realistic enough to provide an accurate training set. To help bridge the gap, Apple proposes refining a simulator's output with SimGAN, its take on "Simulated+Unsupervised (S+U) learning." The technique combines unlabeled real image data with annotated synthetic images using Generative Adversarial Networks (GANs): a refiner network learns to make synthetic images look more realistic, while a competing discriminator network learns to tell refined images from real ones.
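To make the adversarial setup concrete, below is a minimal sketch of a SimGAN-style training step in PyTorch. The tiny network architectures, the random stand-in image batches, and the self-regularization weight `lam` are illustrative assumptions rather than Apple's implementation, and the sketch omits refinements described in the paper, such as the locally applied discriminator and the history buffer of previously refined images.

```python
# Minimal sketch of a SimGAN-style training step (illustrative, not Apple's code).
import torch
import torch.nn as nn

# Refiner R: maps a synthetic image to a more realistic "refined" image.
refiner = nn.Sequential(
    nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 1, 3, padding=1),
)

# Discriminator D: outputs one logit per image, real vs. refined.
discriminator = nn.Sequential(
    nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(64 * 16 * 16, 1),
)

bce = nn.BCEWithLogitsLoss()
opt_r = torch.optim.Adam(refiner.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
lam = 0.1  # assumed weight for the self-regularization term

# Stand-in batches of 32x32 grayscale images; a real pipeline would load
# labeled simulator renders and unlabeled real photos here.
synthetic = torch.rand(8, 1, 32, 32)
real = torch.rand(8, 1, 32, 32)

# Refiner step: fool D while staying close to the synthetic input,
# so the simulator's annotations remain valid for the refined image.
refined = refiner(synthetic)
adv_loss = bce(discriminator(refined), torch.ones(8, 1))
reg_loss = (refined - synthetic).abs().mean()  # L1 self-regularization
r_loss = adv_loss + lam * reg_loss
opt_r.zero_grad(); r_loss.backward(); opt_r.step()

# Discriminator step: learn to separate real images from refined ones.
d_loss = bce(discriminator(real), torch.ones(8, 1)) + \
         bce(discriminator(refiner(synthetic).detach()), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()
```

The key design choice is the self-regularization term: by penalizing how far each refined image drifts from its synthetic source, the refiner improves realism without invalidating the labels the simulator generated, which is what lets the refined images train a downstream estimator.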

In its study, Apple applied SimGAN to gaze estimation and hand pose estimation in static images. The company says it hopes to one day move S+U learning beyond static images to support video input.

Like other Silicon Valley tech companies, Apple is sinking significant capital into machine learning and computer vision technologies. Information gleaned from such endeavors will likely enhance consumer-facing products like Siri and augmented reality apps built with ARKit. The company is also working on a variety of autonomous systems, including self-driving car applications, that could make their way to market in the coming months or years.

"We're focusing on autonomous systems," Cook said in a June interview. "It's a core technology that we view as very important. We sort of see it as the mother of all AI projects."

Comments

  • Reply 1 of 24
gatorguy Posts: 24,176 member
    Quite a compliment. Well done Apple.
  • Reply 2 of 24
steven n. Posts: 1,229 member
    Apple is soooo far behind in AI. DOOOMMMMEEEEDDDD!!!! 

    /s
  • Reply 3 of 24
StrangeDays Posts: 12,834 member
    But but but -- people online said Apple doesn't understand AI and are too far behind Facebook and friends!
  • Reply 4 of 24
wizard69 Posts: 13,377 member
    But but but -- people online said Apple doesn't understand AI and are too far behind Facebook and friends!
If you judge them by the usefulness of Siri, Apple is behind. A well-written paper is a good sign though; it makes you wonder if this tech is already baked into ARKit or Apple's coming ML tech. Or, maybe more importantly, is this a sign of coming hardware acceleration? In the end you really could use hardware acceleration to move ML tech along.
  • Reply 5 of 24
SpamSandwich Posts: 33,407 member
Presumably patents had already been filed for this work, so presenting the paper wouldn't constitute ceding a competitive advantage.
  • Reply 6 of 24
foggyhill Posts: 4,767 member
    wizard69 said:
    But but but -- people online said Apple doesn't understand AI and are too far behind Facebook and friends!
If you judge them by the usefulness of Siri, Apple is behind. A well-written paper is a good sign though; it makes you wonder if this tech is already baked into ARKit or Apple's coming ML tech. Or, maybe more importantly, is this a sign of coming hardware acceleration? In the end you really could use hardware acceleration to move ML tech along.
    The "non usefulness"  of Siri is more a meme than actual reality when you've actually used all of those assistants, much like some opinions on Apple maps.
    The differences is mostly in areas of failure and success than anything else.
    The main issues are related to "search" and its more the actual back-end data collection that seems at play here than AI.
    Other things like the many mike on the alexa speakers have nothing to do with Ai at all.
  • Reply 7 of 24
mobius Posts: 380 member
Sorry to be dumb, but why is it not far quicker and simpler to manually label the real images, rather than jump through all those hoops to refine a synthetic image with a real image in order to include its label/annotation? Is it due to the vast number of images in the data set?
    edited August 2017
  • Reply 8 of 24
    mobius said:
Sorry to be dumb, but why is it not far quicker and simpler to manually label the real images, rather than jump through all those hoops to refine a synthetic image with a real image in order to include its label/annotation? Is it due to the vast number of images in the data set?
    Yes, performance of such learning algorithms improves (drastically) with the amount of training data. Human annotation is costly and less reliable (but can be done for real images). So combining the two is the obvious way to go. This approach is pretty standard in ML for language understanding.
  • Reply 9 of 24
    Game-playing children could identify and label a billion photos of everyday objects, if software were designed which kept them entertained. Accuracy would be boosted by crowd-sourcing the exercise, so that a high variance in assigned tags would identify the problem objects -- which could then be used in a game for older children. Algorithms would become smarter by observing what features or qualities the children focus on. Children would make good role models for the algorithms.
  • Reply 10 of 24
    This is a major accomplishment! Bravo Apple!!
  • Reply 11 of 24
foggyhill Posts: 4,767 member
    Game-playing children could identify and label a billion photos of everyday objects, if software were designed which kept them entertained. Accuracy would be boosted by crowd-sourcing the exercise, so that a high variance in assigned tags would identify the problem objects -- which could then be used in a game for older children. Algorithms would become smarter by observing what features or qualities the children focus on. Children would make good role models for the algorithms.
Well, you could use the same algorithm to classify... agents with various algorithms. So, instead of classifying the data, you're classifying their efficiency at a sort of task. You could then even segregate those "agents" (with different types of learning) into various processing pools with various skills, maybe put in thresholds so they slide by themselves into those work pools.

It's like picking kids out of school in the old Soviet Union with various skills and then sending them to special schools, training, splitting by skills, etc., until you got a lot of very trained athletes in multiple fields.

Looking for overall training, I believe, is a mistake and can lead to a lot of inefficiencies in the short term. You're basically doing premature optimization by doing that.

Those pools with various skillsets could then be used to automatically train agents with more integrated skillsets so the various "pools" work more efficiently, maybe even have a pool of workers as facilitators or balancing resources internally.
  • Reply 12 of 24
avon b7 Posts: 7,623 member
    But but but -- people online said Apple doesn't understand AI and are too far behind Facebook and friends!
Better not to pay too much attention to 'people online'. For informed opinion, it's far better to pay attention to people who are accredited in the field.

As to whether Apple 'understands' AI or not, well, of course it does, but so does everybody else working in the field. Whether they are 'behind' or not is something completely different. I have no idea, but they say the proof is in the pudding, so maybe we should just wait and see.

Just take whatever 'people online' say with a grain of salt.
  • Reply 13 of 24
lkrupp Posts: 10,557 member
    wizard69 said:
    But but but -- people online said Apple doesn't understand AI and are too far behind Facebook and friends!
If you judge them by the usefulness of Siri, Apple is behind. A well-written paper is a good sign though; it makes you wonder if this tech is already baked into ARKit or Apple's coming ML tech. Or, maybe more importantly, is this a sign of coming hardware acceleration? In the end you really could use hardware acceleration to move ML tech along.
Baloney. I have an Echo Dot and my trusty iPhone 6 w/Siri sitting right next to each other. I have been experimenting to see which digital assistant is more 'useful' and my conclusion so far is that Siri is no better or worse than Alexa. I can get the same information from both, and both control my HomeKit devices equally well. "Alexa, turn on the porch fan..." "OK." "Hey Siri, turn off the porch fan..." "The porch fan is off." The only ding I would give Siri is when asking about non-basic information. "Alexa, Wikipedia Ludwig van Beethoven..." and Alexa begins reading the Wikipedia article. "Hey Siri, Wikipedia Ludwig van Beethoven..." and Siri throws the Wikipedia article up on the iPhone's screen but does not start reading it. But then the Echo Dot doesn't have a screen so...
  • Reply 14 of 24
larryjw Posts: 1,031 member
    mobius said:
Sorry to be dumb, but why is it not far quicker and simpler to manually label the real images, rather than jump through all those hoops to refine a synthetic image with a real image in order to include its label/annotation? Is it due to the vast number of images in the data set?
    Yes, performance of such learning algorithms improves (drastically) with the amount of training data. Human annotation is costly and less reliable (but can be done for real images). So combining the two is the obvious way to go. This approach is pretty standard in ML for language understanding.
I think a key piece in this paper is that real images have too much noise to be the sole source for training. Synthetic images allow for control of the images themselves and therefore control of the features the neural network develops. And synthetic images prevent the neural network from learning stuff that ain't true, and then having to be further trained to forget the ain't-true stuff, which would be the result of being trained on real images.

In the real world of teaching and learning, good teaching means scaffolding the lessons so the foundations are built before a real problem is thrown at the students. The Siri group seems to be modeling the teaching of neural networks on brain research and good teaching methods.

BTW, my reading suggests that Siri is no longer just the voice recognition application that Apple delivers, but has evolved into the name of their AI group. Also note that the training paper is about image recognition, with a focus on eye recognition and hand recognition. The eye recognition is asking the system where the eye's focus is, and the hand recognition is asking the system to recognize hand gestures. You can guess where all this AI is leading.
  • Reply 15 of 24
evilution Posts: 1,399 member
    Thank you, come again.
  • Reply 16 of 24
tmay Posts: 6,311 member
    avon b7 said:
    But but but -- people online said Apple doesn't understand AI and are too far behind Facebook and friends!
Better not to pay too much attention to 'people online'. For informed opinion, it's far better to pay attention to people who are accredited in the field.

As to whether Apple 'understands' AI or not, well, of course it does, but so does everybody else working in the field. Whether they are 'behind' or not is something completely different. I have no idea, but they say the proof is in the pudding, so maybe we should just wait and see.

    Just take whatever 'people online' say with a grain of salt.


The essence of the "ahead of or behind" horse race in the media is that it's a continuum of development until one party, or a consortium of parties, creates a disruption that gives them an advantage, at which time it might become an innovation and possibly a standard. At this point in time, however, there are still plenty of players, large and small, throwing lots of money at the problem. It's a bit early to be picking winners and losers.

As an example of that, it would be easy to look at MS and Google as "leading" in AR, up until Apple innovated with APIs and a wide platform to develop for and distribute to: some 500 million iOS devices, even before the iPhone 8 arrives. I don't expect Google to lag more than a single generation behind Apple's ARKit, and it will catch up fast on the Android OS platform, but where will MS widely apply HoloLens, considered to be leading in AR technology?

On the other hand, I would give MS a considerable advantage in VR, today anyway, simply because the large Windows PC / Xbox platforms are the largest base of devices with the necessary performance to allow development and use.
  • Reply 17 of 24
StrangeDays Posts: 12,834 member
    wizard69 said:
    But but but -- people online said Apple doesn't understand AI and are too far behind Facebook and friends!
    If you judge them by the usefulness of Siri, Apple is behind.        
    Not according to the shoot-outs I've read. They concluded they all suck about the same, sometimes in differing ways.
  • Reply 18 of 24
larryjw Posts: 1,031 member
    wizard69 said:
    But but but -- people online said Apple doesn't understand AI and are too far behind Facebook and friends!
    If you judge them by the usefulness of Siri, Apple is behind.        
    Not according to the shoot-outs I've read. They concluded they all suck about the same, sometimes in differing ways.
    My knowledge is likely out of date, but some time ago, it was revealed that Apple had contracted with Wolfram to answer Siri questions.

I use Wolfram Mathematica and WolframAlpha, both within Mathematica and as the standalone app, and I find it quite poor at handling natural language questions: first because it doesn't understand the wording, and second because its repository of information is very limited.

    "They all suck about the same" is likely because their knowledge repositories all "suck about the same".

Here is the disconnect: computer science studies and AI are all the rage, but the mundane, unheralded, unsexy, heads-down task of collecting and cleaning data is where the resources are truly needed.
    edited August 2017
  • Reply 19 of 24
Soli Posts: 10,035 member
    wizard69 said:
    But but but -- people online said Apple doesn't understand AI and are too far behind Facebook and friends!
If you judge them by the usefulness of Siri, Apple is behind. A well-written paper is a good sign though; it makes you wonder if this tech is already baked into ARKit or Apple's coming ML tech. Or, maybe more importantly, is this a sign of coming hardware acceleration? In the end you really could use hardware acceleration to move ML tech along.
How do you measure Siri's usefulness in terms of AI? If you're even including speech-to-text from an iDevice's weak microphone, then you're already failing at isolating Siri's AI.

    Personally, that seems to be the biggest technical issue I’ve ever had with Siri, so I expect HomePod will have people claiming that Siri magically became good all of a sudden and only for HomePod.

The other major issue I've had is with Apple's poor feature stepping for a server-side service and its lack of periodic updates to clue users in to new and updated uses, which is something Amazon has been great at doing. There are likely many great uses for Siri that I've never known about or have forgotten after trying them early on and not getting the desired or expected response.
    edited August 2017
  • Reply 20 of 24
Soli Posts: 10,035 member
    larryjw said:
    wizard69 said:
    But but but -- people online said Apple doesn't understand AI and are too far behind Facebook and friends!
    If you judge them by the usefulness of Siri, Apple is behind.        
    Not according to the shoot-outs I've read. They concluded they all suck about the same, sometimes in differing ways.
    My knowledge is likely out of date, but some time ago, it was revealed that Apple had contracted with Wolfram to answer Siri questions.

I use Wolfram Mathematica and WolframAlpha, both within Mathematica and as the standalone app, and I find it quite poor at handling natural language questions: first because it doesn't understand the wording, and second because its repository of information is very limited.

    "They all suck about the same" is likely because their knowledge repositories all "suck about the same".

Here is the disconnect: computer science studies and AI are all the rage, but the mundane, unheralded, unsexy, heads-down task of collecting and cleaning data is where the resources are truly needed.
    W-A queries can be easily tested. Do you have any examples?