Stanford study finds Apple Watch top-notch heart rate monitor, mediocre calorie counter
A new medical study from Stanford University focusing on consumer fitness tracker reliability found Apple Watch to be the most accurate heart rate monitor out of seven popular devices, though all products tested failed in terms of calorie counting.

Published in the Journal of Personalized Medicine on Wednesday, the study seeks to determine the validity of readings from commonly worn fitness trackers. The growing number of consumers buying and wearing devices with biometric capabilities presents a unique opportunity for preventative cardiovascular medicine, but error rates of these commercial products are largely unknown, the study says.
"People are basing life decisions on the data provided by these devices," Euan Ashley, DPhil, FRCP, professor of cardiovascular medicine, of genetics and of biomedical data science at Stanford, said in a statement. He added that consumer devices are not bound by the same regulations as medical-grade equipment, which makes it difficult for doctors to quantify the data they generate or apply it to diagnoses.
To better understand the limitations of popular fitness trackers, the study pitted the Apple Watch, Basis Peak, Fitbit Surge, Microsoft Band, MIO Alpha 2, PulseOn and Samsung Gear S2 against FDA-approved equipment.
A total of 60 volunteers (31 women and 29 men) donned up to four consumer devices and participated in 80 physical tests ranging from cycling to running. Test subjects were simultaneously monitored by a 12-lead electrocardiogram and by continuous clinical-grade indirect calorimetry, which measures expired gases.
Researchers set the acceptable error rate at 5 percent.
Apple Watch achieved the highest heart rate accuracy across the measured modes of activity, with an error rate of 2 percent, followed by the Basis Peak and Fitbit Surge. Samsung's Gear S2 had the highest heart rate error, at 6.8 percent, outside the study's acceptable limit.
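As a rough illustration of how a reading can be scored against the study's 5 percent threshold, the sketch below computes the absolute percent error of a device reading relative to a reference value such as the ECG. It assumes error is defined as absolute percent deviation from the reference; the study's exact error metric may differ, the function names are hypothetical, and only the two error rates reported above are used.

```python
# Threshold stated in the study: errors at or below 5 percent are acceptable.
ACCEPTABLE_ERROR_PCT = 5.0


def percent_error(device_value: float, reference_value: float) -> float:
    """Absolute deviation of a device reading from the reference, in percent.

    Assumes error = |device - reference| / reference * 100 (an illustrative choice).
    """
    if reference_value == 0:
        raise ValueError("reference_value must be non-zero")
    return abs(device_value - reference_value) / reference_value * 100.0


def within_limit(error_pct: float, limit: float = ACCEPTABLE_ERROR_PCT) -> bool:
    """True if an error rate falls at or below the acceptable limit."""
    return error_pct <= limit


# Heart rate error rates reported in the article, in percent.
reported_hr_error = {"Apple Watch": 2.0, "Samsung Gear S2": 6.8}

# Hypothetical reading: the watch shows 153 bpm while the ECG shows 150 bpm.
print(f"example error: {percent_error(153, 150):.1f}%")
for device, err in reported_hr_error.items():
    verdict = "acceptable" if within_limit(err) else "outside limit"
    print(f"{device}: {err}% -> {verdict}")
```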
All devices fared poorly in the energy expenditure, or calorie counting, tests. The most accurate device, Fitbit's Surge, had an error rate of 27.4 percent, while the least accurate, the PulseOn, posted a dismal 92.6 percent. Interestingly, the devices logged their lowest error rates during activities like walking and running, while low-impact tasks like sitting fared measurably worse, with an average error rate of 52.4 percent.
"The heart rate measurements performed far better than we expected, but the energy expenditure measures were way off the mark," Ashley said. "The magnitude of just how bad they were surprised me."
Researchers were unsure why the energy expenditure readings strayed so far from the gold-standard equipment, but the study notes that each device uses its own proprietary algorithm to calculate calorie burn. These calculations rely largely on individual user metrics such as height, weight, BMI, fitness level and age. Whereas heart rate is measured directly at the user's wrist, calorie burn is an estimate derived through complex algorithms.
Ashley and his team are working on an extension to the study that takes testing beyond the laboratory and out into the real world.

Comments
The Apple Watch was only off by two percent! I'd hardly call that mediocre or a failure; I'd say it's close to perfect.
The other watches have been dead for a while now.
"All devices fared poorly in energy expenditure, or calorie counting, tests. The most accurate device, Fitbit's Surge, managed an error rate of 27.4 percent, while the least accurate product, the PulseOn, put in a dismal performance of 92.6 percent. Interestingly, the devices logged the lowest error rates during activities like walking and running, while low impact tasks like sitting tracked measurably worse with an average error rate of 52.4 percent. "
I mean, if I find a reported number on the device that feels like a good balance for my own circumstances, can I rely on it relative to other readings? Say I have a day running around to meetings, with lots of time on my feet, and the tracker says I've done 120% of my goal with no distinct periods of exercise. Is that 120% accurate even if the calories for the day aren't?
I can just start an "Other" workout on the Apple Watch and sit on my La-Z-Boy watching TV and watch those calories burn!
The Apple Watch also shows a lot more calories burnt using the "Other" workout when compared to using one of the dedicated workouts (Indoor Run, Outdoor Walk, etc.), for the same duration and pace.
I can see the logic in that. A dedicated workout would be quantifiable with a set pace or heart rate, whereas with "Other" the watch does not know what type of workout it is and counts calories burnt based on time and heart rate. Effectively, it's almost just counting the resting calories when you aren't doing anything.
I think calorie counting is really difficult and fitness trackers should just be used as indicators. If people want to really track in and out calories, they may need to use specialised devices.
At best they are all making a guess.
Also, if I remember correctly, Apple did testing and calibration on thousands of people, not just 60. I would say Apple is probably a little better than most since it has more data. Face it: even Apple, with all its data, is just triangulating on a calorie number, and sometimes it does a better job than others.
I've been using my Apple Watch as part of a weight loss regimen. Each day I track weight, calories (via LoseIt), and Activity from my watch. My "calories out" is calculated via estimated BMR + Watch Activity and "calories in" is estimated via LoseIt. While there is a little more to the calculation, if I sum a group of days and divide by 3500 I can predict "pounds lost/gained" during that time. Over 4 months and 25 pounds that prediction has generally been accurate to within a pound of a running average of my actual weight.
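The bookkeeping in this comment can be written out as a short calculation. This is a sketch assuming the 3,500 kcal-per-pound rule of thumb the comment relies on; the function name and the sample week are hypothetical, not the commenter's actual logs.

```python
# Rule-of-thumb figure used in the comment: roughly 3,500 kcal per pound of body weight.
KCAL_PER_POUND = 3500


def predicted_weight_change_lb(days):
    """Predict weight change in pounds from daily logs.

    Each entry is (estimated_bmr_kcal, watch_active_kcal, logged_intake_kcal).
    "Calories out" is BMR plus watch activity; a negative result is a predicted loss.
    """
    net_kcal = sum(intake - (bmr + active) for bmr, active, intake in days)
    return net_kcal / KCAL_PER_POUND


# Hypothetical week: 1,800 kcal BMR, 600 kcal active, 1,900 kcal eaten each day.
week = [(1800, 600, 1900)] * 7
print(predicted_weight_change_lb(week))  # -1.0
```

Comparing this prediction against a running average of scale weight, as the commenter does, is what shows whether the combined inputs are consistently biased in one direction.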
They say "don't let good be the enemy of good enough". While Stanford may claim the Apple Watch isn't "good", it has performed "good enough" for me. I'd imagine it would be "good enough" for almost everyone.
From my own experience (without scientific evidence), I think a lot of the inaccuracy happens in the "smoothing". If heart rate is sampled, say, once every second and yours is running at 180 beats a minute, that is 3 beats a second, so that won't work. Even if they increased the sampling rate to 3 times a second, some samples would contain a heartbeat and others wouldn't, so they need an algorithm to 'average' that out and produce an accurate heart rate.
This became apparent to me when my chest strap showed heart rate spikes 30 to 40 beats per minute above my max heart rate, but neither my Apple Watch nor medical-grade equipment showed those spikes. So, despite the inherently greater accuracy of a chest strap over a wrist device, the wrist device was more accurate. I think the difference is in the algorithm the Apple Watch uses.
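One generic way to suppress short spikes like the ones described is a rolling median filter. This is only a sketch of the smoothing idea in the comment, not Apple's algorithm, which is not described here; the window size and readings below are hypothetical.

```python
from statistics import median


def rolling_median(samples, window=5):
    """Replace each value with the median of a window centred on it.

    The window is truncated at the ends of the series, so edge values
    are the median of fewer samples.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(median(samples[lo:hi]))
    return smoothed


# Hypothetical bpm readings with one spurious spike at 190.
readings = [150, 152, 151, 190, 153, 154, 152]
print(rolling_median(readings))  # [151, 151.5, 152, 153, 153, 153.5, 153]
```

A median is used rather than a plain average because a single outlier barely moves it, whereas a mean over the same window would still be pulled upward by the spike.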
BTW, calorie expenditure can only be guessed at using crude assumptions based on limited data. Don't believe ANY of them -- at least not in absolute terms. But, if using the same app, it is useful for comparing one workout to another.
Any count of calories has to start with BMR, and no watch in the world can measure that. It can be estimated from height, weight, etc., but even then there are different methods for arriving at the figure, which can vary by around 20%. It would probably be pretty easy to find two people with identical height, weight and age who, at any given heart rate, have wildly different caloric burn rates.
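To illustrate the point that different methods give different BMR figures from the same inputs, the sketch below implements two widely published predictive equations: Mifflin-St Jeor and the Roza-Shizgal revision of Harris-Benedict, with coefficients as they are commonly cited (worth checking against the original papers). Neither is claimed to be what any watch uses, and the example person is hypothetical. How far the estimates diverge depends on the person and on which equations are compared.

```python
def bmr_mifflin_st_jeor(weight_kg: float, height_cm: float, age_years: float, is_male: bool) -> float:
    """Estimated BMR in kcal/day using the Mifflin-St Jeor equation."""
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_years
    return base + (5 if is_male else -161)


def bmr_harris_benedict_revised(weight_kg: float, height_cm: float, age_years: float, is_male: bool) -> float:
    """Estimated BMR in kcal/day using Harris-Benedict as revised by Roza and Shizgal (1984)."""
    if is_male:
        return 88.362 + 13.397 * weight_kg + 4.799 * height_cm - 5.677 * age_years
    return 447.593 + 9.247 * weight_kg + 3.098 * height_cm - 4.330 * age_years


# Hypothetical person: 80 kg, 180 cm, 35 years old, male.
a = bmr_mifflin_st_jeor(80, 180, 35, True)
b = bmr_harris_benedict_revised(80, 180, 35, True)
print(f"Mifflin-St Jeor: {a:.0f} kcal/day")
print(f"Harris-Benedict (revised): {b:.0f} kcal/day")
print(f"relative difference: {abs(a - b) / min(a, b) * 100:.1f}%")
```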
At the end of the day, using HR to estimate calories is an extremely crude method. For any one individual, with sufficient testing, you could probably create a calibration profile to map HR to calories with some degree of accuracy. Short of that, Apple and the others have to make gross generalizations based on a very limited number of variables. In all likelihood, you are not going to fit that generalization, as this study has demonstrated.
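In its simplest form, the personal calibration profile described here could be a straight-line fit of measured energy expenditure against heart rate for one person. The sketch below assumes a linear relationship and paired measurements from a lab session, for example against indirect calorimetry as in the study. The function names and numbers are hypothetical, and a real profile might need a non-linear model to hold across different activity levels.

```python
def fit_calibration(heart_rates, kcal_per_min):
    """Ordinary least-squares fit of kcal/min = slope * HR + intercept."""
    n = len(heart_rates)
    if n < 2 or n != len(kcal_per_min):
        raise ValueError("need at least two paired measurements")
    mean_hr = sum(heart_rates) / n
    mean_kcal = sum(kcal_per_min) / n
    sxx = sum((hr - mean_hr) ** 2 for hr in heart_rates)
    if sxx == 0:
        raise ValueError("heart rates must not all be identical")
    sxy = sum((hr - mean_hr) * (k - mean_kcal) for hr, k in zip(heart_rates, kcal_per_min))
    slope = sxy / sxx
    intercept = mean_kcal - slope * mean_hr
    return slope, intercept


def predict_kcal_per_min(hr, slope, intercept):
    """Apply a personal calibration profile to a new heart rate reading."""
    return slope * hr + intercept


# Hypothetical lab session for one person: HR (bpm) vs. measured kcal/min.
hr_samples = [90, 110, 130, 150]
kcal_samples = [4.0, 7.0, 10.0, 13.0]
slope, intercept = fit_calibration(hr_samples, kcal_samples)
print(f"{predict_kcal_per_min(140, slope, intercept):.1f} kcal/min at 140 bpm")
```

The fitted line is only valid for the person and conditions it was measured under, which is the commenter's point about generalizations.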