Apple Watch 'black box' algorithms unreliable for medical research [u]

Comments

  • Reply 21 of 35
    mike1 Posts: 3,286 member
    daydalaus said:
    mike1 said:

    It is thought that tweaks by Apple to algorithms used in the Apple Watch changed how the data was interpreted before being collected.

    "These algorithms are what we would call black boxes - they're not transparent. So it's impossible to know what's in them," said Onnela. "What was surprising was how different they are. This is probably the cleanest example that I have seen of this phenomenon."

    It's amazing how smart people can be so stupid. It's not a phenomenon. It's called continual updates to tweak the algorithm over the course of almost two years. If this is an issue for your research, you make sure the software is locked down for the length of the study.
    I think this largely qualifies as a phenomenon. The definition largely used in science is about an observable occurrence: they observed this behavior and managed to see it repeatedly, hence it occurs... maybe you are confusing it with another use of the word?

    phenomenon
    [fəˈnäməˌnän, fəˈnäməˌnən]
    NOUN
    phenomenon (noun) · phenomena (plural noun)
    a fact or situation that is observed to exist or happen, especially one whose cause or explanation is in question.
    "glaciers are unique and interesting natural phenomena"

    No mystery about the cause here.
  • Reply 22 of 35
    larryjw said:
    NASEM (National Academies of Sciences, Engineering, and Medicine), formerly just NAS, issued a report in 2019, requested by Congress, on Reproducibility and Replicability in Science. I'd highly recommend it. 

    The report distinguishes Reproducibility from Replication. To reproduce is to take the original data and reanalyze it, sometimes also using the same software used in the original study. Replication is duplicating the original study -- different researchers, different conditions, etc. 

    The NASEM report notes that software used by researchers is subject to change (black boxes, if you will) and this can alter the results of studies. Software like R, SAS, SPSS, etc. is often updated. 

    Frankly, I'm not at all clear what these paragraphs from the above article mean:
    "Two sets of the same daily heart rate variability data collected from one Apple Watch were collected, covering the same period from December 2018 until September 2020. While the sets were collected on September 5, 2020, and April 15, 2021, the data should have been identical given they dealt with identical timeframes, but differences were discovered. 

    It is thought that tweaks by Apple to algorithms used in the Apple Watch changed how the data was interpreted before being collected. "

    A set of data from December 2018, and another set from September 2020? Why should data have been identical? What am I not understanding? Did they really take some raw data from December 2018, and pass it through two separate algorithms and found a difference in how that data was interpreted? 

    In any case, I don't take my Apple Watch results that seriously. I expect a lot of variation: where on my wrist I wear it, software upgrades, skin changes, sweat, environment, different watch with different sensors. Big picture trends are the only thing that I would expect to count, not absolute values. I have no clue as to the error bars of the Apple Watch. Using medical quality devices would be ideal, but nobody wears them for 20 hours per day over years -- medical equipment is used over a few days or a few minutes -- quite limited in value even if perfectly accurate. 

    They explain it much better in the original study (linked in the article): these are two exports of the SAME phone data, made at different times. It's not a case of taking samples from a person at different times and expecting them to be the same. They collected HRV information for a year and did an export in September 2020 (meaning they took the HRV data from the phone up to that point), then did another export months later, in April 2021 (again taking the data from the phone up to that point). It turned out that the SAME data, from the beginning of the samples to September 2020, was slightly different between the two exports. This means that Apple at some point changed the way it interprets the raw data and then RETROACTIVELY changed the values of the historic information. That is the problem for them: they aren't complaining that the Apple Watch gives inaccurate measurements from time to time; they are saying that providers can change historic data, and how they interpret the raw information, without any notice. So their solution is for providers to give researchers the raw information (something they don't do).
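The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the study's actual code: the function name `diff_exports`, the two dictionaries, and every HRV value are made up for the example, and it assumes each export has already been reduced to one HRV value per day.

```python
from datetime import date


def diff_exports(first, second, tolerance=0.0):
    """Return (day, first_value, second_value) for every day present in
    both exports whose values differ by more than `tolerance`."""
    # Only days covered by BOTH exports can be compared; days that exist
    # only in the later export are new data, not revised data.
    shared_days = sorted(first.keys() & second.keys())
    return [
        (day, first[day], second[day])
        for day in shared_days
        if abs(first[day] - second[day]) > tolerance
    ]


# Hypothetical daily HRV values in milliseconds, invented for illustration.
export_sept_2020 = {
    date(2019, 1, 1): 42.0,
    date(2019, 1, 2): 38.5,
    date(2019, 1, 3): 51.2,
}
export_april_2021 = {
    date(2019, 1, 1): 42.0,
    date(2019, 1, 2): 40.1,   # same day, different value in the later export
    date(2019, 1, 3): 51.2,
    date(2020, 10, 1): 47.3,  # recorded after the first export, so ignored
}

changed = diff_exports(export_sept_2020, export_april_2021)
for day, old, new in changed:
    print(f"{day}: {old} -> {new}")
```

If the historic values had never been revised, `changed` would be empty for the overlapping period; any entries in it are days whose stored value moved between the two exports.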
  • Reply 23 of 35
    mike1 said:
    daydalaus said:
    mike1 said:

    It is thought that tweaks by Apple to algorithms used in the Apple Watch changed how the data was interpreted before being collected.

    "These algorithms are what we would call black boxes - they're not transparent. So it's impossible to know what's in them," said Onnela. "What was surprising was how different they are. This is probably the cleanest example that I have seen of this phenomenon."

    It's amazing how smart people can be so stupid. It's not a phenomenon. It's called continual updates to tweak the algorithm over the course of almost two years. If this is an issue for your research, you make sure the software is locked down for the length of the study.
    I think this largely qualifies as a phenomenon. The definition largely used in science is about an observable occurrence: they observed this behavior and managed to see it repeatedly, hence it occurs... maybe you are confusing it with another use of the word?

    phenomenon
    [fəˈnäməˌnän, fəˈnäməˌnən]
    NOUN
    phenomenon (noun) · phenomena (plural noun)
    1. a fact or situation that is observed to exist or happen, especially one whose cause or explanation is in question.
      "glaciers are unique and interesting natural phenomena"

    No mystery about the cause here.
    Well, having an explanation in question is not a required characteristic of a phenomenon. But on that point, I'm glad you are so sure about the way software companies handle historic data on their devices, which is the point they are talking about here. Since it's an opaque system for the researchers, I see they are only replicating the situation to understand the results. Is this really an obvious situation, or does it only look obvious to US because we read the article and were made aware of it? I don't think it's that simple to assume that it's a universally known fact.
    edited July 2021
  • Reply 24 of 35
    jimh2 Posts: 617 member
    Those running this study are people who do not want outsiders rocking their boat (eg revenue stream). Publish articles with FUD and hope the regulators shut them down. 
  • Reply 25 of 35
    dysamoria Posts: 3,430 member
    mike1 said:

    It is thought that tweaks by Apple to algorithms used in the Apple Watch changed how the data was interpreted before being collected.

    "These algorithms are what we would call black boxes - they're not transparent. So it's impossible to know what's in them," said Onnela. "What was surprising was how different they are. This is probably the cleanest example that I have seen of this phenomenon."

    It's amazing how smart people can be so stupid. It's not a phenomenon. It's called continual updates to tweak the algorithm over the course of almost two years. If this is an issue for your research, you make sure the software is locked down for the length of the study.
    You’re missing the point. If they can’t tell WHY an algorithm changes the results so dramatically, the devices are useless for collecting scientific study data. The data itself might be invalid. Someone who thinks they’re so much smarter than these researchers should grasp that “software updates” do not mean “improvements” or “greater accuracy” in a situation where the algorithm itself is a black box.
  • Reply 26 of 35
    dysamoria Posts: 3,430 member
    igorsky said:
    neoncat said:
    mike1 said:

    It is thought that tweaks by Apple to algorithms used in the Apple Watch changed how the data was interpreted before being collected.

    "These algorithms are what we would call black boxes - they're not transparent. So it's impossible to know what's in them," said Onnela. "What was surprising was how different they are. This is probably the cleanest example that I have seen of this phenomenon."

    It's amazing how smart people can be so stupid. It's not a phenomenon. It's called continual updates to tweak the algorithm over the course of almost two years. If this is an issue for your research, you make sure the software is locked down for the length of the study.
    I realize this is the new-normal for Apple sites—to be a dismissive prick, like a badge of honor—but the problem the article highlights is the researchers *don't have control* over this algorithmic versioning. This is public data, that *Apple itself* is encouraging be used for these studies (ResearchKit, anyone?) I think the frustration is entirely above-board. 

    If you're going to welcome someone to do their job with your data, and create an interface for that data, don't be so surprised if that person says, "yeah but... this data isn't what we need." So, where's the problem again?

    Right, yes, of course, Anyone But Apple™. Sorry, I'll be better.  
    I see as many posts like yours, with that sarcastic anti-Apple tone, as I see from those who defend them.  So not really sure what the new normal is around here.
    There’s a difference and it’s increasingly exhausting to have to explain it to people who come along with the “you’re both the same” rhetoric.

    The sarcasm in the response is different from the sarcasm it responds to. The initial commentary was illogical, and entirely missed the point of the researchers’ complaints because of the commentator’s pro-Apple bias. The response points out this bias and probably reveals the author’s frustration with that bias.

    To say these two were exactly the same or “mere ideological opposites” is akin to gaslighting the person who’s understandably responding to irrational rhetoric. You also give the impression of trying to be superior by appearing neutral, where there’s a factual debate, not a mere disagreement of opinions.

    The black box algorithms are an obstruction to the collection of scientifically valid data. Sane data. No one can ever know what reality is with that data, so why use the tool to collect said data? Attacking researchers for this is ridiculous and is based in pro-Apple fanaticism. The response was not “anti-Apple fanaticism”.
  • Reply 27 of 35
    macplusplus Posts: 2,112 member

    Two sets of the same daily heart rate variability data collected from one Apple Watch were collected, covering the same period from December 2018 until September 2020. While the sets were collected on September 5, 2020, and April 15, 2021, the data should have been identical given they dealt with identical timeframes, but differences were discovered.

    Even medical-grade devices update their firmware. Are those updates "transparent"? Do you know how that medical-grade ECG machine of yours reports those "bundle blocks" supposedly existing in your heart? So stop making stupid assumptions and thank Apple for refining its algorithms and backwardly revising user data. This just shows Apple's serious commitment to the integrity and accuracy of user health data.

    Whether that "revision" is suitable to your research project is not my concern nor Apple's business. Go find some funding...
    edited July 2021
  • Reply 28 of 35
    dysamoria Posts: 3,430 member

    Two sets of the same daily heart rate variability data collected from one Apple Watch were collected, covering the same period from December 2018 until September 2020. While the sets were collected on September 5, 2020, and April 15, 2021, the data should have been identical given they dealt with identical timeframes, but differences were discovered.

    Even medical-grade devices update their firmware. Are those updates "transparent"? Do you know how that medical-grade ECG machine of yours reports those "bundle blocks" supposedly existing in your heart? So stop making stupid assumptions and thank Apple for refining its algorithms and backwardly revising user data. This just shows Apple's serious commitment to the integrity and accuracy of user health data.

    Whether that "revision" is suitable to your research project is not my concern nor Apple's business. Go find some funding...
    Medical-grade devices generally have simpler designs and more reliable sensing mechanisms because they’re not designed to be a small, standalone, luxury wrist computer.
  • Reply 29 of 35
    GeorgeBMac Posts: 11,421 member
    daydalaus said:
    I have to chuckle....   It's widely known and acknowledged in medical circles that the single largest determinant in the outcome of any medical study is who paid for the study.   They go through extreme measures to ensure the integrity of the data (which it frequently isn't) -- but there are no controls over the design of the study or its analysis.

    A classic example is proving that chocolate chip cookies are healthy -- by comparing them to Oreo cookies.

    Likewise, it is predictable that this guy used Heart Rate Variability to criticize the Apple Watch's accuracy.
    Anybody familiar with Heart Rate Variability knows that Heart Rate Variability is, well, extremely variable.  Yesterday mine ranged from 18 to 57.
    Those who study it and use it know that to get accurate results you need identical conditions.   So, it is recommended that you check it when you wake up and before you even get out of bed.

    As well, medical personnel have always had a bias against consumer grade equipment and preferred their own medical grade equipment over it.
    The result is:   Less data collected in the most expensive way possible -- which limits data collection even further!

    So sorry pal!
    Research has long suffered because of the things you are pushing.
    Mobile data collection is here and it will be growing. 
    And, it will change medicine.
    ....  For the first time researchers will be able to obtain accurate, objective data real time on lifestyle choices.   No more will they be limited to questionnaires asking "How much have you exercised over the last month"?  Or, "What intensity do you exercise at?".  The Apple Watch can collect and monitor that in real time as it happens -- and deliver the cheapest, most accurate data research has ever had.

    Basically I would say to this researcher that it isn't the Apple Watch that is unreliable.   It's the researchers.
    I think there is a misconception about the study. Reading it, they explain that they exported the same data twice; they aren't comparing two different days for the same person and saying that HRV is unreliable. They recorded HRV for more than a year on a phone and exported the SAME data in September 2020 and April 2021. The second time the data was exported, they saw different results than the first time, for the same days (prior to September 2020). This means that Apple used the same raw data in both cases, but after one year recalculated ALL HRV information (even the historic data) with a new algorithm, hence all the data is now slightly different. That's why they say the data can't be used: not because the measurements are variable, but because the manufacturer can modify historic data without notice, which makes things unreliable for statistical analysis.

    The same goes on with any medical equipment -- whether medical grade or consumer grade.  There's hardware and there's software and either can change.

    But, with HRV on the Apple Watch, you don't "Export it".  It's calculated once as it happens and recorded in the Health App.  You don't get a second shot at the same data.
  • Reply 30 of 35
    GeorgeBMac Posts: 11,421 member
    larryjw said:
    NASEM (National Academies of Sciences, Engineering, and Medicine), formerly just NAS, issued a report in 2019, requested by Congress, on Reproducibility and Replicability in Science. I'd highly recommend it. 

    The report distinguishes Reproducibility from Replication. To reproduce is to take the original data and reanalyze it, sometimes also using the same software used in the original study. Replication is duplicating the original study -- different researchers, different conditions, etc. 

    The NASEM report notes that software used by researchers is subject to change (black boxes, if you will) and this can alter the results of studies. Software like R, SAS, SPSS, etc. is often updated. 

    Frankly, I'm not at all clear what these paragraphs from the above article mean:
    "Two sets of the same daily heart rate variability data collected from one Apple Watch were collected, covering the same period from December 2018 until September 2020. While the sets were collected on September 5, 2020, and April 15, 2021, the data should have been identical given they dealt with identical timeframes, but differences were discovered. 

    It is thought that tweaks by Apple to algorithms used in the Apple Watch changed how the data was interpreted before being collected. "

    A set of data from December 2018, and another set from September 2020? Why should data have been identical? What am I not understanding? Did they really take some raw data from December 2018, and pass it through two separate algorithms and found a difference in how that data was interpreted? 

    In any case, I don't take my Apple Watch results that seriously. I expect a lot of variation. 
    Where on my wrist I wear it, software upgrades, skin changes, sweat, environment, different watch with different sensors. Big picture trends are the only thing that I would expect to count, not absolute values. I have no clue as to the error bars of the Apple Watch. Using medical quality devices would be ideal, but nobody wears them for 20 hours per day over years -- medical equipment is used over a few days or a few minutes -- quite limited in value even if perfectly accurate. 


    I think the software they mention ("R, SAS, SPSS") refers to analytical software programs used to analyze things like research data.  The researchers run their raw data through those programs and the program pops out the answer -- and the researchers assume that answer is THE correct answer.   But yeah, that software is constantly being updated -- which is what, apparently, the Apple Watch is being condemned for doing!

    And I agree with you too that there is no reason to think that two sets of data should be identical -- especially if it's highly variable like HRV.
  • Reply 31 of 35
    macplusplus Posts: 2,112 member
    dysamoria said:

    Two sets of the same daily heart rate variability data collected from one Apple Watch were collected, covering the same period from December 2018 until September 2020. While the sets were collected on September 5, 2020, and April 15, 2021, the data should have been identical given they dealt with identical timeframes, but differences were discovered.

    Even medical-grade devices update their firmware. Are those updates "transparent"? Do you know how that medical-grade ECG machine of yours reports those "bundle blocks" supposedly existing in your heart? So stop making stupid assumptions and thank Apple for refining its algorithms and backwardly revising user data. This just shows Apple's serious commitment to the integrity and accuracy of user health data.

    Whether that "revision" is suitable to your research project is not my concern nor Apple's business. Go find some funding...
    Medical-grade devices generally have simpler designs and more reliable sensing mechanisms because they’re not designed to be a small, standalone, luxury wrist computer.
    The whole article and discussion became moot after Apple's response. According to Apple, there is no backward revision of user data after an algorithm change. Bugs aside, if one gets two different data sets after exporting the same data twice at a one-year interval, then it is the export procedure that should be inspected. HRV data is no big deal: there is a beats per minute column, always an integer value, and a timestamp column. The different export programs that interpret these values and deduce HRV may present such discrepancies. This is not enough for making such bold claims as "Apple should provide sensor data!".
  • Reply 32 of 35
    I’ve done work with biological signals (foetal heart monitors), and these signals are noisy and hardware dependent. 

    I understand the researchers’ needs and sympathize. 

    Apple likely does not store raw data because it’s voluminous. The need to tweak these algorithms is most likely borne out of the need for continuous improvement (accuracy, power efficiency, etc). 

    Ergo, it may well be that if researchers need to use a consumer device they may need to work with the manufacturer to ensure that hardware and software stay constant (and visible) over the duration of the experiment. This would also enable reproduction of the experiment without introducing exogenous factors. 
  • Reply 33 of 35
    MplsP Posts: 3,931 member
    igorsky said:
    neoncat said:
    mike1 said:

    It is thought that tweaks by Apple to algorithms used in the Apple Watch changed how the data was interpreted before being collected.

    "These algorithms are what we would call black boxes - they're not transparent. So it's impossible to know what's in them," said Onnela. "What was surprising was how different they are. This is probably the cleanest example that I have seen of this phenomenon."

    It's amazing how smart people can be so stupid. It's not a phenomenon. It's called continual updates to tweak the algorithm over the course of almost two years. If this is an issue for your research, you make sure the software is locked down for the length of the study.
    I realize this is the new-normal for Apple sites—to be a dismissive prick, like a badge of honor—but the problem the article highlights is the researchers *don't have control* over this algorithmic versioning. This is public data, that *Apple itself* is encouraging be used for these studies (ResearchKit, anyone?) I think the frustration is entirely above-board. 

    If you're going to welcome someone to do their job with your data, and create an interface for that data, don't be so surprised if that person says, "yeah but... this data isn't what we need." So, where's the problem again?

    Right, yes, of course, Anyone But Apple™. Sorry, I'll be better.  
    I see as many posts like yours, with that sarcastic anti-Apple tone, as I see from those who defend them.  So not really sure what the new normal is around here.
    I don’t think it’s anti-Apple as much as anti-Apple-fanboy. Apple makes great products, but they’re not perfect, and you see a fair number of fanboys, sycophants, etc. who would find a way to defend Apple even if there were video of Tim Cook wearing a Nazi uniform and personally torturing workers. I think that’s what bothers people most (and leads to anti-Apple sentiment in general).
  • Reply 34 of 35
    MplsP Posts: 3,931 member
    gatorguy said:
    mike1 said:
    neoncat said:
    mike1 said:

    It is thought that tweaks by Apple to algorithms used in the Apple Watch changed how the data was interpreted before being collected.

    "These algorithms are what we would call black boxes - they're not transparent. So it's impossible to know what's in them," said Onnela. "What was surprising was how different they are. This is probably the cleanest example that I have seen of this phenomenon."

    It's amazing how smart people can be so stupid. It's not a phenomenon. It's called continual updates to tweak the algorithm over the course of almost two years. If this is an issue for your research, you make sure the software is locked down for the length of the study.
    I realize this is the new-normal for Apple sites—to be a dismissive prick, like a badge of honor—but the problem the article highlights is the researchers *don't have control* over this algorithmic versioning. This is public data, that *Apple itself* is encouraging be used for these studies (ResearchKit, anyone?) I think the frustration is entirely above-board. 

    If you're going to welcome someone to do their job with your data, and create an interface for that data, don't be so surprised if that person says, "yeah but... this data isn't what we need." So, where's the problem again?

    Right, yes, of course, Anyone But Apple™. Sorry, I'll be better.  

    Not dismissive at all. To not expect changes over two years is stupid. Not a surprise since Apple issues a dozen updates per year. If you don't know that, you shouldn't be using the hardware/software in your study. If you design a study where you can't have the variability, then it's your job to control the variability. Easily done by asking participants not to update their devices. It's pure ignorance or hyperbole to claim this is a phenomenon when you know damn well why and how it happens.
    The issue is not so much the changes but the "black box" way of doing it, aka, lack of transparency. Researchers have no idea if anything pertinent to their study has changed because Apple doesn't disclose what goes into the algorithm, how it works, nor report when changes are done and when they happen. 
    Exactly - when doing research, knowing and controlling for variables is key. If your method of collecting the data changes then it makes much of the rest meaningless. 

    This isn’t really anything against Apple, more an artifact of trying to use a consumer device for something it wasn’t really intended for. 
  • Reply 35 of 35
    GeorgeBMac Posts: 11,421 member
    MplsP said:
    gatorguy said:
    mike1 said:
    neoncat said:
    mike1 said:

    It is thought that tweaks by Apple to algorithms used in the Apple Watch changed how the data was interpreted before being collected.

    "These algorithms are what we would call black boxes - they're not transparent. So it's impossible to know what's in them," said Onnela. "What was surprising was how different they are. This is probably the cleanest example that I have seen of this phenomenon."

    It's amazing how smart people can be so stupid. It's not a phenomenon. It's called continual updates to tweak the algorithm over the course of almost two years. If this is an issue for your research, you make sure the software is locked down for the length of the study.
    I realize this is the new-normal for Apple sites—to be a dismissive prick, like a badge of honor—but the problem the article highlights is the researchers *don't have control* over this algorithmic versioning. This is public data, that *Apple itself* is encouraging be used for these studies (ResearchKit, anyone?) I think the frustration is entirely above-board. 

    If you're going to welcome someone to do their job with your data, and create an interface for that data, don't be so surprised if that person says, "yeah but... this data isn't what we need." So, where's the problem again?

    Right, yes, of course, Anyone But Apple™. Sorry, I'll be better.  

    Not dismissive at all. To not expect changes over two years is stupid. Not a surprise since Apple issues a dozen updates per year. If you don't know that, you shouldn't be using the hardware/software in your study. If you design a study where you can't have the variability, then it's your job to control the variability. Easily done by asking participants not to update their devices. It's pure ignorance or hyperbole to claim this is a phenomenon when you know damn well why and how it happens.
    The issue is not so much the changes but the "black box" way of doing it, aka, lack of transparency. Researchers have no idea if anything pertinent to their study has changed because Apple doesn't disclose what goes into the algorithm, how it works, nor report when changes are done and when they happen. 
    Exactly - when doing research, knowing and controlling for variables is key. If your method of collecting the data changes then it makes much of the rest meaningless. 

    This isn’t really anything against Apple, more an artifact of trying to use a consumer device for something it wasn’t really intended for. 

    Except, as noted in a previous post, nothing changed on Apple's side.
    This researcher just needs to do better research into his pet peeves before going public with them.