Drive corruption, apps constantly crashing -> Bad RAM

Posted in Genius Bar, edited January 2014
Just to pass this little tidbit on...



I have a Pismo G3 (FireWire) that gets hammered on a lot - 8-10 hours a day, 6-7 days a week. It's battle scarred.



About 9 months ago, I started noticing drive errors, including the 'disk1s8: 0x08 (undefined)' that others have mentioned. Dismantling the laptop enough to reseat the drive cable always solved this, usually for a few days.



Eventually it got bad enough that I started seeing real corruption ('keys out of order', etc) that Disk First Aid and fsck couldn't fix. Drive 10 couldn't fix it either... it was just hosed. Two reformats and reinstalls later, I finally decided to replace the obviously bad drive.



Well... all was good for about a week, when ta-da! The same errors started popping up. At this point I was sure it was a bad ATA controller on the mobo, and while this is a *GREAT* justification for buying a TiBook, as a grad student that would hurt a bit.



Luckily, posting on the MacOS X Admin mailing list (hosted by omnigroup.com), I was pointed to the answer: my RAM module had gone bad.



Turns out, this makes *complete* sense. I was having this increasingly bad problem with apps crashing, then being unable to relaunch. As soon as an app crashed, it was unable to run again until I rebooted. Also, 'related' apps seemed to go down in waves - all my net related apps, for instance, or all apps that use QuickTime. Why does this make sense? MacOS X relies heavily on dynamic libraries for all apps. When a library is loaded into memory for app A, it just stays there. There's no reason to reload it... so if it gets corrupted in memory, it's corrupted for *all* apps that use it. Relaunching app A doesn't refresh the libraries, it tries to use the corrupted one in memory.
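To make that concrete, here's a toy model in Python. It is only a sketch of the idea, not how the OS X loader actually works: `ToySystem`, the library names, and the byte strings are all made up for illustration. A library gets loaded once, every app shares the resident copy, and only a reboot throws that copy away.

```python
class ToySystem:
    """Toy model of shared libraries held in RAM (hypothetical, for illustration)."""

    def __init__(self, disk):
        self.disk = disk    # library name -> good copy on disk
        self.loaded = {}    # library name -> copy resident in memory

    def launch(self, libs):
        # A library is read from disk only if no copy is already resident.
        for name in libs:
            if name not in self.loaded:
                self.loaded[name] = bytearray(self.disk[name])
        # The app "runs" only if every resident copy is still intact.
        return all(self.loaded[name] == self.disk[name] for name in libs)

    def reboot(self):
        # A reboot is the only thing here that discards the resident copies.
        self.loaded.clear()


system = ToySystem({"QuickTime": b"good-qt-code", "Network": b"good-net-code"})

first_run = system.launch(["QuickTime"])         # fresh load from disk
system.loaded["QuickTime"][0] ^= 0x01            # bad RAM flips one bit
relaunch = system.launch(["QuickTime"])          # reuses the corrupted copy
other_app = system.launch(["QuickTime", "Network"])  # a "related" app fails too
unrelated = system.launch(["Network"])           # an app not using it is fine
system.reboot()
after_reboot = system.launch(["QuickTime"])      # clean copy loaded again

print(first_run, relaunch, other_app, unrelated, after_reboot)
```

In this model the relaunch fails for the same reason the first crash happened, and anything else that shares the bad library goes down with it, which matches the "waves" I was seeing.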



So I grabbed an AppleCare CD from our friendly IT dept on campus - no dice. It reported the RAM as fine. A little digging online though, and I found out that the AppleCare tools are a bit lacking in certain RAM tests, specifically refresh tests. Reports were, however, that Newer Tech's Gauge Pro *would* run the refresh tests... and guess what it found. Bad RAM. [1]



Apparently the drive controller was being a good little chip and writing to disk exactly what was asked of it - it just happened to be completely bogus data from memory.



Gauge Pro only runs under 9, so you have to boot into 9 directly (no Classic), but it just saved me from having to buy a new laptop. (Dammit.)



So, if you're seeing constant crashes, if you're seeing data corruption, if you're seeing the '0x08 (undefined)' or 'keys out of order' errors... go get Gauge Pro and check your RAM. [2] It's certainly the cheapest component to replace right now.



[1] In fact, I was able to deduce, by looking at the errors it was getting, that if address bit 16 was set, then bit 23 of the data value would flip, but if address bit 16 was clear, then bit 15 of the value would flip instead. I wasn't able to track down precisely which addresses triggered it, but there were definite patterns: all failing addresses ending in '471C' were in either the '09xx' or '07xx' blocks, while those ending in '1434' were always in the '04xx' or '06xx' blocks. To heck with it - I'm just buying new RAM.
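If you want to do the same kind of digging on your own test results, here's a rough sketch in Python. The sample failures are made up to mirror the pattern described above (they are not Gauge Pro's real output or format), and the function names are my own. The idea is just to XOR what was written against what was read back, then group the flipped bits by whether a given address bit was set.

```python
from collections import Counter, defaultdict


def flipped_bits(expected, actual):
    """Return the bit positions where the value read back differs from the value written."""
    diff = expected ^ actual
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


def group_by_address_bit(failures, address_bit=16):
    """Count flipped data bits, split by whether `address_bit` is set in the failing address."""
    groups = defaultdict(Counter)
    for address, expected, actual in failures:
        is_set = bool((address >> address_bit) & 1)
        groups[is_set].update(flipped_bits(expected, actual))
    return {key: dict(counts) for key, counts in groups.items()}


# Hypothetical failures: (address, value written, value read back).
failures = [
    (0x0901471C, 0xFFFFFFFF, 0xFF7FFFFF),  # address bit 16 set, data bit 23 flipped
    (0x0701471C, 0x00000000, 0x00800000),  # address bit 16 set, data bit 23 flipped
    (0x04001434, 0x00000000, 0x00008000),  # address bit 16 clear, data bit 15 flipped
    (0x06001434, 0xFFFFFFFF, 0xFFFF7FFF),  # address bit 16 clear, data bit 15 flipped
]

summary = group_by_address_bit(failures)
print(summary)
```

With real data the split won't be this tidy, but if one data bit dominates each group, that's a decent hint the module itself is at fault.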



[2] Newer Tech is currently in the middle of a business reorg, so their web site is temporarily down. versiontracker.com just links (linked) to their site, so that link is broken, but I was able to find a copy at macupdate.com.