Apple pushes devs to deliver 64-bit support with new Snow Leopard beta

melgross · April 10, 2009 11:31AM

There isn't really any reason for developers to write new, 32 bit apps, or at least, there won't be in a couple of years. By then there will be no PPC machines running NEW apps, and few that will be running older, but upgraded apps. There simply won't be many PPC machines left as a percentage of the Mac environment.

The number of Yonah, 32 bit machines, will have dwindled as well.

With computer power rising as it has, even if a small app won't directly benefit from 64 bits, there won't be any real overhead either.

I would think that in a few years, most apps will have moved to 64 bits, just as most moved, earlier, over to 16, and then to 32.

It will be nice to see everything on the same path for a time.

Then, who knows?

hiro · April 10, 2009 12:38PM

Quote:

Originally Posted by MisatosAngel

*32 bits addendum:

For those saying everything will go 64bit; that will (eventually) be true for drivers etc as ultimately 32bit kernels will die out.

However for user-space applications going 64bit, this is less likely, and why Apple probably won't care to market a 64bit slogan (or get it confused with `Universal'). 32bit apps will continue to run, and frequently run faster than their 64bit counter-parts. This depends of course on what the application needs to do. If it does need more than 32bits worth of RAM, it needs to go 64bit - end of story. Otherwise the reasons are not as compelling as have been made out earlier.

It is true that x64 gives access to extra general purpose registers, and this does mean fewer load/stores to shuffle numbers around from [disk to ram to] cache to register and hence faster execution. However, it also means that every time the program is chasing a pointer (and some programs that have to deal with large linked lists or tree structure have to do this a lot) are having to operate on 64bit values. This is (comparatively) slow. Too much of this will far outweigh the benefits of the extra registers.

For as long as 32bit has its own subsystem and does not need emulation in software to run, there is little reason for Apple to push/promote 64bit user-space applications.

32-bit will only hold a memory performance advantage for awhile. As memory bandwidth continues to get better, the difference will fall to irrelevance as latency will once again move to the fore as the real problem and then the extra GP registers will inexorably crush 32-bit performance. This world is only 2-3 years away, so yes, there is a reason for Apple and developers to move to 64-bit across the board -- memory bandwidth improvements making previous assumptions about the benefits of smaller words and pointer sized obsolete.

misatosangel · April 10, 2009 12:57PM

Quote:

Originally Posted by hmurchison

The hubbub i'm reading from Devs is that if you don't deliver 64-bit applications you risk being the 32-bit app that causes ALL the 32-bit library to load eating up RAM and other resources.

If it's true that OS X must load all the 32-bit library upon launching a 32-bit app I think there will be a chorus of people wanting everything to be 64-bit just from an efficiency/resource perspective.

Quote:

Originally Posted by melgross

There isn't really any reason for developers to write new, 32 bit apps, or at least, there won't be in a couple of years. By then there will be no PPC machines running NEW apps, and few that will be running older, but upgraded apps. There simply won't be many PPC machines left as a percentage of the Mac environment.

The number of Yonah, 32 bit machines, will have dwindled as well.

With computer power rising as it has, even if a small app won't directly benefit from 64 bits, there won't be any real overhead either.

I would think that in a few years, most apps will have moved to 64 bits, just as most moved, earlier, over to 16, and then to 32.

It will be nice to see everything on the same path for a time.

Then, who knows?

The answer to both of these posts is a bit of research. Ultimately it is very easy marketing speak to say `More bits = better' in just the same way as intel used to say `more Mhz = better'. Neither of these is strictly true.

What is mattering more and more these days is computing power in the number of instructions the core(s) can get through a second (which is very different from the clock speed of your cpu - i.e. G/Mhz) ; and the amount of power it drains/dissipates. Even Intel are finally realising this - hence their start on lower powers cores like their atom in an attempt to get in on the market that companies like ARM and MIPS have traditionally dominated.

However, back to your posts. Let us look at what a simplistic address-book like program might do:

Say I store contacts which have

first name

last name

date of birth

company name

phone numbers

In my program's layout this might look like the following in a 32bit execution environment assuming I decided to store the `cards' in memory as a 2-way linked list (I don't want to get overly technical here, we could consider hash maps or index trees or other implementations which might scale even more badly memory-wise)

32-bit pointer-to-first-record (which contains)

32-bit pointer-to-end-record

(32bit) pointer-to-first-name-string -> "Misato"

(32bit) pointer-to-last-name-string -> "Katsuragi"

(32-bit integer) date of birth

(32bit) pointer-to-company-name -> "NERV"

(32bit) pointer-to-phone-array (let's say that's empty for simplicity)

(32bit) pointer-to-next-record

(32bit) pointer-to-previous-record

Allowing for the strings to be encoded in utf8 - that will be 8 bits (1 byte) per letter.

So the above totals 6 pointers at 32bits (4 bytes) each per empty record plus the space used by the text. Each additional phone number would be another pointer and a compacted string, but we'll ignore them since it would make things more complicated.

So above we have 19 textual characters (19 bytes) - I'll ignore encoding artefacts like null terminators etc) and 24 bytes of pointers. That's 43bytes. Add to that our top and bottom pointers (another 2*32 = 64bits = 8 bytes) and we have 51 bytes.

Now in a 64 bit environment, all of those pointers suddenly double in size.

So then we have our 6+2 pointers = 8 pointers x 8 bytes for each one - 64 bytes, plus the 19 bytes of characters = 83 bytes.

Ugh, so we've gone from 51 bytes to 83 bytes (over half as much again) to store exactly the same information. Hurray for 64bit...

Every time I need to chase one of those pointers, it is going to take me longer as well.

My operating system will need to allocate me more memory to hold it all, which means I need more real memory in the machine in order to have the same `effective' memory as I did when I Was 32bit.

Finally, the book-keeping the operating system needs to do for each of these pointers increases in size and complexity in the same way (this is a hidden cost too complicated to go into detail about here).

Likewise with your cache, it means you need a larger cache to have the same useful cache space as you had before. A larger cache draws more power and takes longer to synchronise amongst multiple cores. Taking longer also equates to more effort for the cpus/bus to keep everything synchronised with main memory, and the equates to drawing more power.

To draw a less technical comparison. Supposing you have a library with all its books on index cards. You give each shelf of books a number and each book a number. However, people want more books. So you get more shelves in order to hold more books.

Then you realise that in order to have these extra shelves and the books that you have one them, you need to increase the size of your index cards. This means you have to invest in larger filling cabinets in order to just hold these larger cards (let alone all the new ones).

Then when you buy the extra index cards, you need yet more of these large filing cabinets to hold them. The larger filing cabinets then eat into the space where you were putting the new shelves to store the extra books!

So now you need a larger room, and that means making the house bigger... You then need to feed electric power to this new part of the room where the extra shelves are that wasn't previously needed.

In Summary

64bit is not great. It is not a silver bullet to solve computing problems. It is a must-have for those programs which require more memory than their operating system can allow them to address with a 32bit pointer. Databases, Photoshop (and the like) are good examples of these. Little programs are not, they might even suffer badly if they are compiled 64bit native, resulting in taking up more of your memory and hence draining your power faster.

There will always be a very real overhead in power terms dealing with all this extra house-keeping information associated with larger indexes. Yes as computers evolve, things may continue to speed up, but the future is not in more Mhz, it is in increased parallelism of execution through potentially less powerful (and hence less power hungry) cores.

This in itself comes with its own challenges, because programming an application to take advantage of the innate parallelism of the system is extremely hard. It is where OS-X has a chance to really shine if Apple gets its new technologies in snow leopard right, as it will mean the OS does that little bit more to help.

So 32 bit is not necessarily about keeping compatibility with existing cpus (be they ppc or x86). It may also be about increased efficiency and decreased power-consumption compared to a direct 64bit re-compile of the same thing. Of course this varies depending on the underlying hardware and what your program actually does. The rules are a quite different amongst ppc and x86, and ARM and embedded 8bit micro-controllers for that matter too -even 16bit would be far too big there.

For x86 at least, the trade-off is how much you need those extra registers, vs how much pointer chasing your program does. For ppc, there are only the downsides.

So as to `being the one app that forces all the 32bit libraries to load' this is a very generalised way of looking at things. How many of these 32bit libraries are required depends on what the program does - and at least in the meantime, the RAM requirements of running large numbers of pointer-heavy 64bit programs may be far worse.

If your program runs faster as 32bit, why invest in the effort of change? Programs like Office will contain so much legacy 64bit-dirty code that I would be very surprised if they went 64bit any time soon; and can you imagine Word running even more slowly, and using even more RAM... just to show an empty document?

This is not what you want on an embedded device like the iPhone - and it is those and netbooks that are going to be the main computers of the future - hot hulking desktops.

melgross · April 10, 2009 1:17PM

Quote:

Originally Posted by MisatosAngel

The answer to both of these posts is a bit of research. Ultimately it is very easy marketing speak to say `More bits = better' in just the same way as intel used to say `more Mhz = better'. Neither of these is strictly true.

What is mattering more and more these days is computing power in the number of instructions the core(s) can get through a second (which is very different from the clock speed of your cpu - i.e. G/Mhz) ; and the amount of power it drains/dissipates. Even Intel are finally realising this - hence their start on lower powers cores like their atom in an attempt to get in on the market that companies like ARM and MIPS have traditionally dominated.

However, back to your posts. Let us look at what a simplistic address-book like program might do:

Say I store contacts which have

first name

last name

date of birth

company name

phone numbers

In my program's layout this might look like the following in a 32bit execution environment assuming I decided to store the `cards' in memory as a 2-way linked list (I don't want to get overly technical here, we could consider hash maps or index trees or other implementations which might scale even more badly memory-wise)

32-bit pointer-to-first-record (which contains)

32-bit pointer-to-end-record

(32bit) pointer-to-first-name-string -> "Misato"

(32bit) pointer-to-last-name-string -> "Katsuragi"

(32-bit integer) date of birth

(32bit) pointer-to-company-name -> "NERV"

(32bit) pointer-to-phone-array (let's say that's empty for simplicity)

(32bit) pointer-to-next-record

(32bit) pointer-to-previous-record

Allowing for the strings to be encoded in utf8 - that will be 8 bits (1 byte) per letter.

So the above totals 6 pointers at 32bits (4 bytes) each per empty record plus the space used by the text. Each additional phone number would be another pointer and a compacted string, but we'll ignore them since it would make things more complicated.

So above we have 19 textual characters (19 bytes) - I'll ignore encoding artefacts like null terminators etc) and 24 bytes of pointers. That's 43bytes. Add to that our top and bottom pointers (another 2*32 = 64bits = 8 bytes) and we have 51 bytes.

Now in a 64 bit environment, all of those pointers suddenly double in size.

So then we have our 6+2 pointers = 8 pointers x 8 bytes for each one - 64 bytes, plus the 19 bytes of characters = 83 bytes.

Ugh, so we've gone from 51 bytes to 83 bytes (over half as much again) to store exactly the same information. Hurray for 64bit...

Every time I need to chase one of those pointers, it is going to take me longer as well.

My operating system will need to allocate me more memory to hold it all, which means I need more real memory in the machine in order to have the same `effective' memory as I did when I Was 32bit.

Finally, the book-keeping the operating system needs to do for each of these pointers increases in size and complexity in the same way (this is a hidden cost too complicated to go into detail about here).

Likewise with your cache, it means you need a larger cache to have the same useful cache space as you had before. A larger cache draws more power and takes longer to synchronise amongst multiple cores. Taking longer also equates to more effort for the cpus/bus to keep everything synchronised with main memory, and the equates to drawing more power.

To draw a less technical comparison. Supposing you have a library with all its books on index cards. You give each shelf of books a number and each book a number. However, people want more books. So you get more shelves in order to hold more books.

Then you realise that in order to have these extra shelves and the books that you have one them, you need to increase the size of your index cards. This means you have to invest in larger filling cabinets in order to just hold these larger cards (let alone all the new ones).

Then when you buy the extra index cards, you need yet more of these large filing cabinets to hold them. The larger filing cabinets then eat into the space where you were putting the new shelves to store the extra books!

So now you need a larger room, and that means making the house bigger... You then need to feed electric power to this new part of the room where the extra shelves are that wasn't previously needed.

In Summary

64bit is not great. It is not a silver bullet to solve computing problems. It is a must-have for those programs which require more memory than their operating system can allow them to address with a 32bit pointer. Databases, Photoshop (and the like) are good examples of these. Little programs are not, they might even suffer badly if they are compiled 64bit native, resulting in taking up more of your memory and hence draining your power faster.

There will always be a very real overhead in power terms dealing with all this extra house-keeping information associated with larger indexes. Yes as computers evolve, things may continue to speed up, but the future is not in more Mhz, it is in increased parallelism of execution through potentially less powerful (and hence less power hungry) cores.

This in itself comes with its own challenges, because programming an application to take advantage of the innate parallelism of the system is extremely hard. It is where OS-X has a chance to really shine if Apple gets its new technologies in snow leopard right, as it will mean the OS does that little bit more to help.

So 32 bit is not necessarily about keeping compatibility with existing cpus (be they ppc or x86). It may also be about increased efficiency and decreased power-consumption compared to a direct 64bit re-compile of the same thing. Of course this varies depending on the underlying hardware and what your program actually does. The rules are a quite different amongst ppc and x86, and ARM and embedded 8bit micro-controllers for that matter too -even 16bit would be far too big there.

For x86 at least, the trade-off is how much you need those extra registers, vs how much pointer chasing your program does. For ppc, there are only the downsides.

So as to `being the one app that forces all the 32bit libraries to load' this is a very generalised way of looking at things. How many of these 32bit libraries are required depends on what the program does - and at least in the meantime, the RAM requirements of running large numbers of pointer-heavy 64bit programs may be far worse.

If your program runs faster as 32bit, why invest in the effort of change? Programs like Office will contain so much legacy 64bit-dirty code that I would be very surprised if they went 64bit any time soon; and can you imagine Word running even more slowly, and using even more RAM... just to show an empty document?

This is not what you want on an embedded device like the iPhone - and it is those and netbooks that are going to be the main computers of the future - hot hulking desktops.

'Murch and I are both very aware of those issues.

The point is that it doesn't matter. None of it matters.

There was a time where we were processor constrained, as well as memory bus constrained. Even for single cpu dual core machines, those days are fast receding.

By sometime next year, it won't matter whether programs are 64 bits or 32 bits from a run speed perspective.

What you're not thinking about is that many programs are fast enough already. Nehalem allows much more speed if programs are properly threaded than ever before. Whether a program can't string enough instructions together for 64 bits isn't a concern. Proper programming is a concern, and more developers are going to be forced into it.

There's nothing about 64 bits that a minus, and plenty that's a plus.

no one here, except you, is talking about 64 bit for the iPhone/iTouch. At some point even that will come, but not yet.

dfiler · April 10, 2009 2:09PM

One aspect I find interesting about this transition to 64bits is that it differs slightly from when we switched to 16 or 32 bits.

The data sets that most people deal with on their computers fit easily into 32 bits. 4.3 billion is generally enough to enumerate just about anything, from most CAD environments to my own bank account. (I could always hope though right?

)

16 bits wasn't enough for most data sets. 65,536 just isn't that big of a number. So the switch to 32 bits was beneficial to enough tasks that it made sense to do it across the board. 64 bits isn't as enticing for most tasks. Most tasks don't need quick access to 18,446,744,100,000,000,000 items or that level of precision. (Hopefully my math is right on that one. My calculator doesn't go that high)

So I'm wondering if the future of computing will be that of continuing to support both 32 and 64 bits. Will the overhead be trivial in the end or will people always want to eek a bit more performance out of their computers by doing simple tasks in 32 bits?

melgross · April 10, 2009 2:37PM

Quote:

Originally Posted by dfiler

One aspect I find interesting about this transition to 64bits is that it differs slightly from when we switched to 16 or 32 bits.

The data sets that most people deal with on their computers fit easily into 32 bits. 4.3 billion is generally enough to enumerate just about anything, from most CAD environments to my own bank account. (I could always hope though right?

)

16 bits wasn't enough for most data sets. 65,536 just isn't that big of a number. So the switch to 32 bits was beneficial to enough tasks that it made sense to do it across the board. 64 bits isn't as enticing for most tasks. Most tasks don't need quick access to 18,446,744,100,000,000,000 items or that level of precision. (Hopefully my math is right on that one. My calculator doesn't go that high)

So I'm wondering if the future of computing will be that of continuing to support both 32 and 64 bits. Will the overhead be trivial in the end or will people always want to eek a bit more performance out of their computers by doing simple tasks in 32 bits?

I'm not sure if it matters that people may not need it. But there are other problems as well. With 32 bits, you can only support a file size of 4 GB. When doing video, we have files that can easily extend into the tens of GB. Look at digital cameras that do video, the time are constrained by that 32 bit file system used. On my new Canon 5D mkII, I can buy a 16 GB card, but can only take video in 4 GB bites, which only allows about 13 minutes in HD 1080p. A 64 bit file system (camera cards are written as FAT32) would remove that constraint. It's not that simplistic for computers, but there are similar problems.

Even though you are right about not everyone needing 64 bits, XCode will make it so easy to compile for this, as it does now, that doing a program as 64 bit will be a push of a key (I'm sure someone is thinking that it's not quite THAT simple).

Doing 64 bit won't hurt, so why not?

chipz · April 10, 2009 2:39PM

Regardless of what you or anybody else wants to call it, Apple has a lot of educating to do in the future. It will have to educate people what will and will not run the new OS. I hope they're ready for this, because it seems the new OS is far from ready to be rolled out.

misatosangel · April 10, 2009 2:42PM

Quote:

Originally Posted by melgross

'Murch and I are both very aware of those issues.

The point is that it doesn't matter. None of it matters.

There was a time where we were processor constrained, as well as memory bus constrained. Even for single cpu dual core machines, those days are fast receding.

By sometime next year, it won't matter whether programs are 64 bits or 32 bits from a run speed perspective.

What you're not thinking about is that many programs are fast enough already. Nehalem allows much more speed if programs are properly threaded than ever before. Whether a program can't string enough instructions together for 64 bits isn't a concern. Proper programming is a concern, and more developers are going to be forced into it.

There's nothing about 64 bits that a minus, and plenty that's a plus.

no one here, except you, is talking about 64 bit for the iPhone/iTouch. At some point even that will come, but not yet.

My apologies - I was not trying to imply you were not aware of any particular issue(s) regarding 32 and 64bit, I'm trying to write for the leyman to understand. If you want to have a deeply technical discussion we can exchange email addresses

For my part, reading that it sounds to me like you are stating quite a few things which I would like to hold up [citation needed], but I do not want to get into this

Most of what you say, I do agree with.

Yes, it is definitely true that as time goes on, overall system performance improves, and what is or is not the bottle-neck also changes over time. Branch-prediction, multi execution units, out-of-order execution, pipelining, reorder buffers and a host of acronyms I won't bore the reader with all help shift the load about from one part to another.

Some processor instruction sets are better at naturally letting the software (usually via a clever compiler rather than hand-crafted assembly) utilise this more than others. For some hardware setups, it can even be better to let the hardware handle it anyway.

However it doesn't change the fact that more data is being moved around and this takes more time, wastes more space and chews more power compared to an equivalent program with 32bit data on theoretically otherwise identical hardware. This was my point, and even as hardware gets newer, that fundamentally does not change. It does of course fall down flat once you can't get `theoretically otherwise identical hardware'. Once Intel have optimised their motherboards for 64bit, they may well just drop the entire 32bit subsystem in a CPU re-design, saving more power and at that point, 32bit will be dead.

But in the meantime too yes, I am aware of and agree that for the vast majority of user-space programs they run `fast enough'. Nobody is going to mind particularly about fractions of a second. Given time, nobody will care about another 100MB of memory being used. We all get used to number inflation.

However, I did not want to bring that up in my previous wall of text as there was enough text already. It isn't very interesting or relevant to the point I was making, which was merely demonstrating the increased memory and power usage of programs as they go 64bit (please re-read my summary!)

Putting aside 32v64 for a moment, what is interesting though is the scalability of some of these programs. If (for example) you have a program with a sub-optimal sort algorithm, it may not be noticeable for (say) ordering 1000 emails. However, as things get bigger and you now have say 100,000 emails, suddenly that extra 0.01 second on each email might be starting to make a difference that is very visible to the end-user. There are far too many programmers that ignore basic complexity theory, and many of the text UI widgits in cocoa in mac-os-x are prime contenders for this.

There is the danger that developers will be sold 64bit and more memory as the cure for all evils (as I was saying, before, it isn't). Picking a technical example: supposing I want to load that 4GB trace dump into my editor. With 64bits, there's the danger that the developer might actually really load the whole lot into his or her newly expanded address space; whilst the right solution is to stream the information as required from the file on disk.

As programs do deal with larger and larger data sets, more smarts will be required to respond quickly. As you say, `proper' programming is a concern. However, there are very few developers who can deal with it. Threading can be incredibly difficult to get right (and to debug when it frequently is not). Picking an optimal algorithm for solving your problem is also very tricky, but (IMHO) often fun - that's where the real computer science is; rather than the donkey work of coding.

More worryingly, companies are usually not prepared to pay for the sorts of developers who will even (for example) stop and think about how big a file is before loading it all into memory from disk... Or even whether it is a file at all; rather than a network socket or pipe. Performance is rarely of concern in commercial software. Prime examples include Office 2008 and Adobe's Mac incarnation of their Flash web-plug-in.

After all, consumers pay for bigger numbers (ZOMG 64bit!!!1one1!1eleven <_<) and more features. They (or most) don't want to pay for faster, stable and more efficient. They feel that should be free, when actually that's often much harder than the new features in the first place. (Which ties in with my first post about actually fundamentally understanding the problem)

Finally, re the iPhone - there I said `and it is those and netbooks that are going to be the main computers of the future'. I was not talking about now, the iPhone isn't currently 64bit. However, netbooks are coming out now and the Atom chip that Intel is desperately pushing to get into some of them over an ARM variant is capable in some forms of being 64bit. The Atom's power draw (TDP as they call it) is shoddy enough already compared to an ARM's without it getting worse by being pushed into 64bit world for marketing reasons. And does your web-browser really need 64bits? Since web pages are DOM-trees that is another example of potentially ghastly pointer-tastic memory usage.

One thing which is (IMHO) still true - and what I was trying to get across in my last post is that it is the ever rising power consumption that is the problem. One technology that really has not scaled with the rest of computer architecture is battery technology. Because computers are all going mobile/mid/netbook/smart-phone - this is where the main throttle is going to be on the ever expanding motherboard improvements we both talk about.

Personally I think that desktop will for the vast majority of end users be consigned to the dust-bin (or at least the home attic) for just taking up too much room and not being portable.

melgross · April 10, 2009 2:43PM

Quote:

Originally Posted by chipz

Regardless of what you or anybody else wants to call it, Apple has a lot of educating to do in the future. It will have to educate people what will and will not run the new OS. I hope they're ready for this, because it seems the new OS is far from ready to be rolled out.

I doubt that'll be much of a problem. 32 bit programs will run. Pretty much everything will run.

Apple does this very well.

Most consumers will never know if they have a 32 bit, or a 64 bit program.

misatosangel · April 10, 2009 2:47PM

Quote:

Originally Posted by melgross

I'm not sure if it matters that people may not need it. But there are other problems as well. With 32 bits, you can only support a file size of 4 GB. When doing video, we have files that can easily extend into the tens of GB. Look at digital cameras that do video, the time are constrained by that 32 bit file system used. On my new Canon 5D mkII, I can buy a 16 GB card, but can only take video in 4 GB bites, which only allows about 13 minutes in HD 1080p. A 64 bit file system (camera cards are written as FAT32) would remove that constraint. It's not that simplistic for computers, but there are similar problems.

Even though you are right about not everyone needing 64 bits, XCode will make it so easy to compile for this, as it does now, that doing a program as 64 bit will be a push of a key (I'm sure someone is thinking that it's not quite THAT simple).

Doing 64 bit won't hurt, so why not?

This is actually just the misconception I was talking about (didn't notice since you wrote it whilst I was posting before).

You do not need a 64bit computer to run 64bit file system. Heck, you can use ZFS (a 128bit file system) on a 32bit box.

You can also quite happily deal with values larger than 2^32 on 32bit hardware - otherwise we'd never have been able to crank up all those appalling financial losses

This canon camera you talk about is exactly mimicking my example of bad programming - if it were to stream the file out as it was recording, it would have no reason to try and keep a 4GB file in memory for the entire direction of the shoot - which is an idiotic thing to be doing.

sigh

And yes, how simple a 64bit re-compile is depends on the state of the code and assumptions made by the coders as they were programming. It could be simple, it could be horrendous.

melgross · April 10, 2009 3:04PM

Quote:

Originally Posted by MisatosAngel

My apologies - I was not trying to imply you were not aware of any particular issue(s) regarding 32 and 64bit, I'm trying to write for the leyman to understand. If you want to have a deeply technical discussion we can exchange email addresses

For my part, reading that it sounds to me like you are stating quite a few things which I would like to hold up [citation needed], but I do not want to get into this

Most of what you say, I do agree with.

Yes, it is definitely true that as time goes on, overall system performance improves, and what is or is not the bottle-neck also changes over time. Branch-prediction, multi execution units, out-of-order execution, pipelining, reorder buffers and a host of acronyms I won't bore the reader with all help shift the load about from one part to another.

Some processor instruction sets are better at naturally letting the software (usually via a clever compiler rather than hand-crafted assembly) utilise this more than others. For some hardware setups, it can even be better to let the hardware handle it anyway.

However it doesn't change the fact that more data is being moved around and this takes more time, wastes more space and chews more power compared to an equivalent program with 32bit data on theoretically otherwise identical hardware. This was my point, and even as hardware gets newer, that fundamentally does not change. It does of course fall down flat once you can't get `theoretically otherwise identical hardware'. Once Intel have optimised their motherboards for 64bit, they may well just drop the entire 32bit subsystem in a CPU re-design, saving more power and at that point, 32bit will be dead.

But in the meantime too yes, I am aware of and agree that for the vast majority of user-space programs they run `fast enough'. Nobody is going to mind particularly about fractions of a second. Given time, nobody will care about another 100MB of memory being used. We all get used to number inflation.

However, I did not want to bring that up in my previous wall of text as there was enough text already. It isn't very interesting or relevant to the point I was making, which was merely demonstrating the increased memory and power usage of programs as they go 64bit (please re-read my summary!)

Putting aside 32v64 for a moment, what is interesting though is the scalability of some of these programs. If (for example) you have a program with a sub-optimal sort algorithm, it may not be noticeable for (say) ordering 1000 emails. However, as things get bigger and you now have say 100,000 emails, suddenly that extra 0.01 second on each email might be starting to make a difference that is very visible to the end-user. There are far too many programmers that ignore basic complexity theory, and many of the text UI widgits in cocoa in mac-os-x are prime contenders for this.

There is the danger that developers will be sold 64bit and more memory as the cure for all evils (as I was saying, before, it isn't). Picking a technical example: supposing I want to load that 4GB trace dump into my editor. With 64bits, there's the danger that the developer might actually really load the whole lot into his or her newly expanded address space; whilst the right solution is to stream the information as required from the file on disk.

As programs do deal with larger and larger data sets, more smarts will be required to respond quickly. As you say, `proper' programming is a concern. However, there are very few developers who can deal with it. Threading can be incredibly difficult to get right (and to debug when it frequently is not). Picking an optimal algorithm for solving your problem is also very tricky, but (IMHO) often fun - that's where the real computer science is; rather than the donkey work of coding.

More worryingly, companies are usually not prepared to pay for the sorts of developers who will even (for example) stop and think about how big a file is before loading it all into memory from disk... Or even whether it is a file at all; rather than a network socket or pipe. Performance is rarely of concern in commercial software. Prime examples include Office 2008 and Adobe's Mac incarnation of their Flash web-plug-in.

After all, consumers pay for bigger numbers (ZOMG 64bit!!!1one1!1eleven <_<) and more features. They (or most) don't want to pay for faster, stable and more efficient. They feel that should be free, when actually that's often much harder than the new features in the first place. (Which ties in with my first post about actually fundamentally understanding the problem)

Finally, re the iPhone - there I said `and it is those and netbooks that are going to be the main computers of the future'. I was not talking about now, the iPhone isn't currently 64bit. However, netbooks are coming out now and the Atom chip that Intel is desperately pushing to get into some of them over an ARM variant is capable in some forms of being 64bit. The Atom's power draw (TDP as they call it) is shoddy enough already compared to an ARM's without it getting worse by being pushed into 64bit world for marketing reasons. And does your web-browser really need 64bits? Since web pages are DOM-trees that is another example of potentially ghastly pointer-tastic memory usage.

One thing which is (IMHO) still true - and what I was trying to get across in my last post is that it is the ever rising power consumption that is the problem. One technology that really has not scaled with the rest of computer architecture is battery technology. Because computers are all going mobile/mid/netbook/smart-phone - this is where the main throttle is going to be on the ever expanding motherboard improvements we both talk about.

Personally I think that desktop will for the vast majority of end users be consigned to the dust-bin (or at least the home attic) for just taking up too much room and not being portable.

I'm not disagreeing with your technical points. I just see them being less of a concern going forward.

As far as run times are concerned, and threading, well, that's what Grand Central and Open CL is coming out to help with. They surely won't solve all problems, but they will help the lazy, or less talented, programmers do more with less.

As we both agree, faster machines, more RAM etc. will hide many of these lessor problems from the user, and that's really what it's all about.

When I compare my new 2.66 dual Mac Pro to my old dual 2 GHz G5, I see that with the same work, and a program optimized for both machines, the G5 pegs both cores, while the Mac Pro barely reaches 25% on two cores, and the rest are just peeping in, so to speak, every so often.

When the e-mail program is properly threaded, this problem will go away. The speedup will be much greater than any theoretical slowdowns caused by 64 bits.

When Apple switches the iMac to Nehalem, and the laptops as well, we'll see another major jump in power. Even the Mini will be more powerful than the fastest machine Apple had two or three years ago.

In a year, or two, most all machines will be running off 4 core 64 bit cpu's that are much faster than today, with much more memory.

People will be pleased at how much faster they run properly (and more easily) coded programs.

It hasn't been shown anywhere that I've seen, that going from 32 to 64 bits results in more than a small slowdown, at most. Assuming that the program is decently coded, of course. But I would never use as a standard, poorly coded programs, or use them as an example of how bad it could be. Poor code is inexcusable at all times. Companies that execute poor code will be hurt by their competitors who are rushing to get it right.

Right now, and into the future, multiple cores will be the bridge to speed. Everyone will move to this as much as their type of program will allow. Apple is making that move easier. That's why I'm not concerned about 32 vs 64 bit speed differences, if there prove to be any.

melgross · April 10, 2009 3:07PM

Quote:

Originally Posted by MisatosAngel

This is actually just the misconception I was talking about (didn't notice since you wrote it whilst I was posting before).

You do not need a 64bit computer to run 64bit file system. Heck, you can use ZFS (a 128bit file system) on a 32bit box.

You can also quite happily deal with values larger than 2^32 on 32bit hardware - otherwise we'd never have been able to crank up all those appalling financial losses

This canon camera you talk about is exactly mimicking my example of bad programming - if it were to stream the file out as it was recording, it would have no reason to try and keep a 4GB file in memory for the entire direction of the shoot - which is an idiotic thing to be doing.

sigh

And yes, how simple a 64bit re-compile is depends on the state of the code and assumptions made by the coders as they were programming. It could be simple, it could be horrendous.

We haven't been able to because the programs haven't allowed it. Believe me, I've gone through this over the years. There are problems, even on the Mac.

The streaming is to the compact flash card. It isn't poor programming. It can't be done. Period!

Manufacturers are talking about dropping FAT32 and moving to a 64 bit file system, which IS required.

misatosangel · April 10, 2009 5:09PM

Quote:

Originally Posted by melgross

We haven't been able to because the programs haven't allowed it. Believe me, I've gone through this over the years. There are problems, even on the Mac.

The streaming is to the compact flash card. It isn't poor programming. It can't be done. Period!

Manufacturers are talking about dropping FAT32 and moving to a 64 bit file system, which IS required.

Hmm, so you work in this area? I wasn't talking about the Mac here. Mac-OS's kernel disk I/O is pretty darn shoddy sadly (well compared to linux). But yes, I fully agree that Microsoft's FAT32 has had it's day, and why it was ever chosen as the `de-facto standard' file-system for embedded stuff beats me. Okay, I lie, I do know why, but it still annoys me a lot. I was pointing out though that the bit-ness of your file-system and CPU are completely independent.

As to streaming to flash being impossible I think I'm going to pick you up there. If only for asking what are you doing recording HD on such an expensive toy to a tiny little flash disk?

Most people if they're going to splash on something that expensive for 1080p are going to be carrying around some nice fast UW-SCSI or FW2 hard-disk to stream to. Although perhaps that model doesn't support it.

You're right though - I was hasty in my assertion (I saw the 4GB figure and thought `I'd just said that!') - compact flash doesn't currently have the sustained write speed for streaming that kind of stuff... Tops out around 50MB/s ATOW I think (which would be enough for 720p, depending on frame rate) - with most camera cards being more like 9 MB/s. But I think it will come in a year or so as there's a lot of demand not to have to lug about large fast disks with your film camera. Larger 1U Flash disks arrays (at least over UW-SCSI or fibre) can certainly accommodate it though, and even individual 3.5" disks are breaking the 100MB/s.

Should have waited a year or two before you bought!

melgross · April 10, 2009 5:26PM

Quote:

Originally Posted by MisatosAngel

Hmm, so you work in this area? I wasn't talking about the Mac here. Mac-OS's kernel disk I/O is pretty darn shoddy sadly (well compared to linux). But yes, I fully agree that Microsoft's FAT32 has had it's day, and why it was ever chosen as the `de-facto standard' file-system for embedded stuff beats me. Okay, I lie, I do know why, but it still annoys me a lot. I was pointing out though that the bit-ness of your file-system and CPU are completely independent.

As to streaming to flash being impossible I think I'm going to pick you up there. If only for asking what are you doing recording HD on such an expensive toy to a tiny little flash disk?

Most people if they're going to splash on something that expensive for 1080p are going to be carrying around some nice fast UW-SCSI or FW2 hard-disk to stream to. Although perhaps that model doesn't support it.

I didn't say that streaming to flash was impossible. I said that they save to a flash card, and that the file limits are 4 GB no matter how large the file is.

Quote:

You're right though - I was hasty in my assertion (I saw the 4GB figure and thought `I'd just said that!') - compact flash doesn't currently have the sustained write speed for streaming that kind of stuff... Tops out around 50MB/s ATOW I think (which would be enough for 720p, depending on frame rate) - with most camera cards being more like 9 MB/s. But I think it will come in a year or so as there's a lot of demand not to have to lug about large fast disks with your film camera. Larger 1U Flash disks arrays (at least over UW-SCSI or fibre) can certainly accommodate it though, and even individual 3.5" disks are breaking the 100MB/s.

Should have waited a year or two before you bought!

The fastest UDMA 5 and 6 cards are fast enough for 1080p, using H.264 compression.

Later this year, or early next, we will be getting SATA flash cards.

hiro · April 11, 2009 12:28PM

Quote:

Originally Posted by MisatosAngel

Now in a 64 bit environment, all of those pointers suddenly double in size.

So then we have our 6+2 pointers = 8 pointers x 8 bytes for each one - 64 bytes, plus the 19 bytes of characters = 83 bytes.

Ugh, so we've gone from 51 bytes to 83 bytes (over half as much again) to store exactly the same information. Hurray for 64bit...

Every time I need to chase one of those pointers, it is going to take me longer as well.

My operating system will need to allocate me more memory to hold it all, which means I need more real memory in the machine in order to have the same `effective' memory as I did when I Was 32bit.

Finally, the book-keeping the operating system needs to do for each of these pointers increases in size and complexity in the same way (this is a hidden cost too complicated to go into detail about here).

Likewise with your cache, it means you need a larger cache to have the same useful cache space as you had before. A larger cache draws more power and takes longer to synchronise amongst multiple cores. Taking longer also equates to more effort for the cpus/bus to keep everything synchronised with main memory, and the equates to drawing more power.

Technically, considering todays shipping hardware I have no arguments with your facts and I don't think anyone else does either. The problem is the conclusion you are reaching and applying into the near future.

Cache and RAM size issues you cite are red-herrings for 42nm and smaller process sizes. There is so much real estate on a die now that the designers cannot fill it with cores, so they are filling it with L2 and soon L2 & L3 cache! Not to mention moving the memory controller to the CPU die. All in a vain attempt to fill the available die space with transistors that actually do something continuously. Yes I said vain attempt. Most of those transistors will end up sitting around relatively unused compared to the speeds the processor burns through logical time-space.

Projections of 256MB of 7ns latency L2 and a GB of 12-20ns latency L3 on die are only about two die shrinks away. Those caches far outstrip the cache space available two years ago and before on a per core basis--even for the projected 32 physical core with SMT (making 64 logical cores) die. And given that less than 5% of the code runs more than 95% of the time, those caches exponentially outstrip current needs extrapolated to the near to mid-future.

Like mel said, the problems are going away, well he said "hide", but I contend the problem is truly going away because the 32-bit general purpose CPU we see today will be unavailable anymore and future extrapolations about how fast we can load pointers on it compared to 64 bit architectures will be as meaningless as the conversation was in 1998 on the 16 to 32 bit pointer transition 8-10 years before that. The issues will still be relevant in the embedded space, but that is always a special case an the gents working there carefully choose what they need and then optimize as much as necessary.

Heretical statement: Threading is easy, OK not easy--but not hard either, as long as you design it in up front. Making threaded apps really fast is a bit harder, but much of that has to do with algorithm choice and data organization. Too many designers make the data too monolithic which leads to lock contention, and sketchy shortcuts which break the safety model injecting race condition bugs and then it starts to look hard. This last part is what most programmers deal with when they develop their fear of threading.

melgross · April 12, 2009 1:12AM

Quote:

Originally Posted by Hiro

Technically, considering todays shipping hardware I have no arguments with your facts and I don't think anyone else does either. The problem is the conclusion you are reaching and applying into the near future.

Cache and RAM size issues you cite are red-herrings for 42nm and smaller process sizes. There is so much real estate on a die now that the designers cannot fill it with cores, so they are filling it with L2 and soon L2 & L3 cache! Not to mention moving the memory controller to the CPU die. All in a vain attempt to fill the available die space with transistors that actually do something continuously. Yes I said vain attempt. Most of those transistors will end up sitting around relatively unused compared to the speeds the processor burns through logical time-space.

Projections of 256MB of 7ns latency L2 and a GB of 12-20ns latency L3 on die are only about two die shrinks away. Those caches far outstrip the cache space available two years ago and before on a per core basis--even for the projected 32 physical core with SMT (making 64 logical cores) die. And given that less than 5% of the code runs more than 95% of the time, those caches exponentially outstrip current needs extrapolated to the near to mid-future.

Like mel said, the problems are going away, well he said "hide", but I contend the problem is truly going away because the 32-bit general purpose CPU we see today will be unavailable anymore and future extrapolations about how fast we can load pointers on it compared to 64 bit architectures will be as meaningless as the conversation was in 1998 on the 16 to 32 bit pointer transition 8-10 years before that. The issues will still be relevant in the embedded space, but that is always a special case an the gents working there carefully choose what they need and then optimize as much as necessary.

Heretical statement: Threading is easy, OK not easy--but not hard either, as long as you design it in up front. Making threaded apps really fast is a bit harder, but much of that has to do with algorithm choice and data organization. Too many designers make the data too monolithic which leads to lock contention, and sketchy shortcuts which break the safety model injecting race condition bugs and then it starts to look hard. This last part is what most programmers deal with when they develop their fear of threading.

I have a program called MacPar deluxe. It does what you think it does.

The newer versions use multiple cores. The old version didn't. In decoding a 2 GB file with a LOT of pars (and errors it had to fix), it ate through it in just over 5 seconds on my dual 2.66 Mac Pro. The old version, which I tried, to see the difference, took 2 MINUTES!

This is why going from 32 bits to 64 is of little importance for many programs. It's the threading with multiple cores (and hyperthreading) that will matter.

lfmorrison · April 14, 2009 10:09AM

Quote:

Originally Posted by melgross

I didn't say that streaming to flash was impossible. I said that they save to a flash card, and that the file limits are 4 GB no matter how large the file is.

To which the response is, the "bittness" (did I just make up a new word?) of the filesystem is not intrinsically linked to the word length of the CPU hosting it. As MisatosAngel said, it is perfectly reasonable to host even a 128-bit filesystem (such as ZFS) on a 32-bit CPU.

The current de facto file systems employed by the 32-bit kernels of both OS X and Windows (HFS+ and NTFS, respectively) both have theoretical maximum file sizes that reach the exabyte range.

That embedded designers have chosen not to keep up with that when it comes to the default filesystem installed on most Flash-based removable media is definitely unfortunate. It is completely possible to format a compact flash card with a more advanced filesystem.

Unfortunately many embedded devices wouldn't know what to do with them if they weren't formatted with FAT32, and it is precisely for such compatibility reasons that such cards will probably continue to be formatted by default with an inferior filesystem long after the host CPUs in the vast majority of PCs all switch over to 64-bit mode. They'll switch only when the volumes get so large that the older filesystems cannot support them any more.

melgross · April 14, 2009 3:07PM

Quote:

Originally Posted by lfmorrison

To which the response is, the "bittness" (did I just make up a new word?) of the filesystem is not intrinsically linked to the word length of the CPU hosting it. As MisatosAngel said, it is perfectly reasonable to host even a 128-bit filesystem (such as ZFS) on a 32-bit CPU.

The current de facto file systems employed by the 32-bit kernels of both OS X and Windows (HFS+ and NTFS, respectively) both have theoretical maximum file sizes that reach the exabyte range.

That embedded designers have chosen not to keep up with that when it comes to the default filesystem installed on most Flash-based removable media is definitely unfortunate. It is completely possible to format a compact flash card with a more advanced filesystem.

Unfortunately many embedded devices wouldn't know what to do with them if they weren't formatted with FAT32, and it is precisely for such compatibility reasons that such cards will probably continue to be formatted by default with an inferior filesystem long after the host CPUs in the vast majority of PCs all switch over to 64-bit mode. They'll switch only when the volumes get so large that the older filesystems cannot support them any more.

There are workarounds for this, but a 32 bit file system is the same no matter what.

misatosangel · April 16, 2009 3:03AM

Wow, lots of posts since I vanished for Easter... Working backwards then:

@melgross - One workaround for large files on Fat32 is simply to save multiple smaller files. Then you provide a program or instruct the user on how to join them back together again when they take the files off the FAT32 volume and onto something that wasn't from yester-decade. Obviously, better is just not to use FAT32 as the volume format in the first place...

@lfmorrison - `bitness' may not be a real word, but it is certainly bandied about like one. Anyone in my area of work will understand exactly what you mean - although there it is not limited to just 32 and 64. It also tends to go in the same sentence as `endianness ' - another non-word which is frequently pertinent to discussions about bitness. Darn hardware makers...

@Hiro - yes I agree with you; especially in the world of GUI desktop apps, advancing hardware will (and in fact already does) eradicate any perceived difference in speed due to 64bit pointer chasing - indeed the slow-down is not noticeable in that situation anyway.

However, things can be different inside tight loops, and one of the joys of command-line tools is that you have no idea whether your tools is going to be called inside someone else's tight loop.

For an example, we develop a program which some of our customers run over a million times every night on a single cpu. (Yes, they already parallelise across a large cluster of computers, I'm just talking about one cpu here). It is simply unacceptable to them that we give them a new version of the tool which runs (and we're talking entire program load and execution time) one incy, tiny fraction of a second slower. Because even if it runs 0.01 seconds slower, that is going to add another 2.7hrs to complete their run in, which means it won't be finished in time, which means we will get screamed at.

Ultimately we can't just rely on faster hardware here. If they buy faster hardware and things speed up in our current 32bit app, they will simply invoke it more times, meaning that an even smaller slow-down per invocation will have an even larger overall cost. I.e. they will *still* will not want to see that miniscule slowdown - they will *never* want to see a slowdown be it due to bitness or anything else. *Any* slowdown at all is completely, totally and utterly unacceptable to them. [At the same time, they still want more features and think it should do its job better... but that's customers for you

]

The only time it would ever be acceptable to them to use a 64bit version of the same app would be if that version runs *faster*. Eventually, I grant you, maybe it will - but at the moment and for at least a couple of years, this is definitely not the case.

Finally regarding threading - IMHO the problem there is in education. For example, most developers sadly don't even realise that (say) the statement `i++;' isn't atomic. Knowing about mutexes, semaphores and atomicity is another one of those scary things that might cause seizures and hospitalisation.

Most developers wouldn't even know where to begin; even in Java where there is far more language support for this sort of thing. In C, having a fast efficient cross-platform (operating system) threading implementation is non-trivial. Windows for example doesn't have POSIX threads at all, and OS-X makes it very hard to try and get OS specific events sent to different threads. Whilst you can argue that for the majority of programs, this is just an additional albeit annoying step to take; in a few, the cost of manually sending events to threads from a central thread is just too great.

It is definitely true that putting a threading framework in at the beginning of a program's life is the way to go; and the more the OS provides to help, the better. Initiatives like the OpenCL endeavours will help a little for targets that support it. A fact of life though, if you're lucky enough to be working on a brand-new program, most people like to get the exciting features in place first, not the boring infrastructure.

It is worth remembering though that for every problem that is embarrassingly parallel (and which GrandCentral might be able to pass off onto your graphics card, which is extremely good at embarrassingly parallel stuff), there will be another where the data structures just due to the nature of the problem trying to be solved do not lend themselves to it.

Threading isn't the ultimate answer (42 threads?), but it is a good one.

melgross · April 16, 2009 3:59AM

Quote:

Originally Posted by MisatosAngel

Wow, lots of posts since I vanished for Easter... Working backwards then:

@melgross - One workaround for large files on Fat32 is simply to save multiple smaller files. Then you provide a program or instruct the user on how to join them back together again when they take the files off the FAT32 volume and onto something that wasn't from yester-decade. Obviously, better is just not to use FAT32 as the volume format in the first place...

What, you don't think that's been thought of before?

It's no solution.

If you knew something about filming, you would know that you can't stop every time you reach the end of a file.

There's such a thing called "continuity".

Fine, YOU tell the manufacturers to not use the only current file format that's standard there.

It's not as though they aren't aware of the problem.

We hope to see a solution around the end of the year.

But it will have to come in on a new generation of equipment.

Apple pushes devs to deliver 64-bit support with new Snow Leopard beta

Comments