Project "Wolf" and "Presley" in July?


Comments

  • Reply 61 of 185
    merlion Posts: 143 member
    Well, if anyone can clear this up, it would be Kormac. He always has good insider information. The last I heard, the K-Man has been gathering inside information for a pre-expo roundup.



    Keep looking for the K-Man; he is the one in touch with the moles.



    [Chilling]
  • Reply 62 of 185
    [quote]Originally posted by allenmcjones:






    - (tough to explain because I don’t know what this means) Data is sent back and forth through “vectorization mode”. Binary data is converted in real-time to vector data, converted to ascii, parsed-mapping and then sent back. ( I have no idea)



    [/quote]

    Ok, I look at this and think: Apple processors (the G4 at least) have vector units. If you need more info about vector processing, Apple's site covers it. The point is that you can push more pieces of data through the processor at one time. Now, you're not going to be able to send this data across the wire in vector units, so it's broken back down into binary and sent bit by bit (Gigabit Ethernet being the only real possibility I see currently), and then the information is rebuilt into whatever unit is best for that type of data (int, long int, double, ASCII, etc.) and processed by the processors in the different computers. Then one (or more) processors are set aside to piece the information back together, because different processor speeds mean that not all of the data will come back in the order it was sent out. This lets the cluster avoid a huge bottleneck if just one piece of data is a little late getting processed or home. That's my take, anyway.
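
    If that reading is right, the reassembly step is really just "tag each chunk with its index, then slot results back in by index." A minimal sketch of that scatter/gather idea (plain Python, with threads standing in for remote machines; every name here is made up for illustration):

    [code]
    # Toy scatter/gather: chunks may finish out of order, so results are
    # slotted back in by index rather than by arrival order.
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def process_chunk(index, chunk):
        # Stand-in for work done on a remote node; it returns its index
        # so the collector can reorder late arrivals.
        return index, [x * x for x in chunk]

    def scatter_gather(data, chunk_size=4, workers=4):
        chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        results = [None] * len(chunks)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(process_chunk, i, c) for i, c in enumerate(chunks)]
            for future in as_completed(futures):  # arrival order is unpredictable
                i, out = future.result()
                results[i] = out                  # reassemble by index
        return [x for chunk in results for x in chunk]

    print(scatter_gather(list(range(16))))
    [/code]

    The point is just that no single slow chunk blocks the others from being slotted in.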



    As far as the Internet goes, could this really be worth it without T1/T3/OC3?



    [ 06-23-2002: Message edited by: MaCommentary ]
  • Reply 63 of 185
    keyboardf12 Posts: 1,379 member
    [quote]So? Just about everyone doesn't care about that level of performance. If you have a great need to rip movies at lightning speed as part of your job then you just buy your own cluster. Get it? 99.9% of Mac users don't need that level of power and the other 0.1% just buy what they need.

    [/quote]



    Huh? Why in the world would I buy racks of Xserves when I can first use the machines I already own in my own office, if they do the job?



    Huh? Double huh? Triple huh?
  • Reply 64 of 185
    keyboardf12 Posts: 1,379 member
    [quote]You're wrong. Anyone in science that needs a fast cluster doesn't buy Macs. Overpriced and underperforming. Reality sucks, huh?

    [/quote]



    Quadruple huh? So Genentech, one of the bigger life-sciences firms in the world, did not just buy a butt load of iMacs, LCDs, and Xserves? You ever hear of BLAST? You know, the program that has amazing performance on a G4 due to the Velocity Engine???



    [quote]There's no great cry in the night for a super software solution to steal cycles from the secretaries' overpowered G4.[/quote]



    There was no cry for publishing your own documents before PageMaker, either. And you are wrong once again: there is a current cry to make things go faster.



    [quote]Trust me. I work in academia and people just buy what they need.[/quote]



    Based on your arguments and comments in this thread and in others: no way.
  • Reply 65 of 185
    programmer Posts: 3,467 member
    [quote]Originally posted by MaCommentary:


    Ok, I look at this and think: Apple processors (the G4 at least) have vector units. If you need more info about vector processing, Apple's site covers it. The point is that you can push more pieces of data through the processor at one time. Now, you're not going to be able to send this data across the wire in vector units, so it's broken back down into binary and sent bit by bit (Gigabit Ethernet being the only real possibility I see currently), and then the information is rebuilt into whatever unit is best for that type of data (int, long int, double, ASCII, etc.) and processed by the processors in the different computers. Then one (or more) processors are set aside to piece the information back together, because different processor speeds mean that not all of the data will come back in the order it was sent out. This lets the cluster avoid a huge bottleneck if just one piece of data is a little late getting processed or home. That's my take, anyway.

    [/quote]



    Well, if you're right about this (I had ignored it as incoherent, mis-communicated babble), then it sounds to me like:



    - App provides a description of its data and algorithm.

    - System rebuilds data into vectors, breaks that into pieces, sends it across transport layer (whatever that is).

    - Compute engine (whatever that is) compiles the text algorithm source into its native form, processes its piece of the data, and sends it back.

    - System collects the various pieces into a map, which is then parsed back into the app's format. (Toy sketch below.)
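
    Purely as a toy sketch of those four steps - nothing we actually know about Wolf, with Python's compile() standing in for whatever native compiler a real compute engine would use:

    [code]
    # Toy version of the four steps above: the "app" hands over data plus
    # algorithm source, each "compute engine" compiles and runs its piece,
    # and the results come back as an index -> piece map.

    ALGORITHM_SRC = "def kernel(v): return [2.0 * x + 1.0 for x in v]"

    def compute_engine(piece_id, vector, source):
        namespace = {}
        exec(compile(source, "<job>", "exec"), namespace)  # engine-side "compiler"
        return piece_id, namespace["kernel"](vector)

    def run_job(data, source, piece_len=4):
        pieces = [data[i:i + piece_len] for i in range(0, len(data), piece_len)]
        result_map = dict(compute_engine(i, p, source) for i, p in enumerate(pieces))
        # Parse the map back into the app's format.
        return [x for i in sorted(result_map) for x in result_map[i]]

    print(run_job([float(i) for i in range(10)], ALGORITHM_SRC))
    [/code]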



    Describing the algorithm and metadata is the only use of text I can imagine as being useful in a system like this... go with me on a flight of fancy for a moment:



    The interesting thing about this notion is that you can run the algorithm on all sorts of different hardware... you just need a compiler on the destination machine. Things which are good at processing vector data will shine -- i.e. the G4. I say "transport layer" above because it doesn't even necessarily have to be a network... it could be an internal bus within the machine. The new generation of graphics cards, for example, is also powerful enough to implement this kind of a model. If that were true, this system would allow applications to transparently take advantage of the massive (and largely unused) computational power of the NV30 / R300 / etc. This would tie in nicely with the AGP slot in each and every Xserve, and the rumoured high-end graphics card that Apple is apparently working on with nVidia (along with OpenGL shader extensions). It would also work with PCI boards that have multiple G4 processors, DSPs (in the chipset, perhaps?), Cray supercomputers, 32-way POWER4 mainframes, etc.



    Developers have been complaining about supporting all the different kinds of specialized compute hardware for as long as there has been specialized compute hardware. There have been research projects and experiments in this kind of thing for a long time. Wouldn't it be cool if Apple came along with a workable solution?



    Fantasy mode off.
  • Reply 66 of 185
    naden Posts: 28 member
    I think a few people in here are seriously confused:



    AltiVec has NOTHING whatsoever to do with distributed computing or clusters or whatever you want to call it. If you think this, then for god's sake stop posting and go read up on it. It is a register set and execution unit ONLY. It is not even a true vector unit, though that's debatable. Some comments here seem to merge the two.



    The reason it is a benefit in distributed computing scenarios is that most people who build large farms are usually dealing with math equations. These equations are great for AltiVec because you can 'pack' four 32-bit FP numbers into a single 128-bit register and then do four 32-bit multiplies at once with a single vector instruction. So in most cases, you get great speed advantages.
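
    To make the packing concrete - this is just a numpy analogy, not real AltiVec code (on a G4 it would be a single vector instruction such as vmaddfp):

    [code]
    import numpy as np

    # Four 32-bit floats "packed" side by side, like one 128-bit AltiVec register.
    a = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
    b = np.array([0.5, 0.5, 0.5, 0.5], dtype=np.float32)
    c = np.array([1.0, 1.0, 1.0, 1.0], dtype=np.float32)

    # One element-wise multiply-add covers all four lanes at once;
    # the scalar equivalent would take four separate multiply-adds.
    result = a * b + c
    print(result)  # [1.5 2.  2.5 3. ]
    [/code]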



    And the people who think distributed computing is only for server farms are really quite naive. It is indisputably the future of computing. And the benefits of bringing this to Apple are immense. Firstly, educational markets WILL consider Apple Macs if they can put together a cluster without having to fiddle with MPI and do a lot of programming.



    But I think everyone is missing the point of this technology. Apple has purchased a lot of high end video technology companies in the last few months.



    Why ?



    Obviously, they want to get Hollywood to use Macs. What's a great way to do that? Make clustering as simple as pressing a button. I can say from experience that programming something like MPI to build a cluster is not fun.



    Just a thought ...
  • Reply 67 of 185
    naden Posts: 28 member
    [quote]Originally posted by Programmer:




    Well, if you're right about this (I had ignored it as incoherent, mis-communicated babble), then it sounds to me like:



    - App provides a description of its data and algorithm.

    - System rebuilds data into vectors, breaks that into pieces, sends it across transport layer (whatever that is).

    - Compute engine (whatever that is) compiles the text algorithm source into its native form, processes its piece of the data, and sends it back.

    - System collects the various pieces into a map, which is then parsed back into the app's format.

    [/quote]



    The only problem with this is that you would have to EXPLICITLY program for these specialised computers lying around. You can't just run something on a Cray or a G4 and hope that the vector units switch on and automagically pack themselves full of data. It just doesn't happen that way. Now, there are obviously intelligent compilers that can do this, but some of these older machines probably don't even have a C compiler. Either way, a DP G4 is probably faster than some of those older Crays anyhow. And I don't mean some sort of 10-billion-CPU beast.



    If you mean build a VM for each of these machines and compile your task into an intermediate code, then okay, that's possible. But Apple won't do that. Now, Transmeta has something in that sort of area... emphasis on 'sort of'.



    There's heaps of work being done in academia in this area. So check out the IEEE Distributed Computing papers if you're so inclined.
  • Reply 68 of 185
    rogue27 Posts: 607 member
    Wolf.



    I like the idea and it makes sense on many levels.



    Marketing - it increases the appeal of the Xserve and Macs in general. This will potentially lead to more sales in areas where Macs aren't too popular at the moment.



    Hardware - Well, since we are probably going to have DDR memory but not a double-pumped FSB, it makes sense to send some of the data through that Gigabit Ethernet port and let other processors do some of the work on it.



    Software - This would encourage more people to make well-threaded applications, which will benefit all dual-processor systems even without Wolf turned on.



    Also, it is not that difficult to make an application break up a project into small pieces and have several different threads work on different pieces. I wrote a Java program that uses this concept to render a fractal image. SMP would let these threads work over both processors if I had a dual-processor system. Wolf would let these threads work on other computers in a way that is transparent to the user. I think that would be really cool.
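
    That splitting is easy to sketch. This isn't the Java program itself, just the same row-band idea redone in a few lines of Python (escape-time counts for a Mandelbrot-style image, one band of rows per thread):

    [code]
    # Split the image into row bands and let each thread (or, with Wolf,
    # each machine) render one band; map() hands the bands back in order.
    from concurrent.futures import ThreadPoolExecutor

    WIDTH, HEIGHT, MAX_ITER = 64, 32, 50

    def escape_count(cx, cy):
        zx = zy = 0.0
        for i in range(MAX_ITER):
            zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
            if zx * zx + zy * zy > 4.0:
                return i
        return MAX_ITER

    def render_band(band):
        y0, y1 = band
        return [[escape_count(3.0 * x / WIDTH - 2.0, 2.0 * y / HEIGHT - 1.0)
                 for x in range(WIDTH)] for y in range(y0, y1)]

    bands = [(y, min(y + 8, HEIGHT)) for y in range(0, HEIGHT, 8)]
    with ThreadPoolExecutor() as pool:
        image = [row for band in pool.map(render_band, bands) for row in band]
    print(len(image), "rows rendered")
    [/code]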



    I'm assuming Wolf works with Rendezvous, so it would know how many processors are available to work on and where they are located before you start doing any work. I'd also assume that Wolf would re-assign unfinished pieces of work after moving through the data set, because it won't know if they got lost or one of the computers was turned off, etc.
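
    How the re-assignment would work is pure guesswork on my part, but the obvious scheme is "anything not acknowledged within a timeout goes back on the queue," something like this (all names invented):

    [code]
    # Guesswork sketch: pieces not acknowledged within a timeout are
    # re-queued so another node can pick them up.
    import time

    TIMEOUT = 5.0
    pending = {}       # piece id -> time it was handed out
    work_queue = []    # piece ids waiting for a node

    def hand_out(piece_id):
        pending[piece_id] = time.time()

    def acknowledge(piece_id):
        pending.pop(piece_id, None)  # result came back; forget it

    def requeue_stragglers():
        now = time.time()
        for piece_id, sent_at in list(pending.items()):
            if now - sent_at > TIMEOUT:  # lost? machine switched off? retry.
                del pending[piece_id]
                work_queue.append(piece_id)

    hand_out(1); hand_out(2)
    acknowledge(1)
    requeue_stragglers()
    print(work_queue)  # [] now; [2] once the timeout has passed
    [/code]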



    Conceptually, this is very sound. I mean, OS X is designed to be left on at all times. Much of the time people will be logged out or idle anyway. Most of the time, your computer is waiting for you. I mean, you, reading this post, aren't using all of your processor's clock cycles. You are reading and scrolling down. Maybe you have an MP3 also playing. You still have a lot of free processor power. The pieces of data that would be rendered are small.



    This clustering would not touch any of the software on your workstation. It would all be bus, CPU, and maybe RAM, but the data pieces would probably be small enough to fit in the cache anyway, especially with the big L3s on the Xserve and (probably) upcoming Power Macs. You probably wouldn't even notice it happening unless you had top running and were staring at it in the split second when a piece of work was being processed.



    Practical uses for normal people:

    Art lab. At my college we have a couple of Electronic Imaging labs. One has 16 G4s, the other has about 30 iMacs and G3s. Suppose you want to render an After Effects video or a Cinema 4D project (which will take many times the length of what you're rendering). Now, you are at one computer in the lab. Your computer would be handling the thread management (which is *not* very processor-intensive) and you'd be doing some threads as well. Great speed gains could be found if some of the machines that nobody was logged into were doing some work, or a machine where somebody is comparing the appearance of two Photoshop images he made could do part of it. I mean, in a lab like this, the only time a machine is being maxed out is when it's running a long Photoshop filter or rendering something. Usually they are mostly idle as people are moving layers around, adjusting contrast, brightness, and hue, etc. Rendering is what takes time. All machines that are not rendering have many free cycles. If you use those free cycles when you are rendering, then machines won't even be busy rendering for long.



    My bedroom. I have a PowerBook, and there is also a G4 that belongs to my brother and is acting as a web server. If I am compiling code on my PowerBook (which I often do) and a few of the objects to be compiled could be sent to the G4, compiled, and sent back, it would save me time. It might only be seconds for the projects I am working on, but on a larger project it could save a significant amount of time, thus allowing you to test, modify, and re-compile more often.



    Also, the AirPort camera is feasible and sensible. 802.11g goes at 54 megabits per second. (There are so many 802.11x standards that I always get the numbers mixed up.) That translates to roughly 7 megabytes per second. Considering that you can watch a streaming movie at under 100 kB/sec, there is a lot of room here. I understand that the movies you see on the web are compressed, but with just under 6 MB/sec, you could do uncompressed 800x600 24-bit video at 4 fps. If you drop the resolution to 320x240, the fps can be brought above 24. If the camera had some kind of compression algorithm built in, the fps could be increased even further. It would not let you do the highest-quality videos, but if you are in a location where you cannot set up big equipment, a laptop and this camera would be a good option. The fact that you could use AirPort means you could set the laptop down and just film stuff without a wire holding you in place. I'd assume you'd have FireWire as well, but AirPort would be there if you want to use it.
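
    The arithmetic behind those numbers, in back-of-envelope form (Python as a calculator):

    [code]
    # Back-of-envelope check of the 802.11g video numbers above.
    link_bytes_per_sec = 54e6 / 8              # 54 Mbps -> 6.75 MB/s

    frame_800 = 800 * 600 * 3                  # 24-bit colour = 3 bytes/pixel
    print(frame_800 * 4 / 1e6)                 # 5.76 MB/s at 4 fps -- it fits

    frame_320 = 320 * 240 * 3
    print(link_bytes_per_sec / frame_320)      # ~29 fps uncompressed at 320x240
    [/code]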



    Also, a digital video camera is expensive anyway. Nobody will notice the extra cost of AirPort, and AirPort would be much cheaper to integrate than to sell as individual cards. They could just put the chip on the camera. Or maybe have an AirPort card slot and let you buy the card yourself if you want it.



    I don't necessarily believe this, but it is feasible and sensible.



    Oh, and that fractal program I mentioned earlier...



    http://homepage.mac.com/rogue27/
  • Reply 69 of 185
    programmer Posts: 3,467 member
    [quote]Originally posted by naden:

    If you mean build a VM for each of these machines and compile your task into an intermediate code, then okay, that's possible. But Apple won't do that. Now, Transmeta has something in that sort of area... emphasis on 'sort of'.



    There's heaps of work being done in academia in this area. So check out the IEEE Distributed Computing papers if you're so inclined.[/quote]



    It doesn't have to be either, really. Develop a new language (i.e. not C, which is horribly unsuited to the task) to describe compute jobs, and build a fast compiler that uses whatever vector facilities are on the compute engine (note that the compiler doesn't even have to run on the compute engine; another computer could feed the engine algorithms compiled for it). If everything were decomposed into operations on long vectors of data, then the overhead might become manageable. Since operations on long vectors of data are typically the most expensive calculations people do with clusters, it makes some degree of sense to describe them in a way that is independent of the target hardware, so that the developer doesn't have to worry about each piece of it. Apple wouldn't have to implement the compute-engine side of things for anything except their own machines, but they could publish the spec for how to do it under their open-source license. This is similar to how they made the QuickTime file format public. Academia could then implement this for all the miscellaneous hardware they have lying around.



    Even just within a single Mac this could have potential -- many developers use the Apple-provided signal processing libraries without caring how they are implemented, but the problem with those is that you are stuck with the algorithms that Apple gives you. If instead you had a high-level vector language (think of something more along the lines of Mathematica) to code your computations in, then you could feed it to the system and have the system take care of it using whatever resources it has available.
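
    For flavor only - this is invented, not anything Apple has described - such a hardware-independent job might be nothing more than data plus a list of named vector operations, which each target's backend is free to execute however it likes:

    [code]
    # Invented example of a target-independent "vector language" job:
    # the description names the operations; the backend picks the implementation.
    OPS = {
        "scale":  lambda v, k: [x * k for x in v],
        "offset": lambda v, k: [x + k for x in v],
    }

    job = {
        "data": [1.0, 2.0, 3.0, 4.0],
        "program": [("scale", 2.0), ("offset", 1.0)],  # a tiny pipeline
    }

    def execute(job, ops=OPS):
        v = job["data"]
        for name, arg in job["program"]:
            v = ops[name](v, arg)  # a G4 backend could route this through AltiVec
        return v

    print(execute(job))  # [3.0, 5.0, 7.0, 9.0]
    [/code]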



    Yes, there is tons of this kind of research being done... but it has yet to be turned into a technology that is built into your personal computer and which developers could rely on as always being available.



    iCompute.



  • Reply 70 of 185
    And this is where I bow out of the thread. As I feared... another one lost to the techies and dreamers. Don't patronize me about this being 'rumors' or not. The misunderstood can always claim to be right after the fact. This is fake; don't let it become another runaway lucida post. (I foresee pictures of said mouse, and more screenshots of iShake in the future of this thread.) Good riddance. On to Newgrounds...
  • Reply 71 of 185
    airsluf Posts: 1,861 member
  • Reply 72 of 185
    programmer Posts: 3,467 member
    [quote]Originally posted by Jonathan Brisby:

    And this is where I bow out of the thread. As I feared... another one lost to the techies and dreamers. Don't patronize me about this being 'rumors' or not. The misunderstood can always claim to be right after the fact. This is fake; don't let it become another runaway lucida post. (I foresee pictures of said mouse, and more screenshots of iShake in the future of this thread.) Good riddance. On to Newgrounds...[/quote]





    [Skeptical] Without the "techies and dreamers" you wouldn't have more than an abacus, bub.
  • Reply 73 of 185
    engpjp Posts: 124 member
    "You're wrong. Anyone in science that needs a fast cluster doesn't buy macs. Overpriced and underperforming. Reality sucks huh?



    Trust me. I work in academia and people just buy what they need. There's no great cry in the night for a super software solution to steal cycles from the secretaries' overpowered G4."



    Scott...etc,

    I don't need to trust you on this one. As a Dane, I was extremely interested to read in today's news that Denmark's foremost technical university, DTU, has set up what they claim is Scandinavia's largest Mac cluster:



    "The Danish Technical University (DTU) has developed Scandinavia's largest dedicated Mac cluster, consisting of 32 dedicated Power Mac G4 Dual-800 MHz workstations (equivalent to 200 Gigaflops). The cluster, which is called Velocity-X, will be turned on for use on Monday, June 24th with the presence of Apple Denmark. The dedicated Mac cluster will primarily be used to understand proteins' influence on cancer as well as on larger film and animation projects. Velocity-X can be rented for use through the Internet for approximately 50,000 Danish kroner per week ($1 = 8.5 DKK). " (MacNews.com)



    And you can trust me: they have more than enough PCs, Suns, and people to run them that they would have chosen something other than Macs for such a cluster if they hadn't deemed Macs the best for the job!



    I know: several of the professors there are among my personal acquaintances. And I have similar contacts at universities in five other countries - their feedback is that PC clustering is rather more difficult than Mac clustering.



    Please continue with this well-informed, technical discussion. I learn a lot from it, and I enjoy partaking in subjective but well-balanced and well-founded argumentation like this. Kindly keep it up - and thanks.
  • Reply 74 of 185
    I have a radical idea. What if this is a technology to run an x86 (or whatever) cluster from a Mac? You get all the advantages of OS X and such, but use the power of high-end Athlons or whatever.



    Apple could use this to utilize faster processors while maintaining compatibility with PPC code.



    This seems more plausible to me than using Macs as the big iron.



    Unless Apple has something to fix that...
  • Reply 75 of 185
    frawgz Posts: 547 member
    [quote]Originally posted by scott_h_phd:

    You're wrong. Anyone in science that needs a fast cluster doesn't buy Macs. Overpriced and underperforming. Reality sucks, huh?



    Trust me. I work in academia and people just buy what they need. There's no great cry in the night for a super software solution to steal cycles from the secretaries' overpowered G4.[/quote]



    "If Apple were to offer a scalable, high-density hardware solution, I would push hard for a platform switch," said Patrick Gavin of the University of California at Santa Cruz Center for Biomolecular Science and Engineering. "The PowerPC architecture is vastly superior to anything else out there in terms of power consumption versus processing power."



    http://www.wired.com/news/mac/0,2125,50454,00.html
  • Reply 76 of 185
    programmer Posts: 3,467 member
    [quote]Originally posted by speechgod:

    I have a radical idea. What if this is a technology to run an x86 (or whatever) cluster from a Mac? You get all the advantages of OS X and such, but use the power of high-end Athlons or whatever.



    Apple could use this to utilize faster processors while maintaining compatibility with PPC code.



    This seems more plausible to me than using Macs as the big iron.



    Unless Apple has something to fix that...[/quote]





    That's what I was getting at, only on an even larger scale.
  • Reply 77 of 185
    [quote]Originally posted by speechgod:

    I have a radical idea. What if this is a technology to run an x86 (or whatever) cluster from a Mac? You get all the advantages of OS X and such, but use the power of high-end Athlons or whatever.



    Apple could use this to utilize faster processors while maintaining compatibility with PPC code.



    This seems more plausible to me than using Macs as the big iron.



    Unless Apple has something to fix that...[/quote]

    But it wouldn't push Apple's hardware. I can't imagine they'd want to encourage anyone to buy more x86 boxen. As far as people saying that almost no one NEEDS this clustering technology: there are plenty of people who don't necessarily *need* it but would use it if it were available. Why wouldn't you? And if you had to replace some low-level computers, it would actually give you a reason to buy a new Mac instead of a new Dell. Not many people *needed* to set type on a computer or edit film on a computer at some point in the past, either.
  • Reply 78 of 185
    [quote]Originally posted by scott_h_phd:




    You're wrong. Anyone in science that needs a fast cluster doesn't buy Macs. Overpriced and underperforming. Reality sucks, huh?[/quote]



    Well, that's historically been true. It's also why our local Apple rep calls me to find out what they need to do to get into our engineering school.



    [quote]Trust me. I work in academia and people just buy what they need. There's no great cry in the night for a super software solution to steal cycles from the secretaries' overpowered G4.[/quote]



    Uh, this I would disagree with. The typical research university in this country has 5,000-10,000 administrative computers. Even with low-end G4s, that's a shitload of CPU cycles sitting around doing nothing 120 hours per week.



    I'm reasonably sure that if Apple shipped a generalized distributed computing solution that was easy to administer and to build solutions into, we could selectively combine administrative computing budgets with funds for computational research systems, and both parties would be better off.



    If presented with the notion that there might be 5,000+ GFlops lying around the place that could be recycled, I'm pretty sure the university would jump at it. After all, last summer a lot of universities in CA were subsidizing LCD purchases to cut electricity consumption.



    I think you might be underestimating the ability of other large institutions to lay down a comprehensive computing plan.



    [ 06-24-2002: Message edited by: johnsonwax ]
  • Reply 79 of 185
    macmatt Posts: 91 member
    Does this solution have anything to do with the so-called 'glove' venture...?
  • Reply 80 of 185
    costique Posts: 1,084 member
    [quote]Originally posted by speechgod:

    I have a radical idea. What if this is a technology to run an x86 (or whatever) cluster from a Mac? You get all the advantages of OS X and such, but use the power of high-end Athlons or whatever.



    Apple could use this to utilize faster processors while maintaining compatibility with PPC code.



    This seems more plausible to me than using Macs as the big iron.



    Unless Apple has something to fix that...[/quote]



    ...So you could 'upgrade' your Mac with two or three 2 GHz Wintel boxes for the same money? That's a breakthrough marketing idea.



    Seriously speaking, there might be a lot of demand for it.