Quad G5 isn't doing enough

Marvin · March 6, 2006 3:33PM

The Quad G5 is fast but it could do things faster if it just put in more effort. It doesn't seem to allocate enough CPU to certain tasks. One example is that I was using ffmpegX to recode a DVD and it only allocated about 75% CPU to the task.

It seemed to be distributed across the CPUs too, because Activity Monitor showed all of them running at 25% at most.

Am I right in saying any process should be able to use up to 400% on a Quad? If so, no process has ever gone above 100%. Multi-threaded apps don't seem to either. It's as if the Quad reserves power so there is always some left to let you do other work but I want all the power when encoding.

I noticed today that processor performance was set to Automatic, so I switched it to Highest, but it still made no difference. I understand that some apps like Cinebench can use all the CPUs, but why can't the OS allocate more resources for the majority of software?

It kind of makes the Quad a wasted purchase. A lot of people who have Quads have said the same thing. In some ways it's good because, unlike other machines, no matter what I do it never seems to slow down, but I'm used to leaving it running by itself and I'd like it to work faster on its own.

Are there any solutions for this?

slughead · March 6, 2006 3:45PM

It's because most tasks are sent to the processor as if they need to be completed in sequence.

For instance, you have a math problem and you want to solve for G:

A = 1

B = 2

C = 3

A * B = D

D + A = E

B * B + E = F

C + F = G

Now, the way that was "coded", you need each of those to be solved in sequence (Ya can't do D + A if you donno what D is!). The only way a program can use two CPUs is if it has 2 or more independent operations running at once.
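To make the dependency chain concrete, here is a minimal Python sketch of the same problem. The function name `solve_for_g` is made up for illustration; the point is only that each line needs a result from a line above it, so the steps can't be handed to separate CPUs.

```python
def solve_for_g(a=1, b=2, c=3):
    """Solve the chain from the post; every step waits on an earlier one."""
    d = a * b        # needs a and b
    e = d + a        # can't start until d is known
    f = b * b + e    # can't finish until e is known
    g = c + f        # can't start until f is known
    return g


print(solve_for_g())
```

With the values from the post (A = 1, B = 2, C = 3) this works out to D = 2, E = 3, F = 7, G = 10.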

Think of it this way: say you were counting to a million and a bunch of your friends wanted to help you. They couldn't start from 250,000 while you're still at 1, they'd have to wait until you got there to continue. Thus, it wouldn't matter how many people you had helping you, if you're doing something in order, the fastest way to count is just to get a faster counter (we'll call that gigahertz). The Quad G5, therefore, is as fast at doing some things as a single processor G5 running at the same clock.

So, if a program needed not only the above problem solved but also a second problem that used variables H through L, you could code it to send the second one to a different processor (or, on a chip that supports it, hyperthread it on the same one).
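As a rough sketch of that idea, the Python below hands two unrelated problems to two worker processes using the standard library's `concurrent.futures`. The second problem (`solve_l` and the values for H, I, J) is invented here purely for illustration; nothing in this thread defines it.

```python
from concurrent.futures import ProcessPoolExecutor


def solve_g(a, b, c):
    """The A..G chain from the post; the steps depend on each other."""
    d = a * b
    e = d + a
    f = b * b + e
    return c + f


def solve_l(h, i, j):
    """A made-up second problem (H..L) that shares nothing with the first."""
    k = h + i
    return k * j


def solve_both():
    # Each submit() may land on a different CPU because the two problems
    # have no variables in common.
    with ProcessPoolExecutor(max_workers=2) as pool:
        g_future = pool.submit(solve_g, 1, 2, 3)
        l_future = pool.submit(solve_l, 4, 5, 6)
        return g_future.result(), l_future.result()


if __name__ == "__main__":
    print(solve_both())
```

For problems this small the cost of starting the worker processes is far larger than the math itself, so this only pays off when each independent piece is a substantial amount of work.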

Sometimes programmers write things that COULD do more than one thing at a time, but they choose not to code it that way for one reason or another. Therefore, it can only use one processor and only do one thing at a time.

Howevah!:

Please note that if you're running 2 or more processes, it'll automagically split them up.

So, in short, just buy a ton of RAM and run a whole bunch of random tasks at once and you'll see where your money went.

fahlman · March 6, 2006 5:26PM

Quote:

Originally posted by Marvin

Are there any solutions for this?

Recode 4 DVDs at a time.

lundy · March 6, 2006 6:30PM

The reason you see all 4 being used at 25% is that the Mach kernel is balancing all of the running processes across all of the cores, including the OS's processes.

ffmpeg must be periodically waiting for something (disk access?), and during that time it is not threaded well enough to proceed with more processing. If it had one thread doing the math and another periodically saving to disk, instead of doing those things alternately, you might see a speedup - but that is trickier code to write, as you have to use semaphores between the threads to tell each other when they are ready for more data.
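A minimal Python sketch of that two-thread layout is below. It is not how ffmpeg is written; `encode` and the in-memory `written` list are stand-ins for the real math and the real disk write, and the standard library's bounded `queue.Queue` takes the place of hand-written semaphores (it blocks the producer when full and the consumer when empty).

```python
import queue
import threading


def encode(chunk):
    """Stand-in for the CPU-heavy step: flip every bit of the chunk."""
    return bytes(b ^ 0xFF for b in chunk)


def writer(q, written):
    """Consumer thread: take finished chunks and 'save' them."""
    while True:
        item = q.get()          # blocks until the encoder has something ready
        if item is None:        # sentinel: no more data is coming
            break
        written.append(item)    # stand-in for a blocking disk write


def run_pipeline(chunks):
    q = queue.Queue(maxsize=4)  # bounded, so the encoder can't run far ahead
    written = []
    t = threading.Thread(target=writer, args=(q, written))
    t.start()
    for chunk in chunks:
        q.put(encode(chunk))    # blocks only if the writer has fallen behind
    q.put(None)
    t.join()
    return written


print(run_pipeline([b"\x00\x01", b"\xff"]))
```

The benefit comes from the writer thread sitting in its blocking write while the main thread keeps encoding the next chunk, instead of the two steps taking turns.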

The Mach kernel does not "reserve" CPU time in the sense that you mention - if a process is queued for execution and starts to run, it only stops when

- it blocks waiting for a resource (disk I/O, printer, VM swap)

- it uses up its 10 millisecond timeslice

With no other CPU-intensive tasks running, using up the timeslice will usually result in the process getting queued again right away.

So most likely it is blocking waiting for a resource.

If you want to see all four cores maxed out, run my Perl script that does the Monty Hall Problem. It does not do any I/O so it will pin the CPUs. Give it 10 million games to play.

http://forums.appleinsider.com/showt...threadid=60684
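For anyone who wants to try something similar without the linked script, here is a rough Python sketch of the same kind of test. It is not lundy's Perl script; the function names and the choice of four worker processes are assumptions made for a four-core machine. Each worker plays Monty Hall games with no I/O at all, so every core should stay busy.

```python
import random
from multiprocessing import Pool


def play_games(job):
    """Play `games` rounds with the switch strategy; return how many win."""
    games, seed = job
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # The host always opens a goat door, so switching wins exactly
        # when the first pick was not the car.
        if pick != car:
            wins += 1
    return wins


def simulate(total_games, workers=4):
    """Split the games evenly over `workers` processes, one per core."""
    per_worker = total_games // workers
    jobs = [(per_worker, seed) for seed in range(workers)]
    with Pool(processes=workers) as pool:
        wins = sum(pool.map(play_games, jobs))
    return wins / (per_worker * workers)


if __name__ == "__main__":
    print(f"switching won {simulate(400_000):.3f} of the games")
```

Raise the game count toward 10 million to keep the CPU meters pinned long enough to watch them.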

marvin · March 6, 2006 6:33PM

Quote:

Originally posted by fahlman

Recode 4 DVDs at a time.

Yeah, that's true. I could have duplicated ffmpegX and run multiple copies at once. The way I did it was to rip one while encoding another then rip the next while still encoding and then encode while burning the first and so on. It's a shame the Quad G5 only has one optical drive.

Quote:

Originally posted by slughead

Please note that if you're running 2 or more processes, it'll automagically split them up.

I just sorta wish the OS could split up tasks by itself but I guess for sequential problems like the one you wrote, it would need pretty accurate timing and even then, the overhead would probably slow it down more than it's worth.

Still, it doesn't explain why I've never seen any one of the CPUs maxed out. When I was using ffmpegX, it looked as though it was using multiple CPUs, just not all of their power.

I mean HandBrake was only giving me about 15fps for MPEG-4 encoding but all the processor meters were down low.

ebby · March 6, 2006 6:43PM

Quote:

Originally posted by fahlman

Recode 4 DVDs at a time.

I don't know if you were joking or not, but this is totally true. If you run 4 instances at a time (4 encoding jobs) you can use almost 100% of each CPU.

I heard HandBrake is really good at using multiple CPUs, and my Dual hovers around 90% on each.

I learned this trick from Folding way back when, where 2 copies were run simultaneously to max out performance.

lundy · March 6, 2006 9:18PM

Quote:

Originally posted by Ebby

I don't know if you were joking or not, but this is totally true. If you run 4 instances at a time (4 encoding jobs) you can use almost 100% of each CPU.

I heard HandBrake is really good at using multiple CPUs, and my Dual hovers around 90% on each.

I learned this trick from Folding way back when, where 2 copies were run simultaneously to max out performance.

Definitely true. Mach can't spread a single thread across several cores at once; it can only schedule something else when that thread blocks. So if you as the user manually create 4 threads by running 4 instances, that will work wonderfully - Mach will have 4 threads to work with, so if one blocks it will queue up one of the others, keeping the CPUs busy.

lundy · March 6, 2006 9:23PM

Quote:

Originally posted by Marvin

Still, it doesn't explain why I've never seen any one of the CPUs maxed out. When I was using ffmpegX, it looked as though it was using multiple CPUs, just not all of their power.

This is a question that arises often.

Even if your app is a single thread that does nothing but CPU work, Mach is still giving it a timeslice of 10 milliseconds. When that timeslice is up, if another processor is underutilized, Mach will schedule the next 10 milliseconds on that processor to keep the loads balanced. It will also put OS threads and other app threads on the other processors to balance your CPU-hungry app. That is why you see the processor bars all the same height - it is the design of the scheduler.

Your app will not show 100% CPU unless all it is doing is CPU. If it blocks for I/O, then during that time the CPU isn't being used and you see the result in the percentage use bars.
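You can see the same effect from inside a program by comparing CPU time with wall-clock time. The Python sketch below is only an illustration: `cpu_then_wait` uses `time.sleep` as a stand-in for blocking on disk I/O, and the loop sizes are arbitrary.

```python
import time


def cpu_utilization(work):
    """Run work() and return CPU time used divided by wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall


def pure_cpu():
    # Nothing but arithmetic, so the process is never blocked.
    sum(i * i for i in range(2_000_000))


def cpu_then_wait():
    # A little arithmetic, then a wait that uses no CPU at all.
    sum(i * i for i in range(200_000))
    time.sleep(0.3)  # stand-in for blocking on disk I/O


print(f"pure CPU:      {cpu_utilization(pure_cpu):.0%}")
print(f"CPU then wait: {cpu_utilization(cpu_then_wait):.0%}")
```

On an otherwise idle machine the first figure should come out close to 100% and the second far lower, which is the same gap the CPU meters show for an app that keeps stopping to wait.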

marvin · March 7, 2006 12:51PM

Quote:

Originally posted by lundy

When that timeslice is up, if another processor is underutilized, Mach will schedule the next 10 milliseconds on that processor to keep the loads balanced. It will also put OS threads and other app threads on the other processors to balance your CPU-hungry app. That is why you see the processor bars all the same height - it is the design of the scheduler.

Ah, that's the explanation I was needing. That makes sense. It's a pretty good idea when it comes to CPU cooling I imagine because in most cases, the processors aren't being stressed.

I will just have to get used to doing more tasks at once or get more software that uses better multi-threading.

fahlman · March 7, 2006 4:19PM

Run one instance of Stanford University's Folding@home client per processor if you'd like to see all four of them working at 100%. You'll also help cure cancer.

xool · March 7, 2006 4:41PM

Is the source material on the hard disk or are you reading it off the DVD?

The same problem appears with iTunes encoding, where the source media affects encoding speed. For this reason, I run all my tests on media that resides on the hard disk so that the optical drive is not the bottleneck.

marvin · March 8, 2006 4:12AM

Quote:

Originally posted by Xool

Is the source material on the hard disk or are you reading it off the DVD?

It was off the hard disk.
