anonconformist

About

Username: anonconformist
Joined:
Visits: 108
Last Active:
Roles: member
Points: 554
Badges: 0
Posts: 199
  • Future MacBook Pro displays could automatically open & tilt to suit the user

    Beyond the issue of more moving parts is the question of what the user wants and why: perhaps you're in a less-than-private environment while working with private material, while another situation involves lighting conditions. Perhaps you're limited in how wide you can open the display by the space available (an airline seat, say).

    DO NOT WANT
  • Porting operating systems to Apple Silicon leagues harder than migrating software

    cloudguy said:
    Weird headline: it implies operating systems aren't software, which is nonsensical on its face. Operating systems are the quintessential software, upon which separate user-space applications depend to exist, providing libraries full of functions that let user-space applications interface with the physical hardware.

    So, if you have more to worry about than adapting compiler toolchains, boot code, a little abstraction code for certain optimized APIs in the OS, and the HAL and device drivers, you’re doing it wrong.

    The rest is mechanical in nature.
    Point 1: good grief. People who have, you know, actually studied computer architecture and organization, please speak up. In the meantime ... it has been common parlance for decades to refer to application software as software and systems software as the OS. And also to refer to both the kernel and the software that runs on top of the kernel but below the application software (conceptually), like the GUI, as "the OS" when you really shouldn't, especially when it comes to Linux. So ... I don't get where you are going with this?

    Point 2: "You're doing it wrong ... the rest is mechanical in nature." Ah. That is where you are going with this. The first is FALSE and the second is VERY FALSE. Clearly you haven't done this before. Let the people who have be the ones to talk about this, because they are the ones who are actually qualified to do so. Go try to port a major application from x86 to ARM on the same OS (i.e. Linux or Android) and see how "mechanical" it is. Not even that ... merely port a 32-bit application to a 64-bit one. Done it before? It is neither easy nor fun. But I guess it should have been "mechanical" and it was "being done wrong", eh? Goodness, where do people like this come from?
    Well, I go to clear out long-open tabs, and come back to this thread.

    Guess what, I do this sort of thing for a living.  I work in the bowels of a major OS, in the context of ALL the code, not just that which executes in user-space.  Guess what? It's exactly as I've described it as far as how things are abstracted! Also guess what? The OS provides LOTS of abstractions in libraries you link into your application, for a huge number of things, to make your life far easier.  Guess what? Yes, much of a direct port between one machine architecture and another IS handled by the Hardware Abstraction Layer of the OS for a lot of the things that need to be abstracted, and there is ALSO an ABI (Application Binary Interface) standardized for each compatible processor that runs the same low-level code: in the case of Intel, how things are passed via the registers and the stack varies based on whether it is 32-bit or 64-bit.  Sure, you could pass everything via the same ABI on 64-bit, but it has more registers to work with, so that's something good to take advantage of.

    Now, about the code in user-space applications: given the same source code, whatever the higher-level language, even if it is NOT perfectly “portable” (C/C++ leave a huge number of things in the language specifications explicitly to the vendors to specify, because they’re intended for use at the lowest system level for performance), and, this will clearly amaze you, getting correct translations from one CPU to any other, for anything at a level higher than native assembly language, ABSOLUTELY has a mechanical transform you can make from code A to code B to achieve the same logical results.  Things get a little more interesting when the sizes of integers differ from one CPU to the next, but despite all that, it is STILL very mechanical and straightforward to translate working code from one CPU to the next: this is very commonly done in C/C++ via macros and the preprocessor, as sketched below.
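    To make that macros-and-preprocessor point concrete, here is a minimal sketch in C of the kind of mechanical, per-architecture adaptation described above. The macro names (ARCH_NAME, CACHE_LINE) are illustrative inventions, not from any particular codebase, and the 128-byte cache-line figure for Apple Silicon is a commonly reported value, not a spec quote:

    /* portability_sketch.c - per-architecture decisions isolated in macros. */
    #include <stdint.h>
    #include <stdio.h>

    /* Fixed-width types from <stdint.h> remove most "how big is an int
       here?" questions when moving between 32-bit and 64-bit targets. */
    typedef int64_t file_offset_t;

    /* Architecture-specific choices collapse into one conditional block,
       so the rest of the source compiles unchanged on every target. */
    #if defined(__aarch64__) || defined(__arm64__)
    #  define ARCH_NAME  "arm64"
    #  define CACHE_LINE 128   /* reportedly 128 bytes on Apple Silicon */
    #elif defined(__x86_64__)
    #  define ARCH_NAME  "x86_64"
    #  define CACHE_LINE 64
    #else
    #  define ARCH_NAME  "unknown"
    #  define CACHE_LINE 64    /* conservative default */
    #endif

    int main(void) {
        printf("arch: %s, cache line: %d bytes, file offset: %zu bytes\n",
               ARCH_NAME, CACHE_LINE, sizeof(file_offset_t));
        return 0;
    }

    The point of the sketch: the architecture-specific decisions live in a handful of conditional definitions, while everything below them is the same source on every CPU, which is exactly what makes the translation mechanical.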
  • Compared: M1 vs M1 Pro and M1 Max

    tht said:
    With dedicated media acceleration they don’t need several low-power cores while playing back video, or even encoding it: most common desktop applications rarely use more than 2 separate threads at any given time, one for front-end GUI interaction with the user and a background thread waiting on I/O, which, when the hardware is implemented correctly, is mostly waiting without much processor use anyway. Thus, in most use-cases you can do all you need with just the 2 energy-efficient cores: Mail or Pages will be far more than fast enough with the 2 small cores. Web browsing is similar for I/O, but other things will cause greater power-core usage, and a lot more threads.

    The other reason to use efficiency cores is for regularly scheduled background system tasks that don’t need much CPU power. Again, even 2 threads is more than a lot of those will use. I personally thought it an odd choice for the M1 to start out with 4 energy-efficient cores, given these reality-based considerations. Perhaps that was more a question of getting real-world results achieved with lower risk, and gathering real-world data on core usage in practice to inform those decisions.
    If you look at the die shots, the e-cores take up about 1% of the SoC chip area in the M1 Max and about 2% in the M1 Pro, so adding another 2 e-cores isn't a big cost. I can only think of one reason for them to do this: they needed the standby and idle power consumption of the M1 Pro and Max to be as low as it could possibly go, and they felt that 2 e-cores were enough CPU to handle standby and idle tasks, like all the app and system tasks you mention. But the iPhone has 4 e-cores! You'd think an iPhone would really need to save on power, and yet it has 4. So, here we are: why the difference?

    The e-cores are so small that having 4 or 6 of them just isn't that big of a cost. If they prevent the p-cores from firing up while web browsing, especially for background web pages, that seems like a nice gain in energy efficiency. It's a browser-centric, JavaScript-centric workflow most of the time for users: even if they do a lot of GPU compute or multi-threaded workloads, most people still spend a lot of their time just web browsing. So, managing background web pages with e-cores seems like a really good idea.

    At some point adding e-cores stops being effective in the energy-efficiency-versus-performance trade, so it's not going to be 8, 12, or 16 e-cores, but I would think 4 to 8 would be the sweet spot for 8-, 16-, and 32-p-core systems.
    Consider this as a possible explanation for why only 2 e-cores rather than more: as much as each core is a tiny percentage of the die space, every added core means more distance for signals to traverse, and included in this are the memory controllers and cache-coherency logic for each core, which I’m reasonably guessing are as complicated, as large, and as much of a system performance overhead regardless of how fast the core is. In software development, a big overhead for many processes with more than one thread is synchronization between the threads, and e-cores synchronizing with p-cores won’t be any better, perhaps even worse, since they tend to run at different frequencies, which has its own fun for buffering between them (see the sketch at the end of this post).  The iPhone isn’t something I’d ever expect to expand much beyond its current number of main CPU cores, similar GPU cores, and the other added things in the SoC, because… it’s a phone, and it’s not a wise use-case to try to do all you can do in a laptop or even a full-sized iPad, for human reasons if nothing else.

    Glue logic has its place and its price in any design, and can make or break a system.  It’ll be interesting to see what Apple does for core breakdowns across all their products, from the Watch up to the Mac Pro.  I also consider the original M1, as much as anything, a usable lower-risk proof-of-concept release for many things, including scaling up the design along so many fronts: a rational iterative development strategy of walk (fast!) before you try to run.  It’s really the Mac MVP for their transition.
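    A minimal sketch of that thread-synchronization overhead, in C with POSIX threads. The file name, iteration count, and two-thread structure are illustrative; absolute numbers will vary with the cores, clusters, and OS involved:

    /* sync_cost.c - compare uncontended increments against increments
       that pay for a shared mutex on every iteration.
       Build with: cc -O2 -pthread sync_cost.c */
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define ITERS 10000000L

    static volatile long counter;   /* volatile keeps the baseline loop honest */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *locked_worker(void *arg) {
        (void)arg;
        for (long i = 0; i < ITERS; i++) {
            pthread_mutex_lock(&lock);    /* every increment pays the sync cost */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    static double elapsed_s(struct timespec a, struct timespec b) {
        return (double)(b.tv_sec - a.tv_sec)
             + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void) {
        struct timespec t0, t1;
        pthread_t workers[2];

        /* Baseline: one thread, no lock traffic at all. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < ITERS; i++) counter++;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("single thread, no lock:   %.3f s\n", elapsed_s(t0, t1));

        /* Two threads contending on one mutex: the lock operations, and
           the cache-line traffic behind them, dwarf the actual work. */
        counter = 0;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        pthread_create(&workers[0], NULL, locked_worker, NULL);
        pthread_create(&workers[1], NULL, locked_worker, NULL);
        pthread_join(workers[0], NULL);
        pthread_join(workers[1], NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("two threads, shared lock: %.3f s\n", elapsed_s(t0, t1));
        return 0;
    }

    The contended case does strictly more work per increment, and when the two threads land on cores in different clusters (an e-core and a p-core, say), the cache line holding the lock has to bounce between them, which is the buffering-between-frequencies cost described above.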
  • Compared: M1 vs M1 Pro and M1 Max

    Any thoughts on why the Pro would perform slightly better than the Max on the Geekbench multi-core test? Maybe the Max runs hotter, which would place it at a disadvantage in a test where both chips have ten identical cores?
    There is always a trade-off for speed when you add more things to address in a chip.

    When you double the number of addressable functions, you add another bit of address/selection lines, and even if what you access (once selected) is just as fast and stays selected, the initial access still takes more time: the extra layer of combinatorial logic means more linear distance for signals to travel, and that means more resistance and capacitance in the signal lines, beyond the speed of each transistor used.

    Look at the sizes of caches for CPUs: the smaller the caches are, the lower the latency, with this as a factor. As you increase the size of the caches, the distance (and thus time) for access grows, and generally it’s preferable to have constant-time/constant-cycle cache access (especially for the L1 instruction and data caches) to simplify and speed up the logic. Once a cache becomes too large, not even counting the heat/power issue, your system performance goes down, even if you can afford to make that large a chip.

    Now, look at the size of the M1 Pro versus the M1 Max: the Max is over twice as large in area. All of the GPU/CPU cache and those cores must be kept coherent, and that has a cost in space, time, energy, and waste heat. Bigger isn’t always better for performance, as alluded to above, and bigger systems tend to be slower due to sheer size: at the speed signals propagate through circuits (not as fast as light, but a good percentage of it), the differences in the sizes of the M1, M1 Pro, and M1 Max really add up. For the same reasons, it’s a HUGE design win to have the RAM on the same package: the circuit traces are kept minimal in distance, which greatly reduces power usage as well, making the system so darn power-efficient while also being fast.

    A brief explanation of resistance/capacitance for those not familiar with it: resistance, measured in ohms, limits how much current (in amps) can flow through a circuit; a perfect superconductor has zero resistance, so theoretically unlimited current could flow through it. Capacitance, measured in farads, is opposition to change in voltage. Digital circuits need 2 distinct voltage ranges for determining what’s a 1 (high voltage, usually) and a 0, with a no-man’s-land range in between. The higher the top voltage is compared to the low voltage, the longer it takes a signal to settle into a recognizable state.

    Capacitance and resistance both have elements of size involved: you want to minimize capacitance and its effects for speed (because it resists change in voltage), while ideally not increasing current flow with too low a resistance, because that’s wasted power in the form of heat. Long circuit traces increase both, for the same trace width, with more trace material reducing resistance. This is why processors have come down to less than 2 volts for normal operation in recent years. To make them run faster, you increase the voltage, so the RC (resistance/capacitance) time constants matter less because there’s more voltage forcing each transition, but power scales with the square of the voltage (times the clock frequency), so doubling the voltage alone roughly quadruples the power in watts, before even counting the higher clock.
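    As a compact reference for the above, these are the standard first-order relations (a sketch: real chips add leakage and wire-dominated delay terms; here R and C are the resistance and switched capacitance, V the supply voltage, f the clock frequency, and \alpha the fraction of gates switching per cycle):

    \begin{align*}
      t_{\text{delay}} &\propto R\,C && \text{(RC time constant of a signal path)} \\
      P_{\text{dyn}}   &\approx \alpha\,C\,V^{2}\,f && \text{(dynamic switching power)}
    \end{align*}

    Doubling V at a fixed clock quadruples the dynamic power; if the higher voltage also buys a doubled clock, power grows roughly eightfold, while energy per operation still grows as V squared.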

    Given all these constraints from physics, I’m very curious how they’re going to design and manufacture the SoCs used for the Pro desktops/workstations with more CPU and GPU cores and unified memory. Designing and building them as they are has practical limits for cost as well as scaling. I expect the next step is chiplets on the package: a main chip that is mostly system cache, I/O, and the underlying system bus, with the CPUs and GPUs (and their more-local caches) on chiplets carrying a number of cores each.
  • Intel under fire: What Wall Street thinks about Apple's new MacBook Pro

    mcdave said:
    Not sure how the summaries listed correlate to the opinion that Intel is “under fire.”  

    Also, Intel is ramping Alder Lake-S, which will probably exceed M1 Max performance (albeit while consuming much more power). Alder Lake will have up to 8 Golden Cove performance cores and 8 Gracemont efficiency cores. Raptor Lake is rumored to launch in 2022 and double the efficiency cores to 16, for a total of 8 + 16 = 24 cores and 32 threads.   Intel is still selling a metric ton of processors to the ecosystem: 80 percent market share. Microsoft just announced that it updated the Windows 11 kernel thread scheduler to schedule threads in a manner that takes advantage of the hybrid design. Intel might be coming back. 

    Intel and AMD will be in trouble if and when ecosystem partners like Asus, Dell, Lenovo, Razer, Microsoft etc. introduce non-x86 designs. 

    I don’t see x86 being in trouble until two things happen. 
    First, an ARM vendor must emerge that sells an ARM processor to the mass market with performance characteristics on par with Apple Silicon or the upcoming x86 designs (or the ecosystem partners develop their own in-house designs).  Qualcomm can’t compete with Alder Lake or Zen 4. And as good as Apple Silicon is, it can’t run Windows natively… and beyond that, Dell, Lenovo, and Asus can’t put an Apple Silicon processor inside their laptops, because Apple doesn’t sell to other people. So for the billions of users out there who don’t use macOS, Apple Silicon is not relevant.  Now, if Apple got into the processor-supplier game (it won’t), then that would spell serious trouble for AMD and Intel. 

    Second, Windows on ARM needs to be licensed for broader non-OEM use, and it also has to seamlessly run the applications that people want to use: games, office suites, content-creation software, and so on. 

    Until those two things happen, Intel and AMD will be fine. But Apple’s innovations could spur other laptop manufacturers to follow suit and ultimately press Microsoft for a Windows on ARM solution. Intel and AMD need to tread carefully, and continue to ramp x86 core-design production on smaller nodes. ASAP. 
    A very dated perspective.
    1) Nobody cares about cores; it’s about work done. Let’s see what the workflow reviews say once MBPs hit decent reviewers (giving the trolls another chance to brush up their tactics).
    2) Vendors are badging their own silicon, with MS leading the way (more like AMD customising for PlayStation), so I don’t see an ARM champion emerging, as it’s just a supervisory ISA. When Lenovo & HP announce their chips, it’s over for Intel.
    3) Intel isn’t even close. When you look at the products which match Apple’s meagre CPU benchmarks (Cinebench is optimised for x86 AVX2 only - the Embree renderer is Intel’s code), the TDP is 125W with a 250W peak.

    Non-server applications that use more than 2 threads/cores at any given time, let alone effectively, are a very small number and percentage: writing software that isn’t inherently, readily parallel to make practical use of more cores is usually far more effort than it’s worth, on a good day.  As such, processors like AMD’s Threadripper are silly for most uses and users of average software, as most cores will sit idle unless you’re running a lot of other software in the background.  A smaller number of faster cores is the sweet spot for cost/performance of a system, and thus having a bunch of efficiency cores AND a bunch of performance cores doesn’t seem likely to get great overall results.  Most background tasks that aren’t your main active process aren’t running constantly, as they’re waiting for data: a lot of background processes can run on efficiency cores, slower, without affecting their effectiveness.  As such, most of the time, in practice, a couple of efficiency cores can provide more than enough system throughput for a responsive GUI in your foreground application AND all the background stuff too: this is how Apple can get such crazy battery life.

    But most people aren’t aware of this reality.  The one place where throwing lots of cores at something gives gains everyone will notice is the GPU, because its task is embarrassingly parallel in nature.

    There are a number of use-cases where all cores can and will be used to good effect, and the people who use those applications (hopefully!) know what they are.  But for Apple’s office suite? More than 2 CPU cores is wasted hardware.
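    To put rough numbers on why extra cores sit idle for mostly serial software, here is a back-of-the-envelope Amdahl's-law sketch in C. The parallel fractions chosen are illustrative assumptions, not measurements of any real application:

    /* amdahl_sketch.c - speedup on n cores when a fraction p of the
       work can run in parallel: S(n) = 1 / ((1 - p) + p / n). */
    #include <stdio.h>

    static double amdahl(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void) {
        const double fractions[] = { 0.50, 0.90, 0.99 }; /* typical .. highly parallel */
        const int cores[] = { 2, 4, 8, 16, 64 };

        printf("%-10s", "parallel");
        for (size_t c = 0; c < sizeof cores / sizeof *cores; c++)
            printf("%8d", cores[c]);
        printf("  cores\n");

        for (size_t f = 0; f < sizeof fractions / sizeof *fractions; f++) {
            printf("p = %.2f  ", fractions[f]);
            for (size_t c = 0; c < sizeof cores / sizeof *cores; c++)
                printf("%8.2f", amdahl(fractions[f], cores[c]));
            printf("\n");
        }
        return 0;
    }

    Even at 90% parallel, 64 cores deliver less than a 9x speedup, and a 50%-parallel workload never quite reaches 2x no matter how many cores you add, which is exactly why a couple of fast cores is the sweet spot for most desktop software.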