To get maximum parallelization from an application, not only must the application be developed to take advantage of multiple cores, but it must also have code in place to keep a number of threads working on each core. A modern processor architecture, such as the Intel Xeon Phi processor, can accommodate four threads per core.
On the Intel Xeon Phi processor, each of the threads on a core is known as a hyper-thread. In this architecture, all of the threads on a core progress through the pipeline simultaneously, producing results much more quickly than if just one thread were used. The processor decides which thread should make progress based on a number of factors, such as waiting for data from memory, instruction availability, and pipeline stalls.
If an application does not need all four threads at a given time, the unused threads can be turned off with a BIOS setting. Alternatively, if all the threads are left enabled for a core, the software will put the unneeded threads into a HALT state; at a later time, the software can wake the required number of threads.
Using two active threads is generally a good choice for most applications. This allows for out-of-order processing of the instructions and can hide the latency of cache misses and memory accesses. If an application is highly threaded, then all four of the threads can be turned on, which can give even higher scaling for memory-sensitive workloads.
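In an OpenMP application, the number of active threads per core can be controlled from the environment without rebuilding. The sketch below assumes the Intel OpenMP runtime for `KMP_HW_SUBSET`; the `OMP_*` variables are standard OpenMP. The core count and the application name are placeholders:

```shell
# Sketch: run an OpenMP application with two threads per core.
# KMP_HW_SUBSET is specific to the Intel OpenMP runtime; the OMP_*
# variables are standard OpenMP. "64c,2t" assumes a 64-core part --
# adjust to the actual core count of the processor.
export KMP_HW_SUBSET=64c,2t     # use 64 cores, 2 threads per core
export OMP_PLACES=threads       # one place per hardware thread
export OMP_PROC_BIND=close      # keep a core's threads together
./my_app                        # hypothetical application binary
```

Changing `2t` to `4t` enables all four hyper-threads per core for the highly threaded, memory-sensitive case described above.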
Utilizing the available hardware computing and memory access capabilities is the responsibility of the application developer. However, new generations of hardware can greatly assist by adjusting their resources to match the application's requirements. The Intel Xeon Phi processor is an excellent example of an architecture that can adapt to the current workload to produce the best and fastest results for the end user.