# reading matter | Playful Jaguar inside Xbox One and PS4: Architecture

AMD processors

The AMD Jaguar processor is the basis of the two newest gaming consoles: the Microsoft Xbox One, whose creators our readers recently visited, and, of course, the Sony PlayStation 4 (PS4). What makes this particular processor so attractive to the gaming industry? Perhaps the answer can be found in its technical features. We continue to look at games from the hardware side, and today we take a closer look at the processor architecture used in the new generation of game consoles.

Two architectures are more efficient than one

Bulldozer architecture

Nowadays, the development of microprocessor architectures is constrained by power consumption. A processor has to be clearly positioned for use in particular devices, and the key figure here is TDP (thermal design power). TDP characterizes how much heat the cooling system must be able to remove. It should be higher than the chip's maximum heat dissipation, but too much margin reduces the efficiency of the architecture.

For example, the Intel Core architecture (Sandy Bridge / Ivy Bridge) can work efficiently across a power range of 13 to 130 watts, so it can be used in devices with both small and large power budgets. But it is much more efficient to develop an architecture whose heat dissipation (and, accordingly, power consumption) closely matches the target TDP. In other words, all of a processor's characteristics should be consistent with one another. This is more advantageous from both a technical and an economic point of view.

Both AMD and Intel follow this “order of magnitude rule”. Therefore, each of the world's leading chip makers has two different microprocessor architectures. Intel offers Atom for low-power systems and Core for high-performance computers. In 2010, AMD introduced the energy-efficient Bobcat alongside Bulldozer for high-performance systems.


Both Bobcat and Bulldozer are updated annually. In 2011, Bobcat appeared in the Ontario and Zacate systems on a chip (SoC), part of the Brazos platform. Last year, AMD announced Brazos 2.0, which uses slightly updated Bobcat-based SoCs that are very close to the previous ones. Recently, the Kabini and Temash APUs were introduced, based on the first major Bobcat update: the Jaguar core.

Jaguar and Bobcat: Similarities and Differences

Jaguar architecture

At the core level, the Jaguar processor is similar to its predecessor, Bobcat. The architecture can still issue two instructions for execution at the same time (dual-issue). It also executes instructions out of order: instructions run not in program order but as soon as they are ready to execute. In short, the architecture builds on the basic design that AMD introduced in 2010.
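The difference between in-order and out-of-order dual-issue can be shown with a toy model. The sketch below is not a model of Jaguar itself: it assumes every instruction takes one cycle, that each register is written only once (as if registers had already been renamed), and that a result becomes usable in the following cycle. The function name `schedule` and the sample program are hypothetical.

```python
def schedule(program, width=2, out_of_order=True):
    """Group instructions into issue cycles for a toy machine.

    program: list of (name, dest_register, source_registers) tuples.
    width: maximum instructions issued per cycle (2 = dual-issue).
    Assumptions: one-cycle latency, each register written once,
    sources not produced by the program are available from the start.
    """
    produced = {dest for _, dest, _ in program}
    ready_regs = set()          # results computed in earlier cycles
    pending = list(program)     # instructions not yet issued, in program order
    cycles = []
    while pending:
        issued = []
        for instr in pending:
            if len(issued) == width:
                break
            sources = instr[2]
            is_ready = all(s in ready_regs or s not in produced for s in sources)
            if is_ready:
                issued.append(instr)
            elif not out_of_order:
                break           # in-order: a stalled instruction blocks younger ones
        if not issued:
            raise ValueError("no instruction can issue (circular dependency?)")
        for instr in issued:
            pending.remove(instr)
            ready_regs.add(instr[1])   # result is visible from the next cycle
        cycles.append([name for name, _, _ in issued])
    return cycles


program = [
    ("load a", "r1", ()),
    ("add b", "r2", ("r1",)),   # must wait for "load a"
    ("load c", "r3", ()),
    ("add d", "r4", ("r3",)),   # must wait for "load c"
]

print(schedule(program))                       # [['load a', 'load c'], ['add b', 'add d']]
print(len(schedule(program, out_of_order=False)))  # 3
```

In this example the out-of-order machine fills both issue slots in every cycle by skipping over the stalled `add b`, while the in-order machine needs a third cycle.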

The first-level (L1) cache is unchanged, and the front-end and execution units are present as before. Note that ARM's Cortex-A15 can issue three instructions at the same time (three-issue) and also uses out-of-order execution. Anand Lal Shimpi, the well-known technical columnist at AnandTech, expected AMD to offer the world something similar.

AMD moved the manufacturing of its chips to a 28-nanometer process. Interestingly, the company focused on improving Jaguar's performance over its predecessor while staying within the same TDP. What prompted AMD to hold down the heat dissipation of its chips?

The fact is that the Bobcat architecture was aimed at low-cost, small computers, in particular netbooks and nettops. Jaguar, in keeping with the spirit of the times, is meant to find its place in even more compact devices: tablets. AMD is not going to create an SoC for smartphones, but the market for tablets based on Windows 8 (and maybe Android?) is very attractive for the company. Since such devices are not required to be compatible with cellular networks (not least because their price is comparatively lower), AMD can easily offer an alternative to the Intel Atom architecture.

With most types of workload, modern processors still execute less than one instruction per clock cycle. Most likely, this is why AMD saw no point in making the architecture three-issue. After all, that would require more power, and such an approach is not justified for a processor aimed at mobile devices.

In addition, Jaguar could have become so powerful that the difference between it and the Bulldozer family would no longer be felt. And as noted at the beginning of this story, an “averaged” architecture is unprofitable from both an engineering and an economic point of view. A far more attractive model is one in which the chip maker offers one architecture for powerful computers and a separate one for compact devices. In the first case, the focus is on performance; in the second, on power consumption and heat dissipation, which are closely related.

Moving to three-issue would, of course, increase performance, but it would keep AMD processors from taking their place in tablets. Therefore, Anand Lal Shimpi believes, the performance increase has been postponed to the future. Against the background of the three-issue ARM design, both Jaguar and Intel Silvermont look somewhat old-fashioned. But any comparison has to take into account that AMD and Intel are working to reduce the power consumption of their processors, whereas ARM's main focus within its architecture is on increasing performance.

Jaguar has a 4 x 32-byte loop buffer. If a loop is detected, the instructions used in the loop are fetched from this small buffer instead of from the first-level instruction cache. It should not be mistaken for a trace cache or a micro-op cache. Things are much more prosaic. The only advantage of this buffer is that the instruction cache does not have to be accessed every time the instructions in the loop are needed. The aim is to reduce the processor's power consumption, not to improve its performance, as it may seem at first glance.
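A rough sketch can make the power argument concrete by counting how often the instruction cache is touched. The model below is an assumption, not AMD's actual loop-detection logic: it simply keeps the four most recently loaded 32-byte fetch lines in a FIFO and treats any fetch that hits one of them as served by the buffer. The function name `icache_accesses` and the addresses are hypothetical.

```python
from collections import OrderedDict

LINE_BYTES = 32     # size of one buffer entry (from the article)
BUFFER_LINES = 4    # number of entries (from the article)


def icache_accesses(fetch_addresses, use_loop_buffer=True):
    """Count instruction-cache accesses for a stream of fetch addresses.

    Assumption: a simple FIFO of the last BUFFER_LINES fetched lines
    stands in for the real loop detector.
    """
    buffer = OrderedDict()   # line number -> None, oldest entry first
    accesses = 0
    for addr in fetch_addresses:
        line = addr // LINE_BYTES
        if use_loop_buffer and line in buffer:
            continue                      # served from the loop buffer
        accesses += 1                     # the instruction cache is accessed
        if use_loop_buffer:
            buffer[line] = None
            if len(buffer) > BUFFER_LINES:
                buffer.popitem(last=False)  # evict the oldest line
    return accesses


# A 96-byte loop body (three 32-byte lines) executed 100 times.
small_loop = [0x1000, 0x1020, 0x1040] * 100
print(icache_accesses(small_loop, use_loop_buffer=False))  # 300
print(icache_accesses(small_loop))                         # 3

# A 160-byte loop body (five lines) does not fit in 4 x 32 bytes.
large_loop = [0x1000 + 32 * i for i in range(5)] * 100
print(icache_accesses(large_loop))                         # 500
```

The instruction count is identical in every case; only the number of instruction-cache accesses changes, which is why the benefit shows up in power rather than in performance.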

Before entering the market, a processor is simulated many times. Simulation reveals the processor's weak points. Even when the design is finished, some of them remain, and they are eliminated in later generations of chips.

You can, of course, spend years or even decades bringing a processor to an ideal state. But the manufacturer has to think about more than offering the best possible processor to the market. It must constantly take constraints into account, chief among them cost and the schedule that has to be met. With an unlimited budget, a manufacturer could eliminate every bottleneck in its designs, but that would take forever. In reality, compromises have to be made.

One example of such a compromise is that AMD settled on issuing two (and not three) instructions at the same time. The company also rejected a micro-op cache in favor of the loop buffer, which is a simpler solution. Most likely, its engineers found that too much energy was being wasted while executing loops, and that adding a dedicated buffer for this task was the optimal solution in terms of energy savings, cost and implementation complexity.

AMD also improved the instruction cache prefetcher. This reflects a return to the Bobcat design and an effort to carry its features over into the new Jaguar architecture. This time, AMD did not have to create a fundamentally different architecture. It only needed an architecture based on Bobcat that would handle the same tasks better than its predecessor. The instruction buffer between the instruction cache and the decoders has become larger in Jaguar. But this is a half measure: in Bulldozer, the fetch and decode stages are separated entirely.

Jaguar has 40-bit physical addressing and support for new instructions: SSE4.1 / 4.2, AES, CLMUL, MOVBE, AVX, F16C, BMI1. Bobcat's weak point was that its decoder limited the maximum clock frequency. An additional decode stage has been added in Jaguar so that the frequencies AMD targeted can be reached on the 28-nanometer process.
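Two quick checks follow from this list. The sketch below computes how much memory 40 address bits can cover, and scans a `flags` line in the format Linux prints in `/proc/cpuinfo` for the extensions named above. The flag spellings used here (`sse4_1`, `pclmulqdq`, and so on) are the Linux kernel's names and are an assumption about the reader's platform; the function name and sample text are hypothetical.

```python
PHYSICAL_ADDRESS_BITS = 40

# 2**40 bytes of addressable physical memory is exactly 1 TiB.
max_bytes = 2 ** PHYSICAL_ADDRESS_BITS
print(max_bytes // 2 ** 30, "GiB")   # 1024 GiB

# Linux /proc/cpuinfo names for the extensions listed in the article
# (assumed mapping: CLMUL is reported as "pclmulqdq").
JAGUAR_FLAGS = {
    "sse4_1", "sse4_2", "aes", "pclmulqdq",
    "movbe", "avx", "f16c", "bmi1",
}


def missing_jaguar_features(cpuinfo_text):
    """Return the listed extensions absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            present = set(line.split(":", 1)[1].split())
            return sorted(JAGUAR_FLAGS - present)
    return sorted(JAGUAR_FLAGS)   # no flags line found: nothing confirmed


# Made-up sample text, not output captured from real hardware.
sample = "model name\t: example cpu\nflags\t\t: fpu sse4_1 sse4_2 aes avx movbe"
print(missing_jaguar_features(sample))   # ['bmi1', 'f16c', 'pclmulqdq']
```

On a Linux machine, the function could be given the contents of `/proc/cpuinfo`; on other systems the text would have to come from another source.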

Based on AnandTech.com

The article is based on materials from https://hi-news.ru/consoles/chtivo-igrivyj-jaguar-vnutri-xbox-one-i-ps4-arxitektura.html
