Charles Chiang, President and CEO of MSI, Passes Away at 56

It is with great sadness that we have learned that MSI’s President and CEO, Charles Chiang, has passed away. Charles took on the role of CEO a little over a year ago in January 2019, having previously headed up the massive success of MSI’s Desktop Platform Business Division and driven the company’s laser focus on its Gaming branding over the past few years.


Charles had been part of MSI for over 20 years; I had the pleasure of meeting with him quite frequently on my trips to Taiwan and Computex, as well as during an extensive HQ tour when MSI’s gaming brand was first starting – we discussed the upcoming emergence of virtual reality and how MSI wanted to create the world’s first VR-ready notebook. Charles always had time to listen to my industry ramblings, and was always keen to share how he perceived the industry with his decades of insight. He will be missed.


Going for Gaming: An Interview with MSI VP Charles Chiang on Gaming and Strategy


The following statement was given to Tom’s Hardware by MSI:


“Earlier today, MSI GM and CEO Charles Chiang passed away. Having been a part of the company for more than 20 years, he made outstanding contributions and was admired by his colleagues. Mr. Chiang was a respected leader in the MSI family, and helped pave the way for the brand’s success. We are all deeply saddened by the news, and are mourning the loss of Mr. Chiang. He will be deeply missed by the entire team.”


Our condolences go out to his family and to MSI.



Source: AnandTech – Charles Chiang, President and CEO of MSI, Passes Away at 56

Deepcool Releases Castle 280EX AIO CPU Cooler

Further expanding its ever-growing list of CPU coolers, Deepcool has announced its new Castle 280EX AIO CPU cooler, a closed loop liquid cooler with a 280 mm radiator. Slotting in between the pre-existing Castle 240EX and 360EX, the 280EX is designed to improve options for users looking for an RGB-infused cooler with a grey cylindrical CPU block.


The Deepcool Castle EX series is designed with a 3-phase motor intended to improve flow rate and overall cooling performance, but with less operating noise. The latest in the Castle EX range is the 280EX, which uses a 280 mm radiator that’s paired with a black sprayed aluminium core, and comes supplied with a pair of 140 mm 400-1600 rpm cooling fans. The new cooler supports all the usual socket types, including Intel’s LGA20XX, LGA 1200 and LGA 115x sockets, as well as AMD’s TRX40/TR4 and AM4 sockets.



One of the Deepcool Castle 280EX’s main design traits comes via the CPU block and pump, with a swappable logo plate that allows users to choose between Deepcool’s Gamerstorm emblem or a yin-yang symbol. Integrated into the rather bulky-looking pump and block is addressable RGB LED lighting, which can be customized through a 3-pin ARGB motherboard header or with the RGB controller included in the accessories bundle.


The cooling plate is made from copper for effective heat dissipation, and Deepcool has opted for a larger design with 25% more skived fins, although Deepcool doesn’t state which model it is using as a comparison. The larger plate allows better support for the sizable AMD TR4 socket, which has a much larger IHS than smaller processors such as AMD’s Ryzen 3000 series.


The Deepcool Castle 280EX has an MSRP of $150 and is currently available to buy at Amazon. For reference, the larger Castle 360EX is presently available for $158, while the smallest of the now-completed trio, the 240EX with its 240 mm radiator, can be purchased for $130.




Source: AnandTech – Deepcool Releases Castle 280EX AIO CPU Cooler

Intel Marks Gemini Lake Atom Platform for End-of-Life

Alongside Intel’s Skylake Core CPU architecture, Intel’s other CPU workhorse architecture for the last few years has been the Goldmont Plus Atom core. First introduced in 2017 as part of the Gemini Lake platform, Goldmont Plus was a modest update to Intel’s Atom architecture that has served as the backbone of the cheapest Intel-based computers since 2017. However Goldmont Plus’s days have been numbered since the announcement of the Tremont Atom architecture, and now Goldmont Plus is taking another step out the door with the announcement that Intel has begun End-Of-Life procedures for the Gemini Lake platform.


Intel’s bread and butter budget platform for the last few years, Gemini Lake chips have offered two or four CPU cores, as well as the interesting UHD Graphics 600/605 iGPUs, which ended up incorporating a mix of Intel’s Gen9 and Gen10 GPU architectures. These chips have been sold under the Pentium Silver and Celeron N-series brands for both desktop and mobile use, with TDPs ranging from 6W to 10W. All told, Gemini Lake doesn’t show up in too many notable PCs for obvious reasons, but it has carved out a bit of a niche in mini-PCs, where its native HDMI 2.0 support and VP2 Profile 2 support (for HDR) have given it a leg up over intel’s 7th/8th/9th generation Core parts.


Nonetheless, after almost 3 years on the market, Gemini Lake’s days are numbered. In the long run, it’s set to be replaced with designs using Intel’s new Tremont architecture. Meanwhile, in the short run, Intel’s budget lineup will be anchored by the Gemini Lake Refresh platform, which Intel quietly released in late 2019 as a stopgap for Tremont. As a result, Intel has started the process of retiring the original Gemini Lake platform, which has now become redundant.



Under a set of Product Change Notifications published yesterday, Intel has laid out a pretty typical EOL plan for the processors. Depending on the specific SKU, customers have until either October 23rd or January 22nd to make their final chip orders. Meanwhile those final orders will ship by April 2nd of 2021 or July 9th of 2021 respectively.


All told, this gives customers roughly another year to wrap up business with a platform that itself was supplanted the better part of a year ago.


Sources: Intel, Intel, & Intel



Source: AnandTech – Intel Marks Gemini Lake Atom Platform for End-of-Life

New AMD Ryzen 3000XT Processors Available Today

Announced a couple of weeks ago, the new AMD Ryzen 3000XT models with increased clock frequencies should be available today in primary markets. These new processors offer slightly higher performance than their similarly named 3000X counterparts for the same price, with AMD claiming that a minor update to the process node technology enables the slightly better clock frequencies.



Source: AnandTech – New AMD Ryzen 3000XT Processors Available Today

xMEMS Announces World's First Monolithic MEMS Speaker

Speakers aren’t traditionally part of our coverage, but today’s announcement of xMEMS’ new speaker technology is something that everybody should take note of. Voice coil speakers as we know them and have been around in one form or another for over a hundred years and have been the basis of how we experience audio playback.


In the last few years, semiconductor manufacturing has become more prevalent and accessible, with MEMS (Microelectromechanical systems) technology now having advanced to a point that we can design speakers with characteristics that are fundamentally different from traditional dynamic drivers or balanced armature units. xMEMS’ “Montara” design promises to be precisely such an alternative.



xMEMS is a new start-up, founded in 2017 with headquarters in Santa Clara, CA and a branch office in Taiwan. Until today the company had been in stealth mode, having not publicly released any product. The company’s stated motivation is to break decades-old speaker technology barriers and reinvent sound with innovative pure-silicon solutions, drawing on the extensive experience its founders have collected over years at different MEMS design houses.


The manufacturing of xMEMS’ pure-silicon speaker is very different from that of a conventional speaker: the speaker is essentially just one monolithic piece produced via a typical lithography process, much like other silicon chips. Due to this monolithic design, the manufacturing line has significantly less complexity than voice coil designs, which have a plethora of components that need to be precision-assembled – a task that is quoted to require thousands of factory workers.


The company didn’t want to disclose the actual process node of the design, but expect something quite crude in the micron range – they only confirmed that it was a 200mm wafer technology.



Besides the simplification of the manufacturing line, another big advantage of the lithographic aspect of a MEMS speaker is that its manufacturing precision and repeatability are significantly superior to those of a more variable voice coil design. The mechanical aspects of the design also have key advantages, for example more consistent membrane movement, which allows higher responsiveness and lower THD for active noise cancellation.



xMEMS’ Montara design comes in an 8.4 x 6.06 mm silicon die (50.9 mm²) with 6 so-called speaker “cells” – the individual MEMS speaker elements that are repeated across the chip. The speaker’s frequency response covers the full range from 10 Hz up to 20 kHz, something that current dynamic drivers or balanced armature drivers have issues with, and the reason we see multiple such speakers being employed to cover different parts of the frequency range.
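

For those keeping score, the quoted die figures check out; here is a quick sketch using only the numbers above (no other assumptions):

```python
# Back-of-the-envelope check of the quoted Montara die figures.
die_w_mm, die_h_mm = 8.4, 6.06           # die dimensions quoted by xMEMS
cells = 6                                # speaker "cells" per die

die_area = die_w_mm * die_h_mm           # ~50.9 mm^2, matching the quoted figure
area_per_cell = die_area / cells         # ~8.5 mm^2 of silicon per speaker cell

print(f"Die area: {die_area:.1f} mm^2, per cell: {area_per_cell:.2f} mm^2")
```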


The design is said to have extremely good distortion characteristics, able to compete with planar magnetic designs, and promises only 0.5% THD across 200 Hz – 20 kHz.


As these speakers are capacitive piezo-driven rather than current-driven, they are able to cut power consumption to a fraction of that of a typical voice coil driver, using only 42 µW of power.



Size is also a key advantage of the new technology. Currently xMEMS is producing a standard package solution, with the sound coming perpendicularly out of the package, which has the aforementioned 8.4 x 6.05 x 0.985 mm footprint; we’ll also see a side-firing solution with the same dimensions that allows manufacturers to better manage internal earphone design and component positioning.



With the above crude 3D-printed unit, which has no optimisations whatsoever in terms of sound design, xMEMS easily managed to build an earphone of similar dimensions to current standard designs. In fact, commercial products are likely to look much better and to better take advantage of the size and volume savings that such a design allows.



One key aspect of the capacitive piezo drive is that it requires a different amplifier design from that of a classical speaker. Montara can be driven with up to 30 V peak-to-peak signals, which is well above the range of existing amplifier designs. As such, customers wishing to deploy a MEMS speaker design such as the Montara require an additional companion chip, such as Texas Instruments’ LM48580.


In my view this is one of the big hurdles for more widespread adoption of the technology, as it limits usage to more integrated solutions that actually offer the proper amplifier design to drive the speakers – a lot of existing audio solutions out there will need an extra adapter/amp if any vendor decides to make a non-integrated “dumb” earphone design (as in, your classical 3.5 mm ear/headphones).


TWS (true wireless stereo) headphones are obviously the prime target market for the Montara, as the amplifier aspect can be addressed at the design stage, and such products can fully take advantage of the size, weight and power benefits of the new speaker technology.



In measurements, using the crude 3D-printed earphone prototype depicted earlier, xMEMS showcases that the Montara MEMS speaker has significantly higher SPL than any other earphone solution, with production models fully achieving the targeted 115 dB SPL (the prototype only had 5 of the 6 cells active). The native frequency response here is much higher in the upper frequencies – allowing vendors headroom in order to adapt and filter the sound signature in their designs. Filtering down is much easier than boosting at these frequencies.


THD at 94dB SPL is also significantly better than even an unnamed pair of $900 professional IEMs – and again, there’s emphasis that this is just a crude design with no audio optimisations whatsoever.



In terms of cost, xMEMS didn’t disclose any precise figure, but shared with us that it’ll be in the range of current balanced armature designs. xMEMS’ Montara speaker is now sampling to vendors, with expected mass production kicking in around spring next year – with commercial devices from vendors also likely to see the light of day around this time.




Source: AnandTech – xMEMS Announces World’s First Monolithic MEMS Speaker

SK Hynix: HBM2E Memory Now in Mass Production

Just shy of a year ago, SK Hynix threw their hat into the ring, as it were, by becoming the second company to announce memory based on the HBM2E standard. Now the company has announced that their improved high-speed, high density memory has gone into mass production, offering transfer rates up to 3.6 Gbps/pin, and capacities of up to 16GB per stack.


As a quick refresher, HBM2E is a small update to the HBM2 standard to improve its performance, serving as a mid-generational kicker of sorts to allow for higher clockspeeds, higher densities (up to 24GB with 12 layers), and the underlying changes that are required to make those happen. Samsung was the first memory vendor to ship HBM2E with their 16GB/stack Flashbolt memory, which runs at up to 3.2 Gbps in-spec (or 4.2 Gbps out-of-spec). This in turn has led to Samsung becoming the principal memory partner for NVIDIA’s recently-launched A100 accelerator, which uses Samsung’s Flashbolt memory.


Today’s announcement by SK Hynix means that the rest of the HBM2E ecosystem is taking shape, and that chipmakers will soon have access to a second supplier for the speedy memory. As per SK Hynix’s initial announcement last year, their new HBM2E memory comes in 8-Hi, 16GB stacks, which is twice the capacity of their earlier HBM2 memory. Meanwhile, the memory is able to clock at up to 3.6 Gbps/pin, which is actually faster than the “just” 3.2 Gbps/pin that the official HBM2E spec tops out at. So like Samsung’s Flashbolt memory, it would seem that the 3.6 Gbps data rate is essentially an optional out-of-spec mode for chipmakers who have HBM2E memory controllers that can keep up with the memory.


At those top speeds, this gives a single 1024-pin stack a total of 460GB/sec of memory bandwidth, which rivals (or exceeds) most video cards today. And for more advanced devices which employ multiple stacks (e.g. server GPUs), this means a 6-stack configuration could reach as high as 2.76TB/sec of memory bandwidth, a massive amount by any measure.
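

For those who want to see where those figures come from, the arithmetic is straightforward; a quick sketch using only the per-pin rate and stack width quoted above:

```python
# HBM2E per-stack and multi-stack bandwidth from the quoted 3.6 Gbps/pin rate.
pins_per_stack = 1024
data_rate_gbps = 3.6                                 # Gbps per pin (SK Hynix's top speed)

stack_bw_gbs = pins_per_stack * data_rate_gbps / 8   # ~460.8 GB/s per stack
six_stack_bw_tbs = 6 * stack_bw_gbs / 1000           # ~2.76 TB/s for a 6-stack device

print(f"Per stack: {stack_bw_gbs:.1f} GB/s, six stacks: {six_stack_bw_tbs:.2f} TB/s")
```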


Finally, for the moment SK Hynix isn’t announcing any customers, but the company expects the new memory to be used on “next-generation AI (Artificial Intelligence) systems including Deep Learning Accelerator and High-Performance Computing.” An eventual second-source for NVIDIA’s A100 would be among the most immediate use cases for the new memory, though NVIDIA is far from the only vendor to use HBM2. If anything, SK Hynix is typically very close to AMD, who is due to launch some new server GPUs over the next year for use in supercomputers and other HPC systems. So one way or another, the era of HBM2E is quickly ramping up, as more and more high-end processors are set to be introduced using the faster memory.



Source: AnandTech – SK Hynix: HBM2E Memory Now in Mass Production

The Intel Lakefield Deep Dive: Everything To Know About the First x86 Hybrid CPU

For the past eighteen months, Intel has paraded its new ‘Lakefield’ processor design around the press and the public as a paragon of new processor innovation. Inside, Intel pairs one of its fast peak-performance cores with four of its lower-power efficiency cores, and uses novel technology in order to build the processor in the smallest footprint it can. The new Lakefield design is a sign that Intel is looking into new processor paradigms, such as hybrid processors with different types of cores, but also different stacking and packaging technologies to help drive the next wave of computing. With this article, we will tell you all you need to know about Lakefield.



Source: AnandTech – The Intel Lakefield Deep Dive: Everything To Know About the First x86 Hybrid CPU

Samsung Lets Note20+/Ultra Design Slip

We still haven’t had any official announcements from Samsung regarding the Note20 series, and we expect the company to only reveal the new phone series sometime in early to mid-August if past release dates are any indication. Yet in a surprising blunder, the company has managed to publicly upload two product images of the upcoming Note20+ or Ultra (naming uncertain) on one of its Ukrainian pages.


Whilst our editorial standards mean we usually don’t report on leaks or unofficial speculation, a first-party blunder like this is very much an exception to the rule.



The leak showcases the seemingly bigger sibling of the Note20 series, as it features the full camera housing and seemingly the same modules as the Galaxy S20 Ultra. There’s been a design aesthetic change, as the cameras are now accentuated by a ring element around the lenses, making the modules appear more consistent with each other, even though there are still clearly different-sized lenses along with the rectangular periscope zoom module. The images show actual depth on the part of the ring elements, so they may extend in three dimensions.


The new gold/bronze colour also marks a return to a more metallic finish option for Samsung.


We expect the Note20 series to be a minor hardware upgrade over the S20 devices, with the most defining characteristic naturally being the phone’s integrated S-Pen stylus.




Source: AnandTech – Samsung Lets Note20+/Ultra Design Slip

ASUS ROG Maximus XII Apex Now Available

Back in April, Intel released its Z490 chipset for its 10th generation Comet Lake processors, with a choice of over 44 models for users to select from. One of the more enthusiast-level models for Z490 announced by ASUS was the ROG Maximus XII Apex, with solid overclocking-focused traits but equally enough features for performance users and gamers too. ASUS has announced that the ROG Maximus XII Apex is now available to purchase, with its most prominent features including three PCIe 3.0 x4 M.2 slots, 16-phase power delivery, an Intel 2.5 GbE Ethernet controller and an Intel Wi-Fi 6 wireless interface.


Not all motherboards are created equal, and not all are built for a specific purpose, e.g. content creation, gaming, or workstation use. One of ASUS’s most distinguished brands is the Republic of Gamers series, with its blend of premium controllers and aesthetics, and models that are generally full of features. The Apex series comprises the brand’s overclocking-focused models, and there have been some fantastic Apex boards across the chipsets. ASUS has just put the new ROG Maximus XII Apex into North American retail channels.



Some of the most notable features of the ASUS ROG Maximus XII Apex include support for up to three PCIe 3.0 x4 M.2 drives, via a ROG DIMM.2 module included in the accessories bundle. Looking at storage, the Apex includes eight SATA ports which use a friendly V-shaped design to allow easier installation of SATA drives. Despite this board being ATX, ASUS includes just two memory slots, with support for up to 64 GB of DDR4-4800 memory, which is likely to improve latencies and overall memory performance when overclocking memory. There are two full-length PCIe 3.0 slots which operate at x16 and x8/x8, along with a half-length PCIe 3.0 x4 slot and a single PCIe 3.0 x1 slot. The rear panel carries a load of USB connectivity, with four USB 3.2 G2 Type-A, one USB 3.2 G2 Type-C, and five USB 3.2 G1 Type-A ports. For networking, ASUS includes an Intel I225-V 2.5 GbE Ethernet controller and an Intel AX201 Wi-Fi 6 interface which also includes support for BT 5.1 devices. The board also includes a SupremeFX S1220A HD audio codec which adds five 3.5 mm audio jacks and a single S/PDIF optical output on the rear.


Underneath the large power delivery heatsink is a big 16-phase setup with sixteen TDA21490 90 A power stages, driven by an ASP1405I PWM controller operating in 7+1 mode. This is because ASUS has opted to use teamed power stages, with fourteen for the CPU and two for the SoC; the teamed design is intended to improve transient response compared to setups that use doublers. Providing power to the CPU is a pair of 12 V ATX CPU power inputs, while a 4-pin Molex connector is present to provide additional power to the PCIe slots.
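

As a rough back-of-the-envelope figure, the nameplate current capability of that teamed layout works out as below. This is a theoretical ceiling based purely on the per-stage ratings quoted above, ignoring thermals and derating:

```python
# Theoretical combined current rating of the Apex's teamed VRM layout.
stage_rating_a = 90            # each TDA21490 power stage is rated at 90 A
cpu_stages, soc_stages = 14, 2 # 7+1 controller phases, teamed (two stages per phase)

cpu_current = cpu_stages * stage_rating_a   # 1260 A nameplate for the CPU rail
soc_current = soc_stages * stage_rating_a   # 180 A nameplate for the SoC rail

print(f"CPU rail: {cpu_current} A, SoC rail: {soc_current} A (nameplate, not sustained)")
```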


The ASUS ROG Maximus XII Apex is currently available to purchase at Digital Storm and Cyberpower in the US, with stock expected to land at both Amazon and Newegg very soon. Stockists and retailers such as Scan Computers in the UK also have stock at present.





Source: AnandTech – ASUS ROG Maximus XII Apex Now Available

Qualcomm Announces New Snapdragon Wear 4100 & 4100+: 12nm A53 Smartwatches

Today Qualcomm is making a big step forward in its smartwatch SoC offerings by introducing the brand-new Snapdragon Wear 4100 and Wear 4100+ platforms. The new chips succeed the aging Wear 3100 platforms that originated in 2018, significantly upgrading the hardware specifications and bringing to the table all-new IP for the CPU, GPU and DSPs, all manufactured on a newer, lower-power process node.



Source: AnandTech – Qualcomm Announces New Snapdragon Wear 4100 & 4100+: 12nm A53 Smartwatches

AMD Publishes First Beta Driver With Windows 10 Hardware GPU Scheduling Support

Following last week’s release of NVIDIA’s first Hardware-Accelerated GPU Scheduling-enabled video card driver, AMD this week has stepped up to the plate to do the same. The Radeon Software Adrenalin 2020 Edition 20.5.1 Beta with Graphics Hardware Scheduling driver (version 20.10.17.04) has been posted to AMD’s website, and as the name says on the tin, the driver offers support for Windows 10’s new hardware-accelerated GPU scheduling technology.


As a quick refresher, hardware acceleration for GPU scheduling was added to the Windows display driver stack with WDDM 2.7 (shipping in Win10 2004). And, as alluded to by the name, it allows GPUs to more directly manage their VRAM. Traditionally Windows itself has done a lot of the VRAM management for GPUs, so this is a distinctive change in matters.
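

For readers who want to check whether the toggle is active on their own system, the Windows setting is backed by a registry value, and the sketch below reads it. Note that the HwSchMode value name and its 1/2 (off/on) encoding reflect our understanding of the Windows setting; it is not something AMD documents in these release notes:

```python
# Read the Windows 10 hardware-accelerated GPU scheduling setting (Windows only).
import winreg

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

try:
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _ = winreg.QueryValueEx(key, "HwSchMode")
        # Our understanding: 2 = hardware-accelerated GPU scheduling on, 1 = off.
        print("HwSchMode =", value, "(enabled)" if value == 2 else "(disabled)")
except FileNotFoundError:
    print("HwSchMode not set - the OS/driver combination may not expose the feature")
```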


Microsoft has been treating the feature as a relatively low-key development – relative to DirectX 12 Ultimate, they haven’t said a whole lot about it – meanwhile AMD’s release notes make vague performance improvement claims, stating “By moving scheduling responsibilities from software into hardware, this feature has the potential to improve GPU responsiveness and to allow additional innovation in GPU workload management in the future”. As was the case with NVIDIA’s release last week, don’t expect anything too significant here, otherwise AMD would be more heavily promoting the performance gains. But it’s something to keep an eye on over the long term.


In the meantime, AMD seems to be taking a cautious approach here. The beta driver has been published outside their normal release channels and only supports products using AMD’s Navi 10 GPUs – so the Radeon 5700 series, 5600 series, and their mobile variants. Support for the Navi 14-based 5500 series is notably absent, as is Vega support for both discrete and integrated GPUs.


Additional details about the driver release, as well as download instructions, can be found on AMD’s website in the driver release notes.


Finally, on a tangential note, I’m aiming to sit down with The Powers That Be over the next week or so in order to better dig into hardware-accelerated GPU scheduling. Since it’s mostly a hardware developer-focused feature, Microsoft hasn’t talked about it much in the consumer context or with press. So I’ll be diving more into the theory behind it: what it’s meant to do, future feature prospects, as well as the rationale for introducing it now as opposed to earlier (or later). Be sure to check back next week for that.



Source: AnandTech – AMD Publishes First Beta Driver With Windows 10 Hardware GPU Scheduling Support

The OnePlus 8, OnePlus 8 Pro Review: Becoming The Flagship

It’s been a couple of months since OnePlus released the new OnePlus 8 & OnePlus 8 Pro, and both devices have received plenty of software updates improving their user experience and camera quality. Today, it’s time to finally go over the full review of both devices, which OnePlus no longer really calls “flagship killers”, but rather outright flagships.


The OnePlus 8, and especially the OnePlus 8 Pro, are big step-up redesigns from the company, significantly raising the bar in regard to the specifications and features of the phones. The OnePlus 8 Pro is essentially a check-marked wish-list of characteristics that were missing from last year’s OnePlus 7 Pro, as the company has addressed some of its predecessor’s biggest criticisms. The slightly smaller and cheaper regular OnePlus 8 more closely follows its predecessors’ ethos as well as competitive pricing, all whilst adopting the new design language that’s been updated with this year’s devices.



Source: AnandTech – The OnePlus 8, OnePlus 8 Pro Review: Becoming The Flagship

HPC Systems Special Offer: Two A64FX Nodes in a 2U for $40k

It was recently announced that the Fugaku supercomputer, located at RIKEN in Japan, has scored the #1 position on the TOP500 supercomputer list, as well as #1 positions in a number of key supercomputer benchmarks. At the heart of Fugaku isn’t any standard x86 processor, but one based on Arm – specifically, the A64FX 48+4-core processor, which uses Arm’s Scalable Vector Extensions (SVE) to enable high-throughput FP64 compute. At 435 PetaFLOPs and 7.3 million cores, Fugaku beat the former #1 system by 2.8x in performance. Currently Fugaku is being used for COVID-19 related research, such as modelling infection rates and the dispersion of the virus in liquid droplets.



The Fujitsu A64FX card is a unique piece of kit, offering 48 compute cores and 4 control cores, each with monumental bandwidth to keep the 512-bit wide SVE units fed. The chip runs at 2.2 GHz, and can operate in FP64, FP32, FP16 and INT8 modes for a variety of AI applications. There is 1 TB/sec of bandwidth from the 32 GB of HBM2 on each card, and because there are four control cores per chip, it runs by itself without any external host/device situation.
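

For a rough idea of what those SVE units amount to, here is a back-of-the-envelope FP64 peak estimate built from the figures above. The assumption of two 512-bit FMA pipelines per core is ours, not a number quoted by HPC Systems:

```python
# Rough peak FP64 estimate for A64FX from its quoted core count, clock and SVE width.
cores = 48
clock_ghz = 2.2
sve_width_bits = 512
fma_pipes_per_core = 2                           # assumption: two 512-bit FMA units per core

lanes = sve_width_bits // 64                     # 8 FP64 lanes per vector
flops_per_cycle = lanes * 2 * fma_pipes_per_core # an FMA counts as 2 FLOPs
peak_tflops = cores * clock_ghz * flops_per_cycle / 1000

print(f"Peak FP64: ~{peak_tflops:.1f} TFLOPS per chip")  # ~3.4 TFLOPS at 2.2 GHz
```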



It was never clear whether the A64FX module would be available on a wider scale beyond supercomputer sales; however, today’s offer confirms that it is, with the Japan-based HPC Systems set to offer a Fujitsu PrimeHPC FX700 server that contains up to eight A64FX nodes (at 1.8 GHz) within a 2U form factor. Each node is paired with 512 GB of SSD storage and gigabit Ethernet capabilities, with room for expansion (InfiniBand EDR, etc.). The current deal at HPC Systems is for a 2-node implementation, at a price of ¥4,155,330 (~$39,000 USD), with the deal running to the end of the year.



The A64FX card already has listed support for the quantum chemistry package Gaussian16, the molecular dynamics software AMBER, and the non-linear structural analysis software LS-DYNA. Other commercial packages in the structural and fluid analysis fields will be coming on board in due course. There is also Fujitsu’s Software Compiler Package v1.0 to enable developers to build their own software.


Source: HPC Systems, PDF Flyer





Source: AnandTech – HPC Systems Special Offer: Two A64FX Nodes in a 2U for $40k

Sponsored Post: Check Out all of the ASUS B550 Motherboards Available Now

The arrival of the AMD B550 chipset is an exciting prospect for PC builders, as it’s the first to bring the potential of PCIe 4.0 to the forefront for mainstream builders. ASUS has a diverse selection of new motherboards to choose from with this chipset, and this useful B550 motherboard guide will help you figure out which one is right for you.

In ASUS B550 motherboards, the main PCIe x16 and M.2 slots are PCIe 4.0-capable. They also feature up to four USB 3.2 Gen 2 ports that clock in with a maximum supported speed of 10Gbps each. The chipset’s built-in lanes now have PCIe 3.0 connectivity as well, which is great to see. Additionally, AMD has noted that future CPUs built on the Zen 3 architecture will be fully compatible with B550 motherboards, making them a safe and long-lasting investment for people who wish to upgrade to those new processors down the line.



Source: AnandTech – Sponsored Post: Check Out all of the ASUS B550 Motherboards Available Now

Intel’s Raja Koduri Teases “Ponte Vecchio” Xe GPU Silicon

Absent from the discrete GPU space for over 20 years, this year Intel is set to see the first fruits from their labors to re-enter that market. The company has been developing their new Xe family of GPUs for a few years now, and the first products are finally set to arrive in the coming months with the Xe-LP-based DG1 discrete GPU, as well as Tiger Lake’s integrated GPU, kicking off the Xe GPU era for Intel.


But those first Xe-LP products are just the tip of a much larger iceberg. Intending to develop a comprehensive top-to-bottom GPU product stack, Intel is also working on GPUs optimized for the high-power discrete market (Xe-HP), as well as the high-performance computing market (Xe-HPC).



Xe-HPC, in turn, is arguably the most important of the three segments for Intel, as well as being the riskiest. The server-class GPU will be responsible for broadening Intel’s lucrative server business beyond CPUs, along with fending off NVIDIA and other GPU/accelerator rivals, who in the last few years have ridden the deep learning wave to booming profits and market shares that increasingly threaten Intel’s traditional market dominance. The server market is also the riskiest market, due to the high-stakes nature of the hardware: the only thing bigger than the profits are the chips, and thus the costs to enter the market. So under the watchful eye of Raja Koduri, Intel’s GPU guru, the company is gearing up to stage a major assault into the GPU space.


That brings us to the matter of this week’s teaser. One of the benefits of being a (relatively) upstart rival in the GPU business is that Intel doesn’t have any current-generation products that they need to protect; without the risk of Osborning themselves, they’re free to talk about their upcoming products even well before they ship. So, as a bit of a savvy social media ham, Koduri has been posting occasional photos of Ponte Vecchio, the first Xe-HPC GPU, as Intel brings it up in their labs.




Today’s teaser from Koduri shows off a tray with three different Ponte Vecchio chips of different sizes. While detailed information about Ponte Vecchio is still limited, Intel has previously commented that Ponte Vecchio would be taking a chiplet route for the GPU, using multiple chiplets to build larger and more powerful designs. Koduri’s latest photo, in turn, looks to be a clear illustration of that, with the larger chip sizes roughly correlating to 1×2 and 2×2 configurations of the smallest chip.


And with presumably multiple chiplets under the hood, the resulting chips are quite sizable. With a helpful 18650 battery in the photo for reference, we can see that the smaller packages are around 65mm wide, while the largest package is easily approaching 110mm on a side. (For reference, an Intel desktop CPU is around 37.5mm x 37.5mm).


Finally, in a separate tweet, Koduri quickly talks about performance: “And..they let me hold peta ops in my palm(almost:)!” Koduri doesn’t go into any detail about the numeric format involved – an important qualifier when talking about compute throughput on GPUs that can process lower-precision formats at higher rates – but we’ll be generous and assume INT8 operations. INT8 has become a fairly popular format for deep learning inference, as the integer format offers great performance for neural nets that don’t need high precision. NVIDIA’s A100 accelerator, for reference, tops out at 0.624 PetaOPs for regular tensor operations, or 1.248 PetaOps for a sparse matrix.
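

Taking those A100 numbers as the yardstick, a quick back-of-the-envelope comparison against a nominal 1 PetaOP INT8 figure looks like this (the INT8 interpretation, as noted, is our own generous assumption):

```python
# Compare a hypothetical 1 PetaOP INT8 figure against NVIDIA's quoted A100 rates.
claim_petaops = 1.0          # "peta ops in my palm", assumed to mean INT8
a100_dense = 0.624           # PetaOPs, regular INT8 tensor operations
a100_sparse = 1.248          # PetaOPs, sparse matrices

print(f"vs. A100 dense:  {claim_petaops / a100_dense:.2f}x")   # ~1.6x
print(f"vs. A100 sparse: {claim_petaops / a100_sparse:.2f}x")  # ~0.8x
```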


And that is the latest on Ponte Vecchio. Though with the parts likely not shipping until later in 2021 as part of the Aurora supercomputer, it’s likely not going to be the last word from Intel and Koduri on their first family of HPC GPUs.



Source: AnandTech – Intel’s Raja Koduri Teases “Ponte Vecchio” Xe GPU Silicon

Intel’s Raja Koduri Teases Even Larger Xe GPU Silicon

Absent from the discrete GPU space for over 20 years, this year Intel is set to see the first fruits from their labors to re-enter that market. The company has been developing their new Xe family of GPUs for a few years now, and the first products are finally set to arrive in the coming months with the Xe-LP-based DG1 discrete GPU, as well as Tiger Lake’s integrated GPU, kicking off the Xe GPU era for Intel.


But those first Xe-LP products are just the tip of a much larger iceberg. Intending to develop a comprehensive top-to-bottom GPU product stack, Intel is also working on GPUs optimized for the high-power discrete market (Xe-HP), as well as the high-performance computing market (Xe-HPC).



That high end of the market, in turn, is arguably the most important of the three segments for Intel, as well as being the riskiest. The server-class GPUs will be responsible for broadening Intel’s lucrative server business beyond CPUs, along with fending off NVIDIA and other GPU/accelerator rivals, who in the last few years have ridden the deep learning wave to booming profits and market shares that increasingly threaten Intel’s traditional market dominance. The server market is also the riskiest market, due to the high-stakes nature of the hardware: the only thing bigger than the profits are the chips, and thus the costs to enter the market. So under the watchful eye of Raja Koduri, Intel’s GPU guru, the company is gearing up to stage a major assault into the GPU space.


That brings us to the matter of this week’s teaser. One of the benefits of being a (relatively) upstart rival in the GPU business is that Intel doesn’t have any current-generation products that they need to protect; without the risk of Osborning themselves, they’re free to talk about their upcoming products even well before they ship. So, as a bit of a savvy social media ham, Koduri has been posting occasional photos of Intel’s Xe GPUs, as Intel brings them up in their labs.




Today’s teaser from Koduri shows off a tray with three different Xe chips of different sizes. While detailed information about the Xe family is still limited, Intel has previously commented that the Xe-HPC-based Ponte Vecchio would be taking a chiplet route for the GPU, using multiple chiplets to build larger and more powerful designs. So while Koduri’s tweets don’t make it clear what specific GPUs we’re looking at – if they’re all part of the Xe-HP family or a mix of different families – the photo is an interesting hint that Intel may be looking at a wider use of chiplets, as the larger chip sizes roughly correlate to 1×2 and 2×2 configurations of the smallest chip.


And with presumably multiple chiplets under the hood, the resulting chips are quite sizable. With a helpful AA battery in the photo for reference, we can see that the smaller packages are around 50mm wide, while the largest package is easily approaching 85mm on a side. (For reference, an Intel desktop CPU is around 37.5mm x 37.5mm).


Finally, in a separate tweet, Koduri quickly talks about performance: “And..they let me hold peta ops in my palm(almost:)!” Koduri doesn’t go into any detail about the numeric format involved – an important qualifier when talking about compute throughput on GPUs that can process lower-precision formats at higher rates – but we’ll be generous and assume INT8 operations. INT8 has become a fairly popular format for deep learning inference, as the integer format offers great performance for neural nets that don’t need high precision. NVIDIA’s A100 accelerator, for reference, tops out at 0.624 PetaOPs for regular tensor operations, or 1.248 PetaOps for a sparse matrix.


And that is the latest on Xe. With the higher-end discrete parts likely not shipping until later in 2021, this is likely not going to be the last word from Intel and Koduri on their first modern family of discrete GPUs.


Update: A previous version of the article called the large chip Ponte Vecchio, Intel’s Xe-HPC flagship. We have since come to understand that the silicon we’re seeing is likely not Ponte Vecchio, making it likely to be something Xe-HP based.



Source: AnandTech – Intel’s Raja Koduri Teases Even Larger Xe GPU Silicon

AMD Succeeds in its 25×20 Goal: Renoir Crosses the Line in 2020

One of the stories bubbling away in the background of the industry is AMD’s self-imposed ‘25×20’ goal. Starting from its 2014 performance, AMD committed to itself, to customers, and to investors that it would achieve an overall 25x improvement in ‘Performance Efficiency’ by 2020, a metric that is a function of raw performance and power consumption. At the time AMD defined its Kaveri mobile product as the baseline for the challenge – admittedly a very low bar – and each year since, AMD has updated us on its progress. With this year being 2020, the question on my lips ever since the launch of Zen 2 for mobile has been whether AMD achieved its goal, and if so, by how much? The answer is yes, and by a lot.

In this article we will recap the 25×20 project, how the metrics are calculated, and what this means for AMD in the long term.
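

As a simple illustration of how a performance-efficiency metric of this kind behaves, the sketch below divides a performance ratio by a power ratio; the numbers are placeholders, and AMD’s actual methodology is covered in the full article:

```python
# Illustrative performance-efficiency ratio: perf-per-watt in the target year
# divided by perf-per-watt in the 2014 baseline. Numbers are placeholders.
def efficiency_gain(perf_2014, power_2014, perf_2020, power_2020):
    return (perf_2020 / power_2020) / (perf_2014 / power_2014)

# Example: 5x the performance at one fifth the power draw gives a 25x gain.
print(efficiency_gain(perf_2014=1.0, power_2014=1.0, perf_2020=5.0, power_2020=0.2))
```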



Source: AnandTech – AMD Succeeds in its 25×20 Goal: Renoir Crosses the Line in 2020

NVIDIA Posts First DirectX 12 Ultimate Driver Set, Enables GPU Hardware Scheduling

NVIDIA sends word this morning that the company has posted their first DirectX 12 Ultimate-compliant driver. Published as version 451.48 – the first driver out of NVIDIA’s new Release 450 driver branch – the new driver is the first release from the company to explicitly support the latest iteration of DirectX 12, enabling support for features such as DXR 1.1 ray tracing and tier 2 variable rate shading. As well, this driver also enables support for hardware accelerated GPU scheduling.


As a quick refresher, DirectX 12 Ultimate is Microsoft’s latest iteration of the DirectX 12 graphics API, with Microsoft using it to synchronize the state of the API between current-generation PCs and the forthcoming Xbox Series X console, as well as to set a well-defined feature baseline for future game development. Based around the capabilities of current generation GPUs (namely: NVIDIA Turing) and the Xbox Series X’s AMD RDNA2-derived GPU, DirectX 12 Ultimate introduces several new GPU features under a new feature tier (12_2). This includes an updated version of DirectX’s ray tracing API, DXR 1.1, as well as tier 2 variable rate shading, mesh shaders, and sampler feedback. The software groundwork for this has been laid in the latest version of Windows 10, version 2004, and now is being enabled in GPU drivers for the first time.


DirectX 12 Feature Levels

GPU architectures (introduced as of):
  12_2 (DX12 Ultimate): NVIDIA Turing / AMD RDNA2 / Intel Xe(?)
  12_1:                 NVIDIA Maxwell 2 / AMD Vega / Intel Gen9
  12_0:                 NVIDIA Maxwell 2 / AMD Hawaii / Intel Gen9

Feature support (12_2 / 12_1 / 12_0):
  Ray Tracing (DXR 1.1)           Yes / No  / No
  Variable Rate Shading (Tier 2)  Yes / No  / No
  Mesh Shaders                    Yes / No  / No
  Sampler Feedback                Yes / No  / No
  Conservative Rasterization      Yes / Yes / No
  Raster Order Views              Yes / Yes / No
  Tiled Resources (Tier 2)        Yes / Yes / Yes
  Bindless Resources (Tier 2)     Yes / Yes / Yes
  Typed UAV Load                  Yes / Yes / Yes

In the case of NVIDIA’s recent video cards, the underlying Turing architecture has supported these features since the very beginning. However, their use has been partially restricted to games relying on NVIDIA’s proprietary feature extensions, due to a lack of standardized API support. Overall it’s taken most of the last two years to get the complete feature set added to DirectX, and while NVIDIA isn’t hesitating to use this moment to proclaim their GPU superiority as the first vendor to ship DirectX 12 Ultimate support, to some degree it’s definitely vindication of the investment the company put into baking these features into Turing.


In any case, enabling DirectX 12 Ultimate support is an important step for the company, though one that’s mostly about laying the groundwork for game developers, and ultimately, future games. At this point no previously-announced games have confirmed that they’ll be using DX12U, though this is just a matter of time, especially with the Xbox Series X launching in a few weeks.



Perhaps the more interesting aspect of this driver release, though only tangential to DirectX 12 Ultimate support, is that NVIDIA is enabling support for hardware accelerated GPU scheduling. This mysterious feature was added to the Windows display driver stack with WDDM 2.7 (shipping in Win10 2004), and as alluded to by the name, it allows GPUs to more directly manage their VRAM. Traditionally Windows itself has done a lot of the VRAM management for GPUs, so this is a distinctive change in matters.


At a high level, NVIDIA is claiming that hardware accelerated GPU scheduling should offer minor improvements to the user experience, largely by reducing latency and improving performance thanks to more efficient video memory handling. I would not expect anything too significant here – otherwise NVIDIA would be heavily promoting the performance gains – but it’s something to keep an eye out for. Meanwhile, absent any other details, I find it interesting that NVIDIA lumps video playback in here as a beneficiary as well, since video playback is rarely an issue these days. At any rate, the video memory handling changes are being instituted at a low level, so hardware scheduling is not only for DirectX games and the Windows desktop, but for Vulkan and OpenGL games as well.


Speaking of Vulkan, the open source API is also getting some attention with this driver release. 451.48 is the first GeForce driver with support for Vulkan 1.2, the latest version of that API. An important housekeeping update for Vulkan, 1.2 is promoting a number of previously optional feature extensions into the core Vulkan API, such as Timeline Semaphores, as well as improved cross portability support by adding full support for HLSL (i.e. DirectX) shaders within Vulkan.



Finally, while tangential to today’s driver release, NVIDIA has posted an interesting note on its customer support portal regarding Windows GPU selection that’s worth making note of. In short, Windows 10 2004 has done away with the “Run with graphics processor” contextual menu option within NVIDIA’s drivers, which prior to now has been a shortcut method of forcing which GPU an application runs on in an Optimus system. In fact, it looks like control over this has been removed from NVIDIA’s drivers entirely. As noted in the support document, controlling which GPU is used is now handled through Windows itself, which means laptop users will need to get used to going into the Windows Settings panel to make any changes.



As always, you can find the full details on NVIDIA’s new GeForce driver, as well as the associated release notes, over on NVIDIA’s driver download page.



Source: AnandTech – NVIDIA Posts First DirectX 12 Ultimate Driver Set, Enables GPU Hardware Scheduling