Google claims that not only has it made an AI that is faster than humans at designing chips, and as good if not better, the web giant is also using it to design chips for faster and better AI.
By designing, we mean the drawing up of a chip’s floorplan, which is the arrangement of its subsystems – such as its CPU and GPU cores, cache memory, RAM controllers, and so on – on its silicon die. The placement of the minute electronic circuits that make up these modules can affect the microchip’s power consumption and processing speed: the wiring and signal routing needed to connect it all up matters a lot.
In a paper to be published this week in Nature, and seen by The Register ahead of publication, Googlers Azalia Mirhoseini and Anna Goldie, and their colleagues, describe a deep reinforcement-learning system that can create floorplans in under six hours, whereas it can take human engineers and their automated tools months to come up with an optimal layout.
We’re told Google has used this AI system to produce the floorplan of a next-generation TPU – its Tensor Processing Unit, which the web giant uses to accelerate the neural networks in its search engine, public cloud, AlphaGo and AlphaZero, and other projects and products.
In effect, Google is using machine-learning software to optimize future chips that speed up machine-learning software.
To the software, this is no different to playing a game: it gradually learns a winning strategy when arranging a chip’s die as if it were playing, say, a Go match. The neural network is content with laying out a chip that to a human may look like an unconventional mess, but in practice, the component has an edge over a part planned by engineers and their industry tools. The neural net also uses a couple of techniques once considered by the semiconductor industry but abandoned as dead ends.
“Our method was used to design the next generation of Google’s artificial-intelligence accelerators, and has the potential to save thousands of hours of human effort for each new generation,” the Googlers wrote. “Finally, we believe that more powerful AI-designed hardware will fuel advances in AI, creating a symbiotic relationship between the two fields.”
When designing a microprocessor or workload accelerator, typically you’ll define how its subsystems work in a high-level language, such as VHDL, SystemVerilog, or perhaps even Chisel. This code will eventually be translated into something called a netlist, which describes how a collection of macro blocks and standard cells should be connected by wires to perform the chip’s functions. Standard cells contain basic things like NAND and NOR logic gates, and macro blocks contain a collection of standard cells or other electronics to perform a special function, such as provide on-die memory or a CPU core. Macro blocks are thus significantly larger than the standard cells.
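As a rough illustration of what a netlist holds, here is a minimal Python sketch. The class names (Block, Net, Netlist), the block names, and the sizes are all hypothetical, invented for this example; real netlist formats carry far more detail, and nothing here is taken from Google's paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Block:
    """A placeable component: a large macro block or a small standard cell."""
    name: str
    kind: str      # "macro" or "cell"
    width: float   # arbitrary units, made up for illustration
    height: float


@dataclass
class Net:
    """A wire connecting two or more blocks, referenced by name."""
    name: str
    pins: list[str]


@dataclass
class Netlist:
    blocks: dict[str, Block] = field(default_factory=dict)
    nets: list[Net] = field(default_factory=list)

    def add_block(self, block: Block) -> None:
        self.blocks[block.name] = block

    def connect(self, net_name: str, *block_names: str) -> None:
        # Refuse to wire up anything that has not been declared.
        for b in block_names:
            if b not in self.blocks:
                raise KeyError(f"unknown block: {b}")
        self.nets.append(Net(net_name, list(block_names)))

    def macros(self) -> list[Block]:
        # The large blocks whose placement the floorplanner decides first.
        return [b for b in self.blocks.values() if b.kind == "macro"]


def build_example() -> Netlist:
    """One on-die memory macro wired to two standard cells (toy data)."""
    nl = Netlist()
    nl.add_block(Block("sram0", "macro", 40.0, 30.0))
    nl.add_block(Block("nand0", "cell", 1.0, 1.0))
    nl.add_block(Block("nor0", "cell", 1.0, 1.0))
    nl.connect("n1", "sram0", "nand0")
    nl.connect("n2", "nand0", "nor0")
    return nl
```

The size gap between the toy macro and the toy cells mirrors the point above: macro blocks are far larger than standard cells, which is why they are placed first.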
Next, you have to choose how to arrange this netlist of cells and macro blocks on the die. It can take human engineers weeks to months working with specialist chip design tools over many iterations to achieve a floorplan that is optimized as needed for power consumption, timing, speed, and so on. Usually, you would tweak the placement of the large macro blocks as your design develops, and let the automated tools, which use non-intelligent algorithms, place the multitude of smaller standard cells, and then rinse and repeat until done.
To speed up this floorplanning stage, Google’s AI scientists created a convolutional neural network system that performs the macro block placement all by itself within hours to achieve an optimal layout; the standard cells are automatically placed in the gaps by other software. This machine-learning system should be able to produce an ideal floorplan far faster and better than the above method of tweaking and iterating a floorplan with the industry’s traditional automated tools and humans at the controls.
The neural network, we’re told, gradually improves its placement skills as it gains experience. It tries to place macro blocks on the die, with the space in between filled with standard cells, and is rewarded depending on the routing congestion, wire interconnect lengths, and other factors. This reward is used as feedback to improve its next attempt at placing the blocks. This is repeated until the software gets the hang of it, and can apply its abilities to whatever chip you want to lay out, even if it hasn’t seen one like it before.
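To make the reward idea concrete, here is a toy Python sketch of how a single placement attempt might be scored. It assumes the reward is the negative of a weighted sum of wire length and congestion, and uses half-perimeter wire length, a common estimate in chip placement, as a stand-in for real routed wire length. The function names, the weight, and the congestion figures are assumptions made up for this example, not values from the paper.

```python
def hpwl(net_pins, positions):
    """Half-perimeter wire length of one net.

    This is half the perimeter of the bounding box around the net's pins:
    its width plus its height.
    """
    xs = [positions[p][0] for p in net_pins]
    ys = [positions[p][1] for p in net_pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def reward(nets, positions, congestion, congestion_weight=0.5):
    """Score a placement: shorter wires and less congestion score higher.

    `congestion` is passed in as a plain number because working it out
    properly needs a routing estimate, which this sketch does not attempt.
    `congestion_weight` is an arbitrary illustrative value.
    """
    total_wire = sum(hpwl(pins, positions) for pins in nets)
    return -(total_wire + congestion_weight * congestion)


# Two nets linking three blocks, placed two different ways (toy data).
nets = [["a", "b"], ["b", "c"]]
spread_out = {"a": (0, 0), "b": (10, 0), "c": (10, 10)}
compact = {"a": (0, 0), "b": (1, 0), "c": (1, 1)}

spread_score = reward(nets, spread_out, congestion=0.0)
compact_score = reward(nets, compact, congestion=2.0)
```

In this toy case the compact placement scores higher despite its extra congestion, because its wires are much shorter. A learning system would feed a score like this back into its next attempt.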
In their paper, the Googlers said their neural network is “capable of generalizing across chips — meaning that it can learn from experience to become both better and faster at placing new chips — allowing chip designers to be assisted by artificial agents with more experience than any human could ever gain.”
Generating a floorplan can take less than a second using a pre-trained neural net, and with up to a few hours of fine-tuning the network, the software can match or beat a human at floorplan design, according to the paper, depending on which metric you use. The neural net outperformed human engineers who worked on a previous TPU accelerator in terms of signal timing, power usage, die area, and/or the amount of wiring needed, depending on the macro blocks involved.
For instance, a previous-generation TPU chip was laid out by human engineers, and when the neural network made a floorplan for the same component, the software was able to trim the amount of wiring needed on the die (reducing the wire length from 57.07m to 55.42m). Similarly, the neural network reduced the wire length in an Ariane RISC-V CPU core when generating its floorplan, the paper states. The system was trained, in part, using previous TPU designs.
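To put the TPU wire-length figures quoted above in proportion, the saving can be worked out as a percentage. The short Python sketch below does only that arithmetic; the function name is our own, and the two inputs are the figures from the paper as reported here.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Wire length in metres: human-designed layout versus the neural network's.
tpu_saving = percent_reduction(57.07, 55.42)
print(f"{tpu_saving:.2f}%")  # about 2.89%
```

That is a reduction of roughly 2.9 per cent: 1.65m of wiring saved out of 57.07m.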
We note that after the Googlers’ neural network had laid out a next-generation TPU, the design still had to be tweaked by experts to ensure the component would actually work as intended – humans and their usual software tools were needed for the fiddly business of checking clock signal propagation, and so on. This step would still be needed even if the TPU was floorplanned by people and not a neural network.
“Our method was used in the product tapeout of a recent Google TPU,” the Googlers wrote. “We fully automated the placement process through PlaceOpt, at which point the design was sent to a third party for post-placement optimization, including detailed routing, clock tree synthesis and post-clock optimization.”
We also note that the neural network’s placement for the taped-out next-gen TPU was, according to the paper, “comparable to manual designs,” and with a further fine-tuning step that optimized the orientation of the blocks, the wire length was trimmed by 1.07 per cent. The whole floorplan process took just eight hours. So, it seems, Google can use its neural network to either outperform humans or more or less match them, and it can do so in hours and not days, weeks, or months.
Their paper concludes:
“We show that our method can generate chip floorplans that are comparable or superior to human experts in under six hours, whereas humans take months to produce acceptable floorplans for modern accelerators. Our method has been used in production to design the next generation of Google TPU.”
Last year, Google put out a preprint of a similar paper that described using reinforcement learning to lay out chips. As our sister site The Next Platform pointed out around that time, Google has been working on AI-powered chip design at least as far back as 2017. This latest paper is not only a journal-published refinement of that earlier work, it also discloses that the software was used to design a next-generation TPU.
Nvidia, meanwhile, has discussed using AI-driven tools for laying out chips. Get ready for more neural networks designing hardware to make neural networks more powerful.