Best RTL Coding Practice for Better PPA

Different RTL coding styles for the same functionality often yield different PPA (Power, Performance, Area). Let us dive into a few examples.

Use Register Enable Conditions

When coding data pipelines, it is good practice to always include an explicit enable condition, because the synthesis tool can infer clock gaters from these enable conditions.

For example, instead of writing something like this:

    always_ff @(posedge clk)
        data_q <= data;

Code it like this:

    always_ff @(posedge clk)
        if (data_en) data_q <= data;

The wider the “data” bus, the more flops share one inferred clock gater, and the more clock power is saved whenever “data_en” is low.

Minimize Signal Toggling When Not Needed

Reducing unnecessary signal toggling is a common technique to save dynamic power. Besides the register enable conditions described above, RTL designers sometimes use data gaters to hold a data path quiet while it is supposed to be idle.

For example, the inputs of a multiplier may change every cycle while the multiplier only produces a valid result every few cycles. When the multiplier is not doing any meaningful work, RTL designers can gate its inputs so that the multiplication circuitry is quiesced.
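
The multiplier case can be sketched roughly as follows. This is only an illustration: the module name, the port names and the “mult_en” signal are assumptions, and simple AND gates are used as the data gaters.

    module gated_mult #(
        parameter int W = 16
    ) (
        input  logic           clk,
        input  logic           mult_en,  // assumed: high only when a result is needed
        input  logic [W-1:0]   a,
        input  logic [W-1:0]   b,
        output logic [2*W-1:0] prod_q
    );
        // Data gaters: force both operands to zero while idle, so the
        // multiplier array sees constant inputs and stops toggling.
        logic [W-1:0] a_gated, b_gated;
        assign a_gated = a & {W{mult_en}};
        assign b_gated = b & {W{mult_en}};

        // The enable on the result register lets the tool infer a clock gater too.
        always_ff @(posedge clk)
            if (mult_en) prod_q <= a_gated * b_gated;
    endmodule

AND-based gating still toggles the operands once when “mult_en” rises or falls. Holding the previous operand values in a register or latch avoids that, at the cost of extra storage.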

Shift Registers without Actual Shifting

A shift register is a common circuit for maintaining a finite window of a data sequence, cycle by cycle. Instead of shifting the data stage by stage, consider keeping the data in place and muxing it out. See the diagram below:

Figure: a) True Shift Registers vs b) Shift Registers w/o Actual Shifting

Scheme b) moves no data between stages: each entry is written once and then stays in place, so it saves dynamic power compared to Scheme a), where every stage may toggle on every shift.
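
One possible way to code Scheme b) is a write pointer plus a read mux. The sketch below is an assumption about how such a structure could look, not a transcription of the diagram; the module name, “push” and “tap” are hypothetical, and DEPTH is assumed to be a power of two no smaller than 2.

    module ptr_window #(
        parameter int W     = 8,
        parameter int DEPTH = 4   // assumed: power of two, >= 2
    ) (
        input  logic                     clk,
        input  logic                     rst_n,
        input  logic                     push,  // assumed: new sample valid
        input  logic [W-1:0]             din,
        input  logic [$clog2(DEPTH)-1:0] tap,   // 0 selects the newest sample
        output logic [W-1:0]             dout
    );
        logic [W-1:0]             mem [DEPTH];
        logic [$clog2(DEPTH)-1:0] wr_ptr, rd_idx;

        // The pointer wraps naturally because DEPTH is a power of two.
        always_ff @(posedge clk or negedge rst_n)
            if (!rst_n)    wr_ptr <= '0;
            else if (push) wr_ptr <= wr_ptr + 1'b1;

        // Only one entry is written per push; all other entries stay still.
        always_ff @(posedge clk)
            if (push) mem[wr_ptr] <= din;

        // Read mux: the newest sample sits just behind the write pointer.
        assign rd_idx = wr_ptr - 1'b1 - tap;
        assign dout   = mem[rd_idx];
    endmodule

The trade-off is the read mux and pointer arithmetic, which a true shift register does not need, so the saving is largest when entries are wide.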

Use Proper Register Slice Type

In our book “Crack the Hardware Interview – Architecture & Micro-Architecture Design”, we discussed several types of register slices. A full slice breaks the timing paths on both the valid and the ready side.

However, when only the valid path has a timing violation and the ready path has no issues, using a forward slice saves half of the data storage; likewise, when only the ready path has a timing violation, using a backward slice saves half of the data storage.
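
As an illustration, a forward slice could be coded along these lines. This is a minimal sketch assuming a plain valid/ready handshake; the module and port names are hypothetical and not taken from the book.

    module fwd_slice #(
        parameter int W = 32
    ) (
        input  logic         clk,
        input  logic         rst_n,
        // upstream side
        input  logic         in_valid,
        output logic         in_ready,
        input  logic [W-1:0] in_data,
        // downstream side
        output logic         out_valid,
        input  logic         out_ready,
        output logic [W-1:0] out_data
    );
        // Ready stays combinational: accept when the single entry is
        // empty or is being drained in this cycle.
        assign in_ready = !out_valid || out_ready;

        // Valid is registered, which breaks the forward timing path.
        always_ff @(posedge clk or negedge rst_n)
            if (!rst_n)        out_valid <= 1'b0;
            else if (in_ready) out_valid <= in_valid;

        // One data entry only, versus two in a full slice.
        always_ff @(posedge clk)
            if (in_valid && in_ready) out_data <= in_data;
    endmodule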

Use Cache to Suppress Redundant SRAM Reads

Unlike reading from flops, reading from an SRAM always activates the SRAM peripheral circuitry, such as output buffers and sense amplifiers, so every SRAM read consumes dynamic power.

If RTL designers know beforehand that the SRAM read addresses follow a certain pattern, for example, a few consecutive reads access the same SRAM address, they can use a cache to suppress the “redundant” SRAM reads.

When an SRAM address is read for the first time, the read data can be stored in a small cache. Subsequent reads to the same address can retrieve the data from the cache instead of triggering an actual SRAM read.
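
A single-entry version of such a cache might look like the sketch below. It assumes an SRAM with one cycle of read latency and a hypothetical “invalidate” input that is pulsed on any SRAM write and never in the same cycle as a read; all names are illustrative.

    module sram_rd_cache #(
        parameter int AW = 10,
        parameter int DW = 32
    ) (
        input  logic          clk,
        input  logic          rst_n,
        input  logic          rd_en,
        input  logic [AW-1:0] rd_addr,
        output logic [DW-1:0] rd_data,     // valid one cycle after rd_en
        input  logic          invalidate,  // assumed: pulse on any SRAM write
        // SRAM read port, assumed 1-cycle read latency
        output logic          sram_ren,
        output logic [AW-1:0] sram_addr,
        input  logic [DW-1:0] sram_rdata
    );
        logic          hit, tag_vld, miss_q;
        logic [AW-1:0] tag_addr;
        logic [DW-1:0] cache_data;

        // A hit suppresses the SRAM read enable.
        assign hit       = rd_en && tag_vld && (rd_addr == tag_addr);
        assign sram_ren  = rd_en && !hit;
        assign sram_addr = rd_addr;

        always_ff @(posedge clk or negedge rst_n)
            if (!rst_n) begin
                tag_vld <= 1'b0;
                miss_q  <= 1'b0;
            end else begin
                miss_q <= sram_ren;              // SRAM data returns next cycle
                if (invalidate)    tag_vld <= 1'b0;
                else if (sram_ren) tag_vld <= 1'b1;
            end

        // Tag is updated on a miss; data is captured when the SRAM returns it.
        always_ff @(posedge clk) begin
            if (sram_ren) tag_addr   <= rd_addr;
            if (miss_q)   cache_data <= sram_rdata;
        end

        // Miss: forward the SRAM output. Hit: serve from the cache flops.
        assign rd_data = miss_q ? sram_rdata : cache_data;
    endmodule

Whether this pays off depends on the hit rate: the cache flops and the address comparator cost area and power of their own.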

Remove Reset from Data Path Flops

Certain flops in the design, such as data storage flops, do not require a reset. Removing the reset from these flops reduces the overall design area.

However, non-resettable flops may cause DFT coverage loss or increase test time, as scan-based ASIC testing must explicitly initialize these flops during the scan shift-in phase.
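
A common way to apply this is to reset only the control flops and let the valid bit qualify the data. A minimal sketch, with hypothetical module and signal names:

    module pipe_stage #(
        parameter int W = 64
    ) (
        input  logic         clk,
        input  logic         rst_n,
        input  logic         in_vld,
        input  logic [W-1:0] in_data,
        output logic         out_vld,
        output logic [W-1:0] out_data
    );
        // Control flop: reset, so downstream logic never sees a spurious valid.
        always_ff @(posedge clk or negedge rst_n)
            if (!rst_n) out_vld <= 1'b0;
            else        out_vld <= in_vld;

        // Data flops: no reset. Consumers only look at out_data when
        // out_vld is high, so the power-up value does not matter.
        always_ff @(posedge clk)
            if (in_vld) out_data <= in_data;
    endmodule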

Remove Unused RTL

Unused RTL should not be synthesized, as it wastes area. For example, RTL designers often write behavioral modeling code for assertions, and such code should not become part of the real silicon.
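
One common convention is to wrap assertion-only code in a macro guard. The sketch below assumes the synthesis flow defines a “SYNTHESIS” macro (check your tool setup); the module, its signals and the occupancy model are hypothetical.

    module fifo_checks #(
        parameter int DEPTH = 8
    ) (
        input logic clk,
        input logic rst_n,
        input logic push,
        input logic pop
    );
    `ifndef SYNTHESIS
        // Behavioral model used only by the assertion below.
        int unsigned occupancy;

        always_ff @(posedge clk or negedge rst_n)
            if (!rst_n) occupancy <= 0;
            else        occupancy <= occupancy + push - pop;

        a_no_overflow: assert property (
            @(posedge clk) disable iff (!rst_n) occupancy <= DEPTH
        ) else $error("FIFO occupancy model exceeded DEPTH");
    `endif
    endmodule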

There are several ways to detect unused RTL, for example:

Use SpyGlass Lint: warnings like W120, W240, FlopEConst flag unused variables
Use JasperGold: its comprehensive structural lint check can flag dead code and unreachable states
Use Design Compiler: DC uses “OPT-1206, 1207” to report constant or unloaded flops, and “ELAB-976, 982, 984, 985” to report unused always blocks; designers can also rely on DC’s final report to review unloaded flops and constant flops
Use Conformal LEC: it can report unreachable endpoints, and designers should carefully review the report
Use Formality: similar to Conformal LEC, it can report endpoints without fanout

References:

https://www.cadence.com/en_US/home/tools/system-design-and-verification/formal-and-static-verification/jasper-gold-verification-platform/jaspergold-superlint-app.html

Chipress
