The first book in our “Crack the Hardware Interview” series focuses on architecture and micro-architecture questions frequently asked during RTL design interviews. We hope you find it useful in preparing for digital design and verification interviews.

As a preview of the book, the table of contents is shown below:
Part 1 Architecture & Prototype
Chapter 1 CPU Pipeline
- Q1: What does the MIPS 5-stage pipeline look like?
- Q2: Hazards and solutions: a case study using MIPS 5-stage pipeline
- Q3: Can we arbitrarily increase the CPU pipeline depth?
- Q4: How to implement hardware-based branch prediction?
Chapter 2 CPU Out-of-Order Scheduling
- Q5: How does Tomasulo’s Algorithm work?
- Q6: What are data dependencies through memory in Tomasulo’s Algorithm?
- Q7: How to handle data dependencies through memory?
- Q8: How to implement hardware-based speculation in Tomasulo’s Algorithm to minimize control hazards?
Chapter 3 Virtual Memory & TLBs
- Q9: What are the benefits of virtual memory?
- Q10: How does a virtual address get translated?
- Q11: Why do we need TLBs?
- Q12: How to handle a TLB miss?
- Q13: How to handle a page fault?
Chapter 4 Precise Interrupt Implementation
- Q14: What is a precise interrupt? What is an imprecise interrupt?
- Q15: How to implement precise interrupts?
Chapter 5 Cache
- Q16: Why do we need cache?
- Q17: What are cache conflicts?
- Q18: What are read / write / replacement policies in cache?
- Q19: How to measure the cache performance?
- Q20: Why is there no performance improvement with cache upsizing?
- Q21: What are the problems using virtual addresses to access cache?
- Q22: What are the types of caches based on index and tag bits?
Chapter 6 Cache Coherency
- Q23: What are NUMA / UMA architectures?
- Q24: What is cache coherency?
- Q25: How to enforce cache coherency?
- Q26: Can you show the state transition for snoop-based scheme using MSI protocol?
- Q27: What are MESI / MOESI / MOESIF protocols?
- Q28: How to implement a home directory for MESI protocol?
Chapter 7 Common On-Chip Bus Protocols
- Q29: Can you describe how the APB protocol works?
- Q30: Do you know how the AHB protocol works?
- Q31: Can you tell me how the AXI protocol works?
- Q32: Why do the AXI & AHB protocols offer wrapping bursts?
- Q33: What are the dependencies between channels in the AXI protocol?
- Q34: How to enforce ordering between AXI Write & Read Channels?
- Q35: What is exclusive access in the AXI protocol?
Part 2 Micro-architecture Design
Chapter 1 Verilog Syntax & Primitive
- Q1: What is the difference between blocking and non-blocking assignments?
- Q2: How to detect and resolve X-related RTL issues?
- Q3: What is the difference between casex, casez and case-inside?
- Q4: What to watch out for when using SystemVerilog’s “signed” data types?
- Q5: What is the difference between “===” and “==”?
- Q6: What is delta simulation time?
- Q7: What are universal gates?
Chapter 2 Handshake Protocols
- Q8: What is a valid-ready protocol?
- Q9: What is a valid-ready slice?
- Q10: Convert 4-phase req-ack protocol to valid-ready protocol
- Q11: Convert valid-ready protocol to 4-phase req-ack protocol
Chapter 3 FIFO
- Q12: Design a non-power-of-2-entry flop-based sync FIFO
- Q13: Design a 2-push 1-pop flop-based sync FIFO
- Q14: Design a sync FIFO using a dual-port SRAM
- Q15: Design a flop-based async FIFO
- Q16: Design an async FIFO with non-power-of-2 even number of entries
- Q17: Design an SRAM-based async FIFO
Chapter 4 Clock Domain Crossing (CDC)
- Q18: What is metastability?
- Q19: What is MTBF? Why can synchronizers handle CDC?
- Q20: What are common CDC considerations to transfer a pulse?
- Q21: What are common CDC considerations to transfer multi-bit signals?
Chapter 5 LRU
- Q22: How to implement true LRU?
- Q23: How to implement pseudo LRU?
Chapter 6 Reordering
- Q24: Design a memory controller with in-order read responses (I)
- Q25: Design a memory controller with in-order read responses (II)
Chapter 7 Look Up Table (LUT)
- Q26: Implement y=f(x) function using 1-D LUT
- Q27: Implement z=f(x, y) function using 2-D LUT
Chapter 8 Arbiter
- Q28: Design a fixed priority arbiter
- Q29: Design a round robin arbiter
- Q30: Design a priority-based arbiter
Chapter 9 Digital Frequency Divider
- Q31: Implement divide-by-N frequency divider
- Q32: Implement divide-by-power-of-2 frequency divider with 50% duty cycle
- Q33: Implement divide-by-2N frequency divider with 50% duty cycle
- Q34: Implement divide-by-(2N+1) frequency divider with 50% duty cycle
Chapter 10 Arithmetic Logic Design
- Q35: Design a simple ALU and draw its logical block diagram
- Q36: How to implement w = 3/2 x + 1/4 y + z?
- Q37: How to implement multiplication by 5 for BCD code?
- Q38: How to implement an integer divider?
- Q39: Build a 2-cycle latency 32-bit adder using two 16-bit adders
- Q40: Build a 2-cycle latency 32-bit accumulator using two 16-bit adders
Chapter 11 Sequence Generator & Detector
- Q41: Design a sequence generator
- Q42: Design a circuit that detects if one input is a delayed version of the other
- Q43: Design a circuit that detects sequence 1(01)*1
- Q44: Sequence detector of 3-bit Palindrome
- Q45: Determine whether an infinite sequence is a multiple of 5
- Q46: Design a programmable sequence detector
Chapter 12 Search
- Q47: Find index to the first one in a byte array from LSB
- Q48: Find index to the first one in a 16-bit array from LSB
- Q49: Find index to the most recently added one in an 8-depth 1-bit-wide FIFO
- Q50: Find index to the closest number in a sorted array (I)
- Q51: Find index to the closest number in a sorted array (II)
