Wednesday, 4 March 2026

Understanding Memory Hierarchy and Why Optimization Matters in SoC Design | Memory Design 101 – Part 2 |

Memory Design Series:

  • Part 1 – Introduction to Memory Design
  • Part 2 – Memory Hierarchy and Optimization

Memory Design 101 – Part 2

Understanding Memory Hierarchy and Why Optimization Matters in SoC Design

By Afzal Malik

In the previous article of this series, we introduced an important industry perspective:

“Logic wins the prize, but memory pays the bills.”

In modern System-on-Chip (SoC) designs, 60–80% of the silicon area is often occupied by memory structures such as caches, register files, and buffers. Because of this, understanding how memory is organized and optimized is a fundamental skill for anyone entering the world of VLSI and chip design.

In this article, we will explore two key ideas that guide memory architecture:

  • Memory Hierarchy – how different memories are organized in a system
  • PPA Trade-offs – the fundamental design constraints of memory blocks

The Need for a Memory Hierarchy

If we tried to design a system with only one type of memory, we would immediately run into a major problem.

Ideally, memory should be:

  • Extremely fast
  • Very large in capacity
  • Low power
  • Low cost

Unfortunately, in real hardware no single memory technology satisfies all these requirements simultaneously. Fast memories are usually small and expensive, while large memories are slower but cheaper.

To solve this problem, computer architects organize memory into a hierarchy, where each level balances speed, capacity, and cost.

A simplified memory hierarchy looks like this:

Processor
   │
Registers
   │
L1 Cache
   │
L2 Cache
   │
L3 Cache
   │
Main Memory (DRAM)
   │
Non-Volatile Storage (SSD / Disk)

As we move away from the processor:

  • Capacity increases
  • Latency increases
  • Cost per bit decreases

This layered structure allows processors to access data quickly most of the time, while still supporting large overall memory capacity.
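The payoff of this layering can be quantified with the classic average memory access time (AMAT) model: AMAT = hit time + miss rate × miss penalty. Below is a minimal Python sketch — the cycle counts and miss rate are purely illustrative assumptions, not figures from any particular processor:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time for one cache level (in cycles)."""
    return hit_time + miss_rate * miss_penalty

# Illustrative numbers: a 2-cycle L1 with a 5% miss rate, backed by
# DRAM that costs 200 cycles per access.
with_cache = amat(hit_time=2, miss_rate=0.05, miss_penalty=200)  # 2 + 0.05*200 = 12
without_cache = 200  # every access goes straight to DRAM

print(with_cache, without_cache)
```

Even with a modest cache, the average access cost drops from 200 cycles to about 12 — which is why every level of the hierarchy earns its silicon.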


Level 1: Registers – The Fastest Memory

Registers sit inside the processor datapath and are the fastest storage elements in the system.

Characteristics:

  • Located directly in the CPU datapath
  • Accessed in one clock cycle
  • Extremely small capacity
  • Implemented using flip-flops

Registers store intermediate arithmetic results, instruction operands, and temporary data used by the processor.

Because they must operate at core frequency, they are designed for maximum speed, not density.


Level 2: L1 Cache – Processor Speed Driven

The next level is the Level-1 cache (L1 cache).

L1 cache is usually split into two parts:

  • Instruction cache (I-Cache)
  • Data cache (D-Cache)

Key characteristics:

  • Located very close to the CPU core
  • Built using SRAM
  • Extremely low latency
  • Small capacity (typically 32 KB – 128 KB)

The purpose of L1 cache is simple: keep the most frequently used instructions and data close to the processor.

Because CPU pipelines run at very high speeds, L1 cache must deliver data within a few cycles.


L2 and L3 Caches – Balancing Speed and Capacity

As programs become larger and more complex, L1 cache alone is not enough.

L2 Cache

  • Larger than L1
  • Slightly slower
  • Typically 256 KB – 1 MB
  • May be private or shared depending on architecture

L3 Cache

  • Even larger
  • Shared between multiple cores
  • Typically several megabytes

These caches act as intermediate buffers between the CPU and main memory, reducing expensive accesses to DRAM.


Main Memory – Off-Chip DRAM

Beyond the caches lies main memory, typically implemented using DRAM.

Characteristics:

  • Much larger capacity
  • Higher latency
  • Located off-chip

Accessing DRAM may take hundreds of clock cycles, which is extremely slow compared to CPU speed.

This is why caches are critical — they hide DRAM latency by storing frequently accessed data closer to the processor.


Non-Volatile Storage – Persistent Data

At the bottom of the hierarchy is non-volatile storage, such as:

  • SSD
  • Flash
  • Hard disks

Unlike SRAM or DRAM, these memories retain data even when power is off. However, their latency is much higher, so they are used primarily for long-term storage.


The Key Trade-Off in Memory Design: PPA

Whenever we design any memory block in VLSI, we constantly evaluate three critical parameters:

PPA — Power, Performance, Area

  • Power
  • Performance
  • Area

These three metrics define the quality of a memory design.


Performance: Latency and Throughput

Performance of memory is mainly determined by two factors.

Latency

Time taken to fetch data after a request is issued. Lower latency means faster program execution and fewer processor stalls.

Throughput

Amount of data transferred per unit time. Higher throughput improves system bandwidth and parallel workloads.
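Peak throughput follows directly from the interface width, the clock, and the number of transfers per clock. A small sketch with assumed numbers (the 64-bit width and 1.6 GHz I/O clock below are illustrative, not tied to any specific DRAM standard):

```python
def peak_bandwidth_gbps(bus_width_bits, clock_hz, transfers_per_clock=2):
    """Peak bandwidth in GB/s; double-data-rate interfaces transfer
    on both clock edges, hence transfers_per_clock=2 by default."""
    return bus_width_bits / 8 * clock_hz * transfers_per_clock / 1e9

# Illustrative 64-bit DDR-style interface at a 1.6 GHz I/O clock
bw = peak_bandwidth_gbps(64, 1.6e9)
print(bw)  # 25.6 GB/s
```

Real sustained bandwidth is lower once refresh, row activation, and bus turnaround overheads are accounted for.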


Power: A Critical Constraint

Power consumption has become one of the biggest challenges in modern SoC design. Memory contributes significantly to system power because of frequent accesses and large memory arrays.

Reducing power often involves techniques such as:

  • Bitline optimization
  • Banking
  • Clock gating
  • Low-leakage devices

Area: Why Memory Dominates the Chip

Area is extremely important in semiconductor manufacturing. A smaller die area means:

  • More chips per wafer
  • Higher yield
  • Lower manufacturing cost

Designers often remember a simple relationship:

Yield ∝ 1 / Area

Larger chips are statistically more likely to contain manufacturing defects.
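One common way to make this relationship concrete is the Poisson die-yield model, Y = e^(−D·A), where D is the defect density and A the die area. The sketch below uses an assumed defect density purely for illustration:

```python
import math

def poisson_yield(defect_density_per_cm2, area_cm2):
    """Classic Poisson die-yield model: Y = exp(-D * A)."""
    return math.exp(-defect_density_per_cm2 * area_cm2)

D = 0.1  # assumed defects per cm^2 (illustrative only)
for area in (0.5, 1.0, 2.0):  # die area in cm^2
    print(area, round(poisson_yield(D, area), 3))
```

Doubling the die area more than doubles the expected defect count per die, so yield falls off steadily as chips grow — the quantitative version of "Yield ∝ 1/Area".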

This is why memory designers often say:

“Area is diamond in memory design.”

Because memory arrays are replicated millions of times, even small improvements in cell area can significantly reduce total chip size.


Why Memory Optimization Matters

In many modern SoCs:

  • Memory occupies more than half of the chip area
  • Memory accesses dominate power consumption
  • Memory latency often limits system performance

Because of this, optimizing memory architecture can have a massive impact on the entire chip.

A well-designed memory hierarchy can:

  • Reduce processor stalls
  • Improve energy efficiency
  • Lower manufacturing cost

Understanding these building blocks will give you a real glimpse into how modern cache memories are designed inside processors.

Stay tuned for the next article in this series on VLSI EDGE.

Understanding memory means understanding the backbone of modern SoC design.

Tuesday, 3 March 2026

Memory Design 101: The Secret Backbone of Every Modern SoC | Afzal Malik

Deep Dive Series: Memory Design

By Afzal Malik | Industry Perspective

In the semiconductor industry, there’s a common saying: "Logic wins the prize, but Memory pays the bills." While high-speed CPUs and neural engines grab the headlines, the reality is that 60% to 80% of a modern SoC's footprint is dedicated to memory. From the tiny registers in a pipeline to the massive L3 caches in a server chip, memory is the circulatory system of data.

As a memory design engineer, your job is a constant battle against the "Power-Performance-Area" (PPA) triad. You aren't just placing gates; you are managing millivolts of noise margin and femtofarads of parasitic capacitance.

1. Why Memory Design is the Ultimate Challenge

Memory design is unique because it is custom-intensive. Unlike standard cell digital logic where you use automated Place and Route (PnR) tools, memory—especially the bitcell and the sensing circuitry—is often designed by hand at the transistor level. Why?

  • The Density Constraint: A 16MB cache holds over 134 million bits — more than 800 million transistors for 6T bitcells alone. If your bitcell is 10% larger than it needs to be, you might lose 20% of your chip's profit margin.
  • The Signal Integrity Battle: Reading a memory cell involves discharging a highly capacitive "Bitline." We are often looking for a voltage swing of only 50mV to 100mV before we have to sense it. Distinguishing that signal from background noise is a feat of analog engineering.
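That bitline swing is just charge conservation: ΔV = I·t / C, where I is the cell read current, t the sensing window, and C the bitline capacitance. A back-of-the-envelope sketch (all device values below are assumed, illustrative numbers):

```python
def bitline_swing_mV(cell_current_uA, sense_time_ns, bitline_cap_fF):
    """Voltage developed on the bitline during a read: dV = I * t / C."""
    # uA (1e-6 A) * ns (1e-9 s) / fF (1e-15 F) -> volts, then to mV
    dv = (cell_current_uA * 1e-6) * (sense_time_ns * 1e-9) / (bitline_cap_fF * 1e-15)
    return dv * 1e3

# Illustrative: 20 uA cell current, 0.5 ns window, 100 fF bitline
print(bitline_swing_mV(20, 0.5, 100))  # ~100 mV
```

Push the bitline capacitance up (taller columns) or shrink the sensing window (faster clocks), and the swing collapses toward the noise floor — exactly the battle described above.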

2. The Hierarchy: From Speed to Bulk

Not all memory is created equal. We categorize memory based on its proximity to the processor:

Type             | Latency            | Density   | Primary Use
Flip-Flops / Reg | 1 cycle            | Very Low  | Datapath / Control
SRAM             | Low (1–5 cycles)   | Medium    | L1/L2/L3 Caches
DRAM             | High (100+ cycles) | Very High | Main System Memory

3. Anatomy of a Memory Instance

When you look at a memory "hard macro" (a finished memory block), it consists of several critical components:

  1. The Bitcell Array: The core where data lives (usually 6T SRAM cells).
  2. Row Decoder: Converts an address into a single "Wordline" (WL) activation.
  3. Column Mux/Peripheral: Selects which bitline to route to the output.
  4. Sense Amplifier: The "heart" of the read operation; it amplifies tiny voltage differences to full CMOS logic levels.
  5. Control Logic: Manages the timing of clocks, pre-charge pulses, and enable signals.
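The row decoder's job (component 2 above) is a pure one-hot function: for an n-bit address it asserts exactly one of 2^n wordlines. A behavioral sketch — illustrative only, real decoders are built from predecoded NAND/NOR stages at the transistor level:

```python
def row_decode(addr, n_rows):
    """One-hot row decode: exactly one wordline asserted per address."""
    assert 0 <= addr < n_rows, "address out of range"
    return [1 if row == addr else 0 for row in range(n_rows)]

# A 3-bit address selects one of 8 wordlines; address 5 fires WL[5] only
print(row_decode(5, 8))  # [0, 0, 0, 0, 0, 1, 0, 0]
```

The one-hot property matters for correctness: if two wordlines ever fired together, two rows would dump charge onto the same bitlines and corrupt the read.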

4. The Frontier: High Bandwidth & AI

The latest challenge in memory design is Bandwidth. With AI models needing gigabytes of parameters, we are moving toward HBM (High Bandwidth Memory) and 3D Stacking. In these designs, the memory is literally stacked on top of the logic using TSVs (Through-Silicon Vias). This is the cutting edge where "Memory Design" becomes "Systems Engineering."

Conclusion: Your Path in Memory Design

Memory design is the perfect career path if you love both digital logic and analog precision. It requires a deep understanding of device physics, layout parasitics, and architectural bottlenecks.

Stay tuned for Part 2

Friday, 5 December 2025

Digital Design Flow in Cadence Virtuoso — From Schematic to Layout and Parasitic Extraction Using a CMOS Inverter (90 nm) | Md Ghalib Hussain

Digital Design Flow in Cadence Virtuoso: From Schematic to Layout and Parasitic Extraction (90 nm CMOS)

Based on a tutorial by Md Ghalib Hussain. This article explains the complete custom-design flow in Cadence Virtuoso using a simple yet fundamental block — the CMOS inverter.

Introduction

The CMOS inverter is the “Hello World” of analog and digital circuit design. Even though it seems simple, designing and analyzing an inverter in a professional EDA environment such as Cadence Virtuoso helps students understand the real-world flow used in semiconductor companies.

In this article, we will walk through the complete custom-design flow:

  • Schematic design
  • Symbol creation
  • Testbench setup
  • DC & transient simulations
  • Layout design (90 nm gPDK)
  • Design Rule Check (DRC)
  • Layout vs Schematic (LVS)
  • Parasitic extraction
  • Post-layout simulation

This mirrors the exact workflow used inside industry — making it highly valuable for students, beginners, and engineers preparing for internships.

1. Schematic Design

The flow begins with drawing the inverter in the Virtuoso Schematic Editor. The inverter uses:

  • One PMOS (pull-up)
  • One NMOS (pull-down)

Key aspects students must understand:

  • Device models come from the foundry PDK (in this case, 90 nm gPDK).
  • W/L selection controls switching threshold and noise margins.
  • Power rails (VDD, GND) must be correctly assigned.
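The link between W/L and the switching threshold can be seen from the long-channel square-law model: equating NMOS and PMOS saturation currents at Vin = Vout = VM gives VM = (Vtn + r·(VDD − |Vtp|)) / (1 + r) with r = √(kp/kn). This is a first-order hand-calculation sketch only — 90 nm devices are short-channel, so the simulated VTC will deviate, and the voltages below are assumed values:

```python
import math

def vm_long_channel(vdd, vtn, vtp_abs, kn, kp):
    """First-order CMOS inverter switching threshold (square-law model).

    Derived by equating NMOS and PMOS saturation currents at Vin = Vout:
        VM = (Vtn + r*(VDD - |Vtp|)) / (1 + r),  r = sqrt(kp/kn)
    """
    r = math.sqrt(kp / kn)
    return (vtn + r * (vdd - vtp_abs)) / (1 + r)

# Symmetric drive (kp == kn) with matched thresholds centers VM at VDD/2
print(vm_long_channel(vdd=1.2, vtn=0.3, vtp_abs=0.3, kn=1.0, kp=1.0))  # ~0.6 V
```

Upsizing the PMOS (raising kp, hence r) pulls VM upward — which is exactly why W/L selection controls the switching threshold and, through it, the noise margins.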

Why does schematic design matter?

This stage ensures your circuit is electrically correct. Before touching layout, the inverter must fully meet functional expectations. Students often skip this step, but professionals rely heavily on schematic-level simulation.

2. Creating a Symbol & Testbench

Once the schematic is complete, a symbol is generated. This allows the inverter to be instantiated in other circuits and testbenches.

In the testbench, the following elements are added:

  • AC / transient input sources
  • Load capacitance (CL) — crucial for realistic delay
  • Power rails

Simulations performed:

  • DC Transfer Curve → VTC, switching threshold, noise margins
  • Transient Simulation → rise/fall delays, propagation delay
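Once the VTC is plotted, the noise margins fall out of four points on the curve: NMH = VOH − VIH and NML = VIL − VOL. A tiny helper to turn the curve readings into numbers — the voltage values below are assumed for illustration, not results from the 90 nm simulation:

```python
def noise_margins(voh, vol, vih, vil):
    """Static noise margins read off the DC transfer curve (VTC)."""
    nmh = voh - vih  # high-level noise margin
    nml = vil - vol  # low-level noise margin
    return nmh, nml

# Illustrative VTC points for a 1.2 V supply
nmh, nml = noise_margins(voh=1.2, vol=0.0, vih=0.7, vil=0.5)
print(nmh, nml)  # roughly 0.5 V each for a well-centered inverter
```

A skewed W/L ratio shifts VIH/VIL and makes one margin shrink at the expense of the other, which is why the VTC is checked before moving on to layout.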

3. Layout Design (Virtuoso Layout Suite)

Layout converts the schematic into a manufacturable geometric representation. For beginners, the CMOS inverter is the best starting point because it introduces:

  • Diffusion regions
  • Polysilicon gate formation
  • N-well / P-well structures
  • Contacts & vias
  • Metal routing & spacing rules

The PDK enforces design rules — minimum widths, spacing, enclosure, and overlaps — that ensure manufacturability.

4. DRC — Design Rule Check

After layout, a DRC is run to ensure all geometries meet foundry rules. Common beginner errors:

  • Poly spacing violations
  • Metal enclosure issues
  • Minimum width of diffusion
  • Via enclosure violations

5. LVS — Layout vs Schematic

LVS ensures that the layout corresponds exactly to the schematic. If the layout connectivity doesn't match, the circuit will fail in silicon.

Common LVS issues:

  • Incorrect net connection
  • Transistor bulk not connected properly
  • Mismatched device parameters

6. Parasitic Extraction (PEX)

Parasitics arise from:

  • Interconnect resistance (R)
  • Diffusion capacitances
  • Gate overlap capacitances
  • Coupling capacitances between metal layers

Extraction creates an annotated netlist with R, C components included. This step is essential because post-layout delay can be very different from schematic simulation.
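One simple way to see why extracted RC shifts the delay is the Elmore model: for an RC ladder, the delay at the far end is the sum, over every node, of the node capacitance times the total resistance between that node and the driver. A sketch with assumed (illustrative) driver and wire values, not numbers from any extracted netlist:

```python
def elmore_delay(stages):
    """Elmore delay of an RC ladder.

    `stages` is a list of (R, C) pairs from driver to load; each node's
    capacitance is weighted by the total upstream resistance.
    """
    delay, r_upstream = 0.0, 0.0
    for r, c in stages:
        r_upstream += r
        delay += r_upstream * c
    return delay

# Schematic view: driver resistance charging the load cap only
pre_layout = elmore_delay([(1000, 20e-15)])          # 1 kΩ, 20 fF

# Post-layout: same driver plus two extracted wire RC segments
post_layout = elmore_delay([(1000, 20e-15),
                            (200, 5e-15),
                            (200, 5e-15)])
print(pre_layout, post_layout)  # post-layout delay is noticeably larger
```

Every femtofarad of extracted wire capacitance is multiplied by the full upstream resistance, which is why post-layout delay only ever grows relative to the schematic estimate.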

7. Post-Layout Simulation

The final step is running simulations with parasitics included.

Observations typically include:

  • Increased propagation delay
  • Slight shift in switching threshold
  • Slew rate degradation

This is the closest approximation to real silicon behavior.

Video Tutorial Series

This entire flow is explained beautifully by Md Ghalib Hussain in his YouTube playlist:

Watch YouTube Playlist →

Conclusion

Understanding the full custom design flow — from schematic to layout to post-layout simulation — is essential for anyone aiming for roles in VLSI Design, Physical Design, or AMS Circuit Design. The CMOS inverter provides the perfect foundation to learn industry-standard tools like Cadence Virtuoso.


Credits: Tutorial and project by Md Ghalib Hussain. Article written for VLSIEdge by Afzal Malik.

Saturday, 22 November 2025

How to Build a Strong VLSI Portfolio While in College - by Afzal Malik

How to Build a Strong VLSI Portfolio While in College

A practical, experience-based guide for students who want to stand out in VLSI internships, placements, and research roles.

Why a VLSI Portfolio Matters More Than You Think

When I started my own VLSI journey, nobody told me that a portfolio is more important than your GPA. Not because grades aren’t important—they are—but because VLSI is a practical, design-driven field. Whether you're applying for an internship, a job, or even a research project, companies want proof that:

  • You can think like an engineer.
  • You can simulate, debug, and analyze circuits.
  • You understand fundamentals beyond textbooks.
  • You’ve touched real tools or built real mini projects.

A strong portfolio demonstrates all of this—without you saying a word.

What Exactly Is a VLSI Portfolio?

A VLSI portfolio is a collection of your work that shows your:

  • Design skills (analog/digital)
  • Simulation ability
  • Understanding of concepts
  • Project execution
  • Documentation quality
  • Problem-solving mindset

Think of it like a personal "design journal" that you publicly showcase using GitHub or a simple Google Drive folder + website.

Step 1: Start With 3–5 Solid Mini Projects

You don’t need a big SoC project. You just need simple but well-executed projects. Here are beginner-friendly yet impressive ones:

  • CMOS Inverter Characterization – delay, power, noise margin
  • Current Mirror Analysis – mismatch, output resistance
  • Two-stage Op-Amp – gain, GBW, PM, stability
  • 6T SRAM Bitcell – read/write operation explained
  • Simple FIR Filter on FPGA – Verilog + testbench
  • RC Delay Model – comparing theoretical vs simulated

Each mini project teaches something real. Interviewers love students who can articulate small concepts extremely well.

Step 2: Document Every Project Like an Engineer

Your documentation should contain:

  1. Problem Statement — What you are building.
  2. Theory Overview — Basics written in your own words.
  3. Hand Calculations — Even if approximate.
  4. Simulation Setup — Tools, models, parameters.
  5. Results + Waveforms — Screenshots with labels.
  6. Analysis — Why the results make sense.
  7. Comparison — Theory vs simulation.
  8. Conclusion — What you learned.

This is EXACTLY how engineers write design reports at companies like ST, Intel, Qualcomm, TI, and NXP.

Step 3: Use GitHub to Create a Clean Public Portfolio

GitHub is the industry gold standard. Recruiters love to see:

  • neat folder structures
  • proper naming
  • clear README files
  • version history
  • organized simulation files

Your GitHub should have repositories like:

VLSI-Portfolio/
├── CMOS-Inverter/
│   ├── docs/
│   ├── simulations/
│   ├── README.md
├── OpAmp-Design/
├── Current-Mirror/
├── FIR-Filter-FPGA/
└── PLL-Notes/
      

This instantly sets you apart from 95% of students.

Step 4: Build a Simple Personal Website

It doesn’t have to be fancy. Even a Blogger or GitHub Pages site works. Your website should have:

  • About Me — your background
  • Projects — link your GitHub work
  • Resume — internship-friendly version
  • Articles — publish tutorials or reflections
  • Contact

This shows maturity and communication ability—both highly valued in VLSI roles.

Step 5: Publish Articles Sharing What You Learn

Sharing knowledge is one of the strongest signals of confidence and understanding. Good topics for students:

  • “How I simulated my first CMOS inverter”
  • “What I learned building a current mirror”
  • “5 things every VLSI student should know before starting cadence”
  • “My experience debugging simulation errors”

These posts help others AND show recruiters you're serious.

Step 6: Build Depth in One Track

You don’t need to master everything. Pick ONE track:

  • Analog
  • Digital
  • Physical Design
  • Verification
  • Memory

Then build 3–4 projects around that track. Engineers respect depth over random breadth.

Step 7: Showcase Everything When Applying

In your resume:

  • Link GitHub
  • Link personal website
  • Add 2–3 strongest projects
  • Share waveforms and analysis during interviews

Trust me — this makes interviewers take you seriously.

Final Checklist

  • ☑ 3–5 mini projects documented
  • ☑ GitHub repositories clean and public
  • ☑ Simple personal website
  • ☑ Articles explaining what you learned
  • ☑ One chosen specialization
  • ☑ Resume linked to projects

Conclusion

A good portfolio doesn’t require money, expensive tools, or special labs. It requires consistency, curiosity, and documentation. Start small. Build mini projects. Share your learning. Within 6 months, you’ll look completely different from the crowd—and companies will notice.

Written by: Afzal Malik

© VLSIEdge — Your edge in VLSI, chip design & semiconductors.