Saturday, 9 May 2026

An Introductory guide to Memory Compilers (SRAM and Custom Memory) - By Afzal Malik

What are Memory Compilers in VLSI Design?

Memory compilers are software tools, or a combination of tools and scripts, that automatically generate memory circuits (SRAM, ROM, etc.) from a given specification. Typical inputs include memory size, operating speed, power budget, and timing requirements. From these, the compiler generates a memory layout that meets the spec.


Why use compilers instead of manual design?

A manually designed “hard macro” (a fixed memory block) can be more optimized for a single specific use case — but it lacks the flexibility that modern SoC (System on Chip) design demands. Here is why compilers make more sense:

Variety of instances. In a modern chip, a customer might need multiple memory instances of different sizes — say 1024×32, 512×64, and so on. Manually designing each of them would take months of engineering effort. A compiler generates any configuration in minutes with minimal manual work. (That said, building the compiler itself takes considerable time and effort upfront.)

Layout flexibility. A customer might need a “Tall and Skinny” or “Short and Fat” memory depending on the available floorplan area. Compilers handle this by adjusting the Mux Factor — how many columns share the same I/O logic (like a Sense Amplifier). This reshapes the memory without redesigning it from scratch.
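
To make the Mux Factor idea concrete, here is a minimal sketch of how the array shape follows from it. The function name and the divisibility check are assumptions for illustration, not part of any real compiler: with a mux factor of M, each physical row holds M words, so the array has words/M rows and bits×M columns.

```python
def array_shape(words: int, bits: int, mux: int) -> tuple[int, int]:
    """Return (rows, columns) of the bitcell array for a words x bits memory.

    Hypothetical helper: assumes the word count divides evenly by the mux factor.
    """
    if words % mux != 0:
        raise ValueError("words must be divisible by the mux factor")
    rows = words // mux   # each physical row stores `mux` words
    cols = bits * mux     # `mux` columns share one I/O (sense amplifier)
    return rows, cols


# Same 1024x32 capacity, three different aspect ratios.
for mux in (4, 8, 16):
    rows, cols = array_shape(1024, 32, mux)
    print(f"mux={mux}: {rows} rows x {cols} columns")
```

For the 1024×32 instance, a mux factor of 4 gives 256 rows by 128 columns (taller), while a mux factor of 16 gives 64 rows by 512 columns (wider), with the same total number of bitcells.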

Automatic characterization. The compiler also generates the full ecosystem of files — timing models, power models, logical models — required for SoC integration. These views (Verilog, .lib/Liberty, LEF, SPICE Netlist, GDS/Layout, etc.) are all delivered to the customer automatically.

Built-in special features. Compiler-generated circuits can include things like Redundancy Logic. If any row or column is found dead during testing, the memory can repair itself by re-routing signals to a spare row or column already built into the array.


The “Leaf Cell” philosophy: how the layout is built

A memory compiler does not “draw” transistors or metal lines in the traditional sense. Instead, it acts as a high-speed architect that assembles a complex puzzle using a library of pre-validated Leaf Cells. These include:

  • Bitcell
  • Periphery cells (Wordline drivers, Sense Amplifiers, Column Mux, Control Logic, etc.)
  • End cells
  • Decoders

The heart of the compiler is a complex script, usually written in Tcl or Python. In a Cadence-based environment, this is typically written in SKILL. When a user inputs a specification like 2048×32, the script executes the following:

Coordinate Calculation — It calculates exactly how many rows and columns are needed, and determines the (X, Y) coordinates for every single leaf cell.

Tiling and Abutment — The script “tiles” cells side by side. These cells are designed for abutment, meaning when placed next to each other, their power rails (VDD/VSS) and signal lines automatically overlap and connect.

Auto-Stitching — The script ensures that global signals (like the main Wordline or Bitline) are perfectly continuous across thousands of cells without requiring manual routing.

Because individual leaf cells have already been meticulously checked for DRC (Design Rule Checks) and LVS (Layout vs. Schematic), the resulting massive layout is Correct-by-Construction. This is what allows a compiler to generate an error-free memory in seconds — a task that would take a human designer months.
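
The coordinate calculation and tiling steps can be sketched in a few lines. This is only an illustration under simplifying assumptions: the names Placement and tile_array are hypothetical, the cell dimensions are made up, and a real compiler would also mirror alternate rows and columns and add periphery and end cells.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    cell: str
    x: float
    y: float


def tile_array(rows: int, cols: int, cell_w: float, cell_h: float,
               origin: tuple[float, float] = (0.0, 0.0)) -> list[Placement]:
    """Place rows x cols bitcells on a regular grid so that neighbors abut."""
    x0, y0 = origin
    placements = []
    for r in range(rows):
        for c in range(cols):
            # The placement pitch equals the cell size, so adjacent cells
            # share an edge and their rails and signal lines line up.
            placements.append(Placement("bitcell", x0 + c * cell_w, y0 + r * cell_h))
    return placements


# Tiny 2x3 array with an assumed 0.5 x 0.25 (arbitrary units) bitcell.
grid = tile_array(rows=2, cols=3, cell_w=0.5, cell_h=0.25)
for p in grid:
    print(p)
```

Because every coordinate is a multiple of the cell pitch, no routing step is needed between neighbors; the connectivity comes from how the leaf cells were drawn.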

Interesting Insight: In modern SoCs, memory arrays can occupy up to around 80% of the total die area, making bitcell optimization critical for both cost and performance. For this reason, foundries reportedly allow the bitcell to break certain standard logic design rules to squeeze out more density. These are called Push Rules: inside the bitcell, the foundry permits rules that are 10–20% tighter than the standard logic rules.


Types of Memory Compilers

In the SoC world, compilers come in a variety of flavors. Here are the main types you will encounter in industry:


Based on Performance: High Density (HD) vs. High Speed (HS)

[Figure: High Density (HD) cell: smallest bitcell, longer access time, area is the primary concern (caches and data buffers where die area drives cost). High Speed (HS) cell: larger bitcell, faster access time, speed is the primary concern (high-performance critical-path caches).]
Fig 2. HD vs HS compiler tradeoff: area efficiency vs access speed

High-Density (HD): These use the smallest, most optimized bitcells allowed by the foundry’s design rules. They are used in caches or data buffers where area is the primary cost driver, not operating frequency. Access time in HD compilers is longer compared to HS.

High-Speed (HS): These generally use larger bitcells and are optimized for situations where every picosecond of access time counts — typically in caches or buffers on high-performance critical paths.


Based on Architecture: Single Port vs. Two Port vs. Dual Port

[Figure: Single Port (SP): one shared R/W port, read or write but not both. Two Port (TP): dedicated read and write ports, read and write simultaneously. Dual Port (DP): two fully independent R/W ports (Port A and Port B).]
Fig 3. Single Port, Two Port, and Dual Port memory architectures compared

Single-Port (SP): The most common and area-efficient type. It has one set of address and data buses — you can either read or write in a single clock cycle, but never both at the same time. Used for general-purpose storage where high throughput is not a priority.

Two-Port (TP): Has one dedicated Read port and one dedicated Write port. This allows reading from one address while simultaneously writing to a different address.

Dual-Port (DP): The most flexible and also the most expensive in terms of area. It has two fully independent ports, each capable of both reading and writing.


Other important terminologies


Banking

Memory Banking is a “divide and conquer” strategy used by compilers to manage the physical limitations of large storage arrays. Instead of building one massive, sluggish block, the compiler partitions the memory into smaller, symmetrical sub-blocks (banks) that share a global periphery while maintaining their own local drivers.

This is critical because it significantly reduces the RC delay caused by long, highly capacitive bitlines — allowing much faster access times. Beyond speed, banking also offers a major power advantage: individual unused banks can be put into low-power retention modes while the rest of the memory remains active, effectively slashing the chip’s overall leakage footprint without sacrificing total storage capacity.
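
A rough way to see the speed benefit is a first-order charge model: the time for a bitcell to develop the sensing swing is t = C × ΔV / I, and bitline capacitance grows with the number of rows attached to it. This is a simplified stand-in for a full RC analysis (it ignores wire resistance), and all default values below are assumed, illustrative numbers rather than data from any process.

```python
def bitline_delay_ps(rows_on_bitline: int,
                     c_per_cell_ff: float = 0.1,   # assumed bitline cap per attached cell (fF)
                     swing_mv: float = 100.0,      # assumed swing needed by the sense amp (mV)
                     i_read_ua: float = 20.0) -> float:  # assumed bitcell read current (uA)
    """First-order bitline development time in picoseconds: t = C * dV / I."""
    c_bitline_ff = rows_on_bitline * c_per_cell_ff
    # Units: fF * mV / uA = 1e-15 * 1e-3 / 1e-6 s = 1e-12 s = ps
    return c_bitline_ff * swing_mv / i_read_ua


total_rows = 1024
for banks in (1, 2, 4):
    rows = total_rows // banks
    print(f"{banks} bank(s): {rows} rows per bitline, ~{bitline_delay_ps(rows):.0f} ps")
```

With these assumed numbers, splitting 1024 rows into four banks cuts the bitline development time from about 512 ps to about 128 ps. Once wire resistance is included, long bitlines fare even worse than this linear model suggests.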

This is why in industry you will commonly hear terms like “bank-2” or “bank-4” compilers.


Monorail vs. Dual-Rail

[Figure: Monorail: a single supply (VDDsafe) feeds both the core array (bitcells) and the periphery (decoders, SA), an unavoidable compromise where lower V risks bitcell instability and higher V wastes periphery power. Dual-Rail: VDDMA keeps the core array at a higher, stable voltage for SNM, while VDDMP lets the periphery scale with DVFS to save dynamic power.]
Fig 5. Monorail vs. Dual-Rail power distribution in memory compilers

Power distribution is one of the most critical design decisions in memory compilers. Compilers generally offer two configurations:

Monorail: A single power supply (VDDsafe) is used for both the memory core and the peripheral logic. The trade-off is unavoidable — if you lower the voltage to save power, you risk bitcell instability (Read-Destruct or Write-Failure). If you keep it high for stability, the periphery consumes excessive dynamic power.

Dual-Rail: Modern SoCs almost exclusively use Dual-Rail architectures to decouple the needs of the bitcells from the high-speed requirements of the logic.

  • Memory Periphery Supply (VDDMP): Powers the row/column decoders, control logic, and I/O. Usually operates at the same voltage as the standard cells in the SoC. This allows the periphery to scale its voltage during Dynamic Voltage and Frequency Scaling (DVFS), significantly reducing switching power.
  • Memory Core/Array Supply (VDDMA): Dedicated strictly to the storage array. Often operates at a higher, more stable potential than the periphery. Keeping the core at a higher voltage ensures a robust Static Noise Margin (SNM), which is vital for preventing accidental bit-flips in advanced technology nodes.
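
The saving from scaling the periphery rail can be estimated with the standard dynamic power relation P ≈ α·C·V²·f. The sketch below only computes the ratio between two operating points; the function name and the example voltages (0.9 V and 0.7 V) are assumptions for illustration.

```python
def dynamic_power_ratio(v_new: float, v_ref: float,
                        f_new: float = 1.0, f_ref: float = 1.0) -> float:
    """Ratio of switching power at (v_new, f_new) relative to (v_ref, f_ref).

    Uses P ~ alpha * C * V^2 * f with activity (alpha) and capacitance (C)
    assumed unchanged, so they cancel out of the ratio.
    """
    return (v_new / v_ref) ** 2 * (f_new / f_ref)


# Periphery rail scaled from an assumed 0.9 V down to 0.7 V, same frequency.
ratio = dynamic_power_ratio(0.7, 0.9)
print(f"Periphery switching power: {ratio:.0%} of the reference")
```

Under these assumptions the periphery burns roughly 60% of its original switching power, while the core array on VDDMA keeps its full noise margin.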

Threshold Voltage (VT) Flavors

In memory design, selecting the threshold voltage (VT) is the primary lever for balancing leakage power and switching speed. Because SRAMs often sit idle for long periods but must respond instantly when addressed, the choice of VT impacts the entire chip’s power profile.

Memory compilers typically offer the following options:

Flavor | What it means
HVT (High VT) | “Low Power” choice. Lowest leakage, slowest switching. Used in non-critical data stores where battery life is the priority.
LVT (Low VT) | “Performance” choice. Faster access times, but significantly higher standby leakage.
ULVT (Ultra-Low VT) | Reserved for extreme performance. Transistors are leaky even when off; used only on the most critical speed paths of high-performance processors.
MVT (Mixed VT) | Combines VT types. For example, the Memory Core uses HVT to minimize leakage in millions of bitcells, while the Periphery uses LVT to keep address decoding and sensing paths fast.

Low Power Modes

[Figure: Active (core: full VDD, periphery: full VDD; normal read/write) → gate periphery → Nap/Sleep (core: full VDD, periphery: OFF; data safe, fast wake-up, leakage reduced) → reduce V → Retention / Deep Sleep (core: VRET, periphery: OFF; maximum leakage saving, slower wake-up).]
Fig 6. Memory low-power state transitions: Active, Nap/Sleep, and Retention (Deep Sleep)

Nap/Sleep Mode: The power supply to the Memory Periphery (decoders, sense amps, and control logic) is shut off, while the Memory Core remains fully powered. Since the periphery consists of a large number of switching transistors, gating its supply significantly reduces leakage. Because the core is still at full VDD, the data is 100% safe. Wake-up time is very fast — since the bitcells never lost voltage, the memory can return to active state almost instantly once the periphery power is restored.

Retention Mode (Deep Sleep): Takes power saving a step further. Not only is the periphery gated, but the voltage to the Memory Core is also reduced toward its Minimum Retention Voltage (VRET). A bitcell can hold its state at a voltage much lower than what is required for a Read or Write operation. By dropping the core voltage to just above the point where the cross-coupled inverters can no longer hold their state, leakage is cut sharply. However, operating too close to VRET leaves very little noise margin and makes the memory more sensitive to soft errors.


The Compiler Delivery Suite

Once the compiler completes its generation cycle, it produces a variety of views tailored for specific stages of the ASIC design flow:

[Figure: Memory compiler outputs and their consumers: Verilog/VHDL → RTL engineers (simulation); LEF → PD team (floorplanning, routing); GDSII → foundry (mask making, tapeout); .lib/Liberty → STA tools (timing sign-off); CDL/SPICE → LVS verification.]
Fig 7. The compiler delivery suite: output views and who consumes them in the ASIC design flow

Verification and Logic Views (Verilog/VHDL): A high-level description used by RTL engineers. It does not contain transistors but simulates the functional logic — for example, “If WEN is low and CLK rises, write data to address A.”
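
To show what "functional logic without transistors" means, here is a small behavioral sketch written in Python rather than Verilog. It follows the rule quoted above (write when WEN is low on a rising clock edge); the class name, the registered read output, and the missing chip-enable pin are simplifying assumptions, not the interface of any specific compiler.

```python
class SinglePortSram:
    """Behavioral single-port SRAM: one access (read or write) per clock edge."""

    def __init__(self, words: int, bits: int):
        self.mask = (1 << bits) - 1   # keeps stored data within the word width
        self.mem = [0] * words
        self.q = 0                    # registered read output

    def clock_rise(self, addr: int, wen: int, d: int = 0) -> int:
        """Model one rising CLK edge. wen is active-low: 0 = write, 1 = read."""
        if wen == 0:
            self.mem[addr] = d & self.mask   # write data to address A
        else:
            self.q = self.mem[addr]          # read updates the output register
        return self.q


ram = SinglePortSram(words=2048, bits=32)
ram.clock_rise(addr=5, wen=0, d=0xDEADBEEF)   # write cycle
value = ram.clock_rise(addr=5, wen=1)         # read cycle
print(hex(value))
```

A real Verilog model delivered by a compiler would typically also include timing checks and corruption behavior on violations, which this sketch leaves out.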

Physical Design and Implementation Views:

  • LEF (Layout Exchange Format): A “Physical Abstract.” Contains no internal transistor details but shows the macro’s boundary, metal layers, and pin locations. PD engineers use this for floorplanning and routing without bloating the tool’s memory with millions of polygons.
  • GDSII: The actual layout data — the “Golden File” sent to the foundry for mask making. Contains every single polygon for every layer (Diffusion, Poly, Metal, etc.).
  • Netlist (CDL/SPICE): Transistor-level connectivity. Essential for LVS (Layout vs. Schematic) checks to ensure the physical GDSII matches the intended circuit design.

Sign-off and Analysis Views:

  • Timing Models (.lib / Liberty): These Liberty files contain characterization data. They tell Static Timing Analysis (STA) tools how the memory performs under different Process, Voltage, and Temperature (PVT) conditions — for example, 0.9V at 125°C in a Slow-Slow corner.
  • AVM (Accuracy-aware Voltage Models): Critical for Power Integrity. Describes the current profiles during a read or write burst.

A note on how this was written

This article is a mix of a few things. Some of it comes from reading articles, papers, and documentation available on the internet. But a big part of it — especially the practical details, the industry terminology, and the “interesting insights” — comes from what I have personally learned and experienced while working in the memory compiler industry so far.

Working on actual compiler flows, dealing with real customer specs, and seeing how these views get used downstream in SoC integration is what gives this content its depth beyond what you would find in a textbook or a generic blog post.

The article was also refined with the help of AI writing tools to fix grammar and improve readability, while keeping the original voice and structure intact. Some of the diagrams and visuals included here were generated using AI image generation tools to better illustrate concepts that are otherwise hard to picture just from text.

So in short — it is a combination of reading, doing, and using the right tools to present it cleanly. That is pretty much how most good technical writing gets made.

Monday, 13 April 2026

BASH Chapter 1: Terminal, Finder, and Command Line Mastery

CHAPTER 1: TERMINAL and FINDER

A Technical Guide to Linux Scripting Fundamentals

1. Documentation & The REPL Model

MAN PAGES

To access the comprehensive system documentation for any utility, run the man command followed by the name of that utility (for example, man ls). This is the primary source of truth for flags and arguments.

The Terminal & REPL

The Terminal is a program that facilitates an instance of Bash (the Bourne Again Shell). It operates on a REPL architecture:

  • READ: It captures user input.
  • EVAL: It parses and executes the command logic.
  • PRINT: It outputs the result to the screen.
  • LOOP: It returns to the prompt, waiting for the next instruction.
afzal@afzal $ echo hi
hi
NOTE: BASH always executes within a specific context—the directory or folder.

2. Navigation and Environment Awareness

afzal@afzal $ pwd
/Users/afzal/projects

pwd: Print Working Directory. It prints the absolute path of your current directory in the file system hierarchy.

NOTE: Finder (GUI) and Terminal (CLI) exist in the same folder structure; they are simply two different ways to view the same data.

Core Commands:

  • ls: List the files and sub-directories within the current directory.
  • touch: Instantly create a new, empty file or update the timestamp of an existing one.
  • rm: Remove/delete a file from the system.
  • clear (or Ctrl + L): Clears the terminal screen to provide a clean workspace.
  • cd: Change Directory. This command modifies your working context, which can be verified using pwd.
NOTE: The terms "DIRECTORY" and "FOLDER" are used interchangeably in technical contexts.

Command History

Use the Up and Down Arrow keys to cycle through previously executed commands for rapid re-entry.

3. Basic File Manipulation

To rename or relocate a file, use the mv (move) command.

mv original_name.txt new_name.txt
CAUTION: Use these commands carefully. While Finder provides confirmation dialogs and "Trash" safety nets, the Terminal executes commands immediately and silently. These are powerful, low-level tools.

Wildcards and Safety

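The * (asterisk) is a glob wildcard that matches any sequence of characters in a file name, including none. The shell expands the pattern into the list of matching files before the command runs, which makes it handy for deleting or moving several files at once.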

rm -i lesson-*

The -i flag enables Interactive Mode: rm asks for a y/n (yes/no) confirmation before each deletion.

Aliases

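If you want rm to always use specific flags, you can define an alias. For example, the alias below makes it recursive and forced. Be careful with this one: -rf deletes whole directories without asking, so it removes the safety net described above, and alias rm='rm -i' is the safer everyday choice.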

alias rm='rm -rf'

To check the current configuration of an alias, use: alias rm.

4. Hidden Files & Hierarchy

Files prefixed with a . (dot) are considered hidden files.

touch .secret_config.txt

By default, Finder and the ls command do not display these files. To view them:

  • Terminal: Use ls -a (list all).
  • Finder: Use the shortcut Cmd + Shift + . (many Linux file managers use Ctrl + H instead).
NOTE: ./ represents the current directory, and ../ represents the parent directory above it in the hierarchy.

Navigation Shortcuts:

  • cd - : Return to the last directory you were in.
  • cd ~ : Jump to the User Home directory.
  • cd / : Move to the system Root directory.

Tab Completion: Use the Tab key to auto-complete file and directory names. If more than one name matches, the terminal may emit a bell sound; press Tab a second time to list the candidates.

5. File Searching and Paging

cat: Concatenates and prints the entire contents of a file to the standard output.

cat /usr/share/dict/words

Searching with Grep:

grep is a powerful pattern-matching utility.

  • grep "dave" [file]: Finds occurrences of "dave" anywhere in the line.
  • grep "^dave" [file]: Matches lines beginning with "dave".
  • grep "dave$" [file]: Matches lines ending with "dave".

File Redirection:

  • echo "text" > file.txt: Overwrites the file with new content.
  • echo "text" >> file.txt: Appends the text to the end of the existing file.

6. Advanced Grep Flags & Pipelines

Flags can be combined to provide context to your search results:

  • -A1: (After) Print the match and 1 line following it.
  • -B1: (Before) Print the match and 1 line preceding it.
  • -C1: (Context) Print the match plus 1 line before and 1 line after it.
  • -i: Case-insensitive search.
  • -o: Print only the part of each line that matches the pattern.

PIPELINES (|)

Pipelines allow you to chain commands together, treating them as filters. The output of the first command becomes the input for the second.

cat file.txt | grep "dave"

If grep is not given a filename, it automatically reads from Standard Input. This is the foundation of powerful data processing in Linux.

Wednesday, 4 March 2026

Understanding Memory Hierarchy and Why Optimization Matters in SoC Design | Memory Design 101 – Part 2 |

Memory Design Series:

  • Part 1 – Introduction to Memory Design
  • Part 2 – Memory Hierarchy and Optimization

Memory Design 101 – Part 2

Understanding Memory Hierarchy and Why Optimization Matters in SoC Design

By Afzal Malik

In the previous article of this series, we introduced an important industry perspective:

“Logic wins the prize, but memory pays the bills.”

In modern System-on-Chip (SoC) designs, 60–80% of the silicon area is often occupied by memory structures such as caches, register files, and buffers. Because of this, understanding how memory is organized and optimized is a fundamental skill for anyone entering the world of VLSI and chip design.

In this article, we will explore two key ideas that guide memory architecture:

  • Memory Hierarchy – how different memories are organized in a system
  • PPA Trade-offs – the fundamental design constraints of memory blocks

The Need for a Memory Hierarchy

If we tried to design a system with only one type of memory, we would immediately run into a major problem.

Ideally, memory should be:

  • Extremely fast
  • Very large in capacity
  • Low power
  • Low cost

Unfortunately, in real hardware no single memory technology satisfies all these requirements simultaneously. Fast memories are usually small and expensive, while large memories are slower but cheaper.

To solve this problem, computer architects organize memory into a hierarchy, where each level balances speed, capacity, and cost.

A simplified memory hierarchy looks like this:

Processor
   │
Registers
   │
L1 Cache
   │
L2 Cache
   │
L3 Cache
   │
Main Memory (DRAM)
   │
Non-Volatile Storage (SSD / Disk)

As we move away from the processor:

  • Capacity increases
  • Latency increases
  • Cost per bit decreases

This layered structure allows processors to access data quickly most of the time, while still supporting large overall memory capacity.


Level 1: Registers – The Fastest Memory

Registers sit inside the processor datapath and are the fastest storage elements in the system.

Characteristics:

  • Located directly in the CPU datapath
  • Accessed in one clock cycle
  • Extremely small capacity
  • Implemented using flip-flops

Registers store intermediate arithmetic results, instruction operands, and temporary data used by the processor.

Because they must operate at core frequency, they are designed for maximum speed, not density.


Level 2: L1 Cache – Processor Speed Driven

The next level is the Level-1 cache (L1 cache).

L1 cache is usually split into two parts:

  • Instruction cache (I-Cache)
  • Data cache (D-Cache)

Key characteristics:

  • Located very close to the CPU core
  • Built using SRAM
  • Extremely low latency
  • Small capacity (typically 32 KB – 128 KB)

The purpose of L1 cache is simple: keep the most frequently used instructions and data close to the processor.

Because CPU pipelines run at very high speeds, L1 cache must deliver data within a few cycles.


L2 and L3 Caches – Balancing Speed and Capacity

As programs become larger and more complex, L1 cache alone is not enough.

L2 Cache

  • Larger than L1
  • Slightly slower
  • Typically 256 KB – 1 MB
  • May be private or shared depending on architecture

L3 Cache

  • Even larger
  • Shared between multiple cores
  • Typically several megabytes

These caches act as intermediate buffers between the CPU and main memory, reducing expensive accesses to DRAM.


Main Memory – Off-Chip DRAM

Beyond the caches lies main memory, typically implemented using DRAM.

Characteristics:

  • Much larger capacity
  • Higher latency
  • Located off-chip

Accessing DRAM may take hundreds of clock cycles, which is extremely slow compared to CPU speed.

This is why caches are critical — they hide DRAM latency by storing frequently accessed data closer to the processor.
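
How much latency the caches hide can be estimated with the textbook Average Memory Access Time (AMAT) formula: AMAT = hit time + miss rate × miss penalty, applied level by level. The sketch below is illustrative only; the hit times, miss rates, and DRAM latency are assumed numbers, not measurements.

```python
def amat(levels: list[tuple[float, float]], memory_latency: float) -> float:
    """Average memory access time in cycles.

    levels: (hit_time_cycles, local_miss_rate) for each cache, from L1 outward.
    memory_latency: cycles to reach main memory after the last cache misses.
    """
    total = memory_latency
    # Work from the outermost cache back toward L1:
    # each level costs its hit time plus (miss rate x cost of the next level).
    for hit_time, miss_rate in reversed(levels):
        total = hit_time + miss_rate * total
    return total


# Assumed example: L1 = 4 cycles, 5% misses; L2 = 12 cycles, 20% misses; DRAM = 200 cycles.
with_caches = amat([(4, 0.05), (12, 0.20)], memory_latency=200)
without_caches = amat([], memory_latency=200)
print(f"AMAT with caches: {with_caches:.1f} cycles, without: {without_caches:.0f} cycles")
```

With these assumed numbers the average access costs about 6.6 cycles instead of 200, which is the quantitative meaning of "hiding" DRAM latency.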


Non-Volatile Storage – Persistent Data

At the bottom of the hierarchy is non-volatile storage, such as:

  • SSD
  • Flash
  • Hard disks

Unlike SRAM or DRAM, these memories retain data even when power is off. However, their latency is much higher, so they are used primarily for long-term storage.


The Key Trade-Off in Memory Design: PPA

Whenever we design any memory block in VLSI, we constantly evaluate three critical parameters:

PPA — Power, Performance, Area

  • Power
  • Performance
  • Area

These three metrics define the quality of a memory design.


Performance: Latency and Throughput

Performance of memory is mainly determined by two factors.

Latency

Time taken to fetch data after a request is issued. Lower latency means faster program execution and fewer processor stalls.

Throughput

Amount of data transferred per unit time. Higher throughput improves system bandwidth and parallel workloads.


Power: A Critical Constraint

Power consumption has become one of the biggest challenges in modern SoC design. Memory contributes significantly to system power because of frequent accesses and large memory arrays.

Reducing power often involves techniques such as:

  • Bitline optimization
  • Banking
  • Clock gating
  • Low-leakage devices

Area: Why Memory Dominates the Chip

Area is extremely important in semiconductor manufacturing. A smaller die area means:

  • More chips per wafer
  • Higher yield
  • Lower manufacturing cost

Designers often remember a simple rule of thumb (a first-order approximation, not an exact law):

Yield ∝ 1 / Area

Larger chips are statistically more likely to contain manufacturing defects.
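
One commonly used first-order way to quantify this is the Poisson yield model, Y = exp(−A × D0), where A is the die area and D0 is the defect density. It is swapped in here as an illustration of the rule of thumb above, not as the model any particular foundry uses, and the defect density below is an assumed value.

```python
import math


def poisson_yield(area_cm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of dies expected to have zero defects (Poisson model)."""
    return math.exp(-area_cm2 * defect_density_per_cm2)


d0 = 0.1  # assumed defect density, defects per cm^2
for area in (0.5, 1.0, 2.0):
    print(f"die area {area} cm^2 -> yield {poisson_yield(area, d0):.1%}")
```

The trend is what matters: every extra unit of area multiplies the chance of catching a defect, so shrinking the memory arrays improves both the die count per wafer and the fraction of good dies.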

This is why memory designers often say:

“Area is diamond in memory design.”

Because memory arrays are replicated millions of times, even small improvements in cell area can significantly reduce total chip size.


Why Memory Optimization Matters

In many modern SoCs:

  • Memory occupies more than half of the chip area
  • Memory accesses dominate power consumption
  • Memory latency often limits system performance

Because of this, optimizing memory architecture can have a massive impact on the entire chip.

A well-designed memory hierarchy can:

  • Reduce processor stalls
  • Improve energy efficiency
  • Lower manufacturing cost

Understanding these building blocks will give you a real glimpse into how modern cache memories are designed inside processors.

Stay tuned for the next article in this series on VLSI EDGE.

Understanding memory means understanding the backbone of modern SoC design.

Tuesday, 3 March 2026

Memory Design 101: The Secret Backbone of Every Modern SoC | Afzal Malik

Deep Dive Series: Memory Design

By Afzal Malik |  Industry Perspective

In the semiconductor industry, there’s a common saying: "Logic wins the prize, but Memory pays the bills." While high-speed CPUs and neural engines grab the headlines, the reality is that 60% to 80% of a modern SoC's footprint is dedicated to memory. From the tiny registers in a pipeline to the massive L3 caches in a server chip, memory is the circulatory system of data.

As a memory design engineer, your job is a constant battle against the "Power-Performance-Area" (PPA) triad. You aren't just placing gates; you are managing millivolts of noise margin and femtofarads of parasitic capacitance.

1. Why Memory Design is the Ultimate Challenge

Memory design is unique because it is custom-intensive. Unlike standard cell digital logic where you use automated Place and Route (PnR) tools, memory—especially the bitcell and the sensing circuitry—is often designed by hand at the transistor level. Why?

  • The Density Constraint: A 16MB cache has over 134 million bitcells, which is roughly 800 million transistors with 6T cells. If your bitcell is 10% larger than it needs to be, you might lose 20% of your chip's profit margin.
  • The Signal Integrity Battle: Reading a memory cell involves discharging a highly capacitive "Bitline." We are often looking for a voltage swing of only 50mV to 100mV before we have to sense it. Distinguishing that signal from background noise is a feat of analog engineering.
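
The scale of the 16MB example can be checked with quick arithmetic. The sketch assumes one bitcell per stored bit and six transistors per bitcell (a 6T SRAM cell), and it ignores tags, ECC, and redundancy; the function name is hypothetical.

```python
def bitcell_budget(capacity_mb: int, transistors_per_cell: int = 6) -> tuple[int, int]:
    """Return (bitcells, transistors) for a cache of the given size in MB."""
    bits = capacity_mb * 2**20 * 8   # MB -> bytes -> bits, one bitcell per bit
    return bits, bits * transistors_per_cell


cells, transistors = bitcell_budget(16)
print(f"{cells:,} bitcells, {transistors:,} transistors")
```

At this scale, any area added to a single bitcell is multiplied more than a hundred million times across the array.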

2. The Hierarchy: From Speed to Bulk

Not all memory is created equal. We categorize memory based on its proximity to the processor:

Type | Latency | Density | Primary Use
Flip-Flops / Reg | Zero (1-cycle) | Very Low | Datapath / Control
SRAM | Low (1-5 cycles) | Medium | L1/L2/L3 Caches
DRAM | High (100+ cycles) | Very High | Main System Memory

3. Anatomy of a Memory Instance

When you look at a memory "hard macro" (a finished memory block), it consists of several critical components:

  1. The Bitcell Array: The core where data lives (usually 6T SRAM cells).
  2. Row Decoder: Converts an address into a single "Wordline" (WL) activation.
  3. Column Mux/Peripheral: Selects which bitline to route to the output.
  4. Sense Amplifier: The "heart" of the read operation; it amplifies tiny voltage differences to full CMOS logic levels.
  5. Control Logic: Manages the timing of clocks, pre-charge pulses, and enable signals.
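
The roles of the row decoder and column mux can be sketched as a simple address split. This assumes the low-order part of the address selects the column and the rest selects the row, which is one common arrangement but not the only one; the function names are hypothetical.

```python
def split_address(addr: int, mux: int) -> tuple[int, int]:
    """Split a word address into (row, column-select) for a given mux factor."""
    return addr // mux, addr % mux


def one_hot(index: int, width: int) -> list[int]:
    """Row decoder output: exactly one wordline is active."""
    return [1 if i == index else 0 for i in range(width)]


# Word address 13 in a small array with 8 rows and a mux factor of 4.
row, col = split_address(13, mux=4)
wordlines = one_hot(row, width=8)
print(f"row={row}, column select={col}, wordlines={wordlines}")
```

Here address 13 activates wordline 3, and the column mux routes the second of four bitline pairs in each group to the sense amplifier.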

4. The Frontier: High Bandwidth & AI

The latest challenge in memory design is Bandwidth. With AI models needing gigabytes of parameters, we are moving toward HBM (High Bandwidth Memory) and 3D Stacking. In these designs, the memory is literally stacked on top of the logic using TSVs (Through-Silicon Vias). This is the cutting edge where "Memory Design" becomes "Systems Engineering."

Conclusion: Your Path in Memory Design

Memory design is the perfect career path if you love both digital logic and analog precision. It requires a deep understanding of device physics, layout parasitics, and architectural bottlenecks.

Stay tuned for Part 2

Friday, 5 December 2025

Digital Design Flow in Cadence Virtuoso: From Schematic to Layout and Parasitic Extraction Using a CMOS Inverter (90 nm) | Md Ghalib Hussain

Digital Design Flow in Cadence Virtuoso: From Schematic to Layout and Parasitic Extraction (90 nm CMOS)

Based on a tutorial by Md Ghalib Hussain. This article explains the complete custom-design flow in Cadence Virtuoso using a simple yet fundamental block: the CMOS inverter.

Introduction

The CMOS inverter is the “Hello World” of analog and digital circuit design. Even though it seems simple, designing and analyzing an inverter in a professional EDA environment such as Cadence Virtuoso helps students understand the real-world flow used in semiconductor companies.

In this article, we will walk through the complete custom-design flow:

  • Schematic design
  • Symbol creation
  • Testbench setup
  • DC & transient simulations
  • Layout design (90 nm gPDK)
  • Design Rule Check (DRC)
  • Layout vs Schematic (LVS)
  • Parasitic extraction
  • Post-layout simulation

This mirrors the exact workflow used inside industry — making it highly valuable for students, beginners, and engineers preparing for internships.

1. Schematic Design

The flow begins with drawing the inverter in the Virtuoso Schematic Editor. The inverter uses:

  • One PMOS (pull-up)
  • One NMOS (pull-down)

Key aspects students must understand:

  • Device models come from the foundry PDK (in this case, 90 nm gPDK).
  • W/L selection controls switching threshold and noise margins.
  • Power rails (VDD, GND) must be correctly assigned.

Why does schematic design matter?

This stage ensures your circuit is electrically correct. Before touching layout, the inverter must fully meet functional expectations. Students often skip this step, but professionals rely heavily on schematic-level simulation.

2. Creating a Symbol & Testbench

Once the schematic is complete, a symbol is generated. This allows the inverter to be instantiated in other circuits and testbenches.

In the testbench, the following elements are added:

  • AC / transient input sources
  • Load capacitance (CL) — crucial for realistic delay
  • Power rails

Simulations performed:

  • DC Transfer Curve → VTC, switching threshold, noise margins
  • Transient Simulation → rise/fall delays, propagation delay

3. Layout Design (Virtuoso Layout Suite)

Layout converts the schematic into a manufacturable geometric representation. For beginners, the CMOS inverter is the best starting point because it introduces:

  • Diffusion regions
  • Polysilicon gate formation
  • N-well / P-well structures
  • Contacts & vias
  • Metal routing & spacing rules

The PDK enforces design rules — minimum widths, spacing, enclosure, and overlaps — that ensure manufacturability.

4. DRC — Design Rule Check

After layout, a DRC is run to ensure all geometries meet foundry rules. Common beginner errors:

  • Poly spacing violations
  • Metal enclosure issues
  • Minimum width of diffusion
  • Via enclosure violations

5. LVS — Layout vs Schematic

LVS ensures that the layout corresponds exactly to the schematic. If the layout connectivity doesn't match, the circuit will fail in silicon.

Common LVS issues:

  • Incorrect net connection
  • Transistor bulk not connected properly
  • Mismatched device parameters

6. Parasitic Extraction (PEX)

Parasitics arise from:

  • Interconnect resistance (R)
  • Diffusion capacitances
  • Gate overlap capacitances
  • Coupling capacitances between metal layers

Extraction creates an annotated netlist with R, C components included. This step is essential because post-layout delay can be very different from schematic simulation.
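
A back-of-the-envelope estimate shows why the extracted capacitance matters. For a simple RC model, the 50% propagation delay is about 0.69 × R × C. The driver resistance and capacitance values below are assumed, illustrative numbers, not results from the 90 nm gPDK.

```python
def propagation_delay_ps(r_drive_kohm: float, c_load_ff: float) -> float:
    """First-order 50% delay of an RC stage: t_pd ~ 0.69 * R * C.

    Units: kOhm * fF = 1e3 * 1e-15 s = 1e-12 s, so the result is in ps.
    """
    return 0.69 * r_drive_kohm * c_load_ff


r_drive = 10.0      # assumed effective driver resistance (kOhm)
c_schematic = 10.0  # assumed load capacitance in the testbench (fF)
c_parasitic = 5.0   # assumed extra wiring and diffusion capacitance from PEX (fF)

pre = propagation_delay_ps(r_drive, c_schematic)
post = propagation_delay_ps(r_drive, c_schematic + c_parasitic)
print(f"schematic: {pre:.1f} ps, post-layout: {post:.1f} ps")
```

With these assumed numbers, 5 fF of extracted parasitics raises the delay from about 69 ps to about 103.5 ps, a 50% increase that the schematic simulation alone would not reveal.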

7. Post-Layout Simulation

The final step is running simulations with parasitics included.

Observations typically include:

  • Increased propagation delay
  • Slight shift in switching threshold
  • Slew rate degradation

This is the closest approximation to real silicon behavior.

Video Tutorial Series

This entire flow is explained beautifully by Md Ghalib Hussain in his YouTube playlist:

Watch YouTube Playlist →

Conclusion

Understanding the full custom design flow — from schematic to layout to post-layout simulation — is essential for anyone aiming for roles in VLSI Design, Physical Design, or AMS Circuit Design. The CMOS inverter provides the perfect foundation to learn industry-standard tools like Cadence Virtuoso.


Credits: Tutorial and project by Md Ghalib Hussain. Article written for VLSIEdge by Afzal Malik.

Saturday, 22 November 2025

How to Build a Strong VLSI Portfolio While in College - by Afzal Malik

A practical, experience-based guide for students who want to stand out in VLSI internships, placements, and research roles.

Why a VLSI Portfolio Matters More Than You Think

When I started my own VLSI journey, nobody told me that a portfolio is more important than your GPA. Not because grades aren’t important—they are—but because VLSI is a practical, design-driven field. Whether you're applying for an internship, a job, or even a research project, companies want proof that:

  • You can think like an engineer.
  • You can simulate, debug, and analyze circuits.
  • You understand fundamentals beyond textbooks.
  • You’ve touched real tools or built real mini projects.

A strong portfolio demonstrates all of this—without you saying a word.

What Exactly Is a VLSI Portfolio?

A VLSI portfolio is a collection of your work that shows your:

  • Design skills (analog/digital)
  • Simulation ability
  • Understanding of concepts
  • Project execution
  • Documentation quality
  • Problem-solving mindset

Think of it like a personal "design journal" that you publicly showcase using GitHub or a simple Google Drive folder + website.

Step 1: Start With 3–5 Solid Mini Projects

You don’t need a big SoC project. You just need simple but well-executed projects. Here are beginner-friendly yet impressive ones:

  • CMOS Inverter Characterization – delay, power, noise margin
  • Current Mirror Analysis – mismatch, output resistance
  • Two-stage Op-Amp – gain, GBW, PM, stability
  • 6T SRAM Bitcell – read/write operation explained
  • Simple FIR Filter on FPGA – Verilog + testbench
  • RC Delay Model – comparing theoretical vs simulated

Each mini project teaches something real. Interviewers love students who can articulate small concepts extremely well.

Step 2: Document Every Project Like an Engineer

Your documentation should contain:

  1. Problem Statement — What you are building.
  2. Theory Overview — Basics written in your own words.
  3. Hand Calculations — Even if approximate.
  4. Simulation Setup — Tools, models, parameters.
  5. Results + Waveforms — Screenshots with labels.
  6. Analysis — Why the results make sense.
  7. Comparison — Theory vs simulation.
  8. Conclusion — What you learned.

This closely mirrors the structure of design reports written at semiconductor companies such as ST, Intel, Qualcomm, TI, and NXP.

Step 3: Use GitHub to Create a Clean Public Portfolio

GitHub is the de facto standard for hosting and sharing project work. Recruiters love to see:

  • neat folder structures
  • proper naming
  • clear README files
  • version history
  • organized simulation files

Your GitHub should have repositories like:

VLSI-Portfolio/
├── CMOS-Inverter/
│   ├── docs/
│   ├── simulations/
│   └── README.md
├── OpAmp-Design/
├── Current-Mirror/
├── FIR-Filter-FPGA/
└── PLL-Notes/

This instantly sets you apart from most students.
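
If you want a head start, a short Python script can create this folder layout for you. This is only a sketch: the `scaffold` function and the README stub text are hypothetical, and giving every project its own `docs/` and `simulations/` folders is an assumption that extends the CMOS-Inverter example to the other projects.

```python
from pathlib import Path

# Project names taken from the example tree; subfolders per project are an assumption.
PROJECTS = ["CMOS-Inverter", "OpAmp-Design", "Current-Mirror",
            "FIR-Filter-FPGA", "PLL-Notes"]
SUBDIRS = ["docs", "simulations"]

def scaffold(root):
    """Create the portfolio folders and a stub README.md in each project."""
    root = Path(root)
    for project in PROJECTS:
        project_dir = root / project
        for sub in SUBDIRS:
            (project_dir / sub).mkdir(parents=True, exist_ok=True)
        readme = project_dir / "README.md"
        if not readme.exists():  # never overwrite an existing README
            readme.write_text(
                f"# {project}\n\nProblem statement, setup, results and analysis go here.\n",
                encoding="utf-8",
            )
    return root

if __name__ == "__main__":
    print("Created:", scaffold("VLSI-Portfolio"))
```

Running it twice is safe: existing folders are left alone and existing README files are not overwritten.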

Step 4: Build a Simple Personal Website

It doesn’t have to be fancy. Even a Blogger or GitHub Pages site works. Your website should have:

  • About Me — your background
  • Projects — link your GitHub work
  • Resume — internship-friendly version
  • Articles — publish tutorials or reflections
  • Contact

This shows maturity and communication ability—both highly valued in VLSI roles.

Step 5: Publish Articles Sharing What You Learn

Sharing knowledge is one of the strongest signals of confidence and understanding. Good topics for students:

  • “How I simulated my first CMOS inverter”
  • “What I learned building a current mirror”
  • “5 things every VLSI student should know before starting Cadence”
  • “My experience debugging simulation errors”

These posts help others AND show recruiters you're serious.

Step 6: Build Depth in One Track

You don’t need to master everything. Pick ONE track:

  • Analog
  • Digital
  • Physical Design
  • Verification
  • Memory

Then build 3–4 projects around that track. Engineers respect depth over random breadth.

Step 7: Showcase Everything When Applying

In your resume and interviews:

  • Link GitHub
  • Link personal website
  • Add 2–3 strongest projects
  • Share waveforms and analysis during interviews

Trust me — this makes interviewers take you seriously.

Final Checklist

  • ☑ 3–5 mini projects documented
  • ☑ GitHub repositories clean and public
  • ☑ Simple personal website
  • ☑ Articles explaining what you learned
  • ☑ One chosen specialization
  • ☑ Resume linked to projects

Conclusion

A good portfolio doesn’t require money, expensive tools, or special labs. It requires consistency, curiosity, and documentation. Start small. Build mini projects. Share your learning. Within 6 months, you’ll look completely different from the crowd—and companies will notice.

Written by: Afzal Malik

© VLSIEdge — Your edge in VLSI, chip design & semiconductors.