SSD System Architecture Overview

This is the highest-level view of a NAND flash-based Solid State Drive (SSD) system. The FBI (Frequency Boosting Interface) chip bridges the speed gap between the SSD controller and multiple NAND flash memory dies, enabling higher capacity while maintaining high-speed performance.


🔌 Host Interface

PCIe/NVMe interface connecting to the application processor (AP) or host system. Supports high-speed serial communication (8 GT/s per lane for PCIe Gen3, 16 GT/s per lane for Gen4).

🎛️ SSD Controller

Manages flash translation layer (FTL), wear leveling, garbage collection, and error correction. Controls data flow between host and NAND flash memory.
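The mapping role of the FTL can be illustrated with a minimal sketch. It assumes a page-mapped scheme with an append-only write pointer; the class name `TinyFTL`, its methods, and the tiny geometry are hypothetical and chosen only for illustration, not taken from any real controller firmware.

```python
class TinyFTL:
    """Page-mapped FTL sketch: logical page -> physical page, out-of-place writes."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.pages_per_block = pages_per_block
        self.l2p = {}                                   # logical page -> physical page index
        self.valid = [False] * (num_blocks * pages_per_block)
        self.next_free = 0                              # append-only write pointer

    def write(self, lpn):
        """NAND pages cannot be overwritten in place, so every write is remapped."""
        if self.next_free >= len(self.valid):
            raise RuntimeError("no free pages: garbage collection needed")
        old = self.l2p.get(lpn)
        if old is not None:
            self.valid[old] = False                     # the old copy becomes garbage
        ppn = self.next_free
        self.next_free += 1
        self.l2p[lpn] = ppn
        self.valid[ppn] = True
        return divmod(ppn, self.pages_per_block)        # (block, page)

    def garbage_in_block(self, block):
        """Count invalid pages among the pages already written in a block."""
        start = block * self.pages_per_block
        end = min(start + self.pages_per_block, self.next_free)
        return sum(1 for p in range(start, end) if not self.valid[p])


ftl = TinyFTL()
first = ftl.write(7)     # first write of logical page 7
second = ftl.write(7)    # rewrite lands on a new physical page
print(first, second, ftl.garbage_in_block(0))
```

Rewriting the same logical page leaves a stale physical page behind, which is the garbage that a collector would later reclaim by erasing the block.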

⚡ FBI Chip

Frequency Boosting Interface chip that multiplies the interface speed (typically 2:1 ratio) to overcome the bottleneck of multi-chip stacked NAND packages.

💾 NAND Flash Array

Non-volatile memory storage consisting of multiple NAND dies stacked in a multi-chip package (MCP). Stores actual data as charge in floating-gate or charge-trap cells.

Key Concepts at This Level

  • Performance Bottleneck: As serial host interfaces (PCIe) achieve higher speeds, the internal NAND flash interface becomes the limiting factor
  • Capacity vs. Speed Trade-off: Stacking more NAND dies increases capacity but adds capacitive loading, degrading signal integrity
  • FBI Solution: The FBI chip enables frequency multiplication while managing the challenges of multi-drop bus topology

FBI Chip & NPHY Interface Protocol

The FBI (Frequency Boosting Interface) chip sits between the SSD controller and NAND packages, implementing frequency multiplication and signal conditioning. NPHY (NAND PHY) is the physical layer protocol that defines electrical characteristics and timing for NAND flash communication.

FBI Chip Internal Architecture

FBI Chip Architecture

The FBI chip performs critical functions:

  • 2:1 Frequency Multiplication: Doubles the interface speed between controller and NAND
  • Fast-Lock PLL: Phase-locked loop with 16-cycle lock time and extended pull-in range
  • Equalization: Compensates for inter-symbol interference (ISI) and reflected noise
  • Multi-drop Support: Handles up to 4-drop configurations while maintaining signal integrity
  • Low Power Operation: Achieves 2.85 pJ/bit energy efficiency
NPHY Protocol Layers

NPHY Interface Protocol

NPHY defines the physical layer for NAND communication:

  • ONFI Standard: Open NAND Flash Interface (ONFI 4.x/5.x) specification
  • NV-DDR3: Non-Volatile DDR3 interface with speeds up to 800 MT/s in ONFI 4.0 (later revisions raise this ceiling)
  • Toggle Mode: Alternative high-speed interface (Toggle 2.0)
  • Signal Types: Data (DQ) with data strobe (DQS), address/command latch enables (ALE, CLE), control (CE#, RE#, WE#)
  • ZQ Calibration: Impedance calibration for optimal signal integrity

Technical Specifications

Parameter         | FBI Side (Controller) | NAND Side (Memory)
Interface Speed   | 6.4 Gb/s/pin          | 3.2 Gb/s/pin
Frequency Ratio   | 2:1 multiplication    | Base frequency
Energy Efficiency | 2.85 pJ/bit           | Varies by NAND
Process Node      | 12 nm CMOS            | Various (14 nm to 50 nm)
Max Drops         | 4 NAND packages per channel
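The figures in the table can be combined into a quick back-of-envelope check. This is a sketch only: the 8-pin DQ width is an assumption (the table gives per-pin numbers), and the per-pin power covers only the energy-per-bit figure quoted for the FBI side.

```python
# Per-pin figures taken from the specification table above.
PIN_RATE_CTRL_BPS = 6.4e9      # controller-facing side, bits per second per pin
PIN_RATE_NAND_BPS = 3.2e9      # NAND-facing side, bits per second per pin
ENERGY_PER_BIT_J = 2.85e-12    # 2.85 pJ/bit

DQ_WIDTH = 8                   # assumption: 8 data pins per channel

power_per_pin_w = ENERGY_PER_BIT_J * PIN_RATE_CTRL_BPS      # energy/bit * bits/s
channel_bytes_per_s = PIN_RATE_CTRL_BPS * DQ_WIDTH / 8      # 8 bits per byte
speed_ratio = PIN_RATE_CTRL_BPS / PIN_RATE_NAND_BPS

print(f"{power_per_pin_w * 1e3:.2f} mW per pin")
print(f"{channel_bytes_per_s / 1e9:.1f} GB/s per channel")
print(f"{speed_ratio:.0f}:1 controller-to-NAND rate")
```

Under these assumptions each pin dissipates roughly 18 mW at full rate and one 8-bit channel carries 6.4 GB/s on the controller side.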

Controller & PHY Component Architecture

At this level, we examine the internal components of both the NAND controller and PHY layer. The controller manages high-level operations while the PHY handles physical signal transmission.

NAND Flash Controller

Digital Logic
Controller Block Diagram

Main Functional Blocks:

  • Command Engine: Processes and queues NAND commands
  • DMA Controller: Manages data transfers with 32/64-bit addressing
  • ECC Engine: Error correction with configurable BCH/LDPC codes
  • FTL (Firmware): Flash Translation Layer mapping
  • Buffer Manager: SRAM buffers for data staging
  • Interface Logic: AXI/AHB system bus interface

NAND PHY Layer

Mixed Signal
PHY Block Diagram

Main Functional Blocks:

  • PLL (Phase-Locked Loop): Clock generation and synchronization
  • DLL (Delay-Locked Loop): Precise timing alignment
  • TX Driver: Transmit buffers with impedance control
  • RX Receiver: Input buffers with equalization
  • ZQ Calibration: Automatic impedance matching
  • DFI Interface: Modified DFI 3.0 for controller connection

FBI Bridge Chip

Mixed Signal
FBI Internal Blocks

Main Functional Blocks:

  • Frequency Multiplier: 2:1 ratio frequency conversion
  • Fast-Lock PLL: 16-cycle lock, extended pull-in range
  • TX/RX Equalizers: Pre-emphasis and de-emphasis
  • Data Path Logic: FIFO buffering and flow control
  • Jitter Filter: Input jitter reduction
  • Power Management: Dynamic voltage/frequency scaling

NAND Flash Die

Analog Memory
NAND Die Architecture

Main Functional Blocks:

  • Memory Array: 3D NAND cell matrix (SLC/MLC/TLC)
  • Page Buffer: Row buffer for read/write operations
  • Row Decoder: Word line selection logic
  • Column Decoder: Bit line selection logic
  • Charge Pumps: High-voltage generation for programming
  • I/O Interface: Data, command, address I/O pins

Data Flow Through The System

1. Host Command: PCIe/NVMe command from host system
2. Controller Processing: FTL translation, ECC encoding, buffering
3. FBI Frequency Boost: 2x speed increase with signal conditioning
4. PHY Transmission: Physical signal to NAND via ONFI/Toggle
5. NAND Array: Data programmed/read from flash cells

Block-Level Digital Architecture

This level reveals the detailed block diagrams showing register transfer level (RTL) components, state machines, data paths, and control logic that implement the functionality described at Level 3 (Controller & PHY Component Architecture).

Fast-Lock PLL Architecture

PLL Block Diagram

Components:

  • Phase Detector (PD): Compares input and feedback phases
  • Charge Pump (CP): Converts phase error to control voltage
  • Loop Filter: Low-pass filter for stability
  • VCO (Voltage Controlled Oscillator): Generates output clock
  • Divider: Feedback path with programmable ratio
  • Lock Detector: Monitors PLL lock status

Key Feature: 16-cycle lock time enables fast frequency switching during NAND command sequences.


Command Engine Architecture

Command Engine Block

Components:

  • Command FIFO: Queues incoming NAND commands
  • State Machine: Controls command execution sequence
  • Timer/Counter: Manages NAND timing requirements
  • Status Register: Tracks command completion
  • Arbiter: Manages multi-channel access
  • Auto-Command Generator: Generates internal commands

Operation: Supports pipelined read-ahead and write commands for maximum throughput.
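The interaction between the command FIFO, the state machine, and the timer can be sketched as a cycle-stepped model. This is a simplified illustration with hypothetical names (`CommandEngine`, `push`, `tick`) and made-up busy times; it executes one command at a time and does not model the pipelining mentioned above.

```python
from collections import deque


class CommandEngine:
    """Toy model: a command FIFO, a two-state machine, and one busy timer."""

    def __init__(self):
        self.fifo = deque()      # Command FIFO
        self.state = "IDLE"      # State machine: IDLE or BUSY
        self.timer = 0           # cycles left for the current command
        self.current = None
        self.completed = []      # stands in for the status register

    def push(self, name, busy_cycles):
        """Queue a command together with its (positive) busy time in cycles."""
        self.fifo.append((name, busy_cycles))

    def tick(self):
        """Advance the engine by one clock cycle."""
        if self.state == "IDLE" and self.fifo:
            self.current, self.timer = self.fifo.popleft()
            self.state = "BUSY"
        elif self.state == "BUSY":
            self.timer -= 1
            if self.timer == 0:
                self.completed.append(self.current)
                self.state = "IDLE"


engine = CommandEngine()
engine.push("READ", 3)       # hypothetical busy times, in clock cycles
engine.push("PROGRAM", 5)
cycles = 0
while len(engine.completed) < 2:
    engine.tick()
    cycles += 1
print(engine.completed, cycles)
```

Each command costs one issue cycle plus its busy time, so a pipelined engine would aim to overlap the issue of the next command with the busy period of the current one.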


DMA Controller Architecture

DMA Controller Block

Components:

  • AXI Master: System bus interface (32/64-bit)
  • Descriptor Engine: Processes transfer descriptors
  • Channel Arbitration: Multi-channel priority control
  • FIFO Buffers: Data staging between clock domains
  • Address Generator: Incremental/scatter-gather addressing
  • Interrupt Controller: Transfer completion signaling

Modes: Supports both simple transfers and chained descriptor modes for complex scatter-gather operations.
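Chained descriptor mode can be illustrated with a short software model. The `Descriptor` fields and the `run_chain` helper are hypothetical names for this sketch; real descriptor layouts are controller-specific and typically also carry destination addresses and control flags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    src: int                              # source offset in memory
    length: int                           # number of bytes to transfer
    next: Optional["Descriptor"] = None   # link to the next descriptor, or None


def run_chain(memory: bytes, head: Optional[Descriptor]) -> bytes:
    """Walk the descriptor chain and gather the scattered regions in order."""
    out = bytearray()
    desc = head
    while desc is not None:
        out += memory[desc.src:desc.src + desc.length]
        desc = desc.next
    return bytes(out)


memory = bytes(range(32))
chain = Descriptor(src=16, length=4, next=Descriptor(src=0, length=2))
gathered = run_chain(memory, chain)
print(list(gathered))
```

A simple transfer is the special case of a chain with a single descriptor whose `next` link is empty.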


TX/RX Equalizer Architecture

Equalizer Block

Components:

  • FIR Filter: Finite impulse response for pre-emphasis
  • Adaptive Engine: Coefficient adjustment based on channel
  • DFE (Decision Feedback Equalizer): Post-cursor ISI cancellation
  • CTLE (Continuous Time Linear Equalizer): Frequency compensation
  • CDR (Clock Data Recovery): Sampling clock extraction
  • Pattern Detector: Training sequence recognition

Purpose: Compensates for signal degradation in 4-drop multi-chip configurations and long PCB traces.
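The effect of FIR pre-emphasis can be seen in a small numerical sketch. The three tap weights below are illustrative assumptions, not values from the FBI design; the point is only that bits adjacent to a transition are driven harder than bits in a steady run.

```python
def pre_emphasis(bits, c_pre=-0.1, c_main=0.7, c_post=-0.2):
    """3-tap FIR on NRZ symbols (+1/-1): pre-cursor, main cursor, post-cursor."""
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    for n, cur in enumerate(symbols):
        nxt = symbols[n + 1] if n + 1 < len(symbols) else cur   # hold at the edges
        prev = symbols[n - 1] if n > 0 else cur
        out.append(c_pre * nxt + c_main * cur + c_post * prev)
    return out


levels = pre_emphasis([0, 0, 0, 1, 1, 1, 0])
print([round(v, 2) for v in levels])
```

In this run the first 1 after the 0-to-1 transition is driven at about 0.8 of full scale, while the 1 in the middle of the run settles to about 0.4, which is the boost in high-frequency content that counteracts a low-pass channel.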


ONFI 4.x NV-DDR3 Interface Timing

ONFI Timing Diagram

Critical Timing Parameters:

Parameter           | Symbol | Min  | Typ | Max | Unit
Clock Cycle Time    | tCK    | 1.25 | -   | -   | ns
Data Setup Time     | tDS    | 200  | -   | -   | ps
Data Hold Time      | tDH    | 200  | -   | -   | ps
Output Access Time  | tAC    | -    | -   | 500 | ps
ZQ Calibration Time | tZQCAL | -    | 1   | -   | ms
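The table values translate into a unit interval and a setup/hold margin. How many transfers occur per tCK is not stated, so the sketch below works out both readings as an assumption: two transfers per clock (double data rate) and one transfer per clock.

```python
T_CK_PS = 1250.0   # minimum clock cycle time from the table (1.25 ns)
T_DS_PS = 200.0    # data setup time
T_DH_PS = 200.0    # data hold time


def timing_budget(transfers_per_clock):
    """Return (unit interval in ps, data rate in MT/s, margin left after setup + hold)."""
    ui_ps = T_CK_PS / transfers_per_clock
    rate_mtps = 1e6 / ui_ps                    # 1 / UI, expressed in mega-transfers/s
    margin_ps = ui_ps - (T_DS_PS + T_DH_PS)
    return ui_ps, rate_mtps, margin_ps


ddr = timing_budget(2)   # assumption: data toggles on both clock edges
sdr = timing_budget(1)   # assumption: tCK is the per-transfer period
print(ddr, sdr)
```

Read as a double-data-rate clock, tCK = 1.25 ns gives a 625 ps unit interval (1600 MT/s) with 225 ps left for jitter and skew; read as the per-transfer period it gives 800 MT/s with 850 ps of margin.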

Circuit-Level Analog & Digital Design

At the deepest level, we examine the actual transistor-level circuits and analog design that implement the blocks from Level 4 (Block-Level Digital Architecture). This includes SPICE models, transistor schematics, and layout considerations.

Charge Pump Circuit (PLL)

Charge Pump Schematic

SPICE Netlist:

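* Fast-Lock Charge Pump (switch-level sketch)
* pmos_3v3 / nmos_3v3 are thick-oxide models assumed to come from the PDK
.subckt charge_pump UP DN VDD VSS VOUT
* Inverter: UP_B is the active-low gate drive for the PMOS switch
M2 UP_B UP VDD VDD pmos_3v3 W=10u L=0.18u
M4 UP_B UP VSS VSS nmos_3v3 W=5u L=0.18u
* Up switch (PMOS): sources current into VOUT while UP is high
M1 VOUT UP_B VDD VDD pmos_3v3 W=10u L=0.18u
* Down switch (NMOS): sinks current from VOUT while DN is high
M3 VOUT DN VSS VSS nmos_3v3 W=5u L=0.18u
* Current matching target: Iup = 50uA, Idn = 50uA (bias sources not shown)
* Switch resistance < 100 ohms
.ends charge_pump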

Function: Converts phase detector output to control voltage for VCO

Design Goals: Matched up/down currents, low switch resistance, fast switching

Process: 12nm CMOS with thick-oxide devices for analog performance
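The conversion from phase error to control voltage follows from the charge delivered to the loop filter, dV = I * dt / C. The sketch below uses the 50 uA pump current quoted in the netlist comments; the 10 pF filter capacitance and the 100 ps pulse width are assumptions chosen only to make the arithmetic concrete.

```python
def control_voltage_step(i_pump_a, pulse_s, c_filter_f):
    """Voltage change on the loop-filter capacitor for one pump pulse: dV = I*dt/C."""
    return i_pump_a * pulse_s / c_filter_f


I_PUMP_A = 50e-6        # from the netlist comments (Iup = Idn = 50 uA)
C_FILTER_F = 10e-12     # assumption: 10 pF loop-filter capacitor
PULSE_S = 100e-12       # assumption: 100 ps phase-error pulse

dv_up = control_voltage_step(I_PUMP_A, PULSE_S, C_FILTER_F)
dv_down = control_voltage_step(I_PUMP_A, PULSE_S, C_FILTER_F)
net_when_matched = dv_up - dv_down   # equal UP and DN pulses cancel

print(f"{dv_up * 1e3:.2f} mV per pulse, net {net_when_matched} V when matched")
```

With matched currents, equal UP and DN pulses leave the control voltage unchanged, which is why current matching is listed as a design goal; any mismatch shows up as a static phase offset.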

Voltage Controlled Oscillator

VCO Schematic

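SystemVerilog Real-Number Behavioral Model:

// Ring oscillator VCO (event-driven real-number model, not Verilog-A)
`timescale 1ps/1fs
module vco_behavioral (
    input  real VCTRL,   // control voltage in volts
    output reg  CLK
);
    parameter real KVCO    = 500e6;   // 500 MHz/V
    parameter real FCENTER = 3.2e9;   // 3.2 GHz center at VCTRL = 0.6 V
    parameter real VJITTER = 1e-12;   // 1ps RMS jitter (not applied in this model)

    real freq, half_period_ps;

    // Recompute the half period (in ps) whenever the control voltage changes
    always @(VCTRL) begin
        freq = FCENTER + KVCO * (VCTRL - 0.6);
        half_period_ps = 0.5e12 / freq;
    end

    initial begin
        CLK = 0;
        // Start at the center frequency so the delay is never zero
        half_period_ps = 0.5e12 / FCENTER;
        forever #(half_period_ps) CLK = ~CLK;
    end
endmodule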

Type: 5-stage differential ring oscillator

Frequency Range: 2.8 - 3.6 GHz (tuning range ~800 MHz)

Phase Noise: -110 dBc/Hz @ 1 MHz offset
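The behavioral model's linear tuning law can be checked against the stated frequency range. This sketch reuses the model's parameters (500 MHz/V gain, 3.2 GHz at 0.6 V) and solves for the control voltage needed at each end of the range; the helper names are hypothetical.

```python
KVCO_HZ_PER_V = 500e6    # VCO gain from the behavioral model
FCENTER_HZ = 3.2e9       # center frequency from the behavioral model
VMID_V = 0.6             # control voltage at the center frequency


def vco_freq(vctrl_v):
    """Linear tuning law used by the behavioral model."""
    return FCENTER_HZ + KVCO_HZ_PER_V * (vctrl_v - VMID_V)


def vctrl_for(freq_hz):
    """Invert the tuning law: control voltage needed for a target frequency."""
    return VMID_V + (freq_hz - FCENTER_HZ) / KVCO_HZ_PER_V


v_low = vctrl_for(2.8e9)
v_high = vctrl_for(3.6e9)
span_v = v_high - v_low
print(f"{v_low:.2f} V to {v_high:.2f} V, span {span_v:.2f} V")
```

With these numbers the 2.8 to 3.6 GHz range needs a control voltage from about -0.2 V to 1.4 V, a 1.6 V span. A real design would therefore need a higher gain, a narrower range, or a different control-voltage window than the linear model implies.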

High-Speed TX Driver

TX Driver Schematic

SystemVerilog RTL + Analog:

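// Pre-emphasis TX driver
module tx_driver #(
    parameter NUM_TAPS = 3,
    parameter IMPEDANCE = 50  // Ohms
)(
    input wire clk,
    input wire [7:0] data_in,
    input wire [2:0] pre_emphasis,
    output wire txp, txn
);
    // 3-tap FIR for pre-emphasis
    reg [7:0] tap_delay [0:NUM_TAPS-1];
    // 20 signed bits cover the full range: 255 << 10 plus 255 << 3 at the
    // top, minus 255 << 7 at the bottom
    wire signed [19:0] fir_output;

    always @(posedge clk) begin
        tap_delay[0] <= data_in;
        tap_delay[1] <= tap_delay[0];
        tap_delay[2] <= tap_delay[1];
    end

    // Pre-emphasis coefficients (operands are widened to 20 bits before shifting)
    assign fir_output =
        (tap_delay[0] << (3 + pre_emphasis)) +
        (tap_delay[1] << 3) -
        (tap_delay[2] << pre_emphasis);

    // Differential output driver (analog stage, not modeled in RTL)
    // SPICE: ZO = 50 ohms, slew rate > 6V/ns
    // Digital placeholder (assumption): drive the pair from the FIR sign bit
    assign txp = ~fir_output[19];
    assign txn =  fir_output[19];
endmodule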

Output: CML differential driver, 400 mV swing

Impedance: 50-ohm On-die termination (ODT)

Slew Rate: > 6 V/ns for 6.4 Gb/s operation
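Two quick checks follow from the driver description: the numeric range of the FIR expression in the RTL, and how much of a bit period the output edge occupies. The sketch is an integer model of the RTL expression with 8-bit samples and a 3-bit `pre_emphasis` setting, plus arithmetic on the quoted swing, slew rate, and data rate.

```python
def fir_output(d0, d1, d2, pre_emphasis):
    """Integer model of the RTL expression (d0 is the newest 8-bit sample)."""
    return (d0 << (3 + pre_emphasis)) + (d1 << 3) - (d2 << pre_emphasis)


# Extremes for 8-bit samples and the largest 3-bit pre_emphasis value.
worst_high = fir_output(255, 255, 0, 7)
worst_low = fir_output(0, 0, 255, 7)
signed_bits = max(worst_high.bit_length(), (-worst_low - 1).bit_length()) + 1

# Edge time versus unit interval at the quoted swing, slew rate, and data rate.
UI_PS = 1e12 / 6.4e9               # one bit period at 6.4 Gb/s
EDGE_PS = 0.400 / 6.0 * 1000.0     # 400 mV swing at 6 V/ns, in ps
edge_fraction = EDGE_PS / UI_PS

print(worst_low, worst_high, signed_bits)
print(f"UI {UI_PS:.2f} ps, edge {EDGE_PS:.1f} ps, {edge_fraction:.0%} of UI")
```

The expression needs 20 signed bits to avoid overflow, and at the minimum slew rate a full-swing edge takes about 67 ps, a little under half of the 156.25 ps bit period.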

ZQ Impedance Calibration

ZQ Calibration

Verilog Calibration FSM:

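// ZQ impedance calibration state machine
module zq_calibration (
    input wire clk,
    input wire rst_n,        // active-low synchronous reset
    input wire start_cal,
    input wire comp_result,  // from comparator (muxed to the leg under calibration)
    output reg [5:0] code_p,  // PMOS code
    output reg [5:0] code_n,  // NMOS code
    output reg cal_done
);
    // Target impedance: 50 ohms
    // Binary search algorithm: start at mid-scale, halve the step each cycle
    localparam [3:0] IDLE = 4'd0, CAL_PMOS = 4'd1, CAL_NMOS = 4'd2, DONE = 4'd3;
    localparam [5:0] MID_CODE = 6'd32, FIRST_STEP = 6'd16;

    reg [3:0] state;
    reg [5:0] search_step;

    always @(posedge clk) begin
        if (!rst_n) begin
            state <= IDLE;
            code_p <= MID_CODE;
            code_n <= MID_CODE;
            search_step <= FIRST_STEP;
            cal_done <= 1'b0;
        end else begin
            case (state)
                IDLE: if (start_cal) begin
                    cal_done <= 1'b0;
                    code_p <= MID_CODE;
                    code_n <= MID_CODE;
                    search_step <= FIRST_STEP;
                    state <= CAL_PMOS;
                end
                CAL_PMOS: begin
                    if (search_step == 0) begin
                        search_step <= FIRST_STEP;  // restart the search for the NMOS leg
                        state <= CAL_NMOS;
                    end else begin
                        if (comp_result)
                            code_p <= code_p - search_step;
                        else
                            code_p <= code_p + search_step;
                        search_step <= search_step >> 1;
                    end
                end
                CAL_NMOS: begin
                    if (search_step == 0) begin
                        state <= DONE;
                    end else begin
                        if (comp_result)
                            code_n <= code_n - search_step;
                        else
                            code_n <= code_n + search_step;
                        search_step <= search_step >> 1;
                    end
                end
                DONE: begin
                    cal_done <= 1'b1;
                    state <= IDLE;
                end
                default: state <= IDLE;
            endcase
        end
    end
endmodule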

Method: Binary search with precision comparator

Accuracy: ±1.5% impedance matching

Calibration Time: 1 ms for full range, 200 μs for tracking
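The binary search can be traced in software with a stand-in for the comparator. The driver model below is hypothetical: it treats the 6-bit code as the number of identical 1650-ohm legs switched in parallel, so resistance falls as the code rises. Only the search loop mirrors the method named above.

```python
def zq_binary_search(leg_resistance, target_ohms=50.0, bits=6):
    """Start at mid-scale and move by a halving step, as in a SAR-style search."""
    code = 1 << (bits - 1)
    step = 1 << (bits - 2)
    while step > 0:
        if leg_resistance(code) < target_ohms:   # comparator says: driver too strong
            code -= step
        else:                                    # too weak (or equal): add legs
            code += step
        step >>= 1
    return code


def example_leg(code, unit_ohms=1650.0):
    """Hypothetical driver: 'code' identical unit legs in parallel."""
    return unit_ohms / code


code = zq_binary_search(example_leg)
print(code, example_leg(code))
```

For this stand-in the search visits codes 32, 48, 40, 36, 34 and settles on 33, where the modeled resistance is exactly 50 ohms. In general the result lands within one code step of the ideal value, so the achievable accuracy depends on the resistance change per code.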

3D NAND Flash Memory Cell

NAND Cell Structure

SPICE Model (Simplified):

* 3D NAND cell - floating gate transistor
.subckt nand_cell WL BL SRC
* Control gate (word line)
* Floating gate (stores charge)
* Tunnel oxide: ~7nm
* IPD (inter-poly dielectric): ~10nm

* Programming: Fowler-Nordheim tunneling
*   VWL = 20V, VBL = 0V, VSRC = 0V
*   Electrons tunnel to floating gate
*   Vth increases (0V -> 3V for programmed)

* Erasing: FN tunneling (block erase)
*   VWL = 0V, VBL = float, VSRC = 20V
*   Electrons tunnel from FG to substrate
*   Vth decreases (3V -> -1V for erased)

* Read operation:
*   VWL = 0V (or Vread levels for MLC)
*   VBL = 1V, VSRC = 0V
*   Cell conducts if erased (Vth < 0V)

Mcell BL WL SRC VSS nfet_fg 
+ W=50n L=50n
+ Vth0=0.5 Cox=5e-3 Qfg=0

.model nfet_fg nmos 
+ level=14 
+ tox=7e-9
.ends nand_cell

Type: Charge-trap or floating-gate transistor

3D NAND: Vertical channel, stacked 64-256 layers

Programming: ISPP (Incremental Step Pulse Programming)

Endurance: 3K-100K P/E cycles (depends on SLC/MLC/TLC)
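ISPP alternates program pulses of increasing amplitude with verify reads until the cell reaches its target threshold. The loop below is a minimal sketch with a deliberately crude, hypothetical cell model (threshold tracks the pulse voltage minus a fixed 16 V offset); the voltages and step size are illustrative, not device data.

```python
def ispp_program(target_vth, vth=-1.0, vpgm=16.0, step_v=0.5,
                 max_pulses=20, offset_v=16.0):
    """Pulse, verify, and step the program voltage until Vth reaches the target."""
    for pulse in range(1, max_pulses + 1):
        # Toy cell response (assumption): Vth follows the pulse amplitude
        # minus a fixed offset and never decreases during programming.
        vth = max(vth, vpgm - offset_v)
        if vth >= target_vth:          # program-verify read
            return pulse, vpgm, vth
        vpgm += step_v                 # incremental step for the next pulse
    raise RuntimeError("cell did not reach target within max_pulses")


pulses, final_vpgm, final_vth = ispp_program(target_vth=2.0)
print(pulses, final_vpgm, final_vth)
```

A smaller step gives tighter threshold distributions at the cost of more pulses, which is the trade-off that makes multi-level cells slower to program than SLC.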

BCH Error Correction Engine

ECC Architecture

Verilog BCH Encoder:

// BCH(255, 239) encoder - corrects 2 bits
module bch_encoder #(
    parameter N = 255,  // code length
    parameter K = 239,  // data length
    parameter T = 2     // error correction capability
)(
    input wire clk,
    input wire [K-1:0] data_in,
    output reg [N-1:0] codeword_out
);
    // Generator polynomial for BCH(255,239,2), octal 267543:
    // g(x) = x^16 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 + x^5 + x + 1
    localparam [N-K:0] GEN_POLY = 17'b10110111101100011;

    reg [N-1:0] rem;   // working remainder (blocking assignments, used as a temporary)
    integer i;

    always @(posedge clk) begin
        // Systematic encoding: parity = (data * x^(N-K)) mod g(x)
        rem = {data_in, {(N-K){1'b0}}};

        // Polynomial division, one message bit per loop iteration
        for (i = N-1; i >= N-K; i = i - 1) begin
            if (rem[i] == 1'b1)
                rem[i-:(N-K+1)] = rem[i-:(N-K+1)] ^ GEN_POLY;
        end

        codeword_out <= {data_in, rem[N-K-1:0]};
    end

    // Syndrome calculation and Berlekamp-Massey
    // algorithm for decoding (not shown)
endmodule

ECC Type: BCH or LDPC (Low-Density Parity Check)

Capability: Up to 60-100 bits per 1KB for modern NAND

Latency: 1-10 μs depending on error density
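Systematic BCH encoding is polynomial division over GF(2), which is easy to cross-check in software. The sketch below uses the degree-16 generator commonly tabulated for BCH(255,239) (octal 267543) and Python integers as bit vectors; the function names are hypothetical, and decoding is not covered.

```python
N, K = 255, 239
# g(x) = x^16 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 + x^5 + x + 1
GEN_POLY = 0b10110111101100011


def gf2_mod(value, poly):
    """Remainder of value(x) divided by poly(x), coefficients in GF(2)."""
    plen = poly.bit_length()
    while value.bit_length() >= plen:
        value ^= poly << (value.bit_length() - plen)   # cancel the leading term
    return value


def bch_encode(data):
    """Systematic encoding: codeword = data * x^(N-K) + parity."""
    if not 0 <= data < (1 << K):
        raise ValueError("data must fit in K bits")
    shifted = data << (N - K)
    return shifted | gf2_mod(shifted, GEN_POLY)


message = 0x1234_5678_9ABC_DEF0
codeword = bch_encode(message)
print(hex(codeword & 0xFFFF))   # the 16 parity bits
```

A valid codeword leaves a zero remainder when divided by g(x), and the message can be read back from the upper K bits; both properties make convenient checks for an RTL encoder.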

Physical Layout Considerations

🔲 Floor Planning

  • Separate analog (PLL, VCO) from digital (FSM, FIFO)
  • Guard rings around sensitive analog blocks
  • Power domains: core (0.9V), I/O (1.8V/1.2V)
  • Minimize routing distance for high-speed signals

⚡ Signal Integrity

  • Differential pair routing (100 ohm impedance)
  • Matched trace lengths (< 50 ps skew)
  • Via shielding for transitions between layers
  • Return path planning for high-speed signals

🔋 Power Integrity

  • On-die decoupling capacitors (MOM/MIM)
  • Power grid sizing for IR drop < 5%
  • Separate analog/digital grounds (star topology)
  • Current density: < 1 mA/μm for signal wires

🌡️ Thermal Management

  • Hot spot analysis (TX drivers, charge pumps)
  • Thermal vias to package substrate
  • Junction temperature: < 125°C
  • Power density: < 0.5 W/mm²

Design Tools & Verification

Analog Design:

  • Cadence Virtuoso - Schematic capture & simulation
  • HSPICE - Transistor-level SPICE simulation
  • Spectre - RF/mixed-signal simulation
  • Calibre - Layout vs. Schematic (LVS) & DRC

Digital Design:

  • Synopsys Design Compiler - RTL synthesis
  • VCS - Verilog simulation & verification
  • Verdi - Waveform debugging
  • Innovus - Place & Route

Signal Integrity:

  • HFSS - 3D electromagnetic simulation
  • ADS (Advanced Design System) - High-speed link simulation
  • Ansys SIwave - Power integrity analysis
  • HyperLynx - SerDes channel analysis

Verification:

  • UVM (Universal Verification Methodology)
  • SystemVerilog assertions & functional coverage
  • Formal verification tools (JasperGold)
  • FPGA prototyping (Xilinx/Intel)