SSD System Architecture Overview

This is the highest-level view of a NAND flash-based Solid State Drive (SSD) system. The FBI (Frequency Boosting Interface) chip bridges the speed gap between the SSD controller and multiple NAND flash memory dies, enabling higher capacity while maintaining high-speed performance.


🔌 Host Interface

PCIe/NVMe interface connecting the drive to the application processor (AP) or host system over high-speed serial lanes (8 GT/s per lane for PCIe 3.0, 16 GT/s per lane for PCIe 4.0).

🎛️ SSD Controller

Manages flash translation layer (FTL), wear leveling, garbage collection, and error correction. Controls data flow between host and NAND flash memory.

⚡ FBI Chip

Frequency Boosting Interface chip that multiplies the interface speed (typically 2:1 ratio) to overcome the bottleneck of multi-chip stacked NAND packages.

💾 NAND Flash Array

Non-volatile storage consisting of multiple NAND dies stacked in a multi-chip package (MCP). Data is stored as charge in floating-gate or charge-trap cells.

Key Concepts at This Level

Performance Bottleneck: As serial host interfaces (PCIe) achieve higher speeds, the internal NAND flash interface becomes the limiting factor
Capacity vs. Speed Trade-off: Stacking more NAND dies increases capacity but adds capacitive loading, degrading signal integrity
FBI Solution: The FBI chip enables frequency multiplication while managing the challenges of multi-drop bus topology

FBI Chip & NPHY Interface Protocol

The FBI (Frequency Boosting Interface) chip sits between the SSD controller and NAND packages, implementing frequency multiplication and signal conditioning. NPHY (NAND PHY) is the physical layer protocol that defines electrical characteristics and timing for NAND flash communication.

FBI Chip Architecture

The FBI chip performs critical functions:

2:1 Frequency Multiplication: Doubles the interface speed between controller and NAND
Fast-Lock PLL: Phase-locked loop with 16-cycle lock time and extended pull-in range
Equalization: Compensates for inter-symbol interference (ISI) and reflected noise
Multi-drop Support: Handles up to 4-drop configurations while maintaining signal integrity
Low Power Operation: Achieves 2.85 pJ/bit energy efficiency

NPHY Interface Protocol

NPHY defines the physical layer for NAND communication:

ONFI Standard: Open NAND Flash Interface (ONFI 4.x/5.x) specification
NV-DDR3: Non-Volatile DDR3 interface with speeds up to 800 MT/s
Toggle Mode: Alternative high-speed interface (Toggle 2.0)
Signal Types: Data (DQ), Address/Command (ALE/CLE), Control (CE#, RE#, WE#)
ZQ Calibration: Impedance calibration for optimal signal integrity

Technical Specifications

Parameter	FBI Side (Controller)	NAND Side (Memory)
Interface Speed	6.4 Gb/s/pin	3.2 Gb/s/pin
Frequency Ratio	2:1 multiplication	Base frequency
Energy Efficiency	2.85 pJ/bit	Varies by NAND
Process Node	12nm CMOS	Various (14nm-50nm)
Max Drops	4 NAND packages per channel
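
The figures in the table combine into a few derived numbers. The following is a minimal sketch, assuming an 8-bit DQ bus per channel (the table does not state the bus width) and assuming the 2.85 pJ/bit figure applies at the 6.4 Gb/s/pin rate. The function names are illustrative only.

```python
# Derived figures from the specification table.
# DQ_PINS = 8 is an assumption (a typical x8 NAND data bus), not a table value.
CONTROLLER_RATE_GBPS = 6.4   # Gb/s per pin on the controller side
NAND_RATE_GBPS = 3.2         # Gb/s per pin on the NAND side
ENERGY_PJ_PER_BIT = 2.85
DQ_PINS = 8


def channel_bandwidth_gbytes(rate_gbps_per_pin: float, pins: int = DQ_PINS) -> float:
    """Raw channel bandwidth in GB/s, ignoring protocol overhead."""
    return rate_gbps_per_pin * pins / 8.0


def io_power_mw(rate_gbps_per_pin: float, pj_per_bit: float, pins: int = DQ_PINS) -> float:
    """I/O power in mW: (pJ/bit) x (Gb/s) gives mW per pin."""
    return rate_gbps_per_pin * pj_per_bit * pins


ratio = CONTROLLER_RATE_GBPS / NAND_RATE_GBPS
print(f"frequency ratio: {ratio:.0f}:1")
print(f"controller-side bandwidth: {channel_bandwidth_gbytes(CONTROLLER_RATE_GBPS):.1f} GB/s")
print(f"NAND-side bandwidth: {channel_bandwidth_gbytes(NAND_RATE_GBPS):.1f} GB/s")
print(f"I/O power per pin: {io_power_mw(CONTROLLER_RATE_GBPS, ENERGY_PJ_PER_BIT, pins=1):.2f} mW")
```

Under these assumptions the energy figure works out to roughly 18 mW per pin at full rate.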

Controller & PHY Component Architecture

At this level, we examine the internal components of both the NAND controller and PHY layer. The controller manages high-level operations while the PHY handles physical signal transmission.

NAND Flash Controller

Digital Logic

Main Functional Blocks:

Command Engine: Processes and queues NAND commands
DMA Controller: Manages data transfers with 32/64-bit addressing
ECC Engine: Error correction with configurable BCH/LDPC codes
FTL (Firmware): Flash Translation Layer mapping
Buffer Manager: SRAM buffers for data staging
Interface Logic: AXI/AHB system bus interface
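
The FTL mapping named in the list above can be sketched as a page-level logical-to-physical table with out-of-place writes. This is an illustrative toy, not the firmware of any particular controller; the class name `SimpleFTL` and its fields are hypothetical, and garbage collection and wear leveling are left out.

```python
class SimpleFTL:
    """Page-level logical-to-physical map with out-of-place writes (toy model)."""

    def __init__(self, num_physical_pages: int):
        self.l2p = {}                                  # logical page -> physical page
        self.free = list(range(num_physical_pages))    # unwritten physical pages
        self.invalid = set()                           # stale pages awaiting garbage collection

    def write(self, lpn: int) -> int:
        """Map a logical page to the next free physical page."""
        if not self.free:
            raise RuntimeError("no free pages: garbage collection needed")
        ppn = self.free.pop(0)
        old = self.l2p.get(lpn)
        if old is not None:
            self.invalid.add(old)    # NAND pages cannot be overwritten in place
        self.l2p[lpn] = ppn
        return ppn

    def read(self, lpn: int) -> int:
        """Return the physical page currently holding a logical page."""
        return self.l2p[lpn]


ftl = SimpleFTL(num_physical_pages=4)
ftl.write(7)   # first write of logical page 7
ftl.write(7)   # rewrite lands on a new physical page; the old one becomes stale
print(ftl.read(7), ftl.invalid)
```

The stale-page set is what a garbage collector would later reclaim by erasing whole blocks.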

NAND PHY Layer

Mixed Signal

Main Functional Blocks:

PLL (Phase-Locked Loop): Clock generation and synchronization
DLL (Delay-Locked Loop): Precise timing alignment
TX Driver: Transmit buffers with impedance control
RX Receiver: Input buffers with equalization
ZQ Calibration: Automatic impedance matching
DFI Interface: Modified DFI 3.0 for controller connection

FBI Bridge Chip

Mixed Signal

Main Functional Blocks:

Frequency Multiplier: 2:1 ratio frequency conversion
Fast-Lock PLL: 16-cycle lock, extended pull-in range
TX/RX Equalizers: Pre-emphasis and de-emphasis
Data Path Logic: FIFO buffering and flow control
Jitter Filter: Input jitter reduction
Power Management: Dynamic voltage/frequency scaling

NAND Flash Die

Analog Memory

Main Functional Blocks:

Memory Array: 3D NAND cell matrix (SLC/MLC/TLC)
Page Buffer: Row buffer for read/write operations
Row Decoder: Word line selection logic
Column Decoder: Bit line selection logic
Charge Pumps: High-voltage generation for programming
I/O Interface: Data, command, address I/O pins

Data Flow Through The System

Host Command

PCIe/NVMe command from host system

→

Controller Processing

FTL translation, ECC encoding, buffering

→

FBI Frequency Boost

2x speed increase with signal conditioning

→

PHY Transmission

Physical signal to NAND via ONFI/Toggle

→

NAND Array

Data programmed/read from flash cells

Block-Level Digital Architecture

This level reveals the detailed block diagrams showing register transfer level (RTL) components, state machines, data paths, and control logic that implement the functionality described in the Controller & PHY Component Architecture section above.

Fast-Lock PLL Architecture

Components:

Phase Detector (PD): Compares input and feedback phases
Charge Pump (CP): Converts phase error to control voltage
Loop Filter: Low-pass filter for stability
VCO (Voltage Controlled Oscillator): Generates output clock
Divider: Feedback path with programmable ratio
Lock Detector: Monitors PLL lock status

Key Feature: 16-cycle lock time enables fast frequency switching during NAND command sequences.
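
To put the 16-cycle figure in time units, the sketch below converts it to nanoseconds. It assumes the cycles are counted in reference-clock periods, and the reference frequencies are hypothetical examples rather than values from this design.

```python
def lock_time_ns(ref_clock_hz: float, lock_cycles: int = 16) -> float:
    """Lock time in ns, assuming lock_cycles are reference-clock periods."""
    return lock_cycles / ref_clock_hz * 1e9


# Hypothetical reference clocks, chosen only to show the scaling.
for ref_hz in (800e6, 1.6e9, 3.2e9):
    print(f"{ref_hz / 1e9:.1f} GHz reference -> {lock_time_ns(ref_hz):.1f} ns to lock")
```

A lock time in the tens of nanoseconds is short compared with typical NAND command latencies, which is what makes relocking between command phases practical.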

Command Engine Architecture

Components:

Command FIFO: Queues incoming NAND commands
State Machine: Controls command execution sequence
Timer/Counter: Manages NAND timing requirements
Status Register: Tracks command completion
Arbiter: Manages multi-channel access
Auto-Command Generator: Generates internal commands

Operation: Supports pipelined read-ahead and write commands for maximum throughput.

DMA Controller Architecture

Components:

AXI Master: System bus interface (32/64-bit)
Descriptor Engine: Processes transfer descriptors
Channel Arbitration: Multi-channel priority control
FIFO Buffers: Data staging between clock domains
Address Generator: Incremental/scatter-gather addressing
Interrupt Controller: Transfer completion signaling

Modes: Supports both simple transfers and chained descriptor modes for complex scatter-gather operations.
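
The chained descriptor mode can be illustrated with a small software model. This is a sketch only: the `Descriptor` fields and the `run_chain` helper are hypothetical names, and a `bytearray` stands in for system memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    src: int                                # source offset in the simulated memory
    dst: int                                # destination offset
    length: int                             # bytes to move
    next: Optional["Descriptor"] = None     # None terminates the chain


def run_chain(memory: bytearray, head: Optional[Descriptor]) -> int:
    """Walk the descriptor chain, copying each segment; return total bytes moved."""
    total = 0
    desc = head
    while desc is not None:
        segment = memory[desc.src:desc.src + desc.length]
        memory[desc.dst:desc.dst + desc.length] = segment
        total += desc.length
        desc = desc.next
    return total


mem = bytearray(b"ABCDEFGH" + bytes(8))
# Gather two non-contiguous source fragments into one contiguous destination.
chain = Descriptor(src=0, dst=8, length=2,
                   next=Descriptor(src=4, dst=10, length=3))
moved = run_chain(mem, chain)
print(moved, bytes(mem[8:13]))
```

A simple transfer is the degenerate case of a chain with a single descriptor.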

TX/RX Equalizer Architecture

Components:

FIR Filter: Finite impulse response for pre-emphasis
Adaptive Engine: Coefficient adjustment based on channel
DFE (Decision Feedback Equalizer): Post-cursor ISI cancellation
CTLE (Continuous Time Linear Equalizer): Frequency compensation
CDR (Clock Data Recovery): Sampling clock extraction
Pattern Detector: Training sequence recognition

Purpose: Compensates for signal degradation in 4-drop multi-chip configurations and long PCB traces.
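
The effect of the FIR pre-emphasis stage can be seen on a short bit pattern. The sketch below applies a 3-tap filter (pre-cursor, main, post-cursor) to NRZ symbols. The tap weights are hypothetical values chosen so their magnitudes sum to 1; they are not coefficients from this design.

```python
def tx_fir(bits, taps=(-0.1, 0.7, -0.2)):
    """Apply a 3-tap FIR (pre-cursor, main, post-cursor) to an NRZ bit stream.

    Bits map to +1/-1 symbols. At the stream edges the missing neighbour
    is taken to equal the current symbol.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    c_pre, c_main, c_post = taps
    out = []
    for n, cur in enumerate(symbols):
        nxt = symbols[n + 1] if n + 1 < len(symbols) else cur
        prev = symbols[n - 1] if n > 0 else cur
        out.append(c_pre * nxt + c_main * cur + c_post * prev)
    return out


print([round(v, 2) for v in tx_fir([0, 1, 1, 1, 0])])
```

Symbols next to a transition are driven at higher amplitude than symbols inside a run of identical bits, which boosts the high-frequency content that the channel attenuates most.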

ONFI 4.x NV-DDR3 Interface Timing

Critical Timing Parameters:

Parameter	Symbol	Min	Typ	Max	Unit
Clock Cycle Time	tCK	1.25	-	-	ns
Data Setup Time	tDS	200	-	-	ps
Data Hold Time	tDH	200	-	-	ps
Output Access Time	tAC	-	-	500	ps
ZQ Calibration Time	tZQCAL	-	1	-	ms
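
The table values imply a data rate and a timing margin. The arithmetic below is one reading of the table, assuming data is transferred on both clock edges (double data rate), so the unit interval is half of tCK. It is not an ONFI-defined calculation.

```python
# Values taken from the timing table above.
T_CK_NS = 1.25    # minimum clock cycle time
T_DS_PS = 200.0   # data setup time
T_DH_PS = 200.0   # data hold time

clock_mhz = 1e3 / T_CK_NS                     # clock frequency in MHz
transfer_rate_mts = 2 * clock_mhz             # assumes two transfers per clock
unit_interval_ps = T_CK_NS * 1e3 / 2          # time per transferred bit
margin_ps = unit_interval_ps - (T_DS_PS + T_DH_PS)

print(f"clock: {clock_mhz:.0f} MHz, rate: {transfer_rate_mts:.0f} MT/s")
print(f"unit interval: {unit_interval_ps:.0f} ps, margin after setup+hold: {margin_ps:.0f} ps")
```

Under this reading, setup and hold together consume 400 ps of a 625 ps unit interval, leaving the rest for jitter and skew.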

Circuit-Level Analog & Digital Design

At the deepest level, we examine the transistor-level circuits and analog design that implement the blocks from the Block-Level Digital Architecture section above. This includes SPICE models, transistor schematics, and layout considerations.

Charge Pump Circuit (PLL)

SPICE Netlist:

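* Fast-Lock Charge Pump (simplified: switches only, bias current sources omitted)
.subckt charge_pump UP DN VDD VSS VOUT
* Inverter generating UP_B, the active-low gate drive for the PMOS switch
M2 UP_B UP VDD VDD pmos_3v3 W=10u L=0.18u
M4 UP_B UP VSS VSS nmos_3v3 W=5u L=0.18u
* Up switch (PMOS): sources current into VOUT while UP is high
M1 VOUT UP_B VDD VDD pmos_3v3 W=10u L=0.18u
* Down switch (NMOS): sinks current from VOUT while DN is high
M3 VOUT DN VSS VSS nmos_3v3 W=5u L=0.18u
* Design targets: Iup = Idn = 50uA (matched), switch resistance < 100 ohms
.ends charge_pump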

Function: Converts phase detector output to control voltage for VCO

Design Goals: Matched up/down currents, low switch resistance, fast switching

Process: 12nm CMOS with thick-oxide devices for analog performance

Voltage Controlled Oscillator

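SystemVerilog Real-Number Behavioral Model:

// Ring oscillator VCO
`timescale 1ns/1fs
module vco_behavioral (
    input real VCTRL,
    output reg CLK
);
    parameter real KVCO = 500e6;     // VCO gain: 500 MHz/V
    parameter real FCENTER = 3.2e9;  // 3.2 GHz center frequency
    parameter real VMID = 0.6;       // control voltage at the center frequency
    parameter real VJITTER = 1e-12;  // 1ps RMS jitter (not modeled here)
    
    real freq;
    real half_period_ns;
    
    // Recompute the half period whenever the control voltage changes
    always @(VCTRL) begin
        freq = FCENTER + KVCO * (VCTRL - VMID);
        half_period_ns = 0.5e9 / freq;  // seconds to ns, then halve
    end
    
    initial begin
        CLK = 0;
        // Seed the delay so the loop never starts with a zero period
        half_period_ns = 0.5e9 / (FCENTER + KVCO * (VCTRL - VMID));
        forever #(half_period_ns) CLK = ~CLK;
    end
endmodule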

Type: 5-stage differential ring oscillator

Frequency Range: 2.8 - 3.6 GHz (tuning range ~800 MHz)

Phase Noise: -110 dBc/Hz @ 1 MHz offset
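
The linear tuning law used in the behavioral model can be checked numerically. The sketch below reuses the model's constants (500 MHz/V gain, 3.2 GHz center, 0.6 V midpoint); the function names are illustrative, and a real ring oscillator's tuning curve is not perfectly linear.

```python
KVCO_HZ_PER_V = 500e6   # VCO gain from the behavioral model
F_CENTER_HZ = 3.2e9     # center frequency
V_MID = 0.6             # control voltage at the center frequency


def vco_frequency(vctrl: float) -> float:
    """Output frequency in Hz for a given control voltage (linear model)."""
    return F_CENTER_HZ + KVCO_HZ_PER_V * (vctrl - V_MID)


def control_voltage_for(freq_hz: float) -> float:
    """Control voltage that the linear model needs to reach freq_hz."""
    return V_MID + (freq_hz - F_CENTER_HZ) / KVCO_HZ_PER_V


for f in (2.8e9, 3.2e9, 3.6e9):
    print(f"{f / 1e9:.1f} GHz needs VCTRL = {control_voltage_for(f):+.2f} V")
```

With this gain, the edges of the stated 2.8 to 3.6 GHz range fall outside a 0 to 1.2 V control window, which suggests the physical VCO has a higher or nonlinear gain than the simplified model's 500 MHz/V.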

High-Speed TX Driver

SystemVerilog RTL + Analog:

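// Pre-emphasis TX driver
module tx_driver #(
    parameter NUM_TAPS = 3,   // the tap chain and FIR sum below assume 3 taps
    parameter IMPEDANCE = 50  // Ohms
)(
    input wire clk,
    input wire [7:0] data_in,
    input wire [2:0] pre_emphasis,
    output wire txp, txn
);
    // 3-tap FIR for pre-emphasis
    reg [7:0] tap_delay [0:NUM_TAPS-1];
    // Signed and wide enough for an 8-bit tap shifted left by up to
    // 10 bits, plus the add/subtract of the other two taps
    wire signed [19:0] fir_output;
    
    always @(posedge clk) begin
        tap_delay[0] <= data_in;
        tap_delay[1] <= tap_delay[0];
        tap_delay[2] <= tap_delay[1];
    end
    
    // Pre-emphasis coefficients (powers of two, set by pre_emphasis)
    assign fir_output = 
        ({12'b0, tap_delay[0]} << (3 + pre_emphasis)) +
        ({12'b0, tap_delay[1]} << 3) - 
        ({12'b0, tap_delay[2]} << pre_emphasis);
    
    // Differential output driver: txp/txn are driven by the analog
    // output stage (not shown), which converts fir_output to the line swing
    // SPICE: ZO = 50 ohms, slew rate > 6V/ns
endmodule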

Output: CML differential driver, 400 mV swing

Impedance: 50-ohm On-die termination (ODT)

Slew Rate: > 6 V/ns for 6.4 Gb/s operation

ZQ Impedance Calibration

Verilog Calibration FSM:

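// ZQ impedance calibration state machine
module zq_calibration (
    input wire clk,
    input wire rst_n,         // active-low synchronous reset
    input wire start_cal,
    input wire comp_result,   // from comparator: 1 = decrease code
    output reg [5:0] code_p,  // PMOS code
    output reg [5:0] code_n,  // NMOS code
    output reg cal_done
);
    // Target impedance: 50 ohms
    // Binary search algorithm: start mid-scale, halve the step each cycle.
    // Assumes the comparator settles within one clock of a code change.
    localparam [1:0] IDLE = 2'd0, CAL_PMOS = 2'd1, CAL_NMOS = 2'd2, DONE = 2'd3;
    localparam [5:0] MID_CODE = 6'd32, FIRST_STEP = 6'd16;
    reg [1:0] state;
    reg [5:0] search_step;
    
    always @(posedge clk) begin
        if (!rst_n) begin
            state <= IDLE;
            cal_done <= 1'b0;
            code_p <= MID_CODE;
            code_n <= MID_CODE;
            search_step <= FIRST_STEP;
        end else case (state)
            IDLE: if (start_cal) begin
                cal_done <= 1'b0;
                code_p <= MID_CODE;
                code_n <= MID_CODE;
                search_step <= FIRST_STEP;
                state <= CAL_PMOS;
            end
            CAL_PMOS: begin
                if (search_step == 0) begin
                    search_step <= FIRST_STEP;  // restart the search for NMOS
                    state <= CAL_NMOS;
                end else begin
                    code_p <= comp_result ? code_p - search_step
                                          : code_p + search_step;
                    search_step <= search_step >> 1;
                end
            end
            CAL_NMOS: begin
                if (search_step == 0)
                    state <= DONE;
                else begin
                    code_n <= comp_result ? code_n - search_step
                                          : code_n + search_step;
                    search_step <= search_step >> 1;
                end
            end
            DONE: begin
                cal_done <= 1'b1;
                state <= IDLE;
            end
            default: state <= IDLE;
        endcase
    end
endmodule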

Method: Binary search with precision comparator

Accuracy: ±1.5% impedance matching

Calibration Time: 1 ms for full range, 200 μs for tracking
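
The add/subtract-step binary search used by the calibration FSM can be modeled in a few lines. The driver model here is a hypothetical one (a number of identical legs in parallel equal to the code, so impedance is `r_unit_ohms / code`), and the 1600-ohm leg value is an assumption chosen so that code 32 gives exactly 50 ohms.

```python
def calibrate(target_ohms: float = 50.0, r_unit_ohms: float = 1600.0, bits: int = 6) -> int:
    """Binary-search an impedance code, mirroring the FSM's step-halving search.

    Hypothetical driver model: impedance = r_unit_ohms / code.
    """
    code = 1 << (bits - 1)      # start at mid-scale
    step = code >> 1
    while step > 0:
        impedance = r_unit_ohms / code
        if impedance > target_ohms:
            code += step        # too resistive: enable more legs
        else:
            code -= step        # at or below target: disable legs
        step >>= 1
    return code


code = calibrate()
print(code, round(1600.0 / code, 2))
```

This style of search converges to within one code step of the ideal value in five comparisons for a 6-bit code. The residual error depends on the code resolution, so a ±1.5% target implies finer steps near the operating point than this toy model has.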

3D NAND Flash Memory Cell

SPICE Model (Simplified):

* 3D NAND cell - floating gate transistor
.subckt nand_cell WL BL SRC
* Control gate (word line)
* Floating gate (stores charge)
* Tunnel oxide: ~7nm
* IPD (inter-poly dielectric): ~10nm

* Programming: Fowler-Nordheim tunneling
*   VWL = 20V, VBL = 0V, VSRC = 0V
*   Electrons tunnel to floating gate
*   Vth increases (-1V -> 3V for programmed)

* Erasing: FN tunneling (block erase)
*   VWL = 0V, VBL = float, VSRC = 20V
*   Electrons tunnel from FG to substrate
*   Vth decreases (3V -> -1V for erased)

* Read operation:
*   VWL = 0V (or Vread levels for MLC)
*   VBL = 1V, VSRC = 0V
*   Cell conducts if erased (Vth < 0V)

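Mcell BL WL SRC VSS nfet_fg W=50n L=50n

* Placeholder model: a standard MOSFET model has no floating-gate charge
* parameter, so the stored charge is represented only through vth0
* (shift vth0 toward -1V for erased or 3V for programmed, per the notes above)
.model nfet_fg nmos 
+ level=14 
+ tox=7e-9
+ vth0=0.5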
.ends nand_cell

Type: Charge-trap or floating-gate transistor

3D NAND: Vertical channel, stacked 64-256 layers

Programming: ISPP (Incremental Step Pulse Programming)

Endurance: 3K-100K P/E cycles (depends on SLC/MLC/TLC)
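
ISPP alternates program pulses of increasing voltage with verify reads until the cell reaches its target threshold. The loop below is a sketch under a deliberately simple, hypothetical cell model in which threshold voltage tracks the pulse voltage minus a fixed offset. The voltages and the 16 V offset are illustrative assumptions, not device data.

```python
def ispp_program(target_vth: float, vpgm_start: float = 16.0, vpgm_step: float = 0.5,
                 max_pulses: int = 32, vth_erased: float = -1.0,
                 offset_v: float = 16.0):
    """Incremental Step Pulse Programming with a verify after every pulse.

    Toy cell model (assumption): after a pulse at vpgm, Vth is at least
    vpgm - offset_v. Returns (pulses_used, final_vth).
    """
    vth = vth_erased
    vpgm = vpgm_start
    for pulse in range(1, max_pulses + 1):
        vth = max(vth, vpgm - offset_v)   # apply one program pulse
        if vth >= target_vth:             # program-verify read
            return pulse, vth
        vpgm += vpgm_step                 # step the pulse voltage up
    raise RuntimeError("cell failed to verify within max_pulses")


pulses, vth = ispp_program(target_vth=3.0)
print(pulses, vth)
```

Smaller voltage steps place the final threshold more tightly but need more pulses, which is the trade-off between programming speed and the narrow distributions that MLC and TLC require.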

BCH Error Correction Engine

Verilog BCH Encoder:

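// BCH(255, 239) encoder - corrects 2 bits
module bch_encoder #(
    parameter N = 255,  // code length
    parameter K = 239,  // data length
    parameter T = 2     // error correction capability
)(
    input wire clk,
    input wire [K-1:0] data_in,
    output reg [N-1:0] codeword_out
);
    // Generator polynomial for BCH(255,239), t = 2:
    // g(x) = x^16 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 + x^5 + x + 1
    // (product of the minimal polynomials of alpha and alpha^3 over GF(2^8),
    //  with primitive polynomial x^8 + x^4 + x^3 + x^2 + 1)
    localparam [16:0] GEN_POLY = 17'b10110111101100011;
    
    reg [N-1:0] remainder;  // working register for the polynomial division
    integer i;
    
    // Combinational long division of data_in * x^(N-K) by g(x).
    // Blocking assignments so each step sees the previous step's result.
    always @* begin
        remainder = {data_in, {(N-K){1'b0}}};
        for (i = N-1; i >= N-K; i = i - 1) begin
            if (remainder[i] == 1'b1)
                remainder[i-:17] = remainder[i-:17] ^ GEN_POLY;
        end
    end
    
    // Systematic codeword: message bits followed by the 16 parity bits
    always @(posedge clk)
        codeword_out <= {data_in, remainder[N-K-1:0]};
    
    // Syndrome calculation and Berlekamp-Massey 
    // algorithm for decoding (not shown)
endmodule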

ECC Type: BCH or LDPC (Low-Density Parity Check)

Capability: Up to 60-100 bits per 1KB for modern NAND

Latency: 1-10 μs depending on error density
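
Systematic BCH(255, 239) encoding can be cross-checked in software. The sketch below derives the degree-16 generator as the product of the minimal polynomials of α and α³ over GF(2^8), using the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1, then appends the division remainder as parity. Polynomials are stored as Python integers with bit i holding the coefficient of x^i. The helper names are illustrative.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(a: int, m: int) -> int:
    """Remainder of GF(2) polynomial a divided by m."""
    m_len = m.bit_length()
    while a.bit_length() >= m_len:
        a ^= m << (a.bit_length() - m_len)
    return a


M1 = 0b100011101   # x^8 + x^4 + x^3 + x^2 + 1 (minimal polynomial of alpha)
M3 = 0b101110111   # x^8 + x^6 + x^5 + x^4 + x^2 + x + 1 (minimal polynomial of alpha^3)
GEN_POLY = gf2_mul(M1, M3)


def bch_encode(message: int) -> int:
    """Return the systematic codeword: message bits followed by 16 parity bits."""
    shifted = message << 16
    return shifted | gf2_mod(shifted, GEN_POLY)


codeword = bch_encode(0x1234ABCD)
print(bin(GEN_POLY), hex(codeword))
```

A valid codeword leaves a zero remainder when divided by the generator, which is the same property the decoder's syndrome calculation relies on.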

Physical Layout Considerations

🔲 Floor Planning

Separate analog (PLL, VCO) from digital (FSM, FIFO)
Guard rings around sensitive analog blocks
Power domains: core (0.9V), I/O (1.8V/1.2V)
Minimize routing distance for high-speed signals

⚡ Signal Integrity

Differential pair routing (100 ohm impedance)
Matched trace lengths (< 50 ps skew)
Via shielding for transitions between layers
Return path planning for high-speed signals

🔋 Power Integrity

On-die decoupling capacitors (MOM/MIM)
Power grid sizing for IR drop < 5%
Separate analog/digital grounds (star topology)
Current density: < 1 mA/μm for signal wires

🌡️ Thermal Management

Hot spot analysis (TX drivers, charge pumps)
Thermal vias to package substrate
Junction temperature: < 125°C
Power density: < 0.5 W/mm²

Design Tools & Verification

Analog Design:

Cadence Virtuoso - Schematic capture & simulation
HSPICE - Transistor-level SPICE simulation
Spectre - RF/mixed-signal simulation
Calibre - Layout vs. Schematic (LVS) & DRC

Digital Design:

Synopsys Design Compiler - RTL synthesis
VCS - Verilog simulation & verification
Verdi - Waveform debugging
Innovus - Place & Route

Signal Integrity:

HFSS - 3D electromagnetic simulation
ADS (Advanced Design System) - High-speed link simulation
Ansys SIwave - Power integrity analysis
HyperLynx - SerDes channel analysis

Verification:

UVM (Universal Verification Methodology)
SystemVerilog assertions & functional coverage
Formal verification tools (JasperGold)
FPGA prototyping (Xilinx/Intel)