SSD System Architecture Overview
This is the highest-level view of a NAND flash-based Solid State Drive (SSD) system.
The FBI (Frequency Boosting Interface) chip bridges the speed gap between the SSD controller
and multiple NAND flash memory dies, enabling higher capacity while maintaining high-speed performance.
🔌 Host Interface
PCIe/NVMe interface connecting to the application processor (AP) or host system.
Supports high-speed serial communication (8 to 32 GT/s per lane for PCIe Gen3 through Gen5).
🎛️ SSD Controller
Manages flash translation layer (FTL), wear leveling, garbage collection, and error correction.
Controls data flow between host and NAND flash memory.
⚡ FBI Chip
Frequency Boosting Interface chip that multiplies the interface speed (typically 2:1 ratio)
to overcome the bottleneck of multi-chip stacked NAND packages.
💾 NAND Flash Array
Non-volatile memory storage consisting of multiple NAND dies stacked in a multi-chip package (MCP).
Stores data as trapped charge in floating-gate or charge-trap cells.
Key Concepts at This Level
- Performance Bottleneck: As serial host interfaces (PCIe) achieve higher speeds,
the internal NAND flash interface becomes the limiting factor
- Capacity vs. Speed Trade-off: Stacking more NAND dies increases capacity but
adds capacitive loading, degrading signal integrity
- FBI Solution: The FBI chip enables frequency multiplication while managing
the challenges of multi-drop bus topology
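The bottleneck described above can be made concrete with a little arithmetic. The sketch below compares an assumed host link (PCIe Gen4 x4) with an assumed NAND channel (8-bit bus at 800 MT/s); both operating points are illustrative choices, not figures from a specific product.

```python
import math

def pcie_bandwidth_gbytes(gt_per_s: float, lanes: int) -> float:
    """Usable PCIe bandwidth in GB/s after 128b/130b line coding (Gen3 to Gen5)."""
    return gt_per_s * lanes * (128 / 130) / 8

def nand_channel_gbytes(mt_per_s: float, bus_bits: int = 8) -> float:
    """Peak bandwidth of one NAND channel in GB/s (each transfer moves bus_bits bits)."""
    return mt_per_s * 1e6 * bus_bits / 8 / 1e9

host = pcie_bandwidth_gbytes(16.0, 4)   # assumption: PCIe Gen4 x4 host link
channel = nand_channel_gbytes(800.0)    # assumption: 8-bit NAND bus at 800 MT/s
channels_needed = math.ceil(host / channel)

print(f"host {host:.2f} GB/s, one channel {channel:.2f} GB/s, "
      f"channels needed to keep up: {channels_needed}")
```

Under these assumptions a single channel delivers about a tenth of the host bandwidth, which is why raising the per-pin rate of the NAND interface matters as much as adding dies.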
FBI Chip & NPHY Interface Protocol
The FBI (Frequency Boosting Interface) chip sits between the SSD controller and NAND packages,
implementing frequency multiplication and signal conditioning. NPHY (NAND PHY) is the physical
layer protocol that defines electrical characteristics and timing for NAND flash communication.
FBI Chip Architecture
The FBI chip performs critical functions:
- 2:1 Frequency Multiplication: Runs the controller-side link at twice the per-pin rate of the NAND-side bus (6.4 Gb/s/pin versus 3.2 Gb/s/pin)
- Fast-Lock PLL: Phase-locked loop with 16-cycle lock time and extended pull-in range
- Equalization: Compensates for inter-symbol interference (ISI) and reflected noise
- Multi-drop Support: Handles up to 4-drop configurations while maintaining signal integrity
- Low Power Operation: Achieves 2.85 pJ/bit energy efficiency
NPHY Interface Protocol
NPHY defines the physical layer for NAND communication:
- ONFI Standard: Open NAND Flash Interface (ONFI 4.x/5.x) specification
- NV-DDR3: Non-Volatile DDR3 interface, up to 800 MT/s in ONFI 4.0 and extended to 1600 MT/s in ONFI 4.2
- Toggle Mode: Alternative high-speed interface (Toggle 2.0)
- Signal Types: Data and strobe (DQ, DQS), Address/Command latch enables (ALE/CLE), Control (CE#, RE#, WE#)
- ZQ Calibration: Impedance calibration for optimal signal integrity
Technical Specifications
| Parameter | FBI Side (Controller) | NAND Side (Memory) |
|---|---|---|
| Interface Speed | 6.4 Gb/s/pin | 3.2 Gb/s/pin |
| Frequency Ratio | 2:1 multiplication | Base frequency |
| Energy Efficiency | 2.85 pJ/bit | Varies by NAND |
| Process Node | 12nm CMOS | Various (14nm-50nm) |
| Max Drops | 4 NAND packages per channel | |
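As a rough worked example, the energy-efficiency figure converts into interface power once a pin count is chosen. The sketch below assumes an 8-bit data bus toggling continuously at the full controller-side rate; the pin count and the 100% utilization are assumptions for illustration.

```python
ENERGY_PER_BIT_J = 2.85e-12   # 2.85 pJ/bit, from the table
PIN_RATE_BPS = 6.4e9          # controller-side rate per pin, from the table
DQ_PINS = 8                   # assumption: one 8-bit data bus

def interface_power_mw(energy_per_bit_j: float, bit_rate_bps: float, pins: int) -> float:
    """Power in mW when every pin transfers data continuously."""
    return energy_per_bit_j * bit_rate_bps * pins * 1e3

power_mw = interface_power_mw(ENERGY_PER_BIT_J, PIN_RATE_BPS, DQ_PINS)
nand_side_rate_bps = PIN_RATE_BPS / 2   # 2:1 ratio gives the NAND-side pin rate

print(f"{power_mw:.2f} mW at full utilization, NAND side {nand_side_rate_bps / 1e9:.1f} Gb/s/pin")
```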
Controller & PHY Component Architecture
At this level, we examine the internal components of both the NAND controller and PHY layer.
The controller manages high-level operations while the PHY handles physical signal transmission.
NAND Controller - Main Functional Blocks:
- Command Engine: Processes and queues NAND commands
- DMA Controller: Manages data transfers with 32/64-bit addressing
- ECC Engine: Error correction with configurable BCH/LDPC codes
- FTL (Firmware): Flash Translation Layer mapping
- Buffer Manager: SRAM buffers for data staging
- Interface Logic: AXI/AHB system bus interface
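To illustrate what the FTL block does, here is a minimal sketch of a page-level mapping table with out-of-place updates. The class and method names are hypothetical, and garbage collection and wear leveling are deliberately left out.

```python
class PageMapFTL:
    """Toy page-level FTL: logical page -> physical page, out-of-place updates."""

    def __init__(self, num_physical_pages: int):
        self.l2p = {}                            # logical page number -> physical page number
        self.valid = [False] * num_physical_pages
        self.next_free = 0                       # append-only write pointer (no GC modeled)

    def write(self, lpn: int) -> int:
        if self.next_free >= len(self.valid):
            raise RuntimeError("no free pages; garbage collection is not modeled")
        old = self.l2p.get(lpn)
        if old is not None:
            self.valid[old] = False              # NAND cannot overwrite in place; old copy goes stale
        ppn = self.next_free
        self.next_free += 1
        self.l2p[lpn] = ppn
        self.valid[ppn] = True
        return ppn

    def read(self, lpn: int) -> int:
        return self.l2p[lpn]


ftl = PageMapFTL(num_physical_pages=8)
ftl.write(5)
ftl.write(6)
ftl.write(5)          # rewrite of logical page 5 lands on a new physical page
print(ftl.read(5), ftl.read(6), ftl.valid)
```

The stale pages left behind by rewrites are what garbage collection later reclaims.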
NAND PHY - Main Functional Blocks:
- PLL (Phase-Locked Loop): Clock generation and synchronization
- DLL (Delay-Locked Loop): Precise timing alignment
- TX Driver: Transmit buffers with impedance control
- RX Receiver: Input buffers with equalization
- ZQ Calibration: Automatic impedance matching
- DFI: Modified DFI 3.0 interface for the controller connection
FBI Chip - Main Functional Blocks:
- Frequency Multiplier: 2:1 ratio frequency conversion
- Fast-Lock PLL: 16-cycle lock, extended pull-in range
- TX/RX Equalizers: Pre-emphasis and de-emphasis
- Data Path Logic: FIFO buffering and flow control
- Jitter Filter: Input jitter reduction
- Power Management: Dynamic voltage/frequency scaling
NAND Flash Die - Main Functional Blocks:
- Memory Array: 3D NAND cell matrix (SLC/MLC/TLC)
- Page Buffer: Row buffer for read/write operations
- Row Decoder: Word line selection logic
- Column Decoder: Bit line selection logic
- Charge Pumps: High-voltage generation for programming
- I/O Interface: Data, command, address I/O pins
Data Flow Through The System
1. Host Command: PCIe/NVMe command from host system
2. Controller Processing: FTL translation, ECC encoding, buffering
3. FBI Frequency Boost: 2x speed increase with signal conditioning
4. PHY Transmission: Physical signal to NAND via ONFI/Toggle
5. NAND Array: Data programmed/read from flash cells
Block-Level Digital Architecture
This level reveals the detailed block diagrams showing register transfer level (RTL) components,
state machines, data paths, and control logic that implement the functionality described at Level 3.
Fast-Lock PLL Architecture
Components:
- Phase Detector (PD): Compares input and feedback phases
- Charge Pump (CP): Converts phase error to control voltage
- Loop Filter: Low-pass filter for stability
- VCO (Voltage Controlled Oscillator): Generates output clock
- Divider: Feedback path with programmable ratio
- Lock Detector: Monitors PLL lock status
Key Feature: 16-cycle lock time enables fast frequency switching
during NAND command sequences.
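To give a feel for how loop gain trades against lock time, the sketch below uses a first-order phase-domain model in which each reference cycle removes a fixed fraction of the remaining phase error. This is a strong simplification and not the FBI chip's actual loop; the gain, starting error, and tolerance values are assumptions.

```python
def cycles_to_lock(initial_error_ui: float, loop_gain: float,
                   tolerance_ui: float, max_cycles: int = 1000) -> int:
    """Reference cycles until |phase error| falls below tolerance_ui.

    Model: error[n+1] = error[n] * (1 - loop_gain), a first-order loop.
    """
    if not 0.0 < loop_gain < 1.0:
        raise ValueError("loop_gain must be between 0 and 1 for this model")
    error = abs(initial_error_ui)
    for cycle in range(max_cycles + 1):
        if error < tolerance_ui:
            return cycle
        error *= (1.0 - loop_gain)
    raise RuntimeError("loop did not lock within max_cycles")


fast = cycles_to_lock(initial_error_ui=0.5, loop_gain=0.5, tolerance_ui=0.001)
slow = cycles_to_lock(initial_error_ui=0.5, loop_gain=0.25, tolerance_ui=0.001)
print(f"gain 0.5: {fast} cycles, gain 0.25: {slow} cycles")
```

In this model only the higher gain fits inside a 16-cycle budget, which hints at why a fast-lock design needs an aggressive loop during acquisition.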
Command Engine Architecture
Components:
- Command FIFO: Queues incoming NAND commands
- State Machine: Controls command execution sequence
- Timer/Counter: Manages NAND timing requirements
- Status Register: Tracks command completion
- Arbiter: Manages multi-channel access
- Auto-Command Generator: Generates internal commands
Operation: Supports pipelined read-ahead and write commands
for maximum throughput.
DMA Controller Architecture
Components:
- AXI Master: System bus interface (32/64-bit)
- Descriptor Engine: Processes transfer descriptors
- Channel Arbitration: Multi-channel priority control
- FIFO Buffers: Data staging between clock domains
- Address Generator: Incremental/scatter-gather addressing
- Interrupt Controller: Transfer completion signaling
Modes: Supports both simple transfers and chained descriptor modes
for complex scatter-gather operations.
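The chained descriptor mode can be sketched as a linked list that the engine walks until it reaches a null link. The `Descriptor` fields and the `run_chain` helper below are hypothetical names chosen for illustration; real descriptor layouts are hardware specific.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    src: int                                  # source offset in memory
    dst: int                                  # destination offset in memory
    length: int                               # bytes to move
    next: Optional["Descriptor"] = None       # link to the next descriptor, None ends the chain


def run_chain(memory: bytearray, head: Optional[Descriptor]) -> int:
    """Walk a descriptor chain, copying each segment; return total bytes moved."""
    moved = 0
    desc = head
    while desc is not None:
        memory[desc.dst:desc.dst + desc.length] = memory[desc.src:desc.src + desc.length]
        moved += desc.length
        desc = desc.next
    return moved


mem = bytearray(b"ABCDEFGH" + bytes(8))
second = Descriptor(src=4, dst=8, length=4)
first = Descriptor(src=0, dst=12, length=4, next=second)
total = run_chain(mem, first)
print(total, bytes(mem))
```

A simple transfer is the degenerate case of a chain with a single descriptor.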
TX/RX Equalizer Architecture
Components:
- FIR Filter: Finite impulse response for pre-emphasis
- Adaptive Engine: Coefficient adjustment based on channel
- DFE (Decision Feedback Equalizer): Post-cursor ISI cancellation
- CTLE (Continuous Time Linear Equalizer): Frequency compensation
- CDR (Clock Data Recovery): Sampling clock extraction
- Pattern Detector: Training sequence recognition
Purpose: Compensates for signal degradation in 4-drop multi-chip
configurations and long PCB traces.
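The FIR pre-emphasis idea can be shown with a two-tap simplification: bits that follow a transition are sent at full amplitude, while repeated bits are de-emphasized. The tap weights below are assumed values, and the function name is hypothetical.

```python
def de_emphasize(bits, main=0.75, post=0.25):
    """Two-tap FIR: y[n] = main * x[n] - post * x[n-1], with x in {-1, +1}."""
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    prev = 0.0                      # line assumed idle (0) before the first bit
    for s in symbols:
        out.append(main * s - post * prev)
        prev = s
    return out


levels = de_emphasize([1, 1, 1, 0, 0, 1])
print(levels)
```

Transitions come out at magnitude 1.0 and repeated bits at 0.5, which boosts the high-frequency content that a lossy multi-drop channel attenuates most.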
ONFI 4.x NV-DDR3 Interface Timing
Critical Timing Parameters:
| Parameter | Symbol | Min | Typ | Max | Unit |
|---|---|---|---|---|---|
| Clock Cycle Time | tCK | 1.25 | - | - | ns |
| Data Setup Time | tDS | 200 | - | - | ps |
| Data Hold Time | tDH | 200 | - | - | ps |
| Output Access Time | tAC | - | - | 500 | ps |
| ZQ Calibration Time | tZQCAL | - | 1 | - | ms |
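A quick calculation shows what these numbers imply for the data eye. The sketch assumes data is transferred on both edges of a clock with period tCK, so one unit interval is half a clock period.

```python
T_CK_NS = 1.25    # minimum clock cycle time, from the table
T_DS_PS = 200.0   # minimum data setup time
T_DH_PS = 200.0   # minimum data hold time

clock_mhz = 1e3 / T_CK_NS                 # clock frequency in MHz
data_rate_mt = 2 * clock_mhz              # assumption: double data rate, two transfers per clock
ui_ps = T_CK_NS * 1e3 / 2                 # unit interval in ps
window_fraction = (T_DS_PS + T_DH_PS) / ui_ps

print(f"{clock_mhz:.0f} MHz clock, {data_rate_mt:.0f} MT/s, UI {ui_ps:.0f} ps, "
      f"setup+hold uses {window_fraction:.0%} of the UI")
```

Setup plus hold consume most of the unit interval, leaving a small budget for jitter and skew.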
Circuit-Level Analog & Digital Design
At the deepest level, we examine the actual transistor-level circuits and analog design
that implement the blocks from Level 4. This includes SPICE models, transistor schematics,
and layout considerations.
Charge Pump Circuit (PLL)
SPICE Netlist:
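* Fast-Lock Charge Pump (simplified switch-level sketch)
.subckt charge_pump UP DN VDD VSS VOUT
* Inverter generating UP_B, the active-low gate drive for the PMOS switch
M2 UP_B UP VDD VDD pmos_3v3 W=10u L=0.18u
M4 UP_B UP VSS VSS nmos_3v3 W=5u L=0.18u
* Up switch (PMOS): sources current into VOUT while UP is high
M1 VOUT UP_B VDD VDD pmos_3v3 W=10u L=0.18u
* Down switch (NMOS): sinks current from VOUT while DN is high
M3 VOUT DN VSS VSS nmos_3v3 W=5u L=0.18u
* Design targets: Iup = 50uA, Idn = 50uA (bias current sources not shown)
* Switch resistance < 100 ohms
.ends charge_pump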
Function: Converts phase detector output to control voltage for VCO
Design Goals: Matched up/down currents, low switch resistance, fast switching
Process: 12nm CMOS with thick-oxide devices for analog performance
Voltage Controlled Oscillator
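SystemVerilog Real-Number Behavioral Model:
// Ring oscillator VCO
`timescale 1ps/1fs
module vco_behavioral (
  input  real VCTRL,
  output reg  CLK
);
  parameter real KVCO    = 500e6;  // 500 MHz/V
  parameter real FCENTER = 3.2e9;  // 3.2 GHz center
  parameter real VMID    = 0.6;    // control voltage at the center frequency
  parameter real VJITTER = 1e-12;  // 1ps RMS jitter (declared, not applied in this model)
  real freq;
  real period_ps = 1e12 / FCENTER; // valid period before VCTRL first changes
  always @(VCTRL) begin
    freq      = FCENTER + KVCO * (VCTRL - VMID);
    period_ps = 1e12 / freq;       // seconds to ps, matching the timescale
  end
  initial begin
    CLK = 0;
    forever #(period_ps / 2.0) CLK = ~CLK;
  end
endmodule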
Type: 5-stage differential ring oscillator
Frequency Range: 2.8 - 3.6 GHz (tuning range ~800 MHz)
Phase Noise: -110 dBc/Hz @ 1 MHz offset
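The linear tuning law used in the behavioral model can be inverted to see which control voltages the quoted frequency range would require. The sketch reuses the model's KVCO, center frequency, and 0.6 V midpoint; treating the curve as linear over the whole range is an assumption.

```python
KVCO_HZ_PER_V = 500e6   # from the behavioral model
FCENTER_HZ = 3.2e9      # from the behavioral model
VMID = 0.6              # control voltage at the center frequency

def vco_freq(vctrl: float) -> float:
    """Output frequency in Hz for a given control voltage (linear model)."""
    return FCENTER_HZ + KVCO_HZ_PER_V * (vctrl - VMID)

def vctrl_for(freq_hz: float) -> float:
    """Control voltage that the linear model needs to reach freq_hz."""
    return VMID + (freq_hz - FCENTER_HZ) / KVCO_HZ_PER_V

v_low = vctrl_for(2.8e9)
v_high = vctrl_for(3.6e9)
print(f"2.8 GHz needs {v_low:.2f} V, 3.6 GHz needs {v_high:.2f} V")
```

With these parameters the 2.8 to 3.6 GHz range maps to roughly -0.2 V to 1.4 V, so either the real KVCO is higher than the modeled 500 MHz/V or part of the range comes from coarse band switching; the source does not say which.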
High-Speed TX Driver
SystemVerilog RTL + Analog:
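// Pre-emphasis TX driver
module tx_driver #(
  parameter NUM_TAPS  = 3,
  parameter IMPEDANCE = 50  // Ohms (informational; set by the analog driver)
)(
  input  wire       clk,
  input  wire [7:0] data_in,
  input  wire [2:0] pre_emphasis,
  output wire       txp, txn
);
  // 3-tap FIR for pre-emphasis
  reg [7:0] tap_delay [0:NUM_TAPS-1];
  // 20-bit signed result: an 8-bit sample can be shifted left by up to 10 bits,
  // and the third tap is subtracted, so the result can be negative
  wire signed [19:0] fir_output;
  always @(posedge clk) begin
    tap_delay[0] <= data_in;
    tap_delay[1] <= tap_delay[0];
    tap_delay[2] <= tap_delay[1];
  end
  // Pre-emphasis coefficients (operands widened to 20 bits before shifting)
  assign fir_output =
      ({12'b0, tap_delay[0]} << (3 + pre_emphasis)) +
      ({12'b0, tap_delay[1]} << 3) -
      ({12'b0, tap_delay[2]} << pre_emphasis);
  // Differential output driver
  // SPICE: ZO = 50 ohms, slew rate > 6V/ns
  // Placeholder so the module elaborates: the analog driver (not modeled here)
  // would convert fir_output into txp/txn levels; only the sign is used below.
  assign txp = ~fir_output[19];
  assign txn =  fir_output[19];
endmodule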
Output: CML differential driver, 400 mV swing
Impedance: 50-ohm On-die termination (ODT)
Slew Rate: > 6 V/ns for 6.4 Gb/s operation
ZQ Impedance Calibration
Verilog Calibration FSM:
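// ZQ impedance calibration state machine
module zq_calibration (
  input  wire       clk,
  input  wire       rst_n,        // synchronous active-low reset (added so state is defined)
  input  wire       start_cal,
  input  wire       comp_result,  // from comparator
  output reg  [5:0] code_p,       // PMOS code
  output reg  [5:0] code_n,       // NMOS code
  output reg        cal_done
);
  // Target impedance: 50 ohms
  // Binary search algorithm
  localparam [1:0] IDLE = 2'd0, CAL_PMOS = 2'd1, CAL_NMOS = 2'd2, DONE = 2'd3;
  localparam [5:0] MID_CODE = 6'd32, FIRST_STEP = 6'd16;
  reg [1:0] state;
  reg [5:0] search_step;
  always @(posedge clk) begin
    if (!rst_n) begin
      state       <= IDLE;
      code_p      <= MID_CODE;
      code_n      <= MID_CODE;
      search_step <= FIRST_STEP;
      cal_done    <= 1'b0;
    end else begin
      case (state)
        IDLE: if (start_cal) begin
          code_p      <= MID_CODE;      // restart the search from mid-scale
          search_step <= FIRST_STEP;
          cal_done    <= 1'b0;
          state       <= CAL_PMOS;
        end
        CAL_PMOS: begin
          if (search_step == 0) begin
            code_n      <= MID_CODE;
            search_step <= FIRST_STEP;  // reload the step for the NMOS search
            state       <= CAL_NMOS;
          end else begin
            if (comp_result)
              code_p <= code_p - search_step;
            else
              code_p <= code_p + search_step;
            search_step <= search_step >> 1;
          end
        end
        CAL_NMOS: begin
          // Same search for the NMOS code (comparator polarity assumed identical)
          if (search_step == 0)
            state <= DONE;
          else begin
            if (comp_result)
              code_n <= code_n - search_step;
            else
              code_n <= code_n + search_step;
            search_step <= search_step >> 1;
          end
        end
        DONE: begin
          cal_done <= 1'b1;
          state    <= IDLE;
        end
      endcase
    end
  end
endmodule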
Method: Binary search with precision comparator
Accuracy: ±1.5% impedance matching
Calibration Time: 1 ms for full range, 200 μs for tracking
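The binary search can be tried against a toy driver model. In the sketch below the driver is assumed to be a set of identical parallel legs, so a higher code means a lower impedance; the 1550-ohm leg value and the function names are hypothetical.

```python
def driver_impedance(code: int, leg_ohms: float = 1550.0) -> float:
    """Hypothetical driver: `code` identical legs in parallel."""
    return leg_ohms / max(code, 1)


def calibrate(target_ohms: float, impedance_of, bits: int = 6) -> int:
    """Binary search mirroring the FSM: start at mid-scale, halve the step each pass."""
    code = 1 << (bits - 1)      # mid-scale start (32 for 6 bits)
    step = code >> 1            # first step (16 for 6 bits)
    while step > 0:
        if impedance_of(code) > target_ohms:
            code += step        # impedance too high: enable more legs
        else:
            code -= step        # impedance at or below target: disable legs
        step >>= 1
    return code


code = calibrate(50.0, driver_impedance)
print(code, driver_impedance(code))
```

Each pass costs one comparator decision, so a 6-bit code settles in five steps per device type.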
3D NAND Flash Memory Cell
SPICE Model (Simplified):
* 3D NAND cell - floating gate transistor (simplified stand-in)
.subckt nand_cell WL BL SRC
* Control gate (word line)
* Floating gate (stores charge)
* Tunnel oxide: ~7nm
* IPD (inter-poly dielectric): ~10nm
* Programming: Fowler-Nordheim tunneling
* VWL = 20V, VBL = 0V, VSRC = 0V
* Electrons tunnel to floating gate
* Vth increases (-1V -> 3V for programmed)
* Erasing: FN tunneling (block erase)
* VWL = 0V, VBL = float, VSRC = 20V
* Electrons tunnel from FG to substrate
* Vth decreases (3V -> -1V for erased)
* Read operation:
* VWL = 0V (or Vread levels for MLC)
* VBL = 1V, VSRC = 0V
* Cell conducts if erased (Vth < 0V)
* Body is tied to SRC because the subcircuit exposes no substrate pin
Mcell BL WL SRC SRC nfet_fg W=50n L=50n
* Stored charge is not modeled; emulate it by changing vto
* (about -1 for erased, about 3 for programmed)
.model nfet_fg nmos level=1 vto=0.5 tox=7e-9
.ends nand_cell
Type: Charge-trap or floating-gate transistor
3D NAND: Vertical channel, stacked 64-256 layers
Programming: ISPP (Incremental Step Pulse Programming)
Endurance: roughly 3K P/E cycles for TLC up to 100K for SLC, with MLC in between
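ISPP can be sketched as a pulse-and-verify loop: apply a program pulse, check the threshold voltage against the verify level, and raise the pulse voltage by a fixed step if the cell has not passed. The cell response below (Vth follows the pulse voltage minus a fixed offset) and all voltage values are assumptions for illustration only.

```python
def ispp_program(vth: float, verify_level: float, cell_offset: float,
                 v_start: float = 16.0, v_step: float = 0.5, v_max: float = 22.0):
    """Return (pulses used, final Vth) or raise if the cell never passes verify."""
    vpgm = v_start
    pulses = 0
    while vpgm <= v_max:
        vth = max(vth, vpgm - cell_offset)   # toy cell response to one program pulse
        pulses += 1
        if vth >= verify_level:              # verify read after each pulse
            return pulses, vth
        vpgm += v_step                       # step the program voltage and retry
    raise RuntimeError("program failed: v_max reached before verify passed")


pulses, final_vth = ispp_program(vth=-1.0, verify_level=3.0, cell_offset=15.0)
print(pulses, final_vth)
```

A smaller step gives a tighter Vth distribution at the cost of more pulses, which is the basic trade-off between program time and the number of levels a cell can hold.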
BCH Error Correction Engine
Verilog BCH Encoder:
// BCH(255, 239) encoder - corrects 2 bits
module bch_encoder #(
  parameter N = 255,  // code length
  parameter K = 239,  // data length
  parameter T = 2     // error correction capability
)(
  input  wire         clk,
  input  wire [K-1:0] data_in,
  output reg  [N-1:0] codeword_out
);
  // Generator polynomial for BCH(255,239,2) over GF(2^8) built on
  // x^8 + x^4 + x^3 + x^2 + 1 (product of the minimal polynomials of a and a^3):
  // g(x) = x^16 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 + x^5 + x + 1
  // (x^16 + x^12 + x^5 + 1 is the CRC-16-CCITT polynomial, not a BCH generator)
  localparam [N-K:0] GEN_POLY = 17'b10110111101100011;
  reg [N-1:0] rem;  // working remainder, updated with blocking assignments
  integer i;
  always @(posedge clk) begin
    rem = {data_in, {(N-K){1'b0}}};
    // Polynomial division of data_in * x^(N-K) by g(x)
    for (i = N-1; i >= N-K; i = i - 1) begin
      if (rem[i] == 1'b1)
        rem[i-:(N-K+1)] = rem[i-:(N-K+1)] ^ GEN_POLY;
    end
    // Systematic codeword: data bits followed by the 16 parity bits
    codeword_out <= {data_in, rem[N-K-1:0]};
  end
  // Syndrome calculation and Berlekamp-Massey
  // algorithm for decoding (not shown)
endmodule
ECC Type: BCH or LDPC (Low-Density Parity Check)
Capability: Up to 60-100 bits per 1KB for modern NAND
Latency: 1-10 μs depending on error density
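Systematic BCH encoding is easy to cross-check in software, because a valid codeword must leave a zero remainder when divided by the generator polynomial. The sketch below treats Python integers as GF(2) polynomials and assumes the BCH(255,239) generator built on the field polynomial x^8 + x^4 + x^3 + x^2 + 1; the function names are hypothetical.

```python
N, K = 255, 239
# g(x) = x^16 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 + x^5 + x + 1
GEN_POLY = 0b10110111101100011


def poly_mod(value: int, poly: int) -> int:
    """Remainder of GF(2) polynomial division (bit i is the coefficient of x^i)."""
    degree = poly.bit_length() - 1
    while value.bit_length() - 1 >= degree:
        value ^= poly << (value.bit_length() - 1 - degree)
    return value


def bch_encode(data: int) -> int:
    """Systematic encoding: data bits followed by N-K parity bits."""
    if data < 0 or data >> K:
        raise ValueError("data must fit in K bits")
    shifted = data << (N - K)
    return shifted | poly_mod(shifted, GEN_POLY)


codeword = bch_encode(0x1234)
syndrome_clean = poly_mod(codeword, GEN_POLY)
syndrome_error = poly_mod(codeword ^ (1 << 40), GEN_POLY)   # flip one bit
print(hex(codeword), syndrome_clean, syndrome_error != 0)
```

A nonzero remainder flags a corrupted codeword; locating and correcting the bits still requires the syndrome and Berlekamp-Massey steps that the document leaves out.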
Physical Layout Considerations
🔲 Floor Planning
- Separate analog (PLL, VCO) from digital (FSM, FIFO)
- Guard rings around sensitive analog blocks
- Power domains: core (0.9V), I/O (1.8V/1.2V)
- Minimize routing distance for high-speed signals
⚡ Signal Integrity
- Differential pair routing (100-ohm differential impedance)
- Matched trace lengths (< 50 ps skew)
- Via shielding for transitions between layers
- Return path planning for high-speed signals
🔋 Power Integrity
- On-die decoupling capacitors (MOM/MIM)
- Power grid sizing for IR drop < 5%
- Separate analog/digital grounds (star topology)
- Current density: < 1 mA/μm for signal wires
🌡️ Thermal Management
- Hot spot analysis (TX drivers, charge pumps)
- Thermal vias to package substrate
- Junction temperature: < 125°C
- Power density: < 0.5 W/mm²