Driving Micro OLED Displays | MIPI Interface, MCU Requirements & Frame Rate
Feb 28, 2026 · 26 min read

Driving a 1080p Micro OLED at a 90Hz frame rate calls for a 4-lane MIPI DSI interface, with per-lane rates in the 1 to 1.5 Gbps range.

The MCU must integrate a hardware MIPI DSI host (as some STM32H7 parts do).

During operation, external PSRAM is required as video memory, and DMA must be enabled to directly transfer image data to the DSI peripheral to ensure smooth, stutter-free performance.

MIPI Interface

Under the D-PHY 1.2 specification, the single-lane rate reaches 2.5 Gbps, and 4 lanes output a total bandwidth of 10 Gbps, supporting 1920×1080 resolution, 90Hz, and 24-bit RGB888 signals.

The signal level uses a 200mV low-voltage differential swing, with transmission power consumption at the milliwatt level.

This interface reduces the number of physical pins from more than 20 in early buses to 10, complying with the strict physical limitations on motherboard area for wearable devices.

Physical Lane Calculation

Taking a monocular 2560×2560 resolution panel as an example, actual transmission cannot only calculate the effective display area. Horizontal blanking (HBP, HFP) and vertical blanking (VBP, VFP) intervals must be added to the signal, which generates approximately 20% additional pixel overhead in total.

After adding the blanking areas to the resolution, the equivalent horizontal pixels reach 3000, and the equivalent vertical pixels reach 2600. When the refresh rate is set to 90Hz, the total number of pixels to be processed per second is 702 million (702 MHz pixel clock). Using standard 24-bit RGB888 color depth transmission, the base data rate needs to reach 16.84 Gbps.

If the head-mounted display device is upgraded to a 120Hz refresh rate, the same 2.5K panel will generate a throughput requirement of 936 million pixels. If the HDR10 format is enabled and the color depth is increased to 30-bit (RGB101010), the uncompressed video stream bandwidth will soar to 28.08 Gbps.
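
The arithmetic in the two paragraphs above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `raw_video_gbps` is illustrative and not taken from any vendor tool, and the blanking totals (3000 × 2600) are the ones quoted in the text. Without truncation the 90Hz result comes out at about 16.85 Gbps, which the text writes as 16.84 Gbps.

```python
def raw_video_gbps(h_total, v_total, refresh_hz, bits_per_pixel):
    """Uncompressed stream rate in Gbps for a frame that includes blanking."""
    pixel_clock_hz = h_total * v_total * refresh_hz
    return pixel_clock_hz * bits_per_pixel / 1e9

# Totals from the text: 2560x2560 active grows to 3000x2600 with blanking.
rate_90hz_rgb888 = raw_video_gbps(3000, 2600, 90, 24)
rate_120hz_rgb101010 = raw_video_gbps(3000, 2600, 120, 30)
blanking_overhead = (3000 * 2600) / (2560 * 2560) - 1

print(f"90 Hz, 24-bit:  {rate_90hz_rgb888:.3f} Gbps")
print(f"120 Hz, 30-bit: {rate_120hz_rgb101010:.3f} Gbps")
print(f"blanking overhead: {blanking_overhead:.1%}")
```

Swapping in the real porch values from a panel datasheet changes `h_total` and `v_total`, and everything downstream follows from them.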

In response to the above bandwidth requirements, the hardware selection phase needs to match different MIPI D-PHY physical layer versions and numbers of lanes:

  • D-PHY v1.1: Maximum 1.5 Gbps per lane, providing 6 Gbps across 4 lanes.

  • D-PHY v1.2: Maximum 2.5 Gbps per lane, providing 10 Gbps across 4 lanes.

  • D-PHY v2.1: Maximum 4.5 Gbps per lane, providing 18 Gbps across 4 lanes.

  • D-PHY v3.0: Raises the ceiling to about 9 Gbps per lane.

Mainstream embedded processors (such as the NXP i.MX 8 series) are typically equipped with only a D-PHY v1.2 interface. Against the 16.84 Gbps that a 2.5K/90Hz panel demands, the 10 Gbps theoretical ceiling of 4 lanes falls far short. Packet headers and checksums at the protocol layer consume about another 20%, so the usable payload is only around 8 Gbps.

To bridge the bandwidth gap, VESA DSC (Display Stream Compression) technology needs to be introduced. The DSC algorithm can compress data under the premise of visual losslessness. By setting the DSC compression ratio to 3:1, the original 16.84 Gbps raw data stream will be reduced to 5.61 Gbps, which can be carried by a 4-lane D-PHY v1.2.
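
A rough feasibility check for this selection step might look like the sketch below. It assumes the per-lane maxima and the 20% protocol overhead quoted above; the helper names `usable_payload_gbps` and `stream_fits` are hypothetical.

```python
# Per-lane maxima in Gbps, as listed in the text above.
DPHY_LANE_GBPS = {"v1.1": 1.5, "v1.2": 2.5, "v2.1": 4.5}

def usable_payload_gbps(version, lanes=4, protocol_overhead=0.20):
    """Link capacity left for pixels after packet headers and checksums."""
    return DPHY_LANE_GBPS[version] * lanes * (1.0 - protocol_overhead)

def stream_fits(raw_gbps, dsc_ratio, version, lanes=4):
    """True if the (optionally compressed) stream fits the usable payload."""
    return raw_gbps / dsc_ratio <= usable_payload_gbps(version, lanes)

RAW_GBPS = 16.848  # 3000 x 2600 x 90 Hz x 24 bit

for ratio in (1, 2, 3):
    verdict = "fits" if stream_fits(RAW_GBPS, ratio, "v1.2") else "does not fit"
    print(f"DSC {ratio}:1 -> {RAW_GBPS / ratio:.2f} Gbps, {verdict} on 4-lane v1.2")
```

Under these assumptions a 2:1 ratio still leaves about 8.42 Gbps, slightly above the 8 Gbps usable payload, which is consistent with the text choosing 3:1.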

The setting of the compression ratio affects the decoding latency at the panel end. Under a 2:1 compression ratio, the decompression module built into the microdisplay typically consumes 50% of a single-line pixel scan time to restore the image. If the compression ratio is increased to 3:1, the decoding time will increase by 15 microseconds, requiring sufficient margin in the vertical sync signal design.

When pursuing monocular 4K (3840×3840) and not wanting to sacrifice image quality, the number of physical pins in the D-PHY architecture becomes a system burden. An 8-lane D-PHY requires 20 physical signal lines, which doubles the routing density on the motherboard and easily causes crosstalk.

The C-PHY specification changes the signal modulation method at the physical layer, improving pin utilization through three-phase symbol encoding:

  • No independent clock line; the clock is embedded in the data stream.

  • Every 3 wires form a Trio (lane group).

  • Encodes about 2.28 bits per symbol (16 bits mapped onto 7 symbols).

  • Standard configuration is 3 Trios, totaling 9 pins.

In the C-PHY v1.2 specification, the operating rate of a single Trio is 3.0 Gsps. After encoding conversion, the actual bit rate of a single lane reaches 6.84 Gbps. 3 Trios operating at full load can provide a total bandwidth of 20.52 Gbps, surpassing the throughput capacity of D-PHY's 4 lanes (10 wires) with just 9 wires.
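
The C-PHY figures follow from the 16-bits-per-7-symbols mapping. The sketch below uses the exact ratio 16/7, so its results land slightly above the 6.84 and 20.52 Gbps quoted in the text, which come from rounding the ratio to 2.28 first. The function name `cphy_gbps` is illustrative.

```python
BITS_PER_WORD = 16      # C-PHY maps each 16-bit word...
SYMBOLS_PER_WORD = 7    # ...onto 7 three-phase symbols
WIRES_PER_TRIO = 3

def cphy_gbps(symbol_rate_gsps, trios):
    """Aggregate bit rate for a C-PHY link, ignoring protocol overhead."""
    bits_per_symbol = BITS_PER_WORD / SYMBOLS_PER_WORD  # about 2.28
    return symbol_rate_gsps * bits_per_symbol * trios

per_trio = cphy_gbps(3.0, trios=1)
three_trios = cphy_gbps(3.0, trios=3)
wires = 3 * WIRES_PER_TRIO

print(f"per trio: {per_trio:.2f} Gbps")
print(f"3 trios:  {three_trios:.2f} Gbps over {wires} wires")
```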

When the bandwidth requirement is only 5 Gbps, if 4 lanes of D-PHY are used, each lane needs to run at 1.25 Gbps. The MIPI specification states that the energy consumption per bit of D-PHY in the 800 Mbps to 1.5 Gbps range is approximately 1.5 to 2.0 picojoules (pJ/bit).

Switching to a low-power design under the same 5 Gbps load, 2 lanes can be turned off, leaving the remaining 2 lanes running at 2.5 Gbps. At this time, the signal swing remains at 200mV, but the dynamic power consumption caused by high-frequency switching will increase by 30%. The master controller needs to find a balance between the number of lanes and the single-lane frequency based on the battery discharge curve.
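
To put numbers on this trade-off, energy per bit multiplied by throughput gives link power directly (Gbps × pJ/bit = mW). The sketch below is a simplification: it assumes the 30% penalty applies to the per-bit energy figure, and it leaves out the static bias current saved by powering down two lanes, which is the other side of the balance.

```python
def link_power_mw(total_gbps, energy_pj_per_bit):
    """Gbps x pJ/bit = mW, since 1e9 bit/s x 1e-12 J/bit = 1e-3 W."""
    return total_gbps * energy_pj_per_bit

LOAD_GBPS = 5.0
ENERGY_RANGE_PJ = (1.5, 2.0)   # range quoted in the text for 0.8 to 1.5 Gbps lanes
HIGH_RATE_PENALTY = 1.30       # assumption: the 30% increase scales energy per bit

four_lane_mw = [link_power_mw(LOAD_GBPS, e) for e in ENERGY_RANGE_PJ]
two_lane_mw = [link_power_mw(LOAD_GBPS, e * HIGH_RATE_PENALTY) for e in ENERGY_RANGE_PJ]

print(f"4 lanes x 1.25 Gbps: {four_lane_mw[0]:.2f} to {four_lane_mw[1]:.2f} mW")
print(f"2 lanes x 2.5 Gbps:  {two_lane_mw[0]:.2f} to {two_lane_mw[1]:.2f} mW (dynamic only)")
```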

The PLL lock time of the panel's built-in receiver is affected by the lane configuration. When a lane switches from LP (Low Power) to HS (High Speed), MIPI D-PHY requires T_HS-PREPARE to fall between 40 ns + 4 UI and 85 ns + 6 UI, and T_HS-PREPARE plus T_HS-ZERO to total at least 145 ns + 10 UI. The wake-up times of the lanes must also be closely aligned with one another.

If the four data lanes have an offset of more than 20% UI (Unit Interval), the display controller will discard the entire data packet. At a rate of 2.5 Gbps, 1 UI is only 400 picoseconds, and 20% is an error margin of 80 picoseconds. The registers need to accurately set the Skew calibration parameters for each lane.
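
The skew budget scales inversely with the lane rate, which a short helper makes easy to tabulate. This is a sketch using the 20% UI limit quoted above; the function names are illustrative.

```python
def unit_interval_ps(lane_rate_gbps):
    """One bit time in picoseconds."""
    return 1000.0 / lane_rate_gbps

def skew_budget_ps(lane_rate_gbps, ui_fraction=0.20):
    """Allowed lane-to-lane offset as a fraction of one UI."""
    return unit_interval_ps(lane_rate_gbps) * ui_fraction

for rate in (1.0, 1.5, 2.5):
    print(f"{rate} Gbps: UI = {unit_interval_ps(rate):.0f} ps, "
          f"skew budget = {skew_budget_ps(rate):.0f} ps")
```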

Command packets at the protocol layer also occupy bandwidth resources. When calculating the limit load, the issue frequency of the DCS (Display Command Set) needs to be taken into account:

  • Brightness adjustment command: Usually a Short Packet, occupying 4 bytes.

  • Gamma table issuing: Belongs to Long Packet, which may exceed a hundred bytes.

  • TE synchronization: Uses Blanking time to return status.

  • Sleep wake-up: Occupies a handshake cycle of about 120 microseconds.

Within the 11.1-millisecond cycle of refreshing a 90Hz image, if high-frequency adjustment of local brightness is required, the DCS packet will crowd out the gap in the video pipeline. Hardware engineers usually allocate the BLLP (Blanking or Low Power) period of the video blanking period for control command transmission to avoid triggering a frame rate drop.

Selecting Operation Modes

Whether the Display Driver IC (DDIC) of the microdisplay integrates SRAM (Static Random Access Memory) internally dominates the interface output strategy of the host SoC. Calculated at a specification of 1920×1080 resolution and 24-bit color depth, a single-frame image requires about 6.22 MB of physical video memory capacity. Implanting a massive storage unit into a 0.39-inch silicon backplane will exponentially increase the manufacturing cost and area of the chip.

The mechanism that relies on the DDIC's internal video memory to operate is called Command Mode. The host chip sends a Type 0x39 Long Packet through the MIPI bus to write pixel data into the panel's GRAM. Under a static UI screen, the SoC's DSI transmitting end can enter ULPS (Ultra-Low Power State), and the physical layer power consumption can drop to below 2 mW.

At this time, the panel relies on its internal clock to read data from GRAM at a frequency of 60Hz or 90Hz and self-refresh the display. To avoid screen tearing, the microdisplay sends a pulse signal to the SoC through a dedicated TE (Tearing Effect) physical pin. Within 2 milliseconds after the SoC receives the TE rising edge, it begins to write the changing area of the next frame to the GRAM.

The operation of only updating local areas can greatly reduce the bus load. If only a 200×200 pixel cursor moves in the current frame, the host only needs to send a 120 KB data packet. Transmitting at a single-lane rate of 500 Mbps, this local update process takes only about 1.92 milliseconds, and the bus goes completely to sleep for the remaining 14 milliseconds of the frame cycle.
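
The saving from partial updates is easy to quantify. The sketch below counts payload bytes only and ignores packet headers; the function names are hypothetical.

```python
def region_bytes(width, height, bytes_per_pixel=3):
    """RGB888 payload size of a rectangular update region."""
    return width * height * bytes_per_pixel

def transfer_ms(num_bytes, lane_rate_mbps, lanes=1):
    """Payload-only transfer time; packet overhead is ignored."""
    return num_bytes * 8 / (lane_rate_mbps * 1e6 * lanes) * 1e3

cursor_bytes = region_bytes(200, 200)
full_frame_bytes = region_bytes(1920, 1080)
cursor_ms = transfer_ms(cursor_bytes, 500)
full_frame_ms = transfer_ms(full_frame_bytes, 500)

print(f"cursor:     {cursor_bytes} bytes, {cursor_ms:.2f} ms")
print(f"full frame: {full_frame_bytes} bytes, {full_frame_ms:.2f} ms")
```

At the same single-lane 500 Mbps rate a full 1080p frame would take roughly 100 milliseconds, which is why Command Mode pairs naturally with small dirty regions.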

The head-tracking algorithms of AR/VR headsets require extremely low Motion-to-Photon latency, and the secondary read/write operations brought by the GRAM architecture will add about 5 to 11 milliseconds of image lag. After removing the video memory, the panel must receive a real-time pixel stream without interruption, and the system is forced to switch to Video Mode operation.

In Video Mode, the host must accurately generate sync signals that comply with VESA timing standards. A 1920×1080 frame is deconstructed on the bus into active pixel areas and blanking areas. Horizontal blanking includes HBP (Back Porch), HFP (Front Porch), and HSA (Sync Pulse), typically occupying 15% to 20% of the line cycle.

Operation Mode | Integrated VRAM | Typical End-to-End Latency | MIPI Link Sleep Ratio | Applicable Refresh Rate Range
--- | --- | --- | --- | ---
Command Mode | 6MB ~ 12MB | 12ms ~ 20ms | 60% ~ 90% | 30Hz ~ 60Hz
Non-Burst Video | No GRAM | 2ms ~ 4ms | 0% | 60Hz ~ 90Hz
Burst Video | Minimal Line Buffer | 2ms ~ 4ms | 15% ~ 30% | 90Hz ~ 120Hz

To squeeze out energy-saving space in a memory-less architecture, the MIPI protocol defines Burst video stream transmission. This mechanism "compresses" and sends a single line of 1920 pixels in a shorter time by increasing the bus clock frequency.

Taking a D-PHY 4-lane full-load 1.5 Gbps as an example, a conventional Non-Burst mode may take 10 microseconds to finish sending a line of data. After enabling Burst, the data packet is transmitted within 7 microseconds, releasing a 3-microsecond time gap. The host will issue commands in this time slot, causing the D-PHY to switch from the 200mV high-speed differential level to the 1.2V low-frequency single-ended LP (Low Power) state.
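
The 10 and 7 microsecond figures are round numbers. A sketch of where they come from, assuming a 1080p panel at 90Hz with 1200 total lines per frame (an assumed blanking figure) and counting pixel payload only:

```python
def line_period_us(refresh_hz, total_lines):
    """Time available per line, including its share of vertical blanking."""
    return 1e6 / (refresh_hz * total_lines)

def burst_time_us(active_pixels, bits_per_pixel, lanes, lane_gbps):
    """Time to send one line's pixel payload at the full link rate."""
    return active_pixels * bits_per_pixel / (lanes * lane_gbps * 1e3)

period_us = line_period_us(90, 1200)        # assumption: 1200 total lines
burst_us = burst_time_us(1920, 24, 4, 1.5)
idle_fraction = (period_us - burst_us) / period_us

print(f"line period: {period_us:.2f} us")
print(f"burst time:  {burst_us:.2f} us")
print(f"idle share:  {idle_fraction:.1%}")
```

Under these assumptions the idle share falls inside the 15% to 30% sleep ratio listed for Burst Video in the table above.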

Frequent switching between high-speed and low-power states introduces timing jitter at the physical layer. The D-PHY specification requires that switching from the LP-11 state to the HS-0 state must go through the transition sequences of LP-01 and LP-00. The complete wake-up process takes about 85 nanoseconds; if the duration of the Blanking area is less than 100 nanoseconds, the link will be unable to complete the state reset, causing image desynchronization.

The setting of the number of lines in the vertical blanking areas (VBP and VFP) directly affects the brightness performance of the Micro OLED. At a 120Hz refresh rate, the total duration of a single frame is compressed to 8.33 milliseconds. If VBP is set to 12 lines and VFP is set to 8 lines, combined with a 2-microsecond line cycle, the time left for the panel's illuminating pixels (OLED Emission) is further squeezed.

Hardware engineers often implement a Rolling Illumination strategy by expanding the vertical blanking time (V-Blanking). VBP is stretched to about 10% of the total frame time (about 0.83 milliseconds); during this "black field" interval with no data transmission, the screen pixels are completely extinguished, which significantly reduces the motion blur (ghosting) perceived when the wearer turns their head.

For a monocular 4K (3840×3840) microdisplay with a refresh rate locked at 90Hz, even if it cuts into Video Mode and turns on Burst, the bus load still exceeds 20 Gbps. The host must invoke VESA DSC compression encoding, dividing the raw data of each line into 4 Slices for parallel computing.

The DSC algorithm introduces a fixed latency of 0.5 lines at the encoding end. For a 3840-width panel, each Slice processes 960 pixels. The host takes less than 2 microseconds to complete the compression of 4 Slices at the transmitting end. This requires the MIPI bus to have microsecond-level packet scheduling precision to prevent the Line Buffer at the receiving end from underflowing.

Signal Integrity Specifications

In the 2.5 Gbps high-speed mode of MIPI D-PHY, signal rise and fall times are typically compressed to within 150 picoseconds (ps). High-frequency square waves contain a large number of odd harmonics, so motherboard routing must treat them as microwave transmission lines. Any impedance discontinuity will trigger signal reflections, weakening the 200mV differential voltage swing at the receiving end.

Lamination deviations and etching undercut at PCB manufacturing plants often cause the actual line width to fluctuate by ±10%.

When designing the physical layer routing, the nominal differential impedance is specified as 100 ohms, and the single-ended impedance is 50 ohms. The routing needs to use a microstrip line or stripline structure and keep the reference ground plane intact. When crossing partitioned areas of different power planes, the return path is blocked, and the loop inductance will instantly jump to the nanohenry (nH) level.

To control timing Skew, hardware specifications set extremely strict digital limits on routing length matching:

  • Intra-pair length matching: The length difference between the positive data (DP) and negative data (DN) is limited to within 5 mil (0.127 mm).

  • Inter-pair length matching: The length difference between each data lane and the clock lane must not exceed 10 mil (0.254 mm).

  • Phase compensation: Where a length mismatch occurs, compensating serpentine routing must be added within 15 mm of the mismatch point.
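
To see what these length limits mean in time, the mismatch can be converted to skew using the propagation delay of the trace. The sketch below assumes a stripline with an effective dielectric constant of 4.4 (the middle of the 4.2 to 4.6 range this article quotes for FR4); a microstrip would see a lower effective value and slightly less delay.

```python
import math

C_MM_PER_PS = 0.299792458   # speed of light in mm per picosecond
MIL_TO_MM = 0.0254

def delay_ps_per_mm(dk_effective):
    """Propagation delay of a trace surrounded by dielectric."""
    return math.sqrt(dk_effective) / C_MM_PER_PS

def mismatch_skew_ps(mismatch_mil, dk_effective=4.4):
    """Timing skew caused by a given trace length difference."""
    return mismatch_mil * MIL_TO_MM * delay_ps_per_mm(dk_effective)

intra_pair_ps = mismatch_skew_ps(5)
inter_pair_ps = mismatch_skew_ps(10)

print(f"delay: {delay_ps_per_mm(4.4):.2f} ps/mm")
print(f"5 mil intra-pair mismatch:  {intra_pair_ps:.2f} ps")
print(f"10 mil inter-pair mismatch: {inter_pair_ps:.2f} ps")
```

Both values are small next to the 80 picosecond skew budget at 2.5 Gbps, so the limits leave most of that budget for connector, via, and silicon contributions.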

When 4 data lanes and 1 clock lane run in parallel on the motherboard, Near-End Crosstalk (NEXT) between lanes will raise the receiver's noise floor. Trace spacing must strictly follow routing principles, meaning the edge distance between two differential pairs should be greater than 3 times the width of a single trace, suppressing the crosstalk amplitude to below -40 dB.

Microdisplay modules are typically connected to the motherboard via an FPC (Flexible Printed Circuit).

The FPC substrate uses polyimide (PI), which has a dielectric constant (Dk) of around 3.4, a physical difference from the 4.2 to 4.6 of a standard FR4 rigid board. At the B2B connector location where the signal transitions from the rigid board to the flex board, impedance is highly prone to a sudden change of more than 15 ohms. Engineers must apply an anti-pad treatment at the connector pads to reduce parasitic capacitance by 1.2 pF.

Connector pin assignments must use an alternating "Signal-Ground-Signal-Ground" (S-G-S-G) layout. In micro-connectors with a 0.4 mm pitch, inserting ground pins can provide the shortest high-frequency return path. Without isolated ground wires, the isolation of adjacent lanes at a 2.5 GHz fundamental frequency will degrade to the danger zone of -20 dB.

Layer-changing vias are another major source of impedance cliffs. Specifications dictate that from the host SoC pins to the microdisplay connector, the number of vias for each signal line must not exceed 2. A via with a 10 mil diameter introduces about 0.5 pF of parasitic capacitance and 1.2 nH of parasitic inductance, exacerbating the low-pass filtering effect of the transmission line.

  • Via Stub: The excess copper pillar extending beyond the actual signal transmission layer will create an antenna effect.

  • Stub length limit: When the signal rate reaches 1.5 Gbps or above, the stub length must be controlled within 20 mil.

  • Backdrill process: Using a secondary drilling technique to remove excess copper foil can restore about 10% of the eye diagram opening.

The Eye Diagram captured by testing equipment is the ultimate physical metric for determining link quality.

The MIPI D-PHY receiver requires an eye height of at least 70 mV and an eye width occupying 0.2 UI (Unit Interval) under a BER (Bit Error Rate) condition of 1E-12. At a 1.5 Gbps rate, 1 UI is 666 picoseconds, and the effective time window left for the channel is only 133 picoseconds. Jitter exceeding this range will cause the receiver's decoder to generate bit errors.

To address eye diagram closure issues, the host chip's transmitter (TX) is typically configured with a Pre-emphasis register. By boosting the amplitude of the first bit of a transition by 2.5 dB to 3.5 dB, it compensates for the dielectric loss (Df) of the FR4 board on high-frequency components. After long-distance attenuation, the signal arriving at the receiver (RX) can still maintain a complete waveform profile.

Low Power (LP) mode operates at a 1.2V single-ended level with a frequency of only 10 MHz to 20 MHz, making it unaffected by severe dielectric loss. However, at the instant of high-frequency switching between LP and HS (High Speed) states, transient current spikes (di/dt) are generated on the power network. If the bypass decoupling capacitors are insufficient, the VDD power rail will generate a ripple exceeding 50 mV.

  • Decoupling capacitor configuration: Place a combination of 0.1 μF and 1 nF ceramic capacitors within 1 mm of the SoC's MIPI power supply pins.

  • Resonant frequency: The 1 nF small capacitor is responsible for filtering out high-frequency noise above 2 GHz.

  • Packaging inductance: Choose an 0201 or smaller package size to reduce the capacitor's Equivalent Series Inductance (ESL) to below 0.4 nH.

Wearable devices have confined internal spaces, and RF antennas often physically overlap with video data lines. The common-mode radiation when MIPI D-PHY is running will precisely fall into the 2.4 GHz Wi-Fi band. Engineers must cover the surface of the FPC with a layer of conductive silver paste or PC (polycarbonate) absorbing film to provide at least 60 dB of electromagnetic shielding effectiveness.

MCU Requirements

When selecting hardware for 1080p@90Hz Micro OLEDs, the host chip must be equipped with a 4-lane MIPI D-PHY (single-lane rate > 1.2 Gbps).

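A single RGB888 frame is about 6.2MB, which cannot fit into the MCU's internal SRAM, so external Octal SPI PSRAM or 32-bit SDRAM is required. Scanning that frame out 90 times per second already reads about 560MB/s (6.22MB × 90), so a bus throughput of > 400MB/s is not sufficient on its own; the memory bus must sustain more than 560MB/s, plus whatever the renderer writes.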
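
The frame buffer size and the read bandwidth needed just to scan it out can be checked with the following sketch (function names are illustrative; rendering writes and double buffering are not counted):

```python
def framebuffer_bytes(width, height, bytes_per_pixel=3):
    """Size of one RGB888 frame."""
    return width * height * bytes_per_pixel

def scanout_mb_per_s(width, height, refresh_hz, bytes_per_pixel=3):
    """Read bandwidth needed to stream every frame to the DSI host."""
    return framebuffer_bytes(width, height, bytes_per_pixel) * refresh_hz / 1e6

frame_bytes = framebuffer_bytes(1920, 1080)
scanout_90hz = scanout_mb_per_s(1920, 1080, 90)

print(f"frame buffer: {frame_bytes / 1e6:.2f} MB")
print(f"scan-out at 90 Hz: {scanout_90hz:.1f} MB/s")
```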

It is recommended to use an ARM Cortex-M7 processor with a main frequency above 400MHz, equipped with a 2D DMA graphics accelerator.

Data Interface Specifications

Evaluating the refresh capability of a headset starts with the pixel clock. Taking a monocular 1920x1080 resolution Micro OLED as an example, a single frame's data is not just the product of horizontal and vertical pixels. The horizontal direction has a back porch, front porch, and sync width, as does the vertical direction, typically adding 10% to 20% overhead for blanking areas.

After adding the active pixels of a full frame to the blanking area, the total number of pixels approaches the scale of 1920x1200. Running the screen at a full 90Hz refresh rate with 24-bit color depth, the raw throughput will soar to approximately 4.97 Gbps. Microcontrollers on the market with a main frequency below 200MHz, whose SPI ports peak at only 50Mbps, are completely out of the running.

Choose crossover MCUs with native MIPI DSI interfaces, such as the NXP i.MX RT1170 or STM32H75x series, and confirm the lane count and per-lane rate in the datasheet, because some parts in these families expose fewer than four data lanes. When checking the chip manual, look for the D-PHY physical layer specification, which needs to be version v1.1 or v1.2. Older MCUs are capped at 1 Gbps per lane, which cannot drive high-refresh-rate, high-resolution screens.

Including the packet overhead of the MIPI protocol, the physical link needs to reserve 20% extra bandwidth. A 4.97 Gbps payload requires laying out nearly 6 Gbps of total capacity at the physical layer. A qualified MCU hardware must be equipped with 4 Data Lanes and 1 Clock Lane.

Averaged out, each data lane must run at about 1.5 Gbps in high-speed mode. D-PHY uses Double Data Rate transmission, so the corresponding Clock Lane frequency must reach 750 MHz. On the PCB routing, the impedance of the 5 pairs of differential signal lines must be strictly controlled at 100 ohms, with an error margin not exceeding 10%.
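
The chain from frame geometry to clock lane frequency can be written out as a small planning helper. This is a sketch under the figures given above (1920x1200 total pixels, 20% protocol margin, 4 lanes); `lane_plan` is a hypothetical name. The exact results sit just under the rounded 1.5 Gbps and 750 MHz used in the text.

```python
def lane_plan(h_total, v_total, refresh_hz, bits_per_pixel,
              lanes=4, protocol_margin=0.20):
    """Return (payload Gbps, link Gbps, per-lane Gbps, clock lane MHz)."""
    payload_gbps = h_total * v_total * refresh_hz * bits_per_pixel / 1e9
    link_gbps = payload_gbps * (1.0 + protocol_margin)
    lane_gbps = link_gbps / lanes
    clock_mhz = lane_gbps * 1e3 / 2   # DDR: two bits per clock cycle
    return payload_gbps, link_gbps, lane_gbps, clock_mhz

payload, link, per_lane, clock = lane_plan(1920, 1200, 90, 24)

print(f"payload:  {payload:.2f} Gbps")
print(f"link:     {link:.2f} Gbps")
print(f"per lane: {per_lane:.3f} Gbps")
print(f"clock:    {clock:.1f} MHz")
```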

When configuring the hardware registers of the display controller inside the MCU, you need to set the packet push method. Video Mode includes three optional mechanisms:

  • Non-Burst Mode with Sync Pulses

  • Non-Burst Mode with Sync Events

  • Burst Mode

Most high-resolution Micro OLEDs do not have independent graphics memory and do not support Command Mode. The MCU must continuously push images over like an assembly line. Burst mode is the most power-efficient; it allows the MIPI interface to switch back to a low-power standby state within an extremely short time after transmitting a line of 1920 pixels.

Switching from low-power back to high-speed mode requires the MCU's physical layer to have extremely precise nanosecond-level timing control. If the T_HS-PREPARE parameter is less than 40 nanoseconds plus 4 UI (Unit Intervals), the screen receiver will miss the transmission start packet, and a full horizontal flashing line will immediately appear on the screen.

Developers need to tune the following nanosecond-level timing parameters in the MCU initialization code while checking the waveforms on an oscilloscope:

  • T_LPX: Low-power state transition time, must be greater than 50ns

  • T_HS-ZERO: High-speed mode zero-data wait, must be greater than 105ns

  • T_HS-TRAIL: Data trailing hold time, must be at least 60ns plus 4 UI

  • CLK-PRE: Clock setup, maintain at least 8 UI duration
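
Before flashing a new register set, the candidate timings can be checked against the minimums offline. The sketch below covers only the lower bounds for T_LPX, T_HS-PREPARE (40 ns + 4 UI), T_HS-ZERO, and CLK-PRE as quoted in this section; the dictionary keys and the function name are hypothetical and do not correspond to any vendor HAL.

```python
def dphy_timing_violations(timing_ns, lane_rate_gbps):
    """Return names of parameters below the minimums quoted in the text."""
    ui_ns = 1.0 / lane_rate_gbps
    minimums_ns = {
        "T_LPX": 50.0,
        "T_HS_PREPARE": 40.0 + 4 * ui_ns,
        "T_HS_ZERO": 105.0,
        "T_CLK_PRE": 8 * ui_ns,
    }
    return [name for name, floor in minimums_ns.items()
            if timing_ns[name] < floor]

# Hypothetical candidate values, all in nanoseconds.
candidate = {"T_LPX": 55.0, "T_HS_PREPARE": 41.0,
             "T_HS_ZERO": 110.0, "T_CLK_PRE": 5.0}

violations = dphy_timing_violations(candidate, lane_rate_gbps=1.5)
print(violations)
```

Because two of the limits depend on the UI, the same register values can pass at one lane rate and fail at another, so the check is worth rerunning whenever the link speed changes.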

The pixel bus operating clock inside the MCU and the peripheral MIPI clock are asynchronous. For example, the LCD controller runs at 150MHz, and the MIPI PHY runs at 750MHz. When processing across clock domains, if the internal FIFO depth is less than 256 bytes, a data underflow will occur at the moment of level transition.

An underflow will instantly cause a black or corrupted screen. Checking the manufacturer's errata sheet is essential; on some Cortex-M7 chips, when internal AXI matrix arbitration is enabled, if pixels are transferred to the MIPI FIFO at a maximum of 1.5 Gbps, the bus will experience latency fluctuations.

To relieve bus congestion, blanking periods can be used to send empty packets. After sending a line of 1920 pixels, use the few microseconds of the horizontal front porch to have the MCU send a null packet (DSI data type 0x09; the dedicated blanking packet type is 0x19). This frees up a gap for the internal system bus, giving the external SDRAM time to prepare the data for the next line.

Electromagnetic interference is very severe on a 750MHz Clock Lane and can easily interfere with RF modules on the device. Some advanced MCUs feature a built-in Spread Spectrum Clock Generator function. Enabling the register's spread spectrum bit allows the clock frequency to undergo a 1.5% downward-biased modulation over a 30kHz cycle.

Relying solely on 4 fully loaded MIPI lanes to brute-force 4.97 Gbps of data generates considerable chip heat. High-end application processors include DSC 1.2a hardware modules, which can compress the raw data 3:1 with visually lossless quality, thereby reducing power consumption.

Since Cortex-M series microcontrollers generally do not integrate a DSC hardware encoder, we have to brute-force the 5 Gbps traffic using uncompressed methods. Without the assistance of compression algorithms, the clock stability of the MIPI physical layer must be extremely high, with jitter strictly less than 0.15 UI.

Once clock jitter exceeds this threshold, the bit error rate at the receiving end rises well above the 1E-12 target. In terms of board-level power supply design, 0.1uF and 1nF low-ESR ceramic capacitors must be placed close to the MCU's MIPI power supply pins to filter out high-frequency noise.

Inconsistent trace lengths will ruin picosecond-level physical layer timing. The absolute length difference of the 4 pairs of data lines and 1 pair of clock lines on the PCB must be controlled within 50 mil (1.27 mm). The number of vias per line should not exceed 2 to avoid signal reflection caused by impedance discontinuities.

When the system enters a standby state, the MCU must switch the entire MIPI link into Escape Mode. At this point, the clock frequency plummets below 20MHz, and the bus enters ultra-low power standby. It relies solely on a minuscule amount of sustaining capacitors built into the screen to keep the AMOLED backplane pixel circuits from losing power.

Video Memory Selection Calculation

In half precision (FP16/BF16) mode, each parameter occupies 2 bytes, so the weights of the Llama-3-70B model alone will occupy 70 × 2 = 140 GB of VRAM space. Because the current capacity limit of a single NVIDIA H100 graphics card is 80GB, deploying a model of this scale requires multi-card parallelism or the adoption of 4-bit quantization technology.

4-bit quantization reduces the weight VRAM footprint of a 70B model to between approximately 35GB and 40GB by compressing the weights down to 0.5 bytes. Even if the weights can be loaded, the VRAM calculation still needs to account for the inference load when the model runs, which is the dynamic growth of the KV Cache. The footprint of the KV Cache is collectively determined by the sequence length, batch size, number of model layers, and attention head dimension.

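Taking the Llama-3-8B model as an example, it has 32 layers and 32 attention heads, with each head having a dimension of 128. If every head stored its own keys and values under FP16 precision, each token would consume about 0.5MB of VRAM (2 × 32 × 32 × 128 × 2 bytes), and processing 8192 context tokens with a Batch Size of 1 would take about 4GB for the cache alone. Llama-3 actually uses grouped-query attention with 8 key/value heads, which cuts this figure to a quarter.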

Model Scale | Parameter Precision | Weight Footprint (GB) | 8k Context KV Cache, Batch 1 (GB) | Recommended Total VRAM (GB)
--- | --- | --- | --- | ---
7B | FP16 | 14.0 | 0.5 | 20
7B | INT4 | 3.5 | 0.5 | 8
70B | FP16 | 140.0 | 4.5 | 160+
70B | INT4 | 35.0 | 4.5 | 48

VRAM pressure in high-concurrency scenarios will expand linearly with the increase in Batch Size. When the Batch Size is increased to 32, the cache demand of the above 8B model at an 8k length will leap to 128GB, exceeding the physical limits of the vast majority of single-card devices. At this point, PagedAttention technology needs to be introduced to manage VRAM fragmentation, thereby increasing effective utilization from 60% to over 90% without adding physical VRAM.
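
The linear growth described above follows from the KV cache formula: two tensors (keys and values) per layer, per key/value head, per token. The sketch below reproduces the 0.5MB, 4GB, and 128GB figures under the full multi-head assumption, and also shows the effect of 8 key/value heads, the grouped-query configuration Llama-3-8B uses. Sizes are reported in binary GiB.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens,
                   batch=1, bytes_per_value=2):
    """Keys plus values (factor 2) for every layer, head, and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens * batch

GIB = 1024 ** 3

per_token_bytes = kv_cache_bytes(32, 32, 128, tokens=1)
batch_1_gib = kv_cache_bytes(32, 32, 128, tokens=8192) / GIB
batch_32_gib = kv_cache_bytes(32, 32, 128, tokens=8192, batch=32) / GIB
gqa_batch_1_gib = kv_cache_bytes(32, 8, 128, tokens=8192) / GIB  # 8 KV heads

print(f"per token: {per_token_bytes / 1024 ** 2} MiB")
print(f"8k context, batch 1:  {batch_1_gib} GiB")
print(f"8k context, batch 32: {batch_32_gib} GiB")
print(f"8k context, batch 1, 8 KV heads: {gqa_batch_1_gib} GiB")
```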

The VRAM selection logic during the training phase is entirely different from inference because it needs to store gradients and optimizer states. When training in FP32 using the Adam optimizer, each parameter corresponds to a 4-byte weight, a 4-byte gradient, and an 8-byte optimizer state.

Activation VRAM occupancy is positively correlated with model depth and input data volume. During backpropagation, to calculate gradients, the output of every layer from the forward pass must be retained in VRAM. Adopting Activation Checkpointing technology can sacrifice about 33% of computational throughput in exchange for substantial VRAM savings; this technique stores activation values of only a fraction of the layers, with the remaining layers regenerated via recalculation when needed.

VRAM layout in multi-node, multi-card environments involves Tensor Parallelism and Pipeline Parallelism. In TP2 mode, the 140GB weights of a 70B model are evenly sliced across two cards, with each card shouldering 70GB of weights. To support high-speed data exchange between cards, NVLink provides a unidirectional bandwidth of 450GB/s, compressing cross-card access latency to the microsecond level.

VRAM Bandwidth is another metric used to measure performance. The bandwidth of an NVIDIA RTX 4090 is 1TB/s, while the H100 reaches 3.35TB/s. During inference, every Token generated by the model requires reading the complete set of weights once. If VRAM bandwidth is insufficient, even with a massive VRAM capacity, inference speed will be constrained, causing a bottleneck in the number of words generated per second.

GPU Model | VRAM Type | VRAM Capacity (GB) | Bandwidth (TB/s) | Compute TFLOPS (FP16)
--- | --- | --- | --- | ---
RTX 4090 | GDDR6X | 24 | 1.0 | 82.6
A100 | HBM2e | 80 | 2.0 | 312
H100 | HBM3 | 80 | 3.35 | 989
B200 | HBM3e | 192 | 8.0 | 2250
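
Because each generated token reads the full weight set once, memory bandwidth divided by weight size gives a rough ceiling on single-stream decode speed. The sketch below uses the bandwidth and footprint figures from the tables in this section; it ignores KV cache reads, batching, and compute limits, so real throughput will be lower.

```python
def decode_tokens_per_s_ceiling(bandwidth_tb_s, weight_gb):
    """Upper bound on tokens per second when decoding is bandwidth-bound."""
    return bandwidth_tb_s * 1000.0 / weight_gb

h100_int4 = decode_tokens_per_s_ceiling(3.35, 35.0)   # 70B INT4 on H100
a100_int4 = decode_tokens_per_s_ceiling(2.0, 35.0)    # 70B INT4 on A100
b200_fp16 = decode_tokens_per_s_ceiling(8.0, 140.0)   # 70B FP16 on B200

print(f"H100, 70B INT4: {h100_int4:.1f} tokens/s")
print(f"A100, 70B INT4: {a100_int4:.1f} tokens/s")
print(f"B200, 70B FP16: {b200_fp16:.1f} tokens/s")
```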

As models enter the era of tens or even hundreds of billions of parameters, the MoE (Mixture of Experts) architecture has shifted VRAM distribution. Although the total parameter count of Mixtral 8x7B is close to 47B, only a fraction of the expert parameters are active during inference.

To address the issue of insufficient VRAM, CPU Offloading offers an alternative solution. By temporarily storing part of the weights in system RAM via a PCIe 5.0 interface, it's possible to run models that exceed GPU capacity. The bidirectional bandwidth of PCIe 5.0 x16 is 128GB/s; although far lower than the internal HBM bandwidth of VRAM, this solution can drastically reduce operating costs in large batch processing or non-real-time tasks.

In selection calculations, an additional ~10% of VRAM must be reserved as system redundancy, to handle CUDA Context startup, PyTorch operator caching, and temporary tensor transformations. If the calculated total footprint is 78GB, it will easily trigger an Out of Memory (OOM) error on an 80GB graphics card.

Frame Rate

To limit the MTP (Motion-to-Photon) latency to within 20 milliseconds to prevent motion sickness, Micro OLEDs require a refresh rate of 90Hz or 120Hz.

Calculated at a 1920x1080 resolution and 24-bit color depth, a 120Hz frame rate corresponds to a video data throughput of approximately 5.97 Gbps.

This amount of data phases out traditional SPI communication, forcing the motherboard MCU to possess a 4-lane MIPI DSI interface and rely on high-speed DMA to handle VRAM reads.

Visual Anti-Motion Sickness

The vestibular system in the inner ear registers head rotation within approximately 1 millisecond. When the visual changes seen through a screen lag behind what the vestibular system senses, the brain experiences a sensory conflict. Research commonly cited in this field indicates that when visual latency exceeds 20 milliseconds, most wearers begin to show motion sickness symptoms such as sweating and nausea within about 5 minutes.

In engineering, the time from when a sensor records motion to when the screen illuminates pixels is termed Motion-to-Photon (MTP) latency. The motherboard of a headset device must compress the time taken by the complete MTP signal link to within the 20-millisecond safety line.

The first stop on the physical link is the Inertial Measurement Unit (IMU). Gyroscopes and accelerometers sample at frequencies of 1000Hz or even 2000Hz, and the time consumed for a single data acquisition and attitude calculation is strictly controlled between 1 millisecond and 1.5 milliseconds.

The remaining 18.5-millisecond budget is entirely allocated to graphics rendering and screen refresh. If a Micro OLED uses a 60Hz frame rate, the physical dwell time of a single frame is as long as 16.66 milliseconds. The main chip (SoC) is left with less than 2 milliseconds to generate an image totaling 4 million pixels for both eyes, an impossible feat for current mobile silicon processes.

By boosting the physical frame rate to 90Hz, the single-frame cycle is shortened to 11.11 milliseconds. The rendering pipeline gains a processing margin of 7 to 8 milliseconds. Devices like the Meta Quest have long established 90Hz as the baseline passing mark. Under a 120Hz specification, the single-frame cycle is further abbreviated to 8.33 milliseconds, providing the main chip with a more generous task scheduling window.

  • Attitude Data Polling: Consumes 1 millisecond (based on a 1000Hz sampling rate setting).

  • GPU Image Rendering: Consumes 8 to 10 milliseconds (calculated based on 90Hz or 120Hz margins).

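  • MIPI Interface Transmission: Writing one uncompressed 1920x1080 RGB888 frame (about 49.8 Mbit) takes about 12 milliseconds at a 4 Gbps physical bandwidth; a figure near 3 milliseconds needs roughly 16 Gbps of effective bandwidth or a compressed stream.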

  • Pixel Physical Flipping: Micro OLED panel level transition, taking about 0.01 milliseconds.
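The budget arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the simplified model used in the text: the 20-millisecond limit minus the worst-case 1.5-millisecond IMU stage minus one full frame period. The constant and function names are illustrative only.

```python
MTP_LIMIT_MS = 20.0  # comfort threshold quoted in the article
IMU_MS = 1.5         # worst-case IMU sampling and attitude calculation


def frame_period_ms(refresh_hz: float) -> float:
    """Duration of one refresh cycle in milliseconds."""
    return 1000.0 / refresh_hz


def render_margin_ms(refresh_hz: float) -> float:
    """Time left for rendering after the IMU stage and one frame period."""
    return MTP_LIMIT_MS - IMU_MS - frame_period_ms(refresh_hz)


for hz in (60, 90, 120):
    print(f"{hz:>3} Hz: frame {frame_period_ms(hz):5.2f} ms, "
          f"render margin {render_margin_ms(hz):5.2f} ms")
```

Under this model the margin is about 1.8 milliseconds at 60Hz, 7.4 milliseconds at 90Hz, and 10.2 milliseconds at 120Hz.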

Beyond the mere frame rate number, pixel illumination duration (Persistence) is an independent physical variable affecting motion sickness. When the eyes track a moving object, if screen pixels continuously emit light throughout an 11.11-millisecond frame cycle, a motion smear from the previous instant will linger on the retina.

The deflection of liquid crystal molecules in traditional LCD panels takes 3 to 5 milliseconds. Silicon-based Micro OLEDs rely on semiconductor luminescence; the brightness change response time caused by level transitions is less than 10 microseconds, equipping them with the underlying hardware prerequisites to implement Low Persistence display technology.

The Display Driver IC (DDIC) inserts black intervals within each frame cycle. In 90Hz operation, the system lights the Micro OLED pixels for only 1.5 to 2 milliseconds and keeps the screen completely dark for the remaining 9 to 9.6 milliseconds of the 11.11-millisecond cycle.

This high-frequency strobing relies on the eye's persistence of vision. Because each image is shown only briefly and followed by a dark interval, its edges stay sharp on the retina, and the smear width shrinks by over 80%.

  • Observing the environment by turning the head at an angular velocity of 60 degrees/second.

  • Under 16.6-millisecond full-frame illumination conditions, a blur band about 15 pixels wide is produced in the line of sight.

  • By employing 2-millisecond low persistence illumination technology, the blur band is slashed to a width of 1.8 pixels, making it extremely difficult to perceive visually.
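The blur figures in this list follow from a simple product: angular velocity times illumination time times angular pixel density. The sketch below assumes a density of 15 pixels per degree, a value not stated in the text but implied by its numbers (60 degrees/second over 16.6 milliseconds is about 1 degree, which the text equates to about 15 pixels).

```python
def blur_width_px(angular_velocity_dps: float,
                  persistence_ms: float,
                  pixels_per_degree: float) -> float:
    """Width of the retinal smear, in pixels, while the pixel stays lit."""
    degrees_swept = angular_velocity_dps * persistence_ms / 1000.0
    return degrees_swept * pixels_per_degree


PIXELS_PER_DEGREE = 15.0  # assumption inferred from the article's figures

full = blur_width_px(60.0, 16.6, PIXELS_PER_DEGREE)  # full-frame illumination
low = blur_width_px(60.0, 2.0, PIXELS_PER_DEGREE)    # low persistence
print(f"full persistence: {full:.1f} px, low persistence: {low:.1f} px")
print(f"reduction: {100.0 * (1.0 - low / full):.0f}%")
```

Because the smear scales linearly with illumination time, shortening persistence has the same effect on blur as slowing the head movement by the same factor.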

When handling complex high-polygon 3D scenes, the rendering output of mobile chips may drop to 45 frames per second. Such frame-rate fluctuations immediately break the sub-20-millisecond MTP latency budget and can trigger nausea in users.

Display control systems address this with Asynchronous TimeWarp (ATW), a reprojection algorithm. When the GPU fails to submit a new frame within 11.11 milliseconds, a coprocessor fetches the previous frame from VRAM.

Using the latest IMU attitude data, the algorithm applies a matrix transformation that translates or rotates the previous frame's image, producing a synthetic frame aligned with the current head pose.

The reprojected frame is sent over the MIPI DSI channel and fills the missed refresh slot. The physical screen therefore keeps refreshing at a fixed 90Hz or 120Hz, which removes the visible stutter a dropped frame would otherwise cause.
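The rotational warp can be illustrated per pixel. The following is a minimal sketch, not a production ATW implementation: it assumes an ideal pinhole projection with a hypothetical focal length and principal point, ignores lens distortion, and uses an arbitrary sign convention for yaw and pitch. For each output pixel it returns the coordinate to sample in the previous frame.

```python
import math


def warp_source_pixel(u, v, yaw, pitch, focal_px, cx, cy):
    """Map an output pixel to the previous-frame coordinate to sample.

    yaw and pitch are the head rotation (radians) since the old frame
    was rendered; focal_px, cx, cy describe an assumed pinhole camera.
    """
    # Back-project the output pixel to a viewing ray.
    x, y, z = (u - cx) / focal_px, (v - cy) / focal_px, 1.0
    # Rotate the ray about the vertical axis (yaw).
    x, z = (x * math.cos(yaw) + z * math.sin(yaw),
            -x * math.sin(yaw) + z * math.cos(yaw))
    # Rotate the ray about the horizontal axis (pitch).
    y, z = (y * math.cos(pitch) - z * math.sin(pitch),
            y * math.sin(pitch) + z * math.cos(pitch))
    # Project the rotated ray back onto the image plane.
    return cx + focal_px * x / z, cy + focal_px * y / z


# Hypothetical 1080P eye buffer with a 900-pixel focal length.
src = warp_source_pixel(960, 540, math.radians(0.5), 0.0, 900.0, 960.0, 540.0)
print(src)
```

With zero rotation the function returns the input coordinate unchanged. For small angles the warp is close to a pure horizontal or vertical shift, which is why the text describes it as a planar translation or rotation.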

  • Frame Buffer Division: 3 buffer pools are deployed within the VRAM to physically separate rendering and reading operations.

  • High-Frequency Interrupt Signals: The VSync signal wakes up the microcontroller at a 120Hz frequency for VRAM data alignment.

  • Hardware Warping Engine: A 2D distortion correction unit completes pixel resampling of a 1080P image within 1 millisecond.

In Sony's PSVR2 setup, a 120Hz infrared eye-tracking camera is clock-synchronized with the 120Hz display refresh. After acquiring the gaze coordinates, the system renders only the foveal region, which accounts for about 10% of the total image area, at full resolution within a single 8.33-millisecond frame.

The rendering resolution of the peripheral areas is reduced to a quarter or even a sixteenth of the original. The downscaled tiles are stitched back together and sent through the display interface at 120Hz. Because the periphery is still refreshed at the full rate, peripheral vision retains a stable sense of space.

Independent dual-eye driving is also essential for stable depth perception. The left and right eyes each have their own high-resolution 2560x2560 Micro OLED panel. When refreshing at 120Hz, the motherboard must drive two physically separate high-speed video buses at once, each carrying about 18.9 Gbps of uncompressed 24-bit video, or more than 6 Gbps even after 3:1 compression.
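The saving from foveated rendering can be estimated from the two figures given here. This is a rough sketch that assumes the whole periphery is shaded at a single reduced scale; real systems typically blend several zones. The function name is illustrative.

```python
def foveated_workload(fovea_fraction: float, peripheral_scale: float) -> float:
    """Fraction of full-resolution pixels that still have to be shaded."""
    fovea = fovea_fraction                               # full resolution
    periphery = (1.0 - fovea_fraction) * peripheral_scale  # downscaled
    return fovea + periphery


for scale in (1 / 4, 1 / 16):
    share = foveated_workload(0.10, scale)
    print(f"periphery at 1/{round(1 / scale)}: {share:.1%} of full workload")
```

Under these assumptions the GPU shades roughly a third of the pixels with a quarter-resolution periphery, and about a sixth with a sixteenth-resolution periphery.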

Interface Throughput

The raw data rate for driving a single 2560x2560 Micro OLED screen follows directly from its pixel count. At 24-bit RGB color depth, a single frame contains 6,553,600 pixels and occupies 19.66 MB of VRAM. At 90Hz, the pure video payload reaches 14.16 Gbps.

Video signal transmission cannot merely send visible pixels. In adherence to VESA standard timing, the periphery of every image frame must be wrapped with a horizontal blanking period (HBP/HFP) and a vertical blanking period (VBP/VFP). The accompanying control data for these non-visible areas typically inflates the data load at the physical link layer by an extra 15% to 20%.
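The payload and blanking figures can be checked with a short calculation. This sketch treats blanking as a flat percentage on top of the visible payload, as the text does; actual overhead depends on the panel's timing parameters. Function names are illustrative.

```python
def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def payload_gbps(width: int, height: int, bits_per_pixel: int,
                 refresh_hz: float) -> float:
    """Visible-pixel data rate in Gbps (decimal giga)."""
    return width * height * bits_per_pixel * refresh_hz / 1e9


def link_gbps(payload: float, blanking_overhead: float) -> float:
    """Data rate on the link once blanking overhead is added."""
    return payload * (1.0 + blanking_overhead)


payload = payload_gbps(2560, 2560, 24, 90)
print(f"frame size: {frame_bytes(2560, 2560, 24) / 1e6:.2f} MB")
print(f"payload:    {payload:.2f} Gbps")
for overhead in (0.15, 0.20):
    print(f"with {overhead:.0%} blanking: {link_gbps(payload, overhead):.2f} Gbps")
```

With 15% to 20% blanking, the 14.16 Gbps payload grows to roughly 16.3 to 17.0 Gbps on the link.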

A data bridge is constructed between the motherboard end and the Display Driver IC (DDIC) utilizing the MIPI DSI protocol. Under the D-PHY v1.2 physical layer specification, the theoretical maximum transmission rate for a single differential data lane is rated at 2.5 Gbps. Outfitted with a standard 4-lane physical link, the theoretical total bandwidth limit secured by the system is 10 Gbps.

The 14.16 Gbps payload already exceeds the 10 Gbps ceiling of a 4-lane D-PHY v1.2 link. Hardware engineers are therefore moving to the C-PHY specification, which replaces traditional differential pairs with 3-wire Trios and embeds the clock in the data encoding. Each symbol carries about 2.28 bits (16 bits are mapped onto 7 symbols), so under the C-PHY v1.2 specification a symbol rate of 3.0 Gsps per Trio yields an equivalent throughput of 6.84 Gbps.

| Physical Layer Protocol Version | Hardware Routing Structure | Max Physical Throughput (Gbps) | Supported Max Frame Rate for 2.5K Screen |
|---|---|---|---|
| D-PHY (v1.2) | 4 Lane (8 wires) | 10.0 | 45 Hz |
| C-PHY (v1.2) | 3 Trio (9 wires) | 20.5 | 120 Hz |
| D-PHY (v2.0) | 4 Lane (8 wires) | 18.0 | 90 Hz |
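The throughput column can be derived from the per-lane and per-Trio rates. The sketch below uses the rates quoted in this section plus 4.5 Gbps per lane for D-PHY v2.0, which is inferred from the 18.0 Gbps total; the C-PHY conversion uses the exact 16/7 bits per symbol, so its result is slightly above the rounded figure in the table.

```python
CPHY_BITS_PER_SYMBOL = 16 / 7  # 16 bits are mapped onto 7 symbols


def dphy_gbps(lanes: int, gbps_per_lane: float) -> float:
    """Aggregate D-PHY throughput across all data lanes."""
    return lanes * gbps_per_lane


def cphy_gbps(trios: int, gsps_per_trio: float) -> float:
    """Aggregate C-PHY throughput across all Trios."""
    return trios * gsps_per_trio * CPHY_BITS_PER_SYMBOL


print(f"D-PHY v1.2, 4 lanes: {dphy_gbps(4, 2.5):.1f} Gbps")
print(f"C-PHY v1.2, 3 Trios: {cphy_gbps(3, 3.0):.1f} Gbps")
print(f"D-PHY v2.0, 4 lanes: {dphy_gbps(4, 4.5):.1f} Gbps")
```

The 9-wire C-PHY link carries about twice the data of the 8-wire D-PHY v1.2 link, which is the main reason for the switch.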

Even with an upgraded physical layer, the high-frequency flipping electrical signals will still generate severe electromagnetic interference (EMI) and exacerbate battery drain on wearable devices. System integrators deploy VESA Display Stream Compression (DSC) hardware codec modules between the SoC end and the display panel DDIC.

The DSC 1.2a standard uses a constant-bitrate algorithm: through predictive coding and an indexed color history, it compresses pixel data in real time as it is read out of VRAM. The algorithm reduces 24-bit RGB pixel data to 8 bits per pixel, a 3:1 compression ratio.

After enabling the DSC module, the original 14.16 Gbps video payload is trimmed down to 4.72 Gbps. Two D-PHY lanes running at 2.5 Gbps each (5 Gbps in total) are then enough to carry the compressed payload at a 90Hz refresh rate, halving the number of data lanes compared with a 4-lane link, although only about 0.28 Gbps of headroom remains for blanking and protocol overhead.
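The lane count follows from dividing the compressed payload by the per-lane rate and rounding up. This is a minimal sketch that considers payload only and ignores blanking and protocol overhead; the function names are illustrative.

```python
import math


def compressed_gbps(raw_gbps: float, source_bpp: int = 24,
                    target_bpp: int = 8) -> float:
    """Payload after DSC reduces each pixel from source_bpp to target_bpp."""
    return raw_gbps * target_bpp / source_bpp


def lanes_needed(payload_gbps: float, gbps_per_lane: float) -> int:
    """Smallest number of lanes whose combined rate covers the payload."""
    return math.ceil(payload_gbps / gbps_per_lane)


raw = 14.155776  # 2560x2560, 24-bit, 90 Hz
dsc = compressed_gbps(raw)
print(f"raw: {raw:.2f} Gbps -> {lanes_needed(raw, 2.5)} lanes at 2.5 Gbps")
print(f"DSC: {dsc:.2f} Gbps -> {lanes_needed(dsc, 2.5)} lanes at 2.5 Gbps")
```

Without compression the same payload would need 6 lanes at 2.5 Gbps, more than a standard 4-lane D-PHY link provides.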

Headset devices typically need to drive both left and right independent Micro OLED panels concurrently. When both 2.5K resolution screens are set to a 120Hz refresh rate, the system needs to process two colossal data streams in parallel. The main chip must be furnished with dual independent MIPI DSI transmitting controllers.

The interface throughput demands equally fast memory reads upstream. For dual 2.5K screens refreshing at 120Hz, the memory controller (DDR controller) must give the display path very high priority to sustain a continuous read rate exceeding 3.8 GB/s; with uncompressed 24-bit frame buffers the figure works out to about 4.7 GB/s (2 × 19.66 MB × 120).

The Direct Memory Access (DMA) unit inside the microcontroller handles the entire transfer. The display DMA reads data from the LPDDR5 memory chips in bursts of 16 or 32 bytes and feeds them continuously into the asynchronous FIFO buffer of the MIPI module.

The FIFO depth is typically set between 1KB and 4KB. If heavy traffic on the system bus stalls DMA transfers for more than about 5 microseconds, the FIFO drains and the display side reports a low-level underrun error, which shows up as screen tearing or black flicker.
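How long a full FIFO can bridge a DMA stall depends on its depth and the rate at which the link drains it. The sketch below assumes the drain rate equals one panel's DSC-compressed stream at 120Hz (2560x2560, 8 bits per pixel, about 6.29 Gbps); that pairing is an assumption chosen for illustration, not a figure stated in the text.

```python
def fifo_drain_time_us(fifo_bytes: int, link_gbps: float) -> float:
    """Microseconds a full FIFO can feed the link without being refilled."""
    bytes_per_us = link_gbps * 1e9 / 8 / 1e6
    return fifo_bytes / bytes_per_us


LINK_GBPS = 2560 * 2560 * 8 * 120 / 1e9  # assumed compressed stream rate

for depth in (1024, 4096):
    print(f"{depth} B FIFO: {fifo_drain_time_us(depth, LINK_GBPS):.2f} us")
```

Under this assumption a 4KB FIFO lasts a little over 5 microseconds, in line with the stall tolerance mentioned above, while a 1KB FIFO empties in about 1.3 microseconds.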

When the device displays static images, the system triggers a low-power state transition for the MIPI link. The MCU dials down the transmission clock frequency from 1 GHz to 200 MHz, and the frame rate concurrently drops to 30 Hz. The link layer transmits a specific Escape Mode Sequence to instruct the panel's DDIC to match the speed reduction.

To prevent screen tearing during frame rate switching, the panel sends a synchronization pulse to the MCU over a dedicated Tearing Effect (TE) pin. The MCU may start pushing a new frame's video data into the DSI interface only within 2 microseconds after its hardware interrupt pin captures the rising edge of the TE signal.
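The TE gating rule can be expressed as a simple timing check. This is an illustrative model only: on real hardware the decision is made inside the interrupt handler or by the DSI peripheral itself, and the timestamps and names below are hypothetical.

```python
TE_WINDOW_US = 2.0  # window after the TE rising edge, per the text


def may_start_frame(te_edge_us: float, now_us: float,
                    window_us: float = TE_WINDOW_US) -> bool:
    """True if now_us lies inside the window opened by the TE rising edge."""
    elapsed = now_us - te_edge_us
    return 0.0 <= elapsed <= window_us


te_edge = 1000.0  # hypothetical timestamp of the captured TE rising edge
for now in (999.5, 1001.5, 1003.0):
    state = "send frame" if may_start_frame(te_edge, now) else "wait for next TE"
    print(f"t = {now:7.1f} us: {state}")
```

A transfer request that misses the window is held until the next TE pulse, so a new frame never starts while the panel is in the middle of scanning out the previous one.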
