A transflective display combines a transmissive backlight with a reflective layer, using ambient light outdoors (cutting power draw by roughly 30%) and the backlight indoors. It achieves high outdoor contrast (e.g., 1000:1 versus 200:1 on purely transmissive screens), reducing glare and keeping content readable across lighting conditions without manual switching.
Video See-Through
Unlike optical systems, VST relies on high-resolution cameras—typically dual 1080p or 4K sensors running at 30 to 60 frames per second (FPS)—to act as the system's "eyes." This video feed is processed by a powerful System-on-a-Chip (SoC) or dedicated GPU that aligns and overlays 3D graphics with an end-to-end latency target of under 20 milliseconds (ms) to minimize user discomfort.
The pipeline begins with the stereo cameras, which are precisely calibrated to mimic human interpupillary distance (IPD), typically set between 60-70 mm. These cameras continuously capture the real-world environment. The raw video data, which can exceed 1 Gbps for uncompressed 4K/60 fps feeds, is streamed into the processing unit. Here, a complex series of tasks occurs in parallel. The system uses data from Inertial Measurement Units (IMUs) sampling at 1000 Hz to predict head movement, while computer vision algorithms analyze the video feed to perform 6-Degree-of-Freedom (6DoF) positional tracking, creating a real-time 3D map of the room.
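As a rough sanity check on that bandwidth claim, the sketch below estimates the raw data rate of an uncompressed stereo 4K/60 fps feed. The 12-bit Bayer readout is an illustrative assumption, not the specification of any particular headset, and the result comes out well above 1 Gbps.

```python
# Back-of-the-envelope estimate of the raw bandwidth of a stereo VST camera feed.
# Assumes 4K (3840x2160) sensors at 60 fps with 12-bit Bayer readout; these are
# illustrative numbers, not the spec of any particular device.

WIDTH, HEIGHT = 3840, 2160      # 4K UHD resolution per camera
FPS = 60                        # frames per second
BITS_PER_PIXEL = 12             # typical raw Bayer sensor depth (assumption)
NUM_CAMERAS = 2                 # stereo pair

bits_per_second = WIDTH * HEIGHT * BITS_PER_PIXEL * FPS * NUM_CAMERAS
print(f"Uncompressed stereo feed: {bits_per_second / 1e9:.1f} Gbps")  # ~11.9 Gbps raw
```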
The most critical performance metric for VST is motion-to-photon latency—the delay between a user moving their head and the display updating accordingly. A latency exceeding 20 ms is perceptible to most users, and delays over 50 ms significantly increase the risk of simulator sickness (cybersickness). To combat this, systems employ reprojection techniques, where the rendered image is warped using the latest head-tracking data just before it is sent to the display. This can reduce perceived latency by 30-40%, but it introduces a small image distortion error, usually less than 0.5 pixels.
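The snippet below is a minimal sketch of rotational reprojection (often called "timewarp"): the rendered frame is corrected for the head rotation measured since render time via the homography K·ΔR·K⁻¹. The camera intrinsics and rotation value are made-up placeholders, and a real pipeline would warp per-eye on the GPU rather than with OpenCV on the CPU.

```python
import cv2
import numpy as np

def reproject_frame(frame, K, delta_rotation):
    """Warp a rendered frame to account for the head rotation measured
    since render time. For pure rotation the image-space correction is the
    homography H = K @ R @ K^-1; translation is ignored in this sketch."""
    H = K @ delta_rotation @ np.linalg.inv(K)
    h, w = frame.shape[:2]
    return cv2.warpPerspective(frame, H, (w, h))

# Placeholder intrinsics for a 1920x1080 eye buffer (illustrative values).
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])

# Small yaw of ~0.5 degrees accumulated between render and scan-out.
yaw = np.radians(0.5)
delta_R = np.array([[ np.cos(yaw), 0.0, np.sin(yaw)],
                    [         0.0, 1.0,         0.0],
                    [-np.sin(yaw), 0.0, np.cos(yaw)]])

rendered = np.zeros((1080, 1920, 3), dtype=np.uint8)  # stand-in for the rendered eye buffer
corrected = reproject_frame(rendered, K, delta_R)
```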
The primary advantage of VST is its ability to achieve perfect occlusion and consistent lighting between real and virtual objects. Since the entire scene is digitized, a virtual object can be seamlessly placed behind a real physical desk.
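Conceptually, this is a per-pixel depth test at composition time: with a depth map of the real scene from the stereo cameras and the depth buffer of the rendered content, a virtual pixel is drawn only where it is closer than the real surface. The array names, depth convention (meters, smaller is closer), and toy values below are illustrative assumptions, not any specific engine's API.

```python
import numpy as np

def composite_with_occlusion(camera_rgb, real_depth, virtual_rgb, virtual_depth):
    """Overlay virtual content on the camera feed, hiding pixels that lie
    behind real geometry. Depths are in meters; smaller values are closer."""
    # A virtual pixel is visible only where it is closer than the real surface
    # and where something was actually rendered (finite depth).
    visible = (virtual_depth < real_depth) & np.isfinite(virtual_depth)
    out = camera_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out

# Toy 2x2 example: the virtual object sits at 2.0 m; the real desk at 1.5 m
# covers the lower row, so the lower half of the object is occluded.
camera  = np.zeros((2, 2, 3), dtype=np.uint8)
virtual = np.full((2, 2, 3), 255, dtype=np.uint8)
real_d  = np.array([[3.0, 3.0], [1.5, 1.5]])
virt_d  = np.full((2, 2), 2.0)
print(composite_with_occlusion(camera, real_d, virtual, virt_d)[..., 0])
# [[255 255]
#  [  0   0]]
```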
Optical See-Through
Optical See-Through (OST) AR technology, utilized in devices like the Microsoft HoloLens 2, achieves its magic not by processing a video feed, but by guiding light. It allows users to see the physical world directly through optical combiners—often waveguides thinner than 1 mm—while micro-projectors inject digital imagery into this light path. This fundamental approach results in a near-zero latency view of reality, a critical safety feature, and enables all-day use with a typical power consumption 30-50% lower than video-based systems.
The optical combiner has to do two jobs at once. First, it must stay transparent: high-end waveguides can achieve a light transmittance of over 80%, meaning more than 80% of the ambient light passes through to the user's eyes. Second, it must act as a screen for the miniature projectors housed in the glasses' arms. These projectors, sometimes based on Laser Beam Scanning (LBS) or micro-LED technologies, generate the digital images. The combiner uses microscopic diffractive gratings—patterns etched with nanometer-scale precision—to "bend" the light from the projectors and direct it into the user's pupils.
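To get a feel for what those numbers mean at the eye, the sketch below blends the real scene (dimmed by the combiner's see-through transmittance) with a digital overlay and reports a simple Weber-style contrast. The 0.8 transmittance matches the >80% figure above; the ambient levels and the 1,000-nit overlay are illustrative assumptions rather than measurements of any device.

```python
def overlay_contrast(ambient_nits, overlay_nits_at_eye, transmittance=0.8):
    """Weber-style contrast of a virtual overlay against the real scene seen
    through a combiner. transmittance is the see-through fraction; the overlay
    value is the digital luminance actually delivered to the eye."""
    background = ambient_nits * transmittance   # real world, dimmed by the combiner
    return (background + overlay_nits_at_eye) / background

# Illustrative scenes: a 500-nit office versus roughly 10,000-nit outdoor light.
for scene, ambient in (("office", 500), ("outdoors", 10_000)):
    print(f"{scene}: contrast {overlay_contrast(ambient, overlay_nits_at_eye=1_000):.2f}:1")
# office   -> ~3.5:1 (clearly visible)
# outdoors -> ~1.1:1 (nearly washed out)
```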
The key differentiator is the sub-1 millisecond optical latency. Because the real world is viewed directly through optics, not via a camera-sensor-screen pipeline, there is no processing delay. Your view of a moving hand or a fast-changing real-world event is instantaneous.
The virtual overlay, by contrast, still has to be tracked and rendered in real time. First, inertial measurement units (IMUs), including gyroscopes and accelerometers sampling at 1000 Hz, track the headset's rotation and movement with extreme speed. This initial tracking is combined with data from world-facing depth sensors and cameras that map the environment, identifying surfaces and anchors. A dedicated processing unit then renders the virtual objects from the correct perspective in real-time, typically targeting 60 frames per second (FPS) to ensure smooth motion.
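The head-rotation part of that pipeline is often illustrated with a complementary filter: integrate the fast gyroscope at 1000 Hz and gently correct its drift with the accelerometer's gravity estimate. This is a simplified single-axis sketch with made-up values, not the fusion algorithm of any particular headset.

```python
import math

def complementary_filter(pitch_deg, gyro_rate_dps, accel_xyz, dt=0.001, alpha=0.98):
    """One 1000 Hz update of a single-axis (pitch) complementary filter.

    gyro_rate_dps: angular rate around the pitch axis in degrees per second.
    accel_xyz: accelerometer reading (in g) used to estimate pitch from gravity.
    alpha: trust placed in the integrated gyro versus the accel estimate.
    """
    ax, ay, az = accel_xyz
    accel_pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    gyro_pitch = pitch_deg + gyro_rate_dps * dt
    return alpha * gyro_pitch + (1.0 - alpha) * accel_pitch

pitch = 0.0
# Simulate one second of a stationary headset whose gyro has a 0.5 deg/s bias:
# the accelerometer term keeps the estimate near zero instead of drifting 0.5 deg.
for _ in range(1000):
    pitch = complementary_filter(pitch, gyro_rate_dps=0.5, accel_xyz=(0.0, 0.0, 1.0))
print(f"estimated pitch after 1 s: {pitch:.3f} deg")  # ~0.02 deg of residual drift
```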
Because each user's eyes sit at a different separation, and shift behind the lenses as the headset moves, the system must account for interpupillary distance (IPD), which varies between 54 mm and 74 mm in adults. Any miscalibration can cause a registration error of several millimeters, making a virtual label appear to float away from its intended physical object. Advanced systems perform continuous eye-tracking at 120 Hz to adjust the image projection dynamically, compensating for these shifts and ensuring the digital content stays "locked" in place. Furthermore, achieving sufficient digital brightness, often measured in nits (candelas per square meter), is a constant battle. To be visible in a typical 500-nit office environment, the projector might need to output over 1000 nits, which directly impacts power consumption and device thermal management, often limiting operational sessions to 2-3 hours under full load.
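To make the registration problem concrete, the sketch below converts an eye-position error into an angular error and an approximate on-display pixel offset using small-angle geometry. The 25 pixels-per-degree figure and the depths are illustrative assumptions, and a real system would also have to model the combiner's focal plane.

```python
import math

PIXELS_PER_DEGREE = 25  # assumed display angular resolution (illustrative)

def misregistration(eye_offset_mm, content_depth_m):
    """Angular error (deg) and approximate on-display pixel error caused by
    rendering for an eye position that is off by eye_offset_mm, for content
    at content_depth_m. Simplified geometry; ignores the display focal plane."""
    angle_deg = math.degrees(math.atan2(eye_offset_mm / 1000.0, content_depth_m))
    return angle_deg, angle_deg * PIXELS_PER_DEGREE

for depth in (0.5, 1.0, 2.0):
    ang, px = misregistration(eye_offset_mm=3.0, content_depth_m=depth)
    print(f"3 mm eye offset, content at {depth} m: {ang:.2f} deg ≈ {px:.1f} px of drift")
```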
Comparing Both Technologies
Selecting between Optical See-Through (OST) and Video See-Through (VST) is a foundational decision in AR development, with implications for system cost, user safety, and application viability. While OST offers a direct, zero-latency view for tasks requiring real-world interaction, VST provides a controlled, immersive canvas for complex graphical overlays. The choice is rarely about which technology is universally better, but about which one's performance characteristics—measured in milliseconds of latency, lumens of brightness, and watts of power consumption—best align with the specific demands of a use case, from a 15-minute remote assistance call to an 8-hour warehouse picking shift.
OST systems provide a direct optical path, resulting in a consistent, sub-1-millisecond visual delay for the real world. This is why a surgeon using an OST headset sees their own hands moving in real-time. However, the projected graphics suffer from a registration error that can range from 0.5 to 2 degrees of arc, causing virtual objects to "swim" as the user's head moves. In contrast, a VST system has an inherent end-to-end latency of 20-50 milliseconds due to the camera capture, processing, and display pipeline, a key factor in cybersickness incidence rates of 10-15% in sensitive users. Yet, because the entire scene is digitized, VST achieves a registration accuracy an order of magnitude better than OST, with errors often below 0.1 degrees, allowing for pixel-perfect alignment of virtual and real objects.
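A quick way to compare those registration figures is to convert angular error into a physical offset at working distance, roughly offset ≈ distance × tan(error). The helper below uses the error values quoted above; the working distances are illustrative.

```python
import math

def offset_mm(error_deg, distance_m):
    """Lateral misalignment (mm) produced by an angular registration error
    at a given working distance."""
    return distance_m * 1000.0 * math.tan(math.radians(error_deg))

for label, err in (("OST worst case", 2.0), ("OST best case", 0.5), ("VST typical", 0.1)):
    print(f"{label}: {err} deg -> {offset_mm(err, 0.5):.1f} mm at 0.5 m, "
          f"{offset_mm(err, 2.0):.1f} mm at 2 m")
# 2.0 deg -> ~17 mm / ~70 mm;  0.5 deg -> ~4 mm / ~17 mm;  0.1 deg -> ~1 mm / ~3 mm
```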
Visual fidelity and situational awareness present a stark trade-off. OST devices struggle with virtual object luminosity, typically peaking at around 3,000 nits, making them difficult to see in direct sunlight exceeding 10,000 nits. Their key advantage is high optical transmittance, often over 80%, preserving the user's natural peripheral vision and depth perception. A VST system completely controls the visual feed; it can digitally amplify a dark scene or apply filters, but it presents the world through a limited field of view, typically 90-120 degrees, which can create a "tunnel vision" effect. The quality of the real-world view is entirely dependent on the resolution and dynamic range of the cameras, which, even at 4K resolution, provide a lower effective angular resolution than the human eye.
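The angular-resolution point can be made concrete with a pixels-per-degree estimate: divide the horizontal pixel count by the horizontal field of view and compare against the roughly 60 pixels per degree often cited for 20/20 vision. The two configurations below are illustrative, not specific products.

```python
HUMAN_EYE_PPD = 60  # approximate pixels-per-degree equivalent of 20/20 acuity

def pixels_per_degree(horizontal_pixels, horizontal_fov_deg):
    """Average angular resolution of a display, ignoring lens distortion."""
    return horizontal_pixels / horizontal_fov_deg

for name, px, fov in (("VST example: 4K per eye over 100 deg", 3840, 100),
                      ("OST example: 1920 px over 45 deg",     1920, 45)):
    ppd = pixels_per_degree(px, fov)
    print(f"{name}: {ppd:.0f} ppd ({ppd / HUMAN_EYE_PPD:.0%} of 20/20 acuity)")
```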
OST designs are often more power-efficient, consuming 4-6 watts for the display system, enabling longer battery life of 3-6 hours for mobile use. Their primary hardware challenge is the precise manufacturing of waveguides, with yields on complex designs sometimes below 30%, driving up unit cost. VST systems are computationally intensive, requiring a sustained processing throughput of 3-5 TFLOPS, leading to power draws of 10-20 watts and often necessitating a tether or a large, heavy battery, limiting untethered sessions to 60-90 minutes.
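Those power figures map directly to untethered runtime: hours ≈ battery watt-hours ÷ average draw. The 20 Wh battery capacity below is an illustrative assumption chosen to land near the session lengths quoted above.

```python
def runtime_hours(battery_wh, avg_power_w):
    """Estimated untethered session length, ignoring thermal throttling
    and battery derating."""
    return battery_wh / avg_power_w

print(f"OST example: {runtime_hours(battery_wh=20, avg_power_w=5):.1f} h")   # ~4 h
print(f"VST example: {runtime_hours(battery_wh=20, avg_power_w=15):.1f} h")  # ~1.3 h (~80 min)
```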
If the application involves users moving freely for over 2 hours in a dynamic physical environment where safety is paramount, OST is the default choice. Its zero latency and direct view mitigate physical risk. Conversely, if the goal is maximum graphical realism, perfect occlusion, and digital manipulation of the real world for periods under 90 minutes, VST provides a superior, albeit more computationally expensive, experience.
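As a compact restatement of that rule of thumb, the sketch below encodes the deciding factors named above (session length, safety-critical mobility, and the need for full occlusion) into a helper function. The 120-minute and 90-minute thresholds mirror the figures in the text and are heuristics, not hard limits.

```python
def recommend_display(session_minutes: float, safety_critical_mobility: bool,
                      needs_full_occlusion: bool) -> str:
    """Heuristic OST-vs-VST recommendation based on the trade-offs above."""
    if safety_critical_mobility or session_minutes > 120:
        # Direct optical view and lower power favor OST for long, mobile, risky tasks.
        return "OST"
    if needs_full_occlusion and session_minutes <= 90:
        # A fully digitized scene favors VST for short, graphics-heavy sessions.
        return "VST"
    return "either (prototype both against the use case)"

print(recommend_display(480, safety_critical_mobility=True, needs_full_occlusion=False))  # OST
print(recommend_display(15, safety_critical_mobility=False, needs_full_occlusion=True))   # VST
```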
Future of Optical See-Through
The trajectory of Optical See-Through (OST) AR is not just one of incremental improvement, but a fundamental reshaping of its capabilities, aimed squarely at overcoming the long-standing 30-40% user adoption barriers in enterprise and consumer markets. The next five years will see a convergence of materials science, photonics, and AI, driving advancements that target a 100x increase in computational efficiency, a >50% reduction in device weight and power consumption, and the achievement of a "visual reality" where digital and physical objects are perceptually indistinguishable.
Current waveguide combiners, while compact, suffer from low optical efficiency, often transmitting less than 15% of the projected light to the eye, resulting in dim images. Next-generation combiners using holographic optical elements (HOEs) and volume Bragg gratings are poised to increase this efficiency to over 30%, while simultaneously expanding the field of view from today's 40-50 degrees to a more immersive 70-90 degrees. This will be coupled with a shift in light engines from liquid crystal on silicon (LCoS) to micro-LED arrays smaller than 0.5 inches diagonally. These micro-LEDs offer peak brightness exceeding 1,000,000 nits (compared to today's 3,000-5,000 nits), a contrast ratio of over 1,000,000:1, and a power consumption reduction of approximately 50% per lumen emitted, finally making OST displays viable in bright sunlight.

Manufacturing these components at scale with nanometer-level precision (with feature sizes below 200 nm) remains the primary hurdle, but advances in semiconductor fabrication techniques are projected to increase production yields from below 30% to over 70% in the next 36 months, dramatically lowering unit costs.
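The yield projection translates fairly directly into unit cost, since the good parts on a wafer have to absorb the cost of the whole wafer. The wafer cost and parts-per-wafer below are made-up illustrative numbers; only the 30% and 70% yield figures come from the text.

```python
def combiner_unit_cost(wafer_cost_usd, parts_per_wafer, yield_fraction):
    """Effective cost per usable waveguide combiner: cost scales with 1 / yield
    because defective parts still consume wafer area and processing."""
    good_parts = parts_per_wafer * yield_fraction
    return wafer_cost_usd / good_parts

today  = combiner_unit_cost(wafer_cost_usd=2_000, parts_per_wafer=50, yield_fraction=0.30)
future = combiner_unit_cost(wafer_cost_usd=2_000, parts_per_wafer=50, yield_fraction=0.70)
print(f"cost per good combiner: ${today:.0f} at 30% yield vs ${future:.0f} at 70% yield")
# roughly a 2.3x reduction from the yield improvement alone
```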
The perennial issue of registration mismatch, with errors of 2-5 millimeters, will be tackled by a fusion of advanced sensor data. Systems will move beyond standard 6-degree-of-freedom (6DoF) tracking to always-on, full-environment understanding, built on:
- High-resolution depth sensors with a 10-meter range and an accuracy of ±1 cm, building a persistent, millimeter-accurate 3D map of the user's environment.
- Event-based cameras that operate with a microsecond-level latency, capturing only changes in the scene to provide instantaneous tracking updates without the data overload of traditional video.
- On-device machine learning models that continuously calibrate the display in real-time based on eye-tracking data sampled at 120 Hz, predicting and correcting for chromatic aberration and distortions with a sub-millimeter positional error target.
This sensor fusion will enable persistent occlusion, where virtual objects correctly appear behind real-world geometry even as the user moves, a feat that requires a localization accuracy with a standard deviation of less than 0.5 degrees.