Colocated Large-Space Multiplayer MR

This case study is derived from the SpatialXR (广州虚境起源) "VR-MR large-space multiplayer" Unity video course. The course ships as video only (.sz files, which are renamed MP4s without subtitles) plus password-locked RAR materials, so what follows is the documented colocation pipeline, not a runnable Unity app. The Netcode SDK used for networking is not named in the readable materials, so it is left unverified.

The hardest step in colocated MR: making multiple headsets agree on one coordinate system

Single-user VR never hits this problem. The moment two people share one physical space and look at the same virtual content, it has to be solved: each headset builds its own local coordinate frame at boot, so the origins do not coincide and the orientations differ. Without alignment, a table that A sees on the left appears on the right for B, and the shared experience collapses.
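
To make the mismatch concrete, here is a minimal top-down (floor-plane) sketch in Python. It is not taken from the course: the function name, the boot poses, and the "x = right, y = forward" axis convention are all assumptions chosen for illustration. It expresses one physical table in two headset frames that booted at different spots and headings.

```python
import math

def world_to_local(point, origin, yaw_deg):
    """Express a floor-plane point (x, y) in a headset frame whose origin and
    heading (yaw, degrees, counterclockwise) are given in room coordinates.
    Assumed convention: local x = right, local y = forward."""
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    # Rotate the offset by -yaw to move from room axes into headset axes.
    return (c * dx + s * dy, -s * dx + c * dy)

# Hypothetical boot poses: A and B power on facing each other, 4 m apart.
table = (-1.0, 2.0)
table_in_a = world_to_local(table, origin=(0.0, 0.0), yaw_deg=0.0)
table_in_b = world_to_local(table, origin=(0.0, 4.0), yaw_deg=180.0)

print("A sees table at", tuple(round(v, 3) for v in table_in_a))
print("B sees table at", tuple(round(v, 3) for v in table_in_b))
```

With these made-up poses the table's x coordinate is negative for A and positive for B: the same object sits on A's left and B's right, which is the disagreement alignment has to remove.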

The pipeline the course documents breaks this into clear steps:

spatial anchors
  └─> spatial alignment       ← multiple headsets converge to ONE shared origin
        └─> networked room
              └─> player + interactable-object state sync
                    └─> public-internet relay
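
One way to read the diagram is as a strict dependency chain. The sketch below encodes that reading in Python; the class and member names are hypothetical, and treating every arrow as a hard prerequisite is my assumption, not something the course states.

```python
from enum import IntEnum

class Stage(IntEnum):
    SPATIAL_ANCHORS = 1
    SPATIAL_ALIGNMENT = 2
    NETWORKED_ROOM = 3
    STATE_SYNC = 4
    PUBLIC_RELAY = 5

class ColocationSession:
    """Tracks which pipeline stages one headset has completed (hypothetical helper)."""

    def __init__(self):
        self.completed = []

    def complete(self, stage):
        if len(self.completed) == len(Stage):
            raise RuntimeError("pipeline already finished")
        expected = list(Stage)[len(self.completed)]
        if stage is not expected:
            raise RuntimeError(f"cannot run {stage.name} before {expected.name}")
        self.completed.append(stage)

    @property
    def can_sync_state(self):
        # Stages complete in order, so a finished room implies anchors and alignment are done.
        return Stage.NETWORKED_ROOM in self.completed

session = ColocationSession()
session.complete(Stage.SPATIAL_ANCHORS)
try:
    session.complete(Stage.STATE_SYNC)  # skipping alignment and the room
except RuntimeError as err:
    print(err)
```

The guard keeps poses from being broadcast before the frames agree, since a pose sent in an unaligned frame is meaningless to the receiver.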

The target platform is Pico (including enterprise large-space scenarios).

The 13-lesson sequence (from the course structure)

#	Lesson	Place in the pipeline
1	Environment setup	project baseline
2	Room	networked room
3	Player-data sync	state sync (player)
4	Interactable-object sync	state sync (object)
5	Complex-interaction sync	state sync (complex)
6	Anchor + alignment principle	colocation principle
7	Alignment implementation	spatial alignment
8	Public-net networking	public-net relay
9	Pico MR	platform adaptation
10	Gesture adaptation	input adaptation
11	Pico anchors	platform spatial anchors

The course splits "principle" (lesson 6) from "implementation" (lesson 7): first why multiple local frames must be snapped onto one shared origin, then how to implement it. That split mirrors the conceptual crux of colocated MR.

Before vs after alignment (the core of the demo replay)

Before alignment: the two headsets are each in their own local frame. For the same grabbed object, A computes one position and B computes another (drawn as a dashed "ghost box" in the demo).
After alignment: spatial alignment snaps the two local frames onto one shared origin. From then on all avatars and the shared object live in one coordinate system — A and B agree on positions, and only then does state sync mean anything.
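
The before/after difference can be sketched with plain rigid transforms. This is a minimal Python illustration under stated assumptions, not the course's implementation: it works in 2D (top-down, yaw only), assumes both headsets observe one shared anchor, and uses a hypothetical `Pose2D` type with made-up anchor readings. A real headset SDK would report 3D poses.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Pose2D:
    """Rigid transform on the floor plane: rotate by yaw (radians), then translate."""
    x: float
    y: float
    yaw: float

    def apply(self, point):
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        px, py = point
        return (self.x + c * px - s * py, self.y + s * px + c * py)

    def inverse(self):
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        # Inverse of (R, t) is (R^T, -R^T t).
        return Pose2D(-(c * self.x + s * self.y), s * self.x - c * self.y, -self.yaw)

    def compose(self, other):
        """Return the transform that applies `other` first, then `self`."""
        ox, oy = self.apply((other.x, other.y))
        return Pose2D(ox, oy, self.yaw + other.yaw)

# Hypothetical readings: each headset reports the SAME physical anchor in its own frame.
anchor_in_a = Pose2D(1.0, 2.0, math.radians(30))
anchor_in_b = Pose2D(-0.5, 3.0, math.radians(-120))

# B -> A alignment: go from B's frame into the anchor's frame, then out into A's frame.
a_from_b = anchor_in_a.compose(anchor_in_b.inverse())

cube_in_b = (0.4, 1.2)                 # where B's headset holds the grabbed cube
ghost_in_a = cube_in_b                 # unaligned: A reuses B's numbers as-is
cube_in_a = a_from_b.apply(cube_in_b)  # aligned: same physical spot, in A's frame

print("ghost-box offset before alignment (m):", round(math.dist(ghost_in_a, cube_in_a), 3))
```

A quick sanity check on any such transform: mapping the anchor's position from B's frame through `a_from_b` should land on the anchor's position in A's frame. Here A's frame plays the role of the shared origin, matching the demo, where B's axes snap onto A's.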

The interactive demo uses a top-down floorplan to replay this step: the two headsets start with independent coordinate axes → scan and place spatial anchors → the "align" step snaps B's axes onto A's shared origin → the avatars and a shared grabbed object then stay positionally consistent across clients.

On Netcode: honestly marked as unverified

Once alignment is done, the rest is classic networked state sync (player poses, object transforms, complex interactions) over a public-internet relay, so people who are not on the same LAN can still join. Which Netcode SDK is actually used is not named in the materials I could read (the 3 PDFs); everything else is subtitle-less video and encrypted RAR. It is therefore marked only as "unverified, from video," with no guessed framework name.
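
Because the SDK is unknown, the following Python sketch is deliberately framework-free. It only illustrates the general idea that poses should cross the wire in the shared frame: the JSON message shape, the sequence-number rule for dropping stale packets, and every function name are my own assumptions, not anything documented in the course.

```python
import json
import math

def transform(pose, frame):
    """Re-express pose (x, y, yaw) through frame = (tx, ty, theta): rotate, then translate."""
    x, y, yaw = pose
    tx, ty, theta = frame
    c, s = math.cos(theta), math.sin(theta)
    return (tx + c * x - s * y, ty + s * x + c * y, yaw + theta)

def invert(frame):
    """Inverse rigid transform, so shared_from_local becomes local_from_shared."""
    tx, ty, theta = frame
    c, s = math.cos(theta), math.sin(theta)
    return (-(c * tx + s * ty), s * tx - c * ty, -theta)

def encode_update(object_id, seq, local_pose, shared_from_local):
    """Sender side: convert to the shared frame before the pose leaves the device."""
    x, y, yaw = transform(local_pose, shared_from_local)
    return json.dumps({"id": object_id, "seq": seq, "pose": [x, y, yaw]}).encode("utf-8")

def apply_update(scene, last_seq, payload, shared_from_local):
    """Receiver side: drop stale packets, then convert shared -> own local frame."""
    msg = json.loads(payload.decode("utf-8"))
    if msg["seq"] <= last_seq.get(msg["id"], -1):
        return False  # older than (or equal to) what was already applied
    last_seq[msg["id"]] = msg["seq"]
    scene[msg["id"]] = transform(tuple(msg["pose"]), invert(shared_from_local))
    return True

# Hypothetical alignment results: A's frame IS the shared frame; B's is offset and rotated.
shared_from_a = (0.0, 0.0, 0.0)
shared_from_b = (1.5, -0.5, math.radians(90))

packet = encode_update("cube", 1, (0.4, 1.2, 0.0), shared_from_b)  # B grabs the cube
scene_a, seq_a = {}, {}
apply_update(scene_a, seq_a, packet, shared_from_a)
print("A places cube at", tuple(round(v, 3) for v in scene_a["cube"]))
```

The design point is that neither client needs to know the other's local frame: each one converts only between its own frame and the shared one, which is what makes a relay between arbitrary peers workable.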

What this signals

Colocation / shared coordinate frames: understanding why "multiple headsets agreeing on one origin" is among the hardest XR problems, and the spatial-anchor → spatial-alignment convergence path
Layered state sync: player data / interactable objects / complex interactions synced in stages, not all at once
Platform landing: the concrete adaptation points for Pico large-space (incl. enterprise) + gesture adaptation + Pico spatial anchors
Senior signal + honest scoping: colocation is advanced XR engineering, and the write-up states plainly that this is a study-derived replay from a video-only course with the Netcode SDK unverified. Nothing is fabricated.

Demo strategy

What the demo replays

The interactive demo uses a top-down floorplan to replay the documented colocation pipeline: two headsets with independent local frames → spatial-anchor scan → spatial alignment snaps both frames onto one shared origin → avatars and a shared grabbed object stay positionally consistent across clients. The pipeline steps (spatial anchors / spatial alignment / state sync / public-net relay) and the Pico large-space target are taken from the course structure. This is a study-derived replay from a video-only course (no runnable Unity project), not a shipped app; the multiplayer Netcode SDK is unverified (video-only).

Public preview can be enabled later without redesigning the case-study layout