Detailed face mesh points capture "non-manual markers" (facial expressions that are essential to ASL grammar).

Normalize all points relative to a "root" point (e.g., the base of the neck or the center of the face) so that the features are invariant to where the person stands in the frame.
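A minimal sketch of root-relative normalization, assuming the landmarks for one frame arrive as a NumPy array of shape `(num_points, dims)`. The function name `normalize_to_root` and the `root_index` parameter are hypothetical; which row holds the root point depends on the tracker in use.

```python
import numpy as np


def normalize_to_root(landmarks: np.ndarray, root_index: int = 0) -> np.ndarray:
    """Translate landmarks so the chosen root landmark sits at the origin.

    landmarks: array of shape (num_points, dims), e.g. (x, y) or (x, y, z).
    root_index: row of the root point (an assumption; set it for your tracker).
    """
    landmarks = np.asarray(landmarks, dtype=np.float64)
    # Broadcasting subtracts the root coordinates from every point.
    return landmarks - landmarks[root_index]


# The same pose observed at two different positions in the frame.
pose = np.array([[0.50, 0.40], [0.40, 0.55], [0.60, 0.55]])
shifted = pose + np.array([0.25, -0.10])

# After normalization both versions yield the same features.
assert np.allclose(normalize_to_root(pose), normalize_to_root(shifted))
```

This removes only the translation. Differences in scale, such as the signer's distance from the camera, would need a separate step.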

Tracking the shoulders, elbows, and wrists defines the "signing space."

2. Temporal Normalization

To turn raw landmarks into a feature vector for a model (like a Transformer or LSTM), apply the following: