Abstract: Marker-based optical motion capture systems are costly, time-consuming, and dependent on specialized expertise. To address these limitations, this article proposes a visual kinematic analysis method that uses two viewpoints to achieve convenient, low-cost kinematic evaluation. First, a two-dimensional feature extraction architecture is established by integrating the global context modelling capability of the Swin Transformer, the precise positional awareness of coordinate attention, and the multi-scale feature fusion capability of the bidirectional feature pyramid network. This architecture overcomes challenges such as occlusion and small-target detection for keypoints, enabling effective extraction of two-dimensional features. Second, a triangulation method is proposed that employs joint contextual constraints based on keypoint position plausibility and limb length consistency. It is combined with a parametric human model to reconstruct 3D keypoints, enhancing estimation accuracy. Finally, a keypoint augmentation model is formulated to obtain an anatomical label set, which is then integrated with a musculoskeletal model for kinematic analysis. Kinematic evaluation on public datasets demonstrates an average joint angular error of 8.59° and an average joint positional error of 42.02 mm, outperforming existing high-performance methods. To validate real-world applicability, the proposed method and the mainstream OpenCap method are each compared against the commercial Xsens motion capture system, which serves as the evaluation benchmark, in analyses of shoulder joint and gait kinematics. Experimental results show that, for shoulder joint and gait kinematics, the proposed method achieves correlation coefficients with Xsens of 0.92 and 0.86, respectively, representing improvements of 9.52% and 7.40% over OpenCap. Angular errors are reduced to 13.97° and 3.12°, respectively, marking decreases of 27.01% and 25.18% compared to OpenCap.
In summary, the proposed method achieves more accurate kinematic analysis than current mainstream approaches both on public datasets and in real-world scenarios, which is significant for advancing applications that rely on kinematic analysis.
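To make the two-viewpoint reconstruction step concrete, the following is a minimal sketch of plain midpoint triangulation, a standard textbook technique and not the constrained method proposed in this article. It assumes each camera contributes a known optical centre and a back-projected ray direction for a given keypoint. The function names `triangulate_midpoint` and `limb_length`, and the returned ray gap used as a rough plausibility cue, are illustrative assumptions.

```python
import math


def _dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Midpoint triangulation of one keypoint from two viewing rays.

    c1, c2: camera centres; d1, d2: ray directions (need not be unit length).
    Returns (point, gap): the midpoint of the shortest segment joining the
    two rays, and the length of that segment. A large gap suggests that the
    two 2D detections are inconsistent with each other.
    """
    w = [a - b for a, b in zip(c1, c2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    point = [(u + v) / 2.0 for u, v in zip(p1, p2)]
    return point, math.dist(p1, p2)


def limb_length(joint_a, joint_b):
    """Euclidean distance between two reconstructed 3D joints."""
    return math.dist(joint_a, joint_b)
```

In such a sketch, a limb length consistency check could compare `limb_length` for the same pair of joints across frames and flag reconstructions whose length deviates strongly from the subject's typical value. How the article's method actually weights or enforces that constraint is not specified here.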