Journals
A Goldbraikh, O Shubi, O Rubin, CM Pugh, S Laufer,
"MS-TCRNet: Multi-Stage Temporal Convolutional Recurrent Networks for action segmentation using sensor-augmented kinematics",
Pattern Recognition, 2024
Action segmentation is a challenging task in high-level process analysis, typically performed on video or kinematic data obtained from various sensors. This work presents two contributions related to action segmentation on kinematic data. Firstly, we introduce two versions of Multi-Stage Temporal Convolutional Recurrent Networks (MS-TCRNet), specifically designed for kinematic data. The architectures consist of a prediction generator with intra-stage regularization and Bidirectional LSTM or GRU-based refinement stages. Secondly, we propose two new data augmentation techniques, World Frame Rotation and Hand Inversion, which utilize the strong geometric structure of kinematic data to improve algorithm performance and robustness. We evaluate our models on three datasets of surgical suturing tasks: the Variable Tissue Simulation (VTS) Dataset and the newly introduced Bowel Repair Simulation (BRS) Dataset, both of which are open surgery simulation datasets collected by us, as well as the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a well-known benchmark in robotic surgery. Our methods achieved state-of-the-art performance.
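A minimal sketch of the World Frame Rotation idea, assuming the kinematic input is an array of 3D sensor positions; the axis choice, angle range, and array layout are illustrative assumptions rather than the paper's exact implementation:

```python
import numpy as np

def world_frame_rotation(kinematics, max_angle_deg=15.0, rng=None):
    """Rotate all 3D positions by one random rotation about the vertical axis.

    kinematics: array of shape (T, S, 3) -- T time steps, S sensors, xyz positions.
    Returns an augmented copy; labels are unchanged because the gestures are
    invariant to the orientation of the world frame.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])  # rotation about the z (vertical) axis
    return kinematics @ R.T

# Example: augment 1000 frames from 2 sensors
aug = world_frame_rotation(np.random.randn(1000, 2, 3))
```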
Ido Zuckerman, Nicole Werner, Jonathan Kouchly, Emma Huston, Shannon DiMarco, Paul DiMusto, Shlomi Laufer,
"Depth over RGB: automatic evaluation of open surgery skills using depth camera",
International Journal of Computer Assisted Radiology and Surgery (2024): 1-9.
Purpose
In this paper, we present a novel approach to the automatic evaluation of open surgery skills using depth cameras. This work is intended to show that depth cameras achieve similar results to RGB cameras, which is the common method in the automatic evaluation of open surgery skills. Moreover, depth cameras offer advantages such as robustness to lighting variations, camera positioning, simplified data compression, and enhanced privacy, making them a promising alternative to RGB cameras.
Methods
Experts and novice surgeons completed two simulators of open suturing. We focused on hand and tool detection and action segmentation in suturing procedures. YOLOv8 was used for tool detection in RGB and depth videos. Furthermore, UVAST and MSTCN++ were used for action segmentation. Our study includes the collection and annotation of a dataset recorded with Azure Kinect.
Results
We demonstrated that using depth cameras in object detection and action segmentation achieves comparable results to RGB cameras. Furthermore, we analyzed 3D hand path length, revealing significant differences between experts and novice surgeons, emphasizing the potential of depth cameras in capturing surgical skills. We also investigated the influence of camera angles on measurement accuracy, highlighting the advantages of 3D cameras in providing a more accurate representation of hand movements.
Conclusion
Our research contributes to advancing the field of surgical skill assessment by leveraging depth cameras for more reliable and privacy-preserving evaluations. The findings suggest that depth cameras can be valuable in assessing surgical skills and provide a foundation for future research in this area.
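The 3D hand path length analysis could be computed along these lines, assuming per-frame 3D hand positions have already been extracted from the depth stream (an illustrative sketch, not the study's code):

```python
import numpy as np

def path_length_3d(positions):
    """Total distance travelled by a hand.

    positions: array of shape (T, 3) -- per-frame 3D hand position in metres,
    e.g. the centroid of the detected hand back-projected with the depth map.
    """
    steps = np.diff(positions, axis=0)             # frame-to-frame displacement
    return float(np.linalg.norm(steps, axis=1).sum())

# Example: a straight 1 m movement sampled at 30 fps
t = np.linspace(0.0, 1.0, 30)[:, None]
print(path_length_3d(t * np.array([1.0, 0.0, 0.0])))  # ~1.0
```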
Shahaf Arica, Or Rubin, Sapir Gershov, Shlomi Laufer,
"CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers",
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024
In this paper, we introduce VoteCut, an innovative method for unsupervised object discovery that leverages feature representations from multiple self-supervised models. VoteCut employs normalized-cut based graph partitioning, clustering and a pixel voting approach. Additionally, we present CuVLER (Cut-Vote-and-LEaRn), a zero-shot model trained using pseudo-labels generated by VoteCut, together with a novel soft target loss to refine segmentation accuracy.
Through rigorous evaluations across multiple datasets and several unsupervised setups, our methods demonstrate significant improvements in comparison to previous state-of-the-art models. Our ablation studies further highlight the contributions of each component, revealing the robustness and efficacy of our approach. Collectively, VoteCut and CuVLER pave the way for future advancements in image segmentation.
Sapir Gershov, Aeyal Raz, Erez Karpas and Shlomi Laufer,
"Towards an autonomous clinical decision support system",
Engineering Applications of Artificial Intelligence 127 (2024): 107215.
Clinicians’ decision-making is of utmost importance during critical situations. Thus, integrating Clinical Decision Support Systems (CDSS) may assist the medical staff by enhancing the decision-making process, eventually improving patient outcomes. The potential of an autonomous CDSS, proficient in predicting and guiding medical treatment, is significant—especially in situations where every second counts.
We proposed a methodology to design a CDSS based on observational data of clinical procedures. This approach employs graph-convolutional networks (GCN) to encapsulate medical knowledge from simulated clinical procedures with sequential data. Consequently, our model can extrapolate from these procedures, identifying novel structural and characteristic combinations. This innovative method harnesses information that might elude human observers. Moreover, our model generates action sequences that a human physician has not previously executed.
Traditional techniques tend to fall short in adapting to changing trends, thus failing to anticipate human actions. Conversely, advanced models like GCN have demonstrated promising potential in tasks like human action prediction, including activity recognition. We assessed these performances using benchmark datasets, which yielded encouraging results.
Additionally, we constructed a graph-based CDSS to deliver pertinent medical advice. We outline a methodology to monitor the procedure’s current stage and predict the physician’s subsequent action, facilitating time-saving measures like pre-emptive instrument preparation. Our novel CDSS methodology achieved an F1-score of 0.899 and 0.714 when performing one and two-step predictions, respectively. Furthermore, our simulations illustrate a considerable time-saving potential, with an average reduction of approximately 00:01:28 ± 00:01:15 min in the preparation time for adrenaline dosage, a crucial component for successful resuscitation.
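For background on the GCN component, a single graph-convolution propagation step (in the common Kipf-and-Welling form) looks roughly as follows; the toy procedure graph and dimensions are illustrative, not the paper's model:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution step: H = ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = (A_hat * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)

# Toy example: 4 procedure steps in a chain, 8-dim node features, 16 hidden units
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = gcn_layer(A, np.random.randn(4, 8), np.random.randn(8, 16))
print(H.shape)  # (4, 16)
```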
Liran Halperin, Gideon Sroka, Ido Zuckerman, Shlomi Laufer,
"Automatic performance evaluation of the intracorporeal suture exercise",
International Journal of Computer Assisted Radiology and Surgery, Volume 19, pages 83–86 (2024)
Purpose
This work uses deep learning algorithms to provide automated feedback on the suture with intracorporeal knot exercise in the Fundamentals of Laparoscopic Surgery simulator. Different metrics were designed to provide informative feedback to the user on how to complete the task more efficiently. The automation of the feedback will allow students to practice at any time without the supervision of experts.
Methods
Five residents and five senior surgeons participated in the study. Object detection, image classification, and semantic segmentation deep learning algorithms were used to collect statistics on the practitioner’s performance. Three task-specific metrics were defined. The metrics refer to the way the practitioner holds the needle before its insertion into the Penrose drain, and to the amount of movement of the Penrose drain during the needle’s insertion.
Results
Good agreement between the human labeling and the different algorithms’ performance and metric values was achieved. The difference between the scores of the senior surgeons and the surgical residents was statistically significant for one of the metrics.
Conclusion
We developed a system that provides performance metrics for the intracorporeal suture exercise. These metrics can help surgical residents practice independently and receive informative feedback on how they insert the needle into the Penrose drain.
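An illustrative way to quantify the Penrose drain movement metric, assuming a segmentation model already provides a per-frame binary mask of the drain during the insertion interval (the mask source and units are assumptions):

```python
import numpy as np

def drain_movement(masks):
    """Total centroid displacement of the Penrose drain, in pixels.

    masks: boolean array of shape (T, H, W) -- per-frame segmentation of the
    drain during the needle-insertion interval.
    """
    centroids = []
    for m in masks:
        ys, xs = np.nonzero(m)
        if xs.size:                     # skip frames where the drain is not detected
            centroids.append((xs.mean(), ys.mean()))
    centroids = np.asarray(centroids)
    if len(centroids) < 2:
        return 0.0
    return float(np.linalg.norm(np.diff(centroids, axis=0), axis=1).sum())
```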
Shlomi Laufer, Roberta L Klatzky, Carla M Pugh,
"Sensor-Based Discovery of Search and Palpation Modes in the Clinical Breast Examination",
Academic Medicine 99(4S): S89-S94, April 2024.
Purpose
Successful implementation of precision education systems requires widespread adoption and seamless integration of new technologies with unique data streams that facilitate real-time performance feedback. This paper explores the use of sensor technology to quantify hands-on clinical skills. The goal is to shorten the learning curve through objective and actionable feedback.
Method
A sensor-enabled clinical breast examination (CBE) simulator was used to capture force and video data from practicing clinicians (N = 152). Force-by-time markers from the sensor data and a machine learning algorithm were used to parse physicians’ CBE performance into periods of search and palpation; these periods were then used to investigate the distinguishing characteristics of successful versus unsuccessful attempts to identify masses in CBEs.
Results
Mastery performance from successful physicians showed stable levels of speed and force across the entire CBE and a 15% increase in force when in palpation mode compared with search mode. Unsuccessful physicians failed to search with sufficient force to detect deep masses (F[5,146] = 4.24, P = .001). While similar proportions of male and female physicians reached the highest performance level, males used more force as noted by higher palpation to search force ratios (t[63] = 2.52, P = .014).
Conclusions
Sensor technology can serve as a useful pathway to assess hands-on clinical skills and provide data-driven feedback. When using a sensor-enabled simulator, the authors found specific haptic approaches that were associated with successful CBE outcomes. Given this study’s findings, continued exploration of sensor technology in support of precision education for hands-on clinical skills is warranted.
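A simplified illustration of parsing a force-by-time signal into search and palpation periods; a plain force threshold with a minimum duration stands in here for the machine learning classifier used in the study:

```python
import numpy as np

def parse_modes(force, fs=100.0, thresh_n=2.0, min_dur_s=0.2):
    """Label each sample as palpation (sustained force above threshold) or search.

    force: 1-D array of force samples in Newtons, sampled at fs Hz.
    Returns a boolean array, True where the clinician is palpating.
    """
    above = force > thresh_n
    palpation = np.zeros_like(above)
    min_len = int(min_dur_s * fs)
    start = None
    for i, a in enumerate(np.append(above, False)):
        if a and start is None:
            start = i
        elif not a and start is not None:
            if i - start >= min_len:        # keep only sustained presses
                palpation[start:i] = True
            start = None
    return palpation
```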
Sapir Gershov, Daniel Braunold, Robert Spector, Alexander Ioscovich, Aeyal Raz and Shlomi Laufer,
"Automating Medical Simulations",
Journal of Biomedical Informatics 144 (2023): 104446.
Objective
This study aims to explore speech as an alternative modality for human activity recognition (HAR) in medical settings. While current HAR technologies rely on video and sensory modalities, they are often unsuitable for the medical environment due to interference from medical personnel, privacy concerns, and environmental limitations. Therefore, we propose an end-to-end, fully automatic objective checklist validation framework that utilizes medical personnel’s uttered speech to recognize and document the executed actions in a checklist format.
Methods
Our framework records, processes, and analyzes medical personnel’s speech to extract valuable information about performed actions. This information is then used to fill the corresponding rubrics in the checklist automatically.
Results
Our approach to activity recognition outperformed the online expert examiner, achieving an F1 score of 0.869 on verbal tasks and an ICC score of 0.822 with an offline examiner. Furthermore, the framework successfully identified communication failures and medical errors made by physicians and nurses.
Conclusion
Implementing a speech-based framework in medical settings, such as the emergency room and operation room, holds promise for improving care delivery and enabling the development of automated assistive technologies in various medical domains. By leveraging speech as a modality for HAR, we can overcome the limitations of existing technologies and enhance workflow efficiency and patient safety.
Aviad Lazar, Gideon Sroka, Shlomi Laufer,
"Automatic assessment of performance in the FLS trainer using computer vision",
Surgical Endoscopy 37.8 (2023): 6476-6482
Background
The Fundamentals of Laparoscopic Surgery (FLS) box trainer is a well-accepted method for training and evaluating laparoscopic skills. It requires an observer to measure and evaluate the trainee’s performance. Measuring performance in the Peg Transfer task includes time and penalty for dropping pegs. This study aimed to assess whether computer vision (CV) may be used to automatically measure performance in the FLS box trainer.
Methods
Four groups of metrics were defined and measured automatically using CV. Validity was assessed by dividing participants into 3 experience-level groups. Twenty-seven participants were recorded performing the Peg Transfer task 2–4 times, amounting to 72 videos. Frames were sampled from the videos and labeled to create an image dataset. Using these images, we trained a deep neural network (YOLOv4) to detect the different objects in the video. We developed an evaluation system that tracks the transfer of the triangles and produces a feedback report with the metrics being the main criteria. The metric groups were Time, Grasper Movement Speed, Path Efficiency, and Grasper Coordination. Performance was compared based on each participant’s last video (3 participants were excluded due to technical issues).
Results
The ANOVA tests show that for all metrics except one, the variance in performance can be explained by the experience level of participants. Senior surgeons and residents significantly outperform students and interns on almost every metric. Senior surgeons usually outperform residents, but the gap is not always significant.
Conclusion
The statistical analysis shows that the metrics can differentiate between experts and novices performing the task in several aspects. Thus, they may provide a more detailed performance analysis than is currently used. Moreover, these metrics are calculated automatically and rely solely on the video camera of the FLS trainer. As a result, they allow independent training and assessment.
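The Path Efficiency metric group can be approximated from per-frame grasper positions produced by the detector; the definition below (straight-line distance over travelled path) is an illustrative choice, not necessarily the exact formula used:

```python
import numpy as np

def path_efficiency(positions):
    """Ratio of the straight-line distance to the travelled path length (<= 1).

    positions: array of shape (T, 2) -- per-frame grasper tip position in pixels,
    e.g. the centre of its detected bounding box between pick-up and drop-off.
    """
    path = np.linalg.norm(np.diff(positions, axis=0), axis=1).sum()
    direct = np.linalg.norm(positions[-1] - positions[0])
    return float(direct / path) if path > 0 else 1.0
```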
Eddie Bkheet, Anne-Lise D’Angelo, Adam Goldbraikh, Shlomi Laufer,
"Using Hand Pose Estimation To Automate Open Surgery Training Feedback",
International Journal of Computer Assisted Radiology and Surgery 18.7 (2023): 1279-1285.
Purpose
This research aims to facilitate the use of state-of-the-art computer vision algorithms for the automated training of surgeons and the analysis of surgical footage. By estimating 2D hand poses, we model the movement of the practitioner’s hands, and their interaction with surgical instruments, to study their potential benefit for surgical training.
Methods
We leverage pre-trained models on a publicly available hands dataset to create our own in-house dataset of 100 open surgery simulation videos with 2D hand poses. We also assess the ability of pose estimations to segment surgical videos into gestures and tool-usage segments and compare them to kinematic sensors and I3D features. Furthermore, we introduce 6 novel surgical dexterity proxies stemming from domain experts’ training advice, all of which our framework can automatically detect given raw video footage.
Results
State-of-the-art gesture segmentation accuracy of 88.35% on the open surgery simulation dataset is achieved with the fusion of 2D poses and I3D features from multiple angles. The introduced surgical skill proxies presented significant differences for novices compared to experts and produced actionable feedback for improvement.
Conclusion
This research demonstrates the benefit of pose estimations for open surgery by analyzing their effectiveness in gesture segmentation and skill assessment. Gesture segmentation using pose estimations achieved comparable results to physical sensors while being remote and markerless. Surgical dexterity proxies that rely on pose estimation proved they can be used to work toward automated training feedback. We hope our findings encourage additional collaboration on novel skill proxies to make surgical training more efficient.
Calvin Perumalla, LaDonna Kearse, Michael Peven, Shlomi Laufer, Cassidi Goll, Brett Wise, Su Yang, Carla Pugh,
"AI-Based Video Segmentation: Procedural Steps or Basic Maneuvers?",
Journal of Surgical Research 283 (2023): 500-506.
Introduction
Video-based review of surgical procedures has proven to be useful in training by enabling efficiency in the qualitative assessment of surgical skill and intraoperative decision-making. Current video segmentation protocols focus largely on procedural steps. Although some operations are more complex than others, many of the steps in any given procedure involve an intricate choreography of basic maneuvers such as suturing, knot tying, and cutting. The use of these maneuvers at certain procedural steps can convey information that aids in the assessment of the complexity of the procedure, surgical preference, and skill. Our study aims to develop and evaluate an algorithm to identify these maneuvers.
Methods
A standard deep learning architecture was used to differentiate between suture throws, knot ties, and suture cutting on a data set comprising videos from practicing clinicians (N = 52) who participated in a simulated enterotomy repair. Perception of the added value to traditional artificial intelligence segmentation was explored by qualitatively examining the utility of identifying maneuvers in a subset of steps for an open colon resection.
Results
An accuracy of 84% was reached in differentiating maneuvers. The precision in detecting the basic maneuvers was 87.9%, 60%, and 90.9% for suture throws, knot ties, and suture cutting, respectively. The qualitative concept mapping confirmed realistic scenarios that could benefit from basic maneuver identification.
Conclusions
Basic maneuvers can indicate error management activity or safety measures and allow for the assessment of skill. Our deep learning algorithm identified basic maneuvers with reasonable accuracy. Such models can aid in artificial intelligence-assisted video review by providing additional information that can complement traditional video segmentation protocols.
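Per-maneuver precision figures such as those above can be computed directly from clip-level predictions; the label names and toy predictions below are placeholders:

```python
from sklearn.metrics import precision_score

classes = ["suture_throw", "knot_tie", "suture_cut"]   # illustrative label set
y_true = ["suture_throw", "knot_tie", "suture_cut", "knot_tie", "suture_throw"]
y_pred = ["suture_throw", "suture_cut", "suture_cut", "knot_tie", "suture_throw"]

# Per-class precision: of the clips predicted as each maneuver, how many were correct
per_class = precision_score(y_true, y_pred, labels=classes, average=None, zero_division=0)
for name, p in zip(classes, per_class):
    print(f"{name}: {p:.1%}")
```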
Kristina Basiev, Adam Goldbraikh, Carla M Pugh and Shlomi Laufer,
"Open surgery tool classification and hand utilization using a multi-camera system",
International Journal of Computer Assisted Radiology and Surgery (2022)
The goal of this work is to use multi-camera video to classify open surgery tools as well as identify which tool is held in each hand. Multi-camera systems help prevent occlusions in open surgery video data. Furthermore, combining multiple views, such as a Top-view camera covering the full operative field and a Close-up camera focusing on hand motion and anatomy, may provide a more comprehensive view of the surgical workflow. However, multi-camera data fusion poses a new challenge: a tool may be visible in one camera and not the other. Thus, we defined the global ground truth as the tools being used regardless of their visibility. Therefore, tools that are out of the image should be remembered for extensive periods of time while the system responds quickly to changes visible in the video. Participants (n=48) performed a simulated open bowel repair. A Top-view and a Close-up camera were used. YOLOv5 was used for tool and hand detection. A high-frequency LSTM with a 1 second window at 30 frames per second (fps) and a low-frequency LSTM with a 40 second window at 3 fps were used for spatial, temporal, and multi-camera integration. The accuracy and F1 of the six systems were: Top-view (0.88/0.88), Close-up (0.81/0.83), both cameras (0.9/0.9), high fps LSTM (0.92/0.93), low fps LSTM (0.9/0.91), and our final architecture, the Multi-camera classifier (0.93/0.94).
By combining a system with a high fps and a low fps from the multiple camera array, we improved the classification abilities of the global ground truth.
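The high-frequency/low-frequency fusion idea can be sketched as two LSTMs whose final hidden states are concatenated before classification; the feature dimension, hidden size, and window lengths below are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class DualRateClassifier(nn.Module):
    """Fuse a short high-fps window with a long low-fps window (illustrative sizes)."""

    def __init__(self, feat_dim=64, hidden=128, n_classes=5):
        super().__init__()
        self.fast = nn.LSTM(feat_dim, hidden, batch_first=True)   # e.g. 30 frames (1 s at 30 fps)
        self.slow = nn.LSTM(feat_dim, hidden, batch_first=True)   # e.g. 120 frames (40 s at 3 fps)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x_fast, x_slow):
        _, (h_fast, _) = self.fast(x_fast)     # final hidden state: (1, B, hidden)
        _, (h_slow, _) = self.slow(x_slow)
        fused = torch.cat([h_fast[-1], h_slow[-1]], dim=1)
        return self.head(fused)

# Example: per-frame detection features for a batch of 8 clips
model = DualRateClassifier()
logits = model(torch.randn(8, 30, 64), torch.randn(8, 120, 64))
print(logits.shape)  # torch.Size([8, 5])
```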
Adam Goldbraikh, Tomer Volk, Carla M. Pugh & Shlomi Laufer,
"Using open surgery simulation kinematic data for tool and gesture recognition",
International Journal of Computer Assisted Radiology and Surgery (2022)
Purpose
The use of motion sensors is emerging as a means for measuring surgical performance. Motion sensors are typically used for calculating performance metrics and assessing skill. The aim of this study was to identify surgical gestures and tools used during an open surgery suturing simulation based on motion sensor data.
Methods
Twenty-five participants performed a suturing task on a variable tissue simulator. Electromagnetic motion sensors were used to measure their performance. The current study compares GRU and LSTM networks, which are known to perform well on other kinematic datasets, as well as MS-TCN++, which was developed for video data and was adapted in this work for motion sensor data. Finally, we extended all architectures for multi-tasking.
Results
In the gesture recognition task, MS-TCN++ has the highest performance, with an accuracy of 82.4 ± 6.97, F1-Macro of 78.92 ± 8.5, edit distance of 86.30 ± 8.42, and F1@10 of 89.30 ± 7.01. In the tool usage recognition task for the right hand, MS-TCN++ performs the best in most metrics, with an accuracy score of 94.69 ± 3.57, F1-Macro of 86.06 ± 7.06, F1@10 of 84.34 ± 10.90, and F1@25 of 80.58 ± 12.03. The multi-task GRU performs best in all metrics in the left-hand case, with an accuracy of 95.04 ± 4.18, edit distance of 85.01 ± 16.94, F1-Macro of 89.81 ± 11.65, F1@10 of 89.17 ± 13.28, and F1@25 of 88.64 ± 13.6.
Conclusion
In this study, using motion sensor data, we automatically identified the surgical gestures and the tools used during an open surgery suturing simulation. Our methods may be used for computing more detailed performance metrics and assisting in automatic workflow analysis. MS-TCN++ performed better in gesture recognition as well as right-hand tool recognition, while the multi-task GRU provided better results in the left-hand case. It should be noted that our multi-task GRU network is significantly smaller and achieved competitive results in the rest of the tasks as well.
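The segmental F1@k scores reported above follow the standard action-segmentation definition: a predicted segment counts as a true positive if its IoU with an unmatched ground-truth segment of the same label is at least k. A compact sketch of that metric:

```python
import numpy as np

def segments(labels):
    """Split a frame-wise label sequence into (label, start, end) segments."""
    segs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segs.append((labels[start], start, i))
            start = i
    return segs

def f1_at_k(pred, gt, k=0.10):
    """Segmental F1@k for frame-wise predictions pred vs ground truth gt."""
    p_segs, g_segs = segments(pred), segments(gt)
    used = [False] * len(g_segs)
    tp = 0
    for lbl, s, e in p_segs:
        best, best_j = 0.0, -1
        for j, (gl, gs, ge) in enumerate(g_segs):
            if gl != lbl or used[j]:
                continue
            inter = max(0, min(e, ge) - max(s, gs))
            union = max(e, ge) - min(s, gs)
            iou = inter / union
            if iou > best:
                best, best_j = iou, j
        if best >= k and best_j >= 0:     # matched with sufficient overlap
            tp += 1
            used[best_j] = True
    fp = len(p_segs) - tp
    fn = len(g_segs) - tp
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

print(f1_at_k([0, 0, 1, 1, 1, 2], [0, 0, 0, 1, 1, 2]))  # 1.0
```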
Imri Amiel, Roi Anteby, Moti Cordoba, Shlomi Laufer, Chaya Shwaartz, Danny Rosin, Mordechai Gutman, Amitai Ziv, Roy Mashiach,
"Feedback based simulator training reduces superfluous forces exerted by novice residents practicing knot tying for vessel ligation",
The American Journal of Surgery 220.1 (2020): 100-104
Technological advances have led to the development of state-of-the-art simulators for training surgeons; few train basic surgical skills, such as vessel ligation. A novel low-cost bench-top simulator with auditory and visual feedback that measures forces exerted during knot tying was tested on 14 surgical residents. Pre- and post-training values for total force exerted during knot tying, maximum pulling and pushing forces, and completion time were compared. Mean time to reach proficiency during training was 11:26 min, with a mean of 15 consecutive knots. Mean total applied force for each knot was 35% lower post-training than pre-training (7.5 vs. 11.54 N, respectively, p = 0.039). Mean upward peak force was significantly lower after, compared to before, training (1.29 vs. 2.12 N, respectively, p = 0.004). Simulator training with visual and auditory force feedback improves knot-tying skills of novice surgeons.
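Force metrics of this kind could be summarized per knot roughly as follows, assuming the simulator logs a signed force signal (positive for pulling, negative for pushing); the sampling rate and sign convention are assumptions:

```python
import numpy as np

def knot_force_metrics(force, fs=100.0):
    """Summarise the force exerted during one knot.

    force: 1-D array of signed force samples in Newtons at fs Hz.
    """
    return {
        "mean_applied_force_N": float(np.abs(force).mean()),
        "peak_pull_N": float(force.max()),
        "peak_push_N": float(-force.min()),
        "duration_s": len(force) / fs,
    }
```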
Adam Goldbraikh, Anne-Lise D’Angelo, Carla M. Pugh & Shlomi Laufer,
"Video-based fully automatic assessment of open surgery suturing skills",
International Journal of Computer Assisted Radiology and Surgery (2022): 1-12
The goal of this study was to develop a new reliable open surgery suturing simulation system for training medical students in situations where resources are limited or in a home setting. Namely, we developed an algorithm for tool and hand localization as well as identifying the interactions between them based on simple webcam video data, calculating motion metrics for assessment of surgical skill. Twenty-five participants performed multiple suturing tasks using our simulator. The YOLO network was modified into a multi-task network for the purpose of tool localization and tool–hand interaction detection. This was accomplished by splitting the YOLO detection heads so that they supported both tasks with minimal addition to computer run-time. Furthermore, based on the outcome of the system, motion metrics were calculated. These metrics included traditional metrics such as time and path length as well as new metrics assessing the technique participants use for holding the tools. The dual-task network performance was similar to that of two networks, while the computational load was only slightly larger than that of one network. In addition, the motion metrics showed significant differences between experts and novices. While video capture is an essential part of minimally invasive surgery, it is not an integral component of open surgery. Thus, new algorithms, focusing on the unique challenges open surgery videos present, are required. In this study, a dual-task network was developed to solve both a localization task and a hand–tool interaction task. The dual network may be easily expanded to a multi-task network, which may be useful for images with multiple layers and for evaluating the interaction between these different layers.
Shlomi Laufer, Anne-Lise D. D’Angelo, Calvin Kwan, Rebbeca D. Ray, Rachel Yudkowsky, John R. Boulet, William C. McGaghie and Carla M. Pugh,
"Rescuing the Clinical Breast Examination: Advances in Classifying Technique and Assessing Physician Competency",
Annals of Surgery 266.6 (2017): 1069
There are several technical aspects of a proper CBE. Our recent work discovered a significant linear relationship between palpation force and CBE accuracy. This article investigates the relationship between other technical aspects of the CBE and accuracy. This performance assessment study involved data collection from physicians (n = 553) attending 3 different clinical meetings between 2013 and 2014: American Society of Breast Surgeons, American Academy of Family Physicians, and American College of Obstetricians and Gynecologists. Four previously validated, sensor-enabled breast models were used for clinical skills assessment. Models A and B had solitary, superficial, 2 cm and 1 cm soft masses, respectively. Models C and D had solitary, deep, 2 cm hard and moderately firm masses, respectively. Finger movements (search technique) from 1137 CBE video recordings were independently classified by 2 observers. Final classifications were compared with CBE accuracy. Accuracy rates were model A = 99.6%, model B = 89.7%, model C = 75%, and model D = 60%. Final classification categories for search technique included rubbing movement, vertical movement, piano fingers, and other. Interrater reliability was k = 0.79. Rubbing movement was 4 times more likely to yield an accurate assessment (odds ratio 3.81, P < 0.001) compared with vertical movement and piano fingers. Piano fingers had the highest failure rate (36.5%). Regression analysis of search pattern, search technique, palpation force, examination time, and 6 demographic variables revealed that search technique independently and significantly affected CBE accuracy (P < 0.001). Our results support measurement and classification of CBE techniques and provide the foundation for a new paradigm in teaching and assessing hands-on clinical skills. The newly described piano fingers palpation technique was noted to have unusually high failure rates. Medical educators should be aware of the potential differences in effectiveness for various CBE techniques.
Anne-Lise D. D’Angelo, Drew N. Rutherford, Rebecca D. Ray, Shlomi Laufer, Andrea Mason, Carla M. Pugh,
"Working volume: validity evidence for a motion-based metric of surgical efficiency",
The American Journal of Surgery 211.2 (2016): 445-450 (special award 2016)
The aim of this study was to evaluate working volume as a potential assessment metric for open surgical tasks. Surgical attendings (n = 6), residents (n = 4), and medical students (n = 5) performed a suturing task on simulated connective tissue (foam), artery (rubber balloon), and friable tissue (tissue paper). Using a motion tracking system, effective working volume was calculated for each hand. Repeated measures analysis of variance assessed differences in working volume by experience level, dominant and/or nondominant hand, and tissue type. Analysis revealed a linear relationship between experience and working volume. Attendings had the smallest working volume, and students had the largest (P = .01). The 3-way interaction of experience level, hand, and material type showed attendings and residents maintained a similar working volume for dominant and nondominant hands for all tasks. In contrast, medical students’ nondominant hand covered larger working volumes for the balloon and tissue paper materials (P < .05). This study provides validity evidence for the use of working volume as a metric for open surgical skills. Working volume may provide a means for assessing surgical efficiency and the operative learning curve.
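One straightforward way to operationalize a working volume is the volume of the convex hull of a hand's tracked 3D positions; the study's "effective working volume" may be defined differently, so this is only an illustrative approximation:

```python
import numpy as np
from scipy.spatial import ConvexHull

def working_volume(positions):
    """Convex-hull volume (e.g. cm^3) of one hand's tracked 3D positions.

    positions: array of shape (T, 3) from the motion tracking system.
    """
    return float(ConvexHull(positions).volume)

# Example: random points inside a 10 cm cube -> volume approaches 1000 cm^3
print(working_volume(np.random.rand(5000, 3) * 10.0))
```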
Anne-Lise D. D’Angelo, Drew N. Rutherford, Rebecca D. Ray, Shlomi Laufer, Calvin Kwan, Elaine R. Cohen, Andrea Mason, Carla M. Pugh,
"Idle time: an underdeveloped performance metric for assessing surgical skill",
The American Journal of Surgery 209.4 (2015): 645-651
The aim of this study was to evaluate validity evidence using idle time as a performance measure in open surgical skills assessment. This pilot study tested psychomotor planning skills of surgical attendings (n=56), residents (n=54) and medical students (n=55) during suturing tasks of varying difficulty. Performance data were collected with a motion tracking system. Participants’ hand movements were analyzed for idle time, total operative time, and path length. We hypothesized that there would be shorter idle times for more experienced individuals and on the easier tasks. A total of 365 idle periods were identified across all participants. Attendings had fewer idle periods during 3 specific procedure steps (P < .001). All participants had longer idle time on friable tissue (P < .005). Using an experimental model, idle time was found to correlate with experience and motor planning when operating on increasingly difficult tissue types. Further work exploring idle time as a valid psychomotor measure is warranted.
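Idle time can be extracted from the same motion-tracking data by flagging sustained periods in which hand speed stays below a small threshold; the threshold and minimum duration below are illustrative assumptions:

```python
import numpy as np

def idle_time(positions, fs=60.0, speed_thresh=5.0, min_dur_s=0.5):
    """Total idle time in seconds for one hand.

    positions: array of shape (T, 3) in mm at fs Hz; a sample counts as idle when
    instantaneous speed stays below speed_thresh (mm/s) for at least min_dur_s.
    """
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=1) * fs
    idle = speed < speed_thresh
    total, run = 0, 0
    for flag in np.append(idle, False):
        if flag:
            run += 1
        else:
            if run >= int(min_dur_s * fs):   # count only sustained pauses
                total += run
            run = 0
    return total / fs
```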
Conferences
Sapir Gershov, Fadi Mahameed, Aeyal Raz and Shlomi Laufer,
"More Than Meets the Eye: Physicians’ Visual Attention in the Operating Room",
International Workshop on Applications of Medical AI. Cham: Springer Nature Switzerland, 2023.
During surgery, the patient’s vital signs and the field of endoscopic view are displayed on multiple screens. As a result, both surgeons’ and anesthesiologists’ visual attention (VA) is crucial. Moreover, the distribution of said VA and the acquisition of specific cues might directly impact patient outcomes.
Recent research utilizes portable, head-mounted eye-tracking devices to gather precise and comprehensive information. Nevertheless, these technologies are not feasible for prolonged data acquisition in an operating room (OR) environment. This is particularly the case during medical emergencies.
This study presents an alternative methodology: a webcam-based gaze target prediction model. Such an approach may provide continuous visual behavioral data with minimal interference to the physicians’ workflow in the OR. The proposed end-to-end framework is suitable for both standard and emergency surgeries.
In the future, such a platform may serve as a crucial component of context-aware assistive technologies in the OR.
Adam Goldbraikh, Netanell Avisdris and Shlomi Laufer,
"Bounded Future MS-TCN++ for Surgical Gesture Recognition",
Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13803. Springer, Cham.
In recent times there is a growing development of video-based applications for surgical purposes. Some of these applications can work offline after the end of the procedure, while others must react immediately. However, there are cases where the response should be made during the procedure, but some delay is acceptable. In the literature, the online-offline performance gap is known. Our goal in this study was to learn the performance-delay trade-off and design an MS-TCN++-based algorithm that can utilize this trade-off. To this aim, we used our open surgery simulation dataset containing 96 videos of 24 participants performing a suturing task on a variable tissue simulator. In this study, we used video data captured from the side view. The networks were trained to identify the performed surgical gestures. The naive approach is to reduce the MS-TCN++ depth; as a result, the receptive field is reduced, and the number of required future frames is also reduced. We showed that this method is sub-optimal, mainly in the small-delay cases. The second method was to limit the accessible future in each temporal convolution. This way, we have flexibility in the network design, and as a result, we achieve significantly better performance than with the naive approach.
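The core idea of limiting the accessible future in each temporal convolution can be sketched with asymmetric padding on a dilated 1-D convolution; the layer below allows at most `future` frames of look-ahead, with sizes chosen for illustration rather than taken from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundedFutureDilatedConv(nn.Module):
    """Dilated temporal convolution that may look at most `future` frames ahead."""

    def __init__(self, channels, dilation, future=0, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        total_pad = dilation * (kernel_size - 1)
        self.pad_right = min(future, total_pad)   # bounded look-ahead
        self.pad_left = total_pad - self.pad_right

    def forward(self, x):                         # x: (B, C, T)
        x = F.pad(x, (self.pad_left, self.pad_right))
        return self.conv(x)                       # output length stays T

# Example: 64-channel features over 300 frames, dilation 4, 2-frame look-ahead
layer = BoundedFutureDilatedConv(64, dilation=4, future=2)
print(layer(torch.randn(1, 64, 300)).shape)  # torch.Size([1, 64, 300])
```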
Kristina Basiev, Adam Goldbraikh, Carla M Pugh and Shlomi Laufer,
"Open surgery tool classification and hand utilization using a multi-camera system",
IPCAI 2022
The goal of this work is to use multi-camera video to classify open surgery tools as well as identify which tool is held in each hand. Multi-camera systems help prevent occlusions in open surgery video data. Furthermore, combining multiple views, such as a Top-view camera covering the full operative field and a Close-up camera focusing on hand motion and anatomy, may provide a more comprehensive view of the surgical workflow. However, multi-camera data fusion poses a new challenge: a tool may be visible in one camera and not the other. Thus, we defined the global ground truth as the tools being used regardless of their visibility. Therefore, tools that are out of the image should be remembered for extensive periods of time while the system responds quickly to changes visible in the video. Participants (n=48) performed a simulated open bowel repair. A Top-view and a Close-up camera were used. YOLOv5 was used for tool and hand detection. A high-frequency LSTM with a 1 second window at 30 frames per second (fps) and a low-frequency LSTM with a 40 second window at 3 fps were used for spatial, temporal, and multi-camera integration. The accuracy and F1 of the six systems were: Top-view (0.88/0.88), Close-up (0.81/0.83), both cameras (0.9/0.9), high fps LSTM (0.92/0.93), low fps LSTM (0.9/0.91), and our final architecture, the Multi-camera classifier (0.93/0.94).
By combining a system with a high fps and a low fps from the multiple camera array, we improved the classification abilities of the global ground truth.
Sapir Gershov, Yaniv Ringel, Erez Dvir, Tzvia Tsirilman, Elad Ben Zvi, Sandra Braun, Aeyal Raz, Shlomi Laufer,
"Automatic Speech-Based Checklist for Medical Simulations",
Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations, 2021
Medical simulators provide a controlled environment for training and assessing clinical skills. However, as an assessment platform, they require the presence of an experienced examiner to provide performance feedback, commonly performed using a task-specific checklist. This makes the assessment process inefficient and expensive. Furthermore, this evaluation method does not provide medical practitioners the opportunity for independent training. Ideally, the process of filling the checklist should be done by a fully aware, objective system, capable of recognizing and monitoring the clinical performance. To this end, we have developed an autonomous and fully automatic speech-based checklist system, capable of objectively identifying and validating anesthesia residents’ actions in a simulation environment. Based on the analyzed results, our system is capable of recognizing most of the tasks in the checklist: an F1 score of 0.77 for all of the tasks, and an F1 score of 0.79 for the verbal tasks. Developing an audio-based system will improve the experience of a wide range of simulation platforms. Furthermore, in the future, this approach may be implemented in the operating room and emergency room. This could facilitate the development of automatic assistive technologies for these domains.