Proficiency performance measures and artificial intelligence in robotic education

Introduction

In the past decade, mounting evidence has suggested that surgical performance affects postoperative outcomes: superior quality of surgery reduces the risk of complications, shortens the length of hospital stay, and results in better long-term functional and oncological outcomes. Surgical training is therefore the foundation for high-quality surgery. The recent transition in surgical training from Halsted’s traditional apprenticeship model, famously known as “see one, do one, teach one,” to a more standardized and quantifiable training method has led to a rapidly growing interest in surgical assessment and performance measures.

Competency refers to the bare minimum performance level required for a surgeon to operate independently, whereas proficiency implies some level of performance above the bare minimum. Mastery refers to a rare, exceptional performance that is the highest level of achievement within a certain domain. This chapter focuses on performance measures at a proficiency level.

Measures of performance

Several surrogates of performance have been used in the past, with the previous “gold standard” being surgical outcomes (e.g., complication rates, readmission rates, oncologic outcomes, functional outcomes). Although assessing outcomes seems reasonable and clinically relevant, it has several shortcomings. Because outcome assessment is retrospective, poor surgical performance can be identified only after an adverse event occurs rather than before or as it occurs. In addition, the heterogeneity of patient and disease characteristics makes comparisons between surgeons or institutions difficult. For example, high-risk patients are often transferred from community hospitals to tertiary centers for treatment, which may make simple surgical outcome comparisons between community hospitals and tertiary centers unfair. Another popular method of estimating surgical skill was based on prior caseload; however, caseload is not always an accurate surrogate because surgeons’ learning curves vary widely. Thus better surgical assessment methods are needed to accurately and objectively gauge surgeon performance.

To meet this need, multiple assessment methods have been developed; they can be broadly classified into three main categories: manual assessment, computer-generated metrics, and surgeon biometrics (Fig. 10.1). A growing number of studies have shown these methods to be capable of differentiating expertise, demonstrating surgeons’ learning curves, and predicting surgical outcomes, thereby showing great potential for measuring robotic surgery performance for training, certification, and accreditation purposes.

Fig. 10.1, The Three Main Categories of Surgical Proficiency Measures.

Artificial intelligence in robotic surgery and education

Artificial intelligence (AI) has attracted considerable interest in robotic surgery because of the wealth of data available from even a single operation. In 2019, approximately 1,229,000 da Vinci robotic-assisted surgeries were performed globally, compared with 753,000 in 2016. With multiple new robotic systems on the horizon, robotic surgery will likely grow even faster. This rapid growth provides a natural venue for incorporating AI, especially into surgical education and proficiency measurement. The combination of intraoperative instrument motion-tracking (kinematics), video recording, and AI has created an unprecedented opportunity to objectively and instantaneously assess a surgeon’s performance and provide customized feedback to train the next generation of robotic surgeons.
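To put the procedure volumes quoted above in perspective, the following minimal sketch computes the total and the implied compound annual growth between the two figures. The function and constant names are illustrative assumptions, and the inputs are the approximate counts given in the text, so the results are only indicative.

```python
def total_growth_percent(start: float, end: float) -> float:
    """Total percentage change from start to end."""
    return (end - start) / start * 100.0


def annualized_growth_percent(start: float, end: float, years: int) -> float:
    """Compound annual growth rate, in percent, over the given number of years."""
    return ((end / start) ** (1.0 / years) - 1.0) * 100.0


# Approximate global da Vinci procedure counts quoted in the text.
PROCEDURES_2016 = 753_000
PROCEDURES_2019 = 1_229_000

total = total_growth_percent(PROCEDURES_2016, PROCEDURES_2019)
annual = annualized_growth_percent(
    PROCEDURES_2016, PROCEDURES_2019, years=2019 - 2016
)

print(f"Total growth 2016-2019: {total:.1f}%")
print(f"Implied compound annual growth: {annual:.1f}%")
```

Under these figures, volume grew by roughly 63% over three years, or on the order of 18% per year if growth is assumed to compound evenly.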

In this chapter, we will first review the measures currently used to assess surgeon proficiency in robotic surgery, then elaborate on the current applications of AI in robotic surgical education, and finally discuss future directions.

Manual assessment of robotic surgical education

Manual assessment requires expert evaluators to watch either a live or previously recorded surgical performance and then manually rate different surgical skills, with each domain anchored by objective grading criteria. Manual assessment can be broadly divided into two main categories: global skills assessments and procedure-specific assessments (Table 10.1). As the names imply, global skills assessments evaluate only general robotic surgical skills and can be applied across many procedures, whereas procedure-specific assessments evaluate surgical skills in a specific procedure or procedural step. Manual assessments have demonstrated the ability to differentiate surgeons based on previously used surrogates of skill and proficiency (i.e., prior surgeon caseload) and are now commonly used as the standard with which newer assessment tools are compared. Table 10.1 summarizes these manual assessments.
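As a rough illustration of how domain-based manual ratings can be combined, the sketch below sums one rater's per-domain ratings into a total score and averages totals across raters. The domain names are taken from the GOALS row of Table 10.1; the 1-to-5 scale, the simple summation, and all function names are assumptions made for illustration and are not the scoring rules of any published instrument.

```python
from statistics import mean
from typing import Mapping, Sequence

# Domain names borrowed from the GOALS row of Table 10.1 (illustrative only).
DOMAINS = (
    "depth_perception",
    "bimanual_dexterity",
    "efficiency",
    "tissue_handling",
    "autonomy",
)

# Assumed rating scale: each domain scored from 1 (worst) to 5 (best).
SCALE_MIN, SCALE_MAX = 1, 5


def total_score(ratings: Mapping[str, int]) -> int:
    """Sum one rater's domain ratings after checking completeness and range."""
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    for domain in DOMAINS:
        if not SCALE_MIN <= ratings[domain] <= SCALE_MAX:
            raise ValueError(f"{domain} rating out of range: {ratings[domain]}")
    return sum(ratings[d] for d in DOMAINS)


def mean_total(all_ratings: Sequence[Mapping[str, int]]) -> float:
    """Average the total score across several raters of the same performance."""
    return mean(total_score(r) for r in all_ratings)


# Two hypothetical raters scoring the same recorded performance.
rater_a = {d: 3 for d in DOMAINS}
rater_b = {
    "depth_perception": 4,
    "bimanual_dexterity": 4,
    "efficiency": 3,
    "tissue_handling": 5,
    "autonomy": 4,
}

print(total_score(rater_a), total_score(rater_b), mean_total([rater_a, rater_b]))
```

Averaging across raters is only one possible design choice; studies of manual assessment tools typically also report inter-rater agreement, which this sketch does not compute.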

TABLE 10.1

Manual Assessment Methods

Assessment Method	Brief Description	Skills Domains Assessed
Global Skills Assessments
OSATS	Evaluates technical proficiency in open surgery	Respect for tissue; Time and motion; Instrument handling; Flow of operation and forward planning; Knowledge of instruments; Use of assistants; Knowledge of specific procedure
GOALS	Expanded on OSATS to evaluate technical proficiency in laparoscopic surgery	Depth perception; Bimanual dexterity; Efficiency; Tissue handling; Autonomy
GEARS	First robotic surgery-specific assessment tool; currently the most widely used to measure robotic surgery proficiency	Depth perception; Bimanual dexterity; Efficiency; Force sensitivity; Robotic control; Autonomy; Verbal guidance
R-OSATS	Combines elements of OSATS and GOALS to assess proficiency across four standardized dry lab tasks	Depth perception/accuracy; Force/tissue handling; Dexterity; Efficiency
ARCS	Developed by Intuitive Surgical to assess distinct console skills not assessed by GEARS	Dexterity with multiple wristed instruments; Optimizing field of view; Instrument visualization; Optimizing master manipulator workspace; Force sensitivity and control; Basic energy pedal skills
Procedure-Specific Assessments
RACE	Developed to assess specific technical skills in vesicourethral anastomosis in RARP	Needle positioning; Needle entry; Needle driving and tissue trauma; Suture placement; Tissue approximation; Knot tying
PACE	Developed to assess RARP-specific skills in standardized steps	5-point Likert scale with performance description for each domain of the seven standardized steps
CASE	Developed to assess RARC-specific skills in standardized steps	5-point Likert scale with performance description for each domain of the eight standardized steps
RARP Assessment Score	Developed to assess technical skill in critical steps identified by HFMEA analysis	Standardized steps divided into substeps to identify hazards and create a 17-stage system (17 processes and 41 subprocesses)

ARCS, Assessment of Robotic Console Skills; CASE, Cystectomy Assessment and Competency Evaluation; OSATS, Objective Structured Assessment of Technical Skill; GEARS, Global Evaluative Assessment of Robotic Skills; GOALS, Global Operative Assessment of Laparoscopic Skills; HFMEA, Healthcare Failure Mode Effect Analysis; PACE, Prostatectomy Assessment and Competency Evaluation; RARP, Robot-assisted radical prostatectomy; RARC, Robot-assisted radical cystectomy; R-OSATS, Robotic-Objective Structured Assessment of Technical Skills; RACE, Robotic Anastomosis Competency Evaluation

Global skills assessments

Various global skills assessments have been created or adapted for robotic surgical performance evaluation over the years. The three most commonly used are Objective Structured Assessment of Technical Skill (OSATS), Global Operative Assessment of Laparoscopic Skills (GOALS), and Global Evaluative Assessment of Robotic Skills (GEARS).

Objective structured assessment of technical skill

OSATS was first introduced in 1996 as one of the first surgical technical skills assessment tools and was originally designed for laboratory use. It has since been used across many settings, including simulated, laboratory, and live operating environments, and for many surgical subspecialties, including urology, to evaluate robotic surgical performance.

OSATS has been used to accurately differentiate between novice and expert surgeons and to capture improvement in surgical skills after training programs. OSATS has also shown potential correlations with important surgical outcomes: OSATS scores correlated with anastomosis patency in live rat models undergoing robot-assisted microvascular surgery. However, correlations with surgical outcomes have been shown primarily in the laboratory or simulator setting, and further research is needed to identify correlations after live robotic surgery.

Although OSATS has been used and adapted to evaluate robotic surgery proficiency, it was originally designed for open surgery in a laboratory setting. As a result, it does not encompass all of the distinct skills required for robotic surgery, such as depth perception, bimanual dexterity, camera control, and master manipulator workspace.

Global operative assessment of laparoscopic skills

GOALS was developed in 2005 by Vassiliou et al. to specifically evaluate laparoscopic surgical skills and has since been adapted for robotic surgical assessment in the live surgery and laboratory settings. Because GOALS was created for minimally invasive surgery, it evaluates robotic surgery skills better than OSATS; however, it still does not cover certain skills unique to robotic surgery, such as camera control and master manipulator workspace. Like OSATS, GOALS has been used to reliably assess surgeon performance and distinguish expert from trainee performance, including in specific procedural steps such as a ureteral anastomosis.

Global evaluative assessment of robotic skills

Although GOALS was an improvement over OSATS in assessing robotic surgeon performance, it still lacks some of the skills specific to robotic surgery. Thus GEARS was developed by Goh et al. in 2012 as an expansion of GOALS and is the first robotic surgery-specific assessment tool. GEARS expanded upon GOALS to include many of the robotic surgery-specific skills that were missing in previous assessment tools, such as camera control, robotic control, and operator workspace. Because of its specificity for robotic surgery, combined with a generic and flexible grading scheme, GEARS has become the most widely used and extensively studied manual assessment tool to evaluate robotic surgery performance, especially in urology.

Like the previously described global assessment tools, GEARS can accurately distinguish surgeons by skill level based on previously used surrogates, such as prior caseloads, and consistently determine rank order of robotic surgeons, particularly those of lower skill level. Unlike the previously described assessments, GEARS is the only global skills assessment tool that has been correlated with clinical outcomes after live surgery. In a study of GEARS in robot-assisted radical prostatectomy (RARP), Goldenberg et al. found that GEARS scores for the overall case, as well as for specific steps (bladder neck dissection and vesicourethral anastomosis), were independent predictors of 3-month continence recovery. GEARS scores have also been correlated with urethral catheter replacement rates and readmission rates after RARP.

Other global skills assessments

Although GEARS is specific to robotic surgery and widely used, it still does not encompass every skill required for robotic surgery. Thus several other robotic surgery-specific assessment tools have been proposed to address these missing domains. Siddiqui et al. proposed an adapted system by combining elements from OSATS and GOALS to create an assessment tool specifically for robotic surgery, termed Robotic-Objective Structured Assessment of Technical Skills (R-OSATS). R-OSATS has demonstrated the ability to differentiate between various expertise levels, including faculty, fellows, and junior and senior residents, and to determine a minimum cutoff score for competence.

Recently, Intuitive Surgical Inc. (Sunnyvale, CA) developed the Assessment of Robotic Console Skills (ARCS) to assess robotic console manipulation skills, including optimization of field of view and workspace and basic energy pedal skills, which differ from the skills measured by GEARS. In its validation study, ARCS accurately distinguished surgeons based on previous caseloads, making it another promising assessment tool.
