SARS-CoV-2: Structure, Pathogenesis, and Diagnosis


SARS-CoV-2 and Coronaviruses

Evolutionary Origins

In the last two decades, three coronaviruses have caused outbreaks of varying scales, with the pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) representing the most recent threat to human health at a global level. Aside from severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV), which were responsible for the first two outbreaks of the 21st century, only four other coronaviruses are known to infect humans, all of which cause relatively mild disease: HCoV-229E, HCoV-NL63, HCoV-OC43, and HCoV-HKU1. The SARS, MERS, and COVID-19 outbreaks have demonstrated that zoonotic coronaviruses can successfully cross species barriers to infect humans and cause severe disease with high mortality.

SARS-CoV-2 belongs to the order Nidovirales, family Coronaviridae, subfamily Coronavirinae, and genus Betacoronavirus. Within the genus Betacoronavirus, four distinct lineages (A to D) are assigned; among the human coronaviruses, HCoV-OC43 and HCoV-HKU1 belong to lineage A, SARS-CoV and SARS-CoV-2 belong to lineage B, and MERS-CoV belongs to lineage C. SARS-CoV-2 is further classified under the subgenus Sarbecovirus. SARS-CoV and MERS-CoV both originated in bats; palm civets acted as the intermediate host for SARS-CoV, and camels served as the intermediate host for MERS-CoV. The genome of SARS-CoV-2 shares approximately 80% sequence identity with SARS-CoV and presents a high degree of sequence identity to the genomes of bat coronaviruses RaTG13 and RmYN02. This high level of similarity to bat-derived coronaviruses suggests that SARS-CoV-2 most likely also originated in bats. Although sarbecoviruses are known to undergo frequent recombination, assessment of the SARS-CoV-2 genome revealed no evidence to suggest that it originated from a recent recombination event. Early studies characterizing SARS-CoV-2 reported that it uses the same human receptor as SARS-CoV, angiotensin-converting enzyme-2 (ACE2), to enter and infect host cells. In contrast, MERS-CoV uses dipeptidyl peptidase 4 (DPP4) as its receptor. Interestingly, SARS-CoV-2 possesses a polybasic cleavage site insertion (PRRA sequence) in the spike protein at the junction of the S1 and S2 subunits, which resembles a sequence that is present in MERS-CoV but absent in SARS-CoV and RaTG13. This sequence was identified as a putative furin cleavage site that may be acted upon by the proprotein convertase furin during viral egress.
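Sequence identity figures such as those quoted above are typically computed from a pairwise alignment. The following is a minimal sketch of one common definition (matching positions divided by aligned, non-gap positions); the function name `percent_identity` and the toy sequences are hypothetical, and published values may use a different denominator or alignment method.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns (an assumption; conventions differ)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, made-up aligned fragments (not real viral sequences):
# 10 compared positions, 2 mismatches.
toy_a = "AUGGCUAAGC"
toy_b = "AUGGCAAAGU"
print(percent_identity(toy_a, toy_b))  # 80.0
```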

The discovery of a coronavirus similar to SARS-CoV-2, pangolin-CoV, in Malayan pangolins showing clinical signs of infection raised the suspicion that pangolins may serve as the intermediate host for SARS-CoV-2. The receptor-binding domain (RBD) in the spike protein of pangolin-CoV was almost identical to that of SARS-CoV-2, but pangolin-CoV lacked the furin cleavage site found in SARS-CoV-2. Moreover, many bat-derived coronaviruses, such as RmYN02, have been reported to contain a similar insertion in the spike protein at the S1/S2 junction, suggesting that SARS-CoV-2 likely originated from multiple recombination events that occurred within viruses inhabiting bats and other species. With the accumulating evidence, it appears unlikely that pangolins acted as intermediate hosts facilitating SARS-CoV-2 spillover to humans. Although the involvement of an intermediate host cannot be ruled out, it has been suggested that SARS-CoV-2 may have spilled over directly from bats to humans. The origin and cross-species transmission of SARS-CoV-2 are still under investigation and remain subjects of debate.

Structure, Genome, and Proteome

The positive-sense single-stranded RNA genome of SARS-CoV-2 extends to 29,891 nucleotides and encodes 9,860 amino acids. In addition to the flanking 5′ and 3′ untranslated regions (UTRs), the SARS-CoV-2 genome includes coding regions for the structural proteins spike glycoprotein (S), envelope protein (E), membrane glycoprotein (M), and nucleocapsid protein (N), and several open reading frames (ORF1ab, ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8, ORF9b, and ORF10) that code for accessory structural and nonstructural proteins (Fig. 2.1).

Fig. 2.1, SARS-CoV-2 Genome. In addition to the 5′ and 3′ untranslated regions (UTRs), the SARS-CoV-2 genome contains coding regions for structural proteins spike protein (S), envelope protein (E), membrane protein (M), and nucleocapsid protein (N), and many open reading frames (ORF1ab, ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8, ORF9b, and ORF10). ORF9b is an alternative reading frame located within the N gene. ORF10 may not generate a functional protein in SARS-CoV-2. ORF1ab makes up more than 60% of the genome and produces two polyproteins, pp1ab and pp1a, which are proteolytically cleaved by nonstructural protein 5 (Nsp5) and Nsp3 to yield the full set of nonstructural proteins. Nsp1-11 are generated through cleavage of the pp1a polyprotein, whereas Nsp1-10 and Nsp12-16 are generated through cleavage of the pp1ab polyprotein. The pp1ab polyprotein is produced from ORF1ab by a ribosomal frameshifting event involving a (–1) nucleotide shift during translation.
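The effect of a (–1) ribosomal frameshift on the translated product can be illustrated with a toy example. The sketch below is a simplification under stated assumptions: the RNA string, the slip position, and the function names are hypothetical and are not taken from the SARS-CoV-2 genome. It shows only that stepping back one nucleotide lets the ribosome bypass an in-frame stop codon and produce a longer polypeptide, loosely analogous to pp1a versus pp1ab.

```python
from itertools import product

# Standard genetic code, built in UCAG order (third base varies fastest).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna: str, start: int = 0) -> str:
    """Translate from `start` until the first stop codon or the end of the RNA."""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_with_minus1_shift(rna: str, slip_at: int) -> str:
    """Translate in frame 0 up to `slip_at` (a codon boundary), then slip back
    one nucleotide and continue in the -1 frame. Assumes no stop codon occurs
    before `slip_at`."""
    upstream = translate(rna[:slip_at])
    downstream = translate(rna, start=slip_at - 1)
    return upstream + downstream


# Hypothetical toy RNA: frame 0 meets a UAA stop right after the fourth codon.
TOY_RNA = "AUGGCAUUUAAAUAAGGCUGCAUGAAUAA"

print(translate(TOY_RNA))                        # MAFK (stops at the in-frame UAA)
print(translate_with_minus1_shift(TOY_RNA, 12))  # MAFKIRLHE (reads past it)
```

In this toy, the nucleotide at index 11 is read twice, once as the last base of the final frame-0 codon and once as the first base of the first codon in the –1 frame.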

The first electron micrographs of SARS-CoV-2 revealed virus particles described as spherical with some pleomorphism and distinctive spikes that created the appearance of a solar corona. Coronavirus particles consist of an outer lipid bilayer envelope studded with the spike, membrane, and envelope structural proteins. The inner core contains a helical nucleocapsid structure that is formed by the association of nucleocapsid phosphoproteins with the viral genomic RNA (Fig. 2.2).

Fig. 2.2, Viral Structure. Coronavirus particles consist of an outer lipid bilayer envelope that is studded with the spike (S), membrane (M), and envelope (E) structural proteins. The spike protein protrudes out from the viral envelope, giving the appearance of a solar corona. The inner core of the virion consists of a helical nucleocapsid structure formed by the association of nucleocapsid (N) phosphoproteins with viral genomic RNA.

Structural Proteins

Spike Protein

The spike glycoprotein (S) is a structural protein that is critical for binding to the ACE2 receptor on host cells and facilitating cell entry. The S protein is embedded in the outer lipid bilayer envelope in a uniform distribution and extends out from the virion surface, producing the defining “corona”-like appearance. The SARS-CoV-2 S protein is a densely glycosylated homotrimeric class I fusion protein that is divided into two subunits, S1 and S2, which are separated by a multibasic furin protease cleavage site. The S1 subunit forms a globular head that binds to the host cell receptor, and the S2 subunit facilitates viral membrane fusion with the host cell membrane. The S1 subunit is composed of an amino- or N-terminal domain (NTD) and a carboxy- or C-terminal domain (CTD). The S1 subunit CTD functions as the RBD that binds to the human ACE2 (hACE2) receptor.

The S protein exists in a metastable prefusion conformation. On binding to the host cell receptor, it undergoes a significant structural transformation to enable fusion of the viral membrane with the host cell membrane. The furin cleavage site (RRXR motif) at the S1/S2 junction is proteolytically cleaved, leading to the separation of S1 from S2; however, the two subunits remain noncovalently bound to each other. After S1/S2 cleavage and engagement of the RBD with the host cell ACE2 receptor, another cleavage site (S2′) in the S2 subunit is exposed, and this site must also be cleaved by proteases to release the fusion peptide, which is essential for membrane fusion and viral infectivity. Receptor binding destabilizes the prefusion trimer, resulting in the shedding of the S1 subunit and transition of the S2 subunit to a stable postfusion conformation.
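The RRXR pattern named above (arginine, arginine, any residue, arginine) can be located in a peptide with a simple scan. This is a minimal sketch: the function name `find_rrxr` and both peptide fragments are illustrative assumptions, not verified excerpts of any spike sequence, and a pattern match alone does not show that a site is actually cleaved by furin.

```python
import re

# Lookahead so that overlapping occurrences are also reported.
RRXR = re.compile(r"(?=(RR[A-Z]R))")


def find_rrxr(peptide: str) -> list[tuple[int, str]]:
    """Return (0-based start index, matched residues) for each RRXR occurrence."""
    return [(m.start(), m.group(1)) for m in RRXR.finditer(peptide.upper())]


# Illustrative fragment containing a PRRA-like insertion followed by arginine.
with_site = "TNSPRRARSVAS"
# Illustrative fragment without the insertion.
without_site = "TNSRSVAS"

print(find_rrxr(with_site))     # [(4, 'RRAR')]
print(find_rrxr(without_site))  # []
```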

To engage the host cell receptor, the RBD in the S1 subunit undergoes hinge-like structural movements that either expose or hide the residues involved in receptor recognition. The accessible state is referred to as the “open” or “up” conformation, and the inaccessible state is referred to as the “closed” or “down” conformation (Fig. 2.3). The RBD consists of a core and a receptor binding motif (RBM), with the latter directly mediating contacts with the ACE2 receptor. Various studies have reported either equivalent or higher binding affinity of the SARS-CoV-2 RBD for ACE2, compared with its counterpart in SARS-CoV.

Fig. 2.3, Spike Protein Conformations. The SARS-CoV-2 spike protein is a densely glycosylated homotrimeric class I fusion protein that is divided into two subunits: S1 and S2, with S1 binding to the angiotensin-converting enzyme-2 (ACE2) receptor on host cells, and S2 mediating membrane fusion. To bind to the ACE2 receptor, the receptor binding domain (RBD) in the S1 subunit undergoes hinge-like movements that either expose (“open” or “up” conformation) or hide (“closed” or “down” conformation) the residues that mediate receptor recognition. Protein Data Bank (PDB) structure IDs: 6VXX and 6VYB. (From Walls AC, Park YJ, Tortorici MA, et al. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell. 2020;181[2]:281-292.e286. https://doi.org/10.1016/j.cell.2020.02.058)

Envelope Protein

The envelope glycoprotein (E) in SARS-CoV-2 is an integral membrane protein only 75 amino acids in length. Envelope proteins are viroporins, which are generally described as virally encoded small proteins that can form pores in the membranes of host cell organelles and modulate ion channel activity, among other functions. In lipid bilayers that mimic the endoplasmic reticulum–Golgi intermediate compartment (ERGIC) membrane, the SARS-CoV-2 E protein transmembrane domain forms a five-helix bundle that surrounds a narrow pore, a homopentameric cation channel. The elucidated N-terminal lumen/C-terminal cytoplasmic membrane topology of SARS-CoV-2 E is conducive to the conduction of Ca²⁺ ions out of the ERGIC lumen, a role that may link the E protein to host inflammasome activation based on similar topology and involvement of the SARS-CoV E protein in this process. Examination of E protein from other coronaviruses indicates that the protein is abundantly expressed in infected cells, but only a small proportion of the protein is incorporated into the virion envelope. The larger proportion is localized at sites of intracellular trafficking, such as the endoplasmic reticulum (ER), Golgi, and ERGIC membranes. At these sites, the E protein plays a role in virion assembly and budding.

Membrane Protein

The most abundant protein in coronaviruses, the membrane protein (M), is a transmembrane glycoprotein that plays a role in defining the shape and size of the viral envelope. The M protein acts as a scaffold to regulate virion assembly by binding to other viral structural proteins at the site of budding and bringing them together to form the viral envelope. The M protein is functionally dimeric and may be able to associate with other M dimers to form a matrix-like layer. The M protein network possesses intrinsic membrane-bending properties, but its interactions with the S protein, E protein, and N protein–viral RNA complexes are important for effective membrane curvature and influence virion size. The M protein interacts with the spike protein to facilitate its retention in the ERGIC/Golgi complex. The M protein can adopt two conformations; the elongated conformer plays a role in spike incorporation into new virions. The structural properties of this conformer facilitate the formation of a rigid and convex viral envelope. The C-terminal domain of the M protein further interacts with and stabilizes the internal core and N protein–RNA complexes that make up the nucleocapsid, to promote completion of virion assembly. Expression of the M, S, and E proteins is minimally required for the production of SARS-CoV-2 noninfectious virus-like particles.

Nucleocapsid Protein

The nucleocapsid (N) protein is a multivalent RNA-binding protein, the primary role of which is to package the 30-kb-long genomic RNA compactly into viral ribonucleoprotein (vRNP) complexes that can be accommodated into the approximately 80-nm-diameter viral lumen. The N protein self-associates and naturally exists in a dimeric state, although it can also form oligomers, a property that is likely important for the formation of vRNPs. The protein consists of two globular domains, the NTD and the CTD, with the NTD containing RNA-binding sites that may interact with the viral genomic RNA packaging signal and the CTD forming a dimer with an RNA-binding groove to aid in vRNP assembly. The NTD and CTD are separated by a central, highly conserved, intrinsically disordered region containing a serine-arginine (SR)-rich sequence with multiple phosphorylation sites that are targeted by cytoplasmic kinases to regulate the function of the N protein. The N protein also includes N-terminal and C-terminal intrinsically disordered regions; the latter and the CTD are both involved in M protein binding to anchor the vRNP complex to the inner surface of the viral envelope.

Structural analysis of SARS-CoV-2 viruses by cryoelectron tomography revealed that vRNPs associated with the viral envelope were stacked into cylindrical or helical filament-like assemblies. Efficient packing of the virus particles by N proteins could be accomplished through a “beads on a string” formation with viral RNA linking neighboring vRNPs. Another study determined that the vRNPs in SARS-CoV-2 existed in different arrangements based on the shape of the virions. In spherical virions, there was a higher incidence of membrane-proximal vRNPs packed internally against the envelope in a “hexon” formation. In ellipsoidal virions, membrane-free vRNPs were arranged like pyramids or in “tetrahedron” formations. However, in both arrangements, neighboring vRNPs were equally spaced apart, and in situ projection suggests that tetrahedrons might be able to assemble into hexons. Therefore the vRNP triangle presents a durable building block that can withstand environmental and mechanical stresses and allows for the adoption of various arrangements within the virion.

Accessory Proteins

ORF3a

SARS-CoV-2 ORF3a encodes a viroporin that shares 73% sequence identity with SARS-CoV ORF3a. ORF3a has been demonstrated to play a role in blocking autophagy and conferring viral escape from lysosomal destruction, a function reported to be unique to SARS-CoV-2 ORF3a and not observed in its SARS-CoV counterpart. Additionally, ORF3a is involved in inflammasome activation, pyroptosis, and apoptosis induction.

ORF3b

Because of the presence of premature stop codons, ORF3b is a short protein that is 22 amino acids in length, with no homology to its SARS-CoV counterpart, which is 154 amino acids long and known to function as an interferon (IFN) antagonist. As a result of the completely different sequences, SARS-CoV-2 ORF3b was originally predicted to lack any similarity in function to SARS-CoV ORF3b. However, this novel short protein has been demonstrated to be a potent IFN antagonist that can suppress type I IFN (IFN-I) induction even more effectively than SARS-CoV ORF3b. SARS-CoV-2–related viruses found in bats and pangolins encode a similar truncated ORF3b with IFN antagonist activity. The C-terminal region of SARS-CoV ORF3b contains a nuclear localization signal (NLS) that is lacking in SARS-CoV-2 ORF3b. Truncation of the C-terminus of SARS-CoV ORF3b enhances its IFN antagonist activity, suggesting that the NLS may impair its ability to block the nuclear translocation of IRF3, a transcription factor that induces IFN-β (IFNB1) expression.
