Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was first reported from Wuhan, China [Reference Zhu1]. Several studies have reported human-to-human transmission of SARS-CoV-2, asymptomatic transmission and transmission in family and hospital settings [Reference Huang2, Reference Jiang3]. The first case in eastern Uttar Pradesh, India was reported from Basti town on 31 March 2020 [Reference Kant4]. The next case in this region was reported from Sant Kabir Nagar (SKN), a district adjacent to Basti. The first case in SKN was reported on 15 April 2020, when a 71-year-old male who had visited New Delhi was found positive. The infection in this case was limited, with no positive contacts. A second case was subsequently detected in SKN and identified as the index case of the present cluster; he had returned home because of the countrywide lockdown. Although he remained asymptomatic during his stay, the infection spread further, leading to a family cluster. The present study describes the asymptomatic transmission in detail and delineates the transmission dynamics using next-generation sequencing (NGS).
A 23-year-old male student suspected to be infected with SARS-CoV-2, along with two co-travellers on the same bus, returned to SKN from Deoband, Uttar Pradesh during the lockdown. He tested positive for SARS-CoV-2 on 17 April 2020, following which he was quarantined. As part of contact tracing, all of his family members (28 individuals) and seven relatives who were residing in the same house were tested. Nasopharyngeal and nasal swab samples were collected from all individuals by the state government health department and sent to our laboratory for testing. All the samples were processed as per the standard protocol and stored at −80 °C. Eighteen of the 35 household contacts were found positive. Family members who tested positive and those who tested negative were quarantined separately. On repeat testing of the negative cases on 2 and 3 May 2020, five more members tested positive for SARS-CoV-2. Overall, 12 of the 36 people in the house remained infection-free throughout their quarantine period (median quarantine period: 18 days, range 16–26 days). The secondary attack rate in this familial cluster was 65.7% (23 of 35 household contacts), on the assumption that all infections were acquired in the house before quarantine.
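The secondary attack rate quoted above follows directly from the counts given in the text. A minimal sketch of the arithmetic is shown here; the function name `secondary_attack_rate` is an illustrative choice, and the calculation assumes, as the text does, that the index case is excluded from the denominator.

```python
def secondary_attack_rate(secondary_cases: int, exposed_contacts: int) -> float:
    """Return the secondary attack rate as a percentage of exposed contacts."""
    if exposed_contacts <= 0:
        raise ValueError("exposed_contacts must be positive")
    if not 0 <= secondary_cases <= exposed_contacts:
        raise ValueError("secondary_cases must lie between 0 and exposed_contacts")
    return 100.0 * secondary_cases / exposed_contacts


# Counts taken from the text; the index case is not counted as a contact.
positive_on_first_test = 18
positive_on_repeat_test = 5
household_contacts = 28 + 7  # family members plus relatives in the same house

secondary_cases = positive_on_first_test + positive_on_repeat_test
sar = secondary_attack_rate(secondary_cases, household_contacts)
print(f"Secondary attack rate: {sar:.1f}%")  # Secondary attack rate: 65.7%
```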
The median age in this cluster was 20.5 years (range: 2 months–72 years). The cluster had 10 family members below 18 years of age and was composed of 19 males (52.8%) and 17 females (47.2%). Fisher's exact test was applied to obtain the P value for each association (Table 1). The family lived in a pukka (cemented) house with a total of eight living rooms and a separate kitchen. Overcrowding was noted in the family. Of the 24 individuals who tested positive, only three developed mild symptoms. Details of the chronological events and family structure are shown in Fig. 1. Three members of this cluster had co-morbid conditions and were also found to be infected. All 24 infected individuals recovered from infection and no case fatality occurred in this cluster.
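For readers unfamiliar with Fisher's exact test, the following is a minimal sketch of a two-sided test on a 2 × 2 table, written with the standard library only. It is not the software the authors used, and the table in the example is a toy table chosen for illustration; it is not the data of Table 1. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    total = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * tolerance
    )
    return min(1.0, total)


# Toy table (NOT study data): rows are two groups, columns are
# infected / not infected.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"P = {p_value:.4f}")  # P = 0.4857 (exactly 34/70)
```

With small cell counts, as in a single household, the exact test is generally preferred over a chi-squared approximation.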
NGS was conducted by preparing RNA libraries from the extracted RNA. The RNA libraries were sequenced using the Illumina platform (Qiagen, Germany) to retrieve the complete genomic sequence of SARS-CoV-2 from the positive samples and thereby delineate the transmission dynamics. The detailed protocol is described elsewhere [Reference Yadav6]. The pipeline used to obtain the SARS-CoV-2 sequences is depicted in Figure 2a. The retrieved sequences were aligned with representative Indian SARS-CoV-2 sequences from GISAID. A neighbour-joining tree was generated using the best-fit model in MEGA version 7.0 [Reference Kumar, Stecher and Tamura7]. Bootstrap analysis with 1000 replicates was used to assess statistical robustness. Amino acid variations in the different proteins encoded by SARS-CoV-2 were also examined.
The percentage of genome recovered from the 24 samples ranged from 1.49% to 99.99%, and the percentage of relevant reads mapped ranged from 0.0% to 92.35%. The percentage of relevant reads mapped and the percentage of the genome retrieved for all 24 samples are tabulated in Table 2. Eight genomic sequences were retrieved with a genome coverage ranging between 99.94 and 99.96%, while the other 16 sequences were below 95.5%. The neighbour-joining tree generated using the Kimura-2-parameter model demonstrated that the retrieved sequences lay in the B.6.6 pangolin lineage (https://pangolin.cog-uk.io/). Two distinct clusters were observed in the generated tree, one with the B pangolin lineage variants comprising B.B.4, B.6, B.6.6 lineages and the other with B.1 pangolin variants (B.1, B.1.1.306, B.1.36.8) (Figure 2). The amino acid variation analysis for different structural and non-structural proteins demonstrated multiple variant amino acid sites in ORF1ab at positions nsp2 (G519S), nsp3 (P1010S, T2016K), nsp4 (N2767T), nsp6 (L3606F), nsp12 (A4489V) and nsp13 (G5411V), whereas protein S, ORF3a and ORF8 showed no distinct variation (Technical Appendix Table 1). The ORF1ab substitution L3606F (nsp6) is shared with clades O and A3i, and the substitution A4489V (nsp12) is shared with clade A3i. The nucleocapsid protein of the studied strain showed a single variable site in its amino acid sequence at position P9265L. The percentages of nucleotide and amino acid similarity for the different genes are tabulated in Technical Appendix Table 2.
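The Kimura-2-parameter model named above corrects the observed pairwise difference between two aligned sequences by treating transitions and transversions separately: with P the proportion of transitional differences and Q the proportion of transversional differences, the distance is d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)]. The sketch below illustrates that formula only. It is not MEGA's implementation; the function name `k2p_distance` is hypothetical, and skipping sites with gaps or ambiguous bases is an assumption made here for simplicity.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        # Assumption: sites with gaps or ambiguous bases are skipped.
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")
    if transitions == 0 and transversions == 0:
        return 0.0

    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


# Toy sequences (not study data): one transition in ten sites.
d_toy = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
print(f"K2P distance: {d_toy:.4f}")
```

A matrix of such pairwise distances is the input that a neighbour-joining algorithm clusters into a tree.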
Bold values in Table 2 indicate the eight clinical samples from which genomic sequences were retrieved by NGS and used in the phylogenetic analysis.
We describe here a familial cluster in which 24 of 36 household members were found to be infected with SARS-CoV-2. The index case had a travel history, was asymptomatic and spent 24 days in the house before being tested. Physical overcrowding in the house provided a favourable environment for transmission of infection within the cluster. Restriction of movement of family members due to the countrywide lockdown limited the spread in the community. Most of the infected individuals were asymptomatic and those with symptoms had only mild ones.
Of the family members who initially tested negative, five turned positive on repeat sampling. The possibility of asymptomatic and reverse transcription-polymerase chain reaction (RT-PCR)-negative individuals transmitting the infection has been shown in several studies, with false-negative RT-PCR results possibly arising from insufficient viral material in the specimen and a low viral load in the upper respiratory tract [Reference Chen8]. The viral load in symptomatic and asymptomatic individuals has been shown to be similar [Reference Zou9], and asymptomatic individuals serve as a potential source of infection in community and hospital settings [Reference Jiang3, Reference Li10].
Females were infected more than males. Other factors such as age, occupation, co-morbidity and marital status were not found to be statistically significant. The present cluster had 10 members aged <18 years, of whom two were <5 years. Infection of children by family members, given their proximity and the asymptomatic nature of the illness, has been documented [Reference Chen8].
In India, multiple SARS-CoV-2 clades are reported to be circulating. The phylogenetic analysis demonstrated a distinct cluster lying in the B.6.6 pangolin lineage. Further, genetic analysis of the sequences in this study demonstrated conserved amino acid variation in the ORF1ab regions. The variations were observed in proteins containing trans-membrane domains (nsp3, nsp4 and nsp6), in RdRp (nsp12) and in helicase (nsp13). The implications of these changes need to be explored further.
This study provides insight into the transmission of SARS-CoV-2 within a crowded household setting using genome sequencing, which is of notable value when assessing cluster outbreaks and transmission dynamics. With novel variants requiring attention and resources, such cluster analyses need to be carried out in a timely manner. However, of the 24 samples, the genome was retrieved from only eight, which might be due to low viral load; this highlights the limitation of sequencing in such situations.
This study encourages the future implementation of genome sequencing when addressing outbreaks in real time, particularly in crowded household settings, and tangentially highlights the effectiveness of lockdown measures.
The supplementary material for this article can be found at https://doi.org/10.1017/S0950268821001989.
The authors acknowledge the Sant Kabir Nagar District Health authorities and District Administration for their support and help in the study, and thank all the participants and their families involved in the study. They also thank the technical staff of the RMRC Gorakhpur laboratory for their technical assistance.
Financial support was provided by the Indian Council of Medical Research-Regional Medical Research Centre (ICMR-RMRC), Gorakhpur.
Conflict of interest
The authors do not have any conflict of interest.
Ethical approval for this study was obtained from the ICMR-RMRC Gorakhpur Human Institutional Ethical Committee.
All data generated and/or analysed during the current study are available from the corresponding author on reasonable request.