Using computational modeling to design antiviral strategies and understand plant-virus interactions

Identifying the binding pockets between proteins with a bioinformatics approach is a useful first step before modifying the genome to delineate host-virus interactions. Extensive proteomics data in numerous databases support several interaction prediction methods that identify virus-host binding sites at the individual residue level, but little is known about interaction prediction strategies for plant viruses. Begomoviruses (family Geminiviridae) are circular single-stranded (ss) DNA viruses that encode multifunctional proteins responsible for viral replication and cause severe diseases in multiple host plants. These viruses typically evade plant defense mechanisms, overcoming physical and chemical barriers and exploiting all possible interactions with their target host protein partners to establish infection. Here, we applied our computational approach for plant-virus interaction prediction at the domain level. A previous study showed that a myristoylation-like motif in the βC1 protein of the begomovirus-associated cotton leaf curl Multan betasatellite (CLCuMB-βC1) plays an important role in its interaction with the ubiquitin-conjugating enzyme UBC3 in tomato, and this residue-level binding was validated using in vivo and in vitro molecular approaches. Here, we used an in silico approach that combines earlier and recent protein interaction prediction methods to determine all possible interface sites between βC1 and UBC3. The interaction of CLCuMB-βC1 was further verified in its natural host, cotton, using a bimolecular fluorescence complementation system and a yeast two-hybrid assay. These computational and molecular data will help to identify virus-host interactions before resorting to expensive and time-consuming molecular techniques.


Introduction
Protein-protein interaction (PPI) is an important step by which proteins connect to form complex networks and perform multiple functions at the cellular level (Ali et al., 2023). Virus-host coevolutionary studies show that viruses accumulate mutations rapidly to adapt to environmental changes, whereas host genomes change less over long-term evolution (Simmonds et al., 2019). In this evolutionary arms race, virus and host proteins modify the binding sites in their interface regions, either to weaken the interaction or to gain a stable interaction that bypasses the host immune system (Franzosa and Xia, 2011). There is therefore a critical need to study host-pathogen interactions and identify all possible binding sites between plants and pathogens such as fungi, bacteria and viruses (Kamal et al., 2024; Lu-Lu et al., 2014). In silico approaches play an important role in host-pathogen interaction studies and have been used to cover large amounts of interactomics data (Haroon et al., 2023; Rao and Srinivas, 2011). Experimental approaches have also led to multiple databases that provide significant knowledge about virus-host complexes at the protein level (Sagendorf et al., 2020). Improved computational methods are therefore needed to analyze entire PPI networks by integrating these heterogeneous resources before applying costly and time-consuming experimental methods (Ding and Kihara, 2018).
For bioinformatics-based PPI studies, several deep learning methods are available to predict protein interactions in plants and study biological networks at a large scale (Pan et al., 2022; Zheng et al., 2023). Among these tools, sequence-based methods also play an important role in identifying binding sites using evolutionary information, physicochemical properties and domain-based knowledge (Guo and Chen, 2019; Sun et al., 2017). However, interactions do not necessarily occur through domain-domain binding (Segura et al., 2015). In such cases, structure-based methods are employed to predict fine structures and determine the binding pocket between two proteins. Combining sequence- and structure-based methods then predicts the interface region more accurately by drawing on all available information (Aloy et al., 2003).
For plant viruses such as geminiviruses, limited data are available on binding-site-based interaction prediction. Geminiviruses have single-stranded DNA (ssDNA) genomes and infect many economically important crops throughout the world; examples include cotton leaf curl virus and maize streak virus (Alegbejo et al., 2002; Briddon and Markham, 2000), beet curly top virus (Creamer et al., 1996), African cassava mosaic virus (Legg et al., 2011) and tomato yellow leaf curl virus in Asia, Africa, America and Europe (Kim et al., 2011). Because of their emerging threat to multiple hosts, geminiviruses have become a global problem for agricultural trade (Varma and Malathi, 2003). Editing at the binding site could produce favorable traits without disrupting host protein structure and function (Safari et al., 2019). It is therefore important to perform in silico PPI studies of plant pathogens, including geminiviruses, to explore their relationships with plants, which can inform better antiviral defense strategies (Kamal et al., 2019).
Notably, the symptoms of cotton leaf curl disease (CLCuD) are caused by Cotton leaf curl Multan betasatellite (CLCuMB), which encodes the βC1 gene (CLCuMB-βC1), a pathogenicity determinant (Briddon et al., 2001). Given the role of CLCuMB-βC1, we used it to test our in silico interaction prediction approach for identifying binding sites at the residue level. Our findings were similar to experimental data for the Solanum lycopersicum ubiquitin-conjugating enzyme UBC3, which binds CLCuMB-βC1 to induce viral symptoms of tomato leaf curl disease (Eini et al., 2009). That study used deletion mutants to identify binding motifs at the amino acid level in both the virus and the host protein.
The interaction of CLCuMB-βC1 with the ubiquitin-conjugating enzyme UBC1 (GhUBC1) encoded by its native host Gossypium hirsutum was identified using a bimolecular fluorescence complementation system and a yeast two-hybrid assay. These molecular and computational data support our proposed in silico approach for identifying host binding motifs for geminiviruses. The results presented here provide evidence that sequence- and structure-based information can be used for effective antiviral strategies and precise genome editing in crops.

Plant lines and gene amplification
A distinct isolate of cotton leaf curl Multan betasatellite (AM774307), collected from a CLCuD-symptomatic cotton plant, and the G. hirsutum host protein UBC1 (AY082004), isolated from the resistant wild-type cotton cultivar UA222, were used as the source material for this study. Total RNA extracted with the RNeasy Plant Mini Kit (Cat# 74904, Qiagen) was reverse transcribed into cDNA using the RevertAid First Strand cDNA Synthesis Kit (Cat# K1621, Thermo Scientific) with an oligo(dT) primer. Clones were prepared in the Gateway pENTR/D-TOPO vector (Cat# K240020, Invitrogen), and entry clones were subcloned into destination vectors using LR Clonase™ enzyme mix (Cat# 11791019). For in vivo protein interaction assays, wild-type Nicotiana benthamiana seeds were grown in Sunshine Mix LC1 (Sun Gro Horticulture) in a greenhouse with a 16 h light/8 h dark cycle. All positive clones were confirmed by sequencing with gene-specific primers (Table S1).

In planta interaction study using BiFC assay
To validate the in silico predictions, protein expression was studied using a bimolecular fluorescence complementation system. For this purpose, Gateway entry clones were recombined into the destination vector pSITE-2CA (ABRC, Ohio), which produced CLCuMB-βC1 and GhUBC1 positive clones carrying a GFP fusion. Agroinfiltrated leaves were examined by confocal microscopy 24-48 h after infiltration for expression analysis. At least three leaves were used for the subcellular localization study; images were acquired on a Leica TCS SP8 X microscope with 20× dry, 40× dry and 63× oil objectives (the latter for fine detail), and protein fluorescence signals were analyzed with LAS X software.

Yeast two hybrid experiment
The in silico predicted interactions were also validated with a second molecular technique, the yeast two-hybrid (Y2H) system. Gateway entry clones were recombined into the bait and prey vectors pEZY202 and pEZY45 (Guo et al., 2007), a gift from Yu-Zhu Zhang (Addgene plasmids #18704 and #18705). Yeast strain EGY48 was transformed following the lithium acetate transformation protocol. Positive interactions were screened on Base/Gal/Raf-containing double dropout medium [SD-Trp-Leu (+L)] and triple dropout medium [-His/-Trp/-Ura (-L)] supplemented with a range of 3-amino-1,2,4-triazole (3-AT) concentrations to select only positive interactions.

Structure-function identification and binding affinity prediction
For the structure-based interaction study of CLCuMB-βC1 and UBC3, PDB-format structures were predicted with I-TASSER (Heider and Barnekow, 2008). Among the predicted models of the virus and host proteins, the most accurate model was selected based on the highest C-score. The C-score is a confidence score, typically ranging from -5 to 2, that estimates the accuracy of each of the five predicted models; a higher C-score indicates a model with higher confidence (Mustafa et al., 2019). Because ubiquitin-conjugating enzymes (E2) have been widely studied in different species, the SlUBC3 sequence was compared against the PDB database using PSI-BLAST (Altschul et al., 1997) to assess similarity and functional annotation with reported structures. The predicted GhUBC1 structure shared 95% identity with chain A of an A. thaliana E2 protein (4X57) [data unpublished], and 80% identity with Saccharomyces cerevisiae (1QCQ) (Cook et al., 1993) and H. sapiens (1UR6) (Xu et al., 2008) E2 structures. Moreover, the root mean square deviation (RMSD) of SlUBC3 against all these models was 1.36 Å (Figure S2), which supports confidence in our predicted structure. After structure prediction, the PPI was assessed through binding affinity, expressed as the change in Gibbs free energy (∆G) between CLCuMB-βC1 and SlUBC3; a more negative ∆G corresponds to a more stable protein complex. PPA-Pred (Yugandhar and Gromiha, 2014) predicted -11.07 kcal/mol for the CLCuMB-βC1/SlUBC3 complex, PRISM predicted -21.04 kcal/mol and PRODIGY (Van Zundert et al., 2016) predicted -6.7 kcal/mol (Table S2), suggesting strong binding between these two proteins.
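Binding energies like these are often easier to compare as dissociation constants. A minimal sketch, assuming the standard thermodynamic relation ∆G = RT ln K_d (1 M standard state) at 25 °C, converts the three predicted values from Table S2. The conversion is our illustration, not part of the cited tools' output, and because the predictors disagree by more than 14 kcal/mol the resulting K_d values span many orders of magnitude and are best read qualitatively.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def dissociation_constant(delta_g_kcal: float, temp_k: float = 298.15) -> float:
    """Convert a binding free energy (kcal/mol) to Kd (mol/L) via dG = RT ln Kd.

    A more negative dG gives a smaller Kd, i.e. tighter binding.
    """
    return math.exp(delta_g_kcal / (R_KCAL_PER_MOL_K * temp_k))


# dG values predicted for the CLCuMB-βC1/SlUBC3 complex (Table S2).
PREDICTED_DG = {"PPA-Pred": -11.07, "PRISM": -21.04, "PRODIGY": -6.7}

# Print tools from strongest to weakest predicted binding.
for tool, dg in sorted(PREDICTED_DG.items(), key=lambda item: item[1]):
    kd = dissociation_constant(dg)
    print(f"{tool:<9} dG = {dg:7.2f} kcal/mol -> Kd ~ {kd:.1e} M")
```

At 25 °C, RT ln 10 is about 1.36 kcal/mol, so each additional 1.36 kcal/mol of predicted stabilization corresponds to roughly a tenfold lower K_d.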
To assess the evolutionarily conserved function of ubiquitin-conjugating proteins, two physical interactions of the A. thaliana AtUBC2 protein were retrieved from BioGRID (Chatr-Aryamontri et al., 2017), both pointing to a ubiquitination function in the cytosol. For the SlUBC3/CLCuMB-βC1 interaction, the subcellular localization of both proteins was predicted with PredictProtein (Yachdav et al., 2014). SlUBC3 was predicted to be cytosolic, suggesting that it could be present in the cytoplasm or the nucleus, and CLCuMB-βC1 could bind SlUBC3 in either compartment to exploit the host ubiquitination cycle for virus infection.

Identification of binding site between CLCuMB and SlUBC3
After binding energy and structure prediction, the binding site was predicted with sequence- and structure-based methods. For sequence analysis, PSIVER (Protein-protein interaction SItes prediction serVER), Bspred, PredictProtein and NSP-HomPPI (non-partner-specific HomPPI) were used to extract all possible features, including physicochemical properties, for each residue of CLCuMB-βC1 and SlUBC3. Each of these methods applies threshold parameters to its output. For example, PSIVER reports residue-based predictions at two thresholds, a low one (≥0.37) and a higher-specificity one (≥0.56); an optimum threshold of 0.4-0.56 was used here to avoid false positives. Bspred combines sequence profiles, secondary structure and a hydrophobicity scale into a neural network (NN) scoring function, and an NN score > -0.1 gave the most accurate interface prediction between CLCuMB-βC1 and SlUBC3. NSP-HomPPI identifies interactions using the relative accessible surface area (RASA); within a RASA window of 10%-30%, it identified binding sites in CLCuMB-βC1 and SlUBC3 in the safe-mode zone. The data obtained from PSIVER, Bspred and NSP-HomPPI with these optimum settings are shown in Table S2.
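The per-method cutoffs described above can be applied mechanically once each server's per-residue output has been parsed. A minimal sketch, assuming the scores are already available as position-to-score dictionaries; the toy values are hypothetical, not the study's data, and RASA is expressed as a fraction rather than a percentage.

```python
from typing import Callable, Dict, Set

# Cutoffs from the text; each returns True when a residue is called "interface".
CUTOFFS: Dict[str, Callable[[float], bool]] = {
    "PSIVER": lambda score: score >= 0.40,            # lower end of the 0.4-0.56 window
    "Bspred": lambda score: score > -0.1,             # neural-network score
    "NSP-HomPPI": lambda rasa: 0.10 <= rasa <= 0.30,  # RASA window, as a fraction
}


def interface_calls(scores: Dict[str, Dict[int, float]]) -> Dict[str, Set[int]]:
    """Return, for each method, the residue positions that pass its cutoff."""
    return {
        method: {pos for pos, value in per_residue.items() if CUTOFFS[method](value)}
        for method, per_residue in scores.items()
    }


# Hypothetical scores for four SlUBC3 positions, for illustration only.
toy_scores = {
    "PSIVER": {29: 0.52, 34: 0.31, 72: 0.44, 145: 0.58},
    "Bspred": {29: 0.12, 34: -0.40, 72: -0.05, 145: 0.20},
    "NSP-HomPPI": {29: 0.22, 34: 0.05, 72: 0.18, 145: 0.41},
}

calls = interface_calls(toy_scores)
agreed = set.intersection(*calls.values())  # positions called by every method
print(calls)
print(sorted(agreed))
```

Keeping the cutoffs in one table makes it easy to rerun the selection with the stricter PSIVER threshold (0.56) and see which positions drop out.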
For the structure-based study, PrISE, CPORT [PIER, PPISP, SPPIDER, PINUP], PredUs, VORFFIP and ProMate identified binding sites in both proteins. These methods use RASA and B-factor data. RASA indicates whether an amino acid lies in the protein core or on the surface, where it is available for interaction. A RASA window of 10%-30% was applied to limit false-positive residue-level predictions from the structure-based methods. With this window, VORFFIP and PrISE identified a few residues at the N-terminus of SlUBC3 and the C-terminus of CLCuMB-βC1 in the safe-mode zone (Table S2). The B-factor (Liu et al., 2014), also called the Debye-Waller factor or temperature factor, measures the displacement of atoms from their mean positions, and most binding site methods use a color scheme on the protein structure to display predicted B-factor values. Information on all the methods, tools and servers used for the PPI study is summarized in Table S3.
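RASA is a residue's solvent-accessible surface area divided by the maximum possible area for that residue type. A minimal sketch, assuming absolute SASA values have already been computed by an external tool and using the theoretical maximum areas reported by Tien et al. (2013); the SASA inputs in the example are hypothetical.

```python
# Theoretical maximum accessible surface area (Å^2) per residue type,
# as reported by Tien et al. (2013); verify against the original table before reuse.
MAX_ASA = {
    "ALA": 129.0, "ARG": 274.0, "ASN": 195.0, "ASP": 193.0, "CYS": 167.0,
    "GLN": 225.0, "GLU": 223.0, "GLY": 104.0, "HIS": 224.0, "ILE": 197.0,
    "LEU": 201.0, "LYS": 236.0, "MET": 224.0, "PHE": 240.0, "PRO": 159.0,
    "SER": 155.0, "THR": 172.0, "TRP": 285.0, "TYR": 263.0, "VAL": 174.0,
}


def rasa(resname: str, sasa: float) -> float:
    """Relative accessible surface area: absolute SASA / maximum SASA."""
    return sasa / MAX_ASA[resname.upper()]


def classify(rel: float, low: float = 0.10, high: float = 0.30) -> str:
    """Bin a RASA value against the 10%-30% window used in the text."""
    if rel < low:
        return "buried"
    if rel <= high:
        return "in_window"
    return "exposed"


# Hypothetical absolute SASA values (Å^2), for illustration only.
for name, pos, sasa in [("ASP", 29, 42.5), ("LYS", 72, 15.0), ("TYR", 145, 120.0)]:
    rel = rasa(name, sasa)
    print(f"{name}{pos}: RASA = {rel:.2f} ({classify(rel)})")
```

Other published maximum-area tables give somewhat different RASA values, so one table should be used consistently for both proteins.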
After binding affinity scoring, interface prediction, which computes a joint score for the virus and host proteins, was performed with machine learning and protein docking methods. Interface methods identify the residues in SlUBC3 and CLCuMB-βC1 that are involved in binding each other. The machine learning method PPiPP (Ahmad and Mizuguchi, 2011) predicted a propensity score for each amino acid and identified residues in the central region of CLCuMB-βC1 that bind N- and C-terminal residues of SlUBC3 at a distance of ≤6 Å (Figure 1A). Another machine learning method, PRISM, predicted amino acids 28-31 of SlUBC3 as the binding site and amino acids 75-80 of CLCuMB-βC1 as the hot region for interaction (Figure 1B). PAIRPred (Afsar Minhas et al., 2014) also predicted the interface site using sequence and structure information; for SlUBC3/CLCuMB-βC1, residues with a score of ≥0.5 for the receptor (SlUBC3) and ligand (CLCuMB-βC1) were assigned to the interface site (Figure 1C).
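PAIRPred scores residue pairs rather than single residues, so the interface set for each partner can be read off the pairs that clear the cutoff. A minimal sketch, assuming the pair scores have been parsed into a dictionary keyed by (receptor residue, ligand residue); the scores shown are hypothetical.

```python
from typing import Dict, Set, Tuple

PairScores = Dict[Tuple[str, str], float]


def interface_from_pairs(pair_scores: PairScores,
                         threshold: float = 0.5) -> Tuple[Set[str], Set[str]]:
    """Split pairs scoring >= threshold into receptor and ligand interface sets."""
    receptor: Set[str] = set()
    ligand: Set[str] = set()
    for (rec_res, lig_res), score in pair_scores.items():
        if score >= threshold:
            receptor.add(rec_res)
            ligand.add(lig_res)
    return receptor, ligand


# Hypothetical pair scores: receptor = SlUBC3, ligand = CLCuMB-βC1.
toy_pairs = {
    ("Asp29", "Phe62"): 0.81,
    ("Asp29", "Asn69"): 0.64,
    ("Lys72", "Glu51"): 0.57,
    ("Gln34", "Lys24"): 0.21,
}

rec, lig = interface_from_pairs(toy_pairs)
print(sorted(rec), sorted(lig))
```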
HADDOCK (Van Zundert et al., 2016), an information-driven docking method, identified active residues in the SlUBC3/CLCuMB-βC1 complex, with the best Z-score of -1.8 among 19 clusters (Figure 2A). All surrounding (passive) residues within 5 Å were selected along with the active ones. Across these methods, residues at the N-terminus of SlUBC3 and residues 50-80 of CLCuMB-βC1 were identified. Residue-level interaction was further predicted with the protein docking methods ZDOCK 3.0.2 (Pierce et al., 2014) and Docking2 in ROSETTA v3.2 (Lyskov et al., 2013), each of which generated the top ten models for the SlUBC3/CLCuMB-βC1 complex. Interacting residues within 5 Å between chain 'A' (SlUBC3) and chain 'B' (CLCuMB-βC1) were highlighted in two colors (Figure 2B and C), and residues common to all ten models were selected as the binding residues of this complex.
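Selecting residues common to all ten models amounts to computing the inter-chain contacts in each docked model and intersecting the sets. A minimal sketch, assuming coordinates have already been extracted per residue (for example from the PDB files returned by the servers); the 5 Å cutoff follows the text, while the two toy models are hypothetical and use one atom per residue.

```python
import math
from itertools import product
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
Chain = Dict[str, List[Coord]]  # residue label -> heavy-atom coordinates


def contacts(chain_a: Chain, chain_b: Chain, cutoff: float = 5.0) -> Set[Tuple[str, str]]:
    """Residue pairs with at least one inter-chain atom pair within `cutoff` Å."""
    pairs = set()
    for (res_a, atoms_a), (res_b, atoms_b) in product(chain_a.items(), chain_b.items()):
        if any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
            pairs.add((res_a, res_b))
    return pairs


def consensus_contacts(models: List[Tuple[Chain, Chain]],
                       cutoff: float = 5.0) -> Set[Tuple[str, str]]:
    """Contacts present in every docking model."""
    per_model = [contacts(a, b, cutoff) for a, b in models]
    return set.intersection(*per_model) if per_model else set()


# Two hypothetical models with one atom per residue (coordinates in Å).
host = {"A:Asp29": [(0.0, 0.0, 0.0)], "A:Gln34": [(20.0, 0.0, 0.0)]}
model_1 = (host, {"B:Phe62": [(3.0, 0.0, 0.0)], "B:Lys24": [(20.0, 4.0, 0.0)]})
model_2 = (host, {"B:Phe62": [(4.0, 1.0, 0.0)], "B:Lys24": [(20.0, 9.0, 0.0)]})

print(sorted(consensus_contacts([model_1, model_2])))
```

The all-against-all loop is fine for two small chains; for full structures, a spatial index such as Biopython's `NeighborSearch` would be faster.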

Interaction prediction using intermolecular distance
Rotamer analysis and distance measurement provide another way to determine the interface site. Shorter distances between interface residues further supported the interaction between SlUBC3 and CLCuMB-βC1. Among the predicted residues in SlUBC3, the interface residue Asp-29 binds CLCuMB-βC1 Phe-62 and Asn-69 at short distances of 9.3 Å and 8.7 Å, respectively. Similarly, the SlUBC3 interface residues Lys-72 and Tyr-145 bind CLCuMB-βC1 Glu-51 and Tyr-50 at distances of 9.8 Å and 11.2 Å, respectively (Figure 4A). In contrast, SlUBC3 Gln-34, which has a low binding score, binds CLCuMB-βC1 Lys-24 at a longer distance of 15.4 Å, indicating that only interface residues lie close to each other for a stronger interaction. In CLCuMB-βC1, Phe-62 is the only residue that binds most of the host residues at short distances (Figure 4B). Other CLCuMB-βC1 residues, such as Met-104 and Asp-105, have high binding scores and bind host residues located only in α-helices and β-sheets, indicating strong affinity upon interaction. Moreover, these two residues can bind their neighboring residues Tyr-50 and Glu-51. These four CLCuMB-βC1 residues bond strongly and form a closed pocket with SlUBC3 Lys-72 and Tyr-145 for interaction (Figure 4C).
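The distances reported in this subsection can be tabulated and ranked to check which host-virus pairs fall below a chosen cutoff and how many partners each residue has. A minimal sketch using only the values stated in the text; the 12 Å cutoff is an illustrative choice, not one used in the study.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Inter-residue distances (Å) stated in the text: (SlUBC3 residue, CLCuMB-βC1 residue).
REPORTED: Dict[Tuple[str, str], float] = {
    ("Asp-29", "Phe-62"): 9.3,
    ("Asp-29", "Asn-69"): 8.7,
    ("Lys-72", "Glu-51"): 9.8,
    ("Tyr-145", "Tyr-50"): 11.2,
    ("Gln-34", "Lys-24"): 15.4,
}


def close_pairs(distances: Dict[Tuple[str, str], float],
                cutoff: float) -> List[Tuple[float, str, str]]:
    """Pairs at or below `cutoff`, nearest first, as (distance, host, virus)."""
    return sorted((d, host, virus) for (host, virus), d in distances.items() if d <= cutoff)


close = close_pairs(REPORTED, cutoff=12.0)
partners = Counter(host for _, host, _ in close)  # virus partners per host residue

for d, host, virus in close:
    print(f"SlUBC3 {host:<8} - βC1 {virus:<7} {d:5.1f} Å")
print(partners.most_common(1))
```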

In silico deletion mutagenesis
These in silico results were compared with those previously reported by Eini et al. (2009) for the interaction of CLCuMB-βC1 with Solanum lycopersicum UBC3 (SlUBC3). In that study, deletion mutants revealed that the first 55 amino acids of CLCuMB-βC1 play no role in the interaction with SlUBC3, whereas deletion of the last 16 amino acids at the C-terminus, including the myristoylation signal (103-GMDVNE-108), or of the last 39 amino acids disrupted the interaction. These findings are consistent with the in silico analysis of the CLCuMB-βC1/SlUBC3 interaction.
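The deletion constructs described by Eini et al. (2009) can be expressed as retained residue ranges and checked against the myristoylation signal. A minimal sketch, assuming a 118-residue βC1 (the length at which the last 16 residues begin exactly at position 103); substitute the actual sequence length for other isolates.

```python
from typing import List, NamedTuple


class Construct(NamedTuple):
    name: str
    start: int  # first retained residue (1-based, inclusive)
    end: int    # last retained residue (1-based, inclusive)


MYRISTOYLATION_SIGNAL = (103, 108)  # 103-GMDVNE-108


def deletion_constructs(length: int) -> List[Construct]:
    """Full-length protein plus the N- and C-terminal deletions discussed in the text."""
    return [
        Construct("full length", 1, length),
        Construct("dN55", 56, length),      # first 55 residues removed
        Construct("dC16", 1, length - 16),  # last 16 residues removed
        Construct("dC39", 1, length - 39),  # last 39 residues removed
    ]


def retains(construct: Construct, first: int, last: int) -> bool:
    """True if every residue from `first` to `last` is still present."""
    return construct.start <= first and last <= construct.end


ASSUMED_LENGTH = 118  # assumption; replace with the real βC1 length
for c in deletion_constructs(ASSUMED_LENGTH):
    keeps = retains(c, *MYRISTOYLATION_SIGNAL)
    print(f"{c.name:<11} {c.start:>3}-{c.end:<3} keeps signal: {keeps}")
```

Under this assumption, the two constructs that lose the signal are exactly those reported to lose the interaction.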
For SlUBC3, SIFT (Ng and Henikoff, 2003) and the ROSETTA sequence tolerance tool (Smith and Kortemme, 2011) predicted tolerated and non-tolerated substitutions at each amino acid position. SDM (Smith and Kortemme, 2011), mCSM (Pires and Ascher, 2017), I-Mutant 3.0 (Capriotti et al., 2005) and CUPSAT (Parthiban et al., 2007) all showed a clear reduction in ∆∆G for SlUBC3 at positions 26-32 and 145-148. This energy drop was observed after substitution with tolerated or non-tolerated residues, after substitution with alanine, and after deletion. Deletion of these regions gave a low negative ∆∆G, but it can alter the normal function of the protein. The predicted structure of wild-type SlUBC3 was compared with all mutated structures; structural alignment showed 95% identity for all except the structures carrying deletions of residues 26-32 and 145-148 (Figure 5). This suggests that deleting the binding regions impairs the interaction with CLCuMB-βC1 but may also alter SlUBC3 function through structural changes. Deletion is therefore not an appropriate strategy for disrupting a PPI; substitution by alanine scanning or with tolerated amino acids is a better approach to limit the interaction while keeping the host protein intact. These in silico results were then verified with molecular techniques to further confirm the computational study.
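Designing the substitution and deletion variants compared here is mostly bookkeeping: each position in a candidate region becomes a mutation code such as D29A, and a deletion variant simply drops the region. A minimal sketch, assuming a protein sequence string and 1-based positions; the ten-residue sequence below is hypothetical and is not SlUBC3. Because ∆∆G sign conventions differ between predictors, each tool's documentation should be checked before its values are pooled with others.

```python
from typing import Iterable, List


def alanine_scan(seq: str, positions: Iterable[int]) -> List[str]:
    """Mutation codes (wild type, 1-based position, Ala) for each position.

    Positions that are already alanine are skipped.
    """
    codes = []
    for pos in positions:
        wt = seq[pos - 1]
        if wt != "A":
            codes.append(f"{wt}{pos}A")
    return codes


def delete_region(seq: str, start: int, end: int) -> str:
    """Remove residues start..end (1-based, inclusive)."""
    return seq[: start - 1] + seq[end:]


toy_seq = "MSKDAVHLQE"  # hypothetical 10-residue sequence
print(alanine_scan(toy_seq, range(3, 7)))  # positions 3-6
print(delete_region(toy_seq, 3, 6))
```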

In Planta BiFC assay for interaction prediction
Following the in silico study, wet-lab experiments were performed to determine the interaction between the cotton leaf curl Multan betasatellite protein and cotton-encoded UBC1 using a bimolecular fluorescence complementation (BiFC) assay. GhUBC1 and CLCuMB-βC1 entry clones in the pENTR/D-TOPO vector were recombined into the destination vector pSITE-2CA (green fluorescent protein, GFP) to identify the interaction. After agroinfiltration into N. benthamiana leaves, fluorescence was observed by confocal microscopy. A strong BiFC signal from the GFP marker indicated that CLCuMB-βC1 binds cotton GhUBC1 (Figure 6A). Guard cells visible along the epidermis in the interaction images indicated cytoplasmic subcellular localization, which was confirmed at higher magnification and showed strong binding in the GhUBC1/CLCuMB-βC1 complex. No nonspecific signal or fluorescence was observed for the negative control (Figure 6B), validating the agro-transformation procedure for the BiFC experiment.

In vivo approach for interaction study
Y2H was used as a second molecular approach to identify the interaction between GhUBC1 and CLCuMB-βC1. The confirmed Gateway clone of GhUBC1 was subcloned into the bait vector pEZY202 (BD), and CLCuMB-βC1 was fused into the prey vector pEZY45 (AD). After transformation into yeast strain EGY48, transformants were screened on different selection media to identify positive interactions. Positive colonies appeared for GhUBC1 on SD-Trp-Leu (double dropout medium/+L). Colonies harboring GhUBC1 were then transformed with CLCuMB-βC1 and plated on SD-Trp-Leu-His (triple dropout medium/-L). Colonies appeared for the GhUBC1/CLCuMB-βC1 combination, indicating a positive interaction (Figure 7A). To select for positive interactions only, -L medium was supplemented with 3-amino-1,2,4-triazole (3-AT) at 10 mM and 20 mM. At the higher 3-AT concentration, a few colonies still appeared, indicating a positive interaction (Figure 7A). Two negative controls were used: empty bait with CLCuMB-βC1, and both empty bait and empty prey vectors. No colonies appeared for empty bait/βC1 on -L medium supplemented with 3-AT (Figure 7B), and the same result was observed for the empty constructs (Figure 7C), validating the Y2H transformation protocol. Both the in planta and in vivo results validated the sequence- and structure-based interaction prediction, which can be applied to any virus-host interaction before molecular assays.
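Preparing the 10 mM and 20 mM 3-AT selection media is simple arithmetic. A minimal sketch, using the molar mass of 3-amino-1,2,4-triazole (C2H4N4, about 84.08 g/mol); the 1 M stock concentration and 500 mL medium volume are illustrative assumptions, not values from the protocol.

```python
MW_3AT = 84.08  # g/mol, 3-amino-1,2,4-triazole (C2H4N4)


def grams_for(final_mm: float, volume_ml: float, mw: float = MW_3AT) -> float:
    """Mass of solid (g) needed for `final_mm` millimolar in `volume_ml` millilitres."""
    return (final_mm / 1000.0) * (volume_ml / 1000.0) * mw


def stock_volume_ml(stock_mm: float, final_mm: float, volume_ml: float) -> float:
    """Stock volume (mL) to add, from C1*V1 = C2*V2."""
    return final_mm * volume_ml / stock_mm


for conc in (10, 20):
    print(f"{conc} mM in 500 mL: {grams_for(conc, 500):.3f} g solid, "
          f"or {stock_volume_ml(1000, conc, 500):.1f} mL of 1 M stock")
```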

Discussion
PPIs play a critical role in several biological processes such as cell signaling, gene silencing and defense mechanisms (Mustafa, 2024; Zhao et al., 2024). PPI studies mainly aim to understand the relationship between two proteins and their possible roles in the cellular machinery after interaction. Experimental techniques such as X-ray crystallography (Friedberg et al., 2008), immunoprecipitation coupled to mass spectrometry (IP-MS) (Li et al., 2016) and Y2H (Fields and Song, 1989) have been extensively used to study protein structures and their roles in biological mechanisms (Zhang and Wei, 2015). However, these sophisticated and expensive techniques are time-consuming and produce limited data compared with the massive amount of data stored publicly in data banks (Xing et al., 2016). In this study, we proposed an in silico approach to identify interactions between plant and viral proteins. Similar approaches are applicable to numerous human infections, helping scientists working on disease management (Malik and Dhuldhaj, 2023; Mohammadi et al., 2022) and offering computer-aided networks and tools to combat certain infections (Yang et al., 2020).
For host-pathogen studies, this computational approach previously gave successful domain-level predictions for GhSnRK1 with CLCuMB-βC1 and for the calmodulin-like protein GhCML11 with the transcription activator protein (TrAP) encoded by Cotton leaf curl Multan virus (CLCuMV) (Kamal et al., 2019). The present study focused further on the motif level within domains, providing evidence that sequence- and structure-based information can be used for effective antiviral strategies and precise genome editing in crops. Moreover, the data retrieved from these interface methods agree with the molecular data reported for Solanum lycopersicum UBC3 and CLCuMB-βC1 at the residue level.
With this in silico method, sequence-based information for SlUBC3 and CLCuMB-βC1 indicated high binding affinity in terms of ∆G/∆∆G. These methods extract features from protein sequences such as binding affinity, the distribution of hydrophobic and hydrophilic amino acids, physicochemical properties, and evolutionarily conserved regions and motifs (Shen et al., 2007). However, sequence-based methods provide limited information on direct physical contact between two proteins, so structure coordinates have been used as a complementary source of information in PPI studies (Esmaielbeiki et al., 2016). Structure-based methods use RASA values to calculate a propensity (binding) score for each surface residue available for interaction. With this methodology, we identified only 17 of the 148 residues of SlUBC3, and 16 residues of CLCuMB-βC1, with high binding scores. B-factor data were also generated for both SlUBC3 and CLCuMB-βC1 structures to identify binding sites; B-factor coloring highlights interacting residues on a light-blue-to-dark-red scale, which helps identify hot regions in the two proteins.
Alongside sequence-based methods, protein docking methods provide atomic-level information on the interface regions in close proximity that are involved in PPI. However, the interactions predicted by docking tools depend on the quality of the tool or server (McConkey et al., 2002). We therefore did not rely on a single docking method but performed an extensive analysis with various sequence- and structure-based methods. Consensus scoring has been shown to be a successful approach for identifying strong binders among predicted residues, avoiding noisy data and improving the prediction of bound residues in receptor-ligand interactions (Charifson et al., 1999; Guedes et al., 2014). Here, in the final residue scoring, the docking methods identified only three SlUBC3 residues, Asp-29, Lys-72 and Tyr-145, that bind βC1 residues Phe-62, Asn-69, Glu-51 and Tyr-50 at short distances for a stronger interaction. The CLCuMB-βC1 residues Met-104 and Asp-105 also had high binding scores during interaction, and these two residues are part of the myristoylation signal required for infectivity. These in silico predictions for the virus protein agree with the SlUBC3/CLCuMB-βC1 study (Eini et al., 2009). Myristoylation signals in geminiviruses such as East African cassava mosaic Cameroon virus (EACMCV) have been shown to enhance pathogenicity and symptom development (Fondong et al., 2007). In the CLCuD complex, the myristoylation signal in βC1 (residues 103-108) binds strongly with the host ubiquitin-conjugating enzyme and is associated with severe CLCuD symptoms, and deletion of this motif weakened the interaction with the ubiquitin-conjugating enzyme in the Y2H assay. Computational biology also offers in silico mutagenesis tools that provide information for accurate in-frame insertions and deletions. For accurate mutation, it is preferable to rank residues from highly conserved to less conserved using a conservation score for the protein. For example, using conservation scores from ConSurf (Ashkenazy et al., 2010), most of the predicted residues in SlUBC3 had very low conservation scores (Table S4). Based on the conservation and consensus scores, substitution or deletion of the identified regions gave a low negative ∆∆G, suggesting reduced interaction. We should therefore focus on the host protein to identify the binding region, because modifying the host protein is a more favorable way to limit its interaction with plant pathogens (Wallqvist et al., 2000). Hence, the per-residue sequence conservation score is a better guide for studying the function of a protein with substitutions.
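The consensus and conservation criteria discussed here can be combined into one ranking: count how many methods call each residue (as in the binary per-method calls summarized in Figure 3) and keep residues that are both well supported and weakly conserved as substitution candidates. A minimal sketch, assuming per-method calls given as sets of positions and ConSurf grades on its 1 (most variable) to 9 (most conserved) scale; the method names, calls, grades and cutoffs below are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def count_votes(calls: Dict[str, Set[int]]) -> Counter:
    """Number of methods that call each residue position."""
    return Counter(pos for positions in calls.values() for pos in positions)


def substitution_candidates(votes: Counter, grades: Dict[int, int],
                            min_votes: int = 3, max_grade: int = 4) -> List[int]:
    """Well-supported, weakly conserved positions, most-supported first.

    Positions without a conservation grade are treated as fully conserved (9)
    and are therefore excluded under the default cutoff.
    """
    picks = [pos for pos, n in votes.items()
             if n >= min_votes and grades.get(pos, 9) <= max_grade]
    return sorted(picks, key=lambda pos: (-votes[pos], grades.get(pos, 9), pos))


# Hypothetical inputs for illustration only.
calls = {
    "method_1": {29, 34, 72, 145},
    "method_2": {29, 72, 145, 147},
    "method_3": {29, 145, 147},
    "method_4": {29, 72},
}
consurf_grades = {29: 2, 34: 1, 72: 6, 145: 3, 147: 4}

votes = count_votes(calls)
print(dict(votes))
print(substitution_candidates(votes, consurf_grades))
```

In this toy case, position 72 is well supported but excluded because it is conserved, which is the kind of residue the text suggests leaving untouched.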
After extensive sequence and structure analysis of UBC, the CLCuMB-βC1/GhUBC1 interaction was validated with the molecular techniques BiFC and Y2H. Investigating an interaction with multiple approaches reduces the chance of false-positive or false-negative results in PPI studies (Zehrmann et al., 2015). The in planta assay showed that CLCuMB-βC1 binds the ubiquitin-conjugating enzyme UBC1 in the cytoplasm. This cytoplasmic interaction suggests that βC1 hijacks the plant ubiquitination machinery to support post-translational modifications in infected cells during the cell cycle. A similar mechanism was previously proposed based on the strong binding of TYLCCNB-βC1 with a UBC protein, which inhibits the ubiquitination cycle and promotes infection in plant cells (Yang et al., 2008). These findings suggest that binding of CLCuMB-βC1 to GhUBC1 contributes to virus infection in cotton.

Conclusion
Our findings suggest that computational biology can help us understand the epidemic spread of the CLCuD complex, showing that virus-virus and virus-host interactions have a potential role in studying the molecular basis of pathogens and their infection mechanisms. These methods are also useful for studying the biological pathways responsible for diseases in humans and animals. In brief, data from bioinformatics and molecular tools indicate that interaction prediction methods can be used to determine interactions in agriculturally important crops before manipulating their genomes with editing tools.

Funding
This research was supported by The Researchers Supporting Project number (RSPD2024R952), King Saud University, Riyadh, Saudi Arabia.

Authors' contributions
HK and MMZ performed the experimental work and bioinformatics analyses. HK and MMZ wrote the first draft. AR, AI, ZA, HP, KME, AS, UF, and XJ finalized and reviewed the article. All authors have read and approved the final manuscript.

Ethics declarations
Ethics approval and consent to participate: Not applicable.

Figure 1. Interface prediction scores from PPiPP, PRISM and PAIRPred. A. The sequence-based interface method PPiPP predicts binding residues in both the virus and host proteins; residues with high scores are shown in bold. B. Interacting residues identified from B-factor data generated by PRISM, shown in yellow and green. C. Residues involved in binding, predicted above a threshold value of 0 using both sequence and structure information. B-factor data mapped onto the SlUBC3 and CLCuMB-βC1 structures indicate the highly interacting regions in both proteins.

Figure 2. Interface site prediction between SlUBC3 and CLCuMB-βC1 using PRODIGY, ZDOCK and Docking2 at the ROSETTA server. A. PRODIGY from HADDOCK predicts binding energy with a Z-score of -1.8 (more negative values indicate better interaction); the interface region predicted by WHISCY is shown as a mesh cartoon. Interacting residues were selected within 5 Å and are highlighted in red and blue. B. The top ten models predicted by ZDOCK were aligned in PyMOL, and the binding site was determined with a 5 Å cutoff, highlighted in red and blue. The most reliable model was studied further to identify the interface site. C. Docking2 presents the interface results graphically for the top ten models, shown as protein decoys plotted by interface score (I_sc) or total energy score against RMSD, to ensure reliable results from the server. The ten models with high energy scores from the Docking2 server were aligned to identify the binding site between the virus and host proteins.

Figure 3. Cumulative results of the sequence- and structure-based approaches for predicting the interaction in the SIUBC3/CLCuMB-βC1 complex. A. All data generated by the sequence- and structure-based methods, including the docking methods, were stored in a single file in binary (0/1) form. Orange bars show the virus-host binding-site scores from the docking methods; blue bars show the consensus of all methods for residue-level binding-site prediction. Residues scoring above a threshold of 2 in the docking data have high scores and are marked with black dots; the same residues are also marked with black dots in the consensus data to show their consensus scores. In the consensus data, residues with a threshold of ≤7 have high scores and are shown as yellow dots. B. Interacting residues selected from the sequence- and structure-based methods, including docking, are highlighted in the SIUBC3-βC1 complex. These residues form a binding pocket (black circles, viewed from different angles), indicating strong binding affinity of the viral protein for its host protein in tomato.
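The consensus in panel A amounts to summing binary per-residue calls across methods and flagging residues whose count passes a threshold. A minimal sketch, assuming each method's output has already been converted to a 0/1 list over the same residue order; the function names and the example calls are hypothetical. The usage lines apply the docking threshold of 2 from the caption; any other cut-off can be passed in the same way.

```python
def consensus(predictions):
    """Sum binary (0/1) per-residue calls across methods.

    predictions: dict mapping method name -> list of 0/1 calls, one per
    residue, with every list covering the same residues in the same order.
    """
    lengths = {len(calls) for calls in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all methods must cover the same residues")
    return [sum(column) for column in zip(*predictions.values())]


def above(counts, threshold, start=1):
    """Residue numbers (1-based by default) whose count is above threshold."""
    return [i for i, c in enumerate(counts, start=start) if c > threshold]


# Illustrative calls only, not data from the study.
calls = {"ZDOCK": [1, 0, 1, 1], "HADDOCK": [1, 0, 1, 0], "Docking2": [1, 1, 1, 0]}
counts = consensus(calls)  # [3, 1, 3, 1]
print(above(counts, 2))    # [1, 3]
```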

Figure 4. Distance measurement of predicted residues in SIUBC3 and CLCuMB-βC1. Most binding residues (red, host; blue, virus) lie at short distances from their partners, which provided a means to identify the potential interface site. A single residue in either the virus or the host can contact several residues at once, strengthening the interaction. A. LYS-24 in CLCuMB-βC1 can bind SIUBC3 at two different locations, and similarly ASP-29 in SIUBC3 binds more than one residue in CLCuMB-βC1. B. Of all predicted regions in CLCuMB-βC1, PHE-62 binds the hot region 26-33 in SIUBC3 at the shortest distance; in particular, VAL-26 and HIS-32 are only 6.9 Å away. C. The residue at position 105 in CLCuMB-βC1 helps stabilize the interaction. It also shares binding energy with two neighboring residues in CLCuMB-βC1, TYR-50 and GLU-51, which in turn bind SIUBC3 residues 145-148 at the C-terminus.
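Panels A-C rest on inter-residue distances: a residue pair counts as a contact when its closest pair of atoms lies within a cutoff. Figure 2 used a 5 Å selection, whereas some distances reported here are 6.9 Å, so the cutoff is left as a parameter. The sketch below assumes atom coordinates are already grouped by residue in plain dictionaries (hypothetical input; in practice they would be read from the docked model) and uses `math.dist` (Python 3.8+). Because every residue of one chain is compared with every residue of the other, a single residue can appear in several contacts, as described for LYS-24 and ASP-29.

```python
import math
from itertools import product


def min_distance(atoms_a, atoms_b):
    """Smallest distance (Å) between any atom of residue A and any atom of residue B."""
    return min(math.dist(p, q) for p, q in product(atoms_a, atoms_b))


def contacts(chain_a, chain_b, cutoff=5.0):
    """Residue pairs (res_a, res_b, distance) whose minimum distance is <= cutoff Å.

    chain_a, chain_b: dict mapping a residue label to a list of (x, y, z)
    atom coordinates. Pairs are returned sorted from closest to farthest.
    """
    pairs = []
    for (ra, atoms_a), (rb, atoms_b) in product(chain_a.items(), chain_b.items()):
        d = min_distance(atoms_a, atoms_b)
        if d <= cutoff:
            pairs.append((ra, rb, round(d, 2)))
    return sorted(pairs, key=lambda pair: pair[2])
```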

Figure 5. SIUBC3 protein sequence-structure comparison. In silico mutagenesis indicates that alanine-scanning substitutions and residue tolerance analysis did not impair the protein structure. In contrast, a deletion in the sequence (red loop in the black circle) yielded a predicted SIUBC3 structure with a missing loop, indicating that the host protein structure became unstable.
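In silico alanine scanning starts from a set of single-point mutant sequences, each of which is then passed to a structure or stability predictor; a deletion mutant is built the same way by removing a segment. The sketch below only generates the mutant sequences (the prediction step is not shown), and the function names and toy sequence are hypothetical rather than taken from the study.

```python
def alanine_scan(sequence, positions=None):
    """Single-point alanine mutants of a protein sequence.

    Returns a dict mapping a mutation label such as 'D29A' (wild-type residue,
    1-based position, new residue) to the mutant sequence. Positions already
    occupied by alanine are skipped.
    """
    positions = range(1, len(sequence) + 1) if positions is None else positions
    mutants = {}
    for pos in positions:
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"position {pos} is outside the sequence")
        wt = sequence[pos - 1]
        if wt == "A":
            continue
        mutants[f"{wt}{pos}A"] = sequence[: pos - 1] + "A" + sequence[pos:]
    return mutants


def delete_segment(sequence, start, end):
    """Remove residues start..end (1-based, inclusive), e.g. to model a loop deletion."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("invalid segment boundaries")
    return sequence[: start - 1] + sequence[end:]
```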

Figure 6. In planta validation of the interaction between GhUBC1 and CLCuMB-βC1 using a bimolecular fluorescence complementation (BiFC) system. Sequence-verified positive clones were agroinfiltrated into Nicotiana benthamiana plants at OD600 = 0.8. After 24-72 h, several leaf sections were examined by confocal microscopy. A. Positive BiFC signals from the GFP marker indicate interaction between CLCuMB-βC1 and the host protein GhUBC1 in the cytoplasm. B. The negative control produced no nonspecific fluorescence. All images were acquired at 20× zoom. Scale bar = 50 µm.
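The infiltration density of OD600 = 0.8 is normally reached by diluting a denser Agrobacterium culture, using C1V1 = C2V2. A minimal sketch, assuming OD600 is proportional to cell density over the measured range; the function name and the example values are hypothetical.

```python
def culture_volume(od_measured, od_target, final_volume_ml):
    """Volume of culture (mL) to dilute to od_target in final_volume_ml.

    Applies C1 * V1 = C2 * V2; the remainder of the final volume is diluent
    (e.g. infiltration buffer).
    """
    if od_measured <= 0 or od_measured < od_target:
        raise ValueError("culture must be at least as dense as the target OD600")
    return od_target * final_volume_ml / od_measured


# Example: a culture at OD600 = 2.0 diluted to 0.8 in 10 mL
# needs 4.0 mL of culture plus 6.0 mL of diluent.
print(culture_volume(2.0, 0.8, 10.0))
```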

Figure 7. In vivo interaction study of CLCuMB-βC1 with GhUBC1 using the yeast two-hybrid system. GhUBC1 (bait) and CLCuMB-βC1 (prey) were transformed into yeast strain EGY48. After successful transformation on +L (SD-Trp-Leu) medium, colonies were screened on -L (SD-Trp-Leu-His) medium supplemented with 3-AT to suppress background growth, so that only colonies carrying a positive interaction grow. A. Colonies on both +L and -L media in the presence of 3-AT indicate a positive interaction between CLCuMB-βC1 and the cotton protein GhUBC1. B. Negative control: empty bait transformed with CLCuMB-βC1. No colonies appeared on -L medium, validating the experiment. C. Negative control with the empty construct.

Figure S1. Bioinformatics pipeline for predicting the interaction between the virus and the host.

Figure S2. Structural alignment of the ubiquitin-conjugating enzyme of Solanum lycopersicum (SIUBC3). Tomato UBC3 was aligned with the UBC proteins of A. thaliana, S. cerevisiae, and H. sapiens to study their evolutionary function. The minimal deviation score of this alignment indicates that the UBC protein structure is conserved across species, consistent with its role in the ubiquitination pathway during post-translational modification.
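Structural deviation between aligned proteins is commonly summarized as the root-mean-square deviation, RMSD = sqrt((1/N) Σ d_i^2), over N matched atom pairs separated by distances d_i. The caption does not state which deviation score was used, so treating it as an RMSD is an assumption. The sketch assumes the structures have already been superposed (for example with PyMOL's `align`) and that atoms are matched by list index.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superposed and that the i-th atom in
    one list corresponds to the i-th atom in the other (e.g. matched C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```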
Prediction results of non-partner-specific interface residues by NPS-HomPPI (http://ailab1.ist.psu.edu/NPSHOMPPI).
1. Mode = SafeMode: the query protein can find homologous proteins in the Safe Zone. Mode = TwilightMode1: the query protein can find homologous proteins in Twilight Zone 1. Mode = TwilightMode2: the query protein can find homologous proteins in Twilight Zone 2. For details about the Safe/Twilight/Dark Zones, refer to the NPS-HomPPI paper: Li C. Xue, Drena Dobbs, Vasant Honavar: HomPPI: A Class of Sequence Homology Based Protein-Protein Interface Prediction Methods. BMC Bioinformatics 2011, 12:244.
2. pINT: predicted interface residues (1: interface; 0: non-interface; ?: no prediction can be made).
3. score: prediction score from NPS-HomPPI; the higher the score, the higher the prediction confidence.
Query ID: SI UBC3; Qry = G. hirsutum UBC1; MODE: SafeMode.
[Predicted residues table (# Seq pINT SCORE) not recoverable from the extracted text.]
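The legend above defines three pINT labels (1, 0, ?) and a confidence score where higher means more confident. A minimal sketch for sorting such rows into groups and ranking interface residues by score, assuming each row reads `<position> <residue> <pINT> <score>` separated by whitespace; this row layout and the function name are hypothetical, not a documented NPS-HomPPI output format.

```python
def parse_pint(rows):
    """Group NPS-HomPPI-style rows by pINT label.

    Assumes rows of the form '<position> <residue> <pINT> <score>'
    (hypothetical layout). pINT is '1' (interface), '0' (non-interface) or
    '?' (no prediction). Interface residues are returned highest score first.
    """
    result = {"interface": [], "non_interface": [], "no_prediction": []}
    for row in rows:
        fields = row.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip headers, separators and blank lines
        pos, res, pint = int(fields[0]), fields[1], fields[2]
        score = float(fields[3]) if len(fields) > 3 and pint != "?" else None
        if pint == "1":
            result["interface"].append((pos, res, score))
        elif pint == "0":
            result["non_interface"].append((pos, res, score))
        else:
            result["no_prediction"].append((pos, res, score))
    # Higher score means higher confidence; rows lacking a score sort last.
    result["interface"].sort(
        key=lambda t: t[2] if t[2] is not None else float("-inf"), reverse=True
    )
    return result
```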