SARS-CoV-2 main protease targeting potent fluorescent inhibitors: Repurposing thioxanthones

The coronavirus disease, COVID-19, is the major focus of the whole world due to insufficient treatment options. It has spread all around the world and is responsible for the death of numerous human beings. The future consequences for the disease survivors are still unknown. Hence, all contributions to understand the disease and effectively inhibit the effects of the disease have great importance. In this study, different thioxanthone based molecules, which are known to be fluorescent compounds, were selectively chosen to study if they can inhibit the main protease of SARS-CoV-2 using various computational tools. All candidate ligands were optimized, molecular docking and adsorption, distribution, metabolism, excretion, and toxicity (ADMET) studies were conducted and subsequently, some were subjected to 100 ns molecular dynamics simulations in conjunction with the known antiviral drugs, favipiravir, and hydroxychloroquine. It was found that different functional groups containing thioxanthone based molecules are capable of different intermolecular interactions. Even though most of the studied ligands showed stable interactions with the main protease, para-oxygen-di-acetic acid functional group containing thioxanthone was found to be a more effective inhibitor due to the higher number of intermolecular interactions and higher stability during the simulations.

SARS-CoV-2 replicase encodes two polyproteins, namely pp1a and pp1ab, which are required for the production of functional polypeptides that have significant importance in the viral replication and transcription processes. These functional polypeptides are released from the polyproteins as a result of extensive proteolytic processes by M pro [6], [10][11][12].
Thioxanthone is an organic compound with a heterocyclic structure and can be also classified as the sulfur analog of xanthone. Thioxanthone and its derivatives are versatile compounds and have many applications in different areas due to their exceptional photochemical and photophysical properties [13][14]. Initially, they were utilized as photoinitiators in free radical polymerization [15][16][17][18][19]. Later, the science benefited from the exceptional photochemical and photophysical properties of these molecules in various biological and biochemical systems such as, in DNA binding studies, in cytotoxicity studies, in antibacterial systems, and as drug analogues [20][21][22][23]. In past years, several different analogues of thioxanthone based molecules were synthesized with the aim of antitumor efficiency and some other biological activities such as antiallergic, antibiotics, and monoamine oxidase (MAO) inhibitors [13].
In silico studies are successful methods to target biological compounds and have been used in various studies throughout the years. Predominantly used in silico approaches include molecular docking and molecular dynamics simulations. There have been numerous studies for CADD of SARS-CoV-2 and these studies mainly focused on the known drugs and/or blind searching the chemical databases as it was an urgent issue at beginning of the pandemic.
Several other targeted and planned approaches based on the knowledge in the literature were also employed to understand the inhibitory effect of various compounds against SARS-CoV-2. It was proposed previously by Gerçek et. al [24] that some of the SARS-CoV-2 targeting molecules bear three different constituents on the main core such as substituted aromatic rings, p-florothiophene and a nitrogen. Coumarin derivatives are also a good example in the literature among the tested molecules against SARS-CoV-2 as they are known to be versatile organic compounds [25][26]. In this study, seven different molecules with thioxanthone core were chosen selectively based on the investigated photophysical and photochemical properties in literature [27][28][29]. As these molecules possess fluorescent properties [27][28][29], it is believed that this field of study could benefit from their exceptional photochemical properties. Besides the fluorescent properties of thioxanthone derivatives, these species contain functional groups which are capable of making intermolecular interactions such as hydrogen bonding, π-π interactions, π-alkyl interactions, and more. As reported in the literature, due to the ease of synthesis, thioxanthone based molecules can be also functionalized with different functional groups with desired properties. We used density functional theory (DFT) to optimize thioxanthone based ligands and these structures were used in the molecular docking of M pro . After molecular docking the best-performing molecules were determined. Best docking poses of determined molecules were retrieved, and 100 ns molecular dynamics simulations were conducted. Additionally, ADMET properties of thioxanthone derivatives were studied. In conjunction with the thioxanthone derivatives, two known antiviral drugs, favipiravir (FAV) and Hydroxychloroquine (HCQ) were also used in our study as they were administered to the people during the initial stages of the pandemic [30]. Stability of keto and enol forms of favipiravir in water environment was compared and most stable conformer was determined. Most stable conformer of FAV and HCQ were also subjected to molecular docking and subsequent MD simulations to compare their performance with thioxanthone derivatives. At the end of the conducted studies, promising results were obtained and presented to the literature.

Density functional theory (DFT) calculations
Ligands were optimized with Gaussian 16 Revision B.01 software package by density functional theory (DFT). Calculations were carried out at B3LYP [31][32][33] level and 6-31+G(d,p) basis set was used for all atoms. As reported earlier by Metin et al. [34] adding a larger basis set for the S atom in the DFT calculations of thioxanthone based molecules leads to a good agreement with the experimental results. For this reason, a larger basis set, 6-311++G(3df,3pd), was added for S, F, and Cl atoms. Initially, preoptimization calculations were carried out with the initially created structures. Then, possible rotations around the single bonds were taken into account and these structures were reoptimized. Optimized structures were frequency checked and no imaginary frequency was noted. The minimum energy conformer of each ligand was selected for docking studies. To mimic the biological environment, all optimization calculations were carried out in the water (ε = 78.3553) by using the integral equation formalism polarizable continuum model (IEF-PCM) as implemented in Gaussian 16.

Docking
Docking studies were carried out with AutoDock Vina 1.1.2 [35]. As the docking studies require both protein and ligand preparation before running calculations; we used AutoDockTools 1.5.7 (ADT), which is distributed along with the MGLTools 1.5.7 (https://ccsb.scripps.edu/mgltools/), to prepare the protein and the ligands.
BIOVIA Discovery Studio Visualizer 2020 software was used for visualization and investigation purposes throughout the study.
Initially, the crystal structure of SARS-CoV-2 main protease (M pro ) in complex with N3 [6] (PDB ID: 6LU7) was downloaded from Protein Data Bank in .pdb format. The crystal structure of the protein contains an inhibitor called N3, water molecules and it has missing hydrogens. The .pdb file was opened with ADT and the protein preparation was established by; (i) removing the attached ligand, (ii) removing the surrounding water molecules, (iii) adding missing polar hydrogens, and then (iv) Gasteiger charges. The prepared structure was converted to .pdbqt format to be used in AutoDock Vina calculations.
The position of the active site was determined by the central coordinates of N3 inhibitor in the crystal structure as x: -10.711837, y: 12.411388, z: 68.831286. A cubic grid box, that covers the whole active site, was created by adjusting the spacing as 1 Å and setting the number of points in x,y, and z dimensions as 22 × 22 × 22, respectively. A configuration file was created to run AutoDock Vina with the following parameters: exhaustiveness as 24, the number of modes as 20, and energy range as 2 kcal mol -1 .

ADMET, drug likeness and medicinal chemistry friendliness
ADME is an abbreviation in pharmacokinetics and pharmacology for "absorption, distribution, metabolism, and excretion", and it is about the disposition of a pharmaceutical compound within an organism. These all criteria influence the drug levels and kinetics of drug exposure to the living tissues and hence influence the performance and pharmacological activity of the compounds as a drug. Often toxicological properties of a drug are considered to have a vital role in drug design strategies. When toxicity studies accompany ADME studies, the abbreviation is transformed into ADMET where T stands for toxicology.

Molecular dynamics (MD) simulations
GROMACS 2020.1 software package was used for molecular dynamics simulations. Initially, all molecules including protein and ligands were prepared for the molecular dynamics simulations. At the initial state of the simulation, topology for the protein and the ligands were generated. Subsequently, the protein-ligand complexes were built. A box around the complexes was created and the simulation system was solvated. The created system was energy minimized and equilibrated. Lastly, successful MD production runs were conducted.
During MD runs CHARMM36 all-atom force field and TIP3P water model were used. We benefited from CGenFF server for the ligand preparations [40][41][42][43][44]. The detailed parameters of the created systems are as followed. The entire protein-ligand complex was centered in a dodecahedron box at least 1 nm from the edge of the box. Then the system is solvated. Na + and Clions were added to neutralize the system and set the ion concentration to 0.15 M to obtain a physiological environment. Following the preparation of the system, first, the system was energy minimized. Then the minimized structure was equilibrated by NVT and NPT ensembles for 200 ps, respectively. Verlet cutoff scheme and Particle Mesh Ewald for long-range electrostatics (PME) were used with 1.2 nm cutoff. Modified Berendsen thermostat (V-rescale) was used and the temperature was set to 310 K. For the NPT, pressure coupling with Berendsen was conducted. After successful NVT and NPT equilibration steps, the system was subjected to MD production run for 100 ns at 310 K with modified Berendsen thermostat (V-rescale) for temperature coupling and Parrinello-Rahman for pressure coupling. As a result of the MD runs, the root-mean-square deviation (RMSD) and Radius of Gyration (R g ) of the simulated proteinligand systems were investigated with the corrected trajectories of the systems.

Energy minimizations of ligands
Prior to docking studies and MD simulations, all ligands were optimized in an explicit solvent model, namely in water, to mimic the physiological environment. After successfully optimizing the ligand molecules and obtaining the lowest energy conformation of the ligands, these geometries were selectively chosen for further studies. Besides these routine minimizations, it has come to our attention that the favipiravir is capable of keto-enol tautomerism ( Figure 2). For this reason, we optimized both the keto and enol forms of the molecule in water and calculated the relative energy difference between these species.
It was found that the relative energy difference between the keto and enol form favipiravir is -3.88 kcal mol -1 . The enol form is energetically more favorable when optimized in water under the same conditions. In consequence, we chose to go further with the enol form as a result of the successful validation of the energy minimization.

Ligand-protein docking
All thioxanthone based molecules, in conjunction with favipiravir (keto and enol forms) and hydroxychloroquine, that were subjected to docking studies are summarized in Table 1 with the corresponding abbreviations, molecular formulas, 2D molecular structures, molecular weights, and IUPAC names. Molecular structures of all thioxanthone based molecules bear a thioxanthone core and differentiate within each other with the attached functional groups and/or position of the functional groups. This enables us to be more flexible when modifying the thioxanthone core for desired purposes. All thioxanthone core bearing molecules in this study are the ones modified with the functional groups that are capable of hydrogen bonding and/or intermolecular interactions in a dynamic environment.
As can be seen from Figure 3, all ligands were docked to the substrate binding site, the cavity of the M pro , successfully. The binding energies of all thioxanthone bearing molecules possessed a similar trend and were found to have significantly better binding energies than the known drugs favipiravir and hydroxychloroquine (Table 2).    All possible intermolecular and intramolecular interactions of the ligands with the M pro are given in Figure 4. It has been found that all molecules were capable making of at least 3 conventional hydrogen bonds in a combination and/ or coexistence of C-H bonds, π-donor H bonds, π -σ interactions, π -alkyl interactions, π -π T-shaped interactions, interactions with a halogen and π -cation interactions.
We chose 3 thioxanthone core bearing molecules, enol form of favipiravir, and HCQ to further study with molecular dynamics simulations. TX-NH-AA was chosen due to the -NH functional group, 4 conventional hydrogen bonds in combination with a C-H bond, π -alkyl interaction, and π -π T-shaped interaction. Within the predominantly oxygenbased functional groups containing thioxanthone derivatives, p-TX-O-DiAA was chosen due to double acetic acid functionality, opposite positioning of the functional groups, 5 conventional hydrogen bonds in combination with a C-H bond, and a π -alkyl interaction. Among the other sulfur-containing thioxanthones, TX-SH was chosen due to the exceptional interaction characteristics, such as 3 conventional hydrogen bonds and 3 π -cation interactions.

ADMET and drug likeness
In Figure 5, Brain Or IntestinaL EstimateD permeation method (BOILED-Egg) [37] representation of the studied ligands is represented. BOILED-Egg representation is a predictive model that works by computing the lipophilicity and polarity of small molecules. Points located in the egg's yolk are molecules predicted to passively permeate through the blood-brain barrier (BBB), whilst the ones located in the egg's white are molecules predicted to be passively absorbed by the gastrointestinal tract, in other words, human intestinal absorption (HIA). On the other hand, blue-white circles are the representation of molecules that are predicted to be effluated from the central nervous system (CNS) by the P-glycoprotein (PGP+), and the red-white circles represent the ones that are not to be effluated from the CNS by P-glycoprotein (PGP-). [ZCN1] As can be seen from Figure 5, all molecules except HCQ are located outside the egg's yolk indicating that, these molecules are predicted not to passively permeate through the BBB. However, all thioxanthone based molecules and both forms of favipiravir are predicted to be passively absorbed by the gastrointestinal tract. Note that the p-TX-O-DiAA and o-TX-O-DiAA were positioned on the same spot inside the egg's white.
BOILED-Egg representation related values of studied ligands and various molecular, lipophilicity, water solubility, pharmacokinetics, drug-likeness, medicinal chemistry, and toxicity properties of the studied molecules are provided in Table 3. As can be followed from the obtained values all studied thioxanthone based ligands showed superior properties to be utilized as drug analogues, with a sufficient number of H acceptors/donors and rotatable bonds, with good lipophilicity index values, with high gastrointestinal absorption whereas no BBB permeation, with Lipinski drug-likeness rules compatibility and good bioavailability scores, with compatible medicinal chemistry properties, good synthetic accessibility values, and good toxicity assessments.

Molecular dynamics (MD) simulations
To ensure that the created systems have no steric clashes or inappropriate geometry, systems were relaxed by energy minimization. After this process, it was found that the potential energy values were negative and had the order of 10 5 -10 6 which is dependent on the size of the system and the number of ions and water molecules added to the system ( Figure  6a). Additionally, the maximum force (F max ) values were found to be smaller than 1000 kJ mol -1 which was targeted at the beginning energy minimization process. Thus, it was concluded that the systems have a reasonable starting structure regarding the geometry of the components and solvent orientation.   Before the actual MD production run, the system must be further optimized by equilibrating the solvent and the added ions around the protein. Otherwise, the system may produce insufficient and unreliable data. The main reason is that the solvent is generally optimized within itself and not with the solute. It needs to be brought to the temperature we would like to run the simulation and have a proper orientation around the protein. So, the first approach is to arrive at the target temperature and check the stability. Then, the pressure is applied until the proper and stable density is reached.
The first phase of the equilibration was conducted under NVT ensemble (constant number of particles, volume, and temperature), also called as "isothermal-isochoric" or "canonical" ensemble. In general, the 100 ps timeframe for this procedure is accepted to be sufficient but depending on the system properties this time can be prolonged. For the NVT ensemble, we prolonged the timeframe up to 200 ps. It can be followed in Figure 6b that the temperature of the system rapidly reached the target value (310 K) and remained stable during the equilibration for all the studied systems.
The second phase of the equilibration was conducted under NPT ensemble (constant number of particles, pressure, and temperature), also referred to as "isothermal-isobaric" ensemble. We prolonged once again the timeframe up to 200 ps. As can be seen from Figure 6c the pressure fluctuated rapidly and widely during the equilibration, as expected. However, the density ( Figure 6d) reached a certain value (approx. 1030 kg m -3 ) and remained stable over the course of the equilibration for all the studied systems. After the verification of the preoptimization and energy minimization steps for all created systems, molecular dynamics simulations were produced for 100 ns. As can be seen from Figure 7a the RMSD values for all systems fluctuated in the range of 0.1 nm (1 Å) indicating that the systems remained very stable over the course of the simulations. It should be noted that the 6LU7, the neat protein structure, showed an increased fluctuation between the time frame around 85-90 ns but became stable later. Standard antiviral drugs, FAV-eno and HCQ showed similar trends with the thioxanthone derivatives. For all studied ligand-protein systems, the rapid fluctuation as the neat protein around 90 ns was not observed. However, these initial findings must also be confirmed with the manual investigation of the produced trajectories. After the examination of the produced trajectories, it was found that HCQ, TX-NH-AA, and p-TX-O-DiAA are stable and located in the binding pocket of M pro making strong interactions with the protein during the entire simulation. On the other hand, FAV-eno loses its stability after a short time. TX-SH possesses interesting behavior as it is stable in the binding pocket for some time but eventually loses its position but repositions itself near the binding site. This observation is in good agreement with the RMSD results of TX-SH in which it showed two unstable behaviors around 20-30 ns and 60-80 ns. However, these fluctuations were limited and have also remained in the range of 0.1-0.2 nm.
The radius of gyration (R g ) of protein can be summarized as the compactness of the protein. If a stable folding is achieved one can expect a stable R g value. On the other hand, if there is an unfolding, the R g value is expected to change over time. For all the studied ligand-protein systems, the R g values stayed stable (Figure 7b) except for TX-SH. TX-SH showed a sharp fluctuation in the range of 60-80 ns which is in correlation with the RMSD findings of the same system and manual observations on the trajectory.
The number of H-bonds vs. time for MD run of p-TX-O-DiAA is given in Figure 8 suggesting a minimum of at least 1 H bond over the course of the simulation. This finding supports the other obtained results. Hence, after the investigation of all produced MD runs and consequent analysis of the trajectories, we concluded that among all studied molecules p-TX-O-DiAA is the most stable and has more interactions in the binding pocket of M pro .

Conclusion
In this study, we investigated some members of versatile thioxanthone derivatives and tried to understand the potential use of these compounds in the research against COVID-19. Almost all the studied thioxanthone based molecules have demonstrated that they can establish stable interactions with the main protease or even further stabilize the protein. Due to their high fluorescent properties as reported in the literature, we believe that these compounds have great potential to be utilized as fluorescent main protease targeting compounds. This feature, in conjunction with the stable interactions with the main protease, enables a great potential for these compounds to be used for detecting, analyzing, and preventing COVID-19. Also, it is believed that these compounds have also a great potential to be one of the key components of biomedical devices that target the main protease and as well as require a fluorescent feature. At least for the reasons mentioned above, we believe that the wide range of scientific disciplines can benefit from these initial findings with a promising opportunity for alternative modifications and further developments.