In the previous essay, I showed that the simple fragmentation function doesn't preserve chirality after making a single cut. Here's the function definition:

    from rdkit import Chem

    # Only works correctly for achiral molecules
    def fragment_simple(mol, atom1, atom2):
        rwmol = Chem.RWMol(mol)
        rwmol.RemoveBond(atom1, atom2)
        wildcard1 = rwmol.AddAtom(Chem.Atom(0))
        wildcard2 = rwmol.AddAtom(Chem.Atom(0))
        rwmol.AddBond(atom1, wildcard1, Chem.BondType.SINGLE)
        rwmol.AddBond(atom2, wildcard2, Chem.BondType.SINGLE)
        return rwmol.GetMol()

The reason is that the RemoveBond()/AddBond() combination can change the permutation order of the bonds around an atom, which inverts the chirality. Here's the relevant part of the connection table from the end of that essay:

    connections from atom 1 (as bond type + other atom index)
    1 C -0 -2 -3 -5   original structure
    1 C -0 -2 -5 -11  modified; bond to atom 3 is now a bond to atom 11
                 ^^^--- was bond to atom 3
             ^^^^^^^--- these two bonds swapped position = inverted chirality

I'll now show how to improve the code to handle chirality. (Note: this essay is pedagogical. To fragment in RDKit use FragmentOnBonds().)
Parity of a permutation
There's no way from Python to go in and change the permutation order of RDKit's bond list for an atom. Instead, I need to detect if the permutation order has changed, and if so, un-invert the atom's chirality.
I can say "un-invert" because we only need to deal with tetrahedral chirality, which has only two chirality types. SMILES supports more complicated chiralities, like octahedral (for example, "@OH19"), which can't be written simply as "@" or "@@". However, I've never seen them in use.
With only two possibilities, this reduces to determining the "parity" of the permutation. There are only two possible parities. I'll call one "even" and the other "odd", though in code I'll use 0 for even and 1 for odd.
A list of values in increasing order, like (1, 2, 9), has an even parity. If I swap two values then it has odd parity. Both (2, 1, 9) and (9, 2, 1) have odd parity, because each needs only one swap to put it in sorted order. With another swap, such as (2, 9, 1), the permutation order is back to even parity. The parity of a permutation is the number of pairwise swaps needed to order the list, modulo 2. If the result is 0 then it has even parity, if the result is 1 then it has odd parity.
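One way to compute the parity is to sort the list and count the number of swaps needed. Since there will only be a handful of bonds, I can use a simple quadratic sort. (I named the function after the Shell sort, though what it does is closer to a plain exchange sort; any sort built from pairwise swaps gives the same parity.) Here it is: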

    def parity_shell(values):
        # Simple Shell sort; okay for small numbers
        values = list(values)
        N = len(values)
        num_swaps = 0
        for i in range(N-1):
            for j in range(i+1, N):
                if values[i] > values[j]:
                    values[i], values[j] = values[j], values[i]
                    num_swaps += 1
        return num_swaps % 2

I'll test it with a few different cases to see if it gives the expected results:

    >>> parity_shell( (1, 2, 9) )
    0
    >>> parity_shell( (2, 1, 9) )
    1
    >>> parity_shell( (2, 9, 1) )
    0
    >>> parity_shell( (2, 1, 9) )
    1
    >>> parity_shell( (1, 3, 9, 8) )
    1

There are faster and better ways to determine the parity. I find it best to start with the most obviously correct solution first.
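As one illustration of a faster approach, here is a minimal sketch that avoids the quadratic swap loop by splitting the permutation into cycles. A cycle of length k can be undone with k-1 swaps, so the parity is the sum of (k-1) over all cycles, modulo 2. The name parity_by_cycles() is made up for this example, and the sketch assumes the values are all distinct, which holds for a list of atom indices.

```python
def parity_by_cycles(values):
    # Assumes all values are distinct (true for a list of atom indices).
    values = list(values)
    n = len(values)
    # order[k] is the position in `values` of the k-th smallest value.
    # It is a permutation of range(n) with the same parity as `values`.
    order = sorted(range(n), key=values.__getitem__)
    seen = [False] * n
    parity = 0
    for start in range(n):
        if seen[start]:
            continue
        # Walk one cycle and measure its length.
        length = 0
        i = start
        while not seen[i]:
            seen[i] = True
            i = order[i]
            length += 1
        # A cycle of length k is undone by k-1 swaps.
        parity ^= (length - 1) & 1
    return parity

for values in [(1, 2, 9), (2, 1, 9), (2, 9, 1), (1, 3, 9, 8)]:
    print(values, parity_by_cycles(values))
```

For the four inputs shown it reports 0, 1, 0 and 1, the same answers as the swap-counting version. With only four neighbors per atom the speed difference is irrelevant, so the simple version remains the better choice here.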
Determine an atom's parity
The next step is to determine the configuration order before and after attaching the dummy atom. I'll use the fragment_simple() and parity_shell() functions I defined earlier, and define a couple of helper functions to create an isomeric canonical SMILES from a molecule or SMILES string.

    from rdkit import Chem

    def C(mol):
        # Create a canonical isomeric SMILES from a molecule
        return Chem.MolToSmiles(mol, isomericSmiles=True)

    def Canon(smiles):
        # Create a canonical isomeric SMILES from a SMILES string
        return C(Chem.MolFromSmiles(smiles))

The permutation order is based on the order of the bonds around a given atom, where each bond is identified by the atom at its other end. I'll parse a simple chiral structure (which is already in canonical form) and get the ids for the atoms bonded to the second atom. (The second atom has an index of 1.)

    >>> mol = Chem.MolFromSmiles("O[C@](F)(Cl)Br")
    >>> C(mol)
    'O[C@](F)(Cl)Br'
    >>>
    >>> atom_id = 1
    >>> atom_obj = mol.GetAtomWithIdx(atom_id)
    >>> other_atoms = [bond.GetOtherAtomIdx(atom_id) for bond in atom_obj.GetBonds()]
    >>> other_atoms
    [0, 2, 3, 4]

The list values are in order, so you won't be surprised it has a parity of 0 ("even"):

    >>> parity_shell(other_atoms)
    0

I'll use the fragment_simple() function to fragment between the oxygen and the chiral carbon:

    >>> fragmented_mol = fragment_simple(mol, 0, 1)
    >>> fragmented_smiles = C(fragmented_mol)
    >>> fragmented_smiles
    '[*]O.[*][C@@](F)(Cl)Br'

then use the convert_wildcards_to_closures() function from the previous essay to re-connect the fragments and produce a canonical SMILES from it:

    >>> from smiles_syntax import convert_wildcards_to_closures
    >>>
    >>> closure_smiles = convert_wildcards_to_closures(fragmented_smiles, (0, 0))
    >>> closure_smiles
    'O%90.[C@@]%90(F)(Cl)Br'
    >>>
    >>> Canon(closure_smiles)
    'O[C@@](F)(Cl)Br'

If you compare this to the canonicalized input SMILES you'll see the chirality is inverted from what it should be. I'll see if I can detect that from the list of neighbor atoms to the new atom 1 of the fragmented molecule:

    >>> atom_id = 1
    >>> atom_obj = fragmented_mol.GetAtomWithIdx(atom_id)
    >>> other_atoms = [bond.GetOtherAtomIdx(atom_id) for bond in atom_obj.GetBonds()]
    >>> other_atoms
    [2, 3, 4, 6]

These values are ordered. It's tempting to conclude that this list also has an even parity. But recall that the original list was [0, 2, 3, 4]. The id 0 (the connection to the oxygen) has been replaced with the id 6 (the connection to the wildcard atom).
The permutation must use the same values, so I'll replace the 6 with a 0 and determine the parity of the resulting list:

    >>> i = other_atoms.index(6)
    >>> i
    3
    >>> other_atoms[i] = 0
    >>> other_atoms
    [2, 3, 4, 0]
    >>> parity_shell(other_atoms)
    1

This returned a 1 when the earlier parity call returned a 0, which means the parity is inverted, which means I need to invert the chirality of the second atom:

    >>> atom_obj.InvertChirality()

Now to check the re-assembled structure:

    >>> fragmented_smiles = C(fragmented_mol)
    >>> fragmented_smiles
    '[*]O.[*][C@](F)(Cl)Br'
    >>>
    >>> closure_smiles = convert_wildcards_to_closures(fragmented_smiles, (0, 0))
    >>> closure_smiles
    'O%90.[C@]%90(F)(Cl)Br'
    >>>
    >>> Canon(closure_smiles)
    'O[C@](F)(Cl)Br'

This matches the canonicalized input SMILES, so we're done.
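The steps above can be condensed into a toolkit-free sketch that works on plain lists of neighbor ids. It rests on one assumption, suggested by the neighbor lists shown in this essay rather than taken from the RDKit documentation: that RemoveBond() drops the entry from the atom's bond list and AddBond() appends the new one at the end. The function names are made up for this illustration.

```python
def parity(values):
    # Parity of a permutation = number of out-of-order pairs, modulo 2.
    values = list(values)
    n = len(values)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if values[i] > values[j])
    return inversions % 2

def cut_and_cap(neighbors, removed_id, new_id):
    # Toy model (an assumption): removing a bond deletes its entry and
    # adding a bond appends the new neighbor at the end of the list.
    result = [n for n in neighbors if n != removed_id]
    result.append(new_id)
    return result

def needs_inversion(old_neighbors, new_neighbors, removed_id, new_id):
    # Relabel the wildcard with the id it replaced so that both lists
    # are permutations of the same values, then compare parities.
    relabeled = [removed_id if n == new_id else n for n in new_neighbors]
    return parity(old_neighbors) != parity(relabeled)

# The worked example: cut the bond from atom 1 to atom 0, cap with atom 6.
old = [0, 2, 3, 4]
new = cut_and_cap(old, 0, 6)
print(new, needs_inversion(old, new, 0, 6))

# Cutting the bond to the last neighbor leaves the order unchanged.
new_last = cut_and_cap(old, 4, 6)
print(new_last, needs_inversion(old, new_last, 4, 6))
```

Under this model, whether the chirality flips depends only on how far the replaced neighbor was from the end of the list: moving it past an odd number of other neighbors changes the parity, while an even number does not.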
An improved fragment function
I'll use a top-down process to describe the changes to fragment_simple() to make it work. What this doesn't show you are the several iterations I went through to make it look this nice.
At the top level, I need some code to figure out if an atom is chiral. Then, after I've made the cut, and if the atom is chiral, I need some way to restore the correct chirality once I've connected it to the new wildcard atom.

    def fragment_chiral(mol, atom1, atom2):
        rwmol = Chem.RWMol(mol)

        # Store the old parity as 0 = even, 1 = odd, or None for no parity
        atom1_parity = get_bond_parity(mol, atom1)
        atom2_parity = get_bond_parity(mol, atom2)

        rwmol.RemoveBond(atom1, atom2)
        wildcard1 = rwmol.AddAtom(Chem.Atom(0))
        wildcard2 = rwmol.AddAtom(Chem.Atom(0))
        new_bond1 = rwmol.AddBond(atom1, wildcard1, Chem.BondType.SINGLE)
        new_bond2 = rwmol.AddBond(atom2, wildcard2, Chem.BondType.SINGLE)

        # Restore the correct parity if there is a parity
        if atom1_parity is not None:
            set_bond_parity(rwmol, atom1, atom1_parity, atom2, wildcard1)
        if atom2_parity is not None:
            set_bond_parity(rwmol, atom2, atom2_parity, atom1, wildcard2)

        # (Later I'll find I also need to call SanitizeMol().)
        return rwmol.GetMol()

To get an atom's bond permutation parity, check if it has tetrahedral chirality (it will be either clockwise or counter-clockwise). If it doesn't have tetrahedral chirality, return None. Otherwise use the neighboring atom ids to determine the parity:

    from rdkit import Chem

    CHI_TETRAHEDRAL_CW = Chem.ChiralType.CHI_TETRAHEDRAL_CW
    CHI_TETRAHEDRAL_CCW = Chem.ChiralType.CHI_TETRAHEDRAL_CCW

    def get_bond_parity(mol, atom_id):
        atom_obj = mol.GetAtomWithIdx(atom_id)

        # Return None unless it has tetrahedral chirality
        chiral_tag = atom_obj.GetChiralTag()
        if chiral_tag not in (CHI_TETRAHEDRAL_CW, CHI_TETRAHEDRAL_CCW):
            return None

        # Get the list of atom ids for each atom it's bonded to.
        other_atom_ids = [bond.GetOtherAtomIdx(atom_id) for bond in atom_obj.GetBonds()]

        # Use those ids to determine the parity
        return parity_shell(other_atom_ids)

To restore the parity, again get the list of neighboring atom ids, but this time from the fragmented molecule. The atom is now connected to one of the new wildcard atoms. I need to map that wildcard back to the original atom index before I can compute the parity and, if it's changed, invert the chirality:

    def set_bond_parity(mol, atom_id, old_parity, old_other_atom_id, new_other_atom_id):
        atom_obj = mol.GetAtomWithIdx(atom_id)
        # Get the list of atom ids for each atom it's bonded to.
        other_atom_ids = [bond.GetOtherAtomIdx(atom_id) for bond in atom_obj.GetBonds()]

        # Replace the id of the new wildcard atom with the id of the original atom
        i = other_atom_ids.index(new_other_atom_id)
        other_atom_ids[i] = old_other_atom_id

        # Use those ids to determine the parity
        new_parity = parity_shell(other_atom_ids)
        if old_parity != new_parity:
            # If the parity has changed, invert the chirality
            atom_obj.InvertChirality()

Testing
I used a simple set of tests during the initial development where I split the bond between the first and second atom of a few structures and compared the result to a reference structure that I fragmented manually (and re-canonicalized):

    # Create a canonical isomeric SMILES from a SMILES string.
    # Used to put the manually-developed reference structures into canonical form.
    def Canon(smiles):
        mol = Chem.MolFromSmiles(smiles)
        assert mol is not None, smiles
        return Chem.MolToSmiles(mol, isomericSmiles=True)

    def simple_test():
        for smiles, expected in (
                ("CC", Canon("*C.*C")),
                ("F[C@](Cl)(Br)O", Canon("*F.*[C@](Cl)(Br)O")),
                ("F[C@@](Cl)(Br)O", Canon("*F.*[C@@](Cl)(Br)O")),
                ("F[C@@H](Br)O", Canon("*F.*[C@@H](Br)O")),
                ):
            mol = Chem.MolFromSmiles(smiles)
            fragmented_mol = fragment_chiral(mol, 0, 1)
            fragmented_smiles = Chem.MolToSmiles(fragmented_mol, isomericSmiles=True)
            if fragmented_smiles != expected:
                print("smiles:", smiles)
                print("fragmented:", fragmented_smiles)
                print("  expected:", expected)

These tests passed, so I developed some more extensive tests. My experience is that real-world chemistry is far more complex and interesting than the manual test cases I develop. After the basic tests are done, I do more extensive testing by processing a large number of structures from PubChem and then from ChEMBL.
ChEMBL structures come from multiple sources
I'll go on a tangent. Why do I start with PubChem before going on to ChEMBL? The PubChem data is all generated by one chemistry toolkit, I think OEChem, while ChEMBL data comes from many sources. To get a sense of the diversity, I processed the ChEMBL 21 release to get the second line of each record. The ctfile format specification says that if non-blank then the 8 characters starting in the third position should contain the program name. I'll write a program to extract those names and count how many times each one occurs.
The following program is not a general-purpose SD file reader. It depends on the specific layout of ChEMBL 21, where there are only two lines which start with "CHEMBL"; the title line and the tag data after the "chembl_id" tag.

    import sys, gzip
    from collections import Counter

    counter = Counter()
    with gzip.open("/Users/dalke/databases/chembl_21.sdf.gz", "rb") as infile:
        for line in infile:
            # Get the line after the title line (which starts with 'CHEMBL')
            if line[:6] == b"CHEMBL":
                next_line = next(infile)
                # Print unique program names
                program = next_line[2:10]
                if program not in counter:
                    print("New:", repr(str(program, "ascii")))
                counter[program] += 1

            # The tag data after the chembl_id tag also starts with 'CHEMBL'
            if line[:13] == b"> <chembl_id>":
                ignore = next(infile) # skip the CHEMBL

    print("Done. Here are the counts for each program seen.")
    for name, count in counter.most_common():
        print(repr(str(name, "ascii")), count)

Here are the counts:
Program name | count |
---|---|
'SciTegic' | 1105617 |
' ' | 326445 |
'CDK 9' |  69145 |
'Mrv0541 ' | 30962 |
'' | 24531 |
'CDK 6' | 10805 |
'CDK 1' | 8642 |
'CDK 5' | 4771 |
'CDK 8' | 2209 |
'-ISIS- ' | 281 |
'Symyx ' | 171 |
'CDK 7' | 144 |
'-OEChem-' | 96 |
'Marvin ' | 61 |
'Mrv0540 ' | 13 |
' RDKit' | 3 |
'CDK 3' | 1 |
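The specification lays out the start of that second line as fixed-width fields:
- II: two characters for the user's initials
- PPPPPPPP: eight characters for the program name
- MMDDYY: two characters for the month, two for the day, two for the year
- HHmm: two characters for the hour, two for the minutes
- … additional fields omitted …

Put together, the expected layout is the first line below. The CDK records instead look like the lines after it, with a free-form date that spills into the program-name field, which is why so many different "CDK" names show up in the counts:

    IIPPPPPPPPMMDDYYHHmmddSS…
    CDK 7/28/10,10:58
    CDK 8/10/10,12:22
    CDK 9/16/09,9:40
    CDK 10/7/09,10:42
    CDK 11/9/10,11:20

Now to return from this (double) tangent to the topic of cutting a bond to produce two fragments.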
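For anyone repeating the header check, the fixed-width layout listed above can be pulled apart with plain string slices. This is a minimal sketch: the function and field names are made up, the first sample line is invented, and the spacing inside the CDK-style sample is an assumption about where the columns fall.

```python
from collections import namedtuple

# Hypothetical container for the first four header fields.
HeaderFields = namedtuple("HeaderFields", "initials program date time")

def parse_header_line(line):
    # Pad with spaces so that short or blank lines can still be sliced.
    line = line.rstrip("\r\n").ljust(20)
    return HeaderFields(
        initials=line[0:2],    # II
        program=line[2:10],    # PPPPPPPP
        date=line[10:16],      # MMDDYY
        time=line[16:20],      # HHmm
    )

# An invented, well-formed line: blank initials, program name, date, time.
good = "  " + "-ISIS-  " + "081616" + "1234"
print(parse_header_line(good))

# A CDK-style line (column spacing assumed): the free-form date starts
# inside the 8-character program field.
cdk = "  " + "CDK    9/16/09,9:40"
print(repr(parse_header_line(cdk).program))
```

With the assumed spacing, the program field of the CDK-style line comes out as "CDK" followed by the first digit of the month, which is consistent with the names in the count table.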
Need ClearComputedProps()?
The more extensive test processes every structure which contains a chiral atom (where the SMILES contains a '@'), cuts every bond between heavy atoms, so long as it's a single bond not in a ring, and puts the results back together to see if it matches the canonicalized input structure. The code isn't interesting enough to make specific comments about it. You can get the code at the end of this essay.
The first error occurred quickly, and there were many errors. Here's the first one:

    FAILURE in record 94
         input_smiles: C[C@H]1CC[C@@H](NCc2ccc3c(c2)Cc2c(-c4ccc(CC(=O)O)cc4)n[nH]c2-3)CC1
      begin/end atoms: 4 5
    fragmented smiles: [*]NCc1ccc2c(c1)Cc1c(n[nH]c1-2)-c1ccc(cc1)CC(=O)O.[*][C@H]1CC[C@H](C)CC1
       closure smiles: N%90Cc1ccc2c(c1)Cc1c(n[nH]c1-2)-c1ccc(cc1)CC(=O)O.[C@@H]%901CC[C@H](C)CC1
         final smiles: C[C@H]1CC[C@H](NCc2ccc3c(c2)Cc2c(-c4ccc(CC(=O)O)cc4)n[nH]c2-3)CC1
      expected smiles: C[C@H]1CC[C@@H](NCc2ccc3c(c2)Cc2c(-c4ccc(CC(=O)O)cc4)n[nH]c2-3)CC1

Greg Landrum, the main RDKit developer, points to the solution. He writes: "after you break one or more bonds, you really, really should re-sanitize the molecule (or at least call ClearComputedProps())".
The modified code is:

    def fragment_chiral(mol, atom1, atom2):
        rwmol = Chem.RWMol(mol)

        atom1_parity = get_bond_parity(mol, atom1)
        atom2_parity = get_bond_parity(mol, atom2)

        rwmol.RemoveBond(atom1, atom2)
        wildcard1 = rwmol.AddAtom(Chem.Atom(0))
        wildcard2 = rwmol.AddAtom(Chem.Atom(0))
        new_bond1 = rwmol.AddBond(atom1, wildcard1, Chem.BondType.SINGLE)
        new_bond2 = rwmol.AddBond(atom2, wildcard2, Chem.BondType.SINGLE)

        if atom1_parity is not None:
            set_bond_parity(rwmol, atom1, atom1_parity, atom2, wildcard1)
        if atom2_parity is not None:
            set_bond_parity(rwmol, atom2, atom2_parity, atom1, wildcard2)

        # After breaking bonds, should re-sanitize, or at least use ClearComputedProps()
        # See https://github.com/rdkit/rdkit/issues/1022#issuecomment-239355482
        new_mol = rwmol.GetMol()
        # (I will find that I need to call SanitizeMol().)
        new_mol.ClearComputedProps()
        return new_mol

Ring chirality failures
There were 200,748 chiral structures in my selected subset from PubChem. Of those, 232 unique structures have problems when I try to cut a bond. Here's an example of one of the reported failures:

    FAILURE in record 12906
         input_smiles: [O-][S@+](CC1(O)CCN(CCc2c[nH]c3ccc(F)cc23)CC1)c1ccccc1
      begin/end atoms: 0 1
    fragmented smiles: [*][O-].[*][S@+](CC1(O)CCN(CCc2c[nH]c3ccc(F)cc23)CC1)c1ccccc1
       closure smiles: [O-]%90.[S@+]%90(CC1(O)CCN(CCc2c[nH]c3ccc(F)cc23)CC1)c1ccccc1
         final smiles: [O-][S@@+](CC1(O)CCN(CCc2c[nH]c3ccc(F)cc23)CC1)c1ccccc1
      expected smiles: [O-][S@+](CC1(O)CCN(CCc2c[nH]c3ccc(F)cc23)CC1)c1ccccc1

Here's the complete list of failing structures, which might make a good test set for other programs:
C-N=C(-S)[C@]1(c2cccnc2)CCCC[S@+]1[O-] 1 C=C(c1ccc([S@+]([O-])c2ccc(OC)cc2)cc1)C1CCN(C2CCCCC2)CC1 2 C=C(c1ccc([S@@+]([O-])c2ccc(OC)cc2)cc1)C1CCN(C2CCCCC2)CC1 3 C=CCCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 4 C=CCCCC[N@@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 5 C=CC[S@+]([O-])C[C@H](N)C(=O)O 6 CC#CCOc1ccc([S@@+]([O-])[C@H](C(=O)NO)C(C)C)cc1 7 CC(=O)NC[C@H]1CN(c2ccc([S@+](C)[O-])cc2)C(=O)O1 8 CC(=O)Nc1cc(-c2[nH]c([S@+](C)[O-])nc2-c2ccc(F)cc2)ccn1 9 CC(=O)Nc1cc(-c2c(-c3ccc(F)cc3)nc([S@+](C)[O-])n2C)ccn1 10 CC(C(=O)O[C@@H]1CC2CCC(C1)[N@]2C)(c1ccccc1)c1ccccc1 11 CC(C)(C(=O)N[C@H]1C2CCC1C[C@@H](C(=O)O)C2)N1CCN(c2ccc(C(F)(F)F)cn2)CC1 12 CC(C)(C(=O)N[C@H]1C2CCCC1C[C@@H](C(=O)O)C2)N1CCN(c2ccc(C(F)(F)F)cn2)CC1 13 CC(C)(C(=O)N[C@H]1C2CCCC1C[C@@H](C(N)=O)C2)N1CCN(c2ccc(C(F)(F)F)cn2)CC1 14 CC(C)(C(=O)N[C@H]1C2CCCC1C[C@H](C(N)=O)C2)N1CCN(c2ccc(C(F)(F)F)cn2)CC1 15 CC(C)(Oc1ccc(C#N)cn1)C(=O)N[C@H]1C2CCCC1C[C@H](C(N)=O)C2 16 CC(C)(Oc1ccc(C#N)cn1)C(=O)N[C@H]1C2COCC1C[C@H](C(N)=O)C2 17 CC(C)(Oc1ccc(Cl)cc1)C(=O)N[C@H]1C2CCCC1C[C@@H](C(=O)O)C2 18 CC(C)(Oc1ccc(Cl)cc1)C(=O)N[C@H]1C2CCCC1C[C@@H](C(N)=O)C2 19 CC(C)(Oc1ccc(Cl)cc1)C(=O)N[C@H]1C2CCCC1C[C@H](C(=O)O)C2 20 CC(C)(Oc1ccc(Cl)cc1)C(=O)N[C@H]1C2CCCC1C[C@H](C(N)=O)C2 21 CC(C)(Oc1ccc(Cl)cc1)C(=O)N[C@H]1C2COCC1C[C@@H](C(N)=O)C2 22 CC(C)Cc1c(C#N)cc2ccccc2c1C(=O)N(C)C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 23 CC(C)Nc1cc(-c2[nH]c([S@+](C)[O-])nc2-c2ccc(F)cc2)ccn1 24 CC(C)Nc1cc(-c2[nH]c([S@@+]([O-])C(C)C)nc2-c2ccc(F)cc2)ccn1 25 CC(C)OC(=O)N1CCC(Oc2ncnc3c2CCN3c2ccc([S@+](C)[O-])c(F)c2)CC1 26 CC(C)OC(=O)N1CCC(Oc2ncnc3c2CCN3c2ccc([S@@+](C)[O-])c(F)c2)CC1 27 CC(C)[C@@H](C)Nc1cc(-c2[nH]c([S@+](C)[O-])nc2-c2ccc(F)cc2)ccn1 28 CC(C)[C@H](C)Nc1cc(-c2[nH]c([S@+](C)[O-])nc2-c2ccc(F)cc2)ccn1 29 CC(C)[S@+]([O-])c1nc(-c2ccc(F)cc2)c(-c2ccnc(NC3CCCCC3)c2)[nH]1 30 CC1(C)C(=O)N([C@H]2C3CCCC2C[C@@H](C(N)=O)C3)CC1COc1ccc(C#N)cn1 31 CC1(C)C(=O)N([C@H]2C3CCCC2C[C@H](C(N)=O)C3)CC1COc1ccc(C#N)cn1 32 
CC1(C)N=C(c2ccccc2)C(=O)N1[C@@H]1C(=O)N2[C@@H]1[S@+]([O-])C(C)(C)[C@@H]2C(=O)O 33 CC1(C)NC(c2ccccc2)C(=O)N1[C@@H]1C(=O)N2[C@@H]1[S@+]([O-])C(C)(C)[C@@H]2C(=O)O 34 CC1(C)NC(c2ccccc2)C(=O)N1[C@@H]1C(=O)N2[C@@H]1[S@@+]([O-])C(C)(C)[C@@H]2C(=O)O 35 CCCCCCCCCCCCCO[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 36 CCCCCCCCC[S@@+]([O-])CC(=O)OC 37 CCCCCCCCC[S@@+]([O-])CC(N)=O 38 CCCCCCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 39 CCCCCC[C@@H](C(=O)NO)[S@+]([O-])c1ccc(OC)cc1 40 CCCCCC[C@@H](C(=O)NO)[S@@+]([O-])c1ccc(OC)cc1 41 CCCCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 42 CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCCCN3CC(C)C)cc1 43 CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCCCN3CCC)cc1 44 CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCCN3CC(C)C)cc1 45 CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCCN3CCC)cc1 46 CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCN3CC(C)C)cc1 47 CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCN3CC(C)C)cc1.CS(=O)(=O)O 48 CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCN3CCC)cc1 49 CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCN3Cc2cnn(C)c2)cc1 50 CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4nncn4CCC)cc2)CCCN3CC(C)C)cc1 51 CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4nncn4CCC)cc2)CCCN3CCC)cc1 52 CCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 53 CCC[C@H]1CC[C@H]([C@H]2CC[C@@H](OC(=O)[C@H]3[C@H](c4ccc(O)cc4)[C@H](C(=O)O[C@H]4CC[C@@H]([C@H]5CC[C@H](CCC)CC5)CC4)[C@H]3c3ccc(O)cc3)CC2)CC1 54 CCCc1c(C#N)cc2ccccc2c1C(=O)N(C)C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 55 CCN1C(=O)c2ccccc2[C@H]1[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 56 CCN1c2cc(C(=O)NCc3ccc(C#N)cc3)ccc2[S@@+]([O-])c2ccccc2C1=O 57 CCN1c2cc(C(=O)NCc3ccc(F)cc3)ccc2[S@@+]([O-])c2ccccc2C1=O 58 
CCN1c2cc(C(=O)NCc3ccc(OC)cc3)ccc2[S@+]([O-])c2ccccc2C1=O 59 CCN1c2cc(C(=O)NCc3ccc(OC)cc3)ccc2[S@@+]([O-])c2ccccc2C1=O 60 CCN1c2cc(C(=O)N[C@H](C)c3ccc(Br)cc3)ccc2[S@@+]([O-])c2ccccc2C1=O 61 CCN1c2cc(C(=O)N[C@H](C)c3ccc4ccccc4c3)ccc2[S@@+]([O-])c2ccccc2C1=O 62 CCO[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 63 CC[C@H](NC(=O)c1c2ccccc2nc(-c2ccccc2)c1C[S@+](C)[O-])c1ccccc1 64 CC[C@H](NC(=O)c1c2ccccc2nc(-c2ccccc2)c1[S@+](C)[O-])c1ccccc1 65 CC[S@+]([O-])c1ccc(-c2coc3ccc(-c4ccc(C)o4)cc32)cc1 66 CCc1c(C#N)cc2ccccc2c1C(=O)N(C)C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 67 CN(C[C@@H](CCN1CCC(c2ccc(Br)cc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)cc2ccccc21 68 CN(C[C@@H](CCN1CCC(c2ccc(F)cc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)cc2ccccc21 69 CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1c2ccccc2cc(C#N)c1-c1ccccc1 70 CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)c(Br)c2ccccc12 71 CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)c(C#N)c2ccccc12 72 CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)c(F)c2ccccc12 73 CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)cc2cc(C#N)ccc21 74 CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)cc2cc(O)ccc21 75 CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)cc2ccccc21 76 CN1c2cc(C(=O)NCCN3CCCC3)ccc2[S@@+]([O-])c2ccccc2C1=O 77 CN1c2cc(C(=O)NCCc3cccs3)ccc2[S@+]([O-])c2ccccc2C1=O 78 CN1c2cc(C(=O)NCCc3cccs3)ccc2[S@@+]([O-])c2ccccc2C1=O 79 CN1c2cc(C(=O)NCc3ccc(Br)cc3)ccc2[S@@+]([O-])c2ccccc2C1=O 80 CN1c2cc(C(=O)NCc3ccc(Cl)cc3)ccc2[S@@+]([O-])c2ccccc2C1=O 81 CN1c2cc(C(=O)NCc3cccnc3)ccc2[S@@+]([O-])c2ccccc2C1=O 82 COC(=O)[C@H]1CC2COCC(C1)[C@H]2NC(=O)[C@@]1(C)CCCN1S(=O)(=O)c1cccc(Cl)c1C 83 COC(=O)c1ccc2[nH]c([S@+]([O-])Cc3nccc(OC)c3OC)nc2c1 84 COC(=O)c1ccc2[nH]c([S@@+]([O-])Cc3nccc(OC)c3OC)nc2c1 85 
COC1(C[S@@+]([O-])c2ccccc2)CCN(CCc2c[nH]c3ccc(F)cc23)CC1 86 COCCCOc1ccnc(C[S@+]([O-])c2nc3ccccc3[nH]2)c1C 87 CO[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 88 CO[C@@H]([C@H](O)[C@@H](O)CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 89 COc1c(C#N)cc2ccccc2c1C(=O)N(C)C[C@@H](CCN1CCC(c2ccc(Br)cc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 90 COc1c(C#N)cc2ccccc2c1C(=O)N(C)C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 91 COc1cc(-C=C-OC(=O)[C@H]2CC[C@@H](N(C)[C@H]3CC[C@H](C(=O)O-C=C-c4cc(OC)c(OC)c(OC)c4)CC3)CC2)cc(OC)c1OC 92 COc1cc(C(=O)N2CCO[C@@](CCN3CCC4(CC3)c3ccccc3C[S@+]4[O-])(c3ccc(Cl)c(Cl)c3)C2)cc(OC)c1OC 93 COc1cc(C(=O)N2CCO[C@@](CCN3CCC4(CC3)c3ccccc3C[S@@+]4[O-])(c3ccc(Cl)c(Cl)c3)C2)cc(OC)c1OC 94 COc1cc(C(=O)N2CCO[C@](CCN3CCC4(CC3)c3ccccc3C[S@+]4[O-])(c3ccc(Cl)c(Cl)c3)C2)cc(OC)c1OC 95 COc1cc(C(=O)N2CCO[C@](CCN3CCC4(CC3)c3ccccc3C[S@@+]4[O-])(c3ccc(Cl)c(Cl)c3)C2)cc(OC)c1OC 96 COc1cc(OC(=O)[C@@H]2CC[C@H](N(C)[C@@H]3CC[C@@H](C(=O)Oc4cc(OC)c(OC)c(OC)c4)CC3)CC2)cc(OC)c1OC 97 COc1ccc(C2CCN(CC[C@H](CN(C)C(=O)c3c4ccccc4cc(C#N)c3OC)c3ccc(Cl)c(Cl)c3)CC2)c([S@+](C)[O-])c1 98 COc1ccc(C2CCN(CC[C@H](CN(C)C(=O)c3cc(C#N)cc4ccccc43)c3ccc(Cl)c(Cl)c3)CC2)c([S@+](C)[O-])c1 99 COc1ccc(N(C(=O)OC(C)(C)C)[C@@H]2C(C)=C[C@H]([S@@+]([O-])c3ccc(C)cc3)[C@@H]3C(=O)N(C)C(=O)[C@@H]32)cc1 100 COc1ccc([C@@H]2C3(CO)C4[N@](C)C5C6(CO)C([N@@](C)C3C4(CO)[C@H]6c3ccc(OC)cc3)C52CO)cc1 101 COc1ccc([S@+]([O-])c2ccc(C(=O)C3CCN(C4CCCCC4)CC3)cc2)cc1 102 COc1ccc([S@+]([O-])c2ccc(C(C#N)C3CCN(C4CCCCC4)CC3)cc2)cc1 103 COc1ccc([S@+]([O-])c2ccc(C(C#N)N3CCN(C4CCCCC4)CC3)cc2)cc1 104 COc1ccc([S@@+]([O-])c2ccc(C(=O)C3CCN(C4CCCCC4)CC3)cc2)cc1 105 COc1ccc([S@@+]([O-])c2ccc(C(C#N)C3CCN(C4CCCCC4)CC3)cc2)cc1 106 COc1ccc([S@@+]([O-])c2ccc(C(C#N)N3CCN(C4CCCCC4)CC3)cc2)cc1 107 COc1ccc2c(cc(C#N)cc2C(=O)N(C)C[C@@H](CCN2CCC(c3ccccc3[S@+](C)[O-])CC2)c2ccc(Cl)c(Cl)c2)c1 108 COc1ccc2cc(-c3nc(-c4ccc([S@+](C)[O-])cc4C)[nH]c3-c3ccncc3)ccc2c1 109 COc1ccc2cc(-c3nc(-c4ccc([S@@+](C)[O-])cc4C)[nH]c3-c3ccncc3)ccc2c1 110 
COc1ccc2nc([S@@+]([O-])Cc3ncc(C)c(OC)c3C)[nH]c2c1 111 COc1ccnc(C[S@@+]([O-])c2nc3ccc(OC(F)F)cc3[nH]2)c1OC 112 CSC[S@+]([O-])C[C@H](CO)NC(=O)-C=C-c1c(C)[nH]c(=O)[nH]c1=O 113 CSC[S@@+]([O-])CC(CO)NC(=O)-C=C-c1c(C)nc(O)nc1O 114 C[C@@H](Nc1cc(-c2[nH]c([S@+](C)[O-])nc2-c2ccc(F)cc2)ccn1)c1ccccc1 115 C[C@@H]1CC[C@H]2[C@@H](C)[C@@H](CCC(=O)Nc3cccc([S@+](C)[O-])c3)O[C@@H]3O[C@@]4(C)CC[C@@H]1[C@]32OO4 116 C[C@@H]1O[C@H]([C@@H]2CCC[N+]2(C)C)C[S@+]1[O-] 117 C[C@@H]1O[C@H]([C@@H]2CCC[N+]2(C)C)C[S@@+]1[O-] 118 C[C@@H]1O[C@H]([C@H]2CCC[N+]2(C)C)C[S@+]1[O-] 119 C[C@@H]1O[C@H]([C@H]2CCC[N+]2(C)C)C[S@@+]1[O-] 120 C[C@H](Nc1cc(-c2[nH]c([S@+](C)[O-])nc2-c2ccc(F)cc2)ccn1)c1ccccc1 121 C[C@H]1O[C@@H]([C@@H]2CCCN2C)C[S@+]1[O-] 122 C[C@H]1O[C@@H]([C@@H]2CCCN2C)C[S@@+]1[O-] 123 C[C@H]1O[C@@H]([C@@H]2CCC[N+]2(C)C)C[S@+]1[O-] 124 C[C@H]1O[C@@H]([C@@H]2CCC[N+]2(C)C)C[S@@+]1[O-] 125 C[N+](C)(C)C[C@@H]1C[S@+]([O-])C(C2CCCCC2)(C2CCCCC2)O1 126 C[N+](C)(C)C[C@@H]1C[S@@+]([O-])C(C2CCCCC2)(C2CCCCC2)O1 127 C[N@+]1([O-])[C@H]2CC[C@H]1C[C@H](OC(=O)[C@@H](CO)c1ccccc1)C2 128 C[N@@+]1(CC2CC2)C2CCC1C[C@@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 129 C[N@@+]1(CC2CC2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 130 C[N@@+]1(CCCOCc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 131 C[N@@+]1(CCCOc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 132 C[N@@+]1(CCCc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 133 C[N@@+]1(CCOCc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 134 C[N@@+]1(CCOc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 135 C[S@+]([O-])CCCCN=C=S 136 C[S@+]([O-])CC[C@@](CO)(C(=O)O[C@H]1CN2CCC1CC2)c1ccccc1 137 C[S@+]([O-])C[C@@H]1[C@@H](O)[C@]23CC[C@H]1C[C@H]2[C@]1(C)CCC[C@](C)(C(=O)O)[C@H]1CC3 138 C[S@+]([O-])C[C@](C)(O)[C@H]1OC(=O)C=C2[C@@]13O[C@@H]3[C@H]1OC(=O)[C@@]3(C)C=CC[C@@]2(C)[C@@H]13 139 C[S@+]([O-])c1ccc(C2=C(c3ccccc3)C(=O)OC2)cc1 140 C[S@+]([O-])c1cccc2c3c(n(Cc4ccc(Cl)cc4)c21)[C@@H](CC(=O)O)CCC3 141 C[S@+]([O-])c1nc(-c2ccc(F)cc2)c(-c2ccnc(NC(=O)Cc3ccc(F)cc3)c2)[nH]1 142 
C[S@+]([O-])c1nc(-c2ccc(F)cc2)c(-c2ccnc(NC3CCCCC3)c2)[nH]1 143 C[S@+]([O-])c1nc(-c2ccc(F)cc2)c(-c2ccnc(NC3CCOCC3)c2)[nH]1 144 C[S@@+]([O-])CCCC-C(=N-OS(=O)(=O)[O-])S[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O 145 C[S@@+]([O-])CCCCN=C=S 146 C[S@@+]([O-])C[C@@H]1[C@@H](O)[C@]23CC[C@H]1C[C@H]2[C@]1(C)CCC[C@](C)(C(=O)O)[C@H]1CC3 147 C[S@@+]([O-])Cc1ccc(C(=O)Nc2cccnc2C(=O)NCC2CCOCC2)c2ccccc12 148 Cc1c(C#N)cc(C(=O)N(C)C[C@@H](CCN2CCC(c3ccccc3[S@+](C)[O-])CC2)c2ccc(Cl)c(Cl)c2)c2ccccc12 149 Cc1c(C#N)cc2ccccc2c1C(=O)N(C)C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 150 Cc1c(OCC(F)(F)F)ccnc1C[S@@+]([O-])c1nc2ccccc2[nH]1 151 Cc1cc(=O)c(Oc2ccc(Br)cc2F)c(-c2ccc([S@@+](C)[O-])cc2)o1 152 Cc1cc(=O)c(Oc2ccc(Cl)cc2F)c(-c2ccc([S@@+](C)[O-])cc2)o1 153 Cc1cc(=O)c(Oc2ccc(F)cc2F)c(-c2ccc([S@+](C)[O-])cc2)o1 154 Cc1ccc(-c2ccc3occ(-c4ccc([S@+](C)[O-])cc4)c3c2)o1 155 Cc1ccc(-c2ncc(Cl)cc2-c2ccc([S@+](C)[O-])cc2)cn1 156 Cc1ccc([S@+]([O-])-C(F)=C-c2ccccn2)cc1 157 Cc1ccc([S@+]([O-])c2occc2C=O)cc1 158 Cc1nc(O)nc(O)c1-C=C-C(=O)N[C@@H](CO)C[S@+]([O-])CCl 159 Cc1nc(O)nc(O)c1-C=C-C(=O)N[C@@H](CO)C[S@@+]([O-])CCl 160 NC(=O)C[S@+]([O-])C(c1ccccc1)c1ccccc1 161 NC(=O)C[S@@+]([O-])C(c1ccccc1)c1ccccc1 162 O.O.O.O.[Sr+2].COc1ccc2[n-]c([S@@+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1.COc1ccc2[n-]c([S@@+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1 163 O=C(O)CC-C=C-CC[C@H]1[C@H](OCc2ccc(-c3ccccc3)cc2)C[S@+]([O-])[C@@H]1c1cccnc1 164 O=C(O)CC-C=C-CC[C@H]1[C@H](OCc2ccc(-c3ccccc3)cc2)C[S@@+]([O-])[C@@H]1c1cccnc1 165 O=S(=O)([O-])OCCC[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 166 O=S(=O)([O-])OC[C@@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 167 O=S(=O)([O-])OC[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 168 O=S(=O)([O-])O[C@@H](CO)CC[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 169 O=S(=O)([O-])O[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 170 O=S(=O)([O-])O[C@@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@@H](O)[C@@H](O)CO 171 O=S(=O)([O-])O[C@@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@@H](O)[C@H](O)CO 172 
O=S(=O)([O-])O[C@@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@H](O)[C@@H](O)CO 173 O=S(=O)([O-])O[C@@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@H](O)[C@H](O)CO 174 O=S(=O)([O-])O[C@@H]([C@H](O)[C@@H](O)CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 175 O=S(=O)([O-])O[C@H](CO)[C@@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 176 O=S(=O)([O-])O[C@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@@H](O)[C@@H](O)CO 177 O=S(=O)([O-])O[C@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@@H](O)[C@H](O)CO 178 O=S(=O)([O-])O[C@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@H](O)[C@@H](O)CO 179 O=S(=O)([O-])O[C@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@H](O)[C@H](O)CO 180 OCC12C3[N@](Cc4ccc(O)cc4)C4C5(CO)C([N@@](Cc6ccc(O)cc6)C1C3(CO)[C@H]5c1ccccc1)C4(CO)[C@H]2c1ccccc1 181 OCC12C3[N@](Cc4ccc(OCc5ccccc5)cc4)C4C5(CO)C([N@@](Cc6ccc(OCc7ccccc7)cc6)C1C3(CO)[C@@H]5c1ccccc1)C4(CO)[C@@H]2c1ccccc1 182 OCC12C3[N@](Cc4cccc(OCc5ccccc5)c4)C4C5(CO)C([N@@](Cc6cccc(OCc7ccccc7)c6)C1C3(CO)[C@H]5c1ccccc1)C4(CO)[C@H]2c1ccccc1 183 OCC12C3[N@](Cc4ccccc4)C4C5(CO)C([N@@](Cc6ccccc6)C1C3(CO)[C@H]5c1ccc(O)cc1)C4(CO)[C@H]2c1ccc(O)cc1 184 OCC12C3[N@](Cc4ccccc4)C4C5(CO)C([N@@](Cc6ccccc6)C1C3(CO)[C@H]5c1ccc(OCc3ccccc3)cc1)C4(CO)[C@H]2c1ccc(OCc2ccccc2)cc1 185 OCC12C3[N@](Cc4ccccc4)C4C5(CO)C([N@@](Cc6ccccc6)C1C3(CO)[C@H]5c1cccc(O)c1)C4(CO)[C@H]2c1cccc(O)c1 186 OCC12C3[N@](Cc4ccccc4)C4C5(CO)C([N@@](Cc6ccccc6)C1C3(CO)[C@H]5c1ccccc1)C4(CO)[C@H]2c1ccccc1 187 OCC12C3[N@](Cc4cccnc4)C4C5(CO)C([N@@](Cc6cccnc6)C1C3(CO)[C@H]5c1ccccc1)C4(CO)[C@H]2c1ccccc1 188 OCC12C3[N@](Cc4ccncc4)C4C5(CO)C([N@@](Cc6ccncc6)C1C3(CO)[C@H]5c1ccccc1)C4(CO)[C@H]2c1ccccc1 189 OC[C@@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 190 OC[C@@H](O)[C@@H](O)[C@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 191 OC[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 192 OC[C@H](O)[C@@H](O)[C@@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 193 
OC[C@H](O)[C@@H](O)[C@@H](O)[C@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 194 OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 195 OC[C@H](OCc1ccccc1)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO.F[B-](F)(F)F 196 [Br-].C=CCCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 197 [Br-].CCCCCCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 198 [Br-].CCCCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 199 [Br-].CCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 200 [Br-].C[N@@+]1(CC2CC2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 201 [Br-].C[N@@+]1(CCCOCc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 202 [Br-].C[N@@+]1(CCCOc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 203 [Br-].C[N@@+]1(CCCc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 204 [Br-].C[N@@+]1(CCOc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 205 [Cl-].CCCCCCCCCCCCCO[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 206 [Cl-].CCO[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 207 [Cl-].CO[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 208 [Cl-].CO[C@@H]([C@H](O)[C@@H](O)CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 209 [Cl-].OC[C@@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 210 [Cl-].OC[C@@H](O)[C@@H](O)[C@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 211 [Cl-].OC[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 212 [Cl-].OC[C@H](O)[C@@H](O)[C@@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 213 [Cl-].OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 214 [I-].C[C@@H]1O[C@H]([C@@H]2CCC[N+]2(C)C)C[S@+]1[O-] 215 [I-].C[C@@H]1O[C@H]([C@@H]2CCC[N+]2(C)C)C[S@@+]1[O-] 216 [I-].C[C@@H]1O[C@H]([C@H]2CCC[N+]2(C)C)C[S@+]1[O-] 217 [I-].C[C@@H]1O[C@H]([C@H]2CCC[N+]2(C)C)C[S@@+]1[O-] 218 [I-].C[C@H]1O[C@@H]([C@@H]2CCC[N+]2(C)C)C[S@+]1[O-] 219 [I-].C[C@H]1O[C@@H]([C@@H]2CCC[N+]2(C)C)C[S@@+]1[O-] 220 [I-].C[N+](C)(C)C[C@@H]1C[S@+]([O-])C(C2CCCCC2)(C2CCCCC2)O1 221 [I-].C[N+](C)(C)C[C@@H]1C[S@@+]([O-])C(C2CCCCC2)(C2CCCCC2)O1 222 
[I-].C[N@@+]1(CCOCc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 223 [K+].COc1ccc2[n-]c([S@@+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1 224 [Mg+2].COc1ccc2[n-]c([S@@+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1.COc1ccc2[n-]c([S@@+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1 225 [Na+].COc1ccc2[n-]c([S@@+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1 226 [O-]CC[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 227 [O-]C[C@@H](O)[C@@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 228 [O-][C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 229 [O-][S@+](CC1(O)CCN(CCc2c[nH]c3ccc(F)cc23)CC1)c1ccccc1 230 [O-][S@+](Cc1cc(OCC2CC2)ccn1)c1nc2cc(F)ccc2[nH]1 231 [O-][S@@+](Cc1cc(OCC2CC2)ccn1)c1nc2cc(F)ccc2[nH]1 232
I've been trying to make sense of the 232 failures. Some observations:
- 181 structures have a +1 charged chiral sulfur (ChEMBL20 has 293 structures with a chiral sulfur, 230 of which have a +1 charged chiral sulfur).
- 34 structures have a chiral nitrogen, of which 23 have a +1 charge and 11 are uncharged (ChEMBL20 has 240 records with a chiral nitrogen, of which 225 have a +1 charge and 15 have no charge).
- All 34 of those chiral nitrogens are bridgeheads (I don't know how many bridgehead nitrogens are in ChEMBL20).
- 14 of the carbon-only chiral structures are bridgeheads.
- The remaining 3 failures are carbon-only chiral structures which are not bridgeheads.
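One rough way to get tallies like these is to scan the SMILES text for bracket atoms which carry a chirality mark. The following is a minimal sketch, not necessarily how the numbers above were computed: the regular expression and the function name chiral_atom_summary() are illustrative assumptions, the pattern only understands simple bracket atoms, and a careful analysis would use RDKit's atom objects instead of string matching.

```python
import re
from collections import Counter

# Matches a bracket atom with a tetrahedral chirality mark, e.g. [S@@+], [N@+], [C@H].
# Groups: element symbol, "@" or "@@", optional hydrogen count, optional charge.
# Assumption: no isotopes, atom classes, or extended chirality such as @OH19.
_CHIRAL_ATOM = re.compile(r"\[([A-Z][a-z]?|[a-z])(@{1,2})(H\d*)?([+-]\d*)?\]")

def chiral_atom_summary(smiles):
    """Count the chiral bracket atoms in a SMILES string by (element, charge)"""
    counts = Counter()
    for symbol, _chirality, _hcount, charge in _CHIRAL_ATOM.findall(smiles):
        # An empty charge field means the atom is uncharged; record it as "0"
        counts[(symbol, charge or "0")] += 1
    return counts

summary = chiral_atom_summary("[Cl-].OC[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO")
print("chiral sulfur with +1 charge:", summary[("S", "+")])
print("uncharged chiral carbons:", summary[("C", "0")])
```

Summing these per-structure counters over the failing records would give the sulfur and nitrogen breakdown; detecting bridgeheads needs a substructure search, which text matching cannot do.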
RDKit bug reports
My investigations led to two RDKit bug reports:
- #1021 AssignStereochemistry() giving incorrect results after FastFindRings()
- #1022 non-canonical SMILES from FragmentOnBonds generated molecule
In the first, Greg identified that FastFindRings() isn't putting the two chiral atoms into the same primitive ring, so AssignStereochemistry() isn't seeing that this is an instance of ring stereochemistry.
In the second, Greg points to the August 2015 thread titled "Stereochemistry - Differences between RDKit Indigo" on the RDKit mailing list. Greg comments about nitrogen chirality:
There are two things going on here in the RDKit: 1) Ring stereochemistry 2) stereochemistry about nitrogen centers. Let's start with the second, because it's easier: RDKit does not generally "believe in" stereochemistry around three coordinate nitrogens. ... Back to the first: ring stereochemistry. ... The way the RDKit handles this is something of a hack: it doesn't identify those atoms as chiral centers, but it does preserve the chiral tags when generating a canonical SMILES:
Need SanitizeMol(), not ClearComputedProps()
Greg proposes that I sanitize the newly created molecule, so I replaced the call to ClearComputedProps() with a call to SanitizeMol() near the end of fragment_chiral(), as shown here:

    # After breaking bonds, should re-sanitize or at least call
    # ClearComputedProps().
    # See https://github.com/rdkit/rdkit/issues/1022#issuecomment-239355482
    new_mol = rwmol.GetMol()
    #new_mol.ClearComputedProps()
    Chem.SanitizeMol(new_mol)
    return new_mol

With that in place, the number of records which fail my test drops from 232 to 195. All 181 of the chiral sulfurs still fail, 11 of the 34 chiral nitrogens still fail, the chiral carbon bridgeheads all pass, and the 3 remaining chiral carbons still fail.
(I also tested with both ClearComputedProps() and SanitizeMol(), but using both made no difference.)
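As a quick bookkeeping check, the per-category counts quoted in this essay can be added up to confirm the before and after totals. This is only a sketch of the arithmetic; the category labels are my shorthand, and the numbers are copied from the text.

```python
# Failure counts by category, copied from the text:
# (before SanitizeMol(), after SanitizeMol())
failures = {
    "chiral sulfur (+1)": (181, 181),
    "chiral nitrogen": (34, 11),
    "chiral carbon bridgehead": (14, 0),
    "other chiral carbon": (3, 3),
}

before_total = sum(before for before, after in failures.values())
after_total = sum(after for before, after in failures.values())

for name, (before, after) in failures.items():
    print("{:26s} {:4d} -> {:4d}".format(name, before, after))
print("{:26s} {:4d} -> {:4d}".format("total", before_total, after_total))
```

The only categories which change are the nitrogens (23 fewer failures) and the carbon bridgeheads (14 fewer), which accounts for the drop of 37.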
While better, it's not substantially better. What's going on?
RDKit can produce non-canonical SMILES
At this point we're pushing the edge of what RDKit can handle. A few paragraphs ago I quoted Greg as saying that ring chirality is "something of a hack". I think that's the reason why, of the 232 records that cause a problem, 67 of them don't produce a stable SMILES string. That is, if I parse what should be a canonicalized SMILES string and recanonicalize it, I get a different result. The canonicalization is bi-stable, in that recanonicalization swaps between two possibilities, with a different chirality assignment each time.
Here's a reproducible example if you want to try it out yourself:
    from rdkit import Chem

    def Canon(smiles):
        mol = Chem.MolFromSmiles(smiles)
        return Chem.MolToSmiles(mol, isomericSmiles=True)

    def check_if_canonical(smiles):
        s1 = Canon(smiles)
        s2 = Canon(s1)
        if s1 != s2:
            print("Failed to canonicalize", smiles)
            print(" input:", smiles)
            print("canon1:", s1)
            print("canon2:", s2)
            print("canon3:", Canon(s2))
        else:
            print("Passed", smiles)

    for smiles in (
            "O[C@H]1CC2CCC(C1)[N@@]2C",
            "C[C@]1(c2cccnc2)CCCC[S@+]1O",
            "[C@]1C[S@+]1O"):
        check_if_canonical(smiles)

The output from this is:
    Failed to canonicalize O[C@H]1CC2CCC(C1)[N@@]2C
     input: O[C@H]1CC2CCC(C1)[N@@]2C
    canon1: C[N@]1C2CCC1C[C@@H](O)C2
    canon2: C[N@]1C2CCC1C[C@H](O)C2
    canon3: C[N@]1C2CCC1C[C@@H](O)C2
    Failed to canonicalize C[C@]1(c2cccnc2)CCCC[S@+]1O
     input: C[C@]1(c2cccnc2)CCCC[S@+]1O
    canon1: C[C@]1(c2cccnc2)CCCC[S@@+]1O
    canon2: C[C@]1(c2cccnc2)CCCC[S@+]1O
    canon3: C[C@]1(c2cccnc2)CCCC[S@@+]1O
    Failed to canonicalize [C@]1C[S@+]1O
     input: [C@]1C[S@+]1O
    canon1: O[S@@+]1[C]C1
    canon2: O[S@+]1[C]C1
    canon3: O[S@@+]1[C]C1
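The "bi-stable" behavior can be detected more generally by applying the canonicalizer repeatedly until a string repeats, then looking at the length of the cycle. Here is a minimal sketch of that idea. The names canonicalization_cycle() and toy_canon() are hypothetical, and toy_canon() is a stand-in which imitates the chirality flip with string replacement, so the example runs without RDKit.

```python
def canonicalization_cycle(canon, smiles, max_steps=10):
    """Apply canon() repeatedly and return the strings in the cycle it settles into

    A result of length 1 means the canonicalization is stable, length 2 means
    it is bi-stable. Return None if nothing repeats within max_steps.
    """
    seen = []
    current = canon(smiles)
    while current not in seen and len(seen) < max_steps:
        seen.append(current)
        current = canon(current)
    if current not in seen:
        return None
    # Drop any lead-in strings which are not part of the repeating cycle
    return seen[seen.index(current):]

def toy_canon(smiles):
    # Hypothetical stand-in for a misbehaving canonicalizer: swap "@" and "@@"
    return smiles.replace("@@", "\0").replace("@", "@@").replace("\0", "@")

cycle = canonicalization_cycle(toy_canon, "O[S@+]1[C]C1")
print("cycle length:", len(cycle))
for s in cycle:
    print("  ", s)
```

To use it with RDKit, pass the Canon() function from the example above as the first argument. A structure with a stable canonical SMILES should give a cycle of length 1.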
Bridgeheads
Many of the failures were due to chiral bridgehead atoms. I used the following two SMARTS to detect bridgeheads:
    *~1~*~*(~*~*~2)~*~*~2~*~1
    *~1~*~*(~*~*~*~2)~*~*~2~*~1

Before I added the SanitizeMol() call, there were 34 chiral nitrogen structures which failed. Of those 34, only 11 are still failures after adding the SanitizeMol(). Of those 11, one is a normal-looking bridgehead:

    CC(C(=O)O[C@@H]1CC2CCC(C1)[N@]2C)(c1ccccc1)c1ccccc1

It's the only one of the simple nitrogen bridgehead structures which doesn't have a stable canonicalization. (I used the core bridgehead from this structure as the first test case in the previous section, where I showed a few bi-stable SMILES strings.)
The other 10 of the 11 nitrogen bridgehead failures have a more complex ring system, like:
    OCC12C3[N@](Cc4cccnc4)C4C5(CO)C([N@@](Cc6cccnc6)C1C3(CO)[C@H]5c1ccccc1)C4(CO)[C@H]2c1ccccc1

All of these have a bi-stable canonicalization.
I also looked at the chiral carbon bridgeheads which failed. Of the original 14, all 14 of them pass after I added the SanitizeMol() call.
The remaining structures
There are three chiral structures which fail even after sanitization, which do not contain a chiral nitrogen or chiral sulfur, and which do not contain a bridgehead. These are:
    CCC[C@H]1CC[C@H]([C@H]2CC[C@@H](OC(=O)[C@H]3[C@H](c4ccc(O)cc4)[C@H](C(=O)O[C@H]4CC[C@@H]([C@H]5CC[C@H](CCC)CC5)CC4)[C@H]3c3ccc(O)cc3)CC2)CC1
    COc1cc(-C=C-OC(=O)[C@H]2CC[C@@H](N(C)[C@H]3CC[C@H](C(=O)O-C=C-c4cc(OC)c(OC)c(OC)c4)CC3)CC2)cc(OC)c1OC
    COc1cc(OC(=O)[C@@H]2CC[C@H](N(C)[C@@H]3CC[C@@H](C(=O)Oc4cc(OC)c(OC)c(OC)c4)CC3)CC2)cc(OC)c1OC

Upon investigation, all three seem to involve the ring chirality solution that Greg called a "hack". I did not investigate further.
The final code
That was a lot of text, and a lot of work. If you made it this far, congratulations. Oddly, I still have more to write about on the topic.
I'll leave you with the final version of the code, with various tweaks and comments that I didn't discuss in the essay. As a bonus, it includes an implementation of fragment_chiral() which uses RDKit's FragmentOnBonds() function, which is the function you should be using to fragment bonds.
    # Cut an RDKit molecule on a specified bond, and replace the old terminals with wildcard atoms ("*").
    # The code includes a test suite which depends on an external SMILES file.
    #
    # This code is meant as a study of the low-level operations. For production use,
    # see the commented out function which uses RDKit's built-in FragmentOnBonds().
    #
    # Written by Andrew Dalke <dalke@dalkescientific.com>.

    from __future__ import print_function

    from rdkit import Chem

    # You can get a copy of this library from:
    # http://www.dalkescientific.com/writings/diary/archive/2016/08/09/fragment_achiral_molecules.html#smiles_syntax.py
    from smiles_syntax import convert_wildcards_to_closures

    CHI_TETRAHEDRAL_CW = Chem.ChiralType.CHI_TETRAHEDRAL_CW
    CHI_TETRAHEDRAL_CCW = Chem.ChiralType.CHI_TETRAHEDRAL_CCW

    def parity_shell(values):
        # Simple Shell sort; while O(N^2), we only deal with at most 4 values
        values = list(values)
        N = len(values)
        num_swaps = 0
        for i in range(N-1):
            for j in range(i+1, N):
                if values[i] > values[j]:
                    values[i], values[j] = values[j], values[i]
                    num_swaps += 1
        return num_swaps % 2

    def get_bond_parity(mol, atom_id):
        """Compute the parity of the atom's bond permutation

        Return None if it does not have tetrahedral chirality,
        0 for even parity, or 1 for odd parity.
        """
        atom_obj = mol.GetAtomWithIdx(atom_id)

        # Return None unless it has tetrahedral chirality
        chiral_tag = atom_obj.GetChiralTag()
        if chiral_tag not in (CHI_TETRAHEDRAL_CW, CHI_TETRAHEDRAL_CCW):
            return None

        # Get the list of atom ids for each atom it's bonded to.
        other_atom_ids = [bond.GetOtherAtomIdx(atom_id) for bond in atom_obj.GetBonds()]

        # Use those ids to determine the parity
        return parity_shell(other_atom_ids)

    def set_bond_parity(mol, atom_id, old_parity, old_other_atom_id, new_other_atom_id):
        """Compute the new bond parity and flip chirality if needed to match the old parity"""
        atom_obj = mol.GetAtomWithIdx(atom_id)
        # Get the list of atom ids for each atom it's bonded to.
        other_atom_ids = [bond.GetOtherAtomIdx(atom_id) for bond in atom_obj.GetBonds()]

        # Replace id from the new wildcard atom with the id of the original atom
        i = other_atom_ids.index(new_other_atom_id)
        other_atom_ids[i] = old_other_atom_id

        # Use those ids to determine the parity
        new_parity = parity_shell(other_atom_ids)
        if old_parity != new_parity:
            # If the parity has changed, invert the chirality
            atom_obj.InvertChirality()

    # You should really use the commented-out function below, which uses
    # RDKit's own fragmentation code. Both do the same thing.

    def fragment_chiral(mol, atom1, atom2):
        """Cut the bond between atom1 and atom2 and replace with connections to wildcard atoms

        Return the fragmented structure as a new molecule.
        """
        rwmol = Chem.RWMol(mol)

        atom1_parity = get_bond_parity(mol, atom1)
        atom2_parity = get_bond_parity(mol, atom2)

        rwmol.RemoveBond(atom1, atom2)
        wildcard1 = rwmol.AddAtom(Chem.Atom(0))
        wildcard2 = rwmol.AddAtom(Chem.Atom(0))
        new_bond1 = rwmol.AddBond(atom1, wildcard1, Chem.BondType.SINGLE)
        new_bond2 = rwmol.AddBond(atom2, wildcard2, Chem.BondType.SINGLE)

        if atom1_parity is not None:
            set_bond_parity(rwmol, atom1, atom1_parity, atom2, wildcard1)
        if atom2_parity is not None:
            set_bond_parity(rwmol, atom2, atom2_parity, atom1, wildcard2)

        # After breaking bonds, should re-sanitize
        # See https://github.com/rdkit/rdkit/issues/1022#issuecomment-239355482
        # However, I didn't see much of an improvement, except for chiral
        # carbon bridgeheads.
        new_mol = rwmol.GetMol()
        Chem.SanitizeMol(new_mol)
        return new_mol

    #### Use this code for production
    ## def fragment_chiral(mol, atom1, atom2):
    ##     bond = mol.GetBondBetweenAtoms(atom1, atom2)
    ##     new_mol = Chem.FragmentOnBonds(mol, [bond.GetIdx()], dummyLabels=[(0, 0)])
    ##     # After breaking bonds, should re-sanitize
    ##     # See https://github.com/rdkit/rdkit/issues/1022#issuecomment-239355482
    ##     # However, I didn't see much of an improvement, except for chiral
    ##     # carbon bridgeheads.
    ##     Chem.SanitizeMol(new_mol)
    ##     return new_mol

    ##### ##### ##### ##### Test code ##### ##### ##### ##### #####

    # Create a canonical isomeric SMILES from a SMILES string
    # Used to put the manually-developed reference structures into canonical form.
    def Canon(smiles):
        mol = Chem.MolFromSmiles(smiles)
        assert mol is not None, smiles
        return Chem.MolToSmiles(mol, isomericSmiles=True)

    def simple_test():
        for smiles, expected in (
                ("CC", Canon("*C.*C")),
                ("F[C@](Cl)(Br)O", Canon("*F.*[C@](Cl)(Br)O")),
                ("F[C@@](Cl)(Br)O", Canon("*F.*[C@@](Cl)(Br)O")),
                ("F[C@@H](Br)O", Canon("*F.*[C@@H](Br)O")),
                ):
            mol = Chem.MolFromSmiles(smiles)
            fragmented_mol = fragment_chiral(mol, 0, 1)
            fragmented_smiles = Chem.MolToSmiles(fragmented_mol, isomericSmiles=True)
            if fragmented_smiles != expected:
                print("smiles:", smiles)
                print("fragmented:", fragmented_smiles)
                print(" expected:", expected)

    # Match a single bond not in a ring
    BOND_SMARTS = "[!#0;!#1]-!@[!#0;!#1]"
    single_bond_pat = Chem.MolFromSmarts(BOND_SMARTS)

    _bridgehead1_pat = Chem.MolFromSmarts("*~1~*~*(~*~*~*~2)~*~*~2~*~1")
    _bridgehead2_pat = Chem.MolFromSmarts("*~1~*~*(~*~*~2)~*~*~2~*~1")

    def is_bridgehead(mol):
        """Test if the molecule contains one of the bridgehead patterns"""
        return (mol.HasSubstructMatch(_bridgehead1_pat) or
                mol.HasSubstructMatch(_bridgehead2_pat))

    def file_test():
        # Point this to a SMILES file to test
        filename = "/Users/dalke/databases/chembl_20_rdkit.smi"

        with open(filename) as infile:
            num_records = num_successes = num_failures = 0
            for lineno, line in enumerate(infile):
                # Give some progress feedback
                if lineno % 100 == 0:
                    print("Processed", lineno, "lines and", num_records,
                          "records. Successes:", num_successes,
                          "Failures:", num_failures)

                # Only test structures with a chiral atom
                input_smiles = line.split()[0]
                if "@" not in input_smiles:
                    continue

                # The code doesn't handle directional bonds. Convert them
                # to single bonds
                if "/" in input_smiles:
                    input_smiles = input_smiles.replace("/", "-")
                if "\\" in input_smiles:
                    input_smiles = input_smiles.replace("\\", "-")

                mol = Chem.MolFromSmiles(input_smiles)
                if mol is None:
                    continue

                ### Uncomment as appropriate
                if is_bridgehead(mol):
                    pass
                    #continue
                else:
                    pass
                    continue
                num_records += 1

                # I expect the reassembled structure to match this canonical SMILES
                expected_smiles = Chem.MolToSmiles(mol, isomericSmiles=True)

                # Cut each of the non-ring single bonds between two heavy atoms
                matches = mol.GetSubstructMatches(single_bond_pat)
                has_failure = False
                for begin_atom, end_atom in matches:
                    # Fragment
                    fragmented_mol = fragment_chiral(mol, begin_atom, end_atom)
                    fragmented_smiles = Chem.MolToSmiles(fragmented_mol, isomericSmiles=True)
                    assert "." in fragmented_smiles, fragmented_smiles # safety check

                    # Convert the "*"s to the correct "%90" closures
                    closure_smiles = convert_wildcards_to_closures(fragmented_smiles, (0, 0))
                    assert "%90" in closure_smiles, closure_smiles # safety check
                    closure_mol = Chem.MolFromSmiles(closure_smiles)

                    # canonicalize and compare; report any mismatches
                    final_smiles = Chem.MolToSmiles(closure_mol, isomericSmiles=True)
                    if final_smiles != expected_smiles:
                        print("FAILURE in record", num_records+1)
                        print(" input_smiles:", input_smiles)
                        print(" begin/end atoms:", begin_atom, end_atom)
                        print("fragmented smiles:", fragmented_smiles)
                        print(" closure smiles:", closure_smiles)
                        print(" final smiles:", final_smiles)
                        print(" expected smiles:", expected_smiles)
                        has_failure = True
                if has_failure:
                    num_failures += 1
                else:
                    num_successes += 1
                    #print("SUCCESS", input_smiles)

            print("Done. Records:", num_records, "Successes:", num_successes,
                  "Failures:", num_failures)

    if __name__ == "__main__":
        simple_test()
        file_test()
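Since everything depends on parity_shell() returning the right answer, here is a small sanity check which needs only the standard library. It compares the swap-counting parity against an independent definition, the number of out-of-order pairs modulo 2, for every ordering of four atom ids. The helper parity_by_inversions() and the sample ids are my own illustration and are not part of the listing above.

```python
from itertools import permutations

def parity_shell(values):
    # Same swap-counting sort as in the listing above
    values = list(values)
    N = len(values)
    num_swaps = 0
    for i in range(N-1):
        for j in range(i+1, N):
            if values[i] > values[j]:
                values[i], values[j] = values[j], values[i]
                num_swaps += 1
    return num_swaps % 2

def parity_by_inversions(values):
    # Independent definition: count the out-of-order pairs, modulo 2
    values = list(values)
    inversions = sum(1 for i in range(len(values))
                       for j in range(i+1, len(values))
                       if values[i] > values[j])
    return inversions % 2

# Check all 24 orderings of four arbitrary, distinct atom ids
all_orders = list(permutations((0, 2, 5, 11)))
mismatches = [p for p in all_orders
              if parity_shell(p) != parity_by_inversions(p)]
print("orderings checked:", len(all_orders), "mismatches:", len(mismatches))
```

Four values are enough because a tetrahedral center has at most four neighbors.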
Thanks!
Thanks to Greg Landrum both for RDKit and for help in tracking down some of the stubborn cases. Thanks also to the University of Hamburg for SMARTSViewer, which I use as a SMILES structure viewer so I don't have to worry about bond type, aromaticity, or chiral re-interpretations.