Andrew Dalke: Fragment chiral molecules in RDKit


In the previous essay, I showed that the simple fragmentation function doesn't preserve chirality after making a single cut. Here's the function definition:

from rdkit import Chem

# Only works correctly for achiral molecules
def fragment_simple(mol, atom1, atom2):
    rwmol = Chem.RWMol(mol)
    rwmol.RemoveBond(atom1, atom2)
    wildcard1 = rwmol.AddAtom(Chem.Atom(0))
    wildcard2 = rwmol.AddAtom(Chem.Atom(0))
    rwmol.AddBond(atom1, wildcard1, Chem.BondType.SINGLE) 
    rwmol.AddBond(atom2, wildcard2, Chem.BondType.SINGLE) 
    return rwmol.GetMol()
The reason is that the RemoveBond()/AddBond() combination can change the permutation order of the bonds around an atom, which inverts the chirality. Here's the relevant part of the connection table from the end of that essay:
  connections from atom 1 (as bond type + other atom index)
1 C -0 -2 -3 -5   original structure
1 C -0 -2 -5 -11  modified; bond to atom 3 is now a bond to atom 11
             ^^^--- was bond to atom 3
         ^^^^^^^--- these two bonds swapped position = inverted chirality

I'll now show how to improve the code to handle chirality. (Note: this essay is pedagogical. To fragment in RDKit use FragmentOnBonds().)
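
For reference, here is a minimal sketch of the built-in approach mentioned in the note above. It assumes a reasonably recent RDKit installation. The helper name cut_bond_with_rdkit() is made up for this illustration, and passing dummyLabels=[(0, 0)] is my choice so the wildcard atoms come out unlabeled rather than carrying the default isotope labels.

```python
from rdkit import Chem

def cut_bond_with_rdkit(mol, atom1, atom2):
    # Hypothetical helper: find the bond joining the two atoms, then let
    # RDKit's FragmentOnBonds() cut it and attach a dummy atom to each end.
    bond = mol.GetBondBetweenAtoms(atom1, atom2)
    if bond is None:
        raise ValueError("atoms %d and %d are not bonded" % (atom1, atom2))
    # dummyLabels=[(0, 0)] requests unlabeled "*" atoms for this one cut.
    return Chem.FragmentOnBonds(mol, [bond.GetIdx()], dummyLabels=[(0, 0)])

mol = Chem.MolFromSmiles("O[C@](F)(Cl)Br")
fragmented_mol = cut_bond_with_rdkit(mol, 0, 1)
print(Chem.MolToSmiles(fragmented_mol))
```

The rest of this essay builds the same operation by hand, which is useful for understanding what the toolkit has to keep track of.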

Parity of a permutation

There's no way from Python to go in and change the permutation order of RDKit's bond list for an atom. Instead, I need to detect if the permutation order has changed, and if so, un-invert the atom's chirality.

While I say "un-invert", that's because we only need to deal with tetrahedral chirality, which has only two chirality types. SMILES supports more complicated chiralities, like octahedral (for example, "@OH19") which can't be written simply as "@" or "@@". However, I've never seen them in use.

With only two possibilities, this reduces to determining the "parity" of the permutation. There are only two possible parities. I'll call one "even" and the other "odd", though in code I'll use 0 for even and 1 for odd.

A list of values in increasing order, like (1, 2, 9), has an even parity. If I swap two values then it has odd parity. Both (2, 1, 9) and (9, 2, 1) have odd parity, because each needs only one swap to put it in sorted order. With another swap, such as (2, 9, 1), the permutation order is back to even parity. The parity of a permutation is the number of pairwise swaps needed to order the list, modulo 2. If the result is 0 then it has even parity, if the result is 1 then it has odd parity.

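One way to compute the parity is to sort the list and count the number of swaps needed. Since there will only be a handful of bonds, I can use a simple sort. (I named the function parity_shell(), though strictly speaking the loop below is a plain exchange sort rather than a true Shell sort. Every swap is still a single pairwise exchange, which is all the parity count needs.)

def parity_shell(values):
    # Simple exchange sort; okay for small numbers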
    values = list(values)
    N = len(values)
    num_swaps = 0
    for i in range(N-1):
        for j in range(i+1, N):
            if values[i] > values[j]:
                values[i], values[j] = values[j], values[i]
                num_swaps += 1
    return num_swaps % 2
I'll test it with a few different cases to see if it gives the expected results:
>>> parity_shell( (1, 2, 9) )
0
>>> parity_shell( (2, 1, 9) )
1
>>> parity_shell( (2, 9, 1) )
0
>>> parity_shell( (2, 1, 9) )
1
>>> parity_shell( (1, 3, 9, 8) )
1
There are faster and better ways to determine the parity. I find it best to start with the most obviously correct solution first.
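
As one example of an alternative, here is a sketch that counts permutation cycles instead of sorting by swaps. It is not used anywhere else in this essay. The name parity_cycles() is mine, and it assumes all of the values are distinct, which is true for the atom ids around a single atom. A cycle of length k can be undone with k-1 swaps, so the parity is the sum of (k-1) over all cycles, modulo 2.

```python
def parity_cycles(values):
    """Return 0 for even parity, 1 for odd parity. Values must be distinct."""
    values = list(values)
    # order[k] is the position in `values` of the k-th smallest element
    order = sorted(range(len(values)), key=values.__getitem__)
    seen = [False] * len(order)
    num_swaps = 0
    for start in range(len(order)):
        if seen[start]:
            continue
        # Walk one cycle of the permutation and measure its length
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = order[j]
            length += 1
        # A cycle of this length needs (length - 1) pairwise swaps
        num_swaps += length - 1
    return num_swaps % 2

for test_case in [(1, 2, 9), (2, 1, 9), (2, 9, 1), (1, 3, 9, 8)]:
    print(test_case, parity_cycles(test_case))
```

It should agree with parity_shell() on every input with distinct values; the sort dominates the cost, so it scales as O(N log N) rather than O(N²), which only matters for lists far longer than a bond list.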

Determine an atom's parity

The next step is to determine the configuration order before and after attaching the dummy atom. I'll use the fragment_simple() and parity_shell() functions I defined earlier, and define a couple of helper functions to create an isomeric canonical SMILES from a molecule or SMILES string.

from rdkit import Chem

def C(mol): # Create a canonical isomeric SMILES from a molecule
  return Chem.MolToSmiles(mol, isomericSmiles=True)

def Canon(smiles): # Create a canonical isomeric SMILES from a SMILES string
  return C(Chem.MolFromSmiles(smiles))

The permutation order is based on which atoms are connected to a given bond. I'll parse a simple chiral structure (which is already in canonical form) and get the ids for the atoms bonded to the second atom. (The second atom has an index of 1.)

>>> mol = Chem.MolFromSmiles("O[C@](F)(Cl)Br")
>>> C(mol)
'O[C@](F)(Cl)Br'
>>> 
>>> atom_id = 1
>>> atom_obj = mol.GetAtomWithIdx(atom_id)
>>> other_atoms = [bond.GetOtherAtomIdx(atom_id) for bond in atom_obj.GetBonds()]
>>> other_atoms
[0, 2, 3, 4]
The list values are in order, so you won't be surprised it has a parity of 0 ("even"):
>>> parity_shell(other_atoms)
0

I'll use the fragment_simple() function to fragment between the oxygen and the chiral carbon:

>>> fragmented_mol = fragment_simple(mol, 0, 1)
>>> fragmented_smiles = C(fragmented_mol)
>>> fragmented_smiles
'[*]O.[*][C@@](F)(Cl)Br'
I'll then use the convert_wildcards_to_closures() function from the previous essay to re-connect the fragments and produce a canonical SMILES from it:
>>> from smiles_syntax import convert_wildcards_to_closures
>>> 
>>> closure_smiles = convert_wildcards_to_closures(fragmented_smiles, (0, 0))
>>> closure_smiles
'O%90.[C@@]%90(F)(Cl)Br'
>>> 
>>> Canon(closure_smiles)
'O[C@@](F)(Cl)Br'

If you compare this to the canonicalized input SMILES you'll see the chirality is inverted from what it should be. I'll see if I can detect that from the list of neighbor atoms to the new atom 1 of the fragmented molecule:

>>> atom_id = 1
>>> atom_obj = fragmented_mol.GetAtomWithIdx(atom_id)
>>> other_atoms = [bond.GetOtherAtomIdx(atom_id) for bond in atom_obj.GetBonds()]
>>> other_atoms
[2, 3, 4, 6]
These values are ordered. It's tempting to conclude that this list also has an even parity. But recall that the original list was [0, 2, 3, 4]. The id 0 (the connection to the oxygen) has been replaced with the id 6 (the connection to the wildcard atom).

The permutation must use the same values, so I'll replace the 6 with a 0 and determine the parity of the resulting list:

>>> i = other_atoms.index(6)
>>> i
3
>>> other_atoms[i] = 0
>>> other_atoms
[2, 3, 4, 0]
>>> parity_shell(other_atoms)
1
This returned a 1 when the earlier parity call returned a 0, which means the parity is inverted, which means I need to invert the chirality of the second atom:
>>> atom_obj.InvertChirality()

Now to check the re-assembled structure:

>>> fragmented_smiles = C(fragmented_mol)
>>> fragmented_smiles
'[*]O.[*][C@](F)(Cl)Br'
>>> 
>>> closure_smiles = convert_wildcards_to_closures(fragmented_smiles, (0, 0))
>>> closure_smiles
'O%90.[C@]%90(F)(Cl)Br'
>>> 
>>> Canon(closure_smiles)
'O[C@](F)(Cl)Br'
This matches the canonicalized input SMILES, so we're done.
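
The check I just did by hand can be summarized without any toolkit calls. The following is only a sketch on plain lists of neighbor ids; the function names parity() and needs_inversion() are made up for this illustration, and the second neighbor list is a hypothetical one I wrote down rather than something read back from RDKit.

```python
def parity(values):
    # Count out-of-order pairs; fine for a handful of neighbors
    values = list(values)
    inversions = sum(
        1
        for i in range(len(values))
        for j in range(i + 1, len(values))
        if values[i] > values[j]
    )
    return inversions % 2

def needs_inversion(old_neighbors, new_neighbors, old_id, new_id):
    # Relabel the new wildcard atom with the id of the atom it replaced,
    # so both lists are permutations of the same values
    relabeled = [old_id if x == new_id else x for x in new_neighbors]
    return parity(old_neighbors) != parity(relabeled)

# The case worked through above: the bond to atom 0 became a bond to atom 6
print(needs_inversion([0, 2, 3, 4], [2, 3, 4, 6], old_id=0, new_id=6))  # True

# Hypothetical case: the last neighbor is replaced in place, so nothing moves
print(needs_inversion([0, 2, 3, 4], [0, 2, 3, 5], old_id=4, new_id=5))  # False
```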

An improved fragment function

I'll use a top-down process to describe the changes to fragment_simple() to make it work. What this doesn't show you is the several iterations I went through to make it look this nice.

At the top level, I need some code to figure out if an atom is chiral. Then, after I make the cut, if the atom is chiral I need some way to restore the correct chirality once I've connected it to the new wildcard atom.

def fragment_chiral(mol, atom1, atom2):
    rwmol = Chem.RWMol(mol)

    # Store the old parity as 0 = even, 1 = odd, or None for no parity 
    atom1_parity = get_bond_parity(mol, atom1)
    atom2_parity = get_bond_parity(mol, atom2)
    
    rwmol.RemoveBond(atom1, atom2)
    wildcard1 = rwmol.AddAtom(Chem.Atom(0))
    wildcard2 = rwmol.AddAtom(Chem.Atom(0))
    new_bond1 = rwmol.AddBond(atom1, wildcard1, Chem.BondType.SINGLE)
    new_bond2 = rwmol.AddBond(atom2, wildcard2, Chem.BondType.SINGLE)

    # Restore the correct parity if there is a parity
    if atom1_parity is not None:
        set_bond_parity(rwmol, atom1, atom1_parity, atom2, wildcard1)
    if atom2_parity is not None:
        set_bond_parity(rwmol, atom2, atom2_parity, atom1, wildcard2)

    # (Later I'll find I also need to call SanitizeMol().)
    return rwmol.GetMol()

To get an atom's bond permutation parity, check if it has tetrahedral chirality (it will be either clockwise or counter-clockwise). If it doesn't have tetrahedral chirality, return None. Otherwise use the neighboring atom ids to determine the parity:

from rdkit import Chem

CHI_TETRAHEDRAL_CW = Chem.ChiralType.CHI_TETRAHEDRAL_CW
CHI_TETRAHEDRAL_CCW = Chem.ChiralType.CHI_TETRAHEDRAL_CCW

def get_bond_parity(mol, atom_id):
    atom_obj = mol.GetAtomWithIdx(atom_id)
    
    # Return None unless it has tetrahedral chirality
    chiral_tag = atom_obj.GetChiralTag()
    if chiral_tag not in (CHI_TETRAHEDRAL_CW, CHI_TETRAHEDRAL_CCW):
        return None
    
    # Get the list of atom ids for each atom it's bonded to.
    other_atom_ids = [bond.GetOtherAtomIdx(atom_id) for bond in atom_obj.GetBonds()]
    
    # Use those ids to determine the parity
    return parity_shell(other_atom_ids)

To restore the parity, again get the list of neighboring atom ids, but this time from the fragmented molecule. This will be connected to one of the new wildcard atoms. I need to map that back to the original atom index before I can compute the parity and, if it's changed, invert the chirality:

def set_bond_parity(mol, atom_id, old_parity, old_other_atom_id, new_other_atom_id):
    atom_obj = mol.GetAtomWithIdx(atom_id)
    # Get the list of atom ids for each atom it's bonded to.
    other_atom_ids = [bond.GetOtherAtomIdx(atom_id) for bond in atom_obj.GetBonds()]

    # Replace id from the new wildcard atom with the id of the original atom
    i = other_atom_ids.index(new_other_atom_id)
    other_atom_ids[i] = old_other_atom_id

    # Use those ids to determine the parity
    new_parity = parity_shell(other_atom_ids)
    if old_parity != new_parity:
        # If the parity has changed, invert the chirality
        atom_obj.InvertChirality()

Testing

I used a simple set of tests during the initial development where I split the bond between the first and second atom of a few structures and compared the result to a reference structure that I fragmented manually (and re-canonicalized):

# Create a canonical isomeric SMILES from a SMILES string.
# Used to put the manually-developed reference structures into canonical form.
def Canon(smiles):
    mol = Chem.MolFromSmiles(smiles)
    assert mol is not None, smiles
    return Chem.MolToSmiles(mol, isomericSmiles=True)

def simple_test():
    for smiles, expected in (
            ("CC", Canon("*C.*C")),
            ("F[C@](Cl)(Br)O", Canon("*F.*[C@](Cl)(Br)O")),
            ("F[C@@](Cl)(Br)O", Canon("*F.*[C@@](Cl)(Br)O")),
            ("F[C@@H](Br)O", Canon("*F.*[C@@H](Br)O")),
            ):
        mol = Chem.MolFromSmiles(smiles)
        fragmented_mol = fragment_chiral(mol, 0, 1)
        fragmented_smiles = Chem.MolToSmiles(fragmented_mol, isomericSmiles=True)
        if fragmented_smiles != expected:
            print("smiles:", smiles)
            print("fragmented:", fragmented_smiles)
            print("  expected:", expected)
These tests passed, so I developed some more extensive tests. My experience is that real-world chemistry is far more complex and interesting than the manual test cases I develop. After the basic tests are done, I do more extensive testing by processing a large number of structures from PubChem and then from ChEMBL.

ChEMBL structures come from multiple sources

I'll go on a tangent. Why do I start with PubChem before going on to ChEMBL? The PubChem data is all generated by one chemistry toolkit, I think OEChem, while ChEMBL data comes from many sources. To get a sense of the diversity, I processed the ChEMBL 21 release to get the second line of each record. The ctfile format specification says that if the line is non-blank then the 8 characters starting in the third position should contain the program name. I'll write a program to extract those names and count how many times each one occurs.

The following program is not a general-purpose SD file reader. It depends on the specific layout of ChEMBL 21, where there are only two lines which start with "CHEMBL"; the title line and the tag data after the "chembl_id" tag.

import sys, gzip
from collections import Counter
counter = Counter()
with gzip.open("/Users/dalke/databases/chembl_21.sdf.gz", "rb") as infile:
  for line in infile:
      # Get the line after the title line (which starts with 'CHEMBL')
      if line[:6] == b"CHEMBL":
          next_line = next(infile)
          # Print unique program names
          program = next_line[2:10]
          if program not in counter:
              print("New:", repr(str(program, "ascii")))
          counter[program] += 1
      
      if line[:13] == b"> <chembl_id>":
          ignore = next(infile) # skip the CHEMBL
  
  print("Done. Here are the counts for each program seen.")
  for name, count in counter.most_common():
    print(repr(str(name, "ascii")), count)
Here are the counts:

  Program name      Count
  'SciTegic'      1105617
  ' '              326445
  'CDK 9'           69145
  'Mrv0541 '        30962
  ''                24531
  'CDK 6'           10805
  'CDK 1'            8642
  'CDK 5'            4771
  'CDK 8'            2209
  '-ISIS- '           281
  'Symyx '            171
  'CDK 7'             144
  '-OEChem-'           96
  'Marvin '            61
  'Mrv0540 '           13
  ' RDKit'              3
  'CDK 3'               1

The number of different CDK lines is not because of a version number but because CDK doesn't format the line correctly. The specification states that the first few fields of a non-blank line are supposed to be:
  • II: two characters for the user's initials
  • PPPPPPPP: eight characters for the program name
  • MMDDYY: two characters for the month, two for the day, two for the year
  • HHmm: two characters for the hour, two for the minutes
  • … additional fields omitted …
while here's what CDK does:
IIPPPPPPPPMMDDYYHHmmddSS…
  CDK    7/28/10,10:58
  CDK    8/10/10,12:22
  CDK    9/16/09,9:40
  CDK    10/7/09,10:42
  CDK    11/9/10,11:20
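
To make the mismatch concrete, here is a small sketch that slices a header line at the fixed column positions listed above. The function name parse_ctfile_header() and the "well formed" sample line are made up for this illustration; the CDK-style line follows the layout of the examples just shown.

```python
def parse_ctfile_header(line):
    # Fixed-width fields, following the layout IIPPPPPPPPMMDDYYHHmm
    return {
        "initials": line[0:2],
        "program": line[2:10],
        "date": line[10:16],
        "time": line[16:20],
    }

# A made-up line that follows the specification
well_formed = "AD" + "MyProg  " + "081216" + "1430"
print(parse_ctfile_header(well_formed))

# A line laid out the way CDK writes it: the month digit lands in the
# last column of the program-name field
cdk_style = "  " + "CDK" + " " * 4 + "7/28/10,10:58"
print(repr(parse_ctfile_header(cdk_style)["program"]))
```

With the CDK layout, the first digit of the date falls inside the program-name field, which is why the table has separate entries ending in 1, 3, 5, 6, 7, 8, and 9.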
Now to return from this (double) tangent to the topic of cutting a bond to produce two fragments.

Need ClearComputedProps()?

The more extensive test processes every structure which contains a chiral atom (where the SMILES contains a '@'), cuts every bond between heavy atoms, so long as it's a single bond not in a ring, and puts the results back together to see if it matches the canonicalized input structure. The code isn't interesting enough to make specific comments about it. You can get the code at the end of this essay.
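
As a rough sketch of the bond selection rule just described (this is not the code from the end of the essay), the candidate bonds can be found with a few standard RDKit calls. The helper name find_cuttable_bonds() is made up, and treating "heavy atom" as "atomic number greater than 1" is my assumption; it also skips any wildcard atoms already in the structure.

```python
from rdkit import Chem

def find_cuttable_bonds(mol):
    # Hypothetical helper: return (begin_atom_idx, end_atom_idx) pairs for
    # every single, non-ring bond between two heavy atoms
    pairs = []
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.SINGLE:
            continue
        if bond.IsInRing():
            continue
        begin_atom = bond.GetBeginAtom()
        end_atom = bond.GetEndAtom()
        # Assumption: "heavy" means anything other than hydrogen or a wildcard
        if begin_atom.GetAtomicNum() <= 1 or end_atom.GetAtomicNum() <= 1:
            continue
        pairs.append((begin_atom.GetIdx(), end_atom.GetIdx()))
    return pairs

mol = Chem.MolFromSmiles("O[C@](F)(Cl)Br")
for atom1, atom2 in find_cuttable_bonds(mol):
    print("can cut between atoms", atom1, "and", atom2)
```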

The first error occurred quickly, and there were many errors. Here's the first one:

FAILURE in record 94
     input_smiles: C[C@H]1CC[C@@H](NCc2ccc3c(c2)Cc2c(-c4ccc(CC(=O)O)cc4)n[nH]c2-3)CC1
  begin/end atoms: 4 5
fragmented smiles: [*]NCc1ccc2c(c1)Cc1c(n[nH]c1-2)-c1ccc(cc1)CC(=O)O.[*][C@H]1CC[C@H](C)CC1
   closure smiles: N%90Cc1ccc2c(c1)Cc1c(n[nH]c1-2)-c1ccc(cc1)CC(=O)O.[C@@H]%901CC[C@H](C)CC1
     final smiles: C[C@H]1CC[C@H](NCc2ccc3c(c2)Cc2c(-c4ccc(CC(=O)O)cc4)n[nH]c2-3)CC1
  expected smiles: C[C@H]1CC[C@@H](NCc2ccc3c(c2)Cc2c(-c4ccc(CC(=O)O)cc4)n[nH]c2-3)CC1
Greg Landrum, the main RDKit developer, points to the solution. He writes: "after you break one or more bonds, you really, really should re-sanitize the molecule (or at least call ClearComputedProps()".

The modified code is:

def fragment_chiral(mol, atom1, atom2):
    rwmol = Chem.RWMol(mol)
    
    atom1_parity = get_bond_parity(mol, atom1)
    atom2_parity = get_bond_parity(mol, atom2)
    
    rwmol.RemoveBond(atom1, atom2)
    wildcard1 = rwmol.AddAtom(Chem.Atom(0))
    wildcard2 = rwmol.AddAtom(Chem.Atom(0))
    new_bond1 = rwmol.AddBond(atom1, wildcard1, Chem.BondType.SINGLE)
    new_bond2 = rwmol.AddBond(atom2, wildcard2, Chem.BondType.SINGLE)
    
    if atom1_parity is not None:
        set_bond_parity(rwmol, atom1, atom1_parity, atom2, wildcard1)
    if atom2_parity is not None:
        set_bond_parity(rwmol, atom2, atom2_parity, atom1, wildcard2)

    # After breaking bonds, should re-sanitize, or at least use ClearComputedProps()
    # See https://github.com/rdkit/rdkit/issues/1022#issuecomment-239355482
    new_mol = rwmol.GetMol()
    # (I will find that I need to call SanitizeMol().)
    Chem.ClearComputedProps(new_mol)
    return new_mol

Ring chirality failures

There were 200,748 chiral structures in my selected subset from PubChem. Of those, 232 unique structures had problems when I tried to cut a bond. Here's an example of one of the reported failures:

FAILURE in record 12906
     input_smiles: [O-][S@+](CC1(O)CCN(CCc2c[nH]c3ccc(F)cc23)CC1)c1ccccc1
  begin/end atoms: 0 1
fragmented smiles: [*][O-].[*][S@+](CC1(O)CCN(CCc2c[nH]c3ccc(F)cc23)CC1)c1ccccc1
   closure smiles: [O-]%90.[S@+]%90(CC1(O)CCN(CCc2c[nH]c3ccc(F)cc23)CC1)c1ccccc1
     final smiles: [O-][S@@+](CC1(O)CCN(CCc2c[nH]c3ccc(F)cc23)CC1)c1ccccc1
  expected smiles: [O-][S@+](CC1(O)CCN(CCc2c[nH]c3ccc(F)cc23)CC1)c1ccccc1

Here's the complete list of failing structures, which might make a good test set for other programs:

C-N=C(-S)[C@]1(c2cccnc2)CCCC[S@+]1[O-] 1
C=C(c1ccc([S@+]([O-])c2ccc(OC)cc2)cc1)C1CCN(C2CCCCC2)CC1 2
C=C(c1ccc([S@@+]([O-])c2ccc(OC)cc2)cc1)C1CCN(C2CCCCC2)CC1 3
C=CCCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 4
C=CCCCC[N@@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 5
C=CC[S@+]([O-])C[C@H](N)C(=O)O 6
CC#CCOc1ccc([S@@+]([O-])[C@H](C(=O)NO)C(C)C)cc1 7
CC(=O)NC[C@H]1CN(c2ccc([S@+](C)[O-])cc2)C(=O)O1 8
CC(=O)Nc1cc(-c2[nH]c([S@+](C)[O-])nc2-c2ccc(F)cc2)ccn1 9
CC(=O)Nc1cc(-c2c(-c3ccc(F)cc3)nc([S@+](C)[O-])n2C)ccn1 10
CC(C(=O)O[C@@H]1CC2CCC(C1)[N@]2C)(c1ccccc1)c1ccccc1 11
CC(C)(C(=O)N[C@H]1C2CCC1C[C@@H](C(=O)O)C2)N1CCN(c2ccc(C(F)(F)F)cn2)CC1 12
CC(C)(C(=O)N[C@H]1C2CCCC1C[C@@H](C(=O)O)C2)N1CCN(c2ccc(C(F)(F)F)cn2)CC1 13
CC(C)(C(=O)N[C@H]1C2CCCC1C[C@@H](C(N)=O)C2)N1CCN(c2ccc(C(F)(F)F)cn2)CC1 14
CC(C)(C(=O)N[C@H]1C2CCCC1C[C@H](C(N)=O)C2)N1CCN(c2ccc(C(F)(F)F)cn2)CC1 15
CC(C)(Oc1ccc(C#N)cn1)C(=O)N[C@H]1C2CCCC1C[C@H](C(N)=O)C2 16
CC(C)(Oc1ccc(C#N)cn1)C(=O)N[C@H]1C2COCC1C[C@H](C(N)=O)C2 17
CC(C)(Oc1ccc(Cl)cc1)C(=O)N[C@H]1C2CCCC1C[C@@H](C(=O)O)C2 18
CC(C)(Oc1ccc(Cl)cc1)C(=O)N[C@H]1C2CCCC1C[C@@H](C(N)=O)C2 19
CC(C)(Oc1ccc(Cl)cc1)C(=O)N[C@H]1C2CCCC1C[C@H](C(=O)O)C2 20
CC(C)(Oc1ccc(Cl)cc1)C(=O)N[C@H]1C2CCCC1C[C@H](C(N)=O)C2 21
CC(C)(Oc1ccc(Cl)cc1)C(=O)N[C@H]1C2COCC1C[C@@H](C(N)=O)C2 22
CC(C)Cc1c(C#N)cc2ccccc2c1C(=O)N(C)C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 23
CC(C)Nc1cc(-c2[nH]c([S@+](C)[O-])nc2-c2ccc(F)cc2)ccn1 24
CC(C)Nc1cc(-c2[nH]c([S@@+]([O-])C(C)C)nc2-c2ccc(F)cc2)ccn1 25
CC(C)OC(=O)N1CCC(Oc2ncnc3c2CCN3c2ccc([S@+](C)[O-])c(F)c2)CC1 26
CC(C)OC(=O)N1CCC(Oc2ncnc3c2CCN3c2ccc([S@@+](C)[O-])c(F)c2)CC1 27
CC(C)[C@@H](C)Nc1cc(-c2[nH]c([S@+](C)[O-])nc2-c2ccc(F)cc2)ccn1 28
CC(C)[C@H](C)Nc1cc(-c2[nH]c([S@+](C)[O-])nc2-c2ccc(F)cc2)ccn1 29
CC(C)[S@+]([O-])c1nc(-c2ccc(F)cc2)c(-c2ccnc(NC3CCCCC3)c2)[nH]1 30
CC1(C)C(=O)N([C@H]2C3CCCC2C[C@@H](C(N)=O)C3)CC1COc1ccc(C#N)cn1 31
CC1(C)C(=O)N([C@H]2C3CCCC2C[C@H](C(N)=O)C3)CC1COc1ccc(C#N)cn1 32
CC1(C)N=C(c2ccccc2)C(=O)N1[C@@H]1C(=O)N2[C@@H]1[S@+]([O-])C(C)(C)[C@@H]2C(=O)O 33
CC1(C)NC(c2ccccc2)C(=O)N1[C@@H]1C(=O)N2[C@@H]1[S@+]([O-])C(C)(C)[C@@H]2C(=O)O 34
CC1(C)NC(c2ccccc2)C(=O)N1[C@@H]1C(=O)N2[C@@H]1[S@@+]([O-])C(C)(C)[C@@H]2C(=O)O 35
CCCCCCCCCCCCCO[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 36
CCCCCCCCC[S@@+]([O-])CC(=O)OC 37
CCCCCCCCC[S@@+]([O-])CC(N)=O 38
CCCCCCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 39
CCCCCC[C@@H](C(=O)NO)[S@+]([O-])c1ccc(OC)cc1 40
CCCCCC[C@@H](C(=O)NO)[S@@+]([O-])c1ccc(OC)cc1 41
CCCCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 42
CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCCCN3CC(C)C)cc1 43
CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCCCN3CCC)cc1 44
CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCCN3CC(C)C)cc1 45
CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCCN3CCC)cc1 46
CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCN3CC(C)C)cc1 47
CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCN3CC(C)C)cc1.CS(=O)(=O)O 48
CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCN3CCC)cc1 49
CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4cncn4CCC)cc2)CCCN3Cc2cnn(C)c2)cc1 50
CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4nncn4CCC)cc2)CCCN3CC(C)C)cc1 51
CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@+]([O-])Cc4nncn4CCC)cc2)CCCN3CCC)cc1 52
CCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 53
CCC[C@H]1CC[C@H]([C@H]2CC[C@@H](OC(=O)[C@H]3[C@H](c4ccc(O)cc4)[C@H](C(=O)O[C@H]4CC[C@@H]([C@H]5CC[C@H](CCC)CC5)CC4)[C@H]3c3ccc(O)cc3)CC2)CC1 54
CCCc1c(C#N)cc2ccccc2c1C(=O)N(C)C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 55
CCN1C(=O)c2ccccc2[C@H]1[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 56
CCN1c2cc(C(=O)NCc3ccc(C#N)cc3)ccc2[S@@+]([O-])c2ccccc2C1=O 57
CCN1c2cc(C(=O)NCc3ccc(F)cc3)ccc2[S@@+]([O-])c2ccccc2C1=O 58
CCN1c2cc(C(=O)NCc3ccc(OC)cc3)ccc2[S@+]([O-])c2ccccc2C1=O 59
CCN1c2cc(C(=O)NCc3ccc(OC)cc3)ccc2[S@@+]([O-])c2ccccc2C1=O 60
CCN1c2cc(C(=O)N[C@H](C)c3ccc(Br)cc3)ccc2[S@@+]([O-])c2ccccc2C1=O 61
CCN1c2cc(C(=O)N[C@H](C)c3ccc4ccccc4c3)ccc2[S@@+]([O-])c2ccccc2C1=O 62
CCO[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 63
CC[C@H](NC(=O)c1c2ccccc2nc(-c2ccccc2)c1C[S@+](C)[O-])c1ccccc1 64
CC[C@H](NC(=O)c1c2ccccc2nc(-c2ccccc2)c1[S@+](C)[O-])c1ccccc1 65
CC[S@+]([O-])c1ccc(-c2coc3ccc(-c4ccc(C)o4)cc32)cc1 66
CCc1c(C#N)cc2ccccc2c1C(=O)N(C)C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 67
CN(C[C@@H](CCN1CCC(c2ccc(Br)cc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)cc2ccccc21 68
CN(C[C@@H](CCN1CCC(c2ccc(F)cc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)cc2ccccc21 69
CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1c2ccccc2cc(C#N)c1-c1ccccc1 70
CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)c(Br)c2ccccc12 71
CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)c(C#N)c2ccccc12 72
CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)c(F)c2ccccc12 73
CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)cc2cc(C#N)ccc21 74
CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)cc2cc(O)ccc21 75
CN(C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1)C(=O)c1cc(C#N)cc2ccccc21 76
CN1c2cc(C(=O)NCCN3CCCC3)ccc2[S@@+]([O-])c2ccccc2C1=O 77
CN1c2cc(C(=O)NCCc3cccs3)ccc2[S@+]([O-])c2ccccc2C1=O 78
CN1c2cc(C(=O)NCCc3cccs3)ccc2[S@@+]([O-])c2ccccc2C1=O 79
CN1c2cc(C(=O)NCc3ccc(Br)cc3)ccc2[S@@+]([O-])c2ccccc2C1=O 80
CN1c2cc(C(=O)NCc3ccc(Cl)cc3)ccc2[S@@+]([O-])c2ccccc2C1=O 81
CN1c2cc(C(=O)NCc3cccnc3)ccc2[S@@+]([O-])c2ccccc2C1=O 82
COC(=O)[C@H]1CC2COCC(C1)[C@H]2NC(=O)[C@@]1(C)CCCN1S(=O)(=O)c1cccc(Cl)c1C 83
COC(=O)c1ccc2[nH]c([S@+]([O-])Cc3nccc(OC)c3OC)nc2c1 84
COC(=O)c1ccc2[nH]c([S@@+]([O-])Cc3nccc(OC)c3OC)nc2c1 85
COC1(C[S@@+]([O-])c2ccccc2)CCN(CCc2c[nH]c3ccc(F)cc23)CC1 86
COCCCOc1ccnc(C[S@+]([O-])c2nc3ccccc3[nH]2)c1C 87
CO[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 88
CO[C@@H]([C@H](O)[C@@H](O)CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 89
COc1c(C#N)cc2ccccc2c1C(=O)N(C)C[C@@H](CCN1CCC(c2ccc(Br)cc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 90
COc1c(C#N)cc2ccccc2c1C(=O)N(C)C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 91
COc1cc(-C=C-OC(=O)[C@H]2CC[C@@H](N(C)[C@H]3CC[C@H](C(=O)O-C=C-c4cc(OC)c(OC)c(OC)c4)CC3)CC2)cc(OC)c1OC 92
COc1cc(C(=O)N2CCO[C@@](CCN3CCC4(CC3)c3ccccc3C[S@+]4[O-])(c3ccc(Cl)c(Cl)c3)C2)cc(OC)c1OC 93
COc1cc(C(=O)N2CCO[C@@](CCN3CCC4(CC3)c3ccccc3C[S@@+]4[O-])(c3ccc(Cl)c(Cl)c3)C2)cc(OC)c1OC 94
COc1cc(C(=O)N2CCO[C@](CCN3CCC4(CC3)c3ccccc3C[S@+]4[O-])(c3ccc(Cl)c(Cl)c3)C2)cc(OC)c1OC 95
COc1cc(C(=O)N2CCO[C@](CCN3CCC4(CC3)c3ccccc3C[S@@+]4[O-])(c3ccc(Cl)c(Cl)c3)C2)cc(OC)c1OC 96
COc1cc(OC(=O)[C@@H]2CC[C@H](N(C)[C@@H]3CC[C@@H](C(=O)Oc4cc(OC)c(OC)c(OC)c4)CC3)CC2)cc(OC)c1OC 97
COc1ccc(C2CCN(CC[C@H](CN(C)C(=O)c3c4ccccc4cc(C#N)c3OC)c3ccc(Cl)c(Cl)c3)CC2)c([S@+](C)[O-])c1 98
COc1ccc(C2CCN(CC[C@H](CN(C)C(=O)c3cc(C#N)cc4ccccc43)c3ccc(Cl)c(Cl)c3)CC2)c([S@+](C)[O-])c1 99
COc1ccc(N(C(=O)OC(C)(C)C)[C@@H]2C(C)=C[C@H]([S@@+]([O-])c3ccc(C)cc3)[C@@H]3C(=O)N(C)C(=O)[C@@H]32)cc1 100
COc1ccc([C@@H]2C3(CO)C4[N@](C)C5C6(CO)C([N@@](C)C3C4(CO)[C@H]6c3ccc(OC)cc3)C52CO)cc1 101
COc1ccc([S@+]([O-])c2ccc(C(=O)C3CCN(C4CCCCC4)CC3)cc2)cc1 102
COc1ccc([S@+]([O-])c2ccc(C(C#N)C3CCN(C4CCCCC4)CC3)cc2)cc1 103
COc1ccc([S@+]([O-])c2ccc(C(C#N)N3CCN(C4CCCCC4)CC3)cc2)cc1 104
COc1ccc([S@@+]([O-])c2ccc(C(=O)C3CCN(C4CCCCC4)CC3)cc2)cc1 105
COc1ccc([S@@+]([O-])c2ccc(C(C#N)C3CCN(C4CCCCC4)CC3)cc2)cc1 106
COc1ccc([S@@+]([O-])c2ccc(C(C#N)N3CCN(C4CCCCC4)CC3)cc2)cc1 107
COc1ccc2c(cc(C#N)cc2C(=O)N(C)C[C@@H](CCN2CCC(c3ccccc3[S@+](C)[O-])CC2)c2ccc(Cl)c(Cl)c2)c1 108
COc1ccc2cc(-c3nc(-c4ccc([S@+](C)[O-])cc4C)[nH]c3-c3ccncc3)ccc2c1 109
COc1ccc2cc(-c3nc(-c4ccc([S@@+](C)[O-])cc4C)[nH]c3-c3ccncc3)ccc2c1 110
COc1ccc2nc([S@@+]([O-])Cc3ncc(C)c(OC)c3C)[nH]c2c1 111
COc1ccnc(C[S@@+]([O-])c2nc3ccc(OC(F)F)cc3[nH]2)c1OC 112
CSC[S@+]([O-])C[C@H](CO)NC(=O)-C=C-c1c(C)[nH]c(=O)[nH]c1=O 113
CSC[S@@+]([O-])CC(CO)NC(=O)-C=C-c1c(C)nc(O)nc1O 114
C[C@@H](Nc1cc(-c2[nH]c([S@+](C)[O-])nc2-c2ccc(F)cc2)ccn1)c1ccccc1 115
C[C@@H]1CC[C@H]2[C@@H](C)[C@@H](CCC(=O)Nc3cccc([S@+](C)[O-])c3)O[C@@H]3O[C@@]4(C)CC[C@@H]1[C@]32OO4 116
C[C@@H]1O[C@H]([C@@H]2CCC[N+]2(C)C)C[S@+]1[O-] 117
C[C@@H]1O[C@H]([C@@H]2CCC[N+]2(C)C)C[S@@+]1[O-] 118
C[C@@H]1O[C@H]([C@H]2CCC[N+]2(C)C)C[S@+]1[O-] 119
C[C@@H]1O[C@H]([C@H]2CCC[N+]2(C)C)C[S@@+]1[O-] 120
C[C@H](Nc1cc(-c2[nH]c([S@+](C)[O-])nc2-c2ccc(F)cc2)ccn1)c1ccccc1 121
C[C@H]1O[C@@H]([C@@H]2CCCN2C)C[S@+]1[O-] 122
C[C@H]1O[C@@H]([C@@H]2CCCN2C)C[S@@+]1[O-] 123
C[C@H]1O[C@@H]([C@@H]2CCC[N+]2(C)C)C[S@+]1[O-] 124
C[C@H]1O[C@@H]([C@@H]2CCC[N+]2(C)C)C[S@@+]1[O-] 125
C[N+](C)(C)C[C@@H]1C[S@+]([O-])C(C2CCCCC2)(C2CCCCC2)O1 126
C[N+](C)(C)C[C@@H]1C[S@@+]([O-])C(C2CCCCC2)(C2CCCCC2)O1 127
C[N@+]1([O-])[C@H]2CC[C@H]1C[C@H](OC(=O)[C@@H](CO)c1ccccc1)C2 128
C[N@@+]1(CC2CC2)C2CCC1C[C@@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 129
C[N@@+]1(CC2CC2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 130
C[N@@+]1(CCCOCc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 131
C[N@@+]1(CCCOc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 132
C[N@@+]1(CCCc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 133
C[N@@+]1(CCOCc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 134
C[N@@+]1(CCOc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 135
C[S@+]([O-])CCCCN=C=S 136
C[S@+]([O-])CC[C@@](CO)(C(=O)O[C@H]1CN2CCC1CC2)c1ccccc1 137
C[S@+]([O-])C[C@@H]1[C@@H](O)[C@]23CC[C@H]1C[C@H]2[C@]1(C)CCC[C@](C)(C(=O)O)[C@H]1CC3 138
C[S@+]([O-])C[C@](C)(O)[C@H]1OC(=O)C=C2[C@@]13O[C@@H]3[C@H]1OC(=O)[C@@]3(C)C=CC[C@@]2(C)[C@@H]13 139
C[S@+]([O-])c1ccc(C2=C(c3ccccc3)C(=O)OC2)cc1 140
C[S@+]([O-])c1cccc2c3c(n(Cc4ccc(Cl)cc4)c21)[C@@H](CC(=O)O)CCC3 141
C[S@+]([O-])c1nc(-c2ccc(F)cc2)c(-c2ccnc(NC(=O)Cc3ccc(F)cc3)c2)[nH]1 142
C[S@+]([O-])c1nc(-c2ccc(F)cc2)c(-c2ccnc(NC3CCCCC3)c2)[nH]1 143
C[S@+]([O-])c1nc(-c2ccc(F)cc2)c(-c2ccnc(NC3CCOCC3)c2)[nH]1 144
C[S@@+]([O-])CCCC-C(=N-OS(=O)(=O)[O-])S[C@@H]1O[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O 145
C[S@@+]([O-])CCCCN=C=S 146
C[S@@+]([O-])C[C@@H]1[C@@H](O)[C@]23CC[C@H]1C[C@H]2[C@]1(C)CCC[C@](C)(C(=O)O)[C@H]1CC3 147
C[S@@+]([O-])Cc1ccc(C(=O)Nc2cccnc2C(=O)NCC2CCOCC2)c2ccccc12 148
Cc1c(C#N)cc(C(=O)N(C)C[C@@H](CCN2CCC(c3ccccc3[S@+](C)[O-])CC2)c2ccc(Cl)c(Cl)c2)c2ccccc12 149
Cc1c(C#N)cc2ccccc2c1C(=O)N(C)C[C@@H](CCN1CCC(c2ccccc2[S@+](C)[O-])CC1)c1ccc(Cl)c(Cl)c1 150
Cc1c(OCC(F)(F)F)ccnc1C[S@@+]([O-])c1nc2ccccc2[nH]1 151
Cc1cc(=O)c(Oc2ccc(Br)cc2F)c(-c2ccc([S@@+](C)[O-])cc2)o1 152
Cc1cc(=O)c(Oc2ccc(Cl)cc2F)c(-c2ccc([S@@+](C)[O-])cc2)o1 153
Cc1cc(=O)c(Oc2ccc(F)cc2F)c(-c2ccc([S@+](C)[O-])cc2)o1 154
Cc1ccc(-c2ccc3occ(-c4ccc([S@+](C)[O-])cc4)c3c2)o1 155
Cc1ccc(-c2ncc(Cl)cc2-c2ccc([S@+](C)[O-])cc2)cn1 156
Cc1ccc([S@+]([O-])-C(F)=C-c2ccccn2)cc1 157
Cc1ccc([S@+]([O-])c2occc2C=O)cc1 158
Cc1nc(O)nc(O)c1-C=C-C(=O)N[C@@H](CO)C[S@+]([O-])CCl 159
Cc1nc(O)nc(O)c1-C=C-C(=O)N[C@@H](CO)C[S@@+]([O-])CCl 160
NC(=O)C[S@+]([O-])C(c1ccccc1)c1ccccc1 161
NC(=O)C[S@@+]([O-])C(c1ccccc1)c1ccccc1 162
O.O.O.O.[Sr+2].COc1ccc2[n-]c([S@@+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1.COc1ccc2[n-]c([S@@+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1 163
O=C(O)CC-C=C-CC[C@H]1[C@H](OCc2ccc(-c3ccccc3)cc2)C[S@+]([O-])[C@@H]1c1cccnc1 164
O=C(O)CC-C=C-CC[C@H]1[C@H](OCc2ccc(-c3ccccc3)cc2)C[S@@+]([O-])[C@@H]1c1cccnc1 165
O=S(=O)([O-])OCCC[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 166
O=S(=O)([O-])OC[C@@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 167
O=S(=O)([O-])OC[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 168
O=S(=O)([O-])O[C@@H](CO)CC[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 169
O=S(=O)([O-])O[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 170
O=S(=O)([O-])O[C@@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@@H](O)[C@@H](O)CO 171
O=S(=O)([O-])O[C@@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@@H](O)[C@H](O)CO 172
O=S(=O)([O-])O[C@@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@H](O)[C@@H](O)CO 173
O=S(=O)([O-])O[C@@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@H](O)[C@H](O)CO 174
O=S(=O)([O-])O[C@@H]([C@H](O)[C@@H](O)CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 175
O=S(=O)([O-])O[C@H](CO)[C@@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 176
O=S(=O)([O-])O[C@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@@H](O)[C@@H](O)CO 177
O=S(=O)([O-])O[C@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@@H](O)[C@H](O)CO 178
O=S(=O)([O-])O[C@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@H](O)[C@@H](O)CO 179
O=S(=O)([O-])O[C@H]([C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO)[C@@H](O)[C@H](O)[C@H](O)CO 180
OCC12C3[N@](Cc4ccc(O)cc4)C4C5(CO)C([N@@](Cc6ccc(O)cc6)C1C3(CO)[C@H]5c1ccccc1)C4(CO)[C@H]2c1ccccc1 181
OCC12C3[N@](Cc4ccc(OCc5ccccc5)cc4)C4C5(CO)C([N@@](Cc6ccc(OCc7ccccc7)cc6)C1C3(CO)[C@@H]5c1ccccc1)C4(CO)[C@@H]2c1ccccc1 182
OCC12C3[N@](Cc4cccc(OCc5ccccc5)c4)C4C5(CO)C([N@@](Cc6cccc(OCc7ccccc7)c6)C1C3(CO)[C@H]5c1ccccc1)C4(CO)[C@H]2c1ccccc1 183
OCC12C3[N@](Cc4ccccc4)C4C5(CO)C([N@@](Cc6ccccc6)C1C3(CO)[C@H]5c1ccc(O)cc1)C4(CO)[C@H]2c1ccc(O)cc1 184
OCC12C3[N@](Cc4ccccc4)C4C5(CO)C([N@@](Cc6ccccc6)C1C3(CO)[C@H]5c1ccc(OCc3ccccc3)cc1)C4(CO)[C@H]2c1ccc(OCc2ccccc2)cc1 185
OCC12C3[N@](Cc4ccccc4)C4C5(CO)C([N@@](Cc6ccccc6)C1C3(CO)[C@H]5c1cccc(O)c1)C4(CO)[C@H]2c1cccc(O)c1 186
OCC12C3[N@](Cc4ccccc4)C4C5(CO)C([N@@](Cc6ccccc6)C1C3(CO)[C@H]5c1ccccc1)C4(CO)[C@H]2c1ccccc1 187
OCC12C3[N@](Cc4cccnc4)C4C5(CO)C([N@@](Cc6cccnc6)C1C3(CO)[C@H]5c1ccccc1)C4(CO)[C@H]2c1ccccc1 188
OCC12C3[N@](Cc4ccncc4)C4C5(CO)C([N@@](Cc6ccncc6)C1C3(CO)[C@H]5c1ccccc1)C4(CO)[C@H]2c1ccccc1 189
OC[C@@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 190
OC[C@@H](O)[C@@H](O)[C@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 191
OC[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 192
OC[C@H](O)[C@@H](O)[C@@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 193
OC[C@H](O)[C@@H](O)[C@@H](O)[C@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 194
OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 195
OC[C@H](OCc1ccccc1)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO.F[B-](F)(F)F 196
[Br-].C=CCCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 197
[Br-].CCCCCCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 198
[Br-].CCCCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 199
[Br-].CCCC[N@+]1(C)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 200
[Br-].C[N@@+]1(CC2CC2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 201
[Br-].C[N@@+]1(CCCOCc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 202
[Br-].C[N@@+]1(CCCOc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 203
[Br-].C[N@@+]1(CCCc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 204
[Br-].C[N@@+]1(CCOc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 205
[Cl-].CCCCCCCCCCCCCO[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 206
[Cl-].CCO[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 207
[Cl-].CO[C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 208
[Cl-].CO[C@@H]([C@H](O)[C@@H](O)CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 209
[Cl-].OC[C@@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 210
[Cl-].OC[C@@H](O)[C@@H](O)[C@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 211
[Cl-].OC[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 212
[Cl-].OC[C@H](O)[C@@H](O)[C@@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 213
[Cl-].OC[C@H](O)[C@@H](O)[C@H](O)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 214
[I-].C[C@@H]1O[C@H]([C@@H]2CCC[N+]2(C)C)C[S@+]1[O-] 215
[I-].C[C@@H]1O[C@H]([C@@H]2CCC[N+]2(C)C)C[S@@+]1[O-] 216
[I-].C[C@@H]1O[C@H]([C@H]2CCC[N+]2(C)C)C[S@+]1[O-] 217
[I-].C[C@@H]1O[C@H]([C@H]2CCC[N+]2(C)C)C[S@@+]1[O-] 218
[I-].C[C@H]1O[C@@H]([C@@H]2CCC[N+]2(C)C)C[S@+]1[O-] 219
[I-].C[C@H]1O[C@@H]([C@@H]2CCC[N+]2(C)C)C[S@@+]1[O-] 220
[I-].C[N+](C)(C)C[C@@H]1C[S@+]([O-])C(C2CCCCC2)(C2CCCCC2)O1 221
[I-].C[N+](C)(C)C[C@@H]1C[S@@+]([O-])C(C2CCCCC2)(C2CCCCC2)O1 222
[I-].C[N@@+]1(CCOCc2ccccc2)C2CCC1C[C@H](CC(C#N)(c1ccccc1)c1ccccc1)C2 223
[K+].COc1ccc2[n-]c([S@@+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1 224
[Mg+2].COc1ccc2[n-]c([S@@+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1.COc1ccc2[n-]c([S@@+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1 225
[Na+].COc1ccc2[n-]c([S@@+]([O-])Cc3ncc(C)c(OC)c3C)nc2c1 226
[O-]CC[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 227
[O-]C[C@@H](O)[C@@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 228
[O-][C@@H](CO)[C@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO 229
[O-][S@+](CC1(O)CCN(CCc2c[nH]c3ccc(F)cc23)CC1)c1ccccc1 230
[O-][S@+](Cc1cc(OCC2CC2)ccn1)c1nc2cc(F)ccc2[nH]1 231
[O-][S@@+](Cc1cc(OCC2CC2)ccn1)c1nc2cc(F)ccc2[nH]1 232

I've been trying to make sense of the 232 failures. Some observations:

  • 181 structures have a chiral sulfur with a +1 charge (ChEMBL20 has 293 structures with a chiral sulfur and 230 with a +1 chiral sulfur);
  • 34 structures have a chiral nitrogen, of which 23 have a +1 charge and 11 are uncharged (ChEMBL20 has 240 records with a chiral nitrogen, of which 225 have a +1 charge and 15 have no charge);
  • all 34 chiral nitrogens are bridgeheads (I don't know how many are in ChEMBL20);
  • 14 of the carbon-only chiral structures are bridgeheads;
  • the 3 remaining carbon-only chiral structures also fail.
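
A tally like this can be roughed out directly from the SMILES text. The following is a minimal sketch rather than the code behind the numbers above: the helper name classify_chiral_atoms() and the regular expression are assumptions made for illustration, and the pattern only handles simple bracket atoms (no isotopes, atom maps, or aromatic symbols).

```python
import re
from collections import Counter

# Hypothetical helper: match a chiral bracket atom such as "[S@@+]", "[N@]",
# or "[C@@H]", capturing the element symbol and the optional charge.
# Limitations: no isotopes, atom maps, or aromatic (lowercase) symbols.
_chiral_atom_pat = re.compile(r"\[([A-Z][a-z]?)@{1,2}(?:H\d?)?([+-]\d?)?\]")

def classify_chiral_atoms(smiles):
    """Count chiral bracket atoms in a SMILES string by (element, charge)."""
    counts = Counter()
    for element, charge in _chiral_atom_pat.findall(smiles):
        # charge is "" when the atom is uncharged
        counts[(element, charge)] += 1
    return counts

# One of the failing records listed above
counts = classify_chiral_atoms("OC[C@@H](O)C[S@@+]1C[C@@H](O)[C@H](O)[C@H]1CO")
print(counts[("S", "+")], counts[("C", "")])
```

A text scan like this is only good for a first pass; a substructure search on the parsed molecule is the more reliable way to count.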

RDKit bug reports

The results of my investigations led to two RDKit bug reports.

In the first, Greg identified that FastFindRings() isn't putting the two chiral atoms into the same primitive ring, so AssignStereochemistry() isn't seeing that this is an instance of ring stereochemistry.

In the second, Greg points to the August 2015 thread titled "Stereochemistry - Differences between RDKit Indigo" in the RDKit mailing list. Greg comments about nitrogen chirality:

There are two things going on here in the RDKit: 1) Ring stereochemistry 2) stereochemistry about nitrogen centers. Let's start with the second, because it's easier: RDKit does not generally "believe in" stereochemistry around three coordinate nitrogens. ... Back to the first: ring stereochemistry. ... The way the RDKit handles this is something of a hack: it doesn't identify those atoms as chiral centers, but it does preserve the chiral tags when generating a canonical SMILES:

Need SanitizeMol(), not ClearComputedProps()

He proposes that I sanitize the newly created molecule, so near the end of fragment_chiral() I replaced the call to ClearComputedProps() with a call to SanitizeMol(), as shown here:

    # After breaking bonds, should re-sanitize or at least call
    # ClearComputedProps().
    # See https://github.com/rdkit/rdkit/issues/1022#issuecomment-239355482
    new_mol = rwmol.GetMol()
    #new_mol.ClearComputedProps()
    Chem.SanitizeMol(new_mol)
    return new_mol
With that in place, where there were 232 records which failed my test, now there are 195. All 181 of the chiral sulfurs still fail, 11 of the 34 chiral nitrogens still fail, the chiral carbon bridgeheads all pass, while the 3 remaining chiral carbons still fail.

(I also tested with both ClearComputedProps() and SanitizeMol(), but using both made no difference.)
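
To keep the bookkeeping straight, here is a small sketch that tallies the failure counts before and after the change. The numbers come from the text above; treating the four categories as non-overlapping is my assumption, though it is what the totals suggest.

```python
# Failure counts by category, taken from the numbers quoted in the essay.
# Assumption: the categories do not overlap.
before = {"chiral sulfur": 181, "chiral nitrogen": 34,
          "carbon bridgehead": 14, "other carbon": 3}
after = {"chiral sulfur": 181, "chiral nitrogen": 11,
         "carbon bridgehead": 0, "other carbon": 3}

for category in before:
    fixed = before[category] - after[category]
    print("%-18s %3d -> %3d (fixed %d)"
          % (category, before[category], after[category], fixed))

print("total", sum(before.values()), "->", sum(after.values()))
```

Only the nitrogen and carbon bridgehead rows change, which accounts for the 37 records that now pass.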

While better, it's not substantially better. What's going on?

RDKit can produce non-canonical SMILES

At this point we're pushing the edge of what RDKit can handle. A few paragraphs ago I quoted Greg as saying that ring chirality is "something of a hack". I think that's the reason why, of the 232 records that cause a problem, 67 of them don't produce a stable SMILES string. That is, if I parse what should be a canonicalized SMILES string and recanonicalize it, I get a different result. The canonicalization is bi-stable, in that recanonicalization swaps between two possibilities, with a different chirality assignment each time.

Here's a reproducible example if you want to try it out yourself:

from rdkit import Chem

def Canon(smiles):
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol, isomericSmiles=True)

def check_if_canonical(smiles):
    s1 = Canon(smiles)
    s2 = Canon(s1)
    if s1 != s2:
        print("Failed to canonicalize", smiles)
        print(" input:", smiles)
        print("canon1:", s1)
        print("canon2:", s2)
        print("canon3:", Canon(s2)) 
    else:
        print("Passed", smiles)
    
for smiles in (
    "O[C@H]1CC2CCC(C1)[N@@]2C",
    "C[C@]1(c2cccnc2)CCCC[S@+]1O",
    "[C@]1C[S@+]1O"):
    check_if_canonical(smiles)
The output from this is:
Failed to canonicalize O[C@H]1CC2CCC(C1)[N@@]2C
 input: O[C@H]1CC2CCC(C1)[N@@]2C
canon1: C[N@]1C2CCC1C[C@@H](O)C2
canon2: C[N@]1C2CCC1C[C@H](O)C2
canon3: C[N@]1C2CCC1C[C@@H](O)C2
Failed to canonicalize C[C@]1(c2cccnc2)CCCC[S@+]1O
 input: C[C@]1(c2cccnc2)CCCC[S@+]1O
canon1: C[C@]1(c2cccnc2)CCCC[S@@+]1O
canon2: C[C@]1(c2cccnc2)CCCC[S@+]1O
canon3: C[C@]1(c2cccnc2)CCCC[S@@+]1O
Failed to canonicalize [C@]1C[S@+]1O
 input: [C@]1C[S@+]1O
canon1: O[S@@+]1[C]C1
canon2: O[S@+]1[C]C1
canon3: O[S@@+]1[C]C1
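
The "bi-stable" behavior can also be detected generically by applying the canonicalizer repeatedly until an output repeats. This is a sketch under stated assumptions: canonicalization_cycle() is a hypothetical helper, not part of RDKit, and the stand-in canonicalizer below simply replays the first case from the output above so the example runs without RDKit. In practice you would pass in Canon().

```python
def canonicalization_cycle(canon, smiles, max_rounds=10):
    """Repeatedly apply canon() and return the cycle of outputs it settles into.

    A stable canonicalization gives a cycle of length 1; a bi-stable one
    gives a cycle of length 2. Returns None if nothing repeats within
    max_rounds applications.
    """
    seen = []
    current = canon(smiles)
    for _ in range(max_rounds):
        if current in seen:
            return seen[seen.index(current):]
        seen.append(current)
        current = canon(current)
    return None

# Stand-in for Canon(): a lookup table replaying the first failure shown above.
_replay = {
    "O[C@H]1CC2CCC(C1)[N@@]2C": "C[N@]1C2CCC1C[C@@H](O)C2",
    "C[N@]1C2CCC1C[C@@H](O)C2": "C[N@]1C2CCC1C[C@H](O)C2",
    "C[N@]1C2CCC1C[C@H](O)C2": "C[N@]1C2CCC1C[C@@H](O)C2",
}

cycle = canonicalization_cycle(_replay.__getitem__, "O[C@H]1CC2CCC(C1)[N@@]2C")
print(len(cycle), cycle)
```

A cycle of length 2 is the swap between two chirality assignments described above; a well-behaved canonicalizer should always give a cycle of length 1.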

Bridgeheads

Many of the failures were due to chiral bridgehead atoms. I used the following two SMARTS to detect bridgeheads:

depiction of two bridgehead topologies
*~1~*~*(~*~*~2)~*~*~2~*~1
*~1~*~*(~*~*~*~2)~*~*~2~*~1
Before I added the SanitizeMol() call, there were 34 chiral nitrogen structures which failed. Of those 34, only 11 are still failures after adding the SanitizeMol(). Of those 11, one is a normal-looking bridgehead:
a nitrogen bridgehead that has a bistable SMILES
CC(C(=O)O[C@@H]1CC2CCC(C1)[N@]2C)(c1ccccc1)c1ccccc1
It's the only one of the simple nitrogen bridgehead structures which doesn't have a stable canonicalization. (I used the core bridgehead from this structure as the first test case in the previous section, where I showed a few bi-stable SMILES strings.)

The other 10 of the 11 nitrogen bridgehead failures have a more complex ring system, like:

a complex chiral nitrogen ring system
OCC12C3[N@](Cc4cccnc4)C4C5(CO)C([N@@](Cc6cccnc6)C1C3(CO)[C@H]5c1ccccc1)C4(CO)[C@H]2c1ccccc1
All of these have a bi-stable canonicalization.

I also looked at the chiral carbon bridgeheads which failed. Of the original 14, all 14 of them pass after I added the SanitizeMol() call.

The remaining structures

There are three chiral structures which fail even after sanitization, which do not contain a chiral nitrogen or chiral sulfur, and which do not contain a bridgehead. These are:


CCC[C@H]1CC[C@H]([C@H]2CC[C@@H](OC(=O)[C@H]3[C@H](c4ccc(O)cc4)[C@H](C(=O)O[C@H]4CC[C@@H]([C@H]5CC[C@H](CCC)CC5)CC4)[C@H]3c3ccc(O)cc3)CC2)CC1
COc1cc(-C=C-OC(=O)[C@H]2CC[C@@H](N(C)[C@H]3CC[C@H](C(=O)O-C=C-c4cc(OC)c(OC)c(OC)c4)CC3)CC2)cc(OC)c1OC
COc1cc(OC(=O)[C@@H]2CC[C@H](N(C)[C@@H]3CC[C@@H](C(=O)Oc4cc(OC)c(OC)c(OC)c4)CC3)CC2)cc(OC)c1OC
Upon investigation, all three seem to involve the ring chirality solution that Greg called a "hack". I did not investigate further.

The final code

That was a lot of text. And a lot of work. If you made it this far, congratulations. Oddly, I still have more to write about on the topic.

I'll leave you with the final version of the code, with various tweaks and comments that I didn't discuss in the essay. As a bonus, it includes an implementation of fragment_chiral() which uses RDKit's FragmentOnBonds() function, which is the function you should be using to fragment bonds.

# Cut an RDKit molecule on a specified bond, and replace the old terminals with wildcard atoms ("*").
# The code includes a test suite which depends on an external SMILES file.
#
# This code is meant as a study of the low-level operations. For production use,
# see the commented out function which uses RDKit's built-in FragmentOnBonds().
#
# Written by Andrew Dalke <dalke@dalkescientific.com>.

from __future__ import print_function

from rdkit import Chem

# You can get a copy of this library from:
# http://www.dalkescientific.com/writings/diary/archive/2016/08/09/fragment_achiral_molecules.html#smiles_syntax.py
from smiles_syntax import convert_wildcards_to_closures


CHI_TETRAHEDRAL_CW = Chem.ChiralType.CHI_TETRAHEDRAL_CW
CHI_TETRAHEDRAL_CCW = Chem.ChiralType.CHI_TETRAHEDRAL_CCW

def parity_shell(values):
    # Simple exchange sort which counts the swaps (despite the function name,
    # this is not a true Shell sort); while O(N^2), we only deal with at
    # most 4 values
    values = list(values)
    N = len(values)
    num_swaps = 0
    for i in range(N-1):
        for j in range(i+1, N):
            if values[i] > values[j]:
                values[i], values[j] = values[j], values[i]
                num_swaps += 1
    return num_swaps % 2


def get_bond_parity(mol, atom_id):
    """Compute the parity of the atom's bond permutation

    Return None if it does not have tetrahedral chirality,
    0 for even parity, or 1 for odd parity.
    """
    atom_obj = mol.GetAtomWithIdx(atom_id)
    
    # Return None unless it has tetrahedral chirality
    chiral_tag = atom_obj.GetChiralTag()
    if chiral_tag not in (CHI_TETRAHEDRAL_CW, CHI_TETRAHEDRAL_CCW):
        return None
    
    # Get the list of atom ids for each atom it's bonded to.
    other_atom_ids = [bond.GetOtherAtomIdx(atom_id) for bond in atom_obj.GetBonds()]
    
    # Use those ids to determine the parity
    return parity_shell(other_atom_ids)


def set_bond_parity(mol, atom_id, old_parity, old_other_atom_id, new_other_atom_id):
    """Compute the new bond parity and flip chirality if needed to match the old parity"""
    
    atom_obj = mol.GetAtomWithIdx(atom_id)
    # Get the list of atom ids for each atom it's bonded to.
    other_atom_ids = [bond.GetOtherAtomIdx(atom_id) for bond in atom_obj.GetBonds()]
    
    # Replace id from the new wildcard atom with the id of the original atom
    i = other_atom_ids.index(new_other_atom_id)
    other_atom_ids[i] = old_other_atom_id
    
    # Use those ids to determine the parity
    new_parity = parity_shell(other_atom_ids)
    if old_parity != new_parity:
        # If the parity has changed, invert the chirality
        atom_obj.InvertChirality()

# You should really use the commented-out function below, which uses
# RDKit's own fragmentation code. Both do the same thing.

def fragment_chiral(mol, atom1, atom2):
    """Cut the bond between atom1 and atom2 and replace with connections to wildcard atoms

    Return the fragmented structure as a new molecule.
    """
    rwmol = Chem.RWMol(mol)
    
    atom1_parity = get_bond_parity(mol, atom1)
    atom2_parity = get_bond_parity(mol, atom2)
    
    rwmol.RemoveBond(atom1, atom2)
    wildcard1 = rwmol.AddAtom(Chem.Atom(0))
    wildcard2 = rwmol.AddAtom(Chem.Atom(0))
    new_bond1 = rwmol.AddBond(atom1, wildcard1, Chem.BondType.SINGLE)
    new_bond2 = rwmol.AddBond(atom2, wildcard2, Chem.BondType.SINGLE)
    
    if atom1_parity is not None:
        set_bond_parity(rwmol, atom1, atom1_parity, atom2, wildcard1)
    if atom2_parity is not None:
        set_bond_parity(rwmol, atom2, atom2_parity, atom1, wildcard2)
    
    # After breaking bonds, should re-sanitize
    # See https://github.com/rdkit/rdkit/issues/1022#issuecomment-239355482
    # However, I didn't see much of an improvement, except for chiral
    # carbon bridgeheads.
    new_mol = rwmol.GetMol()
    Chem.SanitizeMol(new_mol)
    return new_mol

#### Use this code for production
## def fragment_chiral(mol, atom1, atom2):
##     bond = mol.GetBondBetweenAtoms(atom1, atom2)
##     new_mol = Chem.FragmentOnBonds(mol, [bond.GetIdx()], dummyLabels=[(0, 0)])
##     # After breaking bonds, should re-sanitize
##     # See https://github.com/rdkit/rdkit/issues/1022#issuecomment-239355482
##     # However, I didn't see much of an improvement, except for chiral
##     # carbon bridgeheads.
##     Chem.SanitizeMol(new_mol)
##     return new_mol


##### ##### ##### ##### Test code ##### ##### ##### ##### #####

# Create a canonical isomeric SMILES from a SMILES string
# Used to put the manually-developed reference structures into canonical form.
def Canon(smiles):
    mol = Chem.MolFromSmiles(smiles)
    assert mol is not None, smiles
    return Chem.MolToSmiles(mol, isomericSmiles=True)
        
def simple_test():
    for smiles, expected in (
            ("CC", Canon("*C.*C")),
            ("F[C@](Cl)(Br)O", Canon("*F.*[C@](Cl)(Br)O")),
            ("F[C@@](Cl)(Br)O", Canon("*F.*[C@@](Cl)(Br)O")),
            ("F[C@@H](Br)O", Canon("*F.*[C@@H](Br)O")),
            ):
        mol = Chem.MolFromSmiles(smiles)
        fragmented_mol = fragment_chiral(mol, 0, 1)
        fragmented_smiles = Chem.MolToSmiles(fragmented_mol, isomericSmiles=True)
        if fragmented_smiles != expected:
            print("smiles:", smiles)
            print("fragmented:", fragmented_smiles)
            print("  expected:", expected)

# Match a single bond not in a ring
BOND_SMARTS = "[!#0;!#1]-!@[!#0;!#1]"
single_bond_pat = Chem.MolFromSmarts(BOND_SMARTS)


_bridgehead1_pat = Chem.MolFromSmarts("*~1~*~*(~*~*~*~2)~*~*~2~*~1")
_bridgehead2_pat = Chem.MolFromSmarts("*~1~*~*(~*~*~2)~*~*~2~*~1")

def is_bridgehead(mol):
    """Test if the molecule contains one of the bridgehead patterns"""
    return (mol.HasSubstructMatch(_bridgehead1_pat) or
            mol.HasSubstructMatch(_bridgehead2_pat))

def file_test():
    # Point this to a SMILES file to test
    filename = "/Users/dalke/databases/chembl_20_rdkit.smi"
    
    with open(filename) as infile:
        num_records = num_successes = num_failures = 0
        for lineno, line in enumerate(infile):
            # Give some progress feedback
            if lineno % 100 == 0:
                print("Processed", lineno, "lines and", num_records,
                      "records. Successes:", num_successes,
                      "Failures:", num_failures)
            
            # Only test structures with a chiral atom
            input_smiles = line.split()[0]
            if "@" not in input_smiles:
                continue
            
            # The code doesn't handle directional bonds. Convert them
            # to single bonds
            if "/" in input_smiles:
                input_smiles = input_smiles.replace("/", "-")
            if "\\" in input_smiles:
                input_smiles = input_smiles.replace("\\", "-")
            
            mol = Chem.MolFromSmiles(input_smiles)
            if mol is None:
                continue
            
            ### Uncomment as appropriate
            if is_bridgehead(mol):
                pass
                #continue
            else:
                pass
                continue
            num_records += 1
            
            # I expect the reassembled structure to match this canonical SMILES
            expected_smiles = Chem.MolToSmiles(mol, isomericSmiles=True)
            
            # Cut each of the non-ring single bonds between two heavy atoms
            matches = mol.GetSubstructMatches(single_bond_pat)
            has_failure = False
            for begin_atom, end_atom in matches:
                # Fragment
                fragmented_mol = fragment_chiral(mol, begin_atom, end_atom)
                fragmented_smiles = Chem.MolToSmiles(fragmented_mol, isomericSmiles=True)
                assert "." in fragmented_smiles, fragmented_smiles # safety check
                
                # Convert the "*"s to the correct "%90" closures
                closure_smiles = convert_wildcards_to_closures(fragmented_smiles, (0, 0))
                assert "%90" in closure_smiles, closure_smiles # safety check
                closure_mol = Chem.MolFromSmiles(closure_smiles)
                
                # canonicalize and compare; report any mismatches
                final_smiles = Chem.MolToSmiles(closure_mol, isomericSmiles=True)
                if final_smiles != expected_smiles:
                    print("FAILURE in record", num_records+1)
                    print("     input_smiles:", input_smiles)
                    print("  begin/end atoms:", begin_atom, end_atom)
                    print("fragmented smiles:", fragmented_smiles)
                    print("   closure smiles:", closure_smiles)
                    print("     final smiles:", final_smiles)
                    print("  expected smiles:", expected_smiles)
                    has_failure = True
            if has_failure:
                num_failures += 1
            else:
                num_successes += 1
                #print("SUCCESS", input_smiles)
        
        print("Done. Records:", num_records, "Successes:", num_successes,
              "Failures:", num_failures)
            
if __name__ == "__main__":
    simple_test()
    file_test()
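
As a cross-check on the swap-counting parity used in parity_shell(), here is a minimal sketch that compares it against a second method based on cycle counting. The function names parity_by_swaps() and parity_by_cycles() are hypothetical, and the cycle-counting version is an alternative I'm introducing for comparison, not something the code above uses. It relies on the fact that a permutation of N distinct items with c cycles has parity (N - c) mod 2.

```python
from itertools import permutations

def parity_by_swaps(values):
    # Same idea as parity_shell(): sort by exchanges, count the swaps mod 2.
    values = list(values)
    num_swaps = 0
    for i in range(len(values) - 1):
        for j in range(i + 1, len(values)):
            if values[i] > values[j]:
                values[i], values[j] = values[j], values[i]
                num_swaps += 1
    return num_swaps % 2

def parity_by_cycles(values):
    # Hypothetical alternative: count the cycles of the permutation which
    # maps each position to its position in sorted order.
    # Assumes the values are distinct, as atom indices are.
    ranked = sorted(values)
    perm = [ranked.index(v) for v in values]
    seen = [False] * len(perm)
    num_cycles = 0
    for start in range(len(perm)):
        if seen[start]:
            continue
        num_cycles += 1
        i = start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
    return (len(perm) - num_cycles) % 2

# Every ordering of four neighbor atom ids should give the same answer.
for ordering in permutations([0, 2, 3, 5]):
    assert parity_by_swaps(ordering) == parity_by_cycles(ordering)

# The connection table from the start of the essay: after the cut, atom 1's
# neighbors are 0, 2, 5, 11, where 11 stands in for the old neighbor 3.
print(parity_by_swaps([0, 2, 3, 5]), parity_by_swaps([0, 2, 5, 3]))
```

The two parities differ for the original and modified neighbor lists, which is the case where set_bond_parity() calls InvertChirality().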

Thanks!

Thanks to Greg Landrum both for RDKit and for help in tracking down some of the stubborn cases. Thanks also to the University of Hamburg for SMARTSViewer, which I use as a SMILES structure viewer so I don't have to worry about bond type, aromaticity, or chiral re-interpretations.

