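import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem

# SMILES.csv needs a header line; the first column is read as the library name
df = pd.read_csv('SMILES.csv')
mols = [Chem.MolFromSmiles(smi) for smi in df.SMILES]
# Add explicit hydrogens before generating 3D coordinates
hmols = [Chem.AddHs(m) for m in mols]
for mol in hmols:
    # Embed one 3D conformer with ETKDG, then relax it with the UFF force field
    AllChem.EmbedMolecule(mol, AllChem.ETKDG())
    # UFFOptimizeMolecule returns 0 if it converged, 1 if more iterations were needed
    print(AllChem.UFFOptimizeMolecule(mol, 1000))
smiles = list(df.SMILES)
sid = list(df.SOURCE_ID)
libs = df[df.columns[0]]  # library names come from the first column, by position
writer = Chem.SDWriter('TEST.sdf')
for n in range(len(libs)):
    # "_Name" becomes the title line of each record. Properties whose names start
    # with "_" are private in RDKit and are not written as SD data fields, so the
    # remaining properties use plain names.
    hmols[n].SetProp("Library", "%s" % libs[n])
    hmols[n].SetProp("_Name", "%s" % sid[n])
    hmols[n].SetProp("SourceID", "%s" % sid[n])
    hmols[n].SetProp("SMILES", "%s" % smiles[n])
    writer.write(hmols[n])
writer.close()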
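The script stops with an exception if any SMILES string fails to parse (Chem.AddHs receives None) or if ETKDG embedding fails (UFF optimization then has no conformer to work on). Below is a minimal sketch of a more defensive variant that skips such entries instead. It assumes the same LIBRARY/SMILES/SOURCE_ID layout; the helper names embed_and_optimize and write_sdf are hypothetical, records are assumed to be (library, source_id, smiles) tuples, and the fixed random seed is an arbitrary choice for reproducible coordinates.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def embed_and_optimize(smiles, seed=42, max_iters=1000):
    """Hypothetical helper: return a 3D, UFF-optimized molecule with explicit Hs, or None."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None for SMILES it cannot parse
        return None
    mol = Chem.AddHs(mol)
    params = AllChem.ETKDG()
    params.randomSeed = seed  # fixed seed so repeated runs give the same coordinates
    if AllChem.EmbedMolecule(mol, params) == -1:  # -1 signals that embedding failed
        return None
    AllChem.UFFOptimizeMolecule(mol, maxIters=max_iters)
    return mol


def write_sdf(records, path):
    """Hypothetical helper: write (library, source_id, smiles) records; return the count written."""
    writer = Chem.SDWriter(path)
    written = 0
    for library, source_id, smiles in records:
        mol = embed_and_optimize(smiles)
        if mol is None:
            print(f"skipping {source_id}: could not build a 3D structure")
            continue
        mol.SetProp("_Name", str(source_id))  # stored as the title line of the record
        # Plain (non-underscore) names so SDWriter writes them as data fields
        mol.SetProp("Library", str(library))
        mol.SetProp("SourceID", str(source_id))
        mol.SetProp("SMILES", smiles)
        writer.write(mol)
        written += 1
    writer.close()
    return written
```

With the DataFrame from the script, a call could look like `write_sdf(zip(df[df.columns[0]], df.SOURCE_ID, df.SMILES), 'TEST.sdf')`.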
CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1
I tried the script with the SMILES file above but got this error:
mols = [Chem.MolFromSmiles(smi) for smi in df.SMILES]
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/lib64/python3.9/site-packages/pandas/core/generic.py", line 5487, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'SMILES'
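Your SMILES.csv file needs a header line. pandas.read_csv uses the first line of the file as the column names, so without a header the SMILES string itself becomes the column name and df.SMILES does not exist. The file content should be: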
SMILES
CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1
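A small sketch of why the header matters, using io.StringIO as a stand-in for SMILES.csv (the in-memory file is only for illustration):

```python
import io

import pandas as pd

smiles = "CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1"

# No header line: read_csv takes the first row as the column names,
# so the SMILES string becomes the column name and no data rows remain.
no_header = pd.read_csv(io.StringIO(smiles + "\n"))
print(list(no_header.columns), len(no_header))

# With a "SMILES" header line, df.SMILES resolves to the data column.
with_header = pd.read_csv(io.StringIO("SMILES\n" + smiles + "\n"))
print(with_header.SMILES.tolist())
```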
Update: the change above only covers the 'SMILES' column. The script also reads a 'SOURCE_ID' column and a first column whose name is not known to us but which appears to hold the library name of each compound. So for the code to work, 'SMILES.csv' should look like:
LIBRARY,SMILES,SOURCE_ID
Pubchem,CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1,11234
Here 'Pubchem' and '11234' are only example values for LIBRARY and SOURCE_ID. The library column has to come first, because the script selects it by position (df[df.columns[0]]); the other two columns are selected by name and can appear in any order.
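To illustrate the column-order rule, here is a sketch with the same example row but LIBRARY moved out of first place (an in-memory CSV, for illustration only). Columns read by name still work, while the positional lookup picks the wrong column:

```python
import io

import pandas as pd

csv_text = (
    "SMILES,LIBRARY,SOURCE_ID\n"
    "CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1,Pubchem,11234\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# SOURCE_ID is looked up by name, so its position does not matter.
print(df.SOURCE_ID.tolist())

# The script takes the library column by position, so here it gets SMILES instead.
libs = df[df.columns[0]]
print(libs.name)

# Selecting by name would not depend on column order.
print(df["LIBRARY"].tolist())
```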
If you are comfortable with pandas and the CSV format, you can instead change the original code to match the layout of your own CSV file.
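As one example of adapting the input step rather than the file, here is a sketch that assumes a headerless file containing only SMILES strings. It reads the file with header=None and names=, then adds the LIBRARY and SOURCE_ID columns the script expects. The placeholder values ('unknown', 'mol0', 'mol1', ...) are arbitrary assumptions, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical input: one SMILES per line, no header, no library or ID columns.
csv_text = "CC1N=C(c2cc3ccccc3o2)N=C1c1ccco1\nc1ccccc1O\n"

# header=None keeps the first SMILES as data; names= gives the column the name the script uses.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["SMILES"])

# Add placeholder columns; LIBRARY is inserted at position 0 because the script reads it by position.
df.insert(0, "LIBRARY", "unknown")
df["SOURCE_ID"] = [f"mol{i}" for i in range(len(df))]
print(df.columns.tolist())
```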
Thanks!