Next-Generation Organic Chemistry Toolkit from SciTouch LLC
We tested the 64-bit version of Bingo on a slice of the PubChem database containing about 20 million structures in Molfile format. The test configuration was as follows:
| Component | Specification |
|---|---|
| Database | Oracle 11 |
| Operating System | 64-bit Debian Linux |
| CPU | 2.66 GHz Intel Core 2 Duo E8200 |
| RAM | 4 GB |
| HDD | 400 GB |
Converting the table to the compact format (Bingo.CompactMolecule) took about 19 hours. Indexing the converted table with the FP_TAU_SIZE=0 parameter and the default threads setting took about 58 hours, so the total time was about 77 hours. We did not test the performance of tautomer substructure search this time.
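To express the loading stages in the same units as the canonical SMILES rate quoted below, the stage durations can be converted to average throughput. This is a minimal Python sketch that assumes exactly 20,000,000 structures (the text says "about 20 million") and treats the reported hours as wall-clock time; `molecules_per_second` is a hypothetical helper, not part of Bingo.

```python
# Approximate per-stage throughput for the Bingo loading steps,
# using the rounded durations reported above.
N_STRUCTURES = 20_000_000  # assumption: "about 20 million" taken as exactly 20M

stage_hours = {
    "compact conversion (Bingo.CompactMolecule)": 19,
    "indexing (FP_TAU_SIZE=0)": 58,
}


def molecules_per_second(n: int, hours: float) -> float:
    """Average rate for processing n molecules in the given wall-clock hours."""
    return n / (hours * 3600)


for stage, hours in stage_hours.items():
    rate = molecules_per_second(N_STRUCTURES, hours)
    print(f"{stage}: ~{rate:.0f} molecules/s")
```

With these figures the sketch gives roughly 292 molecules per second for conversion and 96 molecules per second for indexing. These are whole-run averages and say nothing about how many threads were active.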
For each query, we measured the time to fetch the first 100 hits, the time to fetch all hits, and the total number of hits.
| Query SMILES | First 100 hits (s) | All hits (s) | Number of hits |
|---|---|---|---|
| N1C=NN=C1C1=NC=CN=C1 | 0.6 | 11.95 | 942 |
| CCC1=NSC(NCCO)=N1 | 1.22 | 4.46 | 286 |
| CC1=C(C=C(SO)C=C1)[S]=O | 1.91 | 12.69 | 860 |
| CC1=CCNC(NC2=CC=CC=C2C)=N1 | 3.99 | 3.99 | 73 |
| CCOC(=O)C(C)CC1=CC=CC(Br)=C1 | 0.64 | 5.35 | 761 |
| COCC1=C(SN=N1)SC1=CC=CC=C1 | 1.06 | 1.06 | 20 |
| CCCCP(CC)CCC | 0.58 | 35.07 | 4957 |
| OP(O)(=O)CC1=CC=CC=C1 | 0.56 | 47.82 | 10836 |
| CC1=CC(=CC=C1)N1C=NN=N1 | 0.64 | 28.79 | 8538 |
| CC(C)CC1=CC(CF)=CC=C1 | 2.69 | 40.5 | 2880 |
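One possible way to collect the three quantities in the table is to start a timer, issue the query, and drain the results while noting when the 100th hit arrives. The Python sketch below is illustrative only: `time_hits` is a hypothetical helper, and `run_query` stands for any callable that issues a search and returns an iterable of hits (for example, an iterable database cursor).

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_hits(run_query: Callable[[], Iterable], first_n: int = 100) -> Tuple[float, float, int]:
    """Issue a query and drain its hits.

    Returns (seconds to the first `first_n` hits, seconds to all hits, hit count).
    If fewer than `first_n` hits are returned, the first-N time equals the
    total time.
    """
    start = time.perf_counter()
    first_n_time: Optional[float] = None
    count = 0
    # The timer starts before run_query() is called, so query start-up is included.
    for _ in run_query():
        count += 1
        if count == first_n:
            first_n_time = time.perf_counter() - start
    total_time = time.perf_counter() - start
    if first_n_time is None:  # fewer than first_n hits were returned
        first_n_time = total_time
    return first_n_time, total_time, count


# Example with a stand-in "query" that yields 73 fake hits.
first, total, n = time_hits(lambda: range(73))
print(f"first {min(n, 100)} hits: {first:.6f} s, all {n} hits: {total:.6f} s")
```

Reporting the total time as the first-100 time when fewer than 100 hits come back matches rows such as the 73-hit and 20-hit queries, where both time columns are equal.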
Computing canonical SMILES with the Bingo.CanSMILES() operator for the 20-million-structure slice of the PubChem database took about 27 hours, or approximately 200 molecules per second.
One molecule took 38 seconds to canonicalize, three others took 12, 8, and 4 seconds, and every remaining molecule took less than 3 seconds. Approximately 0.0001% of molecules took more than 1 second, and approximately 4.3% took more than 0.01 seconds.
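The throughput and tail figures above can be cross-checked with a few lines of arithmetic. This Python sketch assumes exactly 20,000,000 molecules and 27 hours of wall-clock time; the percentages are the approximate values from the text, so the resulting counts are estimates.

```python
# Cross-check the canonical SMILES figures reported for Bingo.CanSMILES().
N = 20_000_000  # assumption: "20-million slice" taken as exactly 20M molecules
HOURS = 27      # reported wall-clock time

rate = N / (HOURS * 3600)   # average molecules per second
mean_ms = 1000 / rate       # average milliseconds per molecule

over_1s = N * 0.0001 / 100  # "approximately 0.0001%" of molecules
over_10ms = N * 4.3 / 100   # "approximately 4.3%" of molecules

print(f"throughput: ~{rate:.0f} molecules/s")
print(f"mean time: ~{mean_ms:.2f} ms per molecule")
print(f">1 s: ~{over_1s:.0f} molecules; >0.01 s: ~{over_10ms:,.0f} molecules")
```

It gives about 206 molecules per second and a mean of about 4.86 ms per molecule, consistent with the quoted rate of roughly 200 molecules per second. The 0.0001% tail corresponds to about 20 molecules, and the 4.3% tail to about 860,000.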