Bingo: Performance Measures

Testing configuration

We tested the 64-bit version of Bingo on a slice of the PubChem database containing about 20 million structures in Molfile format. The test configuration was as follows:

Database: Oracle 11
Operating system: 64-bit Debian Linux
CPU: 2.66 GHz Intel Core 2 Duo E8200
RAM: 4 GB
HDD: 400 GB

Substructure Search over PubChem Database

Indexing

Converting the table to the compact format (Bingo.CompactMolecule) took about 19 hours. Indexing the converted table with the FP_TAU_SIZE=0 parameter and the default threads setting took about 58 hours, so the total time was about 77 hours. We did not test the performance of tautomer substructure search this time.
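To put the indexing times in perspective, the sketch below converts them into average throughput. It assumes exactly 20 million structures (the text says "about") and treats each phase as processing every structure once; `rate_per_second` is a hypothetical helper used only for illustration.

```python
# Rough throughput implied by the reported indexing times.
# Assumption: exactly 20 million structures, each processed once per phase.
STRUCTURES = 20_000_000
SECONDS_PER_HOUR = 3600


def rate_per_second(count: int, hours: float) -> float:
    """Average number of items processed per second over `hours` (hypothetical helper)."""
    return count / (hours * SECONDS_PER_HOUR)


compact_rate = rate_per_second(STRUCTURES, 19)  # Bingo.CompactMolecule conversion
index_rate = rate_per_second(STRUCTURES, 58)    # index build with FP_TAU_SIZE=0

print(f"compaction: ~{compact_rate:.0f} structures/s")
print(f"indexing:   ~{index_rate:.0f} structures/s")
```

Under these assumptions, compaction ran at roughly 292 structures per second and index building at roughly 96, so the index build accounts for most of the total time.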

Search

We measured the time to fetch the first 100 hits, the time to fetch all hits, and the total number of hits.

Query SMILES                    First 100 hits (s)   All hits (s)   Number of hits
N1C=NN=C1C1=NC=CN=C1            0.6                  11.95          942
CCC1=NSC(NCCO)=N1               1.22                 4.46           286
CC1=C(C=C(SO)C=C1)[S]=O         1.91                 12.69          860
CC1=CCNC(NC2=CC=CC=C2C)=N1      3.99                 3.99           73
CCOC(=O)C(C)CC1=CC=CC(Br)=C1    0.64                 5.35           761
COCC1=C(SN=N1)SC1=CC=CC=C1      1.06                 1.06           20
CCCCP(CC)CCC                    0.58                 35.07          4957
OP(O)(=O)CC1=CC=CC=C1           0.56                 47.82          10836
CC1=CC(=CC=C1)N1C=NN=N1         0.64                 28.79          8538
CC(C)CC1=CC(CF)=CC=C1           2.69                 40.5           2880
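The measurement procedure described for this table can be sketched as a small Python timing helper that consumes any iterable of hits, such as a DB-API cursor. This is a minimal illustration, not the harness used for these measurements. The function name `time_hits` is hypothetical, and the commented Oracle usage assumes the `python-oracledb` driver, a placeholder table `pubchem` with a Molfile column `molfile`, and Bingo's substructure operator written as `bingo.sub(column, query) = 1`.

```python
import time
from typing import Iterable, Optional, Tuple


def time_hits(rows: Iterable, first_k: int = 100) -> Tuple[float, float, int]:
    """Consume `rows`; return (seconds to first_k hits, seconds to all hits, hit count).

    If fewer than first_k hits arrive, the "first" time equals the total time.
    """
    start = time.perf_counter()
    first_time: Optional[float] = None
    count = 0
    for _ in rows:
        count += 1
        if count == first_k:
            # Elapsed time at the moment the first_k-th hit was received.
            first_time = time.perf_counter() - start
    total_time = time.perf_counter() - start
    if first_time is None:
        first_time = total_time
    return first_time, total_time, count


# Hypothetical usage against Oracle; table and column names are placeholders,
# and the bingo.sub(...) = 1 form of the substructure operator is assumed:
#
#   import oracledb
#   with oracledb.connect(user="...", password="...", dsn="...") as conn:
#       cur = conn.cursor()
#       cur.execute("SELECT id FROM pubchem WHERE bingo.sub(molfile, :q) = 1",
#                   q="CCCCP(CC)CCC")
#       print(time_hits(cur))

if __name__ == "__main__":
    # Stand-in for a cursor: any iterable of hits works.
    print(time_hits(range(942)))
```

When a query returns fewer than 100 hits, both times coincide, which is what the rows with 73 and 20 hits show.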

Canonical SMILES computation

Computing canonical SMILES with the Bingo.CanSMILES() operator for the 20-million-structure slice of the PubChem database took about 27 hours, or approximately 200 molecules per second. One molecule took 38 seconds to canonicalize, three others took 12, 8, and 4 seconds, and every other molecule took less than 3 seconds. Approximately 0.0001% of molecules took more than 1 second, and approximately 4.3% took more than 0.01 seconds.
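The figures in the previous paragraph can be cross-checked with a few lines of arithmetic. This sketch assumes exactly 20 million molecules and exactly 27 hours; both are approximations in the text.

```python
# Back-of-the-envelope check of the canonical SMILES figures.
# Assumption: exactly 20 million molecules processed in exactly 27 hours.
STRUCTURES = 20_000_000
HOURS = 27

total_seconds = HOURS * 3600
rate = STRUCTURES / total_seconds            # average molecules per second
mean_ms = total_seconds * 1000 / STRUCTURES  # average milliseconds per molecule
over_1s = STRUCTURES * 0.0001 / 100          # "approximately 0.0001%" of the slice
over_10ms = STRUCTURES * 4.3 / 100           # "approximately 4.3%" of the slice

print(f"~{rate:.0f} molecules/s, ~{mean_ms:.2f} ms per molecule on average")
print(f"~{over_1s:.0f} molecules over 1 s, ~{over_10ms:,.0f} over 0.01 s")
```

The result of roughly 206 molecules per second agrees with the quoted 200. About 20 molecules above 1 second is consistent with the four named multi-second outliers. The mean of about 4.86 ms per molecule lies below the 0.01 s threshold, which suggests that the 4.3% of slower molecules form a tail rather than the typical case.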

 
bingo/performance.txt · Last modified: 2009/10/29 02:45 by root
 
 
This site belongs to SciTouch LLC. Contact us at info@scitouch.net if you have questions or feedback.