Cano: Overview

Cano is an open-source library for canonical SMILES and InChI code computation.

Canonical SMILES generated by Cano are, in Daylight and ChemAxon terminology, unique SMILES with isomeric information, also called absolute SMILES. All significant molecule features, such as isotopes, charges, radicals, stereocenters, stereogroups, cis-trans bonds, and aromaticity, are encoded into the SMILES in canonical form. A canonical SMILES string defines the molecule independently of any particular representation (atom numbering, stereogroup numbering, explicit or implicit hydrogens). Consequently, two molecules are the same if and only if their canonical SMILES are equal.

Note: 'Useless' stereocenters are ignored in both the canonical SMILES and the InChI codes generated by Cano. A stereocenter is considered useless when it gives no information for distinguishing stereoisomers. See the examples below.

InChI support in Cano is preliminary and does not yet fully conform to the official IUPAC InChI implementation. The following InChI layers are included:

  • Gross formula
  • Connection table
  • Hydrogens (without mobile hydrogens yet)
  • Cis-trans stereochemistry
  • Tetrahedral stereochemistry
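In an InChI code, the layers listed above appear as slash-separated fields: the gross formula follows the version string, and each later layer begins with a one-letter prefix (c for the connection table, h for hydrogens, b for cis-trans double bonds, t for tetrahedral centers, with m and s qualifying the t layer). The Python sketch below splits a code into these fields. It is an illustration only, not part of Cano: the helper name split_inchi is hypothetical, and it assumes each prefix occurs once, which holds for the main layer of a standard InChI but not for fixed-H (/f) or reconnected (/r) sublayers.

```python
from __future__ import annotations


def split_inchi(inchi: str) -> tuple[str, str, dict[str, str]]:
    """Split an InChI code into (version, gross formula, {prefix: layer body}).

    Hypothetical helper: assumes every layer prefix occurs at most once,
    which is not true for fixed-H (/f) or reconnected (/r) sublayers.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI code: " + inchi)
    version, formula, *rest = inchi[len("InChI="):].split("/")
    layers: dict[str, str] = {}
    for field in rest:
        if not field:
            continue  # tolerate an empty field such as a trailing slash
        prefix, body = field[0], field[1:]
        if prefix in layers:
            raise ValueError("repeated layer prefix: " + prefix)
        layers[prefix] = body
    return version, formula, layers


# Cano's benzene code from the InChI examples on this page.
version, formula, layers = split_inchi("InChI=0.2Indigo/C6H6/c1-2-4-6-5-3-1/h1-6H")
print(version, formula, layers)
# 0.2Indigo C6H6 {'c': '1-2-4-6-5-3-1', 'h': '1-6H'}
```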

One notable difference between Cano InChI and IUPAC InChI is that Cano does not mark a stereocenter with '?' in the tetrahedral stereochemistry layer when the stereocenter is unspecified. A benefit of this decision is that Cano produces the same InChI code for molecules whose 'useless' stereocenters are specified differently. See the examples below.
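To relate this convention to IUPAC output, one can strip the '?' entries from an IUPAC /t layer. The Python sketch below does only that textual step; the helper name drop_undefined is hypothetical, and it assumes a single-component structure so that the layer contains no ';' separators. Because it does not detect useless stereocenters, it reproduces Cano's layer in some of the examples on this page but not in all of them.

```python
def drop_undefined(t_layer: str) -> str:
    """Remove stereocenters marked '?' from the body of an InChI /t layer.

    Hypothetical helper. Assumes a single-component structure, i.e. the
    layer body contains no ';' component separators.
    """
    kept = [entry for entry in t_layer.split(",") if not entry.endswith("?")]
    return ",".join(kept)


# IUPAC /t layer of the second InChI example: the result equals Cano's "13-".
print(drop_undefined("13-,14?,15?,16?,17?,20?"))  # 13-

# IUPAC /t layer of the first useless-stereocenter example: Cano reports only
# "9-", presumably because it also drops center 14 as useless.
print(drop_undefined("9-,12?,13?,14-"))  # 9-,14-
```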

Portability

Cano is written in portable C++ and supports the Linux, Windows and Mac OS X operating systems. No third-party components are used.

Cano exposes a C interface to applications. For Windows, there is also Cano.Net, a C# wrapper. See .NET Library Reference for details.

A command-line utility based on Cano is provided. See Command-line Reference for details.

All Cano operations are thread-safe, so Cano can safely be used in multi-threaded applications.

Input Formats

Note: Query features are not supported for canonicalization.

Daylight formats with ChemAxon extensions

Almost all features of the original Daylight SMILES format are supported, including:

  • Aromatic rings
  • Tetrahedral stereocenters
  • Cis-trans double bonds

The only features that are not supported are:

  • Non-tetrahedral stereocenters: allene-like, square-planar, trigonal-bipyramidal, octahedral

The following ChemAxon SMILES extensions are supported:

  • Atom aliases
  • Radical numbers: monovalent, divalent singlet, and divalent triplet

MDL formats

MDL (Symyx) Molfiles are supported. Almost all format features are supported, including:

  • Charges, radicals, isotopes, and abnormal valences
  • Chiral centers and stereogroups
  • Pseudo-atoms (atom aliases)

The only features that are not supported are:

  • SGroups and polymers

Other features

  • Automatic detection of input formats
  • Options to turn off perception of aromatic rings (AROMATICITY), tetrahedral stereocenters (TETRAHEDRAL), and cis-trans bonds (CISTRANS)

Examples

Canonical SMILES with various options

Each input SMILES below is listed with the canonical SMILES that Cano produces under each parameter setting.

Input SMILES: C1C=CC=CC=1
  • +AROMATICITY: c1ccccc1
  • -AROMATICITY: C1=CC=CC=C1

Input SMILES: C([H])1C([H])=C([H])C([H])=C([H])C([H])=1
  • +AROMATICITY: c1ccccc1
  • -AROMATICITY: C1=CC=CC=C1

Input SMILES: N1(C(SCC1C(=O)N[C@@H](CCO)C)C1CC2CCC1C2)C(CN(CC)C)=O |a:8|
  • +TETRAHEDRAL: CN(CC(=O)N1C(CSC1C1CC2CC1CC2)C(=O)N[C@H](C)CCO)CC |a:20|
  • -TETRAHEDRAL: CN(CC(=O)N1C(CSC1C1CC2CC1CC2)C(=O)NC(C)CCO)CC

Input SMILES: C(NCCNC(=O)/C=C/C(O)=O)(=O)OC(C)(C)C
  • +CISTRANS: CC(C)(C)OC(=O)NCCNC(=O)/C=C/C(O)=O
  • -CISTRANS: CC(C)(C)OC(=O)NCCNC(=O)C=CC(O)=O

InChI

The examples below compare the InChI codes produced by Cano with those produced by the IUPAC software. The prefix “InChI=0.2Indigo” (instead of “InChI=1S”) emphasizes that the implementation currently differs from the standard.

Input SMILES: C1C=CC=CC=1
  • Cano InChI: InChI=0.2Indigo/C6H6/c1-2-4-6-5-3-1/h1-6H
  • IUPAC InChI: InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H

Input SMILES: N1(C(SCC1C(=O)N[C@@H](CCO)C)C1CC2CCC1C2)C(CN(CC)C)=O
  • Cano InChI: InChI=0.2Indigo/C20H35N3O3S/c1-4-22(3)11-18(25)23-17(19(26)21-13(2)7-8-24)12-27-20(23)16-10-14-5-6-15(16)9-14/h13-17,20-21,24H,4-12H2,1-3H3/t13-/m1/s1
  • IUPAC InChI: InChI=1S/C20H35N3O3S/c1-4-22(3)11-18(25)23-17(19(26)21-13(2)7-8-24)12-27-20(23)16-10-14-5-6-15(16)9-14/h13-17,20,24H,4-12H2,1-3H3,(H,21,26)/t13-,14?,15?,16?,17?,20?/m1/s1

Input SMILES: C(NCCNC(=O)/C=C/C(O)=O)(=O)OC(C)(C)C
  • Cano InChI: InChI=0.2Indigo/C11H18N2O5/c1-11(2,3)18-10(17)13-7-6-12-8(14)4-5-9(15)16/h4-5,12-13,15H,6-7H2,1-3H3/b5-4+
  • IUPAC InChI: InChI=1S/C11H18N2O5/c1-11(2,3)18-10(17)13-7-6-12-8(14)4-5-9(15)16/h4-5H,6-7H2,1-3H3,(H,12,14)(H,13,17)(H,15,16)/b5-4+

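The gross formula, connection table, and cis-trans layers of Cano InChI match the corresponding layers of IUPAC InChI. The hydrogen layers differ because Cano does not yet handle mobile hydrogens, which IUPAC InChI lists in parenthesized groups such as (H,21,26).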
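The layer-by-layer comparison described above can be checked mechanically. The Python sketch below is an illustration, not a Cano facility: the helpers layer_map and compare_layers are hypothetical, the gross formula is stored under the made-up key "formula", and each layer prefix is assumed to occur only once. It runs on the third InChI example above.

```python
from __future__ import annotations


def layer_map(inchi: str) -> dict[str, str]:
    """Map InChI layers to their bodies; the gross formula is keyed "formula".

    Hypothetical helper; assumes each layer prefix occurs at most once.
    """
    _version, formula, *rest = inchi.split("/")
    layers = {"formula": formula}
    for field in rest:
        if field:
            layers[field[0]] = field[1:]
    return layers


def compare_layers(a: str, b: str) -> dict[str, bool]:
    """For each layer present in either code, report whether both codes agree."""
    la, lb = layer_map(a), layer_map(b)
    return {key: la.get(key) == lb.get(key) for key in la.keys() | lb.keys()}


# Third InChI example from this page.
cano = ("InChI=0.2Indigo/C11H18N2O5/c1-11(2,3)18-10(17)13-7-6-12-8(14)4-5-9(15)16"
        "/h4-5,12-13,15H,6-7H2,1-3H3/b5-4+")
iupac = ("InChI=1S/C11H18N2O5/c1-11(2,3)18-10(17)13-7-6-12-8(14)4-5-9(15)16"
         "/h4-5H,6-7H2,1-3H3,(H,12,14)(H,13,17)(H,15,16)/b5-4+")
for key, same in sorted(compare_layers(cano, iupac).items()):
    print(key, "match" if same else "differs")
# b match
# c match
# formula match
# h differs
```

A layer present in only one of the two codes counts as a mismatch, which is how a '?'-only tetrahedral layer in IUPAC output would show up against a Cano code without one.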

Useless stereocenters

As the pictures below show, all three molecules specify the same mixture. This is reflected in the fact that Cano produces identical SMILES and InChI codes for all three molecules.

Molecule 1:
  • Canonical SMILES: C[C@@H]1CC(C(=O)N1)1N2CC(C)3CN1CC(C)(C2)C3=O
  • Cano InChI: InChI=0.2Indigo/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9,15H,4-8H2,1-3H3/t9-/m1/s1
  • IUPAC InChI: InChI=1S/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9H,4-8H2,1-3H3,(H,15,19)/t9-,12?,13?,14-/m1/s1

Molecule 2:
  • Canonical SMILES: C[C@@H]1CC(C(=O)N1)1N2CC(C)3CN1CC(C)(C2)C3=O
  • Cano InChI: InChI=0.2Indigo/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9,15H,4-8H2,1-3H3/t9-/m1/s1
  • IUPAC InChI: InChI=1S/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9H,4-8H2,1-3H3,(H,15,19)/t9-,12?,13?,14-/m1/s1

Molecule 3:
  • Canonical SMILES: C[C@@H]1CC(C(=O)N1)1N2CC(C)3CN1CC(C)(C2)C3=O
  • Cano InChI: InChI=0.2Indigo/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9,15H,4-8H2,1-3H3/t9-/m1/s1
  • IUPAC InChI: InChI=1S/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9H,4-8H2,1-3H3,(H,15,19)/t9-,12?,13?,14?/m1/s1

Note also that the IUPAC InChI implementation gives a slightly different tetrahedral stereochemistry layer for the third molecule than for the first two (14? instead of 14-).
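Because equal canonical codes identify the same molecule or mixture, such a code can serve as a deduplication key. The Python sketch below groups the three example molecules by their Cano InChI and by their IUPAC InChI, using the strings listed above; the group_by helper and the record layout are hypothetical and not part of Cano. Keyed on Cano InChI, all three molecules fall into one group; keyed on IUPAC InChI, the third molecule is split off.

```python
from __future__ import annotations

from collections import defaultdict

# Codes copied from the useless-stereocenter examples above (molecules 1-3).
CANO = ("InChI=0.2Indigo/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)"
        "8-13(3,7-16)10(12)18/h9,15H,4-8H2,1-3H3/t9-/m1/s1")
IUPAC_PREFIX = ("InChI=1S/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)"
                "8-13(3,7-16)10(12)18/h9H,4-8H2,1-3H3,(H,15,19)")
molecules = {
    "molecule 1": {"cano": CANO, "iupac": IUPAC_PREFIX + "/t9-,12?,13?,14-/m1/s1"},
    "molecule 2": {"cano": CANO, "iupac": IUPAC_PREFIX + "/t9-,12?,13?,14-/m1/s1"},
    "molecule 3": {"cano": CANO, "iupac": IUPAC_PREFIX + "/t9-,12?,13?,14?/m1/s1"},
}


def group_by(records: dict[str, dict[str, str]], key: str) -> list[list[str]]:
    """Group record names whose values under `key` are identical strings."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, codes in records.items():
        groups[codes[key]].append(name)
    return list(groups.values())


print(group_by(molecules, "cano"))   # [['molecule 1', 'molecule 2', 'molecule 3']]
print(group_by(molecules, "iupac"))  # [['molecule 1', 'molecule 2'], ['molecule 3']]
```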

Download and Install

Look at the Downloads page for the installation package suitable for your system.

See also .NET Library Reference and Command-line Reference.

License

Copyright © 2009-2010 SciTouch LLC

This program is free software: You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 3 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If you did not, see http://www.gnu.org/licenses/.

Commercial Availability

If GPL-licensed Cano does not fit your needs, please contact us at info@scitouch.net to discuss the purchase of a commercial license. You may need the commercial license if you want to:

  • Receive ongoing support and maintenance
  • Include Cano as a component in your proprietary software product

Third-Party Projects

Dingo-PHP is a PHP wrapper for Dingo and Cano.

 
cano.txt · Last modified: 2010/03/01 10:51 by root
This site belongs to SciTouch LLC. Contact us at info@scitouch.net if you have questions or feedback. See also Terms of Use.