Next-Generation Organic Chemistry Toolkit from SciTouch LLC
Cano is an open-source library for canonical SMILES and InChI code computation.
Canonical SMILES generated by Cano are, in Daylight and ChemAxon terminology, unique SMILES with isomeric information, or absolute SMILES. All significant molecule features, such as isotopes, charges, radicals, stereocenters, stereogroups, cis-trans bonds, and aromaticity, are encoded into SMILES in a canonical form. A canonical SMILES string defines the molecule independently of any particular representation (atom renumbering, stereogroup renumbering, explicit/implicit hydrogens). Therefore, two molecules are the same if and only if their canonical SMILES are equal.
Note: 'Useless' stereocenters are ignored in both the canonical SMILES and the InChI generated by Cano. A stereocenter is considered useless when it gives no information for distinguishing stereoisomers. Please see examples below.
InChI support in Cano is preliminary and does not yet fully conform to the official InChI implementation from IUPAC. The following InChI layers are included:
One notable difference between Cano InChI and IUPAC InChI is that Cano does not mark a stereocenter as '?' in the tetrahedral stereochemistry layer if the stereocenter is not specified. As a result, Cano is able to construct the same InChI code for molecules in which 'useless' stereocenters are present. Please see examples below.
Cano is written in portable C++ and supports the Linux, Windows and Mac OS X operating systems. No third-party components are used.
Cano exposes a C interface to applications. For Windows, there is also Cano.Net, a C# wrapper. See .NET Library Reference for details.
A command-line utility based on Cano is provided. See Command-line Reference for details.
All Cano operations are thread-safe, so the library can safely be used in multi-threaded applications.
Note: Query features are not supported for canonicalization.
Almost all features of the original Daylight SMILES format are supported, including:
The only features that are not supported are:
The following ChemAxon SMILES extensions are supported:
MDL (Symyx) Molfiles are supported. Almost all format features are supported, including:
The only features that are not supported are:
AROMATICITY), tetrahedral stereocenters (TETRAHEDRAL), and cis-trans bonds information (CISTRANS).
| Input SMILES | Parameters | Resulting SMILES |
|---|---|---|
| C1C=CC=CC=1 | +AROMATICITY | c1ccccc1 |
| C1C=CC=CC=1 | -AROMATICITY | C1=CC=CC=C1 |
| C([H])1C([H])=C([H])C([H])=C([H])C([H])=1 | +AROMATICITY | c1ccccc1 |
| C([H])1C([H])=C([H])C([H])=C([H])C([H])=1 | -AROMATICITY | C1=CC=CC=C1 |
| N1(C(SCC1C(=O)N[C@@H](CCO)C)C1CC2CCC1C2)C(CN(CC)C)=O \|a:8\| | +TETRAHEDRAL | CN(CC(=O)N1C(CSC1C1CC2CC1CC2)C(=O)N[C@H](C)CCO)CC \|a:20\| |
| N1(C(SCC1C(=O)N[C@@H](CCO)C)C1CC2CCC1C2)C(CN(CC)C)=O \|a:8\| | -TETRAHEDRAL | CN(CC(=O)N1C(CSC1C1CC2CC1CC2)C(=O)NC(C)CCO)CC |
| C(NCCNC(=O)/C=C/C(O)=O)(=O)OC(C)(C)C | +CISTRANS | CC(C)(C)OC(=O)NCCNC(=O)/C=C/C(O)=O |
| C(NCCNC(=O)/C=C/C(O)=O)(=O)OC(C)(C)C | -CISTRANS | CC(C)(C)OC(=O)NCCNC(=O)C=CC(O)=O |
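The practical use of a canonical string is as an identity key: inputs that describe the same molecule collapse to one entry. The sketch below illustrates this with values copied from the table above. It does not call Cano (the library's API is not shown in this document); the canonical strings are the precomputed table values, and the function name `group_by_canonical` is made up for the illustration.

```python
from collections import defaultdict

# (input SMILES, canonical SMILES) pairs copied from the table above.
# Assumption: these stand in for the output of a canonicalizer; no toolkit is called.
ROWS = [
    ("C1C=CC=CC=1", "c1ccccc1"),
    ("C([H])1C([H])=C([H])C([H])=C([H])C([H])=1", "c1ccccc1"),
    ("C(NCCNC(=O)/C=C/C(O)=O)(=O)OC(C)(C)C",
     "CC(C)(C)OC(=O)NCCNC(=O)/C=C/C(O)=O"),
]


def group_by_canonical(rows):
    """Group input strings by their canonical form (hypothetical helper)."""
    groups = defaultdict(list)
    for source, canonical in rows:
        groups[canonical].append(source)
    return dict(groups)


groups = group_by_canonical(ROWS)
for canonical, sources in groups.items():
    print(canonical, "<-", len(sources), "input(s)")
```

The two benzene inputs, with and without explicit hydrogens, end up under the same key, while the third molecule gets its own entry.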
The table below compares InChI codes produced by Cano with InChI codes obtained from the IUPAC software. The prefix “InChI=0.2Indigo” (instead of “InChI=1S”) emphasizes that the implementation currently differs from the standard.
| Input SMILES | Results |
|---|---|
| C1C=CC=CC=1 | Cano InChI: InChI=0.2Indigo/C6H6/c1-2-4-6-5-3-1/h1-6H IUPAC InChI: InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H |
| N1(C(SCC1C(=O)N[C@@H](CCO)C)C1CC2CCC1C2)C(CN(CC)C)=O | Cano InChI: InChI=0.2Indigo/C20H35N3O3S/c1-4-22(3)11-18(25)23-17(19(26)21-13(2)7-8-24)12-27-20(23)16-10-14-5-6-15(16)9-14/h13-17,20-21,24H,4-12H2,1-3H3/t13-/m1/s1 IUPAC InChI: InChI=1S/C20H35N3O3S/c1-4-22(3)11-18(25)23-17(19(26)21-13(2)7-8-24)12-27-20(23)16-10-14-5-6-15(16)9-14/h13-17,20,24H,4-12H2,1-3H3,(H,21,26)/t13-,14?,15?,16?,17?,20?/m1/s1 |
| C(NCCNC(=O)/C=C/C(O)=O)(=O)OC(C)(C)C | Cano InChI: InChI=0.2Indigo/C11H18N2O5/c1-11(2,3)18-10(17)13-7-6-12-8(14)4-5-9(15)16/h4-5,12-13,15H,6-7H2,1-3H3/b5-4+ IUPAC InChI: InChI=1S/C11H18N2O5/c1-11(2,3)18-10(17)13-7-6-12-8(14)4-5-9(15)16/h4-5H,6-7H2,1-3H3,(H,12,14)(H,13,17)(H,15,16)/b5-4+ |
You can see that the gross formula and connection layers of Cano InChI match the corresponding layers of IUPAC InChI, as do the cis-trans layers.
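The layer-by-layer comparison can also be done mechanically. The following is a minimal sketch that relies only on the string layout visible in the table: layers are separated by `/`, the first two segments are the version prefix and the formula, and every later layer starts with a one-letter key (`c`, `h`, `b`, `t`, ...). The helper names `split_layers` and `compare_layers` are illustrative and are not part of Cano.

```python
# Third row of the table above: Cano output and IUPAC output for the same input.
CANO = ("InChI=0.2Indigo/C11H18N2O5/c1-11(2,3)18-10(17)13-7-6-12-8(14)4-5-9(15)16"
        "/h4-5,12-13,15H,6-7H2,1-3H3/b5-4+")
IUPAC = ("InChI=1S/C11H18N2O5/c1-11(2,3)18-10(17)13-7-6-12-8(14)4-5-9(15)16"
         "/h4-5H,6-7H2,1-3H3,(H,12,14)(H,13,17)(H,15,16)/b5-4+")


def split_layers(inchi):
    """Split an InChI-style string into a {layer key: layer body} dict."""
    _, _, rest = inchi.partition("=")
    parts = rest.split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        layers[part[0]] = part[1:]  # first character is the layer key
    return layers


def compare_layers(a, b):
    """Return (keys of equal layers, keys of differing layers)."""
    la, lb = split_layers(a), split_layers(b)
    keys = list(la) + [k for k in lb if k not in la]
    same = [k for k in keys if la.get(k) == lb.get(k)]
    different = [k for k in keys if la.get(k) != lb.get(k)]
    return same, different


same, different = compare_layers(CANO, IUPAC)
print("same:", same)
print("different:", different)
```

For this pair, the formula, `c`, and `b` layers compare equal, while the version prefix and the `h` layer differ.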
From the pictures below, you can see that all three molecules specify the same mixture. This is reflected in the fact that Cano gives identical SMILES and InChI codes for all three molecules.
| Molecule | Canonical SMILES | Cano InChI | IUPAC InChI |
|---|---|---|---|
| 1 | C[C@@H]1CC(C(=O)N1)1N2CC(C)3CN1CC(C)(C2)C3=O | InChI=0.2Indigo/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9,15H,4-8H2,1-3H3/t9-/m1/s1 | InChI=1S/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9H,4-8H2,1-3H3,(H,15,19)/t9-,12?,13?,14-/m1/s1 |
| 2 | C[C@@H]1CC(C(=O)N1)1N2CC(C)3CN1CC(C)(C2)C3=O | InChI=0.2Indigo/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9,15H,4-8H2,1-3H3/t9-/m1/s1 | InChI=1S/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9H,4-8H2,1-3H3,(H,15,19)/t9-,12?,13?,14-/m1/s1 |
| 3 | C[C@@H]1CC(C(=O)N1)1N2CC(C)3CN1CC(C)(C2)C3=O | InChI=0.2Indigo/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9,15H,4-8H2,1-3H3/t9-/m1/s1 | InChI=1S/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9H,4-8H2,1-3H3,(H,15,19)/t9-,12?,13?,14?/m1/s1 |
Also, you can see that the IUPAC InChI implementation gives a slightly different stereocenter layer for the third molecule than for the first two.
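To pinpoint where the stereocenter layers differ, the `/t` layer can be parsed into a mapping from atom number to parity mark. This is a small sketch that uses only the strings quoted above; `stereo_layer` is an illustrative helper, not a Cano function.

```python
import re

# IUPAC InChI strings for the first and third molecules, and the Cano string
# shared by all three, copied from the rows above.
IUPAC_FIRST = ("InChI=1S/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18"
               "/h9H,4-8H2,1-3H3,(H,15,19)/t9-,12?,13?,14-/m1/s1")
IUPAC_THIRD = ("InChI=1S/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18"
               "/h9H,4-8H2,1-3H3,(H,15,19)/t9-,12?,13?,14?/m1/s1")
CANO_ALL = ("InChI=0.2Indigo/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18"
            "/h9,15H,4-8H2,1-3H3/t9-/m1/s1")


def stereo_layer(inchi):
    """Return {atom number: mark} parsed from the /t layer, or {} if absent."""
    for part in inchi.split("/")[2:]:  # skip the version prefix and the formula
        if part.startswith("t"):
            return {int(n): mark
                    for n, mark in re.findall(r"(\d+)([+\-?])", part[1:])}
    return {}


first, third = stereo_layer(IUPAC_FIRST), stereo_layer(IUPAC_THIRD)
changed = sorted(atom for atom in first if first[atom] != third.get(atom))
print("atoms whose mark differs:", changed)
print("Cano stereo layer:", stereo_layer(CANO_ALL))
```

The two IUPAC strings differ only in the mark on atom 14, whereas the Cano string lists atom 9 alone in its stereo layer, which is why it is identical for all three molecules.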
Look at the Downloads page for the installation package suitable for your system.
See also .NET Library Reference and Command-line Reference.
Copyright © 2009-2010 SciTouch LLC
This program is free software: You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 3 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If you did not, please see http://www.gnu.org/licenses/.
If GPL-licensed Cano does not fit your needs, please contact us at info@scitouch.net to discuss the purchase of a commercial license. You may need the commercial license if you want to: