InChI – Knowledge and References

Symbols, Terminology, and Nomenclature

Published in W. M. Haynes, David R. Lide, Thomas J. Bruno, CRC Handbook of Chemistry and Physics, 2016

W. M. Haynes, David R. Lide, Thomas J. Bruno

The IUPAC International Chemical Identifier (InChI) is a freely available, non-proprietary identifier for chemical substances that can be used in both printed and electronic data sources. It is generated from a computerized representation of a molecular structure diagram, which can be produced by chemical structure-drawing software. Its use enables linking of diverse data compilations and unambiguous identification of chemical substances. A full description of the Identifier and software for its generation are available from the IUPAC Web site (Ref. 1), and a helpful compilation of answers to frequently asked questions has been put together at the Unilever Centre for Molecular Science Informatics (Ref. 2). Commercial structure-drawing software that will generate the Identifier is available from several organizations, listed on the IUPAC Web site.

The conversion of structural information to the Identifier is based on a set of IUPAC structure conventions, and rules for normalization and canonicalization (conversion to a single, predictable sequence) of an input structure representation. The resulting InChI is simply a series of characters that serve to uniquely identify the structure from which it was derived. The InChI uses a layered format to represent all available structural information relevant to compound identity. InChI layers are listed below. Each layer in an InChI representation contains a specific type of structural information. These layers, automatically extracted from the input structure, are designed so that each successive layer adds additional detail to the Identifier. The specific layers generated depend on the level of structural detail available and whether or not allowance is made for tautomerism. Of course, any ambiguities or uncertainties in the original structure will remain in the InChI. This layered structure design offers a number of advantages.
If two structures for the same substance are drawn at different levels of detail, the one with the lower level of detail will, in effect, be contained within the other. Specifically, if one substance is drawn with stereo-bonds and the other without, the layers in the latter will be a subset of the former. The same will hold for compounds treated by one author as tautomers and by another as exact structures with all H-atoms fixed. This can work at a finer level. For example, if one author includes double bond and tetrahedral stereochemistry, but another omits stereochemistry, the latter InChI will be contained in the former.

The InChI layers are:

1. Formula
2. Connectivity (no formal bond orders)
   a. disconnected metals
   b. connected metals
3. Isotopes
4. Stereochemistry
   a. double bond (Z/E)
   b. tetrahedral (sp3)
5. Tautomers (on or off)

Charges are not part of the basic InChI, but rather are added at the end of the InChI string. Two examples of InChI representations are given below. It is important to recognize, however, that InChI strings are intended for use by computers and end users need not understand any of their details. In fact, the open nature of InChI and its flexibility of representation, after implementation into software systems, may allow chemists to be even less concerned with the details of structure representation by computers.
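The containment property described above can be illustrated with a minimal Python sketch. It is not part of the Handbook text and makes several assumptions: the two alanine strings are illustrative standard InChIs supplied here (they are not the Handbook's own examples), the one-letter layer prefixes (c, h, t, m, s) follow the current standard InChI layout rather than the layer list above, and the helper names `split_inchi` and `is_contained` are hypothetical. Layers that can repeat a prefix (for example after a fixed-hydrogen layer) are kept as separate pairs but are not interpreted.

```python
def split_inchi(inchi):
    """Split an InChI string into (label, value) pairs, one per layer."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: " + inchi)
    version, *segments = inchi[len(prefix):].split("/")
    layers = [("version", version)]
    # The formula layer is the only one without a lowercase prefix letter.
    if segments and not segments[0][:1].islower():
        layers.append(("formula", segments.pop(0)))
    # Every remaining layer starts with a one-letter prefix (c, h, t, m, s, ...).
    layers.extend((seg[0], seg[1:]) for seg in segments if seg)
    return layers


def is_contained(less_detailed, more_detailed):
    """True if every layer of the first InChI also appears in the second."""
    detailed_layers = split_inchi(more_detailed)
    return all(layer in detailed_layers for layer in split_inchi(less_detailed))


# Illustrative strings (assumed, not taken from the Handbook): alanine drawn
# without stereochemistry, and L-alanine drawn with a tetrahedral stereocentre.
ALANINE_FLAT = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)"
ALANINE_L = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"

if __name__ == "__main__":
    for label, value in split_inchi(ALANINE_L):
        print(f"{label:8s} {value}")
    print(is_contained(ALANINE_FLAT, ALANINE_L))  # flat layers are a subset
    print(is_contained(ALANINE_L, ALANINE_FLAT))  # stereo layers are missing
```

In this sketch the structure drawn without stereochemistry shares its formula, connectivity, and hydrogen layers with the stereo-defined one, which simply appends further layers; this is the sense in which the less detailed identifier is contained in the more detailed one.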

A Systematic Review of Deep Learning Approaches for Natural Language Processing in Battery Materials Domain

Published in IETE Technical Review, 2022

Geetanjali Singh, Namita Mittal, Satyendra Singh Chouhan

The growth of available chemical data, together with machine learning and DL architectures, enables advances in cheminformatics and the discovery of important materials, which is useful in representing and exploring new material formation. New materials have been discovered using various traditional methods, but these suffer from a bottleneck: the interactions between molecules are still difficult to predict, and the whole process is time-consuming. A few text alternatives for chemical structure representation are the chemical formula, the International Chemical Identifier (InChI) defined by IUPAC [50], and the simplified molecular input line entry specification (SMILES) [51]. These text-based representations of chemical entities are readily available to researchers on the internet, so NLP techniques can be used to process them and help uncover unstructured or hidden knowledge. For NLP purposes, chemical databases such as UniProt, PDB, PFam, PROSITE, PubChem, and DrugBank are used [52]. Carrera et al. [53] discovered six novel guanidinium salts using a QSPR model based on CPG neural networks, which can capture the structural relationships of the guanidinium cations.

DFT study of hydrazone-based molecular switches: the effect of different stators on the on/off state distribution

Published in Molecular Physics, 2019

Silvia Angelova, Vesselina Paskaleva, Nikolay Kochev, Liudmil Antonov

Ambit-Tautomer [15] is an open-source tool for automatic tautomer generation. It is written in the object-oriented programming language Java and can be used under different operating systems. The software is part of the chemoinformatics platform Ambit [16], where it is integrated as a separate module. Ambit-Tautomer is implemented on top of the Chemistry Development Kit (CDK) [17] and supports standard chemical formats such as SMILES, InChI, Mol/SDF files, and CML. Ambit-Tautomer includes three algorithms for exhaustive tautomer generation: Pure Combinatorial, Combinatorial Improved, and Incremental Approach. The software utilises a list of predefined tautomeric rules that cover the prototropic type of tautomerism and H-shifts of types 1-3, 1-5 and 1-7. Ambit-Tautomer can be customised by a set of pre- and post-filters. The software also provides tautomer ranking based on simple empirical energy-based rules.

Thermal Decomposition Mechanism of Nitroglycerin by ReaxFF Reactive Molecular Dynamics Simulations

Published in Combustion Science and Technology, 2021

Tao Zeng, Rongjie Yang, Jianmin Li, Weiqiang Tang, Dinghua Li

A self-written Perl script is applied to analyze the trajectory files obtained during the ReaxFF simulations. After the trajectory files are read in, the chemical bonds are calculated and the molecules are assigned for each frame structure. In each frame, the output molecules are written one by one to a mol-format file, and InChI analysis (InChI being the unique identification code that the International Union of Pure and Applied Chemistry (IUPAC) provides for the chemical structure of each compound) (Heller et al. 2013) is performed using Open Babel (The Open Source Chemistry Toolbox) (O’Boyle et al. 2011) to determine the unique identifier of each molecule. Based on these InChI identifiers, the number, species and structure of the molecules are counted and output.
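The final counting step can be sketched as follows. This is a hedged illustration in Python rather than the authors' Perl script, and it assumes the InChI strings for each frame have already been produced (for instance by Open Babel). The function name `count_species` and the frame contents are hypothetical; the three InChI strings are illustrative small molecules, not results from the paper.

```python
from collections import Counter


def count_species(frames):
    """Return one Counter per frame mapping InChI string -> molecule count."""
    return [Counter(frame) for frame in frames]


# Hypothetical input: each frame is the list of InChI strings obtained for
# the molecules found in that frame (values here are illustrative only).
WATER = "InChI=1S/H2O/h1H2"
CO2 = "InChI=1S/CO2/c2-1-3"
N2 = "InChI=1S/N2/c1-2"

frames = [
    [WATER, CO2],
    [WATER, WATER, CO2, N2],
]

if __name__ == "__main__":
    for index, counts in enumerate(count_species(frames)):
        for inchi, number in sorted(counts.items()):
            print(index, inchi, number)
```

Because the identifier is canonical, two molecules with the same structure yield the same string regardless of atom ordering in the trajectory, so plain string equality is enough to group molecules into species.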