/**********************************************************************
Copyright (C) 2006 Chris Morley

This file is part of the Open Babel project.
For more information, see <http://openbabel.sourceforge.net/>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation version 2 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
***********************************************************************/

#include <openbabel/babelconfig.h>

///Returns true if character is not one used in an InChI.
bool inline isnic(char ch)
{
  //This set of characters could be extended
  static string nic("\"\'\\@<>!$%&{}[]");
  return nic.find(ch)!=string::npos;
}

/// @brief Reads an InChI (possibly split) from an input stream and returns it as unsplit text.
This function recovers a normal InChI from an input stream which
contains other arbitrary text. The InChI string can have
extraneous characters inserted, for example because of word wrapping,
provided it follows certain rules.

Dmitrii Tchekhovskoi made a proposal for "InChI hyphenation".
http://sourceforge.net/mailarchive/forum.php?thread_id=10200459&forum_id=45166
The function here is consistent with this proposal but extends
it, allowing a wider range of corrupted InChIs to be accepted.

The original proposal was essentially:
- When an InChI string is enclosed by " quote characters,
any whitespace characters it contains (including new lines) are
ignored.
- Other extraneous strings can also be ignored, but this
- The "InChI=" cannot be split.

The extensions are:
- The character that encloses a quoted InChI does not have to be "
and can be any character that is not used in InChI - a NIC
[never miss the opportunity for a TLA!]. This means that
conflicts in systems which have other uses for the quote character
can be avoided.
- As well as whitespace characters (which are ignored), a quoted
InChI can contain an extraneous string which starts and ends with
a NIC. This allows inserted strings like <br /> to be ignored.
However, only one such extraneous string is allowed.
- There are no restrictions on splitting "InChI=" by whitespace
characters, allowing a minimum column width of 1.
If the splitting were by an extraneous string the minimum column
width would be 2.
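
The rules for a quoted InChI can be illustrated with a minimal, standalone
sketch. It is not the GetInChI() implementation: the names isNic and
unsplitQuoted are hypothetical, it works on a string rather than a stream,
it does not limit the number of extraneous strings, and it does not handle
the quote character reappearing inside a wrapped table column.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Assumption: same character set as isnic() in this file.
static bool isNic(char ch)
{
  static const std::string nic("\"\'\\@<>!$%&{}[]");
  return nic.find(ch) != std::string::npos;
}

// Hypothetical helper (not part of Open Babel).
// Expects text to start with a NIC used as the quote character and
// returns the InChI with whitespace and NIC-delimited strings removed.
// Returns an empty string if text does not start with a NIC.
std::string unsplitQuoted(const std::string& text)
{
  std::string result;
  if (text.empty() || !isNic(text[0]))
    return result;
  const char quote = text[0];
  bool inExtraneous = false;
  for (std::string::size_type i = 1; i < text.size(); ++i)
  {
    const char ch = text[i];
    if (inExtraneous)
    {
      if (isNic(ch))
        inExtraneous = false; // the closing NIC ends the extraneous string
      continue;
    }
    if (ch == quote)
      break;                  // closing quote ends the InChI
    if (std::isspace(static_cast<unsigned char>(ch)))
      continue;               // whitespace, including new lines, is ignored
    if (isNic(ch))
    {
      inExtraneous = true;    // the opening NIC starts an extraneous string
      continue;
    }
    result += ch;
  }
  return result;
}
```

With this sketch, the quoted example containing <br /> further down would be
returned as InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3, and an InChI quoted with @ and
split over two lines would be joined into a single string.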

The following are some examples of split InChIs.
OpenBabel will find and convert 12 InChIs
in this file, e.g. babel -iinchi getinchi.cpp -osmi

First two unbroken examples, the first is unquoted
InChI=1/CH4/h1H4 methane
"InChI=1/C4H10O/c1-3-5-4-2/h3-4H2,1-2H3" diethyl ether

Multiple white space splitting
@InChI=1/C15H14O3/c1-11(15(16)17)18-
14-10-6-5-9-13(14)12-7-3-2-4-8-12/h2

Split with extraneous text, which starts and ends with a non-InChI character
'InChI=1/C2H6O/c1-2-<br />3/h3H,2H2,1H3'

Table with wrapped InChI column. (View with fixed font.)

'InChI=1/CH4/h1H4' !flammable!
'InChI=1/C2H2O4/c3-1 !toxic!

'InChI=1/CH4O/c1-2/h !flammable! !toxic!

'InChI=1/C10H5ClN2/c !no information!

Quoted text in emails (but InChI is preserved after one break only).
> "InChI=1/C4H7N3OS/c1-7(8)4-9-5-2-3-6-9/h
> 2-4,8H,1H3/p+1/fC4H8N3OS/h5H/q+1/t9?"
>> "InChI=1/C4H7N3OS/c1-7(8)4-9-5-2-3-6-9/
>> h2-4,8H,1H3/p+1/fC4H8N3OS/h5H/q+1/t9?"

Column width can be 1 if there is no extraneous text other than whitespace.
(When there is an extraneous string with NICs the minimum column width is 2).

string OBAPI GetInChI(istream& is);

string GetInChI(istream& is)

string prefix("InChI=");

enum statetype {before_inchi, match_inchi, unquoted, quoted};
statetype state = before_inchi;
char ch, lastch=0, qch=0;
size_t split_pos = 0;

while(is.get(ch)) //stops at EOF; comparing a char with EOF is unreliable
if(state==before_inchi)

if(ch==qch && state!=match_inchi)

result.erase(split_pos);
split_pos = result.size();

if(state==match_inchi)

if(prefix.compare(0,result.size(),result)==0) //true if correct

if(result.size()==prefix.size())
state = isnic(qch) ? quoted : unquoted;

state = before_inchi;