~jpakkane/cuneiform-linux/trunk

1 by JussiP
Imported source zip of Cuneiform
/*
Copyright (c) 1993-2008, Cognitive Technologies
All rights reserved.

415 by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

      * Redistributions of source code must retain the above copyright
        notice, this list of conditions and the following disclaimer of
        warranties.
      * Redistributions in binary form must retain the above copyright
        information, this list of conditions and the following disclaimer of
        warranties in the documentation and/or other materials supplied with
        the distribution.
      * Neither the name Cognitive Technologies nor the names of its
        employees may be used as a means of endorsing and/or promoting
        products based on this software without prior written permission.

THIS PROGRAM IS PROVIDED BY THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES "AS
IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT
NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR ANY OTHER PARTY WHO MAY
MODIFY AND/OR REDISTRIBUTE THE PROGRAM BE LIABLE FOR DAMAGES, INCLUDING ANY
GENERAL, INCIDENTAL, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE
OF OR INABILITY TO USE THE PROGRAM (INCLUDING, BUT NOT LIMITED TO, LOSS OF
DATA, DATA BEING RENDERED UNUSABLE, OR LOSSES AND/OR LOSS OF INCOME SUSTAINED
THROUGH THE ACTIONS OF THIRD PARTIES AND/OR A FAILURE OF THE PROGRAM TO
OPERATE WITH OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
ADVISED OF THE POSSIBILITY OF SUCH DAMAGES AND LOSSES.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright notice,
      this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright notice,
      this list of conditions and the following disclaimer in the documentation
      and/or other materials provided with the distribution.
    * Neither the name of the Cognitive Technologies nor the names of its
      contributors may be used to endorse or promote products derived from this
      software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
205 by Jussi Pakkanen
Removed end-of-line whitespace.
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/


// HTML.cpp

//********************************************************************
//
417 by julien
attempt fix comments
// HTML.cpp - HTML format
//
// This file creation date: 27.05.99
// By Eugene Pliskin pliskin@cs.isa.ac.ru
//********************************************************************

89 by JussiP
Rout compiles.
#include <string.h>
422 by Jussi Pakkanen
Trial merge from refactoring branch.
#include <string>
#include <sstream>
#include <vector>

#include "stdafx.h"
#include "rout_own.h"
#include "compat_defs.h"

413 by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp
using namespace std;

412.1.14 by uliss
Migration from BOOL to Bool
static Bool Static_MakeHTML(Handle hObject, long reason);

static Bool FontStyle(ulong newStyle);
static Bool BeginParagraph(Handle hObject);
static Bool CellStart();
static Bool CalcCellSpan();
static Bool OptimizeTags();
static Bool Picture();
static Bool CreatePageFilesFolder();

static ulong sFontStyle = 0;		// Font style
static long rowspan = 0, colspan = 0;
static Bool hocrmode = FALSE; // If true, print hOCR tags to output.

//********************************************************************
Bool MakeHTML()
{
/* HTML format.

   Tables are included.
   Line breaks are preserved if gPreserveLineBreaks = TRUE.
*/
	sFontStyle = 0;			// Font style
292 by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe.
	hocrmode = FALSE;

	return BrowsePage(Static_MakeHTML,
				FALSE,		// wantSkipTableCells
				FALSE);		// wantSkipParagraphs

}
//********************************************************************
Bool MakeHOCR() {
    sFontStyle = 0;
    hocrmode = TRUE;
    return BrowsePage(Static_MakeHTML, FALSE, FALSE);
}
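
// Illustration only, not part of the Cuneiform sources. MakeHTML() and MakeHOCR()
// pass the same callback to BrowsePage() and differ only in the hocrmode flag.
// The sketch below shows the general "one callback, one reason code per event"
// pattern. The names Reason, browse and toHtml are hypothetical; the real
// BrowsePage() signature and its full set of reasons are not shown in this file.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subset of traversal events (the real code uses BROWSE_* reasons).
enum Reason { PageStart, LineStart, Char, LineEnd, PageEnd };

// The callback returns false to stop the traversal early.
using Callback = bool (*)(Reason reason, char ch, std::string& out);

// Walk a page made of text lines and report every event to the callback.
bool browse(const std::vector<std::string>& lines, Callback cb, std::string& out) {
    if (!cb(PageStart, 0, out)) return false;
    for (const std::string& line : lines) {
        if (!cb(LineStart, 0, out)) return false;
        for (char ch : line) {
            if (!cb(Char, ch, out)) return false;
        }
        if (!cb(LineEnd, 0, out)) return false;
    }
    return cb(PageEnd, 0, out);
}

// A minimal formatter: one switch over the reason code, as in Static_MakeHTML.
bool toHtml(Reason reason, char ch, std::string& out) {
    switch (reason) {
        case PageStart: out += "<body>";  break;
        case LineStart:                   break;
        case Char:      out += ch;        break;
        case LineEnd:   out += "<br>";    break;
        case PageEnd:   out += "</body>"; break;
    }
    return true;  // continue browsing
}
```

// A second output format would reuse browse() unchanged and supply a different
// callback, or the same callback with a mode flag as this file does.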

/*!
\brief \~english Put the string stream contents into the buffer for OCR results.
       \~russian Поместить содержимое строкового потока в буфер
                 результатов распознавания.
*/
426 by Jussi Pakkanen
Merged more integer type refactoring.
static Bool
strm2buf(const ostringstream& outStrm)
{
	// Copy the stream contents once instead of calling str() twice.
	const string text = outStrm.str();
	unsigned long sizeMem = text.size();
	// check that there is enough memory
	CHECK_MEMORY(sizeMem + 10);

	::memcpy(gMemCur, text.c_str(), sizeMem);
	gMemCur += sizeMem;

	return TRUE;
}
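
// Illustration only, not part of the Cuneiform sources. strm2buf() relies on the
// CHECK_MEMORY macro, whose definition is not shown in this file. The sketch
// below spells out one plausible bounds check explicitly. OutBuffer and
// appendStream are hypothetical names standing in for gMemCur/gMemEnd and
// strm2buf; the 10-byte reserve mirrors the "+ 10" slack used above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical stand-in for the global output buffer cursor and limit.
struct OutBuffer {
    char* cur;  // next free byte
    char* end;  // one past the last usable byte
};

// Append the stream contents to the buffer.
// Returns false and leaves the buffer untouched if the text plus the
// reserve would not fit.
bool appendStream(OutBuffer& out, const std::ostringstream& strm) {
    const std::string text = strm.str();  // single copy of the contents
    const std::size_t reserve = 10;
    assert(out.cur <= out.end);
    const std::size_t freeBytes = static_cast<std::size_t>(out.end - out.cur);
    if (freeBytes < text.size() + reserve) {
        return false;
    }
    std::memcpy(out.cur, text.data(), text.size());
    out.cur += text.size();
    return true;
}
```

// The copied text is not NUL-terminated, matching the memcpy-based code above.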
414 by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes


/*!
\brief \~english Put the hOCR text line start tag into the buffer for OCR results.
       \~russian Поместить текстовую строку hOCR в буфер результатов распознавания.
*/
static Bool
writeHocrLineStartTag(Byte* pLineStart, const edRect& rcLine, const unsigned int iLine)
{
	ASSERT(pLineStart);
	ostringstream outStrm;
	outStrm << "<span class='ocr_line' id='line_" << iLine << "' "
		<< "title=\"bbox "
		<< rcLine.left << " "
		<< rcLine.top << " "
		<< rcLine.right << " "
		<< rcLine.bottom << "\">";
	// Re-emit the text already written for this line after the new start tag.
	outStrm.write(reinterpret_cast<const char*>(pLineStart), gMemCur - pLineStart);

	unsigned long sizeMem = outStrm.str().size();
	// check that there is enough memory
	CHECK_MEMORY(sizeMem + 10);

	::memcpy(pLineStart, outStrm.str().c_str(), sizeMem);
	gMemCur = pLineStart + sizeMem;

	return TRUE;
}
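
// Illustration only, not part of the Cuneiform sources. writeHocrLineStartTag()
// runs at the end of a line, once the line bounding box is known, and inserts
// the opening tag in front of text that is already in the buffer. The sketch
// below shows the same idea on a std::string. Rect and wrapLineStart are
// hypothetical names; Rect stands in for edRect.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

struct Rect { int left, top, right, bottom; };  // hypothetical stand-in for edRect

// Insert an hOCR ocr_line opening tag before the text written since lineStart.
void wrapLineStart(std::string& out, std::size_t lineStart,
                   const Rect& rc, unsigned lineNo) {
    assert(lineStart <= out.size());
    std::ostringstream tag;
    tag << "<span class='ocr_line' id='line_" << lineNo << "' title=\"bbox "
        << rc.left << ' ' << rc.top << ' ' << rc.right << ' ' << rc.bottom
        << "\">";
    out.insert(lineStart, tag.str());
}
```

// The caller is still responsible for emitting the matching </span> later.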


static bool
isGoodCharRect(const edRect& rc)
{
	bool goodCharRect = true;
	goodCharRect = goodCharRect && (rc.left != -1);
	goodCharRect = goodCharRect && (rc.left != 65535);
	goodCharRect = goodCharRect && (rc.right != 65535);
	goodCharRect = goodCharRect && (rc.top != 65535);
	goodCharRect = goodCharRect && (rc.bottom != 65535);
	return goodCharRect;
}
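
// Illustration only, not part of the Cuneiform sources. The line bounding box
// used for ocr_line is the union of all character boxes that pass the check
// above. The sketch below isolates that accumulation. Rect, isValid and LineBox
// are hypothetical names; the sentinel values are the ones tested in
// isGoodCharRect().

```cpp
#include <algorithm>
#include <cassert>

struct Rect { int left, top, right, bottom; };  // hypothetical stand-in for edRect

// Same sentinel checks as isGoodCharRect(): -1 or 65535 marks a missing box.
bool isValid(const Rect& rc) {
    return rc.left != -1 && rc.left != 65535 && rc.right != 65535 &&
           rc.top != 65535 && rc.bottom != 65535;
}

// Accumulates the bounding box of one text line, character by character.
struct LineBox {
    Rect box{0, 0, 0, 0};
    bool started = false;

    void add(const Rect& rc) {
        if (!isValid(rc)) return;  // skip characters without a usable box
        if (!started) {            // the first valid character seeds the box
            box = rc;
            started = true;
            return;
        }
        box.left   = std::min(box.left, rc.left);
        box.top    = std::min(box.top, rc.top);
        box.right  = std::max(box.right, rc.right);
        box.bottom = std::max(box.bottom, rc.bottom);
    }
};
```

// Seeding from the first valid character avoids treating the zeroed box as a
// real rectangle at the page origin.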

// Use the CHECK_MEMORY macro in case it becomes a function that does more than check whether gMemCur + a > gMemEnd.
// As a consequence, this function ensures that the memory available at gMemCur is sufficient.
static Bool
writeHocrCharBBoxesInfo(const std::vector<edRect > &charBboxes, const unsigned int iLine)
{
	ostringstream outStrm;
	outStrm << "<span class='ocr_cinfo' title=\"x_bboxes ";

	for (unsigned int i = 0; i < charBboxes.size(); i++) {
		outStrm << charBboxes[i].left << " " << charBboxes[i].top << " "
				<< charBboxes[i].right << " " << charBboxes[i].bottom << " ";
	}

	outStrm << "\"></span>";

	unsigned long sizeMem = outStrm.str().size();

	// Check that there is enough memory: CHECK_MEMORY ensures gMemCur can hold
	// sizeMem bytes plus 10 bytes of slack.
	CHECK_MEMORY(sizeMem + 10);

	::memcpy(gMemCur, outStrm.str().c_str(), sizeMem);
	gMemCur += sizeMem;

	return TRUE;
}
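
// Illustration only, not part of the Cuneiform sources. The function above
// leaves a trailing space before the closing quote of the title attribute. The
// sketch below builds the same ocr_cinfo element with the separator written
// before each box instead, so no trailing space is produced. Rect and
// charBoxesSpan are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct Rect { int left, top, right, bottom; };  // hypothetical stand-in for edRect

// Build an ocr_cinfo span listing one "left top right bottom" group per character.
std::string charBoxesSpan(const std::vector<Rect>& boxes) {
    std::ostringstream s;
    s << "<span class='ocr_cinfo' title=\"x_bboxes";
    for (const Rect& b : boxes) {
        s << ' ' << b.left << ' ' << b.top << ' ' << b.right << ' ' << b.bottom;
    }
    s << "\"></span>";
    return s.str();
}
```

// Whether consumers of the output tolerate the original trailing space is not
// stated in this file, so the change of spacing is an assumption of the sketch.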


//********************************************************************
Bool Static_MakeHTML(
			Handle hObject,
			long reason	// See enum BROWSE_REASON
			)
{
	static char buf[256] = {0};
    //! \~russian прямоугольник символа
	//! \~english character rectangle
	edRect r = {0};

	static unsigned int iPage(1);
    //! \~russian прямоугольник строки
	//! \~english rectangle state variable for the current line; expanded with each incoming char.
	static edRect rcLine = {0};
    //! \~russian прямоугольник строки
	//! \~english true if the last non-space character was in the line (i.e. had a valid bbox).
	static bool isInLine(false);
    //! \~russian номер текущей строки
	//! \~english state variable holding the current line number.
	static unsigned int iLine(1);
    //! \~russian позиция начала строки в текстовом буфере вывода
	static Byte* pLineStart = 0;
	//! \~english pointer to the location that gMemCur pointed to when reason was BROWSE_LINE_START

	static std::vector<edRect >	currentLineCharBBoxes;
	currentLineCharBBoxes.reserve(200);


	// WordControl is called at the end

	switch(reason)
	{
		case BROWSE_CHAR: // Character
		{
			// Set the language
			long lang = CED_GetCharFontLang(hObject);
			if (lang != gLanguage)
				SetLanguage(lang);
			// Font style
			FontStyle(CED_GetCharFontAttribs(hObject));

			r = CED_GetCharLayout(hObject);
418 by julien
attempt fix comments
			currentLineCharBBoxes.push_back(r);

			// In hOCR mode, track the line bounds using characters with a valid box
			if (isGoodCharRect(r) && hocrmode)
			{
				if (!isInLine)
				{
					// start determining the bounds of the line
					rcLine = r;
					isInLine = true;
				}
				else
				{
					// grow the line rectangle to include this character
					rcLine.left = min(rcLine.left, r.left);
					rcLine.top = min(rcLine.top, r.top);
					rcLine.right = max(rcLine.right, r.right);
					rcLine.bottom = max(rcLine.bottom, r.bottom);
				}
			}
			// Write the character
			ONE_CHAR(hObject);

			break;
		}
		case BROWSE_LINE_START:
			// Start of a text line
			pLineStart = gMemCur;
			::memset(&rcLine, 0, sizeof(rcLine));
			break;

		case BROWSE_LINE_END:
			// End of a text line
419 by julien
no hocr tags are written if -f html is used.
			if (hocrmode)
				writeHocrLineStartTag(pLineStart, rcLine, iLine);
			FontStyle(0);

			// write character bounding boxes info
			if (hocrmode && !currentLineCharBBoxes.empty())
				writeHocrCharBBoxesInfo(currentLineCharBBoxes, iLine);
			currentLineCharBBoxes.clear();


			isInLine = false;
			if ( gPreserveLineBreaks || gEdLineHardBreak )
			{
				PUT_STRING("<br>");
			}

			iLine++;
			// close the hOCR line tag; it is opened only in hOCR mode,
			// so plain HTML output must not get a stray </span>
			if (hocrmode)
			{
				PUT_STRING("</span>");
			}

			NEW_LINE;
			break;

		case BROWSE_PARAGRAPH_START:
			// Start of a paragraph
			FontStyle(0);
			BeginParagraph(hObject);
			break;

		case BROWSE_PARAGRAPH_END:
			// End of a paragraph
			FontStyle(0);
			PUT_STRING("</p>");
			NEW_LINE;
			break;

		case BROWSE_PAGE_START:
276.1.4 by Jussi Pakkanen
Doctype fix from Alex Samorukov.
			// Start of page.
			FontStyle(0);
			{
				ostringstream outStrm;
				outStrm << "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 "
					       "Transitional//EN\""
						   " \"http://www.w3.org/TR/html4/loose.dtd\">" << endl;
				outStrm << "<html><head><title></title>" << endl;
				if (gActiveCode==ROUT_CODE_UTF8)
				{
					outStrm << "<meta http-equiv=\"Content-Type\""
						       " content=\"text/html;charset=utf-8\" >" << endl;
				}
				outStrm << "<meta name='ocr-system' content='openocr'>" << endl;
				outStrm << "</head>" << endl << "<body>";
				strm2buf(outStrm);
			}
			{
				ostringstream outStrm;
				EDSIZE sizeImage(CED_GetPageImageSize(hObject));
				const char* pImageName = CED_GetPageImageName(hObject);
				assert(pImageName);
				// example: <div class='ocr_page' title='image "page-000.pbm"; bbox 0 0 4306 6064'>
				outStrm << "<div class='ocr_page' id='page_" << iPage << "' ";
				outStrm << "title='image \"" << pImageName << "\"; bbox 0 0 "
					<< sizeImage.cx << " " << sizeImage.cy << "'>" << endl;
				strm2buf(outStrm);
				++iPage;
			}
			break;

		case BROWSE_PAGE_END:
			// End of page
			PUT_STRING("</div>");
			// End of document
432 by Jussi Pakkanen
End HTML output with linefeed.
			PUT_STRING("</body></html>\n");
			iLine = 1;
			break;

		case BROWSE_TABLE_START:
			// Start of a table
			FontStyle(0);
			PUT_STRING("<table border>");
			break;

		case BROWSE_TABLE_END:
			// End of a table
			FontStyle(0);
			PUT_STRING("</table>");
			break;

		case BROWSE_ROW_START:
			// Start of a table row
			PUT_STRING("<tr>");
			break;

		case BROWSE_CELL_START:
			// Start of a table cell
			CellStart();
			break;

		case BROWSE_PICTURE:
			// Picture
			Picture();
			break;

		}

	// Word and line tracking
	WORDS_CONTROL(reason);

	// Remove redundant tags
	OptimizeTags();

	return TRUE;	// Continue browsing
}
403
//********************************************************************
412.1.14 by uliss
Migration from BOOL to Bool
404
static Bool FontStyle(ulong newStyle)
1 by JussiP
Imported source zip of Cuneiform
405
{
406
205 by Jussi Pakkanen
Removed end-of-line whitespace.
407
if ((newStyle & FONT_BOLD) &&
408
				(!(sFontStyle & FONT_BOLD) ||
1 by JussiP
Imported source zip of Cuneiform
409
				  (sFontStyle & FONT_LIGHT)))
410
	{PUT_STRING("<b>");}
411
205 by Jussi Pakkanen
Removed end-of-line whitespace.
412
else if ((sFontStyle & FONT_BOLD) &&
413
				(!(newStyle & FONT_BOLD) ||
1 by JussiP
Imported source zip of Cuneiform
414
				  (newStyle & FONT_LIGHT)))
415
	{PUT_STRING("</b>");}
416
205 by Jussi Pakkanen
Removed end-of-line whitespace.
417
if ((newStyle & FONT_ITALIC) &&
1 by JussiP
Imported source zip of Cuneiform
418
				(!(sFontStyle & FONT_ITALIC) ))
419
	{PUT_STRING("<i>");}
420
205 by Jussi Pakkanen
Removed end-of-line whitespace.
421
else if ((sFontStyle & FONT_ITALIC) &&
1 by JussiP
Imported source zip of Cuneiform
422
				(!(newStyle & FONT_ITALIC) ))
423
	{PUT_STRING("</i>");}
424
205 by Jussi Pakkanen
Removed end-of-line whitespace.
425
if ((newStyle & FONT_UNDERLINE) &&
1 by JussiP
Imported source zip of Cuneiform
426
	!(sFontStyle & FONT_UNDERLINE))
427
	{PUT_STRING("<u>");}
428
205 by Jussi Pakkanen
Removed end-of-line whitespace.
429
else if ((sFontStyle & FONT_UNDERLINE) &&
1 by JussiP
Imported source zip of Cuneiform
430
		 !(newStyle & FONT_UNDERLINE))
431
	{PUT_STRING("</u>");}
432
415 by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding
433
// Remember the font style
1 by JussiP
Imported source zip of Cuneiform
434
sFontStyle = newStyle;
435
return TRUE;
436
}
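The transition logic in FontStyle() can be tried out in isolation. The sketch below is a minimal restatement under stated assumptions, not the project's code: the SK_* flag values, sk_append() and sk_style_tags() are hypothetical names introduced here, and the tags go to a caller-supplied buffer instead of PUT_STRING.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values; the real FONT_* constants are defined elsewhere. */
enum { SK_BOLD = 1, SK_ITALIC = 2, SK_UNDERLINE = 4, SK_LIGHT = 8 };

/* Append src to the NUL-terminated string in dst if it fits in cap bytes. */
static int sk_append(char *dst, size_t cap, const char *src)
{
    size_t used = strlen(dst), need = strlen(src);
    if (used + need + 1 > cap)
        return 0;
    memcpy(dst + used, src, need + 1);
    return 1;
}

/* Write the tags needed to move from oldStyle to newStyle into out.
 * Returns 1 on success, 0 if the buffer is too small. */
static int sk_style_tags(unsigned oldStyle, unsigned newStyle,
                         char *out, size_t cap)
{
    int ok = 1;
    if (cap == 0)
        return 0;
    out[0] = '\0';

    /* Bold: opened when newly requested (or when the old style was "light"),
     * closed when dropped (or when the new style is "light"). */
    if ((newStyle & SK_BOLD) &&
        (!(oldStyle & SK_BOLD) || (oldStyle & SK_LIGHT)))
        ok &= sk_append(out, cap, "<b>");
    else if ((oldStyle & SK_BOLD) &&
             (!(newStyle & SK_BOLD) || (newStyle & SK_LIGHT)))
        ok &= sk_append(out, cap, "</b>");

    if ((newStyle & SK_ITALIC) && !(oldStyle & SK_ITALIC))
        ok &= sk_append(out, cap, "<i>");
    else if ((oldStyle & SK_ITALIC) && !(newStyle & SK_ITALIC))
        ok &= sk_append(out, cap, "</i>");

    if ((newStyle & SK_UNDERLINE) && !(oldStyle & SK_UNDERLINE))
        ok &= sk_append(out, cap, "<u>");
    else if ((oldStyle & SK_UNDERLINE) && !(newStyle & SK_UNDERLINE))
        ok &= sk_append(out, cap, "</u>");

    return ok;
}
```

As in FontStyle(), tags are opened and closed in a fixed b, i, u order, so the emitted markup is not guaranteed to nest properly when several styles change at once.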
437
//********************************************************************
412.1.14 by uliss
Migration from BOOL to Bool
438
static Bool BeginParagraph(Handle hObject)
1 by JussiP
Imported source zip of Cuneiform
439
{
292 by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe.
440
	const char *p = NULL;
1 by JussiP
Imported source zip of Cuneiform
441
	char buf[80] = "";
292 by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe.
442
        edBox b = CED_GetLayout(hObject);
412.1.1 by uliss
BIG Changes))) need to commit mo frequently
443
        ulong alignment = CED_GetAlignment(hObject);
1 by JussiP
Imported source zip of Cuneiform
444
292 by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe.
445
        switch (alignment & ALIGN_MASK)	{
1 by JussiP
Imported source zip of Cuneiform
446
	case ALIGN_CENTER:
447
		p = "center";
448
		break;
449
450
	case (ALIGN_LEFT | ALIGN_RIGHT):
451
		p = "justify";
452
		break;
453
454
	case ALIGN_LEFT:
455
	default:
456
		// "left" by default
292 by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe.
457
	    ;
458
    }
459
460
    PUT_STRING("<p");
461
    if (p) {
462
        sprintf(buf, " align=%s", p);
463
        PUT_STRING(buf);
464
    }
465
466
    if (b.x != -1 && hocrmode) {
467
        sprintf(buf, " title=\"bbox %d %d %d %d\"", b.x, b.y, b.x + b.w, b.y
468
                + b.h);
469
        PUT_STRING(buf);
470
    }
471
    PUT_STRING(">");
472
473
    return TRUE;
1 by JussiP
Imported source zip of Cuneiform
474
}
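For reference, the opening tag that BeginParagraph() assembles can be sketched as a standalone helper. This is an illustration under assumptions: struct sk_box is a hypothetical stand-in for the edBox returned by CED_GetLayout(), sk_paragraph_tag() is not part of the project, and snprintf is used here so that an undersized buffer is reported instead of overrun.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the layout box (x, y, width, height). */
struct sk_box { int x, y, w, h; };

/* Build "<p[ align=...][ title="bbox x0 y0 x1 y1"]>" in out.
 * align may be NULL (left alignment, no attribute).
 * Returns 1 on success, 0 if the result does not fit in cap bytes. */
static int sk_paragraph_tag(const char *align, struct sk_box b, int hocr,
                            char *out, size_t cap)
{
    size_t used = 0;
    int n;

    n = snprintf(out, cap, "<p");
    if (n < 0 || (size_t)n >= cap)
        return 0;
    used += (size_t)n;

    if (align) {
        n = snprintf(out + used, cap - used, " align=%s", align);
        if (n < 0 || (size_t)n >= cap - used)
            return 0;
        used += (size_t)n;
    }

    /* The bounding box is given as two corners: x0 y0 x1 y1. */
    if (hocr && b.x != -1) {
        n = snprintf(out + used, cap - used, " title=\"bbox %d %d %d %d\"",
                     b.x, b.y, b.x + b.w, b.y + b.h);
        if (n < 0 || (size_t)n >= cap - used)
            return 0;
        used += (size_t)n;
    }

    n = snprintf(out + used, cap - used, ">");
    if (n < 0 || (size_t)n >= cap - used)
        return 0;
    return 1;
}
```

With a box of x=10, y=20, w=30, h=40 and "center" alignment in hOCR mode, the helper yields a tag whose title carries the corners 10 20 40 60.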
475
//********************************************************************
412.1.14 by uliss
Migration from BOOL to Bool
476
static Bool CellStart()
1 by JussiP
Imported source zip of Cuneiform
477
{
415 by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding
478
// Table cell
1 by JussiP
Imported source zip of Cuneiform
479
	char buf[80] = "";
480
415 by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding
481
	// Compute the cell span
1 by JussiP
Imported source zip of Cuneiform
482
	CalcCellSpan();
483
484
	if ( rowspan == 1 && colspan == 1 )
485
		strcpy(buf,"<td>");
486
487
	else if ( rowspan > 1 && colspan == 1 )
458 by Jussi Pakkanen
Format modifier fixes.
488
		sprintf(buf,"<td rowspan=%ld>",rowspan);
1 by JussiP
Imported source zip of Cuneiform
489
490
	else if ( rowspan == 1 && colspan > 1 )
458 by Jussi Pakkanen
Format modifier fixes.
491
		sprintf(buf,"<td colspan=%ld>",colspan);
1 by JussiP
Imported source zip of Cuneiform
492
493
	else // ( rowspan > 1 && colspan > 1 )
458 by Jussi Pakkanen
Format modifier fixes.
494
		sprintf(buf,"<td rowspan=%ld colspan=%ld>",rowspan,colspan);
1 by JussiP
Imported source zip of Cuneiform
495
496
	PUT_STRING(buf);
497
	return TRUE;
498
}
499
//********************************************************************
412.1.14 by uliss
Migration from BOOL to Bool
500
static Bool CalcCellSpan()
1 by JussiP
Imported source zip of Cuneiform
501
{
415 by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding
502
// Compute the cell span
1 by JussiP
Imported source zip of Cuneiform
503
	long row,col;
504
505
	rowspan = 0;
506
	colspan = 0;
507
415 by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding
508
	// Scan downwards from the current cell
1 by JussiP
Imported source zip of Cuneiform
509
	row = gIndexTableRow;
510
	col = gIndexTableCol;
511
512
	while ( row < gTableRows &&
513
			gIndexTableCell == gLogicalCells[row*gTableCols+col]
514
		  )
515
		{
516
		rowspan++;
517
		row++;
518
		}
519
415 by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding
520
	// Scan to the right from the current cell
1 by JussiP
Imported source zip of Cuneiform
521
	row = gIndexTableRow;
522
	col = gIndexTableCol;
523
524
	while ( col < gTableCols &&
525
			gIndexTableCell == gLogicalCells[row*gTableCols+col]
526
		  )
527
		{
528
		colspan++;
529
		col++;
530
		}
531
532
	ASSERT(rowspan>0 && colspan>0);
533
	return TRUE;
534
}
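The span computation in CalcCellSpan() reads a row-major grid in which every physical cell stores the id of the logical cell that covers it. A minimal sketch of the same scan, assuming that layout and using hypothetical names (sk_cell_span and its parameters replace the globals), might look like this:

```c
#include <assert.h>

/* cells is a rows*cols row-major array of logical cell ids.
 * Starting at (row, col), count how far the same id extends
 * downwards (rowspan) and to the right (colspan). */
static void sk_cell_span(const long *cells, long rows, long cols,
                         long row, long col,
                         long *rowspan, long *colspan)
{
    const long id = cells[row * cols + col];
    long r = row;
    long c = col;

    *rowspan = 0;
    *colspan = 0;

    /* Scan downwards in the current column. */
    while (r < rows && cells[r * cols + col] == id) {
        (*rowspan)++;
        r++;
    }

    /* Scan to the right in the current row. */
    while (c < cols && cells[row * cols + c] == id) {
        (*colspan)++;
        c++;
    }

    /* The starting cell always matches itself, so both spans are >= 1. */
    assert(*rowspan > 0 && *colspan > 0);
}
```

The scan counts from the given position only, so it reports the full extent of a merged cell only when called at that cell's top-left corner.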
535
//********************************************************************
412.1.14 by uliss
Migration from BOOL to Bool
536
static Bool OptimizeTags()
1 by JussiP
Imported source zip of Cuneiform
537
{
415 by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding
538
	// Eliminate redundant tags
1 by JussiP
Imported source zip of Cuneiform
539
	long l1 = 0, l2 = 0;
540
	char *p;
541
542
#define SUBST(a,b) {\
543
		l1 = strlen(a);\
544
		l2 = strlen(b);\
545
		p = (char*)gMemCur - l1;\
546
		if (!memcmp(a,p,l1))\
547
			{\
548
			strcpy(p,b);\
549
			gMemCur -= l1 - l2;\
550
			}\
551
		}
552
553
	SUBST("<td><p>","<td>");
554
	SUBST("</p><td>","<td>");
555
	SUBST("</p></table>","</table>");
556
	SUBST("<p></p>","");
557
	SUBST("<br></p>","</p>");
558
559
	return TRUE;
560
}
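The SUBST macro rewrites the tail of the output buffer when it ends with a given tag sequence. The sketch below shows the same idea as a function, under assumptions: sk_subst_tail() is a hypothetical name, the buffer is treated as a NUL-terminated string with an explicit length, and a length check is added so that the comparison never reads before the start of the buffer (the macro itself shows no such check).

```c
#include <stddef.h>
#include <string.h>

/* If the first *len bytes of buf end with `from`, replace that tail
 * with `to` (which must not be longer than `from`) and update *len.
 * buf must have room for a terminating NUL at buf[*len].
 * Returns 1 if a substitution was made, 0 otherwise. */
static int sk_subst_tail(char *buf, size_t *len,
                         const char *from, const char *to)
{
    size_t l1 = strlen(from);
    size_t l2 = strlen(to);
    char *p;

    if (l2 > l1 || *len < l1)
        return 0;               /* would grow, or buffer too short */

    p = buf + *len - l1;
    if (memcmp(p, from, l1) != 0)
        return 0;               /* tail does not match */

    memcpy(p, to, l2);
    p[l2] = '\0';
    *len -= l1 - l2;            /* the output shrinks by the difference */
    return 1;
}
```

Called with a buffer ending in "<td><p>" and the pair ("<td><p>", "<td>"), the helper drops the redundant paragraph tag and shortens the length by three bytes.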
561
//********************************************************************
412.1.14 by uliss
Migration from BOOL to Bool
562
static Bool Picture()
1 by JussiP
Imported source zip of Cuneiform
563
{
415 by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding
564
/* Picture.
1 by JussiP
Imported source zip of Cuneiform
565
256 by Jussi Pakkanen
HTML image fix from Alex Samorukov.
566
	gPictureNumber - image number, starting from 1
567
	gPictureData   - DIB address, with header
568
	gPictureLength - DIB length, with header
1 by JussiP
Imported source zip of Cuneiform
569
415 by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding
570
	1. Create a subfolder for pictures, "<page>_files"
571
	2. Write the picture to a BMP file, <number>.bmp.
572
	3. Insert an "img" tag that references the picture file.
1 by JussiP
Imported source zip of Cuneiform
573
*/
574
	char buf[256] = "";
575
	char absPicFileName[256] = "";
576
	char relPicFileName[256] = "";
249 by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov.
577
	char dir[_MAX_PATH], name[_MAX_PATH], ext[_MAX_EXT];
1 by JussiP
Imported source zip of Cuneiform
578
256 by Jussi Pakkanen
HTML image fix from Alex Samorukov.
579
	// create folder for images gPageFilesFolder.
1 by JussiP
Imported source zip of Cuneiform
580
	if ( !CreatePageFilesFolder() )
581
		return FALSE;
582
256 by Jussi Pakkanen
HTML image fix from Alex Samorukov.
583
	// create file name
249 by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov.
584
	split_path(gPageName, dir, name, ext);
1 by JussiP
Imported source zip of Cuneiform
585
256 by Jussi Pakkanen
HTML image fix from Alex Samorukov.
586
	// write picture to bmp file
587
	if(dir[0])
458 by Jussi Pakkanen
Format modifier fixes.
588
	    snprintf(absPicFileName, sizeof(absPicFileName), "%s/%s/%ld.bmp", dir,
256 by Jussi Pakkanen
HTML image fix from Alex Samorukov.
589
	            gPageFilesFolder, gPictureNumber);
590
	else
458 by Jussi Pakkanen
Format modifier fixes.
591
	    snprintf(absPicFileName, sizeof(absPicFileName), "%s/%ld.bmp",
256 by Jussi Pakkanen
HTML image fix from Alex Samorukov.
592
	            gPageFilesFolder, gPictureNumber);
1 by JussiP
Imported source zip of Cuneiform
593
458 by Jussi Pakkanen
Format modifier fixes.
594
	snprintf (relPicFileName, sizeof(relPicFileName), "%s/%ld.bmp",
1 by JussiP
Imported source zip of Cuneiform
595
		gPageFilesFolder, gPictureNumber);
596
597
	if ( !WritePictureToBMP_File(
598
					gPictureData,
599
					gPictureLength,
600
					absPicFileName)
601
		)
602
		return FALSE;
603
256 by Jussi Pakkanen
HTML image fix from Alex Samorukov.
604
	// write img html tag.
1 by JussiP
Imported source zip of Cuneiform
605
	snprintf (buf, sizeof(buf), "<img src=%s "
458 by Jussi Pakkanen
Format modifier fixes.
606
"width=%ld height=%ld "
1 by JussiP
Imported source zip of Cuneiform
607
"alt=\"%s\">",
608
		relPicFileName,
205 by Jussi Pakkanen
Removed end-of-line whitespace.
609
		gPictureGoal.cx * 72L / 1440L,
1 by JussiP
Imported source zip of Cuneiform
610
		gPictureGoal.cy * 72L / 1440L,
611
		relPicFileName
612
		);
613
614
	PUT_STRING(buf);
615
	return TRUE;
616
}
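The tag that Picture() emits can be reproduced in a small standalone helper. This is a sketch under assumptions: sk_img_tag() and its parameters are hypothetical and replace the globals, and the 72/1440 factor is read here as a conversion from twips (1/1440 inch) to 72-per-inch units, which the source does not state explicitly.

```c
#include <stddef.h>
#include <stdio.h>

/* Build the relative file name "<folder>/<number>.bmp" and an img tag
 * that references it. cx and cy are the target size in the units the
 * original scales by 72/1440. Returns 1 on success, 0 if either the
 * file name or the tag would not fit. */
static int sk_img_tag(const char *folder, long number, long cx, long cy,
                      char *out, size_t cap)
{
    char rel[256];
    int n;

    n = snprintf(rel, sizeof rel, "%s/%ld.bmp", folder, number);
    if (n < 0 || (size_t)n >= sizeof rel)
        return 0;

    n = snprintf(out, cap,
                 "<img src=%s width=%ld height=%ld alt=\"%s\">",
                 rel, cx * 72L / 1440L, cy * 72L / 1440L, rel);
    return n >= 0 && (size_t)n < cap;
}
```

As in the original, the src value is written unquoted, so a folder name containing spaces would produce a malformed attribute; quoting it would be a reasonable hardening step.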
617
//********************************************************************
249 by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov.
618
619
/**
620
 * Create a subdirectory to hold image files for html document.
621
 */
412.1.14 by uliss
Migration from BOOL to Bool
622
static Bool CreatePageFilesFolder() {
412.1.1 by uliss
BIG Changes))) need to commit mo frequently
623
    // Create the subfolder for pictures, gPageFilesFolder.
249 by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov.
624
    char dir[_MAX_PATH], name[_MAX_PATH], ext[_MAX_EXT], path[_MAX_PATH];
625
415 by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding
626
    // Is the page name set?
249 by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov.
627
    if (!gPageName[0])
628
        return FALSE;
629
415 by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding
630
    // Build the subfolder name
249 by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov.
631
    split_path(gPageName, dir, name, ext);
632
    memset(gPageFilesFolder, 0, sizeof(gPageFilesFolder));
633
    snprintf(gPageFilesFolder, sizeof(gPageFilesFolder), "%s_files", name);
634
415 by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding
635
    // Create the subfolder
254 by Jussi Pakkanen
Fix path thing. Thanks to Alex Samorukov.
636
    if(dir[0])
637
        snprintf(path, sizeof(path), "%s/%s", dir, gPageFilesFolder);
638
    else
639
        snprintf(path, sizeof(path), "%s", gPageFilesFolder);
249 by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov.
640
    if (CreateDirectory(&path[0], 0) == FALSE) {
422 by Jussi Pakkanen
Trial merge from refactoring branch.
641
        uint32_t err = GetLastError();
249 by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov.
642
        if (err != ERROR_ALREADY_EXISTS) {
643
            DEBUG_PRINT("CreatePageFilesFolder error = %u",(unsigned)err);
644
            return FALSE;
645
        }
646
    }
647
648
    return TRUE;
1 by JussiP
Imported source zip of Cuneiform
649
}
650
//********************************************************************