1
by JussiP
Imported source zip of Cuneiform |
1 |
/*
|
2 |
Copyright (c) 1993-2008, Cognitive Technologies
|
|
3 |
All rights reserved.
|
|
4 |
||
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
5 |
Разрешается повторное распространение и использование как в виде исходного кода,
|
6 |
так и в двоичной форме, с изменениями или без, при соблюдении следующих условий:
|
|
7 |
||
8 |
* При повторном распространении исходного кода должны оставаться указанное
|
|
9 |
выше уведомление об авторском праве, этот список условий и последующий
|
|
10 |
отказ от гарантий.
|
|
11 |
* При повторном распространении двоичного кода в документации и/или в
|
|
12 |
других материалах, поставляемых при распространении, должны сохраняться
|
|
13 |
указанная выше информация об авторском праве, этот список условий и
|
|
14 |
последующий отказ от гарантий.
|
|
15 |
* Ни название Cognitive Technologies, ни имена ее сотрудников не могут
|
|
16 |
быть использованы в качестве средства поддержки и/или продвижения
|
|
17 |
продуктов, основанных на этом ПО, без предварительного письменного
|
|
18 |
разрешения.
|
|
19 |
||
20 |
ЭТА ПРОГРАММА ПРЕДОСТАВЛЕНА ВЛАДЕЛЬЦАМИ АВТОРСКИХ ПРАВ И/ИЛИ ДРУГИМИ ЛИЦАМИ "КАК
|
|
21 |
ОНА ЕСТЬ" БЕЗ КАКОГО-ЛИБО ВИДА ГАРАНТИЙ, ВЫРАЖЕННЫХ ЯВНО ИЛИ ПОДРАЗУМЕВАЕМЫХ,
|
|
22 |
ВКЛЮЧАЯ ГАРАНТИИ КОММЕРЧЕСКОЙ ЦЕННОСТИ И ПРИГОДНОСТИ ДЛЯ КОНКРЕТНОЙ ЦЕЛИ, НО НЕ
|
|
23 |
ОГРАНИЧИВАЯСЬ ИМИ. НИ ВЛАДЕЛЕЦ АВТОРСКИХ ПРАВ И НИ ОДНО ДРУГОЕ ЛИЦО, КОТОРОЕ
|
|
24 |
МОЖЕТ ИЗМЕНЯТЬ И/ИЛИ ПОВТОРНО РАСПРОСТРАНЯТЬ ПРОГРАММУ, НИ В КОЕМ СЛУЧАЕ НЕ
|
|
25 |
НЕСЁТ ОТВЕТСТВЕННОСТИ, ВКЛЮЧАЯ ЛЮБЫЕ ОБЩИЕ, СЛУЧАЙНЫЕ, СПЕЦИАЛЬНЫЕ ИЛИ
|
|
26 |
ПОСЛЕДОВАВШИЕ УБЫТКИ, СВЯЗАННЫЕ С ИСПОЛЬЗОВАНИЕМ ИЛИ ПОНЕСЕННЫЕ ВСЛЕДСТВИЕ
|
|
27 |
НЕВОЗМОЖНОСТИ ИСПОЛЬЗОВАНИЯ ПРОГРАММЫ (ВКЛЮЧАЯ ПОТЕРИ ДАННЫХ, ИЛИ ДАННЫЕ,
|
|
28 |
СТАВШИЕ НЕГОДНЫМИ, ИЛИ УБЫТКИ И/ИЛИ ПОТЕРИ ДОХОДОВ, ПОНЕСЕННЫЕ ИЗ-ЗА ДЕЙСТВИЙ
|
|
29 |
ТРЕТЬИХ ЛИЦ И/ИЛИ ОТКАЗА ПРОГРАММЫ РАБОТАТЬ СОВМЕСТНО С ДРУГИМИ ПРОГРАММАМИ,
|
|
30 |
НО НЕ ОГРАНИЧИВАЯСЬ ЭТИМИ СЛУЧАЯМИ), НО НЕ ОГРАНИЧИВАЯСЬ ИМИ, ДАЖЕ ЕСЛИ ТАКОЙ
|
|
31 |
ВЛАДЕЛЕЦ ИЛИ ДРУГОЕ ЛИЦО БЫЛИ ИЗВЕЩЕНЫ О ВОЗМОЖНОСТИ ТАКИХ УБЫТКОВ И ПОТЕРЬ.
|
|
1
by JussiP
Imported source zip of Cuneiform |
32 |
|
33 |
Redistribution and use in source and binary forms, with or without modification,
|
|
34 |
are permitted provided that the following conditions are met:
|
|
35 |
||
36 |
* Redistributions of source code must retain the above copyright notice,
|
|
37 |
this list of conditions and the following disclaimer.
|
|
38 |
* Redistributions in binary form must reproduce the above copyright notice,
|
|
39 |
this list of conditions and the following disclaimer in the documentation
|
|
40 |
and/or other materials provided with the distribution.
|
|
41 |
* Neither the name of the Cognitive Technologies nor the names of its
|
|
42 |
contributors may be used to endorse or promote products derived from this
|
|
43 |
software without specific prior written permission.
|
|
44 |
||
45 |
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
|
|
205
by Jussi Pakkanen
Removed end-of-line whitespace. |
46 |
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
|
47 |
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
|
|
48 |
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
|
|
49 |
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
50 |
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
|
|
51 |
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
|
|
52 |
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
|
|
53 |
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
|
1
by JussiP
Imported source zip of Cuneiform |
54 |
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
55 |
*/
|
|
56 |
||
57 |
||
58 |
// HTML.cpp
|
|
59 |
||
60 |
//********************************************************************
|
|
61 |
//
|
|
417
by julien
attempt fix comments |
62 |
// HTML.cpp - HTML format
|
1
by JussiP
Imported source zip of Cuneiform |
63 |
//
|
64 |
// This file creation date: 27.05.99
|
|
65 |
// By Eugene Pliskin pliskin@cs.isa.ac.ru
|
|
66 |
//********************************************************************
|
|
67 |
||
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
68 |
|
89
by JussiP
Rout compiles. |
69 |
#include <string.h> |
422
by Jussi Pakkanen
Trial merge from refactoring branch. |
70 |
#include <string> |
71 |
#include <sstream> |
|
72 |
#include <vector> |
|
73 |
||
1
by JussiP
Imported source zip of Cuneiform |
74 |
#include "stdafx.h" |
75 |
#include "rout_own.h" |
|
89
by JussiP
Rout compiles. |
76 |
#include "compat_defs.h" |
1
by JussiP
Imported source zip of Cuneiform |
77 |
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
78 |
using namespace std; |
79 |
||
412.1.14
by uliss
Migration from BOOL to Bool |
80 |
static Bool Static_MakeHTML(Handle hObject, long reason); |
1
by JussiP
Imported source zip of Cuneiform |
81 |
|
412.1.14
by uliss
Migration from BOOL to Bool |
82 |
static Bool FontStyle(ulong newStyle); |
83 |
static Bool BeginParagraph(Handle hObject); |
|
84 |
static Bool CellStart(); |
|
85 |
static Bool CalcCellSpan(); |
|
86 |
static Bool OptimizeTags(); |
|
87 |
static Bool Picture(); |
|
88 |
static Bool CreatePageFilesFolder(); |
|
1
by JussiP
Imported source zip of Cuneiform |
89 |
|
422
by Jussi Pakkanen
Trial merge from refactoring branch. |
90 |
static ulong sFontStyle = 0; // Font style |
1
by JussiP
Imported source zip of Cuneiform |
91 |
static long rowspan = 0, colspan = 0; |
412.1.14
by uliss
Migration from BOOL to Bool |
92 |
static Bool hocrmode = FALSE; // If true, print hOCR tags to output. |
1
by JussiP
Imported source zip of Cuneiform |
93 |
|
94 |
//********************************************************************
|
|
412.1.14
by uliss
Migration from BOOL to Bool |
95 |
Bool MakeHTML() |
1
by JussiP
Imported source zip of Cuneiform |
96 |
{
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
97 |
/* HTML format.
|
1
by JussiP
Imported source zip of Cuneiform |
98 |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
99 |
Tables are included.
|
100 |
Line breaks are preserved if gPreserveLineBreaks = TRUE.
|
|
1
by JussiP
Imported source zip of Cuneiform |
101 |
*/
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
102 |
sFontStyle = 0; // Font style |
292
by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe. |
103 |
hocrmode = FALSE; |
1
by JussiP
Imported source zip of Cuneiform |
104 |
|
105 |
return BrowsePage(Static_MakeHTML, |
|
106 |
FALSE, // wantSkipTableCells |
|
107 |
FALSE); // wantSkipParagraphs |
|
108 |
||
109 |
}
|
|
110 |
//********************************************************************
|
|
412.1.14
by uliss
Migration from BOOL to Bool |
111 |
Bool MakeHOCR() { |
292
by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe. |
112 |
sFontStyle = 0; |
113 |
hocrmode = TRUE; |
|
114 |
return BrowsePage(Static_MakeHTML, FALSE, FALSE); |
|
115 |
}
|
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
116 |
|
117 |
/*!
|
|
118 |
\brief \~english Put the stream buffer contents into the buffer for OCR results.
|
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
119 |
\~russian Поместить содержимое строкового потока в буфер
|
120 |
результатов распознавания.
|
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
121 |
*/
|
426
by Jussi Pakkanen
Merged more integer type refactoring. |
122 |
static Bool |
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
123 |
strm2buf(const ostringstream& outStrm) |
124 |
{
|
|
125 |
unsigned long sizeMem = outStrm.str().size(); |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
126 |
// check that there is enough memory
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
127 |
CHECK_MEMORY(sizeMem + 10); |
128 |
||
129 |
::memcpy(gMemCur, outStrm.str().c_str(), sizeMem); |
|
130 |
gMemCur += sizeMem; |
|
131 |
||
132 |
return TRUE; |
|
133 |
}
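The append-with-capacity-check pattern used by `strm2buf` can be sketched in isolation as follows. This is only an illustration under stated assumptions: `OutBuffer` and `appendStream` are hypothetical names standing in for the `gMemCur` output window and the `CHECK_MEMORY` macro, whose real definitions are not shown in this file. The sketch copies the stream contents into one `std::string` so that the size and the data come from the same object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical stand-in for the global output window (not the real rout API).
struct OutBuffer {
    char* cur;  // next free byte
    char* end;  // one past the last usable byte
};

// Append the stream contents to the buffer.
// Returns false, leaving the buffer untouched, if the text plus `slack`
// spare bytes would not fit.
inline bool appendStream(OutBuffer& out, const std::ostringstream& strm,
                         std::size_t slack = 10)
{
    assert(out.cur <= out.end);
    const std::string text = strm.str();  // materialise the contents once
    const std::size_t freeBytes = static_cast<std::size_t>(out.end - out.cur);
    if (text.size() + slack > freeBytes)
        return false;
    std::memcpy(out.cur, text.data(), text.size());
    out.cur += text.size();
    return true;
}
```

Returning `false` instead of jumping out through a macro is a design choice of this sketch; it keeps the failure path visible to the caller.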
|
|
134 |
||
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
135 |
|
136 |
||
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
137 |
/*!
|
138 |
\brief \~english Put info about hOCR text line into buffer for OCR results.
|
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
139 |
\~russian Поместить текстовую строку hOCR в буфер результатов распознавания.
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
140 |
*/
|
426
by Jussi Pakkanen
Merged more integer type refactoring. |
141 |
static Bool |
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
142 |
writeHocrLineStartTag(Byte* pLineStart, const edRect& rcLine, const unsigned int iLine) |
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
143 |
{
|
144 |
ASSERT(pLineStart); |
|
145 |
ostringstream outStrm; |
|
146 |
outStrm << "<span class='ocr_line' id='line_" << iLine << "' " |
|
147 |
<< "title=\"bbox " |
|
148 |
<< rcLine.left << " " |
|
149 |
<< rcLine.top << " " |
|
150 |
<< rcLine.right << " " |
|
151 |
<< rcLine.bottom << "\">"; |
|
152 |
outStrm.write(reinterpret_cast<const char*>(pLineStart), gMemCur - pLineStart); |
|
153 |
||
154 |
unsigned long sizeMem = outStrm.str().size(); |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
155 |
// check that there is enough memory
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
156 |
CHECK_MEMORY(sizeMem + 10); |
157 |
||
158 |
::memcpy(pLineStart, outStrm.str().c_str(), sizeMem); |
|
159 |
gMemCur = pLineStart + sizeMem; |
|
160 |
||
161 |
return TRUE; |
|
162 |
}
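The function above wraps text that was already emitted, because the line's bounding box is only known once every character has been seen. A minimal sketch of the same idea with plain strings is shown below. `Rect`, `hocrLineOpenTag` and `wrapLine` are hypothetical names, and `Rect` merely assumes that `edRect` exposes integer `left`, `top`, `right` and `bottom` members.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for edRect; integer members are an assumption.
struct Rect { int left, top, right, bottom; };

// Build the opening hOCR tag for one text line.
inline std::string hocrLineOpenTag(const Rect& rc, unsigned int lineNo)
{
    assert(rc.left <= rc.right && rc.top <= rc.bottom);
    std::ostringstream s;
    s << "<span class='ocr_line' id='line_" << lineNo << "' "
      << "title=\"bbox " << rc.left << ' ' << rc.top << ' '
      << rc.right << ' ' << rc.bottom << "\">";
    return s.str();
}

// Wrap a line body that was produced before its bounding box was known.
inline std::string wrapLine(const std::string& body, const Rect& rc,
                            unsigned int lineNo)
{
    return hocrLineOpenTag(rc, lineNo) + body + "</span>";
}
```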
|
|
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
163 |
|
164 |
||
165 |
||
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
166 |
static bool |
167 |
isGoodCharRect(const edRect& rc) |
|
168 |
{
|
|
169 |
bool goodCharRect = true; |
|
170 |
goodCharRect = goodCharRect && (rc.left != -1); |
|
171 |
goodCharRect = goodCharRect && (rc.left != 65535); |
|
172 |
goodCharRect = goodCharRect && (rc.right != 65535); |
|
173 |
goodCharRect = goodCharRect && (rc.top != 65535); |
|
174 |
goodCharRect = goodCharRect && (rc.bottom != 65535); |
|
175 |
return goodCharRect; |
|
176 |
}
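The sentinel check above can be combined with a min/max merge to derive a line's bounding box from its character rectangles. The sketch below is a self-contained illustration, not the project's code: `Rect`, `isValid` and `lineBounds` are hypothetical names, and the sentinel values (`-1` and `65535`) are taken directly from `isGoodCharRect`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for edRect; integer members are an assumption.
struct Rect { int left, top, right, bottom; };

// Same sentinel test as isGoodCharRect, written as a single expression.
inline bool isValid(const Rect& rc)
{
    return rc.left != -1 && rc.left != 65535 && rc.right != 65535 &&
           rc.top != 65535 && rc.bottom != 65535;
}

// Compute the union of all valid rectangles.
// Returns false (and leaves `line` untouched) when none is valid.
inline bool lineBounds(const std::vector<Rect>& chars, Rect& line)
{
    bool seen = false;
    for (const Rect& rc : chars) {
        if (!isValid(rc))
            continue;               // skip spaces and other boxless characters
        if (!seen) {
            line = rc;              // first valid box starts the line
            seen = true;
        } else {
            line.left   = std::min(line.left,   rc.left);
            line.top    = std::min(line.top,    rc.top);
            line.right  = std::max(line.right,  rc.right);
            line.bottom = std::max(line.bottom, rc.bottom);
        }
    }
    assert(!seen || isValid(line)); // a union of valid boxes stays valid
    return seen;
}
```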
|
|
177 |
||
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
178 |
// decided to use CHECK_MEMORY macro in case it becomes a function which does more things than check if gMemCur+a>gMemEnd
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
179 |
// as a consequence, this function assures that allocated memory in gMemCur is enough.
|
426
by Jussi Pakkanen
Merged more integer type refactoring. |
180 |
static Bool |
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
181 |
writeHocrCharBBoxesInfo(const std::vector<edRect > &charBboxes, const unsigned int iLine) |
182 |
{
|
|
183 |
ostringstream outStrm; |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
184 |
outStrm << "<span class='ocr_cinfo' title=\"x_bboxes "; |
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
185 |
|
186 |
for (unsigned int i = 0; i < charBboxes.size(); i++) { |
|
187 |
outStrm << charBboxes[i].left << " " << charBboxes[i].top << " " |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
188 |
<< charBboxes[i].right << " " << charBboxes[i].bottom << " "; |
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
189 |
}
|
190 |
||
191 |
outStrm << "\"></span>"; |
|
192 |
||
193 |
unsigned long sizeMem = outStrm.str().size(); |
|
194 |
||
195 |
// (check memory assures gMemCur can store and has 10 bytes extra).
|
|
196 |
// the comment below was copied from writeHocrLine
|
|
417
by julien
attempt fix comments |
197 |
// check that there is enough memory
|
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
198 |
CHECK_MEMORY(sizeMem + 10); |
199 |
||
200 |
::memcpy(gMemCur, outStrm.str().c_str(), sizeMem); |
|
201 |
gMemCur += sizeMem; |
|
202 |
||
203 |
return TRUE; |
|
204 |
}
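For reference, the `ocr_cinfo` string built above can be sketched as a pure function that returns a `std::string`. `Rect` and `hocrCharInfo` are hypothetical names, and `Rect` again assumes integer members. Unlike the function above, this variant writes the separator before each box, so no trailing space is left in front of the closing quote.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for edRect; integer members are an assumption.
struct Rect { int left, top, right, bottom; };

// Format per-character boxes as one empty ocr_cinfo span.
inline std::string hocrCharInfo(const std::vector<Rect>& boxes)
{
    std::ostringstream s;
    s << "<span class='ocr_cinfo' title=\"x_bboxes";
    for (const Rect& b : boxes)
        s << ' ' << b.left << ' ' << b.top << ' ' << b.right << ' ' << b.bottom;
    s << "\"></span>";
    return s.str();
}
```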
|
|
205 |
||
206 |
||
292
by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe. |
207 |
//********************************************************************
|
412.1.14
by uliss
Migration from BOOL to Bool |
208 |
Bool Static_MakeHTML( |
205
by Jussi Pakkanen
Removed end-of-line whitespace. |
209 |
Handle hObject, |
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
210 |
long reason // See enum BROWSE_REASON |
1
by JussiP
Imported source zip of Cuneiform |
211 |
)
|
212 |
{
|
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
213 |
static char buf[256] = {0}; |
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
214 |
//! \~russian прямоугольник символа
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
215 |
edRect r = {0}; |
216 |
||
217 |
static unsigned int iPage(1); |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
218 |
//! \~russian прямоугольник строки
|
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
219 |
//! \~english rectangle state variable, for the current line, is expanded per incoming char.
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
220 |
static edRect rcLine = {0}; |
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
221 |
//! \~russian прямоугольник строки
|
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
222 |
//! \~english true if the last non-space character was in the line (i.e. had a valid bbox).
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
223 |
static bool isInLine(false); |
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
224 |
//! \~russian номер текущей строки
|
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
225 |
//! \~english counter holding the current line number.
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
226 |
static unsigned int iLine(1); |
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
227 |
//! \~russian позиция начала строки в текстовом буфере вывода
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
228 |
static Byte* pLineStart = 0; |
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
229 |
//! \~english is the ptr to the location that gMemCur pointed to when reason was BROWSE_LINE_START
|
230 |
||
231 |
static std::vector<edRect > currentLineCharBBoxes; |
|
232 |
currentLineCharBBoxes.reserve(200); |
|
233 |
||
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
234 |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
235 |
// WordControl is called at the end
|
1
by JussiP
Imported source zip of Cuneiform |
236 |
|
237 |
switch(reason) |
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
238 |
{
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
239 |
case BROWSE_CHAR: // Character |
1
by JussiP
Imported source zip of Cuneiform |
240 |
{
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
241 |
// Set the language
|
1
by JussiP
Imported source zip of Cuneiform |
242 |
long lang = CED_GetCharFontLang(hObject); |
243 |
if (lang != gLanguage) |
|
244 |
SetLanguage(lang); |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
245 |
// Font style
|
1
by JussiP
Imported source zip of Cuneiform |
246 |
FontStyle(CED_GetCharFontAttribs(hObject)); |
247 |
||
292
by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe. |
248 |
r = CED_GetCharLayout(hObject); |
418
by julien
attempt fix comments |
249 |
currentLineCharBBoxes.push_back(r); |
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
250 |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
251 |
// Write the character
|
418
by julien
attempt fix comments |
252 |
if(isGoodCharRect(r) && hocrmode) |
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
253 |
{
|
418
by julien
attempt fix comments |
254 |
if (0 == isInLine) |
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
255 |
// start determining the line bounds
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
256 |
{
|
257 |
if (isGoodCharRect(r)) |
|
258 |
{
|
|
259 |
rcLine = r; |
|
260 |
isInLine = true; |
|
261 |
}
|
|
262 |
}
|
|
263 |
else
|
|
264 |
{
|
|
265 |
if (isGoodCharRect(r)) |
|
266 |
{
|
|
267 |
rcLine.left = min(rcLine.left, r.left); |
|
268 |
rcLine.top = min(rcLine.top, r.top); |
|
269 |
rcLine.right = max(rcLine.right, r.right); |
|
270 |
rcLine.bottom = max(rcLine.bottom, r.bottom); |
|
271 |
}
|
|
272 |
else
|
|
273 |
{
|
|
274 |
}
|
|
275 |
}
|
|
276 |
}
|
|
277 |
ONE_CHAR(hObject); |
|
278 |
||
279 |
break; |
|
280 |
}
|
|
281 |
case BROWSE_LINE_START: |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
282 |
// Start of a text line
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
283 |
pLineStart = gMemCur; |
284 |
::memset(&rcLine, 0, sizeof(rcLine)); |
|
1
by JussiP
Imported source zip of Cuneiform |
285 |
break; |
286 |
||
287 |
case BROWSE_LINE_END: |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
288 |
// End of a text line
|
419
by julien
no hocr tags are written if -f html is used. |
289 |
if (hocrmode) |
290 |
writeHocrLineStartTag(pLineStart, rcLine, iLine); |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
291 |
FontStyle(0); |
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
292 |
|
293 |
// write character bounding boxes info
|
|
294 |
if (currentLineCharBBoxes.size()) |
|
419
by julien
no hocr tags are written if -f html is used. |
295 |
if (hocrmode) |
296 |
writeHocrCharBBoxesInfo(currentLineCharBBoxes, iLine); |
|
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
297 |
currentLineCharBBoxes.resize(0); |
298 |
||
299 |
||
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
300 |
isInLine = false; |
1
by JussiP
Imported source zip of Cuneiform |
301 |
if ( gPreserveLineBreaks || gEdLineHardBreak ) |
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
302 |
{
|
1
by JussiP
Imported source zip of Cuneiform |
303 |
PUT_STRING("<br>"); |
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
304 |
}
|
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
305 |
|
306 |
iLine++; |
|
307 |
// close HocrLine tag
|
|
308 |
PUT_STRING("</span>"); |
|
309 |
||
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
310 |
NEW_LINE; |
1
by JussiP
Imported source zip of Cuneiform |
311 |
break; |
312 |
||
313 |
case BROWSE_PARAGRAPH_START: |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
314 |
// Start of a paragraph
|
1
by JussiP
Imported source zip of Cuneiform |
315 |
FontStyle(0); |
292
by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe. |
316 |
BeginParagraph(hObject); |
1
by JussiP
Imported source zip of Cuneiform |
317 |
break; |
318 |
||
319 |
case BROWSE_PARAGRAPH_END: |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
320 |
// End of a paragraph
|
1
by JussiP
Imported source zip of Cuneiform |
321 |
FontStyle(0); |
322 |
PUT_STRING("</p>"); |
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
323 |
NEW_LINE; |
1
by JussiP
Imported source zip of Cuneiform |
324 |
break; |
325 |
||
326 |
case BROWSE_PAGE_START: |
|
276.1.4
by Jussi Pakkanen
Doctype fix from Alex Samorukov. |
327 |
// Start of page.
|
1
by JussiP
Imported source zip of Cuneiform |
328 |
FontStyle(0); |
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
329 |
{
|
330 |
ostringstream outStrm; |
|
331 |
outStrm << "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 " |
|
332 |
"Transitional//EN\"" |
|
333 |
" \"http://www.w3.org/TR/html4/loose.dtd\">" << endl; |
|
334 |
outStrm << "<html><head><title></title>" << endl; |
|
335 |
if (gActiveCode==ROUT_CODE_UTF8) |
|
336 |
{
|
|
337 |
outStrm << "<meta http-equiv=\"Content-Type\"" |
|
338 |
" content=\"text/html;charset=utf-8\" >" << endl; |
|
339 |
}
|
|
340 |
outStrm << "<meta name='ocr-system' content='openocr'>" << endl; |
|
341 |
outStrm << "</head>" << endl << "<body>"; |
|
342 |
strm2buf(outStrm); |
|
343 |
}
|
|
344 |
{
|
|
345 |
ostringstream outStrm; |
|
346 |
EDSIZE sizeImage(CED_GetPageImageSize(hObject)); |
|
347 |
const char* pImageName = CED_GetPageImageName(hObject); |
|
348 |
assert(pImageName); |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
349 |
// example: <div class='ocr_page' title='image "page-000.pbm"; bbox 0 0 4306 6064'>
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
350 |
outStrm << "<div class='ocr_page' id='page_" << iPage << "' "; |
351 |
outStrm << "title='image \"" << pImageName << "\"; bbox 0 0 " |
|
414
by julien
separated ocr_line and character bboxes. now follows the hocr standard using the ocr_cinfo tag for char bboxes |
352 |
<< sizeImage.cx << " " << sizeImage.cy << "'>" << endl; |
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
353 |
strm2buf(outStrm); |
354 |
++iPage; |
|
355 |
}
|
|
1
by JussiP
Imported source zip of Cuneiform |
356 |
break; |
357 |
||
358 |
case BROWSE_PAGE_END: |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
359 |
// End of page
|
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
360 |
PUT_STRING("</div>"); |
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
361 |
// End of document
|
432
by Jussi Pakkanen
End HTML output with linefeed. |
362 |
PUT_STRING("</body></html>\n"); |
413
by Dmitry Polevoy
hocr format now supports ocr_line. Replaced cuneiform_src/Kern/rout/src/html.cpp to the patch submitted in the cuneiform mailing list the 24th of February by Dmitry Polevoy. Changed %d to %l in a few sprintf statements in html.cpp |
363 |
iLine = 1; |
1
by JussiP
Imported source zip of Cuneiform |
364 |
break; |
365 |
||
366 |
case BROWSE_TABLE_START: |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
367 |
// Start of a table
|
1
by JussiP
Imported source zip of Cuneiform |
368 |
FontStyle(0); |
369 |
PUT_STRING("<table border>"); |
|
370 |
break; |
|
371 |
||
372 |
case BROWSE_TABLE_END: |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
373 |
// End of a table
|
1
by JussiP
Imported source zip of Cuneiform |
374 |
FontStyle(0); |
375 |
PUT_STRING("</table>"); |
|
376 |
break; |
|
377 |
||
378 |
case BROWSE_ROW_START: |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
379 |
// Start of a table row
|
1
by JussiP
Imported source zip of Cuneiform |
380 |
PUT_STRING("<tr>"); |
381 |
break; |
|
382 |
||
383 |
case BROWSE_CELL_START: |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
384 |
// Start of table cell
|
1
by JussiP
Imported source zip of Cuneiform |
385 |
CellStart(); |
386 |
break; |
|
387 |
||
388 |
case BROWSE_PICTURE: |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
389 |
// Picture
|
1
by JussiP
Imported source zip of Cuneiform |
390 |
Picture(); |
391 |
break; |
|
392 |
||
393 |
}
|
|
394 |
||
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
395 |
// Track words and lines
|
1
by JussiP
Imported source zip of Cuneiform |
396 |
WORDS_CONTROL(reason); |
397 |
||
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
398 |
// Remove redundant tags
|
1
by JussiP
Imported source zip of Cuneiform |
399 |
OptimizeTags(); |
400 |
||
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
401 |
return TRUE; // Continue browsing |
1
by JussiP
Imported source zip of Cuneiform |
402 |
}
|
403 |
//********************************************************************
|
|
412.1.14
by uliss
Migration from BOOL to Bool |
404 |
static Bool FontStyle(ulong newStyle) |
1
by JussiP
Imported source zip of Cuneiform |
405 |
{
|
406 |
||
205
by Jussi Pakkanen
Removed end-of-line whitespace. |
407 |
if ((newStyle & FONT_BOLD) && |
408 |
(!(sFontStyle & FONT_BOLD) || |
|
1
by JussiP
Imported source zip of Cuneiform |
409 |
(sFontStyle & FONT_LIGHT))) |
410 |
{PUT_STRING("<b>");} |
|
411 |
||
205
by Jussi Pakkanen
Removed end-of-line whitespace. |
412 |
else if ((sFontStyle & FONT_BOLD) && |
413 |
(!(newStyle & FONT_BOLD) || |
|
1
by JussiP
Imported source zip of Cuneiform |
414 |
(newStyle & FONT_LIGHT))) |
415 |
{PUT_STRING("</b>");} |
|
416 |
||
205
by Jussi Pakkanen
Removed end-of-line whitespace. |
417 |
if ((newStyle & FONT_ITALIC) && |
1
by JussiP
Imported source zip of Cuneiform |
418 |
(!(sFontStyle & FONT_ITALIC) )) |
419 |
{PUT_STRING("<i>");} |
|
420 |
||
205
by Jussi Pakkanen
Removed end-of-line whitespace. |
421 |
else if ((sFontStyle & FONT_ITALIC) && |
1
by JussiP
Imported source zip of Cuneiform |
422 |
(!(newStyle & FONT_ITALIC) )) |
423 |
{PUT_STRING("</i>");} |
|
424 |
||
205
by Jussi Pakkanen
Removed end-of-line whitespace. |
425 |
if ((newStyle & FONT_UNDERLINE) && |
1
by JussiP
Imported source zip of Cuneiform |
426 |
!(sFontStyle & FONT_UNDERLINE)) |
427 |
{PUT_STRING("<u>");} |
|
428 |
||
205
by Jussi Pakkanen
Removed end-of-line whitespace. |
429 |
else if ((sFontStyle & FONT_UNDERLINE) && |
1
by JussiP
Imported source zip of Cuneiform |
430 |
!(newStyle & FONT_UNDERLINE)) |
431 |
{PUT_STRING("</u>");} |
|
432 |
||
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
433 |
// Remember the font style
|
1
by JussiP
Imported source zip of Cuneiform |
434 |
sFontStyle = newStyle; |
435 |
return TRUE; |
|
436 |
}
|
|
437 |
//********************************************************************
|
|
412.1.14
by uliss
Migration from BOOL to Bool |
438 |
static Bool BeginParagraph(Handle hObject) |
1
by JussiP
Imported source zip of Cuneiform |
439 |
{
|
292
by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe. |
440 |
const char *p = NULL; |
1
by JussiP
Imported source zip of Cuneiform |
441 |
char buf[80] = ""; |
292
by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe. |
442 |
edBox b = CED_GetLayout(hObject); |
412.1.1
by uliss
BIG Changes))) need to commit mo frequently |
443 |
ulong alignment = CED_GetAlignment(hObject); |
1
by JussiP
Imported source zip of Cuneiform |
444 |
|
292
by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe. |
445 |
switch (alignment & ALIGN_MASK) { |
1
by JussiP
Imported source zip of Cuneiform |
446 |
case ALIGN_CENTER: |
447 |
p = "center"; |
|
448 |
break; |
|
449 |
||
450 |
case (ALIGN_LEFT | ALIGN_RIGHT): |
|
451 |
p = "justify"; |
|
452 |
break; |
|
453 |
||
454 |
case ALIGN_LEFT: |
|
455 |
default: |
|
456 |
// "left" by default
|
|
292
by Jussi Pakkanen
Enable hOCR output mode with help from Rene Rebe. |
457 |
;
|
458 |
}
|
|
459 |
||
460 |
PUT_STRING("<p"); |
|
461 |
if (p) { |
|
462 |
sprintf(buf, " align=%s", p); |
|
463 |
PUT_STRING(buf); |
|
464 |
}
|
|
465 |
||
466 |
if (b.x != -1 && hocrmode) { |
|
467 |
sprintf(buf, " title=\"bbox %d %d %d %d\"", b.x, b.y, b.x + b.w, b.y |
|
468 |
+ b.h); |
|
469 |
PUT_STRING(buf); |
|
470 |
}
|
|
471 |
PUT_STRING(">"); |
|
472 |
||
473 |
return TRUE; |
|
1
by JussiP
Imported source zip of Cuneiform |
474 |
}
|
475 |
//********************************************************************
|
|
412.1.14
by uliss
Migration from BOOL to Bool |
476 |
static Bool CellStart() |
1
by JussiP
Imported source zip of Cuneiform |
477 |
{
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
478 |
// Table cell
|
1
by JussiP
Imported source zip of Cuneiform |
479 |
char buf[80] = ""; |
480 |
||
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
481 |
// Compute the cell span
|
1
by JussiP
Imported source zip of Cuneiform |
482 |
CalcCellSpan(); |
483 |
||
484 |
if ( rowspan == 1 && colspan == 1 ) |
|
485 |
strcpy(buf,"<td>"); |
|
486 |
||
487 |
else if ( rowspan > 1 && colspan == 1 ) |
|
458
by Jussi Pakkanen
Format modifier fixes. |
488 |
sprintf(buf,"<td rowspan=%ld>",rowspan); |
1
by JussiP
Imported source zip of Cuneiform |
489 |
|
490 |
else if ( rowspan == 1 && colspan > 1 ) |
|
458
by Jussi Pakkanen
Format modifier fixes. |
491 |
sprintf(buf,"<td colspan=%ld>",colspan); |
1
by JussiP
Imported source zip of Cuneiform |
492 |
|
493 |
else // ( rowspan > 1 && colspan > 1 ) |
|
458
by Jussi Pakkanen
Format modifier fixes. |
494 |
sprintf(buf,"<td rowspan=%ld colspan=%ld>",rowspan,colspan); |
1
by JussiP
Imported source zip of Cuneiform |
495 |
|
496 |
PUT_STRING(buf); |
|
497 |
return TRUE; |
|
498 |
}
|
|
499 |
//********************************************************************
|
|
412.1.14
by uliss
Migration from BOOL to Bool |
500 |
static Bool CalcCellSpan() |
1
by JussiP
Imported source zip of Cuneiform |
501 |
{
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
502 |
// Compute the cell span
|
1
by JussiP
Imported source zip of Cuneiform |
503 |
long row,col; |
504 |
||
505 |
rowspan = 0; |
|
506 |
colspan = 0; |
|
507 |
||
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
508 |
// Scan downward from the current cell
|
1
by JussiP
Imported source zip of Cuneiform |
509 |
row = gIndexTableRow; |
510 |
col = gIndexTableCol; |
|
511 |
||
512 |
while ( row < gTableRows && |
|
513 |
gIndexTableCell == gLogicalCells[row*gTableCols+col] |
|
514 |
)
|
|
515 |
{
|
|
516 |
rowspan++; |
|
517 |
row++; |
|
518 |
}
|
|
519 |
||
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
520 |
// Scan rightward from the current cell
|
1
by JussiP
Imported source zip of Cuneiform |
521 |
row = gIndexTableRow; |
522 |
col = gIndexTableCol; |
|
523 |
||
524 |
while ( col < gTableCols && |
|
525 |
gIndexTableCell == gLogicalCells[row*gTableCols+col] |
|
526 |
)
|
|
527 |
{
|
|
528 |
colspan++; |
|
529 |
col++; |
|
530 |
}
|
|
531 |
||
532 |
ASSERT(rowspan>0 && colspan>0); |
|
533 |
return TRUE; |
|
534 |
}
|
|
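The span computation above can be sketched as a self-contained function. This is only an illustration, assuming the logical cell map is a row-major array of cell ids like `gLogicalCells`; the name `cellSpan` and its parameters are hypothetical and do not appear in html.cpp.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper (not part of html.cpp): returns {rowspan, colspan}
// for the logical cell whose top-left corner is at (row, col).
// `cells` is a row-major grid of logical cell ids with `rows` x `cols` entries.
std::pair<long, long> cellSpan(const std::vector<long>& cells,
                               long rows, long cols, long row, long col)
{
    const long id = cells[row * cols + col];
    long rowspan = 0;
    long colspan = 0;

    // Scan downward while the same logical cell continues.
    for (long r = row; r < rows && cells[r * cols + col] == id; ++r)
        ++rowspan;

    // Scan rightward while the same logical cell continues.
    for (long c = col; c < cols && cells[row * cols + c] == id; ++c)
        ++colspan;

    return {rowspan, colspan};
}
```

Because the starting position always matches its own id, both spans are at least 1 for any valid (row, col), which is the condition the `ASSERT` in `CalcCellSpan` checks.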
535 |
//********************************************************************
|
|
412.1.14
by uliss
Migration from BOOL to Bool |
536 |
static Bool OptimizeTags() |
1
by JussiP
Imported source zip of Cuneiform |
537 |
{
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
538 |
// Remove redundant tags
|
1
by JussiP
Imported source zip of Cuneiform |
539 |
long l1 = 0, l2 = 0; |
540 |
char *p; |
|
541 |
||
542 |
#define SUBST(a,b) {\
|
|
543 |
l1 = strlen(a);\
|
|
544 |
l2 = strlen(b);\
|
|
545 |
p = (char*)gMemCur - l1;\
|
|
546 |
if (!memcmp(a,p,l1))\
|
|
547 |
{\
|
|
548 |
strcpy(p,b);\
|
|
549 |
gMemCur -= l1 - l2;\
|
|
550 |
}\
|
|
551 |
}
|
|
552 |
||
553 |
SUBST("<td><p>","<td>"); |
|
554 |
SUBST("</p><td>","<td>"); |
|
555 |
SUBST("</p></table>","</table>"); |
|
556 |
SUBST("<p></p>",""); |
|
557 |
SUBST("<br></p>","</p>"); |
|
558 |
||
559 |
return TRUE; |
|
560 |
}
|
|
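The `SUBST` macro rewrites the tail of the output buffer when it ends with a given tag sequence. A minimal sketch of the same idea on a `std::string` follows; `replaceSuffix` is a hypothetical name, and unlike the macro it first checks that the buffer is at least as long as the pattern.

```cpp
#include <string>

// Hypothetical helper (not part of html.cpp): if `out` ends with `from`,
// replace that suffix with `to` and return true; otherwise leave `out`
// unchanged and return false.
bool replaceSuffix(std::string& out, const std::string& from,
                   const std::string& to)
{
    if (out.size() < from.size())
        return false;  // too short to end with the pattern

    const std::string::size_type pos = out.size() - from.size();
    if (out.compare(pos, from.size(), from) != 0)
        return false;  // tail does not match

    out.replace(pos, from.size(), to);
    return true;
}
```

For example, calling `replaceSuffix(html, "<p></p>", "")` on a buffer that ends with an empty paragraph drops it, mirroring `SUBST("<p></p>","")`.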
561 |
//********************************************************************
|
|
412.1.14
by uliss
Migration from BOOL to Bool |
562 |
static Bool Picture() |
1
by JussiP
Imported source zip of Cuneiform |
563 |
{
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
564 |
/* Picture.
|
1
by JussiP
Imported source zip of Cuneiform |
565 |
|
256
by Jussi Pakkanen
HTML image fix from Alex Samorukov. |
566 |
gPictureNumber - img number 1
|
567 |
gPictureData - DIB address, with header
|
|
568 |
gPictureLength - DIB length, with header
|
|
1
by JussiP
Imported source zip of Cuneiform |
569 |
|
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
570 |
1. Create a subfolder for pictures, "<page>_files".
|
571 |
2. Write the picture to a BMP file named <number>.bmp.
|
|
572 |
3. Insert an "img" tag that references the picture file.
|
|
1
by JussiP
Imported source zip of Cuneiform |
573 |
*/
|
574 |
char buf[256] = ""; |
|
575 |
char absPicFileName[256] = ""; |
|
576 |
char relPicFileName[256] = ""; |
|
249
by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov. |
577 |
char dir[_MAX_PATH], name[_MAX_PATH], ext[_MAX_EXT]; |
1
by JussiP
Imported source zip of Cuneiform |
578 |
|
256
by Jussi Pakkanen
HTML image fix from Alex Samorukov. |
579 |
// create folder for images gPageFilesFolder.
|
1
by JussiP
Imported source zip of Cuneiform |
580 |
if ( !CreatePageFilesFolder() ) |
581 |
return FALSE; |
|
582 |
||
256
by Jussi Pakkanen
HTML image fix from Alex Samorukov. |
583 |
// create file name
|
249
by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov. |
584 |
split_path(gPageName, dir, name, ext); |
1
by JussiP
Imported source zip of Cuneiform |
585 |
|
256
by Jussi Pakkanen
HTML image fix from Alex Samorukov. |
586 |
// write picture to bmp file
|
587 |
if(dir[0]) |
|
458
by Jussi Pakkanen
Format modifier fixes. |
588 |
sprintf(absPicFileName,"%s/%s/%ld.bmp", dir, |
256
by Jussi Pakkanen
HTML image fix from Alex Samorukov. |
589 |
gPageFilesFolder, gPictureNumber); |
590 |
else
|
|
458
by Jussi Pakkanen
Format modifier fixes. |
591 |
sprintf(absPicFileName,"%s/%ld.bmp", |
256
by Jussi Pakkanen
HTML image fix from Alex Samorukov. |
592 |
gPageFilesFolder, gPictureNumber); |
1
by JussiP
Imported source zip of Cuneiform |
593 |
|
458
by Jussi Pakkanen
Format modifier fixes. |
594 |
sprintf (relPicFileName,"%s/%ld.bmp", |
1
by JussiP
Imported source zip of Cuneiform |
595 |
gPageFilesFolder, gPictureNumber); |
596 |
||
597 |
if ( !WritePictureToBMP_File( |
|
598 |
gPictureData, |
|
599 |
gPictureLength, |
|
600 |
absPicFileName) |
|
601 |
)
|
|
602 |
return FALSE; |
|
603 |
||
256
by Jussi Pakkanen
HTML image fix from Alex Samorukov. |
604 |
// write img html tag.
|
1
by JussiP
Imported source zip of Cuneiform |
605 |
sprintf (buf,"<img src=%s " |
458
by Jussi Pakkanen
Format modifier fixes. |
606 |
"width=%ld height=%ld "
|
1
by JussiP
Imported source zip of Cuneiform |
607 |
"alt=\"%s\">", |
608 |
relPicFileName, |
|
205
by Jussi Pakkanen
Removed end-of-line whitespace. |
609 |
gPictureGoal.cx * 72L / 1440L, |
1
by JussiP
Imported source zip of Cuneiform |
610 |
gPictureGoal.cy * 72L / 1440L, |
611 |
relPicFileName
|
|
612 |
);
|
|
613 |
||
614 |
PUT_STRING(buf); |
|
615 |
return TRUE; |
|
616 |
}
|
|
617 |
//********************************************************************
|
|
249
by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov. |
618 |
|
619 |
/**
|
|
620 |
* Create a subdirectory to hold image files for html document.
|
|
621 |
*/
|
|
412.1.14
by uliss
Migration from BOOL to Bool |
622 |
static Bool CreatePageFilesFolder() { |
412.1.1
by uliss
BIG Changes))) need to commit mo frequently |
623 |
// СПзЎаÑÑ Ð¿ÐŸÐŽÐ¿Ð°Ð¿ÐºÑ ÐŽÐ»Ñ ÐºÐ°ÑÑОМПк gPageFilesFolder.
|
249
by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov. |
624 |
char dir[_MAX_PATH], name[_MAX_PATH], ext[_MAX_EXT], path[_MAX_PATH]; |
625 |
||
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
626 |
// Is the page name set?
|
249
by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov. |
627 |
if (!gPageName[0]) |
628 |
return FALSE; |
|
629 |
||
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
630 |
// Build the subfolder name
|
249
by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov. |
631 |
split_path(gPageName, dir, name, ext); |
632 |
memset(gPageFilesFolder, 0, sizeof(gPageFilesFolder)); |
|
633 |
sprintf(gPageFilesFolder, "%s_files", name); |
|
634 |
||
415
by julien
moved some tags around, now follows html spec and hocr spec. fixed russian comments that were destroyed during encoding |
635 |
// Create the subfolder
|
254
by Jussi Pakkanen
Fix path thing. Thanks to Alex Samorukov. |
636 |
if(dir[0]) |
637 |
sprintf(path, "%s/%s", dir, gPageFilesFolder); |
|
638 |
else
|
|
639 |
sprintf(path, "%s", gPageFilesFolder); |
|
249
by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov. |
640 |
if (CreateDirectory(&path[0], 0) == FALSE) { |
422
by Jussi Pakkanen
Trial merge from refactoring branch. |
641 |
uint32_t err = GetLastError(); |
249
by Jussi Pakkanen
Enable HTML image output with help from Alex Samorukov. |
642 |
if (err != ERROR_ALREADY_EXISTS) { |
643 |
DEBUG_PRINT("CreatePageFilesFolder error = %u",err); |
|
644 |
return FALSE; |
|
645 |
}
|
|
646 |
}
|
|
647 |
||
648 |
return TRUE; |
|
1
by JussiP
Imported source zip of Cuneiform |
649 |
}
|
650 |
//********************************************************************
|