ReadOSM is an open source C library for extracting valid data from an Open
Street Map input file. Such OSM files come in two different formats:
- files identified by the <b>.osm</b> suffix are plain XML files.
- files identified by the <b>.pbf</b> suffix contain the same
data, but encoded in Google's Protocol Buffer serialization format (a
more concise and compressed binary notation, thus requiring much less

The ReadOSM design goals are:
- to be simple and lightweight
- to be stable, robust and efficient
- to be easily and universally portable
- to make the whole parsing process of both .osm and .pbf files
completely transparent from the application's own perspective.

ReadOSM is structurally simple and quite lightweight (typically about 20K of object
code, stripped). ReadOSM has only two key dependencies:
- zlib (the well-known compression library), which is used to decompress the compressed binary
blocks stored within .pbf files.
- expat (a widely used XML parsing library), which is used to parse XML .osm files.

Both libraries are widely available on many platforms.

Building and installing ReadOSM is straightforward:

Linking ReadOSM to your own code is usually simple:
gcc my_program.c -o my_program -lreadosm

On some systems you may have to provide a slightly more complex arrangement:
gcc -I/usr/local/include my_program.c -o my_program \
-L/usr/local/lib -lreadosm -lexpat -lz

ReadOSM also provides pkg-config support, so you can also do:
gcc `pkg-config --cflags readosm` my_program.c -o my_program `pkg-config --libs readosm`

I originally developed ReadOSM simply to allow SpatiaLite's
own CLI tools to read both .osm and .pbf OSM files interchangeably.
However, I felt that supporting OSM file import and parsing in a simple and easy
way could be useful to many other developers, so I quickly decided to
implement it as a stand-alone library.

ReadOSM is licensed under the MPL tri-license terms: you are free to choose the
best-fit license among:
- the GPL v2.0 or any subsequent version
- the LGPL v2.1 or any subsequent version

Enjoy, and happy coding!

/** \page intro About Open Street Map datasets

Open Street Map aka \b OSM [http://www.openstreetmap.org/] is a very popular
community project aiming to produce a map of the world; this map is completely
free and is released under the CC-BY-SA license terms
[http://creativecommons.org/licenses/by-sa/2.0/].

Selected portions [by Country / Region] of the OSM map are available on the
following download sites:
- http://download.geofabrik.de/
- http://downloads.cloudmade.com/

The best known format used to ship OSM datasets is based on XML; we'll
briefly examine the general XML layout in order to explain the objects used
by the OSM data model and their mutual relationships.

A Node simply corresponds to a 2D POINT Geometry; the geographic coordinates
are always expressed as Longitude and Latitude (corresponding to SRID 4326).<br>
A Node is more than just a geometry; it is usually characterized by several data attributes:
- \b id: a number uniquely identifying each Node object.
- \b lon and \b lat: the geographic Longitude and Latitude of the Point.
- \b version: a progressive number identifying subsequent versions of the same object.
- \b changeset: a progressive number identifying a "changeset", i.e. a batch insert/update
performed by the same user.
- \b user: nickname of the user committing the changeset.
- \b uid: a number uniquely identifying the user.
- \b timestamp: commit date-time.
- \b tag-list: any object may optionally be further qualified using arbitrary \b key:value pairs.
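
As a rough illustration of the attributes listed above, a Node could be modelled in C along the following lines. This is only a sketch: the <b>example_tag</b> and <b>example_node</b> types and the <b>example_node_find_tag()</b> helper are hypothetical names invented for this page, and they are not ReadOSM's own declarations.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical types for illustration only; not part of the ReadOSM API.
struct example_tag
{
    const char *key;
    const char *value;
};

struct example_node
{
    long long id;               // unique Node identifier
    double longitude;           // degrees, SRID 4326
    double latitude;            // degrees, SRID 4326
    int version;                // progressive version number
    long long changeset;        // changeset identifier
    const char *user;           // nickname of the committing user
    int uid;                    // numeric user identifier
    const char *timestamp;      // commit date-time, e.g. "2005-02-28T17:45:15Z"
    size_t tag_count;           // number of items in tags
    const struct example_tag *tags;
};

// Returns the value of the first tag whose key matches, or NULL if absent.
static const char *
example_node_find_tag (const struct example_node *node, const char *key)
{
    size_t i;
    for (i = 0; i < node->tag_count; i++)
      {
          if (strcmp (node->tags[i].key, key) == 0)
              return node->tags[i].value;
      }
    return NULL;
}
```

Keeping the tags as a separate array mirrors the fact that any number of key:value pairs may be attached to the same object.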

The following is the general XML layout used to represent a Node object:
<node id="12345" lat="6.66666" lon="7.77777" version="1" changeset="54321" user="some-user" uid="66" timestamp="2005-02-28T17:45:15Z">
    <tag k="created_by" v="JOSM" />
    <tag k="tourism" v="camp_site" />

A Way corresponds to a 2D LINESTRING Geometry; however, the vertices are never
defined directly within the Way itself: a list of indirectly referenced Nodes (<b>&lt;nd ref&gt;</b> items) is required instead.<br>
The data attributes characterizing a Way are more or less the same as those used for Nodes, with identical meaning;
Ways also support an arbitrary collection of Tags (\b key:value pairs).
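
Because a Way only stores Node references, a consumer has to look up each referenced Node before it can build the actual LINESTRING. The sketch below shows one simple way to do that. All names (<b>example_point</b>, <b>example_way</b>, <b>example_find_node()</b>, <b>example_resolve_way()</b>) are hypothetical and are not part of ReadOSM; the linear search is used only to keep the example short.

```c
#include <stddef.h>

// Hypothetical types for illustration only; not part of the ReadOSM API.
struct example_point
{
    long long id;
    double longitude;
    double latitude;
};

struct example_way
{
    long long id;
    size_t ref_count;
    const long long *node_refs;     // Node ids, in vertex order
};

// Linear search for a Node by id; returns NULL when the id is unknown.
static const struct example_point *
example_find_node (const struct example_point *nodes, size_t node_count,
                   long long id)
{
    size_t i;
    for (i = 0; i < node_count; i++)
      {
          if (nodes[i].id == id)
              return &nodes[i];
      }
    return NULL;
}

// Copies the Way's vertices into out (which must hold way->ref_count items).
// Returns 0 on success, or -1 if any referenced Node is missing.
static int
example_resolve_way (const struct example_way *way,
                     const struct example_point *nodes, size_t node_count,
                     struct example_point *out)
{
    size_t i;
    for (i = 0; i < way->ref_count; i++)
      {
          const struct example_point *pt =
              example_find_node (nodes, node_count, way->node_refs[i]);
          if (pt == NULL)
              return -1;
          out[i] = *pt;
      }
    return 0;
}
```

With real datasets containing millions of Nodes, a sorted index or a hash table would be a more sensible lookup structure than a linear scan.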

The following is the general XML layout used to represent a Way object:
<way id="12345" version="1" changeset="54321" user="some-user" uid="66" timestamp="2005-02-28T17:45:15Z">
    <tag k="created_by" v="JOSM" />
    <tag k="tourism" v="camp_site" />

A Relation is a complex object: it can correspond to a 2D POLYGON, a 2D MULTILINESTRING, or even a 2D GEOMETRYCOLLECTION.<br>
A Relation object can reference any other kind of OSM object: each <b>&lt;member&gt;</b> item can address a Node object,
a Way object or another Relation object; the \b type attribute always specifies the nature of the referenced object,
and the optional \b role attribute may further qualify the member's intended purpose.<br>
The data attributes characterizing a Relation are exactly the same as those used for Ways, with identical meaning;
Relations also support an arbitrary collection of Tags (\b key:value pairs).
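
The member list described above can be pictured as an array of (type, ref, role) triples. The following sketch counts the Way members carrying a given role, which is the kind of filtering needed, for instance, to separate outer and inner rings. The <b>example_member</b> type, its enum and <b>example_count_ways_with_role()</b> are hypothetical names, not ReadOSM declarations.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical types for illustration only; not part of the ReadOSM API.
enum example_member_type
{
    EXAMPLE_MEMBER_NODE,
    EXAMPLE_MEMBER_WAY,
    EXAMPLE_MEMBER_RELATION
};

struct example_member
{
    enum example_member_type type;  // nature of the referenced object
    long long ref;                  // id of the referenced object
    const char *role;               // optional: NULL when not specified
};

// Counts the Way members whose role equals the requested one.
static size_t
example_count_ways_with_role (const struct example_member *members,
                              size_t count, const char *role)
{
    size_t i;
    size_t found = 0;
    for (i = 0; i < count; i++)
      {
          if (members[i].type != EXAMPLE_MEMBER_WAY)
              continue;
          if (members[i].role != NULL && strcmp (members[i].role, role) == 0)
              found++;
      }
    return found;
}
```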

The following is the general XML layout used to represent a Relation object:
<relation id="12345" version="1" changeset="54321" user="some-user" uid="66" timestamp="2005-02-28T17:45:15Z">
    <member type="way" ref="12345" role="outer" />
    <member type="way" ref="12346" role="inner" />
    <tag k="created_by" v="JOSM" />
    <tag k="tourism" v="camp_site" />

/** \page formats Open Street Map file formats

There are two distinct formats used to ship OSM datasets: both contain exactly the same
information, but their internal layout is radically different.

\section osm XML (.osm) files

OSM files based on the XML notation are widely used: usually they are identified by the <b>.osm</b> suffix.<br>
XML is notoriously verbose and usually requires lots of storage space; fortunately, XML compresses very well.<br>
For this reason, the most commonly found OSM files are identified by the <b>.osm.bz2</b> suffix:
this simply means that the <b>.osm</b> (XML) file has been compressed using <b>bzip2</b>.
Processing a <b>.osm.bz2</b> OSM file always requires a two-step approach:
- decompressing the file (using <b>bunzip2</b> or some other tool)
- then parsing the resulting <b>.osm</b> file

Please note: the decompressed file will require about 10 to 15 times the space required
by the compressed file; many OSM XML files can be very large (several GB).

\section pbf Protocol Buffer (.pbf) files

An alternative OSM file format is based on Google's Protocol Buffer encoding
[https://developers.google.com/protocol-buffers/docs/encoding]<br>
This OSM format is based on a public and documented specification: [http://wiki.openstreetmap.org/wiki/PBF_Format]<br>

OSM files based on Protocol Buffer encoding are usually identified by the <b>.pbf</b> suffix.<br>
The main benefit of <b>.pbf</b> files is that they are much more compact
(smaller in size) than the corresponding <b>.osm.bz2</b> files, and they can be parsed immediately:
no preliminary decompression step is required.<br>
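
The suffix conventions described on this page can be summarised by a small classifier. The sketch below is purely illustrative: <b>example_osm_format</b>, <b>example_has_suffix()</b> and <b>example_classify_path()</b> are hypothetical names, the comparison is case-sensitive by assumption, and nothing here claims to reproduce how ReadOSM itself recognises its input.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical names for illustration only; not part of the ReadOSM API.
enum example_osm_format
{
    EXAMPLE_FORMAT_UNKNOWN,
    EXAMPLE_FORMAT_OSM_XML,     // plain XML, ready to be parsed
    EXAMPLE_FORMAT_OSM_BZ2,     // XML compressed by bzip2: decompress first
    EXAMPLE_FORMAT_PBF          // Protocol Buffer, ready to be parsed
};

// Returns 1 when path ends with suffix (case-sensitive), 0 otherwise.
static int
example_has_suffix (const char *path, const char *suffix)
{
    size_t path_len = strlen (path);
    size_t suffix_len = strlen (suffix);
    if (suffix_len > path_len)
        return 0;
    return strcmp (path + path_len - suffix_len, suffix) == 0;
}

// Classifies a file name according to its suffix alone.
static enum example_osm_format
example_classify_path (const char *path)
{
    if (example_has_suffix (path, ".osm.bz2"))
        return EXAMPLE_FORMAT_OSM_BZ2;
    if (example_has_suffix (path, ".osm"))
        return EXAMPLE_FORMAT_OSM_XML;
    if (example_has_suffix (path, ".pbf"))
        return EXAMPLE_FORMAT_PBF;
    return EXAMPLE_FORMAT_UNKNOWN;
}
```

Only the <b>.osm.bz2</b> case calls for a separate decompression step before parsing; the other two recognised formats can be consumed as they are.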

\section readosm Why use ReadOSM?

The intended scope of <b>ReadOSM</b> is to allow transparent parsing of both OSM formats interchangeably.
There is no need to take care of any internal low-level aspect, because the library itself silently handles every required step.
The simple abstract interface implemented by ReadOSM is intended to allow
reader apps to consume OSM input files as painlessly as possible, and all this requires only a
very limited memory footprint.

/** \page readosm ReadOSM basic architecture

ReadOSM implements a very simple and straightforward interface; there are only three functions:
- <b>readosm_open()</b>: establishes a connection to some OSM input file.
- <b>readosm_close()</b>: terminates a previously established connection.
- <b>readosm_parse()</b>: a single function driving the whole parsing process (mainly based on <b>callback functions</b>).

Given the above, implementing a complete OSM parser is very simple:
#include <readosm.h>

static int
parse_node (const void *user_data, const readosm_node * node)
{
    /* callback function consuming Node objects */
    struct some_user_defined_struct *my_struct =
        (struct some_user_defined_struct *) user_data;
    /* ... some smart code ... */
    return READOSM_OK;
}

static int
parse_way (const void *user_data, const readosm_way * way)
{
    /* callback function consuming Way objects */
    struct some_user_defined_struct *my_struct =
        (struct some_user_defined_struct *) user_data;
    /* ... some smart code ... */
    return READOSM_OK;
}

static int
parse_relation (const void *user_data, const readosm_relation * relation)
{
    /* callback function consuming Relation objects */
    struct some_user_defined_struct *my_struct =
        (struct some_user_defined_struct *) user_data;
    /* ... some smart code ... */
    return READOSM_OK;
}

int
main (void)
{
    /* the basic OSM parser implementation */
    const void *handle;
    int ret;
    struct some_user_defined_struct my_struct;
    ret = readosm_open ("path-to-some-OSM-file", &handle);
    /* ... error handling intentionally suppressed ... */
    ret = readosm_parse (handle, &my_struct, parse_node, parse_way, parse_relation);
    /* ... error handling intentionally suppressed ... */
    ret = readosm_close (handle);
    /* ... error handling intentionally suppressed ... */
    return 0;
}

So the real programming work consists simply of implementing the callback functions themselves.<br>
You can study the <b>Examples</b> code samples for further information on this topic.
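
To make the callback idea more concrete without depending on the library itself, here is a self-contained toy version of the pattern: a dispatcher walks an in-memory array of Nodes and hands each one to a callback, together with a user-defined struct that accumulates results. Everything in it (<b>demo_node</b>, <b>demo_stats</b>, <b>demo_count_node()</b>, <b>demo_dispatch_nodes()</b>) is hypothetical and only imitates the shape of the interface described on this page; the convention that a non-zero return value stops the loop is an assumption of this sketch.

```c
#include <stddef.h>

// Hypothetical stand-ins for illustration; these are not ReadOSM declarations.
struct demo_node
{
    long long id;
    double longitude;
    double latitude;
};

typedef int (*demo_node_callback) (const void *user_data,
                                   const struct demo_node *node);

// User-defined state handed to every callback invocation.
struct demo_stats
{
    size_t node_count;
    double min_longitude;
    double max_longitude;
};

// Callback: counts Nodes and tracks the longitude range seen so far.
static int
demo_count_node (const void *user_data, const struct demo_node *node)
{
    struct demo_stats *stats = (struct demo_stats *) user_data;
    if (stats->node_count == 0 || node->longitude < stats->min_longitude)
        stats->min_longitude = node->longitude;
    if (stats->node_count == 0 || node->longitude > stats->max_longitude)
        stats->max_longitude = node->longitude;
    stats->node_count++;
    return 0;                   // zero means "keep going" in this sketch
}

// Toy dispatcher standing in for a parsing loop: calls the callback once
// per Node and stops at the first non-zero return value.
static int
demo_dispatch_nodes (const struct demo_node *nodes, size_t count,
                     const void *user_data, demo_node_callback callback)
{
    size_t i;
    for (i = 0; i < count; i++)
      {
          int ret = callback (user_data, &nodes[i]);
          if (ret != 0)
              return ret;
      }
    return 0;
}
```

The important design point is that the dispatcher knows nothing about <b>demo_stats</b>: all application state travels through the opaque user-data pointer, so the same loop can serve any number of different consumers.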