NetCDF reader/writer module.

This module is used to read and create NetCDF files. NetCDF files are
accessed through the `netcdf_file` object. Data written to and read from
NetCDF files are contained in `netcdf_variable` objects. Attributes are given
as member variables of the `netcdf_file` and `netcdf_variable` objects.

NetCDF files are a self-describing binary data format. The file contains
metadata that describes the dimensions and variables in the file. More
details about NetCDF files can be found `here
<http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html>`_. There
are three main sections to a NetCDF data structure:

1. Dimensions
2. Variables
3. Attributes

The dimensions section records the name and length of each dimension used
by the variables. The variables section then indicates which dimensions each
variable uses and any attributes such as data units, and it also holds the
data values for the variable. It is good practice to include a
variable that has the same name as a dimension to provide the values for
that axis. Lastly, the attributes section contains additional
information such as the name of the file creator or the instrument used to
collect the data.

When writing data to a NetCDF file, there is often the need to indicate the
'record dimension'. A record dimension is the unbounded dimension for a
variable. For example, a temperature variable may have dimensions of
latitude, longitude and time. If one wants to add more temperature data to
the NetCDF file as time progresses, then the temperature variable should
have the time dimension flagged as the record dimension.
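
As a rough illustration of the record dimension, the sketch below models the dimensions section as a plain dictionary in which the unbounded dimension has length ``None``, and derives an initial array shape from it in the same spirit as the ``dim or 0`` replacement used by `createVariable`. The dimension names and the ``variable_shape`` helper are hypothetical and are not part of this module.

```python
# Hypothetical dimension table: 'time' is the record (unbounded) dimension,
# marked here by a length of None.
dimensions = {'time': None, 'lat': 3, 'lon': 4}


def variable_shape(dimensions, names):
    """Return the initial array shape for a variable that uses `names`."""
    # An unbounded dimension starts with zero records, so None becomes 0.
    return tuple(dimensions[name] or 0 for name in names)


temperature_shape = variable_shape(dimensions, ('time', 'lat', 'lon'))
print(temperature_shape)
```

The leading axis starts at zero length and is the only one expected to grow as records are appended.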
This module implements the Scientific.IO.NetCDF API to read and create
NetCDF files. The same API is also used in the PyNIO and pynetcdf
modules, allowing these modules to be used interchangeably when working
with NetCDF files. The major advantage of this module over other
modules is that it doesn't require the code to be linked to the NetCDF
libraries as the other modules do.

The code is based on the `NetCDF file format specification
<http://www.unidata.ucar.edu/software/netcdf/docs/netcdf.html>`_. A
NetCDF file is a self-describing binary format, with a header followed
by data. The header contains metadata describing dimensions, variables
and the position of the data in the file, so access can be done in an
efficient manner without loading unnecessary data into memory. We use
the ``mmap`` module to create Numpy arrays mapped to the data on disk,
for the same purpose.

The structure of a NetCDF file is as follows:

    C D F <VERSION BYTE> <NUMBER OF RECORDS>
    <DIMENSIONS> <GLOBAL ATTRIBUTES> <VARIABLES METADATA>
    <NON-RECORD DATA> <RECORD DATA>
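
To make the first line of that layout concrete, here is a minimal sketch that decodes the magic bytes, the version byte and the number of records using only the standard library. It assumes the record count is stored as a 4-byte big-endian integer, as in the classic format; ``parse_magic`` is a hypothetical helper, not a function of this module.

```python
import struct


def parse_magic(header):
    """Split the first 8 header bytes into (version, number of records)."""
    # '>' = big endian, '3s' = b'CDF', 'B' = version byte, 'i' = record count.
    magic, version, numrecs = struct.unpack('>3sBi', header[:8])
    if magic != b'CDF':
        raise ValueError('not a NetCDF file')
    return version, numrecs


# Hand-built example: classic format (version byte 1) with 5 records.
example = b'CDF\x01' + struct.pack('>i', 5)
print(parse_magic(example))
```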

Record data refers to data where the first axis can be expanded at
will. All record variables share the same dimension at the first axis,
and they are stored at the end of the file per record, i.e.

    A[0], B[0], ..., A[1], B[1], ..., etc,

so that new data can be appended to the file without changing its original
structure. Non-record data are padded to a 4n-byte boundary. Record data
are also padded, unless there is exactly one record variable in the file,
in which case the padding is dropped. All data is stored in big endian
byte order.
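
The two layout rules described above (padding to a multiple of four bytes and per-record interleaving) can be sketched as follows. Both helpers are hypothetical illustrations; the padding expression mirrors the ``-count % 4`` idiom used in ``_pack_string``.

```python
def padding(count):
    """Number of bytes needed to pad `count` bytes to a multiple of 4."""
    return -count % 4


def interleave_records(*variables):
    """Order per-record chunks as A[0], B[0], ..., A[1], B[1], ..."""
    return [chunk for record in zip(*variables) for chunk in record]


print([padding(n) for n in range(1, 6)])
print(interleave_records(['A[0]', 'A[1]'], ['B[0]', 'B[1]']))
```

Because each record holds one chunk from every record variable, appending a new record only adds bytes at the end of the file.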

The Scientific.IO.NetCDF API allows attributes to be added directly to
instances of ``netcdf_file`` and ``netcdf_variable``. To differentiate
between user-set attributes and instance attributes, user-set attributes
are automatically stored in the ``_attributes`` attribute by overloading
``__setattr__``. This is the reason why the code sometimes uses
``obj.__dict__['key'] = value``, instead of simply ``obj.key = value``;
otherwise the key would be inserted into userspace attributes.
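
A minimal sketch of this pattern, using a hypothetical ``AttributeHolder`` class rather than the module's own classes, might look like this:

```python
class AttributeHolder(object):
    """Hypothetical stand-in for ``netcdf_file`` / ``netcdf_variable``."""

    def __init__(self):
        # Bypass __setattr__ so that the bookkeeping dict itself is not
        # recorded as a user-set attribute.
        self.__dict__['_attributes'] = {}

    def __setattr__(self, name, value):
        # Record every normally assigned attribute as user-set,
        # then store it on the instance as usual.
        self._attributes[name] = value
        self.__dict__[name] = value


holder = AttributeHolder()
holder.history = 'Created for a test'   # user-set attribute, recorded
holder.__dict__['fp'] = None            # instance attribute, not recorded
print(holder._attributes)
```

Writing through ``__dict__`` keeps internal state such as a file handle out of the attributes that would later be written to the file.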

In addition, the NetCDF file header contains the position of the data in
the file, so access can be done in an efficient manner without loading
unnecessary data into memory. It uses the ``mmap`` module to create
Numpy arrays mapped to the data on disk, for the same purpose.
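
The idea of reading a value at a known position without loading the whole file can be sketched with the standard library alone. This is only an illustration under stated assumptions: it decodes a single big-endian integer with ``struct`` instead of building a Numpy array as the module does, and the scratch file and offsets are made up for the example.

```python
import mmap
import os
import struct
import tempfile

# Write ten big-endian 32-bit integers (0, 10, 20, ...) to a scratch file.
values = [10 * i for i in range(10)]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack('>10i', *values))
    path = tmp.name

with open(path, 'rb') as fp:
    mapped = mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        # Decode only the four bytes at the offset of element 7.
        offset = 7 * 4
        (picked,) = struct.unpack_from('>i', mapped, offset)
    finally:
        mapped.close()

os.remove(path)
print(picked)
```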

To create a NetCDF file:

    >>> from scipy.io import netcdf
    >>> f = netcdf.netcdf_file('simple.nc', 'w')
    >>> f.history = 'Created for a test'
    >>> f.createDimension('time', 10)
    >>> time = f.createVariable('time', 'i', ('time',))

* properly implement ``_FillValue``.
* implement Jeff Whitaker's patch for masked variables.
* fix character variables.
* implement PAGESIZE for Python 2.6?

# * properly implement ``_FillValue``.
# * implement Jeff Whitaker's patch for masked variables.
# * fix character variables.
# * implement PAGESIZE for Python 2.6?

#The Scientific.IO.NetCDF API allows attributes to be added directly to
#instances of ``netcdf_file`` and ``netcdf_variable``. To differentiate
#between user-set attributes and instance attributes, user-set attributes
#are automatically stored in the ``_attributes`` attribute by overloading
#``__setattr__``. This is the reason why the code sometimes uses
#``obj.__dict__['key'] = value``, instead of simply ``obj.key = value``;
#otherwise the key would be inserted into userspace attributes.

__all__ = ['netcdf_file', 'netcdf_variable']

from mmap import mmap, ACCESS_READ

import numpy as np
from numpy.compat import asbytes, asstr
from numpy import fromstring, ndarray, dtype, empty, array, asarray
from numpy import little_endian as LITTLE_ENDIAN

ABSENT = asbytes('\x00\x00\x00\x00\x00\x00\x00\x00')
ZERO = asbytes('\x00\x00\x00\x00')
NC_BYTE = asbytes('\x00\x00\x00\x01')
NC_CHAR = asbytes('\x00\x00\x00\x02')
NC_SHORT = asbytes('\x00\x00\x00\x03')
NC_INT = asbytes('\x00\x00\x00\x04')
NC_FLOAT = asbytes('\x00\x00\x00\x05')
NC_DOUBLE = asbytes('\x00\x00\x00\x06')
NC_DIMENSION = asbytes('\x00\x00\x00\n')
NC_VARIABLE = asbytes('\x00\x00\x00\x0b')
NC_ATTRIBUTE = asbytes('\x00\x00\x00\x0c')

TYPEMAP = { NC_BYTE: ('b', 1),

class netcdf_file(object):

    A file object for NetCDF data.

    A `netcdf_file` object has two standard attributes: `dimensions` and
    `variables`. The values of both are dictionaries, mapping dimension
    names to their associated lengths and variable names to variables,
    respectively. Application programs should never modify these
    dictionaries.

    All other attributes correspond to global attributes defined in the
    NetCDF file. Global file attributes are created by assigning to an
    attribute of the `netcdf_file` object.

    filename : string or file-like
    mode : {'r', 'w'}, optional
        read-write mode, default is 'r'
    mmap : None or bool, optional
        Whether to mmap `filename` when reading. Default is True
        when `filename` is a file name, False when `filename` is a
        file-like object.
    version : {1, 2}, optional
        version of netcdf to read / write, where 1 means *Classic
        format* and 2 means *64-bit offset format*. Default is 1. See
        `here <http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Which-Format.html>`_

    def __init__(self, filename, mode='r', mmap=None, version=1):
        """Initialize netcdf_file from fileobj (str or file-like)."""
        if hasattr(filename, 'seek'):  # file-like
            self.fp = filename
            self.filename = 'None'

    def createDimension(self, name, length):

        Adds a dimension to the Dimension section of the NetCDF data structure.

        Note that this function merely adds a new dimension that the variables can
        reference. The values for the dimension, if desired, should be added as
        a variable using `createVariable`, referring to this dimension.

            Name of the dimension (e.g., 'lat' or 'time').

            Length of the dimension.

        self.dimensions[name] = length
        self._dims.append(name)

    def createVariable(self, name, type, dimensions):

        Create an empty variable for the `netcdf_file` object, specifying its data
        type and the dimensions it uses.

            Name of the new variable.

            Data type of the variable.
        dimensions : sequence of str
            List of the dimension names used by the variable, in the desired order.

        variable : netcdf_variable
            The newly created ``netcdf_variable`` object.
            This object has also been added to the `netcdf_file` object.

        Any dimensions to be used by the variable should already exist in the
        NetCDF data structure or should be created by `createDimension` prior to
        creating the NetCDF variable.

        shape = tuple([self.dimensions[dim] for dim in dimensions])
        shape_ = tuple([dim or 0 for dim in shape])  # replace None with 0 for numpy

    def _pack_string(self, s):

        self._pack_int(count)
        self.fp.write(asbytes(s))
        # Pad with null bytes (not the character '0') up to a 4-byte boundary.
        self.fp.write(asbytes('\x00') * (-count % 4))  # pad

    def _unpack_string(self):
        count = self._unpack_int()
        s = self.fp.read(count).rstrip(asbytes('\x00'))
        self.fp.read(-count % 4)  # read padding

class netcdf_variable(object):

    A data object for the `netcdf` module.

    `netcdf_variable` objects are constructed by calling the method
    `netcdf_file.createVariable` on the `netcdf_file` object. `netcdf_variable`
    objects behave much like array objects defined in numpy, except that their
    data resides in a file. Data is read by indexing and written by assigning
    to an indexed subset; the entire array can be accessed by the index ``[:]``
    or (for scalars) by using the methods `getValue` and `assignValue`.
    `netcdf_variable` objects also have attribute `shape` with the same meaning
    as for arrays, but the shape cannot be modified. There is another read-only
    attribute `dimensions`, whose value is the tuple of dimension names.

    All other attributes correspond to variable attributes defined in
    the NetCDF file. Variable attributes are created by assigning to an
    attribute of the `netcdf_variable` object.

        The data array that holds the values for the variable.
        Typically, this is initialized as empty, but with the proper shape.
    typecode : dtype character code
        Desired data-type for the data array.
    shape : sequence of ints
        The shape of the array. This should match the lengths of the
        variable's dimensions.
    dimensions : sequence of strings
        The names of the dimensions used by the variable. Must be in the
        same order as the dimension lengths given by `shape`.
    attributes : dict, optional
        Attribute values (any type) keyed by string names. These attributes
        become attributes for the netcdf_variable object.

    dimensions : list of str
        List of names of dimensions used by the variable object.

    def __init__(self, data, typecode, shape, dimensions, attributes=None):