What is this package?

This package makes it simple to load geospatial data from files. Currently binary and grib1/grib2 files are supported. Data can be loaded across a range of dates, forecast hours (for forecast data), and ensemble members (for ensemble forecast data).

This package uses wgrib and wgrib2 in order to read grib files, and must be installed on your system for this package to work. You can test your system to make sure you have wgrib and wgrib2 available with the following code:

$ python -c "from cpc.geofiles import test_wgrib ; test_wgrib()"

How do I read in data?

Data can either be read in for a single file, or over a range of dates, forecast hours, or ensemble members.

Reading data from a single file

The reading module contains a function for reading grib1 & grib2 files. Binary files are simple to read in without this module. For example:

>>> import numpy as np
>>> data = np.fromfile('/path/to/file.bin', dtype='float32')

Since reading in grib data is a little less straightforward, this module makes it easier. The function that reads in grib data is called read_grib(). You’ll need to specify the following parameters in order to read in a file:

grib_var and grib_level are used as strings to match grib records. You can see all the grib records in your file using wgrib or wgrib2. For example, for a concise listing of records:

$ wgrib2 /path/to/files/gefs_20160515_00z_f342_m10.grb2

or for a verbose listing:

$ wgrib2 -V /path/to/files/gefs_20160515_00z_f342_m10.grb2

For grib1 files, use wgrib the same way. You can filter out just the variables and levels like this:

$ wgrib2 -var -lev /path/to/files/gefs_20160515_00z_f342_m10.grb2

Once you have the variable and level for the record you want to read in, you can pass that to read_grib(). For example, to read in 2m temperature from a grib2 file:

>>> from cpc.geogrids import GeoGrid
>>> from cpc.geofiles.reading import read_grib
>>> file = '/path/to/files/gefs_20160515_00z_f342_m10.grb2'
>>> grib_type = 'grib2'
>>> grib_var = 'TMP'
>>> grib_level = '2 m above ground'
>>> geogrid = GeoGrid('1deg-global')
>>> data = read_grib(file, grib_type, grib_var, grib_level, geogrid)
>>> print(data.shape, data[0])
(65160,) 237.4

Reading data from multiple files

If you want to “load a dataset” (read in data from multiple files), you can use the loading module. The loading module loops over dates (for all types of datasets), forecast hours (for forecast datasets) and members (for ensemble forecast datasets), and reads in data across the range of all of those parameters.

Like for the reading module, the loading module depends on wgrib and wgrib2 for reading in grib1/grib2 data.

Dataset objects

Data is loaded into a Dataset object, from which you can extract the data itself, plus a few QC-related attributes such as dates with files not loaded, a list of the files not loaded, and the dates that were loaded. There are several types of Datasets. Here is the hierarchy of Dataset types, and what information they contain:

General Datasets

All Datasets contain the following attributes:

Observations

Observations contain all attributes that Datasets have, plus:

Forecasts

Forecasts contain all attributes that Datasets have, plus:

EnsembleForecasts contain all attributes that Forecasts have, plus:

DeterministicForecasts contain all attributes that Forecasts have, plus:

Climatologies

Climatologies contain all attributes that Datasets have, plus:

Loading observation data

To load observation data, use the loading.load_obs() function. You’ll need to specify the following parameters:

Some optional parameters are:

Note that observation binary files are assumed to have the format record-num x grid-point for multi-record files, or just a sequence of grid points for single-record files.

Here’s an example of how to load a few days of observation data:

>>> from cpc.geogrids import Geogrid
>>> from cpc.geofiles.loading import load_obs
>>> valid_dates = ['20150101', '20150102', '20150103']
>>> file_template = '/path/to/files/{yyyy}/{mm}/{dd}/'\
                    'tmean_01d_{yyyy}{mm}{dd}.bin'
>>> data_type = 'binary'
>>> geogrid = Geogrid('1deg-global')
>>> dataset = load_obs(valid_dates, file_template, data_type, geogrid)
>>> print(dataset.obs.shape, dataset.obs[:, 0])
(3, 65160) [-28.48999405 -28.04499435 -27.81749725]

Loading deterministic forecast data

To load deterministic forecast data, use the loading.load_dtrm_fcsts() function. You’ll need to specify the following parameters:

Some optional parameters are:

Here’s an example of how to load a few days of deterministic forecast data:

>>> from cpc.geogrids import Geogrid
>>> from cpc.geofiles.loading import load_dtrm_fcsts
>>> valid_dates = ['20160101', '20160102', '20160103']
>>> fhrs = range(0, 120, 6)
>>> file_template = '/path/to/files/{yyyy}/{mm}/{dd}/{cc}/' \
                    'gfs_{yyyy}{mm}{dd}_{cc}z_f{fhr}.grb2'
>>> data_type = 'grib2'
>>> geogrid = Geogrid('0.5-deg-global-center-aligned')
>>> grib_var = 'TMP'
>>> grib_level = '2 m above ground'
>>> dataset = load_dtrm_fcsts(valid_dates, fhrs, file_template,
                              data_type, geogrid, grib_var=grib_var,
                              grib_level=grib_level)
>>> print(dataset.fcst.shape, dataset.fcst[:, 0])
(3, 259920) [ 246.64699936  246.50599976  245.97450104]

Loading ensemble forecast data

To load ensemble forecast data, use the loading.load_ens_fcsts() function. You’ll need to specify the following parameters:

Some optional parameters are:

Here’s an example of how to load a few days of ensemble forecast data:

>>> from cpc.geogrids import Geogrid
>>> from cpc.geofiles.loading import load_ens_fcsts
>>> valid_dates = ['20160101', '20160102', '20160103']
>>> fhrs = range(0, 120, 6)
>>> members = range(0, 21)
>>> file_template = '/path/to/files/{yyyy}/{mm}/{dd}/{cc}/' \
                    'gefs_{yyyy}{mm}{dd}_{cc}z_f{fhr}_m{member}.grb2'
>>> data_type = 'grib2'
>>> geogrid = Geogrid('1deg-global')
>>> grib_var = 'TMP'
>>> grib_level = '2 m above ground'
>>> dataset = load_ens_fcsts(valid_dates, fhrs, members, file_template,
                             data_type, geogrid, grib_var=grib_var,
                             grib_level=grib_level)
>>> print(dataset.ens.shape)
(3, 21, 65160)
>>> print(dataset.ens[:, :, 0])
[[ 246.18849945  246.40299683  247.11050034  245.95850067  246.17949905
   246.91550064  247.41700134  246.53700104  247.96300125  246.05699921
   246.08150101  247.11800003  247.46500015  247.30050049  247.44899979
   245.84649963  247.8234993   246.21900101  246.45600128  245.72950058
   246.05299988]
 [ 246.11650085  245.45250092  247.54049759  246.35499878  245.56750107
   246.74899902  247.23949966  246.52750015  247.40500031  245.96500092
   245.85749969  246.07099915  247.3465004   246.61099854  245.78749771
   247.18349838  246.47999954  245.44049988  245.78899994  245.67700043
   245.87299957]
 [ 245.88300095  245.5995018   247.63799896  247.21050034  245.88849945
   246.78749847  246.15800018  246.15749969  246.41600113  246.00299988
   246.80950012  246.51200104  247.11650009  246.2659996   245.96800156
   247.20250168  246.22499924  245.72900162  245.85200043  244.81850128
   245.73949966]]
>>> print(dataset.ens_mean.shape)
(3, 65160)
>>> print(dataset.ens_mean[:, 0])
[ 246.67957157  246.33497583  246.28476225]

Loading climatology data

To load climatology data, use the loading.load_obs() function. You’ll need to specify the following parameters:

Some optional parameters are:

Here’s an example of how to load a few days of climatology data:

>>> from cpc.geogrids import Geogrid
>>> from cpc.geofiles.loading import load_climos
>>> valid_days = ['0101', '0102', '0103']
>>> file_template = '/path/to/files/tmean_clim_poe_05d_{mm}{dd}.bin'
>>> geogrid = Geogrid('1deg-global')
>>> num_ptiles = 19
>>> dataset = load_climos(valid_days, file_template, geogrid,
                          num_ptiles=num_ptiles, debug=True)
>>> print(dataset.climo.shape)
>>> print(dataset.climo[:, :, 0])

How do I convert data?

Binary files can be converted to text files using functions in cpc.geofiles.conversion.

To convert a binary forecast file to text, use the function cpc.geofiles.conversion.fcst_bin_to_txt(). You’ll need to specify the following parameters:

To convert a binary observation file to text, use the function cpc.geofiles.conversion.obs_bin_to_txt(). You’ll need to specify the following parameters:

See the API documentation for more information.