Monthly Archives: August 2013

MERRA data

“The Modern Era Retrospective-analysis for Research and Applications (MERRA) products are generated using Version 5.2.0 of the GEOS-5 DAS with the model and analysis each at 1/2×2/3 degrees resolution. Three-dimensional analyses are generated every 6 hours, and 3-dimensional diagnostics, describing the radiative and physical properties of the atmosphere, are 3-hourly. The product suite includes analyses on the native vertical grid as well as on pressure surfaces. Two-dimensional data, including surface, fluxes, and vertical integrals, are produced hourly. The product suite includes monthly and monthly diurnal files. The MERRA production is being conducted in 3 separate streams: 1979–1989, 1989–1998, and 1998–present. Data are being uploaded to the MDISC after undergoing quality assurance in the GMAO.” (source: http://gmao.gsfc.nasa.gov/merra/news/merra-land_release.php)

A summary of MERRA can be found in the [brochure](http://gmao.gsfc.nasa.gov/pubs/brochures/MERRA\ Brochure.pdf), and the best detailed information about MERRA is found in the Readme file. Data can be accessed from the Goddard Earth Sciences Data Centre.

“Hourly, three-hourly, and six hourly collections consists of daily files. For collections of monthly or seasonal means, each month or season is in a separate file.”

One attraction of MERRA is the hourly resolution of some variables. Only ‘2D’ variables are available at hourly resolution, but some of these ‘2D’ variables represent quantities at several different heights (e.g. winds at 2 m, 10 m, and 50 m).

Underlying file names

File names consist of five dot-delimited nodes, run_id.runtype.config.collection.timestamp, where:

Node Description
run_id MERRASVv, where S = stream number and Vv = version number
runtype prod = standard product; ovlp = overlapping product; spnp = spin-up product; rosb = reduced observing system product; cers = CERES observing system product
config assim = assimilation: uses a combination of atmospheric data analysis and model forecasting to generate a time series of global atmospheric quantities; simul = simulation: uses a free-running atmospheric model with some prescribed external forcing, such as sea-surface temperatures; frcst = forecast: uses a free-running atmospheric model initialized from an analyzed state
collection All MERRA data are organized into file collections that contain fields with common characteristics. Collection names are of the form freq_dims_group_HV, where:
  • freq can be cnst = time-independent, instF = instantaneous, or tavgF = time-averaged, where F indicates the frequency or averaging interval: 1 = hourly, 3 = 3-hourly, 6 = 6-hourly, M = monthly mean, U = monthly-diurnal mean, 0 = not applicable;
  • dims is 2D or 3D;
  • group is a lowercase (cryptic) mnemonic;
  • HV, where H (horizontal) can be N = native (2/3×1/2), C = reduced (1.25×1.25), F = reduced (1.25×1), and V (vertical) can be x = horizontal only, p = pressure levels, v = model level centres, e = model level edges.
timestamp yyyymmdd
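The five-node scheme parses mechanically; here is a minimal sketch in Python (the function name is mine, not part of any MERRA tooling):

```python
def parse_merra_filename(name):
    """Split a MERRA file name into its five dot-delimited nodes.

    e.g. MERRA300.prod.assim.tavg1_2d_slv_Nx.19790101.hdf
    """
    if name.endswith(".hdf"):
        name = name[:-len(".hdf")]
    runid, runtype, config, collection, timestamp = name.split(".")
    return {
        "runid": runid,
        "stream": int(runid[5]),   # S in MERRASVv
        "version": runid[6:],      # Vv in MERRASVv
        "runtype": runtype,        # prod, ovlp, spnp, rosb or cers
        "config": config,          # assim, simul or frcst
        "collection": collection,  # e.g. tavg1_2d_slv_Nx
        "timestamp": timestamp,    # yyyymmdd
    }
```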

There are two collection naming conventions in operation, a short name (e.g. MAT1NXSLV) and a standard name (e.g. tavg1_2d_slv_Nx), described part-by-part below:

Short name, pattern MCTFHVGGG (e.g. MAT1NXSLV):

  • C: Configuration, one of: A = Assimilation; F = Forecast; S = Simulation
  • T: Time description, one of: I = Instantaneous; T = Time-averaged, with frequency 1 = hourly, 3 = 3-hourly, 6 = 6-hourly, M = monthly; C = Time-independent
  • H: Horizontal resolution, one of: N = Native (2/3 x 1/2 deg); F = Reduced-resolution version of model grid (1.25 x 1 deg); C = Reduced resolution (1.25 x 1.25 deg)
  • V: Vertical location, one of: X = Two-dimensional; P = Pressure; V = Model layer centre; E = Model layer edge
  • GGG: Group, one of: ANA = Direct analysis products; ASM = Assimilated state variables; TDT = Tendencies of temperature; UDT = Tendencies of eastward and northward wind components; QDT = Tendencies of specific humidity; ODT = Tendencies of ozone; LND = Land surface variables; FLX = Surface turbulent fluxes and related quantities; OCN = Ocean quantities; MST = Moist processes; CLD = Clouds; RAD = Radiation; TRB = Turbulence; SLV = Single level; INT = Vertical integrals; CHM = Chemistry forcing

Standard name, pattern ffffN_dd_ggg_Hv (e.g. tavg1_2d_slv_Nx):

  • ffff: Frequency type, one of: inst = Instantaneous; tavg = Time-averaged; const = Time-independent
  • N: Frequency, one of: 1 = 1-hourly; 3 = 3-hourly; 6 = 6-hourly; M = Monthly mean; U = Monthly diurnal mean; 0 = Not applicable
  • ggg: Group, the lowercase counterpart of GGG above (ana, asm, tdt, udt, qdt, odt, lnd, flx, mst, cld, rad, trb, slv, int, chm)
  • H: Horizontal resolution, as for the short name: N = Native (2/3 x 1/2 deg); F = Reduced-resolution version of model grid (1.25 x 1 deg); C = Reduced resolution (1.25 x 1.25 deg)
  • v: Vertical location, one of: x = two-dimensional; p = pressure; v = model layer centre; e = model layer edge
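The standard-name parts can be decoded with a pair of lookup tables transcribed from the lists above. A sketch (the function name is hypothetical, and time-independent const collections, which may lack the frequency digit, are not handled):

```python
FREQ = {"1": "hourly", "3": "3-hourly", "6": "6-hourly",
        "M": "monthly mean", "U": "monthly diurnal mean", "0": "not applicable"}
HORIZ = {"N": "native (2/3 x 1/2 deg)",
         "F": "reduced (1.25 x 1 deg)",
         "C": "reduced (1.25 x 1.25 deg)"}
VERT = {"x": "two-dimensional", "p": "pressure levels",
        "v": "model layer centres", "e": "model layer edges"}

def decode_collection(name):
    """Decode a standard collection name such as tavg1_2d_slv_Nx."""
    freq_part, dims, group, hv = name.split("_")
    return {
        "freq_type": freq_part[:-1],  # inst or tavg
        "freq": FREQ[freq_part[-1]],
        "dims": dims,                 # 2d or 3d
        "group": group,               # e.g. slv, flx, ana
        "horizontal": HORIZ[hv[0]],
        "vertical": VERT[hv[1]],
    }
```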

Useful collections

Collection Description
MAT1NXSLV
tavg1_2d_slv_Nx
2D IAU diagnostic, single-level meteorology, time-averaged 1-hourly, on 2/3×1/2 grid
MAT1NXFLX
tavg1_2d_flx_Nx
2D surface turbulent fluxes and related quantities, time-averaged 1-hourly, on 2/3×1/2 grid
MAI6NPANA
inst6_3d_ana_Np
3D analyzed state, instantaneous 6-hourly, on pressure levels, at native resolution
MAI6NVANA
inst6_3d_ana_Nv
3D meteorology, instantaneous 6-hourly analyzed fields, on model layer centres, on 2/3×1/2 grid
MAIMNPANA
instM_3d_ana_Np
3D analyzed state, meteorology, instantaneous (monthly mean), on pressure levels, at native resolution
MAIUNPANA
instU_3d_ana_Np
3D analyzed state, meteorology, instantaneous (monthly diurnal mean), on pressure levels, at native resolution

A particularly useful dataset (for me) is MAT1NXSLV. Using OpenDAP (more later) to examine the files shows this structure:

http://goldsmr2.sci.gsfc.nasa.gov/opendap/MERRA/MAT1NXSLV.5.2.0/1979/01/MERRA100.prod.assim.tavg1_2d_slv_Nx.19790101.hdf
<type 'netCDF4.Dataset'>
root group (NETCDF3_CLASSIC file format):
HDFEOSVersion: HDFEOS_V2.14
missing_value: 1e+15
Conventions: CF-1.0
title: MERRA reanalysis. GEOS-5.2.0
history: File written by CFIO
institution: Global Modeling and Assimilation Office, NASA Goddard Space Flight Center, Greenbelt, MD 20771
source: Global Modeling and Assimilation Office. GEOSops_5_2_0
references: http://gmao.gsfc.nasa.gov/research/merra/
comment: GEOS-5.2.0
contact: http://gmao.gsfc.nasa.gov/
dimensions: TIME, XDim, YDim
variables: SLP, PS, U850, U500, U250, V850, V500, V250, T850, T500, T250, Q850, Q500, Q250, H1000, H850, H500, H250, OMEGA500, U10M, U2M, U50M, V10M, V2M, V50M, T10M, T2M, QV10M, QV2M, TS, DISPH, TROPPV, TROPPT, TROPPB, TROPT, TROPQ, CLDPRS, CLDTMP, XDim, YDim, TIME, XDim_EOS, YDim_EOS, Time

Here XDim = 540 (i.e. 2/3 degree), YDim = 361 (i.e. 1/2 degree), and the long names of the variables are:

Variable name long name
SLP Sea level pressure
PS Time averaged surface pressure
U850 Eastward wind at 850 hPa
U500 Eastward wind at 500 hPa
U250 Eastward wind at 250 hPa
V850 Northward wind at 850 hPa
V500 Northward wind at 500 hPa
V250 Northward wind at 250 hPa
T850 Temperature at 850 hPa
T500 Temperature at 500 hPa
T250 Temperature at 250 hPa
Q850 Specific humidity at 850 hPa
Q500 Specific humidity at 500 hPa
Q250 Specific humidity at 250 hPa
H1000 Height at 1000 hPa
H850 Height at 850 hPa
H500 Height at 500 hPa
H250 Height at 250 hPa
OMEGA500 Vertical pressure velocity at 500 hPa
U10M Eastward wind at 10 m above the displacement height
U2M Eastward wind at 2 m above the displacement height
U50M Eastward wind at 50 m above surface
V10M Northward wind at 10 m above the displacement height
V2M Northward wind at 2 m above the displacement height
V50M Northward wind at 50 m above surface
T10M Temperature at 10 m above the displacement height
T2M Temperature at 2 m above the displacement height
QV10M Specific humidity at 10 m above the displacement height
QV2M Specific humidity at 2 m above the displacement height
TS Surface skin temperature
DISPH Displacement height
TROPPV PV based tropopause pressure
TROPPT T based tropopause pressure
TROPPB Blended tropopause pressure
TROPT Tropopause temperature
TROPQ Tropopause specific humidity
CLDPRS Cloud-top pressure
CLDTMP Cloud-top temperature
XDim longitude
YDim latitude
TIME time
XDim_EOS XDim
YDim_EOS YDim
Time Time

Getting the data

There are at least three different ways of getting the data:

MIRADOR allows http or ftp access to the full global hdf4 files

Simple Subset Wizard: a web front-end to an OpenDAP server. Use the web interface to select the collection (using the short name, e.g. MAI6NPANA) and define a bounding box. The web interface then constructs an OpenDAP request, e.g. http://goldsmr3.sci.gsfc.nasa.gov/opendap/MERRA/MAI6NPANA.5.2.0/2010/01/MERRA300.prod.assim.inst6_3d_ana_Np.20100101.hdf.nc?T[0:3][0:41][226:337][188:344],U[0:3][0:41][226:337][188:344],H[0:3][0:41][226:337][188:344],V[0:3][0:41][226:337][188:344],O3[0:3][0:41][226:337][188:344],SLP[0:3][226:337][188:344],QV[0:3][0:41][226:337][188:344],PS[0:3][226:337][188:344],XDim[188:344],TIME[0:3],Height,YDim[226:337]

Use OpenDAP directly. There are some instructions here about using OpenDAP via GRADS to download MERRA data. Information gleaned from that:

  • There are three OpenDAP servers:
    • http://goldsmr1.sci.gsfc.nasa.gov/dods/ Meteorological fields for Chemical Transport Modeling
    • http://goldsmr2.sci.gsfc.nasa.gov/dods/ Two-dimensional fields
    • http://goldsmr3.sci.gsfc.nasa.gov/dods/ Three-dimensional fields

On the servers, the files are grouped by year and month, e.g.

http://goldsmr3.sci.gsfc.nasa.gov/opendap/MERRA/MAI6NPANA.5.2.0/2010/01/

Then you need to know the long name of the underlying hdf file, e.g.

http://goldsmr3.sci.gsfc.nasa.gov/opendap/MERRA/MAI6NPANA.5.2.0/2010/01/MERRA300.prod.assim.inst6_3d_ana_Np.20100101.hdf.nc

Note that the stream number comes into play here, i.e. the first digit after the final ‘MERRA’ in the file name:

  • Stream 1: 1979 – 1989;
  • Stream 2: 1989 – 1998;
  • Stream 3: 1998 – present.
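Putting the stream numbers together with the URL pattern above gives a simple file-name builder. A sketch: the boundary years 1989 and 1998 appear in two streams, so their assignment here is a guess, and the version digits ‘00’ are assumed from the examples:

```python
def merra_stream(year):
    """Pick the MERRA production stream for a year.

    The streams overlap at 1989 and 1998; this sketch arbitrarily
    assigns the boundary years to the later stream.
    """
    if year < 1989:
        return 1
    if year < 1998:
        return 2
    return 3

def merra_url(short_name, std_name, year, month, day, server="goldsmr2"):
    """Build an OpenDAP URL for a daily MERRA file.

    Pattern inferred from the examples in this post; version '00' assumed.
    """
    return ("http://%s.sci.gsfc.nasa.gov/opendap/MERRA/%s.5.2.0/%04d/%02d/"
            "MERRA%d00.prod.assim.%s.%04d%02d%02d.hdf"
            % (server, short_name, year, month,
               merra_stream(year), std_name, year, month, day))
```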

If you have OpenDAP-enabled NetCDF Operators (NCO), you can use ncks:

ncks -v <variables> -d XDim,<xmin>,<xmax> -d YDim,<ymin>,<ymax> http://goldsmr2.sci.gsfc.nasa.gov/opendap/MERRA/MAT1NXSLV.5.2.0/<year>/<month>/MERRA<stream_number><version>.prod.assim.tavg1_2d_slv_Nx.<year><month><day>.hdf <out>.nc

Concrete example:

ncks -v SLP,U850,U500,V850,V500,T850,T500,T250,Q850,Q500,H1000,H850,H500,OMEGA500,U10M,U2M,U50M,V10M,V2M,V50M,T10M,T2M,QV10M,QV2M,TS,DISPH,XDim,YDim,TIME,XDim_EOS,YDim_EOS,Time -d XDim,225,315 -d YDim,262,322 http://goldsmr2.sci.gsfc.nasa.gov/opendap/MERRA/MAT1NXSLV.5.2.0/2010/01/MERRA300.prod.assim.tavg1_2d_slv_Nx.20100101.hdf $HOME/data/reanalysis/MERRA/MERRA300.prod.assim.tavg1_2d_slv_Nx.20100101.nc
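The index ranges passed to -d (e.g. XDim,225,315) depend on the native grid layout. Assuming the 2/3 × 1/2 degree grid runs from -180 eastward and -90 northward (worth verifying against the XDim/YDim variables in an actual file), the nearest indices can be computed:

```python
def lonlat_to_index(lon, lat):
    """Nearest (XDim, YDim) indices on the assumed MERRA native grid.

    XDim: 540 points from -180 eastward in 2/3-degree steps.
    YDim: 361 points from -90 northward in 1/2-degree steps.
    """
    i = int(round((lon + 180.0) / (2.0 / 3.0))) % 540
    j = min(int(round((lat + 90.0) / 0.5)), 360)
    return i, j
```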

WRF with CFSR Data – Part One

I’m gradually piecing together information about using the CFSR dataset to drive WRF. I’ll split this post into two halves: the first deals with fetching CFSR data in an automated way from the RDA server at UCAR, the second with actually running WRF.

ds093.0 is the original CFSR dataset, and covers from 1979 to March 2011.

ds094.0 covers from 2011-01-01 up to the present.

The first thing to note is that surface variables are not available at the analysis time, so the WRF group recommend using the 6-hour forecast data. By this they seem to mean the 6-hour forecast for all fields, not just surface fields. I guess this makes running ungrib much simpler, and the forecast error in a 6-hour forecast should be small anyway. According to http://drought.geo.msu.edu/data/CFSR4WRF/:

“The 6-hourly CFSR data subsetting interface now includes support for WRF variable tables. By choosing one of these presets, the parameters and levels required by the WRF Preprocessing System (WPS) will automatically be selected. Users will still need to select the product type and grid resolution. To access the subsetting interface, go to http://rda.ucar.edu/datasets/ds093.0/ and click the “Data Access” tab. Then click “Internet Download”, and then the icon in the “Request a Subset…” column.

Because not all parameters/levels are available as analyses from CFSR, the WRF group recommends that users download the 6-hour Forecasts, which is selectable from the “Type of Product:” menu. The high resolution data are on the 0.5-degree grid for the pressure level parameters and on the 0.3-degree grid for the surface parameters. The low resolution data are on the 2.5-degree grid for the pressure level parameters and on the 1.875-degree grid for the surface parameters. These are selectable from the “Grid:” menu.

It has come to our attention that some users are under the assumption that the “pgbh00” and “flxf00” files from NCDC’s NOMADS server are analyses, and that they are using them as inputs to WRF. The fields in those files are NOT analyses; they are output from the first model timestep and they should not be used at all. NCEP asked in January 2010 that those files not be distributed, and they confirmed this in August 2011.

For questions about CFSR data at NCAR or about the subsetting interface, please contact Bob Dattore at dattore@ucar.edu.”

Authentication

Whenever you interact with the RDA server, you must be authenticated. You can do this by sending a login request and saving a cookie with either of these commands:

curl -o /dev/null -k -s -c <cookie_file_name> -d "email=<email>&passwd=<passwd>&action=login" https://rda.ucar.edu/cgi-bin/login 
wget --save-cookies <cookie_file_name> --post-data="email=<email>&passwd=<passwd>&action=login" https://rda.ucar.edu/cgi-bin/login

Then pass the authentication cookie in all subsequent requests using either:

curl -b <cookie_file_name>
wget --load-cookies <cookie_file_name>
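For scripting, the same two-step cookie dance can be assembled programmatically. A sketch that only builds the curl command strings (credentials and file names are placeholders; the helper names are mine):

```python
def rda_login_cmd(email, passwd, cookie_file):
    """curl command that posts RDA credentials and saves the session cookie."""
    return ('curl -o /dev/null -k -s -c %s '
            '-d "email=%s&passwd=%s&action=login" '
            'https://rda.ucar.edu/cgi-bin/login' % (cookie_file, email, passwd))

def rda_request_cmd(cookie_file, post_data, url):
    """curl command that reuses the saved cookie to POST a follow-up request."""
    return 'curl -b %s -d "%s" %s' % (cookie_file, post_data, url)
```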

Getting data

There are two steps to automatically fetching data:

  1. Requesting a subset
  2. Downloading the data

A data subset can be requested in two ways, either through the web interface (via the data access tab), or by constructing the correct HTTP string to post. The web interface is self explanatory, so I will deal with the automated way, described here.

A subset request can be verified by posting a correctly formed string to http://rda.ucar.edu/php/dsrqst-test.php using either of these commands:

curl -b <cookie_file_name> -d "<dataset_description>" http://rda.ucar.edu/php/dsrqst-test.php
wget --load-cookies <cookie_file_name> --post-data "<dataset_description>" http://rda.ucar.edu/php/dsrqst-test.php

Then the subset request can be queued using either of the commands:

curl -b <cookie_file_name> -d "<dataset_description>" http://rda.ucar.edu/php/dsrqst.php
wget --load-cookies <cookie_file_name> --post-data "<dataset_description>" http://rda.ucar.edu/php/dsrqst.php

Where <dataset_description> is one of the WRF presets described below. To subset the data geographically, add all of these options to the <dataset_description>. However, WPS may not cope with subsets that cross longitude 0:

nlat=NN;  
slat=SS; 
wlon=WWW; 
elon=EEE; 

When requesting subsets, some WRF presets are available. The three presets are:

WRF Model Input VTable.CFSR

  • 0.5×0.5 degree grid
  • 6-hour forecast
  • Levels:
    • mean sea level,
    • 2m,
    • 1,2,3,5,7,10,20,30,50,70,100,125,150,175,200,225,250,300,350,400,450,500,550,600,650,700,750,775,800,825,850,875,900,925,950,975,1000mb:
  • Variables
    • Geopotential height
    • Pressure reduced to MSL
    • Relative humidity
    • Temperature
    • U and V

Which corresponds to this <dataset_description> for ds093.0:

"dsid=ds093.0&rtype=S&rinfo=dsnum=093.0;
startdate=YYYY-MM-DD HH:MM;
enddate=YYYY-MM-DD HH:MM;
parameters=3%217-0.2-1:0.0.0,3%217-0.2-1:0.1.1,3%217-0.2-1:0.2.2,3%217-0.2-1:0.2.3,3%217-0.2-1:0.3.1,3%217-0.2-1:0.3.5;
level=76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,221,361,362,363,557,562,563,574,577,581,913,914,219;
product=3;
grid_definition=57;
ststep=yes"

And this description for ds094.0:

"dsid=ds094.0&rtype=S&rinfo=dsnum=094.0;
startdate=YYYY-MM-DD HH:MM;
enddate=YYYY-MM-DD HH:MM;
parameters=3%217-0.2-1:0.0.0,3%217-0.2-1:0.1.1,3%217-0.2-1:0.2.2,3%217-0.2-1:0.2.3,3%217-0.2-1:0.3.1,3%217-0.2-1:0.3.5;
level=76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,221,361,362,363,557,562,563,574,577,581,913,914,219;
product=3;
grid_definition=57;
ststep=yes"
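Since only the dates change between requests, the description can be kept as a template. A sketch (the template is the ds093.0 pressure-level preset above, collapsed to a single string, which is my reading of how the curl examples expect the POST body):

```python
DS093_PRESSURE_LEVELS = (
    "dsid=ds093.0&rtype=S&rinfo=dsnum=093.0;"
    "startdate={start};enddate={end};"
    "parameters=3%217-0.2-1:0.0.0,3%217-0.2-1:0.1.1,3%217-0.2-1:0.2.2,"
    "3%217-0.2-1:0.2.3,3%217-0.2-1:0.3.1,3%217-0.2-1:0.3.5;"
    "level=76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,"
    "96,97,98,99,100,101,221,361,362,363,557,562,563,574,577,581,913,914,219;"
    "product=3;grid_definition=57;ststep=yes"
)

def make_subset_request(start, end, template=DS093_PRESSURE_LEVELS):
    """Fill start/end dates (YYYY-MM-DD HH:MM) into a preset template."""
    return template.format(start=start, end=end)
```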

WRF Model Input VTable.CFSR Surface

  • 0.313×0.313 degrees.
  • 6-hour forecast.
  • Levels:
    • ground or water surface,
    • layer between depth 0.4-0.1m,
    • layer between depth 0.1-0m,
    • layer between depth 2-1m,
layer between depth 1-0.4m,
    • height above ground 2m,
    • height above ground 10m:
  • Variables
    • Pressure
    • Specific humidity
    • Temperature
    • U and V
    • Ice cover
    • Land cover
    • Volumetric soil water
    • Water equivalent of accumulated snow depth
  • Output files:
    • 201001010000.flxf06.gdas.20091226-20091231.grb2
    • Date prefix is valid time, date suffix is a relic of which .tar file they came from

Which corresponds to this <dataset_description> for ds093.0:

"dsid=ds093.0&rtype=S&rinfo=dsnum=093.0;
startdate=YYYY-MM-DD HH:MM;
enddate=YYYY-MM-DD HH:MM;
parameters=3%217-0.2-1:2.0.192,3%217-0.2-1:0.3.5,3%217-0.2-1:0.2.2,3%217-0.2-1:0.2.3,3%217-0.2-1:0.1.0,3%217-0.2-1:0.1.13,3%217-0.2-1:2.0.0,3%217-0.2-1:10.2.0,3%217-0.2-1:0.3.0,3%217-0.2-1:0.0.0;
level=521,522,523,524,107,223,221;
product=3;
grid_definition=62;
ststep=yes"

And this <dataset_description> for ds094.0:

"dsid=ds094.0&rtype=S&rinfo=dsnum=094.0;
startdate=YYYY-MM-DD HH:MM;
enddate=YYYY-MM-DD HH:MM;
parameters=3%217-0.2-1:0.0.0,3%217-0.2-1:0.1.0,3%217-0.2-1:0.1.13,3%217-0.2-1:0.2.2,3%217-0.2-1:0.2.3,3%217-0.2-1:0.3.0,3%217-0.2-1:0.3.5,3%217-0.2-1:10.2.0,3%217-0.2-1:2.0.0,3%217-0.2-1:2.0.192;
level=107,221,521,522,523,524,223;
product=3;
grid_definition=68;
ststep=yes"

WRF Model Input VTable.CFSR SST

0.313×0.313 degree grid
  • 6-hour forecast,
  • Levels:
    • Surface
  • Variables:
    • Temperature
  • Output files:
    • 201001010000.flxf06.gdas.20091226-20091231.grb2

Which corresponds to this <dataset_description> for ds093.0:

"dsid=ds093.0&rtype=S&rinfo=dsnum=093.0;
startdate=YYYY-MM-DD HH:MM;
enddate=YYYY-MM-DD HH:MM;
parameters=3%217-0.2-1:0.0.0;
level=107;
product=3;
grid_definition=62;
ststep=yes"

And this for ds094.0:

"dsid=ds094.0&rtype=S&rinfo=dsnum=094.0;
startdate=YYYY-MM-DD HH:MM;
enddate=YYYY-MM-DD HH:MM;
parameters=3%217-0.2-1:0.0.0;
level=107;
product=3;
grid_definition=68;
ststep=yes"

If a subset request is successful, the return should be an HTML string containing a description of the dataset, which will include, among other things:

Request Summary:
Index    : <numeric_index>
ID       : <request_id>
Category : Subset Data
Status   : Queue
Dataset  : ds093.0
Title    : NCEP Climate Forecast System Reanalysis (CFSR) 6-hourly Products, January 1979 to December 2010
User     : <user>
Email    : <email>
Date     : 2013-08-27
Time     : 10:11:22
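Although the status page is populated by JavaScript, the summary returned at submission time is plain text, so the request ID can be pulled out of it programmatically. A sketch (the parser name is mine; field names are taken from the summary above):

```python
def parse_request_summary(text):
    """Extract the 'Key : value' fields from an RDA request summary."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        if key.strip() and value.strip():
            fields[key.strip()] = value.strip()
    return fields
```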

Once you submit a subset request to UCAR, the data are queued for subsetting on the server side. You can check the status of the request with:

curl -b <cookie_file> http://rda.ucar.edu/#ckrqst

But this returns a webpage which uses JavaScript to populate a table of requests and statuses, so you can’t parse the returned text to check the status of a job. You will, however, receive an email when the request has been processed, and the data will be put into a subdirectory:

http://rda.ucar.edu/#dsrqst/<request_id>

This directory will contain all the files you requested, plus wget- and curl-based scripts to download them, called:

curl.<request_id>.csh
wget.<request_id>.csh

These scripts can themselves be downloaded automatically using either:

curl -b <cookie_file_name> -O http://rda.ucar.edu/#dsrqst/<request_id>/curl.<request_id>.csh
wget --load-cookies <cookie_file_name> http://rda.ucar.edu/#dsrqst/<request_id>/wget.<request_id>.csh

Then, finally, once you have these scripts, you can run them to fetch the actual grib files themselves.

./curl.<request_id>.csh <passwd>
./wget.<request_id>.csh <passwd>

The underlying grib files are either called flx (surface variables), or pgbl (pressure and soil levels). Since one timestep per file is requested, they are prefixed with the valid time. There is also a time-based suffix which relates to how they were stored on disk, but you can ignore that.

In a future post, I will build a simple python programme to automate the full procedure, and then deal with running WRF using the data.

Installing python setuptools

Chicken and egg? I needed to install setuptools on a cluster where https access was blocked. I had to fetch the source to my local machine and copy it over using WinSCP.

  • Get ez_setup.py
  • Get setuptools-0.9.8.tar.gz
  • Copy them to the machine you are installing on
  • Modify ez_setup.py and change:

    DEFAULT_URL = "https://pypi.python.org/packages/source/s/setuptools/"

    to:

    DEFAULT_URL = "file:///path/to/local/setuptools/dir"

  • python ez_setup.py --user