Category Archives: Uncategorized

Structuring a simple Python project

Python is mostly an elegant and simple language. As such, it is the tool of choice for engineers, academics, and other “accidental developers” who need to write code outside of a dedicated software development company. Many of these developers quite like coding, and would like to share code, cooperate on it, and build reusable tools. Python packaging, testing, and distribution are, in contrast to the core language, an absolute mess. The foundations are a swamp, on top of which is built an ever-growing array of complex tools: tox, pavement, pytest, nose, pip, conda, virtualenv. To the uninitiated, this appears as an impenetrable mass in which you struggle even to work out what anything is for. I still don’t know whether I should use setuptools or distutils (see here).

There are a variety of templates, “cookie-cutter” recipes for example cookiecutter-pypackage, python-project-template. But I have found them a bit too complex for my needs.

The problem is a lack of middle ground: a simple project layout, a simple way to write and run tests, and a simple way to package and distribute code. We don’t all need continuous integration. Why should we have to install yet another task runner, template engine or similar, which “simplifies” the process but requires yet another configuration file? I honestly don’t know whether I need a requirements.txt, an environment.yml, a setup.cfg, a .travis.yml, a tox.ini or whatever. Developers take note: every time you write a tool which needs another config file at top level, you increase the clutter and complexity of developing a project. Sure, all of these things are useful and probably have a place.

This post aims to provide a structure and a process suitable for small-time developers, like me, who still wish to follow good practice. It is my own attempt to boil down the myriad options and information into something usable.

Terminology

In my mind, project > package > module:

  • project corresponds to a repository on GitHub
  • project usually contains one python package, in a subdirectory, often with the same name as the project
  • project contains other stuff, such as the docs, README, tools for helping other developers
  • the package contains multiple python modules

In the template, I have named the directories literally project and package, so you have the structure:

/project
/project/setup.py
(and license, README, etc)
/project/doc
/project/tests
/project/package
/project/package/module.py

Tools

The tools I have settled on are:

  • git + GitHub for version control and hosting
  • anaconda for python control and virtual environments
  • pytest for testing
  • sphinx for documentation using napoleon extension

Instructions

  1. Clone the template repo:

    git clone https://github.com/samwisehawkins/project

  2. Change the name of the project and python environment

    mv project myproject 
    vi myproject/environment.yml  --> change environment name to match
    
  3. Change the package name, python modules etc. Generally adapt project to your needs.

  4. Edit setup.py to reflect name changes

  5. Remove git history and re-initialise

    cd myproject 
    rm -rf .git 
    git init
    
  6. Create a virtual environment with the same name as the project

    cd myproject 
    conda env create -f environment.yml 
    source activate myproject
    
  7. Install the package into the new virtual environment, using develop mode so it is symbolically linked to the source code

    python setup.py develop
    
  8. Run tests

    pytest
    
  9. Create documentation structure (use project/doc as target)

    sphinx-quickstart
    
  10. Edit doc/index.rst to auto-include modules and members:

    vi doc/index.rst
    .. automodule:: package.module
       :members:
    
  11. Build the docs

    cd doc
    make html
    
  12. Package and publish (coming soon).

NCL colormaps in Python

I’ve never found any good colormaps in python which match NCL for presenting geophysical data, so I pinched them and hard-coded them into this python module.

To use:

import nclcmaps as ncm
cmap = ncm.cmap(name)

All of the data is hard-coded in the python file, and the total size is around 528 kB, so if you need a more efficient way, get in touch and I’ll split the data from the code and make it load on demand.
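The hard-coding approach is simple to sketch. The snippet below illustrates the idea with a tiny made-up colour table; the names and RGB values here are purely illustrative, not the actual contents of nclcmaps, which stores the full NCL colour tables the same way.

```python
# Illustrative only: a couple of tiny hard-coded RGB tables,
# stored as 0-255 integers the way NCL distributes them.
colors = {
    "example_blues": [[247, 251, 255], [107, 174, 214], [8, 48, 107]],
    "example_reds":  [[255, 245, 240], [251, 106, 74], [103, 0, 13]],
}

def cmap_rgb(name):
    """Return the named colour table as floats in [0, 1]."""
    return [[c / 255.0 for c in row] for row in colors[name]]
```

To use the result with matplotlib, wrap it in matplotlib.colors.ListedColormap and pass it as the cmap argument to plotting functions.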

Handling configurations in Python

After years of trying different strategies for handling configurations in python, I’ve finally settled on a solution which ticks all my boxes. This:

  • allows both file and command-line arguments
  • is based on existing formats (json or yaml)
  • is easy to document (both in the config file and in the module’s docstring)
  • makes use of the awesome docopt
  • allows nested configurations
  • allows reference to environment variables
  • allows reference to variables elsewhere in the config file

Check it out here
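The linked code is the real implementation; as a flavour of the last two bullets, here is a minimal, self-contained sketch of expanding environment variables and references to other keys in a dict-style config. The `{key}` reference syntax is invented for illustration and is not necessarily what the linked module uses.

```python
import os
import re

def expand(value, config):
    """Expand $VAR / ${VAR} from the environment and {key} references
    to other top-level entries in the same config dict (assumes the
    referenced keys exist)."""
    if isinstance(value, dict):
        return {k: expand(v, config) for k, v in value.items()}
    if not isinstance(value, str):
        return value
    value = os.path.expandvars(value)  # $HOME, ${USER}, ...
    # {other_key} -> str(config["other_key"]), expanded recursively
    def repl(match):
        return str(expand(config[match.group(1)], config))
    return re.sub(r"\{(\w+)\}", repl, value)

config = {
    "base_dir": "/data/runs",
    "run_name": "test01",
    "out_dir": "{base_dir}/{run_name}",
}
expanded = {k: expand(v, config) for k, v in config.items()}
```

Here `expanded["out_dir"]` resolves to `/data/runs/test01`, and any `$VAR` in the values is taken from the environment.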

Git branching model

I found a good branching model for git.

$ git checkout -b myfeature develop
# Switched to a new branch "myfeature"

Do work on myfeature here, when finished:

$ git checkout develop
# Switched to branch 'develop'
$ git merge --no-ff myfeature
# Updating ea1b82a..05e9557
# (Summary of changes)
$ git branch -d myfeature
# Deleted branch myfeature (was 05e9557).
$ git push origin develop

However, I tend to work with two branches, “develop” and “master”, rather than feature-specific
branches. I make any changes on “develop”, before merging them into “master” when they are ready.
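In that two-branch style, a typical cycle looks like this (sketched in a throwaway repo so it is self-contained; the local user config and branch rename just keep it reproducible):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "You"

echo "v1" > file.txt
git add file.txt
git commit -qm "initial commit"
git branch -M master            # ensure the main branch is called master

# do all work on develop...
git checkout -qb develop
echo "v2" > file.txt
git commit -qam "work on develop"

# ...then merge into master when ready, keeping the merge commit
git checkout -q master
git merge -q --no-ff develop -m "merge develop into master"
```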

WRF Registry Edits

A quick reference post on the syntax of registry state entries in WRF, to help with turning off output variables and keeping file size down.

Registry state entry

type   sym   dims  use     Tlev  Stag   IO       DName   Description  Units   
real   u_gc  igj   dyn_em   1      XZ   i1       "UU"   "x-wind..."  "m s-1"
real   u     ikjb  dyn_em   2      X    i0rhusdf "U"    "x-wind..."  "m s-1"

IO string:

  • i : initial [0-9], 0 is the principal stream, 1-9 are auxiliary
  • h : history [0-9]
  • b : boundary
  • r : restart
  • usdf : nesting up, down, smooth, force
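So, to stop a variable being written to the principal history stream, delete the `h` from its IO string. For example, switching off history output for the second entry above would change it to:

```
real   u     ikjb  dyn_em   2      X    i0rusdf  "U"    "x-wind..."  "m s-1"
```

Remember that the registry is processed at build time, so WRF needs recompiling after registry changes.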

WRF with CFSR Data – Part Two

See Part One for hints on how to fetch CFSR data in an automated way. You only need two of the WRF presets, since SST is included in the surface files:

  • pressure level data (pgbh06 files)
  • surface and fluxes (flxf06 files)

I rename the files, replacing the trailing date range (which refers to how they are archived) with a content tag, e.g.

201001011800.pgbh06.gdas.20100101-20100105.grb2  ------>  201001011800.pgbh06.gdas.pressure.grb2
201001020000.flxf06.gdas.20100101-20100105.grb2  ------>  201001020000.flxf06.gdas.sst.grb2
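The renaming is easy to script with bash parameter expansion. The sketch below runs in a scratch directory with dummy files; in practice you would point it at the real download directory instead.

```shell
set -e
cd "$(mktemp -d)"
# dummy files standing in for the real CFSR downloads
touch 201001011800.pgbh06.gdas.20100101-20100105.grb2
touch 201001020000.flxf06.gdas.20100101-20100105.grb2

# strip the archive date range and tag the files by content instead
for f in *.pgbh06.gdas.*.grb2; do
    mv "$f" "${f%%.gdas.*}.gdas.pressure.grb2"
done
for f in *.flxf06.gdas.*.grb2; do
    mv "$f" "${f%%.gdas.*}.gdas.sst.grb2"
done
```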

Then you need to run ungrib.exe twice:

  1. in namelist.wps, set prefix=PLEVS and link Vtable.CFSR_press_pgbh06 --> Vtable
  2. in namelist.wps, set prefix=SFLUX and link Vtable.CFSR_sfc_flxf06 --> Vtable

That should create two sets of intermediate files prefixed PLEVS and SFLUX. Next, run metgrid with:

fg_name = 'PLEVS',  'SFLUX',

Et voila! It should work.
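Putting the namelist side together, the relevant fragments of namelist.wps would look something like this (the second ungrib pass swaps PLEVS for SFLUX; everything else is as in your normal setup):

```
&ungrib
 out_format = 'WPS',
 prefix     = 'PLEVS',        ! change to 'SFLUX' for the second run
/

&metgrid
 fg_name    = 'PLEVS', 'SFLUX',
/
```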

Open Meteo Data

The US has a very different model for met data than Europe. Centrally funded agencies produce forecasts, which are seen as essential for the country’s infrastructure. This data can then be made available for free. This encourages innovation, as other specialists take that data, build funky interfaces, and derive new variables. A whole bunch of weather forecasts online derive ultimately from free GFS data.

Europe actually has higher quality met data available, courtesy of ECMWF. However, it will cost you an arm and two legs to access it. I recently asked two relatively senior people at ECMWF and the UK Met Office whether they would like to see an open data model in Europe. They both responded very positively. It isn’t that the people who work in these institutions want to keep things secret; it’s that they are required by governments to recoup some of their costs by maximising revenue. However, this recouped revenue is relatively small compared to their total funding. With a small change of funding structure, I’m sure they would embrace this approach.

Many of the models, too, could be released as open source, allowing a much larger community of active users and developers. The success of WRF is dragging many of the European met agencies into the open-data 21st century. The Open Meteo Forecast is a project to run WRF at a decent resolution over Europe and make the data publicly available. It is exactly this approach, showing that it can be done and that it encourages innovation in the use of data, which will help the European met agencies move to the much more productive open source, open data model.

I’m hoping to get involved in the project, if work time allows. Support it!

Converting NetCDF to Grib

It’s possible to convert NetCDF to grib format using the Climate Data Operators (CDO). For example, if you want to use the high-resolution SST dataset produced by DMI (refs) as input to WRF, then you need to convert it.

I found some info in a post here. The basic steps using CDO are:

Split the NetCDF file into separate variables, one per file:

cdo splitname infile.nc split_

This creates a set of files, one per variable in the input, e.g. split_sst.nc and split_mask.nc. Now convert to grib and map the variable to something “grib-like”. CDO uses a parameter identifier to deal with grib codes and tables: a GRIB1 parameter identifier has two components, while GRIB2 identifiers have three:

GRIB1 parameter identifier: <code number>.<table number>                          e.g. temperature 130.128
GRIB2 parameter identifier: <parameter number>.<parameter category>.<discipline>  e.g. temperature 0.0.0

By looking into the WPS Vtables, you can copy sensible parameter values which WRF will recognise:

GRIB1| Level| From |  To  | metgrid  | metgrid | metgrid                                 |GRIB2|GRIB2|GRIB2|GRIB2|
Param| Type |Level1|Level2| Name     | Units   | Description                             |Discp|Catgy|Param|Level|
-----+------+------+------+----------+---------+-----------------------------------------+-----+-----+-----+-----+
  11 | 100  |   *  |      | TT       | K       | Temperature                             |  0  |  0  |  0  | 100 |
  33 | 100  |   *  |      | UU       | m s-1   | U                                       |  0  |  2  |  2  | 100 |
  34 | 100  |   *  |      | VV       | m s-1   | V                                       |  0  |  2  |  3  | 100 |
  52 | 100  |   *  |      | RH       | %       | Relative Humidity                       |  0  |  1  |  1  | 100 |
   7 | 100  |   *  |      | HGT      | m       | Height                                  |  0  |  3  |  5  | 100 |
  11 | 105  |   2  |      | TT       | K       | Temperature       at 2 m                |  0  |  0  |  0  | 103 |
  52 | 105  |   2  |      | RH       | %       | Relative Humidity at 2 m                |  0  |  1  |  1  | 103 |
  33 | 105  |  10  |      | UU       | m s-1   | U                 at 10 m               |  0  |  2  |  2  | 103 |

I need SST, so I guess temperature is what I need, i.e. 0.0.0. Use cdo to convert to grib and set the parameter code at the same time:

cdo -f grb2 -setparam,0.0.0 split_sst.nc split_sst.grb2