E3SM Diags Parameters

The current diagnostics (zonal_mean_xy, zonal_mean_2d, lat_lon, polar, cosp_histogram, meridional_mean_2d) support the default parameters.

Some, like zonal_mean_2d, have parameters that pertain only to that diagnostic, such as zonal_mean_2d_plevs. As we add more diagnostic packages, the need for plot-set-specific parameters will only increase. We need a way to deal with them.

The Brute-Force Solution

In acme_parameters.py, we have all of the parameters and their default values. In acme_parser.py, we have a command line argument for each of these parameters. Each parameter should be modifiable via the command line, so the parser is needed.

With this solution, every new plot-set-specific parameter would be named in the format:

{plot_set_name}_{parameter_name}

An example of this is zonal_mean_2d_plevs, which holds the pressure levels for the zonal_mean_2d plot.

Pros:

No additional work is needed; this just works.
Users who add new diagnostics sets won't need to do much work when new parameters have to be added.

Cons:

Unorganized: This solution isn't the most elegant. A plot set like lat_lon will carry a parameter zonal_mean_2d_plevs, defaulted to numpy.logspace(2.0, 3.0, num=17), even though it pertains only to zonal_mean_2d.
Following from the above point, when there are many more diagnostics sets (think 10-100), this will be a mess.
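
To make the drawback concrete, here is a minimal sketch of the flat naming scheme. `FlatParameters` and `build_parser` are hypothetical stand-ins written for this note; they are not the actual contents of acme_parameters.py or acme_parser.py. Only the `{plot_set_name}_{parameter_name}` convention and the zonal_mean_2d_plevs default come from the text above.

```python
import argparse


class FlatParameters:
    """Hypothetical flat parameter class: every plot set sees every attribute."""

    def __init__(self):
        self.sets = ["lat_lon"]
        self.backend = "mpl"
        # Named {plot_set_name}_{parameter_name}. These are the same values as
        # numpy.logspace(2.0, 3.0, num=17), computed without numpy.
        self.zonal_mean_2d_plevs = [10.0 ** (2.0 + i / 16.0) for i in range(17)]


def build_parser(defaults):
    """Create one command line flag per attribute of `defaults`."""
    parser = argparse.ArgumentParser(prog="e3sm_diags")
    for name, value in vars(defaults).items():
        if isinstance(value, list):
            # Lists take one or more values of the element type.
            elem_type = type(value[0]) if value else str
            parser.add_argument("--" + name, nargs="+", type=elem_type, default=value)
        else:
            parser.add_argument("--" + name, type=type(value), default=value)
    return parser


# A lat_lon-only run still carries the zonal_mean_2d parameter around.
args = build_parser(FlatParameters()).parse_args(["--sets", "lat_lon"])
print(args.sets, len(args.zonal_mean_2d_plevs))
```

With 10-100 plot sets, both the class and the generated parser would grow by one prefixed entry per set-specific parameter, which is the mess described above.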

An API Solution

The problem at hand is that we need specific parameters for specific plot sets. I cannot think of an elegant way to do this via the current -p or -d methods.

Instead, we can have users create a simple Python script, something.py (shown below), and run the diags like:

$ something.py

In this solution, we are introducing an API to run e3sm_diags.

Running a simple example

Below is what this simple script, something.py, would look like. It's pretty much 1:1 with running e3sm_diags -p params.py.

# something.py
from acme_diags.parameters import CoreParameters
import acme_diags

# All of these parameters are the current defaults.
# It's analogous to what a user does via a `params.py` in:
#    e3sm_diags -p params.py
param = CoreParameters()
param.reference_data_path = '/global/project/projectdirs/acme/acme_diags/obs_for_e3sm_diags/climatology/'
param.test_data_path = '/global/project/projectdirs/acme/acme_diags/test_model_data_for_acme_diags/climatology/'
param.test_name = '20161118.beta0.FC5COSP.ne30_ne30.edison'
param.sets = ["lat_lon"]
param.seasons = ["ANN"]
# 'mpl' and 'vcs' are for matplotlib or vcs plots respectively.
param.backend = 'mpl'
# Name of folder where all results will be stored.
param.results_dir = 'lat_lon_demo'

acme_diags.run_diags(param)

You run this like:

something.py

Running with set-specific defaults

Say that zonal_mean_2d and some_plotset_2d both have a plevs parameter that's specific to those plot sets only. We want it to be [30., 50., 100.] in one, but [20., 40., 60.] in the other.

We can change its value for each plot set, instead of leaving it at what it's defaulted to, numpy.logspace(2.0, 3.0, num=17) or something like that.

Remember to modify the sets parameter in CoreParameters. This is like changing the sets parameter in myparams.py.

# something.py
from acme_diags.parameters import CoreParameters, ZonalMean2dParameters, SomePlotset2dParameters
import acme_diags

# All of these parameters are the current defaults.
# It's analogous to what a user does via a `params.py` in:
#    e3sm_diags -p params.py
param = CoreParameters()
param.reference_data_path = '/global/project/projectdirs/acme/acme_diags/obs_for_e3sm_diags/climatology/'
param.test_data_path = '/global/project/projectdirs/acme/acme_diags/test_model_data_for_acme_diags/climatology/'
param.test_name = '20161118.beta0.FC5COSP.ne30_ne30.edison'
# param.sets = ["lat_lon", "zonal_mean_2d", "some_plotset_2d"]
param.seasons = ["ANN"]
# 'mpl' and 'vcs' are for matplotlib or vcs plots respectively.
param.backend = 'mpl'
# Name of folder where all results will be stored.
param.results_dir = 'lat_lon_demo'

# These are parameters specific to the `zonal_mean_2d` plot set.
zonal_mean_2d_param = ZonalMean2dParameters()
zonal_mean_2d_param.plevs = [30., 50., 100.]

some_plotset_2d_param = SomePlotset2dParameters()
some_plotset_2d_param.plevs = [20., 40., 60.]

acme_diags.sets_to_run = ['lat_lon', 'zonal_mean_2d', 'some_plotset_2d']
acme_diags.run_diags([param, zonal_mean_2d_param, some_plotset_2d_param])

# acme_diags.run_diags([param, zonal_mean_2d_param, some_plotset_2d_param])
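
How run_diags would combine the core and set-specific objects is not specified in this proposal. The following is one possible sketch, under the assumption that each set-specific class carries a `set_name` attribute; `resolve_parameters`, `set_name`, and the placeholder defaults are all hypothetical, and the classes are local stand-ins rather than imports from acme_diags.

```python
import copy


class CoreParameters:
    """Hypothetical stand-in for the parameters shared by every plot set."""

    def __init__(self):
        self.sets = []
        self.seasons = ["ANN"]
        self.backend = "mpl"


class ZonalMean2dParameters:
    """Hypothetical set-specific parameters, tied to one plot set by set_name."""

    set_name = "zonal_mean_2d"

    def __init__(self):
        self.plevs = [100.0, 500.0, 1000.0]  # placeholder default


def resolve_parameters(param_objects, sets_to_run):
    """Return {set_name: merged parameters} for each requested plot set."""
    core = next(p for p in param_objects if isinstance(p, CoreParameters))
    specific = {p.set_name: p for p in param_objects if hasattr(p, "set_name")}
    resolved = {}
    for name in sets_to_run:
        merged = copy.deepcopy(core)
        merged.sets = [name]
        extra = specific.get(name)
        if extra is not None:
            # Set-specific values are visible only to their own plot set.
            for key, value in vars(extra).items():
                setattr(merged, key, copy.deepcopy(value))
        resolved[name] = merged
    return resolved


core = CoreParameters()
zonal = ZonalMean2dParameters()
zonal.plevs = [30.0, 50.0, 100.0]
resolved = resolve_parameters([core, zonal], ["lat_lon", "zonal_mean_2d"])
print(resolved["zonal_mean_2d"].plevs)
print(hasattr(resolved["lat_lon"], "plevs"))
```

The point of the design is visible in the last line: lat_lon never sees plevs, unlike in the brute-force solution.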

Running custom diags (with -d)

If a user wants to add their custom diags (diags.cfg), they'd use the above something.py with no changes, like so:

something.py -d diags.cfg
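
For the script to need no changes, the library itself would have to read -d from the command line when run_diags is called. A minimal sketch of that idea, assuming a hypothetical helper `parse_script_args` (not an existing acme_diags function) built on the standard library's argparse:

```python
import argparse


def parse_script_args(argv):
    """Pull out -d and leave any other arguments untouched."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-d", dest="diags_cfg", default=None)
    # parse_known_args ignores arguments it does not recognize
    # instead of exiting with an error.
    known, remaining = parser.parse_known_args(argv)
    return known.diags_cfg, remaining


# In the library this would be called with sys.argv[1:].
cfg, rest = parse_script_args(["-d", "diags.cfg"])
print(cfg, rest)
```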

Pros and Cons

Pros:

This is a longer-term solution. Many more plot sets can be added in an organized way.
Using an API to run the software, rather than the command line, allows for more flexibility.
- Since this is a new way, we'll have lots of documentation and examples to make it easier for users.
An API approach allows the software to be integrated into other workflows more easily.
- Ex: The images that Peter Caldwell made.
Inheritance: For example, say we have a set of diagnostics for El Nino.
- We can have ElNinoParameters with certain default parameters for El Nino.
- If we have another diagnostics package SomeElNinoPlotSet that uses these but also has some specialized ones, we can have SomeElNinoPlotSetParameters inherit from ElNinoParameters.
Works cleanly in the unified environment. No changes needed.
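
The inheritance point above can be sketched as follows. The two class names come from the text; every attribute name and value is made up for illustration, since the proposal does not say which parameters an El Nino set would have.

```python
class ElNinoParameters:
    """Hypothetical defaults shared by every El Nino plot set."""

    def __init__(self):
        # Attribute names and values here are illustrative only.
        self.seasons = ["DJF"]
        self.region = "nino34"


class SomeElNinoPlotSetParameters(ElNinoParameters):
    """Inherits the El Nino defaults and adds its own specialized ones."""

    def __init__(self):
        super().__init__()
        self.lag_months = [0, 3, 6]


param = SomeElNinoPlotSetParameters()
param.region = "nino3"  # override an inherited default
print(param.seasons, param.region, param.lag_months)
```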

Cons:

The current command line solution will no longer work on new diagnostics.
- But it'll be backwards compatible, so a user can still do the below for the current diags.
```
e3sm_diags -p param.py -d diags.cfg
```
~~The command line provenance on each diagnostics image won't currently work.~~
- ~~However, we can still dump the entire something.py so the user can see what was ran.~~
- No longer an issue: After talking with Chris, we have a solution.
- Below is what the provenance would look like for the zonal_mean_2d plots:
- e3sm_diags zonal_mean_2d --plevs=[30.,50.] --ref_data_path='...'
Running with a container won't currently work. We'd need to make some modifications to get the container to work.
- Current, old way of running the container: python e3sm_diags_container.py --docker -p myparams.py.
- New way: python e3sm_diags_container.py something.py
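
The provenance line mentioned in the list above could be generated from a plot set's parameters roughly like this. `provenance_command` is a hypothetical helper, and the exact value formatting (compact JSON, shell-quoted) is an assumption; the proposal only shows the general shape `e3sm_diags zonal_mean_2d --plevs=... --ref_data_path=...`.

```python
import json
import shlex


def provenance_command(set_name, params):
    """Format parameters as an `e3sm_diags <set> --name=value ...` line."""
    parts = ["e3sm_diags", set_name]
    for name in sorted(params):
        value = params[name]
        # Strings are used as-is; other values are written as compact JSON.
        text = value if isinstance(value, str) else json.dumps(value, separators=(",", ":"))
        # Quote the value so the line can be pasted into a shell safely.
        parts.append("--{}={}".format(name, shlex.quote(text)))
    return " ".join(parts)


cmd = provenance_command("zonal_mean_2d", {"plevs": [30.0, 50.0]})
print(cmd)
```

Quoting matters here because characters like `[` and `]` can be interpreted by the shell when a user copies the line back into a terminal.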

@golaz, thanks for the input. I've updated the gist with that and some other points I got from talking to you and @chengzhuzhang. Do note that we basically only have two ways to run the software:

The current way to run the diags, needed for backwards compatibility:

e3sm_diags -p param.py
e3sm_diags -p param.py -d diags.cfg

The newer way, which is how people will run the diags to use newer plotsets:

something.py
something.py -d diags.cfg
e3sm_diags [plotset_name] [the args]

In the newer way, the only reason we have e3sm_diags is for the provenance. We could have, for example, zonal_mean_2d [the args], but there's a chance that zonal_mean_2d might already be an executable in the user's env. So prefixing it with e3sm_diags is needed.
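
One way the `e3sm_diags [plotset_name] [the args]` form could be parsed is with argparse subcommands, one per plot set. This is only a sketch: `build_parser` is hypothetical, the flags are taken from the provenance example earlier in this gist, and parsing list values with `ast.literal_eval` is an assumption rather than a decided format.

```python
import argparse
import ast


def build_parser():
    """Hypothetical parser with one subcommand per plot set."""
    parser = argparse.ArgumentParser(prog="e3sm_diags")
    subparsers = parser.add_subparsers(dest="plotset_name")

    # Only zonal_mean_2d accepts --plevs.
    zonal = subparsers.add_parser("zonal_mean_2d")
    zonal.add_argument("--plevs", type=ast.literal_eval)
    zonal.add_argument("--ref_data_path")

    lat_lon = subparsers.add_parser("lat_lon")
    lat_lon.add_argument("--ref_data_path")
    return parser


args = build_parser().parse_args(
    ["zonal_mean_2d", "--plevs=[30.,50.]", "--ref_data_path=/some/path"]
)
print(args.plotset_name, args.plevs, args.ref_data_path)
```

Because the plot set name is a subcommand of e3sm_diags, nothing named zonal_mean_2d has to exist on the user's PATH.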

A user interfacing with the software in the new way would only be concerned with creating scripts like something.py. Only if they want to modify a single image would they copy/paste the provenance command with modifications. We are not really advertising that our software runs with e3sm_diags [plotset_name] [the args]. It's just a thing that people will see if they want to modify an image.

So again, most users will just be using something.py and something.py -d diags.cfg.

zshaheen/e3sm_diags_param.md