Documentation of the JOIN REST interface

A Representational State Transfer (REST) service allows querying all metadata and data products from the Tropospheric Ozone Assessment Report (TOAR) database of surface ozone observations. This database is described in Schultz et al. (2017a). The main TOAR data products are available as a supplement to that article on PANGAEA (Schultz et al., 2017b). Note that the online database, which is queried via the REST service, differs from the database described in Schultz et al. (2017a), as more data have been added since the manuscript was written. The primary purpose of the REST interface described here is machine-to-machine communication, i.e. the inclusion of TOAR data in other web services such as JOIN. However, you can also use the TOAR REST services for specific queries of the database or from your data analysis software. An example script in Python is given below.

This documentation describes the URL architecture and query options of the TOAR REST interface. For general information on REST, please consult other resources.

References:

Schultz, M. G. et al. (2017a) Tropospheric Ozone Assessment Report: Database and Metrics Data of Global Surface Ozone Observations, Elementa. [issue, page range, and doi to be added]

Schultz, M. G. et al. (2017b) Tropospheric Ozone Assessment Report: Global Surface Ozone Data Products, supplement to Schultz et al. (2017a), https://doi.org/10.1594/PANGAEA.876108.

1. General

1.1 Base URL

https://join.fz-juelich.de/services/rest/surfacedata/

Response: Description and documentation of available REST services (this document)

1.2 Services

The following information services are available and are described individually below. Each service is invoked by appending its name and, optionally, query arguments to the base URL.

1.3 Query arguments

To control the database queries, and hence the response of the TOAR REST service, you can add arguments to the service URL. These arguments must have the format argumentname=value. The first argument is preceded by a ? character; all further arguments are separated by & characters.

Example: https://join.fz-juelich.de/services/rest/surfacedata/stations/?station_country=Germany&parameter_name=o3
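The URL scheme above can be reproduced with Python's standard library. The following is a minimal sketch, not part of the TOAR service: the helper name build_url is hypothetical, and it assumes the service accepts percent-encoded values (blanks written as %20, as described below for dates).

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://join.fz-juelich.de/services/rest/surfacedata/"


def build_url(service, **args):
    """Return BASE_URL + service + '/' plus an optional '?name=value&...' query string."""
    url = BASE_URL + service + "/"
    if args:
        # quote (instead of the default quote_plus) encodes blanks as %20;
        # the characters in safe stay literal, matching the examples in this document.
        url += "?" + urlencode(args, quote_via=quote, safe=",:[]")
    return url


print(build_url("stations", station_country="Germany", parameter_name="o3"))
```

Keyword arguments keep their order, so the first one follows the ? and the rest are joined with &.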

For some arguments, multiple values are allowed. Following the usual convention for URL query strings, these should be specified by repeating the argumentname for each value. However, for convenience it is also possible to provide multiple argument values as a comma-separated list. Note that for arguments where only one value is allowed, the service will only use the last value and silently ignore all others. This is common practice among REST services.

Example: https://join.fz-juelich.de/services/rest/surfacedata/stations/?parameter_name=o3&parameter_name=no2

or (non-standard): https://join.fz-juelich.de/services/rest/surfacedata/stations/?parameter_name=o3,no2
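Both spellings can be generated from a Python list. This sketch uses urllib.parse.urlencode with doseq=True for the standard repeated form; the helper multi_value_query is hypothetical and only illustrates the two forms shown above.

```python
from urllib.parse import quote, urlencode


def multi_value_query(name, values, comma_separated=False):
    """Encode several values for one argument, either repeated or comma-joined."""
    if comma_separated:
        # Non-standard convenience form, e.g. parameter_name=o3,no2
        return urlencode({name: ",".join(values)}, quote_via=quote, safe=",")
    # Standard form: doseq=True emits name=value once per list element
    return urlencode({name: list(values)}, doseq=True, quote_via=quote)


print(multi_value_query("parameter_name", ["o3", "no2"]))
print(multi_value_query("parameter_name", ["o3", "no2"], comma_separated=True))
```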

Date values must be supplied as strings in the format YYYY-MM-DD hh:mm. Note that the blank character is encoded as %20 in HTTP URLs. You can type blanks in the address line of your browser, but when you copy the URL you will see that they have been replaced by %20. Do not put quotes around a date string (or around any other string argument).

Value ranges are specified by enclosing the first and last value in square brackets, separated by a comma.

Example: https://join.fz-juelich.de/services/rest/surfacedata/stats/?id=21919&daterange=[2010-01-01 00:00,2016-12-31 23:00]
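A minimal sketch for producing correctly formatted date and range arguments in Python. The helper names format_date and format_range are illustrative assumptions; the final encoding step keeps ',', ':', '[' and ']' literal so that the result matches the URLs shown in this document, while blanks become %20.

```python
from datetime import datetime
from urllib.parse import quote

DATE_FORMAT = "%Y-%m-%d %H:%M"  # YYYY-MM-DD hh:mm, without surrounding quotes


def format_date(dt):
    """Format a datetime as 'YYYY-MM-DD hh:mm'."""
    return dt.strftime(DATE_FORMAT)


def format_range(first, last):
    """Return '[first,last]'; datetime values are formatted with format_date."""
    parts = [format_date(v) if isinstance(v, datetime) else str(v) for v in (first, last)]
    return "[" + ",".join(parts) + "]"


daterange = format_range(datetime(2010, 1, 1, 0, 0), datetime(2016, 12, 31, 23, 0))
print(daterange)                      # [2010-01-01 00:00,2016-12-31 23:00]
print(quote(daterange, safe=",:[]"))  # [2010-01-01%2000:00,2016-12-31%2023:00]
```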

1.4 Response format

The default response format is json. You can control the format with the format= option in all queries. Currently, only json and html are supported.

1.5 Error messages

The REST service may return a page with HTTP status code 500 if you try to open a malformed URL. Usually, a meaningful error message is returned in this case.

Note that queries which are formally correct but match no records return a valid page (HTTP status code 200) with empty content. If the response format is json, you will typically receive an empty array [] in this case.
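The following sketch shows one way to distinguish these cases in a client. It assumes the default json format; the function names are hypothetical, and fetch_json performs a live request, so only parse_body is exercised directly.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def parse_body(body):
    """Decode a json response body; an empty list [] means the query matched nothing."""
    return json.loads(body)


def fetch_json(url, timeout=60):
    """Return the decoded json for url, or None if the service answers with an HTTP error."""
    try:
        with urlopen(url, timeout=timeout) as response:  # status 200, possibly with []
            return parse_body(response.read().decode("utf-8"))
    except HTTPError as err:
        # Malformed URLs typically yield status 500 with an explanatory message.
        message = err.read().decode("utf-8", errors="replace")
        print(f"HTTP {err.code}: {message}")
        return None
```

A caller can then separate the three outcomes with `records is None` (error), `records == []` (valid query, no results), and anything else (results).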

2. Description of services 

2.1 Parameters

https://join.fz-juelich.de/services/rest/surfacedata/parameters/[?QUERY-OPTIONS]

where QUERY-OPTIONS are:

name = <string> (examples: o3, ox, co, no, no2, nox, ch4, pm2p5, pm10, temp)

format = <string> (json|html)

metadata = <Boolean> (True|False; default: False)

Multiple parameter names can be given by repeating the name= option or as a comma-separated list.

Response: By default, the query will return a list of parameter names. If metadata=True, the complete parameter metadata will be returned as lists with field names parameter_name, parameter_long_name, parameter_display_name, parameter_cf_standard_name, parameter_units, parameter_formula.

If no QUERY-OPTIONS are given, the complete set of parameter names will be returned in json format.

Results are ordered alphabetically by parameter_name.

Example: https://join.fz-juelich.de/services/rest/surfacedata/parameters/

Result:

["albedo", "aswdifu", "aswdir", "benzene", "ch4", "cloudcover", "co", "ethane", "humidity", "no", "no2", "nox", "o3", "ox", "pblheight", "pm1", "pm10", "pm2p5", "press", "propane", "relhum", "so2", "temp", "toluene", "totprecip", "u", "v", "wdir", "wspeed"]

Example: https://join.fz-juelich.de/services/rest/surfacedata/parameters/?name=o3&name=no&name=ethane&format=json&metadata=True

or: https://join.fz-juelich.de/services/rest/surfacedata/parameters/?name=o3,no,ethane&format=json&metadata=True

Result:

[["ethane", "Ethane", "Ethane", "mole_fraction_of_ethane_in_air", "nmol mol-1", "C2H6"],["no", "nitrogenmonoxide", "NO", "mole_fraction_of_nitrogen_monoxide_in_air", "nmol mol-1", "NO"],["o3", "ozone", "Ozone", "mole_fraction_of_ozone_in_air", "nmol mol-1", "O3"]]
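Because the metadata records are plain lists, it can be convenient to attach the field names listed above. This sketch assumes, as the example result suggests, that the list elements always appear in the documented field order; rows_to_dicts is a hypothetical helper.

```python
PARAMETER_FIELDS = (
    "parameter_name", "parameter_long_name", "parameter_display_name",
    "parameter_cf_standard_name", "parameter_units", "parameter_formula",
)


def rows_to_dicts(rows, fields=PARAMETER_FIELDS):
    """Pair each positional record with the documented field names."""
    return [dict(zip(fields, row)) for row in rows]


# One record from the metadata=True example above
rows = [["o3", "ozone", "Ozone", "mole_fraction_of_ozone_in_air", "nmol mol-1", "O3"]]
params = rows_to_dicts(rows)
print(params[0]["parameter_units"])  # nmol mol-1
```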


2.2 Networks

https://join.fz-juelich.de/services/rest/surfacedata/networks/[?QUERY-OPTIONS]

where QUERY-OPTIONS are:

name = <string> (examples: GAW, UBA, CAPMON, EANET)

format = <string> (json|html)

metadata = <Boolean> (True|False; default is False)

Multiple network names can be given by repeating the name= option (URL standard) or as a comma-separated list.

Response: If no QUERY-OPTIONS are given, the complete set of network names will be returned as a list in json format. If metadata=True, the response will consist of lists with the field names network_name, datacenter_name, datacenter_fullname, datacenter_url.

Example: https://join.fz-juelich.de/services/rest/surfacedata/networks/

Result:

["AIRBASE", "AIRMAP", "AQS", "CAPMON", "CASTNET", "EANET", "EMEP", "GAW", "ISRAQN", "NAPS", "NIES", "OTHER", "UBA"]

Example: https://join.fz-juelich.de/services/rest/surfacedata/networks/?name=UBA&metadata=True

Result:

[["UBA", "Federal Environment Agency", "German Federal Environment Agency", "http://www.umweltbundesamt.de/en/data/current-concentrations-of-air-pollutants-in-germany"]]


2.3 Stations

https://join.fz-juelich.de/services/rest/surfacedata/stations/[?QUERY-OPTIONS]

where QUERY-OPTIONS are:

network_name = <string> (examples: GAW, UBA, CAPMON, EANET)

station_id = <string> (examples: DENW058, CGO540S00, CVO, 13-089-0002)

station_name = <string: regular expression> (examples: MOUNT PEARL, New Y)

station_country = <string> (examples: Germany, United Kingdom, France)

... and many other keywords as described in section 3 below.

format = <string> (json|html)

as_dict = <Boolean> (True|False; default is False). Only applicable if format is json. Returns results as json dictionaries instead of lists.

Multiple argument values can be given by repeating the respective query option. Alternatively, you can provide multiple values as a comma-separated list.

Response: Each query result consists of the fields network_name, station_id, station_name, station_lon, station_lat, station_alt. If you wish to retrieve other fields of station metadata, you can use the search service (section 2.5) with the columns argument.

If no QUERY-OPTIONS are given, the complete set of stations will be returned.

Example: https://join.fz-juelich.de/services/rest/surfacedata/stations/

Result:

[["ISRAQN", "ILA02RB", "ariel", 35.168563, 32.103376, 546.0], ["GAW", "ANG638N00", "Angra do Heroismo", -27.22, 38.67, 74.0], ["GAW", "CAS639N00", "Castelo Branco", -7.47, 39.83, 386.0], ["GAW", "CGO540S00", "Cape Grim", 144.689938889, -40.6831194444, 94.0], . . .]

Example: https://join.fz-juelich.de/services/rest/surfacedata/stations/?parameter_name=o3&parameter_name=no2

or: https://join.fz-juelich.de/services/rest/surfacedata/stations/?parameter_name=o3,no2

The output will look similar, but only stations which have an ozone or an NO2 data series will be returned. Note that it is not possible to query the database for stations which have both ozone and NO2 data. If you need this functionality, you must perform separate queries for parameter_name=o3 and parameter_name=no2 and combine the two results in your application.
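A minimal sketch of combining two such result lists in Python. It assumes that a station is uniquely identified by the pair (network_name, station_id), the first two fields of each stations record, since station_id is only a code within its network. The input lists reuse records from the first stations example purely for illustration; they are not actual o3/no2 query results.

```python
def station_key(record):
    """Identify a station by (network_name, station_id), the first two fields of a record."""
    return (record[0], record[1])


def stations_with_all(*results):
    """Return the stations of the first result that also occur in every other result."""
    common = {station_key(r) for r in results[0]}
    for result in results[1:]:
        common &= {station_key(r) for r in result}
    # Keep the order of the first result list
    return [r for r in results[0] if station_key(r) in common]


# Illustrative inputs only, reusing records from the first stations example
o3_stations = [
    ["ISRAQN", "ILA02RB", "ariel", 35.168563, 32.103376, 546.0],
    ["GAW", "CGO540S00", "Cape Grim", 144.689938889, -40.6831194444, 94.0],
]
no2_stations = [
    ["ISRAQN", "ILA02RB", "ariel", 35.168563, 32.103376, 546.0],
]
print(stations_with_all(o3_stations, no2_stations))
```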

Example: https://join.fz-juelich.de/services/rest/surfacedata/stations/?station_name=Cape

Result:

[["GAW", "CGO540S00", "Cape Grim", 144.689938889, -40.6831194444, 94.0], ["EANET", "GAWCOI", "Cape Ochiishi", 145.5, 43.15, 49.0], . . .]

Example: https://join.fz-juelich.de/services/rest/surfacedata/stations/?station_htap_region=EAS,SEA&station_toar_category=urban

Returns records of all stations in regions EAS or SEA (East Asia and South East Asia) which are classified as urban according to the TOAR station category.

Example: https://join.fz-juelich.de/services/rest/surfacedata/stations/?station_htap_region=EAS,SEA&station_toar_category=urban&as_dict=True

Returns the same result as the previous example, but as a list of json dictionaries:

[{"network_name": "NIES", "station_id": "jp01101010", "station_name": "Senta", "station_lon": 141.3539, "station_lat": 43.0619, "station_alt": 19.0}, {"network_name": "NIES", "station_id": "jp14201020", "station_name": "Oppamagyouseisenta", "station_lon": 139.6319, "station_lat": 35.3183, "station_alt": 3.0}, {"network_name": "NIES", "station_id": "jp01102010", "station_name": "Shinoro", "station_lon": 141.3714, "station_lat": 43.1472, "station_alt": 4.0}, . . . ]

Example: https://join.fz-juelich.de/services/rest/surfacedata/stations/?altitude=1000.&vtol=500.

The output looks similar to the one from the previous examples, but only stations with a station_alt between 500 and 1500 m will be returned (see description of the search options in section 3).
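The center/tolerance form is equivalent to an explicit 2-element range. This sketch makes the conversion explicit; tolerance_to_range is a hypothetical helper, and whether the service treats the range bounds as inclusive is not stated here.

```python
def tolerance_to_range(center, tol):
    """Return [center - tol, center + tol], the range implied by a vtol or htol argument."""
    if tol < 0:
        raise ValueError("tolerance must be non-negative")
    return [center - tol, center + tol]


print(tolerance_to_range(1000.0, 500.0))  # [500.0, 1500.0]
print(tolerance_to_range(-100, 90))       # [-190, -10]
```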


2.4 Series

https://join.fz-juelich.de/services/rest/surfacedata/series/[?QUERY-OPTIONS]

where QUERY-OPTIONS are:

parameter_name = <string> (examples: o3, no2, temp)

network_name = <string> (examples: GAW, UBA, CAPMON, EANET)

station_id = <string> (examples: DENW058, CGO540S00, CVO, 13-089-0002)

station_name = <string: regular expression> (examples: MOUNT PEARL, New Y)

station_country = <string> (examples: Germany, United Kingdom, France)

... and many other keywords as described in section 3.

format = <string> (json|html)

as_dict = <Boolean> (True|False; default is False). Only applicable if format is json. Returns results as json dictionaries instead of lists.

Multiple argument values can be given by repeating the respective query option. Alternatively, you can provide multiple values as a comma-separated list.

Response: Each query result consists of the fields series_id, network_name, station_id, parameter_label.

If no QUERY-OPTIONS are given, the complete set of data series will be returned.

Example: https://join.fz-juelich.de/services/rest/surfacedata/series/?station_name=Bay

Result:

[[22119, "CAPMON", "CAPMCANL1GOS", "O3"], [28469, "NAPS", "010601", "O3"]]

Example: https://join.fz-juelich.de/services/rest/surfacedata/series/?station_name=Bay&as_dict=True

Result:

[{"id": 22119, "network_name": "CAPMON", "station_id": "CAPMCANL1GOS", "parameter_label": "O3"}, {"id": 28469, "network_name": "NAPS", "station_id": "010601", "parameter_label": "O3"}]

Example: https://join.fz-juelich.de/services/rest/surfacedata/series/?parameter_name=temp

Result:

[[16650, "UBA", "DENW021", "TEMP"], [16698, "UBA", "DENW059", "TEMP"], [16639, "UBA", "DENW067", "TEMP"], . . .]

Example: https://join.fz-juelich.de/services/rest/surfacedata/series/?parameter_name=temp&as_dict=True

Result:

[{"id": 16650, "network_name": "UBA", "station_id": "DENW021", "parameter_label": "TEMP"}, {"id": 16698, "network_name": "UBA", "station_id": "DENW059", "parameter_label": "TEMP"}, . . . ]
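Since the stats service (section 2.6) processes only one data series per request, a typical workflow is to look up series ids with the series service and then issue one stats request per id. This sketch assumes the as_dict=True form shown above, in which the series number appears under the key id; stats_urls is a hypothetical helper, and no requests are actually sent.

```python
BASE_URL = "https://join.fz-juelich.de/services/rest/surfacedata/"


def stats_urls(series_records, sampling="monthly"):
    """Build one stats URL per data series (the stats service accepts a single id)."""
    return [
        f"{BASE_URL}stats/?id={record['id']}&sampling={sampling}"
        for record in series_records
    ]


# as_dict=True result of the station_name=Bay example above
series = [
    {"id": 22119, "network_name": "CAPMON", "station_id": "CAPMCANL1GOS", "parameter_label": "O3"},
    {"id": 28469, "network_name": "NAPS", "station_id": "010601", "parameter_label": "O3"},
]
for url in stats_urls(series, sampling="annual"):
    print(url)
```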


2.5 Search

https://join.fz-juelich.de/services/rest/surfacedata/search/[?QUERY-OPTIONS]

where QUERY-OPTIONS are:

network_name = <string> (examples: GAW, UBA, CAPMON, EANET)

station_id = <string> (examples: DENW058, CGO540S00, CVO, 13-089-0002)

station_name = <string: regular expression> (examples: MOUNT PEARL, New Y)

station_country = <string> (examples: Germany, United Kingdom, France)

parameter_name = <string> (examples: o3, no2, temp)

... and many other keywords as described in section 3.

columns = <string> (comma-separated list of database columns to be included in response; see sections 3.1 and 3.2)

format = <string> (json|html)

as_dict = <Boolean> (True|False; default is False). Only applicable if format is json. Returns results as json dictionaries instead of lists.

aggregate = <Boolean> (True|False; default is False). If more than one data series is found for a station, combine the results into one record per station.

Multiple argument values can be given by repeating the respective query option. Alternatively, you can provide multiple values as a comma-separated list.

Response: Each query result consists of the fields specified in the columns argument. If columns is not specified, each record consists of the same fields as in the series query: series_id, network_name, station_id, parameter_label.

If no QUERY-OPTIONS are given, all data series in the database will be returned.

Allowed column names are: numid (i.e. internal station number), network_name, station_id, station_type, station_type_of_area, station_category, station_name, station_country, station_state, station_lon, station_lat, station_alt, station_alt_flag, station_coordinate_status, station_reported_alt, station_google_alt, google_resolution, station_etopo_alt, station_etopo_min_alt_5km, station_etopo_relative_alt, station_timezone, station_population_density, station_max_population_density_5km, station_max_population_density_25km, station_nightlight_1km, station_nightlight_5km, station_max_nightlight_25km, station_nox_emissions, station_omi_no2_column, station_rice_production, station_wheat_production, station_climatic_zone, station_htap_region, station_dominant_landcover, station_landcover_description, station_toar_category, id (i.e. the data series internal number), parameter_name, parameter_label, parameter_attribute, parameter_sampling_type, parameter_measurement_method, parameter_original_units, parameter_calibration, parameter_contributor_shortname, parameter_contributor, parameter_contributor_country, parameter_dataset_type, parameter_status, creation_date, modification_date, comments, data_start_date, data_end_date, parameter_pi, parameter_pi_email.

Note that in some cases the column name differs from the argument name of a search (or stations or series) query. For example, to search for stations in a given longitude range, you must write longitude=[4.2,5.7], whereas in order to retrieve the longitude values in the response, you must add station_lon to the columns list. See sections 3.1 and 3.2 for details on the database columns and query options.

Use of the columns argument:

Example: https://join.fz-juelich.de/services/rest/surfacedata/search/?station_name=Bay&parameter_name=o3,no2,temp&columns=id,station_id,station_name,station_country,station_state,parameter_pi

Result:

[[26355, "06-075-0006", "San Francisco - Bayview Hunters Point", "United States of America", "California", "unknown"], [28501, "60809", "421 JAMES STREET SOUTH _THunder Bay_", "Canada", "Ontario", "unknown"], [47859, "DECBAY01", "Deception Bay", "Australia", "Queensland", "David Wainwright"], [38271, "DEBY111", "Bayreuth/Hohenzollernring", "Germany", "Bayern", "unknown"], [48283, "HBA775S00", "Halley Bay", "United Kingdom", "unknown", "Neil Brough"], [28499, "60807", "615 JAMES STREET SOUTH _THunder Bay_", "Canada", "Ontario", "unknown"], [38272, "DEBY010", "Bayreuth/Rathaus", "Germany", "Bayern", "unknown"], [86943, "DEBY111", "Bayreuth/Hohenzollernring", "Germany", "Bayern", "Christian Ohlwein, Jan Keller"], [27989, "48-201-0055", "Houston Bayland Park", "United States of America", "Texas", "unknown"], [86848, "DEBY010", "Bayreuth/Rathaus", "Germany", "Bayern", "Christian Ohlwein, Jan Keller"], . . . ]

Use of aggregate:

Example: https://join.fz-juelich.de/services/rest/surfacedata/search/?station_name=Bay&parameter_name=o3,no2,temp&columns=id,station_id,station_name,station_country,station_state&aggregate=True

Result:

[[[27989], "48-201-0055", "Houston Bayland Park", "United States of America", "Texas"], [[28500], "60808", "412 JAMES STREET SOUTH _THunder Bay_", "Canada", "Ontario"], [[19109, 86943], "DEBY111", "Bayreuth/Hohenzollernring", "Germany", "Bayern"], [[47859], "DECBAY01", "Deception Bay", "Australia", "Queensland"], [[28663], "64701", "1385 RIVER ROAD _Georgian Bay SouTH_", "Canada", "Ontario"], [[28539], "62001", "CHIPPEWA ST. - DND _NorTH Bay_", "Canada", "Ontario"], [[26983], "22-047-0009", "Bayou Plaquemine", "United States of America", "Louisiana"], [[27998], "48-201-1017", "Baytown Eastpoint", "United States of America", "Texas"], [[22119], "CAPMCANL1GOS", "Goose Bay", "Canada", "Newfoundland and Labrador"], [[48283], "HBA775S00", "Halley Bay", "United Kingdom", "unknown"], [[38271, 86942], "DEBY111", "Bayreuth/Hohenzollernring", "Germany", "Bayern"], [[26355], "06-075-0006", "San Francisco - Bayview Hunters Point", "United States of America", "California"], [[27388], "34-017-0006", "Bayonne", "United States of America", "New Jersey"], [[28499], "60807", "615 JAMES STREET SOUTH _THunder Bay_", "Canada", "Ontario"], [[38272, 86847], "DEBY010", "Bayreuth/Rathaus", "Germany", "Bayern"], [[28501], "60809", "421 JAMES STREET SOUTH _THunder Bay_", "Canada", "Ontario"], [[18924, 18927, 86848], "DEBY010", "Bayreuth/Rathaus", "Germany", "Bayern"]]
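With aggregate=True, the column that holds the series id becomes a list of ids per station. The following sketch reverses this to obtain one row per data series; it assumes, as in the example above, that id is the first requested column. expand_aggregated is a hypothetical helper, and the input reuses two records from the result above.

```python
def expand_aggregated(records):
    """Expand aggregate=True records ([ids], field, ...) into one row per series id."""
    rows = []
    for ids, *fields in records:
        for series_id in ids:
            rows.append([series_id, *fields])
    return rows


aggregated = [
    [[19109, 86943], "DEBY111", "Bayreuth/Hohenzollernring", "Germany", "Bayern"],
    [[47859], "DECBAY01", "Deception Bay", "Australia", "Queensland"],
]
for row in expand_aggregated(aggregated):
    print(row)
```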

Use of longitude and latitude bounds:

Example: https://join.fz-juelich.de/services/rest/surfacedata/search/?parameter_name=o3&longitude=[1.2,1.7]&latitude=[43.2,43.9]&columns=network_name,station_id,station_name,station_country,station_state,id,parameter_label

or: https://join.fz-juelich.de/services/rest/surfacedata/search/?parameter_name=o3&longitude=1.45&latitude=43.55&htol=0.35&columns=network_name,station_id,station_name,station_country,station_state,id,parameter_label

or: https://join.fz-juelich.de/services/rest/surfacedata/search/?parameter_name=o3&boundingbox=[1.2,43.3,1.7,43.9]&columns=network_name,station_id,station_name,station_country,station_state,id,parameter_label
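The boundingbox argument packs both ranges into a single 4-element list in the order [lon0,lat0,lon1,lat1] (see section 3.1). A small hypothetical helper can build it from separate longitude and latitude ranges:

```python
def boundingbox_arg(lon_range, lat_range):
    """Combine [lon0, lon1] and [lat0, lat1] into the string [lon0,lat0,lon1,lat1]."""
    (lon0, lon1), (lat0, lat1) = lon_range, lat_range
    return f"[{lon0},{lat0},{lon1},{lat1}]"


print(boundingbox_arg([1.2, 1.7], [43.3, 43.9]))  # [1.2,43.3,1.7,43.9]
```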

Result:

[["AIRBASE", "FR12024", "BALMA", "France", "Midi-Pyr\u00e9n\u00e9es", 23785, "O3"], ["AIRBASE", "FR12030", "BERTHELOT", "France", "Midi-Pyr\u00e9n\u00e9es", 23796, "O3"], ["AIRBASE", "FR1054A", "BERTHELOT12", "France", "Midi-Pyr\u00e9n\u00e9es", 23797, "O3"], ["AIRBASE", "FR12001", "COLOMIERS", "France", "Midi-Pyr\u00e9n\u00e9es", 23863, "O3"], ["AIRBASE", "FR12037", "DOAS TOULOUSE", "France", "Midi-Pyr\u00e9n\u00e9es", 23912, "O3"], ["AIRBASE", "FR12004", "ECOLE M.JACQUIER", "France", "Midi-Pyr\u00e9n\u00e9es", 23930, "O3"], ["AIRBASE", "FR12021", "MAZADES", "France", "Midi-Pyr\u00e9n\u00e9es", 24091, "O3"], ["AIRBASE", "FR12041", "SICOVAL", "France", "Midi-Pyr\u00e9n\u00e9es", 24236, "O3"], ["AIRBASE", "FR12023", "calas", "France", "Midi-Pyr\u00e9n\u00e9es", 24393, "O3"]]

Use of altitude range:

Example: https://join.fz-juelich.de/services/rest/surfacedata/search/?parameter_name=o3&altitude=[-190,-10]&columns=network_name,station_id,station_name,station_country,station_alt

or: https://join.fz-juelich.de/services/rest/surfacedata/search/?parameter_name=o3&altitude=-100&vtol=90&columns=network_name,station_id,station_name,station_country,station_alt

Result:

[["AQS", "06-025-0002", "HOVLEY, BRAWLEY", "United States of America", -42.3], ["AQS", "06-025-1002", "1414 STATE ST., EL CENTRO", "United States of America", -12.7], ["AQS", "06-025-2001", "GENTRY & SINCLAIR, 6 MI NW OF CALIPATRIA", "United States of America", -69.8], ["AQS", "06-025-4001", "STATE ROUTE 86, WESTMORELAND", "United States of America", -61.6]]

Limit query to data series within a certain date range and with minimum length:

Example: https://join.fz-juelich.de/services/rest/surfacedata/search/?parameter_name=o3&data_before=1990-01-01%2000:00&data_after=1988-12-31%2023:00&min_data_length=10&columns=network_name,station_id,data_start_date,data_end_date,id

Result:

[["AQS", "37-051-1002", "1989-04-01T05:00:00", "1996-11-01T04:00:00", 27509], ["NAPS", "60607", "1988-01-01T05:00:00", "2004-07-09T15:00:00", 28438], ["UBA", "DEBW012", "1990-01-01T00:00:00", "2000-10-01T23:00:00", 20019], ["AQS", "45-031-0002", "1980-04-23T22:00:00", "1990-11-01T15:00:00", 27838], ["AIRBASE", "AT31904", "1988-01-01T00:00:00", "2012-12-31T22:00:00", 22254], ["EMEP", "AT0034G", "1990-01-01T00:00:00", "2013-12-31T23:00:00", 25683], . . . ]


2.6 Stats

Calculates and returns a set of statistics/metrics for a given data series (id). Please note that hourly values cannot be retrieved from the TOAR database.

https://join.fz-juelich.de/services/rest/surfacedata/stats/?id=SERIESID[&QUERY-OPTIONS]

where QUERY-OPTIONS are:

sampling = <string> (statistical sampling interval: "daily", "monthly", "seasonal", "vegseason", "summer", "annual"; default: "monthly")

statistics = <string> (list of strings with the statistics/metrics names to be evaluated; see section 3.3 for details. Default: ["average_values", "standard_deviation", "value_count"]).

daterange = <2-element list of date strings with format YYYY-MM-DD hh:mm> (restrict processing of data to given daterange)

data_capture = <float> (data_capture threshold; default: 0.75)

format = <string> (json|html)

The series id (id=) must be given. Only one data series can be processed per request. Embargoed data series will not be processed.

Response: a dictionary in which the key datetime holds the list of time stamps, each statistic's variable name is a key holding the list of corresponding values, and the key metadata holds the complete metadata of the data series.

Example: https://join.fz-juelich.de/services/rest/surfacedata/stats/?id=26688

Result:

{"datetime":["1998-04-01T00:00:00", "1998-05-01T00:00:00", . . . ,"2008-11-01T00:00:00"], "mean":[31.916784203102964,39.48547717842324, . . . ,NaN], "stddev":[19.290160626305752,30.47583388269126, . . . ,NaN], "count":[709.0,723.0, . . . ,5.0], "metadata":{"numid":11573,"network_name":"AQS", "station_id":"13-113-0001", . . . }}

Explanation:

This query returns monthly values of the mean, standard deviation, and value count of ozone measurements at the AQS station 13-113-0001 (Dot storage facility) in Georgia, USA. Use the series service to look up the series_id of a data series.
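A sketch of reading such a response in Python. Note that the service writes missing values as NaN, which is not valid in strict JSON; Python's json module accepts it by default, but other parsers (for example JavaScript's JSON.parse) may reject it. The input string below abbreviates the example above to its first and last time steps.

```python
import json
import math

# First and last time steps of the example response above
body = (
    '{"datetime": ["1998-04-01T00:00:00", "2008-11-01T00:00:00"], '
    '"mean": [31.916784203102964, NaN], "stddev": [19.290160626305752, NaN], '
    '"count": [709.0, 5.0], '
    '"metadata": {"numid": 11573, "network_name": "AQS", "station_id": "13-113-0001"}}'
)

stats = json.loads(body)  # json.loads accepts the non-standard NaN token by default
metadata = stats.pop("metadata")
times = stats.pop("datetime")

# One record per time step, combining all remaining statistics
table = [
    {"datetime": t, **{name: values[i] for name, values in stats.items()}}
    for i, t in enumerate(times)
]
valid = [row for row in table if not math.isnan(row["mean"])]
print(metadata["station_id"], len(valid), "valid of", len(table))  # 13-113-0001 1 valid of 2
```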

Example:

https://join.fz-juelich.de/services/rest/surfacedata/stats/?id=21919&sampling=seasonal&statistics=dma8epax,somo35&daterange=[2010-01-01%2000:00,2016-12-31%2023:00]

Result:

{"datetime":["1975-01-01T00:00:00", "1976-01-01T00:00:00", "1977-01-01T00:00:00", "1978-01-01T00:00:00", "1979-01-01T00:00:00", "1980-01-01T00:00:00", "1981-01-01T00:00:00", "1982-01-01T00:00:00", "1983-01-01T00:00:00", "1984-01-01T00:00:00", . . . ,"2016-01-01T00:00:00"], "dma8epax-DJF":[NaN,38.12499999999992,34.637499999999875, . . . ], "dma8epax-MAM":[38.52499999999999,31.449999999999992, . . . ], "dma8epax-JJA":[37.89999999999999,38.937500000000334, . . . ], "dma8epax-SON":[39.93749999999998,31.637500000000006, . . . ], "somo35-DJF":[NaN,50.91249999999816,8.463662790699093, . . . ], . . . "metadata":{"numid":4610,"network_name":"GAW", "station_id": "SPO789S00", . . . }}

Explanation:

This query requests seasonal statistics of the ozone metrics DMA8EPAX (4th highest daily maximum 8-hour average) and SOMO35 (a European health-related metric) at the GAW station South Pole. Note that seasonal statistics are reported as one variable per season (with suffixes -DJF, -MAM, -JJA, and -SON) and that the datetime values for seasonal statistics always point to the beginning of the year.
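Seasonal (and, as in the next example, vegetation-related) results use variable names of the form metric-suffix. The sketch below regroups them by metric, splitting at the first hyphen only; this splitting rule is an assumption based on the names shown in this document, and group_by_suffix is a hypothetical helper. The sample values are rounded from the example above.

```python
from collections import defaultdict


def group_by_suffix(stats):
    """Regroup keys such as 'dma8epax-DJF' into {metric: {suffix: values}}."""
    grouped = defaultdict(dict)
    for key, values in stats.items():
        if key in ("datetime", "metadata"):
            continue
        # Split at the first hyphen only; keys without a hyphen get the suffix ""
        metric, _, suffix = key.partition("-")
        grouped[metric][suffix] = values
    return dict(grouped)


nan = float("nan")
sample = {
    "datetime": ["1975-01-01T00:00:00", "1976-01-01T00:00:00"],
    "dma8epax-DJF": [nan, 38.12],
    "dma8epax-JJA": [37.90, 38.94],
    "somo35-DJF": [nan, 50.91],
    "metadata": {"network_name": "GAW", "station_id": "SPO789S00"},
}
print(group_by_suffix(sample)["dma8epax"]["JJA"])  # [37.9, 38.94]
```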

Example: https://join.fz-juelich.de/services/rest/surfacedata/stats/?id=21843&sampling=vegseason&statistics=aot40,w126,average_values&format=json

Result:

{"datetime":["2004-01-01T00:00:00", "2005-01-01T00:00:00", . . . , "2014-01-01T00:00:00"], "aot40-wheat-warm_temperate_moist-SH":[36.18,1.9600000000000009, . . . ], "aot40-rice-warm_temperate_moist-SH":[NaN,5.960371057513915, . . . ], "w126-wheat-warm_temperate_moist-SH":[383.4789063213843, . . . ], "w126-rice-warm_temperate_moist-SH":[NaN,88.82615555280175, . . . ], "mean-wheat-warm_temperate_moist-SH":[30.077629151291493, . . . ], "mean-rice-warm_temperate_moist-SH":[16.90771708683472, . . . ], "metadata":{"numid":4615,"network_name":"GAW", "station_id": "CGO540S00", . . . }}

Explanation:

In this vegetation-related query, the metrics mean, AOT40, and W126 are requested as aggregates over the respective growing seasons of rice and wheat (for details, see Schultz et al. (2017a), in particular supplement 1). Note that the variable names contain a suffix that describes the climatic zone of the station, which is used to determine the rice and wheat vegetation periods at that location.

Example: https://join.fz-juelich.de/services/rest/surfacedata/stats/?sampling=annual&statistics=drmdmax1h,dma8eu,nvgt070&data_capture=0.9&id=40555

Result:

{"datetime": ["2003-01-01 00:00", "2004-01-01 00:00", "2005-01-01 00:00", "2006-01-01 00:00", "2007-01-01 00:00", "2008-01-01 00:00", "2009-01-01 00:00", "2010-01-01 00:00", "2011-01-01 00:00", "2012-01-01 00:00"], "drmdmax1h": [-999.0, 61.76856777777774, 71.27497555555554, 78.03873358024691, 88.69831481481471, 71.34341111111111, 68.40208522727261, 60.49619999999985, 74.232043181818, 65.4746465116277], "day_of_drmdmax1h": [1, 357, 47, 12, 280, 101, 295, 1, 55, 54], "dma8eu": [NaN, 54.12737500000007, NaN, NaN, NaN, 65.24050000000003, 64.58678750000038, 57.46133750000065, 62.03732500000059, 56.611512500000636], "nvgt070": [NaN, 4.0, NaN, NaN, NaN, 12.0, 9.0, 1.0, 4.0, 2.0], "metadata": {"numid": 8423, "network_name": "AIRBASE", "station_id": "FR03068", "station_local_id": "FR03068", "station_type": "traffic", "station_type_of_area": "urban", "station_category": "unknown", "station_name": "TOULON FOCH", "station_country": "France", "station_state": . . . ,"parameter_name": "no2", . . . }}

Explanation:

This query generates annual statistics of the maximum three-month running average of daily maximum 1-hour values (drmdmax1h), the 26th highest daily maximum 8-hour average according to the EU time window (dma8eu), and the number of days on which the daily maximum 8-hour average exceeds 70 ppb (nvgt070). Note that in addition to drmdmax1h, the day on which this maximum occurs is also reported (day_of_drmdmax1h). Furthermore, the data_capture threshold for this query was raised from its default value of 0.75 to 0.9, implying that 90% of the data must be valid to obtain a valid result. Note also that the selected data series is actually an NO2 measurement series; this demonstrates that the same statistics can technically be applied to other variables.
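The example above contains -999.0 as well as NaN in its result arrays. The sketch below treats both as missing values, which is an assumption based on this example rather than a documented convention, and illustrates the idea behind the data_capture threshold as a simple valid-fraction test. The service's exact rules for counting expected values per sampling period are not described here, so meets_data_capture is only a hypothetical approximation.

```python
import math

FILL_VALUE = -999.0  # assumption: marks a missing result, as in the example above


def clean(values):
    """Replace NaN and the assumed -999.0 fill value with None."""
    return [None if (math.isnan(v) or v == FILL_VALUE) else v for v in values]


def meets_data_capture(valid_count, expected_count, threshold=0.75):
    """Hypothetical check: is the fraction of valid values at least the threshold?"""
    return expected_count > 0 and valid_count / expected_count >= threshold


print(clean([-999.0, 61.77, float("nan")]))         # [None, 61.77, None]
print(meets_data_capture(650, 720, threshold=0.9))  # True (650/720 is about 0.903)
```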


3. TOAR database columns and query arguments

3.1 Database table stations

Each field below is listed as: database field name (to be used in the columns argument of the search service); description; use in query arguments.

numid

internal serial number of the station

station_numid=<value or list>

network_name

name of the measurement network

network_name = <string or list>

station_id

station code within the network

station_id = <string or list>

station_type

characterisation of site. Normally one of "background", "industrial", "traffic"

station_type = <string or list>

station_type_of_area

characterisation of station environment. Normally one of "urban", "suburban", "rural", "remote"

station_type_of_area = <string or list>

Note that this classification depends on the provider of the dataset and is not uniform. Use station_toar_category for a universal characterization.

station_category

other classification of stations (e.g. GAW category (global, regional, contributing))

station_category = <string or list>

station_name

full name of the station. Unicode characters are allowed.

station_name = <string>

This argument is interpreted as regular expression.

station_country

country which operates the station

station_country = <string or list>

station_state

province/state/territory to which station belongs (may be blank)

station_state = <string or list>

station_lon

longitude coordinate of station (decimal degrees_east). This is our best estimate of the station location which is not always identical to the reported station coordinates.

longitude = <float or 2-element list of floats>

Longitude ranges can be specified either by using a 2-element float list, or by using one center value and the htol = <float> argument. The htol argument applies to both longitude and latitude.

Another option is to use the boundingbox = [lon0,lat0,lon1,lat1] argument.

station_lat

latitude coordinate of station (decimal degrees_north). This is our best estimate of the station location which is not always identical to the reported station coordinates.

latitude = <float or 2-element list of floats>

Latitude ranges can be specified either by using a 2-element float list, or by using one center value and the htol = <float> argument. The htol argument applies to both longitude and latitude.

Another option is to use the boundingbox = [lon0,lat0,lon1,lat1] argument.

station_alt

altitude of station (in m above sea level). This is our best estimate of the station altitude, which is not always identical to the reported station altitude but frequently uses the elevation from Google Earth instead (see station_alt_flag).

altitude = <float or 2-element list of floats>

Altitude ranges can be specified either by using a 2-element float list, or by using one center value and the vtol = <float> argument.

station_alt_flag

Flag value to document where station_alt was taken from.

0 = Reported station altitude

1 = Google maps elevation

2 = ETOPO1 elevation

3 = Station report or similar

4 = Personal communication

5 = Other source

station_alt_flag = <int or list of ints>

station_coordinate_status

an integer flag indicating our knowledge about the real station location. Note that this flag was introduced rather late in the TOAR QA process and may therefore not always reflect the actual status of verification.

Flag values are:

-1: not checked (default value)

0: verified by Google Earth or other means. This means that a building or container which looks like a measurement site could be visibly identified, or that a Google Earth feature is consistent with a detailed station description and is found at the location given in the station description. While in most cases the coordinates associated with a flag value of 0 will be exact to within 10 metres or so, there are some stations where the accuracy is lower, for example if the air quality monitoring site is part of a larger campus and we could not exactly identify the building or container housing the air quality measurements.

1: verification not possible, but no reason to doubt that the measurement location is accurate to within 100 metres or so. This means that no obvious station feature could be seen on Google Earth, but the area corresponds to the station description and could be a place where measurements are made.

2: unspecified potential issue with the station coordinates. This means that after checking the station location on Google Earth, comparing the reported station altitude to the Google elevation, and looking at the station_type, station_type_of_area, and station_category information, something appears wrong, but for lack of better knowledge we retain the station coordinates as given. This flag value is used particularly when the coordinates of the same station are reported differently in various archives and we could not locate the exact station location on Google Earth.

3: obvious error in station coordinate information. For example, a continental site is located in an ocean or lake, the measurement site is in the middle of a dense forest, etc. The station coordinates could not be corrected for lack of better information.

4: severe mismatch between reported station altitude and Google elevation at the station location (> 100 m), indicating wrong station coordinates. This flag value is only set after a potential correction of the station_alt value (see station_alt_flag), i.e. if we could not resolve a gross altitude difference. Note that for measurement sites on tall towers or in mountainous terrain, altitude differences > 100 m may be correct, and the coordinate status will not be flagged as 4 in that case.

5: no coordinates available -- given coordinates are completely invented!

6: no station metadata available -- given metadata is completely invented!

station_coordinate_status = <int or list of ints>

station_reported_alt

This is the station altitude as reported by the data provider. Note: because obvious station coordinate errors were edited before the coordinate flagging scheme was introduced, the reported altitude in our database may in some cases differ from the reported altitude in the original data sets.

station_reported_alt = <float or 2-element list of floats>

station_google_alt

Terrain elevation derived from the Google Maps API (see https://maps.googleapis.com/maps/api/elevation/json?locations=47.05444,12.958342; example coordinates of Sonnblick, Austria).

station_google_alt = <float or 2-element list of floats>
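The list-valued and range arguments above (for example station_coordinate_status or station_google_alt) have to be encoded into the query URL. The snippet below is a minimal sketch that joins list elements with commas, following the columns= and statistics= arguments used in the Python example in section 4. The comma convention for these arguments and the reading of a 2-element list as lower and upper bound are assumptions, and build_query_url is a hypothetical helper, not part of the service.

```python
from urllib.parse import urlencode

BASEURL = "https://join.fz-juelich.de/services/rest/surfacedata/"


def build_query_url(service, **args):
    """Build a query URL for a TOAR REST service (hypothetical helper).

    Assumption: list-valued arguments are sent as comma-separated values.
    """
    parts = {}
    for name, value in args.items():
        if isinstance(value, (list, tuple)):
            parts[name] = ",".join(str(v) for v in value)
        else:
            parts[name] = str(value)
    # safe="," keeps the commas unescaped; blanks are encoded as "+"
    return BASEURL + service + "/?" + urlencode(parts, safe=",")


url = build_query_url(
    "stations",
    station_coordinate_status=[0, 1],      # verified or plausible coordinates
    station_google_alt=[1500.0, 4000.0],   # assumed to be a [min, max] range in m
    parameter_name="o3",
    format="json",
)
print(url)
```

Pasting the printed URL into a web browser is a quick way to check whether the service interprets the lists as intended.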

google_resolution

The horizontal resolution of the Google elevation data at the station location. This gives some indication of the accuracy of the station_google_alt information.

N.A.

station_etopo_alt

Terrain elevation at the station location from the ~1 km resolution ETOPO1 dataset.

station_etopo_alt = <float or 2-element list of floats>

station_etopo_min_alt_5km

Minimum elevation from the ETOPO1 dataset in an area of 5 km radius around the station location. This can be used to find out whether a high altitude station is located in mountainous terrain or on a plateau (see station_etopo_relative_alt).

N.A.

station_etopo_relative_alt

Station elevation above the surrounding area, derived by subtracting the minimum altitude within a 5 km radius around the station location (station_etopo_min_alt_5km) from the actual station altitude. The area altitude is obtained from the ETOPO1 dataset.

station_etopo_relative_alt = <float or 2-element list of floats>

station_timezone

Time zone of the station. Note that all data are stored in UTC; the time zone information is needed to convert data back to local time for display.

station_timezone = <string or list of strings>

Example string: Europe/Madrid
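A station_timezone string such as Europe/Madrid is an IANA time zone name, so it can be used directly with Python's standard zoneinfo module. The sketch below converts a UTC timestamp to station local time; it assumes that timestamps from the service are naive UTC values, and utc_to_local is a hypothetical helper name. zoneinfo requires Python 3.9+ and a time zone database (system tz data or the tzdata package).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def utc_to_local(utc_time, station_timezone):
    """Convert a naive UTC datetime to the local time of the station."""
    aware = utc_time.replace(tzinfo=timezone.utc)  # mark the value as UTC
    return aware.astimezone(ZoneInfo(station_timezone))


winter = utc_to_local(datetime(2017, 1, 26, 12, 0), "Europe/Madrid")
summer = utc_to_local(datetime(2017, 7, 1, 12, 0), "Europe/Madrid")
print(winter.isoformat())  # 2017-01-26T13:00:00+01:00
print(summer.isoformat())  # 2017-07-01T14:00:00+02:00
```

The two calls show why a fixed UTC offset is not sufficient: the offset changes with daylight saving time.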

station_population_density

Year 2010 human population per square km from CIESIN GPW v3 (original horizontal resolution: 2.5 arc minutes)

station_population_density = <float or 2-element list of floats>

station_max_population_density_5km

Maximum population density in a radius of 5 km around the station location.

 

station_max_population_density_25km

Maximum population density in a radius of 25 km around the station location.

 

station_nightlight_1km

Year 2013 Nighttime lights brightness values from NOAA DMSP (original horizontal resolution: 0.925 km)

 

station_nightlight_5km

Year 2013 Nighttime lights brightness values (original horizontal resolution: 5 km)

 

station_max_nightlight_25km

Maximum nighttime light intensity in a radius of 25 km around the station location.

 

station_nox_emissions

Year 2010 NOx emissions from EDGAR HTAP inventory V2 in units of g m-2 yr-1 (original resolution: 0.1 degrees)

 

station_omi_no2_column

Average 2011-2015 tropospheric NO2 columns from OMI at 0.1 degree resolution (Env. Canada) in units of 10^15 molecules cm-2.

 

station_rice_production

Year 2000 rice production amount from FAO GAEZ at station location (units: thousand tons; original resolution: 5 arc minutes)

 

station_wheat_production

Year 2000 wheat production amount from FAO GAEZ at station location (units: thousand tons; original resolution: 5 arc minutes)

 

station_climatic_zone

Climatic zone according to IPCC, 2006:

0: unclassified

1: Warm Temperate Moist

2: Warm Temperate Dry

3: Cool Temperate Moist

4: Cool Temperate Dry

5: Polar Moist

6: Polar Dry

7: Boreal Moist

8: Boreal Dry

9: Tropical Montane

10: Tropical Wet

11: Tropical Moist

12: Tropical Dry

(original resolution: 5 arc minutes)

station_climatic_zone = <string or list of strings>

station_htap_region

An integer denoting the "tier1" region defined in the task force on hemispheric transport of air pollution (TFHTAP) coordinated model studies (see http://www.htap.org). Region codes are:

02: OCN Non-arctic/Antarctic Ocean

03: NAM US+Canada (up to 66 N; polar circle)

04: EUR Western + Eastern EU+Turkey (up to 66 N; polar circle)

05: SAS South Asia: India, Nepal, Pakistan, Afghanistan, Bangladesh, Sri Lanka

06: EAS East Asia: China, Korea, Japan

07: SEA South East Asia

08: PAN Pacific, Australia+ New Zealand

09: NAF Northern Africa+Sahara+Sahel

10: SAF Sub Saharan/sub Sahel Africa

11: MDE Middle East: S. Arabia, Oman, etc, Iran, Iraq

12: MCA Mexico, Central America, Caribbean, Guyanas, Venezuela, Colombia

13: SAM S. America

14: RBU Russia, Belarus, Ukraine

15: CAS Central Asia

16: NPO Arctic Circle (North of 66 N)+Greenland

17: SPO Antarctic

(original resolution: 0.1 degrees)

station_htap_region =

station_dominant_landcover

The dominant IGBP landcover classification at the station location extracted from the MODIS MCD12C1 dataset (original resolution: 0.05 degrees). Landcover type values are:

0: Water

1: Evergreen Needleleaf forest

2: Evergreen Broadleaf forest

3: Deciduous Needleleaf forest

4: Deciduous Broadleaf forest

5: Mixed forest

6: Closed shrublands

7: Open shrublands

8: Woody savannas

9: Savannas

10: Grasslands

11: Permanent wetlands

12: Croplands

13: Urban and built-up

14: Cropland/Natural vegetation mosaic

15: Snow and ice

16: Barren or sparsely vegetated

255: Fill Value/Unclassified

station_dominant_landcover =

station_landcover_description

Text information about the landcover types and their area fractions in a radius of 25 km around the station location.

N.A.

station_toar_category

A station classification for the Tropospheric Ozone Assessment Report based on the station proxy data that are stored in the database.

0: unclassified

1: rural, low elevation: derived as (station_omi_no2_column <= 8 and station_nightlight_5km <= 25 and station_population_density <= 3000 and station_max_population_density_5km <= 30000 and station_google_alt <= 1500 and station_etopo_relative_alt < 500). Note that this scheme may not catch all sites that are designated as rural. It will, however, provide a selection with reasonable certainty that no urban sites are included.

2: rural, high elevation: derived as (station_omi_no2_column <= 8 and station_nightlight_5km <= 25 and station_population_density <= 3000 and (station_google_alt > 1500 or (station_google_alt > 800 and station_etopo_relative_alt < 500))).

3: urban: derived as (station_population_density >= 15000 and station_nightlight_1km >= 60 and station_max_nightlight_25km == 63). Again, the intention here is to make reasonably sure that a site classified as urban really carries an urban signature.

station_toar_category = <>
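The classification rules above can be transcribed into a small function, for example to reproduce station_toar_category from the proxy fields returned by the service. This is a sketch, not the database implementation: the input is assumed to be a dict keyed by database field names with no missing values, and because the rules as written can overlap, the categories are tested in the listed order (an assumption), so the first matching category wins.

```python
def toar_category(p):
    """Classify a station from its proxy data (dict keyed by field name).

    Rules transcribed from the station_toar_category description; categories
    are tested in the listed order, so the first matching rule wins.
    """
    rural_base = (p["station_omi_no2_column"] <= 8
                  and p["station_nightlight_5km"] <= 25
                  and p["station_population_density"] <= 3000)
    if (rural_base
            and p["station_max_population_density_5km"] <= 30000
            and p["station_google_alt"] <= 1500
            and p["station_etopo_relative_alt"] < 500):
        return 1  # rural, low elevation
    if rural_base and (p["station_google_alt"] > 1500
                       or (p["station_google_alt"] > 800
                           and p["station_etopo_relative_alt"] < 500)):
        return 2  # rural, high elevation
    if (p["station_population_density"] >= 15000
            and p["station_nightlight_1km"] >= 60
            and p["station_max_nightlight_25km"] == 63):
        return 3  # urban
    return 0  # unclassified
```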

3.2 Database table parameter_series

Database field name

(to be used in the columns argument of the search service)

Description

Use in query arguments

id

An internal serial number

series_id = <int or list of ints>

Note that the stats service only accepts a single series_id, and the argument name for stats is id.

parameter_name

Name of the species or variable (all lower case)

parameter_name = <string or list of strings>

parameter_label

Automatically generated label of a parameter_series. The label consists of the capitalized parameter_name and additional information if this is needed to uniquely identify a data series by name. The additional information can consist of the parameter_contributor_shortname and/or the parameter_attribute.

N.A.

parameter_attribute

A series-specific attribute that may distinguish two series of the same parameter measured at the same station (e.g. all, filtered, etc.)

parameter_attribute = <string>

parameter_sampling_type

The method by which observations were sampled. Standard values are: continuous, flask, filter

parameter_sampling_type = <string>

parameter_measurement_method

Instrument principle of measurement. Example (for ozone): UV absorption

parameter_measurement_method = <string>

parameter_original_units

Physical units in which parameter values were expressed in the original data files

N.A.

parameter_calibration

Information on the calibration of the parameter, such as calibration procedure and/or calibration scale

N.A.

parameter_contributor_shortname

Abbreviated form of parameter_contributor

parameter_contributor_shortname = <string or list of strings>

parameter_contributor

Name of the institute or person who provided the data to the network data centre. If there is more than one contributor, the names are separated by semicolons (;).

N.A.

parameter_contributor_country

Country of contributor

parameter_contributor_country = <string or list of strings>

parameter_pi

Name of the principal investigator of the data series

N.A.

parameter_pi_email

Email of parameter_pi

N.A.

parameter_dataset_type

The type of the high-frequency data ("hourly", "event", etc.); this determines the data table name (e.g. "o3_hourly", "co_event").

parameter_dataset_type = <string>

parameter_status

An internal status flag which may be used to suppress display or analysis of an individual time series. Flag values are:

0: everything OK - use this dataset in any analyses

1: data was embargoed by originator; do not display publicly

2: NRT data ingestion; no metadata available, metadata was invented

N.A.

comments

Any comments on a data series, for example from the data QA in TOAR

N.A.

creation_date

Date when this entry was added to the parameter_series table

creation_date =

modification_date

Date when this entry was last modified

modification_date

data_start_date

Start date of the data series

data_start_date

You can also search for series with a minimum length via the min_data_length argument; it accepts a float value in years.

data_end_date

End date of the data series. Note: data start and end dates do not account for gaps (missing data).

data_end_date

3.3 List of statistics/metrics for stats service

More information on these statistics and metrics can be found in supplement 1 of Schultz et al. (2017a).

Name

Description

data_capture

Fraction of valid (hourly) values available in the aggregation period.

average_values

Daily, monthly, … average value. No data capture criterion is applied, i.e. a daily average is valid if at least one hourly value of the day is present.

daytime_avg

The daytime average is defined as the average of hourly values for the 12-h period from 08:00 h to 19:59 h solar time. All hourly values in the aggregation period are averaged, and the resulting value is valid if at least 75% of the hourly values are present.
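The daytime average with its 75% data capture rule can be sketched as follows. The input layout is an assumption made for illustration: the aggregation period is given as a list of days, each a list of 24 hourly values indexed by solar hour, with None marking missing data. daytime_avg is a hypothetical helper name.

```python
def daytime_avg(days, start=8, end=20, min_capture=0.75):
    """Average of all hourly values from start:00 to (end-1):59 solar time.

    days: list of days, each with 24 hourly values indexed by solar hour
    (None = missing). Returns None if less than min_capture of the hours
    in the aggregation period are valid.
    """
    hours = [d[h] for d in days for h in range(start, end)]
    valid = [v for v in hours if v is not None]
    if not hours or len(valid) < min_capture * len(hours):
        return None
    return sum(valid) / len(valid)
```

For a single day, 9 of the 12 daytime hours must be present; with 8 or fewer the function returns None.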

nighttime_avg

Same as daytime_avg, but averaged over the daily interval from 20:00 h to 07:59 h solar time.

median

Median mixing ratio over the aggregation period. At least 10 valid values must be present to accept a median value as valid.

perc05

5th percentile of hourly values in the aggregation period. At least 10 valid values must be present to accept a percentile value as valid.

perc10

As perc05, but for the 10th-percentile.

perc25

As perc05, but for the 25th-percentile.

perc75

As perc05, but for the 75th-percentile.

perc90

As perc05, but for the 90th-percentile.

perc95

As perc05, but for the 95th-percentile.

perc98

As perc05, but for the 98th-percentile. This percentile is only calculated for “summer” or “annual” aggregation periods.
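The median and percentile statistics share the rule that at least 10 valid values must be present. The documentation does not state which percentile estimator is used, so the sketch below assumes linear interpolation between closest ranks (the default method of numpy.percentile) and implements it in plain Python; percentile is a hypothetical helper name.

```python
import math


def percentile(values, p, min_valid=10):
    """p-th percentile (0-100) of the valid values, None if too few are valid.

    Assumption: linear interpolation between closest ranks, as in the
    default method of numpy.percentile. Missing values are None.
    """
    valid = sorted(v for v in values if v is not None)
    if len(valid) < min_valid:
        return None
    rank = p / 100.0 * (len(valid) - 1)
    lo = math.floor(rank)
    hi = min(lo + 1, len(valid) - 1)
    return valid[lo] + (rank - lo) * (valid[hi] - valid[lo])
```

Under this assumption, median corresponds to percentile(values, 50) and perc98 to percentile(values, 98).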

dma8epa

Daily maximum 8-hour average statistics according to the US EPA definition. 8-hour averages are calculated for 24 bins starting at 0 h local time. The 8-h running mean for a particular hour is calculated from the concentrations for that hour and the following 7 hours. If less than 75% of the data are present (i.e. fewer than 6 of the 8 hours), the average is considered missing.

When the aggregation period is “seasonal”, “summer”, or “annual”, the 4th highest daily 8-hour maximum of the aggregation period will be computed.

Note that in contrast to the official EPA definition, a daily value is considered valid if at least one 8-hour average is valid.
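The dma8epa rules above (24 running 8-h means, at least 6 valid hours per window, one valid window per day in the relaxed version, 18 in the strict version, and the n-th highest daily value for longer aggregation periods) can be sketched as below. The input layout (lists of 24 hourly local-time values, None for missing) and all function names are assumptions for illustration; the behaviour of nth_highest when fewer than n valid days exist is not specified in the text and is assumed here.

```python
def running_8h_means(hours, n_windows, min_valid=6):
    """Means of n_windows 8-h windows starting at hours[0], hours[1], ...

    hours: consecutive hourly values (None = missing), at least n_windows + 7 long.
    A window mean is None if fewer than min_valid (75% of 8) hours are valid.
    """
    means = []
    for start in range(n_windows):
        window = [v for v in hours[start:start + 8] if v is not None]
        means.append(sum(window) / len(window) if len(window) >= min_valid else None)
    return means


def dma8epa(today, tomorrow, strict=False):
    """Daily maximum 8-h average with US EPA windows starting at 0 h ... 23 h.

    today, tomorrow: 24 hourly values each (local time); windows starting after
    16 h need the first 7 hours of the following day.
    strict=True requires at least 18 of the 24 windows to be valid.
    """
    means = running_8h_means(today + tomorrow[:7], 24)
    valid = [m for m in means if m is not None]
    if not valid or (strict and len(valid) < 18):
        return None
    return max(valid)


def nth_highest(daily_values, n):
    """n-th highest valid daily value, e.g. n=4 (dma8epa) or n=26 (dma8eu).

    Returning None when fewer than n valid days exist is an assumption.
    """
    valid = sorted((v for v in daily_values if v is not None), reverse=True)
    return valid[n - 1] if len(valid) >= n else None
```

Under the same assumptions, the dma8epax windows (starting 7 h to 23 h) would correspond to running_8h_means(today[7:] + tomorrow[:7], 17), and the dma8eu windows (starting at 17 h of the previous day) to running_8h_means(yesterday[17:] + today, 24).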

dma8epa_strict

As dma8epa, but additionally, a diurnal 8-hour maximum value is only saved if at least 18 out of the 24 8-hour averages are valid. This is the official dma8epa definition.

dma8epax

As dma8epa, but using the new US EPA definition, in which the daily 8-hour windows start between 7 h and 23 h local time.

dma8epax_strict

As dma8epax, but additionally, a diurnal 8-hour maximum value is only saved if at least 13 out of the 17 8-hour averages are valid. This is the official dma8epax definition.

dma8eu

As dma8epa, but using the EU definition of the daily 8-hour window starting from 17 h of the previous day.

When the aggregation period is “seasonal”, “summer”, or “annual”, the 26th highest daily 8-hour maximum of the aggregation period will be computed.

dma8eu_strict

As dma8eu, but additionally, a diurnal 8-hour maximum value is only saved if at least 18 out of the 24 8-hour averages are valid. This is the official dma8eu definition.

avgdma8epax

Average value of the daily dma8epax statistics during the aggregation period.

drmdmax1h

Maximum of the 3-month running mean of daily maximum 1-hour mixing ratios during the aggregation period.

This statistic also produces day_of_max_drmdmax1h, which is the Julian day in the year when the maximum value of the 3-month running mean of daily maximum 1-hour concentrations occurred.

somo10

Sum of the excess of daily maximum 8-h means (EU AirBase standard with relaxed criterion: dma8eu) over a cut-off of 10 ppb (i.e. 20 μg/m³), calculated for all days in the aggregation period. SOMO10 is set to missing if less than 75% of days are available. The sum is scaled by the ratio of the number of theoretical days to the number of available days.

somo10_strict

As somo10, but using dma8eu_strict for data capture.

somo35

As somo10, but accumulating ozone values above 35 ppb.

somo35_strict

As somo10_strict, but accumulating ozone values above 35 ppb.
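The SOMO definitions above combine a cut-off, a 75% data capture rule on days, and a scaling by theoretical over available days. A minimal sketch, assuming one dma8eu (or dma8eu_strict) value per day of the aggregation period with None for missing days; somo is a hypothetical helper name, and the result is in ppb days when the inputs are in ppb.

```python
def somo(daily_dma8, cutoff):
    """SOMO10 (cutoff=10) or SOMO35 (cutoff=35) from daily dma8eu values (ppb).

    Returns None if less than 75% of the days are valid; otherwise the sum of
    excesses is scaled by n_total / n_valid days.
    """
    n_total = len(daily_dma8)
    valid = [v for v in daily_dma8 if v is not None]
    if n_total == 0 or len(valid) < 0.75 * n_total:
        return None
    excess = sum(max(v - cutoff, 0.0) for v in valid)
    return excess * n_total / len(valid)
```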

w90

Daily maximum W90 5-h Experimental Exposure Index:

$EI = \sum_i w_i C_i$ with weight $w_i = 1/[1 + M \exp(-A C_i/1000)]$, where $M = 1400$, $A = 90$, and $C_i$ is the hourly average O3 mixing ratio in units of ppb (Lefohn et al., 2010). For each day, 24 W90 indices are computed as 5-hour sums, requiring that at least 4 of the 5 hours contain valid data (75%). If a sample consists of only 4 data points, a fifth value is constructed by averaging the 4 valid mixing ratios.

For aggregation periods “month”, “season”, “summer”, or “annual”, the 4th highest W90 value is computed, but only if at least 75% of days in this period have valid W90 values.
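The daily W90 index can be sketched directly from the formula and the 4-of-5 rule above. The text does not say at which hours the 24 five-hour windows start; the sketch assumes windows starting at 0 h to 23 h, borrowing the first 4 hours of the following day, and takes the maximum of the valid 5-h sums. Input layout and function names are hypothetical.

```python
import math


def weight(c, m, a):
    """Sigmoidal weight w = 1 / (1 + M exp(-A C / 1000)) for C in ppb."""
    return 1.0 / (1.0 + m * math.exp(-a * c / 1000.0))


def daily_max_w90(today, tomorrow, m=1400.0, a=90.0):
    """Daily maximum of the 24 5-h W90 sums (windows starting at 0 h ... 23 h).

    today, tomorrow: 24 hourly O3 values (ppb) each, None for missing data.
    """
    hours = today + tomorrow[:4]  # windows starting after 19 h reach into the next day
    sums = []
    for start in range(24):
        window = [c for c in hours[start:start + 5] if c is not None]
        if len(window) < 4:
            continue  # fewer than 4 of 5 valid hours: this 5-h sum is missing
        if len(window) == 4:
            window.append(sum(window) / 4.0)  # construct the 5th value as the mean of the 4
        sums.append(sum(weight(c, m, a) * c for c in window))
    return max(sums) if sums else None
```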

aot40

Daily 12-h AOT40 values are accumulated from hourly values in the 12-h period from 08:00 h to 19:59 h solar time. AOT40 is defined as the cumulative excess of hourly ozone mixing ratios above 40 ppb. If less than 75% of the hourly values (i.e. fewer than 9 of 12 hours) are present, the daily AOT40 is considered missing. If data capture in the daily 12-h window is 75% or greater, the sum is scaled by the inverse of the fractional data capture, i.e. multiplied by ntotal/nvalid.

For monthly, seasonal, summer, or annual statistics, the daily AOT40 values are accumulated over the aggregation period and scaled by (ntotal/nvalid) days. If less than 75% of days are valid, the value is considered missing.
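The two AOT40 steps above (daily 12-h sums with data capture scaling, then accumulation over the aggregation period with a second 75% rule) can be sketched as follows. The input layout, 24 hourly values per day indexed by solar hour with None for missing data, and the function names are assumptions; results are in ppb h when the inputs are in ppb.

```python
def daily_aot40(day, threshold=40.0):
    """Daily 12-h AOT40 (ppb h) from hourly values (ppb), 08:00-19:59 solar time."""
    window = day[8:20]
    valid = [c for c in window if c is not None]
    if len(valid) < 9:  # less than 75% of the 12 hours
        return None
    excess = sum(max(c - threshold, 0.0) for c in valid)
    return excess * len(window) / len(valid)  # scale by n_total / n_valid hours


def period_aot40(daily_values):
    """Sum of daily AOT40 values, scaled by n_total / n_valid days."""
    valid = [v for v in daily_values if v is not None]
    if not daily_values or len(valid) < 0.75 * len(daily_values):
        return None
    return sum(valid) * len(daily_values) / len(valid)
```

daylight_aot40 would differ only in how the hours entering the daily sum are selected (solar elevation > 5 degrees instead of a fixed 12-h window).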

daylight_aot40

As aot40, but using solar elevation > 5 degrees to identify “daytime” hours.

w126

The daily W126 index is accumulated from hourly values in the 12-h period from 08:00 h to 19:59 h solar time: $W126 = \sum_i w_i C_i$ with weight $w_i = 1/[1 + M \exp(-A C_i/1000)]$, where $M = 4403$, $A = 126$, and $C_i$ is the hourly average O3 mixing ratio in units of ppb. If there are fewer than 9 valid hourly values in the 12-hour window, the daily value is considered missing. If data capture in the daily 12-h window is 75% or greater, the sum is scaled by the inverse of the fractional data capture, i.e. multiplied by ntotal/nvalid.

Seasonal, summer, or annual statistics are calculated as sum over the daily W126 values. Results are marked as missing if less than 75% of daily values are valid.
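A sketch of the daily W126 value, following the formula and the 9-of-12-hours rule above; the seasonal or annual value would then be the sum of the daily values, subject to the 75% rule. The input layout (24 hourly values indexed by solar hour, None for missing) and the function name are assumptions.

```python
import math


def daily_w126(day, m=4403.0, a=126.0):
    """Daily 12-h W126 from hourly O3 values (ppb) between 08:00 and 19:59 solar time."""
    window = day[8:20]
    valid = [c for c in window if c is not None]
    if len(valid) < 9:  # fewer than 9 of 12 hours: the day is missing
        return None
    # w_i * C_i with w_i = 1 / (1 + M exp(-A C_i / 1000))
    total = sum(c / (1.0 + m * math.exp(-a * c / 1000.0)) for c in valid)
    return total * len(window) / len(valid)  # scale by n_total / n_valid hours
```

For w126_24h, the same function could be applied with window = day[0:24], under the same data capture assumptions.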

w126_24h

As w126, but using all 24 hours of a day.

nvgt050

Number of days with exceedance of the dma8epax value above 50 ppb. The value is marked as missing if less than 75% of days contain valid data.

nvgt060

Number of days with exceedance of the dma8epax value above 60 ppb. The value is marked as missing if less than 75% of days contain valid data.

nvgt070

Number of days with exceedance of the dma8epax value above 70 ppb. The value is marked as missing if less than 75% of days contain valid data.

nvgt080

Number of days with exceedance of the dma8epax value above 80 ppb. The value is marked as missing if less than 75% of days contain valid data.

nvgt090

Number of days with exceedance of the daily max1h_values above 90 ppb. The value is marked as missing if less than 75% of days contain valid data.

nvgt100

Number of days with exceedance of the daily max1h_values above 100 ppb. The value is marked as missing if less than 75% of days contain valid data.

nvgt120

Number of days with exceedance of the daily max1h_values above 120 ppb. The value is marked as missing if less than 75% of days contain valid data.
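All nvgt statistics count days whose daily value exceeds a threshold, with a 75% rule on valid days. A minimal sketch, assuming one daily value per day (dma8epax for nvgt050 to nvgt080, daily max1h values for nvgt090 to nvgt120), None for missing days, and a strict "greater than" comparison; count_exceedance_days is a hypothetical helper name. No scaling for missing days is applied because none is described above.

```python
def count_exceedance_days(daily_values, threshold):
    """Number of days with a daily value above threshold (ppb).

    Returns None if less than 75% of the days in the period are valid.
    """
    valid = [v for v in daily_values if v is not None]
    if not daily_values or len(valid) < 0.75 * len(daily_values):
        return None
    return sum(1 for v in valid if v > threshold)
```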

4. A Python example

The following Python code shows how ozone metrics can be obtained from the TOAR database through the REST services described above. In this example, ozone statistics are obtained for all sites in California, US. The STATION_STATE variable can of course be replaced by other database parameters as described above. We recommend testing changes to the request URLs in a web browser. The user is assumed to be familiar with Python, including the numpy and pandas libraries. The actual data processing is not included in this program and must be added by the user.

 


# -*- coding: utf-8 -*-
"""
toar_rest_demo: demonstrate use of JOIN REST interface to TOAR database from python

Created on Thu Jan 26 09:01:19 2017

@author: m.g.schultz, forschungszentrum juelich, germany
"""

import json
from urllib.parse import quote
from urllib.request import urlopen

# user vars
STATION_STATE = "California"


BASEURL = "https://join.fz-juelich.de/services/rest/surfacedata/"
# first URL to find all sites matching certain conditions (here station_state)
# note that %s will be replaced by STATION_STATE later
URL1 = "stations/?station_state=%s&parameter_name=o3&columns=id,network_name,station_id&format=json"
# second URL to return data for one data series at a time
# note that %i will be replaced by the respective dataset id
URL2 = "stats/?id=%i&sampling=seasonal&statistics=dma8epax,daytime_avg,median,perc98&format=json"


def read_url(url):
    """Return the body of the HTTP response for url as a UTF-8 decoded string."""
    with urlopen(url) as f:
        return f.read().decode('utf-8')


# first: find all sites
# quote() escapes characters such as blanks in the state name (e.g. "New York")
print("Opening URL1...")
response = read_url(BASEURL + URL1 % quote(STATION_STATE))
print("response = ", response[:400], " ... ")
metadata = json.loads(response)

# now loop over data series
# note: we assume that there is only one data series per station
for s in metadata:
    # the first column (id) is taken to hold the list of data series ids of this station
    all_dataseries = s[0]
    if len(all_dataseries) > 1:
        raise ValueError("More than one data series found at %s. Modify your code" % json.dumps(s))
    print("Opening URL2...")
    dresponse = read_url(BASEURL + URL2 % all_dataseries[0])
    data = json.loads(dresponse)
    # do something with the data
    print("Data columns: ", data.keys())
    print("metadata: ", data['metadata'])
    print("datetime: ", data['datetime'])
    print("springtime median: ", data['median-MAM'])
    # for demo purposes we break the loop here
    break