GISCobservations

This component is named GISC observations because some of the data comes from the DWD GISC interface. Ideally all data would come from this one system, but that was not possible: we did not get access, and station 11320 (Universitaet Innsbruck) is not included in the GISC at all.

Note

There is also a version, extractBUFReccodes.py, which has the very same structure as extractBUFRperl.py and uses the ECMWF ecCodes Python library to extract the BUFR files. This code is unfinished! However, should a switch to the ecCodes library become necessary, this draft can serve as a starting point. All that would remain to do at the end is to change the import in the bufr.py file.

This documentation only covers the extractBUFRperl.py script that is currently in use.

The worker script

The main script to be executed is bufr.py. If started without any input arguments, the default input folders are checked for new incoming BUFR files. There are two incoming folders, specified via config.conf. Depending on the folder in which a file is stored, its data get a different label in the database: either essential (open data, can be used and downloaded by everyone) or additional (closed data, access is only given to logged-in users of the Wetterturnier Wordpress Plugin). For both data types (essential and additional) an incoming directory (indir) and an outgoing directory (outdir) are specified in the config.conf file.
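A minimal sketch of how the two folder pairs could be read with Python's configparser module. The section names follow the setup comments in this document and the option names indir and outdir come from the text above; the paths and the helper read_folders are assumptions made for illustration and are not taken from bufr.py.

```python
import configparser

# Example content only; the paths are placeholders, not real defaults.
EXAMPLE = """
[essentials]
indir  = /data/essential/incoming
outdir = /data/essential/done

[additionals]
indir  = /data/additional/incoming
outdir = /data/additional/done
"""

def read_folders(text):
    """Return {section: (indir, outdir)} for both data types (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        section: (parser.get(section, "indir"), parser.get(section, "outdir"))
        for section in ("essentials", "additionals")
    }

folders = read_folders(EXAMPLE)
```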

The script bufr.py automatically checks the incoming folders for new files. New files are processed using extractBUFRperl::extractBUFR and then moved into the output directory. They are stored in the subfolder error if the BUFR file could not be extracted/processed, or in the subfolder processed if processing succeeded.
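The check-process-move cycle described above can be sketched as follows. This is an illustration under assumptions, not the code of bufr.py: process_incoming and its handler argument are hypothetical names, and handler stands in for the actual BUFR extraction.

```python
import os
import shutil

def process_incoming(indir, outdir, handler):
    """Run handler on every file in indir, then move the file away.

    handler is a hypothetical callable standing in for the BUFR
    extraction; it is expected to raise an exception on failure.
    """
    moved = {}
    for name in sorted(os.listdir(indir)):
        src = os.path.join(indir, name)
        if not os.path.isfile(src):
            continue
        try:
            handler(src)
            sub = "processed"
        except Exception:
            sub = "error"
        # Files end up in outdir/processed or outdir/error.
        target = os.path.join(outdir, sub)
        os.makedirs(target, exist_ok=True)
        shutil.move(src, os.path.join(target, name))
        moved[name] = sub
    return moved
```

Because every file leaves the incoming folder regardless of the outcome, a broken file is not retried on each run.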

To run the script, the corresponding BUFR tables have to be available. They can either be located in the system-wide default folder or be specified via the environment variable BUFR_TABLES. Note that some BUFR files require custom BUFR table files (e.g., for a specific subcentre using custom BUFR entries). WMO-style BUFR tables can, for example, be downloaded from the ECMWF website. WARNING: the BUFR tables in this archive have the suffix .txt while bufrread.pl is looking for .TXT files. Simple solution: link all your .txt files to .TXT and try again.
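The .txt to .TXT workaround could be automated roughly like this. The function name link_uppercase_tables is an assumption for illustration; the sketch only uses os.symlink and assumes a case-sensitive file system.

```python
import os

def link_uppercase_tables(table_dir):
    """Create NAME.TXT symlinks for every NAME.txt in table_dir."""
    created = []
    for name in sorted(os.listdir(table_dir)):
        stem, ext = os.path.splitext(name)
        if ext != ".txt":
            continue
        link = os.path.join(table_dir, stem + ".TXT")
        if not os.path.lexists(link):
            # Relative link, so the folder can be moved as a whole.
            os.symlink(name, link)
            created.append(stem + ".TXT")
    return created
```

Calling it a second time creates nothing, so it is safe to run after adding new tables.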

To get this script to run:

## Make a copy of the config template file and adjust
## the settings, namely mysql database access information
## and input/output directories in the [essentials] and [additionals]
## section.
cp config.conf.template config.conf

## If required: set the BUFR_TABLES environment variable
export BUFR_TABLES=/path/to/your/bufrtables

## Execute the script (make sure the virtualenv is active if you use one)
python bufr.py

For testing, a specific file can be specified using the -f/--file flag. In this case only this file is read, and it is not moved after execution.

## Process a specific bufr file (make sure the virtualenv is active if you use one)
python bufr.py --file <path/to/buf/file>

The cleanup script

To keep the database small, only a subset of the data is archived, while the live table is a rolling table containing only the last N days of data. Furthermore, old unused BUFR files should be removed from the disc. The CleanUp.py script does this job using the configuration from the config.conf file (mysql access config and the [cleanup] section).

To get the script running:

## Make a copy of the config template file if you haven't done this
## yet and adjust the settings, namely mysql database access information
## and input/output directories in the [essentials] and [additionals]
## section. For the archive table: check the list of stations in the
## [cleanup] section which should be moved from the live table (``srctable``)
## to the archive table (``dsttable``).
cp config.conf.template config.conf

## Run the script
python cleanup.py

The script …

  • Reads the config.conf file
  • Creates an object of class cleanup
  • Deletes old raw (BUFR) files from the disc
  • Moves a subset of observations from the live table into the archive table
  • Removes old observations from the live table

Class: cleanup

This is the class used by the CleanUp.py.

class cleanup.cleanup(config)[source]

Setting up the class to clean files and databases used for processing incoming observations.

Parameters:config (str) – Name of the config file to read.
cleanup_live_table()[source]

We have a live and an archive table. These two tables are defined in the config.conf file. Here we delete all observations from the live table (‘srctable’) which are older than about ‘db_days’ days (also defined in the config.conf file).
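A minimal sketch of such a rolling delete. It uses sqlite3 so that it runs without a server, whereas the real code talks to MySQL; the helper delete_older_than and the table layout are assumptions, and datumsec is the unix time stamp column described under manipulatedata().

```python
import sqlite3
import time

def delete_older_than(con, table, db_days, now=None):
    """Delete rows whose datumsec is older than db_days days (hypothetical helper)."""
    now = int(time.time()) if now is None else now
    cutoff = now - db_days * 86400
    # Table names cannot be bound as parameters; only pass trusted names.
    cur = con.execute("DELETE FROM {} WHERE datumsec < ?".format(table), (cutoff,))
    con.commit()
    return cur.rowcount

# Small in-memory demonstration with an assumed two-column layout.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE live (statnr INTEGER, datumsec INTEGER)")
con.executemany("INSERT INTO live VALUES (?, ?)",
                [(11320, 1000), (11320, 900000), (11120, 950000)])
removed = delete_older_than(con, "live", db_days=2, now=1000000)
remaining = con.execute("SELECT COUNT(*) FROM live").fetchone()[0]
```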

closeDB()[source]

Closing database.

delete_old_raw_files()[source]

Method deleting files from disc in the directory ‘outdir’ as defined in the config.conf file. We do NOT distinguish between synop/bufr or processed/error here. Files are simply deleted if they are older than ‘file_days’ as specified in config.conf.

getOldFiles(dirPath, maxage, postfix)[source]

List old files on disc.

Parameters:
  • dirPath (str) – Path to the directory which should be checked.
  • maxage (int) – Timestamp, files older than this will be considered to be old and marked for deletion.
  • postfix (str) – File postfix. Only files where the postfix matches (not case sensitive) will be considered.
Returns:

A list of all files under dirPath older than maxage.

Return type:

list
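One way such a listing could look, assuming maxage is compared against the file modification time and that subdirectories are searched as well. This is a sketch, not the actual getOldFiles; the snake_case name is chosen to keep the two apart.

```python
import os

def get_old_files(dir_path, maxage, postfix):
    """Return files under dir_path last modified before maxage (unix time)."""
    old = []
    for root, _dirs, files in os.walk(dir_path):
        for name in files:
            # Postfix match is not case sensitive.
            if not name.lower().endswith(postfix.lower()):
                continue
            path = os.path.join(root, name)
            if os.path.getmtime(path) < maxage:
                old.append(path)
    return sorted(old)
```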

live_database_to_archive()[source]

We would like to store some observation data longer than just a few days. However, we do not want to create a copy of the WMO observation data archive or something similar. Therefore only some stations are archived, as defined in ‘cleanup:stations’ in the config.conf file. Their observations are moved from ‘cleanup:srctable’ to ‘cleanup:dsttable’ (see config.conf file).
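The archiving step can be pictured as an INSERT ... SELECT restricted to the configured stations. The sketch below again uses sqlite3 instead of MySQL; the helper name, the table layout, and the primary key on (statnr, datumsec) are assumptions. It only copies rows, on the assumption that the live table is trimmed separately by cleanup_live_table().

```python
import sqlite3

def archive_stations(con, srctable, dsttable, stations):
    """Copy all rows of the given stations from srctable to dsttable."""
    if not stations:
        return
    marks = ",".join("?" for _ in stations)
    sql = "INSERT OR IGNORE INTO {dst} SELECT * FROM {src} WHERE statnr IN ({m})"
    con.execute(sql.format(dst=dsttable, src=srctable, m=marks), list(stations))
    con.commit()

con = sqlite3.connect(":memory:")
for table in ("live", "archive"):
    con.execute(
        "CREATE TABLE {} (statnr INTEGER, datumsec INTEGER, t REAL, "
        "PRIMARY KEY (statnr, datumsec))".format(table)
    )
con.executemany("INSERT INTO live VALUES (?, ?, ?)",
                [(11320, 100, 1.5), (11120, 100, 2.0), (11320, 200, 1.0)])

archive_stations(con, "live", "archive", [11320])
archive_stations(con, "live", "archive", [11320])  # second run adds nothing
archived = con.execute(
    "SELECT statnr, datumsec FROM archive ORDER BY datumsec").fetchall()
```

The primary key together with OR IGNORE makes the step repeatable, which matters if the script runs more often than the live table is trimmed.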

Class: extractBUFR

Main class, extracting observations from BUFR data files using the Geo::BUFR bufrread.pl script. bufrread.pl converts the BUFR files into ASCII, which is then parsed by extractBUFRperl::extractBUFR and stored in the database.

class extractBUFRperl.extractBUFR(file, config, stint, verbose, filterfile=None)[source]

Main class, extracting data from the BUFR file.

This object uses subprocess.Popen to call the Geo::BUFR bufrread.pl script (see http://search.cpan.org/dist/Geo-BUFR/lib/Geo/BUFR.pm, https://wiki.met.no/bufr.pm/start). If it is not installed, None is returned. To install Geo::BUFR, check the readme of the package. It is as simple as:

cpan Geo::BUFR

Please note that the BUFR tables also have to be installed on your system, either in one of the default locations or in a folder announced via the environment variable BUFR_TABLES=<path>, where <path> is the location of the BUFR table files.

BUFR tables can, e.g., be downloaded here: https://software.ecmwf.int/wiki/display/BUFR/BUFRDC+Home. The files in this archive are named .txt while .TXT files are expected; bufrread.pl will print a corresponding message. Simply link the .txt files to a corresponding .TXT version in your BUFR_TABLES folder to get around this.
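The call pattern described above might look roughly like this. It is a sketch under assumptions: run_bufrread is a hypothetical helper, bufrread.pl is called with the file name only (no options), and a missing executable is detected with shutil.which before anything is started.

```python
import shutil
import subprocess

def run_bufrread(bufr_file, executable="bufrread.pl"):
    """Return the ASCII output lines of bufrread.pl, or None if unavailable."""
    if shutil.which(executable) is None:
        return None  # Geo::BUFR not installed or not on PATH
    proc = subprocess.Popen([executable, bufr_file],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(err.decode("utf-8", "replace"))
    return out.decode("utf-8", "replace").splitlines()
```

The BUFR_TABLES variable is inherited from the calling environment, so it has to be exported before Python is started (or set via os.environ).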

Parameters:
  • config (str) – Name of the config file.
  • stint (str) – Used to store a flag into the database from which source the messages come. In this case “bufr”. Keep in mind that the database column type is “ENUM” and only allows a distinct set of strings.
  • verbose (bool) – Boolean True/False whether the object should be verbose or not.
  • filterfile (str) – Default is None, a filter file can be specified forwarded to Geo::BUFR bufrread.pl.
__check_bufrdesc_and_add_if_necessary__(rec, param)[source]

Adding bufr entry to database table bufrdesc if necessary. Input rec is a bufrentry object. Input param has to be of class paramclass. Checks if entry is already in the bufrdesc database. If not, we have to add a row.

Parameters:
  • rec (bufrentry) – Object to be added.
  • param (bufrdesc) – Bufr description object.
__check_displacement__(rec)[source]

Check if the current record is a time displacement specification. If so, the time displacement is returned as int, in seconds. If not, bool False is returned.

Parameters:rec (bufrentry) – Object to check.
Returns:Returns bool FALSE or int.
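A sketch of the "False or int" pattern, assuming the record carries the BUFR descriptor as a six-character string and that the WMO Table B time period or displacement descriptors 004024 (hours), 004025 (minutes) and 004026 (seconds) are the relevant ones. Which descriptors the real method handles is not stated here, so the mapping and the function signature are assumptions.

```python
# Seconds per unit for the displacement descriptors used in this sketch.
DISPLACEMENT_UNITS = {"004024": 3600, "004025": 60, "004026": 1}

def check_displacement(bufrid, value):
    """Return the displacement in seconds as int, or False if not one."""
    factor = DISPLACEMENT_UNITS.get(bufrid)
    if factor is None:
        return False
    return int(round(float(value) * factor))

result = check_displacement("004024", "0")
# A displacement of 0 seconds is valid, and 0 == False in Python,
# so callers have to test identity instead of truthiness.
is_displacement = result is not False
```

The same caveat applies to the sensor height and vertical significance checks, which share this return convention.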
__check_sensorheight__(rec)[source]

Check if the current record is a sensor height specification. If so, the sensor height is returned (float). If not, bool False is returned.

Parameters:rec (bufrentry) – Object to check.
Returns:Returns bool FALSE or float.
__check_verticalsign__(rec)[source]

Check if the current record is a vertical significance specification. If so, the vertical significance is returned (absolute value as integer). If not, bool False is returned.

Parameters:rec (bufrentry) – Object to check.
Returns:Returns bool FALSE or int.
__get_param_obj__(search, displacement, verticalsign, sensorheight)[source]

The config file bufr_config.conf contains a set of parameter definitions. This method is used to find the appropriate parameter description given the inputs, which come directly from the BUFR entry extracted from the BUFR file using Geo::BUFR bufrread.pl.

We are therefore matching each data line from the BUFR file with one of our specified parameter configs from the bufr_config.conf and use them to further process the data.

Parameters:
  • search (bufrentry) – Bufrentry object.
  • displacement (int) – Latest time displacement value, seconds.
  • verticalsign (int) – Latest vertical significance value.
  • sensorheight (float) – Latest sensor height value.
Returns:

Returns two values. The first is a bool stating whether to drop the message: True if no parameter entry can be matched to the current bufrentry (drop message, unknown), False otherwise (don’t drop). The second is bool False if no parameter entry was found, otherwise the matching parameter entry of class bufrdesc.

__getval__(x)[source]

Get value: if the value is a string, it is simply returned. Otherwise the value is converted to float. If the value is extremely large or extremely small, MISSING_VALUE is returned.

Returns:The properly prepared value.
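A sketch of this rule. The sentinel -9999.0 and the threshold 1e20 are assumptions, as is the reading of "extremely large or extremely small" as a bound on the magnitude in both directions; the real constants are defined in the module.

```python
MISSING_VALUE = -9999.0   # assumed sentinel, not taken from the module
LIMIT = 1.0e20            # assumed threshold for "extreme" values

def getval(x):
    """Return strings unchanged, numbers as float, extremes as MISSING_VALUE."""
    if isinstance(x, str):
        return x
    value = float(x)
    if abs(value) > LIMIT:
        return MISSING_VALUE
    return value
```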
__read_bufr_file__(file, filterfile=None)[source]

Function reading the BUFR file. It calls the perl Geo::BUFR library to convert the binary files into an ASCII table and parses the output to extract the necessary information.

Parameters:
  • file (str) – Path/Name of the BUFR file (binary file).
  • filterfile (str) – Default None, can be set and will be forwarded to Geo::BUFR bufrread.pl to set specific filters. If set, only this subset of the bufr file will be extracted/processed.
Returns:

Returns a list of lists, each containing a set of bufrentry objects. The length of the most outer list corresponds to the number of messages in the BUFR file. The first nested lists are the messages each consisting of a set of bufrentry entries with the data.

Return type:

list

__showdata_sort_order__(force=None)[source]

Takes care of the order of the columns in the output.

__weakref__

list of weak references to the object (if defined)

commit()[source]

Alias for MySQLdb.commit.

cursor()[source]

Alias for MySQLdb.cursor.

Returns:Returns a MySQL.cursor object.
dbClose()[source]

Alias for MySQLdb.close.

dbConnect()[source]

Method to open the database connection. Uses the settings on self.config. No return, saves the database handler on the object itself.

extractdata()[source]

Loops through self.raw (raw information returned by __read_bufr_file__) and prepares the data.

load_bufr_description(table)[source]

Loading data from ‘table’ and returns a list object containing one ‘bufrdesc’ object for each of the rows in the database.

Parameters:table (str) – Name of the database table containing the bufr descriptions.
Returns:Returns a list of bufrdesc objects containing the definition/description.
Return type:list
manipulatedata()[source]

Manipulate data. Looks for meta information such as wmoblock, statnr, year, month, hour, and minute, and creates the columns datumsec (unix time stamp), stdmin (hour/minute integer, e.g., 7:00 UTC is 700), and statnr (a combination of the wmoblock and station number information from the bufr file).
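The three derived columns can be illustrated as follows. The helper derive_columns is hypothetical; it assumes that the day of month is available as well, that all times are UTC, and that the combined station number follows the usual five-digit WMO scheme (block number times 1000 plus station number).

```python
import calendar

def derive_columns(wmoblock, statnr, year, month, day, hour, minute):
    """Compute datumsec, stdmin and the combined station number."""
    # calendar.timegm interprets the time tuple as UTC.
    datumsec = calendar.timegm((year, month, day, hour, minute, 0))
    stdmin = hour * 100 + minute
    station = wmoblock * 1000 + statnr
    return {"datumsec": datumsec, "stdmin": stdmin, "statnr": station}

cols = derive_columns(11, 320, 2020, 1, 1, 7, 0)
```

With block 11 and station 320 this yields station 11320 and stdmin 700 for 7:00 UTC.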

prepare_data()[source]

Prepares the data. Puts the data found before in the single messages into a matrix-style variable called “res”. Stores the parameters (column description of the matrix) and the data matrix in self.PREPARED.

showdata()[source]

Helper function to print the data to stdout.

showdropped()[source]

If a bufrentry cannot be attributed (is not defined by bufr_config.conf), the line is ignored. The dropped lines are kept so that one can check what has been dropped and whether important information is being lost.

This method prints the dropped lines to stdout.

update_stations()[source]

Update the station database with the information from the bufr message. Please note that we simply update the database row and do not keep any history (e.g., if a station is renamed or moved, the latest name/location is stored and the old information is simply overwritten).

write_to_db()[source]

Write data to database.

Class: bufrentry

extractBUFRperl::extractBUFR uses the perl library Geo::BUFR bufrread.pl to extract the binary BUFR files (called internally via subprocess.Popen).

The script bufrread.pl returns the content of the BUFR file in ASCII, where each line in the data section corresponds to one BUFR entry. extractBUFRperl::extractBUFR stores each line in an extractBUFRperl::bufrentry object; these objects are easy to iterate over.

class extractBUFRperl.bufrentry(string, width)[source]

This is a small helper class. All entries from the bufr file are stored as such bufrentry objects. A bufrentry object contains the specification of one single entry, e.g., bufrid, value, description.

Parameters:
  • string (str) – A bufrentry is a line as extracted by the Geo::BUFR bufrread.pl perl script.
  • width (int) – bufrread.pl allows setting a width for the description column. This width has to be known by bufrentry to be able to properly extract the information from the line.
__weakref__

list of weak references to the object (if defined)

show()[source]

Prints the content of this object, mainly for development.

Returns:No return, creates output on stdout.
string()[source]

Helper method to output the content of this object to console.

Returns:Returns the information from the object in a string format.

Class: bufrdesc

The class extractBUFRperl::extractBUFR uses extractBUFRperl::bufrdesc objects to handle the bufr parameter configuration read from the bufr_config.conf file. Each entry (bufrentry) read from the BUFR file has to match a parameter configured in bufr_config.conf; otherwise it is dropped.

For ease of use the configuration of bufr_config.conf is read piece-wise and each config is stored as an extractBUFRperl::bufrdesc object.

class extractBUFRperl.bufrdesc(rec, cols)[source]

This is a small helper class. The bufrdesc database is loaded as a list of such bufrdesc objects, which are easy to iterate over. Each object stores one record (one row of the bufrdesc database table).

Parameters:
  • rec (tuple) – A record from the bufrdesc database table. The elements of the tuple are described by the second input argument cols.
  • cols (list) – List of str describing the elements in the first argument (rec tuple).
__weakref__

list of weak references to the object (if defined)

get(what)[source]

Returns the element corresponding to the input string ‘what’. If it cannot be found in the columns from the database: stop!

Parameters:what (str) – Element to be returned.
Returns:Returns the corresponding element if available, else stop.
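The rec/cols pairing and the strict lookup can be sketched like this. The class below is an illustration, not the actual bufrdesc: the column names bufrid, param and unit are made up for the example, and "stop" is interpreted as terminating via sys.exit.

```python
import sys

class BufrDescSketch(object):
    """Pairs one database row (rec) with its column names (cols)."""

    def __init__(self, rec, cols):
        self._data = dict(zip(cols, rec))

    def get(self, what):
        """Return the element named what, or stop the program."""
        if what not in self._data:
            sys.exit("bufrdesc: unknown element '{}'".format(what))
        return self._data[what]

# Hypothetical row and column names, for illustration only.
desc = BufrDescSketch((12101, "t", "K"), ["bufrid", "param", "unit"])
param = desc.get("param")
```

Stopping on an unknown column, rather than returning None, makes a mismatch between code and database schema visible immediately.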
show()[source]

Shows the content of the object.