GISCobservations¶
This component is named GISC observations because some of the data comes from the DWD GISC interface. Ideally all data would come from this one system; however, that was not possible (we did not get access, and station 11320 Universitaet Innsbruck is not included in the GISC at all).
Note
There is also a version, extractBUFReccodes.py, which has the very same structure as extractBUFRperl.py and uses the ECMWF ecCodes Python library to extract the BUFR files. This is unfinished code! However, should one have to switch to the ecCodes library, this draft can serve as a starting point. All that would remain is to change the import in the bufr.py file.
Note that this documentation only covers the currently used extractBUFRperl.py script.
The worker script¶
The main script to be executed is bufr.py. If started without any input arguments, the default input folders are checked for new incoming BUFR files. There are two incoming folders, specified via config.conf. Depending on the folder in which the files are stored, the data get different labels in the database: either essential (open data, can be used and downloaded by everyone) or additional (closed data, access is only given to logged-in users when using the Wetterturnier Wordpress Plugin).
For both data types (essential and additional) an incoming directory (indir) and an outgoing directory (outdir) are specified in the config.conf file.
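As a rough illustration of how these settings could be read, the following sketch uses Python's standard configparser module. The section names [essentials] and [additionals] and the keys indir and outdir are taken from this documentation; the paths are made up, and the real config.conf.template may contain further settings.

```python
import configparser

# Hypothetical config content; the real config.conf.template may differ.
EXAMPLE_CONFIG = """
[essentials]
indir = /data/essential/incoming
outdir = /data/essential/outgoing

[additionals]
indir = /data/additional/incoming
outdir = /data/additional/outgoing
"""


def read_directories(config_text):
    """Return {section: (indir, outdir)} for both data types."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    result = {}
    for section in ("essentials", "additionals"):
        result[section] = (parser.get(section, "indir"),
                           parser.get(section, "outdir"))
    return result


directories = read_directories(EXAMPLE_CONFIG)
for section, (indir, outdir) in directories.items():
    print(section, indir, outdir)
```

To read an actual file instead of a string, parser.read("config.conf") can be used in place of read_string.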
The script bufr.py automatically checks the incoming folders for new files. New files are processed using extractBUFRperl::extractBUFR and then moved into the output directory. They are stored either in a subfolder error, if the BUFR file could not be extracted/processed, or in a subfolder processed, if processing succeeded.
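The move into the processed or error subfolder could look roughly like the sketch below. The function name move_after_processing and its arguments are hypothetical and not taken from bufr.py; only the subfolder names come from the description above.

```python
import os
import shutil


def move_after_processing(path, outdir, success):
    """Move 'path' into outdir/processed or outdir/error; return the new path."""
    subfolder = "processed" if success else "error"
    target_dir = os.path.join(outdir, subfolder)
    # Create the subfolder on first use.
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, os.path.basename(path))
    shutil.move(path, target)
    return target
```

A caller would set success depending on whether the extraction raised an error, so that failed files remain available for inspection.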
To run the script, the corresponding BUFR tables have to be available. They can either be located in the system-wide default folder or specified via the environment variable BUFR_TABLES.
Note that some BUFR files require custom BUFR table files (e.g., for a specific subcentre using custom BUFR entries). WMO-style BUFR tables can, for example, be downloaded from the ECMWF website.
WARNING: the BUFR tables in this archive have the suffix .txt while bufrread.pl is looking for .TXT files. Simple solution: link all your .txt files to .TXT and try again.
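One possible way to create these links is sketched below, assuming a case-sensitive file system and that all table files sit directly in one folder. The helper name link_txt_to_upper is made up for this example.

```python
import os


def link_txt_to_upper(table_dir):
    """Create a .TXT symlink next to every .txt file in table_dir."""
    created = []
    for name in sorted(os.listdir(table_dir)):
        base, ext = os.path.splitext(name)
        if ext != ".txt":
            continue
        link = os.path.join(table_dir, base + ".TXT")
        # Skip files that already have an upper-case counterpart.
        if not os.path.lexists(link):
            os.symlink(name, link)  # relative link inside the same folder
            created.append(link)
    return created
```

Calling link_txt_to_upper on the folder BUFR_TABLES points to returns the list of links it created; running it a second time creates nothing new.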
To get this script to run:
## Make a copy of the config template file and adjust
## the settings, namely mysql database access information
## and input/output directories in the [essentials] and [additionals]
## section.
cp config.conf.template config.conf
## If required: set the BUFR_TABLES environment variable
export BUFR_TABLES=/path/to/your/bufrtables
## Execute script (take care to use the virtualenv if you have one)
python bufr.py
For testing, a specific file can be specified using the -f/--file flag. In this case the file is read but not moved after execution.
## Processing a specific BUFR file (take care to use the virtualenv if you have one)
python bufr.py --file <path/to/bufr/file>
The cleanup script¶
To keep the database small, only a subset of the data is archived, while the live table is a rolling table containing only the last N days of data. Furthermore, old unused BUFR files should be removed from the disc.
The CleanUp.py script does this job using the configuration from the config.conf file (mysql access config and the [cleanup] section).
To get the script running:
## Make a copy of the config template file if you haven't done this
## yet and adjust the settings, namely mysql database access information
## and input/output directories in the [essentials] and [additionals]
## section. For the archive table: check the list of stations in the
## [cleanup] section which should be moved from the live table (``srctable``)
## to the archive table (``dsttable``).
cp config.conf.template config.conf
## Run the script
python cleanup.py
The script …
- Reads the config.conf file
- Creates an object of class cleanup
- Deletes old raw (BUFR) files from the disc
- Moves a subset of observations from the live table into the archive table
- Removes old observations from the live table
Class: cleanup¶
This is the class used by CleanUp.py.
-
class cleanup.cleanup(config)
Setting up the class to clean files and databases used for processing incoming observations.
Parameters: config (str) – Name of the config file to read.
-
cleanup_live_table()
We have a live and an archive table, both defined in the config.conf file. This method deletes all observations from the live table (‘srctable’) which are older than about ‘db_days’ days (also defined in the config.conf file).
-
delete_old_raw_files()
Method deleting files from disc in the directory ‘outdir’ as defined in the config.conf file. We do NOT distinguish between synop/bufr or processed/error here: files are simply deleted if they are older than ‘file_days’ as specified in config.conf.
-
getOldFiles(dirPath, maxage, postfix)
List old files on disc.
Parameters:
- dirPath (str) – Path to the directory which should be checked.
- maxage (int) – Timestamp; files older than this are considered old and marked for deletion.
- postfix (str) – File postfix. Only files whose postfix matches (not case sensitive) are considered.
Returns: A list of all files under dirPath older than maxage.
Return type: list
-
live_database_to_archive()
Some observation data should be stored longer than just a few days. However, we won’t create a copy of the WMO observation data archive or something similar. Therefore only some stations are archived, as defined in ‘cleanup:stations’ in the config.conf file. Their observations are moved from ‘cleanup:srctable’ to ‘cleanup:dsttable’ (see config.conf file).
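To make the file selection described for getOldFiles more concrete, here is a minimal sketch of how such a lookup could be written. It is not the actual implementation: the name get_old_files is made up, and the use of the file modification time as the file age is an assumption.

```python
import os


def get_old_files(dir_path, maxage, postfix):
    """Return files under dir_path last modified before 'maxage' (unix timestamp)."""
    old_files = []
    for root, _dirs, files in os.walk(dir_path):
        for name in files:
            # Case-insensitive postfix match, as described for getOldFiles.
            if not name.lower().endswith(postfix.lower()):
                continue
            path = os.path.join(root, name)
            # Assumption: "age" means the file modification time.
            if os.path.getmtime(path) < maxage:
                old_files.append(path)
    return sorted(old_files)
```

With a file_days setting, a suitable threshold would be time.time() - file_days * 86400, after which the returned files can be passed to os.remove.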
Class: extractBUFR¶
Main class, extracting observations from BUFR data files using the Geo::BUFR bufrread.pl script. bufrread.pl converts the BUFR files into ASCII, which is then parsed by extractBUFRperl::extractBUFR and stored in the database.
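Calling an external converter and capturing its ASCII output can be sketched with subprocess.Popen as follows. This is only an illustration of the pattern: run_converter is a hypothetical helper, and the Python interpreter is used as a stand-in command because the exact bufrread.pl command line used by extractBUFRperl.py is not shown here.

```python
import subprocess
import sys


def run_converter(command):
    """Run an external command and return its stdout as a list of lines.

    Returns None if the executable cannot be found.
    """
    try:
        proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
    except FileNotFoundError:
        return None
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("command failed: " + err.strip())
    return out.splitlines()


# Stand-in for a call such as ["bufrread.pl", "<path/to/file>"].
lines = run_converter([sys.executable, "-c",
                       "print('line 1'); print('line 2')"])
print(lines)
```

Returning None for a missing executable mirrors the documented behaviour that None is returned if Geo::BUFR is not installed.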
-
class extractBUFRperl.extractBUFR(file, config, stint, verbose, filterfile=None)
Main class, extracting data from the BUFR file.
This object uses subprocess.Popen to call the Geo::BUFR bufrread.pl script (see http://search.cpan.org/dist/Geo-BUFR/lib/Geo/BUFR.pm, https://wiki.met.no/bufr.pm/start). If it is not installed, None will be returned. To install Geo::BUFR check the readme of the package. It is as simple as:
cpan Geo::BUFR
Please note that you will also need the BUFR tables installed on your system, either at one of the default locations or at a location given by the environment variable
BUFR_TABLES=<path>
pointing to the directory containing the BUFR table files. BUFR tables can, e.g., be downloaded here: https://software.ecmwf.int/wiki/display/BUFR/BUFRDC+Home. The files in this archive are named .txt while .TXT files are expected; bufrread.pl will emit a corresponding message. Simply link the .txt files to a corresponding .TXT version in your BUFR_TABLES folder to get around this.
Parameters:
- config (str) – Name of the config file.
- stint (str) – Flag stored in the database indicating the source of the messages, in this case “bufr”. Keep in mind that the database column type is “ENUM” and only allows a distinct set of strings.
- verbose (bool) – Boolean True/False whether the object should be verbose or not.
- filterfile (str) – Default is None. A filter file can be specified, which is forwarded to Geo::BUFR bufrread.pl.
-
__check_bufrdesc_and_add_if_necessary__(rec, param)
Adding bufr entry to database table bufrdesc if necessary. Input rec is a bufrentry object. Input param has to be of class paramclass. Checks if the entry is already in the bufrdesc database table. If not, a row is added.
-
__check_displacement__(rec)
Check if the current record is a time displacement specification. If so, the time displacement value is returned as int in seconds. If not, bool False is returned.
Parameters: rec (bufrentry) – Object to check.
Returns: bool False or int.
-
__check_sensorheight__(rec)
Check if the current record is a sensor height specification. If so, the sensor height value is returned (float). If not, bool False is returned.
Parameters: rec (bufrentry) – Object to check.
Returns: bool False or float.
-
__check_verticalsign__(rec)
Check if the current record is a vertical significance specification. If so, the vertical significance value is returned (absolute value as integer). If not, bool False is returned.
Parameters: rec (bufrentry) – Object to check.
Returns: bool False or int.
-
__get_param_obj__(search, displacement, verticalsign, sensorheight)
The config file bufr_config.conf contains a set of parameter definitions. This method is used to find the appropriate parameter description given the inputs, which come directly from the BUFR entry extracted from the BUFR file using Geo::BUFR bufrread.pl.
We are therefore matching each data line from the BUFR file with one of our specified parameter configs from bufr_config.conf and use them to further process the data.
Parameters:
- search (bufrentry) – Bufrentry object.
- displacement (int) – Latest time displacement value, in seconds.
- verticalsign (int) – Latest vertical significance value.
- sensorheight (float) – Latest sensor height value.
Returns: Two values. The first one is a bool indicating whether to drop the message. If no parameter entry can be matched to the current bufrentry this value is True (drop message, unknown); otherwise False is returned (don’t drop). The second value is bool False if no parameter entry can be found, or a parameter entry of class bufrdesc otherwise.
-
__getval__(x)
Get value: if the value is a string, simply return it. Otherwise convert the value to float. If the value is extremely large or extremely small, return MISSING_VALUE.
Returns: The properly prepared value.
-
__read_bufr_file__(file, filterfile=None)
Function reading the BUFR file. It calls the perl Geo::BUFR library to convert the binary file into an ASCII table and parses the output to extract the necessary information.
Parameters:
- file (str) – Path/name of the BUFR file (binary file).
- filterfile (str) – Default None. Can be set and will be forwarded to Geo::BUFR bufrread.pl to set specific filters. If set, only this subset of the BUFR file will be extracted/processed.
Returns: A list of lists, each containing a set of bufrentry objects. The length of the outermost list corresponds to the number of messages in the BUFR file. The nested lists are the messages, each consisting of a set of bufrentry entries with the data.
Return type: list
-
__weakref__
List of weak references to the object (if defined).
-
dbConnect()
Method to open the database connection. Uses the settings in self.config. No return value; the database handle is stored on the object itself.
-
extractdata()
Loops through self.raw (the raw information returned by __read_bufr_file__) and prepares the data.
-
load_bufr_description(table)
Loads data from ‘table’ and returns a list containing one ‘bufrdesc’ object for each row in the database table.
Parameters: table (str) – Name of the database table containing the bufr descriptions.
Returns: A list of bufrdesc objects containing the definition/description.
Return type: list
-
manipulatedata()
Manipulate data. Looks for some meta information such as wmoblock, statnr, year, month, hour, and minute and creates the columns datumsec (unix time stamp), stdmin (hour/minute integer, e.g., 7:00 UTC is 700), and statnr (a combination of the wmoblock and station number information from the bufr file).
-
prepare_data()
Prepares the data. Puts the data found before in the single messages into a matrix-style variable called “res”. Stores the parameters (column description of the matrix) and the data matrix in self.PREPARED.
-
showdropped()
If a bufrentry cannot be attributed (is not defined by bufr_config.conf), these lines are ignored. To see what has been dropped, and whether important information is being dropped, the dropped lines are kept.
This method prints the dropped lines to stdout.
-
update_stations()
Update the station database with the information from the bufr message. Please note that we simply update the database row and do not keep any history (e.g., if a station is renamed or moved, the latest name/location is stored and the old information is simply overwritten).
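The derived columns described for manipulatedata can be illustrated with a short sketch. Several points are assumptions rather than facts about the implementation: the function name derive_columns is made up, a day value is assumed to be available alongside the listed meta information, times are treated as UTC, and the station number is combined as block * 1000 + station (which yields 11320 for block 11, station 320).

```python
import calendar


def derive_columns(wmoblock, statnr, year, month, day, hour, minute):
    """Return (datumsec, stdmin, combined station number)."""
    # Unix timestamp; calendar.timegm interprets the tuple as UTC.
    datumsec = calendar.timegm((year, month, day, hour, minute, 0))
    # Hour/minute integer, e.g. 7:00 UTC becomes 700.
    stdmin = hour * 100 + minute
    # Assumed combination: two-digit block followed by three-digit station.
    station = wmoblock * 1000 + statnr
    return datumsec, stdmin, station


print(derive_columns(11, 320, 2018, 1, 1, 7, 0))
```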
Class: bufrentry¶
extractBUFRperl::extractBUFR uses the perl library Geo::BUFR bufrread.pl to extract the binary BUFR files (called internally via subprocess.Popen). The script bufrread.pl returns the content of the BUFR file in ASCII, where each line in the data section corresponds to one BUFR entry. extractBUFRperl::extractBUFR stores each line in an extractBUFRperl::bufrentry object; these objects are easy to iterate over.
-
class extractBUFRperl.bufrentry(string, width)
This is a small helper class. All entries from the bufr file are stored in such bufrentry objects. A bufrentry object contains the specification of one single message, e.g., bufrid, value, description.
Parameters:
- string (str) – A bufrentry is a line as extracted by the Geo::BUFR bufrread.pl perl script.
- width (int) – bufrread.pl allows setting a width for the description column. This width has to be known by bufrentry to be able to properly extract the information from this line.
-
__weakref__
List of weak references to the object (if defined).
Class: bufrdesc¶
The class extractBUFRperl::extractBUFR uses extractBUFRperl::bufrdesc objects to handle the bufr parameter configuration read from the bufr_config.conf file. Each entry (bufrentry) read from the BUFR file has to match a parameter configured in bufr_config.conf; otherwise it is dropped.
For ease of use the configuration in bufr_config.conf is read piece-wise and each config is stored as an extractBUFRperl::bufrdesc object.
-
class extractBUFRperl.bufrdesc(rec, cols)
This is a small helper class. The bufrdesc database table is loaded as a list of such bufrdesc objects, which can easily be iterated over. Each record (each row of the bufrdesc database table) is stored as one object.
Parameters:
- rec (tuple) – A record from the bufrdesc database table. The elements of the tuple are described by the second input argument cols.
- cols (list) – List of str describing the elements in the first argument (rec tuple).
-
__weakref__
List of weak references to the object (if defined).
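The idea behind a record helper built from (rec, cols) can be sketched as follows. This is a hypothetical stand-in, not the bufrdesc class itself; the column names and values are made up for illustration and do not reflect the real bufrdesc table layout.

```python
class BufrDescSketch:
    """Hypothetical stand-in for bufrdesc: one database row as an object."""

    def __init__(self, rec, cols):
        if len(rec) != len(cols):
            raise ValueError("rec and cols must have the same length")
        self._cols = list(cols)
        # Expose every column of the record as an attribute.
        for name, value in zip(cols, rec):
            setattr(self, name, value)

    def __iter__(self):
        # Iterate over (column name, value) pairs.
        return ((name, getattr(self, name)) for name in self._cols)


# Column names and rows below are invented for this example.
cols = ["bufrid", "param", "unit"]
rows = [(12101, "t", "K"), (12103, "td", "K")]
descriptions = [BufrDescSketch(rec, cols) for rec in rows]
print([d.param for d in descriptions])
```

Storing each row as an object keeps the matching code readable, since a field can be accessed by name instead of by its position in the tuple.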