empd_admin.generate_repo module

Module to generate the EMPD-data repository from the postgres database

This module defines the db2repo function, which generates the EMPD-data repository from the postgres database.

Functions

`db2repo`(meta, postgres_dump[, commit, …])	Generate the EMPD-data repository out of a postgres_dump
`fill_repo`(meta, db_url[, root_db, dry_run, …])	Fill the EMPD-data repo with the database in the given URL

empd_admin.generate_repo.db2repo(meta, postgres_dump, commit=False, output=None, dry_run=False, *args, **kwargs)

Generate the EMPD-data repository out of a postgres_dump

Parameters

meta (str) – The path to the local EMPD2 meta data file
postgres_dump (str) – The path to the postgres file (relative to meta). This dump must define a metaViewer table that contains the new meta data
commit (bool) – If True, commit the added files
output (str) –
The path where to save the new meta data. If this is None, the following cases are considered:
1. meta is 'meta.tsv': output will be set to 'update.tsv'
2. meta is anything else: output will be set to meta
dry_run (bool) – If True, do not create any file but only report what would have been saved

Returns

The markdown formatted report

Return type

str
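
The default rule for output described above can be sketched as a small helper. This is only an illustration: resolve_output is a hypothetical name that is not part of empd_admin, and comparing the base name of meta (rather than the full path) is an assumption.

```python
import os.path


def resolve_output(meta, output=None):
    """Mimic the documented default for the ``output`` parameter.

    Hypothetical helper, not part of empd_admin.
    """
    if output is not None:
        # An explicit output path always wins.
        return output
    # Assumption: only the file name of ``meta`` is compared.
    if os.path.basename(meta) == "meta.tsv":
        return "update.tsv"
    # Any other meta file is overwritten in place.
    return meta


print(resolve_output("meta.tsv"))
print(resolve_output("test.tsv"))
print(resolve_output("meta.tsv", output="custom.tsv"))
```

Under these assumptions, the main meta.tsv file is never overwritten by default, while any other meta file is.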

empd_admin.generate_repo.fill_repo(meta, db_url, root_db=None, dry_run=False, meta_data=True, count_data=True, keep=None, how='left', on=None, exclude=[], columns='left', atol=0.001)

Fill the EMPD-data repo with the database in the given URL

Parameters

meta (str) – The path where to save the data
db_url (str) – The url where the postgres database can be accessed. Note that we expect this database to have a 'metaViewer' table
root_db (str) – The url where the EMPD2 postgres database can be accessed. This parameter is only necessary if how != 'left-only'
dry_run (bool) – If True, do not create any file but only report what would have been saved
meta_data (bool) – If True (default), dump the meta data into meta
count_data (bool) – If True (default), dump the pollen counts in the corresponding file of the sample
keep (list) – Columns to keep from the root_df
how (str) –
How to merge the root meta data into the new one. Possibilities are:

inner
use intersection of samples from both frames, similar to a SQL inner join; preserve the order of the left keys.

outer
use union of samples from both frames, similar to a SQL full outer join; sort keys lexicographically.

left (default)
use only samples from the new frame, similar to a SQL left outer join; preserve key order.

right
use only samples from the right frame, similar to a SQL right outer join; preserve key order.
on (list of str) – The names of the columns to compute the diff on. If None, we use the intersection of columns between left and right.
exclude (list of str) – Column names that should be excluded from the diff.
columns (str or list of str) –
The columns of the returned dataframe. It can either be a list of column names to use or one of

leftdiff
To use the columns from left that differ from right

left (default)
To use all columns from left

rightdiff
To use the columns from right that differ from left

right
To use all columns from right

inner
To use the intersection of left and right

bothdiff
To use the differing columns from right and left (columns from right are suffixed with an '_r')

both
To use all columns from left and right (columns from right are suffixed with an '_r')

In any of these cases (except if you specify the column names explicitly), the data frame will include a diff column that contains, for each sample, the names of the columns whose cells differ.
atol (float) – Absolute tolerance to use for numeric columns (see empd_admin.common.NUMERIC_COLS).
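
How on, exclude and atol could interact when the diff column is computed can be sketched for a single sample with plain dictionaries. This is not the code used by fill_repo: diff_columns is a hypothetical helper, the NUMERIC_COLS set below is a made-up stand-in for empd_admin.common.NUMERIC_COLS, and the column names are chosen for illustration only.

```python
import math

# Hypothetical stand-in for empd_admin.common.NUMERIC_COLS.
NUMERIC_COLS = {"Latitude", "Longitude"}


def diff_columns(left_row, right_row, on=None, exclude=(), atol=0.001):
    """Return the names of the columns whose cells differ for one sample.

    Hypothetical helper, not part of empd_admin.
    """
    if on is None:
        # Default: compare the columns present in both rows.
        on = [col for col in left_row if col in right_row]
    differing = []
    for col in on:
        if col in exclude:
            continue
        a, b = left_row[col], right_row[col]
        if col in NUMERIC_COLS:
            # Numeric cells are equal if they agree within atol.
            same = math.isclose(a, b, rel_tol=0.0, abs_tol=atol)
        else:
            same = a == b
        if not same:
            differing.append(col)
    return differing


new = {"Country": "France", "Latitude": 45.0004, "Longitude": 5.0}
root = {"Country": "Germany", "Latitude": 45.0, "Longitude": 5.1}

print(diff_columns(new, root))
print(diff_columns(new, root, exclude=["Country"]))
```

In this sketch the latitude difference of 0.0004 lies within the default atol of 0.001 and is therefore not reported, whereas the longitude difference of 0.1 is.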

Returns

str – The markdown formatted report
list – The filenames that have changed (or would have been changed, if dry_run is True)
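
Example

The effect of the how parameter on which samples end up in the result can be pictured with plain lists of sample names. This is a minimal sketch under stated assumptions: select_samples is a hypothetical helper that is not part of empd_admin, the sample names are invented, and the left frame is taken to be the new meta data while the right frame is the root meta data.

```python
def select_samples(left, right, how="left"):
    """Select sample names according to the documented ``how`` options.

    Hypothetical helper, not part of empd_admin.
    """
    if how == "inner":
        # Intersection, keeping the order of the left keys.
        right_set = set(right)
        return [s for s in left if s in right_set]
    if how == "outer":
        # Union, sorted lexicographically.
        return sorted(set(left) | set(right))
    if how == "left":
        # Only samples from the new (left) frame.
        return list(left)
    if how == "right":
        # Only samples from the right frame.
        return list(right)
    raise ValueError(f"Unknown value for how: {how!r}")


new_samples = ["s3", "s1"]   # invented sample names
root_samples = ["s1", "s2"]

for how in ("inner", "outer", "left", "right"):
    print(how, select_samples(new_samples, root_samples, how))
```

With the default how='left', samples that exist only in the root meta data (here "s2") are left out.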