empd_admin.generate_repo module
Module to generate the EMPD-data repository from the postgres database
This module defines the db2repo function that generates the EMPD-data repository out of the postgres database.
Functions
| db2repo | Generate the EMPD-data repository out of a postgres_dump |
| fill_repo | Fill the EMPD-data repo with the database in the given URL |
empd_admin.generate_repo.db2repo(meta, postgres_dump, commit=False, output=None, dry_run=False, *args, **kwargs)
Generate the EMPD-data repository out of a postgres_dump
- Parameters
meta (str) – The path to the local EMPD2 meta data file
postgres_dump (str) – The path to the postgres file (relative to meta). This dump must define a metaViewer table that contains the new meta data
commit (bool) – If True, commit the added files
output (str) – The path where to save the new meta data. If this is None, the following cases are considered:
- meta is 'meta.tsv': output will be set to 'update.tsv'
- meta is anything else: output will be set to meta
dry_run (bool) – If True, do not create any file but only report what would have been saved
- Returns
The markdown formatted report
- Return type
str
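The default for output described above can be sketched as a small helper. This is only an illustration of the documented rule, not the code used by db2repo: the name resolve_output is hypothetical, and comparing the file name (rather than the full path) against 'meta.tsv' is an assumption.

```python
from pathlib import PurePosixPath


def resolve_output(meta, output=None):
    """Hypothetical helper mirroring the documented default for `output`."""
    if output is not None:
        # An explicitly given path is used unchanged.
        return output
    meta_path = PurePosixPath(meta)
    if meta_path.name == 'meta.tsv':
        # Assumption: 'update.tsv' is placed next to 'meta.tsv'.
        return str(meta_path.with_name('update.tsv'))
    # For any other meta file, output is set to meta itself.
    return meta


print(resolve_output('meta.tsv'))
print(resolve_output('custom.tsv'))
print(resolve_output('meta.tsv', output='report.tsv'))
```

Under these assumptions the three calls yield 'update.tsv', 'custom.tsv' and 'report.tsv' respectively.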
empd_admin.generate_repo.fill_repo(meta, db_url, root_db=None, dry_run=False, meta_data=True, count_data=True, keep=None, how='left', on=None, exclude=[], columns='left', atol=0.001)
Fill the EMPD-data repo with the database in the given URL
- Parameters
meta (str) – The path where to save the data
db_url (str) – The url where the postgres database can be accessed. Note that we expect this database to have a 'metaViewer' table
root_db (str) – The url where the EMPD2 postgres database can be accessed. This parameter is only necessary where how != 'left-only'
dry_run (bool) – If True, do not create any file but only report what would have been saved
meta_data (bool) – If True (default), dump the meta data into meta
count_data (bool) – If True (default), dump the pollen counts in the corresponding file of the sample
keep (list) – Columns to keep from the root_df
how (str) –
How to merge the root meta data into the new one. Possibilities are
- inner
use intersection of samples from both frames, similar to a SQL inner join; preserve the order of the left keys.
- outer
use union of samples from both frames, similar to a SQL full outer join; sort keys lexicographically.
- left (default)
use only samples from the new frame, similar to a SQL left outer join; preserve key order.
- right
use only samples from the right frame, similar to a SQL right outer join; preserve key order.
on (list of str) – The names of the columns to compute the diff on. If None, we use the intersection of columns between left and right.
exclude (list of str) – Column names that should be excluded from the diff.
columns (str or list of str) –
The columns of the returned dataframe. It can either be a list of column names to use or one of
- leftdiff (default)
To use the columns from left that differ from right
- left
To use all columns from left
- rightdiff
To use the columns from right that differ from left
- right
To use all columns from right
- inner
To use the intersection of left and right
- bothdiff
To use the differing columns from right and left (columns from right are suffixed with an '_r')
- both
To use all columns from left and right (columns from right are suffixed with an '_r')
In any of these cases (except if you specify the column names explicitly), the data frame will include a diff column that contains, for each sample, the column names of the differing cells.
atol (float) – Absolute tolerance to use for numeric columns (see empd_admin.common.NUMERIC_COLS).
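To make the four how options concrete, the following pure-Python sketch shows which samples would survive each kind of merge. It only illustrates the join semantics described for how. The function select_samples and the sample names are hypothetical and are not part of empd_admin.

```python
def select_samples(left, right, how='left'):
    """Return the sample names kept by each documented merge strategy.

    `left` holds the samples of the new meta data, `right` those of the
    root meta data (both are hypothetical example inputs).
    """
    right_set = set(right)
    if how == 'inner':
        # Intersection, keeping the order of the left keys.
        return [s for s in left if s in right_set]
    if how == 'outer':
        # Union, sorted lexicographically.
        return sorted(set(left) | right_set)
    if how == 'left':
        # Only the samples of the new frame, order preserved.
        return list(left)
    if how == 'right':
        # Only the samples of the right frame, order preserved.
        return list(right)
    raise ValueError(f"Unknown merge strategy: {how!r}")


new_samples = ['site_b', 'site_a', 'site_c']
root_samples = ['site_c', 'site_d']

for how in ('inner', 'outer', 'left', 'right'):
    print(how, select_samples(new_samples, root_samples, how))
```

With these inputs, 'inner' keeps only site_c, 'outer' returns all four samples in sorted order, and 'left' and 'right' return the new and root samples unchanged.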
- Returns
str – The markdown formatted report
list – The filenames that have changed (or would have been changed, if dry_run is True)
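The role of atol in the diff can be illustrated with a minimal sketch. It assumes that numeric columns are compared with an absolute tolerance and all other columns by equality. The helper diff_columns, the column names and the numeric_cols set are hypothetical stand-ins (the real set of numeric columns is empd_admin.common.NUMERIC_COLS, whose content is not reproduced here).

```python
import math


def diff_columns(left_row, right_row, numeric_cols, atol=0.001):
    """Return the names of the columns whose cells differ (hypothetical helper)."""
    differing = []
    for col, left_value in left_row.items():
        if col not in right_row:
            # Only columns present in both rows are compared.
            continue
        right_value = right_row[col]
        if col in numeric_cols:
            # Numeric cells count as equal if they agree within atol.
            same = math.isclose(left_value, right_value,
                                rel_tol=0.0, abs_tol=atol)
        else:
            same = left_value == right_value
        if not same:
            differing.append(col)
    return differing


new_row = {'Latitude': 45.0001, 'Longitude': 6.5, 'Country': 'France'}
root_row = {'Latitude': 45.0005, 'Longitude': 6.6, 'Country': 'France'}

print(diff_columns(new_row, root_row, numeric_cols={'Latitude', 'Longitude'}))
```

Here the latitudes differ by less than the default tolerance of 0.001 and are treated as equal, so only 'Longitude' is reported as differing.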