empd_admin.generate_repo module

Module to generate the EMPD-data repository from the postgres database

This module defines the db2repo function, which generates the EMPD-data repository from the postgres database

Functions

db2repo(meta, postgres_dump[, commit, …])

Generate the EMPD-data repository out of a postgres_dump

fill_repo(meta, db_url[, root_db, dry_run, …])

Fill the EMPD-data repo with the database in the given URL

empd_admin.generate_repo.db2repo(meta, postgres_dump, commit=False, output=None, dry_run=False, *args, **kwargs)

Generate the EMPD-data repository out of a postgres_dump

Parameters
  • meta (str) – The path to the local EMPD2 meta data file

  • postgres_dump (str) – The path to the postgres file (relative to meta). This dump must define a metaViewer table that contains the new meta data

  • commit (bool) – If True, commit the added files

  • output (str) –

    The path where the new meta data is saved. If this is None, the following cases are considered:

    1. meta is 'meta.tsv': output will be set to 'update.tsv'

    2. meta is anything else: output will be set to meta

  • dry_run (bool) – If True, do not create any file but only report what would have been saved

Returns

The markdown formatted report

Return type

str
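
The fallback for output described above can be sketched in a few lines. This is only an illustration, not the code used by db2repo: the helper name resolve_output is hypothetical, and it assumes that only the file name of meta is compared against 'meta.tsv' and that 'update.tsv' is placed in the same directory.

```python
import os.path


def resolve_output(meta, output=None):
    """Hypothetical helper mirroring the documented default for ``output``."""
    if output is not None:
        # An explicitly given path always wins
        return output
    directory, name = os.path.split(meta)
    if name == "meta.tsv":
        # Assumption: only the file name is compared, and update.tsv is
        # written next to meta.tsv
        return os.path.join(directory, "update.tsv")
    # Any other meta file is used as the output itself
    return meta


print(resolve_output("meta.tsv"))
print(resolve_output("my-contribution.tsv"))
```

Under these assumptions, the first call yields 'update.tsv' and the second returns the path it was given.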

empd_admin.generate_repo.fill_repo(meta, db_url, root_db=None, dry_run=False, meta_data=True, count_data=True, keep=None, how='left', on=None, exclude=[], columns='left', atol=0.001)

Fill the EMPD-data repo with the database in the given URL

Parameters
  • meta (str) – The path where the data is saved

  • db_url (str) – The URL where the postgres database can be accessed. Note that we expect this database to have a 'metaViewer' table

  • root_db (str) – The URL where the EMPD2 postgres database can be accessed. This parameter is only necessary when how != 'left-only'

  • dry_run (bool) – If True, do not create any file but only report what would have been saved

  • meta_data (bool) – If True (default), dump the meta data into meta

  • count_data (bool) – If True (default), dump the pollen counts in the corresponding file of the sample

  • keep (list) – Columns to keep from the root_df

  • how (str) –

    How to merge the root meta data into the new one. Possibilities are

    inner

    use intersection of samples from both frames, similar to a SQL inner join; preserve the order of the left keys.

    outer

    use union of samples from both frames, similar to a SQL full outer join; sort keys lexicographically.

    left (default)

    use only samples from the new frame, similar to a SQL left outer join; preserve key order.

    right

    use only samples from right frame, similar to a SQL right outer join; preserve key order.

  • on (list of str) – The names of the columns to compute the diff on. If None, we use the intersection of columns between left and right.

  • exclude (list of str) – Column names that should be excluded in the diff.

  • columns (str or list of str) –

    The columns of the returned dataframe. It can either be a list of column names to use or one of

    leftdiff

    To use the columns from left that differ from right

    left (default)

    To use all columns from left

    rightdiff

    To use the columns from right that differ from left

    right

    To use all columns from right

    inner

    To use the intersection of left and right

    bothdiff

    To use the differing columns from right and left (columns from right are suffixed with an '_r')

    both

    To use all columns from left and right (columns from right are suffixed with an '_r')

    In any of these cases (except if you specify the column names explicitly), the data frame will include a diff column that contains, for each sample, the column names of the differing cells.

  • atol (float) – Absolute tolerance to use for numeric columns (see empd_admin.common.NUMERIC_COLS).
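
How on, exclude and atol interact when the diff column is computed can be illustrated with plain dictionaries. This is a minimal sketch under stated assumptions, not the implementation in empd_admin: the function diff_columns, the numeric_cols argument (a stand-in for empd_admin.common.NUMERIC_COLS) and the column names in the example rows are all hypothetical.

```python
import math


def diff_columns(left_row, right_row, on=None, exclude=(),
                 numeric_cols=(), atol=0.001):
    """Return the names of the columns whose cells differ between two rows."""
    if on is None:
        # Default: the intersection of the columns, in the order of the left row
        on = [col for col in left_row if col in right_row]
    differing = []
    for col in on:
        if col in exclude:
            continue
        left, right = left_row[col], right_row[col]
        if col in numeric_cols:
            # Numeric cells count as equal if they agree within atol
            equal = math.isclose(left, right, rel_tol=0.0, abs_tol=atol)
        else:
            equal = left == right
        if not equal:
            differing.append(col)
    return differing


# Illustrative rows for one sample: the new meta data and the root meta data
new = {"Country": "Germany", "Latitude": 52.5204, "Longitude": 13.40, "Notes": "a"}
root = {"Country": "Germany", "Latitude": 52.5200, "Longitude": 13.45, "Notes": "b"}

print(diff_columns(new, root, exclude=["Notes"],
                   numeric_cols=["Latitude", "Longitude"]))
```

In this example the latitudes differ by less than atol and are treated as equal, the Notes column is excluded, and only Longitude is reported as differing.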

Returns

  • str – The markdown formatted report

  • list – The filenames that have changed (or would have been changed, if dry_run is True)
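
The effect of the how parameter on the set of samples can be illustrated with plain lists of sample names. This is a sketch of the documented semantics only, not the code behind fill_repo: the function select_samples and the sample names are hypothetical, with left standing for the new meta data and right for the root meta data.

```python
def select_samples(left, right, how="left"):
    """Return the sample names kept for the given merge strategy."""
    right_set = set(right)
    if how == "inner":
        # Intersection of both frames, in the order of the left keys
        return [name for name in left if name in right_set]
    if how == "outer":
        # Union of both frames, sorted lexicographically
        return sorted(set(left) | right_set)
    if how == "left":
        # Only samples from the new frame, order preserved
        return list(left)
    if how == "right":
        # Only samples from the right frame, order preserved
        return list(right)
    raise ValueError("how must be one of 'inner', 'outer', 'left', 'right'")


new_samples = ["site_b", "site_a"]
root_samples = ["site_a", "site_c"]

for how in ("inner", "outer", "left", "right"):
    print(how, select_samples(new_samples, root_samples, how))
```

With these inputs, 'inner' keeps only site_a, 'outer' returns all three names in sorted order, and 'left' and 'right' return the respective input lists unchanged.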