empd-admin Command Line Interface

usage: empd-admin [-h] [-d DIRECTORY]
                  {test,fix,createdb,rebuild,rebase,finish,accept,unaccept,query,diff,generate,merge-meta,help}
                  ...

Named Arguments

-d, --directory

Path to the local EMPD2/EMPD-data repository.

Default: “.”

Commands

parser

Possible choices: test, fix, createdb, rebuild, rebase, finish, accept, unaccept, query, diff, generate, merge-meta, help

Sub-commands:

test

test the database

empd-admin test [-h] [--collect-only] [-x] [-m MARKEXPR] [-s SampleName] [-v]
                [--maxfail num] [-f] [-e [filename.tsv]] [--no-commit]
                [--skip-ci]
                [EXPRESSION]

Positional Arguments

EXPRESSION

only run tests which match the given substring expression. An expression is a Python-evaluable expression where all names are substring-matched against test names and their parent classes. Example: ‘test_method or test_other’ matches all test functions and classes whose name contains ‘test_method’ or ‘test_other’, while ‘not test_method’ matches those that don’t contain ‘test_method’ in their names. Additionally, keywords are matched to classes and functions containing extra names in their ‘extra_keyword_matches’ set, as well as functions which have names assigned directly to them.
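
The matching rule can be illustrated with a short Python sketch. It is a simplified stand-in, not the code that empd-admin runs: the helper name matches_expression is hypothetical, and only the test name itself is considered (no parent classes or extra keywords).

```python
import re

# Hypothetical helper for illustration only.
# Every identifier except and/or/not is replaced by the result of a
# substring check against the test name; the remaining boolean
# expression is then evaluated.
BOOLEAN_OPERATORS = {"and", "or", "not"}


def matches_expression(expression, test_name):
    def substitute(match):
        word = match.group(0)
        if word in BOOLEAN_OPERATORS:
            return word
        return str(word in test_name)  # becomes "True" or "False"

    boolean_expr = re.sub(r"[A-Za-z_][A-Za-z0-9_]*", substitute, expression)
    # Only True/False/and/or/not (and parentheses) remain at this point.
    return eval(boolean_expr, {"__builtins__": {}})


print(matches_expression("test_method or test_other", "test_other_case"))
print(matches_expression("not test_method", "test_method_case"))
```

The first call matches because ‘test_other’ is a substring of the test name; the second does not because the name contains ‘test_method’.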

Named Arguments

--collect-only

only collect tests, don’t execute them.

Default: False

-x, --exitfirst

exit instantly on first error or failed test.

Default: False

-m

only run tests matching given mark expression. Example: -m ‘mark1 and not mark2’.

-s, --sample

Name of samples to test. If provided, only samples that match the given pattern are tested.

Default: “.*”

-v, --verbose

increase verbosity.

Default: False

--maxfail

exit after first num failures or errors.

Default: 20

-f, --full-report

Print the full test report

Default: False

-e, --extract-failed

Extract the meta data of failed samples into a separate file in the failures directory. Without argument, failed samples will be extracted to failed.tsv.

Default: False

--no-commit

Do not commit the changes.

Default: False

--skip-ci

Do not build the commits with continuous integration. Has no effect if the --no-commit argument is passed as well.

Default: False

fix

fix the database

empd-admin fix [-h] [--collect-only] [-x] [-m MARKEXPR] [-s SampleName] [-v]
               [--no-commit] [--skip-ci]
               [EXPRESSION]

Positional Arguments

EXPRESSION

only run tests which match the given substring expression. An expression is a Python-evaluable expression where all names are substring-matched against test names and their parent classes. Example: ‘test_method or test_other’ matches all test functions and classes whose name contains ‘test_method’ or ‘test_other’, while ‘not test_method’ matches those that don’t contain ‘test_method’ in their names. Additionally, keywords are matched to classes and functions containing extra names in their ‘extra_keyword_matches’ set, as well as functions which have names assigned directly to them.

Named Arguments

--collect-only

only collect tests, don’t execute them.

Default: False

-x, --exitfirst

exit instantly on first error or failed test.

Default: False

-m

only run tests matching given mark expression. Example: -m ‘mark1 and not mark2’.

-s, --sample

Name of samples to test. If provided, only samples that match the given pattern are tested.

Default: “.*”

-v, --verbose

increase verbosity.

Default: False

--no-commit

Do not commit the changes.

Default: False

--skip-ci

Do not build the commits with continuous integration. Has no effect if the --no-commit argument is passed as well.

Default: False

createdb

Create a postgres database out of the data

empd-admin createdb [-h] [-db DATABASE] [-c]

Named Arguments

-db, --database

The name of the database. If not given, a temporary database will be created and deleted afterwards.

-c, --commit

Dump the postgres database into a .sql file

Default: False

rebuild

Rebuild the fixed tables of the postgres database

empd-admin rebuild [-h] [-db DATABASE] [-c]
                   {all,SampleType,Country} [{all,SampleType,Country} ...]

Positional Arguments

tables

Possible choices: all, SampleType, Country

The table name to rebuild.

Named Arguments

-db, --database

The name of the database. If not given, a temporary database will be created and deleted afterwards.

-c, --commit

Dump the postgres database into a .sql file

Default: False

rebase

Merge the master branch of EMPD2/EMPD-data into the current branch to resolve merge conflicts

empd-admin rebase [-h]

finish

Finish this PR and merge the data into meta.tsv

empd-admin finish [-h] [-c] [-nt]

Named Arguments

-c, --commit

Commit the changes

Default: False

-nt, --no-tests

Do not run the tests at the end.

Default: True

accept

Mark incomplete or erroneous meta data as accepted

empd-admin accept [-h] [-e] [-q QUERY] [-m <<metafile>>.tsv] [--no-commit]
                  [--skip-ci]
                  SampleName:Column [SampleName:Column ...]

Positional Arguments

SampleName:Column

The sample name and the column that should be accepted despite being erroneous. For example, use my_sample_a1:Country to not check the Country column for the sample my_sample_a1. SampleName might also be all to accept it for all samples. NOTE: When using the --query argument, the SampleName is ignored.

Named Arguments

-e, --exact

Assume provided sample names to match exactly. Otherwise we expect a regex and search for it in the sample name.

Default: False

-q, --query

Select the samples through an SQLite query instead of the SampleName:Column syntax. If this argument is provided, its value is passed to the WHERE clause of an SQL query. E.g. -q "Country = 'Germany'" will be executed as SELECT SampleName FROM meta WHERE Country = 'Germany'. Note that any SampleName provided in the positional arguments (SampleName:Column) is then ignored.
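
How such a WHERE clause selects samples can be tried out with Python's built-in sqlite3 module. This is only a sketch: the in-memory table and the sample names are made up for illustration, and it does not reproduce how empd-admin builds its own database.

```python
import sqlite3

# Made-up miniature of the meta table (hypothetical sample names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (SampleName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO meta VALUES (?, ?)",
    [
        ("sample_a1", "Germany"),
        ("sample_a2", "France"),
        ("sample_a3", "Germany"),
    ],
)

# The value given to -q is used as the WHERE clause.
where_clause = "Country = 'Germany'"
rows = conn.execute(
    f"SELECT SampleName FROM meta WHERE {where_clause}"
).fetchall()

selected = sorted(name for (name,) in rows)
print(selected)
conn.close()
```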

-m, --meta-file

The meta file to use. If None, the default meta file of the repository is used. The path has to be relative to the root of the repository.

--no-commit

Do not commit the changes.

Default: False

--skip-ci

Do not build the commits with continuous integration. Has no effect if the --no-commit argument is passed as well.

Default: False

Examples

  • Accept wrong countries for all samples:

    empd-admin accept all:Country
    
  • Accept wrong latitudes and longitudes for all samples that start with 'Barboni':

    empd-admin accept Barboni:Latitude Barboni:Longitude
    
  • Accept wrong Temperature for the sample 'Beaudouin_a1':

    empd-admin accept -e Beaudouin_a1:Temperature
    

    Note

    If you skip the -e option above, wrong temperatures would also be accepted for the sample Beaudouin_a10.

  • Accept missing Latitudes and Longitudes:

    empd-admin accept Country -q "Latitude is NULL or Longitude is NULL"
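
The difference between the default regex search and the -e (exact) mode can be sketched in a few lines of Python. The function select_samples and the sample list are hypothetical and only mirror the behaviour described above; they are not taken from the empd-admin source.

```python
import re

# Hypothetical sample names, following the examples above.
sample_names = ["Beaudouin_a1", "Beaudouin_a10", "Barboni_a1"]


def select_samples(pattern, names, exact=False):
    """Return the names selected by pattern.

    exact=True compares whole names; otherwise the pattern is treated
    as a regular expression and searched for anywhere in the name.
    """
    if exact:
        return [name for name in names if name == pattern]
    return [name for name in names if re.search(pattern, name)]


print(select_samples("Beaudouin_a1", sample_names))
print(select_samples("Beaudouin_a1", sample_names, exact=True))
```

Because a regex search matches anywhere in the name, a pattern such as ^Barboni would be needed to restrict the match strictly to names that start with Barboni.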
    

unaccept

Reverse the acceptance of incomplete or erroneous meta data.

empd-admin unaccept [-h] [-e] [-q QUERY] [-m <<metafile>>.tsv] [--no-commit]
                    [--skip-ci]
                    SampleName:Column [SampleName:Column ...]

Positional Arguments

SampleName:Column

The sample name and the column that should be rejected if it is erroneous. For example, use my_sample_a1:Country to check the Country column for the sample my_sample_a1 again. SampleName and/or Column might also be all to enable the tests for all the samples and/or meta data fields again. NOTE: When using the --query argument, the SampleName is ignored.

Named Arguments

-e, --exact

Assume provided sample names to match exactly. Otherwise we expect a regex and search for it in the sample name.

Default: False

-q, --query

Select the samples through an SQLite query instead of the SampleName:Column syntax. If this argument is provided, its value is passed to the WHERE clause of an SQL query. E.g. -q "Country = 'Germany'" will be executed as SELECT SampleName FROM meta WHERE Country = 'Germany'. Note that any SampleName provided in the positional arguments (SampleName:Column) is then ignored.

-m, --meta-file

The meta file to use. If None, the default meta file of the repository is used. The path has to be relative to the root of the repository.

--no-commit

Do not commit the changes.

Default: False

--skip-ci

Do not build the commits with continuous integration. Has no effect if the --no-commit argument is passed as well.

Default: False

Examples

  • Do not accept any failure for any column:

    empd-admin unaccept all:all
    
  • Do not accept any failure for latitudes or longitudes with samples that start with 'Barboni':

    empd-admin unaccept Barboni:Latitude Barboni:Longitude
    
  • Do not accept wrong Temperature for the sample 'Beaudouin_a1':

    empd-admin unaccept -e Beaudouin_a1:Temperature
    

    Note

    If you skip the -e option above, wrong temperatures would also no longer be accepted for the sample Beaudouin_a10!

  • Do not accept any failure for samples where the Country equals “Germany”:

    empd-admin unaccept Country -q "Country = 'Germany'"
    

query

Query and display the meta data

empd-admin query [-h] [-d column [column ...]] [-count] [-m <<metafile>>.tsv]
                 [-c] [-o OUTPUT]
                 query [columns [columns ...]]

Positional Arguments

query

The query that is passed to the pandas.DataFrame.query method to select a subsection of the data. See the examples below for further details.

columns

The columns in the metadata to show. The default is notnull, which displays only columns that have at least one valid value. Set it to ‘all’ to display every column.

Default: “notnull”
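
As a rough illustration of the notnull default, the following Python sketch keeps only those columns that hold at least one non-null value. The helper notnull_columns and the rows are hypothetical; the real command operates on the repository's meta file.

```python
# Made-up query result: Temperature is empty for every row.
rows = [
    {"SampleName": "sample_a1", "Country": "Germany", "Temperature": None},
    {"SampleName": "sample_a2", "Country": None, "Temperature": None},
]


def notnull_columns(rows):
    """Return the columns that contain at least one non-null value."""
    columns = list(rows[0])
    return [
        column
        for column in columns
        if any(row[column] is not None for row in rows)
    ]


print(notnull_columns(rows))
```

Country is kept because one row has a value, while Temperature is dropped because it is null everywhere.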

Named Arguments

-d, --distinct

Be distinct on the given columns (i.e. drop duplicates). It can also be all to consider all columns.

Default: False

-count

Display the number of not-null values (i.e. COUNT(column)) in the selected columns instead of the data table.

Default: False
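
The COUNT(column) semantics mentioned above (null values are not counted) can be checked with Python's built-in sqlite3 module. The table and values below are made up for illustration and are not part of empd-admin.

```python
import sqlite3

# Made-up table with one missing latitude.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (SampleName TEXT, Latitude REAL)")
conn.executemany(
    "INSERT INTO meta VALUES (?, ?)",
    [("sample_a1", 48.1), ("sample_a2", None), ("sample_a3", 52.5)],
)

# COUNT(*) counts rows, COUNT(Latitude) counts only non-null latitudes.
count_rows, count_latitude = conn.execute(
    "SELECT COUNT(*), COUNT(Latitude) FROM meta"
).fetchone()

print(count_rows, count_latitude)
conn.close()
```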

-m, --meta-file

The meta file to use. If None, the default meta file of the repository is used. The path has to be relative to the root of the repository.

-c, --commit

Commit the generated file.

Default: False

-o, --output

Save the query in the queries directory. If not set but --commit is set, then it will be saved as queries/query.tsv.

Examples

Display the samples in Germany:

empd-admin query "Country = 'Germany'"

Display only the sample names of samples in Germany:

empd-admin query "Country == 'Germany'" SampleName

Display the samples with a ‘forest’ SampleContext:

empd-admin query "SampleContext LIKE '%forest%'"

Display the distinct countries of a certain data contribution:

empd-admin query -d "SampleName LIKE '%Barboni_%'"

diff

Compare two EMPD meta files

empd-admin diff [-h] [-how {inner,outer,left,right}] [-on ON [ON ...]]
                [-atol ATOL] [-e EXCLUDE [EXCLUDE ...]]
                [-col [COLUMN [COLUMN ...]]] [-c] [-o OUTPUT] [-max MAXDIFF]
                [left] [right]

Positional Arguments

left

The first meta file. If None, the meta file of this repository will be used

right

The second meta file. If None, the meta file of this repository will be used. If that is the same as left, we use the meta.tsv of the repository or https://raw.githubusercontent.com/EMPD2/EMPD-data/master/meta.tsv

Named Arguments

-how

Possible choices: inner, outer, left, right

Specify which samples to test. inner means the intersection of left and right, outer is the outer product of left and right, and so on.

Default: “inner”
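
The four choices can be pictured with plain Python sets. This sketch assumes the usual join semantics known from SQL and data-frame merges, where outer yields every sample found in either file; the function name and the sample names are hypothetical.

```python
# Made-up sample names in the two meta files.
left = {"sample_a1", "sample_a2", "sample_a3"}
right = {"sample_a2", "sample_a3", "sample_a4"}


def samples_to_compare(left, right, how="inner"):
    """Select sample names under the assumed join semantics."""
    if how == "inner":
        return left & right  # samples present in both files
    if how == "outer":
        return left | right  # samples present in either file
    if how == "left":
        return set(left)  # all samples of the first file
    if how == "right":
        return set(right)  # all samples of the second file
    raise ValueError(f"unknown value for how: {how!r}")


print(sorted(samples_to_compare(left, right, "inner")))
print(sorted(samples_to_compare(left, right, "outer")))
```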

-on

The columns to use for computing the change. They have to be in left and right. If None, all columns will be used.

-atol

Absolute tolerance to use for numeric columns.

Default: 0.001
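
An absolute tolerance means that two numbers count as equal when their difference does not exceed the tolerance. A minimal Python sketch follows, assuming a purely absolute comparison; the function differs is hypothetical, and whether empd-admin treats the boundary as inclusive is not stated here.

```python
import math


def differs(a, b, atol=0.001):
    """Return True if a and b differ by more than atol."""
    # rel_tol=0.0 disables the relative part of math.isclose,
    # so only the absolute tolerance is applied.
    return not math.isclose(a, b, rel_tol=0.0, abs_tol=atol)


print(differs(48.1234, 48.1239))  # difference of about 0.0005
print(differs(48.12, 48.13))      # difference of about 0.01
```

With the default tolerance the first pair is treated as unchanged and the second as changed.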

-e, --exclude

The columns to exclude for computing the change. They will be removed from the columns set by the on parameter.

Default: []
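
How -on and -e interact can be sketched as follows. The helper columns_to_compare is hypothetical, and treating "all columns" as the columns shared by both files is an assumption made for this illustration.

```python
def columns_to_compare(left_columns, right_columns, on=None, exclude=()):
    """Return the columns used for computing the change."""
    if on is None:
        # Assumption: without -on, use the columns found in both files.
        on = [column for column in left_columns if column in right_columns]
    # Columns listed with -e are removed from the selection.
    return [column for column in on if column not in exclude]


columns = ["SampleName", "Country", "Latitude"]
print(columns_to_compare(columns, columns, exclude=["Latitude"]))
print(columns_to_compare(columns, columns, on=["Country", "Latitude"],
                         exclude=["Latitude"]))
```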

-col, --columns

The columns for the output. Can be leftdiff, to use the differing columns from left, left to use all columns from left, rightdiff to use differing columns from right, right to use all columns from right, inner to use the intersection of left and right, nothing to not display any columns, or a list of columns to display. Alternatively it can be both to use the columns from left and right, or bothdiff, to use the changed columns from left and right. The columns from right will then be suffixed with an _r.

Default: [‘leftdiff’]

-c, --commit

Commit the generated file.

Default: False

-o, --output

Save the difference in the queries directory. If not set but --commit is set, then it will be saved as queries/diff.tsv.

-max, --maxdiff

The maximum number of lines to print to stdout.

Default: 200

generate

Generate the EMPD data out of a postgres dump

empd-admin generate [-h] [-o OUTPUT] [-d] [--no-meta] [--no-counts]
                    [-k COLUMN [COLUMN ...]] [-how {inner,outer,left,right}]
                    [-on ON [ON ...]] [-atol ATOL] [-e EXCLUDE [EXCLUDE ...]]
                    [-col [COLUMN [COLUMN ...]]] [-c]
                    postgres_dump

Positional Arguments

postgres_dump

The name of the postgres dump, relative to the postgres folder

Named Arguments

-o, --output

Save the metadata to the given file. If not set, the meta data file of the repository will be used.

-d, --dry-run

Perform a dry run and do not save anything to disk

Default: False

--no-meta

If set, do not modify the meta data

Default: True

--no-counts

If set, do not modify the pollen data files

Default: True

-k, --keep

Keep the specified columns from meta.tsv

-how

Possible choices: inner, outer, left, right

Specify which samples to test. inner means the intersection of left and right, outer is the outer product of left and right, and so on.

Default: “left”

-on

The columns to use for computing the change. They have to be in left and right. If None, all columns will be used.

-atol

Absolute tolerance to use for numeric columns.

Default: 0.001

-e, --exclude

The columns to exclude for computing the change. They will be removed from the columns set by the on parameter.

Default: []

-col, --columns

The columns for the output. Can be leftdiff, to use the differing columns from left, left to use all columns from left, rightdiff to use differing columns from right, right to use all columns from right, inner to use the intersection of left and right, nothing to not display any columns, or a list of columns to display. Alternatively it can be both to use the columns from left and right, or bothdiff, to use the changed columns from left and right. The columns from right will then be suffixed with an _r.

Default: [‘left’]

-c, --commit

Commit the generated file.

Default: False

merge-meta

Merge two metafiles

empd-admin merge-meta [-h] [--no-commit] src [target]

Positional Arguments

src

The tab-separated source file that shall be merged into the target file

target

The meta file into which src should be merged. If not set, it is either the new meta file in the root directory of the repository (if it exists) or meta.tsv.

Named Arguments

--no-commit

Do not commit the merge.

Default: True

help

Print the help on a command

empd-admin help [-h]
                [{test,fix,createdb,rebuild,rebase,finish,accept,unaccept,query,diff,generate,merge-meta,help}]

Positional Arguments

command

Possible choices: test, fix, createdb, rebuild, rebase, finish, accept, unaccept, query, diff, generate, merge-meta, help

Command for which to request the help