empd-admin Command Line Interface¶
usage: empd-admin [-h] [-d DIRECTORY]
{test,fix,createdb,rebuild,rebase,finish,accept,unaccept,query,diff,generate,merge-meta,help}
...
Named Arguments¶
- -d, --directory
Path to the local EMPD2/EMPD-data repository.
Default: “.”
Commands¶
- parser
Possible choices: test, fix, createdb, rebuild, rebase, finish, accept, unaccept, query, diff, generate, merge-meta, help
Sub-commands:¶
test¶
test the database
empd-admin test [-h] [--collect-only] [-x] [-m MARKEXPR] [-s SampleName] [-v]
[--maxfail num] [-f] [-e [filename.tsv]] [--no-commit]
[--skip-ci]
[EXPRESSION]
Positional Arguments¶
- EXPRESSION
only run tests which match the given substring expression. An expression is a Python evaluable expression where all names are substring-matched against test names and their parent classes. Example: -k ‘test_method or test_other’ matches all test functions and classes whose name contains ‘test_method’ or ‘test_other’, while -k ‘not test_method’ matches those that don’t contain ‘test_method’ in their names. Additionally, keywords are matched to classes and functions containing extra names in their ‘extra_keyword_matches’ set, as well as functions which have names assigned directly to them.
Named Arguments¶
- --collect-only
only collect tests, don’t execute them.
Default: False
- -x, --exitfirst
exit instantly on first error or failed test.
Default: False
- -m
only run tests matching given mark expression. Example: -m ‘mark1 and not mark2’.
- -s, --sample
Name of samples to test. If provided, only samples that match the given pattern are tested.
Default: “.*”
- -v, --verbose
increase verbosity.
Default: False
- --maxfail
exit after first num failures or errors.
Default: 20
- -f, --full-report
Print the full test report
Default: False
- -e, --extract-failed
Extract the meta data of failed samples into a separate file in the failures directory. Without argument, failed samples will be extracted to failed.tsv.
Default: False
- --no-commit
Do not commit the changes.
Default: False
- --skip-ci
Do not build the commits with the continuous integration. Has no effect if the --no-commit argument is passed as well.
Default: False
fix¶
fix the database
empd-admin fix [-h] [--collect-only] [-x] [-m MARKEXPR] [-s SampleName] [-v]
[--no-commit] [--skip-ci]
[EXPRESSION]
Positional Arguments¶
- EXPRESSION
only run tests which match the given substring expression. An expression is a Python evaluable expression where all names are substring-matched against test names and their parent classes. Example: -k ‘test_method or test_other’ matches all test functions and classes whose name contains ‘test_method’ or ‘test_other’, while -k ‘not test_method’ matches those that don’t contain ‘test_method’ in their names. Additionally, keywords are matched to classes and functions containing extra names in their ‘extra_keyword_matches’ set, as well as functions which have names assigned directly to them.
Named Arguments¶
- --collect-only
only collect tests, don’t execute them.
Default: False
- -x, --exitfirst
exit instantly on first error or failed test.
Default: False
- -m
only run tests matching given mark expression. Example: -m ‘mark1 and not mark2’.
- -s, --sample
Name of samples to test. If provided, only samples that match the given pattern are tested.
Default: “.*”
- -v, --verbose
increase verbosity.
Default: False
- --no-commit
Do not commit the changes.
Default: False
- --skip-ci
Do not build the commits with the continuous integration. Has no effect if the --no-commit argument is passed as well.
Default: False
createdb¶
Create a postgres database out of the data
empd-admin createdb [-h] [-db DATABASE] [-c]
Named Arguments¶
- -db, --database
The name of the database. If not given, a temporary database will be created and deleted afterwards.
- -c, --commit
Dump the postgres database into a .sql file
Default: False
rebuild¶
Rebuild the fixed tables of the postgres database
empd-admin rebuild [-h] [-db DATABASE] [-c]
{all,SampleType,Country} [{all,SampleType,Country} ...]
Positional Arguments¶
- tables
Possible choices: all, SampleType, Country
The table name to rebuild.
Named Arguments¶
- -db, --database
The name of the database. If not given, a temporary database will be created and deleted afterwards.
- -c, --commit
Dump the postgres database into a .sql file
Default: False
rebase¶
Merge the master branch of EMPD2/EMPD-data into the current branch to resolve merge conflicts
empd-admin rebase [-h]
finish¶
Finish this PR and merge the data into meta.tsv
empd-admin finish [-h] [-c] [-nt]
Named Arguments¶
- -c, --commit
Commit the changes
Default: False
- -nt, --no-tests
Do not run the tests at the end.
Default: True
accept¶
Mark incomplete or erroneous meta data as accepted
empd-admin accept [-h] [-e] [-q QUERY] [-m <<metafile>>.tsv] [--no-commit]
[--skip-ci]
SampleName:Column [SampleName:Column ...]
Positional Arguments¶
- SampleName:Column
The sample name and the column that should be accepted despite being erroneous. For example, use my_sample_a1:Country to not check the Country column for the sample my_sample_a1. SampleName might also be all to accept it for all samples. NOTE: When using the --query argument, the SampleName is ignored.
Named Arguments¶
- -e, --exact
Assume provided sample names to match exactly. Otherwise we expect a regex and search for it in the sample name.
Default: False
- -q, --query
Select the samples through an SQLite query instead of the SampleName:Column syntax. If this argument is provided, the given query is passed to the WHERE clause of an SQL query. E.g. -q "Country = 'Germany'" will be executed as SELECT SampleName FROM meta WHERE Country = 'Germany'. Note that any SampleName provided in the positional arguments (SampleName:Column) is then ignored.
- -m, --meta-file
The meta file to use. If None, the default meta file of repository is used. The path has to be relative to the root of the repository.
- --no-commit
Do not commit the changes.
Default: False
- --skip-ci
Do not build the commits with the continuous integration. Has no effect if the --no-commit argument is passed as well.
Default: False
Examples¶
Accept wrong countries for all samples:
empd-admin accept all:Country
Accept wrong latitudes and longitudes for all samples that start with 'Barboni':
empd-admin accept Barboni:Latitude Barboni:Longitude
Accept wrong Temperature for the sample 'Beaudouin_a1':
empd-admin accept -e Beaudouin_a1:Temperature
Note
If you skip the -e option above, wrong temperatures would also be accepted for the sample Beaudouin_a10.
Accept missing Latitudes and Longitudes:
empd-admin accept Country -q "Latitude is NULL or Longitude is NULL"
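The difference between the default regex matching and the -e (exact) matching of sample names can be illustrated with a small Python sketch. This is not the empd-admin implementation: the helper `matching_samples` and the sample list are hypothetical, and the sketch only assumes that "search for it in the sample name" behaves like Python's `re.search`.

```python
import re


def matching_samples(pattern, sample_names, exact=False):
    """Return the sample names selected by `pattern` (hypothetical helper).

    With exact=True the name must equal the pattern. Otherwise the pattern
    is treated as a regular expression that is searched within the name.
    """
    if exact:
        return [name for name in sample_names if name == pattern]
    regex = re.compile(pattern)
    return [name for name in sample_names if regex.search(name)]


# Made-up sample names for illustration only
samples = ["Beaudouin_a1", "Beaudouin_a10", "Barboni_a1"]

regex_hits = matching_samples("Beaudouin_a1", samples)
exact_hits = matching_samples("Beaudouin_a1", samples, exact=True)

print(regex_hits)  # the regex search also selects Beaudouin_a10
print(exact_hits)  # the exact comparison selects Beaudouin_a1 only
```

Under this assumption, a pattern without anchors selects every sample whose name contains it, which is why the exact option matters for names that are prefixes of other names.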
unaccept¶
Reverse the acceptance of incomplete or erroneous meta data.
empd-admin unaccept [-h] [-e] [-q QUERY] [-m <<metafile>>.tsv] [--no-commit]
[--skip-ci]
SampleName:Column [SampleName:Column ...]
Positional Arguments¶
- SampleName:Column
The sample name and the column that should be rejected if it is erroneous. For example, use my_sample_a1:Country to check the Country column for the sample my_sample_a1 again. SampleName and/or Column might also be all to enable the tests for all the samples and/or meta data fields again. NOTE: When using the --query argument, the SampleName is ignored.
Named Arguments¶
- -e, --exact
Assume provided sample names to match exactly. Otherwise we expect a regex and search for it in the sample name.
Default: False
- -q, --query
Select the samples through an SQLite query instead of the SampleName:Column syntax. If this argument is provided, the given query is passed to the WHERE clause of an SQL query. E.g. -q "Country = 'Germany'" will be executed as SELECT SampleName FROM meta WHERE Country = 'Germany'. Note that any SampleName provided in the positional arguments (SampleName:Column) is then ignored.
- -m, --meta-file
The meta file to use. If None, the default meta file of repository is used. The path has to be relative to the root of the repository.
- --no-commit
Do not commit the changes.
Default: False
- --skip-ci
Do not build the commits with the continuous integration. Has no effect if the --no-commit argument is passed as well.
Default: False
Examples¶
Do not accept any failure for any column:
empd-admin unaccept all:all
Do not accept any failure for latitudes or longitudes with samples that start with 'Barboni':
empd-admin unaccept Barboni:Latitude Barboni:Longitude
Do not accept wrong Temperature for the sample 'Beaudouin_a1':
empd-admin unaccept -e Beaudouin_a1:Temperature
Note
If you skip the exact parameter above, wrong temperatures would also no longer be accepted for the sample Beaudouin_a10!
Do not accept any failure for samples where the Country equals “Germany”:
empd-admin unaccept Country -q "Country = 'Germany'"
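How a -q condition ends up in the WHERE clause of `SELECT SampleName FROM meta WHERE ...` can be sketched with Python's built-in `sqlite3` module. The table contents and the helper `select_samples` are made up for illustration, and the sketch assumes nothing about how empd-admin loads the meta file into SQLite.

```python
import sqlite3

# In-memory database with a tiny, made-up meta table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (SampleName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO meta VALUES (?, ?)",
    [
        ("sample_a1", "Germany"),
        ("sample_a2", "France"),
        ("sample_a3", "Germany"),
    ],
)


def select_samples(connection, where):
    """Return the sample names selected by a WHERE condition (hypothetical helper).

    The condition is pasted into the statement as-is, so it has to come
    from a trusted source.
    """
    sql = "SELECT SampleName FROM meta WHERE " + where
    return [row[0] for row in connection.execute(sql)]


german_samples = select_samples(conn, "Country = 'Germany'")
print(sorted(german_samples))
```

The selected sample names then take the place of the SampleName part of the positional arguments, which is why that part is ignored when a query is given.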
query¶
Query and display the meta data
empd-admin query [-h] [-d column [column ...]] [-count] [-m <<metafile>>.tsv]
[-c] [-o OUTPUT]
query [columns [columns ...]]
Positional Arguments¶
- query
The query that is passed to the pandas.DataFrame.query method to select a subsection of the data. See the examples below for further details.
- columns
The columns in the metadata to show. The default is notnull, to only display columns that have at least one valid value. You can change this by setting it to ‘all’
Default: “notnull”
Named Arguments¶
- -d, --distinct
Be distinct on the given columns (i.e. drop duplicates). It can also be all to consider all columns.
Default: False
- -count
Display the number of not-null values (i.e. COUNT(column)) in the selected columns instead of the data table.
Default: False
- -m, --meta-file
The meta file to use. If None, the default meta file of repository is used. The path has to be relative to the root of the repository.
- -c, --commit
Commit the generated file.
Default: False
- -o, --output
Save the query in the queries directory. If not set but --commit is set, then it will be saved as queries/query.tsv.
Examples¶
Display the samples in Germany:
empd-admin query "Country = 'Germany'"
Display only the sample names of samples in Germany:
empd-admin query "Country == 'Germany'" SampleName
Display the samples with a ‘forest’ SampleContext:
empd-admin query "SampleContext LIKE '%forest%'"
Display the distinct countries of a certain data contribution:
empd-admin query -d "SampleName LIKE '%Barboni_%'"
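The SQL-style pieces used in these examples and in the -count option, namely `LIKE` with `%` wildcards and `COUNT(column)`, behave as follows in SQLite. This is a minimal sketch with Python's `sqlite3` module on made-up data; whether empd-admin evaluates queries exactly this way is an assumption.

```python
import sqlite3

# In-memory database with a small, made-up meta table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta (SampleName TEXT, SampleContext TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO meta VALUES (?, ?, ?)",
    [
        ("Barboni_a1", "open forest", "Senegal"),
        ("Barboni_a2", "savanna", None),  # Country is missing (NULL)
        ("Other_a1", "forest edge", "Germany"),
    ],
)

# '%' matches any run of characters, so this finds 'forest' anywhere
forest_samples = [
    row[0]
    for row in conn.execute(
        "SELECT SampleName FROM meta "
        "WHERE SampleContext LIKE '%forest%' ORDER BY SampleName"
    )
]

# COUNT(column) counts only not-null values, COUNT(*) counts all rows
n_countries = conn.execute("SELECT COUNT(Country) FROM meta").fetchone()[0]
n_rows = conn.execute("SELECT COUNT(*) FROM meta").fetchone()[0]

print(forest_samples, n_countries, n_rows)
```

Note that in a `LIKE` pattern the underscore is itself a wildcard for exactly one character, so `'%Barboni_%'` matches any name that contains `Barboni` followed by at least one more character.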
diff¶
Compare two EMPD meta files
empd-admin diff [-h] [-how {inner,outer,left,right}] [-on ON [ON ...]]
[-atol ATOL] [-e EXCLUDE [EXCLUDE ...]]
[-col [COLUMN [COLUMN ...]]] [-c] [-o OUTPUT] [-max MAXDIFF]
[left] [right]
Positional Arguments¶
- left
The first meta file. If None, the meta file of this repository will be used
- right
The second meta file. If None, the meta file of this repository will be used. If that is the same as left, we use the meta.tsv of the repository or https://raw.githubusercontent.com/EMPD2/EMPD-data/master/meta.tsv
Named Arguments¶
- -how
Possible choices: inner, outer, left, right
Specify which samples to test. inner means the intersection of left and right, outer is the union of left and right, and so on.
Default: “inner”
- -on
The columns to use for computing the change. They have to be in left and right. If None, all columns will be used.
- -atol
Absolute tolerance to use for numeric columns.
Default: 0.001
- -e, --exclude
The columns to exclude for computing the change. They will be removed from the columns set by the on parameter.
Default: []
- -col, --columns
The columns for the output. Can be leftdiff, to use the differing columns from left, left to use all columns from left, rightdiff to use differing columns from right, right to use all columns from right, inner to use the intersection of left and right, nothing to not display any columns, or a list of columns to display. Alternatively it can be both to use the columns from left and right, or bothdiff, to use the changed columns from left and right. The columns from right will then be suffixed with an _r.
Default: [‘leftdiff’]
- -c, --commit
Commit the generated file.
Default: False
- -o, --output
Save the difference in the queries directory. If not set but --commit is set, then it will be saved as queries/diff.tsv.
- -max, --maxdiff
The maximum number of lines to print to stdout.
Default: 200
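The effect of the -how and -atol options can be sketched in plain Python. The helpers `select_sample_names` and `differs` and the data are hypothetical. The sketch assumes that the -how choices follow the usual join semantics (inner is the intersection, outer is the union) and that -atol is a plain absolute tolerance; the actual comparison in empd-admin may differ in detail.

```python
import math


def select_sample_names(left, right, how="inner"):
    """Choose which sample names to compare (hypothetical helper)."""
    left_names, right_names = set(left), set(right)
    if how == "inner":
        names = left_names & right_names  # samples in both files
    elif how == "outer":
        names = left_names | right_names  # samples in either file
    elif how == "left":
        names = left_names
    elif how == "right":
        names = right_names
    else:
        raise ValueError("unknown how: " + repr(how))
    return sorted(names)


def differs(a, b, atol=0.001):
    """Return True if two numbers differ by more than the absolute tolerance."""
    return not math.isclose(a, b, rel_tol=0.0, abs_tol=atol)


# Made-up latitudes per sample in two meta files
left = {"s1": 10.0, "s2": 20.0}
right = {"s2": 20.0004, "s3": 30.0}

inner_names = select_sample_names(left, right, "inner")
outer_names = select_sample_names(left, right, "outer")
changed = [name for name in inner_names if differs(left[name], right[name])]

print(inner_names, outer_names, changed)
```

With the default tolerance of 0.001, the difference of 0.0004 for `s2` is not reported as a change, whereas a difference of 0.01 would be.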
generate¶
Generate the EMPD data out of a postgres dump
empd-admin generate [-h] [-o OUTPUT] [-d] [--no-meta] [--no-counts]
[-k COLUMN [COLUMN ...]] [-how {inner,outer,left,right}]
[-on ON [ON ...]] [-atol ATOL] [-e EXCLUDE [EXCLUDE ...]]
[-col [COLUMN [COLUMN ...]]] [-c]
postgres_dump
Positional Arguments¶
- postgres_dump
The name of the postgres dump, relative to the postgres folder
Named Arguments¶
- -o, --output
Save the metadata to the given file. If not set, the meta data file of the repository will be used.
- -d, --dry-run
Perform a dry run and do not save anything to disk
Default: False
- --no-meta
If set, do not modify the meta data
Default: True
- --no-counts
If set, do not modify the pollen data files
Default: True
- -k, --keep
Keep the specified columns from meta.tsv
- -how
Possible choices: inner, outer, left, right
Specify which samples to test. inner means the intersection of left and right, outer is the union of left and right, and so on.
Default: “left”
- -on
The columns to use for computing the change. They have to be in left and right. If None, all columns will be used.
- -atol
Absolute tolerance to use for numeric columns.
Default: 0.001
- -e, --exclude
The columns to exclude for computing the change. They will be removed from the columns set by the on parameter.
Default: []
- -col, --columns
The columns for the output. Can be leftdiff, to use the differing columns from left, left to use all columns from left, rightdiff to use differing columns from right, right to use all columns from right, inner to use the intersection of left and right, nothing to not display any columns, or a list of columns to display. Alternatively it can be both to use the columns from left and right, or bothdiff, to use the changed columns from left and right. The columns from right will then be suffixed with an _r.
Default: [‘left’]
- -c, --commit
Commit the generated file.
Default: False
merge-meta¶
Merge two metafiles
empd-admin merge-meta [-h] [--no-commit] src [target]
Positional Arguments¶
- src
The tab-separated source file that shall be merged into the target file
- target
The meta file into which src should be merged. If not set, it is either the new meta file in the root directory of the repository (if it exists) or meta.tsv.
Named Arguments¶
- --no-commit
Do not commit the merge.
Default: True
help¶
Print the help on a command
empd-admin help [-h]
[{test,fix,createdb,rebuild,rebase,finish,accept,unaccept,query,diff,generate,merge-meta,help}]
Positional Arguments¶
- command
Possible choices: test, fix, createdb, rebuild, rebase, finish, accept, unaccept, query, diff, generate, merge-meta, help
Command for which to request the help