Developer reference
This document is organized by module. For source code, see the repository.
env - Environment
Environment for compound group operations.
- class commongroups.env.CommonEnv(name=None, env_path=None, **kwargs)
Bases: object
Run environment for commongroups.
This object keeps track of a project environment (i.e., file locations for data and logs, and shared parameters), which can be used by instances of CMGroup or by any function that needs such information.
Instantiating this class creates a directory tree: the project directory corresponding to project_path, with subdirectories data, log, and results. The project directory is created within the “home” directory corresponding to env_path. A new log file is created in the log subdirectory each time a new CommonEnv with the same project name is created.
Parameters:
- name (str) – Project name, used to name the project directory.
- env_path (str) – Path to the root commongroups home. If not specified, the path is taken from the environment variable CMG_HOME, or defaults to ~/commongroups_data.
- kwargs – Configuration options to override those read from file.
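The path resolution and directory layout described above can be sketched with the standard library as follows. This is a minimal illustration, not the package's implementation: resolve_env_path and make_project_tree are hypothetical names, and only the fallback order (argument, then CMG_HOME, then ~/commongroups_data) and the data, log, and results subdirectories come from this reference.

```python
import os
from pathlib import Path

# Hypothetical helpers illustrating the documented behavior; not the commongroups API.
DEFAULT_HOME = Path.home() / "commongroups_data"
SUBDIRS = ("data", "log", "results")


def resolve_env_path(env_path=None):
    """Return the commongroups home: argument, then CMG_HOME, then the default."""
    if env_path is not None:
        return Path(env_path).expanduser()
    from_env = os.environ.get("CMG_HOME")
    if from_env:  # an empty CMG_HOME is treated as unset
        return Path(from_env).expanduser()
    return DEFAULT_HOME


def make_project_tree(name, env_path=None):
    """Create <home>/<name>/{data,log,results} and return the project path."""
    project_path = resolve_env_path(env_path) / name
    for sub in SUBDIRS:
        # exist_ok=True makes re-creating an existing project harmless.
        (project_path / sub).mkdir(parents=True, exist_ok=True)
    return project_path
```

For example, make_project_tree("demo", env_path="/tmp/cmg") creates /tmp/cmg/demo/data, /tmp/cmg/demo/log, and /tmp/cmg/demo/results. Creating the same project twice is harmless here; the per-run log file mentioned above is not modeled.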
- connect_database()
Instantiate a SQLAlchemy engine for connecting to the database.
- data_path
Path to the project data directory.
- name
Project name.
- project_path
Path to the project directory.
- results_path
Path to the project results directory.
- set_config(opts)
Combine config options from file and from object instantiation.
Parameters passed as config when the CommonEnv was created overwrite the corresponding parameters read from file, one option at a time.
Parameters: opts (dict) – Configuration options.
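The option-by-option override amounts to a key-by-key dictionary merge. The sketch below assumes that both the file options and the instantiation options are flat dicts; merge_config is a hypothetical name, and the option keys in the usage note are made up for illustration.

```python
def merge_config(file_opts, override_opts):
    """Combine options read from file with options given at instantiation.

    Each key in override_opts replaces only the matching key from file_opts;
    keys that are not overridden keep their file values.
    """
    merged = dict(file_opts)  # copy so the file-derived dict is not mutated
    merged.update(override_opts)
    return merged
```

For example, merge_config({"worksheet": "Sheet1", "db": "x"}, {"worksheet": "Test"}) returns {"worksheet": "Test", "db": "x"}. A shallow update like this replaces nested dicts wholesale; merging nested options would need a recursive merge.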
- commongroups.env.add_project_handler(log_file)
Add a project-specific FileHandler for all logging output.
This enables logging to a file that’s kept within the project directory.
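A file handler of this kind can be attached with the standard logging module. This sketch is an assumption about the general approach, not the package's code: add_file_handler is a hypothetical name, and it attaches the handler to the root logger so that output from every logger that propagates to the root also reaches the file.

```python
import logging

FORMAT = "%(asctime)s %(name)s %(levelname)s: %(message)s"


def add_file_handler(log_file, level=logging.DEBUG):
    """Send all logging output that reaches the root logger to log_file as well.

    Hypothetical helper; records below the originating logger's effective
    level (WARNING by default) are dropped before reaching any handler.
    """
    handler = logging.FileHandler(str(log_file), encoding="utf-8")
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter(FORMAT))
    logging.getLogger().addHandler(handler)  # root logger: catches propagated records
    return handler
```

Returning the handler lets the caller remove and close it later, for example when switching to another project in the same process.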
cmgroup - Compound group
Compound group class.
- class commongroups.cmgroup.CMGroup(env, params, info=None)
Bases: object
Compound group object.
Initialize with parameters (params), which are known in advance and assumed to stay unchanged. Call the create_query() method to set up a database query, then call process() (requires a database connection) to populate the object’s compounds and info attributes. Add your own annotations using add_info().
Data, output, and logs for each CMGroup are managed using an associated CommonEnv project environment. See Design.
Parameters:
- env (commongroups.env.CommonEnv) – The project environment.
- params (dict) – A dictionary containing the parameters of the compound group. See Parameters for defining a compound group.
- info (dict) – Optional extra information as key-value pairs.
- add_info(info)
Add information to the group as key-value pairs.
Parameters: info (dict) – A dict containing any number of items.
- cmg_id
Compound group ID (convenience method to retrieve from params).
- compounds
If populated, a DataFrame of compounds in the group.
Warning: DataFrame objects are easily modified by accident. Making this a read-only attribute prevents it from being reassigned, but does not prevent in-place modification of the DataFrame it returns.
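The warning above can be demonstrated without pandas. In this hypothetical toy class a plain list stands in for the compounds DataFrame: the property has no setter, so reassignment fails, yet the returned object can still be changed in place.

```python
class Group:
    """Toy stand-in for CMGroup; a list stands in for the compounds DataFrame."""

    def __init__(self, compounds):
        self._compounds = compounds

    @property
    def compounds(self):
        # No setter: assigning to group.compounds raises AttributeError,
        # but this returns the same mutable object that is stored internally.
        return self._compounds
```

After g = Group(["CID 1"]), the line g.compounds = [] fails, while g.compounds.append("CID 2") silently changes the stored data. Returning a copy (for a DataFrame, compounds.copy()) would protect the stored data at the cost of copying on every access.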
- create_query()
Create a query method based on the compound group parameters.
Adds a callable commongroups.query.QueryMethod attribute.
- name
Compound group name (convenience method to retrieve from params).
- process(con)
Execute the database query and store the results in the CMGroup object.
Populate the compounds attribute with the query results, and add a computed summary of the results to the info attribute.
Parameters: con (sqlalchemy.engine.Engine) – Database connection.
- to_dict()
Return a dict of CMGroup parameters and info.
- to_excel(path=None)
Output compound group data to an Excel spreadsheet.
Parameters and info are tabulated on the first sheet, and the full compounds DataFrame is exported to the second sheet.
- to_html(*args, **kwargs)
Output an HTML display of the compound group.
For options, see commongroups.hypertext.cmg_to_html().
- to_json(path=None)
Serialize CMGroup parameters and info as JSON.
query - Database queries
Database querying methods for compound groups.
- class commongroups.query.QueryMethod(params)
Bases: object
Create, describe, and execute a query for populating a compound group.
Parameters: params (dict) – Compound group parameters of a CMGroup object.
- create_expression()
Compose a SQLAlchemy expression based on the supplied parameters.
Sets the object attribute expression.
- create_query_where()
Generate a query expression from a WHERE clause.
- describe()
Create a textual description of the query method with minimal HTML.
- get_literal()
Return a string literal of the query expression, with bound parameters.
- commongroups.query.get_query_results(que, con)
Execute a database query using SQLAlchemy.
Parameters:
- que – SQLAlchemy Select object.
- con – SQLAlchemy database Connection object.
Returns: A pandas DataFrame containing all rows of results.
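The function above takes a SQLAlchemy Select and returns a pandas DataFrame. The stdlib-only sketch below substitutes sqlite3 and a plain SQL string for both, to show the same execute-then-collect-every-row pattern; query_to_rows is a hypothetical name, and a list of dicts stands in for the DataFrame. With pandas installed, a call such as pandas.read_sql(que, con) is one way to get a DataFrame directly.

```python
import sqlite3


def query_to_rows(sql, con, params=()):
    """Execute a SELECT and return every result row as a dict keyed by column name."""
    cur = con.execute(sql, params)  # params are bound, not pasted into the SQL text
    columns = [desc[0] for desc in cur.description]
    # fetchall() reads all rows into memory, as building a full DataFrame would.
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```

Binding values through params (the ? placeholders in SQLite) rather than formatting them into the string keeps the query safe from injection, which is also the reason get_literal is reserved for display.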
googlesheet - Google Sheets access
Get compound group parameters from a Google Sheet.
See Google Sheets access for more information.
- class commongroups.googlesheet.SheetManager(title, worksheet, key_file)
Bases: object
Object to manage Google Sheets access.
Parameters:
- title (str) – Title of the Google Sheet to open.
- worksheet (str) – Title of the worksheet containing parameters within the Google Sheet.
- key_file (str) – Path to Google service account credentials JSON file.
Raises: commongroups.errors.NoCredentialsError – If the API credentials are missing or cannot be parsed from JSON.
Notes
Yes, we open Google Sheets by title. It would be nice to open them by key or by URL, but that functionality in gspread is broken because of the “New Sheets”.
- get_cmgs(env)
Generate CMGroup objects from parameters in spreadsheet rows.
Parameters: env (commongroups.env.CommonEnv) – The project environment that the returned objects will use to store data, etc.
Yields: CMGroup objects based on parameters in each row.
- get_params()
Read parameters and info from spreadsheet rows iteratively.
Stops reading the spreadsheet when a blank row is encountered.
Yields: Parameters and info for each group (row), as nested dicts.
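The stop-at-first-blank-row rule can be sketched over rows that have already been fetched as lists of cell values. This is an illustration under stated assumptions: the first row is taken to be a header of column names, read_until_blank is a hypothetical name, and the sketch yields flat dicts rather than the nested params/info dicts described above, because the column layout that separates the two is not documented here.

```python
def read_until_blank(rows):
    """Yield one dict per data row, stopping at the first blank row.

    Assumes rows[0] holds the column names (a hypothetical sheet layout).
    """
    it = iter(rows)
    header = next(it, None)
    if header is None:  # no rows at all
        return
    for row in it:
        # A row is blank when every cell is empty or whitespace.
        if all(str(cell).strip() == "" for cell in row):
            break
        yield dict(zip(header, row))
```

Rows after the first blank row are never read, so stray content further down the worksheet does not become a group definition.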
- get_spreadsheet()
Open the spreadsheet containing CMG parameters.
Returns: The Google Sheet object.
Return type: gspread.Spreadsheet
- params_to_json(path)
Get group parameters from the worksheet and output them to a JSON file.
Parameters: path – Path to output file.
hypertext - Generating HTML output
Functions for generating HTML output.
- commongroups.hypertext.cmg_to_html(cmg, formats=None, img_source='PubChem', img_size=500)
Generate an HTML document showing the results of processing a CMGroup.
Parameters:
- cmg – A CMGroup object.
- formats (list) – Other formats to link to for this compound group, such as json, excel, csv.
- img_source (str) – How to generate images. Currently the only option is PubChem.
- commongroups.hypertext.describe_cmg(cmg)
Generate an HTML snippet describing the parameters of a CMGroup.
- commongroups.hypertext.directory(cmgs, env, title='Compound group processing results', formats=None)
Generate an HTML directory for a collection of compound groups.
Writes an HTML file in the environment’s results/html directory.
Parameters:
- cmgs – Iterable of CMGroup objects.
- env – CommonEnv to contain output.
- title (str) – Title of page.
- formats (list) – List of file extensions to use to create links to other formats of this collection, e.g. json, xlsx.
- commongroups.hypertext.get_notes(cmg)
Retrieve notes from CMGroup info, if it exists.
- commongroups.hypertext.info_to_context(info)
Convert CMGroup info to a context for HTML templating.
- commongroups.hypertext.pubchem_image(cid_or_container, size=500)
Generate HTML code for a PubChem molecular structure graphic and link.
Parameters: cid_or_container – The CID (int, str) or a subscriptable object that contains a key cid.
Returns: HTML code for an image from PubChem.
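One way to build such a snippet is sketched below. The URL patterns are assumptions (the PubChem PUG REST PNG endpoint and the public compound page), not necessarily the ones commongroups uses, and pubchem_image_html is a hypothetical name. The container case follows the contract documented above: any subscriptable object with a cid key.

```python
from html import escape
from urllib.parse import quote

# Assumed URL patterns (PubChem PUG REST image and compound page).
IMG_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/PNG?image_size={size}x{size}"
PAGE_URL = "https://pubchem.ncbi.nlm.nih.gov/compound/{cid}"


def pubchem_image_html(cid_or_container, size=500):
    """Return an <img> wrapped in a link to the compound's PubChem page."""
    if isinstance(cid_or_container, (int, str)):
        cid = cid_or_container
    else:
        cid = cid_or_container["cid"]  # any subscriptable object with a "cid" key
    cid = quote(str(cid), safe="")  # make the CID safe to embed in a URL path
    size = int(size)
    img_url = escape(IMG_URL.format(cid=cid, size=size))
    page_url = escape(PAGE_URL.format(cid=cid))
    return (
        f'<a href="{page_url}">'
        f'<img src="{img_url}" width="{size}" height="{size}" alt="PubChem CID {cid}">'
        "</a>"
    )
```

Accepting either a bare CID or a row-like object lets a template pass each row of the compounds table straight through without first extracting the cid column.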
ops - Batch operations
Common Groups operations.
- commongroups.ops.batch_process(cmgs, env)
Process compound groups in a given environment and output all results.
Use the database connection provided by the environment. Output results to Excel (compound lists and group info) and JSON (group parameters and info). Create a browseable HTML directory of all groups & results.
Parameters:
- cmgs (iterable) – commongroups.cmgroup.CMGroup objects to process.
- env (commongroups.env.CommonEnv) – Environment.
Returns: List of processed compound groups.
- commongroups.ops.cmgs_from_file(env, path, filetype=None)
Generate compound group objects from a file.
Only the defining parameters and descriptive information for each compound group are imported from the file. Importing lists of compounds for already populated groups is not supported.
Parameters:
- env (commongroups.env.CommonEnv) – The project environment. Determines the environment used for the CMGroup objects.
- path (str) – Path to a file containing parameters, and optionally other info, for a number of CMGs.
- filetype (str) – Type of file; required only if path does not have a file extension.
Yields: commongroups.cmgroup.CMGroup objects.
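The filetype rule (an explicit argument is needed only when the path has no extension) can be sketched as below. infer_filetype and load_group_definitions are hypothetical names, and the sketch assumes a JSON file holding a list of objects, one per group; it does not construct CMGroup objects or handle other formats.

```python
import json
from pathlib import Path


def infer_filetype(path, filetype=None):
    """Return the file type: the explicit argument first, else the file extension."""
    if filetype:
        return filetype.lower().lstrip(".")
    suffix = Path(path).suffix
    if not suffix:
        raise ValueError(f"Cannot infer file type of {path!r}; pass filetype explicitly.")
    return suffix.lower().lstrip(".")


def load_group_definitions(path, filetype=None):
    """Yield one dict per group from a JSON list (the only format this sketch handles)."""
    kind = infer_filetype(path, filetype)
    if kind != "json":
        raise NotImplementedError(f"This sketch only handles JSON, not {kind!r}.")
    with open(path, encoding="utf-8") as f:
        yield from json.load(f)
```

Because load_group_definitions is a generator, errors such as an unknown file type surface only when iteration begins, not when the function is called.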
- commongroups.ops.cmgs_from_googlesheet(env)
Generate compound group objects from parameters given in a Google Sheet.
Use the Google Sheets source referenced in the environment’s configuration.
Parameters: env (commongroups.env.CommonEnv) – Environment to use for all generated groups and for identifying the Google Sheet.
Returns: Generator yielding commongroups.cmgroup.CMGroup objects.
- commongroups.ops.collect_to_json(cmgs, env, filename=None)
Write parameters and info for a number of compound groups to a JSON file.
The output is written to cmgroups.json (or another filename, if specified) in the project environment’s results directory.
Parameters:
- cmgs (iterable) – commongroups.cmgroup.CMGroup objects to write.
- env (commongroups.env.CommonEnv) – Project environment.
- filename (str) – Optional alternative filename.
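Writing the collected parameters and info to a single file can be sketched as follows. The sketch assumes each group has already been reduced to a JSON-serializable dict (CMGroup.to_dict() is one plausible source) and that the caller passes the results directory; write_groups_json is a hypothetical name.

```python
import json
from pathlib import Path


def write_groups_json(group_dicts, results_dir, filename=None):
    """Write all group dicts as one JSON list to results_dir/filename (default cmgroups.json)."""
    out_path = Path(results_dir) / (filename or "cmgroups.json")
    with open(out_path, "w", encoding="utf-8") as f:
        # list() drains generators, so any iterable of dicts is accepted.
        json.dump(list(group_dicts), f, indent=2, ensure_ascii=False)
    return out_path
```

ensure_ascii=False keeps non-ASCII characters in compound or group names readable in the output file instead of escaping them.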
run - The run script
See Usage.
Automatically populate compound groups using a chemical database.
Invoking this module runs a collection of commongroups operations:
- Read compound group definitions either from the web (Google Sheets) or from a JSON file if specified.
- Compile and perform database queries based on group definitions.
- Output results to Excel and JSON and create a browseable HTML directory.
- commongroups.run.create_parser()
Create the command-line argument parser.
- commongroups.run.main()
Run commongroups operations.
- commongroups.run.print_version_info()
Print information about the program to the console.
- commongroups.run.set_console_loglevel(level)
Change the console log level from the default (INFO), if specified.
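Changing only the console level, while leaving a project file handler untouched, can be done on the console handler itself. This sketch assumes console output goes through its own StreamHandler; make_console_handler and set_console_level are hypothetical names, and how main() passes the level in from the command line is not shown.

```python
import logging
import sys


def make_console_handler():
    """Create a stderr StreamHandler at the default INFO level."""
    handler = logging.StreamHandler(sys.stderr)
    handler.setLevel(logging.INFO)
    return handler


def set_console_level(handler, level=None):
    """Change the handler's level if a level name such as 'debug' is given."""
    if level:
        # setLevel accepts level names and raises ValueError for unknown ones.
        handler.setLevel(level.upper())
    return handler.level
```

Setting the level on the handler rather than on the root logger means lowering console verbosity does not also silence what gets written to the project log file.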
errors - Exceptions
Special errors for commongroups.
- exception commongroups.errors.CommonError(*args, **kwargs)
Bases: Exception
Base Exception for all commongroups errors.
- exception commongroups.errors.MissingParamError(param, *args, **kwargs)
Bases: commongroups.errors.CommonError
Raised upon failure to access a configuration or group parameter.
- exception commongroups.errors.NoCredentialsError(path, *args, **kwargs)
Bases: commongroups.errors.CommonError
Raised when the Google API credentials file cannot be read.
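A hierarchy like this lets callers catch every package error with a single except CommonError clause. The sketch mirrors the documented class names and first positional arguments, but the messages and the param and path attributes are assumptions. The documented signatures also accept **kwargs; they are omitted here because Exception.__init__ does not take keyword arguments.

```python
class CommonError(Exception):
    """Base class: catching it catches every package-specific error."""


class MissingParamError(CommonError):
    def __init__(self, param, *args):
        self.param = param  # name of the missing parameter, kept for callers
        super().__init__(f"Missing parameter: {param}", *args)


class NoCredentialsError(CommonError):
    def __init__(self, path, *args):
        self.path = path  # credentials file that could not be read
        super().__init__(f"Cannot read Google API credentials from {path}", *args)
```

Storing the offending parameter name or path as an attribute lets a caller report or recover programmatically instead of parsing the message text.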
tests - Tests
This package includes a suite of unit tests. Currently, the included tests only cover the architecture of the program itself, and not the logic of SQL queries against the database.
With pytest, tests can be run on the installed package using:
pytest --pyargs commongroups
Or on the source code (without installing) by running:
python setup.py test
For all of the tests to pass, a database must already be set up and the PostgreSQL server must be running, and your commongroups environment must be configured to access this database and a Google Spreadsheet containing test parameters. Please contact the authors if you want to run the tests yourself and would like a sample spreadsheet.