Developer reference

This document is organized by module. For source code, see the repository.

env - Environment

Environment for compound group operations.

class commongroups.env.CommonEnv(name=None, env_path=None, **kwargs)

Bases: object

Run environment for commongroups.

This object keeps track of a project environment (i.e., file locations for data and logs, shared parameters), which can be used by instances of CMGroup or by a number of functions needing such information.

Instantiating this class creates a directory tree: the project directory corresponding to project_path and subdirectories data, log, and results. The project directory is created within the “home” directory corresponding to env_path. A new log file is created in the log subdirectory each time a new CommonEnv with the same project name is created.

Parameters:
  • name (str) – Project name, used to name the project directory.
  • env_path (str) – Path to root commongroups home. If not specified, looks for environment variable CMG_HOME or defaults to ~/commongroups_data.
  • kwargs – Configuration options to override those read from file.
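The path resolution and directory layout described above can be sketched with the standard library alone. This is a minimal illustration, not the commongroups implementation: `resolve_home` and `make_project_tree` are hypothetical helper names. Only the `CMG_HOME` variable, the `~/commongroups_data` default, and the `data`, `log`, and `results` subdirectory names come from the text.

```python
import os
import tempfile
from pathlib import Path


def resolve_home(env_path=None):
    """Pick the home directory: explicit path, then CMG_HOME, then default."""
    if env_path:
        return Path(env_path).expanduser()
    from_env = os.environ.get('CMG_HOME')
    if from_env:
        return Path(from_env).expanduser()
    return Path('~/commongroups_data').expanduser()


def make_project_tree(name, env_path=None):
    """Create <home>/<name>/{data,log,results} and return the project path."""
    project_path = resolve_home(env_path) / name
    for sub in ('data', 'log', 'results'):
        # exist_ok lets the same project name be reused without errors.
        (project_path / sub).mkdir(parents=True, exist_ok=True)
    return project_path


# Use a throwaway directory as the "home" so nothing is written to ~.
demo_home = tempfile.mkdtemp()
demo_project = make_project_tree('demo', env_path=demo_home)
print(sorted(p.name for p in demo_project.iterdir()))
```

Calling `make_project_tree` a second time with the same name leaves the existing tree in place, which mirrors the reuse of a project name described above.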
connect_database()

Instantiate a SQLAlchemy engine for connecting to the database.

data_path

Path to the project data directory.

name

Project name.

project_path

Path to project directory.

results_path

Path to the project results directory.

set_config(opts)

Combine config options from file and from object instantiation.

Options passed as keyword arguments when the CommonEnv was created override options read from file, one key at a time; file options that are not overridden are kept.

Parameters: opts (dict) – Configuration options.
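The per-key override behaviour can be illustrated with plain dicts. This is a sketch under assumptions: `merge_config` is a hypothetical helper, and the option names `database_url` and `verbose` are invented for the example and may not match real commongroups configuration keys.

```python
def merge_config(file_opts, override_opts):
    """Return file options updated key by key with the overriding options."""
    merged = dict(file_opts)      # copy so the file options stay untouched
    merged.update(override_opts)  # overrides win only for keys they define
    return merged


# Hypothetical option names, used for illustration only.
file_opts = {'database_url': 'postgresql://localhost/chem', 'verbose': False}
override_opts = {'verbose': True}
config = merge_config(file_opts, override_opts)
print(config)
```

Only `verbose` changes here; `database_url` keeps the value read from file because no override was supplied for it.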
commongroups.env.add_project_handler(log_file)

Add a project-specific FileHandler for all logging output.

This enables logging to a file that’s kept within the project directory.
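Attaching a file handler with the standard `logging` module might look like the sketch below. It is not the library's code: `add_file_handler` is a hypothetical name, the log format is an assumption, and the handler is attached to a logger named `demo` rather than to all logging output so that the example does not alter global logging state.

```python
import logging
import os
import tempfile


def add_file_handler(log_file, logger_name='demo'):
    """Attach a FileHandler to the named logger and return the handler."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_file)
    handler.setFormatter(
        logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    return handler


log_dir = tempfile.mkdtemp()  # stands in for the project's log directory
log_file = os.path.join(log_dir, 'project.log')
handler = add_file_handler(log_file)
logging.getLogger('demo').info('environment created')
handler.close()  # flush and release the file before reading it back
with open(log_file) as f:
    log_text = f.read()
```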

cmgroup - Compound group

Compound group class.

class commongroups.cmgroup.CMGroup(env, params, info=None)

Bases: object

Compound group object.

Initialize with parameters (params), which are known in advance and assumed to stay unchanged. Call the create_query() method to set up a database query, then call process() (requires a database connection) to populate the object’s compounds and info attributes. Add your own annotations using add_info().

Data, output, and logs for each CMGroup are managed using an associated CommonEnv project environment. See Design.

Parameters:
add_info(info)

Add information to the group as key-value pairs.

Parameters: info (dict) – A dict containing any number of items.
cmg_id

Compound group ID (convenience method to retrieve from params).

compounds

If populated, return a DataFrame of compounds in the group.

Warning: DataFrame objects are easily modified by accident. Making this a read-only attribute does not prevent accidental modification.

create_query()

Create query method based on compound group parameters.

Add a callable commongroups.query.QueryMethod attribute.

name

Compound group name (convenience method to retrieve from params).

process(con)

Execute the database query and store results in the CMGroup object.

Populate the compounds attribute with the query results, and add a computed summary of the results to the info attribute.

Parameters: con (sqlalchemy.engine.Engine) – Database connection.
to_dict()

Return a dict of CMGroup parameters and info.

to_excel(path=None)

Output compound group data to an Excel spreadsheet.

Parameters and info are tabulated on the first sheet, and the full compounds DataFrame is exported to the second sheet.

to_html(*args, **kwargs)

Output HTML display of the compound group.

For options see commongroups.hypertext.cmg_to_html().

to_json(path=None)

Serialize CMGroup parameters and info as JSON.

query - Database queries

Database querying methods for compound groups.

class commongroups.query.QueryMethod(params)

Bases: object

Create, describe, and execute a query for populating a compound group.

Parameters: params (dict) – Compound group parameters of a CMGroup object.
create_expression()

Compose a SQLAlchemy expression based on the supplied parameters.

Set object attribute expression.

create_query_where()

Generate a query expression from a WHERE clause.

describe()

Create a textual description of the query method with minimal HTML.

get_literal()

Return a string literal of the query expression, with bound parameters.

commongroups.query.get_query_results(que, con)

Execute a database query using SQLAlchemy.

Parameters:
  • que – SQLAlchemy Select object.
  • con – SQLAlchemy database Connection object.
Returns:

A pandas DataFrame containing all rows of results.

googlesheet - Google Sheets access

Get compound group parameters from a Google Sheet.

See Google Sheets access for more information.

class commongroups.googlesheet.SheetManager(title, worksheet, key_file)

Bases: object

Object to manage Google Sheets access.

Parameters:
  • key_file (str) – Path to Google service account credentials JSON file.
  • title (str) – Title of the Google Sheet to open.
  • worksheet (str) – Title of the worksheet containing parameters within the Google Sheet.
Raises:

commongroups.errors.NoCredentialsError – If the API credentials are missing or cannot be parsed from JSON.

Notes

Yes, we open Google Sheets by title. It would be nice to open them by key or by URL, but that functionality in gspread is broken because of the “New Sheets”.

get_cmgs(env)

Generate CMGroup objects from parameters in spreadsheet rows.

Parameters: env (commongroups.env.CommonEnv) – The project environment that the returned objects will use to store data, etc.
Yields: CMGroup objects based on parameters in each row.
get_params()

Read parameters and info from spreadsheet rows iteratively.

Stops reading the spreadsheet when a blank row is encountered.

Yields: Parameters and info for each group (row), as nested dicts.
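The stop-at-blank-row behaviour can be sketched as a generator over rows that are already in memory. This is an illustration only: `iter_params` is a hypothetical name, no Google Sheets access is involved, and the sketch yields flat dicts rather than the nested params/info structure described above.

```python
def iter_params(rows, header):
    """Yield one dict per row, stopping at the first blank row."""
    for row in rows:
        if not any(str(cell).strip() for cell in row):
            return  # blank row: ignore everything after it
        yield dict(zip(header, row))


header = ['cmg_id', 'name']
rows = [
    ['100', 'Alkanes'],
    ['101', 'Alkenes'],
    ['', ''],            # blank row ends reading
    ['102', 'Ignored'],
]
params = list(iter_params(rows, header))
print(params)
```

Because the generator returns at the first blank row, anything placed below that row in a worksheet would be skipped under this scheme.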
get_spreadsheet()

Open the spreadsheet containing CMG parameters.

Returns: The Google Sheet object.
Return type: gspread.Spreadsheet
params_to_json(path)

Get group parameters from the worksheet and output to a JSON file.

Parameters: path – Path to output file.

hypertext - Generating HTML output

Functions for generating HTML output.

commongroups.hypertext.cmg_to_html(cmg, formats=None, img_source='PubChem', img_size=500)

Generate an HTML document showing results of processing a CMGroup.

Parameters:
  • cmg – A CMGroup object.
  • formats (list) – Other formats to link to for this compound group, such as: json, excel, csv.
  • img_source (str) – How to generate images. Currently the only option is PubChem.
commongroups.hypertext.describe_cmg(cmg)

Generate an HTML snippet describing the parameters of a CMGroup.

commongroups.hypertext.directory(cmgs, env, title='Compound group processing results', formats=None)

Generate an HTML directory for a collection of compound groups.

Writes an HTML file in the environment’s results/html directory.

Parameters:
  • cmgs – Iterable of CMGroup objects.
  • env – CommonEnv to contain output.
  • title (str) – Title of page.
  • formats (list) – List of file extensions to use to create links to other formats of this collection, e.g. json, xlsx.
commongroups.hypertext.get_notes(cmg)

Retrieve notes from CMGroup info, if any exist.

commongroups.hypertext.info_to_context(info)

Convert CMGroup info to a context for HTML templating.

commongroups.hypertext.pubchem_image(cid_or_container, size=500)

Generate HTML code for a PubChem molecular structure graphic and link.

Parameters: cid_or_container – The CID (int, str) or a subscriptable object that contains a key cid.
Returns: HTML code for an image from PubChem.
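One way to accept either a bare CID or a container with a `cid` key is sketched below. This is not the library's implementation: `pubchem_image_sketch` is a hypothetical name, and both URL patterns (the PubChem compound page and the PUG REST PNG endpoint with an `image_size` query) are assumptions that should be checked against PubChem's documentation.

```python
from html import escape


def pubchem_image_sketch(cid_or_container, size=500):
    """Return HTML for a linked PubChem structure image (illustrative)."""
    try:
        cid = cid_or_container['cid']  # mapping or other subscriptable object
    except (TypeError, KeyError, IndexError):
        cid = cid_or_container         # plain CID given as int or str
    cid = escape(str(cid), quote=True)
    # Assumed URL patterns; verify against PubChem before relying on them.
    page_url = 'https://pubchem.ncbi.nlm.nih.gov/compound/{}'.format(cid)
    img_url = ('https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/'
               '{}/PNG?image_size={}x{}'.format(cid, size, size))
    return '<a href="{}"><img src="{}" alt="CID {}"></a>'.format(
        page_url, img_url, cid)


html_from_int = pubchem_image_sketch(2244)
html_from_dict = pubchem_image_sketch({'cid': 2244}, size=300)
```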

ops - Batch operations

Common Groups operations.

commongroups.ops.batch_process(cmgs, env)

Process compound groups in a given environment and output all results.

Use the database connection provided by the environment. Output results to Excel (compound lists and group info) and JSON (group parameters and info). Create a browseable HTML directory of all groups & results.

Parameters:
Returns:

List of processed compound groups.

commongroups.ops.cmgs_from_file(env, path, filetype=None)

Generate compound group objects from a file.

Only the defining parameters and descriptive information for each compound group are imported from the file. Importing lists of compounds for already populated groups is not supported.

Parameters:
  • env (commongroups.env.CommonEnv) – The project environment. Determines the environment used for the CMGroup objects.
  • path (str) – Path to a file containing parameters, and optionally other info, for a number of CMGs.
  • filetype (str) – Type of file; required only if path does not have a file extension.
Yields:

commongroups.cmgroup.CMGroup objects.
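The rule that filetype is needed only when the path lacks an extension can be sketched as follows. `infer_filetype` is a hypothetical helper, and two details are assumptions rather than documented behaviour: that an explicit filetype takes precedence over the extension, and that a missing extension without a filetype raises an error.

```python
import os


def infer_filetype(path, filetype=None):
    """Return the explicit filetype, or fall back to the path's extension."""
    if filetype:
        return filetype.lower().lstrip('.')
    ext = os.path.splitext(path)[1]
    if not ext:
        raise ValueError('No file extension in path; pass filetype explicitly')
    return ext.lower().lstrip('.')


print(infer_filetype('groups.json'))        # extension decides
print(infer_filetype('groups', 'JSON'))     # explicit filetype decides
```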

commongroups.ops.cmgs_from_googlesheet(env)

Generate compound group objects from parameters given in a Google Sheet.

Use the Google Sheets source referenced in the environment’s configuration.

Parameters: env (commongroups.env.CommonEnv) – Environment to use for all generated groups and for identifying the Google Sheet.
Returns: Generator yielding commongroups.cmgroup.CMGroup objects.
commongroups.ops.collect_to_json(cmgs, env, filename=None)

Write parameters and info for a number of compound groups to a JSON file.

The output is written to cmgroups.json (or other filename if specified) in the project environment’s results directory.
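The default-filename behaviour can be sketched with the standard `json` module. This is an illustration, not the library's code: `collect_dicts_to_json` is a hypothetical helper that takes plain dicts and a directory path instead of CMGroup objects and a CommonEnv, and the `params`/`info` layout of each dict is an assumption.

```python
import json
import os
import tempfile


def collect_dicts_to_json(group_dicts, results_path, filename=None):
    """Write a list of group dicts to a JSON file and return its path."""
    path = os.path.join(results_path, filename or 'cmgroups.json')
    with open(path, 'w') as f:
        json.dump(list(group_dicts), f, indent=2, sort_keys=True)
    return path


results_path = tempfile.mkdtemp()  # stands in for the results directory
groups = [
    {'params': {'cmg_id': '100', 'name': 'Alkanes'}, 'info': {'notes': 'demo'}},
]
out_path = collect_dicts_to_json(groups, results_path)
with open(out_path) as f:
    reloaded = json.load(f)
```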

Parameters:

run - The run script

See Usage.

Automatically populate compound groups using a chemical database.

Invoking this module runs a collection of commongroups operations:

  • Read compound group definitions either from the web (Google Sheets) or from a JSON file if specified.
  • Compile and perform database queries based on group definitions.
  • Output results to Excel and JSON and create a browseable HTML directory.
commongroups.run.create_parser()

Create command-line argument parser.

commongroups.run.main()

Run commongroups operations.

commongroups.run.print_version_info()

Print information about the program to the console.

commongroups.run.set_console_loglevel(level)

Change console log level from default (INFO), if specified.
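Changing only the console level while leaving file logging alone might be done as in this sketch. `set_stream_level` and the logger name `console_demo` are hypothetical, and how commongroups actually locates its console handler is not stated in this reference.

```python
import logging


def set_stream_level(logger, level=None):
    """Set the level of the logger's console handlers if a level is given."""
    if level is None:
        return  # keep the default level
    for handler in logger.handlers:
        # FileHandler subclasses StreamHandler, so exclude it explicitly.
        if (isinstance(handler, logging.StreamHandler)
                and not isinstance(handler, logging.FileHandler)):
            handler.setLevel(level)


logger = logging.getLogger('console_demo')
console = logging.StreamHandler()
console.setLevel(logging.INFO)  # default console level
logger.addHandler(console)

set_stream_level(logger)                 # no level given: unchanged
level_before = console.level
set_stream_level(logger, logging.DEBUG)  # level given: applied
level_after = console.level
```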

errors - Exceptions

Special errors for commongroups.

exception commongroups.errors.CommonError(*args, **kwargs)

Bases: Exception

Base Exception for all commongroups errors.

exception commongroups.errors.MissingParamError(param, *args, **kwargs)

Bases: commongroups.errors.CommonError

Raised upon failure to access a configuration or group parameter.

exception commongroups.errors.NoCredentialsError(path, *args, **kwargs)

Bases: commongroups.errors.CommonError

Raised when the Google API credentials file cannot be read.
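Because every exception derives from CommonError, callers can catch the base class to handle any commongroups failure. The sketch below uses local stand-in classes that mirror the documented names and signatures; their bodies, the `param` attribute, and the `get_param` helper are assumptions for illustration.

```python
class CommonError(Exception):
    """Stand-in for the documented base exception."""


class MissingParamError(CommonError):
    """Stand-in raised when a parameter cannot be accessed."""

    def __init__(self, param, *args):
        super().__init__(*args)
        self.param = param  # remember which parameter was missing


def get_param(params, key):
    """Return params[key], converting KeyError to MissingParamError."""
    try:
        return params[key]
    except KeyError:
        raise MissingParamError(key, 'Missing parameter: {}'.format(key))


try:
    get_param({'name': 'Alkanes'}, 'cmg_id')
except CommonError as err:  # the base class catches the subclass too
    caught = err
```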

tests - Tests

This package includes a suite of unit tests. Currently, the included tests only cover the architecture of the program itself, and not the logic of SQL queries against the database.

With pytest, tests can be run on the installed package using:

pytest --pyargs commongroups

Or on the source code (without installing) by running:

python setup.py test

For all of the tests to pass, a database must already be set up and the PostgreSQL server must be running. Your commongroups environment must also be configured to allow the program to access this database and a Google Spreadsheet containing test parameters. Please contact the authors if you want to run the tests yourself and would like a sample spreadsheet.