ORD Reaction Converter

Overview

The package extracts all of the reactions in an Open Reaction Database (ORD) dataset into a dictionary of pandas DataFrames, which can be saved as CSV files to build a relational database.
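Most of the extracted tables carry the reaction ID, which is the first field of the rows documented in the API Reference. That means they can be joined like tables in a relational database. The sketch below writes two small, hand-made DataFrames into an in-memory SQLite database with pandas and joins them on that ID. The table contents and the column names reactionID, temperature_value, and yield_pct are hypothetical stand-ins, not the package's actual output.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-ins for two tables returned by extract_dataset();
# the real column names come from the package, not from this sketch.
tables = {
    "reaction_conditions": pd.DataFrame(
        {"reactionID": ["rxn-1", "rxn-2"], "temperature_value": [25.0, 80.0]}
    ),
    "reaction_outcomes": pd.DataFrame(
        {"reactionID": ["rxn-1", "rxn-2"], "yield_pct": [91.0, 47.0]}
    ),
}

# Write every DataFrame into its own SQLite table.
conn = sqlite3.connect(":memory:")
for name, df in tables.items():
    df.to_sql(name, conn, index=False)

# Join the tables on the shared reaction ID.
joined = pd.read_sql_query(
    """
    SELECT c.reactionID, c.temperature_value, o.yield_pct
    FROM reaction_conditions AS c
    JOIN reaction_outcomes AS o ON o.reactionID = c.reactionID
    ORDER BY c.reactionID
    """,
    conn,
)
print(joined)
```

The same approach works for the CSV files: read each one back with pandas.read_csv and call to_sql on it.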

The structure of the Pandas DataFrames follows the ORD schema:

Dataset & reaction metadata (provenance)
Reaction identifiers (SMILES, InChI, etc.)
Reaction inputs (reactants, reagents, solvents)
Reaction setup (vessels, automation)
Reaction conditions (temperature, pressure, stirring)
Reaction notes & observations (experimental notes)
Reaction workups (post-reaction processing)
Reaction outcomes (products and analyses)

The package also includes a utility functions module for extracting all of the enum types defined in the ORD .proto files. All columns are named according to the ORD schema.

Installation

Install from PyPI:

pip install ord_rxn_converter

Or install from source:

git clone https://github.com/cwru-sdle/ord_rxn_converter.git
cd ord_rxn_converter
pip install -e .

Quick Start

Extract an entire dataset:

from ord_rxn_converter.dataset_module import extract_dataset

# Extract all data from a dataset file
data = extract_dataset("example_dataset.pb")

# Access different components
print(f"Available data: {list(data.keys())}")
print(f"Number of reactions: {len(data['reaction_metadata'])}")

# Export to CSV files
for key, df in data.items():
    df.to_csv(f"{key}.csv", index=False)

Extract individual reaction components:

from ord_schema.proto import dataset_pb2
from ord_schema.message_helpers import load_message
from ord_rxn_converter.identifiers_module import extract_reaction_identifiers
from ord_rxn_converter.conditions_module import extract_reaction_conditions

# Load a Dataset message from file
dataset = load_message("example_dataset.pb", dataset_pb2.Dataset)

# Access the first reaction in the dataset
reaction = dataset.reactions[0]

# Extract specific components; these functions take the reaction ID as a
# second argument and return plain Python objects, not DataFrames
identifiers = extract_reaction_identifiers(reaction.identifiers, reaction.reaction_id)
conditions = extract_reaction_conditions(reaction.conditions, reaction.reaction_id)

print("Reaction identifiers:")
print(identifiers)

print("\nReaction conditions:")
print(conditions)

API Reference

Main Functions

ord_rxn_converter.dataset_module.extract_dataset(filepath, compounds=pd.DataFrame(), persons=pd.DataFrame())

Extracts all structured data from an ORD dataset file and organizes it into a dictionary of DataFrames.

This function loads a dataset from a .pb or .pbtxt file (compressed or uncompressed), then extracts and organizes its reactions and associated metadata into tabular form. Each component of the reaction—identifiers, inputs, conditions, setup, workups, outcomes, and more—is parsed into a separate pandas.DataFrame.

If compound or person tables are provided, they will be updated to include any new compounds or people found during extraction.

Parameters:

filepath (str) – Path to the input file, a Protocol Buffers dataset file that may be gzip-compressed or uncompressed.
compounds (pd.DataFrame, optional) – Existing compound table to update or append to. Defaults to an empty DataFrame.
persons (pd.DataFrame, optional) – Existing person table to update or append to. Defaults to an empty DataFrame.

Returns:

A dictionary containing the following keys, each mapping to a pandas.DataFrame:

"dataset_metadata": Dataset-level metadata.
"reaction_metadata": Reaction-level metadata including provenance and contributor info.
"reaction_identifiers": SMILES, InChI, and other identifiers for each reaction.
"input_components": Details of each input component (compound, amount, role, etc.).
"input_addition": Temporal details for the addition of inputs.
"reaction_setup": Setup information including vessels and automation.
"reaction_conditions": Environmental and operational reaction conditions.
"reaction_notes": Observations and experimental notes.
"reaction_workups": Post-reaction processing steps.
"reaction_outcomes": Products and analyses of reaction outcomes.
"compound": A table of all compounds involved across reactions.
"person": A table of contributors extracted from provenance.

Return type:

dict

Raises:

FileNotFoundError – If the filepath does not exist.
ValueError – If the Protobuf file is invalid or does not conform to dataset_pb2.Dataset.

Example

>>> from ord_rxn_converter.dataset_module import extract_dataset
>>> out = extract_dataset("example_dataset.pb")
>>> out["reaction_metadata"].head()
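If compound or person tables are provided, extract_dataset() updates them with newly found entries. Below is a minimal pandas sketch of that kind of update. It assumes compounds are keyed by an inchi_key column, the first field of the compound table described under generate_compound_table. The helper merge_new_rows is hypothetical and illustrates the idea rather than the package's own code.

```python
import pandas as pd


def merge_new_rows(existing: pd.DataFrame, new: pd.DataFrame, key: str) -> pd.DataFrame:
    """Append rows from `new` whose `key` value is not already in `existing`."""
    if existing.empty:
        # Nothing to merge into yet: just deduplicate the new rows.
        return new.drop_duplicates(subset=key).reset_index(drop=True)
    unseen = new[~new[key].isin(existing[key])].drop_duplicates(subset=key)
    return pd.concat([existing, unseen], ignore_index=True)


existing = pd.DataFrame({"inchi_key": ["LFQSCWFLJHTTHZ-UHFFFAOYSA-N"], "smiles": ["CCO"]})
new = pd.DataFrame({
    "inchi_key": ["LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "ROSDSFDQCJNGOL-UHFFFAOYSA-N"],
    "smiles": ["CCO", "CNC"],
})
updated = merge_new_rows(existing, new, key="inchi_key")
print(updated)
```

Rows already in the existing table are kept as they are. Only compounds not seen before are appended.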

Core Modules

Utility Functions Module

ord_rxn_converter.utility_functions_module.extract_enums_from_message(descriptor, parent_name='')

Recursively extract enums from messages and nested messages.

This function traverses through a protobuf message descriptor and extracts all enum types defined within it and its nested messages. For each enum type, it creates a mapping between enum value numbers and their names.

Parameters:

descriptor – The descriptor of the protobuf message to extract enums from.
parent_name – The name of the parent message for nested messages, used for constructing fully qualified enum names. Default is an empty string.

Returns:

A dictionary mapping fully qualified enum names to dictionaries that map enum value numbers to their names. The structure is:

{
    'EnumName': {value_number: 'VALUE_NAME', ...},
    'ParentMessage.NestedEnum': {value_number: 'VALUE_NAME', ...},
    ...
}
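The recursive traversal described above can be sketched with the descriptor API that ships with the protobuf package. The helper extract_enums_sketch is hypothetical. It always prefixes the message name, which gives the 'Message.Enum' form shown for extract_all_enums below. It is demonstrated on FieldDescriptorProto, a message from protobuf's own descriptor.proto, rather than on an ORD message, so it runs without ord_schema.

```python
from google.protobuf import descriptor_pb2


def extract_enums_sketch(descriptor, parent_name=""):
    """Map 'Message.Enum' names to {number: name}, recursing into nested messages."""
    # Qualify the current message name with its parent, if any.
    qualified = f"{parent_name}.{descriptor.name}" if parent_name else descriptor.name
    enums = {}
    for enum_type in descriptor.enum_types:
        enums[f"{qualified}.{enum_type.name}"] = {
            value.number: value.name for value in enum_type.values
        }
    # Nested messages can declare their own enums.
    for nested in descriptor.nested_types:
        enums.update(extract_enums_sketch(nested, qualified))
    return enums


enums = extract_enums_sketch(descriptor_pb2.FieldDescriptorProto.DESCRIPTOR)
print(enums["FieldDescriptorProto.Label"])
```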

ord_rxn_converter.utility_functions_module.extract_all_enums(proto_module)

Extract enums from all message types in the proto module.

This function serves as the main entry point for extracting all enum types from a protobuf module. It iterates through all attributes of the module, identifies protobuf message types, and extracts all enum types defined within them.

Parameters:

proto_module – The protobuf module (e.g., dataset_pb2, reaction_pb2) to extract enums from.

Returns:

A dictionary mapping fully qualified enum names to dictionaries that map enum value numbers to their names. The structure is:

{
    'MessageName.EnumName': {value_number: 'VALUE_NAME', ...},
    'MessageName.NestedMessage.NestedEnum': {value_number: 'VALUE_NAME', ...},
    ...
}

Example

>>> from ord_schema.proto import reaction_pb2
>>> from ord_rxn_converter.utility_functions_module import extract_all_enums
>>> enums_data = extract_all_enums(reaction_pb2)
>>> print(enums_data['Analysis.AnalysisType'])
{0: 'UNSPECIFIED', 1: 'CUSTOM', 2: 'LC', 3: 'GC', ...}
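The returned mapping can turn raw enum numbers back into readable names. The small sketch below assumes a hypothetical pandas column of integer AnalysisType values. It uses only the four entries shown in the example above.

```python
import pandas as pd

# Hypothetical excerpt of extract_all_enums(reaction_pb2)["Analysis.AnalysisType"],
# limited to the entries shown in the example above.
analysis_types = {0: "UNSPECIFIED", 1: "CUSTOM", 2: "LC", 3: "GC"}

# Hypothetical column of raw enum numbers.
raw = pd.Series([2, 3, 2, 1], name="analysis_type")

# Series.map looks each number up in the dict; numbers missing from it become NaN.
decoded = raw.map(analysis_types)
print(decoded.tolist())
```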

Dataset Module

The extract_dataset() function in this module is documented in full under Main Functions above.

Metadata Module

ord_rxn_converter.metadata_module.extract_dataset_metadata(dataset)

Extracts key metadata from a loaded ORD dataset message.

This function parses a loaded Protocol Buffer dataset message and extracts high-level metadata such as a modified dataset ID, original ORD ID, name, and description. The modified ID is formatted to reflect that the dataset is stored in an MDS (custom) database.

Parameters:

dataset (dataset_pb2.Dataset) – A dataset message loaded via load_message from the ORD schema.

Returns:

A list containing the following metadata fields:
- dataset_id (str): Custom MDS-formatted dataset ID.
- ord_dataset_id (str): Original dataset ID from ORD.
- name (str): Human-readable name of the dataset.
- description (str): Textual description of the dataset.

Return type:

list

Example

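>>> from ord_schema.message_helpers import load_message
>>> from ord_schema.proto import dataset_pb2
>>> from ord_rxn_converter.metadata_module import extract_dataset_metadata
>>> dataset = load_message("example_dataset.pb", dataset_pb2.Dataset)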
>>> extract_dataset_metadata(dataset)
['mds_dataset-000001', 'ord_dataset-000001', '...', '...']

ord_rxn_converter.metadata_module.extract_reaction_metadata(provenance, reactionID)

Extracts reaction-level provenance and contributor metadata from a reaction.

This function parses the Provenance message from a reaction in an ORD dataset, extracting detailed metadata related to:
- The reaction’s source (e.g., DOI, patent, publication)
- Timing and authorship of creation and modifications
- Contributor identities (with ORCID and contact details)

Parameters:

provenance (reaction_pb2.Provenance) – A Provenance message associated with a reaction.
reactionID (str) – The unique identifier of the reaction being processed.

Returns:

provenance_data (list): Reaction-level metadata including:
- reactionID (str)
- experimenter_orcid (str)
- city (str)
- experiment_start (str)
- doi (str)
- patent (str)
- publication_url (str)
- created_time (str)
- created_person_orcid (str)
- created_details (str)
- modified_times (str, comma-separated)
- modified_people (str, comma-separated ORCIDs)
person_metadata (list of list of str): Contributor metadata:
- Each inner list includes:
  [orcid, username, full_name, organization, email]

Return type:

tuple

Example

>>> from ord_rxn_converter.metadata_module import extract_reaction_metadata
>>> reaction = dataset.reactions[0]
>>> extract_reaction_metadata(reaction.provenance, "reaction-001")
(['reaction-001', '0000-0001-...', 'Boston', ...],
 [['0000-0001-...', 'jsmith', 'John Smith', ...], ...])
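Because modified_times and modified_people are stored as comma-separated strings, they may need splitting before analysis. The sketch below uses a hypothetical helper. It assumes that no individual timestamp or ORCID contains a comma. It also assumes that the i-th timestamp belongs to the i-th ORCID, a pairing the documentation above does not state explicitly.

```python
def split_modifications(modified_times: str, modified_people: str) -> list:
    """Pair each modification timestamp with the ORCID listed at the same position."""
    times = [t.strip() for t in modified_times.split(",") if t.strip()]
    people = [p.strip() for p in modified_people.split(",") if p.strip()]
    # zip() stops at the shorter list, so unmatched trailing entries are dropped.
    return list(zip(times, people))


pairs = split_modifications(
    "2021-03-01T10:00:00, 2021-04-15T09:30:00",
    "0000-0001-0000-0001, 0000-0002-0000-0002",
)
print(pairs)
```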

Identifiers Module

ord_rxn_converter.identifiers_module.extract_reaction_identifiers(identifiers, reactionID: str) → list

Extracts detailed reaction identifier information for a given reaction.

Parameters:

identifiers (list) – A list of ReactionIdentifier protobuf messages.
reactionID (str) – Unique reaction ID string.

Returns:

A list in the format: [reactionID, reaction_smiles, reaction_cxsmiles, rdfile, rinchi, reaction_type, unspecified, custom, details_dict, mapped_dict]

Return type:

list

Example

>>> from ord_rxn_converter.identifiers_module import extract_reaction_identifiers
>>> extract_reaction_identifiers(reaction.identifiers, 'rxn-000001')
['rxn-000001', 'CCO>>CC=O', None, None, None,
 'REACTION_TYPE_XYZ', None, None,
 {'REACTION_CXSMILES': 'CCO>>CC=O'}, {'REACTION_CXSMILES': True}]

ord_rxn_converter.identifiers_module.extract_compound_identifiers(compound_identifiers)

Extracts compound identifier values and ensures key identifiers are present.

Generates missing InChI keys and CXSMILES if possible using RDKit.

Parameters:

compound_identifiers (list) – A list of CompoundIdentifier protobuf messages.

Returns:

str: InChI key of the compound.
dict: Dictionary of identifier types to their values.

Return type:

tuple

Example

>>> from ord_rxn_converter.identifiers_module import extract_compound_identifiers
>>> compound_identifiers = reaction.inputs['...'].components[0].identifiers
>>> extract_compound_identifiers(compound_identifiers)
('ROSDSFDQCJNGOL-UHFFFAOYSA-N', {'NAME': 'dimethylamine', 'SMILES': 'CNC', ...})
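The RDKit step mentioned above can be sketched as follows, assuming the missing identifiers are derived from a SMILES entry. The documentation does not say which identifier the package parses. The helper fill_missing_identifiers and its uppercase dictionary keys are illustrative assumptions. Chem.MolFromSmiles, Chem.MolToInchiKey, and Chem.MolToCXSmiles are standard RDKit functions.

```python
from rdkit import Chem


def fill_missing_identifiers(identifiers: dict) -> dict:
    """Return a copy of {IDENTIFIER_TYPE: value} with INCHI_KEY and CXSMILES filled from SMILES."""
    filled = dict(identifiers)
    smiles = filled.get("SMILES")
    mol = Chem.MolFromSmiles(smiles) if smiles else None
    if mol is None:
        # No SMILES, or RDKit could not parse it: nothing can be generated.
        return filled
    if not filled.get("INCHI_KEY"):
        filled["INCHI_KEY"] = Chem.MolToInchiKey(mol)
    if not filled.get("CXSMILES"):
        filled["CXSMILES"] = Chem.MolToCXSmiles(mol)
    return filled


result = fill_missing_identifiers({"NAME": "ethanol", "SMILES": "CCO"})
print(result)
```

Identifiers that are already present are left untouched, so a value supplied in the dataset always takes precedence over a generated one.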

ord_rxn_converter.identifiers_module.generate_compound_table(compound_identifiers)

Generates a full set of compound identifiers in a fixed order.

If InChI key or CXSMILES are missing, attempts to generate them using RDKit.

Parameters:

compound_identifiers (list) – A list of CompoundIdentifier protobuf messages, typically accessed via reaction.inputs['m1_m2'].components[0].identifiers.

Returns:

A list of compound identifier values in this order: [inchi_key, smiles, inchi, iupac_name, name, cas_number, pubchem_cid, chemspider_id, cxsmiles, unspecified, custom, molblock, xyz, uniprot_id, pdb_id, amino_acid_sequence, helm, mdl]

Return type:

list

Example

>>> from ord_rxn_converter.identifiers_module import generate_compound_table
>>> compound_identifiers = reaction.inputs['...'].components[0].identifiers
>>> generate_compound_table(compound_identifiers)
['LFQSCWFLJHTTHZ-UHFFFAOYSA-N', 'CCO', 'InChI=1S/C2H6O/...', ...]
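The fixed ordering can be reproduced from a {type: value} dictionary such as the one returned by extract_compound_identifiers. In this sketch the column list follows the documented order. The helper to_compound_row is hypothetical, as is the assumption that dictionary keys are the uppercase ORD identifier type names (e.g. 'SMILES', 'INCHI_KEY').

```python
# Column order documented for generate_compound_table().
COMPOUND_COLUMNS = [
    "inchi_key", "smiles", "inchi", "iupac_name", "name", "cas_number",
    "pubchem_cid", "chemspider_id", "cxsmiles", "unspecified", "custom",
    "molblock", "xyz", "uniprot_id", "pdb_id", "amino_acid_sequence", "helm", "mdl",
]


def to_compound_row(identifiers: dict) -> list:
    """Order a hypothetical {IDENTIFIER_TYPE: value} dict into one fixed-length row."""
    # Keys are assumed to be uppercase type names; absent identifiers become None.
    return [identifiers.get(column.upper()) for column in COMPOUND_COLUMNS]


row = to_compound_row(
    {"SMILES": "CCO", "NAME": "ethanol", "INCHI_KEY": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"}
)
print(row[:5])
```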

Inputs Module

ord_rxn_converter.inputs_module.extract_input_addition(inputs, reactionID='')

Extracts detailed addition information from reaction inputs.

This function processes the reaction inputs and extracts detailed information about each addition, such as timing, speed, device, temperature, flow rate, and texture information.

Parameters:

inputs (dict) – Reaction inputs from a reaction object (protobuf-based ORD schema): the map of ReactionInput messages passed as reaction.inputs and keyed by input name (e.g., 'm1_m2').
reactionID (str, optional) – Unique identifier for the reaction. Defaults to ‘’.

Returns:

A list of lists, each containing addition details:

[reactionID, input key, addition order, addition time value, addition time unit, addition speed, addition duration value, addition duration unit, addition device, addition temperature value, addition temperature unit, flow rate value, flow rate unit, reaction texture, input texture details]

Return type:

list

Example

>>> from ord_rxn_converter.inputs_module import extract_input_addition
>>> extract_input_addition(reaction.inputs, reactionID='rxn-001')
[['rxn-001', 'm1_m2', 0, 0.0, 'UNSPECIFIED', ...]]
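Because extract_input_addition() returns plain lists, naming the fields is left to the caller. The sketch below wraps such rows in a DataFrame. The snake_case column names are hypothetical labels derived from the field order documented above. The two rows contain made-up values.

```python
import pandas as pd

# Hypothetical column names following the documented field order.
ADDITION_COLUMNS = [
    "reaction_id", "input_key", "addition_order", "addition_time_value",
    "addition_time_unit", "addition_speed", "addition_duration_value",
    "addition_duration_unit", "addition_device", "addition_temperature_value",
    "addition_temperature_unit", "flow_rate_value", "flow_rate_unit",
    "reaction_texture", "input_texture_details",
]


def addition_frame(rows: list) -> pd.DataFrame:
    """Wrap extract_input_addition()-style rows in a DataFrame with named columns."""
    for row in rows:
        if len(row) != len(ADDITION_COLUMNS):
            raise ValueError(f"expected {len(ADDITION_COLUMNS)} fields, got {len(row)}")
    return pd.DataFrame(rows, columns=ADDITION_COLUMNS)


# Two made-up rows in the documented order.
rows = [
    ["rxn-001", "m1", 0, 0.0, "MINUTE", "ALL_AT_ONCE", 0.0, "MINUTE", "SYRINGE",
     25.0, "CELSIUS", None, None, None, None],
    ["rxn-001", "m2", 1, 10.0, "MINUTE", "DROPWISE", 5.0, "MINUTE", "SYRINGE",
     0.0, "CELSIUS", None, None, None, None],
]
df = addition_frame(rows)
print(df[["input_key", "addition_order", "addition_time_value"]])
```

The length check fails loudly if the row layout ever changes, instead of silently shifting values into the wrong columns.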

ord_rxn_converter.inputs_module.extract_input_components(inputs, reactionID='')

Extracts detailed information about reaction input components and their compound identifiers.

Processes each input and its components, extracting chemical and experimental details including identifiers, amounts, roles, preparations, sources, features, analyses, and texture.

Parameters:

inputs (dict) – Reaction inputs from a reaction object (protobuf-based ORD schema): the map of ReactionInput messages passed as reaction.inputs and keyed by input name (e.g., 'm1_m2').
reactionID (str, optional) – Unique identifier for the reaction. Defaults to ‘’.

Returns:

list: List of input component details with structure: [reactionID, input key, inchi_key, amount value, amount unit, reaction role, is limiting (bool), compound preparation (list of dicts), component source (dict), feature dictionary, analyses list, texture dictionary]
list: List of compound identifier tables (each a list of compound identifiers).

Return type:

tuple

Example

>>> from ord_rxn_converter.inputs_module import extract_input_components
>>> components, compound_table = extract_input_components(reaction.inputs, reactionID='rxn-001')

ord_rxn_converter.inputs_module.extract_amount(compound)

Extracts the amount value and unit from a compound’s amount field.

This function reads the amount field of a compound (an Amount message whose mass, moles, volume, or unmeasured value is stored in a protobuf oneof) and returns its numerical value and unit as a string.

Parameters:

compound (protobuf message) – A compound component of a reaction with an amount field.

Returns:

float: Amount value.
str: Amount unit name (e.g., 'GRAM', 'MOLE', 'MILLILITER').

Return type:

tuple

Example

>>> from ord_rxn_converter.inputs_module import extract_amount
>>> amount_value, amount_unit = extract_amount(component)
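Reading a oneof generally means asking the message which member is set. The sketch below uses protobuf's WhichOneof on google.protobuf.struct_pb2.Value, whose oneof is named kind. The helper read_oneof is hypothetical. To use it on an ORD amount, pass the name of the Amount oneof as declared in the ORD reaction.proto. That name is an assumption here and should be checked against the schema.

```python
from google.protobuf import struct_pb2


def read_oneof(message, oneof_name: str):
    """Return (name of the set field, its value) for a oneof, or (None, None) if unset."""
    field_name = message.WhichOneof(oneof_name)
    if field_name is None:
        return None, None
    return field_name, getattr(message, field_name)


# struct_pb2.Value stands in for an ORD Amount here; its oneof is named "kind".
print(read_oneof(struct_pb2.Value(number_value=2.5), "kind"))
print(read_oneof(struct_pb2.Value(), "kind"))
```

For an ORD component, the set member would itself be a message (for example a mass), so the numeric value and unit would come from one further level of attribute access.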

Setup Module

ord_rxn_converter.setup_module.extract_reaction_setup(setup, reactionID)

Extracts detailed setup information from a reaction object.

This function processes the reaction setup section of an ORD (Open Reaction Database) reaction object and extracts metadata about the vessel, its material, volume, preparations, attachments, automation details, and environmental setup.

Parameters:

setup (reaction_pb2.ReactionSetup) – A ReactionSetup protobuf object containing details about how the reaction was set up.
reactionID (str) – Unique identifier for the reaction being processed.

Returns:

A list representing the reaction setup details in the following structure: [reactionID (str), vessel type (str), vessel material (str), vessel volume (float or None), volume unit (str or None), vessel preparations (dict or None), vessel attachments (dict or None), is automated (bool or None), automation platform (str or None), automation code (str), reaction environment (str or None)]

Return type:

list

Example

>>> from ord_schema.message_helpers import load_message
>>> from ord_schema.proto import dataset_pb2
>>> from ord_rxn_converter.setup_module import extract_reaction_setup
>>> dataset = load_message("example_dataset.pb", dataset_pb2.Dataset)
>>> reaction = dataset.reactions[0]
>>> reaction_setup = extract_reaction_setup(reaction.setup, reactionID='rxn-042')

Conditions Module

ord_rxn_converter.conditions_module.extract_reaction_conditions(conditions, reactionID: str) → list

Extracts reaction condition information from the conditions of an ORD reaction.

This function aggregates all available reaction condition data from an Open Reaction Database (ORD) reaction, including temperature, pressure, stirring, illumination, electrochemistry, and flow conditions.

Parameters:

conditions (reaction_pb2.ReactionConditions) – The conditions message of an ORD reaction, typically reaction.conditions.
reactionID (str) – Unique identifier for the reaction.

Returns:

A dictionary where keys represent condition types (e.g., "temperature", "pressure") and values are dictionaries of extracted parameters for each condition. If a condition is not present in the reaction message, it is omitted from the output.

Return type:

Dict[str, Dict[str, Union[str, float, bool]]]

Example

>>> from ord_schema.message_helpers import load_message
>>> from ord_schema.proto import dataset_pb2
>>> from ord_rxn_converter.conditions_module import extract_reaction_conditions
>>> dataset = load_message("example_dataset.pb", dataset_pb2.Dataset)
>>> reaction = dataset.reactions[0]
>>> conditions = extract_reaction_conditions(reaction.conditions, reactionID='rxn-028')
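Given the dictionary-of-dictionaries structure described under Returns, one way to store conditions as a single table row is to flatten the nested keys. The sketch below uses a hypothetical helper and made-up condition values.

```python
def flatten_conditions(reaction_id: str, conditions: dict) -> dict:
    """Turn {'temperature': {'value': ..., 'units': ...}, ...} into one flat row."""
    row = {"reactionID": reaction_id}
    for condition_type, params in conditions.items():
        if params is None:
            continue  # condition absent from the reaction message
        for key, value in params.items():
            # Prefix each parameter with its condition type, e.g. temperature_value.
            row[f"{condition_type}_{key}"] = value
    return row


conditions = {
    "temperature": {"value": 80.0, "units": "CELSIUS", "control": "OIL_BATH"},
    "stirring": {"type": "STIR_BAR", "rate": 300.0, "units": "RPM"},
}
row = flatten_conditions("rxn-028", conditions)
print(row)
```

Prefixing keeps parameters with the same name (such as units) from overwriting one another across condition types.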

ord_rxn_converter.conditions_module.temperature_conditions(temperature) → dict

Extracts temperature conditions from an ORD reaction.

Parameters:

temperature (reaction_pb2.TemperatureConditions) – The temperature conditions message, typically reaction.conditions.temperature.

Returns:

A dictionary with temperature condition details, or None if no temperature condition is found. Keys may include "value", "units", "setpoint", and "control".

Return type:

Optional[Dict[str, Union[str, float]]]
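Temperatures may be recorded in different units, so comparing them across reactions usually requires normalizing them first. The sketch below assumes the "units" value is the ORD unit name as a string (CELSIUS, FAHRENHEIT, or KELVIN). The helper to_celsius is hypothetical.

```python
def to_celsius(value: float, units: str) -> float:
    """Convert a temperature to degrees Celsius from an ORD-style unit name."""
    if units == "CELSIUS":
        return value
    if units == "FAHRENHEIT":
        return (value - 32.0) * 5.0 / 9.0
    if units == "KELVIN":
        return value - 273.15
    # UNSPECIFIED or unknown units cannot be converted safely.
    raise ValueError(f"unsupported temperature unit: {units!r}")


print(to_celsius(212.0, "FAHRENHEIT"), to_celsius(298.15, "KELVIN"))
```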

ord_rxn_converter.conditions_module.pressure_conditions(pressure) → dict

Extracts pressure conditions from an ORD reaction.

Parameters:

pressure (reaction_pb2.PressureConditions) – The pressure conditions message, typically reaction.conditions.pressure.

Returns:

A dictionary with pressure condition details, or None if no pressure condition is found. Keys may include "value", "units", and "control".

Return type:

Optional[Dict[str, Union[str, float]]]

ord_rxn_converter.conditions_module.stirring_conditions(stirring) → dict

Extracts stirring conditions from an ORD reaction.

Parameters:

stirring (reaction_pb2.StirringConditions) – The stirring conditions message, typically reaction.conditions.stirring.

Returns:

A dictionary with stirring condition details, or None if no stirring condition is found. Keys may include "type", "rate", "units", and "control".

Return type:

Optional[Dict[str, Union[str, float, bool]]]

ord_rxn_converter.conditions_module.illumination_conditions(illumination) → dict

Extracts illumination conditions from an ORD reaction.

Parameters:

illumination (reaction_pb2.IlluminationConditions) – The illumination conditions message, typically reaction.conditions.illumination.

Returns:

A dictionary with illumination condition details, or None if no illumination condition is found. Keys may include "type", "wavelength", and "wavelength_units".

Return type:

Optional[Dict[str, Union[str, float]]]

ord_rxn_converter.conditions_module.electrochemistry_conditions(electrochemistry) → dict

Extracts electrochemistry conditions from an ORD reaction.

Parameters:

electrochemistry (reaction_pb2.ElectrochemistryConditions) – The electrochemistry conditions message, typically reaction.conditions.electrochemistry.

Returns:

A dictionary with electrochemistry condition details, or None if no electrochemistry condition is found. Keys may include "type", "current", "potential", "cell_type", "anode", and "cathode".

Return type:

Optional[Dict[str, Union[str, float]]]

ord_rxn_converter.conditions_module.flow_conditions(flow) → dict

Extracts flow conditions from an ORD reaction.

Parameters:

flow (reaction_pb2.FlowConditions) – The flow conditions message, typically reaction.conditions.flow.

Returns:

A dictionary with flow condition details, or None if no flow condition is found. Keys may include "flow_rate", "flow_rate_units", "residence_time", "residence_time_units", and "slug_diameter".

Return type:

Optional[Dict[str, Union[str, float]]]

Notes & Observations Module

ord_rxn_converter.notes_observations_module.extract_notes_observations(reactionID, notes, observations=None)

Extracts reaction notes and optional observations from ORD reaction data.

This function takes a reaction ID, notes, and optionally observations, and returns a list summarizing various reaction notes and details from observations if provided. The notes include flags for reaction characteristics, safety notes, and procedure details. Observations include time, comments, and image metadata.

Parameters:

reactionID (str) – Unique identifier for the reaction.
notes (object) –
Notes object containing reaction flags and textual details. Expected attributes:
- is_heterogeneous (bool)
- forms_precipitate (bool)
- is_exothermic (bool)
- offgasses (bool)
- is_sensitive_to_moisture (bool)
- is_sensitive_to_oxygen (bool)
- is_sensitive_to_light (bool)
- safety_notes (str)
- procedure_details (str)
observations (list, optional) – List of observation objects, each possibly containing:
- time.value (float)
- time.units (enum int)
- comment (str)
- image with kind, description, and format attributes

Returns:

A list containing the following elements in order:

[reactionID (str), is_heterogeneous (bool), forms_precipitate (bool), is_exothermic (bool), offgasses (bool), is_sensitive_to_moisture (bool), is_sensitive_to_oxygen (bool), is_sensitive_to_light (bool), safety_notes (str), procedure_details (str), observations (list of dict or None)]

Each dict in observations contains keys:

'time': float
'timeUnit': str
'comment': str
'imageKind': str
'imageDescription': str
'imageFormat': str

Return type:

list
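The observations element is a nested list of dictionaries. A table with one row per observation can therefore be built with pandas explode and json_normalize. The sketch below runs on a hypothetical notes table that holds only the reactionID and observations fields, filled with made-up values.

```python
import pandas as pd

# Hypothetical notes table: one row per reaction, observations list or None.
notes = pd.DataFrame({
    "reactionID": ["rxn-001", "rxn-002"],
    "observations": [
        [
            {"time": 5.0, "timeUnit": "MINUTE", "comment": "turned yellow",
             "imageKind": None, "imageDescription": None, "imageFormat": None},
            {"time": 1.0, "timeUnit": "HOUR", "comment": "precipitate formed",
             "imageKind": None, "imageDescription": None, "imageFormat": None},
        ],
        None,
    ],
})

# One row per observation; reactions without observations are dropped.
exploded = notes.explode("observations").dropna(subset=["observations"])
observations = pd.json_normalize(exploded["observations"].tolist())
# Use the raw values so the insert does not depend on index alignment.
observations.insert(0, "reactionID", exploded["reactionID"].to_numpy())
print(observations[["reactionID", "time", "timeUnit", "comment"]])
```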

Workups Module

ord_rxn_converter.workups_module.extract_reaction_workups(workups, reactionID)

Extracts workup details from an ORD reaction workup list.

This function parses a list of ReactionWorkup protobuf messages associated with a reaction and extracts structured information including compound input components, input addition details, temperature and stirring conditions, and metadata such as pH or automation status.

Parameters:

workups (list) – A list of ReactionWorkup messages from reaction_pb2.Reaction.workups.
reactionID (str) – A unique identifier for the reaction.

Returns:

A list of extracted workup information for the reaction. Each item in the list corresponds to a single ReactionWorkup and contains the following fields:

[reactionID (str), workup_type (str), workup.details (str), workup.duration.value (float), workup_duration_unit (str), input_components (dict or None), input_addition_details (dict or None), temperature_conditions_list (list or None), keep_phase (str), stirring_conditions_list (list or None), target_ph (float or None), is_automated (bool or None)]

Return type:

list

Outcomes Module

ord_rxn_converter.outcomes_module.extract_reaction_outcomes(reactionID, outcomes)

Extracts outcome information from ORD reaction data.

Takes the list of outcome messages from a reaction (Protocol Buffers messages defined by the ORD schema) and extracts reaction time, conversion percentages, product information, and analytical data.

Parameters:

reactionID (str) – Unique identifier for the reaction.
outcomes (list) – List of outcome objects from a reaction, containing reaction time, conversion, products, and analyses data.

Returns:

A tuple containing two elements:

outcomes_list (list): A list of lists, where each inner list contains: [reactionID, outcomeKey, reaction_time_value, time_unit, conversion_value, products_list, analyses_list]
outcome_identifiers (list): A list of compound identifiers associated with the reaction outcomes, or None if no products are present.

Return type:

tuple

ord_rxn_converter.outcomes_module.extract_product(products)

Extracts product data and related measurements from ORD product objects.

Takes product objects from a reaction outcome and extracts information including identifiers, measurements, textures, features, and reaction roles. Also generates compound tables with standardized identifiers.

Parameters:

products (list) – List of product objects from a reaction outcome.

Returns:

A tuple containing two elements:

products_list (list): A list of lists, where each inner list contains: [inchi_key, is_desired_product, products_measurements, isolated_color, product_texture, feature_dict, reaction_role]
compound_identifiers (list): A list of compound tables containing standardized compound identifiers for all products.

Return type:

tuple

ord_rxn_converter.outcomes_module.extract_product_measurements(measurements)

Extracts measurement data from ORD product measurements.

Processes measurement objects to extract analytical data including measurement types, values, spectroscopic details, and chromatographic information.

Parameters:

measurements (list) – List of measurement objects associated with a product.

Returns:

A list of lists, where each inner list contains measurement data: [analysis_key, measurement_type, details, uses_internal_standard, is_normalized, uses_authentic_standard, compound_authentic, measurement_value, retention_time, time_unit, mass_spec_type, mass_spec_details, tic_minimum, tic_maximum, eic_masses, selectivity, wavelength, wavelength_unit]

Return type:

list
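Combining the product and measurement layouts above, the yields of desired products can be pulled out by position. The sketch below assumes measurement_type is stored as the ORD enum name (e.g. 'YIELD') and measurement_value as a plain number. Both the helper and the data are hypothetical.

```python
# Field positions taken from the list layouts documented above.
PRODUCT_INCHI_KEY, PRODUCT_IS_DESIRED, PRODUCT_MEASUREMENTS = 0, 1, 2
MEASUREMENT_TYPE, MEASUREMENT_VALUE = 1, 7


def desired_product_yields(products_list: list) -> dict:
    """Map the InChI key of each desired product to its YIELD measurement values."""
    yields = {}
    for product in products_list:
        if not product[PRODUCT_IS_DESIRED]:
            continue
        yields[product[PRODUCT_INCHI_KEY]] = [
            m[MEASUREMENT_VALUE]
            for m in product[PRODUCT_MEASUREMENTS] or []
            if m[MEASUREMENT_TYPE] == "YIELD"
        ]
    return yields


# Made-up data: a measurement row padded to the documented 18 fields.
measurement = ["a1", "YIELD", "", False, False, False, None, 85.0] + [None] * 10
products_list = [
    ["LFQSCWFLJHTTHZ-UHFFFAOYSA-N", True, [measurement], None, None, {}, "PRODUCT"],
    ["ROSDSFDQCJNGOL-UHFFFAOYSA-N", False, [], None, None, {}, "PRODUCT"],
]
yields = desired_product_yields(products_list)
print(yields)
```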

ord_rxn_converter.outcomes_module.extract_analyses(analyses)

Extracts analytical data from ORD reaction analyses.

Processes analysis objects to extract information about analytical techniques, instrument details, and associated data for reaction outcome characterization.

Parameters:

analyses (dict) – Dictionary of analysis objects keyed by analysis_key.

Returns:

A list of dictionaries, where each dictionary contains: {'analysisKey': str, 'analysisType': str, 'Details': str, 'CHMO_ID': str, 'IsolatedSpecies': bool, 'data': dict, 'instrumentManufacturer': str, 'lastCalibrated': datetime}

Return type:

list

Examples

Working with Multiple Files

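import pandas as pd
from ord_rxn_converter.dataset_module import extract_dataset

# Process multiple dataset files
file_list = ['dataset1.pb', 'dataset2.pb', 'dataset3.pb']
frames = {}  # table name -> list of DataFrames, one per file
compounds = pd.DataFrame()
persons = pd.DataFrame()

for file_path in file_list:
    # Pass the running compound and person tables so extract_dataset updates
    # them instead of producing duplicate rows for shared compounds and people
    data = extract_dataset(file_path, compounds=compounds, persons=persons)
    compounds, persons = data['compound'], data['person']

    # Collect the per-reaction tables; concatenate once after the loop
    for key, df in data.items():
        if key not in ('compound', 'person'):
            frames.setdefault(key, []).append(df)

all_data = {key: pd.concat(dfs, ignore_index=True) for key, dfs in frames.items()}
all_data['compound'] = compounds
all_data['person'] = persons

print(f"Total reactions processed: {len(all_data['reaction_metadata'])}")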

Filtering and Analysis

from ord_rxn_converter.dataset_module import extract_dataset

# Load dataset
data = extract_dataset("reactions.pb")

# Filter reactions by temperature
# (column names are illustrative; check conditions.columns for the actual names)
conditions = data['reaction_conditions']
temp_conditions = conditions[conditions['condition_type'] == 'temperature']
high_temp_reactions = temp_conditions[temp_conditions['value'] > 100]

# Analyze outcomes; ORD records conversion as a percentage (0-100)
outcomes = data['reaction_outcomes']
successful_reactions = outcomes[outcomes['conversion'] > 80]

print(f"High temperature reactions: {len(high_temp_reactions)}")
print(f"High conversion reactions: {len(successful_reactions)}")

Notes

The package automatically handles both compressed (.pb.gz) and uncompressed (.pb, .pbtxt) files
Large datasets may require significant memory for processing
All DataFrames follow consistent naming conventions based on the ORD schema
Missing or optional fields are handled gracefully with appropriate default values
