Generating synthetic model output data for examples, tests or mock studies
Problem
As a developer, you want to generate reference data for examples or tests while controlling the bias of the data with respect to the model output. As a user, you want to generate synthetic experimental data that can be used to validate or verify a model.
Solution
A utility function is provided to generate synthetic reference data from a model and either:
- an input dataset or
- a parameter space to sample from.
Example
In the example below, synthetic reference data is generated for an analytical beam model of a bending test, and a bias is added to the data so that the validation case yields non-zero error metrics.
from gemseo.datasets.io_dataset import IODataset
from numpy import atleast_1d
from vimseo import EXAMPLE_RUNS_DIR
from vimseo.api import create_model
from vimseo.core.model_settings import IntegratedModelSettings
from vimseo.tools.space.space_tool import SpaceTool
from vimseo.utilities.datasets import SEP
from vimseo.utilities.generate_validation_reference import Bias
from vimseo.utilities.generate_validation_reference import (
    generate_reference_from_dataset,
)
Load a parameter space to sample from.
space_tool_result = SpaceTool.load_results("bending_test_validation_input_space.json")
print(space_tool_result)
Out:
/home/sebastien.bocquet/PycharmProjects/vimseo/.tox/doc/lib/python3.11/site-packages/pydantic/main.py:209: DeprecationWarning:
Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
SpaceToolResult(metadata=ToolResultMetadata(generic={'datetime': '24-01-2025_21-20-26', 'version': '1.0.1.dev19+gef64ff85.d20250121'}, misc={}, settings={'distribution_name': 'OTTriangularDistribution', 'space_builder_name': 'FromModelCenterAndCov', 'minimum_values': None, 'maximum_values': None, 'center_value_expr': '0.5*(mini+maxi)', 'use_default_values_as_center': True, 'variable_names': ['length', 'width', 'height', 'imposed_dplt', 'young_modulus', 'nu_p'], 'center_values': None, 'cov': 0.05, 'truncate_to_model_bounds': True, 'lower_bounds': None, 'upper_bounds': None}, report={}, model=None), parameter_space=Parameter space:
+---------------+-------------+--------------------+-------------+-------+--------------------------------------------------------------------------+--------------------+
| Name | Lower bound | Value | Upper bound | Type | Initial distribution | Transformation(x)= |
+---------------+-------------+--------------------+-------------+-------+--------------------------------------------------------------------------+--------------------+
| nu_p | 0.285 | 0.3002399751580314 | 0.315 | float | Triangular(lower=-1000000000000.0, mode=0.3, upper=1000000000000.0) | Trunc(x) |
| imposed_dplt | -5.25 | -4.99955553660073 | -4.75 | float | Triangular(lower=-1000000000000.0, mode=-5.0, upper=1000000000000.0) | Trunc(x) |
| young_modulus | 199500 | 209999.9992855908 | 220500 | float | Triangular(lower=-1000000000000.0, mode=210000.0, upper=1000000000000.0) | Trunc(x) |
| length | 570 | 599.999950346782 | 630 | float | Triangular(lower=-1000000000000.0, mode=600.0, upper=1000000000000.0) | Trunc(x) |
| height | 38 | 39.99977464700989 | 42 | float | Triangular(lower=-1000000000000.0, mode=40.0, upper=1000000000000.0) | Trunc(x) |
| width | 28.5 | 29.99955343520399 | 31.5 | float | Triangular(lower=-1000000000000.0, mode=30.0, upper=1000000000000.0) | Trunc(x) |
+---------------+-------------+--------------------+-------------+-------+--------------------------------------------------------------------------+--------------------+)
Generate 3 samples of input data from the parameter space, and create a dataset with the input data.
input_data = space_tool_result.parameter_space.compute_samples(
    n_samples=3, as_dict=False
)
reference_data = IODataset()
reference_data.add_group(
    IODataset.INPUT_GROUP,
    input_data,
    space_tool_result.parameter_space.uncertain_variables,
)
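For readers without the library at hand, the sampling step can be approximated with the standard library alone. This is only a sketch and not how `compute_samples` works internally: it assumes a plain triangular law on each variable's lower and upper bounds, with the mode at the nominal value shown in the table above, whereas the library uses a truncated distribution. The helper name `sample_inputs` is hypothetical.

```python
import random

# (lower bound, mode, upper bound) copied from the parameter space above.
# Assumption: a plain triangular law on [lower, upper]; the library's
# truncated distribution may behave differently.
SPACE = {
    "length": (570.0, 600.0, 630.0),
    "width": (28.5, 30.0, 31.5),
    "height": (38.0, 40.0, 42.0),
}


def sample_inputs(space, n_samples, seed=0):
    """Draw n_samples values per variable (hypothetical helper)."""
    rng = random.Random(seed)
    return {
        # random.triangular takes (low, high, mode), in that order.
        name: [rng.triangular(lower, upper, mode) for _ in range(n_samples)]
        for name, (lower, mode, upper) in space.items()
    }


samples = sample_inputs(SPACE, n_samples=3)
for name, values in samples.items():
    print(name, [round(v, 3) for v in values])
```

Fixing the seed keeps the generated inputs reproducible between runs, which is usually desirable for reference data used in tests.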
Prepare the model and the bias to apply to the output data.
model_name = "BendingTestAnalytical"
load_case = "Cantilever"
model = create_model(
    model_name,
    load_case,
    model_options=IntegratedModelSettings(
        directory_archive_root=EXAMPLE_RUNS_DIR / "archive/generate_reference_data",
        directory_scratch_root=EXAMPLE_RUNS_DIR / "scratch/generate_reference_data",
        cache_file_path=EXAMPLE_RUNS_DIR
        / f"caches/generate_reference_data/{model_name}_{load_case}.hdf",
    ),
)
outputs_to_bias = {"reaction_forces": Bias(mult_factor=1.05)}
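To see why a multiplicative bias leads to non-zero error metrics, consider the sketch below. It assumes that `Bias(mult_factor=1.05)` simply scales the model output by 1.05, which this page does not spell out. The function `apply_mult_bias` is a hypothetical stand-in, and the force values are illustrative rather than actual model results.

```python
import numpy as np


def apply_mult_bias(values, mult_factor):
    """Scale values by mult_factor (hypothetical stand-in for Bias)."""
    return np.asarray(values, dtype=float) * mult_factor


# Illustrative model outputs, not actual results.
model_output = np.array([-2000.0, -2200.0, -2500.0])
reference = apply_mult_bias(model_output, mult_factor=1.05)

# Relative error of the model with respect to the biased reference.
relative_error = np.abs(model_output - reference) / np.abs(reference)
print(relative_error)
```

Under this assumption, the model deviates from the reference by 0.05 / 1.05, about 4.8 %, for every sample, regardless of the magnitude of the output.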
Generate the synthetic reference data from the model, the input dataset and the bias. Specific values can be prescribed for input variables that are absent from the input dataset, or whose values should differ from those in the input dataset. The generated data can be returned as a dataset:
specific_inputs = {"length": atleast_1d(100.0)}
df = generate_reference_from_dataset(
    model,
    reference_data,
    specific_inputs=specific_inputs,
    outputs_to_bias=outputs_to_bias,
    as_dataset=True,
)
print(df)
df.to_csv("dataset_validation_beam_cantilever.csv", sep=SEP)
# Or a dataframe:
df = generate_reference_from_dataset(
    model,
    reference_data,
    specific_inputs=specific_inputs,
    outputs_to_bias=outputs_to_bias,
    as_dataset=False,
)
print(df)
df.to_csv("dataframe_validation_beam_cantilever.csv", sep=SEP)
Out:
GROUP inputs ... outputs
VARIABLE height imposed_dplt length ... reaction_forces user vims_git_version
COMPONENT 0 0 0 ... 0 0 0
0 40.378662 -5.136475 610.926758 ... -2323.603871 sebastien.bocquet bd04719587923889b30edb372ef71eeaeb4c168d
1 38.020020 -4.822388 599.229004 ... -2113.937195 sebastien.bocquet bd04719587923889b30edb372ef71eeaeb4c168d
2 39.801025 -5.150757 597.712769 ... -2642.579197 sebastien.bocquet bd04719587923889b30edb372ef71eeaeb4c168d
[3 rows x 230 columns]
height imposed_dplt length ... reaction_forces user vims_git_version
0 40.378662 -5.136475 610.926758 ... -2323.603871 sebastien.bocquet bd04719587923889b30edb372ef71eeaeb4c168d
1 38.020020 -4.822388 599.229004 ... -2113.937195 sebastien.bocquet bd04719587923889b30edb372ef71eeaeb4c168d
2 39.801025 -5.150757 597.712769 ... -2642.579197 sebastien.bocquet bd04719587923889b30edb372ef71eeaeb4c168d
[3 rows x 230 columns]
Note
Data generation can append data to an existing dataframe, but not to an existing dataset.
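How the library appends to an existing dataframe is not described on this page. As a rough illustration of the idea only, the sketch below assumes that appending amounts to a row-wise concatenation of two pandas dataframes with the same columns. The column subset and the values are illustrative, not actual results.

```python
import pandas as pd

# Illustrative rows with a reduced set of columns; not actual model results.
existing = pd.DataFrame(
    {"height": [40.0, 38.0], "reaction_forces": [-2300.0, -2100.0]}
)
new_rows = pd.DataFrame({"height": [39.8], "reaction_forces": [-2600.0]})

# Assumption: appending is a row-wise concatenation with a fresh index.
combined = pd.concat([existing, new_rows], ignore_index=True)
print(combined)
```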
Total running time of the script: ( 0 minutes 1.789 seconds)