Options & Configuration

This page documents all available options for configuring your spml2 workflow. You can set these in your options_user.py file.

Available Options

Main Options
Option	Type	Default	Description / Possible Values
test_mode	bool	False	Enable test mode for quick runs (uses a small sample)
debug	bool	False	Enable debug mode (extra output, skips some models)
target_name	str	'target'	Name of the target column in your data
test_df_size	int	1000	Number of rows for test DataFrame (if test_mode)
test_ratio	float	0.20	Proportion of the dataset to use as test split
root	Path or str	'./input'	Root directory for data files
real_df_filename	str	'example.dta'	Main data file name (supports .dta, .parquet, .csv, .xlsx)
output_folder	Path or str	'Output'	Output folder for results
numerical_cols	list[str] or None	None	List of numerical columns (None = infer automatically)
categorical_cols	list[str] or None	None	List of categorical columns (None = infer automatically)
data	pandas.DataFrame or None	None	Pass a custom DataFrame directly (advanced use; otherwise data is loaded from file)
stratify	bool	True	Whether to stratify train/test splits (recommended for classification)
random_state	int	42	Random seed for reproducibility
raise_error	bool	True	Raise errors encountered while running models, especially during plotting and feature-importance extraction (set False to suppress them and continue)
sampling_strategy	str or float	'auto'	SMOTE sampling strategy (see imbalanced-learn docs)
n_splits	int	5	Number of cross-validation splits
shap_plots	bool	False	Enable SHAP plots
roc_plots	bool	True	Enable ROC curve plots
shap_sample_size	int	100	Number of samples for SHAP plots
pipeline	ImbPipeline or None	None	Custom pipeline (advanced users)
search_type	str	'random'	Hyperparameter search type ('random' or 'grid')
search_kwargs	dict or None	None	Additional kwargs for search (e.g., {'verbose': 1})
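
To make the test_ratio, stratify, and random_state options more concrete, here is a minimal sketch of what a stratified split with those default values looks like in plain scikit-learn. It assumes spml2 passes these values to something equivalent to train_test_split, which this page does not confirm; the toy data and variable names are illustrative only.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy imbalanced dataset (illustrative only): 80 negatives, 20 positives.
X = [[i] for i in range(100)]
y = [0] * 80 + [1] * 20

# test_size mirrors test_ratio=0.20, stratify=y mirrors stratify=True,
# and random_state mirrors random_state=42 from the table above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)

# Stratification keeps the 80/20 class balance in both partitions.
train_counts = Counter(y_train)
test_counts = Counter(y_test)
print(len(X_train), len(X_test))
print(dict(train_counts), dict(test_counts))
```

Without stratification, a small test split of an imbalanced dataset can end up with very few positives, which is why the option is recommended for classification.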

Example options_user.py

from pathlib import Path
from spml2 import Options
from models_user import models
from imblearn.pipeline import Pipeline as ImbPipeline
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE

user_pipeline = ImbPipeline([
    ("preprocessor", StandardScaler()),
    ("smote", SMOTE(random_state=42)),
    # Add more steps as needed
])

options = Options(
    test_mode=False,
    debug=False,
    target_name="target",
    test_df_size=1000,
    test_ratio=0.20,
    root=Path("./input"),
    real_df_filename="example.dta",
    output_folder="Output",
    numerical_cols=None,
    sampling_strategy="auto",
    n_splits=5,
    shap_plots=False,
    roc_plots=True,
    shap_sample_size=100,
    pipeline=user_pipeline,
    search_type="random",
    search_kwargs={"verbose": 1},
)

print(options)

See the comments in options_user.py for more details and customization tips.
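
The difference between the two search_type values can be illustrated with scikit-learn's own parameter helpers. This is a sketch only: it assumes that 'grid' corresponds to an exhaustive search and 'random' to a sampled search, as in scikit-learn's GridSearchCV and RandomizedSearchCV, and the parameter space shown is hypothetical rather than taken from spml2.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical hyperparameter space for a tree ensemble.
param_space = {
    "max_depth": [3, 5, 7],
    "n_estimators": [50, 100, 200, 400],
}

# Grid search evaluates every combination: 3 * 4 candidates.
grid_candidates = list(ParameterGrid(param_space))

# Random search evaluates only n_iter sampled combinations.
random_candidates = list(
    ParameterSampler(param_space, n_iter=5, random_state=42)
)

print(len(grid_candidates), len(random_candidates))
```

A random search trades completeness for speed, which is usually the better default when the space is large.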