Options & Configuration

This page documents all available options for configuring your spml2 workflow. You can set these in your options_user.py file.

Available Options

Main Options

| Option | Type | Default | Description / Possible Values |
|---|---|---|---|
| test_mode | bool | False | Enable test mode for quick runs (uses a small sample) |
| debug | bool | False | Enable debug mode (extra output, skips some models) |
| target_name | str | 'target' | Name of the target column in your data |
| test_df_size | int | 1000 | Number of rows in the test DataFrame (only when test_mode is True) |
| test_ratio | float | 0.20 | Proportion of the dataset to use as the test split |
| root | Path or str | './input' | Root directory for data files |
| real_df_filename | str | 'example.dta' | Main data file name (supports .dta, .parquet, .csv, .xlsx) |
| output_folder | Path or str | 'Output' | Output folder for results |
| numerical_cols | list[str] or None | None | List of numerical columns (None = infer automatically) |
| categorical_cols | list[str] or None | None | List of categorical columns (None = infer automatically) |
| data | pandas.DataFrame or None | None | Pass a custom DataFrame directly (advanced use; otherwise data is loaded from file) |
| stratify | bool | True | Whether to stratify train/test splits (recommended for classification) |
| random_state | int | 42 | Random seed for reproducibility |
| raise_error | bool | True | Raise errors that occur while running models, especially during plotting and feature-importance extraction (set False to suppress them and continue) |
| sampling_strategy | str or float | 'auto' | SMOTE sampling strategy (see imbalanced-learn docs) |
| n_splits | int | 5 | Number of cross-validation splits |
| shap_plots | bool | False | Enable SHAP plots |
| roc_plots | bool | True | Enable ROC curve plots |
| shap_sample_size | int | 100 | Number of samples for SHAP plots |
| pipeline | ImbPipeline or None | None | Custom pipeline (advanced users) |
| search_type | str | 'random' | Hyperparameter search type ('random' or 'grid') |
| search_kwargs | dict or None | None | Additional kwargs for search (e.g., {'verbose': 1}) |
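
The test_ratio, stratify, and random_state options interact, so a small illustration may help. The following is a conceptual, standard-library-only sketch of what a stratified, seeded split means; it is not spml2's implementation (which is not shown on this page), and the function name stratified_split is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_ratio=0.20, random_state=42):
    """Return (train_idx, test_idx) with per-class proportions preserved.

    Hypothetical helper for illustration; not part of spml2.
    """
    rng = random.Random(random_state)  # fixed seed: same split on every run
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Take the same proportion from every class
        n_test = round(len(members) * test_ratio)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# An imbalanced binary target: 90 negatives and 10 positives
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels)
test_positives = sum(labels[i] for i in test_idx)
```

With these inputs the test split holds 20 rows, 2 of them positive, so the 10% positive rate of the full data is preserved. Without stratification, a random 20-row sample could easily contain no positives at all, which is why stratify=True is recommended for classification.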

Example options_user.py

from pathlib import Path

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as ImbPipeline
from sklearn.preprocessing import StandardScaler

from models_user import models  # user-defined models; not referenced in this snippet
from spml2 import Options

# Custom pipeline: scale the features, then resample with SMOTE.
user_pipeline = ImbPipeline([
    ("preprocessor", StandardScaler()),
    ("smote", SMOTE(random_state=42)),
    # Add more steps as needed
])

options = Options(
    # Run mode
    test_mode=False,
    debug=False,
    # Data location and target
    target_name="target",
    test_df_size=1000,
    test_ratio=0.20,
    root=Path("./input"),
    real_df_filename="example.dta",
    output_folder="Output",
    numerical_cols=None,  # None = infer automatically
    # Resampling and cross-validation
    sampling_strategy="auto",
    n_splits=5,
    # Plots
    shap_plots=False,
    roc_plots=True,
    shap_sample_size=100,
    # Pipeline and hyperparameter search
    pipeline=user_pipeline,
    search_type="random",
    search_kwargs={"verbose": 1},
)

# Display the resulting configuration
print(options)
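
The example above sets search_type="random". To illustrate how that differs from "grid", here is a conceptual sketch using only the standard library. The parameter grid and both function names are hypothetical, and the sketch assumes the usual meaning of the two terms (exhaustive versus sampled search); how spml2 actually runs the search is not documented on this page.

```python
import itertools
import random

# Hypothetical search space, for illustration only
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.3],
}


def grid_candidates(grid):
    """Every combination of values: exhaustive, as in a grid search."""
    names = sorted(grid)
    return [
        dict(zip(names, values))
        for values in itertools.product(*(grid[name] for name in names))
    ]


def random_candidates(grid, n_iter, random_state=42):
    """A fixed-size random subset of the combinations, as in a random search."""
    rng = random.Random(random_state)
    return rng.sample(grid_candidates(grid), n_iter)


all_combos = grid_candidates(param_grid)           # 3 * 3 = 9 candidates
sampled = random_candidates(param_grid, n_iter=4)  # 4 of those 9

# Each candidate is evaluated once per cross-validation split (n_splits)
n_splits = 5
grid_fits = len(all_combos) * n_splits
random_fits = len(sampled) * n_splits
```

Under these assumptions a grid search costs 45 cross-validation fits and the random search 20, which is the usual reason to prefer "random" for larger search spaces.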

See the comments in options_user.py for more details and customization tips.
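
As one customization tip, the data option accepts a DataFrame you have loaded yourself. The sketch below shows one way to pick a pandas reader from the file suffix for the four supported formats. The mapping and the helper names reader_name and load_frame are assumptions for illustration, not spml2's own loader.

```python
from pathlib import Path

# Assumed mapping from file suffix to the pandas reader for that format
READERS = {
    ".dta": "read_stata",
    ".parquet": "read_parquet",
    ".csv": "read_csv",
    ".xlsx": "read_excel",
}


def reader_name(path):
    """Return the name of the pandas reader for a supported file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    return READERS[suffix]


def load_frame(root, filename):
    """Load root/filename into a DataFrame (pandas is imported lazily)."""
    import pandas as pd

    path = Path(root) / filename
    return getattr(pd, reader_name(path))(path)


# Same defaults as the options table: root='./input', real_df_filename='example.dta'
data_path = Path("./input") / "example.dta"
chosen = reader_name(data_path)
```

The loaded frame could then be passed as Options(data=load_frame("./input", "example.dta"), ...), in which case the file-based options should no longer be needed, though that behavior is inferred from the table's description of data rather than confirmed here.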