hypernets.tabular package

Submodules

hypernets.tabular.cache module

class hypernets.tabular.cache.CacheCallback[source]

Bases: object

on_apply(fn, cached_data, *args, **kwargs)[source]

Fired before applying cached data. Raise an exception to skip applying the cached data.

on_enter(fn, *args, **kwargs)[source]

Fired before checking the cache. Raise an exception to disable the cache for this call.

on_leave(fn, *args, **kwargs)[source]

Fired before leaving the fn call. Raise an exception to skip storing the cache.

on_store(fn, cached_data, *args, **kwargs)[source]

Fired before storing the cache. Raise an exception to skip storing the cache.

exception hypernets.tabular.cache.SkipCache[source]

Bases: Exception

hypernets.tabular.cache.cache(strategy=None, arg_keys=None, attr_keys=None, attrs_to_restore=None, transformer=None, callbacks=None, cache_dir=None)[source]
hypernets.tabular.cache.clear(cache_dir=None, fn=None)[source]
hypernets.tabular.cache.decorate(fn, *, cache_dir, strategy, arg_keys=None, attr_keys=None, attrs_to_restore=None, transformer=None, callbacks=None)[source]
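
Example (a minimal sketch, not from the library's docs: the comma-separated string form of attr_keys/attrs_to_restore, the callbacks argument taking a list, and the SlowEncoder class are assumptions inferred from the signatures above):

>>> from hypernets.tabular.cache import cache, clear, CacheCallback
>>> class LoggingCallback(CacheCallback):
...     # any hook may raise an exception to skip the corresponding step
...     def on_enter(self, fn, *args, **kwargs):
...         print('checking cache for', fn.__name__)
>>> class SlowEncoder:
...     def __init__(self, columns=None):
...         self.columns = columns
...     @cache(attr_keys='columns', attrs_to_restore='columns,mapping_',
...            callbacks=[LoggingCallback()])
...     def fit_transform(self, X, y=None):
...         # attr_keys identifies the call; attrs_to_restore lists the fitted
...         # attributes rebuilt on a cache hit instead of re-running the body
...         self.mapping_ = {c: i for i, c in enumerate(X.columns)}
...         return X.rename(columns=self.mapping_)
>>> clear()  # drop previously stored cache entries from the default cache_dir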

hypernets.tabular.cfg module

hypernets.tabular.collinearity module

Adapted from https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance_multicollinear.html. Multicollinearity is handled by performing hierarchical clustering on the features’ Spearman rank-order correlations, picking a threshold, and keeping a single feature from each cluster.

class hypernets.tabular.collinearity.MultiCollinearityDetector[source]

Bases: object

detect(X, method=None)[source]
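
Example (a hedged sketch; the exact form of the return value, assumed here to describe which features are kept and which are dropped, should be checked against the source):

>>> import numpy as np
>>> import pandas as pd
>>> from hypernets.tabular.collinearity import MultiCollinearityDetector
>>> rs = np.random.RandomState(0)
>>> df = pd.DataFrame({'a': rs.rand(100), 'c': rs.rand(100)})
>>> df['b'] = df['a'] * 2 + rs.rand(100) * 0.01   # nearly collinear with 'a'
>>> detector = MultiCollinearityDetector()
>>> result = detector.detect(df)   # clusters Spearman correlations, keeps one feature per cluster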

hypernets.tabular.column_selector module

class hypernets.tabular.column_selector.AutoCategoryColumnSelector(pattern=None, *, dtype_include=None, dtype_exclude=None, cat_exponent=0.5)[source]

Bases: hypernets.tabular.column_selector.ColumnSelector

class hypernets.tabular.column_selector.ColumnSelector(pattern=None, *, dtype_include=None, dtype_exclude=None)[source]

Bases: sklearn.compose._column_transformer.make_column_selector

Create a callable to select columns to be used with ColumnTransformer.

make_column_selector() can select columns based on datatype or the columns name with a regex. When using multiple selection criteria, all criteria must match for a column to be selected.

Parameters:
  • pattern (str, default=None) – Columns whose name contains this regex pattern will be included. If None, column selection will not be based on the pattern.
  • dtype_include (column dtype or list of column dtypes, default=None) – A selection of dtypes to include. For more details, see pandas.DataFrame.select_dtypes().
  • dtype_exclude (column dtype or list of column dtypes, default=None) – A selection of dtypes to exclude. For more details, see pandas.DataFrame.select_dtypes().
Returns:

selector – Callable for column selection to be used by a ColumnTransformer.

Return type:

callable

See also

ColumnTransformer
Class that allows combining the outputs of multiple transformer objects used on column subsets of the data into a single feature space.

Examples

>>> from sklearn.preprocessing import StandardScaler, OneHotEncoder
>>> from sklearn.compose import make_column_transformer
>>> from sklearn.compose import make_column_selector
>>> import numpy as np
>>> import pandas as pd  # doctest: +SKIP
>>> X = pd.DataFrame({'city': ['London', 'London', 'Paris', 'Sallisaw'],
...                   'rating': [5, 3, 4, 5]})  # doctest: +SKIP
>>> ct = make_column_transformer(
...       (StandardScaler(),
...        make_column_selector(dtype_include=np.number)),  # rating
...       (OneHotEncoder(),
...        make_column_selector(dtype_include=object)))  # city
>>> ct.fit_transform(X)  # doctest: +SKIP
array([[ 0.90453403,  1.        ,  0.        ,  0.        ],
       [-1.50755672,  1.        ,  0.        ,  0.        ],
       [-0.30151134,  0.        ,  1.        ,  0.        ],
       [ 0.90453403,  0.        ,  0.        ,  1.        ]])
class hypernets.tabular.column_selector.CompositedColumnSelector(selectors)[source]

Bases: object

class hypernets.tabular.column_selector.LatLongColumnSelector[source]

Bases: object

class hypernets.tabular.column_selector.MinMaxColumnSelector(min=None, max=None)[source]

Bases: object

class hypernets.tabular.column_selector.TextColumnSelector(pattern=None, *, dtype_include=None, dtype_exclude=None, word_count_threshold=10)[source]

Bases: hypernets.tabular.column_selector.ColumnSelector

hypernets.tabular.column_selector.calc_skewness_kurtosis(X_1, X_2, columns=None, smooth_fn=<ufunc 'log'>)[source]
hypernets.tabular.column_selector.column_min_max(X, min_value=None, max_value=None)[source]
hypernets.tabular.column_selector.column_skewness_kurtosis(X, skew_threshold=0.5, kurtosis_threshold=0.5, columns=None)[source]
hypernets.tabular.column_selector.column_skewness_kurtosis_diff(X_1, X_2, diff_threshold=5, columns=None, smooth_fn=<ufunc 'log'>, skewness_weights=1, kurtosis_weights=0)[source]
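
Example (a sketch of the helper functions above; the return values are assumed to be lists of matching column names):

>>> import numpy as np
>>> import pandas as pd
>>> from hypernets.tabular import column_selector as cs
>>> rs = np.random.RandomState(0)
>>> df = pd.DataFrame({'skewed': rs.exponential(size=1000),   # strongly right-skewed
...                    'normal': rs.normal(size=1000)})
>>> cs.column_skewness_kurtosis(df, skew_threshold=0.5, kurtosis_threshold=0.5)
>>> cs.column_min_max(df, min_value=0, max_value=100)   # columns whose values fall in [0, 100]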

hypernets.tabular.data_cleaner module

class hypernets.tabular.data_cleaner.DataCleaner(nan_chars=None, correct_object_dtype=True, drop_constant_columns=True, drop_duplicated_columns=False, drop_label_nan_rows=True, drop_idness_columns=True, replace_inf_values=nan, drop_columns=None, reserve_columns=None, reduce_mem_usage=False, int_convert_to='float')[source]

Bases: object

append_drop_columns(columns)[source]
clean_data(X, y, *, df_meta=None, reduce_mem_usage)[source]
fit_transform(X, y=None, copy_data=True)[source]
static get_helper(X, y)[source]
get_params()[source]
transform(X, y=None, copy_data=True)[source]
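
Example (a minimal sketch assuming fit_transform returns the cleaned (X, y) pair and transform returns the cleaned X when y is omitted; the toy frame is illustrative):

>>> import numpy as np
>>> import pandas as pd
>>> from hypernets.tabular.data_cleaner import DataCleaner
>>> X = pd.DataFrame({'id': range(5),                 # idness column, dropped
...                   'const': ['a'] * 5,             # constant column, dropped
...                   'f1': [1.0, 2.0, np.nan, 4.0, np.inf],
...                   'f2': ['x', 'y', 'x', None, 'y']})
>>> y = pd.Series([0, 1, 0, None, 1])                 # row with NaN label is dropped
>>> cleaner = DataCleaner(drop_idness_columns=True, drop_constant_columns=True)
>>> X_clean, y_clean = cleaner.fit_transform(X, y)
>>> X_new = cleaner.transform(X)                      # reuse the fitted rules on new data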

hypernets.tabular.data_hasher module

class hypernets.tabular.data_hasher.DataHasher(method='md5')[source]

Bases: object

hypernets.tabular.dataframe_mapper module

Adapted from https://github.com/scikit-learn-contrib/sklearn-pandas, with two changes: 1. fixed the confusion of column names; 2. the columns selector may be a callable object.

class hypernets.tabular.dataframe_mapper.DataFrameMapper(features, default=False, df_out=False, input_df=False, df_out_dtype_transforms=None)[source]

Bases: sklearn.base.BaseEstimator

Map Pandas data frame column subsets to their own sklearn transformation.

features : a list of tuples with feature definitions.
The first element is the pandas column selector. This can be a string (for one column) or a list of strings. The second element is an object that supports sklearn’s transform interface, or a list of such objects. The third element is optional and, if present, must be a dictionary with the options to apply to the transformation. Example: {‘alias’: ‘day_of_week’}
default : default transformer to apply to the columns not
explicitly selected in the mapper. If False (default), discard them. If None, pass them through untouched. Any other transformer will be applied to all the unselected columns as a whole, taken as a 2d-array.
df_out : return a pandas data frame, with each column named using
the pandas column that created it (if there is only one input and output), or the input columns joined with ‘_’ if there are multiple inputs, and the name suffixed with ‘_1’, ‘_2’, etc. if there are multiple outputs.
input_df : if True, pass the selected columns to the transformers
as a pandas DataFrame or Series. Otherwise pass them as a numpy array. Defaults to False.
fitted_features_
Type:list of tuple(column_name list, fitted transformer, options)
fit(X, y=None)[source]
fit_transform(X, y=None, *fit_args)[source]
transform(X)[source]
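
Example (a sketch based on the parameter descriptions above; the alias option and column names are illustrative):

>>> import pandas as pd
>>> from sklearn.preprocessing import StandardScaler, OrdinalEncoder
>>> from hypernets.tabular.dataframe_mapper import DataFrameMapper
>>> df = pd.DataFrame({'age': [23, 41, 35], 'city': ['NY', 'SF', 'NY']})
>>> mapper = DataFrameMapper(
...     features=[(['age'], StandardScaler()),
...               (['city'], OrdinalEncoder(), {'alias': 'city_code'})],
...     input_df=True,    # pass pandas objects to the transformers
...     df_out=True)      # return a DataFrame with generated column names
>>> out = mapper.fit_transform(df)
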
class hypernets.tabular.dataframe_mapper.TransformerPipeline(steps)[source]

Bases: sklearn.pipeline.Pipeline

Pipeline that expects all steps to be transformers taking a single X argument, an optional y argument, and having fit and transform methods.

Code is copied from sklearn’s Pipeline

fit(X, y=None, **fit_params)[source]

Fit the model

Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.

Parameters:
  • X (iterable) – Training data. Must fulfill input requirements of first step of the pipeline.
  • y (iterable, default=None) – Training targets. Must fulfill label requirements for all steps of the pipeline.
  • **fit_params (dict of string -> object) – Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
Returns:

self – This estimator

Return type:

Pipeline

fit_transform(X, y=None, **fit_params)[source]

Fit the model and transform with the final estimator

Fits all the transforms one after the other and transforms the data, then uses fit_transform on transformed data with the final estimator.

Parameters:
  • X (iterable) – Training data. Must fulfill input requirements of first step of the pipeline.
  • y (iterable, default=None) – Training targets. Must fulfill label requirements for all steps of the pipeline.
  • **fit_params (dict of string -> object) – Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.
Returns:

Xt – Transformed samples

Return type:

array-like of shape (n_samples, n_transformed_features)

hypernets.tabular.dataframe_mapper.make_transformer_pipeline(*steps)[source]

Construct a TransformerPipeline from the given estimators.

hypernets.tabular.drift_detection module

class hypernets.tabular.drift_detection.DriftDetector(preprocessor=None, estimator=None, random_state=None)[source]

Bases: object

fit(X_train, X_test, sample_balance=True, max_test_samples=None, cv=5)[source]
predict_proba(X)[source]
train_test_split(X, y, test_size=0.25, remain_for_train=0.3)[source]
class hypernets.tabular.drift_detection.FeatureSelectionCallback[source]

Bases: object

on_remove_shift_variable(shift_score, remove_features)[source]
on_round_end(round_no, auc, features, remove_features, elapsed)[source]
on_round_start(round_no, features)[source]
on_task_break(round_no, auc, features)[source]
on_task_finished(round_no, auc, features)[source]
class hypernets.tabular.drift_detection.FeatureSelectorWithDriftDetection(remove_shift_variable=True, variable_shift_threshold=0.7, variable_shift_scorer=None, auc_threshold=0.55, min_features=10, remove_size=0.1, sample_balance=True, max_test_samples=None, cv=5, random_state=None, callbacks=None)[source]

Bases: object

static get_detector(preprocessor=None, estimator=None, random_state=None)[source]
parallelizable = True
select(X_train, X_test, *, preprocessor=None, estimator=None, copy_data=False)[source]
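
Example (a sketch of the adversarial-validation workflow above; the synthetic drift and the assumption that select() returns the retained feature names are illustrative):

>>> import numpy as np
>>> import pandas as pd
>>> from hypernets.tabular.drift_detection import DriftDetector, FeatureSelectorWithDriftDetection
>>> rs = np.random.RandomState(0)
>>> X_train = pd.DataFrame({'f1': rs.normal(0, 1, 500), 'f2': rs.normal(0, 1, 500)})
>>> X_test = pd.DataFrame({'f1': rs.normal(2, 1, 500),   # 'f1' drifts between splits
...                        'f2': rs.normal(0, 1, 500)})
>>> dd = DriftDetector(random_state=42)
>>> dd.fit(X_train, X_test)            # adversarial classifier: train rows vs. test rows
>>> proba = dd.predict_proba(X_train)  # how much each row looks like test data
>>> selector = FeatureSelectorWithDriftDetection(auc_threshold=0.55, min_features=1)
>>> remained = selector.select(X_train, X_test)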

hypernets.tabular.estimator_detector module

class hypernets.tabular.estimator_detector.EstimatorDetector(name_or_cls, task, *, init_kwargs=None, fit_kwargs=None, n_samples=100, n_features=5)[source]

Bases: object

create_estimator(estimator_cls)[source]
fit_estimator(estimator, X, y)[source]
get_estimator_cls()[source]
prepare_data()[source]

hypernets.tabular.metrics module

class hypernets.tabular.metrics.Metrics[source]

Bases: object

calc_score(y_preds, y_proba=None, metrics=('accuracy', ), task='binary', pos_label=1, classes=None, average=None)
evaluate(X, y, metrics, *, task=None, pos_label=None, classes=None, average=None, threshold=0.5, n_jobs=-1)
metric_to_scoring(task='binary', pos_label=None)
predict(X, *, task=None, classes=None, threshold=0.5, n_jobs=None)
predict_proba(X, *, n_jobs=None)
proba2predict(*, task=None, threshold=0.5, classes=None)
hypernets.tabular.metrics.calc_score(y_true, y_preds, y_proba=None, metrics=('accuracy', ), task='binary', pos_label=1, classes=None, average=None)[source]
hypernets.tabular.metrics.evaluate(estimator, X, y, metrics, *, task=None, pos_label=None, classes=None, average=None, threshold=0.5, n_jobs=-1)[source]
hypernets.tabular.metrics.metric_to_scoring(metric, task='binary', pos_label=None)[source]
hypernets.tabular.metrics.predict(estimator, X, *, task=None, classes=None, threshold=0.5, n_jobs=None)[source]
hypernets.tabular.metrics.predict_proba(estimator, X, *, n_jobs=None)[source]
hypernets.tabular.metrics.proba2predict(proba, *, task=None, threshold=0.5, classes=None)[source]
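
Example (a sketch of calc_score; the metric aliases 'auc' and 'f1' are assumed to be accepted names):

>>> import numpy as np
>>> from hypernets.tabular.metrics import calc_score
>>> y_true = np.array([0, 1, 1, 0, 1])
>>> y_proba = np.array([0.2, 0.8, 0.6, 0.4, 0.3])
>>> y_pred = (y_proba >= 0.5).astype(int)
>>> calc_score(y_true, y_pred, y_proba=y_proba,
...            metrics=('accuracy', 'auc', 'f1'),
...            task='binary', pos_label=1)   # -> dict mapping metric name to value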

hypernets.tabular.persistence module

hypernets.tabular.pseudo_labeling module

class hypernets.tabular.pseudo_labeling.PseudoLabeling(strategy, threshold=None, quantile=None, number=None)[source]

Bases: object

DEFAULT_STRATEGY_SETTINGS = {'default_number': 0.2, 'default_quantile': 0.8, 'default_strategy': 'threshold', 'default_threshold': 0.8}
static detect_strategy(strategy, threshold=None, quantile=None, number=None)[source]
np = <module 'numpy'>
select(X_test, classes, proba)[source]
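
Example (a sketch of the threshold strategy; the assumption that select() returns the confidently-predicted samples and their pseudo labels should be checked against the source):

>>> import numpy as np
>>> from hypernets.tabular.pseudo_labeling import PseudoLabeling
>>> proba = np.array([[0.95, 0.05],
...                   [0.30, 0.70],
...                   [0.10, 0.90],
...                   [0.55, 0.45]])
>>> classes = np.array(['no', 'yes'])
>>> X_test = np.arange(4).reshape(-1, 1)
>>> pl = PseudoLabeling(strategy='threshold', threshold=0.8)
>>> X_pseudo, y_pseudo = pl.select(X_test, classes, proba)   # rows 0 and 2 pass the 0.8 threshold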

hypernets.tabular.sklearn_ex module

class hypernets.tabular.sklearn_ex.AsTypeTransformer(*, dtype)[source]

Bases: sklearn.base.BaseEstimator

fit(X, y=None)[source]
fit_transform(X, y=None)[source]
transform(X)[source]
class hypernets.tabular.sklearn_ex.CategorizeEncoder(columns=None, remain_numeric=True)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)[source]
transform(X)[source]
class hypernets.tabular.sklearn_ex.ColumnEncoder[source]

Bases: sklearn.base.BaseEstimator

Encode each column in the dataset with a separate encoder.

create_encoder(X, y)[source]
fit(X, y=None, **kwargs)[source]
fit_transform(X, y=None, *, copy=True, **kwargs)[source]
transform(X, *, copy=True)[source]
class hypernets.tabular.sklearn_ex.ConstantImputer(missing_values=nan, fill_value=None, copy=True)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)[source]
transform(X, y=None)[source]
class hypernets.tabular.sklearn_ex.DataFrameWrapper(transform, columns=None)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)[source]
transform(X)[source]
class hypernets.tabular.sklearn_ex.DatetimeEncoder(columns=None, include=None, exclude=None, extra=None, drop_constants=True)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

all_items = {'day': 'day', 'dayofyear': 'dayofyear', 'hour': 'hour', 'minute': 'minute', 'month': 'month', 'second': 'second', 'timestamp': <function DatetimeEncoder.<lambda>>, 'week': 'week', 'weekday': 'weekday', 'year': 'year'}
default_include = ['month', 'day', 'hour', 'minute', 'week', 'weekday', 'dayofyear']
fit(X, y=None)[source]
static to_dataframe(X)[source]
transform(X, y=None)[source]
transform_column(Xc)[source]
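
Example (a sketch; the include values are taken from default_include above, and fit_transform comes from TransformerMixin):

>>> import pandas as pd
>>> from hypernets.tabular.sklearn_ex import DatetimeEncoder
>>> df = pd.DataFrame({'ts': pd.date_range('2022-01-01', periods=5, freq='D')})
>>> enc = DatetimeEncoder(include=['month', 'weekday'], drop_constants=False)
>>> out = enc.fit_transform(df)   # expands 'ts' into month and weekday columns
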
class hypernets.tabular.sklearn_ex.FeatureImportanceSelection(importances, quantile, min_features=3)[source]

Bases: sklearn.base.BaseEstimator

feature_usage()[source]
fit(X, y=None, **kwargs)[source]
fit_transform(X, y=None, **kwargs)[source]
important_features
transform(X)[source]
class hypernets.tabular.sklearn_ex.FeatureImportancesSelectionTransformer(task=None, strategy=None, threshold=None, quantile=None, number=None, data_clean=True)[source]

Bases: sklearn.base.BaseEstimator

fit(X, y)[source]
transform(X)[source]
class hypernets.tabular.sklearn_ex.FeatureSelectionTransformer(task=None, max_train_samples=10000, max_test_samples=10000, max_cols=10000, ratio_select_cols=0.1, n_max_cols=100, n_min_cols=10, reserved_cols=None)[source]

Bases: sklearn.base.BaseEstimator

feature_score(F_train, y_train, F_test, y_test)[source]
fit(X, y)[source]
get_categorical_features(X)[source]
transform(X)[source]
class hypernets.tabular.sklearn_ex.FloatOutputImputer(*, missing_values=nan, strategy='mean', fill_value=None, verbose=0, copy=True, add_indicator=False)[source]

Bases: sklearn.impute._base.SimpleImputer

transform(X)[source]

Impute all missing values in X.

Parameters:X ({array-like, sparse matrix}, shape (n_samples, n_features)) – The input data to complete.
class hypernets.tabular.sklearn_ex.GaussRankScaler[source]

Bases: sklearn.base.BaseEstimator

fit_transform(X, y=None)[source]
class hypernets.tabular.sklearn_ex.LgbmLeavesEncoder(cat_vars, cont_vars, task, **params)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y)[source]
transform(X)[source]
class hypernets.tabular.sklearn_ex.LocalizedTfidfVectorizer(*, input='content', encoding='utf-8', decode_error='strict', strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, analyzer='word', stop_words=None, token_pattern='(?u)\b\w\w+\b', ngram_range=(1, 1), max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<class 'numpy.float64'>, norm='l2', use_idf=True, smooth_idf=True, sublinear_tf=False)[source]

Bases: sklearn.feature_extraction.text.TfidfVectorizer

decode(doc)[source]

Decode the input into a string of unicode symbols.

The decoding strategy depends on the vectorizer parameters.

Parameters:doc (str) – The string to decode.
Returns:doc – A string of unicode symbols.
Return type:str
class hypernets.tabular.sklearn_ex.LogStandardScaler(copy=True, with_mean=True, with_std=True)[source]

Bases: sklearn.base.BaseEstimator

fit(X, y=None)[source]
transform(X)[source]
class hypernets.tabular.sklearn_ex.MultiKBinsDiscretizer(columns=None, bins=None, strategy='quantile')[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)[source]
transform(X)[source]
class hypernets.tabular.sklearn_ex.MultiLabelEncoder(columns=None, dtype=None)[source]

Bases: sklearn.base.BaseEstimator

fit(X, y=None)[source]
fit_transform(X, *args)[source]
transform(X)[source]
class hypernets.tabular.sklearn_ex.MultiTargetEncoder(n_folds=4, smooth=None, seed=42, split_method='interleaved', dtype=None)[source]

Bases: hypernets.tabular.sklearn_ex.ColumnEncoder

create_encoder(X, y)[source]
fit(X, y=None, **kwargs)[source]
fit_transform(X, y=None, **kwargs)[source]
label_encoder_cls

alias of sklearn.preprocessing._label.LabelEncoder

target_encoder_cls

alias of SlimTargetEncoder

class hypernets.tabular.sklearn_ex.MultiVarLenFeatureEncoder(features)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)[source]
transform(X)[source]
class hypernets.tabular.sklearn_ex.PassThroughEstimator[source]

Bases: sklearn.base.BaseEstimator

fit(X, y=None)[source]
fit_transform(X, y=None)[source]
transform(X)[source]
class hypernets.tabular.sklearn_ex.SafeLabelEncoder[source]

Bases: sklearn.preprocessing._label.LabelEncoder

transform(y)[source]

Transform labels to normalized encoding.

Parameters:y (array-like of shape (n_samples,)) – Target values.
Returns:y
Return type:array-like of shape (n_samples,)
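
Example (a sketch contrasting the two encoders above; the exact code assigned to unseen categories is an assumption):

>>> import pandas as pd
>>> from hypernets.tabular.sklearn_ex import SafeLabelEncoder, MultiLabelEncoder
>>> le = SafeLabelEncoder().fit(pd.Series(['a', 'b', 'c']))
>>> le.transform(pd.Series(['a', 'b', 'd']))   # unseen 'd' does not raise, unlike LabelEncoder
>>> mle = MultiLabelEncoder(columns=['city'])  # one SafeLabelEncoder per selected column
>>> df = pd.DataFrame({'city': ['NY', 'SF', 'NY'], 'n': [1, 2, 3]})
>>> df_enc = mle.fit_transform(df)
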
class hypernets.tabular.sklearn_ex.SafeOneHotEncoder(*, categories='auto', drop=None, sparse=True, dtype=<class 'numpy.float64'>, handle_unknown='error')[source]

Bases: sklearn.preprocessing._encoders.OneHotEncoder

get_feature_names(input_features=None)[source]

Override this method to remove non-alphanumeric chars from feature names

class hypernets.tabular.sklearn_ex.SafeOrdinalEncoder(*, categories='auto', dtype=<class 'numpy.float64'>, handle_unknown='error', unknown_value=None)[source]

Bases: sklearn.preprocessing._encoders.OrdinalEncoder

Adapted from sklearn’s OrdinalEncoder. Encode categorical features as an integer array.

The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are converted to ordinal integers. This results in a single column of integers (0 to n_categories - 1) per feature.

Read more in the User Guide.

New in version 0.20.

Parameters:
  • categories ('auto' or a list of array-like, default='auto') –

    Categories (unique values) per feature:

    • ’auto’ : Determine categories automatically from the training data.
    • list : categories[i] holds the categories expected in the ith column. The passed categories should not mix strings and numeric values, and should be sorted in case of numeric values.

    The used categories can be found in the categories_ attribute.

  • dtype (number type, default np.float64) – Desired dtype of output.
  • handle_unknown ({'error', 'use_encoded_value'}, default='error') –

    When set to ‘error’ an error will be raised in case an unknown categorical feature is present during transform. When set to ‘use_encoded_value’, the encoded value of unknown categories will be set to the value given for the parameter unknown_value. In inverse_transform(), an unknown category will be denoted as None.

    New in version 0.24.

  • unknown_value (int or np.nan, default=None) –

    When the parameter handle_unknown is set to ‘use_encoded_value’, this parameter is required and will set the encoded value of unknown categories. It has to be distinct from the values used to encode any of the categories in fit. If set to np.nan, the dtype parameter must be a float dtype.

    New in version 0.24.

categories_

The categories of each feature determined during fit (in order of the features in X and corresponding with the output of transform). This does not include categories that weren’t seen during fit.

Type:list of arrays

See also

OneHotEncoder
Performs a one-hot encoding of categorical features.
LabelEncoder
Encodes target labels with values between 0 and n_classes-1.

Examples

Given a dataset with two features, we let the encoder find the unique values per feature and transform the data to an ordinal encoding.

>>> from sklearn.preprocessing import OrdinalEncoder
>>> enc = OrdinalEncoder()
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> enc.fit(X)
OrdinalEncoder()
>>> enc.categories_
[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
>>> enc.transform([['Female', 3], ['Male', 1]])
array([[0., 2.],
       [1., 0.]])
>>> enc.inverse_transform([[1, 0], [0, 1]])
array([['Male', 1],
       ['Female', 2]], dtype=object)
inverse_transform(X)[source]

Convert the data back to the original representation.

Parameters:X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The transformed data.
Returns:X_tr – Inverse transformed array.
Return type:ndarray of shape (n_samples, n_features)
transform(X, y=None)[source]

Transform X to ordinal codes.

Parameters:X (array-like of shape (n_samples, n_features)) – The data to encode.
Returns:X_out – Transformed input.
Return type:ndarray of shape (n_samples, n_features)
class hypernets.tabular.sklearn_ex.SafeSimpleImputer(*, missing_values=nan, strategy='mean', fill_value=None, verbose=0, copy=True, add_indicator=False)[source]

Bases: sklearn.impute._base.SimpleImputer

A SimpleImputer that passes bool columns through unchanged.

fit(X, y=None)[source]

Fit the imputer on X.

Parameters:X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Input data, where n_samples is the number of samples and n_features is the number of features.
Returns:self
Return type:SimpleImputer
transform(X)[source]

Impute all missing values in X.

Parameters:X ({array-like, sparse matrix}, shape (n_samples, n_features)) – The input data to complete.
class hypernets.tabular.sklearn_ex.SkewnessKurtosisTransformer(transform_fn=None, skew_threshold=0.5, kurtosis_threshold=0.5)[source]

Bases: sklearn.base.BaseEstimator

fit(X, y=None)[source]
transform(X)[source]
class hypernets.tabular.sklearn_ex.SlimTargetEncoder(n_folds=4, smooth=0, seed=42, split_method='interleaved', dtype=None, output_2d=False)[source]

Bases: hypernets.tabular.sklearn_ex.TargetEncoder

A slimmed TargetEncoder whose ‘train’ and ‘train_encode’ attributes are set to None.

fit(X, y)[source]

Fit a TargetEncoder instance to a set of categories

Parameters:
  • x (cudf.Series or cudf.DataFrame or cupy.ndarray) – Categories to be encoded. Its elements may or may not be unique.
  • y (cudf.Series or cupy.ndarray) – Series containing the target variable.
Returns:

self – A fitted instance of itself to allow method chaining

Return type:

TargetEncoder

fit_transform(X, y)[source]

Simultaneously fit and transform an input

This is functionally equivalent to (but faster than) TargetEncoder().fit(X, y).transform(X)

split_method
transform(X)[source]

Transform an input into its categorical keys.

This is intended for test data. For fitting and transforming the training data, prefer fit_transform.

Parameters:x (cudf.Series) – Input keys to be transformed. Its values do not have to match the categories given to fit
Returns:encoded – The ordinally encoded input series
Return type:cupy.ndarray
class hypernets.tabular.sklearn_ex.TargetEncoder(n_folds=4, smooth=0, seed=42, split_method='interleaved')[source]

Bases: sklearn.base.BaseEstimator

Adapted from cuml.preprocessing.TargetEncoder

fit(x, y)[source]

Fit a TargetEncoder instance to a set of categories

Parameters:
  • x (cudf.Series or cudf.DataFrame or cupy.ndarray) – Categories to be encoded. Its elements may or may not be unique.
  • y (cudf.Series or cupy.ndarray) – Series containing the target variable.
Returns:

self – A fitted instance of itself to allow method chaining

Return type:

TargetEncoder

fit_transform(x, y)[source]

Simultaneously fit and transform an input

This is functionally equivalent to (but faster than) TargetEncoder().fit(x, y).transform(x)

transform(x)[source]

Transform an input into its categorical keys.

This is intended for test data. For fitting and transforming the training data, prefer fit_transform.

Parameters:x (cudf.Series) – Input keys to be transformed. Its values do not have to match the categories given to fit
Returns:encoded – The ordinally encoded input series
Return type:cupy.ndarray
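
Example (a sketch assuming the sklearn_ex adaptation accepts pandas/numpy inputs rather than the cudf/cupy types mentioned in the inherited docstrings):

>>> import numpy as np
>>> import pandas as pd
>>> from hypernets.tabular.sklearn_ex import TargetEncoder
>>> x = pd.Series(['a', 'b', 'a', 'c', 'b', 'a'])
>>> y = np.array([1, 0, 1, 0, 1, 0])
>>> te = TargetEncoder(n_folds=2, smooth=0, seed=42)
>>> x_train_enc = te.fit_transform(x, y)              # out-of-fold target means for training data
>>> x_test_enc = te.transform(pd.Series(['a', 'd']))  # unseen 'd' is allowed at transform time
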
class hypernets.tabular.sklearn_ex.TfidfEncoder(columns=None, flatten=False, **kwargs)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

create_encoder()[source]
fit(X, y=None)[source]
transform(X, y=None)[source]
class hypernets.tabular.sklearn_ex.VarLenFeatureEncoder(sep='|')[source]

Bases: object

fit(X: pandas.core.series.Series)[source]
max_element_length
n_classes
static pad_sequences(sequences, maxlen=None, dtype='int32', padding='pre', truncating='pre', value=0.0)[source]

Adapted from tensorflow.python.keras.preprocessing.sequence.pad_sequences

transform(X: pandas.core.series.Series)[source]
hypernets.tabular.sklearn_ex.root_mean_squared_error(y_true, y_pred, sample_weight=None, multioutput='uniform_average', squared=True)[source]
hypernets.tabular.sklearn_ex.subsample(X, y, max_samples, train_samples, task, random_state=9527)[source]

hypernets.tabular.toolbox module

class hypernets.tabular.toolbox.ToolBox[source]

Bases: object

STRATEGY_NUMBER = 'number'
STRATEGY_QUANTILE = 'quantile'
STRATEGY_THRESHOLD = 'threshold'
classmethod accept(*args)[source]
acceptable_types = (<class 'numpy.ndarray'>, <class 'pandas.core.frame.DataFrame'>, <class 'pandas.core.series.Series'>)
static array_to_df(arr, *, columns=None, index=None, meta=None)[source]
static collapse_last_dim(arr, keep_dim=True)[source]

Collapse the last dimension.

Parameters:
  • arr – data array
  • keep_dim – keep the last dim as one or not

classmethod collinearity_detector()[source]
column_selector = <module 'hypernets.tabular.column_selector'>
compute_class_weight(*, classes, y)

Estimate class weights for unbalanced datasets.

Parameters:
  • class_weight (dict, 'balanced' or None) – If ‘balanced’, class weights will be given by n_samples / (n_classes * np.bincount(y)). If a dictionary is given, keys are classes and values are corresponding class weights. If None is given, the class weights will be uniform.
  • classes (ndarray) – Array of the classes occurring in the data, as given by np.unique(y_org) with y_org the original class labels.
  • y (array-like of shape (n_samples,)) – Array of original class labels per sample.
Returns:

class_weight_vect – Array with class_weight_vect[i] the weight for i-th class.

Return type:

ndarray of shape (n_classes,)

References

The “balanced” heuristic is inspired by Logistic Regression in Rare Events Data, King, Zeng, 2001.

static compute_sample_weight(y)[source]
static concat_df(dfs, axis=0, repartition=False, random_state=9527, **kwargs)[source]
classmethod data_cleaner(nan_chars=None, correct_object_dtype=True, drop_constant_columns=True, drop_duplicated_columns=False, drop_label_nan_rows=True, drop_idness_columns=True, replace_inf_values=nan, drop_columns=None, reserve_columns=None, reduce_mem_usage=False, int_convert_to='float')[source]
classmethod data_hasher(method='md5')[source]
static detect_strategy(strategy, *, threshold=None, quantile=None, number=None, default_strategy, default_threshold, default_quantile, default_number)[source]
classmethod detect_strategy_of_feature_selection_by_importance(strategy, *, threshold=None, quantile=None, number=None)[source]
static df_to_array(df)[source]
classmethod drift_detector(preprocessor=None, estimator=None, random_state=None)[source]
classmethod estimator_detector(name_or_cls, task, *, init_kwargs=None, fit_kwargs=None, n_samples=100, n_features=5)[source]
classmethod feature_selector_with_drift_detection(remove_shift_variable=True, variable_shift_threshold=0.7, variable_shift_scorer=None, auc_threshold=0.55, min_features=10, remove_size=0.1, sample_balance=True, max_test_samples=None, cv=5, random_state=None, callbacks=None)[source]
classmethod feature_selector_with_feature_importances(strategy=None, threshold=None, quantile=None, number=None)[source]
static fix_binary_predict_proba_result(proba)[source]
static from_local(*data)[source]
static gc()[source]
classmethod general_estimator(X, y=None, estimator=None, task=None)[source]
classmethod general_preprocessor(X, y=None)[source]
static get_shape(X, allow_none=False)[source]
classmethod greedy_ensemble(task, estimators, need_fit=False, n_folds=5, method='soft', random_state=9527, scoring='neg_log_loss', ensemble_size=0)[source]
classmethod hstack_array(arrs)[source]
classmethod infer_task_type(y, excludes=None)[source]
classmethod kfold(n_splits=5, *, shuffle=False, random_state=None)[source]
static load_data(data_path, *, reset_index=False, reader_mapping=None, **kwargs)[source]
static mean_oof(probas)[source]
static memory_free()[source]
static memory_total()[source]
static memory_usage(*data)[source]
static merge_oof(oofs)[source]
Parameters:oofs – list of tuple(idx,proba)
Returns:merged proba
metrics

alias of hypernets.tabular.metrics.Metrics

static nunique_df(df)[source]
static parquet()[source]
static permutation_importance(estimator, X, y, *, scoring=None, n_repeats=5, n_jobs=None, random_state=None, sample_weight=None, max_samples=1.0)[source]

see: sklearn.inspection.permutation_importance

classmethod permutation_importance_batch(estimators, X, y, scoring=None, n_repeats=5, n_jobs=None, random_state=None)[source]

Evaluate the importance of features of a set of estimators

Parameters:
  • estimators (list) – A set of estimators that has already been fitted and is compatible with scorer.
  • X (ndarray or DataFrame, shape (n_samples, n_features)) – Data on which permutation importance will be computed.
  • y (array-like or None, shape (n_samples, ) or (n_samples, n_classes)) – Targets for supervised or None for unsupervised.
  • scoring (string, callable or None, default=None) – Scorer to use. It can be a single string (see scoring_parameter) or a callable (see scoring). If None, the estimator’s default scorer is used.
  • n_repeats (int, default=5) – Number of times to permute a feature.
  • n_jobs (int or None, default=None) – The number of jobs to use for the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
  • random_state (int, RandomState instance, or None, default=None) – Pseudo-random number generator to control the permutations of each feature. See random_state.
Returns:

result – Dictionary-like object, with attributes:

importances_mean : ndarray, shape (n_features, )

Mean of feature importance over n_repeats.

importances_std : ndarray, shape (n_features, )

Standard deviation over n_repeats.

importances : ndarray, shape (n_features, n_repeats)

Raw permutation importance scores.

Return type:

Bunch

classmethod pseudo_labeling(strategy, threshold=None, quantile=None, number=None)[source]
static reset_index(df)[source]
static select_1d(arr, indices)[source]

Select by indices along the first axis (0).

static select_df(df, indices)[source]

Select dataframe by row indices.

classmethod select_feature_by_importance(feature_importance, strategy=None, threshold=None, quantile=None, number=None)[source]
static select_valid_oof(y, oof)[source]
static stack_array(arrs, axis=0)[source]
classmethod statified_kfold(n_splits=5, *, shuffle=False, random_state=None)[source]
static take_array(arr, indices, axis=None)[source]
static to_local(*data)[source]
train_test_split(*, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)

Split arrays or matrices into random train and test subsets

Quick utility that wraps input validation and next(ShuffleSplit().split(X, y)) and application to input data into a single call for splitting (and optionally subsampling) data in a one-liner.

Read more in the User Guide.

Parameters:
  • *arrays (sequence of indexables with same length / shape[0]) – Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.
  • test_size (float or int, default=None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.
  • train_size (float or int, default=None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
  • random_state (int, RandomState instance or None, default=None) – Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. See Glossary.
  • shuffle (bool, default=True) – Whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.
  • stratify (array-like, default=None) – If not None, data is split in a stratified fashion, using this as the class labels. Read more in the User Guide.
Returns:splitting – List containing train-test split of inputs.

New in version 0.16: If the input is sparse, the output will be a scipy.sparse.csr_matrix. Else, output type is the same as the input type.

Return type:list, length=2 * len(arrays)

Examples

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42)
...
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
>>> y_train
[2, 0, 3]
>>> X_test
array([[2, 3],
       [8, 9]])
>>> y_test
[1, 4]
>>> train_test_split(y, shuffle=False)
[[0, 1, 2], [3, 4]]
static unique(y)[source]
static value_counts(ar)[source]
classmethod vstack_array(arrs)[source]
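
Example (a sketch of a typical ToolBox workflow; the return forms of infer_task_type, general_preprocessor, and general_estimator are assumptions based on their names and common sklearn conventions):

>>> import numpy as np
>>> import pandas as pd
>>> from hypernets.tabular.toolbox import ToolBox
>>> tb = ToolBox
>>> df = pd.DataFrame({'f_num': np.random.rand(100),
...                    'f_cat': np.random.choice(list('abc'), 100)})
>>> y = pd.Series(np.random.randint(0, 2, 100))
>>> task = tb.infer_task_type(y)           # e.g. task name plus class labels (form assumed)
>>> prep = tb.general_preprocessor(df, y)  # default preprocessing for mixed dtypes
>>> est = tb.general_estimator(df, y)      # default estimator for the inferred task
>>> X_enc = prep.fit_transform(df, y)
>>> est.fit(X_enc, y)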

Module contents