Create Transformer
You can create a custom transformer by following these steps:
- Define default group keys for the input sf by setting `DEFAULT_GROUP_KEYS`:

  ```python
  class CustomTransformer(Transformer):
      DEFAULT_GROUP_KEYS = {"default": "default", "address": "address"}
  ```

  The `DEFAULT_GROUP_KEYS` are used as the `group_keys` by default. To use different names, pass `group_keys` to the constructor when you instantiate the transformer.
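The default-with-override behavior described above can be sketched with a minimal stand-in class. This is a hypothetical illustration, not the library's actual base class; the real `Transformer` may merge or validate these keys differently:

```python
# Hypothetical stand-in showing how DEFAULT_GROUP_KEYS could back a
# constructor-level override.
class Transformer:
    DEFAULT_GROUP_KEYS = {}

    def __init__(self, group_keys=None):
        # Fall back to the class-level defaults when no keys are given.
        self.group_keys = dict(group_keys or self.DEFAULT_GROUP_KEYS)


class CustomTransformer(Transformer):
    DEFAULT_GROUP_KEYS = {"default": "default", "address": "address"}


print(CustomTransformer().group_keys)
# {'default': 'default', 'address': 'address'}

print(CustomTransformer(group_keys={"default": "users", "address": "addr"}).group_keys)
# {'default': 'users', 'address': 'addr'}
```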
- If your transformation needs more than one raw input, and all of them must exist (none may be `None`), set `ONLY_GROUP` to `True`:

  ```python
  class CustomTransformer(Transformer):
      DEFAULT_GROUP_KEYS = {"default": "default", "address": "address"}
      ONLY_GROUP = True
  ```

  The default value of `ONLY_GROUP` is `False`.
- Override the `validate` method. If the input sf must have specific columns, you can check them with the `_validate_columns` method:

  ```python
  class CustomTransformer(Transformer):
      DEFAULT_GROUP_KEYS = {"default": "default", "address": "address"}
      ONLY_GROUP = True

      def validate(self, sf: SFrame):
          super().validate(sf)
          self._validate_columns(sf, self.default_sf_key, "column_1", "column_2")
  ```
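In spirit, a column check like `_validate_columns` verifies that every required column is present and raises otherwise. The sketch below is illustrative only; the real method's signature, frame lookup, and error type may differ:

```python
# Illustrative sketch of a required-column check (hypothetical;
# not the library's actual _validate_columns implementation).
def validate_columns(frame_columns, *required):
    missing = [col for col in required if col not in frame_columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")


validate_columns(["column_1", "column_2", "extra"], "column_1", "column_2")  # passes

try:
    validate_columns(["column_1"], "column_1", "column_2")
except ValueError as exc:
    print(exc)  # missing required columns: ['column_2']
```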
- Set `HANDLER_NAME` to your preferred value. For example, you might choose `derive` for derivers and `trim` for trimmers:

  ```python
  class CustomTransformer(Transformer):
      HANDLER_NAME = "transform"
      DEFAULT_GROUP_KEYS = {"default": "default", "address": "address"}
      ONLY_GROUP = True

      def validate(self, sf: SFrame):
          super().validate(sf)
          self._validate_columns(sf, self.default_sf_key, "column_1", "column_2")
  ```
- Implement handler methods based on the input raw format. The method name must follow the rule `HANDLER_NAME + "_" + FRAME_NAME`; for example, `FRAME_NAME` is `df` for pandas and `spf` for pyspark:

  ```python
  class CustomTransformer(Transformer):
      HANDLER_NAME = "transform"
      DEFAULT_GROUP_KEYS = {"default": "default", "address": "address"}
      ONLY_GROUP = True

      def validate(self, sf: SFrame):
          super().validate(sf)
          self._validate_columns(sf, self.default_sf_key, "column_1", "column_2")

      def transform_df(self, default: pd.DataFrame, address: pd.DataFrame, *args, **kwargs): ...
  ```
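The naming rule suggests that handlers are resolved dynamically from `HANDLER_NAME` and the frame name. The sketch below shows one way such a lookup could work; it is a hypothetical illustration, since the document does not show the library's actual dispatch code:

```python
# Hypothetical sketch of resolving a handler from
# HANDLER_NAME + "_" + FRAME_NAME at runtime.
class CustomTransformer:
    HANDLER_NAME = "transform"

    def transform_df(self, default, address):
        return {"default": default, "address": address}


def resolve_handler(transformer, frame_name):
    # "transform" + "_" + "df" -> "transform_df"
    return getattr(transformer, f"{transformer.HANDLER_NAME}_{frame_name}")


handler = resolve_handler(CustomTransformer(), "df")
print(handler.__name__)  # transform_df
```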
- Return a dictionary from the handler method, so that the keys match the `group_keys` and the values are the raw data:

  ```python
  class CustomTransformer(Transformer):
      HANDLER_NAME = "transform"
      DEFAULT_GROUP_KEYS = {"default": "default", "address": "address"}
      ONLY_GROUP = True

      def validate(self, sf: SFrame):
          super().validate(sf)
          self._validate_columns(sf, self.default_sf_key, "column_1", "column_2")

      def transform_df(self, default: pd.DataFrame, address: pd.DataFrame, *args, **kwargs):
          # your transformation implementation ...
          return {"default": default, "address": address}
  ```
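Putting the handler contract together, the sketch below calls a `transform_df` directly with pandas frames and checks that the returned dictionary is keyed by the group names. `Transformer` is a minimal stand-in and the join on `user_id` is an invented example transformation, not part of the library:

```python
import pandas as pd

# Minimal stand-in for the real base class (hypothetical).
class Transformer:
    pass


class CustomTransformer(Transformer):
    DEFAULT_GROUP_KEYS = {"default": "default", "address": "address"}
    ONLY_GROUP = True

    def transform_df(self, default: pd.DataFrame, address: pd.DataFrame):
        # Invented example transformation: join addresses onto the
        # default frame by user_id.
        default = default.merge(address, on="user_id", how="left")
        # The returned keys match the group keys.
        return {"default": default, "address": address}


users = pd.DataFrame({"user_id": [1, 2], "name": ["Ann", "Bob"]})
addresses = pd.DataFrame({"user_id": [1, 2], "city": ["Oslo", "Bergen"]})

result = CustomTransformer().transform_df(users, addresses)
print(sorted(result))  # ['address', 'default']
print(list(result["default"].columns))  # ['user_id', 'name', 'city']
```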