Define Pivot Vectorizer
Token Pivot
The pivot is a type of vectorizer that extracts vectors using a pivot table. For example, assume you have transaction
records and you want a pivot table that shows how many transactions each address has with different tokens. The vector
has one column for the address and one column per token. For this case, you can use a pivot
vectorizer. The vectorizer attaches the resulting SFrame to the input as a new child: if your input is already a
group, a new child named "vector" is added to it. If your input is not grouped, the vectorizer
creates a group and adds the result as a child.
How to use?
To use the pivot vectorizer, you must pass the column names and a strategy.
Column Names
- You must pass the address columns with the address_cols argument. The address column name becomes part of each column name in the resulting pivot. For example, if the address column is "address", the result columns are named in this format: address + _ + value. To put a prefix in front of that name, pass the col_prefix argument. The default value is "token_".

  Now suppose you have the following table as an input SFrame:

  address    token
  address_1  token_1
  address_2  token_2
  address_1  token_2
  address_3  token_1
  address_1  token_3
  address_2  token_1

  And you want to find the pivot that shows the count of interactions each address has with a token. If you set col_prefix to "token_" and address_cols to ["address"], you get this result:

  address    token_address_token_1  token_address_token_2  token_address_token_3
  address_1  1                      1                      1
  address_2  1                      1                      0
  address_3  1                      0                      0

  Note how the column names are set by your configuration.
- You must define the contract_address and result_address_col arguments. The first identifies which column of the input SFrame holds the token, and the second sets the name of the address column in the result SFrame. For example, in the previous table, the "address" column in the result is set by result_address_col, and the contract_address is "token". Note that the code example in the Normalizing section passes this argument as contract_address_col.
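The count pivot and its column naming can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: count_pivot and its parameters are hypothetical names, and the naming rule (col_prefix + address column + _ + token value) is taken from the example tables above.

```python
from collections import Counter

# (address, token) rows from the example input table.
rows = [
    ("address_1", "token_1"),
    ("address_2", "token_2"),
    ("address_1", "token_2"),
    ("address_3", "token_1"),
    ("address_1", "token_3"),
    ("address_2", "token_1"),
]


def count_pivot(rows, address_col="address", col_prefix="token_"):
    """Hypothetical helper: count rows per (address, token) pair."""
    counts = Counter(rows)  # maps (address, token) -> number of rows
    addresses = sorted({address for address, _ in rows})
    tokens = sorted({token for _, token in rows})
    table = []
    for address in addresses:
        row = {address_col: address}
        for token in tokens:
            # Column name: prefix + address column name + "_" + token value.
            # Counter returns 0 for pairs that never occur.
            row[f"{col_prefix}{address_col}_{token}"] = counts[(address, token)]
        table.append(row)
    return table


pivot = count_pivot(rows)
for row in pivot:
    print(row)
```

Each printed row corresponds to one line of the result table above, with 0 filled in where an address never interacted with a token.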
Define a Strategy
Every pivot needs a strategy. The default strategy is "CountStrategy". Some configurations must be set in the strategy, and these can vary across different strategies.
If you want to know more about pivot strategy, see the strategy documentation.
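One way to picture a strategy is as the aggregation applied to each (address, token) cell of the pivot. The sketch below is plain Python and does not use the library's strategy classes; build_pivot and its aggregate parameter are hypothetical names, and the rows reuse the amounts from the example in the Normalizing section.

```python
# (address, token, amount) rows, matching the Normalizing example data.
rows = [
    ("address_1", "token_1", 100),
    ("address_2", "token_2", 200),
    ("address_1", "token_2", 150),
    ("address_3", "token_1", 300),
    ("address_1", "token_3", 50),
    ("address_2", "token_1", 100),
]


def build_pivot(rows, aggregate):
    """Hypothetical helper: group amounts per cell, then aggregate each cell."""
    cells = {}
    for address, token, amount in rows:
        cells.setdefault((address, token), []).append(amount)
    addresses = sorted({row[0] for row in rows})
    tokens = sorted({row[1] for row in rows})
    # Empty cells aggregate an empty list, which gives 0 for both len and sum.
    return {
        address: {token: aggregate(cells.get((address, token), [])) for token in tokens}
        for address in addresses
    }


count_table = build_pivot(rows, len)  # counting, in the spirit of a count strategy
sum_table = build_pivot(rows, sum)    # summing amounts, in the spirit of a sum strategy
print(count_table["address_2"])
print(sum_table["address_2"])
```

Swapping the aggregation function changes the cell values while the shape of the pivot stays the same, which is why the strategy can be configured separately from the column arguments.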
Normalizing
The pivot can normalize the data so that the values fall between 0 and 1. To normalize the values, set
should_normalize to True.
For example, without normalization:
# Assumes DFrame, TokenPivotVectorizer, and SumStrategy are already imported from the library.
sf = DFrame.from_raw(
    {
        "address": [
            "address_1",
            "address_2",
            "address_1",
            "address_3",
            "address_1",
            "address_2",
        ],
        "token": ["token_1", "token_2", "token_2", "token_1", "token_3", "token_1"],
        "amount": [100, 200, 150, 300, 50, 100],
    }
)
# Sum the "amount" values for every (address, token) pair.
vectorizer = TokenPivotVectorizer(
    strategy=SumStrategy(
        address_col="address", pivot_columns=["token"], value_column="amount"
    ),
    address_cols=["address"],
    result_address_col="address",
    contract_address_col="token",
    should_normalize=False,
)
sf = vectorizer(sf)
sf["vector"].data
>>>
     address  token_address_token_1  token_address_token_2  token_address_token_3
0  address_1                    100                    150                     50
1  address_2                    100                    200                      0
2  address_3                    300                      0                      0
And with normalization enabled:
vectorizer = TokenPivotVectorizer(
    strategy=SumStrategy(
        address_col="address", pivot_columns=["token"], value_column="amount"
    ),
    address_cols=["address"],
    result_address_col="address",
    contract_address_col="token",
    should_normalize=True,  # scale the pivot values into the range 0 to 1
)
sf = vectorizer(sf)
sf["vector"].data
>>>
     address  token_address_token_1  token_address_token_2  token_address_token_3
0  address_1                    0.0                   0.75                    1.0
1  address_2                    0.0                   1.00                    0.0
2  address_3                    1.0                   0.00                    0.0
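The normalized values above are consistent with column-wise min-max scaling, where each value becomes (value - column minimum) / (column maximum - column minimum). The documentation does not name the scaling method, so treat the following as an assumption inferred from the two outputs; min_max_normalize is a hypothetical helper, not part of the library.

```python
# Un-normalized pivot values from the first output, keyed by address.
raw = {
    "address_1": {"token_1": 100, "token_2": 150, "token_3": 50},
    "address_2": {"token_1": 100, "token_2": 200, "token_3": 0},
    "address_3": {"token_1": 300, "token_2": 0, "token_3": 0},
}


def min_max_normalize(table):
    """Hypothetical helper: scale every column to the range 0..1 independently."""
    columns = list(next(iter(table.values())))
    result = {row: {} for row in table}
    for column in columns:
        values = [table[row][column] for row in table]
        low, high = min(values), max(values)
        span = high - low
        for row in table:
            # A constant column has no spread; map it to 0.0 instead of dividing by zero.
            result[row][column] = (table[row][column] - low) / span if span else 0.0
    return result


normalized = min_max_normalize(raw)
for address, values in normalized.items():
    print(address, values)
```

Because the minimum of each column is subtracted first, the smallest value in a column maps to 0.0 even when it is non-zero, which is why both 100 entries in the token_1 column become 0.0.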