
Define Pivot Vectorizer

Token Pivot

The pivot vectorizer builds vectors from a pivot table. For example, suppose you have transaction records and want a pivot table that shows how many transactions each address has with each token. The resulting vector has one column for the address and one column per token. The vectorizer attaches the result to the input as a new child SFrame named vector: if the input is already a group, the child is added to it; if the input is not grouped, the vectorizer first creates a group and then adds the result as a child.

How to use?

To use the pivot vectorizer, you must pass the column names and a strategy.

Column Names

  • You must pass the address columns with the address_cols argument. Each address column name becomes part of the column names in the resulting pivot. For example, if you pass address_cols=["address"], the result columns are named in this format: col_prefix + address + _ + value. The prefix is set with the col_prefix argument; its default value is "token_".

    Now suppose you have the following table as an input SFrame:

    address      token
    address_1    token_1
    address_2    token_2
    address_1    token_2
    address_3    token_1
    address_1    token_3
    address_2    token_1

    And you want to find the pivot that shows the count of interactions each address has with a token. If you set the col_prefix to "token_" and the address_cols to ["address"], you get this result:

    address      token_address_token_1    token_address_token_2    token_address_token_3
    address_1    1                        1                        1
    address_2    1                        1                        0
    address_3    1                        0                        0

    Note how the column names are set by your configuration.

  • You must define the contract_address_col and result_address_col arguments. The first names the column in the input SFrame that holds the token, and the second sets the name of the address column in the result SFrame.

    For example, in the previous table, the "address" column in the result is set by result_address_col, and contract_address_col is "token".
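The column naming described above can be reproduced outside the library as a rough illustration. The following is a minimal sketch using pandas, not the vectorizer's own implementation; the variable names are chosen here only to mirror the arguments, and it assumes the count pivot behaves like a plain cross-tabulation.

```python
import pandas as pd

# Same records as the input table above.
records = pd.DataFrame(
    {
        "address": ["address_1", "address_2", "address_1",
                    "address_3", "address_1", "address_2"],
        "token": ["token_1", "token_2", "token_2",
                  "token_1", "token_3", "token_1"],
    }
)

# Illustrative stand-ins for the vectorizer arguments (assumed semantics).
col_prefix = "token_"
address_col = "address"
contract_address_col = "token"

# Count how many rows each (address, token) pair has.
counts = pd.crosstab(records[address_col], records[contract_address_col])

# Name each column as col_prefix + address column name + "_" + token value.
counts.columns = [f"{col_prefix}{address_col}_{token}" for token in counts.columns]

# Turn the address index back into a regular column.
pivot = counts.reset_index()
print(pivot)
```

The printed frame has one address column and one count column per token, matching the result table above.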

Define a Strategy

Every pivot needs a strategy. The default strategy is "CountStrategy". Some configurations must be set in the strategy, and these can vary across different strategies.

If you want to know more about pivot strategy, see the strategy documentation.
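As a rough mental model, a strategy decides how the rows for each address and token pair are aggregated into one cell. The sketch below is an analogy in pandas and does not use the library's strategy classes; the helper pivot_with is hypothetical, and it assumes a count strategy corresponds to counting rows and a sum strategy to summing a value column.

```python
import pandas as pd

records = pd.DataFrame(
    {
        "address": ["address_1", "address_2", "address_1",
                    "address_3", "address_1", "address_2"],
        "token": ["token_1", "token_2", "token_2",
                  "token_1", "token_3", "token_1"],
        "amount": [100, 200, 150, 300, 50, 100],
    }
)


def pivot_with(aggfunc: str) -> pd.DataFrame:
    """Hypothetical helper: pivot the records with the given aggregation."""
    return records.pivot_table(
        index="address",
        columns="token",
        values="amount",
        aggfunc=aggfunc,
        fill_value=0,  # pairs with no rows become 0
    )


count_pivot = pivot_with("count")  # analogous to counting interactions
sum_pivot = pivot_with("sum")      # analogous to summing the amount column

print(count_pivot)
print(sum_pivot)
```

Under these assumptions, the only difference between the two pivots is the aggregation function, which is why the value column matters for a sum but not for a count.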

Normalizing

The pivot can normalize the data to ensure that the values are between 0 and 1. If you want to normalize the values, you must set should_normalize to True.

For example, without normalization:

# DFrame, TokenPivotVectorizer, and SumStrategy come from the library;
# their import lines are not shown here.
sf = DFrame.from_raw(
    {
        "address": [
            "address_1",
            "address_2",
            "address_1",
            "address_3",
            "address_1",
            "address_2",
        ],
        "token": ["token_1", "token_2", "token_2", "token_1", "token_3", "token_1"],
        "amount": [100, 200, 150, 300, 50, 100],
    }
)
# Sum the "amount" column for each address and token pair, without scaling.
vectorizer = TokenPivotVectorizer(
    strategy=SumStrategy(
        address_col="address", pivot_columns=["token"], value_column="amount"
    ),
    address_cols=["address"],
    result_address_col="address",
    contract_address_col="token",
    should_normalize=False,
)
sf = vectorizer(sf)
# The pivot is stored as the "vector" child of the resulting group.
sf["vector"].data

>>>
address token_address_token_1 token_address_token_2 \
0 address_1 100 150
1 address_2 100 200
2 address_3 300 0
token_address_token_3
0 50
1 0
2 0

With normalization enabled:

vectorizer = TokenPivotVectorizer(
    strategy=SumStrategy(
        address_col="address", pivot_columns=["token"], value_column="amount"
    ),
    address_cols=["address"],
    result_address_col="address",
    contract_address_col="token",
    should_normalize=True,  # scale the pivot values into the range 0 to 1
)
sf = vectorizer(sf)
sf["vector"].data
>>>
address token_address_token_1 token_address_token_2 \
0 address_1 0.0 0.75
1 address_2 0.0 1.00
2 address_3 1.0 0.00
token_address_token_3
0 1.0
1 0.0
2 0.0
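The normalized values above are consistent with min-max scaling applied to each token column separately. The following pandas sketch reproduces them under that assumption; it is an illustration only, and the library may compute the normalization differently.

```python
import pandas as pd

# The non-normalized pivot from the first example.
pivot = pd.DataFrame(
    {
        "address": ["address_1", "address_2", "address_3"],
        "token_address_token_1": [100, 100, 300],
        "token_address_token_2": [150, 200, 0],
        "token_address_token_3": [50, 0, 0],
    }
)

value_cols = [col for col in pivot.columns if col != "address"]
values = pivot[value_cols].astype(float)

# Per-column min-max scaling: (x - min) / (max - min).
col_min = values.min()
col_range = values.max() - col_min

# A constant column has range 0; divide by 1 instead so it maps to 0.
normalized = (values - col_min) / col_range.replace(0, 1)

result = pd.concat([pivot[["address"]], normalized], axis=1)
print(result)
```

For instance, the token_1 column holds 100, 100, and 300, so its minimum maps to 0.0 and its maximum to 1.0, which is why address_1 and address_2 both show 0.0 in that column.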