Define Pivot Vectorizer

Token Pivot

The pivot vectorizer builds vectors from a pivot table. For example, suppose you have transaction records and want a table that shows how many transactions each address has with each token. The result needs one column for the address and one column per token, which is exactly what a pivot vectorizer produces. The vectorizer adds the result to the input as a new SFrame: if the input is already a group, a new child named vector is added to it; if the input is not grouped, the vectorizer creates a group and adds the result as a child.

How to use?

To use the pivot vectorizer, you must pass the column names and a strategy.

Column Names

You must pass the address columns with the address_cols argument. The name of each address column becomes part of the column names in the resulting pivot. For example, if address_cols is ["address"], the pivot columns are named in this format: col_prefix + address + _ + value. The col_prefix argument sets the prefix; its default value is "token_".

Now suppose you have the following table as an input SFrame:

address    token
address_1  token_1
address_2  token_2
address_1  token_2
address_3  token_1
address_1  token_3
address_2  token_1

Suppose you want a pivot that counts how many interactions each address has with each token. If you set col_prefix to "token_" and address_cols to ["address"], you get this result:

address    token_address_token_1  token_address_token_2  token_address_token_3
address_1  1                      1                      1
address_2  1                      1                      0
address_3  1                      0                      0

Note how your configuration determines the column names.
You must also define the contract_address_col and result_address_col arguments. The first names the column in the input SFrame that holds the token, and the second sets the name of the address column in the result SFrame.

For example, in the tables above, result_address_col sets the "address" column in the result, and contract_address_col is "token".
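
The naming rule and the count pivot above can be reproduced outside the library. The following is a minimal sketch that uses pandas as a stand-in, not the vectorizer's actual implementation; the variable names are chosen here to mirror the arguments described on this page.

```python
import pandas as pd

# Same rows as the input table above.
df = pd.DataFrame(
    {
        "address": ["address_1", "address_2", "address_1",
                    "address_3", "address_1", "address_2"],
        "token": ["token_1", "token_2", "token_2",
                  "token_1", "token_3", "token_1"],
    }
)

# Illustrative stand-ins for the vectorizer arguments (not the library API).
col_prefix = "token_"
address_col = "address"
contract_address_col = "token"

# Count the rows for every (address, token) pair; missing pairs become 0.
counts = pd.crosstab(df[address_col], df[contract_address_col])

# Name each column as col_prefix + address column name + "_" + token value.
counts.columns = [f"{col_prefix}{address_col}_{value}" for value in counts.columns]

# Turn the address index back into a regular column.
pivot = counts.reset_index()
print(pivot)
```

Under these assumptions the printed frame has the same column names and counts as the result table above.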

Define a Strategy

Every pivot needs a strategy. The default strategy is "CountStrategy". Each strategy has its own configuration, so the required settings vary from one strategy to another.

If you want to know more about pivot strategy, see the strategy documentation.
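
One way to think about a strategy is as the aggregation applied to each cell of the pivot. The sketch below is an illustration in pandas, not the library's strategy classes: the helper pivot_with is hypothetical, and the "count" and "sum" aggregations are used here only as rough analogues of a counting strategy and a summing strategy.

```python
import pandas as pd

# Transactions with an amount, matching the example data used on this page.
df = pd.DataFrame(
    {
        "address": ["address_1", "address_2", "address_1",
                    "address_3", "address_1", "address_2"],
        "token": ["token_1", "token_2", "token_2",
                  "token_1", "token_3", "token_1"],
        "amount": [100, 200, 150, 300, 50, 100],
    }
)


def pivot_with(aggfunc):
    """Hypothetical helper: build an address-by-token pivot with one aggregation."""
    return df.pivot_table(
        index="address",
        columns="token",
        values="amount",
        aggfunc=aggfunc,
        fill_value=0,  # pairs with no transactions become 0
    )


count_pivot = pivot_with("count")  # number of transactions per pair
sum_pivot = pivot_with("sum")      # total amount per pair

print(count_pivot)
print(sum_pivot)
```

Only the aggregation changes between the two calls; the rows, columns, and input data stay the same.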

Normalizing

The pivot can normalize the data so that the values fall between 0 and 1. To enable this, set should_normalize to True.

For example, without normalization:

# Build the input frame: one row per transaction.
sf = DFrame.from_raw(
    {
        "address": [
            "address_1",
            "address_2",
            "address_1",
            "address_3",
            "address_1",
            "address_2",
        ],
        "token": ["token_1", "token_2", "token_2", "token_1", "token_3", "token_1"],
        "amount": [100, 200, 150, 300, 50, 100],
    }
)
# Sum the "amount" column for each (address, token) pair and keep the raw sums.
vectorizer = TokenPivotVectorizer(
    strategy=SumStrategy(address_col="address", pivot_columns=["token"], value_column="amount"),
    address_cols=["address"],
    result_address_col="address",
    contract_address_col="token",
    should_normalize=False,
)
# Run the vectorizer; the pivot is added as the child named "vector".
sf = vectorizer(sf)
sf["vector"].data

>>>
     address  token_address_token_1  token_address_token_2  \
0  address_1                    100                    150   
1  address_2                    100                    200   
2  address_3                    300                      0   
   token_address_token_3  
0                     50  
1                      0  
2                      0  

And with normalization enabled:

vectorizer = TokenPivotVectorizer(
    strategy=SumStrategy(
        address_col="address", pivot_columns=["token"], value_column="amount"
    ),
    address_cols=["address"],
    result_address_col="address",
    contract_address_col="token",
    should_normalize=True,
)
sf = vectorizer(sf)
sf["vector"].data
>>>
     address  token_address_token_1  token_address_token_2  \
0  address_1                    0.0                   0.75   
1  address_2                    0.0                   1.00   
2  address_3                    1.0                   0.00   
   token_address_token_3  
0                    1.0  
1                    0.0  
2                    0.0  
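
This page does not say which normalization method the vectorizer uses. The normalized values above are, however, consistent with per-column min-max scaling, (x - min) / (max - min), applied to the unnormalized sums. The sketch below checks that with pandas; it is an assumption about the behavior, not the library's code.

```python
import pandas as pd

# Unnormalized sums copied from the first output above.
raw = pd.DataFrame(
    {
        "address": ["address_1", "address_2", "address_3"],
        "token_address_token_1": [100, 100, 300],
        "token_address_token_2": [150, 200, 0],
        "token_address_token_3": [50, 0, 0],
    }
)

value_cols = [col for col in raw.columns if col != "address"]
values = raw[value_cols]

# Min-max scale each column independently to the range [0, 1].
# A column whose values are all equal would divide by zero here and yield NaN.
scaled = (values - values.min()) / (values.max() - values.min())

# Reattach the address column to the scaled values.
normalized = pd.concat([raw[["address"]], scaled], axis=1)
print(normalized)
```

Because each column is scaled on its own, a value of 1.0 marks the address with the largest sum for that token, not the largest sum overall.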
