
Introduction To Trimmers

A Trimmer is a transformer that, as its name implies, trims data based on specified conditions. It filters and refines a dataset by removing unwanted or irrelevant data, so that the result is cleaner, more focused, and better suited to analysis or modeling.

In short, trimmers remove some rows, some columns, or even entire SFrames from the input.

Feature Trimmer

One of the best examples of trimmers is the feature trimmer, which reduces the number of features (columns) from the input data. This is particularly useful when you have a dataset with many columns, and you want to focus only on a few specific features for your analysis or model. The feature trimmer allows you to efficiently select and retain only the columns you need, improving the clarity and manageability of your data.

How It Works

  1. Load the SFrame:

    • Start by loading your dataset into an SFrame, which contains multiple columns.
  2. Specify Columns to Keep:

    • Identify the specific columns that you want to retain in your dataset. This can be done by providing a list of column names.
  3. Apply the Feature Trimmer:

    • The feature trimmer will process the SFrame, keeping only the specified columns and discarding the rest.
  4. Resulting SFrame:

    • The resulting SFrame will contain only the columns you have chosen to keep, simplifying your data and making it more focused for analysis.

Example

from seshat.data_class import DFrame
from seshat.transformer.trimmer import FeatureTrimmer

data = {"A": ["foo", "bar"], "B": [1, 2], "C": [4, 5]}
sf_input = DFrame.from_raw(data)

trimmer = FeatureTrimmer(columns=["A", "B"])
sf_output = trimmer(sf_input)
list(sf_output.data.columns), list(sf_input.data.columns)
>>> (['A', 'B'], ['A', 'B', 'C'])
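For intuition, the effect of the feature trimmer can be sketched in plain Python on column-oriented data. This is only an illustration, not seshat's implementation: the helper name keep_columns is hypothetical, and raising KeyError for an unknown column is an assumption, since the behavior of FeatureTrimmer in that case is not described here.

```python
# Plain-Python illustration of column trimming; NOT seshat's implementation.
def keep_columns(table, columns):
    """Return a new column-oriented table holding only the requested columns."""
    missing = [name for name in columns if name not in table]
    if missing:
        # Assumption: an unknown column name is treated as an error.
        raise KeyError(f"columns not found: {missing}")
    # Copy each kept column so the input table is left untouched.
    return {name: list(table[name]) for name in columns}


data = {"A": ["foo", "bar"], "B": [1, 2], "C": [4, 5]}
trimmed = keep_columns(data, ["A", "B"])
print(list(trimmed), list(data))  # ['A', 'B'] ['A', 'B', 'C']
```

As in the example above, the input keeps all three columns and only the returned table is narrowed.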

Zero Address Trimmer

In blockchain, particularly in Ethereum, a zero address (often 0x0000000000000000000000000000000000000000) is frequently used as a burn address. This means that tokens sent to this address are effectively removed from circulation, making the zero address a common target for transactions meant to destroy tokens. To maintain the relevance and cleanliness of your dataset, it may be necessary to remove transactions related to the zero address.

How It Works

The Zero Address Trimmer is a specific type of trimmer designed to filter out any transactions involving the zero address. By removing these transactions, you can ensure that your data analysis focuses only on meaningful transactions that do not involve token burning.

  1. Load the SFrame:

    • Begin by loading your transaction dataset into an SFrame.
  2. Define the Address Columns:

    • Identify the columns in your dataset that contain addresses. These could include columns like from_address and to_address.
  3. Specify the Zero Address:

    • Provide the value of the zero address, typically 0x0000000000000000000000000000000000000000.
  4. Apply the Zero Address Trimmer:

    • The trimmer will process the dataset, examining each transaction to see if it involves the zero address. Transactions where either the from_address or to_address matches the zero address are removed.
  5. Resulting SFrame:

    • The resulting SFrame will contain only transactions that do not involve the zero address, ensuring a cleaner and more relevant dataset for further analysis.

Example

from seshat.data_class import DFrame
from seshat.transformer.trimmer import ZeroAddressTrimmer

data = {"from_address": ["zero", "bar", "baz"], "to_address": ["baz", "zero", "qux"]}
sf_input = DFrame.from_raw(data)

trimmer = ZeroAddressTrimmer(
    address_cols=["from_address", "to_address"], zero_address="zero"
)
sf_output = trimmer(sf_input)
sf_output.data

>>>
from_address to_address
0 baz qux
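The filtering rule described in the steps above can also be sketched in plain Python with the real 42-character zero address. This is a minimal sketch rather than the library's code: the helper drop_zero_address_rows is hypothetical, the addresses 0xaaa and 0xbbb are shortened placeholders, and exact string comparison is an assumption.

```python
# Plain-Python illustration of zero-address trimming; NOT seshat's implementation.
ZERO_ADDRESS = "0x" + "0" * 40  # "0x" followed by 40 zeros (a 20-byte address)


def drop_zero_address_rows(rows, address_cols, zero_address=ZERO_ADDRESS):
    """Keep only rows in which none of the address columns equals zero_address."""
    return [
        row
        for row in rows
        if all(row[col] != zero_address for col in address_cols)
    ]


# Placeholder transactions; "0xaaa" and "0xbbb" are not real addresses.
transactions = [
    {"from_address": ZERO_ADDRESS, "to_address": "0xaaa"},
    {"from_address": "0xbbb", "to_address": ZERO_ADDRESS},
    {"from_address": "0xaaa", "to_address": "0xbbb"},
]

kept = drop_zero_address_rows(transactions, ["from_address", "to_address"])
print(kept)  # [{'from_address': '0xaaa', 'to_address': '0xbbb'}]
```

A row is dropped as soon as any one of the listed address columns holds the zero address, which matches the "either from_address or to_address" rule in step 4.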

Low Transaction Trimmer

In blockchain analysis, particularly when working with Ethereum transaction data, you might want to keep only addresses with at least a certain number of transactions. The Low Transaction Trimmer filters out addresses with low transaction counts, allowing you to focus on the more active participants in the blockchain network.

How It Works

The Low Transaction Trimmer offers two methods for trimming low transaction addresses:

  1. Check Each Address Column Separately:

    • This method checks each address column (e.g., from_address and to_address) individually to ensure that each meets the minimum transaction count condition. This is useful when you want to consider the transaction counts independently for sending and receiving addresses.
  2. Check Across All Address Columns:

    • This method ensures that each address meets the minimum transaction count condition across all address columns combined. This approach is useful when you want an address to be active in more than one role, for example as both a frequent sender and a frequent receiver.

Example

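To use the first method, set the exclusive_on_each argument to True. The snippet assumes sf is the input DFrame constructed in the second example below.

from seshat.transformer.trimmer import LowTransactionTrimmer

trimmer = LowTransactionTrimmer(
    address_cols=["from_address", "to_address"],
    min_transaction_num=2,
    exclusive_on_each=True,
)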

sf = trimmer(sf)
sf.data
>>>
from_address to_address token
0 address_1 address_2 token_1
1 address_2 address_1 token_2
2 address_3 address_1 token_1
3 address_2 address_1 token_2
4 address_1 address_2 token_1
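The per-column check of the first method can be sketched in plain Python as follows. This is a hedged illustration, not the library's implementation: the helper trim_low_activity_per_column is hypothetical, and it assumes the counts are taken once from the untrimmed input, an assumption that agrees with the five rows shown above. The sketch covers only the per-column mode.

```python
# Plain-Python illustration of the per-column check; NOT seshat's implementation.
from collections import Counter


def trim_low_activity_per_column(rows, address_cols, min_transaction_num):
    """Keep rows whose address in every column occurs often enough in that column."""
    # Count occurrences separately for each address column, on the untrimmed input.
    counts = {col: Counter(row[col] for row in rows) for col in address_cols}
    return [
        row
        for row in rows
        if all(counts[col][row[col]] >= min_transaction_num for col in address_cols)
    ]


senders = ["address_1", "address_2", "address_3", "address_3", "address_2", "address_1"]
receivers = ["address_2", "address_1", "address_4", "address_1", "address_1", "address_2"]
rows = [{"from_address": s, "to_address": r} for s, r in zip(senders, receivers)]

kept = trim_low_activity_per_column(rows, ["from_address", "to_address"], 2)
print(len(kept))  # 5
```

Only the address_3 to address_4 transaction is dropped, because address_4 appears just once in to_address; every sender appears exactly twice in from_address, so that column removes nothing.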

To use the second method, which checks across all address columns, set exclusive_on_each to False:

from seshat.data_class import DFrame
from seshat.transformer.trimmer import LowTransactionTrimmer


sf = DFrame.from_raw(
    {
        "from_address": [
            "address_1",
            "address_2",
            "address_3",
            "address_3",
            "address_2",
            "address_1",
        ],
        "to_address": [
            "address_2",
            "address_1",
            "address_4",
            "address_1",
            "address_1",
            "address_2",
        ],
        "token": ["token_1", "token_2", "token_2", "token_1", "token_2", "token_1"],
    }
)
trimmer = LowTransactionTrimmer(
    address_cols=["from_address", "to_address"],
    min_transaction_num=2,
    exclusive_on_each=False,
)

sf = trimmer(sf)
sf.data
>>>
from_address to_address token
0 address_2 address_1 token_2
1 address_3 address_1 token_1
2 address_2 address_1 token_2

Contract Trimmer

If you want to keep only certain contracts in the input data, you can use the Contract Trimmer. It filters transactions against a list of valid contracts, so that only transactions involving the specified contracts are retained in the dataset.

How It Works

  1. Define the Contract List Function

    The Contract Trimmer accepts a contract_list_fn, which is a function that identifies valid contracts. This function is crucial for determining which contracts should be kept in the dataset.

  2. Provide Function Arguments

    The function may require specific arguments to operate. To accommodate this, the trimmer constructor includes contract_list_args and contract_list_kwargs. These allow you to pass positional and keyword arguments, respectively, to the contract list function.

  3. Trimming Logic

    The trimmer will apply the contract_list_fn with the provided arguments to identify valid contracts. Any transaction that does not involve these valid contracts will be trimmed from the dataset.

  4. Built-in Functionality

    By default, there is a built-in function that finds popular contracts. This makes it easy to filter the dataset based on commonly used contracts without needing to define a custom function.

Example

from seshat.transformer.trimmer import ContractTrimmer
from seshat.utils.contracts import PopularContractsFinder

trimmer = ContractTrimmer(
    contract_list_fn=PopularContractsFinder().find,
    contract_list_kwargs={"limit": 200},
)
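The way a contract list function and its arguments drive the trimming can be sketched in plain Python. Everything here is an assumption for illustration and not seshat's implementation: the helper filter_by_contracts, the function first_known_contracts, the column name contract_address, and the contract names are all hypothetical, and the real signature expected of contract_list_fn may differ.

```python
# Plain-Python illustration of contract-based trimming; NOT seshat's implementation.
def filter_by_contracts(rows, contract_col, contract_list_fn,
                        contract_list_args=(), contract_list_kwargs=None):
    """Keep rows whose contract appears in the list returned by contract_list_fn."""
    kwargs = contract_list_kwargs or {}
    # Call the list function with the supplied positional and keyword arguments.
    valid_contracts = set(contract_list_fn(*contract_list_args, **kwargs))
    return [row for row in rows if row[contract_col] in valid_contracts]


def first_known_contracts(limit):
    """Hypothetical contract list function: a fixed allow-list cut to `limit` entries."""
    known = ["contract_a", "contract_b", "contract_c"]
    return known[:limit]


rows = [
    {"contract_address": "contract_a", "amount": 10},
    {"contract_address": "contract_c", "amount": 7},
    {"contract_address": "contract_z", "amount": 3},
]
kept = filter_by_contracts(
    rows,
    "contract_address",
    first_known_contracts,
    contract_list_kwargs={"limit": 2},
)
print(kept)  # [{'contract_address': 'contract_a', 'amount': 10}]
```

With limit set to 2 the allow-list holds only contract_a and contract_b, so the contract_c and contract_z rows are trimmed.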

By utilizing these trimmers, you can refine and clean your datasets, ensuring they are well-prepared for analysis and processing.