Introduction To Trimmers
A Trimmer is a transformer that, as its name implies, trims the data based on specified conditions. This powerful tool is used to filter and refine datasets by removing unwanted or irrelevant data, ensuring that the resulting dataset is clean, focused, and more suitable for analysis or modeling.
In short, trimmers remove some columns or even entire SFrames from the input.
Feature Trimmer
One of the best examples of trimmers is the feature trimmer, which reduces the number of features (columns) from the input data. This is particularly useful when you have a dataset with many columns, and you want to focus only on a few specific features for your analysis or model. The feature trimmer allows you to efficiently select and retain only the columns you need, improving the clarity and manageability of your data.
How It Works
- Load the SFrame: Start by loading your dataset into an SFrame, which contains multiple columns.
- Specify Columns to Keep: Identify the specific columns that you want to retain in your dataset. This can be done by providing a list of column names.
- Apply the Feature Trimmer: The feature trimmer will process the SFrame, keeping only the specified columns and discarding the rest.
- Resulting SFrame: The resulting SFrame will contain only the columns you have chosen to keep, simplifying your data and making it more focused for analysis.
Example
from seshat.data_class import DFrame
from seshat.transformer.trimmer import FeatureTrimmer
data = {"A": ["foo", "bar"], "B": [1, 2], "C": [4, 5]}
sf_input = DFrame.from_raw(data)
trimmer = FeatureTrimmer(columns=["A", "B"])
sf_output = trimmer(sf_input)
list(sf_output.data.columns), list(sf_input.data.columns)
>>> (['A', 'B'], ['A', 'B', 'C'])
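To make the effect concrete without the library, the following is a minimal plain-Python sketch of the same idea applied to a column-oriented dict like `data` above. The helper `trim_features` is a hypothetical name used only for illustration; it is not part of seshat and does not describe how `FeatureTrimmer` is implemented.

```python
def trim_features(table, columns):
    """Return a new column-oriented table holding only the requested columns."""
    missing = [name for name in columns if name not in table]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # Copy each kept column so the input table is left untouched.
    return {name: list(table[name]) for name in columns}


data = {"A": ["foo", "bar"], "B": [1, 2], "C": [4, 5]}
trimmed = trim_features(data, ["A", "B"])
print(list(trimmed), list(data))  # ['A', 'B'] ['A', 'B', 'C']
```

As in the library example, the input keeps all three columns and only the returned table is narrowed.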
Zero Address Trimmer
In blockchain, particularly in Ethereum, a zero address (often 0x0000000000000000000000000000000000000000) is frequently used as a burn address. This means that tokens sent to this address are effectively removed from circulation, making the zero address a common target for transactions meant to destroy tokens. To maintain the relevance and cleanliness of your dataset, it may be necessary to remove transactions related to the zero address.
How It Works
The Zero Address Trimmer is a specific type of trimmer designed to filter out any transactions involving the zero address. By removing these transactions, you can ensure that your data analysis focuses only on meaningful transactions that do not involve token burning.
- Load the SFrame: Begin by loading your transaction dataset into an SFrame.
- Define the Address Columns: Identify the columns in your dataset that contain addresses. These could include columns like from_address and to_address.
- Specify the Zero Address: Provide the value of the zero address, typically 0x0000000000000000000000000000000000000000.
- Apply the Zero Address Trimmer: The trimmer will process the dataset, examining each transaction to see if it involves the zero address. Transactions where either the from_address or to_address matches the zero address are removed.
- Resulting SFrame: The resulting SFrame will contain only transactions that do not involve the zero address, ensuring a cleaner and more relevant dataset for further analysis.
Example
from seshat.data_class import DFrame
from seshat.transformer.trimmer import ZeroAddressTrimmer
data = {"from_address": ["zero", "bar", "baz"], "to_address": ["baz", "zero", "qux"]}
sf_input = DFrame.from_raw(data)
trimmer = ZeroAddressTrimmer(
    address_cols=["from_address", "to_address"], zero_address="zero"
)
sf_output = trimmer(sf_input)
sf_output.data
>>>
from_address to_address
0 baz qux
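The row filter described above can be sketched in plain Python with the real Ethereum zero address. The helper `drop_zero_address_rows` and the sample addresses are hypothetical, and the case-insensitive comparison is an assumption of this sketch; the library may compare addresses exactly.

```python
ZERO_ADDRESS = "0x" + "0" * 40  # the 20-byte all-zero Ethereum address


def drop_zero_address_rows(rows, address_cols, zero_address=ZERO_ADDRESS):
    """Keep only rows in which none of the address columns equals zero_address."""
    zero = zero_address.lower()
    return [
        row
        for row in rows
        # Assumption: compare in lowercase, since hex addresses may differ in case.
        if all(str(row[col]).lower() != zero for col in address_cols)
    ]


transactions = [
    {"from_address": ZERO_ADDRESS, "to_address": "0xaaa"},  # zero address as sender
    {"from_address": "0xbbb", "to_address": ZERO_ADDRESS},  # zero address as receiver
    {"from_address": "0xaaa", "to_address": "0xccc"},  # no zero address involved
]
kept = drop_zero_address_rows(transactions, ["from_address", "to_address"])
print(kept)  # [{'from_address': '0xaaa', 'to_address': '0xccc'}]
```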
Low Transaction Trimmer
In blockchain analysis, particularly when working with Ethereum transaction data, you might want to keep only addresses with at least a certain number of transactions. The Low Transaction Trimmer is a highly useful tool for this purpose. This trimmer helps filter out addresses with low transaction counts, allowing you to focus on more active participants in the blockchain network.
How It Works
The Low Transaction Trimmer offers two methods for trimming low transaction addresses:
- Check Each Address Column Separately: This method checks each address column (e.g., from_address and to_address) individually to ensure that each meets the minimum transaction count condition. This is useful when you want to consider the transaction counts independently for sending and receiving addresses.
- Check Across All Address Columns: This method ensures that each address meets the minimum transaction count condition across all address columns combined. This approach is beneficial when you want to ensure that an address is active in multiple roles, such as both a good sender and a good receiver.
Example
If you use the first method, you must set the exclusive_on_each argument to True. The snippet below reuses the imports and the input sf constructed in the second example.
trimmer = LowTransactionTrimmer(
    address_cols=["from_address", "to_address"],
    min_transaction_num=2,
    exclusive_on_each=True,
)
sf = trimmer(sf)
sf.data
>>>
from_address to_address token
0 address_1 address_2 token_1
1 address_2 address_1 token_2
2 address_3 address_1 token_1
3 address_2 address_1 token_2
4 address_1 address_2 token_1
The second method, which checks across all address columns, looks like this:
from seshat.data_class import DFrame
from seshat.transformer.trimmer import LowTransactionTrimmer
sf = DFrame.from_raw(
    {
        "from_address": [
            "address_1",
            "address_2",
            "address_3",
            "address_3",
            "address_2",
            "address_1",
        ],
        "to_address": [
            "address_2",
            "address_1",
            "address_4",
            "address_1",
            "address_1",
            "address_2",
        ],
        "token": ["token_1", "token_2", "token_2", "token_1", "token_2", "token_1"],
    }
)
trimmer = LowTransactionTrimmer(
    address_cols=["from_address", "to_address"],
    min_transaction_num=2,
    exclusive_on_each=False,
)
sf = trimmer(sf)
sf.data
>>>
from_address to_address token
0 address_2 address_1 token_2
1 address_3 address_1 token_1
2 address_2 address_1 token_2
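For intuition about the first method (checking each address column separately), here is a plain-Python sketch of one possible reading: count how often each address occurs within its own column and keep a row only if every one of its addresses reaches the minimum. The helper `trim_low_activity_per_column` is hypothetical and is not the library's implementation. On the six-row input used above, this reading selects the same five rows as the exclusive_on_each=True output.

```python
from collections import Counter


def trim_low_activity_per_column(rows, address_cols, min_transaction_num):
    """Keep rows whose address in every column occurs often enough in that column."""
    # Count occurrences separately for each address column.
    counts = {col: Counter(row[col] for row in rows) for col in address_cols}
    return [
        row
        for row in rows
        if all(counts[col][row[col]] >= min_transaction_num for col in address_cols)
    ]


senders = ["address_1", "address_2", "address_3", "address_3", "address_2", "address_1"]
receivers = ["address_2", "address_1", "address_4", "address_1", "address_1", "address_2"]
rows = [{"from_address": s, "to_address": r} for s, r in zip(senders, receivers)]

kept = trim_low_activity_per_column(rows, ["from_address", "to_address"], 2)
for row in kept:
    print(row["from_address"], row["to_address"])
```

Only the address_3 to address_4 row is dropped here, because address_4 appears just once in to_address.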
Contract Trimmer
If you want to keep only certain contracts in the input data, you can use the Contract Trimmer. This trimmer filters transactions against a list of valid contracts, so that only transactions involving the specified contracts are retained in the dataset.
How It Works
- Define the Contract List Function: The Contract Trimmer accepts a contract_list_fn, which is a function that identifies valid contracts. This function is crucial for determining which contracts should be kept in the dataset.
- Provide Function Arguments: The function may require specific arguments to operate. To accommodate this, the trimmer constructor includes contract_list_args and contract_list_kwargs. These allow you to pass positional and keyword arguments, respectively, to the contract list function.
- Trimming Logic: The trimmer will apply the contract_list_fn with the provided arguments to identify valid contracts. Any transaction that does not involve these valid contracts will be trimmed from the dataset.
- Built-in Functionality: By default, there is a built-in function that finds popular contracts. This makes it easy to filter the dataset based on commonly used contracts without needing to define a custom function.
Example
from seshat.transformer.trimmer import ContractTrimmer
from seshat.utils.contracts import PopularContractsFinder
trimmer = ContractTrimmer(
    contract_list_fn=PopularContractsFinder().find,
    contract_list_kwargs={"limit": 200},
)
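The way a contract list function and its arguments fit together can be sketched in plain Python. Everything in this sketch is an assumption made for illustration: the column name `contract_address`, the custom function `most_common_contracts`, and the helper `trim_by_contracts` are hypothetical, and the real `ContractTrimmer` may call its function differently.

```python
from collections import Counter


def most_common_contracts(rows, limit=10):
    """Hypothetical contract list function: the `limit` most frequent contracts."""
    counts = Counter(row["contract_address"] for row in rows)
    return [contract for contract, _ in counts.most_common(limit)]


def trim_by_contracts(rows, contract_list_fn, contract_list_args=(), contract_list_kwargs=None):
    """Keep rows whose contract is returned by contract_list_fn(*args, **kwargs)."""
    valid = set(contract_list_fn(*contract_list_args, **(contract_list_kwargs or {})))
    return [row for row in rows if row["contract_address"] in valid]


rows = [
    {"contract_address": "0xaaa", "amount": 1},
    {"contract_address": "0xbbb", "amount": 2},
    {"contract_address": "0xaaa", "amount": 3},
]
kept = trim_by_contracts(
    rows,
    contract_list_fn=most_common_contracts,
    contract_list_args=(rows,),  # positional arguments forwarded to the function
    contract_list_kwargs={"limit": 1},  # keyword arguments forwarded to the function
)
print([row["amount"] for row in kept])  # [1, 3]
```

Passing the function and its arguments separately lets the same trimming step work with any source of valid contracts, whether a fixed allow-list or a popularity ranking.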
By utilizing these trimmers, you can refine and clean your datasets, ensuring they are well-prepared for analysis and processing.