Other Derivers
The main deriver handles a wide range of feature engineering tasks. Beyond it, several additional derivers are available for more specific purposes. Each is tailored to a particular kind of data transformation or aggregation, giving you extra flexibility for different analytical needs.
Sender Receiver Token
On the Ethereum blockchain, some token contracts themselves appear as senders or receivers of transactions. If you want to identify these tokens and keep them in a separate SFrame, you can use this deriver. It searches through transaction data and extracts the contract addresses that appear in the specified address columns. You must also pass a result column name, which is used to name the column that holds the result in the new SFrame. The key under which the new SFrame is stored in the output group SFrame is defined by the `other` entry of `group_keys`.
How it works
- Define Address Columns: Specify which columns contain addresses. These could be columns like `from_address`, `to_address`, or any other columns that hold Ethereum addresses involved in transactions.
- Specify the Contract Address Column: Identify the column that contains contract addresses, such as `contract_address` or any other relevant column where the token contract addresses are recorded.
- Pass the Result Column Name: Provide the name of the result column that will store the unique contract addresses in the new SFrame. This name labels the column in the output SFrame where the results are stored.
- Define the Output Group SFrame: Use the `group_keys` parameter to define the output SFrame's name. The `other` key in `group_keys` specifies the key name of this new SFrame, keeping the output organized and easy to access.
- Search Across All Rows: The deriver scans every row in the dataset, checking each specified address column for occurrences of the contract addresses.
- Extract Unique Contract Addresses: The transformer gathers all unique contract addresses that appear at least once in the specified address columns.
- Create a Separate SFrame: The resulting unique contract addresses are stored in a separate SFrame, using the specified result column name and the key defined in `group_keys`.
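Conceptually, the extraction described in this list amounts to collecting the contract addresses that also show up in any of the address columns. The pandas snippet below is only an illustration of that idea, not the deriver's actual implementation:

```python
import pandas as pd

# Illustration only: mimic the extraction logic on a plain DataFrame.
df = pd.DataFrame(
    {
        "from_address": ["address_1", "token_1", "address_2", "address_3"],
        "to_address": ["address_2", "address_3", "address_4", "token_2"],
        "contract_address": ["token_1", "token_2", "token_2", "token_1"],
    }
)

address_cols = ["from_address", "to_address"]

# Every value that appears in any of the address columns.
seen = set(df[address_cols].to_numpy().ravel())

# Unique contract addresses that were seen as a sender or receiver.
tokens = df.loc[df["contract_address"].isin(seen), "contract_address"].unique()
print(tokens)  # ['token_1' 'token_2']
```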
Example
```python
from seshat.data_class import DFrame
from seshat.transformer.deriver.base import SenderReceiverTokensDeriver

sf = DFrame.from_raw(
    {
        "from_address": ["address_1", "token_1", "address_2", "address_3"],
        "to_address": ["address_2", "address_3", "address_4", "token_2"],
        "contract_address": ["token_1", "token_2", "token_2", "token_1"],
    }
)

deriver = SenderReceiverTokensDeriver(
    group_keys={"default": "default", "other": "tokens_with_transactions"},
    address_cols=["from_address", "to_address"],
    contract_address_col="contract_address",
    result_col="token",
)

sf = deriver(sf)
sf["tokens_with_transactions"].data
>>>
     token
0  token_2
1  token_1
```
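The original transaction data remains available under the key mapped from `default` in `group_keys` (here simply `"default"`), so you can keep working with it alongside the new token SFrame. Assuming the same indexing shown above:

```python
sf["default"].data  # the original transaction rows
```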
Percentile Transaction Value
This deriver computes the percentile of a specific column and inserts the result as a new column. To use it, you must provide the value column for which the percentile will be computed, and you can specify the name of the new column where the resulting percentile values will be stored.
How it works
- Specify the Value Column: Use the `value_col` argument to identify the column that contains the values for which you want to compute the percentile. This column should contain numeric data.
- Specify the Result Column Name: Use the `result_col` argument to provide a name for the new column where the computed percentile values will be stored, making it clear which column of the output contains the percentiles.
- Customize Quantile Probabilities (Optional): By default, the quantile probabilities are set to (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9). You can customize this by providing your own set of quantile probabilities if needed; see the sketch after the example below.
Example
Imagine you have a dataset that contains the amount held by each address. To compute the percentile of the amount, you can do the following:
```python
from seshat.data_class import DFrame
from seshat.transformer.deriver.base import PercentileTransactionValueDeriver

sf = DFrame.from_raw(
    {
        "address": ["address_1", "address_2", "address_3", "address_4", "address_5"],
        "amount": [100, 200, 300, 400, 500],
    }
)

deriver = PercentileTransactionValueDeriver(value_col="amount", result_col="percentile")
sf = deriver(sf)
sf.data
>>>
     address  amount  percentile
0  address_1     100          10
1  address_2     200          30
2  address_3     300          50
3  address_4     400          80
4  address_5     500         100
```
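If the default probabilities are too coarse or too fine for your data, you can supply your own set, as mentioned in the list above. The keyword name used below is only a guess for illustration; check the `PercentileTransactionValueDeriver` signature in your version of seshat for the actual argument name:

```python
# Sketch only: `quantiles` is an assumed name for the quantile-probabilities
# argument; verify it against the actual signature.
deriver = PercentileTransactionValueDeriver(
    value_col="amount",
    result_col="percentile",
    quantiles=(0.25, 0.5, 0.75),
)
sf = deriver(sf)
```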
From SQLDB Deriver
The SQLDB Deriver is a tool designed to derive new columns based on existing data in an SQL database. This is particularly useful during the inference stage, as explained in the inference documentation.
We have already introduced the SQL database source and discussed how it can fetch data from the database. However, you might wonder why we need a deriver that can connect to the database during the pipeline process.
The conventional approach is for the source to fetch data at the beginning of the pipeline in the feature view. But what if you need to fetch data from the database in the middle of the pipeline, with a query dependent on the resulting data? This is where a transformer, like the SQLDB Deriver, comes into play. It can access the already fetched and transformed data and perform additional data fetching from a database.
How It Works
The SQLDB Deriver fetches data using a source provided as an argument. Therefore, all principles related to fetching data from the source, especially for the SQL database source, are applicable here.
The query for the deriver is defined using the `query`, `filters`, and `get_query_fn` arguments. The deriver passes this query on to its source, which means you should define the query on the deriver itself, not on the source.
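For a fixed query, a minimal configuration could look like the sketch below. The `query` keyword follows the description above, but treat the exact argument names as assumptions and check the `FromSQLDBDeriver` signature; `DB_URL` stands in for your database connection string:

```python
# Sketch only: a FromSQLDBDeriver configured with a static query.
# The `query` keyword is assumed from the description above.
deriver = FromSQLDBDeriver(
    SQLDBSource(url=DB_URL),
    query="select token, amount from token_info",
)
```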
There are two ways to use the SQLDB Deriver:
- It can fetch data from the database and store it in a separate SFrame.
- It can merge the fetched results into the default SFrame.

A sketch of the separate-SFrame option appears after the example below.
Dynamic Query
This deriver can handle custom, data-dependent queries because the default SFrame's data is passed to `get_query_fn`.
Example
Consider the following initial data:
| address | token |
|---|---|
| address_1 | token_1 |
| address_2 | token_2 |
| address_3 | token_2 |
You want to fetch the amount of each token from the database, where the database contains data like this:
| token | amount |
|---|---|
| token_1 | 100 |
| token_2 | 200 |
The result should be:
| address | token | amount_avg |
|---|---|---|
| address_1 | token_1 | 100 |
| address_2 | token_2 | 200 |
| address_3 | token_2 | 200 |
This can be accomplished using the SQLDB Deriver.
We start with the following dataframe (sf):
```python
from seshat.data_class import DFrame

sf = DFrame.from_raw(
    {
        "address": ["address_1", "address_2", "address_3"],
        "token": ["token_1", "token_2", "token_2"],
    }
)
```
To configure the deriver, we first define a `get_query_fn` that uses the default dataframe:
```python
import pandas as pd

def get_query_fn(default: pd.DataFrame, *args, **kwargs):
    # Build a query that fetches only the tokens present in the current data.
    tokens = default["token"].tolist()
    tokens_list_str = ", ".join([f"'{token}'" for token in tokens])
    return f"select * from token_info where token in ({tokens_list_str})"
```
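To see what this produces, you can call the function by hand on the example data (assuming `sf.data` exposes the underlying pandas DataFrame, as in the outputs shown above):

```python
print(get_query_fn(sf.data))
# select * from token_info where token in ('token_1', 'token_2', 'token_2')
```

Duplicate tokens end up in the `IN` clause; SQL tolerates that, but you could deduplicate inside the function with `set(...)` if you prefer.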
As shown, this function uses the default dataframe to generate a suitable query. After defining the query function, we define the deriver and pass the function to it:
```python
# FromSQLDBDeriver and SQLDBSource are assumed to be imported;
# DB_URL is a placeholder for your database connection string.
deriver = FromSQLDBDeriver(
    SQLDBSource(url=DB_URL),
    base_col="address",
    get_query_fn=get_query_fn,
    merge_result=True,
)

# Apply the deriver to the dataframe
sf = deriver(sf)

# Display the resulting dataframe
sf.data
>>>
     address    token  amount_avg
0  address_1  token_1         100
1  address_2  token_2         200
2  address_3  token_2         200
```
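The example above uses the merge option (`merge_result=True`). To keep the fetched rows in a separate SFrame instead, the first of the two options mentioned earlier, a sketch might look like the following; how the separate result is keyed and accessed is an assumption here, so check the `FromSQLDBDeriver` documentation for the exact behavior:

```python
# Sketch only: merge_result=False is assumed to keep the fetched rows in a
# separate SFrame rather than merging them into the default one.
deriver = FromSQLDBDeriver(
    SQLDBSource(url=DB_URL),
    base_col="address",
    get_query_fn=get_query_fn,
    merge_result=False,
)
sf = deriver(sf)
```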