
Other Derivers

The main deriver described in the deriver documentation handles a wide range of feature engineering tasks. Beyond it, several additional derivers are available for more specific purposes; each is tailored to a particular kind of data transformation or aggregation, giving you extra flexibility for different analytical needs.

Sender Receiver Token

On the Ethereum blockchain, some token contract addresses also appear as senders or receivers of transactions. This deriver identifies those tokens and collects them into a separate SFrame: it scans the transaction data and extracts every contract address that appears in one of the specified address columns. You must also pass a result column name, which names the column holding the results in the new SFrame. The key of the new SFrame in the output group SFrame is defined by the other entry of group_keys.

How it works

  • Define Address Columns: Specify which columns contain addresses. These could be columns like from_address, to_address, or any other columns that hold Ethereum addresses involved in transactions.

  • Specify Contract Address Column: Identify the column that contains contract addresses, such as contract_address or any other relevant column where the token contract addresses are recorded.

  • Pass the Result Column Name: Provide the name for the result column that will store the unique contract addresses in the new SFrame. This name is used to label the column in the output SFrame where the results are stored.

  • Define Output Group SFrame: Use the group_keys parameter to define the output SFrame's name. The other key in group_keys specifies the key name of this new SFrame. This helps organize the output, ensuring it is clear and accessible.

  • Search Across All Rows: The deriver will scan through all rows in the dataset, checking each specified address column to find instances of the contract addresses.

  • Extract Unique Contract Addresses: The transformer will gather all unique contract addresses that appear at least once in the specified address columns.

  • Create a Separate SFrame: The resulting unique contract addresses are stored in a separate SFrame with the specified result column name and organized using the key defined in group_keys.
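The steps above can be sketched in plain pandas (a simplified illustration only, not the library's implementation; the column names match the example below):

```python
import pandas as pd

# Sample transactions: some "addresses" are actually token contracts
df = pd.DataFrame(
    {
        "from_address": ["address_1", "token_1", "address_2", "address_3"],
        "to_address": ["address_2", "address_3", "address_4", "token_2"],
        "contract_address": ["token_1", "token_2", "token_2", "token_1"],
    }
)

# All values that appear in the configured address columns
address_cols = ["from_address", "to_address"]
seen_addresses = set(df[address_cols].to_numpy().ravel())

# Keep only the contract addresses that also appear as a sender or receiver
tokens = sorted(set(df["contract_address"]) & seen_addresses)
result = pd.DataFrame({"token": tokens})
print(result)
```

Here both token_1 and token_2 occur in the address columns, so both end up in the result frame.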

Example

from seshat.data_class import DFrame
from seshat.transformer.deriver.base import SenderReceiverTokensDeriver

sf = DFrame.from_raw(
    {
        "from_address": ["address_1", "token_1", "address_2", "address_3"],
        "to_address": ["address_2", "address_3", "address_4", "token_2"],
        "contract_address": ["token_1", "token_2", "token_2", "token_1"],
    }
)

deriver = SenderReceiverTokensDeriver(
    group_keys={"default": "default", "other": "tokens_with_transactions"},
    address_cols=["from_address", "to_address"],
    contract_address_col="contract_address",
    result_col="token",
)

sf = deriver(sf)
sf["tokens_with_transactions"].data

>>>
token
0 token_2
1 token_1

Percentile Transaction Value

This deriver computes the percentile of a specific column and inserts the result as a new column. To use it, provide the value column for which percentiles should be computed. You can also specify the name of the new column where the resulting percentile values will be stored.

How it works

  • Specify the Value Column: Use the value_col argument to identify the column that contains the values for which you want to compute the percentile. This column should contain numeric data.

  • Specify the Result Column Name: Use the result_col argument to provide a name for the new column where the computed percentile values will be stored. This helps in organizing the output and making it clear which column contains the percentile values.

  • Customize Quantile Probabilities (Optional): By default, the quantile probabilities are set to (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9). You can customize this by providing your own set of quantile probabilities if needed.
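As a rough sketch of the idea (not the library's exact quantile bucketing, which may assign different values), a percentile rank for each value can be computed with pandas:

```python
import pandas as pd

df = pd.DataFrame({"amount": [100, 200, 300, 400, 500]})

# Percentile rank of each value within the column, scaled to 0-100.
# The deriver's own quantile-probability bucketing may differ; this
# only illustrates the general computation.
df["percentile"] = df["amount"].rank(pct=True) * 100
print(df)
```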

Example

Imagine you have a dataset that contains the amount held by each address. To compute the percentile of each amount, you do this:

from seshat.data_class import DFrame
from seshat.transformer.deriver.base import PercentileTransactionValueDeriver

sf = DFrame.from_raw(
    {
        "address": ["address_1", "address_2", "address_3", "address_4", "address_5"],
        "amount": [100, 200, 300, 400, 500],
    }
)

deriver = PercentileTransactionValueDeriver(value_col="amount", result_col="percentile")
sf = deriver(sf)
sf.data

>>>
address amount percentile
0 address_1 100 10
1 address_2 200 30
2 address_3 300 50
3 address_4 400 80
4 address_5 500 100

From SQLDB Deriver

The SQLDB Deriver is a tool designed to derive new columns based on existing data in an SQL database. This is particularly useful during the inference stage, as explained in the inference documentation.

We have already introduced the SQL database source and discussed how it can fetch data from the database. However, you might wonder why we need a deriver that can connect to the database during the pipeline process.

The conventional approach is for the source to fetch data at the beginning of the pipeline in the feature view. But what if you need to fetch data from the database in the middle of the pipeline, with a query dependent on the resulting data? This is where a transformer, like the SQLDB Deriver, comes into play. It can access the already fetched and transformed data and perform additional data fetching from a database.

How It Works

The SQLDB Deriver fetches data using a source provided as an argument. Therefore, all principles related to fetching data from the source, especially for the SQL database source, are applicable here.

The query for the deriver is defined through its query, filters, and get_query_fn arguments. The deriver sets this query on its source, so you should define the query on the deriver itself, not on the source.

There are two options for using the SQLDB Deriver:

  1. It can fetch data from the database and add it to a separate dataframe (sf).
  2. It can merge the fetched results into the default dataframe (sf).
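The merge option can be pictured with plain pandas: the rows fetched from the database are joined back onto the default dataframe on a shared key (a sketch only; the actual deriver handles this through its source and its base_col and merge_result configuration):

```python
import pandas as pd

# The "default" dataframe already flowing through the pipeline
default = pd.DataFrame(
    {
        "address": ["address_1", "address_2", "address_3"],
        "token": ["token_1", "token_2", "token_2"],
    }
)

# Rows the deriver would fetch from the database for those tokens
fetched = pd.DataFrame({"token": ["token_1", "token_2"], "amount": [100, 200]})

# Merging the fetched results roughly corresponds to a left join
merged = default.merge(fetched, on="token", how="left")
print(merged)
```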

Dynamic Query

This deriver can handle custom queries because the default dataframe (sf) is passed to the get_query_fn.

Example

Consider the following initial data:

address    token
address_1  token_1
address_2  token_2
address_3  token_2

You want to fetch the amount of each token from the database, where the database contains data like this:

token    amount
token_1  100
token_2  200

The result should be:

address    token    amount_avg
address_1  token_1  100
address_2  token_2  200
address_3  token_2  200

This can be accomplished using the SQLDB Deriver.

We start with the following dataframe (sf):

from seshat.data_class import DFrame

sf = DFrame.from_raw(
    {
        "address": ["address_1", "address_2", "address_3"],
        "token": ["token_1", "token_2", "token_2"],
    }
)

To define the deriver, we need to define a get_query_fn that uses the default dataframe (sf):

import pandas as pd


def get_query_fn(default: pd.DataFrame, *args, **kwargs):
    tokens = default["token"].tolist()
    tokens_list_str = ", ".join([f"'{token}'" for token in tokens])
    return f"select * from token_info where token in ({tokens_list_str})"

As shown, this function uses the default dataframe to generate a suitable query. After defining the query function, we define the deriver and pass the function to it:

deriver = FromSQLDBDeriver(
    SQLDBSource(url=DB_URL),
    base_col="address",
    get_query_fn=get_query_fn,
    merge_result=True,
)

# Apply the deriver to the dataframe
sf = deriver(sf)
# Display the resulting dataframe
sf.data
>>>
address token amount_avg
0 address_1 token_1 100
1 address_2 token_2 200
2 address_3 token_2 200