Other Derivers
The main deriver handles a wide range of feature engineering tasks. Beyond it, several additional derivers are available for more specific purposes. Each is tailored to a particular kind of data transformation or aggregation, giving you extra flexibility for different analytical needs.
Sender Receiver Token
On the Ethereum blockchain, some token contracts themselves appear as senders or receivers of transactions. If you want to identify these tokens and keep them in a separate SFrame, you can use this deriver. It searches through transaction data and extracts the contract addresses that appear in the specified address columns. You must also pass a result column name, which is used to name the column that holds the result in the new SFrame. The key under which the new SFrame is stored in the output group SFrame is defined by the `other` entry of `group_keys`.
How it works
- Define Address Columns: Specify which columns contain addresses. These could be columns like `from_address`, `to_address`, or any other columns that hold Ethereum addresses involved in transactions.
- Specify the Contract Address Column: Identify the column that contains contract addresses, such as `contract_address` or any other relevant column where the token contract addresses are recorded.
- Pass the Result Column Name: Provide the name of the result column that will store the unique contract addresses in the new SFrame. This name labels the column in the output SFrame where the results are stored.
- Define the Output Group SFrame: Use the `group_keys` parameter to define the output SFrame's name. The `other` key in `group_keys` specifies the key name of this new SFrame, keeping the output organized and easy to access.
- Search Across All Rows: The deriver scans every row in the dataset, checking each specified address column for occurrences of the contract addresses.
- Extract Unique Contract Addresses: The transformer gathers all unique contract addresses that appear at least once in the specified address columns.
- Create a Separate SFrame: The resulting unique contract addresses are stored in a separate SFrame, using the specified result column name and the key defined in `group_keys`.
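Conceptually, the extraction described in this list amounts to collecting the contract addresses that also show up in any of the address columns. The pandas snippet below is only an illustration of that idea, not the deriver's actual implementation:

```python
import pandas as pd

# Illustration only: mimic the extraction logic on a plain DataFrame.
df = pd.DataFrame(
    {
        "from_address": ["address_1", "token_1", "address_2", "address_3"],
        "to_address": ["address_2", "address_3", "address_4", "token_2"],
        "contract_address": ["token_1", "token_2", "token_2", "token_1"],
    }
)

address_cols = ["from_address", "to_address"]

# Every value that appears in any of the address columns.
seen = set(df[address_cols].to_numpy().ravel())

# Unique contract addresses that were seen as a sender or receiver.
tokens = df.loc[df["contract_address"].isin(seen), "contract_address"].unique()
print(tokens)  # ['token_1' 'token_2']
```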
Example
```python
from seshat.data_class import DFrame
from seshat.transformer.deriver.base import SenderReceiverTokensDeriver

sf = DFrame.from_raw(
    {
        "from_address": ["address_1", "token_1", "address_2", "address_3"],
        "to_address": ["address_2", "address_3", "address_4", "token_2"],
        "contract_address": ["token_1", "token_2", "token_2", "token_1"],
    }
)

deriver = SenderReceiverTokensDeriver(
    group_keys={"default": "default", "other": "tokens_with_transactions"},
    address_cols=["from_address", "to_address"],
    contract_address_col="contract_address",
    result_col="token",
)

sf = deriver(sf)
sf["tokens_with_transactions"].data
>>>
     token
0  token_2
1  token_1
```
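The original transaction data remains available under the key mapped from `default` in `group_keys` (here simply `"default"`), so you can keep working with it alongside the new token SFrame. Assuming the same indexing shown above:

```python
sf["default"].data  # the original transaction rows
```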
Percentile Transaction Value
This deriver computes the percentile of a specific column and inserts the result as a new column. To use it, you must provide the value column for which the percentile will be computed, and you can specify the name of the new column where the resulting percentile values will be stored.
How it works
- Specify the Value Column: Use the `value_col` argument to identify the column that contains the values for which you want to compute the percentile. This column should contain numeric data.
- Specify the Result Column Name: Use the `result_col` argument to provide a name for the new column where the computed percentile values will be stored, making it clear which column of the output contains the percentiles.
- Customize Quantile Probabilities (Optional): By default, the quantile probabilities are set to (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9). You can customize this by providing your own set of quantile probabilities if needed; see the sketch after the example below.
Example
Imagine you have a dataset that contains the amount held by each address. To compute the percentile of the amount, you can do the following:
```python
from seshat.data_class import DFrame
from seshat.transformer.deriver.base import PercentileTransactionValueDeriver

sf = DFrame.from_raw(
    {
        "address": ["address_1", "address_2", "address_3", "address_4", "address_5"],
        "amount": [100, 200, 300, 400, 500],
    }
)

deriver = PercentileTransactionValueDeriver(value_col="amount", result_col="percentile")
sf = deriver(sf)
sf.data
>>>
     address  amount  percentile
0  address_1     100          10
1  address_2     200          30
2  address_3     300          50
3  address_4     400          80
4  address_5     500         100
```
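If the default probabilities are too coarse or too fine for your data, you can supply your own set, as mentioned in the list above. The keyword name used below is only a guess for illustration; check the `PercentileTransactionValueDeriver` signature in your version of seshat for the actual argument name:

```python
# Sketch only: `quantiles` is an assumed name for the quantile-probabilities
# argument; verify it against the actual signature.
deriver = PercentileTransactionValueDeriver(
    value_col="amount",
    result_col="percentile",
    quantiles=(0.25, 0.5, 0.75),
)
sf = deriver(sf)
```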
From SQLDB Deriver
The SQLDB Deriver is a tool designed to derive new columns based on existing data in an SQL database. This is particularly useful during the inference stage, as explained in the inference documentation.
We have already introduced the SQL database source and discussed how it can fetch data from the database. However, you might wonder why we need a deriver that can connect to the database during the pipeline process.
The conventional approach is for the source to fetch data at the beginning of the pipeline in the feature view. But what if you need to fetch data from the database in the middle of the pipeline, with a query dependent on the resulting data? This is where a transformer, like the SQLDB Deriver, comes into play. It can access the already fetched and transformed data and perform additional data fetching from a database.
How It Works
The SQLDB Deriver fetches data using a source provided as an argument. Therefore, all principles related to fetching data from the source, especially for the SQL database source, are applicable here.
The query for the deriver is defined using the `query`, `filters`, and `get_query_fn` arguments. The deriver passes this query on to its source, which means you should define the query on the deriver itself, not on the source.
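For a fixed query, a minimal configuration could look like the sketch below. The `query` keyword follows the description above, but treat the exact argument names as assumptions and check the `FromSQLDBDeriver` signature; `DB_URL` stands in for your database connection string:

```python
# Sketch only: a FromSQLDBDeriver configured with a static query.
# The `query` keyword is assumed from the description above.
deriver = FromSQLDBDeriver(
    SQLDBSource(url=DB_URL),
    query="select token, amount from token_info",
)
```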
There are two ways to use the SQLDB Deriver:
- It can fetch data from the database and store it in a separate SFrame.
- It can merge the fetched results into the default SFrame.

A sketch of the separate-SFrame option appears after the example below.
Dynamic Query
This deriver can handle custom, data-dependent queries because the default SFrame's data is passed to `get_query_fn`.
Example
Consider the following initial data:
| address | token |
|---|---|
| address_1 | token_1 |
| address_2 | token_2 |
| address_3 | token_2 |
You want to fetch the amount of each token from the database, where the database contains data like this:
| token | amount |
|---|---|
| token_1 | 100 |
| token_2 | 200 |
The result should be:
| address | token | amount_avg |
|---|---|---|
| address_1 | token_1 | 100 |
| address_2 | token_2 | 200 |
| address_3 | token_2 | 200 |
This can be accomplished using the SQLDB Deriver.
We start with the following dataframe (sf):
```python
from seshat.data_class import DFrame

sf = DFrame.from_raw(
    {
        "address": ["address_1", "address_2", "address_3"],
        "token": ["token_1", "token_2", "token_2"],
    }
)
```
To configure the deriver, we first define a `get_query_fn` that uses the default dataframe:
```python
import pandas as pd

def get_query_fn(default: pd.DataFrame, *args, **kwargs):
    # Build a query that fetches only the tokens present in the current data.
    tokens = default["token"].tolist()
    tokens_list_str = ", ".join([f"'{token}'" for token in tokens])
    return f"select * from token_info where token in ({tokens_list_str})"
```
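To see what this produces, you can call the function by hand on the example data (assuming `sf.data` exposes the underlying pandas DataFrame, as in the outputs shown above):

```python
print(get_query_fn(sf.data))
# select * from token_info where token in ('token_1', 'token_2', 'token_2')
```

Duplicate tokens end up in the `IN` clause; SQL tolerates that, but you could deduplicate inside the function with `set(...)` if you prefer.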
As shown, this function uses the default dataframe to generate a suitable query. After defining the query function, we define the deriver and pass the function to it:
```python
# FromSQLDBDeriver and SQLDBSource are assumed to be imported;
# DB_URL is a placeholder for your database connection string.
deriver = FromSQLDBDeriver(
    SQLDBSource(url=DB_URL),
    base_col="address",
    get_query_fn=get_query_fn,
    merge_result=True,
)

# Apply the deriver to the dataframe
sf = deriver(sf)

# Display the resulting dataframe
sf.data
>>>
     address    token  amount_avg
0  address_1  token_1         100
1  address_2  token_2         200
2  address_3  token_2         200
```
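The example above uses the merge option (`merge_result=True`). To keep the fetched rows in a separate SFrame instead, the first of the two options mentioned earlier, a sketch might look like the following; how the separate result is keyed and accessed is an assumption here, so check the `FromSQLDBDeriver` documentation for the exact behavior:

```python
# Sketch only: merge_result=False is assumed to keep the fetched rows in a
# separate SFrame rather than merging them into the default one.
deriver = FromSQLDBDeriver(
    SQLDBSource(url=DB_URL),
    base_col="address",
    get_query_fn=get_query_fn,
    merge_result=False,
)
sf = deriver(sf)
```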