Skip to main content

SQL Database Saver

This saver is designed to work with SQL databases. It uses the "SQLMixin" that the SQL database source and Flipside source use. This mixin is useful because, by using SQLAlchemy, it can connect to a database and read from or write to it.

To work with this, you should define the database URL for the saver and focus on the save configs.

Save Config

Save Config is a dataclass with several fields. Each field specifies which SFrame to save and how to save it to a particular table.

Fields

Save Config has these fields:

  • sf_key: Specifies which SFrame from your input to save.
  • table: The name of the table where the data will be saved.
  • schema: Plays multiple roles in the saver. It is used for creating tables, matching data columns to database table columns, and identifying the columns for updating the table.
  • strategy: Defines how the data is written to the database. The options are:
    • replace: First removes all data from the database and then inserts the new data.
    • insert: Simply inserts the data into the database without checking if new rows already exist.
    • update: Updates the data if the values of columns defined as IDs in the schema already exist in the database, then updates the other columns.
  • indexes: The saver can add indexes to your table, which can be single or composite.

Schema

The schema is crucial for the saver. Here we explain more about its role:

  • Creating Table: The schema is used for creating tables. It defines the column names and types, with the default column type being "String". For example, with the following config:

    SaveConfig(
    sf_key="default",
    table="default_table_name",
    schema=Schema(
    cols=[
    Col("column_1"),
    Col("column_2", dtype="Double"),
    Col("column_3", to="column_4"),
    ]
    ),
    )

    In this config, column_1 is "String", column_2 is "Double", and column_3 in your dataset is named column_4 in the database, which will be created as a "String" column.

  • Updating the Columns: The Schema's Col has an update_func field where the update function is defined. This function is used when you want to update the table.

Indexing

As mentioned, indexing can be single or composite. To define indexes, simply add the list of indexes to the save config.

Assume you define the following save config and want to add indexes to it.

SaveConfig(
sf_key="default",
table="default_table_name",
schema=Schema(cols=[Col("column_1"), Col("column_2")]),
)

For a single index, just add the column names you want to index as a list of strings.

SaveConfig(
sf_key="default",
table="default_table_name",
schema=Schema(cols=[Col("column_1"), Col("column_2")]),
indexes=["column_1", "column_2"]
)

This will create two indexes: one for column_1 and one for column_2.

For composite indexing, add a list of column names that you want to index together.

SaveConfig(
sf_key="default",
table="default_table_name",
schema=Schema(cols=[Col("column_1"), Col("column_2")]),
indexes=["column_1", "column_2", ["column_1", "column_2"]],
)

The above code creates three indexes: one for column_1, one for column_2, and one composite index for both column_1 and column_2.

Avoid Duplicate Indexing

If you already have some indexing in your table, don't worry if you define indexes that already exist. The saver hashes the columns for each index and, if this hash already exists, avoids creating the index.

Writing Strategy

As mentioned before, writing to the database can follow multiple strategies. This section explains these strategies and how they work.

Insert

The insert strategy doesn't care about the existing data in the database; it blindly inserts the whole data into it. This approach is straightforward and ensures that all data in the SFrame is written to the database without any checks for existing records or potential conflicts.

Replace

The replace strategy will remove all existing data from the database table and then insert the new data into it. This approach ensures that the database table contains only the latest data, effectively refreshing the table with new entries.

tip

When your strategy is set to insert and the update_func for every column in the schema is set to replace, the Saver behaves the same as the replace strategy.

Update

The update strategy is used to update rows if the defined index columns are the same. For example, assume that you have this dataset:

addresstokensent_amount
address_1token_110
address_1token_220
address_2token_130

And your dataset has these rows:

addresscontract_addresssent_amount
address_1token_140

If you have this config:

SaveConfig(
sf_key="default",
table="default_table_name",
schema=Schema(cols=[Col("contract_address", "token", update_func="sum"), Col("column_2")]),
strategy="update",
)

The result of the database table after calling the saver with the above config will be like this:

addresstokensent_amount
address_1token_150
address_1token_220
address_2token_130

This shows that the update strategy correctly sums the sent_amount for address_1 and token_1.

How to Use

After you define your save configs, you create a saver instance with the configs passed to it.

Example Code

sf: SFrame
saver = Saver(
save_configs=[
SaveConfig(
sf_key="default",
table="default_table_name",
schema=Schema(cols=[Col("contract_address", "token"), Col("column_2")]),
strategy="update",
)
]
)
saver(sf)

By calling the saver instance, the saving process will start.