SQL Database Saver
This saver is designed to work with SQL databases. It uses the "SQLMixin" that the SQL database source and Flipside source use. This mixin is useful because, by using SQLAlchemy, it can connect to a database and read from or write to it.
To work with this, you should define the database URL for the saver and focus on the save configs.
Save Config
Save Config is a dataclass with several fields. Each field specifies which SFrame to save and how to save it to a particular table.
Fields
Save Config has these fields:
sf_key
: Specifies which SFrame from your input to save.table
: The name of the table where the data will be saved.schema
: Plays multiple roles in the saver. It is used for creating tables, matching data columns to database table columns, and identifying the columns for updating the table.strategy
: Defines how the data is written to the database. The options are:replace
: First removes all data from the database and then inserts the new data.insert
: Simply inserts the data into the database without checking if new rows already exist.update
: Updates the data if the values of columns defined as IDs in the schema already exist in the database, then updates the other columns.
indexes
: The saver can add indexes to your table, which can be single or composite.
Schema
The schema is crucial for the saver. Here we explain more about its role:
-
Creating Table: The schema is used for creating tables. It defines the column names and types, with the default column type being "String". For example, with the following config:
SaveConfig(
sf_key="default",
table="default_table_name",
schema=Schema(
cols=[
Col("column_1"),
Col("column_2", dtype="Double"),
Col("column_3", to="column_4"),
]
),
)In this config,
column_1
is "String",column_2
is "Double", andcolumn_3
in your dataset is namedcolumn_4
in the database, which will be created as a "String" column. -
Updating the Columns: The Schema's Col has an
update_func
field where the update function is defined. This function is used when you want to update the table.
Indexing
As mentioned, indexing can be single or composite. To define indexes, simply add the list of indexes to the save config.
Assume you define the following save config and want to add indexes to it.
SaveConfig(
sf_key="default",
table="default_table_name",
schema=Schema(cols=[Col("column_1"), Col("column_2")]),
)
For a single index, just add the column names you want to index as a list of strings.
SaveConfig(
sf_key="default",
table="default_table_name",
schema=Schema(cols=[Col("column_1"), Col("column_2")]),
indexes=["column_1", "column_2"]
)
This will create two indexes: one for column_1
and one for column_2
.
For composite indexing, add a list of column names that you want to index together.
SaveConfig(
sf_key="default",
table="default_table_name",
schema=Schema(cols=[Col("column_1"), Col("column_2")]),
indexes=["column_1", "column_2", ["column_1", "column_2"]],
)
The above code creates three indexes: one for column_1
, one for column_2
, and one composite index for
both column_1
and column_2
.
Avoid Duplicate Indexing
If you already have some indexing in your table, don't worry if you define indexes that already exist. The saver hashes the columns for each index and, if this hash already exists, avoids creating the index.
Writing Strategy
As mentioned before, writing to the database can follow multiple strategies. This section explains these strategies and how they work.
Insert
The insert strategy doesn't care about the existing data in the database; it blindly inserts the whole data into it. This approach is straightforward and ensures that all data in the SFrame is written to the database without any checks for existing records or potential conflicts.
Replace
The replace strategy will remove all existing data from the database table and then insert the new data into it. This approach ensures that the database table contains only the latest data, effectively refreshing the table with new entries.
When your strategy is set to insert
and the update_func
for every column in the schema is set to replace
, the
Saver behaves the same as the replace strategy.
Update
The update strategy is used to update rows if the defined index columns are the same. For example, assume that you have this dataset:
address | token | sent_amount |
---|---|---|
address_1 | token_1 | 10 |
address_1 | token_2 | 20 |
address_2 | token_1 | 30 |
And your dataset has these rows:
address | contract_address | sent_amount |
---|---|---|
address_1 | token_1 | 40 |
If you have this config:
SaveConfig(
sf_key="default",
table="default_table_name",
schema=Schema(cols=[Col("contract_address", "token", update_func="sum"), Col("column_2")]),
strategy="update",
)
The result of the database table after calling the saver with the above config will be like this:
address | token | sent_amount |
---|---|---|
address_1 | token_1 | 50 |
address_1 | token_2 | 20 |
address_2 | token_1 | 30 |
This shows that the update strategy correctly sums the sent_amount
for address_1
and token_1
.
How to Use
After you define your save configs, you create a saver instance with the configs passed to it.
Example Code
sf: SFrame
saver = Saver(
save_configs=[
SaveConfig(
sf_key="default",
table="default_table_name",
schema=Schema(cols=[Col("contract_address", "token"), Col("column_2")]),
strategy="update",
)
]
)
saver(sf)
By calling the saver instance, the saving process will start.