DFrame

"DFrame" is the pandas implementation of "SFrame": it wraps a pandas DataFrame and keeps it as its underlying data.

How to Create

A "DFrame" can be created from an existing pandas DataFrame with the from_raw method:

import pandas as pd
from seshat.data_class import DFrame

data = {"A": ["foo", "baz"], "B": [1, 2]}
df = pd.DataFrame(data=data)
sf = DFrame.from_raw(df)

You do not need to build a pandas DataFrame first. To create a new "DFrame" directly, pass the raw data to the from_raw method:

sf = DFrame.from_raw(data)

To access the raw data, it is highly recommended to use the to_raw() method instead of accessing .data directly:

df = sf.to_raw()
print(df)

To Dict

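This method converts the data into plain Python objects. It is, in fact, the to_dict method of pandas DataFrame with orient set to "records", so the result is a list of dictionaries, one per row, each mapping column names to that row's values.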

from seshat.data_class import DFrame

sf = DFrame.from_raw(
    {
        "address": ["address_1", "address_2", "address_3"],
        "feature": ["feature_1", "feature_2", "feature_3"],
    }
)

print(sf.to_dict())

You can also pass column names to this method if you want to get only some of the columns:

sf.to_dict("address")
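
To make the "records" layout concrete, here is a minimal plain-Python sketch of the same transformation. It does not use seshat or pandas: the helper to_records and its keep parameter are hypothetical names introduced only for illustration, and the column filtering is an assumption about how passing column names behaves.

```python
def to_records(columns, *keep):
    """Turn a column-oriented dict into a list of row dicts ("records" layout).

    Hypothetical helper for illustration only; it is not part of seshat.
    """
    # Use the requested column names, or every column when none are given.
    names = list(keep) if keep else list(columns)
    # All columns are assumed to have the same length.
    n_rows = len(next(iter(columns.values()), []))
    return [{name: columns[name][i] for name in names} for i in range(n_rows)]


data = {
    "address": ["address_1", "address_2", "address_3"],
    "feature": ["feature_1", "feature_2", "feature_3"],
}

records = to_records(data)
only_address = to_records(data, "address")

print(records[0])       # first row, all columns
print(only_address[0])  # first row, "address" column only
```

Each element of the result describes one row, which is the shape pandas produces for orient="records".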

Convert to Other

"DFrame" can also be converted to "SPFrame" (the PySpark implementation of SFrame). To do so, call the convert method and pass it an instance of the destination class. The type of the destination instance tells the method which conversion to apply.

Example

Assume you want to convert to pyspark:

from seshat.data_class import DFrame, SPFrame

sf = DFrame.from_raw(
    {
        "address": ["address_1", "address_2", "address_3"],
        "feature": ["feature_1", "feature_2", "feature_3"],
    }
)

converted_sf = sf.convert(SPFrame())
print(converted_sf.data)
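
The idea of choosing a conversion from the destination instance can be sketched in plain Python. This is not seshat's implementation: PandasLikeFrame and SparkLikeFrame are hypothetical stand-in classes, and the lookup by type is only one plausible way such a convert method might work.

```python
class SparkLikeFrame:
    """Hypothetical stand-in for a Spark-backed frame; stores rows as tuples."""

    def __init__(self):
        self.columns = []
        self.rows = []


class PandasLikeFrame:
    """Hypothetical stand-in for a pandas-backed frame; stores column lists."""

    def __init__(self, data=None):
        self.data = data or {}

    def convert(self, to):
        # Look up the conversion by the type of the destination instance.
        converters = {
            PandasLikeFrame: self._to_pandas_like,
            SparkLikeFrame: self._to_spark_like,
        }
        converter = converters.get(type(to))
        if converter is None:
            raise TypeError(f"no conversion to {type(to).__name__}")
        return converter(to)

    def _to_pandas_like(self, to):
        # Same layout: copy the column lists into the destination.
        to.data = {name: list(values) for name, values in self.data.items()}
        return to

    def _to_spark_like(self, to):
        # Row layout: keep the column order and transpose columns into rows.
        to.columns = list(self.data)
        to.rows = list(zip(*self.data.values()))
        return to


source = PandasLikeFrame(
    {
        "address": ["address_1", "address_2"],
        "feature": ["feature_1", "feature_2"],
    }
)
converted = source.convert(SparkLikeFrame())
print(converted.columns)
print(converted.rows)
```

Passing an instance rather than a name lets the caller supply a pre-configured destination object, and an unsupported destination type fails early with a TypeError in this sketch.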