DFrame
"DFrame" is the pandas implementation of "SFrame": it wraps a pandas DataFrame and stores it internally.
How to Create
A "DFrame" can be created from an existing pandas DataFrame:
import pandas as pd
from seshat.data_class import DFrame
data = {"A": ["foo", "baz"], "B": [1, 2]}
df = pd.DataFrame(data=data)
sf = DFrame.from_raw(df)
If you do not already have a pandas DataFrame, you can pass the raw data directly to the from_raw method:
sf = DFrame.from_raw(data)
To access the raw data, it is highly recommended to use the to_raw() method instead of accessing .data directly:
df = sf.to_raw()
print(df)
To Dict
This method converts the data into a list of dictionaries, one per row. It is, in fact, the to_dict method of the pandas DataFrame with orient set to records.
from seshat.data_class import DFrame
sf = DFrame.from_raw(
{
"address": ["address_1", "address_2", "address_3"],
"feature": ["feature_1", "feature_2", "feature_3"],
}
)
print(sf.to_dict())
You can also pass column names to this method if you want to get only some of the columns:
sf.to_dict("address")
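For reference, here is a minimal sketch of what the records orientation produces in plain pandas, without seshat. The column selection at the end uses ordinary pandas indexing as a stand-in for passing column names; it is an assumption about the shape of the result, not a description of how "DFrame" implements it.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "address": ["address_1", "address_2"],
        "feature": ["feature_1", "feature_2"],
    }
)

# orient="records" yields one dictionary per row, keyed by column name.
records = df.to_dict(orient="records")
print(records)
# [{'address': 'address_1', 'feature': 'feature_1'},
#  {'address': 'address_2', 'feature': 'feature_2'}]

# Restricting the output to a subset of columns (plain pandas stand-in).
address_only = df[["address"]].to_dict(orient="records")
print(address_only)
# [{'address': 'address_1'}, {'address': 'address_2'}]
```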
Convert to Other
"DFrame" can also be converted to "SPFrame" (the pyspark implementation of SFrame). To do so, pass an SPFrame instance to the convert method. The destination instance tells the method which conversion to use.
Example
Assume you want to convert to pyspark:
from seshat.data_class import DFrame, SPFrame
sf = DFrame.from_raw(
{
"address": ["address_1", "address_2", "address_3"],
"feature": ["feature_1", "feature_2", "feature_3"],
}
)
converted_sf = sf.convert(SPFrame())
print(converted_sf.data)
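The idea of choosing a conversion from the destination instance can be illustrated with a small, self-contained sketch. Everything here is hypothetical (the classes RowFrame and ColumnFrame, the frame_name attribute, and the to_<name> lookup); it shows one way such a dispatch could work and is not seshat's actual implementation.

```python
class BaseFrame:
    """Hypothetical base class: picks a converter from the destination instance."""

    frame_name = "base"

    def __init__(self, data=None):
        self.data = data

    def convert(self, to):
        # Build the converter name from the destination's frame_name,
        # e.g. a destination with frame_name "columns" maps to to_columns().
        method = getattr(self, f"to_{to.frame_name}", None)
        if method is None:
            raise NotImplementedError(
                f"No conversion from {self.frame_name} to {to.frame_name}"
            )
        return method()


class ColumnFrame(BaseFrame):
    """Hypothetical frame holding data as {column: [values]}."""

    frame_name = "columns"


class RowFrame(BaseFrame):
    """Hypothetical frame holding data as a list of row dictionaries."""

    frame_name = "rows"

    def to_columns(self):
        # Pivot the row dictionaries into per-column lists.
        columns = {}
        for row in self.data or []:
            for key, value in row.items():
                columns.setdefault(key, []).append(value)
        return ColumnFrame(columns)


rows = RowFrame([{"address": "address_1"}, {"address": "address_2"}])
converted = rows.convert(ColumnFrame())
print(type(converted).__name__, converted.data)
```

Passing an instance rather than a string keeps the call site explicit about the target type, and an unsupported destination fails with a clear error instead of silently returning the wrong frame.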