SPFrame
SPFrame is the PySpark implementation of SFrame. It wraps a PySpark DataFrame and keeps it internally.
How to Create
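To create a new SPFrame from a PySpark DataFrame:
from pyspark.sql import SparkSession
from seshat.data_class import SPFrame

# Build a small PySpark DataFrame to wrap
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("foo", 1), ("baz", 2)], schema=["A", "B"])
sf = SPFrame.from_raw(df)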
To access the raw data, it is highly recommended to use the to_raw() method instead of reading sf.data directly. A GroupSFrame does not have a single raw data object to return, so using to_raw() is safer.
df = sf.to_raw()
df.show()
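The reasoning behind preferring to_raw() can be illustrated with a small toy sketch. The classes below are hypothetical stand-ins, not seshat's SPFrame or GroupSFrame, and the idea that a group returns a dict of raw objects is an assumption made only for illustration. The point is that code calling to_raw() keeps working when it is handed a group, while code reading .data would not.

```python
class ToyFrame:
    """Toy stand-in for a single-frame wrapper (not seshat's class)."""

    def __init__(self, data):
        self.data = data

    def to_raw(self):
        # A single frame has exactly one raw object to hand back.
        return self.data


class ToyGroupFrame:
    """Toy stand-in for a group wrapper holding several named frames."""

    def __init__(self, children):
        # There is no single `data` attribute here, only named children.
        self.children = children

    def to_raw(self):
        # Assumption for this sketch: a group returns one raw object per child.
        return {name: child.to_raw() for name, child in self.children.items()}


def raw_of(frame):
    # Works for both toy classes because it only relies on to_raw().
    return frame.to_raw()


single = ToyFrame([1, 2])
group = ToyGroupFrame({"left": ToyFrame([1]), "right": ToyFrame([2])})
```

Here raw_of(single) and raw_of(group) both succeed, whereas group.data would raise an AttributeError in this toy model.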
Spark Session
SPFrame obtains its PySpark session through the get_spark() method. This method returns the existing session or creates a new one, using the SPARK_APP_NAME config as the app name.
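The get-or-create behaviour described above corresponds to the standard PySpark builder pattern. The following is a minimal sketch, not seshat's actual get_spark() implementation: the helper names resolve_app_name and get_or_create_session, the default name "seshat-app", and reading SPARK_APP_NAME from an environment variable are all assumptions made for illustration.

```python
import os


def resolve_app_name(default="seshat-app"):
    # Assumption: SPARK_APP_NAME is read from the environment here;
    # seshat may load this config value from somewhere else.
    return os.environ.get("SPARK_APP_NAME", default)


def get_or_create_session(app_name=None):
    # Imported lazily so this module can be loaded without pyspark installed.
    from pyspark.sql import SparkSession

    # getOrCreate() returns the active session if one exists,
    # otherwise it starts a new one with the given app name.
    name = app_name or resolve_app_name()
    return SparkSession.builder.appName(name).getOrCreate()
```

Calling get_or_create_session() twice in the same process should hand back the same session, which is why a shared helper like this avoids starting duplicate sessions.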
Convert to Other
SPFrame can also be converted to DFrame (the pandas implementation of SFrame) with the to_df() method:
converted_sf = sf.to_df()