SPFrame

SPFrame is the PySpark implementation of SFrame. It wraps a PySpark DataFrame.

How to Create

To create a new SPFrame from a PySpark DataFrame:

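from pyspark.sql import SparkSession
from seshat.data_class import SPFrame

# Build a small PySpark DataFrame with columns A and B
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("foo", 1), ("baz", 2)], ["A", "B"])

# Wrap the PySpark DataFrame in an SPFrame
sf = SPFrame.from_raw(df)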

To access the raw data, use the to_raw() method rather than reading sf.data directly. A GroupSFrame does not have a single raw object to return, so to_raw() is the safer choice.

df = sf.to_raw()
df.show()
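
The reasoning behind preferring to_raw() can be illustrated with a minimal sketch. The classes SingleFrame and GroupFrame below are hypothetical stand-ins written for this example, not seshat's implementation, and the assumption that a group returns a dict of raw objects keyed by name is made only for illustration.

```python
from typing import Any, Dict


class SingleFrame:
    """Hypothetical stand-in for a frame that wraps one raw object."""

    def __init__(self, data: Any) -> None:
        self.data = data

    def to_raw(self) -> Any:
        # A single frame can simply hand back the object it wraps.
        return self.data


class GroupFrame:
    """Hypothetical stand-in for a group of named frames."""

    def __init__(self, children: Dict[str, SingleFrame]) -> None:
        self.children = children
        # Assumption: a group has no single raw object of its own.
        self.data = None

    def to_raw(self) -> Dict[str, Any]:
        # Collect the raw object of every child under its key.
        return {key: child.to_raw() for key, child in self.children.items()}


single = SingleFrame([1, 2])
group = GroupFrame({"users": SingleFrame([1, 2]), "orders": SingleFrame([3])})
```

In this sketch, code that reads `.data` gets `None` for the group, while code that calls `to_raw()` gets a usable result for both kinds of frame.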

Spark Session

SPFrame obtains its PySpark session through the get_spark() method, which returns the existing session or creates a new one, using the SPARK_APP_NAME config as the app name.
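
The get-or-create behaviour described above can be approximated with plain PySpark. This is a sketch under assumptions, not seshat's get_spark(): the helper names resolve_app_name and get_or_create_session are hypothetical, the config is assumed to be read from an environment variable, and the "seshat" fallback name is invented for the example.

```python
import os


def resolve_app_name(default: str = "seshat") -> str:
    # Assumption: SPARK_APP_NAME is read from the environment;
    # seshat may load its configuration differently.
    return os.environ.get("SPARK_APP_NAME", default)


def get_or_create_session():
    # Imported lazily so this module loads even without PySpark installed.
    from pyspark.sql import SparkSession

    # getOrCreate() returns the active session if one exists,
    # otherwise it builds a new one with the given app name.
    return SparkSession.builder.appName(resolve_app_name()).getOrCreate()
```

Calling get_or_create_session() twice in the same process returns the same session, which is why a frame class can call such a helper freely without starting Spark more than once.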

Convert to Other

SPFrame can also be converted to DFrame (the pandas implementation of SFrame) with the to_df() method:

converted_sf = sf.to_df()