
SFrame

SFrame is the data frame abstraction used throughout Seshat. It acts as a wrapper that encapsulates raw data from different sources, such as pandas or PySpark dataframes. Because every kind of dataframe is wrapped in the same SFrame format, the rest of the project can handle data uniformly.

SFrame is more than a wrapper, however. It also provides additional methods that are useful when working with dataframes from other libraries, so you always deal with the same type of object regardless of the underlying data source. This consistency helps keep data operations reliable and efficient across the whole project.

Warning

SFrame is an interface. If you want to work with a concrete implementation directly, see the DFrame and SPFrame documentation. This page focuses on describing the methods and generally uses the pandas implementation, DFrame, for its examples. For other implementations, see their respective documentation.
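To make the interface/implementation split concrete, here is a minimal, stdlib-only sketch of the pattern. ToyFrame, ColumnFrame, RowFrame, and summarize are hypothetical names invented for this illustration; they are not Seshat classes, and the real implementations wrap pandas or PySpark dataframes rather than plain Python containers.

```python
from abc import ABC, abstractmethod


class ToyFrame(ABC):
    """Hypothetical interface: callers depend only on these methods."""

    @abstractmethod
    def to_raw(self):
        """Return the underlying raw object."""

    @abstractmethod
    def num_rows(self) -> int:
        """Return the number of rows, whatever the storage layout."""


class ColumnFrame(ToyFrame):
    """Hypothetical backend storing data column-wise, e.g. {"foo": [1, 2, 3]}."""

    def __init__(self, data: dict):
        self._data = data

    def to_raw(self):
        return self._data

    def num_rows(self) -> int:
        # All columns have the same length, so the first one is enough;
        # an empty dict counts as zero rows.
        return len(next(iter(self._data.values()), []))


class RowFrame(ToyFrame):
    """Hypothetical backend storing data row-wise, e.g. [{"foo": 1}, {"foo": 2}]."""

    def __init__(self, rows: list):
        self._rows = rows

    def to_raw(self):
        return self._rows

    def num_rows(self) -> int:
        return len(self._rows)


def summarize(frame: ToyFrame) -> str:
    # Works with any implementation because it only calls interface methods.
    return f"{type(frame).__name__}: {frame.num_rows()} rows"


print(summarize(ColumnFrame({"foo": [1, 2, 3]})))  # ColumnFrame: 3 rows
print(summarize(RowFrame([{"foo": 1}, {"foo": 2}, {"foo": 3}])))  # RowFrame: 3 rows
```

Code written against the abstract class, like summarize here, never needs to know which backend it received; that is the property the SFrame interface is meant to give you.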

Create SFrame

To create a new SFrame, pass raw data to the from_raw method:

from seshat.data_class import SFrame

sf = SFrame.from_raw({"foo": [1, 2, 3]})
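This page does not describe how from_raw on the SFrame interface decides which implementation to build. One common way to write such a factory is to dispatch on the type of the raw input. The sketch below assumes that approach and uses hypothetical ToyFrame, ColumnFrame, and RowFrame classes backed by plain Python containers, not Seshat's real classes.

```python
class ToyFrame:
    """Hypothetical base class with a type-dispatching factory."""

    def __init__(self, raw):
        self.raw = raw

    @classmethod
    def from_raw(cls, raw):
        # Assumption for illustration: the concrete class is chosen
        # from the type of the raw data.
        if isinstance(raw, dict):
            return ColumnFrame(raw)
        if isinstance(raw, list):
            return RowFrame(raw)
        raise TypeError(f"unsupported raw data type: {type(raw).__name__}")


class ColumnFrame(ToyFrame):
    """Hypothetical column-oriented frame, e.g. {"foo": [1, 2, 3]}."""


class RowFrame(ToyFrame):
    """Hypothetical row-oriented frame, e.g. [{"foo": 1}, {"foo": 2}]."""


sf = ToyFrame.from_raw({"foo": [1, 2, 3]})
print(type(sf).__name__)  # ColumnFrame
```

When the number of backends grows, a dictionary mapping raw types to classes is a common alternative to the if-chain.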

Retrieve and Set the Raw Data

To set the raw data of an SFrame, use the set_raw method:

from seshat.data_class import SFrame

sf = SFrame()
sample_data = {"foo": [1, 2, 3]}
sf.set_raw("db", sample_data)

To retrieve the raw data, use the to_raw method:

sf.to_raw()
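The round trip can be pictured with the stdlib-only sketch below. RawHolder is a hypothetical class, not part of Seshat, and it assumes that to_raw hands back the very object that was stored, which is what makes such a method useful as an escape hatch to the native library. The toy also leaves out the first argument of set_raw ("db" in the example above), whose role is not described on this page.

```python
class RawHolder:
    """Hypothetical wrapper that stores a raw object and returns it on request."""

    def __init__(self):
        self._raw = None

    def set_raw(self, data):
        # Store the raw object as-is; no copy or conversion is made.
        self._raw = data

    def to_raw(self):
        # Return the same object that was stored.
        return self._raw


holder = RawHolder()
sample_data = {"foo": [1, 2, 3]}
holder.set_raw(sample_data)

raw = holder.to_raw()
print(raw is sample_data)  # True
```

Because this toy makes no copy, mutating the returned object also changes what the wrapper holds; whether Seshat's implementations copy the data is not stated here.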

Make Group

To wrap an SFrame in a GroupSFrame, use the make_group method. It accepts a default_key argument, which becomes the key under which the original SFrame is stored among the children of the resulting GroupSFrame.

from seshat.data_class import DFrame

sf = DFrame.from_raw({})

grouped_sf = sf.make_group("default_key")
print(list(grouped_sf.keys))
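To picture what make_group produces, the following stdlib-only sketch models a group as a mapping from keys to child frames. ToyFrame and ToyGroup are hypothetical stand-ins, not Seshat's SFrame and GroupSFrame, and the keys property here is an assumption modeled on the grouped_sf.keys call above.

```python
class ToyFrame:
    """Hypothetical single frame wrapping raw column data."""

    def __init__(self, raw: dict):
        self.raw = raw

    def make_group(self, default_key: str) -> "ToyGroup":
        # The original frame becomes the only child, stored under default_key.
        return ToyGroup({default_key: self})


class ToyGroup:
    """Hypothetical group frame: a collection of named child frames."""

    def __init__(self, children: dict):
        self.children = children

    @property
    def keys(self):
        return self.children.keys()

    def __getitem__(self, key: str) -> ToyFrame:
        return self.children[key]


sf = ToyFrame({})
grouped = sf.make_group("default_key")
print(list(grouped.keys))  # ['default_key']
print(grouped["default_key"] is sf)  # True
```

In this toy the child is the original frame object itself, not a copy.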

Iterate on the Rows

The iterrows method is a generator that iterates over the rows of a given column in an SFrame. Because it behaves the same way on every implementation, you can iterate over the data without worrying about whether the SFrame is backed by pandas or PySpark.

from seshat.data_class import DFrame

sf = DFrame.from_raw({"A": ["foo", "bar", "baz"]})
print(list(sf.iterrows("A")))
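Each implementation only has to supply its own way of walking the data, while callers see the same generator interface. The sketch below shows that idea with two hypothetical backends built on plain Python containers, one column-oriented and one row-oriented. The class names are invented for illustration; only the method name iterrows and the column argument mirror the call above, and this page does not specify exactly what the real method yields (bare values, as in this toy, or something richer).

```python
from typing import Any, Iterator


class ColumnBackend:
    """Hypothetical frame storing data column-wise: {"A": ["foo", "bar"]}."""

    def __init__(self, data: dict):
        self._data = data

    def iterrows(self, column: str) -> Iterator[Any]:
        # Yield values one at a time instead of building a new list.
        yield from self._data[column]


class RowBackend:
    """Hypothetical frame storing data row-wise: [{"A": "foo"}, ...]."""

    def __init__(self, rows: list):
        self._rows = rows

    def iterrows(self, column: str) -> Iterator[Any]:
        for row in self._rows:
            yield row[column]


frames = [
    ColumnBackend({"A": ["foo", "bar", "baz"]}),
    RowBackend([{"A": "foo"}, {"A": "bar"}, {"A": "baz"}]),
]
for frame in frames:
    # Same call, same result, regardless of the storage layout.
    print(list(frame.iterrows("A")))  # ['foo', 'bar', 'baz']
```

Because both methods are generators, a consumer such as itertools.islice can stop early without the backend producing the remaining rows.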