SFrame
SFrame is a special data frame used in Seshat. It acts as a wrapper that encapsulates raw data from various sources, such as pandas or PySpark DataFrames. By wrapping different kinds of dataframes in a consistent SFrame format, it keeps data handling uniform and compatible across the project.
However, SFrame is more than a data wrapper. It also provides additional methods that are useful when working with dataframes from other libraries, so you always deal with the same type of object regardless of the underlying data source. This consistency helps maintain the integrity and efficiency of data operations across the entire project.
Note that SFrame is an interface. To work with a concrete implementation directly, see the DFrame and SPFrame documentation. This page focuses on defining the methods and generally uses the pandas implementation, DFrame, in its examples. For other implementations, see the related documentation.
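The interface idea can be illustrated with Python's abc module: caller code is written against an abstract type, and each backend supplies its own implementation. The sketch below is hypothetical. FrameLike, ColumnFrame, RowFrame, row_count and summarize are illustrative names, not Seshat's API, and plain dicts and lists stand in for pandas and PySpark data.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class FrameLike(ABC):
    """Hypothetical interface, loosely modelled on the SFrame idea."""

    @abstractmethod
    def to_raw(self) -> Any:
        """Return the wrapped backend object unchanged."""

    @abstractmethod
    def row_count(self) -> int:
        """Return the number of rows, whatever the backend."""


class ColumnFrame(FrameLike):
    """Toy backend: a dict mapping column names to lists (column layout)."""

    def __init__(self, raw: Dict[str, List[Any]]):
        self._raw = raw

    def to_raw(self) -> Dict[str, List[Any]]:
        return self._raw

    def row_count(self) -> int:
        # Assumes all columns have the same length; an empty dict has 0 rows.
        return len(next(iter(self._raw.values()), []))


class RowFrame(FrameLike):
    """Toy backend: a list of row dicts (record layout)."""

    def __init__(self, raw: List[Dict[str, Any]]):
        self._raw = raw

    def to_raw(self) -> List[Dict[str, Any]]:
        return self._raw

    def row_count(self) -> int:
        return len(self._raw)


def summarize(frame: FrameLike) -> str:
    # Caller code depends only on the interface, never on the backend type.
    return f"{type(frame).__name__} with {frame.row_count()} rows"
```

For example, summarize(ColumnFrame({"foo": [1, 2, 3]})) and summarize(RowFrame([{"foo": 1}, {"foo": 2}])) run the same caller code against two different storage layouts.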
Create an SFrame
To create a new SFrame from raw data, call the from_raw method:
from seshat.data_class import SFrame
sf = SFrame.from_raw({"foo": [1, 2, 3]})
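Because from_raw is called on SFrame itself, it presumably returns a suitable concrete frame. One plausible way to do that is to dispatch on the type of the raw data. This is a hypothetical sketch: ToyFrame, ColumnFrame, RowFrame and the type registry are illustrative names, and Seshat's actual from_raw may work differently.

```python
from typing import Any, Dict, Type


class ToyFrame:
    """Hypothetical base class; NOT Seshat's implementation."""

    # Maps a raw Python type to the subclass that knows how to wrap it.
    _registry: Dict[type, Type["ToyFrame"]] = {}

    def __init__(self, raw: Any):
        self.raw = raw

    def __init_subclass__(cls, raw_type: type, **kwargs: Any):
        # Each subclass registers itself for the raw type it declares.
        super().__init_subclass__(**kwargs)
        ToyFrame._registry[raw_type] = cls

    @classmethod
    def from_raw(cls, raw: Any) -> "ToyFrame":
        # Return an instance of the first subclass whose raw type matches.
        for raw_type, subclass in cls._registry.items():
            if isinstance(raw, raw_type):
                return subclass(raw)
        raise TypeError(f"no frame implementation for {type(raw).__name__}")


class ColumnFrame(ToyFrame, raw_type=dict):
    """Wraps column-oriented data such as {"foo": [1, 2, 3]}."""


class RowFrame(ToyFrame, raw_type=list):
    """Wraps record-oriented data such as [{"foo": 1}, {"foo": 2}]."""
```

With this design, ToyFrame.from_raw({"foo": [1, 2, 3]}) returns a ColumnFrame, and adding a new backend only requires declaring a new subclass.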
Retrieve and Set the Raw Data
To set the raw data, use the set_raw method:
from seshat.data_class import SFrame
sf = SFrame()
sample_data = {"foo": [1, 2, 3]}
sf.set_raw("db", sample_data)
To get the raw data back, use the to_raw method:
sf.to_raw()
Make Group
To convert an SFrame into a GroupSFrame, use the make_group method. It accepts a default_key argument, which becomes the key under which the original SFrame is stored among the children of the resulting GroupSFrame.
from seshat.data_class import DFrame
sf = DFrame.from_raw({})
grouped_sf = sf.make_group("default_key")
print(list(grouped_sf.keys))
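The relationship between a frame and the group produced by make_group can be modelled as a mapping from keys to child frames, with the original frame stored under default_key. The sketch below is hypothetical: ToyFrame and ToyGroupFrame are illustrative stand-ins for SFrame and GroupSFrame, and exposing keys as a property is an assumption based only on how the example above reads grouped_sf.keys.

```python
from typing import Any, Dict, List


class ToyFrame:
    """Hypothetical single frame wrapping column-oriented raw data."""

    def __init__(self, raw: Dict[str, List[Any]]):
        self.raw = raw

    def make_group(self, default_key: str) -> "ToyGroupFrame":
        # The original frame becomes the only child, stored under default_key.
        return ToyGroupFrame({default_key: self})


class ToyGroupFrame:
    """Hypothetical group frame: a mapping from keys to child frames."""

    def __init__(self, children: Dict[str, ToyFrame]):
        self.children = children

    @property
    def keys(self) -> List[str]:
        # Keys of the child frames, in insertion order.
        return list(self.children)

    def __getitem__(self, key: str) -> ToyFrame:
        return self.children[key]
```

For example, ToyFrame({"foo": [1, 2, 3]}).make_group("default_key") yields a group whose only key is "default_key", and that key maps back to the original frame object.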
Iterate over the Rows
The iterrows method is a generator for iterating over the rows of a column in an SFrame. Because it behaves the same way across implementations, you can iterate over the rows of the data without worrying about whether the SFrame is backed by pandas or PySpark.
from seshat.data_class import DFrame
sf = DFrame.from_raw({"A": ["foo", "bar", "baz"]})
print(list(sf.iterrows("A")))
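The idea behind iterrows, one generator over different storage layouts, can be sketched in plain Python. Here a dict of lists stands in for a column-oriented backend and a list of dicts for a record-oriented one. iter_column is a hypothetical helper, not part of Seshat, and it says nothing about how DFrame or SPFrame actually fetch rows.

```python
from typing import Any, Dict, Iterator, List, Union

# Hypothetical raw layouts: column-oriented dict or list of records.
Raw = Union[Dict[str, List[Any]], List[Dict[str, Any]]]


def iter_column(raw: Raw, column: str) -> Iterator[Any]:
    """Yield one value per row from `column`, whatever the layout of `raw`."""
    if isinstance(raw, dict):
        # Column layout: the values are already stored as one list.
        yield from raw[column]
    else:
        # Record layout: pick the column out of each row dict.
        for record in raw:
            yield record[column]
```

Because iter_column is a generator, values are produced one at a time, so a caller can stop early (for example with next()) without materializing the whole column.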