GroupSFrame
If you are new to SFrame, please read the SFrame documentation first and make sure you understand its structure and philosophy.
Pipelines and transformers sometimes need to work with several SFrames side by side.
GroupSFrame solves this problem.
It is a special SFrame that contains multiple individual SFrames as children.
These children are stored in a dictionary with sf_keys as the keys and SFrame objects as the values.
All SFrame methods also work on a GroupSFrame. Because a GroupSFrame holds several children, any method that
needs to access a child's data requires an sf_key to indicate which child to use. The default sf_key is
DEFAULT_SF_KEY, as defined in the config.
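As a rough illustration of key-addressed children, the sketch below models a group as a dictionary of named tables. It is not seshat code: ToyGroup, row_count, and DEFAULT_KEY are hypothetical names, and DEFAULT_KEY merely stands in for DEFAULT_SF_KEY.

```python
# Toy model only: ToyGroup, row_count, and DEFAULT_KEY are hypothetical
# names and are not part of seshat.
DEFAULT_KEY = "default"  # stand-in for DEFAULT_SF_KEY from the config


class ToyGroup:
    """Holds several named tables and routes each call to one of them."""

    def __init__(self, children):
        self.children = dict(children)

    def row_count(self, key=DEFAULT_KEY):
        # A data-accessing method needs a key to pick the child it reads.
        table = self.children[key]
        first_column = next(iter(table.values()), [])
        return len(first_column)


group = ToyGroup({
    "default": {"A": ["foo", "bar"], "B": [1, 2]},
    "address": {"A": ["baz"], "B": [3]},
})
print(group.row_count())           # reads the child stored under the default key
print(group.row_count("address"))  # explicitly selects another child
```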
Single SFrame to GroupSFrame
A single SFrame (one that is not grouped) can be turned into a group with the make_group method. This method
accepts default_key. The resulting group has exactly one child: the original SFrame, stored under the passed
default key. As mentioned before, the default key value is DEFAULT_SF_KEY as defined in the config.
from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame
data = {"A": ["foo", "bar"], "B": [1, 2]}
sf = DFrame.from_raw(data)
sf = sf.make_group()
print(list(sf.keys))
If default_key is passed to the make_group method, the child is stored under that key:
from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame
data = {"A": ["foo", "bar"], "B": [1, 2]}
sf = DFrame.from_raw(data)
sf = sf.make_group("first_sframe")
print(list(sf.keys))
Set and Get Children
Suppose we have a GroupSFrame like this:
from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame
data_1 = {"A": ["foo", "bar"], "B": [1, 2]}
data_2 = {"A": ["baz", "qux"], "B": [3, 4]}
sf_1 = DFrame.from_raw(data_1)
sf_2 = DFrame.from_raw(data_2)
sf = sf_1.make_group(default_key="default")
To add a new child to sf, assign it under a new key. The result is a GroupSFrame with two SFrames:
sf["address"] = sf_2
print(list(sf.keys))
To get a single SFrame from the GroupSFrame, index it by its key:
address_sf = sf["address"]
print(address_sf.to_raw())
This approach is safe even if sf is not a group. On a single SFrame, getting an item returns the SFrame
itself, and setting an item replaces its data with the raw data of the assigned SFrame.
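Why this is safe can be sketched with two toy classes that share the same item protocol. This is a simplified model of the behaviour described above, not seshat's implementation. ToySingle, ToyGroup, and read_address are hypothetical names.

```python
# Toy model only: these classes are hypothetical and are not part of seshat.
class ToySingle:
    """Stand-in for a single (ungrouped) frame."""

    def __init__(self, data):
        self.data = data

    def to_raw(self):
        return self.data

    def __getitem__(self, key):
        # The key is ignored: a single frame returns itself.
        return self

    def __setitem__(self, key, value):
        # Setting an item replaces the data with the other frame's raw data.
        self.data = value.to_raw()


class ToyGroup:
    """Stand-in for a group: a dictionary of frames addressed by key."""

    def __init__(self, children):
        self.children = dict(children)

    def __getitem__(self, key):
        return self.children[key]

    def __setitem__(self, key, value):
        self.children[key] = value


def read_address(frame):
    # Works whether `frame` is a single frame or a group.
    return frame["address"].to_raw()


single = ToySingle({"A": ["foo"]})
group = ToyGroup({"default": single, "address": ToySingle({"A": ["baz"]})})

print(read_address(single))  # the single frame's own data
print(read_address(group))   # the "address" child's data
```

Because both classes answer the same indexing calls, code such as read_address does not need to check which kind of object it received.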
To Raw
To get the raw data of all children, use the to_raw method. It returns a dictionary with sf_keys as keys
and the raw data of each child as values.
from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame
data_1 = {"A": ["foo", "bar"], "B": [1, 2]}
data_2 = {"A": ["baz", "qux"], "B": [3, 4]}
sf_1 = DFrame.from_raw(data_1)
sf_2 = DFrame.from_raw(data_2)
sf = GroupSFrame(children={"default": sf_1, "address": sf_2})
print(sf.to_raw())
And this is the output:
{'default':
     A  B
 0  foo  1
 1  bar  2,
 'address':
     A  B
 0  baz  3
 1  qux  4}
Set Raw
To add raw data to a GroupSFrame as a new child, use the set_raw method.
from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame
data_1 = {"A": ["foo", "bar"], "B": [1, 2]}
sf = GroupSFrame(sframe_class=DFrame)
sf.set_raw(key="address", data=data_1)
This method is safe to use when you are not sure whether the data is raw or already an SFrame. Raw data is converted to the proper SFrame, and data that is already an SFrame is added to the children as is.
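The raw-or-SFrame dispatch described here might look roughly like the following. This is a minimal sketch under stated assumptions, not the library's code: ToyFrame and ToyGroup are hypothetical stand-ins, and only the names set_raw, from_raw, and to_raw are borrowed from the text above.

```python
# Toy model only: ToyFrame and ToyGroup are hypothetical, not seshat classes.
class ToyFrame:
    """Stand-in for a concrete frame type such as DFrame."""

    def __init__(self, data):
        self.data = data

    @classmethod
    def from_raw(cls, data):
        return cls(data)

    def to_raw(self):
        return self.data


class ToyGroup:
    def __init__(self, frame_class):
        self.frame_class = frame_class
        self.children = {}

    def set_raw(self, key, data):
        if isinstance(data, ToyFrame):
            # Already wrapped: store it unchanged.
            self.children[key] = data
        else:
            # Raw data: wrap it in the group's frame class first.
            self.children[key] = self.frame_class.from_raw(data)


group = ToyGroup(frame_class=ToyFrame)
wrapped = ToyFrame({"A": ["baz", "qux"]})

group.set_raw(key="address", data={"A": ["foo", "bar"]})  # raw input
group.set_raw(key="other", data=wrapped)                  # frame input
```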
Children Must Be of the Same Type
Note that a GroupSFrame can hold only one type of SFrame: all children must be DFrame, or all
children must be SPFrame. Every GroupSFrame must know the type of its children. For example, for the set_raw
method to work correctly, the group must know the type of its children so that it can convert raw data to the proper SFrame.
There are two ways to tell a GroupSFrame the type of its children:
- Pass sframe_class to the constructor of GroupSFrame:
from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame
sf = GroupSFrame(sframe_class=DFrame)
- Pass at least one child to GroupSFrame. In this case, if GroupSFrame has one child, it can infer sframe_class from that child.
In general, you do not need to worry about this as long as your GroupSFrames always have children. You only
need to set sframe_class manually when you create an empty GroupSFrame.
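The two options can be pictured with a small toy model. It is a sketch only and does not reflect seshat's internals: all class names are hypothetical, and the TypeError raised for a mismatched child is an assumption of this example, since the text above does not say how seshat reports that case.

```python
# Toy model only: every name here is hypothetical and not part of seshat.
class ToyDFrame:
    def __init__(self, data):
        self.data = data


class ToySPFrame:
    def __init__(self, data):
        self.data = data


class ToyGroup:
    def __init__(self, children=None, frame_class=None):
        self.children = dict(children or {})
        if frame_class is None and self.children:
            # No class given: infer it from the first child.
            frame_class = type(next(iter(self.children.values())))
        self.frame_class = frame_class

    def add(self, key, child):
        # Sketch-only rule: reject a child whose type differs from the group's.
        if self.frame_class is not None and not isinstance(child, self.frame_class):
            raise TypeError("all children must share one frame class")
        self.children[key] = child


inferred = ToyGroup(children={"default": ToyDFrame({"A": [1]})})
explicit = ToyGroup(frame_class=ToyDFrame)  # empty group: class given by hand

print(inferred.frame_class.__name__)  # ToyDFrame
print(explicit.frame_class.__name__)  # ToyDFrame
```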