GroupSFrame

Note

If you are new to SFrame, read the SFrame documentation first so that you understand its structure and philosophy.

Pipelines and transformers sometimes need to work with several SFrames side by side. GroupSFrame addresses this: it is a special SFrame that holds multiple individual SFrames as children. The children are stored in a dictionary that maps each sf_key to an SFrame object.

All methods of SFrame work with GroupSFrame. Because a GroupSFrame holds several children, any method that reads a child's data needs an sf_key to indicate which child to use. The default sf_key is DEFAULT_SF_KEY, defined in the config.

Single SFrame to GroupSFrame

A single SFrame (an SFrame that is not grouped) can be turned into a group with the make_group method. The method accepts a default_key argument, and the resulting group has exactly one child: the original SFrame, stored under that key. If default_key is omitted, the key is DEFAULT_SF_KEY as defined in the config.

from seshat.data_class import DFrame

data = {"A": ["foo", "bar"], "B": [1, 2]}
sf = DFrame.from_raw(data)
# No key given, so the only child is stored under DEFAULT_SF_KEY.
sf = sf.make_group()
print(list(sf.keys))

If default_key is passed to the make_group method, the child is stored under that key instead:

from seshat.data_class import DFrame

data = {"A": ["foo", "bar"], "B": [1, 2]}
sf = DFrame.from_raw(data)
# The only child is stored under the key "first_sframe".
sf = sf.make_group("first_sframe")
print(list(sf.keys))

Set and Get Children

Suppose we have a GroupSFrame like this:

from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame

data_1 = {"A": ["foo", "bar"], "B": [1, 2]}
data_2 = {"A": ["baz", "qux"], "B": [3, 4]}
sf_1 = DFrame.from_raw(data_1)
sf_2 = DFrame.from_raw(data_2)
sf = sf_1.make_group(default_key="default")

To add a new child to sf, assign it to a key. The result is a GroupSFrame with two SFrames:

sf["address"] = sf_2
print(list(sf.keys))

This is how you get one SFrame based on its key from the GroupSFrame:

address_sf = sf["address"]
print(address_sf.to_raw())

Using this approach is safe even if sf is not a group. In a single SFrame, getting an item returns the single SFrame itself, and setting an item sets the single SFrame data to the raw value of the target SFrame.
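The uniform item access described above can be pictured with a small stand-in model. ToyFrame and ToyGroup below are hypothetical classes written only for this illustration; they are not seshat's implementation and only mimic the behavior stated in the paragraph, so the idea can be run without the library.

```python
class ToyFrame:
    """Hypothetical stand-in for a single (ungrouped) SFrame."""

    def __init__(self, data):
        self.data = data

    def to_raw(self):
        return self.data

    def __getitem__(self, key):
        # A single frame ignores the key and returns itself.
        return self

    def __setitem__(self, key, value):
        # Setting an item replaces this frame's data with the value's raw data.
        self.data = value.to_raw()


class ToyGroup:
    """Hypothetical stand-in for a GroupSFrame: a dict of key -> frame."""

    def __init__(self, children=None):
        self.children = dict(children or {})

    def __getitem__(self, key):
        return self.children[key]

    def __setitem__(self, key, value):
        self.children[key] = value


def attach_address(sf, address):
    # The same two lines work whether sf is a single frame or a group.
    sf["address"] = address
    return sf["address"].to_raw()


single = ToyFrame({"A": ["foo"]})
group = ToyGroup({"default": ToyFrame({"A": ["foo"]})})
address = ToyFrame({"A": ["baz"]})

single_result = attach_address(single, address)
group_result = attach_address(group, address)
```

In this sketch the caller never checks whether it received a group, which is the practical benefit of the item interface being safe on both shapes.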

To Raw

If you want to get raw data of all children, you can use the to_raw method. This method returns all raw data of children as a dictionary with sf_keys as keys and the raw data of each child as values.

from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame

data_1 = {"A": ["foo", "bar"], "B": [1, 2]}
data_2 = {"A": ["baz", "qux"], "B": [3, 4]}
sf_1 = DFrame.from_raw(data_1)
sf_2 = DFrame.from_raw(data_2)
sf = GroupSFrame(children={"default": sf_1, "address": sf_2})
print(sf.to_raw())

And this is the output:

{'default':
     A  B
0  foo  1
1  bar  2,
'address':
     A  B
0  baz  3
1  qux  4}

Set Raw

To add raw data as a new child of a GroupSFrame, use the set_raw method.

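from seshat.data_class import DFrame, GroupSFrame

data_1 = {"A": ["foo", "bar"], "B": [1, 2]}

# An empty group needs sframe_class so it knows how to wrap raw data.
sf = GroupSFrame(sframe_class=DFrame)
# The raw dict is converted to a DFrame and stored under "address".
sf.set_raw(key="address", data=data_1)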

Using this method is safe when you are not sure if the data is in SFrame format or raw. If the passed data is in raw format, it is converted to the proper SFrame, and if it is already an SFrame, it is added to the children.
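The "raw or SFrame" handling described above amounts to a type check before storing the child. The following is a minimal sketch of that idea using hypothetical stand-in classes (ToyFrame, ToyGroup); it is not seshat's code, and the real set_raw may be implemented differently.

```python
class ToyFrame:
    """Hypothetical stand-in for an SFrame subclass such as DFrame."""

    def __init__(self, data):
        self.data = data

    @classmethod
    def from_raw(cls, data):
        return cls(data)


class ToyGroup:
    """Hypothetical stand-in for a GroupSFrame."""

    def __init__(self, sframe_class):
        self.sframe_class = sframe_class
        self.children = {}

    def set_raw(self, key, data):
        # Raw input is wrapped first; an existing frame is stored unchanged.
        if not isinstance(data, self.sframe_class):
            data = self.sframe_class.from_raw(data)
        self.children[key] = data


group = ToyGroup(sframe_class=ToyFrame)
ready_made = ToyFrame({"A": ["baz"]})

group.set_raw(key="raw", data={"A": ["foo"]})  # raw dict gets wrapped
group.set_raw(key="frame", data=ready_made)    # frame is added as is
```

Note that the wrapping step is the reason the group has to know its child type, which the next section covers.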

Children Must Be of the Same Type

Note that a GroupSFrame can hold only one type of SFrame: either all children are DFrame or all children are SPFrame. Every GroupSFrame must know the type of its children. For example, for the set_raw method to work correctly, the group must know the child type so that it can convert raw data to the proper SFrame. There are two ways to tell a GroupSFrame the type of its children:

  • Add sframe_class to the constructor of GroupSFrame
from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame

sf = GroupSFrame(sframe_class=DFrame)
  • Pass at least one child to GroupSFrame. In this case, if GroupSFrame has one child, it can infer sframe_class from that child.

Generally, you do not need to worry about this if GroupSFrames always have children. Only when you have an empty GroupSFrame should you manually set the sframe_class.
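The two ways of fixing the child type can be sketched with stand-in classes. Everything below (ToyDFrame, ToySPFrame, ToyGroup) is hypothetical and only models the rule described above. In particular, how seshat reacts to a mismatched child is not stated here, so raising TypeError is an assumption of this sketch.

```python
class ToyDFrame:
    """Hypothetical stand-in for DFrame."""

    def __init__(self, data):
        self.data = data


class ToySPFrame:
    """Hypothetical stand-in for SPFrame."""

    def __init__(self, data):
        self.data = data


class ToyGroup:
    """Hypothetical stand-in for a GroupSFrame."""

    def __init__(self, children=None, sframe_class=None):
        children = dict(children or {})
        if sframe_class is None and children:
            # No explicit type: infer it from the first child.
            sframe_class = type(next(iter(children.values())))
        for key, child in children.items():
            if not isinstance(child, sframe_class):
                # Assumption: mixed child types are rejected with TypeError.
                raise TypeError(f"child {key!r} is not a {sframe_class.__name__}")
        self.sframe_class = sframe_class
        self.children = children


explicit = ToyGroup(sframe_class=ToyDFrame)                       # empty, type given
inferred = ToyGroup(children={"default": ToyDFrame({"A": [1]})})  # type inferred
untyped = ToyGroup()                                              # empty, type unknown

try:
    ToyGroup(children={"a": ToyDFrame({"A": [1]}), "b": ToySPFrame({"A": [2]})})
    mixed_rejected = False
except TypeError:
    mixed_rejected = True
```

The untyped case is the one to avoid: with no children and no sframe_class, the group has nothing to convert raw data with.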

Merge Children

To merge the children of several GroupSFrames into a single dictionary, add the groups together:

from seshat.data_class import DFrame

sf_1 = DFrame.from_raw({"A": ["foo", "bar"]}).make_group(default_key="sf_1")
sf_2 = DFrame.from_raw({"A": ["foo", "bar"]}).make_group(default_key="sf_2")

# Adding two groups yields a dictionary of all their children.
print((sf_1 + sf_2).keys())
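For readers who want to see the shape of the result without the library, the merge can be modeled with a hypothetical ToyGroup class that overloads the + operator. This is not seshat's implementation. The text above does not say what happens when both groups share a key, so letting the right-hand group win, as this sketch does, is purely an assumption.

```python
class ToyGroup:
    """Hypothetical stand-in for a GroupSFrame."""

    def __init__(self, children):
        self.children = dict(children)

    def __add__(self, other):
        # Returns a plain dict of children, not a new group.
        merged = dict(self.children)
        # Assumption: on a key collision the right-hand group's child wins.
        merged.update(other.children)
        return merged


group_1 = ToyGroup({"sf_1": {"A": ["foo", "bar"]}})
group_2 = ToyGroup({"sf_2": {"A": ["foo", "bar"]}})

merged = group_1 + group_2
print(list(merged.keys()))  # prints ['sf_1', 'sf_2']
```

Because the result is an ordinary dictionary, keys() is called as a method here, unlike the keys attribute used on a group in the earlier examples.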