GroupSFrame
If you are new to SFrame, please read the SFrame documentation first and make sure you understand its structure and philosophy.
When dealing with pipelines and transformers, it is sometimes necessary to work with multiple SFrames alongside each other. The solution to this problem is the GroupSFrame.
It is a special SFrame that contains multiple individual SFrames as children. These children are stored in a dictionary with sf_keys as the keys and SFrame objects as the values.
All methods of SFrame work with GroupSFrame. Since a GroupSFrame has children inside it, if a method needs to access the data of a child, you must specify the sf_key to indicate which child to use. The default sf_key is always DEFAULT_SF_KEY, as set in the config.
Single SFrame to GroupSFrame
A single SFrame (an SFrame that is not grouped) can be grouped by using the make_group method. This method accepts default_key, and after making the group, the only child of the group is the same single SFrame, stored under the passed default key. As mentioned before, the default key value is DEFAULT_SF_KEY, as defined in the config.
from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame
data = {"A": ["foo", "bar"], "B": [1, 2]}
sf = DFrame.from_raw(data)
sf = sf.make_group()
print(list(sf.keys))
If default_key is passed to the make_group method, the child is stored under that key instead:
from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame
data = {"A": ["foo", "bar"], "B": [1, 2]}
sf = DFrame.from_raw(data)
sf = sf.make_group("first_sframe")
print(list(sf.keys))
Set and Get Children
Suppose we have a GroupSFrame like this:
from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame
data_1 = {"A": ["foo", "bar"], "B": [1, 2]}
data_2 = {"A": ["baz", "qux"], "B": [3, 4]}
sf_1 = DFrame.from_raw(data_1)
sf_2 = DFrame.from_raw(data_2)
sf = sf_1.make_group(default_key="default")
To add a new child to sf, assign it under a new key. The result is a GroupSFrame with two SFrames:
sf["address"] = sf_2
print(list(sf.keys))
This is how you get one SFrame from the GroupSFrame by its key:
address_sf = sf["address"]
print(address_sf.to_raw())
Using this approach is safe even if sf is not a group. In a single SFrame, getting an item returns the single SFrame itself, and setting an item sets the single SFrame's data to the raw value of the target SFrame.
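The uniform item access described above can be illustrated with a small standalone model. This is only a sketch: ToySingle, ToyGroup, and read_address are hypothetical names written for this illustration, not seshat classes, and the real implementation may differ.

```python
# Hypothetical stand-ins for illustration; these are NOT seshat classes.
class ToySingle:
    """Models a single (ungrouped) frame that wraps some raw data."""

    def __init__(self, data=None):
        self.data = data

    def to_raw(self):
        return self.data

    def __getitem__(self, key):
        # A single frame has no children, so any key yields the frame itself.
        return self

    def __setitem__(self, key, value):
        # Setting an item replaces this frame's data with the target's raw data.
        self.data = value.to_raw()


class ToyGroup:
    """Models a group: a dictionary of children keyed by sf_key."""

    def __init__(self, children=None):
        self.children = dict(children or {})

    def __getitem__(self, key):
        return self.children[key]

    def __setitem__(self, key, value):
        self.children[key] = value


def read_address(frame):
    # Works for both kinds of frame, so the caller needs no isinstance check.
    return frame["address"].to_raw()


single = ToySingle({"A": ["foo", "bar"]})
group = ToyGroup({"default": single, "address": ToySingle({"A": ["baz", "qux"]})})

print(read_address(group))   # raw data of the "address" child
print(read_address(single))  # the single frame's own raw data
```

Because both toy classes answer the same indexing calls, code such as read_address can be written once and handed either a single frame or a group.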
To Raw
If you want to get the raw data of all children, you can use the to_raw method. This method returns a dictionary with sf_keys as keys and the raw data of each child as values.
from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame
data_1 = {"A": ["foo", "bar"], "B": [1, 2]}
data_2 = {"A": ["baz", "qux"], "B": [3, 4]}
sf_1 = DFrame.from_raw(data_1)
sf_2 = DFrame.from_raw(data_2)
sf = GroupSFrame(children={"default": sf_1, "address": sf_2})
print(sf.to_raw())
And this is the output:
{'default':
A B
0 foo 1
1 bar 2,
'address':
A B
0 baz 3
1 qux 4}
Set Raw
To add new raw data as a child of a GroupSFrame, use the set_raw method.
from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame
data_1 = {"A": ["foo", "bar"], "B": [1, 2]}
sf = GroupSFrame(sframe_class=DFrame)
sf.set_raw(key="address", data=data_1)
Using this method is safe when you are not sure if the data is in SFrame format or raw. If the passed data is in raw format, it is converted to the proper SFrame, and if it is already an SFrame, it is added to the children.
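The "raw or already a frame" behavior can be sketched with a standalone model. ToyFrame and ToyGroup below are hypothetical stand-ins, not seshat classes; the sketch assumes the group decides with a simple isinstance check, which may not match how the library does it.

```python
# Hypothetical stand-ins for illustration; these are NOT seshat classes.
class ToyFrame:
    """Models a frame type that can be built from raw data."""

    def __init__(self, data):
        self.data = data

    @classmethod
    def from_raw(cls, data):
        return cls(data)

    def to_raw(self):
        return self.data


class ToyGroup:
    """Models a group that knows which frame class its children use."""

    def __init__(self, sframe_class):
        self.sframe_class = sframe_class
        self.children = {}

    def set_raw(self, key, data):
        if isinstance(data, self.sframe_class):
            # Already a frame of the right class: store it unchanged.
            self.children[key] = data
        else:
            # Raw input: wrap it using the group's known child class.
            self.children[key] = self.sframe_class.from_raw(data)


group = ToyGroup(sframe_class=ToyFrame)

# Raw input is converted to a ToyFrame.
group.set_raw(key="address", data={"A": ["foo", "bar"], "B": [1, 2]})

# Frame input is added as-is.
existing = ToyFrame({"A": ["baz"], "B": [3]})
group.set_raw(key="billing", data=existing)

print(sorted(group.children))
```

The conversion branch is the reason the group needs to know its child class up front: without it there is nothing to call from_raw on.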
Children Must Be of the Same Type
Note that a GroupSFrame can only keep one type of SFrame inside it: all children must be DFrame or all children must be SPFrame. Every GroupSFrame must know the type of its children. For example, for the set_raw method to work correctly, the group must know the type of its children so that it can convert raw data to the proper SFrame.
There are two ways to tell a GroupSFrame the type of its children:
- Pass sframe_class to the constructor of GroupSFrame:
from seshat.data_class import SFrame, DFrame, SPFrame, GroupSFrame
sf = GroupSFrame(sframe_class=DFrame)
- Pass at least one child to GroupSFrame. In this case, if the GroupSFrame has one child, it can infer sframe_class from that child.
Generally, you do not need to worry about this if your GroupSFrames always have children. Only when you have an empty GroupSFrame should you manually set the sframe_class.
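The two ways of fixing the child type can be modeled in a few lines. This is a sketch under stated assumptions: ToyDFrame, ToySPFrame, and ToyGroup are hypothetical stand-ins, not seshat classes, and raising TypeError on a mismatched child is an assumption of this sketch rather than documented library behavior.

```python
# Hypothetical stand-ins for illustration; these are NOT seshat classes.
class ToyDFrame:
    pass


class ToySPFrame:
    pass


class ToyGroup:
    """Models a group whose children must all share one frame class."""

    def __init__(self, children=None, sframe_class=None):
        self.sframe_class = sframe_class
        self.children = {}
        for key, child in (children or {}).items():
            self[key] = child

    def __setitem__(self, key, child):
        if self.sframe_class is None:
            # No explicit class was given: infer it from the first child.
            self.sframe_class = type(child)
        elif not isinstance(child, self.sframe_class):
            # Assumption of this sketch: a mismatched child is rejected.
            raise TypeError(
                f"expected {self.sframe_class.__name__}, "
                f"got {type(child).__name__}"
            )
        self.children[key] = child


# Way 1: an empty group with the class given explicitly.
empty = ToyGroup(sframe_class=ToySPFrame)

# Way 2: the class is inferred from the first child.
inferred = ToyGroup(children={"default": ToyDFrame()})
print(inferred.sframe_class.__name__)

# An empty group with no class has nothing to infer from.
untyped = ToyGroup()
print(untyped.sframe_class)

# Mixing frame types in one group is refused in this model.
try:
    inferred["address"] = ToySPFrame()
except TypeError as exc:
    print(exc)
```

The untyped case shows why an empty group is the one situation where the class has to be supplied by hand.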