
Extending

This section explains how to extend an SFrame. Extending behaves the same way in every implementation, so the examples use DFrame, but they apply equally to the other implementations.

An SFrame can be extended with raw data such as a pandas DataFrame. There are two ways to extend: vertically and horizontally. In the vertical case, the other data is appended below the original data and columns are matched by name. If a column exists in only one of the two data sets, the rows that come from the other data set get NaN in that column. Vertical extension corresponds to the pandas concat function and to the unionByName method in pyspark. In the horizontal case, the other data is merged into the original data on one or more key columns.

Extend Vertically

To extend vertically, pass axis=0:

from seshat.data_class import DFrame
import pandas as pd

# Original data: columns "A" and "B"
data_1 = {"A": ["foo", "bar"], "B": [1, 2]}
sf_1 = DFrame.from_raw(data_1)
# Data to append: columns "A" and "C"
df = pd.DataFrame({"A": ["bar", "qux"], "C": [3, 4]})
sf_1.extend(other=df, axis=0)

print(sf_1.to_raw())
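
Since vertical extension is described above as equivalent to pandas concat, the expected shape of the result can be sketched with pandas alone. This is only an illustration under that assumption: the names left, right, and stacked are hypothetical, and whether extend resets the index (as ignore_index=True does here) is not stated in this section.

```python
import pandas as pd

# Same data as the example above, as plain pandas DataFrames.
left = pd.DataFrame({"A": ["foo", "bar"], "B": [1, 2]})
right = pd.DataFrame({"A": ["bar", "qux"], "C": [3, 4]})

# Stack the rows; columns are matched by name, and cells for a column
# that one side lacks are filled with NaN.
# ignore_index=True is an assumption made for a clean 0..3 index.
stacked = pd.concat([left, right], axis=0, ignore_index=True)

print(stacked)
```

The result has four rows and the columns A, B, and C. B is NaN in the two rows that came from right, and C is NaN in the two rows that came from left.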

Extend Horizontally

To extend horizontally, pass axis=1. The other data is merged into the original data on the given key column(s):

from seshat.data_class import DFrame
import pandas as pd

sf_1 = DFrame.from_raw({"A": ["foo", "bar", "baz", "foo"], "B_left": [1, 2, 3, 5]})
df = pd.DataFrame({"A": ["foo", "bar", "baz", "foo"], "B_right": [5, 6, 7, 8]})
# Left-merge df into sf_1 on the shared key column "A"
sf_1.extend(df, axis=1, on="A", how="left")

print(sf_1.to_raw())
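
The on and how arguments have the same names as those of the pandas merge method. Assuming horizontal extension follows pandas merge semantics (an assumption, not something this section confirms), the same data can be merged in plain pandas to see what a left merge on a key with repeated values produces. The names left, right, and merged are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"A": ["foo", "bar", "baz", "foo"], "B_left": [1, 2, 3, 5]})
right = pd.DataFrame({"A": ["foo", "bar", "baz", "foo"], "B_right": [5, 6, 7, 8]})

# Left merge on "A": every row of `left` is kept and paired with each
# row of `right` that has the same key value.
merged = left.merge(right, on="A", how="left")

# "foo" occurs twice on both sides, so each left "foo" row matches two
# right rows: 2 * 2 + 1 ("bar") + 1 ("baz") = 6 rows in total.
print(len(merged))
print(merged)
```

A key that repeats on both sides multiplies rows, so the merged result here is longer than either input.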

The on argument is shorthand for setting left_on and right_on to the same value. Both left_on and right_on accept either a string or a list of strings, so the merge can use several key columns:

from seshat.data_class import DFrame
import pandas as pd

sf_1 = DFrame.from_raw(
    {"A": ["foo", "bar", "baz", "foo"], "B_left": [1, 2, 3, 5], "C": [1, 2, 3, 4]}
)
df = pd.DataFrame(
    {"A": ["foo", "bar", "baz", "foo"], "B_right": [5, 6, 7, 8], "C": [1, 2, 7, 8]}
)
# Match rows on the pair of key columns ("A", "C")
sf_1.extend(df, axis=1, left_on=["A", "C"], right_on=["A", "C"], how="left")

print(sf_1.to_raw())

In this example, the merge is performed on the columns "A" and "C" together, so a row matches only when both values agree.
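
The relationship between on and the left_on/right_on pair can be checked in plain pandas. The sketch below assumes, as before, that horizontal extension behaves like pandas merge; the names left, right, by_pairs, and by_on are hypothetical.

```python
import pandas as pd

left = pd.DataFrame(
    {"A": ["foo", "bar", "baz", "foo"], "B_left": [1, 2, 3, 5], "C": [1, 2, 3, 4]}
)
right = pd.DataFrame(
    {"A": ["foo", "bar", "baz", "foo"], "B_right": [5, 6, 7, 8], "C": [1, 2, 7, 8]}
)

# Explicit key lists for each side.
by_pairs = left.merge(right, left_on=["A", "C"], right_on=["A", "C"], how="left")

# The shorthand: one list used for both sides.
by_on = left.merge(right, on=["A", "C"], how="left")

# Both spellings give the same frame.
print(by_pairs.equals(by_on))
print(by_pairs)
```

Only ("foo", 1) and ("bar", 2) appear on both sides, so in pandas the rows for ("baz", 3) and ("foo", 4) are kept by the left merge but have NaN in B_right.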