Extending
In this section, we explain how to extend SFrames. The functionality of extending is the same in all implementations, so
the example is written for DFrame, but it applies to other implementations too.
An SFrame can be extended using other raw pandas DataFrames. There are two types of extending: horizontally and vertically.
In the vertical case, the other data is appended below the original data, and columns are matched by name. If a
column exists in only one of the data sets, its values are NaN for the rows that come from the other data set.
Extending vertically corresponds to the pandas concat function or the unionByName method in pyspark.
Extend Vertically
To extend vertically:
from seshat.data_class import DFrame
import pandas as pd
data_1 = {"A": ["foo", "bar"], "B": [1, 2]}
sf_1 = DFrame.from_raw(data_1)
df = pd.DataFrame({"A": ["bar", "qux"], "C": [3, 4]})
sf_1.extend(other=df, axis=0)
print(sf_1.to_raw())
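For comparison, the same row-wise stacking can be sketched with plain pandas. This is only an illustration of the concat behavior described above, not the SFrame implementation; the variable names and the use of ignore_index=True are assumptions made for this sketch.

```python
import pandas as pd

left = pd.DataFrame({"A": ["foo", "bar"], "B": [1, 2]})
right = pd.DataFrame({"A": ["bar", "qux"], "C": [3, 4]})

# Stack the rows of `right` below `left`. Columns are aligned by name,
# and a column missing from one input is filled with NaN for its rows.
# ignore_index=True is an assumption of this sketch; it renumbers the rows.
stacked = pd.concat([left, right], ignore_index=True)
print(stacked)
```

The result has the columns A, B, and C. B is NaN for the two rows that came from the second frame, and C is NaN for the two rows that came from the first. Because NaN is a float, pandas upcasts the integer columns B and C to float here.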
Extend Horizontally
To extend horizontally, the other data is merged into the original data on one or more key columns:
from seshat.data_class import DFrame
import pandas as pd
sf_1 = DFrame.from_raw({"A": ["foo", "bar", "baz", "foo"], "B_left": [1, 2, 3, 5]})
df = pd.DataFrame({"A": ["foo", "bar", "baz", "foo"], "B_right": [5, 6, 7, 8]})
sf_1.extend(df, axis=1, on="A", how="left")
print(sf_1.to_raw())
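The key column in this example contains a repeated value ("foo" appears twice on each side), which affects the size of the result. The sketch below uses plain pandas merge on the same data and assumes that horizontal extending follows pandas merge semantics; the text above only states that the data is merged, so treat this as an illustration rather than a description of the SFrame internals.

```python
import pandas as pd

left = pd.DataFrame({"A": ["foo", "bar", "baz", "foo"], "B_left": [1, 2, 3, 5]})
right = pd.DataFrame({"A": ["foo", "bar", "baz", "foo"], "B_right": [5, 6, 7, 8]})

# Left join on "A": every left row is paired with every right row
# that has the same key, so duplicated keys multiply the rows.
merged = left.merge(right, on="A", how="left")
print(merged)
print(len(merged))
```

In pandas, each of the two "foo" rows on the left matches both "foo" rows on the right, so the merged frame has six rows rather than four. If a one-to-one match is expected, the key column should be unique on at least one side.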
The on argument is shorthand for passing the same value as both left_on and right_on. Each of left_on and right_on
can be a string or a list of strings.
from seshat.data_class import DFrame
import pandas as pd
sf_1 = DFrame.from_raw(
{"A": ["foo", "bar", "baz", "foo"], "B_left": [1, 2, 3, 5], "C": [1, 2, 3, 4]}
)
df = pd.DataFrame(
{"A": ["foo", "bar", "baz", "foo"], "B_right": [5, 6, 7, 8], "C": [1, 2, 7, 8]}
)
sf_1.extend(df, axis=1, left_on=["A", "C"], right_on=["A", "C"], how="left")
print(sf_1.to_raw())
In this example, the merge operation is performed based on the columns "A" and "C".
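To make the effect of a two-column key concrete, the following sketch runs the same data through plain pandas merge. It assumes that horizontal extending behaves like a pandas left merge; that is an assumption of this sketch, not something the section states explicitly.

```python
import pandas as pd

left = pd.DataFrame(
    {"A": ["foo", "bar", "baz", "foo"], "B_left": [1, 2, 3, 5], "C": [1, 2, 3, 4]}
)
right = pd.DataFrame(
    {"A": ["foo", "bar", "baz", "foo"], "B_right": [5, 6, 7, 8], "C": [1, 2, 7, 8]}
)

# A row matches only when BOTH "A" and "C" are equal on the two sides.
merged = left.merge(right, left_on=["A", "C"], right_on=["A", "C"], how="left")
print(merged)
```

Only the pairs ("foo", 1) and ("bar", 2) exist on both sides, so the first two rows receive a B_right value. The rows keyed ("baz", 3) and ("foo", 4) have no partner on the right, and because this is a left join they are kept with NaN in B_right.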