Source
The Source fetches the dataset. It has several implementations: local, database, and Flipside integration.
Source Mode
As mentioned before, seshat supports several Python libraries for analysis. The source is where the data is loaded, and the loaded data must be placed in a data frame from one of these libraries. You can set the mode variable to specify which type of data frame to use. The default value is df, which indicates pandas. If you prefer PySpark, you can set it to spf.
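The idea of selecting a data frame backend through a mode value can be sketched as follows. This is only an illustration of the dispatch pattern, not seshat's actual implementation: the names `CONVERTERS`, `load`, `_to_pandas`, and `_to_pyspark` are hypothetical, and only the mode values df and spf come from the text above.

```python
# Hypothetical sketch: these names are illustrative, not seshat's API.
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


def _to_pandas(rows: Rows) -> Any:
    # Imported lazily so that only the selected backend must be installed.
    import pandas as pd

    return pd.DataFrame(rows)


def _to_pyspark(rows: Rows) -> Any:
    # Imported lazily for the same reason as pandas above.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    return spark.createDataFrame(rows)


# Maps each mode value to the function that builds the matching data frame.
CONVERTERS: Dict[str, Callable[[Rows], Any]] = {
    "df": _to_pandas,    # pandas, the default
    "spf": _to_pyspark,  # PySpark
}


def load(rows: Rows, mode: str = "df") -> Any:
    """Return the rows as the data frame type selected by mode."""
    try:
        convert = CONVERTERS[mode]
    except KeyError:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(CONVERTERS)}"
        ) from None
    return convert(rows)
```

Calling `load(rows)` in this sketch would build a pandas data frame, while `load(rows, mode="spf")` would build a PySpark one. Rejecting unknown modes early keeps a typo from surfacing later as a confusing error in the analysis code.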
Schema
When data is fetched from a database, it is highly recommended to specify the column types. A schema can change the column types of the input SFrame. Because changing the schema of data is very common right after fetching it, the source can accept a schema. If one is provided, the fetched data is passed directly through the schema and the result is returned. The schema can also rename columns, which is often useful after fetching the data.
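To make the two roles of a schema concrete (casting column types and renaming columns), here is a minimal plain-Python sketch that works on a list of row dictionaries. It is an assumption-laden illustration, not seshat's schema class: `ColumnSpec`, `apply_schema`, and the sample column names are all hypothetical, and the real schema operates on an SFrame rather than on dictionaries.

```python
# Hypothetical sketch: ColumnSpec and apply_schema are illustrative names,
# not part of seshat.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Rows = List[Dict[str, Any]]


@dataclass
class ColumnSpec:
    name: str                        # column name as fetched
    dtype: Callable[[Any], Any]      # cast applied to every value
    rename_to: Optional[str] = None  # new column name, if any


def apply_schema(rows: Rows, schema: List[ColumnSpec]) -> Rows:
    """Cast and rename the columns listed in schema; keep the rest as-is."""
    result = []
    for row in rows:
        converted = dict(row)  # copy, so the input rows are not mutated
        for spec in schema:
            value = converted.pop(spec.name)
            converted[spec.rename_to or spec.name] = spec.dtype(value)
        result.append(converted)
    return result


# Values fetched from a database often arrive as strings.
raw = [{"amt": "12.5", "blk": "100"}, {"amt": "3", "blk": "101"}]
schema = [
    ColumnSpec("amt", float, rename_to="amount"),
    ColumnSpec("blk", int, rename_to="block_number"),
]
typed = apply_schema(raw, schema)
```

After this runs, each row in `typed` has a float `amount` and an integer `block_number`, which mirrors what the text describes: the source hands the fetched data to the schema and returns the converted result.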