Source

The Source fetches the dataset. It has several implementations: local, database, and Flipside integration.

Source Mode

As mentioned before, seshat supports several Python libraries for analysis. The source is where data is loaded, and the loaded data must be placed into a data frame from one of these libraries. Set the mode variable to choose which type of data frame to use. The default value is df, which selects pandas. If you prefer PySpark, set it to spf.
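To make the role of the mode value concrete, here is a minimal sketch of how a loader might dispatch on it. It does not use seshat's own classes: the function load_rows and the constants MODE_PANDAS and MODE_PYSPARK are hypothetical names invented for this illustration, and only the values df and spf come from the text above.

```python
import pandas as pd

# Hypothetical constants mirroring the mode values named in the text.
MODE_PANDAS = "df"
MODE_PYSPARK = "spf"


def load_rows(rows, mode=MODE_PANDAS):
    """Wrap raw rows (a list of dicts) in the data frame type selected by mode."""
    if mode == MODE_PANDAS:
        return pd.DataFrame(rows)
    if mode == MODE_PYSPARK:
        # Imported lazily so pandas-only users do not need PySpark installed.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()
        return spark.createDataFrame(rows)
    raise ValueError(f"unknown mode: {mode!r}")


rows = [
    {"address": "0xabc", "amount": 10},
    {"address": "0xdef", "amount": 25},
]

# No mode is given, so the default "df" applies and a pandas DataFrame is built.
frame = load_rows(rows)
```

Calling load_rows(rows, mode="spf") would take the PySpark branch instead, provided PySpark is installed; any other value raises a ValueError in this sketch.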

Schema

When data is fetched from a database, specifying the column types is highly recommended. A schema can convert the column types of the input SFrame. Because adjusting the shape of data right after fetching it is so common, the source accepts an optional schema. If one is provided, the fetched data is passed through the schema before being returned. The schema can also rename columns, which is often useful immediately after fetching.
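The effect of a schema (casting column types and renaming columns right after a fetch) can be sketched with plain pandas. This is only an illustration of the idea, not seshat's schema API: the helper apply_schema and the shape of its columns mapping are assumptions made for this example.

```python
import pandas as pd


def apply_schema(frame, columns):
    """Cast and rename columns.

    `columns` maps each original column name to a (new_name, dtype) pair.
    Columns not listed in the mapping are dropped from the result.
    """
    result = pd.DataFrame(index=frame.index)
    for original, (new_name, dtype) in columns.items():
        result[new_name] = frame[original].astype(dtype)
    return result


# Values fetched from a database often arrive as strings.
raw = pd.DataFrame({"addr": ["0xabc", "0xdef"], "amt": ["10", "25"]})

typed = apply_schema(
    raw,
    {
        "addr": ("address", str),    # rename only; values stay strings
        "amt": ("amount", "int64"),  # rename and cast to integers
    },
)
```

After this step the amount column holds integers, so numeric operations such as typed["amount"].sum() behave as expected, and downstream code can rely on the new column names.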