Load & Transform
The ‘Load’ box currently comes with four example datasets:
- Game Ratings
- College Scorecards (for this dataset, I included all four-year universities that reported data consistently from 2000 through 2015)
- Starbucks Nutrition
- Border Patrol Apprehensions
Users can provide their own dataset by uploading a CSV, Excel, or RDS file. Missing values will be replaced with a character string “NULL”, and character columns will be converted to factors.
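The load-time cleanup described above can be sketched in Python as follows. This is an illustration under assumptions, not the app's own code: `load_csv` is a hypothetical name, only CSV text is handled, and R factors are approximated by recording each character column's sorted distinct values as its levels. Taken literally, replacing every missing cell with “NULL” means a numeric column with any missing cell ends up categorical; the app may handle that case differently.

```python
import csv
import io


def load_csv(text):
    """Hypothetical loader: fill missing cells with "NULL", then type each column.

    Returns (rows, levels): rows is a list of dicts, and levels maps each
    character (factor-like) column to its sorted distinct values.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []

    # Step 1: every missing cell becomes the literal string "NULL".
    for row in rows:
        for col in columns:
            if row[col] is None or row[col].strip() == "":
                row[col] = "NULL"

    # Step 2: a column is numeric only if every cell parses as a number;
    # anything else is treated as a factor with alphabetically sorted levels.
    levels = {}
    for col in columns:
        values = [row[col] for row in rows]
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            levels[col] = sorted(set(values))
            continue
        for row, number in zip(rows, numbers):
            row[col] = number
    return rows, levels


rows, levels = load_csv("name,score,n\nA,1,3\n,2,4\nB,,5\n")
print(levels)  # {'name': ['A', 'B', 'NULL'], 'score': ['1', '2', 'NULL']}
```

In this toy input, `score` becomes categorical because one of its cells was missing, while the complete column `n` is converted to numbers.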
The Undo Changes button reverts the changes made to the source dataset by any of the transformations.
When combining columns, the result depends on how new_column_name and Keep combined cols. are set:
- Combine: combines the specified columns into a single column, with values separated by the provided character(s).
- Rename: if new_column_name is a comma-separated list with one name per specified column, each column is renamed accordingly.
- Drop: if new_column_name is empty and Keep combined cols. is unchecked, the specified columns are dropped.
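The three behaviors listed above can be sketched in Python over a table stored as a list of dicts. The function `combine_columns`, its parameter names, and the default separator are hypothetical; the order in which the modes are checked (empty name means drop, one name per column means rename, otherwise combine) is an assumption based on the list, not the app's confirmed logic.

```python
def combine_columns(rows, cols, new_column_name, sep="_", keep_combined=False):
    """Hypothetical sketch of the Combine / Rename / Drop behaviors."""
    stripped = new_column_name.strip()
    names = [n.strip() for n in stripped.split(",")] if stripped else []

    # Drop: no new name given; the columns go away unless they are kept.
    if not names:
        if keep_combined:
            return [dict(row) for row in rows]
        return [{k: v for k, v in row.items() if k not in cols} for row in rows]

    # Rename: one comma-separated name per selected column.
    if len(cols) > 1 and len(names) == len(cols):
        mapping = dict(zip(cols, names))
        return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

    # Combine: join the selected values with `sep` into one new column.
    combined = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if keep_combined or k not in cols}
        new_row[stripped] = sep.join(str(row[c]) for c in cols)
        combined.append(new_row)
    return combined


rows = [{"id": 1, "city": "Austin", "state": "TX"}]
print(combine_columns(rows, ["city", "state"], "place", sep=", "))
# [{'id': 1, 'place': 'Austin, TX'}]
```

Checking for the empty name first mirrors the Drop rule, which applies only when new_column_name is blank.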
The Split transformation divides a single specified column into several columns, splitting wherever the regular expression in Split RegEx matches. If Split RegEx is left empty, the column is split on every non-alphanumeric character. Check Keep split col. to retain the original column; otherwise, it is dropped.
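A Python sketch of the split behavior, using the same list-of-dicts table. The function name, the `{col}_1`, `{col}_2`, ... naming of the new columns, and padding shorter rows with “NULL” are hypothetical choices. The default pattern `[\W_]+` treats each run of non-alphanumeric characters as one separator, which may differ from how the app handles consecutive separators.

```python
import re


def split_column(rows, col, split_regex="", keep_split=False):
    """Hypothetical sketch: split one column into several on a regex."""
    # Default: split on runs of non-alphanumeric characters.
    # Note: a capturing group in split_regex makes re.split return the separators too.
    pattern = split_regex or r"[\W_]+"
    pieces = [re.split(pattern, str(row[col])) for row in rows]
    width = max((len(p) for p in pieces), default=0)

    result = []
    for row, parts in zip(rows, pieces):
        new_row = {k: v for k, v in row.items() if keep_split or k != col}
        for i in range(width):
            # Rows with fewer pieces are padded so every row has the same columns.
            new_row[f"{col}_{i + 1}"] = parts[i] if i < len(parts) else "NULL"
        result.append(new_row)
    return result


rows = [{"id": 1, "date": "2015-07-04"}, {"id": 2, "date": "2000/1/2"}]
print(split_column(rows, "date")[1])
# {'id': 2, 'date_1': '2000', 'date_2': '1', 'date_3': '2'}
```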
The ‘Reshape’ box lets users “melt” tabular data into a long format that is better suited to the plotting operations that follow. Its behavior, however, depends on whether the columns being melted are numeric.
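In its simplest form, every column not listed in ID columns is melted (gathered) into key-value pairs: each output row holds the ID column values, the name of one melted column as the key, and that column's value.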
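The basic melt can be sketched in Python as below. `melt`, `key_name`, and `value_name` are hypothetical names chosen for illustration, and the sketch does no type checking, so it covers only the simple case where every melted column is already numeric.

```python
def melt(rows, id_cols, key_name="key", value_name="value"):
    """Hypothetical sketch: gather every non-ID column into key-value rows."""
    long_rows = []
    for row in rows:
        ids = {c: row[c] for c in id_cols}
        for col, val in row.items():
            if col not in id_cols:
                long_rows.append({**ids, key_name: col, value_name: val})
    return long_rows


wide = [{"game": "A", "critic": 80, "user": 7.5}]
print(melt(wide, ["game"]))
# [{'game': 'A', 'key': 'critic', 'value': 80}, {'game': 'A', 'key': 'user', 'value': 7.5}]
```

Each wide row with n non-ID columns yields n long rows.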
Because the ‘value’ of each key-value pair must be numeric, melting a non-numeric column behaves differently. In that case, the dataset is grouped by the specified ID columns; within each group, the number of distinct values in each non-numeric column is tallied and each numeric column is summed.
The resulting column names are prefixed with # for tallied non-numeric columns and with Total for summed numeric columns. An additional column, # group obs., records the total number of observations in each group.
Note that the dataset is still in wide format at this point. Now that every non-numeric column has been tallied into a numeric column, click Melt again to melt the newly calculated columns.
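The two-step workflow for non-numeric columns (tally, then melt again) can be sketched in Python. `tally` is a hypothetical helper that groups by the ID columns, counts distinct values of non-numeric columns, sums numeric ones, and adds `# group obs.`. The exact spacing of the `# ` and `Total ` prefixes is an assumption, and groups are kept in first-seen order, which may not match the app. `melt` repeats the basic sketch so this block runs on its own, and the rows are toy values.

```python
def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def tally(rows, id_cols):
    """Hypothetical sketch of the non-numeric melt step: one wide row per group."""
    value_cols = [c for c in rows[0] if c not in id_cols] if rows else []
    numeric = {c: all(is_number(r[c]) for r in rows) for c in value_cols}

    groups = {}  # dicts keep insertion order, so groups stay in first-seen order
    for row in rows:
        groups.setdefault(tuple(row[c] for c in id_cols), []).append(row)

    wide = []
    for key, members in groups.items():
        out = dict(zip(id_cols, key))
        for c in value_cols:
            if numeric[c]:
                out[f"Total {c}"] = sum(m[c] for m in members)
            else:
                out[f"# {c}"] = len({m[c] for m in members})  # distinct values
        out["# group obs."] = len(members)
        wide.append(out)
    return wide


def melt(rows, id_cols, key_name="key", value_name="value"):
    long_rows = []
    for row in rows:
        ids = {c: row[c] for c in id_cols}
        for col, val in row.items():
            if col not in id_cols:
                long_rows.append({**ids, key_name: col, value_name: val})
    return long_rows


drinks = [
    {"store": "S1", "drink": "Latte", "size": "Tall", "calories": 190},
    {"store": "S1", "drink": "Mocha", "size": "Tall", "calories": 290},
    {"store": "S2", "drink": "Latte", "size": "Grande", "calories": 240},
]
wide = tally(drinks, ["store"])         # first Melt click: still wide
long_table = melt(wide, ["store"])      # second Melt click: now long
print(wide[0])
# {'store': 'S1', '# drink': 2, '# size': 1, 'Total calories': 480, '# group obs.': 2}
```

After the tally, every non-ID column holds a number, so the second melt produces purely numeric values, which is why the second Melt click succeeds.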