Union All combines multiple data tables with the same column structure into a single data table. Time series data that is split across several tables, such as one table per year, is a prime candidate for this transform.
Union All preserves all rows in the union set. If you want only unique records, use Union instead.
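The difference between the two transforms can be illustrated outside of Analyze with a minimal plain-Python sketch. The table contents and the helper names `union_all` and `union` are hypothetical and only model the row-level behavior described above; they are not Analyze functions.

```python
# Two hypothetical source tables with the same column structure: (date, temperature).
readings_a = [("2012-01-01", 31), ("2012-01-02", 28)]
readings_b = [("2012-01-02", 28), ("2012-01-03", 35)]


def union_all(*tables):
    """Append every row from every table in source order, keeping duplicates."""
    return [row for table in tables for row in table]


def union(*tables):
    """Append rows, then keep only the first occurrence of each distinct row."""
    return list(dict.fromkeys(union_all(*tables)))


all_rows = union_all(readings_a, readings_b)
unique_rows = union(readings_a, readings_b)

print(len(all_rows))     # every row is preserved, including the repeated one
print(len(unique_rows))  # the repeated row appears only once
```

In this sketch the row for 2012-01-02 exists in both sources, so it appears twice in `all_rows` and once in `unique_rows`.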
The Sources section serves as a collection of all data tables to append together. Typically, all of the data tables will have the same (or similar) column structure. There are two buttons available to add a data table to the list:
Additionally, right-clicking in the Select Source to Edit window will display the same options. Right-clicking on a table already added will also display the Delete option.
To execute the transform properly, there will need to be one entry in the Sources section for every source data table to append together. These entries are listed in the order in which they will be appended. To adjust the order, right-clicking on a table will display the following options:
By default, each source is named New Table, but the modeler is encouraged to provide descriptive names by double-clicking the name and renaming accordingly.
The text shown is not related to the source data table’s name. We recommend a descriptive name, often the same as the source data table's name, but keep in mind that there is no tie whatsoever between the two names.
By default, the Target Table is left blank. Before naming, note that data tables must follow Linux naming conventions. As such, we recommend that names only consist of alphanumeric characters. Analyze will automatically scrub any invalid characters from the name. Additionally, it will limit the length to 256 characters, so be concise!
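To check a name before entering it, the recommendation above can be approximated with a short sketch. The function `scrub_table_name` is hypothetical, and the rule it applies (drop everything that is not a letter or digit, then truncate) is an assumption; the exact scrubbing Analyze performs may differ.

```python
import re

MAX_NAME_LENGTH = 256  # length limit stated in the text above


def scrub_table_name(name: str) -> str:
    """Assumed rule: remove non-alphanumeric characters, then truncate."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", name)
    return cleaned[:MAX_NAME_LENGTH]


print(scrub_table_name("Weather 2012-2014 (combined)"))
```

Choosing a name that already passes a check like this avoids surprises when the name is scrubbed automatically.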
Remember to configure Table Data Selection conditions for each data table listed in Sources.
To set Source Table, select the original data table from the dropdown list. Selecting a data table will automatically populate whether it is a Project or a Workflow table. Additionally, there is an option to preview the data table.
Remember to configure Data Filters conditions for each data table listed in Sources.
To allow for maximum flexibility, data filters are available on the source data and the target data. For larger data sets, it can be especially beneficial to filter out rows on the source so the remaining operations are performed on a smaller data set.
Any valid Python expression is acceptable to subset the data. Please see Expressions for more details and examples.
Compound filters must have individual elements wrapped in parentheses. For example, if filtering for Temperature and Humidity, a valid filter would look like this:
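A minimal plain-Python sketch of why the parentheses matter, assuming the compound filter is combined with the `&` operator. The thresholds, the sample rows, and the function names are hypothetical; see Expressions for the syntax Analyze actually accepts.

```python
# Hypothetical rows; the values are made up for illustration.
rows = [
    {"Temperature": 95, "Humidity": 40},
    {"Temperature": 95, "Humidity": 80},
    {"Temperature": 70, "Humidity": 40},
]


def with_parentheses(row):
    # Each element is evaluated on its own, then the two results are combined.
    return (row["Temperature"] > 88) & (row["Humidity"] < 60)


def without_parentheses(row):
    # & binds tighter than > and <, so Python reads this as
    # Temperature > (88 & Humidity) < 60, which is a different test.
    return row["Temperature"] > 88 & row["Humidity"] < 60


filtered = [row for row in rows if with_parentheses(row)]
wrong = [row for row in rows if without_parentheses(row)]

print(filtered)
print(wrong)
```

Only the parenthesized version keeps just the hot, dry row. The unparenthesized version also lets the 70-degree row through, because the comparison is no longer made against 88.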
To report duplicates, select the Report Duplicates in Table checkbox and then specify an output table, which will contain all of the duplicate records.
Reporting duplicates does not remove them from the target data table. To remove duplicate items, use the Distinct menu options as specified in the [Table Data Selection](../transforms/common_features#table-data-selection) section.
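The relationship between the target table and the duplicates table can be sketched in plain Python. The rows are hypothetical, and the sketch assumes that every copy of a repeated record is written to the report, including the first occurrence; Analyze may count duplicates differently.

```python
from collections import Counter

# Hypothetical target rows after the append: (date, temperature).
target_rows = [
    ("2012-01-01", 31),
    ("2012-01-02", 28),
    ("2012-01-02", 28),
    ("2012-01-03", 35),
]

# Count how often each complete row occurs.
counts = Counter(target_rows)

# The report lists every row that occurs more than once.
duplicate_report = [row for row in target_rows if counts[row] > 1]

print(duplicate_report)
print(len(target_rows))  # the target table itself is unchanged
```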
To limit the data, check the Apply Row Slicer box and then specify the following:
In the following example, weather time series data from 2012, 2013, and 2014 is appended from three separate data tables into a single data table. All three source data tables have an identical column structure, and no Data Filters were applied to any of the Source Table items.
First, the 2012 data is loaded.
Next comes the data from 2013.
Finally, the 2014 data is loaded.
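The same append can be modeled in a few lines of plain Python. This is a sketch only: the column names, the values, and the helper `append_sources` are made up for illustration and are not taken from the example tables or from Analyze.

```python
# Hypothetical yearly tables with an identical column structure.
weather_2012 = [{"Date": "2012-07-01", "Temperature": 91}]
weather_2013 = [{"Date": "2013-07-01", "Temperature": 88}]
weather_2014 = [{"Date": "2014-07-01", "Temperature": 93}]

# Sources are listed in the order in which they will be appended.
sources = [
    ("Weather 2012", weather_2012),
    ("Weather 2013", weather_2013),
    ("Weather 2014", weather_2014),
]


def append_sources(sources):
    """Append all rows in source order after checking the column structure."""
    expected_columns = list(sources[0][1][0].keys())
    combined = []
    for name, rows in sources:
        for row in rows:
            if list(row.keys()) != expected_columns:
                raise ValueError(f"{name} does not match the expected columns")
            combined.append(row)
    return combined


combined = append_sources(sources)
print([row["Date"] for row in combined])
```

The combined rows follow the order of the source list, so reordering the entries changes the order of the result.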