Use, as you might have expected, to perform a full outer join operation on 2 data tables, combining them into a single data table based upon the join key(s) specified.
For more details on outer join methodology, see here: Wikipedia SQL Full Outer Join
Specify the source data table by selecting it from the dropdown menu.
Any valid Python expression is acceptable to subset the data. Please see Expressions for more details and examples.
Compound filters must have individual elements wrapped in parentheses. For example, if filtering for Temperature and Humidity, a valid filter would look like this:
There is a checkbox option to use Distinct Rows Only. This is especially helpful in cases where duplicates may exist, but are not desired. As experienced modelers know, having non-distinct data in an SQL join can increase expected record count significantly.
Additionally, there is standard functionality to Report Duplicates in Table elsewhere. For more details on this capability, see details here:
To report duplicates, select the Report Duplicates in Table checkbox and then specify an output table which will contain all of the duplicate records.
This will not remove the duplicate items from the target data table. To remove duplicate items, use the Distinct menu options as specified in the [Table Data Selection](../transforms/common_features#table-data-selection) section.
Specify any columns to be included in the Outer Join here. Selecting the Inspect Source and Populate Source Mapping Table buttons will make these columns available for the join operation.
To limit the data, check the Apply Row Slicer box and then specify the following:
Additionally, there is standard functionality to Report Duplicates in Table elsewhere.
Specify the Target Table name by either selecting an existing data table from the dropdown menu or typing a new data table name into the same menu. By default, the Target Table is automatically populated with the specific transform’s name. Note that data tables must follow Linux naming conventions. As such, we recommend that names only consist of alphanumeric characters. Analyze will automatically scrub any invalid characters from the name.
Specify join conditions. Using the Guess button will find all matching columns from both Table A and Table B. To add additional columns manually, right click anywhere in the section and select either Insert Row or Append Row, to add a row prior to the currently selected row or to add a row at the end, respectively. Then, type the column names to match from Table A to Table B. To remove a field from the Join Map, simply right-click and select Delete.
Specify the columns to appear in the target data table. Selecting the Propagate button will insert all columns listed in the Source Columns and Replacements section of both Table A and Table B. Any columns included in the Join Map will only be listed a single time.
To add additional columns manually, right click anywhere in the section and select either Insert Row or Append Row, to add a row prior to the currently selected row or to add a row at the end, respectively. Then, type the column name. To remove a field, simply right-click and select Delete.
Any valid Python expression is acceptable to subset the data. Please see Expressions for more details and examples
Example code here
To limit the data, simply check the Apply Row Slicer box and then specify the following:
In this example, consider the following source data tables. First is a list of automobile manufacturers.
Next is a list of automobile models with a manufacturer ID. Note that there are several models with no manufacturer.
To get a list of models by manufacturer, it makes sense to join on Mfg_ID. By leveraging outer join concepts, the output will also be able to show those items which do not have any matches.
First, specify parameters for Table A Data Selection. The source data table is selected and all columns are listed.
Next, specify parameters for Table B Data Selection. Once again, the source data table is selected and all columns are listed.
Finally, the join conditions are set in the Table Output tab. Using the Guess button, Analyze properly identifies the Mfg_ID column to use as the Join Key. Lastly, the Target Output Columns are specified automatically using the Propagate button. This effectively includes all columns from all tables, with any join columns obviously only being included a single time. Note that the columns are sorted alphabetically, first by Manufacturer and next by ModelName.
As expected, the final output includes all rows from both tables, whether they had a match in both tables or not. As such, this time Porsche does indeed show up despite having no models. Additionally, Batmobile, Lightning McQueen, and Optimus Prime are included in the results even though none of them have a manufacturer. Besides, who can say ‘No’ to them?
To include all matching results only, consider the Inner Join Automobile Example.