Use, as you might have expected, to perform an inner join operation on 2 data tables, combining them into a single data table based upon the specified join key(s).
For more details on inner join methodology, see here: Wikipedia SQL Inner Join
Specify the source data table by selecting it from the dropdown menu.
Any valid Python expression is acceptable to subset the data. Please see Expressions for more details and examples.
Compound filters must have individual elements wrapped in parentheses. For example, if filtering for Temperature and Humidity, a valid filter would look like this:
There is a checkbox option to use Distinct Rows Only. This is especially helpful in cases where duplicates may exist but are not desired. As experienced modelers know, having non-distinct data in an SQL join can increase expected record count significantly.
Additionally, there is standard function to Report Duplicates in Table elsewhere.
To report duplicates, select the Report Duplicates in Table checkbox and then specify an output table, which will contain all of the duplicate records.
This will not remove the duplicate items from the target data table. To remove duplicate items, use the Distinct menu options as specified in the [Table Data Selection](../transforms/common_features#table-data-selection) section.
Specify any columns to be included in the Inner Join here. Selecting the Inspect Source and Populate Source Mapping Table buttons will make these columns available for the join operation.
To limit the data, check the Apply Row Slicer box and then specify the following:
Additionally, there is standard functionality to Report Duplicates in Table elsewhere. For more details on this capability, see details here:
To report duplicates, select the Report Duplicates in Table checkbox and then specify an output table which will contain all of the duplicate records.
Specify the Target Table name by either selecting an existing data table from the dropdown menu or typing a new data table name into the same menu. By default, the Target Table is automatically populated with the specific transform’s name. Note that data tables must follow Linux naming conventions. As such, we recommend that names only consist of alphanumeric characters. Analyze will automatically scrub any invalid characters from the name.
Specify join conditions. Using the Guess button will find all matching columns from both Table A as well as Table B. To add additional columns manually, right click anywhere in the section and select either Insert Row or Append Row, to add a row prior to the currently selected row or to add a row at the end, respectively. Then, type the column names to match from Table A to Table B. To remove a field from the Join Map, simply right-click and select Delete.
Specify the columns to appear in the target data table. Selecting the Propagate button will insert all columns listed in the Source Columns and Replacements section of both Table A and Table B. Any columns included in the Join Map will only be listed a single time.
To add additional columns manually, right click anywhere in the section and select either Insert Row or Append Row, to add a row prior to the currently selected row or to add a row at the end, respectively. Then, type the column name. To remove a field, simply right-click and select Delete.
Example code here
To limit the data, simply check the Apply Row Slicer box and then specify the following:
In this example, consider the following source data tables. First is a list of automobile manufacturers.
Next is a list of automobile models with a manufacturer ID. Note that there are several models with no manufacturer.
To get a list of models by manufacturer, it makes sense to join on Mfg_ID.
First, specify parameters for Table A Data Selection. The source data table is selected and all columns are listed.
Next, specify parameters for Table B Data Selection. Once again, the source data table is selected and all columns are listed.
Finally, the join conditions are set in the Table Output tab. Using the Guess button, Analyze properly identifies the Mfg_ID column to use as the Join Key. Lastly, the Target Output Columns are specified automatically using the Propagate button. This effectively includes all columns from all tables, with all join columns included only a single time. Note that the columns are sorted alphabetically, first by Manufacturer and next by ModelName.
As expected, the final output only includes values which had a match in both tables. As such, Porsche does not show up because it had no models. Likewise, the Batmobile had no manufacturer (it was a custom job), so it’s not included.
To include all results (including non-matching), consider the Outer Join Automobile Example.
For an example showing how to use Inner Join to combine 2 data tables, please see the Combine Harris, Coaches, and Computer Scores section of the BCS Demo.