Table Inner Join

Home/Table Inner Join

Table Inner Join

ParameterValue
CategoryTable
Operationtable_join_inner
Workflow IconIcon
Input TypePlaidCloud Analyze Table
Output TypePlaidCloud Analyze Table

Description #

Use, as you might have expected, to perform an inner join operation on 2 data tables, combining them into a single data table based upon the specified join key(s).

For more details on inner join methodology, see here: Wikipedia SQL Inner Join

Table A Data Selection #

Table Source #

Specify the source data table by selecting it from the dropdown menu.

Select Subset of Source Data #

Any valid Python expression is acceptable to subset the data. Please see Expressions for more details and examples.../../../_images/common_data_filters_subset_source_data3.png

Note

Compound filters must have individual elements wrapped in parentheses. For example, if filtering for Temperature and Humidity, a valid filter would look like this:





Duplicates #

There is a checkbox option to use Distinct Rows Only. This is especially helpful in cases where duplicates may exist but are not desired. As experienced modelers know, having non-distinct data in an SQL join can increase expected record count significantly.

Additionally, there is standard function to Report Duplicates in Table elsewhere.

Duplicates #

To report duplicates, select the Report Duplicates in Table checkbox and then specify an output table, which will contain all of the duplicate records.

../../../_images/common_data_filters_duplicates3.png

Caution

This will not remove the duplicate items from the target data table. To remove duplicate items, use the Distinct menu options as specified in the [Table Data Selection](../transforms/common_features#table-data-selection) section.

Source Columns and Replacements #

Specify any columns to be included in the Inner Join here. Selecting the Inspect Source and Populate Source Mapping Table buttons will make these columns available for the join operation.

Source Table Slicing (Limit) #

To limit the data, check the Apply Row Slicer box and then specify the following:

  • Initial Rows to Skip: Rows of data to skip (column header row is not included in count)
  • End at Row: Last row of data to include. Note that this is different from simply counting rows at the end to drop

../../../_images/common_data_filters_source_table_slicing3.png

Table B Data Selection #

Table Source #

Specify the source data table by selecting it from the dropdown menu.

Select Subset of Source Data #

Any valid Python expression is acceptable to subset the data. Please see Expressions for more details and examples.../../../_images/common_data_filters_subset_source_data3.png

Note

Compound filters must have individual elements wrapped in parentheses. For example, if filtering for Temperature and Humidity, a valid filter would look like this:





Duplicates #

There is a checkbox option to use Distinct Rows Only. This is especially helpful in cases where duplicates may exist but are not desired. As experienced modelers know, having non-distinct data in an SQL join can increase expected record count significantly.

Additionally, there is standard functionality to Report Duplicates in Table elsewhere. For more details on this capability, see details here:

Duplicates #

To report duplicates, select the Report Duplicates in Table checkbox and then specify an output table which will contain all of the duplicate records.

../../../_images/common_data_filters_duplicates3.png

Caution

This will not remove the duplicate items from the target data table. To remove duplicate items, use the Distinct menu options as specified in the [Table Data Selection](../transforms/common_features#table-data-selection) section.

Source Columns and Replacements #

Specify any columns to be included in the Inner Join here. Selecting the Inspect Source and Populate Source Mapping Table buttons will make these columns available for the join operation.

Source Table Slicing (Limit) #

To limit the data, check the Apply Row Slicer box and then specify the following:

  • Initial Rows to Skip: Rows of data to skip (column header row is not included in count)
  • End at Row: Last row of data to include. Note that this is different from simply counting rows at the end to drop

../../../_images/common_data_filters_source_table_slicing3.png

Table Output #

Target Table #

Specify the Target Table name by either selecting an existing data table from the dropdown menu or typing a new data table name into the same menu. By default, the Target Table is automatically populated with the specific transform’s name. Note that data tables must follow Linux naming conventions. As such, we recommend that names only consist of alphanumeric characters. Analyze will automatically scrub any invalid characters from the name.

Join Map #

Specify join conditions. Using the Guess button will find all matching columns from both Table A as well as Table B. To add additional columns manually, right click anywhere in the section and select either Insert Row or Append Row, to add a row prior to the currently selected row or to add a row at the end, respectively. Then, type the column names to match from Table A to Table B. To remove a field from the Join Map, simply right-click and select Delete.

Target Output Columns #

Specify the columns to appear in the target data table. Selecting the Propagate button will insert all columns listed in the Source Columns and Replacements section of both Table A and Table B. Any columns included in the Join Map will only be listed a single time.

To add additional columns manually, right click anywhere in the section and select either Insert Row or Append Row, to add a row prior to the currently selected row or to add a row at the end, respectively. Then, type the column name. To remove a field, simply right-click and select Delete.

Select Subset of Final Data #

Any valid Python expression is acceptable to subset the data. Please see Expressions for more details and examples.





Example code here

Final Data Table Slicing (Limit) #

To limit the data, simply check the Apply Row Slicer box and then specify the following:

  • Initial Rows to Skip: Rows of data to skip (column header row is not included in count)
  • End at Row: Last row of data to include. This is different from simply counting rows at the end to drop

../../../_images/common_data_filters_target_table_slicing3.png

Workflow Configuration Forms #

Inner Join Table A
Inner Join Table B
Inner Join Table Output

Examples #

Join Automobile Manufacturers with Models #

In this example, consider the following source data tables. First is a list of automobile manufacturers.

Mfg_IDManufacturer
1Aston Martin
2Porsche
3Lamborghini
4Ferrari
5Koenigsegg

Next is a list of automobile models with a manufacturer ID. Note that there are several models with no manufacturer.

ModelNameMfg_ID
Aventador3
Countach3
DBS1
Enzo4
One-771
Optimus Prime 
Batmobile 
Agera5
Lightning McQueen 

To get a list of models by manufacturer, it makes sense to join on Mfg_ID.

First, specify parameters for Table A Data Selection. The source data table is selected and all columns are listed.Inner Join

Next, specify parameters for Table B Data Selection. Once again, the source data table is selected and all columns are listed.Inner Join

Finally, the join conditions are set in the Table Output tab. Using the Guess button, Analyze properly identifies the Mfg_ID column to use as the Join Key. Lastly, the Target Output Columns are specified automatically using the Propagate button. This effectively includes all columns from all tables, with all join columns included only a single time. Note that the columns are sorted alphabetically, first by Manufacturer and next by ModelName.Inner Join

As expected, the final output only includes values which had a match in both tables. As such, Porsche does not show up because it had no models. Likewise, the Batmobile had no manufacturer (it was a custom job), so it’s not included.Inner Join

To include all results (including non-matching), consider the Outer Join Automobile Example.

BCS Demo – Join Tables on Match #

For an example showing how to use Inner Join to combine 2 data tables, please see the Combine Harris, Coaches, and Computer Scores section of the BCS Demo.

Go to Top