Import HTML table data from the internet.
Specify the URL of the source data.
Specify a name for the target data table. Note that data tables must follow Linux naming conventions. As such, we recommend that names only consist of alphanumeric characters. Analyze will automatically scrub any invalid characters from the name. Additionally, it will limit the length to 256 characters, so be concise!
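The scrubbing and length limit described above can be pictured with a short Python sketch. This is not Analyze's actual implementation: the helper name `scrub_table_name` is hypothetical, and replacing invalid characters with underscores is an assumption (Analyze may simply drop them). Only the 256-character limit comes from the text.

```python
import re

MAX_NAME_LENGTH = 256  # length limit stated in the documentation


def scrub_table_name(raw_name: str) -> str:
    """Hypothetical helper: make a table name safe, then truncate it."""
    # Assumption: anything other than letters, digits, or underscores is
    # replaced with an underscore. Analyze's exact rule may differ.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", raw_name)
    return cleaned[:MAX_NAME_LENGTH]


print(scrub_table_name("sales report (2023).v2"))
```

Choosing an alphanumeric name up front avoids any surprise about what the stored name looks like after scrubbing.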
Since it is possible to have multiple tables on a web page, the user must specify which table to import. To do so, specify Name and/or Attribute values to match.
For example, consider the following table:
<table border="1" id="import">
To import this table, specify id:import in the Name Match field.
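To make the matching idea concrete, the following sketch picks one table out of a page by an `attribute:value` pair, using only Python's standard library. The class name `TableFinder` and the sample page are hypothetical, and this is only an approximation of how such matching could work, not a description of Analyze's internals. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Collect cell text from the first <table> matching 'attribute:value'."""

    def __init__(self, match):
        super().__init__()
        self.attr_name, self.attr_value = match.split(":", 1)
        self.in_table = False  # True while inside the matched table
        self.done = False      # True once the matched table has closed
        self.rows = []
        self._cell = None      # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "table" and not self.done:
            self.in_table = dict(attrs).get(self.attr_name) == self.attr_value
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None
        elif tag == "table" and self.in_table:
            self.in_table = False
            self.done = True


# Hypothetical page with two tables; only the second has id="import".
PAGE = """
<table id="other"><tr><td>ignore</td></tr></table>
<table border="1" id="import">
  <tr><th>City</th><th>Temp</th></tr>
  <tr><td>Oslo</td><td>4</td></tr>
</table>
"""

finder = TableFinder("id:import")
finder.feed(PAGE)
print(finder.rows)
```

Matching on an attribute such as `id` is usually more robust than relying on the table's position in the page, since page layouts change.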
Additionally, there is an option to skip rows at the beginning of the table.
Specify the row to use for header information. By default, the Column Header Row is 0.
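The interaction between skipped rows and the header row can be sketched as follows. The helper name `split_header_and_data` is hypothetical, and the sketch assumes the header index is counted after the skipped rows are removed and that any rows before the header are discarded. Analyze may count differently, so it is worth checking against the data preview.

```python
def split_header_and_data(rows, skip_rows=0, header_row=0):
    """Hypothetical helper: drop leading rows, then pick the header row."""
    remaining = rows[skip_rows:]         # rows left after skipping
    header = remaining[header_row]       # assumed: index is relative to 'remaining'
    data = remaining[header_row + 1:]    # everything after the header is data
    return header, data


# Made-up table whose first row is a caption rather than a header.
rows = [
    ["Daily readings"],
    ["City", "Temp"],
    ["Oslo", "4"],
    ["Cairo", "30"],
]

header, data = split_header_and_data(rows, skip_rows=1)
print(header)
print(data)
```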
The Table Data Selection tab is used to map columns from the source data table to the target data table. All source columns on the left side of the window are automatically mapped to the target data table depicted on the right side of the window. Using the Inspect Source menu button, there are a few additional ways to map columns from source to target:
In addition to each of these options, each choice offers the ability to preview the source data.
If the source and target column options aren’t enough, other columns can be added into the target data table in several different ways:
Selecting Propagate All may effectively create a duplicate of every column. Analyze does not check to see if the columns are already mapped. Make sure duplicate column names do not exist.
To delete columns from the target data table, select the desired column(s), then right-click and select Delete.
To rearrange columns in the target data table, select the desired column(s), then right-click and select Move to Top, Move Up, Move Down, or Move to Bottom.
To return only distinct results, select the Distinct menu option. This will toggle a set of checkboxes for each column in the source. Check the box next to each column that should be considered when determining distinct results.
When the target data table contains only a subset of the source data table, check only the boxes next to the columns that are to be included in the target data table. Selecting all checkboxes could produce output that does not appear to be distinct.
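The reason for this can be seen in a small Python sketch. The function name `distinct_then_project` and the sample rows are hypothetical, and the sketch only assumes that distinctness is evaluated over the checked columns while the output shows the target columns.

```python
def distinct_then_project(rows, distinct_columns, output_columns):
    """Keep the first row per distinct key, then show only the output columns."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in distinct_columns)
        if key not in seen:
            seen.add(key)
            result.append(tuple(row[c] for c in output_columns))
    return result


# Made-up source data: 'id' is unique per row, 'city' repeats.
source = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Oslo"},
    {"id": 3, "city": "Cairo"},
]

# All boxes checked: rows differ by 'id', so 'Oslo' appears twice in the output.
all_checked = distinct_then_project(source, ["id", "city"], ["city"])

# Only the target column checked: the output is visibly distinct.
subset_checked = distinct_then_project(source, ["city"], ["city"])

print(all_checked)
print(subset_checked)
```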
To aggregate results, select the Summarize menu option. This will toggle a set of drop-down boxes for each column in the target data table. The following summarization options are available:
For more aggregation details, see the Analyze overview page [here](/docs/analyze/#aggregation).
To allow for maximum flexibility, data filters are available on the source data and the target data. For larger data sets, it can be especially beneficial to filter out rows on the source so the remaining operations are performed on a smaller data set.
Any valid Python expression is acceptable to subset the data. Please see Expressions for more details and examples.
Compound filters must have individual elements wrapped in parentheses. For example, if filtering for Temperature and Humidity, a valid filter would look like this:
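As an illustration, the sketch below applies a parenthesized compound condition to a few made-up rows in plain Python. The thresholds and sample values are hypothetical, and the use of the `&` operator is an assumption borrowed from pandas-style expressions; the exact operators Analyze accepts are described on the Expressions page.

```python
rows = [
    {"Temperature": 75, "Humidity": 40},
    {"Temperature": 75, "Humidity": 60},
    {"Temperature": 65, "Humidity": 40},
]


def keep(row):
    """Parenthesized filter: each comparison is evaluated before '&'."""
    Temperature, Humidity = row["Temperature"], row["Humidity"]
    return (Temperature > 70) & (Humidity < 50)


def keep_unparenthesized(row):
    """Same text without parentheses: '&' binds tighter than '>' and '<',
    so Python reads it as Temperature > (70 & Humidity) < 50."""
    Temperature, Humidity = row["Temperature"], row["Humidity"]
    return Temperature > 70 & Humidity < 50


filtered = [row for row in rows if keep(row)]
unintended = [row for row in rows if keep_unparenthesized(row)]

print(filtered)
print(len(unintended))
```

With these sample values the parenthesized filter keeps only the first row, while the unparenthesized version keeps all three, which is why wrapping each element in parentheses matters.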
To report duplicates, select the Report Duplicates in Table checkbox and then specify an output table which will contain all of the duplicate records.
This will not remove the duplicate items from the target data table. To remove duplicate items, use the Distinct menu options as specified in the [Table Data Selection](../transforms/common_features#table-data-selection) section.
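The difference between reporting and removing duplicates can be sketched in Python. The function names and sample rows are hypothetical, and the sketch assumes the duplicates table holds every occurrence of a repeated record; Analyze's exact output may differ.

```python
from collections import Counter


def report_duplicates(rows):
    """Return every row that occurs more than once; the input is not modified."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] > 1]


def distinct(rows):
    """Return rows with repeats removed, keeping first occurrences in order."""
    return list(dict.fromkeys(rows))


# Made-up target rows as tuples so they can be counted.
target = [("Oslo", 4), ("Cairo", 30), ("Oslo", 4)]

duplicates = report_duplicates(target)  # goes to the separate output table
deduplicated = distinct(target)         # what the Distinct option would yield

print(duplicates)
print(len(target))
print(deduplicated)
```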
Any valid Python expression is acceptable to subset the data. Please see Expressions for more details and examples.
To limit the data, check the Apply Row Slicer box and then specify the following:
To limit the data, simply check the Apply Row Slicer box and then specify the following:
Consider the Hello World sample table.