Comparing two lists in Excel to find matches, duplicates, or unique values is a common task. This article explores various techniques to effectively compare lists using Excel and FME (Feature Manipulation Engine).
Using Excel and FME for List Comparison
Methods for comparing two lists range from simple Excel functions to FME workspaces. This article focuses on two FME-based approaches:
Leveraging FME’s ChangeDetector Transformer
The ChangeDetector transformer in FME offers a robust solution for comparing data. By connecting two Excel spreadsheets (or other data sources) to the ChangeDetector, you can identify:
- Inserted Features: Items present in the second list but not the first.
- Deleted Features: Items present in the first list but not the second.
- Updated Features: Items present in both lists but with differing attribute values.
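The three categories above can be illustrated outside FME with a minimal Python sketch. It assumes each list item has a unique key (the dictionary keys below are made-up sample data) and is only an analogue of the classification, not how the ChangeDetector is implemented.

```python
def detect_changes(original, revised):
    """Classify keyed records into inserted, deleted, and updated groups."""
    # Keys found only in the second (revised) list.
    inserted = {k: v for k, v in revised.items() if k not in original}
    # Keys found only in the first (original) list.
    deleted = {k: v for k, v in original.items() if k not in revised}
    # Keys found in both lists whose values differ.
    updated = {
        k: (original[k], revised[k])
        for k in original.keys() & revised.keys()
        if original[k] != revised[k]
    }
    return inserted, deleted, updated


# Hypothetical sample data: key -> attribute value.
first = {"A1": "Einstein ST", "A2": "Newton AVE", "A3": "Curie RD"}
second = {"A1": "Einstein ST", "A2": "Newton AVENUE", "A4": "Darwin LN"}

inserted, deleted, updated = detect_changes(first, second)
print(inserted)  # {'A4': 'Darwin LN'}
print(deleted)   # {'A3': 'Curie RD'}
print(updated)   # {'A2': ('Newton AVE', 'Newton AVENUE')}
```

Record "A1" appears in neither result because it is identical in both lists.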
To effectively use the ChangeDetector for list comparison:
- Format your data: Structure both lists the same way before comparing them. One approach is to arrange the data horizontally in a single row within Excel, which simplifies processing in FME. Alternatively, if the data is stored as comma-separated values within a single cell, use the AttributeSplitter transformer in FME to break it into individual list elements.
- Create a Common Attribute: If using a list-based approach, add a common attribute (e.g., a field with the value “1” for all rows). This gives the ListBuilder transformer a value to group by, so that all rows are collected into a single list for comparison.
- Connect to ChangeDetector: Connect the outputs of the ListBuilder (or the Excel spreadsheets directly, if using horizontal formatting) to the ChangeDetector transformer. It then separates the features according to whether they were inserted, deleted, or updated.
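The split-then-compare idea in the steps above can be sketched in plain Python. This is a rough analogue under stated assumptions, not FME code: `split_cell` is a hypothetical helper standing in for the splitting step, and the cell contents are invented sample values.

```python
def split_cell(cell):
    """Split a comma-separated cell into trimmed, non-empty items."""
    return [item.strip() for item in cell.split(",") if item.strip()]


# Hypothetical cell contents from two spreadsheets.
list_a = split_cell("apple, banana, cherry")
list_b = split_cell("banana,cherry , date")

# Set operations give the exact-match comparison between the two lists.
only_in_a = sorted(set(list_a) - set(list_b))
only_in_b = sorted(set(list_b) - set(list_a))
in_both = sorted(set(list_a) & set(list_b))

print(only_in_a)  # ['apple']
print(only_in_b)  # ['date']
print(in_both)    # ['banana', 'cherry']
```

Trimming whitespace before comparing matters here: without it, "cherry " and "cherry" would be treated as different items.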
FeatureMerger and Fuzzy String Comparison
For scenarios involving slight variations in list items (e.g., “Einstein ST” vs. “A Einstein ST”), a more sophisticated approach is needed:
- Utilize FeatureMerger: Use the FeatureMerger transformer to bring the two lists together, but rather than relying on a strict equality check alone, configure the workflow to explode the output list so that each item can be examined individually. This separates matching and non-matching items.
- Incorporate Fuzzy String Comparer: Integrate a fuzzy string comparison technique to identify potential matches based on similarity rather than exact equality. This accounts for minor spelling differences or variations in address formats. A specified percentage threshold determines the degree of similarity required for a match.
This method catches records that would otherwise go unmatched because of slight differences, making the comparison more flexible. Remember to remove duplicates after the merging process.
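To make the threshold idea concrete, here is a minimal Python sketch that uses the standard library's `difflib.SequenceMatcher` as the similarity measure. The measure, the 0.8 threshold, and the function names are assumptions chosen for illustration. FME's fuzzy comparison may score strings differently.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_match(items, candidates, threshold=0.8):
    """Pair each item with its most similar candidate if it meets the threshold."""
    matches, unmatched = [], []
    for item in items:
        # Pick the candidate with the highest similarity score, if any exist.
        best = max(candidates, key=lambda c: similarity(item, c), default=None)
        if best is not None and similarity(item, best) >= threshold:
            matches.append((item, best))
        else:
            unmatched.append(item)
    return matches, unmatched


# Hypothetical sample lists based on the address example above.
matches, unmatched = fuzzy_match(
    ["Einstein ST", "Curie RD"],
    ["A Einstein ST", "Newton AVE"],
)
print(matches)    # [('Einstein ST', 'A Einstein ST')]
print(unmatched)  # ['Curie RD']
```

Raising the threshold makes matching stricter. Lowering it accepts looser matches at the cost of more false positives, so it is worth tuning against a sample of real data.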
Conclusion
Comparing two lists in Excel can be achieved through several methods. For straightforward comparisons, the ChangeDetector in FME provides a robust solution. When dealing with potential variations in text, combining the FeatureMerger with fuzzy string comparison offers greater flexibility. Choosing the right method depends on the specific requirements of your comparison task and the nature of your data.