Most of the time, we rely on the honesty of others, especially when it comes to finalizing important documents. However, the risk of unnoticed modifications in contracts is always present, particularly with complex, lengthy agreements. Imagine dealing with 50-page contracts requiring meticulous review for handwritten changes after extensive negotiations. Manually scrolling through each page to identify alterations is time-consuming and prone to error. This is where a Computer Compare Tool becomes invaluable.
In today’s digital age, tools like Intelligence Suite, equipped with Computer Vision technology, offer a revolutionary approach to document review. This powerful suite automates the tedious process of comparing documents, specifically designed to identify handwritten markups in scanned PDFs and images. Let’s explore how a computer compare tool can streamline your workflow and ensure document integrity.
This article will guide you through leveraging a computer compare tool using Intelligence Suite to efficiently detect handwritten markups. We’ll break down the workflow into two parts. Part 1, discussed here, focuses on image processing and profiling techniques. Part 2 (in a separate blog post) will delve into advanced image processing and reporting steps to generate a clear report of pages containing markups.
Streamlining Document Review with a Computer Compare Tool: The Workflow
The goal is to create a workflow using a computer compare tool that automates the comparison of an original document against a revised version, specifically pinpointing pages with handwritten markups. This workflow will generate a concise report, highlighting only the pages requiring attention due to identified modifications. Instead of sifting through an entire 50-page contract, you’ll quickly focus on the few pages that matter.
Data Preparation for Your Computer Compare Tool
For this demonstration of a computer compare tool, we’ll use a publicly accessible document: the Alteryx Designer End User License Agreement (EULA). The original 6-page EULA PDF, named Original Alteryx EULA v20 2006.pdf, is downloaded from this URL.
To simulate a marked-up document, the EULA was printed, and handwritten markups were added in red ink to pages 3 and 4. This modified document was then scanned and saved as Signed Alteryx EULA with Markups.pdf (available as an attachment). While referred to as “signed,” it’s important to note that this EULA example lacks an actual signature page. Imagine this as a segment of a larger contract, excluding the signature section. A future discussion will explore automating signature detection.
The crucial aspect for utilizing a computer compare tool effectively is having two PDF files: the original and the potentially modified version. Both files must contain the same number of pages, ordered identically.
Here’s a side-by-side view of page 3, showcasing the original (left) and the marked-up version (right), demonstrating the type of changes a computer compare tool needs to identify.
Results Achieved with the Computer Compare Tool
Upon running the workflow powered by our computer compare tool, the output clearly presents pages 3 and 4 side-by-side, original versus marked-up, for easy visual comparison.
Even a minor change, like the word “not” added on page 4, is accurately detected by the computer compare tool.
Let’s break down the steps involved in using this computer compare tool to identify pages with handwritten markups.
Step-by-Step Guide: Using a Computer Compare Tool for Markup Detection
The following steps detail the workflow of our computer compare tool, utilizing Alteryx Intelligence Suite components.
Step 1: Data Input with Image Input Tool
The process begins with the Image Input tool. This tool allows you to specify a folder containing both the original and revised PDF files. Designed to work with two files at a time, this computer compare tool treats each page within the PDFs as a separate image, leveraging Optical Character Recognition (OCR) technology.
Step 2: Image Pre-processing for Enhanced Comparison
To optimize the comparison, the computer compare tool first converts the color images to black and white using the Image Processing tool. Although the markups are in red, reducing each page to a clean black-and-white image makes the pixel-based comparison more reliable.
This conversion involves a refined three-step process within the computer compare tool:
- Brightness Balance Adjustment: Setting Brightness Balance to -77 darkens the text, enhancing readability. Optimal brightness levels may vary for different documents. Generally, negative values result in darker images.
- Grayscale Conversion: The Grayscale function converts colors to shades of gray, simplifying the image for comparison.
- Binary Thresholding: Binary image thresholding eliminates stray pixels and scanning artifacts, further cleaning up the image.
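As a mental model, the three pre-processing steps can be sketched in pure Python, no Alteryx required. The -77 brightness offset comes from the workflow configuration; the 128 threshold and the luma grayscale weights are illustrative assumptions, since the tool's exact internals aren't documented here:

```python
# Minimal pure-Python sketch of the three pre-processing steps, operating
# on a tiny image represented as a grid of (R, G, B) tuples.
# The -77 brightness offset matches the article; the 128 threshold and
# luma weights are illustrative assumptions.

def preprocess(pixels, brightness=-77, threshold=128):
    """Brightness shift -> grayscale -> binary threshold (0 or 255)."""
    out = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            # 1. Brightness balance: negative values darken the pixel.
            r, g, b = (max(0, min(255, c + brightness)) for c in (r, g, b))
            # 2. Grayscale conversion via the standard luma weights.
            gray = 0.299 * r + 0.587 * g + 0.114 * b
            # 3. Binary threshold: bright pixels -> 255, the rest -> 0.
            new_row.append(255 if gray >= threshold else 0)
        out.append(new_row)
    return out

# A white background pixel stays bright; a red markup pixel goes dark.
image = [[(255, 255, 255), (200, 0, 0)]]
print(preprocess(image))  # [[255, 0]]
```

The key effect for this workflow: handwritten red ink survives as dark pixels, so marked-up pages end up with a measurably different bright pixel count than their originals.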
Step 3: Image Profiling and Page Arrangement
The Image Profile tool, a key component of our computer compare tool, extracts metadata from the processed images. Pages are then arranged side-by-side, and duplicate entries are removed through subsequent steps.
The Image Profile tool, pointed at the processed image field, generates numerous metadata fields. For our computer compare tool, we focus on the [Bright_Pixel_Count] field and deselect the other metadata profiles.
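Once a page has been reduced to a binary image, the one metric this workflow relies on is easy to picture. A pure-Python sketch (the function name is illustrative) that counts the white pixels surviving the threshold:

```python
# Sketch of the one profile metric used here: Bright_Pixel_Count,
# i.e. how many pixels in the binary image are white (255).
# Function name and sample data are illustrative.

def bright_pixel_count(binary_image):
    return sum(row.count(255) for row in binary_image)

page = [[255, 0, 255],
        [0, 255, 255]]
print(bright_pixel_count(page))  # 4
```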
After running Image Profile, a Select tool is used to retain only the [Bright_Pixel_Count] field, along with the [file], [page], and [image] fields, which are then renamed for clarity to [file], [Page], and [Original Image], respectively. Resizing the [file] field is a recommended best practice.
The data is then restructured from a single list of pages into a paired list, aligning the original and signed versions of each page on the same row.

Current data structure: one row per file per page, carrying that page's file name, image, and bright pixel count.

Desired data structure for comparison: one row per page, with the file name, image, and bright pixel count of both the original and the signed version side by side.
This transformation is achieved using a Join tool, joining the data to itself on the [Page] field. The [Page] field from the right side of the join is deselected to avoid duplication.
This self-join results in 24 rows (4 file combinations × 6 pages). A Filter tool then removes rows where [File 1] and [File 2] refer to the same file, reducing the dataset.
The dataset is now reduced to 12 records.
Each page now has two entries with the [File 1] and [File 2] data swapped. A Sort tool sorts the data by [Page] and [File 1] to ensure consistent ordering, followed by a Unique tool that retains only one row per [Page].
Finally, the dataset is refined to one row per page, containing both original and signed versions of file names, images, and bright pixel counts, ready for markup detection.
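Outside Alteryx, the restructuring in this step can be sketched in plain Python. The file names, page numbers, and pixel counts below are made up for illustration; only the logic (self-join on page, drop same-file pairs, sort, keep one row per page) mirrors the workflow:

```python
# Pure-Python sketch of Step 3's restructuring: self-join on Page,
# drop same-file pairings, sort, and keep one row per page.
# All sample values are made up for illustration.

rows = [
    {"File": "Original.pdf", "Page": 3, "Bright_Pixel_Count": 90000},
    {"File": "Signed.pdf",   "Page": 3, "Bright_Pixel_Count": 87000},
    {"File": "Original.pdf", "Page": 4, "Bright_Pixel_Count": 91000},
    {"File": "Signed.pdf",   "Page": 4, "Bright_Pixel_Count": 90500},
]

# Join the list to itself on Page (every file pairing per page) ...
pairs = [
    {"File 1": a["File"], "File 2": b["File"], "Page": a["Page"],
     "Count 1": a["Bright_Pixel_Count"], "Count 2": b["Bright_Pixel_Count"]}
    for a in rows for b in rows if a["Page"] == b["Page"]
]
# ... remove rows where both sides refer to the same file ...
pairs = [p for p in pairs if p["File 1"] != p["File 2"]]
# ... sort by Page then File 1, and keep the first row per page.
pairs.sort(key=lambda p: (p["Page"], p["File 1"]))
unique = {}
for p in pairs:
    unique.setdefault(p["Page"], p)
result = list(unique.values())

print([(p["Page"], p["File 1"], p["File 2"]) for p in result])
# [(3, 'Original.pdf', 'Signed.pdf'), (4, 'Original.pdf', 'Signed.pdf')]
```

Sorting before deduplicating is what guarantees the original file always lands in [File 1], which is why the workflow runs Sort ahead of Unique.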
Step 4: Markup Detection with the Computer Compare Tool
This step employs a simple test: a ratio is calculated from the bright pixel counts to identify pages likely containing markups, and any page whose ratio exceeds a defined threshold is flagged for manual review.
The ratio is calculated by taking the absolute difference in bright pixel counts between the two pages and dividing it by the smaller of the two bright pixel counts. This normalization is crucial as it accounts for potential size variations between scanned and original documents.
A Formula tool then generates text in a new [Test for Markups] field, indicating whether markups are likely present based on the calculated ratio and a determined threshold (0.046 in this example).
A final Filter tool isolates pages where [Test for Markups] does not equal “No markups here,” effectively filtering out pages without detected handwritten modifications.
The workflow successfully identifies pages 3 and 4 as containing markups, demonstrating the effectiveness of the computer compare tool in automating document review.
Conclusion: Embrace the Power of Computer Compare Tools
This workflow demonstrates a practical application of a computer compare tool to automate the detection of handwritten markups in documents. By leveraging Intelligence Suite and its Computer Vision capabilities, you can significantly reduce the manual effort involved in contract review and document verification. This approach not only saves time but also minimizes the risk of human error in identifying critical document modifications.
This part of the workflow provides the core functionality of the computer compare tool. Stay tuned for part 2, which will focus on enhancing the presentation of these results for improved readability and reporting.
Find contract pages with markups.yxmd