Model Validation

A model is defined as “a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates.” Watchman fits this definition: it statistically scores names, addresses, government identifiers, and other data against watchlists from numerous public sources. Watchman is a modern software application used by many companies in production deployments at cloud scale.

Data Sources

By default, Watchman draws on lists published by several government agencies and other public sources. These are typically lists of entity data (names, addresses, government IDs, etc.) that are published regularly. Watchman periodically downloads these data files and re-indexes the data. By default this refresh occurs every 12 hours; the interval can be configured, and a refresh can also be initiated manually. This keeps the indexed data current while the service stays available.

Sources List

Users can configure webhook notifications that are sent after the data files are refreshed, and use them to initiate custom processes. Custom data files can also be used with Watchman.

Watchman indexes the data sources in a normalized form to improve search rankings. The data cleanup steps and typical search patterns are documented.
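As a rough illustration of what indexing in a normalized form can mean, the sketch below strips accents, lowercases, replaces punctuation with spaces, and collapses whitespace. The function name normalize_name and the specific steps are assumptions made for this example; they are not taken from Watchman's actual cleanup pipeline.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Hypothetical normalization; not Watchman's actual cleanup steps."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then turn punctuation into spaces.
    lowered = stripped.lower()
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    # Collapse runs of whitespace into single spaces.
    return " ".join(no_punct.split())


print(normalize_name("  José  O'Neil-Smith, Jr. "))
```

Applying the same normalization to both the indexed data and the incoming query means that differences in case, accents, or punctuation do not lower the similarity score.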

Scoring

Watchman uses Jaro-Winkler string comparison to score each query. Each word part is ordered and compared to the indexed data. The model, scoring, and search rankings are verified on every source code commit and release of Watchman. Monthly checks are performed to verify that no unexpected changes have occurred. Changes to the scoring are thoroughly analyzed prior to inclusion, as the results returned can have large impacts. Users of Watchman should experiment with different match tolerances against known positive and negative matches.
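One way to experiment with tolerances is to take a set of past queries whose outcome is already known and count the false positives and false negatives at several thresholds. The sketch below assumes the scores have already been collected; the LabeledScore type, the confusion_at helper, and every number in the sample list are invented for illustration and are not Watchman output.

```python
from typing import List, NamedTuple, Tuple


class LabeledScore(NamedTuple):
    score: float    # similarity score returned for a query
    is_match: bool  # reviewer's decision: True if it was a real match


def confusion_at(samples: List[LabeledScore], threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) at the given threshold."""
    false_pos = sum(1 for s in samples if s.score >= threshold and not s.is_match)
    false_neg = sum(1 for s in samples if s.score < threshold and s.is_match)
    return false_pos, false_neg


# Invented sample values, for illustration only.
samples = [
    LabeledScore(0.99, True),
    LabeledScore(0.93, True),
    LabeledScore(0.91, False),
    LabeledScore(0.86, True),
    LabeledScore(0.82, False),
    LabeledScore(0.70, False),
]

for threshold in (0.80, 0.85, 0.90, 0.95):
    fp, fn = confusion_at(samples, threshold)
    print(f"threshold={threshold:.2f} false_positives={fp} false_negatives={fn}")
```

Raising the threshold reduces false positives at the cost of more false negatives, so the right tolerance depends on each user's risk appetite.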

Jaro-Winkler is a public algorithm for comparing two strings of text to determine their similarity. Scores range from 0.0 (no similarity) to 1.0 (exact match). It is a modification of the Jaro algorithm that adds a boost for strings sharing a common prefix, which makes it well suited to human and street names. There are two parameters, with defaults boostThreshold=0.7 (the Jaro score above which the boost is applied) and prefixSize=4 (the maximum prefix length considered). See “Other Links” below for references.
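For readers who want to see the arithmetic, here is a textbook Jaro-Winkler sketch in Python. It is not Watchman's code. The parameter names mirror boostThreshold and prefixSize above, their interpretation follows common implementations, and the scaling factor of 0.1 is the conventional value, assumed here rather than taken from Watchman.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(
    s1: str,
    s2: str,
    boost_threshold: float = 0.7,  # assumed meaning: boost only above this Jaro score
    prefix_size: int = 4,          # assumed meaning: longest prefix that earns a boost
    scaling: float = 0.1,          # conventional value, assumed
) -> float:
    score = jaro(s1, s2)
    if score <= boost_threshold:
        return score
    prefix = 0
    for a, b in zip(s1[:prefix_size], s2[:prefix_size]):
        if a != b:
            break
        prefix += 1
    return score + scaling * prefix * (1.0 - score)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
print(round(jaro_winkler("DWAYNE", "DUANE"), 4))
```

The two pairs are the classic examples from the record-linkage literature. The shared prefix "MAR" lifts the first pair's Jaro score of about 0.944 to about 0.961.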

Periodic searches of names, addresses, IDs, etc. can be performed in two ways. Watchman supports “watches”, which are run after source data is refreshed and deliver results via webhooks. Alternatively, the HTTP endpoints can be called to get the current scoring. Watchman is built to handle a high volume of queries.

Search queries return better results when they include multiple criteria. Name-only queries are prone to false positive matches, so including addresses, alternate names, and other fields is recommended.

Filtering

OFAC searches can add filters to include only results of a certain type. These types can be individuals, businesses, aircraft, and vessels. SDN results can also be filtered by their OFAC program. Address searches can be filtered by country.
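The effect of these filters can be pictured with a small client-side sketch. The Result fields and the filter_results helper are hypothetical names chosen for this example, not Watchman's API, and the sample records are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Result:
    name: str
    entity_type: str  # "individual", "business", "aircraft", or "vessel"
    programs: List[str] = field(default_factory=list)
    score: float = 0.0


def filter_results(
    results: List[Result],
    entity_type: Optional[str] = None,
    program: Optional[str] = None,
) -> List[Result]:
    """Keep only the results that satisfy every filter that was supplied."""
    kept = []
    for result in results:
        if entity_type is not None and result.entity_type != entity_type:
            continue
        if program is not None and program not in result.programs:
            continue
        kept.append(result)
    return kept


# Invented placeholder records, for illustration only.
results = [
    Result("Example Person", "individual", ["SDGT"], 0.97),
    Result("Example Shipping Co", "business", ["CUBA"], 0.91),
    Result("Example Vessel", "vessel", ["CUBA"], 0.88),
]

for result in filter_results(results, program="CUBA"):
    print(result.name, result.entity_type, result.score)
```

Filters narrow the candidate set without changing the scores themselves.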

Deep Inspection

OFAC searches can include exact matches on ID values (e.g. Government ID). These are in the “Remarks” section of each entity.
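A minimal sketch of an exact ID match against remarks text follows. The remarks string is a made-up example of the general shape such text can take, and remarks_contain_id is a hypothetical helper; Watchman's own matching may work differently.

```python
import re


def remarks_contain_id(remarks: str, id_value: str) -> bool:
    """Return True if id_value appears in remarks as a whole token."""
    if not id_value:
        return False
    # Reject hits that are only part of a longer identifier.
    pattern = r"(?<![A-Za-z0-9])" + re.escape(id_value) + r"(?![A-Za-z0-9])"
    return re.search(pattern, remarks) is not None


# Made-up remarks text, for illustration only.
remarks = "DOB 01 Jan 1970; Passport A1234567 (Exampleland); National ID No. 987654321."

print(remarks_contain_id(remarks, "A1234567"))  # whole identifier
print(remarks_contain_id(remarks, "A123456"))   # truncated identifier
```

Matching on whole tokens means a truncated or partial identifier does not count as an exact match.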

Checks Not Performed

BSA/AML programs have requirements that are outside of Watchman, such as ownership calculations (thresholds, shell corporations, indirect majority shares), family relationships, and other risk analysis.

Reporting

Watchman does not store search results or rankings. Users of Watchman are expected to store this information according to their risk and compliance needs.
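Since retention is left to the user, one simple approach is to write each query and its results to a local audit table. The sketch below uses SQLite from the Python standard library. The table name, columns, and helper functions are assumptions chosen for this example, and the sample query and result are invented.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical audit store for search results."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS search_audit (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               searched_at TEXT NOT NULL,
               query TEXT NOT NULL,
               results_json TEXT NOT NULL
           )"""
    )
    return conn


def record_search(conn: sqlite3.Connection, query: dict, results: list) -> int:
    """Persist one query with its results; return the new row id."""
    searched_at = datetime.now(timezone.utc).isoformat()
    cursor = conn.execute(
        "INSERT INTO search_audit (searched_at, query, results_json) VALUES (?, ?, ?)",
        (searched_at, json.dumps(query, sort_keys=True), json.dumps(results)),
    )
    conn.commit()
    return cursor.lastrowid


conn = open_store()
row_id = record_search(
    conn,
    {"name": "Example Person"},                       # invented query
    [{"entity": "Example Person", "score": 0.93}],  # invented result
)
print("stored audit row", row_id)
```

Storing the timestamp alongside the raw results makes it possible to show later what was returned at the time of the search, even after the source lists have changed.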

A web UI is included with Watchman for inspecting OFAC results.