What is Watchman?
The Watchman project implements an HTTP server and Go library for searching, parsing, and downloading lists. We also have an example of the webhook service. Below, you can find a detailed list of features offered by Watchman:
- Download OFAC, BIS Denied Persons List (DPL), and various other data sources on startup
- Admin endpoint to manually refresh OFAC and DPL data
- Index data for searches
- Async searches and notifications (webhooks)
- Manual overrides to mark a
- Go library to download and parse the custom file formats used by OFAC and BIS DPL
When searching across all sanction lists, Watchman uses the Jaro–Winkler algorithm to score the probability of each search query matching a list entry. This follows the approach used by the US Treasury OFAC Search and recommended in academic literature.
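To make the scoring concrete, here is a minimal sketch of the textbook Jaro–Winkler similarity in Go. It is not Watchman's actual code, which may differ in detail. The function names `jaro` and `jaroWinkler`, the 0.1 prefix scaling factor, and the 4-character prefix limit are assumptions taken from the standard definition of the algorithm.

```go
package main

import "fmt"

// jaro returns the Jaro similarity of a and b in the range [0, 1].
func jaro(a, b string) float64 {
	r1, r2 := []rune(a), []rune(b)
	if len(r1) == 0 && len(r2) == 0 {
		return 1
	}
	if len(r1) == 0 || len(r2) == 0 {
		return 0
	}

	// Two equal characters only count as a match when their positions
	// are at most `window` apart.
	longest := len(r1)
	if len(r2) > longest {
		longest = len(r2)
	}
	window := longest/2 - 1
	if window < 0 {
		window = 0
	}

	m1 := make([]bool, len(r1))
	m2 := make([]bool, len(r2))
	matches := 0
	for i := range r1 {
		lo := i - window
		if lo < 0 {
			lo = 0
		}
		hi := i + window + 1
		if hi > len(r2) {
			hi = len(r2)
		}
		for j := lo; j < hi; j++ {
			if !m2[j] && r1[i] == r2[j] {
				m1[i], m2[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}

	// Count matched characters that appear in a different order.
	outOfOrder, j := 0, 0
	for i := range r1 {
		if !m1[i] {
			continue
		}
		for !m2[j] {
			j++
		}
		if r1[i] != r2[j] {
			outOfOrder++
		}
		j++
	}

	m := float64(matches)
	t := float64(outOfOrder) / 2 // transpositions
	return (m/float64(len(r1)) + m/float64(len(r2)) + (m-t)/m) / 3
}

// jaroWinkler boosts the Jaro score for strings sharing a common prefix.
func jaroWinkler(a, b string) float64 {
	const scaling, maxPrefix = 0.1, 4 // standard values, assumed here
	score := jaro(a, b)
	r1, r2 := []rune(a), []rune(b)
	prefix := 0
	for prefix < len(r1) && prefix < len(r2) && prefix < maxPrefix && r1[prefix] == r2[prefix] {
		prefix++
	}
	return score + float64(prefix)*scaling*(1-score)
}

func main() {
	fmt.Printf("%.4f\n", jaroWinkler("MARTHA", "MARHTA")) // prints 0.9611
}
```

A score of 1 means the two strings are identical and 0 means they share no matching characters. The prefix boost rewards names that agree on their first few characters.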
How are entities from the list indexed and used in search?
Entities from sanction lists and other data files are folded through various pre-computations prior to inclusion in the search index. This means the following steps will occur (in order):
- SDN name reordering: each individual's SDN name is re-ordered (example: from "MADURO MOROS, Nicolas" to "Nicolas MADURO MOROS").
- Company name cleanup: suffixes such as "CO.", "INC.", and "L.L.C." are removed from company names.
- Stopword removal: stopwords are removed. See bbalet/stopwords for a full list of supported languages and words subject to removal.
- Punctuation removal: punctuation is removed, along with extra spaces on both ends of the entity name.

Using the normalization methods in Go's golang.org/x/text package, we consolidate entity names and search queries for better searching across multiple languages.
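The pre-computation steps above can be sketched in Go using only the standard library. This is an illustration, not Watchman's implementation. The function names, the three-entry suffix list, and the choice to apply company cleanup only to non-individuals are all assumptions. Stopword removal (bbalet/stopwords) and x/text normalization are left out to keep the example self-contained.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// reorderSDNName turns "LAST, First" into "First LAST".
// Names without a comma are returned unchanged.
func reorderSDNName(name string) string {
	parts := strings.SplitN(name, ",", 2)
	if len(parts) != 2 {
		return name
	}
	return strings.TrimSpace(parts[1]) + " " + strings.TrimSpace(parts[0])
}

// companySuffixes is a hypothetical, abbreviated list for illustration.
var companySuffixes = map[string]bool{"co.": true, "inc.": true, "l.l.c.": true}

// removeCompanySuffixes drops any word that matches a known suffix,
// ignoring case and a trailing comma.
func removeCompanySuffixes(name string) string {
	var kept []string
	for _, word := range strings.Fields(name) {
		if !companySuffixes[strings.ToLower(strings.TrimSuffix(word, ","))] {
			kept = append(kept, word)
		}
	}
	return strings.Join(kept, " ")
}

// stripPunctuation removes punctuation runes and trims surrounding spaces.
func stripPunctuation(name string) string {
	cleaned := strings.Map(func(r rune) rune {
		if unicode.IsPunct(r) {
			return -1 // a negative value drops the rune
		}
		return r
	}, name)
	return strings.TrimSpace(cleaned)
}

// precompute applies the steps in order before a name is indexed.
func precompute(name string, isIndividual bool) string {
	if isIndividual {
		name = reorderSDNName(name)
	} else {
		name = removeCompanySuffixes(name)
	}
	return stripPunctuation(name)
}

func main() {
	fmt.Println(precompute("MADURO MOROS, Nicolas", true))   // prints Nicolas MADURO MOROS
	fmt.Println(precompute("ACME TRADING CO., INC.", false)) // prints ACME TRADING
}
```

Running search queries through the same steps as the indexed names keeps the two comparable when they are scored.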
Why are exact matches of words not ranked higher?
Watchman offers an environment variable called EXACT_MATCH_FAVORITISM that adjusts the weight of exact matches within a query. This value is a percentage (float64) added to exact matches prior to computing the final match percentage. Try 0.1, 0.25, or 0.5 in your testing.