What is Watchman?
The Watchman project implements an HTTP server and Go library for searching, parsing, and downloading lists. Below, you can find a detailed list of features offered by Watchman:
- Download OFAC, BIS Denied Persons List, Consolidated Screening List, and various other data sources on startup
- Admin endpoint to manually refresh OFAC, CSL, and other data sources
- Index data for searches
- Libraries for OFAC, US CSL, UK/EU CSL, and BIS DPL data to download and parse their custom files
When searching across all sanction lists, Watchman uses the Jaro–Winkler algorithm to score how likely each search query is to match a list entry. This follows the approach used by the US Treasury OFAC Search and recommended in academic literature.
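To make the scoring idea concrete, here is a minimal sketch of the textbook Jaro–Winkler similarity in Go. It is not Watchman's actual implementation, which may differ in its details. The function names jaro and jaroWinkler, the 0.1 prefix weight, and the 4-character prefix limit are the standard textbook choices and are assumptions here.

```go
package main

import "fmt"

// jaro returns the Jaro similarity of two strings in the range [0, 1].
func jaro(s1, s2 string) float64 {
	a, b := []rune(s1), []rune(s2)
	if len(a) == 0 || len(b) == 0 {
		if len(a) == len(b) {
			return 1
		}
		return 0
	}
	longest := len(a)
	if len(b) > longest {
		longest = len(b)
	}
	// Characters only count as matching within this distance of each other.
	window := longest/2 - 1
	if window < 0 {
		window = 0
	}
	aMatched := make([]bool, len(a))
	bMatched := make([]bool, len(b))
	matches := 0
	for i := range a {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > len(b) {
			hi = len(b)
		}
		for j := lo; j < hi; j++ {
			if !bMatched[j] && a[i] == b[j] {
				aMatched[i], bMatched[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Count matched characters that appear in a different order.
	outOfOrder, j := 0, 0
	for i := range a {
		if !aMatched[i] {
			continue
		}
		for !bMatched[j] {
			j++
		}
		if a[i] != b[j] {
			outOfOrder++
		}
		j++
	}
	m := float64(matches)
	t := float64(outOfOrder) / 2 // transpositions
	return (m/float64(len(a)) + m/float64(len(b)) + (m-t)/m) / 3
}

// jaroWinkler boosts the Jaro score for strings sharing a common prefix
// (up to 4 characters, weight 0.1 per character).
func jaroWinkler(s1, s2 string) float64 {
	score := jaro(s1, s2)
	a, b := []rune(s1), []rune(s2)
	prefix := 0
	for prefix < len(a) && prefix < len(b) && prefix < 4 && a[prefix] == b[prefix] {
		prefix++
	}
	return score + float64(prefix)*0.1*(1-score)
}

func main() {
	fmt.Printf("%.4f\n", jaroWinkler("MARTHA", "MARHTA"))
	fmt.Printf("%.4f\n", jaroWinkler("nicolas maduro", "nicholas maduro"))
}
```

Scores close to 1.0 indicate a near match, and identical strings score exactly 1.0.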
FAQ
How are entities from the list indexed and used in search?
Entities from sanction lists and other data files pass through several pre-computation steps before they are added to the search index. The following steps occur in order:
- SDN Reordering: Each individual's SDN name is re-ordered (Example: from "MADURO MOROS, Nicolas" to "Nicolas MADURO MOROS").
- Company Name Cleanup: Suffixes from company names such as "CO.", "INC.", "L.L.C.", etc. are removed.
- Stopword Removal: Stopwords are removed. See bbalet/stopwords for a full list of supported languages and words subject to removal.
- UTF-8 Normalization: Punctuation is removed along with extra spaces on both ends of the entity name. Using Go's /x/text normalization methods we consolidate entity names and search queries for better searching across multiple languages.
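The steps above can be sketched as a small pipeline. This is an illustrative approximation using only the Go standard library, not Watchman's code: the function names, the short suffix list, and the three-word stopword list are all hypothetical, and the Unicode normalization that Watchman performs with /x/text is left out to keep the example self-contained.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Illustrative lists only (assumptions); the real lists are larger.
var companySuffixes = []string{"CO.", "INC.", "L.L.C."}
var stopwords = map[string]bool{"the": true, "of": true, "and": true}

// reorderSDNName turns "LAST, First" into "First LAST".
func reorderSDNName(name string) string {
	parts := strings.SplitN(name, ",", 2)
	if len(parts) != 2 {
		return name
	}
	return strings.TrimSpace(parts[1]) + " " + strings.TrimSpace(parts[0])
}

// dropWords removes every whitespace-separated word for which drop returns true.
func dropWords(name string, drop func(string) bool) string {
	var kept []string
	for _, w := range strings.Fields(name) {
		if !drop(w) {
			kept = append(kept, w)
		}
	}
	return strings.Join(kept, " ")
}

func isCompanySuffix(word string) bool {
	for _, s := range companySuffixes {
		if strings.EqualFold(word, s) {
			return true
		}
	}
	return false
}

func isStopword(word string) bool {
	return stopwords[strings.ToLower(word)]
}

// stripPunctuation removes punctuation and trims spaces on both ends.
func stripPunctuation(name string) string {
	cleaned := strings.Map(func(r rune) rune {
		if unicode.IsPunct(r) {
			return -1 // drop the rune
		}
		return r
	}, name)
	return strings.TrimSpace(cleaned)
}

// precompute applies the steps in the documented order.
func precompute(name string, isIndividual bool) string {
	if isIndividual {
		name = reorderSDNName(name)
	}
	name = dropWords(name, isCompanySuffix)
	name = dropWords(name, isStopword)
	return stripPunctuation(name)
}

func main() {
	fmt.Println(precompute("MADURO MOROS, Nicolas", true))
	fmt.Println(precompute("The Bank of ACME, INC.", false))
}
```

Search queries would go through the same cleanup, so both sides of the comparison are in the same form before scoring.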
Why are exact matches of words not ranked higher?
Watchman offers an environment variable called EXACT_MATCH_FAVORITISM that adjusts the weight of exact matches within a query. This value is a percentage (float64) added to exact matches prior to computing the final match percentage. Try 0.1, 0.25, or 0.5 in your testing.
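As a rough illustration of how such a boost could affect a score, the sketch below reads EXACT_MATCH_FAVORITISM and adds it to every exactly matching word before averaging. How Watchman actually combines per-word scores is not described here, so the averaging, the cap at 1.0, and the function names favoritismFromEnv and finalScore are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// favoritismFromEnv reads EXACT_MATCH_FAVORITISM, falling back to 0
// when the variable is unset or not a valid float.
func favoritismFromEnv() float64 {
	v, err := strconv.ParseFloat(os.Getenv("EXACT_MATCH_FAVORITISM"), 64)
	if err != nil {
		return 0
	}
	return v
}

// finalScore adds the boost to each exact word match (score 1.0),
// averages the per-word scores, and caps the result at 1.0.
// The averaging and the cap are assumptions, not Watchman's documented behavior.
func finalScore(wordScores []float64, favoritism float64) float64 {
	if len(wordScores) == 0 {
		return 0
	}
	total := 0.0
	for _, s := range wordScores {
		if s >= 1.0 {
			s += favoritism
		}
		total += s
	}
	avg := total / float64(len(wordScores))
	if avg > 1.0 {
		avg = 1.0
	}
	return avg
}

func main() {
	scores := []float64{1.0, 0.8} // one exact word match, one partial match
	fmt.Printf("no boost:   %.3f\n", finalScore(scores, 0))
	fmt.Printf("with boost: %.3f\n", finalScore(scores, favoritismFromEnv()))
}
```

Running this with EXACT_MATCH_FAVORITISM=0.25 versus leaving it unset shows how a query containing an exact word match moves up relative to one with only partial matches.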