File Dataset Ingestion
The POST /v2/ingest/{fileType} endpoint in Watchman lets you upload CSV files containing entity data (e.g., businesses or persons) for ingestion.
The endpoint processes the file according to the schema defined in the Watchman YAML configuration and returns a JSON response with the parsed entities.
The parsed entities are included in Watchman’s memory for search, but kept as a separate list. Callers must specify an entity type matching the fileType.
Path Parameters
- fileType (required): The type of file being ingested, corresponding to a specific configuration in the Watchman YAML (e.g., fincen-business or fincen-person).
Request Body
- A CSV file containing entity data, with headers matching the schema defined in the Watchman configuration.
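As a rough illustration of the request described above, the following sketch builds the upload request with Python's standard library. The base URL (http://localhost:8084), the text/csv content type, and sending the CSV as the raw request body are all assumptions; the text above does not say whether a multipart upload is expected instead. The helper name build_ingest_request is hypothetical.

```python
import urllib.parse
import urllib.request


def build_ingest_request(base_url: str, file_type: str, csv_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/ingest/{fileType} request."""
    # Escape the path parameter in case a file type contains unusual characters.
    path = "/v2/ingest/" + urllib.parse.quote(file_type, safe="")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=csv_bytes,
        method="POST",
        # Assumption: the CSV is sent as the raw body with this content type.
        headers={"Content-Type": "text/csv"},
    )


csv_bytes = b'business_name,tracking_number\n"Acme Corp","12345"\n'
# Assumption: Watchman is reachable at this address.
req = build_ingest_request("http://localhost:8084", "fincen-business", csv_bytes)

# To send the request against a running server:
# with urllib.request.urlopen(req) as resp:
#     body = resp.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing the code at a real server.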
Response
The response is a JSON object containing the parsed entities and the file type. The structure is as follows:
{
  "fileType": string,
  "entities": []searchEntity
}
- fileType: The type of file ingested (e.g., fincen-business).
- entities: An array of parsed entities, each containing fields like name, type, sourceID, and additional fields based on the entity type (e.g., person or business).
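A minimal sketch of reading such a response on the client side is shown below. The sample body is hypothetical and only uses the fields named above (fileType, entities, name, type, sourceID); a real response may carry additional entity-specific fields. The function name summarize_entities is an assumption for illustration.

```python
import json

# Hypothetical response body, limited to the fields documented above.
sample_body = """
{
  "fileType": "fincen-business",
  "entities": [
    {"name": "Acme Corp", "type": "business", "sourceID": "12345"}
  ]
}
"""


def summarize_entities(body: str) -> list[tuple[str, str, str]]:
    """Return (sourceID, type, name) for each parsed entity in the response."""
    payload = json.loads(body)
    return [
        (e.get("sourceID", ""), e.get("type", ""), e.get("name", ""))
        for e in payload.get("entities", [])
    ]


summary = summarize_entities(sample_body)
```

Using .get with defaults keeps the summary tolerant of entities that omit one of these fields.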
YAML Configuration
Below is an example Watchman YAML configuration for two file types: fincen-business and fincen-person.
Watchman:
  Ingest:
    files:
      fincen-business:
        format: csv
        mapping:
          name:
            column: business_name
          sourceID:
            column: tracking_number
          type:
            default: "business"
          business:
            name:
              column: business_name
            altNames:
              columns: dba_name
            created:
              column: incorporated
            governmentIDs:
              type:
                column: number_type
              identifier:
                column: number
          contact:
            phoneNumbers:
              columns: phone
          addresses:
            line1:
              columns: street
            city:
              columns: city
            state:
              columns: state
            postalCode:
              columns: zip
            country:
              columns: country
      fincen-person:
        format: csv
        mapping:
          name:
            merge: [first_name, suffix, middle_name, last_name]
          sourceID:
            column: tracking_number
          type:
            default: "person"
          person:
            name:
              merge: [first_name, middle_name, last_name]
            altNames:
              merge: [alias_first_name, alias_suffix, alias_middle_name, alias_last_name]
            birthDate:
              column: dob
            governmentIDs:
              type:
                column: number_type
              identifier:
                column: number
          contact:
            phoneNumbers:
              columns: phone
          addresses:
            line1:
              columns: street
            city:
              columns: city
            state:
              columns: state
            postalCode:
              columns: zip
            country:
              columns: country
Schema Explanation
- format: Specifies the file format. Currently, only csv is supported.
- mapping: Defines how CSV columns map to entity fields. The mapping supports:
  - name: The entity’s name, either from a single column or a merge of multiple columns (e.g., combining first_name and last_name).
  - sourceID: A unique identifier for the entity, mapped to a single column.
  - type: The entity type (e.g., business or person), set via a default value.
  - business or person: Entity-specific fields, such as:
    - name: The primary name (can be redundant with the top-level name).
    - altNames: Alternate names, mapped to a single columns entry or a merge of multiple columns.
    - created (business) or birthDate (person): A date field, parsed from a column using formats like 2006-01-02, 1/2/2006, or 01/02/2006.
    - governmentIDs: Government-issued IDs, with type (e.g., tax-id) and identifier mapped to columns.
  - contact: Contact information, currently supporting phoneNumbers mapped to a columns field.
  - addresses: Physical addresses, with fields like line1, city, state, postalCode, and country, each mapped to a columns field.
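To make the merge and date rules above concrete, here is a small sketch of how a client might mimic them when preparing or checking data. It is not Watchman's implementation. Two assumptions are made: merged columns are joined with a single space and empty values are skipped, and the listed date formats are read as year-month-day and month/day/year. The helper names merge_columns and parse_date are hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# Assumption: 2006-01-02 means year-month-day; 1/2/2006 and 01/02/2006 mean
# month/day/year. strptime accepts both padded and unpadded month/day values.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def merge_columns(row: dict, columns: list) -> str:
    """Join the named columns in order, skipping empty values (assumed behavior)."""
    parts = [row.get(c, "").strip() for c in columns]
    return " ".join(p for p in parts if p)


def parse_date(value: str) -> Optional[date]:
    """Try each accepted format; return None when nothing matches."""
    value = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


row = {"first_name": "John", "middle_name": "A", "last_name": "Doe", "dob": "1985-03-22"}
full_name = merge_columns(row, ["first_name", "middle_name", "last_name"])
birth_date = parse_date(row["dob"])
```

With the sample row, full_name is "John A Doe" and birth_date is March 22, 1985. The same date written as 3/22/1985 or 03/22/1985 parses to the same value.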
CSV File Requirements
- The CSV must include headers that match the column names specified in the Watchman configuration.
- Each row represents a single entity.
- Fields like dates must conform to one of the accepted formats (2006-01-02, 1/2/2006, 01/02/2006).
- Missing or empty fields are handled gracefully (e.g., skipped or set to empty values).
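The header requirement can be checked on the client before uploading. The sketch below is a hypothetical pre-check, not part of Watchman. The column list is taken from the fincen-business mapping in the example configuration. How the server itself reacts to a missing header is not stated here.

```python
import csv
import io

# Columns referenced by the fincen-business mapping in the example configuration.
REQUIRED_BUSINESS_COLUMNS = [
    "business_name", "tracking_number", "dba_name", "incorporated",
    "number_type", "number", "phone", "street", "city", "state", "zip", "country",
]


def missing_headers(csv_text: str, required: list) -> list:
    """Return the required column names that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {h.strip() for h in header}
    return [c for c in required if c not in present]


sample_csv = (
    "business_name,tracking_number,dba_name,incorporated,number_type,number,"
    "phone,street,city,state,zip,country\n"
    '"Acme Corp","12345","Acme Inc","2020-01-15","tax-id","EIN123",'
    '"555-1234","123 Main St","Springfield","IL","62701","US"\n'
)
missing = missing_headers(sample_csv, REQUIRED_BUSINESS_COLUMNS)
```

An empty result means every column named in the mapping appears in the header row.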
Example CSV for fincen-business
business_name,tracking_number,dba_name,incorporated,number_type,number,phone,street,city,state,zip,country
"Acme Corp","12345","Acme Inc","2020-01-15","tax-id","EIN123","555-1234","123 Main St","Springfield","IL","62701","US"
Example CSV for fincen-person
first_name,middle_name,last_name,suffix,tracking_number,alias_first_name,alias_middle_name,alias_last_name,alias_suffix,dob,number_type,number,phone,street,city,state,zip,country
"John","A","Doe","Jr","67890","Johnny","B","Doe","","1985-03-22","ssn","123-45-6789","555-5678","456 Oak St","Springfield","IL","62701","US"
Search the File
You can then perform searches against the ingested file.
GET /v2/search?type=fincen-person&name=John+Doe&type=person
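For callers who build the query programmatically, the sketch below reproduces the example query string exactly as written above, including both type parameters. The base URL is an assumption, and the parameter list is copied from the example rather than from a full description of the search API.

```python
from urllib.parse import urlencode

# Parameters copied in order from the example request above. A list of pairs
# is used because the same key ("type") appears twice.
params = [
    ("type", "fincen-person"),
    ("name", "John Doe"),
    ("type", "person"),
]

query = urlencode(params)  # spaces are encoded as "+"

# Assumption: Watchman is reachable at this address.
search_url = "http://localhost:8084/v2/search?" + query
```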