Using the API (authored by Natalie Hervieux)
This service handles requests asynchronously, meaning that when you send a job to the `submit` endpoint, the response will include a link to where you can find your results once the job is completed. While the job is processing, you can poll the `status` endpoint to find out when the job finishes. Then you can request the results from the `results` endpoint.
## Authentication (TO UPDATE)
All requests for this service require an authentication token which can be obtained through the NSSI authentication server. For up-to-date information on authenticating your requests, see the [NSSI documentation](https://gitlab.com/calincs/conversion/NSSI/-/wikis/home).
Here is an example authentication request:
```
curl -L --request POST 'https://auth.nssi.stage.lincsproject.ca/oauth/token' \
--header 'Authorization: {authentication}' \
--form 'scope=nerve_client' \
--form 'grant_type=password' \
--form 'username={username}' \
--form 'password={password}'
```
## Submitting a Job
The details for submitting each type of job are included in later sections, but all request bodies will share the following shape:
```json
{
"projectName":"My Project",
"workflow": "alberta_reconciliation",
"context": {
"authority": "viaf-works",
"matchNumber": 3,
"matchThreshold": 0.7,
"data":[]
}
}
```
Where,
- `workflow` should always be set to `alberta_reconciliation` for this service. Other workflows are available for other [NSSI](https://gitlab.com/calincs/conversion/NSSI) modules.
- `authority` specifies which authority file your data should be compared against. To compare against multiple authority files, send a separate API request for each one. The following options are currently supported:
  - `viaf-works`
  - `viaf-expressions`
  - `wikidata-bibliographic`
- `matchNumber` specifies the maximum number of candidate matches that can be returned for a given input record. If not specified, it will default to 3.
- `matchThreshold` specifies the minimum match score that a candidate match must have for it to be returned. This must be a value from 0 to 1, where 0 means that all potential matches can be returned and 1 means that only a perfect scoring candidate match would be returned. If not specified, it will default to 0.6. See [TODO add section] for details on what this score means.
- `data` should be a JSON array of records to be reconciled. The format of this section is specified below.
Every field except `data` is required in each request. Omit `data` when you upload a file with your request: if a request contains both a file and a `data` field in the body, only the `data` field is used and the file is ignored.
The possible fields within `data` vary for each `authority`, but for all authorities, `unique_id` is the only required field within `data`. However, you should also provide `workName1` and `authorName1` (or `authorFirstName` and `authorLastName`); without them you are unlikely to get correct matches. If you have no data for a field, you can omit it entirely or set its value to an empty string (i.e. `"field": ""`). Extra fields in the `data` JSON objects are allowed. They are not compared to the authority records and have no effect on the results, although a large number of extra fields may slow down processing.
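The rules above can be checked before a request is sent. The following is a minimal sketch, not part of the service: `build_request_body` is a hypothetical helper, and the duplicate check assumes that `unique_id` values must be unique within one request. The defaults mirror the documented defaults for `matchNumber` and `matchThreshold`.

```python
import json

# Authorities listed as supported on this page.
SUPPORTED_AUTHORITIES = {"viaf-works", "viaf-expressions", "wikidata-bibliographic"}


def build_request_body(project_name, authority, records,
                       match_number=3, match_threshold=0.6):
    """Return a submit request body as a dict, raising ValueError on bad input."""
    if authority not in SUPPORTED_AUTHORITIES:
        raise ValueError(f"unsupported authority: {authority!r}")
    if not 0 <= match_threshold <= 1:
        raise ValueError("matchThreshold must be a value from 0 to 1")
    seen = set()
    for record in records:
        uid = record.get("unique_id")
        if uid is None or uid == "":
            raise ValueError("every record needs a unique_id")
        if uid in seen:
            raise ValueError(f"duplicate unique_id: {uid!r}")
        seen.add(uid)
    return {
        "projectName": project_name,
        "workflow": "alberta_reconciliation",
        "context": {
            "authority": authority,
            "matchNumber": match_number,
            "matchThreshold": match_threshold,
            "data": list(records),
        },
    }


# Illustrative record; the values are made up for the example.
example = build_request_body(
    "My Project",
    "viaf-works",
    [{"unique_id": "rec-1", "workName1": "Emma", "authorName1": "Jane Austen"}],
    match_number=2,
    match_threshold=0.5,
)
print(json.dumps(example, indent=2))
```

Validating locally like this only catches the mistakes described on this page; the service may apply further checks of its own.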
### VIAF Requests
POST a request to the submit endpoint with either `"authority": "viaf-works"` or `"authority": "viaf-expressions"` to compare your data against bibliographic records from [VIAF](http://viaf.org/). When using `"viaf-works"`, the input will be compared against all VIAF entities of `nameType UniformTitleWork` and all works listed under entities of `nameType Personal` even if the work itself is not a VIAF entity. When using `viaf-expressions`, the input will be compared against all VIAF entities of `nameType UniformTitleExpression`.
Here is an example request with explanations of all the possible fields in place of actual values:
```
curl -L --request POST 'https://api.nssi.stage.lincsproject.ca/api/v2/jobs/submit' \
--header 'Content-Type: application/json' \
--header 'Authorization: {AuthToken}' \
--data-raw '{
"projectName": "My Project",
"workflow": "alberta_reconciliation",
"context": {
"authority": "viaf-works",
"matchNumber": 2,
"matchThreshold": 0.5,
"data": [
{
"unique_id": "Required id for each record. Needed so you can relate the results back to your data. Can be any unique value.",
"workID": "VIAF ID for the work. ID number or full URI is acceptable.",
"workName1": "Primary title of the work.",
"workName2": "Alternate title of the work.",
"workISNI": "ISNI for the work.",
"workLanguage": "Original language of the work.",
"workPublicationDate": "Original publication date for the work. If full date is provided, only year will be compared.",
"workPublisher": "Work Publisher.",
"workWikidata": "Wikidata ID for the work. Q values or full URI are acceptable.",
"workWorldCat": "WorldCat ID for the work. ID number or full URI is acceptable.",
"authorID": "VIAF ID for the author. ID number or full URI is acceptable.",
"authorFirstName": "Author's given name(s). Will only be used if authorName3 is not.",
"authorLastName": "Author's family name(s). Will only be used if authorName3 is not.",
"authorName1": "Primary full name of the author.",
"authorName2": "Alternate full name of the author.",
"authorName3": "Alternate full name of the author.",
"authorSex": "Sex or gender of the author based on authority's definition of those terms. VIAF lists gender.",
"authorBirthYear": "Author's birth date. If full date is provided, only year will be compared.",
"authorDeathYear": "Author's death date. If full date is provided, only year will be compared.",
"authorCountryOrigin": "Author's country of origin.",
"authorISNI": "ISNI for the author.",
"authorLanguage": "",
"authorLC": "Library of Congress ID for the author.",
"authorWikidata": "Wikidata ID for the author. Q values or full URI are acceptable."
},
{
...
},
...
]
}
}'
```
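For readers who prefer Python over curl, here is a minimal sketch of the same VIAF submission using only the standard library. `make_submit_request` is a hypothetical helper, the record values are illustrative, and `{AuthToken}` is the same placeholder used in the curl example. The sketch builds the request without sending it, so it can be inspected first.

```python
import json
import urllib.request

SUBMIT_URL = "https://api.nssi.stage.lincsproject.ca/api/v2/jobs/submit"


def make_submit_request(body, auth_token, url=SUBMIT_URL):
    """Build (but do not send) a POST request carrying `body` as JSON."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": auth_token,
        },
        method="POST",
    )


# Illustrative record; only unique_id is required by the service.
body = {
    "projectName": "My Project",
    "workflow": "alberta_reconciliation",
    "context": {
        "authority": "viaf-works",
        "matchNumber": 2,
        "matchThreshold": 0.5,
        "data": [
            {
                "unique_id": "rec-1",
                "workName1": "Pride and Prejudice",
                "authorName1": "Jane Austen",
            }
        ],
    },
}

request = make_submit_request(body, "{AuthToken}")
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen(request)
# after replacing the placeholder token with a real one.
```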
### Wikidata Requests
POST a request to the submit endpoint with `"authority": "wikidata-bibliographic"` to compare your data against all written works in Wikidata. In the future, this will be updated to include all authors in Wikidata even if they don't have any works in Wikidata.
Here is an example request with explanations of all the possible fields in place of actual values:
```
curl -L --request POST 'https://api.nssi.stage.lincsproject.ca/api/v2/jobs/submit' \
--header 'Content-Type: application/json' \
--header 'Authorization: {AuthToken}' \
--data-raw '{
"projectName": "My Project",
"workflow": "alberta_reconciliation",
"context": {
"authority": "wikidata-bibliographic",
"matchNumber": 2,
"matchThreshold": 0.5,
"data": [
{
"unique_id": "Required id for each record. Needed so you can relate the results back to your data. Can be any unique value.",
"workID": "Wikidata ID for the work. Q values or full URI are acceptable.",
"workName1": "Primary title of the work.",
"workName2": "Alternate title of the work.",
"workCountryOrigin": "Country of origin of the work (P495).",
"workEdition": "Edition of the work (P393).",
"workFormat": "Format of the work (P7937).",
"workISBN10": "ISBN10 of the work (P957)",
"workISBN13": "ISBN13 of the work (P212)",
"workOCLC": "OCLC ID for the work (P5331)",
"workVIAF": "VIAF ID for the work (P214)",
"workISSN": "ISSN for the work (P236)",
"workLanguage": "Original language of the work. (P407)",
"workPublicationDate": "Original publication date for the work (P577). If full date is provided, only year will be compared.",
"workPublisher": "Work Publisher (P123)",
"authorID": "Wikidata ID for the author. Q values or full URI are acceptable. Author (P2093 or P50) or creator (P170) if no author is listed.",
"authorFirstName": "Author's given name(s). Will only be used if authorName3 is not.",
"authorLastName": "Author's family name(s). Will only be used if authorName3 is not.",
"authorName1": "Primary full name of the author.",
"authorName2": "Alternate full name of the author.",
"authorName3": "Alternate full name of the author.",
"authorSex": "Sex or gender of the author based on authority's definition of those terms. Wikidata lists sex or gender (P21).",
"authorBirthYear": "Author's birth date (P569). If full date is provided, only year will be compared.",
"authorDeathYear": "Author's death date (P570). If full date is provided, only year will be compared."
},
{
...
},
...
]
}
}'
```
### Submitting a Local File
Rather than submitting your data as a JSON request body, you also have the option to submit a local data file with your request. This file should be a structured format where each row or json object represents a single bibliographic record. Only one file can be submitted per job, but that single file can contain thousands of records. The service accepts files in the following formats: `.csv`, `.tsv`, `.json`, `.parquet`.
When submitting a local file, you still need to include a request body with all the `context` parameters other than `data`. If you include `data` in `context` and a local file, then the file will be ignored in processing.
The linking is based on the headers in your input file. For example, if you submit a .csv file, the headers of the columns you want to use in the linking must exactly match some or all of the fields in the above JSON requests for the selected authority. Any column whose header does not match one of those fields will be ignored in the calculations. Remember that there must be a `unique_id` column so that you know which input records the results correspond to.
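Because unmatched headers are silently ignored, it can help to check a file locally before uploading it. The sketch below is one way to do that for a .csv file, assuming the VIAF field names from the example request earlier on this page; `check_headers` is a hypothetical helper and the comparison is case-sensitive, since headers must match exactly.

```python
import csv
import io

# Field names copied from the VIAF example request on this page.
VIAF_FIELDS = {
    "unique_id", "workID", "workName1", "workName2", "workISNI",
    "workLanguage", "workPublicationDate", "workPublisher", "workWikidata",
    "workWorldCat", "authorID", "authorFirstName", "authorLastName",
    "authorName1", "authorName2", "authorName3", "authorSex",
    "authorBirthYear", "authorDeathYear", "authorCountryOrigin",
    "authorISNI", "authorLanguage", "authorLC", "authorWikidata",
}


def check_headers(csv_text, known_fields=VIAF_FIELDS):
    """Report which CSV headers would be used and which would be ignored."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader, [])
    return {
        "has_unique_id": "unique_id" in headers,
        "used": [h for h in headers if h in known_fields],
        "ignored": [h for h in headers if h not in known_fields],
    }


# Illustrative file contents; "Title" is not a recognised field name.
sample = "unique_id,workName1,Title,authorName1\n1,Emma,Emma,Jane Austen\n"
report = check_headers(sample)
print(report)
```

For a real file, read its text with `open(path, newline="", encoding="utf-8").read()` and pass that to `check_headers`.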
Here is an example request that sends a local file to the reconciliation service for processing:
```
curl -L --request POST 'https://api.nssi.stage.lincsproject.ca/api/v2/jobs/submit' \
--header 'Authorization: {authToken}' \
--form 'file=@/localPath/data.csv' \
--form 'request={
"projectName":"Natalie'\''s Test Project",
"workflow": "alberta_reconciliation",
"context": {
"authority": "viaf-works",
"matchNumber": 3,
"matchThreshold": 0.7
}
}'
```
### Submit Response
When you successfully POST a job to the `submit` endpoint, you can expect to receive a `202 Accepted` response of the form:
```
{
"jobId": 42,
"resultsUri": "https://api.nssi.stage.lincsproject.ca/api/v2/reconciliation/results/42"
}
```
Make note of the `resultsUri` so that you can retrieve the results from the job.
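A small sketch of how a client might keep track of a submitted job: it parses the response shown above and derives the status URL from `jobId`. `parse_submit_response` is a hypothetical helper, and the status URL pattern is an assumption taken from the example status request on this page rather than from a formal specification.

```python
import json

API_BASE = "https://api.nssi.stage.lincsproject.ca/api/v2"


def parse_submit_response(response_text, api_base=API_BASE):
    """Extract the job id and the URLs needed to follow up on a job."""
    payload = json.loads(response_text)
    job_id = payload["jobId"]
    return {
        "job_id": job_id,
        "results_uri": payload["resultsUri"],
        # Assumed pattern: {api_base}/jobs/{jobId}/status
        "status_uri": f"{api_base}/jobs/{job_id}/status",
    }


response_text = """
{
  "jobId": 42,
  "resultsUri": "https://api.nssi.stage.lincsproject.ca/api/v2/reconciliation/results/42"
}
"""
job = parse_submit_response(response_text)
print(job["status_uri"])
```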
## Checking Request Status
To find out when your job has finished processing and your results are available, you can poll the `status` endpoint with GET requests. For example:
```
curl -L --request GET 'https://api.nssi.stage.lincsproject.ca/api/v2/jobs/42/status' \
--header 'Authorization: {authToken}'
```
Responses for existing jobs are of the following shape, where `status` can be `IN_PROGRESS`, `READY`, `FAILED`, or `CANCELLED`:
```
{
"jobId": 42,
"status": "IN_PROGRESS"
}
```
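Polling can be sketched as a loop that stops at the first status other than `IN_PROGRESS`. In the sketch below, `wait_for_job` is a hypothetical helper, and the interval and attempt limit are arbitrary assumptions rather than service recommendations. The HTTP call is passed in as `fetch_status`, so the example runs here against canned responses instead of the live endpoint.

```python
import time

# Statuses after which the job will not change any further.
TERMINAL_STATUSES = {"READY", "FAILED", "CANCELLED"}


def wait_for_job(fetch_status, interval_seconds=5.0, max_attempts=120,
                 sleep=time.sleep):
    """Poll until the job leaves IN_PROGRESS and return its final status.

    `fetch_status` is any callable returning the parsed status response,
    for example {"jobId": 42, "status": "IN_PROGRESS"}.
    """
    for attempt in range(max_attempts):
        status = fetch_status()["status"]
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job still in progress after {max_attempts} polls")


# Canned responses standing in for real GET requests to the status endpoint.
canned = iter([
    {"jobId": 42, "status": "IN_PROGRESS"},
    {"jobId": 42, "status": "IN_PROGRESS"},
    {"jobId": 42, "status": "READY"},
])
final = wait_for_job(lambda: next(canned), interval_seconds=0)
print(final)
```

Only a `READY` status means results can be requested; a caller would normally treat `FAILED` and `CANCELLED` as errors.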
Note that results from old jobs are cleaned up periodically, so save your results as soon as they are available.
## Getting the Results
Using `resultsUri` from the response to your POST request, you can retrieve your results as follows:
```
curl -L --request GET 'https://api.nssi.stage.lincsproject.ca/api/v2/reconciliation/results/42' \
--header 'Authorization: {authToken}'
```
The output is a JSON response body listing the `unique_id` of each input record that had candidate matches in the authority file, along with all available authority-file fields for each candidate match. For each input record, the response includes up to `matchNumber` of the best matches, in decreasing order of `matchScore`. If an input record had no candidate matches, or none that met the `matchThreshold` condition, that record's id will not be present in the results response.
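To make the interaction between `matchNumber` and `matchThreshold` concrete, here is a small sketch of the selection rule as described above. It illustrates the documented behaviour and is not the service's implementation: `select_matches` is a hypothetical function, and the candidate dictionaries with `id` and `matchScore` keys are an assumed shape.

```python
def select_matches(candidates, match_number=3, match_threshold=0.6):
    """Keep candidates at or above the threshold, best first, capped in number."""
    kept = [c for c in candidates if c["matchScore"] >= match_threshold]
    kept.sort(key=lambda c: c["matchScore"], reverse=True)
    return kept[:match_number]


# Made-up candidates for a single input record.
candidates = [
    {"id": "a", "matchScore": 0.90},
    {"id": "b", "matchScore": 0.55},
    {"id": "c", "matchScore": 0.75},
    {"id": "d", "matchScore": 0.80},
]

best = select_matches(candidates, match_number=2, match_threshold=0.7)
print([c["id"] for c in best])

# With a threshold of 1, nothing here qualifies, so this record
# would be absent from the results.
print(select_matches(candidates, match_threshold=1))
```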
Expected Response Example: (TODO)