This service handles requests asynchronously, meaning that when you send a job to the `submit` endpoint, the response will include a link to where you can find your results once the job is completed. While the job is processing, you can poll the `status` endpoint to find out when the job finishes. Then you can request the results from the `results` endpoint.
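The submit, poll and fetch cycle described above can be sketched as a small Python loop. This is a minimal sketch, not an official client. `get_status` and `get_results` are placeholders for your own calls to the `status` and `results` endpoints. The status strings `"done"` and `"failed"` are hypothetical, because this page does not document which status values the service returns.

```python
import time
from typing import Any, Callable, FrozenSet


def wait_for_results(
    get_status: Callable[[], str],
    get_results: Callable[[], Any],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    done_states: FrozenSet[str] = frozenset({"done"}),      # hypothetical status value
    failed_states: FrozenSet[str] = frozenset({"failed"}),  # hypothetical status value
) -> Any:
    """Poll a job's status until it finishes, then return its results.

    get_status and get_results are placeholders for your own calls to the
    status and results endpoints.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done_states:
            return get_results()
        if status in failed_states:
            raise RuntimeError(f"job failed with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Offline demo with stubbed endpoints: two polls report work in progress, the third reports completion.
statuses = iter(["queued", "running", "done"])
print(wait_for_results(lambda: next(statuses), lambda: {"results": []}, poll_interval=0))
# {'results': []}
```

Passing the endpoint calls in as functions keeps the polling logic separate from whatever HTTP library and authentication you use.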
## Authentication
All requests for this service require an authentication token which can be obtained through the LINCS Keycloak server. For up-to-date information on authenticating your requests, see the [NSSI Keycloak documentation](https://gitlab.com/calincs/conversion/NSSI/-/wikis/Keycloak/Keycloak-Guide).
## Submitting a Job
...
...
```
Where,
- `projectName` is a name for your own reference.
- `workflow` should always be set to `alberta_reconciliation` for this service. Other workflows are available for other [NSSI](https://gitlab.com/calincs/conversion/NSSI) modules.
- `authority` specifies which authority file your data should be compared against. To compare against multiple authority files, send a separate API request for each one. The following options are currently supported:
  - `viaf-works`
  - `viaf-expressions`
  - `wikidata-works`
  - `wikidata-authors`
- `matchNumber` specifies the maximum number of candidate matches that can be returned for a given input record. If not specified, it defaults to 3.
- `matchThreshold` specifies the minimum match score that a candidate match must have for it to be returned. This must be a value from 0 to 1, where 0 means that all potential matches would be returned and 1 means that only a perfect-scoring candidate match would be returned. If not specified, it defaults to 0.6. See the [Understanding Record Linkage](Understanding-Record-Linkage) section for details on what this score means.
- `data` should be a JSON array of records to be reconciled. The format of these records is specified below.
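The interaction between `matchNumber` and `matchThreshold` can be illustrated client-side. The sketch below applies the rules described above to a list of already-scored candidates:

1. Discard any candidate scoring below the threshold.
2. Sort the rest by descending score.
3. Keep at most `matchNumber`.

It only illustrates the documented rules and is not the service's implementation. The `(candidate_id, score)` tuple shape is an assumption made for the example.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (candidate id, match score); hypothetical shape


def select_matches(
    candidates: List[Candidate],
    match_number: int = 3,         # documented default for matchNumber
    match_threshold: float = 0.6,  # documented default for matchThreshold
) -> List[Candidate]:
    """Keep candidates scoring at least the threshold, best first, at most match_number."""
    if not 0 <= match_threshold <= 1:
        raise ValueError("matchThreshold must be between 0 and 1")
    kept = [c for c in candidates if c[1] >= match_threshold]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:match_number]


scored = [("a", 0.55), ("b", 0.91), ("c", 0.72), ("d", 0.64), ("e", 0.60)]
print(select_matches(scored))
# [('b', 0.91), ('c', 0.72), ('d', 0.64)]
```

In this example, candidate `e` meets the default threshold but is cut by the default `matchNumber` of 3.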
All of these fields are required for every request except `data`, `matchNumber`, and `matchThreshold`. Omit `data` if you upload a file with your request. If you include both a file and the `data` field, only `data` is used and the file is ignored. Details on including a file in your request can be found in the Submitting a Local File section below.
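The request body can be assembled from these rules as below. This sketch assumes the JSON shape used in the curl examples on this page: `projectName` and `workflow` at the top level, everything else under `context`. The helper name `build_submit_body` is hypothetical. Optional parameters are left out when not given, so the service defaults (3 and 0.6) apply. `data` is left out entirely when you intend to upload a file instead.

```python
import json
from typing import Any, Dict, List, Optional


def build_submit_body(
    project_name: str,
    authority: str,
    data: Optional[List[Dict[str, Any]]] = None,
    match_number: Optional[int] = None,
    match_threshold: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble a submit request body; unset options fall back to the service defaults."""
    context: Dict[str, Any] = {"authority": authority}
    if match_number is not None:
        context["matchNumber"] = match_number
    if match_threshold is not None:
        if not 0 <= match_threshold <= 1:
            raise ValueError("matchThreshold must be between 0 and 1")
        context["matchThreshold"] = match_threshold
    if data is not None:  # leave data out when uploading a local file instead
        context["data"] = data
    return {
        "projectName": project_name,
        "workflow": "alberta_reconciliation",  # always this value for this service
        "context": context,
    }


body = build_submit_body("My Project", "viaf-works", data=[{"unique_id": "r1"}], match_threshold=0.5)
print(json.dumps(body, indent=2))
```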
The possible fields within `data` vary for each `authority`, but for all authorities `unique_id` is the only required field. However, you should also provide at least one of `workName1` and `authorName1` (or `authorFirstName` and `authorLastName`), and ideally both. Without them, you are unlikely to get correct matches. If you don't have data for a field, you can omit it entirely or set its value to an empty string (i.e., `"field": ""`). Extra fields in the `data` objects are allowed. They are not compared to the authority records and have no effect on the results, although a large number of extra fields may slow down processing.
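Before submitting, you may want to check your records against the rules in the paragraph above:

- Every record needs a unique `unique_id`.
- A record with neither a work title nor an author name is unlikely to match.

The following is a minimal sketch of such a check. The function name and message wording are illustrative, not part of the service. Empty strings are treated the same as missing fields, as described above.

```python
from typing import Any, Dict, List


def check_records(records: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems found in a list of input records."""
    problems: List[str] = []
    seen = set()
    for i, rec in enumerate(records):
        uid = rec.get("unique_id")
        if uid in (None, ""):
            problems.append(f"record {i}: missing unique_id")
            continue
        if uid in seen:
            problems.append(f"record {i}: duplicate unique_id {uid!r}")
        seen.add(uid)
        has_title = bool(rec.get("workName1"))
        # authorName1, or both authorFirstName and authorLastName, counts as an author name.
        has_author = bool(rec.get("authorName1")) or (
            bool(rec.get("authorFirstName")) and bool(rec.get("authorLastName"))
        )
        if not (has_title or has_author):
            problems.append(f"record {i} ({uid!r}): no workName1 or author name; unlikely to match")
    return problems
```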
### VIAF Requests
POST a request to the submit endpoint with either `"authority": "viaf-works"` or `"authority": "viaf-expressions"` to compare your data against bibliographic records from [VIAF](http://viaf.org/). When using `viaf-works`, the input is compared against all VIAF entities of `nameType UniformTitleWork`. It is also compared against all works listed under entities of `nameType Personal`, even if the work itself is not a VIAF entity. When using `viaf-expressions`, the input is compared against all VIAF entities of `nameType UniformTitleExpression`.
VIAF work and expression records do not contain publisher information, but there are cases where a corporate entity is listed as the author. In these cases, you may have success listing the publisher as the author.
Here is an example request with explanations of all the possible fields in place of actual values:
```
curl --location --request POST 'https://api.nssi.stage.lincsproject.ca/api/v2/jobs/submit' \
...
...
"data": [
{
"unique_id": "Required id for each record. Needed so you can relate the results back to your data. Can be any unique value.",
"workID": "VIAF ID for the work. ID number or full URI are acceptable.",
"workName1": "Primary title of the work.",
"workName2": "Alternate title of the work.",
"workISNI": "ISNI for the work.",
"workPublicationDate": "Original publication date for the work. If full date is provided, only year will be compared.",
"workPublisher": "Work Publisher.",
"workLanguage": "Original language of the work. VIAF uses abbreviated language codes. The input data should be mapped accordingly by the user.",
"workWikidata": "Wikidata ID for the work. Q values or full URI are acceptable. This includes Wikidata URIs that exist in VIAF and results from our reverse lookup of VIAF IDs and ISNIs on Wikidata.",
"workWorldCat": "WorldCat ID for the work. ID number or full URI are acceptable.",
"authorID": "VIAF ID for the author. ID number or full URI are acceptable.",
"authorFirstName": "Author's given name(s). Will only be used if authorName3 is blank.",
"authorLastName": "Author's family name(s). Will only be used if authorName3 is blank.",
"authorName1": "Primary full name of the author.",
"authorName2": "Alternate full name of the author.",
"authorName3": "Alternate full name of the author.",
"authorSex": "Sex or gender of the author based on the authority's definition of those terms. VIAF lists gender. Accepted values are M, F, 0, 1, Male, and Female (not case-sensitive); any other value is treated as unknown.",
"authorBirthYear": "Author's birth date. If full date is provided, only year will be compared.",
"authorDeathYear": "Author's death date. If full date is provided, only year will be compared.",
"authorCountryOrigin": "Author's country of origin.",
"authorISNI": "ISNI for the author. Of the form XXXX XXXX XXXX XXXX or XXXXXXXXXXXXXXXX",
"authorLanguage": "Language listed for the author. VIAF uses abbreviated language codes. The input data should be mapped accordingly by the user.",
"authorLC": "Library of Congress ID for the author. Of the form https://www.worldcat.org/identities/lccn-n##########",
"authorWikidata": "Wikidata ID for the author. Q values or full URI are acceptable."
},
{
...
},
...
]
}
}'
```
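Two of the fields above have documented input conventions that are easy to check before submitting:

- `authorSex` values other than M, F, 0, 1, Male, and Female (case-insensitive) are counted as unknown.
- `authorISNI` is expected as 16 characters, either grouped in fours with spaces or run together.

The sketch below flags values that do not follow those conventions; the same checks apply to the `viaf-expressions` fields. Allowing `X` as the final ISNI character reflects the ISNI check-character convention and is an assumption beyond what this page states. The function name is illustrative.

```python
import re
from typing import Any, Dict, List

# Documented authorSex values, compared case-insensitively; anything else counts as unknown.
RECOGNIZED_SEX_VALUES = {"m", "f", "0", "1", "male", "female"}

# 16 characters, either "XXXX XXXX XXXX XXXX" or unspaced. Allowing a final X
# (the ISNI check character) is an assumption beyond what this page states.
ISNI_PATTERN = re.compile(r"^(?:\d{4} \d{4} \d{4} \d{3}[\dX]|\d{15}[\dX])$")


def flag_viaf_fields(record: Dict[str, Any]) -> List[str]:
    """Report authorSex and authorISNI values that do not follow the documented forms."""
    issues: List[str] = []
    sex = str(record.get("authorSex") or "").strip()
    if sex and sex.lower() not in RECOGNIZED_SEX_VALUES:
        issues.append(f"authorSex {sex!r} will be treated as unknown")
    isni = str(record.get("authorISNI") or "").strip()
    if isni and not ISNI_PATTERN.match(isni.upper()):
        issues.append(f"authorISNI {isni!r} is not in one of the documented forms")
    return issues


print(flag_viaf_fields({"authorSex": "Female", "authorISNI": "0000-0001-2345-6789"}))
# ["authorISNI '0000-0001-2345-6789' is not in one of the documented forms"]
```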
Here is an example for the `viaf-expressions` authority:
```
curl --location --request POST 'https://api.nssi.stage.lincsproject.ca/api/v2/jobs/submit' \
--header 'Content-Type: application/json' \
--header 'Authorization: {AuthToken}' \
--data-raw '{
"projectName": "My Project",
"workflow": "alberta_reconciliation",
"context": {
"authority": "viaf-expressions",
"matchNumber": 2,
"matchThreshold": 0.5,
"data": [
{
"unique_id": "Required id for each record. Needed so you can relate the results back to your data. Can be any unique value.",
"workID": "VIAF ID for the work associated with the expression. ID number or full URI are acceptable.",
"expressionID": "VIAF ID for the expression. Only used if authority is set to viaf-expressions.",
"expressionName1": "Primary title of the expression.",
"expressionName2": "Alternate title of the expression.",
"expressionLanguage": "Original language of the expression. VIAF uses abbreviated language codes. The input data should be mapped accordingly by the user.",
"expressionWikidata": "Wikidata ID for the expression. Q values or full URI are acceptable. This includes Wikidata URIs that exist in VIAF and results from our reverse lookup of VIAF IDs and ISNIs on Wikidata.",
"expressionWorldCat": "WorldCat ID for the expression. ID number or full URI are acceptable.",
"authorID": "VIAF ID for the author. ID number or full URI are acceptable.",
"authorFirstName": "Author's given name(s). Will only be used if authorName3 is blank.",
"authorLastName": "Author's family name(s). Will only be used if authorName3 is blank.",
"authorName1": "Primary full name of the author.",
"authorName2": "Alternate full name of the author.",
"authorName3": "Alternate full name of the author.",
"authorSex": "Sex or gender of the author based on the authority's definition of those terms. VIAF lists gender. Accepted values are M, F, 0, 1, Male, and Female (not case-sensitive); any other value is treated as unknown.",
"authorBirthYear": "Author's birth date. If full date is provided, only year will be compared.",
"authorDeathYear": "Author's death date. If full date is provided, only year will be compared.",
"authorCountryOrigin": "Author's country of origin.",
"authorISNI": "ISNI for the author. Of the form XXXX XXXX XXXX XXXX or XXXXXXXXXXXXXXXX",
"authorLanguage": "Language listed for the author. VIAF uses abbreviated language codes. The input data should be mapped accordingly by the user.",
"authorLC": "Library of Congress ID for the author. Of the form https://www.worldcat.org/identities/lccn-n##########",
"authorWikidata": "Wikidata ID for the author. Q values or full URI are acceptable."
},
{
...
...
### Wikidata Requests
POST a request to the submit endpoint with `"authority": "wikidata-works"` to compare your data against all written works in [Wikidata](https://www.wikidata.org/). Works in the scholarly article class are currently excluded; if you need them, open an issue and they can be added. Set `"authority": "wikidata-authors"` to compare your data against all people listed as authors in Wikidata. This captures a significant number of authors whose works are not listed as entities in Wikidata. Some filtering has been applied to remove authors of unrelated works (e.g., computer programs and visual art).
Here is an example request with explanations of all the possible fields in place of actual values:
```
...
...
"projectName": "My Project",
"workflow": "alberta_reconciliation",
"context": {
"authority": "wikidata-works",
"matchNumber": 2,
"matchThreshold": 0.5,
"data": [
...
...
"workName1": "Primary title of the work.",
"workName2": "Alternate title of the work.",
"workCountryOrigin": "Country of origin of the work (P495).",
"workCountryOriginID": "URI for country of origin",
"workEdition": "Edition of the work (P393).",
"workFormat": "Format of the work (P7937).",
"workFormatID": "URI of Work Format",
"workISBN10": "ISBN10 of the work (P957)",
"workISBN13": "ISBN13 of the work (P212)",
"workOCLC": "OCLC ID for the work (P5331)",
"workVIAF": "VIAF ID for the work (P214)",
"workISSN": "ISSN for the work (P236)",
"workLanguage": "Original language of the work (P407).",
"workLanguageID": "URI for work Language",
"workPublicationDate": "Original publication date for the work (P577). If full date is provided, only year will be compared.",
"workPublisher": "Work publisher (P123), given as a full Wikidata URI.",
"authorID": "Wikidata ID for the author. Q values or full URI are acceptable. Author (P2093 or P50) or creator (P170) if no author is listed.",
"authorFirstName": "Author's given name(s). Will only be used if authorName3 is blank.",
"authorLastName": "Author's family name(s). Will only be used if authorName3 is blank.",
...
...
"authorSex": "Sex or gender of the author based on authority's definition of those terms. Wikidata lists sex or gender (P21).",
"authorBirthYear": "Author's birth date (P569). If full date is provided, only year will be compared.",
"authorDeathYear": "Author's death date (P570). If full date is provided, only year will be compared.",
"authorVIAFID": "Author's VIAF ID (not in URI form).",
"authorISNI": "Author's ISNI ID (not in URI form)."
},
{
...
},
...
]
}
}'
```
And here is a version for `wikidata-authors`:
```
curl --location --request POST 'https://api.nssi.stage.lincsproject.ca/api/v2/jobs/submit' \
--header 'Content-Type: application/json' \
--header 'Authorization: {AuthToken}' \
--data-raw '{
"projectName": "My Project",
"workflow": "alberta_reconciliation",
"context": {
"authority": "wikidata-authors",
"matchNumber": 2,
"matchThreshold": 0.5,
"data": [
{
"unique_id": "Required id for each record. Needed so you can relate the results back to your data. Can be any unique value.",
"authorID": "Wikidata ID for the author. Q values or full URI are acceptable. Author (P2093 or P50) or creator (P170) if no author is listed.",
"authorFirstName": "Author's given name(s). Will only be used if authorName3 is blank.",
"authorLastName": "Author's family name(s). Will only be used if authorName3 is blank.",
"authorName1": "Primary full name of the author.",
"authorName2": "Alternate full name of the author.",
"authorName3": "Alternate full name of the author.",
"authorSex": "Sex or gender of the author based on authority's definition of those terms. Wikidata lists sex or gender (P21).",
"authorBirthYear": "Author's birth date (P569). If full date is provided, only year will be compared.",
"authorDeathYear": "Author's death date (P570). If full date is provided, only year will be compared.",
"authorVIAFID": "Author's VIAF ID (not in URI form).",
"authorISNI": "Author's ISNI ID (not in URI form)."
},
{
...
...
...
```
### Submitting a Local File
Rather than submitting your data in the JSON request body, you can submit a local data file with your request. The file should be in a structured format where each row or JSON object represents a single bibliographic record. Only one file can be submitted per job, but that file can contain thousands of records. The service accepts files in the following formats: `.csv`, `.tsv`, `.json`.
When submitting a local file, you still need to send the request parameters as a `request` form field. These are `projectName`, `workflow`, and all `context` parameters other than `data`. If you include both `data` in `context` and a local file, the file is ignored in processing.
Linking is based on the headers in your input file. For example, if you submit a `.csv` file, the headers of the columns you want used in the linking must exactly match fields shown in the JSON requests above for the selected authority. Any column whose header does not match one of those fields is ignored in the calculations. Remember that there must be a `unique_id` column so that you know which input records the results correspond to.
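Because columns are matched to fields by exact header name, checking the header row can catch typos before you upload. The sketch below assumes a `.csv` file and uses the `viaf-works` field names from the example above. For another authority, substitute that authority's field names. The function names are illustrative.

```python
import csv
from typing import Iterable, List, Tuple

# Field names from the viaf-works example above; swap in another authority's fields as needed.
VIAF_WORKS_FIELDS = {
    "unique_id", "workID", "workName1", "workName2", "workISNI", "workLanguage",
    "workPublicationDate", "workPublisher", "workWikidata", "workWorldCat",
    "authorID", "authorFirstName", "authorLastName", "authorName1", "authorName2",
    "authorName3", "authorSex", "authorBirthYear", "authorDeathYear",
    "authorCountryOrigin", "authorISNI", "authorLanguage", "authorLC", "authorWikidata",
}


def split_headers(header: List[str], known_fields: Iterable[str] = VIAF_WORKS_FIELDS) -> Tuple[List[str], List[str]]:
    """Split headers into (used, ignored); raise if there is no unique_id column."""
    if "unique_id" not in header:
        raise ValueError("input file needs a unique_id column")
    known = set(known_fields)
    used = [h for h in header if h in known]
    ignored = [h for h in header if h not in known]
    return used, ignored


def check_csv_headers(path: str, known_fields: Iterable[str] = VIAF_WORKS_FIELDS) -> Tuple[List[str], List[str]]:
    """Read only the header row of a CSV file and split it with split_headers."""
    # utf-8-sig also strips a byte-order mark, which spreadsheet exports often add.
    with open(path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), [])
    return split_headers(header, known_fields)
```

Matching is case-sensitive here, as the exact-match rule above implies, so a column named `authorname1` would be reported as ignored.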
Here is an example request that sends a local file to the reconciliation service for processing:
```
curl --location --request POST 'https://api.nssi.stage.lincsproject.ca/api/v2/jobs/submit' \
--header 'Authorization: {AuthToken}' \
--form 'file=@"/localFilePath/filename.csv"' \
--form 'request="{
\"projectName\": \"Test Project\",
\"workflow\": \"alberta_reconciliation\",
\"context\": {
\"authority\": \"viaf-works\",
\"matchNumber\": 7,
\"matchThreshold\": 0
}
}";type=application/json'
```
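Escaping JSON by hand inside a shell string is easy to get wrong. For instance, an apostrophe inside a single-quoted shell string must be written as `'\''`. If you drive curl from a script, you can generate the `request` form value with a JSON serializer instead. The Python sketch below builds the value and wraps it in the double-quoted, backslash-escaped form that the curl example above uses. The helper names are hypothetical, and the quoting copies that example rather than a separately verified curl rule.

```python
import json
import shlex


def request_form_value(project_name: str, authority: str,
                       match_number: int, match_threshold: float) -> str:
    """Serialize the non-data part of a file-upload request as JSON."""
    return json.dumps({
        "projectName": project_name,
        "workflow": "alberta_reconciliation",
        "context": {  # no "data" key: the uploaded file supplies the records
            "authority": authority,
            "matchNumber": match_number,
            "matchThreshold": match_threshold,
        },
    })


def curl_form_arg(value: str) -> str:
    """Wrap a JSON string like the curl example: double-quoted, with \\ and " backslash-escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'request="{escaped}";type=application/json'


arg = curl_form_arg(request_form_value("My Team's Project", "viaf-works", 7, 0))
# shlex.quote turns the argument into a single shell word, so the apostrophe needs no manual escaping.
print("--form " + shlex.quote(arg))
```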
### Submit Response
...
...
The output is a JSON response body containing the `unique_id` of each input record that had candidate matches in the authority file, along with all available fields for those candidate matches. For each input record, the response includes up to `matchNumber` best matches in decreasing order of `matchScore`. If an input record had no candidate matches, or none that met the `matchThreshold`, its `unique_id` will not appear in the results.
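Because records without qualifying candidates are simply absent from the results, you may want to list which inputs went unmatched. The layout of the results body is not specified on this page. This sketch therefore assumes you have already extracted the `unique_id` values present in the response. The helper name is hypothetical.

```python
from typing import Iterable, List


def unmatched_ids(submitted: Iterable[str], returned: Iterable[str]) -> List[str]:
    """Return submitted unique_ids absent from the results, preserving input order."""
    returned_set = set(returned)
    return [uid for uid in submitted if uid not in returned_set]


print(unmatched_ids(["r1", "r2", "r3", "r4"], ["r3", "r1"]))  # ['r2', 'r4']
```

Make sure both lists use the same type for IDs. For example, a numeric `1` read from a file will not compare equal to the string `"1"`.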