Commit 923086f5 authored by Karl Ward

Handle DOIs containing unicode formatting or control chars.

parent 4599b6ea
@@ -44,12 +44,16 @@
 (defn normalize-short-doi [s]
   (when s (.toLowerCase (extract-short-doi s))))
 
+;; Regex below is a hack to handle broken CrossRef DOIs that contain
+;; non printable characters.
 (defn to-long-doi-uri
   "Ensure a long DOI is in a normalized URI form."
   [s]
   (when s
-    (ids/get-id-uri :long-doi (normalize-long-doi s))))
+    (->> (.replaceAll s "[^\\p{Print}]" "")
+         (normalize-long-doi)
+         (ids/get-id-uri :long-doi))))
 
 (defn to-short-doi-uri
   "Ensure a short DOI is in a normalized URI form."
   [s]
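
Reviewer note: the stripping step added to `to-long-doi-uri` can be tried in isolation. The sketch below is not part of the commit; `strip-non-printable` is a hypothetical helper name and the DOIs are made-up examples. It assumes the JVM's default regex behaviour, where the POSIX class `\p{Print}` covers only ASCII 0x20 through 0x7E.

```clojure
;; Sketch only: `strip-non-printable` is a hypothetical name, not code from this commit.
(defn strip-non-printable
  "Remove every character outside Java's POSIX \\p{Print} class
  (ASCII 0x20-0x7E unless UNICODE_CHARACTER_CLASS is enabled)."
  [^String s]
  (.replaceAll s "[^\\p{Print}]" ""))

;; A zero-width space (U+200B, a Unicode formatting character) and a NUL
;; control character embedded in a DOI are both dropped.
(strip-non-printable "10.1000/\u200Bxyz\u0000123")
;; => "10.1000/xyz123"

;; Caveat: because \p{Print} is ASCII-only by default, ordinary non-ASCII
;; characters are dropped too, not just formatting and control characters.
(strip-non-printable "10.1000/caf\u00e9")
;; => "10.1000/caf"
```

The second call shows a side effect of the approach: a DOI suffix that legitimately contains non-ASCII characters would be altered by this replacement, which may be worth keeping in mind given that the in-code comment already calls the regex a hack.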
......
This source diff could not be displayed because it is too large.