Expose internal structure of "desc" generated by zotxt, for export filter
Created by: titaniumbones
Is there currently any way to access the individual components of the "Description" string that zotxt hands over to zotxt-emacs? I can't see a way to add org-mode markup to the description; I would like to be able to do this for purposes of exporting bibliographies to HTML and ODT.
Hmm, looking at the source, I'm starting to wonder if this functionality would need to be added directly in the zotxt extension itself, or (gasp) even in Zotero, as I think the text string comes directly from the extension.
Another alternative, I guess, would be to translate the HTML output generated by zotxt into org-mode syntax somehow. I don't know how to do that either...
Thanks,
Mat
Imported comments:
By titaniumbones on 2015-01-26 19:48:34+00:00
Wait, for HTML this should be easy -- just write a function that grabs the formatted bibliography from zotxt and inserts the html result from the json response; then add
(if (eq format 'html) (my-custom-zotxt-function path))
to the definition of the link type. But what would my-custom-zotxt-function look like? Any suggestions?
Thanks, Matt
By egh on 2015-01-27 18:02:34+00:00
Sorry, bitbucket notifications seem to be completely broken, so I'm just seeing this.
If you want to generate a bibliography and output the HTML, that should be pretty easy. See here https://bitbucket.org/egh/zotxt/src/a12d538ae9245b142fdb55550b2d241e43b82221/tests/test.rb?at=master#cl-118 for how to generate a bibliography.
If you want to generate org mode syntax, see the function I posted below (slightly refined over the one on the zotero-dev).
I'm curious what the use case is here. I'm trying to make org-zotxt export citations, but I'm not sure the best way to go about it. Right now I have a solution that I think would work well enough, but would be different if you wanted to export latex or markdown, and doesn't have a solution for HTML export.
#!emacs-lisp
(require 'pcase)

(defun org-zotxt-parse-htmlstring (html)
  (with-temp-buffer
    (insert html)
    (libxml-parse-html-region (point-min) (point-max))))

(defun org-zotxt-htmlstring2org (html)
  (org-zotxt-htmltree2org (org-zotxt-parse-htmlstring html)))

(defun org-zotxt-htmltree2org (html)
  (pcase html
    ((pred (stringp)) html)
    (`(a ,attrs . ,children)
     (format "[[%s][%s]]" (cdr (assq 'href attrs))
             (org-zotxt-htmltree2org children)))
    (`(i ,attrs . ,children)
     (format "/%s/" (org-zotxt-htmltree2org children)))
    (`(b ,attrs . ,children)
     (format "*%s*" (org-zotxt-htmltree2org children)))
    (`(p ,attrs . ,children)
     (format "%s\n\n" (org-zotxt-htmltree2org children)))
    (`(span ,attrs . ,children)
     (pcase (cdr (assq 'style attrs))
       ("font-style:italic;"
        (format "/%s/" (org-zotxt-htmltree2org children)))
       ("font-variant:small-caps;"
        ;; no way?
        (org-zotxt-htmltree2org children))
       (_ (org-zotxt-htmltree2org children))))
    ((or `(html ,attrs . ,children)
         `(body ,attrs . ,children))
     (org-zotxt-htmltree2org children))
    ((pred (lambda (h) (and (listp h)
                            (or (stringp (car h))
                                (and (listp (car h))
                                     (symbolp (car (car h))))))))
     ;; list of strings or elements
     (mapconcat #'org-zotxt-htmltree2org html ""))))

(org-zotxt-htmlstring2org "<p><a href=\"http://example.org/\">hello</a> <span style=\"font-style:italic;\">world<br/> foo</span> <b>foo</b></p>")
By titaniumbones on 2015-01-28 02:27:13+00:00
Hi Erik,
Thanks again for this. So, I have 2 use cases:
(1) I write in org-mode but exchange all my papers for final editing in .doc or .docx format, with colleagues who only use MS Office. I would like to stay in org-mode all the way through to the very last stages of pre-circulation; but to do this, I need to be able to generate the appropriate odf output. This seems like a somewhat difficult and finicky parsing operation, but one thought I had was to start with org-mode syntax and use org's native parsers. In this case, it is important to retain "live" zotero citations so that links can be updated within LibreOffice when I am making final revisions. My first thoughts about how to do this properly may be a little silly...
(2) Every semester I write two or three syllabi; I am grateful to finally have the opportunity to select sources within emacs and have them appear in my org files! However, I need to be able to export these syllabi to .odf and .html. In this case, retaining "live" zotero citations is not important for me; so just the ability to write out the appropriate styling information will be enough. I thought that, if I used org-mode syntax, I could then make use of org's native export filters.
Does any of that make sense? I am pretty far behind you on this, and my thoughts are, I think, ill-formed.
Thanks again,
Matt
PS -- Christian Moe is discussing his own extensions to zotxt on the org mailing list -- his modifications are, I think, more likely to be useful than mine, but I will try to contribute something once I understand better how to do it. I wish my lisp were better! At present my difficulty is understanding how to modify the zotxt-choose functions so that they retain the html that the zotxt extension generates; from there I think I can probably figure out how to make use of the html expression. I can see this is a very simple task but I don't understand lisp expressions very well.
Another idea is to write an org parser for citeproc-js.
Thanks again!
By egh on 2015-01-28 03:45:22+00:00
Hi Matt
Thank you for the details! Your use case makes absolute sense. Have you considered using org->markdown->odt/html via pandoc and pandoc-zotxt? I suspect that the easiest thing might be to ensure that this works in some way.
Thanks for the pointer to the org-mode discussion. I hadn't seen it yet since I read org-mode via gmane. I'm going to have a look at it now.
What I really want to have is a way of exporting org-zotxt links/citations to latex or markdown (and maybe HTML) with page refs, etc., but the design and implementation is a little tricky. I think we should set up a wiki page somewhere to flesh out ideas.
The org-zotxt code is a little tricky because it uses the deferred library so as not to lock up emacs while searching zotero, which can take some time. It took me a long time to wrap my head around how it works. Thanks!
By egh on 2015-01-28 05:29:53+00:00
A solution you can use right now is to change:
(org-add-link-type "zotero"
                   (lambda (rest)
                     (zotxt-select-key (substring rest 15)))
                   (lambda (path desc format)
                     (if (string-match "^@\\(.*\\)$" desc)
                         (cond ((eq format 'latex)
                                (format "\\cite{%s}" (match-string 1 desc)))
                               ((eq format 'md)
                                desc)
                               (t nil))
                       nil)))
And customize org-zotxt-link-description-style to :easykey
and then write org files like:
[see [[zotero://select/items/0_4T8MCITQ][@doe:2006article]], p. 10]
that is, using pandoc citations around org-zotxt links that have a description @doe:2006article. You can then export to markdown:
[see @doe:2006article, p. 10]
And use pandoc-zotxt (see the zotxt repo):
pandoc -F pandoc-zotxt -F pandoc-citeproc file.md -s
or for docx output:
pandoc -F pandoc-zotxt -F pandoc-citeproc file.md -t docx -o file.docx
But I hope the org-mode community can come to some sort of consensus about citations!
By titaniumbones on 2015-01-28 18:14:59+00:00
Hi Erik,
I am willing to explore the pandoc route but it does seem a little convoluted. For now, for a very quick hack allowing me to export my syllabi to html, I'm trying to write a quick function that I can add to the link definition:
#!elisp
(defun org-zotxt-get-html-bib (path)
(let ((key (org-zotxt-extract-link-id-from-link path) ))
(lexical-let ((d (deferred:new)))
(request
(format "%s/items" zotxt-url-base)
:params `(("key" . ,key)
("method" . ,(cdr (assq method zotxt-quicksearch-method-params)))
("format" . "bibliography"))
:parser 'json-read
:success (function*
(lambda (&key data &allow-other-keys)
(let* ((output (mapcar (lambda (e)
(cdr (assq 'html e))
)
data)
))
(with-output-to-temp-buffer "*results*"
(print output)
)
(deferred:callback-post
d (if (null citation) nil
output))
))))
d))
))
(org-add-link-type "zotero"
(lambda (rest)
(zotxt-select-key (substring rest 15)))
(lambda (path desc format)
(if (string-match "^@\\(.*\\)$" desc)
(cond ((eq format 'latex)
(format "\\cite{%s}" (match-string 1 desc)))
((eq format 'md)
desc)
((eq format 'html)
(org-zotxt-get-html-bib path)
(t nil))
nil)))
I suspect the syntax in the html cond is wrong, at the end, but also my org-zotxt-get-html-bib doesn't seem to work properly -- I would think it would return the desired html from the server, the way the -choose- functions do, but I am not getting any output. I think it's because I don't understand the deferred library (at all!). Can you give me any hints? Thanks, Matt
By egh on 2015-01-29 03:53:45+00:00
Hi Matt,
I just pushed a change to zotxt-emacs to avoid throwing away the html data when fetching the bibliography. Once you have that, here is a link type to use:
#!emacs-lisp
(org-add-link-type "zotero"
                   (lambda (rest)
                     (zotxt-select-key (substring rest 15)))
                   (lambda (path desc format)
                     (if (string-match "^@\\(.*\\)$" desc)
                         (cond ((eq format 'latex)
                                (format "\\cite{%s}" (match-string 1 desc)))
                               ((eq format 'md)
                                desc)
                               ((eq format 'html)
                                ;; fetch the bibliography entry and block until it arrives
                                (deferred:$
                                  (zotxt-get-item-bibliography-deferred `(:key ,(substring path 15)))
                                  (deferred:nextc it
                                    (lambda (item)
                                      (plist-get item :citation-html)))
                                  (deferred:sync! it)))
                               (t nil))
                       nil)))
The deferred library basically allows you to chain together a series of asynchronous functions. In this case we only have one function, then we can deferred:sync! to wait until we have the value. Hope that helps. This is probably going to be slow because it hits zotxt for every single item.
By egh on 2015-01-29 03:55:08+00:00
Oh, and this will depend on your link descriptions being in easykey format. If you want to change that you'll need to modify the string-match.
By titaniumbones on 2015-01-29 15:27:19+00:00
This is fantastic, thank you!
Two questions:
- Is it necessary to wrap the (cond ) in an (if (string-match...))? That is, wouldn't one want these citations to be exported no matter what the format?
- It appears to me that the html text does not currently include live URLs. So, for instance, one of my sources produces the following html:
<div style="line-height: 1.35; padding-left: 2em; text-indent:-2em;" class="csl-bib-body">
<div class="csl-entry">Coleman, Gabriella. “Geeks Are the New Guardians of Our Civil Liberties.” <i>MIT Technology Review</i>. Accessed August 19, 2014. http://www.technologyreview.com/news/510641/geeks-are-the-new-guardians-of-our-civil-liberties/.</div>
<span class="Z3988" title="url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fzotero.org%3A2&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&rft.type=webpage&rft.title=Geeks%20are%20the%20New%20Guardians%20of%20Our%20Civil%20Liberties&rft.identifier=http%3A%2F%2Fwww.technologyreview.com%2Fnews%2F510641%2Fgeeks-are-the-new-guardians-of-our-civil-liberties%2F&rft.aufirst=Gabriella&rft.aulast=Coleman&rft.au=Gabriella%20Coleman"></span>
</div>
The URL is simply reproduced without being linked. I think a more desirable default in HTML export, at least, would be a live link. Would this necessitate changes in zotxt? In Zotero? In CSL? Or would it be better to fix it with some pattern-matching of my own?
By egh on 2015-01-29 15:58:38+00:00
This is the CSL style. You can change it with the variable zotxt-default-bibliography-style, but I'm not sure which style incorporates links, if any.
By titaniumbones on 2015-01-29 16:23:03+00:00
Yeah, I can't find any styles that do incorporate links. I think some kind of pattern matching on the output is the best bet, but as I'm lousy at writing such matches, it will take me a while to actually fix that problem...
Thanks again!
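The pattern-matching idea could be sketched roughly as below. This is only an illustration, not part of zotxt or citeproc-js: the function name linkifyBareUrls is hypothetical, it assumes the bibliography HTML arrives as a plain string, and it deliberately skips URLs that follow a quote or a tag so that existing href attributes and link text are left alone.

```javascript
// Hypothetical helper (not part of zotxt): wrap bare http(s) URLs in <a> tags.
// Only URLs at the start of the string or after whitespace or "(" are touched,
// so URLs already inside href="..." attributes are not rewritten.
function linkifyBareUrls(html) {
  const urlPattern = /(^|[\s(])(https?:\/\/[^\s<>"]+)/g;
  return html.replace(urlPattern, (match, lead, url) => {
    // Keep sentence punctuation that follows the URL outside the link.
    const trailing = (url.match(/[.,;:!?)]+$/) || [''])[0];
    const clean = trailing ? url.slice(0, url.length - trailing.length) : url;
    return `${lead}<a href="${clean}">${clean}</a>${trailing}`;
  });
}
```

A regex like this is a blunt tool: a URL that legitimately ends in a period or closing parenthesis would be truncated, so it is best treated as a stopgap for one's own export.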
By egh on 2015-01-29 16:25:47+00:00
I think my previous comment is wrong. See http://citationstylist.org/2014/06/20/csl-dynamic-linking-in-citeproc-js/ and https://github.com/zotero/zotero/pull/505
It sounds like something we could implement in zotxt.
By titaniumbones on 2015-01-29 17:29:39+00:00
Wow, that doesn't look too complicated. So could one add something like:
sys.variableWrapper = function (params, prePunct, str, postPunct) {
  if ((params.variableNames[0] === 'title' || params.variableNames[0] === 'URL')
      && params.itemData.URL
      && params.context === "bibliography") {
    return prePunct
      + '<a href="'
      + params.itemData.URL
      + '">'
      + str
      + '</a>'
      + postPunct;
  } else {
    return (prePunct + str + postPunct);
  }
};
to mySys (defined ~line 59 of bootstrap.js of zotxt)? It looks like working with Zotero complicated the relationship to citeproc somewhat, and you're doing something slightly more complicated than Frank describes in that blog post.
By egh on 2015-01-29 18:05:57+00:00
I did a little more research, and it seems getting links is even easier. We just need to set a flag in the processor. I'll commit later, but it's:
#!javascript
function makeCslEngine (styleId) {
    let style = z.Styles.get(fixStyleId(styleId));
    if (!style) {
        return null;
    } else {
        let csl = style.getCiteProc(true);
        csl.opt.development_extensions.wrap_url_and_doi = true;
        return csl;
    }
}
By titaniumbones on 2015-01-29 18:29:55+00:00
wow, so easy -- two-line patch!
I'll try it out (I think I can just edit bootstrap.js in my installation, right?).
By egh on 2015-01-29 18:44:34+00:00
Yes, that should work. You may need to edit when ff is not running.
By titaniumbones on 2015-01-29 19:17:23+00:00
it totally works! it's great.
In my org-add-link-type "zotero" I added an extra cond outside of the initial if in the lambda -- it seems to me that I will especially want this behaviour when I am using full, bibliography-style citations in the text (as in syllabi). I guess it should be straightforward to expand the logic to work with more cases.
This is a big step forward for me, thank you Erik! I still have to figure out how to get citations into ODT, but maybe some of that work will come out on the list.
By egh on 2015-01-29 19:56:47+00:00
Glad it works! You might try pandoc for HTML -> ODT translation as well. :) It's a pretty magic tool.
By titaniumbones on 2015-01-30 02:18:08+00:00
OK, not quite right -- it looks like, in some cases, the "delimiter" (".") gets added to the URL in the href. So, for instance, I have a citation for this source:
(harvested via Zotero in Firefox some time ago). When I export to html, I get this code:
<div class="csl-entry">Broder, John M., and Ian Urbina. “All Eyes Turn to Virginia Senate Race.” <i>The New York Times</i>, November 9, 2006, sec. /. <a href="http://www.nytimes.com/2006/11/09/us/politics/09virginia.html?ex=1320728400&en=e65ed62ff1814d9b&ei=5088&partner=rssnyt&emc=rss.">http://www.nytimes.com/2006/11/09/us/politics/09virginia.html?ex=1320728400&en=e65ed62ff1814d9b&ei=5088&partner=rssnyt&emc=rss.</a></div>
To me that looks like a bug in citeproc-js, so I have reported here:
https://bitbucket.org/fbennett/citeproc-js/issue/171/enabling-wrap_url_and_doi-includes-final
By egh on 2015-01-30 02:22:10+00:00
Nice catch. Might be a while before a fix filters down to Zotero, though. :(
By titaniumbones on 2015-01-30 02:24:45+00:00
yeah, I reckon it will be some time yet... oh well. Still pretty close...
By fbennett on 2015-02-14 04:46:06+00:00
It looks like this may not be a processor bug, actually. The period at the end of the URL appears both in the href and in the link text of the sample cite, which suggests that it's in the input data. To protect against duplicate punctuation, the processor will detect terminal periods, even if they appear in input data, and even if nested in tags or other markup, and suppress its own terminal punctuation if appropriate. This is what the output would look like in that case.
By egh on 2015-02-14 05:05:18+00:00
Thanks for looking into this, Frank.
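Frank's suggestion that the period lives in the input data can be checked before blaming the processor. The sketch below is an assumption-laden illustration: it presumes items are available as CSL-JSON-style objects with a URL field (as in params.itemData.URL above), and both function names are hypothetical rather than part of zotxt, Zotero, or citeproc-js.

```javascript
// Hypothetical helper: list items whose stored URL ends in a period.
function itemsWithTerminalPeriodInUrl(items) {
  return items.filter(
    (item) => typeof item.URL === 'string' && item.URL.trim().endsWith('.')
  );
}

// Hypothetical helper: return a copy of the item with terminal periods
// removed from its URL; items without a URL are returned unchanged.
function stripTerminalPeriod(item) {
  if (typeof item.URL !== 'string') {
    return item;
  }
  return { ...item, URL: item.URL.trim().replace(/\.+$/, '') };
}
```

Cleaning the stored URL in the Zotero record itself is probably the more durable fix; a helper like this only confirms where the stray period comes from.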