Commit f9cfbc40 authored by TheOuterLinux

...

parent 0c5905d0
@@ -35,6 +35,7 @@ pyradio - Play internet radio stations from the command-line
rainbowstream - a smart and nice Twitter client on terminal
rsstail - console RSS reader that monitors a feed and outputs new entries
rtv - Reddit Terminal Viewer
scholar.py - scrape Google Scholar articles
setnet.sh - minimalist shell script for network configuration with dialog
interface
sftp - secure file transfer program
......
scholar.py
==========

scholar.py is a Python module that implements a querier and parser for Google Scholar's output. Its classes can be used independently, but it can also be invoked as a command-line tool.

The script used to live at http://icir.org/christian/scholar.html, and I've moved it here so I can more easily manage the various patches and suggestions I'm receiving for scholar.py. Thanks guys, for all your interest! If you'd like to get in touch, email me at christian@icir.org or ping me [on Twitter](http://twitter.com/ckreibich).

Cheers,<br>
Christian

Features
--------
* Extracts publication title, most relevant web link, PDF link, number of citations, number of online versions, link to Google Scholar's article cluster for the work, Google Scholar's cluster of all works referencing the publication, and excerpt of content.
* Extracts total number of hits as reported by Scholar (new in version 2.5)
* Supports the full range of advanced query options provided by Google Scholar, such as title-only search, publication date timeframes, and inclusion/exclusion of patents and citations.
* Supports article cluster IDs, i.e., information relating to the variants of an article already identified by Google Scholar
* Supports retrieval of citation details in standard external formats as provided by Google Scholar, including BibTeX and EndNote.
* Command-line tool prints entries in CSV format, simple plain text, or in the citation export format.
* Cookie support for higher query volume, including ability to persist cookies to disk across invocations.

Note
----

I will always strive to add features that increase the power of this
API, but I will never add features that intentionally try to work
around the query limits imposed by Google Scholar. Please don't ask me
to add such features.

Examples
--------

Try scholar.py --help for all available options. Note, the command line arguments changed considerably in version 2.0! A few examples:

Retrieve one article written by Einstein on quantum theory:

    $ scholar.py -c 1 --author "albert einstein" --phrase "quantum theory"
    Title On the quantum theory of radiation
    URL http://icole.mut-es.ac.ir/downloads/Sci_Sec/W1/Einstein%201917.pdf
    Year 1917
    Citations 184
    Versions 3
    Cluster ID 17749203648027613321
    PDF link http://icole.mut-es.ac.ir/downloads/Sci_Sec/W1/Einstein%201917.pdf
    Citations list http://scholar.google.com/scholar?cites=17749203648027613321&as_sdt=2005&sciodt=0,5&hl=en
    Versions list http://scholar.google.com/scholar?cluster=17749203648027613321&hl=en&as_sdt=0,5
    Excerpt The formal similarity between the chromatic distribution curve for thermal radiation [...]
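
The plain-text output lends itself to post-processing. Below is a minimal sketch in Python, assuming the field labels are exactly those shown in the sample output above; `LABELS` and `parse_record` are hypothetical names for illustration and are not part of scholar.py.

```python
# Field labels as they appear in the sample output above. The real tool may
# print others, so treat this list as an assumption. Longer labels come first
# so that "Citations list" is not mistaken for "Citations".
LABELS = (
    "Citations list", "Versions list", "Cluster ID", "PDF link",
    "Title", "URL", "Year", "Citations", "Versions", "Excerpt",
)


def parse_record(text):
    """Turn one block of 'Label value' lines into a dict of strings."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        for label in LABELS:
            if line.startswith(label + " "):
                record[label] = line[len(label):].strip()
                break
    return record


sample = """\
Title On the quantum theory of radiation
Year 1917
Citations 184
Versions 3
Cluster ID 17749203648027613321
Citations list http://scholar.google.com/scholar?cites=17749203648027613321&as_sdt=2005&sciodt=0,5&hl=en
"""

record = parse_record(sample)
print(record["Title"], "/", record["Cluster ID"])
```

Matching on a fixed label list keeps multi-word labels such as "Cluster ID" intact, at the cost of silently skipping any label the list does not know about.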

Note the cluster ID in the above. Using this ID, you can directly access the cluster of articles Google Scholar has already determined to be variants of the same paper. So, let's see the versions:

    $ scholar.py -C 17749203648027613321
    Title On the quantum theory of radiation
    URL http://icole.mut-es.ac.ir/downloads/Sci_Sec/W1/Einstein%201917.pdf
    Citations 184
    Versions 0
    Cluster ID 17749203648027613321
    PDF link http://icole.mut-es.ac.ir/downloads/Sci_Sec/W1/Einstein%201917.pdf
    Citations list http://scholar.google.com/scholar?cites=17749203648027613321&as_sdt=2005&sciodt=0,5&hl=en
    Excerpt The formal similarity between the chromatic distribution curve for thermal radiation [...]

    Title ON THE QUANTUM THEORY OF RADIATION
    URL http://www.informationphilosopher.com/solutions/scientists/einstein/1917_Radiation.pdf
    Citations 0
    Versions 0
    PDF link http://www.informationphilosopher.com/solutions/scientists/einstein/1917_Radiation.pdf
    Excerpt The formal similarity between the chromatic distribution curve for thermal radiation [...]

    Title The Quantum Theory of Radiation
    URL http://web.ihep.su/dbserv/compas/src/einstein17/eng.pdf
    Citations 0
    Versions 0
    PDF link http://web.ihep.su/dbserv/compas/src/einstein17/eng.pdf
    Excerpt 1 on the assumption that there are discrete elements of energy, from which quantum [...]
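
The "Versions list" link in the first example suggests how a cluster ID maps onto a Google Scholar URL. The sketch below rebuilds that link in Python. The `hl` and `as_sdt` values are simply copied from the sample output, their meaning is not documented here, and `cluster_url` is a hypothetical helper rather than a scholar.py function.

```python
from urllib.parse import urlencode


def cluster_url(cluster_id):
    """Build a 'Versions list' style URL for the given cluster ID."""
    # Parameter names and values are taken from the sample output above.
    params = {"cluster": cluster_id, "hl": "en", "as_sdt": "0,5"}
    # safe="," keeps the comma in "0,5" unescaped, as in the sample link.
    return "http://scholar.google.com/scholar?" + urlencode(params, safe=",")


url = cluster_url("17749203648027613321")
print(url)
```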

Let's retrieve a BibTeX entry for that quantum theory paper. The best BibTeX often seems to be the one linked from search results, not those in the article cluster, so let's do a search again:

    $ scholar.py -c 1 --author "albert einstein" --phrase "quantum theory" --citation bt
    @article{einstein1917quantum,
      title={On the quantum theory of radiation},
      author={Einstein, Albert},
      journal={Phys. Z},
      volume={18},
      pages={121--128},
      year={1917}
    }
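
To drive the same search from a script, one option is to assemble the argument list in Python first. This is only a sketch: `scholar_argv` is a hypothetical helper, and it uses only the flags and the citation format codes that appear in the examples and help text of this document.

```python
import shlex

# Citation format codes listed in the tool's help text.
CITATION_FORMATS = ("bt", "en", "rm", "rw")


def scholar_argv(author=None, phrase=None, count=None, citation=None):
    """Assemble a scholar.py argument list without running anything."""
    argv = ["scholar.py"]
    if count is not None:
        argv += ["-c", str(count)]
    if author:
        argv += ["--author", author]
    if phrase:
        argv += ["--phrase", phrase]
    if citation:
        if citation not in CITATION_FORMATS:
            raise ValueError("unknown citation format: %r" % citation)
        argv += ["--citation", citation]
    return argv


argv = scholar_argv(author="albert einstein", phrase="quantum theory",
                    count=1, citation="bt")
# shlex.join (Python 3.8+) renders the list as a shell-quoted command line.
print(shlex.join(argv))
```

The resulting list could then be passed to `subprocess.run(argv, capture_output=True, text=True)`, assuming `scholar.py` is executable and on the `PATH`.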

Report the total number of articles Google Scholar has for Einstein:

    $ scholar.py --txt-globals --author "albert einstein" | grep '\[G\]' | grep Results
    [G] Results 4190
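
The same filtering can be done in Python rather than with `grep`. The sketch below assumes the global line looks like the `[G] Results 4190` sample above, possibly with extra padding; `total_results` is a hypothetical name.

```python
import re


def total_results(output):
    """Return the hit count from a '[G] Results N' line, or None if absent."""
    match = re.search(r"^\s*\[G\]\s+Results\s+(\d+)\s*$", output, re.MULTILINE)
    return int(match.group(1)) if match else None


hits = total_results("[G] Results 4190\n")
print(hits)
```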

License
-------

scholar.py uses the standard [BSD license](http://opensource.org/licenses/BSD-2-Clause).

Usage: scholar.py [options] <query string>

A command-line interface to Google Scholar.

Examples:

  # Retrieve one article written by Einstein on quantum theory:
  scholar.py -c 1 --author "albert einstein" --phrase "quantum theory"

  # Retrieve a BibTeX entry for that quantum theory paper:
  scholar.py -c 1 -C 17749203648027613321 --citation bt

  # Retrieve five articles written by Einstein after 1970 where the title
  # does not contain the words "quantum" and "theory":
  scholar.py -c 5 -a "albert einstein" -t --none "quantum theory" --after 1970

Options:
  -h, --help            show this help message and exit

  Query arguments:
    These options define search query arguments and parameters.

    -a AUTHORS, --author=AUTHORS
                        Author name(s)
    -A WORDS, --all=WORDS
                        Results must contain all of these words
    -s WORDS, --some=WORDS
                        Results must contain at least one of these words. Pass
                        arguments in form -s "foo bar baz" for simple words, and
                        -s "a phrase, another phrase" for phrases
    -n WORDS, --none=WORDS
                        Results must contain none of these words. See -s|--some
                        re. formatting
    -p PHRASE, --phrase=PHRASE
                        Results must contain exact phrase
    -t, --title-only    Search title only
    -P PUBLICATIONS, --pub=PUBLICATIONS
                        Results must have appeared in this publication
    --after=YEAR        Results must have appeared in or after given year
    --before=YEAR       Results must have appeared in or before given year
    --no-patents        Do not include patents in results
    --no-citations      Do not include citations in results
    -C CLUSTER_ID, --cluster-id=CLUSTER_ID
                        Do not search, just use articles in given cluster ID
    -c COUNT, --count=COUNT
                        Maximum number of results

  Output format:
    These options control the appearance of the results.

    --txt               Print article data in text format (default)
    --txt-globals       Like --txt, but first print global results too
    --csv               Print article data in CSV form (separator is "|")
    --csv-header        Like --csv, but print header with column names
    --citation=FORMAT   Print article details in standard citation format.
                        Argument must be one of "bt" (BibTeX), "en" (EndNote),
                        "rm" (RefMan), or "rw" (RefWorks).

  Miscellaneous:
    --cookie-file=FILE  File to use for cookie storage. If given, will read any
                        existing cookies if found at startup, and save resulting
                        cookies in the end.
    -d, --debug         Enable verbose logging to stderr. Repeated options
                        increase detail of debug output.
    -v, --version       Show version information
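
The `--csv-header` output, with its "|" separator, can be read with Python's standard `csv` module. In the sketch below the column names in `sample` are assumptions made for illustration (the help text only says a header with column names is printed); the values are taken from the first README example.

```python
import csv
import io

# Hypothetical --csv-header output: the column names are assumed, the values
# come from the first example in the README.
sample = (
    "title|url|year|num_citations\n"
    "On the quantum theory of radiation|"
    "http://icole.mut-es.ac.ir/downloads/Sci_Sec/W1/Einstein%201917.pdf|"
    "1917|184\n"
)

# delimiter="|" matches the documented separator. QUOTE_NONE treats quote
# characters in titles as ordinary text, which is an assumption about how
# the tool writes its fields.
reader = csv.DictReader(io.StringIO(sample), delimiter="|",
                        quoting=csv.QUOTE_NONE)
rows = list(reader)
for row in rows:
    print(row["year"], row["title"])
```

A title that itself contains "|" would break this simple split, so treat the approach as a starting point rather than a robust parser.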
@@ -259,6 +259,7 @@ sane - Scanner Access Now Easy: API for accessing scanners
sc - spreadsheet calculator
scanimage - scan an image
schismtracker - tracked music editor based on Impulse Tracker
scholar.py - scrape Google Scholar articles
screen - screen manager with VT100/ANSI terminal emulation
script - make typescript of terminal session
sed - stream editor for filtering and transforming text
......