Approach so far
- Network-wide search and discovery will be powered by HQ API service
- HQ API will run on the same server as a regular instance of MoodleNet run by Moodle HQ
- Every MoodleNet instance will have a 'switch' (possibly an API key that the sysadmin enters during setup) that controls whether its public content is discoverable and whether its users can search network-wide. When the switch is on, anything posted as 'public' will be pushed by the user's instance to the HQ instance using the ActivityPub federation mechanism
- HQ API instance will push all public content it receives to the search index (powered by Algolia)
- HQ API will integrate Algolia's API/SDK for search functionality
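The switch and the push to HQ could be sketched roughly as follows. This is only an illustration under assumptions: `InstanceConfig`, `Post`, `planPushToHq`, the HQ inbox URL and the `Document` object type are all hypothetical names, not taken from the MoodleNet codebase. The only fixed parts are the ActivityStreams context URL and the public addressing URI.

```typescript
// Hypothetical shapes; none of these names come from MoodleNet itself.
interface InstanceConfig {
  hqApiKey: string | null; // the 'switch': null means network-wide discovery is off
  hqInboxUrl: string;      // assumed ActivityPub inbox of the HQ instance
}

interface Post {
  id: string;       // URL of the object on the user's instance
  actor: string;    // URL of the author
  name: string;
  isPublic: boolean;
}

interface Delivery {
  inbox: string;
  activity: Record<string, unknown>;
}

// Special ActivityStreams collection used to address public content.
const AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public";

// Returns the delivery to make to HQ, or null when nothing should be pushed
// (switch is off, or the post is not public).
function planPushToHq(config: InstanceConfig, post: Post): Delivery | null {
  if (config.hqApiKey === null || !post.isPublic) {
    return null;
  }
  return {
    inbox: config.hqInboxUrl,
    activity: {
      "@context": "https://www.w3.org/ns/activitystreams",
      type: "Create",
      actor: post.actor,
      to: [AS_PUBLIC],
      // "Document" is an assumed object type for a shared resource.
      object: { id: post.id, type: "Document", name: post.name, attributedTo: post.actor },
    },
  };
}
```

Keeping the decision in a pure function like this means the actual HTTP delivery (and request signing) can stay in the existing federation code.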
Implementation
Indexing MoodleNet > Algolia
- There's an Elixir SDK if we want to index to Algolia directly from our back-end: https://github.com/sikanhe/algolia-elixir
- It looks like there might be a way to auto-sync between GraphQL->Algolia: https://medium.com/@graphcool/algolia-auto-sync-for-graphql-backends-f78678f45889
- Otherwise we could create a separate GraphQL client app which fetches the latest data from the HQ API instance and pushes it to Algolia, maybe in JS to re-use code from the front-end.
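As a rough sketch of the separate sync client option: items fetched from the HQ API are mapped to search records and pushed in batches. The `HqItem` fields, `toRecord`, `chunk` and `pushToIndex` are hypothetical. `IndexWriter` is a stand-in interface; the index object in Algolia's JavaScript client (v3/v4) exposes a `saveObjects` method that could fill that role, but the exact client wiring is left as an assumption here.

```typescript
// Hypothetical shape of an item returned by the HQ API's GraphQL endpoint.
interface HqItem {
  id: string;
  name: string;
  summary: string;
  updatedAt: string;
}

// Algolia identifies each record by its `objectID` attribute.
interface SearchRecord {
  objectID: string;
  name: string;
  summary: string;
  updatedAt: string;
}

// Stand-in for whatever performs the write (e.g. an Algolia index object).
interface IndexWriter {
  saveObjects(records: SearchRecord[]): Promise<unknown>;
}

// Re-using the HQ id as objectID makes repeated pushes overwrite, not duplicate.
function toRecord(item: HqItem): SearchRecord {
  return { objectID: item.id, name: item.name, summary: item.summary, updatedAt: item.updatedAt };
}

// Splits a list into batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) {
    throw new Error("batch size must be at least 1");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Pushes all items batch by batch and returns how many records were sent.
async function pushToIndex(items: HqItem[], writer: IndexWriter, batchSize = 500): Promise<number> {
  let pushed = 0;
  for (const batch of chunk(items.map(toRecord), batchSize)) {
    await writer.saveObjects(batch);
    pushed += batch.length;
  }
  return pushed;
}
```

Because `IndexWriter` is injected, the same code could be exercised with a fake writer before any Algolia credentials exist.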
Searching
- On the front-end, we could implement search by:
  - Algolia's search UI libraries: https://www.algolia.com/doc/guides/search-ui/search-libraries/
  - Algolia's JavaScript API client: https://www.algolia.com/doc/api-client/getting-started/install/javascript/
  - Queries in GraphQL: https://www.npmjs.com/package/apollo-link-algolia
- Via a bridge/proxy at the HQ API level
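For the bridge/proxy option, the HQ API would translate an incoming search into a call to Algolia's REST search endpoint, so the key never reaches the browser. The sketch below only builds the outgoing request; `AlgoliaConfig`, `OutgoingRequest`, `buildSearchRequest` and the index name used in the usage line are hypothetical. The host pattern, path and header names follow Algolia's REST API as far as we understand it and should be checked against the current docs.

```typescript
// Hypothetical config held server-side by the HQ API.
interface AlgoliaConfig {
  appId: string;
  searchApiKey: string; // ideally a search-only key
  indexName: string;
}

interface OutgoingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) the search request the proxy would forward.
function buildSearchRequest(config: AlgoliaConfig, query: string, hitsPerPage = 20): OutgoingRequest {
  // Search parameters are sent as a URL-encoded string inside a JSON body.
  const params = `query=${encodeURIComponent(query)}&hitsPerPage=${hitsPerPage}`;
  return {
    url: `https://${config.appId}-dsn.algolia.net/1/indexes/${encodeURIComponent(config.indexName)}/query`,
    method: "POST",
    headers: {
      "X-Algolia-Application-Id": config.appId,
      "X-Algolia-API-Key": config.searchApiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ params }),
  };
}

// Usage: the proxy would hand this to its HTTP client of choice.
const exampleRequest = buildSearchRequest(
  { appId: "APPID", searchApiKey: "search-only-key", indexName: "moodlenet_public" },
  "open biology",
);
```

A proxy also gives HQ a single place to add rate limiting or per-instance filtering later, at the cost of an extra network hop compared with searching from the front-end directly.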
Other things (to research)
- Search in MVP (notes from Mike Larsson) https://docs.google.com/document/d/13jfzSW9i1_H-S8ecnWw11uo11oMedRnpwpyfHy6wst8/edit#
- Comparison of search solutions: https://stackshare.io/search-as-a-service
- Apache Nutch
- Scrapy
- Apache Solr comes recommended: https://en.wikipedia.org/wiki/Apache_Solr
- Elasticsearch is also promising: https://www.elastic.co/products/elasticsearch (both Solr and Elasticsearch use Lucene under the hood)
SaaS:
- https://www.algolia.com (very fast, and includes front-end UI components)
Approaches
gRSShopper in a Box https://www.downes.ca/cgi-bin/page.cgi?post=67078
Sources of OER content & metadata
Documents
- Phil's list of sources: http://bit.ly/2MQVr88
- Other info from Phil: https://docs.google.com/document/d/1aFmndUqQSIzYOXKd6Fu6iOoeOo4H4dvYey4KrOEHyYU/edit
OER search engines
- http://www.solvonauts.org
- http://solvonauts.org/?action=api_search&term=biology - this is the API endpoint as it currently stands
- https://github.com/pgogy/solvonauts-opendata/blob/master/schema.sql - DB schema
- https://github.com/pgogy/solvonauts/blob/master/modules/search/search.inc - line 55 for search code
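A small helper for building Solvonauts query URLs from the endpoint above might look like this. `solvonautsSearchUrl` is a hypothetical name, and the response format is not documented in these notes, so parsing is deliberately left out.

```typescript
// Builds a Solvonauts API search URL for a given term.
// Only the URL pattern is taken from the notes; the response format is unknown here.
function solvonautsSearchUrl(term: string): string {
  return `http://solvonauts.org/?action=api_search&term=${encodeURIComponent(term)}`;
}

// Usage: terms with spaces or punctuation are percent-encoded.
const biologyUrl = solvonautsSearchUrl("biology");
```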
Also see deprecated Trello card: https://trello.com/c/pDhMxS7y/65-search-filtering-indexing-oer-repositories