
Opensearch

This is the page for the (as yet unnamed) project to create search tools for students who have been assigned public domain texts for class, and to promote the public domain and open educational resources (OERs).

Inspiration

[1]

Project Summary

  • Databases containing public domain or open-licensed texts that might be assigned in a (college) class
    • Why college students? Because they buy their own books
    • Why texts that might be assigned in class? Because students are already looking for these books, and we can save them money (built-in audience)
  • API to search those databases and return results with a high signal-to-noise ratio for our purposes
    • Social searching? ("x% of users who searched for 'moby dick' found this useful...")
    • Why high signal-to-noise? We want to make this very easy to use (low barrier to entry)
  • Scripts to convert output into various formats for various devices
    • Read online, download, print
    • Why various formats and devices? We want to make this very easy to use (low barrier to entry)
  • End-user interfaces
    • Web site, Facebook application, etc.
    • Why various end-user interfaces? We want to make this very easy to use (low barrier to entry)
  • Promotional campaign
    • People need to know about this in order to use it!
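The "social searching" idea above ("x% of users who searched for 'moby dick' found this useful...") comes down to counting, for each query and result, how many searchers marked the result as useful. A minimal sketch, assuming a hypothetical in-memory feedback log of (query, result_id, found_useful) tuples; how votes are collected, stored, and protected from spam is left open.

```python
from collections import defaultdict


def normalize(query):
    """Fold case and whitespace so 'Moby  Dick' and 'moby dick' are counted together."""
    return " ".join(query.lower().split())


def usefulness(feedback):
    """Map (normalized query, result id) to the percent of searchers who found it useful.

    `feedback` is a hypothetical log of (query, result_id, found_useful) tuples.
    """
    tallies = defaultdict(lambda: [0, 0])  # key -> [useful votes, total votes]
    for query, result_id, found_useful in feedback:
        tally = tallies[(normalize(query), result_id)]
        tally[1] += 1
        if found_useful:
            tally[0] += 1
    return {key: 100.0 * useful / total for key, (useful, total) in tallies.items()}


# Illustrative data only; "moby-dick-1" is a made-up result id.
log = [
    ("moby dick", "moby-dick-1", True),
    ("Moby  Dick", "moby-dick-1", True),
    ("moby dick", "moby-dick-1", False),
    ("moby dick", "moby-dick-1", True),
]
for (query, result_id), pct in usefulness(log).items():
    print(f"{pct:.0f}% of users who searched for '{query}' found {result_id} useful")
```

With the sample log this prints a single line reporting 75% for 'moby dick'. Normalizing queries first keeps trivially different spellings from splitting the vote.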

Name

Everything needs a catchy name. What are our ideas?

  • Free College Texts
  • College Texts Search

Databases to search

  • Maintained by: content repositories
  • What archives do we want to search?
    • Must contain public domain or open-licensed texts that might be assigned in a (college) class
    • The higher the signal-to-noise ratio, the better

Project Gutenberg

  • Site: Project Gutenberg
  • Size: > 22,000 (> 19,000 in English) (catalog)
    • Includes < 300 audio books, human-read (catalog)
  • Signal-to-noise: High
  • License: Project Gutenberg License
    • Verbatim redistribution OK; changes to formatting allowed
      • There are some further conditions if you charge money for copies, but we won't, so we need not worry about them.
    • Or, remove Project Gutenberg trademark and license; then, treat as public domain (any changes allowed)
    • There are also some copyrighted titles in the database. These titles are marked as such, and redistribution without permission is prohibited.
      • How many titles is this?
      • Can these be included for our purposes? (Are we "redistributing", or simply pointing users to the Project Gutenberg copy?)
    • Note: These titles are public domain in the United States; in other countries, YMMV
  • Search API:
    • (Brendan) I once contacted the site admin about this, for adding it to a book inventory system, but if I remember correctly I never received a response; things could be different now. Scripting against their own search forms should not be too difficult, but asking would probably be nicer.
    • Should we just store a copy of every book? We can figure that out later.
  • Bandwidth usage policies:
  • Formats:
    • Text: File Formats FAQ
      • Some books are published in multiple formats, but almost everything is published in at least plain text (.txt)
    • Audio books
      • OGG Vorbis (.ogg)
      • iTunes Audiobook (.m4b)
      • MP3 (.mp3)
      • Speex (.spx)
  • Linkback:
  • Contact:
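"Scripting their own search forms" mostly means building the same URL the site's search form would submit. A minimal sketch that only builds the URL (no network access); the endpoint and the `query` parameter are assumptions based on the current public search form on gutenberg.org and should be checked, or replaced by whatever the site admins recommend.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint and "query" parameter mirror gutenberg.org's public
# search form at the time of writing; verify before relying on them.
GUTENBERG_SEARCH = "https://www.gutenberg.org/ebooks/search/"


def gutenberg_search_url(terms):
    """Build the URL the site's search form would submit for `terms`."""
    return GUTENBERG_SEARCH + "?" + urlencode({"query": terms})


print(gutenberg_search_url("moby dick"))
```

`urlencode` encodes spaces as `+`, as an HTML form would. Fetching and parsing the result pages, and staying within their bandwidth policies, are deliberately left out.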

Internet Archive

  • Site: Internet Archive
  • Size:
  • Signal-to-noise:
  • License:
  • Search API:
  • Bandwidth usage policies:
    • Any concerns here? (Will they get upset if we're hitting their database?)
  • Formats:
    • Existing code for making books look nice?
  • Linkback:
  • Contact:

I think the concern was that not all of their links were good. Additionally, a good part of their archive comes from Project Gutenberg, so there would be some overlap (somewhat annoying). However, if they have books that Project Gutenberg also has, but from another source, that's a good thing, and it makes it more likely that they'll have what students are looking for.
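One way to handle the overlap without losing copies from other sources is to group hits that look like the same work and show every source under one entry. A minimal sketch, assuming each hit is a dict with hypothetical "title", "author" and "source" keys; matching on a normalized title and author is crude (it won't match "Melville, Herman" with "Herman Melville", for example) and is only a starting point.

```python
import re
import unicodedata


def clean(text):
    """Strip accents, punctuation and case: 'Moby-Dick' -> 'moby dick'."""
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def title_key(title):
    """Like clean(), but also drop a leading English article from titles."""
    words = clean(title).split()
    if words and words[0] in ("the", "a", "an"):
        words = words[1:]
    return " ".join(words)


def merge_results(records):
    """Group hits that look like the same work, keeping every source's copy."""
    merged = {}
    for rec in records:
        key = (title_key(rec["title"]), clean(rec["author"]))
        merged.setdefault(key, []).append(rec)
    return merged


hits = [
    {"title": "Moby Dick", "author": "Herman Melville", "source": "Project Gutenberg"},
    {"title": "Moby-Dick", "author": "Herman Melville", "source": "Internet Archive"},
]
for (title, author), copies in merge_results(hits).items():
    print(title, "/", author, "->", ", ".join(c["source"] for c in copies))
```

Both hits collapse into one entry that lists Project Gutenberg and Internet Archive as sources.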

A small political consideration: Linda Frueh at the Internet Archive (IA) was very excited about the program, and it might seem rude to design a whole program that they really like yet exclude their archive.

Open Library

  • Site: Open Library
  • How does this play into what we're doing?

Wikisource

  • Site: Wikibooks
  • Size:
  • Signal-to-noise:
  • License:
  • Search API:
  • Bandwidth usage policies:
    • Any concerns here? (Will they get upset if we're hitting their database?)
  • Formats:
    • Existing code for making books look nice?
  • Linkback:
  • Contact:

Audiobooks (human readings)

Librivox

  • Site: Librivox
  • Size:
  • Signal-to-noise:
  • License:
  • Search API:
  • Bandwidth usage policies:
    • Any concerns here? (Will they get upset if we're hitting their database?)
  • Formats:
    • Existing code for making books look nice?
  • Linkback:
  • Contact:

Other databases


Questions to ask

  • Site:
  • Size:
  • Signal-to-noise:
  • License:
  • Search API:
  • Bandwidth usage policies:
    • Any concerns here? (Will they get upset if we're hitting their database?)
  • Formats:
    • Existing code for making books look nice?
  • Linkback:
  • Contact:
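Whatever each archive's answer to the bandwidth question turns out to be, two habits keep us from hammering anyone's servers: never fetch the same URL twice, and space out requests to each host. A minimal sketch of both; the two-second default interval is an assumption, not any archive's published policy, and `fetch` is any callable that takes a URL and returns the body (for instance a wrapper around `urllib.request.urlopen`).

```python
import time
from urllib.parse import urlsplit


class PoliteFetcher:
    """Cache responses and enforce a minimum delay between hits to one host."""

    def __init__(self, fetch, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch                # callable: url -> body
        self.min_interval = min_interval  # ASSUMED seconds between hits per host
        self.clock = clock
        self.sleep = sleep
        self.cache = {}                   # url -> body
        self.last_hit = {}                # host -> time of the previous request

    def get(self, url):
        if url in self.cache:  # repeated searches never touch the archive
            return self.cache[url]
        host = urlsplit(url).netloc
        if host in self.last_hit:
            wait = self.last_hit[host] + self.min_interval - self.clock()
            if wait > 0:
                self.sleep(wait)
        body = self.fetch(url)
        self.last_hit[host] = self.clock()
        self.cache[url] = body
        return body


# Usage with a fake fetch function, so nothing goes over the network.
calls = []
fetcher = PoliteFetcher(lambda url: calls.append(url) or f"body of {url}", min_interval=0.1)
fetcher.get("https://example.org/a")
fetcher.get("https://example.org/a")  # served from the cache
fetcher.get("https://example.org/b")  # waits about 0.1 s before fetching
print(len(calls))  # 2
```

Injecting `clock` and `sleep` makes the timing easy to test; a real deployment would also want a persistent cache.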

Search API

  • Maintained by: Open Library (maybe with some help from Free Culture Labs)

Output formats

  • Maintained by: Free Culture Labs (probably)

Read online

Download

  • Plain text
  • HTML
  • PDF
  • Other formats?
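Not every book exists in every format listed above, so the download step needs a fallback. Since almost everything on Project Gutenberg is published in at least plain text, plain text is the natural last resort. A minimal sketch, assuming formats are identified by bare file extensions (a convention chosen here, not taken from any archive's API).

```python
def pick_download(available, wanted):
    """Pick the format to offer for one book.

    `available` lists the formats an archive offers (e.g. ["txt", ".html"]);
    `wanted` is the format the student asked for. Falls back to plain text
    and returns None when neither is on offer.
    """
    offered = {fmt.lower().lstrip(".") for fmt in available}
    wanted = wanted.lower().lstrip(".")
    if wanted in offered:
        return wanted
    if "txt" in offered:
        return "txt"
    return None


print(pick_download([".txt", ".html"], "pdf"))  # txt
print(pick_download(["TXT", "PDF"], "pdf"))     # pdf
```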

Print

  • PDF?
  • CSS print media type (@media print)?

Print-on-demand

Audio recording of computer reading (vocoder)

  • Would this be useful?
    • I don't think so. (Who wants to listen to a vocoder for multiple hours?) But we could include open-licensed audio recordings of human readings in the databases we search. --Gavin 02:56, 23 July 2007 (JST)

End-user interfaces

  • Maintained by: Free Culture Labs

Web site

  • Brendan made a mockup
  • We should be sure to have a linkback to the original database
    • They'd probably appreciate it if we also provided a direct link to their support/donations page
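Each search hit on the web site can carry its linkback, plus the optional support/donations link, in the same line. A minimal sketch; the function name and parameters are hypothetical, and every value is HTML-escaped because titles and URLs come from outside databases.

```python
from html import escape


def result_html(title, source_name, source_url, support_url=None):
    """Render one search hit with a linkback to the archive that hosts it."""
    parts = [
        f'<li>{escape(title)} from <a href="{escape(source_url)}">{escape(source_name)}</a>'
    ]
    if support_url:  # the archive's support/donations page, if we know one
        parts.append(f' (<a href="{escape(support_url)}">support {escape(source_name)}</a>)')
    parts.append("</li>")
    return "".join(parts)


# Placeholder archive and URLs, for illustration only.
print(result_html("Moby-Dick", "Example Archive",
                  "https://example.org/books/1", "https://example.org/donate"))
```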

Facebook application

OpenSearch plugin
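A browser search plugin in the OpenSearch format is a small XML description document telling the browser how to turn a query into a search URL. A minimal sketch that generates an OpenSearch 1.1 description; "College Texts" and `search.example.org` are placeholders, since the project and its site are not named yet.

```python
import xml.etree.ElementTree as ET

OS_NS = "http://a9.com/-/spec/opensearch/1.1/"
ET.register_namespace("", OS_NS)  # serialize OpenSearch elements without a prefix


def opensearch_description(short_name, description, template):
    """Return an OpenSearch 1.1 description document as a string.

    `short_name` must be 16 characters or fewer; `template` must contain
    {searchTerms}, which the browser replaces with the user's query.
    """
    if len(short_name) > 16:
        raise ValueError("ShortName is limited to 16 characters")
    root = ET.Element(f"{{{OS_NS}}}OpenSearchDescription")
    ET.SubElement(root, f"{{{OS_NS}}}ShortName").text = short_name
    ET.SubElement(root, f"{{{OS_NS}}}Description").text = description
    ET.SubElement(root, f"{{{OS_NS}}}InputEncoding").text = "UTF-8"
    ET.SubElement(root, f"{{{OS_NS}}}Url", type="text/html", template=template)
    return ET.tostring(root, encoding="unicode")


# Placeholder name and URL: the project and its site are not named yet.
print(opensearch_description(
    "College Texts",
    "Search public domain and open-licensed texts assigned in class",
    "https://search.example.org/?q={searchTerms}",
))
```

Browsers discover the document through a `<link rel="search" type="application/opensearchdescription+xml" href="...">` element in the site's HTML head, so the plugin could ship with the web site at little extra cost.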

Firefox extension

Add-on to course management systems

Promoting it

Partners

Chapters & campus outreach

Online & media outreach


Timeline

What needs to be done? Who will do it when?

The Future

What will happen to the project in one year's time? (If the answer is "No plans (yet)", that's fine, but we should still think about this.)

Interest list

Feel free to add yourself! Anyone interested in volunteering should email Brendan(?) at (?).