URI vs. IRI
Change the protocol to use IRIs (RFC-3987) instead of URIs (RFC-3986). The text/gemini specification is based on UTF-8, and restricting a portion of text/gemini to plain US-ASCII seems short-sighted. The differences between a URI and an IRI are minimal. The only real issues involve hostnames and, possibly, existing URL parsing libraries that do not properly handle UTF-8. Initial experiments were promising, though, and my thoughts are as follows:
- Servers and clients MUST support IRIs (RFC-3987).
- If an internationalized hostname is used, servers MUST include the punycoded hostname in the certificate, and SHOULD also include the UTF-8 hostname.
- Clients MUST convert an internationalized hostname to punycode in order to resolve it. RATIONALE: DNS is US-ASCII only, and most registrars, if not all, convert internationalized hostnames to punycode. Other systems are used to resolve names (multicast DNS, hosts files), but DNS is still the primary method.
- Clients SHOULD use the punycoded hostname when making a request.
- Clients MAY percent-encode any UTF-8 in the path and query portions before sending them to the server.
- Servers MUST accept punycoded hostnames, and SHOULD handle UTF-8 hostnames.
- Servers MUST accept non-encoded UTF-8 characters in the path and query portions (unless otherwise stated in RFC-3987).
- Servers SHOULD handle paths and queries containing percent-encoded UTF-8 data.
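A minimal Python sketch of how a client and a server might apply the rules above, assuming only the standard library: `urllib.parse` for splitting and percent-encoding, and Python's built-in `idna` codec for punycode. Note that this codec implements IDNA 2003 (RFC 3490), not the newer IDNA 2008, so it is an approximation. The function names `iri_to_request`, `canonical_host` and `canonical_path` are hypothetical, and the sketch ignores userinfo and IPv6 literals and drops any fragment.

```python
from urllib.parse import urlsplit, urlunsplit, quote, unquote


def _encode_non_ascii(component: str) -> str:
    """Percent-encode only non-ASCII characters as UTF-8 (RFC 3987, section 3.1).

    ASCII characters, including existing %XX escapes and reserved
    delimiters, are left untouched.
    """
    return "".join(ch if ord(ch) < 128 else quote(ch, safe="") for ch in component)


def iri_to_request(iri: str) -> bytes:
    """Client side: map a gemini:// IRI to a request line (URI + CRLF).

    The hostname is converted to punycode, and any UTF-8 in the
    path and query is percent-encoded, so the request is pure US-ASCII.
    """
    parts = urlsplit(iri)
    if parts.hostname is None:
        raise ValueError("IRI has no host")
    # Built-in "idna" codec (IDNA 2003): "bücher" -> "xn--bcher-kva".
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    uri = urlunsplit((
        parts.scheme,
        netloc,
        _encode_non_ascii(parts.path),
        _encode_non_ascii(parts.query),
        "",  # assumption of this sketch: the fragment is not sent
    ))
    return (uri + "\r\n").encode("ascii")


def canonical_host(host: str) -> str:
    """Server side: reduce a UTF-8 or punycoded hostname to lowercase punycode."""
    return host.encode("idna").decode("ascii").lower()


def canonical_path(path: str) -> str:
    """Server side: decode percent-encoded UTF-8 so raw and encoded paths match."""
    return unquote(path, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    print(iri_to_request("gemini://bücher.example/päth?q=ü"))
```

With these helpers, a server that compares `canonical_host()` and `canonical_path()` values treats `bücher.example` and `xn--bcher-kva.example`, or `/päth` and `/p%C3%A4th`, as the same resource. A real server would need more care than this sketch, for example not decoding `%2F` into a path separator before routing.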