URLs

The general form for a URL is:

protocol://host/path?query#name
protocol
http, ftp, telnet, gopher ...
host
domain name of the host; can include ":port-number"
path
the path name of the document
query
query string (optional)
name
name of an anchor in the document (optional)
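
As a rough illustration of these parts, the sketch below splits a URL with java.net.URI. The class name UrlParts and the example URL are made up for this example and do not come from the notes.

```java
import java.net.URI;

// Illustrative class name; the example URL in main is made up.
class UrlParts {

    // Returns the parts in the order protocol, host, port, path, query, name.
    static String[] split( String url ) {
        URI uri = URI.create( url );
        return new String[] {
            uri.getScheme(),                  // protocol
            uri.getHost(),                    // host
            String.valueOf( uri.getPort() ),  // -1 when no port is given
            uri.getPath(),                    // path
            uri.getQuery(),                   // query, null when absent
            uri.getFragment()                 // anchor name, null when absent
        };
    }

    public static void main( String[] args ) {
        String url = "http://www.example.com:8080/docs/index.html?student=123#top";
        String[] labels = { "protocol", "host", "port", "path", "query", "name" };
        String[] parts = split( url );
        for ( int i = 0; i < parts.length; i++ ) {
            System.out.println( labels[i] + " = " + parts[i] );
        }
    }
}
```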

Common Gateway Interface

The Common Gateway Interface (CGI) is a specification of how a web server interfaces with an external program that dynamically generates a web page or other web content. Input to the CGI script is provided through environment variables and the standard input. The program's output must begin with a Content-Type header specifying the MIME type of the rest of the output.

In the early days of the web, CGI scripts provided the only way of generating dynamic web pages.

Environment variables

On Windows, Mac OS, and Unix systems, information is passed to a process through a set of environment variables. For example, the PATH variable used by the command line interpreter specifies the directories to search for a command. The behaviour of many programs is affected by environment variables. The Java interpreter uses the CLASSPATH variable to determine the directories and zip files that are searched for class files when running a Java program.

The java.lang.System class provides the getenv static method to access a process's environment.

The entire environment can be accessed with the static Map<String,String> getenv() method.
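
A single variable can also be read with System.getenv(String), which returns null when the variable is not set. The sketch below wraps that call with a fallback value; the class and method names (EnvLookup, getenvOrDefault) are illustrative and are not part of the JDK.

```java
// Illustrative helper; EnvLookup and getenvOrDefault are made-up names.
class EnvLookup {

    // Returns the value of the named variable, or fallback when it is not set.
    static String getenvOrDefault( String name, String fallback ) {
        String value = System.getenv( name );
        return ( value != null ) ? value : fallback;
    }

    public static void main( String[] args ) {
        System.out.println( "PATH = " + getenvOrDefault( "PATH", "(not set)" ) );
        System.out.println( "CLASSPATH = " + getenvOrDefault( "CLASSPATH", "(not set)" ) );
    }
}
```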

Printing the environment

The environment of a Java program can be printed with:

PrintEnv.java
import java.util.Map;
import java.util.Set;

public class PrintEnv {
    public static void main( String[] args ) {
        Map<String,String> env = System.getenv();
        Set<String> set = env.keySet();

        for ( String n : set ) {
            System.out.println( n + "=" + env.get(n) );
        }
    }
}

CGI Environment variables

SERVER_NAME
contains the name of the host. An IP address can also be used to name the host.
SERVER_PORT
contains the port number of the web server.
REQUEST_METHOD
contains the HTTP method. The method is usually GET or POST.
PATH_INFO
contains the part of the URL after the part that references the CGI script.
SCRIPT_NAME
contains the part of the URL that references the CGI script.
QUERY_STRING
contains only the query string part of the URL, the string after the '?' character.
REMOTE_ADDR
contains the IP address of the client.
HTTP_USER_AGENT
contains the value of the User-Agent request header, which identifies the web browser making the request.
CONTENT_TYPE
contains the type of data found in the standard input.
CONTENT_LENGTH
contains the number of bytes in the standard input.

The CGI specification defines additional environment variables beyond those listed here.
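
For a POST request the data arrives on the standard input, and CONTENT_LENGTH says how many bytes to read. The following is a minimal sketch of reading that body, assuming the body is UTF-8 text; the class name ReadBody and the choice to treat a missing or malformed length as an empty body are assumptions made for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative class; not one of the scripts in these notes.
class ReadBody {

    // Reads at most the number of bytes announced by CONTENT_LENGTH.
    // A missing or malformed length is treated as an empty body (an assumption).
    static byte[] readBody( InputStream in, String contentLength ) throws IOException {
        int length;
        try {
            length = ( contentLength == null ) ? 0 : Integer.parseInt( contentLength.trim() );
        }
        catch ( NumberFormatException ex ) {
            length = 0;
        }
        if ( length < 0 ) length = 0;
        byte[] body = new byte[length];
        int off = 0;
        while ( off < length ) {
            int n = in.read( body, off, length - off );
            if ( n == -1 ) break;   // fewer bytes arrived than were announced
            off += n;
        }
        return ( off == length ) ? body : Arrays.copyOf( body, off );
    }

    public static void main( String[] args ) throws IOException {
        byte[] body = readBody( System.in, System.getenv("CONTENT_LENGTH") );
        System.out.print("Content-Type: text/plain\r\n\r\n");
        System.out.println("type = " + System.getenv("CONTENT_TYPE"));
        System.out.println("bytes read = " + body.length);
        System.out.println( new String( body, StandardCharsets.UTF_8 ) );
    }
}
```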

The script's output must contain a header followed by the content. The header consists of name/value pairs that provide information about the content. Each pair must appear on a separate line terminated by '\r\n'. The header ends with an empty line, that is, one additional '\r\n', after which the content begins.

A minimal message could contain:

Content-Type: text/plain
     
hello
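
The layout of such a message can be sketched in Java as follows. The CgiResponse class and its format method are hypothetical helpers written for this example; the scripts in these notes simply print the header directly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only.
class CgiResponse {

    // Builds the text a CGI script writes to standard output:
    // one "Name: value" line per header, an empty line, then the content.
    static String format( Map<String,String> headers, String content ) {
        StringBuilder sb = new StringBuilder();
        for ( Map.Entry<String,String> h : headers.entrySet() ) {
            sb.append( h.getKey() ).append( ": " ).append( h.getValue() ).append( "\r\n" );
        }
        sb.append( "\r\n" );   // the empty line that ends the header
        sb.append( content );
        return sb.toString();
    }

    public static void main( String[] args ) {
        Map<String,String> headers = new LinkedHashMap<String,String>();
        headers.put( "Content-Type", "text/plain" );
        System.out.print( format( headers, "hello\n" ) );
    }
}
```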

A CGI script that outputs the environment

The following Java program is a CGI script that outputs the most important environment variables passed to the script.

EchoEnv.java
public class EchoEnv {
    public static void main( String[] args ) {
	 System.out.print("Content-Type: text/plain\r\n\r\n");
	 System.out.println("server name = " + System.getenv("SERVER_NAME"));
	 System.out.println("server port = " + System.getenv("SERVER_PORT"));
	 System.out.println("method = " + System.getenv("REQUEST_METHOD"));
	 System.out.println("query = " + System.getenv("QUERY_STRING"));
	 System.out.println("info = " + System.getenv("PATH_INFO"));
	 System.out.println("script = " + System.getenv("SCRIPT_NAME"));
	 System.out.println("addr = " + System.getenv("REMOTE_ADDR"));
	 System.out.println("user = " + System.getenv("HTTP_USER_AGENT"));
	 System.out.println("type = " + System.getenv("CONTENT_TYPE"));
	 System.out.println("length = " + System.getenv("CONTENT_LENGTH"));
    }
}

All CGI scripts must output a header line that specifies the MIME type of the generated page. This is done with:


        System.out.print("Content-Type: text/plain\r\n\r\n");
        

The following shell script executes the EchoEnv Java program.

echo.sh
#!/bin/sh
export GENTOO_VM=sun-jdk-1.5
exec java -cp . EchoEnv
	

For a complete list for CGI programs, see here.

Example CGI script to generate a random list

This CGI script generates a page containing a random list. Each list item is a link to another script that generates 20 random numbers.

The Java program that generates the random list page for a request such as ?student=123, where the student ID is 123, is:

GenerateTestPage.java
import java.util.Random;

public class GenerateTestPage {

    private static long parseStudentId( String query )
	throws NumberFormatException, IllegalArgumentException
    {
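        // QUERY_STRING is unset when the script is run outside a web server
        if ( query == null ) {
            throw new IllegalArgumentException("missing query string");
        }
        String [] params = query.split("&");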
	for( int i = 0 ; i < params.length; i++ ) {
	    String [] words = params[i].split("=");
	    if ( words.length != 2 ) continue;
	    if ( words[0].equals("student") ) {
	        return Long.parseLong( words[1] );
	    }
	}
	throw new IllegalArgumentException("missing student param");
    }

    private static final int MAX_LIST = 10;
    private static final int MAX_SEED = 1000;
    private static String genseq = "genseq.sh";

    private static int[] generateList() {
        int[] seeds = new int[MAX_LIST];
	Random rd = new Random();
	System.out.println("<ol>");
	for( int i = 0 ; i < seeds.length; i++ ) {
	    seeds[i] = rd.nextInt( MAX_SEED );
	    String url = genseq + "?seed=" + seeds[i];
	    System.out.println("<li>");
	    System.out.print("<a href='" + url + "'>" ); 
	    System.out.println( seeds[i] + "</a>" ); 
	    System.out.println("</li>");
	}
	System.out.println("</ol>");
	return seeds;
    }

    public static void main( String[] args ) {
	long student;
	TransactionLog log;
	try {
	    student = parseStudentId( System.getenv("QUERY_STRING") );
	    log = new TransactionLog( "transaction.log" );
	    log.record("test: " + student );
	}
	catch( Exception ex ) {
	    System.out.print("Content-Type: text/plain\r\n\r\n");
	    System.out.println(ex.getMessage() );
	    return;
	}
	System.out.print("Content-Type: text/html\r\n\r\n");
	System.out.println("<html><head></head><body>" );
	int[] seeds;
	try {
	    seeds = generateList();
	    StringBuffer sb = new StringBuffer();
	    sb.append( "sum: " + student);
	    for( int i = 0 ; i < seeds.length; i++ ) {
	        sb.append( " " + seeds[i] );
	    }
	    log.record( sb.toString() );
	    log.close();
	}
	catch( Exception ex ) {
	    // ignore error, XXX
	}
	finally {
	    System.out.println("</body></html>" );
	}
    }
}
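
The parameter parsing in parseStudentId can be generalised. The sketch below, which is an illustration and not part of the original scripts, collects every name=value pair of a query string into a map and undoes URL encoding with java.net.URLDecoder. The class name QueryString and the decision to store a parameter without '=' as an empty value are assumptions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper; not used by the scripts in these notes.
class QueryString {

    // Splits "a=1&b=2" into a map. A null or empty query gives an empty map.
    // A parameter without '=' is stored with an empty value (an assumption).
    // Malformed %XX escapes make URLDecoder throw IllegalArgumentException.
    static Map<String,String> parse( String query ) {
        Map<String,String> params = new LinkedHashMap<String,String>();
        if ( query == null || query.isEmpty() ) return params;
        for ( String pair : query.split("&") ) {
            if ( pair.isEmpty() ) continue;
            int eq = pair.indexOf('=');
            String name = ( eq < 0 ) ? pair : pair.substring( 0, eq );
            String value = ( eq < 0 ) ? "" : pair.substring( eq + 1 );
            params.put( decode( name ), decode( value ) );
        }
        return params;
    }

    // Undoes URL encoding: '+' becomes a space and %XX becomes the byte XX.
    private static String decode( String s ) {
        try {
            return URLDecoder.decode( s, "UTF-8" );
        }
        catch ( UnsupportedEncodingException ex ) {
            throw new IllegalStateException( ex );  // UTF-8 is always available
        }
    }

    public static void main( String[] args ) {
        System.out.print("Content-Type: text/plain\r\n\r\n");
        Map<String,String> params = parse( System.getenv("QUERY_STRING") );
        for ( Map.Entry<String,String> p : params.entrySet() ) {
            System.out.println( p.getKey() + " = " + p.getValue() );
        }
    }
}
```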

The genseq CGI script

The Java program that generates the random integer sequence page is:

GenerateRandomSequence.java
import java.util.Random;

public class GenerateRandomSequence {

    private static int parseSeed( String query )
	throws NumberFormatException, IllegalArgumentException
    {
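        // QUERY_STRING is unset when the script is run outside a web server
        if ( query == null ) {
            throw new IllegalArgumentException("missing query string");
        }
        String [] params = query.split("&");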
	for( int i = 0 ; i < params.length; i++ ) {
	    String [] words = params[i].split("=");
	    if ( words.length != 2 ) continue;
	    if ( words[0].equals("seed") ) {
	        return Integer.parseInt( words[1] );
	    }
	}
	throw new IllegalArgumentException("missing seed param");
    }

    private static final int MAX_SEQUENCE = 20;
    private static final int MAX_NUMBER = 100;

    public static void main( String[] args ) {
	 System.out.print("Content-Type: text/plain\r\n\r\n");
	 int seed;
	 try {
	     seed = parseSeed( System.getenv("QUERY_STRING") );
	 }
	 catch( Exception ex ) {
	     System.out.println(ex.getMessage() );
	     return;
	 }
	 Random seq = new Random( seed );
	 for( int i = 0 ; i < MAX_SEQUENCE; i++ ) {
	     System.out.println( seq.nextInt( MAX_NUMBER ) );
	 }
    }
}
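
Because the sequence is produced by a java.util.Random created from the seed, requesting the same seed twice yields the same 20 numbers. The sketch below illustrates this property; SeedDemo and its sequence method are names invented for the example, and the seed 42 is arbitrary.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative class; mirrors the loop in GenerateRandomSequence.
class SeedDemo {

    // Returns count pseudo-random numbers in the range 0 .. bound-1.
    static int[] sequence( long seed, int count, int bound ) {
        Random seq = new Random( seed );
        int[] out = new int[count];
        for ( int i = 0; i < count; i++ ) {
            out[i] = seq.nextInt( bound );
        }
        return out;
    }

    public static void main( String[] args ) {
        int[] first = sequence( 42, 20, 100 );
        int[] second = sequence( 42, 20, 100 );
        System.out.println( Arrays.toString( first ) );
        System.out.println( "same seed, same sequence: " + Arrays.equals( first, second ) );
    }
}
```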

Logging and file locking

The GenerateTestPage script above and the ReportSum script below use TransactionLog to log information about the requests.

TransactionLog.java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.FileOutputStream;
import java.io.PrintWriter;
import java.io.File;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class TransactionLog {

    private FileOutputStream fout;
    private FileChannel channel;

    public TransactionLog( String filename ) throws IOException {
	File file = new File( filename );
	// open log file for append only
	fout = new FileOutputStream(file, true);
	channel = fout.getChannel();
    }

    public void record( String transaction ) throws IOException {
	// lock file when saving transaction
	FileLock lock = channel.lock();
	try {
	    PrintWriter pw = 
		new PrintWriter(new OutputStreamWriter(fout) );
	    pw.println( transaction );
	    pw.flush();
	}
	finally {
	    lock.release();
	}
    }

    public void close() throws IOException {
	fout.close();
    }
}

Most web servers allow the concurrent execution of CGI scripts. File locking is necessary to ensure that concurrent access to the log file does not corrupt the file's contents.
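
The same lock-then-append idea can be sketched with try-with-resources, which releases the lock and closes the channel even if the write fails. This is an alternative sketch, not the TransactionLog class used by the scripts; the class name LockedAppend and the default file name example.log are made up.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Illustrative alternative to TransactionLog.record.
class LockedAppend {

    // Appends one line to the file while holding an exclusive lock on it.
    static void appendLine( Path file, String line ) throws IOException {
        byte[] bytes = ( line + System.lineSeparator() ).getBytes( StandardCharsets.UTF_8 );
        // Resources are closed in reverse order: the lock is released
        // first, then the channel is closed.
        try ( FileChannel channel = FileChannel.open( file,
                  StandardOpenOption.CREATE,
                  StandardOpenOption.WRITE,
                  StandardOpenOption.APPEND );
              FileLock lock = channel.lock() ) {
            ByteBuffer buf = ByteBuffer.wrap( bytes );
            while ( buf.hasRemaining() ) {
                channel.write( buf );
            }
        }
    }

    public static void main( String[] args ) throws IOException {
        // example.log is a made-up default file name
        String name = ( args.length > 0 ) ? args[0] : "example.log";
        appendLine( Paths.get( name ), "test: 123" );
    }
}
```

The lock protects against other processes, such as a second copy of the CGI script started by the web server. Within a single Java virtual machine, overlapping lock requests on the same file raise OverlappingFileLockException instead of waiting.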

Summary report script

The report.sh CGI script saves the reported sum using TransactionLog. The code is:

ReportSum.java
public class ReportSum {

    private static long[] parseParams( String query )
	throws NumberFormatException, IllegalArgumentException
    {
        long[] result = new long[2];

        String [] params = query.split("&");
	for( int i = 0 ; i < params.length; i++ ) {
	    String [] words = params[i].split("=");
	    if ( words.length != 2 ) continue;
	    if ( words[0].equals("student") ) {
	        result[0] = Long.parseLong( words[1] );
	    }
	    else if ( words[0].equals("sum") ) {
	        result[1] = Long.parseLong( words[1] );
	    }
	    else {
		throw new IllegalArgumentException("unknown param");
	    }
	}
	return result;
    }

    public static void main( String[] args ) {
	long[] params;
	TransactionLog log;
	try {
	    params = parseParams( System.getenv("QUERY_STRING") );
	    log = new TransactionLog( "transaction.log" );
	    String remote = System.getenv("REMOTE_ADDR");
	    if ( remote == null ) remote = "";
	    log.record("result: " + params[0] + " " + params[1] + " " + remote);
	}
	catch( Exception ex ) {
	    System.out.print("Content-Type: text/plain\r\n\r\n");
	    System.out.println(ex.getMessage() );
	    return;
	}
	System.out.print("Content-Type: text/plain\r\n\r\n");
	System.out.print("The sum " + params[1] + " for student " + params[0] );
	System.out.println(" has been recorded" );
    }
}

Capturing a CGI script output without a browser

Access to web servers is provided by the URL and URLConnection classes defined in the java.net package. The URLConnection class implements the client side of the HTTP protocol. These classes can be used to retrieve the documents provided by a web server. The Wget program outputs the document returned for the URL given as its argument. The code is:

Wget.java
import java.net.URLConnection;
import java.net.URL;
import java.io.InputStreamReader;

public class Wget {
    public static void main( String[] args ) throws Exception {
	URL url = new URL(args[0]);
	URLConnection conn = url.openConnection();
	conn.connect();
	InputStreamReader content =
	    new InputStreamReader( conn.getInputStream() );
	int ch;
	while( (ch=content.read()) != -1 ) {
	    System.out.print( (char)ch );
	}
    }
}

The output of the gentest CGI script can be viewed with the command:


java Wget http://www.cs.mun.ca/cgi-bin/user-cgi/~yzchen/cs3715/GenTest/gentest.sh?student=123
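
When the returned document needs to be processed rather than printed, it can be collected into a string. The sketch below is a variant of Wget that assumes the document is UTF-8 text and closes the stream when done; FetchText and fetch are names chosen for this example.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Illustrative variant of Wget.
class FetchText {

    // Returns the whole document at the URL as a string.
    // Assumes the document is encoded as UTF-8.
    static String fetch( URL url ) throws IOException {
        URLConnection conn = url.openConnection();
        StringBuilder sb = new StringBuilder();
        try ( Reader in = new InputStreamReader(
                  conn.getInputStream(), StandardCharsets.UTF_8 ) ) {
            char[] buf = new char[4096];
            int n;
            while ( ( n = in.read( buf ) ) != -1 ) {
                sb.append( buf, 0, n );
            }
        }
        return sb.toString();
    }

    public static void main( String[] args ) throws Exception {
        System.out.print( fetch( URI.create( args[0] ).toURL() ) );
    }
}
```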

Testing CGI scripts without an HTTP server

The RunCgi program provides a test harness for CGI scripts by executing the CGI script in the same way that a web server executes the script. The code for RunCgi is:

RunCgi.java
import java.util.Map;
import java.io.InputStreamReader;
import java.io.InputStream;
import java.io.File;
import java.io.FileOutputStream;

public class RunCgi {

    public static void main( String[] args ) throws Exception {
        ProcessBuilder pb = new ProcessBuilder("/bin/sh", "gentest.sh" );
	pb.directory( new File(".") );
	Map<String, String> env = pb.environment();
	env.clear();
	env.put( "QUERY_STRING", "student=123" );
	Process p = pb.start();
	InputStream instream = p.getInputStream();
	FileOutputStream save = new FileOutputStream("result.txt");
	int ch;
	while( (ch=instream.read()) != -1 ) {
	    save.write( (byte)ch );
	}
	save.close();
    }
}
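
A harness of this kind can also return the script's output as a string and check the exit status, which is convenient in automated tests. The following is a sketch under the assumption that /bin/sh is available, as in RunCgi; CgiHarness and run are hypothetical names, and treating a non-zero exit status as an error is a design choice of this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical test harness; a variation on RunCgi.
class CgiHarness {

    // Runs the command with only the given variables in its environment and
    // returns everything it wrote to standard output and standard error.
    static String run( List<String> command, Map<String,String> cgiEnv )
        throws IOException, InterruptedException
    {
        ProcessBuilder pb = new ProcessBuilder( command );
        pb.redirectErrorStream( true );      // merge stderr into stdout
        Map<String,String> env = pb.environment();
        env.clear();
        env.putAll( cgiEnv );
        Process p = pb.start();
        p.getOutputStream().close();         // empty standard input, as for a GET
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        InputStream in = p.getInputStream();
        byte[] buf = new byte[4096];
        int n;
        while ( ( n = in.read( buf ) ) != -1 ) {
            out.write( buf, 0, n );
        }
        int status = p.waitFor();
        String text = new String( out.toByteArray(), StandardCharsets.UTF_8 );
        if ( status != 0 ) {
            throw new IOException( "script exited with status " + status + ": " + text );
        }
        return text;
    }

    public static void main( String[] args ) throws Exception {
        Map<String,String> env = new HashMap<String,String>();
        env.put( "QUERY_STRING", "student=123" );
        System.out.print( run( Arrays.asList( "/bin/sh", "gentest.sh" ), env ) );
    }
}
```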

CGI Setup for CS Students

If enabled, a CS student creates and executes a CGI script by:

  1. Creating the ~/.cgi-bin directory.
  2. Setting the permissions to 0700 with chmod 0700 ~/.cgi-bin.
  3. Placing an executable program with 0700 permissions in the above directory.
  4. Requesting the URL that executes the CGI script:
    http://www.cs.mun.ca/cgi-bin/user-cgi/~user/executable
    where user is the user's login name, and executable is the path to the CGI script contained in ~/.cgi-bin.