A Simple Documentation of CERN HTTPD Implementation
Jin Zhang
===============================================================

How the HTTP daemon basically works, and the functions involved
in the changes to the HTTP daemon

---------------------------------------------------------------
FUNCTION: main(int argc, char** argv) in HTDaemon.c

 1. Call HTDefaultConfig() in HTConfig.c to set the default
    configuration for "sc" and "cc".
 2. Call HTFileInit() in HTSInit.c to initialize the filename
    suffix table.
 3. Read the command line arguments, set the configuration
    accordingly, and load the configuration file when necessary.
 4. Call HTServerInit() in HTConfig.c to initialize the default
    error messages and default icons.
 5. Call do_bind() in HTDaemon.c to allocate one socket, bind it
    to the specified port, and listen for requests on that port.
 6. Set the uid and gid of the parent process if necessary.
 7. If specified, call daemon_start() in HTDaemon.c, which forks
    twice and exits the parent each time so that the server runs
    in the background.
 8. Call set_signals() in HTDaemon.c to install signal handlers
    and to ignore some signals.
 9. Write the process id of the current process into the pid
    file. The file is named "httpd-pid" and is placed in the
    server_root directory if one is specified, or in /tmp
    otherwise.
10. Call HTUserInit() in HTUserInit.c. This is an empty function
    at the moment.
11. Call server_loop() in HTDaemon.c.

---------------------------------------------------------------
FUNCTION: server_loop() in HTDaemon.c

On Unix, this just calls standalone_server_loop() in HTDaemon.c.

---------------------------------------------------------------
FUNCTION: standalone_server_loop() in HTDaemon.c

 1. Run a loop that uses select() with a timeout to watch for
    incoming requests. If a request arrives, break out of the
    loop. On timeout, do garbage collection and loop again.
 2. Accept the request. If the request is not for the HTTPD, go
    back to the loop.
 3. Fork a child process. The parent calls sig_child() in
    HTDaemon.c, which calls wait3() with WNOHANG to collect the
    status of child processes, and then goes back to the loop.
 4. The child process mainly calls HTHandle() in HTDaemon.c to
    handle the request.
 5. The child shuts down the connection between server and
    client and exits.

---------------------------------------------------------------
FUNCTION: HTHandle(int soc) in HTDaemon.c

 1. Call reset_server_env() in HTDaemon.c to reset all the HT
    global variables.
 2. Call HTParseRequest() in HTRequest.c to read and parse the
    request from the client. It returns an HTRequest variable
    that contains information about the request.
 3. Call HTAA_checkAuthorization() in HTAAServ.c to check the
    authorization of the request.
 4. If the request is not authorized, log the request and
    return.
 5. Handle the request according to its type:

    1) HTAA_OK_GATEWAY: the httpd is used as a gateway.

       1.1) There is a cache and the request is authorized to
            access it: look up the cache.
            a) If the document is found in the cache, open the
               document and send it to the client directly.
            b) If it is not found and the server is used only as
               a cache with no connection to the outside, send
               an error message to the client.
            c) If the cache document needs to be created, call
               HTProxyCache() in HTCache.c so that the document
               is cached as well as sent to the client.
            If the request is an "http:" request, call
            HTBodyStream() in HTDaemon.c. Call HTLoadToStream()
            in HTAccess.c to get the document and send it to the
            client, caching it if necessary.

       1.2) There is no cache, or the request is not authorized
            to access it: call hbuf_proxy_headers() in
            HTRequest.c to generate a proxy request header, then
            call HTLoadToStream() in HTAccess.c to get the
            document from the server and send it to the client.

    2) HTAA_OK_REDIRECT or HTAA_OK_MOVED

       2.1) If it is a redirection, call HTLoadRedirection().
       2.2) Call HTCallScript() if a script needs to be run.
       /* have not gone through the details of this part yet */

    3) Otherwise, load a document normally and send it to the
       client.

       3.1) GET method: call HTRetrieve() in HTRetrieve.c to
            retrieve a document and send it to the client.
       3.2) Other methods:
            /* have not gone through the details of this part
               yet */

 6. Log the request and return.

===============================================================

1) How does a proxy server forward a request?

   1.1) In HTHandle(), whenever the proxy server needs to send
        an HTTP request to the HTTP server, it calls
        hbuf_proxy_headers() to construct HTProxyHeaders (a
        global variable), which is included in the HTTP request
        sent to the HTTP server. This header must include the
        following two lines:

            User-Agent: UserAgent
            Via: HTTP_VERSION Nobody (HTAppName:HTAppVersion)

   1.2) HTLoadToStream() ---> HTLoadDocument() ---> HTLoad()
        ---> HTLoadHTTP() ---> HTTPSendRequest()

        The proxy server calls HTTPSendRequest() to send the
        request header to the HTTP server. When the global
        variable HTProxyHeaders is set, its content is included
        in the request sent to the HTTP server.

   1.3) HTTPSendRequest() -> ... -> HTTPGetBody(), which creates
        a "tee"-ed pipe feeding both the client connection and
        the cache file, and calls HTCopy() to pull the data from
        the socket and push it to the two streams. After
        HTCopy() is done, the document is "loaded".

2) How does the HTTP server recognize the requests?

   2.1) In HTHandle(), the HTTP server calls HTParseRequest() to
        parse the request. If it reads a header beginning with
        "User-Agent", it sets the global variable HTUserAgent;
        otherwise HTUserAgent is a NULL pointer.

3) How does the HTTP server compose the response to a proxy or
   client?

   3.1) HTLoadToStream() ---> HTLoadDocument() ---> HTLoad()
        ---> HTLoadFile() ---> HTParseFile() ---> HTStreamStack()
        ---> HTMIMEWrapper() ---> HTReplyHeaders()
        ---> HTReplyHeadersWith()
        ---> generate the header of the reply message.

4) How does the proxy read and parse the reply from the HTTP
   server?

   HTLoadToStream() ---> HTLoadDocument() ---> HTLoad()
   ---> HTLoadHTTP() ---> HTCopy() ---> BODY_put_block()
   ---> BODY_put_char() ---> cache_put_char()

   In cache_put_char(), the proxy server reads and parses the
   reply from the HTTP server.
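
The "tee" described in 1.3 above, where data pulled from the server
socket is pushed to both the client connection and the cache file,
can be illustrated with a minimal sketch. This is not the CERN httpd
code: tee_copy() is a hypothetical helper, and it uses stdio streams
in place of the sockets and stream objects that the real HTCopy()
works with.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, NOT the CERN httpd HTCopy(): read blocks from
 * `src` and write each block to both `client` and `cache`.
 * `cache` may be NULL when the reply is forwarded without caching.
 * Returns the number of bytes copied, or -1 on a read or write error.
 */
long tee_copy(FILE *src, FILE *client, FILE *cache)
{
    char buf[4096];
    size_t n;
    long total = 0;

    assert(src != NULL && client != NULL);

    while ((n = fread(buf, 1, sizeof buf, src)) > 0) {
        /* Forward the block to the client first. */
        if (fwrite(buf, 1, n, client) != n)
            return -1;
        /* Then store the same block in the cache, if there is one. */
        if (cache != NULL && fwrite(buf, 1, n, cache) != n)
            return -1;
        total += (long)n;
    }
    /* fread() returns 0 on both end-of-file and error; tell them apart. */
    return ferror(src) ? -1 : total;
}
```

In this sketch, passing NULL as the cache stream corresponds to the
case where the document is only forwarded to the client and not
cached.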