Future of Information Technology: UNIT

UNIT – 3 World Wide Web

3.1 Introduction

The World Wide Web (commonly shortened to the Web) is a system of interlinked hypertext documents accessed via the Internet. With a Web browser, one can view Web pages that may contain text, images, videos, and other multimedia and navigate between them using hyperlinks.
The World Wide Web enabled the spread of information over the Internet through an easy-to-use and flexible format. It thus played an important role in popularising use of the Internet, to the extent that the World Wide Web has become a synonym for Internet, with the two being conflated in popular use.
How it works
Viewing a Web page on the World Wide Web normally begins either by typing the URL of the page into a Web browser, or by following a hyperlink to that page or resource. The Web browser then initiates a series of communication messages, behind the scenes, in order to fetch and display it.
First, the server-name portion of the URL is resolved into an IP address using the global, distributed Internet database known as the domain name system, or DNS. This IP address is necessary to contact and send data packets to the Web server.
The browser then requests the resource by sending an HTTP request to the Web server at that particular address. In the case of a typical Web page, the HTML text of the page is requested first and parsed immediately by the Web browser, which will then make additional requests for images and any other files that form a part of the page. Statistics measuring a website's popularity are usually based on the number of 'page views' or associated server 'hits', or file requests, which take place.
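The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular browser is implemented: the function names and the example URL are assumptions made for this sketch, and the DNS lookup is wrapped in a function that is defined but not called, so the example runs without a network connection.

```python
import socket
from urllib.parse import urlsplit


def resolve(host: str) -> str:
    """DNS step: turn a server name into an IP address (needs network access)."""
    return socket.gethostbyname(host)


def build_http_request(url: str):
    """Return (host, port, request bytes) for a plain HTTP/1.1 GET of `url`."""
    parts = urlsplit(url)
    host = parts.hostname          # the server-name portion of the URL
    port = parts.port or 80        # HTTP uses port 80 unless the URL says otherwise
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"                     # a blank line ends the request headers
    )
    return host, port, request.encode("ascii")


# Hypothetical URL used only for illustration.
host, port, request = build_http_request("http://www.example.com/index.html")
print(host, port)
print(request.decode("ascii"))
```

A real browser would send these bytes to the address returned by resolve(host), then parse the HTML it receives and repeat the process for each image or other file the page refers to.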
Client Server Architecture of WWW
The WWW is based on a client-server architecture: the Web browser is the client, which sends HTTP requests, and the Web server fulfills those requests.
3.2 Web Browser

A Web browser is a software application which enables a user to display and interact with text, images, videos, music, games and other information typically located on a Web page at a Web site on the World Wide Web or a local area network. Text and images on a Web page can contain hyperlinks to other Web pages at the same or different Web site. Web browsers allow a user to quickly and easily access information provided on many Web pages at many Web sites by traversing these links. Web browsers format HTML information for display, so the appearance of a Web page may differ between browsers.
Web browsers are the most-commonly-used type of HTTP user agent. Although browsers are typically used to access the World Wide Web, they can also be used to access information provided by Web servers in private networks or content in file systems.
Current Web browsers
Some of the Web browsers currently available for personal computers include Internet Explorer, Mozilla Firefox, Safari, Opera, Avant Browser, Konqueror, Lynx, Google Chrome, Flock, Arachne, Epiphany, K-Meleon and AOL Explorer.
Protocols and standards
Web browsers communicate with Web servers primarily using Hypertext Transfer Protocol (HTTP) to fetch Web pages. HTTP allows Web browsers to submit information to Web servers as well as fetch Web pages from them. The most-commonly-used version of HTTP is HTTP/1.1, which is fully defined in RFC 2616. HTTP/1.1 has its own required standards that Internet Explorer does not fully support, but most other current-generation Web browsers do.
Pages are located by means of a URL (Uniform Resource Locator, RFC 1738), which is treated as an address, beginning with http: for HTTP transmission. Many browsers also support a variety of other URL types and their corresponding protocols, such as gopher: for Gopher (a hierarchical hyperlinking protocol), ftp: for File Transfer Protocol (FTP), rtsp: for Real-time Streaming Protocol (RTSP), and https: for HTTPS (HTTP Secure, which is HTTP augmented by Secure Sockets Layer or Transport Layer Security).
The file format for a Web page is usually HTML (HyperText Markup Language) and is identified in the HTTP protocol using a MIME content type. Most browsers natively support a variety of formats in addition to HTML, such as the JPEG, PNG and GIF image formats, and can be extended to support more through the use of plugins. The combination of HTTP content type and URL protocol specification allows Web-page designers to embed images, animations, video, sound, and streaming media into a Web page, or to make them accessible through the Web page.
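As a rough illustration of how the URL scheme and the MIME content type work together, the sketch below splits a URL into its scheme and guesses a content type from the file extension using Python's standard mimetypes module. In real HTTP traffic the content type is stated by the server in a response header; guessing from the extension is a simplification, and the URLs are hypothetical.

```python
import mimetypes
from urllib.parse import urlsplit


def describe_resource(url: str):
    """Return (scheme, guessed MIME type) for a URL."""
    parts = urlsplit(url)
    content_type, _encoding = mimetypes.guess_type(parts.path)
    return parts.scheme, content_type


# Hypothetical URLs used only for illustration.
for url in (
    "http://www.example.com/index.html",
    "https://www.example.com/images/logo.png",
    "ftp://ftp.example.com/pub/photo.jpg",
    "http://www.example.com/anim.gif",
):
    print(url, describe_resource(url))
```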
3.2.1 Web Browser Interface

3.2.2 Web Browser Details

3.2.2.1 Personal preferences

Most browsers have a number of options that you can set.
· Cookies:
HTTP cookies, more commonly referred to as Web cookies, tracking cookies or just cookies, are parcels of text sent by a server to a Web client (usually a browser) and then sent back unchanged by the client each time it accesses that server. HTTP cookies are used for authenticating, session tracking (state maintenance), and maintaining specific information about users, such as site preferences or the contents of their electronic shopping carts. The term "cookie" is derived from "magic cookie," a well-known concept in UNIX computing which inspired both the idea and the name of HTTP cookies.
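The round trip described above (server sends a cookie, client returns it unchanged) can be sketched with Python's standard http.cookies module. The cookie name cart_id and its value are made up for this example, and the sketch works on header strings only, without any network traffic.

```python
from http.cookies import SimpleCookie

# Server side: create a cookie and render it as a Set-Cookie response header.
server_cookie = SimpleCookie()
server_cookie["cart_id"] = "abc123"        # hypothetical shopping-cart identifier
server_cookie["cart_id"]["path"] = "/"
set_cookie_header = server_cookie.output(header="Set-Cookie:")
print(set_cookie_header)

# Client side: store what the server sent...
client_jar = SimpleCookie()
client_jar.load(set_cookie_header.split(":", 1)[1].strip())

# ...and send it back unchanged in the Cookie header of the next request.
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in client_jar.items()
)
print("Cookie:", cookie_header)

# Server side again: read the value back from the request header.
received = SimpleCookie()
received.load(cookie_header)
print(received["cart_id"].value)
```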
The maximum number of cookies stored per domain in the major browsers is:
Firefox 1.5: 50
Firefox 2.0: 50
Opera 9: 30
Internet Explorer 6: 20 (raised to 50 in update on August 14, 2007)
Internet Explorer 7: 20 (raised to 50 in update on August 14, 2007)
Enabling Cookies in Your Web Browser
In order for this service to work correctly, cookies must be enabled in your Web browser. A cookie is a message given to a Web browser by a Web server. The browser stores the message in a text file and sends it back to the server when it's needed. Cookies are used in this service to save your username and password and establish and maintain a path to a particular server during the research session.
To enable cookies in Internet Explorer,
1. From the Tools menu, choose Internet Options.
2. Click the Security tab and then click Custom Level.
3. Scroll to Cookies and select Enable. This allows cookies to be stored on your computer.
4. Click OK in the Security Settings dialog box.
5. Click OK in the Internet Options dialog box.
To enable cookies in Netscape,
1. From the Edit menu, choose Preferences.
2. Click Advanced in the left frame.
3. In the Cookies section, select Accept all cookies.
4. Click OK.

· Disk cache:
A disk cache is a mechanism for improving the time it takes to read from or write to a hard disk. Today, the disk cache is usually included as part of the hard disk. A disk cache can also be a specified portion of random access memory (RAM).
When a web page is displayed in your web browser, the text and any pictures are stored locally on your system in what are commonly known as temporary files. The next time the same web page is visited, your web browser communicates with the web server to confirm that the page and pictures stored on your system are the latest version. If so, the page is loaded from your local hard disk instead of from the web server.
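The "confirm that the stored copy is the latest version" step can be sketched as follows. The sketch assumes the validation is done with an entity tag (ETag) and a 304 "Not Modified" reply, which is one mechanism HTTP offers for this purpose; the notes above do not say which mechanism a given browser uses. The server is simulated by a dictionary, and all names are hypothetical.

```python
import hashlib

# Simulated web server content (hypothetical page).
SERVER_PAGES = {"/index.html": b"<html>version 1</html>"}


def etag_for(body: bytes) -> str:
    """Derive a short version tag from the page content."""
    return hashlib.sha256(body).hexdigest()[:16]


def server_get(path: str, if_none_match=None):
    """Return (status, etag, body); 304 with an empty body if the client copy is current."""
    body = SERVER_PAGES[path]
    tag = etag_for(body)
    if if_none_match == tag:
        return 304, tag, b""
    return 200, tag, body


class BrowserCache:
    def __init__(self):
        self.entries = {}  # path -> (etag, body) stored on the local disk

    def load(self, path: str):
        """Return (body, source) where source is 'cache' or 'server'."""
        cached = self.entries.get(path)
        status, tag, body = server_get(path, cached[0] if cached else None)
        if status == 304:
            return cached[1], "cache"      # stored copy is still the latest version
        self.entries[path] = (tag, body)   # store the fresh copy for next time
        return body, "server"


cache = BrowserCache()
first_body, first_source = cache.load("/index.html")
second_body, second_source = cache.load("/index.html")
print(first_source, second_source)  # prints: server cache
```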
Disk cache set up instructions for your Internet Explorer Web Browser:
Step 1.
Move your mouse to the top line of your web browser to the word "View" and press and hold your left mouse button.
Step 2.
Move your mouse pointer down to "Internet Options" and release the left mouse button.

Step 3.
Move your mouse pointer and press the button "Settings..."
Step 4.
Select one of the two options to:
Every visit to the page
or
Every time you start Internet Explorer

Step 5.
Press the button "OK".
Step 6.
Press the button "OK".

· Fonts
Set the fonts large enough to be seen easily, generally at least point size 12. There are often two settings, one for normal text, and one for old-fashioned typewriter type text that needs each character to take the same amount of space. Set the normal text -- perhaps called "Web page font", "Proportional", or "Variable width font" -- to a modern font like "Arial" or "Times New Roman". Set the typewriter style text -- perhaps called "Plain text font", "Fixed width font", or "Monospace" -- to "Courier New" or if not available then plain "Courier".
Explorer: Tools / Internet Options / General / Fonts
Firefox: Tools / Options / General / Fonts & Colors.

· URL display
Set the browser to display the site's URL in the top border, so that you can always see the full address of any page you visit.
Explorer: View / Toolbars / Address Bar
Firefox: Right-click on File / Edit bar at the top of window, select "Navigation Toolbar".

· Set home page
Set your home page to the Internet site you wish displayed when your browser starts. You can set your home page to your own home page, a search engine site, or a favorite subject page like a sports or gardening site. If you want your first page to start quickly, you should specify a simple page with fewer graphics, or no page at all.
Explorer: Tools / Internet Options / General / Home page
Firefox: Tools / Options / General / Home page

· Page reloading
Set your browser to use a copy of a web page in your computer's cache if it is not older than the maximum age specified by the server. Typically a browser will use a cached page throughout one browser session, unless the page is marked with HTML that specifies it be reloaded with some other frequency, such as every time, after one hour, after 24 hours, etc. This can speed up your surfing since it saves the time of reloading the page when it hasn't changed.
Explorer: Tools / Internet Options / General / Temporary Internet Files / Settings / Every time you start Internet Explorer
Firefox: Automatic setting.
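The "not older than the maximum age" rule can be written as a small check. The sketch below assumes the server states the maximum age in a Cache-Control: max-age header, which is one common way of doing so; the function names are invented for this example, and times are plain numbers of seconds.

```python
import re


def parse_max_age(cache_control: str):
    """Extract the max-age value (in seconds) from a Cache-Control header, if present."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None


def is_fresh(stored_at: float, max_age: int, now: float) -> bool:
    """A cached copy may be reused while its age does not exceed max_age."""
    return (now - stored_at) <= max_age


max_age = parse_max_age("public, max-age=3600")   # server allows reuse for one hour
print(is_fresh(stored_at=0, max_age=max_age, now=1800))   # half an hour later
print(is_fresh(stored_at=0, max_age=max_age, now=7200))   # two hours later
```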
· Images

Managing images in browsers:
Mozilla Firefox 2.0
On Windows, select the "Tools" menu, on Mac OSX, select the "Firefox" menu, or on Linux select the "Edit" menu.
Select "Options" on Windows, or "Preferences" on Mac OSX and Linux
Click the "Content" tab.
Uncheck the checkbox "Load images automatically."
Internet Explorer 7
Click the "Tools" button.
Select "Internet Options".
Click the "Advanced" tab.
Scroll down to "Multimedia" and uncheck "Show pictures".

· JavaScript and Java

To enable JavaScript in Microsoft Internet Explorer 5.x or 6.x, perform the following steps:
From the Tools menu, click Internet Options.
From the Security tab, click Custom Level.
The Internet icon is highlighted by default.
Scroll to Java permissions and click to select High safety.
Click OK.

Click OK.

From the File menu, click Close.
Re-launch your browser.

To enable or disable Java
1. Open Internet Explorer by clicking the Start button, and then clicking Internet Explorer.
2. Click the Tools button, and then click Internet Options.
3. Click the Advanced tab.
4. If Java is installed, there will be a Java section in the Settings list. To enable Java, select the option under Java. To disable Java, clear the option under Java. When you are finished, click OK.

3.2.2.2 Bookmarks/ Favourites
Internet bookmarks are stored Web page locations (URLs) that can be retrieved. As a feature of all modern Internet web browsers, their primary purpose is to easily catalog and access web pages that a user has visited and chosen to save. Saved links are called "favorites" in Internet Explorer, and by virtue of the browser's large market share, the term favorite has been synonymous with bookmark since the early days of widely-distributed browsers. Bookmarks are normally visible in a browser menu and stored on the user's computer, and commonly a folder metaphor is used for organization. In addition to bookmarking methods within most browsers, many external applications exist for bookmark management.
Storage
Each browser has a built-in tool for managing the list of bookmarks. The list storage method varies, depending on the browser, its version, and the operating system on which it runs.
In Netscape-derived browsers, bookmarks are stored in the single HTML-coded file bookmarks.html. This approach permits publication and printing of a categorized and indented catalog, and works across platforms. Bookmark names need not be unique. Editing this file outside of its native browser requires editing HTML.
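Because bookmarks.html is ordinary HTML, it can be read with a standard HTML parser. The sketch below assumes the usual layout of such files, with folder names in H3 elements, folder contents in DL lists and each bookmark as an A link; the sample content is invented for illustration. Note that the two bookmarks in the "News" folder share a name, which this format permits.

```python
from html.parser import HTMLParser

# Invented sample in the style of a Netscape bookmarks.html file.
SAMPLE_BOOKMARKS = """
<DL><p>
  <DT><A HREF="http://www.example.com/">Example</A>
  <DT><H3>News</H3>
  <DL><p>
    <DT><A HREF="http://news.example.com/a">Example News</A>
    <DT><A HREF="http://news.example.com/b">Example News</A>
  </DL><p>
</DL><p>
"""


class BookmarkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.bookmarks = []   # list of (folder, title, url)
        self._folders = []    # stack of currently open folder names
        self._href = None     # URL of the link being read, if any
        self._in_h3 = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []
        elif tag == "h3":
            self._in_h3 = True
            self._text = []

    def handle_data(self, data):
        if self._href is not None or self._in_h3:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.bookmarks.append(("/".join(self._folders), title, self._href))
            self._href = None
        elif tag == "h3":
            self._folders.append("".join(self._text).strip())
            self._in_h3 = False
        elif tag == "dl" and self._folders:
            self._folders.pop()   # leaving a folder's list


parser = BookmarkParser()
parser.feed(SAMPLE_BOOKMARKS)
for folder, title, url in parser.bookmarks:
    print(folder or "(top level)", "|", title, "|", url)
```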
Firefox 3 stores bookmarks, history, cookies, and preferences in a transactionally secure database format (SQLite).
In Internet Explorer, "Favorites" (also "Internet Shortcuts") are stored as individual files named with the original link name, and the filename extension ".URL", for example "Home Page.URL". They are collected in a directory named "Favorites", which may have subdirectories. Bookmark names must be unique within a folder. Each file contains the original URL and Microsoft-specific metadata. Browsers have varying abilities to import and export bookmarks to favorites and vice versa.
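A ".URL" favorite is a small text file that can be read much like an INI file. The sketch below assumes the common layout in which the address is stored under a URL key in an [InternetShortcut] section; any Microsoft-specific metadata lines are simply ignored. The sample content is invented for illustration.

```python
import configparser

# Invented content of a file such as "Home Page.URL".
SAMPLE_SHORTCUT = """[InternetShortcut]
URL=http://www.example.com/a%20b.html
"""


def read_shortcut(text: str) -> str:
    """Return the address stored in an Internet Shortcut file."""
    # interpolation=None so that '%' characters in URLs are left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser["InternetShortcut"]["URL"]


print(read_shortcut(SAMPLE_SHORTCUT))
```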
Managing Bookmarks in Firefox:
1. Launch Firefox from the icon on your desktop or through your "Start"(Explorer) menu by finding Mozilla Firefox in your "Programs" or "All Programs" section.
2. Click on "Bookmarks" in the top toolbar (in between "History" and "Tools"). Under this menu, click "Organize Bookmarks". Another browser window will pop up listing all your bookmarks. In this new window, there is a list of icons at the top.
3. Select "Bookmarks" from the far left pane, then click on "New Folder", which is the second icon button at the top. A new folder will appear at the bottom of your bookmark list in the far left pane, and a dialog window will appear.
4. Rename the folder to something appropriate, then click "Ok".

5. Scroll through your bookmarks in the right pane to find the ones you want to move to this folder.
Click on the first one you'd like to move.
Hold down the "Ctrl" key and continue to click on others to move. This will highlight only the bookmarks you select.

6. Click and drag your selections to the new folder you have created in the far left pane.
7. Repeat these steps to create additional folders and organize related bookmarks.

Managing favourites in Internet Explorer:
1. In Microsoft Internet Explorer, go to a Web site that you would like to go back to frequently.
2. Select Add to Favorites from the Favorites menu.
3. In the Name box, type in a name for the favorite page.
4. Create a new folder in which to organize your bookmark by clicking the New Folder button. (If the New Folder button is not visible, first click the Create In button.)
5. Type a name for the new folder, and then click OK.
6. In the Add Favorite dialog box, make sure the new folder is selected, and then click OK.
7. Whenever you want to view your new favorite page, click the Favorites menu. Point to the new folder that you created to view a list of the favorite pages contained within it, and then click the bookmarked Web site.
8. Keep adding sites to your Favorites list and organizing them into curriculum-related folders.

3.2.2.3 Plug-ins and Helper applications
Plug-ins:
A plug-in consists of a computer program that interacts with a host application (a web browser or an email client, for example) to provide a certain, usually very specific, function "on demand". Applications support plugins for many reasons.
Some of the main reasons include:
to enable third-party developers to create capabilities to extend an application
to support features yet unforeseen
to reduce the size of an application
to separate source code from an application because of incompatible software licenses.

Following are the examples of Plug-ins:
Beatnik
QuickTime
RealPlayer
Shockwave
VivoActive Player
· Beatnik:
Beatnik delivers high-quality interactive sound from websites. It is provided by Headspace, Inc. and is available for Netscape Navigator and Communicator on both Macintosh PowerPC and Windows 95/NT. The Headspace website offers detailed information on Beatnik, as well as an array of sites that showcase the plug-in's capabilities.
· QuickTime:
QuickTime, a product of Apple, Inc., is capable of delivering multimedia such as movies, audio, MIDI soundtracks, 3D animation, and virtual reality. It is available to Macintosh and Windows 3.x/95/NT. The QuickTime package contains plug-in and helper applications. The QuickTime Plug-in allows QuickTime and QuickTime VR content to be viewed directly within a browser. The Movie Player and Picture Viewer, helper applications, allow all QuickTime multimedia to be played (file creation and editing can be completed with QuickTime Pro).
· RealPlayer:
RealPlayer is a live and on-demand RealAudio and RealVideo player which functions without download delays. It is provided by RealNetworks, Inc. and is available for Macintosh, Unix, and Windows 3.1/95/NT as both a plug-in and helper application. To test your RealPlayer plug-in, visit any of the sites listed in their showcase. The plug-in is compatible with many popular browsers.
· Shockwave:
The Shockwave plug-in, provided by Macromedia, Inc., allows multimedia files created using Macromedia's Director, Authorware, and Flash to be viewed directly in your web browser. The plug-in is compatible with Netscape Navigator 2.0 or later and Internet Explorer 3.0 or later on Macintosh and Windows 3.1/95/NT platforms.
· VivoActive Player:
VivoActive Player delivers on-demand video and audio from any website offering VivoActive content. This product, provided by Vivo Software, Inc., is available for Netscape Navigator and Microsoft Internet Explorer browsers on Power Macintosh and Windows 3.x/95/NT platforms.

Helper Applications:
A helper application is an external viewer program launched to display content retrieved using a web browser.
Unlike a plug-in, whose code is loaded into the browser itself, a helper application needs only a small entry in the browser's configuration telling the browser which application to open when it encounters a certain file format.
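The "small entry that tells the browser which application to open" amounts to a lookup table from content type to program. The sketch below is purely illustrative: the table entries, the set of natively supported types and the fallback action are assumptions made for this example, not the configuration of any real browser.

```python
# Formats the browser displays by itself (illustrative subset).
NATIVE_TYPES = {"text/html", "image/jpeg", "image/png", "image/gif"}

# Hypothetical helper-application table: content type -> external program.
HELPER_APPLICATIONS = {
    "application/pdf": "Adobe Acrobat Reader",
    "video/quicktime": "QuickTime",
    "audio/x-pn-realaudio": "RealPlayer",
}


def choose_handler(content_type: str) -> str:
    """Decide what should display a resource of the given content type."""
    if content_type in NATIVE_TYPES:
        return "browser"
    # Unknown formats fall back to saving the file (an assumption of this sketch).
    return HELPER_APPLICATIONS.get(content_type, "save to disk")


for content_type in ("text/html", "application/pdf", "application/x-unknown"):
    print(content_type, "->", choose_handler(content_type))
```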
Following are the examples of helper Applications:
· Adobe Acrobat Reader
· QuickTime
· RealPlayer
· Adobe Acrobat Reader
Acrobat Reader allows you to access PDF files on the web. It is provided by Adobe Systems, Inc. and is available for Macintosh, Unix, and Windows 3.x/95/NT.
· QuickTime
QuickTime, a product of Apple, Inc., is capable of delivering multimedia such as movies, audio, MIDI soundtracks, 3D animation, and virtual reality. It is available to Macintosh and Windows 3.x/95/NT. The QuickTime package contains plug-in and helper applications. The QuickTime Plug-in allows QuickTime and QuickTime VR content to be viewed directly within a browser. The Movie Player and Picture Viewer, helper applications, allow all QuickTime multimedia to be played (file creation and editing can be completed with QuickTime Pro).
· RealPlayer
RealPlayer is a live and on-demand RealAudio and RealVideo player which functions without download delays. It is provided by RealNetworks, Inc. and is available for Macintosh, Unix, and Windows 3.1/95/NT as both a plug-in and helper application. To test your RealPlayer plug-in, visit any of the sites listed in their showcase. The plug-in is compatible with many popular browsers. For a complete listing visit the RealPlayer system requirements page.

3.3 Search engine
A Web search engine is a tool designed to search for information on the World Wide Web. Information may consist of web pages, images and other types of files. Some search engines also mine data available in newsbooks, databases, or open directories. Unlike Web directories, which are maintained by human editors, search engines operate algorithmically or are a mixture of algorithmic and human input.
How Web search engines work
A search engine operates in the following order:
Web crawling
Indexing
Searching
1. Web crawling
A Web crawler (also known as a Web spider or Web robot) is a program or automated script that browses the World Wide Web in a methodical, automated manner. Other less frequently used names for Web crawlers are ants, automatic indexers, bots, and worms.
This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Also, crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).
A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
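The seed list and crawl frontier can be sketched as follows. To keep the example runnable without network access, the "Web" is a small dictionary of invented pages; a real crawler would fetch each URL over HTTP. The frontier is processed as a first-in, first-out queue here, which is just one possible visiting policy.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# A tiny invented "Web": URL -> HTML content.
FAKE_WEB = {
    "http://a.example/": '<a href="/about">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about": '<a href="/">Home</a>',
    "http://b.example/": '<a href="http://a.example/about">A</a> <a href="/missing">?</a>',
}


class LinkExtractor(HTMLParser):
    """Collect the href of every hyperlink in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(seeds, max_pages=100):
    frontier = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, to avoid revisiting
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = FAKE_WEB.get(url)
        if html is None:      # page does not exist; skip it
            continue
        visited.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            link = urljoin(url, href)   # resolve relative links against the page URL
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


print(crawl(["http://a.example/"]))
```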
Crawling policies
There are three important characteristics of the Web that make crawling it very difficult:
its large volume,
its fast rate of change, and
dynamic page generation.
The behavior of a Web crawler is the outcome of a combination of policies:
a selection policy that states which pages to download,
a re-visit policy that states when to check for changes to the pages,
a politeness policy that states how to avoid overloading Web sites, and
a parallelization policy that states how to coordinate distributed Web crawlers.
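As one illustration of these policies, a politeness policy can be as simple as enforcing a minimum delay between two requests to the same host. The class below is a hypothetical sketch: the class name, the fixed delay and the use of plain numbers for time are all assumptions made for this example.

```python
from urllib.parse import urlsplit


class PolitenessPolicy:
    """Allow at most one request per host every `delay_seconds`."""

    def __init__(self, delay_seconds: float):
        self.delay_seconds = delay_seconds
        self._last_fetch = {}   # host -> time of the most recent request

    def wait_time(self, url: str, now: float) -> float:
        """Seconds the crawler should still wait before fetching `url`."""
        last = self._last_fetch.get(urlsplit(url).hostname)
        if last is None:
            return 0.0
        return max(0.0, last + self.delay_seconds - now)

    def record_fetch(self, url: str, now: float) -> None:
        self._last_fetch[urlsplit(url).hostname] = now


policy = PolitenessPolicy(delay_seconds=2.0)
policy.record_fetch("http://a.example/page1", now=0.0)
print(policy.wait_time("http://a.example/page2", now=0.5))   # same host: must wait
print(policy.wait_time("http://b.example/", now=0.5))        # other host: no wait
```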
Web crawler architectures

Examples of Web crawlers
The following is a list of published architectures for general-purpose crawlers:
RBSE
WebCrawler
World Wide Web Worm
Google Crawler
PolyBot
FAST Crawler
Open-source crawlers
DataparkSearch
GNU Wget
Heritrix
HTTrack
ICDL Crawler
Pavuk
YaCy

2. Indexing
Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, physics and computer science. An alternate name for the process in the context of search engines designed to find web pages on the Internet is Web indexing.
The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would scan every document in the corpus, which would require considerable time and computing power. For example, while an index of 10,000 documents can be queried within milliseconds, a sequential scan of every word in 10,000 large documents could take hours. The additional computer storage required to store the index, as well as the considerable increase in the time required for an update to take place, are traded off for the time saved during information retrieval.
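The trade-off described above can be seen in a minimal inverted index, sketched below with three invented documents. Building the index costs time and storage up front, but each query then needs only a few dictionary lookups instead of a scan of every document. Splitting text on whitespace is a simplification made for this example.

```python
from collections import defaultdict

# Invented sample corpus: document id -> text.
DOCS = {
    1: "the web is a system of hypertext documents",
    2: "a web browser displays web pages",
    3: "telnet provides a command line interface",
}


def build_inverted_index(docs):
    """Map each word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_index(index, query):
    """Return ids of documents containing every query word (uses the index)."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def search_scan(docs, query):
    """Same result as search_index, but by scanning every document."""
    words = query.lower().split()
    return {
        doc_id
        for doc_id, text in docs.items()
        if words and all(word in text.lower().split() for word in words)
    }


index = build_inverted_index(DOCS)
print(search_index(index, "web"))
print(search_index(index, "web browser"))
```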
Index Design Factors
Major factors in designing a search engine's architecture include:
a. Merge factors
How data enters the index, or how words or subject features are added to the index during text corpus traversal, and whether multiple indexers can work asynchronously.
b. Storage techniques
How to store the index data, that is, whether information should be data compressed or filtered.
c. Index size
How much computer storage is required to support the index.
d. Lookup speed
How quickly a word can be found in the inverted index.
e. Maintenance
How the index is maintained over time.
f. Fault tolerance
How important it is for the service to be reliable.

Document Parsing
Document parsing breaks apart the components (words) of a document or other form of media for insertion into the forward and inverted indices. The words found are called tokens, and so, in the context of search engine indexing and natural language processing, parsing is more commonly referred to as tokenization. It is also sometimes called word boundary disambiguation, tagging, text segmentation, content analysis, text analysis, text mining, concordance generation, speech segmentation, lexing, or lexical analysis. The terms 'indexing', 'parsing', and 'tokenization' are used interchangeably in corporate slang.
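A very simple tokenizer can be written with a regular expression, as sketched below. The pattern used here (runs of letters and digits, optionally followed by an apostrophe and more letters) is an assumption that suits plain English text; real search engines use far more elaborate rules. The forward index built at the end simply lists the tokens found in each document.

```python
import re

# Letters/digits, optionally followed by an apostrophe part such as "n't".
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def tokenize(text: str):
    """Return a list of (token, character offset) pairs, lower-cased."""
    return [(m.group().lower(), m.start()) for m in TOKEN_PATTERN.finditer(text)]


def build_forward_index(docs):
    """Map each document id to the list of tokens it contains, in order."""
    return {doc_id: [tok for tok, _ in tokenize(text)] for doc_id, text in docs.items()}


tokens = tokenize("Web browsers don't index; crawlers do.")
print(tokens)
print(build_forward_index({1: "Hello, Web!", 2: "Telnet uses port 23."}))
```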

3. Searching
A web search query is a query that a user enters into a web search engine to satisfy his or her information needs. Web search queries are distinctive in that they are unstructured and often ambiguous; they vary greatly from standard query languages which are governed by strict syntax rules.
Types
There are three broad categories that cover most web search queries[1]:
Informational queries – Queries that cover a broad topic (e.g., colorado or trucks) for which there may be thousands of relevant results.
Navigational queries – Queries that seek a single website or web page of a single entity (e.g., youtube or delta airlines).
Transactional queries – Queries that reflect the intent of the user to perform a particular action, like purchasing a car or downloading a screen saver.
Search engines often support a fourth type of query that is used far less frequently:
Connectivity queries – Queries that report on the connectivity of the indexed web graph.
Characteristics
Most commercial web search engines do not disclose their search logs, so information about what users are searching for on the Web is difficult to come by. Nevertheless, a 2001 study [3] that analyzed queries from the Excite search engine showed some interesting characteristics of web search:
The average length of a search query was 2.4 terms.
About half of the users entered a single query while a little less than a third of users entered three or more unique queries.
Close to half of the users examined only the first one or two pages of results (10 results per page).
Less than 5% of users used advanced search features (e.g., Boolean operators like AND, OR, and NOT).

3.3.1 Metasearch engine
A metasearch engine is a search tool that sends user requests to several other search engines and/or databases and aggregates the results into a single list or displays them according to their source. Metasearch engines enable users to enter search criteria once and access several search engines simultaneously. Metasearch engines operate on the premise that the Web is too large for any one search engine to index it all and that more comprehensive search results can be obtained by combining the results from several search engines. This also may save the user from having to use multiple search engines separately.
The term Metasearch is frequently used to classify a set of commercial search engines, but is also used to describe the paradigm of searching multiple data sources in real time. The National Information Standards Organization (NISO) uses the terms Federated Search and Metasearch interchangeably to describe this web search paradigm.

Architecture of a metasearch engine

Operation
Metasearch engines create what is known as a virtual database. They do not compile a physical database or catalogue of the web. Instead, they take a user's request, pass it to several other heterogeneous databases and then compile the results in a homogeneous manner based on a specific algorithm.
No two metasearch engines are alike. Some search only the most popular search engines while others also search lesser-known engines, newsgroups, and other databases. They also differ in how the results are presented and the quantity of engines that are used. Some will list results according to search engine or database. Others return results according to relevance, often concealing which search engine returned which results. This benefits the user by eliminating duplicate hits and grouping the most relevant ones at the top of the list.
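The operation described above can be sketched as follows: the same query is passed to several engines, duplicate hits are merged, and the combined list is ordered by a relevance score. The two "engines" are stand-in functions returning invented results, and the scoring rule (sum of 1/rank across engines) is only one possible algorithm chosen for this example.

```python
def engine_a(query):
    """Stand-in for a real search engine; returns invented results, best first."""
    return ["http://a.example/1", "http://shared.example/", "http://a.example/2"]


def engine_b(query):
    return ["http://shared.example/", "http://b.example/1"]


def metasearch(query, engines):
    """Return [(url, [engines that returned it])], most relevant first."""
    scores = {}
    sources = {}
    for name, engine in engines.items():
        for rank, url in enumerate(engine(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank   # higher rank, higher score
            sources.setdefault(url, []).append(name)          # duplicates are merged
    ranked = sorted(scores, key=lambda url: (-scores[url], url))
    return [(url, sources[url]) for url in ranked]


results = metasearch("web browser", {"A": engine_a, "B": engine_b})
for url, found_by in results:
    print(url, found_by)
```

Keeping the list of source engines with each hit supports both presentation styles mentioned above: results can be shown by relevance, or grouped by the engine that returned them.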

3.4 Telnet

Telnet (Telecommunication Network) is a client-server protocol based on a reliable, connection-oriented transport. Telnet is also a program that allows a user to log in to another computer from an account into which the user is already logged.
It was developed in 1969 beginning with RFC 15 and standardized as IETF STD 8, one of the first Internet standards. Typically, telnet provides access to a command-line interface on a remote machine.

Most implementations of TELNET have no authentication to ensure that communication is carried out between the two desired hosts and not intercepted in the middle.
The term telnet also refers to software which implements the client part of the protocol. Telnet clients are available for virtually all platforms. Most network equipment with a TCP/IP stack supports some kind of Telnet server for remote configuration (including equipment based on Windows NT). Because of security issues with Telnet, its use has waned in favour of SSH (Secure Shell) for remote access.
"To telnet" is also used as a verb meaning to establish or use a Telnet or other interactive TCP connection, as in, "To change your password, telnet to the server and run the passwd command".
Most often, a user will be telnetting to a Unix-like server system or a simple network device such as a router. For example, a user might "telnet in from home to check his mail at school". In doing so, he would be using a telnet client to connect from his computer to one of his servers. Once the connection is established, he would then log in with his account information and execute operating system commands remotely on that computer, such as ls or cd.
On many systems, the client may also be used to make interactive raw-TCP sessions. It is commonly believed that a telnet session which does not include an IAC (character 255) is functionally identical to a raw TCP session. This is not the case, however, because of special NVT (Network Virtual Terminal) rules such as the requirement for a bare CR (ASCII 13) to be followed by a NUL (ASCII 0).
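The difference from a raw TCP session can be made concrete with a small encoder for outgoing data. This is a sketch of the NVT rules only, not a Telnet client: it assumes the input uses "\n" for end of line and treats any "\r" as a bare carriage return. It sends end of line as CR LF, follows a bare CR with NUL, and doubles any data byte equal to IAC (255) so that it is not mistaken for the start of a command.

```python
IAC = 255   # "Interpret As Command" byte
CR = 13
LF = 10
NUL = 0


def nvt_encode(data: bytes) -> bytes:
    """Encode application data for transmission over a Telnet connection."""
    out = bytearray()
    for byte in data:
        if byte == IAC:
            out += bytes([IAC, IAC])   # escape a literal 255
        elif byte == LF:
            out += bytes([CR, LF])     # end of line is sent as CR LF
        elif byte == CR:
            out += bytes([CR, NUL])    # a bare CR must be followed by NUL
        else:
            out.append(byte)
    return bytes(out)


print(nvt_encode(b"ls\n"))
print(nvt_encode(b"a\rb"))
print(nvt_encode(b"\xff"))
```

A raw TCP session would transmit b"a\rb" unchanged, while the encoded form contains an extra NUL byte, so the two byte streams differ even though no IAC is present.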
Protocol details
Telnet is a client-server protocol, based on a reliable connection-oriented transport. Typically this protocol is used to establish a connection to TCP port 23, where a getty-equivalent program (telnetd) is listening, although Telnet predates TCP/IP and was originally run on NCP (Network Control Program).
Various Commands Used By TELNET

telnet                            Open a TELNET prompt.
telnet > open hostname            Connect to a host. Hostname is the machine domain name (for example, hopper.unh.edu) or the numerical Internet address of the machine. In some cases, one may have to specify a port number.
telnet > help or telnet > ?       Display the Telnet documentation, which provides on-line help.
telnet > close or telnet > quit   End the Telnet session after logging out of the remote machine.
telnet > mode                     Try to enter line or character mode.
telnet > send                     Transmit special characters.
telnet > set                      Set operating parameters.
telnet > unset                    Unset operating parameters.
telnet > status                   Print status information.
telnet > slc                      Change state of special characters.
telnet > z                        Suspend telnet.
telnet > !                        Invoke a subshell.
telnet > ?                        Print help information.
telnet > return                   Leave command mode.

3.5 FTP

FTP fundamentals
FTP sites are typically used for uploading and downloading files to a central server computer, for the sake of file distribution.
In order to download and upload files to an FTP site, you need to connect using special FTP software. There are both commercial and free FTP software programs, and some browser-based free FTP programs as well.
The typical information needed to connect to an FTP site is:
The "server address" or "hostname". This is the network address of the computer you wish to connect to, such as ftp.microsoft.com.
The username and password. These are the credentials you use to access the specific files on the computer you wish to connect to.

FTP (File Transfer Protocol)

File transfer is an application that allows the user to transfer files between two computers on the Internet or on the same network.

The two most important file transfer functions are:

· Copying a file from another computer to the user’s own computer (downloading).
· Sending a file from the user’s own computer to another computer (uploading).

FTP connection types:

An FTP session typically consists of two types of connection:
Data connection
Control connection

· Control connection goes from the FTP client to port 21 on the FTP server. This connection is used for logon and to send commands and responses between the endpoints.
· Data transfers (including the output of “ls” and “dir” commands) require a second data connection.
How the data connection is set up depends on the mode that the client is operating in:
Passive Mode
(often the default for web browsers) -- The client issues a PASV command. Upon receipt of this command, the server listens on a dynamically-allocated port then sends a PASV reply to the client. The PASV reply gives the IP address and port number that the server is listening on. The client then opens a second connection to that IP address and port number.
Active Mode
(often the default for line-mode clients) -- The client listens on a dynamically-allocated port then sends a PORT command to the server. The PORT command gives the IP address and port number that the client is listening on. The server then opens a connection to that IP address and port number; the source port for this connection is 20.
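In both modes the IP address and port number travel as six comma-separated decimal numbers: four for the address and two for the port, where port = p1 * 256 + p2. The Python sketch below decodes a PASV reply and builds a PORT command under that encoding. The function names and the sample addresses are made up for illustration.

```python
import re


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Return (ip, port) from a reply such as
    '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)'."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if match is None:
        raise ValueError(f"no address found in PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(n) for n in match.groups())
    # The port is sent as two bytes: high byte first, then low byte.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


def format_port_command(ip: str, port: int) -> str:
    """Build the PORT command an active-mode client sends to the server."""
    fields = ip.split(".") + [str(port // 256), str(port % 256)]
    return "PORT " + ",".join(fields)


# Sample values chosen for the example, not taken from a real server.
print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,2,19,137)"))
print(format_port_command("10.0.0.5", 50000))
```

In passive mode the client would then open its second connection to the decoded address; in active mode the server would connect back to the address named in the PORT command.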

FTP Clients

Graphical File Transfer Clients

These applications display the sending computer’s file system in one window and the receiving computer’s file system in a second window.

Connecting to a remote computer to transfer files involves a login process that asks for:

The hostname or the IP address of the remote computer being connected to,
user ID of the user, and
the password

To transfer a file from one system to another, one can “drag” it using the mouse and “drop” it on the other system.

Double clicking on a file will usually cause it to be automatically transferred to the other side.

A graphical FTP client provides an option for setting the transfer mode. Most clients offer a text (ASCII) transfer mode and a binary transfer mode. The mode should be set to text when transferring text files, and to binary when transferring images, executable files or files containing special characters.
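The reason the mode matters is that a text transfer rewrites line endings, which is harmless for text but damages other data. The Python sketch below is a simplified model of that translation, assuming the sender normalises line endings to CR LF and the receiver converts them to its local form. It is not how any particular client is implemented, and the byte values are invented for the example.

```python
def to_network_text(data: bytes) -> bytes:
    """Text-mode send: normalise every line ending to CR LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def from_network_text(data: bytes, local_newline: bytes = b"\n") -> bytes:
    """Text-mode receive: convert CR LF to the receiver's local line ending."""
    return data.replace(b"\r\n", local_newline)


text_file = b"first line\nsecond line\n"
# Made-up binary payload that happens to contain CR and LF byte values.
binary_file = bytes([0x89, 0x50, 0x0D, 0x0A, 0x1A, 0x0A, 0x00])

text_copy = from_network_text(to_network_text(text_file))
binary_copy = from_network_text(to_network_text(binary_file))

print(text_copy == text_file)        # the text file survives the round trip
print(binary_copy == binary_file)    # the binary payload does not
print(len(binary_file), len(binary_copy))
```

In binary mode the bytes are sent unchanged, so neither file would be altered.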

After completing the FTP session, close it by clicking the “CLOSE” button, then exit the FTP client by clicking the “EXIT” button.

Text-based File Transfer Clients

Text-based file transfer clients use various commands to transfer files from one system to another.

A text-based file transfer client named ftp (after the File Transfer Protocol) can be launched on Windows by choosing:

Start -> Run -> ftp

Once an FTP client starts running, a prompt appears as

ftp >

Now the first command to be used is

ftp > open hostname

This will allow you to open a connection to another machine.
After connecting to the server you will be asked for User-name and Password.

ftp> username: anonymous
password: --------

For a public (anonymous) login, enter anonymous as the user name and either a valid e-mail ID or anonymous as the password.

Various other commands provided are:

ftp > bye terminate the session and exit the file transfer program.

ftp > cd change directory.

ftp > get download a file.

ftp > help view a list of commands or help on a specific command.

ftp > ls list the files in the current directory.

ftp > put upload a file.

ftp > pwd print the name of the current directory.

ftp > mget “multiple get”: retrieve multiple files at once.

ftp > mput “multiple put”: send multiple files at once.

If the user does not want to be asked for confirmation before each file is transferred, prompting can be turned off with the command:
ftp > prompt

Before transferring binary files, switch to binary mode with the command:

ftp > binary

After completing the FTP session, use the bye command to terminate it, or use the following command for the same purpose:

ftp > quit
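The same session can also be scripted. The sketch below uses Python's standard ftplib module and pairs each of the commands above with the FTP method that plays a similar role. The function name download_file, its parameters and the mapping table are assumptions made for this example, and the function is defined but not called, since calling it needs a reachable FTP server.

```python
from ftplib import FTP

# Rough correspondence between the text commands above and ftplib methods.
COMMAND_TO_METHOD = {
    "open": "connect",
    "cd": "cwd",
    "ls": "nlst",
    "pwd": "pwd",
    "get": "retrbinary",
    "put": "storbinary",
    "bye": "quit",
}


def download_file(hostname: str, remote_dir: str, filename: str,
                  username: str = "anonymous",
                  password: str = "anonymous") -> list[str]:
    """Log in, change directory, list it, download one file, then quit."""
    with FTP(hostname) as ftp:              # ftp > open hostname
        ftp.login(username, password)       # user name and password prompt
        ftp.cwd(remote_dir)                 # ftp > cd
        names = ftp.nlst()                  # ftp > ls
        with open(filename, "wb") as local_file:
            # ftp > binary followed by ftp > get
            ftp.retrbinary(f"RETR {filename}", local_file.write)
        return names                        # leaving the block sends QUIT


for command, method in COMMAND_TO_METHOD.items():
    print(f"ftp > {command:5} ~ FTP.{method}()")
```

A call such as download_file("ftp.example.com", "/pub", "readme.txt") would save readme.txt in the current directory; the host and file names there are placeholders.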
Web Server
The term web server can mean one of two things:
A computer program that is responsible for accepting HTTP requests from clients (user agents such as web browsers), and serving them HTTP responses along with optional data contents, which usually are web pages such as HTML documents and linked objects (images, etc.).
A computer that runs a computer program as described above.
Common features
[Figure: The rack of web servers hosting the My Opera Community site on the Internet. From the top: user file storage (content of files.myopera.com), "bigma" (the master MySQL database server), and two IBM blade centers containing multi-purpose machines (Apache front ends, Apache back ends, slave MySQL database servers, load balancers, file servers, cache servers and sync masters).]
Although web server programs differ in detail, they all share some basic common features.
HTTP: every web server program operates by accepting HTTP requests from the client and providing an HTTP response to the client. The HTTP response usually consists of an HTML document, but can also be a raw file, an image, or some other type of document (defined by MIME types). If an error is found in the client request or while trying to serve it, a web server has to send an error response, which may include custom HTML or text messages to better explain the problem to end users.
Logging: web servers usually can also log detailed information about client requests and server responses to log files; this allows the webmaster to collect statistics by running log analyzers on these files.
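The two basic features above can be sketched in a few lines of Python. This is a toy model, not a real server: the DOCUMENTS table stands in for the server's file system, and the names build_response and log_line, as well as the log format, are invented for the example.

```python
import mimetypes

# Hypothetical in-memory "file system": request path -> file content.
DOCUMENTS = {
    "/index.html": b"<html><body>Welcome</body></html>",
    "/logo.png": b"\x89PNG fake image bytes",
}


def build_response(method: str, path: str) -> tuple[int, dict[str, str], bytes]:
    """Return (status code, headers, body) for one request."""
    if method != "GET":
        status, content_type = 405, "text/html"
        body = b"<h1>405 Method Not Allowed</h1>"
    elif path in DOCUMENTS:
        status = 200
        body = DOCUMENTS[path]
        # The MIME type tells the browser what kind of document this is.
        content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    else:
        # Error response with a custom HTML message for the end user.
        status, content_type = 404, "text/html"
        body = b"<h1>404 Not Found</h1><p>No such page on this server.</p>"
    headers = {"Content-Type": content_type, "Content-Length": str(len(body))}
    return status, headers, body


def log_line(client_ip: str, method: str, path: str, status: int, size: int) -> str:
    """Format one log entry (a simplified, made-up layout)."""
    return f'{client_ip} "{method} {path}" {status} {size}'


for requested in ("/index.html", "/logo.png", "/missing.html"):
    status, headers, body = build_response("GET", requested)
    print(log_line("192.0.2.10", "GET", requested, status, len(body)))
```

A log analyzer could then count the lines per path or per status code to produce the statistics mentioned above.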
In practice, many web servers also implement the following features:
Authentication: an optional authorization request (for a user name and password) before allowing access to some or all kinds of resources.
Handling of static content (file content recorded in server's filesystem(s)) and dynamic content by supporting one or more related interfaces (SSI, CGI, SCGI, FastCGI, JSP, PHP, ASP, ASP.NET, Server API such as NSAPI, ISAPI, etc.).
HTTPS support (by SSL or TLS) to allow secure (encrypted) connections to the server on the standard port 443 instead of usual port 80.
Content compression to reduce the size of the responses (to lower bandwidth usage, etc.).
Virtual hosting to serve many web sites using one IP address.
Large file support to be able to serve files whose size is greater than 2 GB on a 32-bit OS.
Bandwidth throttling to limit the speed of responses in order to not saturate the network and to be able to serve more clients.
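As an illustration of one feature from the list above, content compression, the following Python sketch compresses a response body with gzip only when the client's Accept-Encoding header allows it. The function name maybe_compress is an assumption, and the header parsing is deliberately simplified (it ignores quality values such as q=0).

```python
import gzip


def maybe_compress(body: bytes, accept_encoding: str) -> tuple[dict[str, str], bytes]:
    """Gzip the body if the client advertised support for it."""
    # Split "gzip, deflate;q=0.5" into bare encoding names.
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    headers: dict[str, str] = {}
    if "gzip" in offered:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"   # tells the client to decompress
    headers["Content-Length"] = str(len(body))
    return headers, body


page = b"<p>hello</p>" * 200   # repetitive sample content compresses well

plain_headers, plain_body = maybe_compress(page, "")
gzip_headers, gzip_body = maybe_compress(page, "gzip, deflate")

print(len(plain_body), plain_headers)
print(len(gzip_body), gzip_headers)
```

The client reverses the step with gzip.decompress (or the browser's equivalent) before displaying the page.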
Load limits
A web server (program) has defined load limits, because it can handle only a limited number of concurrent client connections (usually between 2 and 60,000, by default between 500 and 1,000) per IP address (and TCP port) and it can serve only a certain maximum number of requests per second depending on:
its own settings;
the HTTP request type;
content origin (static or dynamic);
whether or not the served content is cached;
the hardware and software limits of the OS where it is working.
When a web server is near to or over its limits, it becomes overloaded and thus unresponsive.
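One common way to avoid reaching that state is to refuse new connections once a configured limit is hit, rather than accepting them and slowing everyone down. The Python sketch below models such a limit with a simple counter. The class name ConnectionLimiter and the limit of 2 are assumptions for the example; a real server would typically answer a refused client with an error such as HTTP 503 (Service Unavailable).

```python
import threading


class ConnectionLimiter:
    """Admit at most max_connections clients at the same time."""

    def __init__(self, max_connections: int) -> None:
        self._max = max_connections
        self._active = 0
        self._lock = threading.Lock()   # connections may arrive on many threads

    def try_accept(self) -> bool:
        """Return True if a new client is admitted, False if the server is full."""
        with self._lock:
            if self._active >= self._max:
                return False
            self._active += 1
            return True

    def release(self) -> None:
        """Call when a client disconnects, freeing one slot."""
        with self._lock:
            if self._active > 0:
                self._active -= 1


limiter = ConnectionLimiter(max_connections=2)
results = [limiter.try_accept() for _ in range(3)]
print(results)                 # the third client is refused

limiter.release()              # one client disconnects
print(limiter.try_accept())    # a waiting client can now be admitted
```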


Tuesday, March 30, 2010
