LinkScan

Single Document LinkScan Reference Manual

 

LinkScan Reference Manual. Table of Contents

    Part I. LinkScan Core Capabilities

  1. Introduction to LinkScan
  2. Essential LinkScan Concepts
  3. New LinkScan Installations
  4. Upgrading Existing LinkScan Installations
  5. Basic Scanning
  6. Viewing the Results
  7. LinkScan Status and Error Codes
  8. Scheduling LinkScan
  9. File System Scanning and Orphaned Files
  10. Import Scanning
  11. Advanced and Custom Scanning
  12. Advanced, Custom and Command Line Results
  13. LinkScan Enterprise Extensions
  14. LinkScan Support
  15. Known Problems and Limitations

    Part II. Companion Programs

  16. LinkScan Dispatch
  17. LinkScan Excel
  18. LinkScan Profiler
  19. LinkScan QuickCheck
  20. LinkScan Recorder
  21. LinkScan TapMap
  22. LinkScan WebServer
  23. LinkScan Utilities
  24. Weblint Man Page

    Part III. Appendixes

  25. Glossary of Terms
  26. LinkScan Quick Reference Card
  27. LinkScan and Various Web Servers
  28. LinkScan File Formats
  29. LinkScan Application Notes
  30. LinkScan Revision History
  31. LinkScan License Agreement

Note: This Reference Manual is divided into multiple documents for ease and speed of navigation. However, the contents are also available as a single document suitable for searching and/or printing as the Single Document LinkScan Reference Manual.

LinkScan Reference Manual. Section 1

Introduction to LinkScan

LinkScan™ is an industrial-strength link checking and website management tool. It saves time and money by automating the quality assurance testing of virtually any website or web-based application.

LinkScan is built around applicable open systems standards. Hence it integrates easily with many other content development, management and testing applications as well as general purpose computer tools. It operates on all Microsoft Windows and Unix/Linux platforms and is professionally supported.

LinkScan users include Fortune 1000 companies such as Hewlett Packard, government agencies like NASA, as well as many smaller businesses.

New users will find that LinkScan is extremely simple to install, configure and use. And the more experienced user will appreciate the vast array of customization features built into the system. Together, these attributes make LinkScan ideal for:

Four LinkScan Editions

LinkScan is available in four different editions all based upon the same core technology:

The above descriptions are neither complete nor comprehensive. You must read the LinkScan License Agreement for a complete definition of the products and your other rights and obligations.

Using LinkScan

The steps involved in using LinkScan include:

  1. Installing LinkScan
  2. Configuring LinkScan for your environment
  3. Scanning the website to create a LinkScan Database
  4. Viewing the results from the LinkScan Database
  5. Optionally, customizing LinkScan to accomplish additional specific objectives

Each of these steps is described in this Reference Manual. However, we recommend that new users get a fast start by jumping to one of the following pages:

LinkScan Reference Manual. Section 2

Essential LinkScan Concepts

This section introduces some important concepts and terms that are used throughout the remainder of this Reference Manual. These are:

  1. LinkScan Projects
  2. LinkScan Owners
  3. LinkScan Usernames
  4. Scanning Methods
  5. Documents and Links
  6. LinkScan Directory and File Structure
  7. LinkScan Configuration Files
  8. Perl Regular Expressions
  9. relative-path and relative-path-expression

2.1 LinkScan Projects

LinkScan is able to scan multiple websites. You may also scan the same website multiple times with different configuration options. In each case, LinkScan creates a unique and corresponding LinkScan Database containing the results of the analysis. Together, the configuration files and database constitute a LinkScan Project.

If multiple Projects are defined, users and administrators must select one when scanning. Users must also select a Project when viewing the results.

Each LinkScan Project is stored within a subdirectory of the main LinkScan installation directory.

For additional information concerning Projects, how to create them and how to scan them, see Basic Scanning for Windows Systems or Unix Systems.

2.2 LinkScan Owners

Within each Project, you may also configure multiple LinkScan Owners. Collections of HTML documents and other files are assigned between Owners in a variety of ways:

The LinkScan Owner concept enables individual content developers or workgroups to view results that pertain to their documents or areas of responsibility. LinkScan Owners are defined via the LinkScan Configuration Files, discussed below. By default, LinkScan will create and assign Owners as follows:

This enables users to browse the results selectively so that the reports are smaller and more relevant to their needs. They're also produced more rapidly.

2.3 LinkScan Usernames

LinkScan incorporates access controls that may be used to limit user access to LinkScan databases and results. These controls are not enabled by default.

When activated, users may be required to log in to the LinkScan system using a pre-defined LinkScan Username and associated password. The Username will define the Projects and Owners that an individual user is permitted to access.

Those wishing to enable these access control features should see LinkScan Access Controls.

2.4 Scanning Methods

LinkScan supports three different scanning methods:

Network HTTP scanning is generally the best method to use for sites with a large amount of dynamic content (.jsp files, .asp files, etc.). The File System Scanning method enables tracking of "orphaned" files (files that are not currently linked to) and is more appropriate for sites with limited dynamic content.
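
An orphaned file can be pictured as a set difference: the files present on disk minus the files reached by following links. The following Python sketch is illustrative only. The function name and the sample file lists are hypothetical, and it is not a description of how LinkScan itself is implemented.

```python
def find_orphans(files_on_disk, linked_files):
    """Return files that exist on disk but were never reached via a link."""
    return sorted(set(files_on_disk) - set(linked_files))


# Hypothetical sample data: relative paths found on disk and via links.
on_disk = ["index.html", "about.html", "old/promo.html"]
linked = ["index.html", "about.html"]

print(find_orphans(on_disk, linked))
```

This also shows why orphan tracking needs access to the file system: a purely HTTP-based scan only ever sees the second list.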

2.5 Documents and Links

The LinkScan software, and this document, both maintain a strong distinction between Documents and Links.

Hence an HTML file is a Document containing Links. Dynamically generated web pages, PDF and Flash Files as well as Import Files may also be considered Documents since LinkScan can examine those files for the presence of Links. Images (such as .gif and .jpg files) are not considered documents.

References to sites other than the one being scanned (External Links) are not documents either, since LinkScan does not examine the content of those files for the presence of Links.

2.6 LinkScan Directory and File Structure

The LinkScan system is made up of a number of different file types:

In a basic LinkScan installation these files are organized within the following directory structure:

2.7 LinkScan Configuration Files

LinkScan's operation is controlled by a number of different configuration files. When running LinkScan via the Windows Graphical User Interface, these files are somewhat invisible. However, they still control the execution of the program and you may need to view and edit one or more of these files in order to enable or control some of the more advanced features of the program. On Unix systems, these files represent the primary method of configuring LinkScan. All of the files are formatted in plain ASCII text and may be viewed and modified using the editor of your choice (e.g. Windows Notepad, Unix vi, emacs, pico, nedit, et al).

The most important configuration files are:

This approach provides tremendous flexibility. It means you can establish Global Settings in the Global Configuration File that apply to all Projects. You may then override (single-valued) settings or supplement (multi-valued) settings with additional, Project-specific commands in the Project Configuration File(s).
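
The override/supplement behavior can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not LinkScan's own code: the function name is hypothetical, lists stand in for multi-valued settings, and the choice of Maxdirlevels as single-valued and Execute as multi-valued is assumed for the example.

```python
def merge_settings(global_cfg, project_cfg):
    """Combine Global and Project settings.

    Values stored as lists are treated as multi-valued (supplemented);
    everything else is treated as single-valued (overridden).
    """
    # Copy the global settings so the caller's data is left untouched.
    merged = {key: list(value) if isinstance(value, list) else value
              for key, value in global_cfg.items()}
    for key, value in project_cfg.items():
        if isinstance(value, list):
            merged[key] = list(merged.get(key, [])) + list(value)
        else:
            merged[key] = value
    return merged


# Assumed for illustration: Maxdirlevels is single-valued, Execute is multi-valued.
global_cfg = {"Maxdirlevels": "10", "Execute": ["cgi-bin/"]}
project_cfg = {"Maxdirlevels": "5", "Execute": [r"(?i).*\.(cgi|asp)$"]}

merged = merge_settings(global_cfg, project_cfg)
print(merged)
```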

Some additional configuration/control files are discussed elsewhere in this manual. They are used by LinkScan (i.e. do not delete them!) but it is rarely necessary for users to examine or modify them.

All of the configuration files include extensive comments. Comments are signified by the pound sign like this:


# This line contains only a comment

Realcommand = 1   # This comment could describe Realcommand
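
For readers who want to inspect these files programmatically, the comment convention can be handled with a small Python sketch. It assumes that a pound sign always starts a comment and that settings take the form "Name = value"; both function names are hypothetical, and directives written without an equals sign are simply skipped here.

```python
def strip_comment(line):
    """Drop everything from the first pound sign onward."""
    # Assumption: '#' always introduces a comment, wherever it appears.
    return line.split("#", 1)[0].strip()


def parse_settings(text):
    """Collect 'Name = value' settings, ignoring comments and blank lines."""
    settings = {}
    for raw_line in text.splitlines():
        line = strip_comment(raw_line)
        if "=" not in line:
            continue  # blank line, comment-only line, or another directive form
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


sample = """
# This line contains only a comment

Realcommand = 1   # This comment could describe Realcommand
"""
print(parse_settings(sample))
```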

2.8 Perl Regular Expressions

LinkScan incorporates a vast array of customization features many of which exploit the power of Perl Regular Expressions. For a description of Perl Regular Expressions on Unix systems, see man perlre. HTML versions are available at many locations including:

http://www.cpan.org/doc/manual/html/pod/perlre.html

We also recommend the book Mastering Regular Expressions (a.k.a. the Owl Book) by Jeffrey E.F. Friedl, and published by O'Reilly [ISBN: 1-56592-257-3].

2.9 relative-path and relative-path-expression

We make extensive reference to these terms in the customization sections of this manual and they are introduced here for your convenience.

Let us assume that we are scanning the website:

http://www.example.com/

An individual document within that website might be:

http://www.example.com/products/widget.html

LinkScan will refer to that page using its relative-path, which in this case, is:

products/widget.html

A relative-path-expression is a Perl Regular Expression that matches a relative-path. For example, all of the following will match our widget page:


products/widget.html      # Also matches products/widgetXhtml
products/widget\.html$    # Does not match anything else
(|.*/)widget\.html$       # Matches widget.html in any directory
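
The three expressions above can be tried out with Python's re module, whose syntax agrees with Perl for these simple patterns. This sketch assumes that an expression is matched from the start of the relative-path (which the comments above and the (|.*/) prefix suggest); the helper names are hypothetical.

```python
import re

HOME_URL = "http://www.example.com/"


def relative_path(url, home_url=HOME_URL):
    """Strip the site root from a full URL to obtain the relative-path."""
    if not url.startswith(home_url):
        raise ValueError("URL is outside the scanned website: " + url)
    return url[len(home_url):]


def matches(expression, path):
    """Test an expression against a relative-path, anchored at the start."""
    # Assumption: matching begins at the first character of the path.
    return re.match(expression, path) is not None


page = relative_path("http://www.example.com/products/widget.html")
print(page)

for expression in (r"products/widget.html",
                   r"products/widget\.html$",
                   r"(|.*/)widget\.html$"):
    print(expression, matches(expression, page))
```

The unescaped dot in the first expression is what allows it to match products/widgetXhtml as well.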

LinkScan Reference Manual. Section 3

New LinkScan Installations

This section describes the pre-requisites for LinkScan and leads into step-by-step instructions for performing a new installation.

  1. Hardware Requirements
  2. Prerequisites
  3. Installation Step-by-Step

3.1 Hardware Requirements

LinkScan is supported on a wide variety of platforms including:

We do not recommend Windows 95/98/ME for scanning large websites of more than 5000 documents. Although LinkScan has been tested on websites of significantly greater size, performance and stability will be much improved when running under operating systems with a true multi-processing implementation such as Windows NT/2000/XP or Linux/Unix.

Disk and memory requirements depend almost exclusively on the size and nature of the website(s) to be analyzed. However, the following guidelines are intended to assist users with their capacity planning needs:

3.2 Prerequisites

To successfully install and configure LinkScan on your computer you must have:

  1. An appropriate version of Perl Version 5 installed on your computer. You may download a version suitable for your system via:

    http://www.elsop.com/perl/

  2. A copy of the LinkScan software and a LinkScan License Key. Both are available from:

    http://www.elsop.com/linkscan/dleval.cgi

3.3 Installation Step-by-Step

We recommend that new users get a fast start by jumping to one of the following pages:

LinkScan Reference Manual. Section 4

Upgrading Existing LinkScan Installations

This section describes how to upgrade an existing LinkScan installation to LinkScan Version 9.0.

  1. Upgrading Existing Windows Installations
  2. Upgrading Existing Unix Installations

4.1 Upgrading Existing Windows Installations

4.2 Upgrading Existing Unix Installations

LinkScan Reference Manual. Section 5

Basic Scanning

In order to scan and analyze a website, LinkScan must first be configured and then executed. During the course of the scan, LinkScan will build a database. Once the scan is complete, many different reports may be generated from the database using an interactive web browser-based interface, or from the command line.

The steps involved in scanning a website are:

  1. Creating or selecting a LinkScan Project
  2. Providing an essential definition for the Project
  3. Configuring other optional parameters for the Project
  4. Executing the scan

Every LinkScan Project must have:

To create, configure and scan a Project, two interfaces to LinkScan are provided:

Basic Scanning with the Command Line Interface

This section describes how to create, configure and scan a LinkScan Project using the command line interface.

Before executing the LinkScan programs you must set the current working directory:

web:/> cd /usr/www/htdocs/linkscan/
web:/usr/www/htdocs/linkscan>

Creating a New Project

To create a new Project, simply execute the main LinkScan program (linkscan.pl) with the -newproject command line option:

web:/usr/www/htdocs/linkscan> perl linkscan.pl -newproject newproj

[...]

This Will Create the New LinkScan Project: newproj

The answers to the following questions are accepted verbatim without
validation. Please type carefully. <Control-C> to abort and start again.


Enter Homedir: 
Enter Home URL: http://www.example.com/index.html
Enter Organization: My Department
Enter Project Description: My First Test
** Status: Project newproj Created Successfully
web:/usr/www/htdocs/linkscan>

Configuring a Project

To configure a Project, simply edit the appropriate Project configuration file using your editor of choice:

web:/usr/www/htdocs/linkscan> vi ./newproj/linkscan.cfg

Note that lines starting with a pound sign (#) are comments.

In the simple case of scanning a website using the normal Network (HTTP) Scanning Method, you would only need to configure Homeurl with the URL to the root of the website, and Homefile with the filename (relative to server root) of the starting page. Be sure to leave Homedir blank since this will force LinkScan to use Network (HTTP) Scanning.

[...]
Homedir = 
Homeurl = http://www.example.com/
Mirrorurl = 
Homefile = index.html
Projectdesc = My First Test
Organization = My Department
[...]

This will scan the entire site www.example.com from its starting page, index.html. The Homeurl parameter should always be the "root" URL of the site being scanned. To specify scans for sub-level areas, add path information to the Homefile parameter. For example, using the same Homeurl as above, and setting:


Homefile = recommendations/external/index.html

would start the scan at:

http://www.example.com/recommendations/external/index.html
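
The way Homeurl and Homefile combine into a starting address can be checked with a short Python sketch. The function name is hypothetical, and the sketch assumes a plain join of the root URL and the relative filename, which is what the example above implies.

```python
from urllib.parse import urljoin


def start_url(homeurl, homefile):
    """Join the site root (Homeurl) with the starting page (Homefile)."""
    return urljoin(homeurl, homefile)


print(start_url("http://www.example.com/", "index.html"))
print(start_url("http://www.example.com/",
                "recommendations/external/index.html"))
```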

Scanning a Project

To scan a Project, simply execute the main LinkScan program. You may specify the Project on the command line as shown below. Otherwise LinkScan will prompt you to select from the available list of valid Projects.

web:/usr/www/htdocs/linkscan> perl linkscan.pl -project newproj

LinkScan Enterprise Version 9.0 Unix.

[...]

** Status: LinkScan is Starting Processes...
** Status: Started 3 Processes...
** Status: LinkScan is Scanning Internal Links...
Processing  URL: 
Processing  URL: about.html
Processing  URL: linkscan/
Processing  URL: linkscan/dleval.cgi
Processing  URL: linkscan/order.cgi
Processing  URL: linkscan/support.html
[...]

You have now completed a scan of the website and LinkScan has created a Database for that Project. Next you will want to examine the findings by following the steps described in Viewing the Results.

Other Options

Run the main LinkScan program with the -help option to see a short listing of the available command line switches:

web:/usr/www/htdocs/linkscan> perl linkscan.pl -help
LinkScan Version 9.0 Unix
Copyright 1997-2001 Electronic Software Publishing Corporation

USAGE: linkscan.pl  {-help} {-alllinks} {-fast} {-home pathname} {-http}
       {-newproject name} {-noexternal} {-noorphans} {-project name}
       {-quiet} {-remote URL} {-retest}

-help            Displays this message
-alllinks        Check all external links [Override: Maxgoodhours etc]
-fast            Use larger number of processes to speed testing
-home pathname   Specify starting page [Override: Homefile in linkscan.cfg]
-http            Use HTTP navigation [Equiv: Execute .* and -noorphans]
-newproject name Create a new LinkScan Project
-noexternal      Test internal links only [Default: Internal and External]
-noorphans       Disable checking for orphaned files
-project name    Select a LinkScan Project
-quiet           Reduce verbosity of progress/status messages
-remote URL      Specify Remote Site [Equiv: -http; Override: Homeurl/Homefile]
-retest          Repeat last test, rechecking only those links that failed
Detailed Help [Y/N]:n

Basic Scanning with the Windows Graphical User Interface

This section describes how to create, configure and scan a LinkScan Project using the Windows graphical user interface.

Creating a New Project

  1. From the Main LinkScan Window, select an existing Project from the displayed list of Projects and click New.

  2. You will be prompted for a Project Name, Description and Organization.

  3. The new Project will be created by cloning the originally selected Project (or the default Project if none was selected).

Configuring a Project

  1. From the Main LinkScan Window, select an existing Project from the displayed list of Projects and click Edit.

  2. On the Edit Project Dialog you must:

    Scanning Method: We recommend that you use the Network (HTTP) Scanning method, at least initially. This method is frequently the most appropriate and is also the simplest to configure. Optionally, you may also configure LinkScan to check for Orphaned Files but this requires a more detailed knowledge of your server environment and again we suggest you defer this until you are more familiar with LinkScan.

    In many cases, using the Network (HTTP) Scanning method, you will only need to supply the URL of the target website for LinkScan to complete a full analysis. Later, you may wish to explore the Orphaned File and File System Scanning and Import Scanning capabilities of LinkScan.

    Review the status of the Case Sensitive Pathnames checkbox. This tells LinkScan whether to treat index.html and INDEX.HTML, for example, as a single file or two different files. In general, this box should be checked when scanning websites hosted on Unix servers and unchecked when scanning websites hosted on Windows servers.

    Also note the status of the Onlyinclude setting. Typically, this will be blank. However, if you enter a URL such as:

    http://www.example.com/Products/index.html

    LinkScan will automatically enter Products/ in the Onlyinclude box. This means that LinkScan will only scan the Products/ area (and below) of the website. It will not travel up to the root of the site and down into other regions. Simply clear the contents of the Onlyinclude box if you want LinkScan to explore the entire website. Many other features available to control the scope of a scan are described later in this document.

  3. Click OK to save the settings or Cancel to discard them.
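
The two settings discussed in step 2 can be illustrated with a small Python sketch. It is only an approximation of the behavior described there: both function names are hypothetical, and the rule "take the directory portion of the URL path" is an assumption drawn from the Products/ example.

```python
import posixpath
from urllib.parse import urlsplit


def derive_onlyinclude(url):
    """Return the directory portion of a URL path, or '' for the site root."""
    path = urlsplit(url).path.lstrip("/")
    directory = posixpath.dirname(path)
    return directory + "/" if directory else ""


def same_file(path_a, path_b, case_sensitive):
    """Compare two pathnames the way the Case Sensitive Pathnames box implies."""
    if case_sensitive:
        return path_a == path_b
    return path_a.lower() == path_b.lower()


print(derive_onlyinclude("http://www.example.com/Products/index.html"))
print(same_file("index.html", "INDEX.HTML", case_sensitive=True))
print(same_file("index.html", "INDEX.HTML", case_sensitive=False))
```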

Some users will find it instructive to explore the Advanced button on the Edit Project dialog box. This opens a Notepad window onto the configuration file associated with the selected Project (e.g. C:\LinkScan\someproject\linkscan.cfg), showing the current configuration as it is stored by LinkScan. You may need to edit this file later to customize LinkScan and enable certain specific features. Note that lines starting with a pound sign (#) are comments. You will also find comment lines describing very briefly (as an aide-mémoire) each of the available customization commands. More detailed documentation is provided later in this manual.

Scanning a Project

  1. From the Main LinkScan Window, select an existing Project from the displayed list of Projects and click Scan.

  2. LinkScan will display the Scanning Dialog which enables you to monitor progress as the scan proceeds. On the progress display, note the distinction between Documents and Links.

  3. On completion of the scan, the Cancel button will change to an OK button and the system will beep. Press the OK button to dismiss the Scanning Dialog box.

You have now completed a scan of the website and LinkScan has created a Database for that Project. Next you will want to examine the findings by following the steps described in Viewing the Results.

LinkScan Reference Manual. Section 6

Viewing the Results

Once a Project has been scanned and a database created, a wide range of reports is available.

This document describes those reports and how to view them interactively using a simple web browser-based interface. Note that a batch command-line interface is also available. See Section 12 of this manual.

To view the reports interactively:

The first time you access the results, you will be presented with the LinkScan Login and Preferences Menu. Simply click Login Now. No username is required unless you later decide to enable various LinkScan security features.

Once you have logged in, you will be presented with the LinkScan Main Menu.

Report Selection

You must select one of the individual Reports and submit the form by pressing Select Report.

A help page is available for each type of LinkScan Report. You may view the appropriate help page at any time by using the Help option on the context-sensitive LinkScan Toolbar. You may also use the [?] links on the LinkScan Main Menu, or the links provided in the summary table below.

The most frequently used reports have been organized in the left hand column; we suggest new users start there. Also, many of the reports incorporate hyperlinks to other reports. This means you can use a drill-down paradigm to view more detail associated with a specific problem or document. For example, some users may never explicitly select a LinkScan/QuickCheck Report. But they will likely view reports of that type by following the [Src] links from other reports.

Summary of Available Reports

Report                            Description
Project Summary Report            Summary statistics for the current project
Summary of All Projects Report    Summary statistics for all configured projects
Problem Documents Report          List documents containing potential problems
Selected Status Codes Report      List errors of specific types
Document Detail Report            List all/selected documents
All Pages Linking To ... Report   Find pages that link to...
Critical Errors Report            List most critical errors
Orphaned Files Report             List orphaned files
Detailed Errors Report            List all/selected errors
External History Report           View history of an external link
Changed Documents Report          Compare two scans of the current project
Redirections Report               List a summary of redirections
Search Documents Report           Ad hoc searching: document-centric
System Configuration Report       Display current LinkScan configuration settings
Search Links Report               Ad hoc searching: link-centric
LinkScan/QuickCheck               View source code and detailed analysis of a document
SiteMap Report                    Display LinkScan SiteMap
LinkScan/TapMap                   Display LinkScan TapMap

Owner Selection

The LinkScan Main Menu may include an Owner Selection Box. If enabled, this option will allow you to select a sub-set of the website to which subsequent reports will apply.

In a default configuration, the Owner Selection Box will include entries for each top-level directory scanned, in addition to the special entry "All". This will be the default selection and subsequent reports will apply to the entire website scanned.

Note however, that the LinkScan Administrator may configure and customize the manner in which Owners are created. Hence your installation may appear and behave somewhat differently from that described herein.

SubMenu Selection

In many cases, when you submit the form by pressing Select Report you will be presented with a second menu of options. Initially, we suggest you accept the default options which have been carefully designed to produce excellent results in the vast majority of situations. However, to learn more, you may use the context-sensitive Help button on the LinkScan Toolbar at any time.

LinkScan Toolbar

Each of the LinkScan Menus and Reports includes a common LinkScan Toolbar. It contains a number of links:

 Main Menu   Preferences   Advanced   Help   Reference   HowTo   Card 

The Main Menu link will always return you to the LinkScan Main Menu.

The Preferences link will always take you to the LinkScan Login and Preferences Menu.

The Advanced link appears when appropriate and it will cause the current menu to be redrawn with additional options.

The Help link will display an appropriate section of the LinkScan Documentation depending upon the current context.

The Reference link will display the table of contents for the LinkScan Reference Manual.

The HowTo link will display a brief How To Guide with instructions for completing certain Common Tasks.

The Card link will display the LinkScan Quick Reference Card.

LinkScan Reference Manual. Section 7

LinkScan Status and Error Codes

The following section describes each of the LinkScan Error and Status Codes. Each Status Code is assigned to one of six Severities:

Symbol  Code  Severity        Explanation
*       0     Unknown         LinkScan has not tested or was unable to test this link
*       1     Error           LinkScan found a hard error on this link
*       2     Possible Error  There may be a problem with this link. It should be retested at a later time
*       3     Warning         LinkScan found something unusual about this link. Manual inspection highly recommended
*       4     Advisory        This link is probably ok, but manual inspection recommended
*       5     No Error        This is a good link

The Severity associated with any specific Error or Status Code may be customized by the LinkScan Administrator through the use of the Statuscode option.

Status codes in the range 0-99 are generated exclusively by LinkScan and generally refer to the status of local links (HTML files, Non-HTML files, etc.).

Status codes in the range 100-699 are defined exclusively by the HyperText Transfer Protocol.

Status codes in the range 800-3099 are generated exclusively by LinkScan and generally refer to Networking Problems (Failed DNS lookups, failure to connect to a remote server or timeouts) as well as some other LinkScan detected warning or advisory messages.
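
When post-processing results, the severities and code ranges described above can be captured in a small lookup. The Python sketch below is illustrative: the names are hypothetical, the origin labels are shorthand for the three ranges, and codes outside those ranges (for example 700-799) are reported as undefined because this section does not assign them.

```python
# Severity codes and names as listed in the severity table.
SEVERITIES = {
    0: "Unknown",
    1: "Error",
    2: "Possible Error",
    3: "Warning",
    4: "Advisory",
    5: "No Error",
}


def code_origin(code):
    """Classify a status code by the range it falls into."""
    if 0 <= code <= 99:
        return "LinkScan (local links)"
    if 100 <= code <= 699:
        return "HTTP"
    if 800 <= code <= 3099:
        return "LinkScan (network and other)"
    return "undefined"


for code in (7, 404, 903, 3001):
    print(code, code_origin(code))
```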

* No Status (0)

* HTML File (1)

* Error: Bad HTML File (2)

* Non-HTML File (3)

* Error: Bad non-HTML File (4)

* Anchor (5)

* Error: Bad Anchor (6)

* Warning: Orphaned HTML File (7)

* Warning: Orphaned non-HTML File (8)

* Imagemap File (9)

* Error: Bad Imagemap File (10)

* Valid Mailto Link (11)

* Possible Error: Invalid Mailto Link (12)

* Warning: Missing / (13)

* Warning: Unprocessed SSI (14)

* PDF File (15)

* Error: Bad PDF File (16)

* Warning: No Closing /a (17)

* Error: Invalid Scheme (18)

* Advisory: No Alt/Height/Width (20)

* Flash File (21)

* Error: Bad Flash File (22)

* Continue (100)

* Switching Protocols (101)

* Good URL (200, 201, 202, 203, 205, 206)

* Error: No Content (204)

* Error: Multiple Choices (300)

* Error: Moved Permanently (301)

* Advisory: Moved Temporarily (302)

* Error: Network/Server Error (303, 304)

* Error: Use Proxy (305)

* Error: Network/Server Error (400)

* Warning: Unauthorized (401)

* Warning: Payment Required (402)

* Warning: Forbidden (403)

* Error: Not Found (404)

* Error: Method Not Allowed (405)

* Error: Not Acceptable (406)

* Error: Proxy Authentication Required (407)

* Possible Error: Request Timed Out (408)

* Error: Conflict (409)

* Error: Gone (410)

* Error: Length Required (411)

* Error: Precondition Failed (412)

* Error: Request Entity Too Large (413)

* Error: Request URI Too Large (414)

* Error: Unsupported Media Type (415)

* Possible Error: Server Error (500)

* Possible Error: Not Implemented (501)

* Possible Error: Bad Gateway (502)

* Possible Error: Service Unavailable (503)

* Possible Error: Gateway Timed Out (504)

* Possible Error: HTTP Version Not Supported (505)

* Possible Error: Network/Server Error (600, 601, 602, 603)

* Advisory: Skipped - Recently Tested (800)

* Possible Error: Skipped - Bad Server (801)

* Advisory: Skipped - FTP Limit (802)

* Advisory: Skipped - CGI Limit (803)

* Possible Error: No DNS Entry (900)

* Possible Error: DNS Error or Timeout (901)

* Possible Error: Failed to Connect (902)

* Possible Error: Timed Out (903)

* Warning: Missing / (904)

* Warning: Probably OK (905)

* Warning: Contains an IP Address (906)

* Error: Multiple Redirections (907)

* Warning: Missing / (908)

* Error: Disconnected (909)

* Warning: Location Not Absolute (910)

* Error: Unsafe Character (911)

* Advisory: SSL Server Path Not Checked (912)

* Advisory: Simulated Redirect (913)

* Warning: Meta Redirect (914)

* Warning: Meta Loc not Absolute (915)

* Advisory: LDAP Server Query Not Checked (916)

* Error: No Headers Seen (917)

* Error: Error Creating Socket (990)

* Error: SSL Error (991)

* Error: Unknown (999)

* Error: FTP Error (1000)

* Error: Bad Syntax (2000)

* Error: SMTP No Such User (2001)

* Warning: SMTP Mailbox Full (2002)

* Possible Error: SMTP Failure (2003)

* Error: Errordoc Match (3000)

* Error: Errorbody Match (3001)

* Error: Profiler Match (3002)

LinkScan Reference Manual. Section 8

Scheduling LinkScan to Run Automatically

You may use a system scheduler to execute LinkScan automatically at pre-determined times:

Scheduling LinkScan on Unix Systems

The following example is provided to assist those users who wish to run LinkScan as a cron job. The crontab system is a standard Unix utility that enables jobs to be executed automatically according to some regular schedule. On most Unix systems, see man crontab or man 5 crontab for help.
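
A crontab entry consists of five schedule fields (minute, hour, day of month, month, day of week) followed by the command to run. As a rough aid to reading entries like the one used in the steps below, this Python sketch splits an entry into named fields. The function name is hypothetical, and the sketch does not validate or expand the field values.

```python
FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_crontab_entry(entry):
    """Split a crontab line into its five schedule fields and the command."""
    parts = entry.split(None, 5)
    if len(parts) != 6:
        raise ValueError("expected five schedule fields and a command")
    schedule = dict(zip(FIELDS, parts[:5]))
    schedule["command"] = parts[5]
    return schedule


entry = "40 8 * * 0,1,2,3,4,5,6 /usr/linkscan/linkscan.cron"
for name, value in parse_crontab_entry(entry).items():
    print(name, "=", value)
```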

  1. Save any existing configured cron jobs to a file (for example, cron.job) using the following shell command:

    crontab -l > cron.job
    
  2. Edit the file cron.job and append an additional entry for LinkScan containing something like:

    40 8 * * 0,1,2,3,4,5,6 /usr/linkscan/linkscan.cron
    

    This will execute /usr/linkscan/linkscan.cron at 08:40 each day (the day-of-week list 0,1,2,3,4,5,6 covers all seven days). Adjust the pathname to linkscan.cron accordingly.

  3. Submit this to the crontab system with the following shell command:

    crontab cron.job
    

    You can check that it's been scheduled with:

    crontab -l
    
  4. Edit the linkscan.cron file -- the following example file is automatically installed in the LinkScan directory:

    #!/bin/sh
    # Set current working directory
    cd /usr/linkscan/
    # Execute LinkScan
    /usr/local/bin/perl linkscan.pl -project proja
    /usr/local/bin/perl linkscan.pl -project projb
    

    Please note the following points:

Scheduling LinkScan on Windows Systems

LinkScan is compatible with virtually any existing Windows scheduling utility.

  1. Using Notepad or a similar editor, simply edit the file linkscan.bat which is automatically installed in the LinkScan folder. This basic Windows BATCH file must set the current working directory to the LinkScan folder and execute LinkScan for each required Project.

    REM Set current working directory
    CD /D C:\LinkScan\
    REM Execute LinkScan Phase 1
    call perl linkscan.pl  -project myproject -manual
    REM Execute LinkScan Phase 2
    call perl linkscan2.pl -project myproject
    REM Execute LinkScan Phase 1
    call perl linkscan.pl  -project myotherproject -manual
    REM Execute LinkScan Phase 2
    call perl linkscan2.pl -project myotherproject
    

    Please note the following points:

  2. Finally, configure your Windows Scheduler to execute the file:

    C:\LinkScan\linkscan.bat
    

    according to the required schedule. LinkScan is compatible with almost all Windows Schedulers -- for example, the one you use to scan your system for viruses. Windows 2000 users may wish to use the standard system scheduler which works rather well. See Control Panel | Scheduled Tasks.
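
The two-phase sequence in linkscan.bat can also be expressed as a script, which some schedulers may find easier to invoke. The Python sketch below mirrors the batch file under stated assumptions: the folder C:\LinkScan and the project names are taken from the example above, the function names are hypothetical, and perl is assumed to be on the PATH.

```python
import subprocess

LINKSCAN_DIR = r"C:\LinkScan"  # Assumed installation folder, as in the example.


def build_commands(project):
    """Return the Phase 1 and Phase 2 command lines for one Project."""
    return [
        ["perl", "linkscan.pl", "-project", project, "-manual"],
        ["perl", "linkscan2.pl", "-project", project],
    ]


def run_projects(projects, linkscan_dir=LINKSCAN_DIR):
    """Run both phases for each Project from the LinkScan folder."""
    for project in projects:
        for command in build_commands(project):
            # cwd plays the role of the CD /D line in the batch file.
            subprocess.run(command, cwd=linkscan_dir, check=True)


# Example call (commented out so that nothing runs on import):
# run_projects(["myproject", "myotherproject"])
```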

LinkScan Reference Manual. Section 9

File System Scanning and Orphaned Files

LinkScan incorporates the ability to examine the files on your local hard drive and interpret them in a manner very similar to a web server. This capability has two major applications:

Configuration is inherently significantly more complex when compared to normal Network (HTTP) Scanning. In particular, you must configure the following items:

If you do not configure the File System Pathnames, LinkScan will automatically use Network (HTTP) Scanning. It will also disable the Orphaned File checking.

To combine Orphaned File checking with Network (HTTP) Scanning, you must still configure the File System Pathnames, because orphan checking depends on them. Then simply select Network Scanning on the Edit Project dialog (Windows systems) or set Http = 1 (Unix systems).

This is best illustrated by example:

# Map the server root
# http://www.example.com/index.html  <==> /usr/www/htdocs/index.html

Homeurl = http://www.example.com/
Homedir = /usr/www/htdocs/
Homefile = index.html

# http://www.example.com/cgi-bin/    <==> /usr/www/cgi-bin/
# http://www.example.com/~username/  <==> /home/username/public_html/

Alias cgi-bin/ /usr/www/cgi-bin/
Alias ~([^/]+)/ /home/$1/public_html/

# Hide hidden files and directories from the Orphans Report

Noorphans (\.|.*/\.)

# The following are significant (but default) settings

Execute cgi-bin/             # Test cgi-bin/ via HTTP
Execute (?i).*\.(cgi|asp)$   # Test .cgi and .asp files via HTTP

Htmlfiles = html, shtml, htm
Mapfiles = map
Pdffiles = 
Flashfiles = swf
Defaultpages = index.html, index.shtml, index.htm, home.html, home.shtml, home.htm

Indexoptions = 0             # Disallow directory listings
Expandssi = 1                # Expand Server Side Includes
Autohttp = 0                 # Disable automatic HTTP retry
Maxdirlevels = 10            # Don't explore file system beyond 10 levels

On Unix systems only, the Alias directive supports the special !HOME expression:

Alias ~([^/]+)(/|$) !HOME/public_html/

A reference to ~someuser/ will be Aliased to !HOME/public_html/. Then !HOME will be replaced by someuser's home directory, which is determined via a lookup of /etc/passwd.
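The URL-to-pathname mapping configured above can be sketched in Python. This is only an illustration of the idea and not LinkScan's own code: the helper name url_to_path, the rule that the first matching Alias wins, and the anchoring of each expression at the start of the relative path are all assumptions. Perl's $1 is written as \1 for Python.

```python
import re

HOMEURL = "http://www.example.com/"
HOMEDIR = "/usr/www/htdocs/"

# (expression, target) pairs copied from the Alias lines above,
# with Perl's $1 written as \1 for Python's re module.
ALIASES = [
    (r"cgi-bin/", "/usr/www/cgi-bin/"),
    (r"~([^/]+)/", r"/home/\1/public_html/"),
]

def url_to_path(url):
    """Map a URL under HOMEURL to a file system path (hypothetical helper)."""
    if not url.startswith(HOMEURL):
        return None  # not on this server
    rel = url[len(HOMEURL):]
    for pattern, target in ALIASES:
        found = re.match(pattern, rel)  # assumption: anchored at start of path
        if found:
            # Replace the matched prefix and keep the rest of the path.
            return found.expand(target) + rel[found.end():]
    return HOMEDIR + rel  # no Alias matched: map under the server root

print(url_to_path("http://www.example.com/index.html"))
print(url_to_path("http://www.example.com/cgi-bin/search.cgi"))
print(url_to_path("http://www.example.com/~username/notes.html"))
```

The three calls correspond to the three mappings shown in the comments of the configuration example.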

Remote File Systems

In some cases, the file system directories containing the web site may reside on a physically different computer from LinkScan. In these cases, LinkScan will support Network File System pathnames (subject to any locally imposed security controls).

In other cases, the file system of the remote system may not be visible via the network, quite possibly for security reasons. LinkScan will be unable to scan the remote computer using the File System Scanning Method. You must use Network (HTTP) Scanning.

However, it is still possible to enable Orphaned File checking. In summary, you will need to execute a small, self-contained Perl program on the remote computer. It will assemble a "picture" of the file system and save it as a simple ASCII file. That file may be transferred to the LinkScan computer using FTP (or any other more secure technique) and used to perform the orphan analysis in lieu of direct access to the remote server.

  1. Fully configure the selected Project as if you were using File System Scanning on your local machine. However, when setting the pathname to the root of the target webserver (and any associated Aliases), use the pathname conventions applicable to the remote server.

  2. In the Project configuration file, force LinkScan to use normal Network (HTTP) Scanning by setting:

    
    Http = 1
    
  3. Set the Orphanfile setting in the Project configuration file to the full pathname of a file on your local computer. For example:

    
    Orphanfile = C:/LinkScan/someproject/orphans.list
    
  4. Transfer the following files to the remote server:

    
    C:/LinkScan/lsfind.pl
    C:/LinkScan/someproject/linkscan.cfg
    
  5. On the remote server, execute the lsfind.pl program:

    
    perl lsfind.pl orphans.list
    
  6. Transfer the orphans.list file back to the LinkScan machine.

  7. Initiate a scan of the target website in the normal manner. LinkScan will use the orphans.list file from the remote server in lieu of scanning the file system on the local server.

LinkScan Reference Manual. Section 10

Import Scanning

The LinkScan Import function may be used to:

When processing a list of Links each URL is checked in turn and its status stored in the LinkScan database. When processing a list of Documents, each document and every link within that document is checked and its status stored.

The import function offers enormous flexibility. To use this feature, carry out the following steps:

  1. Prepare the Import File

    LinkScan will import a simple ASCII file of the following format:

    URL ... one or more tab characters ... URL-Description

    URL's may be absolute, or relative to the Home URL for the current server. The URL-Description is imported and carried through to the LinkScan Reports for identification purposes. You may use any ASCII string, for example a database record number.

    An alternative field separator may be specified by including a special command as the first line of the file:

    ## \s+

    The command starts with '##' in column one followed by a Perl expression that specifies the field delimiter. In the example above, '\s+' means one or more whitespace characters (tab or space).

    Lines with a '#' in column one, and blank lines, are ignored as comments.

  2. Configure LinkScan

    To use the Import Function, open the linkscan.cfg file for the appropriate Project, and edit the Importfile setting. Supply the full pathname to the prepared ASCII import file. For example:

    
    Importfile = /usr/home/linkscan/importfiles/test.txt
    

    Then select the import mode by changing the Import setting. Valid values are:

    Import = 0 Import mode disabled
    Import = 1 Import a list of links
    Import = 2 Import a list of documents
    Import = 3 Import a list of documents with caching disabled

  3. Special Considerations

    LinkScan de-duplicates the list of links within an Import Document list. This means that LinkScan will validate each unique URL within the list only one time.

    However, you may force LinkScan to process an Import Sequence so that the same URL or document is checked more than once. This may be achieved by adjusting the URL's to make them appear unique. Note that this also provides a means by which to differentiate the test results for each step. Simply edit the URL's to make them unique by adding dummy name-value pairs to the query string of the URL's:

    http://www.example.com/cookie_sensitive?dummyseq=1
    [...]
    http://www.example.com/set_cookie
    [...]
    http://www.example.com/cookie_sensitive?dummyseq=2

    If the URL's already include a query string, simply append the additional parameter to the existing query and change:

    http://www.example.com/foo?name=value

    to:

    http://www.example.com/foo?name=value&dummyseq=1

    Normally, LinkScan maintains the status of each link in a cache while it scans a site. This dramatically improves performance since LinkScan does not need to re-check commonly used images and other components over and over. However, caching may be undesirable with some stateful sequences, for example when the same URL produces a completely different result before and after a cookie is set.

    In those situations, you may use a special option (Import = 3) which will force LinkScan to flush its cache after each imported document has been validated.
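The Import File format described in step 1 can be read with a few lines of Python. This is a minimal sketch for checking a file before handing it to LinkScan; the function name parse_import_file is hypothetical, and we assume the delimiter expression on the '##' line is also acceptable to Python's re module (true for simple expressions such as \s+).

```python
import re

def parse_import_file(text):
    """Return (url, description) pairs from import-file text (hypothetical helper)."""
    delimiter = re.compile(r"\t+")  # default: one or more tab characters
    entries = []
    for number, line in enumerate(text.splitlines()):
        if number == 0 and line.startswith("##"):
            # First-line command: the remainder is the delimiter expression.
            delimiter = re.compile(line[2:].strip())
            continue
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        fields = delimiter.split(line.strip(), maxsplit=1)
        description = fields[1] if len(fields) > 1 else ""
        entries.append((fields[0], description))
    return entries

sample = "\n".join([
    "## \\s+",
    "# catalogue pages",
    "products/index.html record-1001",
    "",
    "http://www.example.com/foo?name=value record-1002",
])
for url, description in parse_import_file(sample):
    print(url, "|", description)
```

Splitting only once per line keeps any further delimiter characters inside the URL-Description.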

LinkScan Reference Manual. Section 11

Advanced and Custom Scanning

LinkScan incorporates many powerful customization features described below.

  1. How to control the scope of a scan
  2. How to handle authentication schemes
  3. How to scan additional pages and submit forms
  4. How to validate JavaScript and drop-down lists
  5. How to handle special Error documents
  6. How to manipulate URLs on-the-fly
  7. How to emulate different browser types
  8. How to remap different hosts
  9. How to assign documents to Owners
  10. How to process additional per-document data
  11. How to control the testing of external links
  12. How to scan very large sites
  13. Other miscellaneous customizations

Hint: We strongly recommend that you read Essential LinkScan Concepts before studying this section of the Reference Manual.

11.1 How to control the scope of a scan

You may use any combination of the following commands to include or exclude specific areas of the target website.


Exclude relative-path-expression
Exclude absolute-url-expression
Nofollow relative-path-expression
Onlyfollow relative-path-expression
Onlyinclude relative-path-expression
Maxlevels depth
Maxclicks depth

Exclude: The Exclude command may be used to completely ignore specific links. You may supply a relative-path-expression to exclude Internal Links, or an absolute-url-expression to exclude External Links.

Nofollow: The Nofollow command may be used to provide even finer control over LinkScan's behavior. When LinkScan encounters a link matching a Nofollow command, it will validate the link (and check for any <a name = ... > tags if appropriate). However, it will not test any links that lead from the target document.

For greater flexibility and completeness, the Onlyinclude and Onlyfollow commands are also supported.

Onlyinclude: The Onlyinclude command is logically equivalent to "Exclude everything except".

Onlyfollow: The Onlyfollow command is logically equivalent to "Nofollow everything except".

Maxlevels: A command such as Maxlevels = 3 will limit the depth of the scan to three directory levels under server root.

Maxclicks: A command such as Maxclicks = 3 will limit the depth of the scan based on the number of clicks from the start of the scan. In order to more closely model the real user experience, LinkScan does not include clicks that result from following framesets or redirections.

The following rules of precedence apply when using multiple commands in combination:


Example 1:

Exclude http://www.domain.com/
Exclude test/

All links to "http://www.domain.com/" and all files in the local "test/" subdirectory will be ignored by LinkScan.


Example 2:

Nofollow user2/

LinkScan will check the links to files in the "user2/" directory, but it will not examine the content of any documents within the "user2/" directory or test any of the links contained within them.


Example 3:

Onlyfollow user1/

LinkScan will check the documents in the local "user1/" subdirectory and test the links to files in other local directories. However, LinkScan will not examine the content of any documents that lie outside of the local "user1/" directory or test any of the links contained within them.
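The difference between Exclude and Nofollow in Examples 1 and 2 can be sketched as a three-way decision. This is an illustration only: the function name classify is hypothetical, and we assume that each expression is anchored at the start of the link and that Exclude is tested before Nofollow.

```python
import re

# Expressions copied from Examples 1 and 2 above.
EXCLUDE = [r"http://www.domain.com/", r"test/"]
NOFOLLOW = [r"user2/"]

def classify(link):
    """Return how a link would be treated under the assumptions stated above."""
    if any(re.match(pattern, link) for pattern in EXCLUDE):
        return "ignore"  # not validated at all
    if any(re.match(pattern, link) for pattern in NOFOLLOW):
        return "validate only"  # checked, but its own links are not tested
    return "validate and follow"

for link in ["test/page.html", "user2/index.html", "user1/index.html"]:
    print(link, "->", classify(link))
```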

Dynamic content

On websites that incorporate a high proportion of dynamic content, it may not be productive to test every script with a large number of query parameters or other variations. The following controls are provided.

Maxcgi: The maximum number of times any single URL should be probed with different query parameters. This prevents LinkScan from trying to validate a CGI script or dynamic page with a potentially infinite number of query parameters.
[Default: Maxcgi = 100 ]

Taglimit: The Taglimit command may be used to provide even finer control over the number of times clusters of URL's are probed. Syntax and example:


Syntax:

Taglimit relative-path-expression maxnumber

Example:

Taglimit scripts/DatabaseLookup.asp 20

LinkScan will only attempt to parse 20 documents matching the pattern "scripts/DatabaseLookup.asp". Any further links matching the specified pattern will be completely ignored.
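The effect of a Taglimit can be pictured as a counter per expression. The sketch below is an assumption about the bookkeeping, not LinkScan's implementation; the names allowed and seen are hypothetical, and the expression is assumed to be anchored at the start of the relative path.

```python
import re
from collections import Counter

# (expression, maxnumber) copied from the Taglimit example above.
TAGLIMITS = [(r"scripts/DatabaseLookup.asp", 20)]
seen = Counter()  # how many links have been accepted for each expression

def allowed(link):
    """Return False once a matching expression has used up its limit."""
    for pattern, limit in TAGLIMITS:
        if re.match(pattern, link):
            if seen[pattern] >= limit:
                return False  # limit reached: ignore this link
            seen[pattern] += 1
    return True

links = [f"scripts/DatabaseLookup.asp?id={n}" for n in range(25)]
kept = [link for link in links if allowed(link)]
print(len(kept), "of", len(links), "links kept")
```

Links that match no Taglimit expression are never counted and are always allowed.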

11.2 How to handle authentication schemes

Many websites include some form of access control or user authentication features. In general, these arrangements use one of two mechanisms defined by the HTTP protocols. Both are supported by LinkScan. They are HTTP Authentication and cookie-based authentication.

In the case of HTTP Authentication, when a user attempts to access a protected area, their browser will present a challenge in the form of a pop-up dialog box that requires a username and password to be entered. In the case of cookie-based arrangements, the user is normally required to login by filling out an HTML form and submitting it.

HTTP Authentication

For sites that require HTTP Authentication, you must configure LinkScan with an appropriate Auth command:


Syntax:

Auth server-name "realm-name" username password

Examples:

Auth www.example.com "" guestuser xxxxxx
Auth app.example.com "Controlled Access" guestuser xxxxxx

You must include a realm-name (enclosed in double-quotes) but it may be empty. In that case, LinkScan will use the configured username and password for any realm on the target server. This is the recommended approach unless your server uses multiple realms with different access control rules for different portions of the website.
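The realm rule can be sketched as a simple lookup in which an empty realm-name acts as a wildcard. The function name credentials_for and the first-match rule are assumptions made for illustration.

```python
# (server-name, realm-name, username, password) copied from the Auth examples above.
AUTH = [
    ("www.example.com", "", "guestuser", "xxxxxx"),
    ("app.example.com", "Controlled Access", "guestuser", "xxxxxx"),
]

def credentials_for(server, realm):
    """Return (username, password) for a challenge, or None (hypothetical helper)."""
    for auth_server, auth_realm, username, password in AUTH:
        # An empty configured realm matches any realm on that server.
        if auth_server == server and auth_realm in ("", realm):
            return username, password
    return None

print(credentials_for("www.example.com", "Members Only"))
print(credentials_for("app.example.com", "Some Other Realm"))
```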

Cookie-based Authentication

HTTP access to some sites is controlled via authentication schemes requiring Cookies. For more information regarding Cookies see the Netscape Cookie Specification at http://www.netscape.com/newsref/std/cookie_spec.html.

LinkScan will automatically accept and return all valid cookies received during the course of a scan. However, to gain access to the site, you may need to configure LinkScan to ensure that the appropriate cookies are set. This may be achieved by one of two techniques:

The submission of a login form may be configured using the Extrahome command (described in the next section). However, you may optionally initialize LinkScan's collection of stored cookies (aka Cookie Jar) with one or more permanent Cookies by using the Cookie command:


Syntax:

Cookie server-name cookiename=cookievalue

Example:

Cookie www.elsop.com LinkScan=cookie_value;

Note: Do not enter space characters around the '=' character

The server-name is the name of the server to be tested. For security reasons and in compliance with the applicable standards, LinkScan will only send the cookie when the specified server-name exactly matches the hostname portion of the requested URL. In this context, server names and their corresponding IP addresses are considered to be different (consistent with all major browsers). The cookie names and values must be reverse engineered from your server code or "discovered" via your browser, either by enabling the "Prompt before accepting cookies" option or by examining the stored cookies on disk.
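The exact-match rule for the server-name can be illustrated as follows. This is a sketch of the rule as described, not of LinkScan's Cookie Jar; the function name cookie_header is hypothetical and 192.0.2.10 is a placeholder address.

```python
from urllib.parse import urlsplit

# server-name -> cookie, copied from the Cookie example above.
COOKIES = {"www.elsop.com": "LinkScan=cookie_value"}

def cookie_header(url):
    """Return the configured cookie only when the hostname matches exactly."""
    host = urlsplit(url).hostname
    return COOKIES.get(host)

print(cookie_header("http://www.elsop.com/linkscan/"))  # exact match
print(cookie_header("http://elsop.com/linkscan/"))      # different name
print(cookie_header("http://192.0.2.10/linkscan/"))     # IP address, not the name
```

Only the first request would carry the cookie; the other two return None.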

Hint 1: Sites with especially complex schemes (multiple levels of access control, subscription expirations etc.) might consider configuring their server and/or scripts to recognize a "super-user-cookie" specifically for testing purposes. This approach may also be used to trigger test points within server-based scripts and greatly improve the meaningful testability of complex dynamic content.

Hint 2: HTTP Authentication and Cookie related transactions are logged by LinkScan during the course of the scan. You may examine the following file to view the log: .../LinkScan/Projectname/data/linkscan.red

11.3 How to scan additional pages and submit forms

You may configure LinkScan to examine additional documents that would not normally be found during the scan and might otherwise be reported as orphaned files. The same technique may be used to submit forms on your website with specific data values for testing purposes. This is achieved with the Extrahome command.

LinkScan may be configured to submit a form using either the GET or POST methods. Pages that require the GET method are specified with a normal URL and query string. Pages that require the POST method are specified in a similar manner except that the query character (?) is replaced with a double-query (??).


Syntax:

Extrahome relative-path-expression

Examples

Extrahome somedir/staticdoc.html
Extrahome cgi-bin/getscript.cgi?Var1=aaa&Var2=bbb
Extrahome cgi-bin/postscript.cgi??Name=Malcolm%20Hoar&Password=secret

In these examples, LinkScan will access the first URL as a regular document. The second and third will use the GET and POST methods respectively. Note that the query strings must be %encoded according to normal conventions. The ability to define complete form submissions with a single-line command greatly simplifies maintenance compared to other techniques for the automated testing of complex content.
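The single-query and double-query convention can be sketched as a small parser. The function name parse_extrahome is hypothetical; the sketch shows only how a value splits into method, path and form data, and it decodes the data with the standard urllib.parse.parse_qsl.

```python
from urllib.parse import parse_qsl

def parse_extrahome(spec):
    """Split an Extrahome value into (method, path, form_data)."""
    if "??" in spec:
        path, data = spec.split("??", 1)  # double-query: POST
        return "POST", path, parse_qsl(data)
    if "?" in spec:
        path, data = spec.split("?", 1)   # single query: GET with parameters
        return "GET", path, parse_qsl(data)
    return "GET", spec, []                # regular document

for spec in [
    "somedir/staticdoc.html",
    "cgi-bin/getscript.cgi?Var1=aaa&Var2=bbb",
    "cgi-bin/postscript.cgi??Name=Malcolm%20Hoar&Password=secret",
]:
    print(parse_extrahome(spec))
```

The %20 in the third value decodes to a space, which is why the query strings must be %encoded in the configuration file.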

Hint 1: Use the LinkScan Recorder to automatically capture the correctly constructed URL's.

Hint 2: When using the Extrahome command to submit a login form to provide access to a site, you may also need to configure LinkScan so that it doesn't immediately "click" any LOGOUT button which would invalidate the newly created session.

11.4 How to validate JavaScript and drop-down lists

LinkScan may be configured to interpret the contents of drop-down lists as links to other pages. The HTML specification does not define a standard method for indicating that a drop-down list contains hyperlinks (as opposed to regular data). Hence LinkScan needs some other "cue" and may be triggered by pattern matching of attributes within the SELECT tag. Consider, for example, the following:


<select name="URLLIST">
<option value="/products/" Selected> Relative URL to Products
<option value="http://www.mydomain.com/services/"> Absolute URL to Services
</select>

To instruct LinkScan to treat the contents of the drop-down list as URL's, use the following command:


Selecturl URLLIST

LinkScan will examine all SELECT tags and look for a Regular Expression match on the NAME attribute. If the match is successful (URLLIST in this example) LinkScan will treat each OPTION tag within the list as a hyperlink and validate it accordingly.
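The matching described above can be sketched with Python's standard html.parser module. The class name SelectUrlParser is hypothetical and the sketch only collects the OPTION values; it does not validate them.

```python
import re
from html.parser import HTMLParser

class SelectUrlParser(HTMLParser):
    """Collect OPTION values from SELECT tags whose NAME matches a pattern."""

    def __init__(self, name_pattern):
        super().__init__()
        self.name_pattern = re.compile(name_pattern)
        self.in_matching_select = False
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "select":
            name = attributes.get("name") or ""
            self.in_matching_select = bool(self.name_pattern.search(name))
        elif tag == "option" and self.in_matching_select:
            value = attributes.get("value")
            if value:
                self.urls.append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_matching_select = False

PAGE = """
<select name="URLLIST">
<option value="/products/" Selected> Relative URL to Products
<option value="http://www.mydomain.com/services/"> Absolute URL to Services
</select>
<select name="COLOUR">
<option value="red"> Red
</select>
"""

parser = SelectUrlParser("URLLIST")
parser.feed(PAGE)
print(parser.urls)
```

The second list (COLOUR) is an invented example of regular data; its values are not treated as links because the NAME attribute does not match.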

LinkScan includes the ability to validate links contained within JavaScript code. A relatively simple pattern matching technique is used -- LinkScan does not contain a full JavaScript interpreter. This means that LinkScan may "miss" some links or find "false positive errors" especially if the code creates the hyperlink references dynamically at run-time. The following Scriptmatch and Scriptnomatch commands give excellent results in most cases. However, you can customize the matching rules by changing these expressions and/or adding new ones.


Scriptmatch = (\w+://\S+|\S+/$|\S+\?\S+|\S+\.([a-z]{2,3}|[js]?html?|Z)$)
Scriptnomatch = .*([\(\)\[\]\{\}\']|document\.\S+|\.(src|com)$)

Some JavaScript constructs may still produce false errors. You may force LinkScan to ignore complete script blocks that match a specified pattern. For example:


Scriptexclude function\s+ZoomWindow

The above command will force LinkScan to ignore script blocks that contain a definition for the ZoomWindow function.

11.5 How to handle special Error documents

Many websites are constructed with special user-friendly error pages, sometimes known as "custom-404 documents". Some servers will deliver the error document directly whereas others may force a redirection to a specific error document. In either case, an issue arises if your server delivers the error document with a 200 OK response code. LinkScan (or any other link checker) would not be able to detect the error condition.

A similar issue arises with some dynamically generated documents. For example, a Java servlet may encounter a run-time error condition after it has already sent a 200 OK response code to the client.

Hence LinkScan supports two special commands that may be used to detect such conditions and force a 404 Not Found error, regardless of the HTTP response code produced by the server/application. The first, Errordoc, is used with servers that force a redirection; it operates by pattern matching on the HTTP Location: header. The second, Errorbody, operates by pattern matching on the document body.


Syntax:

Errordoc pattern
Errorbody pattern

Examples:

Errordoc special/notfound\.html
Errorbody (?i).*runtime\serror

In the Errordoc example, LinkScan will report as 404 Not Found any URL that is redirected to http://your.server/special/notfound.html. In the Errorbody example, LinkScan will report as 404 any document that contains the string runtime error in the document body. Note the (?i) makes the pattern match case-insensitive.

Hint: The Errorbody pattern match is carried out on the entire document, including comments. Developers might consider including a standard error string within comment tags that may be used to trigger the Errorbody match.
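The combined effect of the two example commands can be sketched as a function that overrides the server's status code. The function name effective_status is hypothetical, and we assume the Errordoc pattern may match anywhere within the Location: header value.

```python
import re

# Patterns copied from the Errordoc and Errorbody examples above.
ERRORDOC = r"special/notfound\.html"
ERRORBODY = r"(?i).*runtime\serror"

def effective_status(status, location, body):
    """Return 404 when either pattern matches, otherwise the server's status."""
    if location and re.search(ERRORDOC, location):
        return 404  # redirected to the custom error document
    if body and re.search(ERRORBODY, body, re.DOTALL):
        return 404  # error text found in the document body
    return status

print(effective_status(302, "http://your.server/special/notfound.html", ""))
print(effective_status(200, None, "<html><!-- Runtime Error --></html>"))
print(effective_status(200, None, "<html>Welcome</html>"))
```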

11.6 How to manipulate URLs on-the-fly

One of the most powerful (and complex) customization features of LinkScan concerns the real-time manipulation of links during the course of the scan. This is typically used to control the testing of sites with complex dynamic content. The basic commands available are:


Sessionmatch expression
Substitute relative-path-expression expression owner
Substituteraw relative-path-expression expression

We shall consider a number of examples which may be adapted according to your specific needs.

Example 1

Consider a site that produces links such as:


http://www.example.com/page1.asp
http://www.example.com/page1.asp?Print

It is entirely possible that page1.asp has been designed in such a manner that it delivers the same basic content with minor variations in formatting depending upon the presence or absence of the Print query string. One might configure LinkScan with:


Substitute (.*\.asp)\?Print $1

Whenever LinkScan encounters a link matching the specified pattern it will make the substitution indicated before it tries to validate or follow that link. In this example, a link to:

http://www.example.com/page1.asp?Print

will immediately be transformed to:

http://www.example.com/page1.asp

Note, however, this is not the same as Excluding links which contain the Print query string; that would cause LinkScan to simply ignore the link. In this case, LinkScan will process the link but transform it on-the-fly during the scan.
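The transformation in Example 1 behaves like an ordinary regular expression substitution. The sketch below reproduces it with Python's re module, writing Perl's $1 as \1; the helper name substitute is hypothetical, and for simplicity the pattern is applied to the full URL.

```python
import re

def substitute(pattern, replacement, link):
    """Apply one Substitute-style rule to a link (hypothetical helper)."""
    return re.sub(pattern, replacement, link, count=1)

rule = (r"(.*\.asp)\?Print", r"\1")

print(substitute(*rule, "http://www.example.com/page1.asp?Print"))
print(substitute(*rule, "http://www.example.com/page1.asp"))
```

Both calls yield the same URL, so LinkScan would only need to test page1.asp once.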

Example 2

Next we will consider a significantly more complex scenario.


Sessionmatch .*&token=([^&]+)
Substitute (.*&token=)[^&]*(.*)$ $1!S$2

In this case, we use the special Sessionmatch command to capture and save the first value of the query parameter token that LinkScan sees. This is most likely some kind of session number assigned by the target server immediately following the submission of a login form. The Substitute command then instructs LinkScan to replace all subsequent values of token with the saved value (represented by the special parameter !S).

In this scenario, LinkScan ensures that the value of token can never change during the course of the scan from the originally assigned value.
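The interaction between Sessionmatch and the !S parameter can be sketched as follows. The class name SessionRewriter and the sample links are hypothetical; the two expressions are taken from Example 2.

```python
import re

SESSIONMATCH = re.compile(r".*&token=([^&]+)")
SUBSTITUTE = re.compile(r"(.*&token=)[^&]*(.*)$")

class SessionRewriter:
    """Remember the first token seen and reuse it in every later link."""

    def __init__(self):
        self.saved = None  # plays the role of !S

    def rewrite(self, link):
        if self.saved is None:
            found = SESSIONMATCH.match(link)
            if found:
                self.saved = found.group(1)  # capture the first value only
        if self.saved is None:
            return link  # nothing captured yet: leave the link alone
        return SUBSTITUTE.sub(
            lambda m: m.group(1) + self.saved + m.group(2), link, count=1
        )

rewriter = SessionRewriter()
print(rewriter.rewrite("app/home.asp?page=1&token=abc123"))
print(rewriter.rewrite("app/list.asp?page=2&token=zzz999&sort=name"))
```

The second link comes back with token=abc123, the value captured from the first link.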

Example 3

Next we'll consider a JSP site that produces URL's with the following structure:


http://www.example.com/content.jsp?A=123&B=456&C=789&D=XYZ

It may not be productive or efficient for LinkScan to scan all of the pages using every combination and permutation of values for the parameters A, B, C, D, etc. We can control that by manipulating the individual name-value pairs during the scan. For example:


Substitute (content\.jsp\?.*)&B=[^&]*(.*) $1&B=456$2
Substitute (content\.jsp\?.*)&C=[^&]*(.*) $1$2
Taglimit content\.jsp\?.*&D= 20

The first command fixes the value of B=456. Whatever value the parameter B takes on during the scan, LinkScan will force the value back to 456. The second command deletes any references to the C parameter from every link that it finds. We have also included the third Taglimit command; this will cause LinkScan to completely ignore the twenty-first and subsequent links that include a D parameter. In other words, in this case, we only want to test a representative sample (20) of links that include a D parameter.
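The two Substitute commands of Example 3 can be tried out in Python. In this sketch the value patterns are written as [^&]* so that the whole parameter value is consumed, Perl's $1 and $2 are written as \1 and \2, and the function name rewrite is hypothetical. The rules are applied in the order given.

```python
import re

RULES = [
    (r"(content\.jsp\?.*)&B=[^&]*(.*)", r"\1&B=456\2"),  # force B=456
    (r"(content\.jsp\?.*)&C=[^&]*(.*)", r"\1\2"),        # delete the C parameter
]

def rewrite(link):
    """Apply each rule once, in order (hypothetical helper)."""
    for pattern, replacement in RULES:
        link = re.sub(pattern, replacement, link, count=1)
    return link

print(rewrite("content.jsp?A=123&B=999&C=789&D=XYZ"))
```

The B value is reset to 456 and the C parameter disappears, while A and D pass through unchanged.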

Example 4

For our next example, we shall consider a site that generates pages containing some links with the following structure:


http://www.example.com/cgi-bin/GenerateFrame?Referer=abc&Link=http%3A%2F%2Fwww.yahoo.com%2F

Rather than linking directly to Yahoo!, this page links to a script that generates a frameset that includes the referenced page. In a default configuration, LinkScan will happily follow the link, validating the frameset and the ultimate link to Yahoo!. However, it may not be productive to do that for potentially thousands of links. Furthermore, in the (extremely unlikely) event that the link to http://www.yahoo.com/ was broken, the error would appear in one of the GenerateFrame documents and not the original referring document. In order to repair that link, one would have to backtrack through the frameset to locate the original source of the trouble.

Hence we can apply more Substitute magic:


Substitute cgi-bin/GenerateFrame.*&Link=([^&]+).* !U$1

This command will extract the value of the Link= parameter, and the special !U token instructs LinkScan that the string needs to be un-encoded. So the original link:

http://www.example.com/cgi-bin/GenerateFrame?Referer=abc&Link=http%3A%2F%2Fwww.yahoo.com%2F

is transformed on-the-fly to:

http%3A%2F%2Fwww.yahoo.com%2F

and then decoded to:

http://www.yahoo.com/

And this means LinkScan can validate the link to Yahoo! directly without checking the GenerateFrame script many, many times. Furthermore, any errors will be flagged against the original document (and not one or more steps removed).
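The extraction and decoding steps of Example 4 can be sketched with the standard urllib.parse.unquote function, which stands in here for the !U token. The function name unwrap is hypothetical, and we assume the whole link is replaced by the decoded value.

```python
import re
from urllib.parse import unquote

# Expression copied from the Substitute command in Example 4.
PATTERN = re.compile(r"cgi-bin/GenerateFrame.*&Link=([^&]+).*")

def unwrap(link):
    """Replace a GenerateFrame link by its decoded Link= target."""
    found = PATTERN.search(link)
    if not found:
        return link  # not a GenerateFrame link: leave unchanged
    return unquote(found.group(1))  # the !U step: un-encode the value

print(unwrap(
    "http://www.example.com/cgi-bin/GenerateFrame"
    "?Referer=abc&Link=http%3A%2F%2Fwww.yahoo.com%2F"
))
```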

Example 5

For our final example, we include for illustration the complete configuration for a real-world large and very complex dynamic site:


# Set the CGI limit to be very large
# Include all file types on the Map

Maxcgi = 10000
Mapinclude .*

# Force &A=B and insert it immediately after the '?'

Substitute (cgi-bin.*[&\?])A=[^&=]*&*(.*) $1$2
Substitute (cgi-bin.*\?)(.*) $1A=B&$2

# Discard null and undefined values

Substitute (cgi-bin.*)&B=(null|undefined)(.*) $1$3
Substitute (cgi-bin.*)&C=(null|undefined)(.*) $1$3
Substitute (cgi-bin.*)&D=(null|undefined)(.*) $1$3
Substitute (cgi-bin.*)&R=(null|undefined)(.*) $1$3

# For 'category', take the &C= if present, otherwise the &B=

Substitute (cgi-bin/bv/scripts/category.*\?A=B).*?(&C=[^&=]*).* $1$2
Substitute (cgi-bin/bv/scripts/category.*\?A=B).*?(&B=[^&=]*).* $1$2

# For 'content', take the &D= or &R= if present (call it &D=). Otherwise take the &B=

Substitute (cgi-bin/bv/scripts/content.*\?A=B).*?&[DR]=([^&=]*).* $1&D=$2
Substitute (cgi-bin/bv/scripts/content.*\?A=B).*?(&B=[^&=]*).* $1$2

# For 'frame', take the &D= or &R= if present (call it &D=). Otherwise take the &B=

Substitute (cgi-bin/bv/scripts/frame.*\?A=B).*?&[DR]=([^&=]*).* $1&D=$2
Substitute (cgi-bin/bv/scripts/frame.*\?A=B).*?(&B=[^&=]*).* $1$2

# For 'mailing...', take the &R=

Substitute (cgi-bin/bv/scripts/mailing.*\?A=B).*?(&R=[^&=]*).* $1$2

# For 'contact', take the &B=, &C= and &Comments

Substitute (cgi-bin/bv/scripts/contact.*\?A=B).*?(&B=[^&=]*).*?(&C=[^&=]*).*?(&Comments=[^&=]*).* $1$2$3$4

# Mark redirects to Error page as 404
# Mark documents containing 'Error Code:' as 404

Errordoc cgi-bin/bv/scripts/error.jsp
Errorbody Error\s+Code:[^\n<]*

# Hide some frequently arising errors

Noforms = 1
Exclude images/arrow.gif

Synthesizing Additional Links

The Substitute commands may be used to modify existing links on-the-fly. However, a variation of this, the Insertlink command, may be used to insert additional links into specified documents in order to achieve a specific test coverage. Again, it is best illustrated by example:


Insertlink .*complex\.jsp\?.*SPVAR= -
Insertlink (.*complex\.jsp\?.*) /$1&ALTMODE=1 +

As each document is scanned, LinkScan will process all Insertlink commands (in the order specified). The URL of the scanned document is matched against the first parameter of each Insertlink command. In the case of the first example above, a link to:

complex.jsp?VAR=1&SPVAR=2

will match the expression and LinkScan will abort all Insertlink processing for this document (signified by the minus character).

However, a link to:

complex.jsp?VAR=1

does not match the expression. Processing will continue to the second command. This does match the expression and LinkScan will insert a link into this document (signified by the plus character). Hence, when LinkScan processes:

complex.jsp?VAR=1

It will insert into that document, the following link:

complex.jsp?VAR=1&ALTMODE=1
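The ordered processing of Insertlink commands can be sketched as a loop that stops at the first matching minus rule. The function name inserted_links is hypothetical and each expression is assumed to be anchored at the start of the document URL. Note that the sketch returns the link exactly as the template builds it, including the leading slash.

```python
import re

# (expression, template, action) copied from the two Insertlink commands above,
# with Perl's $1 written as \1.
INSERTLINK = [
    (r".*complex\.jsp\?.*SPVAR=", None, "-"),
    (r"(.*complex\.jsp\?.*)", r"/\1&ALTMODE=1", "+"),
]

def inserted_links(document_url):
    """Return the links that would be inserted into the given document."""
    links = []
    for pattern, template, action in INSERTLINK:
        found = re.match(pattern, document_url)
        if not found:
            continue  # try the next command
        if action == "-":
            break  # abort all Insertlink processing for this document
        links.append(found.expand(template))
    return links

print(inserted_links("complex.jsp?VAR=1&SPVAR=2"))
print(inserted_links("complex.jsp?VAR=1"))
```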

Hint: Clearly, the Substitute command requires a good working knowledge of Perl Regular Expressions. If you need assistance, the LinkScan engineers will be happy to help. Please write to linkscan@elsop.com describing, in as much detail as possible, the transformations you are seeking to achieve.

11.7 How to emulate different browser types

Most web browsers advertise their identity by including a User-Agent header with every request that they make. LinkScan also sends a User-Agent header. For example, the versions of Netscape Navigator, Microsoft Internet Explorer and LinkScan installed on the writer's computer send, respectively:


User-Agent: Mozilla/4.08 [en] (WinNT; I ;Nav)
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
User-Agent: LinkScan Enterprise/9.0 Windows

Some websites are constructed in a manner that is browser sensitive. They may, for example, deliver customized pages depending on the user's browser type. Hence LinkScan may be customized to emulate different browser types using the Extraheader command:


Syntax:

Extraheader literal-header-string

Example:

Extraheader User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)

In this example, LinkScan will advertise itself as Microsoft Internet Explorer version 5.5 running under Windows 2000.

In fact, the Extraheader command may be used to add any arbitrary HTTP headers to every request that LinkScan sends. A common application involves those servers which look for a language preference in the HTTP headers in order to deliver pages in the appropriate language. For example, the following command instructs LinkScan to include an English Language preference header with each request:


Extraheader Accept-Language: en

11.8 How to remap different hosts

Sometimes a single website may contain links such as:


http://www.example.com/
http://www2.example.com/

Here www.example.com and www2.example.com resolve to the same host IP address. However, LinkScan would consider www2.example.com to be an External Link and not part of the www.example.com Project. Hence the Hostalias command may be used to assign more than one name to the current server. Syntax and example:


Syntax:

Hostalias from-server-url to-server-url

Example:

Hostalias http://www2.example.com/  http://www.example.com/

A similar issue arises when scanning development or staging servers. For example, you may wish to scan the site:


http://staging.example.com/

but the site may contain one or more absolute links to http://www.example.com/. In this case, you can use the Mirrorurl command.


Syntax:

Mirrorurl absolute-url

Example:

Homeurl = http://www.example.com/
Mirrorurl = http://staging.example.com/

In this case, LinkScan will resolve all links as if it were scanning http://www.example.com/. However, all actual HTTP requests will be directed to http://staging.example.com/. This provides a convenient mechanism for scanning development and staging copies of a production website.
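The two remapping steps can be sketched as simple prefix rewrites. The function names canonical and request_url are hypothetical, and using Hostalias and Mirrorurl together in one Project is our assumption for the purposes of illustration.

```python
HOMEURL = "http://www.example.com/"
MIRRORURL = "http://staging.example.com/"
HOSTALIAS = {"http://www2.example.com/": "http://www.example.com/"}

def canonical(url):
    """Hostalias: treat an alternate server name as the current server."""
    for alias, target in HOSTALIAS.items():
        if url.startswith(alias):
            return target + url[len(alias):]
    return url

def request_url(url):
    """Mirrorurl: resolve against Homeurl, but fetch from the mirror."""
    url = canonical(url)
    if url.startswith(HOMEURL):
        return MIRRORURL + url[len(HOMEURL):]
    return url  # external links are requested as they are

print(canonical("http://www2.example.com/news/index.html"))
print(request_url("http://www.example.com/news/index.html"))
print(request_url("http://www.yahoo.com/"))
```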

11.9 How to assign documents to Owners

You may define the ownership of any given document or file in one of several ways. Ownership directives are evaluated in the order specified with the last match taking precedence. Note that the file ownership attribute is case sensitive.

  1. By the Unix File System ownership attribute. Note: this is not supported on Windows systems

  2. By the Defaultowner command. The syntax for the Defaultowner command is:

    Defaultowner owner-name

  3. By pattern matching with one or more Owner commands. The syntax for the Owner command is:

    Owner relative-path-expression owner-name

    LinkScan also supports a special variation of the Owner command. This will automatically assign every file an owner-name based on the name of the top-level (i.e. under "www root") directory in which it resides. This feature is automatically enabled if no Defaultowner or Owner commands are specified. The syntax is:

    Owner *1

  4. By using preexisting META tags in your HTML documents. For example, if your existing documents already contain tags of the form:

    <META NAME="CONTENT_OWNER" CONTENT="Malcolm Hoar">

    You may set the Owner to 'Malcolm Hoar' by configuring a suitable pattern. e.g.:

    Ownertags = ^meta\s+name\s*=\s*"content_owner"\s+content\s*=\s*"([^"]+)

  5. Finally, once an Owner has been assigned to the file or document, you may manipulate the Owner string with a simple pattern substitution:

    Owneralias .*?([a-zA-Z0-9]+)[\s\.\)]*$ \L$1

    This example would take the string 'Malcolm Hoar' and convert the ownership to 'hoar'. This technique may be used to deal with synonyms such as 'M. Hoar.', 'Malcolm C Hoar '.
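The two patterns above can be tried out in Python. This is a sketch under stated assumptions: that the Ownertags pattern is applied to the tag text without its angle brackets and without regard to case, and that the Perl replacement \L$1 corresponds to lower-casing the first captured group. The function names are hypothetical.

```python
import re

# Patterns copied from the Ownertags and Owneralias examples above.
OWNERTAGS = r'^meta\s+name\s*=\s*"content_owner"\s+content\s*=\s*"([^"]+)'
OWNERALIAS = r'.*?([a-zA-Z0-9]+)[\s\.\)]*$'


def owner_from_tag(tag_text):
    """Return the owner captured from a META tag, or None if no match.

    Assumption: tag_text excludes the angle brackets and case is ignored.
    """
    m = re.match(OWNERTAGS, tag_text, re.IGNORECASE)
    return m.group(1) if m else None


def apply_owneralias(owner):
    """Reduce an owner string to its last word, lower-cased (like \\L$1)."""
    m = re.match(OWNERALIAS, owner)
    return m.group(1).lower() if m else owner


if __name__ == "__main__":
    raw = owner_from_tag('META NAME="CONTENT_OWNER" CONTENT="Malcolm Hoar"')
    print(raw, "->", apply_owneralias(raw))
```

The same alias pattern maps 'M. Hoar.' and 'Malcolm C Hoar ' to a single owner, which is what makes it useful for consolidating synonyms.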


Example:

Defaultowner elsop         # Set default
Owner *1                   # Assign Owner based on top level dir ...
Owner wrc/humor/ humor     # But, make this subdir look like top-level
Owner .*\.cgi$ webmaster   # And give all *.cgi files to webmaster
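The "last match takes precedence" rule can be illustrated with a short Python sketch of the four directives above. It is not LinkScan's implementation. It assumes that paths are relative to the web root with no leading slash, that expressions are matched from the start of the path, and that files directly in the root keep the default owner; the rule encoding and function name are hypothetical.

```python
import re

# Hypothetical encoding of the example directives, in configuration order.
RULES = [
    ("default", "elsop"),                       # Defaultowner elsop
    ("topdir", None),                           # Owner *1
    ("pattern", (r"wrc/humor/", "humor")),      # Owner wrc/humor/ humor
    ("pattern", (r".*\.cgi$", "webmaster")),    # Owner .*\.cgi$ webmaster
]


def assign_owner(path, rules=RULES):
    """Evaluate every rule in order; the last rule that matches wins."""
    owner = None
    for kind, arg in rules:
        if kind == "default":
            owner = arg
        elif kind == "topdir":
            # Assumption: only files below a top-level directory are affected.
            if "/" in path:
                owner = path.split("/", 1)[0]
        elif kind == "pattern":
            pattern, name = arg
            if re.match(pattern, path):
                owner = name
    return owner


if __name__ == "__main__":
    for p in ("index.html", "links/a.html", "wrc/humor/j.html", "wrc/humor/run.cgi"):
        print(p, "->", assign_owner(p))
```

Because later directives override earlier ones, the general rules belong at the top of the configuration and the exceptions at the bottom.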

When using LinkScan Dispatch to create reports for delivery by Electronic mail, you may define associations between Owners and Addresses with the Mailalias command. The syntax is:

Mailalias expression list-of-addresses

list-of-addresses may be a comma separated list of addressees if you wish to distribute the report to multiple recipients. Use Mailalias owner-name null to skip a specific Owner.


Example:

Defaultowner elsop         # Set default
Owner *1                   # Assign Owner based on top level dir ...
Owner wrc/humor/ humor     # But, make this subdir look like top-level
Owner .*\.cgi$ webmaster   # And give all *.cgi files to webmaster

Mailalias elsop            malch@elsop.com, ken@elsop.com
Mailalias links            ken@elsop.com
Mailalias linkscan         malch@elsop.com
Mailalias wrc              ken@elsop.com
Mailalias humor            ken@elsop.com
Mailalias test             null

If no Mailaliases are defined, Dispatch will address the reports to Ownername@Mailhost.
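The mapping from Owners to recipients can be sketched in Python as shown below. This is an illustration under assumptions: the Mailalias expression is treated as an exact owner name (the real command may accept expressions), and an Owner with no alias falls back to Owner@Mailhost. The function name recipients is hypothetical.

```python
# A subset of the Mailalias entries from the example above.
MAILALIASES = {
    "elsop": "malch@elsop.com, ken@elsop.com",
    "links": "ken@elsop.com",
    "test": "null",
}


def recipients(owner, mailhost, aliases=MAILALIASES):
    """Return the list of addresses a report for owner would be sent to."""
    value = aliases.get(owner)
    if value is None:
        # Assumption: owners without an alias fall back to owner@mailhost.
        return [f"{owner}@{mailhost}"]
    if value.strip() == "null":
        return []  # "null" skips this Owner entirely
    return [address.strip() for address in value.split(",")]


if __name__ == "__main__":
    for name in ("elsop", "test", "docs"):
        print(name, "->", recipients(name, "example.com"))
```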

11.10 How to process additional per-document data

Facilities are provided to extract additional data from each document scanned, store those data in the LinkScan database and create various reports. The additional data are typically collected from the META tags in each HTML document.

Supported commands are provided for data extraction, substitution/manipulation and formatting:


# Userdata [123] match-expression expression
# Userdatafmt [123] [DHLTX] integer[LRC] caption
# D=date; H=hot links; L=link; T=truncate to format; X=normal
# Userdatasub [123] expression expression

The following example illustrates the use of these commands to extract and process an employee badge number from document META tags:


Userdata 1 (?i)<meta\s[^>]*employee\s*=\s*"\s*(#?\d+)\s*" $1
Userdatasub 1 #?(\d+) $1
Userdatafmt 1 X 6R Badge-Number

In the above example, we use the first of the three available userdata fields. The first command extracts the badge number from the document META tag. The second command performs a substitution on the matched data to remove an optional pound symbol from the badge number. The third command defines the formatting attributes; X defines a simple text field; 6R specifies a six-character, right-adjusted layout and Badge-Number defines a simple caption.
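The three steps (extract, substitute, format) can be reproduced in Python to experiment with the expressions before a scan. This is a sketch, not LinkScan's processing; the function names are hypothetical, and the sample tag in the usage lines is invented for illustration.

```python
import re

# Expressions copied from the Userdata and Userdatasub example above.
USERDATA = r'(?i)<meta\s[^>]*employee\s*=\s*"\s*(#?\d+)\s*"'
USERDATASUB = (r'#?(\d+)', r'\1')


def extract_badge(html):
    """Extract the badge number and strip an optional leading pound symbol."""
    m = re.search(USERDATA, html)
    if not m:
        return None
    value = m.group(1)  # corresponds to $1 in the Userdata command
    return re.sub(USERDATASUB[0], USERDATASUB[1], value)


def format_field(value, width=6):
    """Right-adjust value in a field of the given width (the 6R layout)."""
    return value.rjust(width)


if __name__ == "__main__":
    badge = extract_badge('<META NAME="staff" EMPLOYEE = " #1234 ">')
    print(repr(format_field(badge)))
```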

During the course of the scan, the employee badge numbers are extracted from each document and stored in the LinkScan database. In fact, the userdata fields are stored in a separate file:


PATH-TO-LINKSCAN/Project-name/data/linkscan.usr

This means that it is relatively simple to post-process the data before creating reports. For example, in this case, one might translate the badge numbers to employee names via a lookup on an employee database. The linkscan.usr file is a simple ASCII file with <Control-G> field delimiters.
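A post-processing step of this kind might look like the Python sketch below. The only documented facts used are that the file is ASCII and that fields are separated by Control-G (character code 7). The position of the badge number within each record is not specified here, so field_index is a hypothetical parameter, and the sample record and name table are invented for illustration.

```python
DELIMITER = "\x07"  # <Control-G>


def parse_usr_line(line):
    """Split one record of a Control-G delimited file into its fields."""
    return line.rstrip("\r\n").split(DELIMITER)


def translate_field(lines, lookup, field_index):
    """Replace the field at field_index using lookup; other fields are kept.

    field_index is an assumption: check it against your own linkscan.usr.
    """
    out = []
    for line in lines:
        fields = parse_usr_line(line)
        if field_index < len(fields):
            fields[field_index] = lookup.get(fields[field_index], fields[field_index])
        out.append(DELIMITER.join(fields))
    return out


if __name__ == "__main__":
    sample = ["doc.html\x071234\n"]          # invented sample record
    names = {"1234": "Pat Example"}          # invented lookup table
    print(translate_field(sample, names, 1))
```

Work on a copy of the file when experimenting, since the reports read linkscan.usr directly.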

The final data may be searched/viewed using the Search Documents Report and/or Changed Document Report.

11.11 How to control the testing of external links

LinkScan includes the capability to maintain a History File containing the date/time tested and status of all external links. This feature may be enabled and controlled via various settings in linkscan.sys.

A Site History Report, available from the main LinkScan Reports Menu, may be used to examine the historic behavior of doubtful links.

Once enabled, the LinkScan History file may be used to avoid testing links to remote servers with an excessive frequency. Appropriate use of the following controls will help ensure that you do not impose unnecessary loads on the network or the remote servers your links access. This feature enables you to be a responsible user of the network. But equally important, it can significantly speed up the testing of large projects. Note: The Site History Feature must be enabled (Maxhist > 0) for these settings to be effective:

Masterhist: Normally, LinkScan will maintain a History file on a per-Project basis. Enabling this feature will force LinkScan to maintain a single History file (in the LinkScan directory) for all Projects. Concurrency control is provided to ensure that the file is not damaged when scanning two or more Projects simultaneously.
[Default: Masterhist = 0 (Disabled) ]

Maxhist: The maximum number of entries maintained in the History File for each external link.
[Default: Maxhist = 0 (Disabled) ]

Maxgoodhours: The maximum number of hours between attempts to retest good external links. The scanning of URLs that have been checked within the specified period is skipped and the LinkScan Reports display the Status Code from the prior test.
[Default: Maxgoodhours = 0 (Disabled) ]

Maxbadhours: The maximum number of hours between attempts to retest bad external links. The scanning of URLs that have been checked within the specified period is skipped and the LinkScan Reports display the Status Code from the prior test.
[Default: Maxbadhours = 0 (Disabled) ]
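The interaction of Maxhist, Maxgoodhours and Maxbadhours can be summarized with a small Python sketch. It illustrates the rules described above rather than LinkScan's actual code; the function name should_retest and the exact boundary behavior (retest once the full period has elapsed) are assumptions.

```python
from datetime import datetime, timedelta


def should_retest(last_tested, last_was_good, now,
                  maxhist=0, maxgoodhours=0, maxbadhours=0):
    """Decide whether an external link is tested again in this scan."""
    if maxhist <= 0 or last_tested is None:
        return True  # history disabled, or the link has never been tested
    limit = maxgoodhours if last_was_good else maxbadhours
    if limit <= 0:
        return True  # this control is disabled
    # Assumption: the link is retested once the full period has elapsed.
    return now - last_tested >= timedelta(hours=limit)


if __name__ == "__main__":
    now = datetime(2001, 1, 2, 12, 0)
    last = datetime(2001, 1, 2, 0, 0)
    print(should_retest(last, True, now, maxhist=10, maxgoodhours=24))
```

A common arrangement is a long Maxgoodhours and a short Maxbadhours, so that stable links are left alone while doubtful ones are rechecked promptly.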

How to control the hits on any one server

You may also control the number of hits per server with the following commands in linkscan.sys.

Maxservertries: The maximum number of links that should be tested on any given server when that server is apparently "dead". Once this limit is exceeded, all other links to that server are skipped and assigned a URL Skipped - Bad Server (801) Status Code.
[Default: Maxservertries = 25 ]

Maxftp: The maximum number of links to any single FTP server that should be validated. Once this limit is exceeded, all other FTP links to that server are skipped and assigned a URL Skipped - FTP Limit (802) Status Code.
[Default: Maxftp = 25 ]

FTPUser and FTPPass: Define the username and password that LinkScan will use when validating links to FTP sites.
[Default: FTPUser = anonymous; FTPPass = me@example.com ]

Active Validation of mailto: Links

In a default configuration, LinkScan performs a simple syntax check on mailto: links. Active checking of mailto: links may be configured -- LinkScan uses our Mailvet™ technology to contact the mail servers associated with the specified address and attempts to establish the validity of the address without actually sending a message. To enable this feature:

  1. Ensure the Perl Module Net::DNS is installed on your computer. The Net::DNS Module is available from http://www.fuhr.org/~mfuhr/perldns/
  2. Configure the Hostname setting in linkscan.sys. This value is used for the SMTP HELO message and, for maximum accuracy, should match the Reverse DNS hostname of your computer. If your computer does not have a Reverse DNS entry, some mail servers configured with anti-SPAM measures may produce false errors.
  3. Configure the Mailfrom setting in linkscan.sys. This value is used for the SMTP MAIL FROM message and, for maximum accuracy, should be a valid (deliverable) return address.
  4. Set Checkmailto = 1 in linkscan.cfg.

On some systems, Net::DNS may not correctly identify the default name servers from your operating system configuration. If you encounter difficulties, please run the following test script:

perl ./utils/dns.pl

You may also configure DNS name server addresses in linkscan.sys by adding an entry such as:


Nameservers = 10.10.10.10, 10.10.10.20

11.12 How to scan very large sites

On very large sites with large numbers of cross-links, storing the details of each link can create significant overheads without commensurate value. For example, on a 100,000 document website, it may not be especially useful to store the details of multiple links from a tool bar that is included in the header or footer of every document. Hence, you may control duplicate link storage with the following parameters:

Maxgoodint: The maximum number of links to any given document that are stored in the database for Good Internal Links.
[Default: Maxgoodint = 100 ]

Maxbadint: The maximum number of links to any given document that are stored in the database for Bad Internal Links.
[Default: Maxbadint = 100 ]

Maxext: The maximum number of links to any given URL that are stored in the database for External Links.
[Default: Maxext = 100 ]

Tagonce relative-path-expression: Only the first occurrence of any link that matches relative-path-expression is stored in the database.
[Default: None]
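The storage limits above can be pictured as a cap on the number of referring documents recorded per link target. The following Python sketch is illustrative only. It assumes that Tagonce expressions are matched against the link target from the start of the path, and it uses a single limit where LinkScan distinguishes Maxgoodint, Maxbadint and Maxext. The class and method names are hypothetical.

```python
import re
from collections import defaultdict


class LinkStore:
    """Record (source, target) link pairs subject to per-target limits."""

    def __init__(self, max_per_target=100, tagonce=()):
        self.max_per_target = max_per_target
        self.tagonce = [re.compile(p) for p in tagonce]
        self.links = defaultdict(list)  # target -> list of source documents

    def add(self, source, target):
        """Store the link if the limit allows it; return True if stored."""
        once = any(p.match(target) for p in self.tagonce)
        limit = 1 if once else self.max_per_target
        if len(self.links[target]) >= limit:
            return False  # link is still validated, just not stored again
        self.links[target].append(source)
        return True


if __name__ == "__main__":
    store = LinkStore(max_per_target=2, tagonce=[r"toolbar/"])
    print([store.add(f"page{i}.html", "toolbar/home.html") for i in range(3)])
```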

In addition, the following commands may be used to completely disable some LinkScan functions that may not be appropriate when scanning very large sites.


Nohash = 1
Nomap = 1
Nosplit = 1

Enabling Nohash will save significant disk storage and some processing time by suppressing the creation of some database hash files. This will disable the All Pages Linking to... Report. LinkScan QuickCheck also makes extensive use of the hash files and it will be unable to produce reports based upon the link status in the database. QuickCheck will check the link status of any document in real-time on demand; hence it will still function albeit more slowly.

Enabling Nomap will suppress the creation of the LinkScan SiteMap and TapMap. This will reduce memory usage.

Enabling Nosplit will suppress the creation of the data structures required for Owner-specific Reports. The database will be created with the entire site under the Ownership of a single default Owner. This will save significant disk storage and processing times.

11.13 Other miscellaneous customizations

This section deals with a few other miscellaneous commands:

LinkScan Reference Manual. Section 12

Advanced, Custom and Command Line Reports

This Section covers:

  1. Customizing the appearance of LinkScan Menus and Reports
  2. Adding hyperlinks to other applications
  3. Mailing LinkScan reports from a browser
  4. Customizing the LinkScan SiteMap and TapMap
  5. Customizing the LinkScan Status Codes
  6. Creating Reports from the Command Line

12.1 Customizing the appearance of LinkScan Menus and Reports

You may change the appearance of the LinkScan Menus and Reports by creating the following files in the LinkScan directory:

linkhead.txt
linkfoot.txt

These files may contain any valid HTML and will be inserted at the top and bottom of each Menu and Report, respectively. The file linkhead.txt should include the following tags:


<html><head>
<title>Your title here</title>
</head><body>

There is no need to close out the <body> or <html> tags in linkfoot.txt. LinkScan will always insert a Copyright notice and version stamp after the content of linkfoot.txt and close out the document with </body></html>.

12.2 Adding hyperlinks to other applications

If the following optional directives are specified in linkscan.cfg, LinkScan will add [Edit] hyperlinks at various points throughout the reports:


Editlink = http://foo/bar.cgi?Url=!URL&Cap=!CAP&Status=!STAT
Editdoc  = http://foo/bar.cgi?Url=!URL&Cap=!CAP&Status=!STAT

The linking URL is constructed from the Editlink and Editdoc settings. Those settings may include the optional tokens !URL, !CAP or !STAT.

These tokens are replaced with %encoded strings containing:

In the case of Internal links (same scheme/host/port as Homeurl) the URL is relative. e.g.

http://foo/bar.cgi?Url=resume.html&Cap=My%20Resume&Status=200

In the case of External links, the URL is absolute. e.g.

http://foo/bar.cgi?Url=http://www.example.com/xyz%3F123&Cap=External=&Status=404
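A receiving application can be tested by generating such URLs by hand. The Python sketch below performs the token substitution under assumptions drawn from the two examples above: !URL, !CAP and !STAT are taken to stand for the link URL, its caption and its status code, and the colon and slash characters in a URL are left unencoded. The function name build_edit_url is hypothetical.

```python
from urllib.parse import quote


def build_edit_url(template, url, caption, status):
    """Replace the !URL, !CAP and !STAT tokens with percent-encoded values."""
    replacements = {
        # ':' and '/' are kept as-is, matching the examples above.
        "!URL": quote(url, safe=":/"),
        "!CAP": quote(caption, safe=""),
        "!STAT": quote(str(status), safe=""),
    }
    for token, value in replacements.items():
        template = template.replace(token, value)
    return template


if __name__ == "__main__":
    tmpl = "http://foo/bar.cgi?Url=!URL&Cap=!CAP&Status=!STAT"
    print(build_edit_url(tmpl, "resume.html", "My Resume", 200))
```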

12.3 Mailing LinkScan reports from a browser

A user viewing any LinkScan report with a browser may send a copy of that report to any valid e-mail address.

To enable this feature, you must:

12.4 Customizing the LinkScan SiteMap and TapMap

LinkScan incorporates features that enable the automatic generation of customized, publication quality tables of contents for your Projects. Two types of Maps may be created:

When creating Maps based on Link Order, the presence of cross-links may distort the structure of the report in ways which you find undesirable. Therefore, LinkScan incorporates features that enable you to "manipulate" or override the LinkScan algorithm.

You may customize the structure and content of the SiteMap/TapMap with the following commands in the linkscan.cfg configuration files. Note that the Mapmove command only affects Maps based on Link Order (not the Maps based on Directory Structure).


Mapdefaulttitle [ string ] [ !PATH | !FILE ] [ string ]
Mapinclude relative-path-expression
Maphide relative-path-expression
Maptitle relative-path, Alternative Title
Mapmove relative-path, relative-path, position, [Alternative Title]

By default, all HTML type files are included on the SiteMap/TapMap. The Mapinclude and Maphide commands may be used to modify this behavior as illustrated in the following example:


Examples:

Mapdefaulttitle Pathname: !PATH; Filename: !FILE
Mapinclude .*
Maphide (?i).*\.(gif|jpg)$
Maphide first-doc.html#Top
Maptitle second-doc.html, An Alternative Title for second-doc.html
Mapmove third-doc.html, index.html, 5, Alternative Title

The above example will:

Note that the Mapinclude and Maphide commands accept Regular Expressions. The Mapdefaulttitle, Maptitle and Mapmove commands require exact values.

12.5 Customizing the LinkScan Status Codes

Each link validated by LinkScan is assigned a specific LinkScan Error or Status Code. And, every Status Code is associated with a Severity. You may customize the Severity associated with any Status Code by using the Statuscode command. The command syntax is:


Statuscode statuscode, severitycode

The following Severity codes are valid:

Code  Severity        Explanation
0     Unknown         LinkScan has not tested or was unable to test this link
1     Error           LinkScan found a hard error on this link
2     Possible Error  There may be a problem with this link. It should be retested at a later time
3     Warning         LinkScan found something unusual about this link. Manual inspection highly recommended
4     Advisory        This link is probably ok, but manual inspection recommended
5     No Error        This is a good link

Examples:

Statuscode = 301,3    # 301 (Moved Permanently) from Error to Warning
Statuscode = 7,4      #   7 (Orphaned HTML File) to Advisory
Statuscode = 8,4      #   8 (Orphaned non-HTML File) to Advisory

The above commands will downgrade all 301 status codes from Errors to Warnings, and all Orphaned Files from Warnings to Advisories.
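When many overrides are maintained, it can help to check them before a scan. The Python sketch below parses Statuscode lines and rejects severity codes outside the range 0 to 5 listed above. It is a hypothetical helper, not part of LinkScan, and it accepts both the form with and the form without the equals sign.

```python
import re

SEVERITIES = {0: "Unknown", 1: "Error", 2: "Possible Error",
              3: "Warning", 4: "Advisory", 5: "No Error"}

# Matches "Statuscode = 301,3" and "Statuscode 301, 3"; comments are ignored.
_STATUSCODE = re.compile(r"^\s*Statuscode\s*=?\s*(\d+)\s*,\s*(\d+)", re.IGNORECASE)


def parse_statuscode(line):
    """Return (status, severity) for a Statuscode line, else None."""
    m = _STATUSCODE.match(line)
    if not m:
        return None
    status, severity = int(m.group(1)), int(m.group(2))
    if severity not in SEVERITIES:
        raise ValueError(f"invalid severity code {severity} for status {status}")
    return status, severity


def load_overrides(lines):
    """Collect all Statuscode overrides; later lines replace earlier ones."""
    overrides = {}
    for line in lines:
        parsed = parse_statuscode(line)
        if parsed:
            overrides[parsed[0]] = parsed[1]
    return overrides


if __name__ == "__main__":
    cfg = ["Statuscode = 301,3    # 301 to Warning", "Statuscode = 7,4"]
    for status, severity in load_overrides(cfg).items():
        print(status, "->", SEVERITIES[severity])
```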

12.6 Creating Reports from the Command Line

Command line reports are provided to address the following requirements:

To enable command line reporting, you must create an environment variable called linkscan and set it to any non-null value. Depending on your system/shell the command is:

Unix users may wish to add the appropriate command to their .login or .cshrc files so that the environment variable is automatically initialized at each login.

When LinkScan Reports are generated via the normal browser-based interface, users select the type and style of report by completing and submitting normal HTML forms. Other techniques are required in order to make these selections from the command line interface and several options are provided:

  1. You may specify your selections in a configuration file. An example file with sensible defaults -- linkscan.rep -- is placed in each Project directory automatically.

  2. You may also select a specific report using the interactive browser-based interface and copy/paste the URL to the command line interface (since your selections are already embedded within the name-value pairs on the query string).

Simply execute the program linkscan.cgi and it will prompt you for some or all of the following parameters:

Alternatively, you may specify any or all of these parameters on the command line, as shown by the -help switch:

web:/usr/local/www/data/linkscan> perl linkscan.cgi -help

LinkScan Version 9.0
Copyright 1997-2001 Electronic Software Publishing Corporation

USAGE: linkscan  {-help} {-type type} {-project name} {-owner owner}
                 {-repfile file} {-query string} {-outfile path}
                 {-tty} {-mailto address} {-format n}

-help            Displays this message
-type type       Select report type
-project name    Specify a LinkScan Project
-owner owner     Specify a LinkScan Owner
-repfile file    Specify a filename with the reporting options
-query string    Specify all options in the form of an encoded URL
-outfile path    Specify an output filename
-tty             Output to terminal
-mailto address  Send report to email address
-format n        1=Full HTML; 2=HTML; 3=Plain; 4=text

Detailed Help [Y/N]:

Where the parameter to -type is one of:


Examples:

perl linkscan.cgi -type d -project default -outfile myreport.html

perl linkscan.cgi -query 

Also see the Sections of this Manual covering LinkScan Dispatch and LinkScan QuickCheck. Note there is no command-line interface to LinkScan TapMap due to its interactive nature.

LinkScan Reference Manual. Section 13

LinkScan Enterprise Extensions

LinkScan Enterprise incorporates the additional option to scan multiple hosts (or virtual hosts) within a single LinkScan Project. The following parameters must be configured in linkscan.cfg for each host:


Host1.URL    = http://www.example.com/
Host1.Short  = www:

Each host must be configured with a one or two digit number in the range 1 to 99. In this context, '1' and '01' are considered to be equivalent.

The URL setting specifies the URL of a specific host. The Short setting specifies an abbreviated form of the URL which is used to save real-estate on the various LinkScan Reports.

In addition, the following per-host parameters are optional:


Host1.Mirror = http://dev.example.com/
Host1.Nocase = 1
Host1.Path   = /usr/vhosts/devex/

The Path setting sets the File System root for this host. The Mirror setting specifies an alternate URL to be used for all HTTP requests. All tags are resolved using the URL setting but any physical HTTP requests are directed to the host specified by the Mirror setting (typically a development/staging server). The Nocase setting may be set to a positive integer to indicate that the specified host uses case insensitive pathnames (i.e. index.html and INDEX.HTML are considered identical).
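The per-host settings follow a regular HostN.Key = value pattern, so they are easy to read into a table for checking. The Python sketch below is a hypothetical helper, not LinkScan's parser; it groups the settings by host number and treats '1' and '01' as the same host, as described above.

```python
import re

# HostN.Key = value, where N is one or two digits.
HOST_LINE = re.compile(r"^Host(\d{1,2})\.(\w+)\s*=\s*(.*)$")


def parse_hosts(lines):
    """Group HostN.* settings by host number (1 to 99)."""
    hosts = {}
    for line in lines:
        m = HOST_LINE.match(line.strip())
        if not m:
            continue  # not a per-host setting
        number = int(m.group(1))  # int() makes '1' and '01' equivalent
        if not 1 <= number <= 99:
            continue
        hosts.setdefault(number, {})[m.group(2)] = m.group(3).strip()
    return hosts


if __name__ == "__main__":
    cfg = [
        "Host1.URL    = http://www.example.com/",
        "Host1.Short  = www:",
        "Host01.Mirror = http://dev.example.com/",
    ]
    print(parse_hosts(cfg))
```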

In addition, when operating in multi-host mode, all of the LinkScan commands that normally include host-relative expressions must be modified to use Absolute URLs. For example:

Exclude serverlogs/

Should be specified as:

Exclude http://www.example.com/serverlogs/

We can put all of this together with the following example:


# Hostalias -- maps all https: references back to http:
# Extrahome -- submits login form (?? selects POST method)
# Exclude   -- prevents premature logout
# Maxcgi    -- large value to test many query strings

Homeurl = http://www.example.com/
Host1.URL = http://www.example.com/
Host1.Short = www:
Host2.URL = http://app.example.com/
Host2.Short = app:

Hostalias https://www.example.com http://www.example.com
Hostalias https://app.example.com http://app.example.com
Extrahome = http://app.example.com/login??username=xxx&password=yyy
Exclude .*LOGOFF
Maxcgi = 5000

The behavior of the Owner *1 command is automatically modified when scanning multiple hosts within a single Project. Ownership is assigned based on the Short name for that host and the top level directory name within that host. Hence, the document:

http://www.example.com/somedir/somefile.html

is assigned to Owner www:somedir.
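The multi-host form of Owner *1 can be sketched in Python as the host's Short name followed by the top-level directory. This is an illustration of the rule just described, not LinkScan's code. The function name is hypothetical, and the treatment of files located directly in a host's root (here, the Short name alone) is an assumption.

```python
# Host URL -> Short name, as configured with HostN.URL and HostN.Short.
HOSTS = {
    "http://www.example.com/": "www:",
    "http://app.example.com/": "app:",
}


def multihost_owner(url, hosts=HOSTS):
    """Return the Owner for url, or None if it matches no configured host."""
    for host_url, short in hosts.items():
        if url.startswith(host_url):
            rest = url[len(host_url):]
            # Assumption: root-level files get the Short name with no directory.
            topdir = rest.split("/", 1)[0] if "/" in rest else ""
            return short + topdir
    return None


if __name__ == "__main__":
    print(multihost_owner("http://www.example.com/somedir/somefile.html"))
```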

LinkScan Reference Manual. Section 14

LinkScan Support

Technical Support is available via e-mail from Electronic Software Publishing Corporation at mailto:linkscan@elsop.com.

Also see the Support Section of our website at:

When contacting the LinkScan engineers, please try and provide as much of the following information as you can:

Diagnostic Tools

When our engineers need more details concerning your LinkScan installation and configuration, they may ask you to run a small diagnostic utility that will create a summary of your setup. If you feel it is appropriate, please do not hesitate to supply this with your initial enquiry. From the command line, simply execute the following program and attach the linkdiag.txt file to your message:

C:\>cd LinkScan

C:\LinkScan>perl linkdiag.pl
Welcome to the LinkScan 9.0 Diagnostics.
This utility creates the ASCII file linkdiag.txt.
Please mail this file to linkscan@elsop.com on request.
Server Name: malch
Login  Name: malch
Current Working Dir: C:/LinkScan/
Completed!
C:\LinkScan>

LinkScan Reference Manual. Section 15

Known Problems and Limitations

LinkScan Reference Manual. Section 16

LinkScan Dispatch

[Not available in LinkScan Workstation]

LinkScan Dispatch may be used to create specific reports for each Owner in a Project. The reports may be formatted in either plain text or HTML. They may be saved to disk as static files or dispatched via electronic mail to selected addresses. Before using LinkScan Dispatch you must:

  1. Configure the LinkScan to Email Interface if you wish to distribute any reports via email.

  2. Ensure that you have appropriate document Ownership rules defined. Note that, in a default configuration, LinkScan will create and assign Owners based on the top-level directory names immediately beneath the website root. See also How to assign documents to Owners.

  3. Ensure that you have configured Mailhost in linkscan.cfg. Note that, by default, e-mail reports are sent to Owner@Mailhost. Use the Mailalias command to map specific Owners to specific e-mail addresses. See How to assign documents to Owners.

  4. Successfully complete a scan of the selected website.

  5. Execute dispatch.pl to create the LinkScan Dispatch reports.

Note that LinkScan Dispatch supports the following command line options:

web:/usr/www/htdocs/linkscan> perl dispatch.pl -help     

LinkScan/Dispatch Version 9.0
Copyright 1997-2001 Electronic Software Publishing Corporation

USAGE: dispatch [{-help}] | [{-mail} {-test} {-project name}]
                [-type x {-repfile file} {-outfile file} {-format n}]

-help            Displays this message
-mail            Mails report to user versus storing in saved file
-project name    Specify project name
-test            Send mail to STDOUT -- no mail is sent
-type [xeskdbco] Select report type
-repfile file    Specify a filename with the reporting options
-outfile file    Output filename
-format n        1=Full HTML; 2=HTML; 3=Plain; 4=text
Report Types:
-type x = Project Summary Report
-type e = Problem Documents Report
-type s = Document Detail Report
-type k = Critical Errors Report
-type d = Detailed Errors Report
-type b = Changed Documents Report
-type c = Selected Status Codes Report
-type o = Orphaned Files Report

Detailed Help [Y/N]:

Examples


perl dispatch.pl -project myproj -type k -format 4 -mail

In the example above, Dispatch will create a Critical Errors Report for each Owner within Project myproj and deliver them via e-mail in TEXT format.
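When Dispatch is run from a scheduler for several Projects, it may be convenient to assemble the command line in a script. The Python sketch below only builds the argument list from options shown in the -help output above; it does not run Dispatch. The function name dispatch_command is hypothetical.

```python
import shlex

REPORT_TYPES = "xeskdbco"  # letters listed under "Report Types" above


def dispatch_command(project, report_type, fmt=None, mail=False, test=False):
    """Build the argument list for one dispatch.pl invocation."""
    if len(report_type) != 1 or report_type not in REPORT_TYPES:
        raise ValueError(f"unknown report type: {report_type!r}")
    argv = ["perl", "dispatch.pl", "-project", project, "-type", report_type]
    if fmt is not None:
        argv += ["-format", str(fmt)]
    if mail:
        argv.append("-mail")
    if test:
        argv.append("-test")
    return argv


if __name__ == "__main__":
    print(shlex.join(dispatch_command("myproj", "k", fmt=4, mail=True)))
```

The resulting list can be passed to subprocess.run from the LinkScan directory. Adding the -test option first is a cautious way to inspect the output, since no mail is sent.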

The following style of command-line options is also supported for compatibility with pre-9.0 versions of LinkScan/Dispatch.


perl dispatch.pl -project myproj -errors 4 -mail 

In the example above, Dispatch will create a Detailed Report for each Owner within Project myproj and deliver them via e-mail in TEXT format.

Adding Custom Headers/Footers to LinkScan Dispatch Reports

When creating Dispatch Reports in plain text format, the following files are automatically inserted into the header and footer of each report:


Mailheadtext = mailhead.txt
Mailfoottext = mailfoot.txt

When creating Dispatch Reports in HTML format, the following files are automatically inserted into the header and footer of each report:


Mailheadhtml = mailhead.html
Mailfoothtml = mailfoot.html

When distributing the Dispatch Reports via e-mail you may customize the Mail Headers by adding directives such as those shown below at the top of the Mailheadtext and/or Mailheadhtml files:


$H From: linkscan@example.com
$H Subject: LinkScan Status Report
$H MIME-Version: 1.0
$H Content-type: text/html

Note that the To: header is always generated by LinkScan Dispatch based on the Owner and Mailalias rules that you have defined.
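The $H directives can be checked before a mail run with a short script. The Python sketch below separates leading $H lines from the rest of a header file. It is a hypothetical helper that assumes the directives form a contiguous block at the top of the file, as described above; it does not reproduce how Dispatch itself reads the file.

```python
def split_mailhead(text):
    """Split header-file text into ([(name, value), ...], remaining_body)."""
    lines = text.splitlines()
    headers = []
    i = 0
    # Consume the leading block of "$H Name: value" directives.
    while i < len(lines) and lines[i].startswith("$H "):
        name, _, value = lines[i][3:].partition(":")
        headers.append((name.strip(), value.strip()))
        i += 1
    return headers, "\n".join(lines[i:])


if __name__ == "__main__":
    sample = "$H From: linkscan@example.com\n$H Subject: LinkScan Status Report\nDear Owner,"
    print(split_mailhead(sample))
```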

LinkScan Reference Manual. Section 17

LinkScan Excel

LinkScan is shipped with a Microsoft Excel spreadsheet including some macros. This may be used to import portions of the LinkScan database into Excel for further analysis. The macros are compatible with the following versions of Microsoft Excel:

  1. Open the following file (or a copy of this file if you want to preserve a clean master version) in Microsoft Excel:

    Excel 97: C:\LinkScan\utils\LinkScan97.xls

    Excel 2000 or later: C:\LinkScan\utils\LinkScan.xls

  2. Select Sheet3 and, if necessary, adjust the value of Cell C2. This Cell must contain the pathname to your LinkScan installation folder (e.g. C:\LinkScan\).

  3. Select the first cell of an empty worksheet. Note that the LinkScan Import Macro always places the imported data starting at the currently selected cell of the current worksheet. Note that the Import Macro will not permit you to import data into Sheet3.

  4. Execute the macro LinkScanImport:

    Tools | Macro | Macros... | LinkScanImport | Run

    You may also bind this macro to an Excel Function Key, Menu Item and/or Toolbar.

  5. The LinkScan Macro will display a dialog that allows you to select a LinkScan Project and an Import Function:

    Excel Screenshot

  6. Depending on the Import Function selected, you may be presented with further options. Following confirmation, the selected data will be imported and you may use the full range of Excel features to manipulate the data.

  7. Note that Sheet3 of the LinkScan.xls workbook is reserved. This spreadsheet is used to control the LinkScan macros. For each Import Function, the sheet defines:

    You may modify Sheet3 to customize the column order and headings etc. However, care is required, since the macro performs very limited validation on those data values.

LinkScan Reference Manual. Section 18

LinkScan Profiler

[Not available in LinkScan Workstation]

The LinkScan Profiler may be used to help identify pages that contain or link to "inappropriate" [1] content. The Profiler operates on a rule-based scoring system.

The profile.txt file in the main LinkScan directory defines the actual rules and associated scores. The default profile.txt file contains some minimal profiling criteria based on the Platform for Internet Content Selection (PICS) standard. Under this standard, many sites include self-ratings in their web pages via META tags. The LinkScan Profiler specifically supports the RSAC, ICRA and SafeSurf implementations. See the following References.

A much more comprehensive set of rules is available free of charge from Elsop. Since this implementation of the profile.txt file includes a significant amount of profane and offensive language, it is distributed separately once we receive satisfactory evidence of age verification and a waiver. To obtain a copy of this file, please send e-mail such as:

To: profiler@elsop.com
From: myname@example.com
Subject: Profiler Request

Please send me a copy of the LinkScan Profiler rules.
I confirm that:

1. I am over 21 years old.

2. I understand that the LinkScan Profiler rules
   contain a significant quantity of profane and
   offensive language including explicit sexual
   depictions.

3. I understand and agree that the LinkScan Profiler
   rules are subject to the same License Agreement
   and restrictions of use as LinkScan itself.

4. I confirm that I will use the LinkScan Profiler
   rules only in conjunction with LinkScan and in
   accordance with the LinkScan License Agreement.
   I shall not re-distribute the Profiler rules to
   any other person or organization.

The message must be sent from a verifiable corporate Email address. Mail sent via semi-anonymous services such as yahoo.com, MSN and AOL is not acceptable. If necessary, we will contact you to make alternative arrangements but Elsop will not supply the LinkScan Profiler files until we are satisfied that the request is made by an adult and is legitimate.

Configuring the Profiler

In a typical configuration, you will need to add the following commands to the Project linkscan.cfg file:


Profiler = 2
Profilerlog = 1
Profilermax = 200

The Profiler command enables the LinkScan Profiler. Valid options are:

The Profilerlog command enables a detailed trace indicating exactly what profiling rules were triggered. The log is maintained in the file:

.../LinkScan/Projectname/data/linkscan.red

The Profilermax command sets the trigger threshold for the LinkScan Profiler. The default and recommended setting is 200. Reduce this to 100 to make the Profiler even more sensitive. Increase the value to 300 or more to reduce the sensitivity.

Note: When enabled, the Profiler will force the following settings:


Fetchext = 1
Followext = 1

The Followext command instructs LinkScan to follow redirections when validating external links. This is the default setting. The Fetchext command instructs LinkScan to fetch the body of a document referenced via an external link. Normally, LinkScan seeks to validate external links without retrieving the document bodies. Fetching the bodies enables LinkScan to profile the content, but note that it will significantly increase the amount of bandwidth and processing required.

Initially, we recommend you complete a full scan with the settings shown above (at the top of this document) and manually review the linkscan.red log file. We think you will find this informative. More importantly, you will be able to decide what threshold to use for subsequent check-ups and whether you want to enable/disable/modify any of the existing rules. Some users may want to whitelist all .gov sites for example.

At the end of the day, only you can decide what links are appropriate for your site and consistent with your editorial policies. Material that may be entirely appropriate for a current affairs website may also be highly undesirable for a site specifically intended for younger children.

Hence you may want/need to review the active rules in the profile.txt file.

Proxy Servers and Firewalls

When LinkScan is operated behind a Proxy Server or Firewall that implements content-based access control policies, your proxy/firewall will likely prevent LinkScan from accessing the site. In this case, you will need to implement a Profiler rule that enables LinkScan to detect that access was denied.

The Bess proxy system is widely used by many schools and some Internet Service Providers. When access is denied, the Bess system typically adds a special HTTP header: Pragma: BESSBLOCK. The SonicWALL systems typically replace an offending page with a page that includes the phrase "Blocked By SonicWALL". The following header (H) and body (B) rules will detect those conditions:


H BESS-01    2000   pragma: bessblock
B SWALL-01   2000   blocked by sonicwall
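The rule-based scoring can be pictured with a small Python sketch built around the two rules above. It is an illustration, not the Profiler's algorithm. It assumes that each rule is a case-insensitive substring test against either the response headers (H) or the document body (B), that the scores of all triggered rules are summed, and that a page is flagged when the total reaches the Profilermax threshold. The function names are hypothetical.

```python
# (kind, rule id, score, phrase), following the layout of the rules above.
RULES = [
    ("H", "BESS-01", 2000, "pragma: bessblock"),
    ("B", "SWALL-01", 2000, "blocked by sonicwall"),
]


def score_page(headers, body, rules=RULES):
    """Return (total score, list of triggered rule ids) for one response."""
    header_text = "\n".join(f"{k}: {v}" for k, v in headers.items()).lower()
    body_text = body.lower()
    total, hits = 0, []
    for kind, rule_id, score, phrase in rules:
        haystack = header_text if kind == "H" else body_text
        if phrase in haystack:
            total += score
            hits.append(rule_id)
    return total, hits


def is_flagged(total, profilermax=200):
    """Assumption: a page is flagged once the total reaches the threshold."""
    return total >= profilermax


if __name__ == "__main__":
    total, hits = score_page({"Pragma": "BESSBLOCK"}, "")
    print(total, hits, is_flagged(total))
```

A score of 2000 is far above the default threshold of 200, so either rule on its own is enough to flag the page and make the blocked access visible in the reports.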

References

Definition of Inappropriate

I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it...

With apologies to:
Mr. Justice Stewart
United States Supreme Court
JACOBELLIS v. OHIO, 378 U.S. 184 (1964)

LinkScan Reference Manual. Section 19

LinkScan QuickCheck

LinkScan QuickCheck serves two functions:

  1. It is invoked automatically via hyperlinks from some of the other LinkScan Reports to display a highly detailed report for a single document.

  2. It may be invoked directly from the main LinkScan Reports Menu and used to check (or recheck) a single document or link.

Each QuickCheck Report includes several items of information that are transparently integrated:

QuickCheck has a strong affinity for the LinkScan database. If the data are available in the database associated with the currently selected Project, QuickCheck will seek to ascertain the status of each link using the database and the status found during the last full scan. If this is not available, or the requested document lies outside the scope of the current Project, QuickCheck will perform a full link analysis on that document in real-time.

If QuickCheck has pulled the link status data from the database, the user may force a fresh, real-time scan of that document. This is useful when, for example, you want to recheck a single document after making changes to it. Simply use the Recheck Now option included on each Report.

You may also run LinkScan QuickCheck from the command line in exactly the same manner as the linkscan.cgi program, as shown below:

web:/usr/www/htdocs/linkscan> perl quick.cgi -help        

LinkScan/QuickCheck Version 9.0
Copyright 1997-2001 Electronic Software Publishing Corporation

USAGE: quick.cgi {-help} {-url URL} {-project name}
                 {-repfile file} {-outfile path} {-tty}
                 {-mailto address} {-format n} {-now} {-http}

-help            Displays this message
-url URL         Specify the URL to be scanned
-project name    Specify a Project. Equivalent to -site
-repfile file    Specify a filename with the reporting options
-outfile path    Specify an output filename
-tty             Output to terminal
-mailto address  Send report to email address
-format n        1=Full HTML; 2=HTML; 3=Plain; 4=text
-now             Perform real-time check
-http            Force HTTP Access

Detailed Help [Y/N]:

Example:

perl quick.cgi -project default -url http://www.example.com/index.html -tty

The above example will run QuickCheck against http://www.example.com/index.html, reading the options from linkscan.rep and displaying the results on the terminal.
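
When QuickCheck is driven from another script (for example a nightly job), the argument list can be assembled programmatically. The sketch below is one way to do that in Python, using only switches listed in the usage message above. The helper names quickcheck_command and run_quickcheck are hypothetical, and the sketch assumes that perl is on the PATH and that the script is run from the LinkScan directory.

```python
import subprocess


def quickcheck_command(url, project="default", to_terminal=True, realtime=False):
    """Build the argument list for a QuickCheck run."""
    cmd = ["perl", "quick.cgi", "-project", project, "-url", url]
    if realtime:
        cmd.append("-now")   # perform a real-time check
    if to_terminal:
        cmd.append("-tty")   # write the report to the terminal
    return cmd


def run_quickcheck(url, **options):
    """Run QuickCheck and return its exit status (needs a LinkScan install)."""
    return subprocess.run(quickcheck_command(url, **options)).returncode
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems when the URL contains characters such as & or ?.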

LinkScan Reference Manual. Section 20

LinkScan Recorder

Introduction

The LinkScan™ Recorder is a Windows application that interfaces with Microsoft Internet Explorer. It may be used to capture real web browsing sessions, such as a complex order entry sequence. The captured recording includes all of the data entered into any associated forms. LinkScan may then be configured to replay the recording on demand, validating every link on each form and results page in the sequence.

Hence LinkScan and the LinkScan Recorder provide powerful and convenient capabilities for the rapid and comprehensive regression testing of complex transaction-based systems.

Applications

The principal applications of the LinkScan Recorder are:

  1. To capture user-sequences, such as an on-line shopping or purchase procedure. These are typically complex sequences that are time consuming to test regularly and comprehensively. They also tend to be some of the most important pages on a website or Intranet application.

    Once a sequence has been recorded, you may use the LinkScan Recorder to replay it and display the results in an Internet Explorer Window. More importantly, LinkScan may be configured to automatically replay the same steps and validate every link on each page in the sequence.

  2. To capture special URL's that are used to define the start of a site scan. This is typically required when the site uses a login page and cookie arrangement for access control. The URL's may be used as the main starting point for a scan (Homeurl/Homefile) or as additional seed links for a full site scan (Extrahome).

    Note: forms-based login procedures are completely different from HTTP authentication schemes. In the first case, users fill out a regular HTML form. In the latter case, the user's browser presents an authentication challenge within a pop-up dialog box.

Using the LinkScan Recorder

Start the LinkScan Recorder by pressing the Record button on the main LinkScan Window, or by executing the recorder.exe program in the LinkScan installation folder. This will open two windows: the LinkScan Recorder window and an associated copy of Microsoft Internet Explorer with an empty home page. The LinkScan Recorder Window looks like this:

[Screenshot: LinkScan Recorder window]

The interface includes a number of simple command buttons:

Note that when the LinkScan Recorder is inactive (i.e. in Stop mode) you may edit the URL's in the current recording using the mouse and keyboard. The Control-C and Control-V keys may be used to copy and paste highlighted text to and from the Windows Clipboard.

Importing a Saved Recording into LinkScan

Once you have completed a recording, use the Save button to write the recording to disk. It is stored in plain ASCII text and may be edited using Windows Notepad or any other similar program. Specifically, you may wish to annotate each line/URL in the sequence with a comment. Simply append one or more <TAB> characters after the URL followed by your comment. LinkScan will process those comments much like the anchor text associated with a normal HTML hyperlink.
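
The annotated format described above is simple enough to read with a few lines of code. The following Python sketch splits each line of a saved recording into a URL and an optional comment. It assumes one URL per line and ignores blank lines. The function name parse_recording is hypothetical, and this is not how LinkScan itself reads the file.

```python
def parse_recording(text):
    """Return a list of (url, comment) pairs from a saved recording."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Everything before the first TAB is the URL; the rest is the comment.
        url, _tab, comment = line.partition("\t")
        # strip() also removes any additional TABs before the comment text.
        entries.append((url.strip(), comment.strip()))
    return entries
```

A line without a TAB yields an empty comment, so unannotated recordings pass through unchanged.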

The recording is now in a suitable format for processing by LinkScan. On Windows systems, simply create a new Project (i.e. configuration) and select the Edit Project dialog. The following items must be configured:

Please see the Import Scanning section of the LinkScan Reference Manual for further details on this topic.

LinkScan Recorder and Unix Systems

The LinkScan Recorder is a Microsoft Windows application and does not run on Unix systems. However, the LinkScan Recorder is included with the LinkScan Unix distributions and it may be moved to a Windows system so that recordings may be prepared. Those recordings may be saved as simple ASCII text files for transfer to the Unix system where they may be processed by LinkScan. All of the LinkScan Import Features are, of course, fully supported on the Unix platforms.

To install the LinkScan Recorder, copy (e.g. with FTP) the following files from the LinkScan directory on the Unix system to a Windows machine:

File                 Required   Format
recorder.exe         Required   Binary
docs/links19.html    Optional   Ascii
docs/lsrule.gif      Optional   Binary
docs/newlogo.gif     Optional   Binary
docs/ssrec.jpg       Optional   Binary

The LinkScan Recorder executable requires the Microsoft Visual Basic runtime libraries. If these are not already installed on your Windows system, you may download the self-extracting archive from our website: http://www.elsop.com/download/vbrun60.exe.

Special Considerations

The following points are worthy of note and consideration:

  1. The data captured by the LinkScan Recorder includes POSTED form values that are normally invisible/hidden. The name-value pairs are represented using the special LinkScan URL convention based on the double question-mark. Hence forms utilizing the GET method are represented in the normal manner, for example:

    http://www.example.com/form.cgi?Name=John%20Doe&Country=USA

    Whereas, forms utilizing the POST method are represented thus:

    http://www.example.com/form.cgi??Name=John%20Doe&Country=USA

  2. If a website uses <FRAMESETS>, the individual frames within each frameset must be added to the import file to achieve full test coverage. Otherwise, LinkScan would view the frameset as a page, checking the links to each frame but not validating the links within the individual frames. The LinkScan Recorder will capture the URL of each frame automatically.
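
The double question-mark convention described in point 1 can be decoded mechanically. The Python sketch below separates a recorded URL into the request method, the form's action URL and the decoded name-value pairs. The function name split_recorded_url is hypothetical. The sketch treats any URL without "??" as a GET request, which is an assumption based on the two examples above.

```python
from urllib.parse import parse_qsl


def split_recorded_url(url):
    """Return (method, action_url, fields) for a recorded URL."""
    if "??" in url:
        # Double question-mark: the values were POSTED to the action URL.
        action, _, query = url.partition("??")
        method = "POST"
    elif "?" in url:
        # Single question-mark: an ordinary GET query string.
        action, _, query = url.partition("?")
        method = "GET"
    else:
        action, query, method = url, "", "GET"
    return method, action, parse_qsl(query, keep_blank_values=True)
```

For the POST example above, this returns the method "POST", the action URL http://www.example.com/form.cgi and the pairs Name=John Doe and Country=USA.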

LinkScan Reference Manual. Section 21

LinkScan TapMap

This hyperlink activates the LinkScan TapMap - an interactive and highly dynamic variation of the LinkScan SiteMap. TapMap is an expandable and collapsible SiteMap that allows viewers to tap down through the various levels of a website to easily navigate and explore the website by clicking on a few control icons.

See TapMap Overview and Legend for a brief description of the TapMap control icons.

LinkScan Reference Manual. Section 22

LinkScan WebServer

The LinkScan WebServer is a small, easy-to-configure, HTTP compliant webserver. It enables interactive query and reporting capabilities from the LinkScan database via a standard web browser interface. However, it avoids the complexity of installing and configuring a fully functioned webserver on a desktop computer. LinkScan WebServer supports a surprisingly large number of features found in more complex products, but with the emphasis on simplicity. There are some limitations. Features include:

Limitations include:

Installation

The LinkScan WebServer is installed and configured automatically when you install LinkScan on a Windows System.

Customization

LinkScan WebServer reads several configuration files whenever it is executed. These are all located in the main LinkScan folder:

These configuration files are shared with the main LinkScan program. However, the behavior of the LinkScan WebServer may be customized with simple editing of these files as described below:

LinkScan Reference Manual. Section 23

LinkScan Utilities

The linkscan/utils/ directory contains several unsupported utilities described in this section:

  1. makeign.pl
  2. sendmail.pl
  3. tools.cgi

23.1 makeign.pl

The purpose of this script is to help users configure Server Aliases and Redirections. It will attempt to parse an Apache format server configuration file and create equivalent LinkScan Alias commands in a format suitable for inserting into linkscan.cfg. Simply execute this script, specifying the pathnames to the appropriate Apache configuration file and a destination file.


Example:

perl utils/makeign.pl /usr/local/etc/httpd/conf/srm.conf linkscan.cfg.try

The completeness and accuracy of the results will vary depending upon your exact Apache Version. We highly recommend that you manually review the output before adding it to your production LinkScan installation.

23.2 sendmail.pl

See the LinkScan to Email Interface.

23.3 tools.cgi

The purpose of the tools.cgi script is to provide a simple web browser-based interface to some of the shell (command line) utilities available on Unix and NT servers for probing questionable URL's. It may be interfaced to various commands including:

[1] We find Ron Guilmette's dnw and ipw programs very useful. The source code may be downloaded from this website.

[2] httphead does not invoke an external shell (command line) command. It is a special built-in function that will display the HTTP headers associated with a target URL.

In order to use tools.cgi you must:

  1. Set the file permissions appropriate for CGI scripts on your server

  2. Install the script in a cgi-bin directory if required by your server

  3. Adjust the shebang line to point at the Perl 5 executable on your server

  4. Edit the hash %Tools that is declared and initialized at the top of the source code. %Tools is used to define the utilities you wish to make available, the absolute pathnames to the executables and the type of parameters they accept (Domain, Host or URL)

LinkScan Reference Manual. Section 24

Weblint Man Page


weblint 1.020                                   weblint 1.020 

NAME

weblint - pick fluff off web pages (HTML)

SYNOPSIS

weblint [ -d id ] [ -e id ] [ -f filename ] [ -i ] [ -l ] [ -s ] [ -stderr ] [ -t ] [ -todo ] [ -help ] [ -U ] [ -urlget command ] [ -v ] [ -version ] [ -warnings ] [ -x extension ] file1 .. fileN

DESCRIPTION

Weblint is a Perl script which picks fluff off HTML pages. Files to be checked are passed on the command-line:

    % weblint foobar.html ./dodgy-files/ index.html

If any of the arguments are directories weblint will recurse in the directory, and check any HTML files found. If an argument is a URL, then weblint will get the file using a URL retrieval program, and then check the file:

    % weblint http://www.foobar.com/

By default weblint will use lynx to retrieve URLs, but this can be over-ridden. A filename of `-' specifies that weblint should read from standard input:

    % lynx -source http://www.foobar.com/ | weblint -

Warnings are generated a la lint:

    home.html(9): unmatched </A> (no matching <A> seen).

Weblint includes the following features:

  + by default checks for HTML 3.2 (Wilbur)
  + 46 different checks and warnings
  + Warnings can be enabled/disabled individually, as per your preference
  + basic structure and syntax checks
  + warnings for use of unknown elements and element attributes.
  + context checks (where a tag must appear within a certain element).
  + overlapped or illegally nested elements.
  + do IMG elements have ALT text?
  + flags obsolete elements.
  + support for user and site configuration files
  + stylistic checks
  + checks for html which is not portable across all browsers
  + flags markup embedded in comments, since this can confuse some browsers
  + support for Netscape, and Microsoft HTML extensions

OPTIONS

-d warning-identifier
    Disable the warning associated with the identifier. Multiple identifiers can be specified, with a comma between identifiers.

-e warning-identifier
    Enable the warning associated with the identifier. Multiple identifiers can be specified, with a comma between identifiers.

-f config-file
    Specify a weblint configuration file which should be used in place of the user's default config file, or the site configuration file.

-help
    Show a short usage summary.

-i
    Ignore case of element tags.

-l
    When recursing in directories, ignore any files which are symlinks (also known as soft links). This will also cause files on the command-line to be ignored if they are symlinks, unless only one file is given.

-pedantic
    Turn on all warnings except the case-sensitive and bad-link warnings.

-s
    Generate `short' warning messages, which do not include the filename.

-stderr
    Print warning messages to STDERR rather than STDOUT.

-t
    Enable terse warning mode, which is mainly useful for the weblint testsuite.

-U
    Same as -help.

-urlget command
    The command which should be used to retrieve HTML pages specified by URL.

-v
    Display the version number.

-version
    Display the version number.

-todo
    This prints out the URL for the online version of the weblint ToDo list. This includes known bugs, and requested/planned features.

-warnings
    List all supported warnings, with warning identifier, and whether the warning is enabled.

-x extension
    Include checks for the specified HTML extension; multiple extensions can be specified, separated with a comma. Currently the only extensions supported are Netscape and Microsoft. This can also be set in your weblint configuration file, described below.

HTML EXTENSIONS

Unless you specify otherwise, weblint assumes you are using HTML 3.2. Weblint supports the Netscape and Microsoft HTML extensions in addition. For example, weblint will complain that the BLINK element is not known, unless you enable the Netscape extension. The following extensions are currently supported:

Netscape
    The HTML extensions supported by the Netscape browser, version 4.

Microsoft
    The HTML extensions supported by Microsoft Internet Explorer, version 4.

To enable an extension, you can either use the -x command-line switch:

    % weblint -x Netscape foobar.html

Or you can use the extension keyword in your .weblintrc:

    # enable the Microsoft extensions
    extension Microsoft

CONFIGURATION FILE

Weblint can be configured using a file .weblintrc in your home directory (or a file referenced by the WEBLINTRC environment variable). This file can be used to enable or disable specific warnings, set weblint variables, and include HTML extensions, as described above. Each warning has a short identifier string, used to refer to the warning in config files, and from the command-line. For example, if you want to enable the check for tags in upper-case, but disable the check for obsolete elements, then you would include the following lines in your .weblintrc:

    # specify the command used to retrieve URLs (-urlget switch)
    set url-get = lynx -source

    # the style of warning message to generate (lint, short, or terse)
    set message-style = lint

    # enable warning for tags not in upper-case
    enable upper-case

    # disable the warning for obsolete tags
    disable obsolete

    # enable the Netscape HTML extensions
    extension Netscape

    # when recursing in a directory,
    # ignore files which are symlinks (also known as soft links)
    ignore symlinks

The keywords can be followed by any number of arguments, separated by spaces or tabs. Anything following a `#' is treated as a comment.

A sample configuration file is included in the weblint distribution (as of version 1.004), which mirrors the configuration built-in to weblint.

Weblint also supports a site configuration file. If a user does not have a personal configuration file, then weblint will check for a local site configuration file. To provide such a file, create a directory such as /usr/local/weblint, and create a file global.weblintrc. You need to edit the weblint script and modify the $SITE_DIR variable, which you will find near the top of the file. For example:

    $SITE_DIR = '/usr/local/weblint';

At some point in the future there will be configuration support for weblint, so you won't have to modify the script directly yourself.

If you have a site configuration file, then users can inherit the site defaults by adding the following line at the top of their .weblintrc file:

    use global weblintrc
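
The configuration syntax described above (a keyword followed by arguments separated by spaces or tabs, with `#' starting a comment) can be read with a very small parser. The Python sketch below only illustrates that syntax. It is not part of weblint, and the function name parse_weblintrc is hypothetical.

```python
def parse_weblintrc(text):
    """Return a list of (keyword, arguments) pairs from a .weblintrc file."""
    directives = []
    for raw in text.splitlines():
        # Anything following a '#' is treated as a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Arguments are separated by spaces or tabs.
        keyword, *args = line.split()
        directives.append((keyword, args))
    return directives
```

Applied to a file containing the lines "enable upper-case" and "disable obsolete", this yields the two directives in file order, each with a single argument.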

WARNINGS

All warnings generated by weblint are listed below, along with the associated identifier, and whether the warning is enabled or disabled by default.

tag <...> is not in upper case.
    Identifier: upper-case    Default: disabled

tag <...> is not in lower case.
    Identifier: lower-case    Default: disabled

foo attribute is required for <...>
    Identifier: required-attribute    Default: enabled

expected an attribute for <...>
    Identifier: expected-attribute    Default: enabled

unknown element <...>
    Identifier: unknown-element    Default: enabled

unknown attribute `...' for element <...>.
    Identifier: unknown-attribute    Default: enabled

should not have whitespace between `<' and `...>'
    Identifier: leading-whitespace    Default: enabled

bad form to use `here' as an anchor!
    Identifier: here-anchor    Default: enabled

no <TITLE> in HEAD element.
    Identifier: require-head    Default: enabled

tag <...> should only appear once. I saw one on line XX!
    Identifier: once-only    Default: enabled

<BODY> but no <HEAD>.
    Identifier: body-no-head    Default: enabled

outer tags should be <HTML> .. </HTML>.
    Identifier: html-outer    Default: enabled

<...> can only appear in the HEAD element.
    Identifier: head-element    Default: enabled

<...> cannot appear in the HEAD element.
    Identifier: non-head-element    Default: enabled

<...> is obsolete.
    Identifier: obsolete    Default: enabled

unmatched </...> (no matching <...> seen).
    Identifier: mis-match    Default: enabled

IMG does not have ALT text defined.
    Identifier: img-alt    Default: enabled

<...> cannot be nested.
    Identifier: nested-element    Default: enabled

Did not see <LINK REV=MADE HREF=mailto:...> in HEAD.
    Identifier: mailto-link    Default: disabled

</...> on line XX seems to overlap <...>, opened on line YY.
    Identifier: element-overlap    Default: enabled

no closing </...> seen for <...> on line XX.
    Identifier: unclosed-element    Default: enabled

markup embedded in a comment can confuse some browsers.
    Identifier: markup-in-comment    Default: enabled

odd number of quotes in element <...>.
    Identifier: odd-quotes    Default: enabled

heading <H?> follows <H?> on line N.
    Identifier: heading-order    Default: enabled

target for anchor
    Identifier: bad-link    Default: disabled

unexpected < in <...> -- potentially unclosed element.
    Identifier: unexpected-open    Default: enabled

illegal context for <...> - must appear in <...> element.
    Identifier: required-context    Default: enabled

unclosed comment (comment should be: <!-- ... -->)
    Identifier: unclosed-comment    Default: enabled

element <...> is not a container -- </...> not legal.
    Identifier: illegal-closing    Default: enabled

<...> is physical font markup -- use logical (such as XXX)
    Identifier: physical-font    Default: disabled

attribute XYZ is repeated in element <...>
    Identifier: repeated-attribute    Default: enabled

empty container element <...>
    Identifier: empty-container    Default: enabled

use of ' for attribute value delimiter is not supported by all browsers (attribute XYZ of tag ABC)
    Identifier: attribute-delimiter    Default: enabled

closing tag <...> should not have any attributes specified.
    Identifier: closing-attribute    Default: enabled

directory DIR does not have an index file (index.html)
    Identifier: directory-index    Default: enabled

<...> must immediately follow <...>
    Identifier: must-follow    Default: enabled

setting WIDTH and HEIGHT attributes on IMG tag can improve rendering performance on some browsers
    Identifier: img-size    Default: disabled

leading/trailing whitespace in content of container element ...
    Identifier: container-whitespace    Default: disabled

first element was not DOCTYPE specification
    Identifier: require-doctype    Default: disabled

`>' should be represented as `&gt;'
    Identifier: literal-metacharacter    Default: enabled

malformed heading - open tag is <H?>, but closing is </H?>
    Identifier: heading-mismatch    Default: enabled

illegal context, <...>, for text; should be in XXX.
    Identifier: bad-text-context    Default: enabled

illegal value for AAA attribute of XXX (...)
    Identifier: attribute-format    Default: enabled

<...> is extended markup (use '-x <extension>' to allow this).
    Identifier: extension-markup    Default: enabled

attribute `...' for <...> is extended markup (use '-x <extension>' to allow this).
    Identifier: extension-attribute    Default: enabled

value for attribute XYZ (xyz-value) of element FOOBAR should be quoted (i.e. XYZ='xyz-value')
    Identifier: quote-attribute-value    Default: enabled

you should use '&gt;' in place of '>', even in a PRE element.
    Identifier: meta-in-pre    Default: enabled

<A> should be inside <H?>, not <H?> inside <A>.
    Identifier: heading-in-anchor    Default: enabled

The HTML spec. recommends the TITLE be no longer than 64 characters.
    Identifier: title-length    Default: enabled

TESTSUITE

A simple regression testsuite is included with weblint, in the Perl script test.pl. You can run the testsuite with either of the following commands:

    % make test
    % ./test.pl

The results are printed to STDERR, with a more complete report generated in test.log. All tests should pass. If any tests fail, please email test.log to the address given in the AUTHOR section below.

ENVIRONMENT VARIABLES

WEBLINTRC
    If this variable is defined, and references a file, then weblint will read the referenced file for the user's configuration, rather than $HOME/.weblintrc.

TMPDIR
    The directory where weblint will create temporary working files. Defaults to /usr/tmp.

FILES

$HOME/.weblintrc The user's configuration file. See the section `CONFIGURATION FILE'.

SEE ALSO

perl(1)

VERSION

This man page describes weblint 1.020.

AVAILABILITY

ftp://ftp.cre.canon.co.uk/pub/weblint/weblint.tar.gz
http://www.cre.canon.co.uk/~neilb/weblint/

KNOWN BUGS

The list of known bugs can be found on the weblint home page:

    http://www.cre.canon.co.uk/~neilb/weblint/todo/

Certain versions of Perl have bugs which are triggered by weblint. You shouldn't experience problems if you have 4.036, or 5.002.

AUTHOR

Neil Bowers, Canon Research Centre Europe neilb@cre.canon.co.uk

CONTRIBUTIONS

Lots of people have contributed to weblint, in the form of suggestions, bug reports, fixes, and contributed code. Please email me if your name should appear in the roll call below.

Abigail <abigail@mars.ic.iaf.nl>; Anthony Thyssen <anthony@cit.gu.edu.au>; Axel Boldt <axel@uni-paderborn.de>; Barry Bakalor <barry@hal.com>; Bill Arnett <billa@netcom.com>; Bob Friesenhahn <bfriesen@simple.dallas.tx.us>; Mark Gates <mr-gates@uiuc.edu>; Bruce Speyer <bspeyer@texas-one.org>; Chris Siebenmann <cks@hawkwind.utcs.toronto.edu>; Clay Webster <clay@unipress.com>; Dana Jacobsen <dana@acm.org>; David Begley <david@bacall.nepean.uws.edu.au>; David J. MacKenzie <djm@va.pubnix.com>; Douglas Brick <dbrick@u.washington.edu>; Gil Citro; Eric de Mund <ead@ixian.com>; Richard Finegold <goldfndr@eskimo.com>; Joerg Heitkoetter <Joerg.Heitkoetter@germany.eu.net>; David Koblas <koblas@homepages.com>; John Labovitz <johnl@ora.com>; Eric Maryniak <E.Maryniak@rgd.nl>; John F. Whitehead <jfw@wral-tv.com>; Juergen Schoenwaelder <schoenw@ibr.cs.tu-bs.de>; Frank Steinke <fsteinke@zeta.org.au>; Larry Virden <lvirden@cas.org>; Paul Black <black@lal.cs.byu.edu>; Doug Grinbergs <dougg@qualcomm.com>; Philip Hallstrom <philip@wolfe.net>; Craig Leres <leres@ee.lbl.gov>; Richard Lloyd <R.K.Lloyd@csc.liv.ac.uk>; Charles F. Randall <crandall@dmacc.cc.ia.us>; Robert Schmunk <pcrxs@nasagiss.giss.nasa.gov>; Jeff Schave <schave@engr.wisc.edu>; Jon Thackray <jrmt@uk.gdscorp.com>; Jens Thordarson <thordurh@rhi.hi.is>; Ryan Waldron <rew@nuance.com>; Thomas Leavitt <leavitt@webcom.com>; Tom Neff <tneff@panix.com>; Victor Parada <vparada@inf.utfsm.cl>; Erick Branderhorst <branderhorst@fgg.eur.nl>; Bryan O'Sullivan <bos@serpentine.com>; Alan J. Flavell <FLAVELL@v2.ph.gla.ac.uk>; Raphael Manfredi <Raphael_Manfredi@grenoble.hp.com>; Keith Iosso <a-keithi@microsoft.com>; Chris Lambert <lambertc@sharelink.com>; Tristan Savatier <tristan@creative.net>; Phil Hooper <hooper@bcci.eng.sun.com>; Gerald Viers <grviers@csupomona.edu>; Dean Brissinger <brissing@bvsd.k12.co.us>; Dave Schmitt <dschmi1@gl.umbc.edu>; John Van Essen <vanes002@maroon.tc.umn.edu>; Brandon Bell <brandon@arcs.bcit.bc.ca>; Fumio Moriya and Toshiaki Nomura <dsfrsoft@oai6.yk.fujitsu.co.jp>; Vincent Lefevre <vlefevre@ens-lyon.fr>; Jason Mathews <mathews@nssdc.gsfc.nasa.gov>; Lars Balker Rasmussen <lbr@mjolner.dk>; Richard L. Hawes <rhawes@dmapub.dma.org>.

LinkScan Reference Manual. Section 25

Glossary of Terms

This section defines some LinkScan constructs and related terminology with reference to various standards, where appropriate:

1. Projects 2. Owners 3. Usernames
4. Virtual Hosts 5. Pathnames 6. Pathname Expressions
7. Home Directory 8. LinkScan Directory 9. Project Directory
10. Uniform Resource Locators (URL's) 11. Internal Links 12. External Links
13. Orphaned Files 14. HyperText Markup Language (HTML) 15. HyperText Transfer Protocol (HTTP)
16. File Transfer Protocol (FTP) 17. HTTP Scanning 18. File System Scanning
19. Import Scanning 20. Perl Regular Expressions 21. Content-Type/MIME
22. Date and Time Last-Modified 23. Document Weight 24. Click Depth

25.1 Projects

LinkScan is able to scan multiple websites. It can also scan the same website multiple times with different configuration options. In each case, LinkScan creates a unique and corresponding LinkScan Database containing the results of the analysis. Together, the configuration files and database constitute a LinkScan Project.

Each LinkScan Project is stored within a subdirectory of the main LinkScan installation directory.

Hence users must always select a Project when scanning a website, and they must also select a Project when viewing the results.

25.2 Owners

Within each Project, you may also configure multiple LinkScan Owners. Collections of HTML documents and other files are assigned between Owners in a variety of ways:

The LinkScan Owner concept enables individual content developers or workgroups to view results that pertain to their documents or areas of responsibility.

25.3 Usernames

LinkScan incorporates access controls that may be used to limit user access to LinkScan databases and results. These controls are not enabled by default.

When activated, users may be required to log in to the LinkScan system using a pre-defined LinkScan Username and associated password. The Username will define the Projects and Owners that an individual user is permitted to access.

25.4 Virtual Hosts

A Virtual Host is the Fully Qualified Domain Name (or IP address) of a network host configured on your server. Many servers are configured for a single Virtual Host but others are configured to support multiple Virtual Hosts. You must define at least one LinkScan Project for each Virtual Host that you wish to test.

25.5 Pathnames

Pathnames are used to refer to directory structures. They may be Relative or Absolute. Note also that Pathnames are used in the URL context and the File System context. For example:

/usr/www/htdocs/products/widget.html          # Absolute pathname, file system context
C:/www/products/widget.html                   # Absolute pathname, file system context
http://www.example.com/products/widget.html   # Absolute URL
../products/widget.html                       # Relative link, URL or file system context

LinkScan makes extensive use of a normalized representation such that the documents referred to above would be referenced as:

products/widget.html

This offers the advantages of brevity and consistency, since products/widget.html may typically be used to refer to both:

C:/www/products/widget.html and
http://www.example.com/products/widget.html

The normalized format is referred to in this document as relative-path.
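
The normalization can be pictured as stripping a known prefix from either kind of reference. The Python sketch below illustrates the idea and is not LinkScan's own code. The function name relative_path and its home_dir and base_url parameters are assumptions for the example, and relative links such as ../products/widget.html would first have to be resolved against the referring document.

```python
def relative_path(reference, home_dir, base_url):
    """Reduce a file pathname or URL to the normalized relative-path form."""
    ref = reference.replace("\\", "/")
    for prefix in (base_url, home_dir.replace("\\", "/")):
        # Make sure the prefix ends in exactly one slash before comparing.
        prefix = prefix.rstrip("/") + "/"
        if ref.startswith(prefix):
            return ref[len(prefix):]
    return ref  # not under the home directory or base URL; returned unchanged
```

With a home directory of C:/www and a base URL of http://www.example.com/, both C:/www/products/widget.html and http://www.example.com/products/widget.html reduce to products/widget.html.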

25.6 Pathname Expressions

Many LinkScan customization features refer to relative-path-expression. That is a Perl Regular Expression matching a relative-path.
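
To make the idea concrete, the sketch below selects the relative-paths that match an expression. It uses Python's re module as a stand-in for Perl Regular Expressions. The two agree for simple patterns such as the one shown, but they are not identical in general. The function name select_paths is hypothetical, and whether LinkScan anchors expressions or ignores case is not assumed here.

```python
import re


def select_paths(expression, paths):
    """Return the relative-paths that match the given expression."""
    pattern = re.compile(expression)
    # search() matches anywhere in the path unless the expression is anchored.
    return [p for p in paths if pattern.search(p)]
```

For example, the expression ^products/.*\.html$ selects products/widget.html but neither index.html nor products/logo.gif.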

25.7 Home Directory

The directory on your server that is considered to be the root directory of your HTTP server. Sometimes known as www root.

25.8 LinkScan Directory

The directory on your computer where LinkScan is installed.

25.9 Project Directory

A subdirectory of the LinkScan Directory containing the configuration and data files associated with a specific Project.

25.10 Uniform Resource Locators (URL's)

The various Uniform Resource Locator formats are defined in RFC 2396.

25.11 Internal Links

Internal Links are defined as links to the current Project.


Examples:

<a href="filename.html">This is an Internal Link</a>

<a href="http://www.elsop.com/index.html">This is an Internal
Link if the current Project is http://www.elsop.com/</a>

25.12 External Links

External Links are defined as links specified using an Absolute URL to any Project other than the current Project.


Example:

<a href="http://www.otherdomain.com/">This is an External Link</a>
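
The distinction between Internal and External Links can be sketched as a comparison between the resolved link and the Project's base URL. In the Python example below, a Project is identified by its scheme, host and path prefix. That is an assumption made for illustration, as is the function name classify_link. LinkScan's actual rules may be richer, for example when Server Aliases are configured.

```python
from urllib.parse import urljoin, urlsplit


def classify_link(href, page_url, project_url):
    """Return "internal" or "external" for a link found on page_url."""
    # Resolve relative links against the page that contains them.
    target = urlsplit(urljoin(page_url, href))
    project = urlsplit(project_url)
    same_host = (target.scheme, target.netloc.lower()) == (
        project.scheme,
        project.netloc.lower(),
    )
    inside = target.path.startswith(project.path or "/")
    return "internal" if same_host and inside else "external"
```

With http://www.elsop.com/ as the current Project, filename.html and http://www.elsop.com/index.html are classified as internal, and http://www.otherdomain.com/ as external.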

25.13 Orphaned Files

Orphaned Files are defined as files present in the Home Directory (or any subdirectory thereof) which cannot be reached via one or more internal links from the Home Page.
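
In other words, the Orphaned Files are the files on disk minus the set of documents reachable from the Home Page. The Python sketch below expresses that definition with a breadth-first walk over a link graph. The data structures (a dictionary mapping each relative-path to the relative-paths it links to) and the function names are hypothetical and chosen only to illustrate the definition.

```python
from collections import deque


def reachable(home, links):
    """Return every document reachable from home by following internal links."""
    seen = {home}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphaned_files(files_on_disk, home, links):
    """Return the files that no chain of internal links from home reaches."""
    return sorted(set(files_on_disk) - reachable(home, links))
```

Note that a file which links to the Home Page is still orphaned if nothing links to it. Reachability is measured from the Home Page outward.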

25.14 HyperText Markup Language (HTML)

The HyperText Markup Language (HTML 3.2) lies at the heart of the World Wide Web.

LinkScan attempts to parse the HTML source code according to the published standards. However, as with all web browsers, the results can be unpredictable when the HTML source code deviates from the specifications. Experience with LinkScan indicates that the following points are worthy of note.

25.15 HyperText Transfer Protocol (HTTP)

The HyperText Transfer Protocol (HTTP 1.0) has been used for World Wide Web communications since 1990. In January 1997, the specifications for HTTP 1.1 were published. LinkScan exploits many HTTP features to establish the status of the external links.

In most cases LinkScan is able to definitively establish the status of any given link. However, at any moment in time a small proportion of links (typically around 5%) are temporarily unavailable. In such cases, LinkScan will make two attempts to reach the site before flagging those URL's as "Possible Errors" to be retested at a later time (automatically or manually).
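
The retry behavior described above can be sketched as follows. This Python example is an illustration under stated assumptions and not LinkScan's code. The probe callable, which returns an HTTP status code or raises OSError when the site cannot be reached, is hypothetical, as are the "OK" and "Error" labels. Only the two attempts and the "Possible Error" classification come from the text.

```python
def check_link(url, probe, attempts=2):
    """Classify a link, retrying when the site cannot be reached."""
    for _ in range(attempts):
        try:
            status = probe(url)
        except OSError:
            continue  # temporarily unavailable: try again
        return "OK" if status < 400 else f"Error {status}"
    # Every attempt failed to reach the site; flag the link for a later retest.
    return "Possible Error"
```

Passing the probe in as a parameter keeps the classification logic separate from the network code, so it can be exercised without a live connection.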

An even smaller percentage of sites are accessible via a web browser but fail to return message headers in accordance with the HTTP specifications. In many cases, LinkScan is still able to establish the status, but a few sites are so grossly non-compliant that LinkScan will return an "Unknown Error" to flag them for manual testing. In tests, only one or two sites per thousand fell into this category.

25.16 File Transfer Protocol (FTP)

The File Transfer Protocol (FTP) is a relatively old standard, compared to HTTP. See RFC 640.

25.17 HTTP Scanning

Typically, LinkScan accesses the scanned website via the Network and HTTP. This is an appropriate method in most cases.

25.18 File System Scanning

Optionally, LinkScan may be configured to access part or all of the scanned website by direct access to the website files on your computer's file system. This offers several advantages and disadvantages:

Note that LinkScan may also be configured to scan a site using a combination of both the HTTP and File System methods. This powerful capability may be used, for example, to enable HTTP Scanning of website content and the comparison of the results with those from File System Scanning to reconcile the Orphaned Files.

25.19 Import Scanning

In addition to HTTP Scanning and File System Scanning, LinkScan supports a third mode of operation: Import Scanning. This is used to validate lists of Documents or Links that are imported from simple text files. The Import Lists may be prepared manually but it is more common for them to be exported from a database management system or other application.

25.20 Perl Regular Expressions

LinkScan incorporates a vast array of customization features, many of which exploit the power of Perl Regular Expressions. For a description of Perl Regular Expressions on Unix systems, see man perlre. HTML versions are available at many locations including:

http://www.cpan.org/doc/manual/html/pod/perlre.html

We also recommend the book Mastering Regular Expressions (a.k.a. the Owl Book) by Jeffrey E.F. Friedl, and published by O'Reilly [ISBN: 1-56592-257-3].

25.21 Content-Type/MIME

When files are served via the Hypertext Transfer Protocol (HTTP) the normal conventions with respect to file extensions do not apply. The content of the file is defined by an HTTP Content-Type header (a.k.a. MIME type). Common examples include:

Content-Type: text/html
Content-Type: image/gif

25.22 Date and Time Last-Modified

LinkScan always attempts to store a date/time stamp with each document to indicate when the file was last modified. When scanning via the File System, LinkScan is able to capture this data directly from the operating system. However, when LinkScan does not have direct access to the server File System, it looks for an HTTP Last-Modified header. Most web servers supply this when serving static HTML documents (without Server Side Includes). However, it is typically not supplied when serving dynamic pages, in which case the data may not be available. Note, however, that LinkScan does have the ability to extract information of this type from META tags when available -- see How to process additional per-document data.
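
The two sources of the time stamp can be illustrated with a short Python sketch. The function name and the headers dictionary are assumptions made for this example. The file system is preferred when a local path is known, and a missing or unparseable Last-Modified header yields no value.

```python
import os
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def last_modified(headers: dict, local_path: Optional[str] = None) -> Optional[datetime]:
    """Return the last-modified time, or None when it is not available."""
    if local_path is not None:
        # File System Scanning: ask the operating system directly.
        return datetime.fromtimestamp(os.path.getmtime(local_path), tz=timezone.utc)
    value = headers.get("Last-Modified")
    if value is None:
        return None  # typical for dynamic pages
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # header present but not a valid HTTP date
```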

25.23 Document Weight

LinkScan calculates the total weight of each document. This calculation is based on the total in-line byte count and takes account of:

25.24 Click Depth

LinkScan tracks and stores the depth of each document during the course of the scan. The depth reflects the number of hyperlinks the user must click to reach the target starting from the initial URL. Note that LinkScan uses a deepest-first algorithm to scan a site. In general, the click-count is not incremented when following:

LinkScan Reference Manual. Section 26

LinkScan Quick Reference Card

Basic: Casesensitive, Homefile, Homeurl, Http, Organization, Projectdesc

CustomReport: Customsort, Editdoc, Editlink, Reportsdir, Reportsurl, Statuscode

CustomScan: Auth, Closeatag, Collectmeta, Cookie, Errorbody, Errordoc, Execute, Extraheader, Extrahome, Followframes, Hostalias, Imgtags, Insertlink, Mimetypes, Mirrorurl, Noforms, Profiler, Profilerlog, Profilermax, Sessionmatch, Substitute, Substituteraw, Userdata, Userdatafmt, Userdatasub

Database: Maxbadint, Maxext, Maxgoodint, Nohash, Nomap, Nosplit, Tagonce

Dispatch: Dispatchsort, Mailalias, Mailhost, Mailnoerr, Maxsev, Sendmailpath

External: Checkmailto, FTPPass, FTPUser, Fetchext, Followext, Hostname, Mailfrom, Masterhist, Maxbadhours, Maxftp, Maxgoodhours, Maxhist, Maxservertries, Nameservers, Noexternal

File: Alias, Autohttp, Checkorphans, Defaultpages, Expandssi, Flashfiles, Homedir, Htmlfiles, Indexoptions, Mapfiles, Maxdirlevels, Noorphan, Noorphans, Onlyorphans, Orphanfile, Pdffiles, Redirect

Import: Import, Importfile

JavaScript: Scriptexclude, Scriptmatch, Scriptnomatch, Selecturl

Misc: Unsafechar

Owner: Defaultowner, Owner, Owneralias, Ownertags

Scope: Exclude, Excludecookie, Mask, Maxcgi, Maxclicks, Maxlevels, Nofollow, Onlyfollow, Onlyinclude, Taglimit

Security: Access, Httpauth, Linkscancookie, Mailto, Noprojectlist, Nostaticmenu, Notapmapoptions

SiteMap: Mapdefaulttitle, Mapext, Maphide, Mapinclude, Mapmove, Maptitle

System: Cgibinpath, Cgibinurl, Docspath, Docsurl, Httpsproxyport, Httpsproxyserver, Key, LicenseNumber, Licensee, Linespeed, Linkscandir, Linkscanurl, Masterport, Msiis, Noproxy, Perlpath, Proxyauth, Proxymatch, Proxyport, Proxyserver, Slaves1, Slaves2, Slavesfast1, Slavesfast2, Timeout1, Timeout2, Weblintoptions, Weblintpath, Wwwpath, Wwwurl

WebServer: Server, Serverallow, Serverauth, Serverdeny

Access [1] Syntax: Access username : password : project-list : owner-list : menu-options
Category: Security Default: Access * : * : * : * : *
Type: Multi-valued Used by: linkscan.sys
 
Activates the Access Controls on the LinkScan Reports. Not enabled by default; see references.
 
Alias [1] Syntax: Alias relative-path-expression absolute-path-expression
Category: File Default: none
Type: Multi-valued Used by: linkscan.cfg
 
The Alias command maps a URL to a physical file system path. This is required when, for example, a specific directory does not reside under the normal webserver root directory. It is important to ensure that the forward slash symbols are balanced exactly as shown in the example.
Alias cgi-bin/ /usr/www/cgi-bin/
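
The effect of an Alias, and the reason the trailing slashes must balance, can be seen in a small Python sketch. It treats the first argument as a literal prefix, which is a simplification since the syntax allows expressions. The function name and the home_dir value are hypothetical.

```python
def apply_alias(relative_url: str, aliases: list, home_dir: str) -> str:
    """Map a site-relative URL to a file system path.

    aliases is a list of (url_prefix, filesystem_prefix) pairs.
    """
    for url_prefix, fs_prefix in aliases:
        if relative_url.startswith(url_prefix):
            # Swap the matched URL prefix for the file system prefix.
            return fs_prefix + relative_url[len(url_prefix):]
    # No alias applies: the file lives under the normal web root.
    return home_dir + relative_url
```

With the pair ("cgi-bin/", "/usr/www/cgi-bin/") the URL cgi-bin/test.cgi maps to /usr/www/cgi-bin/test.cgi. If the first slash were omitted ("cgi-bin"), the same substitution would produce /usr/www/cgi-bin//test.cgi.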
 
Auth [1] Syntax: Auth server-name "realm-name" username password
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Configures LinkScan to use HTTP Basic Authentication. Note that server-name must be specified as a hostname and not as a URL. The realm-name must be specified and quoted. However, it may be empty, in which case LinkScan will use the supplied username/password for any realm-name on server-name.
Auth www.example.com "" guestuser xxxxxx
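
For reference, HTTP Basic Authentication sends the username and password as a Base64-encoded Authorization header. The Python sketch below shows that encoding together with one possible reading of the empty realm-name rule. The function names and the tuple layout are assumptions for illustration, not LinkScan internals.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header used by HTTP Basic Authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"

def credentials_for(host: str, realm: str, auth_entries: list):
    """Pick (username, password) from (server, realm, username, password) entries.

    An entry with an empty realm applies to any realm on that server.
    """
    for server, entry_realm, username, password in auth_entries:
        if server == host and entry_realm in ("", realm):
            return username, password
    return None
```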
 
Autohttp [1] Syntax: Autohttp = boolean
Category: File Default: none
Type: Single-valued Used by: linkscan.cfg
 
When Autohttp = 1 LinkScan will automatically attempt HTTP access on any link that cannot be found/validated when using File System Scanning.
 
Casesensitive Syntax: Casesensitive = boolean
Category: Basic Default: Casesensitive = 1
Type: Single-valued Used by: linkscan.cfg
 
When Casesensitive = 1 LinkScan assumes that all pathnames are case-sensitive (normally appropriate when scanning Unix-based servers). When Casesensitive = 0 LinkScan forces all pathnames to lower case (normally appropriate when scanning Windows-based servers).
 
Cgibinpath [1] Syntax: Cgibinpath = absolute-path
Category: System Default: Cgibinpath = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Sets the absolute pathname to the directory in which the LinkScan CGI scripts reside.
 
Cgibinurl [1] Syntax: Cgibinurl = absolute-url
Category: System Default: Cgibinurl = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Sets the URL to the directory in which the LinkScan CGI scripts reside. Required in order that the LinkScan CGI scripts can link to each other.
 
Checkmailto [1] Syntax: Checkmailto = boolean
Category: External Default: none
Type: Single-valued Used by: linkscan.cfg
 
When Checkmailto = 1 enable active checking of mailto: links. Several other items must be configured when using this feature. See references.
 
Checkorphans [1] Syntax: Checkorphans relative-path
Category: File Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Forces LinkScan to scan the directory specified by relative-path for Orphaned Files.
 
Closeatag Syntax: Closeatag = boolean
Category: CustomScan Default: Closeatag = 1
Type: Single-valued Used by: linkscan.cfg
 
When Closeatag = 0 do not generate errors for <A HREF=...> tags without a corresponding </A> tag.
 
Collectmeta Syntax: Collectmeta = boolean
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
When Collectmeta = 1 save all document <META> tags to the file: LinkScan/project_dir/data/linkscan.met
 
Cookie [1] Syntax: Cookie server-name cookie-name=cookie-value
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Pre-load LinkScan with Cookies. Note that server-name must be specified as a hostname and not as a URL. Do not enter spaces around the "=" sign. Prefix the domain name with a period to create a wildcard, as shown in the example.
Cookie .example.com USERID=1234
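
The wildcard behaviour of a leading period can be sketched as below. This is an assumption-laden illustration: whether the bare domain itself (example.com) is covered by .example.com is not stated in this manual, and the sketch chooses to include it.

```python
def cookie_applies(cookie_domain: str, host: str) -> bool:
    """Decide whether a pre-loaded cookie should be sent to host."""
    cookie_domain, host = cookie_domain.lower(), host.lower()
    if cookie_domain.startswith("."):
        # Wildcard: any host in the domain (bare domain included by assumption).
        return host == cookie_domain[1:] or host.endswith(cookie_domain)
    # No leading period: exact hostname only.
    return host == cookie_domain
```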
 
Customsort [1] Syntax: Customsort = expression
Category: CustomReport Default: none
Type: Single-valued Used by: linkscan.cfg
 

 
Defaultowner [1] Syntax: Defaultowner = owner-name
Category: Owner Default: none
Type: Single-valued Used by: linkscan.cfg
 
Establishes a default Owner.
 
Defaultpages [1] Syntax: Defaultpages = filename [, filename]...
Category: File Default: Defaultpages = index.html, index.shtml, index.htm, home.html, home.shtml, home.htm
Type: Single-valued Used by: linkscan.cfg
 
When LinkScan is configured to use File System Scanning and encounters a link to a directory without a specific filename, it searches for documents with these filenames (in the order specified).
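
The lookup order can be sketched in Python as follows. The list reproduces the default shown above. The function name is hypothetical and the sketch is not LinkScan's own code.

```python
import os
from typing import Optional

DEFAULT_PAGES = ["index.html", "index.shtml", "index.htm",
                 "home.html", "home.shtml", "home.htm"]

def resolve_directory(directory: str, default_pages=DEFAULT_PAGES) -> Optional[str]:
    """Return the first default page present in directory, or None."""
    for name in default_pages:  # order matters: first match wins
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```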
 
Dispatchsort [1] Syntax: Dispatchsort = integer
Category: Dispatch Default: Dispatchsort = 1
Type: Single-valued Used by: linkscan.cfg
 
Defines the sort sequence for LinkScan Dispatch Reports.
1 = By referer; 2 = By status code; 3 = By links alphabetically
 
Docspath [1] Syntax: Docspath = absolute-path
Category: System Default: Docspath = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Sets the absolute pathname to the directory in which the LinkScan documentation resides.
 
Docsurl [1] Syntax: Docsurl = absolute-url
Category: System Default: Docsurl = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Sets the URL to the directory in which the LinkScan documentation resides. Required in order that the LinkScan CGI scripts can link to the documentation and associated images.
 
Editdoc [1] Syntax: Editdoc = URL
Category: CustomReport Default: none
Type: Single-valued Used by: linkscan.cfg
 
Adds a linking URL to the LinkScan Reports. These may include the optional tokens !URL, !CAP or !STAT. The tokens are replaced with %encoded strings containing:
The URL of the target resource
The Title or Caption (as appropriate) associated with the target resource
The Status Code of the target resource.
Editdoc = http://foo/bar.cgi?Url=!URL&Cap=!CAP&Status=!STAT
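
The token substitution can be sketched in Python. The function name is hypothetical. The percent-encoding uses the standard library quote function with no characters treated as safe, which is an assumption about how strictly the values are encoded.

```python
from urllib.parse import quote

def expand_edit_url(template: str, url: str, caption: str, status: str) -> str:
    """Replace !URL, !CAP and !STAT with percent-encoded values."""
    values = {"!URL": url, "!CAP": caption, "!STAT": status}
    for token, value in values.items():
        # Encoded values never contain "!", so tokens cannot be re-expanded.
        template = template.replace(token, quote(value, safe=""))
    return template
```

For the target http://www.example.com/a b.html with caption "Home Page" and status 404, the example template expands to http://foo/bar.cgi?Url=http%3A%2F%2Fwww.example.com%2Fa%20b.html&Cap=Home%20Page&Status=404.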
 
Editlink [1] Syntax: Editlink = URL
Category: CustomReport Default: none
Type: Single-valued Used by: linkscan.cfg
 
Adds a linking URL to the LinkScan Reports. These may include the optional tokens !URL, !CAP or !STAT. The tokens are replaced with %encoded strings containing:
The URL of the target resource
The Title or Caption (as appropriate) associated with the target resource
The Status Code of the target resource.
Editlink = http://foo/bar.cgi?Url=!URL&Cap=!CAP&Status=!STAT
 
Errorbody [1] Syntax: Errorbody = expression
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
Any document with a body that matches expression is marked as 404 Not Found regardless of the actual server status.
Errorbody (?i).*runtime\serror
 
Errordoc [1] Syntax: Errordoc = expression
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
Any URL that is redirected to a location that matches expression is marked as 404 Not Found regardless of the actual server status.
Errordoc special/notfound\.html
 
Exclude Syntax: Exclude relative-path-expression
Category: Scope Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Links matching relative-path-expression are completely ignored by LinkScan.
Exclude archives/
 
Excludecookie Syntax: Excludecookie expression
Category: Scope Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Cookies matching expression are completely ignored by LinkScan. Expression must either match the cookie name OR the following semi-colon delimited string of cookie attributes: "domain;port;path;cookiename;cookievalue;expires;setbypage"
Excludecookie [^;]*;[^;]*;[^;]*;[^;]*;SESSIONID
 
Execute Syntax: Execute relative-path-expression
Category: CustomScan Default: Execute cgi-bin/, Execute (?i).*\.(cgi|asp)$
Type: Multi-valued Used by: linkscan.cfg
 
Links matching relative-path-expression are accessed using Network (HTTP) Scanning.
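
The two default expressions can be tried out with Python's re module, whose syntax is compatible with Perl for these patterns. The sketch assumes expressions are matched from the start of the relative path, which is suggested by the leading .* in the second default but is not stated here. The function name is hypothetical.

```python
import re

# The default Execute expressions listed above.
EXECUTE_PATTERNS = [r"cgi-bin/", r"(?i).*\.(cgi|asp)$"]

def uses_http(relative_path: str, patterns=EXECUTE_PATTERNS) -> bool:
    """True when the link would be fetched via HTTP rather than the file system."""
    # re.match anchors at the start of the path (an assumption, see above).
    return any(re.match(pattern, relative_path) for pattern in patterns)
```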
 
Expandssi [1] Syntax: Expandssi = boolean
Category: File Default: Expandssi = 1
Type: Single-valued Used by: linkscan.cfg
 
When Expandssi = 1 and File System Scanning is enabled LinkScan will process Server Side Includes (SSIs) constructed using the Apache Include Virtual conventions.
 
Extraheader [1] Syntax: Extraheader http-header
Category: CustomScan Default: Extraheader User-Agent: LinkScan Enterprise/9.0 Windows
Type: Multi-valued Used by: linkscan.cfg
 
Configures additional HTTP headers that LinkScan will send with every request. Mainly used to emulate different browser types.
Extraheader User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
 
Extrahome [1] Syntax: Extrahome relative-path
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Instructs LinkScan to access the specified URL at the start of a scan. May be used to submit forms with specified data values. See example and references.
Extrahome cgi-bin/postscript.cgi??Name=Malcolm%20Hoar&Password=secret
 
FTPPass [1] Syntax: FTPPass = password
Category: External Default: FTPPass = me@example.com
Type: Single-valued Used by: linkscan.sys
 
Sets the password to use when validating links to FTP sites.
 
FTPUser [1] Syntax: FTPUser = username
Category: External Default: FTPUser = anonymous
Type: Single-valued Used by: linkscan.sys
 
Sets the username to use when validating links to FTP sites.
 
Fetchext [1] Syntax: Fetchext = boolean
Category: External Default: none
Type: Single-valued Used by: linkscan.cfg
 
Instructs LinkScan to fetch the document bodies when checking External links. Normally used in conjunction with the LinkScan Profiler.
 
Flashfiles [1] Syntax: Flashfiles = file-extension [, file-extension]...
Category: File Default: Flashfiles = swf
Type: Single-valued Used by: linkscan.cfg
 
When using File System Scanning, any file with this extension is interpreted using the Flash/Shockwave format. When using Network (HTTP) Scanning, a non-blank entry causes LinkScan to interpret any link with a Content-Type: application/x-shockwave-flash header using the Flash/Shockwave format.
 
Followext Syntax: Followext = boolean
Category: External Default: Followext = 1
Type: Single-valued Used by: linkscan.cfg
 
When Followext = 1 LinkScan follows redirections when scanning External links.
 
Followframes Syntax: Followframes = boolean
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
When Followframes = 1 LinkScan will always follow links within framesets (regardless of any Nofollow commands).
 
Homedir [1] Syntax: Homedir = absolute-path
Category: File Default: none
Type: Single-valued Used by: linkscan.cfg
 
Sets the absolute pathname to the directory/folder containing the root of the target website. Only applicable when File System Scanning and Orphan File detection are enabled. Note that Homedir must point at the root of the site and not a sub-directory thereof.
Homedir = C:/www/
 
Homefile [1] Syntax: Homefile = relative-url
Category: Basic Default: none
Type: Single-valued Used by: linkscan.cfg
 
Sets the initial document for the start of a scan (relative to Homeurl and Homedir).
Homefile = index.html
 
Homeurl [1] Syntax: Homeurl = absolute-url
Category: Basic Default: none
Type: Single-valued Used by: linkscan.cfg
 
Sets the base-URL for the start of a scan. Do not append additional directory or file names to the URL (use Homefile instead). Homeurl must point at the root of the target website.
Homeurl = http://www.example.com/
 
Hostalias [1] Syntax: Hostalias from-absolute-url to-absolute-url
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Establishes synonyms for the same host.
Hostalias http://www2.example.com/ http://www.example.com/
 
Hostname Syntax: Hostname = hostname
Category: External Default: none
Type: Single-valued Used by: linkscan.cfg
 
Sets the Hostname to use for HELO messages. Only used when active mailto: checking is enabled.
 
Htmlfiles [1] Syntax: Htmlfiles = file-extension [, file-extension]...
Category: File Default: Htmlfiles = html, shtml, htm
Type: Single-valued Used by: linkscan.cfg
 
When using File System Scanning, any file with this extension is interpreted as an HTML document. When using Network (HTTP) Scanning, any link with a Content-Type: text/html header is interpreted as indicating HTML format.
 
Http Syntax: Http = boolean
Category: Basic Default: Http = 1
Type: Single-valued Used by: linkscan.cfg
 
When Http = 1 LinkScan uses Network (HTTP) Scanning for the entire target website. Note that this will disable Orphaned File checking. To enable Orphan checking, you must set Http = 0 and configure Homedir. Use Execute .* to force HTTP Scanning with Orphan File checking.
 
Httpauth Syntax: Httpauth = env-var
Category: Security Default: Httpauth = REMOTE_USER
Type: Single-valued Used by: linkscan.sys
 
Sets the system Environment variable name to use in conjunction with the LinkScan access controls and HTTP user authentication. Not required unless you enable LinkScan Access Controls.
 
Httpsproxyport [1] Syntax: Httpsproxyport = integer
Category: System Default: Httpsproxyport = 80
Type: Single-valued Used by: linkscan.sys
 
Sets the Port Number associated with Httpsproxyserver.
 
Httpsproxyserver [1] Syntax: Httpsproxyserver = hostname
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the Hostname or IP address of your HTTPS Proxy Server (if any). Do not enter a URL address. Not required on Windows systems since LinkScan includes native support for the Secure Sockets Layer (SSL) and https:// addresses.
 
Imgtags Syntax: Imgtags = [AHW]
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
Enables additional checking of <IMG SRC=...> tags for Alt, Height and Width attributes.
 
Import Syntax: Import = 0 | 1 | 2 | 3
Category: Import Default: none
Type: Single-valued Used by: linkscan.cfg
 
Instructs LinkScan to use Import Scanning.
Import = 1; Import ASCII list of links
Import = 2; Import ASCII list of documents
Import = 3; Import ASCII list of documents (with de-caching)
 
Importfile Syntax: Importfile = absolute-path
Category: Import Default: none
Type: Single-valued Used by: linkscan.cfg
 
Sets the absolute pathname to the ASCII file to be processed when Import Scanning is selected.
 
Indexoptions [1] Syntax: Indexoptions = boolean
Category: File Default: none
Type: Single-valued Used by: linkscan.cfg
 
When Indexoptions = 1 and File System Scanning is enabled, LinkScan will create a directory listing when no Defaultpages (e.g. index.html) are present.
 
Insertlink [1] Syntax: Insertlink document-match new-document [-|+|*]
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
May be used to insert synthetic links into a scanned document.
 
Key [1] Syntax: Key = special-key
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the LinkScan License Key -- supplied by Elsop.
 
LicenseNumber [1] Syntax: LicenseNumber = integer (10-digit)
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the LinkScan License Number -- supplied by Elsop.
 
Licensee [1] Syntax: Licensee = name
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Name of your Company or Department.
 
Linespeed [1] Syntax: Linespeed = integer
Category: System Default: Linespeed = 1
Type: Single-valued Used by: linkscan.sys
 
Sets a default linespeed for the calculation of document load times on the Summary/Detail Report.
 
Linkscancookie Syntax: Linkscancookie = boolean
Category: Security Default: none
Type: Single-valued Used by: linkscan.sys
 
Define the type of Cookie used by the LinkScan Reporting System (i.e. linkscan.cgi) for storing user preferences. 0=Permanent cookie; 1=Session cookie; 2=No cookie
 
Linkscandir [1] Syntax: Linkscandir = absolute-path
Category: System Default: Linkscandir = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Sets the absolute pathname to the directory in which LinkScan is installed.
 
Linkscanurl [1] Syntax: Linkscanurl = absolute-url
Category: System Default: Linkscanurl = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Sets the URL to the directory in which LinkScan is installed.
 
Mailalias [1] Syntax: Mailalias expression address [, address]...
Category: Dispatch Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Sets associations between Owners matching expression and a comma separated list of e-mail addresses.
Mailalias Products sales@example.com, product-manager@example.com
 
Mailfrom [1] Syntax: Mailfrom = username
Category: External Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the address to use for FROM messages. Only used when active mailto: checking is enabled.
 
Mailhost [1] Syntax: Mailhost = hostname
Category: Dispatch Default: none
Type: Single-valued Used by: linkscan.cfg
 
Sets the default hostname for LinkScan Dispatch reports sent via e-mail. By default, all reports are mailed to Owner@Mailhost. See Mailalias if you need more control.
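
The relationship between Mailhost and Mailalias can be sketched as a simple lookup. This is an illustration only: the function name and the list-of-pairs layout are hypothetical, and matching the expression from the start of the Owner name is an assumption.

```python
import re

def recipients(owner: str, mailhost: str, aliases: list) -> list:
    """Return the e-mail addresses that receive the report for owner.

    aliases is a list of (expression, [addresses]) pairs.
    """
    for expression, addresses in aliases:
        if re.match(expression, owner):
            return addresses  # an alias overrides the default
    # Default rule: Owner@Mailhost
    return [f"{owner}@{mailhost}"]
```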
 
Mailnoerr [1] Syntax: Mailnoerr = boolean
Category: Dispatch Default: none
Type: Single-valued Used by: linkscan.cfg
 
When Mailnoerr = 1 LinkScan Dispatch will e-mail reports to their respective Owners even when no broken links were detected.
 
Mailto [1] Syntax: Mailto = boolean
Category: Security Default: none
Type: Single-valued Used by: linkscan.sys
 
Enable Mailto forms on the LinkScan reports. This option requires that the LinkScan to Email Interface be configured.
 
Mapdefaulttitle [1] Syntax: Mapdefaulttitle [ string ] [ !PATH | !FILE ] [ string ]
Category: SiteMap Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Defines a default Title for SiteMap/TapMap; used when no actual <title> tags were seen. The special tokens !PATH and !FILE are replaced with the actual pathnames or filenames, respectively.
Mapdefaulttitle = No title tags in !PATH
 
Mapext [1] Syntax: Mapext boolean
Category: SiteMap Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Include External Links on the SiteMap.
Mapext = 1
 
Mapfiles [1] Syntax: Mapfiles = file-extension [, file-extension]...
Category: File Default: Mapfiles = map
Type: Single-valued Used by: linkscan.cfg
 
When using File System Scanning, any file with this extension is interpreted as a server-side image map file.
 
Maphide [1] Syntax: Maphide relative-path-expression
Category: SiteMap Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Documents matching relative-path-expression are hidden from the SiteMap and TapMap.
Maphide .*messages/
 
Mapinclude [1] Syntax: Mapinclude relative-path-expression
Category: SiteMap Default: Mapinclude HTML Documents
Type: Multi-valued Used by: linkscan.cfg
 
Documents matching relative-path-expression are included in the SiteMap and TapMap. By default, only HTML documents are included; links to images and other file types are hidden. You may include all files by using, for example:
Mapinclude .*
 
Mapmove [1] Syntax: Mapmove relative-document-path, new-parent-relative-path, position [, new-title]
Category: SiteMap Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Used to customize the SiteMap and TapMap by forcing specific documents to be assigned to different positions in the hierarchy.
Mapmove child.html, parent.html, 1
 
Maptitle [1] Syntax: Maptitle relative-document-path, string
Category: SiteMap Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Replace the actual title of document relative-document-path with string.
 
Mask Syntax: Mask = relative-path-expression
Category: Scope Default: none
Type: Single-valued Used by: linkscan.cfg
 
Directly equivalent to Onlyinclude except that Mask is single-valued.
 
Masterhist Syntax: Masterhist = boolean
Category: External Default: Masterhist = 1
Type: Single-valued Used by: linkscan.sys
 
When Masterhist = 1 LinkScan maintains the status of external links in a global history file shared between all Projects.
 
Masterport Syntax: Masterport = port#
Category: System Default: Masterport = 8010
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Defines a TCP/IP Port Number on your computer. LinkScan uses this Port and the following "N" ports for its own interprocess communication. "N" is defined by the maximum number of Slave processes used during the scan. You will not normally need to change this unless the default Port is being used by another application.
 
Maxbadhours [1] Syntax: Maxbadhours = integer
Category: External Default: none
Type: Single-valued Used by: linkscan.sys
 
Do not check Bad External links more frequently than once every integer hours.
 
Maxbadint [1] Syntax: Maxbadint = integer
Category: Database Default: Maxbadint = 100
Type: Single-valued Used by: linkscan.cfg
 
Do not store more than integer references to an Internal broken link in the Database.
 
Maxcgi [1] Syntax: Maxcgi = integer
Category: Scope Default: Maxcgi = 100
Type: Single-valued Used by: linkscan.cfg
 
Controls the maximum number of times any given base URL will be tested with different query strings. Avoids the potential for excessive and potentially infinite iteration over many query strings. See also the Taglimit option, which provides even finer control.
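
A counter keyed on the base URL is enough to illustrate the idea. The class below is a hypothetical sketch, not LinkScan's implementation. It counts every request that carries a query string and does not try to detect repeated query strings.

```python
from collections import Counter
from urllib.parse import urlsplit

class CgiLimiter:
    """Allow at most maxcgi query-string variants per base URL."""

    def __init__(self, maxcgi: int = 100):
        self.maxcgi = maxcgi
        self.seen = Counter()

    def allow(self, url: str) -> bool:
        parts = urlsplit(url)
        if not parts.query:
            return True  # no query string: the limit does not apply
        base = f"{parts.scheme}://{parts.netloc}{parts.path}"
        if self.seen[base] >= self.maxcgi:
            return False  # limit reached for this base URL
        self.seen[base] += 1
        return True
```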
 
Maxclicks [1] Syntax: Maxclicks = integer
Category: Scope Default: none
Type: Single-valued Used by: linkscan.cfg
 
Limit the scope of a scan to "N" click levels deep.
 
Maxdirlevels [1] Syntax: Maxdirlevels = integer
Category: File Default: Maxdirlevels = 10
Type: Single-valued Used by: linkscan.cfg
 
Do not scan the File System more than integer directory levels deep when scanning for Orphaned Files. Avoids recursion issues with Symlinks on Unix systems.
 
Maxext [1] Syntax: Maxext = integer
Category: Database Default: Maxext = 100
Type: Single-valued Used by: linkscan.cfg
 
Do not store more than integer references to an External link in the Database.
 
Maxftp [1] Syntax: Maxftp = integer
Category: External Default: Maxftp = 25
Type: Single-valued Used by: linkscan.cfg
 
Do not test more than integer links to any one FTP server. This prevents excessive/inappropriate loads on the remote server. The FTP protocol carries significantly more overhead than HTTP.
 
Maxgoodhours [1] Syntax: Maxgoodhours = integer
Category: External Default: Maxgoodhours = 4
Type: Single-valued Used by: linkscan.sys
 
Do not check Good External links more frequently than once every integer hours.
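
The interval rule shared by Maxgoodhours and Maxbadhours can be sketched as below. The function name is hypothetical, and treating an unset Maxbadhours as zero (always recheck) is an assumption.

```python
from datetime import datetime, timedelta

def due_for_recheck(last_checked: datetime, was_good: bool, now: datetime,
                    maxgoodhours: int = 4, maxbadhours: int = 0) -> bool:
    """True when enough time has passed to test the External link again."""
    hours = maxgoodhours if was_good else maxbadhours
    return now - last_checked >= timedelta(hours=hours)
```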
 
Maxgoodint [1] Syntax: Maxgoodint = integer
Category: Database Default: Maxgoodint = 100
Type: Single-valued Used by: linkscan.cfg
 
Do not store more than integer references to a good Internal link in the Database.
 
Maxhist Syntax: Maxhist = integer
Category: External Default: Maxhist = 10
Type: Single-valued Used by: linkscan.sys
 
For External links, store the last integer results in the History file.
 
Maxlevels Syntax: Maxlevels = integer
Category: Scope Default: none
Type: Single-valued Used by: linkscan.cfg
 
Limit the scope of a scan to "N" directory levels.
 
Maxservertries [1] Syntax: Maxservertries = integer
Category: External Default: Maxservertries = 25
Type: Single-valued Used by: linkscan.cfg
 
When validating External links, abort testing of all links to a host that has already recorded more than integer errors. This prevents LinkScan from attempting to check many links to a host that may be temporarily unavailable (and hence multiple timeout delays).
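
The per-host cutoff can be sketched with a small error counter. The class and method names are hypothetical. The comparison follows the wording above: testing stops once a host has recorded more than the configured number of errors.

```python
from collections import Counter
from urllib.parse import urlsplit

class HostErrorTracker:
    """Skip further links to a host once it has produced too many errors."""

    def __init__(self, maxservertries: int = 25):
        self.maxservertries = maxservertries
        self.errors = Counter()

    def record_error(self, url: str) -> None:
        self.errors[urlsplit(url).netloc.lower()] += 1

    def should_test(self, url: str) -> bool:
        host = urlsplit(url).netloc.lower()
        return self.errors[host] <= self.maxservertries
```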
 
Maxsev [1] Syntax: Maxsev = severity
Category: Dispatch Default: Maxsev = 3
Type: Single-valued Used by: linkscan.cfg
 
Defines the maximum severity level to be included in the LinkScan Dispatch Reports.
 
Mimetypes Syntax: Mimetypes mime-type [D|H|J|S]
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Enables the scanning (via HTTP) of additional document types based on their MIME (Content-type) header. Analogous to the File System Scanning equivalents: Htmlfiles, Mapfiles, Pdffiles, and Flashfiles. Documents are interpreted as follows: D=PDF, H=HTML, J=JavaScript, S=Shockwave/Flash.
Mimetypes application/x-javascript J
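
The mapping from a Content-Type header to an interpreter can be sketched as a dictionary lookup. The letter codes come from the description above. The function name and the dictionary layout are hypothetical.

```python
from typing import Optional

# Interpreter codes as listed for the Mimetypes command.
INTERPRETERS = {"D": "PDF", "H": "HTML", "J": "JavaScript", "S": "Shockwave/Flash"}

def interpreter_for(content_type: str, mimetypes: dict) -> Optional[str]:
    """Return the interpreter for a Content-Type value, or None if not scanned.

    mimetypes maps a MIME type to one of the codes D, H, J or S.
    """
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime = content_type.split(";", 1)[0].strip().lower()
    code = mimetypes.get(mime)
    return INTERPRETERS.get(code) if code else None
```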
 
Mirrorurl [1] Syntax: Mirrorurl = absolute-url
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
Instructs LinkScan to send all HTTP requests to the Mirrorurl address even though, logically, it behaves as if it is scanning a different host.
Mirrorurl = http://staging.example.com/
 
Msiis [1] Syntax: Msiis = boolean
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Set Msiis = 1 when you are using LinkScan in conjunction with a Microsoft IIS/PWS installation running on your computer. This enables a workaround to an IIS bug.
 
Nameservers [1] Syntax: Nameservers = ipaddress [, ipaddress]...
Category: External Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets default name servers. Only used when active mailto: checking is enabled. See references.
 
Noexternal Syntax: Noexternal = boolean
Category: External Default: none
Type: Single-valued Used by: linkscan.cfg
 
When Noexternal = 1 disable validation of all External links.
 
Nofollow [1] Syntax: Nofollow relative-path-expression
Category: Scope Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Do not analyze documents matching relative-path-expression. LinkScan will validate links to pages matching this pattern but it will ignore all links flowing out of pages matching this pattern.
 
Noforms [1] Syntax: Noforms = boolean
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
When Noforms = 1 do not validate links found within <FORM ACTION=...> tags.
 
Nohash [1] Syntax: Nohash = boolean
Category: Database Default: none
Type: Single-valued Used by: linkscan.cfg
 
When Nohash = 1 suppress creation of the hash file portion of the database. This will disable some reports and affect the behavior of others. See references.
 
Nomap [1] Syntax: Nomap = boolean
Category: Database Default: none
Type: Single-valued Used by: linkscan.cfg
 
When Nomap = 1 suppress creation of the LinkScan SiteMap and TapMap. This may save memory and processing time on very large sites.
 
Noorphan [1] Syntax: Noorphan = boolean
Category: File Default: none
Type: Single-valued Used by: linkscan.cfg
 
Do not scan for Orphaned Files (equiv. -noorphans).
 
Noorphans [1] Syntax: Noorphans relative-path-expression
Category: File Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Do not scan directories matching relative-path-expression for Orphaned Files.
 
Noprojectlist Syntax: Noprojectlist = boolean
Category: Security Default: none
Type: Single-valued Used by: linkscan.sys
 
When Noprojectlist = 1 prompt for the Project instead of displaying a drop-down list.
 
Noproxy [1] Syntax: Noproxy = hostname-expression [, hostname-expression]...
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Bypass any configured Proxy Server and use direct Network (HTTP) access to any hosts matching hostname-expression.
 
Nosplit [1] Syntax: Nosplit = boolean
Category: Database Default: none
Type: Single-valued Used by: linkscan.cfg
 
When Nosplit = 1 disable creation of per-Owner databases. Saves disk storage and processing time on very large sites.
 
Nostaticmenu Syntax: Nostaticmenu = boolean
Category: Security Default: none
Type: Single-valued Used by: linkscan.sys
 
When Nostaticmenu = 1 disable the LinkScan Toolbar on command-line generated reports.
 
Notapmapoptions Syntax: Notapmapoptions = boolean
Category: Security Default: none
Type: Single-valued Used by: linkscan.sys
 
When Notapmapoptions = 1 disable the Options Menu on LinkScan/TapMap.
 
Onlyfollow Syntax: Onlyfollow relative-path-expression
Category: Scope Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Only scan areas of the website matching relative-path-expression. Validate but do not follow all other Internal links.
 
Onlyinclude Syntax: Onlyinclude relative-path-expression
Category: Scope Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Only scan areas of the website matching relative-path-expression. Completely ignore all other Internal links.
 
Onlyorphans [1] Syntax: Onlyorphans relative-path-expression
Category: File Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Only scan directories matching relative-path-expression for Orphaned Files.
 
Organization Syntax: Organization = string
Category: Basic Default: none
Type: Single-valued Used by: linkscan.cfg
 
Name of the organization/department associated with this Project (will appear on the subsequent reports).
 
Orphanfile [1] Syntax: Orphanfile = absolute-path
Category: File Default: none
Type: Single-valued Used by: linkscan.cfg
 
Specifies the absolute pathname to a file containing data regarding orphaned files, created by the lsfind utility. See references.
 
Owner [1] Syntax: Owner relative-path-expression owner-name
Category: Owner Default: Owner *1
Type: Multi-valued Used by: linkscan.cfg
 
Set document ownership. Documents with pathnames matching relative-path-expression are assigned to owner-name.
Owner mydirectory/ ownedbyme
 
Owneralias [1] Syntax: Owneralias expression owner-name
Category: Owner Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Used to manipulate Ownernames. Normally used in conjunction with Ownertags. See references.
 
Ownertags [1] Syntax: Ownertags = expression
Category: Owner Default: none
Type: Single-valued Used by: linkscan.cfg
 
Used to assign document Ownership based on META tags. See references.
 
Pdffiles [1] Syntax: Pdffiles = file-extension [, file-extension]...
Category: File Default: none
Type: Single-valued Used by: linkscan.cfg
 
When using File System Scanning, any file with this extension is interpreted using the PDF Document format. When using Network (HTTP) Scanning, a non-blank entry causes LinkScan to interpret any link with a Content-Type: application/pdf header using the PDF Document format.
 
Perlpath [1] Syntax: Perlpath = absolute-path
Category: System Default: Perlpath = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Absolute pathname to the Perl executable on your computer.
 
Profiler [1] Syntax: Profiler = integer
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
Enables the LinkScan Profiler.
Profiler = 1 # Profile internal links
 
Profilerlog [1] Syntax: Profilerlog = integer
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
Enables a detailed trace of the LinkScan Profiler results. The log is written to: .../LinkScan/Projectname/data/linkscan.red
 
Profilermax [1] Syntax: Profilermax = integer
Category: CustomScan Default: Profilermax = 200
Type: Single-valued Used by: linkscan.cfg
 
Sets the trigger level threshold for the LinkScan Profiler.
 
Projectdesc Syntax: Projectdesc = string
Category: Basic Default: none
Type: Single-valued Used by: linkscan.cfg
 
A description for this Project (will appear on the subsequent reports).
 
Proxyauth [1] Syntax: Proxyauth = "username:password"
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the username and password to use in conjunction with a Proxy Server that requires authentication (if any).
Proxyauth = "mylogin:mysecretpass"
 
Proxymatch [1] Syntax: Proxymatch [http|https|*] [host:port|direct] ["user:pass"] host1, host2...
Category: System Default: none
Type: Multi-valued Used by: linkscan.sys
 
The Proxymatch command may be used to configure complex proxy rules that are not handled by the (simpler) Proxyserver/Proxyport commands. Multiple Proxymatch commands are evaluated in the order specified with the last match assuming precedence.
 
Proxyport [1] Syntax: Proxyport = integer
Category: System Default: Proxyport = 80
Type: Single-valued Used by: linkscan.sys
 
Sets the Port number to use in conjunction with your Proxy Server (if any).
 
Proxyserver [1] Syntax: Proxyserver = hostname
Category: System Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the Hostname or IP address of your HTTP Proxy Server (if any). Do not enter a URL address.
 
Redirect Syntax: Redirect relative-path-expression absolute-url-expression
Category: File Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Used to simulate a webserver configured redirection when using File System Scanning.
Redirect documents/oldpage.html http://www.example.com/html/newpage.html
 
Reportsdir [1] Syntax: Reportsdir = absolute-path
Category: CustomReport Default: Reportsdir = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Sets the path to the directory in which the LinkScan reports are created. Only used when generating reports from the command-line.
 
Reportsurl [1] Syntax: Reportsurl = absolute-url
Category: CustomReport Default: Reportsurl = Automatically set during installation
Type: Single-valued Used by: linkscan.sys
 
Sets the URL to the directory in which the LinkScan reports are created. Only used when generating reports from the command-line.
 
Scriptexclude [1] Syntax: Scriptexclude expression
Category: JavaScript Default: none
Type: Multi-valued Used by: linkscan.cfg
 
JavaScript code blocks matching expression are discarded and not scanned for links.
 
Scriptmatch [1] Syntax: Scriptmatch expression
Category: JavaScript Default: Scriptmatch (\w+://\S+|\S+/$|\S+\?\S+|\S+\.([a-z]{2,3}|[js]?html?|Z)$)
Type: Multi-valued Used by: linkscan.cfg
 
Patterns used to control the scanning of JavaScript constructs. You should not normally need to change these from their defaults.
 
Scriptnomatch [1] Syntax: Scriptnomatch expression
Category: JavaScript Default: Scriptnomatch .*([\(\)\[\]\{\}\']|document\.\S+|\.(src|com)$)
Type: Multi-valued Used by: linkscan.cfg
 
Patterns used to control the scanning of JavaScript constructs. You should not normally need to change these from their defaults.
 
Selecturl [1] Syntax: Selecturl expression
Category: JavaScript Default: none
Type: Multi-valued Used by: linkscan.cfg
 
The contents of select tags (drop-down lists) with a name attribute matching expression are processed as links rather than as arbitrary data.
 
Sendmailpath [1] [2] Syntax: Sendmailpath = absolute-path
Category: Dispatch Default: none
Type: Single-valued Used by: linkscan.sys
 
Sets the absolute pathname to the sendmail executable on your computer.
 
Server Syntax: Server = boolean
Category: WebServer Default: Server = 1 (Windows); Not applicable on Unix systems
Type: Single-valued Used by: linkscan.sys
 
When Server = 1 the LinkScan WebServer is used to access the LinkScan CGI scripts and view the LinkScan Results.
 
Serverallow [1] Syntax: Serverallow IPaddress
Category: WebServer Default: Serverallow .*
Type: Multi-valued Used by: linkscan.sys
 
Access controls for the LinkScan WebServer. See references.
 
Serverauth [1] Syntax: Serverauth username:password
Category: WebServer Default: none
Type: Multi-valued Used by: linkscan.sys
 
Access controls for the LinkScan WebServer. See references.
 
Serverdeny [1] Syntax: Serverdeny IPaddress
Category: WebServer Default: none
Type: Multi-valued Used by: linkscan.sys
 
Access controls for the LinkScan WebServer. See references.
 
Serverindex [1] Syntax: Serverindex = boolean
Category: WebServer Default: Serverindex = 1
Type: Single-valued Used by: linkscan.sys
 
Access controls for the LinkScan WebServer. See references.
 
Sessionmatch [1] Syntax: Sessionmatch = expression
Category: CustomScan Default: none
Type: Single-valued Used by: linkscan.cfg
 
Used to capture, save and manipulate items such as session numbers. See references.
 
Slaves1 Syntax: Slaves1 = integer
Category: System Default: Slaves1 = 3
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Sets the number of simultaneous HTTP connections to be used when scanning the Internal links.
 
Slaves2 Syntax: Slaves2 = integer
Category: System Default: Slaves2 = 3
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Sets the number of simultaneous HTTP connections to be used when scanning the External links.
 
Slavesfast1 Syntax: Slavesfast1 = integer
Category: System Default: Slavesfast1 = 5
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Sets the number of simultaneous HTTP connections to be used when scanning the Internal links with the -fast option.
 
Slavesfast2 Syntax: Slavesfast2 = integer
Category: System Default: Slavesfast2 = 12
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Sets the number of simultaneous HTTP connections to be used when scanning the External links with the -fast option.
 
Statuscode [1] Syntax: Statuscode statuscode, severity
Category: CustomReport Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Modifies the severity associated with statuscode.
1=Error; 2=Possible Error; 3=Warning; 4=Advisory; 5=Good.
Statuscode = 301,3 # 301 (Moved Permanently) from Error to Warning
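
The severity numbers above can be read as a simple lookup table. The following Python sketch is illustrative only: parse_statuscode is a hypothetical helper, not part of LinkScan, and it assumes the "Statuscode = code,severity" form shown in the example line.

```python
# Severity levels as listed in the Statuscode entry above.
SEVERITY = {1: "Error", 2: "Possible Error", 3: "Warning", 4: "Advisory", 5: "Good"}


def parse_statuscode(line):
    """Parse a 'Statuscode = code,severity' line (hypothetical helper)."""
    body = line.split("#", 1)[0]           # drop any trailing comment
    _, _, value = body.partition("=")      # keep the text after the equals sign
    code, severity = (int(part) for part in value.split(","))
    return code, SEVERITY[severity]


print(parse_statuscode("Statuscode = 301,3 # 301 (Moved Permanently) from Error to Warning"))
```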
 
Substitute [1] Syntax: Substitute relative-path-expression expression
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Manipulate links on-the-fly. See references.
 
Substituteraw [1] Syntax: Substituteraw relative-path-expression expression
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Manipulate links on-the-fly. See references.
 
Taglimit [1] Syntax: Taglimit relative-path-expression integer
Category: Scope Default: none
Type: Multi-valued Used by: linkscan.cfg
 
When integer links matching relative-path-expression have been scanned, LinkScan ignores all subsequent matching links.
 
Tagonce [1] Syntax: Tagonce relative-path-expression
Category: Database Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Links matching relative-path-expression are stored only once, regardless of how many references are seen. Typically used to prevent thousands of references to "blank/filler" images from adding excessive bulk to the LinkScan database.
Tagonce .*blank\.gif$
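
The effect of Tagonce can be pictured with a short Python sketch. This is not LinkScan's implementation: the store_links function and the (source, target) pair layout are assumptions made for illustration, and the pattern is applied with Python's re module on the assumption that it behaves like the original expression for this simple case.

```python
import re


def store_links(links, tagonce_patterns):
    """Keep every reference, except that links matching a Tagonce
    pattern are stored only the first time they are seen."""
    compiled = [re.compile(pattern) for pattern in tagonce_patterns]
    seen_once = set()
    stored = []
    for source, target in links:
        if any(rx.search(target) for rx in compiled):
            if target in seen_once:
                continue                   # later references are dropped
            seen_once.add(target)
        stored.append((source, target))
    return stored
```

With the pattern .*blank\.gif$, two pages that both reference images/blank.gif contribute a single stored record for that image, while references to other files are all kept.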
 
Timeout1 Syntax: Timeout1 = integer
Category: System Default: Timeout1 = 20
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Timeout (in seconds) for first attempt to contact site.
 
Timeout2 Syntax: Timeout2 = integer
Category: System Default: Timeout2 = 40
Type: Single-valued Used by: linkscan.sys,linkscan.cfg
 
Timeout (in seconds) for second attempt to contact site.
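
The relationship between Timeout1 and Timeout2 can be sketched as a two-attempt fetch. The Python below is an assumption-laden illustration, not LinkScan's code: fetch is a hypothetical callable taking a URL and a timeout in seconds, and the sketch assumes the second attempt is made only after the first one times out.

```python
def fetch_with_retry(fetch, url, timeout1=20, timeout2=40):
    """Try once with timeout1 seconds, then once more with timeout2 seconds."""
    try:
        return fetch(url, timeout1)
    except TimeoutError:
        # A second timeout is not caught here and propagates to the caller.
        return fetch(url, timeout2)
```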
 
Unsafechar [1] Syntax: Unsafechar = string
Category: Misc Default: Unsafechar = <>`"
Type: Single-valued Used by: linkscan.cfg
 
Unsafe characters. Do not escape these.
 
Userdata Syntax: Userdata [123] match-expression expression
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Extract user specified data from document (e.g. from META tags).
Userdata 1 (?i)<meta[^>]*emp-badge-no\s*=\s*"(\d+) $1
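
To see what the match-expression in this example captures, the same pattern can be tried in Python. This is only a sketch: LinkScan evaluates the expression itself, and the extract_userdata helper and the sample tag are hypothetical. The $1 in the Userdata line corresponds to the first capture group.

```python
import re

# The match-expression from the Userdata example above.
PATTERN = r'(?i)<meta[^>]*emp-badge-no\s*=\s*"(\d+)'


def extract_userdata(html):
    """Return the first captured badge number, or None if nothing matches."""
    match = re.search(PATTERN, html)
    return match.group(1) if match else None


print(extract_userdata('<META NAME="staff" emp-badge-no="12345">'))
```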
 
Userdatafmt Syntax: Userdatafmt [123] [DHLTX] integer[LRC] caption
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Format user specified data. D=date; H=hot links; L=link; T=truncate to format; X=normal
20R=20 chars right adjusted; 40L=40 chars left adjusted
Userdatafmt 1 X 10R Badge Number
 
Userdatasub Syntax: Userdatasub [123] expression expression
Category: CustomScan Default: none
Type: Multi-valued Used by: linkscan.cfg
 
Perform RegExp manipulations on user data fields.
 
Weblintoptions [1] [2] Syntax: Weblintoptions = string
Category: System Default: Weblintoptions = -d extension-markup,extension-attribute
Type: Single-valued Used by: linkscan.sys
 
Sets command-line options that are automatically passed to weblint.
 
Weblintpath [1] [2] Syntax: Weblintpath = absolute-path
Category: System Default: Weblintpath = C:/LinkScan/weblint/weblint
Type: Single-valued Used by: linkscan.sys
 
Sets the full pathname to the weblint executable.
 
Wwwpath [1] Syntax: Wwwpath = absolute-path
Category: System Default: Wwwpath = C:/LinkScan/
Type: Single-valued Used by: linkscan.sys
 
Sets the base/root folder for the LinkScan WebServer.
 
Wwwurl [1] Syntax: Wwwurl = absolute-url
Category: System Default: Wwwurl = http://localhost:83/
Type: Single-valued Used by: linkscan.sys
 
Sets the base/root URL for the LinkScan WebServer.
 

LinkScan Reference Manual. Section 27

LinkScan and Various Web Servers

This section discusses the use of LinkScan in conjunction with various web servers and the associated security implications:

  1. Web Server Requirements
  2. LinkScan and Apache
  3. LinkScan and IIS/PWS
  4. LinkScan Access Controls
  5. LinkScan Security Considerations

27.1 Web Server Requirements

When LinkScan is used to scan a website, the results are stored in the LinkScan database. Reports are created by executing queries against that database with several CGI programs that are supplied with LinkScan.

Hence, LinkScan will normally require that web server software be installed, configured and running on the installation computer. Note that LinkScan doesn't require access to a local web server in order to scan a web site. But a local web server is usually required to view the results of that scan.

The remainder of this section describes the use of LinkScan with various web servers and discusses the associated security considerations.

27.2 LinkScan and Apache

When using LinkScan with Apache (and most other web servers) two sets of considerations must be addressed:

Apache Requirements

Apache normally requires that several conditions be satisfied before it will execute the LinkScan CGI programs -- or any other CGI program, for that matter:

  1. The CGI programs must be installed in a directory that is configured to permit CGI execution. This is typically a cgi-bin directory configured with an Apache ScriptAlias directive. However, any directory may be configured to permit CGI execution with the Apache Options ExecCGI directive.
  2. The CGI programs must have an appropriate file extension. Typically you will need an Apache AddHandler cgi-script .cgi directive.
  3. The CGI program and the directory in which it resides will require appropriate permissions. Typically, one would use 711 for the directory and 755 for the CGI file.
  4. The CGI program must not be owned by nobody.
  5. The CGI program must include a valid shebang header pointing at the Perl 5 executable on your computer. For example:

    #!/usr/local/bin/perl

Unless all of the above are satisfied, Apache will refuse to execute the CGI program and you will likely receive a 500 Server Error or 403 Forbidden response.
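
Some of the conditions above can be checked before involving the web server. The following Python sketch is a rough preflight check under stated assumptions: check_cgi is a hypothetical helper, it only inspects permission bits and the shebang line, and it does not examine file ownership or the Apache configuration itself.

```python
import os
import stat


def check_cgi(path):
    """Return a list of likely problems with the CGI script at `path`."""
    problems = []
    directory = os.path.dirname(os.path.abspath(path))
    file_mode = stat.S_IMODE(os.stat(path).st_mode)
    dir_mode = stat.S_IMODE(os.stat(directory).st_mode)
    # 755 on the file means read and execute for owner, group and others.
    if file_mode & 0o555 != 0o555:
        problems.append("script is not readable and executable by all (e.g. 755)")
    # 711 on the directory means search (execute) for owner, group and others.
    if dir_mode & 0o111 != 0o111:
        problems.append("directory is not searchable by all (e.g. 711)")
    with open(path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        problems.append("missing shebang header")
    elif b"perl" not in first_line:
        problems.append("shebang does not point at a Perl executable")
    return problems
```

An empty list means none of these particular problems was found; it does not guarantee that Apache will execute the program.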

LinkScan Requirements

LinkScan imposes certain additional (minimal) requirements:

  1. In the linkscan.sys configuration file, the Cgibinurl setting must be configured to point at the directory into which the LinkScan CGI programs have been installed. This is required in order that the LinkScan CGI programs can link to each other. For example: Cgibinurl = http://www.example.com/cgi-bin/
  2. In the linkscan.sys configuration file, the Docsurl setting must be configured to point at a directory containing the LinkScan documentation and associated images. For example: Docsurl = http://www.example.com/linkscan/docs/
  3. An additional requirement is imposed if (and only if) the LinkScan CGI programs are installed in a directory other than the main LinkScan directory (for example, if you moved them to a cgi-bin directory). In this case, the LinkScan CGIs will need to know where to find the rest of the LinkScan configuration files and databases. In the directory containing the LinkScan CGI programs, create a hidden file called .linkscan. This file needs to contain a single line entry with the full pathname to the main LinkScan directory. For example:

    /usr/linkscan/

    Be sure to include the leading and trailing forward-slash characters and make the file world readable (chmod 644 .linkscan).
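
Creating the .linkscan file described in item 3 can be scripted. The Python below is a minimal sketch, assuming a Unix system; write_linkscan_pointer is a hypothetical helper, and ending the single line with a newline is an assumption rather than a documented requirement.

```python
import os


def write_linkscan_pointer(cgi_dir, linkscan_dir):
    """Create the .linkscan file in cgi_dir pointing at linkscan_dir."""
    # Make sure the pathname carries the trailing forward-slash.
    path = linkscan_dir if linkscan_dir.endswith("/") else linkscan_dir + "/"
    if not path.startswith("/"):
        raise ValueError("expected a full pathname with a leading forward-slash")
    pointer = os.path.join(cgi_dir, ".linkscan")
    with open(pointer, "w") as handle:
        handle.write(path + "\n")          # single line entry
    os.chmod(pointer, 0o644)               # world readable, as with chmod 644
    return pointer
```

For example, write_linkscan_pointer("/usr/local/apache/cgi-bin", "/usr/linkscan") would write the line /usr/linkscan/ to the file; both directory names here are placeholders.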

Although the above guidelines are presented in the specific context of the Apache web server, the basic principles are quite generic and may easily be adapted to almost any web server. Note also that LinkScan provides considerable flexibility; you may install the LinkScan CGI programs in one directory, the documentation in another and the main LinkScan system including the databases in a third. Indeed, LinkScan may easily be configured to run in chroot and other similar environments.

27.3 LinkScan and IIS/PWS

When using LinkScan with the Microsoft IIS or PWS web servers, two sets of considerations must be addressed:

IIS/PWS Requirements

IIS/PWS normally requires that several conditions be satisfied before it will execute the LinkScan CGI programs -- or any other CGI program, for that matter:

  1. The CGI programs must be installed in a folder that is configured to permit CGI executions.
  2. You will need to associate the .cgi file extension with Perl on your computer.

To associate the .cgi file extensions with Perl:

  1. Open the Internet Service Manager.
  2. From the tree display on the left, select the level at which to apply the mappings. You can choose an entire server, web site, or a given virtual directory. Select Properties from the Action menu.
  3. Click the Configuration button. This opens the Application Configuration dialog.
  4. Select the App Mappings tab and click the Add button. This opens the Add/Edit Application Extension Mapping dialog.
  5. Enter the full path to Perl.exe followed by %s %s. In the Extension field, type .cgi.
  6. Save/Apply the changes and close the Internet Service Manager.

Unless all of the above are satisfied, IIS/PWS will refuse to execute the CGI program and you will likely receive a 500 Server Error or 403 Forbidden response.

LinkScan Requirements

LinkScan imposes certain additional (minimal) requirements:

  1. In the linkscan.sys configuration file, the Cgibinurl setting must be configured to point at the folder into which the LinkScan CGI programs have been installed. This is required in order that the LinkScan CGI programs can link to each other. For example: Cgibinurl = http://www.example.com/cgi-bin/
  2. In the linkscan.sys configuration file, the Docsurl setting must be configured to point at a folder containing the LinkScan documentation and associated images. For example: Docsurl = http://www.example.com/linkscan/docs/
  3. An additional requirement is imposed if (and only if) the LinkScan CGI programs are installed in a folder other than the main LinkScan folder (for example, if you moved them to a cgi-bin folder). In this case, the LinkScan CGIs will need to know where to find the rest of the LinkScan configuration files and databases. LinkScan will look for the file .linkscan. This file needs to contain a single line entry with the full pathname to the main LinkScan folder. For example:

    C:/linkscan/

    Be sure to include the leading and trailing forward-slash characters.

    However, the fun part is figuring out in which folder to place the .linkscan file. The LinkScan CGI programs will look in the current folder. But sadly, different versions and installations of IIS will launch CGIs with different starting folders. The chances are the .linkscan file will need to be in the IIS root folder. However, you may need to try placing it in the same folder as the CGIs or in the parent folder of the CGI folder.

  4. Finally, you will want to disable the LinkScan WebServer that is installed by default on Windows systems and activate an IIS fix associated with cookies and redirections. Simply start LinkScan and click Configure. Then:

27.4 LinkScan Access Controls

LinkScan includes some basic Access Controls that may be configured using the Access command in the configuration file linkscan.sys in the LinkScan directory. These access controls apply to CGI access only. It is assumed that standard operating system features will be used to control access by shell (command line) users.


Access username : password : project-list : owner-list : menu-options

An asterisk character may be used as a wildcard for any or all of the above parameters.

Indeed, a default LinkScan installation will create the following entry in the linkscan.sys file, providing unrestricted access:


Access = * : * : * : * : *

Facilities are also provided to integrate with HTTP Authentication Schemes. LinkScan will check for the Environment Variable specified by the Httpauth parameter in linkscan.sys (normally REMOTE_USER). If this variable is present, it will be used to set the current Username. LinkScan will assume that the user has already authenticated with the HTTP server and it will not check the password field in linkscan.sys.

Example: In the following example, we have configured two users with different passwords. User 'admin' has unrestricted access, but user 'webmaster' may only access the two Projects specified. Also the "Site History" and "System Configuration" Reports are not available to 'webmaster'.


Access = admin : root : * : * : *
Access = webmaster : html : www.example.com,devel.example.com : * : sxdcmoaqt
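
The structure of an Access entry can be illustrated with a short Python sketch. It is not LinkScan's parser: parse_access and may_access_project are hypothetical helpers, the sketch assumes the colon-separated, five-field layout shown above, and it assumes that Project names are compared exactly.

```python
def parse_access(line):
    """Split an 'Access = user : pass : projects : owners : menu' line."""
    _, _, value = line.partition("=")
    fields = [field.strip() for field in value.split(":")]
    if len(fields) != 5:
        raise ValueError("expected five colon-separated fields")
    username, password, projects, owners, menu = fields
    return {
        "username": username,
        "password": password,
        "projects": [name.strip() for name in projects.split(",")],
        "owners": [name.strip() for name in owners.split(",")],
        "menu": menu,
    }


def may_access_project(entry, project):
    """An asterisk acts as a wildcard; otherwise the Project must be listed."""
    return "*" in entry["projects"] or project in entry["projects"]


entry = parse_access(
    "Access = webmaster : html : www.example.com,devel.example.com : * : sxdcmoaqt"
)
print(may_access_project(entry, "www.example.com"))
```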

27.5 LinkScan Security Considerations

LinkScan incorporates some simple access controls on the various Reporting options and selections when run as CGI scripts. No LinkScan-specific access controls are applied when accessing LinkScan via a shell (command line) interface; it is assumed that normal operating system access controls apply. The LinkScan access controls are subject to the many and varied limitations inherent within the CGI protocol (see the WWW CGI Security FAQ and other sources for further discussion). In summary, if your HTTP server can access any specific file, then, any user with HTTP access to your server may be able to access that file. The LinkScan security features are provided as a convenience but they are no substitute for other more robust system-level security controls such as:

We highly recommend that you configure HTTP Authentication of the LinkScan directory. Other measures you may wish to consider include:

LinkScan Reference Manual. Section 28

LinkScan File Formats


The following notes describe the format of many of
the LinkScan database files stored in:

...LinkScan/ProjectName/data/
...LinkScan/ProjectName/hist/

Each file is created in (mainly) ASCII format,
with one Record per Line. Each Record contains
a number of Fields, delimited with <Control-G>
characters (Octal: 007). The Fields associated
with each Record type are outlined below.

linkscan.doc
============

One record per Document (does not include images etc)

 0 = Document URL
 1 = Document Type
 2 = Clicks
 3 = Content-Type (MIME)
 4 = Status Code (see codes.txt)
 5 = Extended Status
 6 = Content-Length (bytes)
 7 = Last-Modified (date/time)
 8 = Document Title
 9 = Location for Redirect
10 = Original Status Code (pre-redirect)
11 = File System Pathname
12 = Owner Code (see linkscan.own)
13 = Total Internal Links
14 = Bad Internal Links
15 = Total External Links
16 = Bad External Links
17 = Suspect External Links
18 = In-line bytes (page weight)
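
As an illustration of the record layout above, the following Python sketch
splits one linkscan.doc line into named fields. The field names are
hypothetical labels chosen for this example; only the field order and the
<Control-G> delimiter come from the description above.

```python
# Labels for fields 0-18 of a linkscan.doc record, in the order listed above.
DOC_FIELDS = [
    "url", "doc_type", "clicks", "content_type", "status_code",
    "extended_status", "content_length", "last_modified", "title",
    "redirect_location", "original_status_code", "pathname", "owner_code",
    "total_internal_links", "bad_internal_links", "total_external_links",
    "bad_external_links", "suspect_external_links", "inline_bytes",
]


def parse_record(line, field_names=DOC_FIELDS):
    """Split one record on Control-G (octal 007) and label the fields."""
    values = line.rstrip("\r\n").split("\x07")
    # zip stops at the shorter sequence, so a 13-field record is also accepted.
    return dict(zip(field_names, values))
```

All values are returned as strings; converting counts or dates is left to
the caller.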


linkscan.fil
============

One record per file (e.g. images versus Documents)

Format is same as linkscan.doc, fields 0-12


linkscan.orp
============

One record per orphaned file

Format is same as linkscan.doc, fields 0-12


linkscan.mad and linkscan.map
=============================

SiteMap Data
linkscan.mad -- directory order
linkscan.map -- link order

 0 = Level
 1 = Document URL
 2 = Title
 3 = Document Size
 4 = Document Date/Time


linkscan.int, linkscan.ext, linkscan.int.err, linkscan.ext.err
==============================================================

Link Data -- internal, external, good and bad.

 0 = From URL index (see linkscan.idx)
 1 = Line number times 10
 2 = To URL index (see linkscan.idx)
 3 = Link Type
 4 = Status Code (see codes.txt)
 5 = Extended Status
 6 = Link Caption


linkscan.sum
============

Summary Statistics (Note this file is TAB delimited)

 0 = Version
 1 = Date and time of scan
 2 = Total Documents
 3 = Missing Documents
 4 = Documents Containing Errors
 5 = Total Other Files
 6 = Missing Other Files
 7 = Total Anchors
 8 = Missing Anchors
 9 = Total External Links
10 = External Links Tested This Scan
11 = External Links with Errors
12 = External Links with Possible Errors
13 = External Links with Warnings
14 = Total Orphans


hist/xxxxxx/dat
===============

History Data -- New File Created for Each Scan

 0 = Document URL
 1 = Owner Name
 2 = Document Type
 3 = Clicks
 4 = Content-Type (MIME)
 5 = Status Code (see codes.txt)
 6 = Content-Length (bytes)
 7 = Last-Modified (date/time)
 8 = Document Title


Document Type Codes
===================

 H = HTML
 D = PDF
 M = Image Map
 S = Flash
 Y = Special Control
 Z = Import

 I = In-line image
 F = File
 N = HTML nofollow

 A = Anchor
 R = Redirection

 U = External
 X = Special

LinkScan Reference Manual. Section 29

LinkScan Application Notes

  1. LinkScan to Email Interface
  2. Testing Wireless Servers with LinkScan
  3. Testing Secure Servers with LinkScan

29.1 LinkScan to Email Interface

LinkScan incorporates several functions that relate to electronic mail. These include:

Some or all of the following parameters must be configured in order to use these functions:

Windows Systems -- linkscan.sys

Sendmailpath = perl utils/sendmail.pl
Smtphost = smtp.example.com
Hostname = www.example.com
Mailfrom = LinkScan@example.com
Nameservers =
[...]
Mailto = 1

Unix Systems -- linkscan.sys

Sendmailpath = /usr/lib/sendmail -t
Smtphost = 
Hostname = www.example.com
Mailfrom = LinkScan@example.com
Nameservers =
[...]
Mailto = 1

linkscan.cfg

For completeness, we address two related settings in the linkscan.cfg file:

Mailhost = example.com
Checkmailto = 0

29.2 Testing Wireless Servers with LinkScan

LinkScan includes support for the Wireless Application Protocol (WAP) and Wireless Markup Language (WML). This allows LinkScan to validate wireless sites via an HTTP gateway. Typically, you will need to add the following configuration commands to linkscan.cfg:


Extraheader User-Agent: Nokia7110/1.0 (04.80)
Mimetypes text/vnd.wap.wml H

This will cause LinkScan to send an appropriate User-Agent header with each request and to parse/follow documents with a MIME/Content-Type of text/vnd.wap.wml.
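
For readers who want to see what such a request looks like outside LinkScan, the Python sketch below builds (but does not send) a request carrying the same User-Agent header and checks a Content-Type value for WML. The function names and the wap.example.com URL are hypothetical; this is not how LinkScan itself is implemented.

```python
from urllib.request import Request

WAP_USER_AGENT = "Nokia7110/1.0 (04.80)"


def build_wap_request(url):
    """Build a request that identifies itself as a WAP handset."""
    return Request(url, headers={"User-Agent": WAP_USER_AGENT})


def is_wml(content_type):
    """True when a Content-Type value names the WML MIME type."""
    return content_type.split(";", 1)[0].strip().lower() == "text/vnd.wap.wml"


request = build_wap_request("http://wap.example.com/")
print(request.get_header("User-agent"))
```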

29.3 Testing Secure Servers with LinkScan

LinkScan may be configured to test websites hosted on secure servers running the Secure Sockets Layer (SSL), i.e. sites with URLs of the form https://www.example.com/.

On the Microsoft Windows platforms, you need only specify the URL of the site to be scanned. LinkScan includes native support for the Secure Sockets Layer.

On Unix systems, you will need to install additional software to handle the SSL encryption. The required packages are:

At the time of writing LinkScan has been tested with OpenSSL version 0.9.6 and Net::SSLeay version 1.05.

Installation of both packages is very straightforward if you have root access:



cd $HOME/openssl-0.9.6
./config
make
make test
make install   # See Note 1

cd $HOME/Net_SSLeay.pm-1.05
perl Makefile.PL
make
make test      # See Note 2
make install   # See Note 1

Note 1: The make install steps may fail if you do not have root access. You may install and run these packages from a user directory if you do not have root access by using something like this:


cd $HOME/openssl-0.9.6
./config --openssldir=$HOME/myopenssl
make
make test
make install

cd $HOME/Net_SSLeay.pm-1.05
perl Makefile.PL $HOME/myopenssl
make
make test
mv ./blib/lib/Net/ /usr/www/linkscan/
mv ./blib/lib/auto/ /usr/www/linkscan/

Note 2: The make test on Net::SSLeay will produce a number of errors. In general, you can safely ignore them.

Once the module Net::SSLeay has been successfully installed, LinkScan will be able to scan https://... sites without any additional configuration changes.

Disclaimer

Each of the above referenced programs (with the exception of LinkScan) is maintained by parties other than Electronic Software Publishing Corporation. You are solely responsible for your use of those products and your compliance with any applicable software license agreements. Several of the referenced products contain encryption algorithms, the distribution and use of which may be subject to various laws and regulations. You are solely responsible for compliance.

LinkScan Reference Manual. Section 30

LinkScan Revision History

New in LinkScan 9.0

New in LinkScan 8.2

At LinkScan 8.2 we have consolidated several minor bug fixes and a large number of customer generated suggestions for improvements and enhancements. We thank all of those users who contributed suggestions. Some of the highlights include:

New in LinkScan 8.1

At LinkScan 8.1 we have consolidated several minor bug fixes and a large number of customer generated suggestions for improvements and enhancements. Although each individual change is relatively minor in scope, the aggregate of them all represents a significant improvement to the product. We thank all of those users who contributed suggestions and urge customers to install this greatly improved release at the earliest opportunity. In total, we have made approximately 60 changes and enhancements. Some of the highlights include:

New in LinkScan 8.0

New in LinkScan 7.4

New in LinkScan 7.3

New in LinkScan 7.2

New in LinkScan 7.1

New in LinkScan 7.0

New in LinkScan 6.1

New in LinkScan 6.0

LinkScan 6.0 includes some significant changes to the scanning modules. For Windows users:

These changes eliminate prior restrictions due to limitations of the Perl implementation for Windows and can greatly improve performance.

For Unix users:

New in LinkScan 5.5

The configuration file format changes are summarized below:

We have found that these changes greatly simplify system configuration and administration in complex multi-Project scenarios. The automatic conversion script will attempt to normalize the global and project-specific linkscan.cfg files. However, users may find they can achieve further simplification with a few minutes of manual inspection and editing.

New in LinkScan 5.4

LinkScan 5.4 is primarily a maintenance release that consolidates several minor bug fixes and enhancements. It includes changes for the new LinkScan Server and LinkScan Workstation products as well as infrastructure to support new upcoming enhancements.

New in LinkScan 5.3

New in LinkScan 5.2

At LinkScan 5.2 we have improved HTTP navigation (the Execute command) for validating dynamic content (CGI scripts, Server Side Includes etc.), enhanced several of the LinkScan Reports and added some completely new reporting options. Some of the specific enhancements include:

New in LinkScan 5.1

LinkScan 5.0 was a major new release. At LinkScan 5.1 we have consolidated several minor bug fixes and a number of improvements designed to further simplify LinkScan administration. The following items are worthy of note:

New in LinkScan 5.0

New in LinkScan 4.2

At LinkScan 4.2, we have focused on enhancements to the various reporting modules with both new and more consistent options.

New in LinkScan 4.1

The following changes and enhancements were incorporated in LinkScan version 4.1:

New in LinkScan 4.0

The following changes and enhancements were incorporated in LinkScan version 4.0:

New in LinkScan 3.2

The following changes and enhancements were incorporated in LinkScan version 3.2:

New in LinkScan 3.1

The following changes and enhancements were incorporated in LinkScan version 3.1:

New in LinkScan 3.0

The following changes and enhancements were incorporated in LinkScan version 3.0:

New in LinkScan 2.1

The following changes and enhancements were incorporated at LinkScan version 2.1:

New in LinkScan 2.0

The following changes and enhancements were incorporated at LinkScan version 2.0:

New in LinkScan 1.2

The following changes and enhancements were incorporated at LinkScan version 1.2:

New in LinkScan 1.1

The following changes and enhancements were incorporated at LinkScan version 1.1:

LinkScan Reference Manual. Section 31

LinkScan End-User License Agreement
Including LinkScan Workstation, LinkScan Server,
LinkScan ServerPro and LinkScan Enterprise

This license agreement is proof of license. Please treat it as valuable property.

IMPORTANT - READ CAREFULLY: This End-User License Agreement ("Agreement") is a legal agreement between you (hereinafter "Licensee" or "you") and Electronic Software Publishing Corporation (hereinafter "Licensor") for the Licensor's software products identified above, and any upgrades which may be acquired by you for the identified products from time to time, which may include associated software components, media, printed materials, and "online" or electronic documentation (hereinafter "Product"). By downloading, installing, copying, or otherwise using the Product, you agree to be bound by the terms of this Agreement. If you do not agree to the terms of this Agreement, do not download, install or use the Product.

1. GRANT OF LICENSE.

Subject to payment of applicable license fee(s), Electronic Software Publishing Corporation hereby grants to you a non-exclusive, non-sublicensable, non-transferable license to use its Product or grants you a license to use the Product free of charge for purposes of evaluating the Product for an evaluation period that is limited to a single one-time trial period of fifteen (15) days. You may use the Product only in the manner described herein. If you initially acquired a copy of the Product without purchasing a license and you wish to purchase a license, you may do so by contacting the Licensor via the Internet at http://www.elsop.com/linkscan/ or linkscan@elsop.com.

If Licensor discovers and/or determines that a Licensee has used the Product on more than a single computer or has scanned more than the number of computers licensed for scanning or in an unauthorized manner, Licensor has the right to demand immediate payment of any amounts that the Licensee should have paid and did not previously pay or to terminate the License. Termination of the License may include, but not be limited to, disabling the licensed Product. Upon termination of license, Licensee shall destroy all copies of the Product in its possession. Licensee is liable for all legal and other expenses associated with the collection of these payments.

2. SCOPE OF GRANT.

Licensee may install and use a single copy of the Product on a single computer at a secure Location owned or leased by the Licensee. Licensee may maintain another copy of the Product for archival purposes, provided any copy must contain all of the original Product's proprietary notices.

LinkScan is offered as four different products: LinkScan Workstation, LinkScan Server, LinkScan ServerPro, and LinkScan Enterprise. The terms: "LinkScan Workstation", "LinkScan Server", "LinkScan ServerPro", and "LinkScan Enterprise" when used in reference to our Product as in "LinkScan Server" do not mean a physical or virtual server, but simply reference different products. The permitted uses of each product are described below.

The term Location is used in the following text and it is defined as the Licensee's premises (one company or institution) in the same building or campus with a contiguous boundary at the same physical postal address. A Location does not include branch locations or affiliated organizations.

A. LinkScan Workstation - You are licensed to scan up to 500 web pages on a single physical computer that is owned or leased by you at one Location. The web pages may be on the computer on which the Product is installed or it may be a remote physical computer, but not both. You must buy additional licenses for each additional computer you scan even though you are using only one copy of the Product to scan the multiple computers. If you wish to scan more than 500 web pages or other computers, you must obtain additional license(s).

B. LinkScan Server - You are licensed to scan up to 5,000 web pages on a single physical computer that is owned or leased by you at one Location. The web pages may be on the computer on which the Product is installed or it may be a remote physical computer, but not both. You must buy additional licenses for each additional computer you scan even though you are using only one copy of the Product to scan the multiple computers. If you wish to scan more than 5,000 web pages or other computers, you must obtain additional license(s).

C. LinkScan ServerPro - You are licensed to scan up to 15,000 web pages on a single physical computer that is owned or leased by you at one Location. The web pages may be on the computer on which the Product is installed or it may be a remote physical computer, but not both. You must buy additional licenses for each additional computer you scan even though you are using only one copy of the Product to scan the multiple computers. If you wish to scan more than 15,000 web pages or other computers, you must obtain additional license(s).

D. LinkScan Enterprise - You are licensed to scan an unlimited number of web pages on up to ten (10) physical computers that are owned or leased by you at one Location. You must acquire additional licenses for each additional computer you scan even though you are using only one copy of the Product to scan the multiple computers. If you wish to scan more than ten computers, you must obtain additional (Add-on) license(s) for the additional computers beyond the base ten computers covered by this license. If you wish to scan computers at more than one location, you must purchase new LinkScan Enterprise licenses for those locations.

3. USE RESTRICTIONS.

Licensor shall issue to Licensee a Registration Key and Password which may only be installed on the single computer designated in the registration process. The Licensee may transfer the Product to another designated computer owned or leased by the Licensee and re-register the Product for that computer provided the original copy of the Product on the original designated computer is destroyed after the move of the Product has been accomplished. You also agree to not transfer to any other party the Registration Key and Password issued for the original computer. Licensor has the explicit right to monitor the use of the Product by the Licensee in order to enforce the provisions of this agreement.

Licensee agrees that it will not use or permit the Product to be used in any manner, whether directly or indirectly, that would enable Licensee's customers or any other person or entity to use the Product. However, Licensee may publish copies of SiteMaps and/or TapMaps produced by the Product for public consumption.

Licensee agrees that the Product is based on and includes trade secrets and proprietary know-how belonging to Licensor and is being made available to Licensee in confidence and solely on the basis of a confidential relationship with Licensor.

Licensee may not: permit other individuals to use the Product except under the terms listed above; modify, translate, reverse engineer, decompile, disassemble (except to the extent applicable laws specifically prohibit such restriction), or create derivative works based on the Product (including the Product's screen displays); copy the Product (except as specified above); or remove any proprietary notices or labels on the Product. If the Licensee does any of the aforementioned activities in this paragraph and has not purchased a license, then the Licensee agrees to immediately pay Licensor the License fee and to comply with all of its terms.

Licensee may not use the Product to provide timesharing, service bureau, or similar services to any other party. Licensees who are Internet Service Providers are explicitly prohibited from providing the Product or use of the Product to their customers or any other parties.

Licensee may not allow other parties to use the Product or the Registration Key or Password associated with the Product. Licensee may not allow any other person to do anything that is prohibited by this Agreement.

Licensee shall not make any portion of the Product available to a third party, rent, lease, sell, sublicense, assign, or otherwise transfer the Product, any portion thereof, or any output generated by the Product to a third party, and shall not convey for commercial purposes any information arising from the use of the Product to any third person, or use the Product for a purpose other than that for which it is intended (as evidenced by the documentation). Licensee further agrees to treat the Product with at least the same degree of care as that with which it treats its own confidential or proprietary information.

4. COPYRIGHT.

The Product (including any images, applets, animations, and text incorporated into the Product) is owned by Licensor and is protected by copyright laws and international copyright treaties, as well as other intellectual property laws and treaties. The Product is licensed, not sold. All title, including but not limited to copyrights, in and to the Product and any copies thereof are owned by Licensor. You must treat the Product and any printed materials that may accompany the Product like any other copyrighted material. You may not copy the Product or any printed material that may accompany the Product. Licensor reserves all rights not expressly granted.

5. SOURCE AND BINARY CODE.

This is PROPRIETARY SOURCE AND BINARY CODE of Licensor; the contents of this file may not be disclosed to third parties, copied or duplicated in any form, in whole or in part, without the prior written permission of Licensor.

Permission is hereby granted solely to the licensee for use of this source code in its unaltered state. This source code may not be modified by Licensee except under direction of Licensor. Neither may this source code be given under any circumstances to other parties in any form, including source or binary. Licensee shall not reverse engineer, decompile or disassemble any portion of the Product's code. Modification of this source code by Licensee shall automatically terminate this License as per Section 11. Divulging the exact or paraphrased contents of this source code to unlicensed parties either directly or indirectly constitutes violation of federal and international copyright and trade secret laws, and will be duly prosecuted to the fullest extent permitted under law.

6. DELIVERABLES.

Licensee may acquire the Product in machine readable form by downloading it electronically from the Licensor's computer (website server) to his computer. The Product will not be delivered in any other form or manner. The Licensor shall deliver to the Licensee by Electronic Mail within a reasonable time after the Licensee has paid for the Product a Registration Key and Password which enables the Product to operate. Reasonable within this context means within three business days of receipt of payment.

7. DISCLAIMER OF WARRANTY AND LIMITED WARRANTY.

THE PRODUCT IS DEEMED ACCEPTED BY LICENSEE, AND IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, LICENSOR FURTHER DISCLAIMS ALL WARRANTIES, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. LICENSOR DOES NOT WARRANT, GUARANTEE, OR MAKE ANY REPRESENTATIONS REGARDING THE PERFORMANCE, USE OR RESULTS OF THE USE OF THE PRODUCT IN TERMS OF CORRECTNESS, ACCURACY, RELIABILITY, CURRENTNESS, OR OTHERWISE. IN NO EVENT SHALL LICENSOR OR ITS SUPPLIERS BE LIABLE FOR ANY CONSEQUENTIAL, INCIDENTAL, DIRECT, SPECIAL, PUNITIVE, OR OTHER DAMAGES WHATSOEVER (INCLUDING WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR OTHER PECUNIARY LOSS) ARISING OUT OF THIS AGREEMENT OR THE USE OF OR INABILITY TO USE THE PRODUCT, EVEN IF LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. YOU ASSUME THE ENTIRE RISK AS TO RESULTS AND PERFORMANCE OF THE PRODUCT. IF THE PRODUCT IS DEFECTIVE, YOU, AND NOT LICENSOR OR ITS DEALERS, DISTRIBUTORS, AGENTS, SUPPLIERS, OR EMPLOYEES, ASSUME THE ENTIRE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

THE ABOVE IS THE ONLY WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, THAT IS MADE BY LICENSOR REGARDING THE PRODUCT. NO ORAL OR WRITTEN INFORMATION OR ADVICE GIVEN BY LICENSOR, ITS DEALERS, DISTRIBUTORS, AGENTS, SUPPLIERS, OR EMPLOYEES SHALL CREATE A WARRANTY, OR BIND LICENSOR, AND YOU MAY NOT RELY ON ANY SUCH INFORMATION OR ADVICE. THIS WARRANTY GIVES YOU SPECIFIC LEGAL RIGHTS. YOU MAY HAVE OTHER RIGHTS WHICH VARY FROM STATE TO STATE. NO LICENSOR DEALER, AGENT, SUPPLIER, OR EMPLOYEE IS AUTHORIZED TO MAKE ANY MODIFICATIONS, EXTENSIONS, OR ADDITIONS TO THIS WARRANTY. IF ANY MODIFICATIONS ARE MADE TO THE PRODUCT BY YOU OR IF YOU VIOLATE THE TERMS OF THIS AGREEMENT, THEN THIS WARRANTY SHALL IMMEDIATELY BE TERMINATED. THIS WARRANTY SHALL NOT APPLY IF THE PRODUCT IS USED ON OR IN CONJUNCTION WITH HARDWARE OR PRODUCT OTHER THAN THE UNMODIFIED VERSION OF HARDWARE AND PRODUCT WITH WHICH THE PRODUCT WAS DESIGNED TO BE USED AS DESCRIBED IN THE DOCUMENTATION.

8. TITLE.

Title, ownership rights, and intellectual property rights in the Product shall remain in Licensor and/or its suppliers. You understand that the Product is licensed and not sold to you. The Product is protected by the copyright laws and treaties. Title and related rights in the content accessed through the Product are the property of the applicable content owner and may be protected by applicable law. This License gives you no rights to such content.

9. SUPPORT AND MAINTENANCE.

Licensor offers no support (including technical support) or maintenance of this Product. Licensee, at its option, may negotiate for Support and Maintenance from Licensor and/or its suppliers through a separate agreement. Licensor may, at its option, publish on its website a list of Frequently Asked Questions (FAQ) concerning the Product without obligation to continue doing so or to maintain said list. Licensor may, at its option, offer and/or provide technical support or assistance for the Product without obligation to continue doing so.

10. LIMITATIONS ON LICENSOR'S OBLIGATIONS.

Licensee understands and agrees that Licensor may develop and market new or different computer programs which use part or all of the Product and which perform all of the functions performed by the Product. Nothing contained in this Agreement gives Licensee any rights with respect to such new or different computer programs.

11. TERMINATION.

The license will terminate automatically if you fail to comply with the limitations and restrictions described herein or if you are delinquent in making any payments for the Product or any sum due under this Agreement. On termination, you must destroy all copies of the Product. Licensor may also terminate this Agreement if you violate it. You must destroy all copies of the Product in your possession or control promptly upon termination. Upon Licensor's request, you must certify in writing that you have complied with your obligations under this Section and otherwise under this Agreement. Termination by Licensor will not limit any of its other rights or remedies under this Agreement or at law or in equity. Any provision of this Agreement that by its sense and context is intended to survive termination of this Agreement will survive termination.

12. LIMITATIONS ON LICENSOR'S LIABILITY AND UPON TIME TO SUE.

UNDER NO CIRCUMSTANCES AND UNDER NO LEGAL THEORY, TORT, CONTRACT, OR OTHERWISE, SHALL LICENSOR OR ITS SUPPLIERS OR RESELLERS BE LIABLE TO YOU OR ANY OTHER PERSON FOR ANY INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES OF ANY CHARACTER INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF GOODWILL, WORK STOPPAGE, COMPUTER FAILURE OR MALFUNCTION, OR ANY AND ALL OTHER COMMERCIAL DAMAGES OR LOSSES. IN NO EVENT WILL LICENSOR BE LIABLE FOR ANY DAMAGES IN EXCESS OF THE PRICE PAID FOR SUCH LICENSE, EVEN IF LICENSOR SHALL HAVE BEEN INFORMED OF THE POSSIBILITY OF SUCH DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY. THIS LIMITATION OF LIABILITY SHALL NOT APPLY TO LIABILITY FOR DEATH OR PERSONAL INJURY TO THE EXTENT APPLICABLE LAW PROHIBITS SUCH LIMITATION. FURTHERMORE, SOME STATES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES, SO THIS LIMITATION AND EXCLUSION MAY NOT APPLY TO YOU. NO ACTION, REGARDLESS OF FORM, ARISING OUT OF ANY OF THE TRANSACTIONS UNDER THIS AGREEMENT MAY BE BROUGHT BY LICENSEE MORE THAN ONE YEAR AFTER SUCH ACTION ACCRUED.

13. TRADEMARKS.

"Electronic Software Publishing Corporation", the Electronic Software Publishing Corporation logo, "Elsop", "LinkScan", the LinkScan logo, "LinkScan QuickCheck", "LinkScan Dispatch", "MailVet", and all other trademarks which identify the Licensed Program or the company are the trademarks, and in some jurisdictions may be registered trademarks, of the Electronic Software Publishing Corporation.

14. EXPORT CONTROLS.

You agree that none of the Product or underlying information or technology will be downloaded or otherwise exported or re-exported (i) into (or to a national or resident of) Cuba, Iraq, Libya, Federal Republic of Yugoslavia (Serbia and Montenegro, U.N. Protected Areas and areas of Republic of Bosnia and Herzegovina under the control of Bosnian Serb forces), North Korea, Iran, Syria or any other country to which the U.S. has embargoed goods; or (ii) to anyone on the U.S. Treasury Department's list of Specially Designated Nationals or the U.S. Commerce Department's Table of Deny Orders. You warrant and represent that neither the U.S.A. Bureau of Export Administration nor any other federal agency has suspended, revoked or denied your export privileges. By downloading or using the Product, you are agreeing to the foregoing and you are representing and warranting that you are not located in, under the control of, or a national or resident of any such country or on any such list.

In addition, if the licensed Product is identified as a not-for-export product (for example, in the registration process or in the installation process), then the following applies: Except for export to Canada for use in Canada by Canadian citizens, the Product and any underlying technology may not be exported outside the United States or to any foreign entity or "foreign person" as defined by U.S. government regulations, including without limitation, anyone who is not a citizen, national or lawful permanent resident of the United States. By downloading or using the Product, you are agreeing to the foregoing and you are warranting that you are not a "foreign person" or under the control of a foreign person.

15. ENTIRE AGREEMENT.

This Agreement constitutes the entire agreement between the parties in connection with the subject matter hereof and supersedes all prior and contemporaneous agreements, understandings, negotiations and discussions, whether oral or written, of the parties, and there are no warranties, representations and/or agreements between the parties in connection with the subject matter hereof except as specifically set forth or referred to herein.

16. GOVERNING LAW; SEVERABILITY.

This Agreement represents the complete agreement concerning this license and may be amended only by a writing executed by both parties. If any provision of this Agreement is held to be unenforceable, such provision shall be reformed only to the extent necessary to make it enforceable. This Agreement shall be governed by California law, without reference to conflicts of law principles. The application of the United Nations Convention on Contracts for the International Sale of Goods is expressly excluded. THE ACCEPTANCE OF ANY PURCHASE ORDER PLACED BY YOU IS EXPRESSLY MADE CONDITIONAL ON YOUR ASSENT TO THE TERMS SET FORTH HEREIN, AND NOT THOSE IN YOUR PURCHASE ORDER. Any suit to enforce the terms of this Agreement may be brought in either the United States District Court of the Northern District of California or the California Superior Court in and for the County of Santa Clara, as appropriate, and you consent to the jurisdiction and venue of such court. If either party brings any action to enforce any rights arising out of or relating to this Agreement (whether or not suit is filed), the prevailing party shall be entitled to recover its costs and expenses related to such action, including reasonable attorneys' fees except as provided under section 1: Grant of License. All terms of this Agreement which, by their nature, are intended to survive termination of this Agreement shall survive any such termination.

17. COMPLIANCE WITH THE LAW.

Licensee agrees that it will comply with all federal, state and local laws and regulations governing the use of the Product.

18. RETURN AND REFUND POLICY.

The licensor allows no returns and will make no refunds.

19. TAXES.

In addition to all license fees paid by Licensee in acquiring this license, Licensee shall pay or reimburse Licensor for all federal, state, local or other taxes not based on Licensor's net income or net worth, including, but not limited to, sales, use, value-added, privilege and property taxes, or amounts levied in lieu thereof, based on charges payable under this Agreement or based on the Product, its use or any services performed hereunder, whether such taxes are now or hereafter imposed under the authority of any federal, state, local or other taxing jurisdiction.

20. U.S. GOVERNMENT RESTRICTED RIGHTS.

Use, duplication or disclosure by an agency, agent, unit, or instrumentality of the United States Government is subject to restrictions set forth in subparagraphs (a) through (d) of the Commercial Computer-Restricted Rights clause at FAR 52.227-19 when applicable, or in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013, and in similar clauses in the NASA FAR Supplement. Contractor/manufacturer is Electronic Software Publishing Corporation, 1361 Shelby Creek Court, San Jose, CA 95120 USA.

License Version 7-1
Revision Date: March 24, 2000
© Copyright 1997-2000 Electronic Software Publishing Corporation (Elsop)

Single Document LinkScan Reference Manual
LinkScan Version 9.0
© Copyright 1997-2001 Electronic Software Publishing Corporation (Elsop)
LinkScan™ and Elsop™ are Trademarks of Electronic Software Publishing Corporation
