LinkScan Reference Manual | Section 30
We have added support for the Wireless Application Protocol (WAP) and Wireless Markup Language (WML). This allows LinkScan to validate wireless sites via an HTTP gateway. Typically, you will need to add some configuration commands to linkscan.cfg. For example:
Extraheader User-Agent: Nokia7110/1.0 (04.80)
Mimetypes text/vnd.wap.wml H
This will cause LinkScan to send an appropriate User-Agent header with each request and to parse/follow documents with a MIME/Content-Type of text/vnd.wap.wml.
We have added a new method for controlling the depth of a scan. The new Maxclicks command complements the existing Maxlevels command.
Whereas Maxlevels controls the depth of the scan based on an examination of the URL and the number of directory levels within it, the new Maxclicks command controls the depth of the scan based on the number of clicks required to reach the link from the starting (home) page.
The click level is normally incremented each time LinkScan follows a link. However, in order to more closely resemble real-world scenarios, the click level is not incremented when following links of this type:
Hence you may control the depth of a scan based on Maxclicks, Maxlevels or a combination of both.
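The difference between the two depth measures can be sketched in Python. This is an illustration only, not LinkScan's implementation: the function names, the sample link graph and the rule used to count directory levels are all assumptions, and the link types that do not increment the click level are not modelled.

```python
from collections import deque
from urllib.parse import urlparse


def directory_levels(url):
    """Count directory levels in the URL path (assumed counting rule)."""
    path = urlparse(url).path
    # Directories are the non-empty path segments before the final file name.
    segments = [s for s in path.split("/")[:-1] if s]
    return len(segments)


def click_depths(links, start):
    """Breadth-first search: fewest clicks needed to reach each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site: a deep URL that is only one click from the home page.
home = "http://www.example.com/"
links = {
    home: ["http://www.example.com/a/b/c/deep.html",
           "http://www.example.com/news.html"],
    "http://www.example.com/news.html": ["http://www.example.com/story.html"],
}
depths = click_depths(links, home)
```

In this sample, deep.html sits three directory levels down but one click from the home page, while story.html sits at directory level zero but two clicks away, which is why the two limits can usefully be combined.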
A number of webmasters have told us about a new and increasing problem with their external links. Users are finding that working (200 OK) links are suddenly pointing at pages with "inappropriate" (e.g. adult) content. This has become quite an issue with large numbers of domains changing hands or, in some cases, being hijacked through exploits in the Internet Domain Name System (DNS). We have experienced the problem ourselves.
We have, therefore, implemented a range of special profiling techniques that may be used to automate the detection of these situations without the need to manually inspect each link on a periodic basis. The profiling options include user written profiles, pre-configured profiles available on request, and integration with third party content filtering products and services such as firewalls and proxies. See the LinkScan Profiler for details. [Not available in LinkScan Workstation]
We have incorporated a new Problem Documents Report. This report provides a summary of documents which:
We have greatly enhanced LinkScan Dispatch which now includes options to create and/or e-mail a range of different reports. LinkScan Dispatch supports a completely new series of command-line switches. However, for existing users, backwards compatibility with the pre-9.0 options has been preserved. See LinkScan Dispatch.
To improve ease of use, we have renamed and reorganized some reports and provided more context-sensitive help.
We have made numerous other small changes and enhancements to the LinkScan reports. We highly recommend that existing users who use the command line reporting update their linkscan.rep file(s) based on the new template.
We have enhanced LinkScan to save and store the MIME/Content-Type associated with each internal link. These data are available via the Search Documents and Changed Documents Reports.
We have enhanced the Windows Graphical User Interface to provide more control over the "scope" of a scan based on the Onlyinclude and Onlyfollow commands. See screenshot.
We have added several new Status Codes. Errors generated via the Errordoc (redirect match) command are displayed with the 3000 Status Code to differentiate them from regular 404's. Similarly, errors generated via the Errorbody (body match) command are displayed with the 3001 Status Code.
The 3002 Status Code is used by the new LinkScan Profiler described above.
We have added the Excludecookie command to filter/reject specific cookies.
We have added the Proxymatch command to provide more flexibility for those with complex network environments that require the use of different proxy servers for different hosts/domains.
At LinkScan 8.2 we have consolidated several minor bug fixes and a large number of customer generated suggestions for improvements and enhancements. We thank all of those users who contributed suggestions. Some of the highlights include:
We have added a new Changed Document Report. This allows users to compare the summary data from two different scans of the same website/project. The report displays lists of new documents added, documents removed and documents changed. Document changes are detected based on one or more of the following data items: document size in bytes, document title, document date/time modified (if available) and/or additional user specified data collected from META tags as described below. Benefits include:
We have added an option which, when enabled, will allow users viewing any LinkScan Report to send a copy of that report to a specified e-mail address (in HTML or TEXT format). See Mailing LinkScan reports from a browser. This improves work flow; for example, a supervisor viewing a report of bad link(s) may rapidly mail it to someone else for action.
We have added two new reporting capabilities with forms -- Search Documents and Search Links. These may be used to perform arbitrary ad-hoc queries on the LinkScan Database with a flexible array of sort/select/display options. For example, one might use such a query to produce a report listing every document that contains one or more <FORM> tags.
This makes virtually the entire database searchable.
We have added a new control (Maxlevels) that provides a fast and easy way to configure limits on the depth of a scan.
We have added the ability to collect additional user specified data from each document scanned. Typically this is used to extract document attributes from META tags although the feature is not limited to META data. The data may also be manipulated via Perl Regular Expressions prior to storage in the LinkScan database (e.g. to normalize formatting). The collected data may also be post-processed by external programs to carry out more complex transformations. See How to Process Additional per-Document Data.
User data collected could include the name of a person responsible for a document or an expiration date by which a document must be reviewed or updated. This feature enables the user to integrate LinkScan with their work flow tools and procedures.
We have noticed that a significant proportion of web pages include vast amounts of totally redundant, bandwidth-consuming whitespace. In our view, many website operators have an opportunity to improve page load times and reduce their bandwidth cost. We have, therefore, enhanced LinkScan to report a summary of the Whitespace-Bytes versus Total-Bytes consumed during the course of a scan.
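As a rough illustration of the measurement, the following Python sketch counts whitespace bytes against total bytes for a single document. Which bytes LinkScan treats as whitespace is not stated here, so the set used below (space, tab, carriage return, newline) and the function name are assumptions.

```python
def whitespace_summary(document: bytes):
    """Return (whitespace_bytes, total_bytes) for one document."""
    # Assumed whitespace set: space, tab, carriage return, newline.
    whitespace = sum(1 for b in document if b in b" \t\r\n")
    return whitespace, len(document)


page = b"<p>\n    Hello\n</p>\n"
ws_bytes, total_bytes = whitespace_summary(page)
print(f"{ws_bytes} of {total_bytes} bytes are whitespace")
```

Summing the two counts over every document in a scan would give a site-wide ratio of the kind the report describes.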
We have added a summary of inline image data to the LinkScan QuickCheck reports. This report now displays just about everything that LinkScan knows about a given document.
We have introduced an option (Mapext) to include external links on the LinkScan SiteMap and TapMap.
We have made several small but significant adjustments to the low-level HTTP and HTTPS drivers for improved accuracy and greater performance. In particular, we have incorporated some improved timeout/retry algorithms to enhance accuracy and throughput on slower links. The handling of DNS timeouts has also been improved.
We have incorporated several improvements to the HTML and JavaScript parsers. These should benefit all users but the enhancements are especially significant on sites using IBM/Lotus Domino.
We have rewritten the Portable Document Format (PDF) drivers for improved accuracy and performance and to better handle the latest versions of the PDF file formats.
We have enhanced our MailVet technology to improve the speed and accuracy of the LinkScan active mailto: checking.
We have improved the speed at which all of the LinkScan reports are generated.
At LinkScan 8.1 we have consolidated several minor bug fixes and a large number of customer generated suggestions for improvements and enhancements. Although each individual change is relatively minor in scope, the aggregate of them all represents a significant improvement to the product. We thank all of those users who contributed suggestions and urge customers to install this greatly improved release at the earliest opportunity. In total, we have made approximately 60 changes and enhancements. Some of the highlights include:
Several enhancements to the LinkScan Reports for improved management of user preferences and system security, additional/improved cross-linking between various reports, and a number of improvements to the report layouts.
A number of new error checks and improved error messages.
Various improvements to the LinkScan Webserver.
Numerous improvements to LinkScan Dispatch including:
Various enhancements to our MailVet technology to improve the speed and accuracy of the active mailto link checking. See Active Validation of mailto: Links.
Various enhancements to LinkScan Excel -- including an option to import all META tags. Note: To use this feature, a scan must be completed with the Collectmeta option in linkscan.cfg enabled.
CPU times as well as wall clock times are recorded for each scan, in the file linkscan.dbg.
Somewhat simplified configuration of Orphaned Files checking.
Added ability to direct documents with specific MIME (Content-Type) headers to an appropriate interpreter (HTML, PDF, Shockwave/Flash and JavaScript options currently supported). For example, to check the contents of included JavaScript files use:
Mimetypes application/x-javascript J
Added ability to insert synthetic links into selected documents on-the-fly, for controlling test coverage on complex dynamic content. See: How to manipulate URLs on-the-fly for a discussion of the Substitute command and the new Insertlink command.
Various corrections, clarifications and improvements to the LinkScan Documentation.
We have made very substantial internal changes to improve the performance, scalability and reliability of LinkScan. These changes should result in significant storage savings with a (typical) 50 percent reduction in database size. Some of the changes establish new foundations on which other enhancements will be built over the coming months and years.
We have significantly enhanced the Windows Graphical User Interface.
On Unix Systems we have added a direct interface to the OpenSSL package for scanning sites that use the Secure Sockets Layer (SSL) or https://... protocol. See: Testing Secure Servers.
We have substantially restructured and rewritten the LinkScan documentation.
We have enhanced several of the LinkScan Reports.
We have introduced the first release of LinkScan Excel.
We have added several new options/commands that may be used to optimize performance when scanning very large (100,000 and more documents) websites. See How to scan very large sites.
We have included the new Noforms command. When enabled, this will prevent LinkScan from testing links found in <FORM ACTION=...> tags. Attempting to test those links without submitting some associated data values may lead to 500 Server Errors on many sites. In general, this indicates inadequate error checking and recovery in the target scripts, but we have nevertheless provided an option to avoid such errors cluttering the reports.
We have included a detailed audit trail of all cookie transactions processed during a scan. The log is maintained in the file .../LinkScan/Projectname/data/linkscan.red.
We have made the list of unsafe characters a user configurable option. This means, for example, that users may control whether or not the use of a backslash character in URLs generates a 911 Unsafe Character warning. Note that the use of a backslash instead of a forward slash is indeed unsafe but some sites use it anyway.
The LinkScan Recorder is a Windows application that interfaces with Microsoft Internet Explorer. It may be used to capture real web browsing sessions, such as a complex order entry sequence. The captured recording includes all of the data entered into any associated forms. LinkScan may then be configured to replay the recording on demand, validating every link on each form and results page in the sequence. See LinkScan Recorder.
We have greatly enhanced the LinkScan Import feature which now includes two separate functions:
Import Links: May be used to validate a simple list of URL's that is derived from some external source such as an SQL database or spreadsheet export.
Import Documents: May be used to validate a list of documents, including all of the links within each document. Such sequences may be generated with the LinkScan Recorder or derived from some other source. See Import Scanning.
We have enhanced LinkScan to parse and extract any hyperlinks embedded in Shockwave/Flash files.
We have enhanced LinkScan with the ability to add customized hyperlinks at various points throughout the reports. This provides a flexible means to integrate the LinkScan Reports with other applications. For example, these links may be configured to activate functions within a content management or other database management system.
Some web servers are configured in a manner that may mask serious errors from end users and link checkers alike. This typically arises when the server responds to an invalid request by delivering a user-friendly error page with a 200 OK status code rather than a 404 Not Found. In some cases, the server will issue a redirect to a custom error document such as:
http://www.example.com/notfound.html
In other cases, server-side application code will simply deliver a valid document that contains a description of the error or exception.
We have enhanced LinkScan with directives that may be used to force a 404 Not Found Error in either of these situations. For example:
In the former case, any links that result in a redirection to the URL "/notfound.html" will be reported as 404.
In the latter case, any links that return a document body with content matching the specified expression will be reported as 404.
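The two checks can be pictured with the Python sketch below. It is not LinkScan code: the function name is hypothetical, the body pattern is a made-up example, and the redirect pattern simply mirrors the notfound.html document mentioned above.

```python
import re

# Assumed patterns, in the spirit of a redirect match and a body match.
ERROR_REDIRECT = re.compile(r"notfound\.html$")
ERROR_BODY = re.compile(r"Sorry, the page you requested was not found")


def effective_status(status, redirect_url=None, body=""):
    """Report 404 when a response only looks successful."""
    # Former case: the server redirected to its custom error document.
    if redirect_url and ERROR_REDIRECT.search(redirect_url):
        return 404
    # Latter case: a 200 OK page whose body describes an error.
    if status == 200 and ERROR_BODY.search(body):
        return 404
    return status


print(effective_status(302, "http://www.example.com/notfound.html"))
print(effective_status(200, body="<h1>Welcome</h1>"))
```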
We have enhanced the link status information displayed on the LinkScan Reports. The LinkScan database now includes an additional extended status information field which is used to display supplementary information about certain link types.
We have incorporated additional locking protections such that multiple Projects may safely be scanned simultaneously. Note that any attempt to scan a Project that is currently being scanned by another user/process, will be refused.
However, we do urge some caution. Scanning multiple Projects in parallel may consume significant processor, memory and/or network resources. If the available system resources are saturated, the overall impact on LinkScan's throughput may prove negative. Users should be prepared to monitor system resources using the available tools applicable to the operating system and make adjustments if necessary.
We have enhanced LinkScan for Windows (not Unix) to automatically and transparently support the Secure Sockets Layer (SSL), that is, URL's that start with https://.... Note that you must have Microsoft Internet Explorer 5.0 or later installed on your computer. On Unix systems, you must configure a suitable proxy server -- see: Testing Secure Servers with LinkScan.
We have enhanced the various LinkScan Menus and Reports with a completely new "look and feel". Major improvements include a new Critical Errors Report, a more comprehensive Summary Statistics Report, context-sensitive help, and more convenient preferences/options. All reports are available in Rich, Standard or Text formats. The Rich format makes extensive use of HTML tables which produce an easy to use layout. However, all major browsers tend to encounter memory problems when rendering very large tables with many thousands of cells. If a selected report is likely to exceed 1000 rows, LinkScan will automatically use Standard format to avoid these problems.
We have completely eliminated the dependency on the operating system sort utility.
We have improved still further LinkScan's analysis of JavaScript and ASP constructs and incorporated several significant performance enhancements.
We have added a new check and Status Code for <A HREF=...> tags with no corresponding </A> tag. This may be enabled or disabled with the Closeatag option in linkscan.cfg.
We have added a new Followext option to linkscan.cfg. If enabled, LinkScan will attempt to follow redirections when testing external links (versus simply noting the redirection).
We have added a new Errordoc option to linkscan.cfg. This feature is useful when scanning servers that automatically redirect bad requests to a Custom Error Document. If such a page is served with a 200 OK Status, serious errors may be masked. A command such as:
Errordoc notfound\.html$
will force LinkScan to report a 404 Not Found error for any URL that is redirected to a URL that matches the pattern specified with the Errordoc parameter.
We have enhanced the Substitute command. This command is used to manipulate URL's as they are processed by LinkScan. We now support separate Substituteraw and Substitute commands. The former operates on URL's as they are extracted from the raw HTML tags. The latter operates on URL's after they have been normalized relative to the then current base URL.
We have enhanced the Substitute command (but not Substituteraw) with the special token !U. For example:
Substitute (.*) !U$1
This will cause LinkScan to decode any %-encoding within the URL. For example:
Substitute cgi-bin/redirect\?.*?&Link=([^&]+).* XX$1
Substitute XX(.*) !U$1
Hence a link to:
cgi-bin/redirect?Type=1&Link=http%3A%2F%2Fwww%2Eexample%2Ecom%2F
will be translated to:
XXhttp%3A%2F%2Fwww%2Eexample%2Ecom%2F
and then to:
http://www.example.com/
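The same two-step translation can be reproduced in Python as a sanity check. This is only an analogy: Python's re.sub and urllib.parse.unquote stand in for the Substitute command and the !U token, and the variable names are illustrative.

```python
import re
from urllib.parse import unquote

link = "cgi-bin/redirect?Type=1&Link=http%3A%2F%2Fwww%2Eexample%2Ecom%2F"

# Step 1: keep only the value of the Link parameter, marked with XX.
step1 = re.sub(r"cgi-bin/redirect\?.*?&Link=([^&]+).*", r"XX\1", link)

# Step 2: strip the XX marker and decode the %-encoding (the role of !U).
match = re.fullmatch(r"XX(.*)", step1)
step2 = unquote(match.group(1)) if match else step1

print(step1)
print(step2)
```

The XX marker exists only so that the second rule applies to links rewritten by the first rule and to nothing else.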
We have added a new Tagonce command to linkscan.cfg. If enabled, LinkScan will process only once any link that matches the specified pattern. All subsequent references to that link will be completely ignored. This option may be used to eliminate excessive storage associated with tracking thousands of references to the same frequently used URL, for example, links associated with toolbars and other navigation aids that are included in every document on a large website.
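The effect on stored references can be sketched in Python as follows. The function name, the (referrer, target) tuple layout and the sample pattern are assumptions made for illustration; they do not describe the LinkScan database.

```python
import re


def filter_tagonce(references, pattern):
    """Keep the first reference to each matching link; drop the repeats."""
    regex = re.compile(pattern)
    seen = set()
    kept = []
    for referrer, target in references:
        if regex.search(target):
            if target in seen:
                continue  # later references to this link are ignored
            seen.add(target)
        kept.append((referrer, target))
    return kept


# Hypothetical toolbar link that appears in every document.
references = [
    ("/a.html", "/toolbar/home.gif"),
    ("/a.html", "/products.html"),
    ("/b.html", "/toolbar/home.gif"),
    ("/b.html", "/products.html"),
]
kept = filter_tagonce(references, r"^/toolbar/")
```

Only the toolbar link is reduced to a single reference; links that do not match the pattern are still tracked from every referring document.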
We have incorporated the ability to check for Orphaned Files on remote servers without the requirement to use NFS or a local mirror copy of the target website. We supply a script which may be executed on the remote machine to collect a recursive file listing that may subsequently be imported into LinkScan in lieu of direct file system access. See File System Scanning.
We have enhanced LinkScan Enterprise so that two or more hosts may be scanned within a single Project. For details see LinkScan Enterprise Extensions. This capability is not available in LinkScan Workstation, Server or ServerPro.
We have simplified the testing of password protected sites and links. The Auth command may be configured with a blank Realm. LinkScan will use the specified username and password for any Realm on the specified server. You do not need to specify a Realm unless you need LinkScan to use multiple username and password combinations for different Realms on the same server. For example:
Auth www.example.com "" username password
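One way to picture the lookup is the Python sketch below. It assumes that an entry with an exact Realm takes precedence over a blank-Realm entry for the same server; that precedence, the function name and the sample entries are assumptions, not documented behaviour.

```python
def find_credentials(auth_entries, host, realm):
    """Pick (username, password): exact Realm first, then blank Realm."""
    fallback = None
    for entry_host, entry_realm, username, password in auth_entries:
        if entry_host != host:
            continue
        if entry_realm == realm:
            return username, password
        if entry_realm == "" and fallback is None:
            fallback = (username, password)
    return fallback


# Hypothetical entries: one blank Realm, one specific Realm.
entries = [
    ("www.example.com", "", "username", "password"),
    ("www.example.com", "Admin", "root", "secret"),
]
print(find_credentials(entries, "www.example.com", "Members"))
```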
We have enhanced support for Cookies. LinkScan accepts all cookies received during a scan and tracks them in a cookie jar. The cookie jar may be initialized with additional cookies by using the existing Cookie command in linkscan.cfg.
We have enhanced LinkScan to optionally check all <IMG SRC> tags for ALT, HEIGHT and/or WIDTH attributes. To enable this feature, add the following command to the linkscan.cfg file:
Imgtags = AHW # Flag all IMG SRC tags without Alt, Height, Width
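A comparable check can be written with Python's standard html.parser module. This sketch only illustrates the idea of flagging IMG tags that lack ALT, HEIGHT or WIDTH; the class name and the sample markup are hypothetical and it is not how LinkScan parses documents.

```python
from html.parser import HTMLParser


class ImgAttributeChecker(HTMLParser):
    """Collect IMG tags that lack any of the required attributes."""

    def __init__(self, required=("alt", "height", "width")):
        super().__init__()
        self.required = required
        self.problems = []  # list of (src, [missing attribute names])

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        present = {name for name, _ in attrs}
        missing = [name for name in self.required if name not in present]
        if missing:
            self.problems.append((dict(attrs).get("src"), missing))


checker = ImgAttributeChecker()
checker.feed('<IMG SRC="logo.gif" ALT="Logo">'
             '<img src="ok.gif" alt="" height="10" width="10">')
print(checker.problems)
```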
We have implemented additional controls which may be used to prevent unnecessary scanning of very large sites, especially those using dynamic content. The new Taglimit command may be used to limit the number of documents scanned that match a specified pattern. For example, the following command may be added to linkscan.cfg:
Taglimit scripts/DatabaseLookup.asp 20
This will limit the number of times that LinkScan will probe the DatabaseLookup.asp script with different query parameters. In this case, LinkScan will probe only the first 20 references to this script. Note that Taglimit and Maxcgi are both checked for each document.
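The counting behaviour can be sketched in Python as below. The class and method names are hypothetical, and the sketch covers only the Taglimit-style counter, not the interaction with Maxcgi.

```python
import re


class TagLimit:
    """Allow at most `limit` documents whose URL matches `pattern`."""

    def __init__(self, pattern, limit):
        self.regex = re.compile(pattern)
        self.limit = limit
        self.count = 0

    def allow(self, url):
        if not self.regex.search(url):
            return True  # the limit applies only to matching URLs
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


limit = TagLimit(r"scripts/DatabaseLookup.asp", 20)
urls = [f"http://www.example.com/scripts/DatabaseLookup.asp?id={n}"
        for n in range(25)]
scanned = [u for u in urls if limit.allow(u)]
print(len(scanned))
```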
We have further refined the default JavaScript pattern matching algorithms to improve coverage and reduce false matches.
We have made several enhancements to some of the LinkScan Reports including a complete rewrite of the Selected Status Codes Report.
We have enhanced the Summary Detail Report with a completely new Slowest Pages First option to help webmasters examine page load times especially over slow (i.e. dial-up) connections.
We have improved the algorithms for the identification of JavaScript embedded hyperlinks to increase the percentage of links found and reduce false positives.
We have made several other small improvements especially relating to reliability under Windows 95/98.
LinkScan users with Unix systems may now scan remote systems via HTTP. Please see the LinkScan End-User License Agreement for permitted use. The following command will initiate such a scan:
perl linkscan.pl -remote http://www.example.com/ -project example
We have enhanced LinkScan with support for JavaScript. Links may be extracted from JavaScript code using (customizable) pattern matching techniques.
We have added the capability to specify additional URL's that must be scanned, whether or not LinkScan encounters links to those URL's in other documents. This includes the ability for LinkScan to submit specific forms with specified data values. Forms may be submitted using either the GET or POST methods. See How to Submit Forms.
We have included our MailVet technology that can verify, with a high degree of accuracy, whether an e-mail address will or will not bounce mail. MailVet will probe up to 500 unique "mailto" tags without actually sending any mail. See Active validation of mailto: links.
We have provided additional controls to specify document ownership. In particular, owner names may be extracted from document META tags and subsequently manipulated via Regular Expressions.
We have added limited support for ldap://... links. LinkScan will attempt to establish a connection to Port 389 of the specified server. It does not currently validate the query and the status will be reported as an Advisory; "LDAP Server Connected - Query Not Checked".
We have added additional support for SSL (https://) secure server proxies.
We have provided powerful facilities to manipulate specific links via Regular Expressions. This feature may, for example, be used to remove or manipulate SESSIONID's that are added dynamically by your HTTP server. It can also be helpful in controlling test conditions for sites that use mainly dynamic content.
We have enhanced LinkScan with the ability to import a simple list of links for validation. This feature may be used to validate large numbers of links that have, for example, been exported from a database management system or other application program.
We have simplified the flexible (but confusing) array of options associated with LinkScan/QuickCheck. QuickCheck will now always attempt to retrieve the page status information from an existing LinkScan database (very fast). If this fails, QuickCheck will fetch the document via HTTP and validate the links in real-time (slower). When the results are based on the database, an option is provided to perform a new real-time check. In addition, QuickCheck will warn the user if the date-time-modified stamp on the source file is later than the date-time-modified stamp on the database. This alerts the user to the fact that the database status may be out of date.
We have enhanced LinkScan/QuickCheck to display the HTTP Request and Response Headers associated with document retrieval.
We have improved the performance of DNS lookups associated with all HTTP requests. This may cause problems on a very small number of installations (as far as we have been able to tell, systems running certain older Linux distributions). This problem normally presents as a series of 900 (DNS), 903 (Timeout) or 999 (Unknown) errors or, rarely, as a core dump. In the unlikely event that you experience these symptoms, simply add the following entry to linkscan.sys:
Nodnsalarm = 1
We have greatly improved the support for validating hyperlinks embedded in Adobe Portable Document Format (PDF) documents. To enable this feature, you must set the following parameter in linkscan.cfg:
Pdffiles = pdf
We have enhanced LinkScan to recognize and validate links of the form:
<script src="foo">
We have added support for the special NULL token in the Htmlfiles parameter. This may be used to tell LinkScan to process files with no file extension as if they were HTML documents.
We have changed LinkScan so that it now assumes there is an implied <a name="top"></a> in each HTML document. This means that all references to <a href = "#top"> are considered valid, consistent with all common web browsers.
We have improved LinkScan's processing of references containing %encoded characters.
We have enhanced LinkScan with a new Extraheader command. Adding this command to linkscan.cfg will force LinkScan to send the additional header with each HTTP request. For example, to set a preferred language, use:
Extraheader = Accept-Language: en
We have enhanced LinkScan to prevent simple HTML errors resulting in the creation of databases for phantom Owners. For example, a hyperlink with a missing "http://" such as:
<a href="www.example.com">
will no longer result in the creation of a "www.example.com" Owner.
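The underlying HTML error is easy to demonstrate with Python's standard urllib.parse.urljoin: without a scheme, the href is resolved as a relative path rather than as a host name. The base URL below is a hypothetical referring page, and the sketch says nothing about how LinkScan assigns Owners.

```python
from urllib.parse import urljoin

# Hypothetical page containing <a href="www.example.com">.
base = "http://www.example.org/docs/index.html"

resolved = urljoin(base, "www.example.com")
print(resolved)  # http://www.example.org/docs/www.example.com
```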
We have enhanced LinkScan so that the following linkscan.sys parameters may be overridden with the per-Project linkscan.cfg files:
LinkScan 6.0 includes some significant changes to the scanning modules. For Windows users:
These changes eliminate prior restrictions due to limitations of the Perl implementation for Windows and can greatly improve performance.
For Unix users:
The Graphical User Interface supplied with LinkScan for Windows incorporates numerous enhancements to simplify installation and configuration.
LinkScan for Windows includes a basic HTTP server, the LinkScan WebServer. Users may install the LinkScan WebServer automatically or elect to integrate LinkScan with an existing HTTP server such as Apache or Microsoft IIS.
Existing LinkScan users should note that the configuration file formats have changed significantly at LinkScan 5.5 to simplify system administration and maintenance. We have supplied a tool to automate the conversion of your existing configuration.
The configuration file format changes are summarized below:
The file linkscan.mas has been simplified. This file now contains a simple list of configured Project directories. Project Descriptions are now stored in the corresponding linkscan.cfg file.
The file linkscan.usr has been eliminated. These options, used to provide access controls to the LinkScan CGI scripts, have been integrated into linkscan.sys.
The file linkscan.ign has been eliminated. The LinkScan customization commands are now stored in the file linkscan.cfg.
The file linkscan.alt has been eliminated. The SiteMap customization commands are now stored in the file linkscan.cfg.
The linkscan.cfg templates have been "normalized". A global linkscan.cfg is always required in the main LinkScan directory. The settings in this file establish defaults for all configured Projects. The project-specific linkscan.cfg files in the individual project directories have been greatly simplified with far fewer items to configure. However, any default setting in the global linkscan.cfg file may be overridden by pasting the appropriate command into the linkscan.cfg file for an individual Project.
We have found that these changes greatly simplify system configuration and administration in complex multi-Project scenarios. The automatic conversion script will attempt to normalize the global and project-specific linkscan.cfg files. However, users may find they can achieve further simplification with a few minutes of manual inspection and editing.
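The layering of global defaults and per-Project overrides can be pictured with the short Python sketch below. The setting names are taken from commands mentioned elsewhere in this section, but the values, the dictionary representation and the function name are hypothetical.

```python
def effective_settings(global_cfg, project_cfg):
    """Project settings override the global defaults, key by key."""
    merged = dict(global_cfg)   # start from the global linkscan.cfg
    merged.update(project_cfg)  # then apply the Project's overrides
    return merged


# Hypothetical values for illustration only.
global_cfg = {"Maxlevels": "10", "Imgtags": "AHW", "Pdffiles": "pdf"}
project_cfg = {"Maxlevels": "3"}

settings = effective_settings(global_cfg, project_cfg)
print(settings)
```

A Project file therefore only needs to list the settings that differ from the global defaults.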
LinkScan 5.4 is primarily a maintenance release that consolidates several minor bug fixes and enhancements. It includes changes for the new LinkScan Server and LinkScan Workstation products as well as infrastructure to support new upcoming enhancements.
At LinkScan 5.3 we have improved the processing of Server Side Include (SSI) tags when using File System navigation. SSI Include tags are fully expanded by LinkScan provided that Expandssi is enabled in linkscan.cfg. SSI tags that require scripts to be executed (CGI/EXEC) are not processed. When using HTTP Navigation, all SSI's (including executables) are processed by the HTTP server.
At LinkScan 5.3 you may optionally tell LinkScan to check your HTTP server access logs and include the per-document page impressions on the SiteMap reports. To enable this feature, be sure to set the Httpdlogfile parameter in linkscan.cfg.
At LinkScan 5.3, we have incorporated an audit trail of site scans. Each execution of linkscan.pl will append a record to the file .../linkscan/project_name/data/linkscan.sum. This tab delimited file may be imported into spreadsheets and other applications for management reports.
At LinkScan 5.3, when scanning via HTTP, LinkScan can submit an arbitrary cookie to your server. This makes it easier to validate those sites that use Cookie based user authentication schemes.
We have added support for the Onlyorphans command in linkscan.cfg to provide finer control over which directories on your server should and should not be checked for orphaned files.
We have made several cosmetic improvements to the SiteMap and TapMap reports.
We have made several small improvements to the treatment of pathnames containing non-standard (e.g. %encoded) characters.
We have inserted code to detect/correct several common configuration errors.
At LinkScan 5.2 we have improved HTTP navigation (the Execute command) for validating dynamic content (CGI scripts, Server Side Includes etc.), enhanced several of the LinkScan Reports and added some completely new reporting options. Some of the specific enhancements include:
The LinkScan Reports no longer require the use of Cookies for storing individual user preferences. The system will use cookies if available - otherwise it will maintain current settings by passing them via the URL. This avoids random problems that some users have reported with certain browser installations.
The Summary/Detail Report has been enhanced with an option to display all documents older than "N" days.
The Summary/Detail Report has been enhanced with an option to sort the documents by the number of "Inline Bytes". The Byte Count includes the document itself, any inline images (<img src> but not <img lowsrc> tags), background images and image buttons. Each unique image is only counted once - we assume that the client will cache multiple references to the same image within the same document. In-line image references to remote servers are also counted (assuming LinkScan can reach them via HTTP and that the server will return a size header without having to download the entire file).
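The counting rule can be sketched in Python as follows. The function name, the argument layout and the sample sizes are assumptions; images whose size is unknown are simply skipped here, which is one possible reading of the rule for remote images.

```python
def inline_bytes(document_size, image_refs, image_sizes):
    """Document size plus each unique inline image, counted once."""
    unique = set(image_refs)  # repeated references are assumed cached
    return document_size + sum(
        image_sizes[url] for url in unique if url in image_sizes
    )


# Hypothetical page: logo.gif is referenced twice but counted once.
refs = ["logo.gif", "bg.gif", "logo.gif", "http://remote.example.com/ad.gif"]
sizes = {"logo.gif": 1200, "bg.gif": 800}  # remote size was not returned

total = inline_bytes(5000, refs, sizes)
print(total)
```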
The Summary Statistics Report displays separate tables for Internal and External links.
The Summary Statistics Report error counts are hyperlinked to the corresponding Detailed Report.
The All Pages Linking Report displays separate tables for Links To: and Links From:.
We have added the new Redirections Report to summarize all local redirections including the missing "/" on directory references, <META HTTP-EQUIV REFRESH> tags and actual HTTP redirects.
Several Reports provide for Include and Exclude expressions that may be matched on Referer or Target. Include/Exclude expressions may now be matched on Referer, Target or either.
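As a rough illustration of matching on Referer, Target or either, here is a short Python sketch. The function name, the `match_on` parameter, and the use of regular expressions are assumptions for illustration; the actual report options may behave differently.

```python
import re


def keep_link(referer, target, include=None, exclude=None, match_on="either"):
    """Decide whether a (referer, target) pair should appear in a report.

    match_on selects which field(s) the expressions are tested against:
    "referer", "target", or "either".
    """
    fields = {
        "referer": [referer],
        "target": [target],
        "either": [referer, target],
    }[match_on]

    def hit(expr):
        return any(re.search(expr, value) for value in fields)

    if include and not hit(include):
        return False  # an Include expression was given but nothing matched
    if exclude and hit(exclude):
        return False  # an Exclude expression matched
    return True


# Exclude GIF targets only when matching on the Target field.
print(keep_link("/a.html", "/img/x.gif", exclude=r"\.gif$", match_on="target"))
print(keep_link("/a.html", "/img/x.gif", exclude=r"\.gif$", match_on="referer"))
```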
When scanning for Orphaned Files, users may control the depth of the scan in terms of directory levels with the new Maxdirlevels configuration option in linkscan.cfg.
We have added the Noorphans command option to linkscan.cfg. This will exclude all files matching the specified expression from the Orphans Report without affecting any other Reports.
We have added the new Autohttp configuration command to linkscan.cfg. When navigating the Website via File System navigation, LinkScan can automatically attempt HTTP access when file system access fails to locate a specific file. This may be used to eliminate the requirement to configure server aliases and redirections but with some loss of performance. Note: file system access is typically 5 to 10 times faster than HTTP access.
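The fallback idea behind Autohttp can be sketched in Python as follows. This is only an illustration of "try the file system, then try HTTP"; the function `read_resource`, its parameters, and the injectable `http_get` callable are hypothetical and are not part of LinkScan.

```python
import os
import tempfile
import urllib.request


def read_resource(path, docroot, base_url, http_get=None):
    """Try the file system first; fall back to HTTP if the file is not found."""
    local = os.path.join(docroot, path.lstrip("/"))
    try:
        with open(local, "rb") as fh:
            return "file", fh.read()
    except OSError:
        pass  # not reachable through the file system; ask the server instead
    if http_get is None:
        http_get = lambda url: urllib.request.urlopen(url, timeout=30).read()
    return "http", http_get(base_url.rstrip("/") + "/" + path.lstrip("/"))


# Demonstration with a temporary document root and a stub HTTP fetcher,
# so that no network access is needed.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "index.html"), "wb") as fh:
        fh.write(b"<html>local</html>")
    fake_get = lambda url: b"fetched " + url.encode()
    found = read_resource("/index.html", root, "http://www.example.com", fake_get)
    missing = read_resource("/cgi/alias.html", root, "http://www.example.com", fake_get)

print(found[0], missing[0])
```

The HTTP path is only taken for files that the file system cannot supply, which keeps most of the scan on the faster access method.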
We have improved the detection of, and recovery from, several rare exception conditions. Additional diagnostic capabilities have been incorporated to facilitate problem investigation and resolution in conjunction with Elsop's Technical Support personnel.
LinkScan 5.0 was a major new release. At LinkScan 5.1 we have consolidated several minor bug fixes and a number of improvements designed to further simplify LinkScan administration. The following items are worthy of note:
We have improved the default placement of output files from command-line generated reports (linkscan.cgi and dispatch.pl). Users must define the pathname to the default directory in the file linkscan.sys with the Reportsdir setting.
Some servers require that the LinkScan CGI scripts be installed in a special directory (often cgi-bin). In these situations the scripts need to know where to find the remainder of the LinkScan files. In the past, this was achieved by setting a special variable ($LS::Lsdir) in the header of each script. At LinkScan 5.1, we have eliminated that special variable and the full pathname to the LinkScan directory must be defined in the hidden file called .linkscan. We have updated the LinkScan Configurator accordingly to make this change transparent to users installing LinkScan via that method.
We have enhanced the SiteMap customization features to make it easier to include or exclude different files from the LinkScan SiteMap and TapMap.
We have enhanced LinkScan to validate URLs contained within drop-down lists.
We have improved the error detection and recovery logic associated with various system interfaces to ensure that any configuration errors or exceptions are more clearly detected and reported.
We have significantly reduced LinkScan's virtual memory usage on large web sites. Virtual memory usage will depend to some extent on the Operating System, Perl version, malloc() implementation and the nature of the site being scanned. However, in studies, we have found that 1 MByte of virtual memory per 1,000 HTML documents is a reasonable rule-of-thumb. (This compares with 5-10 MBytes per 1,000 documents at LinkScan 3.x/4.x).
We have made many other changes to the internal code and data structures to improve performance, reliability and maintainability as well as providing a platform for future enhancements.
The previous implementation of multiple Projects has been changed. The new model introduces several new concepts which are defined below:
A Project is defined as a distinct LinkScan configuration. In general, you will only need to create one such configuration for each domain or virtual host on your server. You may optionally create multiple configurations for a single domain or virtual host. Only LinkScan Enterprise includes the ability to scan multiple hosts within a single Project.
Within a given Project you may define multiple Owners. Each file within the Project may be assigned to one of an arbitrary list of Owners by any or all of the following means:
LinkScan creates (mainly) separate databases for each Owner. This facilitates user-selective queries and greatly improves performance. By default, LinkScan also creates an All Owners database for each Project.
Usernames are used to:
By default, LinkScan will set the default Owner selection to the current Username.
We have enhanced the LinkScan SiteMap and TapMap. SiteMaps and TapMaps based on Link Ordering are provided for each Project. In addition, SiteMaps and TapMaps based on Directory Structure are provided for each Project and each Owner within that Project.
Orphaned File listings have been removed from all of the previous reports and we have added a new Orphaned Files Report to the Main Menu.
We have enhanced the All Pages Linking To ... Report. In previous versions you could only view the first "N" referring pages where "N" was limited to the Maxgoodint setting in linkscan.cfg. From the Summary/Detail Overview you may now select a complete list of referring pages.
We have enhanced many other reports with new and more consistent options including:
We have also improved the formatting options. Reports may be created in any of the following formats:
We have similarly enhanced the command line reporting options. The linkscan.rep file format has been extended and you may now define specific default parameters for each report type.
We have updated and improved all of the LinkScan documentation and added the LinkScan Quick Reference Card.
We have provided the capability to relocate the LinkScan documentation and images directory to any URL on your server. You may also control what files the [Help] and [Status Code] hyperlinks on the reports will link to so that you can integrate local site-specific documentation more easily.
We have made several small error corrections and numerous other minor enhancements in response to customer feedback.
At LinkScan 4.2, we have focused on enhancements to the various reporting modules with both new and more consistent options.
We made the new Summary --> Detail Report the default selection with options to sort the report (ascending or descending) on the Number of Errors in the document, Document URL, or Document Age. It includes hyperlinks to LinkScan/QuickCheck which may be used to display all of the potential problems with a selected document.
We improved LinkScan/QuickCheck with many new features including Simple and Advanced Options Menus and the ability to configure default options for it in linkscan.sys.
QuickCheck "remembers" individual user preferences by setting a Cookie in the user's browser.
We have also added Source Code Line Numbers to the LinkScan reports where it will be useful in diagnosing and correcting errors in a document.
In addition, QuickCheck integrates with Weblint. Weblint performs rigorous HTML syntax checking of the source document. This optional feature may be used to show all of the HTML syntax errors and broken links in a single report together with the HTML source code.
The menus for the various LinkScan CGI scripts may be customized by creating the files linkhead.txt and linkfoot.txt in the LinkScan directory.
When using custom headers and footers with SiteMap and TapMap, LinkScan displays a discreet version stamp and copyright notice at the bottom of each page.
The LinkScan documentation has been restructured and supplemented with a new LinkScan User Guide. This new guide is directed at the needs of Content Managers and Developers. The LinkScan Reference Manual (this document) is directed at the needs of Systems Administration personnel.
We added significant performance and accuracy enhancements when validating FTP links.
We added greater flexibility when creating and configuring multiple Projects.
We added a "-quiet" option to allow for more succinct progress displays during scanning. LinkScan also displays a total error count on completion of a scan.
We fixed several minor bugs and incorporated numerous other small changes requested by customers.
The following changes and enhancements were incorporated in LinkScan version 4.1:
LinkScan 4.1 is significantly faster at scanning the internal links. In tests, CPU usage was reduced by 30-50 percent
Added LinkScan/QuickCheck
Added the ability to validate FTP links. The FTP protocol is older and less consistently implemented than HTTP. You may, therefore, find that LinkScan produces some false errors when checking links to certain servers. If you discover any such examples, please E-mail the URL to <ftp@elsop.com> and we will seek to address the issue in the next release
Added syntax checking of mailto links. LinkScan does not probe or send E-mail to those destinations
Added the "All Pages Linking To ..." Report to the Main Menu of reporting options. This report helps webmasters quickly identify the impact of removing a document or file by listing all of the pages that link to it
Added support for server-side image maps
Added support for the HTTP Proxy-Authenticate feature
Added the additional status code Location Header Not Absolute
Added the additional status code URL Contains Unsafe Character
Numerous enhancements to LinkScan/Dispatch including the addition of the Defaultowner, Mailalias and Ownertags commands to linkscan.cfg. The dispatch.cfg file has been eliminated and those parameters are now defined in linkscan.sys/linkscan.cfg
Numerous enhancements to the LinkScan Configurator
Several minor bug fixes and improvements
The following changes and enhancements were incorporated in LinkScan version 4.0:
Added the LinkScan/Dispatch module
Added the Indexoptions directive and the ability for LinkScan to create virtual pages based on a directory listing if no default page exists in that directory
Added the Statuscode directive and the ability to customize the severity of any or all LinkScan Error and Status Codes
Several minor bug fixes and improvements
The following changes and enhancements were incorporated in LinkScan version 3.2:
The LinkScan Configurator will copy CGI files to a 'cgi-bin' directory and update the '$Lsdir' parameter automatically.
LinkScan automatically creates a template for new Projects.
Added new 'Noprojectlist' directive to linkscan.sys file.
Added new 'Hostalias' directive to linkscan.cfg file for use with servers that have multiple identities.
LinkScan database is created in a temporary working directory so that previous reports remain available during scanning
Added new !HOME expression to 'Alias' directive in linkscan.cfg.
Added support for a new Global linkscan.cfg file
Several minor bug fixes and improvements
The following changes and enhancements were incorporated in LinkScan version 3.1:
Added the ability to check links embedded within Adobe PDF files. To enable this capability, simply add the 'pdf' suffix to the list of Pdffiles in linkscan.cfg
LinkScan now checks <a name=...> tags in documents that are defined as 'NoFollow'.
Enhanced TapMap such that users can create hyperlinks from regular documents to a specific TapMap at the appropriate position and level.
Added specific support for the <!--#echo var="DOCUMENT_URI" --> Server Side Include
The LinkScan Configurator automatically updates the "#!/usr/local/bin/perl" headers in all of the LinkScan executable files
Added a case-sensitive search option to the LinkScan History Report
Added new Hidelinkprefix option to linkscan.cfg.
Several minor bug fixes and improvements
The following changes and enhancements were incorporated in LinkScan version 3.0:
Redesigned Multi-site Manager for simplified configuration management.
New reporting option to display full system configuration parameters
Significant performance improvements (CPU time and memory) to the LinkScan Reports - linkscan.cgi
Overview by Web Page Report now includes a hyperlink to an Error Report for each page
Various new controls added to control the frequency with which external links are tested.
Randomized the order in which external links are tested to avoid load peaks on remote servers
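To show why randomizing helps, here is a small Python sketch. It is an illustration only: the function name and the optional `seed` parameter are assumptions, and LinkScan's own ordering logic is not documented here.

```python
import random


def shuffled_links(links, seed=None):
    """Return the links in random order without modifying the input list."""
    rng = random.Random(seed)  # a seed is only useful for repeatable tests
    order = list(links)
    rng.shuffle(order)
    return order


# In discovery order, links to the same server tend to sit next to each
# other, so that server would receive a burst of consecutive requests.
external = [
    "http://a.example/1.html",
    "http://a.example/2.html",
    "http://a.example/3.html",
    "http://b.example/1.html",
    "http://b.example/2.html",
    "http://c.example/1.html",
]

for url in shuffled_links(external):
    print(url)  # requests to any one host are now spread through the run
```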
Added controls to automatically purge/expire the History file, linkscan.hst
The file linkscan.red now includes a listing of the URLs for all pages on your site for easy submission to search engines. Infoseek will accept an E-mail submission containing all the links on your website. In a test submission of 313 pages for one of our websites, Infoseek indexed about 280 of them in about 10 days.
The Noproxy option was changed to work with a partial (versus exact) match.
Improved the Multi-Site Manager and provided for the definition of a default configuration.
<img src=...> tags within <input....> tags are now tested correctly
Added option to disable the TapMap options.
Various minor improvements to the SiteMap/TapMap HTML tags including additional optimization for the Lynx browser family
Several minor bug fixes
The following changes and enhancements were incorporated at LinkScan version 2.1:
Added the ability to emulate server aliases and redirections.
Added the ability to selectively execute CGI scripts and Server Side Includes, parse their output and validate any links that are generated.
Redesigned the capability for validating links to pages that require authentication. Username/password combinations are defined on the basis of server and "realm" rather than specific URL.
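Keying credentials on server and realm, rather than on individual URLs, might look like the following Python sketch. The table layout, the function names, and the sample username and password are hypothetical and do not reflect LinkScan's configuration syntax.

```python
import re
from urllib.parse import urlsplit

# Hypothetical credential table keyed by (host, realm) instead of by URL.
CREDENTIALS = {
    ("www.example.com", "Staff Area"): ("alice", "s3cret"),
}


def realm_from_challenge(header):
    """Extract the realm from a WWW-Authenticate header value."""
    match = re.search(r'realm="([^"]*)"', header)
    return match.group(1) if match else None


def credentials_for(url, realm, table=CREDENTIALS):
    """Look up the username/password pair for a URL's host and realm."""
    host = urlsplit(url).hostname
    return table.get((host, realm))


# Two different pages in the same realm share a single table entry.
realm = realm_from_challenge('Basic realm="Staff Area"')
print(credentials_for("http://www.example.com/staff/a.html", realm))
print(credentials_for("http://www.example.com/staff/reports/b.html", realm))
```

A single entry then covers every protected page that the server places in the same realm.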
Added option to disable orphan checking.
Improved the TapMap navigation tools
Various other minor enhancements and bug fixes
The following changes and enhancements were incorporated at LinkScan version 2.0:
Major restructuring to increase performance and reduce virtual memory utilization especially when scanning large websites with thousands of documents.
Improved Multi-Site Manager to simplify the testing of partial websites and/or sub-sites.
Added "Noproxy" option to selectively disable proxy access on specified servers.
Modified definition of Internal and External links for greater flexibility.
Extended the Hide command to accept Regular Expressions.
Restructured the LinkScan Reference Manual
Various other minor enhancements and bug fixes
The following changes and enhancements were incorporated at LinkScan version 1.2:
Numerous enhancements to the HTML parser
Additional SiteMap and TapMap options. In particular, the incorporation of a Target option to simplify the creation of SiteMaps and TapMaps for use on websites that make use of "frames"
Various other minor enhancements and bug fixes
The following changes and enhancements were incorporated at LinkScan version 1.1:
Addition of the LinkScan Configurator and LinkScan Startup Guide
Initial Release of TapMap
Various other minor enhancements and bug fixes
LinkScan Reference Manual. Section 30. LinkScan Revision History
LinkScan Version 9.0
© Copyright 1997-2001
Electronic Software Publishing Corporation (Elsop)
LinkScan and Elsop are Trademarks of Electronic Software Publishing Corporation