LinkScan

LinkScan Reference Manual

Section 28


LinkScan File Formats


The following notes describe the format of many of
the LinkScan database files stored in:

...LinkScan/ProjectName/data/
...LinkScan/ProjectName/hist/

Each file is written in (mainly) ASCII format,
with one Record per line. Each Record consists of
a number of Fields, delimited by <Control-G>
characters (octal 007) except where noted below.
The Fields for each Record type are listed below.
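
As a minimal sketch, a Record could be split into its Fields in
Python as shown here. The names split_record and read_records are
hypothetical, and the Latin-1 decoding is an assumption chosen only
because it accepts any byte in a "mainly ASCII" file.

```python
FIELD_SEP = "\x07"  # <Control-G>, octal 007


def split_record(line):
    """Split one Record (one line of text) into a list of Fields."""
    return line.rstrip("\r\n").split(FIELD_SEP)


def read_records(path):
    """Yield the Field list of every non-empty line in a database file.

    The encoding is an assumption; LinkScan does not document one here.
    """
    with open(path, encoding="latin-1") as handle:
        for line in handle:
            if line.strip("\r\n"):
                yield split_record(line)
```

Empty Fields are preserved as empty strings, so Field positions stay
aligned with the numbering used in the tables below.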

linkscan.doc
============

One record per Document (images and other files are not included)

 0 = Document URL
 1 = Document Type
 2 = Clicks
 3 = Content-Type (MIME)
 4 = Status Code (see codes.txt)
 5 = Extended Status
 6 = Content-Length (bytes)
 7 = Last-Modified (date/time)
 8 = Document Title
 9 = Location for Redirect
10 = Original Status Code (pre-redirect)
11 = File System Pathname
12 = Owner Code (see linkscan.own)
13 = Total Internal Links
14 = Bad Internal Links
15 = Total External Links
16 = Bad External Links
17 = Suspect External Links
18 = In-line bytes (page weight)
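
The field list above can be turned into a name lookup. The sketch
below maps each Field position to a key; the key names and the
function parse_doc_record are hypothetical labels chosen for
illustration. Values are left as strings because the exact numeric
and date formats are not specified here.

```python
DOC_FIELDS = [
    "url",                     # 0
    "doc_type",                # 1
    "clicks",                  # 2
    "content_type",            # 3
    "status_code",             # 4
    "extended_status",         # 5
    "content_length",          # 6
    "last_modified",           # 7
    "title",                   # 8
    "redirect_location",       # 9
    "original_status_code",    # 10
    "pathname",                # 11
    "owner_code",              # 12
    "total_internal_links",    # 13
    "bad_internal_links",      # 14
    "total_external_links",    # 15
    "bad_external_links",      # 16
    "suspect_external_links",  # 17
    "inline_bytes",            # 18
]


def parse_doc_record(line):
    """Return a dict of Field name to raw string value for one Record."""
    values = line.rstrip("\r\n").split("\x07")
    # zip stops at the shorter sequence, so a Record that carries only
    # fields 0-12 simply yields a dict with 13 keys.
    return dict(zip(DOC_FIELDS, values))
```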


linkscan.fil
============

One record per file (e.g. images, as opposed to Documents)

Format is the same as linkscan.doc, fields 0-12


linkscan.orp
============

One record per orphaned file

Format is the same as linkscan.doc, fields 0-12


linkscan.mad and linkscan.map
=============================

SiteMap Data
linkscan.mad -- directory order
linkscan.map -- link order

 0 = Level
 1 = Document URL
 2 = Title
 3 = Document Size
 4 = Document Date/Time
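
As an illustration, SiteMap records could be printed as an indented
outline. This sketch assumes that Level is a non-negative decimal
integer giving the nesting depth, which is not stated above; the
function name render_sitemap is hypothetical.

```python
def render_sitemap(lines, indent="  "):
    """Render SiteMap Records as an indented text outline.

    Assumes field 0 (Level) is a non-negative integer depth.
    """
    out = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\x07")
        level, url, title = int(fields[0]), fields[1], fields[2]
        # Fall back to the URL when the Title field is empty.
        out.append(f"{indent * level}{title or url} <{url}>")
    return "\n".join(out)
```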


linkscan.int, linkscan.ext, linkscan.int.err, linkscan.ext.err
==============================================================

Link Data -- internal, external, good and bad.

 0 = From URL index (see linkscan.idx)
 1 = Line number times 10
 2 = To URL index (see linkscan.idx)
 3 = Link Type
 4 = Status Code (see codes.txt)
 5 = Extended Status
 6 = Link Caption
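
Field 1 stores the line number multiplied by 10, so a reader has to
divide to recover the source line. The sketch below assumes the
field is a decimal integer; the key names and parse_link_record are
hypothetical. The URL indexes are left as raw strings because the
layout of linkscan.idx is not described in this section.

```python
LINK_FIELDS = [
    "from_index",       # 0, see linkscan.idx
    "line_times_10",    # 1
    "to_index",         # 2, see linkscan.idx
    "link_type",        # 3
    "status_code",      # 4
    "extended_status",  # 5
    "caption",          # 6
]


def parse_link_record(line):
    """Return a dict for one link Record, adding a derived line_number."""
    values = line.rstrip("\r\n").split("\x07")
    rec = dict(zip(LINK_FIELDS, values))
    # Assumption: field 1 is a decimal integer equal to line number * 10.
    rec["line_number"] = int(rec["line_times_10"]) // 10
    return rec
```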


linkscan.sum
============

Summary Statistics (note: this file is TAB delimited, not <Control-G> delimited)

 0 = Version
 1 = Date and time of scan
 2 = Total Documents
 3 = Missing Documents
 4 = Documents Containing Errors
 5 = Total Other Files
 6 = Missing Other Files
 7 = Total Anchors
 8 = Missing Anchors
 9 = Total External Links
10 = External Links Tested This Scan
11 = External Links with Errors
12 = External Links with Possible Errors
13 = External Links with Warnings
14 = Total Orphans
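
Because linkscan.sum uses TAB rather than <Control-G> as its
delimiter, it needs its own split. A minimal sketch follows; the key
names and parse_summary_record are hypothetical, and values are kept
as strings since the date and number formats are not specified here.

```python
SUM_FIELDS = [
    "version",                               # 0
    "scan_datetime",                         # 1
    "total_documents",                       # 2
    "missing_documents",                     # 3
    "documents_with_errors",                 # 4
    "total_other_files",                     # 5
    "missing_other_files",                   # 6
    "total_anchors",                         # 7
    "missing_anchors",                       # 8
    "total_external_links",                  # 9
    "external_links_tested",                 # 10
    "external_links_with_errors",            # 11
    "external_links_with_possible_errors",   # 12
    "external_links_with_warnings",          # 13
    "total_orphans",                         # 14
]


def parse_summary_record(line):
    """Return a dict of Field name to raw string for one summary Record."""
    values = line.rstrip("\r\n").split("\t")  # TAB delimited
    return dict(zip(SUM_FIELDS, values))
```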


hist/xxxxxx/dat
===============

History Data -- New File Created for Each Scan

 0 = Document URL
 1 = Owner Name
 2 = Document Type
 3 = Clicks
 4 = Content-Type (MIME)
 5 = Status Code (see codes.txt)
 6 = Content-Length (bytes)
 7 = Last-Modified (date/time)
 8 = Document Title


Document Type Codes
===================

 H = HTML
 D = PDF
 M = Image Map
 S = Flash
 Y = Special Control
 Z = Import

 I = In-line image
 F = File
 N = HTML nofollow

 A = Anchor
 R = Redirection

 U = External
 X = Special
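
The codes above lend themselves to a simple lookup table. The sketch
below copies the table verbatim; the names DOCUMENT_TYPES and
describe_type are hypothetical, and the fallback text for unlisted
codes is an illustrative choice.

```python
DOCUMENT_TYPES = {
    "H": "HTML",
    "D": "PDF",
    "M": "Image Map",
    "S": "Flash",
    "Y": "Special Control",
    "Z": "Import",
    "I": "In-line image",
    "F": "File",
    "N": "HTML nofollow",
    "A": "Anchor",
    "R": "Redirection",
    "U": "External",
    "X": "Special",
}


def describe_type(code):
    """Return the description for a Document Type code."""
    # Codes not listed in this section are reported rather than guessed.
    return DOCUMENT_TYPES.get(code, f"Unknown ({code})")
```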

LinkScan Version 9.0
© Copyright 1997-2001 Electronic Software Publishing Corporation (Elsop)
LinkScan™ and Elsop™ are Trademarks of Electronic Software Publishing Corporation
