LinkScan Reference Manual
Section 28
LinkScan File Formats
The following notes describe the format of many of
the LinkScan database files stored in:
...LinkScan/ProjectName/data/
...LinkScan/ProjectName/hist/
Each file is (mainly) ASCII text, with one Record
per line. Each Record consists of a number of
Fields delimited by <Control-G> characters
(octal 007), except where noted below. The Fields
for each Record type are listed below, numbered
from 0.
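As an illustration of the record layout described above, the following is a minimal Python sketch, not part of LinkScan itself. The names FIELD_SEP, split_record and read_records are hypothetical, and the latin-1 encoding is an assumption chosen because the files are only described as "mainly" ASCII.

```python
FIELD_SEP = "\x07"  # <Control-G>, octal 007


def split_record(line):
    """Split one Record (one line of text) into its list of Fields."""
    return line.rstrip("\r\n").split(FIELD_SEP)


def read_records(path, encoding="latin-1"):
    """Yield each Record in a Control-G delimited file as a list of Fields.

    The encoding is an assumption: latin-1 decodes any byte value,
    so stray non-ASCII bytes do not raise an error.
    """
    with open(path, encoding=encoding) as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield split_record(line)
```

All Fields are returned as strings; converting numeric Fields is left to the caller because the exact value formats are not specified here.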
linkscan.doc
============
One record per Document (images and other files are not included)
0 = Document URL
1 = Document Type
2 = Clicks
3 = Content-Type (MIME)
4 = Status Code (see codes.txt)
5 = Extended Status
6 = Content-Length (bytes)
7 = Last-Modified (date/time)
8 = Document Title
9 = Location for Redirect
10 = Original Status Code (pre-redirect)
11 = File System Pathname
12 = Owner Code (see linkscan.own)
13 = Total Internal Links
14 = Bad Internal Links
15 = Total External Links
16 = Bad External Links
17 = Suspect External Links
18 = In-line bytes (page weight)
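The field list above can be turned into named values. The following Python sketch is illustrative only: the key names in DOC_FIELDS and the function parse_doc_record are hypothetical labels chosen for this example, not names used by LinkScan.

```python
# Hypothetical key names for fields 0-18 of linkscan.doc, in order.
DOC_FIELDS = [
    "url", "doc_type", "clicks", "content_type", "status_code",
    "extended_status", "content_length", "last_modified", "title",
    "redirect_location", "original_status_code", "pathname",
    "owner_code", "total_internal_links", "bad_internal_links",
    "total_external_links", "bad_external_links",
    "suspect_external_links", "inline_bytes",
]


def parse_doc_record(line):
    """Map one Control-G delimited line to a dict keyed by field name.

    Values are kept as strings. If the line has fewer fields than
    DOC_FIELDS, the missing keys are simply absent from the result.
    """
    values = line.rstrip("\r\n").split("\x07")
    return dict(zip(DOC_FIELDS, values))
```

Because zip stops at the shorter sequence, the same function can read records that carry only fields 0-12.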
linkscan.fil
============
One record per non-Document file (e.g. images)
Format is same as linkscan.doc, fields 0-12
linkscan.orp
============
One record per orphaned file
Format is same as linkscan.doc, fields 0-12
linkscan.mad and linkscan.map
=============================
SiteMap Data
linkscan.mad -- directory order
linkscan.map -- link order
0 = Level
1 = Document URL
2 = Title
3 = Document Size
4 = Document Date/Time
linkscan.int, linkscan.ext, linkscan.int.err, linkscan.ext.err
==============================================================
Link Data -- internal, external, good and bad.
0 = From URL index (see linkscan.idx)
1 = Line number times 10
2 = To URL index (see linkscan.idx)
3 = Link Type
4 = Status Code (see codes.txt)
5 = Extended Status
6 = Link Caption
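A link record can be decoded along the following lines. This is a sketch under stated assumptions: LinkRecord and parse_link_record are hypothetical names, field 1 is assumed to be a whole number, and the meaning of any remainder after dividing by 10 is not described in this section, so it is kept separately rather than interpreted. The URL indexes are left as strings because the layout of linkscan.idx is not covered here.

```python
from typing import NamedTuple


class LinkRecord(NamedTuple):
    from_index: str       # field 0: index into linkscan.idx
    line_number: int      # field 1 divided by 10 (whole part)
    line_remainder: int   # field 1 modulo 10 (meaning not documented here)
    to_index: str         # field 2: index into linkscan.idx
    link_type: str        # field 3
    status_code: str      # field 4
    extended_status: str  # field 5
    caption: str          # field 6


def parse_link_record(line):
    """Decode one Control-G delimited link record."""
    fields = line.rstrip("\r\n").split("\x07")
    fields = (fields + [""] * 7)[:7]  # pad short records with empty strings
    line_number, remainder = divmod(int(fields[1]), 10)
    return LinkRecord(fields[0], line_number, remainder, fields[2],
                      fields[3], fields[4], fields[5], fields[6])
```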
linkscan.sum
============
Summary Statistics (Note this file is TAB delimited)
0 = Version
1 = Date and time of scan
2 = Total Documents
3 = Missing Documents
4 = Documents Containing Errors
5 = Total Other Files
6 = Missing Other Files
7 = Total Anchors
8 = Missing Anchors
9 = Total External Links
10 = External Links Tested This Scan
11 = External Links with Errors
12 = External Links with Possible Errors
13 = External Links with Warnings
14 = Total Orphans
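Since linkscan.sum uses TAB rather than Control-G as its delimiter, it needs its own split. The Python sketch below is illustrative: SUM_FIELDS and parse_summary are hypothetical names, and the function handles one line at a time because this section does not state how many records the file holds.

```python
# Hypothetical key names for fields 0-14 of linkscan.sum, in order.
SUM_FIELDS = [
    "version", "scan_datetime", "total_documents", "missing_documents",
    "documents_with_errors", "total_other_files", "missing_other_files",
    "total_anchors", "missing_anchors", "total_external_links",
    "external_links_tested", "external_links_with_errors",
    "external_links_possible_errors", "external_links_with_warnings",
    "total_orphans",
]


def parse_summary(line):
    """Map one TAB delimited summary line to a dict of string values."""
    values = line.rstrip("\r\n").split("\t")
    return dict(zip(SUM_FIELDS, values))
```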
hist/xxxxxx/dat
===============
History Data -- New File Created for Each Scan
0 = Document URL
1 = Owner Name
2 = Document Type
3 = Clicks
4 = Content-Type (MIME)
5 = Status Code (see codes.txt)
6 = Content-Length (bytes)
7 = Last-Modified (date/time)
8 = Document Title
Document Type Codes
===================
H = HTML
D = PDF
M = Image Map
S = Flash
Y = Special Control
Z = Import
I = In-line image
F = File
N = HTML nofollow
A = Anchor
R = Redirection
U = External
X = Special
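For reporting, the codes above can be held in a lookup table. This is a small Python sketch; DOCUMENT_TYPES and describe_type are hypothetical names, and the fallback text for unlisted codes is an assumption made for this example.

```python
# Document Type codes as listed in this section.
DOCUMENT_TYPES = {
    "H": "HTML",
    "D": "PDF",
    "M": "Image Map",
    "S": "Flash",
    "Y": "Special Control",
    "Z": "Import",
    "I": "In-line image",
    "F": "File",
    "N": "HTML nofollow",
    "A": "Anchor",
    "R": "Redirection",
    "U": "External",
    "X": "Special",
}


def describe_type(code):
    """Return the description for a Document Type code."""
    return DOCUMENT_TYPES.get(code, "Unknown (%s)" % code)
```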

LinkScan Version 9.0
© Copyright 1997-2001
Electronic Software Publishing Corporation (Elsop)
LinkScan and Elsop are Trademarks of Electronic Software Publishing Corporation