Text 442, 171 lines
Written 2007-01-30 19:13:14 by Mike Luther (1:117/3001.0)
Comment on text 441 by Sean Dennis (1:18/200.0)
Subject: Really strange
======================
Sean .
SD> I turned on my monitor this morning and the computer
SD> was frozen. I couldn't get C-A-D to even work, so I
SD> turned off the computer as I had no other choice.
SD> Then, as I turned on the computer, much to my horror, CHKDSK started
SD> marking EVERY file as an "incorrect file". Needless
SD> to say, it moved nearly 3000 files to \FOUND0. I then
SD> had to go to work.
SD> I came home and ran several diagnostics on the hard
SD> drive to make absolutely sure there was nothing wrong
SD> with it. I then booted from my 4.52 CD to copy some
SD> things over to a diskette for safekeeping and I
SD> couldn't even do that.
Yes, I've seen this horror a couple of times across many boxes and years. You'll
see a warning something like, "The HPFS File System has failed. Back up your data
and .. ", or something like that. There are also JFS file system failures but I
have no experience with those.
Generally, if you were there at the computer when this takes place, and if
there is no trap for the error that displays the visual alert above, you will
hear three horrible tones out of your speaker. It will go beep, beep, beep in
three descending tones and then lock. That puts you into the hard disk Red
Light On solid condition, which can only be fixed by the power switch and a reboot.
When I see this one the FIRST thing I do is to *NOT* try to reboot the box to
the hard drive. Instead I use the floppy utility diskette set for the
particular box to boot to the OS/2 command line instead. And, yes, I keep a
complete set of floppy utility diskettes that will boot EVERY box; I have a
number of them! From the OS/2 command line I then look at the OS/2 hard disk,
which may or may not even be accessible at that point. If it is, I'll first
run CHKDSK on it without /F and *NOT* try to 'fix' the thing. If it is a 'simple'
case with only a few problems found, and a couple of things in FOUND0, I can get
an idea whether I should do the CHKDSK /F on the drive or not. And I do it from
the command prompt boot, not the hard disk reboot.
If there seem to be a lot of things like you saw start, or I can't even seem to
look at it, there is a copy of Jan Wijk's DFSEE waiting on another OS/2 floppy
diskette. I run it from the command prompt line and make decisions on what to
do next based on the information from DFSEE and what is available to me through
it.
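
The triage described above can be sketched as a small decision function. This is only an illustration: the post leaves the call to judgment, so the `few` cut-off, the `ReadOnlyCheck` record and the `next_step` name are all hypothetical and not anything from CHKDSK or DFSEE themselves.

```python
from dataclasses import dataclass


@dataclass
class ReadOnlyCheck:
    """What was seen from a floppy boot, before any repair is attempted."""
    accessible: bool      # could the partition be looked at at all?
    issues_found: int     # problems reported by CHKDSK run WITHOUT /F
    found0_entries: int   # entries already sitting in FOUND0


def next_step(check: ReadOnlyCheck, few: int = 5) -> str:
    """Pick the next action. `few` is an assumed cut-off, not from the post."""
    if not check.accessible:
        # Can't even look at the drive: go straight to DFSEE.
        return "DFSEE"
    if check.issues_found <= few and check.found0_entries <= few:
        # A 'simple' case: repair, but still from the floppy boot.
        return "CHKDSK /F from floppy boot"
    # Lots of damage reported: inspect with DFSEE before any repair.
    return "DFSEE"
```

In no branch does the sketch suggest a normal power-up from the hard drive, which is the point of the procedure.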
The key to not falling further into this trap than necessary is to *NOT* try to
just 'normally' restart the box with a power up. Because if there really is
major hard disk partition table damage and directory damage even with HPFS, the
'forced' CHKDSK 'repair' that takes place during that next boot run can do far
more damage than if whatever was wrong with the hard disk for that partition
is serviced or recovered through DFSEE *BEFORE* the HPFS file system and CHKDSK
on that reboot do what you saw: put the whole file system tree into the FOUND0
billabong.
There actually are some rather old ways this has happened to me and some new
ones recently that could maybe do this! Those folks still running Warp 4 who
never moved to MCP2 and are using SCSI operations with the Adaptec cards and
drivers could REALLY get into this prior to Fix Pack 16 and 17, with older
Adaptec drivers. Especially if they still have not applied the formal OS/2
Device Driver fixpack all the way through Fix Pack 3. Part of the reason for
this is that there are SERIOUS problems in anything other than the latest current
release of the Adaptec device drivers, and in some earlier versions of CHKDSK
together with certain versions of the SYSINSTX operation for making a partition
bootable with OS/2. A second vector into all this is how the 'newer' programs
you are running with OS/2, especially on older OS/2 Warp 4 systems, do memory
or hard disk I/O.
And some folks have seen this happen with XR_C005 use and PMMERGE.DLL. I never
have to my knowledge. But then, I don't use XWP or ODIN or 4DOS or things
which have been suspected of getting nasty with the file system and OS/2 and so
on.
Part of this centers, per heavy and hard experience here, on the use of the
later versions of Mozilla or Seamonkey, XWP, Odin, earlier versions of Lotus
Smart Suite for OS/2 and the Norman Virus products for OS/2. The same goes for
communications programs which are left running all the time on a box, such as
BBS operations, other TCP/IP operations, FTP servers and the like, and JAVA
operations which are in use. With memory use really ramping up to the 200-400MB
we see in Seamonkey and so on now, plus all you see relative to
VIRTUALADDRESS space boundaries and so on, there is a specific issue
that I still see even on MCP2 with the XR_C005 Fixpack, latest everything here.
Especially on OS/2 SCSI drive boxes, there is still an issue which can surface
during the use of the HPFS Write Cache service for a drive, at the same
time that memory use has already reached the point where the SWAPPER.DAT file
has grown from its initial size, networking I/O is taking place simultaneously,
and OS/2 decides it has to update the .INI files!
Seriously. You really need to set the initial size of the SWAPPER.DAT file to
one that won't force it to be grown in normal operations. And you need to put
it on a partition that is less used than your boot partition if possible! These
issues all contribute to hard box locks, which are now far more frequent than
they once were. Our programs and tools are demanding far more memory use and file
and swapper use than the original 16MB size!
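
As a rough sketch of that sizing advice: take the largest SWAPPER.DAT size seen in normal work, add some headroom, and use that as the initial size. The 25% headroom, the rounding to whole megabytes and both function names below are assumptions for illustration only. The output line assumes the usual CONFIG.SYS form `SWAPPATH=<path> <min free KB> <initial size KB>`; check your own CONFIG.SYS before changing anything.

```python
import math


def suggest_initial_swap_kb(observed_peaks_kb, headroom=1.25, round_to_kb=1024):
    """Suggest an initial SWAPPER.DAT size in KB from observed peak sizes.

    headroom and round_to_kb are assumed values, not from the post.
    """
    peak = max(observed_peaks_kb)
    padded = peak * headroom
    # Round up to a whole multiple of round_to_kb.
    return int(math.ceil(padded / round_to_kb) * round_to_kb)


def swappath_line(path, min_free_kb, initial_kb):
    """Format a CONFIG.SYS SWAPPATH statement (assumed syntax, see above)."""
    return f"SWAPPATH={path} {min_free_kb} {initial_kb}"


if __name__ == "__main__":
    # Hypothetical peak swapper sizes noted during normal use, in KB.
    peaks = [180000, 240000, 210000]
    initial = suggest_initial_swap_kb(peaks)
    # D:\ stands in for a partition less used than the boot partition.
    print(swappath_line("D:\\", 2048, initial))
```

The peak values would come from watching the swapper size during ordinary work with a monitor such as those named in this message.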
These are some of the issues where I will see a box hard lock. And in my case,
the single most common point where I will see this is with what are
called Long File blocks to the hard drive. Programs which use these very
unusual Long File blocks (64K per block) include, as best I know, the BA2K
Server Pro SCSI DAT tape drive backup operations, the current Norman Virus
operations when reading and working the now over 7MB of virus sigs per memory
read and file scan for inbound files over the network, and the Seamonkey
operations when used with the Privoxy 3.0.3 proxy server operation. In
particular, the Seamonkey and proxy server combination, where Privoxy leaves the
log file open from boot run to shut down, NEVER flushes it to the disk, and uses
a single thread process for everything it is doing, has, as I've researched
this, been a real cause of this. This is the reason I've been doing such
hard work proofing the Seamonkey releases post the 1.05 and so on range: to
see if I could see this quit happening! Here is what I've seen.
A substantial part of this hard lock mess stopped when I moved to the new
Privoxy 3.0.5 multi-threaded release - and - started killing it after each
Seamonkey use and re-starting it each time I fired off Seamonkey. The second real
cleanup happened when the Moz crew cleaned up the never-ending cascade of used
memory from repeated openings of the Newsgroups. That has taken MONTHS, but
the formally released Seamonkey 1.1 versions after January 11, 2007, have, as
far as I can see from PMPatrol or Memsize or Sysinfo 8.20, finally stabilized
the memory romp, leakage, what have you.
If you have been using a Seamonkey nightly from after November 27, 2006, but
before this got fixed after January 4, 2007, you likely stepped right into this
pit this way if you left it running for long.
If I have PMPatrol up when a lockup like this happens, one way I know it
happens is the same way others have seen it: the system simply runs out of RAM
space and runs afoul of cache writes to the disk! BLAM, locked box, period.
Which, if it was writing to the HPFS directories, is woe, woe, woe and mo.
And, yes, yap, yap, yap here -- THIS, Sean, is where you can see the whole HPFS
file system get corrupted .. with the disk directories getting corrupted
producing the whole FOUND0 mess you saw.
So to help you specifically, if that can be done:
What version of OS/2?
Fix Pack level?
IDE or SCSI drive?
If SCSI then what adapter and SCSI driver version?
How much memory?
Tested with memory tester at board level on a DOS boot?
Is drive only OS/2 or has other operating systems on it?
Do you have NETBIOS OVER TCP/IP installed?
With respect to each major change or addition of a tool, product or
MOZ/Seamonkey use or version update:
Last time during work you looked at free RAM, Swapper size?
Last time you ran Unimaint or CHECKINI?
Seen major Desktop cleanup that might explain something?
Last time you ran CHKDSK during your work without /F?
What was running on the box when you last left it up and unattended before you
saw this?
I understand your fears. But at the same time we are moving farther and
farther into the OS/2 world with really huge memory hogging programs and disk
I/O intensive stuff, which have had problems recently that can leave you
stunned, looking in disbelief, in the Red Light District! It is the reason I
still test and test and test memory and all that with each new major change in
a driver or whatever. Until I really get some indication that there is not
trouble waiting up ahead, I am very careful and also do solid full backups to
either my SCSI DAT drives for each system or with the floppy disk utility boots
and DFSEE 7.1.5 or later to a backup complete cloned hard drive for EVERY box I
have.
--> Sleep well; OS/2's still awake! ;)
Mike @ 1:117/3001
--- Maximus/2 3.01
* Origin: Ziplog Public Port (1:117/3001)