My team inherited an Oracle-based web application and they are fairly inexperienced with Oracle database servers.
The Oracle 10g server runs on Windows Server 2003 with plenty of disk space. From time to time all connectivity is lost: the application stops working, and not even SQL*Plus can connect to the database server.
But when we check the Windows service manager, it says the service is up and running. A restart usually fixes the problem, but we need to troubleshoot it properly so we know what is causing it and can keep it from happening again.
Where should we start looking for clues? What are the critical log files we should be investigating?
On the server you should have an environment variable called ORACLE_HOME, which indicates the root of the Oracle install. Most likely the Oracle trace/dump folders will be under there. Search for a folder called "bdump" (background dump). That's where the main log file, known as the alert log, will be, as well as trace files generated by background processes. There will be an adjacent folder called "udump", which will contain any trace files generated by user processes.
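If you'd rather not hunt for those folders by hand, a small script can walk the install tree for you. This is only a sketch: the function name `find_dump_dirs` is made up for illustration, and it assumes the folders really are named "bdump" and "udump" somewhere under the directory you point it at.

```python
import os


def find_dump_dirs(root, names=("bdump", "udump")):
    """Return {folder_name: [full paths]} for directories under root.

    Folder names are compared case-insensitively, since Windows paths
    are not case-sensitive.
    """
    wanted = {n.lower() for n in names}
    found = {n: [] for n in wanted}
    for dirpath, dirnames, _filenames in os.walk(root):
        for d in dirnames:
            if d.lower() in wanted:
                found[d.lower()].append(os.path.join(dirpath, d))
    return found
```

You would call it as `find_dump_dirs(os.environ["ORACLE_HOME"])`. If that turns up nothing, widening the search to the parent Oracle base directory is worth a try, since the admin folders often sit beside the product home rather than inside it.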
However, my real advice is that you should either hire someone who knows Oracle or get Oracle Support involved.
The alert log would be the first file to check.
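It will probably be in a bdump folder under the admin directory (on 10g that is commonly ORACLE_BASE/admin/DATABASE-SID/bdump rather than directly under $ORACLE_HOME; the background_dump_dest parameter gives the exact location) and (probably) called alert_DATABASE-SID.log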
It contains most of the important actions that the database does, as well as any important errors that occur.
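Since the alert log can get large, it may help to pull out just the error lines together with the time they were logged. Below is a rough sketch of that. It assumes the log is plain text, that errors show up as "ORA-" followed by a number, and that timestamps sit on their own line in the form "Mon Jan 05 10:15:32 2009"; the names `ora_errors` and `scan_alert_log` are my own, not anything Oracle ships.

```python
import re

# Error codes may appear zero-padded (ORA-00600) or not (ORA-1109).
ORA_ERROR = re.compile(r"\bORA-\d{1,5}\b")
# Assumed timestamp layout: a line holding only e.g. "Mon Jan 05 10:15:32 2009".
TIMESTAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}\s*$"
)


def ora_errors(lines):
    """Yield (timestamp, line) for every line that mentions an ORA- code.

    timestamp is the most recent timestamp line seen before the error,
    or None if no timestamp has been seen yet.
    """
    last_ts = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if TIMESTAMP.match(line):
            last_ts = line.strip()
        elif ORA_ERROR.search(line):
            yield last_ts, line.strip()


def scan_alert_log(path):
    """Read the alert log at path and return its ORA- error lines."""
    with open(path, "r", errors="replace") as fh:
        return list(ora_errors(fh))
```

Running `scan_alert_log` on the alert log right after an outage and looking at the last few entries should show whether the instance logged anything around the time connections stopped.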
I have to agree with cagcowboy. Check your alert logs for errors. If there are no errors, keep a SYSDBA session logged in to the database and, when it hangs, attempt a hang analysis. See MetaLink note 215858.1 on hanganalyze.
Have you tried tnsping? We've occasionally run into problems with the listener that require an assist from our DBA.
tnsping is the diagnostic tool we use to do triage.
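If you want to run that check on a schedule and log the result, something like the following could work. It is a sketch under a few assumptions: tnsping is on the PATH, a successful run prints a line starting with "OK (N msec)", and failures print a TNS-nnnnn code. The helper names `classify_tnsping` and `run_tnsping` are hypothetical, and the alias you pass in is whatever net service name your tnsnames.ora defines.

```python
import re
import shutil
import subprocess

TNS_ERROR = re.compile(r"\bTNS-\d{5}\b")
OK_LINE = re.compile(r"^OK \((\d+) msec\)", re.MULTILINE)


def classify_tnsping(output):
    """Classify tnsping output text.

    Returns ("ok", milliseconds), ("error", [TNS codes]) or ("unknown", None).
    """
    match = OK_LINE.search(output)
    if match:
        return ("ok", int(match.group(1)))
    errors = TNS_ERROR.findall(output)
    if errors:
        return ("error", errors)
    return ("unknown", None)


def run_tnsping(alias):
    """Run tnsping against a net service name and classify what it prints."""
    exe = shutil.which("tnsping")
    if exe is None:
        raise FileNotFoundError("tnsping not found on PATH")
    proc = subprocess.run([exe, alias], capture_output=True, text=True)
    return classify_tnsping(proc.stdout)
```

Keep in mind that tnsping only tells you whether the listener answers, not whether the database instance behind it is accepting sessions, so an "ok" result during an outage points you back at the instance and the alert log.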
I would recommend hiring an experienced Oracle DBA if at all possible.
Check the alert log to see how the database is configured. Sometimes badly set parameters cause hangs or slow performance. Or you can shut down and start in mount mode, then check the v$parameter values for problems. Setting the total memory correctly is very important.