Tips and tricks
by The editorial team
Red Hat’s customer service and support teams receive technical support questions from users all over the world. Red Hat technicians add the questions and answers to the Red Hat Knowledgebase on a daily basis. Access to the Red Hat Knowledgebase is free. Every month, Red Hat Magazine offers a preview of the Red Hat Knowledgebase by highlighting some of the most recent entries.
Tips from RHCEs
Wiping a hard drive
by Dominic Duval, Red Hat Certified Engineer®
Ever needed to completely wipe critical data off a hard drive? As we
all know, mkfs doesn’t erase much (you already knew this, right?). mkfs
and its variants (such as mkfs.ext3 and mke2fs) only overwrite a few
important data structures on the filesystem. The data itself is still
there! For a SCSI disk connected as /dev/sdb, a quick:
dd if=/dev/sdb | strings
will let anyone recover text data from a supposedly erased hard drive.
Binary data is more complicated to retrieve, but the same basic
principle applies: the data was not completely erased.
To make things harder for the bad guys, an old trick was to use the ‘dd’
command to overwrite the whole drive with zeros (note that this command
WILL erase your disk!):
dd if=/dev/zero of=/dev/sdb
There’s one problem with this: newer, more advanced techniques make it
possible to retrieve data that was overwritten with zeros. To make
it more difficult, if not impossible, for the bad guys to read data that
was previously stored on a disk, Red Hat ships the ‘shred’ utility as
part of the coreutils RPM package. Running ‘shred’ on a disk or a
partition writes repeatedly to every location on the disk (25 times by
default in this release of coreutils; later releases default to 3 passes).
Be careful with this one too!
shred /dev/sdb
This is currently known to be a very safe way to delete data from a hard
drive before, let’s say, you ship it back to the manufacturer for repair
or before you sell it on eBay!
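To see the difference for yourself without risking a real disk, the sketch below applies the same idea to an ordinary scratch file instead of /dev/sdb. It writes a recognizable marker string (an arbitrary example), confirms it can be recovered, shreds the file, and checks again. grep -a (treat binary data as text) stands in for strings so the demo needs only coreutils and grep, and -n is passed explicitly because the default pass count differs between coreutils versions.

```bash
#!/bin/bash
# Demo only: runs against a scratch file, never against a real disk.
# The marker text is an arbitrary example.
shred_demo() {
    local img marker="TOP-SECRET-PAYROLL-DATA"
    img=$(mktemp) || return 1

    # Put recognizable text at the start of a 64 KB "disk image".
    printf '%s\n' "$marker" > "$img"
    dd if=/dev/zero bs=1024 count=64 >> "$img" 2>/dev/null

    # grep -a stands in for "strings" here.
    if grep -a -q "$marker" "$img"; then
        echo "before shred: marker recoverable"
    fi

    # Three passes of random data, then a final pass of zeros (-z).
    shred -n 3 -z "$img"

    if grep -a -q "$marker" "$img"; then
        echo "after shred: marker recoverable"
    else
        echo "after shred: marker gone"
    fi
    rm -f "$img"
}
```

Calling shred_demo should report the marker as recoverable before shredding and gone afterwards.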
What are the different utilities used to manage System V initialization?
by Ryan Del Rosario
Red Hat Enterprise Linux includes several utilities that facilitate the management of System V initialization:
- ntsysv is a console-based interactive utility that allows you to control which services run when entering a given run level. This utility is used during system installation, but can also be run from the command line. It configures the current run level by default; with the --level option, it can configure other run levels.
- serviceconf is an X client that displays each of the services that are started and stopped at each run level. Services can be added, deleted, or re-ordered in run levels 3 through 5 with this utility.
- chkconfig is a command-line utility. When passed the --list switch, it displays a list of all System V scripts and whether each one is turned on or off at each run level. Scripts can be managed at each run level with the --add or --del switches or, should the script not have predefined run levels for the service, with the “on” and “off” chkconfig directives.
- service is used to start or stop a standalone service immediately; most services accept at least the arguments ‘start’, ‘stop’, ‘restart’, ‘reload’, ‘condrestart’, and ‘status’.
The “serviceconf” and “chkconfig” commands will start or stop an xinetd-managed service as soon as they are configured on or off. Standalone services won’t start or stop until the system is rebooted or the “service” command is run.
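For reference, the sketch below shows the general shape of a minimal SysV init script that chkconfig and service can manage. Everything in it is hypothetical: the service name myservice, the lock file, and the run levels and priorities in the header. The "# chkconfig: 345 85 15" comment gives the default run levels (3, 4 and 5) followed by the start and stop priorities that chkconfig --add uses. The dispatch is wrapped in a function so the sketch can be tried without installing anything.

```bash
#!/bin/bash
#
# myservice    Hypothetical example service used only for illustration.
#
# chkconfig: 345 85 15
# description: Minimal SysV init script skeleton (an assumed example).

# The lock file marks the service as running. Red Hat scripts conventionally
# use /var/lock/subsys/<name>; it can be overridden so the sketch runs
# without root privileges.
LOCKFILE=${LOCKFILE:-/var/lock/subsys/myservice}

start() {
    # A real script would launch the daemon here (e.g. with "daemon").
    touch "$LOCKFILE" && echo "Starting myservice: OK"
}

stop() {
    # A real script would stop the daemon here (e.g. with "killproc").
    rm -f "$LOCKFILE" && echo "Stopping myservice: OK"
}

status() {
    if [ -f "$LOCKFILE" ]; then
        echo "myservice is running"
    else
        echo "myservice is stopped"
        return 3    # LSB status code for "not running"
    fi
}

myservice_ctl() {
    case "$1" in
        start)          start ;;
        stop)           stop ;;
        restart|reload) stop && start ;;   # this sketch treats reload as restart
        condrestart)    # restart only if the service is already running
                        if [ -f "$LOCKFILE" ]; then stop && start; fi ;;
        status)         status ;;
        *)  echo "Usage: $0 {start|stop|restart|reload|condrestart|status}"
            return 2 ;;
    esac
}
```

An installed script would end with a line calling myservice_ctl "$@"; it could then be registered with chkconfig --add myservice and started with service myservice start.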
What is a Caching-only Name Server and how do I configure it to run in a chroot environment?
by Liju Gopinath
Caching-only Name Server
A caching-only name server looks up zone data and caches (stores) the results it receives. It can then answer subsequent queries from the cached information.
A caching-only server is authoritative only for the local host, i.e. 0.0.127.in-addr.arpa, but it can automatically send requests to the Internet host handling name lookups for the domain in question.
In most situations, a caching-only name server sends queries directly to the name server that contains the answer. Because of its simplified nature, no DNS zone file is created for a caching-only name server.
Running the caching-only name server in a chroot environment is a more secure approach: if named is compromised, the attacker is largely confined to the chroot directory tree (/var/named/chroot) instead of having access to the whole file system.
Configuration
The following packages need to be installed:
- bind-9.2.4-16.EL4.i386.rpm
- bind-chroot-9.2.4-16.EL4.i386.rpm
- caching-nameserver-7.3-3.noarch.rpm
These packages can be installed from the CD using the command:
# rpm -ivh
or using the up2date command:
# up2date
The configuration files associated with the caching name server are:
- /etc/sysconfig/named
- /var/named/chroot/etc/named.conf
- /var/named/chroot/var/named/named.local
- /var/named/chroot/var/named/named.ca
- /var/named/chroot/var/named/localhost.zone
- /var/named/chroot/var/named/localdomain.zone
Edit /etc/sysconfig/named and make sure it contains the following entry, which tells named to run in the chroot environment:
ROOTDIR=/var/named/chroot
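If you prefer to script this edit, a minimal sketch follows. The function name set_rootdir is made up for the example. It replaces an existing ROOTDIR= line or appends one if none exists, so running it twice still leaves a single entry, and it takes the file path as an argument so it can be tried on a copy first.

```bash
#!/bin/bash
# Hypothetical helper: make sure a sysconfig file sets ROOTDIR exactly once.
# Usage: set_rootdir /etc/sysconfig/named [/var/named/chroot]
set_rootdir() {
    local file=$1 dir=${2:-/var/named/chroot}
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }

    if grep -q '^ROOTDIR=' "$file"; then
        # Replace the existing value in place; | is used as the sed
        # delimiter because the path contains slashes.
        sed -i "s|^ROOTDIR=.*|ROOTDIR=$dir|" "$file"
    else
        # No entry yet: append one.
        echo "ROOTDIR=$dir" >> "$file"
    fi
}
```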
Note: /etc/named.conf is a symbolic link to /var/named/chroot/etc/named.conf file.
To configure /etc/named.conf for a simple caching name server, use the configuration below on any server that does not act as a master or slave name server. Setting up a simple caching server for local client machines reduces the load on the network’s primary name server. Many users on dial-up connections use this configuration with BIND for the same purpose. Make sure that /etc/named.conf contains the entries below:
options {
    directory "/var/named";
    dump-file "/var/named/data/cache_dump.db";
    statistics-file "/var/named/data/named_stats.txt";
    forwarders { A.B.C.D; W.X.Y.Z; };
    forward only;
};
// a caching only nameserver config
controls {
    inet 127.0.0.1 allow { localhost; } keys { rndckey; };
};
zone "." IN {
    type hint;
    file "named.ca";
};
zone "0.0.127.in-addr.arpa" IN {
    type master;
    file "named.local";
    allow-update { none; };
};
In the forwarders option, A.B.C.D and W.X.Y.Z are the IP addresses of the primary (master) and secondary (slave) DNS servers on the network in question. They can also be the IP addresses of the ISP’s DNS server and another DNS server, respectively. With forward only set in named.conf, the name server does not try to contact other servers itself if the forwarders do not give it an answer.
Now, /etc/resolv.conf should look like this:
nameserver 127.0.0.1
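Because the resolver tries the listed name servers in order, a quick way to confirm that lookups will go through the local cache is to check that 127.0.0.1 is the first nameserver entry. The function below is a hypothetical helper, not part of any package.

```bash
#!/bin/bash
# Hypothetical check: is 127.0.0.1 the first nameserver in a resolv.conf file?
# Usage: uses_local_cache [/etc/resolv.conf]
uses_local_cache() {
    local conf=${1:-/etc/resolv.conf} first
    # Print the address from the first "nameserver" line and stop reading.
    first=$(awk '$1 == "nameserver" { print $2; exit }' "$conf" 2>/dev/null)
    [ "$first" = "127.0.0.1" ]
}
```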
Start the caching name server:
# /sbin/chkconfig named on
# service named start
Test the caching name server:
# nslookup
Default Server: localhost
Address: 127.0.0.1
>
Now enter a query at the nslookup prompt, for example www.redhat.com:
> www.redhat.com
Server: localhost
Address: 127.0.0.1

Name: www.redhat.com
Address: 209.132.177.50
nslookup has now asked named to look up the machine www.redhat.com. Because forward only is set, named passed the query to the forwarders listed in named.conf (without forwarders, it would start from the root name servers listed in named.ca and ask its way down from there). It might take a while before the result is shown, as the resolver may also try the search domains listed in /etc/resolv.conf. When you query the same name again, the result should be similar to this example:
> www.redhat.com
Server: localhost
Address: 127.0.0.1

Non-authoritative answer:
Name: www.redhat.com
Address: 209.132.177.50
Note the Non-authoritative answer in the result this time. It means that named did not go out on the network to ask; instead, it found the answer in its cache. Because cached information might be out of date, nslookup warns about this by reporting a Non-authoritative answer. When nslookup says this the second time a host is looked up, it is a sign that named is caching information and working. Exit nslookup with the exit command.
Why is the X11 Forwarding not working on my system when my ssh daemon is correctly configured with ‘X11Forwarding yes’?
by Eduardo Damato
The easiest way to debug the ssh connection is to run ssh with increased verbosity, which usually reveals errors that are otherwise silent.
To increase verbosity, issue the following command:
$ ssh -v -X -l root
The output would look like the following:
OpenSSH_3.9p1, OpenSSL 0.9.7a Feb 19 2003
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to cs1 [192.168.161.134] port 22.
debug1: Connection established.
debug1: identity file /home/user/.ssh/identity type -1
debug1: identity file /home/user/.ssh/id_rsa type -1
debug1: identity file /home/user/.ssh/id_dsa type -1
debug1: Remote protocol version 1.99, remote software version OpenSSH_3.9p1
debug1: match: OpenSSH_3.9p1 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_3.9p1
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-cbc hmac-md5 none
debug1: kex: client->server aes128-cbc hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Host 'cs1' is known and matches the RSA host key.
debug1: Found key in /home/user/.ssh/known_hosts:37
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,gssapi-with-mic,password
debug1: Next authentication method: gssapi-with-mic
debug1: Authentications that can continue: publickey,gssapi-with-mic,password
debug1: Authentications that can continue: publickey,gssapi-with-mic,password
debug1: Next authentication method: publickey
debug1: Offering public key: /home/user/.ssh/identity
debug1: Server accepts key: pkalg ssh-dss blen 816
debug1: Authentication succeeded (publickey).
debug1: channel 0: new [client-session]
debug1: Entering interactive session.
debug1: Requesting X11 forwarding with authentication spoofing.
debug1: Remote: No xauth program; cannot forward with spoofing.
Last login: Mon Aug 14 14:17:11 2006 from 192.168.161.1
Two lines in the example output above are of interest:
debug1: Requesting X11 forwarding with authentication spoofing.
debug1: Remote: No xauth program; cannot forward with spoofing.
In this case, the xauth program is not installed on the remote system, so sshd there cannot add an authentication entry for the forwarded display to the user’s X authority file, and X11 forwarding is silently denied. To resolve the problem, install the xauth package on the remote system:
# up2date xorg-x11-xauth
Then log in with ssh again and verify that the tunnel sets the DISPLAY environment variable correctly:
$ ssh -X -l root
# echo $DISPLAY
localhost:10.0
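When the verbose output is long, it can be quicker to search it for the xauth message than to read it line by line. The sketch below is a hypothetical helper that reads ssh -v output on standard input and reports whether the "No xauth program" message from the example above appears. It checks only that one message; other causes of failed forwarding are not detected.

```bash
#!/bin/bash
# Hypothetical helper: scan "ssh -v -X" output (on stdin) for the message
# meaning the remote side has no xauth program.
check_x11_forwarding_log() {
    if grep -q 'Remote: No xauth program'; then
        echo "remote host lacks xauth: install xorg-x11-xauth there"
        return 1
    fi
    echo "no xauth error found in ssh output"
}
```

ssh writes its debug messages to standard error, so redirect it when piping, for example (with host standing in for the remote machine): ssh -v -X -l root host true 2>&1 | check_x11_forwarding_log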
Why do the ext3 filesystems on my Storage Area Network (SAN) repeatedly become read-only?
by Chris Snook
When ext3 encounters possible corruption in filesystem metadata, it
aborts the journal and remounts the filesystem read-only to prevent damage to the metadata on disk. This can occur due to I/O errors while reading metadata, even if there is no metadata corruption on disk.
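To spot filesystems that have already been remounted read-only, you can look for ext3 entries in /proc/mounts whose options start with ro. A minimal sketch follows; the function name is hypothetical, and it takes the mounts file as an argument so it can be tried on sample text. It also lists ext3 filesystems that were deliberately mounted read-only, so check the kernel log for ext3 errors before drawing conclusions.

```bash
#!/bin/bash
# Hypothetical helper: list ext3 filesystems currently mounted read-only.
# /proc/mounts fields: device mountpoint fstype options dump pass
readonly_ext3() {
    local mounts=${1:-/proc/mounts}
    # "ro" is either the only option or the first one, followed by a comma.
    awk '$3 == "ext3" && ($4 == "ro" || $4 ~ /^ro,/) { print $1 " on " $2 }' "$mounts"
}
```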
If filesystems on multiple disk arrays or accessed by multiple clients are repeatedly becoming read-only in a SAN environment, the most common cause is a SCSI timeout while the Fibre Channel HBA driver is handling an RSCN event on the Fibre Channel fabric.
An RSCN (Registered State Change Notification) is generated whenever the configuration of a Fibre Channel fabric changes, and is propagated to any HBA that shares a zone with the device that changed state. RSCNs may be generated when an HBA, switch, or LUN is added or removed, or when the zoning of the fabric is changed.
Resolution:
Some cases of this behavior may be due to a known bug in the interaction between NFS and ext3. For this reason, users experiencing this problem on NFS servers are advised to update their kernel to at least version 2.6.9-42.0.2.EL. The related Bugzilla entry is https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199172
The lpfc driver update in Red Hat Enterprise Linux 4 Update 4 includes a change to RSCN handling which prevents this problem in many environments. Users of Emulex HBAs experiencing this problem are advised to update their kernel to at least version 2.6.9-42.EL. The related Bugzilla entry is https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=179752
The lpfc and qla2xxx drivers also have configuration options which cause the driver to handle RSCNs in a less invasive manner, which often prevents timeouts during RSCN handling. These options must be set in the /etc/modprobe.conf file:
options lpfc lpfc_use_adisc=1
options qla2xxx ql2xprocessrscn=1
After making these changes, the initrd must be rebuilt and the system must be rebooted for the changes to take effect.
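The modprobe.conf edit can be scripted roughly as follows. This is a sketch under assumptions: add_module_option is a hypothetical helper, and you should only add the line for the HBA driver your system actually uses. It appends the options line only if an identical line is not already present, so it is safe to re-run.

```bash
#!/bin/bash
# Hypothetical helper: append an "options" line to modprobe.conf once.
# Usage: add_module_option <conf-file> <module> <option=value>
add_module_option() {
    local conf=$1 module=$2 opt=$3
    local line="options $module $opt"
    touch "$conf" || return 1
    # -x: whole-line match, -F: treat the line as a fixed string.
    if ! grep -qxF "$line" "$conf"; then
        echo "$line" >> "$conf"
    fi
}
```

On an Emulex system running Red Hat Enterprise Linux 4, the whole sequence might look like this (as root, with the function loaded into the shell), followed by a reboot:
# add_module_option /etc/modprobe.conf lpfc lpfc_use_adisc=1
# mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)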
Recommendation:
This problem may be prevented or mitigated by applying SAN vendor recommended configurations and firmware updates to HBAs, switches, and disk arrays on the fabric, as well as recommended configurations and updates to multipathing software. This particularly applies to timeout and retry settings.
The architecture of Fibre Channel assumes that the fabric changes infrequently, so RSCNs can be disruptive even on properly configured fabrics. Events which generate RSCNs should be minimized, particularly at times of high activity, since this causes RSCN handling to take longer than it would on a mostly idle fabric.
In multipathed environments with separate fabrics for different paths, zone changes to the fabrics should be made far apart in time. It is not uncommon for complete handling of a zone change to take many minutes on a busy fabric with many systems and LUNs. Performing zone changes separately minimizes the risk of all paths timing out due to RSCN handling.
What does the /proc/cluster/lock_dlm/drop_count file do and why do some nodes exceed this value?
by Wade Mealing
The lock_dlm lock manager keeps a certain number of DLM locks for GFS files in memory, even after the GFS files are closed. This DLM "lock cache" boosts performance for GFS files that are frequently opened
and closed. The /proc/cluster/lock_dlm/drop_count file is used to tune the number of locks that lock_dlm keeps in its cache.
When /proc/cluster/lock_dlm/drop_count is not zero, the lock_dlm lock manager attempts to keep the number of local locks for GFS below this level (on a per-node basis). This is not a hard limit: when the number of locks on a node exceeds this value, DLM begins to release some of them. As a result, a minor performance penalty may occur when accessing files or POSIX locks that are no longer in the cache.
The current DLM implementation in Red Hat Enterprise Linux 4 defaults to 50000 locks per node. This value may be changed, but the change must be made before the GFS file system is mounted on the node.
To change this value, use the following command:
/bin/echo "12345" > /proc/cluster/lock_dlm/drop_count
Where "12345" is the upper limit on the number of cached DLM locks.
Changes to this file will not take effect on currently mounted file systems. If the value is set to zero, DLM will never purge locks from its cache.
This value is not persistent across reboots, so the command must be run on a node each time it has been rebooted or fenced. It can be automated by adding the command to the gfs service init script, /etc/rc.d/init.d/gfs.
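A minimal sketch of such a step follows, assuming it runs before the GFS file systems are mounted (for example, early in the gfs init script). The function name set_drop_count is hypothetical. It refuses anything that is not a non-negative integer, writes the value, and reads it back to confirm; the target file is a parameter only so the sketch can be exercised against an ordinary file.

```bash
#!/bin/bash
# Hypothetical helper: set the lock_dlm cache limit and verify it.
# Must run before GFS is mounted; the value does not survive a reboot.
# Usage: set_drop_count <count> [file]
set_drop_count() {
    local count=$1 file=${2:-/proc/cluster/lock_dlm/drop_count}

    # Accept only non-negative integers (0 means "never purge the cache").
    case "$count" in
        ''|*[!0-9]*) echo "invalid drop_count: '$count'" >&2; return 1 ;;
    esac
    [ -e "$file" ] || { echo "$file not found (is lock_dlm loaded?)" >&2; return 1; }

    echo "$count" > "$file" || return 1

    # Read the value back to confirm the write took effect.
    [ "$(cat "$file")" = "$count" ]
}
```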
The current number of cached locks for a single mount point may be obtained with the command:
# gfs_tool counters /mount/point
(where /mount/point is the directory on which the GFS file system is mounted)
This command will return the total number of locks cached by all nodes. This may often be greater than the number of nodes multiplied by the number of open files per node. This is not unusual or dangerous.
Carefully consider whether this value should be changed on any node. Modifying this value only works on nodes using the lock_dlm lock manager. All changes should be tested in a test environment with a production workload before they are made in the production environment.
How do I set up single master replication in Directory Server?
by Andrew Ryan
Replication is the process by which one Directory Server receives directory data from another Directory Server and makes that data available to clients. This article assumes that two instances of Red Hat Directory Server are already configured.
The server containing the data that is to be replicated is the master server and the server receiving the data is the slave server.
Follow these steps to configure the system:
1. Create a replication user
The first step is to create a user that has access to the part of the directory that needs replicating. To do this, start up the Directory Server console on the slave server:
- Select the Directory tab, then right-click on the config tab.
- Select New -> User.
- Fill in the mandatory fields on the dialog box. Remember the user ID that is chosen, and specify a password for the user. The DN (needed later) for this user is uid=username,cn=config.
2. Enable Changelog on the master server
Open the Directory Server console on the master server.
- Select the Configuration tab.
- Click on the Replication item.
- Check the box marked Enable Changelog.
- Click the button marked Use Default.
- Select Save.
3. Enable the master server as a single master replica
Open the Directory Server console on the master server.
- Select the Configuration tab.
- Expand the Replication item.
- Click on the userRoot item.
- Click on Enable Replica.
- Select Single Master.
- Choose a unique Replica ID.
- Click on the Save button.
4. Enable the slave server as a dedicated consumer
Open the Directory Server console on the slave server.
- Select the Configuration tab.
- Expand the Replication item.
- Click on the userRoot item.
- Click on Enable Replica.
- Enter the DN of the Replication User set up in Step 1 as a supplier DN, in the box marked "Enter a new Supplier DN". You may need to scroll down to see this box.
- Click on the Add button next to this box.
- Ensure that the Replica type is Dedicated Consumer.
- Click on the Save button.
5. Configure a replication agreement
Open the Directory Server console on the master server.
- Select the Configuration tab.
- Expand the Replication item.
- Right-click on the userRoot item.
- Select New Replication Agreement...
- Enter a name and description for the agreement, and click Next.
- Select the slave server from the drop-down list or, if it is not present, click on Other... and enter the slave's details.
- In the Uid field, enter the DN for the replication user, for example uid=Replication User,cn=config.
- Enter the password for the replication user.
- Select Next.
- Choose attributes to replicate, or select Next for a full replica.
- Choose a Synchronisation schedule, or select Always Keep in Sync.
- Select Next.
- Select Initialise Consumer now. If you do not want the slave server to be updated, or want to configure the slave server manually, select another option, such as exporting to LDIF.
- Select Finish.
6. Configure clients to handle failover
Edit /etc/ldap.conf, changing the host line to add the slave server. In this example, the slave server is used as the primary server, with failover to the master if the slave is down:
host slave.example.com master.example.com
7. Test the configuration
Make sure the openldap-clients package is installed.
Using the ldapadd command, check that the slave is a read-only replica, and that attempts to add information to it result in referrals to the master server:
[slave ~]# ldapadd -D 'cn=Directory Manager' -H ldap://slave/ -x -W -f people.ldif
Enter LDAP Password:
ldap_add: Referral (10)
    matched DN: dc=example,dc=com
    referrals: ldap://master.example.com:389
Using the ldapadd command, add the data to the master server:
[slave ~]# ldapadd -D 'cn=Directory Manager' -W -H ldap://master/ -x -f people.ldif
Enter LDAP Password:
adding new entry "dc=master,dc=example,dc=com"
adding new entry "uid=test,dc=master,dc=example,dc=com"
Using ldapsearch, check that both the master and the slave can be queried, and that the added information has been replicated (diff prints nothing when the two results are identical):
[slave ~]# ldapsearch -H ldap://slave/ -x '(uid=test)' > slave-search.txt
[slave ~]# ldapsearch -H ldap://master/ -x '(uid=test)' > master-search.txt
[slave ~]# diff master-search.txt slave-search.txt
[slave ~]#
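The contents of people.ldif are not shown in the procedure. A file consistent with the two entries in the ldapadd output could be created as follows; the attribute values are assumptions, and dc=example,dc=com is assumed to already exist as the server's suffix.

```bash
#!/bin/bash
# Write a sample people.ldif matching the entries in the ldapadd output.
# All attribute values are illustrative assumptions.
cat > people.ldif <<'EOF'
dn: dc=master,dc=example,dc=com
objectClass: top
objectClass: domain
dc: master

dn: uid=test,dc=master,dc=example,dc=com
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
uid: test
cn: Test User
sn: User
EOF
```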
Why should manual fencing be avoided in a production cluster?
by Demosthenes Mateo
Global File System (GFS) manual fencing should be avoided in production clusters. It is meant for testing purposes only, when an appropriate fencing device is not yet available. Red Hat recommends a network power switch or a Fibre Channel switch fencing device for production clusters to guarantee filesystem integrity.
Outlined below is a scenario that explains how manual fencing might lead to filesystem corruption:
- A node stops sending heartbeats long enough to be dropped from the cluster, but has neither panicked nor suffered a hardware failure. There are a number of ways this could happen: a faulty network switch, gulm hanging while writing to syslog, a rogue application on the system locking out other applications, etc.
- One of the other cluster members initiates fencing of the node. fence_manual is called, and lock manager operations are put on hold until the fencing operation is complete. (NOTE: Existing locks are still valid, and I/O still continues for those activities not requiring additional lock requests.)
- The administrator sees the fence_manual message and immediately runs fence_ack_manual to get the cluster running again, before checking on the status of the failed node.
- Journals for the fenced node are replayed and locks cleared for those entries so other operations can continue.
- The fenced node continues to do read/write operations based on its last lock requests. The file system is now corrupt.







January 23rd, 2007 at 10:12 pm
Hello sir,
I would like to ask some questions.
I am Bhupesh Karankar, RHCT and RHCE on RHEL 3.0, working as a System Administrator at OpenLX.
I have found a problem with our Samba server over the last few days.
We have more than 2000 users who use the Samba share at the same time; we share some documents from the Samba server.
The problem is: when a user opens an Excel file from the Samba share, changes/edits it and saves it, it works, but when a second user opens a file that is already open, he gets a popup saying the file is locked.
We were not facing this problem until a few days ago, and I don't know the reason for it. The configuration is the same as before.
Can you guide me on what I have to do, and which configuration setting may be causing the problem?
Can you give me a simple example configuration for that?
Waiting for your reply,
Bhupesh Karankar
May 7th, 2007 at 11:45 pm
I believe that’s part of Excel… I know Access has this feature; it locks the file so that it doesn’t get 2 different inputs of data at the same time. If it is open at point A and then someone at point B opens it, enters data, saves and is done, and then point A adds his info… things don’t go well. It does this to reinforce integrity, if I remember correctly.
May 31st, 2007 at 3:05 am
Hi everyone,
can anyone tell me how I can change the kernel in RHEL4,
from 2.6.9-5 to 2.6.9-42?
What are the requirements, and where can I find the necessary packages?
July 10th, 2008 at 4:46 am
Hi,
who told you that it's a part of Excel?
Do you know anything about oplocks in Samba?
Please don't comment if YOU don't know anything about it.
Thanks
Bhupesh