In a bind - 2003 Server + 100 XP Pro clients...

1312 Views 11 Replies 6 Participants Last post by  baker421
Ok...it's not a huge deal as my boss is very understanding and he knows that I'm not a complete idiot...but it's bugging me and I've been unable to find anything specific on the 'net that applies, so I'm hoping someone can shed some light on this phenomenon...

What we had: A Windows 2003 Server domain with Active Directory, running primarily as a DHCP Server/Domain Controller, and a "lump" that I work with assigning Group Policies to the users (at my boss's request). The Server has had problems with its Global Directory (I'm not positive if that's what it's called - I'm PC Support, not Server Support) for over a year and the aforementioned "lump" has been tasked for over 12 months with replacing it (it still has yet to be done).

What happened: One of the RAID drives failed causing the server to crash - hard. The secondary server was put online, however most of the client computers had (somehow) been assigned static IP addresses so they were pointing to the wrong place. Many hours of hassle later the problem is resolved, however...

Who's getting the blame: Two of my four coworkers are stating that I must have gone around to 100+ PCs and statically assigned all of the IP addresses. Let's not consider the fact that to do that with 100+ computers at five different sites would be an enormous waste of my time, especially considering that I know the difference between Static and Dynamic and the necessity of the client computers to be assigned via DHCP.

What I'm looking for: Has anyone encountered something like a Server 2003 crash that could cause the client computers to "keep" their dynamically assigned IPs as their very own? If not, is there a command that could be used in a Logon Script (via Group Policy) that could cause the phenomenon?

I don't believe that any one of our staff had the inclination to "maliciously" go around and manually assign the IP addresses, but I'm pretty pissed that this is being pushed onto my shoulders and it would be nice to show that there are other things that could have happened to cause it. Of course, it would be even nicer to prevent the same problem in the future...

Sorry for the pissy tone...I know it sounds like I'm whining, but I've actually got (what I think) are some reasonable questions :p
Status
Not open for further replies.
1 - 12 of 12 Posts
I had a domain controller / DHCP server crash a few weeks ago and the XP clients did not get static IP addresses; they only got automatic (APIPA) addresses, 169.254.x.x, when the lease ran out. It is possible to assign a static IP address from a script, but it would most likely assign the same IP address to each computer. Is this the case? If not, were the addresses in the same subnet that would normally be given out?

Also, why did the failure of a single drive cause the server to crash? Isn't the intent of RAID to PREVENT this, or was it a RAID 0?

PS. It's Global Catalog
It did not assign the same IP address to each computer. My suspicion is that something caused the individual PCs to statically assign the IP address that they had been dynamically assigned. The IPs on the machines that I looked at were all on the correct subnets by site location. It *looks* like someone had gone to each PC, looked at what their dynamically assigned IP was, and statically assigned it...I just can't imagine anyone on our staff having the time or inclination to do so, and the regular users don't have the option (group policy prevents anyone except for the IT department from accessing *anything* except for the printers and 2 or 3 software packages, none of which have that capability).

As far as the RAID goes, I believe it's a RAID 1, however the machine has had enough ongoing problems over the last year or so to make it perfectly feasible that one drive failure could cause a lot of unforeseen problems. It wouldn't surprise me in the least if the individuals that were supposed to be replacing it messed something else up on the box anyway, but that's just speculation.

Well, I'll keep doing some digging on it and if it turns anything up I'll pass the information on to you folks here - thanks for your input! It certainly is a big question for me right now...
When you say statically-assigned, do you mean they have been physically set in the interface's network connection properties? Has the "Obtain an IP address automatically" option been changed to a static assignment, or has the Alternate Configuration tab been set and is actively in use? You will find this information by accessing the properties of TCP/IP under the network connection's properties.

If it is still set as "Obtain an IP address automatically", this means that these addresses are being assigned by another DHCP server. These machines "shouldn't" be affected by a rogue DHCP, as the domain should have authorized DHCP set up. Take a look into it.
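If checking the TCP/IP properties by hand on 100+ machines isn't practical, the same answer shows up on the "Dhcp Enabled" line of `ipconfig /all`. Here is a minimal sketch, assuming you have saved that output to text somehow; the sample text, the adapter description, and the `dhcp_status` helper are made up for illustration and are not from the original poster's network.

```python
import re

# Sample text in the general shape of `ipconfig /all` output on Windows XP.
# The adapter description and subnet mask are invented for this example.
SAMPLE = """\
Ethernet adapter Local Area Connection:

        Connection-specific DNS Suffix  . :
        Description . . . . . . . . . . . : Example NIC
        Dhcp Enabled. . . . . . . . . . . : No
        IP Address. . . . . . . . . . . . : 192.242.229.102
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
"""


def dhcp_status(ipconfig_text):
    """Return {adapter name: True/False} from each adapter's 'Dhcp Enabled' line."""
    status = {}
    adapter = None
    for line in ipconfig_text.splitlines():
        # Adapter headers start in column 0 and end with a colon.
        header = re.match(r"^(\S.*adapter .*):\s*$", line)
        if header:
            adapter = header.group(1)
            continue
        # Field lines are indented and padded with ". . ." before the colon.
        field = re.match(r"^\s+Dhcp Enabled[ .]*:\s*(Yes|No)\s*$", line, re.IGNORECASE)
        if field and adapter is not None:
            status[adapter] = field.group(1).lower() == "yes"
    return status


if __name__ == "__main__":
    for name, enabled in dhcp_status(SAMPLE).items():
        print(name, "->", "DHCP" if enabled else "static")
```

A "No" on that line means the address was typed in under "Use the following IP address", which is the case being asked about here.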

As for RAID, RAID 0 refers to a striped RAID, where information is striped between drives to increase performance. This however provides no fault tolerance whatsoever, and if one of the drives fails the other drive is not redundant as it only has pieces of data. RAID 1 on the other hand is mirrored, which does provide fault tolerance. If one drive fails, the other drive should have an exact copy of the failed drive and pick up where things left off.
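To make the striping vs. mirroring difference concrete, here is a toy sketch. It models "disks" as Python byte strings and is only an illustration of the idea, not how a real controller lays out data; all function names are invented for the example.

```python
def stripe(data, disks=2):
    """RAID 0 style: deal the bytes out across the disks in turn."""
    return [data[i::disks] for i in range(disks)]


def mirror(data, disks=2):
    """RAID 1 style: every disk holds a full copy."""
    return [data for _ in range(disks)]


def read_raid0(disks):
    """Rebuild striped data; returns None if any disk is missing."""
    if any(d is None for d in disks):
        return None
    out = bytearray()
    for i in range(max(len(d) for d in disks)):
        for d in disks:
            if i < len(d):
                out.append(d[i])
    return bytes(out)


def read_raid1(disks):
    """Return the first surviving copy, or None if every disk is gone."""
    for d in disks:
        if d is not None:
            return d
    return None


if __name__ == "__main__":
    data = b"domain controller"
    striped, mirrored = stripe(data), mirror(data)
    striped[1] = None   # simulate one failed drive in the stripe set
    mirrored[1] = None  # simulate one failed drive in the mirror
    print("RAID 0 after one failure:", read_raid0(striped))
    print("RAID 1 after one failure:", read_raid1(mirrored))
```

With one "drive" knocked out, the striped set has nothing usable to return while the mirror still hands back the full data.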
I too am suspicious of a rogue DHCP server such as a router or something else. I might also be wondering about APIPA (automatic but self-assigned addresses).

1. Could you please give us some samples of the "static" IP addresses the clients are getting, e.g. 192.168.0.12 or whatever?

2. Could you also give us the scope of the dhcp in the server, for comparison?

Someone might well figure something out just by seeing those numbers.

3. Did you answer as to whether the "Obtain an IP address automatically" was actually not checked on the clients?

Cheers, JB
The secondary server was put online, however most of the client computers had (somehow) been assigned static IP addresses so they were pointing to the wrong place.

Well, "pointing" to the wrong place and static IPs are different in my opinion. By pointing, I would think you mean the DNS server? If this is the case, perhaps it was set up incorrectly from the beginning. If the computers were "pointing" to something other than the DC, then you probably experienced slow connections, dropped mappings and dropped GP. If you are running Windows XP in a domain, it has to point at the domain controller.

Does this answer any of your questions?

I guess I am foggy as to what you mean by "pointing."
You said the DC went down "hard." Are you sure you even have dhcp installed and running with a viable scope?

I'd still really like to see the IP numbers of some of those "static" addresses, and know if the boxes are checked to obtain IP addresses automatically.

Unless the system was set up with statics originally, and the server has been rebuilt in such a way that they don't match, I can't see how a bunch of clients just switched to static. In fact, I don't think they did...?

Help us out here?

Cheers,
JB
This is quite interesting given that your DC had crashed and a secondary "server" was brought online. If this new server did not acquire the FSMO roles and the global catalog from your original DC (and you did not mention that these roles were seized), the original domain is theoretically nonexistent. The workstations are logging in under cached credentials.

I cannot see how the workstation IPs were set statically by themselves. There has to be some sort of intervention at the workstation. Even if the workstations use APIPA, if all the workstations are on the same network, no two computers would get the same address. When the computer decides to use APIPA, it will broadcast what IP it will use. If no other computer responds, then it will assign itself that address. However, if the computer is using APIPA, it cannot communicate with computers on other subnets. Therefore it would be unable to contact the DC.
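The "pick an address, announce it, keep it if nobody objects" behaviour can be sketched roughly like this. It is a simulation only: the `in_use` set stands in for the machines that would answer the probe on a real network, and `pick_apipa_address` and `max_tries` are names invented for the example.

```python
import ipaddress
import random

APIPA_NET = ipaddress.ip_network("169.254.0.0/16")


def pick_apipa_address(in_use, rng, max_tries=10):
    """Pick a random 169.254.x.x candidate and keep it only if nobody 'answers'.

    `in_use` plays the part of the hosts that would reply to the probe.
    Returns None if every attempt collided.
    """
    for _ in range(max_tries):
        # Usable host range is 169.254.1.0 through 169.254.254.255.
        candidate = ipaddress.ip_address(
            int(APIPA_NET.network_address) + rng.randint(256, 65279)
        )
        if candidate not in in_use:
            in_use.add(candidate)  # claim the address
            return candidate
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    taken = set()
    hosts = [pick_apipa_address(taken, rng) for _ in range(100)]
    print(len(set(hosts)), "unique addresses, e.g.", hosts[0])
```

Every address this hands out sits in 169.254.x.x, which is why APIPA machines can talk to each other on the local segment but not to a DC on another subnet.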
Whew...I missed a lot in the last few days - sorry for the long delays. I'll try to answer everyone's questions to the best of my abilities, however keep in mind that there is a lot of information that (while I have access to it) I'm not terribly knowledgeable about.

When you say statically-assigned, do you mean they have been physically set in the interface's network connection properties? Has the "Obtain an IP address automatically" option been changed to a static assignment, or has the Alternate Configuration tab been set and is actively in use? You will find this information by accessing the properties of TCP/IP under the network connection's properties.

If it is still set as "Obtain an IP address automatically", this means that these addresses are being assigned by another DHCP server. These machines "shouldn't" be affected by a rogue DHCP, as the domain should have authorized DHCP set up. Take a look into it.

As for RAID, RAID 0 refers to a striped RAID, where information is striped between drives to increase performance. This however provides no fault tolerance whatsoever, and if one of the drives fails the other drive is not redundant as it only has pieces of data. RAID 1 on the other hand is mirrored, which does provide fault tolerance. If one drive fails, the other drive should have an exact copy of the failed drive and pick up where things left off.
The option for "Assign Automagically" had been changed. As I said before, there doesn't seem to be any rhyme or reason for it. The RAID, from what I understand, was using disk striping and with one failure, a good portion of the information was inaccessible. If it was *supposed* to be recoverable, it sure as hell wasn't as we had to order a new drive and rebuild the RAID before the DC was accessible outside of "Recovery Mode".
I too am suspicious of a rogue DHCP server such as a router or something else. I might also be wondering about APIPA (automatic but self-assigned addresses).

1. Could you please give us some samples of the "static" IP addresses the clients are getting, e.g. 192.168.0.12 or whatever?

2. Could you also give us the scope of the dhcp in the server, for comparison?

Someone might well figure something out just by seeing those numbers.

3. Did you answer as to whether the "Obtain an IP address automatically" was actually not checked on the clients?
Last things first: To everyone's knowledge "Obtain Automatically" was selected on each of the affected machines prior to the event. The reason that I was being blamed was because I'm the only person that has spent any amount of time onsite at a few of the sites. Of course, as I said before, the idea is laughable that I would have time or inclination to hop from machine to machine to make such a ludicrous change, but some of my coworkers seem to think that's a viable option.

I'm not sure of the specific scope and stuff - but I can toss you some IPs if it will help.
The PDC uses 192.242.220.10 and is running as the DHCP server as well as the DNS server. We have other controllers at each site, but they all get their scope (from what I understand) from the above machine. Each site has a different third set of numbers - 192.242.229.xxx, 192.242.226.xxx etc. The computers at the individual sites that were affected had IP addresses that matched the scheme (192.242.229.102, for example), the same addresses that would normally be assigned through DHCP. The difference is that they were statically assigned somehow (as mentioned above). Because of the static settings, when the Domain Controller failed they were unable to pull proper records from DNS (because it's the same server) and basically lost most of their connectivity. The only reasonable solution was to log into each machine manually and change the settings back - hence the reason for "blame" in the first place.
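For comparing a pile of client addresses against the site layout described above, something like the following sketch could help. Only the 192.242.220.x / 229.x / 226.x patterns come from this thread; the /24 subnet size, the site labels, and the `classify` helper are assumptions made for illustration.

```python
import ipaddress

# Assumption: each site is a /24. The labels are placeholders, not real site names.
SITE_SUBNETS = {
    "site 220 (PDC)": ipaddress.ip_network("192.242.220.0/24"),
    "site 229": ipaddress.ip_network("192.242.229.0/24"),
    "site 226": ipaddress.ip_network("192.242.226.0/24"),
}


def classify(addr_text):
    """Say whether an address is APIPA, inside a known site subnet, or neither."""
    addr = ipaddress.ip_address(addr_text)
    if addr.is_link_local:  # 169.254.0.0/16
        return "APIPA (self-assigned)"
    for site, net in SITE_SUBNETS.items():
        if addr in net:
            return "matches " + site
    return "outside the known site subnets"


if __name__ == "__main__":
    for sample in ("192.242.229.102", "169.254.10.20", "192.168.0.12"):
        print(sample, "->", classify(sample))
```

An address that lands in the right site subnet only shows it is plausible for that site. It does not by itself say whether it came from DHCP or was typed in.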

Well, "pointing" to the wrong place and static IPs are different in my opinion. By pointing, I would think you mean the DNS server? If this is the case, perhaps it was set up incorrectly from the beginning. If the computers were "pointing" to something other than the DC, then you probably experienced slow connections, dropped mappings and dropped GP. If you are running Windows XP in a domain, it has to point at the domain controller.

Does this answer any of your questions?

I guess I am foggy as to what you mean by "pointing."
Oops...got posts mixed up - I answered yours above :)

The DNS server that was "statically" assigned to each PC was wrong as soon as the PDC went down. Were they still set to DHCP, they would have been able to get the DNS records from another computer, but because of the static assignment they were lost at sea.

You said the DC went down "hard." Are you sure you even have dhcp installed and running with a viable scope?

I'd still really like to see the IP numbers of some of those "static" addresses, and know if the boxes are checked to obtain IP addresses automatically.

Unless the system was set up with statics originally, and the server has been rebuilt in such a way that they don't match, I can't see how a bunch of clients just switched to static. In fact, I don't think they did...?

Help us out here?

Cheers,
JB
I know...a lot of this is redundant, but I just want to make sure everyone gets the answers they're looking for :) The XP boxes definitely became static somehow - I fixed a good number of them myself. Whether someone personally did it, a group policy did it, or the server crash did it (or some unknown quantity) I can't say. The DHCP scope and everything *should* be set up correctly (no problems as long as the server is online), but then again, I'm definitely not a server guru, much less a networking guru. I can try to answer specific questions, but (as can be seen from my above posts), I'm hardly an expert. I can categorically say that *I* didn't change the computers from DHCP to Static Addresses, and I can say that when the server crashed it crashed - in the midst of a reboot it hung (as a result of the failed RAID) and could only be accessed in Recovery Mode, which apparently limits its ability to do its job (just repeating what I experienced). Our network guy isn't the smartest, but he knows a helluva lot more than I do so...well...is it correct? I haven't the foggiest...but it seems to be set up correctly from my limited experience.

This is quite interesting given that your DC had crashed and a secondary "server" was brought online. If this new server did not acquire the FSMO roles and the global catalog from your original DC (and you did not mention that these roles were seized), the original domain is theoretically nonexistent. The workstations are logging in under cached credentials.
This sounds like exactly what happened, despite not understanding some of what you're saying :)

I cannot see how the workstation IPs were set statically by themselves. There has to be some sort of intervention at the workstation. Even if the workstations use APIPA, if all the workstations are on the same network, no two computers would get the same address. When the computer decides to use APIPA, it will broadcast what IP it will use. If no other computer responds, then it will assign itself that address. However, if the computer is using APIPA, it cannot communicate with computers on other subnets. Therefore it would be unable to contact the DC.
Can you explain a bit more about APIPA? I've noticed that with the IP address leasing stuff, it seems that a computer is prone to getting the same IP address anyway (unless there are conflicts). If that's the case, and if this could cause a formerly DHCP setting to become static, this sounds like what may have happened. The vast majority of the computers did not seem to have problems until after they were rebooted, perhaps clearing out some of the cached files and being unable to contact the PDC? If they were somehow attempting to use their former IP addy, it makes sense that there wouldn't be a conflict.

I just don't know enough about all of this - please pardon my ignorance :( I realize that on an odd problem like this y'all need more clarification, I just wish that I were more capable of expressing what happened in real terms instead of having to rely on my understanding of what's going on (hence the bastardized geek-speak :D )
You asked about APIPA. My comment there was meant to be rhetorical. Automatic Private IP Addressing (APIPA) will only occur when there is no DHCP server available, not even a router. This is how a small peer-to-peer network can function. That would be a small group of home or office computers hooked together with a switch or hub, or even just 2 computers hooked up back to back with a crossover cable. They don't find a DHCP server, not even a router, so they have to assign themselves the IPs.

Of course they would have to have the box checked: "Obtain an IP address automatically." If they had statics already onboard ("Use the following IP address"), then they wouldn't and couldn't use APIPA.

APIPA is just what Microsoft tosses in there so that a small group can function without DHCP.

I don't believe you have that issue. One reason is the scope you gave. Those aren't APIPA IPs; APIPA addresses always come from 169.254.x.x. (The limit of 10 that sometimes gets mentioned here is XP's cap on simultaneous inbound connections to one workstation, such as file shares. It doesn't limit how many machines can use APIPA.)

Another is that you seem certain that statics are assigned (the box is checked saying something similar to "Use the following IP address.")

You're darned sure that's true?

I know this doesn't solve your problem at all but I think you can ignore the apipa comments.

Remember, if the clients don't have that box checked: "Use the following IP address" Then you don't have statics. I don't recall you verifying that this is the case. If that box is checked, my $.02 is that they were set up that way in the beginning and no one realized it. I agree that it doesn't make sense why someone would do that, but it wouldn't have affected functionality per se.

If I were smarter maybe I could tell you how to push that change from the server with AD, but I don't even know if it can be done. Maybe someone else here does.

In any event, I would want all of those clients changed so that "Obtain an IP address automatically" was checked rather than "Use the following IP address."

Now about the RAID. If you were able to repair the DC with a new HDD and without reinstalling your OS etc, then it sounds as if you just need to keep a new HDD around at all times. Some types of RAID will rebuild that way but still not function when one HDD goes down, until a good HDD is installed and the rebuild process completed. Others will not repair and you have to reinstall everything.

Sorry I told you more about what problems you don't have than figuring out what you do have.

Good luck,

JB
Group Policy has no built-in setting that changes the network settings to static. However, someone could create a script that runs during logon and makes such a change. From the IPs you have given, your workstations are not being assigned APIPA addresses, and your subnets appear to be in the 192.242.xxx.xxx range.

My recommendations for your situation are:
1. On the new domain controller, you will need to seize the FSMO roles.
http://support.microsoft.com/kb/255504
http://www.petri.co.il/seizing_fsmo_roles.htm
This is not a task to be taken lightly.
Without seizing the FSMO roles, your old domain is lost.
2. Change all workstations back to DHCP, pull them off the domain and then rejoin them back to the domain.

or

1. Create a new domain.
2. Change all workstations back to DHCP and let them join the new domain.
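For the "change all workstations back to DHCP" step in either option, XP's `netsh interface ip` commands can do it without clicking through the TCP/IP properties on each box. Below is a minimal sketch that only builds the command lines; it does not run anything. The connection name "Local Area Connection" is the XP default and is an assumption here, and `dhcp_reset_commands` is a helper name invented for the example.

```python
def dhcp_reset_commands(connection_name="Local Area Connection"):
    """Build the two netsh commands that switch one connection back to DHCP.

    The first resets the IP address, the second resets the DNS servers.
    """
    return [
        'netsh interface ip set address name="{}" source=dhcp'.format(connection_name),
        'netsh interface ip set dns name="{}" source=dhcp'.format(connection_name),
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed or pasted into a batch file.
    for command in dhcp_reset_commands():
        print(command)
```

These need to run with administrator rights on each workstation. Resetting DNS as well as the address matters in this case, since the statically set DNS server was what left the clients stranded when the PDC went down.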

More recommendations.
1. Hire a competent system administrator.
2. Add another domain controller to your network for replication purposes.
3. Implement a disaster recovery plan.

I know this involves money, but this little incident will prove to your boss that if you budget enough money for your network infrastructure, downtime can be limited.

All in all, it sounds to me like you are being made a scapegoat for someone else's incompetence.
Crazijoe said:

"More recommendations.
...Hire a competent system administrator..."

and

"All in all, it sounds to me like you are being made a scapegoat for someone else's incompetence."

That's it. Ditto. Bingo.