I need some technical help and assistance. - Updated & Resolved
By: Lewis A. Mettler on: Fri 16 of Mar, 2012 [18:05 UTC]
I need some technical help and assistance. Or, a guinea pig.
Occasionally we all have technical problems accessing the Internet. And I seem to have a really difficult one.
I have no problem downloading Kubuntu or MintLinux but have extreme difficulty downloading a rather large XML file from a different source. I first concluded that my equipment and ISP are fine because I only have a problem with the large XML file at a specific IP. But, of course, “they” claim they are fine. And they might be. I have run a number of tests on their IP (ping) and it checks out fine with certain exceptions. The devil always being in the details.
Test 1: Ongoing ping of both Kubuntu and the target while a Kubuntu download is in progress. No problems, no lost packets. Okay 1 or 2 over 40 minutes or so. Download stream runs at about 400 kb/s. So good so far. Everybody is cool and working, right?
Test 2: Ongoing ping of both Kubuntu and the target while a download is in progress from the target. Huge problems. Both ping streams report about a 16% packet loss over an hour and a half. Download stream runs at only 200 kb/s plus a huge number of errors and lost packets.
So it would appear a download stream from one IP seriously affects an ongoing ping test with that IP and with a control IP too (Kubuntu.org). How does that happen?
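For anyone who wants to compare loss figures the same way, here is a minimal Python sketch that pulls the packet counts out of ping's closing summary line and turns them into a loss percentage. It assumes the summary format printed by the common Linux ping; the sample line is made up for illustration (roughly one ping per second for an hour and a half) and is not a copy of my actual output.

```python
import re

# Matches the summary line printed by the common Linux ping, e.g.
# "100 packets transmitted, 84 received, 16% packet loss, time 99123ms".
# The optional "packets " also covers the BSD/macOS wording.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packet_loss(ping_output: str) -> float:
    """Return the percentage of lost packets from ping's summary line."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


# Illustrative sample only, not real output from the tests above.
sample = "5400 packets transmitted, 4536 received, 16% packet loss, time 5400123ms"
print(f"{packet_loss(sample):.1f}% loss")
```

Running the same function over the output for the target and for the control host makes it easy to see whether both streams degrade together.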
I do not want to disclose the target IP here. Sort of protecting the innocent until proved guilty I suppose. And I do not know that the problem is at their end.
But, I need help. If you have actually seen such an event or circumstance you are more than welcome to offer advice. Both my ISP and the target IP are working on it. But, so far we are stumped.
It is easy to help out however. I can provide the necessary download instructions (very simple). And you can initiate the download. It is about 700 MB and should take about 30 minutes if successful (depending upon your bandwidth). If it keeps crapping out, it will really help to know that. Success or failure is very useful information right now.
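The 30 minute estimate is simple arithmetic, sketched below. It assumes the file is about 700 megabytes, that the rates quoted in the two tests are kilobytes per second, and that decimal units apply (1 MB = 1000 kB). The function name is just for illustration.

```python
def download_minutes(size_mb: float, rate_kb_per_s: float) -> float:
    """Minutes needed to transfer size_mb megabytes at rate_kb_per_s kilobytes/second."""
    seconds = size_mb * 1000 / rate_kb_per_s  # decimal units: 1 MB = 1000 kB
    return seconds / 60


print(round(download_minutes(700, 400), 1))  # rate seen during the Kubuntu download
print(round(download_minutes(700, 200), 1))  # rate seen during the troubled download
```

Under those assumptions the healthy rate works out to a little under half an hour, and the degraded rate to nearly an hour, before counting any retries.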
Send me an email and let me know what you think. Or, just volunteer to try the download from your systems. Either result is very helpful. I can provide the URL for the download and then we have more data points.
Now back to watching those cases with Oracle and Microsoft.
A couple of readers have volunteered to give the troubled download a try.
One reported a failure.
One reported a success.
It is definitely a problem. And, yes, misery loves company. But, any diagnostic process can benefit from success as well.
We are using the normal tools like ping and traceroute in order to try to figure out the problem. And a newer utility called MTR, or My Trace Route. If you have any experience with using these tools to pinpoint problem areas in Internet communications, you are more than welcome to give me a jingle. MTR, or mtr, is installed by default on Kubuntu 11.10 and 11.04. And I assume other versions of Ubuntu, Kubuntu, etc. And perhaps Fedora and others. MTR is a combination of ping and traceroute. It is a very interesting diagnostic tool. But, as with any great tool, it takes some experience to know what the reports are really telling you.
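If you want to capture an MTR report to send along, a small Python wrapper like the following is one way to do it. This is only a sketch: it assumes mtr is installed and on your PATH, uses mtr's report mode with a fixed number of cycles, and the host name example.org is a placeholder, not the target in question.

```python
import shutil
import subprocess


def mtr_command(host: str, cycles: int = 100) -> list[str]:
    """Build an mtr command line that prints a one-shot report and exits."""
    return ["mtr", "--report", "--report-cycles", str(cycles), host]


def run_mtr(host: str, cycles: int = 100) -> str:
    """Run mtr and return its text report (one row per hop)."""
    if shutil.which("mtr") is None:
        raise RuntimeError("mtr is not installed or not on PATH")
    result = subprocess.run(
        mtr_command(host, cycles), capture_output=True, text=True, check=True
    )
    return result.stdout


# Show the command that would be run; example.org is a placeholder host.
print(" ".join(mtr_command("example.org", cycles=10)))
```

Calling `run_mtr("example.org", cycles=10)` returns the report as text, with per-hop loss and latency columns, which can be saved once while a download is running and once while idle for comparison.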
We could have a hosting server problem. Or, perhaps an en route problem. Or, even a client problem (possibly related to a less than huge pipe). I know I have problems. So I do not need this one. But, we know the date the problem started and it has been present every day since. At least the problem is consistently present. If you have it.
I am still interested in having a few more users give downloading the XML file a try. Send me an email and I can give you the particulars. See the email address at the top of the page.
Eventually you do find the problem if you keep working on it. At least I would like to think so. But, first I need to thank one more reader for helping out. A total of three readers have tried the download that was causing the problem. That person was able to conduct the download without incident. So of the downloads I was able to evaluate, two failed (including my own) and two worked fine. Now it was a large download (about 700 MB) and yes sometimes you do have problems conducting downloads regardless of size. But, this problem had persisted for almost two months.
And to be fair you have to realize that the download process was not a frequent or common event. It is rather unique data and not many users actually tried to download the data. In that sense, it is very different than Kubuntu or Ubuntu ISO files or even audio or video files. I do not know what the total number of daily downloads might be but it is relatively low.
To the discredit of the guilty party, it was only revealed to me yesterday that no other users complained of the difficult download. None. And I assume that the representation is true.
And as you can imagine, in these kinds of situations, everyone is pointing fingers at everyone else. You have been there, done that and suffered because of it.
But, we kept at it. The test downloads that resulted from the listing here were invaluable. Invaluable in pointing out to the guilty party that I was not the only person on the planet that had the problem. But, you have to go through all of the pain and frustration trying to diagnose the problem and, yes, point the finger elsewhere. Not easy.
Well, you have guessed by now where the problem was. The hosting servers and equipment. But, most interesting in all of this effort is that none of the diagnostic software was able to identify the problem. Pings, traceroute and even MTR. MTR is a very nice tool by the way for those of you who have not yet had the opportunity to use it. It is available for Unix, Linux and even Microsoft. And I assume for Apple.
But, in the end, it was not the many reports and tests that pointed out the problem. Rather, it was the effort by IT staff at the host to just think through the problem trying to take into account all of the available reports.
And keep in mind that any host is naturally inclined to blame the problem on the single user who has the problem. Everyone is happy but one customer. But, thankfully, I did have a download report in my hand from one of the readers here that reported also having great difficulty in conducting the download. Just knowing that two people (one in Sacramento and one in Seattle) are having difficulty is invaluable. Keep in mind we are not talking about 10 out of 100 or even 1000 out of 10,000. Two known problem users. That is it. And as far as I know 2 users who can download the file without a problem.
And naturally I had substituted client systems, operating systems, browsers, switches, routers and even DSL modems in a vain attempt to find a local problem. I even fired up a copy of the Microsoft OS. And, of course, knowing that the reader in Seattle could not download the file either, certainly seems to let my own systems and ISP off the hook. But, the host or target system only had one of their customers with the problem. And that customer was me.
Yes. I will tell you what and where the problem resided.
It was the load balancing unit/software at the host.
Now how could a load balancing device prevent valid downloads? Well, do not ask me to explain it any further. If you have worked with these devices, you know they are obviously subject to customization. And, it was explained to me that configuring these devices is more of an art than a science. Oh, hogwash. It is a digital device running some software. Science all the way. But, the knowledge of what is going on or might go on or who might be disadvantaged is at best a bit unclear. Obviously my downloads were caught somehow and screwed up. Real bad as it turns out.
And, of course, the load balancing was outside the box as far as ping, traceroute and MTR are concerned. So those standard tools were of little help except to point out that most things seem to be working just fine. Actually, those reports did show some anomalies. But, in the end, they did not reflect the problem. Strangely enough the download seemed to adversely affect the MTR reports.
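Since ping and friends did not exercise whatever the load balancer was doing to the download itself, one complementary check is to time the transfer directly. The Python sketch below is an assumption on my part about how one might do that, not something the host or I actually ran: it reads any file-like object to the end and counts reads that take suspiciously long. The 5 second stall threshold and the function name are arbitrary choices for illustration.

```python
import io
import time


def time_transfer(stream, chunk_size=64 * 1024, stall_seconds=5.0):
    """Read a file-like object to the end; return (bytes_read, seconds, stalls).

    A "stall" is counted whenever a single read takes longer than stall_seconds.
    """
    total = 0
    stalls = 0
    start = time.monotonic()
    while True:
        before = time.monotonic()
        chunk = stream.read(chunk_size)
        if time.monotonic() - before > stall_seconds:
            stalls += 1
        if not chunk:
            break
        total += len(chunk)
    return total, time.monotonic() - start, stalls


# Demonstration on an in-memory stream. For a real test, pass the response
# object returned by urllib.request.urlopen(url) with your own download URL.
size, seconds, stalls = time_transfer(io.BytesIO(b"x" * 200_000))
print(size, stalls)
```

A report of total bytes, elapsed time and stall count from several locations gives the host something concrete about the download path, which is exactly the part the standard tools could not see.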
What's to be learned here?
It is rather simple. Thinking can solve problems. Just thinking through all of the possible sources of the problem and trying to understand the various tests and what possibilities they do or do not rule out, can be invaluable. So, yes, it was simply a thinking session last night by someone who just could not make sense out of all of the tests and reports. Then it happens. Maybe “this black box” is the problem.
Now keep in mind that I originally reported that the problem developed on a given date (and I presume time). So somebody did something. Or, something busted. Or, something was updated. But, even knowing that does not point the finger unless you know all of the hardware and software involved between two different locations and en route. And, no person has access to all of it.
So, yes, someone screwed around with the load balancing device and that process prevented valid downloads from being realized by some customers. Or, perhaps one single customer.
Not easy. And it took almost 2 calendar months to figure it out. Well, okay, it took the host about 3 weeks to get off their duff and seriously consider the issue.
Just remember, finger pointing is easy. Locating the problem can be a bitch.
In the end, it was the human brain thinking about the issue and eventually being able to identify the hardware/software that was causing the problem. There is a lot of comfort in knowing that.
And be easy on that single customer who is having problems with your service. It may not be their problem.