Need to vent...

The Weird and The Wonderful

Timothy Byrd
#1

    Our marketing department hired a new web developer, and they were given a task to write some code for our department. Here are a couple excerpts:

    // Names have been changed to protect the guilty...
    //
    namespace OMG
    {
        static class TranslationClass
        {
            static ArrayList nonTranslatable = new ArrayList();

            public static ArrayList NonTranslatable
            {
                get
                {
                    nonTranslatable.Add("1st String");
                    nonTranslatable.Add("2nd String");
                    // ...
                    nonTranslatable.Add("30th String");
                    return nonTranslatable;
                }
            }

            private SortedList translationDictionary = new SortedList();

            // more stuff here to fill the translationDictionary

            public void TranslateNode(XmlNode node)
            {
                foreach (string s in NonTranslatable)
                {
                    if (node.Attributes["Publish"].InnerText.Trim().Contains(s))
                    {
                        return;
                    }
                }

                foreach (DictionaryEntry de in translationDictionary)
                {
                    if (de.Key.ToString().Equals(node.Attributes["Publish"].InnerText.Trim()))
                        node.Attributes["Publish"].InnerText
                            = translationDictionary[node.Attributes["Publish"].InnerText.Trim()].ToString();
                }
            }
        }
    }

When I asked them to run their code on the sample dataset, I was told they'd need to run it overnight. Looking at their code, I could see why. So I wrote my own version of the utility and timed the two against each other. Their program took an hour and a quarter to process a 2MB text file; mine took 2 seconds. The biggest problem was that every time NonTranslatable was used, new, redundant elements were being added to it, so the loop took longer and longer to run. They were also looping through the dictionary instead of using something like ContainsKey(). Argh!
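
For illustration, a minimal sketch of the two fixes described above - populating the list only once, and looking the key up directly instead of scanning every entry. The names are carried over from the snippet; this is just a sketch, not the actual rewrite from the thread.

    using System.Collections;
    using System.Xml;

    namespace OMG
    {
        static class TranslationClassSketch
        {
            static readonly ArrayList nonTranslatable = new ArrayList();

            public static ArrayList NonTranslatable
            {
                get
                {
                    if (nonTranslatable.Count == 0)   // populate only on first use
                    {
                        nonTranslatable.Add("1st String");
                        // ...
                        nonTranslatable.Add("30th String");
                    }
                    return nonTranslatable;
                }
            }

            static readonly SortedList translationDictionary = new SortedList();   // filled elsewhere, as in the original

            public static void TranslateNode(XmlNode node)
            {
                string text = node.Attributes["Publish"].InnerText.Trim();

                foreach (string s in NonTranslatable)
                {
                    if (text.Contains(s))
                        return;   // contains a non-translatable marker, skip it
                }

                // Direct key lookup instead of iterating every DictionaryEntry.
                if (translationDictionary.ContainsKey(text))
                    node.Attributes["Publish"].InnerText =
                        translationDictionary[text].ToString();
            }
        }
    }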

PIEBALDconsult
#2 (in reply to Timothy Byrd, #1)

Ouch! If you fix those two problems (repetitive additions and looping), how much faster is it? And how about removing the .ToString() and .Equals()? If you are using .NET 3.5, how about having them use a HashSet?

Chris Meech
#3 (in reply to Timothy Byrd, #1)

        Well, I think marketing are getting exactly what they deserve. The slower they work, the better. :)

        Chris Meech I am Canadian. [heard in a local bar] In theory there is no difference between theory and practice. In practice there is. [Yogi Berra]

GibbleCH
#4 (in reply to PIEBALDconsult, #2)

Or better, if you're going to use 3.5, a List and some LINQ.
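
For illustration, a hypothetical sketch of that idea (assuming .NET 3.5 and the same placeholder strings; the class and method names are made up): Any() with a lambda replaces the hand-rolled loop.

    using System.Collections.Generic;
    using System.Linq;

    static class NonTranslatableLinq
    {
        // Placeholder strings, as in the original snippet.
        static readonly List<string> nonTranslatable = new List<string>
        {
            "1st String", "2nd String", /* ... */ "30th String"
        };

        // True if the trimmed text contains any non-translatable marker.
        public static bool IsNonTranslatable(string text)
        {
            return nonTranslatable.Any(s => text.Contains(s));
        }
    }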

Timothy Byrd
#5 (in reply to PIEBALDconsult, #2)

Simply putting an if (nonTranslatable.Count == 0) block around the code in the NonTranslatable property took the run time down from >75 minutes to ~30 seconds. (Fun fact: During the 75 minutes of processing, those repeated Add() calls caused the program to take up about 25MB of extra memory. That's a lot of strings to loop through and compare repeatedly. :omg: )

PIEBALDconsult wrote:

And how about removing the .ToString and .Equals ?

Not to mention all the calls to Trim() inside the loop. Or does the runtime optimize those out as invariant?

I ended up just dumping all their code and using my own. To add another wrinkle, they wrote it in VS 2008, and we are still on 2005 until we get this release out, so I couldn't even build their solution. Besides, there were other issues: not only was it still 10x slower than my rewrite, but the output wasn't quite correct. The problem is that there was also a non-XML text file that needed to be parsed - imagine taking a Windows .rc file, extracting all the strings that can be translated from it, and putting them into an Excel file. They weren't careful enough with their regular expressions, so some things were getting missed and some extra things were being found.

Using .NET 2.0 and a Dictionary<string, string> worked fine for me. By making the regular expressions more correct, I didn't need the NonTranslatable array at all. (The two loops I showed were actually in two separate routines, but I wanted to keep my post relatively compact.)

I've vented - I feel better now. -- T
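
For illustration, a rough, hypothetical sketch of the kind of extraction described above: pull quoted literals out of an .rc-style text file into a Dictionary. The file name, regex pattern, and class name are made up, and the pattern is deliberately simplistic - this is not the actual utility.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    class RcStringExtractor
    {
        static void Main()
        {
            // source string -> translated string (translations filled in later)
            Dictionary<string, string> translations = new Dictionary<string, string>();

            // Very simplified: grab the contents of double-quoted literals.
            Regex quoted = new Regex("\"([^\"]*)\"");

            foreach (string line in File.ReadAllLines("sample.rc"))   // illustrative file name
            {
                foreach (Match m in quoted.Matches(line))
                {
                    string s = m.Groups[1].Value.Trim();
                    if (s.Length > 0 && !translations.ContainsKey(s))
                        translations[s] = s;   // placeholder until translated
                }
            }

            Console.WriteLine("Found {0} candidate strings.", translations.Count);
        }
    }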

PIEBALDconsult
#6 (in reply to GibbleCH, #4)

I think HashSet is better than List in this case:

0) No duplicates
1) Hash rather than iterate
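
For illustration, a hypothetical sketch of those two points (the names are made up): duplicate Adds are ignored, and Contains() hashes instead of iterating. Note this covers exact-match membership; the substring check in the original loop would still need a per-string Contains().

    using System;
    using System.Collections.Generic;

    static class NonTranslatableHashSet
    {
        static readonly HashSet<string> nonTranslatable = new HashSet<string>
        {
            "1st String", "2nd String", "30th String"
        };

        public static void Demo()
        {
            // Duplicate Add is a no-op and returns false, unlike ArrayList.Add.
            bool added = nonTranslatable.Add("1st String");
            Console.WriteLine("Duplicate added? {0}", added);

            // O(1) average-case membership test instead of walking a list.
            Console.WriteLine(nonTranslatable.Contains("30th String"));
        }
    }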

Deflinek
#7 (in reply to Chris Meech, #3)

                Chris Meech wrote:

                The slower they work, the better.

Because there are soooo maaaany ways to improve! Imagine if the customer got used to "the 2.5h" routine, and then after a few weeks, or even better months, you showed them your "2 sec" one? WOW! :laugh:

                -- "My software never has bugs. It just develops random features."

kenrentz
#8 (in reply to Deflinek, #7)

You're doing it wrong. First you give them the 2.5h routine. Then you give them a free update that trims it to an even 2 hours. After that you offer to sell them the latest version that drops the time to about 15 minutes. The rest of the speed improvement you save for a later release, when you can add it in to make up for the lack of genuine improvements. I guess I've spent way too much time around marketing executives for my own good.

supercat9
#9 (in reply to kenrentz, #8)

I've written some code that turned out to be dog slow when given sufficiently large data sets. Usually I've tried to fix it before the customer could try it and complain. It's not always easy, though, to know what is going to pose a problem until one actually codes it. Obviously O(n^2) algorithms are likely to cause problems unless n is bounded to a small value, but it may not be possible to recognize an algorithm that takes time c + k*n as an obvious bottleneck without testing it on large data sets. For example, if c == 1000k the program may seem to take the same amount of time with n=10 and n=100, but be unacceptably slow with n=1,000,000. It may well be that the n=100 case is much more common, but if the n=1,000,000 case can occur, the system needs to handle it without causing timeouts.
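
A quick numeric illustration of that point, with made-up units (c = 1000, k = 1):

    class LinearCostDemo
    {
        static void Main()
        {
            long c = 1000, k = 1;
            foreach (long n in new long[] { 10, 100, 1000000 })
                System.Console.WriteLine("n = {0,9}: time = c + k*n = {1}", n, c + k * n);
            // n =        10: 1010
            // n =       100: 1100      (barely distinguishable from n = 10)
            // n =   1000000: 1001000   (roughly 1000x slower)
        }
    }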

ToddHileHoffer
#10 (in reply to Timothy Byrd, #1)

Yeah, the code is awful, but maybe it was an afterthought. Everybody writes stupid code once in a while. Maybe the coder just didn't care at the moment.

                      I didn't get any requirements for the signature

Douglas Dean
#11 (in reply to Timothy Byrd, #1)

How about simply bagging the non-translatable list (which was probably only put there as a lame fix for the performance of his clunky lookup) and using the DataSet, DataTable, etc. classes to parse the XML?
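
For illustration, a hypothetical sketch of that approach (the file name is made up, and "Publish" is carried over from the snippet): let ADO.NET infer the tables from the XML and walk the rows.

    using System;
    using System.Data;

    class DataSetSketch
    {
        static void Main()
        {
            DataSet ds = new DataSet();
            ds.ReadXml("sample.xml");   // infers tables and columns from the XML shape

            foreach (DataTable table in ds.Tables)
            {
                if (!table.Columns.Contains("Publish"))
                    continue;

                foreach (DataRow row in table.Rows)
                    Console.WriteLine(row["Publish"]);
            }
        }
    }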

Megidolaon
#12 (in reply to Timothy Byrd, #1)

What is the point of

    if (node.Attributes["Publish"].InnerText.Trim().Contains(s))
    {
        return;
    }

A check for containing a single letter and then doing nothing? Also,

    public static ArrayList NonTranslatable
    {
        get
        {
            nonTranslatable.Add("1st String");
            nonTranslatable.Add("2nd String");
            // ...
            nonTranslatable.Add("30th String");
            return nonTranslatable;
        }
    }

Sorry, but WTF is this ****? Who the hell adds elements to a collection in the getter? And where do the strings come from? From a DB? When are they checked? When is it determined whether something is non-translatable?

    foreach (DictionaryEntry de in translationDictionary)
    {
        if (de.Key.ToString().Equals(node.Attributes["Publish"].InnerText.Trim()))
            node.Attributes["Publish"].InnerText
                = translationDictionary[node.Attributes["Publish"].InnerText.Trim()].ToString();
    }

I've never used a SortedList; you can access its elements with the DictionaryEntry class, so is there any difference from a Dictionary?

Timothy Byrd wrote:

Simply putting an if (nonTranslatable.Count == 0) block around the code in the NonTranslatable property took the run time down from >75 minutes to ~30 seconds. (Fun fact: During the 75 minutes of processing, those repeated Add() calls caused the program to take up about 25MB of extra memory. That's a lot of strings to loop through and compare repeatedly. OMG )

Wow.

Timothy Byrd wrote:

Not to mention all the calls to Trim() inside the loop. Or does the runtime optimize those out as invariant?

Maybe the XML contains trailing spaces? Unless you trim them, the XML nodes' InnerText wouldn't equal the DictionaryEntries with the same key. Anyway, I feel truly sorry for you... X|
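
For what it's worth, a small illustrative sketch of the difference being asked about: a non-generic SortedList keeps its keys sorted and finds them by binary search, while Dictionary<TKey, TValue> hashes them. Both support direct key lookup, which is what the original loop was missing.

    using System;
    using System.Collections;
    using System.Collections.Generic;

    class LookupDemo
    {
        static void Main()
        {
            SortedList sorted = new SortedList();   // keys kept sorted, binary-search lookup
            sorted["hello"] = "hola";

            Dictionary<string, string> hashed = new Dictionary<string, string>();   // hashed lookup
            hashed["hello"] = "hola";

            if (sorted.ContainsKey("hello"))
                Console.WriteLine(sorted["hello"]);

            if (hashed.ContainsKey("hello"))
                Console.WriteLine(hashed["hello"]);
        }
    }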

Timothy Byrd
#13 (in reply to Megidolaon, #12)

Megidolaon wrote:

What is the point of

    if (node.Attributes["Publish"].InnerText.Trim().Contains(s))
    {
        return;
    }

A check for containing a single letter and then doing nothing?

No, that loop checks whether the node text contains one of the non-translatable strings and, if so, bails.

Megidolaon wrote:

Maybe the XML contains trailing spaces? Unless you trim them, the XML Nodes' InnerText wouldn't equal the DictionaryEntries with the same key

You are correct; however, within a given call to TranslateNode(), the XML node stays the same - until it gets translated from the dictionary. Oh crap, there's yet another bug in there that I hadn't noticed: it can double-translate. Anyway, the phrase node.Attributes["Publish"].InnerText.Trim() should be invariant, and I assume it takes some processing. So a line like:

    string xmlText = node.Attributes["Publish"].InnerText.Trim();

and then referring to that string would have been more efficient, and easier for me to read, I think. -- T
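
For illustration, a minimal sketch of that suggestion, reusing the names from the snippet above: the trimmed attribute text is hoisted into a local and looked up once, which also sidesteps the double-translation problem, since nothing re-reads the attribute after it is rewritten. This is only a sketch, not the actual rewrite.

    using System.Collections;
    using System.Xml;

    class TranslateNodeHoisted
    {
        readonly SortedList translationDictionary = new SortedList();   // filled elsewhere

        public void TranslateNode(XmlNode node)
        {
            // Compute the invariant once instead of on every iteration.
            string xmlText = node.Attributes["Publish"].InnerText.Trim();

            // One direct lookup; no loop, so a just-translated value can't be matched again.
            if (translationDictionary.ContainsKey(xmlText))
                node.Attributes["Publish"].InnerText =
                    translationDictionary[xmlText].ToString();
        }
    }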
