Thursday, April 14, 2011

Technical Interview Fail

So I had my technical interview for a Fortune 500 company today. From the title I think it's obvious I fared quite poorly. The opening questions were typical: "Tell me about the project" and "Compare your two most familiar programming languages. What do you like/dislike about each?" While I was under-prepared for these, it's trivial to contrive rehearsed answers to such questions. My most significant failure relates to the problem presented next.

It is a simple problem, one that any first year Computer Science student should be able to solve with ease, let alone a fourth year graduating senior.

Problem: Develop an algorithm for a function that takes an array of integers and an integer containing the size of that array. Remove all the duplicate values from the array. Return the size of the array. That is, the number of unique values in the array.

Constraints:
  1. You may not use any additional data structures. Local variables are allowed.
  2. Valid data must be in a form that is accessible using only the value returned. That is, using the number of unique values in the array, it must be possible to access only the valid data within the array.
While the second constraint wasn't explicitly given, I felt it was implied since any integer is considered potentially valid data. This precludes filling slots with a "null" value.

Given this problem, I am ashamed to report that I panicked. I resorted to the embarrassingly obvious naive solution. Moreover, even for a naive solution, it was a particularly poor one.

That algorithm is as follows:
  1. Declare a variable to count the number of duplicate entries.
  2. Starting with the first element in the array, compare it to each element after it.
  3. If two elements are equal, the later one is considered the duplicate.
  4. When a duplicate is found, swap it with the last element of the valid region, that is, the element at index array size minus duplicates found minus one.
  5. Continue until every element of the valid region has been processed; everything past that boundary is a known duplicate.
  6. Return the size of the array minus the number of duplicate items.
Given that algorithm, I proceeded to implement it in code. Upon completion I read the code back to my interviewer. He pointed out several coding errors I had made. Most importantly, he told me that swapping values was pointless: since the duplicated data is supposed to be "removed," preserving it through swapping was a waste of time and resources. I acknowledged this point and modified my code to simply copy over duplicate data.

And then came my Great Fail. My algorithm is O(n^2) in the worst case, where n is the number of elements: each element may be compared against every other. The natural follow-up question is, of course, "What can be done to improve this algorithm?" I was stunned. In the back of my mind I knew this question was coming. I would have asked it myself had our positions been reversed. I had no answer. Did I have an idea? Certainly. I thought of using a binary tree to reduce the number of comparisons necessary. A friend of mine later suggested using a hash table upon hearing this problem. Both of these ideas are foolish; they clearly violate the first constraint. So I said nothing.

Should I have said something? Talked through my thought process? Perhaps. In retrospect, the only thing I would have done was suggest a binary tree to reduce the number of comparisons. Perhaps the interviewer would have let me talk and at the end said something to the effect of, "You're not allowed to use data structures." Perhaps he would have cut me off at the mere mention of the words "binary tree." In his place I certainly would have. Perhaps in talking I would have been inspired and come up with my current solution, or something like it. We'll never know. Instead, silence permeated the room until finally the interviewer had had enough. "Do you have any questions for me?" he asked. Still stunned by my inability to respond intelligently to the previous question, I gave a meek "No, not at the moment." "Thank you for your time," he said. And that was the end.

That was more than five hours ago. Now I've come up with another solution to this easy problem. Perhaps it doesn't work or my analysis is flawed. I don't care. I need to write this down.

To understand my solution it's necessary to know a little bit about heapsort. Heapsort is an "in-place" sorting algorithm: it needs only a constant amount of memory beyond the array itself in order to execute. This is key due to Constraint 1. Heapsort has a worst-case run time of O(n log n). I won't go into the details of big O notation here; it is sufficient to say that O(n log n) is significantly better than my naive O(n^2) algorithm. The key insight is to sort the array first, so that all duplicate values end up adjacent to one another.

"Solution":
  1. Sort the array in place using heapsort. All duplicate values are now adjacent.
  2. Declare variables to count the number of duplicate items and to track the index of the leftmost slot holding a duplicate value. Initialize them to 0 and -1, respectively.
  3. Moving left to right, compare each element with its left neighbor. An element with the same value as its left neighbor is a duplicate; otherwise it is unique.
  4. If a duplicate element is found, increment the count of duplicates. If no duplicate slot is currently tracked, record this element's index as the leftmost duplicate slot.
  5. If a unique element is found and a duplicate slot is tracked, copy the element's value into that slot and advance the tracked slot one position to the right. If no slot is tracked, do nothing.
  6. Continue until all elements have been compared.
  7. Return the size of the array minus the number of duplicate items. The array from index 0 to index array size minus number of duplicates minus one now contains sorted, unique values.
This solution is simple. Absurdly so. Is it correct? I don't have the energy to confirm its veracity. My preliminary analysis indicates that its run time is O(n log n): the heapsort dominates, and the deduplication pass is a single O(n) scan. It matters little now. It took me five hours to come up with this solution. Utterly embarrassing. Even if it turns out this solution is deeply flawed or no better than my original, it is far better than having said nothing at all.
