I'm working on a word-counting puzzle, and I'm considering optimizing my old algorithm by sorting the 1D array. That's why I need to know the time complexity of this sorting function. I think it would be better if the time complexity were stated in the function's documentation.
07-11-2008 12:09 AM - edited 07-11-2008 12:10 AM
Sounds similar to one of the old LabVIEW challenges from spring 2004: 😄
Coding Challenge - The Meta Keyword Generator, Results
07-11-2008 10:00 AM
A few thoughts based on my recollection of that old coding challenge. There are different kinds of algorithmic approaches you may consider.
1. A simple approach to that challenge could involve a lot of search / lookup functions. Each time you parse a word, you perform a search to see if it's in your list already. If so, increment its count. If not, add it to the list with count=1.
This was far too slow for counting words in large documents: too many slow-ish searches. But it's pretty easy to implement and maybe good enough for smaller docs.
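Purely as an illustration of the idea (Python rather than LabVIEW, and the function name is just made up for this sketch):

def count_words_naive(words):
    # Naive approach: for each parsed word, linearly search the words
    # seen so far. Each lookup is O(n), so counting n words is roughly
    # O(n^2) overall, which is why it bogs down on large documents.
    seen = []  # list of [word, count] pairs
    for w in words:
        for entry in seen:
            if entry[0] == w:
                entry[1] += 1      # already in the list: increment its count
                break
        else:
            seen.append([w, 1])    # not found: add it with count = 1
    return seen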
2. I pursued an approach that used hashing instead of searches. It *seemed* too slow, but some months after the challenge, when I had another use for the core hash table functionality, I found a really dumb bug that I'd overlooked before. Fixing it improved the speed by something like 100x. (I'd missed one spot where a shift register was needed to enforce in-placeness.)
This is largely the same approach, merely trying to speed up the lookup/search via a more complex technique (hash tables). As I found out, the complexity makes it more bug-prone. Still, once fixed, lookup time is roughly constant, with no clear dependence on the number of elements in the hash table. (Note: it *is* still dependent on the "fill ratio". You want enough memory available to size your table for maybe ~25% usage or less.)
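Again just as an illustration (not the actual challenge code), a toy open-addressing table in Python shows why lookups stay near constant time as long as the table is kept sparse:

class CountTable:
    # Minimal open-addressing hash table sketch. Lookups stay roughly
    # O(1) provided the table runs at a low fill ratio (e.g. ~25% or less).
    def __init__(self, size=4096):
        self.keys = [None] * size   # slot is None until a word lands there
        self.counts = [0] * size
        self.size = size

    def increment(self, word):
        i = hash(word) % self.size
        # Linear probing: walk forward until we find the word or an empty slot.
        # (Assumes the table never fills completely.)
        while self.keys[i] is not None and self.keys[i] != word:
            i = (i + 1) % self.size
        if self.keys[i] is None:
            self.keys[i] = word     # first occurrence of this word
        self.counts[i] += 1
        return self.counts[i]

(In a textual language you'd normally just use the built-in dictionary/map; the point here is only the near-constant lookup and the fill-ratio caveat.)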
3. One idea I tried out was to maintain a separate list of the 10 or 25 most-common words, sorted by count. (The main reason was that there could be a tie for 4th place at the end, and I wanted enough extra candidates around to get the correct tie-breaker result.) Each time I parsed another word and incremented its count, I'd check to see whether its new count qualified it to belong in the special most-common list. I still did a lot of lookup and sort stuff, but it was always on a very small list, so it stayed cheap.
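A rough sketch of that idea, again in Python just to show the logic (the names and the candidate-list size of 25 are assumptions for the example):

TOP_N = 25  # a few extra candidates beyond the words actually reported, for tie-breaking

def update_top(top, word, new_count):
    # Keep a small "most common" candidate list next to the main counts.
    # Any sorting happens only on this tiny list, never on the full word set.
    top = [(w, c) for (w, c) in top if w != word]      # drop the stale entry for this word
    if len(top) < TOP_N or new_count > top[-1][1]:
        top.append((word, new_count))
        top.sort(key=lambda wc: wc[1], reverse=True)   # cheap: the list stays tiny
        top = top[:TOP_N]
    return top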
There are surely other kinds of data structures and algorithms available. Do you have a specific app in mind for this? Or just generally curious? I ask because some of the things one might do to optimize for a specific app (or code challenge) wouldn't be so appropriate as a general-purpose routine.
-Kevin P.