Create a list of unique words from txt files

  1. #1
    Registered User
    Join Date: 10-10-2006
    Posts: 18

    Hi everyone. I have a very ambitious project that I am working on. There are many stages to it, and it may well melt my CPU! However, I must first run a few tests, and for these some advice would be of great value, if you can help.

    First of all, I need to be able to index all of the words in a large selection of txt files. Each unique word needs to be placed in its own cell in one big, long column. I expect there will be tens of thousands of words across all of the text files, so I mean a REALLY long column. I also need to count the number of occurrences of each word across all the txt files and have this "count" in a separate column.
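    A minimal sketch of this first step, written in Python rather than as an Excel macro (an assumption; the thread itself is about Excel). It reads every .txt file in one folder, counts each word, and writes a two-column CSV (word, count) that Excel can open. The function names, the folder and output paths, and the rule for what counts as a "word" (letters with optional internal apostrophes, folded to lower case) are hypothetical choices for illustration.

```python
import csv
import re
from collections import Counter
from pathlib import Path

# Assumption: a "word" is a run of letters, optionally with internal apostrophes
# (so "don't" stays one word). Case is folded so "The" and "the" count together.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def count_words(text: str) -> Counter:
    """Count occurrences of each lower-cased word in one string."""
    return Counter(match.lower() for match in WORD_RE.findall(text))


def count_words_in_folder(folder: str) -> Counter:
    """Add up the word counts over every .txt file in a folder."""
    totals = Counter()
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="ignore" skips stray bytes left over from PDF-to-text conversion
        totals.update(count_words(path.read_text(encoding="utf-8", errors="ignore")))
    return totals


def write_word_counts(counts: Counter, out_path: str) -> None:
    """Write one row per unique word: column A = word, column B = count."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Word", "Count"])
        for word, n in counts.items():
            writer.writerow([word, n])
```

    For example, `write_word_counts(count_words_in_folder("texts"), "word_counts.csv")` would produce the two columns described above; the folder and file names are hypothetical.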

    The next step will be to remove words that are superfluous, such as "the", "it", "and", etc., and words that are so rare that they do not provide any useful information. This will hopefully leave me with a list of common keywords from the txt files.
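    The filtering step can be sketched in the same hypothetical Python setting: drop anything on a stop-word list, then drop words whose total count falls below a threshold. Both the `STOP_WORDS` set and the default `min_count` of 3 are assumptions to be tuned against the real data, not values from the thread.

```python
from collections import Counter

# Hypothetical stop-word list: extend it with whatever words you consider superfluous.
STOP_WORDS = {"the", "it", "and", "a", "an", "of", "to", "in", "is"}


def filter_keywords(counts: Counter, stop_words=STOP_WORDS, min_count: int = 3) -> Counter:
    """Drop stop words and any word seen fewer than min_count times in total."""
    return Counter({
        word: n
        for word, n in counts.items()
        if word not in stop_words and n >= min_count
    })
```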

    The resulting keyword list will then need to be put into alphabetical order!
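    Sorting is the easy part. In Excel, sorting the two-column range on the word column (Data > Sort) does it; in the same hypothetical Python sketch, a case-insensitive sort of the (word, count) pairs is a single call. The function name is illustrative only.

```python
from collections import Counter


def sorted_rows(counts: Counter) -> list[tuple[str, int]]:
    """Return (word, count) pairs in alphabetical order, ignoring case."""
    return sorted(counts.items(), key=lambda item: item[0].casefold())
```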

    At present, the text is in PDF form, so I have a bit of work ahead of me converting it to txt files before I can begin. Any advice on how to index words in the way described above would be most valuable.

    I am realising more and more that some skill with macro programming will go a long way if I am using Excel for stuff like this. Can anyone suggest a good book that might help?

    Many thanks.

    Adam

  2. #2
    Forum Moderator Leith Ross
    Join Date: 01-15-2005
    Location: San Francisco, Ca
    MS-Off Ver: 2000, 2003, & 2010
    Posts: 23,258

    Hello Adam,

    It would be easiest to organize this large amount of data using the Scripting Dictionary object (Scripting.Dictionary). You would need to add supplemental code to track the number of occurrences of each word, for example by storing the running count as the item for each word key. You would then need to code routines that use an exclusion list to remove the superfluous words from the dictionary.
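    To make the Dictionary suggestion concrete, here is a sketch of the same pattern in Python rather than VBA (an assumption; the reply itself gives no code). A plain dict stands in for the Scripting.Dictionary: each word is a key, its running count is the stored value, and any word on the exclusion list is skipped as it is read. The function name, the tokenizing rule, and the sample exclusion list are hypothetical; the comments point to the roughly equivalent Scripting.Dictionary calls (Exists, Item, Add), with `dict` as a hypothetical VBA variable name.

```python
import re


def build_word_dictionary(texts, excluded):
    """Map each word to its number of occurrences, skipping excluded words.

    Mirrors the Scripting.Dictionary pattern: check whether the key exists,
    then either add it with a count of 1 or increment its stored count.
    """
    excluded = {w.lower() for w in excluded}
    word_counts = {}
    for text in texts:
        # Hypothetical tokenizing rule: runs of lower-case letters and apostrophes.
        for word in re.findall(r"[a-z']+", text.lower()):
            word = word.strip("'")  # drop quote marks around a word
            if not word or word in excluded:
                continue
            if word in word_counts:      # like dict.Exists(word)
                word_counts[word] += 1   # like dict.Item(word) = dict.Item(word) + 1
            else:
                word_counts[word] = 1    # like dict.Add word, 1
    return word_counts
```

    In Python, `collections.Counter` would do the counting in one step; the explicit version is shown only because it maps directly onto the Exists/Add steps you would write in a VBA macro.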

    Rather than buying a book, find someone to help you; many members here may be willing. The results will be faster, and free. From a macro standpoint, this is of about medium difficulty.

    Sincerely,
    Leith Ross
