Counting the files that contain an expression in Mathematica

I am trying to search a large set of text files (12k+) in Mathematica 8. So far I have been able to plot the total number of times a word appears (for example, the word "love" appears 5,000 times across these 12k files). However, I am having difficulty determining the number of files in which "love" appears once, which might be only 1,000 files, with the word repeated several times in others.

I find the documentation on FindList, streams, RecordSeparators, etc. a little murky. Is there a way to set it up so that it finds one occurrence of a term in a file and then moves on to the next file?

Example file list:

 filelist = {"89001.txt", "89002.txt", "89003.txt", "89004.txt", "89005.txt", "89006.txt", "89007.txt", "89008.txt", "89009.txt", "89010.txt", "89011.txt", "89012.txt", "89013.txt", "89014.txt", "89015.txt", "89016.txt", "89017.txt", "89018.txt", "89019.txt", "89020.txt", "89021.txt", "89022.txt", "89023.txt", "89024.txt"}

The following returns every line containing "love" across all files. Is there a way to return only the first occurrence of "love" in each file before moving on to the next?

FindList[filelist, "love"] 

Many thanks. This is my first post, and I am mainly learning Mathematica through peer / supervisor help, online tutorials, and the documentation.

+6
2 answers

In addition to Daniel's answer, you also seem to be asking for a list of files in which the word occurs only once. To do that, I would still run FindList over all the files

 res = FindList[filelist, "love"]

Then reduce the results to those with a single matching line using

 lines = Select[ res, Length[#]==1& ] 

But this does not exclude cases where the word occurs more than once on a single line. For that, you can use StringCount and accept only the instances where the count is 1, as follows

 Select[ lines, StringCount[ #, RegularExpression[ "\\blove\\b" ] ] == 1& ] 

The RegularExpression specifies that "love" must be a whole word by using the word boundary marker ( \\b ), so words such as "lovely" will not be included.
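To see what the word boundary marker changes, here is a small sketch on a made-up sentence. The sample string is an assumption chosen for illustration and does not come from the files in the question.

```mathematica
(* Hypothetical sample text, chosen only to illustrate the difference *)
text = "I love my lovely glove";

(* Plain substring match: also counts "love" inside "lovely" and "glove" *)
plainCount = StringCount[text, "love"]
(* 3 *)

(* Whole-word match: \\b requires a word boundary on both sides *)
wordCount = StringCount[text, RegularExpression["\\blove\\b"]]
(* 1 *)
```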

Edit: It seems that FindList returns a flattened list when passed a list of files, so you cannot determine which element goes with which file. For example, if you have 3 files that contain the word "love" 0, 1, and 2 times, respectively, you would get a list that looks like

 {love, love, love}

which is clearly not useful. To overcome this, you will have to process each file individually, and this is best done using Map ( /@ ), as follows

 res = FindList[#, "love"]& /@ filelist 

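and the rest of the code above works as before, with one adjustment: each element of lines is now a one-element list rather than a string, so StringCount should be applied to First[#] instead of #.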

But, if you want to associate the results with the file name, you need to change it a little.

 res = {#, FindList[#, "love"]}& /@ filelist
 lines = Select[res, Length[ #[[2]] ] == 1 && (* <-- Note the use of [[2]] *)
    StringCount[ #[[2, 1]], RegularExpression[ "\\blove\\b" ] ] == 1& ] (* [[2, 1]] is the single matching line *)

which returns a list of the form

 { {filename, { "string with love in it" }}, {filename, { "string with love in it" }}, ... }

To extract the file names, simply evaluate lines[[All, 1]].

Note that, to make Select test the properties you wanted, I used Part ( [[ ]] ) to refer to the second element of each entry, and again to extract the file names.
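As a cross-check, here is a minimal sketch of a different technique: instead of FindList, it reads each file as a single string with Import and counts whole-word matches with StringCount, which gives the number of occurrences per file directly. The sample texts, the "love_demo_" file names, and the variables samples, files and counts are hypothetical and exist only to make the example self-contained.

```mathematica
(* Hypothetical sample files written to the temporary directory *)
samples = {"no match here", "I love this", "love and more love"};
files = MapIndexed[
   Export[
     FileNameJoin[{$TemporaryDirectory,
       "love_demo_" <> ToString[First[#2]] <> ".txt"}],
     #1, "Text"] &,
   samples];

(* Count whole-word occurrences of "love" in each file *)
counts = StringCount[Import[#, "Text"],
     RegularExpression["\\blove\\b"]] & /@ files
(* {0, 1, 2} *)

(* Files in which the word occurs exactly once *)
onceFiles = Pick[files, counts, 1]

(* Number of files in which the word occurs at least once *)
atLeastOnce = Count[counts, _?Positive]
```

With the real data, files would simply be replaced by filelist. Reading whole files is simpler to reason about than line-based results, though for very large files the FindList approach may use less memory.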

+9

Help > Documentation Center > FindList, fourth list item:

"FindList[files, text, n] includes only the first n lines found."

So you can set n to 1.
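A minimal sketch of how that might be applied one file at a time, so that the limit of 1 clearly holds for each file separately. The sample files and the names demoFiles and firstHits are hypothetical; with the real data, demoFiles would be the filelist from the question. The WordSearch -> True option is added here as an assumption about what is wanted: it asks FindList to match "love" only as a whole word.

```mathematica
(* Hypothetical sample files; in practice use the filelist from the question *)
demoFiles = MapIndexed[
   Export[
     FileNameJoin[{$TemporaryDirectory,
       "findlist_demo_" <> ToString[First[#2]] <> ".txt"}],
     #1, "Text"] &,
   {"nothing to see here", "I love you\nmore love\nstill more love"}];

(* Stop after the first matching line in each file *)
firstHits = FindList[#, "love", 1, WordSearch -> True] & /@ demoFiles;

(* A file contains the word if its result is a one-element list *)
Count[firstHits, {_}]
```

Because each file is searched in its own FindList call, the position of each result in firstHits matches the position of the file in demoFiles.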

Daniel Lichtblau

+4

Source: https://habr.com/ru/post/897830/

