
Compare 2 array fields in 2 different collections?

Hey,

following scenario:

My users have tags, saved in an array (like "tag1", "tag2", and so on).

In the other collection, the entities also have an array field containing tags.

Now I want to compare these fields, to match the objects from that collection against the tags of my user

(and calculate what percentage of the tags match).


Like:
User.tags: ['USA', 'Germany']

othercollection.tags : ['USA', 'Italy']


These should match, with a match percentage of 50%.


Any idea how to solve this with MongoDB? I don't really have a good idea for how to do this in a smart way.
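
To make the 50% figure concrete, here is a minimal sketch of the calculation in plain JavaScript. The function name `matchPercent` is made up for illustration, and it assumes the percentage is measured against the number of user tags (1 of 2 user tags found gives 50%).

```javascript
// Hypothetical helper: percentage of the user's tags that also appear on the item.
function matchPercent(userTags, itemTags) {
  if (userTags.length === 0) {
    return 0; // avoid dividing by zero when the user has no tags
  }
  var shared = userTags.filter(function (tag) {
    return itemTags.indexOf(tag) !== -1;
  });
  return (shared.length / userTags.length) * 100;
}

console.log(matchPercent(['USA', 'Germany'], ['USA', 'Italy'])); // 50
```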


Help would be great!







You have a couple of options to consider:

  1. Use the current approach (should work fine up to a limit)
  2. Offload the "match" task to your Node.js server and push the results to a Kinvey collection
  3. Use the scheduled code feature of Kinvey (http://devcenter.kinvey.com/android/tutorials/scheduled-code-getting-started) to do the "match" task periodically

Your decision will depend on various factors including:
  • Use case: do you want to support real-time fetching of "best matched jobs", send "best matched jobs" periodically through email/push, or do both?
  • Frequency of execution
  • Volume of data processed

I would suggest you do some benchmarking with a couple of these options to decide on the best option for your app.


Regards,
Wani
Addition:

The example from above works fine, thanks! Now the only remaining question is performance.

I'm also running a Node.js server for this application. I thought about storing the values in real time or something like that. What do you think, does that make sense?

best, Nico

 

Hey,

thanks for your answer! I'll try it out within a few minutes.


It's for an upcoming app, so I can't say anything specific about how much data we are really dealing with. :/

Based on other networks in the same area, I think we can assume a base of 50,000-100,000 users. About 1/4 of them are able to create job posts, I'd guess 10 per month on average. Unfortunately I don't have more exact data for brainstorming the best way.

I already thought about an extra collection storing the match for each user/job, but that seems very ugly to me. But I'm not a typical backend or database developer, so even a few thousand records inside one collection "are scaring me" - even if I know that's a joke for the database engine in most cases. :)













 

Hi Nico,


To get 20 results, sorted by percentage, you could modify the script I wrote to do two things:

  • Add match percentage to each job entry

 

docs[i]["percentMatch"] = count/usertags.length;

 

  • Sort the final output on this percentMatch property in descending order (best match first) and select the top 20

 

finalDocs.sort(function(a, b) {
    // b - a sorts descending, so 100% matches come first
    return b.percentMatch - a.percentMatch;
});
finalDocs = finalDocs.slice(0,20);
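
The two steps can also be tried outside Kinvey on plain arrays. The following is only a sketch: `rankByMatch` and its `limit` parameter are hypothetical names, and it assumes every document has a `tags` array.

```javascript
// Hypothetical helper, not part of the Kinvey API: scores each doc against
// the user's tags, then returns the `limit` best matches, highest first.
function rankByMatch(docs, usertags, limit) {
  var scored = docs.map(function (doc) {
    var count = 0;
    for (var j = 0; j < usertags.length; j++) {
      if (doc.tags.indexOf(usertags[j]) > -1) {
        count++;
      }
    }
    // Fraction of the user's tags found on this doc (0 to 1).
    doc.percentMatch = usertags.length ? count / usertags.length : 0;
    return doc;
  });
  // Descending order: b - a puts the larger percentMatch first.
  scored.sort(function (a, b) {
    return b.percentMatch - a.percentMatch;
  });
  return scored.slice(0, limit);
}

var jobs = [
  { title: 'A', tags: ['Italy'] },
  { title: 'B', tags: ['USA', 'Germany'] },
  { title: 'C', tags: ['USA', 'Italy'] }
];
var best = rankByMatch(jobs, ['USA', 'Germany'], 2);
// best holds B (percentMatch 1) first, then C (percentMatch 0.5)
console.log(best.map(function (d) { return d.title + ' ' + d.percentMatch; }));
```

Inside the endpoint this would replace the loop plus the sort, called with the docs returned by the query and a limit of 20.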

 


Additionally, can you tell me more about the size of both the collections that you are looking at, so that we can determine if this is feasible or not, processing power and time wise.

Also, given that users and job entries will keep fluctuating, we can also think about storing best matched jobs as a referenced-array property for each user, which can be refreshed periodically.

Let me know if this is helpful.


Regards,
Aniruddha Wani

I don't - everything is fine = )

My apologies Nico I hope you didn't take it as that,


I just wanted to let you know why there was a delay in processing on our end. 


You're not a bother at all,

Hey Damien,


no problem - I don't want to annoy anyone :)
Best, Nico

Nico,


I will have Wani follow up on this tonight (He started the replies so I'm going to let him continue).  That said, apologies for the slow reply, he has been on vacation since Thursday of last week, but should be on in a few hours time.


Thanks,

Hi,


I have to "push" this topic up because it's a blocker for our project at the moment.


Is there a way to access more MongoDB features within Kinvey?


Best, Nico

Hey Wani,


thanks for the quick and nice answer! :) It helps me a lot. I think I have to adjust one or two small things, but it's looking good.

One question up front:

I do have the percentage (I will change it a bit), but how can I query for this value?


So that I can do something like "pull 20 results", sorted by percentage, with the ones matching 100% at the top and 0% at the bottom.
I'm not "a backend guy" (as you can see ;) ) - isn't that too much work to process on each request?



Best, Nico



Hi Nico,


I was able to achieve the percentage calculations by constructing an "$or" query to get books (the other collection) with matching tags and by writing a loop to calculate the percentage per book.


  

function onRequest(request, response, modules) {
  // Look up the requesting user to get their tags.
  modules.collectionAccess.collection('user').find({"username": request.username}, function (err, users) {
    if (err) {
      modules.logger.error('Query failed: ' + err);
      response.body.debug = err;
      response.complete(500);
    } else {
      var usertags = users[0].tags;
      modules.logger.info(usertags);
      // Build an "$or" query matching any book that shares at least one tag.
      var q = { "$or": [] };
      for (var t = 0; t < usertags.length; t++) {
        q["$or"].push({"tags": usertags[t]});
      }
      modules.logger.info(q);
      modules.collectionAccess.collection('BookCloud').find(q, function (err, docs) {
        if (err) {
          modules.logger.error('Query failed: ' + err);
          response.body.debug = err;
          response.complete(500);
        } else {
          modules.logger.info(docs.length);
          var finalDocs = [];
          for (var i = 0; i < docs.length; i++) {
            // Count how many of the user's tags appear on this book.
            var count = 0;
            for (var j = 0; j < usertags.length; j++) {
              if (docs[i]["tags"].indexOf(usertags[j]) > -1) {
                count++;
              }
            }
            modules.logger.info(count / usertags.length);
            // Keep books that match at least 50% of the user's tags.
            if ((count / usertags.length) >= 0.5) {
              finalDocs.push(docs[i]);
            }
          }
          modules.logger.info(finalDocs.length);
          response.body = finalDocs;
          response.complete(200);
        }
      });
    }
  });
}
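
As a side note, the "$or" query above can probably be written more compactly with "$in", which matches documents whose `tags` array contains at least one of the listed values (the `$in` operator is already used against the `jobs` collection elsewhere in this thread, so it seems to be passed through). A small sketch, with `buildTagQuery` as a made-up helper name:

```javascript
// Hypothetical helper: builds the same "shares at least one tag" filter
// with $in instead of one $or clause per tag.
function buildTagQuery(usertags) {
  return { tags: { $in: usertags } };
}

var tagQuery = buildTagQuery(['USA', 'Germany']);
console.log(JSON.stringify(tagQuery)); // {"tags":{"$in":["USA","Germany"]}}
```

As far as I know, this form also copes with a user who has no tags (it simply matches nothing), whereas MongoDB rejects an empty "$or" array.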

   


Let me know if this is useful.


Regards,

Wani

I have to renew this.

I've read a bit through the official MongoDB docs and found some features that are not supported by Kinvey (like aggregate).


I've come up with an example for my case:

 

var queryAr = ["html"];
  
  coll('jobs').find({ tasks: {$in : queryAr} }, function(err, docs){
    
   // log(docs);
    for( var i = 0; i<docs.length; i++){
      log(docs[i].tasks);
    }
    
    
    response.complete(200);
  });

So the idea is to find all jobs where the tasks (an array of tags) contain at least the given one. Now I want to turn that into a percentage value, so I can show the results to the user sorted by this value.


As I said, I found some examples where they use $project together with aggregate to do this step by step, but since this is not available, any idea how to accomplish this?
At the moment I don't really see the answer for this.

Answer would be great.
Best, Nico


Hi Nico,


You can achieve this using business logic.


E.g. if you need to get items from a collection with at least a 50% match with the tags of the currently logged-in user, you could hit a custom endpoint created using Business Logic, and it would return the matching items from the collection. This endpoint would use JavaScript code with modules like collectionAccess for accessing the collection.


To get started with Business Logic, please refer to following documentation:


Regards,
Wani