
Importing Large Data Sets

I'm trying to import some data for testing.



So far, the import process has been... interesting. The import dialog warns about file sizes and formats, and since I'm trying to import a JSON dump from a local MongoDB, keeping all my objects in an array is a bit of a pain.



So, I try with an array that contains 10 documents to import. The raw data looks like this:



{ "members" : [ { "$oid" : "52829b4b8adbe33217003793" }, { "$oid" : "52829b4b8adbe332170046c6" } ], "from" : { "$oid" : "52829b4b8adbe33217003793" }, "msg" : "Et harum quidem rerum facilis est et expedita distinctio", "_id" : { "$oid" : "5282ba26fa28cfe817000001" } }

{ "members" : [ { "$oid" : "52829b4b8adbe33217003793" }, { "$oid" : "52829b4b8adbe332170046c6" } ], "from" : { "$oid" : "52829b4b8adbe332170046c6" }, "msg" : "non provident, similique sunt in culpa qui officia deserunt ", "_id" : { "$oid" : "5282ba26fa28cfe817000002" } }

{ "members" : [ { "$oid" : "52829b4b8adbe33217003793" }, { "$oid" : "52829b4b8adbe332170046c5" } ], "from" : { "$oid" : "52829b4b8adbe33217003793" }, "msg" : "At vero eos et accusamus et iusto odio dignissimos ducimus qui ", "_id" : { "$oid" : "5282ba26fa28cfe817000003" } }

{ "members" : [ { "$oid" : "52829b4b8adbe33217003793" }, { "$oid" : "52829b4b8adbe332170046c5" } ], "from" : { "$oid" : "52829b4b8adbe332170046c5" }, "msg" : "nihil impedit quo minus id quod maxime placeat facere possimus, ", "_id" : { "$oid" : "5282ba26fa28cfe817000004" } }

{ "members" : [ { "$oid" : "52829b4b8adbe33217003793" }, { "$oid" : "52829b4a8adbe33217000c2d" } ], "from" : { "$oid" : "52829b4b8adbe33217003793" }, "msg" : "aut perferendis doloribus asperiores repellat", "_id" : { "$oid" : "5282ba26fa28cfe817000005" } }

{ "members" : [ { "$oid" : "52829b4b8adbe33217003793" }, { "$oid" : "52829b4a8adbe33217000c2d" } ], "from" : { "$oid" : "52829b4a8adbe33217000c2d" }, "msg" : "et quasi architecto beatae vitae dicta sunt explicabo", "_id" : { "$oid" : "5282ba26fa28cfe817000006" } }

{ "members" : [ { "$oid" : "52829b4a8adbe33217000807" }, { "$oid" : "52829b4b8adbe33217002300" } ], "from" : { "$oid" : "52829b4a8adbe33217000807" }, "msg" : "dolores et quas molestias excepturi sint occaecati cupiditate ", "_id" : { "$oid" : "5282ba8c52dc6ce917000001" } }

{ "members" : [ { "$oid" : "52829b4a8adbe33217000807" }, { "$oid" : "52829b4b8adbe33217002300" } ], "from" : { "$oid" : "52829b4b8adbe33217002300" }, "msg" : "Temporibus autem quibusdam et aut officiis debitis aut ", "_id" : { "$oid" : "5282ba8c52dc6ce917000002" } }

{ "members" : [ { "$oid" : "52829b4a8adbe33217000807" }, { "$oid" : "52829b4c8adbe33217004d89" } ], "from" : { "$oid" : "52829b4a8adbe33217000807" }, "msg" : "blanditiis praesentium voluptatum deleniti atque corrupti quos ", "_id" : { "$oid" : "5282ba8c52dc6ce917000003" } }

{ "members" : [ { "$oid" : "52829b4a8adbe33217000807" }, { "$oid" : "52829b4c8adbe33217004d89" } ], "from" : { "$oid" : "52829b4c8adbe33217004d89" }, "msg" : "in culpa qui officia deserunt mollit anim id est laborum", "_id" : { "$oid" : "5282ba8c52dc6ce917000004" } }
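(As an aside, wrapping a newline-delimited dump like this into a single array doesn't have to be done by hand. A minimal sketch in Python, assuming one JSON document per line as mongoexport emits by default; the file names are just placeholders:)

```python
import json

def wrap_dump(lines):
    """Wrap mongoexport-style output (one JSON document per line)
    into a list suitable for dumping as one JSON array."""
    return [json.loads(line) for line in lines if line.strip()]

# Example with two newline-delimited documents, like the dump above
dump = (
    '{"_id": {"$oid": "5282ba26fa28cfe817000001"}, "msg": "a"}\n'
    '{"_id": {"$oid": "5282ba26fa28cfe817000002"}, "msg": "b"}\n'
)
as_array = json.dumps(wrap_dump(dump.splitlines()))
```

Anyway, back to the import.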



When I import it, I get a warning about my data being too big, even though it's just 10 records that I'm testing with.



When the import is completed, I see this as a single result in the collection:



_id : "[object Object]"

from : {"$oid":"52829b4a8adbe33217000807"}

members : [{"$oid":"52829b4a8adbe33217000807"},{"$oid":"52829b4b8adbe33217002300"}]

msg : "dolores et quas molestias excepturi sint occaecati cupiditate "



So, clearly Kinvey is not letting me import "real" data. I assume you want just simple, non-complex, CSV-style data. However, that doesn't really help.



I need to import 1M+ records for testing. It's coming straight out of another MongoDB. How can I do this with Kinvey? Surely I don't need to wrap this in a script that creates records one at a time?



Is there any way to import BSON directly into the DB?

Justin,



If multiple objects end up merged into a single one, chances are you missed a comma somewhere. Pasting your data into JSONLint revealed where:



Parse error on line 17:

...fe817000001" }}{ "members": [

---------------------^

Expecting 'EOF', '}', ',', ']'



I had surprising results when transferring data to/from the DataStore myself, from CSV in my case. Depending on the quoting you use, numbers are parsed as strings and dates get double-quoted. Now I lint my data before uploading it.
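A quick local check before uploading can be sketched in a few lines of Python; it surfaces exactly the "extra data" error that a missing comma between objects produces:

```python
import json

def lint(text):
    """Return None if text is valid JSON, else a human-readable error."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as e:
        return f"line {e.lineno}, column {e.colno}: {e.msg}"
```

Two back-to-back objects with no comma between them, as in the parse error above, fail immediately.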
Igor,



Thanks for the response. However, the data I posted is straight from a JSON dump from MongoDB, so I'm pretty sure it's good. I ran it through JSONLint as well and got no errors:



{ "members" : [ { "$oid" : "52829b4b8adbe33217003793" }, { "$oid" : "52829b4b8adbe332170046c6" } ], "from" : { "$oid" : "52829b4b8adbe33217003793" }, "msg" : "Et harum quidem rerum facilis est et expedita distinctio", "_id" : { "$oid" : "5282ba26fa28cfe817000001" } }



Oh, I see you copied all 10 rows that I showed above. I displayed those because that is the REAL output of a JSON dump from MongoDB, not as an example of the data I tried to import.



When I tried to import the 10 rows wrapped in an array, I did put commas after each object. That STILL caused the problem. Example:



[{ "members" : [ { "$oid" : "52829b4b8adbe33217003793" }, { "$oid" : "52829b4b8adbe332170046c6" } ], "from" : { "$oid" : "52829b4b8adbe33217003793" }, "msg" : "Et harum quidem rerum facilis est et expedita distinctio", "_id" : { "$oid" : "5282ba26fa28cfe817000001" } },

{ "members" : [ { "$oid" : "52829b4b8adbe33217003793" }, { "$oid" : "52829b4b8adbe332170046c6" } ], "from" : { "$oid" : "52829b4b8adbe332170046c6" }, "msg" : "non provident, similique sunt in culpa qui officia deserunt ", "_id" : { "$oid" : "5282ba26fa28cfe817000002" } },

{ "members" : [ { "$oid" : "52829b4b8adbe33217003793" }, { "$oid" : "52829b4b8adbe332170046c5" } ], "from" : { "$oid" : "52829b4b8adbe33217003793" }, "msg" : "At vero eos et accusamus et iusto odio dignissimos ducimus qui ", "_id" : { "$oid" : "5282ba26fa28cfe817000003" } },

{ "members" : [ { "$oid" : "52829b4b8adbe33217003793" }, { "$oid" : "52829b4b8adbe332170046c5" } ], "from" : { "$oid" : "52829b4b8adbe332170046c5" }, "msg" : "nihil impedit quo minus id quod maxime placeat facere possimus, ", "_id" : { "$oid" : "5282ba26fa28cfe817000004" } },

{ "members" : [ { "$oid" : "52829b4b8adbe33217003793" }, { "$oid" : "52829b4a8adbe33217000c2d" } ], "from" : { "$oid" : "52829b4b8adbe33217003793" }, "msg" : "aut perferendis doloribus asperiores repellat", "_id" : { "$oid" : "5282ba26fa28cfe817000005" } },

{ "members" : [ { "$oid" : "52829b4b8adbe33217003793" }, { "$oid" : "52829b4a8adbe33217000c2d" } ], "from" : { "$oid" : "52829b4a8adbe33217000c2d" }, "msg" : "et quasi architecto beatae vitae dicta sunt explicabo", "_id" : { "$oid" : "5282ba26fa28cfe817000006" } },

{ "members" : [ { "$oid" : "52829b4a8adbe33217000807" }, { "$oid" : "52829b4b8adbe33217002300" } ], "from" : { "$oid" : "52829b4a8adbe33217000807" }, "msg" : "dolores et quas molestias excepturi sint occaecati cupiditate ", "_id" : { "$oid" : "5282ba8c52dc6ce917000001" } },

{ "members" : [ { "$oid" : "52829b4a8adbe33217000807" }, { "$oid" : "52829b4b8adbe33217002300" } ], "from" : { "$oid" : "52829b4b8adbe33217002300" }, "msg" : "Temporibus autem quibusdam et aut officiis debitis aut ", "_id" : { "$oid" : "5282ba8c52dc6ce917000002" } },

{ "members" : [ { "$oid" : "52829b4a8adbe33217000807" }, { "$oid" : "52829b4c8adbe33217004d89" } ], "from" : { "$oid" : "52829b4a8adbe33217000807" }, "msg" : "blanditiis praesentium voluptatum deleniti atque corrupti quos ", "_id" : { "$oid" : "5282ba8c52dc6ce917000003" } },

{ "members" : [ { "$oid" : "52829b4a8adbe33217000807" }, { "$oid" : "52829b4c8adbe33217004d89" } ], "from" : { "$oid" : "52829b4c8adbe33217004d89" }, "msg" : "in culpa qui officia deserunt mollit anim id est laborum", "_id" : { "$oid" : "5282ba8c52dc6ce917000004" } }]



P.S.: I just imported the JSON above again, but only one record was imported. It gave me the "my what big data you have" flash as well.



So, I still can't really get a bunch of data into Kinvey for testing, short of hitting your API a million times. I guess I may need to do that.



Is there any way to import BSON directly into the DB?
It seems like Kinvey's parser does not correctly decode the $oid representation. You can try stripping it out before importing:



mongoexport --jsonArray | sed -E 's/\{ "\$oid" : ("[0-9a-z]+") \}/\1/g'
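Quoting sed for this gets fiddly, and regexes on JSON are fragile. As an alternative sketch, the same cleanup can be done structurally in Python, assuming the $oid wrappers look like the samples above:

```python
import json

def unwrap_oid(value):
    """Recursively replace {"$oid": "..."} wrappers with the bare hex string."""
    if isinstance(value, dict):
        if set(value) == {"$oid"}:
            return value["$oid"]
        return {k: unwrap_oid(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap_oid(v) for v in value]
    return value

# Example: one document from the dump above
doc = json.loads('{"_id": {"$oid": "5282ba26fa28cfe817000001"}, '
                 '"members": [{"$oid": "52829b4b8adbe33217003793"}]}')
cleaned = unwrap_oid(doc)
```

This walks the whole document, so it also handles $oid values nested inside arrays like "members".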

Justin,



At this point in time, BSON import isn't supported directly. The uploader was generally designed for standard JSON or CSV rather than BSON. This is definitely a valid use case though, so I'll make sure we incorporate it into our roadmap for future console improvements.



In the meantime, you can try writing a script to hit the API to upload each record - less than ideal, I realize, but probably your best option at the moment.
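For the record, a minimal sketch of such a script, posting one record at a time. The URL, collection name, and auth token below are placeholders, not actual values; check the REST API documentation for your app's exact endpoint and auth scheme:

```python
import json
import urllib.request

# Placeholder endpoint: appdata URL for a hypothetical app key / collection
BASE = "https://baas.kinvey.com/appdata/your-app-key/messages"

def build_request(doc, auth_token):
    """Build a POST request inserting a single JSON document."""
    body = json.dumps(doc).encode()
    req = urllib.request.Request(BASE, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", "Basic " + auth_token)
    return req

# Sending loop (commented out; requires real credentials):
# for doc in docs:
#     urllib.request.urlopen(build_request(doc, token))
```

For 1M+ records you would want to add error handling and throttling around the loop, but the basic shape is just this.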