
Getting started with the mishmash io distributed database

  • Use mishmash io as your app's database when your queries need complex logic: matching data by similarity, exploring patterns, and running other data-driven algorithms such as ML and AI.

Like all databases, mishmash io allows you to program an app to store, retrieve and operate over data.

But unlike other databases, mishmash io is unopinionated and opaque, and it optimizes your code for distributed computation across the nodes of a cluster.

Unopinionated database

Info alert:mishmash io is 'feature-less':

  • It has no schema
  • It has no query language

You develop your entire application code in a single programming language of your choice.

As a database, mishmash io does not impose an extra language to write your queries, nor a data schema. It has no requirements on how to model, normalize or relate your data. It has no API you have to learn and adopt.


Use ints, strings, lists, maps... and IFs, FORs, WHILEs...

mishmash io 'plugs in behind' your code and 'presents' all of your data as if it were stored in variables in local memory.

To store data, you assign the same primitives, lists, dicts, maps, objects and other data types that your app uses.

To query data, you write the same if, for, while and other statements of your programming language. mishmash io then 'picks up' the algorithm those statements represent and treats it as a 'query'.

Info alert:With mishmash io you:

  • access data as if it is a variable inside your app's memory
  • write queries in your programming language


Automatically parallelizes your code

To compute a result, mishmash io analyses your code - its branches - and how it accesses variables (the data 'behind' them).

Then, using what it knows about the data you stored (the inputs to your code), mishmash io 'rearranges' the code branches into an equivalent algorithm that can run in parallel: over smaller subsets of the data and on different cluster nodes.

In fact, it doesn't necessarily have to execute your code - it can use its knowledge of the data to precompute results (intermediate or final) without having to loop over all data points.
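The sketch below illustrates the general idea with a generic partition-and-combine (map/reduce) rewrite in plain Python. It is not mishmash io's actual algorithm: the thread pool merely stands in for cluster nodes, and the function names and the `num_partitions` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_active_sequential(users):
    """The loop as a developer might write it (hypothetical example)."""
    count = 0
    for user in users:
        if user["is_active"]:
            count += 1
    return count


def count_active_parallel(users, num_partitions=4):
    """Same result, computed over smaller partitions and then combined."""
    # Split the input into non-overlapping slices that together cover every element.
    partitions = [users[i::num_partitions] for i in range(num_partitions)]
    # Each worker (standing in for a cluster node) runs the original loop on its slice.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_active_sequential, partitions))
    # Counts are additive, so summing the partial results gives the same answer.
    return sum(partial_counts)


users = [{"is_active": i % 3 == 0} for i in range(10)]
print(count_active_sequential(users), count_active_parallel(users))  # 4 4
```

This particular rewrite is valid only because the per-element work is independent and the combining step (addition) is associative; an arbitrary loop would need a more careful analysis.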

Info alert:mishmash io does not necessarily run your code as written

You might have written your code with a loop over a massive subset of the data, but mishmash io will use its knowledge of the data to precompute results and eliminate many of those operations.
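One generic way to 'precompute' a result is to maintain an aggregate as data is written, much like a materialized view. The toy store below (a hypothetical `ActiveUserCounter`, not part of any mishmash io API) answers "how many users are active?" without looping at query time. The mechanism mishmash io actually uses is not described here.

```python
class ActiveUserCounter:
    """Hypothetical toy store that keeps an aggregate up to date on every write."""

    def __init__(self):
        self.users = {}        # user_id -> is_active
        self.active_count = 0  # precomputed answer to "how many users are active?"

    def put(self, user_id, is_active):
        # Remove the old record's contribution (if any), then add the new one.
        if self.users.get(user_id, False):
            self.active_count -= 1
        if is_active:
            self.active_count += 1
        self.users[user_id] = is_active

    def count_active(self):
        # No loop at query time: the answer was maintained during writes.
        return self.active_count


store = ActiveUserCounter()
store.put(0, True)
store.put(1, False)
store.put(2, True)
store.put(0, False)  # user 0 is deactivated
print(store.count_active())  # 1
```

The trade-off is a little extra work on every write in exchange for constant-time reads of the precomputed result.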


Developing apps with mishmash io

Once you've set up your app's project, follow these simple steps to use mishmash io as its database.

Install dependencies

Before you begin, you need to install the mishmash io client for your programming language (Python is shown here):

pip install mishmash-io-client

Info alert:Authentication and authorization

mishmash io does not implement its own authentication and authorization functionality. Rather, it integrates with the IAM or SSO systems that are native to the environment where the cluster runs.

Client applications must also have a matching dependency installed in order to authenticate to the database. For brevity, we're omitting this additional step here, but you can find out more in the supported environments documentation.

Create the client

To access a mishmash io database cluster from within your app, you have to instantiate the client:

from Mishmash import Mishmash
 
mishmash = Mishmash()

Your app is now connected, and the mishmash variable represents all of your app's data as if it were in local memory.

Store data

To store data in your app's data mishmash, simply assign values to members or elements of the mishmash variable:

mishmash.users[0].first_name = 'John'
mishmash.users[0].last_name = 'Doe'

Now your app's data (or data mishmash) contains an array (or list) named users with one element at index 0. That element is an object (or dict) with two members: first_name, holding the string 'John', and last_name, holding the string 'Doe'.
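To make the assignment semantics concrete, here is a toy, purely local model in plain Python: a hypothetical `Node` class whose attribute and index access create child nodes on demand, so nested assignments build up the same shape of data. It is not the mishmash io client (which stores data in the cluster), and for simplicity it keeps list indexes as dictionary keys.

```python
class Node:
    """Hypothetical toy: attribute/index access creates child nodes on demand."""

    def __init__(self):
        # Bypass our own __setattr__ so _children is a real instance attribute.
        object.__setattr__(self, "_children", {})

    def _child(self, key):
        if key not in self._children:
            self._children[key] = Node()
        return self._children[key]

    def __getattr__(self, name):
        # Called only when normal lookup fails, e.g. data.users
        if name.startswith("_"):
            raise AttributeError(name)
        return self._child(name)

    def __setattr__(self, name, value):
        self._children[name] = value  # data.first_name = 'John'

    def __getitem__(self, key):
        return self._child(key)  # data.users[0]

    def __setitem__(self, key, value):
        self._children[key] = value

    def to_plain(self):
        # Convert the tree into plain dicts for inspection.
        return {k: v.to_plain() if isinstance(v, Node) else v
                for k, v in self._children.items()}


data = Node()
data.users[0].first_name = 'John'
data.users[0].last_name = 'Doe'
print(data.to_plain())  # {'users': {0: {'first_name': 'John', 'last_name': 'Doe'}}}
```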

Let's add some login details for our hypothetical app's user:

mishmash.users[0].email = 'john.doe@acme.com'
mishmash.users[0].password = 'dont share passwords on the internet!'
mishmash.users[0].is_active = True
mishmash.users[0].secret_question = 'Life, the universe, and everything'
mishmash.users[0].secret_answer = 42

The user's object (or dict) now also has the fields email, password, is_active, secret_question and secret_answer.

Info alert:More ways of storing data

With mishmash io you can also assign entire objects/dicts, push elements to arrays/lists and more.

To see all of them check out the Programming Guide.

Build data mishmashes

Data mishmashes, or simply mishmashes, are subsets of the data represented by a mishmash variable, or combinations of data represented by multiple variables. A combination can be as simple as the union of two sets, or as complex as a set produced by elaborate 'joining' or 'matching' logic, hence the name data mishmash.
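As a plain-Python illustration of the two kinds of combination, here is a union of two sets and a simple matching (join) on a shared field. Plain lists stand in for mishmashes, and the `newsletter` and `customers` collections are hypothetical:

```python
# Hypothetical collections; plain lists stand in for mishmashes.
newsletter = [{"email": "john.doe@acme.com"}, {"email": "jane@acme.com"}]
customers = [
    {"email": "john.doe@acme.com", "orders": 3},
    {"email": "max@acme.com", "orders": 1},
]

# Simple combination: the union of the two sets of emails.
all_emails = {s["email"] for s in newsletter} | {c["email"] for c in customers}

# Matching logic: merge records from both collections that share an email.
customers_by_email = {c["email"]: c for c in customers}
matched = [
    {**s, **customers_by_email[s["email"]]}
    for s in newsletter
    if s["email"] in customers_by_email
]

print(sorted(all_emails))  # ['jane@acme.com', 'john.doe@acme.com', 'max@acme.com']
print(matched)             # [{'email': 'john.doe@acme.com', 'orders': 3}]
```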

For the time being, let's explore the simple ways of building a mishmash, leaving the advanced scenarios for the Programming Guide.

To get a subset of a mishmash, start with a variable that's already a mishmash, and access a member of it, or an element at a given index:

# build a mishmash of all users
users_mishmash = mishmash.users
 
# build a mishmash of a single user, given an index
user_id = 0
user_mishmash = users_mishmash[user_id]
 
# build a mishmash of the user's first name
first_name_mishmash = user_mishmash.first_name
# or, same as the above:
first_name_mishmash = user_mishmash['first_name']
# also the same:
field = 'first_name'
first_name_mishmash = user_mishmash[field]

It's important to remember that building mishmashes does not actually pull any data from the database. They are merely representations of the data your app will use in a subsequent step of its execution.

It may also help to remember a simple rule - any operation done on a mishmash variable always returns another mishmash variable, unless 'forced' to actually pull and yield the data behind it.
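The following toy model, assuming a simple path-based design that may differ from the real client, shows why building a mishmash is cheap: each member or element access only extends a recorded path, and nothing is fetched until the data is requested. `LazyRef`, `CountingBackend` and the `pull()` method are hypothetical names; in mishmash io itself the fetch is triggered by the context a mishmash is used in, not by an explicit call.

```python
class CountingBackend:
    """Hypothetical toy backend: nested dicts/lists plus a fetch counter."""

    def __init__(self, data):
        self.data = data
        self.fetches = 0

    def fetch(self, path):
        self.fetches += 1
        node = self.data
        for step in path:  # follow each member name / element index in turn
            node = node[step]
        return node


class LazyRef:
    """Hypothetical toy stand-in for a mishmash: it records a path, nothing more."""

    def __init__(self, backend, path=()):
        self._backend = backend
        self._path = path

    def __getattr__(self, name):
        # Only called for names not found normally; never fetches data.
        if name.startswith("_"):
            raise AttributeError(name)
        return LazyRef(self._backend, self._path + (name,))

    def __getitem__(self, key):
        # Index or key access: again, just a longer path.
        return LazyRef(self._backend, self._path + (key,))

    def pull(self):
        # Hypothetical explicit trigger; the real client pulls based on context.
        return self._backend.fetch(self._path)


backend = CountingBackend({"users": [{"first_name": "John", "is_active": True}]})
mishmash = LazyRef(backend)

users_mishmash = mishmash.users
first_name_mishmash = users_mishmash[0]["first_name"]
print(backend.fetches)             # 0: building mishmashes fetched nothing
print(first_name_mishmash.pull())  # John
print(backend.fetches)             # 1
```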

Info alert:Data as variables in local memory

As we pointed out earlier - all data in mishmash io appears as if in a variable inside your app.

Central to this are the mishmashes: variables that ultimately originate from the mishmash client we instantiated above. They do not actually hold the data in memory; they are a way of 'pointing' to data.

To save you the trouble of pulling massive amounts of data - an operation that can potentially take time and resources - these variables will not yield the actual data they represent until your app needs it to continue its operation.

Pull data

To retrieve data from mishmash io, first build a mishmash that points to the data you need, then use it in a context that requires the actual data:

# pulls the size of a mishmash
num_users = len(users_mishmash)
 
# pulls users one after another
for user in users_mishmash:
    do_something_with_user(user)
 
# pulls the value of is_active
if users_mishmash[0].is_active:
    print('User 0 is active')
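In Python, each of the contexts above invokes a specific data-model hook: `len()` calls `__len__`, a `for` loop calls `__iter__`, and an `if` test calls `__bool__`. A client library could use these hooks to detect when data must actually be pulled; whether the mishmash io client works exactly this way is an assumption. The hypothetical `Probe` class below simply records which hook each context triggers:

```python
class Probe:
    """Hypothetical toy object that records which Python hook each context invokes."""

    def __init__(self, items):
        self.items = items
        self.calls = []

    def __len__(self):
        self.calls.append("__len__")
        return len(self.items)

    def __iter__(self):
        self.calls.append("__iter__")
        return iter(self.items)

    def __bool__(self):
        self.calls.append("__bool__")
        return bool(self.items)


probe = Probe(["John", "Jane"])
num_users = len(probe)  # len() calls __len__
for user in probe:      # a for loop calls __iter__ once
    pass
if probe:               # truth testing calls __bool__
    pass
print(probe.calls)  # ['__len__', '__iter__', '__bool__']
```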


Push algorithms

Pulling data from a database might not always be the best approach to perform a computation, particularly when datasets are large. Retrieving lots of data can take time and resources and make your app hang.

To execute complex logic over large datasets (and avoid the overload), mishmash io allows you to 'push' the code to the database, where it will be automatically optimized and executed in parallel.

When building a mishmash, simply pass it your code:

def find_user(input_mishmash):
    for user in input_mishmash:
        if user.email == 'john.doe@acme.com' and user.is_active:
            return user
    
    return 'User not found or not active'
 
# Note: this still returns a mishmash, find_user() will not run yet
found_mishmash = users_mishmash(find_user)
 
if found_mishmash == 'User not found or not active':
    print('Cannot log in')
else:
    print('Welcome ' + found_mishmash.first_name)




Behind the scenes, mishmash io will 'take' the code of the find_user() function, analyse it and figure out that there are two possible outcomes (or 'final states'): either a user is returned, or the string 'User not found or not active' is.

The function's code also specifies how the data (the input) determines which final state is reached: if there is a user record with the given email and an active status, it is returned. If no such record exists in the database, the function ultimately returns 'User not found or not active'.
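As a rough illustration of what 'taking the code' and finding its final states can mean, Python's standard `ast` module can parse the source of find_user() and list every value it can return. This sketch assumes a purely syntactic analysis; mishmash io's real analysis is not documented here. `final_states()` is a hypothetical helper, and `ast.unparse()` requires Python 3.9 or later.

```python
import ast

# Source of find_user() from the example above, as a string.
FIND_USER_SRC = """
def find_user(input_mishmash):
    for user in input_mishmash:
        if user.email == 'john.doe@acme.com' and user.is_active:
            return user

    return 'User not found or not active'
"""


def final_states(source):
    """Hypothetical helper: the source text of every value the code can return."""
    tree = ast.parse(source)
    return [
        ast.unparse(node.value)
        for node in ast.walk(tree)
        if isinstance(node, ast.Return) and node.value is not None
    ]


print(sorted(final_states(FIND_USER_SRC)))  # ["'User not found or not active'", 'user']
```

With find_user() defined in a module file, `inspect.getsource(find_user)` would provide the same source string.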

Note, though, that looping over all users is not really necessary, as long as there is a quick way of knowing whether the sought-after record exists.

Fortunately, mishmash io has this ability: it arranges data in a way that lets it quickly explore potential 'splits' of sets, even if you do not specify which fields or values to split by.

So, in the example above, mishmash io will not actually loop over all users. Its design lets it quickly determine which final state find_user() reaches, given the data stored, and in doing so it saves you the trouble of retrieving the data and looping over it.
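The sketch below shows the simplest form of the idea: split the set once by the values the query tests (a plain hash index keyed on email and is_active), so answering the find_user() question becomes a single dictionary lookup instead of a loop. Unlike what is described above for mishmash io, this stand-in requires knowing the split fields in advance; `split` and `find_user_indexed()` are hypothetical names.

```python
from collections import defaultdict

# Plain list standing in for users_mishmash (hypothetical sample data).
users = [
    {"first_name": "John", "email": "john.doe@acme.com", "is_active": True},
    {"first_name": "Jane", "email": "jane@acme.com", "is_active": False},
]

# Split the set once, by the values the query tests.
split = defaultdict(list)
for user in users:
    split[(user["email"], user["is_active"])].append(user)


def find_user_indexed(email):
    # One dictionary lookup replaces the loop over all users.
    # .get() avoids inserting empty groups into the defaultdict.
    matches = split.get((email, True))
    return matches[0] if matches else "User not found or not active"


print(find_user_indexed("john.doe@acme.com")["first_name"])  # John
print(find_user_indexed("jane@acme.com"))  # User not found or not active
```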

Info alert:Multiple ways of pushing code:

If your programming language supports methods like filter(), forEach() and so on, you can use them on mishmashes too!
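For example, in Python the logic of find_user() can also be written with the built-in `filter()` and `next()`, which return the first matching user or a default value. Here a plain list stands in for users_mishmash, and `is_john_and_active()` is a hypothetical helper name:

```python
# Plain list standing in for users_mishmash (hypothetical sample data).
users = [
    {"first_name": "John", "email": "john.doe@acme.com", "is_active": True},
    {"first_name": "Jane", "email": "jane@acme.com", "is_active": False},
]


def is_john_and_active(user):
    return user["email"] == "john.doe@acme.com" and user["is_active"]


# filter() lazily yields matching users; next() takes the first one or the default.
found = next(filter(is_john_and_active, users), "User not found or not active")
print(found["first_name"])  # John
```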

Next steps

Find more examples of how to use mishmash io in the section dedicated to code examples.

Or get the details with the Programming Guide.

© 2024, Mishmash I O UK Ltd. or its affiliates. All rights reserved.