Warning: We're building our new website, with more examples, tips and docs! Some sections might still be incomplete or under review, though. When browsing, keep an eye out for progress notes and ways to get updates on particular topics.

Open Source at mishmash io

  • Here you can find open source code that we developed because we found it useful in our own work. We share it with the open source community because we believe it might be useful for your own software development effort too.

    The projects here are not directly related to our distributed database. See our integrations section for open source software that you can use alongside mishmash io.

What's in there for you

Warning: Open-sourcing our code is an ongoing initiative

We have not yet released all the tools and software that are mentioned in this section.

Follow us on social media to get notified when new releases happen.

Developing a distributed system (like our distributed database) is not an easy job, as it involves many components running on different servers and executing complex processes in parallel. The behaviour and performance of such a system can vary greatly depending on circumstances that are hard to uncover and even harder to replicate.

This is why we, at mishmash io, spend a lot of extra effort building the right tools for the job, and we're now sharing them as open source.

Our open source tools can be broadly categorized based on their role within our development process:

  • Gain knowledge
  • Experiment
  • Measure
  • Optimize
  • Test
  • Maintain

Tools to gain knowledge

The focus of the first category is to make it simpler to understand how a distributed system works. Understanding its behaviour is crucial for every part of the development process, so, for example, we build tools to collect and visualize telemetry data.

Tools to experiment

Illustrating what goes on inside our database is not enough, though. To gain practical knowledge you also need to experiment, and experiments here at mishmash io can be quite large-scale and time-consuming. In this category, we develop tools to automate and scale software systems.

We also often use off-the-shelf components (typically other open source software) to simulate the broader environment in which our database (or part of it) is used. For these, we've created additional modules or modified the components in ways better suited to our work.

Measure your progress

Measuring is another essential part of engineering, as it tells you, for example, whether you're heading in the right direction. We mentioned telemetry earlier, but uncovering hidden relationships in telemetry data is not always doable on a dashboard, so we also develop batch jobs for more advanced data processing.

Optimize algorithms

Optimizing a multi-component software system also involves optimizing every individual part of it. For example, optimizing transaction-commit algorithms can be quite a different job from reducing the number of operations a data-driven algorithm (one that takes data as its input) needs to complete.

One of the hardest tasks here is exactly that: optimizing our code so it takes full advantage of the available CPUs, memory, disks and networks without 'hanging' the hardware it runs on. Tools in this category help us iterate faster on our algorithm designs and tweaks.

Test changes

Unlike user-facing software, a database has a broad spectrum of use cases. In general, we can't easily put them into a definitive list of 'user stories' or 'workflow steps' and then use that list to test mishmash io against all intended uses.

Maintain production code

Every piece of code in a large software project has its own lifecycle: from the moment it is introduced, through various iterations of 'settling it down' in the larger code base, until it is eventually replaced by a new and better implementation. In other words, living with the code you've already developed is also part of the software development process.

Having done that for a number of years now, we've come up with some ideas on how to make our lives easier, and we've implemented them in code.

How to adopt our open source

Info: Getting Source Code

We typically host our open source projects on GitHub.

Open source by stack

The tools we develop often fall into more than one of the categories above. For simplicity, we organize them by software stack. How we use each of them, and how each fits into our development process, is described in more detail below.

Apache Big Data Stack

Warning: Section under development

This part of our website is still under development and major parts of its content are not published yet.

Follow us on social media to get notified when new content is released.

The Apache Software Foundation develops hundreds of open source projects that form the backbone of the most visible and widely used applications in computing today.

Naturally, we use many of their projects as off-the-shelf components: for running large-scale tests and experiments, for handling telemetry data, and more.

  • Our modifications, extensions and automation tools for the Apache Big Data stack are listed here.

OpenTelemetry

OpenTelemetry allows you to generate, collect and export telemetry data (logs, metrics and traces) from your software systems. It is an invaluable tool that helps you analyze and understand your code's behaviour.

At mishmash io, we use it extensively to identify areas for potential performance improvement and to test the effects of new code on performance.

Over time, we've developed a number of OpenTelemetry-related software tools that we found useful in our work, and now we're sharing them with the open source community.

© 2024, Mishmash I O UK Ltd. or its affiliates. All rights reserved.