Data Science at Circus Social

Circus Social is brimming with social intelligence. With a vibrant and vastly talented group of Data Engineers, Data Scientists, and Data Analysts on board – we live and breathe analytics.

More recently, our machine learning capabilities for the 20/Twenty platform have benefitted from a decentralized and distributed structure.


Our clients often face a variety of marketing challenges, and we work hand in hand with them to find the most efficient way of tackling each one. These can include ‘hard problems’ such as real-time document clustering or predicting a virality score for posts on social media.

New challenges also arise as lifestyles and social media behavior change, such as automatically grouping similar videos uploaded by users based on attributes including context, language and emotions. Solving these has only been possible by tapping into the specializations of both the science and the engineering teams.

While our data science team has created magic with data, predictive and modeling techniques, our engineering team has helped maintain several terabytes of data and created effective machine learning pipelines.

The team’s latest project has been to develop and deploy real-time document clustering, allowing our clients to find viral content, identify trends and proactively react to crisis situations. The three main phases of this project are feature extraction and selection, document representation, and clustering. After experimenting with many clustering algorithms, with speed and cluster quality as the primary evaluation metrics, we settled on a hybrid approach that uses both K-means and agglomerative hierarchical clustering: K-means for its run-time efficiency, and agglomerative hierarchical clustering for its cluster quality. While our initial motivation was to find similar conversations in a given geographical area, we have also started using it to recommend the conversations and articles a user is most likely to be interested in.
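To make the hybrid idea concrete, here is a minimal Python sketch of one common way to combine the two algorithms. It is an assumption on our part for illustration, not the production 20/Twenty code: K-means first splits documents cheaply into many small groups, then agglomerative clustering merges groups whose centroids are similar. The bag-of-words vectors, the cosine measure, the function names and the `threshold` value are all hypothetical choices made to keep the example self-contained.

```python
import math
import random
from collections import Counter

def vectorize(docs):
    """Bag-of-words term-frequency vectors, L2-normalised (stand-in for real feature extraction)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, k, iterations=10, seed=0):
    """Phase 1: fast, coarse partition. Returns non-empty clusters as lists of document indices."""
    rng = random.Random(seed)
    centroids = [vectors[i] for i in rng.sample(range(len(vectors)), k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for i, v in enumerate(vectors):
            best = max(range(k), key=lambda c: cosine(v, centroids[c]))
            clusters[best].append(i)
        # Empty clusters keep their previous centroid.
        centroids = [mean([vectors[i] for i in c]) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return [c for c in clusters if c]

def agglomerate(vectors, clusters, threshold):
    """Phase 2: repeatedly merge the two clusters with the most similar centroids."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        cents = [mean([vectors[i] for i in c]) for c in clusters]
        sim, a, b = max((cosine(cents[i], cents[j]), i, j)
                        for i in range(len(cents))
                        for j in range(i + 1, len(cents)))
        if sim < threshold:
            break  # nothing left that is similar enough to merge
        clusters[a].extend(clusters.pop(b))  # a < b, so index a is unaffected
    return [sorted(c) for c in clusters]

def hybrid_cluster(docs, k, threshold=0.3):
    vectors = vectorize(docs)
    return agglomerate(vectors, kmeans(vectors, k), threshold)

docs = [
    "phone battery drains fast",
    "phone battery overheating issue",
    "new phone battery drains",
    "pizza delivery was late",
    "late pizza delivery again",
    "pizza delivery arrived cold",
]
print(hybrid_cluster(docs, k=4))
```

The appeal of this arrangement is that the expensive pairwise comparison in the agglomerative step runs over a handful of centroids rather than over every document, while the merge step repairs the over-splitting that a generous `k` produces.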


At Circus Social, promoting the best ideas not only feeds our love for an open culture but also helps drive an innovative culture that constantly challenges the status quo. As a social media analytics company, we have developed a similar wavelength for Data Science.

From where it all began in the early days of social listening to today’s culmination of intelligence, data science and machine learning, we are truly excited at the possibilities of what the future of 20/Twenty holds for us, our team and our clients.

A Chance Meeting: Social Media Monitoring

I was at a Business School today to attend an event and met a few people from the industry. Here is an interesting conversation I had with somebody I met. Let’s call him Mr. X.

Mr. X – So what do you do?
Ram – I am an entrepreneur and run a software product company called Circus Social. We do social media monitoring and analytics; and have offices in Singapore and Bangalore.

Mr. X – What exactly is social media monitoring?
Ram – In simple terms, we fetch data, and a lot of it, from multiple social networks like Facebook, Twitter, Instagram, Pinterest and Reddit, and also from several blogs, forums and news sites. We then process and augment this data (adding sentiment and gender, removing spam, etc.) and allow marketers to get insights from it. We show them trends, insights, and a lot of charts and graphs, and allow them to make better business decisions.

Mr. X – So you are a big data company. Sounds like tough work. What kind of business decisions can be made?
Ram – Yes, we are indeed a big data company. Currently, our clients are enterprises, typically marketers, who would like to know their customers better, research their markets, analyse their competitors, identify top influencers, get real time alerts on topics of interest etc. We are also launching our SME product shortly.

Mr. X – So, do you invade people’s privacy? Isn’t this data private?
Ram – We only crawl data that is already publicly available. Anything that is private or protected by you is completely out of bounds for us. We do an incredibly good job of collating all of this data in our platform and marketers find tremendous value in quickly viewing trends from 10,000 feet but also in the ability to drill down to understand the “why” and “how” behind the trends.

Mr. X – Will this be useful for my company?
Ram – If people are talking about your company, your brands or topics that you are interested in tracking, you will definitely find it useful.

Mr. X – Where can I get more information?
Ram – Here’s my business card. Please visit our website https://www.circussocial.com where you can find a lot more information about what we do and a list of some of our clients. You can also sign up for a free demo.

Mr. X – Sounds good. Thank you!
Ram – You are welcome!

Check out how the 20/Twenty Social Monitoring platform has grown!

Growth of 20/Twenty Social Monitoring & Intelligence Platform – Under the Hood


We have often been asked about the challenges we faced in scaling up our technology stack to manage big data. I have attempted to address this in this post which is the first of a series of blog posts on this and similar topics.

20/Twenty was created from the ground up as the most intuitive and easy-to-use cloud-based (SaaS) Social Monitoring & Intelligence platform in the world. Based on our deep understanding of what marketers needed and the awesome designs we created, we signed up our first client even before the product was officially launched. The pressure to quickly deliver the first version of the product was intense 🙂

From an engineering point of view, there’s a huge amount of data that we pull (Think Big Data!), process, augment and then visualize in the platform all on a near real-time basis. Imagine someone tweeting and it appears on our platform within a few seconds along with augmented information including Gender, Sentiment, Engagement, Spam score etc.

The evolution of 20/Twenty has already seen a few stages of growth. The graph below shows how 20/Twenty data has grown over the last 2 years since our product launch. This is really cool growth for a startup like Circus Social, both from a business perspective and from an engineering standpoint. We used several tricks from the books as well as a few practical hacks to ensure that our ability to fetch, process, augment and visualize high volumes of data kept improving, though this journey was not without pain!

[Figure: growth of 20/Twenty data volume since product launch]

Stage 1

We created over 200 custom marketing applications in our previous avatar at Circus Social working with some of the biggest brands in the world. We used the same open source technologies (PHP / MySQL) to create the first version of 20/Twenty. This worked well and as our data grew in the first few months, we continued to grow vertically by adding more capacity (CPU/RAM).

Most of the queries from the application were read queries, whereas the bulk of the “write operations” were being performed by our data crawlers. We therefore created an efficient master-slave architecture where the application would read from the slaves and the crawler scripts would write into the master. This worked well in general, but the exponential increase in the volume of data meant that certain queries were running extremely slowly and impacting the user experience.
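The read/write split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the original Stage 1 code was PHP, and the class name `ReadWriteRouter`, the host names and the rule of routing by the first SQL keyword are all hypothetical. A real setup would also need to account for replication lag.

```python
import itertools

class ReadWriteRouter:
    """Sends read statements to the slaves in round-robin order and everything else to the master."""

    READ_KEYWORDS = ("select", "show")

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, every statement falls back to the master.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        words = sql.split(None, 1)
        first_word = words[0].lower() if words else ""
        if first_word in self.READ_KEYWORDS and self._slaves is not None:
            return next(self._slaves)
        return self.master

# Hypothetical host names, for illustration only.
demo_router = ReadWriteRouter("db-master", ["db-slave-1", "db-slave-2"])
print(demo_router.route("SELECT * FROM posts WHERE brand = 'acme'"))  # application read
print(demo_router.route("INSERT INTO posts (id) VALUES (1)"))          # crawler write
```

Keeping the routing decision in one place means the application and the crawlers can share the same data-access code while still hitting different servers.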

Stage 2

Since our data volume was growing exponentially and the relational aspects of the database were not the core of our application, we realized that sooner or later we would have to move to a NoSQL database. However, the performance issues that were cropping up had to be sorted out quickly and without downtime. We quickly realized that we needed a dedicated search engine and that MySQL was not good enough for this purpose.

We explored several options and Elasticsearch came to our rescue here. Elasticsearch is a distributed, RESTful search and analytics engine that centrally stores your data in a manner which can be retrieved / read really fast by your applications. Our awesome tech team deployed this in a matter of days. The improvement in performance was remarkable. The plan worked and we cheered!

Stage 3

Word spread in Singapore and Asia about how good our platform was (and our sales team did a good job too!) and we continued to sign up new clients. The volume of data continued to grow for existing clients as well as new clients. The tech stack of MySQL and Elasticsearch did not let us down, but we wanted to create an architecture that would scale infinitely, if there’s a thing like that.

In Stage 3, we moved the core of our database from MySQL to Cassandra (Elasticsearch was now interacting with Cassandra) and the backend code from PHP to Node.js. We also migrated most of our front-end code to AngularJS for better performance. This was a major architectural change on a live application being used by several clients, so we created a production-like environment and ran it in parallel for several weeks to ensure everything was working as desired before switching over.
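One way to gain confidence during such a parallel run is to send the same requests to both environments and flag any differences. The Python sketch below illustrates that idea only; the post does not say how the two environments were actually compared, and the function name, the query strings and the dictionary-backed fake backends are all hypothetical.

```python
def compare_backends(queries, old_backend, new_backend):
    """Run each query against both backends and collect the ones whose results differ."""
    mismatches = []
    for query in queries:
        old_result = old_backend(query)
        new_result = new_backend(query)
        if old_result != new_result:
            mismatches.append((query, old_result, new_result))
    return mismatches

# Fake backends standing in for the old and new stacks.
old_data = {"mentions:acme": 120, "mentions:globex": 45}
new_data = {"mentions:acme": 120, "mentions:globex": 44}

report = compare_backends(
    ["mentions:acme", "mentions:globex"],
    old_data.get,
    new_data.get,
)
for query, old_result, new_result in report:
    print(f"{query}: old={old_result} new={new_result}")
```

An empty report over a representative set of queries is a reasonable signal that the new environment is ready to take over.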

While we did the above, we continued to work on cool new features for the product and opened up our data APIs to a few clients who wanted a deeper integration with their own applications. Other tools we used during this and other stages were Postman, GitHub and Jira.

As we scale further from here, we will probably face newer and more exciting technology challenges, and we will keep posting about them. If you are excited to work on some of these, do write to us at careers@circussocial.com.