What is StreamHorizon?
- StreamHorizon is a next-generation ETL platform for Big Data processing. It is a highly efficient and extensible replacement for legacy stove-pipe ETL platforms. We call this adaptiveETL.
When is StreamHorizon the right choice for your project?
- If you need to deliver a robust project quickly. StreamHorizon's single XML configuration file is a powerful method of rapid development that saves development time and significantly reduces the possibility of bugs in your ETL.
- When your solution needs to be highly parallel and you do not wish to spend time solving multithreading, synchronization and database locking issues.
- When your solution must scale with data volumes, vertically and/or horizontally.
- When 90% of your ETL logic can simply be configured (fact loads, parallelism, Type 1, Type 2 and other dimension types, mappings, lookups etc.) and the other 10% can be implemented by your own executables, OS scripts or SQL procedures, which are invoked through the StreamHorizon Event architecture. All you need to specify is when the StreamHorizon engine should invoke your customised logic. You do this by creating an XML element in the configuration file that points to the location of your executable, script or SQL procedure.
- When you need the ability to override 100% of the default behaviour of the StreamHorizon engine by injecting your own data/header/footer/cache processing Java classes, which the engine will invoke.
- When you need to control or embed your ETL logic by invoking Pre, onSuccess, onFailure and Finally (Post) events for every data entity processed (a data entity is a message, file or SQL query).
- When you need a solution that can be clustered simply by installing StreamHorizon and copying XML configuration files.
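The order in which the Pre, onSuccess, onFailure and Finally (Post) events fire around a single data entity can be pictured with a small Java sketch. This is an illustration only: the class `EntityLifecycleSketch`, the interface `EntityProcessor` and the method `run` are hypothetical names invented for this example and are not part of the StreamHorizon API or its XML configuration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; none of these types belong to StreamHorizon. */
final class EntityLifecycleSketch {

    /** Work performed on one data entity (a message, file or SQL query). */
    interface EntityProcessor {
        void process(String entity) throws Exception;
    }

    /** Processes one entity and returns the event names in the order they fired. */
    static List<String> run(String entity, EntityProcessor processor) {
        List<String> fired = new ArrayList<>();
        fired.add("Pre");                 // before the entity is processed
        try {
            processor.process(entity);
            fired.add("onSuccess");       // only when processing completed
        } catch (Exception e) {
            fired.add("onFailure");       // only when processing threw
        } finally {
            fired.add("Finally");         // always, after success or failure
        }
        return fired;
    }

    public static void main(String[] args) {
        // Lambda syntax needs Java 8 or later; use anonymous classes on Java 7.
        System.out.println(run("trades_001.csv", e -> { }));
        System.out.println(run("trades_002.csv", e -> {
            throw new IllegalStateException("simulated load failure");
        }));
    }
}
```

In this sketch every entity fires exactly three events: Pre, then either onSuccess or onFailure, then Finally. Custom logic such as a script or SQL procedure would hang off one of those points.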
When is StreamHorizon not the right choice for your project?
- If you require mainframe connectors (EBCDIC) or any other type of legacy technology connectivity.
- If you already have established teams with expertise in the ETL tool of your choice and they are not a high burden on your project budget (the assumption here is that you do not require massively parallel database loading functionality).
What is the difference between StreamHorizon and market-leading ETL/Data Integration vendors?
In simple terms, StreamHorizon is the right ETL tool for your project if you require optimal hardware utilization, simple setup (XML configuration) and a cheaper (but not less functional) alternative to leading ETL and Data Integration tools, with high parallelism and high data processing throughput, and without any need for ETL tool specific language or knowledge.
Read in more detail about:
- High throughput, massively parallel ETL platform capable of processing very large data volumes
- Data throughput of 1+ million records per second (running on a single commodity server)
- Running on BigData, cloud deployable, available as Software as a Service
- Quick time to market (enables you to deliver a pilot project within a single week)
- Hardware efficient: high data processing throughput is achievable on low-specification hardware
- No platform-specific ETL knowledge is required, as StreamHorizon is fully configurable via a single XML configuration file
- IT staff with basic IT knowledge can configure, set up and deliver Data Warehousing and Data Integration projects
- Ideal outsourcing candidate due to the low level of skill required to operate the StreamHorizon platform
- Simple environment, release/revert and upgrade risk management, fully supporting Agile methodologies
- Low cost of ownership (low Business As Usual costs)
What does the StreamHorizon platform bring to your existing Data Integration and Business Intelligence stack?
The StreamHorizon platform gives SQL developers and programmers an ETL framework that does not require intimate knowledge of complex ETL tools, yet offers all of their functionality. This positions StreamHorizon as an ideal candidate for a strategic ETL solution.
If you already have a Data Integration and Business Intelligence stack, you could use StreamHorizon as:
- A platform for massively parallel bulk loading of data into a database (avoiding unnecessary database locking)
- OLAP integration: a platform for massively parallel data loading into an OLAP server in both real-time and batch-oriented architectures
- A Hadoop to non-Hadoop ETL bridge (and vice versa)
- A platform for scheduled, massively parallel data extraction from any data source into any other data source of your choice
- A framework for massively parallel file2file, message2file, db2db, db2file, file2db, Thrift2db, Thrift2file, file2Hadoop, message2Hadoop, database2Hadoop, Hadoop2Hadoop, Hadoop2file, Hadoop2database and any other possible connector (source/target) permutation
- A simple to configure, robust ETL platform that enables existing IT staff (with a database and/or programming background) to become productive extremely quickly. No intimate knowledge of an ETL tool is required, as StreamHorizon is based on XML configuration. The framework can be extended as desired using OS scripts (all platforms supported), Java or SQL (statements and procedures)
- A complement to other ETL frameworks. StreamHorizon is simpler to use than market-leading ETL tools and offers unrivalled performance and parallelism, but it can also be used in conjunction with them. For example, any ETL framework can transform the data and create bulk files, while StreamHorizon acts as the massively parallel loading framework into any database, something that market-leading ETL tools cannot natively do as efficiently as StreamHorizon
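To make the idea of parallel loading concrete, here is a generic Java sketch of the pattern: each bulk file is an independent unit of work handed to a fixed pool of threads, so workers share no state and need no locking between them. It shows the general technique using only the standard `java.util.concurrent` library and is not StreamHorizon's implementation. `ParallelLoadSketch`, `loadFile` and `loadAll` are hypothetical names, and `loadFile` merely counts records where a real loader would write to a database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration of one-file-per-task parallel loading. */
final class ParallelLoadSketch {

    /** Stand-in for a bulk load of one file; returns the number of records handled. */
    static int loadFile(List<String> records) {
        return records.size();
    }

    /** Loads every file on a fixed-size thread pool and returns the total record count. */
    static int loadAll(List<List<String>> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (final List<String> file : files) {
                // Each file is its own task, so no state is shared between threads.
                results.add(pool.submit(new Callable<Integer>() {
                    @Override
                    public Integer call() {
                        return loadFile(file);
                    }
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();   // waits for that task to finish
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("parallel load failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("r1", "r2"),
                Arrays.asList("r3"),
                Arrays.asList("r4", "r5", "r6"));
        System.out.println("records loaded: " + loadAll(files, 2));
    }
}
```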
How is StreamHorizon licensed?
StreamHorizon licensing is based on one of the following:
- The number of CPUs, the CPU type and the memory resources of your hardware
- The total number of StreamHorizon instances and the total number of threads used by each instance
The client chooses which of the two licensing options to sign. Usually that is the cheaper of the two licence types listed above.
Large corporate clients can be offered a Bulk licence agreement: payment of an agreed fixed amount that allows the client to run an agreed number of StreamHorizon deployments.
Please contact us with details of your CPU count, CPU type, memory and the number of StreamHorizon instances you wish to run. We will reply with pricing options to choose from.
What is the recommended way to start using StreamHorizon?
We recommend that you first download our demo and test deploy StreamHorizon. The sample configurations provided with the demo will give you a solid idea of how simple StreamHorizon is to configure and operate as a platform.
By deploying the StreamHorizon demo on adequate hardware you will get an idea of the power of the platform in terms of data throughput and processing efficiency.
A quick browse of the XML tutorial provided in the config folder of the StreamHorizon distribution will give you a good idea of the source and target connectors you could use and of the platform's tuning parameters, which are designed to get maximum performance out of your hardware.
Once you decide to proceed with the purchase of the StreamHorizon platform, our technical staff will be in touch with you to discuss the best deployment strategy, and a licensed copy of the software, including guides and manuals, will be provided.
What are the minimum system requirements for running StreamHorizon?
Minimum requirements are:
- Any mainstream operating system (Linux, Windows, Solaris)
- 100MB of disk space
- Decent CPU
- 500MB of RAM
- Java 1.7 or higher
We recommend a heap allocation of 4-8GB per StreamHorizon instance. The number of CPUs and the amount of memory depend on the limits of your server and the required throughput of your deployment.
In scavenging mode, in which workstations are used as ETL processing engines, StreamHorizon can run with 500MB of allocated RAM and a single thread.
For highest efficiency we recommend a 64-bit JDK.
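Before installing, you may want to check a machine's JVM against the figures above. The following is a minimal sketch of such a check, not a tool shipped with StreamHorizon; the class `RequirementsCheck` and its methods are hypothetical names. It uses only standard JDK calls: the `java.specification.version` and `os.arch` system properties and `Runtime.maxMemory()`.

```java
/** Hypothetical pre-flight check against the requirements listed above. */
final class RequirementsCheck {

    static final long MB = 1024L * 1024L;

    /** Converts a specification version such as "1.7", "1.8" or "11" to 7, 8 or 11. */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    /** Compares the maximum heap with the 500MB minimum and the 4-8GB recommendation. */
    static String heapAdvice(long maxHeapBytes) {
        if (maxHeapBytes < 500 * MB) {
            return "below the 500MB minimum";
        }
        if (maxHeapBytes < 4096 * MB) {
            return "meets the minimum, below the recommended 4-8GB";
        }
        return "meets the recommended 4GB or more";
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        System.out.println("Java " + major + (major >= 7 ? " (1.7 or higher: ok)" : " (too old)"));
        // maxMemory() can report slightly less than the -Xmx value on some collectors.
        System.out.println("Heap: " + heapAdvice(Runtime.getRuntime().maxMemory()));
        // Typical 64-bit values include amd64 and aarch64; the value is platform dependent.
        System.out.println("Architecture: " + System.getProperty("os.arch"));
    }
}
```

For example, running it as `java -Xmx4g RequirementsCheck` shows how a 4GB heap setting is reported on that machine.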