- Generate the file:
$ awk 'BEGIN { for(c=0;c<10000000;c++) printf "<p>LOL</p>" }' > 100M.html
$ (for I in `seq 1 100`; do cat 100M.html; done) | pv | gzip -9 > 10G.boomgz
- Check it is indeed good:
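The check commands themselves aren't preserved in this excerpt; a minimal sanity check, assuming pv and zcat are installed, could be:
$ ls -lh 10G.boomgz                  # compressed file should be on the order of 10 MB
$ zcat 10G.boomgz | pv > /dev/null   # decompressed stream should be roughly 10 GB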
package com.dataartisans.beam_comparison.customTriggers;
package com.cisco.sbg.amp;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeutils.base.BooleanSerializer;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.hadoop.shaded.com.google.common.base.Preconditions;
import org.apache.flink.streaming.api.windowing.time.Time;
package org.apache.hadoop.mapred;
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputCommitter;
import org.apache.hadoop.mapred.FileOutputFormat;
# You don't need Fog in Ruby or some other library to upload to S3 -- shell works perfectly fine
# This is how I upload my new Sol Trader builds (http://soltrader.net)
# Based on a modified script from here: http://tmont.com/blargh/2014/1/uploading-to-s3-in-bash
S3KEY="my aws key"
S3SECRET="my aws secret" # pass these in
function putS3
{
  path=$1
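The excerpt cuts off here. A minimal sketch of how the rest of putS3 could look, following the AWS Signature Version 2 pattern used in the post linked above -- the second argument, bucket name, and content type below are placeholder assumptions, not the original script:
  file=$2                                  # local file to upload (assumed second argument)
  bucket="my-bucket"                       # placeholder bucket name
  contentType="application/octet-stream"
  dateValue=$(date -R)
  resource="/${bucket}/${path}"
  stringToSign="PUT\n\n${contentType}\n${dateValue}\n${resource}"
  # Sign the request with the secret key (AWS Signature Version 2)
  signature=$(echo -en "${stringToSign}" | openssl sha1 -hmac "${S3SECRET}" -binary | base64)
  curl -X PUT -T "${file}" \
    -H "Date: ${dateValue}" \
    -H "Content-Type: ${contentType}" \
    -H "Authorization: AWS ${S3KEY}:${signature}" \
    "https://${bucket}.s3.amazonaws.com/${path}"
}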
I've been looking for the best Linux backup system, and also reading lots of HN comments.
Instead of listing the pros and cons of every backup system, I'll just list the deal-breakers that would disqualify each one.
I'd also like the HN community to add more deal-breakers for these or other backup systems if you know of any, and if you have data that disproves any of the deal-breakers listed here (benchmarks, or information that something was true for older releases but has been fixed in newer ones), please share it so I can edit this list accordingly.
/*
 * Copyright 2015 Databricks, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License. You may obtain
 * a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
A checklist for designing and developing internet-scale services, inspired by James Hamilton's 2007 paper "On Designing and Deploying Internet-Scale Services."
Producer
Setup
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3
Single thread, no replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
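The excerpt ends with the unreplicated run; a plausible follow-up (not part of the excerpt) would be the same single-threaded producer against the 3x-replicated topic created above, with synchronous acks, reusing the exact argument layout of the command above:
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=-1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196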
import spark.streaming.{Seconds, StreamingContext}
import spark.storage.StorageLevel
import spark.streaming.examples.twitter.TwitterInputDStream
import com.twitter.algebird._
import spark.streaming.StreamingContext._
import spark.SparkContext._
/**
 * Example of using CountMinSketch monoid from Twitter's Algebird together with Spark Streaming's
 * TwitterInputDStream