Discussion:
[jruby-user] Reading binary Files
Roger Palma
2014-10-04 21:38:25 UTC
Hi there
I am trying to read a binary file using Java's "FileInputStream" to
later store it in HBase.

My problem is the byte-array conversion needed to call the read-method:

inFile = File.new("/home/roger/Downloads/test.jpg")
inputStream = FileInputStream.new(inFile)

length = inFile.length()
buffer = ""

inputStream.read(buffer)


Any ideas?

You may find the entire code in the attachment.

Thanks
Roger

Attachments:
http://www.ruby-forum.com/attachment/10139/test.rb
--
Posted via http://www.ruby-forum.com/.

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email
Ariel Valentin
2014-10-04 21:51:29 UTC
Would you mind clarifying your goal a little? Is it your intention to read
the bytes in a "streaming" fashion, or is it OK to read the entire file
into memory as a byte[]?



Ariel Valentin
e-mail: ***@arielvalentin.com
website: http://blog.arielvalentin.com
skype: ariel.s.valentin
twitter: arielvalentin
linkedin: http://www.linkedin.com/profile/view?id=8996534
---------------------------------------
*simplicity *communication
*feedback *courage *respect
Roger Palma
2014-10-04 22:03:45 UTC
Hey Ariel

It's the latter case. These PDFs are rather small, so I just want to
read them into memory and then pass the contents (as a byte[]) to
another function (the put method of HBase).
Christian MICHON
2014-10-05 08:43:42 UTC
Strange: based on your code, JRuby should warn you that File is already
defined by Ruby: "warning: already initialized constant File"

To avoid the clash, call :remove_const on Object before any Java class
import, like this:
Object.send(:remove_const, :File) # put this as the first line

Next error... NameError: no method 'read' for arguments
(org.jruby.RubyString) on Java::JavaIo::FileInputStream
Explanation: your buffer is actually a Ruby string, not a byte[]. Possible
fix (sized to the file length, since an empty array would read zero bytes):
buffer = Java::byte[length].new

I did not check the rest of HBase related code. Good luck!
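
For reference, here is a minimal sketch of reading a whole file into a Java byte[]. It assumes JRuby, and read_java_bytes is a made-up helper name, not something taken from the attached script. The snippet only defines the method, so the Java classes are touched only when it is called. It refers to java.io.File by its full name, which leaves Ruby's own File constant alone.

```ruby
# Hypothetical helper: read an entire file into a Java byte[].
# Only works when called under JRuby.
def read_java_bytes(path)
  require "java"

  file = java.io.File.new(path)
  # Allocate a Java byte[] with one slot per byte in the file
  buffer = Java::byte[file.length].new

  stream = java.io.FileInputStream.new(file)
  begin
    offset = 0
    # read() may return fewer bytes than requested, so loop until full
    while offset < buffer.length
      count = stream.read(buffer, offset, buffer.length - offset)
      break if count < 0 # end of stream reached early
      offset += count
    end
  ensure
    stream.close
  end

  buffer
end
```

Under JRuby, something like read_java_bytes("/home/roger/Downloads/test.jpg") should then give a byte[] that can be handed to Java APIs directly.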
--
Christian
Roger Palma
2014-10-05 09:34:41 UTC
Following the Ruby documentation, I now just use Ruby's own File class,
since my PDFs (here JPGs) are rather small. This seems to work so far,
but might be slower than a stream-based approach. It did well importing 5
files on my test box; we'll see how it does when running on the real
site with millions of files. Even without my "puts", the HBase shell
prints a lot of messages on screen...

java_import "org.apache.hadoop.hbase.util.Bytes"
java_import "org.apache.hadoop.hbase.client.HTable"
java_import "org.apache.hadoop.hbase.client.Put"


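# Convert each argument to a Java byte[] (via its string form)
def jbytes(*args)
  args.map { |arg| arg.to_s.to_java_bytes }
end

files = Dir.glob("/home/roger/Downloads/*.jpg")

# Open the table once instead of once per file
table = HTable.new(@hbase.configuration, "rb_test")

files.each do |x|
  puts "File #{x}"

  # Read the whole file in binary mode; binread also closes the file
  buffer = File.binread(x)

  # Row key is the file name; column family "inhalt", empty qualifier
  put = Put.new(*jbytes(File.basename(x)))
  put.add(*jbytes("inhalt", "", buffer))

  table.put(put)
end

table.close()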
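
To compare the whole-file read with a stream-style read, here is a small plain-Ruby sketch. The helper names read_binary and read_binary_chunked and the 4096-byte default chunk size are assumptions for illustration only. Both helpers return the same binary String, so either result could be passed on to to_java_bytes under JRuby.

```ruby
require "tempfile"

# Hypothetical helper: read the whole file at once, in binary mode.
def read_binary(path)
  File.binread(path)
end

# Hypothetical helper: read the same file in fixed-size chunks.
def read_binary_chunked(path, chunk_size = 4096)
  data = "".b # empty String with binary encoding
  File.open(path, "rb") do |f|
    # f.read(n) returns nil at end of file, which ends the loop
    while (chunk = f.read(chunk_size))
      data << chunk
    end
  end
  data
end

# Small self-check with bytes that are not valid text
sample = [0xFF, 0xD8, 0x00, 0x0A, 0x0D, 0x80].pack("C*")
Tempfile.create("sample") do |tmp|
  tmp.binmode
  tmp.write(sample)
  tmp.flush

  whole   = read_binary(tmp.path)
  chunked = read_binary_chunked(tmp.path, 2)

  puts "bytes read: #{whole.bytesize}"
  puts "same content: #{whole == chunked}"
end
```

For small files the one-shot read is simpler. The chunked variant mainly matters when a file is too large to hold in memory comfortably, in which case each chunk would be processed as it arrives rather than collected.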
Christian MICHON
2014-10-05 12:40:27 UTC
Then please remove the following line from your code:

java_import "java.io.File"

Good luck!
--
Christian